Now with functions to assess disclosure risk: synthpop version 1.9-0

This version is currently available from github.com/gillian-raab/synthpop, but we plan to have it on CRAN before 2026. It includes two functions, disclosure() and multi.disclosure(), that provide measures of identity and attribute disclosure for the synthetic data, compared with the same measures for the original. Parameters include the set of keys that an intruder might be expected to know about units in the original file, as well as the target or targets for which the disclosure risk is being assessed. Details are in the help files for each function. A vignette, disclosure.pdf (Practical Privacy Metrics for Synthetic Data), illustrates their use and can also be downloaded from here.

Differentially private (DP) algorithms: synthpop version 1.8-0

The two methods that synthesise categorical data from cross-tabulations can now be made to produce differentially private synthetic data. The two functions catall.syn and ipf.syn each have a new parameter, epsilon, that makes the synthetic data DP. A paper evaluating the properties of DP synthetic data created by these methods, 'Utility and Disclosure Risk for Differentially Private Synthetic Categorical Data', can be accessed here.
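As a rough illustration of how a privacy parameter epsilon can enter synthesis from a cross-tabulation, the Python sketch below adds Laplace noise to every cell count and then samples synthetic records from the noisy table. This is an assumption-laden sketch, not the package's algorithm: the text above does not say which mechanism catall.syn or ipf.syn uses, and the function names, the sensitivity of 1 per cell (add-or-remove-one neighbouring data sets) and the choice of the synthetic sample size are all assumptions made here.

```python
import random
from collections import Counter
from itertools import product

def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_synthesise(records, levels, epsilon, rng):
    """Synthesise categorical records from a noisy cross-tabulation.

    levels maps each variable to its full list of categories, so that empty
    cells also receive noise. A sensitivity of 1 is assumed (hypothetical
    choice), giving a Laplace scale of 1 / epsilon.
    """
    variables = list(levels)
    counts = Counter(tuple(r[v] for v in variables) for r in records)
    cells = list(product(*(levels[v] for v in variables)))
    # Add noise to every cell, then clip negatives to zero (post-processing).
    noisy = [max(0.0, counts[c] + laplace(1 / epsilon, rng)) for c in cells]
    # Take the synthetic sample size from the noisy total, not the true one.
    k = round(sum(noisy))
    if k == 0:
        return []
    drawn = rng.choices(cells, weights=noisy, k=k)
    return [dict(zip(variables, cell)) for cell in drawn]

# Hypothetical toy data: 40 records on two categorical variables.
levels = {"sex": ["F", "M"], "edu": ["low", "high"]}
records = (
    [{"sex": "F", "edu": "low"}] * 15
    + [{"sex": "F", "edu": "high"}] * 5
    + [{"sex": "M", "edu": "low"}] * 8
    + [{"sex": "M", "edu": "high"}] * 12
)
dp_records = dp_synthesise(records, levels, epsilon=1.0, rng=random.Random(1))
```

Smaller values of epsilon mean a larger noise scale, so the synthetic table drifts further from the original one; that trade-off between utility and disclosure risk is what the paper mentioned above evaluates.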

Utility of synthetic data: synthpop version 1.7-0

Utility measures and methods available in the synthpop package have been extended and improved; the methods include two for synthesising categorical data (catall.syn and ipf.syn). There is also a new function, utility.tables(), that calculates and plots tables of utility measures. Options are all one-way tables, all two-way tables, or three-way tables for a specified third variable along with pairs of all other variables. All utility functions, originally designed to be used for synthetic data objects of class synds created by the synthpop function syn() or syn.strata(), can now be used to compare one or more synthesised data sets with the original records, where the records are R data frames or lists of data frames.
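The idea of comparing every one-way or two-way table between the original and synthetic data can be sketched in a few lines of Python. This is an illustration only: it uses total variation distance as a stand-in measure, which is an assumption for this example and not necessarily one of the measures that utility.tables() reports, and all function names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def table_proportions(records, variables):
    """Cross-tabulate the given variables and return cell proportions."""
    counts = Counter(tuple(r[v] for v in variables) for r in records)
    n = len(records)
    return {cell: c / n for cell, c in counts.items()}

def table_distance(original, synthetic, variables):
    """Total variation distance between the original and synthetic tables."""
    p = table_proportions(original, variables)
    q = table_proportions(synthetic, variables)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))

def all_table_distances(original, synthetic, variables, way=2):
    """Distance for every one-way (way=1) or two-way (way=2) table."""
    return {
        combo: table_distance(original, synthetic, combo)
        for combo in combinations(variables, way)
    }

# Hypothetical toy data: the margins agree but the joint table does not.
original = [
    {"sex": "F", "edu": "low"},
    {"sex": "F", "edu": "high"},
    {"sex": "M", "edu": "low"},
    {"sex": "M", "edu": "high"},
]
synthetic = [
    {"sex": "F", "edu": "low"},
    {"sex": "F", "edu": "low"},
    {"sex": "M", "edu": "high"},
    {"sex": "M", "edu": "high"},
]
one_way = all_table_distances(original, synthetic, ["sex", "edu"], way=1)
two_way = all_table_distances(original, synthetic, ["sex", "edu"], way=2)
```

In this toy case both one-way tables match exactly while the two-way table differs, which shows why it is worth looking beyond one-way tables when judging utility.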

The new features were added in synthpop version 1.7-0, which has been available on CRAN since 2021/11/17. A new utility vignette, 'Assessing, visualizing and improving the utility of synthetic data', provides useful information on utility functions in synthpop and on utility measures in general.

synthpop 1.5-0

A new version of synthpop (synthpop 1.5-0) was added to CRAN on 2018/08/16. Note that you should be running a recent version of R (e.g. 3.5.0) before installing it. Several new features and improvements are included. For example, the utility functions now offer a fuller range of options, as described in the paper 'General and specific utility measures for synthetic data' published in the Journal of the Royal Statistical Society: Series A. Synthesis by log-linear models for categorical data is also implemented. The method used is iterative proportional fitting, with models defined by the margins they constrain. The appearance of zero cells in the synthesised data is controlled by setting a prior for each cell. See the package NEWS file for more details.
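For readers unfamiliar with iterative proportional fitting, the following Python sketch fits a two-way table to given row and column margins. It is a generic textbook version under stated assumptions, not synthpop's implementation: the function name, its arguments and the way the prior is applied (a constant added to every starting cell) are hypothetical choices made for illustration.

```python
def ipf(seed, row_targets, col_targets, prior=0.0, tol=1e-9, max_iter=1000):
    """Fit a two-way table to target row and column margins by IPF.

    prior is added to every seed cell first, so a positive prior lets cells
    that start at zero receive non-zero fitted values.
    """
    table = [[cell + prior for cell in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row to match its target margin.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [cell * target / total for cell in table[i]]
        # Scale each column to match its target margin.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match; stop once the rows match as well.
        worst = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return table

# Hypothetical starting table with one zero cell, and the margins to match.
seed = [[10, 0], [5, 5]]
row_targets = [8, 12]
col_targets = [14, 6]

fit_no_prior = ipf(seed, row_targets, col_targets)
fit_with_prior = ipf(seed, row_targets, col_targets, prior=0.5)
```

Without a prior the zero cell stays at zero, because rescaling can never make a zero count positive. With a small positive prior the same cell receives a non-zero fitted value while the margins are still matched.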
