Abstract
The synMicrodata package provides a flexible and fully joint modeling approach for generating synthetic microdata containing both continuous and categorical variables. Built on a nonparametric Bayesian model with Dirichlet process priors, the package captures complex multivariate dependencies in original datasets, even in the presence of missing values. It generates multiple synthetic datasets through a modular workflow for data preprocessing, model fitting, and data synthesis. Simulation studies demonstrate that synMicrodata preserves key marginal statistics and achieves nominal coverage rates. The package produces competitive results when compared to existing synthetic data generation methods, under both complete and missing data scenarios. Consequently, synMicrodata is a valuable tool for ensuring privacy in data dissemination while enabling valid statistical inference on confidential data through simulation.
| Original language | English |
|---|---|
| Article number | 102541 |
| Journal | SoftwareX |
| Volume | 34 |
| State | Published - Jun 2026 |
Keywords
- Categorical variable
- Continuous variable
- Data privacy
- Dirichlet process priors
- Imputation
- Missing data
Fingerprint
Dive into the research topics of 'SynMicrodata: An R package for generating synthetic microdata via a nonparametric Bayesian approach'. Together they form a unique fingerprint.