Friday, October 28, 2016

WES Synthetic File

Question

“A researcher is interested in learning how WES synthetic files are generated, especially for workplace surveys. Is there a document somewhere that outlines the process? She is trying to find out whether the synthetic data bear any resemblance to the master data, especially for descriptive analysis.”

Answer

Synthetic files contain dummy data but have the same record layout as the master files. They let researchers with remote access privileges write and test their programs before sending them to Statistics Canada, where the programs are run against the actual master files.

“It is the researcher’s responsibility to ensure that their analysis programs run properly. To this end Statistics Canada provides them with synthetic files that they can use for development and testing of their programs. The synthetic files have the same format as the Master microdata files, but contain some artificial data and fewer records. The primary objective in the creation of the synthetic files is that the artificial data be consistent with the codeset and skip patterns in the questionnaire. This is important since realistic data are needed to test analysis programs properly. A secondary objective is to preserve, at least approximately, the marginal distributions of variables and the relationships between closely related variables from the Master microdata files.”
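The two objectives in the quoted passage (codes consistent with the codeset and skip patterns, and approximately preserved marginal distributions) can be illustrated with a toy generator. This is a minimal sketch under stated assumptions, not Statistics Canada's actual procedure: the variable names, codes, proportions, and the "valid skip" reserve code are all hypothetical. Each variable is drawn independently from its marginal distribution, and the skip rule is then enforced so every record is consistent with the questionnaire flow.

```python
import random
from collections import Counter

# Hypothetical marginal distributions (code -> proportion), as might be
# tabulated from a master file. Names and codes are illustrative only.
MARGINALS = {
    "HAS_TRAINING": {1: 0.6, 2: 0.4},            # 1 = yes, 2 = no
    "TRAINING_HOURS": {1: 0.5, 2: 0.3, 3: 0.2},  # 1 = <10h, 2 = 10-40h, 3 = >40h
}
VALID_SKIP = 6  # hypothetical reserve code meaning "valid skip"


def draw(marginal, rng):
    """Draw one code with probability proportional to its marginal share."""
    codes = list(marginal)
    weights = [marginal[c] for c in codes]
    return rng.choices(codes, weights=weights, k=1)[0]


def synthetic_record(rng):
    """Build one artificial record that respects the codeset and skip pattern."""
    rec = {var: draw(m, rng) for var, m in MARGINALS.items()}
    # Hypothetical skip pattern: TRAINING_HOURS is asked only if HAS_TRAINING == 1.
    if rec["HAS_TRAINING"] != 1:
        rec["TRAINING_HOURS"] = VALID_SKIP
    return rec


def synthetic_file(n, seed=0):
    """Generate n artificial records; a fixed seed makes the file reproducible."""
    rng = random.Random(seed)
    return [synthetic_record(rng) for _ in range(n)]


if __name__ == "__main__":
    records = synthetic_file(1000)
    print(Counter(r["HAS_TRAINING"] for r in records))
```

Because variables are drawn independently, this sketch preserves single-variable marginals only approximately and ignores relationships between variables other than the skip rule, which is one reason synthetic data should not be used for descriptive analysis.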

When creating a synthetic file for testing programs, a minimum semblance of "realism" is necessary. For example, one cannot test a program with a 10-variable regression model when there are only 9 observations in the dataset; another program may fail because the synthetic data file (SDF) has no observations with a given characteristic under study. But again, the data in the file are not real.
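The two failure cases above can be caught before a test run with a simple pre-flight check. The sketch below assumes records are held as a list of dicts and that the planned model includes an intercept; the function name and the `SECTOR` variable are hypothetical. It flags a file with too few observations for the number of model parameters, and any required category that never appears.

```python
def check_file_for_program(records, n_regressors, required):
    """Return a list of problems that would make a test run fail.

    records: list of dicts, one per respondent (hypothetical layout)
    n_regressors: number of explanatory variables in the planned model
    required: dict mapping variable name -> code that must appear at least once
    """
    problems = []
    n_params = n_regressors + 1  # one extra parameter for the intercept
    # A regression needs more observations than parameters to leave any
    # residual degrees of freedom for variance estimates.
    if len(records) <= n_params:
        problems.append(
            f"only {len(records)} observations for {n_params} parameters"
        )
    for var, code in required.items():
        if not any(r.get(var) == code for r in records):
            problems.append(f"no observations with {var} == {code}")
    return problems


if __name__ == "__main__":
    tiny_file = [{"SECTOR": 1} for _ in range(9)]
    for problem in check_file_for_program(tiny_file, 10, {"SECTOR": 2}):
        print(problem)
```

Run against a 9-record file and a 10-variable model, the check reports both problems the text describes: too few observations, and no records in the studied category.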