Question:
I have a researcher wanting to use the CCHS PUMFs to look at the sexual identity variable, but even though the question has been asked in the Socio-Demographic portion of the questionnaire since Cycle 2.1 and reports have been published citing the data (such as https://www150.statcan.gc.ca/n1/daily-quotidien/040615/dq040615b-eng.htm) neither of us can find this variable in any file other than the 2015-16 cycle. I hope we are not just missing something, but perhaps there is a reason this information was not made part of the PUMF files?
Answer:
I’ve received the following response from subject matter:
“In the past, the subject matter team preparing the PUMFs for release considered the sexual orientation concept as an indirect identifier and chose not to include it with the file. In 2015-2016, our team asked the Microdata Release Committee to allow us to add the variable to our 2015-2016 PUMF. They agreed that the risk of disclosure was low and approved the addition of the variable. We will continue to assess the risk, but I imagine we will continue with the release of that variable going forward.”
Thursday, July 11, 2019
Wednesday, June 26, 2019
Canadian Community Health Survey PUMF 2017-2018
Question:
Given that the CCHS PUMF for 2015/2016 was only released earlier this year, the researcher realizes that the PUMF for 2017-2018 is probably not in the very near future, but would like to know if there is any estimated timeframe for its release.
Answer:
We do not have an official release date for the CCHS 2017-2018 PUMF. When I last spoke to subject matter, they were in the collection stage and didn’t expect to have anything released for another 1-2 years.
We will be updating our list of expected releases on the website shortly!
Labels:
Health,
Public Use Microdata File (PUMF)
Thursday, June 20, 2019
Uniform Crime Reporting - Recent Files
Question:
I have a researcher looking at the Uniform Crime Reporting. They are wondering if you will be releasing an update or more current data. Or should we be looking at another dataset?
Answer:
The Masterfile will be released to the RDCs at the end of July; however, there is no intention of creating a PUMF.
Monday, June 10, 2019
CCHS Master File - Chinese Immigrants
Question:
A researcher is considering submitting an application to the RDC to access the CCHS-IMDB to study the healthy immigrant effect among older Chinese immigrants. She is wondering whether it is possible to find out the sample size for respondents born in China. I have checked the CCHS PUMF and this table, but both only have general options of white, black, Asian, or other and not the specific country or ethnic origin she is looking for.
Answer:
I’ve spoken to one of the analysts at the RDC and although she would not be able to confirm sample sizes at the moment (as they do not have the data yet), she doesn’t anticipate there will be a problem with the proposal, given the sample size. Of course the proposal does go through institutional review, so if there are any problems anticipated for the researcher it would be flagged at that point.
Friday, June 7, 2019
New files on Statistics Canada Nesstar - CCHS 2015
We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.
Canadian Community Health Survey (CCHS) 2015 Nutrition - PUMF
Canadian Community Health Survey (CCHS) 2015-2016 Annual - PUMF
Friday, May 31, 2019
2016 Labour Force Survey - missing NOCs
Question:
A researcher is looking for NOC occupation groups for the monthly 2016 Labour Force Survey PUMFs. Is this the way it should be represented: NOCS_01_25 STC Nesstar and NOCS_01_47 STC Nesstar when there actually are no such variables in 2016?
Is it correct that the values are 100% missing in 2016? It would be helpful to have NOC values for 2016.
What was the intention in 2016 when there are NOC values for the PUMF for these variables, e.g., on Odesi see January 2015: NOCS_01_25 link and NOC_01_47 link and for January 2017: NOC_10 LINK and NOC_40 LINK? Would this be considered a problem with the data files?
As an aside, there are SOC80_21 and SOC80_49 variables (with values) in 2015 and 2016. Revisions to the 2015 LFS
Follow-Up Question:
I’d like to make a correction please to my question, with thanks to Scholars Portal for reviewing the occupation variables from 2015 – 2017 LFS monthly PUMFs.
Is it correct that there are supposed to be placeholders for a number of occupation variables but no data as highlighted in yellow below?
- All the LFS monthly PUMFs, 1987-2015, have data for NOCS_01_25 and NOCS_01_47, but there is no data for the variables SOC80_21 and SOC80_49.
- The monthly 2016 LFS PUMFs have variables named NOCS_01_25, NOCS_01_47, SOC80_21 and SOC80_49 but with no data.
- The monthly 2017 LFS PUMFs have two occupation variables only, the new NOC_10 and NOC_40.
Answer:
The short answer to this is simply that the labels have changed from the older years to the newer ones. For example, SOC80_21 was replaced with an NOC listing instead. From what I’ve been told, the descriptions of the variables should remain the same, it’s just a matter of needing to match them up from year to year (not necessarily the answer you were looking for I’m sure!)
So yes, it is correct that there is no data.
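One way to confirm the pattern described above is to tabulate the share of blank values for each occupation variable in each monthly file. The following is a minimal Python sketch under stated assumptions: the rows are invented toy records rather than real LFS data, blanks are assumed to be stored as empty strings, and the helper names `is_missing` and `share_missing` are hypothetical.

```python
def is_missing(value):
    """Treat None and blank/whitespace-only strings as missing (an assumption)."""
    return value is None or str(value).strip() == ""


def share_missing(rows, variables):
    """Return {variable: fraction of rows where it is missing or absent}."""
    if not rows:
        return {var: None for var in variables}
    return {
        var: sum(is_missing(row.get(var)) for row in rows) / len(rows)
        for var in variables
    }


# Toy rows, invented for illustration only; not real LFS records.
rows_2015 = [
    {"NOCS_01_25": "03", "NOCS_01_47": "12"},
    {"NOCS_01_25": "17", "NOCS_01_47": "40"},
]
rows_2016 = [
    {"NOCS_01_25": "", "NOCS_01_47": ""},
    {"NOCS_01_25": " ", "NOCS_01_47": ""},
]

occupation_vars = ["NOCS_01_25", "NOCS_01_47"]
print(share_missing(rows_2015, occupation_vars))
print(share_missing(rows_2016, occupation_vars))
```

A share of 1.0 for every occupation variable in a given file would match the "variable present, no data" situation reported for the 2016 monthly PUMFs.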
Wednesday, May 22, 2019
Questions about Housing Data
Question:
An Education researcher here wants to work with data at the DA level that describes dwelling characteristics including rental costs, housing values, and tenure.
While he may find suitable data through dwelling variables in the RDC using the 2006 and 2016 Census files and the 2011 NHS, he was wondering how he may access data from the Canadian Housing Statistics Program (CHSP).
His questions on the CHSP are as follows:
- What reference periods are available?
- If the program does not go back earlier than 2017, did another program collect comparable data?
- Will a PUMF be produced under this program?
- He may wish to explore a Custom Tabulation, if so would data at the DA level on the above noted dwelling characteristics (and possibly additional variables) be available?
Answer:
I’ve received the following response from subject matter:
What reference periods are available?
- 2018
If the program does not go back earlier than 2017, did another program collect comparable data?
- The Census is likely the best source.
Will a PUMF be produced under this program?
- This is not planned.
He may wish to explore a Custom Tabulation, if so would data at the DA level on the above noted dwelling characteristics (and possibly additional variables) be available?
- No, the CHSP provides data at the CSD level and above.
Labels:
Public Use Microdata File (PUMF)
Tuesday, May 21, 2019
Estimated Release of 2017 APS PUMF
Question:
I have a researcher interested in the 2017 Aboriginal Peoples Survey data. When might the PUMF be released? It’s not listed on the tentative release dates page yet. I understand if nothing specific is known at this time, but it would be helpful to know if it won’t be in 2019, for example.
Answer:
The 2017 APS PUMF will not be disseminated until 2020 – the exact date hasn’t been determined yet however.
Tuesday, May 14, 2019
PCCF7B / FCCP7B
The latest version of the PCCF file (PCCF7B) is now available on the EFT. It can be found at the following location:
/MAD_PCCF_FCCP_DAM/ROOT/2019/PCCF7B.zip
Monday, May 13, 2019
GSS Victimization - Territories and Geography
Question:
I have a question about microdata files available for the GSS (victimization) cycle 28, or previous. A researcher is interested in joining data from this survey with data from the Census and therefore requires more detailed geographic information than what is available with the DLI PUMF. In the PUMF, the geographic variables only 'go down' to the provincial level. The restricted dataset has many more geographic variables available, which would be better in this case.
However, the microdata files aren't consistent with their inclusion of data from the territories vs provinces. None of the DLI PUMFs contain data from the territories. The cycle 28 RDC dataset only contains variables for responses from the provinces, while the cycle 23 RDC dataset only contains responses from the territories.
Are there any microdata datasets (RDC/DLI) that contain responses from both the provinces and territories, that will have more specific geographic variables?
Answer:
I’ve received the following response from subject matter:
“It is never a good idea to ‘join data’ from two different surveys, given differences in their populations, methodologies and time frames.
In general, the GSS is only carried out in the provinces, the exception being the Victimization cycle, last completed in 2014.
As of 2019, the GSS targets a sample size of approximately 20,000 respondents. Sometimes a cycle has a higher target sample size if funding has been received for an oversample, either in the form of a geographical sample top-up (i.e., adding more units in certain geographic areas), a targeted oversample (e.g. focussing on immigrants, youth, or another population group), or a general oversample (i.e., increasing the raw sample size). With a final sample of 20,000 respondents, basic survey estimates are usually available for the national and regional levels, and for some provinces and census metropolitan areas. Depending on the survey topic, the sample size may be sufficient to produce estimates for certain population groups such as lone parent families, certain visible minority groups or seniors.”
Labels:
Census,
Geography,
Public Use Microdata File (PUMF)
Tuesday, April 30, 2019
Public Service Employee Survey (PSES)
Question:
We are getting more demand from profs and grad students for the PSES 2017, 2014 and 2011 cycle PUMFs. Is there any chance we will see any of these cycles as PUMFs? This PUMF question has probably been asked and answered, but if it could be re-asked, our researchers would very much appreciate it.
Answer:
We’ve received the following response from subject matter:
“PUMFs are only created when clients request and fund them. That has not been the case for PSES for the last few cycles. 2008 was the most recent cycle when a PUMF was created. I don’t expect that TBS (the PSES sponsor) is planning to order PUMFs for the 2011-2017 cycles.
The master files for all cycles are available in the RDCs, if that’s at all helpful for the researchers.”
Labels:
Government,
Labour,
Public Use Microdata File (PUMF)
Wednesday, April 17, 2019
PCCF: Population centre and rural area classification size; value =0
Question:
I did a quick search in the DLI listserv archive because I vaguely remembered seeing this question before, and indeed I found that the question was asked twice… but I could not find the answer to it on the listserv, so I will ask it again!
In the Population centre and rural area classification size of the PCCF, why are there many records coded 0? This does not correspond to a valid value listed in the documentation (from 1 to 4 depending on rural area or size of the population centre).
Answer:
Here is an earlier answer from StatCan about this. Please take a look to see if it answers your question:
I have been provided with the following information from Subject Matter. As it is quite technical, I’ll be pasting it directly as I received it from them:
------------------------------------
The PCCF starts by trying to link postal codes to block faces, but if it can’t it then moves up and tries to link them to DissBs, then DAs. The Rep_Pt_Type variable has the info on what level of link was made.
PopCentres are built from Dissemination Blocks (DissB), not Dissemination Areas (DA). See the hierarchy chart…
http://www12.statcan.gc.ca/census-recensement/2016/ref/dict/geoint-eng.cfm
Thus, if the PCCF cannot link to a DissB for a particular postal code, and has to move up to the DA level, there is no way to link the postal code to a PopCentre. Many DAs overlap multiple PopCentres, so we can’t choose just one.
For the 125,163 records with a POP_CNTR_RA_SIZE_CLASS of 0, these all have a Rep_Pt_Type of 3, meaning they linked to a DA. So, there is no PopCentre data available for them.
That is the explanation for why the PCCF has 0 in the POP_CNTR_RA_SIZE_CLASS.
-------------------------------------
As the researcher is only looking at some of the records in the PCCF, depending on where these records (about 12,000) are, the possibility exists that the DAs they link to may or may not overlap PopCentres. With the Geographic Attribute File, we can create a correspondence between DAs and PopCentres, and test that correspondence with the researcher’s records.
If you wish to provide the researcher’s subset of PCCF records, we may be able to capture some useful PopCentre info. No promises, but this looks like a lead. Otherwise, the information provided by Subject Matter above is quite useful.
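A researcher could check this explanation against their own subset by cross-tabulating Rep_Pt_Type against POP_CNTR_RA_SIZE_CLASS. The sketch below is in Python and is only illustrative: it assumes the PCCF records have already been parsed into dictionaries keyed by those two variable names, the sample rows are invented, and the function names `crosstab` and `rep_types_for_class_zero` are hypothetical.

```python
from collections import Counter


def crosstab(records):
    """Count records by (Rep_Pt_Type, POP_CNTR_RA_SIZE_CLASS)."""
    return Counter(
        (rec["Rep_Pt_Type"], rec["POP_CNTR_RA_SIZE_CLASS"]) for rec in records
    )


def rep_types_for_class_zero(records):
    """Return the set of Rep_Pt_Type values among records coded 0."""
    return {
        rec["Rep_Pt_Type"]
        for rec in records
        if rec["POP_CNTR_RA_SIZE_CLASS"] == "0"
    }


# Toy records, invented for illustration only; not real PCCF rows.
# Rep_Pt_Type "3" means the postal code linked to a DA (per the answer above);
# "1" and "2" are assumed here to stand in for the finer-level links.
sample = [
    {"Rep_Pt_Type": "1", "POP_CNTR_RA_SIZE_CLASS": "4"},
    {"Rep_Pt_Type": "2", "POP_CNTR_RA_SIZE_CLASS": "1"},
    {"Rep_Pt_Type": "3", "POP_CNTR_RA_SIZE_CLASS": "0"},
    {"Rep_Pt_Type": "3", "POP_CNTR_RA_SIZE_CLASS": "0"},
]

for (rep_type, size_class), count in sorted(crosstab(sample).items()):
    print(rep_type, size_class, count)

print(rep_types_for_class_zero(sample))
```

If the explanation holds for a given extract, the only Rep_Pt_Type value returned for the records coded 0 should be 3.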
Friday, April 5, 2019
New Release: LFS March 2019
We are pleased to inform you that the following product is now available.
Labour Force Survey (LFS) – March 2019
This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). The LFS collects monthly information on the labour market activities of Canada's working age population. This product is for users who prefer to do their own analysis by focusing on specific subgroups in the population or by cross-classifying variables that are not in our catalogued products. The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.
EFT: /MAD_PUMF_FMGD_DAM/Root/3701_LFS_EPA/1976-2019/data/micro2019/micro2019-03.zip
Nesstar access will be available on Monday, April 8th.
Wednesday, April 3, 2019
National Graduates Survey 2015
Question:
May I please have an update for the release of the PUMF of the 2015 NGS? Tentative release dates page just says 2019. Researcher is inquiring.
Answer:
I’ve received the following reply from subject matter:
“Unfortunately, we do not even have dates for the initial data release (current thinking is near the end of this year), but the PUMF could be up to a year after that. We hope to have a preliminary calendar of releases for this fiscal year by the end of this month.”
Monday, March 25, 2019
Survey of Hhld Spending PUMFs
Question:
Will there be PUMFs made for the SHS from 2012 to 2016?
Answer:
Subject matter has simply responded saying that they are creating a PUMF for the SHS 2017 – I suspect this is meant to indicate that there are no plans to create one for 2012 to 2016.
Survey of Household Spending (SHS)
Question:
I am wondering if a recent version of the Survey of Household Spending (SHS) is available. I looked on the FTP server and the most recent data set is from 2009. But here it states there might be more current versions.
Answer:
The 2017 SHS PUMF is expected to be released in the Fall – this would be the most current version.
Follow-up Question:
Just to confirm, there will not be a 2016 PUMF?
Follow-up Answer:
Based on the answers I’ve received from subject matter, it appears as if the 2017 will be the only one they are producing.
Friday, March 15, 2019
Question about Labour Force Survey 2016
Question:
I have a faculty member contacting me about the Labour Force Survey for 2016. The occupational variables for all the months in 2016 are blank. I double checked the data in Nesstar and none of the occupation variables include the year 2016 – they are all prior to 2015. The occupation variables return in the 2017 data. I was reading the documentation, and the occupation codes changed between 2015 and 2016, but I would still expect to find the new codes in 2016. Is this an error?
Answer:
Subject matter returned the following comments:
No, this isn’t an error. The occupational codes were removed from the LFS PUMF in 2016 during product revision. The NOC groupings were changed in 2017 to better protect respondent confidentiality and mitigate the risk of disclosure.
During the reassessment phase in 2016, the classification was completely removed with no plans to reintroduce it, which does in fact create a break in the series.
Wednesday, March 13, 2019
Public Access to PUMFs
Question:
It was my understanding that with the Statistics Canada Open Licence, members of the public could access PUMFs as long as they signed an agreement…?
Answer:
As of October 2018, PUMFs fall under the Statistics Canada Open Licence. More information on what users can now do with the data can be found here: https://www.statcan.gc.ca/eng/reference/licence and FAQs here: https://www.statcan.gc.ca/eng/reference/licence-faq. An agreement does not need to be signed in order for members of the public to access the PUMFs.
Follow-up Questions:
When I look at PUMFs on StatCan.gc.ca, I see that they’re available to the public via the data portal if they fill out the order form. Does this mean that a member of the public requesting a PUMF via statcan.gc.ca would still be asked to pay, while a member of the public requesting the PUMF from a DLI member institution would be able to access it free of charge?
Follow-up Answer:
As always, you can get PUMFs for free from subject matter. There is not yet a central location for free download of PUMFs, but it’s in the works!
Follow-up Question:
Another question about the PUMF Open Licence, this one from a researcher. They ask:
Does that mean that PUMFs acquired via DLI could be shared outside of the institution without restriction, e.g. packaged as part of a textbook for broad dissemination?
Follow-up Answer:
PUMFs acquired via the DLI can be shared outside of the institution, provided that StatCan is cited in the documentation.
Wednesday, March 6, 2019
Census 2016 PUMF .dct file needed
Question:
I have a researcher who is using the Census 2016 PUMF with Stata. They have the Stata .do file and the RAW .dat data file.
But they are telling me they need a Stata dictionary file (.dct) to load the .dat file.
I am not familiar with Stata. The .dat file is loading correctly in SPSS. I see from a little reading that one can create a .dct file describing the data and its location. But does anyone else work with Stata? How are you generating a .dct file for the Census 2016 PUMF?
Answer:
I’ve been able to get hold of the .dct file and have placed it in the RAW folder (within both the English and French folders). Hopefully this is what the researcher needs!
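For readers unfamiliar with the idea, a dictionary file of this kind records where each variable sits within each line of the raw fixed-width .dat file. The Python sketch below illustrates that concept only and is not the Stata .dct syntax. The column positions, the `PERSON_ID` name, the sample record, and the helper `parse_fixed_width` are all invented for illustration; the real layout must be taken from the PUMF documentation.

```python
def parse_fixed_width(line, layout):
    """Slice one fixed-width record into {name: text} using 1-based columns."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in layout.items()
    }


# Hypothetical layout: these names and column positions are invented.
# The real ones must come from the PUMF's record layout documentation.
layout = {
    "PERSON_ID": (1, 6),
    "AGEGRP": (7, 8),
    "SEX": (9, 9),
}

# Toy record, invented for illustration only.
line = "000123 92"
print(parse_fixed_width(line, layout))
```

Values are kept as text here so that leading zeros in codes are preserved; converting to numbers would be a separate, deliberate step.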
Labels:
Census,
Public Use Microdata File (PUMF)
Thursday, February 21, 2019
Census 2016 PUMF Weights
Question:
A prof was delighted to be able to download the Census 2016 PUMF.
However, the weights do not seem to make sense to her.
These are her comments:
"I’m working with the PUMF file now and the weights look VERY odd. For starters, the file only contains 16 replicate weights – normally Stat Can provides 500 to 1000. For another thing, the individual (frequency weight) weight [variable name – WEIGHT] is nearly identical for all respondents. Is that what Stat Can intended?
Any information you can pass on would be really helpful."
Answer:
Subject matter has responded with the following:
“I’m happy to hear the prof is enjoying the PUMF.
We appreciate the concerns expressed by the user and we can assure them that the contents of the product are what Statistics Canada intended.
I would recommend that the users consult the PUMF user guide and codebook for answers to their questions. Chapter 3 – Sampling method, estimation and data quality should resolve any questions with regard to the Weight and replicate weight variables.”
Follow-up Question:
Here is the reply I received from my prof – and the answer provided by Subject division does not answer her question.
"Here’s an illustrative example where the application of the Weight variable creates some problems:
There are approximately 390,000 live births in Canada each year. When I run a cross tab to look at the number of women who are parents (couples or single parents) with a child under 1 year, I get a result of over 708,000, which just can’t be right."
I asked her for more detail to try to replicate how she got her answer and this is her reply:
"…my tabulation is restricted to women (Sex), of reproductive age (20 to 44 years per AGEGRP) who are in a couple with children or are single parents (CFSTAT) and who live with a child age 1 or younger (PKID0_1).
Here’s my Stata output for the derived variable (as per above) “newmom”
. tabulate newmom [fweight = rndweight]
     newmom |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    708,291      100.00      100.00
------------+-----------------------------------
      Total |    708,291      100.00"
Follow-up Answer:
I received the following response from subject matter:
“On page 109 of the 2016 Census Individuals PUMF User Guide, the PKID0_1 variable includes children aged 0 or 1. Therefore it provides data on presence of children as of Census Day and it includes children that are under the age of 2.
As a result, the counts of children in Census family here presents higher numbers. Foster children are not included in this grouping.”
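The arithmetic behind this answer can be checked roughly. The sketch below uses only the two figures quoted in the thread (about 390,000 births per year and the weighted count of 708,291) and assumes that births are similar from year to year, ignoring mortality and migration. It is a plausibility check, not an official reconciliation.

```python
BIRTHS_PER_YEAR = 390_000      # approximate figure quoted by the researcher
WEIGHTED_NEWMOM = 708_291      # weighted count from the Stata tabulation
AGE_COHORTS_IN_PKID0_1 = 2     # PKID0_1 covers children aged 0 and aged 1


def expected_children(births_per_year, cohorts):
    """Rough ceiling on children in the age range (assumes stable births)."""
    return births_per_year * cohorts


ceiling = expected_children(BIRTHS_PER_YEAR, AGE_COHORTS_IN_PKID0_1)
ratio = WEIGHTED_NEWMOM / ceiling

print(ceiling)           # 780000
print(round(ratio, 2))   # 0.91
```

A count of about 708,000 women is therefore below the rough ceiling of 780,000 children under age 2. Some gap is plausible, since the tabulation is restricted to women aged 20 to 44 and a mother with two children under 2 is counted once.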
Labels:
Census,
Public Use Microdata File (PUMF)