Wednesday, February 27, 2019

GSS Cycle 31 and 32

I was wondering when the PUMF for GSS Cycle 31 (Family) would be available. According to this product page, the data was released on February 7, 2019.

For GSS Cycle 32 (Caregiving and Care Receiving), the product page says the first release is scheduled for autumn 2019. Would this include the PUMF, or would that come later? I have a researcher who is interested in using this survey for a research project, and I would like to give her an estimate of when the PUMF will be available.

We’ve received the following response from subject matter:

“For GSS Cycle 31 (Family), it is safe to say Winter 2020.
As for GSS Cycle 32 (Caregiving and Care Receiving), the release of the analytical file is planned for November 2019. The PUMF for GSS 32 (Caregiving and Care Receiving) will not be ready until sometime in 2020; we don’t have a specific time frame at this time.

In the interim, the researcher could use the PUMF for Cycle 26 (2012 GSS on Caregiving and Care Receiving), if they haven’t already.”

Farm Expenses, Income and Yields by Type of Farm over Years

Researcher wants to know the smallest geography possible to obtain:
-farm income
-off farm income
--by specific type of farm
--- barley, canola, durum, pea and wheat by ‘small area’ in AB
---(or by Oilseed and grain farming [1111])


--- blueberries and grapes by ‘small area’ in BC
---(or by Fruit and tree nut farming [1113])

-by at least 5 separate (preferably consecutive) years

(I/we know of:
Estimated areas, yield and production of principal field crops by Small Area Data Regions

Farm financial survey, financial structure by farm type

(which has Oilseed and grain farming versus Fruit and tree nut farming) and its predecessor

Farm financial survey, financial structure by farm type, 2001-2011


Estimated areas, yield and production of principal field crops by Small Area Data Regions

Total and average off-farm income by source)

I’ve received the following response from subject matter:

“The best source of this information would be the Census of Agriculture, which releases data at roughly township-level geography. The census, of course, is held every five years, not every year.

Off-farm income would come from the AgPop linkage dataset, if you want it associated with the Census of Agriculture (available only at the provincial level).

Intercensal data is available from our survey programs, but those produce data at much higher levels of geography.”

Tuesday, February 26, 2019

CCHS Rapid Response Files

I have a student looking to access data from two of the rapid response components of the CCHS: 

Food skills – knowledge, planning and transference of skills (2012)
Food skills – mechanical skills and food conceptualization (2013)  
I'm thinking that they are likely not available as PUMFs: via ODESI I am only seeing documentation for the first and nothing for the second, and I've had no further luck on the EFT. I just want to throw it out to the community to see if anyone has insights and/or knows whether they might be available via the RDCs instead, before I go any further.

Subject matter has responded with the following:
“These 2 Rapid response files are available via RDC request.”

Sex Education in the Schools

I just received this request from a prof.

Any tips would be greatly appreciated!

“I was looking for any sort of general population survey that actually measured whether someone had been exposed to sex education in their schooling. Hopefully, some other relevant aspects of their behaviour would be measured, but the key thing is a question measuring exposure to formal sex ed.”

We’ve received the following from subject matter:

“Please note that we do not have data that actually measured whether someone had been exposed to sex education in their schooling (details about specific classes or data before postsecondary.)”

Monday, February 25, 2019

Data Requested on Pay Frequency

A researcher is searching for a data source where he can get “employee pay frequency” (see Figure 1 below) broken down by the following variables:
- for private and public companies,
- industries at the 2 digit NAICS code level, &
- provinces and territories (or provinces).

Figure 1: e.g., two questions asked so that StatCan can calculate employees’ weekly or hourly earnings


LFI_Q202 — [Including tips and commissions,] what is his/her hourly rate of pay? Go to 220

If there is no hourly rate, the next question is …

LFI_Q204 — What is the easiest way for you to tell us his/her wage or salary, [including tips and commissions,] before taxes and other deductions? Would it be yearly, monthly, weekly, or on some other basis?

If “Yearly”, go to 209

If “Monthly”, go to 208

If “Semi-monthly”, go to 207

If “Bi-weekly”, go to 206

If “Weekly” or “Other”, go to 205


Source: Labour Force Survey user guide
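The routing in Figure 1 can be written down as a lookup table, which also makes the subject-matter caveat further down visible: LFI_Q204 records the easiest way to report earnings, not how often the employee is actually paid. This is a minimal Python sketch that assumes nothing beyond the question numbers quoted above; the function and dictionary names are hypothetical.

```python
from typing import Optional

# Hypothetical encoding of the LFS pay-question routing in Figure 1.
# Only the question numbers come from the user guide excerpt; names are illustrative.

HOURLY_RATE_NEXT = 220  # LFI_Q202 answered with an hourly rate: go to 220

# LFI_Q204 answer (easiest way to report wage/salary) -> next question number
Q204_NEXT = {
    "Yearly": 209,
    "Monthly": 208,
    "Semi-monthly": 207,
    "Bi-weekly": 206,
    "Weekly": 205,
    "Other": 205,
}


def next_question(has_hourly_rate: bool, q204_answer: Optional[str] = None) -> int:
    """Return the next LFS question number after the pay questions."""
    if has_hourly_rate:
        # Hourly-rate reporters never see LFI_Q204.
        return HOURLY_RATE_NEXT
    if q204_answer not in Q204_NEXT:
        raise ValueError(f"Unrecognized LFI_Q204 answer: {q204_answer!r}")
    return Q204_NEXT[q204_answer]


# The categories are reporting conventions, so tallying them would at best be a
# proxy for pay frequency (note that "Weekly" and "Other" share one path).
print(next_question(False, "Bi-weekly"))  # 206
```

Hourly-rate reporters bypass LFI_Q204 entirely, so any tabulation of the Q204 answers covers only the salaried subset of employees.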

I wasn’t able to find other surveys (other than the Survey of Employment, Payrolls and Hours) which ask about pay frequency, but then this was not an exhaustive search.

Please also see this U.S. example: “How frequently do private businesses pay workers?”, Beyond the Numbers, May 2014, BLS.

Re: Figure 2 … *Email included an attached screen capture of Fig. 2

Any leads would be appreciated. It would be great if this could be retrieved from a PUMF or RTRA, but if not, what are the options?

Follow-Up Question:
This is a follow-up to the pay frequency data request below. The researcher has just specified to me that he wants to obtain firm-level data. Since this would likely require the Survey of Employment, Payrolls and Hours (SEPH), I would appreciate confirmation of whether SEPH is available in the RDCs or CDER. I can’t find it on the survey data source pages for either the RDCs or CDER.

The SEPH data is unfortunately not available in the RDCs and is not currently available in CDER; however, they have been in discussions with the SEPH manager about including it in CDER, and he was open to the idea, provided there is demand for it. He noted that it would take some time to prepare the files for use, though.

Follow-up Question:
Thank you so much for confirming the information about access to SEPH data. This is what I had thought the answer might be. And although we have at least one request a year for SEPH data, I am not sure how that would translate to CDER users.

In the meantime, it would be great to find out more about another possible source or sources for tabulations.

On a pure curiosity level, could DLI contacts request information on CDER researchers from their institutions, e.g., numbers?

Follow-up Answer:
Unfortunately it doesn’t look as if we’ll be able to get this particular bit of information:

“In theory the data is available, but it is not something that we publish. Also, the concept is not quite what the researcher is looking for. There is a difference between “employee pay frequency” and “easiest way to tell us your wage/salary”.”

I am also not sure how CDER numbers would work, but I can look into that for you!

Follow-up Question:
Thank you so much for looking into data sources for this research question.

The researcher would like to inquire about the possibility of using multiple years of SEPH data for pay frequency at the firm level in CDER. The researcher acknowledges that the SEPH measure of “easiest way to tell us your wage/salary” would only be a proxy for pay frequency. Given this somewhat restrictive definition, and the fact that SEPH is not currently available in CDER, what would be the next steps for inquiring about using SEPH in CDER? Does the researcher need to provide any specific information at this stage, and would it be acceptable for me to be looped into the discussions?

Follow-up Answer:
Our CDER contact has replied with the following:

“Do you know a bit more about the scope of the project? CDER only provides access to business microdata for research projects and not custom tabulations. If this is indeed the case, I will get in touch with the manager of SEPH and perhaps we can organize a call with the researcher.”

Friday, February 22, 2019

SPSD/M Licence Agreement

The Licence Agreement for the SPSD/M is still in place, right? The open data licence is for PUMFs only?

Yes, the SPSD/M Licence Agreement is still in place.

Thursday, February 21, 2019

Census 2016 PUMF Weights

A prof was delighted to be able to download the Census 2016 PUMF.

However, the weights do not seem to make sense to her.

These are her comments:
"I’m working with the PUMF file now and the weights look VERY odd. For starters, the file only contains 16 replicate weights – normally Stat Can provides 500 to 1000. For another thing, the individual (frequency weight) weight [variable name – WEIGHT] is nearly identical for all respondents.  Is that what Stat Can intended?
Any information you can pass on would be really helpful."

Subject matter has responded with the following:

“I’m happy to hear the prof is enjoying the PUMF.

We appreciate the concerns expressed by the user and we can assure them that the contents of the product are what Statistics Canada intended.

I would recommend that the users consult the PUMF user guide and codebook for answers to their questions. Chapter 3 – Sampling method, estimation and data quality should resolve any questions with regard to the Weight and replicate weight variables.”

Follow-up Question:
Here is the reply I received from my prof; the answer provided by the subject matter division does not answer her question.

"Here’s an illustrative example where the application of the Weight variable creates some problems:

There are approximately 390,000 live births in Canada each year. When I run a cross tab to look at the number of women who are parents (couples or single parents) with a child under 1 year, I get a result of over 708,000, which just can’t be right."

I asked her for more detail to try to replicate how she got her answer and this is her reply:

"…my tabulation is restricted to women (Sex) of reproductive age (20 to 44 years per AGEGRP) who are in a couple with children or are single parents (CFSTAT) and who live with a child aged 1 or younger (PKID0_1).
Here’s my Stata output for the derived variable (as per above) “newmom”

. tabulate newmom [fweight = rndweight]

     newmom |      Freq.     Percent        Cum.
          1 |    708,291      100.00      100.00
      Total |    708,291      100.00"

Follow-up Answer:
I received the following response from subject matter:

“According to page 109 of the 2016 Census Individuals PUMF User Guide, the PKID0_1 variable includes children aged 0 or 1. It therefore reflects the presence of children as of Census Day and includes children under the age of 2.

As a result, the counts of children in census families here are higher. Foster children are not included in this grouping.”
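What the prof's Stata command computes can be sketched in a few lines: with `fweight`, each record stands for `rndweight` people, so the tabulated total is simply the sum of weights over the records that pass the filter. This is a minimal Python illustration with made-up records; the field names are hypothetical simplifications of Sex, AGEGRP, CFSTAT and PKID0_1, not the PUMF's actual codes.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical simplified fields; the real PUMF uses coded categories.
    female: bool            # from Sex
    age_20_to_44: bool      # from AGEGRP
    parent_in_family: bool  # from CFSTAT: in a couple with children, or a lone parent
    child_0_or_1: bool      # from PKID0_1: a child aged 0 OR 1 is present (i.e. under 2)
    rndweight: int          # frequency weight used in the Stata command


def is_newmom(p: Person) -> bool:
    return p.female and p.age_20_to_44 and p.parent_in_family and p.child_0_or_1


def weighted_count(people, predicate) -> int:
    """Equivalent of `tabulate ... [fweight = rndweight]` for one category."""
    return sum(p.rndweight for p in people if predicate(p))


# Made-up records, for illustration only.
people = [
    Person(True, True, True, True, 40),
    Person(True, True, True, False, 40),   # no child under 2: excluded
    Person(False, True, True, True, 40),   # not female: excluded
    Person(True, True, True, True, 41),
]
print(weighted_count(people, is_newmom))  # 81
```

Because PKID0_1 flags children of two single years of age, the weighted total is better compared with roughly two years of births than with the approximately 390,000 births of a single year.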

Research Data Management Resources

Portage has recently published some new Training Aids for Research Data Management.

They include the following:

  • an online training module - Research Data Management (RDM) 101
  • Additions to the “Good Enough” series (1 page guides for busy people)

    • Data Curation
    • Dataverse
    • FRDR

These bilingual resources are available for everyone here: 

2006 Census PUMF

I was going through the user guide for the Census and was a bit unclear whether I could calculate 95% confidence intervals (e.g. Poisson approximation or gamma distribution) on the age-standardized weighted proportions (of visible minority groups) simply through my direct age-standardization algorithm, or whether there are other considerations (e.g. variables WT1 through WT8).

We received the following response from subject matter:

Poisson, gamma or other distributional assumptions are typically used for model-based inference from data that is not sourced from surveys. They assume that the event of interest is a random process following the hypothesized distribution, and that your data is one realization of this random process. This assumption gives you a tool to derive a model-based variance and confidence intervals. It is preferable to use a design-based method for complex survey data, where the characteristic (event) is fixed for all units in the population of interest but the random error between the sample estimate and the true population parameter is due to the random sampling.

For standardized estimates from complex surveys, you need to account for the survey design information in the estimation of the variance. For the 2006 PUMF data file, the steps to do so are described in Chapter 3, section C.2 "Estimation of the sampling variability".

Specifically for the 2006 Census PUMF, you would compute 8 age-standardized estimates, one using each weight variable (step 1), and continue through to step 6.

The method described there provides a 95% Wald confidence interval. It is the method most commonly used for convenience, but it rests on some assumptions, namely that the distribution of the estimator is normal, which tends to be inappropriate for very small proportions in smaller estimation domains. I assume this is less likely to affect you, since standardization generally involves larger estimation domains.
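The replicate-weight approach can be sketched as follows: compute the directly age-standardized estimate once with the full-sample weight and once with each of WT1 to WT8, turn the spread of the replicate estimates into a standard error, and form a Wald interval. This is a minimal Python sketch under stated assumptions: the record layout, the `y` indicator and the standard population are hypothetical, the full-sample weight is assumed to be named WEIGHT, and the variance formula (a random-groups formula, the sum of squared deviations of the replicate estimates from their mean divided by R(R − 1)) is an assumption. The steps in Chapter 3, section C.2 of the 2006 user guide are authoritative and may use a different formula.

```python
import math
from collections import defaultdict

REPLICATE_KEYS = tuple(f"WT{i}" for i in range(1, 9))  # WT1 ... WT8


def age_standardized(records, weight_key, std_pop):
    """Directly age-standardized weighted proportion.

    records: dicts with 'age_group', 'y' (1 if in the group of interest, else 0)
             and one numeric column per weight variable.
    std_pop: {age_group: standard population count}.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for r in records:
        w = r[weight_key]
        den[r["age_group"]] += w
        num[r["age_group"]] += w * r["y"]
    total_std = sum(std_pop.values())
    estimate = 0.0
    for group, std_count in std_pop.items():
        if den[group] == 0:
            raise ValueError(f"No weighted records in age group {group!r} for {weight_key}")
        # Age-specific weighted proportion times the standard population share.
        estimate += (std_count / total_std) * (num[group] / den[group])
    return estimate


def wald_ci(records, std_pop, full_key="WEIGHT", rep_keys=REPLICATE_KEYS, z=1.96):
    """Point estimate, standard error and Wald interval (assumed variance formula)."""
    estimate = age_standardized(records, full_key, std_pop)
    reps = [age_standardized(records, k, std_pop) for k in rep_keys]
    r = len(reps)
    mean_rep = sum(reps) / r
    # Assumed random-groups variance; check Chapter 3, section C.2 for the official one.
    variance = sum((t - mean_rep) ** 2 for t in reps) / (r * (r - 1))
    se = math.sqrt(variance)
    return estimate, se, (estimate - z * se, estimate + z * se)
```

As the subject-matter answer notes, the normal approximation behind this interval is weakest for very small proportions in small domains, so the interval bounds deserve extra scrutiny there.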

Wednesday, February 20, 2019

PCCF+ ID Variable

I’m trying to set up a PCCF+ file for someone and wanted to check…

My input file is a dataset with two columns: ID and Postal Code.  My question is about the ID variable: are these randomly generated or is the ID pulled from an actual place?

The ID can come from the researcher’s dataset, or it can simply be 1, 2, 3, … that you assign in order to run the dataset. When I am helping researchers with the PCCF+, I suggest that they use the ID field from the dataset the postal codes come from. That way, when they get the output file, it is easier to match up to their dataset.

The rule for the ID in the PCCF+ is that it must be unique and up to 15 characters long. I have used IDs containing letters and it has worked.
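A quick pre-flight check along these lines can catch ID problems before running the PCCF+. This is a minimal Python sketch that assumes the input arrives as (ID, postal code) pairs; it enforces only the two rules stated above (unique, at most 15 characters). Writing the result as a two-column CSV with a header is an assumption, not the PCCF+ input specification, so check the PCCF+ documentation or the guide for the exact layout your version expects. The function name is hypothetical.

```python
import csv

MAX_ID_LEN = 15  # from the rule above: IDs must be unique and up to 15 characters


def write_pccf_input(pairs, out):
    """Validate (ID, postal code) pairs and write them as two CSV columns.

    pairs: iterable of (record_id, postal_code).
    out:   a writable text file object.
    Returns the number of records written.
    """
    seen = set()
    writer = csv.writer(out)
    writer.writerow(["ID", "PostalCode"])  # assumed header, not a PCCF+ requirement
    for record_id, postal_code in pairs:
        record_id = str(record_id).strip()  # letters are fine; numbers become text
        if not 1 <= len(record_id) <= MAX_ID_LEN:
            raise ValueError(f"ID must be 1-{MAX_ID_LEN} characters: {record_id!r}")
        if record_id in seen:
            raise ValueError(f"Duplicate ID: {record_id!r}")
        seen.add(record_id)
        writer.writerow([record_id, postal_code])
    return len(seen)
```

Passing the researcher’s own (ID, postal code) pairs keeps the output easy to merge back; if the source data has no ID field, `enumerate(postal_codes, start=1)` produces the 1, 2, 3, … sequence instead.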

I have attached a PCCF+ Guide that Jeff Moon wrote that is very helpful in running the PCCF+. The example uses version 6B1, but the process is the same regardless of the PCCF+ version.

** The original email had the PCCF+ Guide attached.

Census Agriculture Region (CAR) Boundary Files

I’m trying to locate ag region boundary files prior to 2001. I’ve looked through the various subdirectories on the EFT server but cannot find what I’m looking for.

Looks like we have 1996 CAR boundaries in the GeoPortal$DLI_1996_Census_CBF_Eng_Nat_car

There is also something called the Agricultural Ecumene layers from earlier years (1986 and 1991).

If you load the layers to the map and go to the ‘Download’ tab and choose ‘Download all’ we’ve packaged the original data from StatCan in there (should be in E00 format). For 1996 this file was named ‘car96cangeo.e00’.

Tuesday, February 19, 2019

Annual Labour Force Survey (LFS) Files

Some students are looking for annual LFS files instead of monthly. Are those available somewhere? Are the old annual Equinox files archived and available?

We only have the monthly LFS PUMFs available at this point in time, but we have been exploring options for annual files as well. I can’t speak to the Equinox files, however.

Thursday, February 14, 2019

Ethnicity and Census

I have received what looks like a deceptively simple request. One of our researchers is looking for ethnicity and/or country of origin by CT (although he would take DA, obviously) for the 2016 census. I can’t seem to find anything on the Statcan website, and the FTP site doesn’t seem to have anything like the cumulative profiles at the DA level as from previous years. 

Answer from DLI List:
Have you tried the Canadian Census Analyzer from CHASS? It has ethnic origin and place of birth for both CT and DA. You do have to be at a subscribing institution to access it, but the website says UBC is one.

Census subject matter has responded with the following:

"We only have a limited number of standard products available at these smaller geographies. This would need to be done as a custom request."

Wednesday, February 13, 2019

User Guide for 2016 PUMF

Has a User Guide for the 2016  Census PUMF been released?

The user guide can be found on the EFT: /MAD_PUMF_FMGD_DAM/Root/3901_Census-Pop_Recens-Pop/2016/Individuals-Particuliers/English/Documentation and user guide

Households and the Environment Survey (HES) 2017

I have a request from a researcher who would like access to the most recent HES microdata from 2017.

According to the website, the main release date for the survey data was January 28, 2019, but a search of the DLI collection and the Statistics Canada website Data catalogue only finds the 2015 survey data.

(I was only able to find the Households and the Environment: Radon awareness and testing 2017 from the STATCAN website)

Could you tell me whether the main release for 2017 is available and, if so, where I can direct the researcher? Also, will there be a PUMF for this survey (and if so, when will it be available)?

Here is the response from subject matter:

“The release has been delayed and will likely take place by the middle of March.”

The PUMF has also been delayed and there is unfortunately no foreseeable date for its release.

Discrimination at school based on language?

A researcher is looking for data on discrimination in education (school: elementary, secondary or postsecondary) based on language.  We note that various GSS surveys and the Public Service Employee Survey have questions on discrimination based on language linked to work (and even types of discrimination at work in the case of the GSS cycle 30) but not linked to school.  Is there anything that we are missing?

We’ve received the following response from subject matter:

“It appears that at this time we do not have information of this sort. However, Cycle 34 of the GSS will; it starts collection in April, with a scheduled release date of November 2020.”

Monday, February 11, 2019

2016 Census Individuals PUMF Question: Weights Missing

A researcher here has reported that the 2016 individuals PUMF has no weights for the Stata and ASCII versions. I just checked the SPSS version and that’s the case there, too. I can’t open the SAS file, but I’m not sure if that’s an error with the file or with my version of SAS.

In any event, here is the output from SPSS (it looks better in a monospaced typeface):

Individuals weighting factor
| N | Valid   | 0      |
|   | Missing | 930421 |

Individuals weighting factor
|         |        | Frequency | Percent |
| Missing | System | 930421    | 100.0   |

Can anybody offer any insight into this? Would someone mind checking the SAS version?

I’m very sorry for all of the confusion regarding the Census PUMF! I think the issue was simply in my SPSS settings on my computer during staging… It looks as if I’ve been able to correct the weight issue (but please let me know if you notice anything that’s “off”).

I’ve also added the SAS, SPSS, and Stata “Command Files” to the RAW data folder for your use as well. I apologize again for the wait with this – it’s the first time we’ve seen these types of files and wanted to confirm that we could distribute them first!
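Once a corrected file is downloaded, a quick tally of the weight variable confirms the fix before anyone runs weighted tables. This is a minimal Python sketch, assuming the file has been exported to CSV with a header row and that the weight column is named WEIGHT (the name given for the 2016 PUMF’s individual weight in the February 21 post above); blank cells, SAS-style "." and NaN are all counted as missing. The function name is hypothetical.

```python
import csv
import math


def weight_summary(csv_file, weight_col="WEIGHT"):
    """Count valid, missing and zero values in a weight column of a CSV export."""
    counts = {"valid": 0, "missing": 0, "zero": 0}
    for row in csv.DictReader(csv_file):
        raw = (row.get(weight_col) or "").strip()
        try:
            value = float(raw)
        except ValueError:  # blank cells, SAS-style "." and other non-numeric codes
            counts["missing"] += 1
            continue
        if math.isnan(value):
            counts["missing"] += 1
            continue
        counts["valid"] += 1
        if value == 0:
            counts["zero"] += 1
    return counts
```

On the file as originally reported, this check would show no valid values and 930,421 missing ones, mirroring the SPSS output; after the fix, `missing` should drop to 0.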

Hospital-Level Employment Data

 I’ve got a researcher interested in turnover/retention, absenteeism, and injury rates for nurses at Ontario hospitals. I know CIHI has some related data at the province level, but does anyone know of any source more granular? We’ll be looking at the LHIN level too (for as long as Ontario still has them…)

We’ve been able to track down the following on our end:

“Please note that we have very limited information on Ontario nurse turnover/retention, absenteeism, and injury rates, however, we have found some information on injuries from the Ontario government Web site that could be of interest for you:

Also, we have found some information on turnover/retention, absenteeism from the College of Nurses of Ontario:

To obtain additional information, we suggest contacting both of these organisations.”

Canadian Business Patterns (CBP) – December 2018

We are pleased to inform you that the following product is now available.

Canadian Business Patterns (CBP) – December 2018

EFT: /MAD_DLI_IDD_DAM/Root/other_autres/1105_CBP_SIC/2018/cbp2018-december

Friday, February 8, 2019

Labour Force Survey (LFS) – January 2019

We are pleased to inform you that the following product is now available.

Labour Force Survey (LFS) – January 2019

This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). The LFS collects monthly information on the labour market activities of Canada's working age population. This product is for users who prefer to do their own analysis by focusing on specific subgroups in the population or by cross-classifying variables that are not in our catalogued products. The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.

EFT: /MAD_PUMF_FMGD_DAM/Root/3701_LFS_EPA/1976-2019/data/

Nesstar Webview: Labour Force Survey (LFS), January 2019

Thursday, February 7, 2019

Bootstrap Weights for CCHS 2014 Annual Component

A student here is looking for the bootstrap weights file (b5.txt) that  is described in the documentation for CCHS 2014 Annual. It does not appear on or on the EFT site; indeed, I cannot find any trace of CCHS 2014 on the EFT site. Any ideas as to where we might find this file?

 The 2014 annual file can be found here: /MAD_PUMF_FMGD_DAM/Root/3226_CCHS-Ann_ESCC-Ann/2013-14/2014

I myself do not see any bootstrap datasets so I will need to go back to subject matter with this one.

Subject matter has responded with the following:

"Please note that the 2014 CCHS PUMF did not contain bootstrap weights data. It was not distributed because it did not exist at the time; we only started introducing the bootstrap weights with the 2015 PUMF."

Wednesday, February 6, 2019

Census of Population 2016 PUMF

We are pleased to inform you that the following products are now available.

Census of Population 2016 PUMF – Individuals

EFT: /MAD_PUMF_FMGD_DAM/Root/3901_Census-Pop_Recens-Pop/2016

Phone App Download Data

I was wondering if anyone had any leads on where to look for this data request:

“I am looking for data on the number of downloads of dating applications (phone apps) in Canada from 2012-present”

I’ve looked around but haven’t found any data that is publicly available.  Maybe it’s just been one of those days…

Secondly, Statista seems to have a lot of aggregated data from reports and I was curious if anyone here has a subscription to it and how they find the site?

We’ve received the following response from subject matter:

“The Canadian Internet Use Survey (CIUS) 2018 contains questions related to online activities such as using the Internet for a dating website or app.

The survey is currently in collection and the results will be released in fall 2019.

Another source to check would be the Digital Economy Survey, July 2017 to June 2018; they might have some information related to dating apps.”

Questions about the Public Service Employee Survey

I have a researcher who is asking about the “Public Service Employee Survey”. The last available PUMF is from 2008. They are wondering whether any more PUMFs are available or planned. I see that the survey was redesigned in 2008. I suspect the answer is no more PUMFs, but it never hurts to ask.

Also, the PUMF lists responses by “Respondent ID”. The researcher would like to know whether the “Respondent ID” remains consistent over time, i.e., does the same “Respondent ID” refer to the same respondent in different PUMFs?

Here is the response from subject matter:

“TBS (the sponsoring client for PSES) did not request PUMFs for any cycle after 2008, so unfortunately these are not available.  The master file is available in the RDC, but there are restrictions on who is able to access it (to avoid employers seeing responses from, or being able to identify, their employees). 

For PSES, respondents are not tracked over multiple cycles.  For that reason, there is no identifying variable that corresponds to the same respondent from one cycle to the next.”

Tuesday, February 5, 2019

Population of Businesses with Employees

On the page describing Canadian Business Patterns, it states the program began in 1980 and that “The last release of the Population of businesses with employees data was on March 8, 2007.”

(was Population of businesses with employees an official product since Population is capitalised?)

Via FTP and WDS, stats/data can (only) be obtained back to 1988.

How does one obtain stats/data back to 1980?

I’ve received the following response from subject matter:

“We would offer our business counts by Canada/Province, NAICS and Employment Size Range free of charge in the Cube environment available here:

We also do offer custom business counts for a fee. The earliest available year is 1898 although NAICS was introduced in 2004, location counts began in 2007 and businesses without employees was first included in our counts in 2004.”

Follow-up Question:
Some clarification is needed. On the page describing Canadian Business Patterns, it states the program began in 1980.

The response from subject matter stated "The earliest available year is 1898." That must be a typo. The link subject matter provided leads to the "Cube environment" with 11,978 results(!) which, to say the least, is not helpful.

Via DLI ftp & wds one can get back to 1988.

Is a cu$tom request needed for 1980 (or earlier?) to 1988?

Follow up Answer:
Subject matter has confirmed that, yes, this was a typo. Sorry about that! 1989 is the earliest available year. The Cube environment link seems to be set for internal use and unfortunately resets for external clients. If you type the words “Canadian business counts” in the search field, all of the tables will appear, going back to December 2015.

In regards to the custom tables – 1980 is simply the year that the Division itself had started. Unfortunately there is no data available earlier than 1989.

Monday, February 4, 2019

New files on Statistics Canada Nesstar

We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.

General Social Survey - Canadians at Work and Home (GSS) 2016 - Cycle 30 PUMF
Survey of Financial Security (SFS) 2016 PUMF
Employment Insurance Coverage Survey (EICS) 2017 PUMF

Friday, February 1, 2019

Population of Alberta Files

I need to find the population of a number of municipalities in Alberta by dissemination area.  We are trying to create a map with population of part of Alberta.  The map is a river basin.  We have the shape files for the river basin, but I need the dissemination area data so that the files can be joined. 

Are these available on the DLI site?  I saw some files, but wasn't sure exactly which ones to download.

I’ve consulted with subject matter on this one and they gave me the following information:

“The client can use GeoSuite for this:

1.      First, search for the province of Alberta. There will be a dropdown with the results, select it, and press the magnifying glass.

2.      Next, click on DA in the hierarchy chart (highlighted in yellow). A list of all of the DAs in Alberta will be listed below.

3.      You can now export this list and select only the columns needed for your analysis (i.e. DAuid and population). You can add or remove these columns with the Show/Hide columns button.

4.      You can now export your data into a .csv file. Note: if you selected specific columns, choose export selection. If you require all of the data that was first loaded, choose export all.”
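If the full table was exported instead (export all), the same trimming can be done afterwards, before joining the CSV to the river basin shapefile’s attribute table on the DA identifier. This is a minimal Python sketch; the column names `DAuid` and `Population` are assumptions about the export’s headers, so adjust them to match the actual file. Values are copied as text, so the DA identifiers are not altered by any numeric conversion. The function name is hypothetical.

```python
import csv


def select_columns(in_file, out_file, columns):
    """Copy only the named columns from a CSV export to a new CSV.

    Values are copied as text, so DA identifiers are not reformatted.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(in_file)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"Columns not found in export: {missing}")
    # extrasaction="ignore" drops every column not listed in `columns`.
    writer = csv.DictWriter(out_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows
```

When opening real files for this, pass `newline=""` to `open()` for both the input and output, as the `csv` module expects.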