Tuesday, December 30, 2014

Updated Products: TSRC

Please note the updated products listed below and the path to access them via the EFT site.

Travel Survey of Residents of Canada (TSRC)

The Travel Survey of Residents of Canada (TSRC) is a major source of data used to measure the size and status of Canada's tourism industry. It was developed to measure the volume, the characteristics and the economic impact of domestic travel. 

Since the beginning of 2005, this survey has replaced the Canadian Travel Survey (CTS).

EFT:

/MAD_DLI/Root/other-products/Travel Survey of Residents of Canada - tsrc/2013
/MAD_PUMF/Root/PUMF/Travel Survey of Residents of Canada - tsrc/2013

Monday, December 29, 2014

Aboriginal Educators

Question

I have a student looking to find out the number of aboriginal education graduates who were able to get teaching positions in the last 5 years. Is there data on this? If so, where can I find it?

Answer

This type of information is not available, given the way the data are collected here. We can’t link Aboriginal graduates to whether they were later able to secure a teaching position.

We would be able to tell the highest level of education of an Aboriginal person, their occupation, etc. at the time the survey was collected (the most recent one being APS 2012), but not the relationship the researcher is looking for over the last 5 years.

If they can be more specific, we may be able to do a custom tabulation (at a cost and with some delay).

Updated Products: Boundary Files for AgCensus 2011

Please note the updated products listed below and the path to access them via the EFT site.

Census Agricultural Regions Boundary Files of the 2011 Census of Agriculture

These boundary files for Canada contain the boundaries of all 82 census agricultural regions delineated for the 2011 Census of Agriculture.
They were created to support the spatial analysis and thematic mapping of data from the 2011 Census of Agriculture.
Geographic Information System software is required for creating the thematic maps. Software is not provided with the product.
These boundary files are positionally consistent with Geography Division's Road Network Files and Skeletal Road Network Files,
which can provide additional geographic context for mapping applications.
More info: <http://www.statcan.gc.ca/pub/92-637-x/92-637-x2011001-eng.htm>

EFT:

/MAD_DLI/Root/geo/2011/CensusAgriculture_Regions_boundary_files/English

Tuesday, December 23, 2014

Updated Products - ICO Q3 2014

Please note the products listed below and the path to access them via the DLI EFT.

Inter-corporate ownership (ICO) – Q3 2014

This product is a directory of corporate ownership in Canada that provides information on every individual corporation that is part of a group of commonly controlled corporations with combined assets exceeding $600 million or combined revenue exceeding $200 million. Individual corporations with debt obligations or equity owing to non-residents exceeding a net book value of $1 million are covered as well.

Ultimate corporate control is determined through a careful study of holdings by corporations, the effects of options, insider holdings, convertible shares and interlocking directorships.

The information presented is based on non-confidential returns filed by Canadian corporations under the Corporations Returns Act and on research using public sources such as Internet sites. Entries for each corporation provide both the country of control and the country of residence.

/MAD_DLI/Root/other-products/Inter-corporate ownership – ico

Announcement in The Daily — Inter-corporate ownership, third quarter 2014 <http://www.statcan.gc.ca/daily-quotidien/141223/dq141223h-eng.htm>

Monday, December 22, 2014

Survey of Neurological Conditions in Canada

Question

A faculty member would like to access the Survey of Neurological Conditions in Institutions in Canada. As far as I can tell, it is not accessible via the DLI. How can she go about getting access to it?

Answer

The Health Division explained that the only way for anyone to access this survey is through a custom request. I am including what I was able to find on our website about this survey:

<http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=5187>

<http://www5.statcan.gc.ca/cansim/a05?searchTypeByValue=1&lang=eng&id=1051305&pattern=1051305>

Let me know if you wish to request a custom tab. I will give you the contact name for cost estimates, turnaround time, etc.

User Guide for NHS Hierarchical File

Question

In the notes about the variable Religion, it makes reference to a variable called ReligDer – I presume that this variable is only available on the Master file?


Answer

Subject matter explained that: RELIG is the name of the original variable in the master file.
RELIGDER is the name of a derived (aggregated) version of RELIG, and is in the master file. We use that RELIGDER version and aggregate it even more to create this RELIGION variable in the HPUMF.

So yes, the first sentence of the description is a bit misleading... it would be more correct to say:

“RELIGION is the aggregated version of RELIGDER which is the aggregated version of the variable RELIG (detailed responses).”

Saturday, December 20, 2014

Seven Generations

Question

It looks like our CD-ROM of this product has disappeared. It is so old now that I don't know if the software would still work. Can anyone tell me if it is available online or anywhere?

<http://www.crr.ca/en/component/flexicontent/items/item/19716-for-seven-generations-an-information-legacy-of-the-royal-commission-on-aboriginal-peoples-cd-rom>

Answer

We’ve looked at extracting content from the “For Seven Generations” CD before, but had little success. Much of the content is buried in proprietary software called “Folio Views”. Not at all conducive to exporting reports, documents, etc.

We did manage to find a copy of the 5 volumes of the final report. We found it initially at http://caid.ca/RepRoyCommAborigPple.html and then combined the chapters into the original 5 volumes. The results of this are found in QSpace, our digital repository, at the link shown below. So, while all of the things listed below appear on the “For Seven Generations” CD, as far as I know only the first item is available on the web.

- Final Report <http://qspace.library.queensu.ca/handle/1974/6874>
- People to People, Nation to Nation
- Hundreds of Research Reports
- Guide for Developing Native Awareness Curricula for High School and Adult Levels
- Thousands of Pages of Testimony from over 2000 People

I managed to google up one article on converting Folio Views to HTML here: <http://www.law.cornell.edu/papers/lii/fffhtml.htm>

Though it looks like you need a copy of Folio Views in any case. For a one-shot deal, it might not be too onerous to set up a virtual machine running whatever version of Windows you might need to install whatever version of FV you can get your hands on. Then, you could try this method, or some adaptation thereof.

Another option if you can get a working FV installation would be to see if you can print from the application, and if so, install a PDF printer driver and export it all that way. This will likely be time-consuming and result in a huge loss of fidelity and embedded metadata, but might produce acceptable access copies at least.

The Aboriginal Peoples Commission disk is just one example. There are a great many Stats Canada and other Canadian government publications from the 1990s in Folio Views and other extinct or soon-to-be-extinct file formats which are in need of saving.

Eventually we will need a list of the products which need saving, and the permissions to convert them, then we can figure out a methodology. I would be interested to hear if there are folks in Stats Can, elsewhere in government, or at other Universities, who are thinking or have thought about this problem.

I would assume that the author of the For Seven Generations would hold the copyright to the product? Third party materials could be subject to copyrights held by other organizations. Where information has been produced or copyright is not held by Government of Canada, the materials are protected under the Copyright Act, and international agreements. 

Friday, December 19, 2014

Survey Methodology, December 2014


Published today

Today, Statistics Canada releases on its website Volume 40, Number 2 (December 2014) <http://www.statcan.gc.ca/pub/12-001-x/12-001-x2014002-eng.htm?cmp=cwe-cae> of its scientific journal Survey Methodology. This highly recognized, peer reviewed journal allows researchers, statisticians, mathematicians and methodologists from around the world to share research in the field of survey techniques and their practical applications.

Waksberg Invited Paper

This December 2014 issue opens with the fourteenth paper to receive the Waksberg Award, in honour of Joseph Waksberg’s contributions to the theory and practice of survey methodology. In her paper entitled From multiple modes for surveys to multiple data sources for estimates, the author, Constance F. Citro, suggests ways to inculcate a culture of official statistics that focuses on the end result of relevant, timely, accurate and cost-effective statistics and treats surveys, along with other data sources, as means to that end.

The December issue contains eight other papers and two short notes:

Papers:

- Brady T. West and Michael R. Elliott - Frequentist and Bayesian approaches for comparing interviewer variance components in two groups of survey interviewers

- Jianqiang C. Wang, Jean D. Opsomer and Haonan Wang - Bagging non-differentiable estimators in complex surveys

- Jae Kwang Kim and Shu Yang - Fractional hot deck imputation for robust inference under item nonresponse in survey sampling

- David G. Steel and Robert Graham Clark - Potential gains from using unit level cost information in a model-assisted framework

- Sun Woong Kim, Steven G. Heeringa and Peter W. Solenberger - Optimal solutions in controlled selection problems with two-way stratification

- Paul Knottnerus - On aligned composite estimates from overlapping samples for growth rates and totals

- Andrés Gutiérrez, Leonardo Trujillo and Pedro Luis do Nascimento Silva - The estimation of gross flows in complex surveys with random nonresponse

- Yan Lu - Chi-squared tests in dual frame surveys

Short Notes:

- Guillaume Chauvet and Guylène Tandeau de Marsac - Estimation methods on multiple sampling frames in two-stage sampling designs

- Qi Dong, Michael R. Elliott and Trivellore E. Raghunathan - Combining information from multiple complex surveys

Published since 1975, Survey Methodology has been a dependable reference point entirely dedicated to the latest advances in the field of survey techniques and methodology used around the world. To increase access to this scientific research and for environmental reasons, Statistics Canada publishes the journal free of charge on its website. Historical issues from volume 25 are also available online.

Subscribing is easy and free


Would you like to receive an automatic alert by email when new issues of Survey Methodology are available on Statistics Canada’s website? Go to <www.statcan.gc.ca/eng/mystatcan>

Login or Register.
Once inside the My StatCan portal, select ‘Email notifications.’
Under the ‘Publications’ tab, choose the subject ‘Statistical Methods,’ then add the item ‘Survey Methodology (12-001-X)’.

For more information, visit <www.statcan.gc.ca/surveymethodology>

Don’t hesitate to share this with anyone who might be interested in Survey Methodology.

Historic Oil Prices

Question

We have a researcher who is looking for historic crude oil prices from 1900 (or earlier) to 1940. Where would we be able to find this?

Answer


Global Financial Database has data for the price of West Texas Intermediate Oil Price (US$/Barrel) from 1859-present:

- Monthly From Sep 1859 To Dec 1874
- Daily From Jan 1875 To Dec 1895
- Monthly From Jan 1896 To Dec 1899
- Daily From Jan 1900 To Dec 1919
- Monthly From Jan 1920 To Dec 1976
- Daily From Jan 1977 To Dec 2014

CTADS 2013 PUMF Release

Question

Are we still expecting the PUMF for the Canadian Tobacco, Alcohol and Drugs Survey (CTADS) 2013 to be released/available this month? <http://www.statcan.gc.ca/eng/dli/prod_date>

Answer

The CTADS PUMF file is set to be released on February 3, 2015.

Thursday, December 18, 2014

Retail Sales by NAICS Code

Question

A researcher is looking for “retail sales by naics code for different employee size categories. Would it be possible to acquire these data for NAICS 444 for each of the CMAs in Ontario and for Ontario as a whole?”

A subsequent message indicated that he thinks this is collected via the Business Register, but I don’t see anything in the documentation on the STC website indicating that retail sales is a variable.

CANSIM supplies us with Table 080-0026 (data from Annual Retail Trade Survey) but only at Canada level and no employee size categories.

But maybe there’s more behind what I can see or a way of joining those two data files together! Let me know what you know or find.

Answer

Would the following table meet your researcher’s needs? I confirmed with Retail and Service Industries Division that they would not have this type of information available.

CANSIM table 551-0006.





New ICS Study

New facts on pension coverage in Canada

by Marie Drolet and René Morissette, Social Analysis and Modelling Division (SAMD)
Insights on Canadian Society

This study examines the characteristics of Canadian workers aged 25 to 54 who are covered by defined benefit registered pension plans (RPPs) as well as those covered by defined contribution RPPs or hybrid plans. It does so by using new data from the Longitudinal and International Study of Adults (LISA), first conducted in 2012.
<http://www.statcan.gc.ca/daily-quotidien/141218/dq141218b-eng.htm>

Standardized Neighborhood Household Income Quintiles (QAIPPE)

Question

I have a researcher who has indicated he needs standardized neighbourhood household income quintiles (QAIPPE) from the PCCF+ for 1981-2006. He writes,

To extract the historical income quintiles, we need the following files:

*FILENAME QAIPPE06 'H:\SESREF\IPEIMM06-R.CAN'; /* INCLUDED IN PCCF+ V5x */
FILENAME QAIPPE01 'H:\SESREF\C01IPEQDi.CAN';
FILENAME QAIPPE96 'H:\SESREF\C96IPEQD.REDONE';
FILENAME QAIPPE91 'H:\SESREF\C91IPEQD_SEN.CAN';
FILENAME QAIPPE86 'H:\SESREF\C86IPEQD_NOSEN.CAN';
FILENAME QAIPPE81 'H:\SESREF\C81IPEQD';

Do you have these files or older versions of pccf?
qaippe2006 in pccf5C
qaippe2001 in pccf4A
qaippe1996 in pccf3E

I know "nothing" about PCCF+ except where to retrieve it. I'm happy to download the older files if necessary, but I wonder if there's something to be done with the current file. Does anyone know?

Answer

If possible, could you save these files and copy them into PCCF5K and PCCF+ 6A1 within the DLI?

I uploaded the added files to

pccf6a1-fccp6a1-ver2.zip
pccf5k-fccp5k-canada-ver4.zip

in the following directory:

/MAD_DLI_PCCF/Root/Health-PCCF-plus-Sante-FCCP-plus

Wednesday, December 17, 2014

DLI Access

Question

I don't use my DLI links often, but up until now I have accessed things through the website <http://www.statcan.gc.ca/dli-ild/products-eng.htm> and/or through ftp <stcftp.statcan.ca>.

Yet now I cannot access either. When I go to the website and click on the "DLI" link, it just goes back to the same product page. And if I try to use an ftp client with my old password, it also doesn't work: I get a "connection time out, could not connect to server" error. Has the way we access DLI changed? Or have the passwords changed?

Answer

I apologize for the inconvenience with the website; we are having an issue with a redirect. It should be corrected soon, and the link will then point to the appropriate application to access the data.

The DLI Collection is hosted on the EFT. For more information, see the DLI Survival Guide: Accessing and Citing DLI Data <http://www.statcan.gc.ca/eng/dli/guide/toc/3000273>

This transition happened in 2013. User account information would have been provided to you. I will follow up off list to confirm your access credentials.

Tuesday, December 16, 2014

Corn Prices in Canada

Question

Table 001-0010 on Cansim gives the average price per tonne of corn up to 1984.

What happened to the price after this date? Is there a good source for Canadian corn prices from 1985-to date?

Answer

The Agriculture Division confirmed that we, Farm cash receipts, have corn prices starting in 1986. There are also monthly corn prices back to 1985 available through another DLI member.

The Farm cash receipts annual average price is based on corn marketed by producers within a year and does not include grain corn sold for feed from farm to farm within the same province or feed fed on the farm.

Our price may not be exactly comparable to the prices in 001-0010.

I believe the prices in 001-0010 are on an annual crop year base rather than an annual calendar year, and we think it is an average of all the crop production (including fed on farm).

The latter difference (including all feed corn) would not likely make much of a difference for corn but unfortunately we cannot compare a common year to see for sure.

We could create a file with average annual corn prices from 1986 to date in about an hour.

DLI Links

Question

It was pointed out to me that all the DLI links at <http://www.statcan.gc.ca/dli-ild/products-eng.htm> don't seem to be working. Only the Nesstar links are working.

Answer

Yes, we are having issues with a redirect that was put in place. A new DLI product page is being created with Web Operations. I hope to have the links fixed soon. In the interim, the information is available on the DLI EFT site.

Response Rates in the 2011 National Household Survey and the 2006 Census

Question

I see the global non-response rate (GNR) from the NHS home page for Canada <http://www12.statcan.gc.ca/nhs-enm/2011/dp-pd/prof/details/page.cfm?Lang=E&Geo1=PR&Code1=01&Data=Count&SearchText=canada&SearchType=Begins&SearchPR=01&A1=All&B1=All&Custom=&TABID=1> is 26.1%. However, I also see frequent reference to a response rate of 68% (e.g., http://www.chamber.ca/download.aspx?t=0&pid=f9d85161-2e65-e411-a071-000c29c04ade, page 67).

Why and how do these two numbers differ? What are they measuring? What were the comparable numbers for the 2006 long-form census?


Answer

There are 2 different rates being referenced by your client. These are the survey response rates and the Global non-response rates (GNRs) of the questionnaire. Here are some explanations that should clear up the confusion between the 2 different concepts and some information about the 2006 2A and 2B long form rates:

From the 2011 NHS Guide <http://www12.statcan.gc.ca/nhs-enm/2011/ref/nhs-enm_guide/guide_2-eng.cfm>:

Survey response rate

The response rate, which is the ratio of the number of questionnaires completed to the total number of occupied private dwellings in the sample, is 68.6% for Canada, all collection methods combined. This is similar to the response rate for other voluntary surveys conducted by Statistics Canada.

Since the NHS sampling design includes a subsample for non-response follow-up, a weighted response rate that takes this subsample into account is needed to get a better idea of the quality of the NHS data. In the calculation of the weighted response rate, the households in the subsample that responded to the NHS represent not only themselves but also the non-respondent households that are not in the subsample.

Note: The response rates are based on the NHS's final sampling weights. The initial sampling weight of the dwellings that responded to the NHS before a specific date during the collection period is equal to the sampling fraction in their area. The dwellings that were in the non-response follow-up subsample and responded were assigned a larger weight to compensate for non-response. The weighted response rates are calculated as follows: the weighted number of sampled private dwellings that returned a questionnaire divided by the weighted number of sampled private dwellings classified as occupied.
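The difference between the unweighted and weighted response rates described above can be illustrated with a minimal Python sketch. The `Dwelling` class, the function name and all weights below are hypothetical values chosen for illustration; they are not NHS data, and the sketch does not reproduce how Statistics Canada actually assigns final sampling weights.

```python
from dataclasses import dataclass


@dataclass
class Dwelling:
    weight: float    # final sampling weight (hypothetical value)
    occupied: bool   # classified as an occupied private dwelling
    responded: bool  # returned a questionnaire


def response_rates(sample):
    """Return (unweighted, weighted) response rates as fractions."""
    occupied = [d for d in sample if d.occupied]
    if not occupied:
        raise ValueError("no occupied dwellings in the sample")
    responded = [d for d in occupied if d.responded]
    # Unweighted: completed questionnaires / occupied dwellings in the sample.
    unweighted = len(responded) / len(occupied)
    # Weighted: the same ratio, with each dwelling counted by its weight.
    weighted = sum(d.weight for d in responded) / sum(d.weight for d in occupied)
    return unweighted, weighted


sample = [
    Dwelling(3.0, True, True),
    Dwelling(3.0, True, True),
    Dwelling(9.0, True, True),    # follow-up subsample respondent, larger weight
    Dwelling(3.0, True, False),   # occupied, did not respond
    Dwelling(3.0, False, False),  # unoccupied, excluded from both rates
]
unweighted, weighted = response_rates(sample)
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

In this made-up sample the weighted rate is higher than the unweighted one because the follow-up respondent carries a larger weight.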

The overall response rate for the 2006 Census was 96.5%. The rate for the 2A, or short form, was 97.2 % and for the 2B, or long form, it was 93.7 %.

And this explains the Global Non-response Rate (GNR):

The global non-response rate was calculated in order to determine whether the data for a geographic area is of sufficient quality to be released as it is an important measure of the quality of NHS estimates. It combines household and item non-response, as such it reflects the risk of non-response bias. This measure was also used to decide when to disseminate counts for a given geographic area for the 2011 Census, just as it was used in the 2006 Census for the dissemination of short form counts and long form sample estimates.

In the specific case of the NHS, the global non-response rate is weighted to take account for the initial sampling and the sub-sampling prior to non-response follow-up. The global non-response rate was calculated as the ratio of two weighted estimates for a given geographic area. The numerator of the ratio is an estimate of the total number of questions for which no response were obtained over all households (i.e. respondents and non-respondents) in the given geographic area. The denominator of the ratio is an estimate of the total number of questions for which responses were expected over all households (i.e. respondents and non-respondents) in the given geographic area.

For the 2011 Census and the 2006 Census (short and long forms), the global non-response rate was un-weighted. The global non-response rate was calculated as the ratio of two counts for a given geographic area. The numerator of the ratio is a count of the total number of questions for which no response were obtained over all households (i.e. respondents and non-respondents) in the given geographic area. The denominator of the ratio is a count of the total number of questions for which responses were expected over all households (i.e. respondents and non-respondents) in the given geographic area.

For the 2006 Census, the GNR was not published as a rate. Instead, data quality indicators (commonly referred to as data quality flags) were attached to each standard geographic area disseminated. In the 2006 Census database environments, the data quality indicators consist of a five-digit numeric field. On the database and in electronic products browsed via Beyond 20/20, these flags are displayed as a five-digit numeric code (example: 0 2 1 3 1). The following link shows the breakdown of this 5-digit code: Data quality and confidentiality standards and guidelines (public): Data quality practices <http://www12.statcan.gc.ca/census-recensement/2006/ref/notes/DQ-QD/Appendix_B-Annexe_B-eng.cfm>. These areas are flagged on the database according to the non-response rate. Geographic areas with a non-response rate higher than or equal to 25% are suppressed from tabulations. Geographic areas with a global non-response rate higher than or equal to 5% and lower than 25% are broken into 2 categories and are flagged according to the following ranges: falling between 5% and 10% and falling between 10% and 25%. These geographic areas are identified in tabulations, but not suppressed.
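The 2006 thresholds described above can be written out as a small Python sketch. The function name and category labels are hypothetical, not the actual five-digit flag codes. Two assumptions are made because the text does not settle them: a rate of exactly 10% is placed in the upper range, and areas below 5% are treated as not flagged.

```python
def classify_2006_area(non_response_rate):
    """Classify a geographic area by its non-response rate (in percent).

    Labels are illustrative only; they are not the official flag codes.
    """
    if not 0 <= non_response_rate <= 100:
        raise ValueError("rate must be a percentage between 0 and 100")
    if non_response_rate >= 25:
        return "suppressed"
    if non_response_rate >= 10:  # assumption: exactly 10% falls in the upper range
        return "flagged: 10% to under 25%"
    if non_response_rate >= 5:
        return "flagged: 5% to under 10%"
    return "not flagged"  # assumption: below 5% carries no non-response flag


for rate in (3.0, 7.5, 10.0, 24.9, 25.0):
    print(rate, classify_2006_area(rate))
```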

NHS Global Non-Response Rate

Question

What exactly does the GNR measure? I read in the NHS User Guide that it “combines household and item non-response”, but am trying to come up with an explanation that I can give to users.

Is it a measure of the number of survey questions that were completed in an area divided by the number of survey questions that were sent out to the area?


Answer

The Census Division explained that: "The global non-response rate was calculated in order to determine whether the data for a geographic area is of sufficient quality to be released as it is an important measure of the quality of NHS estimates. It combines household and item non-response, as such it reflects the risk of non-response bias. This measure was also used to decide when to disseminate counts for a given geographic area for the 2011 Census, just as it was used in the 2006 Census for the dissemination of short form counts and long form sample estimates.

In the specific case of the NHS, the global non-response rate is weighted to take account for the initial sampling and the sub-sampling prior to non-response follow-up. The global non-response rate was calculated as the ratio of two weighted estimates for a given geographic area. The numerator of the ratio is an estimate of the total number of questions for which no response were obtained over all households (i.e. respondents and non-respondents) in the given geographic area. The denominator of the ratio is an estimate of the total number of questions for which responses were expected over all households (i.e. respondents and non-respondents) in the given geographic area."
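As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name, the tuple layout and all numbers are hypothetical; the actual NHS weights and question counts are not reproduced here. It shows that the GNR counts unanswered questions over expected questions, with a non-respondent household contributing all of its expected questions to the numerator.

```python
def global_non_response_rate(households):
    """Return the GNR in percent.

    households: iterable of (weight, questions_expected, questions_unanswered).
    """
    households = list(households)
    # Numerator: weighted number of questions with no response.
    numerator = sum(w * unanswered for w, _expected, unanswered in households)
    # Denominator: weighted number of questions for which a response was expected.
    denominator = sum(w * expected for w, expected, _unanswered in households)
    if denominator == 0:
        raise ValueError("no responses were expected")
    return 100 * numerator / denominator


households = [
    (10.0, 50, 0),   # respondent, every question answered
    (10.0, 50, 5),   # respondent with 5 unanswered items
    (20.0, 50, 50),  # non-respondent household: all expected questions missing
]
gnr = global_non_response_rate(households)
print(f"GNR: {gnr:.1f}%")
```

Setting every weight to 1.0 gives the corresponding unweighted ratio of counts.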

Monday, December 15, 2014

Education Indicators in Canada: An International Perspective 2014

Published today

Today, Statistics Canada releases the report Education Indicators in Canada: An International Perspective 2014 <http://www.statcan.gc.ca/pub/81-604-x/2014001/intro-eng.htm?cmp=cwe-cae>, a rich source of education statistics for Canada, and its provinces and territories. Jointly produced with the Council of Ministers of Education (Canada), it contains a set of 12 indicators presented in five chapters, which cover topics such as:

- educational attainment of the adult population,
- the connection between educational attainment and the labour market,
- investment in each student in public and private institutions at several levels of education,
- transitions from education to the working world,
- working time and teaching time of teachers in public institutions,
- and much more!

This report is a companion report to the Organisation for Economic Co-operation and Development (OECD), Education at a Glance, which presents complete data for all OECD member countries, including Canada.

NHS User Guide

Question

The NHS User Guide <http://www12.statcan.gc.ca/nhs-enm/2011/ref/nhs-enm_guide/99-001-x2011001-eng.pdf> in section 6.3 indicates for areas with 50%+ GNR that “The estimates for such areas have such a high level of error that they should not be released under most circumstances.” What are the circumstances under which they might be released? 

Answer

All geographic areas are available through a custom data tabulation from the nearest regional office provided that these areas are not suppressed due to data confidentiality rules. This is mentioned in the 2011 NHS Data Quality and Confidentiality Standards and Guidelines (Public) <http://www12.statcan.ca/nhs-enm/2011/ref/DQ-QD/index-eng.cfm> as well as the data confidentiality rules for the NHS:

Geographic areas with a global non-response rate higher than or equal to 50% are suppressed from standard data products but will be available as a custom request.

LFS Revisions

Labour Force Survey (LFS) - Upcoming revisions
Following the release of final population estimates from each census, a standard revision is applied to the Labour Force Survey (LFS) estimates. The revised estimates are scheduled to be released on CANSIM in early February 2015, and will include the following:
- LFS data will be adjusted to reflect the 2011 Census population estimates and will be revised back to 2001. LFS data are currently based on estimates from the 2006 Census.
- Geographic boundaries will be updated to the 2011 Standard Geographical Classification (SGC) from the current 2006 SGC. This change will slightly modify the boundaries of some Census Metropolitan Areas (CMAs) and Economic Regions (ERs).
- Three ERs will be combined for data quality reasons.

- New CANSIM tables will be created for all sub-provincial areas based on the 2011 census boundaries and the data series will be available for 2001 onward. A concordance table for the CANSIM vectors will be provided prior to release.
While the overall imputation strategy will not be changed, the revisions will include an update to the variables used to create the imputation groups to reflect both current response patterns and relationships between key variables. In early February 2015, these changes will be implemented historically, starting in January 2008.
Key labour market trends as well as rates of unemployment, employment and participation will be essentially unchanged as a result of these updates, and most changes to estimates will be minor.
Note that these revisions will not include updates to the classification structures for industry and occupation. These updates will take place in January 2016.
Sample redesign
Every ten years, the LFS undergoes a sample redesign to reflect changes in population and labour market characteristics, as well as new definitions of geographical boundaries. The redesigned sample will be introduced starting in January 2015 and will be fully implemented by June 2015.

Friday, December 12, 2014

Revised version of the English SPSS file for National Household Survey, 2011, Hierarchical PUMF

We now have a revised version of the SPSS file for the NHS Hierarchical PUMF.

Differences from the DLI version: 

It includes:
- missing values declarations, and
- value labels for top and bottom coded values and special values (e.g., 1 “positive values that would have rounded to 0”).

The variables have also been reordered into the order in which they appear in the codebook, rather than the alphabetical order in which they were originally presented in the SPSS file from DLI.

Be warned: 3 of the variables are renamed to conform to the old-time 8-character limit (for users of old software). Uncomment the “rename variables” statements if you would like to use the official Statcan variable names.

Thursday, December 11, 2014

Canadian Community Health Survey (CCHS) Mental Health 2012 PUMF

The Canadian Community Health Survey (CCHS) – Mental Health 2012 PUMF!

After the release of the CCHS - Mental Health data file, typographical revisions were made to the Derived Variable descriptions for Work Stress (WST).

The revisions are described on the EFT. Most changes only affect the variable descriptions and therefore the data are not impacted. However, the derived variable WSTDJST had an error in the specifications which also resulted in the variable being calculated incorrectly. Data users who are working with the WSTDJST variable are instructed to use the code provided in this Errata to correct the variable.

EFT: /MAD_DLI/Root/other-products/Canadian Community Health Survey-cchs/2012-mh-sm

We regret any inconvenience this may have caused you or your organization and thank you in advance for your understanding.

Wednesday, December 10, 2014

Updated Products: 2011 National Household Survey Public Use Microdata File

2011 National Household Survey Public Use Microdata File – Hierarchical File

This PUMF product provides access to non-aggregated data covering a sample of 1% of the Canadian households. It is a comprehensive social, demographic and economic database about Canada and its people, and contains a wealth of characteristics on the population. The file enables the study of individuals in relation to their census families, economic families and households. The geographic identifiers have been restricted to the provinces, the three territories grouped into a region called Northern Canada and selected metropolitan areas (Toronto, Montréal, Vancouver, Edmonton and Calgary) to ensure respondents’ anonymity.

This product contains the data file (in ASCII format); user documentation and supporting information; all licence agreements; and SAS, SPSS and Stata program source code to enable users to read the set of records. Note that users will require knowledge of data manipulation packages (or software) such as SAS, SPSS or Stata to use this product.

EFT: MAD_DLI/Root/NHS_ENM/2011/PUMF-FMGD/hierarch

Monday, December 8, 2014

Transfer Payments

Question

I am currently working on a project related to Canadian Government's expenditures. I am trying to find annual data, preferably from 1976 to 2013, about government actual expenses on social policies, such as child related transfers to families, or public income support payments during periods of maternity and parental leave.

I found these tables from CANSIM (there is no data available from 1981-1988 or after 2009 when the series was terminated):

- Federal government and government sector revenue and expenditure, annual (Dollars), 1961 to 1980 <http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=3840022&tabMode=dataTable&srchLan=-1&p1=-1&p2=9>

- Federal, provincial and territorial general government revenue and expenditures, for fiscal year ending March 31, annual (Dollars), 1989 to 2009 <http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=3850002&tabMode=dataTable&srchLan=-1&p1=-1&p2=9>

If you could advise anything further that would be great.

Answer

You may have noted in CANSIM table 385-0002 that the 385-XXXX series was compiled under the old Financial Management System framework. Statistics Canada is adopting the international standard, Government Finance Statistics. More information on this can be found here:

- Moving from the Financial Management System to Government Finance Statistics - http://www.statcan.gc.ca/pub/13-605-x/2010001/article/11155-eng.htm.

See the release notice for the GFS in The Daily - Canadian Government Finance Statistics, 2008 to 2012 (provisional):
<http://www.statcan.gc.ca/daily-quotidien/141119/dq141119a-eng.htm>

Note: The data sources, methods and concepts that underlie the CGFS-based data depart significantly from the Financial Management System (FMS)-based data previously published by Statistics Canada.

Available in CANSIM: tables 385-0033 to 385-0039 <http://www5.statcan.gc.ca/cansim/a03?lang=eng&pattern=385-0033..385-0039&p2=31>.
Definitions, data sources and methods: survey number 5218 <http://www.statcan.gc.ca/imdb-bmdi/5218-eng.htm>

The Add/Remove Data tab has more options with respect to the Statement of government operations and balance sheet. More detailed information, if available, would require a custom tabulation.

Friday, December 5, 2014

CSD geography changes (2011) corresponding to years 2002 - 2010

Question

I have a researcher looking for information on CSD geography changes (2011) corresponding to years 2002 - 2010.

1) What sources should she work with to trace Census-to-Census CSDuid correspondences?

So far, we have come across the following document as a reference source: "Interim List of Changes to Municipal Boundaries, Status, and Names (geography products: geographic reference products)" <http://www5.statcan.gc.ca/olc-cel/olc.action?ObjId=92F0009X&ObjType=2&lang=en&limit=1>

We note that this document lists changes by CSD names and not by CSDuid’s (in tables 1 and 2).
There seem to be two types of changes that would be relevant to her CSDuid matching work (re: 2011 CSDuid’s to the four CCHS cycles): 4 Dissolution and 5A CSD has annexed a complete CSD (could also include the annexation of another CSD part).

2) Is this correct, are these above-mentioned two changes the ones that would be relevant for CSD correspondence purposes? Are there others?

3) Could these tables be made available in a spreadsheet-friendly format with matching CSDuid’s, so that she can search by Remarks, e.g. Complete annexation or Now part of (dissolution), and then capture the relevant CSDuid’s? 

4) If not, should she create her own CSDname/CSDuid lookup tables from the PCCF’s? 

5) Would there happen to be an easier way to do this? Are there actual tables that would give the Census-to-Census CSDuid corresponding changes? 

Answer

1) That is correct, we don’t have an electronic Census to Census CSDuid correspondence file but we do track changes through this document. There are concordance tables showing changes to Census Subdivisions between Census years; however, they only show those CSDs which have changed and are not a complete listing of CSDs for each Census year.
The links to these tables are found on the Concordances between classifications page of the Definitions, data sources and methods website: <http://www.statcan.gc.ca/concepts/concordances-classifications-eng.htm>. Table 2 does contain the CSDuid. The link I provided above also includes the CSDuid.

2) Please refer to Table B within the desired Interim List; the bottom of the table outlines all CSD change codes. To best address this person’s inquiry about correspondence, are they asking only about geographic change? One could argue that name changes and changes to population counts also constitute a change.

3) Please refer the client to the link above and see if it meets her needs.

4) See above.

5) See above. 
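Because the concordance tables list only the CSDs that changed, a complete old-to-new lookup has to be filled in for the unchanged CSDs. The following is a minimal Python sketch of that step, not a Statistics Canada tool. The identifiers, field names (`old_csduid`, `new_csduid`, `remark`) and the `build_lookup` helper are all hypothetical, and it assumes each old CSD maps to exactly one new CSD (complete annexations or dissolutions), so partial annexations would need separate handling.

```python
def build_lookup(all_old_csduids, concordance):
    """Map every old CSDuid to its new CSDuid.

    The concordance lists only CSDs that changed, so any CSDuid that is
    missing from it is assumed to carry over unchanged.
    """
    changed = {row["old_csduid"]: row["new_csduid"] for row in concordance}
    return {uid: changed.get(uid, uid) for uid in all_old_csduids}


# Hypothetical identifiers and change records, for illustration only.
old_csduids = ["1001001", "1001002", "1001003"]
concordance = [
    {
        "old_csduid": "1001002",
        "new_csduid": "1001001",
        "remark": "Complete annexation",
    },
]

lookup = build_lookup(old_csduids, concordance)
for old_uid, new_uid in lookup.items():
    print(old_uid, "->", new_uid)
```

The complete list of old CSDuids would have to come from another source, such as a full CSD listing for the earlier Census year.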

Updated Products - Labour Force Survey (LFS) Nov 2014

Labour Force Survey (LFS) – November 2014

LFS data for November 2014 are now available on the EFT site.

The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.

The LFS guide: <http://www.statcan.gc.ca/pub/71-543-g/71-543-g2014001-eng.htm>

EFT: /MAD_DLI/Root/other-products/Labour Force Survey - lfs/1976-2013/data/micro2014-11.zip
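As a rough illustration of how a standard error can be used to assess sampling variability, the Python sketch below computes a coefficient of variation and an approximate 95% confidence interval. The estimate and standard error are made-up numbers, not LFS figures, and the interval assumes the estimate is approximately normally distributed. The Guide to the Labour Force Survey remains the authority on how LFS standard errors should be obtained and interpreted.

```python
def coefficient_of_variation(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    return standard_error / estimate


def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate interval: estimate plus or minus z standard errors."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical values, for illustration only.
estimate = 25000.0
standard_error = 1500.0

cv = coefficient_of_variation(estimate, standard_error)
low, high = confidence_interval(estimate, standard_error)
print(f"CV = {cv:.1%}, 95% CI = ({low:.0f}, {high:.0f})")
```

A smaller geographic area or a finer cross tabulation would typically show a larger standard error relative to the estimate, and therefore a larger coefficient of variation.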

Thursday, December 4, 2014

CCHS Bootstrap Weights

Question

A researcher is looking for the bootstrap weights for the 2003, 2005, and 2007 CCHS. Are these available? I did find the information in the Nesstar WebView for Cycle 2.1 (2003):

"Weighting - The principle behind estimation in a probability sample is that each person in the sample "represents", besides himself or herself, several other persons not in the sample. For example, in a simple random 2% sample of the population, each person in the sample represents 50 persons in the population. In the terminology used here, it can be said that each person has a weight of 50. The weighting phase is a step that calculates, for each person, his or her associated sampling weight. This weight must be used to derive meaningful estimates from the survey. For example, if the number of individuals who had a major depressive episode is to be estimated, the weights of survey respondents having that characteristic should be summed. In order for estimates produced from survey data to be representative of the covered population and not just the sample itself, a user must incorporate the survey weights into their calculations. In order to determine the quality of an estimate, the variance must be calculated. Because the CCHS uses a multi-stage survey design, there is no simple formula that can be used to calculate variance estimates. Therefore, an approximative method is needed. Coefficient of variation, standard deviation and confidence intervals can then be calculated from the variance. The bootstrap re-sampling method used in the CCHS involves the selection of simple random samples known as replicates, and the calculation of the variation between the estimates from replicate to replicate. In each stratum, a simple random sample of (n-1) of the n clusters is selected with replacement to form a replicate. Note that since the selection is with replacement, a cluster may be chosen more than once. In each replicate, the survey weight for each record in the (n-1) selected clusters is recalculated. These weights are then post-stratified according to demographic information in the same way as the sampling design weights in order to obtain the final bootstrap weights. 
The entire process (selecting simple random samples, recalculating and post-stratifying weights for each stratum) is repeated B times, where B is large. The CCHS typically uses B=500, to produce 500 bootstrap weights. To obtain the bootstrap variance estimator, the point estimate for each of the B samples must be calculated. The standard deviation of these estimates is the bootstrap variance estimator. Statistics Canada has developed a program that can perform all of these calculations for the user: the Bootvar program."

Is the Bootvar program what he would need?

Answer

I have confirmed with the author division that bootstrap weights are only available for the master file. I will review the information from Cycle 2.1 against the codebook.
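For readers who do have access to bootstrap weights, the calculation described in the quoted documentation can be sketched in a few lines of Python. This is an illustration only and not the Bootvar program. The toy data are hypothetical, only 3 replicates are used instead of the 500 the CCHS typically provides, and the variance is taken around the mean of the replicate estimates, which is an assumption about the exact formula.

```python
import math


def weighted_total(flags, weights):
    """Sum the weights of the records that have the characteristic."""
    return sum(w for f, w in zip(flags, weights) if f)


def bootstrap_variance(flags, bootstrap_weights):
    """Variance of the replicate estimates around their mean.

    bootstrap_weights holds B weight vectors, one per replicate.
    """
    estimates = [weighted_total(flags, bw) for bw in bootstrap_weights]
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


# Hypothetical toy data: 4 respondents, 3 bootstrap replicates.
flags = [1, 0, 1, 0]  # 1 = respondent has the characteristic
final_weights = [50.0, 50.0, 50.0, 50.0]
bootstrap_weights = [
    [60.0, 40.0, 50.0, 50.0],
    [50.0, 50.0, 40.0, 60.0],
    [40.0, 60.0, 60.0, 40.0],
]

estimate = weighted_total(flags, final_weights)
variance = bootstrap_variance(flags, bootstrap_weights)
std_error = math.sqrt(variance)
cv = std_error / estimate
print(f"estimate={estimate}, std_error={std_error:.2f}, cv={cv:.1%}")
```

The point estimate comes from the final survey weights, while the bootstrap weights are used only to measure how much that estimate varies from replicate to replicate.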

Wednesday, December 3, 2014

Requested List of 2001 Global Non-Response Rates by Census Tracts

Question

May we please request help locating a list of global non-response rates for 2001 Census Tracts? And in follow-up may we also request technical documentation on the global non-response rates threshold for suppressing Census Tract aggregate data (100% and 20%)? If this is not digitized, catalogue numbers would be great. A graduate student would like to know the reason for suppression of CT 5050054.00 data for a Geography thesis.

Answer


Have you consulted the 2001 Census Handbook, Appendix B: Data Quality, Sampling and Weighting, Confidentiality and Random Rounding <http://www.statcan.gc.ca/access_acces/archive.action?l=eng&archive=1&loc=http://www12.statcan.gc.ca/english/census01/products/reference/dict/appendices/app002.pdf>?

Within the appendix, it is noted: "Area suppression is the deletion of all characteristic data for geographic areas with populations below a specified size. The extent to which data are suppressed depends upon the following factors:
- If the data are tabulated from the 100% database, they are suppressed if the total population in the area is less than 40.
- If the data are tabulated from the 20% sample database, they are suppressed if the total non-institutional population in the area from either the 100% or 20% database is less than 40.

There are some exceptions to these rules.