Friday, November 28, 2008

Vehicle Kilometres Question


I am dealing with another research request for motor vehicle information, this time for recent vehicle-kilometres data. There is a table showing estimates by province in the Canadian Vehicle Survey: Annual, 53-223-X.

The researcher is looking for estimates of vehicle-kilometres driven in three major metropolitan areas, Montreal, Toronto and Vancouver. Is there any hope or possibility of getting something at this level?


Transportation Division can provide this data for a fee. Please let me know which years you need so they can provide you with a cost estimate.

Need Aboriginal Population Profile From 1991 and 1996


Is it possible to obtain the Aboriginal Population Profile for Canada, Provinces and Territories, Census Divisions and Census Subdivisions for both the 1996 and 1991 Censuses?

We were successful with this same request some time ago, when we needed the 2001 data, and are keeping our fingers crossed about this one.


According to my Census consultant, for the 1991 and 1996 censuses there are no "cumulative profiles" as there are now for the aboriginal population. However, she has suggested the following print and CD-ROM products instead:

94-325 Profile of Canada's Aboriginal Population
This publication presents a statistical overview of each Aboriginal group in comparison with the non-Aboriginal population. A wide range of demographic and socio-economic variables is displayed by individual entries and grouped under six main headings. This publication also provides a profile of demographic and socio-economic characteristics for the population with Aboriginal origins and/or Indian Registration, for Canada, provinces and territories. It is based on 20% sample data collected by the 1991 Census of Canada.

94-326 Canada's Aboriginal Population by Census Subdivision
This publication gives the population of each Aboriginal group by registration and band membership status.

94-327 Aboriginal Data - Age and Sex
This publication provides age and sex distributions for the 1991 population reporting Aboriginal origin, and the 1991 population identifying with their Aboriginal origins and/or who are registered under the Indian Act, for Canada, the provinces and territories, and selected census metropolitan areas. The data presented in this publication are taken from two sources: the 1991 Census and the 1991 Aboriginal Peoples Survey (APS).

These products were only available in print format and should be available in your library.

However, for 1996 there was a CD-ROM product:

94F0011XCB1996000 Portrait of Aboriginal Population in Canada, 1996 Census (20% Sample Data)
This CD-ROM provides a portrait of aboriginal population in Canada. This product is part of the Dimensions Series which provides census statistical information on topics of public interest.

It is available from the DLI FTP site:

For data at lower levels of geography you will need to investigate the Basic Summary Tabulations for those years or you may need to consider a semi-custom or custom tabulation. If you want to pursue the latter option, please let me know.

CCHS 4.1 and patient satisfaction and utilization questions by health region


My question: Will a user be able to obtain data on patient satisfaction and utilization questions by health region from CCHS 4.1?
We are hopeful because the sample size has grown to 65,000 (this was a problem two years ago).

See: Sampling. This is a sample survey with a cross-sectional design.

To provide reliable estimates to the 121 health regions (HRs), a sample of 65,000 respondents is required on an annual basis. A multi-stage sample allocation strategy gives relatively equal importance to the HRs and the provinces. In the first step, the sample is allocated among the provinces according to the size of their respective populations and the number of HRs they contained. Each province's sample is then allocated among its HRs proportionally to the square root of the population in each HR. (excerpt from:
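The second allocation step described above (each province's sample split among its HRs in proportion to the square root of each HR's population) can be illustrated with a short sketch. This is only an illustration under assumptions: the function name, the HR labels and the population figures are hypothetical, and the first step (allocation among provinces) is not reproduced because the excerpt does not give its exact formula.

```python
import math


def allocate_by_sqrt(province_sample, hr_populations):
    """Split a province's sample among its health regions (HRs)
    in proportion to the square root of each HR's population.

    province_sample: total sample size assigned to the province
    hr_populations:  dict mapping HR label -> population
    Returns a dict mapping HR label -> (unrounded) sample size.
    """
    roots = {hr: math.sqrt(pop) for hr, pop in hr_populations.items()}
    total_root = sum(roots.values())
    return {hr: province_sample * root / total_root
            for hr, root in roots.items()}


# Hypothetical province with three HRs and a sample of 3,000.
example = allocate_by_sqrt(
    3000, {"HR-A": 40000, "HR-B": 160000, "HR-C": 360000}
)
for hr, n in example.items():
    print(hr, round(n))
```

In this made-up example HR-B has four times the population of HR-A but receives only twice the sample, which is how the square-root rule keeps smaller HRs from being swamped by larger ones.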

However, after looking at the questionnaire we are only cautiously optimistic:
In the Optional Content section (specifically page 268, Appendix 2), the patient satisfaction topics are covered by only a few jurisdictions (at least according to this table).

CCHS 3.1 provided patient satisfaction data by province, and every province participated, which may be why there is confusion now.


Here is the response from the author division regarding your question on patient satisfaction and utilization questions by health region from CCHS 2007:

"Patient satisfaction (PAS): Was asked of a sub-sample of respondents in the 10 provinces. Selected as optional by Yukon and the Northwest Territories. Therefore, only provincial/territorial estimates are possible. However, this module was also selected as an optional module by Ontario and Saskatchewan. Health region (HR) level estimates are possible for these 2 provinces only, where sample size allows.

Utilisation: if you are referring to the Access to health care services module (ACC): this was also asked of a subsample in the 10 provinces only. Not asked and not selected by any of the Territories. Selected as an optional module by New Brunswick and Ontario. HR level estimates possible for these 2 provinces only, where sample size allows.

Utilisation: if you are referring to the Health Care Utilization module (HCU): This module is part of Core content and therefore asked of everybody. HR level estimates are then possible, of course where sample size allows."

I hope this answers your question. Please note that CCHS 2007 is effectively CCHS 4.1, but they have dropped the Cycle number from the name in favour of the year.

Synthetic Files for the CCHS


This question is in regards to the new synthetic files announced for the CCHS:

Canadian Community Health Survey 2007 - Synthetic Files

The central objective of the Canadian Community Health Survey (CCHS) is to gather health-related data at sub-provincial levels of geography (health region or combined health regions).

Note: It is important to note that these synthetic files do not contain real data and should never be used for analytical purposes. Their only purpose is to assist users in developing and testing the computer programs that are to be submitted by remote job submission.

FTP: /dli/cchs/Synthetic_files-Dummy_files/

I am confused by this latest synthetic file for the CCHS. Other synthetic files were identified by an associated Cycle. What is the CCHS 2007 synthetic file all about?

Also, I think the web link for this 2007 synthetic file may inadvertently be conjoined with the web link for Cycle 1.1?

I had asked Health Division about the Cycle number when they sent us the data, and they provided the following explanation:
"Effective starting this cycle, the CCHS team made a decision to drop the .1 in the naming convention and simply state the year- (because CCHS has gone to continuous collection, the term CCHS 4.1 is not quite accurate anymore). Therefore, the CD is correctly named CCHS 2007."
Also, you are correct about the web link problem. There is an error in the web link for Cycle 1.1 on our English site (the links on the French site are correct). I will have our team fix that today.

Thanks for letting us know.

2001 Census Question


A health researcher at UBC has found everything he needs from the 2001 census except for one thing, the age structure of the total aboriginal identity population by forward sortation areas. I too have looked and have not been able to find a table with this level of detail.

Is it possible to confirm that that is the case? If so, are the data available through a custom tabulation?


I agree with you that the only way you can get this info is through a custom tabulation. In order to obtain that tabulation, you may contact the Vancouver Regional Office.

Wednesday, November 26, 2008



We are again encountering problems accessing the CANSIM (E-STAT) tables by subject. We got the following message:

Error opening the file: F:\WWW\ROOT\CII\CII_SUBJ.TPF
Could you please check the source of the problem?


The manager of E-STAT informed me that the redirects for some outdated links on the E-STAT site are not working at the moment. To ensure that E-STAT functions properly, she suggests accessing it from the following URLs:

in English:
in French:

DLI Website Not Available


The DLI website does not seem to be available in either the French or the English version.


These two URLs will work if you take out ".gc"

We will send out a general notice on the URLs in the near future.


Official Announcement

On November 24, 2008, the Data Liberation Initiative web pages were successfully converted to comply with Treasury Board's Common Look and Feel (CLF) 2.0 guidelines. However, there are still some modifications that need to be done to the page URLs to make them fully compliant with CLF 2.0.

Therefore, we ask that you refrain from making any changes to your bookmarks and links to the DLI web pages at this time. The URLs that were in place prior to November 24 still work and will work for the next two years at a minimum thanks to redirection.

I will let you know as soon as the URLs have been modified to satisfy CLF 2.0 requirements.

We apologize for the inconvenience and thank you for your cooperation.

Survey of Innovation 2005 and Biotechnology Use and Development Survey 2007


Will there be a PUMF for the 2005 Survey of Innovation, as there were for 1999 and 2003? In addition, will there be a PUMF for the 2007 Biotechnology Use and Development Survey?

I noticed that all related CANSIM tables for these surveys have been terminated (with statistics up to 2005). Are there plans for any new/ongoing CANSIM tables for these two surveys?


I received the following response from the author division of both surveys (Please note that the IMDB section of the Statistics Canada website incorrectly provides a link to a "PUMF" for the 1999 and 2003 Survey of Innovation. However, there is no PUMF and the link actually takes you to Excel data tables. I will ask my contact at the IMDB to correct that as soon as possible):

"We do not have a PUMF for the Survey of Innovation 2005 nor have we ever had a PUMF for the Survey of Innovation 1999 or 2003. What we do have is a researcher database that is accessible through the facilitated access program.

Our surveys are occasional. When CANSIM indicates that a series has been terminated what this means is that we will not be producing a time series of data. For example, we will not re-run the Survey of Innovation 2005 to update the data. We are in the planning stages for a 2008 Survey of Innovation. Some of the questions may be the same as they were in 2005 but many will be different. As well, the sample unit will be different. So in answer to the question there will be no more CANSIM tables for the Survey of Innovation 2005.

With respect to the Biotech Survey Program - at present there is no funding for a survey for 2007 and therefore no plans for any data. We continue to speak to interested parties, but it appears they are either not able or willing to provide the funds necessary to maintain this survey. This could change at any point, but it appears less likely as time passes. Obviously if there is no survey, there will be no updates to the CANSIM tables.

To the best of my knowledge, there has never been a PUMF for Biotech. The population is very small and there are quantitative variables which are sensitive. The data can only be accessed through the Facilitated Access Research Program run by the division. This program is limited to accredited researchers (PhDs
at Canadian universities or government departments) with an approved econometric research project and security clearance."

I hope this answers your question. Please let me know if you need any further clarification.

Monday, November 24, 2008

NLSCY Question

Is my understanding of the PUMFs available for the NLSCY correct?

1. PUMFs available for cycles 1-3 only

2. Data for cycles 4-6 available only through RDCs.

You are correct in saying that the DLI offers NLSCY PUMFs for Cycles 1-3
only. We also offer synthetic files for Cycles 1 to 6.

I just received confirmation that the Research Data Centres have Cycles 1-6.
They expect to have Cycle 7 as well before the end of this year.

I hope this answers your question.

Aboriginal Educational Attainment


I’m looking at the figure for educational attainment of the Total Aboriginal identity population, 15 years and over, for the province of Ontario.

If I look at the following topic-based tabulation, the figure for ‘University certificate, diploma or degree’ is 16,480:

Aboriginal Identity (8), Highest Certificate, Diploma or Degree (14), Major Field of Study - Classification of Instructional Programs, 2000 (14), Area of Residence (6), Age Groups (10A) and Sex (3) for the Population 15 Years and Over for Canada, Provinces and Territories (97-560-X2006028)

However, if you look at the same data in the Aboriginal Population Profile 2006, the figure for ‘University, certificate diploma or degree’ is 12,435, which is the figure in the table above for ‘university certificate or degree’. The figures for all the other variables in the ‘Highest certificate, diploma or degree’ dimension match. Unless I’ve missed something, it looks like the label in the Aboriginal Population Profile 2006 is incorrect. If that is so, can this be corrected?


Our Census consultant has confirmed that the label "University, certificate diploma or degree" in the Aboriginal Population Profile 2006 is incorrect. It will be changed to "University certificate or degree."

Thanks very much for spotting that.

LFS and Immigrants

A question from a user:

"In 2006, the LFS started collecting information on immigrants. This information is not available through the DLI file. Are there any plans to include the 5 questions on immigrants in the PUMF? As the LFS is not available in RDCs, is data only available through existing publications and custom tabulations?"

I have informed her that in fact the LFS microdata _are_ available in the RDCs. But what about the answer to the question about the availability of these variables in the LFS PUMF?

Our LFS consultant said that they are planning to add more variables and geography (i.e. CMAs) to the LFS PUMFs. Pending approval, they hope to add these sometime in 2009-2010. This is consistent with information provided to us at the April 2008 EAC meeting.

Follow-up Question
Will the geography include CAs as well as CMAs?

Follow-up Answer
The LFS consultant has confirmed that only provinces and selected CMAs will be made available.

Notes about SHS 2006


1) The Data Dictionary (shs2006cbk.pdf) presents conflicting information for two variables (NMVEHONP and VEHLEASP, page 36).

The Readme file indicates that the effective date for household equipment counts was the interview date. Would you confirm that the "Long name" information is inaccurate? If it is inaccurate, the variable labels used in SPSS and SAS are also in error.

Long name: Number of vehicles owned on December 31
Description: This variable gives the number of vehicles (car, van/mini-van, truck/sport utility vehicle) owned by members of the household at time of interview completely or partially for private use, excluding those leased.

Long name: Vehicles leased on December 31
Description: This variable gives the number of vehicles (car, van/mini-van, truck/sport utility vehicle) leased by members of the household at time of interview completely or partially for private use.

I'd also point out that since the responses to VEHLEASP are "yes" and "no", the "Description" inaccurately describes the variable: it's not the "number of vehicles" leased but instead is a code to reflect the presence of leased vehicles (as indicated in the "Unit of Measure").

2) I was looking at variable NETCONEP, and saw that there was no "missing" category for the variable. Instead, people who weren't asked the question (those who did not use the internet from home, from variable INTERNET) are collapsed into the NETCONEP category of "0 -- No internet connection". I think that it would be preferable to explicitly separate those who weren't asked the question from those who responded that they didn't have an internet connection. It's unlikely, but possible, that respondents have a home internet connection, but do not use it: this possibility is eliminated by the current coding, which doesn't accurately reflect the questionnaire.

I have discussed your comments and questions about the SHS 2006 with the author division. Here are the responses they have provided.

1. "The correct date is December 31 of the reference year for vehicles. And the vehleasp does indicate the presence of at least one leased vehicle on Dec 31, not the number. The definitions should be modified."

2. "If the respondent says no to internet use from home we do not ask any further questions on internet access and by default assign "no access". This may not be strictly true in some rare cases, but we have to minimise questions that the respondents consider a waste of time when they have already told us "no"."

The author division will be correcting the documentation to reflect the correct reference date for vehicles. I have asked them to provide us with the corrected versions as soon as they are available.

For the NETCONEP variable, I will ask our SPSS coder to change the status of the "0 -- No internet connection" code to "missing".
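For anyone who wants to apply the same change to their own extract outside SPSS, a minimal sketch in Python follows. It is an illustration only: the function name is hypothetical, and the non-zero codes in the example are made up rather than taken from the SHS 2006 codebook. Only the "0 -- No internet connection" code comes from the discussion above.

```python
def recode_to_missing(values, missing_codes=(0,)):
    """Return a copy of values with the given codes replaced by None.

    values:        list of NETCONEP codes read from the file
    missing_codes: codes to treat as missing (0 = "No internet connection")
    """
    return [None if v in missing_codes else v for v in values]


# Hypothetical NETCONEP values; codes 1-3 are placeholders.
netconep = [0, 1, 2, 0, 3]
recoded = recode_to_missing(netconep)
print(recoded)
```

Note that this treats every 0 as missing. It cannot separate households that were never asked the question from any that would have reported no connection, because the file stores both under the same code.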

Canadian Travel Survey - updates?

I was wondering whether there is or will be an update to the CTS? We only have till 2004 - but it looks like 2006 is the most recent release?

In early 2005, the Canadian Travel Survey (CTS) was replaced by the Travel Survey of Residents of Canada (TSRC):

Q1. The online catalogue does refer to a 2006 PUMF for the CTS. I will contact the author division to see if this is incorrect.

Q2. In either case, I will ask them to send us an updated PUMF for either the CTS or the TSRC and I will keep you posted.

Answer to:

Q1 It is a mistake. They should take off this link.
Q2 We will prepare a 2006 CD-Rom this fall for DLI.

Follow-Up Question
Thanks for the info - I'm assuming that 2005 will be included as well?
The first PUMF for TSRC is 2006.
(This will be sent to DLI sometime in Nov.)

No public microdata file is foreseen for TSRC 2005.

However, the 2007 microdata files will be ready before the end of 2008.

Aboriginal Population Stats


I have a faculty member who's looking for Aboriginal (including Metis, Inuit, etc.) population stats by age characteristics from the 2006 Census. The catch is that he'd like it in 1-year age increments rather than the standard 5-year increment. I suspect he's looking at either a custom tabulation or applying to use the nearest RDC, but I thought I'd ask the group first.

Our Census consultant has confirmed that population statistics showing single years of age for aboriginals would be a custom data retrieval.
Let me know if you would like to order it, and I will ask one of our account executives to get in touch with you.

Friday, November 21, 2008


I have a researcher who is VERY interested in obtaining some custom tabulations from both the UCASS and Survey of Earned Doctorates. How would they go about doing this?

Also - is there any historical data available through DLI for the SED?

For custom tabulations, I can have an account executive contact you or your
researchers directly. Would you like me to arrange that?

As for the SED, there is data for 2003/4 and 2004/5 available under the
"Education" folder on the DLI FTP site and website. However, according to
the Daily there was a release of 2002/3 data as well. I will ask our
Education consultant whether they can provide us with that data and keep you
posted.

Weighting in GSS Cycles

I have a faculty member asking this question regarding GSS cycles and weighting. Your help will be greatly appreciated.

I'm using cycles 1, 9 and 18 of the General Social Survey. Do I have to incorporate the weights for the cases to find correlations and do regression analyses? For example, cycle 9 has the weight variable PERWGHT.

Yes, you should always use the weighted data to do these types of calculations, and you may want to use the program bootvar to calculate the variance.
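To show what "using the weights" means for the point estimates, here is a minimal sketch in Python. It assumes one weight per respondent (such as PERWGHT in cycle 9) and a single predictor; the function names and the sample figures are hypothetical. It gives weighted point estimates only. It does not produce design-based variances or significance tests, which is where a tool like bootvar comes in.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_corr_and_line(x, y, w):
    """Return (correlation, intercept, slope) computed with weights w.

    The slope and intercept minimise the weighted sum of squared
    residuals for the simple regression y = intercept + slope * x.
    """
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    sxy = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    syy = sum(wi * (yi - my) ** 2 for yi, wi in zip(y, w))
    corr = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return corr, intercept, slope


# Hypothetical respondents: x, y and a person-level weight.
x = [1, 2, 3]
y = [1, 3, 2]
w = [1, 2, 1]
print(weighted_corr_and_line(x, y, w))
```

With whole-number weights, as in this example, the result is the same as listing each respondent as many times as their weight and running the unweighted calculation, which is a quick way to check that the weights are being applied.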

Hope this helps.

Agriculture-Population Linkage Database Access

1) Does the DLI community have access to the Agriculture-Population Linkage Database as described at:

How about through the RDCs?

2) My student really needs access to the raw data; the published tables are not adequate for her research on "farm women", which is a different group from female farm operators.


1) Data from the 2001 Agriculture-Population Linkage Database are available for free on the Statistics Canada website:

The data found at the above-mentioned link are the only data mentioned in the Daily release for this product. However, I will check with Census Division to see if any additional data could be obtained through custom tabulations or other means and keep you posted.

2) Our Agriculture consultant confirmed that 2001 Agriculture-Population Linkage data are only available (1) via the web at the link already provided above, or (2) via custom tabulations.

He also mentioned that 2006 Agriculture-Population Linkage data will be available on the Statistics Canada website on December 2, 2008.

Please let me know if your student is interested in a custom tabulation.

Older versions of the Postal Code Conversion File


I’m writing to see about getting older versions of the PCCF (Postal Code Conversion File). The DLI FTP site has files from 2007 to March of 2008, but nothing older. I have locally stored files from earlier years, but lack 2004 through 2006. A researcher here needs a version of the PCCF from this period. Are these older files still available? Is there a reason why we don’t have these older files on the DLI FTP site?

Older versions of the PCCF are available on our FTP site. The PCCFs for 2002 to 2006 are available in the 2001 folder:


More PCCF files can be found in the 1986, 1991, and 1996 folders at: /ftp/dli/geography

GSS-17 Ethnic Question


The GSS17 questionnaire has questions on ethnicity, but we are not finding a corresponding variable in the PUMF. The user guide does not address the issue of missing data, and I cannot seem to locate the user guide for the master file.

Could you ask the great folks at GSS where ethnicity is hiding in GSS17?


The GSS team has provided the following explanation for the missing ethnicity variables in the Cycle 17 PUMF:

"All the ethnic variables were suppressed in the cycle 17 PUMF, in order to protect the confidentiality of respondents. This was done to all GSS PUMFs that could have contained ethnic variables, except for cycle 20, which only has 7 categories of ethnic responses (those are very limited types of answers). This exercise of variable suppression is necessary to avoid the possibility of identifying respondents, based on certain characteristics that could be used to identify someone, such as their ethnic origin. The ethnic variables for cycle 17 are only available in the analytical file, available in the RDCs."

CANSIM Table 180-0003


I have a question regarding CANSIM TABLE 180-0003 (Financial and taxation statistics for enterprises, by North American Industry Classification System (NAICS), annual …. ). This table presents information aggregated for all of CANADA. A researcher is inquiring whether the information on CANSIM Table 180-0003, is also available, via the DLI, (or elsewhere), for individual provinces.

(I did take a look at the CDROM Financial Performance Indicators for Canadian Business, and to the extent I was able to understand what was being presented, it seems this product presents ratios, not amounts. Volume 3, presents provincial level information, but seemingly only for small corporations. So I am assuming this DCL CDROM is not likely to be a source of the required information? )

Sorry, our provincial data is not available on the Statistics Canada website. This data is not readily available, so programs must be run in order to compile it. For this reason, the data is compiled under a cost recovery program. If you would like further information on our cost recovery program and what might be available for purchase, please reply by return email.

TLAC Preliminary Tables


As I have a user (a library administrator) anxiously waiting for the TLAC 2008-2009 file to be available through DLI, I just had a quick look at the FTP site to see if something had been posted. The tuition-data(tlac).zip file was apparently updated on November 14 so I looked at the content and saw that a new file appeared which is called: tlac-fssuc tables-tableaux 1 to-à 6 - 2008-2009 prelim.xls (created or last modified also on November 14)

So I figured this would present preliminary data for the tables 1 to 6. Is that correct?

In any case I tried to open the file but it appears to be corrupted, it cannot be properly unzipped.


Yes - the data for 2008-2009 that you found is the preliminary data. We are supposed to be getting some additional preliminary files this week so we were going to announce them once we had all the files. You are welcome to use the preliminary data that is there now. Our team fixed them so you should be able to unzip them now. Please try again and let me know if they are still corrupted.

FYI - I just received the additional TLAC preliminary files for 2008-09. I will have our team post them right away and will announce them shortly.

Follow-Up Question
Can you tell me what difference there is between the preliminary files and the final ones? But maybe this explanation will be part of your official announcement later on, in which case ignore this question.

By the way, the files are fine now; I was able to unzip them without a hitch.

Do these files include Table 7?

Follow-up Answer
1) It does (or rather did) include Table 7, but I just received an email from the author division asking me to remove these files from our site because they have found errors in them.

So, I will be sending out an announcement to that effect in a few moments.

2) The author division has explained that the TLAC preliminary tables include some estimates, whereas the final tables do not. Here is the full explanation provided by the division:

"On the questionnaire, under General Instruction, the first instruction is:

Whenever possible, final fees and living accommodation costs should be reported. If they have not yet been determined your best estimate should be reported. Place an "e" after each estimated figure on the questionnaire.

This means for 2008-2009 the data includes final and estimated fees. For 2007-2008 it is all final data since in the questionnaire it was asked to report the actual fees for the previous year. This is the explanation of why for 2007-2008 we have final tables while it is preliminary for 2008-2009. After next year's survey, 2008-2009 tables will be final and 2009-2010 will be preliminary."

Wednesday, November 19, 2008

2006 Census of Agriculture


A researcher at York has asked whether it is possible to find the total hectares and types of production systems for farms in Toronto, separate from the CCS of Vaughan, Ontario.

Is it possible to get this kind of detail, for free or for fee?


I suspect you may have used the Agricultural Community Profiles to find this data. Have you also looked at the Land Use data tables on the web?

Excel tables from the Census of Agriculture are also available on our FTP site:

TABLE: ceag_farm_data-reag_donnees_sur_les_exploitations.csv

The Reference Documents that will allow you to decipher these tables are available in the following directory:


For example, in the reference document
"2006_ceag_farm_data-variable_descriptionv2.xls", you will notice variables
such as the following:

VAL_AOWNED NUMBER Total area owned - Acres Acres

In the 2006_ceag_geography.xls document, you will be able to identify whether the CCS of interest are available in the actual table.

Follow-up Question

Thank you for all of your suggestions.  Unfortunately, the data in both the Agricultural Community Profiles and Excel tables from the ftp site only refers to the CCS of Vaughan, Ontario.  My user would like data for
Toronto, which has been amalgamated into this CCS.  Is this data available, either for free or fee?

Follow-up Answer

Agriculture Division confirmed that data for Toronto would be available through a custom request:

"The cost is $580 plus taxes for the following:
All farm and operator data for Toronto Division, 2006 Census"

Ethnic Diversity Survey


A faculty member here is looking at using the Ethnic Diversity Survey. We couldn’t find a “number of children ever born” variable - is there any measure of fertility other than number of children in the household (which obviously doesn’t capture children who have moved out)?


The author division of the Ethnic Diversity Survey has confirmed that this survey does not measure the number of children who live outside of the household.

Messengers in Toronto - Info needed


I have a faculty member looking for the languages spoken (preferred) or ethnicity of messengers working in the City of Toronto (or even CMA).

I think the census is going to be the only resource that has this. The relevant NAICS is 49221 or 492210 and the relevant NOC-S is B563.

Census 2006 tables that go to the level of detail for NAICS and NOC-S do not have detailed data on language or ethnicity.

I wonder if Census could supply this info or if not whether it would be possible to get a custom tab?

Thanks for any help and suggestions of other sources.


Our Census Consultant has confirmed that this would be a custom tabulation - specifically either Industry or Occupation codes crossed by the pertinent language or ethnic origin variable. However, the consultant noted the following:

"the counts I see for NAICS and NOC-S are at the CMA level, and a custom consultant may indicate that the counts are not sufficient enough to produce any valid information for any detailed crossing of the variable."

NPHS Cycle 7 Synthetic Files


Is there any information as to when the NPHS cycle 7 synthetic files may be released to DLI?


The NPHS team has indicated that they are hoping to get us the NPHS Cycle 7 synthetic files towards the end of December.

Ontario Wage Survey 1999


I have a prof here at Guelph looking for the 1999 Ontario Wage Survey. I found a reference to it in the Daily - December 1991 - here's the link:

Is this available through DLI? The link that is provided in the Daily is no longer working.


1) Try this:

2) I found the following info about the 1999 Ontario Wage Survey

3) This looks like a business-based survey that was done by the Small Business and Special Surveys Division. Usually business-based surveys do not produce a public use file. They may have produced some tables or a report. I suspect that the data may be available only through custom tabulations - for a fee of course.

Friday, November 14, 2008

2006 Journey to Work data


A faculty member at McMaster needs 2006 Journey to Work data for Moncton-305, Saint John-310 and Saskatoon-725. I have looked into 2006 JTW custom tables that were made available through DLI but I could not find data for these 3 areas. Is there any place else where I should be looking for this data? Has the JTW data been produced for these areas? Any information related to this matter will be greatly appreciated. Thanks in advance for any help!

Did you try the following tables? They are available on the FTP at: /dli/census/2006/2006_pow_consortium/final_2006_ct_flows/canada

Commuting Flow for Census Metropolitan Areas, Census Agglomerations and Census Tracts: Mode of Transportation (9) and Sex (3) for the Employed Labour Force 15 Years and Over Having a Usual Place of Work, 2006 Census - 20% Sample Data

Catalogue number 97C0088

Commuting Flow for Census Metropolitan Areas, Census Agglomerations and Census Tracts: Work Activity (4) and Sex (3) for the Employed Labour Force 15 Years and Over Having a Usual Place of Work, 2006 Census - 20% Sample Data

Catalogue number 97C0089

Depending on what you are looking for specifically, the tables in the following directory may help:


I also found these tables in the topic-based tabulations section: /dli/census/2006/Topic-based-tabulations/place-of-work-and-commuting-to-work/b2020

Place of Work Status (5), Age Groups (9) and Sex (3) for the Employed
Labour Force 15 Years and Over Canada, Provinces, Territories, Census
Metropolitan Areas and Census Agglomerations - Cat. No. 97-561-X2006006

Commuting Distance (km) (9), Age Groups (9) and Sex (3) for the Employed
Labour Force 15 Years and Over Having a Usual Place of Work Canada,
Provinces, Territories, Census Metropolitan Areas and Census
Agglomerations - Cat. No. 97-561-X2006010

Mode of Transportation (9), Age Groups (9) and Sex (3) for the Employed
Labour Force 15 Years and Over Having a Usual Place of Work or No Fixed
Workplace Address Canada, Provinces, Territories, Census Metropolitan
Areas and Census Agglomerations - Cat. No. 97-561-X2006012

Obtaining a License for Census Data


A researcher here is collaborating with the City of Ottawa on a research project and would like to obtain a license for some Census data. Could you please provide me with a contact for her?

Please have your researcher contact Licensing Services. Their contact details can be found on the following page of our website:

Wednesday, November 12, 2008

PUMF for 2005 International Survey of Reading Skills (ISRS)


I would like to inquire whether there will be a PUMF for the International Survey of Reading Skills (ISRS) – 2005 / Enquête internationale sur les compétences en lecture (EICL) – 2005, as noted in The Daily last January:


1) I just received confirmation from the author division that there will be no PUMF for the ISRS 2005 survey.

2) The author division has provided some additional information on this topic: there will be no PUMF for the ISRS 2005 because the sample size is too small.

Information on Visa Students


I received the following request: "I am interested in gaining access to Stats Can info through the Data Liberation Initiative, especially as concerns statistics for international education. Can you please tell me how I may search/access this data?

For example, we are currently in need of the most up-to-date stats on country of origin for visa students in Canada and how many of those come from the U.S."

I thought this would be fairly straightforward; not quite. I found an AUCC publication on Enrolment but the data go only to 2004. And I have looked all through PSIS and found several tables, but none that fit the bill. There's a Daily article from Feb 2008 with 2005-06 data, but they aren't detailed enough.

Any other thoughts? I called our planning folks and they actually provide data to PSIS so we only have our own here taken from our admin system.


1) It's not Stats Can that collects that info, but rather Citizenship and Immigration. For example, the 2006 Facts and figures at: has a bunch of tables on foreign students, including the top 10 countries of origin.

Before 2003, they were reported in:

Facts and figures [yyyy]: statistical overview of the temporary resident and refugee claimant population.
I am not sure whether or not LAC has those on its web site, but I have copies from 1999-2002 (pdf files), if you need them.

2) I have found statistics on foreign or international students enrolled in Canadian universities in OECD sources. Look in SourceOECD under OECD databases: OECD.Stat (2004-2006) or Education Statistics, where one of the databases is called Foreign Students Enrolled (1998-2003).

3) Our Education consultant told me that data on country of origin for visa students in Canada is available as a custom tabulation:

"This would be available from a custom extraction from PSIS. The cost would be $160 for one year and $10 for each additional year added at the same time. Turn-around time for delivery is 5-10 working days and pre-payment by credit card would be required."

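The pricing quoted above is easy to work out for a multi-year request. Here is a minimal Python sketch of that arithmetic; the function name is made up for illustration, and it assumes all years are requested at the same time, as the quote specifies.

```python
def psis_extraction_cost(years: int) -> int:
    """Estimated cost in dollars of a custom PSIS extraction.

    Uses the quoted rate: $160 for one year, plus $10 for each
    additional year added at the same time. The function itself is
    a hypothetical helper, not a Statistics Canada tool.
    """
    if years < 1:
        raise ValueError("at least one year must be requested")
    return 160 + 10 * (years - 1)


# Example: five years of data requested in one order.
five_year_cost = psis_extraction_cost(5)  # 160 + 4 * 10 = 200
```

Actual quotes come from the Education consultant, so treat this only as a way to estimate the order of magnitude before asking.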
Monday, November 10, 2008

Winter Institute on Statistical Literacy for Librarians 2009

The University of Alberta Libraries will be hosting its third Winter Institute on Statistical Literacy for Librarians from February 18-20, 2009. This training event will provide strategies and skills for finding, evaluating and retrieving published statistics and will be useful to information professionals working in academic, public and special libraries.

This workshop will not provide instruction about how to do data analysis, although some examples will include the use of data to demonstrate how statistics are produced.

The instructional focus is on making digitally published statistics more accessible as information to librarians and their patrons. The topics to be covered include:

- A framework for thinking about published statistics
- How statistics and data are related
- Statistical definitions, standards and classifications
- Metadata for statistical displays
- How official and non-official statistics are produced
- Evaluating statistics and statistical sources
- Tools and strategies for locating statistics
- Citation standards for statistics
- Geographical and spatial representation of statistics
- Addressing the challenge of finding statistics for small geographic areas

The conference is restricted to 30 participants on a first-come, first-served basis. The registration fee is $250 and includes continental breakfast, coffee breaks and lunch.

For more information and to register, visit

Wednesday, November 5, 2008

NGS (National Graduates Survey)/FOG PUMFs


Could we get an update on when/if a pumf is likely, please? Researchers are starting to see publications based on the RDC data, and want to be able to verify/expand these analyses, but can't, which is frustrating for both the researchers as well as us. If they can't verify the results, they often won't quote them or use them in teaching either, which sort of defeats the purpose of doing the surveys in the first place.

And no, a couple of excel tables of aggregate stats from the 2005 FOG is not a sufficient sop.


The latest information we have about the availability of pumfs from either the NGS 2000 or FOG 2005 is:

"pumf in March 2004 (dlilist 2003/02/03);"
"availability of a public use microdata file uncertain, maybe by spring 2006 (dlilist 2005/09/19);"
"April 2006 (dlilist 2006/01/19);"
"Production of a pumf currently on hold due to workload (dlilist 2006/07/12);"

Inquiry on "Gross Rent" variable in 2001 Census


A student has asked me about the following 2001 Census table:

This table shows ‘Gross Rent as a percentage of 2000 Household Income’. What is confusing under ‘gross rent’ is the presence of both a “50% and over” category and a “50-99%” category. Why have both categories, and what is the practical difference between them?

The 2001 Census Dictionary does not include the 50-99% category for this variable.


Owner's Major Payments or Gross Rent as a Percentage of Household Income

Part A - Plain Language Definition

Percentage of a household's average total monthly income which is spent on shelter-related expenses. Those expenses include the monthly rent (for tenants) or the mortgage payment (for owners) and the costs of electricity, heat, municipal services, etc. The percentage is calculated by dividing the total shelter-related expenses by the household's total monthly income and multiplying the result by 100.

Part B - Detailed Definition

Refers to the proportion of average monthly 2000 total household income which is spent on owner's major payments (in the case of owner-occupied dwellings) or on gross rent (in the case of tenant-occupied dwellings). This concept is illustrated below:

(a) Owner-occupied non-farm dwellings:

(Owner's major payments / ((2000 total annual household income) / 12)) x 100 = ___%

(b) Tenant-occupied non-farm dwellings:

(Gross rent / ((2000 total annual household income) / 12)) x 100 = ___%

Censuses: 2001 (1/5 sample), 1996 (1/5 sample), 1991 (1/5 sample), 1986 (1/5 sample), 1981 (1/5 sample)

Reported for: Private households in owner- or tenant-occupied non-farm dwellings

Question Nos.: Derived variable: Questions 51, H6 (a), (b), (c), H7, H8 (a), (c) and (f)

Responses: Not applicable

Remarks: The response categories used in the census products are as follows: less than 15%; 15-19%; 20-24%; 25-29%; 30-34%; 35-39%; 40-49%; 50% and over.

Any thoughts?


1) I consulted the Census division for a specific explanation regarding your question. Their answer is as follows:

"Gross rent Refers to the proportion of average monthly 2000 total household income which is spent on owner's major payments (in the case of owner-occupied dwellings) or on gross rent (in the case of tenant-occupied dwellings).

The relatively high shelter cost to household income ratios for some households may have resulted from the difference in the reference period for shelter cost and household income data. The reference period for shelter cost data (gross rent for tenants, and owner's major payments for owners) is 2001, while household income is reported for the year 2000. As well, for some households the 2000 household income may represent income for only part of a year.

In the category "Average monthly total of all shelter expenses paid by tenant households", "Gross rent" includes the monthly rent and the costs of electricity, heat and municipal services.

In the case below, 4,795 represents 50% and over while 2,920 represents 50-99%. Therefore 1,875 spend over 99% of their total household income on Gross rent."

I hope this clarifies the numbers shown in the table.

As well, we suggest you use the numbers provided in this table cautiously. There are notes regarding the reference period for shelter cost and household income data, which you may access by clicking on the link "More information on this product is available here.", provided under the table you have referenced.
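For anyone who wants to check the arithmetic, here is a minimal Python sketch of the dictionary formula and of the subtraction in the Census division's reply. The function and variable names are made up for this example, and the rent and income figures are invented. Only the counts 4,795 and 2,920 come from the reply above. Raising an error for zero or negative income is this sketch's own choice, not a description of how the Census treats such households.

```python
def shelter_cost_percentage(monthly_shelter_cost: float,
                            annual_household_income: float) -> float:
    """Gross rent (or owner's major payments) as a percentage of
    average monthly household income, following the dictionary formula:
    cost / (annual income / 12) x 100.
    """
    if annual_household_income <= 0:
        # Assumption of this sketch: refuse to compute a ratio.
        raise ValueError("income must be positive to compute a ratio")
    return 100 * monthly_shelter_cost / (annual_household_income / 12)


# Invented example: $900 monthly gross rent, $18,000 income in 2000.
example_pct = shelter_cost_percentage(900, 18000)  # 60.0, i.e. "50-99%"

# Counts quoted in the Census division's reply.
fifty_and_over = 4795    # households in "50% and over"
fifty_to_99 = 2920       # households in "50-99%"
over_99 = fifty_and_over - fifty_to_99  # 1875 households above 99%
```

Seen this way, "50-99%" is a sub-breakdown of "50% and over" rather than a competing category, and the remainder is the group whose reported shelter costs approach or exceed their reported income.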

2) I received a similar response from our Census consultant, along with some information on why the 50-99% category wasn't included in the 2001 Census Dictionary. The consultant concluded that this was an oversight because the dictionary is produced before some of the standard products are released:

"The dictionary is usually produced before the standard products are released and at times after analysis of the data by our subject matter experts they may decide to include additional breakdowns or to remove breakdowns due to the quality of the data."

However, she is going to suggest that the 50-99% category be added to the 2011 Dictionary, and asked me to thank you for bringing the issue to her attention.

Difference between Census Access levels 2 and 3


Can someone clarify what we have access to on the Census site that is level 2 (DSP?) and level 3 access, please? I am doing a presentation next week and want to make sure that I'm providing the correct info.

Thanks in advance.


1) Your question creates an opportunity to put in a plug for
the DLI Survival Guide on the STC website. You will find a description
of Level 2 access under the section on "Accessing and Citing DLI Data." See:
The details are under the heading: "Census data -- a more detailed level of geography."

2) Here is a cut and paste from the DLI Survival Guide:

Census data - a more detailed level of geography

Commonly referred to as Level 2 access to Census data, DLI members have access to
additional Census data at lower levels of geography and the additional option to download
to Beyond 20/20 format if they so desire. These access levels apply to the release of the
standard topic-based tabulations, release profile components and the cumulative profiles.

The following summarizes our Internet product availability at Level 2:

* Topic-based tabulations for all levels of geography (except
forward sortation area (FSA) and dissemination area (DA)).
* Release profile components and cumulative profiles for all levels
of geography (except forward sortation area (FSA) and
dissemination area (DA)).
* Dissolved census subdivision (CSD) profile data.

(Although FSA and DA levels are not available through Level 2 access, these levels of
geography are readily available on the DLI FTP site.)

If your DLI-member institution does not currently have access to level-2 Census data,
please contact the DLI unit with the IP range for your
institution and we will facilitate the access for your institution.

3) A description of the different Census access levels is available in the Survival Guide:

Also, in replying to a similar question back in April, we obtained the following definitions from our Census consultants:

Level 0 = Available for free to all users via the Internet
Level 2 = Available to pre-determined users, key stakeholders and partners. So in essence our level 2 users have access to everything up to level 2, and it is available on our site.

Level 3 = Available for a fee. Key partners and stakeholders, including the DLI, have access to these files however. In the case of the DLI, they are distributed through the DLI FTP site.

4) I was at a Census presentation yesterday and level 3 is internal access for STC. Census does provide Level 3 data to some key partners and stakeholders, including the DLI. However, DLI contacts can only access Level 3 data via the FTP site.

5) When we last talked you had mentioned that you would be interested to know the criteria used by Census in determining whether a file is classified as level 2 or 3. I thought I should share the answer with the group as it elaborates on my recent posting about census access level definitions (below).

I discussed the matter with the Chief of Census Standard Products and Internet Development, and she confirmed that the size of the file determines its level. If it is small enough to be delivered via the web, it will be classified as level 2. However, if a file is too large for dissemination via the web, it can only be distributed via the FTP site and will be classified as level 3. Typically, it is the DA-level files that end up being too large. However, there are a few products at other levels of geography that are too large to be delivered via the website and must be classified as level 3 as well. For example: a detailed industry by detailed occupation table crossed with other variables at the CD/CSD level STILL may have to sit at Level 3 because of size.