Friday, February 27, 2009
A student is researching recent statistics on the consumption of olive oil in New Brunswick. Is there any data available via DLI?
The person responsible for the product "Canada Food Stats" told me that the division has neither detailed data on oil consumption by type of oil (e.g. olive, canola) nor provincial data on food consumption.
I have a researcher who would like to know how to gain access to the Small and Medium Enterprise Data Warehouse. Any possibilities through the DLI?
"No SME Data Warehouse microdata are accessible by the public. All counts are aggregated"
And, from the Daily release:
"Custom tabulations are available through Small Business and Special Surveys Division on a cost recovery basis."
Therefore, I have asked the division whether they would be willing to provide us with some of the aggregate data that is normally only available through cost recovery. I will let you know if I receive a positive response.
In the process of providing datasets from this source to a graduate student, we discovered what appears to be an anomaly in the import file for 2004. This file is much smaller (at 13 MB) than its equivalents for all other years (over 50 MB in general). The data are missing for all merchandise with an HS10 code above 4707300090 (part of 4707300090 is also missing). The files on the FTP server and on the DLI Web server seem to be identical in that respect.
Is there any reason for that, or is it just a deficient file? More importantly, can the complete import data be obtained for 2004?
The author division has agreed to rerun the 2004 import file and provide the DLI with a corrected copy. As soon as the corrected version is available on our site I will send out an announcement.
I am preparing the CCHS (2007) synthetic files for a research team, and have found that there are missing data for WTS_S, the 'shared' weight variable. This is unexpected, as all the cases have a weight for the master weight variable, WTS_M.
The research team would appreciate elucidation on this point!
My contact in Health Statistics Division provided the following explanation for the missing WTS_S data in the CCHS 2007 synthetic file:
"The reason for WTS_S being blank is because it's the share weight and therefore only populated by users that agreed to share their data."
Two researchers here at York, using the pumfs from the 2006 SLID, have reported problems
with the syntax files in SPSS and SAS. Both are receiving error messages for missing
value specifications. I am not very familiar with these applications and don't know how
to help. Any suggestions would be very welcome.
The SPSS syntax files for the 2006 SLID pumfs were quite bad.
1) integer variables were uniformly coded as alphabetic
2) there was no specification of decimals, required by several variables
3) most of the missing data specifications that were given were in error
(ie 4-digit missing variable specs for 2-digit variables (eg '9696 thru
9999' instead of '96 thru 99'), and 2-digit variable specs for 1-digit
variables (eg '66 thru 99' instead of '6 thru 9'))
4) the bulk of the variables that needed missing data specifications had
I haven't looked at the SAS syntax, but assume it suffers from the same
Partly revised syntax files are linked from:
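As an illustration of problem 3) in the list above, here is a minimal Python sketch of a width check for missing-value ranges. The function names and the "keep the last digits" repair rule are assumptions made for this example; they are not taken from the revised syntax files, and any suggested range should be checked against the codebook.

```python
def range_fits_width(low, high, width):
    """Return True if both ends of a missing-value range fit in `width` digits."""
    largest = 10 ** width - 1  # e.g. 99 for a 2-digit variable
    return 0 <= low <= high <= largest


def suggest_range(low, high, width):
    """Guess the intended range by keeping only the last `width` digits.

    Assumption: the faulty specs repeat or pad the intended digits,
    as in '9696 thru 9999' for an intended '96 thru 99'.
    """
    modulus = 10 ** width
    return low % modulus, high % modulus
```

For a 2-digit variable, range_fits_width(9696, 9999, 2) is False and suggest_range(9696, 9999, 2) gives (96, 99), which matches the example quoted in the list.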
We're currently working with the user summary tapes in the 1981 Census. I can't find any associated documentation for some of the files on the FTP site. For the following files, we need the French equivalent of these documents:
For the file CTH81A10, there is no available documentation in French or English.
For the following files, the French sections of these documents are found after the English sections.
c1981bst-ea-b10indoc.pdf (French section starts on page 189)
c1981bst-ea-b30indoc.pdf (page 100)
c1981bst-ea-b40indoc.pdf (page 99)
I don't have documentation for the file CTH81A10 (c1981bst-ct-a10hhdat.gz on our FTP site). I'm going to ask the Census team to find us the document.
Monday, February 23, 2009
Do we know when the Survey of Household Spending 2007 PUMF will be coming out?
I just found out that the author division expects to release the SHS 2007 PUMF sometime in May. I will add this information to the expected release dates page on the DLI website.
Friday, February 20, 2009
All those previously requesting the "Commuting Flows" data have been
using it for years, and just wanted the files. However, I now have a
student who is hoping that I will explain its structure to her. So, I
am hoping to have your indulgence for my ignorance and explain it to me
-- so that I can explain it to her! Is there a user guide for these
In the file for Ontario, for example (97561xcb2006008_ont.ivt) :
Total - Industry - North American Industry Classification System 2002
Total - Sex Male
Canada (01) 20000 / Ontario (35) 5134485 2559960 2574525
Total - Industry - North American Industry Classification System 2002
Total - Sex Male
Arviat (6205015) HAM 00000 / Ottawa (3506008) C 0 0
I assume that the first line means that for all of Canada, 5,134,485
people commute TO or WITHIN Ontario, and that the last line means that
some number of males, between 1 and 10 in the hamlet of Arviat in
Nunavut commute to Ottawa. Is that correct?
The table title leads me to the conclusion that the 7-digit codes are
CSD codes. But I am puzzled by the 5-digit codes just before the / --
going from 20000 to 00000: what do they represent?
Also, given the large distances of some of these combinations, how is
As I indicated initially, if there is a description somewhere that I
should be reading, to get these answers, point me to it.
I found the following explanation of random rounding in the Census Dictionary:
Rounding is a mathematical operation that can increase a number, decrease a number or leave it unchanged; only certain predetermined values are permitted. For example, we could decide in advance to round figures to the nearest multiple of 10, the next highest multiple of 10, or the next lowest multiple of 10. So, if we round 10, 13 and 17 to the next lowest multiple of 10, the result would be 10 in all three cases.
The random rounding method is based on established probabilities. It involves rounding every figure in a table (including the totals) randomly up or down to the nearest multiple of 5, or, in some cases, 10. For instance, random rounding of 12 to a multiple of 5 would yield either 10 or 15; applying the same operation to 10 would produce 10. This technique provides strong protection against direct, residual or negative disclosure, without adding significant error to the census data.
So your understanding of the random rounding is correct.
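The random rounding described in the definition can be sketched in Python as follows. This is only an illustration: the Census Dictionary says the method is "based on established probabilities" without stating them, so the rule used here (round up with probability remainder/base, which keeps the expected value equal to the original count) is an assumption, and the function name is hypothetical.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a non-negative count up or down to a multiple of `base`."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple, left unchanged (10 stays 10)
    lower = count - remainder
    # Assumption: round up with probability remainder/base, so the
    # expected value equals the original count. The Census Dictionary
    # does not state the actual probabilities.
    if rng.random() < remainder / base:
        return lower + base
    return lower
```

With base 5, random_round(12) returns either 10 or 15 and random_round(10) always returns 10, matching the examples in the definition.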
In terms of documentation regarding commuting, our census consultant suggests that you read the definition of commuting distance in the Census Dictionary:
http://www12.statcan.ca/english/census06/reference/dictionary/pop019.cfm (don't forget to click on "More information..." at the bottom of the definition)
Since commuting distance is derived from Question 46, you can refer to the definitions relating to that question as well:
Finally, you can see the actual question (46) from the questionnaire itself:
We have a question about the NLSCY codebooks. This was discovered by
Kevin Selbee, our other Data Specialist.
Identifying Suppressed Variables in NLSCY Codebooks.
Codebooks for Cycle 2 and Cycle 3 of the National Longitudinal Survey
of Children and Youth (NLSCY) do not contain necessary information
about the variables in the master files that are wholly or partially
suppressed in the public use files. In the case of wholly suppressed
variables, users of the PUMFs cannot tell which variables are or are
not available in the public-use files without going to the actual data
files and scanning a full set of frequency tables. In the case of
partially suppressed variables, users have no way of determining that
the frequency counts in the public data files differ from those in the
master file codebooks because valid cases in the master files have
been re-coded to missing values in the PUMFs.
Currently, the codebook for Cycle 1 contains the required
information on wholly or partially suppressed variables but the
codebooks for Cycle 2 and Cycle 3 do not.
The author division has confirmed that they do not have a list of the suppressed variables at this time. However, they will be looking into a temporary solution for this and will let me know what they come up with. I will keep you posted.
I see from the StatsCan website that NID [cat. no. 13C0015] is available for “some geographic areas” from 1986.
We’re interested in NID Table 4 [Taxfilers and Dependents with Income, by Source of Income] at the CSD level, if possible, dating back 25 years. What I’ve downloaded so far, from the SAAD directory, is NID Table 4 for the CMAs and provinces from 1995. Is that it? Or am I missing something?
My contact in SAADD (the author division) has confirmed the following regarding the NID data:
1) data by province and Canada levels for any of SAADD's standard products, which includes NID, is free upon request.
2) If a student requires any SAADD data (standard products), a few NID tables for example, it's free by CMA only.
3) Data by CD, FSA or by postal walk always entails a fee which includes students.
4) CSD is a census term and does not apply to the geography within SAADD. However, our postal cities (also known as cityID) are a very close approximation to CSDs. This data is available for a fee which also includes students.
The above applies to all years of data available, and data prior to 1995 is available. However, that data has not been provided to the DLI and clients will have to request it from the division and get a quote. If you are interested in a quote, please let me know.
A guide to SAADD's statistical information packages is available for free online if you would like further information: http://www.statcan.gc.ca/pub/17-507-x/17-507-x2006001-eng.pdf
Friday, February 13, 2009
Does anyone know if there are boundary files for CMAs by FSA, and what the path to them would be?
I confirmed with our geography consultant that no such file exists. According to the consultant, a user would have to take the two separate layers (CMA and FSA) and spatially join or select the FSAs within the CMAs. Even then, FSAs do not respect any of our standard areas and would not match the CMA boundaries.
Thanks very much; that makes sense to me. Do you know the path to the 2001 CMA boundary file?
As requested, here are the paths to the 2001 cartographic boundary files for CMAs:
A Social Work MA student researching high school drop out rates saw this
article in the Globe and Mail, and we are trying to determine what "new
data" might have prompted it:
When I search "high school dropout" on the Statscan website, I do not
find anything more recent than 2006. The article implies that such data
are collected annually. IF they are, we would be interested to acquire
It uses Labour force survey data, and
"The high school dropout rate is defined as the proportion of young people
aged 20 to 24 who are not attending school, and who have not graduated
from high school"
2) The 2005 study mentioned above is referenced by a Canadian Council
on Learning (CCL) study that was released on February 4, 2009. The CCL
study draws on Statistics Canada and other data, particularly custom
extractions from the 2001 Census and 2004 SLID. This CCL study is also
mentioned in other, longer news articles similar to the short Globe and Mail
one that you found. Here is a link to the CCL study:
So I suspect that the "new data" is actually the data synthesized in the CCL
report. I find that newspaper stories are often vague about the exact
source of data and tend to attribute everything to Statistics Canada. The
Globe and Mail article may be an example of this.
However, I am going to talk to one of the authors of the study
mentioned because he was consulted on the CCL study. Hopefully he can
provide some insight, or at least confirm that we haven't missed anything.
I have a researcher who is looking for new faculty appointments in Canada - Full-Time Canadian University Teachers Appointed in current year for 2000-2001 to most recent by Field of Study and Gender. The UCASS files tell us how many faculty there are - but not how many were newly appointed that year. Is this available or is this a custom tab?
Our Education consultant confirmed that this would be a custom request. She estimates that the cost would be about $350, but can provide you with a final cost once she knows the complete list of variables needed.
Please let me know if you are interested in pursuing this custom tabulation.
I am helping a graduate student who is looking for detailed employment and earnings data by 4-digit NAICS for 1988 to 1990. More specifically, she is looking for a detailed NAICS breakdown for the following codes: 3111-3399 (86 manufacturing industries).
This data is available from 1991 onward through the Survey of Employment, Payrolls and Hours (SEPH). Unfortunately, I cannot find the equivalent for the 1988 to 1990 period. As far as I can tell from the variables, the Labour Force Survey only provides 43 individual NAICS codes, so it is not nearly as detailed as the SEPH.
Is the data available anywhere?
1) Have you checked CANSIM Table 301-0001? It’s SIC though, so your student would need to use a concordance table…
2) As a follow-up to this question, my contact in the author division explained that NAICS data is not available prior to 1991:
"Data from the Survey of employment payrolls and hours (SEPH) from 1983 to 2000 was based on the Sic-80 industrial classification system. Data at the 4 digit level was not available at the time. Note that the program has gone many changes through time. Data was converted from Sic 80 To Naics 1997, then from Naics 1997 to Naics 2002 and again from Naics 2002 to Naics 2007. In the late 90's the program was in a redesign phase making the task of revising back difficult. When we converted to Naics 1997 we revised data only from 1991."
She also forwarded me 2 concordance tables that you can use. I will send these to you off-list.
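For anyone working with more than one concordance, the chain of conversions described in the quote (SIC 80 to NAICS 1997, then NAICS 1997 to NAICS 2002) can be sketched in Python as a composition of lookup tables. The codes below are made-up placeholders, not real SIC or NAICS codes, and the dict-of-sets layout is an assumption; the actual concordance tables may be laid out differently and may not map codes one-to-one.

```python
def compose(first, second):
    """Chain two concordances, each mapping a source code to a set of target codes."""
    chained = {}
    for source, middles in first.items():
        targets = set()
        for middle in middles:
            # A code with no match in the second table contributes nothing.
            targets |= second.get(middle, set())
        chained[source] = targets
    return chained


# Hypothetical placeholder codes, for illustration only.
sic80_to_naics97 = {"S1": {"N97a", "N97b"}, "S2": {"N97c"}}
naics97_to_naics02 = {"N97a": {"N02a"}, "N97b": {"N02a"}, "N97c": {"N02b", "N02c"}}

sic80_to_naics02 = compose(sic80_to_naics97, naics97_to_naics02)
```

A source code that ends up with more than one target (like the placeholder "S2" here) marks a split, where the data cannot be carried over without some allocation rule.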
The student is currently trying to use data from CANSIM table 301-0001 as suggested by Gail. I wonder however what happened to the SEPH data from 1983 to 2000. Would it be available somewhere? There is no mention of it on the SEPH description page or on the “Other Reference Periods” page. I understand that the data was available under SIC 80 at the time but at what level? 3 digit? The SIC 80 classification for manufacturing industries is relatively detailed even at the 3 digit level so it might be useful for my student.
SEPH data prior to 2001 should be available in the products and services mentioned at the bottom of this page :
http://www.statcan.gc.ca/pub/75f0010x/4060181-eng.htm (Labour Market and Income Data Guide, March 2002)
In addition, if you do a CANSIM advanced search for "SEPH", and limit the results to only the terminated tables, you will get results for SEPH data from 1983 to 2000. Many of those tables offer a SIC 80 breakdown of 283 items, which seems to be 3-digit.
My contact in the division did caution that this data has not been revised and may not be comparable to the revised data (which covers series for 1991 to present). You should probably mention this to your student. The following paper discusses the impact of the SEPH redesign, including the NAICS conversion and the impact it has on historical data.
Monday, February 9, 2009
The Daily announced that there was supplementary data available on December 10 for this product. I looked on the DLI FTP site and didn't see it. Do you know when we will be getting the supplementary data, if at all? I have a staff member looking for this.
The Dec. 10 release is for preliminary data from the University and College Academic Staff System (UCASS) survey. For this survey, the author division produces preliminary data (for institutions that completed the survey before December) and final data (which contains data for all institutions that have reported by March). The Division only provides the DLI with updated data once they have the final data in March. They do not provide us with the preliminary data.
If you wanted the preliminary data in particular, you would need to order it from the division. Please be aware that custom UCASS data starts at $250 and the price varies depending on what is needed.
Please let me know if your staff member is interested in ordering the custom UCASS data.
We have received the CD-ROM for FPICB from the DSP and I have downloaded it from the DLI web site.
There is no difference in the two products. Provincial level information is only available for firms with revenue of $30,000 to $5 Million on both products.
If there is supposed to be a difference, I don't see it.
This has been quite a chase but after many verifications, this is what we were able to confirm with the Subject Matter Division as well as with the DSP:
The DSP receives the Financial Performance Indicators at the national level only.
The DLI receives them at the national, provincial and territorial levels.
For more recent years, we have CBP data at the CA/CMA, but a researcher wants to go back further where the counts are available only at the provincial level (at least according to what I downloaded from the DLI site).
Might there be the possibility of getting counts at CA/CMA for the years 1994-1996? Total establishment counts are fine – she doesn’t need an industry breakdown.
I just confirmed with the Subject-Matter Division that, for these earlier years, data at the CA/CMA level are available only through custom tabulations. If you decide to go in this direction, let me know and I will provide you with the contact person for that service.
I’ve been asked for 1981 census data for Wallaceburg, Ontario. I’m trying to determine its geographical level back in 1981, but my print editions only go back to 1986, and I can’t find anything online. In 1986 it was a CA. Could someone be so kind as to look up Wallaceburg for me in the 1981 census to see if it was still a CA?
The information I found in 1981 Census Catalogue Number 93-906 indicates that Wallaceburg was not a CA in 1981. The pertinent line of data states:
Wallaceburg, [T], Kent County [CD], Chatham [Census Consolidated Subdivision].
I checked in PCensus as well, and again, Wallaceburg did not show up as a CA. It showed up as a Census Subdivision. This jibes with the designation [T] in the books (i.e. Town).
Could we get a 2006 census profile for health regions, please? Since some health regions are differentiated at the DB level, it is not possible to replicate such a profile ourselves.
It is hardly reasonable to expect researchers to use the community profile interface and download the data one region at a time. Since the data are in the community profile interface, it is clear that such a profile exists, although it would appear to contain only 375 characteristics (at last count) rather than the full 2000+ in the full cumulative profile.
We checked with Health Statistics Division and they confirmed the following:
"Yes there is a plan to produce the 2006 Census Basic profile by HR and disseminate these data through the next issue of the Health Regions product, in the Census data section. The date for that product update is not set yet, but we are hoping that it will be relatively soon. There are additional data from the 2006 census already available in the CANSIM Health Indicators profile http://www.statcan.gc.ca/pub/82-221-x/2008001/5202308-eng.htm (census data). Note that this table is being revised, to including additional income-related indicators very soon too."
The Division also thanked you for the feedback, as it reveals the value of disseminating the full profile.
I was just getting the NLSCY files for a graduate student here, and loading them into SPSS. As taught in the training sessions, I run a frequency with each file to see that it loaded correctly. In the course of doing this I encountered a couple of discrepancies.
Cycle 2, 10-13 file: I run a frequency and get a total of 4145 records. This matches the DLI web page, but the codebook indicates that there are 4498 respondents.
Cycle 3, 10-15 file: I run a frequency and get a total of 5539 records. Once again, this matches the DLI web page, but the codebook indicates that there are 6380 respondents.
Given that it is Friday afternoon after a long and busy week, it is quite possible that I am missing something obvious, and if I had to guess, it would be something about the weights. However, for the grad student's peace of mind, I thought that it would be prudent to get the explanation from a more reliable source.
The author division confirmed that the number of records you are getting is correct. The larger numbers indicated in the codebooks apply to the master files, not the PUMF. Here is the verbatim response from the division:
"there is a discrepancy between the number of cases in the codebook and the pumf c2 self-complete file (10 to 13). There are no cycle 2 or cycle 3 pumf codebooks, the only codebooks available are ones based on the master file counts and frequencies. I have 4,145 records on the pumf c2 self-complete versus 4,498 records in the self-complete codebook. The number of cycle 3 records on the self-complete pumf is 5,539."
I hope this is clear and thanks for bringing it to our attention.
I helped a grad student in Law find data she needed for a paper she was
writing on divorce and custody cases. She submitted the paper, and a
reviewer has written the following statement in his/her comments:
"You should be looking at the percentage of marriages which break-up
after 1 year, 2 years, 3 years, etc. by year of marriage. I am certain
that Statistics Canada could supply this data."
We are not quite so certain that Statistics Canada could supply this
data --- !
She would like to be able to comment on the reviewer's comment, so I
said I would go to our source of all StatsCan knowledge -- our DLI reps,
and see what authoritative response you might be able to provide. She
will be very grateful!
1) Some of the main statistical data items included in the Divorce Database are:
- Court where divorce was registered
- Date the divorce application was filed
- Applicant for divorce
- Date of birth of husband and wife
- Husband and wife's previous marital status
- Date of marriage
- Reason for marriage breakdown
- Effective date of divorce
- Number of dependents
- Custody of dependents
- Date of birth of dependents
Divorces, by duration of marriage, Canada, provinces and territories, annual (number)
In addition, the publication "Divorces in Canada" provides a nice summary listing of other divorce-related data available through Statistics Canada.
I hope this helps.
I’d like to give an update or expected delivery date for this data that was requested back in October by a professor here. Is there any way we can find out when it is expected?
I just received word from our Education consultant that the updated UCASS data is expected to be available towards the end of March 2009.
I have a student wondering when the Survey of Labour and Income
Dynamics (SLID) for 2006 will be released. I checked the Products
Release page and we were supposed to get it in Dec 2008, but it is not
on the DLI FTP site.
The 2006 PUMF was released on January 13th, 2009, and I had asked the
division to provide us with a copy. I still haven't received it, so I will
follow up with them today and get back to you.
The division just confirmed that they will be sending me the 2006 SLID PUMF
this week. I will announce it on the list once it is available on our web
and FTP sites.
I’ve been trying to find price information for corn and ethanol and I seem to be getting close but coming up flat…would appreciate any advice from those who work with these types of stats more than I do! Ideally, I’m looking for prices (versus indices, but I’ll take those too if that’s all that’s available) on a monthly basis for the past 10 years or so.
I am still looking into the ethanol prices, but for corn you may want to look at CANSIM Table 002-0043:
"Farm product prices, crops and livestock, monthly (dollars per metric tonne unless otherwise noted)"
Let me know if this table does not provide you with the data you need for corn.
Also, the publication "Cereals and Oilseeds Review" (22-007-XIB) provides corn prices per ton on a monthly basis, but it doesn't go back 10 years:
It should be noted that on many of the other CANSIM tables I found (such as Table 001-0010), there was a footnote stating that "Average farm price and total farm value were discontinued in 1984". CANSIM Table 002-0043 was the only relevant table I found that did not contain this footnote.
I will continue to look into the ethanol price question and get back to you.
I was unable to find any data on ethanol prices, so I checked with the appropriate divisions to see if any was available through custom requests. The divisions confirmed that Statistics Canada does not produce any data on ethanol prices.
Monday, February 2, 2009
I have a couple of questions, which I’m hoping someone with experience in geographical hierarchies might be able to answer -
I’ve been asked to find data on Wallaceburg, Ontario. According to GeoSuite Wallaceburg is a Locality and also an Urban Area. It’s in the CA of Chatham-Kent, which is untracted, so I’m presuming the only way to find data is through the DAs.
I have the reference map for Wallaceburg, and the DAs are marked, which solves part of the problem. However, I’ve ended up with a couple of questions I can’t answer.
First, although I can find the DAs which make up Wallaceburg by copying down the codes from the reference map, I thought there must be a more elegant way of producing a list. I can get a DA list for Chatham-Kent CA from GeoSuite, but not for Wallaceburg, presumably because DAs don’t roll up to localities or urban areas (although if I bring up a map of Wallaceburg on GeoSearch, I can ‘identify’ DAs within it, one at a time). Is there a way of using any of our software or tables to bring up a list of DAs for just Wallaceburg?
The other thing I’m a bit puzzled about is when I bring up the entry for Wallaceburg locality on GeoSuite, under Geographic levels on the printout I have: DA /AD: 35360278, and CT / SR: 5569935.00. Why is it giving a CT code when the Chatham-Kent CA (556) isn’t tracted? And why just one DA code, when there are a number on the Wallaceburg map? Does GeoSuite just give one central DA to represent the whole locality?
Any help would be greatly appreciated! - I’m meeting the prof 9:00 am tomorrow morning.
1) The 9935 suffix stands for “Untracted, Ontario”, if I recall properly. This fits nicely with the “556” prefix, which indicates “Chatham-Kent CA”.
If you are using ArcGIS, you could overlay a boundary map of Wallaceburg onto a map of dissemination areas, and cookie cut the Wallaceburg DAs out from the rest. You could download the Urban Area boundary map from the 2006 Census (http://janus.ssc.uwo.ca/html/2006_census_geographic_files.html#eng), and overlay it on the DA map of Ontario found on the same page. The advantage of this would be that if the prof wanted to do any mapping, the needed layer would have already been created (and could be used to match to census data).
2) I forwarded your question to a consultant in Geography Division and received the following answer:
"5569935.00 is a default CT code for non CT areas (556 = CMAuid, 99 = not applicable, 35 = PRuid)."
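The breakdown in the consultant's answer can be sketched in Python as below. The function and key names are hypothetical, and the rule that "99" in the fourth and fifth positions always marks a default (non-CT) code is an assumption drawn only from the example quoted above.

```python
def split_default_ct_code(ct_code):
    """Split a CT code such as '5569935.00' into its labelled parts."""
    name, _, suffix = ct_code.partition(".")
    if len(name) != 7 or not name.isdigit():
        raise ValueError(f"expected 7 digits before the decimal point: {ct_code!r}")
    return {
        "cma_uid": name[0:3],   # 556 in the example
        "filler": name[3:5],    # 99 = not applicable
        "pr_uid": name[5:7],    # 35 in the example
        "suffix": suffix,       # the part after the decimal point
        # Assumption: a '99' filler identifies the default code for non-CT areas.
        "is_default": name[3:5] == "99",
    }
```

Calling split_default_ct_code("5569935.00") gives cma_uid "556", filler "99" and pr_uid "35", so GeoSuite's CT entry for Wallaceburg reads as a placeholder rather than a real tract.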