Wednesday, February 28, 2007

Updated Products - CTD / Education

Canadian Trade Data - 2006 (NEW!), 2004 & 2005 Revised

Canadian Trade Data is a product from the International Trade Division of Statistics Canada. It contains Canada's trade activity with the rest of the world.

FTP: ftp/dli/trade/can_trade

Education Tables

The DLI has been provided with a few standard tables from the Culture, Tourism and the Centre for Education Statistics Division. These tables are usually sold for a fee but are not listed in the on-line catalogue. They are deemed to be semi-custom tables.

The data stem from a variety of surveys including:

- Financial Information of Universities and Colleges Survey (FIUC)
- Elementary-Secondary Education Statistics Project (ESESP)
- Tuition and Living Accommodation Costs for Full-time Students at Canadian Degree-granting Institutions
- University and College Academic Staff System (UCASS)

FTP: /usr2/ftp/dli/education

Tuesday, February 27, 2007

SLID 2003 Documentation

Users of the latest version of SLID 2003 should be aware that there is conflicting information in the documentation.

For example, page 2 of the record layout (slid2003cbk.pdf) states that variable rcvcmp28 is in column 110 of the person file, while page 29 of the same document states that it is in column 111.

Reading it from column 110 at least gives the appropriate frequencies. But every column location for the person file beyond column 110 appears to be correct in one place and incorrect in the other.
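One way to check which documented position is right is to tabulate the variable at both candidate columns and compare the results with the published frequencies. The following is only a minimal Python sketch: it assumes a fixed-width text file, 1-based column numbers as in the record layout, and a one-character field. The field width, the helper name and the sample records are hypothetical and are not taken from the codebook.

```python
from collections import Counter

def column_frequencies(lines, start_col, width=1):
    """Tabulate the values found at a 1-based column position in fixed-width records."""
    begin = start_col - 1  # convert the 1-based codebook column to a 0-based index
    return Counter(line[begin:begin + width] for line in lines)

# Hypothetical records: 109 filler characters, then a code in column 110
# and a different field in column 111.
records = ["x" * 109 + code + other
           for code, other in [("1", "7"), ("2", "7"), ("1", "8")]]

freq_110 = column_frequencies(records, 110)
freq_111 = column_frequencies(records, 111)
print(freq_110)
print(freq_111)
```

Whichever column reproduces the frequencies in the documentation is the one to trust for that variable.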

Degrees by Field of Study


I have a faculty member who wants the number of degrees granted by field of study, by province, by gender, for as recent a year as possible. Failing that, he can use enrollment numbers by field of study, by province, by gender, for as recent a year as possible.

CANSIM 477-0011 to 477-0014 do not have the field of study. 81-229xib is only up to 2000 and does not have a detailed enough breakdown of field of study.

On the CAUT website, in their Almanac, for 2003-2004 they have degrees by field of study by gender but only for the whole of Canada (Table 3.11).

CAUT gets its stats from the Centre for Education Statistics. Should I e-mail them or can DLI get them? Alternatively how much would a custom tab be?


1) In CANSIM, the information is provided by field of study; however, the data are based on the Classification of Instructional Programs (CIP), and only the primary categories are available.

This is available in Table 477-0014 University degrees, diplomas and certificates granted, by program level, Classification of Instructional Programs, Primary Grouping (CIP_PG) and sex, annual (number)

2) If further detail is required on the CIP, it is available for 2, 4 and 6 digit level from 1992 to 2004 on a cost recovery basis only. The following link provides information on the classification system used.

To provide a custom table by CIP, by province, by gender for 1 year of data would be $165 with each additional year costing $15.
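As a rough illustration of the pricing quoted above, the cost for a given number of years can be worked out as follows. This is only a sketch of the arithmetic; the function name and parameter names are hypothetical, and the actual quote should always be confirmed with the division.

```python
def custom_table_cost(years, first_year=165, additional_year=15):
    """Quoted price in dollars for a custom CIP table covering `years` years of data."""
    if years < 1:
        raise ValueError("at least one year of data is required")
    # The first year costs $165; each additional year adds $15.
    return first_year + additional_year * (years - 1)

print(custom_table_cost(1))  # 165
print(custom_table_cost(5))  # 225
```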

Caution must be used however if a historical comparison is being done as it cannot be done at the 4 or 6 digit level. As about 40% of the institutions still report using the old system, data cannot be used at this detailed level for historical analysis. If historical analysis is done, it must not be done lower than 2 digit level.

2 digit CIP is still more detailed than CANSIM. It contains the following groups:

Field of study (2 digit) Code
Agriculture, Agriculture Operations and Related Sciences 01
Natural Resources and Conservation 03
Architecture and Related Services 04
Area, Ethnic, Cultural and Gender Studies 05
Communication, Journalism and Related Programs 09
Communications Technologies/Technicians and Support Services 10
Computer and Information Sciences and Support Services 11
Education 13
Engineering 14
Engineering Technologies/Technicians 15
Aboriginal and Foreign Languages, Literatures and Linguistics 16
Family and Consumer Sciences/Human Sciences 19
Legal Professions and Studies 22
English Language and Literature/Letters 23
Liberal Arts and Sciences, General Studies and Humanities 24
Library Science 25
Biological and Biomedical Sciences 26
Mathematics and Statistics 27
Military Technologies 29
Multidisciplinary/Interdisciplinary Studies 30
Parks, Recreation, Leisure and Fitness Studies 31
Basic Skills 32
Leisure and Recreational Activities 36
Personal Awareness and Self-improvement 37
Philosophy and Religious Studies 38
Theology and Religious Vocations 39
Physical Sciences 40
Psychology 42
Security and Protective Services 43
Public Administration and Social Service Professions 44
Social Sciences 45
Mechanic and Repair Technologies/Technicians 47
Precision Production 48
Transportation and Materials Moving 49
Visual and Performing Arts 50
Health Professions and Related Clinical Sciences 51
Business, Management, Marketing and Related Support Services 52
History 54
French Language and Literature/Letters 55
Dental, Medical and Veterinary Residency Programs 60
Other instructional program 89

If a custom table is required, turn-around time for delivery is 5-10 working days and pre-payment is required.

Monday, February 26, 2007

Weeding DLI CD-Roms


A librarian here is undertaking a weeding of our CD Rom collection. She asked me about the DLI collection, and now I am asking you. If a product is available in its entirety on the FTP (and preferably a user-access system like IDLS), what are the arguments, if any, for keeping the CDs?


1) If indeed the cd-roms are available in zipped format on the FTP, you could reproduce them at will. But this would require that you be around to access the site - persons (assistants) without the password could not access it on demand, whereas a cd-rom can be pulled off a shelf.

2) A physical copy would give you access at any time (if for some reason the FTP site was down - not accounting for the Mirror site at U of A)

3) If you already have the physical copy, help save the planet by re-using instead of re-burning

4) The DLI FTP site pretty much provides the latest edition or version, but not prior ones. Even products that are available in their entirety on the STC web site have important differences between editions. The Health indicators database is one of the few that comes to mind as having several editions available. And then there are also products such as ICO which are not at all self-replacing.

Monday, February 19, 2007

Seed Statistics


The following is a question from a PhD student:

Would you please advise me whether statistics on the amount of seed that is saved ("saved seeds" being those collected by farmers from the previous years' crops for use in future years) by Canadian farmers are available and, if so, how I might obtain them? Preferably, there would be statistics collected in various years so a comparison could be made; however, I would also be interested in the latest collected data on this topic.

As a worst case alternative, if there is data on the amount of certified seed used, the acreage seeded, and the recommended rates of seeding I suppose I could calculate the amount of non-certified seed used in various years. However, since this would not necessarily indicate the amount of seed saved, obviously real statistics on saved seed would be most helpful.


The Census of Agriculture does not collect the information, and I asked the Agriculture Division for their input. It follows:

We calculate the total amount of seed used by multiplying the seeded area by average seeding rates. The seed data are published in supply-disposition tables for the major grains and special crops. These tables are available in catalogue 22-007 Cereals and oilseeds review for free on the Statistics Canada website under Publications and are also on CANSIM, tables 41 and 42 for a small fee (*but available free of charge through E-Stat or CHASS if you subscribe*).

The calculated seed data are likely also available from the Crop Reporting Unit of the Agriculture Division as a special tabulation for a fee.
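The calculation described by the Agriculture Division, and the student's fallback estimate of non-certified seed, can be sketched as below. This is a hedged illustration only: the function names, units and figures are hypothetical, and, as the student notes, non-certified seed is not necessarily the same thing as saved seed.

```python
def total_seed_used(seeded_area_ha, seeding_rate_kg_per_ha):
    """Total seed used = seeded area multiplied by the average seeding rate."""
    return seeded_area_ha * seeding_rate_kg_per_ha

def non_certified_seed(seeded_area_ha, seeding_rate_kg_per_ha, certified_seed_kg):
    """Estimated non-certified seed = total seed used minus certified seed used."""
    return total_seed_used(seeded_area_ha, seeding_rate_kg_per_ha) - certified_seed_kg

# Hypothetical figures for one crop in one year.
total = total_seed_used(1000, 100)                    # 1000 ha at 100 kg/ha
non_certified = non_certified_seed(1000, 100, 60000)  # 60000 kg of certified seed
print(total, non_certified)
```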

Friday, February 16, 2007

Monthly Variable for GSS 19


I have a student trying to look at seasonal variations in how people use their time.

The General Social Survey 19 on time use was done between January and November and could be used to look at seasonal variations, if there was a variable to identify the month in which each survey was completed. We found a day of the week variable, but no month or other date variable.

Has anyone else faced this problem?

If they had access to the whole GSS file through the Research Data Centre, would they be able to access the month variable?


We don't have that variable in the PUMF anymore for confidentiality reasons. The month variable is only in the analytical file. If he can get access to the RDC, he can access these files. He can also ask for a custom tabulation, but there would be a cost.

The GSS 12, 1998, also on time use, does include a month of interview variable, but it is almost 10 years old.

Thursday, February 15, 2007

2006 Boundary Files

The 2006 Boundary Files, available in electronic format, contain the boundaries for four geographic areas - dissemination blocks, dissemination areas, census tracts and federal electoral districts. They provide a framework for mapping and spatial analysis.

The files are available in two formats: ArcInfo and MapInfo.


Clients can use the Correspondence Files to determine how two specific geographic areas (dissemination areas and dissemination blocks) correspond to each other for the 2001 and 2006 Censuses.


Data Acquisition and Use Agreement


a) I am wondering if there is still a necessity to have this document signed by users of DLI material given we do have a secure site? I see that some locations have mounted the agreement as a condition of use.

b) If we are required to have faculty sign the agreement, does the one signature suffice for all DLI material requested for that individual, or is it to be signed each time a request for data is made?


a) Users are not actually required to sign the DLI licence to use DLI data. From what I understand, this is the scoop:

1) Some DLI member-institutions request that users sign the Model Use Licence to ensure that the users are familiar with the terms of use. The fact that they are signing a licence may make the users feel more responsible for following the conditions outlined in the licence. Should anything happen (such as inadmissible use of the data), the data centre is protected against the "I didn't know" excuse!

2) Some DLI member institutions also use the filled licences as a gauge of activity. By requesting a signed licence from all users, they have a quick count of activity in their data centres.

3) Some institutions provide data through protected web access and the terms of use of the data are clearly available for users. I think some even have the terms of use pop up before users download the data and they have to agree to proceed. Others, on the other hand, have a brief blurb about the terms of use and a link to the full terms. Some simply have a link with the terms of use.

As you can see, there is a wide variety of methods used by DLI contacts to monitor the use of the data. We find that the smaller institutions can manage having every user sign a licence, whereas the larger institutions provide information about the terms of use for the data, but do not request a formal signature.

b) I think this will have to be decided at the institutional level. The DLI does not require signed licences, so we cannot stipulate whether you need one for each data product or one per request.

I believe this is something that should be discussed with your people. The thing that matters to the DLI is that the data are used in accordance with our licence. How you see that this is enforced is at your discretion.

Tuesday, February 13, 2007

Updated Products - TAMS 2006

Travel Activities and Motivation Survey - 2006

The Travel Activities and Motivation Survey (TAMS) was conducted by Statistics Canada on behalf of the Canadian Tourism Commission and four provincial and territorial agencies responsible for tourism. The types of information collected are: areas of Canada travelled to in the previous two years and travel intentions for the next two years; reasons non-travellers do not travel; participation in recreational and entertainment activities; reasons for travelling in Canada and to Canadian provinces and territories; types of accommodation used while travelling; sources of travel planning information; and impressions of parts of Canada as travel destinations.

FTP: /ftp/dli/tams/2006

Additional Missing Variables in CCHS 3.1


I found a couple of other variables that were not in the CCHS 3.1 PUMF even though the documentation listed them as being there. They are NDEEDFTT and LBFEGJST. Do you know if the Health division will be putting out updated documentation? Or at least an Errata?


For NDEEDFTT (and all other NDE... variables), same problem as SCHEDSTG, only selected in one HR (this time Nunavut), and therefore suppressed from the PUMF.

For the LBF... variables (complete "labour force" module), the story is slightly more complicated. This module was offered as optional (but not selected), and it was also asked of the 1st sub-sample of respondents. The LBF... variables are included in the separate PUMF sub-sample 1 data file hss1.txt. The twist here is that our naming convention for sub-sample variables replaces the 4th character of the variable name (which normally refers to the cycle) by a "z". Thus, in this case, NDE"e"DFTT is now NDE"z"DFTT. In the hss1.txt file (and corresponding data dictionary reports, layouts, etc.), LBF"e"... variables are in actuality LBF"z"... variables. Users should consult Appendix A (Guidelines for the use of sub-sample variables) in the User Guide.
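The naming convention described above (replace the 4th character, the cycle letter, with "z") can be sketched as a small helper for translating documented names into the names used in hss1.txt. This is an illustration only: the function name is hypothetical, and writing the marker in upper case to match the rest of the variable name is an assumption.

```python
def subsample_variable_name(name, marker="Z"):
    """Replace the 4th character (the cycle letter) with the sub-sample marker."""
    if len(name) < 4:
        raise ValueError("variable name is too short to carry a cycle letter")
    # Characters 1-3 identify the module; character 4 identifies the cycle.
    return name[:3] + marker + name[4:]

renamed = subsample_variable_name("LBFEGJST")
print(renamed)  # LBFZGJST
```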

Trade by Mode of Transport


A grad student here is attempting to look at international trade. He is interested in looking specifically at trade by mode of transportation - what is the value of material arriving by rail / air / sea / road? He'd like at least twenty data points (twenty years, by preference). He doesn't care about examining particular commodities. CANSIM table 404-0021 would be useful if it covered more countries, more modes of transportation, and a longer time period.

I don't think that the trade statistics that we currently have allow this type of analysis - are there other Statistics Canada data which would capture this information? Should we be checking with some other government department for these data?


International Trade Division can produce these data, but only for a fee (several hundred dollars).

The Transportation Division does not have this type of information at the level requested. The student can have a look at the following publications for some information. I have not perused the publications and doubt they will meet the user's needs, but I mention them as an FYI.

Trucking in Canada 53-222
Shipping in Canada 54-205
Rail in Canada 52-216

Looks like the CANSIM table you found may be the closest thing available free of charge.

Monday, February 12, 2007

Updated Products - Education

Education Tables

Our friends in the Culture, Tourism and the Centre for Education Statistics Division provided the DLI with a few standard tables that they usually sell at a fee but are not listed in the on-line catalogue. These are deemed to be semi-custom tables.

The data stem from a variety of surveys including:

- Financial Information of Universities and Colleges Survey (FIUC)
- Elementary-Secondary Education Statistics Project (ESESP)
- Tuition and Living Accommodation Costs for Full-time Students at Canadian Degree-granting Institutions
- University and College Academic Staff System (UCASS)

Well worth a quick look!

FTP: /usr2/ftp/dli/education

Friday, February 9, 2007

Unable to locate derived variable in CCHS 3.1


Smoking Stage of Change (Current and Former Smokers)

Variable name: SCHEDSTG

Modules used: Smoking (SMK), Smoking -- Stages of Change (SCH)

Based on: SMKE_202, SMKE_06A, SMKE_06B, SMKE_09A, SMKE_09B, SMKE_10, SMKE_10A, SMKE_10B,

Product: Master Data File and Public Use Microdata File (PUMF)

Description: This variable classifies current and former smokers into categories based on the stages of change

Reference: DiClemente, C.C., Prochaska, J.O., Fairhurst, S., Velicer, W.F., Rossi J.S., & Velasquez, M. (1991).
The process of smoking cessation: An analysis of precontemplation, contemplation and contemplation/action.
Journal of Consulting and Clinical Psychology, 59, 295-304.

As noted in this documentation it is supposed to be in the PUMF, but I can't locate it in any of the cycles for the 3.1 - was it not included in the PUMF as released?


Here's what Health told us about the missing variable.

The "Smoking - Stages of Change" module is an optional module, selected in the 3.1 by a single health region: the Yukon Territory. In the PUMF, because the three territories are collapsed to a single geography, the module (hence all SCH module variables including SCHEDSTG) is suppressed. The 3.1 document on DVs is incorrect in stating the data reside in both the master (yes) and PUMF (no). The data dictionaries provide a more accurate picture. Sorry about the confusion.

Back one cycle in time, i.e., CCHS 2.1 (2003), the module had been selected more widely, and the data appear in the corresponding PUMF. A quick way to verify if/where optional content appears is to run a cross tab on HR (health region) geography against the module's selection flag. In the case of this particular module, in the 2.1: SCHCFOPT against GEOCDPMF. This takes 30 seconds using the PUMF's built-in B20/20 browser.
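For those working outside the B20/20 browser, the same check (a cross tab of health region geography against the module's selection flag) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the variable names GEOCDPMF and SCHCFOPT come from the reply above, but the record structure, region labels, flag codes and helper name are hypothetical.

```python
from collections import Counter

def cross_tab(records, row_var, col_var):
    """Count records for each (row value, column value) pair."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)

# Hypothetical respondent records; real codes are in the data dictionary.
records = [
    {"GEOCDPMF": "HR-A", "SCHCFOPT": 1},
    {"GEOCDPMF": "HR-A", "SCHCFOPT": 1},
    {"GEOCDPMF": "HR-B", "SCHCFOPT": 2},
]

table = cross_tab(records, "GEOCDPMF", "SCHCFOPT")
for (region, flag), count in sorted(table.items()):
    print(region, flag, count)
```

Regions where the "selected" value of the flag never appears did not choose the optional module.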

Wednesday, February 7, 2007

Turnover rates by industry


A grad student here is looking for 2005 and 2006 annual turnover rates for two industries (I'm still getting clarification about what industries she's after). She's defining turnover rate (or attrition) as "the rate employees separate/leave an organization voluntarily (or involuntarily). Usually, it's calculated by number of terminations divided by total number of employees in an organization."

I realize I can probably concoct the data by manipulating the LFS, but was hoping to find something pre-digested, as it were.


This research might help the student find sources:

The division has also suggested using the Labour Force Survey.
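The definition quoted in the question (number of terminations divided by total number of employees) can be sketched as follows. The function name and the figures are hypothetical; how terminations and employees would be derived from the Labour Force Survey is not covered here.

```python
def turnover_rate(terminations, total_employees):
    """Turnover rate = number of terminations / total number of employees."""
    if total_employees <= 0:
        raise ValueError("total_employees must be positive")
    return terminations / total_employees

# Hypothetical organization: 12 separations in a year among 150 employees.
rate = turnover_rate(12, 150)
print(f"{rate:.1%}")  # 8.0%
```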

Friday, February 2, 2007

Graduate Salary Information


I had thought that this would be an easy find--my mistake. If it should have been an easy find, my apologies. I think that the information is likely in ESIS (Enhanced Student Information System), but this doesn't appear to be part of the DLI collection. Is there any way to get the following information short of paying for a custom tab?

I would like to know the average salary after 2 years (or 3 or 4 or 5 years) for people graduating with a BA in communication in Canada. Several years ago, a sessional we had hired in Mass Communication had extracted that information from Statistics Canada.

His information compared the salaries of people who had a BA in communication with those who had a community college degree with those who had a BA and then went to college. In all cases, the straight BA came out significantly ahead (especially when compounded over 10 or 20 years). He had the statistics after 2 years, after 5 and after 10.


ESIS would not be the best source for this information as it tends to be administrative data from the universities themselves. However, I am not sure if you looked at the National Graduate Survey (NGS) or the follow-up surveys?

If the information cannot be found in the PUMF, a custom extraction can be done, but it starts at $1000 and takes weeks to complete.

1995 is the most recent NGS data we have in the DLI collection. The production of NGS 2000 is still on hold. I heard they are actually working towards it, but it is at the bottom of a long list of priorities. We will see some basic tables and analysis at some point, then the release of custom tables (which will be provided free of charge to the DLI), and then the PUMF. No date has been provided.

The best I can offer is that there is a Master file for 2000 and the user can order custom tabulations at a fee. There is no data yet available for 2005.

National Survey of the Work and Health of Nurses


I've had an enquiry regarding the eventual possibility of a PUMF coming out of the above-named survey (first described in the Dec 11, 2006 issue of the Daily). Any possibility, or is it destined for the RDC?


There will not be a PUMF created for this survey.

The RDCs will get a copy of the master file some time in the future, but not before March 2007.

LSIC Synthetic Files


There's a researcher asking about synthetic files for the longitudinal survey of immigrants to Canada. I've found several discussions in the archives, but no definitive answer regarding their status. Can anyone tell me if synthetic files were ever produced for LSIC?


LSIC created one synthetic file for wave 1 (wave 2 and 3 will not have a synthetic file). This file is not available to the DLI and does not reside in the RDC either. The RDC never asked for the file, although if they did ask perhaps they could get a copy?

Sorry - only custom tabulations at a fee will be available for this survey for members of the DLI.

Thursday, February 1, 2007

Annual retail store data (ARSD)


We have the ARSD CD-ROM for 1999/00. The description talks about annual releases. Did this ever happen?


The CD-Rom is discontinued, but the data is available in two CANSIM tables we currently have under Retail - 80-0011 and 80-0012, and they are up to date.

The division does offer more data (such as sales per square foot, total operating revenues per square foot and number of locations per square foot), but these are available only for a fee.