Wednesday, October 26, 2005

Income deciles


A faculty member here wants data from both SLID and SHS by income decile (not quintile, as has already been provided for us in SHS 2003). I know that I can use SPSS to generate the deciles, using


and then, by using these figures to derive another variable, do crosstabs by income decile. What I'm not sure about is whether I
should calculate the deciles to create the derived variable before or after applying the weight variable. Could someone out
there give me some advice?


If you want the decile variable to indicate membership in the weighted sample, keep the weight variable on when calculating the deciles using the Frequencies command. To satisfy your curiosity, run the Frequencies command with the weight variable on and then off and compare the differences between the deciles of these two runs. You should see differences between the values of the deciles as a result of the sampling design.

When recoding income into deciles, the weight variable will not be invoked and can be either on or off.
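To make the weighted versus unweighted difference concrete, here is a minimal Python sketch. It assumes a simple cumulative-weight definition of a decile (the smallest income at which the running weight reaches 10%, 20%, and so on of the total), which is not necessarily the percentile method SPSS uses. The function names and the sample figures are hypothetical.

```python
def decile_cutpoints(incomes, weights=None):
    """Return nine cut points splitting incomes into deciles.

    Assumption: a cut point is the smallest income at which the
    cumulative weight reaches k/10 of the total weight (k = 1..9).
    With weights=None every case counts once (unweighted).
    """
    if weights is None:
        weights = [1] * len(incomes)
    pairs = sorted(zip(incomes, weights))
    total = sum(w for _, w in pairs)
    targets = [total * k / 10 for k in range(1, 10)]
    cuts = []
    cumulative = 0.0
    next_target = 0
    for income, weight in pairs:
        cumulative += weight
        # One heavily weighted case can satisfy several targets at once.
        while next_target < 9 and cumulative >= targets[next_target]:
            cuts.append(income)
            next_target += 1
    return cuts


def assign_decile(income, cuts):
    """Recode an income into decile 1..10; no weight is involved here."""
    return 1 + sum(1 for c in cuts if income > c)


# Hypothetical data: the lowest income carries five times the weight.
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
weights = [5, 1, 1, 1, 1, 1, 1, 1, 1, 1]
unweighted_cuts = decile_cutpoints(incomes)
weighted_cuts = decile_cutpoints(incomes, weights)
```

Comparing `unweighted_cuts` with `weighted_cuts` mirrors running Frequencies with the weight off and then on. `assign_decile` only compares a value to fixed cut points, which is why the weight setting does not matter at the recode step.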

Tuesday, October 25, 2005

NPHS supplement


I have a grad student looking for the Nutrition Supplement Survey for the following years:

In the StatCan online catalogue it lists supplements of NPHS as being available to DLI members.

Is this still the case? If so, can we obtain these files for our student?

I do see the 1994-1995 supplemental files on the ftp site but those are the only ones available.

Is this a matter of custom tab or RDC as options for receiving the data?


First, there was never any supplement on nutrition in either of our surveys. We did have something related to nutrition: the Food Insecurity Supplement of the NPHS, which was done as a one-time special survey in 1996-1997. This Supplement was sponsored by Human Resources Development Canada to help them in developing policies. Unfortunately, the file was never released as a public use microdata file; it is available on request in Research Data Centers.

Second, we recently published the CCHS - Nutrition - General Health component, which is focused on nutrition. The public file will be available at the end of November, and is also available on request in Research Data Centers.

Weather Data


Does anyone know if weather data is available in either ASCII or GIS format? A student wants hourly data for 200+ weather stations in Alberta for 1990-2002.


You'll find what you need at the national climate archive:

Monday, October 24, 2005

Posting survey subsets onto WebCT


A lab instructor for a large Research Methods class has inquired about possibly using DLI data for their labs. It is a large class, with 6 sections, and a total of about 180 students. Given the large size of the class, the instructor would like to try to use small subsets of the GSS, 10-15 variables, to correspond to specific research methods. The instructor would post these subsets onto WebCT for the students to use during the labs. The instructor could put these files up before the classes and take them down at the end of the week, or right after the classes if required. The students would complete small assignments during class time and submit them at the end of the class period. Only students registered in the class would have access to the files. Would this be allowed under the DLI license? The "use" would of course be educational. However, would this be viewed as "redistribution" of the data?


This is a wonderful example of the teaching value of DLI data. Yes, this is definitely an acceptable use of the data licensed under DLI. In addition, WebCT adds another level of authentication to gaining access since students are required by WebCT to use an ID and password to enter this instructional webspace. The experiences of this lab use would make a great DLI Update article.

Thursday, October 13, 2005

NLSCY Cycle 5 - Download Problems


I have downloaded from the ftp site the files for NLSCY Cycle 5. The Coefficient of Variation Interface works fine from my pc, but when I loaded onto our website (using the same file structure and file name), I get the following errors:
Children 0-5 years: Run time error '1004' Method 'Sheets' of object '_Global' failed
Children and Youth 8-19 years: Run time error '9' Subscript out of Range (same error for French or English version).
Any suggestions?


I just met with an individual from Special Surveys Division to see if there was a simple answer to your question. We reviewed the files on the FTP site. Since the product installs properly on your PC, which is its intended use, there is no support for using the files as you describe. However, they suggested that perhaps not all files were transferred. Please ensure that all files listed on the FTP site are transferred to the web. They also suggest that you create two folders (0-5 and 8-19) and follow the same structure as demonstrated on the FTP site. It could be that the program is looking for a file in a certain folder but the file is found elsewhere, thus causing the trouble. Apologies for not being able to completely resolve the problem, but I hope the suggestions bring you one step closer.
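One way to check that every file made it across is to compare the two folder trees directly. The following Python sketch assumes both the downloaded copy and the web copy are reachable as local or mounted folders. The function names are hypothetical and nothing here is specific to the NLSCY product.

```python
import os


def relative_files(root):
    """Collect every file path under root, relative to root."""
    found = set()
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from_copy(source_root, copy_root):
    """List files present in the source tree but absent from the copy.

    A file placed in the wrong folder shows up here as well, because
    paths are compared relative to each root.
    """
    return sorted(relative_files(source_root) - relative_files(copy_root))
```

Calling `missing_from_copy` with the PC folder first and the web folder second returns an empty list when the copy is complete and laid out the same way.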

Low Income Lone Mothers


Is it possible to find a number or percentage of poor (low income or "LICO" status) lone mothers in Vancouver, Toronto and Halifax by Census Tract, or at least, an area smaller than just a CMA?


1. The lowest geography I found crossing incidence of low income and family type was at the CMA level. I did not see any standard tables with this information at a lower level. The Census PUMF would not be an alternative as the lowest level of geography is the CMA. Unless someone out there knows another source for this information, I would say cost-recovery is your only option for STC stuff.

2. This would be far too detailed a table with income in it. I am quite sure that CMA is the lowest level for which standard data would be available. I am not sure about a custom table, but one must remember that income is a sensitive variable.

Number of Jobs by Census Tracts

Last August, I had a special request from a professor working in the Transportation Department of Ecole Polytechnique and I would like to share results of my request with you.

The professor wanted the number of jobs by census tracts in Montreal related to place of work and not residential location. The only documents that answered that request are the following:

Place of Work Status (3), Industry - 1997 North American Industry Classification System (21) and Work Activity (4) for Employed Labour Force 15 Years and Over Having a Usual Place of Work or Working at Home, for Census Metropolitan Areas, Census Agglomerations and Census Tracts of Work, 2001 Census - 20% Sample Data

These files are available in the ftp dli collection. I used the file Dirlist.txt with the command "CTRL F" to find the documents.

N.B. The professor will contact the Census Division for Montréal to obtain the data for 1996.

Survey Names and Acronyms

Introduction of a new WEB page for the DLI Community:

Personal income per capita (and more!)


A patron is looking for any/all of the following:

Market income per capita
Disposable income per capita
Personal income per capita

Most importantly, he is looking for these variables from 1957-present (annually up to 5 year intervals) and he wants to compare provinces to the national average.

I think I can find some of these if I dig hard enough, but I was hoping someone might have a source at the tips of their
fingers. It could be a data or statistical source.


CANSIM table 380-0050 has personal income, personal disposable income, and income per capita, by province, annually from 1926-1990.

384-0012 has personal income and personal disposable income from 1980-2004, by province.

Any series that are totals can be divided by the estimated population for the same year, to produce per capita figures, no? The economists on this list will, I am sure, correct me if I am wrong here.
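As a minimal sketch of that division in Python: the figures below are made up for illustration, and the assumption that the income series is reported in millions of dollars should be checked against the table's unit of measure.

```python
def per_capita(total_millions, population):
    """Convert a total in millions of dollars to dollars per person.

    Assumption: the income series is in millions of dollars and the
    population estimate is a head count for the same year.
    """
    return total_millions * 1_000_000 / population


def percent_of_national(provincial_value, national_value):
    """Express a provincial per capita figure as a % of the national one."""
    return 100 * provincial_value / national_value


# Hypothetical figures, not taken from CANSIM.
province = per_capita(total_millions=30_000, population=1_000_000)
nation = per_capita(total_millions=800_000, population=32_000_000)
relative = percent_of_national(province, nation)
```

Dividing each provincial result by the national one gives the comparison to the national average that the patron asked for.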

202-0201 has market income, from 1980-2003, for different economic family types, by province.

Stats Can defines market income as follows: "Average market income is the sum of earnings (from employment and self-employment), investment income, (private) retirement income, and items under 'Other income'. It is equivalent to total income minus government transfers." See:

You should therefore be able to derive market income from 1961-1991 from appropriate series in CANSIM table 384-0034, and earlier numbers from the print volumes of the National income and expenditure accounts (13-531).

International Trade Classification


Students here are looking at the prospect of analyzing trade statistics for renewable energy technologies (e.g., windmills and components). Are these components explicitly identified in the SITC (and if so, where)?


1. The classification system of goods does not define whether the commodities are used for specific purposes (e.g., energy). I did visit the Canadian International Merchandise Trade page and performed a keyword search for "windmills." I got some results and the codes that accompany the commodity. You may be able to use that code to access data through the Stats Can trade databases.

I always find this to be a useful tool to find commodity codes.

2. There's an outline of the SITC (4-digit) at:

Code 7189 contains: Power generating machinery and parts thereof, n.e.s.: Engines and motors, n.e.s. (e.g., wind engines and hot air engines) and parts thereof, including parts of reaction engines (other than turbo jet parts).

Annual Estimate of the Number of Households in Canada


Where can I find an annual estimate of the number of households in Canada? (1990 - present)


1. How close would CANSIM table 051-0003 come to satisfying your patron's needs? It provides estimates of the number of census families annually from 1986 to 2004. Census families are different from households. The STC definition for household is "a person or group of persons who occupy the same dwelling and do not have a usual place of residence elsewhere in Canada or abroad."

On the other hand, the definition of census family is "a now-married couple, a common-law couple or a lone-parent with a child or youth who is under the age of 25 and who does not have his or her own spouse or child living in the household."

Table 051-0003 would miss out on all single individuals in its estimates. But maybe your patron would be willing to go with census families.

2. The first answer is a nice alternative to the number of households. The closest I was able to find was 1997-2004 from the Survey of Household Spending on CANSIM. It has the estimated number of households, but would still leave you short from 1990-1997. The surveys which SHS replaces were only performed every four years or so, which still leaves a gap for your series.

3. You can find the estimated number of households in the publication FP markets. Canadian demographics, 1998- . It was known as Canadian markets from 1985 up to 1998 and is published by the Financial Post. Hope it can help!

DLI On The Web

DLI "collection on the web" page is located at

International Students


I am looking for information on international students, by province and university: how many there are, how many graduated, and how many dropped out. Does such information exist?

I was also wondering if we have some local (Atlantic) statistics.


1. This might be helpful:

2. I have had similar requests in the past so was interested in this. I explored further and found more at and

Lots here on numbers, where they're coming from, and where they're going.

Wednesday, October 12, 2005

SLID 2001 and 2002 SPSS syntax files


I was wondering if anybody has put together the complete SPSS syntax files for the 2001 and 2002 SLID surveys, and would be willing to share? The DLI site presently offers the parts, but not the complete file.

If not, perhaps I could get some advice on the parts of the syntax available for these surveys. I have pasted together the parts of the 2002 SLID persons survey, and edited as necessary, but SPSS does not seem to like the variables list. It has several numbers that end in decimals, or have a number with a decimal in brackets after the column numbers. For example, "icswt26 23 - 32.4 (10.4) " . If I strip out the decimals and numbers in brackets, SPSS will pull in the data, and several frequencies I have run match the documentation. However, I am still a little concerned that the decimal numbers must have been there for a reason, and thought I should check before giving the student access to this and other recent SLID files.


No need to doubt your SPSS skills. The decimal point in the column range, i.e., "23 - 32.4", is incorrect SPSS syntax. I suspect that the SPSS code was generated from SAS code, where the variable "icswt26" is declared correctly for SAS as: icswt26 10.4 .
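To illustrate what a declaration like 10.4 over columns 23-32 means, here is a hedged Python sketch of reading such a field from a fixed-width record. It assumes the usual convention that the decimal count applies only when the raw field contains no explicit decimal point. The function name and sample records are hypothetical and not taken from the SLID files.

```python
def read_field(record, start, end, implied_decimals=0):
    """Read columns start..end (1-indexed, inclusive) as a number.

    Assumption: if the raw text has an explicit decimal point it is
    used as written; otherwise the last `implied_decimals` digits are
    treated as the fractional part.
    """
    text = record[start - 1:end].strip()
    if not text:
        return None  # blank field
    if "." in text:
        return float(text)
    return int(text) / 10 ** implied_decimals


# Hypothetical records: 22 filler columns, then a 10-column weight.
implied = " " * 22 + "0001234567"
explicit = " " * 22 + "  123.4567"
weight_a = read_field(implied, 23, 32, implied_decimals=4)
weight_b = read_field(explicit, 23, 32, implied_decimals=4)
```

If the data file stores the weight with an explicit decimal point, dropping the decimal specification changes nothing, which would be consistent with frequencies matching the documentation after the edit.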

Thursday, October 6, 2005

Synthetic files for SLID

Currently: The new SLID-RET (Survey of Labour and Income Dynamics - Data Retrieval System) files were just loaded on the FTP at the following location: /ftp/dli/slid/SLIDRET-Ver-2_3. The website will be updated shortly.

2005 Road Network File (92-500-XWE).

The 2006 Census Dissemination Project is pleased to announce the official release of the 2005 Road Network File (92-500-XWE).

The 2005 Road Network File (RNF) is the first official release from the 2006 Census Geography suite of products and services. The RNF is a digital representation of Canada's national road network, containing information such as street names, type, direction and address ranges.

The unrestricted release of the RNF allows Canadians to preview the national road network which is the source for the creation of geographic units being used for 2006 Census of population. Other applications of this file include: mapping, geocoding, geographic search, area delineation, and database maintenance as a source for street names and locations.

The 2005 Road Network File (RNF) is available for Canada and individual provinces and territories in three formats: ArcINFO (.SHP), MapInfo (.TAB), and, for the first time, Geography Markup Language (.GML).

For the first time, the RNF is available free of charge to Canadians on the Internet. It can be found by clicking the "2006 Census" button on the top navigation bar of the Statistics Canada home page, where it is listed under "2006 Census." It can also be found by clicking the "2001 Census" button on the same navigation bar, where it is listed under "Recent Releases."

DLI and StatsCan Updates (October 2005 - February 2006)

Please note the updated products listed below and the paths to access them via FTP. The web site will be updated shortly.

Small Area and Administrative Data
Canadian Capital Gains 1998 - 2003
Charitable Donors 1995 - 2003
Canadian Investment Income 1995 - 2003
Canadian Investors 1995 - 2003
Canadian Savers 1995 - 2003
Canadian Taxfilers 1995 - 2003
Economic Dependency Profiles 1989 - 2001 and 2003
Families 1995 - 2000 and 2003
Labour Force Income Profiles 1989 - 2000 and 2003
Neighbourhood Income and Demographics 1995 - 2000 and 2003
RRSP Contribution Limits (Room) 1995 - 2004
RRSP Contributors 1995 - 2003
Seniors 1995 - 2003
FTP: /ftp/dli/saad


Inter-corporate ownership (ICO) 2005-1, 2005-2, 2005-3
FTP: /dli/ico_CD


We have just completed a new and improved “Browse by subject” feature on the Statistics Canada website. We would appreciate your feedback/comments on the changes and any further improvements we could make.

Please use the feedback button provided at the bottom of the first page of “Browse by subject”.


General Social Survey - Cycle 16

Please note that new SPSS/SAS codes have been included on the revised CD-ROM.

FTP: /ftp/dli/gss/cycle16-2002


Please note that E-Stat is looking at adding “Canada Food Stats 2005” (cat. no. 23F0001XCB) onto their site in the new year. Only the most recent issue of the CD-ROM will be available through E-Stat. As I receive more information, I will let you know.


Postal Code Conversion File Plus V4G - October 2005

The Postal Code Conversion File Plus (PCCF+) (82F0086XDB), Version 4G with postal codes through October 2005, complements the Postal Code Conversion File (PCCF). When the association between the postal code and census geography is not unique, the PCCF+ allows for a proportional allocation based on the population count.
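The idea of proportional allocation can be sketched in a few lines of Python. This is only an illustration of splitting records in proportion to population; it is not the PCCF+ program itself, which runs in SAS, and the area labels and counts are hypothetical.

```python
def allocation_shares(population_by_area):
    """Return each area's share of the combined population."""
    total = sum(population_by_area.values())
    return {area: pop / total for area, pop in population_by_area.items()}


def allocate(record_count, population_by_area):
    """Split records for one postal code across its candidate areas.

    Assumption: records are spread in proportion to each area's
    population count, giving expected (possibly fractional) counts.
    """
    shares = allocation_shares(population_by_area)
    return {area: record_count * share for area, share in shares.items()}


# Hypothetical postal code overlapping two census areas.
expected = allocate(100, {"Area A": 300, "Area B": 100})
```

With these made-up populations, three quarters of the records go to the larger area and one quarter to the smaller one.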

In Version 4G, federal electoral districts according to the 2003 representation order, riding names and definitions have been updated to include changes in 2004 and 2005. Also, Ontario health region definitions have been updated to include changes through August 2005.

Users also need SAS to run this application.

FTP: dli/health/pccf4g-fccp4g


Labour Market Activity Survey - 1986 - 1990 (SPSS codes)

FTP: /ftp/dli/lmas/1986 (1987, 1988, 1989, 1990) /doc

FTP: /ftp/dli/lmas1986 (1987, 1988, 1989, 1990)job.sps
FTP: /ftp/dli/lmas1986 (1987, 1988, 1989, 1990)per.sps



Social Policy Simulation Database Model - V 14.0
The Social Policy Simulation Database and Model (SPSD/M) Version 14.0, based on 2002 microdata, is now available. The most recent SPSD/M can be used to study the impacts of changes to federal and provincial tax and benefit programs on families and governments from 1991 through 2010.

The SPSD/M is a static microsimulation model. It is comprised of a database, a series of tax/transfer algorithms and models, analytical software and user documentation. The SPSD/M has been produced as an occasional product starting in 1985. It has been in wide use by policy analysts in Canada studying virtually every change to the tax and transfer system since that time.

The SPSD/M is a tool designed to analyze the financial interactions of governments and individuals and families in Canada. It estimates the income redistributive effects or cost implications of changes in the personal taxation (including the GST and other commodity taxes) and the cash transfer system. The SPSD/M also helps researchers examine the potential impacts of changes in taxes, earnings, demographic trends, and a wide range of other factors.

The SPSD/M allows users to answer questions such as: If there were changes to the taxes Canadians paid or the transfers they received, who would gain and who would lose? Would single parent households in a particular province be better off, and by how much? How much extra money would federal or provincial governments collect or pay out?

FTP: /dli/spsdm/spsdm-v14

Data on Advance Voting


I am interested in getting my hands on a dataset which summarizes advance and regular voting by candidate and riding in the 2000 and/or 2004 elections. I realize these data are available for the above elections aggregated at the provincial level, and disaggregated at the riding level by poll. However, I am interested in a set which would have the results by riding with some indication of the amount of advance voting for each candidate.


My guess is you would have to contact Elections Canada for that - they do provide the number of advance polls in some of their standard tables (in 2004 election results, tables 1 and 5), but only at the province level:

Update on SAAD Files


Any chance of an update to the SAAD files, especially 13c0016: Families? The latest data available under DLI seem to be to 2000.


I am pleased to announce that our FTP site was just updated with 2003 SAAD tables. There remain some gaps in the collection, but the author division is working on filling in those gaps over time.

Wholesale Prices


There is a class of students doing business plans here, and one of them is looking at making energy drinks using 100% fruit juice as one of the ingredients. What he would like to find is the price that he would have to pay for fruit juice. This would be the wholesale price, not the retail price. I was able to find retail price in CANSIM Table 326-0012. But, so far, I have not been able to dig up wholesale prices. Does anyone have any ideas?

You can get futures prices of commodities (such as orange juice) via Thomson Financial's Datastream database.

I would also look at the US Dept of Agriculture's AMS page of Wholesale Terminal prices here:

There are links for fruits and tropical fruits.

Also: You can find wholesale to retail prices of many types of food on Agriculture Canada's website in a database at

This database allows you to select the year, month, and day, as well as type of commodity, and location of market.

It then gives you an output, and allows you to export the data as a .csv file.

DLI Updates

Just a quick note to let you know that the DLI just released a new DLI Update (Volume 8, issue 1). You can retrieve the DLI Update from the following web address:

Publication highlights:
DLI is Coming Out of Its Shell by Chuck Humphrey (University of Alberta)
STC Mainframe Computer by Mike Sivyer (Data Liberation Initiative)
The DLI Training Repository by Jane Fry (Carleton University)
The Licensing Portal by Monia Bergeron (Data Liberation Initiative)
Data Gaps in the DLI Collection by Mike Sivyer (Data Liberation Initiative)
Difference between DLI's FTP and Web Sites by Jackie Godfrey (Data Liberation Initiative)
Additions to the DLI Collection from July 2004 - 2005 by Monia Bergeron (Data Liberation Initiative)

Canadian Business Patterns June 2005

FTP: /dli/cbp/2005/


New Version of Beyond 20/20


Is the Version 7.0 of the Beyond 20/20 Browser available?


CD-ROM products started using version 7.0 at the beginning of this calendar year. I am not sure of the exact number of products, but possibly 3 or 4 so far.

Version 7.0 of Beyond 20/20 is now available on the FTP site at the following address: /ftp/dli/util/b2020-70.exe

Installing Beyond 20/20


I have been trying to get Beyond 20/20 installed on computers at the University -- particularly in the computer labs. I was able to get our systems people to do this in the past, but now they seem to have an issue with this. I received the following response as a result of a recent request:

"The Beyond 20/20 browser does not obey group policy, it cannot be installed on ADM machines. If there is a new version of the application available please let me know and I can test that to see if they have fixed the application. The easiest way to fix the problem would be to use standard Windows Open dialog boxes."

Has anyone experienced this at their institution? Does anyone know if the problem this systems person is describing is being addressed? I would very much like students to be able to open and work with IVT files at any computer in the university, particularly those in the labs.


2005: As far as I know there are no restrictions to the number of desktops that can have the Browser installed within the DLI community.

From the Past: Several years ago I requested the installation of the B2020 browser in our campus computing labs to increase access to the 1996 Census tables (like I said, the request was several years ago). The Computing Centre would not install the browser until they had a license confirming that the copy they were installing was legitimate. Lynda Richardson, who was the DLI Liaison at the time, provided me with a document stating that the U of A was licensed through DLI to install B2020 on our

I'm passing along this "old" news in the event that your computing centre raises the same concern. The DLI Unit will provide you with a document that you can give the software license administrator on your campus.

National Survey of Giving, Volunteering


I'm writing about the 1997 Survey of Giving, Volunteering and Participation. A student here raises a question about the variable AGEGRP, which has a frequency distribution as follows, from the User Guide:

Variable: AGEGRP    Position: 518    Length: 1
Public microdata variable. Respondent Age
1  15-24 years            2,389     3,980,297
2  25-34 years            3,636     4,625,964
3  35-44 years            4,280     5,123,614
4  45-54 years            2,883     3,979,917
5  55-64 years            2,109     2,580,451
6  65 years and over      3,004     3,517,908
                        =======    ==========
                         18,301    23,808,151

In the SPSS command code provided on the DLI FTP site, this variable is assigned a missing value for "6". So, in the converted SPSS System file we have 6 coded as 'missing', but the frequency 'missing' matches those "65 years and over" in the User Guide. Since our student is particularly interested in those 65 and over, this is a problem.

Is the AGEGRP value "6" incorrectly assigned as missing in the SPSS command code?


I just spoke with the DLI Team, who confirmed that your assumption is correct: AGEGRP value "6" is incorrectly assigned as missing in the SPSS command code.

If you wish to go ahead and adjust the SPSS code to make use of the data immediately, please go ahead. We will work on fixing the SPSS code and reloading it on the sites.
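As a quick sanity check after adjusting the code, the unweighted counts from the User Guide can be compared with and without code 6 declared missing. This Python sketch uses only the sample counts quoted in the question; the function name is hypothetical.

```python
# Unweighted AGEGRP counts as quoted from the User Guide.
AGEGRP_COUNTS = {1: 2389, 2: 3636, 3: 4280, 4: 2883, 5: 2109, 6: 3004}


def valid_and_missing(counts, missing_codes):
    """Return (valid, missing) case totals for a set of missing codes."""
    valid = sum(n for code, n in counts.items() if code not in missing_codes)
    missing = sum(n for code, n in counts.items() if code in missing_codes)
    return valid, missing


# As distributed: code 6 declared missing, so the 65+ group drops out.
as_distributed = valid_and_missing(AGEGRP_COUNTS, {6})
# After the fix: no missing codes, all respondents are valid.
after_fix = valid_and_missing(AGEGRP_COUNTS, set())
```

After the fix, the valid total should equal the 18,301 respondents in the User Guide, with the 3,004 cases aged 65 and over counted as valid rather than missing.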