Wednesday, November 29, 2006

March 2006 Postal Code Conversion File


I downloaded the March 2006 PCCF from the DLI directory, and have some questions.

In the directory structure, there is a "corrected-postal-codes" subdirectory, with two files:
1) pc_corr_cp.xls - which is a list of 550 postal codes --- Is this the list of postal code corrections mentioned in the second bullet on page 5 of the codebook? Are we supposed to do something with this file, and if so, what? Is there a list of the corrections made that could advise our users, or do we simply tell them "these postal codes got fixed somehow - we don't know what was wrong, or what was corrected"?

2) problemsdpl_problemesld.txt - which has 45 lines of data, each line having a "PCODE", "DPL", and "Incorrect_DPL"
--- Is this the file of "unnecessary Designated Place - postal code linkages" mentioned in section 4.4 of the codebook?

When I look at the records in the data file, it appears that if we make the changes recorded in problemsdpl_problemesld.txt, we will have identical duplicate records for the postal codes - so I assume we are supposed to delete the records with the incorrect DPL?

I ran the duplicate checking procedures I created against the non-retired SLI=1 postal codes (theoretically the "best match" file), and came up with 48 records that were duplicated. My list of 48 postal codes includes all 45 of the ones listed in the file problemsdpl_problemesld.txt, plus postal codes V9B0A4 (DPLs 0010 and 9959), V9B6X4 (DPLs 0010 and 9959), and V9B6X5 (DPLs 0010 and 9959).

What about these 3 postal codes that seem to be duplicated - which is the "wrong" DPL (and, if I'm right in assuming, a candidate for deletion)?

In the short term, a "readme" file in the corrected-postal-codes directory would be quite useful, I think, to instruct us in the use of these two files.

3) Why isn't it possible to get this file corrected by Statistics Canada, either at the source division or at DLI, so that each DLI institution doesn't (or shouldn't) have to make the same corrections?


1) As you mentioned, this file is explained on page 5: it is a list of postal codes that were linked to incorrect geographic units but have now been corrected. You do nothing with this file; these corrections are already in the Postal Code Conversion File (PCCF).

2) Yes, there were some duplicate DPL linkages created during the automated geocoding process and we have discovered a few more which include the postal codes you have mentioned below.

3) - EAC) The issue with the DPLs results from the automated geocoding process. The Postal Code Project Team is working on redesigning the geocoding system in order to reduce errors and increase the data quality of the Postal Code Conversion File (PCCF). The January 30, 2007 PCCF release will contain the same DPL duplicates because we are not switching over to the new geocoding system until after this last release (based on 2001 geographic units). Once we release the first PCCF based on 2006 geographic units, the DPL duplicates should no longer exist because we will be using the new geocoding system.
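
The duplicate check described in the question can be sketched in Python as follows. This is only an illustration, not the procedure the questioner actually ran: the record layout (pairs of postal code and DPL, already filtered to non-retired SLI=1 records) and the postal code A1A1A1 are assumptions made for the example.

```python
from collections import defaultdict


def find_duplicate_postal_codes(records):
    """Return postal codes that are linked to more than one DPL.

    `records` is an iterable of (postal_code, dpl) pairs; this layout is
    an assumption for the sketch, not the actual PCCF record format.
    """
    dpls_by_pcode = defaultdict(set)
    for pcode, dpl in records:
        dpls_by_pcode[pcode].add(dpl)
    # Keep only postal codes with two or more distinct DPL values.
    return {p: sorted(d) for p, d in dpls_by_pcode.items() if len(d) > 1}


# Hypothetical sample rows; A1A1A1 is invented for illustration.
sample = [
    ("V9B0A4", "0010"),
    ("V9B0A4", "9959"),
    ("V9B6X4", "0010"),
    ("A1A1A1", "0001"),
]
duplicates = find_duplicate_postal_codes(sample)
print(duplicates)
```

Which of the two DPLs to drop for a flagged postal code still has to come from the corrections file or from Statistics Canada; the sketch only lists the candidates.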

APS 2001 Off-Reserve Data and Chi-Square


I have a question regarding the use of the Chi-square test, this time when using the Aboriginal Peoples Survey.

The researcher notes that "One way to correct the Chi-square for a complex sample is to divide the Chi-square produced, say by SPSS, by the design effect for the sample being used. The result is only approximate, but it works well enough for sorting out what are likely significant relationships."
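
The approximation the researcher describes can be written out as a short Python sketch. The 2x2 table and the design effect of 2.0 are hypothetical values chosen for illustration; they do not come from the APS documentation, and the result is only the rough correction quoted above, not a design-based test.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def design_adjusted_chi_square(table, deff):
    """Divide the ordinary chi-square by the design effect (approximate)."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return pearson_chi_square(table) / deff


# Hypothetical counts and a hypothetical design effect.
table = [[30, 20], [20, 30]]
unadjusted = pearson_chi_square(table)
adjusted = design_adjusted_chi_square(table, deff=2.0)
print(unadjusted, adjusted)
```

With one degree of freedom the 5% critical value is about 3.84, so in this made-up example the unadjusted statistic would look significant while the adjusted one would not.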

The answer received for the previous question suggested how to do this for the 2001 Census PUMF of Individuals. Now he is using the APS 2001 Off-Reserve data set but cannot find a table of design effects (or conversion factors, or quality factors, or anything that looks like values that can be turned into design effect values) in the documentation that he has for this complex sample. Has he missed something, or might there be missing documentation? Do you know where this information might be, or is there some other way to do the Chi-square test using the APS?


I am assuming that the researcher is referring to the PUMF, which does not contain the bootstrap weights and so it is not possible to do this kind of test using that dataset.

My best suggestion would be for the person to get access to the analytical file(s) in the RDC. Failing that, it might be possible to do it through the custom requests route.

Tuesday, November 28, 2006

Provincial Expenditures on Social Services and Health


A political science graduate student is looking for historical data (back to the 1970s) for provincial governments expenditures in social services and health. CANSIM Table 385-0008 provides data from 1989 onward and Historical Statistics of Canada (H332-344. Provincial governments, gross general expenditure by function) presents data from 1965 to 1975.

Where can we find statistics for the missing period, 1976 to 1988?


Time series data prior to 1989 were removed from Statistics Canada's CANSIM database because the statistical concepts and universe coverage employed when these terminated data were produced, differed significantly from SNA 93 international guidelines. SNA 93 guidelines were implemented during the 1997 Canadian System of National Accounts (CSNA) Historical Revision and the data which Public Institutions Division currently produces and supports covering the 1989 to present period, subscribe to these new guidelines.

Unfortunately, data prior to 1989 are unsupportable and unavailable.

Monday, November 27, 2006

Updated Products - GSS 19

General Social Survey, Cycle 19: Time Use (2005)

The core content of time use repeats that of cycle 12 (1998), cycle 7 (1992) and cycle 2 (1986), and provides data on the daily activities of Canadians. Question modules were also included on unpaid work activities, cultural activities, social networks and participation in sports. The target population of the General Social Survey consisted of all individuals aged 15 and over living in a private household in one of the ten provinces.

FTP: /ftp/gss/cycle19-2005

Friday, November 24, 2006

Birth and Death Rates for Sackville, N.S.


I have a student looking for birth and death rates for Sackville, Nova Scotia. This is a "Designated Place" according to the Census people. I can't seem to get to birth and death rates for anything other than provinces when using CANSIM. Sackville, N.S. is part of the Halifax Regional Municipality and is served by the Cobequid Community Health Board. She can't seem to get information from HRM or the Health Board.

Any and all suggestions are welcome. I suspect it is another one of the situations where the raw data is collected, but no calculating is done or at least published on this level of geography.


1. Birth and death statistics were included in the 1996 Community Profiles. If you go to the STC website and select "Community profiles" from the left sidebar menu, you will find a link to the 1996 Community profiles in the bottom right of this page. A search for Sackville, NS results in two choices: both Subdivision C for Halifax. Taking this link will allow you to pick "Births and Deaths" from the following page.

2. Another alternative is to use Annual demographic statistics, which contains births, deaths, and population by county (=census division). From births/deaths and population, one can compute birth and death rates. Of course, this requires assuming that the birth/death rates for the county also hold for Sackville, but it might be faster than waiting for birth/death stats to show up in the 2001 Community profiles.

3. Birth and death rates are available by county in the Nova Scotia Vital Statistics Annual Report.
Totals are also given for all incorporated areas (over 1000) but Sackville is not one of them.
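
The calculation suggested in the second answer (rates from births, deaths and population) is simple enough to sketch in Python. The counts below are hypothetical, not figures for Halifax County or Sackville, and expressing the rate per 1,000 population is an assumption about the convention wanted.

```python
def crude_rate_per_1000(events, population):
    """Crude rate: events in a year per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * 1000 / population


# Hypothetical county-level counts for one year.
births = 4000
deaths = 2800
population = 400000

birth_rate = crude_rate_per_1000(births, population)
death_rate = crude_rate_per_1000(deaths, population)
print(birth_rate, death_rate)
```

Applying the county rate to Sackville carries the assumption already noted in that answer, namely that the county rate also holds for the smaller area.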

Mining FDI in Latin America


A researcher here would like to get Canadian mining foreign direct investment into Latin American countries. The following CANSIM table comes closest:

Table 376-0053
International investment position, Canadian direct investment abroad and foreign direct investment in Canada, by industry and country, annual (dollars)

The closest industry is "energy and metallic minerals", but the real problem is the level of geographic detail; the only categories are US, UK, other EU countries, Japan, OECD, Japan and OECD countries, and the wonderfully evocative "all other foreign countries". Is it possible to get more detail from Statistics Canada? Alternatively, can anyone suggest a source that might provide a little more detail, especially for South America?


One could try NRCan. They have statistics, a form to request information, and a link to industry associations.

The Mining Association of Canada puts out Facts and Figures, which does mention investments in general - they may have more detailed information if you contact them directly.

SLID 2000 Immigrant Variable


In the 2000 Survey of Labour and Income Dynamics there is a variable "immst15" (immigrant flag). Marginals are as follows:

1 yes 3,641
2 no 11,608
7 don't know 42,192

We're assuming that the "don't know" category should actually be "not applicable" and also that it refers to persons born in Canada. Is that a valid assumption?


Immigration status is available only for persons living in urban areas of size 500,000 and higher; all other individuals are set to "don't know" for confidentiality reasons.

Follow up

Is there not some way that another code could be used that could be specifically for "confidentiality protection"? A code of "don't know" should be reserved for persons who indicate that as their response. This undermines confidence in coding.

Wednesday, November 22, 2006

LFS (Labour Force Survey) 2006 and SPSS


Is it safe to assume that the SPSS syntax file can be used for 2006 monthly files as well?


Everything seems fine to use the same code for 2006.

Environmental and demographic data


One of our PhD students is looking to find data at the lowest geographic level possible on as wide a variety of environmental indicators (air, water and soil qualities, climate, etc.) and wants to marry that with a wide range of demographics (age, gender, ethnicity, income, etc.) at the same level of geography. He'd like the data from the 1990s to the present.

We've looked at several sources but can't really find anything. The Canadian Environmental Sustainability Indicators provides data back to the 1990s but only at the national and/or provincial level.

Does anyone out there know whether these data would be available at lower levels of geography? He says he's heard of data collected at either the postal code or FSA area, but we haven't been able to locate them.


1. He might take a look at the paper by Buzelli et al. (2006) in Canadian Geographer 50(3): 376-391 for a method of estimating/interpolating such environmental data at the Census Tract scale in Vancouver.

2. Have you looked at the National Land and Water Information Service for some of the environmental data? The main data page is:

You might start with the Land Potential Database. It contains "data about soil, climate, physiography, land use, modelled constraint free (potential) crop yields, actual crop yields and soil degradation for all regions of Canada".

3. Count Statistics Canada out for the data at that level of geography - and even if data were collected at a low level of geography, it does not mean that they are reliable enough to publish at that level of geography. The counts of respondents play a very important role in what level of geography is released (census vs. survey).

4. As an added piece of information, the Health Indicators project is working with Environment Account and Statistics Division in order to come up with indicators related to the environment. The major challenges are the level of geography and the link to the standard geography. When you talk about air, water and other environmental indicators, the standard boundaries are not always relevant. This is a challenge that needs to be addressed and in maybe two or three years, it may be possible to see some relevant data being released but not yet.

5. It depends whether the client wants to use an environmental boundary or census boundary. Agriculture Canada will soon be releasing Ag data at the sub-sub-basin level and soil landscape level. But most data are not available at this level.

Also, it depends on the type of data he is looking for. Waste management data, for instance, are only available at the provincial level due to confidentiality.

6. A few more things your patron may want to look at:

Air quality/pollutants:

Canadian compendium on common air pollutants:

General ecological:

For climate:

Friday, November 17, 2006

Workplace and Employment Survey (WES) availability


I have a professor looking for the 2003 employee data PUMF for the WES. I see that synthetic files are available through the FTP site - are the PUMFs available?


There is no PUMF for this survey at this time in the DLI collection. I believe we were offered a version of a PUMF, but it was not overly useful and needed to be returned to the division - I don't recall any of the details at this time.

The synthetic files seem to be your only option at this exact time, but there may be some data from the survey in the DLI collection in the near future.

Wednesday, November 15, 2006

Canadian Community Health Survey 1.1 Question


I have been using the CCHS data file (cycle 1.1) PUMF and the dictionary for the PUMF. While the dictionary often lists "don't know," "refusal," "not stated" categories, the actual PUMF file only lists missing (never not stated, etc.). Is it okay to assume that all those missing are either: not applicable, don't know, refusal or not stated?


1. The CCHS 1.1 PUMF does include the range of values for instances of missing information: not applicable, don't know, refusal or not stated. We have a version of this file in SPSS that includes these values and that declares them individually as missing.

Is your patron working with SPSS or SAS? I'm guessing that she or he is using SAS. Why SAS? Because of the way users typically assign missing data in this statistical system. For example, the variable CCCA_131 records whether the respondent has cancer. The values and labels for this variable are:

1 = YES
2 = NO

I have seen many researchers working in SAS assign missing data using the following statement in a DATA step:

if ccca_131>2 then ccca_131=.;

which treats the values 7, 8 and 9 as one missing category.

SAS does allow specifying 27 special missing values by using a decimal point followed by a letter or the underscore character. Therefore, a researcher could declare each of the missing values for CCCA_131 in three statements:

if ccca_131=7 then ccca_131=.A;
if ccca_131=8 then ccca_131=.B;
if ccca_131=9 then ccca_131=.C;

All three of these values would be treated as missing but SAS would differentiate between the three types of missing information. I haven't seen many researchers take the time to write this much code for all of the variables they are using, though.

2. The most recent versions of Stata include support for multiple missing values, but earlier versions (before Stata) did not. So if your user converted the file using a program such as DBMSCopy, which converts to an older Stata format, then all of the different SPSS missing values would have been mapped onto a single Stata missing value. This might also happen if she converted the file to the current Stata version, but used a conversion program that didn't deal well with either SPSS or Stata missing values.

One solution might be to open the file in SPSS, undeclare the missing values, then do the conversion. After this was done the user could then declare the Stata missing values from within Stata.

3. Stat-Transfer, another data conversion program, claims to handle multiple missing values. On the first options tab, there's a set of options for handling user-declared missing values (such as SPSS has), one of which is to convert them into Stata extended missing values.

However, I ran into problems trying a simple transfer from SPSS using a dataset I had on hand - it took three different missing values on one variable, and returned a variable with only two - it combined two of the different values into a single Stata missing value. I spoke with a colleague who says that he's had the same problem.

Another option for handling user-declared missing values in StatTransfer is to "Use none" - this option simply preserves the original values of the variable. On the quick test I ran, this actually worked, and it's quicker than opening the file in SPSS and redeclaring all the missing values separately.
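
The per-value recoding shown in SAS in the first answer becomes less tedious when one mapping is applied across a list of variables. The Python sketch below illustrates that idea only; the tags ".A", ".B" and ".C" mirror the SAS example, the variable name other_var is hypothetical, and it assumes the codes 7, 8 and 9 mean "missing" for every variable listed, which has to be checked variable by variable against the data dictionary.

```python
# Codes treated as missing, each kept distinct (mirrors .A/.B/.C in SAS).
MISSING_TAGS = {7: ".A", 8: ".B", 9: ".C"}


def recode_missing(record, variables, tags=MISSING_TAGS):
    """Return a copy of `record` with missing codes replaced by tags.

    Only the named variables are touched; other fields are left as-is.
    """
    recoded = dict(record)
    for name in variables:
        value = recoded[name]
        if value in tags:
            recoded[name] = tags[value]
    return recoded


# Hypothetical respondent; other_var is an invented variable name.
respondent = {"ccca_131": 8, "other_var": 1}
result = recode_missing(respondent, ["ccca_131", "other_var"])
print(result)
```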

Friday, November 10, 2006

Amount collected by restaurant workers in tips


How much is collected in income tax for tips by restaurant / bar workers and how much do they actually collect from the folks leaving the tips?


That is a very tough question! The last time Statistics Canada looked at "The Underground Economy" was in 1994...

I did find a handy paragraph that could perhaps be of help:

3.5 Tips

Tips are calculated in the national accounts as a fixed percentage of gross business receipts, varying by industry and type of service provided (3% for accommodation, 10% for meals in restaurants, alcoholic beverages and hairdressing and 15% for taxi). The upper limit of tips missing from GDP due to underground transactions can be calculated directly by applying the same percentages to the estimates skimming of receipts...

Source: The Size of the Underground Economy in Canada, catalogue 13-602E, No. 2, Statistics Canada, p.30. (Gaëtan - don't look at my citation format!)
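
To make the quoted method concrete, here is a small Python sketch that applies the fixed percentages from the paragraph to gross receipts. The receipts figures are hypothetical, and the service labels are names chosen for this example rather than official industry categories.

```python
# Tip rates (percent of gross receipts) as listed in the quoted paragraph.
TIP_RATE_PERCENT = {
    "accommodation": 3,
    "restaurant_meals": 10,
    "alcoholic_beverages": 10,
    "hairdressing": 10,
    "taxi": 15,
}


def estimated_tips(receipts_by_service):
    """Apply the fixed percentage for each service to its gross receipts."""
    return {
        service: receipts * TIP_RATE_PERCENT[service] / 100
        for service, receipts in receipts_by_service.items()
    }


# Hypothetical gross business receipts, in dollars.
receipts = {"restaurant_meals": 1_000_000, "taxi": 200_000}
tips = estimated_tips(receipts)
print(tips)
```

The same function applied to an estimate of skimmed receipts would give the upper limit the paragraph refers to.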

I would take some time to review the entire document - it has many components which are very helpful.

Although this will not meet your exact needs, it may hold some information to help guide you in your search.

Wednesday, November 8, 2006

Expenditures on antidepressants


Is there any data available on how much Canadians spend on antidepressants in a year?


Answer 1:

I am not sure Statistics Canada is the best source for this sort of data... I did perform a few searches, but the results are far from

Result # 1)

CANSIM Table 203-0008 -- Survey of household spending (SHS), household spending on health care, by province and territory, annual

One section is called: Medicinal and pharmaceutical products (Prescription medicines _and_ Other non-prescription medicines and pharmaceutical products).

As you can see, this is a very broad category and may be a poor indication of antidepressant expenditure.

Result # 2)

CANSIM Table 301-0006 - Principal statistics for manufacturing industries, by North American Industry Classification System (NAICS), annual (dollars unless otherwise noted)

Pharmaceutical and medicine manufacturing [3254]
Pharmaceutical and medicine manufacturing [32541]
Pharmaceutical and medicine manufacturing [325410]

Although you can get revenues from this table, extreme caution should be used.

1) This is for a broad category of pharmaceuticals and medicine - antidepressants are only one part
2) The manufacturer's price may not be the retail price and could underestimate the revenues/sales.
3) Some of the product could be exported and would not be reflective of the Canadian consumption model.
4) There may be additional issues with this proposed option....

Answer 2:

CIHI has a Drug Spending Database:
Who is authorized to access?

Drug Expenditure in Canada: including very aggregate data.
Drug expenditure data in this report are obtained from the National Health Expenditure Database (NHEX) maintained by the Canadian Institute for Health Information (CIHI). Drug expenditure data in NHEX are macro-level data and do not allow for decomposition of prescription costs or drug classes.

For a general overview:

Answer 3:

STC does not have that detailed information on antidepressants. CIHI collects information on prescriptions, but I don't think that you will get details on antidepressants from them (unless they would handle this as a custom request).

CCHS 2.2 nutrition


When will the nutrition portion of the Canadian Community Health Survey 2.2 be released? The last information we had said August 2006.

Does a pumf for that portion of the survey need to be so different from the pumfs produced for the Nutrition Canada Survey?


We are not sure when the PUMF on Nutrition, 2.2, will be released. The Health Division was sent back to the drawing board and is now working with Health Canada to find an appropriate roll-up that does not divulge third-party information. As I mentioned before, we have to be very careful not to release third-party information that would breach confidentiality or that could be perceived as releasing market information (brand, consumption of one product versus another, market share, etc.). This is why they need to find a roll-up that provides useful information and at the same time protects sensitive information.

The content is quite different from other surveys and confidentiality is not only concerning identifying individuals but also related to sensitive market information related to what people eat or drink.

It's a brand new perspective and Statistics Canada has to be very careful on the information released.

Monday, November 6, 2006

CCHS 2004 cycle 2.2


Are the Approximate Sampling Variability Tables (ARROX_SAMP_TAB_E.PDF) for CCHS cycle 2.2 only available as a PDF file? I have a researcher who is looking for the file in Excel format.


The Approximate Sampling Variability Tables are not available in an Excel spreadsheet unfortunately.

Wednesday, November 1, 2006

Aboriginal Peoples Survey 2001 Adults off-reserve


The User's Guide for the APS (2001) Adults Off Reserve (pg. 27) "Appendix A, Rules for calculating approximate variance" refers to "using the Excel file FindCV APS(PUMF).xls". Is the file aps2001vt.xls the file they are referring to?


You are correct. Because of our naming convention we had to change the file name from "FindCV APS(PUMF).xls" to "aps2001vt.xls".

2001 census files - can't unzip/open


I've downloaded the following files from the ftp site and have received the same error for all of them when trying to unzip: "Cannot open file. It does not appear to be a valid archive. If you downloaded this file, try downloading the file again." Then when I click OK I get "Errors occurred while extracting. Do you want to view the last output?" When I say Yes I get the following message: "End-of-central-directory signature not found. Either this file is not a Zip file, or it constitutes one disk of a multi-part Zip file."

I did try downloading again but I get the same messages. I downloaded about 40 other files that worked fine. Any ideas as to whether I'm doing something wrong or whether the files are corrupted?

List of files that I can't unzip (ftp/dli/census/2001/ascii/topic-based tabulations/):

Families and Household Living Arrangements

Language Composition of Canada

Canada's Workforce: Paid Work


Thank you for bringing this to our attention. There may have been a slight problem when the files were loaded onto the FTP.

We noted one small point - Canada's Workforce: Paid Work 95F0380XCB2001001 - this file seems to download, but the others were problematic.

I am working at getting another copy of the files. I will advise you once we have reloaded them on the FTP.

Thanks again for keeping us informed about the situation and we'll try to get this done asap.
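
For anyone who wants to screen downloaded archives before trying to extract them, Python's standard zipfile module offers a quick check. The sketch below builds a small archive in memory and then cuts it in half to imitate an incomplete transfer; the file name readme.txt and its contents are invented for the example, and a truncated upload is only one possible cause of the error reported above.

```python
import io
import zipfile


def looks_like_zip(file_or_path):
    """True if the file has a readable ZIP end-of-central-directory record."""
    return zipfile.is_zipfile(file_or_path)


# Build a small, valid archive in memory (contents are hypothetical).
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("readme.txt", "sample contents")

# Imitate an incomplete transfer by keeping only the first half of the bytes.
data = good.getvalue()
truncated = io.BytesIO(data[: len(data) // 2])

good_ok = looks_like_zip(good)
truncated_ok = looks_like_zip(truncated)
print(good_ok, truncated_ok)
```

In practice the function would be given the path of each downloaded file; a False result suggests downloading again or reporting the file, as was done here.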

2006 Census release dates (Official re-announcement)

Please be advised that the detailed analysis, review and consultation regarding the 2006 Census release schedule and associated release dates has been completed. The originally published dates have been revised based on the impacts associated with the extension of 2006 Census field and collection activities.

The 2006 Census homepage has been modified. The link to information regarding the major 2006 Census release dates has been reinstated on the home page and the following is now presented:

2006 Census release dates
For the 2006 Census, due to the introduction of a number of automated processes, Statistics Canada envisaged releasing the population and dwelling counts earlier than in 2001 by a few weeks. However, given the tight labour market due to the strong economy in certain areas of the country (i.e. western Canada) and the difficulties that this has meant in hiring and retaining field staff, the completion of collection activities was extended by about five weeks. The impact of the extension of Field/Collection activities has resulted in adjustments to the originally published release dates. The revised release dates (as of October 31, 2006) are as follows:

Release no. 1: Tuesday March 13, 2007

  • Population and dwelling counts

Release no. 2: Tuesday July 17, 2007

  • Age and sex

Release no. 3: Wednesday September 12, 2007

  • Marital status

  • Common-law status

  • Families

  • Households

  • Housing and dwelling characteristics

Release no. 4: Tuesday December 4, 2007

  • Language

  • Immigration

  • Citizenship

  • Mobility and migration

Release no. 5: Tuesday January 15, 2008

  • Aboriginal peoples

Release no. 6: Tuesday March 4, 2008

  • Labour market activity

  • Industry

  • Occupation

  • Education

  • Language of work

  • Place of work

  • Mode of transportation

Release no. 7: Wednesday April 2, 2008

  • Ethnic origin

  • Visible minorities

Release no. 8: Thursday May 1, 2008

  • Income

  • Earnings

  • Shelter costs

We are advising that users, key stakeholders etc. who are enquiring as to the status of the 2006 Census release dates be directed to the 2006 Census home page and the link to information on release dates.

Updated Products - Geography

Please note the updated products listed below and the path to access them via the FTP.

1) Geography Boundary Files - Census 2006

The 2006 boundary files portray the official geographic limits used for census dissemination and are available for Provinces and Territories, Census Divisions, Economic Regions, Census Metropolitan Areas and Census Agglomerations, Census Consolidated Subdivisions, and Census Subdivisions. The boundary files are available in two formats: Digital Boundary Files and Cartographic Boundary Files. Digital Boundary Files depict the official boundaries of standard census geographic areas. The boundaries sometimes extend beyond shorelines into water, rather than follow the shoreline, to ensure that official limits are followed and that all land and islands are included. Cartographic Boundary Files contain boundaries of standard geographic areas that have been modified to follow shorelines. The files provide a framework for mapping and spatial analysis using commercially available geographic information systems (GISs) or other mapping software. A reference guide is included.

FTP: /dli/geography/arcinfo/
- cbf
- dbf

FTP: /dli/geography/mapinfo/
- cbf
- dbf

2) Standard Geographical Classification (SGC). Volume II. Reference Maps

The Standard Geographical Classification (SGC) is a system of names and codes representing areas of Canada. It consists of a three-tiered hierarchy - province or territory, census division, and census subdivision. This relationship is reflected in the seven-digit code. The SGC is used to identify information for particular geographical areas and to tabulate statistics. This volume provides a series of reference maps that show the boundaries, names, and SGC codes of all census divisions and census subdivisions in Canada, in effect on January 1, 2006. It also provides the names, codes and areal extent of census metropolitan areas, census agglomerations, and economic regions. A thematic map of the Statistical Area Classification (SAC) by census subdivision is included.

FTP: /dli/geography/reference-maps
- cd-csd-dr-sdr
- national-maps
- sgc-cgt
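
As an illustration of how the three-tiered hierarchy is reflected in the seven-digit SGC code, the Python sketch below splits a code into its parts. The 2-2-3 digit layout (province or territory, census division, census subdivision) is stated here as an assumption, since the description above gives only the total length, and the sample code is used purely as an example value.

```python
def split_sgc_code(code):
    """Split a seven-digit SGC code into its three hierarchical parts.

    Assumes a 2-2-3 layout: province/territory, census division,
    census subdivision.
    """
    if len(code) != 7 or not code.isdigit():
        raise ValueError("SGC code must be exactly seven digits")
    return {
        "province_or_territory": code[:2],
        "census_division": code[2:4],
        "census_subdivision": code[4:],
    }


parts = split_sgc_code("1209034")
print(parts)
```

Keeping the code as a string rather than a number preserves leading zeros in each part.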