Friday, January 27, 2006

Census Geography Files


I am attempting to reconstruct the census geography for Newfoundland over all census periods. To conduct a comparative analysis I need to compensate for changes in the census collection areas over the censal time periods. Thus I would need all digitally available geographic files (PR, CD, CCS, CMA, CSD, EA) for Newfoundland prior to the 1996 census.


You can access files from 1981, 1986 and 1991 on the FTP site: /ftp/dli/geography.

Tuesday, January 24, 2006

Hours Worked by Households/Families


Is there a source anywhere that would give me the above information -- for 1982?


The LFS microdata files contain respondent and spouse's usual hours at all jobs, and at the main job. So you can at least calculate hours worked for the census family, though not for the economic family or the household.
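As a rough illustration of the calculation described above, the sketch below adds the usual weekly hours of a respondent and spouse. The field names (`resp_hours_all_jobs`, `spouse_hours_all_jobs`) and the sample values are hypothetical placeholders, not actual LFS variable names or data; the codebook for the file in question gives the real ones.

```python
def couple_usual_hours(record):
    """Sum usual weekly hours at all jobs for respondent and spouse.

    Assumption: a missing value (None) is counted as zero hours.
    """
    resp = record.get("resp_hours_all_jobs") or 0.0
    spouse = record.get("spouse_hours_all_jobs") or 0.0
    return resp + spouse


# Made-up records for illustration only.
records = [
    {"resp_hours_all_jobs": 40.0, "spouse_hours_all_jobs": 20.0},
    {"resp_hours_all_jobs": 35.0, "spouse_hours_all_jobs": None},
]
totals = [couple_usual_hours(r) for r in records]
```

The same function could be pointed at the main-job hours fields instead, since the files carry both.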

Thursday, January 19, 2006

Canadian Heart Health Survey, 1986-1992


A PhD student is asking for data from the Canadian Heart Health Survey, 1986-1992. My preliminary research indicates that this survey may not be part of the DLI Collection. Is it correct to conclude that libraries need to order the Heart Health in Canada CD-ROM from the Canadian Heart Health Database Centre in order to access this survey data? It appears the CD is available for a $50.00 handling fee. My conclusions are based on the following information.

There is information about the licensing and availability of a CD-ROM product containing The Complete Canadian Heart Health Database from the Canadian Heart Health Database Centre. The web pages reference the provincial health surveys that were carried out between 1986 and 1992, so I expect the survey data is on this CD-ROM.

The University of Toronto Data Library Service also has a web page for the Canadian Heart Health Survey, 1986-1992, but it appears this data is restricted to U of T?


1. The information I have on the data from this series of surveys is that it must be obtained from the Canadian Heart Health Database Centre. This is housed at Memorial University. These data are not part of the DLI Collection.

2. The DLI collection consists primarily of Statistics Canada holdings. We do have a few external sources (such as the Catch and Release data, and the Canadian Addiction Survey), but usually the authoring department contacts us to place the holdings in our collection (granted, sometimes it is a DLI contact who graciously informs them of our program and suggests that it would be a good place to keep their information).

Therefore, you must contact the Canadian Heart Health Database Centre to procure the dataset.

Question about census profiles


A question has arisen about how to interpret something in the census profiles.

It relates to housing costs; the tables are in the last group listed under Electronic Profiles -

Income of Individuals, Families and Households, Social and Economic Characteristics of Individuals, Families and Households, Housing Costs, and Religion. These tables all have catalogue numbers from 95F0492XCB2001001 through 95F0492XCB2001009.

There are several instances like this in the subsets:

Tenant households spending 30% or more of household income on gross rent
Tenant households spending from 30% to 99% of household income on gross rent


Owner households spending 30% or more of household income on owner's major payments
Owner households spending from 30% to 99% of household income on owner's major payments

The second line is indented and appears to represent a subgroup under the first. The numbers in the 30%-99% group are consistently a bit lower than those in the 30% or more group.

What does this mean? The reference title "The 2001 Census Standard Products Stubsets" and the notes in the Beyond 20/20 tables do not refer to the difference between the two.


Please find the explanation from the division:

Before the data release, we consulted with different data users. Some users insisted that STC exclude households spending 100% or more of their income on shelter, on the grounds that the data do not lend themselves to any meaningful interpretation and that their inclusion would skew the analyses. Other users demanded a qualifier to indicate that the 30% threshold does not necessarily indicate an affordability problem, but agreed that some threshold should be presented as a general indication of trends.

In response, we have released the data as:
1. Less than 30%
2. 30% or more (i.e. including 100% or more)
3. 30% to 99% (which excludes the households spending 100% or more)

The figures for category 3 would allow users to exclude households spending 100% or more on shelter, for their analyses.

The relatively high shelter cost to household income ratios for some households may have resulted from the difference in the reference period for shelter cost and household income data. The reference period for shelter cost data (gross rent for tenants, and owner’s major payments for owners) is 2001, while household income is reported for the year 2000. As well, for some households the 2000 household income may represent income for only part of a year.
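The relationship between the released categories can be sketched as simple arithmetic: the number of households spending 100% or more is the "30% or more" count minus the "30% to 99%" count. The function name and the counts below are made up for illustration and are not taken from any census table.

```python
def households_100_or_more(thirty_or_more, thirty_to_99):
    """Derive the count of households spending 100% or more of income.

    thirty_or_more: count in the "30% or more" category (includes 100%+)
    thirty_to_99:   count in the "30% to 99%" category (excludes 100%+)
    """
    if thirty_to_99 > thirty_or_more:
        raise ValueError("the 30%-99% count cannot exceed the 30%-or-more count")
    return thirty_or_more - thirty_to_99


# Hypothetical counts for a single area.
example = households_100_or_more(1250, 1100)
```

Because published census counts are typically subject to rounding, a derived figure like this should be read as approximate.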

National Graduates Survey 2000


I have a researcher enquiring about the NGS 2000, any idea if we can get our hands on this one?


The file is not yet available and this is the reply I received from the division:

Unfortunately, I can't give a date. We still need to prepare the actual file, present to the Microdata Release Committee and then address any concerns they might have. Depending on the nature of those concerns, we have to make a second presentation to them for approval to release. All these things take time.

They are hoping to have it out by the end of April 2006, but that seems like a very ambitious goal.

Monday, January 16, 2006

Old Labour Force Survey Data


A UBC professor is looking for Labour Force Survey statistics for the period 1960-1975, i.e. the years prior to those available on CANSIM and the Labour Force Historical Review as well as in the microdata files. His specific requirement is for the numbers of employed persons, male and female, for Canada, by five-year age groups.

I have combed through the print issues of The Labour Force. Before September 1971 the only age groups are 14-19 and 20-64. Beginning in September 1971 there are monthly breakdowns for ten-year age groups.

But in the earlier (pre-September 1971) issues there is a statement in the technical notes regarding access to more. It reads:

"Other Data Available - In addition to the published statistics, there is a considerable amount of data which can be obtained on request. Following is a list of data available.

1. Age and sex distributions..."

My question therefore is whether the numbers can still be retrieved for five-year age groups for the period 1960-1975 since we cannot locate them in the standard published data. Or is there somewhere else to look? It seems unlikely since the main labour force publication at the time published ten-year age groups only.


As you are probably aware, there were significant changes in the methodology of the Labour Force Survey over the years, which explains why not all data are currently available in one continuous series. The division did revise the standard data (CANSIM, LFS Historical Review, etc.) back to 1976, but not earlier.

It was very good of you to use the older printed versions for additional information, but caution should be used, as they are based on a different methodology and the numbers cannot necessarily be used for comparison purposes. One could say it is comparing "apples and oranges", so to speak. I am simply alerting you to a problem that has happened to others.

In terms of getting older data in five year age groups, the only option that I can think of is to contact the division for a "fee" product that would ensure the continuous series with the same methodological concepts.

Thursday, January 12, 2006

Distance Education Enrollment in Universities across Canada


I just checked to see if the AETS covers distance education enrollment in universities across Canada but it does not. Is there something I've missed or would this require a customized tabulation?


In reference to your question, please find the Education Division's reply:

The only distance learning we have is in the enclosed link. There really is not a lot to offer at the university level.

FSA boundaries before 1996


Does anyone know if FSA geographic boundary files (ArcView format) are available through DLI at the national or provincial level prior to 1996? I can find the FSAs for CMAs in 1986, but nothing that would cover all of BC.


Unfortunately the files on our FTP site are the only ones we have for FSA 1986. All other levels of geography are also available from the FTP site. If you require anything else that is not found under the geography folder, it may be at a cost as it will not be a standard product.


Please note the following errata notice:

Subject: Re-issue of CD-ROM product

Because of an error in the preparation of the CD-ROM, the link "CV Tables - Approximate Sampling Variability Tables" in the Documentation screen page of the application leads to (opens) an incorrect version of the document.

The correct version of the document, named ARROX_SAMP_TAB_E.PDF (sic), is included in pdf format in the Documentation directory on the CD-ROM. As indicated above, however, this is not the document being opened via the link.

The situation will be rectified, and a revised CD-ROM will be issued shortly. We will send you a copy of the revised CD-ROM as soon as it is available.

We regret any inconvenience you or your organization may incur because of this.


The corrected CD-ROM is projected to be released in three weeks at the earliest. The author division had asked for the files to be locked until it had had time to ensure nothing else was at fault. The division has completed its review and has suggested the files be unlocked, on the condition that users be made aware of the faulty link in the CD-ROM interface, and that special care be taken to ensure that users are referring to the correct CV tables.

GSS 16 main file update


I've loaded this file into IDLS, but encountered a piece of strangeness with it.

The codebook indicates that variable HLTH_UTIL_INDEX should have values ranging from 0.000 to 1.000, with 9.999 as a missing. Working with the current version of the data file, I'm encountering a problem - 284 cases report values of less than 0 (to as little as -0.309) for this variable.

I looked at the raw data file to make sure that I wasn't somehow altering it - the negative values appear there as well.

Is there a problem with the data, or with the documentation?


I was doing a presentation to the Health folk at STC using the original file a year or so back and came across the same thing. Since 0=death and 1=perfect health, I felt sure that the -.309 must have been a mistake. Not so, apparently. I asked for an explanation and the quick answer was "worse than death". But they never came through on the real definition.

So, it's not an error in the data, but it is a puzzle. And the documentation does not reflect the reality of the numbers. The file I was working with was the CCHS file, not the GSS16. But the rest of the message remains the same.
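A quick range check along the following lines can confirm what a file actually contains before comparing it with the codebook. This is only a sketch: the function name and the sample values are invented for illustration, and the only details taken from the discussion above are the documented 0.000 to 1.000 range, the 9.999 missing code, and the existence of negative values.

```python
MISSING_CODE = 9.999  # documented "missing" value for the index


def summarize_index(values, missing=MISSING_CODE, tol=1e-9):
    """Count missing and negative values and report the valid range."""
    valid = [v for v in values if abs(v - missing) > tol]
    negative = [v for v in valid if v < 0]
    return {
        "n_missing": len(values) - len(valid),
        "n_negative": len(negative),
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
    }


# Made-up sample; real values would be read from the data file.
sample = [0.973, 1.0, -0.309, 9.999, 0.0, -0.05]
summary = summarize_index(sample)
```

Negative values are kept as valid here rather than recoded, in line with the answer above that they are not errors in the data.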

Downloadable copies of the ACCOLEDS/DLI presentations

You can now download copies of the ACCOLEDS/DLI presentations from the U Sask Library website.

Removal of files


If the division requests that we remove all older files, we oblige. Does that mean these older files have to be removed from data servers such as IDLS, Landru, Sherlock, etc., or are they only to be removed from the DLI FTP/web site?


If the division provides the DLI with a new data file and asks us to remove the older one from our FTP site, it means that they do not want the older version in circulation. This is usually because there was an error in the original file, or they revised the weights, or took some other action that will produce different results. If the maintainers of systems like IDLS, Landru and Sherlock want to ensure that their users have access to the most up-to-date data files and can produce results that match those being produced elsewhere, it is strongly advised that they remove the older version and install the one that is now the "official version".

2001 Census - Population and Dwelling Counts by Postal Code


Does a file (from Statistics Canada or elsewhere) exist with current (or near-current) postal codes and 2001 population and dwelling counts?

Version 8.3 of the DMTI CanMap Postal Geography contains 1996 population and dwelling counts.


This is the answer we received from Geography Division:

I think the client must have received a version of the PCCF with population and dwelling counts as produced by DMTI. Our Postal Code Conversion File (PCCF) does not contain population and dwelling counts. I think that DMTI joined the PCCF with information from GeoRef 1996 to create a file with population and dwelling counts. The next release of the PCCF is scheduled for January 31, 2006.

If your client has a copy of GeoSuite 2001 then a join can be performed between that file and the latest PCCF to get a PCCF file with 2001 population and dwelling counts.
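The join suggested above might look roughly like the sketch below. It assumes that both files can be exported to rows sharing a common geographic identifier; the key name `da_uid`, the other column names, and all sample values are hypothetical and are not the actual PCCF or GeoSuite layouts.

```python
def attach_counts(pccf_rows, geo_rows, key="da_uid"):
    """Left-join population and dwelling counts onto postal code records.

    pccf_rows: list of dicts, one per postal code record (hypothetical layout)
    geo_rows:  list of dicts with counts per geographic unit (hypothetical layout)
    """
    counts_by_area = {row[key]: row for row in geo_rows}
    joined = []
    for row in pccf_rows:
        match = counts_by_area.get(row[key])
        merged = dict(row)  # copy so the input rows are not modified
        merged["population"] = match["population"] if match else None
        merged["dwellings"] = match["dwellings"] if match else None
        joined.append(merged)
    return joined


# Made-up rows for illustration only.
pccf = [
    {"postal_code": "A1A1A1", "da_uid": "001"},
    {"postal_code": "A1A1A2", "da_uid": "001"},
    {"postal_code": "B2B2B2", "da_uid": "999"},
]
geo = [{"da_uid": "001", "population": 520, "dwellings": 210}]
result = attach_counts(pccf, geo)
```

Note that when several postal codes fall in the same area, each record carries the full area count, so summing the joined counts across postal codes would overstate the totals.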

Also, on a side note:

The January 2006 Postal code conversion file (PCCF) - 92F0153UCE and the Postal codes by federal riding file (PCFRF) (2003 Representation Order) - 92F0193UCB will be released on Tuesday, January 31, 2006. Both the PCCF and PCFRF reflect postal code data from the Canada Post Corporation up to and including October 2005.

The PCCF will again contain the Federal Electoral Districts for both the 1996 and 2003 Representation Orders.