Wednesday, February 22, 2006

Revised GSS Cycle 18 SPSS

SPSS files for the GSS Cycle 18 were produced. The file consistently uses short variable names, but includes a "rename" command if you want to work with the long ones. Missing value codes have also been added.

These files are available for download from
ftp://janus.ssc.uwo.ca/drl/SPSSSyntax.
I'll try to remember to put stuff there consistently until we establish a shared space for storing things like this.

Tuesday, February 21, 2006

Statistics Canada Catalogue Numbers

Here is the breakdown of the last three characters of a Statistics Canada catalogue number:

Eg: XCB (variable descriptor, medium, language)

Variable descriptor : Used to further define the product where necessary

P = preliminary: a product generally produced before the final confirmation of data.
S = supplement: additional information added to an existing product; this information cannot stand on its own.
R = revision: a revision to an already published product; it can also be an erratum.
U = update: sections added to or replacing existing chapters in a product.
G = guide: user guides or guides to other Statistics Canada products.
M = monographic series: a series of individual monographs, typically a research paper series.
L = licence: licensing agreements.
X = not relevant to this product. Most products have an "X".

Medium : The medium in which the product is delivered to the client.

A = Audio-cassette
B = Braille
T = Tape
D = Diskette
M = Microfiche
C = CD-ROM
I = Internet
K = Kit
P = Paper/print

Language : the language that the product will be delivered in.

E = English
F = French
B = Bilingual

XCB - C = CD-ROM, B = bilingual
XDB - D = diskette, B = bilingual
XPB - P = paper, B = bilingual
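
For anyone who decodes these suffixes often, the breakdown above can be turned into a small lookup. This is only a sketch: the tables are transcribed from the lists in this post, and the function name `decode_suffix` is made up for illustration.

```python
# Code tables transcribed from the breakdown above.
DESCRIPTOR = {
    "P": "preliminary", "S": "supplement", "R": "revision", "U": "update",
    "G": "guide", "M": "monographic series", "L": "licence", "X": "not relevant",
}
MEDIUM = {
    "A": "audio-cassette", "B": "Braille", "T": "tape", "D": "diskette",
    "M": "microfiche", "C": "CD-ROM", "I": "Internet", "K": "kit",
    "P": "paper/print",
}
LANGUAGE = {"E": "English", "F": "French", "B": "bilingual"}


def decode_suffix(suffix):
    """Decode a three-letter suffix such as 'XCB' into its three parts."""
    if len(suffix) != 3:
        raise ValueError("expected exactly three characters")
    descriptor, medium, language = suffix.upper()
    return {
        "descriptor": DESCRIPTOR[descriptor],
        "medium": MEDIUM[medium],
        "language": LANGUAGE[language],
    }


print(decode_suffix("XCB"))
```

A letter that is not in the tables raises a `KeyError`, which is probably the right behaviour for a suffix that does not follow this scheme.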

Monday, February 20, 2006

Average weight of women by CSD

Question

I have a student who is looking for the average weight of women in the town in Ontario that she lives in, Georgina, ON. From the CCHS, I can get to the geography level of the Health Region (Simcoe, York), but I can't seem to go to the actual town level. I can get the BMI, but I'm not sure about the actual weight, although the PUMF does have a variable for self-reported weight and a variable for measured weight.

Is there any other survey that collects weight and provides geography to the level of the town? (In this case Georgina is a CSD)

Answer

Statistics Canada does not have a product that breaks weight down to the CSD level (Georgina, ON). The lowest level of geography available is the Health Region (Simcoe, York).

Tuesday, February 14, 2006

Census Enumeration Area - Dissemination Area Correspondence File

Question

A patron is working with the "Enumeration area-dissemination area correspondence files" for the 1996-2001 Census. I have just downloaded these for him from the DLI FTP server. He has sent the following request for assistance. Any help, or direction, would be appreciated.

"I can access the files, but I'm not sure they're giving me exactly what I need. I'm hoping to match specific 2001 dissemination areas within census tracts (for example dissemination area 170158 within census tract 0010.00) to a 1996 enumeration area. I can't see how to do it."

Answers

1. Since the coding of EAs is totally independent of the coding of census tracts, but _dependent_ on the coding of FEDs, the census tract that an EA falls in for a particular census is irrelevant. In other words, there is only one EA 158 in FED 17, and only one FED 17 in whatever province the user is working with in the 1996 census (Quebec, Ontario, Alberta, and British Columbia each have an FED number 17 in the 2003 RO, so the user needs to distinguish which province as well). In the 2001 census, the FED was replaced by the census division; each CD has a unique number within the province, and each DA has a unique number within the CD. So the province code is essential, while the census tract is irrelevant, since it is not part of the EA or the DA coding.

The province+FED code+EA code in, say, 1996 will, in the EA-DA correspondence file, be mapped to one or more province+census division code+DA [+block] code(s) in the 2001 census.

If the user is mainly concerned with census tracts, maybe s/he should be using the census tract conversion tables, which were published in the census tract profile volumes until about 1996, but which we have digitized and added to the ones produced by Stats Can, and all linked at:
http://www.chass.utoronto.ca/datalib/major/georef.htm

2. If I correctly understand what your patron wishes to do, it can be done through Geosuite 1996, which is available from the DLI FTP site under the Geography/1996/Geosuite directories. Here is what I understood your patron wants to accomplish: take a 2001 DA and find its corresponding EA or EAs in 1996. Knowing this, identify the CT in which that DA was located in 1996.

Step 1: find the corresponding 1996 EAs. It seems that you've already located the correspondence list from the 2001 Census. This file resides under Geosuite 2001 as a text file named: ea-sd_corr.txt.

I opened this file in Wordpad (which handles the Unix end-of-line characters correctly in Windows) and did an Edit/Find for the 2001 DA provided from your patron: 59170158. There are multiple records for this DA-to-EA linkage because the correspondence is done at the Block Face level. The first occurrence is with 1996 EA 59032301. I kept pressing F3 (Find Next) working my way down the file. The next 1996 EA corresponding with this DA is EA 59032308. Pressing F3 a couple more times reaches the end of file.

DA 59170158 in 2001 is made up of parts of EA 59032301 and 59032308 in 1996.
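
The Find / Find Next pass in Step 1 can also be scripted. The sketch below assumes each record in the correspondence file is whitespace- or comma-delimited, with the 2001 DA code in one column and the 1996 EA code in another. The column positions `da_col` and `ea_col` are hypothetical and need to be checked against the file's actual record layout; the sample records are illustrative only, built from the codes quoted above plus one made-up non-matching row.

```python
import re


def eas_for_da(lines, da_uid, da_col=0, ea_col=1):
    """Return the distinct 1996 EA codes linked to one 2001 DA, in file order.

    da_col and ea_col are hypothetical column positions; adjust them to the
    file's real record layout.
    """
    found = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) <= max(da_col, ea_col):
            continue  # skip blank or short records
        if fields[da_col] == da_uid and fields[ea_col] not in found:
            found.append(fields[ea_col])
    return found


# Illustrative records only (not copied from the real file).
sample = [
    "59170158 59032301",
    "59170158 59032301",  # repeated link, as happens with block-level records
    "59170158 59032308",
    "59170999 59032308",  # made-up record for a different DA
]
print(eas_for_da(sample, "59170158"))
```

Against the real file, the same function can be given an open file handle instead of the sample list, since it only iterates over lines.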

Step 2: To find the CT in which these two EA's belonged in 1996, start a 1996 Geosuite session.

1. From the menu, select CODE SEARCH.
2. From Level at the top left of the Code Search Window, select FED. Laine noted in her reply that EA's were encoded using PR (2-digits), FED (3-digits), EA (3-digits) in 1996.
3. In the Code text box, enter: 59032. This will display the Victoria Federal summary.
4. Click NEXT and on the subsequent window, select EA and click NEXT again.
5. Click the double-right-facing arrows (>>) to move all elements to the results box.
6. Click SET CONDITIONS and for the field EAuid EQUALS, enter: 59032301
7. Click ADD; select OR and for the field EAuid EQUALS, enter: 59032308
8. Click OK and then NEXT
9. The resulting window will contain the two EA's from 1996 with their corresponding Census geography. Scrolling to the right, you will find that the CMA for these EA's is 935 and the CT is 2417.
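
The value typed into the Code box in step 3 is just the first five digits of the 1996 EA code. A minimal sketch of how the codes break apart follows. The 1996 widths (PR 2 digits, FED 3 digits, EA 3 digits) come from the note in step 2; the 2001 widths (PR 2, CD 2, DA 4) are an assumption read off the example DA 59170158, and both function names are made up for illustration.

```python
def _check(uid):
    """Reject anything that is not an 8-digit code."""
    if len(uid) != 8 or not uid.isdigit():
        raise ValueError("expected an 8-digit code")


def split_ea_1996(ea_uid):
    """Split a 1996 EA code into PR (2), FED (3) and EA (3)."""
    _check(ea_uid)
    return {"PR": ea_uid[:2], "FED": ea_uid[2:5], "EA": ea_uid[5:]}


def split_da_2001(da_uid):
    """Split a 2001 DA code into PR (2), CD (2) and DA (4) (assumed widths)."""
    _check(da_uid)
    return {"PR": da_uid[:2], "CD": da_uid[2:4], "DA": da_uid[4:]}


ea = split_ea_1996("59032301")
print(ea)                    # PR 59, FED 032, EA 301
print(ea["PR"] + ea["FED"])  # 59032, the value entered in Code Search
print(split_da_2001("59170158"))
```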

Friday, February 10, 2006

Car Sales and Senior Citizens

Question

A student is looking for the buying habits of seniors with respect to cars. In other words, what cars are senior citizens buying? National or provincial. Recent. I found a few surveys on car sales and retail trade, but I can't match the numbers (in dollars or units) with the age groups of buyers.

Answer

Is the user seeking names/brands of cars purchased by seniors? If so, please note that Statistics Canada will not report that sort of information at all. We do not identify the brands purchased - only the commodity codes for that commodity.

Unfortunately, the surveys to which you are referring do not ask demographic questions about the buyers.

One idea that comes to mind is the Survey of Household Spending. For example, the internet publication 62-202-XIE (Spending Patterns in Canada) provides information on household equipment. They have a category for persons aged 65+ (single and couples are in different tables) and they break down the number of vehicles owned or rented by these seniors (breakdown = "automobiles" and "vans and trucks").

I am not sure if that is what your user is looking for, but you can probably get this data at a lower level of geography by using the tables found on the FTP (not 100% certain though).

Thursday, February 9, 2006

Need data on personal debt and assets

Question

Can anyone tell me where I might be able to find personal debt and assets by CMA? The period we are interested in is 1980-2000.

Answer

Assuming that you are looking for the assets and debts of individuals, one of the few products available through the DLI that can answer asset and debt questions is the Survey of Consumer Finances.

A few issues:

1) The data is not available at the CMA level (as far as I know, only National and Provincial)
2) Data specific to assets and debts was collected in 1977, 1984 and 1999.

Census Geography Question

Question

A professor is looking for geography files where a county (Census Division, ex. Albert county in NB) is broken down to census tracts and the revenues.

We also found this link to the maps

http://geodepot.statcan.ca/Diss/Maps/ReferenceMaps/ct_cmapdf_f.cfm?startrow=46&Geocode=305&Cmaname=Moncton


It looks like we were somewhere close, but how does one go through 4 pages of maps identified only by codes?

Answer

A quick note about Census geographies:

A Census Tract only appears in Census Metropolitan and Census Agglomeration areas (large urban centres) which have been tracted. An entire county will not be covered by Census Tracts normally as it sometimes encompasses both large urban and rural areas.

When looking at a Census Division (county in New Brunswick) you can break it down into these geographies:

Census Division:
- Census Subdivision (CSD)
- Dissemination Area (DA)
- Designated Places (DPL)
- Census Consolidated Subdivision (CCS)

If your client is looking at a level of geography lower than CSD, perhaps DA would be your best bet as it will cover the entire county.

Once you have identified the level of geography you want, the electronic profile may be a good match for finding the income variable. However, if you need variables to be crossed (e.g., income by sex by age), then you must find the appropriate topic-based tabulations to meet your needs.

The codes correspond to the Dissemination Areas. You will need to find the DAs that interest you and then retrieve the data that corresponds to it from the DLI FTP server. That is the real reason for the maps - so you can visualise the areas and select which geography codes best meet your needs. The street names should help you situate the areas a little better.

Patents per Capita

Question

I am looking for patents issued per capita for the Census Metropolitan Areas of Ottawa, Toronto, Montreal and Vancouver. Any suggestions?

Preferably recent (2001+) data.

Answers

1. I checked with one of my colleagues who does a fair bit of patent searching and he replied: I couldn't get much from the Canadian patent database on inventor locations - inventor address didn't seem to be part of the publicly available record. US patents (http://www.uspto.gov/patft/index.html) do have address info (probably since 1976). I tried locally and had to include the country code with the search: ic/waterloo AND icn/ca

A fancier search would list other cities (would I count St Jacobs or Breslau as part of Waterloo?). For date ranging you'd have to decide on whether you were limiting by patent date or application date (major difference in the US, not so in Canada). This picked up what inventions had inventors in this city.
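
As a rough illustration of the "fancier search", the sketch below builds a query string for several cities using the same `ic/` and `icn/` field codes as the search above. Quoting multi-word city names and grouping the cities with OR inside parentheses are assumptions about the search syntax, and the helper name `city_query` is made up; check the database's own help pages before relying on it.

```python
def city_query(cities, country="ca"):
    """Build an inventor-city query such as 'ic/waterloo AND icn/ca'."""
    if not cities:
        raise ValueError("at least one city is required")
    parts = []
    for city in cities:
        # Assumption: multi-word names need to be quoted as a phrase.
        term = f'"{city}"' if " " in city else city
        parts.append(f"ic/{term}")
    if len(parts) == 1:
        city_clause = parts[0]
    else:
        city_clause = "(" + " OR ".join(parts) + ")"
    return f"{city_clause} AND icn/{country}"


print(city_query(["waterloo"]))
print(city_query(["waterloo", "st jacobs", "breslau"]))
```

Which surrounding places to include for each CMA, and whether to limit by patent date or application date, remain judgment calls as noted above.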

2. Statistics Canada does not have any information on the overall number of patents, either at the national level or by CMA. However, the Canadian Patent Database is a public database at Industry Canada, from which you may be able to extract this information. See http://cipo.gc.ca.

We do have information on the number of Canadian patents issued to Canadian universities and research hospitals for 2003 but due to confidentiality, it is only available by province/region, not by CMA. See p. 30 of the document below.
www.statcan.ca/english/research/88F0006XIE/88F0006XIE2005018.pdf

Information on the number of US patents issued to the larger Canadian universities is available by institution from the AUTM survey. See http://www.autm.net.

'Daytime population' and 'Night time population' definitions

Question

I have done some searching on the 2001 census website, but cannot find definitions of what is meant by 'daytime population' versus 'night time population', as illustrated in the maps at:
http://geodepot.statcan.ca/Diss/Maps/ThematicMaps/placeofwork_e.cfm

Am I correct in guessing that:

Daytime population = (in the place of work tables) all those who worked full/part time with usual place of work in the CT/CSD (by place of work status) + those who worked at home (by place of work status) with place of residence in the CT/CSD + those who did not work in 2000 but with place of residence in the CT/CSD (place of work status)

Night time population = population of the CT/CSD of residence

Answers and Responses

1. In short, night time population = place of residence and daytime population = place of work.

2. For those who may be interested to see these maps, thematic maps are available on
http://geodepot.statcan.ca/Diss/Maps/ThematicMaps/Index_e.cfm
for some Census Metropolitan Areas (the largest: Montreal, Toronto, Vancouver, etc.)

Annual Survey of Travel Arrangements

Question

I have a researcher who has found data from the Annual Survey of Travel Arrangements available on CANSIM (350002) and is now looking for more detailed geographic info - down to CMA, if possible. I figure we're not going to see this but thought I'd throw out the question to see what other sources of information people have used in the past.

Items the researcher is seeking include:
- services offered by businesses within the industry (hotels...)
- occupancy rates
- client-based packages they offer
- revenue (?)

Answer

1. Statistics Canada does not offer that level of detail. Perhaps provincial tourist bureaus could be of assistance?

CANSIM via E-STAT & CHASS

Question

I have a student who is seeking current data from CANSIM and wondering why CANSIM via CHASS is not as user-friendly as CANSIM via E-STAT. In particular, they are frustrated by the inability to use 'dimensions' in CHASS to define in step-wise fashion the table they want to see. E-STAT is not up-to-date enough for this user... but the challenge of using CHASS to get more current data is proving difficult. Does anyone know if CHASS is thinking about upgrading their interface?

Answers and Responses

1. I understand the frustration. However, I am also told that CHASS have been working on a revised interface to the CHASS database, one which does take advantage of the dimensions structure of the new CANSIM database format. At present, part of the problem is that the weekly CANSIM feed that CHASS receives does not contain the dimension labels, and so as the interface now stands, instead of "Province", "Age groups", "Commodities" as dimension headers, the best they can provide is "Dimension 1", "Dimension 2", etc., which is not nearly so informative. I have seen the new interface, but without substantive dimension labels, it's a lot less than ideal. I too hope that this problem can be resolved quickly. The current CHASS interface is sort of liveable with small tables with not too many series in them, but with the huge tables now in CANSIM, it is very difficult.

2. I appreciate the problems with CHASS, but this has been going on since CANSIM II emerged. Our strategy is to search on E-STAT and if there's nothing up-to-date there, go to StatCan's CANSIM and search there. Just short of paying for the data, we advise our folk to write down the v numbers and then go to CHASS.

Very few of our users need to go past step 1; virtually none on Sept. 1 of any calendar year as that's when E-STAT is cut. But for the others, we try to keep them as far away from CHASS as we can until they have at least the v#.

3. CHASS suffers from the fact that it was the very first implementation of an Internet interface (non-Web and then Web) to CANSIM, or what became later CANSIM I, a year or so before STC had one of its own. When CANSIM II emerged, we chose to retain the format (for comparability reasons), while STC moved to a different (multi-dimensional) format.

We will have some form of dimensions by May/June 2006 - the exact implementation depending very much on the format of the data feed that we get weekly from STC. We do have an internal interface right now that uses dimensions - but it is not perfect. So, bear with us for 2-3 more months.

4. CHASS is a secondary distributor of CANSIM data. This means that CHASS pays Statistics Canada for the data and redistributes it at a charge.

Your suggestion of IP recognition is not viable as DLI institutions never received free access to the CANSIM database. The only free access through Statistics Canada is through E-STAT. Otherwise, you can pay CHASS to access up-to-date series and deal with their interface until it is updated.

Thursday, February 2, 2006

CA data for 1971

Question

A researcher is gathering CSD and CA age & sex population data for various BC communities going back to 1971. The only thing we seem to be missing at this point is 1971 CA data for the Terrace CA. Can anyone tell me if this exists anywhere and, if it does not exist, why? I am thinking that there was not a CA for Terrace until the following (1976) census?

Answers and Responses

1. As far as I can tell, in 1971 Terrace was CA number 152, see:
http://www.chass.utoronto.ca/datalib/other/76_01_CMACA_list.xls
and http://prod.library.utoronto.ca:8090/datalib/codebooks/c/cc71/doc/cmacode.txt
which was keypunched straight from the 1971 census aggregate tables documentation.

2. Unfortunately, our Census and Geography Divisions came up with a different answer:

From the information I have been able to find Terrace was not a CA in 1971 but it was in 1976. In 1971 in the publication cat #92-708 Table 9 lists the CA's with a population of 25,000 and over and Terrace does not appear. In 1976 in publication 92-806 Table 6 it does appear as a CA.

I did find the 1971 Census Place Name Reference List and the population of Terrace, BC SGC type SD (census subdivision) for 1971 was 9991. There is also a second entry for Terrace, BC with the SGC type PL (unincorporated place) with a population of 7820.

3. The DLI now has both the data and the metadata for the file requested (BST B1DEMB01, Table 2). You can find them on the DLI FTP site at the following address: /ftp/dli/census/1971/bst71. The record layout is found in the folder called "Record Layout".

Education Survey

Question

I have a researcher who is in the process of putting a proposal together, but has some questions regarding availability of some surveys:

1 - University Student Information System - which I believe is now called the Enhanced Student Information System

2 - Post-Secondary Education Participation Survey (PEPS)

3 - NGS2000 - will there be another round of this survey in the future?

Answers and Responses

1. Statistics Canada response:

1) ESIS is not available to the DLI. If your user wants access, it will be a custom tabulation from the division at a cost (unless he/she can find the data in a report).

2) PEPS is not available to the DLI. If your user wants access, it will be a custom tabulation from the division at a cost (unless he/she can find the data in a report).

3) NGS2000 microdata should be released by the end of the 2006 fiscal year, but this is very tentative and the release date will more than likely be delayed.

2. These are the titles listed as being part of the RDC:
http://www.statcan.ca/english/rdc/whatdata.htm

I did not see any mention of PEPS and ESIS. The question would best be answered by the RDC program itself though
(http://www.statcan.ca/english/rdc/network.htm).

3. The NGS2000 PUMF is hopefully going to be released by the end of the fiscal year. The 5-year follow-up to the 2000 graduates is ongoing and will be released. When is unknown at this time, but please tell your client that NGS is an ongoing survey and data is released periodically.

Thematic Search Tool (TST)

Question

Is there a reason why access to the thematic search tool is now restricted? It's an extremely valuable resource for finding older files.

Answers and Responses

1. I asked this question a while ago and the division advised that the tool is not being kept up-to-date and that users should be using the IMDB (Integrated Metadatabase).

2. When the IMDB becomes as useful as the TST, we will certainly use it. But the TST is the ONLY resource at StatCan that will allow users to look at surveys in that level of detail. While it is not being kept up to date, it is still extremely useful. A caveat as to its coverage should be enough cya for even STC. Rendering a useful resource useless just doesn't make sense.

3. The difference between the thematic search tool and the IMDB is like the difference between searching the library's catalogue and searching a database of journal articles. Sure, you can find out that Stats Can has a survey called SLID in year XXXX, but only the Thematic Search Tool will tell you that that survey has a question with, e.g., the word "daycare" (I'm pulling stuff out of memory here, but I think that's about right).

Unless the IMDB can provide searchable question text in all surveys, it is no replacement for the TST.

4. A final point is that the IMDB is NOT retrospective, while the TST is. Thus much of the material in the Thematic Search Tool is not going to be found in the IMDB. We can use the IMDB for current stuff, but the TST is invaluable for the historic files.