Friday, January 30, 2009
Looking back through the DLI-list emails - there was talk about a PUMF for
the NGS2005 in June 2007 - then request for info regarding a joint 2000-2005
PUMF in January 2008.
Wondering whether we could get an update on this file? I have a faculty
member looking to use the 2005 data if it's available.
Data from the NGS 2007 (Class of 2005) are currently available by request
only. Here is the associated Daily release.
There is no PUMF yet but I will ask the author division if there are plans
to produce one. If there are, I will ask for an estimated release date. I
noticed that in the past, they have provided us with data tables as well so
I will ask about getting some for the DLI collection.
On Dec. 9, 2008, I announced the availability of the PUMF for the Follow-Up
of Graduates Survey 2005 (Class of 2000).
I am 99% certain that this PUMF includes some of the NGS 2002 (Class of
2000) data but I will confirm that with the Division.
I will follow up on all of this as soon as I have the full story.
Regarding the NGS, I have confirmed the following with the author division:
1. For the NGS 2007 (Class of 2005), data have been released (master files only), which means that custom requests can be produced at a cost. An analysis report will be released sometime early this year. From that, the division may produce some tables; however, nothing is confirmed yet.
2. There is no PUMF for the NGS 2007 and at this point they are not discussing it.
3. The NGS 2005 (class of 2000: follow-up) graduates PUMF that was just released does include some variables from both the NGS and the FOG. If you look at the codebook, the first page (table of contents) tells you what variables come from NGS and FOG.
I hope this answers all your questions. I will keep you posted if I hear any updates regarding plans for an NGS 2007 PUMF or data tables.
Research Data Strategy Working Group addressing gaps in data stewardship.
The announcement is provided below and includes a link to the report.
This report is to some extent an update to the 1996 publication, "Data Policy and
Barriers to Data Access in Canada: Issues for Global Change Research," which I view as
the quintessential benchmark against which
data stewardship progress in Canada should be measured.
Table 2 on page 17 shows a lot of work remains and that progress has
been spotty since the 1996 report. However, one statement on this
page should catch the eyes of those associated with the DLI: "There
are considerable variations across disciplines, which contributes to
the complexity of these issues. The social sciences are ahead in
some areas because of the international data documentation standard
DDI, which has been embraced in Canada by the Data Liberation
The Digging into Data Challenge is an international grant competition
sponsored by four leading research agencies, the Joint Information
Systems Committee (JISC) from the United Kingdom, the National Endowment
for the Humanities (NEH) from the United States, the National Science
Foundation (NSF) from the United States, and the Social Sciences and
Humanities Research Council (SSHRC) from Canada.
For more details:
With great excitement, a grad student here just introduced me to the
Has everyone else heard about this already, except me? From all appearances
it seems that this database makes possible direct comparisons of the data
for CMAs and CSDs for the 1996, 2001 and 2006 Censuses.
The grad student is part of a collaborative research group here in Montreal
with professors from Concordia and U de M who are extremely interested in the
potential of this data. They have the following
1. Is it the case that the boundaries for the included geographies are
adjusted -- (e.g. to those of 2006?)
Or is this just a compilation of census data, for places "with the same
name", just using numbers directly as generated in the original census (i.e.
disregarding any changes of boundaries)?
2. Is there an ASCII version of this data set to which researchers can have
access? [The grad student has been downloading sections into Excel, but
this is extremely tedious, as their project involves all of Canada.]
It appears that Agriculture Canada was involved. Perhaps we should be
Thanks for any enlightenment anyone has on this project.
You are right, this is a good source of info. A lot of work was done by the
Rural Secretariat to come up with this site and, if you look below under
DATA SOURCES AND LICENSE, this describes the use that can be made of these
WHAT IS CID:
The Community Information Database (CID) is a free internet-based resource
developed to provide communities, researchers, and governments with access
to consistent and reliable socio-economic and demographic data and
information for all communities across Canada.
The CID was developed by the Rural Secretariat in collaboration with the
provinces and territories, other government departments, and community
The CID provides:
Access to over 500 pieces of data about your community or region, including:
population, education, income, employment, families, and much more
An interactive map for displaying and accessing data
A tool for you to learn more about rural Canada
CONTACT US: (any questions concerning this site should be directed to the
Rural Research and Analysis Unit:
The Rural Secretariat is a focal point for the Government of Canada to work
in partnership with Canadians in rural and remote areas to build strong,
dynamic communities. Located in Agriculture and Agri-Food Canada, it:
provides leadership and coordination for the Canadian Rural Partnership;
facilitates liaison and creation of partnerships around rural issues and
promotes dialogue between rural stakeholders and the federal government; and
develops tools and information for rural Canadians.
The Community Information Database is an evolving tool designed to provide
rural Canadians with easily accessible economic, demographic, and social
data at the community level. We welcome your thoughts on how to make the
Community Information Database more responsive to your needs.
Please forward any questions you may have about us or the site, to:
CID-BDC@agr.gc.ca or write us:
Rural Research and Analysis Unit
Rural Secretariat, Agriculture and Agri-Food Canada
560 Rochester Street, Tower 1, 6th Floor
DATA SOURCES AND LICENSE:
Most of the data and indicators in the Community Information Database come
from Statistics Canada’s 1996, 2001, and 2006 Census of Population.
Statistics Canada information is used with the permission of Statistics
Canada. Users are forbidden to copy the data and disseminate them, in an
original or modified form, for commercial purposes, without the expressed
permission of Statistics Canada.
Information on the availability of the wide range of data from Statistics
Canada can be obtained from Statistics Canada's Regional Offices, their Web
site at http://www.statcan.ca, and their toll-free access number
"Population estimates for census subdivisions are produced on a cost recovery basis. These are not part of any products such as publications, CD-ROMs offered by the division. This information could be made available at a cost but you need to take into account the following:
- the data would represent preliminary figures available at the time of release. No revisions would be applied; therefore the total would not sum to the provincial total for the same year based on the same geography. I'll try to explain:
Updating population estimates between censuses entails the use of data from administrative files or surveys. The quality of population estimates therefore depends on the availability of a number of administrative data files that are provided to Statistics Canada by Canadian and foreign government departments. Since some components are not available until several months after the reference date, three kinds of postcensal estimates are produced: preliminary postcensal (PP), updated postcensal (PR) and final postcensal (PD). The time lag between the reference date and the release date is three to four months for preliminary estimates and two to three years for final estimates. Though it requires more vigilance on the part of users, the production of three successive series of postcensal estimates is the strategy that best satisfies the need for both timeliness and accuracy of the estimates.
The provincial estimates as well as the CMA, CD and ER estimates would have gone through the process of revisions but not the CSD estimates based on the 1996 Census.
Here are the cost quotes:
Total population estimates for all CSDs in Ontario for the years 1998 to 2001 based on the 1996 Census adjusted for net census undercoverage: $3,790 + applicable taxes.
Population estimates by age and sex for all CSDs in Ontario for the years 1998 to 2001 based on the 1996 Census adjusted for net census undercoverage: $4,500 + applicable taxes.
This request could not be produced until July 2009."
If your client is interested in purchasing the data, please let me know.
Our team was working on the KIDS Cycle 1, Primary file and discovered a
The Data List syntax for the primary file, cycle 1 contains an error for
variable AD1CS06. The syntax locates this variable in column 1603. However,
the variable is a two column variable and the correct location in the raw
data file is columns 1602-1603. The SPSS DATA LIST
syntax statement should be amended to read: AD1CS06 1602-1603. This
correction produces the correct unweighted frequency counts for this
variable, with a single exception: there is still one case with a response
code of 99 that actually should be coded 1 in order for the counts to match
those in the codebook.
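To see why the one-column location produces wrong frequency counts, here is a minimal Python sketch of the same column arithmetic. It assumes, as in SPSS DATA LIST, that columns are numbered from 1 and that ranges are inclusive. The record and the helper `read_columns` are made up for illustration; they are not taken from the NLSCY file.

```python
def read_columns(record: str, start: int, end: int) -> str:
    """Return the text in 1-based, inclusive columns start..end of a record."""
    return record[start - 1:end]


# Made-up fixed-width record: blank except for the two-digit code "12"
# placed in columns 1602-1603 (the corrected location for AD1CS06).
record = " " * 1601 + "12" + " " * 10

wrong = read_columns(record, 1603, 1603)  # original syntax: column 1603 only
right = read_columns(record, 1602, 1603)  # corrected syntax: columns 1602-1603

print(repr(wrong), repr(right))
```

Reading only column 1603 drops the leading digit, so any two-digit code is tabulated under its last digit.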
What do we do here?
Regarding the NLSCY Cycle 1 Primary File frequency issue mentioned below,
the author division finally determined that the error is in the PUMF codebook.
Here is their full response:
"we have finished the investigation on our end and it is the pumf codebook
that is in error. There may be other errors in the pumf codebook. We are
looking into redoing the cycle 1 pumf codebook."
If you notice any additional discrepancies between the counts and the
codebook, please let me know and I will pass them along to the division.
Please note also that a revised version of the SPSS file for the NLSCY Cycle
1 Primary File is available (as announced on January 19, 2009):
This revision corrects the problems you discuss below concerning variable
With apologies for any duplication, I'd like to ask whether there will be a PUMF for the Survey on the Vitality of Official-Language Minorities, 2006 (SVOLM)?
The division responsible for the Survey on the Vitality of Official Language Minorities has confirmed that a PUMF is in process. They expect to release it by the end of March.
I have received a request to access the Financial Performance Indicators for Canadian Business from an external user from the business sector.
Normally, for a DLI product, I would simply turn down the request as this person is not associated with Concordia. But in this case the product is both a DLI and DSP item. In general, depository libraries are responsible for making DSP documents available to the public. So my question is: what rules apply for those hybrid products? DSP or DLI? Or is the answer to provide access to the data but only under strict non-commercial provisions?
1) In my view we have a real problem with downloading executable programmes. We have had the same kind of request, and I have responded to the specific questions that the public user needed answered rather than give out the whole programme. With our DLI users we have used the IDLS with EZproxy access; however, once they have the zip file on their computer, we have to rely on the restricted use document that is part of the entry protocol.
2) We have also tussled with this problem at Kwantlen. The short answer is that the DLI license takes precedence over the DSP license. There are examples in the DLI licensing case study collection.
The DSP, despite its central function of making Canadian publications available to the public through depository libraries, has strongly discouraged circulating any DSP CD-ROMs. I disagree with Tony Moren's reasons for not lending DSP CD-ROMs, as you'll see from the long chain of emails with the DSP and within Kwantlen appended to this email. At an institution like ours, where there are no public/student workstations with CD drives for on-campus use, this essentially makes them inaccessible and useless. I was therefore happy to note the DLI override.
Until recently, I would burn a copy of the frequently requested DLI CD-ROMs and place them on reserve for short-term loan. Since only Kwantlen borrowers (authorized users) can borrow reserve items, this controlled the access. I also emblazoned the jewel cases and the CDs themselves with the license info. We now offer authenticated web-based access to many DLI files through IDLS and CHASS. I am in the midst of setting up a "conditions of access" page for these services that users must accept before being allowed authenticated access to any DLI files on IDLS. I've already set it up for our CHASS Canadian Census Analyser and CANSIM subscriptions.
I hope this helps. We're quite new members of DLI and a relatively small institution, so I've been figuring this out as we go. I'd appreciate hearing how others manage these resources.
Just to muddy the waters a bit more, do we have the same level of access to this product through both programs? I have a fuzzy recollection of the DSP version of this product providing only Canada level data, whereas the DLI version allows province-level data…
1) I checked through the listserv archives and your memory served you well. This issue came up in 2005, and it was determined that there were differences between the DLI and DSP versions. According to the message I found, the DSP version only contains national data, whereas the DLI version includes data at the provincial level. This would mean that the provincial level data would be subject to the DLI rules.
DLI contacts can access the full-text of the post at the following link:
The message also notes that the distinction between the DSP and DLI versions is unclear due to the way the product was registered. There is also no way to determine this distinction from any of the documentation.
We would like to investigate this issue further to make sure that we have the full story. We will be sure to keep you posted of what we find out.
2) This discussion on license for the Financial Performance Indicators for Canadian Business has raised some very interesting issues and I think, the DSP vs the DLI license has to be addressed by looking at what community this was designed for. In some cases, this can be confusing but if you go back to the basics, this becomes easier. One must also keep some special circumstances in mind and Chris raised a good example of this taking into account her own reality and what she needs to function with.
I'll take the census for example and the public's perspective:
1- There are data accessible from the StatCan Internet site for everyone. This means that I can go home and access that data directly without going through any intermediary. Census refers to this as level one access.
2- There are data accessible only through the DSP libraries. This means that in order to access these data, I have to leave home and go to the public library to access that level of information. Census refers to this as level two access. This is the case where you cannot distribute data or CD-ROMs, and these are only accessible to the public when they use a computer at a public library. In order to identify who has access, Census was provided with all the IP addresses from DSP libraries, and this allows them to give access to that level of data with the provision that DSP libraries provide access to these data through a computer on site. Some libraries have dedicated a computer or some computers for accessing data according to the DSP agreement.
3- There are data that are only available through the ROs, and for anyone from the public to have access to these data, they have to buy them. Census refers to this as level three data.
Now, let's look at the DLI's perspective:
The DLI license allows access to the data for research and training purposes at those institutions which are part of the DLI. Author divisions, including Census, have agreed to provide the DLI with standard electronic products. What this means for Census is that the DLI has access to all the data they produce, excluding the data produced by the Census Custom Services. From the institution's perspective, there are different ways of making the data available to your constituency. You can provide the data on a CD, as Chris is doing, which is fine. Other institutions have put in place a system that allows their users to access the DLI information through a server which controls access, and this is also fine. Others download the data onto a CD and provide that CD to the user, which is also fine. At some institutions where students have laptops, students in some classes download files onto their laptops, and professors remind them about the use of these data within the DLI license; this is also fine. We have to move with the technology and accommodate users within the environment they are working in.
Overall, I don't think it's one license overriding another but these are licenses directed at providing access to users from different communities.
The problem then is when a smaller institution does not have all the equipment (numbers of computers) they can dedicate or restrict to allow the proper access according to the licenses aimed at different communities. They realistically have to choose, and this is what Chris has done. (By the way, I greatly appreciate your example, Chris, because it's a reality check and I completely understand and support your position.)
Now for the Financial Performance Indicators for Canadian Business
The DLI version includes data for the provinces, territories and regions, whereas the DSP version only includes national level data. If you look at the "Price note" for the FPICB in the online catalogue, you will see this distinction in terms of price, but not in terms of DLI/DSP availability.
Furthermore, since both versions are catalogued under the same product number, the "Information for Libraries" page does not help users determine which version is DLI and which version is DSP:
Therefore, we have decided to include the following note on the English and French FPICB pages on the DLI web site and FTP site so that our DLI users will know which product is subject to the DLI licensing rules:
Important: Please note that the version of Financial Performance Indicators for Canadian Business available through the DLI includes data for the provinces, territories and regions. However, the version of FPICB available through the DSP includes data at the national level only.
Monday, January 26, 2009
I have a researcher who would really like to get her hands on the Maternity Experiences Survey (2007). I see that there doesn’t seem to be a PUMF listed on the DLI site. Will the microdata file be going into an RDC? Is there any other way she might get access?
The author division confirmed that there is no PUMF for the Maternity Experiences Survey (2007). However, the microdata is available through the Research Data Centres.
Alternatively, clients can request custom tabulations (costs start at $400).
A student here is looking for information on EAs from the 1981 Census. Ideally, she would like a boundary file that covers the Skeena FED (59021). The EAs she is looking for are 101-116.
I have checked the geography directory on the FTP site, and, among the 1981 files, there seems to be an EA file for Chicoutimi-Jonquière. The only boundary files that seem to be for the entire country are CD, CMA, CMA/CA, CSD, and PROV.
Am I missing something? If not, is it possible to get a boundary file of EAs for either BC or specifically for the Skeena FED?
If an EA boundary file is not possible, would it be possible for someone to scan the paper map, or at least the appropriate section of the paper map, and send it electronically or by fax?
Thanks in advance for any assistance with this.
The EA file for Chicoutimi was a custom tabulation that was done upon request. It was then given to the DLI.
I checked the DLI listserv archive and found a related post from April 2007 that confirms that "Statistics Canada never produced EA boundary files for the 1981 Census". DLI members can read the full thread at the following link:
In any case, I will ask a geography consultant if they can provide you with a copy of the paper map and get back to you.
Friday, January 23, 2009
Can someone tell me if an SPSS syntax file has been created for the March 2008 PCCF? Or will it be created soon?
I can confirm that the DLI does not have any SPSS syntax files for the PCCF,
nor do we anticipate creating any. However, as Laine suggests below, the
SPSS file you need is available on her site at the links provided below. In
fact, her site offers SPSS syntax files for older versions of the PCCF as
We are having difficulties with the latest PCCF file for Sept 2008.
Several hundred postal codes (e.g., A1A5K7, A1A5K8 ...) have more than one record labeled SLI=1 (1 = the best or only record for the postal code).
There appears to be a coding problem for the variable Single Link Indicator.
I deleted all postal codes with the SLI=0 as the researchers need to deal with unique postal codes.
If this is validated as a coding problem, could a new PCCF file be issued for Sept 2008 please?
1) I talked to one of Statistics Canada's PCCF experts, and he says that the September 2008 file appears to be correct. He provided me with the following, detailed explanation:
"In principle, users need to understand that there can be more than one record for a given postal code where SLI=1, if the postal code has been retired and then "rebirthed", or if for any reason the Delivery Mode Type (DMT) has changed. There should, however, be only one record with SLI=1 for a given combination of postal code and DMT. If a postal code has a DMT of Z (retired) and any other DMT, you would want to keep only the non-retired records, which should eliminate most of the duplicate SLI records for a given postal code. But if there is only a retired record for the postal code, keep it.
For the two postal codes you mentioned, the above explanation sufficed to show that there really wasn't a problem, once the true nature of the SLI field is better understood. The same thing also occurred on previous versions of the PCCF.
However, it may or may not be appropriate to force only 1 record per postal code (using the SLI), as many postal codes serve a very wide area spanning multiple blocks, DAs and CSDs."
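One way to check the rule quoted above (at most one SLI=1 record per combination of postal code and DMT) is to count SLI=1 records by that pair. The following is only a sketch in Python. The records, including the DMT values attached to each postal code, are invented for illustration, and `sli_violations` is a hypothetical helper, not part of any Statistics Canada tool.

```python
from collections import Counter

# Invented records for illustration only: (postal_code, DMT, SLI).
records = [
    ("A1A5K7", "Z", 1),  # retired record
    ("A1A5K7", "A", 1),  # "rebirthed" record: a second SLI=1 is expected here
    ("A1A5K7", "A", 0),
    ("B2B2B2", "B", 1),
    ("B2B2B2", "B", 1),  # two SLI=1 records for the same DMT: a real problem
]


def sli_violations(rows):
    """Return (postal_code, DMT) pairs that have more than one SLI=1 record."""
    counts = Counter((pc, dmt) for pc, dmt, sli in rows if sli == 1)
    return sorted(key for key, n in counts.items() if n > 1)


violations = sli_violations(records)
print(violations)
```

Under this check, a postal code with one retired and one active SLI=1 record is not flagged; only a repeated SLI=1 within the same DMT is.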
2) Further to the instructions from the PCCF folks at Statistics Canada on how to deal with non-unique postal codes, here are the SPSS steps I used to get rid of retired (DMT=Z) duplicate postal codes in SPSS without getting rid of unique retired postal codes…(nod now if you are still awake)…
1. Create a new variable (ID variable) :
- Using the menu: select TRANSFORM>COMPUTE then enter id in the Target Variable text box and $casenum in the Numeric Expression text box. Click OK.(If nothing happens, select TRANSFORM>RUN PENDING TRANSFORM)
2. Pull out the duplicate Postal codes to the top of the file:
- Using the menu: DATA>IDENTIFY DUPLICATE CASES then enter the Postal code variable in “DEFINE MATCHING CASES BY” and enter DMT in SORT WITHIN MATCHING GROUP BY and then select “MOVE MATCHING CASES TO THE TOP OF THE FILE” .
3. Now we’re ready to get rid of the non-unique postal codes. Please note my duplicate postal codes end at ID=11,849:
- Using the menu: DATA>SELECT CASES, choose the Select If radio button, then click on the IF button and enter the syntax to identify the cases you want to keep. This is the syntax I used:
id >= 11850 or (id < 11850 and DMT ~="Z").
Then click on CONTINUE, then select OUTPUT: DELETE UNSELECTED CASES (Check first by filtering unselected cases and viewing that you are only going to delete duplicate retired postal codes).
This seems to work, but if anyone has a better way to do this, please let me know.
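For anyone working outside SPSS, the same rule can be sketched in a few lines of Python. This is an illustration only: the dictionary keys `postal_code` and `dmt` and the sample records are assumptions, not actual PCCF field names or data. It follows the Statistics Canada advice quoted earlier (drop retired records only when the postal code also has a non-retired record) and does not depend on a hard-coded case number such as 11850.

```python
def drop_retired_duplicates(rows):
    """Drop DMT == 'Z' records for postal codes that also have an active record."""
    # Postal codes with at least one non-retired record.
    has_active = {r["postal_code"] for r in rows if r["dmt"] != "Z"}
    # Keep every active record; keep a retired record only if nothing else exists.
    return [
        r for r in rows
        if r["dmt"] != "Z" or r["postal_code"] not in has_active
    ]


# Invented sample records.
sample = [
    {"postal_code": "A1A5K7", "dmt": "Z"},  # retired, but an active record exists
    {"postal_code": "A1A5K7", "dmt": "A"},
    {"postal_code": "C3C3C3", "dmt": "Z"},  # only a retired record: kept
]

cleaned = drop_retired_duplicates(sample)
print(cleaned)
```

One difference from the menu steps: a postal code whose only records are retired keeps all of them here, even if there are several, whereas the SELECT IF expression above would delete retired records that appear in the duplicates block.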
My apologies for asking this as I am sure the answer is somewhere in emails already sent. A Master’s student is inquiring:
1) The CCHS 2007 PUMF is reported to be released in Summer 2009. When will the PUMF be released to the DLI?
2) When will the 2006 Census PUMF(s) be released to the DLI? The DLI Update #6 email, dated Jan 8, 2009, states:
PUMFs (single and hierarchical files) – The single file (2.7% of population) is scheduled to be released during the summer 2009 and the hierarchical (1% of population) is scheduled to be released during the summer 2010. From 1971 to 2001, Individual, Family and Household PUMF files were produced. The plan is to produce the same Individual file again but not to produce the Family or Household files. These will be replaced with the hierarchical file which links the three universes (Individual, Family and Household).
Both PUMFs would be released to the DLI shortly after their official Statistics Canada release in the Daily, and the exact dates for the official release have not been determined yet (hence "summer"). In general, it takes us between a day and a week to get a copy of the PUMF from the division after the official release. Once we have the PUMF, Jackie posts it as soon as possible and the announcement is made on the listserv.
Keep in mind that the expected release dates you mention below are quite vague and delays can happen for any number of reasons. For major releases, we usually contact the division just before the expected release time for an update and keep you informed of any changes via the listserv. For example, I will contact Health Statistics Division sometime in June 2009 for an update on the CCHS 2007 PUMF release date.
No doubt this should be obvious to me, but I don't know what the
difference is between the two versions. I've looked at guides for both
of them and they are almost identical. What does SP3 mean? I assume it
is a version reference but then it should say version...
1) SP3 = Service Pack 3
It's a slightly more up-to-date version of 7.0 that corrects some
bugs without affecting functionality.
2) From the "what's new" document that comes with the program, it
seems that SP3 accumulates the following changes to version 7.0:
- new feature: Cell coloring for footnotes
- bug fixes:
-- Under certain conditions, tables containing footnotes
may display incorrect data.
-- Errors when creating and opening extracts with more
than 999 fields.
On 24 May 2007 an update was sent to explain the absence of newer versions of SARTRE:
"The 2004 and 2005 data has not been processed at this time for lack of funding. SARTRE is a cost-recovery program and the division is currently looking for users that would be willing to finance ($40,000 for two years) this project on a long-term basis. They plan to provide us with updates as developments occur in their search for long term funding for SARTRE. "
Is there any news on this product? UBC researchers are interested in finding out.
The division used to produce a series of custom tables for outside clients, and the DLI was contributing to the production of these tables. Unfortunately, the outside clients decided they did not need these tables anymore, and the cost of continuing to buy these data was too expensive for the DLI alone. Since this program is fully cost-recovery, the division did try to find other potential users for this information in order to spread the costs but was not successful. This is why they sent that announcement in 2007. The DLI was contributing $5,000 a year for these tables. Since then, no other potential users have been identified, which is why we have not been able to get any other updates.
And here I had understood that the DLI subscription fees are no longer
being used to buy data products from other STC departments?!?
Presumably a misunderstanding on my part.
This is the case now. The Distributive Trade Division stopped producing the SARTRE data in 2007, and Census and Income Division agreed that we would stop transferring money towards the data last fiscal year. Since April 2008, the DLI has no longer been transferring money to divisions for data.
I guess that what you are saying is this is now essentially a custom product. If someone pays, it is produced, and if they don't, it is not.
1) It was a custom project for which a few customers shared the cost, but since they decided not to buy these data anymore, the division dismantled the project and the resources are no longer there. To re-create this side project, they would have to secure resources on a longer-term basis because they can't use the survey's resources for it. The person who did this work has been reassigned to another project. So it is not simply a question of someone paying and the division producing it; I think they would consider doing it if there were a longer-term commitment.
Hope this helps clarify the situation.
2) The DLI has not paid for standard products for some time now. SARTRE was not a standard product; it was basically a special product that was created when interested users were willing to pay for the entire cost of production. A number of years ago they almost cancelled production of this product because they had lost some of their paying clients, so I (DLI) agreed to put up $5,000 to help. They also looked for other buyers. They continued to make this product for a while. Now it looks like all of their paying clients have bailed out. Since this product is cost-recovery and not funded under the regular division budget, I guess they had no choice but to cancel production. Too bad, because this was a well-liked and well-used product.
Last Friday just at closing time a student called, wanting access to Income Trends in Canada for 2001. He needed that year to extract data in 2001 dollars. Unfortunately the online 2001 tables do not work, they all lead to a "File not found" error message. Fortunately I still have the CD-ROM and was able to send him the tables he needed.
I tried a sample of other years and the tables were fine. Can you have someone look into the faulty links to the 2001 edition?
This is the answer we received from the division:
The current link should be used because of the revisions and the CPI should be used to convert the dollars.
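The CPI conversion the division mentions is a simple ratio: multiply the amount by the CPI of the target year and divide by the CPI of the year the dollars are currently expressed in. Here is a minimal Python sketch; the index values are placeholders and not actual CPI figures, so real values would need to be taken from Statistics Canada's published CPI series.

```python
def convert_dollars(amount: float, cpi_from: float, cpi_to: float) -> float:
    """Re-express an amount from one year's dollars in another year's dollars."""
    return amount * cpi_to / cpi_from


# Placeholder index values, NOT actual CPI figures.
cpi_table_year = 120.0  # CPI for the year the current tables are expressed in
cpi_2001 = 100.0        # CPI for 2001, the year the student needs

converted = convert_dollars(50000.0, cpi_from=cpi_table_year, cpi_to=cpi_2001)
print(round(converted, 2))
```

With these placeholder indexes, $50,000 in the tables' dollars corresponds to about $41,666.67 in 2001 dollars.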
We can no longer link to these guides (1) or our copies (2) are out of date. In follow-up, I was unable
to find the following user tutorials and guides from Statistics Canada.
Would it still be possible to access current versions?
(1) Formerly on the StatCan Web site:
Initiation aux tableaux de Beyond 20/20
Getting Started with Beyond 20/20 Tables
Manipuler des dimensions en Beyond 20/20
Manipulating Dimensions in Beyond 20/20
(2) Formerly available from the DLI FTP site as I recall...
Beyond 20/20 «Browser» : Guide d'initiation rapide (InitiationRapide.pdf)
Beyond 20/20 Browser: QuickStart Guide(QuickStart.pdf)
Beyond 20/20 «Browser» : Guide d'initiation (BrowserFra.pdf)
Beyond 20/20 Browser: User Guide (BrowserEng.pdf)
For question 2.
The following guides should be in your Beyond 20/20 browser directory:
c:\Program Files\Beyond 2020\Professional Browser\Document
I have also placed a copy of the files below on the FTP site:
(2) Formerly available from the DLI FTP site as I recall...
Beyond 20/20 «Browser» : Guide d'initiation rapide (InitiationRapide.pdf)
Beyond 20/20 Browser: QuickStart Guide(QuickStart.pdf)
Beyond 20/20 «Browser» : Guide d'initiation (BrowserFra.pdf)
Beyond 20/20 Browser: User Guide (BrowserEng.pdf)
For question (1) I will forward it to the division.
Friday, January 16, 2009
article, "The Real Data Liberation Initiative," in response to a
data-warehousing vendor's "data liberation manifesto," a marketing
ploy aimed at Oracle".
Data Liberators of the world... unite and take over!
I have a graduate student who wants all possible profile data for the community of Houston (BC) for 1971-1991. I think we’re OK for everything but 1971. If the figures exist in print, we should have them on microfiche. I checked the 1971 catalogue on Laine’s site at http://prod.library.utoronto.ca:8090/datalib/codebooks/c/cc71/cen71pub.pdf, and I’m not seeing any Stats Can catalogue numbers that look promising in terms of profile data for census subdivisions. Can anyone lead me in the right direction with this?
Of course, if we have anything electronic, that would be even better…
1) I don't think that you will find a complete profile of Houston in print format, since it is a small community. In the 1971 print products, many variables are given for the larger census subdivisions (CSDs) only.
However, you could extract more data from the electronic Basic Summary Tabulations (BSTs) at the enumeration area (EA) level. You can find a description of the available variables and related files on Laine's site at
Most of these files (or all of them?) are available on the DLI ftp site.
Since the data in these tables are given at the EA level, the disadvantage is that you will have to extract the Houston EA data (EAs 301-304 in the 1971 census) from the tables and aggregate them to obtain the complete census subdivision (CSD).
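The extract-and-aggregate step might look something like the sketch below. The row layout, the variable names, and all of the figures are invented for illustration; only the EA range 301-304 comes from the message above. It also assumes the rows have already been limited to the relevant area, since EA numbers may repeat elsewhere in the file.

```python
from collections import defaultdict

HOUSTON_EAS = {301, 302, 303, 304}  # EA range quoted in the message above


def aggregate_eas(rows, ea_codes):
    """Sum every numeric variable over the rows whose EA code is in ea_codes."""
    totals = defaultdict(int)
    for row in rows:
        if row["ea"] in ea_codes:
            for name, value in row.items():
                if name != "ea":
                    totals[name] += value
    return dict(totals)


# Invented sample rows (hypothetical layout and values).
sample_rows = [
    {"ea": 300, "population": 410, "dwellings": 120},
    {"ea": 301, "population": 520, "dwellings": 150},
    {"ea": 302, "population": 480, "dwellings": 140},
    {"ea": 303, "population": 610, "dwellings": 170},
    {"ea": 304, "population": 390, "dwellings": 110},
]

houston = aggregate_eas(sample_rows, HOUSTON_EAS)
print(houston)
```

Because EA-level cells are subject to suppression and random rounding, totals built this way will not necessarily match a published figure for the municipality.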
Last year, I gave a presentation on this topic at the DLI training session in Sherbrooke. If you want, I can send you the PowerPoint presentation I prepared (in French only, but it could help) in a separate email. I should make a final revision of this document and post it to the training repository!
2) Because of the problems of area suppression and random rounding, I would
suggest using the municipality level (CSD) files from the 1971 census,
rather than the EA-level files.
Houston BC is prov=59 cd=51 csd=34 in 1971:
Yes, I would love to see your PowerPoint presentation.
Also, what document did you use to determine the enumeration code range for Houston for 1971?
You will find the cen71-offical-list files (PDF files) on the DLI FTP site under census/1971/doc (I first looked for them in the geography directory, but they are really under census).
I've just read Laine's message. Yes, some very small numbers may be very difficult (or impossible) to interpret, but it may sometimes be useful to know how to get data from EAs.
Today was 'live-demo' day for a group of PhD candidates in Social Work, and I was taking them through all of the wonderful StatCan Resources. When I tried to demo "CANSIM via EStat", by clicking on a subject from the right-hand column, I received this message:
ERROR 53: OPENING F:\WWW\ROOT\CII\CII_SUBJ.TPE
They were not impressed....
Is this local to McGill -- or are others getting this message?
I was able to do a search, but that interface is now a 'shadow' of its former self ....
I will be doing more demos soon, so I hope for a quick fix!
Here is the E-STAT manager's explanation as to why this problem occurs:
"It occurs because users have bookmarked an address which is now being redirected. The redirects function very well for static pages; however, there are instances when one tries to generate dynamic pages (such as CANSIM by subject) where the redirect results in the error described below."
The manager suggests that you avoid redirection altogether by accessing E-STAT from the new address:
Others reported this same problem back in November. On November 26, I sent out an email advising members to change their bookmarks for E-STAT to the new addresses to avoid the problem. I have attached that email to this message for your reference.
Please let me know if this new address does not solve the problem and I will investigate further.
I have a researcher who needs statistics on the amount of area (acres or hectares) in Canada that is being planted with genetically modified varieties of crops — especially corn, canola, soybeans. It doesn't seem to be measured by the Census of Agriculture; however, we have found references that refer to this being measured by Statistics Canada's Crops Surveys. Would it be possible to have access to these through the DLI?
An analyst in Agriculture Division informed me that data on GM crops (corn and soy only, for Quebec and Ontario only) are available online. Here is his response:
I suspect this is somewhere on the ftp site but I can't find it - so any assistance would be appreciated. We're looking for the updated Standard Classification of Goods data.
1) It is available on the Web at
2) You are right - the SCG is on the DLI FTP site:
The latest edition is for 2001, and can be found in file scg01.zip.
3) Found this on the website. Would it be what you're
Friday, January 9, 2009
I have a researcher wondering about the following:
Table 2.2 in the 2003/2004 files - do these numbers exclude foreign students? or are they included? In some of the following tables foreign is listed separately.
My contact in the Education Division confirmed that foreign students are included in the numbers for Table 2.2 of the 2003/2004 SED files.
I am looking for information about enrollments in distance education programs. Does anyone have a source for this? The places that I used to get this information don't seem to have it anymore.
I looked in the CAUT almanac, but there isn't anything. As well, the internet use surveys don't tell us whether respondents are actually enrolled, just that there is some use for course work.
I just received confirmation from the Education Division that they do not collect data on enrolments in distance education.
Is there any chance of getting the SHS 2007 files made available through the DLI? I see we have up to 2006 available.
The SHS 2007 data tables were released on Dec. 22. I have asked the author division to provide us with the tables and will let you know as soon as they are available on our sites.
In both of these files, I’ve come across a problem in variable YRHHMOVE (year household head moved into dwelling). The codebook for each of these years indicates that valid values range up to 94; the frequencies which I ran picked up cases with value 95 in the 1995 HIFE, and values 95 and 96 for the 1996 HIFE.
I presume that the documentation for this variable wasn’t modified from an earlier release, and that the valid range has increased. If not, what do codes 95 and 96 signify?
I finally received confirmation from the author division regarding variable YRHHMOVE. Your presumption is correct. The documentation hasn't been updated to reflect that the valid ranges have increased. Therefore, the 95 and 96 codes are valid and signify the year the household head moved into the dwelling.
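A frequency check of the kind described in the question could be sketched as below: it counts any values that fall outside the range the codebook documents. The sample values and the lower bound of 0 are assumptions made for illustration; only the documented maximum of 94 and the observed codes 95 and 96 come from the messages above.

```python
from collections import Counter


def undocumented_codes(values, documented_max, documented_min=0):
    """Count values lying outside the documented valid range."""
    return Counter(
        v for v in values if not documented_min <= v <= documented_max
    )


# Invented sample of YRHHMOVE values (hypothetical data).
sample_yrhhmove = [72, 88, 94, 95, 95, 96, 90]

extra = undocumented_codes(sample_yrhhmove, documented_max=94)
print(sorted(extra.items()))
```

Running a check like this against each variable's documented range is a quick way to spot documentation that was carried over unchanged from an earlier release.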
I have a faculty member looking for employment income by occupation from the 2006 Census. We found the following table, http://www.statcan.gc.ca/bsolc/olc-cel/olc-cel?lang=eng&catno=97-563-X2006063 , which gives the income by occupation, but only one occupation at a time can be chosen, and there are 720 occupations. I may just be overlooking something, but is there a table that will present the income for each occupation all at once?
I did find the numbers broken down by occupation for the 2001 census:
1) You are accessing the table from the Statistics Canada website and viewing it in HTML format, which has limited capabilities. However, that same table (97-563-X2006063) can be accessed from the DLI FTP site and viewed in Beyond 20/20 format:
Once you have obtained the table from the FTP site, you can then drag and drop in the variables of interest to create the table that you need.
2) Go to the bottom of the table and choose the IVT link; this will open the file in Beyond 20/20. You can then manipulate the table to show all occupations against income.
3) Go into IDLS (http://idls.ca), and do a file name search for 97-563-XCB2006062/97-563-XCB2006063
This will give you links to two Beyond 20/20 tables (one with national/provincial totals, and the one you had found for Census Metropolitan Areas/Census Agglomerations), from which you can grab all occupations at once.
I just received a request for the following information and wonder if anyone might have a suggestion?
I am looking for border crossing volume for the BC/American border that is broken down by specific border crossing and by day for my undergrad economic thesis. I have looked through E-STAT, but the only data I was able to find was monthly border data that was not broken down by specific crossings. My professor said that he worked with daily data for the Ontario area, so I assume it exists for BC.
I expect the student may be referring to CANSIM Table 427-0001?
Number of international travellers entering or returning to Canada, by type of transport, monthly (persons)
Definitions, data sources and methods: International Travel Survey: Frontier Counts - 5005
I can also see that we have the International Travel Survey in the DLI Collection: http://www.statcan.gc.ca/dli-ild/data-donnees/ftp/its-evi/its-evi2007-eng.htm . This looks promising?
The International Travel Survey microdata file should provide you with the data you need. According to the User Guides for the survey, the following characteristics are available on both the "Canadian Residents, Trips Abroad (United States)" and the "United States Residents, Trips to Canada" microdata files:
* Date of exit and re-entry (Day, month, year)
* Canadian border crossing at exit and at re-entry (land port, airport, seaport)
Please take a look at these and let me know if you need any additional information.
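Once the microdata are in hand, a daily tabulation by crossing might be sketched as follows. The field names (`port`, `entry_date`, `weight`), the port labels, and all values are hypothetical; the actual variable names and the appropriate weight should be taken from the User Guide.

```python
from collections import defaultdict
from datetime import date


def daily_volume_by_port(records):
    """Sum survey weights for each (port, entry date) pair."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["port"], rec["entry_date"])] += rec["weight"]
    return dict(totals)


# Invented sample records (hypothetical field names and values).
sample_records = [
    {"port": "PORT_A", "entry_date": date(2007, 7, 1), "weight": 100.0},
    {"port": "PORT_A", "entry_date": date(2007, 7, 1), "weight": 150.0},
    {"port": "PORT_B", "entry_date": date(2007, 7, 1), "weight": 80.0},
    {"port": "PORT_A", "entry_date": date(2007, 7, 2), "weight": 120.0},
]

volumes = daily_volume_by_port(sample_records)
for (port, day), total in sorted(volumes.items()):
    print(port, day.isoformat(), total)
```

Since the file is survey microdata rather than a full count, estimates for a single crossing on a single day may rest on very few records and should be treated with caution.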
Monday, January 5, 2009
A user would like to know if a release date has been set for the Census 2006 PUMFs. Also, they'd like to know if a guide is currently available and, if not, whether it will be ready before the PUMF release date. Lastly, they'd like to know what percentage of the full file the PUMF will represent.
To date, summer 2009 is the planned release date for the PUMFs. However, it's still too early to know what percentage of the full file the PUMFs will represent.
Here is a summary of our Census PUMFs.
Census 2006 Public Use Microdata Files (PUMF):
Individual PUMF file (2.7 % of population) (800,000 records)
· Same file as in the past
· Unit of analysis is person
· Provincial breakdown (legal jurisdiction for education, health etc.)
· Most Census Metropolitan Areas for diversity studies (Maybe top 15? not sure yet until we finalize)
· On and off reserve flag as per request from Indian and Northern Affairs Canada (INAC)
· Variables taken from the questionnaire. Allows users to create their own derived variables
· File focused on individual characteristics and less household and family variables
· Release projected for summer 2009
Hierarchical PUMF File (1% of population) (150,000 records)
· Data by Regions (5 maybe - Ont, Que, BC, East & West) - no other provinces
· Some Census Metropolitan Areas for diversity studies (maybe top 6, not sure right now until analysis is done)
· Links the 3 universes (individual linked to family or household) - Units of analysis are person, family & household
· Variables taken from the questionnaire. Users can create their own derived variables
· Allows for international comparison since some international agencies are producing hierarchical files to allow better analysis
· Release projected for summer 2010
We have presented this plan to the Microdata Release Committee and have their approval to continue. We will return to the Microdata Release Committee sometime in March 2008 with our final content for approval of the Individual PUMF File to be released in the summer of 2009.