Wednesday, April 30, 2014

Accessing Business Microdata

Question

I have a researcher interested in data sets from the following surveys:
Survey of Digital Technology and Internet Use (data release June 2014)
Survey of Innovation and Business Strategy (data release Feb. 2014)
Will PUMFs be made available for either of these, or will the researcher be restricted to using CANSIM tables?

Answer

The surveys you mentioned are business surveys and, as such, do not typically produce PUMFs, for confidentiality reasons. For researchers interested in accessing business microdata, there is a new program: the Canadian Centre for Data Development and Economic Research (CDER) <http://www.statcan.gc.ca/cder-cdre/index-eng.htm>.

I noticed on the CDER Data sets page that the Survey of Innovation and Business Strategy (SIBS) is available: <http://www.statcan.gc.ca/cder-cdre/data-donnees-eng.htm>. Otherwise, the researcher can use the tables in CANSIM.

Tuesday, April 29, 2014

2013 PCCF Records Missing DAUID

Question
A user has reported, and I have verified, that quite a few records in the current PCCF (pccfNat_JUN13_fccpNat.txt) have no DAUID assigned to them. Specifically, 6,463 postal codes on the file are missing a DAUID; this number drops to 5,120 when the Single Link Indicator (best record match) is applied.

Are these the records referred to in the documentation (92-154-g2013001-eng.pdf, What’s new), with the following note? “A small number of new postal codesOM are linked to a census subdivision only. These new postal codesOM do not yet link to Statistics Canada's geographic frame. Linkage below the census subdivision will appear on these records when the street and address information becomes available on the geographic frame. This new linkage will appear on subsequent releases of the PCCF.”

I note as well, having looked at these records, that other fields are also assigned a code of 0 as a consequence of being identified at CSD only: FED03UID is not assigned if the CSD has multiple Federal Electoral Districts. I could quibble a bit with the characterization of this missing information as a “small number” – 5,120 is a fairly large number, but in context is just over 0.6% of all of the best match records in the file. I had thought of overlaying latitude/longitude onto DA boundary files, but it appears that all such records have the centroid of the CSD assigned to these fields.

He is concerned about the accuracy of his analysis given this missing information, since I don’t know how many of his observations fall into these unidentified postal codes. When will the next version of the PCCF (with the updated geographic frame) be released?
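One way to gauge the exposure is to cross-check the researcher's postal codes against the best-match PCCF records that lack a DAUID. The following is a minimal Python sketch under stated assumptions: the PCCF has already been parsed into records with hypothetical keys `postal_code`, `dauid` and `sli` (the fixed-width field positions are in the PCCF reference guide and are not reproduced here), an unassigned DAUID is stored as blanks or all zeros, and `sli == "1"` marks the best-match record.

```python
from typing import Dict, Iterable, Mapping


def normalize_postal_code(code: str) -> str:
    """Uppercase and strip spaces so 'a1a 1a1' and 'A1A1A1' compare equal."""
    return code.replace(" ", "").upper()


def is_missing_dauid(dauid: str) -> bool:
    """Assumption: an unassigned DAUID is stored as blanks or all zeros."""
    stripped = dauid.strip()
    return stripped == "" or set(stripped) == {"0"}


def missing_dauid_summary(
    records: Iterable[Mapping[str, str]], observations: Iterable[str]
) -> Dict[str, float]:
    """Summarize PCCF records lacking a DAUID and the observations they affect.

    records: parsed PCCF rows with hypothetical keys 'postal_code', 'dauid', 'sli'.
    observations: the researcher's postal codes, one per observation.
    """
    records_missing_all = 0
    best_match: Dict[str, Mapping[str, str]] = {}
    for rec in records:
        if is_missing_dauid(rec["dauid"]):
            records_missing_all += 1  # counted before any SLI filter
        if rec["sli"] == "1":  # keep only the best-match (single link) record
            best_match[normalize_postal_code(rec["postal_code"])] = rec

    missing_codes = {
        pc for pc, rec in best_match.items() if is_missing_dauid(rec["dauid"])
    }
    share = len(missing_codes) / len(best_match) if best_match else 0.0
    affected = sum(
        1 for pc in observations if normalize_postal_code(pc) in missing_codes
    )
    return {
        "records_missing_dauid_all": records_missing_all,
        "best_match_records": len(best_match),
        "best_match_missing_dauid": len(missing_codes),
        "share_missing": share,
        "affected_observations": affected,
    }
```

Here `share_missing` corresponds to the "just over 0.6%" figure in the question, while `affected_observations` is the number that speaks to the researcher's accuracy concern.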

Answer

I confirmed with the subject-matter division that the release date for the next PCCF will not be known until the end of the summer, at which time I will follow up for more details.

Regarding the updated geographic frame: although street and address information can come from Canada Post, we cannot do anything with this information unless there is matching information on our geographic frame to link to. Our geographic frame is based on the Spatial Data Infrastructure (SDI), which in turn is based on the National Geographic Database (NGD).

A) The National Geographic Database

The National Geographic Database (NGD) is a shared database between Statistics Canada and Elections Canada. The database contains roads, road names and address ranges. It also includes separate reference layers containing physical and cultural features, such as hydrography and hydrographic names, railroads and power transmission lines. Priorities for road network file maintenance are determined by Statistics Canada and Elections Canada, enabling the NGD to meet the joint operational needs of both agencies in support of census and electoral activities.

The main sources for the NGD include:

· Statistics Canada's street network files
· Elections Canada's road network file
· National Topographic Database (NTDB) digital coverage at 1:50,000 and 1:250,000 from Natural Resources Canada, and Digital Chart of the World (DCW) coverage at 1:1,000,000
· provincially-sourced data sets
· other information from field operation activities, municipal maps and private-sector licensed holdings.

More information on the NGD can be found in the Census Dictionary: <http://www12.statcan.gc.ca/census-recensement/2011/ref/dict/geo015-eng.cfm>.

B) Spatial Data Infrastructure (SDI)

The Spatial Data Infrastructure (SDI) is an internal maintenance database that is not disseminated outside of Statistics Canada. It contains roads, road names and address ranges from the National Geographic Database (NGD), as well as boundary arcs of standard geographic areas that do not follow roads, all in one integrated line layer. The database also includes a related polygon layer consisting of basic blocks (BB), boundary layers of standard geographic areas, and derived attribute tables, as well as reference layers containing physical and cultural features (such as hydrography, railroads and power transmission lines) from the NGD.

The SDI supports a wide range of census operations, such as the maintenance and delineation of the boundaries of standard geographic areas (including the automated delineation of dissemination blocks and population centres) and geocoding. The SDI is also the source for generating many geography products for the 2011 Census, such as cartographic boundary files and road network files.

More information on the SDI can be found in the Census Dictionary: <http://www12.statcan.gc.ca/census-recensement/2011/ref/dict/geo020-eng.cfm>.

Detailed information on the process for linking to 2011 Census geographic areas can be found in the Data Quality section of the PCCF reference guide: <http://www.statcan.gc.ca/pub/92-154-g/2013001/qual-eng.htm>.

Building Values in Greater Vancouver

Question

A researcher is looking for data on the value of commercial, retail, industrial and public sector buildings by census tract, or, even better, by DA, for Vancouver; he is working on a project modelling the cost of rising sea levels. Are there any Vancouver-specific files (e.g., tax rolls) that UBC or Simon Fraser might know about?

Answer


BC Assessment would have this data, but you'll have to contact them about availability: <https://www.bcassessment.ca/Pages/default.aspx>.

There is also some property data available from the City of Vancouver: <http://data.vancouver.ca/datacatalogue/propertyInformation.htm>.

GSS Cycle 26 Release

Question

When will GSS Cycle 26 be available?

Answer

The PUMF is planned for release this summer, but a release date has not yet been set.

Wednesday, April 23, 2014

Income Dissemination Areas

Question

A student managed to create or find a map of dissemination areas for the Moncton CMA (based on the 2011 reference map for CT 0013.00) with superimposed data on average income (in thousands) and type of dwelling by dissemination area. The only reference he gave his professor is the following link: <http://www12.statcan.gc.ca/census-recensement/2011/dp-pd/prof/index.cfm?Lang=E>. His professor would love to consult this data, but we can’t seem to find it. Where can he find it?

Answer

For the 2011 Census, you can use GeoSearch to obtain this map for each census tract (CT) in Moncton. If you know which DA codes fall within each CT, you can also extract the information for those dissemination areas (DAs):

http://geodepot.statcan.gc.ca/GeoSearch2011-GeoRecherche2011/GeoSearch2011-GeoRecherche2011.jsp?layerSelected=ct&searchGeocode=305&searchGeocode1=0013&searchGeocode2=00&cmdSearchEntered=Find+geographic+code&searchTheme=GeoCode&searchPass=2&MinX=8184178.393061218&MinY=1491604.14285714&MaxX=8323337.884081631&MaxY=1576309.92&LastImage=http%3A%2F%2Fgeodepot.statcan.ca%2FDiss%2FGeoSearch2011%2FOutput%2FGeoSearch2011_f6geoimsprod393635722515.gif&lang=E&FormTool=&sZoomLevel=4&boundaryType=ct&boundaryType2=&boundaryDefault=N

For the 2011 NHS, we did not publish any information below the CT level in our standard products. For the CT mentioned above, the following NHS Profile is available: NHS Profile, 0013.00, New Brunswick, 2011

Thursday, April 17, 2014

UCASS

Question

I've had a researcher ask whether the university salary information from UCASS will continue to be collected through a different mechanism. More specifically, he'd like to know whether this data (at an institutional level), or an approximation of it, will be available, even on a custom basis.

If not, could you also clarify the reason this survey was cancelled (or put in abeyance) and whether or not alternatives were considered?

Answer

Please note the updated products listed below and the path to access them via the EFT site.

2010-2011 University and College Academic Staff System (UCASS). This survey collects nationally comparable information on the number and socio-economic characteristics of full-time teaching staff at Canadian degree-granting institutions (universities and colleges). The information is collected for each individual staff member employed by the institution as of October 1 of the academic year.

EFT:
/MAD_DLI/Root/other-products/ Univ and College Academic Staff System-ucass

Wednesday, April 16, 2014

CCHS 2010 PUMF vs. Master File

Question

I have a researcher who is looking for the total number of variables and cases in the CCHS 2010 master file. She wants to compare these with the characteristics of the PUMF to help her decide whether it is worth using the master file at the RDC. Is there any way to get this information? The Nesstar site does not provide frequencies for master files (and CCHS 2010 is not among the surveys for which master file metadata is available).

Answer

Your researcher can consult Nesstar for the number of variables in the PUMF versus the master file; however, the case count for the master file is considered confidential. We recently received new master files to be added to Nesstar, and I checked that the CCHS 2010 annual component is among them. I will have the DDI coder add it as a priority. As noted in the announcement when we started adding master file metadata:

Through our on-line tool, NESSTAR, researchers will be able to view survey documentation, methodology and variable level information related to the Statcan masterfiles. This metadata has been stripped of all confidential information, therefore, no frequency information or case counts are available for the masterfiles on the NESSTAR. The inclusion of DDI metadata for masterfiles in Nesstar will improve the discoverability of Statistics Canada surveys. Researchers will be able to identify variables of interest on the master file to assist them with their project proposals or custom tabulation requests. DLI contacts will now be able to distinguish whether a PUMF or the master file would best fit their users’ research needs.

Alternatively, I would encourage your researcher to speak to an RDC analyst or contact Health Statistics Division for more information.

Tuesday, April 15, 2014

Infant Mortality Rates

Question

I have a researcher looking for infant mortality rates for 1920, 1925, 1930 in Saskatchewan. I can only find recent numbers.

Answer


The information we have is not in exactly the form your client needs. What we do have are older publications dating back to 1921. The client can also be referred to a depository library. Would you have access to these publications? If not, I can see what is available digitally and send you some.

Union Coverage and Income Wages

Question

A PhD student here is looking for income data that shows either the total share of income or annual average income for unionized workers and non-unionized workers, for the period 1960-2010. I can find CANSIM tables that cover 1997 to the present, but I'd have to go to the LFS directly to get wages and union coverage data back to 1976. Why aren't these LFS data on union coverage and wages back to 1976 in CANSIM, given that the variables are there?
Are there any sources that will get this student what he needs? Historical Statistics of Canada doesn't have tables on wages and union status, so I'm not sure the data exists.

Answer

The Labour Force Survey (LFS) only began collecting wage and union data in January 1997, with the introduction of the new LFS questionnaire. Before that date, the LFS questionnaire did not include questions on wages or union status.

Thursday, April 10, 2014

Film, Television and Video Post-production Service Bulletin


Question

A student is trying to access the 1997 to 2004 issues of Film, Television and Video Post-production Service Bulletin (87-009-X). As far as I can tell, the pre-2006 issues of that publication are no longer available online, and I do not believe they were released as print publications, at least not through the DLP. Can we obtain copies of those publications? <http://www5.statcan.gc.ca/olc-cel/olc.action?ObjId=87-009-X&ObjType=2&lang=en&limit=1>

Answer

I have submitted a request to have the broken link repaired. In the meantime, I have confirmed with the library that the issues can be ordered through interlibrary loan. The university library should submit an ILL request on behalf of the student.


Wednesday, April 9, 2014

CCHS 2012 Mental Health

Question

In the CCHS 2012 Mental Health file, the question for variable MHE_06C reads “… on a scale of 0 to 10 …”, but the reported (and observed) frequencies include a value of 11 (which has the highest number of respondents: 136). Why is this?

Answer

The questionnaire includes an interviewer note for MHE_Q06C stating that if the respondent does not go to work or school, the interviewer should select 11 to represent "Not Applicable." This should have been indicated in the data dictionary, so we will add it to the errata.
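In practice, value 11 has to be excluded before MHE_06C is analyzed as a 0-10 score, otherwise it inflates any mean. A minimal Python sketch, assuming the responses have been read into a plain list of integers; the range check also drops any other reserved codes outside 0-10 (for example non-response codes, an assumption to verify against the data dictionary).

```python
from statistics import mean
from typing import Iterable, List, Optional

NOT_APPLICABLE = 11  # interviewer code for respondents who do not go to work/school


def valid_mhe_06c(values: Iterable[int]) -> List[int]:
    """Keep only genuine 0-10 scale responses, dropping 11 and other reserved codes."""
    return [v for v in values if v != NOT_APPLICABLE and 0 <= v <= 10]


def mean_mhe_06c(values: Iterable[int]) -> Optional[float]:
    """Mean score on the 0-10 scale, or None if no valid responses remain."""
    valid = valid_mhe_06c(values)
    return mean(valid) if valid else None
```

The explicit check against `NOT_APPLICABLE` is redundant with the range test, but it documents why the most frequent value is being discarded.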

Tuesday, April 8, 2014

CCHS Mental Health PUMF


Question

I’ve been working with the PUMF released for the Canadian Community Health Survey 2012 (Mental Health) and have some questions and comments about the file. For all of the interference means (DEPDINT, MIADINT, HYPDINT, BIPDINT1, BIPDINT2, BIPDINT, GADDINT, AUDDINT, SUDDINT): do lower scores indicate less interference? I’m assuming so, based on the derivation of DEPFINT and MIAFINT, but want to be sure.

Incorrect SPSS variable labels:
MEDGOTHR 'Used - antidepressant - 2 D - (F)' should be “Used - other – 2D - (F)”
PN1_01D2 "…: not readily available" should be “…: job interfered”
EDUDR04 "Highest level/edu.: HH 4 levels (Derived)" should be “Highest level/edu.: RESPONDENT 4 levels (Derived)”

Incorrect SPSS value labels:
SPSDATT, SPSDGUI, SPSDALL, SPSINT, SPSDWOR
- 6 is NOT “Not applicable”, but part of scale
- 7 is NOT “Don’t know”, but part of scale
- 8 is NOT “Refusal”, but the highest end of the scale

WSTDJIN, WSTDPHY
- There is no value 5 as 4 is the top of the scale

DHHGHSZ
- 5 should be “5 persons or more in household”

EDUDH04, EDUDR04
- 3 should be “Some post-secondary”

FMI_05
99999.96 "NOT APPLICABLE"
99999.97 "DON'T KNOW"
99999.98 "REFUSAL"
99999.99 "NOT STATED"
- Variable has 2 decimal places; 99999.96 through 99999.99 should be declared as the missing values, not 99996 through 99999.
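The decimal-place issue is not specific to SPSS: integer codes such as 99996 never match a value stored as 99999.96, so the reserved codes slip through as valid amounts. A minimal Python sketch, assuming FMI_05 values have been read as floats; rounding to two decimals avoids spurious mismatches caused by floating-point representation.

```python
from typing import Optional, Tuple

# Reserved codes for FMI_05, keyed by their two-decimal values
RESERVED_CODES = {
    99999.96: "NOT APPLICABLE",
    99999.97: "DON'T KNOW",
    99999.98: "REFUSAL",
    99999.99: "NOT STATED",
}


def classify_fmi_05(value: float) -> Tuple[Optional[float], Optional[str]]:
    """Return (amount, None) for a valid value or (None, label) for a reserved code."""
    key = round(value, 2)  # compare at the variable's two-decimal precision
    if key in RESERVED_CODES:
        return None, RESERVED_CODES[key]
    return value, None
```

A value of 99996.0 is returned as a valid amount here, which is exactly why declaring 99996 through 99999 as missing does not catch the reserved codes.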

Answer

We’ve looked into the SPSS format errors and will be preparing errata for most of the items listed above. We will also produce new format files, which will be available on request. I was unable to replicate the issue for PN1_01D2; the SPSS code we currently have appears to be correct for that variable.

Regarding your other question: yes, lower scores indicate less interference.

Monday, April 7, 2014

National Graduate Survey

Question

It seems that the National Graduate Survey has been dormant for some time. Did something replace it?

Answer

A PUMF will be produced for the 2013 National Graduate Survey (Class of 2009/2010). It should be available within the next eight months.

Thursday, April 3, 2014

PCCF Usage

Question

I have a PCCF question that may require clarification:

An established research group on campus, whose members are all part of the university community as either faculty or graduate students, has taken on a contract to conduct research for a local government agency. PCCFs will almost certainly be required to complete the work and submit the results to the agency. The data will not be shared, but the results of the research will be, as per Appendix A of the EULA, which allows use of postal code resources for research purposes.

But given that the research group's work is for a third party and that money will change hands for services rendered, does the postal code resources EULA apply?

This has been a very tricky situation for the research group. They are acting as agents of the university, i.e., the actual capital-R, capital-G Research Group has been contracted to do the work, not the researchers working in their individual capacity beyond the university's mandate. This means that the university's Research Ethics Board requires ethics approval before the research begins. At the same time, since the research is for a third party, other administrative units on campus see the work as external to the school and its policies. So we want to clarify where things stand with the postal code resources EULA.

Answer

I consulted our Licensing section and have confirmed that this use is not acceptable under the DLI EULA, since the work is being done for a third party. The DLI was granted a licence for research, teaching and planning purposes. The third party must contact a Regional Office to acquire the file. The end-use licence provided by the Regional Office will permit the third party to provide the PCCF to the university solely for the purpose of consulting services.

The Regional Office can provide the third party with information about the acquisition process. If you would like more information, please let me know and I can put you in touch with the appropriate party here at Statistics Canada.