Thursday, September 26, 2019

Race and the Census

Hello, I have a researcher who wants clarification as to why 'race' is not used as a variable in the Census. Also, an explanation of how the 'ethnicity' variable was developed for the Census would be helpful.


The Census Dictionary for every modern census year explains and expands on any terms and variables contained in that census.

Ethnic origin, race, ancestry, heritage, etc. are all complex terms to define and select. They are historically evolving concepts, and they are measured differently in other census-holding countries, so the Census Dictionary usually links to further information as well.

Here's the expanded info for Ethnic Origin for the 2016 Census:


Our subject matter specialist has provided the following information:

“We do not have a 'race' variable as such on the census, but we do have the visible minority variable, which is documented in the Visible Minority and Population Group Reference Guide, Census of Population, 2016, available through the following link:”

Historical Census Data Project

As some of you may be aware, OCUL’s Historical Census Working Group (part of the OCUL Data Community) is working on scoping a comprehensive bilingual inventory of Canadian census data. Our dream is to eventually build a bilingual and openly available discovery platform for census data & statistical tables (print & digital) going back to the earliest Canadian censuses. 

This is a big goal, and it certainly isn’t exclusively an “OCUL” one. Yes, this project started because a small group of data librarians in Ontario got together and started talking, but we need participation from across the country if we are going to realize our goal and have it meet everyone’s needs. 

I’m writing right now to update you on some of the work we’ve been doing, to request your feedback, and to encourage you to consider getting involved. This project is very flexible, and we can evolve the way we are organized and our working methods to accommodate everyone with an interest in all things census.

So...where are we at? We’ve been hard at work considering all of the relevant census collections in detail, reviewing existing work on the topic (such as existing inventories and platforms that contain some census material), and determining the inventory’s scope (see our scope statement here). 

We have also been prototyping the actual inventory, determining what metadata fields are needed to adequately describe the various data products. We invite you to review the following documents:

Inventory design (in English and French), showing how we propose to organize the inventory to accommodate the relevant census data products and documentation

Prototype inventory spreadsheet (attached for the census year 1921), showing the metadata fields we have selected. The cover sheet provides a summary of the inventory progress. 

Metadata crosswalk (attached), showing how the selected fields correspond to several relevant metadata standards. We intend to harvest existing metadata, which was the main driver for creating this crosswalk. 

Please let us know what you think! You can add comments directly in the documents, or don’t hesitate to send me an email with your feedback.

Next steps: we hope to finalize the inventory design in the next few weeks, and begin inventorying in earnest. There are over 100 censuses on our list to inventory, so this is a very big job! If you are able to contribute in any way (your time, student employee time, etc.) please get in touch. 

Thanks, and we look forward to hearing from you!

Wednesday, September 11, 2019

GSS 27, 2013, Social Identity

I have a researcher using GSS 27 (2013), Social Identity, who has some questions about it:
  • Were respondents contacted, and the survey conducted, on cell phones as well as landlines?
  • In the survey questionnaire, there is the following language preference variable: LP_Q01: Would you prefer that I speak in English or in French? This variable is not included in the PUMF. Would it be available in the Master file through the RDC? I would like to know how many respondents who identified as visible minorities selected ‘Other’.
  • LP_N02 INTERVIEWER: Select respondent's preferred non-official language. If necessary, ask: (What language would you prefer?) – would this variable be in the Master file also? If a respondent selected one of the unofficial languages, are they allowed to conduct the survey in that unofficial language? For example, if a respondent selected “Urdu”, would the survey be conducted in “Urdu”?
  • If a respondent selected “Other” but is not allowed to conduct the survey in a non-official language, are they then dropped from the survey?
  • On pages 160-166 of the 2013 questionnaire (attached for reference), there is a series of questions on the respondent’s language background. These are also not in the PUMF. Would they be available in the Master file through the RDC?
    • LNR_Q025: Of English or French, which language(s) do you speak well enough to conduct a conversation? Is it...? English, French, both, neither.
    • LNR_Q114: Do you still understand Chinese?
    • LNR_Q120 Do you still understand Vietnamese?
  • LNR_Q100 (What language did you first speak in childhood?) and LNR_Q155 (What language do you speak most often at home?) are in the PUMF as grouped variables (LANCH and LANHSDC). Would the languages grouped as ‘other languages’ in the PUMF be available individually in the Master file?

See tabulation below for the following question:
  • I am confused as to why more visible minorities report “yes” to voting in the last federal election than report “yes” to being eligible to vote. In other words, 1065 visible minorities are eligible to vote, but 2671 claim they voted. How can so many people vote when they are not even eligible? This is quite a large discrepancy. Even if there’s some over-reporting of voting behavior, I feel survey administrators would have caught on to it.
  • This discrepancy between reported “yes” to voting and eligibility carries through in the provincial and municipal elections.
  • There is quite a large proportion of “valid skip” responses to these voting questions (I assume these are coded as “.a” in the .dta file, which corresponds to “6” in the codebook). What constitutes a “valid skip”? Respondents below 18 years old?

Here is the response from subject matter:
  1. Survey respondents in 2013 were contacted both on landlines and cell phones. For the first time for a social survey at STC, respondents were also offered an Internet option.
  2. Language of interview (English or French) is available on the Masterfile, as is knowledge of official languages (‘Of English or French, which language do you speak well enough to conduct a conversation?’). The number of respondents who answered that they can conduct a conversation in neither English nor French is very small. Crossed by visible minority status, it may be unreleasable.
  3. Statistics Canada’s official policy is to conduct interviews in English and French only. In some cases, exemptions are granted to carry out an interview in a third language. For the ad hoc cases where a respondent requests a language other than English or French, regional offices keep a list of interviewers and their language profiles. This is a best practice rather than a policy, and it depends on the language requested and whether an interviewer with that profile is available. When no interviewer is able to conduct the interview in the respondent’s preferred language, the case becomes “out-of-scope” because of the language barrier.
  4. LNR_Q025, LNR_Q114, LNR_Q120 and LNR_Q100 are available on the Masterfile.
  5. The reason for the discrepancy between number of respondents who voted and the number of respondents who were eligible for voting is that only respondents who reported not having voted were asked if they were eligible to vote. The valid skip category therefore includes all those who answered “yes” to having voted (since this clearly implies that they were eligible) as well as those who were under 18 years old.
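To make the skip logic in point 5 concrete, here is a minimal Python sketch. The respondent records, field names and the `VALID_SKIP` marker are hypothetical illustrations, not actual GSS 27 variable names or codebook values; the routing rule simply encodes the subject matter response: only respondents aged 18 or older who reported not voting are asked about eligibility.

```python
from dataclasses import dataclass

# Hypothetical marker standing in for the codebook's "valid skip" code.
VALID_SKIP = "valid skip"


@dataclass
class Respondent:
    age: int
    voted: bool     # answer to "did you vote in the last federal election?"
    eligible: bool  # true eligibility; only recorded if the question is asked


def recorded_eligibility(r: Respondent):
    """Return what the eligibility item would contain under the skip rule."""
    if r.age < 18:
        return VALID_SKIP  # under 18: the voting questions are skipped
    if r.voted:
        return VALID_SKIP  # having voted implies eligibility, so not asked
    return r.eligible      # only adult non-voters are asked about eligibility


def tally(respondents):
    """Count 'yes' to voting and 'yes' to eligibility as they appear in the file."""
    voted_yes = sum(1 for r in respondents if r.age >= 18 and r.voted)
    eligible_yes = sum(1 for r in respondents if recorded_eligibility(r) is True)
    return voted_yes, eligible_yes


# Hypothetical records, for illustration only.
sample = [
    Respondent(30, True, True),
    Respondent(45, True, True),
    Respondent(52, False, True),
    Respondent(16, False, False),
    Respondent(25, False, False),
]
print(tally(sample))  # (2, 1)
```

Because everyone who reported voting is routed to valid skip on the eligibility item, the count of “yes” to voting can exceed the count of “yes” to eligibility, which is the pattern the researcher observed in the tabulation.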

Friday, September 6, 2019

Food Allergy Data for Business Evaluation


I have a researcher from the School of Business looking for information about what areas of Canada, US, or worldwide, have the highest rate of food allergies. They are looking to see if there are areas of high concentration of ‘single-location full-service restaurants’ that have high prevalence of food allergies. The objective is to identify an opportunity in these concentrated service areas to focus on a 'food-allergy' customer group in order to gain a strategic advantage in a crowded market.

Does anyone have suggestions for finding prevalence of food allergies?

I’ve received the following response from subject matter:

“In 2017, the Canadian Community Health Survey (CCHS) asked two questions on allergies. The first question asked whether the respondent had ever been told by a health professional that they had allergies as a result of an allergy test. The following question asked what they were allergic to, with one of the categories being certain foods.

The data is available through a custom request.

It seems like the level of geography they would be interested in might be small, and because the allergy questions were only asked in 2017, the sample size might not be sufficient. The feasibility of a custom request will depend on the sample size for the requested geography.”