Monday, June 7, 2021

From UBC Library: New Data Manifest Creation Tool

May 21, 2021

From UBC Library
I'd like to announce a simple new tool called "damage" that UBC Library has created, which we hope will help data professionals and researchers keep track of their data sets. It's a very simple file utility that produces file manifests. While it's intended for use with data, it can be used for any kind of file. It's a command line program that outputs a manifest in a variety of formats: plain text, CSV and JSON.

For plain text files, often used for microdata, the utility produces information on:

  • Minimum line length
  • Maximum line length
  • Number of records
  • Constant records flag (i.e., all lines are of the same length)
  • Row and column of non-ASCII characters
  • Flag for DOS/Windows formatting (i.e., carriage return + line feed as opposed to just a line feed).

For files in SAS, SPSS and Stata formats (i.e., .sas7bdat, .sav and .dta), the utility will provide information on:

  • Number of cases (reported as rows)
  • Number of variables (reported as columns)

It's pretty simple to use. For example, to check one file, called setup.py:

>damage setup.py
setup.py
md5 checksum : e51af6d52cffdb9b355b04267bf700eb
Encoding: utf-8
Number of records: 45
Minimum line length 16, maximum line length 81, variable records
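For readers curious about what these plain text checks involve, here is a minimal Python sketch of the same kind of summary. It is not the tool's actual code and does not use the fcheck library; the function name text_manifest, the dictionary keys and the 1-based row/column convention are all assumptions made for illustration.

```python
import hashlib


def text_manifest(data: bytes) -> dict:
    """Summarize the bytes of a plain text file (illustrative sketch only)."""
    # Decode leniently so that a stray byte does not stop the summary.
    text = data.decode("utf-8", errors="replace")
    raw_lines = text.split("\n")
    # A trailing newline produces one empty final element; drop it.
    if raw_lines and raw_lines[-1] == "":
        raw_lines.pop()
    # DOS/Windows files end lines with carriage return + line feed.
    dos = any(line.endswith("\r") for line in raw_lines)
    lines = [line[:-1] if line.endswith("\r") else line for line in raw_lines]
    lengths = [len(line) for line in lines]
    # Assumed convention: rows and columns are counted from 1.
    non_ascii = [
        (row, col)
        for row, line in enumerate(lines, start=1)
        for col, char in enumerate(line, start=1)
        if ord(char) > 127
    ]
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "records": len(lines),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "constant_records": len(set(lengths)) <= 1,
        "dos_format": dos,
        "non_ascii": non_ascii,
    }


if __name__ == "__main__":
    sample = "ab\r\ncd\u00e9\r\n".encode("utf-8")
    for key, value in text_manifest(sample).items():
        print(f"{key}: {value}")
```

The sketch takes the file contents as bytes so that the checksum is computed on exactly what is stored on disk, while line lengths are counted in characters after decoding.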

It can traverse an entire directory tree, though, and analyze all the files at once. I've included output from the recently released CPSS 6.

The manifest program could prove very useful to us or to researchers because by running it, you can see at a glance if a SAS, SPSS or Stata file has the correct number of records and/or cases, without having to open SAS, Stata or SPSS.

My hope is that a manifest like this could be included with any new additions to the DLI FTP site (or maybe even for current ones). That way, when we DLI members download the material, we can quickly run the utility and verify that what we downloaded is correct. Arguably more importantly, if there are any file changes, like a new PDF or syntax file, the checksums allow fast and easy comparisons between an old version and a new one.
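As a rough illustration of that comparison, the sketch below takes two manifests already loaded into dictionaries that map file names to checksums and reports what was added, removed or changed. The function name, the dictionary layout, and the file names and checksum strings in the example are hypothetical; they are not the utility's own output format.

```python
def compare_manifests(old: dict, new: dict) -> dict:
    """Compare two {file name: checksum} mappings (illustrative sketch)."""
    shared = old.keys() & new.keys()
    return {
        # Files present only in the new release.
        "added": sorted(new.keys() - old.keys()),
        # Files present only in the old release.
        "removed": sorted(old.keys() - new.keys()),
        # Files present in both whose checksums differ.
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


if __name__ == "__main__":
    # Hypothetical file names and placeholder checksums.
    old_release = {"data.sav": "aaa111", "guide.pdf": "bbb222"}
    new_release = {"data.sav": "aaa111", "guide.pdf": "ccc333", "syntax.sps": "ddd444"}
    print(compare_manifests(old_release, new_release))
```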

The utility could be useful for researchers as well. Using the CSV output, a researcher could produce a largely complete manifest which accompanies any data. They would undoubtedly need to fill in a few details, but that's still much easier than starting from nothing.

Premade binaries for Windows and Mac are available here.

Source code, documentation and a Python library (called fcheck) that you can use in other projects are available here.

To show you how it would look in practice, I've attached sample outputs I made from a recent release of Canadian Perspectives Survey Series 6:


Any questions or concerns regarding this tool can be directed to UBC's Koerner Library.

Friday, June 4, 2021

Older Census Data - EA Level and Reference Maps

June 29, 2020



Question

I have a research team looking at sub-municipal population trends for Prince George, BC/Regional District of Fraser-Fort George, hopefully going back to the 1940s. We get back to 1981 just fine, but prior to that I am running into a wall looking for EA-level data/statistics and corresponding reference maps. Do these exist somewhere? I have found the EA-level files available from U of T, and I swear I’ve looked at the documentation, but I can’t find a map of Enumeration Areas. If anyone could point me in the right direction I would appreciate it.



Answer

The library compiled information, available here.


I searched the Library’s collection of published census reports from 1941 to 1976, and I found limited information on enumeration areas. That said, many of the instructions for enumerators mentioned that maps had been distributed to the enumerators; some of the census reports included lists of EA codes; and I found EA-level data from other resources online. I feel all this suggests that EA-level data might exist for 1976, 1971, 1966, 1961, and possibly 1956. Have you tried contacting demography or DLI to see if they might have it?


While I could not find maps in StatCan reports showing EA boundaries, I found records of EA maps at Library and Archives, as well as the University of Toronto. Furthermore, there are a couple of maps on Scholars GeoPortal that contain EA-level data; these maps do not seem to delineate the boundaries of the EAs, but if you zoom in close enough, you can see numbered blocks on the maps. From the datasets, could a user figure out which blocks are included in each EA? I’m not familiar enough with GIS, but maybe Geo Help would know? (Links to Scholars GeoPortal are in the attached document.)


In general, I think most of the information held at StatCan runs from 1971 to the present; however, it might still be worth checking with demography.


Geography had this to say about 1976 and 1971 data when I reached out to see what was available.

Digitally, our data in fichiersGEOfiles go back to 1971. Each folder has the Geographic Attribute File (GAF), which contains census geographic information at the Enumeration Area level for all of Canada. It may not have everything your client is looking for, but each record includes population and dwelling counts, land area, names, unique identifiers, and geographic codes for linkages with other census boundaries. Unfortunately, we do not have access to our own library of historical documents at the moment for information prior to 1971.

Friday, May 21, 2021

Human Trafficking Data

February 10, 2021


Question

The Daily announced today that "Preliminary national estimates on police-reported human trafficking incidents, 2020" was released, and that is all it said. There was no indication of how to access the data: is it in tables, in RTRA, or in the RDCs? Nor was there any indication that it is part of the UCR.


I looked up the UCR tables and can’t find any 2020 data related to human trafficking. So what has been released?




Answer

What The Daily released yesterday was a data availability announcement only. This means that there is no analytical report or any associated CODR tables.

 

The data is however available upon request to the CCJCSS.


*UPDATE | FEBRUARY 11, 2021* The table is attached here. It’s not much at all; however, subject matter needs to announce every release (no matter how small!) in The Daily.

Census 2006 Data Question

January 6, 2021


Question

Can someone please help with answering the following question from a patron?


I've pulled data for each CSD from the 2006 census using Beyond 20/20, and my advisor wanted me to ask, based on the info from StatsCan that I've copied below: how can I know if zeroes in my dataset represent data suppression or are true zeroes? He is thinking that those that were suppressed will likely need to be treated as missing for my analyses, since they aren't true zeroes, but I'm not sure how to accurately differentiate these. He's thinking that I can likely do this by looking at CSD population size and the number of total private households, and if neither of these thresholds is exceeded (as outlined below) then the zero is likely a true zero (likely not many of these), otherwise I can replace the other zeroes as missing values. Does that make sense?


Census Info | Area suppression for income characteristic data:

Area suppression, when applied for data quality purposes, is used to replace all income characteristic data with zeroes for geographic areas with populations and/or number of households below a specific threshold. 


If a census tabulation contains any data showing income characteristics for individuals, families or households, then the following rule applies. Income characteristic data are zeroed out for areas where the population is less than 250 or where the number of private households is less than 40. These thresholds are applied to 2006 Census data as well as all previous census data. The threshold of 40 private households is based upon the fact that weighted data are being used. With the weighting factor for each household being 5, setting a threshold of 40 ensures that there will be at least 8 households used in the calculation. The private household threshold does not apply for tabulations based on place of work geographies. 


This seems to be what was happening in my data, as some variables for a single CSD have ‘.’ and others zeroes. Those with zeroes typically seem to relate to income, proportion of household spend etc.




Answer

Statistics Canada places the highest priority on maintaining the privacy and confidentiality of respondents. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data. Because of this and the data quality measures in place, your client will not be able to distinguish between all “true zeros” and these suppression zeros. Area suppression is one type of suppression, which involves removing all characteristic data for geographic areas with populations below a specified size. Having counts based on the geography should let them filter most of those out. The specified sizes are:

  • 250 people, if the table contains income data, and if the table also contains place-of-residence data, at least 40 private households
  • 100 people, if it is a six-character postal code area, that is, a local delivery unit (LDU), or if it is a custom area
  • 40 people, in all other cases.


In regards to your client’s question on individual cell suppression please see the following paragraph from Chapter One of the 2006 Overview of the Census:

  • Dissemination rules for statistics - Tables are sometimes accompanied by statistics such as averages, totals and standard deviations. There are various ways of ensuring that these statistics do not reveal sensitive information; for instance, they may be suppressed or made less precise. Some statistics, such as totals, ratios and percentages, are based on the rounded values in the tables to which they apply. A statistic will be suppressed if there are too few data to compute it. In cases of data items expressed in dollars, if the statistic must be calculated from data where the values are too close or if a value is too high compared to the others, then the statistic will be suppressed.

Depending on the income source variable, income medians and averages are almost never a true zero. When there is a zero for most things, it is a suppression. As for counts that have been rounded to zero, that is a feature of the confidentiality system, and you cannot distinguish those rounded down from the true zeros.
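The area suppression rule quoted in the question (population under 250, or fewer than 40 private households, with the household threshold not applying to place-of-work tabulations) could be applied to a CSD extract along the lines of the Python sketch below. The function names and return labels are hypothetical, and, as noted above, a zero in an area that passes both thresholds still cannot be confirmed as a true zero, so the sketch labels it "undetermined" instead of guessing.

```python
POPULATION_THRESHOLD = 250  # from the quoted rule for income data
HOUSEHOLD_THRESHOLD = 40    # private households, from the quoted rule
HOUSEHOLD_WEIGHT = 5        # weighting factor per household, from the quoted rule

# 40 weighted households / weight of 5 = at least 8 households in the calculation.
MIN_HOUSEHOLDS_USED = HOUSEHOLD_THRESHOLD // HOUSEHOLD_WEIGHT


def is_area_suppressed(population, private_households, place_of_work=False):
    """Return True if income data for the area would be zeroed out."""
    if population < POPULATION_THRESHOLD:
        return True
    # The household threshold does not apply to place-of-work geographies.
    if not place_of_work and private_households < HOUSEHOLD_THRESHOLD:
        return True
    return False


def classify_income_value(value, population, private_households, place_of_work=False):
    """Label one income cell (labels are illustrative, not official)."""
    if value is None:
        return "missing"
    if value != 0:
        return "reported"
    if is_area_suppressed(population, private_households, place_of_work):
        return "area_suppressed"  # treat as missing in the analysis
    # Could be a true zero, a count rounded to zero, or cell suppression.
    return "undetermined"


if __name__ == "__main__":
    print(MIN_HOUSEHOLDS_USED)
    print(classify_income_value(0, population=180, private_households=60))
    print(classify_income_value(0, population=900, private_households=300))
```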

Thursday, May 20, 2021

Impact of Covid-19 on K-12 Education

July 7, 2020


Question

I have a researcher looking for data on the impact of covid-19 on K-12 students' education. If possible she'd also like to see comparisons based on race/ethnicity or socioeconomic status. Anyone have suggestions? Please advise. Thanks!



Answer

There are a few articles on children, schooling and COVID-19 on the Data to Insights for a Better Canada page. They cover online preparedness of children, academic and financial impacts on postsecondary students, and impacts on the work placements of postsecondary students. This may not be exactly what the researcher is looking for, but it may be a start.

LFS Supplementary Indicators and Visible Minorities

February 8, 2021

To support the analysis and interpretation of January 2021 LFS results, see attached links to:

 

 

This data is publicly available under the Statistics Canada Open Licence

Friday, May 14, 2021

Immigration of Catholic Priests in Canada

April 6, 2021



Question
I’m currently working on a research piece about the immigration of Catholic priests in Canada due to a decline in local priests and growing secularization in the country.


I am looking for the statistics and numbers on the immigration of religious workers in Canada since the 1990s, and more specifically Catholic priests. I would like to find the following information:

·         Number of priests that immigrated to Canada as "religious workers" each year since 1990

·         Where these priests were from

·         Which province these priests migrated to or at least which parish sent a letter to hire them


If the type of religious worker (ex: priest/rabbi/imam) is not tracked, and only the number of religious workers as a whole is tracked, I would still like the stats on the number of religious workers that immigrated to Canada from 1990 and what province they went to.



Answer

I found from IRCC a bit of the legal aspect about this:

Religious work – International Mobility Program https://www.canada.ca/en/immigration-refugees-citizenship/corporate/publications-manuals/operational-bulletins-manuals/temporary-residents/foreign-workers/work-without-permit/authorization-work-without-work-permit-clergy.html

 

There are two separate provisions in the Immigration and Refugee Protection Regulations (IRPR) relating to religious work:

 

  • paragraph R186(l) provides a work permit exemption for religious leaders
  • paragraph R205(d) provides a labour market impact assessment (LMIA) exemption (code C50)

 

So I looked on the Open Data Portal for data on the International Mobility Program and found the following

 

Temporary Residents: Work Permit Holders – Ad Hoc IRCC (Specialized Datasets)

Temporary residents who are in Canada on a work permit in the observed calendar year. Datasets include Temporary Foreign Worker Program (TFWP) and International Mobility Program (IMP) work permit holders by year in which permit(s) became effective. Please note that the datasets will not be updated.

https://open.canada.ca/data/en/dataset/67fd1fae-4950-4018-a491-62e60cbd6974

 

Specialized Research Datasets: Temporary Resident – Ad Hoc IRCC (Specialized Datasets)

https://open.canada.ca/data/en/dataset/31ef4cab-d2b3-4dba-8e91-48fe64211ec5

 

But I can’t validate that the data contains the specific work permit of interest because when I click on the Access button, nothing happens on either dataset. Perhaps IRCC would have the data and could make it available upon request?

Great Data Literacy Modules From the UK Data Service

March 16, 2021

The UK Data Service has made available introductory-level interactive modules designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data. I skimmed through several of them and they are great. They even demonstrate how to get started with preparing survey data for analysis.

March 16, 2021



Question
I have a researcher trying to find total new births, birth rate, total new deaths and death rate by year at the CSD level.  I've found some data at the Health Unit level, but thought I would ask if anyone has come across anything at CSD level or smaller geography before.



Answer

Some datasets that may be of interest - the data frequency for the first two is monthly and the rest are annual:

 

Birth registrations in Ontario (by location)

https://open.canada.ca/data/en/dataset/6a2ee0c2-b3f1-4af2-9c86-735377a961af

(municipalities / CSDs)

 

Death registrations in Ontario (by location)

https://open.canada.ca/data/en/dataset/f2d9985b-7195-4a12-8f6c-4efc9adca8d1

(municipalities / CSDs)

 

Population estimates, July 1, by census subdivision, 2016 boundaries

https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710014201

 

Population estimates on July 1st, by age and sex

https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000501

(Provinces)

 

Fertility: Overview, 2012 to 2016

https://www150.statcan.gc.ca/n1/pub/91-209-x/2018001/article/54956-eng.htm
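If the researcher ends up combining monthly registration counts with annual population estimates, a crude rate per 1,000 population can be worked out as in the Python sketch below. The data layout (dictionaries keyed by CSD name and year) and the example figures are made up for illustration and do not reflect the actual structure of the datasets listed above.

```python
def crude_rate_per_1000(events, population):
    """Crude rate: events in a year per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * 1000 / population


def annual_rates(monthly_events, population):
    """Sum monthly counts to a year and divide by that year's population.

    monthly_events: {(csd, year): [monthly counts]}  (hypothetical layout)
    population:     {(csd, year): population estimate}
    """
    rates = {}
    for key, months in monthly_events.items():
        # Skip CSD/year pairs that have no matching population estimate.
        if key in population:
            rates[key] = crude_rate_per_1000(sum(months), population[key])
    return rates


if __name__ == "__main__":
    # Made-up figures for a made-up CSD.
    births = {("Example CSD", 2019): [10] * 12}
    estimates = {("Example CSD", 2019): 10000}
    print(annual_rates(births, estimates))
```

The same function works for deaths; only the event counts change.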

National Registration File of 1940

March 5, 2021


Question

I have received a request from a user who needs access to the National Registration File of 1940:

The National Registration File of 1940 resulted from the compulsory registration of all persons, 16 years of age or older, in the period from 1940 to 1946. This information was originally obtained under the authority of The National Resources Mobilization Act and the War Measures Act. Custody of the records was subsequently given to Statistics Canada, then known as the Dominion Bureau of Statistics.



Answer

If the client wants to access information contained in this file, they can contact statcan.censuspensionsearch-recherchesurpensionrec.statcan@canada.ca

 

There are however limitations to what can be accessed. More information can be found here: https://www150.statcan.gc.ca/n1/en/catalogue/93C0006

New DMP Templates

March 4, 2021


Portage is pleased to have published five new discipline- and methodology-specific Data Management Plan (DMP) Templates in English and French, with more to come. These Templates cover a range of disciplines and research methods, highlight best practices for DMPs in those disciplines, and provide tailored guidance for researchers writing their own DMPs. Initiated by a Portage funding call in April 2020, they are the result of hard work on the part of exceptional Researchers, Librarians, and Information Professionals in the Portage community, and members of the Portage Secretariat with whom they collaborated. 


The following DMP Templates are now available:


These Templates are available on Portage Training Resources under DMP Templates as well as in the Portage Zenodo Community. They are also available and embedded for use and institutional customization in the DMP Assistant.


If you have any questions regarding the DMP Templates, please contact Robyn Nicholson, Portage Data Management Planning Coordinator, at robyn.nicholson@carl-abrc.ca.

2017 Aboriginal Peoples Survey: derived variable for residential school attendance

February 23, 2021


Question
I would like to inquire about the methodology for creating the value 4 for the derived variable residential school attendance for the 2017 APS PUMF (please see the data dictionary screenshot below, pages 128-129).  The label Only parent(s)/grandparent(s)/other family member(s) attended seems unclear.

 

We were assuming that the value 4 is mutually exclusive of values 2 and 3 but are wondering how.  For example, does the value 4 include:

  • all of the groups: parent(s), grandparent(s) and other family member(s), or
  • one or more of parent(s) and grandparent(s), plus other family member(s), or
  • two or more of any of the groups, parent(s), grandparent(s) and other family member(s) … ?

‘Residential school’ refers to both ‘residential schools’ and ‘federal industrial schools’

 

In categories 2, 3, 4 and 5, the respondent may not have attended a residential school

 

Categories 3 and 4: ‘Other family members’ include the respondent’s current spouse or

 

NOTE: Categories include situations where non-attendance by any family members

 

 

Source: Derived Variable - Derived from: RS_05, RS_10A, RS_10B, RS_10C, RS_10D

Answer Categories      Code   Frequency   Weighted Frequency   %
Respondent attended    1      1,169       41,107               4.1




Answer

All the categories for the derived variable DRSCHATT are mutually exclusive. The data dictionary indicates which persons are considered ‘other family’ members, and category 4 includes only these persons, not any parents or grandparents of the respondent.

 

Here are the specifications used to create the derived variable:

Specifications

Value  Condition(s)                             Description
1      RS_Q05 = 1 and                           Only respondent attended
       RS_Q10A = 2 and RS_Q10B = 2 and
       RS_Q10C in (2, 3) and RS_Q10D in (2, 3)
2      RS_Q05 in (2, 6) and                     Only parent(s) or grandparent(s) attended
       (RS_Q10A = 1 or RS_Q10B = 1) and
       RS_Q10C in (2, 3) and RS_Q10D in (2, 3)
3      RS_Q05 = 1 and                           Only respondent and parent(s) or grandparent(s) attended
       (RS_Q10A = 1 or RS_Q10B = 1) and
       RS_Q10C in (2, 3) and RS_Q10D in (2, 3)
4      RS_Q05 in (2, 6) and                     Only other family members attended
       RS_Q10A = 2 and RS_Q10B = 2 and
       (RS_Q10C = 1 or RS_Q10D = 1)
5      RS_Q05 = 1 and                           Respondent, parent(s) or grandparent(s), and other family members attended
       (RS_Q10A = 1 or RS_Q10B = 1) and
       (RS_Q10C = 1 or RS_Q10D = 1)
6      RS_Q05 = 1 and                           Only respondent and other family members attended
       RS_Q10A = 2 and RS_Q10B = 2 and
       (RS_Q10C = 1 or RS_Q10D = 1)
7      RS_Q05 in (2, 6) and                     Only parent(s) or grandparent(s) and other family members attended
       (RS_Q10A = 1 or RS_Q10B = 1) and
       (RS_Q10C = 1 or RS_Q10D = 1)
8      RS_Q05 in (2, 6) and                     Neither respondent nor any family members attended
       RS_Q10A = 2 and RS_Q10B = 2 and
       RS_Q10C in (2, 3) and RS_Q10D in (2, 3)
99     else                                     NS
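To make the mutual exclusivity easier to see, the specifications above can be expressed as a short Python sketch. This is only a reading of the conditions as listed, not Statistics Canada's own code; the function name, argument names and lookup table are hypothetical. Each of the three groups (respondent, parents or grandparents, other family members) resolves to attended, did not attend, or neither, and any combination that cannot be resolved falls through to 99.

```python
# (respondent, parent/grandparent, other family) -> derived value
DRSCHATT_VALUES = {
    (True, False, False): 1,   # Only respondent attended
    (False, True, False): 2,   # Only parent(s) or grandparent(s) attended
    (True, True, False): 3,    # Only respondent and parent(s) or grandparent(s)
    (False, False, True): 4,   # Only other family members attended
    (True, True, True): 5,     # Respondent, parent(s)/grandparent(s) and other family
    (True, False, True): 6,    # Only respondent and other family members
    (False, True, True): 7,    # Only parent(s)/grandparent(s) and other family
    (False, False, False): 8,  # Neither respondent nor any family members
}


def derive_drschatt(q05, q10a, q10b, q10c, q10d):
    """Derive the attendance category from the five source responses."""
    # Respondent: RS_Q05 = 1 (attended) or RS_Q05 in (2, 6) (did not attend).
    if q05 == 1:
        respondent = True
    elif q05 in (2, 6):
        respondent = False
    else:
        return 99
    # Parents or grandparents: either = 1 (attended) or both = 2 (did not).
    if q10a == 1 or q10b == 1:
        parents = True
    elif q10a == 2 and q10b == 2:
        parents = False
    else:
        return 99
    # Other family members: either = 1 (attended) or both in (2, 3) (did not).
    if q10c == 1 or q10d == 1:
        other = True
    elif q10c in (2, 3) and q10d in (2, 3):
        other = False
    else:
        return 99
    return DRSCHATT_VALUES[(respondent, parents, other)]


if __name__ == "__main__":
    print(derive_drschatt(6, 2, 2, 1, 2))
```

Because each group resolves to exactly one state, a respondent can match at most one of values 1 to 8, which is why category 4 cannot include any parents or grandparents.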

 

Public washrooms and COVID

February 24, 2021


Question
“What special considerations have been made for the increased need/demand for public washrooms during COVID in Canada? Specifically, what data sources/methods would you recommend for us to be able to capture what is happening in municipal pockets across Ontario/Canada?”

 

Does anyone have any ideas where to find this sort of thing? Are there associations of municipalities provincially or nationally that would be a good place to start? I would like to avoid suggesting that she contact individual public health agencies or municipalities.

 

The student mentioned Muniscope which seems to be a national resource for municipalities and agencies that deal with municipal matters. You have to be a member to access any of their resources. Does anyone have any experience dealing with Muniscope and getting help and/or resources from them? Do they share resources; do they charge?



Contributor 1

  1. Municipal open data portals sometimes have public washrooms, but they’re often incomplete or out of date – still, they might be a start. E.g. for Toronto, refreshed this week: https://open.toronto.ca/dataset/street-furniture-public-washroom/ (and this particular dataset seems limited to one company’s contracts with the city)
  2. Your patron might also have luck with some of the crowdsourced public washroom apps, like https://www.restroommap.com/ or https://www.refugerestrooms.org/ though these a) might not have the dates when specific items were created (just added, if that) and b) sometimes have specific themes, like non-gendered washrooms (very useful for people who need them, of course, but it seems like your patron has a different need).


Contributor 2
For BC, it might be worth checking CivicInfoBC to see if it includes any municipal reports. From the menu options on the left side, I'd suggest looking in the Documents section and/or COVID-19 Resources section. The Annual Surveys don't seem to cover this topic and wouldn't be current enough anyway.


https://www.civicinfo.bc.ca/researchtools


At the time of publication, no contributors have come forward with working knowledge of Muniscope.

Monday, March 15, 2021

UBC Library Works With FRDR to Make Statistics Canada Data More Discoverable

October 2, 2020


Portage and the UBC Library’s Abacus Data Network are pleased to announce the addition of the Statistics Canada Open License Dataverse to the Federated Research Data Repository (FRDR) Discovery Service.



The Statistics Canada Open License Dataverse includes more than 1,000 public use microdata files (PUMFs) as well as additional datasets that are not currently part of the federal Open Government Portal. The Abacus Data Network is a data repository collaboration that involves the University of British Columbia (UBC), Simon Fraser University (SFU), the University of Northern British Columbia (UNBC) and the University of Victoria (UVic).


FRDR currently has over 70 Canadian research data repositories included in its discovery service, with more being added. 


For more information, please contact FRDR Support at support@frdr-dfdr.ca.


For support with Statistics Canada Open License Data, please contact your local institution’s Data librarian.


For Abacus Dataverse support, please contact jeremy.buhler@ubc.ca.

How to Cite Statistics Canada Products

February 18, 2021

An updated edition of How to Cite Statistics Canada Products (12-591-X) is now available. This guide aims to provide direction on the creation of bibliographic references for Statistics Canada's products and services.

 

In the absence of international standards, citing statistics and data has been a neglected grey area in academic publishing. This new edition of How to Cite Statistics Canada Products fills the gap by covering an even broader range of Statistics Canada and other statistical products and services.

GSS - Caregiving and Care Receiving 2018 (Cycle 32) PUMF

February 4, 2021


Question

Can you confirm if there is a PUMF for the General Social Survey – Caregiving and Care Receiving 2018 (Cycle 32)? I don’t see GSS Cycle 32 in the DLI EFT /MAD_PUMF_FMGD_DAM/Root/4502_GSS-Care_ESG-Soins folder.


Answer

GSS Cycle 32 is currently on hold. The subject matter team had to move resources over to work on other releases, so they are hoping to resume work on it by the summer.