Monday, February 17, 2020

Non-Profit Data

Question:
I have the following question from a faculty member. I'm not sure where to find these data. Any suggestions would be appreciated. NFP = not-for-profit organizations. I presume this doesn't include educational organizations, municipalities, etc.


% of workforce employed in NFPs, government % of NFP funding, % of GDP contributed by NFPs, % of Canadians donating to NFPs.


Thanks in advance. I think I see some information on the StatCan website, but I'm not sure.

--

Answer:

We received the following information from our subject matter areas:

We were able to find the following table on the Statistics Canada website that shows employment for Canada by non-profit institutions.  

There is also a glossary of terms that provides more information about not-for-profit institutions, as well as a publication.

The client can find most of the data he needs from the Satellite Account of Non-Profit Institutions and Volunteering (up to 2017).

https://www150.statcan.gc.ca/n1/daily-quotidien/190305/dq190305a-cansim-eng.htm


  1. % of workforce employed in NFPs
    • You can find the number of people employed in non-profits in the following tables:
      • By sub-sector: 36-10-0617-01
      • By type of activity: 36-10-0615-01
      • These numbers should be compared to employment estimates from the National Accounts Labour Productivity program (available by province and for Canada): 36-10-0480-01
  2. Government % of NFP funding
    • You can find current transfers from government to non-profits in the income account: 36-10-0613-01
  3. % of GDP contributed by NFPs
    • You can find the GDP of the non-profit sector in the following tables:
      • By sub-sector: 36-10-0616-01
      • By type of activity: 36-10-0614-01
      • This should be compared to GDP at basic prices from the National Accounts. These data are available by province and for Canada: 36-10-0221-01
      • Note: To convert GDP at market prices to GDP at basic prices, subtract “Taxes less subsidies on products and imports” from “GDP at market prices” in that table.
  4. % of Canadians donating to NFPs
    • Unfortunately, we don’t have data on the % of Canadians donating to non-profits. All we have is the amount of money actually donated. You can find this in the income account under “Current Transfers from Households: Donations”, in table 36-10-0613-01.
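Once the tables above are downloaded, the percentages are simple ratios. The sketch below assumes you have already pulled matching figures (same reference year, same geography, same units in numerator and denominator) into plain numbers. All figures in it are hypothetical placeholders, not Statistics Canada data, and the variable and function names are illustrative only.

```python
# Sketch: the two ratio calculations described above.
# All input figures are HYPOTHETICAL placeholders, not Statistics Canada data;
# replace them with values for the same year and geography from the cited tables.


def gdp_at_basic_prices(gdp_market_prices: float, taxes_less_subsidies: float) -> float:
    """GDP at basic prices = GDP at market prices minus
    taxes less subsidies on products and imports (both from 36-10-0221-01)."""
    return gdp_market_prices - taxes_less_subsidies


def share_pct(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole


# Hypothetical employment figures (numerator and denominator in the same unit).
nfp_employment = 2_000.0      # from 36-10-0617-01 or 36-10-0615-01
total_employment = 20_000.0   # from 36-10-0480-01

# Hypothetical GDP figures (same currency unit throughout).
nfp_gdp = 150_000.0           # from 36-10-0616-01 or 36-10-0614-01
gdp_market = 2_200_000.0      # "GDP at market prices", 36-10-0221-01
taxes_less_subs = 200_000.0   # "Taxes less subsidies on products and imports"

workforce_pct = share_pct(nfp_employment, total_employment)
gdp_pct = share_pct(nfp_gdp, gdp_at_basic_prices(gdp_market, taxes_less_subs))

print(f"NFP share of employment: {workforce_pct:.1f}%")
print(f"NFP share of GDP at basic prices: {gdp_pct:.1f}%")
```

Dividing by GDP at basic prices rather than market prices keeps the denominator on the same valuation basis as the satellite-account GDP figures, which is why the note above asks for the subtraction first.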

Tuesday, February 11, 2020

New files on Statistics Canada Nesstar

We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.

  • Canadian Travel Survey (CTS) 1980 Person Data PUMF (English only)
  • Youth Smoking Survey (YSS) 2006-2007 PUMF (French only)
  • Survey on Ageing and Independence (SAI) 1991 PUMF (French only)
  • International Travel Survey (ITS) 2002 U.S. Resident Trips To Canada PUMF (English only)
  • Labour Market Activity Survey (LMAS) 1986 Person File PUMF (English only)
  • Labour Market Activity Survey (LMAS) 1987 Person File PUMF (English only)

Friday, February 7, 2020

Transgender and intersex people in Canada

Question:

I have a student looking for demographic information on transgender and intersex people in Canada.

Is there any survey/census/reliable estimate or other sources of information I can point them to?

There was a similar question asked here in 2015 and the answer was that there is no information collected by StatCan, but maybe something has changed since then.

Answer:

There are no data available from Statistics Canada on the intersex population in Canada.

However, the Survey of Safety in Public and Private Spaces published an estimate of 0.24% for the transgender population in Canada in December 2019.

Here is the link to their analytical document:

https://www150.statcan.gc.ca/n1/pub/85-002-x/2019001/article/00017-eng.htm

---

This isn’t the same concept your researcher is looking for, but it may be worth mentioning all the same (i.e., it won’t stand in as a proxy).

The latest CCHS has variable SDC_035, where respondents can state whether they consider themselves to be heterosexual, homosexual, or bisexual.

From the questionnaire:

SDC_Q035   Do you consider yourself to be...?


INTERVIEWER: Read categories to respondent.
1 Heterosexual (sexual relations with people of the opposite sex)
2 Homosexual, that is lesbian or gay (sexual relations with people of your own sex)
3 Bisexual (sexual relations with people of both sexes)
DK, RF


I don’t think this would hit the mark for your researcher, as the question specifically talks about “sexual relations” as opposed to gender and sexual fluidities and identities. That said, this may be one of the few times (or the only time?) in recent memory that StatCan has posed a question on sex/gender and then released it in a PUMF, perhaps partly because the concept can be difficult to articulate as a categorical variable without misrepresenting respondents’ views on this very personal identity question.

New Release: LFS January 2020

We are pleased to inform you that the following product is now available.

Labour Force Survey (LFS) - January 2020

This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). The LFS collects monthly information on the labour market activities of Canada's working age population. This product is for users who prefer to do their own analysis by focusing on specific subgroups in the population or by cross-classifying variables that are not in our catalogued products. The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.

EFT: /MAD_PUMF_FMGD_DAM/Root/3701_LFS_EPA/1976-2020/data/micro2020-01.zip
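For readers new to sampling variability, the arithmetic for judging an estimate once you have its standard error is short. This is a minimal sketch, assuming you already have a standard error for your LFS estimate (how to obtain one is covered in the Data Quality section of the Guide to the Labour Force Survey, not here). The estimate and standard error shown are hypothetical, and the normal-approximation 95% interval is a generic textbook choice, not necessarily the LFS's own method.

```python
# Sketch: using a standard error to gauge an estimate's sampling variability.
# The estimate and standard error below are HYPOTHETICAL; obtaining the standard
# error itself is covered in the Data Quality section of the LFS Guide.
from dataclasses import dataclass
from typing import Tuple

Z_95 = 1.96  # normal critical value for an approximate 95% confidence interval


@dataclass
class Estimate:
    value: float      # weighted estimate, e.g. a count of employed persons
    std_error: float  # its standard error

    def cv_pct(self) -> float:
        """Coefficient of variation: standard error as a % of the estimate."""
        if self.value == 0:
            raise ValueError("CV is undefined for a zero estimate")
        return 100.0 * self.std_error / abs(self.value)

    def ci95(self) -> Tuple[float, float]:
        """Approximate 95% confidence interval, assuming normality."""
        half_width = Z_95 * self.std_error
        return (self.value - half_width, self.value + half_width)


est = Estimate(value=50_000.0, std_error=2_500.0)
low, high = est.ci95()
print(f"CV = {est.cv_pct():.1f}%, 95% CI = ({low:,.0f}, {high:,.0f})")
```

The same standard error gives a larger CV for a smaller estimate, which is why estimates for small areas, industries or cross-tabulations tend to be less reliable.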

Tuesday, February 4, 2020

Problem with Postal Code Conversion File+ 7B

Question:

A researcher here is working with PCCF+ 7B and has encountered a problem. Could someone at StatCan assist?

Text below forwarded from the researcher:

The main problem I am facing is the DB-level GAF linking, but I also found two other issues that could be affecting the linking as a whole.

I am working on a project where we are trying to use postal codes reported in a survey to identify rural/urban status (planning to use the CSize variable). Respondents could provide a full postal code or, if they preferred, only the first three characters (the FSA). The records with a full postal code seem to be linking properly, but any record with only an FSA appears in the problem file with most of the variables set to missing.

When I went through the code closely, I noticed that linking records to the Geographic Attribute File (GAF) requires them to have a dissemination block (DB). The comment in the code above this merge says “Merge back with Geographic Attribute File to get the remainder of the codes - Start at the Dissemination Block level, then move up to the DA, CSD, etc...”, but the files are only linked at the DB level, and any record without a DB is not merged. From what I understand, a DB cannot be assigned to records linked on fewer than 4 characters, so our FSA-only records are not merged to the GAF and therefore have all the GAF variables set to missing later, in the clean-up stage.

When I looked at the datasets right before this merge, all the records had been assigned a CMA from the FSA-based linking, but this variable is dropped before merging to the GAF. I know that the GAF is a DB-level file, but from my understanding there are variables at higher levels, such as CMA (and therefore CSize), that could be assigned to records without a DB. Would it be valid to use the CMA variable assigned in the earlier linking to merge to the GAF for variables at the CMA level?

I also found that in the wc6dups file, the SAC ID variable is only one digit instead of three. The input dataset only has one digit where SAC should be, and there is a comment beside this dataset in the input file SAS code saying “Check the input file”. This doesn’t actually make a difference in the files right now, because this variable is overwritten by the GAF SAC variable, but I’m not sure if it will have an impact if the files are merged using SAC/CMA. Another thing I noticed is that DMT is used frequently throughout the code to assign missing values to records with specific DMTs, but DMT is set to 9 after the 6-character linking and doesn’t exist in any of the datasets used later in the linking, so any record not linked using 6 characters has DMT=9. I’m not sure what impact this has on the validity of the linking, but I don’t think this is what was intended.

Answer:

In developing more recent versions of the PCCF+, we have removed most of the geographic links at the 3-character FSA level, because we consider them inaccurate in most cases; this is why most of these variables are missing for your client's FSA-only records. This happens during the clean-up phase. In general, we do not recommend off-label uses of the PCCF+, such as pulling out preliminary datasets to be used in geocoding.
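Because FSA-only responses will mostly come back with missing geography, it may help to identify them before running the PCCF+ rather than discovering them in the problem file. Below is a minimal Python sketch, assuming responses arrive as free-text strings; the category names and helper functions are hypothetical and are not part of the PCCF+ package, and the sketch deliberately does not attempt the CMA-based merge discussed above.

```python
# Sketch: flag which survey records carry a full postal code vs. only an FSA,
# before they go into the PCCF+. The regular expressions are a simplified,
# ASSUMED validity check (letter-digit-letter digit-letter-digit); they do not
# exclude letters that Canada Post never uses in particular positions.
import re
from collections import Counter
from typing import Iterable, Optional

FULL_POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")
FSA_ONLY = re.compile(r"[A-Z]\d[A-Z]")


def classify_postal_code(raw: Optional[str]) -> str:
    """Return 'full', 'fsa_only', 'missing' or 'invalid' for one response."""
    if raw is None or not raw.strip():
        return "missing"
    code = re.sub(r"\s+", "", raw).upper()  # drop spaces, normalise case
    if FULL_POSTAL_CODE.fullmatch(code):
        return "full"
    if FSA_ONLY.fullmatch(code):
        return "fsa_only"
    return "invalid"


def summarize(responses: Iterable[Optional[str]]) -> Counter:
    """Count responses by category so FSA-only records can be anticipated."""
    return Counter(classify_postal_code(r) for r in responses)


print(summarize(["K1A 0B1", "k1a", None, "12345", "M5V3L9"]))
```

Knowing the size of the FSA-only group up front makes it easier to decide, before analysis, how those respondents will be handled.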

Regarding the specific questions:

but I’m not sure if it will have an impact if the files are merged using SAC/CMA.

They are not merged using these variables.

Another thing I noticed is that DMT is used frequently throughout the code to assign missing values to records with specific DMTs, but DMT is set to 9 after the 6-character linking and doesn’t exist in any of the datasets used later in the linking, so any record not linked using 6 characters has DMT=9. I’m not sure what impact this has on the validity of the linking, but I don’t think this is what was intended.

We cannot assign DMT to partial postal codes – they only exist for full postal codes. We perform different geocoding processes using DMT as an indicator of positional accuracy.

Monday, February 3, 2020

PSIS

Question:

I'm working with a student who wishes to use the microdata of the Postsecondary Student Information System (PSIS). The application process for accessing the data at an RDC, however, is too long given his situation (his thesis is due in less than two months). If he could use a PUMF for this survey, that would be ideal, but I didn't find one on Nesstar. Did I miss anything, and is there any other way for him to get his hands on the PSIS microdata?

Answer:

You may find the following table created by my colleagues helpful. It outlines which surveys have public use files, if there is a confidential file in the RDCs and if the file is available through the real time remote access (RTRA) program.

https://uottawa.libguides.com/c.php?g=401920&p=3205624

As you can see, there isn’t a PUMF for the PSIS. Your student's best option would likely be a custom tabulation, but those are offered on a cost-recovery basis.

New files on Statistics Canada Nesstar

We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.

  • Youth Smoking Survey (YSS) 2004-2005 PUMF
  • Youth Smoking Survey (YSS) 2006 PUMF (English only)

Friday, January 31, 2020

New Release: NTS 2018

We are pleased to inform you that the following product is now available.

National Travel Survey (NTS) 2018

The National Travel Survey was developed to fully replace the Travel Survey of Residents of Canada (record number 3810) and replace the Canadian resident component of the International Travel Survey (record number 3152). The National Travel Survey collects information about the domestic and international travel of Canadian residents.

The National Travel Survey, sponsored by Statistics Canada, aims to measure the characteristics and the economic impact of the tourism activities of Canadian residents. The objectives of the survey are to provide information about the number of trips and expenditures by Canadian residents by trip origin, destination, duration, type of accommodation used, trip reason, mode of travel, etc.; to provide information on travel incidence and to provide the socio-demographic profile of travellers and non-travellers. From a macroeconomic point of view, the NTS measures the domestic and international tourism demand by Canadian residents.

EFT:  /MAD_PUMF_FMGD_DAM/Root/5232_NTS-ENV/2018

New Release: ITS 2017

We are pleased to inform you that the following product is now available.

International Travel Survey (ITS) 2017

The electronic questionnaires (e-questionnaires) and Air Exit Survey (AES) are components of the International Travel Survey (ITS) together with the Frontier Counts (record number 5005). It is an ongoing survey conducted by Statistics Canada since 1972 to meet the requirements of the Balance of Payments (BOP) of the Canadian System of National Accounts. The survey provides a full range of statistics on international travellers (visitors to Canada and Canadian residents returning to Canada), including detailed characteristics of their trips such as expenditures, activities, places visited and length of stay.

In addition to fulfilling BOP requirements, the information collected in the questionnaires is used by the Tourism Satellite Account (TSA), Canada Border Services Agency (CBSA), Destination Canada, provincial tourism agencies, the United States Department of Commerce, the OECD, banks, investment companies, other private sector industries and independent researchers. The information is also used for reporting to international organizations such as the World Tourism Organization (WTO), the Organization for Economic Co-operation and Development (OECD) and the Pacific-Asia Travel Association (PATA).

The AES started in the year 2000 for overseas visitors and in the year 2011 for U.S. visitors. The primary objective of the AES is to improve the quality and reliability of trip and traveller estimates for foreign air travellers to Canada, from major and emerging markets.

The e-questionnaire component of the survey began in 2013, with the distribution of invitation cards to travellers (Canadian, American, and Overseas) who have entered at one of 137 designated Canadian ports of entry. The mail-back questionnaires were last used in 2014.

EFT:  /MAD_PUMF_FMGD_DAM/Root/3152_ITS_EVI/2017/ 

Tuesday, January 28, 2020

CCHS variable CCS_185

Question:

I have a researcher working with multiple years of the CCHS (2007-2008 to 2015-2016) and a question has arisen - please see his message below:

In the CCHS documentation, item CCS_185 (last time had a colonoscopy or sigmoidoscopy) includes response options for both "5 TO < 10YEARS" and "5 YEARS OR MORE". These ranges overlap, so the options are not mutually exclusive, although they are presented in the dataset as if they were. Do you have any explanation for this?

The researcher has indicated that it is represented this way in the data itself as well and is present across all the years. My initial thoughts are that there is a typo and that the second option should perhaps be "10 years or more", as otherwise these two response options overlap each other.

Can you please advise?

Answer:

There are a couple of previously identified format/label issues with CCS for multiple CCHS cycles, including the format for CCS_185. The error for CCS_185 is a label error: CCS_185=6 should be ‘10 years or more’ (errata item 19). I would encourage the researcher to consult the errata found in the documentation folders. There are two separate CCHS annual component errata documents, one for the 2000 to 2014 reference period and another for 2015 and later; they contain information on all known errors as well as on correcting them.
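Until corrected files are in hand, one option is to override the label at tabulation time. This is a minimal sketch in Python; only the code-6 correction comes from the errata item quoted above, while the function names and the fallback text for unlabelled codes are hypothetical.

```python
# Sketch: override the mislabelled CCS_185 category when tabulating.
# Only the code-6 correction comes from the errata (item 19); any other
# labels you pass in are taken as-is from your own format/label file.
from typing import Dict, Iterable, List

CCS_185_LABEL_FIXES: Dict[int, str] = {6: "10 years or more"}


def corrected_labels(labels: Dict[int, str]) -> Dict[int, str]:
    """Return a copy of `labels` with the known CCS_185 fix applied."""
    fixed = dict(labels)
    fixed.update(CCS_185_LABEL_FIXES)
    return fixed


def label_values(values: Iterable[int], labels: Dict[int, str]) -> List[str]:
    """Map coded responses to (corrected) labels; unknown codes are flagged."""
    fixed = corrected_labels(labels)
    return [fixed.get(v, f"unlabelled code {v}") for v in values]


original = {6: "5 YEARS OR MORE"}  # the erroneous label as it appears in the file
print(label_values([6, 6], original))
```

The original label dictionary is copied rather than modified, so the file's own formats remain available for comparison.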

This may have previously been mentioned to the researcher, but the CCHS annual component was redesigned for the 2015 reference period and onward.
As a result of the 2015 CCHS redesign, combining or comparing CCHS data from before and after the redesign (e.g., combining or comparing 2014 and earlier files with 2015 annual files) is not recommended, and caution should be taken when comparing estimates across those years. Even estimates derived from content that has remained unchanged are subject to the potential impacts of the other major changes to the survey (i.e., new survey frames and collection methods) and may not be comparable with past cycles. It would be very difficult to ascertain whether any changes or consistencies between estimates before and after the redesign reflect the true population characteristic being examined or the effect of the significant methodological and operational changes made to the survey. Please review the 2015 CCHS Redesign Summary, found with both the 2015 and 2016 survey documentation.
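For a project pooling many cycles, a simple guard can flag when a selection crosses the redesign. This is a minimal sketch, assuming cycles are identified by strings such as "2007-2008" or "2015"; the helper names are hypothetical, and the check only restates the caution above, it does not make pre- and post-redesign estimates comparable.

```python
# Sketch: warn when a set of CCHS cycles straddles the 2015 redesign.
# Cycle strings like "2007-2008" or "2015" are ASSUMED input formats.
from typing import Iterable

REDESIGN_YEAR = 2015  # first reference year of the redesigned annual component


def cycle_start_year(cycle: str) -> int:
    """'2007-2008' -> 2007; '2015' -> 2015."""
    return int(cycle.split("-")[0])


def spans_redesign(cycles: Iterable[str]) -> bool:
    """True if at least one cycle starts before 2015 and at least one in/after 2015."""
    years = [cycle_start_year(c) for c in cycles]
    return min(years) < REDESIGN_YEAR <= max(years)


requested = ["2007-2008", "2009-2010", "2011-2012", "2013-2014", "2015-2016"]
if spans_redesign(requested):
    print("Warning: these cycles span the 2015 CCHS redesign; "
          "pooled or compared estimates may not be comparable.")
```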

Monday, January 27, 2020

Request for Tools to Turn Tables in PDFs into Spreadsheets

Question:

Hi DLI Community,

Can anyone recommend a good tool for using OCR to turn pdfs of scanned tables into spreadsheets? A professor here at the University of Toronto is working with 200+ tables from nineteenth-century Ontario government publications, and I’m trying to suggest tools and a workflow for him and his RA.

Example of scanned document: https://archive.org/details/reportofcommissi187986ontauoft/page/n31/mode/2up

So far, my proposed workflow is for them to clean pdfs (if necessary) using Acrobat Pro, and then scan them using https://www.onlineocr.net/ (which is free and fairly good) or else something more powerful like OmniPage Ultimate (slow but useful proprietary OCR software, which we have on some library workstations) for particularly challenging tables. Finally, the tables can be manually corrected.

Do any of you have suggestions for OCR tools that worked for you, especially if they work in-browser and can create spreadsheets?

Answer:

You may want to have a look at Tabula (https://tabula.technology/). I have not used it extensively myself, but I remember that it was recommended by Vince Gray (of DLI fame), so it must be good!

--

I often use Camelot or its web version, Excalibur, but the document must be a text-based PDF. I don’t know of a decent tool to convert image-based PDFs to text-based PDFs.

Wednesday, January 22, 2020

Variable issues with GSS Cycle 30

Question:

A researcher reports something strange with two variables from the 2016 iteration (Cycle 30- Canadians at Work and Home) of the GSS.

Both variables AGEFHSHG (Age of respondent’s father in household) and AGEMHSDG (Age of respondent’s mother in household) show answer categories that are very perplexing if you consider that these are supposed to be PARENTS:

0 to 9 years and
10 to 19 years

Granted, there are young parents that would be in the second category, but the first one does not make sense at all and I wonder if the second one is meant to start as early as 10 years old. And there are actually answers within the 0 to 9 years old category! This must be a mistake or there is something really strange about how these derived variables are obtained.

Answer:

Thank you for bringing this discrepancy to our attention. After some investigation, we discovered the following two issues: 

1. There was one case where all foster children were inadvertently coded as foster parents. This has been recoded.

2. Labels in the PUMF were incorrectly applied for “Age of respondent’s mother in the household” and “Age of respondent’s father in the household”.

How it appeared in the PUMF (incorrect):
Age of respondent’s mother in household

0 to 9 years
10 to 19 years
20 to 29 years
30 to 39 years
40 to 49 years
50 years and over

Valid skip


To rerun the data yourself, please use the following coding:

/* AGEMHSDG in groupings of 10 years */
/* Age group of respondent’s mother in household */
/****************************************************************************************/
/* Values for : */
/* 1 : Less than 40 */
/* 2 : 40 - 49 */
/* 3 : 50 - 59 */
/* 4 : 60 - 69 */
/* 5 : 70 - 79 */
/* 6 : 80 years and over */
/* 96 : VALID SKIP No Mother in household */
/* 97 : DON'T KNOW */
/* 98 : REFUSAL */
/* 99 : NOT STATED */
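For users working outside SAS, the same coding can be carried as a lookup table. This is a minimal Python sketch transcribing the values above; per the follow-up note, the groupings apply to the father variable (AGEFHSHG) as well, so the wording of code 96 is generalized here as an assumption. The function name is hypothetical, and the counts it produces are unweighted; survey weights are still needed for population estimates.

```python
# Sketch: corrected value labels for the GSS Cycle 30 parent-age variables,
# transcribed from the coding above. Per the follow-up note, the same
# groupings apply to AGEFHSHG (father) as to AGEMHSDG (mother); the generic
# wording of code 96 is an adaptation so one table serves both variables.
from collections import Counter
from typing import Dict, Iterable

PARENT_AGE_GROUP_LABELS: Dict[int, str] = {
    1: "Less than 40",
    2: "40 - 49",
    3: "50 - 59",
    4: "60 - 69",
    5: "70 - 79",
    6: "80 years and over",
    96: "Valid skip (no such parent in household)",
    97: "Don't know",
    98: "Refusal",
    99: "Not stated",
}


def tabulate_parent_age(codes: Iterable[int]) -> Counter:
    """Unweighted counts by corrected label; raises KeyError on unexpected codes."""
    return Counter(PARENT_AGE_GROUP_LABELS[c] for c in codes)


print(tabulate_parent_age([1, 2, 2, 96]))
```

Raising an error on unexpected codes, rather than silently passing them through, makes any remaining miscoded values visible.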

Our Processing Team is making all appropriate corrections and is rerunning the data. The data dictionary will be reissued in the near future.

.....

...the changes apply to both the mother and father variables.


Once we make the corrections on our end, the folks at Odesi will be able to take the data and restage it for their purposes.

Friday, January 17, 2020

New files on Statistics Canada Nesstar

We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.

  • Retirement Survey (RS) 1975 Pre-Retirement PUMF (English only)
  • Retirement Survey (RS) 1975 Retirement PUMF (English only)
  • Labour Market Activity Survey (LMAS) 1986 Job File PUMF (English only)
  • Labour Market Activity Survey (LMAS) 1987 Job File PUMF (English only)
  • Canadian Travel Survey (CTS) 1980 Person Trip Data PUMF (English only)
  • Family History Survey (FHS) 1984 PUMF (English only)

Should you have any questions or comments, feel free to contact us.

Wednesday, January 15, 2020

New Release: National Graduates Survey 2018

National Graduates Survey (NGS) 2018

Data from this survey will be used to better understand the experiences and outcomes of graduates, and to improve government programs. The survey is designed to collect details on topics such as: i) the extent to which graduates of postsecondary programs have been successful in obtaining employment since graduation; ii) the relationship between the graduates' program of study and the employment subsequently obtained; iii) the type of employment obtained and qualification requirements; iv) sources of funding for postsecondary education; and v) government-sponsored student loans and other sources of student debt.

This information will be used by Statistics Canada, Employment and Social Development Canada (ESDC), provincial and territorial ministries of education, researchers and other interested organizations to examine various aspects such as educational pathways, postsecondary funding, mobility, school-to-work transitions, labour market outcomes and pursuit of further postsecondary studies.

EFT: /MAD_PUMF_FMGD_DAM/Root/5012_NGS_END/2018

Friday, January 10, 2020

New Release: LFS December 2019

Labour Force Survey (LFS) - December 2019

This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). The LFS collects monthly information on the labour market activities of Canada's working age population. This product is for users who prefer to do their own analysis by focusing on specific subgroups in the population or by cross-classifying variables that are not in our catalogued products. The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.

EFT: /MAD_PUMF_FMGD_DAM/Root/3701_LFS_EPA/1976-2019/data/micro2019-12.zip