Friday, January 31, 2020

New Release: NTS 2018

We are pleased to inform you that the following product is now available.

National Travel Survey (NTS) 2018

The National Travel Survey was developed to fully replace the Travel Survey of Residents of Canada (record number 3810) and replace the Canadian resident component of the International Travel Survey (record number 3152). The National Travel Survey collects information about the domestic and international travel of Canadian residents.

The National Travel Survey, sponsored by Statistics Canada, aims to measure the characteristics and the economic impact of the tourism activities of Canadian residents. The objectives of the survey are to provide information about the number of trips and expenditures by Canadian residents by trip origin, destination, duration, type of accommodation used, trip reason, mode of travel, etc.; to provide information on travel incidence and to provide the socio-demographic profile of travellers and non-travellers. From a macroeconomic point of view, the NTS measures the domestic and international tourism demand by Canadian residents.


New Release: ITS 2017

We are pleased to inform you that the following product is now available.

International Travel Survey (ITS) 2017

The electronic questionnaires (e-questionnaires) and the Air Exit Survey (AES) are components of the International Travel Survey (ITS), together with the Frontier Counts (record number 5005). The ITS has been conducted by Statistics Canada since 1972 to meet the requirements of the Balance of Payments (BOP) of the Canadian System of National Accounts. The survey provides a full range of statistics on international travellers (visitors to Canada and Canadian residents returning to Canada), including detailed characteristics of their trips such as expenditures, activities, places visited and length of stay.

In addition to fulfilling BOP requirements, the information collected in the questionnaires is used by the Tourism Satellite Account (TSA), Canada Border Services Agency (CBSA), Destination Canada, provincial tourism agencies, the United States Department of Commerce, the OECD, banks, investment companies, other private sector industries and independent researchers. The information is also used for reporting to international organizations such as the World Tourism Organization (WTO), the Organization for Economic Co-operation and Development (OECD) and the Pacific-Asia Travel Association (PATA).

The AES started in the year 2000 for overseas visitors and in the year 2011 for U.S. visitors. The primary objective of the AES is to improve the quality and reliability of trip and traveller estimates for foreign air travellers to Canada, from major and emerging markets.

The e-questionnaire component of the survey began in 2013, with the distribution of invitation cards to travellers (Canadian, American, and Overseas) who have entered at one of 137 designated Canadian ports of entry. The mail-back questionnaires were last used in 2014.

EFT:  /MAD_PUMF_FMGD_DAM/Root/3152_ITS_EVI/2017/ 

Tuesday, January 28, 2020

CCHS variable CCS_185


I have a researcher working with multiple years of the CCHS (2007-2008 to 2015-2016) and a question has arisen - please see his message below:

In the CCHS documentation for item CCS_185 (last time to have colonoscopy or sigmoidoscopy), the response options include both a time range of "5 TO < 10YEARS" and another time range of "5 YEARS OR MORE". These options overlap, so they are not mutually exclusive, although they are presented in the dataset as such. Do you have any explanation for this?

The researcher has indicated that it is represented this way in the data itself as well and is present across all the years. My initial thoughts are that there is a typo and that the second option should perhaps be "10 years or more", as otherwise these two response options overlap each other.

Can you please advise?


There are a couple of previously identified format/label issues with CCS for multiple CCHS cycles, including the format for CCS_185. The error for CCS_185 is a label error: CCS_185=6 should be ‘10 years or more’ (errata item 19). I would encourage the researcher to consult the errata found in the documentation folders. There are two separate CCHS annual component Errata documents, one for the 2000 to 2014 reference period and another for 2015 and later; they contain information on all known errors as well as on how to correct them.
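For researchers who apply value labels in their own scripts, the errata correction described above could be sketched as follows. This is a minimal illustration in Python, not Statistics Canada's code: the function name `apply_label_errata` and the dictionary layout are assumptions, and only code 6 is shown because it is the only code and label pair confirmed in this thread.

```python
def apply_label_errata(labels, corrections):
    """Return a copy of a code-to-label mapping with errata corrections applied.

    Raises KeyError if a correction targets a code that is not in `labels`,
    so a mistyped code cannot silently add a new category.
    """
    unknown = sorted(set(corrections) - set(labels))
    if unknown:
        raise KeyError(f"codes not present in labels: {unknown}")
    fixed = dict(labels)  # leave the caller's mapping untouched
    fixed.update(corrections)
    return fixed


# Label as distributed for CCS_185 = 6 (quoted in the question above).
ccs_185_labels = {6: "5 YEARS OR MORE"}

# Correction given in the reply above (errata item 19).
ccs_185_fixed = apply_label_errata(ccs_185_labels, {6: "10 YEARS OR MORE"})
print(ccs_185_fixed[6])
```

The remaining codes and labels for CCS_185 should be taken from the data dictionary and the Errata documents rather than from this sketch.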

This may have previously been mentioned to the researcher, but the CCHS annual component was redesigned for the 2015 reference period and subsequent cycles.
As a result of the 2015 CCHS redesign, combining or comparing cycles of CCHS data from before and after the redesign (e.g., combining/comparing 2014 and earlier files with 2015 annual files) is not recommended, and caution should be taken when comparing estimates across those years. Even estimates derived from content that has remained unchanged are subject to the potential impacts of the other major changes to the survey (i.e., new survey frames and collection methods) and may not necessarily be comparable with past cycles. It would be very difficult to ascertain whether any changes or consistencies between estimates pre and post redesign reflect the true population characteristic being examined or the effect of the significant methodological and operational changes made to the survey. Please review the 2015 CCHS Redesign Summary found with both the 2015 and 2016 survey documentation.

Monday, January 27, 2020

Request for Tools to Turn Tables in PDFs into Spreadsheets


Hi DLI Community,

Can anyone recommend a good tool for using OCR to turn pdfs of scanned tables into spreadsheets? A professor here at the University of Toronto is working with 200+ tables from nineteenth-century Ontario government publications, and I’m trying to suggest tools and a workflow for him and his RA.

So far, my proposed workflow is for them to clean pdfs (if necessary) using Acrobat Pro, and then scan them using (which is free and fairly good) or else something more powerful like OmniPage Ultimate (slow but useful proprietary OCR software, which we have on some library workstations) for particularly challenging tables. Finally, the tables can be manually corrected.

Do any of you have suggestions for OCR tools that worked for you, especially if they work in-browser and can create spreadsheets?


You may want to have a look at Tabula. I have not used it extensively myself, but I remember that it was recommended by Vince Gray (of DLI fame), so it must be good!


I often use Camelot or its web version Excalibur, but the document must be in a text-based PDF format. I don’t know of a decent tool to convert an image-based PDF to a text-based PDF.
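As a rough illustration of the Camelot route mentioned above, the sketch below pulls tables out of a text-based PDF and writes each one to a CSV file that a spreadsheet program can open. It assumes the `camelot-py` package is installed; the file name `report.pdf`, the helper names, and the choice of the `lattice` flavor (intended for tables with ruled lines) are hypothetical. It will not work on image-only scans, which would need an OCR pass first.

```python
from pathlib import Path


def csv_path_for(pdf_path, table_number, out_dir="."):
    """Build an output name such as 'report_table_1.csv' for a given table."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_table_{table_number}.csv"


def extract_tables(pdf_path, pages="1", flavor="lattice", out_dir="."):
    """Extract tables from a text-based PDF and save each one as a CSV file."""
    import camelot  # imported here so the helper above works without Camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)
    written = []
    for number, table in enumerate(tables, start=1):
        target = csv_path_for(pdf_path, number, out_dir)
        # table.df is a pandas DataFrame holding the extracted cells.
        table.df.to_csv(target, index=False, header=False)
        written.append(target)
    return written


# Hypothetical usage; replace the file name with a real text-based PDF:
# extract_tables("report.pdf", pages="1-3")
```

Writing one CSV per table keeps each extracted table separate, which makes the manual correction step easier to divide up.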

Wednesday, January 22, 2020

Variable issues with GSS Cycle 30


A researcher reports something strange with two variables from the 2016 iteration (Cycle 30 - Canadians at Work and Home) of the GSS.

Both variables AGEFHSHG (Age of respondent’s father in household) and AGEMHSDG (Age of respondent’s mother in household) show answer categories that are very perplexing if you consider that these are supposed to be PARENTS:

0 to 9 years and
10 to 19 years

Granted, there are young parents that would be in the second category, but the first one does not make sense at all and I wonder if the second one is meant to start as early as 10 years old. And there are actually answers within the 0 to 9 years old category! This must be a mistake or there is something really strange about how these derived variables are obtained.


Thank you for bringing this discrepancy to our attention. After some investigation, we discovered the following two issues: 

1. There was 1 case where all foster children were inadvertently coded as foster parents. This has been recoded.

2. Labels in the PUMF were incorrectly applied for “Age of respondent’s mother in the household” and “Age of respondent’s father in the household”.

The way it appeared in the PUMF (incorrect):

Age of respondent’s mother in household
0 to 9 years
10 to 19 years
20 to 29 years
30 to 39 years
40 to 49 years
50 years and over
Valid skip

To rerun the data yourself, please use the following coding:

/* AGEMHSDG in groupings of 10 years */
/* Age group of respondent’s mother in household */
/* Values for AGEMHSDG: */
/* 1 : Less than 40 */
/* 2 : 40 - 49 */
/* 3 : 50 - 59 */
/* 4 : 60 - 69 */
/* 5 : 70 - 79 */
/* 6 : 80 years and over */
/* 96 : VALID SKIP No Mother in household */
/* 97 : DON'T KNOW */
/* 98 : REFUSAL */
/* 99 : NOT STATED */
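For those working in Python rather than with the coding as given, the corrected values above could be applied along these lines. This is a sketch only: the dictionary and function names are assumptions, and treating codes 96 to 99 as values to leave out of age tabulations is an analytic choice, not something stated in the reply.

```python
from collections import Counter

# Corrected value labels for AGEMHSDG, transcribed from the coding above.
AGEMHSDG_LABELS = {
    1: "Less than 40",
    2: "40 - 49",
    3: "50 - 59",
    4: "60 - 69",
    5: "70 - 79",
    6: "80 years and over",
    96: "VALID SKIP No Mother in household",
    97: "DON'T KNOW",
    98: "REFUSAL",
    99: "NOT STATED",
}

# Assumption: these codes are excluded when tabulating age groups.
RESERVED_CODES = {96, 97, 98, 99}


def label_age_group(code):
    """Return the corrected age-group label, or None for reserved codes."""
    if code not in AGEMHSDG_LABELS:
        raise ValueError(f"unexpected AGEMHSDG code: {code}")
    if code in RESERVED_CODES:
        return None
    return AGEMHSDG_LABELS[code]


def tabulate_age_groups(codes):
    """Count records per corrected age group, skipping reserved codes."""
    labels = (label_age_group(code) for code in codes)
    return Counter(label for label in labels if label is not None)
```

Raising an error on an unexpected code, rather than ignoring it, makes it easier to spot a file that still carries a different coding scheme.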

Our Processing Team is making all appropriate corrections and is rerunning the data. The data dictionary will be reissued in the near future.


...the changes apply to both the mother and father variables.

Once we do the corrections on our end, the folks at Odesi will be able to take the data to restage for their purposes.

Friday, January 17, 2020

New files on Statistics Canada Nesstar

We are pleased to inform you that the following are now available on the Statistics Canada Nesstar WebView site.

· Retirement Survey (RS) 1975 Pre-Retirement PUMF (English only)
· Retirement Survey (RS) 1975 Retirement PUMF (English only)
· Labour Market Activity Survey (LMAS) 1986 Job File PUMF (English only)
· Labour Market Activity Survey (LMAS) 1987 Job File PUMF (English only)
· Canadian Travel Survey (CTS) 1980 Person Trip Data PUMF (English only)
· Family History Survey (FHS) 1984 PUMF (English only)

Should you have any questions or comments, feel free to contact us.

Wednesday, January 15, 2020

New Release: National Graduates Survey 2018

National Graduates Survey (NGS) 2018

Data from this survey will be used to better understand the experiences and outcomes of graduates, and to improve government programs. The survey is designed to collect details on topics such as: i) the extent to which graduates of postsecondary programs have been successful in obtaining employment since graduation; ii) the relationship between the graduates' program of study and the employment subsequently obtained; iii) the type of employment obtained and qualification requirements; iv) sources of funding for postsecondary education; and v) government-sponsored student loans and other sources of student debt.

This information will be used by Statistics Canada, Employment and Social Development Canada (ESDC), provincial and territorial ministries of education, researchers and other interested organizations to examine various aspects such as educational pathways, postsecondary funding, mobility, school-to-work transitions, labour market outcomes and pursuit of further postsecondary studies.


Friday, January 10, 2020

New Release: LFS December 2019

Labour Force Survey (LFS) - December 2019

This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). The LFS collects monthly information on the labour market activities of Canada's working age population. This product is for users who prefer to do their own analysis by focusing on specific subgroups in the population or by cross-classifying variables that are not in our catalogued products. The Labour Force Survey estimates are based on a sample, and are therefore subject to sampling variability. Estimates for smaller geographic areas, industries, occupations or cross tabulations will have more variability. For an explanation of sampling variability of estimates, and how to use standard errors to assess this variability, consult the Data Quality section in the Guide to the Labour Force Survey.

EFT: /MAD_PUMF_FMGD_DAM/Root/3701_LFS_EPA/1976-2019/data/
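
As a rough illustration of using a standard error to assess sampling variability, the sketch below computes a coefficient of variation and an approximate 95% confidence interval for an estimate. The function names and the example figures are hypothetical, and the normal-approximation multiplier of 1.96 is a common convention rather than something taken from the Guide to the Labour Force Survey, which remains the authority on how LFS standard errors should be obtained and interpreted.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the standard error as a proportion of the estimate."""
    if estimate == 0:
        raise ValueError("coefficient of variation is undefined for a zero estimate")
    return standard_error / abs(estimate)


def confidence_interval(estimate, standard_error, z=1.96):
    """Return an approximate (lower, upper) interval: estimate +/- z * SE."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical figures, for illustration only (not LFS results).
estimate = 50_000
standard_error = 2_500

cv = coefficient_of_variation(estimate, standard_error)
low, high = confidence_interval(estimate, standard_error)
print(f"CV = {cv:.1%}; approximate 95% interval = {low:,.0f} to {high:,.0f}")
```

A larger coefficient of variation signals a less reliable estimate, which is why estimates for small subgroups or detailed cross tabulations call for more caution.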