January 6, 2021
I've pulled data for each CSD from the 2006 Census using Beyond 20/20, and my advisor wanted me to ask, based on the StatsCan information I've copied below, how I can tell whether zeroes in my dataset represent data suppression or true zeroes. He thinks that suppressed values will likely need to be treated as missing in my analyses, since they aren't true zeroes, but I'm not sure how to differentiate them accurately. He suggests doing this by looking at each CSD's population size and its total number of private households: if the CSD is not below either threshold (as outlined below), a zero is likely a true zero (probably not many of these), and the remaining zeroes can be recoded as missing values. Does that make sense?
Area suppression for income characteristic data
Area suppression, when applied for data quality purposes, is used to replace all income characteristic data with zeroes for geographic areas with populations and/or number of households below a specific threshold.
If a census tabulation contains any data showing income characteristics for individuals, families or households, then the following rule applies. Income characteristic data are zeroed out for areas where the population is less than 250 or where the number of private households is less than 40. These thresholds are applied to 2006 Census data as well as all previous census data. The threshold of 40 private households is based upon the fact that weighted data are being used. With the weighting factor for each household being 5, setting a threshold of 40 ensures that there will be at least 8 households used in the calculation. The private household threshold does not apply for tabulations based on place of work geographies.
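A minimal Python sketch of the rule quoted above, assuming you already have each CSD's population and private household count as numbers. The function and constant names are hypothetical, not part of Beyond 20/20 or any StatsCan tool. It also spells out the weighting arithmetic: 40 weighted households divided by a weight of 5 gives at least 8 sampled households.

```python
# Thresholds quoted above for income characteristic data (2006 Census).
MIN_POPULATION = 250
MIN_PRIVATE_HOUSEHOLDS = 40
HOUSEHOLD_WEIGHT = 5  # each sampled household stands for 5 households


def income_area_suppressed(population, private_households, place_of_work=False):
    """Return True if income data for this area would be zeroed out."""
    if population < MIN_POPULATION:
        return True
    # The private household threshold does not apply to place-of-work geographies.
    if not place_of_work and private_households < MIN_PRIVATE_HOUSEHOLDS:
        return True
    return False


# 40 weighted households / weight of 5 = at least 8 sampled households.
min_sampled_households = MIN_PRIVATE_HOUSEHOLDS // HOUSEHOLD_WEIGHT

print(income_area_suppressed(1200, 35))        # too few households
print(income_area_suppressed(1200, 35, True))  # place-of-work geography
print(income_area_suppressed(300, 120))        # above both thresholds
print(min_sampled_households)
```

Note that the comparisons are strict: an area with exactly 250 people and 40 private households is not suppressed under this reading of "less than".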
This seems to be what is happening in my data: for a single CSD, some variables show ‘.’ and others show zeroes. The zeroes typically appear in variables related to income, household spending proportions, and the like.
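One way to apply the advisor's idea to an export like this is sketched below. It assumes the table was exported to text so each cell arrives as a string, that ‘.’ marks unavailable data, and that the field names (population, private_households, median_household_income, lone_parent_families) are hypothetical stand-ins for your own columns. Zeroes in income fields are recoded to None only for CSDs below the 250-person or 40-household threshold; a CSD whose own population or household count is unavailable is treated conservatively as below the threshold.

```python
# Hypothetical field names; replace with the columns in your own export.
INCOME_FIELDS = ("median_household_income",)
MIN_POPULATION = 250          # thresholds from the area suppression rule above
MIN_PRIVATE_HOUSEHOLDS = 40


def parse_cell(raw):
    """Convert one exported cell to a float, or None if it is unavailable."""
    raw = raw.strip()
    if raw in (".", ""):
        return None  # '.' marks data that are not available
    return float(raw.replace(",", ""))  # drop thousands separators


def recode_suppressed_zeros(row, income_fields=INCOME_FIELDS):
    """Parse one CSD row; set income zeroes to None if the CSD is below a threshold."""
    out = {field: parse_cell(value) for field, value in row.items()}
    pop = out["population"]
    hh = out["private_households"]
    # Unknown population/household counts are treated as below threshold.
    below = (pop is None or hh is None
             or pop < MIN_POPULATION or hh < MIN_PRIVATE_HOUSEHOLDS)
    if below:
        for field in income_fields:
            if out.get(field) == 0:
                out[field] = None  # suppression zero, not a true zero
    return out


rows = {
    "CSD A": {"population": "180", "private_households": "70",
              "median_household_income": "0", "lone_parent_families": "."},
    "CSD B": {"population": "5,400", "private_households": "2,100",
              "median_household_income": "48213", "lone_parent_families": "0"},
}
cleaned = {csd: recode_suppressed_zeros(row) for csd, row in rows.items()}
print(cleaned["CSD A"])
print(cleaned["CSD B"])
```

Non-income zeroes (such as lone_parent_families in CSD B) are left untouched here, since the area suppression rule quoted above targets income characteristics.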
Statistics Canada places the highest priority on maintaining the privacy and confidentiality of respondents. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data. Because of this, and because of the data quality measures in place, your client will not be able to distinguish every true zero from a suppression zero. Area suppression is one type of suppression, which involves removing all characteristic data for geographic areas with populations below a specified size.
Having population and private household counts for each geography should let them filter most of those out.
The minimum area sizes are:
- 250 people, if the table contains income data, and, if the table also contains place-of-residence data, at least 40 private households;
- 100 people, if it is a six-character postal code area, that is, a local delivery unit (LDU), or if it is a custom area;
- 40 people, in all other cases.
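The minimum sizes above can be encoded as a lookup for a given kind of table. This is a sketch that assumes the rules are checked in the order listed (so the income rule takes precedence for, say, an LDU table that also contains income data); the function and flag names are hypothetical.

```python
def minimum_area_size(*, has_income_data, place_of_residence, ldu_or_custom_area):
    """Return (minimum population, minimum private households or None)."""
    if has_income_data:
        # The 40-household test applies only when the table also uses
        # place-of-residence data.
        return 250, (40 if place_of_residence else None)
    if ldu_or_custom_area:
        return 100, None  # six-character postal code area (LDU) or custom area
    return 40, None  # all other cases


def meets_minimum_size(population, private_households, **table_kind):
    """Return True if an area is not subject to area suppression for this table."""
    min_pop, min_hh = minimum_area_size(**table_kind)
    if population < min_pop:
        return False
    if min_hh is not None and private_households < min_hh:
        return False
    return True


print(meets_minimum_size(300, 30, has_income_data=True,
                         place_of_residence=True, ldu_or_custom_area=False))
print(meets_minimum_size(60, 0, has_income_data=False,
                         place_of_residence=True, ldu_or_custom_area=False))
```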
Regarding your client’s question on individual cell suppression, please see the following paragraph from Chapter One of the 2006 Overview of the Census: Dissemination Rules for Statistics:
“Tables are sometimes accompanied by statistics such as averages, totals and standard deviations. There are various ways of ensuring that these statistics do not reveal sensitive information; for instance, they may be suppressed or made less precise. Some statistics, such as totals, ratios and percentages, are based on the rounded values in the tables to which they apply. A statistic will be suppressed if there are too few data to compute it. In cases of data items expressed in dollars, if the statistic must be calculated from data where the values are too close or if a value is too high compared to the others, then the statistic will be suppressed.”
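To illustrate how a percentage based on rounded values can come out as zero, the sketch below assumes counts are rounded to a multiple of 5. The census is generally described as using random rounding (up or down to a multiple of 5); this deterministic nearest-multiple stand-in does not reproduce that procedure and only shows how a small nonzero count can become zero before a percentage is computed. The counts are hypothetical.

```python
def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of base.

    Deterministic stand-in only; not Statistics Canada's random rounding.
    """
    return base * ((count + base // 2) // base)


def percentage(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return 100 * numerator / denominator


true_num, true_den = 2, 43               # hypothetical unrounded counts
rounded_num = round_to_base(true_num)    # 2 rounds to 0
rounded_den = round_to_base(true_den)    # 43 rounds to 45

print(percentage(true_num, true_den))        # about 4.65
print(percentage(rounded_num, rounded_den))  # 0.0
```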
Depending on the income source variable, income medians and averages are almost never true zeros; for most variables, a zero indicates suppression. Counts that have been rounded to zero are a feature of the confidentiality system, and you cannot distinguish those rounded down from true zeros.
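The rule of thumb in this reply can be written as a small classifier. It is a sketch under the assumption that you maintain your own list of count variables (the names below are hypothetical): a zero in any other variable is treated as suppressed, while a zero count is flagged as ambiguous because a count rounded down to zero cannot be told apart from a true zero.

```python
# Hypothetical count variables; list the count columns in your own data.
COUNT_VARIABLES = {"private_households", "lone_parent_families"}


def interpret_value(variable, value):
    """Classify one cell under the rule of thumb described above."""
    if value is None:
        return "missing"      # already unavailable (e.g. '.')
    if value != 0:
        return "observed"
    if variable in COUNT_VARIABLES:
        return "ambiguous"    # rounded-down count or true zero
    return "suppressed"       # e.g. a zero median or average income


print(interpret_value("median_household_income", 0))
print(interpret_value("lone_parent_families", 0))
```

How to handle the "ambiguous" cases (keep as zero, or run a sensitivity analysis treating them as missing) is an analysis decision the data themselves cannot settle.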