Wednesday, February 5, 2014

NHS Data Quality Question

Question

I'm trying to encourage our students and researchers to use the NHS User Guide to assess the quality of NHS data for a given community. I feel this is especially important for researchers at UNBC, as they tend to study small communities. However, I'm confused by a statement at <http://www12.statcan.gc.ca/nhs-enm/2011/ref/nhs-enm_guide/guide_4-eng.cfm#A_5_4> under "Discrepancy between 2011 Census counts and 2011 NHS estimates."

Quote from that section: "For a given census subdivision (CSD) or any other geographic area, users are invited to compare the 2011 Census count with the NHS estimate for the same target population to get an idea of the quality of the NHS estimates."

Where does one find the NHS estimate for a given CSD? The only figure given in the NHS profiles is the "Total population in private households by citizenship."

First question:

As this is the "total population in private households," this wouldn't include those not in private households. So this isn't the actual population estimate, is it?

Second question:

I'm sure that I'm not understanding something. Can you, or someone, help me to connect the dots?

Third question:

I do see another statement in that same section ("Discrepancy between 2011 Census counts and 2011 NHS estimates") that seems useful:

"A similar analysis comparing the NHS estimates and the 2011 Census counts for common questions would also provide an idea of the quality of the NHS estimates."

I see that there is a figure in the Census Profiles under "Housing and dwelling characteristics" for "Total number of persons in private households," so I'm assuming that this figure could be compared to "Total population in private households by citizenship" to work out a ratio. Is that correct?

Answer

There are two main measures of the quality of NHS data. The first compares the 2011 Census count with the NHS estimate; the second is the global non-response rate (GNR), which is published with the 2011 National Household Survey (NHS) estimates.

First, it is important to keep in mind the main differences between the 2011 Census count and the NHS estimate:

The definition of the population of each data source: the target population for the 2011 Census includes usual residents in collective dwellings and persons living abroad, whereas the target population for the NHS excludes them.
The variability of the estimates for the NHS: the NHS estimates are derived from a sample survey and are therefore subject to sampling error; they are also subject to potentially higher non-response error than in the census due to the survey's voluntary nature.

So when comparing 2011 Census and 2011 NHS tables, you can only compare tables whose universe is the population in private households.

Also keep in mind that with a sampling rate of about 3 in 10 and a response rate of 68.6%, it is estimated that only about 21% of the Canadian population participated in the NHS.
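The 21% figure is simply the sampling rate multiplied by the response rate. A minimal sketch of that arithmetic, assuming the response rate applies uniformly across the sample (a simplification; the published figure is itself an approximation):

```python
# Approximate share of the Canadian population that responded to the 2011 NHS,
# using the rounded figures quoted above. Assumes the response rate applies
# uniformly to the whole sample (a simplification for illustration).
sampling_rate = 3 / 10   # about 3 in 10 households received the NHS questionnaire
response_rate = 0.686    # 68.6% response rate among sampled households

participation = sampling_rate * response_rate
print(f"Estimated participation: {participation:.1%}")  # about 21%
```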

This is why users are cautioned to pay close attention to the potential differences between the 2011 Census counts and the NHS estimates for common characteristics. Where there are differences, users should consider the 2011 Census counts to be of higher quality and give preference to them since they are not affected by the NHS's sampling variance or non-response error.

When comparing the 2011 Census count with the 2011 NHS estimate, the discrepancy is the difference between the NHS estimate and the 2011 Census count, divided by the 2011 Census count.
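The discrepancy calculation can be sketched as below. The function name `nhs_discrepancy` and the counts in the usage example are hypothetical, chosen only to illustrate the formula; both inputs must refer to the same target population (for example, the two private-household totals mentioned in the question).

```python
def nhs_discrepancy(nhs_estimate: float, census_count: float) -> float:
    """Relative discrepancy: (NHS estimate - Census count) / Census count.

    Hypothetical helper. Both inputs should cover the same universe,
    e.g. persons in private households for the same CSD.
    """
    if census_count == 0:
        raise ValueError("census_count must be non-zero")
    return (nhs_estimate - census_count) / census_count


# Hypothetical figures for a small CSD (not real data).
census_private_households = 4000  # 2011 Census: persons in private households
nhs_private_households = 3820     # 2011 NHS: population in private households

d = nhs_discrepancy(nhs_private_households, census_private_households)
print(f"Discrepancy: {d:.1%}")  # negative means the NHS estimate is below the Census count
```

A result close to zero suggests the NHS estimate agrees with the Census count; the further it is from zero (in either direction), the more caution is warranted.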

The size of any discrepancy is an indication of the quality of the NHS estimates. For a given census subdivision (CSD) or any other geographic area, users are invited to compare the 2011 Census count with the NHS estimate for the same target population to get an idea of the quality of the NHS estimates.

The larger the discrepancy is, the greater the risk of having poor-quality NHS estimates.
For CSDs with a population of 25,000 or more, the census count and the NHS estimate are practically identical. That is not always the case for smaller CSDs.

The global non-response rate is also an important measure of the quality of NHS estimates. It combines household non-response and item (question) non-response into a single rate. The same measure is used for the 2011 Census, as it was in 2006 for disseminating census results, including those from the long form.

In the specific case of the NHS, the global non-response rate is weighted to take account of the initial sample and the subsample used in non-response follow-up. It is calculated and presented for each geographic area.

As noted in Section 3.1 <http://www12.statcan.gc.ca/nhs-enm/2011/ref/nhs-enm_guide/guide_2-eng.cfm>, there is non-response bias when a survey's non-respondents are different from its respondents. The higher the non-response, the greater the risk of non-response bias. For the NHS, a number of measures were taken to mitigate the potential effects of non-response bias. Despite those efforts, the risk of non-response bias remains.

The global non-response rate is also used as a main dissemination criterion associated with the quality of the NHS estimates. For example, the NHS estimates for any geographic area with a global non-response rate greater than or equal to 50% are not published in the standard products. The estimates for such areas have such a high level of error that they should not be released under most circumstances.
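The 50% dissemination cut-off can be expressed as a simple check. This is a minimal sketch: the function name `is_released` and the area names and GNR values are hypothetical, and passing the cut-off only means an area's estimates are published, not that they are free of non-response bias.

```python
# Dissemination rule described above: NHS estimates for an area whose global
# non-response rate (GNR) is 50% or higher are not released in standard products.
GNR_THRESHOLD_PERCENT = 50.0


def is_released(gnr_percent: float) -> bool:
    """Return True if an area's GNR is below the 50% dissemination cut-off."""
    return gnr_percent < GNR_THRESHOLD_PERCENT


# Hypothetical GNR values (in percent) for three illustrative areas.
areas = {"Area A": 18.4, "Area B": 37.9, "Area C": 50.0}

# Sorted from lowest to highest GNR, i.e. from lowest to highest risk of non-response bias.
for name, gnr in sorted(areas.items(), key=lambda item: item[1]):
    status = "released" if is_released(gnr) else "suppressed (GNR >= 50%)"
    print(f"{name}: GNR {gnr:.1f}% -> {status}")
```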

A smaller GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy.