A researcher here is working with PCCF+ 7B and has encountered a problem. Could someone at StatCan assist?
Text below forwarded from the researcher:
The main problem I am facing is the DB-level GAF linking, but I also found two other issues that could be affecting the linking as a whole.
I am working on a project where we are trying to use postal codes reported in a survey to identify rural/urban status (we plan to use the CSize variable). Respondents could provide either their full postal code or, if they preferred, only the first three characters (the FSA). The records with a full postal code seem to link properly, but any record with only an FSA appears in the problem file with most of the variables set to missing.
When I went through the code closely, I noticed that the merge with the Geographic Attribute File (GAF) requires records to have a dissemination block (DB). The comment in the code above this merge says “Merge back with Geographic Attribute File to get the remainder of the codes - Start at the Dissemination Block level, then move up to the DA, CSD, etc...”, but the files are only linked at the DB level and any record without a DB is not merged. From what I understand, a dissemination block cannot be assigned to records linked on fewer than 4 characters, so our FSA-only records do not get merged to the GAF and therefore have all the GAF variables set to missing later, in the cleaning-up stage.
When I looked at the datasets right before this merge, all the records had been assigned a CMA from the FSA-based linking, but this variable gets dropped before the merge to the GAF. I know that the GAF is a DB-level file, but from my understanding there are variables at higher levels, such as CMA (and therefore CSize), that could be assigned to records without a DB. Would it be valid to use the CMA variable assigned in the earlier linking to merge to the GAF for variables at the CMA level?
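To make the behaviour described above concrete, here is a minimal Python sketch contrasting a DB-only merge with a merge that falls back to a CMA-level lookup. It is not the PCCF+ SAS code: the field names (`db`, `cma`, `csize`), the toy GAF rows, and both function names are hypothetical, and whether such a fallback is valid for PCCF+ output is exactly the open question. The sketch also says nothing about how accurate an FSA-based CMA assignment is in the first place.

```python
# Illustrative only: hypothetical field names and toy values, not PCCF+ code or GAF data.

# Toy block-level attribute table, one row per dissemination block (DB).
GAF = [
    {"db": "DB001", "cma": "505", "csize": "1"},
    {"db": "DB002", "cma": "505", "csize": "1"},
    {"db": "DB003", "cma": "000", "csize": "5"},
]

# Toy survey records: one linked on a full postal code (has a DB),
# one linked on FSA only (no DB, but a CMA from the earlier linking).
RECORDS = [
    {"id": 1, "db": "DB001", "cma": "505"},
    {"id": 2, "db": None, "cma": "505"},
]


def merge_db_only(records, gaf):
    """Attach csize only when the record's DB matches a GAF row."""
    by_db = {row["db"]: row for row in gaf}
    merged = []
    for rec in records:
        match = by_db.get(rec.get("db"))
        merged.append({**rec, "csize": match["csize"] if match else None})
    return merged


def merge_with_cma_fallback(records, gaf):
    """Try the DB key first, then fall back to a CMA-level lookup.

    The fallback is used only when every GAF row in that CMA carries the
    same csize, so an ambiguous CMA still yields None.
    """
    by_db = {row["db"]: row for row in gaf}
    csize_by_cma = {}
    for row in gaf:
        csize_by_cma.setdefault(row["cma"], set()).add(row["csize"])

    merged = []
    for rec in records:
        match = by_db.get(rec.get("db"))
        if match:
            csize = match["csize"]
        else:
            values = csize_by_cma.get(rec.get("cma"), set())
            csize = next(iter(values)) if len(values) == 1 else None
        merged.append({**rec, "csize": csize})
    return merged


if __name__ == "__main__":
    print(merge_db_only(RECORDS, GAF))
    print(merge_with_cma_fallback(RECORDS, GAF))
```

With these toy inputs, the DB-only merge leaves record 2 without a `csize`, while the fallback version fills it in from the CMA.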
I also found that in the wc6dups file, the SAC id variable is only one digit instead of three. The input dataset only has one digit where SAC should be, and there is a comment beside this dataset in the input file SAS code saying “Check the input file”. This doesn’t actually make a difference in the files right now because this variable is overwritten by the GAF SAC variable, but I’m not sure if it will have an impact if the files are merged using SAC/CMA.
Another thing I noticed is that DMT is used frequently throughout the code to assign missing values to records with specific DMTs, but DMT is set to 9 after the 6-character linking and doesn’t exist in any of the datasets used later in the linking, so any record not linked using 6 characters has DMT=9. I’m not sure what impact this has on the validity of the linking, but I don’t think this is what was intended.
Reply from StatCan:
In developing more recent versions of the PCCF+, we have removed most of the geographic links at the three-character FSA level because we consider them inaccurate in most cases. This happens during the cleaning phase, and it is why most of these variables are missing for your client's FSA-only records. In general, we do not recommend off-label uses of the PCCF+, such as pulling out preliminary datasets to be used in geocoding.
Regarding the specific questions:
“but I’m not sure if it will have an impact if the files are merged using SAC/CMA.”
The files are not merged using these variables.
“Another thing I noticed is that DMT is used frequently throughout the code to assign missing values to records with specific DMTs, but DMT is set to 9 after the 6-character linking and doesn’t exist in any of the datasets used later in the linking, so any record not linked using 6 characters has DMT=9. I’m not sure what impact this has on the validity of the linking, but I don’t think this is what was intended.”
We cannot assign a DMT to partial postal codes; DMT values exist only for full postal codes. We use DMT as an indicator of positional accuracy and apply different geocoding processes depending on its value.
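As a rough illustration of the DMT behaviour discussed in this thread, the Python sketch below assigns a DMT only when a full six-character postal code is found in a lookup, and otherwise uses the value 9 that the researcher reports seeing. The lookup table, the DMT codes in it, and the function name are placeholders chosen for the example; they are not taken from the PCCF+ code or data.

```python
# Illustrative only: hypothetical lookup and placeholder DMT codes, not PCCF+ data.

# Toy lookup from full six-character postal codes to a DMT value.
FULL_CODE_DMT = {"K1A0B1": "A", "T2P2M5": "B"}

# Value the researcher reports on records not linked at six characters.
DMT_NOT_ASSIGNED = "9"


def assign_dmt(postal_code):
    """Return a DMT for a full postal code found in the lookup, else '9'."""
    code = postal_code.replace(" ", "").upper()
    if len(code) == 6:
        return FULL_CODE_DMT.get(code, DMT_NOT_ASSIGNED)
    # Partial codes (for example an FSA) cannot carry a DMT.
    return DMT_NOT_ASSIGNED


if __name__ == "__main__":
    for value in ["K1A 0B1", "K1A", "x0x0x0"]:
        print(value, assign_dmt(value))
```

Under these assumptions, a DMT of 9 on an FSA-only record reflects that no DMT can exist for it, rather than a linking failure.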