Friday, January 23, 2009

Unique postal codes not all unique in the PCCF for Sept 2008

Question

We are having difficulties with the latest PCCF file for Sept 2008.


Several hundred postal codes (e.g., A1A5K7, A1A5K8 ...) have more than one record labeled SLI=1 (1 = the best or only record for the postal code).


There appears to be a coding problem for the variable Single Link Indicator.


I deleted all records with SLI=0, as the researchers need to work with unique postal codes.
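For anyone who wants to reproduce the check described in the question, here is a minimal sketch in Python. It assumes each PCCF record has been read into a simple (postal_code, sli, dmt) tuple; the field layout and the sample records are hypothetical and made up for illustration, not taken from the real file. It deletes the SLI=0 records and then lists the postal codes that still have more than one SLI=1 record.

```python
from collections import Counter

# Hypothetical toy records (postal_code, sli, dmt); the values are invented
# for illustration and are NOT real PCCF data.
records = [
    ("A0A0A1", 1, "A"),
    ("A0A0A1", 0, "A"),
    ("A0A0A2", 1, "Z"),
    ("A0A0A2", 1, "A"),
    ("A0A0A3", 1, "Z"),
]

def sli_duplicates(rows):
    """Return the postal codes that still have more than one SLI=1 record."""
    kept = [r for r in rows if r[1] == 1]          # delete SLI=0 records
    counts = Counter(code for code, _, _ in kept)  # SLI=1 records per code
    return sorted(code for code, n in counts.items() if n > 1)

print(sli_duplicates(records))  # ['A0A0A2']
```

A non-empty result here is exactly the symptom reported above; as the answer below explains, it does not by itself indicate a coding error.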



If this is confirmed to be a coding problem, could a corrected PCCF file be issued for September 2008, please?

Answer


1) I spoke with one of Statistics Canada's PCCF experts, who says that the September 2008 file appears to be correct. He provided the following detailed explanation:

"In principle, users need to understand that there can be more than one record for a given postal code where SLI=1 if the postal code has been retired and then 'rebirthed', or if for any reason the Delivery Mode Type (DMT) has changed. There should, however, be only one record with SLI=1 for a given combination of postal code and DMT. If a postal code has a DMT of Z (retired) as well as any other DMT, you would want to keep only the non-retired records, which should eliminate most of the duplicate SLI records for a given postal code. But if there is only a retired record for the postal code, keep it.

For the two postal codes you mentioned, the above explanation is enough to show that there really isn't a problem once the true nature of the SLI field is understood. The same thing also occurred in previous versions of the PCCF.

However, it may or may not be appropriate to force only one record per postal code (using the SLI), as many postal codes serve a very wide area spanning multiple blocks, dissemination areas (DAs) and census subdivisions (CSDs)."
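The expert's rule can be sketched in Python as follows. This is an illustrative sketch, not Statistics Canada's procedure; it assumes records are (postal_code, sli, dmt) tuples, and the sample records are hypothetical. For each postal code it keeps the SLI=1 records, drops the retired (DMT=Z) ones when a non-retired record exists, and keeps the retired record when that is all there is.

```python
from collections import defaultdict

RETIRED = "Z"  # DMT value for a retired postal code

# Hypothetical toy records (postal_code, sli, dmt); invented for illustration.
records = [
    ("A0A0A1", 1, "A"),
    ("A0A0A2", 1, "Z"),  # retired record ...
    ("A0A0A2", 1, "A"),  # ... plus a non-retired one: keep only this one
    ("A0A0A3", 1, "Z"),  # retired only: keep it
    ("A0A0A4", 0, "A"),  # SLI=0: dropped
]

def apply_sli_rule(rows):
    """Keep SLI=1 records; per postal code, prefer non-retired records and
    fall back to the retired ones only when nothing else exists."""
    by_code = defaultdict(list)
    for row in rows:
        if row[1] == 1:
            by_code[row[0]].append(row)
    result = []
    for code in sorted(by_code):
        group = by_code[code]
        active = [r for r in group if r[2] != RETIRED]
        result.extend(active if active else group)
    return result

for row in apply_sli_rule(records):
    print(row)
```

Note that a postal code with two different non-retired DMTs would still keep both records, which matches the expert's point that this removes most, not necessarily all, duplicates.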

2) Further to the instructions from the PCCF folks at Statistics Canada on how to deal with non-unique postal codes, here are the SPSS steps I used to remove retired (DMT=Z) duplicate postal codes without removing unique retired postal codes… (nod now if you are still awake)…

1. Create a new ID variable:

- Using the menu: select TRANSFORM>COMPUTE, then enter id in the Target Variable text box and $casenum in the Numeric Expression text box. Click OK. (If nothing happens, select TRANSFORM>RUN PENDING TRANSFORMS.)

2. Pull the duplicate postal codes to the top of the file:

- Using the menu: select DATA>IDENTIFY DUPLICATE CASES, enter the postal code variable in “DEFINE MATCHING CASES BY”, enter DMT in “SORT WITHIN MATCHING GROUPS BY”, and then select “MOVE MATCHING CASES TO THE TOP OF THE FILE”.

3. Now we’re ready to get rid of the retired duplicates. Please note that my duplicate postal codes end at ID=11,849; yours will likely end at a different ID:

- Using the menu: select DATA>SELECT CASES, choose the “If condition is satisfied” radio button, click the IF button, and enter the condition that identifies the cases you want to keep. This is the condition I used:

id >= 11850 or (id < 11850 and DMT ~= "Z")

Then click CONTINUE and, under OUTPUT, select DELETE UNSELECTED CASES. (Check first by choosing FILTER OUT UNSELECTED CASES and confirming that only duplicate retired postal codes would be deleted.)
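For comparison, the same selection logic can be sketched in Python. This is a hypothetical illustration, not a replacement for the SPSS steps: it assumes the SLI=0 records have already been deleted and that each remaining record is a (postal_code, dmt) pair with made-up values. Instead of relying on a hard-coded ID cutoff, it counts how often each postal code occurs and applies the same condition: keep the record if its postal code is unique or its DMT is not Z.

```python
from collections import Counter

# Hypothetical toy records (postal_code, dmt), SLI=0 rows already removed.
records = [
    ("A0A0A1", "A"),
    ("A0A0A2", "Z"),  # duplicated code, retired: deleted
    ("A0A0A2", "A"),
    ("A0A0A3", "Z"),  # unique retired code: kept
    ("A0A0A5", "Z"),  # duplicated code, all retired: both deleted
    ("A0A0A5", "Z"),
]

def drop_retired_duplicates(rows):
    """Among postal codes that occur more than once, delete the retired
    (DMT = 'Z') records; keep every record of a unique postal code."""
    counts = Counter(code for code, _ in rows)
    return [r for r in rows if counts[r[0]] == 1 or r[1] != "Z"]

print(drop_retired_duplicates(records))
```

Like the SPSS condition, this deletes every record of a duplicated postal code whose records are all retired (A0A0A5 above), whereas the Statistics Canada advice would keep a retired record when it is the only kind available.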

This seems to work, but if anyone has a better way to do this, please let me know.
