December 15, 2020
Question
I have a researcher using the PCCF+ to look at improving patient access to hospital clinics. They are calculating the distance a patient travels from home to clinic but are finding that each time they run the data they are getting different lat/longs. The trouble is that they have multiple cohorts and some patients could be in more than one cohort.
Because the lat/longs change between runs, they sometimes find that one patient's location differs by as much as 36 km. Is there a way to account for this variance in results? Is there any way to limit the difference between results for a person who is in multiple cohorts? I have included the researcher's original question below:
----- Original question -----
[Note – ‘pts’ and ‘pt’ means patient]
We are using the software to calculate the distance from a pt.'s postal code to the hospital and to several satellite clinics, but I am finding that each time I run the data I can get a different latitude and longitude for the same postal code, both within the same data run and across data runs.
To give you an example:
Run 1:
Pts. attending a clinic appointment between 2014-2017 (a pt. could have attended multiple times; some pts. have the same postal code but live in different locations, especially in rural areas)
Run 2:
Pts. attending clinic between 2018-2019 (some of these pts. may also have been seen in the 2014-17 cohort)
Hence, run 2 will likely contain pts. who also attended an appointment in run 1, and it is quite likely that PCCF+ will assign a different lat and long. to those pts.
We are planning to run several different analyses, and a pt. could be included in more than one of them, possibly receiving a different lat and long each time. When comparing cohorts, this would mean we are not always using the same lat and long for a given patient.
I am trying to figure out the possible variance. Running all the 2014-2019 data in one go and then separating it into the various cohorts would be extremely time-consuming, because I would need to cross-reference my tracking sheets to find the year of each pt.'s appointment. I am currently planning 6 different runs, with overlapping pts. across the runs and many duplicate postal codes.
So far, for one postal code, I have found that the driving distance between the 2 assigned locations is as much as 36 km.
Any help or thoughts you could provide on this would be great.
I was also wondering whether PCCF+ allows street addresses to be used in combination with postal codes. We may be too late for that route this time round, but it would be good to know for the future.
Answer
There are a few possible reasons for these results:
One possible cause is that a postal code can, at times, have multiple records. For the most accurate results, I would confirm that the record being used is always the one where the single link indicator (SLI) equals 1.
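The SLI check can be applied once to build a single postal-code lookup that every cohort then reuses, so the same postal code always maps to the same point. A minimal sketch in Python, assuming the PCCF+ output has been exported to rows with columns named `PC`, `SLI`, `LAT` and `LONG` (these column names are assumptions; match them to your file):

```python
from typing import Dict, Iterable, Mapping, Tuple

# Assumed column names; check them against the PCCF+ output you actually have.
PC, SLI, LAT, LON = "PC", "SLI", "LAT", "LONG"


def build_sli_lookup(rows: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[float, float]]:
    """Map each postal code to the coordinates of its SLI == 1 record.

    Rows with any other SLI value are skipped, so every cohort that uses
    this lookup gets the same point for a given postal code.
    """
    lookup: Dict[str, Tuple[float, float]] = {}
    for row in rows:
        if str(row[SLI]).strip() != "1":
            continue
        # Normalize spacing and case so "a1b 2c3" and "A1B2C3" match.
        code = row[PC].replace(" ", "").upper()
        point = (float(row[LAT]), float(row[LON]))
        # Refuse to silently overwrite a different SLI == 1 point.
        if code in lookup and lookup[code] != point:
            raise ValueError(f"conflicting SLI == 1 records for {code}")
        lookup[code] = point
    return lookup


# Made-up example rows: only the SLI == 1 record is kept.
rows = [
    {"PC": "A1B 2C3", "SLI": "1", "LAT": "45.42", "LONG": "-75.70"},
    {"PC": "A1B 2C3", "SLI": "0", "LAT": "45.50", "LONG": "-75.90"},
]
print(build_sli_lookup(rows))
```

Joining each cohort to this one lookup, rather than re-running the geocoding per cohort, keeps a patient's coordinates identical across analyses as long as the same PCCF+ version is used.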
Additionally, the coordinates are based on the geography the postal code is geocoded to. You can check which geography a postal code is geocoded to using the variable Rep_Pt_Type. The majority of records are automatically geocoded to the block level, but others (mainly in rural areas) are geocoded to the Census Subdivision (CSD). For CSD-level records, the coordinates represent the subdivision as a whole rather than the patient's block, so they can be far from where the patient actually lives.
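Before computing distances, it may help to see how many patients in each cohort fall into coarse geographies. A minimal sketch, assuming each row carries a `Rep_Pt_Type` value and a hypothetical `patient_id` column. The actual Rep_Pt_Type code(s) for CSD-level points should be taken from the PCCF+ documentation; the `"CSD"` label below is a placeholder, not the real code:

```python
from collections import Counter
from typing import Iterable, List, Mapping, Set, Tuple

# Placeholder only: replace with the Rep_Pt_Type value(s) that the PCCF+
# documentation lists for CSD-level representative points.
COARSE_TYPES: Set[str] = {"CSD"}


def summarize_geocode_level(
    rows: Iterable[Mapping[str, str]],
    coarse: Set[str] = COARSE_TYPES,
) -> Tuple[Counter, List[str]]:
    """Count Rep_Pt_Type values and list patients geocoded to a coarse level."""
    counts: Counter = Counter()
    flagged: List[str] = []
    for row in rows:
        level = str(row["Rep_Pt_Type"]).strip()
        counts[level] += 1
        if level in coarse:
            # "patient_id" is a hypothetical identifier column.
            flagged.append(row["patient_id"])
    return counts, flagged


rows = [
    {"patient_id": "p1", "Rep_Pt_Type": "BLOCK"},
    {"patient_id": "p2", "Rep_Pt_Type": "CSD"},
]
counts, flagged = summarize_geocode_level(rows)
print(counts, flagged)
```

Patients on the flagged list are the ones whose travel distances deserve the most caution, since their point stands in for a whole subdivision.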
Another possible reason is that we regularly receive updated data from Canada Post, and we also correct the data as we find errors. This may explain why you see slight differences from year to year.
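To measure the variance directly, the coordinates a patient receives in two runs can be compared. A minimal sketch, assuming each run has been reduced to a dictionary mapping a hypothetical patient identifier to a (latitude, longitude) pair in decimal degrees. It uses the haversine formula, which gives straight-line (great-circle) distance, so its values will be shorter than the driving distances quoted in the question:

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def coordinate_shifts(run_a: Dict[str, Point], run_b: Dict[str, Point]) -> Dict[str, float]:
    """Straight-line shift (km) for every patient present in both runs."""
    return {pid: haversine_km(run_a[pid], run_b[pid])
            for pid in run_a.keys() & run_b.keys()}


run_1 = {"p1": (45.0, -75.0), "p2": (46.0, -75.0)}
run_2 = {"p1": (45.0, -75.0), "p2": (46.2, -75.1)}
shifts = coordinate_shifts(run_1, run_2)
print(max(shifts.values()))
```

Sorting the shifts, or summarizing them by Rep_Pt_Type, would show whether large differences are concentrated in a few rural postal codes.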
As for your last question about street addresses, I would need to confirm with the Health Statistics team responsible for the PCCF+ whether they have any plans to add new variables to the file.