Thursday, October 11, 2018

Discussion on Cataloging

Just checked our library catalogue record for Canadian Business Patterns, which now needs to reflect the new title, Canadian Business Counts. So far so good.

However, this reminds me of a long-standing problem I have with getting DLI data records into our catalogue. This is an internal matter, but advice from others on how they work with their technical services area may be helpful.

The U of R catalogue record takes the user to . This is all good, but it is not the whole story. It should also direct the user to CHASS (another outdated name?) or Nesstar, and it seems to me that it should also indicate that mediated access is available for obtaining the data from the EFT site. I would like to see all of this spelled out, including a statement of the differences between the data on the EFT and the data on the extraction sites. Unfortunately, I can't remember what those differences are. Do they pertain to the level of geography provided?

Additionally, I know some libraries also point to this: Customized extractions from the Canadian Business Patterns.

Although it is a little late for ACCOLEDS this year, I think we should perhaps have a technical services session covering this topic at a future training event.

Answer from DLI Admin:
I can speak to the difference between the data access methods on the DLI side.

The EFT holds the whole DLI collection, organized into safes. Depending on the licence signed by the subscribing institution, access is given to the following safes and their products:

1. MAD_PUMF_FMGD_DAM - Survey PUMFs and metadata
2. MAD_DLI_IDD_DAM - DLI annual reports, DLI training materials, CD-ROM data products, Geography files, Census files and more
3. MAD-PCCF_FCCP_DAM - Postal Code Conversion Files (PCCF), Postal Code Conversion Files Plus (PCCF+) and Postal Code Federal Riding Files (PCFRF)
4. MAD_CIHI_ICIS_DAM - Discharge Abstract Database (DAD) from the Canadian Institute for Health Information (CIHI)
5. MAD_SPSDM_BDMSPS_DAM - Social Policy Simulation Database and Model files (SPSDM)

More information about the contents of each safe can be found in Section 7: Accessing and Citing DLI Data of the DLI Survival Guide.

The DLI Nesstar site is a web-based data portal where users can search for and identify variables of interest in the microdata files and determine whether a PUMF or a master file would best fit their research needs. The DLI's version of Nesstar houses public use microdata files and the metadata from Research Data Centre (RDC) master files. The server itself can house different forms of data; I believe ODESI houses a wider variety, such as public-opinion polls.

Everything that is available on the DLI Nesstar site is on the EFT. The DLI Nesstar site is just another means to explore and analyze data through a web browser.

I'll pass on the idea of a future session on the different technical services available through the community to the rest of the DLI team. Thank you for bringing this to our attention!

Answer from DLI Member:
Perhaps the challenge also stems from the data being available (or not) in multiple places, some mediated and others not, and from THE LACK OF AUTHORITY METADATA (machine-readable) for these data products. This is compounded by the inconsistency between the public-facing content on the STC website and what is available through the DLI. In some cases, if you do a comparison, the data found in the DLI do not match the corresponding product records on the website. There are many more problems here, which have likely led each institution to do its own thing.

I wholly agree that the metadata problems we encounter with StatCan data are not unique to you. It would be interesting to explore this further, but perhaps, as you say, a whole session could be dedicated to it. At SP and in Ontario we have explored providing MARC records, OAI endpoint connections, and APIs for searching data available in ODESI, but again, that covers only the ODESI copies of DLI data. API integration into your catalogue is probably a good approach at this point, and with new discovery platforms offering custom API integration, this is becoming more and more sophisticated.

Perhaps even FRDR, the national research data discovery platform, could play a role in all of this? Maybe not right now for DLI data, given the duplication. Regardless, we should look for some kind of solution as a group.