Wednesday, June 14, 2017

Digital Object Identifiers for StatCan data

Background
Back in 2011, and then again in 2015, there were some questions on the list about DOIs and StatCan data. At one point it was asked specifically whether STC had considered registering DOIs (it was right after DataCite Canada had launched and was being tested), and later [after the Abacus group (SFU, UBC, UNBC, and UVic) made the transition to Dataverse, it was] asked about assigning DOIs and [a] discussion ensued. Now UNB is also going to use Dataverse to offer up research data, and we're thinking of using it for secondary data as well, but we have discovered that we have no choice about DOIs. That is, Dataverse is Harvard's resource, and they officially indicate that DOIs are assigned. Period. Since Dataverse assigns DOIs automatically, we want to register them (or what's the point?), and we are trying to figure out whether there is a way to force those DOIs to match STC-registered DOIs, if there were such a thing.

Question 1
If that's not possible, then the question becomes: what is StatCan's official line on other institutions effectively assigning DOIs to their (STC) data?

Question 2
So to STC employees I would ask: a) is Statistics Canada registering DOIs, or does it plan to, and b) is Statistics Canada concerned about multiple universities (or other organizations) assigning DOIs? Perhaps we've missed something in the Dataverse documentation, but it really doesn't seem like we have a choice about the assignment of DOIs.

Question 3
My question to the DLI community is how do you deal with this issue (i.e., that Dataverse doesn't give you the choice of whether it assigns a DOI and it doesn't look like we can suppress the info) at your institution?  

Answer 1
  • At UBC, we mint DOIs only for research data, not licensed datasets. 
  • We do not mint DOIs via Abacus Dataverse but via our discovery layer, Open Collections (https://open.library.ubc.ca/), which gives us great flexibility in DOI minting.
  • The newest Dataverse version, 4.6.2, allows minting Handles in addition to DOIs, which might solve the UNB problem. We have collaborated with Harvard to offer that...
    • See https://dataverse.org/blog/dataverse-462. It is very new, just released last week or so, so good timing. I was working with Harvard on that for more than a year. It was developed by DANS (our Dutch colleagues). More on GitHub: https://github.com/IQSS/dataverse/milestone/61?closed=1
  • We had to develop our own DOI GUI and pipeline, as DataCite Canada was not flexible enough for us. Here is more information: http://researchdata.library.ubc.ca/plan/get-dois/
  • By now we have minted more than 215,000 DOIs for our digital assets (out of 274K in Canada - https://stats.datacite.org/?fq=allocator_facet%3A%22CISTI+-+National+Research+Council+Canada%22&#tab-datacentres)
  • We have assisted multiple schools in their DOI work, namely uOttawa, BC ELN, Guelph, McMaster, VIU and many more...

Answer 2
We [StatCan] have reached out internally to obtain more information. Statistics Canada is collaborating with NRC’s DataCite to register DOIs for its aggregate data on the website. While this will still take some time, progress is underway. I brought forth the community's concern about registering their own DOIs for StatCan data. StatCan consulted with a DataCite representative, who indicated that the current best-practice thinking is that multiple DOIs are accepted, as long as they are from different clients. There are good use cases for both registrations, so another DataCite client with a different prefix will be able to assign a DOI to a copy of StatCan content stored in its repository.
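As a rough illustration of the "different prefix" point above, here is a minimal Python sketch that extracts the prefix from two DOIs and checks whether they differ. The function names and the sample DOIs are hypothetical, not real StatCan or repository identifiers, and the check is only a proxy: a prefix is not strictly the same thing as a DataCite client, and the pattern ignores sub-divided prefixes.

```python
import re

# Assumed shape: "10." plus a registrant code, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")

# Common ways a DOI is written before the bare identifier.
LEADS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_prefix(doi: str) -> str:
    """Return the prefix (e.g. '10.5072') of a DOI string."""
    cleaned = doi.strip()
    for lead in LEADS:
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    match = DOI_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return match.group(1)


def from_different_prefixes(doi_a: str, doi_b: str) -> bool:
    """True when the two DOIs were minted under different prefixes."""
    return doi_prefix(doi_a) != doi_prefix(doi_b)


# Hypothetical identifiers, for illustration only.
local_copy = "doi:10.5072/FK2/ABC123"
official = "https://doi.org/10.5555/example-table"
print(from_different_prefixes(local_copy, official))
```

A check like this could flag cases where a repository copy and an official record share a prefix, which would fall outside the situation the DataCite representative described.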

Informally, I was informed it would be ideal if repositories linked to the official StatCan DOI once available. Since StatCan is the author of the data, this would ensure that users are directed to the current and authoritative source. At some point in the future, they have agreed it would be beneficial to have a conversation with stakeholders in the community. We, the DLI, are continuing to have conversations with internal stakeholders regarding the potential for registration of items such as PUMFs.
[StatCan] is interested to learn more about the community's perspectives; please share your comments on the list.
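One way a repository might express such a link is through a related identifier in the DataCite metadata of its own copy. The sketch below builds that fragment as a plain Python dictionary. The field names (relatedIdentifier, relatedIdentifierType, relationType) and the "IsIdenticalTo" relation come from the DataCite metadata schema, but both DOIs are hypothetical, no official StatCan DOI exists yet, and the thread does not say whether Dataverse exposes this field or which relation type would be preferred.

```python
import json


def identical_copy_link(official_doi: str) -> dict:
    """Build a DataCite-style related identifier pointing at the official record."""
    return {
        "relatedIdentifier": official_doi,
        "relatedIdentifierType": "DOI",
        # "IsIdenticalTo" is used for separate instances of the same resource.
        "relationType": "IsIdenticalTo",
    }


# Hypothetical DOIs, for illustration only.
record = {
    "doi": "10.5072/FK2/LOCALCOPY",
    "relatedIdentifiers": [identical_copy_link("10.5555/official-statcan-table")],
}

print(json.dumps(record, indent=2))
```

A repository holding a modified or subsetted copy would presumably want a different relation type, which is an assumption to confirm against the schema documentation.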


Further comments
Hi folks, I’m attending the Dataverse Community Meeting this week. The release discussed above supports Handles OR DOIs as the persistent identifier for your Dataverse instance, not both (if I’m clear on this; I haven’t actually tested it yet). In our case, I believe we would want the option of selecting either a Handle or a DOI on a dataset-by-dataset basis in the same DV instance. See this use case ticket: https://github.com/IQSS/dataverse/issues/3623

I’m also interested in coming up with a coordinated solution for registering DOIs for STC data, including aggregate data available to us via the DLI. I’m happy that STC and DLI may be assigning DOIs in the near future, which is a step forward. We’ve discussed this at length within the OCUL community and I hope it can be a topic of discussion at our national training next year in Montreal (I’m assuming this is happening). 

Here are some options we’ve explored/discussed:
  • publishing STC data w/ DOIs (on the argument that these are our access copies, according to the DataCite BP)
  • publishing STC data w/ DOIs but unregistering these DOIs using the DataCite API
  • publishing STC data w/ Handles (however, not technically possible in Dataverse yet)
  • publishing STC data w/ internal Dataverse identifier (likewise not technically possible in Dataverse yet)

I have some questions for other folks in Canada that might be helpful for our conversations. If we were to coordinate loading of STC data w/ other Canadian universities:
  • what data are you loading?
  • can we harvest one repository? 
  • can we harvest DLI’s repository? 
For now our PUMFs will remain in our Nesstar repository, but we are in the process of releasing all non-PUMFs in our SP Dataverse. 

Further comments
The issue of duplicate DOIs is becoming a concern not only in our environment but elsewhere as well. At least STC is now considering assigning them to aggregate data, which is a first step. It was also good to hear what is [happening] out west, as this is an issue we have to consider for ODESI and the Scholars Portal Dataverse. There should be some interesting information coming out of the Dataverse conference, and I look forward to hearing about it from anyone else who is attending.