15 June 2014

Identifying and acquiring datasets for repositories

Last Friday, along with Philippa Broadley (Research Data Librarian, Queensland University of Technology) and Marianne Brown (Data Collections Specialist, James Cook University), I attended a meeting of the Research Support Working Party of the Queensland University Libraries Office of Cooperation (QULOC). The working party holds themed discussions at its regular meetings, and this time around the topic was acquiring datasets for repositories, something that Philippa, Marianne and I all have experience with.

The discussion was very wide-ranging. Philippa provided a great overview of QUT's multi-pronged approach to identifying datasets, which includes:

  • Meetings between senior Library staff and senior research leaders
  • Outreach programs by liaison librarians, including attending departmental meetings
  • Newsletter articles
  • Building relationships with key research facilities, at point of establishment or at other critical times (e.g. a storage migration project)
  • A new ANDS-funded collaborative project with a spatial data focus, involving QUT's Institute for Future Environments as well as Queensland Government's Department of Natural Resources and Mines
  • Monitoring users of high performance computing (HPC) infrastructure at QUT
  • Monitoring QUT users of QRISCloud, the Queensland node of the Research Data Storage Infrastructure (RDSI) and the national research cloud provided by the National eResearch Collaboration Tools and Resources (NeCTAR)
  • Analysing reports from the publications repository (QUT ePrints) to identify willing depositors as well as publications that have been deposited with supplementary data
  • Contacting users of QUT's online survey service.

I made a couple of other suggestions, including:
  • Seeking access to, or reports from, your Research Office's grants management system, e.g. of recently completed or about-to-be-completed projects
  • Using these reports to target researchers directly through emails and follow-up calls
  • 'Snowballing' i.e. asking every researcher you are dealing with if she/he can recommend anyone else for you to talk to
  • If the library is publishing open access journals or monographs, offering the ability to publish supplementary datasets through the repository
  • 'Repatriating' metadata from external repositories to ensure that datasets are part of the institutional record.

Marianne's approach included some of the strategies noted above, but with a strong focus on linking data collection identification and acquisition with efforts to improve research data storage (as this is usually the researchers' most urgent need). Marianne also highlighted one of the benefits for researchers in linking from their institutional repository to an externally published dataset, which is that institutional repositories often feed the researcher profile system.

Other themes that emerged from the discussion included:
  • The importance of automating data capture into repositories from data management solutions such as survey tools, electronic lab notebooks, and scientific imaging hardware such as microscopes, and the role that librarians can play in this, including metadata mapping and advice about licensing
  • Metadata being 'fit for purpose' in terms of identifying a dataset and its location as a minimum, rather than always needing a perfect full description
  • Aligning with changes in scholarly publishing such as the recent PLoS data policy and the emergence of data journals
  • The importance of short-term initiatives that focus on manual workflows for populating repositories with final state data, as well as longer term strategies to build sustainable services that focus on earlier stages in the research lifecycle.

This was a really great session to attend and gave me some new ideas to put into practice at my own institutions. Thanks to all involved for the chance to be part of it.