Written by Sarah Stewart
Image: Ash Barnes
The Research Data Management service, based in the Central Library, has ‘hit the ground running’ at the beginning of 2017. The RDM team will present a series of brief lunchtime talks, the ‘Byte-Size Sessions’, the first of which takes place on 14 February. Then, on 17 March, we will host the second ‘Data Circus’, a College-wide showcase of open data and software.
Data and software are crucial materials for, and products of, research, and they underpin scholarly communication. In recent years, funders such as RCUK and the Wellcome Trust have mandated data publication and data sharing with as few restrictions as possible. For instance, the Wellcome Trust Policy on Research Data (2014) states:
‘Making research data widely available to the research community in a timely and responsible manner ensures that these data can be verified, built upon and used to advance knowledge and its application to generate improvements in health.’ (reference)
As research data attract increasing interest, sharing and re-use, instilling good research data management practice has become important at every stage of the research process. At Imperial, the Research Data Management Team have developed a workflow that follows research data through every stage of their lifecycle: from data planning and creation through to long-term archiving, data publication and data sharing (Fig. 1).
Fig. 1: The Research Data Management workflow at Imperial College follows the ‘life cycle’ of research data from the initial planning stages of a research project, through to publication, data sharing and long-term archival storage of research results.
The Research Data Management service has seen a steady increase in the number of queries since its inception (Fig. 2). These queries cover many aspects of research data management and of the workflow at Imperial, but the most frequently asked questions are about writing data management plans and creating data access statements.
Fig. 2: The RDM Team has seen a steady increase in the number of enquiries. These enquiries have been mediated through e-mail, telephone and face-to-face meetings.
The Research Data Management Team have been actively involved in outreach, promoting best practice in research data management. We have achieved this through tutorials, webinars, lab group and departmental visits, face-to-face meetings with researchers and faculty, and events such as Research Data ‘Clinics’. It has been a successful first year for the Research Data Management Team at Imperial, and we are looking forward to the year ahead.
Earlier today Imperial College submitted its annual report to RCUK on compliance with the Research Councils’ open access policy. The RCUK OA policy envisages a five-year transition, at the end of which, in 2018, 100% of RCUK-funded scholarly papers should be available open access. To support the transition to open access, RCUK has set up a block grant that provides institutions with funds to cover article processing charges (APCs) and other OA-related expenses. Funds are allocated based on each institution’s RCUK research funding, and Imperial College has the second-largest allocation, just behind Cambridge and followed by UCL. The annual reports to RCUK give an overview of institutional spend and compliance.
The headline figures for the 2015/2016 College report are:
- £1,051,130 block grant spend from April 2015 to March 2016
- 89% overall compliance, split into 31% via the gold route and 58% via the green route
- 570 article processing charges paid at an average cost of ~£1,800
- The top five publishers are: Elsevier, Wiley, Nature, ACS and OUP
As every year when discussing the RCUK report figures, I think it is important to stress that compliance rates cannot meaningfully be compared between universities without understanding the data sources and methods used. To give just one example: the College could also have reported 81% green and 8% gold from the same data.
Why do I caution against comparing the numbers directly? For a start, research-intensive universities find it difficult to establish what 100% is. With hundreds, or in the case of Imperial College many thousands, of papers published every year, we rely on academics to notify us manually of the funder of each paper. Even though the College has made much progress in improving its processes and data over the past few years, we have to acknowledge that data collected through such a process will never be complete or fully accurate. For the College report we decided, as in previous years, to base our analysis on outputs we know to have been RCUK-funded. This year the sample comprised 1,923 papers (compared to 1,326 in 2014). With a different sample the numbers would have been different, and other universities may have taken a different approach to analysing the data.
Sadly, it is currently not easy to establish whether an output has been made available open access. Publishers do not usually add licensing information to their metadata, and searching for manuscripts deposited in external repositories is possible but not necessarily accurate. The process we used for the analysis was:
- Cross-reference the sample with the journal list from the Directory of Open Access Journals; classify every article published in a fully OA journal as compliant ‘gold’.
- Cross-reference the remaining articles with the list of articles for which the College Library has paid an APC; classify all those articles as compliant ‘gold’.
- Cross-reference the remaining articles with the outputs from ResearchFish that show a CC BY licence; classify all those articles as compliant ‘gold’.
- Cross-reference the remaining articles with the list of outputs deposited in the College repository Spiral; classify all those articles as compliant ‘green’.
- Cross-reference the remaining articles with the list of outputs that have a Europe PubMed Central ID; classify all those articles as compliant ‘green’.
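The steps above amount to a first-match cascade: each article takes the label of the first list that contains it, so the order of the lists decides how compliant outputs are split between gold and green. A minimal Python sketch, assuming each source has already been reduced to a set of article identifiers; the identifiers and set contents are hypothetical toy data, not College records:

```python
from collections import Counter


def classify(sample, rules):
    """Label each article with the first matching rule, in rule order.

    `rules` is an ordered list of (label, set_of_ids); articles that match
    no rule are labelled 'non-compliant'.
    """
    labels = {}
    for article in sample:
        labels[article] = "non-compliant"
        for label, matched in rules:
            if article in matched:
                labels[article] = label
                break  # first match wins, so rule order matters
    return labels


def split(labels):
    """Percentage of articles in each class, rounded to whole numbers."""
    counts = Counter(labels.values())
    return {label: round(100 * n / len(labels)) for label, n in counts.items()}


# Hypothetical toy data standing in for the five sources in the cascade.
sample = [f"a{i}" for i in range(10)]
doaj = {"a0"}                            # articles in fully OA journals listed in DOAJ
apc_paid = {"a1", "a2"}                  # articles with an APC paid by the Library
researchfish_ccby = set()                # ResearchFish outputs showing a CC BY licence
spiral = {"a2", "a3", "a4", "a5", "a6"}  # deposited in Spiral
europe_pmc = {"a6", "a7"}                # outputs with a Europe PMC ID

gold_first = [("gold", doaj), ("gold", apc_paid), ("gold", researchfish_ccby),
              ("green", spiral), ("green", europe_pmc)]
green_first = gold_first[3:] + gold_first[:3]  # same rules, green sources first

print(split(classify(sample, gold_first)))   # {'gold': 30, 'green': 50, 'non-compliant': 20}
print(split(classify(sample, green_first)))  # {'gold': 20, 'green': 60, 'non-compliant': 20}
```

Reordering the rules leaves overall compliance at 80% in this toy example but moves article `a2`, which has both a paid APC and a Spiral deposit, from gold to green. This is the same effect that would let the College report 81% green and 8% gold from the same data.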
As in previous years, we also put the remaining outputs through Cottage Labs’ Lantern tool, but this found no additional open access outputs. The main reason for this, I suspect, is the high compliance via the green route: some 81% of outputs in the sample had been deposited in the College repository Spiral or in Europe PMC. As the College prefers green over hybrid gold, it would have been in line with our policy to report these outputs as green; but as RCUK prefers gold OA, we decided, as in previous years, to report all outputs known to be gold as gold.
I could write more about reporting issues around open access, but as I have done that on a few other occasions I refer those who haven’t suffered enough to my previous posts.
One other caveat for those planning to compare APC spend with previous years: the article-level APC data is based on APCs paid during the reporting period. This differs from the APC data reported for the previous period, which was based on APC applications for published articles. There is, therefore, a small number of records duplicated from the previous year; these are identified in the notes column.
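For anyone comparing APC spend across years, the duplicated records mentioned above can be handled by matching identifiers against the previous report. A minimal sketch, assuming each APC record carries a DOI and an amount; the field names `doi`, `apc_gbp` and `notes`, the note text and the sample values are all hypothetical and are not the column names used in the published report:

```python
def flag_duplicates(current, previous_dois):
    """Copy each current-period APC record, adding a note when its DOI
    already appeared in the previous period's report."""
    flagged = []
    for record in current:
        row = dict(record)  # copy so the input records are left unchanged
        if row["doi"] in previous_dois:
            row["notes"] = "also reported in previous period"
        flagged.append(row)
    return flagged


def spend_excluding_duplicates(current, previous_dois):
    """Total APC spend, leaving out records already counted last year."""
    return sum(r["apc_gbp"] for r in current if r["doi"] not in previous_dois)


# Hypothetical example data (not taken from the College report).
previous_dois = {"10.1000/x1", "10.1000/x2"}
current = [
    {"doi": "10.1000/x2", "apc_gbp": 1800},
    {"doi": "10.1000/x3", "apc_gbp": 2100},
]

flagged = flag_duplicates(current, previous_dois)
print(spend_excluding_duplicates(current, previous_dois))  # 2100
```

Matching on DOI is an assumption here; if records lack DOIs, another stable identifier would be needed, and any matching method should be checked against the notes column in the published data.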
If you want the data from the full report, please visit the College repository: http://hdl.handle.net/10044/1/40159
I would like to thank everyone involved in putting the information together, especially colleagues in the Central Library’s open access team.