On Monday 28th September representatives of over 50 UK universities, ORCID, Jisc, GuildHE, RCUK and CRIS vendors met at Imperial College London for the first UK ORCID members meeting, and to launch the Jisc ORCID consortium. ORCID provides a persistent identifier that links researchers to their professional activities and outputs – throughout their career, even if they change name or employer. The unique iD ensures that authors receive credit for their work and allows institutions to automate information exchange with other organisations such as funders, thereby increasing data quality, saving academics time and institutions money.
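As a small illustration of what the iD itself looks like: an ORCID iD is a 16-character identifier whose final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below shows how a system that collects iDs might reject mistyped ones before storing them. The helper names `orcid_check_digit` and `is_valid_orcid` are hypothetical, and the check covers only the format of the iD, not whether it is actually registered.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check digit of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()
```

A check like this catches typing errors at the point of entry, which matters when iDs are passed on automatically to funders and other systems.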
In 2014, Imperial College London was one of the first universities in the UK to make ORCID available to researchers, working with the Jisc-ARMA-ORCID pilot. We have since actively engaged with ORCID and the community to increase uptake and improve systems integration. The UK ORCID meeting was designed to bring together different strands of these discussions, and to facilitate a broad discussion about the next steps for ORCID in the UK. Following the pilot programme, Jisc has negotiated an ORCID consortium through which universities can benefit from premium ORCID membership at significantly reduced cost. The meeting was the official launch event for the consortium. Over the last two years ORCID, a relatively new initiative, has gained a lot of momentum, not just in the UK:
- over 1.65m researchers registered globally
- ORCID iDs associated with over 4.3m DOIs
- over 300 member organisations
- 3 national consortia agreements signed (Italy, UK and Denmark) with more in progress
In 2011, Jisc had set up a “researcher identifier” task and finish group that included funders, libraries, IT directors, research managers and organisations like HESA. This group eventually recommended ORCID as a solution for the UK. Since then, ORCID has seen increasing support from research organisations and funders. Recently, both the Wellcome Trust and NIHR have mandated the use of ORCID for grant applications. RCUK’s Overview of Systems Interoperability Project resulted in a strong endorsement for ORCID, as did HEFCE’s Report of the Independent Review of the Role of Metrics in Research Assessment and Management.
The UK ORCID meeting was not primarily about funders and their mandates, though. It was a discussion among the ORCID member organisations and the Jisc consortium about how we as a community want to move forward. Specifically, the meeting had four aims:
- to raise awareness and understanding of ORCID and the Jisc consortium offer and benefits
- to bring together the UK ORCID community and establish how we want to work together
- to discuss community expectations for system and platform providers, funders and publishers
- to inform the Jisc technical and community support offering
The aim of the morning session was to raise awareness and create a shared understanding of ORCID. It started with presentations from ORCID and Jisc, followed by four university case studies from the pilot programme (Kent, Imperial, Oxford and York) and a Q&A panel. After lunch we discussed community requirements, and ways to work together to achieve these. Four thematic areas were discussed in breakout groups, organised through a community document in which participants, and others who could not attend in person, had listed their issues and expectations in advance of the meeting. This approach helped focus the discussions and led to a broad agreement on key issues.
Below is my summary of the key community requirements:
CRIS and repository platforms:
- actively prompt users to link their ORCID iD
- facilitate iD creation by pre-populating ORCID profiles with institutional affiliation and other relevant information
- harvest metadata for outputs associated with an iD from other systems
- allow users to push output metadata into the ORCID registry
- collect ORCID iDs for all authors, not just the corresponding author
- make iDs of all authors available with output metadata
- mint DOIs on acceptance and link to authors’ iDs
- make the author accepted manuscript available on acceptance, with an ID
- fully integrate ORCID into their workflows and systems
- move towards mandating ORCID
This is only a high-level summary of a much richer discussion. Some of the detail that I have conveniently skipped over will no doubt lead to further discussions later, but I found it remarkable how broad the consensus was – across more than 50 universities with very different approaches, requirements and cultures. There is still a lot of work to be done until we can reap all of the benefits that ORCID can enable, but the members meeting showed that universities are keen to work together with Jisc and ORCID to make progress.
Universities across the UK are now actively considering how to roll out ORCID, and there was much interest in lessons learned and emerging best practice. A UK ORCID mailing list is currently being set up and Jisc and ORCID are looking into ways to capture and share information through the new consortium. Jisc are currently hiring staff to support the consortium and help members to implement ORCID. I am looking forward to follow-on discussions with Jisc, ORCID and the community about the next steps.
Presentations (in order of appearance):
- ORCID: What, Who, Why, How? (Alice Meadows, ORCID)
- ORCID: where are we, and how did we get here? (Neil Jacobs, Jisc)
- ORCID at Kent (Simon Kerridge, Kent)
- Imperial College ORCID project (Torsten Reimer, Imperial College London)
- ORCIDs at Oxford (Sally Rumsey, Bodleian Libraries)
- ORCID iD implementation. Experiences at the University of York (Janette Colclough)
When you come at it for the first time, open access looks pretty complicated. Funder policies, institutional policies, publisher policies, different flavours of OA including ‘green’, ‘gold’, ‘libre’ and ‘gratis’ and a whole new language with mystifying terms like ‘hybrid journal’, ‘article processing charge’ and ‘author accepted manuscript’ await. Even librarians sometimes struggle to understand journal policies, or what certain licensing conditions actually mean.
It was perhaps for this reason that, when we started the College open access project, academics gave us a clear mission: a one button solution to open access.
We haven’t quite achieved that yet, but since May we have been running a new workflow that reduces the complexity to one sentence: ‘When you have a paper accepted, deposit the peer-reviewed manuscript – we do the rest, no matter what type of open access.’
The workflow is based on two ideas:
- Ask authors for the minimum information required.
- Offer authors a single publications workflow that covers green and gold OA as well as information required for funder reporting.
The frontend for this workflow is Symplectic Elements, the system used by our academics to manage their scholarly outputs. We have worked with the vendor to deliver an OA workflow that kicks in on acceptance for publication, and then we customised the system to interface with ASK OA, our in-house APC management system.
On acceptance for publication, authors add minimal metadata and the manuscript to Elements, link the article to relevant grants and if they want the College to pay an open access charge they simply tick a box. Colleagues in the Library’s open access team then check the submission, set necessary embargoes and make the output available through Spiral, the College repository. If payment is requested, the data is automatically transferred to ASK OA, the cloud-based, workflow-driven system that we launched last year. Through that process, authors receive a purchase order number to send to their publisher. When the College receives the electronic invoice, our finance system matches the PO and the payment process starts. No author interaction needed.
Above you see a screenshot of the information we require from authors. In addition, they deposit the manuscript (or share a link if it was already deposited in an external repository) and link the output to relevant grants. That allows us to charge costs for open access publishing to the correct funders and, once funder systems are ready, will enable the College to automate funder reporting on research outputs. If you want to see a demonstration, check out this video guide produced by the College Library:
The feedback we had from academics has been positive so far, and the numbers show that as well:
While the workflow is working well so far, we are still far away from what I would consider the ideal scenario. There are still enough journals with difficult and unhelpful policies, and no university workflow will be able to fix that. Publishers being unable to issue correct invoices is another issue. We also have the problem of reliably matching the metadata entered on acceptance with the metadata for the published output. Publishers could help by issuing authors with a DOI on acceptance.
Even better, publishers could feed publication metadata into systems like CrossRef on the date of acceptance. If the metadata had funder, licence and embargo information attached and a link to the manuscript, then open access would indeed become a one-click-problem. Authors enter their data on submission, and following acceptance it automatically travels through all relevant systems, until it ends up in an institutional repository. There would be no additional effort for authors, and admin overhead would be reduced greatly. The components to enable this already exist, for example the author identifier ORCID that was rolled out across the College last year.
We are still working towards the goal of a “one button” solution for open access with our partners. Until then the message remains: deposit the manuscript on acceptance, we do the rest.
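The acceptance workflow described in this post can be sketched in a few lines of code. This is only an illustration of the decision logic as described above, not the actual Symplectic Elements or ASK OA implementation: the `Deposit` class, its field names and the `process_deposit` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Deposit:
    """What an author supplies on acceptance (field names are hypothetical)."""
    title: str
    manuscript_file: str
    grant_ids: List[str] = field(default_factory=list)
    apc_requested: bool = False  # the "tick a box" step


def process_deposit(deposit: Deposit, embargo_months: int = 0) -> List[str]:
    """Return the ordered steps a submission goes through after deposit."""
    steps = ["open access team checks submission"]
    if embargo_months > 0:
        steps.append(f"set embargo of {embargo_months} months")
    steps.append("make output available in repository")
    if deposit.apc_requested:
        # Only gold OA requests are handed over to the APC system
        steps.append("transfer data to APC management system")
        steps.append("issue purchase order number to author")
    return steps
```

The point of the sketch is that the author makes a single decision (whether to request payment of an open access charge); every other branch is handled by staff or by the systems.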
The library has released a new workflow on how to make your publications REF compliant. Authors can now deposit their journal articles and conference proceedings on acceptance in Spiral via Symplectic. At the same time an application can be made for APC funding to pay open access fees.
Monday 9 – Thursday 12 February 2015 saw data management and curation professionals and researchers descend on London for the 10th annual International Digital Curation Conference (IDCC), held at 30 Euston Square. Although IDCC is focussed on “digital curation”, in recent years it has become the main annual conference for the wider research data management community.
This year’s conference theme was “Ten years back, ten years forward: achievements, lessons and the future for digital curation”.
Day 1: Keynotes, demos and panel sessions
Tony Hey opened the conference with an overview of the past 10 years of e-Science activities in the UK, highlighting the many successes along with the lack of recent progress in some areas. Part of the problem is that the funding for data curation tends to be very local, while the value of the shared data is global, leading to a “tragedy of the commons” situation: people want to use others’ data but aren’t willing to invest in sharing their own. He also had some very positive messages for the future, including how a number of disciplines are evolving to include data scientists as an integral part of the research process.
Next up was a panel session comparing international perspectives from the UK (Mark Thorley, NERC), Australia (Clare McLaughlin, Australian Embassy and Mission to the EU) and Finland (Riita Maijala, Department for HE and Science Policy, Finland). It was interesting to compare the situation in the UK, which is patchy at best, with Australia, which has had a lot of government funding in recent years to invest in research data infrastructure for institutions and the Australian National Data Service. This funding has resulted in excellent support for research data within institutions, fully integrated at a national level for discovery. The panel noted that we’re currently moving from a culture of compliance (with funder/publisher/institutional policies) to one of appreciating the value of sharing data. There was also some discussion about the role of libraries, with the suggestion that it might be time for academic librarians to go back to an earlier role which is more directly involved in the research process.
After lunch came a session of parallel demos. On the data archiving front, Arkivum’s Matthew Addis demonstrated their integration with EPrints (similar workflows for DSpace and others are in the works). There was also a demo of the Islandora framework, which integrates the Drupal CMS, the Fedora Commons digital repository and Solr for search and discovery: this lets you build a customised repository by putting together “solution packs” for different types of content (e.g. image data, audio, etc.).
The final session of the day was another panel session on the subject of “Why is it taking so long?”, featuring our own Torsten Reimer alongside Laurence Horton (LSE), Constanze Curdt (University of Cologne), Amy Hodge (Stanford University), Tim DiLauro (Johns Hopkins University) and Geoffrey Bilder (CrossRef), moderated by Carly Strasser (DataCite). This produced a lively debate about whether the RDM culture change really is taking a long time, or whether we are in fact making good progress. It certainly isn’t a uniform picture: different disciplines are definitely moving at different speeds. A key problem is that at the moment a lot of the investment in RDM support and infrastructure is happening on a project basis, with very few institutions making a long-term commitment to fund this work. Related to this, research councils are expecting individual research projects to include their own RDM costs in budgets, and expecting this to add up to an infrastructure across a whole institution: this was likened to funding someone to build a bike shed and expecting a national electricity grid as a side effect!
There was some hope expressed as well though. Although researchers are bad at producing metadata right now, for example, we can expect them to get better with practice. In addition, experience from CrossRef shows that it typically takes 3–4 years from delivering an infrastructure to the promised benefits starting to be delivered. In other words, “it’s a journey, not a destination”!
Day 2: research and practice papers
Day 2 of the conference proper was opened by Melissa Terras, Director of UCL Centre for Digital Humanities, with a keynote entitled “The stuff we forget: Digital Humanities, digital data, and the academic cycle”. She described a number of recent digital humanities projects at UCL, highlighting some of the digital preservation problems along the way. The main common problem is that there is usually no budget line for preservation, so any associated costs (including staff time) reduce the resources available for the project itself. In addition, the large reference datasets produced by these projects are often in excess of 1TB. This is difficult to share, and made more so by the fact that subsets of the dataset are not useful: users generally want the whole thing.
The bulk of day 2, as is traditional at IDCC, was made up of parallel sessions of research and practice papers. There were a lot of these, and all of the presentations are available on the conference website, but here are a few highlights.
Some were still a long way from implementation, such as Lukasz Bolikowzki’s (University of Warsaw) “System for distributed minting and management of persistent identifiers”, based on Bitcoin-like ideas and doing away with the need for a single ultimate authority (like DataCite) for identifiers. In the same session, Bertram Ludäscher (University of Illinois Urbana-Champaign) described YesWorkflow, a tool that allows researchers to mark up their analysis scripts in such a way that the workflow can be extracted and presented graphically (e.g. for publication or documentation).
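To give a flavour of the YesWorkflow approach: annotations are written as ordinary comments using keywords such as `@begin`, `@in`, `@out` and `@end`, so the script runs exactly as before while the tool reads the comments to reconstruct the workflow. The script below is a made-up example rather than one from the presentation, and the function and variable names are hypothetical.

```python
# @begin clean_and_average
# @in readings
# @out mean_temperature

def clean_and_average(readings):
    # @begin drop_missing
    # @in readings
    # @out valid_readings
    valid_readings = [r for r in readings if r is not None]
    # @end drop_missing

    # @begin compute_mean
    # @in valid_readings
    # @out mean_temperature
    mean_temperature = sum(valid_readings) / len(valid_readings)
    # @end compute_mean
    return mean_temperature

# @end clean_and_average
```

Because the markup lives in comments, it does not depend on the language's own syntax.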
Daisy Abbot (Glasgow School of Art) presented some interesting conclusions from a survey of PhD students and supervisors:
- 90% saw digital curation as important, though 60% of PhD holders and 80% of students report little or no expertise
- Generally students are seen as having most responsibility for managing their data, but supervisors assign themselves more of the responsibility than the students do
- People are much more likely to use those close to them (friends, colleagues, supervisors) as sources of guidance, rather than publicly available information (e.g. DCC, MANTRA, etc.)
In a packed session on education:
- Liz Lyon (University of Pittsburgh) described a project to send MLIS students into science/engineering labs to learn from the researchers (and pass on some of their own expertise);
- Helen Tibbo (University of North Carolina) gave a potted history of digital curation education and training in the US; and
- Cheryl Thompson (University of Illinois Urbana-Champaign) discussed their project to give MLIS students internships in data science.
To close the conference proper, Helen Hockx-Yu (Head of Web Archiving, British Library) talked about the history of web archiving at the BL and their preparation for non-print legal deposit, which came into force on 6 April 2013 through the Legal Deposit Libraries (Non-Print Works) Regulations 2013. They now have two UK web archives:
- An open archive, which includes only those sites permitted by licenses
- The full legal deposit web archive, which includes everything identified as a “UK” website (including ‘.uk’ domain names and known British organisations), and is only accessible through the reading room of the British Library and a small number of other access points.
Software Carpentry is a community-developed course to improve the software engineering skills and practices of self-taught programmers in the research community, with the aim of improving the quality of research software and hence the reliability and reproducibility of the results. Data Carpentry is an extension of this idea to teaching skills of reproducible data analysis.
One of the main aims of a Data Carpentry course is to move researchers away from using ad hoc analysis in Excel and towards using programmable tools such as R and Python to create documented, reproducible workflows. Excel is a powerful tool, but the danger when using it is that all manipulations are performed in-place and the result is often saved over the original spreadsheet. This potentially destroys the raw data, and it provides no documentation of what was done to arrive at the processed version. Using a scripting language instead enables the analysis to be done without touching the original data file while producing a repeatable transcript of the workflow. In addition, using freely available open-source tools means that the analysis can be repeated without a need for potentially expensive licenses for commercial software.
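A minimal sketch of what this looks like in practice, using only the Python standard library: the raw file is opened for reading only, and the result is written to a separate file. The file names, the `value` column and the `summarise` function are assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path


def summarise(raw_path: Path, out_path: Path) -> float:
    """Read the raw file (never written to) and save the mean to a new file."""
    with raw_path.open(newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    mean = sum(values) / len(values)
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["n", "mean"])
        writer.writerow([len(values), mean])
    return mean


# Demonstration with a throwaway raw data file
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.csv"
raw.write_text("sample,value\na,1.5\nb,2.5\nc,5.0\n")
before = raw.read_text()
mean = summarise(raw, workdir / "summary.csv")
```

The script doubles as the record of what was done: rerunning it on the same raw file reproduces the same summary.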
The Data Carpentry workshop on Wednesday offered the opportunity to experience Data Carpentry from three different perspectives:
- Workshop attendee
- Potential host and instructor
- Training materials contributor
We started out with a very brief idea of what a Data Carpentry workshop attendee might experience. The course would usually be run over two days, and start with some advanced techniques for doing data analysis in Excel, but in the interest of time we went straight into using the R statistical programming language. We went through the process of setting up the R environment, before moving on to accessing a dataset (based on US census data) that enables the probability of a given name being male or female to be estimated.
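The idea behind that dataset can be sketched as follows. The workshop itself used R; this illustration is in Python, and the counts below are invented for the example rather than taken from the census data.

```python
# Hypothetical counts of records per name, split by recorded sex
counts = {
    "Alex": {"female": 200, "male": 800},
    "Mary": {"female": 990, "male": 10},
}


def probability_female(name, counts):
    """Estimate P(female | name) as the share of female records for that name."""
    record = counts[name]
    return record["female"] / (record["female"] + record["male"])
```

The probability of a name being male is simply one minus this value, since the dataset records only the two categories.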
The next section of the workshop involved a discussion of how the training was delivered, during which we came up with a list of potential improvements to the content. During the final part, we had an introduction to GitHub and the Git version control system (which are used by Software/Data Carpentry to manage community development of the learning materials), and then split up into teams to attempt to address some of our suggested improvements by editing and adding content.
I found this last part particularly helpful, as I (in common with several of the other participants) have often wanted to contribute to projects like this but have worried about whether my contribution would be useful. It was therefore very useful to have the opportunity to do so in a controlled environment with guidance from someone intimately involved with the project.
In summary, Data Carpentry and Software Carpentry both appear to be valuable resources, especially given that there is an existing network of volunteers available to deliver the training and the only cost then is the travel and subsistence expenses of the trainers. I would be very interested in working to introduce this here at Imperial.
Jisc Research Data Spring
Research Data Spring is a part of Jisc’s Research at Risk “co-design” programme, and will fund a series of innovative research data management projects led by groups based in research institutions. This funding programme is following a new pattern for Jisc, with three progressive phases. A set of projects will be selected to receive between £5,000 and £20,000 for phase 1, which will last 4 months. After this, a subset of the projects will be chosen to receive a further £5,000 – £40,000 in phase 2, which lasts 5 months. Finally, a subset of the phase 2 projects will receive an additional £5,000 – £60,000 for phase 3, lasting 6 months. You can look at a full list of ideas on the Research At Risk Ideascale site: these will be pitched to a “Dragons’ Den”-style panel at the workshop in Birmingham on 26/27 February.
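For a project that makes it through all three phases, the figures above add up as follows. This is a quick back-of-the-envelope calculation, assuming the amounts for each phase are simply additive.

```python
# (minimum GBP, maximum GBP, duration in months) for each phase, as stated above
phases = [
    (5_000, 20_000, 4),
    (5_000, 40_000, 5),
    (5_000, 60_000, 6),
]

min_total = sum(p[0] for p in phases)     # 15,000
max_total = sum(p[1] for p in phases)     # 120,000
total_months = sum(p[2] for p in phases)  # 15
```

So a project funded through to the end would receive between £15,000 and £120,000 over 15 months.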
The Research Data Spring workshop on Thursday 12 February was an opportunity to meet some of the idea owners and for them to give “elevator pitch” presentations to all present. There was then plenty of time for the idea owners and other interested people to mingle, discuss, give feedback and collaborate to further develop the ideas before the Birmingham workshop.
Ideas that seem particularly relevant to us at Imperial include:
- Open Source Database-as-a-Service with Data Publishing
  - A system to make it easier for researchers who currently use Microsoft Access or Excel to move their data to a robust relational database management system and share that data with collaborators.
- Computational experiments as data objects
  - Packaging up computational experiments (such as weather simulations) into easily verifiable bundles with easy access to the software code, the input parameters and the results.
- Research data as a Unique and Distinctive Collection (UDC)
  - Looking at how data can fit into research library collection policies for long-term preservation of and access to key institutional assets.
- Managing sensitive data
  - Looking at how the rules that apply to sensitive data (both general, e.g. Data Protection Act, and specific, e.g. consent forms) can be codified and applied consistently to facilitate inter-institutional collaborations.
- Linked data notebook
  - Development of a work-in-progress notebook that allows individuals and small research groups to capture and create sharable linked data.
- Research Data requirements vocabulary
  - Codifying the requirements for data management (how long must it be held for, how large will it get, etc.) so that managing it can be at least partly automated.
- Standards and Schemas for Digital Research Notebooks
  - Improving the interoperability of electronic lab notebooks/digital research notebooks.
Just in time before the College closes for the Christmas break I have found the time to write my overdue summary of recent developments in the world of open access and scholarly communication. Below are a few of the headlines and developments that caught my eye during the last couple of months.
Cost of Open Access
Commissioned by London Higher and SPARC Europe, Research Consulting have published Counting the Costs of Open Access. Using data provided by universities, including Imperial College, it concludes that there was a £9.2m cost to UK research organisations for achieving compliance with RCUK’s open access policy in 2013/14. The estimated costs for meeting REF open access requirements are particularly interesting, given that HEFCE do not provide any funding for their open access policy, which is in some ways even more ambitious. The main conclusions are quoted below:
- The time devoted to OA compliance is equivalent to 110 full-time staff members across the UK.
- The cost of meeting the deposit requirements for a post-2014 REF is estimated at £4-5m per annum.
- Gold OA takes 2 hours per article, at a cost of £81.
- Green OA takes just over 45 minutes, at a cost of £33.
Pinfield, Salter and Bath published: The ‘total cost of publication’ in a hybrid open-access environment. The study analyses data from 23 UK institutions, including Imperial College, covering the period 2007 to 2014. It finds that while the mean value of APCs has been relatively stable, ‘hybrid’ subscription/OA journals were consistently more expensive than fully-OA journals. Modelling shows that APCs now constitute 10% of the total cost of ownership for publishing (excluding administrative costs).
EBSCO’s 2015 Serials Price Projection Report assumes a price increase of 5-7%, not including a recommended additional 2-4% to allow for currency fluctuations.
John Ulmschneider, Librarian at the Virginia Commonwealth University, estimates that with current price increases the cost for subscription payments would “eat up the entire budget for this entire university in 20 years”. Partly in response to that, VCU has launched its own open access publishing platform.
UK Funder News
Arthritis Research UK, Breast Cancer Campaign, the British Heart Foundation (BHF), Cancer Research UK, Leukaemia & Lymphoma Research, and the Wellcome Trust have joined together to create the Charity Open Access Fund (COAF). COAF operates in essentially the same way as the WT fund it replaces.
An article summarising responses to the RCUK review of open access cites the Wellcome Trust saying that sanctions could accelerate the implementation of open access.
The Wellcome Trust published a list of journals that do not provide a compliant publishing option.
International Funder News
A new Danish open access strategy sets the goal of reaching open access for 80% of all publicly funded peer-reviewed articles in 2017, rising to 100% in 2022.
The Open Access policy of the Austrian FWF requires CC BY (if Gold OA) and deposit in a sustainable repository on publication. The FWF covers APCs up to a limit of €2500.
Research Information published a summary of international developments around open access: The Research Council of Norway is making funding available to cover up to 50% of OA publishing charges. The Chinese Academy of Sciences and the National Natural Science Foundation of China require deposit of papers in an OA repository within 12 months of publication. The Mexican president has signed an act to provide “Mexicans with free access to scientific and academic production, which has been partially or fully financed by public funds”.
Publishers and Open Access
In November, negotiations between Elsevier and the Dutch universities broke down following an Elsevier proposal that “totally fails to address this inevitable change [to open access]”. The universities have since reached an agreement with Springer; negotiations with Elsevier have resumed.
The launch of Science Advances, a journal of the American Association for the Advancement of Science (AAAS), prompted strong criticism of the AAAS approach to open access. Over a hundred scientists signed an open letter criticising AAAS for charging $1000 for the CC BY license as well as $1500 for papers longer than ten pages – on top of a $3000 base APC. This has been picked up by media including the New Statesman.
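To make the criticism concrete, here is how the charges quoted above stack up for a single paper. This is a sketch based only on the figures in the paragraph above; the function name is hypothetical.

```python
BASE_APC = 3000              # base article processing charge, USD
CC_BY_SURCHARGE = 1000       # extra charge for a CC BY licence
LONG_PAPER_SURCHARGE = 1500  # extra charge for papers longer than ten pages


def total_charge(cc_by: bool, pages: int) -> int:
    """Total charge in USD for one paper under the fee structure described above."""
    total = BASE_APC
    if cc_by:
        total += CC_BY_SURCHARGE
    if pages > 10:
        total += LONG_PAPER_SURCHARGE
    return total
```

A CC BY paper of more than ten pages would therefore cost $5,500, almost double the base charge.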
The Nature Publishing Group has had two major OA-related headlines. Generally well received was the announcement that NPG would switch the prestigious Nature Communications to full open access. On the other hand, the move to give limited read access to articles has been widely criticised as beggar access and a step back for open access: NPG allow those with a subscription to give others viewing (not printing) access to papers through proprietary software.
An open letter signed by nearly 60 open access advocates, publishers, library organisations and civil society bodies warns against model licenses governing copyright on open access articles proposed by the International Association of Scientific, Technical & Medical Publishers (STM). The letter says the STM licences “would limit the use, reuse and exploitation of research” and would “make it difficult, confusing or impossible to combine these research outputs with other public resources”. The STM licenses are seen as incompatible with Creative Commons licences.
Jisc and Wiley have negotiated a deal that provides credits for article processing charges (APCs) to universities that license Wiley journal content and have a Wiley OA account.