Research Software Directories

This is a summary of a SORSE discussion session, presented by:

  • Mark Woodbridge, Imperial College London
  • Vanessa Sochat, Stanford University
  • Jurriaan Spaaks, Netherlands eScience Center

And featuring contributions from:

  • Malin Sandström, INCF
  • Alexander Struck, Humboldt University of Berlin

Introduction

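The discussion session “Research Software Directories: What, Why, and How?” was held on September 16 during SORSE, an International Series of Online Research Software Events. As presenters, we each shared efforts to develop and maintain software directories: catalogues that showcase the software outputs of an institution or community. The directories presented were the Research Software Directory of the Netherlands eScience Center (research-software.nl), the Research Software Directory of Imperial College London, the Research Software Encyclopedia, and GitHub Search.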

Each of these directories has its own advantages and disadvantages, or is scoped for particular use cases. For example, research-software.nl provides a robust application for serving detailed metrics and metadata for software, but it requires more manual entry. The Research Software Encyclopedia is automated and does not require hosting, but it lacks the same level of metadata. The Imperial College London and GitHub Search research software directories are much quicker to deploy, but might be too simple for some use cases. The directories are discussed in detail in the following sections. In addition to this set, we suggest the reader take a look at the Awesome Registries list to find additional examples.

How many participants use software directories?

We were quite surprised by the results of asking attendees about the extent to which they have contributed to or used software directories. Of a total of 27 participants, 43% had used a directory for a relevant project, 27% had submitted software to a directory, and 58% indicated neither of the above.

Presentations

The Research Software Directory by Netherlands eScience Center

Jurriaan’s presentation started off by explaining why the Netherlands eScience Center had a need for what eventually became the Research Software Directory. First, as the Netherlands eScience Center grew beyond 20 or so engineers, tracking what software was available in-house became too difficult to do ad hoc, despite the fact that all of their repositories are publicly accessible on GitHub. Secondly, the eScience Center strives to be as open as possible, and they thought it was important to be able to show the outside world where the taxpayer’s money had gone. Lastly, the eScience Center has a continuous need to keep track of various metrics, both for reporting to their funders (SURF and NWO) and for helping management make informed business decisions.

Jurriaan then demonstrated the eScience Center’s instance of the Research Software Directory. While walking the viewers through the design, he explained how the design of the product pages helps site visitors on their way towards adopting the software presented there.

When designing the Research Software Directory, specific attention was paid to how an instance is filled with data, how this data is curated, and how to do this in a way that can be sustained over time. To this end, the Research Software Directory harvests much of its information automatically, for example using APIs to GitHub (code development platform), Zenodo (archiving service), and Zotero (reference manager). This setup allows engineers employed by the Netherlands eScience Center to stay mostly in their comfort zone (i.e. GitHub). They just need to make sure to follow best practices such as having publicly accessible repositories, making releases on Zenodo using the automated integration, and including software citation metadata (CFF) in their repositories. Given that they already do much of that anyway, making an entry in the Research Software Directory can be achieved in a few clicks via the Admin interface that the Research Software Directory provides.
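As a rough illustration of the metadata-driven approach described above, the following Python sketch turns the content of an already parsed CITATION.cff file into a short directory entry. The key names read from the input (title, authors, given-names, family-names, name, version, doi, repository-code) are standard CFF fields. The function names, the layout of the output and the example values are assumptions made for illustration; this is not the Research Software Directory's actual implementation.

```python
from typing import Any


def author_name(author: dict[str, Any]) -> str:
    """Return a display name for a CFF author (person or entity)."""
    if "name" in author:  # CFF entities carry a single 'name' key
        return author["name"]
    given = author.get("given-names", "")
    family = author.get("family-names", "")
    return f"{given} {family}".strip()


def entry_from_cff(cff: dict[str, Any]) -> dict[str, Any]:
    """Build a minimal directory entry from parsed CITATION.cff content.

    Hypothetical helper: the output layout is an assumption, not the
    schema used by any of the directories discussed here.
    """
    return {
        "name": cff["title"],  # 'title' is a required CFF key
        "authors": [author_name(a) for a in cff.get("authors", [])],
        "version": cff.get("version"),
        "doi": cff.get("doi"),
        "repository": cff.get("repository-code"),
    }


# Made-up example values, for illustration only.
example_cff = {
    "cff-version": "1.2.0",
    "title": "example-tool",
    "authors": [{"given-names": "Ada", "family-names": "Example"}],
    "version": "0.1.0",
    "repository-code": "https://github.com/example-org/example-tool",
}

entry = entry_from_cff(example_cff)
print(entry["name"], entry["authors"])
```

Optional fields that are absent from the file, such as the DOI in this example, simply come back as None, so a curator can see at a glance what is still missing.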

The Research Software Directory has proven to be a great resource for managing the organization, for providing funders with relevant metrics, and for increasing the visibility of tools. There are downsides as well. First, it has proven difficult to carve out enough time to curate the prose on the product pages, so some text snippets are too difficult to read for visitors who are not yet familiar with the software presented. A second problem is maintenance of the Research Software Directory software itself: the software stack includes more than 40 techniques, methods, and tools, in various languages and using a variety of frameworks, and it has proven difficult to find developers who are familiar enough with all of these to maintain the site effectively. While this has not led to any significant downtime in the 3 years research-software.nl has been running, the eScience Center intends to start reducing the software stack in the very near future. Furthermore, they are investigating whether it is feasible to provide Research Software Directories as a service.

The Research Software Directory by Imperial College London

Mark Woodbridge demonstrated Imperial College’s Research Software Directory, explaining how it was developed to present a manually curated list of GitHub and GitLab repositories – motivated by a desire to rapidly catalogue and demonstrate the breadth of software developed at Imperial. It is also intended to encourage collaboration by assisting researchers to identify existing expertise and projects at Imperial.

The chosen approach has resulted in a system which is easy to maintain – both in operational complexity and in adding entries to the directory (even if the latter does depend on some familiarity with git and GitHub i.e. making a commit and pull request). This simplicity comes at a price: it depends on Algolia (a freemium service), has limited features, and is not easy to customise. It also relies on manual curation and repository metadata: due to limited bandwidth and lack of incentives, developers rarely submit or annotate software themselves. Finally, it lacks the polish and level of detail that you might expect of a public-facing showcase.

The system has, however, achieved its aims in effectively showcasing research software and developers at Imperial, and has provided a set of metadata that enables the identification of trends ranging from preferred languages to fast-growing fields of research. A suite of standalone utility scripts ensures that the contact details and project web pages remain up-to-date, and that new repositories by known developers are added to the directory in a timely manner.

The Research Software Encyclopedia

The Research Software Encyclopedia (RSEPedia) is a community-driven, open source directory that provides a means to communicate about software. It consists of three components: a set of criteria and taxonomy items used to describe or otherwise communicate about software categorization preferences, a database, and a command line client to interact with those components. The criteria and taxonomy items are maintained in their own GitHub repository, https://github.com/rseng/rseng, and render to an interface to allow for exploration and visualization. Importantly, the site for these items hosts a weekly software showcase, allowing the community to learn more about open source libraries that might be useful for their work. The terms are also served programmatically via a RESTful application programming interface (API) that makes them readily available for the RSEPedia software, which is also provided on GitHub (https://github.com/rseng/rse).

Using the software, an individual or institution can easily generate a database and interface for a set of software they care about, and can inspect, add, search, or otherwise interact with metadata. While relational databases can be created, the community-maintained database is a flat file database hosted on GitHub (https://rseng.github.io/software) that allows an interested contributor to browse software and annotate it with criteria and taxonomy items in an online interface. Annotation only takes a few clicks, and the process to make changes and update the database is fully automated via GitHub Actions. Annotation in bulk is also easy to do locally by cloning the software repository, starting the annotation interface, and opening a pull request with changes.

Importantly, although annotation can help to share ideas about software, it is not required to make the RSEPedia useful. Thanks to the criteria questions and the software showcase, the RSEPedia can meet your needs if you just need a way to describe what you are looking for (e.g., for a grant or journal) or just want to make your set of software easily searchable.

GitHub Search is derived from the Research Software Directory by Imperial College London, but it removes the Algolia dependency and obtains software repositories from the GitHub API's list of repositories for an organization, serving the result directly on GitHub Pages. This means that deployment is easy: it comes down to creating the repository with a GitHub Actions workflow that rebuilds the pages at some frequency.
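The idea of deriving a directory from an organization's repository list can be sketched in a few lines of Python. This is a minimal sketch, not the code used by GitHub Search: it assumes the public GitHub REST endpoint for listing organization repositories and unauthenticated access (which is rate limited), and the function names, the choice to skip forks and archived repositories, and the entry layout are all assumptions made for illustration.

```python
import json
import urllib.request
from typing import Any

API_ROOT = "https://api.github.com"


def fetch_org_repos(org: str, per_page: int = 100) -> list[dict[str, Any]]:
    """Fetch the public repositories of an organization, page by page."""
    repos: list[dict[str, Any]] = []
    page = 1
    while True:
        url = f"{API_ROOT}/orgs/{org}/repos?per_page={per_page}&page={page}"
        request = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:  # an empty page means everything has been read
            return repos
        repos.extend(batch)
        page += 1


def to_entries(repos: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Reduce API records to the fields a simple directory page needs.

    Skipping forks and archived repositories is an illustrative choice.
    """
    entries = [
        {
            "name": repo["name"],
            "url": repo.get("html_url", ""),
            "description": repo.get("description") or "",
            "language": repo.get("language") or "",
        }
        for repo in repos
        if not repo.get("fork") and not repo.get("archived")
    ]
    return sorted(entries, key=lambda entry: entry["name"].lower())


# Usage (performs network requests, so it is left as a comment):
# entries = to_entries(fetch_org_repos("some-organization"))
```

A scheduled workflow could run a script like this and write the entries to a JSON file that a static page then searches in the browser, which is one way to avoid an external search service.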

Discussions

After the presentations, attendees were divided into three groups for a 20-minute discussion session. All three groups had lively discussions covering a plethora of relevant subjects, a selection of which is included below.

How do software directories interact with high performance computing (HPC)?

Because several attendees work as HPC administrators, the question of the relationship between software directories and HPC centers quickly came up. Indeed, these centers typically maintain a large catalog of software for a user base, and it could be beneficial to link this software catalog, or the strategy used to maintain it, with a software directory. For example, if you are familiar with Spack or EasyBuild, you could imagine an integration that uses a software directory to look up metadata or to generate user-friendly documentation pages. The pages might have install instructions, examples, and optimization hints for different architectures.

Guix-HPC builds on the GNU Guix package manager, which handles a wide variety of software, to allow reproducible HPC environments. It may interact with existing instances of Research Software Directories.

Curation policies

The main concern related to the “curation” of software directories was the criteria for inclusion. A lively discussion addressed the definition of “research software”, particularly in relation to scale and licensing. In the broadest sense there was agreement in principle that it could refer to any tool or library used to produce scientific results.

In terms of scale, attendees working in life sciences research emphasized that research software in their context could be a standalone script, and software directories should therefore “scale down” appropriately. Scripts of this type may be less substantial, but their quality could well be assessed in the same way as that of more prototypical projects, in terms of documentation, design for re-use and version control.

Licensing was a more challenging topic. An argument was made for directories enabling users to find any tool that might accelerate research, including commercial software, as long as an appropriate licence was available.

In broader terms, there was consensus that curators should avoid making assumptions about software applicability and relevance, even if they do have domain knowledge. More important than strict policies is effective annotation and filters so that users can apply their own criteria when searching for relevant software.

Searching for software

Searching for software presents its own challenges, as an RSD only presents local results and many other platforms would need to be consulted for an exhaustive overview of relevant packages. Here, lists of registries prove to be helpful, for example Awesome Research Software Registries.

The purpose and minimum features of Research Software Directories

Participants identified discoverability as a major issue in relation to research software, particularly for domain specialists (i.e. end-users). This led to the following features being considered of primary importance:

  • Metadata clearly explaining the purpose and value of individual software tools in non-technical terms. The community is currently working on metadata standards like CFF or CodeMeta.
  • Contact details for the authors of the software in case further advice or support is required
  • Installation and getting started instructions
  • Guidance on how to cite the software
  • Licensing terms. This was discussed not only in relation to terms of use but also, for non-free software, ensuring cost-efficiencies by avoiding unilateral purchasing decisions and promoting the use or procurement of shared/group licences.
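The five features listed above can be read as a minimal record that every directory entry should fill in. The Python sketch below expresses that reading as a small data class with a completeness check. The class name, the field names and the rule that an empty string counts as missing are assumptions made for illustration; they are not taken from CFF, CodeMeta or any of the directories discussed.

```python
from dataclasses import dataclass, fields


@dataclass
class DirectoryEntry:
    """Hypothetical minimal record covering the features listed above."""

    name: str
    purpose: str = ""          # plain-language purpose and value
    contact: str = ""          # how to reach the authors
    getting_started: str = ""  # installation and first steps
    citation: str = ""         # guidance on how to cite the software
    licence: str = ""          # licensing terms


def missing_features(entry: DirectoryEntry) -> list[str]:
    """Return the names of fields that are empty or only whitespace."""
    return [
        field.name
        for field in fields(entry)
        if not getattr(entry, field.name).strip()
    ]


# Made-up example entry, for illustration only.
draft = DirectoryEntry(
    name="example-tool",
    purpose="Aligns measurement series recorded at different rates.",
    licence="Apache-2.0",
)
print(missing_features(draft))
```

A check like this could run when an entry is submitted, so that curators are prompted for the missing items rather than having to spot the gaps by eye.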

Many other features may benefit researchers, for example linking from an RSD entry to its accompanying paper and data, as suggested in the “Generalist Repository Comparison Chart”, or listing received software citations, as implemented in swMATH.

Organization-based registry vs community-based registry

Some registries out there are scoped to serve an organization, whereas other registries like ascl.net or bio.tools aim to serve an entire research community. An advantage of the latter is increased traffic to the registry, and real benefits for users to browse the registry to see if somebody else in the community already created a solution. However, because the social structure across the community is quite loose, it will be more difficult to keep people involved, to discover new tools that could be added to the registry, and to make sure that the language used on the registry’s pages is understandable by everyone in the community. Furthermore, governance of the instance will be more difficult. For example, within the community there may exist different opinions on what metadata should be kept, and weighing these opinions will be more difficult in a larger community than a small one.

In contrast, organizational registries are easier to run and govern: discovering tools that could be added is (or used to be) a matter of hanging out at the coffee machine and asking your colleague what they are working on right now. Helping your colleague enter their data, and making sure they do it correctly, is easier as well, and some good old-fashioned peer pressure can be applied if needed. Funding policies currently do not mandate the publication of research software in the way that Horizon 2020 required the publication of research data (where possible).

Recommendations and Next Steps

By discussing topics of curation, federation, technology and sustainability of research software directories with a wider audience, this discussion session hoped not only to promote the benefits of such directories and encourage their deployment, but also to identify issues and gather ideas to address them. From the discussion above, it’s clear that there are interesting projects and updates to existing directories that might be pursued.

RSLondonSouthEast 2020

RSLondonSouthEast 2020, the annual gathering for Research Software Engineers based in or around London, took place on the 6th February at the Royal Society. The College was strongly represented, with RSEs based at Imperial contributing full talks, lightning talks and posters.

Jeremy Cohen introduces RSLondonSouthEast 2020 at the Royal Society

Jeremy Cohen (Department of Computing) was the chair of the organising committee. Stefano Galvan (Department of Mechanical Engineering), Alex Hill (Department of Infectious Disease Epidemiology) and Jazz Mack Smith (Department of Metabolism, Digestion and Reproduction) served on the programme committee.

Many thanks to all the committee members and everyone who presented, submitted proposals or attended on the day, and to EPSRC and the Society of Research Software Engineering for their support. For more information from the event check Jeremy’s full report, RESIDE’s blog post or #RSLondonSE2020 on Twitter.

1st Research Software Winter Seminars and Roundtable

On Thursday 12th of December the Research Computing Service joined the College’s Research Software Community in celebrating the 1st Research Software Winter Seminars and Roundtable, the final event of another great year of building research software at Imperial. The event had two goals: first, to celebrate the research software-related achievements of the RS Community during 2019, and second, to plan the activities and goals for the year that is about to start.

The seminar session featured nine exciting talks, ranging from a review of the activities of the Community during 2019 and the training opportunities in computing and data science skills, to technical talks on the use of complex analysis pipelines for RNA sequencing and the extension of open source software with custom features.

After the talks, there was a roundtable discussion chaired by Diego Alonso, with a panel including Elsa Angelini, Jeremy Cohen, Phoebe Pearce and Mark Woodbridge, to help answer some questions about what the audience would like to see from the Community next year, how we can communicate with each other better and who can get involved to make those things happen. There were many excellent contributions from the audience, who were very engaged and eager to see the community grow and to take an active role in it.

Among the activities that were discussed, and that gained volunteers to help make them a reality, were the creation of a Slack workspace as an instantaneous, bidirectional communication channel within the community (already up and running; sign up now!) and the recruitment of RSE Champions in the different communities (PhD students, postdocs, etc.) to promote Community events and bring more people aboard, or to assist with the organisation of departmental events.

The event concluded with informal drinks and nibbles in the ICT Kitchen – including mulled wine! – where the enthusiastic attendees and speakers mingled together and shared experiences and plans for the future.

There are plenty of things going on and 2020 is due to see a very bright RS Community at Imperial!

NL-RSE19

On 20 November 2019 Mark Woodbridge and Jeremy Cohen represented Imperial College at NL-RSE19, the first annual conference of the Netherlands Research Software Engineer community.

NL-RSE19 poster session

Their presentation, “Strength in Numbers: Growing RSE Capacity at Imperial College London” (DOI: 10.5281/zenodo.3548308), described the expanding groups involved in RSE at Imperial, their respective activities, and how examples of these are fostering collaboration and awareness across the College. They also took the opportunity to display a poster first shown at UKRSE19 that highlights key aspects of these initiatives. The talk and poster generated much interest and resulted in productive discussions with members of the NL-RSE community in relation to building inclusive communities, long-term support for research software, personal development opportunities for RSEs, and how best to support the broad range of research typically carried out in larger institutions.

Many thanks to the organisers (in particular Niels Drost and Ben van Werkhoven of the Netherlands eScience Center) for the opportunity to engage with the vibrant and rapidly growing RSE community in the Netherlands.

Hacktoberfest 2019

On Thursday 10th October a Research Software Engineering (RSE) themed Hacktoberfest event was hosted by Imperial College’s Research Computing Service, Research Software community and ICT. Signups from Imperial spanned all four faculties and the event also produced external interest with registrations from UCL and the V&A.

The evening opened with a short introduction to Hacktoberfest from Jeremy Cohen of the Imperial Research Software community. Chris Cave-Ayland from the RCS followed with a crash course on the steps to follow when making a contribution to an open source project on GitHub. The last opening talk was given by Vasily Sartakov from the Large-Scale Data and Systems Group (LSDS). Titled “Open Source Opportunities”, Vasily’s talk showed how open source has become the dominant paradigm for software development.

The opening talks were followed by lightning talks from the research projects participating in the Hackathon. In total, six software projects attended from five different groups.

Pizza-fuelled hacking then commenced! Participants either chose one of the presented projects to work with, or an alternative project that they were interested in contributing to. Thanks to having direct access to project developers, attendees were able to get up to speed quickly and start working on pull requests for submission. Participating in Hacktoberfest was found to be very valuable for the research projects involved, with a total of eight new contributions made so far.

Many thanks to all the speakers and participants who took part, and to Imperial College ICT for supporting this event. We hope to see many of you claiming your Hacktoberfest t-shirt by the end of October!

RSEConUK 2019

September 2019 saw the 4th Conference of Research Software Engineering (RSEConUK 2019) take place in Birmingham, UK. From the 17th to the 19th of September, over 350 RSEs, software engineers, researchers and people with a wide range of related roles came to the University of Birmingham to participate in the largest Research Software Engineering conference yet.

RSE19 conference photograph courtesy @RSEConUK

While the majority of the attendees were from the UK and Europe, the conference attracted people from around the world.

The conference has been growing each year and this time there was a packed schedule including two keynotes, a series of parallel sessions with talks and panels, a day of workshops and some additional special sessions such as RSE Worldwide.

Imperial was well represented, with 11 members of the College attending the conference at various times during the week and getting involved by volunteering, giving talks, joining panels, running workshops and presenting posters.

It was fantastic to see so much participation from Imperial and representatives from many different departments across the College. This provides a great example of how Research Software Engineering at Imperial is such a vital element of the College’s research output and we look forward to seeing an even greater presence from Imperial at next year’s conference.

Research Software London Software Carpentry

On the 9th and 10th July 2019 the Research Software London community ran its first regional Software Carpentry workshop. The event was jointly organised by Imperial, UCL and Queen Mary, with Queen Mary hosting the workshop at their Mile End Campus. Several Imperial Software Carpentry volunteers and members of the Imperial research software community were involved in organising and running the event, along with organisers, instructors and helpers from UCL and Queen Mary. The workshop covered a standard Software Carpentry syllabus: attendees were taught the basics of the Unix shell and git on the first day, followed by an introduction to Python on the second day.

The majority of attendees were from Queen Mary, UCL and Imperial but spaces were also made available to the wider RSLondon community. This provided a great opportunity for newcomers to the research software field from institutions that don’t currently run carpentry workshops to attend and learn some core computing and software development skills. More than 30 people registered for the workshop and we received significant positive feedback as well as helpful suggestions on possible enhancements for future workshops.

Software Carpentry lesson
Image courtesy of David Pérez-Suárez

Building on the success of this event, RSLondon are planning to run further such workshops and are looking at other areas covered by The Carpentries for future sessions, in addition to Software Carpentry. If you have contacts at other institutions in London and the South East region who you think would be interested in hosting or attending an RSLondon Carpentry workshop later in 2019, get in touch with Jeremy Cohen.

Imperial College RSE Team members Chris Cave-Ayland (instructor) and Mayeul d’Avezac (helper) assisted at this workshop.

deRSE19

The first German national RSE conference took place in Potsdam on 4th-6th June 2019 with 187 attendees. deRSE19 was a really vibrant, welcoming and well-organised event in a great location and had a diverse agenda, encouraging participants from across Europe to share experiences of software engineering in research.

deRSE19 aerial group photo (CC-BY Antonia Cozacu, Jan Philipp Dietrich, de-RSE e.V.)

In terms of presentations Imperial College was the best-represented institution from outside Germany, with the following speakers:

  • Jeremy Cohen (EPSRC RSE Fellow, Department of Computing) who presented a talk on building research software communities and a poster about RSLondon.
  • Alex Hill (Senior Web Application Developer, Department of Infectious Disease Epidemiology) who spoke about the challenges of conducting constructive code reviews, particularly in a research setting.
  • Mark Woodbridge (RSE Team Lead, Research Computing Service) who gave a talk on RSE 2.0, reflecting on progress in Research Software Engineering and how it may develop in the near future.

Many thanks to all the event organisers and sponsors for giving us the opportunity to present.

Also during the conference, a keynote on RSE collaboration was delivered by Alys Brett, chair of the newly established Society of Research Software Engineering and head of the Software Engineering Group at the UKAEA. UK RSEs from the Software Sustainability Institute, the University of Westminster, and the University of Southampton also attended deRSE19. We look forward to reuniting with them, as well as with colleagues from Germany and beyond, at UKRSE19 in September!

Research Software in Physics event

The first Imperial College Research Software in Physics event took place on Friday 17th of May. This event, organised by the Imperial Research Software Community and supported by the College’s ICT department, aimed to help researchers meet others writing or using research software (RS) in Physics and learn about the resources available to help them do so. It gathered around 25 people from all seniority levels and several departments, who spent over two hours sharing their experiences and opinions on different aspects of the development and use of software for research.

Diego Alonso Álvarez

The event was opened by Diego Alonso Álvarez, a member of the Research Software Engineering team in the Research Computing Service and ex-member of the Physics Department, and Jeremy Cohen, coordinator of the Imperial Research Software Community. Between them they gave an overview of the value of research software (RS), the services available at Imperial to promote software sustainability and good coding practices, and the broader landscape of the RS community in the London area and the UK.

Pat Scott, a lecturer from the Astrophysics group, gave the first of the invited talks, focused on GAMBIT, “a global fitting code for generic Beyond the Standard Model theories” but with potential utility in any other research discipline. Pat highlighted that coding is not an add-on in physics any more but an integral part of it. He also pointed out that while it is important to have good coding practices, increase your user base and publish papers on your code, in the end, in the broader community, you will be judged by your contributions to physics, not software.

The second talk was given by Kelvin Choi, a PhD student from the Space & Atmospheric Physics Group, who spoke broadly about the challenges involved in working with climate data and models. Among other topics, Kelvin discussed the need to wrap legacy code in more modern languages in order to maintain the traceability of the results and their comparability with those obtained in the 1980s. He also described the need for a pipeline that transforms the terabytes of raw data coming from the satellites into the end results, gluing together different software, often written in different languages, and combining different data formats.

These speakers were followed by 7 lightning talks given by researchers in the department and the Imperial RSE team, including:

  • specific applications of GAMBIT (Janina Renk) and its combination with other tools like TensorFlow to efficiently explore a large parameter space (Benjamin Farmer);
  • the description of custom advanced software for the modelling of the formation of planet-forming discs (Marija Jankovic) or the performance of novel solar cells (Philip Calado);
  • the challenges of dealing with legacy code and data related to the Cassini mission (Gregory Hunt);
  • some of the activities of the RSE team, improving the accessibility of the data from the Cassini mission (Christopher Cave-Ayland) or using the xarray Python package to improve the quality and readability of existing code (Mayeul d’Avezac).

All the talks were very engaging and in several cases sparked discussion points that were adopted for the final part of the event. In the discussion session, the audience was divided into groups around different topics, ranging from code peer review and code review practices in the software engineering industry, through testing practices and reproducibility, to software development models and methodologies within the research community. Dedicated blog posts on the contents of these discussions will follow in due course.

Many thanks to Pat Scott, all the speakers, and everyone who attended the event.

RSLondonSouthEast 2019

Research Software London’s first annual workshop took place at the Royal Society on February 8, 2019, bringing together a regional community of research software users and developers from over 20 institutions. It featured a diverse schedule of talks and discussions about software engineering, community building and both domain-specific and general-purpose tools of relevance to research.

There were four talks from Imperial researchers, including the keynote from Professor Spencer Sherwin, Director of Research Computing. The College’s Research Computing Service was also represented by Dr Diego Alonso Álvarez, who presented an introduction to xarray and described the RSE team’s work on integrating it into the MUSE energy systems model.

Please see Diego’s slides for more information. Other talks and media from the event are available via #rslondonse19.

Thanks to RSLondon, the programme committee and its chair Dr Jeremy Cohen for organising an informative and stimulating day, and to the EPSRC for supporting the event. We’re looking forward to participating in future meetings and helping further strengthen the regional RSE community.