Research and quilting have a similar Zen in that both combine and build upon multiple prior works. But the workflow is difficult to reproduce in research software: how can we combine group X’s state-of-the-art ODE solver with group Z’s state-of-the-art parallel linear algebra to create Y’s new biology model when they all use different libraries and conventions? This is the problem that Julia tackles head on, thanks to it’s innovative type system and multiple dispatch. In “Shared Memory Parallelization of Banded Block-Banded Matrices” we describe how to combine the parallelization capabilities from one package (SharedArrays) with the specialized matrix of another (BlockBandedMatrices.jl) – without modifying the internals of either.
At Imperial we’re fortunate to have a powerful and well-maintained high-performance computing (HPC) system. We use this as a batch processing back-end for user-facing web applications that we have developed (such as Smart Forming) and for benchmarking projects including MUSE. The web applications themselves are typically hosted on CentOS VMware virtual machines hosted in our data centre and maintained by a dedicated team within ICT. These servers are set up to authenticate against our institutional sign-on system, are pre-configured with monitoring and alerting, and can directly access other on-premise systems (such as the HPC cluster and our Research Data Store).
Despite this local infrastructure we still derive a lot of value from access to our institutional Azure subscription, in both ad hoc and longer-term use of cloud resources. This gives us capabilities that would be difficult or costly to replicate on-premise. These include:
- The ability to rapidly provision and tear-down systems and services
- Access to higher-level (lower-maintenance) abstractions i.e. PaaS and FaaS
- Access to a diverse range of operating systems and configurations, from VMs for multiple versions of Windows to macOS build agents
In particular we rely on the following services:
- DevOps Pipelines: Cross-platform QA (primarily testing and linting) and packaging (including PyInstaller builds on macOS and Windows). Build failures are pushed to relevant Teams channels.
- Functions: Our Trending app provides us with information about active repositories in our institutional GitHub organisation. Using Functions makes its deployment zero-maintenance.
- App Service: Our GtR app provides us with alerts for new UKRI grants to Imperial College. It is deployed to App Service to avoid the setup and maintenance required of a standalone VM.
- Cosmos DB: Both GtR and Trending use the MongoDB API provided by Cosmos.
- Virtual Machines: We use Azure when we need VMs for long-running services that are required to accept incoming requests from other systems but don’t need access to on-premise resources, or when we need short-lived VMs for testing purposes:
- We use a Container Linux VM to run Fathom, which aggregates usage stats for some of our externally accessible apps (POWBAL, our Research Software Community Newsletters, and our Research Software Directory)
- We use Windows 7 and Windows 10 VMs to manually test relevant projects in the environments matching those of the intended end-users
- We use CentOS VMs to test in-development versions of applications, including those developed in collaboration with the Cardiovascular Genetics & Genomics Group at Imperial (CardioClassifier and VariantFX)
- Container Registry: We use continuous deployment for all our web apps (including MAGDA and POWBAL), meaning that pushing to the master branch in GitHub is sufficient to run our QA pipeline, build a Docker image which is pushed to the Azure registry, and for Watchtower to pull the image onto the target server and restart the relevant service(s).
- Single Sign-On: This allows users of our internal apps to authenticate using their existing Office 365 accounts – avoiding the need for further login details.
- Notebooks: We have our own Jupyter server attached to our cluster and data store, but Azure Notebooks are very useful for sharing externally, and for teaching large classes.
In short, Azure provides us with services that work alongside our existing systems, enabling us to deliver RSE projects more effectively and with much lower operational overheads than if we tried to replicate the same features on-premise. And by becoming familiar with these services we’re better equipped to advise and assist researchers across Imperial College who wish to take advantage of all the compute resources at their disposal – on-premise and in the cloud.
This is the third and final post in a series describing activities funded by our RSE Cloud Computing Award. We are exploring the use of selected Microsoft Azure services to accelerate the delivery of RSE projects via a cloud-first approach.
In our previous two posts we described two ways of deploying web applications to Azure: firstly using a Virtual Machine in place of an on-premise server, and then using the App Service to run a Docker container. The former provides a means of provisioning an arbitrary machine much more rapidly that would traditionally be possible, and the latter gives us a seamless route from development to production – greatly reducing the burden of long-term maintenance and monitoring.
By taking these steps we’ve reduced our unit of deployment from a VM to a container and simplified the provisioning process accordingly. However, building a container, even when automated, incurs an overhead in time and space and the resultant artifact is still one-step removed from our code. Can we do any better – perhaps by simply bundling our code and submitting to a suitable capable runtime – without needing to understand a technology such as Docker?
Azure Functions provide a “serverless” compute service that can run code on-demand (i.e. in response to a trigger) without having to explicitly provision infrastructure. There are similarities with the App Service in terms of ease of management, but also some differences: principally that in return for some loss of flexibility in runtime environment you get an even simpler deployment mechanism and potentially much lower usage charges. Your code can be executed in response to a range of events, including webhooks, database triggers, spreadsheet updates or file uploads.In this post we’ll demonstrate how to run deploy a simple scheduled task: a Node.js script that sends a periodic email identifying the most active repositories within a GitHub organisation. It uses the GitHub GraphQL API to get the the latest statistics (stars, forks and commits) and tracks the changes in a database. I use this script to receive weekly updates for trending repositories under ImperialCollegeLondon, but it’s easy to reconfigure for your own organisation.
As previously, we’ll use the Azure Cloud Shell, and arguments that you’ll want to set yourself are highlighted in bold.
As usual we first create a resource group, and then add a storage account for our function:
az group create --name myResourceGroup --location westeurope
az storage account create --resource-group myResourceGroup --name ictrendingstore --sku Standard_LRS
Creating our function app
Then we create our app (a container for one or more functions):
az functionapp create --resource-group myResourceGroup --name ictrending --storage-account ictrendingstore --consumption-plan-location westeurope
And upgrade Node.js so that we can use ES6 features including async functions:
az functionapp config appsettings set --resource-group myResourceGroup --name ictrending --settings FUNCTIONS_EXTENSION_VERSION=beta WEBSITE_NODE_DEFAULT_VERSION=8.9.4
Deploying our code
Before we upload our code we configure the runtime with some required configuration (repository name, GitHub token, MongoDB URL and email settings):
az functionapp config appsettings set --resource-group myResourceGroup --name ictrending --settings GITHUB_ACCESS_TOKEN=xxx ORGANISATION=ImperialCollegeLondon MONGO_URL=mongodb://username:email@example.com/db SMTP_URL=smtp://username:firstname.lastname@example.org EMAIL_FROMemail@example.com EMAIL_TOfirstname.lastname@example.org
We then simply upload a zipped copy of our code, its dependencies, and a trigger configuration (a timer for 8am on Mondays):
curl -LO https://github.com/ImperialCollegeLondon/trending/releases/download/v1.0.0/trending.zip
az functionapp deployment source config-zip ---resource-group myResourceGroup --name ictrending --src trending.zip
You’ll subsequently receive your weekly email on Monday morning, assuming there has been some activity in your chosen organisation!
Inspecting the code reveals that it needs to comply with a (very lightweight) calling convention by exporting a default function and invoking a callback on the provided context, and it needs to be written in one of several supported languages. We uploaded our source as an archive but you can also deploy (and then update) code directly from source control.
As usual you can delete your entire resource group, including your storage account and function by running:
az group delete --name myResourceGroup
In this post we’ve shown how zipping and uploading your source code can be sufficient to get an app into production. This is all without knowledge of any particular operating system or virtualisation technology, and at very low cost thanks to consumption-based charging and on-demand activation. Whether you choose to deliver your software as a VM, container or source archive will obviously depend on the nature of the application and its usage patterns, but this flexibility provides potentially great productivity gains – not only in deployment but also long-term maintenance. In this instance it’s a great fit for short-lived scheduled tasks but there any a huge number of alternative applications.
This is the second in a series of posts describing activities funded by our RSE Cloud Computing Award. We are exploring the use of selected Microsoft Azure services to accelerate the delivery of RSE projects via a cloud-first approach.
In our previous post we described the deployment of a fairly typical web application to the cloud, using an Azure Virtual Machine in place of an on-premise server. Such VMs offer familiarity and a great deal of flexibility, but require initial provisioning followed by ongoing maintenance and monitoring. Our team at Imperial College is increasingly using containers to package applications and their dependencies, using Docker images as our unit of deployment. Can we do better than provisioning servers on a case-by-case basis to get web applications into production, and thereby more rapidly deliver services to our users?
The Azure App Service provides a solution named Web App for Containers, which essentially allows you to deploy a container directly without provisioning a VM. It handles updates to the underlying OS, load balancing and scaling. In this post we’ll demonstrate how to run pre-built and custom Docker images on Azure, without having to manually configure any OS or container runtime. As previously, we’ll use the Azure Cloud Shell, and arguments that you’ll want to set yourself are highlighted in bold.
First of all we create an App Service plan. This only needs to be performed once for your active subscription:
az group create --name myResourceGroup --location "West Europe"
az appservice plan create --name myAppServicePlan --resource-group myResourceGroup --sku S1 --is-linux
Deploying a pre-built, public container image
az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name ic-nginx --deployment-container-image-name nginx
We can then visit our public site at https://ic-nginx.azurewebsites.net/
You can use a custom DNS name by following these further instructions. Note that the site automatically has HTTPS enabled.
Decommissioning the webapp (thereby avoiding any further charges) is similarly straightforward:
az webapp delete --resource-group myResourceGroup --name ic-nginx
Deploying a custom container image
Running your own app is as simple as providing a valid container identifier to az webapp create. This can point to either a public or private image on Docker Hub or any other container registry, including Azure’s native registry.
For demonstration purposes we’ll build a Datasette image to publish the UK responses from the 2017 RSE Survey. Datasette is a great tool for automatically converting an SQLite database to a public website, providing not only a means to browse and query the data (including query bookmarking) but also an API for programmatic access to the underyling data. It has a sister tool, csvs-to-sqlite, that takes CSV files and produces a suitable SQLite file.
First we need to install both tools, download the survey data, and convert it from CSV to SQLite:
pip install https://github.com/simonw/csvs-to-sqlite/zipball/master datasette
curl -O https://raw.githubusercontent.com/softwaresaved/international-survey/master/analysis/2017/uk/data/cleaned_data.csv
csvs-to-sqlite --table responses cleaned_data.csv uk-rse-survey-2017.db
Then we can create a Docker image containing the data and the Datasette app with one command, annotating with the appropriate licence information:
datasette package uk-rse-survey-2017.db --tag mwoodbri/uk-rse-survey:2017 --title "UK RSE Survey (2017)" --license "Attribution 2.5 UK: Scotland (CC BY 2.5 SCOTLAND)" --license_url "https://creativecommons.org/licenses/by/2.5/scotland/deed.en_GB" --source "The University of Edinburgh on behalf of the Software Sustainability Institute" --source_url "https://github.com/softwaresaved/international-survey"
Then we push the image to Docker Hub:
docker push mwoodbri/uk-rse-survey:2017
And, as previously, create an Azure Web App:
az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name rse-survey --deployment-container-image-name mwoodbri/uk-rse-survey:2017
After a brief delay the app is publicly available: https://rse-survey.azurewebsites.net/
Note that the App Service automatically detects the right port to expose (8001 in this case) and maps it to port 80.
Datasette enables you to run and bookmark SQL queries, for example this query which lists the contributors’ organisations in order of the number of responses received:
If you’re hosting your images on a publicly accessible that requires authentication then you can use the previous az webapp create command into two steps: one to create the app and then to assign the relevant image. In this case we’ll use the Azure Container Registry but this approach is compatible with any Docker Hub compatible registry.
First we’ll provision a container registry. These steps are unnecessary if you already have one:
az acr create --name myrepo --resource-group myResourceGroup --sku Basic --admin-enabled true
az acr credential show --name myrepo
Then we can login to our private registry and push our appropriately tagged image:
docker login myrepo.azurecr.io --username username docker push myrepo.azurecr.io/uk-rse-survey:2017
Finally we can create our webapp and configure it to be created using the image from our private registry:
az webapp create --resource-group myResourceGroup --plan myAppServicePlan --name rse-survey
az webapp config container set --resource-group myResourceGroup --name rse-survey --docker-custom-image-name myrepo.azurecr.io/rse-survey --docker-registry-server-url https://myrepo.azurecr.io --docker-registry-server-user username --docker-registry-server-password password
The end result should be exactly the same as when using the same image but from the public registry.
As usual, you can delete your entire resource group, including your App Service plan, registry (if created) and webapps by running:
az group delete --name myResourceGroup
In this post we’ve demonstrated how a Docker image can be run on Azure using one command, and how to build an deploy a simple app that presents a simple interface to explore data provided in CSV format. We’ve also shown how to use images from private registries.
This approach is ideal for deploying self-contained apps, but doesn’t present an immediate solution for orchestrating more complex, multi-container applications. We’ll revisit this in a subsequent post.
This is the first in a series of posts describing activities funded by our RSE Cloud Computing Award. We are exploring the use of selected Microsoft Azure services to accelerate the delivery of RSE projects via a cloud-first approach.
A great way to explore an unfamiliar cloud platform is to deploy a familiar tool and compare the process with that used for an on-premise installation. In this case we’ll set up an open source continuous delivery system (Drone) to carry out automated testing of a simple Python project hosted on GitHub. Drone is not as capable or flexible as alternatives like Jenkins (which we’ll consider in a subsequent post) but it’s a lot simpler and a suitable example of a self-contained webapp for our purposes of getting started with Azure.
We’ll be automatically testing this repository, containing a trivial Python 3 project with a single test which can be run via python -m unittest. We add a single YAML file to the repository to configure Drone accordingly.
There are then just three (short!) steps to get Drone testing the repository whenever code is pushed to GitHub. You don’t need anything except a web browser and an Azure account:
1: Create an Azure VM where we’ll install Drone
You can do this via the Azure Portal but we’ll use the new Azure Cloud Shell as it’s quicker – and easier to document, which is important for reproducibility. Drone is distributed as a Docker image so we’ll provision a minimal Container Linux VM to host it. We need to create a resource group, add the VM, give it a public DNS name (you will need to choose your own, instead of my-ci-server) and enable HTTP(S) access:
az group create -l westeurope --name my-rg
az vm create --name my-ci-server --resource-group my-rg --image CoreOS:CoreOS:Stable:1632.2.1 --generate-ssh-keys --size Basic_A0
az network public-ip update --name my-ci-serverPublicIP --resource-group my-rg --dns-name my-ci-server
az network nsg rule create --resource-group my-rg --nsg-name my-ci-serverNSG --name HTTP --destination-port-ranges 80 --priority 1010
az network nsg rule create --resource-group my-rg --nsg-name my-ci-serverNSG --name HTTPS --destination-port-ranges 443 --priority 1020
2: Register a new OAuth application in GitHub
In order to provide Drone with access to the repository (or repositories) we want to test, visit this page and enter the following, replacing the hostname appropriately:
- Application name: Drone
- Homepage URL: https://my-ci-server.westeurope.cloudapp.azure.com
- Authorization callback URL: https://my-ci-server.westeurope.cloudapp.azure.com/authorize
Save the Client ID and Client Secret for the next step
3: Install and configure Drone
Run the following commands back in the Cloud Shell. You again need to replace the hostname, and also provide your GitHub username and the Client ID and Secret from the previous step.
sudo docker run -d --name drone-server -e DRONE_HOST=https://my-ci-server.westeurope.cloudapp.azure.com -e DRONE_ADMIN=mwoodbri -e DRONE_GITHUB=true -e DRONE_GITHUB_CLIENT=xxxxxxxxxxxxxxxxxxxx -e DRONE_GITHUB_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -e DRONE_LETS_ENCRYPT=true -v drone:/var/lib/drone/ -p 80:80 -p 443:443 --restart=unless-stopped drone/drone
sudo docker run -d --name drone-agent --link drone-server -e DRONE_SERVER=drone-server:9000 -v /var/run/docker.sock:/var/run/docker.sock --restart=unless-stopped drone/agent
Then visit https://my-ci-server.westeurope.cloudapp.azure.com and toggle the switch next to the name of the relevant repository.
Drone is now monitoring the code for changes, and will run the test suite in response. If we deliberately break our unit test by making this change and pushing the code then Drone will immediately run the code and identify a problem:
We can then go onto configure Drone to notify us via email, Slack etc of failures using one of its many plugins.
We’ve seen how various features of the Azure platform, including Virtual Machines, Cloud Shell, and the extensive Marketplace can be combined with GitHub and Drone to rapidly deploy a secure, private CI system entirely from your browser. There exist alternative means of achieving the same result – not least various hosted, subscription based systems – and there are Azure recipes for Jenkins and Drone itself. However, the approach demonstrated here is applicable to any container-based software and therefore provides a flexible and efficient means of at least prototyping new services – via a cloud-first strategy.