- David Van Dyk, Imperial College London
- Sofia Olhede, University College London
- David Hand, Imperial College London and Winton Capital
- Mark Briers, QinetiQ
- Tomasz Duczmal, Experian
- Giovanni Montana, King's College London
- Mark Roulston, Winton Capital
- Driss Ouazar, ENIM Rabat
- Yike Guo, Imperial College London
- Christoforos Anagnostopoulos, Imperial College London
- Peter Richtarik, University of Edinburgh
- Michael A. Osborne, University of Oxford
Abstracts

Sofia Olhede, "Limiting Behavior in Large Networks"
Networks are ubiquitous in today's world. Any time we make observations about people, places, or things and the interactions between them, we have a network. Networks are constantly coming online at larger and larger scales, as part of the "Big Data" and "Open Data" revolutions. At the same time, a quantitative understanding of real-world networks is in its infancy, and requires strong theoretical and methodological foundations to be laid. The goal of this talk is to provide some insight into these foundations for the non-specialist, and in particular how we can mathematically model limiting behavior in large networks. Questions to be discussed include the following: How do we simplify and understand a rigid, discrete mathematical construction (a graph) in terms of an infinite-dimensional limiting object (a mathematical function)? What types of limiting behaviors should we expect to see in real-world networks? What types of behaviors can we model mathematically, leading to defensible data-analytic conclusions and sense-making, as well as algorithms that come with strong guarantees of correctness?
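The limiting object alluded to above is often formalised as a graphon: a symmetric function W on [0,1]² that acts as the edge-probability law of a dense graph sequence. A minimal sketch of sampling a graph from a graphon (the specific kernel W below is invented for illustration, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative graphon: vertices with nearby latent positions in [0, 1]
# connect with higher probability. W(u, v) is an edge probability.
def W(u, v):
    return 0.8 * np.exp(-3 * abs(u - v))

n = 500
u = rng.uniform(0, 1, size=n)              # latent position per vertex
P = W(u[:, None], u[None, :])              # edge-probability matrix
A = (rng.uniform(size=(n, n)) < P).astype(int)
A = np.triu(A, 1)                          # keep one Bernoulli draw per pair
A = A + A.T                                # symmetrise; no self-loops

density = A.sum() / (n * (n - 1))
print(f"edge density: {density:.3f}")      # concentrates near the integral of W
```

As n grows, the empirical edge density concentrates around the double integral of W, illustrating how a single function summarises an entire sequence of discrete graphs.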
Tomasz Duczmal, "It's a great time to be a geek!"
For Analytics professionals, data is our bread and butter. With the increasing digitisation of our lives, data is becoming more and more abundant and is now officially Big. There are some big (no pun intended) changes to the data landscape coming our way which will have profound implications for the type of problems we tackle and the tools we use to tackle them. In this presentation the speaker will discuss some of the challenges this presents to a data and analytics business, set this in the context of an industry built on data-driven decision making, and try to persuade you that there has never been a better time to be in Analytics.
David Van Dyk, "Big Data and Complex Modeling Challenges in Astronomy and Solar Physics"
In recent years, technological advances have dramatically increased the quality and quantity of data available to astronomers. Newly launched or soon-to-be-launched space-based telescopes are tailored to the data-collection challenges associated with specific scientific goals. These instruments provide massive new surveys resulting in catalogs containing terabytes of data, high-resolution spectroscopy and imaging across the electromagnetic spectrum, and incredibly detailed movies of dynamic and explosive processes in the solar atmosphere. This spectrum of new instruments is helping scientists make impressive strides in our understanding of the physical universe, but is at the same time generating massive data-analytic and data-mining challenges for the scientists who study the resulting data. This talk will give an overview of a number of statistical challenges that arise from big data and complex models in astronomy and solar physics.
David Hand, "Big data: past, present, and future"
We are inundated with messages about the promise offered by big data. Economic miracles, scientific breakthroughs, technological leaps appear to be merely a matter of taking advantage of a resource which is increasingly widely available. But is everything as straightforward as these promises seem to imply? I look at the history of big data, distinguish between different kinds of big data, and explore whether we really are at the start of a revolution. No new technology is achieved without effort and without overcoming obstacles, and I describe some such obstacles that lie in the path of realising the promise of big data.
Mark Roulston, "Big Data in Finance: Forecasting a changing world"
Every day thousands of financial instruments are traded around the globe, generating huge amounts of data. Ideally, the prices of these financial instruments should reflect some aspect of the real economy - whether it is the value of a company or the relative scarcity of a barrel of oil. Forecasts of prices should incorporate data describing the real world. This "exogenous" data might be the sales of a corporation or the amount of oil in storage, or one of thousands of other datasets, adding to the mountain of potentially relevant information. Forecasting in finance is complicated by the fact that signal-to-noise ratios tend to be very small, because obvious opportunities for profit are "arbitraged" away by market participants. This differentiates finance from the natural sciences, where an effect can be known to everyone without being weakened. The combination of a large universe of potential predictors and low signal-to-noise ratios means that forecasters of financial markets are particularly susceptible to over-fitting and to selection and publication biases. I will describe these problems and some of the ways they can be mitigated.
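The selection-bias point can be made concrete with a small, entirely synthetic simulation: score many candidate "signals" that are in fact pure noise against random "returns", pick the best-looking one, and watch it collapse on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: 1,000 candidate signals, all pure noise,
# scored against random returns in-sample and then out-of-sample.
n_days, n_signals = 250, 1000
returns = rng.standard_normal(n_days)
signals = rng.standard_normal((n_days, n_signals))

# Sample covariance of each signal with returns
# (approximately a correlation, since both are standardised).
in_sample = signals.T @ returns / n_days

# Select the "best" signal in-sample, then test it on fresh data.
best = int(np.argmax(np.abs(in_sample)))
new_returns = rng.standard_normal(n_days)
new_signals = rng.standard_normal((n_days, n_signals))
out_of_sample = new_signals[:, best] @ new_returns / n_days

print(f"best in-sample |score|:  {abs(in_sample[best]):.3f}")
print(f"same signal out-of-sample: {out_of_sample:.3f}")
```

Even though every signal is noise, the maximum over 1,000 in-sample scores looks impressive; the out-of-sample score of the selected signal reverts to the noise level.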
Christoforos Anagnostopoulos, "Streaming data analysis"
Data collection technology is undergoing a revolution that is enabling streaming acquisition of real-time information in a wide variety of settings. Streaming learning algorithms must rely on summary statistics and computationally efficient online inference, without the need to store and revisit the data history. Moreover, learning must be temporally adaptive in order to remain up-to-date against unforeseen changes in the underlying data generation mechanism. In this tutorial, we describe commonly used streaming data algorithms in classification and regression, offer a methodological discussion of common threads underlying such algorithms, and present illustrative examples.
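One of the simplest temporally adaptive summary statistics is a running mean with a forgetting factor, which geometrically down-weights old observations so the estimate tracks change-points without storing the data history. A minimal sketch (the class name and the change-point scenario are invented for illustration):

```python
import random

class ForgettingMean:
    """Running mean with exponential forgetting: a hypothetical,
    minimal example of a temporally adaptive summary statistic."""
    def __init__(self, lam=0.99):
        self.lam = lam      # forgetting factor in (0, 1]
        self.s = 0.0        # geometrically weighted sum of observations
        self.w = 0.0        # total weight (effective sample size)

    def update(self, x):
        # Down-weight the past before absorbing the new point.
        self.s = self.lam * self.s + x
        self.w = self.lam * self.w + 1.0
        return self.s / self.w

random.seed(1)
est = ForgettingMean(lam=0.95)
# A change-point at t=200: the mean jumps from 0 to 5.
stream = [random.gauss(0, 1) for _ in range(200)] + \
         [random.gauss(5, 1) for _ in range(200)]
for x in stream:
    m = est.update(x)
print(f"final estimate: {m:.2f}")   # tracks the new mean, near 5
```

With lam = 0.95 the effective window is about 1/(1 - lam) = 20 points, so the estimator forgets the pre-change regime quickly; a plain running mean over all 400 points would instead sit near 2.5.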
Peter Richtarik, "Algorithms for Big Data Convex Optimization"
In this tutorial I will describe several recent gradient-type optimization algorithms able to efficiently handle big data problems. Broadly speaking, I will talk about modern scalable variants of stochastic gradient descent, stochastic block coordinate descent, and subgradient descent. All these methods are able to efficiently solve problems with a billion or more variables.
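As a toy illustration of the scalability argument (not one of the speaker's specific algorithms), here is mini-batch stochastic gradient descent on a least-squares problem: each step touches only a small sample of rows, so per-step cost is independent of the number of data points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: minimise (1/2n) * ||A x - b||^2 over x.
# Sizes are small here; the point is that each SGD step uses only a
# mini-batch of rows, so its cost does not grow with n.
n, d = 10_000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

x = np.zeros(d)
step, batch = 0.1, 64
for t in range(2000):
    idx = rng.integers(0, n, size=batch)            # sample a mini-batch
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch # unbiased gradient estimate
    x -= step / (1 + t / 500) * grad                # decaying step size

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.3f}")
```

The decaying step size is one standard way to damp the gradient noise as the iterates approach the optimum; the tutorial's methods refine this basic template.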
Michael Osborne, "Big model configuration with Bayesian optimisation and Bayesian quadrature"
In the age of big data, algorithms and software are approaching a degree of complexity that will make manual configuration nigh impossible. As an example, state-of-the-art deep neural networks require the tuning of multiple layers of parameters, leading to a multi-modal parameter space in which performance evaluations are highly expensive. This tutorial will introduce a rich class of approaches to the automatic optimisation and averaging of such algorithms: respectively, Bayesian optimisation and Bayesian quadrature. These approaches are built on a foundation of flexible Gaussian process models that are able to robustly represent the performance or likelihood of algorithms even without strong prior knowledge. Likewise, this tutorial will assume no strong prior knowledge of these methods, and will aim to give a gentle coverage of the field.
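A minimal sketch of the Bayesian optimisation loop described above, assuming a one-dimensional toy objective and a squared-exponential Gaussian process kernel (all specifics here are illustrative, not from the tutorial): fit a GP to the evaluations so far, then evaluate next wherever expected improvement is largest.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2))))  # normal CDF

def f(x):
    # Stand-in for an expensive performance evaluation.
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, ell=0.3):
    # Squared-exponential covariance between point sets a and b.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

grid = np.linspace(0.0, 2.0, 201)
X = list(rng.uniform(0.0, 2.0, size=2))         # two random initial evaluations
y = [f(x) for x in X]

for _ in range(8):
    Xa, ya = np.array(X), np.array(y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))    # jitter for numerical stability
    Ks = rbf(grid, Xa)
    mu = Ks @ np.linalg.solve(K, ya)            # GP posterior mean on the grid
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.clip(var, 1e-12, None))     # GP posterior std. deviation
    z = (mu - ya.max()) / sd
    ei = (mu - ya.max()) * Phi(z) + sd * np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    x_next = float(grid[np.argmax(ei)])         # most promising point next
    X.append(x_next)
    y.append(f(x_next))

print(f"best value found: {max(y):.3f}")
```

Expected improvement trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is why only ten evaluations of the "expensive" objective are needed here.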
Mark Briers, "A practitioner's guide to Big Data"
Successful use of Big Data technology often requires integration of several processing paradigms (e.g. Map-Reduce and Streaming) to solve the business problem under consideration. During this presentation, I plan to explore an instantiation of the Lambda architecture in the context of an industrial Big Data problem. I will highlight the benefits and limitations of the underlying technologies, the skillsets needed to be successful in their application, and areas of future research.
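The core of the Lambda architecture mentioned above can be sketched in a few lines (names and data are illustrative, not a real framework's API): a serving layer answers queries by merging a periodically recomputed batch view with a speed layer covering only the events that arrived since the last batch run.

```python
from collections import Counter

# Batch layer: a slow, complete recomputation over the immutable event log.
master_dataset = ["a", "b", "a", "c"]
batch_view = Counter(master_dataset)

# Speed layer: fast incremental updates for events since the last batch run.
speed_layer = Counter()
for event in ["a", "c", "c"]:
    speed_layer[event] += 1

def query(key):
    # Serving layer: merge batch (complete but stale) with speed (fresh).
    return batch_view[key] + speed_layer[key]

print(query("a"), query("c"))   # 3 3
```

When the batch layer next recomputes, the three recent events are absorbed into the batch view and the speed layer is reset, bounding the complexity of the incremental code.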
Driss Ouazar, "Data and/or Big Data, Knowledge, and Modelling of Water Systems - Reality, Expectations,(Un)Certainty?"
Water resources involve complex processes and therefore multiple disciplines, including socio-economic and cultural aspects. Data is the key point, from weather predictions, real-time sensor data, and topography to information about the history of hydraulic infrastructure assets. Such knowledge, used in conjunction with water system simulation models, will save costs while preventing flooding, improve the containment of water during dry periods, prevent damage to agriculture, and help provide required water quantities, at some specified quality, to projects. Water resources problems are characterised by a huge volume of collected, observed, and monitored data, often of a historical nature; stochastic aspects; heterogeneous multi-source data; multi-scale, multi-model settings; heterogeneous data resulting from various sophisticated numerical and simulation models; and a spatial dimension exploited in conjunction with remote sensing and geographic information systems. Relevant data are very often difficult to find. This makes water systems complex to deal with, whether water is treated as a good or as a risk. The operators' objectives are to deliver enough water, with a reasonable quality satisfying given standards, or to prevent and/or avoid a catastrophic event related to floods and/or droughts, through hydraulic infrastructures designed for such purposes or through strategic planning. We will present an overview of such complex systems with emphasis on water issues, data aspects, modelling issues, uncertainty concepts (Data/Big Data, Models, Knowledge), and integration aspects.
Giovanni Montana, "Big Data in Bioinformatics: Mapping the Genome to Learn about the Brain"
In bioinformatics we are dealing with a growing range of biomedical data, including microarrays, ultra-high-throughput sequencing data, and bioimaging data. In this talk I will first provide an overview of these resources and discuss some of the potential benefits for personalised medicine. One of the emerging challenges is the development of statistical methods for the integrative analysis of multiple, heterogeneous data types. I will discuss an application in neuroimaging genomics and present some statistical models that have been proposed to disentangle the genetic architecture of the human brain.