Invited speakers

Abstracts

  • Dr Nick Heard, Modelling dependencies in internet traffic data

    NetFlow data are aggregated summaries (meta-data) of the traffic passing around a computer network from one internet protocol (IP) address to another, and lie at the heart of much of the statistical research in cyber-security. Understanding the NetFlow data generated by an IP address requires modelling several layers of dependency: some NetFlow events are automated traffic, some are generated by humans, and some are a mixture of the two, with human events triggering automated ones; and the events typically arise in bursts within a higher-level diurnal seasonal pattern. However, a typical enterprise computer network comprises tens or hundreds of thousands of IP addresses, each with its own specific behaviours and vulnerabilities. Set against the need for complex models capturing the complex behaviour of each IP address is therefore the overarching need for scalable analytics which can perform statistical cyber-security monitoring in real time. This talk discusses some approaches which seek to strike a balance between these conflicting requirements by capturing some of the highest-level dependencies in NetFlow data.
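
    The abstract deliberately leaves the models unspecified, but the structure it describes — bursts of events riding on a diurnal baseline — is easy to illustrate. The sketch below is one plausible formalisation, assumed for illustration rather than taken from the talk: a Hawkes-style self-exciting process modulated by a 24-hour sinusoidal baseline rate, simulated with Ogata's thinning algorithm; all parameter values are invented.

    ```python
    # A minimal sketch, NOT the models from the talk: events arriving in
    # bursts within a diurnal pattern, via a Hawkes-style self-exciting
    # process whose baseline rate follows a 24-hour cycle.
    import math
    import random

    def diurnal_rate(t, base=2.0, amplitude=1.5):
        """Baseline events per hour, with a 24-hour sinusoidal cycle."""
        return base + amplitude * math.sin(2 * math.pi * t / 24.0)

    def intensity(t, history, alpha=0.5, beta=2.0):
        """Diurnal baseline plus exponentially decaying excitation from past events."""
        burst = sum(alpha * beta * math.exp(-beta * (t - s)) for s in history)
        return diurnal_rate(t) + burst

    def simulate(t_end=72.0, seed=1):
        """Ogata's thinning algorithm: propose from an upper bound, accept by ratio.

        intensity(t) + 5.0 bounds the intensity until the next event, because
        the burst term only decays between events and the baseline never
        exceeds base + amplitude = 3.5.
        """
        random.seed(seed)
        events, t = [], 0.0
        while t < t_end:
            lam_bar = intensity(t, events) + 5.0
            t += random.expovariate(lam_bar)          # candidate arrival
            if t < t_end and random.random() < intensity(t, events) / lam_bar:
                events.append(t)                      # accepted event
        return events

    ev = simulate()
    print(f"{len(ev)} events over 72 hours; first five at t = "
          f"{[round(s, 2) for s in ev[:5]]}")
    ```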

  • Dr Patrick Rubin-Delanchy, Modelling a network of point processes: an approach based on exchangeability

    Data from cybersecurity applications often exhibit a network structure where links between nodes have an associated (marked) point process (e.g. the traffic between two computers). Considerations of exchangeability (the node labels are arbitrary) lead to a Bayesian non-parametric model for the network. This is in turn approximated by an "enhanced" stochastic block model, where a block contains a model for a point process, rather than a single Bernoulli parameter. In effect, modelling a network of point processes is translated into a Bayesian clustering problem. We introduce some point process models for computer traffic, and demonstrate a scalable algorithm to address the network clustering problem. Performance is illustrated on NetFlow data collected from Los Alamos National Laboratory.
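
    The central construction admits a compact schematic statement. The likelihood below is an assumption consistent with the abstract, not necessarily the talk's exact formulation; the notation (labels z_i, block parameters phi, point-process likelihood f) is introduced here for illustration.

    ```latex
    % Schematic "enhanced" stochastic block model (illustrative only).
    % Classical SBM: node i has latent label z_i in {1,...,K} and edge (i,j)
    % is Bernoulli(theta_{z_i z_j}). Here the Bernoulli parameter is replaced
    % by a point-process model for the traffic N_{ij} on link (i,j):
    \[
      z_i \sim \mathrm{Categorical}(\pi), \qquad
      N_{ij} \mid z, \phi \;\sim\; f_{\phi_{z_i z_j}},
    \]
    \[
      p\bigl(\{N_{ij}\} \mid z, \phi\bigr)
        \;=\; \prod_{i < j} f_{\phi_{z_i z_j}}(N_{ij}),
    \]
    % where f_phi is a point-process likelihood (e.g. homogeneous Poisson
    % with rate phi as the simplest special case). Inference over the labels
    % z is then a Bayesian clustering problem, as the abstract describes.
    ```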

  • Dr Joshua Neil, Data Science for Enterprise Network Security Operations

    Security analytics has been gaining attention in the cyber-security industry: there are significant gaps in current security technology and processes that statistical methods can help to close. In this talk, a general view of industry needs and current approaches to analytics will be reviewed. Following that, a discussion of attack behaviours, their effects on various enterprise data sets, and modelling approaches for scoring data for security relevance will be presented. We will then give a brief overview of a storage and computing architecture to support these approaches. Finally, the role of data science in daily security operations will be discussed.
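
    As a deliberately simplified illustration of scoring data for security relevance — an assumption for exposition, not the talk's method — one minimal approach is to fit a per-entity Poisson rate to historical event counts and score new observations by their upper-tail p-value, so that larger surprise means higher security relevance.

    ```python
    # Minimal anomaly-scoring sketch (illustrative assumption, not the
    # talk's method): model an entity's historical event count as Poisson,
    # score a new count by -log of its upper-tail p-value.
    import math

    def poisson_upper_tail(k, lam):
        """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k-1)."""
        p, term = 0.0, math.exp(-lam)        # term = P(X = 0)
        for i in range(k):
            p += term
            term *= lam / (i + 1)            # P(X = i+1) from P(X = i)
        return max(1.0 - p, 1e-300)          # floor to avoid log(0)

    def surprise_score(history, new_count):
        """Fit a Poisson rate to history; return -log p-value of new_count."""
        lam = max(sum(history) / len(history), 1e-6)
        return -math.log(poisson_upper_tail(new_count, lam))

    baseline = [3, 5, 4, 2, 6, 4, 3]   # e.g. daily flow counts, one host pair
    for observed in (5, 12, 40):
        print(observed, round(surprise_score(baseline, observed), 2))
    ```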

  • Dave Palmer, Self-learning immune systems in theory and in practice

    Darktrace builds Enterprise Immune Systems: platforms that apply layers of competing machine-learning approaches and probability theory to features extracted from raw network packets, in order to learn the normal behaviour of employees and devices within the enterprise and then detect anomalies worthy of human investigation and, potentially, response. This talk will introduce the technology and mathematics, discussing key challenges such as the veracity of identity and scaling globally whilst maintaining soft real-time performance. A live demonstration will be used to focus discussion on making anomaly detection usable for non-mathematicians, and we will wrap up with lessons learned and findings from over 500 deployments in organisations as diverse as banks, factories, power stations, and F1 teams.

  • Matt Rapier, Why do you want so much data, and all of it in the middle?

    Minimising the cyber threat in a network involves processing huge amounts of data: events, network flows, packet captures, intelligence, malware, best practice. The list is endless. Some of those data types have very large volumes, often larger than the information set being protected. Current cyber architectures seek to deliver all of that data to the analyst in their darkened lair, for mysterious processes to be enacted upon it, yielding an understanding of network penetrations. Growing data volumes are already blowing past the architectural limits of conventional databases, as shown by the growing adoption of big-data technologies among cyber product vendors. We explore and contrast this architecture with thinking in service management, where data reduction, simplification, and statistical analysis monitor SLAs and ensure faults are identified, and with military command and control (C2), where bandwidth limitations have driven the creation of highly efficient information-sharing techniques. This talk explores whether the classic cyber architectures will continue to be effective, contrasting them with other fields which have already moved beyond the "everything, everywhere" model.