The data dive will be focussed on the following problems:
  • Models for Ovarian Cancer, Eric Abogaye and Haonan Lu
  • Performance of small area screening for Chlamydia, Nathan Green
  • Predicting mortality in heart disease patients using videos of cardiac motion, Declan O'Regan

      Models for Ovarian Cancer

      High Grade Serous Ovarian Cancer (HGSOC) is the most lethal gynaecological disease, with a 5-year survival of 35%-40%. A subset of HGSOC patients has an extremely poor survival rate, but no robust prognostic biomarker is currently available to identify these patients. It is thus crucial to develop novel prognostic biomarkers to stratify patients for personalised treatment. Previous biomarker development has focused mainly on molecular profiles; until recently, interest in using quantitative medical imaging data as biomarkers was limited. Medical images are non-invasive, cost-efficient and more representative of the entire disease than molecular profiles, which come from biopsies. Here we present the quantitative imaging data we recently collected for 283 HGSOC patients, with associated comprehensive clinical and molecular profiles. We aim to develop robust predictive and prognostic biomarkers for HGSOC patients using the imaging data.

      Data

      Radiomics refers to the quantitative measurement of mesoscopic-structural features in medical images. Our group previously developed radiomic analysis software, TexLab2.0, which can derive 657 radiomic features covering the shape, size, texture and wavelet decompositions of a CT scan. The procedure for generating a radiomic profile entails: 1) obtaining a CT scan from the cancer patient at the presentation of the disease; 2) segmentation of tumours in the scan by a radiologist; and 3) upload of the segmented CT scan into TexLab2.0, which outputs the radiomic data. We have collected radiomic profiles for 283 HGSOC patients using TexLab2.0. We have also curated clinical outcome data, proteomic (246 cases), transcriptomic (70 cases) and copy-number (80 cases) profiles within the same cohort.

      Questions

      In previous studies, we have shown that radiomic profiles contain important prognostic information and are closely correlated with biological pathways. However, we also encountered a few challenges while analysing these data.
      • The CT scans were generated on different CT scanners, which use different scan thicknesses. We have shown that the radiomic profile is moderately associated with scan thickness and would therefore like to develop a method to normalise these data.
      • HGSOC patients often present with bilateral disease, meaning that two tumours are found in one patient. Although these two tumours are closely linked, two different radiomic profiles are generated for each bilateral patient. In our previous study, we selected the tumour that was more representative of the patient. However, we wonder whether there is a better method to analyse patients with bilateral disease, e.g. integrating the two radiomic profiles.
      • Previously, we split the 283 cases into three datasets (one discovery and two validation datasets). We used LASSO regression to develop a prognostic model consisting of 4 radiomic features in the discovery dataset and validated the model in the two validation datasets. We believe that more features could be prognostically important, and would like to investigate whether a better prognostic model could be developed using these three datasets.
      • In addition to the data used for prognostic modelling, we have collected proteomics data (quantifying expression levels of 300 proteins and phosphoproteins), transcriptomic data, and DNA copy-number profiles for a subset of cases in the same cohort. We have shown that the prognostic model we developed is strongly linked to certain pathways (e.g. stroma and DNA damage repair). We therefore wonder whether it would be possible to develop radiomic-based models to predict important genetic changes or biological pathway activations.
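
      As an illustration of the scan-thickness question, one simple baseline is to regress each radiomic feature on scan thickness and keep the re-centred residuals. The sketch below is only an assumption about how such a normalisation might look, not the method used in the previous studies; the function name `residualise` and the example values are hypothetical, and a linear relationship between feature and thickness is assumed.

```python
from statistics import mean


def residualise(feature, thickness):
    """Remove the linear effect of scan thickness from one radiomic feature.

    Fits feature ~ a + b * thickness by ordinary least squares and returns
    the residuals shifted back to the feature's overall mean.
    """
    if len(feature) != len(thickness):
        raise ValueError("feature and thickness must have the same length")
    mean_t, mean_f = mean(thickness), mean(feature)
    sxx = sum((t - mean_t) ** 2 for t in thickness)
    if sxx == 0:
        # Every scan has the same thickness, so there is nothing to adjust.
        return list(feature)
    slope = sum(
        (t - mean_t) * (f - mean_f) for t, f in zip(thickness, feature)
    ) / sxx
    return [f - slope * (t - mean_t) for t, f in zip(thickness, feature)]


# Made-up values for four scans (thickness in mm, one radiomic feature).
thickness_mm = [1.25, 2.5, 5.0, 5.0]
feature = [0.9, 1.4, 2.6, 2.2]
adjusted = residualise(feature, thickness_mm)
```

      After adjustment the feature keeps its original mean but is no longer linearly correlated with thickness. Non-linear or scanner-specific effects would need a richer model.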

      Performance of small area screening for Chlamydia

      The Natsal dataset is a large sexual health survey with a complex design. Many previous analyses of these data have failed to account for their structure in time and space, or for the sampling mechanisms. Questions of interest here include:
      • Fine-scale spatial analysis of chlamydia prevalence
      • Comparing the effectiveness of screening programmes
      • Integration of national-level disease surveillance data

      Predicting mortality in heart disease patients using videos of cardiac motion

      Background

      Pulmonary Hypertension (PH) refers to a variety of conditions characterized by elevations in pulmonary arterial pressure. In some patients, PH follows a rapidly progressive clinical course ultimately culminating in heart failure and death. Due to its progressive nature, early risk stratification is important, as it allows identification of patients at risk of rapid disease progression. Risk assessment tools have been developed to predict survival among PH patients using diagnostic indicators of cardiac function such as exercise testing, serum biomarkers, etc. However, these conventional measures do not adequately capture the complex shape and contraction pattern of the diseased heart, and are relatively insensitive to the subtle changes in contractile function which are predictive of rapid progression of PH.

      Problem Description

      Cine MR (Magnetic Resonance) images are short “videos” that depict heart motion throughout the cardiac cycle. Cine images are typically obtained by repeatedly imaging the heart at a single cross-sectional location at various points throughout the cardiac cycle. This imaging technique provides a non-invasive means for high-resolution visualization of morphologic changes in the heart as it beats in real time. In recent years, various Deep Learning architectures have been developed for the analysis of videos, performing tasks such as video classification (action recognition) and semantic video segmentation. The objective of this project is to apply these deep learning approaches to the cardiac imaging domain. We hypothesize that Deep Video analysis of MR-derived Cine image sequences can fully leverage spatiotemporal information about complex changes in cardiac contractile function that predict survival/prognosis in PH patients.

      Dataset Description

      Outcome Variable

      The primary outcome measure will be right-censored time to all-cause mortality, i.e. the time from the date of diagnosis to date of death/censoring.
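
      As a minimal sketch of how such an outcome could be encoded, each patient can be reduced to a (time, event) pair, where the event flag separates observed deaths from censored follow-up. The function name `survival_record`, the choice of days as the time unit, and the example dates are assumptions made for illustration only.

```python
from datetime import date


def survival_record(diagnosis, death, last_follow_up):
    """Return (time_in_days, event) for one patient.

    event is 1 if death was observed and 0 if the patient is
    right-censored at the last follow-up date.
    """
    if death is not None:
        end, event = death, 1
    else:
        end, event = last_follow_up, 0
    time_in_days = (end - diagnosis).days
    if time_in_days < 0:
        raise ValueError("end date precedes diagnosis date")
    return time_in_days, event


# Made-up patients: (date of diagnosis, date of death or None, last follow-up).
patients = [
    (date(2015, 1, 10), date(2016, 1, 10), date(2016, 1, 10)),
    (date(2015, 6, 1), None, date(2018, 6, 1)),
]
records = [survival_record(*p) for p in patients]
```

      The first patient contributes an observed death time; the second contributes only the information that they survived at least until their last follow-up.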

      Predictor/Input Variables

      The input data are in the form of “cardiac motion videos”, i.e. Cine image sequences pre-processed as follows: starting from a Cine cardiac MRI scan, segmentation (delineation of well-defined heart regions) will be performed on the Cine MR images (see Figure below). The output of this process will be a set of segmented Cine images that characterize cardiac motion and regional contractile function of the heart at a given cross-sectional slice location. These time-resolved image sequences (one set per patient) will be used as input data.

      Sample size

      We anticipate full data on ~700 PH patients, with no missing values in the predictor/input and outcome variables.