Guest blog post written by Dr. Altuna Akalin, Head of Bioinformatics and the Omics Data Science Platform at the Berlin Institute of Medical Systems Biology (BIMSB), Max Delbruck Center
Repositive recently partnered with our research group, the Bioinformatics & Omics Data Science Platform, to help us gather data and run validation tests for our machine learning model, which functions as a ‘search engine for tumours’. The model can support the selection of cell lines for early-stage drug discovery pipelines, biomarker discovery, and target identification, as well as patient stratification.
At the BIMSB, our mission is to integrate different levels of gene regulation into comprehensive and predictive models that elucidate their function in health and disease, which is, by its nature, a truly interdisciplinary endeavour. Within BIMSB, our research group has a dual mission: developing new machine learning and data analysis tools, and using them to investigate questions in biology and medicine. We have published extensively in this area and are moving towards research areas with more diagnostic potential.
We have recently developed a method that helps us classify tumours based on their multi-omics profile. Our method is based on deep learning and integrates gene expression, single point mutations, and copy number variations from tumour biopsies. To generate this classification system, we used latent factor analysis, an unsupervised learning technique in which an algorithm is given large amounts of unlabelled data and left to summarize the patterns that span different data types, so that tumours can be grouped accordingly. We can use this technology to predict clinical variables from tumour biopsies, such as patient survival and drug response.
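To make the idea of a latent factor more concrete, here is a minimal sketch in Python. It is not our deep learning model: it swaps in a much simpler linear stand-in (the first principal component, found by power iteration) and runs on made-up toy values for six tumours. The function names and all numbers are assumptions chosen purely for illustration.

```python
import math
import random


def zscore_columns(block):
    """Scale every feature (column) of one omics block to mean 0, variance 1."""
    n = len(block)
    scaled_cols = []
    for col in zip(*block):
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        scaled_cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def first_latent_factor(matrix, n_iter=200, seed=0):
    """Leading principal component scores of a centred matrix (power iteration).

    Returns one score per sample (row). The sign of the scores is arbitrary.
    """
    rng = random.Random(seed)
    n_features = len(matrix[0])
    w = [rng.gauss(0, 1) for _ in range(n_features)]
    for _ in range(n_iter):
        scores = [sum(x * wi for x, wi in zip(row, w)) for row in matrix]
        w = [sum(s * row[j] for s, row in zip(scores, matrix))
             for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return [sum(x * wi for x, wi in zip(row, w)) for row in matrix]


# Toy data (invented): 6 tumours, three data types measured on each.
expression = [[5.1, 0.2, 3.0], [4.8, 0.1, 3.2], [5.3, 0.3, 2.9],
              [1.0, 2.9, 3.1], [1.2, 3.1, 3.0], [0.9, 3.0, 2.8]]
mutations = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
copy_number = [[2.0, 3.9], [2.1, 4.1], [1.9, 4.0],
               [2.0, 2.0], [2.2, 1.9], [1.8, 2.1]]

# Integrate the data types: scale each block, then join features per tumour.
blocks = [zscore_columns(b) for b in (expression, mutations, copy_number)]
combined = [sum(rows, []) for rows in zip(*blocks)]

# One latent factor summarises the pattern shared across data types;
# tumours are grouped by which side of zero their score falls on.
scores = first_latent_factor(combined)
groups = [int(s > 0) for s in scores]
print(groups)
```

No labels are used at any point; the grouping comes only from structure shared across the three data types. A deep learning approach replaces the single linear factor with many non-linear ones, but the workflow of integrating, summarising, and then grouping is the same in spirit.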
In machine learning, a critical aspect of algorithm development is testing your methods on a data set that was not used in the initial training. Since our algorithm was trained largely on publicly available datasets, we needed to find new, non-public datasets to test our methods appropriately. We found Repositive via a simple Google query for companies that have data on PDX models. We then contacted Repositive and asked for help with our search for genomics data from PDX models. Repositive’s Head of Business Development, Jeff Almeida-King, was very responsive and agreed to assist us via the Cancer Model Scout service, a concierge service in which Repositive’s bioinformatics team searches for preclinical cancer models tailored to your specific needs. With the help of Repositive, we were soon connected with the relevant CROs and were able to quickly assemble a validation set to test our machine learning models.
Wolfgang Kopp et al. (2020): “Deep learning for genomics using Janggu”, Nature Communications, DOI: 10.1038/s41467-020-17155-y.
Ricardo Wurmus et al. (2019): “PiGx: Reproducible genomics analysis pipelines with GNU Guix”, GigaScience.
Jonathan Ronen et al. (2019): “Evaluation of colorectal cancer subtypes and cell lines using deep learning”, Life Science Alliance, DOI: 10.26508/lsa.201900517.
Repositive is helping to accelerate cancer drug development in order to bring new treatments and cures to patients as quickly as possible.