Cancer Models Forum

Posted by Steven, June 2019

How to find the perfect in vivo cancer model with your gene expression signature

It is a problem that many translational scientists have faced. You’ve got some great results from testing a compound in vitro, providing a valuable piece of evidence that the compound is effective against cancer. But what next? Before you rush out to test the drug in a clinical trial, you will usually need further evidence that the compound works in vivo. In the era of targeted therapy, this poses a challenge: what is the best cancer model to use for in vivo testing, and how do you find it? The key issue is finding an animal model that has the molecular characteristics of the patient subgroup that you hope to treat with your drug. For example, those molecular characteristics might be a specific DNA mutation, a gene expression signature or structural DNA variants.

In this blog post, part of a series looking at some of the challenges of finding the perfect preclinical cancer model, we’ll be looking at how to confirm that a model from a contract research organisation expresses your genes of interest, and how the Cancer Models Scout team at Repositive might be able to help!

Searching for your perfect preclinical cancer model

The first step is understanding your requirements. For example, are you looking for a patient-derived xenograft (PDX) model or a syngeneic mouse model? Do you need a model from a specific indication or is the intended patient subgroup agnostic of cancer type? Are you interested in the expression of a single gene or do you have a multi-gene signature you want to test your therapy against?

We’ve put together a downloadable checklist which helps you gather all this information in one place.

You can then start to search for a preclinical cancer model to use. This might be via Google, PubMed, or simply emailing the preclinical oncology CROs that you know (see how Repositive can simplify this step below!). If you strike gold and find a model that has been previously shown to fulfil your gene expression criteria, well, then you’re done! However, this is rare. More often, you need to find a model that has had its RNA sequenced, or profiled by microarray, so that you can determine the expression levels of the genes you are interested in.

If you are interested in a rare cancer type, then a candidate model that you find is less likely to have RNA sequencing data available. Therefore, before you can get started on planning your in vivo experiments, you might need to organise RNA sequencing.

Comparing apples with apples

While next-gen RNA sequencing is now comparatively cheap and accessible, running the analysis to interpret the results still requires significant bioinformatics know-how. For the scientist trying to validate an anti-cancer compound, spending time understanding the difference between FASTQ, BAM & SAM files, which aligner to use, and which analysis package is best for differential expression analysis, isn’t really what you set out to do.

In particular, there are several issues commonly encountered with RNA sequencing data of preclinical cancer models:

1. Lack of a normal reference sample. Many cancer models are isolated from patient samples, but don’t have a matched normal tissue sample from the same patient. This makes determining the relative gene expression level difficult.

2. Making meaningful comparisons. If your gene expression signature was focused on patients with high expression of BRAF, the question is: high expression relative to what? High BRAF expression relative to other tumours of the same type? High expression compared to normal cells from the same tissue? Or high expression relative to all other cancer models? Regardless of which angle you’re taking, assessing the gene expression for a given model will require you to have a panel of data (either other tumours or normal samples) to compare against.

3. Batch effects. The samples in your panel may have been produced at different times, by different people, using different sequencing technologies. This introduces variation between samples that has nothing to do with any underlying biological differences. The challenge is to remove this non-biological variation while retaining the important biological signal. Using house-keeping genes as a reference can be one useful approach for normalising between samples, but even commonly used house-keeping genes can vary under some conditions! (1)
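
To make points 2 and 3 a little more concrete, here is a minimal Python sketch of one way to ask "high relative to what?". It scales a gene against house-keeping genes within each sample, then compares a candidate model with a reference panel using a z-score and a percentile. Everything in it is an assumption for illustration: the function names are hypothetical, the TPM values are invented, and GAPDH and ACTB are simply two commonly used house-keeping genes. It is not a description of any particular pipeline, and it does not replace proper batch correction.

```python
import math
import statistics


def log2_tpm(tpm, pseudocount=1.0):
    """Log-transform a TPM value; the pseudocount avoids log2(0)."""
    return math.log2(tpm + pseudocount)


def housekeeping_normalise(sample, gene, housekeeping_genes):
    """Express a gene relative to the mean log2 level of the
    house-keeping genes measured in the same sample."""
    reference = statistics.mean(log2_tpm(sample[g]) for g in housekeeping_genes)
    return log2_tpm(sample[gene]) - reference


def relative_expression(candidate, panel, gene, housekeeping_genes):
    """Return (z_score, percentile) of the candidate's normalised
    expression of `gene` relative to the samples in `panel`."""
    panel_values = [
        housekeeping_normalise(s, gene, housekeeping_genes) for s in panel
    ]
    value = housekeeping_normalise(candidate, gene, housekeeping_genes)
    mean = statistics.mean(panel_values)
    sd = statistics.stdev(panel_values)  # needs at least two panel samples
    z_score = (value - mean) / sd
    # Percentage of panel samples with strictly lower expression.
    percentile = 100.0 * sum(v < value for v in panel_values) / len(panel_values)
    return z_score, percentile


# Invented TPM values, for illustration only.
HOUSEKEEPING = ["GAPDH", "ACTB"]
panel = [
    {"BRAF": 7, "GAPDH": 1023, "ACTB": 1023},
    {"BRAF": 15, "GAPDH": 1023, "ACTB": 1023},
    {"BRAF": 3, "GAPDH": 1023, "ACTB": 1023},
    {"BRAF": 7, "GAPDH": 1023, "ACTB": 1023},
]
candidate = {"BRAF": 63, "GAPDH": 2047, "ACTB": 2047}

z_score, percentile = relative_expression(candidate, panel, "BRAF", HOUSEKEEPING)
print(f"BRAF z-score vs panel: {z_score:.2f}, percentile: {percentile:.0f}")
```

The choice of panel is the real decision here: swapping in tumours of the same type, normal tissue, or all available models answers a different question each time. And as the caveat above notes, the house-keeping genes themselves may not be stable, so a result like this is a starting point rather than a validation.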

The complexity and technicality of the bioinformatics analysis, plus the associated time and money required to validate data for individual models, means that this approach isn’t a viable option for many researchers and biotech companies, especially given that there is no guarantee the model will have the gene expression signature you’re after. Worse, a poor analysis that returns a false-positive result could lead to months of wasted effort evaluating your therapy in an unsuitable model.

How Repositive can help

Repositive offers a Cancer Model Scout service that will help you to identify the perfect preclinical cancer model for your experiment. We have the largest directory of cancer models, many of which have molecular characterisation, and our in-house team of bioinformatics experts have already run standardising pipelines on the available RNA sequencing data.

This means we can quickly tell you whether there are models that match both your tissue and expression criteria and connect you to the relevant providers, saving you countless hours of searching Google and PubMed and sending numerous emails!

Are you searching for a model with particular gene expression levels? Get in touch to discuss your requirements with the team as part of your free initial consultation.

Further reading: (1) Greer S, Honeywell R, Geletu M, Arulanandam R, Raptis L. "Housekeeping genes; expression levels may change with density of cultured cells". Journal of Immunological Methods. 355(1–2): 76–9. doi:10.1016/j.jim.2010.02.006. PMID 20171969.

Image credit: ©Juan Gärtner -

Posted by
Steven Williams

Data Scientist