Comments on “Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines”
The opinions expressed here are those of only Paul Geeleher and not my colleagues or institution.
I recently co-authored a paper in Genome Biology entitled “Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines”. The paper proposes a solution to a problem that has a long and sordid history in pharmacogenomics (see “Deception at Duke” on YouTube).
Because of continued work on these and similar data, some things have come to my attention since the publication that I would like to discuss here. I would also like to give some additional background on how the project came about.
The project did not begin with in vivo prediction in mind. In fact, I had been working on a very large dataset that we had received from a collaborator at a major pharmaceutical company, a component of which was a large panel of cell lines that had been treated with one of the drugs in CGP. I was interested in testing how well I could predict drug sensitivity in this dataset using models derived on CGP; this was an interesting question given the highly divergent microarray platforms used by these studies (a two-channel array platform versus a single-channel Affymetrix array). I informally tested a few methods, achieving decent results, until I came across two papers that had compared a host of methods for their ability to predict survival phenotypes from gene expression microarray data. Both studies identified ridge regression as the top-performing method, so I applied it to my cell line data, where I got results that were substantially better than those from the methods I had previously applied. At the same time, I had also been investigating methods for integrating data from different microarray platforms; luckily, a large comparison had also been published for these, which recommended the ComBat function from the R package sva. In addition, I had long been familiar with the idea of remapping microarray probes to the latest build of the genome using the BrainArray data. These methods formed the bones of my pipeline, and they performed very well in my panel of cell lines. While working on the panel of cell lines, I also noticed that removing genes with very low variance seemed to cause a very slight performance increase, which would be in line with biological expectations.
As there are no established methods for removing low-variance genes (that I am aware of), I simply removed the 20% of genes with the lowest variance, a cutoff based on a visual inspection of a histogram of the variance of all genes in CGP. I should note that these results on the panel of cell lines did not make it into the paper because they were unpublished data obtained for an entirely different project, but I will happily share the code (including some other possibly embarrassing early efforts) if you are interested and email me!
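For readers who want a concrete picture of two of these steps, here is a minimal sketch in Python of the variance filter and the ridge regression fit. It is not the code used in the paper (that pipeline was written in R and also includes probe remapping and ComBat, which are omitted here). The function names, the samples-by-genes matrix layout, and the fixed penalty value are all assumptions made for illustration; in practice the penalty would be tuned, for example by cross-validation.

```python
import numpy as np


def low_variance_mask(expr, fraction=0.2):
    """Boolean mask over genes that keeps those above the `fraction` variance quantile.

    expr is assumed to be a samples x genes matrix (an illustrative convention).
    """
    variances = expr.var(axis=0)
    cutoff = np.quantile(variances, fraction)
    return variances > cutoff


def fit_ridge(expr, response, penalty=1.0):
    """Fit ridge regression; returns (coefficients, intercept).

    Uses the dual form, which solves a samples x samples system and is
    therefore cheaper when there are far more genes than samples.
    """
    x_mean = expr.mean(axis=0)
    y_mean = response.mean()
    xc = expr - x_mean          # centre predictors so the intercept is unpenalized
    yc = response - y_mean
    n_samples = xc.shape[0]
    dual = np.linalg.solve(xc @ xc.T + penalty * np.eye(n_samples), yc)
    coef = xc.T @ dual
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict_ridge(expr, coef, intercept):
    """Predict drug sensitivity for new samples measured on the same genes."""
    return expr @ coef + intercept
```

In this sketch the mask would be computed on the training (cell line) expression matrix and then applied to both the training and the clinical matrices, so that the model and the samples it is applied to share the same set of genes.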
So, with this pipeline in place (containing components that the literature suggested were as strong as they could be) and having recently become aware of the “Potti Scandal”, I decided to give the in vivo prediction a shot. I obtained the Docetaxel dataset from Potti’s flagship Nature Medicine paper. To my considerable surprise, the pipeline yielded a significant result with the expected directionality. I was quite amazed by this and decided that I should obviously try to reproduce this result in other clinical trials. The next suitable trial that I identified was for Bortezomib in Myeloma, and in this case I was again surprised to find results that were even more significant than before (although the sample size was considerably larger). However, my optimism was subsequently tempered somewhat by null results in a Cisplatin clinical trial in breast cancer and for Erlotinib and Sorafenib in lung cancer. This led to prolonged investigations. A possible reason for the Cisplatin result is that there appeared to be issues with the clinical data, possibly because the drug isn’t typically used to treat breast cancer and variability in drug response is very low; this and related issues are discussed at length in our paper.
However, I noted that the distributions of the drug sensitivity data for Erlotinib and Sorafenib (in the CGP cell line training data) were drastically different from those of Docetaxel, Bortezomib and Cisplatin, with a very obvious trend towards only a few cell lines responding to these drugs (interestingly, also consistent with biological expectation), rather than the far more uniform distribution of the previous data. This led me to hypothesize that there may be issues with fitting a linear model to these data (discussed at length in the paper), and as an alternative approach I dichotomized the drug response data and fit a logistic regression model, while keeping as much of the rest of the pipeline intact as was possible. The data were dichotomized using the number of cell lines that achieved a measurable IC50 on one end and a large number of resistant cell lines on the other, but I would suggest that more transparent and robust methods should be developed in future. However, again to my surprise, this immediately yielded a significant result for Erlotinib. A null result was achieved for Sorafenib, but this also leads to an interesting discussion point. One very plausible explanation is that one of the main mechanisms for the anti-tumor activity of Sorafenib in vivo is thought to be as an angiogenesis inhibitor, which is achieved by inhibiting VEGF signaling. Thus, a cell viability assay may not be a good model for the in vivo activity of this drug (and potentially other drugs). Whether the in vitro model reflects the expected in vivo biology is a very important concern and one that should always be strongly considered when pursuing such lines of research.
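To make the dichotomization idea concrete, here is a minimal Python sketch, again not the code used in the paper. It assumes that lower IC50 means greater sensitivity, labels a chosen number of the most sensitive cell lines as responders and a chosen number of the most resistant as non-responders, drops everything in between, and then fits an L2-penalized logistic regression by plain gradient descent. The function names, the group sizes, the penalty and the optimizer settings are all hypothetical choices for illustration.

```python
import numpy as np


def dichotomize(ic50, n_sensitive, n_resistant):
    """Return (indices kept, labels), with 1 = sensitive and 0 = resistant.

    Assumes lower IC50 means more sensitive; intermediate cell lines are dropped.
    """
    if n_sensitive < 1 or n_resistant < 1 or n_sensitive + n_resistant > len(ic50):
        raise ValueError("group sizes must be positive and must not overlap")
    order = np.argsort(ic50)
    keep = np.concatenate([order[:n_sensitive], order[-n_resistant:]])
    labels = np.concatenate([np.ones(n_sensitive), np.zeros(n_resistant)])
    return keep, labels


def _sigmoid(logits):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(logits, -30.0, 30.0)))


def fit_logistic_ridge(expr, labels, penalty=1.0, learning_rate=0.1, n_iter=2000):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    mean = expr.mean(axis=0)
    scale = expr.std(axis=0)
    scale[scale == 0] = 1.0                  # guard against constant genes
    z = (expr - mean) / scale
    n_samples, n_genes = z.shape
    weights = np.zeros(n_genes)
    bias = 0.0
    for _ in range(n_iter):
        error = _sigmoid(z @ weights + bias) - labels
        weights -= learning_rate * (z.T @ error + penalty * weights) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias, mean, scale


def predict_proba(expr, weights, bias, mean, scale):
    """Predicted probability that each sample is sensitive."""
    return _sigmoid(((expr - mean) / scale) @ weights + bias)
```

The fixed learning rate is only adequate for a small number of standardized predictors; with thousands of genes a dedicated solver would be the more sensible choice.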
There are some potential criticisms of the Erlotinib result, particularly that it was only demonstrated on a single dataset, and this is where I would ask readers of the paper to exercise caution when interpreting this result. The number of samples is small (only 25) and we need to wait and see if this result will generalize to larger datasets (which are apparently coming soon from the BATTLE trial). However, there are a number of reasons that this result was still definitely worth including in this paper. Firstly, using this example, we have highlighted an important difference in the distribution of the drug response phenotype across a panel of cell lines for highly targeted versus more broadly cytotoxic drugs. This is a very important consideration for the choice of model or machine learning algorithm and should absolutely not be ignored by future investigators who attempt similar analyses. Perhaps most importantly, the model worked right out of the box and is thus impossible to ignore. There are some issues, but a larger dataset will eventually put the question to rest one way or another. One interesting side note is that it has recently come to my attention that the same model also performs well in predicting drug sensitivity in the Erlotinib CCLE data, which I have been working with recently as part of a new project.
CHOICE OF MACHINE LEARNING ALGORITHM
Some readers have expressed interest in the choice of ridge regression. It is important that I reiterate that strong literature support should arguably be considered the most unbiased means by which to choose an algorithm for such a study. Given the existing support, it is baffling to me that this algorithm has been ignored for prediction of pharmacogenomic phenotypes. Why? I don’t know. As a side point, I would like to emphasize that the comparisons to Lasso and ElasticNet regression were included because a “comparison of methods” was requested by all three reviewers, a request that I have some problems with, given the literature support for ridge regression. I should also emphasize that no algorithm other than ridge regression was ever applied to the clinical datasets prior to these reviewers’ requests, as trying multiple methods could easily invite the dangerous accusation of selecting methods based on their performance in these datasets, a practice that is likely to lead to a useless model.
I hope that the future will lead us to identify further datasets on which this method can be tested. My dream is that substantial progress can be made in personalized cancer therapy by using statistical models constructed on model systems (be they cell lines or anything else). This will require focused attention to best practices in machine learning and data mining. We hope that we have taken an important first step and are excited to see a productive future for this field.