Archive for the 'Uncategorized' Category

Comments on “Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines”

I recently co-authored a paper in Genome Biology entitled “Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines”. The paper proposes a solution to a problem that has a long and sordid history in pharmacogenomics (see “Deception at Duke” on YouTube).
Because of continued work on [...]

Split a chromosome name and location type string into its constituent parts in R

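This is easy to do with a regular expression and the strsplit function in R. In the pattern, the | operator means “or”, so the string is split wherever “chr”, “:” or “-” occurs. Because the string starts with “chr”, the first element of the result is an empty string, which is why only elements 2 to 4 are kept. For example, for “chr15:88120587-88121480”:
> unlist(strsplit("chr15:88120587-88121480", "chr|:|-"))[2:4]
[1] "15" "88120587" "88121480"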
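If the same split is needed for many strings at once, it can be wrapped in a small helper. The following is only a sketch: the function name parse_location and the column names are hypothetical, and it assumes every input has exactly the form chrN:start-end.

```r
# Hypothetical helper: split "chr15:88120587-88121480"-style strings
# into a data.frame with one row per input string.
parse_location <- function(x) {
  # Capture chromosome, start and end; assumes the "chrN:start-end" form.
  m <- regmatches(x, regexec("^chr([^:]+):([0-9]+)-([0-9]+)$", x))
  parts <- do.call(rbind, m)  # column 1 is the full match, 2-4 are the groups
  data.frame(
    chr   = parts[, 2],
    start = as.numeric(parts[, 3]),
    end   = as.numeric(parts[, 4]),
    stringsAsFactors = FALSE
  )
}

loc <- parse_location(c("chr15:88120587-88121480", "chrX:100-200"))
print(loc)
```

Strings that do not match the assumed pattern are not handled here and would need to be checked for separately.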

Principal Components Analysis Explained using R

Here, we will explain principal component analysis (PCA) by stepping through the algorithm manually and reproducing the output of the prcomp() function in R, which is normally used to do PCA.
First make up some data and plot it; in terms of gene expression analysis, we can think of the rows of the matrix below as [...]
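The excerpt is cut off here, so the following is not the post's own code but a minimal sketch of the general idea, under the assumption that PCA is done by eigen-decomposing the covariance matrix of the centered data. The random data, the seed and all variable names are made up for illustration.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
x <- matrix(rnorm(40), nrow = 10, ncol = 4)   # made-up data: 10 rows, 4 columns

# 1. Center each column (prcomp() also centers by default).
xc <- scale(x, center = TRUE, scale = FALSE)

# 2. Eigen-decompose the covariance matrix of the centered data.
e <- eigen(cov(xc))

# 3. Project the centered data onto the eigenvectors to get the scores.
scores <- xc %*% e$vectors

# Compare with prcomp(). The sign of each component is arbitrary,
# so the scores are compared on absolute values.
p <- prcomp(x)
print(all.equal(sqrt(e$values), p$sdev))
print(all.equal(abs(unname(scores)), abs(unname(p$x))))
```

The square roots of the eigenvalues should agree with the standard deviations reported by prcomp(), and the projected data with its scores, up to numerical tolerance and the sign of each component.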

Writing a group of R data.frames to named Excel Worksheets

First install the “WriteXLS” package in R. It is distributed on CRAN, so the standard installer works:
install.packages("WriteXLS")
Now if I have 3 data.frames called “genes”, “proteins” and “elephants”, all I need to do to write them to the same Excel file, on different *named* worksheets, is:
library("WriteXLS")
WriteXLS(c("genes", "proteins", "elephants"), "WriteXLS.xls")
Note, for this to work, Perl needs to be installed. You also need the “Text::CSV_XS.pm” library. On Ubuntu [...]

Use “org.Hs.eg.db” to map between Entrez Gene Ids and HUGO gene symbols in R

To translate between these identifiers in R, this code creates a table with the mapping:
library(org.Hs.eg.db)
e2s = toTable(org.Hs.egSYMBOL)
To have a look at the mapping (which is stored as a data.frame):

head(e2s)

This will output something like this:
gene_id symbol
1 1 A1BG
2 2 [...]
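Once the table exists, individual identifiers can be looked up with match(). The sketch below uses a small hand-made stand-in for e2s so that it runs without the annotation package; it assumes the real table has the same two columns, gene_id and symbol, with gene_id stored as character. The query vectors are made up.

```r
# Toy stand-in for toTable(org.Hs.egSYMBOL), so the example runs
# without the annotation package installed.
e2s <- data.frame(
  gene_id = c("1", "672", "7157"),
  symbol  = c("A1BG", "BRCA1", "TP53"),
  stringsAsFactors = FALSE
)

# Entrez Gene ID -> symbol; match() gives NA for IDs missing from the table.
ids <- c("7157", "1", "999999999")
symbols <- e2s$symbol[match(ids, e2s$gene_id)]
print(symbols)

# Symbol -> Entrez Gene ID works the same way in the other direction.
entrez <- e2s$gene_id[match(c("BRCA1", "TP53"), e2s$symbol)]
print(entrez)
```

Using match() keeps the result in the same order and of the same length as the query, which is convenient when annotating the rows of an expression matrix.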

In R, list all files in current directory ending with “.CEL”

Use regular expression matching, where the “$” anchors the match to the end of the file name. The dot is escaped so that it matches a literal “.” rather than any character:
dir(pattern="\\.CEL$")
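To see the pattern in action without touching real data, the sketch below creates a few empty files in a temporary directory and lists only those ending in “.CEL”. The directory and file names are made up for illustration.

```r
# Create a throwaway directory with a few empty files (names are made up).
d <- file.path(tempdir(), "cel_demo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("sample1.CEL", "sample2.CEL",
                           "notes.txt", "sample1.CEL.bak")))

# "\\." matches a literal dot and "$" anchors the match at the end of the name,
# so "sample1.CEL.bak" and "notes.txt" are excluded.
cel_files <- dir(d, pattern = "\\.CEL$")
print(cel_files)
```

If the files may also be named with a lower-case “.cel” extension, adding ignore.case = TRUE to the dir() call should pick those up as well.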

Use biomaRt to translate HUGO to Entrez Gene IDs

We can use the R package biomaRt to conveniently convert between different types of gene IDs. In this example we will convert official HUGO gene names to Entrez Gene IDs.
First we load biomaRt in R using the current ensembl database for human:
library(biomaRt)
ensembl

Find files in linux command line using wildcards

To find a file whose name contains the text “lostfile” in the current directory and all subdirectories:
find . -name \*lostfile\*
To search the entire filesystem for such a file, start from the root directory instead:
find / -name \*lostfile\*