RNA expression profiling of biological models, which identifies the genes expressed and the pathways active in biological systems under different conditions, is central to unraveling how genes control biology. For years, qRT-PCR has provided a reliable and robust approach to assess transcript levels for any gene. However, the qRT-PCR assay is limited to just a few gene targets. Approaches using arrayed oligonucleotide probes were developed to address this throughput limitation, but RNA expression analysis using microarrays does not provide the robust measurements of expression levels over a wide dynamic range that qRT-PCR does. In addition, indirect fluorescence detection of transcripts hybridized to arrayed oligonucleotides suffers from noise and background signals caused by cross-hybridization.
With the advent of affordable and accessible next-generation sequencing (NGS) technology, complete sequencing of cDNA made from RNA isolated from biological models of interest became much more feasible. As a result, RNA-seq approaches have overtaken microarrays as the method of choice for genome-wide expression analysis.
While RNA-seq provides an accurate reading of the transcripts present in biological samples, it can be a challenging assay to run for large numbers of samples. In addition, RNA sample preparation requires mRNA enrichment, especially for groups interested mainly in protein-coding genes. For blood samples, additional work is required to deplete beta-globin transcripts. Even after this preparative work on the RNA is completed, the assay involves cDNA synthesis of the total transcriptome, followed by generation of NGS libraries from the total complement of fragmented cDNA. NGS analysis of these complex samples requires significant sequencing depth (typically 25-50M reads) as well as involved deconvolution and bioinformatic processing to estimate the copy number of each gene from large numbers of variable gene fragment reads.
An alternative to sequencing the entire transcriptome is to use a targeted RNA sequencing approach. This is the basis of the DriverMap Targeted RNA Expression Assay developed by Cellecta. Rather than reverse-transcribing the whole transcriptome, the DriverMap method uses multiplex RT-PCR to amplify a defined and conserved 80- to 250-base segment of the transcript of each targeted gene, and then uses next-generation sequencing (NGS) to quantitatively assess the abundance of each of these transcript amplicons (Fig. 1).
There are significant advantages to targeting, amplifying, and sequencing carefully selected discrete loci of interest in the expressed transcriptome. From a procedural perspective, using gene-specific primers that target each gene of interest for the reverse transcriptase and PCR reactions obviates the need for RNA preparation to remove unwanted sequences such as ribosomal RNA, beta-globins, or other non-coding RNA. Only transcript sequences matching the targets of interest are amplified. The rest is ignored by the chemistry. As a result, only total RNA is needed for the assay, and, since it is PCR-based, not even much of that is required. As little as 10 pg of total RNA from single-cell lysate is enough to detect most target transcripts.
The DriverMap RT-PCR targeted approach also greatly simplifies the cDNA library that needs to be sequenced. A multiplex assay that targets all 19,000 human protein-coding genes produces, at most, only 19,000 amplicons for sequencing. The NGS read depth required to reliably read several thousand targeted amplicons is much less than the depth required for the whole transcriptome. As a result, the targeted approach detects low-abundance transcripts more consistently and reliably than other RNA-seq approaches, and with much less sequencing (Fig. 2).
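The read-depth argument can be made concrete with a back-of-the-envelope calculation. The sketch below is only a planning aid under stated assumptions: the function names are hypothetical, and the mean depth of 500 reads per amplicon is an arbitrary placeholder, not a Cellecta recommendation. Because expression levels span several orders of magnitude, a mean per-amplicon figure says nothing about how reads are distributed across individual targets.

```python
def reads_for_panel(n_amplicons: int, mean_reads_per_amplicon: int) -> int:
    """Total reads needed to reach a chosen MEAN depth per amplicon."""
    if n_amplicons <= 0 or mean_reads_per_amplicon <= 0:
        raise ValueError("both arguments must be positive")
    return n_amplicons * mean_reads_per_amplicon


def mean_depth(total_reads: int, n_amplicons: int) -> float:
    """Mean reads per amplicon for a given sequencing budget."""
    if n_amplicons <= 0:
        raise ValueError("n_amplicons must be positive")
    return total_reads / n_amplicons


if __name__ == "__main__":
    # 19,000 targets comes from the text; 500 reads/amplicon is a placeholder.
    print(reads_for_panel(19_000, 500))
    # Mean depth if a hypothetical 5M-read run is spread over the same panel.
    print(mean_depth(5_000_000, 19_000))
```

A smaller custom panel scales the same way: fewer amplicons means either fewer total reads or a higher mean depth for the same sequencing budget.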
In addition to requiring fewer NGS reads, analysis of the sequencing results is much simpler using DriverMap targeted RNA sequencing, since each targeted region of cDNA provides a reference sequence with which to directly compare sequencing reads. There is no need to estimate gene copy number from assembled cDNA fragments. With targeted RT-PCR, the read count of each amplicon correlates directly with the expression level of the target transcript. The analysis can be done entirely in an Excel spreadsheet using housekeeping genes as references, in a manner similar to standard qRT-PCR.
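The housekeeping-gene normalization described above can also be sketched in a few lines of Python. This is a minimal illustration, not Cellecta's analysis pipeline: the function names, the pseudocount, the choice of GAPDH and ACTB as reference genes, and all read counts are assumptions made for the example.

```python
import math
from statistics import geometric_mean


def normalize_to_housekeeping(counts, housekeeping, pseudocount=1.0):
    """Scale each amplicon's read count by the geometric mean of the
    housekeeping-gene counts from the same sample."""
    missing = [g for g in housekeeping if g not in counts]
    if missing:
        raise KeyError(f"housekeeping genes not found: {missing}")
    # The pseudocount keeps zero counts from breaking the log-based mean.
    ref = geometric_mean([counts[g] + pseudocount for g in housekeeping])
    return {gene: (n + pseudocount) / ref for gene, n in counts.items()}


def log2_fold_change(sample, control, housekeeping, pseudocount=1.0):
    """log2 ratio of normalized expression, sample vs. control,
    for the genes present in both samples."""
    s = normalize_to_housekeeping(sample, housekeeping, pseudocount)
    c = normalize_to_housekeeping(control, housekeeping, pseudocount)
    return {g: math.log2(s[g] / c[g]) for g in s.keys() & c.keys()}


if __name__ == "__main__":
    # Hypothetical read counts per amplicon; GENE_X is a made-up target.
    treated = {"GAPDH": 1200, "ACTB": 900, "GENE_X": 480}
    untreated = {"GAPDH": 1100, "ACTB": 1000, "GENE_X": 95}
    for gene, lfc in sorted(log2_fold_change(treated, untreated,
                                             ["GAPDH", "ACTB"]).items()):
        print(f"{gene}\t{lfc:+.2f}")
```

Using the geometric mean of more than one reference gene, rather than a single gene, is a common design choice because it dampens the effect of any one reference varying between samples.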
Key to the DriverMap Targeted RNA approach is the efficiency and specificity of the multiplex PCR target amplification step. Numerous factors that may be tolerated in simple PCR reactions (including secondary structure, non-specific binding, primer binding inefficiency, and primer-primer interactions) can cause significant problems in a multiplex PCR environment that amplifies many thousands of templates in the same reaction.
Development of the DriverMap assay required sophisticated primer design work and an iterative empirical testing process. For this, we developed a bioinformatics primer design pipeline and carried out experimental validation of thousands of PCR primers in a highly complex multiplex reaction (see Fig. 3). This approach enabled us to generate an optimized, genome-wide multiplex PCR assay for all 19,000 human protein-coding genes.
Primers were initially selected and screened for several characteristics. For example, all primers were selected to have GCA-rich sequences (with minimum T content) in order to reduce primer-dimer formation in the multiplex PCR assay. In addition, primers were selected for high specificity, high Tm, and a small amplicon size with a balanced nucleotide distribution to ensure robust NGS.
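A first-pass filter on these sequence characteristics might look like the sketch below. It is an illustration only, not the actual DriverMap design pipeline: the thresholds and function names are hypothetical, the Wallace rule (2 °C per A/T, 4 °C per G/C) is a crude stand-in for a proper nearest-neighbor Tm calculation, and specificity screening against the transcriptome is not covered. Only the 80- to 250-base amplicon range is taken from the text.

```python
from collections import Counter


def t_fraction(primer: str) -> float:
    """Fraction of T bases in the primer."""
    if not primer:
        raise ValueError("primer sequence is empty")
    return primer.upper().count("T") / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate (deg C) by the Wallace rule; an approximation only."""
    c = Counter(primer.upper())
    return 2 * (c["A"] + c["T"]) + 4 * (c["G"] + c["C"])


def passes_screen(primer: str, amplicon_len: int,
                  max_t_fraction: float = 0.15,  # hypothetical threshold
                  min_tm: int = 60,              # hypothetical threshold
                  min_amplicon: int = 80,        # range stated in the text
                  max_amplicon: int = 250) -> bool:
    """True if the primer is T-poor, has a high estimated Tm, and yields
    an amplicon within the allowed size range."""
    return (t_fraction(primer) <= max_t_fraction
            and wallace_tm(primer) >= min_tm
            and min_amplicon <= amplicon_len <= max_amplicon)


if __name__ == "__main__":
    # Made-up candidate sequences paired with made-up amplicon lengths.
    candidates = [("GCAGCAGGACCAGCAGAAGC", 120),
                  ("ATTTGTTCATTTGCATTTAC", 120)]
    for seq, length in candidates:
        print(seq, passes_screen(seq, length))
```

In practice a computational filter like this would only narrow the candidate list; as the text goes on to describe, the final choice rests on experimental testing.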
In addition to the bioinformatics selection, each primer set also had to be experimentally tested. For each target mRNA, we synthesized the best 5–20 PCR primers, then ran them in several multiplex RT-PCR reactions using a set of universal control human tissue/cell line RNAs (Fig. 4), mouse negative control RNAs, and positive control DNA as templates. For the genome-wide DriverMap Targeted RNA Expression Assay, one set of primers targeting the conserved portions of different mRNA isoforms was selected for each mRNA on the basis of the highest efficiency, sensitivity, and specificity. Furthermore, for highly abundant transcripts, we selected primers with lower efficiency, which allowed us to solve the problem of over-sequencing highly expressed target transcripts.
While DriverMap Targeted RNA Profiling offers numerous benefits over standard RNA sequencing, it is not a replacement for the more broadly applicable RNA-seq approach. The DriverMap assay requires knowledge of the transcripts of interest. Since the cDNA synthesis and PCR primers target specific regions in expressed RNA from known genes, it is not a useful approach to identify novel transcripts, nor a good approach to investigate alternative isoforms of particular genes. Also, the assay, as configured, targets protein-coding genes. It will not detect non-coding RNA or other untargeted transcripts. However, customized assays for these purposes can be developed.
Generally, the DriverMap Targeted RNA Expression Profiling Assay provides a convenient and effective method to identify which pathways are active in your biological samples of interest. As a PCR assay, it is particularly effective when samples are limiting and yield very small amounts of RNA. In addition, due to the ability to target selected sets of genes, it can be especially useful for quantifying cell markers or biomarkers expressed at very low levels. Custom assays can also be designed to specifically target low-level genes of interest. Particularly for investigators interested in measuring transcript levels of specific cell fractions in heterogeneous samples, such as immune cells in tumor samples, this ability to pick up genes expressed at low levels can be crucial.
Cellecta offers researchers access to the DriverMap assay as a kit to enable you to run the assay in your lab, and as a service, if you prefer to send us your samples for targeted expression profiling.