shRNA Design Results – How good is your algorithm?

The last blog entry discussed how hairpin structural features affect the representation and effectiveness of shRNA sequences in pooled libraries. However, the nucleotide sequence itself is probably the major factor that determines how efficiently a particular siRNA or shRNA knocks down a target.

Despite a large amount of research over the last several years on the features that affect knockdown potential, a priori prediction of the most effective siRNA/shRNA sequences continues to be very much "hit and miss." As anyone familiar with RNAi is aware, numerous design algorithms are available online from a variety of commercial and academic sources. For example, several publicly available prediction algorithms are listed and described at Protocol Online and on the Charité-Universitätsmedizin Berlin website.

Cellecta, of course, has also put a large amount of effort into working out the critical parameters needed to predict effective shRNA. With pooled screening, even small improvements in the proportion of effective sequences produce significant differences in screening results. We have been using an approach that assesses a number of parameters related to GC composition and distribution. This approach has been quite effective, as the data from a recent set of designed shRNA show (see below). In this case, we designed 5 shRNA to each of 29 gene targets. For all but one of the targets, at least one shRNA construct generated greater than 70% knockdown of the targeted transcript (measured by qRT-PCR), and for 23 of the targets, multiple constructs generated >70% knockdown.

Figure: Results of the new shRNA design on knockdown efficiency
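The specific GC-related parameters are not spelled out in this post. As a rough illustration only, the sketch below computes the kind of GC composition and distribution features such an approach might look at. The function names, the 5-base window, and the choice of features are all assumptions made for this example, not Cellecta's actual design rules.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc_features(seq: str, window: int = 5) -> dict:
    """Overall GC content plus GC content at each end of the sequence.

    The window size and feature set are illustrative assumptions.
    """
    if len(seq) < 2 * window:
        raise ValueError("sequence is shorter than two windows")
    five_prime = gc_fraction(seq[:window])
    three_prime = gc_fraction(seq[-window:])
    return {
        "gc_total": gc_fraction(seq),
        "gc_5p": five_prime,
        "gc_3p": three_prime,
        # Positive when the 5' end is more GC-rich than the 3' end.
        "gc_asymmetry": five_prime - three_prime,
    }


# Made-up 21-nt candidate sequence, used only to exercise the functions.
features = gc_features("GCGCGATTACAGGCTTAATAT")
print(features)
```

Features like these could then be compared against measured knockdown to see which ones actually track with effective constructs.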

The results of the project shown here were produced using a relatively straightforward approach. Recently, as part of an NIH grant, we have coupled knockdown data and results from phenotypic screens with a self-learning algorithm to produce a more sophisticated analysis that assesses over a hundred characteristics in a tree-like process to identify "good" shRNA. This sort of analysis has increased the fraction of effective shRNA (shRNA that knock down the target by >70%) by about 10%, from about 65%-70% to close to 80%. This design algorithm is available on the DECIPHER Project website. In the end, the improvement seems to come from eliminating the worst performers rather than from better prediction of the best shRNA. This suggests that gains in prediction may lie not in identifying which parameters enhance or promote effective RNA inhibition, but in identifying which parameters can disrupt it.
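The idea of removing likely poor performers, rather than ranking candidates by a positive score, can be sketched as a simple veto filter. This is not the DECIPHER algorithm, whose characteristics are not described here. The two rules and their thresholds below are hypothetical and chosen only to show the shape of the approach.

```python
def disqualifiers(seq: str) -> list:
    """Return the reasons a candidate would be rejected (empty list = keep).

    Both rules and their thresholds are illustrative assumptions.
    """
    seq = seq.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty sequence")
    reasons = []
    gc = sum(base in "GC" for base in seq) / len(seq)
    # Assumed acceptable GC range of 30%-65%.
    if gc < 0.30 or gc > 0.65:
        reasons.append("extreme GC content")
    # Assumed rule: reject any run of four or more T bases.
    if "TTTT" in seq:
        reasons.append("run of four or more T")
    return reasons


def filter_candidates(candidates: list) -> list:
    """Keep only the candidates that trigger no disqualifying rule."""
    return [seq for seq in candidates if not disqualifiers(seq)]


# Made-up candidate sequences, used only to exercise the filter.
candidates = [
    "GCGCGATTACAGGCTTAATAT",
    "GATTTTACAGGCTTAATCGAT",
    "GCGCGCGGCCGCGGCCGCGCG",
]
kept = filter_candidates(candidates)
print(kept)
```

A filter of this kind makes no attempt to pick the best candidate. It only narrows the pool, which matches the observation that the gain came from dropping the worst shRNA.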

Please email info@cellecta.com with any comments.
