Archive for the ‘Technical’ category

Another Group Finds Similar Keys to Optimal Pooled shRNA Library Screens

March 20th, 2012

Our group recently ran across an article describing an independent RNAi screen with a non-Cellecta pooled shRNA expression library that piqued our interest. In the October 2011 online Genome Biology Journal, Sims, et al. comprehensively described how to run a rigorous genome-wide pooled RNA interference screen using next generation sequencing. The article thoroughly describes the procedural steps involved in screening a heterogeneous pooled library of thousands of lentiviral shRNA expression constructs. Although they used a library somewhat different than our design (the lack of unique sequenceable barcodes being one notable difference), the study nicely demonstrates many of the requirements to ensure meaningful screening results and emphasizes the need to use high throughput next-generation sequencing (as opposed to microarray hybridization) for reproducible measurements of shRNA depletion or enrichment following selection.

Viability or “drop-out” screens that look for depletion of shRNA sequences in selected populations to identify essential genes are one of the most common applications of pooled shRNA screening. The Sims et al. study focuses primarily on the key factors to ensure reproducible results for these screens. Among the most important ones, they note the following:

  1. The shRNA expression library itself must be generated systematically to minimize variation in hairpin representation. This should be assessed by HT sequencing of the plasmid form of the library. Interestingly, Sims et al. also found that the plasmid library is a better reference for starting hairpin representation than the pseudoviral packaged library, which is consistent with our experience at Cellecta, too.
  2. It is essential to manage cell numbers to maintain hairpin representation through the whole screen. Specifically, Sims et al. recommend maintaining at least 1,000 cells per shRNA, which is also the ratio we find optimal, as described in an earlier blog post. They also caution against letting cells grow past 70% confluency before replating.
  3. Following selection, it is important to amplify sufficient genomic DNA to ensure a representative population from each cell sample. For their library of 10,000 shRNAs, they used at least 60 µg of genomic DNA for pre-sequencing PCR amplification. We find similar amounts necessary (e.g., for 27,000 shRNAs, we use 200 µg per sample).
  4. Biological replicates are a requirement to overcome stochastic noise inherent in the screen. However, replicates should have a high level of reproducibility with R-squared values of 0.9 or better.
  5. The pooled shRNA library must be a reasonable size to enable practical handling of the cell populations, genomic DNA amplification, and biological replicates required for an effective screen. Sims et al. used a library with 10,000 shRNAs.
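
As a rough illustration of the replicate check in point 4, the sketch below computes the squared Pearson correlation between two replicates and compares it with the 0.9 cutoff. The read counts, the log2 transform with a pseudocount of 1, and the function names are assumptions made for this example; they are not taken from Sims et al. or from any particular analysis package.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)

def log_counts(counts, pseudocount=1):
    """Log2-transform read counts; the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]

def replicates_pass(rep_a, rep_b, threshold=0.9):
    """True if the replicates correlate at or above the chosen R-squared cutoff."""
    return r_squared(log_counts(rep_a), log_counts(rep_b)) >= threshold

# Hypothetical per-shRNA read counts for two biological replicates.
rep_1 = [1200, 950, 30, 410, 2200, 15, 780]
rep_2 = [1100, 1010, 45, 380, 2500, 10, 800]
print(replicates_pass(rep_1, rep_2))
```

In practice the comparison would run over every hairpin in the library rather than a handful of counts, and the transform may differ from the one assumed here.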

    As a result of the thorough technique, Sims et al. estimated they were able to identify more than 98% of the hairpins in all replicates. One distinct difference in the Sims et al. library compared with Cellecta’s is the absence of a barcode, that is, a unique, readily identifiable sequence separate from the hairpin sequence that can be used to identify the particular shRNA in the expression cassette. Somewhat confusingly, though, Sims et al. used the term “barcode screen” although no barcode is present in their library. Detection of shRNA levels in selected populations was done by sequencing a portion of the shRNA-encoding region. From our experience, use of a separate unique barcode optimized for sequence analysis increases sequencing calls and helps improve replicate correlations. Sims et al. did find that the pre-sequencing PCR step introduced a certain amount of noise in the data, which is consistent with amplification variability of shRNA sequences as opposed to short standardized barcodes.

    The consistency of the general findings of this independent study with our experience, however, is very encouraging. Using a similar but distinct library, Sims et al. have uncovered many of the same critical requirements for optimal screening with complex shRNA pools as we have. This alignment emphasizes the importance of these procedural details to obtaining meaningful screening results, and it provides additional support for RNAi screening standards of practice, which were the topic of the previous post.

    Sims et al. also mentioned the development of two open source programs for computational analysis of pooled shRNA screening results, shALIGN and shRNAseq: http://rock.icr.ac.uk/software/shrnaseq.jsp.


    The Need for RNAi Screening Standards

    February 3rd, 2012

    A couple of months ago at the CHI Discovery on Target Conference, Hakim Djaballah, Director of the HTS Core Facility at the Memorial Sloan Kettering Cancer Center, gave a unique and insightful presentation highlighting the challenges of RNAi screening to identify lethal loss-of-function interactions in oncogenic systems.

    For the exceptional benefits RNAi offers as a targeted tool to elucidate gene function, it is still a relatively new technology with limitations, potential, and idiosyncrasies that remain somewhat undefined.  These features are especially evident when RNAi is adapted for large-scale screening of gene function where small details in the set-up, screening process, quality of the reagents, and types of cells can significantly affect the variation and consistency in the large amount of data generated.  By highlighting some disappointing follow-up results from initially exciting high-profile publications, Dr. Djaballah identified a few critical benchmarks for evaluating RNAi screening.

    Dr. Djaballah’s group looked at three potentially high-value cancer targets identified in independent loss-of-function RNAi screens.  In addition to the publications, his group reviewed the primary screening data in more detail to evaluate the procedure and statistical significance of the data.  The first screen, published in May 2009 in Cell (Scholl, et al.), identified STK33 as required for KRAS oncogenic activity.  Because this was initially a very exciting discovery, a number of groups pursued STK33 as a potential therapeutic target.  However, subsequent groups (Barbie, et al. and Luo, et al.) failed to find STK33 among the strong hits in similar screens.  A publication in September 2011 by Babij, et al. in Cancer Research, which also does not see STK33 as a hit in a similar shRNA screen, presents compelling data indicating STK33 is not, in fact, generally essential for survival of KRAS-dependent cells.  Although, based on recent letters to the editor in Cancer Research, this does not seem to be the end of the discussion between these two groups, the initial excitement about STK33 seems to have been premature at best.

    The KRAS synthetic lethal screens by Barbie et al. mentioned above, which did not pick up STK33 as a strong hit, did, however, identify another potentially interesting gene (the IκB kinase TBK1) that appeared essential for, but was not previously known to be involved in, KRAS lethality.  This target was not found by other groups and has yet to be confirmed, but Dr. Djaballah had some reservations as to whether statistical analysis of the data really supported this as a true “hit” or simply an outlier.  Dr. Djaballah also had similar concerns with a Nature Letter from October 2011 by Zuber et al. that identified Brd4 as an essential gene and possible therapeutic target in acute myeloid leukemia cells.  A more detailed review of the data from this screen revealed some issues with the confidence of this hit, as there was significant variability in the CV values and a couple of the shRNAs were enriched by as much as a million-fold.

    Dr. Djaballah’s discussion was clearly not intended to disparage any specific study, but rather to demonstrate the slippery potential of over-interpreting the extensive data produced by such a powerful approach.  Based on the amount of resources and effort put into a complex screen, it can be difficult to maintain the reserve required to coldly and rigorously analyze the experimental design and results in a detached manner and properly assess which candidates really meet the criteria for follow up.  As a relatively new screening technology without much in the way of standards and defined good practices, it is easy to prematurely “fall in love” with potentially interesting targets that may be just noise in the data.  From his analysis, Dr. Djaballah suggests paying particular attention to the following three aspects:

    1. Do the infections at the appropriate MOI and with sufficient cells to assess the effect. Although Dr. Djaballah was primarily talking about arrayed screens (with single shRNA plasmids in wells), the point is also very valid for pooled screening.  It is critical to ensure there are enough cells containing each shRNA to be assayed to generate reliable, reproducible results.
    2. Use correct passage times. Whether the screen requires looking at knockdowns or survival, it is important that the cells are maintained for a long enough passage number to produce a significant differential between affected and non-affected cells.  Conversely, too many passages will introduce too much noise.
    3. Pay attention to general data and overall results. It is important to see if the overall screening results make sense.  For example, are known lethal genes showing up in the hits?
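
The sanity check in point 3 can be sketched as a simple overlap between the hit list and a set of positive controls. The gene symbols, the hit list, and the function name below are illustrative assumptions, not data from any of the screens discussed.

```python
def control_recovery(hits, positive_controls):
    """Return (fraction of positive controls found among hits, controls missed)."""
    controls = set(positive_controls)
    if not controls:
        raise ValueError("at least one positive control is required")
    recovered = controls & set(hits)
    return len(recovered) / len(controls), sorted(controls - recovered)

# Illustrative control set and hit list (assumed for this example only).
known_essential = ["PLK1", "KIF11", "RPL11", "PSMA1"]
screen_hits = ["PLK1", "RPL11", "GENE_X", "PSMA1"]

fraction, missed = control_recovery(screen_hits, known_essential)
print(f"{fraction:.0%} of controls recovered; missed: {missed}")
```

A low recovery rate for genes expected to be lethal would be one signal that the overall results deserve a closer look before individual hits are pursued.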

    While pointing out the importance of these considerations in evaluating a screen, Dr. Djaballah contended overall that there are currently few standards of practice to provide guidelines around these and similar procedural details.  We at Cellecta would agree, having focused most of our effort in the last several years on optimizing the many subtleties of pooled shRNA screening to enable consistent, robust, and interpretable results.


    shRNA Design Results – How good is your algorithm?

    August 3rd, 2011

     

    The last blog entry discussed how hairpin structural features affect representation and effectiveness of shRNA sequences in pooled libraries. However, it is clear that the nucleotide sequence is possibly the major factor that determines the efficiency at which a particular siRNA or shRNA knocks down a target.
    Despite a large amount of research over the last several years on various features that affect knockdown potential, a priori prediction of most effective siRNA/shRNA sequences continues to be very much “hit and miss.” As anyone familiar with RNAi is aware, there are numerous design algorithms available online from a variety of commercial and academic sources. For example, several publicly available prediction algorithms are listed and described at Protocol Online and on the Charité-Universitätsmedizin Berlin website.

    Cellecta, of course, has also focused a large amount of effort on divining the critical parameters to effectively predict functional shRNA. With pooled screening, even small improvements in the proportion of effective sequences produce significant differences in screening results. We have been using an approach that assesses a number of parameters related to GC composition and distribution. This approach has been quite effective, as the data from a recent set of designed shRNA shows (see below). In this case, we designed 5 shRNA to each of 29 gene targets. For all but one of the targets, at least one shRNA construct generated greater than 70% knockdown of the targeted transcript (measured by qRT-PCR) and, for 23 of the targets, multiple constructs generated >70% knockdown.
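
For readers who want to tally their own validation data the same way, here is a minimal sketch that counts targets with at least one, and with multiple, shRNA above the 70% knockdown cutoff. The gene names and knockdown percentages are hypothetical placeholders, not the data from the 29-target set described above.

```python
def summarize_knockdown(knockdown_by_target, threshold=70.0):
    """Summarize qRT-PCR knockdown percentages per target gene.

    Returns (targets with at least one effective shRNA,
             targets with two or more effective shRNA,
             overall fraction of effective shRNA).
    """
    at_least_one = 0
    multiple = 0
    effective_total = 0
    shrna_total = 0
    for values in knockdown_by_target.values():
        # An shRNA counts as effective if knockdown exceeds the threshold.
        effective = sum(1 for v in values if v > threshold)
        effective_total += effective
        shrna_total += len(values)
        if effective >= 1:
            at_least_one += 1
        if effective >= 2:
            multiple += 1
    return at_least_one, multiple, effective_total / shrna_total

# Hypothetical percent-knockdown values, five shRNA per target.
example = {
    "GENE_A": [85.0, 72.5, 40.0, 91.0, 66.0],
    "GENE_B": [55.0, 78.0, 30.0, 62.0, 48.0],
    "GENE_C": [20.0, 35.0, 65.0, 50.0, 10.0],
}
one, many, fraction = summarize_knockdown(example)
print(one, many, round(fraction, 2))
```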

    result of new shRNA design on knockdown efficiency

    The results of the project shown here were produced using a relatively straightforward approach. Recently, as part of an NIH grant, we have coupled knockdown data and results from phenotypic screens with a self-learning algorithm to produce a more sophisticated analysis that assesses over a hundred characteristics in a tree-like process to identify “good” shRNA. This sort of analysis has increased the fraction of effective shRNA (shRNA that knock down the target by >70%) by about 10 percentage points, from about 65-70% to close to 80%. Access to this design algorithm will soon be available on the Decipher Project website. In the end, the improvement seems to be a result of eliminating the worst performers, rather than improving the prediction of the best shRNA. This indicates that improvements in effective prediction may not lie with identifying which parameters enhance or promote effective RNA inhibition, but rather in identifying which parameters can potentially disrupt inhibition.


    The Importance of shRNA Structural Design for Pooled Libraries

    June 23rd, 2011

     
    We have done a significant amount of evaluation to identify features of the optimal shRNA structure necessary to ensure our complex libraries maintain and express representative and effective shRNA. Independent of particular targeting sequences, structural variations such as defined mismatches, loop sizes, and stem lengths have a significant effect on both the cloning and maintenance of the hairpin-coding insert in the heterogeneous library, as well as on the subsequent ability of the insert to integrate and stably express in the host cell. To provide comprehensive screening, it is critical to optimize the structure so that the library can maintain a representative copy number of all shRNA sequences with minimal bias through cloning, amplification, packaging, transduction, and selection.

    For the initial structural analysis, we made use of an shRNA-testing reporter construct using a destabilized green fluorescent protein (GFP). A description of this reporter system can be found in the Technology section of our website and more details will be available for publication shortly. I would just point out that, by combining this reporter construct with our library screening technology, we were able to analyze the effects of 40 different design variations of 150 different shRNA targeting sequences. Subsequent analysis by qRT-PCR of the best 10 designs showed significant variation as you can see from the data below that shows three examples of 10 variations of shRNA targeting the same region of the p53 gene. In the end, the best structure was a 25-base stem containing a few precise mismatches and a 7-nt loop.
     

    p53 knockdown percentage for various shRNA structures

     
    One question we often get asked when presenting this optimization strategy is how the miRNA structure compares. In fact, we did not include miRNA variations in this study because we previously found, anecdotally, that they did not perform as well as shRNA. Cellular processing of miRNA to form the RNA-induced silencing complex requires digestion with both the Drosha and Dicer enzymes. Intuitively, the addition of the extra Drosha processing might be expected to make the effective concentration of the active siRNA form somewhat lower in the cells and also make it less universally effective across different cell lines, since it depends on the effective activity of two enzymes instead of just one. We did find a lot of variation with the miRNA forms that we tested. Published results from a CGAP-sponsored inhibition study of the shRNAmir miRNA variation also found similar results in terms of effectiveness and variation. The study contained a lot of interesting data presented in many ways, but the finding that particularly stood out was that, across more than 100 genes, only about 37% of expressed shRNAmir produced >75% knockdown in OVCAR-8 cells and just 12% provided >75% knockdown in MCF7 cells (see CGAP Study Figure 1). Hairpin shRNA structures routinely generated much better knockdown rates.
     


    RNAi Screening with An Inducible Promoter: Is There an Advantage?

    May 12th, 2011

     
    Inducible expression is often desirable for functional genetic testing to establish clear cause and effect for a specific phenotype. A particular phenotype can be demonstrably linked to expression of a specific cDNA by showing it disappears when transcription is suppressed. Although less direct, similar logic applies to shRNA expression, where a phenotype appears when an shRNA is expressed and disappears when the same shRNA is repressed. To facilitate these types of analyses, we developed inducible versions of both the H1 and U6 RNA polymerase III promoters using the tetracycline repressor element. These promoter constructs were optimized so that the addition of tetracycline (actually, a tetracycline-analog doxycycline) induces expression of the shRNA by inhibiting the tetracycline-element-specific repressor (TetR) from binding and blocking transcription. An example of induced repression of GFP can be seen in the figure below.

    Fluorescent cell images indicating inducible shRNA inhibiting GFP expression

    We routinely use tetracycline-inducible shRNA promoters to validate the effectiveness of individual shRNA sequences identified in our screenings. In particular, with potentially lethal shRNA that target essential genes for cell viability, it is almost a requirement to prevent expression of the shRNA until the cells are established so that the phenotype of cell arrest, necrosis, or apoptosis can be clearly observed and specifically linked to expression of the shRNA. Since the purpose of a majority of our screens is to identify essential genes required for cell proliferation, these inducible shRNA constructs are essential for validating the identified “hits.”

    For general functional shRNA screening, however, inducibility is not always desirable. Although it is often assumed by research groups with whom we interact that a library with an inducible shRNA promoter would produce more reproducible and quantitative hits from a genetic screen, our experience indicates this is not necessarily the case. For example, with viability screens, where we are simply looking for which shRNA sequences inhibit cell proliferation (i.e., shRNAs that are depleted in the overall cell population after several divisions), the need to induce the expression of the library by adding doxycycline to the cells complicates the screening procedure and can introduce some unnecessary variation in the system. Of course, with some screens, it may be preferable or even necessary to use an inducible library. For example, screens to identify genes that repress a particular reporter may be easier to carry out when shRNA expression in the library is repressed until sometime after infection and selection of a baseline population. Also, for in vivo screening, using an inducible library may be almost essential so that significant library shRNA expression and selection do not occur until the cells are established in the mouse model. However, there is no clear benefit to including a defined induction step in every screen.

    As with most experimental options, choosing between inducible vs. constitutive shRNA expression for a library requires careful consideration of the experimental setup. While the disadvantages are not always so obvious, there are often unforeseen drawbacks in adding seemingly small variables into what is already a technically challenging assay.


    Ensuring Comprehensive Screening with Pooled shRNA Expression Libraries

    April 3rd, 2011

     
    Researchers are often interested in using a pooled shRNA library for genome-wide RNAi screening to cast a very “wide and unbiased net to identify any and all genes functionally involved in some pathway”. Although it is not difficult to make an shRNA library targeting all human or mouse genes, it is practically very difficult to comprehensively screen using such a library. Careful consideration of starting cell numbers and handling of cells during propagation is essential to ensure thorough screening of pooled shRNA expression libraries, minimize false negatives, and obtain consistent and reproducible results.

    First, there is an issue of library complexity since it is necessary to have several shRNAs designed to target each gene. The effectiveness of “validated” shRNA varies from cell-to-cell, and no effective shRNA has been identified for many genes. For these reasons, it is necessary to incorporate several shRNAs for each gene to ensure reasonable knockdown of a high percentage of targets. Cellecta typically designs 5-6 shRNA against each target gene, so more than 25,000 shRNA are required to target 5,000 genes. A library targeting the entire human genome, estimated at just over 23,000 genes, requires approximately 115,000 individual shRNA constructs. While it is not particularly difficult to construct libraries of this complexity, this number of unique shRNA sequences creates technical challenges with representative screening.

    Pooled shRNA library screens require quantification of changes in the fraction of each shRNA sequence in selected vs. control cells or starting library. A “hit” occurs when selected cells have significantly more or less of a particular shRNA sequence. Whether one is looking at enrichment of specific shRNA in the selected cells vs. the control (positive selection) or depletion of shRNA in selected cells vs. the control (negative selection), it is critical that the screen begin with sufficient numbers of each shRNA to ensure measured changes in the fraction of an shRNA sequence are statistically significant. This means that, if there are very low numbers of specific shRNAs at the start of the screen, small random changes in a drifting population may be difficult to differentiate from significant trends. Simply put, a loss of 2 shRNA is a 20% change if there are only 10 initially vs. 2% if there are 100. For this reason, at least a few hundred cells need to be infected with each shRNA to initiate a good screen. This is demonstrated in the data below, where starting with a smaller population of just 50 cells per shRNA (third bar) leads to more variance than starting with a population of 200 cells per shRNA (first bar). This means that starting a screen involves infecting at least 100 times more cells than the complexity of the library. For a library with 25,000 shRNAs, the starting population should be at least 2.5 million infected cells, and for a library with 115,000 shRNAs, the starting population should be over 11 million infected cells.
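
The arithmetic in this paragraph is easy to reproduce. The short sketch below recomputes the starting population for a given library size and the percent change from losing a few cells; the function names are made up for this example. The last helper additionally assumes that the number of cells carrying each shRNA follows Poisson counting statistics, which is a simplification introduced here rather than a claim from the screening data.

```python
import math

def infected_cells_needed(library_size, cells_per_shrna=100):
    """Infected cells required to reach the chosen representation per shRNA."""
    return library_size * cells_per_shrna

def percent_change(initial_count, lost):
    """Percent change caused by losing `lost` cells out of `initial_count`."""
    return 100.0 * lost / initial_count

def sampling_cv(cells_per_shrna):
    """Approximate coefficient of variation from counting noise alone,
    ASSUMING Poisson-distributed cell counts per shRNA (an assumption)."""
    return 1.0 / math.sqrt(cells_per_shrna)

print(infected_cells_needed(25_000))    # library of 25,000 shRNAs
print(infected_cells_needed(115_000))   # genome-wide library of 115,000 shRNAs
print(percent_change(10, 2), percent_change(100, 2))
print(sampling_cv(50) / sampling_cv(200))
```

Under the Poisson assumption, dropping from 200 to 50 cells per shRNA doubles the expected counting noise, which is in line with the direction of the effect seen in the figure.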
     

    Graph of Reproducibility in Triplicates for RNAi Library Viability Screen

     
    To screen a heterogeneous mixture of shRNA expression constructs, however, it is important to have 2-3 times more cells than viral particles to help ensure that most cells are only infected with one shRNA-carrying virus (i.e., a multiplicity of infection [MOI] of 0.3-0.5). In other words, you need 2-3 times more cells than the number targeted for infection. Thus, 6-8 million cells are needed to start a screen with libraries of 25,000 shRNAs, and a whole genome library of 115,000 shRNAs would require 25-35 million cells. Since each screen should be done in duplicate or, better, triplicate, the number of cells needed makes a full genome screen with a redundant shRNA library impractical.
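
One common way to reason about MOI is to assume that the number of viral integrations per cell follows a Poisson distribution with mean equal to the MOI. The sketch below uses that assumption, which is not stated in the post itself, to estimate the fraction of cells infected, the share of infected cells carrying exactly one construct, and the number of cells to plate. The function names are hypothetical.

```python
import math

def fraction_infected(moi):
    """Fraction of cells with at least one integration (Poisson assumption)."""
    return 1.0 - math.exp(-moi)

def fraction_single_among_infected(moi):
    """Among infected cells, the fraction carrying exactly one integration."""
    return moi * math.exp(-moi) / fraction_infected(moi)

def cells_to_plate(infected_cells_needed, moi):
    """Cells to start with so the expected number infected meets the target."""
    return math.ceil(infected_cells_needed / fraction_infected(moi))

for moi in (0.3, 0.5):
    print(moi,
          round(fraction_infected(moi), 3),
          round(fraction_single_among_infected(moi), 3),
          cells_to_plate(2_500_000, moi))
```

Under this model, most infected cells carry a single construct across the 0.3-0.5 MOI range, and the estimated plating numbers come out somewhat above the 2-3x rule of thumb because some particles land in cells that are already infected. Actual transduction efficiency depends on the cell line and should be titrated empirically.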

    Finally, to ensure a comprehensive screen, it is not sufficient simply to start with the right number of cells. During the screening process, incorrectly propagating the cells can completely undercut the representation set up at the initiation of the screen. This is especially true for a negative selection screen, such as a viability screen where one is interested in identifying shRNA that kill or inhibit proliferation of cells and, therefore, drop out of the population. It is critical to maintain the full library representation that was initially used at the start of the screen. If a portion of the cells is removed during propagation (e.g., cells are split), the representation of the library can be skewed in the sample, which introduces significant noise. This effect is readily seen in the first two bars of the figure, where the benefit of starting with sufficient cells (200 cells per shRNA) is completely undercut by splitting cells during propagation so that the final count of cells after 10 days is the same as the initial number of cells. The correlation between triplicates falls dramatically when the cells are split.

    Library representation is often overlooked, especially when the desire is for large-scale unbiased screens. However, without careful consideration in designing screening procedures that reflect the complexity of the library, these large-scale screens can produce relatively meaningless data with anecdotal results at best. So, what about genome-wide screening? Our approach with the DECIPHER Project is to provide modules, each targeting approximately 5,000 genes with 27,500 shRNA, which enable comprehensive screening. We are more than halfway finished building a series of 5 modules that will target all human genes, and 4 modules targeting all mouse genes. For more information on these libraries, visit the DECIPHER Project website.
