How to Make a Crappy CRISPR Library

In addition to the pooled CRISPR libraries we offer, researchers can choose from a few other libraries for gene knockout screens, such as the Broad Institute’s GeCKO and Brunello libraries available through Addgene. To conserve limited stocks, many labs distribute only a small fraction of the amplified library and expect the receiving lab to re-amplify it in bacteria to make its own stock. For this process to produce a usable CRISPR library, three factors are key:

  1. It is essential to start with a complete and representative aliquot of the library for re-amplification.
  2. Enough E. coli must be transformed that every plasmid in the aliquot is taken up by cells and amplified, so that the re-amplified library faithfully represents the original.
  3. The transformed bacteria must be cultured in a way that prevents competition, so that each transformed cell generates a large population of progeny carrying its construct.
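To put factor 2 in numbers, here is a minimal sketch. It assumes each transformant takes up one plasmid drawn uniformly at random from the aliquot, so the number of colonies per sgRNA is Poisson-distributed; the function names and the 120,000-sgRNA library size are hypothetical and chosen only for illustration. Because a real aliquot is already uneven, actual libraries need more coverage than this idealized model suggests.

```python
import math

# Sketch only: assumes each transformant carries one plasmid drawn uniformly
# at random from the aliquot, so colonies per sgRNA follow a Poisson
# distribution. Real libraries are uneven and need more coverage than this.

def expected_fraction_missing(n_transformants: int, library_size: int) -> float:
    """Expected fraction of sgRNAs with zero transformants (Poisson model)."""
    coverage = n_transformants / library_size  # mean colonies per sgRNA
    return math.exp(-coverage)                 # P(zero colonies) = e^(-coverage)


def transformants_needed(library_size: int, max_fraction_missing: float) -> int:
    """Smallest transformant count whose expected missing fraction is at most
    max_fraction_missing under the same Poisson model."""
    coverage = -math.log(max_fraction_missing)
    return math.ceil(coverage * library_size)


if __name__ == "__main__":
    size = 120_000  # hypothetical library size, for illustration only
    for fold in (1, 5, 10, 50):
        missing = expected_fraction_missing(fold * size, size) * size
        print(f"{fold:>3}x coverage: ~{missing:,.1f} sgRNAs expected missing")
    print(transformants_needed(size, 1e-4), "transformants for <=0.01% missing")
```

Under this model, transforming only as many cells as there are sgRNAs (1x coverage) leaves about 37% of the library (e^-1) with no colonies at all, which is why coverage has to be many-fold.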

Theoretically, a re-amplified library should be a reasonably accurate reproduction of the original if the above requirements are met. In practice, however, re-amplifying large, complex libraries produces libraries with poorer distribution than the original primary amplification, and a significant number of sequences are often lost in the process.

We recently saw this re-amplification problem with GeCKO libraries from two different labs that used our Sequencing Service. The GeCKO library is distributed in 1 µg aliquots (20 µl at 50 ng/µl) that must be re-amplified to generate a library suitable for lentiviral packaging and screening. In both cases, the customers provided us with genomic DNA samples from their CRISPR screens, along with aliquots of the re-amplified plasmid library they had used to produce the lentiviral particles for transducing the cells in their screens. These plasmid library samples provide a baseline measurement of the representation of each sgRNA in the library before transduction. One lab had re-amplified the GeCKO library themselves from the original 1 µg aliquot they purchased. The second lab purchased the library pre-packaged as lentiviral particles from the distributor and received from that supplier an aliquot of the re-amplified plasmid library used for packaging.

After sequencing both orders, it became clear that the sgRNA distributions in the re-amplified libraries used for the screens were substandard (see Figure 1 below). Both libraries were missing ca. 1,200 sgRNAs from the original library. Because the missing sgRNA sequences are the same in both libraries, these sgRNAs were most likely absent from the starting 1 µg aliquot of the GeCKO library rather than lost during re-amplification. However, another 800 sgRNAs in one library, and 3,000 in the other, were very poorly represented, with fewer than 100 counts each in the normalized results. These differences were clearly introduced during re-amplification, and they indicate that one library was re-amplified somewhat more successfully than the other.
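Counts like these come from normalizing each sample's reads to a fixed total and then applying thresholds. The sketch below assumes per-sgRNA read counts are already available as a dictionary; the function names are hypothetical, the 40M-read total matches Figure 1, and the default thresholds (10 or fewer counts for "missing", under 100 for "low") mirror the numbers discussed in this post rather than any published QC standard.

```python
# Sketch only: normalize per-sgRNA read counts to a fixed total and flag
# missing and poorly represented guides. Thresholds mirror the numbers in
# the text; they are illustrative, not a published QC standard.

def normalize_counts(raw_counts: dict[str, int], library: list[str],
                     target_total: float = 40_000_000) -> dict[str, float]:
    """Scale counts so the library's reads sum to target_total.

    Every sgRNA in `library` gets an entry, so guides with no reads are kept
    as 0.0 instead of silently disappearing; reads for IDs that are not in
    `library` are ignored.
    """
    counts = {guide: raw_counts.get(guide, 0) for guide in library}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads mapped to the library")
    scale = target_total / total
    return {guide: n * scale for guide, n in counts.items()}


def classify_guides(norm_counts: dict[str, float], missing_max: float = 10,
                    low_max: float = 100) -> tuple[list[str], list[str]]:
    """Split guides into 'missing' (<= missing_max) and 'low'
    (above missing_max but below low_max)."""
    missing = [g for g, c in norm_counts.items() if c <= missing_max]
    low = [g for g, c in norm_counts.items() if missing_max < c < low_max]
    return missing, low
```

Intersecting the `missing` lists from two independently re-amplified libraries is one way to apply the reasoning above: guides missing from both most likely trace back to the shared starting aliquot, while guides missing from only one were probably lost during re-amplification.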

The read count distributions for the bulk of the sgRNA constructs in both re-amplified GeCKO libraries, however, were markedly broader than in the original published GeCKO library. Further, the distribution in the sample from the lab that purchased the pre-packaged GeCKO library (the one with 3,000 poorly represented sgRNAs) was so skewed that it was not usable as a baseline for analyzing the screened samples, so the screen was compromised.
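One common way to summarize how broad a library's distribution is uses the ratio of the 90th to the 10th percentile of per-sgRNA read counts: 1.0 means perfectly even representation, and larger values mean a broader, more skewed library. The sketch below is an illustration under that assumption (the function name is hypothetical, and this is not necessarily the metric Cellecta uses for QC); it uses only Python's standard `statistics` module.

```python
import math
import statistics

# Sketch only: a 90th/10th-percentile "skew ratio" as one way to summarize
# distribution breadth. Not necessarily the metric used in Cellecta's QC.

def skew_ratio(counts) -> float:
    """Ratio of the 90th to the 10th percentile of per-sgRNA read counts.

    Returns math.inf when the 10th percentile is zero, i.e. when at least
    roughly a tenth of the guides have no reads; report dropouts separately.
    """
    deciles = statistics.quantiles(counts, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    if p10 == 0:
        return math.inf
    return p90 / p10
```

Computing this ratio on the baseline plasmid sample before packaging, and comparing it with the value for the original library, gives a quick single-number check on whether re-amplification has broadened the distribution.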

Cellecta does not recommend ever re-amplifying our pooled libraries, especially the larger ones. It is not a simple procedure and can compromise a screen from the start. Considering the time and effort required to run a pooled screen with a large, complex library, using a re-amplified, poorly characterized library is a significant risk. We never re-amplify any of our custom or pre-made libraries. All CRISPR and shRNA libraries we sell have been amplified only once, just after ligation, and we provide enough material that our customers do not have to re-amplify. The several hundred micrograms of plasmid library we provide from the primary amplification let researchers go directly to the packaging step.

In summary, here are some tips to keep in mind to prevent ending up with a crappy CRISPR library:


  1. Avoid re-amplification altogether by starting with enough library material that you never need to do it.
  2. If you do have to re-amplify, check the sgRNA distribution to confirm that the library still provides a baseline good enough to yield quality data in a knockout screen.
  3. Enlist the help of a partner with expertise in CRISPR libraries to reduce your risk of wasting valuable time and samples.

Figure 1. Distribution of re-amplified GeCKO libraries from two different laboratories.

In Figure 1 above, the number of reads, normalized to 40M total for each run, is shown on the x-axis and the number of sgRNAs on the y-axis. Each bar shows how many distinct sgRNAs have a given read count.
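A plot like Figure 1 is a histogram of normalized per-sgRNA counts. The sketch below shows the binning step; the function name and bin width are hypothetical choices for illustration, and the bins actually used for Figure 1 are not specified here.

```python
from collections import Counter

# Sketch only: bin normalized per-sgRNA counts into fixed-width read-count
# bins, giving the bar heights for a histogram like Figure 1.

def count_histogram(norm_counts, bin_width: float = 10) -> dict[float, int]:
    """Number of sgRNAs per read-count bin.

    The bin with key k covers read counts in [k, k + bin_width).
    """
    bins = Counter(int(c // bin_width) for c in norm_counts)
    return {k * bin_width: bins[k] for k in sorted(bins)}


if __name__ == "__main__":
    toy = [0, 3, 120, 150, 180, 950, 2100, 2300]  # hypothetical toy counts
    for start, n in count_histogram(toy, bin_width=500).items():
        print(f"{start:>6}: {'#' * n}")  # crude text rendering of each bar
```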

You can see the 1,200 sgRNA sequences missing from both libraries in the bar at read counts of 10 and under. Although it is less obvious in the figure, the number of low-count sgRNAs (100 counts or fewer) is much higher in the upper sample (3,000 sgRNAs) than in the lower sample (800 sgRNAs). The overall distribution in the lower panel is clearly better than in the upper panel, where ca. 30,000 sgRNAs have fewer than 1,000 counts and more than 15,000 sgRNAs have ca. 2,000 or more. This bimodal distribution, with roughly a fifth of the library highly over-represented, can ruin a screen (as it did in this case) because the over-represented sequences dominate the data. Neither of these libraries would pass Cellecta's QC standards.
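Why a fifth of the library can dominate the data becomes clear when you compare the fraction of guides that are over-represented with the fraction of reads they consume. The sketch below uses a hypothetical helper and made-up toy numbers (not the data in Figure 1) to make that comparison.

```python
# Sketch only: compare the share of guides above a count threshold with the
# share of total reads those guides consume.

def read_share(norm_counts, min_count: float) -> tuple[float, float]:
    """Return (fraction of sgRNAs, fraction of total reads) for guides with
    at least min_count normalized reads."""
    counts = list(norm_counts)
    total_reads = sum(counts)
    high = [c for c in counts if c >= min_count]
    return len(high) / len(counts), sum(high) / total_reads


if __name__ == "__main__":
    # Hypothetical toy library: 80 guides at 200 reads, 20 guides at 2,000.
    toy = [200] * 80 + [2000] * 20
    guides, reads = read_share(toy, min_count=2000)
    print(f"{guides:.0%} of guides take {reads:.0%} of the reads")
```

In this toy case, a fifth of the guides take about 71% of the reads, so at a fixed sequencing depth the remaining guides are measured with far fewer reads each.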

Please email us with any comments.
