CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) are DNA loci containing short repeated base sequences. They are found in prokaryotes, where they function as a primitive immune system that cleaves foreign DNA (for example, from invading viruses). Several CRISPR variants have evolved. Generally, bacterial CRISPR systems transcribe two RNA molecules: one that targets the DNA of an invading virus (the crRNA), and a second (the tracrRNA) that hybridizes with the crRNA and recruits the CRISPR nuclease complex to cleave the invading viral DNA.
In 2012, Jinek et al. (Science) used the S. pyogenes CRISPR system to create a single-guide RNA (sgRNA) that could be engineered to target and digest any DNA sequence in conjunction with the S. pyogenes Cas9 protein (spCas9). The sgRNA was developed by fusing the two RNA components of the native S. pyogenes CRISPR system into a single molecule. The 5’-domain of the sgRNA corresponds to the crRNA in the native bacterial system and contains the 20-base variable region complementary to the DNA target. The rest of the sgRNA is derived from the tracrRNA sequence. Thus, the sgRNA contains an initial variable region of about 20 bases followed by a constant sequence of about 80 nucleotides that contains all the key interactions with the Cas9 nuclease.
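The two-part layout described above can be sketched as a small data structure. This is only an illustration: the class name `SgRNA`, its field names, and both sequences are hypothetical placeholders, and the 80-nucleotide constant region shown is a stand-in, not the real tracrRNA-derived sequence.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGT")


@dataclass(frozen=True)
class SgRNA:
    """Hypothetical container for the two sgRNA domains (DNA-alphabet notation)."""
    target: str    # ~20-base variable region complementary to the DNA target
    constant: str  # tracrRNA-derived region that interacts with Cas9

    def __post_init__(self):
        # Reject empty strings and anything outside the A/C/G/T alphabet.
        for name, seq in (("target", self.target), ("constant", self.constant)):
            if not seq or set(seq.upper()) - VALID_BASES:
                raise ValueError(f"{name} must be a non-empty A/C/G/T string")

    @property
    def sequence(self) -> str:
        # The variable region sits at the 5' end, followed by the constant region.
        return self.target.upper() + self.constant.upper()


# Placeholder sequences only; NOT a real guide or a real constant region.
PLACEHOLDER_CONSTANT = "ACGT" * 20  # 80-nt stand-in
guide = SgRNA(target="GATTACAGATTACAGATTAC", constant=PLACEHOLDER_CONSTANT)
print(len(guide.target), len(guide.sequence))
```

Keeping the two domains as separate fields mirrors the point made in the text: the targeting region changes from guide to guide, while the constant region is shared and can be redesigned independently.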
This sgRNA/spCas9 system has become the CRISPR system of choice for gene editing in mammalian systems. The sgRNA paired with the Cas9 nuclease effectively cleaves genomic DNA in a highly site-specific manner. When the sgRNA/Cas9 components are targeted to a gene-coding region in a cell, repeated cleavage of the target site eventually leads to a faulty repair that produces an insertion or deletion (i.e., an “indel”), which typically knocks out gene expression.
For permanent expression, sgRNAs and Cas9 can be cloned into lentiviral vectors, packaged into viral particles, and transduced into target cells. Since both the sgRNA and Cas9 are stably integrated into the host cell genome, they are passed along to progeny cells. In 2014, two groups (Wang, et al., Science; Shalem, et al., Science) showed that a library of sgRNA-expressing lentiviral constructs targeting protein-coding genes could be used to identify genes essential for viability in mammalian cells. Large cell populations transduced with genome-wide sgRNA libraries were grown for a few weeks, then the frequency of each sgRNA in the population was assayed to determine whether any were underrepresented, indicating that they were toxic to cells because they knocked out a gene target essential for proliferation. These studies demonstrate the efficiency of the CRISPR system in knocking out gene targets. Without efficient disruption of targeted genes in a majority of the cells that take up an sgRNA, these sorts of genome-wide pooled sgRNA functional genetic screens would not work.
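The frequency comparison at the heart of such a screen can be sketched in a few lines. The example below is a minimal illustration, not the analysis used in the cited studies: the function name `log2_fold_changes`, the pseudocount of 1, the cutoff of -1.0, and all read counts are assumptions made up for the example.

```python
import math


def log2_fold_changes(start_counts, end_counts, pseudocount=1.0):
    """Return {sgRNA: log2(end fraction / start fraction)} for each sgRNA.

    Counts are converted to fractions of the total so that differences in
    sequencing depth between the two samples cancel out. The pseudocount
    (an assumed value) avoids taking the log of zero.
    """
    n = len(start_counts)
    start_total = sum(start_counts.values()) + pseudocount * n
    end_total = sum(end_counts.values()) + pseudocount * n
    result = {}
    for guide, start in start_counts.items():
        end = end_counts.get(guide, 0)
        start_frac = (start + pseudocount) / start_total
        end_frac = (end + pseudocount) / end_total
        result[guide] = math.log2(end_frac / start_frac)
    return result


# Made-up read counts for illustration only.
start = {"sgA": 500, "sgB": 500, "sgC": 500}
end = {"sgA": 520, "sgB": 40, "sgC": 480}

lfc = log2_fold_changes(start, end)
# Arbitrary illustrative cutoff: flag guides that dropped more than two-fold.
depleted = [guide for guide, value in lfc.items() if value < -1.0]
print(depleted)
```

In this toy data, only `sgB` falls well below its starting frequency, which is the pattern expected for an sgRNA that knocks out a gene required for proliferation.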
In our work, we have found that sgRNA knockout efficiency is directly dependent on Cas9 expression levels, and that transient expression of Cas9 is not reliable for genetic screens with pooled sgRNA libraries. For CRISPR screening purposes, it is important to use cells that express Cas9 at a high level. Cas9-expressing cells should be generated first by transducing with a lentivirus expressing the Cas9 nuclease and then selecting to ensure the highest level of Cas9 expression that the cells will tolerate. Once Cas9 expression is confirmed, the Cas9 cells can be transduced with a lentivirus expressing a pooled sgRNA library. Cas9 expression levels vary from cell type to cell type, and depend on the promoter used to drive Cas9 expression and the number of Cas9 copies per cell. Optimizing Cas9 expression in the cells selected for a CRISPR genetic screen is therefore a critical first step.
In trying to optimize the CRISPR system, many research groups have studied the design of the initial variable region that defines the sequence the sgRNA targets, and have identified ways to optimize this sequence so that the knockout is effective while off-target disruptions are minimized (for example, Doench J. G., et al., Nat Biotechnol. 2014). The 20-base targeting sequence, however, makes up only about one-fifth of the sgRNA sequence. The rest of the molecule primarily interacts with the Cas9 nuclease to catalyze DNA target cleavage. Improvements to this constant 3’-region of the sgRNA molecule would be useful to increase the efficacy of any sgRNA. One study (Chen, et al., Cell. 2013) showed that some modifications to the constant region do improve binding of a deactivated Cas9 (dCas9)-EGFP fusion protein.
Following up on the studies above, we set out to assess whether the sgRNA modifications that increased binding of the dCas9-EGFP fusion actually increase the rate or efficiency of knockout in the standard CRISPR system with an active Cas9 nuclease and, further, what effect this would have on the results of CRISPR-based pooled genetic screens. We looked at two modifications of the constant 3’-region of the sgRNA. One modification swapped the places of an adenine (A) and thymine (T) in the initial stem sequence just after the sgRNA targeting domain. This stem contains a row of four thymines that can act as a transcription terminator site for the U6 promoter, which is often used to drive expression of the sgRNA sequence in lentiviral constructs, and the swap interrupts that row. The other alteration adds a 5-bp extension (“HE”) to the same stem, which should make it more stable and accessible to the Cas9 protein.
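The concern about a row of four thymines can be checked mechanically for any construct. The sketch below scans a DNA string for runs of consecutive T's; the function name `find_t_runs` and the two short toy sequences are assumptions for illustration and are not the actual sgRNA constant region.

```python
import re


def find_t_runs(dna, min_run=4):
    """Return (start_index, length) for each run of at least min_run T's."""
    pattern = re.compile("T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]


# Toy stem fragments for illustration; NOT the real constant-region sequence.
toy_wt = "GTTTTAGAGC"  # contains a row of four T's
toy_at = "GTTTAAGAGC"  # one T in the row replaced by A, as a base swap would do

print(find_t_runs(toy_wt))  # [(1, 4)]
print(find_t_runs(toy_at))  # []
```

A single base change inside the row is enough to remove the four-T motif, which is why such a small modification can affect how much full-length sgRNA is transcribed.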
Initial experiments with a few sgRNA sequences targeting a GFP gene indicated that these two modifications did indeed increase the rate of target knockout (see figure below). Both the AT alteration and the HE insertion significantly improved the knockout rate of GFP relative to the “wild-type” (wt) sgRNA sequence. While the effects varied somewhat for each sequence, the overall positive impact of these changes on several targets in the GFP sequence was clear.
To look at whether the same changes to the constant 3’-region of the sgRNA in a pooled library might increase the knockout rate and improve the overall results of genetic loss-of-function screens, we built four libraries, each with a different design: one wt sgRNA library, one with the A-T inversion, one with the HE insertion, and a fourth library with both the HE and AT modifications (HEAT). With each library, we ran a “dropout viability screen” to identify genes required for cell viability. In this type of genetic screen, cells harboring sgRNAs that knock out essential genes do not proliferate, so these sgRNAs are underrepresented after several cell doublings. Analysis showed a consistent increase in the magnitude of the dropout levels of sgRNAs containing both the AT and HE modifications.
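One simple way to put a number on "magnitude of dropout" when comparing library designs is the gap between control sgRNAs and sgRNAs targeting known essential genes. The sketch below is one possible summary, not the analysis behind the results described here: the function name `dropout_separation`, the use of medians, the design labels, and every value in the toy data are invented for illustration.

```python
from statistics import median


def dropout_separation(lfc_by_guide, essential_guides):
    """Median log2 fold change of other guides minus that of essential-gene guides.

    A larger value means essential-gene guides dropped out more strongly
    relative to the rest of the library.
    """
    essential = [v for g, v in lfc_by_guide.items() if g in essential_guides]
    other = [v for g, v in lfc_by_guide.items() if g not in essential_guides]
    if not essential or not other:
        raise ValueError("need at least one guide in each group")
    return median(other) - median(essential)


# Invented log2 fold changes, purely to exercise the function; not measured data.
essential = {"ess1", "ess2", "ess3"}
toy_libraries = {
    "design_x": {"ess1": -1.0, "ess2": -1.5, "ess3": -0.5,
                 "ctl1": 0.0, "ctl2": 0.2, "ctl3": -0.2},
    "design_y": {"ess1": -2.0, "ess2": -3.0, "ess3": -2.5,
                 "ctl1": 0.1, "ctl2": 0.0, "ctl3": -0.1},
}

for name, lfc in toy_libraries.items():
    print(name, dropout_separation(lfc, essential))
```

Medians are used here because they are less sensitive to a few outlier guides than means; a real comparison would also need replicate screens and a curated set of essential and non-essential genes.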
The findings show that it is possible to substantially improve the quality of sgRNA libraries with just a few changes in the constant 3’-region of the sgRNA, where the Cas9 protein binds. Constructs utilizing the modified sgRNA structure knocked out target genes more quickly and effectively than those with the standard sgRNA design. Further, libraries expressing sgRNAs with the HE and AT modifications generated stronger and more robust results than those with the standard structure, which decreases the chance of missing essential genes during screening. As a result of this work, Cellecta uses the HEAT sgRNA design as the default in all its libraries of individual CRISPR constructs.