Developmental progression of DNA double-strand break repair deciphered by a single-allele resolution mutation … – Nature.com

ICP: an integrated pipeline for classifying CRISPR/Cas9 induced mutant alleles

We developed an integrated bioinformatic tool, ICP (Integrated Classifier Pipeline), to parse complex DSB repair outcomes induced by CRISPR/Cas9 and to automatically call experimental errors generated during NGS library preparation and sequencing. ICP comprises two modules: 1) a Nucleotide Position Classifier (NPClassifier), and 2) a Single Allele-resolution Classifier (SAClassifier). We employed these two complementary sequence analysis modules in tandem to enable in-depth interpretation of deep sequencing data at single allele resolution (Fig. 1a–c, see Methods section for detailed description of ICP tools). In line with the unique DNA signatures generated by distinct DSB repair pathways, we categorized the repair products into four major categories. Alleles with a deletion only on the PAM-distal side (the PAM-proximal side being protected by Cas9 protein after cleavage), a common category, were termed PEPPR class mutations (PAM-End Proximal Protected Repair, PEPPR)41,42. While single strand cleavage by the Cas9 RuvC domain can also nick the non-complementary strand at locations beyond the canonical site between the 6th and 7th nucleotide upstream of the PAM sequence, we restrict our analysis here to the majority of cases, in which Cas9 cleavage generates blunt DSB ends, to simplify the robust classification scheme developed in this study43,44,45. Mutant alleles judged to be generated by directly annealing 2 bp microhomology sequences spanning the gRNA cleavage site were assigned to the MMEJ class (again acknowledging that such alleles can also be generated with 1 bp microhomology sequences, which, however, are not readily amenable to the semi-automated analysis we developed)46,47,48, while pure deletion alleles not belonging to either the PEPPR or MMEJ categories were classified as DELET class mutations. Remaining alleles, comprising insertions only and indels (deletion plus insertion), were categorized as insertion class (INSRT) mutations (Fig. 1b).
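The four-way categorization described above can be illustrated with a minimal sketch. This is not the ICP code itself: the function names, the coordinate conventions (the PAM lies to the right of the cut, and a deletion removes `ref[del_start:del_end]`), and the exact test used to detect microhomology are assumptions made for illustration only.

```python
def microhomology(ref, start, end):
    """Count bases of microhomology flanking the deletion ref[start:end]."""
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right


def classify_allele(ref, cut, del_start, del_end, insertion=""):
    """Assign one allele to WT, INSRT, MMEJ, PEPPR or DELET.

    Assumed conventions (hypothetical, not taken from ICP):
    - `cut` is the index of the first base on the PAM-proximal side,
      so the PAM-distal side is everything left of `cut`.
    - the allele deletes ref[del_start:del_end] and may insert `insertion`.
    """
    if insertion:                      # insertion only, or deletion plus insertion
        return "INSRT"
    if del_start == del_end:           # nothing deleted, nothing inserted
        return "WT"
    touches_cut = del_start <= cut <= del_end
    if touches_cut and microhomology(ref, del_start, del_end) >= 2:
        return "MMEJ"                  # >= 2 bp microhomology around the cut site
    if del_end == cut:                 # deletion confined to the PAM-distal side
        return "PEPPR"
    return "DELET"                     # any other pure deletion


# Toy reference, invented for illustration; cut between index 9 and 10.
REF = "GATTACAGCTCTGAAGG"
print(classify_allele(REF, 10, 6, 10))
print(classify_allele(REF, 10, 9, 11))
```

MMEJ is tested before PEPPR here because the text states that the PEPPR dictionary excludes alleles belonging to the MMEJ category; the ordering in the actual pipeline may differ.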

The process of DSB repair pattern profiling consists of preparing an NGS library (a), classifying the resulting parsed alleles (b) and displaying processed alleles by rank order and class of mutations (c). a NGS library preparation: Genomic DNA from F1 test flies carrying both Cas9 and gRNA expressing cassettes, inherited either maternally (dark blue bars) or paternally (red bars, or progeny from other designated crosses), is subjected to targeted PCR amplification with primers containing Illumina-compatible adapters at the 5′ terminus to detect somatic indels. The gray rectangle represents a short region of genomic DNA containing a Cas9/gRNA target: the purple circle depicts Cas9 protein and the sky-blue line is the gRNA. b Classification: Raw NGS data are subjected to the NPClassifier to parse alleles into specific primary categories required for building allelic dictionaries used by the SAClassifier. Four major indel groups are categorized: PEPPR (PAM-End Proximal Protected Repair, sky-blue), MMEJ (Microhomology Mediated End-Joining, dark pink), DELET (deletion, any deletion that does not belong to PEPPR or MMEJ, orange) and INSRT (insertion, including alleles with inserted nucleotides only or with both deletions and insertions, purple). The 24-nt short PEPPR, MMEJ and DELET dictionaries are used for a more accurate classification and error calling by binning together all alleles with the same seed region that match primary allelic entries in the SAClassifier dictionaries. c DSB repair pattern visualization: intuitive rendering of the processed raw sequence data as an output of rank ordered classes of alleles. Allelic classes derived from NGS sequencing of individual flies or mosquitoes are displayed by their ranked frequency (allele landscape) and repair pattern fingerprints (color-coded by categories).

Briefly, raw reads generated from deep sequencing were subjected to a preliminary categorization using the NPClassifier, which recognizes the relative positions of editing start- and end-points flanking the Cas9 cleavage site and then generates a collection of priori alleles for each category. These primary outputs (MMEJ and DELET) were used for building full-length standard comprehensive dictionaries listing all observed mutations and derived 24-nt short dictionaries (with the same seed region flanking the Cas9 cleavage site) as inputs of the SAClassifier. In addition, a synthetic PEPPR dictionary was built by iteratively increasing the length of deletions by a single nucleotide distal to the PAM site, excluding alleles belonging to the MMEJ category. By fishing the raw reads with the 24-nt dictionaries, we were able to automatically recognize reads that also contained experimentally generated errors (e.g., from PCR amplification), which are usually located outside of the narrow 24-nt short dictionary window, thereby assigning such composite alleles to correctly matched root alleles (Fig. 1b). Used iteratively in tandem, these two ICP classification tools provide a robust and precise classification of CRISPR/Cas9 induced DSB repair outcomes. Next, we developed a user-friendly interface to visualize processed allelic category information in the form of rank ordered allelic landscape plots and repair pattern fingerprints (color-coded DSB repair categories), both of which are sorted by read frequency (Fig. 1c). These intuitively accessible data outputs are far more informative and discriminating than the unprocessed primary DNA sequence reads (e.g., compare the seemingly idiosyncratic raw lesions depicted in Fig. 2a to the obviously unique processed and concordant replicate patterns shown in Fig. 2b, c). The ICP was thus employed to visualize results in all the following experiments.
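The synthetic PEPPR dictionary and the 24-nt "fishing" step can be sketched as follows. This is an illustration under stated assumptions rather than the SAClassifier implementation: the seed is taken here as 12 nt on each side of the repair junction, the PAM is assumed to lie to the right of the cut, and all names and the toy reference sequence are hypothetical.

```python
def microhomology(ref, start, end):
    """Count bases of microhomology flanking the deletion ref[start:end]."""
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    return left + right


def build_peppr_dictionary(ref, cut, max_del, flank=12):
    """Map 2*flank-nt junction seeds to synthetic PEPPR allele labels.

    Deletions grow one nucleotide at a time on the PAM-distal (left) side
    of `cut`; alleles with >= 2 bp microhomology are skipped as MMEJ.
    """
    seeds = {}
    for d in range(1, max_del + 1):
        start = cut - d
        if start - flank < 0 or cut + flank > len(ref):
            break                      # not enough sequence for a full seed
        if microhomology(ref, start, cut) >= 2:
            continue                   # belongs to the MMEJ category instead
        seed = ref[start - flank:start] + ref[cut:cut + flank]
        seeds[seed] = f"PEPPR_del{d}"
    return seeds


def assign_read(read, seeds):
    """Return the label of the first seed found in the read, else None."""
    for seed, label in seeds.items():
        if seed in read:
            return label
    return None


# Toy 60-nt reference, invented for illustration; cut between index 29 and 30.
REF = "ATGCGTACCTGAGTCCATAGCTTGACGGATCCAGTTCGAAGGCTATCGTGACCTAAGCTG"
CUT = 30
seeds = build_peppr_dictionary(REF, CUT, max_del=12)

# A read carrying a 5-nt PEPPR deletion plus a substitution far from the junction.
allele = REF[:25] + REF[30:]
read = allele[:2] + "T" + allele[3:]
print(assign_read(read, seeds))
```

Because only the short seed window is matched, the substitution near the start of the read does not prevent it from being binned with its root allele, which is the behavior the paragraph above describes.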

a Examples of the top five somatic indels from individual flies derived from split-drive crosses in which the Cas9 transgene is inherited either maternally (Maternal-S, left) or paternally (Paternal-S, right), but separately from a cassette carrying the gRNA transmitted by the other parent. Purple stars indicate the color codes for mutation categories (dark pink: MMEJ, sky-blue: PEPPR, orange: DELET, purple: INSRT) and the dark green star indicates the separate raw sequence color coded for the four nucleotides A, T, G, and C. The red bar indicates Paternal-S crosses while the dark blue bar represents Maternal-S crosses. b Landscapes of the top 50 alleles ranked by reads ratio. All six sequenced individual flies are plotted together, with dark blue lines plotting the data from Maternal-S crosses and the red lines from Paternal-S crosses. The y-axis presents the fraction of reads for a given allele and the x-axis depicts the top 50 alleles according to rank order by read frequency. c DSB repair fingerprints for three representative sequenced individual flies from each cross. The x-axis is the same as depicted in panel b. Both panels show the top 50 ranked alleles. d Bar plots of Class Fraction for the top 50 alleles. Color codes for classes are as in panels a and c. e, f Correlation analysis of two out of three replicates from the Maternal-S cross (e) or the Paternal-S cross (f). r² values and p-values are indicated. Source data for panels b, d, e and f are provided as a Source Data file.

Since DSB repair outcomes have been found to vary considerably as a function of Cas9 or gRNA source and level49,50, we employed the ICP platform to parse somatic indels generated by co-expressing Cas9 and gRNAs in somatic cells of fruit flies (Drosophila melanogaster) and mosquitoes (Anopheles stephensi) in various configurations associated with gene-drive systems. We first applied ICP analysis to a split gene-drive system inserted into the Drosophila pale (ple) gene that is designed to detect copying of a gene cassette in somatic cells. This element, referred to as a CopyCatcher (pleCC), carries a gRNA targeting the first intron of the Drosophila ple locus49. In this current study, we make use of low-level ectopic somatic Cas9 expression (which is substantial and broad for vasa-Cas9) to analyze DSB repair patterns across diverse cell types in F1 progeny carrying both Cas9 and gRNAs51,52,53. Because cells actively undergoing meiosis make up only a small fraction of dividing cells in an adult fly, the mutational effects of Cas9/gRNA cleavage in such F1 individuals largely reflect the somatic action of these nuclease complexes. We thus conducted several alternative crossing schemes to assess the somatic mutagenic activity of vasa-Cas9 and gRNA components when transmitted to F1 individuals in various configurations from their F0 parents: 1) Maternal Split (Maternal-S, females carrying vasa-Cas9 crossed with males carrying pleCC); 2) Paternal Split (Paternal-S, males carrying vasa-Cas9 crossed with females carrying pleCC); 3) Maternal Full (Maternal-F, females carrying both the pleCC and vasa-Cas9 transgenes); and 4) Paternal Full (Paternal-F, males carrying both the pleCC and vasa-Cas9 transgenes)49. Comparative ICP analysis revealed several striking and consistent differences between the prevalent somatic mutations generated in individual progeny in each of these different crossing schemes.
In the case of Paternal-S crosses, the resulting mutations were dominated by PEPPR alleles (4 out of top 5 alleles in Fig.2a, Fig. S1a, and 70% of the top 50 alleles as rendered in rank ordered allelic landscapes and color coded DSB repair fingerprints in Fig.2c). In contrast, Maternal-S crosses primarily generated MMEJ and INSRT indels (4 out of top 5 alleles were MMEJ, and at least 50% of the top 50 alleles were INSRT mutations, Fig.2a, c, Supplementary Fig. S1a). These differences were also evident in the steeper allelic landscape curves that were generated from the Maternal-S versus Paternal-S crosses (Fig.2b) as characterized by the initial portion of the curve depicting the 5 most frequent alleles (i.e., the dark blue lines in Fig.2b are all above the red lines for the 5 most frequent alleles). We further quantified differences in allelic profiles between crosses by bar plots displaying the summed proportions of the different allelic classes (summing the percentages of all alleles from each category) which we termed as Class Fraction (Fig.2d). This analysis revealed that INSRT alleles were generated at a significantly higher frequency in Maternal-S crosses, while the PEPPR class dominated among the top 50 alleles in the reciprocal Paternal-S crosses (Fig.2d).
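The rank-ordered allelic landscape and the Class Fraction summary can be expressed compactly. The sketch below assumes that each allele's fraction is taken relative to all reads in the sample and then summed per class over the top-ranked alleles; the text does not specify the denominator, so this choice, along with the function names and toy counts, is an assumption.

```python
def allele_landscape(read_counts, top_n=50):
    """Return the top_n (allele, read fraction) pairs, most frequent first."""
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(allele, count / total) for allele, count in ranked[:top_n]]


def class_fraction(read_counts, allele_class, top_n=50):
    """Sum read fractions per mutation class over the top_n alleles."""
    fractions = {}
    for allele, frac in allele_landscape(read_counts, top_n):
        cls = allele_class[allele]
        fractions[cls] = fractions.get(cls, 0.0) + frac
    return fractions


# Toy counts and class labels, invented for illustration only.
counts = {"a": 50, "b": 30, "c": 15, "d": 5}
classes = {"a": "PEPPR", "b": "MMEJ", "c": "PEPPR", "d": "INSRT"}
print(allele_landscape(counts, top_n=3))
print(class_fraction(counts, classes, top_n=3))
```

Plotting the landscape fractions against rank gives the curves compared between crosses, and the per-class sums correspond to the stacked bars of the Class Fraction plots.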

A striking feature of the highly divergent DSB repair signatures generated from maternally versus paternally inherited Cas9 sources was the remarkable reproducibility of their DSB repair fingerprints observed across three individual replicates from each cross (Fig. 2e, f). We performed a correlation analysis within replicates by extracting 23 common alleles across all six sequenced flies and plotted the resulting allelic profiles together relative to an arbitrarily chosen Paternal-S replicate as reference (bold red line, Supplementary Fig. S1b). We observed that the frequency distributions of these 23 common alleles were much more similar to each other within intra-cross comparisons than between inter-crosses (Supplementary Fig. S1b). This trend was also revealed by higher correlation coefficients for intra-cross comparisons than for inter-cross comparisons based on allelic read ratios (Supplementary Fig. S1c–g). Conspicuous defining differences between the Maternal-S and Paternal-S fingerprints were also evident based on the Class Fraction index (Fig. 2d). In summary, a variety of differing statistical measurements all underscore the robust consistent similarities shared among allele profiles generated from individual replicates of the same cross and the clearly distinctive DSB repair pattern fingerprints generated by maternal versus paternal Cas9 inheritance.
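The replicate comparison amounts to two steps: restrict each sample to the alleles shared by all samples, then correlate their read ratios. A minimal sketch follows, assuming a Pearson correlation on read fractions (the text reports r² values but the exact procedure is described in the Methods, not here); the function names and numbers are hypothetical.

```python
import math


def common_alleles(samples):
    """Return the sorted alleles present in every sample (dict: allele -> fraction)."""
    shared = set(samples[0])
    for sample in samples[1:]:
        shared &= set(sample)
    return sorted(shared)


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy / math.sqrt(sxx * syy)) ** 2


# Toy read fractions for two replicates, invented for illustration only.
rep1 = {"A": 0.5, "B": 0.3, "C": 0.2}
rep2 = {"A": 0.4, "B": 0.4, "C": 0.1, "D": 0.1}
shared = common_alleles([rep1, rep2])
print(shared, r_squared([rep1[a] for a in shared], [rep2[a] for a in shared]))
```

Computing this value for every pair of samples and grouping the pairs into intra-cross and inter-cross comparisons reproduces the kind of contrast described above.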

We extended our ICP analysis of mutant allele profiles generated in the ple locus to the more extreme Maternal-F (dark blue lines) and Paternal-F (red lines) cross schemes to assess the role of inheritance patterns when both the source of vasa-Cas9 and gRNA originated from a single parent49. Again, we observed highly dominant alleles in the Maternal-F crosses, clearly evident in allelic landscapes, that deviated markedly from those produced by the Paternal-F crosses, which produced more evenly distributed spectra of alleles spread across a broad range of allelic frequencies (Fig.3a, b). As expected based on these large differences, the repair pattern fingerprints generated from different crosses produced clearly distinguishable patterns of mutation classes, which was particularly evident when considering the Class Fraction (Fig.3e). Cumulatively, these data suggest that the developmental timing and/or levels of Cas9 expression (maternal, early zygotic, or late zygotic) are likely to play a key role in determining which particular DSB repair pathway or sub-pathway is engaged in resolving DSBs.

a–d Unique DSB repair signatures obtained using different Cas9 sources are displayed with the top 20 alleles (landscapes and DSB repair pattern fingerprints). NGS sequencing was performed on pools of 20 adults. a vasa-Cas9 inserted in the X chromosome and the pleCC element carrying the gRNA were both carried by either female or male parents, mimicking a full-drive configuration (Maternal-F and Paternal-F crosses with vasa-Cas9). b vasa-Cas9 split crosses wherein the Cas9 transgene was transmitted either maternally (Maternal-S) or paternally (Paternal-S) and the pleCC gRNA bearing cassette was carried by the other parent. c, d Same Maternal-S versus Paternal-S crosses as in panel b, but using either actin-Cas9 (c) or nanos-Cas9 (d) sources. e Class Fraction Index for crosses in panels a–d. Bars are shaded according to allelic class color codes. f UMAP embedding for visualizing a common set of 59 alleles shared between the four split crosses with actin-Cas9 and vasa-Cas9. Dots represent single alleles, and the colors indicate the allelic category. g Distribution of the top 20 alleles generated from single flies derived from a cross between parents carrying the Spo11 gRNA and vasa-Cas9 elements (Paternal-S cross: red lines and Maternal-S cross: dark blue lines). The top plot shows the allelic landscape for the top 20 alleles from all six sequenced single flies and the bottom shows three examples of the classification fingerprints (with all allelic classes condensed into single rows) color coded for the allele categories. h Class Fraction Index for Spo11 gRNA crosses. i, j Correlation analysis between two replicates from each cross. Dark blue is Maternal-S and red is Paternal-S. r² values and p-values are indicated. Source data are provided as a Source Data file.

Previous studies have shown that the relative frequencies of NHEJ versus HDR events depend on the source of Cas9 both in terms of timing and level of expression49,50,54. We thus wondered whether ICP analysis would similarly reveal distinct DSB repair outcomes for two additional Cas9 sources (actin-Cas9 and nanos-Cas9; Cas9 expression levels: actin-Cas9 > vasa-Cas9 > nanos-Cas9) inserted at the same locus as vasa-Cas9 (Fig. 3c, d)49.

As was observed for the vasa-Cas9 source, the actin-Cas9 and nanos-Cas9 sources both generated differing allelic landscapes and repair pattern fingerprints when transmitted maternally versus paternally, which also were readily distinguishable from each other (Fig. 3b–d). Mirroring results with the vasa-Cas9 source, significant differences between the proportions of PEPPR versus MMEJ class among the top 20 alleles were observed in Maternal-S versus Paternal-S crosses for actin-Cas9. For the nanos-Cas9 source, both the MMEJ and INSRT categories were particularly reduced in Paternal-S crosses, although this latter sex-based difference was not as dramatic as for the other Cas9 sources (presumably due to its more germline restricted expression, Fig. 3d)55,56. Overall, the general trend once again indicated that maternally inherited Cas9 sources biased somatic DSB repair outcomes in favor of MMEJ and INSRT classes over PEPPR alleles, while paternal transmission of Cas9 generated mutant alleles dominated by PEPPR class alleles (Fig. 3e).

Based on the overall similarities of the DSB repair outcomes observed for actin-Cas9 and vasa-Cas9 crosses, we extracted a set of 59 shared alleles that appeared in all sequenced samples and performed UMAP (Uniform Manifold Approximation and Projection) analysis to cluster these common alleles, condensing them into 5 distinct clouds (Fig. 3f). Clouds 1, 2, 3, and 4 were dominated by alternative subsets of PEPPR alleles distinguished primarily by the length of deletion (the average deletion sizes were 24 bp, 40 bp and 31 bp for the PEPPR Mini, Midi-I and Midi-II clusters, respectively, and longer than 55 bp for the PEPPR Maxi cluster), while cloud 5 was predominantly comprised of MMEJ alleles. We reviewed raw sequences for the few trans-cloud assigned alleles and discovered that some of these alleles could be interpreted as having been generated from a second round of repair using one of the core alleles from the same cloud as a repair template. For example, we inferred that allele 58 was actually a PEPPR deletion with several nucleotides potentially having been back-filled. This result is consistent with the previous report that alleles with insertions or complex repair outcomes would be generated from several rounds of synthesis following the generation of a primary deletion event57,58. Assessing the impact of such potential complexities, which we ignore here for simplicity, will require additional future scrutiny. The remainder of these alleles, such as allele 44, could be accounted for by variability in the exact Cas9 cleavage site (between the 6th and 7th nucleotides counting from the PAM side), with an extra nucleotide being deleted on the PAM-proximal side of the gRNA cleavage site (Fig. 3f)43,59,60. Since both of these outcomes were rare, the second-order origins we hypothesize for such outlier alleles further validate the robust nature of our ICP platform in recognizing core primary categories of DNA repair outcomes.
We also analyzed the common 59 alleles by plotting their read frequencies and observed that the differences between the allelic landscapes for the two reciprocal crosses for each Cas9 source mirrored the trend in Fig. 3a–d described above (Supplementary Fig. S2a, b). Cumulatively, these concordant findings support a key role for the parental origin of Cas9 serving as a major determinant of the DSB repair outcome.

Another obvious determinant of DSB repair outcome is the local genomic DNA context. We assessed the general applicability of the ICP by employing it to classify alleles generated by gRNAs targeting four other loci: prosalpha2 (pros2), Rab11, Spo11 and Rab5 using the vasa-Cas9 source61. Paralleling our findings from the ple locus, we observed divergent allelic profiles between Paternal-S and Maternal-S crosses with distinct dominant mutation categories based on the specific target site. For example, the predominant allelic classes generated at the Spo11, pros2 and Rab11 loci were PEPPR and INSRT alleles, while PEPPR and MMEJ alleles were most prevalent for the Rab5 targets (Fig. 3g, h, Supplementary Figs. S3–6). Among these four targets, Spo11 displayed the greatest divergence in the prevalence of top alleles generated from Maternal-S and Paternal-S crosses (reminiscent of the fine distinctions parsed for the ple locus, Fig. 3g). We nonetheless still observed high correlation coefficients between two replicates within the same cross and significantly lower correlation coefficients associated with inter-cross comparisons between maternal versus paternal Cas9 inheritance (averaged r² = 0.33, Fig. 3i, j, Supplementary Fig. S3). We also observed distinctive sex-specific DSB repair patterns for Cas9 transmission at the pros2 and Rab11 gRNA target sites (Supplementary Figs. S4 and S5), although these differences were less pronounced than for the ple and Spo11 gRNAs, while for Rab5, the allelic patterns were similar for both maternal and paternal crosses (Supplementary Fig. S6, see Supplementary Discussion Section). In summary, these data support the broad utility of the ICP pipeline to deliver unique discernible locus-specific fingerprints associated with distinct parental inheritance patterns of Cas9 that generalize to other genomic targets.

Given the strong Cas9 inheritance-dependent distinctions observed for allelic profiles resulting from maternal versus paternal Cas9/gRNA-induced DSBs in Drosophila, we wondered whether similar DSB repair pattern fingerprints could be discerned in mosquitoes carrying a linked full gene-drive in which the Cas9 and gRNA transgenes are carried together in a single cassette62,63,64,65. We examined this possibility using the transgenic An. stephensi Reckh drive, which is inserted into the kynurenine hydroxylase (kh) locus63. Because of the Cas9 and gRNA linkage, the Reckh drive behaves as the Maternal-F and Paternal-F cross configurations described above in which all CRISPR components are carried by a single parental sex63.

Consistent with our observations in flies, the Reckh Maternal-F crosses generated a high proportion of indels that were dominated to a remarkable extent by single mutant alleles with read percentages exceeding 85% for each of the three single mosquitoes sequenced, followed by a long distributed tail of lower frequency alleles. The highly biased nature of the replicate allelic distributions is readily revealed by a virtual step-function in their rank-ordered allelic landscapes (Fig. 4a). In striking contrast, over 50% of the alleles recovered from the Paternal-F crosses were wild-type (WT), which presumably reflects alleles that either remained uncut or DSB ends that were rejoined accurately without further editing. The highly predominant WT allele was followed by a very shallow tail distribution of low frequency mutant alleles in the paternal rank-ordered allelic landscapes (Fig. 4a). This dramatic difference in allelic profiles between Maternal-F versus Paternal-F crosses was also clearly displayed by the class-tally bars color coded for the different fractions of each class (black = WT) located beneath each landscape (Fig. 4a). Here, the Class Fraction Index measure indicated that Maternal-F crosses generated a greater proportion of INSRT alleles in the first two samples, while Paternal-F crosses produced a high frequency of PEPPR alleles (Fig. 4b). As in the case of allelic profiles recovered at the ple and Spo11 loci in flies, common sets of highly correlated mutant DSB repair fingerprints were observed across all three replicates of the Paternal-F Reckh crosses (Supplementary Fig. S7). A similar comparison of allelic distributions in the maternal crosses was precluded by virtue of the single highly dominant alleles and corresponding paucity of lower frequency events, the nature of which varied greatly between replicates. We conclude that the high-resolution performance of the ICP platform in Drosophila can be generalized to other insects such as An. stephensi to robustly discern sex-dependent CRISPR transmission patterns resulting in distinct DSB repair outcomes.

a Rank-ordered landscapes of the top 50 alleles generated from NGS analysis of single mosquitoes. Colored bars with red dots indicate mutated alleles, and black bars with black dots indicate an unmutated WT allele. Middle panels: allelic class fingerprints color coded as in previous figures. Bottom bars: fraction of each allelic class, including WT (black), PEPPR (sky-blue), MMEJ (deep pink), DELET (orange) and INSRT (purple). Numbers indicate the percentage of the corresponding class. b Class Fraction Index for single mosquito sequencing data in panel a. c Developmental time-points for sample collections. d Kinetics of Cas9 mutagenesis generated by the Reckh gRNA. Lines represent the summed fraction of mutant alleles at each time-point. Dark-blue lines indicate maternal (Maternal-F) crosses and red lines paternal (Paternal-F) crosses. e DSB repair fingerprints at different timepoints. Samples were collected at the time points shown in panel c and 20 eggs, larvae, pupae or adults were pooled together for genomic DNA extraction and deep sequencing. The far left and far right panels indicate the Class percentages including WT alleles (black), displaying the proportion of each class at single time-points. Source data are provided as a Source Data file.

Given the dramatic differences we observed in the frequency and nature of somatic alleles generated in maternal versus paternal-sourced Cas9 in both flies and mosquitoes, we wondered whether the developmental timing of Cas9/gRNA expression (maternal = early? and paternal = late?) was the key determinant for these highly reproducible DSB repair fingerprints. We tested this hypothesis by assessing whether DSB repair fingerprints varied as a function of developmental progression using a series of narrowly timed sample collections of F1 mosquitoes produced from crosses of Reckh parents to WT and assayed DSB repair spectra using the ICP pipeline at 12 different developmental stages (Fig. 4c. Note: as homozygous Reckh transgenic mosquitoes were crossed to WT, all F1 progeny carried one Reckh allele and one WT receiver allele, the latter of which was amplified for DSB repair analysis). We tracked a diminishing proportion of WT (presumably uncut) alleles and a corresponding increase in mutant alleles of various classes at each of the time points (Fig. 4d). Strikingly, nearly half of the target alleles were edited in embryos by 30 minutes post-oviposition for both the Maternal-F and Paternal-F Reckh crosses, which corresponds to early pre-blastoderm stages prior to the maternal-to-zygotic transition, suggesting a very early activity of Cas9 in mosquito embryos driven either by maternally inherited Cas9/gRNA complexes or potentially by very early zygotic expression of the Cas9 and gRNA components (Fig. 4d)66. We also observed similarly frequent indels being generated as early as 30 min in flies expressing Cas9 (either maternally or paternally) with a gRNA targeting the pros2 locus, although the dynamics of Cas9 production are distinct in these two organisms (Supplementary Fig. S8a). Following this initial surge in target cleavage, we observed divergent trajectories in the accumulation of mutant alleles between maternal versus paternal lineages.
As an overall trend, mutant alleles accumulated progressively in the Maternal-F lineage until virtually no WT alleles remained, while in Paternal-F lineage, even at the endpoint of adulthood, approximately 60% of WT alleles persisted, in line with our single time point experiments (Fig.4a, d, Supplementary Fig. S8b). As observed in the final distributions of adult alleles, progeny from Maternal-F crosses tended to be enriched for INSRT alleles over the entire developmental time course, while PEPPR alleles were more common in Paternal-F crosses with pronounced accumulation of such alleles during later stages (Fig.4e). A finer scale analysis of the categories of mutant alleles generated over time revealed dynamic patterns of prevalent alleles during mosquito developmental stages (Fig.4e). For example, the proportion of MMEJ alleles peaked at the 2-hour and 4-hour time points (Fig.4e). Similarly, a split-drive expressing a gRNA targeting the Drosophila pros2 locus generated distinct temporal profiles of cleavage patterns in crosses from female versus male parents carrying the drive element (Supplementary Fig. S9).
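The kinetics described above reduce to a simple per-time-point tally: the mutant fraction is the share of reads that are not WT, and the class percentages are each class's share of all reads at that time point. A minimal sketch follows; the function names are hypothetical and the read counts are invented for illustration and do not reproduce the reported data.

```python
def mutant_fraction(class_counts):
    """Fraction of reads at one time point that are not wild-type."""
    total = sum(class_counts.values())
    return 1.0 - class_counts.get("WT", 0) / total


def class_percentages(class_counts):
    """Percentage of reads in each class (including WT) at one time point."""
    total = sum(class_counts.values())
    return {cls: 100.0 * n / total for cls, n in class_counts.items()}


# Toy time course (time point -> class -> read count), invented for illustration.
time_course = {
    "30min": {"WT": 50, "INSRT": 30, "PEPPR": 20},
    "adult": {"WT": 0, "INSRT": 80, "PEPPR": 20},
}
for timepoint, counts in time_course.items():
    print(timepoint, mutant_fraction(counts), class_percentages(counts))
```

Applying the first function to each collection gives one curve per cross, and the second gives the stacked class composition at each stage.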

One unexpected feature of the developmental variations in allelic composition we observed was that the proportion of WT alleles increased at certain time points (e.g., at 1 hour in the maternal cross and from 12 hours to day 1 (24 h) in the paternal cross). These temporal fluctuations were also observed in flies expressing Cas9 and a pros2 gRNA at two hours after oviposition (Supplementary Figs. S8a and S9), suggesting that this phenomenon might reflect a generally relevant form of clonal selection for WT cells during pre-blastoderm stages. Such clonal selection might arise if mutant cells experienced negative selection at certain developmental stages. In the case of paternal transmission, one strong line of evidence supporting this WT clonal selection hypothesis is that in adults, the Reckh element is transmitted to over 99% of F1 progeny, indicating that nearly all target alleles in the germline must be WT. This high frequency of paternal germline transmission is also consistent with the high prevalence of WT alleles tallied at 12 h in embryos derived from the paternal crosses (Fig. 4e, see Supplementary Discussion Section for more in-depth consideration of this point). We analyzed the developmental distributions of 21 common alleles that were generated at all time-points (Supplementary Fig. S10a–e). Most of these common alleles belonged to the PEPPR class, while only five were INSRT alleles, despite the INSRT class overall being the most prevalent for both crosses, again suggesting that INSRT alleles have a higher diversity than other mutation categories (Supplementary Fig. S10a). Overall, this analysis is in line with our previous observation that Maternal-F crosses produced more INSRT alleles while Paternal-F crosses generated a preponderance of PEPPR alleles (Supplementary Fig. S10b).

Given the strong influence of maternal versus paternal origin of Cas9 on the resulting distributions of alleles characterized above by ICP analysis, we wondered whether such allelic signatures could be exploited for lineage tracing in randomly mating multi-generational population cages. We first examined ICP outputs from a controlled crossing scheme carried out over three generations with pleCC and Reckh gRNAs to derive allelic fingerprints distinguishing parents of origin, both by identifying somatic alleles in the F1 generation and by assessing which of those alleles might be transmitted through the germline to non-fluorescent progeny (i.e., those not inheriting the pleCC or Reckh element) at the F2 generation (Fig. 5a–d, Supplementary Fig. S11). As anticipated, in both pleCC and Reckh Maternal-F crosses, single dominant somatic alleles were observed in the F1 generation, with the top single allele representing more than 50% of all alleles (Fig. 5a, c). Furthermore, all such predominant somatic mutant alleles, which precluded gene-cassette copying of the pleCC or Reckh drive elements in those F1 individuals, were transmitted faithfully through the germline to non-fluorescent F2 progeny with approximately 50% frequency. In addition, we observed marked differences in the other half of total reads in F2 progeny depending on the origin of Cas9/gRNA complexes. Thus, a distribution of multiple diverse low frequency mutations was generated when crossing F1 pleCC+ or Reckh+ females with WT males (presumably derived from F1 drive females having deposited Cas9/gRNA complexes maternally that then acted on the paternally sourced WT allele somatically in F2 individuals). In the reciprocal male cross, however, approximately 50% of all alleles remained WT (Fig. 5b, d, Supplementary Fig. S12a–f).
These findings support the hypothesis that the top somatic indels derived from maternal Cas9 sources were generated at very early developmental stages (possibly at the point of fertilization or shortly thereafter during the first somatic cell division), resulting in a single mutant allele being initially produced and then transmitted to every descendant cell including all germline progenitor cells49. With the paternal-sourced Cas9 and gRNA, arrays of variable somatic mutations were recovered with the most prominent alleles accounting for fewer than 10% of the total alleles in F1 progeny (Fig. 5b). Accordingly, paternally generated F1 somatic alleles were more randomly transmitted via the germline of individuals that failed to copy the gene cassette for either the pleCC or Reckh elements. As a result of this diversity of somatic F1 alleles, only occasionally were the most prevalent alleles also transmitted through the germline (e.g., individuals 1, 4 and 5 in Fig. 5b, Supplementary Fig. S12g–l).

Primary DNA sequences of top single alleles and their percentages of the total alleles from six individual sequenced flies derived from ple gRNA Maternal-F (a) and Paternal-F (b) crosses. Gray bars indicate the location of the gRNA protospacer and red arrowheads are the associated PAM sites. The first row depicts the reference sequence covering the expected DSB cleavage site. Colored squares in the right column indicate the class to which a given allele belongs. The tables shown on the right of each allele show its frequency among all reads. Left columns of the table indicate frequencies of the somatic allele, and the right columns are the top germline mutant allele frequency obtained by sequencing F2 non-fluorescent progeny derived from the same F1 individuals whose top somatic allele is displayed in the left column (excluding WT alleles). Colored dots indicate different alleles, with the same color shared between two columns indicating that the same allele appeared as both the top somatic and the top germline indel from the same F0 founders. c, d Allele profiles generated by Reckh parents and progeny generated with the same crossing scheme as for the pleCC. c Tabulation of the Maternal-F cross. d Tabulation of the Paternal-F cross. e Crossing scheme for the Reckh cage trials. Three individual cages were seeded with 10 homozygous Reckh females, 90 WT females and 100 WT males for the maternally initiated lineage, while the paternally initiated cages were seeded with 10 homozygous Reckh males, 90 WT males and 100 WT females. At each of the following three generations, 10 Reckh+ females and 10 Reckh+ males were randomly collected for single mosquito deep sequencing. f Biased inheritance of Reckh was observed in the maternally seeded cages at generations 2 and 3, but not for the paternally seeded cages. Pink bars denote the fraction of sequenced individual mosquitoes inheriting Reckh from female parents, and cyan bars represent Reckh inheritance from the males. 
Source data are provided as a Source Data file.

The Reckh element in mosquitoes performed similarly to the fly pleCC. However, Reckh F1 individuals displayed less frequent zygotic cleavage and a corresponding reduction in the diversity of resulting somatically generated mutations (>50% WT alleles remained, Paternal-F cross). Consistent with this limited number and array of somatic mutations in the F1 generation from the Paternal-F cross, NHEJ mutations were only rarely transmitted to the F2 generation, probably due to more germline-restricted expression of vasa-Cas9 in mosquitoes as compared to flies (Fig.5c, d). These results again suggest that cleavage and repair events were generated later during development in paternal crosses, resulting in a stochastic transmission of F1 somatic alleles to the germline, which were largely uncorrelated with the most prevalent allele present somatically in the F1 parent49. Taken together, these highly divergent sex-dependent DSB repair signatures suggested that such genetic fingerprints could be used to track parental history in the context of randomly mating multi-generation population cages.

Based on the highly dominant mutant indels (Maternal-F) versus WT (Paternal-F) alleles generated by the Reckh genetic element described above, we evaluated inheritance patterns of indels in multi-generational cages initiated by a 5% introduction of Reckh into WT populations either through maternal or paternal lineages in the F0 generation (Fig.5e). We randomly selected at least 20 fluorescence marker-positive mosquitoes (10 females and 10 males) for NGS analysis at generations 2 and 3, when the Reckh allele was still present at relatively low frequencies in the population and random mating was more likely to have taken place between Reckh/+ heterozygous and WT mosquitoes. Thus, we envisioned that the source of the Reckh allele could be tracked back to a male versus female parent of origin by examining whether a dominant WT allele was present (inherited paternally) or not (inherited maternally) (Fig.5e, f). Following this reasoning, we inferred that, in the maternally seeded lineage, progeny were strongly biased toward inheriting the Reckh element from Reckh+ males mating with WT females during generations 2 and 3 rather than the reverse (i.e., female transmission of Reckh alleles). Indeed, in one maternally seeded replicate (cage 2, generation 3), 100% of the progeny had inherited the Reckh element from their fathers (Fig.5f). In contrast to the striking sex-specific transmission bias observed in maternally seeded cages, progeny from paternally seeded cages displayed more evenly distributed stochastic parental inheritance patterns (Fig.5f). These highly reproducible parent-of-origin signatures demonstrate the utility of ICP in allelic lineage tracking, which could be of great potential utility in evaluating alternative initial release strategies for gene-drive mosquitoes as well as post-release surveillance of gene-drives as they spread through wild target populations (see Discussion).
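
The parent-of-origin reasoning above can be sketched in a few lines of Python. This is only an illustration, not the ICP implementation: the function name, the allele-frequency dictionary format and the 0.4 cutoff for calling a "dominant WT allele" are assumptions made for the example (the text states only that roughly 50% of alleles remain WT after paternal inheritance).

```python
def infer_parent_of_origin(allele_freqs, wt_label="WT", wt_threshold=0.4):
    """Guess which parent transmitted the drive element to one individual.

    allele_freqs: dict mapping allele name -> read fraction or read count
                  (hypothetical input format).
    wt_threshold: assumed cutoff for calling a dominant WT allele.
    """
    total = sum(allele_freqs.values())
    if total <= 0:
        raise ValueError("no reads for this individual")
    wt_fraction = allele_freqs.get(wt_label, 0.0) / total
    # Dominant WT allele present -> inherited paternally; absent -> maternally.
    return "paternal" if wt_fraction >= wt_threshold else "maternal"


# Made-up allele profiles for two individuals.
print(infer_parent_of_origin({"WT": 0.52, "del3": 0.10, "ins1": 0.38}))
print(infer_parent_of_origin({"WT": 0.03, "del7": 0.60, "del2": 0.37}))
```

In practice the cutoff would need to be calibrated against controlled crosses such as those in Fig.5a–d.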

Another important challenge for deciphering DSB repair outcomes is to track both NHEJ and gene-cassette mediated HDR events within the same sample. Such a comprehensive genetic detection tool could have broad, impactful applications (see Discussion). For example, one important and non-trivial application is to follow the progress of gene-drives in a marker-free fashion as they spread through insect populations. Such dual tracking capability would address the potential concern that mutations eliminating a dominant marker for the gene-drive element could evade phenotype-based assessments of the drive process. Accordingly, we devised a three-step short-amplicon based deep sequencing (200–400 bp) strategy based on tightly linked colony-specific nucleotide polymorphisms distinguishing donor versus receiver chromosomes to detect copying of two CopyCatcher elements, pleCC and hthCC, from their chromosomes of origin (donor chromosome) to WT homologous (receiver chromosome) targets (Fig.6a)49. Notably, this strategy only amplified the inserted gene cassette on the donor chromosome and/or the cassette if it copied onto the receiver chromosome. Thus, the measured allelic frequencies indicate the relative proportions of gene cassettes copied to the receiver chromosome versus those residing on the donor chromosome (Fig.6b displays the inferred somatic HDR frequency quantified from the three-step NGS sequencing protocol as well as indels quantified by our standard two-step NGS sequencing protocol; see Methods section for additional details).
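
As a rough illustration of the last point, the share of cassette-bearing reads that carry receiver-type polymorphisms can be computed as below. This is a sketch under stated assumptions: the function name and the "donor"/"receiver" read labels are hypothetical, and the exact normalization used to report somatic HDR frequency is described in the paper's Methods and is not reproduced here.

```python
from collections import Counter


def cassette_copy_fraction(read_haplotypes):
    """Fraction of cassette-bearing reads linked to receiver-type polymorphisms.

    read_haplotypes: iterable of labels, one per read, each "donor" or
                     "receiver" (hypothetical labels assigned upstream from
                     the linked colony-specific polymorphisms).
    """
    counts = Counter(read_haplotypes)
    donor = counts.get("donor", 0)
    receiver = counts.get("receiver", 0)
    total = donor + receiver
    if total == 0:
        raise ValueError("no cassette-bearing reads to classify")
    return receiver / total


# Made-up read labels: 75 donor-linked and 25 receiver-linked reads.
reads = ["donor"] * 75 + ["receiver"] * 25
print(cassette_copy_fraction(reads))
```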

a Scheme for tracking gene-drive copying using NGS. Gray bars: genomic DNA, pink oval: Cas9 protein, sky-blue line: gRNA, colored asterisks: polymorphisms. Color-coded rectangles represent four nucleotides. Four possible recombinants listed are generated by resolving Holliday junctions at different sites marked with black crosses. b NGS sequencing-based quantification of somatic HDR generated by pleCC in F1 progeny. Areas delineated by dotted lines indicate patches of cells in which somatic HDR copying events have taken place either under bright field (upper) or RFP fluorescent field (middle). Bottom bars are the summary of the inferred frequency for the somatic HDR (orange), indels (green) and WT alleles (black) derived from the deep sequencing data using the same samples photographed above. More than three flies from each cross were imaged and used for analysis. Scale bars indicate 200 pixels. c Somatic HDR profile with ple gRNA. The red line is for the Maternal-F cross and the dark blue line for the Paternal-F cross. d Diagram of the hthCC. Black double arrow: recoded hth cDNA, blue rectangles: exon 1, light green rectangles: exons 2-14, and colored lines underneath represent probes used for detection. e In situ images with embryos laid from hthCC-vasa-Cas9 females crossed with WT males. Blue=exon 1, green=WT exons 2-14, red=recoded cDNA for exons 2-14. Insets are magnified single nuclei indicated by colored arrows. This experiment has been repeated at least three times. Scale bars stand for 10 μm. f Temporal profiles for somatic HDR-mediated copying of the hthCC element assessed by NGS as described for the pleCC in panels c and f. Y-axis tabulates the percentage of HDR at a given time point. Table at the bottom quantifies the HDR fraction at given time points for both the Paternal-F and Maternal-F crosses. Source data are provided as a Source Data file.

In our first set of experiments, we analyzed editing outcomes by examining F1 progeny derived from Maternal-S and Paternal-S pleCC crosses. We compared the rates of somatic HDR measured by NGS analysis to those evaluated by image-based phenotypes associated with copying of the CopyCatcher element. As summarized previously, CopyCatchers such as the pleCC are designed to permit quantification of concordant homozygous mutant clonal phenotypes (e.g., pale patches of thoracic cuticle and embedded sectors of colorless bristles), with underlying DsRed+ fluorescent cell phenotypes49. Individual flies on which imaging-based analysis had been conducted were then subjected to separate NGS HDR-fingerprinting and INDELs-fingerprinting, resulting in a comprehensive quantification of HDR, NHEJ, and WT alleles within the same sample (Fig.6b; libraries for HDR-fingerprinting and INDELs-fingerprinting were prepared from the same individual fly, but with different DNA preparation and sequencing protocols, as detailed in Methods). For these experiments, F1 flies were genotyped and those carrying both Cas9 and pleCC gRNA were used for NGS analysis (data shown here are the inferred frequencies of somatic HDR, NHEJ events, and WT alleles). This dual integrated analysis revealed that the Maternal-S crosses resulted in ~15% somatic HDR-mediated cassette copying events on average based on sequencing, and that such cassette copying was yet more frequent in Paternal-S crosses, producing ~25% somatic HDR. The nearly two-fold greater HDR-mediated copying efficiency detected by sequencing in Paternal-S crosses mirrors phenotypic outcomes wherein maternally inherited Cas9 similarly results in a lower frequency of cassette copying detected by fluorescence image analysis in somatic cells than for paternally inherited Cas9 (Fig.6b)49.

Our genetic analysis of stage-dependent differences in DSB repair pathway activity in this study is consistent with a commonly held view in the gene-drive field based on a variety of indirect genetic transmission data that HDR-mediated cassette copying does not occur efficiently during early embryonic stages50,51,63,67,68,69,70. This inference, however, has not yet been verified experimentally. We thus sought to provide direct evidence supporting this key supposition using NGS-based HDR-fingerprinting to track the somatic HDR events across a range of developmental stages in both Maternal-F and Paternal-F crosses in which the Cas9 and gRNA transgenes are transmitted together either maternally or paternally using our validated NGS sequencing protocol. Notably, we collected samples at 9 timepoints and pooled 20 F1 progeny together for pooled sequencing to prime the developmental profile of somatic HDR with pleCC (samples were thus collected without genotyping since it is impractical to genotype individual embryos and young larvae). Because of the limitations imposed by embryo pooling we were unable to use the same samples collected here for also quantifying the generation of somatic NHEJ alleles (i.e., only half of the F1 progeny carried the vasa-Cas9 transgene on the X chromosome and those embryos lacking this transgene were not suitable for generating mutations - note that such an analysis was possible in the case of the viable Reckh drive shown in Fig.4e as well as for a viable split-drive allele inserted into the essential prosalpha2 locus shown in Supplementary Fig. S9). Indeed, NGS analysis detected only very rare examples of somatic HDR events in early embryos derived from both crosses (Fig.6c). 
Notably, HDR in the Paternal-F cross detected by this sequencing protocol increased substantially to 35.9% during adult stages, a period coinciding with the temporal peak of the pale expression profile (note that in this experiment we employed the actin-Cas9 rather than vasa-Cas9 source, which has higher level of Cas9 expression in somatic cells and generates a correspondingly higher frequency of somatic HDR)49.

We extended our sequencing-based strategy to quantify somatic HDR using a second CopyCatcher element (hthCC) designed specifically to identify even rare copying events in early blastoderm-stage embryos. The hthCC is inserted into the homothorax (hth) gene and was engineered to visualize HDR-mediated copying of the gene cassette by fluorescence in situ hybridization (FISH) using discriminating fluorescent RNA probes complementary to specific endogenous versus recoded cDNA sequences (Fig.6d, e). In this system, copying of the transgene from the donor chromosome to the receiver chromosome would be indicated by the presence of two nuclear dots of red fluorescence detected by the hth recoded cDNA-specific probe (indicating two copies of recoded hth cDNA). In contrast, cells in which no copying occurred should contain only a single nuclear red dot signal (from the donor allele). Such in situ analysis detected no clear case of gene cassette copying in any of the ~5000 blastoderm-stage cells examined across ~500 embryos (with the caveat that some mitotic nuclei generate ambiguous signals depending on their orientation). This qualified negative result assessed by in situ analysis was consistent with the very low estimates of HDR frequency during the same early blastoderm-stage developmental window based on NGS analysis in staged time-course experiments, although the latter sequencing method did detect very low levels of somatic HDR at ~3 hours after egg laying from the Paternal-F crosses (and no copying until day three of larvae with the maternal cross; Fig.6d–f). The very low levels of somatic HDR observed in early embryos for the hthCC construct either by in situ hybridization or by NGS sequencing parallel the results summarized above for the pleCC element (Fig.6c, f). 
The maximal somatic HDR frequency observed for the hthCC Maternal-F crosses (0.06% at day 3 after egg laying) was somewhat lower than that for the similar cross for pleCC (0.35% at adult stage), consistent with the predominance of single mutant alleles being generated at very early stages following fertilization in Maternal-F crosses. In contrast to the exceedingly rare copying of the hthCC element detected in early embryos for either the Maternal-F or Paternal-F crosses, the same element frequently copied to the homologous chromosome during later developmental stages in Paternal-F crosses as assessed by NGS sequencing. The hthCC element again copied with somewhat lower efficiency than the pleCC element (e.g., 15.2% for hthCC versus 35.9% for pleCC tabulated in adults), presumably reflecting differing genomic cleavage rates or gene conversion efficiencies generated by their respective gRNAs (including total cleavage levels and temporal features). In aggregate, these two examples of quantitative analysis of copying frequencies based on both NGS and in situ analysis demonstrate that ICP and NGS-based quantification of gene conversion events can be successfully integrated for a comprehensive analysis of DSB repair outcomes, including both NHEJ and HDR events as a function of developmental stage. These powerful tools could also be applied to follow gene-drive spread through freely mating populations in a marker-free manner as well as to a variety of other applications including gene therapy (see Discussion).


CRISPR-Cas systems: Overview, innovations and applications in human …

Abstract

Genome editing is the modification of genomic DNA at a specific target site in a wide variety of cell types and organisms, including insertion, deletion and replacement of DNA, resulting in inactivation of target genes, acquisition of novel genetic traits and correction of pathogenic gene mutations. Due to the advantages of simple design, low cost, high efficiency, good repeatability and short-cycle, CRISPR-Cas systems have become the most widely used genome editing technology in molecular biology laboratories all around the world. In this review, an overview of the CRISPR-Cas systems will be introduced, including the innovations, the applications in human disease research and gene therapy, as well as the challenges and opportunities that will be faced in the practical application of CRISPR-Cas systems.

Keywords: CRISPR, Cas9, Genome editing, Human disease models, Rabbit, Gene therapy, Off target effects

Genome editing is the modification of genomic DNA at a specific target site in a wide variety of cell types and organisms, including insertion, deletion and replacement of DNA, resulting in inactivation of target genes, acquisition of novel genetic traits and correction of pathogenic gene mutations [1], [2], [3]. In recent years, with the rapid development of life sciences, genome editing technology has become the most efficient method to study gene function, explore the pathogenesis of hereditary diseases, develop novel targets for gene therapy, breed crop varieties, and so on [4], [5], [6], [7].

At present, there are three mainstream genome editing tools in the world, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and the RNA-guided CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) nucleases systems [8], [9], [10]. Due to the advantages of simple design, low cost, high efficiency, good repeatability and short-cycle, CRISPR-Cas systems have become the most widely used genome editing technology in molecular biology laboratories all around the world [11], [12]. In this review, an overview of the CRISPR-Cas systems will be introduced, including the innovations and applications in human disease research and gene therapy, as well as the challenges and opportunities that will be faced in the practical application of CRISPR-Cas systems.

CRISPR-Cas is an adaptive immune system existing in most bacteria and archaea, preventing them from being infected by phages, viruses and other foreign genetic elements [13], [14]. It is composed of CRISPR repeat-spacer arrays, which can be further transcribed into CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA), and a set of CRISPR-associated (cas) genes which encode Cas proteins with endonuclease activity [15]. When the prokaryotes are invaded by foreign genetic elements, the foreign DNA can be cut into short fragments by Cas proteins, then the DNA fragments will be integrated into the CRISPR array as new spacers [16]. Once the same invader invades again, crRNA will quickly recognize and pair with the foreign DNA, which guides Cas protein to cleave target sequences of foreign DNA, thereby protecting the host [16].

CRISPR-Cas systems can be classified into 2 classes (Class 1 and Class 2), 6 types (I to VI) and several subtypes, with multi-Cas protein effector complexes in Class 1 systems (Type I, III, and IV) and a single effector protein in Class 2 systems (Type II, V, and VI) [17], [18]. The classification, representative members, and typical characteristics of each CRISPR-Cas system are summarized in [10], [12], [15], [16], [17], [18].

Summary of CRISPR-Cas systems.

The Type II CRISPR-Cas9 system derived from Streptococcus pyogenes (SpCas9) is one of the best characterized and most commonly used of the numerous CRISPR-Cas systems [18], [19]. The main components of the CRISPR-Cas9 system are the RNA-guided Cas9 endonuclease and a single-guide RNA (sgRNA) [20]. The Cas9 protein possesses two nuclease domains, named HNH and RuvC, each of which cleaves one strand of the target double-stranded DNA [21]. The sgRNA is a simplified combination of crRNA and tracrRNA [22]. The Cas9 nuclease and sgRNA form a Cas9 ribonucleoprotein (RNP), which can bind and cleave the specific DNA target [23]. Furthermore, a protospacer adjacent motif (PAM) sequence is required for Cas9 protein binding to the target DNA [20].

During the genome editing process, sgRNA recruits Cas9 endonuclease to a specific site in the genome to generate a double-stranded break (DSB), which can be repaired by two endogenous self-repair mechanisms, the error-prone non-homologous end joining (NHEJ) pathway or the homology-directed repair (HDR) pathway [24]. Under most conditions, NHEJ is more efficient than HDR, for it is active in about 90% of the cell cycle and does not depend on a nearby homologous donor [25]. NHEJ can introduce random insertions or deletions (indels) into the cleavage sites, leading to the generation of frameshift mutations or premature stop codons within the open reading frame (ORF) of the target genes, finally inactivating the target genes [26], [27]. Alternatively, HDR can introduce precise genomic modifications at the target site by using a homologous DNA repair template [28], [29]. Furthermore, large fragment deletions and simultaneous knockout of multiple genes can be achieved by using multiple sgRNAs targeting one or more genes [30], [31].
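
Whether an NHEJ indel shifts the reading frame depends only on its net length change: any change that is not a multiple of three disrupts the downstream codons. A minimal Python sketch of that arithmetic follows; the function name and argument names are illustrative, not taken from any tool mentioned here, and in-frame indels can of course still inactivate a gene in other ways.

```python
def is_frameshift(deleted_bp, inserted_bp):
    """Return True if an indel inside an ORF changes the reading frame.

    deleted_bp / inserted_bp: number of bases removed / added at the
    cleavage site (hypothetical inputs, e.g. from an allele caller).
    """
    if deleted_bp < 0 or inserted_bp < 0:
        raise ValueError("lengths must be non-negative")
    net_change = inserted_bp - deleted_bp
    # A net change that is a multiple of 3 keeps downstream codons in frame.
    return net_change % 3 != 0


print(is_frameshift(deleted_bp=1, inserted_bp=0))  # 1-bp deletion
print(is_frameshift(deleted_bp=3, inserted_bp=0))  # 3-bp deletion
print(is_frameshift(deleted_bp=5, inserted_bp=2))  # net -3 bp
```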

Mechanism of genome editing. Double-strand break (DSB) induced by nucleases can be repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR) pathways. NHEJ can introduce random insertions or deletions (indels) of varying length at the site of the DSB. Alternatively, HDR can introduce precise genomic modifications at the target site by using a homologous DNA donor template.

CRISPR-Cas systems have become the most popular genome editing tool in molecular biology laboratories since they were confirmed to have genome editing capabilities in 2012 [23]. They have enabled numerous achievements in correcting pathogenic mutations, searching for essential genes for cancer immunotherapy, and solving key problems in organ xenotransplantation [5], [32], [33]. Unfortunately, CRISPR-Cas systems still have some limitations that need to be solved, such as potential off-target effects, limited genome-targeting scope restricted by PAM sequences, and low efficiency and specificity [34], [35]. Therefore, many research teams have been trying to improve this tool.

By introducing two point mutations, H840A and D10A, into the HNH and RuvC nuclease domains, respectively, researchers have obtained a nuclease-dead Cas9 (dCas9) [36]. The dCas9 lacks DNA cleavage activity, but its DNA binding activity is not affected. Then, by fusing transcriptional activators or repressors to dCas9, the CRISPR-dCas9 system can be used to activate (CRISPRa) or inhibit (CRISPRi) transcription of target genes [37], [38]. Additionally, dCas9 can be fused to various effector domains, which enables sequence-specific recruitment of fluorescent proteins for genome imaging and epigenetic modifiers for epigenetic modification [39], [40]. Furthermore, this system is easy to operate and allows simultaneous manipulation of multiple genes within a cell [38].

In order to improve the efficiency of site-directed mutagenesis, base editing systems containing dCas9 coupled with cytosine deaminase (cytidine base editor, CBE) or adenosine deaminase (adenine base editor, ABE) have been developed [41], [42]. They can introduce C·G to T·A or A·T to G·C point mutations into the editing window of the sgRNA target sites without double-stranded DNA cleavage [41], [42]. Since base editing systems largely avoid the generation of random insertions or deletions, the results of gene mutation are more predictable. However, owing to the restriction of the base editing window, base editing systems are not suitable for every target sequence in the genome. For example, C-rich sequences would produce many off-target mutations [43]. Therefore, researchers have been trying to develop and optimize novel base editing systems to overcome this drawback [44]. At present, base editing systems have been widely used in various cell lines, human embryos, bacteria, plants and animals for efficient site-directed mutagenesis, which may have broad application prospects in basic research, biotechnology and gene therapy [45], [46], [47]. In theory, 3956 gene variants existing in the ClinVar database could be repaired by base substitution of C-T or G-A [42], [48].

An NGG PAM at the 3′ end of the target DNA site is essential for the recognition and cleavage of the target gene by Cas9 protein [20]. Besides classical NGG PAM sites, other PAM sites such as NGA and NAG also exist, but their efficiency of genome editing is not high [49]. However, such PAM sites only exist in about one-sixteenth of the human genome, thereby largely restricting the targetable genomic loci. To address this limitation, several Cas9 variants have been developed to expand PAM compatibility.

In 2018, David Liu et al. [50] developed xCas9 by phage-assisted continuous evolution (PACE), which can recognize multiple PAMs (NG, GAA, GAT, etc.). In the latter half of the same year, Nishimasu et al. developed SpCas9-NG, which can recognize relaxed NG PAMs [51]. In 2020, Miller et al. developed three new SpCas9 variants recognizing non-G PAMs, such as NRRH, NRCH and NRTH PAMs [52]. Later in the same year, Walton et al. developed a SpCas9 variant named SpG, which is capable of targeting an expanded set of NGN PAMs [53]. Subsequently, they optimized the SpG system and developed a near-PAMless variant named SpRY, which is capable of editing nearly all PAMs (NRN and NYN PAMs) [53].
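
The PAM names above use IUPAC ambiguity codes (N = any base, R = A or G, Y = C or T, H = A, C or T). A short Python sketch shows how relaxing the PAM from NGG to NGN or NRN widens the set of candidate positions. The function name and the toy sequence are assumptions for illustration; the scan covers only the given strand and says nothing about actual editing efficiency.

```python
# IUPAC codes needed for the PAMs named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "H": "ACT", "N": "ACGT",
}


def find_pam_sites(seq, pam):
    """Return 0-based start positions where `pam` matches `seq`.

    Only the given strand is scanned; a real guide-design tool would also
    scan the reverse complement and check the adjacent protospacer.
    """
    seq = seq.upper()
    k = len(pam)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in IUPAC[code] for base, code in zip(window, pam)):
            hits.append(i)
    return hits


toy = "ACGTGGA"  # made-up sequence
print(find_pam_sites(toy, "NGG"))  # canonical SpCas9 PAM
print(find_pam_sites(toy, "NGN"))  # SpG-style PAM
print(find_pam_sites(toy, "NRN"))  # one of the SpRY-style PAMs
```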

By using these Cas9 variants, researchers have repaired some previously inaccessible disease-relevant genetic variants [51], [52], [53]. However, there are still some drawbacks in these variants, such as low efficiency and cleavage activity [50], [51]. Therefore, they should be further improved by molecular engineering in order to expand the applications of SpCas9 in disease-relevant genome editing.

In addition to editing DNA, CRISPR-Cas systems can also edit RNA. Class 2 Type VI CRISPR-Cas13 systems contain a single RNA-guided Cas13 protein with ribonuclease activity, which can bind to target single-stranded RNA (ssRNA) and specifically cleave the target [54]. To date, four Cas13 proteins have been identified: Cas13a (also known as C2c2), Cas13b, Cas13c and Cas13d [55]. They have successfully been applied in RNA knockdown, transcript labeling, splicing regulation and virus detection [56], [57], [58]. Later, Feng Zhang et al. developed two RNA base editing systems (the REPAIR system, which enables A-to-I (G) replacement, and the RESCUE system, which enables C-to-U replacement) by fusing catalytically inactivated Cas13 (dCas13) with the adenine/cytidine deaminase domain of ADAR2 (adenosine deaminase acting on RNA type 2) [59], [60].

Compared with DNA editing, RNA editing has the advantages of high efficiency and high specificity. Furthermore, its edits are temporary and reversible because they act on transcripts rather than on the genome, avoiding the potential risks and ethical issues caused by permanent genome editing [61], [62]. At present, RNA editing has been widely used for pre-clinical studies of various diseases, which opens a new era for RNA level research, diagnosis and treatment.

Recently, Anzalone et al. developed a novel genome editing technology, named prime editing, which can mediate targeted insertions, deletions and all 12 types of base substitutions without double-strand breaks or donor DNA templates [63]. This system contains a catalytically impaired Cas9 fused to a reverse transcriptase and a prime editing guide RNA (pegRNA) that both specifies the target site and encodes the desired edit [63]. After Cas9 cleaves the target site, the reverse transcriptase uses the pegRNA as a template for reverse transcription, and new genetic information can then be written into the target site [63]. Prime editing can effectively improve the efficiency and accuracy of genome editing, and significantly expand the scope of genome editing in biological and therapeutic research. In theory, it is possible to correct up to 89% of known disease-causing gene mutations [63]. Nevertheless, as a novel genome editing technique, more research is still needed to further understand and improve the prime editing system.

So far, as a rapid and efficient genome editing tool, CRISPR-Cas systems have been extensively used in a variety of species, including bacteria, yeast, tobacco, Arabidopsis, sorghum, rice, Caenorhabditis elegans, Drosophila, zebrafish, Xenopus laevis, mouse, rat, rabbit, dog, sheep, pig and monkey [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], as well as various human cell lines, such as tumor cells, adult cells and stem cells [79], [80]. In the medical field, the most important application of CRISPR-Cas systems is to establish genetically modified animal and cell models of many human diseases, including gene knockout models, exogenous gene knock-in models, and site-directed mutagenesis models [80], [81].

(1) Establishing animal models of human diseases

Animal models are crucial tools for understanding gene function, exploring pathogenesis of human diseases and developing new drugs. However, traditional methods for generating animal models are complex, costly and time-consuming, which severely limit the application of animal models in basic medical research and preclinical studies [82]. Since the discovery of CRISPR-Cas systems, a series of genetically modified animal models have successfully been generated in a highly efficient manner [72], [73], [74], [75], [76], [77], [78].

Among numerous model animals, mice are widely used for scientific studies and recognized as the most important model animals in human disease research [83]. So far, researchers have successfully generated many genetically modified mouse models of conditions such as cancer, cardiovascular disease, cardiomyopathy, Huntington's disease, albinism, deafness, hemophilia B, obesity, urea cycle disorder and muscular dystrophy [84], [85], [86], [87], [88], [89], [90], [91], [92], [93]. Nevertheless, owing to the great species differences between humans and rodents, mouse models cannot provide effective assessment and long-term follow-up for research and treatment of human diseases [94]. Therefore, the application of larger model animals, such as rabbits, pigs and non-human primates, is becoming more and more widespread [74], [77], [78]. With the development of CRISPR-Cas systems, generating larger animal models for human diseases has become a reality, which greatly enriches the disease model resource bank.

Our research focuses on the generation of genetically modified rabbit models using CRISPR-Cas systems. Compared with mice, rabbits are closer to humans in physiology, anatomy and evolution [95]. In addition, rabbits have a short gestation period and a lower breeding cost. All these features make them suitable for studies of cardiovascular, pulmonary and metabolic diseases [95], [96]. To date, we have generated a series of rabbit models for simulating human diseases, including congenital cataracts, Duchenne muscular dystrophy (DMD), X-linked hypophosphatemia (XLH), etc. (summarized in the table captioned below) [97], [98], [99], [100], [101], [102], [103], [104], [105], [106], [107], [108], [109], [110], [111], [112], [113], [114]. Taking the generation of PAX4 gene knockout rabbits as an example, the procedure we used to establish genetically modified rabbit models is summarized in the figure and table captioned below.

CRISPR-Cas system mediated rabbit models of human diseases.

Generation of PAX4 gene knockout (KO) rabbits using CRISPR-Cas9 system. (A) Schematic diagram of the sgRNA target sites located in the rabbit PAX4 locus. PAX4 exons are indicated by yellow rectangles; target sites of the two sgRNA sequences, sgRNA1 and sgRNA2, are highlighted in green; protospacer-adjacent motif (PAM) sequence is highlighted in red. Primers F and R are used for mutation detection in pups. (B) Microinjection and embryo transfer. First a mixture of Cas9 mRNA and sgRNA is microinjected into the cytoplasm of the zygote at the pronuclear stage. Then the injected embryos are transferred into the oviduct of recipient rabbits. After 30 days of gestation, PAX4 KO rabbits are born. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Summary of the PAX4 KO rabbits generated by CRISPR-Cas9 system.

In addition, the pig is an important model animal extensively used in biomedical research. Compared with mice, pigs are more similar to humans in body and organ size, lifespan, anatomy, physiology, metabolic profile and immune characteristics, which makes the pig an ideal model for studying human cardiovascular diseases and xenotransplantation [115]. At present, several genetically modified pig models have been successfully generated, including models of neurodegenerative diseases, cardiovascular diseases, cancer and immunodeficiency, as well as xenotransplantation models [116], [117], [118], [119], [120], [121], [122].

To date, non-human primates are recognized as the best human disease models. Their advantage is that their genome has 98% homology with the human genome; they are also highly similar to humans in tissue structure, immunity, physiology and metabolism [123]. What is more, they can be infected by human-specific viruses, which makes them very important models in infectious disease research [124]. Researchers have generated many genetically modified monkey models, including models of cancer, muscular dystrophy, developmental retardation and adrenal hypoplasia congenita, as well as Oct4-hrGFP knockin monkeys [125], [126], [127], [128], [129].

(2)

Establishing cell models of human diseases

The efficiency of CRISPR-Cas mediated genome editing was found to be higher in vitro than in vivo; thus, the use of genetically modified cell models can greatly shorten research time in medical research [130]. To date, researchers have used CRISPR-Cas systems to perform genetic manipulations on various cell types, such as tumor cells, adult cells and stem cells, in order to simulate a variety of human diseases [79], [80].

Fuchs et al. generated an RPS25-deficient HeLa cell line by knocking out the ribosomal protein eS25 (RPS25) gene using the CRISPR-Cas9 system [131]. Drost et al. edited four common colorectal cancer-related genes (APC, P53, KRAS and SMAD4) in human intestinal stem cells (hISCs) by CRISPR-Cas9 technology [132]. The genetically modified hISCs carrying all four gene mutations possessed the biological characteristics of intestinal tumors and could simulate the occurrence of human colorectal cancer [132]. Jiang et al. induced site-specific chromosome translocation in mouse embryonic stem cells by CRISPR-Cas9, in order to establish a cell and animal model for subsequent research on congenital genetic diseases, infertility, and cancer related to chromosomal translocation [133].

In addition, induced pluripotent stem cells (iPSCs) have shown great promise in disease model establishment, drug discovery and patient-specific cellular therapy development [134]. iPSCs are capable of self-renewal and have multilineage differentiation potential, which is of great significance in disease model establishment and regenerative medicine research [135]. In recent years, by combining CRISPR-Cas systems with iPSC technology, researchers have generated numerous novel and reliable disease models with isogenic backgrounds and provided new solutions for cell replacement therapy and precise therapy in a variety of human diseases, including neurodegenerative diseases, acquired immunodeficiency syndrome (AIDS), β-thalassemia, etc. [134], [135], [136].

With the development of CRISPR-Cas systems and the discovery of novel Cas enzymes (Cas12, Cas13, etc.), CRISPR-based molecular diagnostic technology is rapidly developing and has been selected as one of the world's top ten science and technology advancements in 2018 [137].

Unlike Cas9, Cas13 enzymes possess a collateral cleavage activity: after cleaving the target sequence, they can also cleave nearby non-target RNAs [54]. Based on the collateral cleavage activity of Cas13, Feng Zhang et al. [138] developed a Cas13a-based in vitro nucleic acid detection platform, named SHERLOCK (Specific High Sensitivity Enzymatic Reporter UnLOCKing). It is composed of Cas13a, an sgRNA targeting a specific RNA sequence and fluorescent RNA reporters. After the Cas13a protein recognizes and cleaves the target RNA, it cuts the reporter RNA and releases a detectable fluorescence signal, which serves as the diagnostic readout [138]. Researchers have used this method to detect viruses, distinguish pathogenic bacteria, genotype human DNA and identify tumor DNA mutations [137], [138]. Later, Feng Zhang et al. improved the SHERLOCK system and renamed it SHERLOCKv2, which can detect four viruses at the same time [139].

In addition to Cas13, Cas12 enzymes are also found to possess collateral cleavage activity [140]. Doudna et al.[141] developed a nucleic acid detection system based on Cas12a (also known as Cpf1), named DETECTR (DNA endonuclease-targeted CRISPR trans reporter). DETECTR has been used to detect cervical cancer associated HPV subtypes (HPV16 and HPV18) in either virus-infected human cell lines or clinical patient samples [141]. Furthermore, Doudna et al. are trying to use the newly discovered Cas14 and CasX proteins in molecular diagnosis, which may further enrich the relevant techniques of CRISPR-based molecular diagnosis [142], [143].

CRISPR-based molecular diagnostic technology has clear advantages over traditional molecular diagnostic methods, such as high sensitivity and single-base specificity, which make it suitable for early screening of cancer and for detection of cancer susceptibility genes and pathogenic genes [137], [144]. Meanwhile, CRISPR diagnostics are inexpensive, simple and fast, require no special instruments, and are suitable for rapid field detection and for use in less-developed areas [137], [144]. At present, many companies are trying to develop CRISPR diagnostic kits for home use, to detect HIV, rabies, Toxoplasma gondii, etc.

The CRISPR-Cas9 system enables genome-wide high-throughput screening, making it a powerful tool for functional genomic screening [145]. The high efficiency of genome editing with the CRISPR-Cas9 system makes it possible to edit multiple targets in parallel; thus, a mixed population of cells carrying different gene mutations can be produced, and the relationship between genotypes and phenotypes can be determined using these mutant cells [146]. CRISPR-Cas9 library screening can be divided into two categories: positive selection and negative selection [147]. It has been utilized to identify genes associated with cancer cell survival, drug resistance and virus infection in various models [148], [149], [150]. Compared with RNAi-based screening, high-throughput CRISPR-Cas9 library screening has the advantages of higher transfection efficiency, minimal off-target effects and higher data reproducibility [151]. At present, scientists have constructed human and mouse genome-wide sgRNA libraries, which have been progressively improved to meet different requirements [152], [153]. In the future, CRISPR-Cas9-based high-throughput screening technology is expected to see further development and wider application.
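As a rough illustration of how positive and negative selection screens are typically read out, the sketch below compares the relative abundance of each sgRNA before and after selection and reports a log2 fold change. Guides that become enriched point to positive selection, and guides that drop out point to negative selection. The function names, the pseudocount, the threshold and the read counts are all assumptions made for illustration; they are not taken from the cited studies, and real screens use dedicated statistical tools.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Return the log2 fold change in relative abundance for each sgRNA.

    before/after map sgRNA names to read counts before and after selection.
    The pseudocount avoids taking the log of zero for guides that drop out.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for guide, count_before in before.items():
        freq_before = (count_before + pseudocount) / total_before
        freq_after = (after.get(guide, 0) + pseudocount) / total_after
        changes[guide] = math.log2(freq_after / freq_before)
    return changes


def classify(lfc, threshold=1.0):
    """Label a guide by the direction of its abundance change (assumed cutoff)."""
    if lfc >= threshold:
        return "enriched"   # candidate hit in a positive selection screen
    if lfc <= -threshold:
        return "depleted"   # candidate hit in a negative selection screen
    return "unchanged"


# Hypothetical read counts, for illustration only.
before = {"sgA": 100, "sgB": 100, "sgC": 100, "sgD": 100}
after = {"sgA": 400, "sgB": 10, "sgC": 100, "sgD": 100}

for guide, lfc in log2_fold_changes(before, after).items():
    print(guide, round(lfc, 2), classify(lfc))
```

Normalizing by total reads matters here: a guide whose raw count is unchanged still shows a modest negative fold change when another guide expands in the population.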

Gene therapy refers to the introduction of foreign genes into target cells to treat specific diseases caused by mutated or defective genes [154]. Target cells of gene therapy are mainly divided into two categories: somatic cells and germ line cells. However, since germ line gene therapy is technically complicated and raises ethical and safety issues, gene therapy is currently limited to somatic cells [155]. Traditional gene therapy is usually carried out by homologous recombination or lentiviral delivery. Nevertheless, the efficiency of homologous recombination is low, and lentiviral vectors are randomly inserted into the recipient genome, which may pose safety risks in clinical applications [156]. With their rapid development, CRISPR-Cas systems have been widely applied in gene therapy for various human diseases, including monogenic diseases, infectious diseases, cancer, etc. [155], [156], [157]. Furthermore, some CRISPR-mediated genome-editing therapies have already reached the stage of clinical testing. The table below briefly summarizes the ongoing clinical trials of gene therapy using genome-editing technology, including ZFN, TALEN and CRISPR-Cas systems.

(1)

Monogenic diseases

Monogenic diseases are genetic diseases caused by mutations of a single allele or of a pair of alleles on a pair of homologous chromosomes [158]. There are more than 6600 known monogenic diseases around the world, including β-thalassemia, sickle cell disease (SCD), hemophilia B (HB), retinitis pigmentosa (RP), Leber congenital amaurosis type 10 (LCA10), Duchenne muscular dystrophy (DMD), Hutchinson-Gilford progeria syndrome (HGPS), hereditary tyrosinemia (HT), cystic fibrosis (CF), etc. [159]. Most monogenic diseases are rare diseases that lack effective treatment and greatly affect patients' quality of life. Many animal models of monogenic diseases have now been treated with CRISPR-mediated gene therapy, and some CRISPR clinical trials for monogenic diseases are under way [160].

Summary of clinical trials of gene therapy using genome-editing technology.

β-Thalassemia, a hereditary hemolytic anemia, is one of the most common and health-threatening monogenic diseases in the world. It is characterized by mutations in the β-globin (HBB) gene, leading to severe anemia caused by a decreased hemoglobin (Hb) level [161]. For the moment, the only way to cure β-thalassemia is hematopoietic stem cell transplantation (HSCT). Yet the high cost of treatment and the shortage of donors limit its clinical application [162]. Other therapies, for example blood transfusion, can only sustain the life of patients but cannot cure the disease [161]. To better treat β-thalassemia, researchers have turned their attention to gene therapy. A major technical approach is to repair the defective β-globin gene in iPSCs from patients with β-thalassemia by CRISPR-Cas9 technology, so that red blood cells can be produced normally and the disease could be cured [163], [164]. Besides, reactivating fetal hemoglobin (HbF) expression through knockout of the BCL11A gene, which suppresses the expression of fetal hemoglobin, has also been proposed as an effective method to treat β-thalassemia [165], [166].

Additionally, CRISPR-Cas systems have also been used for the treatment of other hematologic diseases, such as sickle cell disease (SCD) and hemophilia B (HB). SCD is a monogenic disease caused by a single-nucleotide mutation in the human β-globin gene, leading to a substitution of glutamic acid by valine and the production of an abnormal version of β-globin, which is known as hemoglobin S (HbS) [167]. The CRISPR-Cas9 system has been used to treat SCD by repairing the β-globin gene mutation or reactivating HbF expression [168], [169]. HB is an X-linked hereditary bleeding disorder caused by deficiency of coagulation factor IX, and the most common treatment for hemophilia B is supplementation of the blood coagulation factor [170], [171]. Huai et al. injected naked Cas9-sgRNA plasmid and donor DNA into adult mice of an F9 mutation HB mouse model for gene correction [172]. Meanwhile, Cas9/sgRNA were also microinjected into germline cells of this HB mouse model for gene correction. Both the in vivo and ex vivo experiments were sufficient to remit the coagulation deficiency [172]. Guan et al. corrected the F9 Y371D mutation in HB mice using CRISPR-Cas9 mediated in situ genome editing, which greatly improved the hemostatic efficiency and increased the survival of HB mice [173].

Duchenne muscular dystrophy (DMD) is an X-linked recessive hereditary disease, with clinical manifestations of muscle weakness or muscle atrophy due to a progressive deterioration of skeletal muscle function [174]. It is usually caused by mutations in the DMD gene, which encodes the dystrophin protein [174]. Deletions of one or more exons of the DMD gene result in frameshift mutations or premature termination of translation, so that normal dystrophin protein cannot be synthesized [175]. Currently, there is no effective treatment for DMD. Conventional drug treatment can only control the disease to a certain extent, but cannot cure it. It was found that a functional truncated dystrophin protein can be obtained by removing the mutated transcripts with the CRISPR-Cas9 system [176], [177], [178]. In addition, base editing systems can also be applied in DMD treatment by repairing single-base mutations or inducing exon skipping by introducing premature termination codons (PTCs) [179].
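The reading-frame reasoning behind this can be shown with a small sketch. Removing whole coding exons shifts the reading frame unless the total removed length is a multiple of three, which is why an additional targeted deletion can turn an out-of-frame deletion into an in-frame one that yields a shorter protein. The exon names and lengths below are hypothetical and are not real DMD exon sizes; the helper name is also an assumption made for illustration.

```python
def deletion_keeps_frame(exon_lengths, deleted_exons):
    """Return True if deleting the given exons preserves the reading frame.

    exon_lengths maps exon names to coding lengths in nucleotides.
    The frame is preserved when the removed length is divisible by 3.
    """
    removed = sum(exon_lengths[name] for name in deleted_exons)
    return removed % 3 == 0


# Hypothetical exon lengths (nucleotides), for illustration only.
exon_lengths = {"exonA": 120, "exonB": 95, "exonC": 88, "exonD": 150}

# Deleting exonB alone removes 95 nt, which is not a multiple of 3.
print(deletion_keeps_frame(exon_lengths, ["exonB"]))

# Deleting exonB and exonC together removes 183 nt, a multiple of 3.
print(deletion_keeps_frame(exon_lengths, ["exonB", "exonC"]))
```

This check only covers the frame arithmetic; whether the resulting truncated protein is functional depends on which domains are lost.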

Retinitis pigmentosa (RP) is a group of hereditary retinal degenerative diseases characterized by progressive loss of photoreceptor cells and retinal pigment epithelium (RPE) function [180]. RP shows marked genetic heterogeneity, and its inheritance patterns include autosomal dominant, autosomal recessive, and X-linked recessive inheritance [180]. To date, there is still no cure for RP. In recent years, with the rapid development of gene editing technology, there has been some progress in the treatment of RP. Several RP-causing gene mutations, for example in the RHO, PRPF31 and RP1 genes, have been corrected by CRISPR-Cas9 in mouse models to prevent retinal degeneration and improve visual function [181], [182].

Leber Congenital Amaurosis type 10 (LCA10) is an autosomal retinal dystrophy with severe vision loss at an early age. The most common gene mutation found in patients with LCA10 is IVS26 mutation in the CEP290 gene, which disrupts the coding sequence by generating an aberrant splice site [183]. Ruan et al. used CRISPR-Cas9 system to knock out the intronic region of the CEP290 gene and restored normal CEP290 expression [184]. In addition, subretinal injection of EDIT-101 in humanized CEP290 mice showed rapid and sustained CEP290 gene editing [185], [186].

Hutchinson-Gilford Progeria Syndrome (HGPS) is a rare lethal genetic disorder characterized by accelerated aging [187]. A point mutation within exon 11 of the lamin A gene activates a cryptic splice site, leading to the production of a truncated lamin A called progerin [188]. CRISPR-Cas based gene therapy has opened up broad prospects for HGPS treatment. Administration of AAV-delivered CRISPR-Cas9 components into HGPS mice can reduce the expression of progerin, thereby improving the health and prolonging the lifespan of HGPS mice [189], [190]. In addition, Suzuki et al. repaired the G609G mutation in an HGPS mouse model via single homology arm donor mediated intron-targeting gene integration (SATI), which ameliorated aging-associated phenotypes and extended the lifespan of HGPS mice [191].

CRISPR-Cas systems have also shown their advantages in gene therapy of hereditary tyrosinemia (HT) and cystic fibrosis (CF). HT is a disorder of tyrosine metabolism caused by deficiency of fumarylacetoacetate hydrolase (Fah) [192]. Yin et al. corrected a Fah mutation in an HT mouse model by injecting CRISPR-Cas9 components into the liver of the mice [193]. The wild-type Fah protein then began to be expressed in the liver cells, and the body weight loss phenotype was rescued [193]. CF, an autosomal recessive inherited disease with severe respiratory problems and infections, has a high mortality rate at an early age [194]. It is caused by mutations in the CFTR gene, which encodes an epithelial chloride anion channel, the cystic fibrosis transmembrane conductance regulator (CFTR) [194]. So far, genome editing strategies to correct CFTR mutations have been carried out in cell models. In cultured intestinal stem cells and induced pluripotent stem cells from cystic fibrosis patients, the homozygous CFTR ΔF508 mutation has been corrected by CRISPR-Cas9 technology, leading to recovery of normal CFTR expression and function in differentiated mature airway epithelial cells and intestinal organoids [195], [196].

(2)

Infectious diseases

In recent years, gene therapy has gradually been applied to the treatment of viral infectious diseases. Transforming host cells to avoid viral infection or preventing viral proliferation and transmission are two main strategies for gene therapy of viral infectious diseases [197].

Human immunodeficiency virus (HIV), a retrovirus, mainly attacks the human immune system, especially CD4+ T lymphocytes. When human cells are invaded by HIV, the viral sequences can be integrated into the host genome, blocking cellular and humoral immunity and causing acquired immunodeficiency syndrome (AIDS) [198]. There is still no known cure for AIDS, although it can be treated. Although antiretroviral therapy can inhibit HIV-1 replication, the viral sequences still exist in the host genome and could be reactivated at any time [199]. The CRISPR-Cas9 system can target the long terminal repeat (LTR) and destroy HIV-1 proviruses, so it may be possible to completely eliminate HIV-1 from the genome of infected host cells [200], [201]. In addition, resistance to HIV-1 infection could be induced by knockout of the HIV co-receptor CCR5 gene in CD4+ T cells [202], [203].

Cervical cancer is the second most common gynecologic malignant tumor. The incidence is increasing year by year and young people are especially prone to this disease. The occurrence of cervical cancer was found to be closely related to HPV (human papillomavirus) infection [204]. HPV is a double-stranded circular DNA virus, and the E6 and E7 genes located in the HPV16 early region are oncogenes [205]. Researchers designed sgRNAs targeting the E6 and E7 genes to block the expression of the E6 and E7 proteins; the expression of p53 and pRb was subsequently restored to normal, which increased tumor cell apoptosis and suppressed subcutaneous tumor growth in in vivo experiments [206], [207], [208]. Moreover, HPV proliferation was blocked by cutting the E6/E7 genes, and the virus in the body could be eliminated [206], [207], [208].

(3)

Cancer

Cancer is the second leading cause of death worldwide after cardiovascular diseases, and it is also a medical problem that needs to be solved urgently. A variety of genetic or epigenetic mutations have been accumulated in the cancer genome, which can activate proto-oncogenes, inactivate tumor suppressors and produce drug resistance [209], [210]. So far, CRISPR-Cas systems have been used to correct the oncogenic genome/epigenome mutations in tumor cells and animal models, resulting in inhibition of tumor cell growth and promotion of cell apoptosis, thereby inhibiting tumor growth [211], [212], [213].

In addition, immunotherapy is considered to be a major breakthrough in cancer treatment, especially chimeric antigen receptor-T (CAR-T) cell therapy, which has a significant therapeutic effect on leukemia, lymphoma and certain types of solid tumors [214], [215], [216]. CAR-T cells are genetically manipulated, patient-specific T cells that express receptors targeting antigens specifically expressed on tumor cells, for example, CD19 CAR-T cells for B cell malignancies. These cells are then transfused back into patients to fight cancer [217]. However, CAR-T cell therapy is complex, time-consuming and expensive, and it is greatly limited by the quality and quantity of autologous T cells. Therefore, researchers have used the CRISPR-Cas9 system to develop universal CAR-T cells, for example by simultaneously removing the endogenous T cell receptor gene and the HLA class I encoding gene from T cells of healthy donors and introducing the CAR sequence [218], [219], [220]. Such cells could be used in multiple patients without causing graft versus host reaction (GVHR). In addition, CRISPR-Cas mediated genome editing has also been used to enhance the function of CAR-T cells by knocking out genes encoding signaling molecules or T cell inhibitory receptors, such as programmed cell death protein 1 (PD-1) and cytotoxic T lymphocyte antigen 4 (CTLA-4) [221], [222].

Though CRISPR-Cas mediated genome editing technologies have been broadly applied in a variety of species and cell types, some important issues still need to be addressed in their application, such as off-target effects, delivery methods, immunogenicity and the potential risk of cancer.

It was found that designed sgRNAs can pair with non-target DNA sequences despite mismatches and introduce unexpected gene mutations, which are called off-target effects [223]. Off-target effects seriously restrict the widespread application of CRISPR-Cas mediated genome editing in gene therapy, because they might lead to genomic instability and increase the risk of certain diseases by introducing unwanted mutations at off-target sites [224]. At present, several strategies are used to predict and detect off-target effects, including online prediction software, whole genome sequencing (WGS), genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq), discovery of in situ Cas off-targets and verification by sequencing (DISCOVER-Seq), etc. [225]. Furthermore, to minimize off-target effects, researchers have systematically studied the factors affecting them and developed a number of effective approaches.

(1)

Rational design and modification of sgRNAs

The specific binding of sgRNA with the target sequence is the key factor in CRISPR-Cas mediated genome editing. Rational design of highly specific sgRNAs might minimize off-target effects [224]. The length and GC content of sgRNAs, and mismatches between sgRNA and its off-target site will all affect the frequency of off-target effects [226]. In addition, on the basis of rational design of sgRNAs, the specificity of CRISPR-Cas systems can be further improved by modifying sgRNAs, such as engineered hairpin sgRNAs and chemical modifications of sgRNAs [227], [228].
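The factors named above (GC content and mismatches between an sgRNA and candidate sites) can be illustrated with a minimal sketch. It computes the GC fraction of a guide and scans the forward strand of a sequence for sites followed by an NGG PAM that differ from the guide by at most a chosen number of mismatches. The function names, the NGG assumption, the mismatch cutoff and the sequences are all illustrative assumptions; real off-target prediction tools also scan the reverse strand and weight mismatches by position.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_mismatches(guide, site):
    """Number of positions at which guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def find_candidate_sites(guide, genome, max_mismatches=3):
    """Return (position, mismatches) for forward-strand sites next to an NGG PAM."""
    n = len(guide)
    genome = genome.upper()
    hits = []
    # A site occupies genome[i:i+n]; its 3-nt PAM occupies genome[i+n:i+n+3].
    for i in range(len(genome) - n - 2):
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = count_mismatches(guide, genome[i:i + n])
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Hypothetical guide and sequence, for illustration only.
guide = "GACGTTACCGGATTCAGGCT"
off_target = "AACGTTACCGGATTCAGGCC"  # differs from the guide at two positions
genome = "TTT" + guide + "TGG" + "AAAA" + off_target + "AGG" + "CC"

print(gc_content(guide))
print(find_candidate_sites(guide, genome))
```

A perfect match is reported with zero mismatches, while near matches next to a PAM are the candidate off-target sites that a design step would try to avoid.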

(2)

Modification of Cas9 protein

The interaction between Cas9 and DNA affects the stability of the DNA-Cas9/sgRNA complex as well as its tolerance to mismatches [229]. Therefore, high-fidelity SpCas9 variants have been developed by introducing amino acid substitution(s) into the Cas9 protein in order to destabilize the functional structure of the CRISPR complex [230]. Researchers have developed several highly effective Cas9 mutants, including high-fidelity Cas9 (SpCas9-HF1), enhanced specificity Cas9 (eSpCas9), hyper-accurate Cas9 (HypaCas9), etc. [231], [232], [233]. All of them can significantly reduce off-target effects while retaining robust on-target cleavage activity.

(3)

Adoption of double nicking strategy

Recently, a double-nicking strategy has been developed to minimize off-target effects. It employs two catalytically mutated Cas9-D10A nickases and a pair of sgRNAs to nick each strand of the target DNA, thus forming a functional double-strand break [234]. Additionally, the fusion protein generated by combining dCas9 with the FokI nuclease has also been shown to reduce off-target effects [235]. Only when two fusion protein monomers are close enough to each other to form a dimer can they perform the cleavage function [235]. This strategy could greatly reduce DNA cleavage at non-target sites.

(4)

Anti-CRISPRs

Off switches for the CRISPR-Cas9 system were first discovered by Pawluk et al. in 2016. They identified three naturally occurring protein families, named anti-CRISPRs, which can specifically inhibit the CRISPR-Cas9 system of Neisseria meningitidis [236]. Later, Rauch et al. discovered four unique type IIA CRISPR-Cas9 inhibitor proteins encoded by Listeria monocytogenes prophages, two of which (AcrIIA2 and AcrIIA4) can block SpCas9 when assayed in Escherichia coli and human cells [237]. Recently, Doudna et al. discovered two broad-spectrum inhibitors of the CRISPR-Cas9 system (AcrIIC1 and AcrIIC3) [238]. Therefore, in order to reduce off-target effects, anti-CRISPRs could be used to prevent the continuous expression of Cas9 protein in the cells to be edited.

(5)

Others

The concentration of Cas9/sgRNA can also affect the frequency of off-target mutations [239]. Thus, the optimal concentration of Cas9 and sgRNA needs to be determined in a preliminary experiment. Besides, the formulation of CRISPR-Cas9 can affect the frequency of off-target mutations as well. Cas9 nucleases can be delivered into target cells in three different forms: DNA expression plasmid, mRNA or recombinant protein [240]. Currently, the use of Cas9/sgRNA ribonucleoprotein complexes (Cas9-RNPs), which are composed of purified Cas9 protein in combination with sgRNA, is becoming more and more widespread. It was found that delivery as plasmid usually produces more off-target mutations than delivery as RNPs, since RNPs skip the Cas9 transcription and translation stages and the CRISPR-Cas system is therefore active for a shorter time [241], [242].

How to effectively deliver CRISPR-Cas components to specific cells, tissues and organs for precisely directed genome editing is still a major problem in gene therapy. Ideal delivery vectors should be non-toxic, well targeted, highly efficient, low in cost and biodegradable [35], [156]. At present, three main types of method are employed to deliver CRISPR-Cas components: physical, viral and non-viral [243]. Physical methods, including electroporation, microinjection and mechanical cell deformation, are the simplest way to deliver CRISPR-Cas components. They are simple and efficient, can also improve the expression of genes, and are widely applied in in vitro experiments [243], [244]. In addition, viral vectors, such as adenovirus, adeno-associated virus (AAV) and lentivirus vectors, are widely used for both in vitro/ex vivo and in vivo delivery due to their high delivery efficiency. They are commonly used for gene delivery in gene therapy, and some of them have been approved for clinical use [245], [246]. However, the safety of viral vectors is still a major problem that needs to be solved in pre-clinical trials. Therefore, researchers have turned their attention to non-viral vectors, for instance, liposomes, polymers and nanoparticles [247]. Owing to their safety, availability and cost-effectiveness, they are becoming a hotspot for the delivery of CRISPR-Cas components [248].

Since all these delivery methods have both advantages and disadvantages, it is necessary to design a complex of viral and non-viral vectors that combines the advantages of both. As research deepens, various carriers could be modified by different methods to increase delivery efficiency and reduce toxicity [249]. In addition, more novel vectors, such as graphene and carbon nanomaterials (CNMs), could also be applied in the delivery of CRISPR-Cas components [250], [251].

Since the components of CRISPR-Cas systems are derived from bacteria, the host immune response to the Cas gene and Cas protein is regarded as one of the most important challenges in clinical trials of CRISPR-Cas systems [156], [252]. It was found that in vivo delivery of CRISPR-Cas components can elicit immune responses against the Cas protein [252], [253]. Furthermore, researchers also found anti-Cas9 antibodies and anti-Cas9 T cells in healthy humans, suggesting pre-existing humoral and cellular immune responses to Cas9 protein in humans [254]. Therefore, how to detect and reduce the immunogenicity of Cas proteins is a major challenge that will be faced in the clinical application of CRISPR-Cas systems. Researchers are trying to address this problem by modifying the Cas9 protein or using Cas9 homologues [255].

Recently, two independent research groups found that CRISPR-Cas mediated double-stranded breaks (DSBs) can activate the p53 signaling pathway [256], [257]. This means that genetically edited cells are likely to become potential cancer initiating cells, and clinical treatment with CRISPR-Cas systems might inadvertently increase the risk of cancer [256], [257], [258]. Although there is still no direct evidence to confirm the relationship between CRISPR-Cas mediated genome editing and carcinogenesis, these studies once again give a warning on the application of CRISPR-Cas systems in gene therapy. It reminds us that there is still a long way to go before CRISPR-Cas systems could be successfully applied to humans.

CRISPR-Cas mediated genome editing has attracted much attention since its advent in 2012. In theory, each gene can be edited by CRISPR-Cas systems, even genes in human germ cells [259]. However, germline gene editing is forbidden in many countries including China, for it could have unintended consequences and bring ethical and safety concerns [260].

However, in March 2015, a Chinese scientist, Junjiu Huang, published a paper about gene editing in human tripronuclear zygotes in the journal Protein & Cell, which brought the ethical controversy over human embryo gene editing to a climax [261]. Since then, genome editing has been challenged on ethical and moral grounds, and the legal regulation of genome editing has triggered heated discussion all around the world.

Then, on Nov. 28, 2018, the day before the opening of the second international human genome editing summit, Jiankui He, a Chinese scientist from the Southern University of Science and Technology, announced that a pair of gene-edited babies, named Lulu and Nana, had been born healthy in China that month. They are the world's first gene-edited babies, whose CCR5 gene has been modified, making them naturally resistant to HIV infection after birth [262]. The announcement provoked shock, even outrage, among scientists around the world, causing widespread controversy over the application of genome editing.

Society was shocked by this news, for it involves genome editing in human embryos whose changes can be propagated to future generations. It triggered a chorus of criticism from the scientific community and raised concerns about ethics and safety in the use of genome editing. Scientists therefore called on the Chinese government to investigate the matter fully and establish strict regulations on human genome editing. A global supervisory system is also needed to ensure that genome editing of human embryos moves ahead safely and ethically [263].

Since CRISPR-Cas mediated genome editing technologies have provided an accessible and adaptable means to alter, regulate, and visualize genomes, they are thought to be a major milestone for molecular biology in the 21st century. So far, CRISPR-Cas systems have been broadly applied in gene function analysis, human gene therapy, targeted drug development, animal model construction and livestock breeding, which fully prove their great potential for further development. However, there are still some limitations to overcome in the practical applications of CRISPR-Cas systems, and great efforts still need to be made to evaluate their long-term safety and effectiveness.

Visit link:
CRISPR-Cas systems: Overview, innovations and applications in human ...
