
Developmental progression of DNA double-strand break repair deciphered by a single-allele resolution mutation … – Nature.com

ICP: an integrated pipeline for classifying CRISPR/Cas9 induced mutant alleles

We developed an integrated bioinformatic tool, ICP (Integrated Classifier Pipeline), to parse complex DSB repair outcomes induced by CRISPR/Cas9 and to automatically call experimental errors generated during NGS library preparation and sequencing. ICP comprises two modules: 1) a Nucleotide Position Classifier (NPClassifier), and 2) a Single Allele-resolution Classifier (SAClassifier). We employed these two complementary sequence analysis modules in tandem to enable in-depth interpretation of deep sequencing data at single-allele resolution (Fig.1a–c, see Methods section for a detailed description of the ICP tools). In line with the unique DNA signatures generated by distinct DSB repair pathways, we categorized the repair products into four major categories. Alleles with a deletion only on the PAM-distal side (the PAM-proximal side being protected by Cas9 protein after cleavage), a common category, were termed PEPPR class mutations (PAM-End Proximal Protected Repair, PEPPR)41,42. While single strand cleavage by the Cas9 RuvC domain can also nick the non-complementary strand at locations beyond the canonical site between the 6th and 7th nucleotide upstream of the PAM sequence, we restrict our analysis here to the majority of cases wherein Cas9 cleavage generates blunt DSB ends, to simplify the robust classification scheme developed in this study43,44,45. Mutant alleles judged to be generated by directly annealing 2bp microhomology sequences spanning the gRNA cleavage site were assigned to the MMEJ class (again acknowledging that such alleles can also be generated with 1bp microhomology sequences, which, however, are not readily amenable to the semi-automated analysis we developed)46,47,48, while pure deletion alleles not belonging to either the PEPPR or MMEJ categories were classified as DELET class mutations. The remaining alleles, comprising insertion-only alleles and indels (deletion plus insertion), were categorized as insertion class (INSRT) mutations (Fig.1b).
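The four-way scheme above can be illustrated with a minimal Python sketch. This is not the ICP code: the `Allele` record, the function names, the reference orientation (protospacer followed by the PAM, so the PAM-distal side lies to the left of the cut index) and the exact boundary rules are all assumptions made for illustration. In particular, the sketch treats a deletion as PEPPR only when it ends exactly at the cut, and it checks MMEJ before PEPPR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allele:
    """Hypothetical record of one aligned allele (not an ICP data type)."""
    del_start: int      # first deleted reference index (0-based)
    del_end: int        # one past the last deleted reference index
    inserted: str = ""  # bases inserted at the junction, if any

def junction_repeat(ref, start, end):
    """Count bases shared by the two deletion junctions (microhomology).

    Returns (bwd, fwd): how far the deletion could be shifted left or
    right without changing the resulting sequence.
    """
    fwd = 0
    while end + fwd < len(ref) and ref[start + fwd] == ref[end + fwd]:
        fwd += 1
    bwd = 0
    while start - 1 - bwd >= 0 and ref[start - 1 - bwd] == ref[end - 1 - bwd]:
        bwd += 1
    return bwd, fwd

def classify(ref, cut, allele, min_mh=2):
    """Assign WT, INSRT, MMEJ, PEPPR or DELET under the stated assumptions."""
    if allele.inserted:
        return "INSRT"  # insertion-only or deletion plus insertion
    if allele.del_end <= allele.del_start:
        return "WT"     # nothing deleted, nothing inserted
    bwd, fwd = junction_repeat(ref, allele.del_start, allele.del_end)
    # Assumption: the deletion "spans" the cut if any equivalent placement
    # of the deleted interval contains or touches the cut index.
    spans_cut = allele.del_start - bwd <= cut <= allele.del_end + fwd
    if bwd + fwd >= min_mh and spans_cut:
        return "MMEJ"
    # Assumption: PEPPR deletions end exactly at the cut, leaving the
    # PAM-proximal side (indices >= cut) untouched.
    if allele.del_end == cut:
        return "PEPPR"
    return "DELET"

if __name__ == "__main__":
    reference = "TTGACCATGCAGTCCATGAATGG"  # toy sequence, not a real target
    for allele in (Allele(4, 13), Allele(9, 11), Allele(10, 14), Allele(9, 11, "GG")):
        print(allele, classify(reference, 11, allele))
```

The cut position is passed in as a parameter rather than derived from the PAM, so the sketch takes no position on how the cleavage site is counted.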

The process of DSB repair pattern profiling consists of preparing an NGS library (a), classifying the resulting parsed alleles (b) and displaying processed alleles by rank order and class of mutations (c). a NGS library preparation: Genomic DNA from F1 test flies carrying both Cas9 and gRNA expressing cassettes either maternally (dark blue bars) or paternally (red bars, or progeny from other designated crosses) is subjected to targeted PCR amplification with primers containing Illumina compatible adapters at the 5′ end to detect somatic indels. The gray rectangle represents a short region of genomic DNA containing a Cas9/gRNA target: the purple circle depicts Cas9 protein and the sky-blue line is the gRNA. b Classification: Raw NGS data are subjected to the NPClassifier to parse alleles into specific primary categories required for building allelic dictionaries used by the SAClassifier. Four major indel groups are categorized: PEPPR (PAM-End Proximal Protected Repair, sky-blue), MMEJ (Microhomology Mediated End-Joining, dark pink), DELET (deletion, any deletions that do not belong to PEPPR or MMEJ, orange) and INSRT (insertion, including alleles with only inserted nucleotides or with both deletions and insertions, purple). The 24-nt short PEPPR, MMEJ and DELET dictionaries are used for a more accurate classification and error calling by binning together all alleles with the same seed region that match primary allelic entries in the SAClassifier dictionaries. c DSB repair pattern visualization: intuitive rendering of the processed raw sequence data as an output of rank ordered classes of alleles. Allelic classes derived from NGS sequencing of individual flies or mosquitoes are displayed by their ranked frequency (allele landscape) and repair pattern fingerprints (color-coded by categories).

Briefly, raw reads generated from deep sequencing were subjected to a preliminary categorization using the NPClassifier, which recognizes the relative positions of editing start- and end-points flanking the Cas9 cleavage site and then generates a collection of a priori alleles for each category. These primary outputs (MMEJ and DELET) were used for building full-length standard comprehensive dictionaries listing all observed mutations and derived 24-nt short dictionaries (with the same seed region flanking the Cas9 cleavage site) as inputs of the SAClassifier. In addition, a synthetic PEPPR dictionary was built by iteratively increasing the length of deletions by a single nucleotide distal to the PAM site, excluding alleles belonging to the MMEJ category. By fishing the raw reads with 24-nt dictionaries, we were able to automatically recognize reads that also contained experimentally generated errors (e.g., from PCR amplification), which usually are located outside of the narrow 24-nt short dictionary window, thereby assigning such composite alleles to correctly matched root alleles (Fig.1b). These two iteratively employed ICP classification tools provide a robust and precise classification of CRISPR/Cas9 induced DSB repair outcomes. Next, we developed a user-friendly interface to visualize processed allelic category information in the form of rank ordered allelic landscape plots and repair pattern fingerprints (color-coded DSB repair categories), both of which are sorted by read frequency (Fig.1c). These intuitively accessible data outputs are far more informative and discriminating than the unprocessed primary DNA sequence reads (e.g., compare the seemingly idiosyncratic raw lesions depicted in Fig.2a to the obviously unique processed and concordant replicate patterns shown in Fig.2b, c). The ICP was thus employed to visualize results in all the following experiments.
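The dictionary-based "fishing" step can be sketched as follows. This is an illustration only, with hypothetical function names. It assumes that each 24-nt seed is centered on the repair junction (12 nt on either side), a detail the text above does not specify, and that the reference is oriented with the PAM-distal side to the left of the cut index. For brevity it omits the step that excludes MMEJ alleles from the synthetic PEPPR dictionary.

```python
def peppr_seed_dictionary(ref, cut, max_del, half=12):
    """Build seeds for synthetic PAM-distal deletions of length 1..max_del.

    Each seed joins `half` bases left of the deletion to `half` bases
    right of the cut, i.e. a window of 2 * half nt around the junction.
    Returns a mapping from seed sequence to deletion length.
    """
    seeds = {}
    for length in range(1, max_del + 1):
        left_end = cut - length
        if left_end < half or cut + half > len(ref):
            break  # not enough flanking sequence for a full seed
        seed = ref[left_end - half:left_end] + ref[cut:cut + half]
        seeds.setdefault(seed, length)  # keep the shortest deletion on ties
    return seeds

def assign_reads(reads, seeds):
    """Count reads per deletion length by exact seed lookup.

    A read matches if it contains the seed anywhere, so mismatches that
    fall outside the seed window do not prevent assignment.
    """
    counts = {}
    for read in reads:
        for seed, length in seeds.items():
            if seed in read:
                counts[length] = counts.get(length, 0) + 1
                break
    return counts

if __name__ == "__main__":
    # Toy 60-nt reference, not a real amplicon; the cut index is arbitrary.
    reference = "ATGCGTACCTGAAGTCCGATAGGCTTACGAACTGGATCCGTTAGCAATCGGTACCAGTTG"
    cut_index = 30
    clean = reference[:25] + reference[30:]  # 5-nt PAM-distal deletion
    noisy = "T" + clean[1:]                  # same allele plus a distant error
    dictionary = peppr_seed_dictionary(reference, cut_index, max_del=25)
    print(assign_reads([clean, noisy, reference], dictionary))
```

Matching on a short window rather than the full read is what lets a read carrying an unrelated sequencing or PCR error still be binned with its root allele; an unedited read matches no seed and is left unassigned.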

a Examples of the top five somatic indels from individual flies derived from split-drive crosses in which the Cas9 transgene is inherited either maternally (Maternal-S, left) or paternally (Paternal-S, right), but separately from a cassette carrying the gRNA transmitted by the other parent. Purple stars indicate the color codes for mutation categories (dark pink: MMEJ, sky-blue: PEPPR, orange: DELET, purple: INSRT) and the dark green star indicates the separate raw sequence color coded for the four nucleotides A, T, G, and C. The red bar indicates Paternal-S crosses while the dark blue bar represents Maternal-S crosses. b Landscapes of the top 50 alleles ranked by reads ratio. All six sequenced individual flies are plotted together, with dark blue lines plotting the data from Maternal-S crosses and red lines the data from Paternal-S crosses. The y-axis presents the fraction of reads for a given allele and the x-axis depicts the top 50 alleles according to rank order by read frequency. c DSB repair fingerprints for three representative sequenced individual flies from each cross. The x-axis is the same as depicted in panel b. Both panels show the top 50 ranked alleles. d Bar plots of Class Fraction for the top 50 alleles. Color codes for classes are as in panels a and c. Correlation analysis of two out of three replicates from the Maternal-S (e) or Paternal-S (f) cross. r² values and p-values are indicated. Source data for panels b, d, e and f are provided as a Source Data file.

Since DSB repair outcomes have been found to vary considerably as a function of Cas9 or gRNA source and level49,50, we employed the ICP platform to parse somatic indels generated by co-expressing Cas9 and gRNAs in somatic cells of fruit flies (Drosophila melanogaster) and mosquitoes (Anopheles stephensi) in various configurations associated with gene-drive systems. We first applied ICP analysis to a split gene-drive system inserted into the Drosophila pale (ple) gene that is designed to detect copying of a gene cassette in somatic cells. This element, referred to as a CopyCatcher (pleCC), carries a gRNA targeting the first intron of the Drosophila ple locus49. In the current study, we make use of low-level ectopic somatic Cas9 expression (which is substantial and broad for vasa-Cas9) to analyze DSB repair patterns across diverse cell types in F1 progeny carrying both Cas9 and gRNAs51,52,53. Because cells actively undergoing meiosis make up only a small fraction of dividing cells in an adult fly, the mutational effects of Cas9/gRNA cleavage in such F1 individuals largely reflect the somatic action of these nuclease complexes. We thus conducted several alternative crossing schemes to assess the somatic mutagenic activity of vasa-Cas9 and gRNA components when transmitted to F1 individuals in various configurations from their F0 parents: 1) Maternal Split (Maternal-S, females carrying vasa-Cas9 crossed with males carrying pleCC); 2) Paternal Split (Paternal-S, males carrying vasa-Cas9 crossed with females carrying pleCC); 3) Maternal Full (Maternal-F, females carrying both the pleCC and vasa-Cas9 transgenes); and 4) Paternal Full (Paternal-F, males carrying both the pleCC and vasa-Cas9 transgenes)49. Comparative ICP analysis revealed several striking and consistent differences between the prevalent somatic mutations generated in individual progeny in each of these different crossing schemes. 
In the case of Paternal-S crosses, the resulting mutations were dominated by PEPPR alleles (4 out of top 5 alleles in Fig.2a, Fig. S1a, and 70% of the top 50 alleles as rendered in rank ordered allelic landscapes and color coded DSB repair fingerprints in Fig.2c). In contrast, Maternal-S crosses primarily generated MMEJ and INSRT indels (4 out of top 5 alleles were MMEJ, and at least 50% of the top 50 alleles were INSRT mutations, Fig.2a, c, Supplementary Fig. S1a). These differences were also evident in the steeper allelic landscape curves that were generated from the Maternal-S versus Paternal-S crosses (Fig.2b) as characterized by the initial portion of the curve depicting the 5 most frequent alleles (i.e., the dark blue lines in Fig.2b are all above the red lines for the 5 most frequent alleles). We further quantified differences in allelic profiles between crosses by bar plots displaying the summed proportions of the different allelic classes (summing the percentages of all alleles from each category) which we termed as Class Fraction (Fig.2d). This analysis revealed that INSRT alleles were generated at a significantly higher frequency in Maternal-S crosses, while the PEPPR class dominated among the top 50 alleles in the reciprocal Paternal-S crosses (Fig.2d).
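The rank-ordered landscape and the Class Fraction described above amount to sorting alleles by read share and summing the shares per class. The sketch below is illustrative only; the function names and the toy counts are hypothetical. It also assumes that each allele's fraction is taken relative to all reads in the sample rather than renormalized within the top alleles, which the text does not state.

```python
from collections import defaultdict

def rank_alleles(read_counts, top_n=50):
    """Return the top_n alleles as (allele, fraction of all reads), ranked."""
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    return [(allele, count / total) for allele, count in ranked[:top_n]]

def class_fraction(ranked, allele_class):
    """Sum the read fractions of the ranked alleles within each class."""
    totals = defaultdict(float)
    for allele, fraction in ranked:
        totals[allele_class[allele]] += fraction
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical counts and class labels for a single sample.
    counts = {"a1": 50, "a2": 30, "a3": 15, "a4": 5}
    classes = {"a1": "PEPPR", "a2": "MMEJ", "a3": "PEPPR", "a4": "INSRT"}
    landscape = rank_alleles(counts, top_n=3)
    print(landscape)
    print(class_fraction(landscape, classes))
```

Plotting the ranked fractions against rank gives the allelic landscape; coloring each rank by its class gives the fingerprint.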

A striking feature of the highly divergent DSB repair signatures generated from maternally versus paternally inherited Cas9 sources was the remarkable reproducibility of their DSB repair fingerprints observed across three individual replicates from each cross (Fig.2e, f). We performed a correlation analysis within replicates by extracting 23 common alleles across all six sequenced flies and plotted the resulting allelic profiles together relative to an arbitrarily chosen Paternal-S replicate as reference (bold red line, Supplementary Fig. S1b). We observed that the frequency distributions of these 23 common alleles were much more similar to each other in intra-cross comparisons than in inter-cross comparisons (Supplementary Fig. S1b). This trend was also revealed by higher correlation coefficients for intra-cross comparisons than for inter-cross comparisons based on allelic read ratios (Supplementary Fig. S1c–g). Conspicuous defining differences between the Maternal-S and Paternal-S fingerprints were also evident based on the Class Fraction index (Fig.2d). In summary, a variety of differing statistical measurements all underscore the robust, consistent similarities shared among allele profiles generated from individual replicates of the same cross and the clearly distinctive DSB repair pattern fingerprints generated by maternal versus paternal Cas9 inheritance.
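The replicate comparison can be sketched as two steps: intersect the allele sets of all samples, then correlate read ratios over the shared alleles. The code below is a minimal illustration with hypothetical names and made-up read ratios. It assumes a Pearson correlation, which the text does not specify, and it does not compute p-values.

```python
import math

def common_alleles(samples):
    """Alleles present in every sample (each sample maps allele -> read ratio)."""
    shared = set(samples[0])
    for sample in samples[1:]:
        shared &= set(sample)
    return sorted(shared)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(sample_a, sample_b, alleles):
    """Squared correlation of read ratios over the given alleles."""
    x = [sample_a[allele] for allele in alleles]
    y = [sample_b[allele] for allele in alleles]
    return pearson_r(x, y) ** 2

if __name__ == "__main__":
    # Hypothetical read ratios for three samples.
    rep1 = {"x": 0.10, "y": 0.20, "z": 0.40, "only_rep1": 0.30}
    rep2 = {"x": 0.20, "y": 0.40, "z": 0.80}
    other = {"x": 0.40, "y": 0.20, "z": 0.10}
    shared = common_alleles([rep1, rep2, other])
    print(shared, r_squared(rep1, rep2, shared), r_squared(rep1, other, shared))
```

Restricting the comparison to shared alleles avoids treating an allele that is merely absent from one sample as a zero-frequency observation.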

We extended our ICP analysis of mutant allele profiles generated in the ple locus to the more extreme Maternal-F (dark blue lines) and Paternal-F (red lines) cross schemes to assess the role of inheritance patterns when both the source of vasa-Cas9 and gRNA originated from a single parent49. Again, we observed highly dominant alleles in the Maternal-F crosses, clearly evident in allelic landscapes, that deviated markedly from those produced by the Paternal-F crosses, which produced more evenly distributed spectra of alleles spread across a broad range of allelic frequencies (Fig.3a, b). As expected based on these large differences, the repair pattern fingerprints generated from different crosses produced clearly distinguishable patterns of mutation classes, which was particularly evident when considering the Class Fraction (Fig.3e). Cumulatively, these data suggest that the developmental timing and/or levels of Cas9 expression (maternal, early zygotic, or late zygotic) are likely to play a key role in determining which particular DSB repair pathway or sub-pathway is engaged in resolving DSBs.

a–d Unique DSB repair signatures obtained using different Cas9 sources are displayed with the top 20 alleles (landscapes and DSB repair pattern fingerprints). NGS sequencing was performed on pools of 20 adults. a vasa-Cas9 inserted in the X chromosome and the pleCC element carrying the gRNA were both carried by either female or male parents, mimicking a full-drive configuration (Maternal-F and Paternal-F crosses with vasa-Cas9). b vasa-Cas9 split crosses wherein the Cas9 transgene was transmitted either maternally (Maternal-S) or paternally (Paternal-S) and the pleCC gRNA bearing cassette was carried by the other parent. Same Maternal-S versus Paternal-S crosses as in panel b, but using either actin-Cas9 (c) or nanos-Cas9 (d) sources. e Class Fraction Index for crosses in panels a–d. Bars are shaded according to allelic class color codes. f UMAP embedding for visualizing a common set of 59 alleles shared between the four split crosses with actin-Cas9 and vasa-Cas9. Dots represent single alleles, and the colors indicate the allelic category. g Distribution of the top 20 alleles generated from single flies derived from a cross between parents carrying the Spo11 gRNA and vasa-Cas9 elements (Paternal-S cross: red lines and Maternal-S cross: dark blue lines). The top plot shows the allelic landscape for the top 20 alleles from all six sequenced single flies and the bottom shows three examples of the classification fingerprints (with all allelic classes condensed into single rows) color coded for the allele categories. h Class Fraction Index for Spo11 gRNA crosses. i, j Correlation analysis between two replicates from each cross. Dark blue is Maternal-S and red is Paternal-S. r² values and p-values are indicated. Source data are provided as a Source Data file.

Previous studies have shown that the relative frequencies of NHEJ versus HDR events depend on the source of Cas9 both in terms of timing and level of expression49,50,54. We thus wondered whether ICP analysis would similarly reveal distinct DSB repair outcomes for two additional Cas9 sources (actin-Cas9 and nanos-Cas9; Cas9 expression levels: actin-Cas9>vasa-Cas9>nanos-Cas9) inserted at the same locus as vasa-Cas9 (Fig.3c, d)49.

As was observed for the vasa-Cas9 source, the actin-Cas9 and nanos-Cas9 sources both generated differing allelic landscapes and repair pattern fingerprints when transmitted maternally versus paternally, which also were readily distinguishable from each other (Fig.3b–d). Mirroring results with the vasa-Cas9 source, significant differences between the proportions of PEPPR versus MMEJ class alleles among the top 20 alleles were observed in Maternal-S versus Paternal-S crosses for actin-Cas9. For the nanos-Cas9 source, both the MMEJ and INSRT categories were particularly reduced in Paternal-S crosses, although this latter sex-based difference was not as dramatic as for the other Cas9 sources (presumably due to its more germline restricted expression, Fig.3d)55,56. Overall, the general trend once again indicated that maternally inherited Cas9 sources biased somatic DSB repair outcomes in favor of MMEJ and INSRT classes over PEPPR alleles, while paternal transmission of Cas9 generated mutant alleles dominated by PEPPR class alleles (Fig.3e).

Based on the overall similarities of the DSB repair outcomes observed for actin-Cas9 and vasa-Cas9 crosses, we extracted a set of 59 shared alleles that appeared in all sequenced samples and performed UMAP (Uniform Manifold Approximation and Projection) analysis to cluster these common alleles, condensing them into 5 distinct clouds (Fig.3f). Clouds 1, 2, 3, and 4 were dominated by alternative subsets of PEPPR alleles distinguished primarily by the length of deletion (the average deletion sizes were 24bp, 40bp and 31bp for the PEPPR Mini, Midi-I and Midi-II clusters, respectively, and longer than 55bp for the PEPPR Maxi cluster), while cloud 5 was predominantly comprised of MMEJ alleles. We reviewed raw sequences for the few trans-cloud assigned alleles and discovered that some of these alleles could be interpreted as having been generated from a second round of repair using one of the core alleles from the same cloud as a repair template. For example, we inferred that allele 58 was actually a PEPPR deletion with several nucleotides potentially having been back-filled. This result is consistent with the previous report that alleles with insertions or complex repair outcomes would be generated from several rounds of synthesis following the generation of a primary deletion event57,58. Assessing the impact of such potential complexities, which we ignore here for simplicity, will require additional future scrutiny. The remainder of these alleles, such as allele 44, could be accounted for by variability in the exact Cas9 cleavage site (between the 6th and 7th nucleotides counting from the PAM side), with an extra nucleotide being deleted on the PAM-proximal side of the gRNA cleavage site (Fig.3f)43,59,60. Since both of these outcomes were rare, the second-order origins we hypothesize for such outlier alleles further validate the robust nature of our ICP platform in recognizing core primary categories of DNA repair outcomes. 
We also analyzed the common 59 alleles by plotting their read frequencies and observed that the differences between the allelic landscapes for the two reciprocal crosses for each Cas9 source mirrored the trend in Fig.3a–d described above (Supplementary Fig. S2a, b). Cumulatively, these concordant findings support a key role for the parental origin of Cas9 serving as a major determinant of the DSB repair outcome.

Another obvious determinant of DSB repair outcome is the local genomic DNA context. We assessed the general applicability of the ICP by employing it to classify alleles generated by gRNAs targeting four other loci: prosalpha2 (pros2), Rab11, Spo11 and Rab5 using the vasa-Cas9 source61. Paralleling our findings from the ple locus, we observed divergent allelic profiles between Paternal-S and Maternal-S crosses with distinct dominant mutation categories based on the specific target site. For example, the predominant allelic classes generated at the Spo11, pros2 and Rab11 loci were PEPPR and INSRT alleles, while PEPPR and MMEJ alleles were most prevalent for the Rab5 targets (Fig.3g, h, Supplementary Figs. S3–6). Among these four targets, Spo11 displayed the greatest divergence in the prevalence of top alleles generated from Maternal-S and Paternal-S crosses (reminiscent of the fine distinctions parsed for the ple locus, Fig.3g). We nonetheless still observed high correlation coefficients between two replicates within the same cross and significantly lower correlation coefficients associated with inter-cross comparisons between maternal versus paternal Cas9 inheritance (averaged r²=0.33, Fig.3i, j, Supplementary Fig. S3). We also observed distinctive sex-specific DSB repair patterns for Cas9 transmission at the pros2 and Rab11 gRNA target sites (Supplementary Figs. S4 and S5), although these differences were less pronounced than for the ple and Spo11 gRNAs, while for Rab5, the allelic patterns were similar for both maternal and paternal crosses (Supplementary Fig. S6, see Supplementary Discussion Section). In summary, these data support the broad utility of the ICP pipeline to deliver unique, discernible locus-specific fingerprints associated with distinct parental inheritance patterns of Cas9 that generalize to other genomic targets.

Given the strong Cas9 inheritance-dependent distinctions observed for allelic profiles resulting from maternal versus paternal Cas9/gRNA-induced DSBs in Drosophila, we wondered whether similar DSB repair pattern fingerprints could be discerned in mosquitoes carrying a linked full gene-drive in which the Cas9 and gRNA transgenes are carried together in a single cassette62,63,64,65. We examined this possibility using the transgenic An. stephensi Reckh drive, which is inserted into the kynurenine hydroxylase (kh) locus63. Because of the Cas9 and gRNA linkage, the Reckh drive behaves like the Maternal-F and Paternal-F cross configurations described above, in which all CRISPR components are carried by a single parental sex63.

Consistent with our observations in flies, the Reckh Maternal-F crosses generated a high proportion of indels that were dominated to a remarkable extent by single mutant alleles with read percentages exceeding 85% for each of the three single mosquitoes sequenced, followed by a long distributed tail of lower frequency alleles. The highly biased nature of the replicate allelic distributions is readily revealed by a virtual step-function in their rank-ordered allelic landscapes (Fig.4a). In striking contrast, over 50% of the alleles recovered from the Paternal-F crosses were wild-type (WT), which presumably reflects alleles that either remained uncut or DSB ends that were rejoined accurately without further editing. The highly predominant WT allele was followed by a very shallow tail distribution of low frequency mutant alleles in the paternal rank-ordered allelic landscapes (Fig.4a). This dramatic difference in allelic profiles between Maternal-F versus Paternal-F crosses was also clearly displayed by the class-tally bars color coded for the different fractions of each class (black = WT) located beneath each landscape (Fig.4a). Here, the Class Fraction Index measure indicated that Maternal-F crosses generated a greater proportion of INSRT alleles in the first two samples, while Paternal-F crosses produced a high frequency of PEPPR alleles (Fig.4b). As in the case of allelic profiles recovered at the ple and Spo11 loci in flies, common sets of highly correlated mutant DSB repair fingerprints were observed across all three replicates of the Paternal-F Reckh crosses (Supplementary Fig. S7). A similar comparison of allelic distributions in the maternal crosses was precluded by virtue of the single highly dominant alleles and corresponding paucity of lower frequency events, the nature of which varied greatly between replicates. We conclude that the high-resolution performance of the ICP platform in Drosophila can be generalized to other insects such as An. stephensi to robustly discern sex-dependent CRISPR transmission patterns resulting in distinct DSB repair outcomes.

a Rank-ordered landscapes of the top 50 alleles generated from NGS analysis of single mosquitoes. Colored bars with red dots indicate mutated alleles, and black bars with black dots indicate an unmutated WT allele. Middle panels: allelic class fingerprints color coded as in previous figures. Bottom bars: fraction of each allelic class, including WT (black), PEPPR (sky-blue), MMEJ (deep pink), DELET (orange) and INSRT (purple). Numbers indicate the percentage of the corresponding class. b Class Fraction Index for single mosquito sequencing data in panel a. c Developmental time-points for sample collections. d Kinetics of Cas9 mutagenesis generated by the Reckh gRNA. Lines represent the summed fraction of mutant alleles at each time-point. Dark-blue lines indicate maternal (Maternal-F) crosses and red lines paternal (Paternal-F) crosses. e DSB repair fingerprints at different timepoints. Samples were collected at the time points shown in panel c and 20 eggs, larvae, pupae or adults were pooled together for genomic DNA extraction and deep sequencing. The far left and far right panels indicate the Class percentages including WT alleles (black), displaying the proportion of each class at single time-points. Source data are provided as a Source Data file.

Given the dramatic differences we observed in the frequency and nature of somatic alleles generated by maternal versus paternal-sourced Cas9 in both flies and mosquitoes, we wondered whether the developmental timing of Cas9/gRNA expression (maternal=early? and paternal=late?) was the key determinant for these highly reproducible DSB repair fingerprints. We tested this hypothesis by assessing whether DSB repair fingerprints varied as a function of developmental progression, using a series of narrowly timed sample collections of F1 mosquitoes produced from crosses of Reckh parents to WT, and assayed DSB repair spectra using the ICP pipeline at 12 different developmental stages (Fig.4c; note that, as homozygous Reckh transgenic mosquitoes were crossed to WT, all F1 progeny carried one Reckh allele and one WT receiver allele, the latter of which was amplified for DSB repair analysis). We tracked a diminishing proportion of WT (presumably uncut) alleles and a corresponding increase in mutant alleles of various classes at each of the time points (Fig.4d). Strikingly, nearly half of the target alleles were edited in embryos by 30 minutes post-oviposition for both the Maternal-F and Paternal-F Reckh crosses, which corresponds to early pre-blastoderm stages prior to the maternal-to-zygotic transition, suggesting a very early activity of Cas9 in mosquito embryos driven either by maternally inherited Cas9/gRNA complexes or potentially by very early zygotic expression of the Cas9 and gRNA components (Fig.4d)66. We also observed similarly frequent indels being generated as early as 30 minutes in flies expressing Cas9 (either maternally or paternally) with a gRNA targeting the pros2 locus, although the dynamics of Cas9 production are distinct in these two organisms (Supplementary Fig. S8a). Following this initial surge in target cleavage, we observed divergent trajectories in the accumulation of mutant alleles between maternal versus paternal lineages. 
As an overall trend, mutant alleles accumulated progressively in the Maternal-F lineage until virtually no WT alleles remained, while in the Paternal-F lineage, even at the endpoint of adulthood, approximately 60% of alleles remained WT, in line with our single time point experiments (Fig.4a, d, Supplementary Fig. S8b). As observed in the final distributions of adult alleles, progeny from Maternal-F crosses tended to be enriched for INSRT alleles over the entire developmental time course, while PEPPR alleles were more common in Paternal-F crosses, with pronounced accumulation of such alleles during later stages (Fig.4e). A finer scale analysis of the categories of mutant alleles generated over time revealed dynamic patterns of prevalent alleles during mosquito developmental stages (Fig.4e). For example, the proportion of MMEJ alleles peaked at the 2-hour and 4-hour time points (Fig.4e). Similarly, a split-drive expressing a gRNA targeting the Drosophila pros2 locus generated distinct temporal profiles of cleavage patterns in crosses from female versus male parents carrying the drive element (Supplementary Fig. S9).
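The kinetics described above reduce to one number per time point: the summed fraction of reads that are not WT. A minimal sketch follows, with a hypothetical function name and made-up counts; it assumes that WT reads are stored under the key "WT" and that every other key is a mutant allele.

```python
def mutant_fraction_series(timepoints):
    """Summed mutant-read fraction at each time point, in the order given.

    `timepoints` is a list of (label, counts) pairs, where counts maps an
    allele name to its read count and "WT" marks unedited reads.
    """
    series = []
    for label, counts in timepoints:
        total = sum(counts.values())
        wild_type = counts.get("WT", 0)
        series.append((label, (total - wild_type) / total))
    return series

if __name__ == "__main__":
    # Hypothetical pooled read counts at two collection times.
    data = [
        ("0.5h", {"WT": 50, "allele_A": 30, "allele_B": 20}),
        ("2h", {"WT": 25, "allele_A": 75}),
    ]
    for label, fraction in mutant_fraction_series(data):
        print(label, fraction)
```

Because each fraction is computed independently per time point from pooled individuals, a later value can be lower than an earlier one.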

One unexpected feature of the developmental variations in allelic composition we observed was that the proportion of WT alleles increased at certain time points (e.g., at 1 hour in the maternal cross and from 12 hours to day 1 (24h) in the paternal cross). These temporal fluctuations were also observed in flies expressing Cas9 and a pros2 gRNA at two hours after oviposition (Supplementary Figs. S8a and S9), suggesting that this phenomenon might reflect a generally relevant form of clonal selection for WT cells during pre-blastoderm stages. Such clonal selection might arise if mutant cells experienced negative selection at certain developmental stages. In the case of paternal transmission, one strong line of evidence supporting this WT clonal selection hypothesis is that in adults, the Reckh element is transmitted to over 99% of F1 progeny, indicating that nearly all target alleles in the germline must be WT. This high frequency of paternal germline transmission is also consistent with the high prevalence of WT alleles tallied at 12h in embryos derived from the paternal crosses (Fig.4e, see Supplementary Discussion Section for more in-depth consideration of this point). We analyzed the developmental distributions of 21 common alleles that were generated at all time-points (Supplementary Fig. S10a–e). Most of these common alleles belonged to the PEPPR class, while only five were INSRT alleles, despite the INSRT class overall being the most prevalent for both crosses, again suggesting that INSRT alleles have a higher diversity than other mutation categories (Supplementary Fig. S10a). Overall, this analysis is in line with our previous observation that Maternal-F crosses produced more INSRT alleles while Paternal-F crosses generated a preponderance of PEPPR alleles (Supplementary Fig. S10b).

Given the strong influence of maternal versus paternal origin of Cas9 on the resulting distributions of alleles characterized above by ICP analysis, we wondered whether such allelic signatures could be exploited for lineage tracing in randomly mating multi-generational population cages. We first examined ICP outputs from a controlled crossing scheme carried out over three generations with pleCC and Reckh gRNAs to derive allelic fingerprints distinguishing parents of origin, by identifying somatic alleles in the F1 generation and by assessing which of those alleles might be transmitted through the germline to non-fluorescent progeny (i.e., those not inheriting the pleCC or Reckh element) at the F2 generation (Fig.5a–d, Supplementary Fig. S11). As anticipated, in both pleCC and Reckh Maternal-F crosses, single dominant somatic alleles were observed in the F1 generation, with the top single allele representing more than 50% of all alleles (Fig.5a, c). Furthermore, all such predominant somatic mutant alleles, which precluded gene-cassette copying of the pleCC or Reckh drive elements in those F1 individuals, were transmitted faithfully through the germline to non-fluorescent F2 progeny with approximately 50% frequency. In addition, we observed marked differences in the other half of total reads in F2 progeny depending on the origin of Cas9/gRNA complexes. Thus, a distribution of multiple diverse low frequency mutations was generated when crossing F1 pleCC+ or Reckh+ females with WT males (presumably derived from F1 drive females having deposited Cas9/gRNA complexes maternally that then acted on the paternally sourced WT allele somatically in F2 individuals). In the reciprocal male cross, however, approximately 50% of all alleles remained WT (Fig.5b, d, Supplementary Fig. S12a–f). 
These findings support the hypothesis that the top somatic indels derived from maternal Cas9 sources were generated at very early developmental stages (possibly at the point of fertilization or shortly thereafter during the first somatic cell division), resulting in a single mutant allele being initially produced and then transmitted to every descendant cell including all germline progenitor cells49. With the paternal-sourced Cas9 and gRNA, arrays of variable somatic mutations were recovered, with the most prominent alleles accounting for fewer than 10% of the total alleles in F1 progeny (Fig.5b). Accordingly, paternally generated F1 somatic alleles were more randomly transmitted via the germline of individuals that failed to copy the gene cassette for either the pleCC or Reckh elements. As a result of this diversity of somatic F1 alleles, only occasionally were the most prevalent alleles also transmitted through the germline (e.g., individuals 1, 4 and 5 in Fig.5b, Supplementary Fig. S12g–l).

Primary DNA sequences of top single alleles and their percentages of the total alleles from six individual sequenced flies derived from ple gRNA Maternal-F (a) and Paternal-F (b) crosses. Gray bars indicate the location of the gRNA protospacer and red arrowheads are the associated PAM sites. The first row depicts the reference sequence covering the expected DSB cleavage site. Colored squares in the right column indicate the class to which a given allele belongs. The tables shown on the right of each allele show its frequency among all reads. Left columns of the table indicate frequencies of the somatic allele, and the right columns are the top germline mutant allele frequency obtained by sequencing F2 non-fluorescent progeny derived from the same F1 individuals whose top somatic allele is displayed in the left column (excluding WT alleles). Colored dots indicate different alleles, with the same color shared between two columns indicating that the same allele appeared as both the top somatic and germline indel from the same F0 founders. c, d Allele profiles generated by Reckh parents and progeny generated with the same crossing scheme as for the pleCC. c Tabulation of the Maternal-F cross. d Tabulation of the Paternal-F cross. e Crossing scheme for the Reckh cage trials. Three individual cages were seeded with 10 homozygous Reckh females, 90 WT females and 100 WT males for the maternally initiated lineage, while the paternally initiated cages were seeded with 10 homozygous Reckh males, 90 WT males and 100 WT females. At each of the following three generations, 10 Reckh+ females and 10 Reckh+ males were randomly collected for single mosquito deep sequencing. f Biased inheritance of Reckh was observed in the maternally seeded cages at generations 2 and 3, but not for the paternally seeded cages. Pink bars denote the fraction of sequenced individual mosquitoes inheriting Reckh from female parents, and cyan colored bars represent Reckh inheritance from the males.
Source data are provided as a Source Data file.

The Reckh element in mosquitoes performed similarly to the fly pleCC; however, Reckh F1 individuals displayed less frequent zygotic cleavage and a corresponding reduction in the diversity of resulting somatically generated mutations (>50% WT alleles remained, Paternal-F cross). Consistent with this limited number and array of somatic mutations in the F1 generation from the Paternal-F cross, NHEJ mutations were only rarely transmitted to the F2 generation, probably due to more germline-restricted expression of vasa-Cas9 in mosquitoes as compared to flies (Fig.5c, d). These results again suggest that cleavage and repair events were generated later during development in paternal crosses, resulting in a stochastic transmission of F1 somatic alleles to the germline, which were largely uncorrelated with the most prevalent allele present somatically in the F1 parent49. Taken together, these highly divergent sex-dependent DSB repair signatures suggested that such genetic fingerprints could be used to track parental history in the context of randomly mating multi-generation population cages.

Based on the highly dominant mutant indels (Maternal-F) versus WT (Paternal-F) alleles generated by the Reckh genetic element described above, we evaluated inheritance patterns of indels in multi-generational cages initiated by a 5% introduction of Reckh into WT populations either through maternal or paternal lineages in the F0 generation (Fig.5e). We randomly selected at least 20 fluorescence marker-positive mosquitoes (10 females and 10 males) for NGS analysis at generations 2 and 3, when the Reckh allele was still present at relatively low frequencies in the population and random mating was more likely to have taken place between Reckh/+ heterozygous and WT mosquitoes. Thus, we envisioned that the source of the Reckh allele could be tracked back to a male versus female parent of origin by examining whether a dominant WT allele was present (inherited paternally) or not (inherited maternally) (Fig.5e, f). Following this reasoning, we inferred a strong bias toward progeny inheriting the Reckh element from Reckh+ males mating with WT females during generations 2 and 3, rather than the reverse (i.e., female transmission of Reckh alleles), in the maternally seeded lineage. Indeed, in one maternally seeded replicate (cage 2, generation 3), 100% of the progeny had inherited the Reckh element from their fathers (Fig.5f). In contrast to the striking sex-specific transmission bias observed in maternally seeded cages, progeny from paternally seeded cages displayed more evenly distributed stochastic parental inheritance patterns (Fig.5f). These highly reproducible parent-of-origin signatures demonstrate the utility of ICP in allelic lineage tracking, which could be of great potential utility in evaluating alternative initial release strategies for gene-drive mosquitoes as well as post-release surveillance of gene-drives as they spread through wild target populations (see Discussion).
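The parent-of-origin reasoning used for the cage samples can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' procedure: the function names and the 0.4 cutoff for calling a "dominant WT allele" are hypothetical (the text reports only that roughly half of the alleles remained WT after paternal inheritance), and the per-mosquito WT fractions are invented.

```python
def parent_of_origin(wt_fraction, wt_cutoff=0.4):
    """Call the parent that transmitted the drive element to one sequenced individual.

    A dominant WT allele suggests paternal inheritance; its absence suggests
    maternal inheritance. wt_cutoff is a hypothetical threshold.
    """
    return "paternal" if wt_fraction >= wt_cutoff else "maternal"


def paternal_share(wt_fractions, wt_cutoff=0.4):
    """Fraction of sequenced individuals in one cage sample called as paternal."""
    if not wt_fractions:
        raise ValueError("no individuals sequenced")
    calls = [parent_of_origin(f, wt_cutoff) for f in wt_fractions]
    return calls.count("paternal") / len(calls)


# Hypothetical WT read fractions for four marker-positive mosquitoes from one cage.
cage_sample = [0.52, 0.48, 0.05, 0.55]
```

With these invented values, three of the four individuals would be called as having inherited the element from their fathers. In practice the cutoff would need to be calibrated against the controlled crosses described above.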

Another important challenge for deciphering DSB repair outcomes is to track both NHEJ and gene-cassette mediated HDR events within the same sample. Such a comprehensive genetic detection tool could have broad applications (see Discussion). For example, one important and non-trivial application is to follow the progress of gene-drives in a marker-free fashion as they spread through insect populations. Such dual tracking capability would address the potential concern that mutations eliminating a dominant marker for the gene-drive element could evade phenotype-based assessments of the drive process. Accordingly, we devised a three-step short-amplicon based deep sequencing (200-400 bp) strategy based on tightly linked colony-specific nucleotide polymorphisms distinguishing donor versus receiver chromosomes to detect copying of two CopyCatcher elements, pleCC and hthCC, from their chromosomes of origin (donor chromosome) to WT homologous (receiver chromosome) targets (Fig.6a)49. Notably, this strategy only amplified the inserted gene cassette on the donor chromosome and/or the cassette if it had copied onto the receiver chromosome. Thus, the measured allelic frequencies indicate the relative proportions of gene cassettes copied to the receiver chromosome versus those residing on the donor chromosome (Fig.6b displays the inferred somatic HDR frequency quantified from the three-step NGS sequencing protocol as well as indels quantified by our standard two-step NGS sequencing protocol; see Methods section for additional details).
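As a rough illustration of how cassette-bearing reads might be assigned to donor or receiver chromosomes from linked polymorphisms, consider the following Python sketch. The function names, SNP positions, and example reads are hypothetical, and the actual three-step protocol and quantification are those described in the Methods, which this sketch does not reproduce.

```python
def assign_chromosome(read, snps):
    """Assign a cassette-bearing read to its chromosome of residence.

    snps maps a 0-based read position to (donor_base, receiver_base).
    Reads matching neither haplotype at every site are reported as "mixed"
    (possible recombinants or sequencing errors).
    """
    if not snps:
        raise ValueError("at least one polymorphic site is required")
    donor_hits = sum(read[pos] == d for pos, (d, _) in snps.items())
    receiver_hits = sum(read[pos] == r for pos, (_, r) in snps.items())
    if donor_hits == len(snps):
        return "donor"
    if receiver_hits == len(snps):
        return "receiver"
    return "mixed"


def copied_cassette_fraction(reads, snps):
    """Share of unambiguous cassette reads that carry receiver polymorphisms."""
    calls = [assign_chromosome(r, snps) for r in reads]
    donor, receiver = calls.count("donor"), calls.count("receiver")
    if donor + receiver == 0:
        raise ValueError("no unambiguous cassette-bearing reads")
    return receiver / (donor + receiver)


# Hypothetical polymorphic sites and reads (illustration only).
example_snps = {2: ("A", "G"), 5: ("C", "T")}
example_reads = ["TTATTCGG", "TTATTCGG", "TTATTCGG", "TTGTTTGG", "TTATTTGG"]
```

Mixed reads are excluded from the denominator here, which is a design choice of the sketch. A fuller treatment would need to decide how to count the recombinant classes shown in Fig.6a.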

a Scheme for tracking gene-drive copying using NGS. Gray bars: genomic DNA, pink oval: Cas9 protein, sky-blue line: gRNA, colored asterisks: polymorphisms. Color coded rectangles represent four nucleotides. Four possible recombinants listed are generated by resolving Holliday junctions at different sites marked with black crosses. b NGS sequencing-based quantification of somatic HDR generated by pleCC in F1 progeny. Areas delineated by dotted lines indicate patches of cells in which somatic HDR copying events have taken place, shown either under bright field (upper) or the RFP fluorescent field (middle). Bottom bars summarize the inferred frequency of somatic HDR (orange), indels (green) and WT alleles (black) derived from the deep sequencing data using the same samples photographed above. More than three flies from each cross were imaged and used for analysis. Scale bars indicate 200 pixels. c Somatic HDR profile with ple gRNA. The red line is for the Maternal-F cross and the dark blue line for the Paternal-F cross. d Diagram of the hthCC. Black double arrow: recoded hth cDNA, blue rectangles: exon 1, light green rectangles: exons 2-14, and colored lines underneath represent probes used for detection. e In situ images of embryos laid by hthCC-vasa-Cas9 females crossed with WT males. Blue=exon 1, green=WT exons 2-14, red=recoded cDNA for exons 2-14. Insets are magnified single nuclei indicated by colored arrows. This experiment has been repeated at least three times. Scale bars indicate 10 μm. f Temporal profiles for somatic HDR-mediated copying of the hthCC element assessed by NGS as described for the pleCC in panels c and f. Y-axis tabulates the percentage of HDR at a given time point. Table at the bottom quantifies the HDR fraction at given time points for both the Paternal-F and Maternal-F crosses. Source data are provided as a Source Data file.

In our first set of experiments, we analyzed editing outcomes by examining F1 progeny derived from Maternal-S and Paternal-S pleCC crosses. We compared the rates of somatic HDR measured by NGS analysis to those evaluated by image-based phenotypes associated with copying of the CopyCatcher element. As summarized previously, CopyCatchers such as the pleCC are designed to permit quantification of concordant homozygous mutant clonal phenotypes (e.g., pale patches of thoracic cuticle and embedded sectors of colorless bristles), with underlying DsRed+ fluorescent cell phenotypes49. Individual flies in which imaging-based analysis had been conducted were then subjected to separate NGS HDR-fingerprinting and INDELs-fingerprinting, resulting in a comprehensive quantification of HDR, NHEJ, and WT alleles within the same sample (Fig.6b; libraries for HDR-fingerprinting and INDELs-fingerprinting were prepared from the same individual fly, but with different DNA preparation and sequencing protocols as detailed in Methods). For these experiments, F1 flies were genotyped and those carrying both Cas9 and pleCC gRNA were used for NGS analysis (data shown here are the inferred frequencies of somatic HDR, NHEJ events, and WT alleles). This dual integrated analysis revealed that Maternal-S crosses resulted in ~15% somatic HDR-mediated cassette copying events on average based on sequencing, and that such cassette copying was yet more frequent in Paternal-S crosses, producing ~25% somatic HDR. The nearly two-fold greater HDR-mediated copying efficiency detected by sequencing in Paternal-S crosses mirrors phenotypic outcomes wherein maternally inherited Cas9 similarly results in a lower frequency of cassette copying detected by fluorescence image analysis in somatic cells than for paternally inherited Cas9 (Fig.6b)49.

Our genetic analysis of stage-dependent differences in DSB repair pathway activity in this study is consistent with a commonly held view in the gene-drive field, based on a variety of indirect genetic transmission data, that HDR-mediated cassette copying does not occur efficiently during early embryonic stages50,51,63,67,68,69,70. This inference, however, has not yet been verified experimentally. We thus sought to provide direct evidence supporting this key supposition using NGS-based HDR-fingerprinting to track somatic HDR events across a range of developmental stages in both Maternal-F and Paternal-F crosses, in which the Cas9 and gRNA transgenes are transmitted together either maternally or paternally, using our validated NGS sequencing protocol. Notably, we collected samples at 9 timepoints and pooled 20 F1 progeny for sequencing to prime the developmental profile of somatic HDR with pleCC (samples were thus collected without genotyping since it is impractical to genotype individual embryos and young larvae). Because of the limitations imposed by embryo pooling, we were unable to use the same samples collected here to also quantify the generation of somatic NHEJ alleles (i.e., only half of the F1 progeny carried the vasa-Cas9 transgene on the X chromosome and those embryos lacking this transgene were not suitable for generating mutations; note that such an analysis was possible in the case of the viable Reckh drive shown in Fig.4e as well as for a viable split-drive allele inserted into the essential prosalpha2 locus shown in Supplementary Fig. S9). Indeed, NGS analysis detected only very rare examples of somatic HDR events in early embryos derived from both crosses (Fig.6c).
Notably, HDR in the Paternal-F cross detected by this sequencing protocol increased substantially to 35.9% during adult stages, a period coinciding with the temporal peak of the pale expression profile (note that in this experiment we employed the actin-Cas9 rather than vasa-Cas9 source, which has a higher level of Cas9 expression in somatic cells and generates a correspondingly higher frequency of somatic HDR)49.

We extended our sequencing-based strategy to quantify somatic HDR using a second CopyCatcher element (hthCC) designed specifically to identify even rare copying events in early blastoderm-stage embryos. The hthCC is inserted into the homothorax (hth) gene and was engineered to visualize HDR-mediated copying of the gene cassette by fluorescence in situ hybridization (FISH) using discriminating fluorescent RNA probes complementary to specific endogenous versus recoded cDNA sequences (Fig.6d, e). In this system, copying of the transgene from the donor chromosome to the receiver chromosome would be indicated by the presence of two nuclear dots of red fluorescence detected by the hth recoded cDNA-specific probe (indicating two copies of recoded hth cDNA). In contrast, cells in which no copying occurred should contain only a single nuclear red dot signal (from the donor allele). Such in situ analysis detected no clear case of gene cassette copying in any of the ~5000 blastoderm stage cells examined across ~500 embryos (with the caveat that some mitotic nuclei generate ambiguous signals depending on their orientation). This qualified negative result assessed by in situ analysis was consistent with the very low estimates of HDR frequency during the same early blastoderm-stage developmental window based on NGS analysis in staged time-course experiments, although the latter sequencing method did detect very low levels of somatic HDR at ~3 hours after egg laying from the Paternal-F crosses (and no copying until day three, at larval stages, in the maternal cross; Fig.6d-f). The very low levels of somatic HDR observed in early embryos for the hthCC construct either by in situ hybridization or by NGS sequencing parallel the results summarized above for the pleCC element (Fig.6c, f).
The maximal somatic HDR frequency observed for the hthCC Maternal-F crosses (0.06% at day 3 after egg laying) was somewhat lower than that for the similar cross for pleCC (0.35% at adult stage), consistent with the predominance of single mutant alleles being generated at very early stages following fertilization in Maternal-F crosses. In contrast to the exceedingly rare copying of the hthCC element detected in early embryos for either the Maternal-F or Paternal-F crosses, the same element frequently copied to the homologous chromosome during later developmental stages in Paternal-F crosses as assessed by NGS sequencing. The hthCC element again copied with somewhat lower efficiency than the pleCC element (e.g., 15.2% for hthCC versus 35.9% for pleCC tabulated in adults), presumably reflecting differing genomic cleavage rates or gene conversion efficiencies generated by their respective gRNAs (including total cleavage levels and temporal features). In aggregate, these two examples of quantitative analysis of copying frequencies based on both NGS and in situ analysis demonstrate that ICP and NGS-based quantification of gene conversion events can be successfully integrated for a comprehensive analysis of DSB repair outcomes, including both NHEJ and HDR events as a function of developmental stage. These powerful tools could also be applied to following gene-drive spread through freely mating populations in a marker-free manner as well as to a variety of other applications including gene therapy (see Discussion).
