Genetic correlates of phenotypic heterogeneity in autism – Nature.com

Posted: June 9, 2022 at 2:01 am

Participants

For factor analyses, we restricted our analyses to autistic individuals from the SSC and SPARK cohorts. Participants had to have completed the two phenotypic measures (details are below) to be included in the factor analyses. We also excluded autistic individuals with incomplete entries in either of the two measures (n=5,754 only in the SPARK cohort). This resulted in 1,803 participants (n=1,554 males) in the SSC, 14,346 participants (n=11,440 males) in SPARK version 3 and 8,271 participants (n=6,262 males) in extra entries from SPARK version 5 (SSC, mean age=108.75 months, s.d.=43.29 months; SPARK version 3, mean age=112.11 months, s.d.=46.43 months; SPARK version 5, mean age=111.22 months, s.d.=48.19 months). Only the SCQ was available for siblings in the SPARK study.

We conducted analyses using data from four cohorts of autistic individuals: the SSC (n=8,813)30, the Autism Genetic Resource Exchange (AGRE, CHOP sample) (nmax=1,200)64, the AIMS-2-TRIALS Longitudinal European Autism Project (LEAP) sample (nmax=262)65 and SPARK (n=29,782)31. For sibling comparisons, we included siblings from the SSC (n=1,829) and SPARK (n=12,260) cohorts. For trio-based analyses, we restricted to complete trios in the SSC (n=2,234) and SPARK (n=4,747) cohorts. For all analyses, we restricted the sample to autistic individuals who passed genetic quality control (QC) and who had phenotypic information.

We conducted factor analyses using the SCQ29 and the RBS28. The SCQ is a widely used caregiver report of autistic traits capturing primarily social communication difficulties and, to a lesser extent, repetitive and restricted behaviors29. There are 40 binary (yes-or-no) questions in total, with the first question focusing on the individual's ability to use phrases or sentences (total score, 0–39). We used the Lifetime version rather than the current version as this was available in both the SPARK and SSC studies. Of note, in the Lifetime version, questions 1–19 are about behavior over the lifetime, while questions 20–40 refer to behavior between the ages of 4 and 5 years or in the last 12 months if the participant is younger. We excluded participants who could not communicate using phrases or sentences (n=217 in the SSC and n=17,092 in SPARK) as other questions in the SCQ were not applicable to this group of participants. The RBS is a caregiver-reported measure of presence and severity of repetitive behaviors over the last 12 months. It consists of 43 questions assessed on a four-point Likert scale (total score, 0–129). Higher scores on both measures indicate greater autistic traits.

We conducted exploratory factor analysis on a random half of the SSC (n=901 individuals, of whom 782 were males) using promax rotation to identify correlated factors as implemented by psych (ref. 66) in R. We conducted three sets of exploratory correlated factor analyses: for all items, for social items and for non-social items. Previous studies have provided support for a broad dissociation between social and non-social autism features12,23 and have conducted separate factor analyses of social (for example, refs. 67,68) and non-social autism features (for example, refs. 69,70). Thus, we reasoned that separating items into social and non-social categories might aid the identification of covariance structures that may not be apparent when analyzing all items together. We divided the data into social (all of the SCQ except item 1 and nine other items and item 28 from the RBS) and non-social (nine items from the SCQ (items 8, 11, 12 and 14–18) and all items from RBS except item 28) items, which was carried out after discussion between V.W. and X.Z. The ideal number of factors to be extracted was identified from examining the scree plot (Supplementary Fig. 2), parallel analyses and theoretical interpretability of the extracted factors. However, we examined all potential models using confirmatory factor analyses as well to obtain fit indices, and the final model was identified using both exploratory and confirmatory factor analyses.

We then applied the model configurations from promax rotated exploratory factor analysis for bifactor models to explore the existence of general factor(s). In addition to a single general factor bifactor model, we divided the data into social and non-social items as mentioned earlier and applied bifactor models separately for the social and non-social items. Hierarchical values and explained common variances were then calculated for potential models as extra indicators of the feasibility of bifactor models, but hierarchical values were not greater than 0.8 for most of the models tested, and explained common variances were not greater than 0.7 (refs. 71,72,73) for any of the models tested (Supplementary Table 2).
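As an illustration of the two bifactor indicators mentioned above, the following is a minimal sketch of how explained common variance (ECV) and a hierarchical value (omega hierarchical) are commonly computed from standardized loadings. It assumes orthogonal general and group factors and unit-variance items; the function name and the input layout are hypothetical, not taken from the original analysis.

```python
def bifactor_indices(general, group):
    """Return (ecv, omega_h) for a bifactor solution.

    general: standardized general-factor loadings, one per item.
    group:   list of group factors; group[k][i] is the loading of item i
             on group factor k (0.0 where the item does not load).
    Assumes orthogonal factors and items with unit variance.
    """
    n_items = len(general)

    # ECV: share of common variance attributable to the general factor.
    general_sq = sum(l ** 2 for l in general)
    group_sq = sum(l ** 2 for factor in group for l in factor)
    ecv = general_sq / (general_sq + group_sq)

    # Unique variance of each item is 1 minus its communality.
    uniqueness = [
        1.0 - (general[i] ** 2 + sum(factor[i] ** 2 for factor in group))
        for i in range(n_items)
    ]

    # Omega hierarchical: general-factor variance over total score variance.
    general_part = sum(general) ** 2
    group_part = sum(sum(factor) ** 2 for factor in group)
    omega_h = general_part / (general_part + group_part + sum(uniqueness))
    return ecv, omega_h


# Hypothetical four-item example with two group factors.
ecv, omega_h = bifactor_indices(
    general=[0.6, 0.6, 0.6, 0.6],
    group=[[0.4, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.4]],
)
```

In this toy example neither value would clear the thresholds discussed in the text (0.7 for ECV and 0.8 for the hierarchical value).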

Three rounds of confirmatory factor analyses were conducted: first for the second half of the SSC, followed by analysis of SPARK participants whose phenotypic data were available in version 3 of the data release and, finally, analysis of SPARK participants whose phenotypic data were available only in version 4 or version 5 of the data release and not in the earlier releases. To evaluate the models, multiple widely adopted fit indices were considered, including the comparative fit index (CFI), the Tucker–Lewis index (TLI) and the root mean square error of approximation. In CFA, items were assigned only to the factor with the highest loading to attain parsimony. We conducted three broad sets of confirmatory factor analyses: (1) confirmatory factor analyses of all correlated factor models, (2) confirmatory factor analyses of the autism bifactor model and (3) confirmatory factor analyses of social and non-social bifactor models. For each of these confirmatory factor models, we limited the number of factors tested based on the slope of the scree plots and based on the number of items loading onto the factor (five or more). For the confirmatory factor analyses of social and non-social bifactor models, we iteratively combined various numbers of social and non-social group factors. In bifactor models, items without loading onto the general factor in the corresponding EFA were excluded. Items were allocated to different group factors, which were identified based on the highest loading (items with loading <0.3 were excluded). Due to the ordinal nature of the data, all CFAs were conducted using the diagonally weighted least-squares estimator in the R package lavaan 0.6-5 (ref. 74). We identified the model most appropriate for the data at hand with TLI and CFI>0.9 (TLI and CFI>0.95 for bifactor models), low root mean square error of approximation and good theoretical interpretability based on discussions between V.W. and X.Z.
Additionally, as sensitivity analyses, the identified model (correlated six-factor model) was run again with two orthogonal method factors mapping onto SCQ and RBS-R to investigate if the fit indices remained high after accounting for covariance between items derived from the same measure, as these measures vary subtly during the period of time evaluated. We also reanalyzed the identified model after removing items that were loaded onto multiple factors (>0.3 on two or more factors) to provide clearer theoretical interpretation of the model. For genetic analyses, we used factor scores from the correlated six-factor model without including the orthogonal method factors and without dropping the multi-loaded items.
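For readers unfamiliar with the fit indices named above, here is a minimal sketch of their textbook definitions in terms of the model and baseline chi-square statistics. It is not the lavaan implementation: with a diagonally weighted least-squares estimator, software typically reports scaled or robust variants, and conventions differ on whether RMSEA uses N or N − 1. The function name and the example numbers are hypothetical.

```python
import math


def fit_indices(chisq_model, df_model, chisq_base, df_base, n):
    """Return (cfi, tli, rmsea) from chi-square statistics.

    chisq_model, df_model: test statistic and degrees of freedom of the
        fitted model.
    chisq_base, df_base: the same for the baseline (independence) model.
    n: sample size. RMSEA here uses N - 1 in the denominator.
    """
    # Non-centrality estimates, floored at zero.
    d_model = max(chisq_model - df_model, 0.0)
    d_base = max(chisq_base - df_base, 0.0)

    denom = max(d_model, d_base)
    cfi = 1.0 - d_model / denom if denom > 0 else 1.0

    ratio_base = chisq_base / df_base
    ratio_model = chisq_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)

    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return cfi, tli, rmsea


# Hypothetical values, chosen only to show the call.
cfi, tli, rmsea = fit_indices(300.0, 200, 5000.0, 231, 901)
acceptable = cfi > 0.9 and tli > 0.9
```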

QC was conducted for each cohort separately by array. We excluded participants with genotyping rate <95%, excessive heterozygosity (±3 s.d. from the mean), non-European ancestry as detailed below, mismatched genetic and reported sex and, for families, those with Mendelian errors >10%. SNPs with genotyping rate <10% were excluded, or they were excluded if they deviated from Hardy–Weinberg equilibrium (P<1 × 10⁻⁶). Given the ancestral diversity in the SPARK cohort, Hardy–Weinberg equilibrium and heterozygosity were calculated in each genetically homogeneous population separately. Genetically homogeneous populations (corresponding to five super-populations: African, East Asian, South Asian, admixed American and European) were identified using the five genetic principal components calculated using SPARK and 1000 Genomes Phase 3 populations75 and clustered using UMAP76. Principal components were calculated using linkage disequilibrium-pruned SNPs (r²=0.1, window size=1,000 kb, step size=500 variants, after removing regions with complex linkage disequilibrium patterns) using GENESIS77, which accounts for relatedness between individuals, calculated using KING78.
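The sample-level filters on genotyping rate and heterozygosity can be sketched as follows. This is an illustrative simplification, not the pipeline used in the study: it assumes per-sample call rate and heterozygosity have already been computed by a genotype QC tool, treats "excessive heterozygosity" as more than 3 s.d. from the mean in either direction, and would need to be run separately within each array and genetically homogeneous group. The function and sample names are hypothetical.

```python
from statistics import mean, stdev


def samples_passing_qc(samples, min_call_rate=0.95, het_sd_limit=3.0):
    """Return IDs of samples passing call-rate and heterozygosity filters.

    samples: dict mapping sample ID -> (call_rate, heterozygosity_rate).
    """
    het_values = [het for _, het in samples.values()]
    het_mean = mean(het_values)
    het_sd = stdev(het_values)  # sample s.d.; needs at least two samples

    passing = []
    for sample_id, (call_rate, het) in samples.items():
        if call_rate < min_call_rate:
            continue  # genotyping rate below the threshold
        if het_sd > 0 and abs(het - het_mean) > het_sd_limit * het_sd:
            continue  # heterozygosity outlier
        passing.append(sample_id)
    return passing


# Hypothetical data: 19 typical samples plus two that should be dropped.
samples = {f"s{i}": (0.99, 0.30) for i in range(19)}
samples["low_call"] = (0.90, 0.30)
samples["het_outlier"] = (0.99, 0.50)
kept = samples_passing_qc(samples)
```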

Imputation was conducted using the Michigan Imputation Server79 with 1000 Genomes phase 3 version 5 as the reference panel49 (for AGRE and SSC), with the HRC r1.1 2016 reference panel80 (for AIMS-2-TRIALS) or using the TOPMed imputation panel81 (for both releases of SPARK). Details of imputation have been previously reported82. SNPs were excluded from polygenic risk scores if they had minor allele frequency <1%, had an imputation r²<0.4 or were multi-allelic.

We restricted our PGS associations to four GWAS. First, we included a GWAS of autism from the latest release from the iPSYCH cohort (iPSYCH-2015)83. This includes 19,870 autistic individuals (15,025 males and 4,845 females) and 39,078 individuals without an autism diagnosis (19,763 males and 19,315 females). All individuals included in this GWAS were born between May 1980 and December 2008 to mothers who were living in Denmark. GWAS was conducted on individuals of European ancestry, with the first ten genetic principal components included as covariates using logistic regression as provided in PLINK. Further details are provided elsewhere49. We additionally included GWAS for educational attainment (n=766,345, excluding the 23andMe dataset)35, intelligence (n=269,867)34, ADHD (n=20,183 individuals diagnosed with ADHD and 35,191 controls)36 and schizophrenia (69,369 individuals diagnosed with schizophrenia and 236,642 controls)37. These GWAS were selected given the relatively large sample size and modest genetic correlation with autism. Additionally, as a negative control, we included PGS generated from a GWAS of hair color (blonde versus other, n=43,319 blondes and n=342,284 others) from the UK Biobank, which was downloaded from https://atlas.ctglab.nl/traitDB/3495. This phenotype has SNP heritability comparable to that of the other GWAS used (h²=0.15, s.e.=0.014), is unlikely to be genetically or phenotypically correlated with autism and related traits, and has a sample size large enough to be a reasonably well-powered negative control.

PGS were generated for three phenotypes using polygenic risk scoring with continuous shrinkage (PRS-CS)84, which is among the best-performing polygenic scoring methods using summary statistics in terms of variance explained85. In addition, this method bypasses the step of identifying a P-value threshold. We set the global shrinkage prior (φ) to 0.01, as is recommended for highly polygenic traits. Details of the SNPs included are provided in Supplementary Table 3.
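Once per-SNP weights are available, an individual's score is conventionally a weighted sum of effect-allele dosages. The sketch below shows that final step together with the SNP exclusion rules given earlier (minor allele frequency <1%, imputation r² <0.4, multi-allelic). It does not reproduce PRS-CS itself, which estimates the weights; the function names and SNP identifiers are hypothetical.

```python
def keep_snp(maf, imputation_r2, n_alleles, min_maf=0.01, min_r2=0.4):
    """Return True if a SNP passes the filters described in the text."""
    return maf >= min_maf and imputation_r2 >= min_r2 and n_alleles == 2


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict SNP ID -> effect-allele dosage (0 to 2).
    weights: dict SNP ID -> effect size (for example, a posterior mean
             effect from a shrinkage method).
    SNPs missing from either input are skipped.
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(dosages[snp] * weights[snp] for snp in shared)


# Hypothetical weights and dosages; rs3 and rs4 are not shared.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.02}
dosages = {"rs1": 2.0, "rs2": 1.0, "rs4": 2.0}
score = polygenic_score(dosages, weights)
```

In practice, scores are usually standardized within each cohort before being entered into regression models.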

De novo variants were obtained from Antaki et al.19. De novo variants (structural variants and SNVs) were called for all SSC samples and a subset of the SPARK samples (phase 1 genotype release, SNVs only). To identify high-impact de novo SNVs, we restricted data to variants with a known effect on protein. These are damaging variants: transcript_ablation, splice_acceptor_variant, splice_donor_variant, stop_gained, frameshift_variant, stop_loss, start_loss or missense variants with MPC86 scores >2. We further restricted data to variants in constrained genes with a LOEUF score <0.37 (ref. 87), which represent the topmost decile of constrained genes. For SVs, we restricted data to SVs affecting the most constrained genes, that is, those with LOEUF score <0.37, representing the first decile of most constrained genes. We did not make a distinction between deletions or duplications. To identify carriers, non-carriers and parents, we restricted our data to samples from the SPARK and SSC studies that had been exome sequenced and families in which both parents and the autistic proband(s) passed the genotyping QC.
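The selection rule for high-impact de novo SNVs described above can be written as a small predicate. This is a sketch under stated assumptions: each variant is represented as a dictionary with hypothetical keys `consequence`, `mpc` and `loeuf`, and the consequence labels are copied from the text rather than from any particular annotation tool.

```python
# Consequence labels as listed in the text.
DAMAGING_CONSEQUENCES = {
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_loss",
    "start_loss",
}


def is_high_impact(variant, loeuf_cutoff=0.37, mpc_cutoff=2.0):
    """Return True for a damaging variant in a constrained gene.

    variant: dict with keys 'consequence' (str), 'mpc' (float or None)
             and 'loeuf' (float or None). The key names are illustrative.
    """
    loeuf = variant.get("loeuf")
    if loeuf is None or loeuf >= loeuf_cutoff:
        return False  # gene is not in the most constrained group

    consequence = variant.get("consequence")
    if consequence in DAMAGING_CONSEQUENCES:
        return True

    # Missense variants count only when the MPC score exceeds the cutoff.
    mpc = variant.get("mpc")
    return consequence == "missense_variant" and mpc is not None and mpc > mpc_cutoff


# Hypothetical variants.
variants = [
    {"consequence": "stop_gained", "loeuf": 0.20, "mpc": None},
    {"consequence": "missense_variant", "loeuf": 0.30, "mpc": 2.5},
    {"consequence": "missense_variant", "loeuf": 0.30, "mpc": 1.5},
    {"consequence": "stop_gained", "loeuf": 0.80, "mpc": None},
]
flags = [is_high_impact(v) for v in variants]
```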

For genes associated with severe developmental disorders, we obtained the list of constrained genes that are significant genes associated with severe developmental disorders from Kaplanis et al.27. To investigate the association of this set of genes with autism and developmental disorders, we first identified autistic carriers with a high-impact de novo variant and then divided this group into carriers who had at least one high-impact de novo variant in a DD gene and carriers with high-impact de novo variants in other constrained genes.

Only individuals with undiagnosed developmental disorders are recruited into the Deciphering Developmental Disorders study, and, as such, known genes associated with developmental disorders that are easy for clinicians to recognize and diagnose may be omitted from the genes identified by Kaplanis et al.27. To account for this bias, we ran sensitivity analyses using a larger but overlapping list of genes identified from the Developmental Disorder Gene-to-Phenotype database (DDG2P). From this database, we used constrained genes that are either confirmed or probable developmental disorder genes and genes for which heterozygous variants lead to developmental phenotypes (that is, mono-allelic or X-linked dominant).

We identified 19 autism core and associated features that (1) are widely used in studies related to autism; (2) are a combination of parent-, self- and other-reported and performance-based measures to investigate if reporter status affects the PGS association; (3) are collected in all three cohorts; and (4) cover a range of core and associated features in autism. The core features are

ADOS88: social affect

ADOS88: restricted and repetitive behavior domain total score

ADI89: communication (verbal) domain total score

ADI89: restricted and repetitive behavior domain total score

ADI89: social domain total score

RBS28

Parent-reported Social Responsiveness Scale-2 (ref. 90): total raw scores

SCQ29

Insistence of sameness factor (F1)

Social interaction factor (F2)

Sensory–motor behavior factor (F3)

Self-injurious behavior factor (F4)

Idiosyncratic repetitive speech and behavior (F5)

Communication skills factor (F6).

The associated features are

Vineland Adaptive Behavior Scales91: composite standard scores

Full-scale IQ

Verbal IQ

Nonverbal IQ

Developmental Coordination Disorders Questionnaire92.

Measures of IQ were quantified using multiple methods across the range of IQ scores in the AGRE, SSC and LEAP studies. In the SPARK study, IQ scores were available based on parent reports on ten IQ score bins (Fig. 1c). We used these as full-scale scores. For analyses involving the SPARK and SSC cohorts, we converted full-scale scores from the SSC into IQ bins to match what was available from the SPARK study and treated them as continuous variables based on examination of the frequency histogram (Supplementary Fig. 8). For the six factors, we excluded individuals who were minimally verbal (Factor analyses), but these individuals were not excluded for analyses with other autism features.

We identified seven questions relating to developmental delay in the SPARK medical screening questionnaire. These are all binary questions (yes or no). Summed scores ranged from 0 to 7. The developmental phenotypes include the presence of

ID, cognitive impairment, global developmental delay or borderline intellectual functioning

Language delay or language disorder

Learning disability (learning disorder, including reading, written expression or math; or nonverbal learning disability)

Motor delay (for example, delay in walking) or developmental coordination disorder

Mutism

Social (pragmatic) communication disorder (as included in DSM IV TR and earlier)

Speech articulation problems.

We included the age of first words and the age of walking independently for further analyses. This was recorded using parent-reported questionnaires in the SPARK study and in ADI-R89 in the SSC study. While other developmental phenotypes are available, we focused on these two, as they represent major milestones in motor and language development and are relatively well characterized.

Before any statistical analyses, we visually inspected the distributions of the variables. All continuous variables were approximately normally distributed with the exception of the age of first words, the age of walking independently and the count of co-occurring developmental disabilities. For these three variables, we used quasi-Poisson or negative binomial regression to account for overdispersion in the data (the variance was much greater than the mean). These models produced the same estimate but modestly different standard errors. Both have two parameters. However, while quasi-Poisson regression models the variance as a linear function of the mean, the negative binomial models the variance as a quadratic function of the mean. Of the two, we chose the model that produced the lower residual deviance. For all other continuous variables, we used linear regression and parametric tests. For binary data, we used logistic regression as there was not a large imbalance in the case:control ratio.
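For reference, the two variance assumptions contrasted above are usually written as follows. The notation is an assumption introduced here and is not taken from the original: μ is the mean of the outcome y, κ is the quasi-Poisson dispersion parameter and θ is the negative binomial size parameter (the parameterization used by the MASS package in R).

$$\begin{array}{ll} \textrm{quasi-Poisson:} & \mathrm{Var}(y) = \kappa\,\mu \\ \textrm{negative binomial:} & \mathrm{Var}(y) = \mu + \mu^{2}/\theta \end{array}$$

With κ = 1, or as θ grows large, both reduce to the Poisson case in which the variance equals the mean.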

For each cohort, PGS and high-impact de novo variants were regressed against the autism features with sex and the first ten genetic principal components as covariates in all analyses, with all continuous independent variables standardized. In addition, array was included as a covariate in SSC and AGRE datasets. This was performed using linear regression for standardized quantitative phenotypes, logistic regression for binary phenotypes (for example, association between PGS and the presence of a high-impact de novo variant), Poisson regression for count data (number of developmental disorders or delays, not standardized) and negative binomial regression for the age of walking independently or the age of first words (not standardized; MASS93 package in R).

For the association between genetic variables and core and associated autism phenotypes, we first conducted linear regression analyses for the four PGS using multivariate regression analyses with data from SPARK (waves 1 and 2), SSC, AGRE and AIMS-2-TRIALS LEAP. This is of the form:

$$y \approx \textrm{PGS}_{\textrm{autism}} + \textrm{PGS}_{\textrm{schizophrenia}} + \textrm{PGS}_{\textrm{EA}} + \textrm{PGS}_{\textrm{intelligence}} + \textrm{sex} + \textrm{age} + 10\,\textrm{PCs},$$

(1)

where EA is educational attainment and 10PCs are ten principal components. For the negative control, we added the negative control as an additional independent variable in equation (1):

$$\begin{array}{l} y \approx \textrm{PGS}_{\textrm{autism}} + \textrm{PGS}_{\textrm{schizophrenia}} + \textrm{PGS}_{\textrm{EA}} + \textrm{PGS}_{\textrm{intelligence}} \\ \quad + \textrm{PGS}_{\textrm{hair color}} + \textrm{sex} + \textrm{age} + 10\,\textrm{PCs}. \end{array}$$

(2)

For the AGRE and SPARK studies, we ran equivalent mixed-effects models with family ID modeled as random intercepts to account for relatedness between individuals. This was carried out using the lme4 (ref. 94) package in R.

For high-impact de novo variants, we included the count of high-impact de novo variants as an additional independent variable in equation (1) and ran regression analyses for SPARK (wave 1 only) and SSC. To ensure interpretability across analyses, we retained only individuals who passed the genotypic QC, which included only individuals of European ancestries. Family ID was included as a random intercept:

$$\begin{array}{l} y \approx \textrm{PGS}_{\textrm{autism}} + \textrm{PGS}_{\textrm{schizophrenia}} + \textrm{PGS}_{\textrm{EA}} + \textrm{PGS}_{\textrm{intelligence}} \\ \quad + \textrm{high-impact de novo count} + \textrm{sex} + \textrm{age} + 10\,\textrm{PCs}. \end{array}$$

(3)

Effect sizes were meta-analyzed across the three cohorts using inverse-variance-weighted meta-analyses with the following formula:

$$\begin{array}{l} w_{i} = \mathrm{SE}_{i}^{-2} \\ \mathrm{SE}_{\mathrm{meta}} = \sqrt{\left(\sum_{i} w_{i}\right)^{-1}} \\ \beta_{\mathrm{meta}} = \sum_{i} \beta_{i} w_{i} \left(\sum_{i} w_{i}\right)^{-1}, \end{array}$$

(4)

where β_i is the standardized regression coefficient of the PGS, SE_i is the associated standard error and w_i is the weight. P values were calculated from Z scores. Given the high correlation between the autism features and phenotypes, we used Benjamini–Yekutieli false discovery rates to correct for multiple testing (corrected P<0.05). We calculated heterogeneity statistics (Cochran's Q and I² values) for the PGS meta-analyses but not for the associations with high-impact de novo variants, as the latter were calculated using only two datasets (SSC and SPARK).
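Equation (4) and the heterogeneity statistics can be sketched in a few lines. This is a minimal fixed-effect illustration: the two-sided P value assumes the Z score follows a standard normal distribution, I² is computed as max(0, (Q − df)/Q), and the function name and example inputs are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Inverse-variance-weighted meta-analysis with Cochran's Q and I2.

    betas: per-cohort regression coefficients.
    ses:   the corresponding standard errors.
    """
    weights = [se ** -2 for se in ses]          # w_i = SE_i^-2
    w_sum = sum(weights)
    beta_meta = sum(b * w for b, w in zip(betas, weights)) / w_sum
    se_meta = math.sqrt(1.0 / w_sum)

    z = beta_meta / se_meta
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal P value

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta_meta) ** 2 for b, w in zip(betas, weights))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta_meta, "se": se_meta, "z": z, "p": p, "q": q, "i2": i2}


# Hypothetical estimates from three cohorts.
result = ivw_meta(betas=[0.1, 0.2, 0.3], ses=[0.05, 0.05, 0.05])
```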

For the SPARK and SSC studies, we investigated the association between PGS (equation (1)) and being a carrier of a high-impact de novo variant (equation (3)) and the age of first walking and first words using negative binomial regression and conducted inverse-variance meta-analyses (equation (4)). We ran the same analyses for the SPARK study to investigate the association between PGS (equation (1)) and high-impact de novo variants (equation (3)) and counts of co-occurring developmental disabilities (quasi-Poisson regression). Leave-one-out analyses were conducted by systematically excluding one of seven co-occurring developmental disabilities and reconducting the analyses.

To investigate additivity between common and high-impact de novo variants, we conducted logistic regression with carrier status as a dependent binary variable and all PGS included as independent variables and genetic principal components, sex and age included as covariates. This was carried out separately for SPARK (wave 1) and SSC and meta-analyzed as outlined earlier.

Statistical significance of differences in factor scores between sexes was computed using t-tests. Associations with age and IQ bins were conducted using linear regressions after including sex as a covariate.

Matrix equivalency tests were conducted using the Jennrich test in the psych66 package in R. Power calculations were conducted using simulations. Statistical differences between pairwise correlation coefficients (carriers versus non-carriers) in core and associated features were tested using the package cocor95 in R. Using scaled existing data on full-scale IQ, adaptive behavior and motor coordination, we generated correlated simulated variables at a range of correlation coefficients to reflect the correlation between the six core factors and the three associated features. We then ran regression analyses using the simulated variable and high-impact de novo variants as provided in equation (3). We repeated this 1,000 times and counted the fraction of outcomes for which the association between high-impact de novo variant count and the simulated variable had P<0.05 to obtain statistical power. Differences in the age of walking and the age of first words between groups of autistic individuals and siblings were calculated using Wilcoxon rank-sum tests.
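One step of the power simulation, generating a variable with a chosen correlation to an existing scaled measure, can be sketched as follows. This is an assumption about how such a variable might be constructed (a weighted sum of the standardized measure and independent Gaussian noise); the text does not specify the exact procedure. The regression and P-value counting steps are omitted, and all names and numbers are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def simulate_correlated(observed, target_r, rng):
    """Return a simulated variable correlated about target_r with observed.

    Each simulated value is target_r * z + sqrt(1 - target_r^2) * noise,
    where z is the standardized observed value and noise is N(0, 1).
    """
    z = standardize(observed)
    noise_scale = math.sqrt(1.0 - target_r ** 2)
    return [target_r * zi + noise_scale * rng.gauss(0.0, 1.0) for zi in z]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


# Hypothetical stand-in for an existing measure such as full-scale IQ.
rng = random.Random(2022)
iq = [rng.gauss(100.0, 15.0) for _ in range(20000)]
simulated = simulate_correlated(iq, 0.4, rng)
achieved_r = pearson_r(iq, simulated)
```

Repeating the generation and regression many times, and counting the fraction of replicates with P<0.05, gives the empirical power described in the text.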

Polygenic transmission deviation was conducted using polygenic transmission disequilibrium tests14. To allow comparisons with midparental scores, residuals of the autism PGS were obtained after regressing out the first ten genetic principal components. These residuals were standardized by using the parental mean and standard deviations. We obtained similar results using PGS that had not been residualized for the first ten genetic principal components. We defined individuals without co-occurring ID as individuals whose full-scale IQ is above 70 in the SSC and SPARK studies. Additionally, in the SPARK cohort, we excluded any of these participants who had a co-occurring diagnosis of intellectual disability, cognitive impairment, global developmental delay or borderline intellectual functioning. Analyses were conducted separately in the SSC and SPARK cohorts and meta-analyzed using inverse-variance-weighted meta-analyses. We additionally conducted pTDT analyses on non-autistic siblings to investigate differences between males and females.
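The core pTDT computation can be sketched as below. It follows the commonly used definition, in which each child's deviation is the child's score minus the midparent score, divided by the standard deviation of midparent scores, and the mean deviation is tested against zero with a one-sample t statistic. Whether this matches every detail of the original analysis is an assumption; the function name and example scores are hypothetical.

```python
import math
from statistics import mean, stdev


def ptdt(child, father, mother):
    """Return (mean_deviation, t_statistic) for a set of trios.

    child, father, mother: equal-length lists of polygenic scores,
    one entry per trio, in the same order.
    """
    midparent = [(f + m) / 2.0 for f, m in zip(father, mother)]
    sd_midparent = stdev(midparent)

    # Deviation of each child from the midparent score, in units of
    # the midparent standard deviation.
    deviations = [(c - mp) / sd_midparent for c, mp in zip(child, midparent)]

    n = len(deviations)
    mean_dev = mean(deviations)
    t_stat = mean_dev / (stdev(deviations) / math.sqrt(n))
    return mean_dev, t_stat


# Hypothetical scores for four trios.
child = [0.5, 1.0, -0.2, 0.8]
father = [0.0, 1.0, -1.0, 0.5]
mother = [0.0, 0.0, 0.0, 0.5]
mean_dev, t_stat = ptdt(child, father, mother)
```

A positive mean deviation indicates that children carry, on average, higher scores than expected from their parents.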

For sex differences in high-impact de novo variants, we calculated relative risk in autistic females versus males based on (1) all carriers, (2) carriers of DD genes and (3) carriers of non-DD genes (SPARK wave 1 and SSC). For sensitivity analyses, we conducted logistic regression with sex as the dependent variable and carrier status for DD genes and either full-scale IQ and motor coordination scores (in SPARK wave 1 and SSC) or number of developmental disorders (only in SPARK wave 1) as covariates. For each sensitivity analysis, we provide the estimates of the unconditional analysis as well (that is, without the covariates).

We opted to conduct heritability analyses using unscreened population controls rather than family controls (that is, pseudocontrols or unaffected family members), as using family controls likely reduces SNP heritability96 owing to parents having higher genetic likelihood for autism compared to unselected population controls55 and due to assortative mating97. Case–control heritability analyses were conducted using the ABCD cohort as population controls; specifically, the ABCD child cohort in the USA, recruited at the age of 9 or 10 years. This cohort is reasonably representative of the US population in terms of demographics and ancestry. As such, it represents an excellent comparison cohort for the SPARK and SSC cohorts. The ABCD cohort was genotyped using the Smokescreen genotype array, a bespoke array designed for the study containing over 300,000 SNPs. Genetic QC was conducted identically as for SPARK. Genetically homogeneous groups were identified using the first five genetic principal components followed by UMAP clustering with the 1000 Genomes data. We restricted our analyses to 4,481 individuals of non-Finnish European ancestries in the ABCD cohort. Scripts for this are available at https://github.com/vwarrier/ABCD_geneticQC. Imputation was conducted, similar to the analysis of SPARK data, using the TOPMed imputation panel.

For case–control heritability analyses, we combined genotype data from the ABCD cohort and from autistic individuals from the SPARK and SSC cohorts. We restricted the analysis to 6,328,651 well-imputed SNPs (r²>0.9) with minor allele frequency >1% in all datasets. Furthermore, we excluded multi-allelic SNPs, SNPs with minor allele frequency difference of >5% between the three datasets and SNPs that, in the combined dataset, were not in Hardy–Weinberg equilibrium (P<1 × 10⁻⁶) or had genotyping rate <99%. We additionally excluded related individuals, identified using GCTA-GREML, and individuals with genotyping rate <95%. We calculated genetic principal components for the combined dataset using 52,007 SNPs with minimal linkage disequilibrium (r²=0.1, 1,000 kb, step size of 500 variants, removing regions with complex long-range linkage disequilibrium). Visual inspection of the principal-component plots did not identify any outliers (Supplementary Fig. 9). While our QC procedure is stringent, we note that there will be unaccounted-for effects in SNP heritability due to fine-scale population stratification, differences in genotyping array and participation bias in the autism cohorts. However, our focus is on the differences in SNP heritability between subgroups of autistic individuals, and unaccounted-for case–control differences will not affect this.

We calculated SNP heritability for autism and additionally in subgroups stratified for the presence of ID, sex, sex and ID together, and the presence of high-impact de novo variants. We also calculated SNP heritability in subgroups of autistic individuals with scores >1 s.d. from the mean for each of the six factors, autistic individuals with F1 scores > F2 scores and autistic individuals with F2 scores > F1 scores.

We calculated the observed-scale SNP heritability (baseline and subgroups) using GCTA-GREML52,53 and, additionally, using PCGC54. In all models except for the sex-stratified models, we included sex, age in months and the first ten genetic principal components as covariates. In the sex-stratified models, we included age in months and the first ten genetic principal components as covariates. For sex-stratified heritability analyses, both case and control data were from the same sex. For GCTA-GREML, the observed-scale SNP heritability was converted into liability-scale SNP heritability using equation (23) from Lee et al.98. PCGC estimates SNP heritability directly on the liability scale using the prevalence rates from Maenner et al.99. For all analyses, we ensured that the number of cases did not exceed the number of controls, with a maximum case:control ratio of 1.
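The conversion from the observed scale to the liability scale is commonly written as h²(liability) = h²(observed) × K²(1 − K)² / (z² P(1 − P)), where K is the population prevalence, P is the proportion of cases in the sample and z is the height of the standard normal density at the liability threshold. The sketch below implements that widely used form; treating it as identical to the equation cited in the text is an assumption, and the function name and input values are hypothetical.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_observed:   heritability estimated on the observed (0/1) scale.
    prevalence:    population prevalence K of the condition.
    case_fraction: proportion P of cases in the analyzed sample.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1.0 - prevalence)  # liability threshold
    z = normal.pdf(threshold)                     # density at the threshold

    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical observed-scale estimate with a 1.8% prevalence and a
# case:control ratio of 1 (half of the sample are cases).
h2_liability = liability_scale_h2(0.3, prevalence=0.018, case_fraction=0.5)
```

Because the result depends on K, recomputing it over a range of prevalence estimates shows how sensitive the liability-scale heritability is to that choice.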

We used prevalence rates from Maenner et al.99, which provides prevalence of autism among 8-year-olds (1.8%). The study also provides prevalence rates by sex and by the presence of ID. However, there is wide variation in autism prevalence. We thus recalculated the SNP heritability across a range of state-specific prevalence estimates obtained from Maenner et al.99. For estimates of liability-scale heritability for subtypes defined by factor scores >1 s.d. from the mean, we estimated a prevalence of 16% of the total prevalence. For F1>F2 and F2>F1, prevalence was estimated at 50% of the total autism prevalence. Estimating approximate population prevalence of autistic individuals with high-impact de novo variant carriers is difficult due to ascertainment bias in existing autism cohorts. However, a previous study has demonstrated that the mutation rate for rare protein-truncating variants is similar between autistic individuals and siblings from the SSC and autistic individuals and population controls from the iPSYCH sample in Denmark, which does not have a participation bias100, implying that the de novo mutation rate in autistic individuals from the SPARK and SSC cohorts may be generalizable. Using the sex-specific proportion of de novo variant carriers and autism prevalence, we calculated a prevalence of 0.2% for being an autistic carrier of a high-impact de novo variant.

For sex-stratified SNP heritability analyses, we additionally calculated SNP heritability for a range of state-specific prevalence estimates to better model state-specific factors that contribute to autism diagnosis. In addition, using a total prevalence of 1.8%, we estimated SNP heritability using a male:female ratio of 3.3:1 (ref. 51) to account for diagnostic bias that may inflate the ratio.

We also used GCTA-GREML to estimate SNP heritability for the six factors, full-scale IQ and the bivariate genetic correlation between them. We used the same set of SNPs used in the case–control analyses. We were unable to conduct bivariate genetic correlation for the case–control datasets due to limitations of sample size.

We received ethical approval to access and analyze de-identified genetic and phenotypic data from the three cohorts from the University of Cambridge Human Biology Research Ethics Committee.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
