11 Release Notes
11.0.1 Executive Summary
The Autism Inpatient Collection (AIC) is a multisite study that enrolled children and adolescents with ASD aged 4–20 years admitted to six specialized, inpatient psychiatry units which exclusively serve children with developmental delay (primarily autism and/or intellectual disability) who are admitted due to emotional and/or behavioral crises. Enrollment began March 2014 and continued until May 2024. Measures characterizing adaptive and cognitive functioning, communication, externalizing behaviors, emotion regulation, co-occurring psychiatric disorders, self-injurious behavior, parent stress and parent self-efficacy were collected.
Inpatients met criteria to enroll in the study either by a score of 12 or higher on the Social Communication Questionnaire (SCQ) completed by a caregiver OR through referral into the study by an inpatient unit psychiatrist, based on clinical concern for autism. Children who were referred into the study may have an SCQ score below 12. Once enrolled in the study, ASD diagnosis was evaluated by the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and extensive inpatient observation. Biological samples from probands and their biological parents were banked and processed for DNA extraction and creation of lymphoblastoid cell lines.
Participants are categorized into three cohorts based on autism diagnosis confirmation: double-confirmed ASD, single-confirmed ASD, and non-ASD. This classification is determined using two variables: s_scoresumm_adosdiag, which indicates whether the participant received an ADOS diagnosis of autism (1=Yes, 0=No), and s_dxbxfinaldx, which indicates whether the participant received a clinician diagnosis of autism (1=Yes, 0=No). Double-confirmed ASD probands have both variables coded as 1, meaning they received a diagnosis from both the ADOS assessment and a clinician. Single-confirmed ASD probands have one variable coded as 1 while the other is 0, indicating that only one source confirmed an autism diagnosis. Non-ASD probands have both variables coded as 0, meaning they did not receive an autism diagnosis from either source. Each cohort has been made available as a separate dataset.
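The cohort rule described above can be sketched in Python as follows. This is a minimal illustration rather than the study's processing code: the function name `classify_cohort` is hypothetical, and the sketch assumes both variables are present and coded 0 or 1.

```python
def classify_cohort(s_scoresumm_adosdiag: int, s_dxbxfinaldx: int) -> str:
    """Return the cohort label implied by the two confirmation variables.

    Hypothetical helper: assumes each argument is coded 1 (Yes) or 0 (No).
    """
    for value in (s_scoresumm_adosdiag, s_dxbxfinaldx):
        if value not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {value!r}")

    # Count how many of the two sources confirmed an autism diagnosis.
    confirmations = s_scoresumm_adosdiag + s_dxbxfinaldx
    if confirmations == 2:
        return "double-confirmed ASD"
    if confirmations == 1:
        return "single-confirmed ASD"
    return "non-ASD"
```

Because each cohort is released as a separate dataset, a check like this is mainly useful for confirming that a record sits in the expected file.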
The ADOS-2 examinations were completed by individuals at each site who had achieved research reliability, meeting the AIC-specific requirements set forth by certified ADOS-2 trainer and AIC Co-Investigator Robin Gabriels, Psy.D. ADOS-2 examiners were typically master's- or doctoral-level clinicians, such as social workers or clinical psychologists (LCSW, Psy.D., Ph.D.). Three ADOS-2 examiners were experienced research assistants (RAs) who were fully trained to research reliability, either directly by Dr. Gabriels (one RA at the Colorado site) or by the AIC-site lead ADOS-2 examiner (one RA at the Bradley Hospital site, one at the Spring Harbor Hospital site), all meeting AIC-required, research-level reliability. Dr. Gabriels held annual recalibration meetings with the ADOS examiners. When the ADOS-2 was administered by an RA, the supervising psychologist also observed the child directly to verify the diagnosis.
All probands had at least one parent/caregiver who participated in the AIC study and completed questionnaires about the proband and their own experiences (stress and self-efficacy measures). The parent/caregiver respondent may have been a biological parent or other primary caregiver.
During data review, an SFARI reviewer identified errors in the CBCL, CASI, Vineland, and Leiter scores. The affected scores, which had been manually computed and entered into REDCap, have been suppressed (replaced with NA) in the dataset. At this time, they have not been re-scored. Users should account for these suppressions when analyzing the data and interpreting results.
11.0.2 Study Structure
During the course of the study, some measures were discontinued while others were added across 4 distinct phases, creating varying Ns within the dataset. For a complete list of measures administered across each phase, please refer to the Event Calendar.
Phase 1 of the study is the only phase that included measures administered at multiple time points. During Phase 1, the Aberrant Behavior Checklist (ABC), Parent Stress Index-4 (Short Form) (PSI-4-SF), Emotion Dysregulation Inventory (EDI), Leiter non-verbal IQ test, Vineland 2, Functional Assessment Screening Tool (FAST) and the Difficult Behavior Self Efficacy Scale (DBSES) were collected at admission, discharge, and 2-month follow-up. Also included in Phase 1 were the Repetitive Behavior Scale – Revised, Subscale II Self-Injurious Behavior (RBS-R SIB) (completed by the caregiver only) and the CASI-5. Data collection with these measures was discontinued at the start of Phase 2. The maximum possible N for the above measures collected only during Phase 1 is 376. In addition to these standardized measures, information on demographics, medical history, and other factors was collected. During Phase 1, sleep observation data was collected. This was discontinued before Phase 1 ended, and the maximum N for the observed sleep data is 218. This sleep data is not included in the primary dataset; it is a separate file within the release package.
Phase 2 discontinued repeated assessments at multiple time points, the FAST, and sleep data collection. During this phase, the Child Behavior Checklist (CBCL), Augmentative and Alternative Communication data collection form, and the RBS-R SIB Staff were added to the study. The maximum N for measures that were administered only during Phase 2 is 742. The ABC, EDI, and Leiter continued to be given during Phase 2.
Phase 3 discontinued the Repetitive Behavior Scale, DBSES, and CASI; added the Behavior Problems Inventory (BPI-01) and Children’s Sleep Habits Questionnaire (CSHQ); and replaced the Vineland 2 assessment with the Vineland 3. If a proband was administered the Vineland 2 before the Vineland 3 was made available, Vineland 2 data is included. The key difference between Vineland-II and Vineland-3 is that Vineland-II does not assess Motor Skills for individuals aged 7 and older, whereas Vineland-3 includes Motor Skills assessment up to age 9. The maximum N for measures that were administered only during Phase 3 is 289.
In Phase 4 of the study, significant updates and additions were made to the measurement tools employed. The Open-Source Challenging Behavior Scale (OS-CBS), Pediatric Anxiety Rating Scale (PARS), and a comprehensive Puberty Questionnaire were introduced. Additionally, the Children’s Sleep Habits Questionnaire (CSHQ) was updated to the latest version, the CSHQ-2. The maximum N for measures that were administered only during Phase 4 is 110.
The majority of measures were administered throughout all phases and have a maximum possible N of 1543. Please refer to the table for measures included in this dataset.
All instruments, regardless of phase, are completed about the proband or by the primary caregiver about themselves. Some respondents may not have completed every instrument. Respondents may have declined a particular measure, or may have withdrawn or been lost to contact before completing a measure.
Other instruments may not have been applicable for certain participants. For example, if a proband had no self-injurious behavior reported on the caregiver-completed RBS-R SIB subscale, then the Functional Assessment Screening Tool (FAST) was not applicable and therefore not completed.
11.0.4 Measure Specific Notes
General
Variable prefixes indicate the timepoint of data collection: variables beginning with ‘a_’ correspond to the admission timepoint, ‘s_’ to the stay timepoint, ‘d_’ to the discharge timepoint, and ‘fu_’ to the follow-up timepoint.
Variables marked “checkbox” appear in the full dataset as: 0 = No, 1 = Yes
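As an illustration of the prefix convention, the following sketch maps a variable name to its timepoint. The names `TIMEPOINT_PREFIXES` and `timepoint_of` are hypothetical, and the sketch assumes that any variable without one of the four prefixes has no timepoint encoded in its name.

```python
from typing import Optional

# Prefix-to-timepoint mapping taken from the convention described above.
TIMEPOINT_PREFIXES = {
    "a_": "admission",
    "s_": "stay",
    "d_": "discharge",
    "fu_": "follow-up",
}


def timepoint_of(variable_name: str) -> Optional[str]:
    """Return the timepoint implied by a variable's prefix, or None if no prefix matches."""
    for prefix, timepoint in TIMEPOINT_PREFIXES.items():
        if variable_name.startswith(prefix):
            return timepoint
    return None
```

For example, `timepoint_of("s_dxbxfinaldx")` returns `"stay"`.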
Aberrant Behavior Checklist
The full Aberrant Behavior Checklist (ABC) was administered at the admission timepoint. At discharge and follow-up, data collection was limited to the Irritability subscale.
ADOS-2 Module 4 Algorithms
During the course of this study, a revised scoring algorithm became available for ADOS-2 Module 4. Phase 1 participants assessed using Module 4 were scored and diagnosed only using the older algorithm available at the time, providing a Communication Score [s_scoresumm_4_ctotal], a Reciprocal Social Interaction Score [s_scoresumm_4_sitotal], a Communication and Reciprocal Social Interaction total score [s_scoresumm_4_satotal], and a Stereotyped Behaviors and Restricted Interests Score [s_scoresumm_4_rrbtotal]. The older algorithm does not include a comparison score, so any cases that fall into this category will not have a comparison score variable for use. Phase 1 Module 4 scores were not revised and diagnoses were not altered using the new algorithm to protect the integrity of the Phase 1 ASD diagnoses, early cohort descriptions, and published analysis findings contemporaneous with the scoring standard at that time. Old algorithm variables are:
s_scoresumm_4_a4 s_scoresumm_4_a8 s_scoresumm_4_a9 s_scoresumm_4_a10 s_scoresumm_4_b1 s_scoresumm_4_b2 s_scoresumm_4_b6 s_scoresumm_4_b8 s_scoresumm_4_b9 s_scoresumm_4_b11 s_scoresumm_4_b12 s_scoresumm_4_c1 s_scoresumm_4_d1 s_scoresumm_4_d2 s_scoresumm_4_d4 s_scoresumm_4_d5
For Phase 2 participants assessed after the revised Module 4 algorithm became available to the study, ONLY the new algorithm was scored because it had become the accepted standard. The algorithm also provides a comparison score in line with the other modules [s_scoresumm_compscore]. For those Module 4 assessments completed during Phase 2 but before the transition to the new algorithm, protocols were re-scored and participants will have both original algorithm scores and subscale total scores, as well as the revised algorithm scores and subscale scores. New algorithm variables are:
s_scoresumm_4_a8 s_scoresumm_4_a10 s_scoresumm_4_b1 s_scoresumm_4_b2 s_scoresumm_4_b5 s_scoresumm_4_b7 s_scoresumm_4_b9 s_scoresumm_4_b11 s_scoresumm_4_b12 s_scoresumm_4_b13 s_scoresumm_4_a2 s_scoresumm_4_a4 s_scoresumm_4_d1 s_scoresumm_4_d2 s_scoresumm_4_d4
Summarizing Module 4 data in the data set:
• Phase 1 – only old scoring algorithm available: old algorithm scores only; new algorithm items that were not part of the previous algorithm will be coded as 8888.
• Phase 2 – prior to new scoring algorithm availability: items for both algorithms will contain data.
• Phase 2 – after new scoring algorithm availability: only new algorithm scores will contain data; old algorithm items that are not part of the new algorithm will be coded as 8888.
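The following sketch shows one way to infer which Module 4 algorithm a record was scored with, using the two item lists given above. It is an illustration under stated assumptions, not the study's method: the helper names are hypothetical, a record is assumed to be a dictionary keyed by variable name, and both 8888 and an empty value (None) are treated as "no data" because not-applicable codes may already have been recoded to missing in the released files.

```python
# Item suffixes copied from the old and new algorithm variable lists above.
OLD_ITEMS = {"a4", "a8", "a9", "a10", "b1", "b2", "b6", "b8",
             "b9", "b11", "b12", "c1", "d1", "d2", "d4", "d5"}
NEW_ITEMS = {"a8", "a10", "b1", "b2", "b5", "b7", "b9", "b11",
             "b12", "b13", "a2", "a4", "d1", "d2", "d4"}

# Items that belong to only one algorithm are the ones that distinguish them.
OLD_ONLY = OLD_ITEMS - NEW_ITEMS
NEW_ONLY = NEW_ITEMS - OLD_ITEMS

NOT_APPLICABLE = 8888


def has_data(record: dict, item: str) -> bool:
    """True if the Module 4 item holds a real value (not 8888, not empty)."""
    value = record.get(f"s_scoresumm_4_{item}")
    return value is not None and value != NOT_APPLICABLE


def algorithms_scored(record: dict) -> str:
    """Return 'old', 'new', 'both', or 'unknown' for one Module 4 record."""
    old = any(has_data(record, item) for item in OLD_ONLY)
    new = any(has_data(record, item) for item in NEW_ONLY)
    if old and new:
        return "both"
    if old:
        return "old"
    if new:
        return "new"
    return "unknown"
```

The sketch uses `any` rather than `all` so that a record with an occasional missing algorithm item is still classified.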
Please note that there are known issues with occasional missing algorithm items, which would ordinarily render total scores and classifications on the ADOS invalid; however, all cases met cutoffs and are included.
CSHQ
In Phase 4, the CSHQ-1 was updated to the more streamlined CSHQ-2, with the primary differences lying in variable naming conventions, response formatting, and content scope. Both versions of the CSHQ contain detailed sleep-related measures categorized into domains such as sleep initiation, anxiety, night waking, and daytime alertness, with responses formatted using specific dropdowns and radio buttons. The CSHQ-2 streamlines variable names and introduces calculated fields such as “CSHQ Age.” Some variables present in CSHQ-1, like specific sleep behavior items, were consolidated in CSHQ-2. Additionally, CSHQ-2 simplifies notes and response types.
Intake and Medical Demographic Form
Family/household member demographics were collected for all persons living at home with the child (proband). Demographics were recorded for a maximum of 10 household members. Each of these demographic items ends with a number (e.g., a_age1, a_sex1, a_marital1, a_relation1). This number indicates a specific household member. The age items (e.g., a_age1, a_age2, a_age3, a_age4, a_age5, a_age6, a_age7, a_age8, a_age9, a_age10) reflect the age of that particular family member listed. For example, a household member with a_age1=30, a_sex1=female, a_marital1=Married and a_relation1=Bio parent can be described as a biological parent who is female, 30 years of age, and married. (This is one person, i.e., the bio mother.)
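The numbered-suffix layout can be regrouped into one entry per household member, as in the sketch below. The function name `household_members` is hypothetical; the sketch assumes a record is a dictionary keyed by variable name and covers only the four item stems named above (age, sex, marital, relation), although the form may include others.

```python
def household_members(record: dict, max_members: int = 10) -> list:
    """Group a_age1..a_relation10 style items into one dict per household member."""
    stems = ("age", "sex", "marital", "relation")
    members = []
    for n in range(1, max_members + 1):
        member = {stem: record.get(f"a_{stem}{n}") for stem in stems}
        # Skip slots where nothing was recorded for this household member.
        if any(value is not None for value in member.values()):
            member["member"] = n
            members.append(member)
    return members
```

Applied to the example above, the first entry would hold age 30, sex female, marital status Married, and relation Bio parent, with `member` set to 1.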
Leiter
Probands with a mental age below 3 years or severe unsafe behaviors throughout the stay may not have been testable using the Leiter-3. A few subjects had a very low mental age based on their Vineland scores and were unable to complete the Leiter-3. Despite low mental age, these participants were administered ADOS-2 Module 1.
Sample Size of New Measures
Several new measures were introduced in Phase 4, resulting in a very small sample size (n) for these instruments. These measures include the Adult Behavior Checklist, Adult Functioning Scale (AFS Caregiver-Report & AFS Self-Report), Emotion Dysregulation Inventory Self-Report (EDI), Open-Source Challenging Behavior Scale (OSCBS), Pittsburgh Sleep Quality Index (PSQI), and Suicidal Behaviors Questionnaire-Autism Spectrum Conditions (SBQ-ASC).
11.0.5 Data entry, Validation, and Exclusions
Data was entered and validated using the secure survey and database website REDCap.
Real Time Data Validation (RTDV)
RTDV was implemented within the REDCap screen, which limited data entry to a specific data range or format. Categorical fields and/or data validations were created wherever possible to avoid inconsistencies typically found with open text fields.
• All date fields were formatted m/d/y.
• Numerical format (only allowing numerical data to be entered) was applied where applicable.
• Age (collected for proband and all family members) could not be less than 0.
• Enrollment date [enrolladmitdate] must be after 1/1/2014.
• Demographics date [demodate] must be after 1/1/2014.
• Consent date [consentdate] must be after 1/1/2014.
• Family ID must be greater than 1000.01 and less than 8000.00.
Missingness
All data underwent comprehensive missing data checks, which were executed within the electronic data entry form and addressed by each of the site research assistants (RAs). Missing data codes (i.e., 9999 or for dates 09/09/9999) and Not applicable data codes (i.e., 8888 or for dates 08/08/8888) were entered where data was not obtainable. These missing datapoints were recoded as System Missing (for SPSS), NA (for RDS), or blank (for CSV).
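The recoding step described above can be illustrated with a short sketch. The helper name `recode_missing` is hypothetical, and treating string forms such as "9999" as equivalent to the numeric codes is an assumption; Python's None stands in for System Missing, NA, or blank.

```python
# Codes entered at data collection, as described above.
MISSING_CODES = {9999, "9999", "09/09/9999"}
NOT_APPLICABLE_CODES = {8888, "8888", "08/08/8888"}


def recode_missing(value):
    """Return None for missing or not-applicable codes; otherwise return the value unchanged."""
    if value in MISSING_CODES or value in NOT_APPLICABLE_CODES:
        return None
    return value
```

Note that this recoding collapses "missing" and "not applicable" into a single missing value, so the distinction between the two codes is no longer recoverable afterward.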
Logic checks
Logic checks were executed within each measure and errors were addressed by the person entering data.
Enrollment: • Enrollment age (calculated using date of birth and admission date) was 4 years of age or more and less than 21 years of age.
Social Communication Questionnaire (SCQ): • Years of age at time of SCQ assessment met study criteria (i.e., between 4 years and 20 years, 11 months). • Total score was valid (i.e., if SCQ Item 1 = No, score must be below 34; if SCQ Item 1 = Yes, score must be below 40).
Diagnostic and Behavioral Summary • If “No co-morbid diagnoses” was checked, no other diagnoses were checked. • Section 5 — ASD discrepancy: No symptoms met and one or more symptoms were not checked. • Section 5 — DSM-5 Checklist for ASD: None of the A, B, C, D, E criteria was met and symptom criteria were not checked.
Autism Diagnostic Observation Schedule, Second Edition (ADOS-2): • Module 3: If A9 or B1 or B2 are coded as 2, then B3 should have been coded 8 by default.
Vineland: • Composite score was greater than 160. • Adaptive behavior composite was greater than sum of domain standard scores. • Vineland-II excludes Motor Skills (ages ≥7). • Vineland-3 includes Motor Skills (up to age 9). • Age at time of Vineland was 7 years or older and motor skills scores were entered; probands aged 7 or older should not have motor skills scores.
Inpatient Data Form: • Length of stay was less than 365 days.
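Three of the checks listed above (enrollment age, SCQ total score, and length of stay) can be sketched as simple predicates. These are illustrations only, not the REDCap rules themselves: all function names are hypothetical, and the lower bounds of 0 on the SCQ total and on length of stay are assumptions not stated in the checks.

```python
from datetime import date


def age_in_years(date_of_birth: date, on_date: date) -> int:
    """Completed years between date_of_birth and on_date."""
    years = on_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def enrollment_age_ok(date_of_birth: date, admission_date: date) -> bool:
    """Enrollment age must be at least 4 and less than 21 years."""
    return 4 <= age_in_years(date_of_birth, admission_date) < 21


def scq_total_ok(item1_yes: bool, total: int) -> bool:
    """SCQ total must be below 40 if Item 1 = Yes, below 34 if Item 1 = No."""
    upper_bound = 40 if item1_yes else 34
    return 0 <= total < upper_bound  # lower bound of 0 is an assumption


def length_of_stay_ok(admission_date: date, discharge_date: date) -> bool:
    """Length of stay must be less than 365 days."""
    days = (discharge_date - admission_date).days
    return 0 <= days < 365  # lower bound of 0 is an assumption
```

A record failing any predicate would be flagged for review by the person entering data rather than corrected automatically.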
Spot Checks
Ten percent of each site’s total confirmed Autism/ASD Family IDs were randomly chosen. Each site RA was given a custom template to complete (one template for each Family ID) to record frequencies of data errors by instrument and event. For the Family IDs included, all data points were confirmed between the paper hard copy and the electronic data collection form to ensure the data matches. Once the templates were completed, they were returned to the data manager. The data manager summarized the errors and disseminated the summary to the group to decide if there were any systematic data errors. Spot checks identified an error rate of <1 percent, and no systematic errors were identified.
Data Excluded from the Data Set
Nonessential text fields were deleted from the data set (e.g., behavioral or other descriptive text notes that could have been unnecessarily identifying).
All fields with dates have been removed from the data set for de-identification purposes. For all measures collected during the admission and during-stay events, one latent variable was created which calculated the number of days between the hospital admission date and assessment date. For all measures collected during the discharge and two-month follow-up events, two latent variables were created: (1) number of days between admission date and assessment date and (2) number of days between discharge date and assessment date. In the process of creating these latent variables, if a measure was completed before the reference date, this resulted in a negative value.
A negative value occurs when an assessment date entered occurred before the hospital admission date entered (e.g. parent completed parent surveys the day before the patient was admitted).
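The day-offset calculation can be illustrated as follows. The function name `days_from` is hypothetical, and the dates in the usage note are invented for illustration; the released data contains only the resulting offsets, not the dates.

```python
from datetime import date


def days_from(reference_date: date, assessment_date: date) -> int:
    """Number of days from the reference date to the assessment date.

    The result is negative when the assessment was completed before the reference date.
    """
    return (assessment_date - reference_date).days
```

For instance, a parent survey completed on 9 March for an admission on 10 March gives `days_from(date(2015, 3, 10), date(2015, 3, 9))`, which evaluates to -1.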
11.0.6 Additional Participant Data Available
This data set (N = 1543) includes all key phenotypic measures collected since establishing the AIC in 2014.
In Phase 1, discharge and 2-month follow-up parent-report measures were not obtained for enrolled participants who were found to be Non-ASD upon ADOS examination. For this reason, the following measures will not be found at the discharge and 2-month timepoints in the confirmed Non-ASD dataset: ABC, PSI, EDI, DBSES, and 2 Month Follow Up Form. The variable names for each of these measures can be found in the data reference.
In addition to our primary dataset, which includes individuals with double-confirmed autism (both ADOS-tested and clinician-diagnosed), we are releasing two auxiliary datasets this year. As in previous years, we will provide our non-ASD dataset, which includes individuals without an autism diagnosis. New this year, we are introducing a single-confirmed ASD dataset, which includes individuals who meet criteria for autism based on either ADOS testing or a clinician diagnosis but not both.
Social Communication Questionnaire (SCQ)
Per SCQ instructions, respondents who answer No to Item 1 (indicating the participant cannot speak in phrases or short sentences) are instructed to skip Items 2-7, which pertain only to verbal participants. Therefore, if Item 1 (a_scq1) was “No”, Items 2-7 (a_scq2, a_scq3, a_scq4, a_scq5, a_scq6, a_scq7) were skipped and are coded as missing (System Missing (for SPSS), NA (for RDS), or blank (for CSV)).
Please note that items 2, 9, and 19-40 are reverse-coded (e.g., Yes = 0, No = 1). This is reflected in the data; further reverse-coding is not needed for analysis.
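Because reverse-coding is already applied and skipped items are missing, a total can be obtained by summing the item values that are present, as in the sketch below. This rests on assumptions that should be checked against the official SCQ scoring instructions and the data reference: that Item 1 does not contribute to the total (consistent with the score bounds of 40 and 34 in the logic checks), that Items 8-40 follow the same a_scqN naming pattern as Items 1-7, and that missing items are represented as None. The function name `scq_total` is hypothetical.

```python
def scq_total(record: dict) -> int:
    """Sum the already reverse-coded SCQ items 2-40, ignoring missing items."""
    total = 0
    for n in range(2, 41):
        value = record.get(f"a_scq{n}")  # a_scq8..a_scq40 names are assumed by pattern
        if value is not None:
            total += value
    return total
```

Under these assumptions, a respondent who answered No to Item 1 contributes at most 33 items (Items 8-40) to the total.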