1 Release Notes


1.0.1 Executive Summary

The Autism Inpatient Collection (AIC) is a multisite study that enrolled children and adolescents with autism spectrum disorder (ASD), aged 4–20 years, who were admitted to one of six specialized inpatient psychiatry units. These units exclusively serve children with developmental delay (primarily autism and/or intellectual disability) who are admitted because of emotional and/or behavioral crises. Enrollment began in March 2014 and continued until May 2024. Measures characterizing adaptive and cognitive functioning, communication, externalizing behaviors, emotion regulation, co-occurring psychiatric disorders, self-injurious behavior, parent stress, and parent self-efficacy were collected.

Inpatients were eligible to enroll in the study either by scoring 12 or higher on the Social Communication Questionnaire (SCQ), completed by a caregiver, OR through referral into the study by an inpatient unit psychiatrist based on clinical concern for autism. Children referred into the study may therefore have an SCQ score below 12. Once participants were enrolled, ASD diagnosis was evaluated using the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and extensive inpatient observation. Biological samples from probands and their biological parents were banked and processed for DNA extraction and creation of lymphoblastoid cell lines.

Probands are categorized into three cohorts based on autism diagnosis confirmation (variable asd_dx_confirm_type): double-confirmed ASD, single-confirmed ASD, and non-ASD. This classification is determined using two variables: s_ados_diag, which indicates whether the participant received an ADOS diagnosis of autism (1=Yes, 0=No), and s_dxbxfinaldx, which indicates whether the participant received a clinician diagnosis of autism (1=Yes, 0=No). Double-confirmed ASD probands have both variables coded as 1, meaning they received a diagnosis from both the ADOS assessment and a clinician. Single-confirmed ASD probands have one variable coded as 1 and the other coded as 0 or missing, indicating that only one source confirmed an autism diagnosis. Non-ASD probands have neither variable coded as 1, meaning they did not receive an autism diagnosis from either source.
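
The classification rule above can be written as a small function. This is a sketch for checking or re-deriving the cohort from the two source variables. The function name and the returned labels are hypothetical; the actual codes stored in asd_dx_confirm_type are documented in the Data Dictionary. Missing values are represented here as None.

```python
from typing import Optional


def classify_confirmation(s_ados_diag: Optional[int], s_dxbxfinaldx: Optional[int]) -> str:
    """Hypothetical re-derivation of asd_dx_confirm_type.

    Each flag is 1 (Yes), 0 (No), or None (missing). Only a 1 counts as a
    confirmation, so 0 and missing are treated the same way.
    """
    confirmations = sum(1 for flag in (s_ados_diag, s_dxbxfinaldx) if flag == 1)
    if confirmations == 2:
        return "double-confirmed ASD"
    if confirmations == 1:
        return "single-confirmed ASD"
    return "non-ASD"


# Usage: one source confirms, the other is missing.
print(classify_confirmation(1, None))  # single-confirmed ASD
```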

The ADOS-2 examinations were completed by individuals at each site who had achieved research reliability, meeting the AIC-specific requirements set by certified ADOS-2 trainer and AIC Co-Investigator Robin Gabriels, Psy.D. ADOS-2 examiners were typically master's- or doctoral-level clinicians, such as social workers or clinical psychologists (LCSW, Psy.D., Ph.D.). Three ADOS-2 examiners were experienced research assistants (RAs) who were fully trained to research reliability. One was trained directly by Dr. Gabriels (one RA at the Colorado site), and two were trained by the AIC-site lead ADOS-2 examiner (one RA at the Bradley Hospital site and one at the Spring Harbor Hospital site). All three met AIC-required, research-level reliability. Dr. Gabriels held annual recalibration meetings with the ADOS-2 examiners. When the ADOS-2 was administered by an RA, the supervising psychologist also observed the child directly to verify the diagnosis.

All probands had at least one parent/caregiver who participated in the AIC study and completed questionnaires about the proband and their own experiences (stress and self-efficacy measures). The parent/caregiver respondent may have been a biological parent or other primary caregiver.

1.0.2 Study Structure

During the course of the study, some measures were discontinued while others were added across four distinct phases, creating varying Ns within the dataset. Additionally, measures were completed at different timepoints (admission, stay, discharge, follow-up), depending on the design of each phase and the measure in question. For a complete list of measures administered in each phase and their associated timepoints, please refer to the Study Measure Collection Schema.

Phase 1 is the only phase of the study that included measures administered at multiple timepoints. During Phase 1, the Aberrant Behavior Checklist (ABC), Parenting Stress Index, Fourth Edition, Short Form (PSI-4-SF), Emotion Dysregulation Inventory (EDI), Vineland-II, Functional Assessment Screening Tool (FAST), and the Difficult Behavior Self Efficacy Scale (DBSES) were collected at admission, discharge, and 2-month follow-up. Phase 1 also included the Leiter-3 (non-verbal IQ test), the Repetitive Behavior Scale – Revised, Subscale II Self-Injurious Behavior (RBS-R SIB) (completed by the caregiver only), and the Child & Adolescent Symptom Inventory-5 (CASI-5). Several of these measures were discontinued at the start of Phase 2; the phase descriptions below identify those that continued. The maximum possible N for measures collected only during Phase 1 is 376. In addition to these standardized measures, information on demographics, medical history, and other factors was collected. Sleep observation data were also collected during Phase 1. This collection was discontinued before Phase 1 ended, and the maximum N for the observed sleep data is 218.

Phase 2 discontinued assessment at multiple timepoints, the FAST, and sleep data collection. During this phase, the Child Behavior Checklist (CBCL), the Augmentative and Alternative Communication (AAC) data collection form, and the RBS-R SIB Staff were added to the study. The maximum N for measures administered only during Phase 2 is 742. The ABC, EDI, and Leiter-3 continued to be administered during Phase 2.

Phase 3 discontinued the RBS-R SIB, DBSES, and CASI-5, and added the Behavior Problems Inventory (BPI-01) and the Children's Sleep Habits Questionnaire (CSHQ). The Vineland-II was replaced with the Vineland-3. The maximum N for measures administered only during Phase 3 is 289.

Phase 4 introduced the Open-Source Challenging Behavior Scale (OSCBS), the Pediatric Anxiety Rating Scale (PARS), and a comprehensive Puberty Questionnaire. It also introduced several measures with very small samples (<5): the Adult Behavior Checklist, the Adult Functioning Scale (AFS), the Emotion Dysregulation Inventory Self-Report (EDI-Self), the Pittsburgh Sleep Quality Index (PSQI), and the Suicidal Behaviors Questionnaire-Autism Spectrum Conditions (SBQ-ASC). Additionally, the CSHQ was updated to the latest version, the CSHQ-2. The maximum N for measures administered only during Phase 4 is 110.

The majority of measures were administered throughout all phases and have a maximum possible N of 1544.

All instruments, regardless of phase, were completed either about the proband or by the primary caregiver about themselves. Not every respondent completed every instrument. Respondents may have declined a particular measure, withdrawn from the study, or been lost to contact before completing a measure.

Other instruments may not have been applicable to certain participants. For example, if no self-injurious behavior was reported for a proband on the caregiver-completed RBS-R SIB subscale, the Functional Assessment Screening Tool (FAST) was not applicable and therefore was not completed.

1.0.3 Data Structure

Data for each measure in the study is provided in a separate CSV file. Data for the double-confirmed ASD and single-confirmed ASD cohorts are provided in separate files with the prefixes dasd- and sasd-, respectively. Data for the non-ASD cohort will not be released.
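
Each cohort's data for a measure is in a separate file. Analyses that pool both ASD cohorts therefore need to pair the dasd- and sasd- files for the same measure. The sketch below assumes file names of the form <prefix><measure>.csv (for example, dasd-abc.csv). That naming pattern and the function name are assumptions, so check them against the delivered files.

```python
from pathlib import Path
from typing import Dict, Iterable

# Cohort prefixes described above.
COHORT_PREFIXES = {"dasd-": "dasd", "sasd-": "sasd"}


def pair_cohort_files(filenames: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group release files by measure, e.g. {"abc": {"dasd": ..., "sasd": ...}}.

    Assumes names of the form <prefix><measure>.csv; other files are ignored.
    """
    pairs: Dict[str, Dict[str, str]] = {}
    for filename in filenames:
        path = Path(filename)
        if path.suffix.lower() != ".csv":
            continue
        for prefix, cohort in COHORT_PREFIXES.items():
            if path.name.startswith(prefix):
                measure = path.stem[len(prefix):]
                pairs.setdefault(measure, {})[cohort] = filename
                break
    return pairs
```

Every file carries asd_dx_confirm_type, so rows from a paired dasd-/sasd- set can be concatenated without losing cohort membership.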

Each CSV contains aic_id, aic_fid, a_csex, and asd_dx_confirm_type variables. In addition, most files include metadata variables to provide context about data collection for that measure:

  • [measure]_age: Indicates the child's age when the form was completed
  • [measure]_admitdaysto and [measure]_dcdaysto: Indicates the number of days between admission or discharge and form completion
  • [measure]_respondent: Specifies who completed the form (e.g., parent, self, clinician).

If a participant had no data for a given measure (e.g., because the form was not returned, the measure was not applicable, or the measure was not being administered when the participant was enrolled), the participant is not included in the corresponding file. The complete list of enrolled participants can be found in the Enrollment file.
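
Participants without data are omitted from a measure's file rather than included as blank rows. You can therefore find them by comparing a measure file against the Enrollment file. The sketch below is minimal and assumes both files are CSVs keyed by aic_id; the function names are hypothetical. Absence from a file does not indicate which of the reasons above applies.

```python
import csv
from typing import Iterable, Set


def participant_ids(csv_lines: Iterable[str], id_column: str = "aic_id") -> Set[str]:
    """Collect non-blank participant IDs from a CSV (an open file or any iterable of lines)."""
    return {row[id_column] for row in csv.DictReader(csv_lines) if row.get(id_column)}


def missing_from_measure(enrollment_lines: Iterable[str], measure_lines: Iterable[str]) -> Set[str]:
    """IDs that are enrolled but absent from the measure file."""
    return participant_ids(enrollment_lines) - participant_ids(measure_lines)
```

With real files, pass open file handles, for example `open(path, newline="")` for each of the Enrollment and measure CSVs.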

Data is organized in wide format, and variable prefixes indicate the timepoint at which data were collected (i.e., “admission”, “stay”, “discharge”, “follow-up”; see the Data Dictionary for details).

1.0.5 Measure-Specific Notes

1.0.5.1 Aberrant Behavior Checklist

The full Aberrant Behavior Checklist (ABC) was administered at the admission timepoint. At discharge and follow-up, data collection was limited to the Irritability subscale.

1.0.5.2 ADOS-2 Module 4

During the course of this study, a revised scoring algorithm (Hus and Lord 2014) became available for ADOS-2 Module 4. Phase 1 participants assessed using Module 4 were scored and diagnosed using only the older algorithm available at the time (WPS 2012). That algorithm provides the following scores:

  • a Communication score [s_ados4_wps2012_c_total]
  • a Reciprocal Social Interaction score [s_ados4_wps2012_si_total]
  • a Communication and Reciprocal Social Interaction total score [s_ados4_csi_total_wps2012]
  • a Stereotyped Behaviors and Restricted Interests score [s_ados4_wps2012_srb_total]

The older algorithm (WPS 2012) does not include a comparison score, so these cases have no comparison score value [s_ados_compscore]. Phase 1 Module 4 protocols were not re-scored with the new algorithm, and diagnoses were not altered. This protects the integrity of the Phase 1 ASD diagnoses, early cohort descriptions, and published analysis findings, which were based on the scoring standard in place at that time.

For Phase 2 participants assessed after the revised Module 4 algorithm became available to the study, ONLY the new algorithm was scored, because it had become the accepted standard. The Hus and Lord algorithm provides a Social Affect total score [s_ados4_sa_total_hus], a Restricted and Repetitive Behavior total score [s_ados4_rrb_total_hus], an Overall total score [s_ados4_overall_total_hus], and a comparison score consistent with the other modules [s_ados_compscore]. Some Module 4 assessments were completed during Phase 2 but before the transition to the new algorithm. These protocols were re-scored, so these participants have both sets of results: the old (WPS 2012) algorithm scores and subscale total scores, and the revised (Hus and Lord 2014) algorithm scores and subscale scores.

The variables that feed the ADOS-2 Module 4 “old” (WPS 2012) algorithm composite scores are prefixed with s_ados4_alg_, followed by: a4, a8, a9, a10, b1, b2, b6, b8, b9, b11**, b12**, c1, d1, d2, d4, and d5.

** confirm these – jen suggests b11/b12 may be b10 or b11

The variables that feed the ADOS-2 Module 4 “revised” (Hus and Lord 2014) algorithm composite scores are prefixed with s_ados4_alg_, followed by: a2, a4, a8, a10, b1, b2, b5, b7, b9, b11**, b12, b13, d1, d2, and d4.

** confirm these – jen suggests b11 may be b10

Summary of Module 4 data in the data set:

  • Phase 1 – only old (WPS 2012) scoring algorithm available: old algorithm scores only

  • Phase 2 – prior to revised (Hus and Lord 2014) scoring algorithm availability: old and revised algorithm scores available

  • Phase 2 – after revised scoring algorithm availability: revised algorithm scores only
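
To tell which of these three situations applies to a given record, check which total-score variables are populated. The sketch below uses the total-score variable names listed above and treats blank strings or None as missing. The function name and the returned labels are hypothetical. Records without Module 4 data return "none".

```python
from typing import Mapping, Optional, Sequence

# Total-score variables named in the notes above.
OLD_WPS2012_TOTALS = (
    "s_ados4_wps2012_c_total",
    "s_ados4_wps2012_si_total",
    "s_ados4_csi_total_wps2012",
    "s_ados4_wps2012_srb_total",
)
REVISED_HUS_LORD_TOTALS = (
    "s_ados4_sa_total_hus",
    "s_ados4_rrb_total_hus",
    "s_ados4_overall_total_hus",
)


def _any_populated(record: Mapping[str, Optional[str]], names: Sequence[str]) -> bool:
    """True if at least one of the named variables is present and non-blank."""
    return any(record.get(name) not in (None, "") for name in names)


def module4_algorithm(record: Mapping[str, Optional[str]]) -> str:
    """Hypothetical helper: report which Module 4 algorithm scores a record carries."""
    has_old = _any_populated(record, OLD_WPS2012_TOTALS)
    has_revised = _any_populated(record, REVISED_HUS_LORD_TOTALS)
    if has_old and has_revised:
        return "both"
    if has_old:
        return "old only"
    if has_revised:
        return "revised only"
    return "none"
```

Records classified as "old only" will also have a blank s_ados_compscore, because the WPS 2012 algorithm has no comparison score.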

Please note that although ADOS-2 Module 4 is intended for older adolescents and adults, it was administered to several individuals well below the recommended age range. This affects the validity of the measure and may inflate severity scores and increase the risk of a false-positive diagnosis.

Further, some protocols have known missing algorithm items. Missing items would ordinarily render ADOS-2 total scores and classifications invalid; however, all such cases met cutoffs and are included.

1.0.5.3 CSHQ – UPDATE

In Phase 4, the CSHQ-1 was updated to the more streamlined CSHQ-2. The primary differences lie in variable naming conventions, response formatting, and content scope. Both versions contain detailed sleep-related items grouped into domains such as sleep initiation, anxiety, night waking, and daytime alertness, with responses recorded using dropdowns and radio buttons. Some CSHQ-1 variables, such as specific sleep behavior items, were consolidated in CSHQ-2. CSHQ-2 also simplifies notes and response types.

[update with citations and details about versions once confirmed]

1.0.5.4 Intake and Medical Demographic Form

At admission, demographic variables (age, sex, marital status, and relationship to proband) were collected for up to 10 people residing in the proband's household. These variables use the prefix a_hh_mem[number]_, where [number] identifies the household member (e.g., a_hh_mem1_, a_hh_mem2_). For example, a_hh_mem1_sex and a_hh_mem1_age refer to the sex and age of the first household member, respectively.
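
For analyses at the household-member level, the wide a_hh_mem[number]_ columns can be reshaped into one record per member. This sketch uses only the _sex and _age suffixes named above. The suffixes for marital status and relationship to the proband are not given in these notes, so they would need to be added from the Data Dictionary. The function name is hypothetical.

```python
from typing import Dict, List, Mapping, Optional, Sequence

MAX_HOUSEHOLD_MEMBERS = 10  # up to 10 members were recorded at admission


def household_members(
    row: Mapping[str, Optional[str]],
    suffixes: Sequence[str] = ("sex", "age"),
) -> List[Dict[str, object]]:
    """Reshape a_hh_mem[number]_[suffix] columns into one dict per household member.

    Members whose requested fields are all blank or absent are skipped.
    """
    members: List[Dict[str, object]] = []
    for number in range(1, MAX_HOUSEHOLD_MEMBERS + 1):
        values = {suffix: row.get(f"a_hh_mem{number}_{suffix}") for suffix in suffixes}
        if any(value not in (None, "") for value in values.values()):
            members.append({"member": number, **values})
    return members
```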

1.0.5.5 Leiter-3

Probands with a mental age below 3 years, or with severe unsafe behaviors throughout the stay, may not have been testable using the Leiter-3. A few participants had a very low mental age based on their Vineland scores and were unable to complete the Leiter-3. Despite their low mental age, these participants were administered ADOS-2 Module 1.

1.0.5.6 Social Communication Questionnaire (SCQ)

Per SCQ instructions, respondents who answer “No” to Item 1 (indicating that the participant cannot speak in phrases or short sentences) are instructed to skip Items 2–7, which pertain only to verbal participants. Therefore, if Item 1 (a_scq_01) was “No”, Items 2–7 (a_scq_02 – a_scq_07) were skipped and are coded as missing (blank).

Please note that items 2, 9, and 19-40 are reverse-coded (e.g., Yes = 0, No = 1) in the data.
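
The sketch below computes an SCQ total from the released items. It assumes, as the note above implies, that stored item values already reflect the scoring direction, so a 1 always counts as a scored point. Following standard SCQ scoring, Item 1 is not scored, and Items 2–7 are expected only when the participant speaks in phrases. The function name, the verbal flag, and the integer-keyed input are hypothetical. Confirm the item coding in the Data Dictionary before relying on this.

```python
from typing import Mapping, Optional


def scq_total(items: Mapping[int, Optional[int]], verbal: bool) -> Optional[int]:
    """Sum SCQ items as released (assumed: 1 = scored point after reverse coding).

    items maps item number (2-40) to its stored value, with None for blank.
    Item 1 is not scored; Items 2-7 are only expected when verbal is True.
    Returns None if any expected item is blank, rather than under-counting.
    """
    first_scored_item = 2 if verbal else 8
    total = 0
    for number in range(first_scored_item, 41):
        value = items.get(number)
        if value is None:
            return None
        total += value
    return total
```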

1.0.5.7 Vineland – UPDATE

[ Add something about known missing subdomain scores in Vineland 2]

For probands who were administered the Vineland-II before the Vineland-3 became available, Vineland-II data is included. The key difference between the two versions is that the Vineland-II does not assess Motor Skills for individuals aged 7 and older, whereas the Vineland-3 includes Motor Skills assessment up to age 9.
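
The version difference can be encoded as a helper that reports whether Motor Skills scores are expected for a given administration. This is a minimal sketch. The version labels are hypothetical, ages are taken as completed years, and reading "up to age 9" as inclusive of age 9 is an assumption to check against the Vineland-3 manual.

```python
# Hypothetical version labels; ages are in completed years.
MOTOR_SKILLS_MAX_AGE = {
    "Vineland-II": 6,  # not assessed at age 7 and older
    "Vineland-3": 9,   # assessed up to age 9 (assumed inclusive)
}


def motor_skills_expected(version: str, age_completed_years: int) -> bool:
    """Whether Motor Skills scores are expected for this version at this age."""
    if version not in MOTOR_SKILLS_MAX_AGE:
        raise ValueError(f"unknown Vineland version: {version!r}")
    return age_completed_years <= MOTOR_SKILLS_MAX_AGE[version]
```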

1.0.6 Data entry, Validation, and Exclusions

Data was entered and validated using the secure survey and database website REDCap.

1.0.6.1 Real Time Data Validation (RTDV)

RTDV was implemented within the REDCap data entry screens, limiting data entry to specific data ranges or formats. Categorical fields and/or data validations were created wherever possible to avoid the inconsistencies typically found in open text fields.

  • All date fields were formatted m/d/y.

  • Numerical format (only allowing numerical data to be entered) where applicable.

  • Age (collected for proband and all family members) could not be less than 0.

  • Enrollment date [enrolladmitdate], demographics date [demodate], and consent date [consentdate] must be after 1/1/2014.

  • Family ID must be greater than 1000.01 and less than 8000.00.
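
REDCap enforced these rules through its own field validation. The sketch below only restates several of them in Python, for example to re-check an export. The function and argument names are hypothetical. A four-digit year is assumed, consistent with the 09/09/9999 missing-date code. The same date rule would apply to demodate and consentdate.

```python
from datetime import date, datetime
from typing import List, Optional

EARLIEST_ALLOWED_DATE = date(2014, 1, 1)  # dates must fall after this


def parse_mdy(text: str) -> date:
    """Parse an m/d/y date with a four-digit year, e.g. 03/15/2014."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def validate_entry(age: Optional[float], enrolladmitdate: str, family_id: float) -> List[str]:
    """Re-apply several of the RTDV rules above to one entry; returns a list of problems."""
    problems: List[str] = []
    if age is not None and age < 0:
        problems.append("age must not be less than 0")
    try:
        enrolled = parse_mdy(enrolladmitdate)
    except ValueError:
        problems.append("enrolladmitdate must be m/d/y")
    else:
        if enrolled <= EARLIEST_ALLOWED_DATE:
            problems.append("enrolladmitdate must be after 1/1/2014")
    if not 1000.01 < family_id < 8000.00:
        problems.append("family ID must be greater than 1000.01 and less than 8000.00")
    return problems
```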

1.0.6.2 Missingness

All data underwent comprehensive missing data checks, which were executed within the electronic data entry form and addressed by each of the site research assistants (RAs). Missing data codes (i.e., 9999 or for dates 09/09/9999) and “not applicable” data codes (i.e., 8888 or for dates 08/08/8888) were entered where data was not obtainable.

1.0.6.3 Logic checks

Logic checks were executed within each measure and errors were addressed by the person entering data.

Enrollment:

  • Enrollment age (calculated using date of birth and admission date) was 4 years of age or more and less than 21 years of age.

Social Communication Questionnaire (SCQ):

  • Age at the time of the SCQ assessment met study criteria (i.e., between 4 years and 20 years, 11 months).

  • Total score was valid (i.e., if SCQ Item 1 = No, the score must be below 34; if SCQ Item 1 = Yes, the score must be below 40).
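
The enrollment-age and SCQ checks above can be sketched as follows. Age is computed in completed years, so "less than 21" admits ages up to 20 years, 11 months. The function names are hypothetical, and these are illustrations of the stated rules, not the study's REDCap logic.

```python
from datetime import date


def age_in_completed_years(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def enrollment_age_ok(birth: date, admission: date) -> bool:
    """At least 4 and under 21 years old at admission (i.e., up to 20 years, 11 months)."""
    return 4 <= age_in_completed_years(birth, admission) < 21


def scq_total_ok(total: int, item1_yes: bool) -> bool:
    """SCQ total below 40 if Item 1 = Yes, below 34 if Item 1 = No."""
    return 0 <= total < (40 if item1_yes else 34)
```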

Diagnostic and Behavioral Summary:

  • If “No co-morbid diagnoses” was checked, no other diagnoses were checked.

  • Section 5 — ASD discrepancy: No symptoms met and one or more symptoms were not checked.

  • Section 5 — DSM-5 Checklist for ASD: None of the A, B, C, D, E criteria was met and symptom criteria were not checked.

Autism Diagnostic Observation Schedule, Second Edition (ADOS-2):

  • Module 3: If A9 or B1 or B2 are coded as 2, then B3 should have been coded 8 by default.

Vineland:

  • Composite score was greater than 160

  • Adaptive behavior composite was greater than sum of domain standard scores.

  • Vineland-II excludes Motor Skills (ages ≥7).

  • Vineland-3 includes Motor Skills (up to age 9).

  • Age at the time of the Vineland was 7 years or older, so no Motor Skills scores should be present. This check flags records where a proband aged 7 or older has Motor Skills scores entered.
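
The two composite-score checks in this list can be sketched as follows. The function name and the domain keys are hypothetical. Which domains enter the sum (for example, with or without Motor Skills) is left to the caller, because the notes do not specify it.

```python
from typing import List, Mapping


def vineland_composite_flags(composite: int, domain_standard_scores: Mapping[str, int]) -> List[str]:
    """Flag the two composite-score conditions listed above (hypothetical helper)."""
    flags: List[str] = []
    if composite > 160:
        flags.append("composite greater than 160")
    if composite > sum(domain_standard_scores.values()):
        flags.append("composite greater than sum of domain standard scores")
    return flags
```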

Inpatient Data Form:

  • Length of stay was less than 365 days.

1.0.6.4 Spot Checks

Ten percent of each site's confirmed Autism/ASD Family IDs were randomly selected. Each site RA was given a custom template to complete (one template per Family ID) to record the frequency of data errors by instrument and event. For each selected Family ID, all data points were compared between the paper hard copy and the electronic data collection form to confirm that they matched. The RAs returned the completed templates to the data manager. The data manager summarized the errors and disseminated the summary to the group to determine whether there were any systematic data errors. Spot checks identified an error rate of <1 percent, and no systematic errors were identified.
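
The sampling step and the error-rate summary can be sketched as follows. This is an illustration under stated assumptions, not the procedure the data manager used. The function names, the fixed seed, and rounding up so that every non-empty site contributes at least one Family ID are all assumptions.

```python
import random
from typing import Dict, List, Sequence


def spot_check_sample(
    family_ids_by_site: Dict[str, Sequence[str]], percent: int = 10, seed: int = 0
) -> Dict[str, List[str]]:
    """Randomly pick `percent` of each site's Family IDs (rounded up), without replacement."""
    rng = random.Random(seed)
    sample: Dict[str, List[str]] = {}
    for site, ids in family_ids_by_site.items():
        # Integer ceiling of len(ids) * percent / 100, avoiding float rounding.
        k = -(-len(ids) * percent // 100)
        sample[site] = rng.sample(list(ids), k)
    return sample


def error_rate(errors_found: int, data_points_checked: int) -> float:
    """Share of checked data points that disagreed between paper and electronic records."""
    return errors_found / data_points_checked
```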

1.0.7 Data Cleaning and Preparation for Release

1.0.7.1 Removal of errors

During data review, a SFARI reviewer identified errors in the CBCL, CASI-5, Vineland, and Leiter scores. These scores had been computed manually and entered into REDCap; they have been suppressed (coded as missing/blank) in the dataset. At this time, the affected scores have not been re-scored. Users should be aware of these suppressions when analyzing the data and interpreting results.

1.0.7.2 Coding of Checkbox Items

By default, checkbox items downloaded from REDCap are coded 1=checked and 0=unchecked for all items, even when a checklist was not administered due to survey logic/branching. To avoid biasing response rates through non-administration, all checkbox items in a checklist that was not administered due to survey logic/branching are coded as missing (blank).

For example, consider a survey where a checklist (Q2) is only administered if Q1 = “yes”

  • if Q1 = “yes”: Q2 checkbox items will be coded as 1=checked and 0=unchecked

  • if Q1 is not “yes” (e.g., Q1 = “no” or Q1 response is missing): all Q2 checkbox items will be coded as missing (blank)
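
The rule in this example can be sketched as a function applied to one row. The field names (q1, q2_a, q2_b) and the function name are hypothetical, and real gating conditions should be taken from the ‘Branching Logic’ column of the Data Dictionary.

```python
from typing import Dict, Iterable, Optional

Row = Dict[str, Optional[str]]


def blank_unadministered_checklist(
    row: Row, gate_field: str, gate_value: str, checkbox_fields: Iterable[str]
) -> Row:
    """Return a copy of row with the checklist blanked unless the gate question had gate_value.

    A missing gate response (None or "") also blanks the checklist, matching
    the rule above.
    """
    cleaned = dict(row)
    if row.get(gate_field) != gate_value:
        for field in checkbox_fields:
            cleaned[field] = None
    return cleaned
```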

To see whether checklist administration was dependent on survey logic/branching, please see the ‘Branching Logic’ column in the Data Dictionary.

1.0.7.3 Removal of missing data codes

Missing data codes (e.g., 9999, 09/09/9999) and “not applicable” data codes (e.g., 8888, 08/08/8888) were re-coded as missing (blank).
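
The recoding rule can be illustrated on rows read as strings. This is a sketch, and the function name is hypothetical. It compares exact string values after trimming spaces, so a code that had been converted to a number such as "9999.0" would not be caught.

```python
from typing import Dict, Optional

# Codes listed in the notes above; compared as exact strings after trimming spaces.
MISSING_CODES = {"9999", "09/09/9999"}
NOT_APPLICABLE_CODES = {"8888", "08/08/8888"}


def recode_missing(row: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Replace missing and not-applicable codes with None (blank); other values are kept."""
    codes = MISSING_CODES | NOT_APPLICABLE_CODES
    return {
        field: None if value is not None and value.strip() in codes else value
        for field, value in row.items()
    }
```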

1.0.7.4 Exclusion of variables

Nonessential text fields were excluded from the data set (e.g., behavioral or other descriptive text notes that could have been unnecessarily identifying).

Questionnaires with sample sizes <5 (PSQI, SBQ, AFS, Adult Behavior Checklist, EDI-Self) were excluded from the dataset.

Date fields (e.g., measure dates, date of birth) have been removed from the data set for de-identification purposes.

1.0.7.5 Calculation of _age and _daysto variables

In place of date variables, two values were calculated for each measure: (1) child age and (2) the number of days between form completion and the reference date. The reference date is admission for “admission” and “stay” events and discharge for “discharge” and “follow-up” events. If a measure was completed before the reference date (e.g., a parent completed an “admission” survey before the patient was admitted), the value is negative.
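
The days-to calculation can be sketched as below. The function name and the event labels used as dictionary keys are hypothetical. The released values were computed from dates that are no longer in the dataset.

```python
from datetime import date

# Reference date for each event, as described above.
REFERENCE_FOR_EVENT = {
    "admission": "admission",
    "stay": "admission",
    "discharge": "discharge",
    "follow-up": "discharge",
}


def days_to(event: str, form_date: date, admission_date: date, discharge_date: date) -> int:
    """Days from the event's reference date to form completion (negative if earlier)."""
    if REFERENCE_FOR_EVENT[event] == "admission":
        reference = admission_date
    else:
        reference = discharge_date
    return (form_date - reference).days
```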

1.0.7.6 Calculation of _age_out_of_range variables

For select measures (ABC, CASI-5, CBCL, CSHQ, EDI, OSCBS), [form]_age_out_of_range variables were created to indicate whether a participant was outside the age range for which the scale was intended or originally developed. A value of 1 denotes that the participant’s age fell outside the age range for that measure, while a value of 0 indicates the participant was within range.

These variables are provided to support analytic decisions and help researchers identify cases where data may be less appropriate for use based on a participant’s age. This is particularly relevant for measures like the CBCL, which produce age-based standardized scores, where alignment with the intended age range directly influences score calculation and interpretation.

However, it remains up to the researcher to determine whether data from participants outside the original age range are suitable for their specific analyses. For example, the CSHQ was developed in children aged 2–10 years, but it has also been used in research involving children older than 10 years.
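
The sketch below shows how a flag of this kind could be derived for a measure with a known range. It may not reproduce the released [form]_age_out_of_range values exactly. Only the CSHQ range is given in these notes. Treating its bounds as inclusive, working in completed years, and the function and key names are all assumptions.

```python
from typing import Dict, Tuple

# Only the CSHQ range (2-10 years) is stated in these notes; ranges for the
# other measures would have to come from each instrument's documentation.
AGE_RANGES: Dict[str, Tuple[int, int]] = {"cshq": (2, 10)}


def age_out_of_range(measure: str, age_completed_years: int) -> int:
    """1 if the age falls outside the measure's range (bounds assumed inclusive), else 0."""
    low, high = AGE_RANGES[measure]
    return 0 if low <= age_completed_years <= high else 1
```

A common use is a sensitivity analysis: run the model once on all rows and once only on rows whose released age_out_of_range flag is 0.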