Data mining approaches for the detection of signals associated with pediatric post-acute sequelae of COVID-19

In a recent study posted to the medRxiv* preprint server, researchers described tree-based scan statistics for pediatric long coronavirus disease 2019 (COVID-19).

Study: Understanding pediatric long COVID using a tree-based scan statistic approach: An EHR-based cohort study from the RECOVER Program.


The National Institutes of Health (NIH) launched the Researching COVID to Enhance Recovery (RECOVER) initiative in 2021 to use electronic health record (EHR) data to identify and classify patients with post-acute sequelae of COVID-19 (PASC). The NIH defines PASC as a failure to recover from SARS-CoV-2 infection or symptoms that persist for more than 30 days.

According to the literature, PASC has been documented in patients with COVID-19, and its origin, risk factors, and outcomes have been described. So far, however, only a few studies have accurately described PASC in children.

About the study

In the present study, researchers aimed to detect PASC signals through data mining rather than clinical experience.

The analyses centered on two comparisons: PASC cases were compared with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-infected patients and with uninfected patients. Evidence of PASC comprised a U09.9 diagnosis code, an EHR interface terminology (IMO) term containing the strings 'post', 'acute', and 'covid', or a B94.8 diagnosis code.
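The PASC evidence rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the input shapes, and the case-insensitive substring matching are assumptions, not details taken from the study.

```python
# Diagnosis codes named in the article as evidence of PASC.
PASC_CODES = {"U09.9", "B94.8"}
# Strings that must all appear in an interface terminology term.
PASC_STRINGS = ("post", "acute", "covid")


def has_pasc_evidence(diagnosis_codes, imo_terms):
    """Return True if a listed code or a matching term is present.

    diagnosis_codes: iterable of ICD-10 code strings (hypothetical input shape).
    imo_terms: iterable of free-text interface terminology terms.
    """
    if any(code.strip().upper() in PASC_CODES for code in diagnosis_codes):
        return True
    for term in imo_terms:
        lowered = term.lower()
        # Assumption: matching is a case-insensitive substring check.
        if all(s in lowered for s in PASC_STRINGS):
            return True
    return False
```

For instance, a record carrying only the term "Post-acute COVID-19 syndrome" would be flagged, while a record with only an unrelated respiratory code would not.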

Infected patients had a positive polymerase chain reaction (PCR), antigen, or serology test for SARS-CoV-2. Serology tests for infection included immunoglobulin (Ig) M, IgG anti-nucleocapsid (N) antibodies, IgG anti-spike (S) or receptor-binding domain (RBD) antibodies, and undifferentiated IgA and IgG antibodies. Patients diagnosed with COVID-19 in the hospital or emergency department (ED) were also classified as SARS-CoV-2-infected.

Viral testing was still commonly performed in healthcare settings during the study's observation period. Patients were deemed SARS-CoV-2-uninfected if (1) all diagnostic tests, including antigen, PCR, and serology tests, were negative during the study period, and (2) they had no diagnosis codes indicating COVID-19, multisystem inflammatory syndrome in children (MIS-C), or PASC.
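The infected and uninfected definitions above can be combined into a small classifier. The sketch below is hypothetical: the `LabResult` type, the flag names, and the "unclassified" fallback for patients who meet neither definition are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabResult:
    kind: str        # "pcr", "antigen", or "serology"
    positive: bool


def infection_status(tests, covid_dx_hospital_or_ed=False, excluding_dx=False):
    """Classify a patient as infected, uninfected, or unclassified.

    tests: list of LabResult recorded during the study period.
    covid_dx_hospital_or_ed: COVID-19 diagnosed in the hospital or ED.
    excluding_dx: any diagnosis code indicating COVID-19, MIS-C, or PASC.
    """
    if any(t.positive for t in tests) or covid_dx_hospital_or_ed:
        return "infected"
    # Assumption: at least one negative test is required to count as
    # uninfected, since a negative test supplies the cohort entry date.
    if tests and not excluding_dx:
        return "uninfected"
    return "unclassified"
```

Keeping an explicit third outcome avoids silently treating untested patients as uninfected.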

For incident PASC cases, the cohort entry date was the date of the initial positive antigen or PCR test, four weeks before the initial positive serology test, or, in the absence of a confirmatory test, four weeks before the initial PASC diagnosis.
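The entry date rule for PASC cases can be written as a simple fallback chain. In this sketch the function and parameter names are hypothetical, and the precedence order (antigen or PCR first, then serology, then diagnosis) is an assumption based on the order in which the rule is stated.

```python
from datetime import date, timedelta

FOUR_WEEKS = timedelta(weeks=4)


def pasc_entry_date(first_positive_viral=None,
                    first_positive_serology=None,
                    first_pasc_dx=None):
    """Return the cohort entry date for a PASC case, or None if unknown.

    first_positive_viral: date of the first positive antigen or PCR test.
    first_positive_serology: date of the first positive serology test.
    first_pasc_dx: date of the first PASC diagnosis.
    """
    if first_positive_viral is not None:
        return first_positive_viral
    if first_positive_serology is not None:
        # Serology reflects earlier infection, so back-date by four weeks.
        return first_positive_serology - FOUR_WEEKS
    if first_pasc_dx is not None:
        # No confirmatory test: back-date from the PASC diagnosis.
        return first_pasc_dx - FOUR_WEEKS
    return None
```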

For non-PASC COVID-19-positive patients, the entry date was the first COVID-19 diagnosis or encounter. For SARS-CoV-2-uninfected patients, the date of a randomly selected negative test served as the cohort entry date. At cohort entry, all case and control patients were aged under 21 years. For every patient and diagnosis code, the team constructed a binary indicator for incident occurrence within 28 to 179 days of cohort entry.
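The binary indicator described above might be computed as follows. This is a minimal sketch under stated assumptions: a code counts as incident when its earliest recorded date among the supplied records falls 28 to 179 days after cohort entry, and the look-back period is whatever the input contains. The function name and input shape are hypothetical.

```python
from datetime import date


def incident_indicators(entry_date, diagnoses, start_day=28, end_day=179):
    """Map each diagnosis code to 1 if it is incident in the window, else 0.

    diagnoses: iterable of (icd10_code, diagnosis_date) pairs for one patient.
    """
    first_seen = {}
    for code, dx_date in diagnoses:
        # Track the earliest date each code was recorded.
        if code not in first_seen or dx_date < first_seen[code]:
            first_seen[code] = dx_date
    return {
        code: int(start_day <= (first - entry_date).days <= end_day)
        for code, first in first_seen.items()
    }
```

A code first recorded a few days after entry is scored 0 even if it recurs inside the window, because under this assumption it is no longer a new occurrence.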

The team used the International Classification of Diseases, 10th Revision (ICD-10) vocabulary as input. The hierarchy had seven levels of nodes, each corresponding to a level of the tree scan. The hierarchy was also referred to as a tree, and a cluster containing a node together with all of its descendants was termed a branch of the tree, or a cut.
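To make the idea of a cut concrete, the sketch below counts patients per cut on a toy slice of the ICD-10 hierarchy and scores a cut with a log-likelihood ratio. The tree fragment, function names, and inputs are illustrative assumptions. The Bernoulli form of the ratio is one standard tree-scan formulation; the article does not say which variant the study used.

```python
import math
from collections import defaultdict

# Toy hierarchy, child -> parent (None marks the root). Illustrative only.
PARENT = {
    "ROOT": None,
    "R00-R99": "ROOT",
    "R00-R09": "R00-R99",
    "R05": "R00-R09",
    "R06": "R00-R09",
    "R06.0": "R06",
    "R06.02": "R06.0",
}


def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def patients_per_cut(patient_codes):
    """Count patients with at least one code inside each cut."""
    totals = defaultdict(int)
    for codes in patient_codes:
        hit = set()
        for code in codes:
            hit.update(ancestors(code))
        for node in hit:  # each patient counts once per cut
            totals[node] += 1
    return dict(totals)


def bernoulli_llr(cases, total, p):
    """Log-likelihood ratio for one cut; 0.0 unless cases are in excess.

    cases: case patients in the cut; total: cases plus controls in the cut;
    p: probability of being a case under the null, with 0 < p < 1.
    """
    if total == 0 or cases / total <= p:
        return 0.0
    controls = total - cases
    llr = cases * (math.log(cases / total) - math.log(p))
    if controls > 0:
        llr += controls * (math.log(controls / total) - math.log(1 - p))
    return llr
```

In a full analysis, the largest ratio across all cuts would serve as the test statistic, with significance typically assessed by Monte Carlo permutation; that step is omitted here.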


A total of 13,750 patients were included across the three cohorts between 1 March 2020 and 22 June 2022, of whom 1,250 were PASC cases. Younger children were less likely to be in the PASC cohort. Most patients entered the cohorts in the fall of 2021.

Multiple statistical signals emerged when PASC patients were compared with COVID-19-infected patients. At the topmost level of the tree scan, notable cuts corresponded to ICD-10 codes for signs, symptoms, and clinical and laboratory findings not elsewhere classified; musculoskeletal and connective tissue diseases; nervous system disorders; respiratory disorders; mental and behavioral disorders; nutritional, endocrine, and metabolic disorders; circulatory system diseases; factors influencing health status and contact with health services; diseases of the skin and subcutaneous tissue; and digestive system diseases.

Within the branch describing uncategorized signs and symptoms, the three leading cuts corresponded to respiratory and circulatory symptoms, general symptoms, and cognitive, perceptual, emotional, and behavioral symptoms.


The study findings identified several disorders and body systems associated with PASC. Because the study used data-driven methods, the team also identified several novel or under-reported illnesses and symptoms.

The researchers believe that a more data-driven approach to knowledge discovery is required due to the pandemic's fast-changing nature and the lack of agreement on the precise symptoms that characterize PASC in children. This comprehensive analysis of diagnoses in a cohort of children with PASC adds much to the medical community's knowledge of the complex symptoms of this disorder.

The study findings can guide the design of future prospective studies to more thoroughly explore the patterns found here, enhance therapeutic practice, and focus research on the biochemical underpinnings of PASC.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Lorman, V. et al. (2022) "Understanding pediatric long COVID using a tree-based scan statistic approach: An EHR-based cohort study from the RECOVER Program". medRxiv. doi: 10.1101/2022.12.08.22283158.

Written by

Bhavana Kunkalikar
