Scientists identify features to better define long COVID

Written by admin_3fxxacau

Press release

Monday, May 16, 2022

Using machine learning, researchers are finding patterns in electronic health record data to better identify people who may have the disease.

A research team supported by the National Institutes of Health has identified characteristics of people with long COVID and of those likely to have it. Using machine learning techniques, the scientists analyzed an unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better identify who has long COVID. Exploring anonymized EHR data in the National COVID Cohort Collaborative (N3C), a centralized national public database led by the NIH’s National Center for Advancing Translational Sciences (NCATS), the team found more than 100,000 likely long COVID cases as of October 2021 (as of May 2022, the number is greater than 200,000). The findings appear in The Lancet Digital Health.

Long COVID is marked by many symptoms, including shortness of breath, fatigue, fever, headache, “brain fog” and other neurological issues. These symptoms can last several months or longer after an initial diagnosis of COVID-19. One reason long COVID is difficult to identify is that many of its symptoms are similar to those of other illnesses and conditions. Better characterization of long COVID could lead to better diagnostics and new therapeutic approaches.

“It made sense to take advantage of modern data analysis tools and a unique big data resource like N3C, where many features of long COVID can be represented,” said co-author Emily Pfaff, Ph.D., a clinical informaticist at the University of North Carolina at Chapel Hill.

The N3C data enclave currently includes information representing more than 13 million people nationwide, including nearly 5 million COVID-19 positive cases. The resource allows for rapid research on emerging questions regarding COVID-19 vaccines, therapies, risk factors and health outcomes.

The new research is part of a larger related trans-NIH initiative, Researching COVID to Enhance Recovery (RECOVER), which aims to improve understanding of the long-term effects of COVID-19, called post-acute sequelae of SARS-CoV-2 infection (PASC). RECOVER will accurately identify people with PASC and develop approaches for its prevention and treatment. The program will also answer critical research questions about the long-term effects of COVID through clinical trials, longitudinal observational studies, and more.

In the Lancet study, Pfaff, Melissa Haendel, Ph.D., of the University of Colorado Anschutz Medical Campus, and their colleagues looked at patient demographics, health care utilization, diagnoses, and medications in the health records of 97,995 adult COVID-19 patients from N3C. They used this information, along with data on nearly 600 long COVID patients from three long COVID clinics, to create three machine learning models to identify long COVID patients.

In machine learning, scientists “train” computational methods to quickly sift through large amounts of data to reveal new insights, in this case about long COVID. The models looked for patterns in the data that could help researchers both understand patient characteristics and better identify people with the condition.

The models focused on identifying potential long COVID patients among three groups in the N3C database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models were found to be accurate, as people identified as at risk for long COVID were similar to patients seen at long COVID clinics. The machine learning systems classified about 100,000 patients in the N3C database whose profiles closely matched those with long COVID.

“Once you are able to determine who has long COVID from a large database of people, you can start asking questions about those people,” said Josh Fessel, M.D., Ph.D., senior clinical advisor at NCATS and a scientific program lead in RECOVER. “Was there something different about these people before they developed long COVID? Did they have certain risk factors? Was there anything about the way they were treated during acute COVID that might have increased or decreased their risk for long COVID?”

The models looked for common characteristics, including new medications, doctor visits and new symptoms, in patients with a positive COVID diagnosis who were at least 90 days out from their acute infection. The models identified patients as having long COVID if they went to a long COVID clinic, or if they showed long COVID symptoms and likely had the condition but had not been diagnosed.
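The cohort step described above can be pictured with a small sketch. This is not the study's code. It assumes a hypothetical in-memory record layout (`Patient`, `Event`, and the event kinds are invented for illustration), keeps only patients at least 90 days past their COVID index date, and counts each kind of event recorded on or after that date. Which time window the study actually used for its features is not stated here, so that counting rule is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

POST_ACUTE_DAYS = 90  # threshold mentioned in the press release


@dataclass
class Event:
    # Hypothetical event kinds: "visit", "new_medication", "new_symptom"
    kind: str
    on: date


@dataclass
class Patient:
    patient_id: str
    covid_index_date: date  # date of the positive COVID diagnosis
    events: List[Event] = field(default_factory=list)


def is_eligible(patient: Patient, as_of: date) -> bool:
    """True if the patient is at least 90 days past the acute infection."""
    return (as_of - patient.covid_index_date).days >= POST_ACUTE_DAYS


def event_counts(patient: Patient) -> Counter:
    """Count events by kind on or after the index date (assumed window)."""
    return Counter(
        e.kind for e in patient.events if e.on >= patient.covid_index_date
    )


def build_feature_table(patients: List[Patient], as_of: date) -> Dict[str, Counter]:
    """Map each eligible patient's ID to their event counts."""
    return {
        p.patient_id: event_counts(p)
        for p in patients
        if is_eligible(p, as_of)
    }
```

A table like this is the kind of input a classifier could be trained on, with patients seen at long COVID clinics serving as the positive examples.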

“We want to incorporate the new patterns that we see with the diagnosis code for COVID and include them in our models to try to improve their performance,” said Haendel of the University of Colorado. “Models can learn from a wider variety of patients and become more accurate. We hope we can use our long COVID patient classifier for clinical trial recruitment.”

This study was funded by NCATS, which contributed to the design, maintenance, and security of the N3C enclave, and the NIH RECOVER Initiative, supported by NIH OT2HL161847. RECOVER coordinates, among other things, the participant recruitment protocol to which this work contributes. Analyses were conducted with data and tools accessible through the NCATS N3C Data Enclave and supported by NCATS U24TR002306.

About the National Center for Advancing Translational Sciences (NCATS): NCATS conducts and supports research on the science and operation of translation (the process by which interventions to improve health are developed and implemented) to allow more treatments to get to more patients more quickly. For more information on how NCATS helps shorten the journey from scientific observation to clinical intervention, see the NCATS website.

About the National Institutes of Health (NIH): The NIH, the nation’s medical research agency, includes 27 institutes and centers and is part of the U.S. Department of Health and Human Services. The NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and it investigates the causes, treatments, and cures for both common and rare diseases. For more information about the NIH and its programs, see the NIH website.

NIH…Turning Discovery Into Health®

