Michael A Simon, Ryan Luginbuhl, Richard Parker
Both clinical trials and studies leveraging real-world data have repeatedly confirmed the three COVID-19 vaccines authorized for use by the Food and Drug Administration are safe and effective at preventing infection, hospitalization, and death due to COVID-19 and a recent observational study of self-reported symptoms provides support that vaccination may also reduce the probability of developing long-COVID. As part of a federated research study with the COVID-19 Patient Recovery Alliance, Arcadia.io performed a retrospective analysis of the medical history of 240,648 COVID-19-infected persons to identity factors influencing the development and progression of long-COVID. This analysis revealed that patients who received at least one dose of any of the three COVID vaccines prior to their diagnosis with COVID-19 were 7-10 times less likely to report two or more long-COVID symptoms compared to unvaccinated patients. Furthermore, unvaccinated patients who received their first COVID-19 vaccination within four weeks of SARS-CoV-2 infection were 4-6 times less likely to report multiple long-COVID symptoms, and those who received their first dose 4-8 weeks after diagnosis were 3 times less likely to report multiple long-COVID symptoms compared to those who remained unvaccinated. This relationship supports the hypothesis that COVID-19 vaccination is protective against long-COVID and that effect persists even if vaccination occurs up to 12 weeks after COVID-19 diagnosis. A critical objective of this study was hypothesis generation, and the authors intend to perform further studies to substantiate the findings and encourage other researchers to as well.
Clinical trials and studies of real-world evidence have demonstrated repeatedly that the three COVID-19 vaccines* authorized for emergency use or approved for use by the Food and Drug Administration are safe and effective at preventing COVID-19 infection, hospitalization, and death (1). A recent observational study of self-reported symptoms goes further, concluding that vaccination may also reduce the likelihood and intensity of long-COVID (2), a proposal that has been raised anecdotally, as well (3). The COVID-19 Patient Recovery Alliance is performing a federated real-world data study of long-COVID and as a part of the federated study Arcadia.io performed a retrospective analysis of the medical history of COVID-19-infected persons to identify factors, both before and after COVID-19 diagnosis, influencing the development and progression of long-COVID.
Data for this analysis were collected from Arcadia Data Research (Arcadia.io, Burlington, MA), a normalized, de-identified clinical and operational dataset containing over 150 million patient records. Data were captured directly from electronic health record (EHR) systems, practice management systems, and health care payer claims and eligibility data, and subjected to data quality analyses for compliance with quality measure, risk adjustment, utilization and finance, and care management requirements. The MITRE Institutional Review Board (IRB) reviewed the study submission, COVID-19 Analytics (MIRB 2020017), and found that it is exempt under the provisions of 45 CFR 46.104(d)(4) as secondary research and approved the study. For the purposes of this analysis, all patients needed to meet the following criteria: (1) patient must have been alive as of March 1, 2020; (2) patient must have had at least one encounter with a provider documenting patient history prior to January 1, 2020; and (3) patient must have had at least one encounter with a practitioner who evaluated their health status after January 1, 2020. All qualifications needed to have been met by May 31, 2021, the cutoff for the data extract. This yielded a data sample covering roughly a year’s worth of COVID-19 activity in the United States, about six months of vaccination activity, and a consistent snapshot of the pandemic prior to the emergence of the delta variant (B 1.617.2) as the predominant variant in the United States at the end of June 2021 (4).
Patient demographics and health status were determined based on self-reported sex, age, race, and ethnicity, payer-reported eligibility, and clinically-reported medical conditions and symptoms, identified by ICD-9 or ICD-10 codes. Patients qualified for inclusion were further classified as having been diagnosed with COVID-19 during this period or not. The study cohort was further limited to patients diagnosed with COVID-19, having met either of the following requirements: (1) they were diagnosed with ICD-10 code U07.1 at any time or B97.29 prior to May 2020 in a medical encounter (i.e., the diagnosis was assessed by a provider in a face-to-face or equivalent encounter); or (2) they received a positive result from a COVID-19 nucleic acid amplification test (NAAT) or antigen test result. For patients meeting these criteria, an “index date” was set for the first incidence of COVID-19 diagnosis or positive test result. The index date needed to be at least 20 weeks prior to the cutoff date of the data extraction for the patient to be included in the sample population. Patients who died within twelve weeks of this index date were also excluded from this analysis, as any long-COVID outcomes could not be determined for those individuals.
To simplify the analysis, the set of conditions considered here were defined by two ICD-10 value sets: COVID-19 associated conditions (such as diabetes or chronic kidney disease)§ and COVID-19 associated symptoms (such as fever or loss of taste and smell)†. The value sets were established by the COVID-19 Interoperability Alliance and were provided courtesy of MITRE and Clinical Architecture (5). Long-COVID cases were classified as those where the patient presented one or more COVID-associated symptoms between 12 and 20 weeks after the initial COVID-19 diagnosis (6). Vaccination data were sourced from payer reports and documentation in EHR systems; no distinction was made between the three U.S. vaccine manufacturers, and although dose sequences were tracked, only the timing of the first dose was used for this analysis.
A logistic regression model based on a Newton Conjugate Gradient solution (Python statsmodels v0.12.2) was used to identify factors potentially influencing the persistence or onset of long-COVID symptoms. The model was applied to each of the long-COVID symptom variables, which included individual symptoms† and two aggregate indicators, “Any Symptom” and “>1 Symptom”. The vaccination timing bins – first dose prior to COVID diagnosis, between 0 and 4 weeks post-diagnosis, between 4 and 8 weeks post-diagnosis, between 8 and 12 weeks post-diagnosis, and unvaccinated prior to 12 weeks post-diagnosis – were used as “one-hot” inputs to the model, leaving the last category (“unvaccinated”) as reference. Patient age and payer type were found to be highly colinear and so were combined into a single factor for statistical analysis¶. Patient sex, age and payer combination, race, ethnicity, hospitalization status during acute infection, and pre-existing COVID-associated conditions§ were all included as additional factors. The resulting parameters from the model fit were filtered by p-value to αc<0.05 and converted to odds ratios to determine effect on risk of exhibiting long-COVID symptoms.
To further test specific outcomes identified using the logistic regression model, a general linear model using an iteratively reweighted least squares optimization method (Python statsmodels v0.12.2) was fitted to an aggregate continuous variable counting the number of distinct long-COVID symptoms reported after 12-weeks following diagnosis (“Symptom Count”). In contrast to the binned vaccination timing variables used with the logistic regression model, vaccination timing in the linear model was defined as a ratio representing the timing of the first vaccination dose relative to infection, where 1=vaccination prior to COVID-19 infection, 0=unvaccinated at 20 weeks post-COVID-19 infection, and the values in between representing . As before, patient sex, age and payer combination, race, ethnicity, and pre-existing COVID-associated conditions were included as inputs to the model, and the resulting coefficient parameters were filtered by p-value to αc<0.05.
Given the initial population inclusion criteria, 25,804,278 persons in the Arcadia Data Research dataset were considered eligible for this study. Between February 2020 and May 2021, 1,065,626 of those persons were diagnosed with COVID-19. Of those patients, 240,648 met the additional inclusion criteria of qualifying provider encounters both before January 1, 2020, and at least 20 weeks after their first COVID-19 diagnosis. Of these patients, 220,460 (91.6%) had not received any vaccine against COVID-19 prior to 12-weeks after their COVID-19 diagnosis, 2,392 (1.0%) received their first dose prior to diagnosis, and the remainder, 17,796 (7.4%), were vaccinated within the first twelve weeks after COVID-19 diagnosis. Vaccination during that 12-week period after COVID-19 diagnosis was further broken down into four-week bins for statistical analysis (Table 1).
Based on the results of the logistic regression model, even a single dose of any of the three COVID-19 vaccines had a protective (<1.0) odds ratio relative to a patient who had no record of vaccination if vaccines were administered up to 12 weeks after diagnosis, and this protective effect was stronger the earlier the first dose occurred relative to COVID-19 infection (Table 2). This result was supported by the general linear model, which also characterized vaccination as having a strongly negative effect on likelihood and number of long-COVID symptoms, representing the strongest parameter in the model (param = -0.85, p-value<0.0005) (Table 3).
In this study, patients who had been vaccinated prior to COVID-19 infection were significantly less likely to have long-COVID symptoms. This result applies even if only a single dose of the vaccine is documented, regardless of the manufacturer of the vaccine. Although these results show that other factors, such as demographic factors and chronic conditions, also influence the likelihood that an individual will exhibit long-COVID symptoms, vaccination status had a consistently and substantially larger effect on this outcome than any other factor measured.
Furthermore, patients whose first vaccination occurred within 12 weeks after COVID-19 diagnosis were significantly less likely to have long-COVID symptoms than if they had remained unvaccinated. This finding is consistent with the hypothesis that a vaccine may accelerate clearance of the remaining SARS-CoV-2 virus from specific body compartments or reduce part of the body’s immune response related to development of long-COVID (3)..
Currently, CDC recommends “vaccination of people with known current SARS-CoV-2 infection should be deferred until the person has recovered from the acute illness (if the person had symptoms) and they have met criteria to discontinue isolation,” but there is no specific recommendation of earlier vaccine administration, nor any mention of potential preventative effects on the development of long-COVID (7). These results suggest added benefit to post-infection vaccination, and further indicate that the earlier vaccine is administered, the more significant the benefit. If these results are substantiated with further studies, significant public health benefits could be realized, since up to 30% of those infected with SARS-CoV-2 developed prolonged symptoms, with many of these being debilitating and lasting for months.
This study does not address whether COVID-19 vaccines administered within 12 weeks after COVID-19 infection are effective at preventing subsequent COVID-19 infection. Additionally, this study cannot conclude whether COVID-19 vaccines are safe for use for patients in the acute stage of SARS-CoV-2 infection, although no deleterious effects were detected in vaccinated patients in this study or a separate observational study on patients presenting long-COVID symptoms (8). However, the reduced likelihood of long-COVID symptoms observed in this study provides a rationale for vaccination sooner rather than later, achieving improved patient health outcomes related to long-COVID.
This study is subject to at least six limitations: First, these findings are based on opportunistic availability of large volumes of patient data and may have geographic, temporal, contractual, and socioeconomic gaps that could influence outcomes. Second, these findings use vaccination data recorded by payer entities or documented in EHRs by providers but do not incorporate dedicated vaccination surveillance data sources and so may have gaps in vaccination data that are presently undetectable. Third, it is possible, but unlikely, that some of the patients with COVID-19 were misclassified due to a false-positive test result (with no documented correction) or an inaccurate COVID-19 diagnosis. Fourth, no distinction was made between which of the three U.S. COVID-19 vaccines administered; it is possible that some of the effect described is related to a specific vaccine, and that such an effect could not be detected based on the data used here. Fifth, while interactions between the observed demographic factors have been explored, interactions between pre-existing chronic conditions have not and may introduce unforeseen effects to the findings described here. And sixth, this analysis was conducted on patient data collected prior to the emergence of the delta variant as the predominant variant circulating in the United States; as a result, we cannot conclude that the protective effect of vaccination against long-COVID described here applies to patients infected with the delta variant. Furthermore, variant specific analysis was not possible as variant data was not available in this dataset.
A critical goal of this study was hypothesis generation, and we intend to perform further studies to substantiate the findings and hope others will as well. Despite the above limitations, the findings reported here support the hypothesis that COVID-19 vaccination reduces the likelihood of developing long-COVID symptoms and does so even if the first dose is received up to 12 weeks post diagnosis. The Long-COVID Study, developed in collaboration with the COVID-19 Patient Recovery Alliance (PRA) and of which this analysis is a part, is intended to offer useful context for COVID-19 vaccine policy development as well as preliminary data for further exploration. To this end, this team and others associated with the PRA will be expanding on this effort to better clarify these results and determine the extent to which these conclusions apply to other circumstances – for example, those around multiple infections and the emergence of the delta variant – efforts that we hope will be joined by others seeking to better understand the causes underlying long-COVID.
Acknowledgements: Mary Kuchenbrod, Jacob Hochberg, Bill Salvucci, Arcadia.io; Dr. Brett Giroir, Data and Evidence Workgroup, Participants in the COVID-19 Patient Recovery Alliance, https://covid19patientrecovery.org/; Anonymous reviewers at MITRE Corporation.
- Food and Drug Administration. COVID-19 vaccines. Silver Spring, MD: US Department of Health and Human Services; Food and Drug Administration; 2021. Accessed August 29, 2021. https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/covid-19-vaccines
- Antonelli M, et al. Risk factors and disease profile of post-vaccination SARS-CoV-2 infection in UK users of the COVID Symptom Study app: a prospective, community-based, nested, case-control study. The Lancet Infectious Diseases. Epub 2021 Sept 1. DOI: https://doi.org/10.1016/S1473-3099(21)00460-6
- Katella K. Why Vaccines May Be Helping Some With Long COVID. Yale Medicine; 2021.
- Centers for Disease Control and Prevention. COVID Data Tracker. Atlanta, GA: US Department of Health and Human Services; Centers for Disease Control and Prevention; 2021. Accessed August 29, 2021. https://covid.cdc.gov/covid-data-tracker/#variant-proportions
- COVID-19 Interoperability Alliance. COVID-19 Value Sets. Accessed from https://covid19ia.org on 4/6/2021 and used under a Creative Commons Attribution-NonCommercial 4.0 International License.
- Greenhalgh T, Knight M, Court C, Buxton M, Husain L. Management of post-acute covid-19 in primary care. BMJ 2020; 370: m3026 https://doi.org/10.1136/bmj.m3026
- Centers for Disease Control and Prevention. COVID-19 vaccination and SARS-CoV-2 infection: Interim Clinical Considerations for Use of COVID-19 Vaccines Currently Authorized in the United States. Atlanta, GA: US Department of Health and Human Services; Centers for Disease Control and Prevention; 2021. Accessed August 29, 2021. https://www.cdc.gov/vaccines/covid-19/clinical-considerations/covid-19-vaccines-us.html#CoV-19-vaccination
- Arnold DT, Milne A, Samms E, Stadon L, Maskell NA, Hamilton FW. Are vaccines safe in patients with Long COVID? A prospective observational study. Cold Spring Harbor Laboratory Press; MedRxiv; Epub 2021. DOI: https://doi.org/10.1101/2021.03.11.21253225
* The three vaccines approved for emergency use authorization during the study period were the two mRNA vaccines (Pfizer-BioNTech and Moderna) and the inactivated viral vaccine (Johnson & Johnson). No distinction is made in this study between the vaccines and all results only focus on timing of the first dose.
† COVID-19 associated symptoms categorized by body system (value sets supplied by Clinical Architecture as of April 18, 2021): Cardiovascular – Chest Pain, palpitations; Constitutional – Altered mental state, anorexia, chills, fatigue, fever, malaise; Ears, Nose, Mouth, Throat – Loss of sense of smell, loss of sense of taste, nasal congestion, sore throat; Gastrointestinal – Abdominal pain, diarrhea, digestive changes, nausea and/or vomiting; Musculoskeletal – Arthralgia, muscle weakness, general weakness, myalgia; Neurological – Altered mental state, headache; Respiratory – cough, dyspnea.
§ COVID-19 associated conditions categorized by body system (value sets supplied by Clinical Architecture as of April 18, 2021): Cardiac arrest, cardiac arrhythmias, cardiomyopathy, heart failure, history of deep vein thrombosis, hypertension, myocardial infarction, myocarditis, pulmonary hypertension, valvular heart diseases, diabetes mellitus, metabolic syndrome, chronic kidney disease, glomerulonephritis, nephrotic syndrome, dependence on renal dialysis, renal failure, coagulopathies, stroke, cognitive disorders, transient ischemic attack, asthma, chronic obstructive pulmonary disease, dependent on supplemental oxygen, nonspecific respiratory viral infections, obesity.
¶ Age Group/Payer Type combinations used in this analysis were: Medicare – Senior (65+yo), disabled (<60 yo); Commercial – Child (0-24yo), early adult (25-44yo), later adult (45-64yo); Medicaid – Child (0-24yo), early adult (25-44yo), later adult (45-64yo), senior (65+yo).
TABLE 1. Cohort Demographics – Patient counts by timing of first vaccine dose, categorized by demographic factors. Unknown or Unreported counts are not shown and some small groups or cells were redacted to preserve privacy so values shown may not sum to total patient counts.
TABLE 2. Logistic Regression / Odds Ratios – Results of logistic regression models on patients diagnosed with COVID-19 who were vaccinated before COVID-19 infection, within 12 weeks after COVID-19 infection, or remained unvaccinated 12 weeks after COVID-19 infection. “Any Symptom” model estimates the likelihood that the patient will experience any long-COVID symptom, and the “>1 Symptoms” model estimates the likelihood that the patient will experience more than one distinct long-COVID symptom. Reference group for odds ratios is patients who remained unvaccinated after 12 weeks following their first COVID-19 diagnosis. Other factors (not shown) included: Demographic characteristics based on self-reported information, payer information based on reports by health plan payer as of date of infection, indication of whether patient was hospitalized for COVID-19 (determined by diagnosis code) within the first four weeks of diagnosis, and documentation of a COVID-associated chronic conditions prior to date of COVID-19 diagnosis. Vaccination was the most consistent and substantial contributor to the model.
TABLE 3. General Linear Model Results – Results of linear regression model on patients diagnosed with COVID-19 who were vaccinated before COVID-19 infection, within 20 weeks after COVID-19 infection, or remained unvaccinated 20 weeks after COVID-19 infection. Model estimates the number of distinct long-COVID symptoms the infected patient is likely to demonstrate. The Vaccination input is a continuous variable ranging from 0-1, demographic characteristics were based on self-reported information, payer was as of date of infection, hospitalization indicates whether patient was hospitalized for COVID-19 (as indicated by diagnosis code) within the first four weeks of diagnosis, and chronic conditions represent documentation of a COVID-associated chronic condition prior to date of COVID-19 diagnosis. Vaccination was by far the most significant contributor to the model.