At the COVID Tracking Project, we were keenly aware of how little information the public was receiving. And we, like many other people, worried that HHS officials would attempt to influence the data. While hospitalization data were trickling out, other information remained locked up inside the government.
“As soon as COVID became a political issue, the administration willingly withheld data that showed how severe COVID was spreading in our communities,” says Ryan Panchadsaram, the former deputy chief technology officer of the United States under Obama and a co-founder of COVID Exit Strategy, which tracks the government’s response. “While internal reports were highlighting the ‘red zones’ and ‘areas of concern,’ the president and vice president continued to share that the reaction to COVID was ‘overblown.’”
So at the end of the summer, we decided to look for signs of cooking the books in the federal hospitalization data. First, we simply looked to see if there were obviously political patterns in the data—say, red states with lower hospitalization numbers than anticipated, or overall depressed numbers. We didn’t see anything like that. Then we ran statistical tests looking at the variance in data from different states.
What we found surprised us: The data flowing through HHS were much less spiky than the data that had flowed primarily through NHSN. In fact, at least on initial inspection, the HHS data looked a lot like our patchwork of data from states, which for the most part was free of weird jumps and unexplained swings that obviously did not reflect reality. When cases rose, hospitalizations followed shortly thereafter. As the HHS data came to resemble the state data, we began to suspect that the HHS data had, as we put it in an internal report on August 20, “enormous potential to be the Federal numbers we’ve always wanted.”
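The article does not say which statistical tests were run. As an illustration only, one simple way to put a number on "spikiness" is to measure how much a daily series jumps from one day to the next. The sketch below assumes a hypothetical `spikiness` function that takes the standard deviation of day-over-day relative changes. It is not the COVID Tracking Project's actual method.

```python
import statistics


def spikiness(series):
    """Return the standard deviation of day-over-day relative changes.

    Hypothetical illustration: a smooth series changes by similar amounts
    each day (low value), while a spiky one lurches up and down (high value).
    Days whose previous value is zero are skipped to avoid dividing by zero.
    Requires at least three days so that two changes remain for stdev().
    """
    changes = [
        (today - yesterday) / yesterday
        for yesterday, today in zip(series, series[1:])
        if yesterday != 0
    ]
    return statistics.stdev(changes)
```

Run on two made-up series, a steadily rising one such as `[100, 102, 104, 106, 108]` scores far lower than an erratic one such as `[100, 150, 90, 160, 80]`, which is the kind of contrast described above between the HHS and NHSN feeds.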
Stitching together state reporting into a national data set is an incredibly research-intensive way to produce those statistics. We have to figure out precisely what information 56 states and territories are reporting, and even then, we cannot guarantee perfectly comparable data. HHS, for its part, simply asked states to report all confirmed and suspected COVID-19 hospitalizations in the same way, creating a consistent and standardized data set. Once hospitals learned the system, the data solidified. Jason Salemi, an epidemiologist at the University of South Florida, described the changes as “amazing improvements.”
“For a long while, there was very little help from federal data—it was a massive disappointment and failure to serve the public at a time when such information was direly needed,” Salemi told me. Since then, HHS “has stepped up to the challenge in a major way.”
Some critiques of the HHS-generated information have called its accuracy into question. HHS Protect contains many data sets drawn from many different sources, so we cannot vouch for all of them. However, the COVID Tracking Project can check the HHS hospitalization data against the state reports. In late November, we found that the two sources had come to match almost perfectly. Not all states report in precisely the same way, and the COVID Tracking Project runs one day behind HHS, but after we accounted for those factors, HHS and state data fell within 2 percent of each other. If the HHS data were off, then the data produced by every state were also off.
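The comparison described above amounts to two steps: line up the two sources after shifting for the one-day lag, then compute the percent difference for each day. The sketch below illustrates that arithmetic under stated assumptions. The function name `compare_with_lag`, the dictionary-of-dates input format, and the choice to compare the HHS figure for one day against the state-compiled figure for the following day are all hypothetical, not the project's actual code.

```python
from datetime import date, timedelta


def compare_with_lag(hhs_by_date, state_by_date, lag_days=1, tolerance_pct=2.0):
    """Compare two daily hospitalization series after shifting for a lag.

    hhs_by_date, state_by_date: dicts mapping datetime.date -> count.
    Assumption: because the state-compiled data run lag_days behind HHS,
    the HHS value for day d is compared with the state value for d + lag_days.
    Returns {day: (percent_difference, within_tolerance)} for every day
    where both values exist and the state value is nonzero.
    """
    results = {}
    for day, hhs_value in hhs_by_date.items():
        state_value = state_by_date.get(day + timedelta(days=lag_days))
        if not state_value:  # skip missing days and zero denominators
            continue
        pct_diff = abs(hhs_value - state_value) / state_value * 100
        results[day] = (pct_diff, pct_diff <= tolerance_pct)
    return results


# Illustrative, made-up counts (not real HHS or state figures).
hhs = {date(2020, 11, 20): 1000, date(2020, 11, 21): 1100}
states = {date(2020, 11, 21): 1010, date(2020, 11, 22): 1200}
comparison = compare_with_lag(hhs, states)
```

With these invented numbers, the first day differs by under 1 percent and passes the 2 percent threshold, while the second differs by about 8 percent and would be flagged for a closer look.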