Government COVID-19 algorithms: multiple formats, approaches and challenges

Many government agencies, state and federal, have deployed a variety of algorithms to combat the COVID-19 pandemic. Because these automated systems take a myriad of forms, investigating them requires a variety of approaches. In this way, COVID-related algorithms show how challenging reporting on automated decision-making systems can be: like government algorithms in general, they are widespread and diverse, making them a challenging but important topic for journalists to cover.

So, what kinds of algorithms are governments creating in response to the COVID pandemic? The COVID-related government algorithms that we found aim to track the virus, reduce its spread and soften the impacts of the pandemic in various ways. Many of these algorithms relate to state policy and health care. The Minnesota Department of Health, for example, published guidelines for Minnesota healthcare organizations to report COVID-19 cases to the state. Similarly, the Oregon Health Authority published recommendations for healthcare workers on preventing and treating COVID-19. The State of Michigan created a web application to screen for COVID-19 based on users’ self-reported symptoms, and Carnegie Mellon University developed risk indices for Pennsylvania counties to inform policymakers as they reopen the state economy.

Other algorithms address the virus in more unexpected ways. For instance, the Environmental Protection Agency published new animal carcass management guidelines in response to COVID-19; the Arizona Department of Child Safety published guidelines for virtual visits for foster caregivers; and the Centers for Disease Control and Prevention created an interactive map that informs users of the risks of traveling to different countries during the pandemic.

These algorithms may be useful, but they also present a number of risks. For example, the MI Symptoms App could raise privacy concerns, as many other COVID-related government software tools have around the world. Although the MI Symptoms App is an online screening tool rather than a contact-tracing application, user information is not covered by HIPAA, can be shared with health departments and feeds into larger county and state datasets. Furthermore, though third-party organizations do not receive information from the MI Symptoms App, users can sign in with Google or Facebook. This could raise privacy concerns given the sensitive information collected by the software and how that information is handled by third-party services. This algorithm has the potential to affect a large number of people, as the State of Michigan created and promoted this free application for employers to use in the daily screening protocol required by the state.

The Carnegie Mellon University Risk-Based Decision Support Tools similarly have the potential to affect many people. These tools produce risk evaluations that will influence Pennsylvania policymakers as they plan to reopen the economy, which will in turn affect the economic situation and safety of their constituents. The algorithm is also newsworthy given the general controversy surrounding reopening the economy, especially in Pennsylvania, which had some of the highest unemployment-compensation claims in the country as of late April and the tenth-highest number of confirmed cases in the U.S. as of early July 2020. Although the risk indices are data-driven evaluations, the algorithm speaks to a political and divisive decision, so it is likely to be a topic of debate regardless of its output or whether policymakers act in accordance with its evaluations.

We’ll be on the lookout for more COVID-related algorithms moving forward. But even with the ones we’ve already found, there’s more work to do. To investigate the MI Symptoms App, researchers and journalists could file a public records request with the State of Michigan to learn about the software. They could also request user agreements from the MI Symptoms App and connected third-party services to learn about user privacy. Reporters who are interested in looking at the Carnegie Mellon risk indices can search for updated project proposals from the university or dig into the details of its implementation. Investigations into other COVID-related government algorithms can begin with contacting relevant government agencies and health organizations. For instance, the Minnesota Department of Health or a Minnesota hospital could speak to the efficacy of state reporting guidelines. We hope that with additional research and reporting, the impact of these systems and algorithms on the public can be further clarified.

Facing the threat of budget cuts, Energy Star needs updates to continue saving money

Energy Star is a government program that helps consumers avoid $30 billion in energy costs every year. But the program is under constant threat from administrative pressures and faces a persistent need for updates. Energy Star is an example of how the success or failure of an algorithm depends not only on its internal design or the data that goes into it, but also on how it is managed throughout its lifecycle.

The idea behind Energy Star is that appliances, electronics, buildings, and industrial plants can be rated according to an energy use scoring system. That system is an algorithm that combines survey data with an analysis of the energy consumption of the type of appliance or building that is being evaluated.

For buildings, the ratings are calculated by combining a trove of data for each building (size, location, number of occupants, etc.). The Energy Star algorithm then groups the target building with similar buildings nationwide, using information from a national survey called the Commercial Building Energy Consumption Survey (CBECS). Then, the energy consumption of the target building is compared to buildings of the same classification. If the building being evaluated is among the top 25% of buildings of that classification, performing better than at least 75% of its peers, it can receive Energy Star certification.

For appliances and electronics, the approach is similar. For each of the more than 75 categories of appliances or electronics, from dishwashers to air purifiers, brands and models are rated against the typical energy consumption for that category. If a product is efficient enough to rank at or above the 75th percentile, it gets an Energy Star label.
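To make this threshold logic concrete, here is a minimal sketch of a percentile-style cutoff like the one described above. The 75% threshold comes from the program description; the function name, peer data and matching logic are simplified assumptions, not the EPA's actual models, which score buildings against CBECS-based regressions.

```python
# Hypothetical sketch of a percentile-style Energy Star threshold.
# Peer-group matching and data values are simplified assumptions.

def energy_star_eligible(target_use, peer_energy_use, threshold=75):
    """Return True if the target uses less energy than at least
    `threshold` percent of its peer group (lower use = more efficient)."""
    better_than = sum(1 for peer in peer_energy_use if target_use < peer)
    percentile = 100 * better_than / len(peer_energy_use)
    return percentile >= threshold

# Example: a building compared against similar buildings nationwide
peers = [210, 180, 250, 195, 230, 175, 260, 240]  # kBtu per sq ft, illustrative
print(energy_star_eligible(170, peers))  # True: more efficient than all peers
```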

The issue is that as technology evolves, appliances and buildings get more and more efficient. What was once considered economical can become wasteful as the years go by. That is why the methodology and underlying data of a program like Energy Star have to be constantly updated.

But this is not the only issue with Energy Star. There is also the larger criticism that the algorithm may not be accurate at all. According to John Scofield, a professor of physics at Oberlin College, the Energy Star models are based on unreliable data, which leads to high degrees of uncertainty in the ratings of buildings, including severe grade inflation.

These issues raise the question of who is in charge of ensuring the accuracy of the rating system. Today, Energy Star is maintained by a team at the Environmental Protection Agency (EPA), an agency that is constantly facing the possibility of budget cuts. Last year, the Trump administration presented a budget proposal that cut $42 million from Energy Star. While the proposal was rejected by Congress, a memo drafted by the EPA's financial officer in March 2017 revealed that the federal government had threatened to eliminate the program altogether.

Civil rights groups concerned about bias in recidivism risk algorithm

An algorithm launched in June by the U.S. Department of Justice to predict the likelihood of recidivism for federal inmates is being criticized by civil rights activists for possible gender and racial bias.

The Prisoner Assessment Tool Targeting Estimated Risk and Needs (PATTERN) algorithm is part of the First Step Act, a criminal justice reform bill that was passed by Congress with bipartisan support and signed into law by President Donald Trump in December 2018. The law intends to reduce recidivism and streamline the process through which inmates can be rewarded for good behavior, with the ultimate goal of reducing the federal prison population.

According to the DOJ, 3,100 federal inmates have already been released as a result of the First Step Act. According to the Bureau of Prisons, as of September 12, there were 177,080 inmates in the custody of the BOP, either in federal or private facilities.

PATTERN works by classifying each BOP prisoner into one of four Risk Level Categories (“RLCs”), scoring them by assigning points across 17 different items. In its report explaining how the algorithm works, the DOJ touts that the new algorithm is 15% more accurate than similar tools, according to a metric called Area Under the Curve (AUC). The AUC represents the likelihood that the algorithm would give any given recidivist a higher risk score than a non-recidivist.
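To illustrate what that AUC interpretation means in practice, here is a minimal sketch that computes it directly as the probability that a randomly chosen recidivist is scored above a randomly chosen non-recidivist. The point totals are made up for illustration; they are not PATTERN's actual scores or weights.

```python
# Illustrative AUC calculation: the probability that a random recidivist
# receives a higher risk score than a random non-recidivist (ties count half).
# All point totals below are hypothetical, not actual PATTERN scores.
from itertools import product

def auc(recidivist_scores, non_recidivist_scores):
    pairs = list(product(recidivist_scores, non_recidivist_scores))
    wins = sum(1 for r, n in pairs if r > n)
    ties = sum(1 for r, n in pairs if r == n)
    return (wins + 0.5 * ties) / len(pairs)

recidivists = [42, 35, 51, 28, 47]        # hypothetical point totals
non_recidivists = [18, 25, 30, 22, 33]    # hypothetical point totals
print(auc(recidivists, non_recidivists))  # 0.92 on these made-up scores
```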

But as with other algorithms used by the justice system, PATTERN is raising controversy over the types of inputs it uses. Earlier this month, civil rights groups published an open letter criticizing the fact that the algorithm uses historical data to calculate the assessments, which they argue makes it fertile ground for bias.

While the DOJ report also explains that tests were conducted to assess “racial and ethnic neutrality,” and that there is “minimal racial/ethnic disparity for PATTERN’s prediction strength,” the civil rights groups also urged the DOJ to address concerns about “racial, ethnic and gender biases.” According to the authors of the letter, failures in the algorithm “could be holding back thousands more from the freedom they deserve.”

An additional concern is that whatever solutions are created to reduce the number of federal inmates will only affect a small proportion of prisoners in the country. The 2018 report on mass incarceration by the Prison Policy Initiative shows that only 10% of inmates in the United States are in federal facilities, compared to 60% in state prisons and 30% in local jails.

The Bureau of Justice Statistics reported this year that there were almost 2.2 million inmates in the United States in 2016, which means that for every 100,000 people residing in the United States, approximately 670 of them were behind bars. According to the Vera Institute of Justice, incarceration costs an average of more than $31,000 per inmate, per year, nationwide. In some states, it’s as much as $60,000. 

Nursing Home Compare: making sense of a crucial tool for the future

The Five-Star Quality Rating System is a convenient and widely used tool created by the Centers for Medicare and Medicaid Services (CMS) to help people find good nursing homes, but some of the specific metrics it uses have been a subject of controversy.

CMS initially launched Nursing Home Compare in 1998 as a federal website for information about nursing homes. The Five-Star Quality Rating System was added in 2008 to help the public better identify and compare nursing homes based on an automated score generated from specific metrics for each facility. The Nursing Home Compare website works by requesting the user's zip code and listing all available nursing homes within a 25-mile radius. The user can then select up to three to compare the ratings, which are based on reports certified by CMS. Providers are given a score from one to five stars on criteria such as health inspections, staffing and quality measures, where one star represents below-average quality and five stars represent above-average quality.
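As a rough sketch of the lookup-and-compare flow described above, the snippet below filters hypothetical facilities to a 25-mile radius and prints component star ratings for up to three of them. The facility records, coordinates and field names are invented for illustration and do not reflect CMS's actual data schema.

```python
# Minimal sketch of the comparison flow; all facility data is hypothetical.
from math import radians, sin, cos, asin, sqrt

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))

facilities = [  # hypothetical facilities with 1-5 star component ratings
    {"name": "Maple Grove Care", "lat": 41.88, "lon": -87.63,
     "health_inspections": 4, "staffing": 3, "quality_measures": 5},
    {"name": "Lakeside Manor", "lat": 42.05, "lon": -87.68,
     "health_inspections": 2, "staffing": 4, "quality_measures": 3},
]

user_lat, user_lon = 41.90, -87.65  # location looked up from the user's zip code
nearby = [f for f in facilities
          if miles_between(user_lat, user_lon, f["lat"], f["lon"]) <= 25]
for f in nearby[:3]:  # users can compare up to three facilities side by side
    print(f["name"], f["health_inspections"], f["staffing"], f["quality_measures"])
```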


While the star rating method is an easy way for consumers to visualize differences in quality between nursing homes, there may be issues that threaten the validity of the assessment.

In an academic study published in 2018, researchers found that higher ratings on Nursing Home Compare did not directly translate to a lower rate of hospitalization, deeming the Five-Star Quality Rating System to be “less meaningful as an indicator of nursing home quality for post-acute care patients.” This study raises the concern that the rating system might not be incentivizing nursing homes to focus on patient care, but rather on simply meeting the standards that are checked during inspections in order to receive a higher rating. 

Another criticism of Nursing Home Compare is that its measures may not capture the overall quality of a nursing home. In the most recent change to the system, a facility will automatically be assigned a one-star rating for the “registered nurse staffing” category if it reports four or more days per quarter with no registered nurse on site. The CMS said this change was to reflect that “nurse staffing has the greatest impact on the quality of care nursing homes deliver.” This change, while positive, was met with criticism from the American Health Care Association (AHCA). In a statement, the AHCA said that “the staff rating still does not include therapists,” who also play a critical role in ensuring patient-centered care. While the CMS is continuously making strides to improve the system, it may well be overlooking other vital components of quality care.

The most recent CDC study, from 2016, found that there were about 15,600 nursing homes in the U.S.; however, that number is dropping. In an interview with the New York Times, University of Pittsburgh health policy researcher Dr. Nicholas Castle said that 200 to 300 nursing homes close each year due to a declining number of residents, who might choose alternatives such as assisted living or other ways to stay at home. Meanwhile, the Population Reference Bureau projects that by 2060, nearly 100 million Americans will be 65 or older, more than double the number from 2016. Putting these pieces of information together, there will be an inevitable rise in demand for better nursing home facilities, and for an accurate rating system to help people find those homes.

Could CMS’ Fraud Prevention System be unfair?

The Centers for Medicare and Medicaid Services (CMS) uses the Fraud Prevention System (FPS) to detect improper Medicare payments by processing millions of fee-for-service claims every day. But the focus on monetary return might cause the system to target some fraudulent healthcare providers at the expense of others.

FPS analyzes data related to Medicare fee-for-service claims to detect improper claims and develop leads for fraud investigations. The large number of irregular claims shows the need for such a system: CMS estimates that 9.51% of its payments in the 2017 fiscal year went to claims that violated Medicare policy, which translates to $36.21 billion in improper payments. Past years also had high improper payment rates: 11% in fiscal year 2016 and 12.1% in fiscal year 2015.

These losses are the reason why the Small Business Jobs Act of 2010 required the CMS to create the FPS, which uses predictive analytics to process all fee-for-service claims prior to payment and prevent the payment of improper claims.

The issue is that FPS has a huge impact, both on the operation of Medicare and on the lives of millions of Americans. According to Northrop Grumman, the defense contractor chosen to implement the FPS, the system included 42 million Medicare beneficiary profiles and 2 million healthcare provider profiles as of 2013, with millions of claims processed daily. Even though the system was expensive to implement, costing around $51.7 million in its fourth implementation year (2015), the investment is yielding results. In 2015, it helped identify or prevent $654.8 million in improper payments, which means a return of approximately $11.50 for every dollar invested, although the rate of return was not as high in previous years.
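As a rough illustration of how such a return-on-investment figure is derived, the back-of-the-envelope calculation below divides the 2015 savings by the program cost cited above. The raw ratio comes out somewhat higher than the roughly $11.50-per-dollar figure, so the official number presumably rests on adjusted or certified savings that are not captured here; that is an assumption on our part, not something taken from CMS's reports.

```python
# Back-of-the-envelope ROI from the 2015 figures cited above. The reported
# ratio (~$11.50 per dollar) presumably uses adjusted or certified savings,
# so this raw ratio is only an approximation.
savings_identified_or_prevented = 654.8e6  # dollars, 2015
program_cost = 51.7e6                      # dollars, fourth implementation year

roi = savings_identified_or_prevented / program_cost
print(f"Raw return per dollar invested: ${roi:.2f}")  # about $12.67
```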

Although it is important to consider the return on investment when evaluating the FPS, using it as the only measure of success can lead to undesirable biases. A report to Congress mentioned the potential side effects of concentrating on return on investment: the program might focus on recovering money from “amateur” fraudsters rather than “professional” fraudsters who, for example, might offshore their illegitimate gains. An “amateur” fraudster might be a healthcare provider that treats real patients but also makes improper claims. It would be easier to get money back from such a business than from someone who has moved all the money offshore. In the short run, this could mean using fewer resources and getting higher returns, which would look good in terms of return on investment. However, professional fraudsters are also a problem, and it is important to go after them as well. Is it fair to target “small” fraud instead of “big” fraud just because the money is easier to recover?

Furthermore, healthcare providers can be punished for errors even when most of their fee-for-service claims are legitimate. For example, CMS revoked the billing privileges of the company Arriva (which describes itself as “the nation’s largest supplier of home-delivered diabetic testing supplies”) based on 211 improper claims, which represent only 0.003% of its claims over the past five years. Even though this reduces healthcare fraud, the decision also affects Arriva’s ability to provide real medical services to its customers.

To investigate this issue, researchers and journalists could view CMS’s Return on Investment reports and the Government Accountability Office’s reports on the Fraud Prevention System.

An investigation into this algorithm can start with the press office of the Centers for Medicare and Medicaid Services, as well as people involved in health care fraud cases, since the FPS generates many leads for investigations. The FBI also has a website dedicated to health care fraud news.

Welcome

Algorithmic decision-making systems (ADMs) now influence many facets of our lives. Whether it be in finance, employment, welfare management, romance, or dynamic pricing, these systems are practically ubiquitous throughout the public and private sectors.

Operating at scale, ADMs can impact large swaths of people, for better or worse. Algorithmic accountability reporting has emerged as a response: an attempt to uncover the power wielded by ADMs and detail their biases, mistakes, or misuse. Algorithmic accountability entails understanding how and when people exercise power within and through an algorithmic system, and on whose behalf.

Algorithm Tips hopes to stimulate algorithmic accountability reporting and support a robust reporting beat on algorithms, in particular by making it easier to investigate algorithms used in government decision making.

To do this, we curate a database of ADMs in the U.S. federal government that are of potential interest for investigation (and hope to expand to local and international jurisdictions in the future). On our home page you can search the database for interesting algorithms using keywords relating to facets such as agency (e.g., Dept. of Justice) or topic (e.g., health, police, etc.). Next, on our resources page, you can learn how to submit public records requests about algorithms, or find news articles and research papers about the uses and risks of algorithms. We hope you can find some inspiration there. And finally, we dig into some of these leads ourselves and post write-ups to our blog. We hope that journalists and other stakeholders can build on these posts and develop even deeper investigations.

Algorithm Tips is a project of the Northwestern University Computational Journalism Lab. If you have any questions, comments, or concerns (or want to talk about how to help us expand the effort), get in touch: nad@northwestern.edu. Thanks!