New algorithms to score candidates for lifesaving organ donations

COVID-19 has been in the spotlight not only in the news, but also in how technologies are being developed to evaluate and control the spread of the virus. For example, various state guidelines outline how ventilators, personal protective equipment, and COVID-19 vaccines should be prioritized in hospitals, ambulances, and communities, respectively. But while the pandemic illuminated how government algorithms shape the way healthcare resources are allocated throughout the country, such algorithms will continue to inform urgent healthcare decisions beyond the pandemic. One of the services affected is the allocation of organs for transplant.

A new score-based framework is updating the Organ Procurement and Transplantation Network (OPTN), the national system for distributing lung, liver, kidney and other lifesaving organ donations. Individual organ systems have been transitioning to this new framework since the OPTN Board of Directors approved it in 2018: lung allocation was the first to be updated in January 2019, and liver, kidney, and pancreas allocations were updated last month. Rather than reviewing transplant candidates in ranked classifications and within fixed areas, these new algorithms continually calculate composite scores for candidates that weigh factors related to medical urgency, placement efficiency, outcomes, and patient access. A higher score puts a patient higher on the waitlist and, in turn, makes that patient more likely to receive an organ transplant. This framework is supposed to be more equitable and adaptable to future changes, but as seen in the recent pushback against new kidney policies in particular, critics have argued that this change will increase wait times and give differential treatment to patients in densely populated regions.
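To make the idea of a composite score concrete, here is a minimal sketch of how a weighted score could order a waitlist. The weights, the 0-to-1 factor ratings, and the names Candidate, composite_score and rank are all hypothetical and chosen for illustration; the actual OPTN formulas differ by organ and are not described here.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; not the OPTN's actual values.
WEIGHTS = {
    "medical_urgency": 0.35,
    "outcomes": 0.25,
    "placement_efficiency": 0.20,
    "patient_access": 0.20,
}


@dataclass
class Candidate:
    name: str
    # Each factor is assumed to be pre-scaled to the range 0.0 to 1.0.
    factors: dict


def composite_score(candidate):
    """Weighted sum of the candidate's factor ratings."""
    return sum(WEIGHTS[key] * candidate.factors[key] for key in WEIGHTS)


def rank(candidates):
    """Order candidates from highest to lowest composite score."""
    return sorted(candidates, key=composite_score, reverse=True)


candidates = [
    Candidate("B", {"medical_urgency": 0.3, "outcomes": 0.8,
                    "placement_efficiency": 0.9, "patient_access": 0.5}),
    Candidate("A", {"medical_urgency": 0.9, "outcomes": 0.5,
                    "placement_efficiency": 0.3, "patient_access": 0.6}),
]

for candidate in rank(candidates):
    print(candidate.name, round(composite_score(candidate), 3))
```

Because every candidate gets a single number, changing policy in a scheme like this means adjusting weights rather than redrawing classification boundaries, which is one reason a score-based framework is described as more adaptable.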

Organ transplantation is the leading form of treatment for patients with severe organ failure. There were over 32,000 organ transplants in 2019, and an average of 95 transplants now take place in the U.S. every day. Unlike other lifesaving transplants (like those involving blood or bone marrow), most organ donations come from deceased donors. Unfortunately, there are not enough donations to meet organ transplant needs across the country: In 2020, about 110,000 people remained on national waiting lists, and currently, there are over 120,000 people in need of a lifesaving transplant. Someone is added to the national transplant waiting list every nine minutes, and over 20 people die waiting for an organ donation each day.

Organ allocation systems not only determine who receives scarce organ donations but also what that medical care looks like. In addition to affecting wait times, allocation systems take into account compatibility between donors and patients, which affects the likelihood of transplant success. Transplant candidates are screened out if medical factors like blood type or weight make them incompatible with an organ donor. The new allocation algorithms will also be flexible enough to account for factors unique to each organ type. For example, immune system compatibility is important when matching kidney donors to recipients.

Changes made to OPTN’s decision-making framework will affect all organ donations in the country: every transplant hospital, organ procurement organization and histocompatibility lab in the U.S. is connected through a nonprofit organization that supports OPTN in partnership with the federal government. Not only does the national organ allocation system have life-or-death implications for many patients across the country, but it also has an important role in shaping systemic issues of access and equity in American healthcare. For example, recent research has shown how prior models have led to a disparity in the care that African-American chronic kidney disease patients receive, including transplantation access. Women also had less access to kidney transplantation compared to white men under the prior model. As with other attempts to maximize efficiency of limited resources using data-driven analytics, academics warn against the ethical issues that may arise with algorithm-based organ allocation decisions. For example, programs listing liver transplant candidates were able to game a previous algorithm used to prioritize liver donations, and a previous proposal for a kidney allocation algorithm based primarily on longevity would have violated the Age Discrimination Act. Considering ethics upfront (designing allocation models around metrics of not only efficiency but also ethics) is particularly important given the high-stakes implications of organ allocation.

Journalists can follow along with organ-specific updates from OPTN, the Department of Health and Human Services, and other organizations to cover these algorithms as they continue to be adopted through 2023. Journalists can also research the legal and regulatory history of organ distribution in the U.S., community input considered in the development of the continuous distribution model, and tools related to organ allocation. For example, OPTN made an interactive dashboard to simulate comparisons and match runs. Furthermore, while organ allocation is organized and overseen at a national level, journalists could consider how this new framework impacts states, local communities, hospitals, and individuals, such as by investigating doctors’ and patients’ criticisms of the new systems. Journalists could also consider how these changes to OPTN occur in the context of recent policies concerning organ procurement organizations included in the national network. 

School districts use machine learning to identify high school drop-out risk

On-time graduation is an important metric for public high schools across the United States. Large “graduation gaps” point to the inequities and shortcomings of the American education system, and federal law requires states and districts to report high school graduation rates and intervene in schools with low rates. Improving drop-out rates is difficult, however, as school counselors are tasked with large caseloads and identifying at-risk students requires context and time. Many state and local agencies have adopted data-driven modeling tools to address these challenges, including the Kentucky Department of Education’s Early Warning System.

The Kentucky Department of Education (KDE) developed the Early Warning System, an automated, machine-learning-based tool, in collaboration with Infinite Campus, a software company that hosts the state’s student data entry system. Based on this continually updating source of student information, the Early Warning System uses machine learning to measure how risk factors (such as attendance, academics, and home stability) predict graduation. The system automatically scores each student’s likelihood of graduating on a scale from 50 to 150, which indicates high, medium or low drop-out risk (the lower the number, the higher the risk). The Early Warning System’s interactive interface allows educators to view, filter and search these risk assessments in real time to ensure each student receives the necessary support. A visual dashboard allows users to view overall score distributions at various levels to help district and school personnel better understand which policies yield the greatest impacts on graduation.
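As a rough illustration of how a 50-to-150 score could be turned into the three risk levels, consider the sketch below. The cut points (80 and 110) and the function name risk_band are assumptions made for this example; the text above only establishes the score range and that lower scores mean higher risk, not where the boundaries fall.

```python
# Hypothetical cut points for illustration; the real thresholds are not given.
HIGH_RISK_BELOW = 80
MEDIUM_RISK_BELOW = 110


def risk_band(score):
    """Map a 50-150 graduation score to a drop-out risk level.

    Lower scores mean higher risk, as described in the text.
    """
    if not 50 <= score <= 150:
        raise ValueError("score must be between 50 and 150")
    if score < HIGH_RISK_BELOW:
        return "high"
    if score < MEDIUM_RISK_BELOW:
        return "medium"
    return "low"


for score in (62, 95, 140):
    print(score, risk_band(score))
```

Where such boundaries sit matters for reporting: moving a cut point by a few points changes how many students are flagged for intervention.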

Though the Early Warning System began in KDE, Infinite Campus serves over 2,000 school districts across 45 states, and it made its Early Warning System available in additional states beginning in 2019. Michigan, Montana and Sheridan County School District #2 in Wyoming are among the other state and local agencies using this system. Other government organizations have adopted similar approaches: as early as 2013, 26 jurisdictions used early warning reports to identify students at risk of dropping out. 

Like other information technology software, these tools present privacy concerns. The student data used in these assessments contain personally identifiable information that is covered by privacy laws, such as the federal Family Educational Rights and Privacy Act. Though Infinite Campus’ Early Warning System doesn’t store individual student information, confidential student data have been surreptitiously used in the past. A 2020 Tampa Bay Times investigation, for example, uncovered that a sheriff’s office used school district data to label children as potential future criminals. Furthermore, the Early Warning System learns risk factors based on population-level trends, which could actually result in biases against some demographic groups. A student’s stability rating (a subsection of the overall risk assessment) takes into account information about race/ethnicity and gender, for example. 

Additionally, even if these efforts effectively mitigate dropouts, they do not fully address general criticisms about legislation based on graduation rates. The national graduation rate has increased since 2002, reaching approximately 88% in the 2017-2018 academic year. However, critics question whether this may reflect regulatory policy incentivizing graduation over quality of education. Reports have also shown how using graduation rate as a metric overlooks schools that struggle to advance students through high school and how low-income students and students of color are disproportionately affected by this.

As education agencies throughout the country have already adopted various drop-out risk assessment systems, journalists can begin by reviewing news coverage and studies of existing algorithms, such as public evaluations of the interventions that individual agencies implement for different risk ratings. Many organizations provide resources for educators to navigate these systems, which could also be useful to journalists. Similarly, state education department websites post state-specific plans related to federal education laws and information on student privacy protections. In addition to monitoring other clients that adopt Infinite Campus’ Early Warning System, journalists can watch for new drop-out risk assessment systems. Furthermore, while Kentucky has one of the highest average high school graduation rates in the country — the state had a 4-year graduation rate of 91.1% in 2020 — journalists might consider researching early warning systems in low-performing high schools.

Algorithmic pretrial risk assessment may just be more common than you think

Californians recently voted to reject Proposition 25, which sought to replace cash bail throughout the state with algorithmic risk assessments. But, like it or not, government agencies are already moving forward with algorithmic pretrial reform efforts. For example, the Los Angeles Superior Court piloted a program in March 2020 that uses a tool to calculate a defendant’s risk of failing to appear in court and recidivating pretrial. Outside California, the New York City Criminal Justice Agency has a similar release assessment that draws on data from over 1.6 million previous cases to calculate a risk score that informs judges’ pretrial decisions. And communities like Pierce County in Washington State are working with the National Partnership for Pretrial Justice to develop, implement and research pretrial risk assessment systems.

Proponents of pretrial risk assessment argue that algorithms can be used to address issues of mass incarceration, inefficiency, and inequity in the criminal justice system. The aforementioned pilot program in Los Angeles was used to rapidly reduce the county’s incarcerated population in response to the COVID-19 pandemic, for example. The New York City Criminal Justice Agency said its release assessment could help alleviate the city’s backlog of pending cases, according to recent Wall Street Journal coverage, and the National Partnership for Pretrial Justice similarly hopes to use risk scores to support fairness in judicial decision making.

More generally, according to a 2020 report by the Brennan Center, over 70% of the American jail population (about 536,000 people) are pretrial detainees, and many of these unconvicted individuals are only detained while awaiting trial because they can’t afford bail. Making pretrial detention decisions based on data-based risk assessments rather than ability to pay bail would stop this system of wealth-based discrimination, according to proponents of California’s Proposition 25, who hoped to implement pretrial assessment systems and eliminate money bail throughout the state.

Others, however, argue that pretrial risk assessments do not help judges make more accurate, unbiased decisions. Opponents of such systems include not only those who oppose eliminating money bail (such as the bail bond industry and some law enforcement agencies) but also many civil rights organizations that advocate for criminal justice reform. In 2018, a coalition of over 100 civil rights, digital justice, and community-based organizations published a statement of concerns about embedding algorithmic decision making in the criminal justice system.

Many academics also echo this skepticism. In 2019, 27 prominent researchers signed an open statement voicing concerns over “serious technical flaws” that undermine the accuracy, validity and effectiveness of actuarial pretrial risk assessments. More specifically, like many civil rights advocates, they argued such systems cannot adequately measure the risks that judges decide on. Instead, computer-based risk evaluations ultimately perpetuate historical racial inequities in the criminal justice system. 

Some government agencies and risk assessment developers have made efforts to bring transparency to these pretrial systems, so researchers and journalists could first search for readily available information before filing Freedom of Information Act requests. Legislation that would implement more of these algorithms is also something to keep an eye on. California’s Proposition 25, for example, presented the possibility that every county in the state would have to adopt pretrial assessment systems, each of which would have been important to examine in detail. Furthermore, computer-based risk assessments are also used in other areas of the criminal justice system, including recidivism reduction algorithms used at the federal level.

Government agencies, big and small, are increasingly adopting controversial algorithms for hiring

As with private companies and nonprofit organizations, government agencies, both big and small, are adopting automation in Human Resources (HR) decision-making processes. For example, federal and local government agencies use algorithms to handle leave requests, issue certifications, and run background investigations. Algorithms, however, don’t eliminate discrimination and other inequities in hiring processes; they can even exacerbate or mask them.

HR Avatar is one example of a system that government agencies use to incorporate automation in HR decisions. The system uses AI-driven voice and personality analysis of tests and interviews to provide a quantitative evaluation of applicants for over 200 different positions, which a range of government agencies use to compare, screen and select applicants. Federal and local agencies use HR Avatar for these pre-employment assessments: the Department of Homeland Security, Transportation Security Administration, Federal Aviation Administration and Department of Commerce are listed as clients on HR Avatar’s website.

Recently, the Pottawattamie Sheriff Office, in Iowa, listed a call for applicants to a detention officer position and explained that this system would be used in the hiring process. Applicants first participate in HR Avatar’s Correctional Officer Pre-Employment Assessment and Virtual Interview, which provides the county sheriff office with an overall score and summary of each candidate, competency scores in various areas important to the position and other evaluations. The office then uses these evaluations in the selection of applicants to invite to continue in the hiring process for this position. In effect, HR Avatar’s standardized, automated evaluation of soft skills helps the county make hiring decisions about a position that requires fluency in social skills, such as the ability to retain composure when dealing with violent or hostile individuals.

Although automation can improve the efficiency and success of hiring processes, and artificial intelligence is increasingly enticing as companies continue to shrink and outsource HR departments, researchers have highlighted the challenges in using data science for HR tasks. Automated processes have the potential to reflect existing biases, proactively shape applicant pools and otherwise perpetuate discrimination in hiring practices. While there are many employment laws that address discrimination in hiring practices, the issue of identifying and mitigating discrimination in employment screening algorithms raises new policy concerns.

To further investigate government use of HR Avatar, journalists can issue FOIA requests to agencies that use the system. While investigating HR Avatar’s assessment system may be difficult since HR Avatar is a private company, journalists could review public documents. To further investigate government use of HR algorithms in general, journalists can research state and federal laws (and proposed legislation) about automated employment practices. Furthermore, other leads in the Algorithm Tips database point to automation in HR decision-making processes, including applicant selection at a city fire department, the Department of Justice, and the U.S. Armed Forces.

With climate-related floods on the rise, FEMA is updating an algorithm that impacts 96% of flood insurance in the U.S.

The National Flood Insurance Program (NFIP) has set insurance rates in the same way since the 1970s. Over the last fifty years, however, the program has faced mounting financial issues and criticism from policymakers, fiscal conservatives, environmentalists and other stakeholders. In response, the Federal Emergency Management Agency (FEMA), the agency that manages the NFIP, recently announced a new system of insurance rating: Risk Rating 2.0 is currently scheduled to be implemented in October 2021.

According to FEMA, floods are the most common and costly natural disasters in the United States. Flood insurance is not included under standard homeowner and renter insurance, and since the market for private flood insurance is relatively small, NFIP currently provides over 96% of flood insurance in the U.S. As of December 2019, the NFIP had over 5 million policies providing over $1.3 trillion in coverage. 

However, the program has struggled to remain fiscally solvent while providing affordable flood insurance. According to the Government Accountability Office, FEMA’s debt stood at $20.5 billion in September 2018 despite Congress cancelling $16 billion in debt the year before. The Government Accountability Office has designated the NFIP as “high risk” since 2006, because emphasizing affordability created cases where premium rates did not reflect the full risk of loss and produced insufficient premiums, which in turn transferred the financial burden from individual property owners to taxpayers as a whole. Additionally, scientists have criticized this insurance rating system for reinforcing risky patterns of development.

The goal of the NFIP’s redesigned insurance rating system is to incorporate modern flood risk assessments, including private-sector data, to deliver rates that are “fairer, easier to understand, and better reflect a property’s unique flood risk,” according to FEMA. These changes also need to address FEMA’s funding issues, so although legislation limits annual premium increases to 18%, some policyholders and policymakers of heavily affected areas expressed concern over potential premium increases.
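The effect of the 18% annual cap can be illustrated with a short calculation. The sketch below assumes the cap compounds on the prior year's premium and that a policy climbs at the maximum allowed rate until it reaches a full-risk premium; the dollar figures and the function name premium_path are hypothetical, not FEMA's.

```python
ANNUAL_CAP = 0.18  # maximum yearly premium increase cited in the text


def premium_path(current, full_risk, cap=ANNUAL_CAP):
    """Yearly premiums, rising by at most `cap` per year up to `full_risk`.

    Assumes the cap applies to the previous year's premium (compounding).
    """
    if current <= 0 or cap <= 0:
        raise ValueError("current premium and cap must be positive")
    path = [current]
    while path[-1] < full_risk:
        path.append(min(path[-1] * (1 + cap), full_risk))
    return path


# Hypothetical policy: $1,000 today, $1,500 once it reflects full risk.
path = premium_path(1000.0, 1500.0)
print([round(p, 2) for p in path])
print("years to reach full-risk premium:", len(path) - 1)
```

Under these assumptions, a policy priced well below its full-risk rate would take several years of maximum increases to catch up, which is the kind of multi-year climb some policyholders are concerned about.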

In addition to the direct impact this new rating methodology will have on the NFIP’s five million policyholders, Risk Rating 2.0 will also indirectly affect taxpayers who contribute to national programs like NFIP, and more accurate risk assessments will hopefully discourage risky development. Furthermore, the impact and importance of the NFIP will only continue to grow as global climate change continues to increase high-tide flooding in American coastal communities. 

To investigate this issue, researchers and journalists can review the Government Accountability Office’s reports on the NFIP and FEMA’s decision to postpone implementation of Risk Rating 2.0 from 2020 to 2021. Freedom of Information Act requests could be filed with FEMA to learn more about the new methodology and about what private-sector data will be used in the updated methodology. Much of the government data on flood risks are the products of other algorithms. For example, FEMA uses software like the Wave Height Analysis for Flood Insurance Studies and the Flood Risk Map to evaluate flood risk in low-lying landscapes and other areas. Investigating these tools could provide a more thorough understanding of the data and models that inform Risk Rating 2.0.

Government COVID-19 algorithms: multiple formats, approaches and challenges

Many government agencies, state and federal, have deployed a variety of algorithms to combat the COVID-19 pandemic. Given that these automated systems can take a myriad of forms, investigating them can also require a variety of approaches. In this way, COVID-related algorithms show how challenging reporting on automated decision-making systems can be: like government algorithms in general, they are widespread and diverse, making them a challenging but important topic for journalists to cover.

So, what kinds of algorithms are governments creating in response to the COVID pandemic? The COVID-related government algorithms that we found aim to track the virus, reduce its spread and soften the impacts of the pandemic in various ways. Many of these algorithms relate to state politics and health care. The Minnesota Department of Health, for example, published guidelines for Minnesota healthcare organizations to report COVID-19 cases to the state. Similarly, the Oregon Health Authority published recommendations for healthcare workers preventing and treating COVID-19. The State of Michigan created a web application to screen for COVID-19 based on users’ self-reported symptoms, and Carnegie Mellon University developed risk indices for Pennsylvania counties to inform policymakers as they reopen the state economy.

Other algorithms address the virus in more unexpected ways. For instance, the Environmental Protection Agency published new animal carcass management guidelines in response to COVID-19; the Arizona Department of Child Safety published guidelines for virtual visits for foster caregivers; and the Centers for Disease Control and Prevention created an interactive map that informs users of the risks of traveling to different countries during the pandemic.

These algorithms may be useful, but they also present a number of risks. For example, the MI Symptoms App could raise privacy concerns, as many other COVID-related government applications around the world have. Although the MI Symptoms App is an online screening tool rather than a contact-tracing application, user information is not covered by HIPAA, can be shared with health departments and contributes to larger county and state data. Furthermore, though third-party organizations do not receive information from the MI Symptoms App, users can sign in with Google and Facebook. This could raise privacy concerns given the sensitive information collected by the software and how that information is handled by third-party services. This algorithm has the potential to affect a large number of people, as the State of Michigan created and promoted this free application for employers to use in the daily screening protocol required by the state.

The Carnegie Mellon University Risk-Based Decision Support Tools similarly have the potential to affect many people. This algorithm creates risk evaluations that will influence Pennsylvania policymakers as they plan to reopen the economy, which will in turn impact the economic situation and safety of their constituents. This algorithm is also newsworthy given the general controversy surrounding reopening the economy, especially in Pennsylvania, which had some of the highest unemployment-compensation claims in the country as of late April and the tenth highest number of confirmed cases in the U.S. as of early July 2020. Although the risk indices are data-driven evaluations, the algorithm speaks to a political and divisive decision, so it is likely to be a topic of debate regardless of its output or whether policymakers act in accordance with its evaluations.

We’ll be on the lookout for more COVID-related algorithms moving forward. But even just with the ones we’ve already found, there’s more work to do. To investigate the MI Symptoms App, researchers and journalists could file a public records request with the State of Michigan to learn about the software. They could also request user agreements from the MI Symptoms App and connected third-party services to learn about user privacy. Reporters who are interested in looking at the Carnegie Mellon risk indices can search for updated project proposals from the university or dig into the details of its implementation. Investigations into other COVID-related government algorithms can begin with contacting relevant government agencies and health organizations. For instance, the Minnesota Department of Health or a Minnesota hospital could speak to the efficacy of state reporting guidelines. We hope that with additional research and reporting, the impact of these systems and algorithms on the public can be further clarified.

Facing the threat of budget cuts, Energy Star needs updates to continue saving money

Energy Star is a government program that helps consumers avoid $30 billion in energy costs every year. But the program is constantly threatened by administrative pressures and the need to be updated. Energy Star is an example of how the success or failure of an algorithm depends not only on its internal design or the data that goes into it, but also on how it is managed throughout its lifecycle.

The idea behind Energy Star is that appliances, electronics, buildings, and industrial plants can be rated according to an energy use scoring system. That system is an algorithm that combines survey data with an analysis of the energy consumption of the type of appliance or building that is being evaluated.

For buildings, the ratings are calculated by combining a trove of data for each building (size, location, number of occupants, etc.). The Energy Star algorithm then categorizes the target buildings in groups based on similarity nationwide, using information from a national survey called the Commercial Building Energy Consumption Survey (CBECS). Then, the energy consumption of the target building is compared to buildings of the same classification. If the building being evaluated is in the top 25% of buildings of that classification (that is, at or above the 75th percentile for efficiency), it can receive an Energy Star certification.

For appliances and electronics, it’s the same thing. For each of the more than 75 categories of appliances or electronics, from dishwashers to air purifiers, brands and models are rated according to the median energy consumption. If the efficiency is high enough to be in the 75th percentile, the product gets an Energy Star label.
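The peer comparison described above can be sketched in a few lines. This is a simplified illustration, not Energy Star's actual model (which adjusts for building characteristics using survey data): it ranks a target against a list of peers by raw energy use and checks whether it lands at or above the 75th percentile for efficiency. The peer values and the function names are made up for the example.

```python
from typing import Sequence


def efficiency_percentile(target_use: float, peer_uses: Sequence[float]) -> float:
    """Percent of peers that use more energy than the target (higher is better)."""
    if not peer_uses:
        raise ValueError("need at least one peer to compare against")
    less_efficient = sum(1 for use in peer_uses if use > target_use)
    return 100.0 * less_efficient / len(peer_uses)


def qualifies(target_use: float, peer_uses: Sequence[float], cutoff: float = 75.0) -> bool:
    """True if the target is at or above the cutoff percentile for efficiency."""
    return efficiency_percentile(target_use, peer_uses) >= cutoff


# Hypothetical annual energy use for eight comparable buildings or products.
peers = [120, 95, 150, 80, 110, 130, 100, 140]

print(efficiency_percentile(85, peers), qualifies(85, peers))
print(efficiency_percentile(115, peers), qualifies(115, peers))
```

Because the result depends entirely on the peer list, a comparison group built from old survey data can make an average modern building look exceptional.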

The issue is that as technology evolves, appliances and buildings get more and more efficient. What was once considered economical can become wasteful as the years go by. That is why the methodology and underlying data of a program like Energy Star have to be constantly updated.

But this is not the only issue with Energy Star. There is the even larger criticism of whether or not the algorithm is accurate at all. According to John Scofield, professor of Physics at Oberlin College, the Energy Star models are based on unreliable data, which leads to high degrees of uncertainty in the ratings of buildings, including severe grade inflation.

These issues raise the question of who is in charge of ensuring the accuracy of the rating system. Today, Energy Star is maintained by a team at the Environmental Protection Agency (EPA), an agency that is constantly facing the possibility of budget cuts. Last year, the Trump administration presented a budget proposal that cut $42 million from Energy Star. While the proposal was rejected by Congress, a memo drafted by the EPA financial officer in March 2017 revealed that the federal government had threatened to eliminate the program altogether.

Civil rights groups concerned about biases in recidivism reduction algorithm

An algorithm that was launched in June by the U.S. Department of Justice to predict the likelihood of recidivism for federal inmates is being criticized by civil rights activists for possible gender and racial bias.

The Prisoner Assessment Tool Targeting Estimated Risk and Needs (PATTERN) algorithm is part of the First Step Act, a criminal justice reform bill that was passed by Congress with bipartisan support and signed into law by President Donald Trump in December 2018. The law intends to reduce recidivism and streamline the process through which inmates can be rewarded for good behavior, with the ultimate goal of reducing the federal prison population.

According to the DOJ, 3,100 federal inmates were already released as a result of the First Step Act. According to the Bureau of Prisons, as of September 12, there were 177,080 inmates under custody of the BOP, either in federal or private facilities.

PATTERN classifies a BOP prisoner into one of four Risk Level Categories (“RLCs”) by assigning points in 17 different categories. In its report explaining how the algorithm works, the DOJ touts that the new algorithm is 15% more accurate than similar tools, according to a metric called Area Under the Curve (AUC). The AUC represents the likelihood that the algorithm would give a randomly chosen recidivist a higher risk score than a randomly chosen non-recidivist.
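That reading of the AUC can be computed directly by comparing every recidivist with every non-recidivist. The sketch below does so on made-up scores; the data and the function name pairwise_auc are illustrative only and have nothing to do with PATTERN's actual inputs. Ties are counted as half a win, a common convention.

```python
from itertools import product


def pairwise_auc(recidivist_scores, non_recidivist_scores):
    """Share of (recidivist, non-recidivist) pairs in which the recidivist
    has the higher risk score; ties count as half."""
    pairs = list(product(recidivist_scores, non_recidivist_scores))
    if not pairs:
        raise ValueError("need at least one score in each group")
    wins = sum(1.0 if r > n else 0.5 if r == n else 0.0 for r, n in pairs)
    return wins / len(pairs)


# Made-up risk scores for illustration.
recidivists = [30, 45, 52]
non_recidivists = [10, 30, 40]

print(round(pairwise_auc(recidivists, non_recidivists), 3))
```

A value of 0.5 would mean the scores rank people no better than chance, and 1.0 would mean every recidivist outscored every non-recidivist. Note that the AUC says nothing about whether errors fall evenly across demographic groups.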

But as with other algorithms used by the justice system, PATTERN is raising controversy over the types of inputs it uses. Earlier this month, civil rights groups published an open letter criticizing the fact that the algorithm uses historical data to calculate the assessments, which would make it fertile ground for biases.

While the DOJ report also explains that tests were conducted to assess “racial and ethnic neutrality,” and that there is “minimal racial/ethnic disparity for PATTERN’s prediction strength,” the civil rights groups also urged the DOJ to address concerns about “racial, ethnic and gender biases.” According to the authors of the letter, failures in the algorithm “could be holding back thousands more from the freedom they deserve.”

An additional concern is that, whatever solutions are created to reduce the number of federal inmates, that would only impact a small proportion of prisoners in the country. The 2018 report on mass incarceration by the Prison Policy Initiative shows that only 10% of inmates in the United States are in federal facilities, compared to 60% for state prisons and 30% for local jails.

The Bureau of Justice Statistics reported this year that there were almost 2.2 million inmates in the United States in 2016, which means that for every 100,000 people residing in the United States, approximately 670 of them were behind bars. According to the Vera Institute of Justice, incarceration costs an average of more than $31,000 per inmate, per year, nationwide. In some states, it’s as much as $60,000. 

Nursing Home Compare: making sense of a crucial tool for the future

The Five-Star Quality Rating System is a convenient and widely used tool created by the Centers for Medicare and Medicaid Services (CMS) to help people find good nursing homes, but some of the specific metrics it uses have drawn controversy.

In 1998, CMS launched Nursing Home Compare as a federal website for information about nursing homes. The Five-Star Quality Rating System was added in 2008 to help the public better identify and compare nursing homes, using an automated score generated from specific metrics for each home. The Nursing Home Compare website works by requesting the user’s zip code and listing all available nursing homes within a 25-mile radius. The user can then select up to three homes and compare their ratings, which are based on reports certified by CMS (see screenshot below). Providers are given a score from one to five stars on criteria such as health inspections, staffing and quality measures, where one star represents below average and five stars represent above average.

While the star rating method is an easy way for consumers to visualize differences in quality between nursing homes, some issues could threaten the validity of the assessment.

In an academic study published in 2018, researchers found that higher ratings on Nursing Home Compare did not directly translate to a lower rate of hospitalization, deeming the Five-Star Quality Rating System to be “less meaningful as an indicator of nursing home quality for post-acute care patients.” This study raises the concern that the rating system might not be incentivizing nursing homes to focus on patient care, but rather on simply meeting the standards that are checked during inspections in order to receive a higher rating. 

Another criticism of Nursing Home Compare is that it takes too narrow a view of what determines the overall quality of a nursing home. In the most recent change to the system, a facility will automatically be assigned a one-star rating for the “registered nurse staffing” category if it reports four or more days per quarter with no registered nurse on site. CMS said this change was to reflect that “nurse staffing has the greatest impact on the quality of care nursing homes deliver.” This change, while positive, was met with criticism from the American Health Care Association (AHCA). In a statement, the AHCA said that “the staff rating still does not include therapists,” who also play a critical role in ensuring patient-centered care. While CMS is continuously making strides to improve the system, it may well be overlooking other vital components of quality care.
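The automatic one-star rule can be expressed as a simple override on top of whatever rating the staffing data would otherwise produce. The sketch below assumes the rule works exactly as described above; the function and parameter names are hypothetical, and it does not reproduce how CMS calculates the underlying staffing rating.

```python
# The four-day threshold comes from the rule described above.
# Everything else here is a hypothetical illustration.
NO_RN_DAY_THRESHOLD = 4


def rn_staffing_stars(days_without_rn: int, stars_from_staffing_data: int) -> int:
    """Return the registered nurse staffing rating after applying the override."""
    if not 1 <= stars_from_staffing_data <= 5:
        raise ValueError("star ratings run from one to five")
    if days_without_rn >= NO_RN_DAY_THRESHOLD:
        return 1  # automatic one-star rating for the quarter
    return stars_from_staffing_data


print(rn_staffing_stars(days_without_rn=3, stars_from_staffing_data=4))  # 4
print(rn_staffing_stars(days_without_rn=4, stars_from_staffing_data=4))  # 1
```

The override looks only at registered nurse coverage, which is the gap the AHCA statement points to: therapist staffing does not appear anywhere in the rule.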

The most recent study by the CDC, in 2016, found that there were about 15,600 nursing homes in the U.S., but that number is dropping. In an interview with the New York Times, University of Pittsburgh health policy researcher Dr. Nicholas Castle said that 200 to 300 nursing homes close each year because of a declining number of residents, who might choose alternatives such as assisted living or other ways to stay at home. Meanwhile, the Population Reference Bureau projects that by 2060, nearly 100 million Americans will be 65 or older, which is more than double the number from 2016. Putting these pieces of information together, demand will inevitably rise for better nursing home facilities, and for an accurate rating system to help people find those homes.

Could CMS’ Fraud Prevention System be unfair?

The Centers for Medicare and Medicaid Services (CMS) uses the Fraud Prevention System (FPS) to detect improper Medicare payments by processing millions of fee-for-service claims every day. But the focus on monetary return might cause the system to focus on some fraudulent healthcare providers in lieu of others.

FPS analyzes data related to Medicare fee-for-service claims to detect improper claims and develop leads for fraud investigations. The large number of irregular claims shows the need for such a system: CMS estimates that 9.51% of its payments in the 2017 fiscal year went toward claims that violated Medicare policy, which translates to $36.21 billion in improper payments. Past years also had high improper payment rates: 11% in the 2016 fiscal year and 12.1% in the 2015 fiscal year.
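For a sense of scale, the two 2017 figures above imply the total volume of payments they were measured against. The short Python calculation below is back-of-the-envelope arithmetic from the numbers quoted in this article, not an official CMS total.

```python
# Figures quoted above for the 2017 fiscal year.
improper_rate = 0.0951               # 9.51% of payments
improper_payments_billions = 36.21   # $36.21 billion

# Implied total payments = improper payments / improper payment rate.
implied_total_billions = improper_payments_billions / improper_rate
print(f"Implied total payments: about ${implied_total_billions:.1f} billion")
```

In other words, the improper payment rate is being measured against roughly $380 billion in payments for the year.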

These losses are the reason why the Small Business Jobs Act of 2010 required the CMS to create the FPS, which uses predictive analytics to process all fee-for-service claims prior to payment and prevent the payment of improper claims.

The stakes are high: FPS has a huge impact on both the operation of Medicare and the lives of millions of Americans. According to Northrop Grumman, the defense contractor chosen to implement the FPS, the system included 42 million Medicare beneficiary profiles and 2 million healthcare provider profiles as of 2013, with millions of claims processed daily. Even though the system was expensive to implement, costing around $51.7 million in its fourth implementation year (2015), the investment is yielding results. In 2015, it helped identify or prevent $654.8 million in improper payments, which means a return of approximately $11.50 for every dollar invested, although the rate of return was not as high in previous years.

Although it is important to consider the return on investment when evaluating the FPS, using that as the only measurement of success can lead to undesirable biases. A report to Congress mentioned the potential side effects of concentrating on return on investment: the program might focus on getting money back from “amateur” fraudsters rather than “professional” fraudsters who, for example, might offshore their illegitimate gains. An “amateur” fraudster might be a healthcare provider that treats real patients but also makes improper claims. It would be easier to get money back from this business than from someone who offshored all the money. In the short run, this could mean using fewer resources and getting higher returns, which would look good in terms of return on investment. However, the professional fraudsters are also a problem, and it’s important to go after them as well. Is it fair to target “small” fraud instead of “big” fraud just because the money is easier to recover?

Furthermore, healthcare providers can be punished over errors even when most of their fee-for-service claims are legitimate. For example, CMS revoked the billing privileges of the company Arriva (which describes itself as “the nation’s largest supplier of home-delivered diabetic testing supplies”) based on 211 improper claims, which represent only 0.003% of its claims over the past five years. Even though this reduces healthcare fraud, the decision also affects Arriva’s ability to provide real medical service to its customers.
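The Arriva percentages also imply how many claims the company filed in total. The Python calculation below works only from the two figures quoted above, so the result is an estimate rather than a reported number.

```python
# Figures quoted above for Arriva.
improper_claims = 211
improper_share = 0.003 / 100   # 0.003% expressed as a fraction

# Implied total claims over the five-year period.
implied_total_claims = improper_claims / improper_share
print(f"Implied total claims: about {implied_total_claims / 1e6:.1f} million")
```

On those figures, the revocation rested on 211 claims out of roughly seven million.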

To investigate this issue, researchers and journalists could view CMS’s Return on Investment reports and the Government Accountability Office’s reports on the Fraud Prevention System.

Investigation into this algorithm can start by reaching out to the press contacts at the Centers for Medicare and Medicaid Services, as well as people involved in health care fraud cases, since the FPS generates a lot of leads for investigations. The FBI also has a website dedicated to health care fraud news.