By Paula Klein
As machine learning and AI are increasingly used to determine important real-world outcomes, such as who gets hired for a job or whether someone is eligible for bail, researchers are working to understand the biases that algorithmic decision-making tools can introduce, and how to mitigate them.
Several cutting-edge research studies addressing this issue were discussed at recent seminars hosted by the MIT IDE. One reported that biased machine learning predictions are mostly caused by biased training data. Another study suggested that reconsidering mathematical definitions of algorithmic fairness can create more equitable outcomes.
Bo Cowgill, Assistant Professor at Columbia Business School, sees machine learning bias as a growing concern. In general, there is a “lack of empirical data about AI process and outcomes,” even as the underlying processes that determine whether or not someone has a criminal record, or is hired for a job, are increasingly driven by AI programs, Cowgill said.
Ethical Dilemmas
His research on operationalizing AI ethics aims to determine whether biased programmers, biased data, or both are causing ethical dilemmas in human capital decisions. Cowgill said that experiments and clear metrics are needed to test how variations in programming practices influence outcomes. Better metrics “would allow us to understand more about organizational, empirical, and behavioral aspects of algorithm design and why bias arises at all,” he said.
Cowgill’s research evaluated 8.2 million algorithmic predictions of math skill from approximately 400 AI engineers, each of whom developed an algorithm under a randomly assigned experimental condition. Engineers were offered incentives, training data modifications, bias-awareness reminders, and/or technical knowledge of AI ethics to see if these interventions affected the algorithms produced.
Cowgill and coauthors measured how the prediction errors made by algorithms varied with programmers’ randomly assigned working conditions and demographic attributes in order to understand the benefits of a particular managerial or policy intervention. The results showed that although more of the bias can be attributed to biased training data, both biased training data and biased programmers impacted the outcomes because of complementarities among data, effort, and incentives. Read the full paper here.
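As a rough illustration of the kind of analysis Cowgill describes, the sketch below regresses a measure of each algorithm’s prediction bias on the engineer’s randomly assigned working condition. The data, variable names, and effect sizes are hypothetical assumptions for demonstration, not taken from the study.

```python
# Hypothetical sketch (not the authors' code): with random assignment of
# interventions to engineers, a regression of measured prediction bias on the
# assigned condition estimates the effect of each managerial intervention.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_engineers = 400
conditions = rng.choice(["control", "reminder", "incentive", "debiased_data"], n_engineers)

# Simulated outcome: each engineer's algorithm has some measured prediction bias
# (e.g., average error against a held-out benchmark), shifted by the condition.
effect = {"control": 0.0, "reminder": -0.02, "incentive": -0.01, "debiased_data": -0.05}
bias = np.array([effect[c] for c in conditions]) + rng.normal(0, 0.05, n_engineers)

df = pd.DataFrame({"prediction_bias": bias, "condition": conditions})

# Regress measured bias on the randomized condition; because assignment is
# random, the coefficients estimate the causal effect of each intervention.
model = smf.ols("prediction_bias ~ C(condition, Treatment(reference='control'))", data=df).fit()
print(model.summary())
```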
In his IDE talk, Stanford University Assistant Professor Sharad Goel addressed the issue of bias and algorithmic fairness. Goel, who founded and directs Stanford’s Computational Policy Lab, also argues for better metrics to “carefully define and measure the targets of prediction to avoid retrenching biases in the data.” In a recent paper he points out that commonly used algorithms that satisfy popular mathematical formalizations of fairness are often flawed.
Goel cites many examples such as banking, criminal justice, and medicine, where “consequential decisions are often informed by statistical risk assessments” that quantify the likely consequences of potential courses of action. For instance, a bank’s lending decisions might be based on the probability that a prospective borrower will default if offered a loan. Similarly, a judge may decide to detain or release a defendant awaiting trial based on his or her estimated likelihood of committing a crime if released. “But as the influence and scope of these risk assessments increase,” so do concerns that the statistical models might inadvertently encode human biases.
Reassessing Algorithmic Fairness
Goel explained how three of the most popular definitions of algorithmic fairness—anti-classification, classification parity, and calibration—actually suffer from deep statistical limitations and can lead to more discrimination. “In particular, they are poor measures for detecting discriminatory algorithms and, even more importantly, designing algorithms to satisfy these definitions can, perversely, negatively impact the well-being of minority and majority communities alike.” Clearly, traditional algorithms are too static and rigid, he said.
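To make those definitions concrete, the sketch below checks classification parity and calibration across two groups on synthetic risk scores; anti-classification would simply mean excluding the group attribute from the model’s inputs. The data and decision threshold are illustrative assumptions, not drawn from Goel’s paper.

```python
# Hypothetical illustration of two fairness criteria on synthetic data:
#   - classification parity: do groups face the same decision rates (or error
#     rates) at a fixed threshold?
#   - calibration: among people with the same risk score, is the observed
#     outcome rate the same across groups?
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000
group = rng.choice(["A", "B"], n)
risk_score = rng.uniform(0, 1, n)              # model's predicted risk
outcome = rng.binomial(1, risk_score)          # true outcome drawn from the score

df = pd.DataFrame({"group": group, "score": risk_score, "y": outcome})
df["decision"] = (df["score"] >= 0.5).astype(int)   # fixed decision threshold

# Classification parity: compare decision rates and false positive rates by group.
print(df.groupby("group")["decision"].mean())
print(df[df["y"] == 0].groupby("group")["decision"].mean())

# Calibration: within score bins, compare observed outcome rates across groups.
df["score_bin"] = pd.cut(df["score"], bins=10)
print(df.groupby(["score_bin", "group"], observed=True)["y"].mean().unstack())
```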
Goel said programmers can design more equitable systems by separating predictions from decision-making. To achieve more equity in lending or in bail hearings, developers may need more input information for their programs, which could lead to higher costs and slower development times—all tradeoffs to be considered.
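One way to picture that separation, offered here as a hedged sketch rather than Goel’s own method: keep the statistical model limited to estimating risk, and express the decision rule as an explicit policy whose threshold reflects stated costs. The names and numbers below are illustrative assumptions.

```python
# The model only estimates risk; the decision rule is an explicit, auditable
# policy layered on top, so it can be debated without retraining the predictor.

def estimated_default_risk(applicant: dict) -> float:
    """Stand-in for a calibrated statistical model's risk estimate (toy placeholder)."""
    return applicant["debt_to_income"] * 0.5

def lending_decision(risk: float, cost_false_approve: float = 5.0,
                     cost_false_deny: float = 1.0) -> str:
    # Deny when the expected cost of approving exceeds the expected cost of denying.
    threshold = cost_false_deny / (cost_false_deny + cost_false_approve)
    return "deny" if risk >= threshold else "approve"

print(lending_decision(estimated_default_risk({"debt_to_income": 0.4})))
```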
Two other IDE seminars touched on AI bias as follows:
- Harvard Business School Assistant Professor Zoe Cullen looked at how digital hiring platforms make it easier to measure the effects of interventions, such as wage subsidies, that can help people with a criminal record find work. Her research also examines how processes that determine factors such as whether or not someone has a criminal record, or is hired for a job, are increasingly driven by AI. In her paper, “Measuring Labor Demand for Workers with a Criminal Conviction,” Cullen finds that technological advances in rapid criminal background screening have led to increased employer screening, up from approximately 30% to 80% since 1990.
- Kartik Hosanagar, Professor at the Wharton School of the University of Pennsylvania, discussed machine learning instrumental variables (IV) for causal inference; a generic sketch of the idea appears after this list. His work shows that ML/AI tools can be used to improve the efficacy of established causal inference techniques, which can lead to more informed policy decisions. Hosanagar has written extensively about algorithms, including articles and a new book, A Human’s Guide to Machine Intelligence: How Algorithms Are Shaping Our Lives and How We Can Stay in Control.
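As a generic, hedged illustration of pairing ML with instrumental variables (not Hosanagar’s specific method), the sketch below uses a flexible model for the first stage of two-stage least squares and a linear second stage on the fitted values. The simulated data and true effect size are assumptions for demonstration.

```python
# Two-stage least squares with a flexible (ML) first stage, on simulated data.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)                                      # instrument
u = rng.normal(size=n)                                      # unobserved confounder
x = np.sin(z) + 0.5 * u + rng.normal(scale=0.1, size=n)     # endogenous treatment
y = 2.0 * x + u + rng.normal(scale=0.1, size=n)             # outcome; true effect = 2

# First stage: predict the endogenous variable from the instrument with an ML model.
first_stage = RandomForestRegressor(n_estimators=200, random_state=0)
first_stage.fit(z.reshape(-1, 1), x)
x_hat = first_stage.predict(z.reshape(-1, 1))

# Second stage: regress the outcome on the fitted values (plus a constant).
second_stage = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(second_stage.params)   # the coefficient on x_hat should be close to 2
```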
Find links to IDE seminar videos here.