Reading and Implementation on Model Bias
For this assignment, please read the following book chapters and articles:
1. Fairness and Machine Learning, Solon Barocas et al. (ch. 3)
2. Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey, Max Hort et al., ACM Journal on Responsible Computing, 2024
3. Fairness Definitions Explained, Sahil Verma et al., IEEE/ACM FairWare, 2018
4. On the Apparent Conflict Between Individual and Group Fairness, Reuben Binns, ACM FAccT, 2020
5. Practical Fairness, Aileen Nielsen (ch. 3-4)
6. Machine Bias, Julia Angwin et al., ProPublica, 2016 [link]
7. AI Fairness 360 [link]
After completing these readings, please critically answer the following questions.
Based on paper [1], answer the following questions:
Formally demonstrate why Independence, Separation, and Sufficiency cannot all be satisfied simultaneously (reference definitions are provided after this list). Your analysis should include:
1. Proof that Independence and Separation cannot both hold when base rates differ across groups (and the classifier is not trivially independent of the true outcome).
2. Proof that Separation and Sufficiency cannot both hold when base rates differ across groups.
3. Proof that Independence and Sufficiency cannot both hold when base rates differ across groups.
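For reference, the three criteria from paper [1] can be stated as follows, where A is the protected attribute, Y the true outcome, and Ŷ the prediction (all binary here); your proofs should work from these or equivalent formulations:

```latex
% Base rate of group a: P(Y=1 | A=a); "base rates differ" means these are unequal across groups.
\begin{align*}
\text{Independence:} \quad & \hat{Y} \perp A
  & P(\hat{Y}=1 \mid A=a) &= P(\hat{Y}=1 \mid A=b) \\
\text{Separation:}   \quad & \hat{Y} \perp A \mid Y
  & P(\hat{Y}=1 \mid Y=y,\, A=a) &= P(\hat{Y}=1 \mid Y=y,\, A=b) \\
\text{Sufficiency:}  \quad & Y \perp A \mid \hat{Y}
  & P(Y=1 \mid \hat{Y}=\hat{y},\, A=a) &= P(Y=1 \mid \hat{Y}=\hat{y},\, A=b)
\end{align*}
```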
Based on paper [2], answer the following questions:
Paper [2] surveys multiple approaches to in-processing and post-processing bias mitigation (see Section 4). Provide a detailed explanation of each approach, focusing on how it works to mitigate bias in machine learning classifiers. Then, after defining a specific use case (e.g., hiring prediction, credit approval, healthcare diagnosis), explain how each approach could be applied. Discuss the advantages and disadvantages of each approach in your chosen use case, considering factors such as effectiveness of bias reduction, impact on predictive performance, scalability, and implementation complexity.
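As one concrete illustration of the post-processing family (not a method prescribed by paper [2]), the following minimal sketch applies group-specific decision thresholds to a trained classifier's scores until selection rates roughly match; the synthetic data, variable names, and threshold search are all illustrative assumptions:

```python
# Minimal sketch of a post-processing mitigation: group-specific thresholds.
# Synthetic data; all numbers are purely illustrative, not from paper [2].
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)              # 0 = unprivileged, 1 = privileged (hypothetical)
skill = rng.normal(group * 0.5, 1.0, n)    # proxy feature correlated with group membership
y = (skill + rng.normal(0, 1.0, n) > 0.5).astype(int)
X = np.column_stack([skill, group])

clf = LogisticRegression().fit(X, y)
scores = clf.predict_proba(X)[:, 1]

def selection_rate(pred, grp, g):
    return pred[grp == g].mean()

# Baseline: one global threshold -> unequal selection rates across groups.
baseline = (scores >= 0.5).astype(int)
print("baseline selection rates:",
      selection_rate(baseline, group, 0), selection_rate(baseline, group, 1))

# Post-processing: lower the unprivileged group's threshold until its
# selection rate roughly matches the privileged group's.
target = selection_rate(baseline, group, 1)
thr = 0.5
while thr > 0.0:
    preds = (scores >= np.where(group == 0, thr, 0.5)).astype(int)
    if selection_rate(preds, group, 0) >= target:
        break
    thr -= 0.01
mitigated = (scores >= np.where(group == 0, thr, 0.5)).astype(int)
print("mitigated selection rates:",
      selection_rate(mitigated, group, 0), selection_rate(mitigated, group, 1))
```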
Based on paper [3], answer the following questions:
1. The paper argues that no single fairness definition can apply universally across all scenarios. Select a real-world application (e.g., healthcare, hiring, lending) and critically evaluate how one fairness definition might introduce challenges or limitations in this context.
2. Even when an algorithm satisfies a fairness definition such as demographic parity or equal opportunity, it may still inadvertently amplify biases present in the data. How can existing fairness metrics fail to capture this phenomenon? Provide an example of bias amplification and discuss why it may go undetected by traditional fairness metrics (an illustrative numerical sketch follows this list).
3. Many real-world datasets suffer from class imbalance (e.g., far fewer positive outcomes than negative ones, or far fewer positives in one group). How do the fairness definitions discussed in the paper (such as equal opportunity or demographic parity) behave on imbalanced datasets? Are these definitions robust to imbalance, or do they require modifications?
4. The paper discusses fairness in algorithmic models but does not directly address the distinction between black-box models (e.g., deep learning) and transparent models (e.g., decision trees, linear models). How does the complexity of a model influence the way fairness is measured and enforced?
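To make questions 2 and 3 concrete, here is a small, entirely hypothetical numerical sketch: both groups are selected at the same rate (so demographic parity holds) while the true positive rate for one group is far lower, the kind of disparity a single metric can mask under class imbalance; all counts are invented for illustration only:

```python
# Hypothetical confusion-matrix counts (both groups: 10% positives, so the
# positive class is imbalanced). Numbers are invented for illustration.
groups = {
    "A": dict(tp=80, fp=20, fn=20, tn=880),
    "B": dict(tp=20, fp=80, fn=80, tn=820),
}

for g, c in groups.items():
    n = sum(c.values())
    selection_rate = (c["tp"] + c["fp"]) / n      # what demographic parity compares
    tpr = c["tp"] / (c["tp"] + c["fn"])           # what equal opportunity compares
    fpr = c["fp"] / (c["fp"] + c["tn"])
    print(f"group {g}: selection_rate={selection_rate:.2f}, TPR={tpr:.2f}, FPR={fpr:.2f}")

# Both groups are selected at the same 10% rate (demographic parity holds),
# yet group B's TPR is 0.20 vs. group A's 0.80: qualified members of B are
# mostly missed, a disparity that demographic parity alone does not detect.
```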
Based on paper [4], answer the following questions:
1. Define individual fairness and group fairness as presented in the paper. How does the paper describe the tension between these two concepts, and why are they often considered to be in conflict in the fairness literature? (A small consistency-check sketch follows this list as an optional starting point.)
2. According to the paper, the apparent conflict between individual and group fairness is context-dependent. Select a real-world application (e.g., hiring, college admissions, criminal justice, or loan approvals) and discuss how you would navigate the trade-offs between individual and group fairness in this specific context.
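If it helps to anchor your answer to question 1, below is a minimal sketch of the intuition behind individual fairness (similar individuals should receive similar predictions) as a pairwise consistency check; the distance metric, cutoffs, and synthetic data are assumptions for illustration only, not definitions taken from the paper:

```python
# Hypothetical check of the individual-fairness intuition: flag pairs of
# individuals that are close in feature space but receive very different scores.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))          # applicant features (synthetic)
scores = rng.uniform(size=200)         # model scores (stand-in for a real model)

eps, delta = 0.5, 0.3                  # "similar inputs" / "dissimilar outputs" cutoffs
violations = []
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        if np.linalg.norm(X[i] - X[j]) < eps and abs(scores[i] - scores[j]) > delta:
            violations.append((i, j))

print(f"pairs violating the similarity constraint: {len(violations)}")
```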
Based on papers [5-7], answer the following questions:
In this assignment, you will explore how the COMPAS algorithm performs with respect to three key fairness criteria: Independence, Separation, and Sufficiency. In addition, you will investigate how to mitigate bias using in-processing and post-processing solutions from the AIF360 toolkit; a starter sketch is provided after the questions below.
1. Independence (Statistical Parity): analyze whether the COMPAS algorithm’s predictions are independent of race. In other words, check if the likelihood of a positive outcome (high-risk prediction) is the same across different racial groups, regardless of their true recidivism status. How well does the COMPAS algorithm satisfy the Independence criterion?
2. Separation (Equalized Odds): separation requires that error rates be equal across groups conditional on the true outcome, so examine whether the COMPAS algorithm satisfies Separation by comparing false positive rates (FPR) and false negative rates (FNR) across racial groups. Does the COMPAS algorithm have similar FPR and FNR across racial groups?
3. Sufficiency (Predictive Parity): investigate whether the COMPAS algorithm demonstrates predictive parity, meaning that among individuals predicted to be high-risk, the proportion who actually recidivate (the positive predictive value) is the same across racial groups. Does the algorithm meet the Sufficiency criterion across racial groups?
4. Ethical Considerations: based on your results, which fairness dimension(s) should be prioritized in redesigning the COMPAS algorithm, and why? Discuss the ethical and social considerations that should guide these decisions.
5. Bias mitigation: using the AIF360 toolkit, explore the available in-processing (e.g., adversarial debiasing, prejudice remover, and meta fair classifier) and post-processing (e.g., equalized odds post-processing and calibrated equalized odds post-processing) bias mitigation algorithms and apply them to the COMPAS dataset. How does applying each method affect the fairness dimensions (Independence, Separation, Sufficiency) of the COMPAS algorithm?
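To get you started on questions 1-3 and 5, here is a minimal sketch assuming AIF360 and scikit-learn are installed and the raw COMPAS data has been downloaded as described in the AIF360 documentation. It trains a simple baseline classifier, reports group metrics corresponding to the three criteria, and applies one post-processing method; the group encoding (race: 1 = privileged, per the default CompasDataset convention), the choice of baseline model, and the choice of mitigation method are assumptions for you to adapt, not the required solution:

```python
# Minimal AIF360 sketch for the COMPAS fairness questions. Assumes the raw
# COMPAS CSV is in place per the AIF360 docs; uses the default race encoding.
from aif360.datasets import CompasDataset
from aif360.metrics import ClassificationMetric
from aif360.algorithms.postprocessing import EqOddsPostprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

privileged = [{"race": 1}]
unprivileged = [{"race": 0}]

dataset = CompasDataset()
train, test = dataset.split([0.7], shuffle=True, seed=0)

# Baseline classifier standing in for the COMPAS risk score.
scaler = StandardScaler().fit(train.features)
clf = LogisticRegression(max_iter=1000).fit(scaler.transform(train.features),
                                            train.labels.ravel())

test_pred = test.copy(deepcopy=True)
test_pred.labels = clf.predict(scaler.transform(test.features)).reshape(-1, 1)

def report(name, dataset_true, dataset_pred):
    m = ClassificationMetric(dataset_true, dataset_pred,
                             unprivileged_groups=unprivileged,
                             privileged_groups=privileged)
    print(name)
    print("  Independence (statistical parity difference):",
          m.statistical_parity_difference())
    print("  Separation (FPR unpriv/priv, FNR unpriv/priv):",
          m.false_positive_rate(privileged=False), m.false_positive_rate(privileged=True),
          m.false_negative_rate(privileged=False), m.false_negative_rate(privileged=True))
    print("  Sufficiency (PPV unpriv/priv):",
          m.positive_predictive_value(privileged=False),
          m.positive_predictive_value(privileged=True))

report("Baseline", test, test_pred)

# One post-processing mitigation: equalized-odds post-processing. In-processing
# methods (e.g., PrejudiceRemover) follow a similar fit/predict pattern on the
# training split. Fitting on the evaluation split here is only for brevity;
# use a separate validation split in your answer.
eq = EqOddsPostprocessing(unprivileged_groups=unprivileged,
                          privileged_groups=privileged, seed=0)
eq = eq.fit(test, test_pred)
test_mitigated = eq.predict(test_pred)
report("After EqOddsPostprocessing", test, test_mitigated)
```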