Data Feminism is a paradigm that reimagines the concept of data and its applications while acknowledging the inherent power imbalances within data science. It recognizes that power is unequally distributed globally and that data itself is a form of power. Because data is often used unjustly, the primary goal of Data Feminism is to recognize and reshape these imbalances. Data Feminism goes beyond a narrow focus on gender: it adopts an intersectional approach, acknowledging that factors such as race, class, sexuality, ability, age, and religion intersect to shape individuals' experiences and opportunities.
The Data Feminism course aims to bridge ethical and social justice themes with advancements in data science, exploring how individuals working with data can actively challenge and transform power differentials through a Data Feminism lens. This course is positioned at the intersection of data science and intersectional feminism. The objectives are drawn mainly from the seven principles outlined in the book "Data Feminism" by Catherine D'Ignazio and Lauren F. Klein.
The Course Examiner
The Course Structure
The course contains seven modules, each dedicated to one of the outlined objectives. Within each module, students will engage in two sessions: one lecture and one discussion. In the lecture session of each module, the instructor will provide a comprehensive introduction to the module's context, offering an overview of the designated reading material. Subsequently, students will have one week to thoroughly review the assigned reading materials and submit a detailed critique of the selected papers. The discussion session of each module will be dedicated to a thorough review and in-depth exploration of the module's topic and associated papers.
Intended Learning Outcomes (ILOs)
After the course, the student should be able to:
ILO1: understand the theoretical and technical issues related to data justice.
ILO2: apply acquired knowledge to employ data and data science as tools to confront injustices magnified by data and associated techniques.
ILO3: analyze and evaluate data science practices by recognizing their biases and taking actions to address them.
Prerequisites
The students should have completed courses in machine learning and deep learning and be familiar with Python programming.
Assessment
Grading in this course will be based on four distinct tasks: completion of module reading assignments, presentation, active participation during module sessions, and the final project. The assignments can be undertaken in groups of two students.
Task 1 (reading assignments): each student/group is required to submit a comprehensive review for a set of assigned papers corresponding to each module.
Reading Assignments 1
Reading Assignments 2
Reading Assignments 3
Reading Assignments 4
Reading Assignments 5
Reading Assignments 6
Reading Assignments 7
Task 2 (presentation): each student/group will act as the moderator for a pre-selected module. In this capacity, they are responsible for presenting the set of assigned papers, contributing to the collective understanding of the module's content.
Task 3 (group discussion): students are expected to attend the group presentation sessions and actively engage in the subsequent group discussions.
Task 4 (final project): the final project requires each student/group to reproduce a paper relevant to the course topics. A set of papers will be provided to the students, but they also have the option to propose alternative papers for consideration.
Grading
The course will be assessed on a Pass/Fail basis, and successful completion is contingent on meeting specific criteria. These criteria encompass completing at least 75% of the reading assignments, delivering a presentation during the discussion session, attending a minimum of 75% of the student presentation sessions, and successfully implementing the chosen paper, incorporating basic experiments.
Credits
This is a 7.5 ECTS credit course that spans 224 hours over 14 weeks, including the time allocated for the final project.
Schedule
Module 1: Critiquing Power in Data Science
Lecture Session: Sep. 10, 13:00-15:00 [slides]
Discussion Session: Sep. 17, 13:00-15:00 [slides]
Required Reading
- Data Feminism, Catherine D'Ignazio and Lauren F. Klein (intro, ch. 1-2)
- Black Feminist Thought, Patricia Hill Collins (ch. 12)
- Design Justice, Sasha Costanza-Chock (intro)
- Dig Deep: Beyond Lean In, bell hooks [link]
- Feminism for the 99%: A Manifesto, Nancy Fraser (thesis 1-10)
Optional Reading
- Data Grab, Ulises A. Mejias and Nick Couldry (ch. 1, ch. 6)
- Feminist Theory: From Margin to Center, bell hooks (ch. 1)
- Algorithms of Oppression, Safiya Umoja Noble (ch. 1)
- Race after Technology, Ruha Benjamin (intro)
- Automating Inequality, Virginia Eubanks (ch. 4)
- Demarginalizing the Intersection of Race and Sex, Kimberlé Crenshaw
- Restorative Justice and Reparations, Margaret Urban Walker
- Combahee River Collective Statement [link]
- Exclusive: Workers at Google DeepMind Push Company to Drop Military Contracts [link]
- Forget Project Maven. Here Are A Couple Other DoD Projects Google Is Working On [link]
Module 2: Ghost Work
Lecture Session: Sep. 24, 13:00-15:00 [slides]
Discussion Session: Oct. 1, 13:00-15:00 [slides]
Required Reading
- Data Feminism, Catherine D'Ignazio and Lauren F. Klein (ch. 7)
- The Exploited Labor Behind Artificial Intelligence, Adrienne Williams et al. [link]
- Ethical Norms and Issues in Crowdsourcing Practices: A Habermasian Analysis, Daniel Schlagwein et al., Information Systems Journal, 2019
- The Data-Production Dispositif, Milagros Miceli et al., ACM CSCW, 2022
- Difference and Dependence Among Digital Workers: The Case of Amazon Mechanical Turk, Lilly Irani, South Atlantic Quarterly, 2015
- Platformization of Inequality: Gender and Race in Digital Labor Platforms, Isabel Munoz et al., ACM CSCW, 2024
- The Cultural Work of Microwork, Lilly Irani, New Media & Society, 2015
- Turkopticon: Interrupting Worker Invisibility in Amazon Mechanical Turk, Lilly Irani et al., ACM SIGCHI, 2013
- We are Dynamo: Overcoming Stalling and Friction in Collective Action for Crowd Workers, Niloufar Salehi et al., ACM SIGCHI, 2015
- A Typology of Artificial Intelligence Data Work, James Muldoon et al., Big Data & Society, 2024
- Digital Labour Platforms and the Future of Work, Janine Berg et al., ILO Report, 2018
Optional Reading
- Ghost Work, Mary L. Gray and Siddharth Suri (ch. 1)
- Atlas of AI, Kate Crawford (ch. 2)
- Wages Against Housework, Silvia Federici [link]
- Making Feminist Points, Sara Ahmed [link]
- Justice for Data Janitors, Lilly Irani [link]
- Digital Labour Markets in the Platform Economy, Florian Schmidt, 2017
- Whose Truth? Power, Labor, and the Production of Ground-Truth Data, Milagros Miceli, 2023
Module 3: Data Colonialism
Lecture Session: Oct. 8, 13:00-15:00 [slides]
Discussion Session: Oct. 15, 13:00-15:00 [slides]
Required Reading
- Data Feminism, Catherine D'Ignazio and Lauren F. Klein (ch. 5-6)
- Data Colonialism: Rethinking Big Data's Relation to the Contemporary Subject, N. Couldry and U.A. Mejias, Television & New Media, 2019
- Against Cleaning, Katie Rawson and Trevor Muñoz, Debates in the Digital Humanities, 2019
- Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth as Domain Experts for Analyzing Unstructured Twitter Data, William R. Frey et al., Social Science Computer Review, 2020
- Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes, Nikhil Garg et al., Proceedings of the National Academy of Sciences, 2018
- Datasheets for Datasets, Timnit Gebru et al., Communications of the ACM, 2021
- The Dataset Nutrition Label, Sarah Holland et al., Data Protection and Privacy, 2020
- Documenting Data Production Processes: A Participatory Approach for Data Work, Milagros Miceli et al., 2022
Optional Reading
- The Anti-Eviction Mapping Project: Counter Mapping and Oral History Toward Bay Area Housing Justice, Manissa Maharawal et al., 2018
- Data Grab, Ulises A. Mejias and Nick Couldry (ch. 1, ch. 6)
- All Data Are Local: Thinking Critically in a Data-Driven Society, Yanni Loukissas (introduction)
- Ghost Stories for Darwin, Banu Subramaniam (introduction)
- Indigenous Statistics: A Quantitative Research Methodology, Maggie Walter and Chris Andersen (introduction)
- Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective, Donna Haraway, 2013
- Social Media for Large Studies of Behavior, Derek Ruths and Jürgen Pfeffer, Science, 2014
- Tampering with Twitter's Sample API, Jürgen Pfeffer et al., EPJ Data Science, 2018
- The Dataset Nutrition Label (2nd Gen), Kasia S. Chmielinski et al., 2022
- Data Biographies: Getting to Know Your Data, Heather Krause [link]
- Data User Guides, Bob Gradeck [link]
- The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability, Mehtab Khan et al., 2023
Module 4: Bias and Fairness in Data
Lecture Session: Oct. 22, 13:00-15:00 [slides]
Discussion Session: Oct. 29, 13:00-15:00 [slides] [notebook]
Required Reading
- A Framework for Understanding Sources of Harm Throughout the Machine Learning Life Cycle, Harini Suresh et al., ACM EAAMO, 2021
- Assessing and Remedying Coverage for a Given Dataset, Abolfazl Asudeh et al., IEEE ICDE, 2019
- Representation Bias in Data: A Survey on Identification and Resolution Techniques, Nima Shahbazi et al., ACM Computing Surveys, 2023
- Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey, Max Hort et al., ACM Journal on Responsible Computing, 2024
- Machine Bias, Julia Angwin et al., ProPublica, 2016 [link]
- Practical Fairness, Aileen Nielsen (ch. 3-4)
- Moving Beyond "Algorithmic Bias Is A Data Problem", Sara Hooker, Patterns, 2021
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, Joy Buolamwini et al., FAccT, 2018
- AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias, 2018
- Data Preprocessing Techniques for Classification without Discrimination, Faisal Kamiran, Knowledge and Information Systems, 2012
- Certifying and Removing Disparate Impact, Michael Feldman et al., ACM KDD, 2015
- Learning Fair Representations, Rich Zemel et al., PMLR, 2013
Optional Reading
- Bias and Unfairness in Machine Learning Models, Tiago P. Pagano et al., Big Data and Cognitive Computing, 2023
- The Accuracy, Fairness, and Limits of Predicting Recidivism, Julia Dressel et al., Science Advances, 2018
- Characterizing Bias in Compressed Models, Sara Hooker et al., arXiv, 2020
- Punishing Risk, Erin Collins, Geo. LJ, 2018
- Tackling Documentation Debt: A Survey on Algorithmic Fairness Datasets, Alessandro Fabris et al., ACM EAAMO, 2022
- Fairness in the AI Lifecycle [link]
- No Classification Without Representation, Shreya Shankar et al., arXiv, 2017
- Inherent Trade-offs in the Fair Determination of Risk Scores, Jon Kleinberg et al., arXiv, 2016
- Assessing Risk Assessment in Action, Megan Stevenson, Minn. L. Rev., 2018
- Data Augmentation for Discrimination Prevention and Bias Disambiguation, Shubham Sharma et al., AAAI/ACM AIES, 2020
- Dealing with Bias via Data Augmentation in Supervised Learning Scenarios, Vasileios Iosifidis et al., 2018
- Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes, Nikhil Garg et al., Proceedings of the National Academy of Sciences, 2018
- Contextual Analysis of Social Media: The Promise and Challenge of Eliciting Context in Social Media Posts with Natural Language Processing, Desmond U. Patton et al., 2020
- Artificial Intelligence and Inclusion: Formerly Gang-Involved Youth as Domain Experts for Analyzing Unstructured Twitter Data, William R. Frey et al., Social Science Computer Review, 2020
Module 5: Bias and Fairness in Models
Lecture Session: Nov. 5, 13:00-15:00 [slides]
Discussion Session: Nov. 12, 13:00-15:00 [slides]
Required Reading
- Data Feminism, Catherine D'Ignazio and Lauren F. Klein (ch. 4)
- Fairness and Machine Learning, Solon Barocas et al. (ch. 3)
- The Ethical Algorithm, Michael Kearns, Aaron Roth (ch. 2)
- Fairness in Machine Learning: A Survey, Simon Caton et al., ACM Computing Surveys, 2024
- Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey, Max Hort et al., ACM Journal on Responsible Computing, 2024
- On the Apparent Conflict Between Individual and Group Fairness, Reuben Binns, ACM FAccT, 2020
- Fairness Definitions Explained, Sahil Verma et al., IEEE/ACM FairWare, 2018
- A Short-Term Intervention for Long-Term Fairness in the Labor Market, Lily Hu et al., WWW, 2018
- A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual and Group Unfairness Via Inequality Indices, Till Speicher et al., ACM SIGKDD, 2018
- Fairness in Machine Learning: Lessons from Political Philosophy, Reuben Binns, PMLR, 2018
- Fairness Through Awareness, Cynthia Dwork et al., ITCS, 2012
Optional Reading
- Atlas of AI, Kate Crawford (ch. 4)
- Sorting Things Out, Geoffrey C. Bowker and Susan Leigh Star (introduction)
- Practical Fairness, Aileen Nielsen (ch. 5-6)
- Fairness Metrics: A Comparative Analysis, Pratyush Garg et al., IEEE Big Data, 2020
- A Survey on Bias and Fairness in Machine Learning, Ninareh Mehrabi et al., ACM CSUR, 2021
- Mitigating Unwanted Biases with Adversarial Learning, B.H. Zhang et al., AAAI/ACM AIES, 2018
- On Fairness and Calibration, Geoff Pleiss et al., Advances in Neural Information Processing Systems, 2017
- Data Preprocessing Techniques for Classification without Discrimination, Faisal Kamiran, Knowledge and Information Systems, 2012
- Learning Fair Representations, Rich Zemel et al., PMLR, 2013
- Fairness-aware Classifier with Prejudice Remover Regularizer, Toshihiro Kamishima et al., ECML PKDD, 2012
- Fairness Constraints: Mechanisms for Fair Classification, Muhammad Bilal Zafar et al., AISTATS, 2017
- Fairness Beyond Disparate Treatment & Disparate Impact, Muhammad Bilal Zafar et al., WWW, 2017
- Equality of Opportunity in Supervised Learning, Moritz Hardt et al., Advances in Neural Information Processing Systems, 2016
Module 6: Intersectionality
Lecture Session: Nov. 15, 13:00-15:00 [slides]
Discussion Session: Nov. 26, 13:00-15:00 [slides]
Required Reading
- Demarginalizing the Intersection of Race and Sex, Kimberlé Crenshaw, Feminist Legal Theories. Routledge, 2013
- A Survey on Intersectional Fairness in Machine Learning: Notions, Mitigation, and Challenges, Usman Gohar et al., arXiv, 2023
- Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness, Michael Kearns et al., PMLR, 2018
- Multicalibration: Calibration for the (Computationally-Identifiable) Masses, Ursula Hébert-Johnson et al., PMLR, 2018
- Differential Fairness: An Intersectional Framework for Fair AI, Rashidul Islam et al., Entropy, 2023
- Are "Intersectionally Fair" AI Algorithms Really Fair to Women of Color? A Philosophical Analysis, Youjin Kong, ACM FAccT, 2022
- (Un)Fairness in AI: An Intersectional Feminist Analysis, Youjin Kong [link]
- Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?, Milagros Miceli et al., ACM HCI, 2022
- Factoring the Matrix of Domination: A Critical Review and Reimagination of Intersectionality in AI Fairness, Anaelia Ovalle et al., AAAI/ACM AIES, 2023
- Intersectionality, Patricia Hill Collins
- Intersectionality as Critical Social Theory, Patricia Hill Collins
Optional Reading
- Fairness Improvement with Multiple Protected Attributes: How Far Are We?, Zhenpeng Chen et al., IEEE/ACM ICSE, 2024
- Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models, Hannah Rose Kirk et al., Advances in Neural Information Processing Systems, 2021
- Low-Degree Multicalibration, Parikshit Gopalan et al., PMLR, 2022
- Probably Approximately Metric-Fair Learning, Guy N. Rothblum et al., PMLR, 2018
- Understanding and Mitigating Worker Biases in the Crowdsourced Collection of Subjective Judgments, Christoph Hube et al., ACM CHI, 2019
- Fairness for Unobserved Characteristics: Insights From Technological Impacts on Queer Communities, Nenad Tomasev et al., AAAI/ACM AIES, 2021
- Intersectionality as a Regulative Ideal, Katherine Gasdaglis et al., 2019
- Measuring Intersectional Biases in Historical Documents, Nadav Borenstein et al., arXiv, 2023
Module 7: Emotion and Embodiment
Lecture Session: Dec. 3, 13:00-15:00, Guest Lecturers: Miriah Meyer and Derya Akbaba
Discussion Session: Dec. 10, 14:00-16:00 [slides]
Required Reading
- Data Feminism, Catherine D'Ignazio and Lauren F. Klein (ch. 3)
- Visualization Rhetoric: Framing Effects in Narrative Visualization, Jessica Hullman et al., IEEE Trans. Vis. Comput. Graph., 2011
- Entanglements for Visualization: Changing Research Outcomes through Feminist Theory, Derya Akbaba et al., IEEE Trans. Vis. Comput. Graph., 2024
- Data Hunches: Incorporating Personal Knowledge into Visualizations, Haihan Lin et al., IEEE Trans. Vis. Comput. Graph., 2022
- A Framework for Externalizing Implicit Error Using Visualization, Nina McCurdy et al., IEEE Trans. Vis. Comput. Graph., 2018
- Feminist Data Visualization, Catherine D'Ignazio and Lauren F. Klein, IEEE VIS4DH, 2016
- Feminist HCI: Taking Stock and Outlining an Agenda for Design, Shaowen Bardzell, SIGCHI, 2010
- Troubling Collaboration: Matters of Care for Visualization Design Study, Derya Akbaba et al., ACM CHI, 2023
- The Work That Visualisation Conventions Do, Helen Kennedy et al., Information, Communication and Society, 2016
- Iceberg Sensemaking: A Process Model for Critical Data Analysis, Charles Berret et al., IEEE Trans. Vis. Comput. Graph., 2024
Optional Reading
- Speaking from the Heart: Gender and the Social Meaning of Emotion, Stephanie A. Shields
- Design Justice, Sasha Costanza-Chock
- Dear Data, Stefanie Posavec, Giorgia Lupi
- Situated Knowledges: The Science Question in Feminism, Donna Haraway, 2013
- Emotional Data Visualization: Periscopic's "U.S. Gun Deaths" and the Challenge of Uncertainty [link]
- Discursive Patinas: Anchoring Discussions in Data Visualizations, Tobias Kauer et al., IEEE Trans. Vis. Comput. Graph., 2024
- When the Body Became Data: Historical Data Cultures and Anatomical Illustration, Michael Correll et al., ACM CHI, 2024
- Data is Personal: Attitudes and Perceptions of Data Visualization in Rural Pennsylvania, Evan M. Peck et al., ACM CHI, 2019
- Viral Visualizations: How Coronavirus Skeptics Use Orthodox Data Practices to Promote Unorthodox Science Online, Crystal Lee et al., ACM CHI, 2021
- The Power of Absence: Thinking with Archival Theory in Algorithmic Design, Jihan Sherman et al., ACM DIS, 2024
- Disclosure as a Critical-Feminist Design Practice for Web-based Data Stories, Hannah Schwan et al., First Monday, 2022
- Cultural Feminism versus Post-Structuralism: The Identity Crisis in Feminist Theory, Linda Alcoff, Signs: Journal of Women in Culture and Society, 1988