
Whose Truth?

Power, Labor, and the Production of


Ground-Truth Data

vorgelegt von
M.A. Sozialwissenschaften
María de los Milagros Miceli
ORCID: 0000-0003-0585-3072

an der Fakultät IV - Elektrotechnik und Informatik


der Technischen Universität Berlin
zur Erlangung des akademischen Grades
Doktorin der Ingenieurwissenschaften
-Dr.-Ing.-
genehmigte Dissertation

Promotionsausschuss:
Vorsitzender: Prof. Dr. Uwe Nestmann
Gutachterin: Prof. Dr. Bettina Berendt
Gutachter: Prof. Dr. Antonio Casilli
Gutachterin: Dr. Alex Hanna
Tag der wissenschaftlichen Aussprache: 29.08.2022

Berlin 2023
Zusammenfassung

Um die unersättliche Nachfrage nach mehr, günstigeren und zunehmend differenzierteren Daten
für die Machine-Learning (ML)-Industrie zu befriedigen, werden Aufgaben wie Datenerhebung,
-aufbereitung und -annotation an spezialisierte Unternehmen und Plattformen ausgelagert. Die
Datenarbeiter*innen, die diese Aufgaben erledigen, sind vom Rest der ML-Produktionskette
getrennt. Sie arbeiten unter prekären Bedingungen und werden stark überwacht. Die vorliegende
Dissertation untersucht Unternehmen, in denen Ground-Truth-Daten produziert werden.
Ground-Truth-Daten liefern die Variable, die zum Trainieren und Validieren der meisten
überwachten ML-Modelle verwendet wird. Basierend auf Feldforschung bei zwei Unternehmen
in Argentinien und Bulgarien, Interviews mit Datenarbeiter*innen, Manager*innen und
ML-Ingenieur*innen sowie einem mehrjährigen partizipatorischen Designprozess verortet diese
Dissertation die Datenproduktion in spezifischen Umfeldern, die durch besondere Marktanforderungen, lokale Kontexte und Arbeitskonstellationen geprägt sind. Sie erweitert bisherige Forschung im Bereich der Datenerstellung und des Crowdsourcings, indem
sie die wirtschaftlichen Imperative in ML-Lieferketten beschreibt. Dabei wird argumentiert,
dass Arbeit ein grundlegender in ML-Ethikdiskurse zu integrierender Aspekt ist. Die Ergebnisse
zeigen, dass Ground-Truth-Daten das Produkt subjektiver und asymmetrischer sozialer
und arbeitsbezogener Beziehungen sind. Enge Arbeitsanweisungen und -tools, prekarisierte
Arbeitsbedingungen und lokale, von Wirtschaftskrisen geprägte Kontexte sorgen dafür, dass
die Datenarbeiter*innen den Manager*innen und Kund*innen gegenüber gehorsam bleiben. In solchen Konstellationen haben die Kund*innen die Macht, den Daten ihre bevorzugten „Wahrheitswerte” aufzuerlegen, solange sie die finanziellen Mittel haben, die Arbeiter*innen
zu bezahlen, die diese Auferlegung ausführen. Durch solche Produktionsprozesse werden
den Daten naturalisierte, aber gleichzeitig willkürliche Formen des Wissens eingeschrieben.
Dokumentationspraktiken haben großes Potential, in Daten eingebettete „Wahrheiten” sichtbar
und anfechtbar zu machen. Die kollaborative Dokumentation von Datenproduktionsprozessen kann Momente des Dissenses bewahren, Feedback-Schleifen ermöglichen und den Datenarbeiter*innen eine Stimme geben. Diese Dissertation stellt Überlegungen für das Dokumentationsdesign vor, die es den Datenarbeiter*innen ermöglichen, in die Gestaltung
von Arbeitsanweisungen, in die durch ihre Arbeit produzierten Daten und letztlich in die
beteiligten Produktionsprozesse einzugreifen. Die Verbesserung der materiellen Bedingungen
in der Datenarbeit, die Ermächtigung der Arbeiter*innen und die Betrachtung ihrer Arbeit als
mächtiges Werkzeug zur Produktion besserer Daten sowie die detaillierte Dokumentation der
Datenproduktionsprozesse sind wesentliche Schritte, um Reflexions-, Diskussions- und Auditing-
Räume zu ermöglichen, die dazu beitragen können, wichtige soziale und ethische Fragen im
Zusammenhang mit ML-Technologien zu klären.
Abstract

To satisfy the voracious demand for more, cheaper, and increasingly differentiated data for
machine learning (ML), tasks such as data collection, curation, and annotation are outsourced
through specialized firms and platforms. The data workers who perform these tasks are kept
apart from the rest of the ML production chain. They work under precarious conditions and are
subject to continuous surveillance. This dissertation focuses on business process outsourcing
companies (BPOs) where ground-truth data is produced. Ground-truth data provides the
variables used to train and validate most supervised ML models. Through
fieldwork at two BPOs located in Argentina and Bulgaria, interviews with data workers,
managers, and ML practitioners, as well as a longitudinal participatory design engagement
with workers at both organizations, this dissertation situates data production in specific
settings shaped by particular market demands, local contexts, and labor constellations. It
expands previous research in data creation and crowdsourcing by discussing the economic
imperatives and labor relationships that shape ML supply chains and arguing that labor is
a fundamental aspect to be integrated into ML ethics discourses. The findings show that
ground-truth data is the product of subjective and asymmetrical social and labor relationships.
Narrow instructions and work interfaces, precarized labor conditions, and local contexts shaped
by economic crises ensure that data workers remain obedient to managers and clients. In such
constellations, clients have the power to impose their preferred “truth values” on data as long
as they have the financial means to pay workers who execute that imposition. Naturalized
yet arbitrary forms of knowledge are inscribed in data through such production processes.
This dissertation argues that documentation practices are key for making naturalized “truths”
encoded in data visible and contestable. The collaborative documentation of data production
processes can preserve moments of dissent, enable feedback loops, and center workers’ voices.
The findings present a series of considerations for designing documentation frameworks that
allow data workers to intervene in the shaping of task instructions, the data produced through
their labor, and, ultimately, the production processes involved. Improving material conditions
in data work, empowering workers, recognizing their labor as a powerful tool to produce better
data, and documenting data production processes in detail are essential steps to allow for
spaces of reflection, deliberation, and audit that contribute to addressing important social and
ethical questions surrounding ML technologies.
Para Marc, Anna y Bruno, que siempre están.
Para mi papá, donde quiera que esté.
Acknowledgements

First and foremost, I would like to thank the data laborers who shared the details of their
work and their often very difficult experiences with me. My research and this dissertation
would not exist without you. You have my eternal appreciation and solidarity.
Throughout my doctoral work, I received support and encouragement from my committee,
Professor Bettina Berendt, Professor Antonio Casilli, and Dr. Alex Hanna. Their valuable
expertise provided me with the tools and confidence to define my research direction and
complete my Ph.D.
I am forever indebted to Dr. Alex Hanna for advising my Ph.D. project and believing in
me and my work since the very beginning. Alex, thank you for your continuous trust and
encouragement, for the many hours of undivided attention despite your busy schedule, and for
your clear guidance. To me and many others, you are an invaluable leader and role model.
You have taught me much about research, but, most importantly, you have shown me the kind
of mentor I want to be to my future students. I will try my best to pay it forward.
My deepest gratitude goes to Professor Antonio Casilli whose generosity and commitment
allowed me to complete this project. Antonio, having your support through challenging times
has been a blessing. Thanks for inviting me to be a part of DIPLab and other amazing projects.
I am indebted to you for never letting me forget that I am a social scientist. I hope this
dissertation makes you proud and look forward to more collaborations in the future.
I would like to express my most sincere gratitude to Professor Bettina Berendt who took
a chance on my work across disciplinary boundaries. Her generosity allowed me to officially
become a Ph.D. candidate in computer science and have my work fully funded. Bettina, thank
you so much for giving me an opportunity to leave my comfort zone by taking on this challenge.
I am indebted to my colleagues at the Weizenbaum Institute, whose collective solidarity
made the funding of my research possible. Thanks to Stefan Ulrich, Diana Serbanescu, Andrea
Hamm, and Jacob Kröger. Special thanks to Martin Schuessler who mentored and guided me
through the HCI realms and helped me navigate and translate across disciplines. My deepest
appreciation to Tianling Yang who contributed to this research with invaluable assistance.
Ling, I hope to accompany you in your Ph.D. journey as you accompanied me in mine.
I would also like to express my appreciation to the staff at TU Berlin who supported my
work: Professor Uwe Nestmann who generously serves as Committee Chair, Jana Peich, for
her patience answering all of my questions, and Evelyn Adams, for providing assistance with
administrative affairs, making sure I did not overwork myself, and looking out for me.
I want to acknowledge my colleagues at the DAIR Institute, Raesetje Sefala, Meron
Estefanos, Dylan Baker, Adrienne Williams, and the great Dr. Timnit Gebru, who keep on
teaching me that another way of doing research is possible.
Thanks to the many collaborators who accompanied me on this journey, especially the
two wonderful Latin American researchers I have the pleasure of calling my friends: Adriana
Alvarado and Julian Posada. You have been the best source of inspiration, encouragement,
and ideas. I consider myself lucky to have had the privilege of working with you.
To my beautiful family and friends, Juan, Flor, Salvador, Libertad, Mariana, Rocío, and
my twelve nieces and nephews. To my mother, my brother Leo, and my sister Marcela, who
never stopped cheering for me. To my brother José Luis whose love shaped me and whose
light keeps on guiding me. To my father who lived to see and celebrate the submission of
my last paper and to share my appearances in the press with his butcher, the neighbors, and
the newspaper vendor even if neither he nor they could understand English. Dad, you never
stopped letting me know how proud you were of me. I promise I will relax and enjoy life more
now. I miss you so much.
Finally, it is hard to find words to express the immense gratitude I feel towards my children,
Anna and Bruno, and my husband, Marc. You have given me much more love than I could
ever possibly deserve, even in times when I was not my best self. Thanks for the laughs, the
drawings, the hugs, and the words of encouragement. Above all, thanks for your immense
patience. Marc, I will never forget how you kept everything else running while I was busy
doing this. Bruno, Anna, and Marc, I love you more than you will ever know. Thanks for
being my home.

Table of Contents

Title Page i

Zusammenfassung iii

Abstract v

List of Figures xv

List of Tables xvii

List of Papers Resulting from my Doctoral Work xix

List of Abbreviations xxiii

1 Introduction 1
1.1 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Defining Key Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Human-Based Computation . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Research Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1.1 Grounded Theory Methodology . . . . . . . . . . . . . . . . . 11
1.3.1.2 Dispositif Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1.3 Participatory Design . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2.1 Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.2.2 Qualitative Interviews . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.2.3 Document Collection . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.2.4 Co-Design Workshops . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3.1 GTM Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.3.2 Critical Discourse Analysis . . . . . . . . . . . . . . . . . . . . 16
1.3.3.3 Reflexive Thematic Analysis . . . . . . . . . . . . . . . . . . . 17
1.3.4 Research Ethics and Positionality . . . . . . . . . . . . . . . . . . . . . . 18


2 Data Production and Power 21


2.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
P1 Paper 1: Studying Up Machine Learning Data: Why Talk About Bias When
We Mean Power? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
P1–1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
P1–2 The Limits of Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
P1–2.1 Data is Always Biased . . . . . . . . . . . . . . . . . . . . . . . 26
P1–2.2 “Mitigating Worker Biases” Should Not Be the Goal . . . . . . 27
P1–2.3 Data Documentation Beyond Bias Mitigation . . . . . . . . . . 29
P1–3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
P1–3.1 How and Why Study Up Data? . . . . . . . . . . . . . . . . . 32
P1–4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3 Meaning Imposition and Epistemic Authority in Data Annotation 37


3.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
P2 Paper 2: Between Subjectivity and Imposition: Power Dynamics in Data
Annotation for Computer Vision . . . . . . . . . . . . . . . . . . . . . . . . . . 39
P2–1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
P2–2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
P2–2.1 Data Work as Human Activity . . . . . . . . . . . . . . . . . . 40
P2–2.2 Data, Classification, Power . . . . . . . . . . . . . . . . . . . . 42
P2–3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
P2–3.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 44
P2–3.2 Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
P2–3.3 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
P2–4 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
P2–4.1 Project 1: Drawing Polygons . . . . . . . . . . . . . . . . . . . 48
P2–4.2 Project 2: Building Categories . . . . . . . . . . . . . . . . . . 50
P2–4.3 Project 3: Classifying Faces . . . . . . . . . . . . . . . . . . . . 51
P2–4.4 Salient Observations . . . . . . . . . . . . . . . . . . . . . . . . 53
P2–5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
P2–5.1 Implications for Practitioners . . . . . . . . . . . . . 56
P2–5.2 Implications for (CSCW) Researchers . . . . . . . . . . . . . . 58
P2–5.3 Limitations and Future Work . . . . . . . . . . . . . . . . . . . 58
P2–6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
P2–7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4 Precarization, Alienation, and Control in Data Work 65


4.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
P3 Paper 3: The Data-Production Dispositif . . . . . . . . . . . . . . . . . . . . . 69
P3–1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
P3–2 Defining Key Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
P3–2.1 Dispositif and Other Foucauldian Concepts . . . . . . . . . . . 71
P3–2.2 Data Work for Machine Learning . . . . . . . . . . . . . . . . . 72


P3–3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
P3–3.1 Dispositif Analysis of Data Production . . . . . . . . . . . . . 73
P3–3.2 Researcher Positionality . . . . . . . . . . . . . . . . . . . . . . 74
P3–3.3 Data Collection and Analysis . . . . . . . . . . . . . . . . . . . 75
P3–4 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
P3–4.1 Different Tasks, Different Instructions . . . . . . . . . . . . . . 80
P3–4.2 Linguistically-Performed Elements . . . . . . . . . . . . . . . . 82
P3–4.3 Non-Linguistic Practices and Social Context . . . . . . . . . . 87
P3–4.4 Dispositif’s Materializations . . . . . . . . . . . . . . . . . . . 92
P3–5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
P3–5.1 Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
P3–5.2 Limitations and Future Research . . . . . . . . . . . . . . . . . 100
P3–6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
P3–7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Co-Designing Documentation for Reflexivity and Participation 107


5.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
P4 Documenting Computer Vision Datasets. An Invitation to Reflexive Data
Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
P4–1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
P4–2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
P4–2.1 Documentation of Datasets and Models . . . . . . . . . . . . . 112
P4–2.2 The Notion of Reflexivity . . . . . . . . . . . . . . . . . . . . . 113
P4–3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
P4–3.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 114
P4–3.2 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
P4–4 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
P4–4.1 Actors and Collaboration . . . . . . . . . . . . . . . . . . . . . 115
P4–4.2 Documentation Purpose . . . . . . . . . . . . . . . . . . . . . . 116
P4–4.3 Documentation as Burden . . . . . . . . . . . . . . . . . . . . 117
P4–4.4 Intelligibility of Documentation . . . . . . . . . . . . . . . . . 117
P4–5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
P4–5.1 Why Reflexivity? . . . . . . . . . . . . . . . . . . . . . . . . . 118
P4–5.2 Why Document? . . . . . . . . . . . . . . . . . . . . . . . . . . 119
P4–6 Limitations and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 120
P4–7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
P4–8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
P5 Documenting Data Production Processes: A Participatory Approach for Data
Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
P5–1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
P5–2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
P5–2.1 The Documentation of ML Datasets and Data Pipelines . . . . 125
P5–2.2 Boundary Objects . . . . . . . . . . . . . . . . . . . . . . . . . 126
P5–3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127


P5–3.1 Participatory Design . . . . . . . . . . . . . . . . . . . . . . . . 128


P5–3.2 The Participating Organizations . . . . . . . . . . . . . . . . . 128
P5–3.3 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 129
P5–3.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
P5–3.5 Positionality Statement . . . . . . . . . . . . . . . . . . . . . . 132
P5–4 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
P5–4.1 Case 1: Alamo . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
P5–4.2 Case 2: Action Data . . . . . . . . . . . . . . . . . . . . . . . . 138
P5–4.3 Salient Considerations and Summary of Findings . . . . . . . . 143
P5–5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
P5–5.1 Design Implications: Documentation for and with Data Workers147
P5–5.2 Research Implications: Challenges of Participatory Research . 148
P5–6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
P5–7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.2 Dynamic Work Sheet: A Prototype . . . . . . . . . . . . . . . . . . . . . . . . . 157

6 Reflection and Conclusion 163


6.1 Summary of Findings and Contributions . . . . . . . . . . . . . . . . . . . . . . 163
6.2 Knowledge Transfer and Science Dissemination . . . . . . . . . . . . . . . . . . 167
6.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.4 Reflections on the Limitations of my Research . . . . . . . . . . . . . . . . . . . 169
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

References 173

Appendix A Interview Guides 189


A.1 In-depth Interviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.1.1 Data Workers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.1.2 Managers / Founders at S1 and S2 . . . . . . . . . . . . . . . . . . . . . 191
A.2 Expert Interviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
A.2.1 Other BPO Managers/Founders . . . . . . . . . . . . . . . . . . . . . . 192
A.2.2 ML Practitioners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
A.3 Semi-structured Interviews on Documentation Practices . . . . . . . . . . . . . 195
A.3.1 BPO managers and ML practitioners . . . . . . . . . . . . . . . . . . . . 195

Appendix B Workshop Facilitation Templates 197


B.1 Workshops with S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
B.2 Workshops with S2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

List of Figures

1.1 Ground-truth data production . . . . . . . . . . . . . . . . . . . . . . . . . . . 3


1.2 Worldwide distribution of research sites, participants, and data. . . . . . . . . . 11
1.3 Fieldwork jottings from my field book . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Graphic recording of the workshops with S2 . . . . . . . . . . . . . . . . . . . . 16
1.5 My whiteboard with an incipient coding system after fieldwork at S1 and S2 . 17

3.1 Paper 2 – Fig. 1: Layered structure of actors that participate in data annotation
processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Paper 2 – Fig. 2: Paradigm model resulting from the coding process. . . . . . . 56

4.1 Paper 3 – Fig. 1: Commercial data annotation tool. . . . . . . . . . . . . . . . 94


4.2 Paper 3 – Fig. 2: Age-based classification of images on Workerhub’s interface. . 95
4.3 Paper 3 – Fig. 3: The three components of the data-production dispositif . . . . 97

5.1 Paper 5 – Fig. 1: Timeline of the iterative phases comprised in the participatory
design engagement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2 Paper 5 – Fig. 2: Ground rules for participation in the co-design sessions. . . . 130
5.3 Paper 5 – Fig. 3: The wiki-based documentation prototype created in S1 to
brief and train workers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.4 Paper 5 – Fig. 4: Workshop activities aiming to understand roles, workflows,
and collaboration within S1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.5 Paper 5 – Fig. 5: Documentation templates prototyped at S2. . . . . . . . . . . 139
5.6 Paper 5 – Fig. 6: Stakeholder map showing the three groups participating in
the second workshop session with S2. . . . . . . . . . . . . . . . . . . . . . . . . 141
5.7 Paper 5 – Fig. 7: Roleplay activity in breakout groups with S2. . . . . . . . . . 142
5.8 Paper 5 – Fig. 8: Co-design exercise for data workers to deconstruct and re-
imagine documentation practices. . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.9 Participatory and circular documentation process . . . . . . . . . . . . . . . . . 157
5.10 The Dynamic Work Sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.11 Dynamic Work Sheet: Areas of content and access. . . . . . . . . . . . . . . . . 159
5.12 Dynamic Work Sheet: Interface Overview. . . . . . . . . . . . . . . . . . . . . . 159
5.13 Dynamic Work Sheet: Main project space. . . . . . . . . . . . . . . . . . . . . . 160
5.14 Dynamic Work Sheet: Main menu . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.15 Dynamic Work Sheet: Project information. . . . . . . . . . . . . . . . . . . . . 160
5.16 Dynamic Work Sheet: Team composition. . . . . . . . . . . . . . . . . . . . . . 161


5.17 Dynamic Work Sheet: Private area. . . . . . . . . . . . . . . . . . . . . . . . . 161


5.18 Dynamic Work Sheet: Collaboration Tools. . . . . . . . . . . . . . . . . . . . . 161
5.19 Dynamic Work Sheet: Dynamic project space . . . . . . . . . . . . . . . . . . . 162
5.20 Dynamic Work Sheet: Instant messenger . . . . . . . . . . . . . . . . . . . . . . 162

6.1 Zine created by participants at the workshop “Crossing Data” . . . . . . . . . . 167


6.2 Interview setting in S2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

List of Tables

3.1 Paper 2 – Table 1: Overview of participants and research sites. . . . . . . . . . 45


3.2 Paper 2 – Table 2: Table of core phenomenon, axial categories, open codes, and
explanatory memos. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1 Paper 3 – Table 1: Fieldwork sites: Studied data work BPO and platforms. . . 76
4.2 Paper 3 – Table 2: Evolution of codes throughout the three phases of discourse
analysis as applied to the instruction documents. . . . . . . . . . . . . . . . . . 77
4.3 Paper 3 – Table 3: Overview of interview partners and interview characteristics. 79
4.4 Paper 3 – Table 4: Types of tasks in outsourced data work. . . . . . . . . . . . 80

5.1 Paper 4 – Table 1: Summary of descriptive dimensions in previous data


documentation frameworks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 Paper 5 – Table 1: Overview of Participants and Research Methods. . . . . . . 129
5.3 Paper 5 – Table 2: Data Analysis: Summary of codes and themes. . . . . . . . 132

6.1 Overview of papers, research questions, methods, and findings. . . . . . . . . . 164

List of Papers Resulting from my
Doctoral Work

Papers Included in this Dissertation


Paper 1: Milagros Miceli, Julian Posada, and Tianling Yang. 2022. Studying Up Machine Learning Data:
Why Talk About Bias When We Mean Power? [accepted version]. The definitive Version of Record was
published in Proceedings of the ACM on Human-Computer Interaction 6, GROUP, Article 34 (January 2022),
14 pages. https://dl.acm.org/doi/10.1145/3492853

Paper 2: Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition:
Power Dynamics in Data Annotation for Computer Vision. Proc. ACM Hum.-Comput. Interact. 4, CSCW2,
Article 115 (October 2020), 25 pages. [version of record, open access] https://doi.org/10.1145/3415186
[Best Paper Award]

Paper 3: Milagros Miceli and Julian Posada. 2022. The Data-Production Dispositif. Proc. ACM
Hum.-Comput. Interact. 6, CSCW2, Article 460 (November 2022), 37 pages. [version of record, open
access] https://doi.org/10.1145/3555561 [Impact Award, Honorable Mention to Best Paper, Methods
Recognition]

Paper 4: Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex
Hanna. 2021. Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In
Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21).
Association for Computing Machinery, New York, NY, USA, 161–172. [version of record, open access]
https://doi.org/10.1145/3442188.3445880

Paper 5: Milagros Miceli, Tianling Yang, Adriana Alvarado Garcia, Julian Posada, Sonja Mei Wang, Marc
Pohl, and Alex Hanna. 2022. Documenting Data Production Processes: A Participatory Approach for Data
Work. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 510 (November 2022), 34 pages. [version of
record, open access] https://doi.org/10.1145/3555623

Other Refereed Papers


Kristen M. Scott, Sonja Mei Wang, Milagros Miceli, Pieter Delobelle, Karolina Sztandar-Sztanderska, and
Bettina Berendt. 2022. Algorithmic Tools in Public Employment Services: Towards a Jobseeker-Centric
Perspective. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association
for Computing Machinery, New York, NY, USA, 2138–2148. https://doi.org/10.1145/3531146.3534631
[Best Paper Award]

Kathleen Pine, Claus Bossen, Naja Holten Møller, Milagros Miceli, Alex Jiahong Lu, Yunan Chen, Leah
Horgan, Zhaoyuan Su, Gina Neff, and Melissa Mazmanian. 2022. Investigating Data Work Across Domains:
New Perspectives on the Work of Creating Data. In Extended Abstracts of the 2022 CHI Conference on
Human Factors in Computing Systems (CHI EA ’22). Association for Computing Machinery, New York, NY,
USA, Article 87, 1–6. https://doi.org/10.1145/3491101.3503724


Adriana Alvarado Garcia, Ivana Feldfeber, Milagros Miceli, Saide Mobayed, and Helena Suárez Val. 2022.
Crossing Data: Building Bridges with Activist and Academic Practices from and for Latin America (Cruzar
datos: Tendiendo Puentes con Prácticas Activistas y Académicas desde y para América Latina). In Extended
Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA ’22). Association for
Computing Machinery, New York, NY, USA, Article 82, 1–6. https://doi.org/10.1145/3491101.3505222

Gunay Kazimzade and Milagros Miceli. 2020. Biased Priorities, Biased Outcomes: Three Recommendations for
Ethics-oriented Data Annotation Practices. Proceedings of the AAAI/ACM Conference on AI, Ethics, and
Society. Association for Computing Machinery, New York, NY, USA, 71.
https://doi.org/10.1145/3375627.3375809

Book Chapters
Julian Posada, Gemma Newlands, and Milagros Miceli. 2023. Labor, Automation, and Human-Machine
Communication. In A. L. Guzman, R. McEwan, and S. Jones (eds.), SAGE Handbook of Human-Machine
Communication. London: SAGE.

Workshop Papers and Pre-Prints


Jacob Leon Kröger, Milagros Miceli, Florian Müller. 2021. How Data Can Be Used Against People: A
Classification of Personal Data Misuses. Social Science Research Network.

Milagros Miceli, Adriana Alvarado Garcia, Julian Posada, Tianling Yang. 2021. Co-Designing a Framework to
Document Machine Learning Data Production. Workshop: The Global Labours of AI and Data Intensive
Systems. ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW’21)

Julian Posada and Milagros Miceli. 2021. Power/Knowledge in Data Work for ML. Workshop: The Global
Labours of AI and Data Intensive Systems. ACM Conference on Computer-Supported Cooperative Work and
Social Computing (CSCW’21)

Pieter Delobelle, Kristen Scott, Sonja Mei Wang, Milagros Miceli, David Hartmann, Tianling Yang, Helena
Murasso, Karolina Sztandar-Sztanderska, Bettina Berendt. 2021. Time to Question if We Should: Data-Driven
and Algorithmic Tools in Public Employment Services. International Workshop on Fair, Effective and
Sustainable Talent Management Using Data Science. FEAST’21

Milagros Miceli and Julian Posada. 2021. Wisdom for the Crowd: Discursive Power in Annotation Instructions
for Computer Vision. CVPR 2021 Workshop: Beyond Fairness. Towards a Just, Equitable, and Accountable
Computer Vision.

Milagros Miceli, Martin Schuessler, Tianling Yang. 2020. Towards Reflexive Documentation for ML Datasets.
Beyond Checklist Approaches to Ethics in Design Workshop. ACM Conference on Computer-Supported
Cooperative Work and Social Computing (CSCW’20)

Milagros Miceli, Martin Schuessler, Tianling Yang. 2020. Work Practices at the Intersection of Data Processing
and ML Engineering. Interrogating Data Science Workshop. ACM Conference on Computer-Supported
Cooperative Work and Social Computing (CSCW’20)

Milagros Miceli. 2020. Making Data, Making Reality. Power, Visibility, and the Production of Datasets for ML.
ACM Celebration of Women in Computing (WomENcourage ’20)

Milagros Miceli, Martin Schuessler, Tianling Yang. 2020. Sensemaking or Imposition? Power Dynamics in
Practices of Data Annotation. Workshop on Worker-Centered Design: Expanding HCI Methods for Supporting
Labor. ACM Conference on Human Factors in Computing Systems (CHI’20)

Milagros Miceli and Gunay Kazimzade. 2020. Profit, Fairness, or Both? Setting Priorities in Data Annotation.
Workshop on Fair & Responsible AI. ACM Conference on Human Factors in Computing Systems (CHI’20)

Milagros Miceli. 2019. AI’s Symbolic Power: Classification in the Age of Automation. Workshop on
Human-Centered Study of Data Science Work Practices. ACM Conference on Human Factors in Computing
Systems (CHI’19)

List of Abbreviations

Research Sites

S1      A BPO that produces ML datasets located in Buenos Aires, Argentina
S2      A BPO that produces ML datasets located in Sofia, Bulgaria
S3      Management employees at five other data-work BPOs in India, Kenya, Iraq, Syria, and the USA
S4      Instruction documents for data-work tasks carried out in S1 and in crowdsourcing platforms operating in Venezuela
S5      ML engineers in their role as data work requesters at four technology start-ups located in Spain, Bulgaria, Germany, Denmark, and the USA

General

AI      “Artificial Intelligence”
API     Application Programming Interface
BPO     Business Process Outsourcing Company
CCTV    Closed Circuit Television
CDA     Critical Discourse Analysis
CS      Computer Science
CSCW    Computer-Supported Cooperative Work
FATE    Fairness, Accountability, Transparency, and Ethics
GDPR    General Data Protection Regulation
GTM     Grounded Theory Methodology
HCDS    Human-Centered Data Science
HCI     Human-Computer Interaction
HCML    Human-Centered Machine Learning
IT      Information Technology
ML      Machine Learning
PDM     Participatory Design Methodologies
QA      Quality Assurance
RQ      Research Question
RTA     Reflexive Thematic Analysis
SoW     Scope of Work
STS     Science and Technology Studies
USA     United States of America
1 Introduction
Over the past decade, the quest for addressing injustice and harm produced or enhanced by
machine learning (ML) systems has prompted the development of an area of research, known
as FATE, that emphasizes the issue of bias, and the values of fairness, accountability, and
transparency in mitigating negative impacts of data-driven technologies [1]. Research in this
space has shown that biases can penetrate ML systems at every layer of the pipeline, including
data, design, model, and application [2]. Special attention has been paid to the quality of data,
arguing that representativeness issues in datasets can lead to discriminatory or exclusionary
outcomes by ML systems [2, 3, 4, 5, 6, 7, 8, 9, 10]. Moreover, significant academic focus has
been directed toward investigating the effects of individual subjectivities and prejudices in
data processing and preparation tasks, such as annotation, that are carried out by human workers
[11, 12, 13, 14, 15]. More recently, research has advocated for the documentation of data used
to train and validate machine learning models. Work in this area has proposed frameworks to
document and disclose datasets’ origins, purpose, and characteristics to increase transparency,
help understand models’ functioning, and anticipate ethical issues comprised in data [16, 17,
18, 19, 20].
This dissertation expands previous research by applying a relational view to the economic
imperatives that drive machine learning, i.e., arguing that harm and injustice are not technical
issues that occur in a vacuum but are fundamentally entangled with normalized asymmetrical
relationships within and among the organizations where datasets and models are produced.
My research interest is how ML data comes to be. The focus of my work is on data workers
and labor conditions in ML supply chains where ground-truth data is produced. In most forms
of (supervised) ML, the “dependent variable” used to train and validate models is called ground
truth [21]. Ground-truth data is created in a process that includes the ascription of specific
meanings to data through labeling or annotation. These processes are typically based on the
assumption that for each data point and annotation instance, there is a single right answer [22].
This way, ground-truth labels synthesize the often infinitely more complex realities comprised
in each data point to make them “readable” in computational terms. In this dissertation, I
explore how power differentials shape the process in which arbitrary truth values are encoded


and perpetuated in data. My work is thus about truth, power, and the relation that connects
both concepts in data production processes.
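
For readers less familiar with supervised ML, the following minimal sketch illustrates the mechanical role that ground truth plays in training and validation. It is my own illustration and not code from any of the organizations I studied: the feature values, the labels, and the use of scikit-learn are assumptions made purely for demonstration.

```python
# Illustrative sketch only: ground-truth labels produced by data workers act as
# the dependent variable y that a supervised model is fitted to and evaluated against.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical feature vectors (e.g., extracted from images) and the labels
# annotators assigned to them following the client's instructions.
X_train = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]
y_train = ["pedestrian", "vehicle", "pedestrian", "vehicle"]  # ground truth

X_val = [[0.15, 0.85], [0.85, 0.15]]
y_val = ["pedestrian", "vehicle"]  # validation ground truth, produced the same way

model = LogisticRegression().fit(X_train, y_train)

# "Accuracy" is always computed against the annotated labels: whatever truth
# values were fixed during annotation become the benchmark the model must match.
print(accuracy_score(y_val, model.predict(X_val)))
```

The sketch shows the point of this dissertation in miniature: the code has no notion of how the labels in y_train and y_val came to be, who decided on the label set, or under which conditions that labeling work was performed.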
Through ethnographic fieldwork and participatory design methodologies, I situate such
production processes in specific contexts with specific ways of doing things. I engage in
long-term relationships with research participants and observe data production at two main
research sites: two business process outsourcing firms (BPOs), one located in Argentina and the
other in Bulgaria. These firms specialize in providing data-related services to ML companies
and research organizations. Such data-related tasks are performed by data workers. Data
work involves the collection, curation, and cleaning of data, labeling and keywording, and, in
the case of image data, can also involve semantic segmentation (i.e., marking and separating
the different objects contained in a picture) [23, 24, 25]. Outsourced data workers perform
these tasks through digital labor platforms (crowdsourcing) or business process outsourcing
companies (BPOs). In both contexts, data work is characterized by low- or piece-wages,
limited-to-no labor protection, and high levels of control and surveillance.
The Bulgarian and Argentine research sites are companies of the impact sourcing type.
Impact sourcing refers to a branch of the outsourcing industry that employs workers from
poor and vulnerable populations to provide information-based services at very competitive
prices. The Argentine organization (hereinafter S1) employs young workers living in the slums
that surround Buenos Aires. The Bulgarian firm (hereinafter S2) works with refugees from
the Middle East living in Sofia, and, through partner organizations, with workers located in
war-ridden zones in Iraq, Afghanistan, and Syria.
This dissertation includes five research papers, each one with a specific set of research
questions and methods (see Table 6.1). The overall research scope is defined by the following
overarching questions:
1. How is ground-truth data for ML produced?

2. What market imperatives, geographical contexts, and labor constellations shape ground-
truth data production?

3. How can those contexts and constellations be made explicit, visible, and contestable?
Following these questions, my doctoral work was structured around the following three, at
times overlapping, phases:

• An exploratory phase (see Chapter 3) that included fieldwork at S1 and S2. Here,
I take the firms as units of analysis and investigate how the service relationships that
connect these organizations with their clients shape the ways in which truth values are
ascribed to data through the pre-defined categories used for data annotation.

• A deepening phase (see Chapter 4) in which I explore the data-production dispositif, i.e.,
the entanglement of discourses, task instructions, work interfaces, and labor conditions
that inform data production in outsourcing facilities and platforms.

• A design phase (see Chapter 5) in which I regard data documentation as a tool sensitive
to power, and engage in a longitudinal participatory design process with workers at
S1 and S2 to co-design documentation principles that capture the intricacies of data
production processes and elevate the voices of data workers.


Figure 1.1: Ground-truth data is produced through classification and labeling. Arbitrary ways of
sorting the social world become so deeply naturalized that they are accepted as unquestionable.
Illustration created by Marc Pohl for this dissertation.

In the next sub-sections, I provide details on these three phases, including what data
collection and analysis methods they entail, and how they were implemented in the three
studies that make up this dissertation.
The papers included in this dissertation describe how data work for ML is outsourced to
be carried out in BPOs, what conditions, structures, and standards shape the tasks conducted
by data workers, and how service relationships between BPOs and clients as well as labor
relationships within BPOs translate into meaning imposition when it comes to interpreting data,
assigning meaning through labels, and producing ML datasets. I argue that the documentation
of data production processes can help make such processes visible and contestable.
Due to my academic background, the studies are grounded in social science methods.
My work’s contributions, however, are oriented toward the CS-prevalent fields of Computer-
Supported Cooperative Work And Social Computing (CSCW), Human-Computer Interaction
(HCI), and ML Fairness, Accountability, Transparency and Ethics (FATE). Those contributions
can be summarized as follows:

1. Expanding the field that investigates bias in crowdsourcing by showing how power
asymmetries manifest in ML data work in the form of meaning impositions that encode
pre-defined and arbitrary truth values in datasets and systems (see Chapter 3, Paper 2.)

2. Arguing that labor conditions in data work should be considered a fundamental aspect
of ethical AI and that data workers could be important assets in the quest for better
quality data (see Chapter 4, Paper 3.)

3. Exploring opportunities to promote the reflexivity and empowerment of data workers through
documentation practices and frameworks. By focusing on the documentation of data
production processes (instead of datasets), this dissertation shows that documentation can
enable the participation of data workers in shaping workflows and, ultimately, data (see
Chapter 5, Papers 4 and 5.)

4. Providing a mode of analysis for ML data and data work based on the study of power.
Beyond arguing that power in socio-technical systems should be studied, the papers that
make up this dissertation (especially Paper 1, Paper 2, and 3) show how this can be
achieved.

In sum, my research work addresses the relations that shape data production in specific
(inter-) organizational contexts and aims to make visible power asymmetries that inscribe


particular truth values in datasets. Based on the findings and the contributions of each one
of the papers that make up this dissertation, I will argue that two aspects are key to bringing
about actual change toward dismantling the data-production dispositif as described in Paper
3: changing the material conditions in data work and making normalized discourses encoded
in data explicit and contestable through documentation.

1.1 Thesis Outline


This dissertation includes six chapters and five articles published in venues of the Association
for Computing Machinery (ACM), namely the HCI journal Proceedings of the ACM on Human-
Computer Interaction and the ACM Conference on Fairness, Accountability, and Transparency
(FAccT) (see List of Included Papers, p. xix). In this sub-section, I introduce a brief description
of each chapter, including its focus, method, and main findings.
Chapter 1 introduces the topic and describes the motivation, contributions, and
methodologies underlying my doctoral work. Its aim is to situate my research in the CSCW,
HCI, and FATE fields and make key concepts legible for readers from a broad array of
disciplines.
Chapter 2 situates my research in the study of power and describes the standpoint of my
work. Paper 1, Studying Up Machine Learning Data: Why Talk About Bias When We Mean
Power?, presents a review of previous HCI and CSCW work on the topic of bias and crowdwork,
and argues in favor of broadening the field of inquiry from studying bias to studying power in
ML data, data work, and data documentation. The paper introduces the research agenda that
the subsequent studies included in this dissertation followed.
Chapter 3 focuses on the ML sub-field computer vision and on one specific type of
data-work task, namely, data annotation. Paper 2, Between Subjectivity and Imposition.
Power Dynamics in Data Annotation for Computer Vision, investigates how data annotators
at S1 and S2 label image data, what contexts define the sense-making of images as performed
by data annotators in these facilities, and who in the hierarchical structures that influence
service relationships has the power to decide what labels best define each data point.
Chapter 4 presents a comprehensive study of how data work is carried out in outsourced
facilities. It discusses the specific truth values encoded in task instructions provided by
requesters and carried out by data workers. Paper 3, The Data-Production Dispositif, introduces
an analysis of the labor conditions, social contexts, and artifacts that shape data work in
crowdsourcing platforms in Venezuela and a BPO in Argentina. The study includes the analysis
of 210 task instruction documents, interviews with data workers, managers, and requesters,
and the analysis of several data-work interfaces.
Chapter 5 corresponds with the design phase of my doctoral work. Based on the findings
discussed in Chapters 3 and 4, and taking into account the power dynamics outlined in
Chapter 2, the papers included in this chapter discuss existing documentation frameworks
for ML datasets and present design considerations for the documentation of data production
processes. Paper 4, Documenting Computer Vision Datasets. An Invitation to Reflexive Data
Practices, argues for making specific data production contexts explicit in documentation
and theorizes on the implications of moving the focus away from transparency to aim for


reflexivity in documentation practices. Paper 5, Documenting Data Production Processes. A


Participatory Approach for Data Work, presents a hands-on design inquiry into the constraints
and desiderata of data workers regarding the design and implementation of documentation
that is able to reflect heterogeneous, often distributed, data production processes. In addition,
in Section 5.2, I present an initial prototype of how this dissertation’s findings could be applied
in a documentation framework for data production based on the needs expressed by data
workers.
Chapter 6 presents a closing discussion and reflection on my doctoral work, its findings,
and my role as a researcher co-producing knowledge with and about precarized data workers.
In addition, the chapter provides details on knowledge-transfer activities carried out as part of
my work, including public and media responses.

1.2 Defining Key Concepts


This thesis includes five research papers (see List of Included Papers, p. xix). Information
on the background, motivation, research questions, and related work are provided within the
respective papers and corresponding introductory chapters. Table 6.1 offers an overview of
the papers, research questions, and findings. Additionally, to place the research topics and
contributions in context and make key concepts legible for a wider variety of disciplines, this
section provides details on the theoretical background that supported my research work.

1.2.1 Human-Based Computation


Human-based computation [26, 27, 28] (or human-assisted computation [29, 30, 31, 32, 28] or
distributed thinking [28]) is a computer science approach that refers to the delegation of specific
steps in computational processes to humans. This approach uses differences in abilities and
alternative costs between humans and computer agents to achieve symbiotic human–computer
interaction. Its application in AI development is called human-based artificial intelligence
[33]. Many other terms are used to describe this type of labor: some call it microwork or
clickwork [23, 34] or crowdsourcing [35, 36]. In this dissertation, I use the term data work,
which involves the collection, curation, classification, labeling, and verification of data used to
train and validate ML systems [37, 38].
Shestakofsky [39] observes a form of “human-software complementarity” in the human labor
that supports algorithms and helps adapt these systems to their users. Since AI systems are
continuously produced and reproduced through human actions, there is a blurring of boundaries
between the human and machine elements [40]. Meaning is created in the interaction of humans
and algorithmic models in training, customization, and deployment instances throughout the
ML production processes [41]. Commonly associated with the modeling phase in algorithmic
development, the work of ML engineers and data scientists also encompasses data work in
practices that require collaboration and negotiation [42, 43, 44], design [45, 46], and creativity
[21, 47]. Other workers such as teachers and administrative, clerical, and medical personnel
carry out data work as part of their jobs [48, 49, 50]. In these domains, data-related tasks
and the labor involved are often rendered invisible [48, 51]. Finally, ReCAPTCHA tests are a
common example of how users are forced into unpaid, invisibilized data work [52].


Beyond the above-mentioned domains and examples, “data work” in the context of this
dissertation will exclusively refer to the labor involved in human-based computation that is
outsourced through crowdsourcing platforms and specialized business process outsourcing
(BPO) companies. In platforms, human labor is organized through APIs, task prices, and
software-as-a-service protocols that allow requesters to receive data produced by workers
directly in IT systems [53, 54]. As a result, workers are managed automatically through
algorithms [55, 56] and more subtle means of control, including gamification [57, 58]. In
BPOs, the communication with clients and the briefing of instructions occur through personal
interactions, and workers are managed by human team leaders and managers. Similar to the
platforms, the data produced by BPO workers is fed directly into IT systems determined by
the client. If we consider that a platform is a hybrid structure halfway between a firm and
a multi-sided market, as defined by Casilli and Posada [23] and Poell, Nieborg, and Dijck
[59], the differences between platform and BPO become blurry. Platforms function as shared
infrastructures that facilitate the process of outsourcing [60, 61, 62] and constitute, at the
same time, the materiality of algorithms and interfaces that enables and manages data work
[60, 55, 56]. This form of shared infrastructure is often also used by BPOs to facilitate the
provision of services to clients and the management of tasks and workers. For instance, S2,
one of the BPOs participating in this investigation located in Bulgaria, evolved over the years
toward a re-intermediation model [63] in which third-party organizations in the Middle East
recruit the workers while S2 manages the projects with clients (see Paper 3). This model
allows S2 to access a large and varied workforce that enables the scaling of its business. Such
forms of re-intermediation involve sharing work tools used, for example, for data labeling,
a common payment structure, and supervision mechanisms that span across workers in the
Middle East, managers in Bulgaria, and requesters worldwide. Using a third party to recruit
workers has been documented in the crowdsourcing and gig economy sectors, in which some
platforms recruit workers to do data work for other platforms [25]. Without ignoring the
flexible boundaries between data-work platforms and BPOs, I will, for the sake of specificity,
refer to BPOs as the main research sites in the papers comprised in this dissertation — except
for Paper 3 [38], co-authored with Julian Posada, whose expertise and fieldwork center on
data-work platforms.
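
To make concrete what it means that requesters receive worker-produced data “directly in IT systems,” the sketch below imagines the requester’s side of a generic task API. It is a hypothetical illustration: the base URL, endpoints, fields, and piece rate are invented and do not describe the interface of any platform or BPO discussed in this dissertation.

```python
# Hypothetical requester-side interaction with a data-work platform API.
# Endpoint, fields, token, and prices are invented for illustration.
import requests

API = "https://dataplatform.example.com/v1"        # invented base URL
HEADERS = {"Authorization": "Bearer <requester-token>"}

# 1. The requester posts a task batch: instructions, a fixed label set, a piece rate.
task = {
    "instructions": "Label each image as 'pedestrian' or 'vehicle'.",
    "labels": ["pedestrian", "vehicle"],
    "reward_per_item_usd": 0.01,
    "items": ["https://example.com/img/001.jpg", "https://example.com/img/002.jpg"],
}
batch = requests.post(f"{API}/batches", json=task, headers=HEADERS).json()

# 2. Finished annotations later flow straight back into the requester's systems;
#    the workers appear only behind a batch identifier.
results = requests.get(f"{API}/batches/{batch['id']}/results", headers=HEADERS).json()
for item in results:
    print(item["item_url"], item["label"])
```

Seen from the requester’s terminal, labor is reduced to a batch identifier and a per-item price, which is precisely the kind of invisibilization that the following chapters examine.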
As Paper 1 [37] will discuss, there has been abundant HCI and FATE research on the
“human element” in human-based computation. However, the focus has been primarily set on
human biases, and the proposed solutions were oriented toward the control and standardization
of individual subjectivities, most notably in the case of crowdworkers who are subject to control
and surveillance. The papers comprised in this dissertation will expand that field of inquiry
toward a consideration of market mandates, corporate priorities, and power differentials that
shape data work and the datasets that are produced as a consequence. To start outlining the
stance of this research, the following subsections will explore how power shapes subjects and
subjectivities, and how this manifests in classification practices which are a key activity in
data work.


1.2.2 Classification
In ML, all questions that have a fixed set of answers can be represented as classification
problems. Classification is a modeling approach, that is, a way to represent problems and
thus abstract them. ML systems are fundamentally designed to cluster and classify data.
Consequently, classification constitutes the core of data work for ML, especially in tasks such
as data collection, annotation, and analysis, where workers interpret, select, and label data.
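
A minimal sketch can illustrate what representing a question as a classification problem entails. The label scheme below is invented and deliberately narrow; it is not taken from my fieldwork, but it shows that casting a question in this form means enumerating the admissible answers in advance:

```python
# Illustrative sketch: a question with a fixed set of answers represented as a
# classification problem. The category scheme is invented for illustration and
# is decided upstream, before any annotator sees the data.
FIXED_ANSWERS = ["happy", "sad", "angry", "neutral"]
ANSWER_TO_CLASS = {answer: idx for idx, answer in enumerate(FIXED_ANSWERS)}

def encode(annotation: str) -> int:
    """Turn an annotator's answer into a class index a model can learn from."""
    # Readings that fall outside the pre-defined scheme cannot be represented:
    # there is no room for "none of the above" unless it was foreseen upstream.
    return ANSWER_TO_CLASS[annotation]

print(encode("sad"))  # -> 1
# encode("ambivalent") would raise a KeyError: the scheme has no place for it.
```

The choice of FIXED_ANSWERS is exactly the kind of decision that, as the following paragraphs discuss, carries political weight while presenting itself as a mere technicality.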
Both in the field of algorithmic modeling and in the study of data practices, arbitrary
classifications output by ML systems have been connected to the reproduction of harm and
discrimination [64, 65, 66]. To address this problem, technical approaches to ML fairness
have proposed three forms of solutions, as outlined by Corbett-Davies and Goel [67]: anti-
classification (algorithmic estimates do not consider protected class attributes or proxies for
them), classification parity (parity in the error rates of predictive performance measures across
different societal groups), and calibration (identity characteristics should only be considered
if they have empirical effects on the outcome under consideration). However, as argued by
Davis, Williams, and Yang [68] among others, these approaches fall short of accounting for the
problem at the root of classification practices and providing sustainable solutions.
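
To make the second of these criteria concrete, the toy check below compares an error rate across two groups, which is what classification parity asks to equalize. The records and group names are invented for illustration; this is my own sketch, not code from the cited works.

```python
# Toy illustration of a classification-parity check: compare an error rate
# across societal groups. Note that the "ground truth" column is itself the
# product of the annotation processes examined in this dissertation.
from collections import defaultdict

records = [  # (group, ground-truth label, model prediction)
    ("group_a", 1, 1), ("group_a", 0, 1), ("group_a", 0, 0), ("group_a", 1, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

counts = defaultdict(lambda: [0, 0])  # group -> [misclassified, total]
for group, truth, prediction in records:
    counts[group][0] += int(truth != prediction)
    counts[group][1] += 1

for group, (wrong, total) in counts.items():
    # Classification parity would require these error rates to be (roughly) equal.
    print(group, wrong / total)
```

The check treats the ground-truth column as a given; it cannot see how those values were produced or whose reading of the data they encode, which is the blind spot this dissertation addresses.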
“To classify is human” [69] and classification practices go far beyond technical approaches.
For instance, the social sciences have a long tradition of studying power imbalances related to
classification [70, 69, 71, 72, 73, 66]. As defined by Bowker and Star [69], “a classification is a
spacial, temporal, or spatio-temporal segmentation of the world.” This means that classifications
are culturally and historically specific and subject to social, organizational, and, in most cases,
economic interests. In this sense, classifications represent subjective social and technical choices
that have significant yet usually hidden or blurry ethical and political implications [74]. The
politics involved in each classification are mostly rendered invisible as they become a form of
infrastructure that only gets noted upon breakdown [75], for instance, when an ML system
outputs errors or gross biases [76]. Classification practices are constructed and, simultaneously,
construct the social reality we perceive and live in [72, 77].
With the growing penetration of data-driven technologies into the social realms, arbitrary
classifications are increasingly established and stabilized through algorithmic systems [78].
Moreover, as Hanna et al. [76] argue, “with the development of markets and the rise of
technological innovation and actuarial science, classification of individuals has moved from
the more general assessment based on subgroup distinction to individuation based on market
imperatives.” These systems have power to output classifications in an impersonal — often
mistaken for neutral — manner and at an unprecedented scale [66, 5]. No matter whether it
is a human or a system who does the classifying, what remains unchanged is the fact that
with each classification, meaning is imposed, and higher or lower social positions, chances, and
disadvantages are assigned [5, 6, 79, 65].
With the aim of making visible the assumptions and worldviews encoded in ML datasets,
this dissertation focuses on practices of data classification as performed by data workers within
ML supply chains. Work practices in which classification is involved inform decisions about
what is considered data [64, 80] and what meanings are ascribed to each data point [69].
Data-related decisions are infrastructural decisions [80, 69, 1, 76] as they “exercise covert
political power by bringing certain things into spreadsheets and data infrastructures, and thus


into management and policy” [80]. Such decisions strongly shape ML datasets and systems
and have crucial impacts on communities and individuals.
As I will discuss in the next sub-section, classifications are means and manifestations of
power because every act of classification is an attempt to impose a specific reading of the
social world over other possible readings [66]. This form of power, just like the classifications
themselves, tends to be naturalized in the sense that arbitrary ways of sorting the social world
become so deeply ingrained that people come to accept them as natural and indisputable.
However, as Durkheim and Mauss [70] argue, there is nothing natural about classification
infrastructures — beyond the act of classifying which is, as Bowker and Star [69] put it, “human.”
Instead, every classification implies a hierarchical order and an exercise of power. Therefore,
adopting a critical perspective to study how data classification is carried out in ML supply
chains is key to making explicit the assumptions and worldviews encoded in ground-truth data
and ML systems.

1.2.3 Power
The notion of power is central to my research and the papers included in this dissertation.
Investigating data as a human-made entity [47] informed by power asymmetries [1] means
understanding both data and power relationally. Data exists as such through human
intervention [47] because “raw data is an oxymoron” [81]. Two power theorizations are
especially relevant to this dissertation as they constitute the core of Papers 2 and 3: Pierre
Bourdieu’s symbolic power [73] and Michel Foucault’s theory of disciplinary power [82, 83].
Bourdieu defines symbolic power as the authority to sort social reality by separating groups,
classifying, and naming them [73, 84]. The act of classifying and naming a thing reifies the
existence of both the category and the thing itself. Thus, symbolic power is not merely a matter of
naming or describing social reality but a way of “making the world through utterance” [73]. In
Paper 2 [85], my co-authors and I use the symbolic power construct to discuss how assumptions
that reflect the naturalization of practices and meanings get encoded in datasets through data
annotation [86, 71].
Based on the Foucauldian notion of power [83], Paper 3 [38] examines the ensemble of
discourses, work practices, hierarchies, subjects, and artifacts comprised in ML data work,
and the power relationships that are established and naturalized among and through them
[82]. Foucault defines power as a series of mechanisms and strategies that influence behaviours
and discourses and serve to discipline subjects [83]. In this conception, power is not held by
individuals but works through the relations that connect them. In Foucauldian terms, power
is part of a network of relationships linked to meet an end [87].
Similarly, Bourdieu [84] offers a relational view of power as enacted in the interaction
among actors and between actors and fields. In fact, Bourdieu’s conception of power is a
close relative of Foucault’s theory [86]. Both Bourdieu and Foucault theorize the relationship
between power and knowledge. Bourdieu understands knowledge as the symbolic imposition of
arbitrary visions [88, 73]. Foucault understands knowledge as the power to define others and
to produce truth through discourses [89]. Unlike Foucault, Bourdieu argues that power can be
possessed. This way, Bourdieu maintains the individual as an active agent, while Foucault
argues that the ultimate goal of power is to form subjects [86]. In both cases, the power aspect


relates to the authority to legitimize certain definitions, worldviews, and classifications while
delegitimizing others.
Power is effective as long as it is normalized, that is, taken for granted and perceived as
the inevitable way things are. Naturalization (for Bourdieu) and normalization (for Foucault)
are key for maintaining the legitimacy of otherwise arbitrary worldviews. Naturalization and
normalization refer to the ability of power to make what is, in fact, an imposition seem natural.
This way, power refers to the mechanisms that enable arbitrary ways of sorting the social
world to become so deeply ingrained in discourses and perception schemes that people come
to accept them as natural and indisputable. Through power, specific “truths” are created,
normalized, and perpetuated.
The relationship between classification, power, and normalization is key for understanding
how ground-truth data is created in production processes that include outsourced labor, and
how meanings are ascribed to data in annotation tasks. As argued by D’Ignazio and Klein
[1], “once a [classification] system is in place, it becomes naturalized as ‘the way things are.’”
This is why a fundamental goal of my research work is to make visible and contestable the data
classifications that are otherwise naturalized in seemingly trivial work practices and service
relationships.

1.3 Research Design


This dissertation represents three years of research work. Each one of the included papers
presents a different study and pursues a specific set of research questions, respectively oriented
toward the investigation of ML data production contexts, data labor, and documentation
frameworks (see Table 6.1).
The research sites explored can be summarized as follows (see also Fig. 1.2):

• S1: A BPO located in Buenos Aires, Argentina, that offers services related to
the production of training data for ML. S1 is a medium-sized organization with around
400 data workers distributed in three offices located in Argentina and two other Latin
American countries. Projects conducted at this site include image segmentation and
labeling, data collection, and content moderation. The Buenos Aires office employs
around 200 data workers, mainly young people living in very poor neighborhoods or slums
in and around the city. Employing workers from extremely poor areas is part of S1’s
mission statement as an impact-sourcing company. Workers are offered a steady part- or
full-time salary and benefits, which contrasts with the widespread contractor-based model
observed in S2 and other BPOs as well as data work platforms [54]. Still, data workers
at S1 receive the minimum legal wage in Argentina (the equivalent of US$1.50/hour in
2019) and their salaries situate them below the poverty line.

• S2: A BPO located in Sofia, Bulgaria, that produces ML datasets. It specializes
in image data collection, segmentation, and labeling. Its clients are computer vision
companies and academic institutions, mostly located in North America and Western
Europe. S2 offers its data workers contractor-based work and the possibility to complete
their assignments remotely, with flexible hours. Workers are paid per piece (image/video


or annotation), and payment varies according to the project and its difficulty. At
the time of my first visit in July 2019, the company operated with one manager and
two coordinators in salaried positions handling operations and a pool of around 60
freelance data workers. As part of its impact-sourcing mission, S2 recruited data workers
among refugees from the Middle East who had been granted asylum in Sofia, Bulgaria.
That approach changed in 2021, and S2 now outsources 90% of its projects to partner
organizations located in Syria and Iraq.

• S3: Management employees at five other data-work BPOs located in India,
Kenya, Iraq, Syria, and the USA. The aim of these interviews was to put some of the
preliminary findings from the fieldwork conducted at S1 and S2 in context and explore
which observations could be generalized to other settings, organizations, and regions.

• S4: Instruction documents for data-work tasks carried out in S1 (Argentina) and
in crowdsourcing platforms operating in Venezuela. The documents cover tasks such
as scraping data from the internet, labeling hate speech, and segmenting images for
computer vision.

• S5: ML engineers in their role as data-work requesters at four technology start-ups
located in Spain, Bulgaria, Germany, Denmark, and the USA. The ML practitioners
were interviewed and some of them were invited to participate in a workshop session.
Some of the participants were direct clients of S1 and S2.

My doctoral work comprised three large studies: (1) a grounded theory investigation, (2) a
dispositif analysis, and (3) a longitudinal participatory design study. In total, I conducted 38
qualitative interviews, several weeks of participant observations, and five co-design workshops, and
I analyzed 210 instruction documents and several data work interfaces (see Sec. 1.3.2). More
information about the research sites, participants, and the negotiations involved in obtaining
access to these organizations and individuals can be found in the papers included in this
dissertation.

1.3.1 Methodology
Beyond specific differences related to the methods applied, the three studies conducted as part
of my doctoral work started from the desire to understand how the production of ground-truth
data is carried out in specific (inter-)organizational contexts by focusing on the perspectives,
challenges, and desiderata of data workers. The use of qualitative methods and the building of
long-term relationships with the workers and organizations participating in the studies are
conscious decisions that allowed me to establish rapport and conduct first-hand observations.
To that end, I deployed three methodologies throughout my doctoral work, namely,
grounded theory, dispositif analysis, and participatory design. I have used each one of these in
service of the specific research questions and the type of data included in each study and paper.
These three approaches are considered methodologies (as opposed to methods) because they
cover broader questions concerning the frame of analysis in which issues related to epistemology,
ontology, axiology, teleology, and validity play a fundamental role [90, 91, 92]. As I will describe
in Chapter 2, many HCI and CSCW studies on ML data work have used research methods


Figure 1.2: Worldwide distribution of research sites, participants, and data.

that keep the researchers at a safe distance from the production sites. As a consequence, much
of that research has produced results that portray data workers as “bias-carrying hazards”
without taking into account their lived experiences and the power asymmetries and meaning
impositions that characterize their work. Conversely, the analysis frame of my research aims
to situate the findings in the specific contexts of each organization, its labor conditions, its
relationship with the clients, and its responses to specific market imperatives. Such an approach
implies acknowledging that findings are not discovered but are co-constructed by researchers
and participants [93, 94, 95]. To this end, the methodologies included in this dissertation
served as guidance to reflect on my perspectives, interpretations, and position methodically.
In the following, I will describe what these methodologies entail and how I implemented
them in my research work. In addition, I will provide details of the methods — in the sense
of routinized techniques and strategies of inquiry — comprised in each methodology and this
dissertation.

1.3.1.1 Grounded Theory Methodology

I approached the first phase of fieldwork, i.e., the first visits to S1 and S2 in 2019, following
the premise of conducting exploratory research without predefined hypotheses. To understand
the rich and complex topics of the first interviews and observations, I followed the type of


constructivist grounded theory methodology (GTM) proposed by Charmaz [95] and adapted
to HCI by Muller [96] and Muller et al. [97].
In the classic version of GTM, Glaser and Strauss [98] emphasize the need to discover theory
as emerging from data that is completely independent from the scientific observer. Conversely,
constructivist GTM assumes that neither data nor theories are discovered. Researchers’
education and previous experience, even with very different research projects and topics,
inform their approach to fieldwork to a greater or lesser extent. Therefore, no investigation can
ever be free of previously learned concepts, nor can it be objective and free of interpretation
[95].
In the context of this methodology, two qualitative data collection methods were deployed,
namely participant observation and in-depth qualitative interviewing. A detailed description
of how GTM was deployed, as well as the findings resulting from that approach, can be found
in Papers 2 and 4.

1.3.1.2 Dispositif Analysis

The term dispositif, as understood by Foucault [89, 99, 83], describes the ensemble that
discourses form with elements as heterogeneous as organizations, practices, rules, and artifacts.
The focus of interest is on powerful apparatuses that relate elements of “the said as much as the
unsaid” [89]. This makes dispositif a key concept to investigate the relation between discursive
and non-discursive practices [100]. Often described as an extension of discourse analysis [101],
dispositif analysis expands the field of inquiry beyond texts to include actions, relationships,
and artifacts, with a focus on power as the connecting force between these elements [102, 90,
101, 103, 104, 87].
In Chapter 4, dispositif analysis provided initial yet essential guidance to develop an
analytical strategy to study the relationships between different data types, i.e., interviews,
interfaces, and documents. The analysis presented in Paper 3 comprised the following elements:

1. Linguistically performed elements: Including a corpus of 210 instruction texts for data-
related tasks requested by ML practitioners and outsourced to data workers.

2. Non-linguistically performed practices: Interviews with data workers, managers, and
clients were conducted to explore how linguistically performed elements translate into
practice.

3. Materializations: Through participant observations, the knowledge built into physical
and digital artifacts, such as interfaces, documents, and performance metrics used to
surveil workers and quantify their performance, was explored.

Despite the wide application of Foucauldian theory in HCI and CSCW, the dispositif notion
and analysis have not been used within these communities. In this sense, this dissertation
contributes to these fields with a novel and comprehensive mode of analysis to approach data
work. A detailed description of how dispositif analysis was applied as well as the findings
resulting from that approach can be found in Chapter 4 and Paper 3.


1.3.1.3 Participatory Design

In CSCW and HCI, participatory design methodologies (PDM) have been deployed to explore
various fields of inquiry [105, 106, 107, 108, 109, 110, 111, 112]. PDM has also been applied in
the field of ML fairness, accountability, and transparency to guide algorithmic accountability
interventions [113], to learn about the concerns of communities affected by algorithmic decision-
making systems [114, 115, 116], and to elucidate causal pathways to be input into complex
modeling [117].
In this dissertation, PDM principles are used to explore the development, implementation,
and use of documentation in real-world data production scenarios. Participatory Design is
understood here as a long-term iterative process in which exploring participants’ insights
is as essential as producing tangible design outcomes. This implies taking the “politics of
participation” [118] into account by trying to anticipate power asymmetries that could emerge
among participants, and between me and them. It also implies centering the voices of those
actors — in this case the data workers — who are not usually heard in design processes.
In collaboration with S1 and S2, I inquired into the challenges and desiderata of data
workers regarding the documentation of data production processes. My engagement with the
participants comprised several iterative phases, including interviews, feedback rounds, the
development of prototypes, and five co-design workshops. This process led to the identification
of important considerations for the transparency of data and the design of documentation
frameworks. A detailed description of how participatory design was deployed as well as the
findings resulting from that approach can be found in Chapter 5 and Paper 5.

1.3.2 Data Collection

1.3.2.1 Observations

The participant observations were conducted in person at S1 and S2 between May and July
2019, after a few months of conversations with the participating organizations, during which
the scope of my visit and the participation of the workers were negotiated. I describe the
details of the negotiations involved in obtaining field access in Section 3.1 of Paper 2. In Paper
5 (Section 5.2), I also discuss the implications and challenges of engaging in participatory
research with the intermediation of partner organizations. I will discuss these in greater detail
in Section 6.4.
The visits to both research sites included access to team meetings, meetings with clients,
workers’ on-boarding and training, briefings, and quality checks. In addition, some data workers
allowed me to shadow [119] them and observe how projects and tasks were conducted. I was
also allowed to participate in one data labeling project for a few days to gather information
about the interfaces used to perform data-related tasks.
All observations were registered as jottings taken in real-time. Those jottings involved
descriptions of spatial setups, information on actors, and descriptions of actions and interactions
observed. In parallel, I also noted reflections on my emotions and perceptions, including
explicitly subjective interpretations. Simple sketches and, when permitted, photos helped to
complete the notes. The information gathered in catchwords or bullet points in the jottings


Figure 1.3: Example of jottings taken at fieldwork in S2 (Bulgaria). My field book included
observations, simple sketches, open questions, and reflections about my own position as a researcher
participating in the co-creation of findings.

was transformed into complete texts and integrated into more consolidated field notes at the
end of each fieldwork day.

1.3.2.2 Qualitative Interviews

A substantial part of the fieldwork conducted in Bulgaria (S2) and Argentina (S1) consisted of
intensively interviewing data workers and management. A total of 38 interviews are included
in this dissertation. All interview participants were selected by the firms’ management, which
represents a limitation (see Section 6.4). In-depth interviews were conducted with data workers
and managers at both sites. They enabled a view into the experiences of the interview partners
while encouraging them to reflect and elaborate on interpretations of those experiences [95]. Two
guides were drafted for the interviews: one for data workers, one for the managers. All in-depth
interviews were conducted face to face at the workplace of the interview partners.
During fieldwork at S1 and S2, the permanent exchange between data and research goals
indicated that conducting interviews with representatives of other similar companies could
enrich the preliminary findings. This led to the decision to conduct expert interviews with
managers at four additional data-processing BPOs (S3). These interviews were conducted
after contacting the companies via email or the potential interview partners via LinkedIn. The
sampled interview partners were considered experts because they were able to provide unique
insights on the structures and processes deployed and maintained within their companies,


specific departments, and the overall market. A separate interview guide was prepared. The
expert interviews were conducted via Zoom.
Finally, semi-structured interviews were conducted with machine learning practitioners
(S5) working for companies located in Germany, Spain, Bulgaria, and the United States. The
interview guide prepared for these interviews inquired about the negotiations involved in
ground-truth data production and the role of these practitioners as requesters of data work.
Some of the interview partners were direct clients of S1 and S2. In those cases, the BPOs
provided an introduction. The rest were recruited via LinkedIn.
Semi-structured interviews were also conducted with data workers and managers at S1 and
S2 at later stages of the participatory process involved in investigating documentation practices.
Some participants were interviewed more than once at different stages of this research.

1.3.2.3 Document Collection

During fieldwork at S1 and S2, I gained access to various documents containing specific
instructions and requirements provided by clients, lists of metrics for quality assurance, emails
between team members and to clients, and impact assessments. In addition, some of the
interviewed ML practitioners provided further documents, notably, data-work instructions.
Most of the instruction documents included in Paper 3 were collected by paper
co-author Julian Posada during his observations of crowdsourcing platforms, in open social
media groups where data workers exchange knowledge, and through organic online search.

1.3.2.4 Co-Design Workshops

A total of five co-design workshops — two with S1 and three with S2 — were conducted to
investigate data documentation practices. Engaging in hands-on design sessions directly with
data workers allowed us to explore their challenges and desiderata beyond corporate priorities.
Due to the COVID-19 pandemic, all sessions took place online via Zoom, and the activities
were conducted in parallel on the visual tool Miro. The sessions were video recorded and later
transcribed. In addition, note-takers registered the activities and interactions, and one of my
collaborators produced a real-time graphic recording on Miro. The graphic recording was kept
visible to participants as it evolved throughout the workshop sessions.
Seventeen participants attended the workshops with S1, including four managers and 13 data
workers. Three managers and 13 data workers attended the sessions with S2. In addition,
three representatives of a client organization that outsources data labeling tasks to S2
participated in one of the workshop sessions.

1.3.3 Data Analysis


The data analysis methods used in this doctoral work varied according to the data type and
the specific research questions. Some data was coded and analyzed more than once using
different methods in different research phases. It must be mentioned that all data was analyzed
by me and at least one collaborator to ensure the inter-subject comprehensibility [120] of the
interpretations and the derived findings.
In the following subsections, I describe each data-analysis method in detail.


Figure 1.4: Graphic recording of the three workshop sessions conducted at S2 that included the
Bulgarian BPO management, data workers from Syria, and representatives from one of the BPO’s
clients located in a Nordic country.

1.3.3.1 GTM Coding

The grounded theory (GTM) coding system [95], including phases of open, axial, and selective
coding, was deployed during the initial phase of this investigation to analyze the first interviews
and observations conducted at S1, S2, and S3. The use of GTM and the analysis procedures
indicated in this methodology was consistent with the exploratory spirit of the fieldwork and
the first interviews conducted in 2019 and reported in Papers 2 and 4. Aimed at producing
a theoretical framework during the research process, GTM proposes a continual interplay
between data collection and analysis [121].
The analysis steps consisted in building an understanding of the themes and concepts
contained in the data through open (descriptive) codes. Then, those low-level codes were
combined into more powerful configurations of codes called axial codes. From the axial codes,
more theoretically robust, higher-level codes were constructed. These are called selective codes
and they indicate deliberate interpretive choices by the coders [95].
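
Purely as an illustration of this progression from open to axial to selective codes (the codes and quotes below are hypothetical and are not taken from the actual analysis), the resulting hierarchy can be pictured as a small tree:

# Hypothetical illustration of the GTM coding hierarchy; none of these codes
# come from the actual study. Open codes are grouped into axial codes,
# which in turn support a selective, interpretive code.
coding_tree = {
    "selective: imposition of meaning through labor": {
        "axial: client authority": [
            "open: 'the client decides the labels'",
            "open: 'we cannot change the taxonomy'",
        ],
        "axial: constrained judgment": [
            "open: 'the instructions leave no room for doubt'",
            "open: 'asking questions slows us down'",
        ],
    }
}

for selective_code, axial_codes in coding_tree.items():
    print(selective_code)
    for axial_code, open_codes in axial_codes.items():
        print("  ", axial_code, "<-", len(open_codes), "open codes")
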

1.3.3.2 Critical Discourse Analysis

Critical discourse analysis (CDA) [90] was deployed in the deepening phase (see Chapter 4) as
part of the dispositif analysis reported in Paper 3. In this context, CDA was used to code and
analyze instruction documents, interview transcripts, and field notes.
The analysis comprised three stages: (1) the structural analysis of the corpus, (2) a detailed
analysis of discourse fragments, and (3) a synoptic analysis. These steps (especially the


Figure 1.5: My whiteboard at Weizenbaum Institute during the first data analysis phase using
the GTM coding system. Co-author and collaborator Tianling Yang and I used the board to
discuss our understanding of the data and the codes. The picture was taken on September 20,
2019.

synoptic analysis) included several iterations that allowed the discovery of connections between
different levels of analysis, the collection of evidence to support the emerging interpretations,
and the development of arguments.
Inductive coding was applied to the instruction documents first. Then, the interviews
were analyzed by combining inductive and deductive coding. In this sense, some of the topics
identified in the instruction documents helped build deductive categories to code the interviews.
In addition, there was room for inductive category formation so that several codes could
emerge directly from the interviews. Through this form of analysis, I aimed at identifying
patterns. Those patterns were later compared with the elements identified in the instruction
texts and complemented with participant observations that had been registered in field notes.
Further details on this analysis are provided in Paper 3.

1.3.3.3 Reflexive Thematic Analysis

In the context of the participatory research study (Phase 3) reported in Paper 5, I used a
version of reflexive thematic analysis (RTA) [122, 94] adapted to HCI [123] to code interview
and workshop transcripts.
Braun et al. [122] argue that themes “do not emerge fully-formed” and that candidate
themes are first developed in earlier phases before the final themes are


settled on. It took me and my paper co-authors [124] several iterations until we arrived at a
compelling set of themes. Coder collaboration in RTA is not focused on inter-rater reliability
but rather on how different perspectives on the same data help reflect on codes and develop
themes [123]. To analyze interview transcripts and workshop data, we used an inductive
approach to coding, which means the data is the starting point, and codes are strongly linked
to the data instead of a theory or concept. Some of the initial interviews that had already
been analyzed with GTM were re-coded using RTA, which was consistent with the principles
of participatory design followed in the later research stage.

1.3.4 Research Ethics and Positionality


As each paper comprised in this dissertation contains information on ethics, data management,
and a statement indicating my positionality and that of my co-authors across the intersections
of race, class, and gender, I would like to take this section as an opportunity to briefly discuss
the positionality of my research work. This is not an attempt to distance myself from my
responsibility as the researcher who led these studies. In fact, by disclosing the standpoint
from which this research was conducted, I talk not only about my own positionality but also
about the organizational conditions that both enabled and constrained my work.
I conducted this research as a first-generation university graduate and a migrant to Germany.
From that position, navigating the academic field, the German bureaucracy, and, ultimately,
knowing what to do in specific situations was particularly challenging. On the other hand, my
position as a migrant with previous experience in very diverse types of (formal and informal)
work might have helped me establish rapport with some of the participants.
In 2018, I started working as a research assistant at the Weizenbaum Institute. I was a
Social Science master’s student at the time and had never worked in research or technology
before. As the only social scientist in a team of computer scientists, I found it hard at first to
be integrated into on-going projects and studies. It was then that I became interested in data
annotation, an area that had not yet been explored by “mainstream” research. Following this
interest, and after a few months of “isolation” within my research group, I decided to start
planning my own research project without supervision. That was the inception of my doctoral
work and the studies comprised in this dissertation.
For the first fieldwork phase, between May and July 2019, I only received partial funding
because, as a research assistant, I was not supposed to be conducting my own research. I
did not have other funding sources and used my savings to cover some of the expenses. After
I presented my preliminary observations, some of my colleagues became interested in my work.
I was then offered a fully funded Ph.D. student position with the research group, which I
accepted in November 2019. The subsequent studies were funded by TU Berlin through the
Weizenbaum Institute.
The end of my contract was set for April 2022, which meant I had less than two and a
half years to complete the Ph.D. program. That time constraint shaped my research strategy
and the subsequent studies significantly. Moreover, the COVID-19 pandemic began only a few
weeks after I had started in the program. This represented another major limitation to my
research plans.


My advising and supervision situation was a further factor that impacted my doctoral
work. Only a few weeks after I had started my Ph.D. program, my research group’s principal
investigator, who had also agreed to be my supervisor, left her position. In the middle of a
global pandemic and with a limited contract, I became a Ph.D. student without a supervisor. I
was very lucky to meet Dr. Alex Hanna at that time. She guided me through that difficult
situation and became, de facto, my main advisor and mentor for the rest of my Ph.D. journey.
Later on, Prof. Antonio Casilli and Prof. Bettina Berendt completed the constellation that
allowed me to become a Ph.D. candidate.
The fact that my official Ph.D. program was in computer science significantly shaped the
research contributions I strove for. I was (and am) convinced that my social science-informed
perspective could broaden the field of research in crowdsourcing and data work. However, I
needed to make my work legible for computer scientists for this dissertation to be accepted as
a contribution to that field. It was difficult to convince computer scientists that qualitative
analyses of the organizations where data work takes place and long-term engagements with
data laborers were worth the time and effort. An on-going challenge of my work consists in
remaining true to my research goals while catering to readers and reviewers both from the social
sciences and computer science. My position as a researcher moving between different — and at
times opposing — disciplinary epistemologies shapes my research significantly.
The studies included in these pages did not receive orientation or approval from TU
Berlin’s Ethics Committee because, according to the committee, they did not qualify as
potentially harmful and “needing an ethics vote.” In permanent consultation with my advisors
and other experts in qualitative and participatory research, I put much effort into protecting
the participants and the data. Reflections corresponding to the specific studies and
participants can be found in the papers included in this dissertation.
All participants received information about the respective studies beforehand. All of them
expressed their consent before their participation, either verbally and on record or in a written
form. They were informed of the possibility of withdrawing from the study at any point. All
participants were compensated for their time. The workshop participants were paid €15 per
hour of participation. The interview participants received a gift card for €30 (or the equivalent
in their local currency) for an online marketplace.
De-identification through pseudonymization was assured at all stages of this research. The
data was processed based on participants’ consent and in a GDPR-compliant manner. To
protect the participants’ identities and the data collected, I made sure that participants did
not have their names, birth dates, addresses, or emails attached to the interview transcripts
and audio files. Each participant was asked to choose a pseudonym to link the different data in
the research analysis phase. Participants have no personal identifying features in the published
material, and there is no specific mention of the companies they work for. The audio recordings
were deleted before the publication of this dissertation.
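
As a rough sketch of what this kind of de-identification step looks like (the field names, identifiers, and pseudonym below are hypothetical; the actual workflow was guided by GDPR requirements rather than by this exact code):

# Rough sketch of pseudonymization: direct identifiers are stripped from a record
# and a participant-chosen pseudonym links data across research phases.
# All field names and values are hypothetical.
pseudonyms = {"participant_017": "Luna"}   # mapping stored separately from the research data

def pseudonymize(record: dict, participant_id: str) -> dict:
    direct_identifiers = {"name", "birth_date", "address", "email"}
    cleaned = {key: value for key, value in record.items() if key not in direct_identifiers}
    cleaned["pseudonym"] = pseudonyms[participant_id]
    return cleaned

interview = {"name": "(redacted)", "email": "(redacted)", "transcript": "(text)", "site": "S1"}
print(pseudonymize(interview, "participant_017"))
# {'transcript': '(text)', 'site': 'S1', 'pseudonym': 'Luna'}
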
Pseudonymizing qualitative data involves “significant social scientific trade-offs” [125].
Necessary as anonymization and pseudonymization are, they pose challenges for qualitative
research, specifically ethnography. For instance, the very action and process of doing fieldwork,
such as requesting field access and using snowball sampling, inevitably exposes researchers and
researched communities to public attention and leads to the risk that the participating


individuals and communities can be identified [126, 127]. Moreover, people within or near
the participating organizations might be able to re-identify participants [126, 128, 129, 127].
Compared to anonymization, there are more identification risks when data is pseudonymized.
However, pseudonymization is very useful for establishing rapport and in longitudinal studies,
such as the participatory engagement comprised in this dissertation, in which researchers need
to connect data related to the same participants at different moments [130].
The research work included in this dissertation demanded a careful consideration of my
position and the power differentials between me and the participants. Despite my best efforts,
there were specific dynamics that could not be mitigated. These include the gatekeeping
tendency of the participating organizations, which often counteracted my attempts to build rapport
with the participants, and my own inexperience, which at times outweighed my good intentions. In
Section 6.4, I reflect upon these issues and how they affected the research process and findings.

2 Data Production and Power
2.1 Background and Motivation
In 2018, I started to think about the relationship between power, data, and machine learning.
Back then, researchers and fellow students would often refer to my incipient ideas using the
umbrella term “bias.” The study of bias had become very popular and, indeed, some of the
work published in this area was interesting and important. Still, there was something upsetting
about the assumption that my research, too, was about bias.
As soon as I came into contact with texts in the field of critical data studies (especially
[131, 132, 1, 81, 133, 64]) that looked into inequity, discrimination, and harm produced
or enhanced by data and data-driven systems beyond the bias framing, I began to see the
difference more clearly. Scholars in this field argue that “fairness” and “bias” are not enough
to account for injustice and power asymmetries encoded in data. They also highlight the need
for interdisciplinary collaboration.
While studying data annotation and working with communities of data laborers, I observed
a fundamental discrepancy between my observations and the, up to that point, dominant
literature in the field of data work and crowdsourcing: While my findings were showing that
pre-defined truth values are imposed on data workers and, through them, on data, research
in this area was predominantly focusing on “addressing and mitigating worker biases” [13]
as if these workers had enough agency to “contaminate” data with their personal biases. By
comparing my initial findings with those investigations, I found words to describe why I
categorically refuse to frame my work as “bias research,” and why I believe a different approach
is necessary: Bias framings of this kind obscure labor as a fundamental dimension of data and
AI ethics.
The paper included in this chapter — which is actually a commentary based on literature
research and not a research paper in the strict sense — is a call to study up data. In anthropology,
studying up means expanding the field of inquiry to study power, i.e., interrogating elites that
have remained significantly understudied in the anthropological tradition. In this sense, my
co-authors and I make a call for research to critically examine the set of power relations that


inscribe specific forms of knowledge in machine learning datasets. This means accounting for
historical inequities, labor conditions, and epistemological standpoints encoded in data. The
text is framed as a critical review of HCI and CSCW work. However, our critique can be
extended to the ML field in general. We ground our argument in HCI and CSCW
because of these communities’ extensive history in interdisciplinary work combining design,
computer science, and social science, among other disciplines. We explore several implications
of framing diverse socio-technical problems as “bias” in machine learning. Through examples
related to the study of ML datasets, data work, and dataset documentation, we argue for a
shift of perspective to orient efforts toward considering the effects of power asymmetries on
data and systems.
This commentary was published in January 2022 in the ACM journal Proceedings of the
ACM on Human-Computer Interaction and received great attention from researchers and the
general public. In the context of this dissertation, Paper 1 serves as an introduction to the
stance and focus of my research, that is, the entanglement of arbitrary truths, power, and
labor in ML data production. In this sense, this paper lays the foundations for the studies
included in the subsequent chapters.
We close Paper 1 by proposing a “power-aware research agenda” which corresponds with
the research program that I followed in my doctoral work and this dissertation. We argue
that starting from the assumption that power imbalances, not just bias, are the problem leads
to fundamentally different research questions and requires research methods that position
researchers closer to the field of inquiry. The proposed research program fosters the ethnographic
exploration of data-work settings and practices (which is at the core of Chapter 3 of this
dissertation), the investigation of the logics inscribed in data work tasks and instructions
(Chapter 4), and the expansion of dataset documentation frameworks to make them sensitive
to power differentials in data production (Chapter 5).
As per TU Berlin’s regulations for dissertations that contain papers published with co-
authors, I am obliged to make my contribution and that of my co-authors explicit. This is not
an easy task considering that research ideas and scientific advancement evolve in collective
efforts that go beyond specific authors. Nonetheless, the most tangible tasks that led to the
submission and publication of this paper can be described as follows: The first complete
manuscript was drafted by me in May 2021. At the time, I was collaborating with Julian
Posada and Tianling Yang on various projects. I then invited them to join as co-authors. They
provided support in expanding the initial literature research, and were key to critically revising
and strengthening the argument.

Paper 1: Studying Up Machine Learning Data

Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?
MILAGROS MICELI, Technische Universität Berlin, Weizenbaum Institute, Germany
JULIAN POSADA, University of Toronto, Schwartz Reisman Institute, Canada
TIANLING YANG, Technische Universität Berlin, Weizenbaum Institute, Germany
Research in machine learning (ML) has argued that models trained on incomplete or biased datasets can lead
to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented
framings by adopting a power-aware perspective to “study up” ML datasets. This means accounting for
historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI
and CSCW work to support our argument, critically analyze previous research, and point at two co-existing
lines of work within our research community — one bias-centered, the other power-aware. We highlight the
need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the
first area, we argue that reducing societal problems to “bias” misses the context-based nature of data. In the
second one, we highlight the corporate forces and market imperatives involved in the labor of data workers
that subsequently shape ML datasets. Finally, we propose expanding current transparency-oriented efforts in
dataset documentation to reflect the social contexts of data design and production.
CCS Concepts: • Human-centered computing → Computer supported cooperative work; HCI design
and evaluation methods.
Additional Key Words and Phrases: bias; power; machine learning datasets; training data; data work; dataset
documentation; HCI; CSCW
ACM Reference Format:
Milagros Miceli, Julian Posada, and Tianling Yang. 2022. Studying Up Machine Learning Data: Why Talk
About Bias When We Mean Power?. Proc. ACM Hum.-Comput. Interact. 6, GROUP, Article 34 (January 2022),
14 pages. https://doi.org/10.1145/3492853

1 INTRODUCTION
In 2015, Facebook’s “real name” policy caught some media attention after the platform’s algorithm
failed to recognize the names of hundreds of North American Indigenous users as “real” and
proceeded to cancel their accounts [43, 84]. According to Facebook’s algorithm, real names were
defined by Anglo-Western conventions. Thus, the system flagged names composed of several words
or with unusual capitalization. Moreover, despite the many contextual factors that determine how
a name sounds and looks like, Facebook enforces its policy algorithmically, that is, in a narrow,
unquestionable, and predefined way.
At first sight, the issues raised by users whose names were flagged could indicate the presence
of biased training data: As Anglo-Western names are dominant and those from other cultures are
underrepresented, the unbalanced dataset leads to “unfairness.” This approach is not wrong, but it
remains insufficient to fully address the issue at stake: that some worldviews are considered more
valid than others. Framing this type of issue as “bias” tends to obscure a set of persistent questions
behind and beyond the technical realms: What is a real name? Who decides over the realness of a
name? And, do we need a real name policy at all?
In the past decade, injustice and harm produced by data-driven systems have often been addressed
under the umbrella term “bias.” Research has shown that biases can penetrate ML systems at every
layer of the pipeline, including data, design, model, and application [71]. Special attention has
been paid to the quality of data, arguing that models trained on incomplete or biased datasets can
lead to discriminatory or exclusionary outcomes [15, 29, 71]. Moreover, significant academic focus
lies upon bias in data work and crowdsourcing [13, 21, 34, 36, 49]. Because of the interpretative
character of tasks such as labeling, rating, and sorting data, abundant research has focused on
the individual subjectivities of data workers to account for biases in data, investigating ways of
mitigating them by constraining workers’ judgment.
With the present commentary, we aim to contribute to the discussion around data bias, data
worker bias, and data documentation by broadening the field of inquiry: from bias research towards
an investigation of power differentials that shape data. As we will argue in the following sections, the
study of biases locates the problem within technical systems, either data or algorithms, and obscures
its root causes. Moreover, the very understanding of bias and debiasing is inscribed with values,
interests, and power relations that inform what counts as bias and what does not, what problems
debiasing initiatives address, and what goals they aim to achieve. Conversely, the power-oriented
perspective looks into technical systems but focuses on larger organizational and social contexts. It
investigates the relations that intervene in data and system production and aims to make visible
power asymmetries that inscribe particular values and preferences in them.
Computing has become so widely integrated into society, both influencing and being shaped
by it, that a broader understanding of sociotechnical systems becomes key to addressing social
concerns surrounding their development and deployment. In this sense, “debiasing” ML data is not
sufficient to fully address the questions posed by “real-name” algorithms and other data-driven
systems that are deeply ingrained in our everyday lives. Such approaches could be expanded by
applying a relational view on the power dynamics and the economic imperatives that drive machine
learning, i.e., considering that biases do not occur in a vacuum but are fundamentally entangled
with naturalized ways of doing things within the organizations where datasets and models are
developed. This requires an epistemological shift in terms of how to think of these problems, what
questions to ask, and what methods to use. Such a shift can only be achieved through more dialogue
between Computer Science and disciplines such as Sociology, Anthropology, and Economy. Given
the important interdisciplinary tradition in HCI and CSCW, we believe in the key role of these
communities in prompting conversations around power and ML systems.
On that basis, we follow the line of previous work [5, 37, 86] that has borrowed the concept of
“studying up” from anthropologist Laura Nader. In anthropology, studying up means expanding the
field of inquiry to study power, i.e., interrogating elites that have remained significantly understudied
in the anthropological tradition. In their call to study up algorithmic fairness, Barabas and her
colleagues [5] explain that this endeavor “requires a new set of reflective practices which push
the data scientists to examine the political economy of their research and their own positionality
as researchers working in broken social systems.” In a similar vein, our appeal is to study up
machine learning data by investigating labor conditions, institutional practices, infrastructures,
and epistemological stances encoded into datasets, instead of looking for biases at the individual
level and proposing purely technical interventions.
In the following sections, we will zoom into three critical ML-related fields of inquiry: data
quality, data work, and data documentation. While our argument is based on previous research, it is
worth mentioning that a systematic literature review is not within the scope of this commentary.
Here, we look into CSCW and HCI research to critically discuss work that revolves around the

concept of “bias.” At the same time, we build on previous initiatives within CSCW and HCI that have,
instead, striven for a more comprehensive understanding of sociotechnical systems. By contrasting
both perspectives that co-exist within our research communities, we highlight the importance
of fostering more dialogue between them to produce research that expands the investigation of
individual biases into a consideration of power asymmetries within organizations and among them.
Our argumentation concludes with suggestions as to how to study up machine learning data and
why.

2 THE LIMITS OF BIAS


Studies on data and algorithmic biases have demonstrated how data-driven systems can enhance
discriminatory practices and result in exclusionary experiences in various domains, including
credit allocation [25, 58] and algorithmic filtering [4, 70]. CSCW and HCI research has explored
algorithmic bias in the job market [20, 46], advertisement [1], and image search engines [53],
among several other domains. Moreover, researchers have shown how algorithms contribute
significantly to the visibility of information [57] and how stereotypes are perpetuated by gender
recognition systems [44]. The quest for addressing these problems has prompted the development
of an area of research that emphasizes the issue of bias, and the values of fairness, accountability,
and transparency in mitigating its effects [28]. The fact that research in the technical realms takes
issue with social inequities and examines the harmful effects of technology is a significant step.
However, work in Critical Data and Algorithmic Studies, as well as CSCW and HCI, has argued
for a shift of perspective from individual cases and individual biases towards the comprehensive
analysis of social practices and power relations involved in creating the systems that surround us
[5, 11, 27, 52, 62, 85].
Technological development is sociotechnical in nature and data, as an abstraction [28, 55], is
not given but created through human discretion [68] and shaped by power dynamics [62]. By
focusing on technical solutions for personal subjectivities, bias-oriented approaches are mostly
unable to account for the social processes underway that comprise increasing surveillance and
privacy intrusion to satisfy the insatiable need for more and diverse data [22]. Moreover, such
investigations often leave the important shifts in labor that include the mobilization of largely
precarized workforces to process data and make it “readable” for ML systems [78], unquestioned.
Through a power-aware lens, it is possible to interrogate why accurate, efficient, and seemingly
“debiased” ML systems are still not good for everyone. For example, accurate facial recognition used
for surveillance is dangerous in the hands of unscrupulous organizations or oppressive governments
[28]. Debiasing efforts sometimes mitigate harm but machine learning will inevitably perpetuate
injustices if systems remain controlled by powerful organizations that follow their own agenda.
In this context, attempts to address and mitigate biases appear as “a tiny technological bandage
for a much larger problem” [28]. Research efforts that focus on designing “debiased” systems are
not bad. However, the question stands: “debiased” according to whom and for whom? [52]. The
bias-oriented approach provides only limited tools to explore this and other important questions.
Moreover, framing sociotechnical problems as bias constitutes what Powles and Nissenbaum call
“a seductive diversion” [80]: On the one hand, we are told that biases can be fought and mitigated,
and that data can be cleaned and systems debiased. On the other hand, it is argued that bias is not
only a technical but also a societal issue; hence, biases are everywhere and nowhere. If society is
biased, then biased AI cannot be avoided. This way, bias-oriented framings present a puzzle that
keeps us continually busy because technical fixes are inadequate solutions to societal issues. We
are always on the way to identifying and mitigating biases in an attempt to build debiased systems
while knowing that the ideal of a debiased system can never be achieved.


Still, considerable efforts, both within and beyond HCI and CSCW, are invested in technical tools
to mitigate data biases, algorithmic biases, and workers’ biases in domains where interrogation and
reflexivity could be more fruitful. This way, the bias puzzle distracts us from addressing fundamental
questions about who owns data and systems, who the data workers are, whose worldviews are
imposed onto them, whose biases we are trying to mitigate, and what kind of power datasets
perpetuate. “It also denies us the possibility of asking: should we be building these systems at
all?” [80]. These questions could shift the perspective because they interrogate privileged and
naturalized worldviews encoded in data and systems that (re)produce the status quo. Consequently,
such questions are more about power than they are about bias.
In the following sub-sections, we will develop this argument further by going deeper into the
discussion of problematic aspects of framing power differentials and injustice as “bias” in ML data,
data work, and dataset documentation.

2.1 Data is Always Biased


Data bias has been defined as “a systematic distortion in the data” that can be measured by “contrasting a working data sample with reference samples drawn from different sources or contexts” [71]. This definition encodes an important premise: that there is an absolute truth value in data and that bias is just a “distortion” from that value. This key premise broadly motivates approaches to “debias” data and ML systems. However, data never represents an absolute truth. Data, just like truth, is the product of subjective and asymmetrical social relations [28].
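To make the measurement logic behind this definition concrete, the following minimal sketch contrasts the label distribution of a working sample against a reference sample. It is our own illustration, not a method from [71]; the samples, categories, and printed values are invented.

from collections import Counter

def label_distribution(labels):
    # Relative frequency of each label in a sample.
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def distribution_gap(working_sample, reference_sample):
    # Per-label difference between the working and the reference distribution;
    # large gaps are what bias-oriented approaches call a "systematic distortion".
    working = label_distribution(working_sample)
    reference = label_distribution(reference_sample)
    labels = set(working) | set(reference)
    return {label: round(working.get(label, 0.0) - reference.get(label, 0.0), 3)
            for label in labels}

# Hypothetical example: gender labels in a face dataset vs. a census-like reference sample.
working = ["man"] * 70 + ["woman"] * 30
reference = ["man"] * 50 + ["woman"] * 50
print(distribution_gap(working, reference))  # {'man': 0.2, 'woman': -0.2} (key order may vary)

The sketch also makes the paragraph's critique tangible: the comparison only yields a “distortion” once a particular reference sample has been accepted as the truth, and choosing that reference is itself a value-laden decision.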
In their groundbreaking analysis of three commercial gender classifiers made available by
Microsoft, Face++, and IBM, Buolamwini and Gebru [15] show that darker-skinned women are
up to 44 times more likely to be misclassified than lighter-skinned men. This work is often cited
as a paradigmatic example of how data can contain biases as related to the underrepresentation
of certain groups. Looking at this problem from a bias-oriented perspective, the solution seems
straightforward: add more and diverse data to training datasets. However, as Gebru also points
out, biased data is only part of the story: “[. . . ] not just bias in the training data, but ethics in
general — what’s okay to do, what’s okay not to do, the power dynamics of who has data, who
doesn’t have data, who has access to certain kinds of models, and who doesn’t” [50]. The contextual
issues that escape technical fixes also include: where are diverse data — the “missing faces” —
harvested? Under which conditions? Who classifies them? Moreover, and considering that Black
and Brown populations have historically been subject to surveillance, persecution, and police
violence [9, 14], it is worth asking if improving facial-recognition systems so that they can properly
“see” dark-skinned faces would further perpetuate injustice and harm.
Our point is that biased data is undoubtedly one issue to consider when it comes to discriminatory
outcomes from machine learning systems, but so are social structures, the definition of social
problems to be solved in computational terms, and the widespread assumption that algorithms
are neutral where people are not. These factors, as well as data, are deeply political. Machine
learning systems are fundamentally trained to cluster and classify data. When these classifications
are value-laden and interest-informed, they result in imposing and promoting the particular set
of interpretations and worldviews of some groups, which could reinforce injustice [12]. In other
words, ML systems have real effects on real people. Therefore, it is important to consider that
their quality cannot be thought of only in terms of accuracy and performance. Some issues do
not just get solved by throwing in more data, and quantification does not always lead to better
representation or less harm. In a broader sense, harms produced by ML systems manifest existing
power asymmetries: they are about having the power to decide how systems will “see” and classify,
what data is worth including, and whose data we can afford to ignore. Those harms are about the
power to impose a hegemonic worldview over other possible ones.


Tracing the links to historical and ongoing asymmetries can be helpful to understand how data
comes to be [27] and what kind of political work ML systems perform [45]. This means, of course,
acknowledging that the data that fuels machine learning is produced by humans and hence is
laden with subjective judgments. Even so, discussions around human intervention on data ought to
consider that the subjective forces that shape data and systems are not just about the personal biases
of individual actors. Data is produced within organizations and through practices that “embody
specific technical ideals and business values” [74] that also shape the subjectivities of data workers.
We are certainly not the first to make this statement: Researchers in Human-Centered Data Science (HCDS) [56, 68, 69, 73, 74, 92] and Human-Centered Machine Learning (HCML) [19] have explored data as a “human-influenced entity” [68]. A series of CSCW/HCI workshops on Data
Science work practices [66, 67] has fostered interesting conversations on collaboration, meaning
making, trust, craft, and power. This line of work has shown that narratives, preferences, and
values related to larger socio-economic contexts are embedded in processes of data production
[75]. Practices such as the framing of real-world questions as computational problems [10, 72],
the choice of training data and data-capturing measurement interfaces [77], the establishment of
taxonomies to label data [62], and the selection of data features [68] as well as the design of data to
be recognizable, tractable, and analyzable [35, 68], are all decisions that are hardly ever made by
individual choice and in a vacuum. Instead, they concern organizational structures and depend on
budgets, revenue plans, and technical possibilities.
As the examples in the following section will show, despite the abundant CSCW and HCI
initiatives that have argued that “datasets aren’t simply raw materials to feed algorithms, but
are political interventions” [23], a considerable number of investigations within those research
communities still rest on the assumption that data represents an absolute truth value and that
bias is just a distortion that can be mitigated. The problem is that framing arbitrary representations
in data as “bias” misses the political character of datasets: there is no neutral data and no apolitical
standpoint from where we can call out bias [23]. Datasets are always “a worldview” [26] and, as
such, data always remains biased.

2.2 “Mitigating Worker Biases” Should Not Be the Goal


Datasets are conditioned by the networked systems in which they are created, developed, and
deployed. The examination of data provenance and the work practices involved in dataset production
is essential to the investigation of subjectivities embedded in data-driven systems [62, 68, 69, 73].
In formal terms, data work for machine learning involves tasks such as the collection, curation, and
cleaning of data, labeling and keywording, and, in the case of image data, it can also involve semantic
segmentation (i.e., marking and separating the different objects contained in a picture) [17, 18, 89].
Outsourced data workers perform these tasks through digital labour platforms (crowdsourcing) or
business process outsourcing companies (BPOs). In this regard, outsourced data work is part of the
broader gig economy landscape in the case of platforms [?], and of the wider digital services sector in the case of BPOs, like those providing content moderation [?]. In both cases, these types of work are characterized by low or piece-rate wages, limited-to-no labor protection, and high levels of control and surveillance.
The tasks that data workers perform are fundamentally about making sense of data [62, 68],
that is, about interpreting the information contained in each data point. Because of the subjective
character of data-related tasks, bias-oriented research in this space has focused mainly on the
individual subjectivities of workers, considering their judgments to be a significant source of biases
and data quality errors [13, 21, 40, 49, 90]. For example, abundant research considers labelers’
subjectivity in annotation tasks to be one of the main reasons for biased labels. The field of research
directed towards the study of crowdworkers and crowdsourcing platforms [13, 21, 34, 36, 49] offers
several examples of such an approach. Some of this work argues, for example, that data workers’
cognitive biases [30], their own preferences [81], and political stances [91] can negatively affect
data. Moreover, research has proposed methods to identify and monitor annotator bias within
datasets [3, 39, 49, 90]. In a paper presented at CHI 2019, Hube et al. [49] explore how crowdworkers
annotate machine learning data and propose a framework for mitigating their biases. The authors
argue that extreme personal opinions of workers can affect data labeling tasks and produce biased
data, especially when the tasks involve opinion detection and sentiment analysis. Consequently,
they add that “the ability to mitigate biased judgments from workers is crucial in reducing noisy
labels and creating higher quality data.” This research follows the line of much of the work in
crowdsourcing that rests on three premises: (1) that data should represent an absolute ground truth,
and that bias is a deviation from that truth value, (2) that data workers have enough agency to
interpret data according to their personal judgment and could, therefore, be prone to deviating
from the predefined truth value that data should represent, (3) and that workers using their own
subjectivity to interpret data is per se detrimental to the quality of data. Quite often, such approaches
to detect and mitigate worker bias do not consider that data workers constitute automation’s “last
mile” [41], that is, the bottom end of hierarchical labor structures, and that they collect and label
data within organizational structures and according to predefined truth values instructed to them
by managers and clients.
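As a concrete illustration of what such annotator-bias monitoring can look like in its simplest form, the sketch below flags annotators whose labeling rate deviates from the pooled rate. It is our own simplified example, not the method of any work cited above; all names, labels, and the threshold are invented.

def flag_deviating_annotators(labels_by_annotator, threshold=0.3):
    # labels_by_annotator maps an annotator id to a list of binary labels (0/1).
    # Returns the ids whose positive-label rate deviates from the pooled rate
    # by more than the (arbitrary) threshold.
    all_labels = [label for labels in labels_by_annotator.values() for label in labels]
    overall_rate = sum(all_labels) / len(all_labels)
    flagged = []
    for annotator, labels in labels_by_annotator.items():
        rate = sum(labels) / len(labels)
        if abs(rate - overall_rate) > threshold:
            flagged.append(annotator)
    return flagged

# Hypothetical toxicity labels (1 = "toxic", 0 = "not toxic") from three annotators.
labels = {
    "annotator_a": [1, 0, 1, 0, 1],
    "annotator_b": [1, 0, 1, 0, 0],
    "annotator_c": [0, 0, 0, 0, 0],  # consistently disagrees with the majority
}
print(flag_deviating_annotators(labels))  # ['annotator_c'] with these made-up numbers

A check like this cannot tell a careless annotator apart from one who legitimately reads the task differently or dissents from the imposed ground truth, which is precisely the conflation the remainder of this section problematizes.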
Socio-technical systems are complex in nature and this also includes the data work that fuels
them. Several issues framed by previous research as “worker bias” are actually manifestations
of broader power asymmetries that fundamentally shape data: power asymmetries that are as
trivial as being the boss in a tech company and having decision-making power, or being an underpaid
crowdworker who risks being banned from the platform if they do not follow instructions. We argue
that research that focuses on the personal biases of workers and aims at mitigating them could
benefit from an interrogation of power differentials, normalized preconceptions, and profit-oriented
interests that shape labor conditions in data work.
Let us look at some examples from our on-going research project that focuses on data work in
Latin America [61]. These examples should provide an idea of the identity of the workers whose
biases research attempts to mitigate. The workers participating in our project are located, like many
data workers, in Argentina and Venezuela. The Venezuelan economy is currently experiencing
the highest levels of inflation in the world and many people look for work with crowdsourcing
platforms because they offer a steady income, paid in US dollars. Melba, one of the crowdworkers
interviewed by us, is a retired woman. Her monthly pension is the equivalent of US$1, which, as
she puts it, is “not enough to buy half a dozen eggs; not enough to buy a piece of cheese or bread.”
The payment she receives for doing data work is also meager by international standards. However,
in a country experiencing hyperinflation, it allows her to supplement her income. In the case of
Juan, another crowdworker from Venezuela, the income from the platform is comparable to what he
would receive doing harvest work in the neighboring country, Colombia. However, working in data
annotation allows him to stay in Venezuela with his family instead of migrating and being apart.
In Argentina, most of the data workers that we interviewed live in the impoverished areas that
surround Buenos Aires. Despite the meager salaries they receive for data collection and annotation
tasks (the equivalent of US$1.80 per hour) and the exhausting nature of the work they perform,
all interviewees expressed being proud of their work. For many of them, doing data work means
finally having a desk job and breaking with generations of unlicensed cleaning or construction
work. Similarly, for many of the Venezuelan crowdworkers, having access to this type of work
means avoiding extreme poverty and having a means to circumvent many of the difficulties present
in their local labour market.


The cases described above are not extreme or marginal. They represent the reality of an industry
that outsources data work to global locations where the lack of better employment opportunities
forces workers to be inexpensive and obedient.
A growing body of literature in CSCW and HCI has taken crowdworkers’ perspective and
pointed to the issues of underpayment [47], crowdworkers’ growing dependency on performing
crowdsourcing tasks to make ends meet [82], the use of parameters and processes (e.g. the rate
of previously approved and paid tasks) to select and recruit crowdworkers [6], and the power
asymmetries introduced by crowdsourcing platform design and inherent in the relations between
service requesters and crowdworkers [51, 60, 83]. Ekbia and Nardi use the term heteromation
to characterize the shift in technology-mediated work and labor in which human intervention
and action are indispensable for technical systems to function [31]. They argue that heteromated
systems, like the platform Amazon Mechanical Turk, are the outcome of socioeconomic forces
rather than of the essential attributes of humans and machines, as commonly assumed [31]. The
authors not only scrutinize the asymmetrical labor relations in which crowdworkers are put at
a significantly disadvantaged position, but also emphasize that crowdworkers are regarded as
mere “functionaries” of algorithmic systems and are rendered invisible [31]. Apart from drawing
attention to invisible labor and asymmetrical labor relations, a political economic perspective
further highlights the profit-driven imperative of capital, the surveillance and social control enabled
and reinforced by digital technologies, and the political nature of design choices and technologies
that mediate work and labor [32, 33]. These studies are important examples within CSCW and HCI
of how shifting the researchers’ gaze upward to look into power dynamics can expose fundamentally
different issues with sociotechnical systems. However, they unfortunately have not received enough
attention from scholars in those very same research communities that investigate bias in data work.
Social and labor conditions affect the dependency of workers on data work, and that dependency
has an effect on how datasets are produced, such as restricting workers’ ability to raise questions
about annotation instructions and tasks. Expanding the question of how heteromated labor affects
crowdworkers, broader communities, and polities [31], we propose also asking how power asymme-
tries in heteromation inform machine learning datasets and systems. Starting from the assumption
that such power imbalances are the problem, not just bias, leads to fundamentally different research
questions and methods of inquiry. We believe that this perspective can significantly contribute to
broadening research on data worker and crowdsourcing bias.

2.3 Data Documentation Beyond Bias Mitigation


Several frameworks and tools to document machine learning datasets and models have been
proposed and applied. Significant examples are the work of Bender and Friedman with the Data
Statements for Natural Language Processing [8], Holland et al. with the Dataset Nutrition Label [48],
and most prominently, Mitchell et al. with Model Cards for Model Reporting [64], and Gebru et al.
with Datasheets for Datasets [38]. In these investigations, data bias appears as a core motivation for
developing documentation frameworks. The authors argue that documentation can help “diagnose
sources of bias” [48], and has potential to “mitigate unwanted biases in machine learning systems”
[38]. In the present subsection, we would like to discuss two ways to complement these approaches:
First, by expanding the documentation of dataset composition beyond merely listing a dataset’s
elements, and second, by considering the complex and intricate relationship between dataset
creators and dataset consumers. As we will argue, both considerations could allow us to expand
this line of research and explore power relations in machine learning through a CSCW-informed
perspective beyond bias-centered framings.
First, we argue for the inclusion of further information beyond the proposed list of data “ingredi-
ents”. For instance, one of the questions in Datasheets for Datasets asks “does the dataset identify
any subpopulations?” (e.g. by race, age, or gender). This way of documenting dataset composition
is key, but it also raises what we consider to be a valid question: Is this information sufficient
in itself to explicate unjust outcomes? Disclosing whether a dataset includes racial categories and
listing said categories “does not speak to the problem of such categories’ reductiveness, nor makes
the assumptions behind race classifications embedded in datasets explicit” [63]. We believe that
documentation can and should tell us more, for instance, about how data collectors and annotators
have established the correspondence between data point and category. Moreover, it is important
to consider that “to ascribe phenomena to a category — that is, to name a thing — is in turn
a means of reifying the existence of that category” [23], as Crawford and Paglen put it. When
the documentation of racial categories contained in a dataset is limited to listing them without
further reflection, the risk exists that the documentation could contribute to the reification and
naturalization of such categories.
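As a purely illustrative sketch of what such extended documentation could record, the following structure pairs each category used in a dataset with information about who defined it, under which instructions, and how annotators responded. It is a hypothetical example of ours, not part of Datasheets for Datasets or any other existing framework, and every field name and value is invented for illustration.

from dataclasses import dataclass, field

@dataclass
class CategoryRecord:
    # One documented category, including its provenance rather than only its name.
    name: str                    # e.g. a racial or gender category used as a label
    defined_by: str              # who established the taxonomy (client, manager, annotators)
    instruction_source: str      # the brief or guideline in which the category was prescribed
    annotator_feedback: list = field(default_factory=list)  # recorded dissent or questions
    known_limitations: str = ""  # reductions and assumptions acknowledged by the team

# Hypothetical entry: the category was imposed through a client brief and contested by annotators.
example = CategoryRecord(
    name="perceived gender: woman",
    defined_by="requester (client brief, version 3)",
    instruction_source="annotation guidelines, section 2",
    annotator_feedback=[
        "two annotators flagged ambiguous images",
        "a request to add a 'cannot tell' option was declined",
    ],
    known_limitations="binary taxonomy based solely on annotators' visual inference",
)

The specifics are invented; the point is that a record of this kind keeps the decision-making hierarchy visible alongside the category itself, instead of presenting the category as a neutral “ingredient.”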
Our second idea is to look deeper into the intricate relationship between data workers and
requesters. In their investigation, Gebru et al. [38] argue that Datasheets for Datasets would improve
communication between dataset creators and dataset consumers. The clear differentiation between
dataset creators and consumers surely applies to large datasets commonly used for benchmarking,
such as ImageNet. However, such a clear separation does not correspond with most of the machine
learning datasets that are created for commercial use. For instance, Feinberg [35] unveils “a multilayered set of interlocking design activities” in data infrastructure, collection, and aggregation in data
production. In many settings of data production, design activities and decisions are shaped, if not
determined, by dataset consumers and other external stakeholders rather than data workers, which
makes the former co-designers of datasets. In such settings, the distinction between consumers
and producers is at least ambiguous. Previous work [54, 63] has explored companies producing
tailor-made datasets to train their own ML models. These companies have particular requirements
in mind and produce data specifically tailored to the ML product they aim to develop. Many of
these organizations do outsource data collection and labelling but, even then, tasks are completed
according to the specific instructions provided by model developers — whom Gebru et al. call
“dataset consumers.” Once the labeled dataset is sent to the model developers, data is further cleaned
and sometimes re-labeled. In a similar vein, Seidelin [87], building on and extending Feinberg’s
design perspective of data, situates data work and practices in organizational, cross-organizational,
and multi-stakeholder contexts. Her research reveals that data work and data-based services are
by nature collaborative and cooperative, and that the design and production of data are better understood as
co-design processes. These perspectives challenge the clear separation between dataset producers
and consumers and show that dataset consumers are also dataset co-creators.
With both ideas described above, we seek to expand previous work in data documentation
beyond bias-related motivations. Merely listing the composition of a dataset without interrogating
the origins of its categories might be sufficient if the aim of documentation is “mitigating unwanted
biases”. However, it is not enough to unveil the political work those categories perform. Similarly,
the rigid differentiation between dataset producers and consumers could reinforce a logic similar to that of
the studies on worker bias described in the previous section: The responsibility for data quality
issues lies with data workers exclusively and requesters have no control over assumptions encoded
in datasets because they are mere “consumers.”
An extended perspective could also help to explicate why, despite growing calls for more
transparency in machine learning, data documentation practices are still limited in the machine
learning industry. Some factors to take into account are that requesters often regard the information
that should be documented as a corporate secret and that documentation is often perceived as an
optional task, in some cases even as a burden, that is time-consuming and expensive [63]. Moreover,
the lack of knowledge and training, be they technical or ethical, makes data workers less equipped to
reflect on what should go into documentation [54] and, even among informed workers, hierarchical
managerial structures in BPOs and the risk of being banned from data work platforms would probably
make workers reluctant to use documentation to reflect upon taken-for-granted practices.
To address such difficulties, researchers developing documentation frameworks could benefit
from the acknowledgement that data production is a collaborative project which demands coopera-
tive efforts from actors that hold different (organizational and social) positions and decision-making
power to shape data [24, 62]. While the bias-orientation of existing frameworks counteracts docu-
mentation’s potential to make power explicit and contestable, we believe that CSCW research could
significantly contribute to this goal. More than diagnosing “the source of bias,” documentation
should aim at interrogating work practices and decision-making hierarchies within and among
organizations.

3 CONCLUSION
This commentary has critically explored several implications of framing diverse socio-technical
problems as “bias” in machine learning. Through examples related to the study of ML datasets,
data work, and dataset documentation, we have argued for a shift of perspective to orient efforts
towards considering the effects of power asymmetries on data and systems.
Such reorientation not only concerns privileged groups among machine learning practitioners.
It is also about the role of researchers and the intertwined discourses in industry and academia
[42]. We need more research that interrogates the relationship between human subjectivities
and (inter-)organizational structures in processes of data production. Most importantly, power-
oriented investigations could allow researchers to “shift the gaze upward” [5] and move beyond a
simplistic view of individual behaviors and interpretations that, in many cases, ends up placing responsibility exclusively with data workers. Moreover, it could be helpful to investigate workers’
dissent not as a hazard but as an asset that could help flag broader data quality issues, as Aroyo
and Welty [2] have argued. A view into corporate work practices and market demands can offer a
wider perspective to this line of research [78].
Instead of technically correcting bias, this commentary is a call to study up machine learning data,
that is, to interrogate the set of power relations that inscribe specific forms of knowledge in machine
learning datasets. CSCW and HCI offer good examples of how different power conceptualizations
can help broaden the study of socio-technical systems. For instance, scholars have drawn on
feminist [7, 28, 65] and postcolonial [51, 76] theories to ask “Who” questions and make visible
power dynamics in technoscientific discourses, highlighting their political nature.
Our call also includes considering data workers as allies in the quest for producing better and
more just data, instead of portraying them as bias-carrying hazards. It means asking ourselves,
“how is AI shifting power?” [52] rather than “how can worker biases be mitigated?” Practitioners
and researchers would do well to reflect on the power asymmetries that are inherent in creating
data if the goal is accounting for “biased” data but, most importantly, for unjust socio-technical
systems. Despite the abundant work (including several examples cited here) that has shown how
power differentials shape data and data work, a number of investigations within our research
community still direct their efforts towards mitigating biases in data work and crowdsourcing
without considering the experiences and conditions of workers. Therefore, we insist on the need
to foster interdisciplinary dialogue. Both lines of research — the study of power and the study
of bias in ML data production — co-exist in parallel within CSCW and HCI. It is our hope that
this commentary will prompt conversations that lead to more collaboration and, ultimately, to the
advancement and broadening of this field of inquiry.


3.1 How and Why Study Up Data?


We conclude by proposing a power-oriented research agenda to study ML data along three interre-
lated lines:
First, we propose conducting more qualitative and ethnographic research on data workers and
data work production: Who are data workers? In what geographical and cultural contexts do
they perform data work? What are the workflows, corporate infrastructures, and intra- and inter-
organizational relationships in data production? How do these contexts affect data workers and
dataset production? Examining data work settings can further make explicit the assumptions, norms,
and values that inform and are inscribed in data work, allowing the “arenas of voice” [16, 88] and
ethical considerations of workers [79] to emerge. In this sense, we argue that a deeper investigation
into data workers and data production cannot be achieved through mere quantitative measures
and necessitates qualitative and exploratory research as well as the expertise of social scientists.
Second, we propose “shift[ing] the gaze upward” [5] and studying the actors who outsource
the creation of machine learning datasets: Who are data work requesters? What are their needs
and wants? What rationale and priorities do they inscribe in data work tasks? What are the
organizational forces driving them to produce and request data in specific ways? How do their
needs and demands affect data workers’ labor conditions? Investigating the role of ML practitioners
commissioning data-related tasks could help to explore the collaborative nature of data work and
would see requesters as co-designers of data, and not as mere consumers. Here, too, it is important
to look into the organizational settings in which the work of model developers is embedded.
Drawing attention to data work requesters and their organizations can therefore reveal the service
relationships, market logics, and the resulting power asymmetries that shape data work and, thereby,
data.
Finally, we propose expanding data documentation research and existing documentation frame-
works: How can data documentation become sensitive to power relations and data production
contexts? What would such a data documentation framework look like? How could organizations
be incentivized to adopt such a documentation approach? How can we go beyond recognizing the
power imbalances inscribed in data work and take action to bridge the power gap? Recognizing
and investigating power relations are the initial steps to challenge them [28]. In this sense, a
power-oriented data documentation framework can be one of the tools to render power — and
its imbalances — visible in data work. In line with previous research [38, 59, 63], we argue that
documentation frameworks should be grounded in the needs of workers, be integrated into existing
workflows and organizational infrastructure, and have the flexibility to accommodate specific work
scenarios.

ACKNOWLEDGMENTS
Funded by the German Federal Ministry of Education and Research (BMBF) – Nr 16DII113, the
International Development Research Centre of Canada, and the Schwartz Reisman Institute for
Technology and Society. We would like to acknowledge the data workers who have shared their
knowledge and experience with us so that we could develop the ideas outlined in these pages. Special
thanks to Alex Hanna, Bettina Berendt, and our anonymous reviewers for providing insightful
comments that helped us strengthen our argument.

REFERENCES
[1] Ali, M., Sapiezynski, P., Bogen, M., Korolova, A., Mislove, A., and Rieke, A. Discrimination Through Optimization:
How Facebook’s Ad Delivery Can Lead to Biased Outcomes. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019),
199:1–199:30. Article 199.


[2] Aroyo, L., and Welty, C. Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. AI Magazine 36, 1
(Mar. 2015), 15.
[3] Artstein, R., and Poesio, M. Bias decreases in proportion to the number of annotators. In Proceedings of FG-MoL
2005 : the 10th Conference on Formal Grammar and the 9th Meeting on Mathematics of Language, Edinburgh, 5–7 August,
2005 (2005), pp. 139–148.
[4] Baker, P., and Potts, A. ‘Why do white people have thin lips?’ Google and the perpetuation of stereotypes via
auto-complete search forms. Critical Discourse Studies 10, 2 (May 2013), 187–204.
[5] Barabas, C., Doyle, C., Rubinovitz, J., and Dinakar, K. Studying up: reorienting the study of algorithmic fairness
around issues of power. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona,
Spain, 2020), FAT* ’20, Association for Computing Machinery, pp. 167–176.
[6] Barbosa, N. M., and Chen, M. Rehumanized Crowdsourcing: A Labeling Framework Addressing Bias and Ethics
in Machine Learning. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow
Scotland Uk, May 2019), ACM, pp. 1–12.
[7] Bardzell, S., and Bardzell, J. Towards a feminist HCI methodology: social science, feminism, and HCI. In Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver BC Canada, May 2011), ACM, pp. 675–684.
[8] Bender, E. M., and Friedman, B. Data Statements for Natural Language Processing: Toward Mitigating System Bias
and Enabling Better Science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604.
[9] Benjamin, R. Race After Technology: Abolitionist Tools for the New Jim Code, 1. edition ed. Polity, Medford, MA, June
2019.
[10] Berendt, B. AI for the Common Good?! Pitfalls, challenges, and ethics pen-testing. Paladyn, Journal of Behavioral
Robotics 10, 1 (Jan. 2019), 44–65.
[11] Birhane, A., Kalluri, P., Card, D., Agnew, W., Dotan, R., and Bao, M. The Values Encoded in Machine Learning
Research. arXiv:2106.15590 [cs] (June 2021). arXiv: 2106.15590.
[12] boyd, d. How an Algorithmic World Can Be Undermined, 2018.
[13] Brodley, C. E., and Friedl, M. A. Identifying Mislabeled Training Data. Journal of Artificial Intelligence Research 11
(Aug. 1999), 131–167.
[14] Browne, S. Dark Matters: On the Surveillance of Blackness. Duke University Press, Durham, NC, 2015.
[15] Buolamwini, J., and Gebru, T. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Clas-
sification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (2018), vol. 81, PMLR,
pp. 77–91.
[16] Casilli, A. A. Digital labor studies go global: Toward a digital decolonial turn. International Journal of Communication
11 (2017), 3934–3954.
[17] Casilli, A. A., and Posada, J. The Platformisation of Labor and Society. In Society and the Internet, M. Graham and
W. H. Dutton, Eds., vol. 2 ed. Oxford University Press, Oxford, 2019.
[18] Casilli, A. A., Tubaro, P., Le Ludec, C., Coville, M., Besenval, M., Mouhtare, T., and Wahal, E. Le Micro-Travail
en France. Derrière l’automatisation de nouvelles précarités au travail ? Projet DiPLab « Digital Platform Labor », Paris,
2019.
[19] Chancellor, S., Baumer, E. P., and De Choudhury, M. Who is the “human” in human-centered machine learning:
The case of predicting mental health from social media. Proceedings of the ACM on Human-Computer Interaction 3,
CSCW (2019).
[20] Chen, L., Ma, R., Hannák, A., and Wilson, C. Investigating the Impact of Gender on Rank in Resume Search Engines.
In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC Canada, Apr. 2018),
ACM, pp. 1–14.
[21] Cheng, J., and Cosley, D. How annotation styles influence content and preferences. In Proceedings of the 24th
ACM Conference on Hypertext and Social Media - HT ’13 (Paris, France, 2013), Association for Computing Machinery,
pp. 214–218.
[22] Couldry, N., and Mejias, U. A. Data Colonialism: Rethinking Big Data’s Relation to the Contemporary Subject.
Television & New Media 20, 4 (May 2019), 336–349.
[23] Crawford, K., and Paglen, T. Excavating AI: The Politics of Images in Machine Learning Training Sets, Sept. 2019.
[24] Dafoe, A., Bachrach, Y., Hadfield, G., Horvitz, E., Larson, K., and Graepel, T. Cooperative AI: machines must
learn to find common ground. Nature 593, 7857 (may 2021), 33–36.
[25] Citron, D. K. The Scored Society: Due Process for Automated Predictions. Washington Law Review 89, 1 (Mar. 2014), 1–33.
[26] Davis, H. A Dataset is a Worldview, Mar. 2020. towardsdatascience.com.
[27] Denton, E., Hanna, A., Amironesei, R., Smart, A., Nicole, H., and Scheuerman, M. K. Bringing the People Back In:
Contesting Benchmark Machine Learning Datasets. arXiv:2007.07399 [cs] (July 2020). arXiv: 2007.07399.
[28] D’Ignazio, C., and Klein, L. F. Data feminism. Strong ideas series. The MIT Press, Cambridge, Massachusetts, 2020.


[29] Dixon, L., Li, J., Sorensen, J., Thain, N., and Vasserman, L. Measuring and Mitigating Unintended Bias in Text
Classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society - AIES ’18 (New Orleans, LA,
USA, 2018), ACM Press, pp. 67–73.
[30] Eickhoff, C. Cognitive Biases in Crowdsourcing. In Proceedings of the Eleventh ACM International Conference on Web
Search and Data Mining (Marina Del Rey CA USA, Feb. 2018), ACM, pp. 162–170.
[31] Ekbia, H., and Nardi, B. Heteromation and its (dis)contents: The invisible division of labor between humans and
machines. First Monday (May 2014).
[32] Ekbia, H., and Nardi, B. The political economy of computing: the elephant in the HCI room. Interactions 22, 6 (Oct.
2015), 46–49.
[33] Ekbia, H., and Nardi, B. Social Inequality and HCI: The View from Political Economy. In Proceedings of the 2016 CHI
Conference on Human Factors in Computing Systems (San Jose California USA, May 2016), ACM, pp. 4997–5002.
[34] Fan, S., Gadiraju, U., Checco, A., and Demartini, G. CrowdCO-OP: Sharing Risks and Rewards in Crowdsourcing.
Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (Oct. 2020), 1–24.
[35] Feinberg, M. A Design Perspective on Data. In CHI ’17: Proceedings of the 2017 CHI Conference on Human Factors in
Computing Systems (Denver, Colorado, USA, 2017), CHI ’17, Association for Computing Machinery, pp. 2952–2963.
[36] Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., and Dredze, M. Annotating named entities in
Twitter data with crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language
Data with Amazon’s Mechanical Turk (Los Angeles, California, June 2010), CSLDAMT ’10, Association for Computational
Linguistics, pp. 80–88.
[37] Forsythe, D. E. Studying Those Who Study Us: An Anthropologist in the World of Artificial Intelligence. Stanford
University Press, Stanford, 2001.
[38] Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. Datasheets
for Datasets. arXiv:1803.09010 [cs] (Mar. 2020). arXiv: 1803.09010.
[39] Geva, M., Goldberg, Y., and Berant, J. Are We Modeling the Task or the Annotator? An Investigation of Annotator
Bias in Natural Language Understanding Datasets. In Proceedings of the 2019 Conference on Empirical Methods in
Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
(Hong Kong, China, 2019), Association for Computational Linguistics, pp. 1161–1166.
[40] Ghai, B., Liao, Q. V., Zhang, Y., and Mueller, K. Measuring Social Biases of Crowd Workers using Counterfactual
Queries.
[41] Gray, M. L., and Suri, S. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton
Mifflin Harcourt, Boston, May 2019.
[42] Green, B. Data Science as Political Action: Grounding Data Science in a Politics of Justice. SSRN Scholarly Paper ID
3658431, Social Science Research Network, Rochester, NY, July 2020.
[43] Haimson, O. L., and Hoffmann, A. L. Constructing and enforcing “authentic” identity online: Facebook, real names,
and non-normative identities. First Monday (June 2016).
[44] Hamidi, F., Scheuerman, M. K., and Branham, S. M. Gender Recognition or Gender Reductionism?: The Social
Implications of Embedded Gender Recognition Systems. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (Montreal QC Canada, Apr. 2018), ACM, pp. 1–13.
[45] Hanna, A., Denton, E., Smart, A., and Smith-Loud, J. Towards a Critical Race Methodology in Algorithmic Fairness.
In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain, 2020), FAT* ’20,
Association for Computing Machinery, pp. 501–512.
[46] Hannák, A., Wagner, C., Garcia, D., Mislove, A., Strohmaier, M., and Wilson, C. Bias in Online Freelance
Marketplaces: Evidence from TaskRabbit and Fiverr. In Proceedings of the 2017 ACM Conference on Computer Supported
Cooperative Work and Social Computing (Portland Oregon USA, Feb. 2017), ACM, pp. 1914–1933.
[47] Hara, K., Adams, A., Milland, K., Savage, S., Callison-Burch, C., and Bigham, J. P. A Data-Driven Analysis of
Workers’ Earnings on Amazon Mechanical Turk. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (Montreal QC Canada, Apr. 2018), ACM, pp. 1–14.
[48] Holland, S., Hosny, A., Newman, S., Joseph, J., and Chmielinski, K. The Dataset Nutrition Label: A Framework To
Drive Higher Data Quality Standards. arXiv:1805.03677 (2018).
[49] Hube, C., Fetahu, B., and Gadiraju, U. Understanding and Mitigating Worker Biases in the Crowdsourced Collection
of Subjective Judgments. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (New York,
NY, USA, 2019), CHI ’19, Association for Computing Machinery, pp. 1–12.
[50] Hunter-Syed, A., and Gebru, T. Timnit Gebru on Algorithmic Bias & Data Mining Ethics, Apr. 2020.
[51] Irani, L. C., and Silberman, M. S. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France, Apr. 2013), CHI ’13,
Association for Computing Machinery, pp. 611–620.


[52] Kalluri, P. Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature 583, 7815 (July 2020),
169.
[53] Kay, M., Matuszek, C., and Munson, S. A. Unequal Representation and Gender Stereotypes in Image Search Results
for Occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul
Republic of Korea, Apr. 2015), ACM, pp. 3819–3828.
[54] Kazimzade, G., and Miceli, M. Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-oriented Data
Annotation Practices. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. (New
York, NY, USA, Feb. 2020), AIES ’20, Association for Computing Machinery, pp. 1–7.
[55] Kitchin, R. The data revolution: big data, open data, data infrastructures & their consequences. SAGE Publications, Los
Angeles, California, 2014.
[56] Kogan, M., Halfaker, A., Guha, S., Aragon, C., Muller, M., and Geiger, S. Mapping Out Human-Centered Data
Science: Methods, Approaches, and Best Practices. In Companion of the 2020 ACM International Conference on Supporting
Group Work (Sanibel Island Florida USA, Jan. 2020), ACM, pp. 151–156.
[57] Kulshrestha, J., Eslami, M., Messias, J., Zafar, M. B., Ghosh, S., Gummadi, K. P., and Karahalios, K. Quantifying
Search Bias: Investigating Sources of Bias for Political Searches in Social Media. In Proceedings of the 2017 ACM
Conference on Computer Supported Cooperative Work and Social Computing (New York, NY, USA, 2017), CSCW ’17,
Association for Computing Machinery, pp. 417–432.
[58] Lee, M. S. A., and Floridi, L. Algorithmic Fairness in Mortgage Lending: from Absolute Conditions to Relational
Trade-offs. Minds and Machines 31, 1 (Mar. 2021), 165–191.
[59] Madaio, M. A., Stark, L., Wortman Vaughan, J., and Wallach, H. Co-Designing Checklists to Understand
Organizational Challenges and Opportunities around Fairness in AI. In Proceedings of the 2020 CHI Conference on
Human Factors in Computing Systems (Honolulu, HI, USA, Apr. 2020), CHI ’20, Association for Computing Machinery,
pp. 1–14.
[60] Martin, D., Hanrahan, B. V., O’Neill, J., and Gupta, N. Being a turker. In Proceedings of the 17th ACM conference on
Computer supported cooperative work & social computing (Baltimore Maryland USA, Feb. 2014), ACM, pp. 224–235.
[61] Miceli, M., and Posada, J. Wisdom for the Crowd: Discoursive Power in Annotation Instructions for Computer
Vision. arXiv:2105.10990 [cs] (May 2021). arXiv: 2105.10990.
[62] Miceli, M., Schuessler, M., and Yang, T. Between Subjectivity and Imposition: Power Dynamics in Data Annotation
for Computer Vision. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (Oct. 2020), 1–25.
[63] Miceli, M., Yang, T., Naudts, L., Schuessler, M., Serbanescu, D., and Hanna, A. Documenting Computer Vision
Datasets: An Invitation to Reflexive Data Practices. In Proceedings of the 2021 ACM Conference on Fairness, Accountability,
and Transparency (Virtual Event Canada, Mar. 2021), ACM, pp. 161–172.
[64] Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., and Gebru,
T. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency
(2019), FAT* ’19, Association for Computing Machinery, pp. 220–229.
[65] Muller, M. Feminism asks the “Who” questions in HCI. Interacting with Computers 23, 5 (Sept. 2011), 447–449.
[66] Muller, M., Aragon, C., Guha, S., Kogan, M., Neff, G., Seidelin, C., Shilton, K., and Tanweer, A. Interrogating
Data Science. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social
Computing (Virtual Event USA, Oct. 2020), ACM, pp. 467–473.
[67] Muller, M., Feinberg, M., George, T., Jackson, S. J., John, B. E., Kery, M. B., and Passi, S. Human-Centered Study
of Data Science Work Practices. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing
Systems (Glasgow Scotland Uk, May 2019), ACM, pp. 1–8.
[68] Muller, M., Lange, I., Wang, D., Piorkowski, D., Tsay, J., Liao, Q. V., Dugan, C., and Erickson, T. How Data Science
Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In Proceedings of the 2019 CHI Conference
on Human Factors in Computing Systems (Glasgow, Scotland Uk, 2019), CHI ’19, Association for Computing Machinery,
pp. 1–15.
[69] Muller, M., Wolf, C. T., Andres, J., Desmond, M., Joshi, N. N., Ashktorab, Z., Sharma, A., Brimijoin, K., Pan, Q.,
Duesterwald, E., and Dugan, C. Designing Ground Truth and the Social Life of Labels. In Proceedings of the 2021
CHI Conference on Human Factors in Computing Systems (Yokohama Japan, May 2021), ACM, pp. 1–16.
[70] Noble, S. U. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press, New York, 2018.
[71] Olteanu, A., Castillo, C., Diaz, F., and Kiciman, E. Social Data: Biases, Methodological Pitfalls, and Ethical
Boundaries. SSRN Electronic Journal (2016).
[72] Passi, S., and Barocas, S. Problem Formulation and Fairness. In Proceedings of the Conference on Fairness, Accountability,
and Transparency (Atlanta, GA, USA, 2019), FAT* ’19, Association for Computing Machinery, pp. 39–48.
[73] Passi, S., and Jackson, S. Data Vision: Learning to See Through Algorithmic Abstraction. In Proceedings of the 2017
ACM Conference on Computer Supported Cooperative Work and Social Computing (Portland, Oregon, USA, 2017), CSCW
’17, Association for Computing Machinery, pp. 2436–2447.


[74] Passi, S., and Jackson, S. J. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data
Science Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW (Nov. 2018), 1–28.
[75] Paullada, A., Raji, I. D., Bender, E. M., Denton, E., and Hanna, A. Data and its (dis)contents: A survey of dataset
development and use in machine learning research. arXiv:2012.05345 [cs] (Dec. 2020). arXiv: 2012.05345.
[76] Philip, K., Irani, L., and Dourish, P. Postcolonial Computing: A Tactical Survey. Science, Technology, & Human
Values 37, 1 (Jan. 2012), 3–29.
[77] Pine, K. H., and Liboiron, M. The Politics of Measurement and Action. In Proceedings of the 33rd Annual ACM
Conference on Human Factors in Computing Systems (New York, NY, USA, 2015), CHI ’15, Association for Computing
Machinery, pp. 3147–3156.
[78] Posada, J. The Future of Work Is Here: Toward a Comprehensive Approach to Artificial Intelligence and Labour.
Ethics of AI in Context (2020).
[79] Posada, J. Unbiased: AI Needs Ethics from Below. In New AI Lexicon, N. Raval, A. Kak, and L. Strathman, Eds. AI Now
Institute, New York, NY, 2021.
[80] Powles, J., and Nissenbaum, H. The Seductive Diversion of ‘Solving’ Bias in Artificial Intelligence, Dec. 2018.
[81] Ramanath, R., Choudhury, M., Bali, K., and Roy, R. S. Crowd Prefers the Middle Path: A New IAA Metric for
Crowdsourcing Reveals Turker Biases in Query Segmentation. In Proceedings of the 51st Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers) (Sofia, Bulgaria, 2013), Association for Computational
Linguistics, pp. 1713–1722.
[82] Ross, J., Irani, L., Silberman, M. S., Zaldivar, A., and Tomlinson, B. Who are the crowdworkers?: shifting
demographics in mechanical turk. In CHI ’10 Extended Abstracts on Human Factors in Computing Systems (Atlanta
Georgia USA, Apr. 2010), ACM, pp. 2863–2872.
[83] Salehi, N., Irani, L. C., Bernstein, M. S., Alkhatib, A., Ogbe, E., Milland, K., and Clickhappier. We Are Dynamo:
Overcoming Stalling and Friction in Collective Action for Crowd Workers. In Proceedings of the 33rd Annual ACM
Conference on Human Factors in Computing Systems (Seoul Republic of Korea, Apr. 2015), ACM, pp. 1621–1630.
[84] Sampat, R. Protesters target Facebook’s ’real name’ policy, June 2015.
[85] Scheuerman, M. K., Wade, K., Lustig, C., and Brubaker, J. R. How We’ve Taught Algorithms to See Identity:
Constructing Race and Gender in Image Databases for Facial Analysis. Proc. ACM Hum.-Comput. Interact. 4, CSCW1
(2020). Article 058.
[86] Seaver, N. Studying Up: The Ethnography of Technologists, Mar. 2014.
[87] Seidelin, C. Towards a Co-design Perspective on Data : Foregrounding Data in the Design and Innovation of Data-based
Services. Ph.D. thesis, IT-Universitetet i København, 2020.
[88] Star, S. L., and Strauss, A. Layers of Silence, Arenas of Voice: The Ecology of Visible and Invisible Work. Computer
Supported Cooperative Work 8, 1-2 (Mar. 1999), 9–30.
[89] Tubaro, P., and Casilli, A. A. Micro-work, artificial intelligence and the automotive industry. Journal of Industrial
and Business Economics (2019).
[90] Wauthier, F. L., and Jordan, M. I. Bayesian Bias Mitigation for Crowdsourcing. In Proceedings of the 24th International
Conference on Neural Information Processing Systems (Granada, Spain, 2011), NIPS’11, Curran Associates Inc., pp. 1800–
1808.
[91] Yano, T., Resnik, P., and Smith, N. A. Shedding (a Thousand Points of) Light on Biased Language. In Proceedings of
the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (Los Angeles,
2010), Association for Computational Linguistics, pp. 152–158.
[92] Zhang, A. X., Muller, M., and Wang, D. How do Data Science Workers Collaborate? Roles, Workflows, and Tools.
Proceedings of the ACM on Human-Computer Interaction 4, CSCW1 (May 2020), 1–23.

Received July 2021; revised September 2021; accepted October 2021

3 Meaning Imposition and Epistemic Authority in Data Annotation

3.1 Background and Motivation


In the same way that training data shapes ML systems, ground-truth data is conditioned by the
production contexts in which it is designed, developed, and deployed. Beyond technical execution
and operation, producing and working with data involves mastering forms of interpretation [42,
43, 47, 21, 134]. Consequently, examining data provenance and the work practices involved in
data production is essential for investigating subjective worldviews and assumptions embedded
in datasets and models [135, 136, 137].
Following the research agenda outlined in Paper 1 [37], this chapter presents an ethnographic
study of data-work settings and practices. To delve deeper into investigating the relationship
between data production and power, the present chapter focuses on one type of data work and
one specific ML application, namely, data annotation for computer vision.
In formal terms, image data annotation involves tasks such as curation, labeling, keywording,
and semantic segmentation, which consists of marking and separating the different objects
contained in a picture. But beyond specific task descriptions, annotation work is fundamentally
about making sense of data, i.e., interpreting the complex information contained in each image.
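As a purely illustrative sketch, an annotation produced in such a project might be stored as a record that couples labels and segmentation outlines with the instructions under which they were created. The format below is loosely modeled on common polygon-based annotation schemes; none of the field names or values stem from the field sites.

import json

# Hypothetical annotation record for a single image; all identifiers and categories are invented.
annotation = {
    "image_id": "img_000123",
    "annotator_id": "worker_17",              # assigned by the BPO, not chosen by the worker
    "instruction_version": "client brief, version 2",
    "labels": [
        {
            "category": "pedestrian",          # taxonomy fixed by the requester
            "polygon": [[34, 50], [60, 52], [58, 110], [32, 108]],  # segmentation outline in pixels
            "keywords": ["street", "daytime"],
        }
    ],
    "review_status": "approved by QA",
}

print(json.dumps(annotation, indent=2))

Even in a structure this simple, most of the interpretive choices, such as the taxonomy, the keyword vocabulary, and what counts as an acceptable polygon, are fixed upstream of the annotator, which is precisely the dynamic Paper 2 examines.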
The work included in this chapter, Paper 2 (Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision), is my first first-author paper. It presents
a grounded theory [95, 97, 96] investigation based on the early phase of exploratory fieldwork
at S1 and S2, the BPOs located in Argentina and Bulgaria. It includes interviews with data
annotators, BPOs management, and computer vision practitioners. It describes some of the
annotation projects involving image data that I observed at both field sites. In this paper,
my co-authors and I explore the role of workers’ subjectivity in the classification and labeling
of images and describe structures and standards that shape the interpretation of data. The
investigation is based on the following research questions:

1. How do data annotators make sense of data?


2. What conditions, structures, and standards shape that sense-making practice?

3. Who, and at what stages of the annotation process, decides which classifications best
define each data point?

The findings show that the work of data annotators is profoundly influenced by the interests,
values, and priorities of clients. Arbitrary classifications are imposed top-down on annotators,
and through them, on data. In this sense, the power to impose labels and meanings correlates
with the possession of financial means to pay for data annotators who execute that imposition.
This form of power, epistemic authority, and imposition is largely naturalized in industry settings because it appears to be a matter of common sense that data should be labelled according to the requirements of paying clients, no matter how arbitrary or harmful such classifications might be. Against this background, the main contribution of this paper is to bring the political-economy dimension of data production into the discussion and to show that naturalized power asymmetries, not the individual biases of annotators, fundamentally shape ML data and systems.
In view of these findings, we close this paper with a call for systematic documentation
of ML datasets. Documentation should, of course, reflect datasets’ technical features, but it
should also be able to make explicit the actors, hierarchies, and rationale behind the labels
assigned to data. Based on this call, Chapter 5 will explore forms of documentation to reflect
data production processes, guided by the value of reflexivity and based on the possibilities and
desiderata of data workers.
Paper 2 was published in October 2020 in the journal Proceedings of the ACM on Human-
Computer Interaction and presented at the 2020 ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW’20), where it received a Best Paper Award.
To comply with TU Berlin’s regulations, I make my contribution and that of my co-authors
explicit: The first idea and complete manuscript were drafted by me in January 2020 based on
the interviews and observations I had conducted up until then. Tianling Yang contributed to the literature research and was essential in coding the data and establishing inter-subject comprehensibility [120]. Martin Schuessler guided me through the process of writing a scientific paper, as this was my first one. Moreover, his expertise in HCI was key to making this work legible to the CSCW community and, more broadly, to computer scientists.


Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision

MILAGROS MICELI, Technische Universität Berlin, Weizenbaum Institut, Germany
MARTIN SCHUESSLER, Technische Universität Berlin, Weizenbaum Institut, Germany
TIANLING YANG, Technische Universität Berlin, Weizenbaum Institut, Germany
The interpretation of data is fundamental to machine learning. This paper investigates practices of image
data annotation as performed in industrial contexts. We define data annotation as a sense-making practice,
where annotators assign meaning to data through the use of labels. Previous human-centered investigations
have largely focused on annotators’ subjectivity as a major cause of biased labels. We propose a wider view
on this issue: guided by constructivist grounded theory, we conducted several weeks of fieldwork at two
annotation companies. We analyzed which structures, power relations, and naturalized impositions shape the
interpretation of data. Our results show that the work of annotators is profoundly informed by the interests,
values, and priorities of other actors above their station. Arbitrary classifications are vertically imposed on
annotators, and through them, on data. This imposition is largely naturalized. Assigning meaning to data
is often presented as a technical matter. This paper shows it is, in fact, an exercise of power with multiple
implications for individuals and society.
CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social com-
puting; • Social and professional topics → Employment issues; • Computing methodologies →
Supervised learning by classification.
Additional Key Words and Phrases: Machine Learning, Computer Vision, Data Annotation, Image Data, Power,
Social Inequity, Grounded Theory, Symbolic Power, Classification, Subjectivity, Data Creation, Work Place
Ethnography, Training and Evaluation Data, Image Labeling
ACM Reference Format:
Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition: Power
Dynamics in Data Annotation for Computer Vision. Proc. ACM Hum.-Comput. Interact. 1, 1, Article 115
(October 2020), 25 pages. https://doi.org/10.1145/3415186

1 INTRODUCTION
Power imbalances related to practices of classification have long been a topic of interest for
the social sciences [9, 11, 13, 16, 32, 57]. What is (relatively) new is that arbitrary classifications
are increasingly established and stabilized through automated algorithmic systems [57, 62]. With
each system’s outcome, meaning is imposed, and higher or lower social positions, chances, and
disadvantages are assigned [6, 24, 37, 63]. These systems are often expected to minimize human
intervention in decision-making and thus be neutral and value-free [23, 51, 73]. However, previous
research has shown that they may contain biases that lead to discrimination and exclusion in several
domains such as credit [37], the job market [70], facial recognition systems [19, 45, 71], algorithmic
filtering [4, 62], and even advertisement [1]. Critical academic work has furthermore discussed the
politics involved in data-driven systems [27, 30, 56] and highlighted the importance of investigating
the capitalistic logics woven into them [20, 26, 81]. What the enthusiasm of technologists seems to
render invisible is that algorithmic systems are crafted by humans and hence laden with subjective
judgments, values, and interests [31, 44]. Moreover, before the smartest system is able to make
predictions, humans first need to make sense of the data that feeds it [61, 66, 72]. Despite its
highly interpretative character, data-related work is still often believed to be neutral, “comprising
unambiguous data, and proceeding through regularized steps of analysis” [61].
The present paper investigates data annotation for computer vision based on three research
questions: How do data annotators make sense of data? What conditions, structures, and standards
shape that sense-making praxis? Who, and at what stages of the annotation process, decides which
classifications best define each data point? We present a constructivist grounded theory [21, 59, 60]
investigation comprising several weeks of fieldwork at two annotation companies and 24 interviews
with annotators, management, and computer vision practitioners. We define data annotation as a
sense-making [52] process where actors classify data by assigning meaning to its content through
the use of labels. As we have observed, this process involves several actors and iterations and begins
as clients transform their needs and expectations into annotation instructions. The sensemaking of
data, so we argue, does not happen in a vacuum and cannot be analyzed independently from the
context in which it is carried out.
We use Bourdieu’s [13] concept of symbolic power, defined as the authority to impose meanings
that will appear as legitimate and part of a natural order of things, as a lens to analyze the dynamics
of imposition and naturalization inscribed in the classification, sorting, and labeling of data. Previous
research in the field of data annotation has largely focused on workers’ individual subjectivities as
a major cause for biased labels [18, 40, 48, 77]. Conversely, our investigation introduces a power-
oriented perspective and shows that hierarchical structures broadly inform the interpretation of
data. Top-down meaning impositions that follow the demands of clients and the market shape data
profoundly.
With this investigation, we seek to orient the discussion towards the interests and values
embedded in the systems that potentially shape our individual life-chances [37]. Through the
description of three observed annotation projects, we expose the deeply normative nature of
such forms of data classification and discuss their effects on labels and datasets. Building on this
perspective, we propose the incorporation of power-aware documentation in processes of data
annotation as a method to restore context. We argue that reflexive practices can improve deliberative
accountability, compliance with regulations, and the explication and preservation of effective data
work knowledge. With this work, we also hope to inspire researchers to adopt a situated and
power-aware perspective not only to investigate practices of data creation but also as a tool for
reflecting on power dynamics in their own research process.

2 RELATED WORK
2.1 Data Work as Human Activity
Previous work has argued that data-driven systems are often linked to “a technologically-inflected
promise of mechanical neutrality” [41]. However, these systems require, in many cases, the in-
tervention of human discretion in their deployment [2, 64], and even more frequently, in their
creation [18, 22, 35, 36, 39, 48, 51, 61, 66, 67]. Moreover, critical research has argued that data-driven
systems embody the personal and corporate values and interests of the people and organizations
involved in their development [31, 51, 53, 57]. As Klinger and Svensson state, “arguments that
technology had agency on its own hide the individuals, structures, and relations in power and

thus serve their interests, interests that become increasingly blurred” [53]. A view into the power
dynamics encoded in data and systems is, as we will argue, of fundamental importance, especially
considering that “the technical nature of the procedures tends to mask the presumptions that enter
into the programming process, the choices that are made, and the conceivable alternatives that are
ruled out” [57].
Besides technical exercise and operation, the development of data-driven products involves
“mastering forms of discretion” [66] and is conditioned by the networked systems in which they
are created, developed, and deployed [51, 73]. Kitchin [51] pinpoints various processes and factors
that reveal extensive human interventions in data-driven systems, such as the translation of tasks
into algorithmic models, available resources, the choice of training data, hardware and platform,
the creative process of programming, and adaptation of systems to meet requirements of standards
and regulations. He further argues that algorithmic systems are subject to the purposes of their
creation: “to create value and capital; to nudge behaviour and structure preferences in a certain
way; and to identify, sort and classify people” [51].
The examination of the provenance of data and the work practices involved in their creation
is fundamental for the investigation of subjectivities and assumptions embedded in algorithmic
systems. Passi and Jackson [66] propose the concept of data vision to describe the ability to
successfully work with data through an effective combination of formal knowledge and tools, and
situated decisions in the face of empirical contingency. Mastery of this interplay is essential to data
analysts, which reveals “the breadth and depth of human work” inscribed in data [66].
Embedded in such processes are not only individual subjectivities, but also narratives, prefer-
ences, and values related to larger socio-economic contexts [8, 50, 67]: “Numbers not only signify
model performance or validity, but also embody specific technical ideals and business values.” [67].
Data practices such as the choice of training data, data capturing measurement interfaces [68], and
the selection of data attributes [61] as well as the design of data in an algorithmically recognizable,
tractable, and analyzable way [35, 61], all indicate that data is created through human interven-
tion [61]. Feinberg points to the “interpretive flexibility” and situated nature of data and considers
data as a product of “interlocking design decisions” made by data designers [35]. According to
Muller et al. [61], the degree of human intervention will determine how deeply and fundamentally
subjective interpretations are inscribed in data and its analysis.
The present paper unpacks data annotation practices with a human-centered perspective. The
practices we have observed and analyzed are situated in outsourcing companies that provide
annotation services for commissioning clients. As previous work has argued, service is situated
in local, cultural, and social contexts [50] and is co-produced and co-created in the interactions
between service providers and recipients [8]. This perspective sheds light on the situated [33, 66]
and collaborative [35, 67] nature of data work, as clients and annotation teams both participate in
the creation of datasets. Scrutinizing data annotation with a service perspective further requires
taking into consideration institutional structures and organizational routines [3, 50].
Annotation tasks are, as we will argue, mainly about sensemaking [52], i.e. framing data to
make it categorizable, sortable, and interpretable. Previous work in this space has largely focused
on individual preconceptions, considering annotators’ subjectivities to be a major source for
labeling bias [18, 40, 48, 77]. Other researchers (we among them) explore factors beyond individual
subjectivities that influence workers and labels, such as loosely-defined annotation guidelines and
annotation context [36], the choice of annotation styles [22], and the interference between items
in the same data batch [80]. In a thorough investigation into annotation practices in academic
research, Geiger et al. [39] draw attention to the background of annotators, formal definitions and
qualifications, training, pre-screening for crowdwork platforms, and inter-rater reliability processes.

The authors consider these factors to be likely to influence the annotations and advocate for their
documentation.
With the present paper, we join the discussion around subjectivity in data annotation. By
examining the processes and contexts that shape this line of work, we argue that subjectivity can
also be shaped by power structures that enable the imposition of meanings and classifications.

2.2 Data, Classification, Power


Practices related to classifying and naming constitute the core of data annotation work. As Bowker
and Star [16] have most prominently argued, classifications represent subjective social and technical
choices that have significant yet usually hidden or blurry ethical and political implications [79].
Classification practices are constructed and, at the same time, construct the social reality we
perceive and live in [11]. Therefore, they are also culturally and historically specific [46]. Adopting
a critical position to examine these practices is essential because, as Durkheim and Mauss argue,
“every classification implies a hierarchical order for which neither the tangible world nor our mind
gives us the model. We therefore have reason to ask where it was found” [32].
Humans collect, label, and analyze data in the usually invisible context of a plan that determines
what is considered data [17, 68] and how that data is to be classified [16]. “A dataset is a worldview”,
as Davis [29] wonderfully puts it. Accordingly, it can be neither objective nor exhaustive because “it
encompasses the worldview of the labelers, whether they labeled the data manually, unknowingly,
or through a third party service like Mechanical Turk, which comes with its own demographic biases.
It encompasses the worldview of the built-in taxonomies created by the organizers, which in many
cases are corporations whose motives are directly incompatible with a high quality of life.” [29].
Furthermore, decisions about what information to collect and how to measure and interpret data
define possibilities for action by making certain aspects of the social world visible – thus measurable
– while excluding other aspects [30, 68]. Data-related decisions are infrastructural decisions [16, 68]
as they “exercise covert political power by bringing certain things into spreadsheets and data
infrastructures, and thus into management and policy” [68]. This way, datasets are powerful
technologies [16] that bring into existence what they contain, and render invisible what they
exclude. As Bowker argues, “the database itself will ultimately shape the world in its image: it will
be performative.” [15].
The performative character of datasets, that is, the power of creating reality through inclusion
and exclusion, relates to Pierre Bourdieu’s theorization of symbolic power. Symbolic power is
the authority to sort social reality by separating groups, classifying, and naming them [10, 13].
Every act of classification is an attempt to impose a specific reading of the social world over other
possible interpretations [57]. Thus, symbolic power is not merely a matter of naming or describing
social reality but a way of “making the world through utterance” [13]. The power aspect here
relates to the authority to lend legitimacy to certain definitions while delegitimizing others. This
authority is unevenly distributed and correlates with the possession of economic, cultural, and
social capital [11].
According to Bourdieu [13], dominant worldviews find their origin in arbitrary classifications
that serve to legitimize and perpetuate power asymmetries, by making seem natural what is in
fact political: “Every established order tends to produce (to very different degrees and with very
different means) the naturalization of its own arbitrariness” [9]. The systems of meaning created
through acts of symbolic power are arbitrary because they are not deduced from any natural
principle but subject to the interests and values of those in a dominant position at a given place and
time in history [12]. A combination of recognition and misrecognition is necessary to guarantee the
efficacy of arbitrary classifications [9]: the authority to impose classifications must be recognized as
legitimate, for the imposition to actually be misrecognized in its arbitrariness and be perceived as

natural. This process of naturalization allows arbitrary ways of sorting the social world to become
so deeply ingrained that people come to accept them as natural and indisputable. As argued by
D’Ignazio and Klein [30], “once a [classification] system is in place, it becomes naturalized as ‘the
way things are’”. Thus, the worldviews imposed through symbolic power are rendered less and less
visible in their arbitrariness, until disappearing into the realms of what is considered common sense.
As we will argue, the interplay between recognition of authority and naturalization of arbitrary
classifications decisively shapes annotations and data.
Previous investigations have related discriminatory or exclusionary outputs of data-driven
systems to symbolic power: Mau argues that “advancing digitalization and the growing importance
of Big Data have led to the rapid rise of algorithms as the primary instruments of nomination
power” [57]. Here, nomination refers to the authority to name and classify. The author describes
the ubiquity of an algorithmic authority embedded in a wide range of procedures and increasingly
participating in the reinforcement of social classifications. Crawford and Paglen [27] discuss the
politics involved in training sets for image classification. The authors expose the power dynamics
implicit in the interpretation of images as it constitutes “a form of politics, filled with questions
about who gets to decide what images mean and what kinds of social and political work those
representations perform” [27]. Although they do not reference Bourdieu directly, Crawford and Paglen’s
conclusion closely relates to what the French sociologist has described as the “social magic” [13]
of creating reality through naming and classifying: “There’s a kind of sorcery that goes into the
creation of categories. To create a category or to name things is to divide an almost infinitely
complex universe into separate phenomena. To impose order onto an undifferentiated mass, to
ascribe phenomena to a category—that is, to name a thing—is in turn a means of reifying the
existence of that category.” [27]
Investigating data as a human-influenced entity [61] informed by power asymmetries [5] means
understanding both data and power relationally. Data exists as such through human interven-
tion [61] because, as we have seen, “raw data is an oxymoron” [42]. Similarly, Bourdieu [10] offers
a relational view of power as enacted in the interaction among actors as well as between actors and
field. In the discussion section, we will analyze the relation between annotators, data, and corporate
structures. The symbolic power construct will then offer a valuable contribution to the discussion
of assumptions encoded in datasets that reflect the naturalization of practices and meanings [9, 28].

3 METHOD
This investigation was guided by three research questions:
RQ1: How do data annotators make sense of data?
RQ2: What conditions, structures, and standards shape that sense-making praxis?
RQ3: Who, and at what stages of the annotation process, decides which classifications best
define each data point?
We followed a constructivist variation of grounded theory methodology (GTM) [21, 59, 60]. The
central premise of constructivist grounded theory is that neither data nor theories are discovered,
but are formed by the researcher’s interactions with the field and its participants [76]. This method
provided tools to systematically reflect on our position, subjectivity, and interpretative work during
fieldwork and at the coding stage.
Data was obtained through participatory observation (with varying degrees of involvement) and
qualitative interviewing (in-depth and expert interviews). Fieldwork was approached exploratorily,
guided by sensitizing concepts [21]. They helped to organize the complex stimuli in the field
without acting as hypotheses or preconceptions. Phases of data collection and analysis were
intertwined. Observations and interviews informed one another: while ideas emerging from the
observations served to identify areas of inquiry for the interviews and even possible relevant

interview partners, statements from the interviews often pointed at interesting actors, tasks,
or processes that needed to be observed more attentively. Through constant comparison [43], we were
able to identify differences and similarities between procedures and sites.

3.1 Data Collection


3.1.1 Participatory Observations. Part of the value of open-ended observations guided by GTM
is the opportunity to see the field inductively and to allow themes to emerge from the research
process and the data collected. However, once in the field, researchers must somehow organize
the complex stimuli they experience so that observing remains manageable, since it is not
possible to observe every detail of every situation. At this point, sensitizing concepts come
into play to orient fieldwork [21]. Sensitizing concepts in this investigation include loosely defined
notions such as “impact sourcing”, “subjectivity”, “quality assurance”, “training”, and “company’s
structure”, which provided some initial direction to guide the observation during data gathering.
Fieldwork was conducted at two data annotation companies. At both locations, the level of
involvement regarding observations varied from shadowing to active participant observation [58].
At both companies, fieldwork was allowed to commence only after a company representative and
the researcher in the field had signed non-disclosure agreements (NDAs) and consented to
participation in the present study. Consequently, we refrain from disclosing or using confidential
information in this paper, particularly concerning the companies’ clients.
3.1.2 Qualitative Interviews. Part of the fieldwork conducted consisted of intensively interviewing
annotators and management. All interview partners were allowed to choose their code names or
were anonymized post-hoc to preserve the identity of related informants.
Interviews with management in additional annotation companies were framed as expert inter-
views. While in-depth interviews aim at studying the informant’s practices and perceptions, “the
purpose of the expert interview is to obtain additional unknown or reliable information, author-
itative opinions, serious and professional assessments on the research topic” [54]. The sampled
interview partners were considered experts because they provided unique insights into the struc-
tures and processes within their companies and the overall market (see table 1 and section 3.2,
Sample, for a detailed list of informants).

3.2 Sample
Four sources of information were exhaustively explored: we started with two impact sourcing
companies dedicated to data annotation located in Buenos Aires, Argentina (S1) and Sofia, Bulgaria
(S2). Impact sourcing refers to a branch of the outsourcing industry employing workers from poor
and vulnerable populations to provide information-based services at very competitive prices. We
chose annotation companies with rather traditional management structures over crowdsourcing
platforms, where hierarchies are less evident. We assumed that clear hierarchical structures
would make it easier to trace labeling decisions and structures back to real people. We also had the
preconception that tensions related to exercising power would be more prominent with workers
from vulnerable populations. Field access was another reason for our choice. Impact sourcing
companies responded most openly to our proposed ethnographic research.
While conducting fieldwork in S2, we decided to look more closely into the translation of clients’
needs into annotation tasks and quality standards. Consequently, we also interviewed management
employees in three similar yet larger annotation companies (S3) and engineers at a computer
vision company in Berlin, Germany, that uses annotated training sets (S4).
3.2.1 S1: The Annotation Company in Buenos Aires. S1 is a medium-sized enterprise centrally
located in Buenos Aires and dedicated to data-related microwork; fieldwork there took place in June 2019.

Table 1. Overview of Informants and Fieldwork Sites

S1: FIELDWORK (Annotation company in Buenos Aires, Argentina). Qualitative in-depth interviews; face to face; Spanish (Native):
    Sole (Team leader), Elisabeth (Annotator, reviewer), Noah (Annotator, tech leader), Natalia (Project manager), Paula (Founder), Nati (QA analyst)

S2: FIELDWORK (Annotation company in Sofia, Bulgaria). Qualitative in-depth and expert interviews:
    Eva (Founder): expert and in-depth interviews; Skype and face to face; English (Proficient)
    Anna (Intern in charge of impact assessment): expert interview; face to face; English (Proficient)
    Ali (Project manager, reviewer): in-depth interview; face to face; English (Low-Intermediate)
    Savel (Annotator): in-depth interview; face to face; English (Low-Intermediate)
    Diana (Annotator): in-depth interview; face to face; English (Upper-Intermediate)
    Hiva and Mahmud (Annotators): in-depth interviews; face to face; English (Low-Intermediate), with occasional translation by another informant
    Mariam and Martin (Annotators): in-depth interviews; face to face; English (Intermediate)
    Sarah (Annotator): in-depth interview; face to face; English (Advanced)
    Muzhgan (Annotator): in-depth interview; face to face; another informant translated into English (Advanced)

S3: EXPERTS (Managers in large annotation companies). Qualitative expert interviews:
    Jeff (General manager in annotation company in Iraq) and Gina (Program manager in annotation company in Iraq): Zoom; English (Proficient)
    Adam (Country manager in annotation company in Kenya): Zoom; English (Native)
    Robert (Director in annotation company in India): Zoom; English (Advanced)

S4: PRACTITIONERS (Computer vision company in Berlin, Germany). Qualitative in-depth interviews:
    Ines (Project manager, data protection officer): face to face; English (Proficient)
    Dani (Product manager): face to face; English (Advanced)
    Michael (Computer vision engineer): face to face; English (Advanced), German (Native)
    Dean (Research scientist, lead engineer): face to face; English (Proficient)

The company has further branches in Uruguay and Colombia. The Buenos Aires office
occupies a whole floor with large common work areas. This location employs around 200 data
workers, mainly young people living in very poor neighborhoods or slums in and around Buenos
Aires. The company’s employment strategy is a conscious decision as part of its impact sourcing
mission. At S1, workers are divided into four teams. Each team includes a project manager and
several team leaders and tech leaders. Annotators perform their tasks in-house and assume mainly
two roles: creators, doing the actual labeling work, or reviewers, who confirm or correct annotations.
Besides annotations for visual data, the company also conducts content moderation and software
testing projects. Most of the clients are large local or regional companies, including media, oil, and
technology corporations. At the time of this investigation, S1 had just started to expand to Brazil
and other international markets, which resulted in the need to train their workers in Portuguese
and English.
One particularity of S1 is that they provide workers with a steady part- or full-time salary and
benefits. This form of employment contrasts with the widespread contractor-based model in data
annotation. Even so, annotators at S1 received USD 1.70 per hour, the minimum legal wage in
Argentina at the time of this investigation. These salaries left workers well below the poverty line
in a country that accumulated around 53% annual inflation in 2019. Low salaries are not the only
downside perceived by workers: informants also complained about the fixed work shifts and the

impossibility of working remotely, as the company does not allow its workers to take laptops or any
other equipment home.
The interviews at the Argentine company were conducted in Spanish, the mother tongue of both
interviewer and informants. Interview transcripts were coded and interpreted, without translation,
by the first and third authors, native and intermediate Spanish speakers, respectively. Coding
without translation was done to preserve the original meaning of the statements. The quotations
in this paper were translated upon completion of the analysis.

3.2.2 S2: The Annotation Company in Sofia. S2 is a small annotation company in the center of
Sofia, Bulgaria. The company occupies a relatively small office. Work at this location can be quite
chaotic, with workers coming and going to receive paychecks or instructions for new projects. The
company focuses on the annotation of visual data, especially image segmentation and labeling.
The visual data involves various types of images, including medical residue, food, and satellite
imagery. The company’s clients are mostly located in Europe and North America. At the time of
this investigation in July 2019, ten active projects were handled by three employees in salaried
positions and a pool of around 60 freelance contractors. As an impact sourcing company, S2 is
committed to fair payment and works exclusively with refugees and migrants from the Middle
East. The company also favors female workers among them. Contractors mostly work remotely
with their own or company-provided laptops, with flexible hours. They are paid per picture and,
sometimes, per annotation. Payment varies according to the project and the level of difficulty. Most
informants were satisfied with the remuneration and flexible conditions. However, many of them
expressed the desire to have more stability and continuity of work and income.
All interviews at this location were conducted in English. Most annotators had low to medium
English skills, which made conducting the interviews significantly more difficult. For
example, some informants over-simplified their statements and were often not able to provide
in-depth answers. The language barrier could not have been foreseen or mitigated, as the founder,
whose English skills are impeccable, had assured us that the selected interview partners would have similarly good
language skills. The misunderstanding probably originated in the fact that all proposed informants
were indeed able to understand English at a level that was sufficient to perform their work. It
was, however, not enough for them to tell their stories with ease. The language barrier required
improvisation on the researcher’s end, including the simplification of questions and the introduction
of walk-through questions [58], allowing informants to show procedures directly while reducing
language requirements (see table 1 for more details).

3.2.3 S3: The Experts. In grounded theory investigations, decisions regarding theoretical saturation
often happen simultaneously with the gathering of data, forcing researchers to decide quickly
whether the collection of further or different data is necessary. While conducting fieldwork
in Bulgaria, the idea emerged that expert interviews with management in other, more prominent
impact sourcing companies could provide further insights about the translation of clients’ needs
into actual annotation tasks, standards, and quality assurance (QA). Through this form of inquiry,
we additionally sought to frame some of the fieldwork observations.
Three expert interviews were conducted: Jeff and Gina are, respectively, general and program
manager with a microwork company based in Iraq. Jeff is also in charge of training future workers
on data annotation. The company had initially been founded by a worldwide organization dedicated
to humanitarian aid and quickly became a for-profit impact sourcing company. Jeff and Gina were
interviewed simultaneously. Adam is the general manager at the Kenyan branch of an impact
sourcing company with many hubs for data annotation throughout Asia and Africa. Robert is
based in India and works as a director of machine learning with one of the oldest impact sourcing

companies dedicated to data annotation. The company has many branches in different Asian
countries.
The informants are identified through code names. The names of their companies remain
anonymous.
3.2.4 S4: The Practitioners. The demands, rules, and processes of clients represented a recurring
topic of the interviews conducted within data labeling companies. It seemed that managerial roles
within labeling companies implied the ability to mediate and translate the client’s requirements
into actual tasks for workers.
How do such requirements originate? On whose needs are they based? To start exploring these
questions, we decided to briefly investigate companies ordering and deploying labeled datasets for
their machine learning products. A visit to a computer vision company based in Berlin was then
arranged and carried out. Four relevant actors were interviewed in-depth at this location: a project
manager, the data protection officer, the lead engineer, and a data engineer.
While this company is not a direct client of S1, S2, or S3, it does commission and utilize labeled
images for its main product.

3.3 Data Analysis


The resulting 24 interviews were transcribed. Transcriptions were integrated with several pages of
field notes and various documents such as specific instructions provided by clients with labeling
requirements, metrics for quality assurance, and impact assessments. We followed the grounded
theory coding system [21] for the interpretation of data: Phases of open, axial, and selective coding
were systematically applied.
By the end of the open coding phase, a set of 28 codes had emerged. The process of axial coding
followed. We applied a set of premises [25] to make links between categories visible. The material
was then meticulously coded using the renewed set of axial categories. As part of this process,
we iteratively returned to the material to look for additional evidence and to test and revise the
emergent understanding. This analysis led to a core set of seven axial codes (see Table 2). Finally,
for the selective coding, we combined several axial codes into the core phenomenon “imposition of
meaning”. Selective coding entails deliberate interpretive choices by the researchers. Making such
choices explicit during the analysis process is fundamental in constructivist grounded theory [21].
As a final step, we connected salient codes and categories to the core phenomenon as causal
conditions, context, intervening conditions, action/interactional strategies, or consequences [25]
(see Figure 2 “Paradigm Model”).

4 FINDINGS
The annotation of visual data consists of a set of practices aiming at interpreting the content of
images and assigning labels according to that interpretation. The observed work practices involve
mainly two tasks: labeling and segmenting. Segmenting, formally called semantic segmentation,
refers to the separation of objects within an image, thus classifying them as belonging to different
kinds. Labeling is mainly about giving a name to each of the objects that were previously classified
as different from each other. Sometimes, labeling also includes the assignment of keywords and
attributes. Those attributes fill the ascribed classifications with meaning by putting in words what
constitutes each class.
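For readers unfamiliar with annotation tooling, the following minimal sketch illustrates what a single segmented and labeled object might look like once exported. It is a hypothetical schema written for this illustration only; the field names and values are our assumptions and are not taken from the tools used at S1 or S2.

# Hypothetical record for one segmented and labeled object (illustrative only).
annotation = {
    "image_id": "img_0001.jpg",
    "polygon": [(102, 240), (180, 244), (178, 330), (99, 326)],  # vertices (x, y) in pixels
    "label": "dog",                          # name given to the segmented object
    "attributes": ["brown", "sitting"],      # optional keywords filling the class with meaning
    "annotator": "worker_17",                # who drew and labeled the segment
    "reviewer": "reviewer_03",               # who confirmed or corrected it
}

def polygon_area(points):
    """Shoelace formula: area of the segmented region in pixels."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

print(polygon_area(annotation["polygon"]))  # e.g., a simple sanity check that the polygon is non-degenerate
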
To illustrate our findings, we describe three of the observed annotation projects that were
particularly relevant to our research questions. Several of the practices and tensions described in
these cases remained consistent across projects and even companies. Finally, we report four salient
observations that emerged from the collected data as part of the coding process.

Table 2. Table of core phenomenon, axial categories, open codes, and explanatory memos.

Core phenomenon: IMPOSITION OF MEANING

CLASSIFICATION AS POWER EXERCISE
    Briefing: Information of labeling instructions to labelers. Communication of client’s wishes and expectations. Communication chain from client to labelers.
    Struggle over meaning: Struggle over the meaning of things. Power struggles to name things. Also moments of subversion from labelers.
    Imposition: One-way, top-down imposition of meaning during team meetings. Imposition of client’s desires and/or views in view of discrepancies.
    Team Agreement: Democratic alignment of concepts and opinions within the team. Teamwork to reach an agreement on how to name things.
    Layering: Nomination instances within annotation companies. Actors deciding over the interpretation of data at different stages of the process.

LABELING OF DATA
    Tools: Different tools to perform tasks of data annotation, where they come from and how they may represent a constraint for the work.
    Agency: Room for agency while performing labeling tasks; agency here refers to the possession of resources to achieve desired results.
    Constraints: Things that could count as a constraint for subjectivity when performing tasks of data annotation.
    Standardization: How labeling is standardized. Efforts from company or client to standardize labeling tasks.

REFLEXIVITY ON WORK IMPACT
    Visions of future: How workers imagine the future in relation to the tasks they perform.
    Tech: Visions of impact of technology/AI on society. Impact of their work on society.
    Training: Training received as part of the impact sourcing model. Training that could be helpful for future jobs (languages, software, etc.).

IMPACT SOURCING
    Chance: Chances to learn, to work in the desired field. Chances related to impact sourcing companies. Opportunities offered by companies to their employees.
    Impact on lives: Impact of the job on workers’ lives. What this job means for them and how their lives have changed with this job.
    Closeness to management: Indicators for flat hierarchies. Accessibility to management. Possibility to talk directly and honestly to management.
    Recruiting: How the interviewee was recruited to work in the company. How she/he got to work there.
    Mobility chances: Chances to grow and/or be promoted within the company.

BIAS
    Misunderstanding: When the interviewer asks about biases and the interview partner offers an answer showing they have misunderstood the question.
    Unawareness: Not knowing what the concept of bias refers to. Not being aware of biases as a hazard related to their tasks.
    Not bias related: Claim that biases are not relevant for the type of projects they handle within the company.

CAPITALISTIC LOGISTICS
    Speed: Optimization of processes, so that they are faster and the client is satisfied.
    Company QA structure: Quality assurance processes. Especially QA as a selling argument for clients. Control as a selling point.
    Productivity: Processes related to increasing or controlling productivity, making workers produce more.
    Flexibility: Flexibility in working time, work place; not as a fixed/regular employee; work with children, etc.
    Roles: The division of roles and tasks in the work/in the company.
    Market’s logics: Things that are done in a certain way to go according to the demands of the market.
    Worker’s struggle: Workers asking for better conditions/benefits. Expressions of disagreement with aspects of the working conditions.
    Control: Control mechanisms. Control of results. Control of employees.
    Clients: All things clients. Communication with clients, desires of the clients, relation to clients. Client as king.

PERSONAL SITUATION
    Plans: Plans for the future at a personal level. Hopes and dreams.
    Vulnerability: Related to the vulnerable background of workers. Personal struggle/difficulties.
    Previous work experience: What workers did before becoming labelers.
    Education: Related to workers’ background. Achieved academic level. Plans for further education.

4.1 Project 1: Drawing Polygons


This project, conducted by S2 in Bulgaria, consisted of analyzing, marking, and labeling pictures of
vehicles for a Spanish client. The client had provided several image collections, each containing
photographs of damaged car exteriors. The source of the images and the exact purpose of the
dataset were unclear to the Bulgarian team. Only Eva, the founder of the annotation company,
was capable of sharing some vague information about the client and the planned product:
“I think it’s a company working for insurance companies. So, they are providing
insurance companies with a tool or a service I believe that’s going to be in the form of
an app that their users, who are using the insurance or maybe car rental companies

and so on, can use in order to report damages. And so, these damages can be processed
very quickly and identify them automatically. I think this is the final goal. I believe
they are in the very early stage still. They are still trying to gather enough photos and
train enough, use enough data to train their models.”

Eva was in charge of client communications and the final quality control for every project at S2.
Ali, an annotator who generally acted as mediator between Eva and the team, worked on the
project as well. Besides regular annotation tasks, Ali was in charge of selecting the annotators
for this project, briefing them with the instructions, and answering questions. For this purpose,
he maintained a project-specific Slack channel. Daily, he monitored the progress made by every
labeler and reviewed the annotated pictures. Despite his prominent role, Ali had no information
about the planned product or the purpose of the annotations. Lack of information and general
unawareness of the machine learning pipeline were very common among annotators at S2 and, to a
lesser extent, at S1 in Argentina. Eva agreed with this observation and added:

“I think that in many cases it’s too difficult for a lot people to imagine what’s the data
they’re working on for.”

Besides Eva, none of the annotators we interviewed at S2 could relate the terms “machine learning”
or “artificial intelligence” to their work. Ali did not inquire about further details beyond the specific
instructions for the “car accidents project” because the instructions sent by clients normally provided
“all we need” to complete annotation tasks:

I: “But why does the client need all these pictures annotated like this? Do you know?”
B: “No. But I think ... I am not sure, because I don’t ask about this.”

In this case, the client had sent a PDF document containing step-by-step instructions and example
pictures. Moreover, the client had provided the platform where the segmentation and annotation
tasks were to be performed. The platform had been specially developed for this purpose and tailored
to the client’s needs.
The first task for the annotators was to select the part of the vehicle that appeared damaged
from a sidebar containing different classes (e.g., door, tire, hood). After that, they drew a polygon
around the damaged area. The drawing was very time-consuming, and Ali seemed to pay special
attention to the correct demarcation of the damaged areas. After drawing the polygon, they would
classify the type of damage and its severity. Unfortunately, the company commissioning these
annotations requested that no further details about the specific commands and labels be shared in
this investigation, as it considers them one of its strategic advantages.
Apart from Eva and Ali, five annotators working remotely completed the project team. For the
general briefing and the project kick-off, they were summoned to the office. Eva explained the
client’s instructions in English and showed some examples of the pictures and the procedure. Ali
translated into Arabic for annotators with low English skills. Afterward, each annotator sat at
one of the work stations in the office and tested the task while Ali walked around observing how
annotators performed, answering questions, and continuously commenting on how easy the work
was. For the duration of the project, annotators working remotely would resolve questions with Ali
via Slack. Occasionally, if Ali was not satisfied with the quality of the polygons, he would summon
the annotators to the office and work with them for a few hours. The same procedure was followed
in cases of visible labeling inconsistencies among workers. Eva highlighted the importance of these
“alignment meetings” to ensure the uniformity of the labels through the standardization of workers’
subjectivities:

“Normally, issues in data labeling do not come so much from being lazy or not doing
your work that well. They come from a lack of understanding of the specific require-
ments for the task or maybe different interpretations because a lot of the things ... Two
people can interpret differently so it’s very important to share consistency and like
having everyone understand the images or the data in the same way [...]. But because
a lot of these tasks are not that straightforward, it’s just not ... It’s not just choosing A
or B. It’s more like okay for example I have this car, where do I track the exact scratch
or deformation? What kind of a level is it? Like, it’s a little bit more complicated and
that’s why it’s better to invest in the human capability and let’s say the standardization
of everyone’s understanding.”

4.2 Project 2: Building Categories


This project was conducted at S1, the Argentine annotation company. It constituted a test for the
acquisition of an important client, namely a sizable local corporation. The potential client had
simultaneously outsourced the project to several annotation companies, planning to sign a
contract with the best-performing team.
We find this project to be particularly interesting as it constitutes an exception to the usual
procedure of labeling data according to categories instructed by clients. In this case, the annotators
were in charge of developing a classification system for the annotations. Concretely, the task
consisted of analyzing camera footage, counting, and classifying vehicles driving in a gas station.
The annotators were in charge of coming up with logical, mutually exclusive categories for the
labeling.
Three annotators, a reviewer, a team leader, and a quality assurance (QA) analyst sat together to
analyze the first 60-minute-long video. They started by counting all vehicles driving in the gas
station. After a few minutes, some annotators lost track and claimed they had not expected “just counting”
to be so complicated. To simplify the task, the team leader suggested establishing categories first,
so that each annotator could focus on counting only one category. They promptly agreed on five
categories, namely cars, buses, trucks, motorcycles, and vans. While counting, new categories such
as pick-ups, SUVs, and semi-trucks were suggested by annotators, approved by the team leader and
the QA analyst, and finally added to the list. Also, several questions arose: Can SUVs be considered
cars? Do ambulances and police cars constitute categories for themselves?
Several team members expressed concern about not knowing the client’s exact expectations.
“We are not really used to this kind of ambiguity,” reviewer Elisabeth said. She also shared an
experience from a former project, where inconsistencies between the interpretations of client and
annotators had arisen, even though the client had provided clear instructions for the annotations.
On that occasion, Elisabeth had been entirely sure that her interpretation was right until the client
corrected her work: “and you think you’re doing everything right until the client comes and says,
‘No, that’s all wrong!’” The client’s correction had led Elisabeth to the conclusion that “I had been
wrong all along. It put us [the team] back on track.”
As for the “gas station project”, Nati, the QA analyst, announced to the team that, despite the
freedom offered by the project, they would proceed “as usual” to resolve questions and, most
importantly, to assess the correctness of allocated labels. At the interviewer’s request, reviewer
Elisabeth described the usual process in detail:

“Whenever I cannot resolve the questions annotators bring to me, I ask the leader. If
the leader cannot solve them either, we ask QA. Otherwise, they ask the contact person
at the client’s company.”

Interviewer: “So, the client has the final say?”


Elisabeth: “Yes. And the client surely has their hierarchies to discuss a solution as well.”
Despite the room offered to the team by the “gas station project” to shape data according to their
own judgment, the client’s figure seemed to be tacitly present at all times to orient annotators’
subjectivities. QA analyst Nati summarized this observation most clearly:
“We try to guess what the client would value the most, what will interest them, trying
to put ourselves in their shoes, thinking, imagining [the client] wants this or that.”
In her QA analyst role, Nati also paid special attention to optimizing the time needed to annotate
each video. Having one annotator counting only one category significantly reduced task completion
time but raised important questions about quality control and cost optimization, as Nati pointed
out:
“How are we going to check for accuracy if only one annotator is responsible for each
class and we do not have enough reviewers?”
Nati additionally mentioned that the client would not accept the costs of cross-checking results.
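Cross-checking of this kind is typically operationalized by having two or more annotators label the same items redundantly and then quantifying their agreement, for instance with Cohen’s kappa. The following sketch is a generic illustration of that calculation, not a reconstruction of S1’s actual QA tooling, and the vehicle labels are hypothetical.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators classifying the same ten vehicles (hypothetical data).
annotator_1 = ["car", "car", "truck", "van", "car", "bus", "car", "suv", "truck", "car"]
annotator_2 = ["car", "suv", "truck", "van", "car", "bus", "car", "car", "truck", "car"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.71 for this example
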
For Nati and the QA department, this project involved two challenges: the first was guessing
what the client was expecting from the annotations and which taxonomy would best serve that
expectation. The second consisted of optimizing the performance of annotators to present a
competitive offer to the potential client. Indeed, the Buenos Aires-based company seemed to put
much effort into developing better ways of measuring performance and output quality. In this sense,
Nati acknowledged the singularity of the “gas station project” as being uncommonly ambiguous
compared to the rest of their projects, which generally included clear guidelines for the labels.
However, she still saw a good opportunity emerging from the open character of the project:
“This is where the QA department makes its move and says, okay, we can measure
all this. We try to offer value [...] going into details to see what we can measure and
offer the client something they would value because then we also participate in the
‘farming’ process. If we offer clients valuable QA data, they will probably buy more
hours from us.”

4.3 Project 3: Classifying Faces


The third project brings us back to the Bulgarian company (S2). It dealt with collections of images
depicting people. All images resembled those commonly found in a mobile phone’s gallery: several
selfies, group pictures of what seemed to be a family, a couple, a child holding a cat. Eva, the founder
of S2, explained that the dataset was intended for a facial recognition model for mobile phones.
The annotations had been commissioned by a local computer vision company.
The first task for the annotators consisted of classifying the faces in the images according to a
very concise set of instructions sent via email by the commissioning client:
(1) For each photo, draw a rectangular bounding box around each face in the photo.
(2) Annotate each such face with the following labels: Sex: male or female. Age: baby
(0-2 years old), boy or girl (2-16 years old), man or woman (16-65 years old), old man
or old woman (65+ years old). Ethnicity: Caucasian, Chinese, Indian, Japanese, Korean,
Latino, Afroamerican.
Additionally, five freely chosen keywords were to be attached to each image.
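Translated into a data structure, the emailed instructions amount to a closed vocabulary per face plus a free-text field per image. The sketch below shows one hypothetical way such a record could be represented and checked; the category values are copied from the instructions above, while the field names and the example record are our own assumptions.

# Closed label vocabulary as specified in the client's emailed instructions
# (field names and the example record below are illustrative assumptions).
ALLOWED = {
    "sex": {"male", "female"},
    "age": {"baby (0-2)", "boy or girl (2-16)", "man or woman (16-65)",
            "old man or old woman (65+)"},
    "ethnicity": {"Caucasian", "Chinese", "Indian", "Japanese", "Korean",
                  "Latino", "Afroamerican"},
}

def validate_face(record):
    """Check that one face annotation stays within the instructed categories."""
    x_min, y_min, x_max, y_max = record["bounding_box"]
    assert x_min < x_max and y_min < y_max, "bounding box must have positive area"
    for field, allowed_values in ALLOWED.items():
        assert record[field] in allowed_values, f"unexpected value for {field}"

# One face in one photo (hypothetical values).
face = {
    "bounding_box": (34, 50, 210, 260),  # (x_min, y_min, x_max, y_max) in pixels
    "sex": "female",
    "age": "man or woman (16-65)",
    "ethnicity": "Latino",
}
validate_face(face)

# Five freely chosen keywords are attached to the image as a whole.
image_keywords = ["selfie", "outdoor", "smiling", "sunglasses", "beach"]
assert len(image_keywords) == 5
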
Founder Eva was in charge of the general quality control. Apart from her, three annotators
completed the team. Ali, one of the annotators, also managed the project, mostly briefing annotators,
tracking the completion of the task, and revising the bounding boxes. Despite the project’s sensitive

character, Eva did not have further information about the images’ provenance and whether the
people depicted were aware their picture would be used in a computer vision product.
Because of the highly subjective character of this project and the specificity of the classes
provided, we insistently asked annotators how they were able to differentiate and assign such
labels that were, at least to the eye of the researcher in the field, not at all straightforward. Ali
reacted with great surprise to this kind of question, almost as if he could not understand our strong
interest in this topic:
“It’s not difficult, it’s easy! Because all information here [shows the email with the
instructions]. You have information. The woman is between 15 to 65, I think. The old
woman, 65 to more. Old woman and old man.”
Interviewer: “Yeah, but that’s what I’m saying, I would have had difficulties telling
whether the person in the picture is over 65.”
Ali: “No, no, because you see this picture, you make the zoom, and you see the face
[he zooms in and points at the area around the eyes, probably trying to show wrinkles
that are hard to recognize as such]. Everything is clear!”
Furthermore, Ali stated that this project was significantly easier to manage than others, given
the fact that annotators had not raised any questions or difficulties: “I think this is a project
nobody asked me about,” he said. Ali’s remarks coincide with the claims of the other annotators
involved: the classification of the people shown in the images in terms of race, age, and sex seemed
straightforward to them. The annotator in charge of keywording also claimed that this task was
very easy because the attributes were, in most cases, “pretty obvious.” When asked what
would be the procedure if they were unsure about what labels to assign, Eva, the founder, answered
that they would immediately seek the client’s opinion:
“In this case we usually obey everything that they say because you know their inter-
pretations is usually the one that makes sense.”
Later on, Eva referred to “the mobile libraries project” as one of the most “controversial” projects
in her company’s portfolio. While discussing bias-related issues and how these can affect labels,
she also highlighted the importance of raising moral questions around this type of project and
working on solutions to undesirable biases. However, Eva argued that her clients would probably
not be interested in investing time or money in these issues. Similarly, Anna, the intern in charge
of conducting an impact assessment at S2, commented on clients’ general attitude towards ethical
issues related to the commissioned labels:
“I think even if they knew they should be sensitive or should be a little conscious about
these things I think it works in their favor to not be. It’s totally about digital ethics
but I feel like it maybe from a company perspective [...] that they would prefer an
outsourcing company that doesn’t ask too many questions.”
Anna also placed some of the responsibility with the annotation companies. She commented on the
difficulty of explaining sensitive categories, such as race and gender, when workers and management
have different mother tongues. In S2, around 98% of the workers are refugees from the Middle East:
“Yes, I have observed the [mobile libraries] project ... I feel a lot of it is not that the
company is not aware of these things, but I think it’s maybe too complicated to explain
to refugees. I think some of us are lacking the vocabulary that would translate all these
nuances. [...] And I’ve never heard any of them... any of the refugees ask... I think that’s
also another factor. I think it’s a combination of a lot of these: The difficulties to explain
it and, maybe, the lack of curiosity or explicit curiosity on their end.”

4.4 Salient Observations


4.4.1 Standardization. At both annotation companies and in all projects observed, data annotation
was performed following the requirements and expectations of commissioning clients. Guidelines
were generally tailored to meet the requirements of the product that would be trained on the
annotated datasets, its desired outcome, and its revenue plan. Instructions and briefings, while
providing orientation, aimed at shaping the interpretation of data and, as described by Eva in
section 4.1, “standardizing everyone's understanding.” As shown in Projects 1 and 2, quality
assurance constituted another decisive instance of standardization and compliance with clients'
expectations. When encouraged to define what quality means in the context of their company, informants
at both locations (S1 and S2) and among the experts (S3) gave slightly different versions of the
same answer: quality means doing what the client expects.
4.4.2 Layering. As shown in Project 2, many roles and departments participate in annotation
assignments. Annotators occupy the lowest layer of the hierarchical structure, where the actual
labeling of data is carried out (see Fig. 1). In a more or less official way, every company has at least two
more layers where control is exercised: reviewers and quality assurance (QA) analysts. In between
reviewers and QA, some companies also place team leaders, tech leaders, and project managers.
More layers are possible, depending on the size of the project and the company. As described in
Project 2, large corporations sometimes outsource the labeling of the same dataset to different
annotation companies. The results are later checked and compared. Also, important clients
often hire external consultants to evaluate the performance of annotation companies independently.
Furthermore, some annotation companies outsource parts of large labeling projects if they lack
the human resources to complete the task. These practices add even more layers to the annotation
process. According to the experts (S3) and practitioners (S4) we interviewed, the layered character of
these procedures is not exclusive to S1 and S2 but can be generalized to other annotation companies.

[Figure 1 diagram: the client provides guidelines; QA and an external quality consultant check quality; a PM/team leader briefs annotators, may outsource work, and oversees revision by reviewers; annotators perform the annotation; the whole structure is embedded in the market.]

Fig. 1. Multiple actors on several layers of classification participate in processes of data annotation. The
layers are hierarchical and involve different levels of payment, occupational status, and epistemic authority.

4.4.3 Naturalization. Our findings show that the top-down ascription of meanings to data through
multi-layered structures was, for the most part, not perceived as an imposition by annotators. The
interviews abound in statements such as “the labels are generally self-evident” and “the work
is very straightforward.”
The labels commissioned by clients and instructed by managers seemed to coincide in most
cases with annotators' perceptions. As a consequence, labels were hardly ever scrutinized
or discussed. Moreover, annotators and managers generally perceived clients to be the ones who
knew exactly how the data was supposed to be labeled, since they held decisive information about
the product they aimed to develop and the corresponding business plan. Additionally, in some
cases, the image data to be labeled had been directly gathered by the commissioning company,
which reinforced the idea that the client would know best how to interpret those images. This
was reported by Eva (Founder of S2) in relation to a project involving satellite imagery. These
perceptions contribute to the naturalization of the layers of classification depicted in Fig. 1. As
illustrated by the projects described throughout this section, annotators broadly resolve doubts or
ambiguities regarding the labels by asking their superiors. Both at S1 and S2, we found that the
vertical resolution of questions prevailed over horizontal discussions and inter-rater agreement.
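To give a concrete sense of what a horizontal check could look like (this sketch is ours, not a procedure we observed at S1 or S2; the function name and example labels are illustrative), inter-rater agreement between two annotators can be quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance:

def cohens_kappa(labels_a, labels_b):
    # Observed agreement: share of items both annotators labeled identically.
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label distribution.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical example using age classes like those in the project described above:
kappa = cohens_kappa(
    ["woman", "old woman", "girl", "woman"],
    ["woman", "woman", "girl", "woman"],
)

Values close to 1 indicate strong agreement, while values near 0 indicate agreement no better than chance, which would flag labels worth discussing horizontally among annotators rather than escalating vertically.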

4.4.4 Profit-Orientation. Annotation companies mostly seek to optimize the speed and homogeneity
of annotations to offer reasonable prices in the competitive market of outsourcing services.
Several annotators (especially in S2) stated that project deadlines were often so short that they
were difficult to meet. To cope with such a fast pace, workers relied even more on clear
guidelines and efficient tools. Several informants at S1 and S2 stated that they found their work
easier when clients provided clear instructions, a rather simple annotation platform, and a
smaller number of classes to label. As shown by the “gas station project” (section 4.2), annotators
tended to feel overwhelmed otherwise. In this sense, hierarchical structures did not solely aim at
constraining workers' subjectivity but also provided orientation.
As expected from for-profit organizations, commissioning clients and annotation companies
are primarily concerned with product and revenue plans. Moreover, as stated by Eva and Anna in
section 4.3, some annotation companies may perceive a general disinterest among clients in
ethics-oriented approaches, such as transparent documentation and quality control for
biased labels. A similar observation was reported by a QA analyst in S2 and confirmed by the four
experts interviewed (S3). However, this does not mean that detrimental intentions guide clients; it
merely indicates that ethical approaches involve monetary costs that clients cannot or will not bear. In
short, several informants in S1, S2, S3, and S4 described an environment where market logics and
profit-oriented priorities get inscribed in labels, even in projects involving sensitive classifications,
as described in section 4.3.

5 DISCUSSION
Our observations show that annotators' subjectivities are, in most cases, subordinated to
interpretations that are hierarchically instructed to them and imposed on data. We relate this process to the
concept of symbolic power, defined by Pierre Bourdieu [13] as the authority to impose arbitrary
meanings that will appear as legitimate and part of a natural order of things. Arbitrariness is, in
Bourdieu's conception [12, 14, 28], not a synonym for randomness. It refers to the discretionary
character of imposed classifications and their subordination to the interests of the powerful.
A twofold naturalization in the Bourdieusian sense [9] seems to facilitate the top-down imposition
of meaning in data annotation: First, we found that classifications used to ascribe meaning to data
are broadly naturalized. Annotators mostly perceive the labels instructed by clients and reassured
by managers and QA as correct and self-evident. In a recent investigation, Scheuerman et al.
present a similar observation, describing how race and gender categories are generally presented
as indisputable in image datasets [72]. In most of the cases we observed, annotators, managers,
and clients do not perceive the assigned classifications as arbitrary or imposed. Hence, the labels
are hardly ever questioned. Second, we have observed that the epistemic authority of managers
and clients is also broadly naturalized by annotators. They are perceived to know better what
labels correctly describe each data point. The higher the position occupied by an actor, the more
accepted and respected their judgments. Even when annotators or management perceive principles
of classification as opposing personal or corporate values, the view persists that “the one who
is paying” has the right to impose meaning. This way, clients have the power to impose their
preferred classifications as long as they have the financial means to pay the labelers who execute
that imposition. As illustrated by the “gas station project” in section 4.2, workers might even
feel overwhelmed when clients do not overtly exercise their authority to instruct principles of
classification. When annotators are challenged with making sense of the data themselves, the main
rationale becomes “what would the client want?” in contrast to “what is contained in this data?”. In
this twofold naturalization lies, we argue, the efficacy of interpretations imposed on data: labels
must be naturalized and thus perceived as self-evident if actors are to misrecognize the arbitrariness
of their imposition [9].
As shown by our findings, the standardization of annotation practices and labels is assured
throughout several layers of classification and control. The positions are depicted in Figure 1
as hierarchical layers positioned one above the other because they involve different levels of
responsibility, payment, and occupational status. The number of layers, actors, and iterations
involved hinders the identification of specific responsibilities. Moreover, no information regarding
the actors involved and the criteria behind data-related decisions is recorded. Annotation steps and
iterations remain broadly undocumented. Accountability is diluted in these widespread practices.
A problematic implication is that this multi-layered standardization process is hardly ever oriented
towards social responsibility and usually responds to economic interests only [49]. There is no
intention, however, to imply here that standardization is fundamentally harmful or that detrimental
intentions guide the actors involved. Rather, we aim to show how power structures can be stabilized
through imposed standards [16] and argue that standardization can be dangerous if it is guided
solely by profit maximization.
In this sense, we argue that the discussion on workers’ subjectivity and personal values around
data annotation should not let us researchers forget that datasets are generally created as part of
large industrial structures, subject to market vicissitudes, and deeply intertwined with naturalized
capitalistic interests. The challenge here is “to explicate the assumptions, concepts, values, and
methods that today seem commonplace” [8] in this and other forms of service.
The main contribution of our investigation is the introduction of a power-oriented perspective
to discuss the dynamics of imposition and naturalization inscribed in the classification, sorting,
and labeling of data. Through this lens, we shed light on power imbalances informing annotation
practices and shaping datasets at their origins. Our main argument is that power asymmetries
inherent to capitalistic labor and service relationships have a fundamental effect on annotations.
They are at the core of the interpretation of data and profoundly shape datasets and computer
vision products.
There are at least two closely connected reasons why imposition and naturalization in the context
of data creation are socially relevant and, in a way, different from power imbalances enacted through
work practices in other settings: First, data practices involve particular ethical concerns because
the assumptions and values that inform data can have devastating effects on individuals
and communities [34, 63]. Algorithms trained on data that reproduces racist, sexist, or classist
classifications can reinforce discriminatory visions [62] “by suggesting that historically disadvantaged
groups actually deserve less favorable treatment” [6]. Moreover, data about human behavior
is increasingly sold for profit [81], which could result in surveillance [81] and exploitation [26].
Second, data-related decisions define possibilities for action by making certain aspects of reality
visible in datasets while excluding others [15, 68]. This is relevant for state management and
policy, e.g., to pinpoint places where intervention or the allocation of resources is needed. However,
the tendency of classification practices towards the erasure of residual categories [16] can cause
tension and even be harmful to individuals who remain unseen or misclassified by data-driven
systems [19, 71].

[Figure 2 diagram: causal condition: prevalence of profit-oriented priorities in data creation; contextual condition: need for standardized labels and processes in annotation; phenomenon: imposition of meaning in data annotation; strategies: standardization of workers' subjectivities, layering of classification instances, naturalization of meanings and authority; intervening condition: sensemaking of data by human workers; consequence: inscription of power asymmetries in labels, data, and systems.]

Fig. 2. Paradigm model resulting from the process of selective coding. It depicts the top-down allocation of
meanings, its stabilization through annotation practices, and its effects on data (derived from the Grounded
Theory Paradigm Model by Corbin and Strauss [25]).

5.1 Implications for Practitioners


While annotation companies and their clients may or may not be aware that they are actively
shaping data, the opacity surrounding embedded interests and preconceptions [72] is a significant
threat to fairness, transparency, accountability, and explainability. Therefore, it is important that
practitioners, i.e., corporations commissioning datasets and management at annotation companies,
take steps to reflect, document, and communicate their subjective choices [38, 61, 65, 66, 72].
Promoting the intelligibility of datasets is fundamental because they play a key role in the training
and evaluation of ML systems. Understanding datasets’ origin, purpose, and characteristics can
help better understand the behavior of models and uncover broad ethical issues [78].
Recent research has highlighted the importance of structured disclosure documents accompanying
datasets [7, 38, 39, 47, 55, 78]. Fortunately, the machine learning research community
has begun to promote similar reflexive practices: Following Pineau's suggestion [69], authors
at the NeurIPS and ICML conferences are now requested to include a reproducibility checklist, which
encourages “a complete description of the data collection process, such as instructions to annotators
and methods for quality control” if a new dataset is used in a paper. NeurIPS further requires
authors to disclose funding and competing interests. They are also asked to discuss “the potential
broader impact of their work, including its ethical aspects and future societal consequences.” These
conferences are highly influential for ML practitioners and facilitate the adoption of the latest
machine learning capabilities. We hope that they will also inspire practitioners to adopt such
best practices and to engage in reflexive documentation.
In line with previous literature [7, 38, 49, 74, 78], we advocate for the documentation of purpose,
composition, and intention of datasets. Moreover, the structures, decisions, actors, and frameworks
which shape data annotation should be made explicit [39, 72]. We furthermore propose orienting
documentation towards a reflection of power dynamics. D'Ignazio and Klein [30] propose asking “who”
questions to examine how power operates in data science. In this vein, we propose that disclosure
documents include answers to questions such as: Whose product do the annotations serve, and how?
Whose rationale is behind the taxonomies that were applied to data? Who resolved discrepancies
in the annotation process? Who decided if labels were correctly allocated?
We argue that the annotation process already begins as clients transform their needs and
expectations into annotation instructions. Therefore, the responsibility for documenting should not
be solely placed with annotators but should be seen as a collaborative project involving annotation
companies and commissioning clients. Given the hierarchical structures and power imbalances
described in this paper, we find it extremely important that clients keep a record of the instructions
that were given to annotators, the platforms on which annotations were performed, and the reasons
for that platform choice, as well as the procedure employed for solving ambiguities, creating
homogeneity, and establishing inter-annotator agreements. Extending dataset factsheets with a
power-aware perspective could make power asymmetries visible and raise awareness about meaning
impositions and naturalization. Yet, it is vital that documentation checklists are not prescriptive
and produced exclusively in the vacuum of academia [38]. Instead, disclosure documents should be
developed in an open and democratic exchange with annotation companies and their clients to
accommodate real-world needs and scenarios [55].
Annotation companies and their clients might be reluctant to implement such a time-consuming
documentation process. Moreover, they may regard some of the information as trade secrets,
especially if it involves details about the intended product or if the structuring of the annotation
process is considered a strategic advantage. We argue that allocating resources for documentation
could nevertheless bring three pay-offs for organizations:
The first benefit is that proper documentation can foster deliberative accountability [67] and
improve inter-organizational traceability, for instance, between annotation companies and clients.
In addition, transparent documentation can help address the problematic dilution of accountability
as a result of various actors and layers in the annotation process. In the context of this service
relationship, accountability involves not only specific individuals but also organizations and includes
factors such as organizational routines and processes of value co-creation [50]. Given the power
imbalances that are inherent to this relationship [8], annotation companies could be motivated to
keep track of decisions and procedures in the event of discrepancies with clients.
The second benefit is that documentation can facilitate compliance with regulations such as the
GDPR and especially the “Right to Explanation” [67]. Serving as an external motivation, legal
frameworks and regulations urge companies to put transparency as well as societal and ethical
consequences of their products and services above the rationale of profit-maximization [49]. If there
is no legal incentive and companies perceive transparency as coming at the cost of profit-oriented
goals (as shown in our data), independently created transparency certifications and quality seals
for datasets may provide an additional incentive given the momentum created around FATE AI.
The third benefit is that documentation may create a long-term business asset because knowledge
about practical data work is made explicit and persistent. Without documentation, such knowledge is
often confined to workers with the “craftsmanship” to make situated and discretionary decisions [66],
bearing the risk of knowledge loss due to worker turnover or a lack of traceability. At the same time,
documentation can have analytical value, improve communication in interdisciplinary teams, and
ease comprehension “for people with diverse backgrounds and expertise” [61].

5.2 Implications for (CSCW) Researchers


Our research highlights the relation between human intervention and hierarchical structures in
processes of data creation. It shows that power imbalances not only translate into asymmetrical
labor conditions but also concretely shape labels and data. We firmly believe that researchers
studying socio-technical systems in general, and data practices in particular, could benefit from
including a similar power-aware perspective in their analysis. Such a perspective would primarily
aim at making asymmetrical relations visible. Making power visible means exposing naturalized
imbalances that get inscribed in datasets and systems [30].
We propose four (interconnected) reasons for integrating such a perspective into research:
First, this perspective could contribute to making work visible [30, 44, 75]. Especially in the case
of machine learning systems where the enthusiasm of technologists tends to render human work
invisible [44], research should emphasize the value of the human labor that makes automation
possible. Furthermore, making “humans behind the machines” [53] visible could help contest any
pretension of calculative neutrality attributed to automated systems.
Second, this paper argues that power relationships inscribed in datasets are as problematic
as individual subjectivities. A power-oriented perspective allows researchers to “shift the gaze
upwards” [5] and move beyond a simplistic view of individual behaviors and interpretations that,
in many cases, could end up allocating responsibility exclusively to workers. A view into
corporate structures and market demands can offer a broader perspective to this line of research.
Third, the investigation of organizational routines and hierarchies could help researchers approach
the real-world practice of data work [67], develop context-situated recommendations, and
assess their applicability in corporate scenarios. This could help establish open and democratic
discussions between researchers and practitioners regarding the conception of solutions for undesired
data-related issues [38, 55].
Finally, rigorous reflection on and documentation of power dynamics is not only advisable for
practitioners working with data but also fundamental for researchers investigating those work practices.
Acknowledging that, just like data, theories are not discovered but co-constructed by
researchers and participants [76] is a significant step in this direction. Throughout this investigation,
the constructivist variation of grounded theory [21] has constituted a valuable tool to methodically
reflect on the researchers' perspectives, interpretations, and position.

5.3 Limitations and Future Work


This paper has focused on the annotation of image data for machine learning as performed within
impact sourcing companies. While our current results are bound to this context, the framework
presented here could inspire further (comparative) research involving diverse actors in other
annotation settings, such as crowdsourcing platforms.

6 CONCLUSION
This paper has presented a constructivist grounded theory investigation of the sensemaking of
data as performed by data annotators. Based on several weeks of fieldwork at two companies
and interviews with annotators, managers, and computer vision practitioners, we have described
structures and standards that influence the classification and labeling of data. We aimed to contest
the supposed neutrality of data-driven systems by putting the spotlight on the power dynamics
that inform data creation.

We found that workers' subjectivity is structurally constrained and profoundly shaped by
classifications imposed by actors above annotators' station. Briefings, annotation guidelines, and
quality control all aim at meeting the demands of clients and the market. We have argued that the
creation of datasets follows the logics of cost effectiveness, optimization of workers’ output, and
standardization of labels, often at the expense of ethical considerations.
We have observed the presence of multiple instances of classification, with diverse actors among
several hierarchical layers that are related to the possession of capital. We have argued that the
many layers, actors, and iterations involved contribute to the imposition of meaning and, finally, to
the dilution of responsibilities and accountability for the possible harms caused by arbitrary labels.
Furthermore, our findings have shown that workers naturalize the imposed classifications as well
as the epistemic authority of those actors higher in the hierarchy. Our observations indicate that
power asymmetries, which are inherent to labor relations and to the service relationship between
annotation companies and their clients, fundamentally shape labels, datasets, and systems.
We have furthermore discussed implications for practitioners and researchers and advocated for
the adoption of a power-aware perspective to document actors and rationale behind the meanings
assigned to data in annotation work. Finally, we have emphasized the importance of adopting a
similar power-aware perspective in the CSCW research agenda, not only as a possible focus for
future work but also as a tool for reflecting on researchers’ own position and power.

7 ACKNOWLEDGEMENTS
Funded by the German Federal Ministry of Education and Research (BMBF) – Nr. 16DII113f. We
would like to acknowledge the individuals and companies participating in this study: we dearly
thank them for their openness! Special thanks to Philipp Weiß for his support whenever we
struggled with formatting tables in Overleaf. We wish to thank our anonymous reviewers for their
feedback, and Enrico Costanza, Walter S. Lasecki, Leon Sixt, Florian Butollo, Matti Nelimarkka, and
Alex Hanna for valuable comments on earlier versions of this work. Special thanks to our research
group leader Diana Serbanescu and PI Bettina Berendt for their continuous support for this project.

REFERENCES
[1] Muhammad Ali, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2019.
Discrimination Through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes. Proc. ACM
Hum.-Comput. Interact. 3, CSCW (Nov. 2019), 199:1–199:30. https://doi.org/10.1145/3359301
[2] Ali Alkhatib and Michael Bernstein. 2019. Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions.
In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing
Machinery, New York, NY, USA, 530:1–530:13. https://doi.org/10.1145/3290605.3300760
[3] Luis Araujo and Martin Spring. 2006. Services, Products, and the Institutional Structure of Production. Industrial
Marketing Management 35, 7 (Oct. 2006), 797–805. https://doi.org/10.1016/j.indmarman.2006.05.013
[4] Paul Baker and Amanda Potts. 2013. ‘Why Do White People Have Thin Lips?’ Google and the Perpetuation of
Stereotypes via Auto-Complete Search Forms. Critical Discourse Studies 10, 2 (May 2013), 187–204. https://doi.org/10.
1080/17405904.2012.744320
[5] Chelsea Barabas, Colin Doyle, JB Rubinovitz, and Karthik Dinakar. 2020. Studying up: Reorienting the Study of
Algorithmic Fairness around Issues of Power. In Proceedings of the 2020 Conference on Fairness, Accountability, and
Transparency (FAT* ’20). Association for Computing Machinery, Barcelona, Spain, 167–176. https://doi.org/10.1145/
3351095.3372859
[6] Solon Barocas and Andrew D. Selbst. 2016. Big Data’s Disparate Impact. California Law Review 104, 3 (2016), 671–732.
https://doi.org/10.15779/Z38BG31
[7] Emily M. Bender and Batya Friedman. 2018. Data Statements for Natural Language Processing: Toward Mitigating
System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604.
https://doi.org/10.1162/tacl_a_00041
[8] Jeanette Blomberg and Chuck Darrah. 2015. Toward an Anthropology of Services. The Design Journal 18, 2 (2015),
171–192. https://doi.org/10.2752/175630615X14212498964196

[9] Pierre Bourdieu. 1977. Outline of a Theory of Practice. Cambridge University Press, Cambridge. https://doi.org/10.
1017/CBO9780511812507
[10] Pierre Bourdieu. 1985. The Social Space and the Genesis of Groups. Theory and Society 14, 6 (1985), 723–744.
https://doi.org/10.1007/BF00174048
[11] Pierre Bourdieu. 1989. Social Space and Symbolic Power. Sociological Theory 7, 1 (1989), 14–25. https://doi.org/10.
2307/202060
[12] Pierre Bourdieu. 1990. The logic of practice (reprinted ed.). Polity Press, Cambridge.
[13] Pierre Bourdieu. 1992. Language and Symbolic Power (new ed.). Blackwell Publishers, Cambridge.
[14] Pierre Bourdieu. 2000. Pascalian Meditations. Stanford University Press, Stanford, Calif.
[15] Geoffrey C. Bowker. 2000. Biodiversity Datadiversity. Social Studies of Science 30, 5 (Oct. 2000), 643–683. https:
//doi.org/10.1177/030631200030005001
[16] Geoffrey C. Bowker and Susan Leigh Star. 1999. Sorting Things out: Classification and Its Consequences. MIT Press,
Cambridge, Mass.
[17] danah boyd and Kate Crawford. 2012. Critical Questions for Big Data: Provocations for a Cultural, Technological, and
Scholarly Phenomenon. Information, Communication & Society 15, 5 (June 2012), 662–679. https://doi.org/10.1080/
1369118X.2012.678878
[18] C. E. Brodley and M. A. Friedl. 1999. Identifying Mislabeled Training Data. Journal of Artificial Intelligence Research 11
(Aug. 1999), 131–167. https://doi.org/10.1613/jair.606
[19] Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender
Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, Vol. 81. PMLR, 77–91.
[20] Ryan Burns. 2019. New Frontiers of Philanthro-capitalism: Digital Technologies and Humanitarianism. Antipode 51, 4
(April 2019), 1101–1122. https://doi.org/10.1111/anti.12534
[21] Kathy Charmaz. 2006. Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. Sage Publications,
London ; Thousand Oaks, Calif.
[22] Justin Cheng and Dan Cosley. 2013. How Annotation Styles Influence Content and Preferences. In Proceedings of the
24th ACM Conference on Hypertext and Social Media - HT ’13. Association for Computing Machinery, Paris, France,
214–218. https://doi.org/10.1145/2481492.2481519
[23] Angèle Christin. 2016. From Daguerreotypes to Algorithms: Machines, Expertise, and Three Forms of Objectivity.
SIGCAS Computers and Society 46, 1 (2016), 27–32. https://doi.org/10.1145/2908216.2908220
[24] Danielle Keats Citron and Frank Pasquale. 2014. The Scored Society: Due Process for Automated Predictions. Washington
Law Review 89, 1 (2014).
[25] Juliet M. Corbin and Anselm L. Strauss. 2015. Basics of Qualitative Research: Techniques and Procedures for Developing
Grounded Theory (fourth ed.). SAGE, Los Angeles.
[26] Nick Couldry and Ulises A. Mejias. 2019. Data Colonialism: Rethinking Big Data’s Relation to the Contemporary
Subject. Television & New Media 20, 4 (May 2019), 336–349. https://doi.org/10.1177/1527476418796632
[27] Kate Crawford and Trevor Paglen. 2019. Excavating AI. https://www.excavating.ai.
[28] Ciaran Cronin. 1996. Bourdieu and Foucault on Power and Modernity. Philosophy & Social Criticism 22, 6 (Nov. 1996),
55–85. https://doi.org/10.1177/019145379602200603
[29] Hannah Davis. 2020. A Dataset Is a Worldview. https://towardsdatascience.com/a-dataset-is-a-worldview-
5328216dd44d.
[30] Catherine D’Ignazio and Lauren F. Klein. 2020. Data Feminism. The MIT Press, Cambridge, Massachusetts.
[31] Ravit Dotan and Smitha Milli. 2020. Value-Laden Disciplinary Shifts in Machine Learning. In Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency (FAT* ’20). Association for Computing Machinery, Barcelona,
Spain, 294. https://doi.org/10.1145/3351095.3373157
[32] Emile Durkheim and Marcel Mauss. 1963. Primitive Classification. University of Chicago Press.
[33] M. C. Elish and danah boyd. 2018. Situating Methods in the Magic of Big Data and AI. Communication Monographs 85,
1 (Jan. 2018), 57–80. https://doi.org/10.1080/03637751.2017.1375130
[34] Virginia Eubanks. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s
Press, New York.
[35] Melanie Feinberg. 2017. A Design Perspective on Data. In CHI ’17: Proceedings of the 2017 CHI Conference on Human
Factors in Computing Systems (CHI ’17). Association for Computing Machinery, Denver, Colorado, USA, 2952–2963.
https://doi.org/10.1145/3025453.3025837
[36] Tim Finin, Will Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, and Mark Dredze. 2010. Annotating
Named Entities in Twitter Data with Crowdsourcing. In Proceedings of the NAACL HLT 2010 Workshop on Creating
Speech and Language Data with Amazon’s Mechanical Turk (CSLDAMT ’10). Association for Computational Linguistics,
Los Angeles, California, 80–88. https://doi.org/10.5555/1866696.1866709

[37] Marion Fourcade and Kieran Healy. 2013. Classification Situations: Life-Chances in the Neoliberal Era. Accounting,
Organizations and Society 38, 8 (Nov. 2013), 559–572. https://doi.org/10.1016/j.aos.2013.11.002
[38] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumeé III,
and Kate Crawford. 2018. Datasheets for Datasets. arXiv:1803.09010 (March 2018). arXiv:1803.09010
[39] R. Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, and Jenny Huang. 2020. Garbage in,
Garbage out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training
Data Comes From?. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20).
Association for Computing Machinery, Barcelona, Spain, 325–336. https://doi.org/10.1145/3351095.3372862
[40] Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, and Klaus Mueller. 2020. Measuring Social Biases of Crowd Workers Using
Counterfactual Queries. In Workshop on Fair & Responsible AI at ACM CHI Conference on Human Factors in Computing
Systems. Honolulu, HI, USA.
[41] Tarleton Gillespie. 2014. The Relevance of Algorithms. In Media Technologies: Essays on
Communication, Materiality, and Society, Pablo J. Boczkowski and Kirsten A. Foot (Eds.). The MIT Press, 167–194.
https://doi.org/10.7551/mitpress/9780262525374.003.0009
[42] Lisa Gitelman (Ed.). 2013. "Raw Data" Is an Oxymoron. The MIT Press, Cambridge, Massachusetts ; London, England.
[43] Barney G. Glaser and Anselm L. Strauss. 1998. Grounded theory: Strategien qualitativer Forschung. Huber, Bern.
[44] Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass.
Houghton Mifflin Harcourt, Boston.
[45] Foad Hamidi, Morgan Klaus Scheuerman, and Stacy M. Branham. 2018. Gender Recognition or Gender Reductionism?
The Social Implications of Embedded Gender Recognition Systems.. In Proceedings of the 2018 CHI Conference on
Human Factors in Computing Systems (CHI ’18). Association for Computing Machinery, New York, 1–13. https:
//doi.org/10.1145/3173574.3173582
[46] Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a Critical Race Methodology in
Algorithmic Fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20).
Association for Computing Machinery, Barcelona, Spain, 501–512. https://doi.org/10.1145/3351095.3372826
[47] Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The Dataset Nutrition
Label: A Framework To Drive Higher Data Quality Standards. arXiv:1805.03677 (2018).
[48] Christoph Hube, Besnik Fetahu, and Ujwal Gadiraju. 2019. Understanding and Mitigating Worker Biases in the
Crowdsourced Collection of Subjective Judgments. In Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.
1145/3290605.3300637
[49] Gunay Kazimzade and Milagros Miceli. 2020. Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-
Oriented Data Annotation Practices. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and
Society. (AIES ’20). Association for Computing Machinery, New York, NY, USA, 1–7. https://doi.org/10.1145/3375627.
3375809
[50] Lucy Kimbell and Jeanette Blomberg. 2017. The Object of Service Design. In Designing for Service: Key Issues and New
Directions. Bloomsbury Publishing, 81–94.
[51] Rob Kitchin. 2017. Thinking Critically about and Researching Algorithms. Information, Communication & Society 20, 1
(Jan. 2017), 14–29. https://doi.org/10.1080/1369118X.2016.1154087
[52] Gary Klein, Jennifer K. Phillips, Erica L. Rall, and Deborah A. Peluso. 2007. A Data-Frame Theory of Sensemaking. In
Expertise out of Context: Proceedings of the Sixth International Conference on Naturalistic Decision Making. Lawrence
Erlbaum Associates Publishers, Mahwah, NJ, US, 113–155.
[53] Ulrike Klinger and Jakob Svensson. 2018. The End of Media Logics? On Algorithms and Agency. New Media & Society
20, 12 (Dec. 2018), 4653–4670. https://doi.org/10.1177/1461444818779750
[54] Natalia M Libakova and Ekaterina A Sertakova. 2015. The Method of Expert Interview as an Effective Research
Procedure of Studying the Indigenous Peoples of the North. Journal of Siberian Federal University. Humanities & Social
Sciences 8, 1 (2015), 114–129. https://doi.org/10.17516/1997-1370-2015-8-1-114-129
[55] Michael A. Madaio, Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach. 2020. Co-Designing Checklists
to Understand Organizational Challenges and Opportunities around Fairness in AI. In Proceedings of the 2020 CHI
Conference on Human Factors in Computing Systems (CHI ’20). Association for Computing Machinery, Honolulu, HI,
USA, 1–14. https://doi.org/10.1145/3313831.3376445
[56] Astrid Mager. 2012. Algorithmic Ideology: How Capitalist Society Shapes Search Engines. Information, Communication
& Society 15, 5 (June 2012), 769–787. https://doi.org/10.1080/1369118X.2012.676056
[57] Steffen Mau. 2019. The Metric Society: On the Quantification of the Social. Polity, Cambridge ; Medford, MA.
[58] Frauke Mörike. 2019. Ethnography for Human Factors Researchers. Collecting and Interweaving Threads of HCI.
[59] Michael Muller. 2014. Curiosity, Creativity, and Surprise as Analytic Tools: Grounded Theory Method. In Ways of
Knowing in HCI, Judith S. Olson and Wendy A. Kellogg (Eds.). Springer, New York, NY, 25–48. https://doi.org/10.1007/978-1-4939-0378-8_2
[60] Michael Muller, Shion Guha, Eric P.S. Baumer, David Mimno, and N. Sadat Shami. 2016. Machine Learning and
Grounded Theory Method: Convergence, Divergence, and Combination. In Proceedings of the 19th International
Conference on Supporting Group Work (GROUP ’16). Association for Computing Machinery, Sanibel Island, Florida,
USA, 3–8. https://doi.org/10.1145/2957276.2957280
[61] Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas
Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing
Machinery, Glasgow, Scotland Uk, 1–15. https://doi.org/10.1145/3290605.3300356
[62] Safiya Umoja Noble. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press, New York.
[63] Cathy O’Neil. 2017. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. PENGUIN
BOOKS, London.
[64] Juho Pääkkönen, Matti Nelimarkka, Jesse Haapoja, and Airi Lampinen. 2020. Bureaucracy as a Lens for Analyzing and
Designing. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). Association for
Computing Machinery, Honolulu, HI, USA., 1–14. https://doi.org/10.1145/3313831.3376780
[65] Samir Passi and Solon Barocas. 2019. Problem Formulation and Fairness. In Proceedings of the Conference on Fairness,
Accountability, and Transparency (FAT* ’19). Association for Computing Machinery, Atlanta, GA, USA, 39–48. https:
//doi.org/10.1145/3287560.3287567
[66] Samir Passi and Steven Jackson. 2017. Data Vision: Learning to See Through Algorithmic Abstraction. In Proceedings
of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’17). Association
for Computing Machinery, Portland, Oregon, USA, 2436–2447. https://doi.org/10.1145/2998181.2998331
[67] Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in
Corporate Data Science Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW (Nov. 2018), 1–28. https://doi.org/10.
1145/3274405
[68] Kathleen H. Pine and Max Liboiron. 2015. The Politics of Measurement and Action. In Proceedings of the 33rd Annual
ACM Conference on Human Factors in Computing Systems (CHI ’15). Association for Computing Machinery, New York,
NY, USA, 3147–3156. https://doi.org/10.1145/2702123.2702298
[69] Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d’Alché-Buc,
Emily Fox, and Hugo Larochelle. 2020. Improving Reproducibility in Machine Learning Research (A Report from the
NeurIPS 2019 Reproducibility Program). arXiv:2003.12206 (April 2020). arXiv:2003.12206
[70] Alex Rosenblat, Tamara Kneese, and danah boyd. 2014. Networked Employment Discrimination. SSRN Electronic
Journal (2014). https://doi.org/10.2139/ssrn.2543507
[71] Morgan Klaus Scheuerman, Jacob M. Paul, and Jed R. Brubaker. 2019. How Computers See Gender: An Evaluation of
Gender Classification in Commercial Facial Analysis Services. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019).
https://doi.org/10.1145/3359246
[72] Morgan Klaus Scheuerman, Kandrea Wade, Caitlin Lustig, and Jed R Brubaker. 2020. How We’ve Taught Algorithms to
See Identity: Constructing Race and Gender in Image Databases for Facial Analysis. Proc. ACM Hum.-Comput. Interact.
4, CSCW1 (2020). https://doi.org/10.1145/3392866
[73] Nick Seaver. 2019. Knowing Algorithms. In digitalSTS: A Field Guide for Science & Technology Studies. Princeton
University Press, PRINCETON; OXFORD, 412–422.
[74] Ismaïla Seck, Khouloud Dahmane, Pierre Duthon, and Gaëlle Loosli. 2018. Baselines and a Datasheet for the Cerema
AWP Dataset. In Conférence d’Apprentissage CAp (Conférence d’Apprentissage Francophone 2018). Rouen, France.
https://doi.org/10.13140/RG.2.2.36360.93448
[75] Susan Leigh Star and Anselm Strauss. 1999. Layers of Silence, Arenas of Voice: The Ecology of Visible and Invisible
Work. Computer Supported Cooperative Work 8, 1-2 (March 1999), 9–30. https://doi.org/10.1023/A:1008651105359
[76] Robert Thornberg. 2012. Informed Grounded Theory. Scandinavian Journal of Educational Research 56, 3 (June 2012),
243–259. https://doi.org/10.1080/00313831.2011.581686
[77] Fabian L. Wauthier and Michael I. Jordan. 2011. Bayesian Bias Mitigation for Crowdsourcing. In Proceedings of the 24th
International Conference on Neural Information Processing Systems (NIPS’11). Curran Associates Inc., Granada, Spain,
1800–1808.
[78] Jennifer Wortman Vaughan and Hanna Wallach. 2020. A Human-Centered Agenda for Intelligible Machine Learning.
In Machines We Trust: Getting Along with Artificial Intelligence.
[79] Eviatar Zerubavel. 1993. The Fine Line: Making Distinctions in Everyday Life (2nd ed.). University of Chicago Press.
[80] Honglei Zhuang and Joel Young. 2015. Leveraging In-Batch Annotation Bias for Crowdsourced Active Learning. In
Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM ’15). Association for
Computing Machinery, Shanghai, China, 243–252. https://doi.org/10.1145/2684822.2685301

[81] Shoshana Zuboff. 2019. The Age of Surveillance Capitalism: The Fight for the Future at the New Frontier of Power. Profile
Books, London.

Received January 2020; revised June 2020; accepted July 2020

4 Precarization, Alienation, and Control in Data Work

4.1 Background and Motivation


In this dissertation, data work is defined as the labor involved in the collection, curation,
classification, labeling, and verification of data for ML. These tasks are mostly outsourced
through platforms and BPOs, often located in the Global South, where workers are paid as
little as a few cents of a dollar per task, usually lack social protection traditionally associated
with employment relations, and are subject to systems of control and surveillance. Tubaro
and Casilli [25] define three moments in outsourced AI production: “artificial intelligence
preparation,” “artificial intelligence verification,” and “artificial intelligence impersonation.”
AI preparation includes the collection of data and its annotation. AI verification involves
the evaluation of algorithmic outputs. Finally, AI impersonation refers to the non-disclosed
“‘human-in-the-loop’ principle that makes workers hardly distinguishable from algorithms.”
Following the research agenda outlined in Paper 1 [37], this chapter studies the rationale and
priorities that are inscribed in data work instructions and explores the different types of tasks
outlined by Tubaro et al. Here, I lean on Foucault’s notion of dispositif and apply the method
of dispositif analysis [90]. A dispositif is an ensemble of objects, subjects, discourses, and
practices as well as the relations that can be established between them. This chapter explores
the broader data production ecosystem as a dispositif that (re)produces power-knowledge
relationships [82, 87]. Through this analysis, I center labor as a fundamental dimension of AI
ethics.
Paper 3, The Data-Production Dispositif, introduces a novel mode of analysis that combines
different types of data and constitutes the first comprehensive exploration of its kind. It includes
an analysis of social and labor conditions at the sites where ML data is outsourced, the
discourses that these sites reproduce, and the artifacts (such as interfaces and documents)
that support that status quo. The paper merges both authors’ areas of expertise and presents
a comprehensive analysis of data work as performed by Latin American workers through
platforms and BPOs. The BPO in question is S1 although, in this paper, we refer to it as
“Alamo.”
The paper investigates the following research questions:

1. What discourses are present in task instructions provided to outsourced data workers?

2. How do outsourced data workers, managers, and requesters interact with each other and
instruction documents to produce data?

3. What artifacts support the observance of instructions, and what kind of work do these
artifacts perform?

The analysis corpus comprises a total of 210 instruction documents, 55 interviews, and
several weeks of observations. We define the data-production dispositif as the network of
discourses, work practices, hierarchies, subjects, and artifacts comprised in ML data work
and the power/knowledge relationships that are established and naturalized among them.
The data-production dispositif determines the realities that ML datasets can reflect and the
ones that remain erased from them. It has a crucial effect on the outputs that ML models
will consider to be true. As Foucault argues, the emergence of each dispositif responds to an
“urgent need.” The data-production dispositif responds to the growing demand for data and
labor in the AI industry.
The findings show that, instead of seeking the “wisdom of crowds,” where a diverse and
independent group cooperates to solve a problem, requesters use task instructions to impose
predefined forms of interpreting, classifying, and sorting data that respond primarily to
profit-oriented interests. Managers in BPOs and algorithms in labor platforms are in charge of
overseeing the process. Poverty and dependence in the areas where data work is outsourced
leave workers with no other option but to obey and avoid questioning instructions. Documents,
tools, and interfaces constitute some of the dispositif’s materializations.
We observe that the goal of the data-production dispositif is to create a specific type of
worker, namely, outsourced data workers who are kept apart from the rest of the machine
learning production chain and are therefore alienated: data workers who are surveilled, pushed
to obey requesters without questioning tasks, and constantly reminded of the dangers
of non-compliance. We argue that approaches to “ethical AI” should also consider ways of
providing data workers with dignifying conditions and a sustainable future. In view of these
findings, we propose three ways of counteracting the data-production dispositif and its effects:
making worldviews encoded in task instructions explicit, thinking of workers as assets, and
empowering them to produce better data.
Paper 3 was published in 2022 in the Proceedings of the ACM on Human-Computer
Interaction and presented at the 2022 ACM Conference on Computer-Supported Cooperative
Work and Social Computing (CSCW '22), where it received three awards: Impact Award,
Honorable Mention for Best Paper, and Methods Recognition. The idea and the analysis
were developed in collaboration with Julian Posada during his fellowship at the Weizenbaum
Institute. We both contributed equally to this work. The paper combines Posada’s fieldwork
with platform workers in Venezuela with my fieldwork with data workers at the Argentine
BPO and the expert interviews I conducted with managers at other BPOs and with ML
practitioners. While Posada provided a large corpus of instruction documents collected online,
I led the development of the research design and methodology.

Paper 3: The Data-Production Dispositif

The Data-Production Dispositif

MILAGROS MICELI∗ , DAIR Institute, TU Berlin, and Weizenbaum Institute, Germany & USA
JULIAN POSADA∗† , Yale University, USA
Machine learning (ML) depends on data to train and verify models. Very often, organizations outsource
processes related to data work (i.e., generating and annotating data and evaluating outputs) through business
process outsourcing (BPO) companies and crowdsourcing platforms. This paper investigates outsourced ML
data work in Latin America by studying three platforms in Venezuela and a BPO in Argentina. We lean on
the Foucauldian notion of dispositif to define the data-production dispositif as an ensemble of discourses,
actions, and objects strategically disposed to (re)produce power/knowledge relations in data and labor. Our
dispositif analysis comprises the examination of 210 data work instruction documents, 55 interviews with data
workers, managers, and requesters, and participant observation. Our findings show that discourses encoded
in instructions reproduce and normalize the worldviews of requesters. Precarious working conditions and
economic dependency alienate workers, making them obedient to instructions. Furthermore, discourses and
social contexts materialize in artifacts, such as interfaces and performance metrics, limiting workers’ agency
and normalizing specific ways of interpreting data. We conclude by stressing the importance of counteracting
the data-production dispositif by fighting alienation and precarization, and empowering data workers to
become assets in the quest for high-quality data.
CCS Concepts: • Human-centered computing → Collaborative and social computing; • Information
systems → Crowdsourcing; • Applied computing → Annotation; • Computing methodologies →
Artificial intelligence.
Additional Key Words and Phrases: data production, data work, machine learning, data labeling, platform
labor, crowdsourcing
ACM Reference Format:
Milagros Miceli and Julian Posada. 2022. The Data-Production Dispositif . Proc. ACM Hum.-Comput. Interact. 6,
CSCW2, Article 460 (November 2022), 37 pages. https://doi.org/10.1145/3555561

1 INTRODUCTION
Many machine learning (ML) models are built from training data previously collected, cleaned,
and annotated by human workers. Companies and research institutions outsource several of these
tasks through online labor platforms [64] and business process outsourcing (BPO) companies
[53]. In these instances, outsourcing organizations and their clients regard workers as independent
contractors, considering them factors of production and their labor a commodity or a product
subject to market regulations [87]. They are paid as little as a few cents of a dollar per task, usually
lack social protection traditionally tied with employment relations, and are subject to systems of
control and surveillance [30, 37, 81]. Their assignments broadly comprise the interpretation and
classification of data, and their work practices involve subjective social and technical choices that
∗ Equal contribution.
† Also with University of Toronto, Canada.

Authors’ addresses: Milagros Miceli, m.miceli@tu-berlin.de, DAIR Institute, TU Berlin, and Weizenbaum Institute, Germany
& USA; Julian Posada, julian.posada@yale.edu, Yale University, New Haven, CT, USA.

This work is licensed under a Creative Commons Attribution International 4.0 License.

© 2022 Copyright held by the owner/author(s).


2573-0142/2022/11-ART460
https://doi.org/10.1145/3555561

influence data production and have ethical and political implications. Workers interpreting and
classifying data do not do so in a vacuum: their labor is embedded in large industrial structures
and deeply intertwined with naturalized profit-oriented interests [44].
This paper presents an investigation of data production for ML as carried out by Latin American
data workers mediated by three platforms operating in Venezuela and a business process outsourcing
(BPO) company located in Argentina. To study data work for machine learning, which we define as
the labor involved in the collection, curation, classification, labeling, and verification of data, we
lean on Foucault’s notion of dispositif and apply the method of dispositif analysis [39]. A dispositif
is an ensemble of objects, subjects, discourses, and practices as well as the relations that can be
established between them [25]. Examples of dispositifs include prisons, police, and academia. These
heterogeneous ensembles of discursive and non-discursive elements constitute what is perceived
as reality and, as such, what is taken for granted.
The decision to lean on Foucault’s notion of dispositif is methodological, rather than theoretical.
This notion and the method of dispositif analysis enables the study of data production as embedded
in social interactions and hierarchies that condition how data is constructed and how specific
discourses are reproduced. As we will describe in Section 3.1, this method also allowed us to
integrate diverse qualitative data and focus on the relationships between them. We define the
data-production dispositif as the network of discourses, work practices, hierarchies, subjects, and
artifacts comprised in ML data work (see Figure 3) and the power/knowledge relationships that are
established and naturalized among the three elements. The data-production dispositif determines
the realities that ML datasets can reflect and the ones that remain erased from them. It has a crucial
effect on the outputs that ML models will consider to be true. Dispositif analysis interrogates means
of reality making, with a special focus set on the meanings that become dominant and those that
are marginalized — “the said as much as the unsaid” [25]. Our dispositif analysis explores the sites
where the production of ML data is outsourced. It comprises the investigation of (1) linguistically-
performed elements (what is said/written), (2) non-linguistically performed practices (what is done),
and (3) materializations (how linguistically and non-linguistically performed practices translate
into objects) [39].
These elements and research questions relate specifically to the outsourcing of ML data-production
tasks and can be structured as follows:

• Linguistically performed elements: What discourses are present in task instructions pro-
vided to outsourced data workers? (RQ1)
→ We analyzed a corpus of 210 instruction texts for data-related tasks requested by ML
practitioners and outsourced to data workers.
• Non-linguistically performed practices: How do outsourced data workers, managers, and
requesters interact with each other and instruction documents to produce data? (RQ2)
→ To explore how linguistically performed elements translate into practice, we conducted
41 interviews with data workers and inquired how they interpret the instructed tasks. In
addition, we conducted interviews with six managers and eight ML practitioners (in their
role as data-work requesters).
• Materializations: What artifacts support the observance of instructions, and what kind of work do they perform? (RQ3)
→ Through participant observation, we account for some of the material elements in which
the data-production dispositif manifests, such as platforms and interfaces, tools to surveil
workers, and documents that record the decisions made between service providers and service
requesters.

To summarize, this paper, its contribution, and the extensive analysis it comprises can be described
as follows: We start by exploring Foucault’s notion of dispositif and defining key related concepts
such as power, knowledge, and discourse. Then, we review previous investigations that have
discussed ML data work and further define the scope of the data-production dispositif. After
offering an overview of the dispositif analysis method, informants, and fieldwork sites, we present
our findings. They are organized around the three elements that form the dispositif.
The findings show that, instead of seeking the “wisdom of crowds,” where a diverse and indepen-
dent group cooperates to solve a problem, requesters use task instructions to impose predefined
forms of interpreting, classifying, and sorting data that respond primarily to profit-oriented interests.
Managers in BPOs and algorithms in labor platforms are in charge of overseeing the process.
Poverty and dependence in the areas where data work is outsourced leave workers with no other
option but to obey and avoid questioning instructions. Documents, tools, and interfaces constitute
some of the dispositif’s materializations. Given these findings, we outline some implications and
propose three ways of counteracting the current composition of the data-production dispositif and
its effects by fighting workers’ precarization, alienation, and surveillance. Finally, we discuss the
limitations of our investigation.

2 DEFINING KEY CONCEPTS


2.1 Dispositif and Other Foucauldian Concepts
Foucault proposes a relational conception of power [25] and argues that its exercise takes place in
networks of relations rather than being placed in a specific social location [14]. He defines power
as “a whole series of particular mechanisms, definable and defined, that seem capable of inducing
behaviours or discourses”[24]. Therefore, power is not held by or exercised over individuals but
works through the impersonal relations of force and strategy that connect subjects [14]. Power
operates through practices that act upon subjects' present or future actions. It is effective as long
as it is normalized, that is, taken for granted and perceived as the inevitable way things are.
Present throughout Foucault’s power analysis is the implicit relationship between power and
knowledge, whereby one implies the other. He understands knowledge as entangled with discursive
power and describes it as the power to define others and to produce truth through discourses.
Knowledge and power are integrated with one another because “it is not possible for power to be
exercised without knowledge, it is impossible for knowledge not to engender power” [25].
In a related manner, Foucault uses the term discourse to refer to a historically contingent system
that produces knowledge and meaning [22]. Discourse is not only a way of organizing and presenting knowledge; it can also structure social practices and the relations that emerge through the collective understanding of reality encoded in discourse [21]. Discourses encode power in the sense
that they can determine reality. Subjects are active participants in reality-making processes as
co-producers of discourses, which puts explicit and implicit knowledge at their disposal [39]. Thus,
discursive power is not exercised on subjects but flows through them. However, discourses still
have a disciplinary effect on subjects as they assure the prevalence of certain knowledge of what
can be said, done, and thought.
Power, knowledge, and discourse finally converge in dispositif, a notion that expands discourse
to include non-discursive practices and artifacts. Foucault defines dispositif as “a thoroughly hetero-
geneous ensemble consisting of discourses, institutions, architectural forms, regulatory decisions,
laws, administrative measures, scientific statements, philosophical, moral and philanthropic propo-
sitions – in short, the said as much as the unsaid . . . The [dispositif] itself is the system of relations
that can be established between these elements” [25]. There is a multiplicity of dispositifs that
influence each other and have strategic functions within power relationships. As Foucault puts it,
dispositifs respond to an “urgent need” that is bound to specific historical and geographical contexts.
Link [48] draws attention to the etymological root shared by the French words “disposition” and
“dispositif.” In colloquial French, “disposition” is used in the sense of being “at someone’s disposal.”
This way, Link highlights the power element comprised in dispositif as the separation between
those who are “at the disposal” (those who are instrumental) and those who have the influence to determine the strategy used to meet a need [9]. Both those who are “at the disposal” and those who “dispose” are part of the dispositif's strategy.
In sum, the concept of dispositif comprises the knowledge that is built into linguistically per-
formed practices (what is said, written, and thought), non-linguistically performed practices (what
is done), and materializations (the objects) [39, 41]. A dispositif can therefore be defined as a
constantly changing network of objects, subjects, discourses, and practices that shape each other,
producing new knowledge and new power.
Previous HCI and CSCW research has engaged with Foucauldian theory. For instance, Harmon
and Mazmanian [34] follow Foucault’s understanding of discourse to explore how US residents talk
about smartphones and smartphone users. Kou et al. [46] use the Foucauldian concepts of power,
knowledge, and self to explicate human-technology relationships. And Bardzell et al. [4] draw
on Foucault’s Theory of Identity to investigate social practices within the virtual world Second
Life. In terms of methodology, Kannabiran et al. [42] use Foucauldian Discourse Analysis (FDA) to
study the rules and mechanisms involved in HCI discourses on sexuality, while Spiel [78] combines
Actor-Network Theory with FDA into a “critical experience” framework to evaluate how children
in the autistic spectrum interact with technologies. Despite the wide application of Foucauldian
theory in HCI and CSCW, the dispositif notion and the method of dispositif analysis have not been applied to the study of
data production and data work. To that end, we believe that our contribution could be seen as
methodological, in the sense that we intend to show a novel and comprehensive mode of analysis
to approach data work.

2.2 Data Work for Machine Learning


The data-production dispositif analyzed in this paper comprises the “infrastructure” that enables
the (re-)production and circulation of specific discourses in and through ML data work. As Foucault
argues, the emergence of each dispositif responds to an “urgent need.” The data-production dispositif
responds to the growing demand for data and labor in the AI industry.
We define data work as the human labor necessary for data production, in this case, for machine
learning. Data work involves the collection, curation, classification, labeling, and verification of data.
Users, developers, and outsourced workers carry out these tasks at any point in the development
and deployment of AI systems. Examples include medical professionals in the case of AI for healthcare [6, 55, 79], education professionals [50], and internet users answering ReCAPTCHA tests [40].
This paper will employ the term “data work” to refer exclusively to the labor outsourced through crowdsourcing platforms and specialized business process outsourcing (BPO) companies, rather than the broader data work carried out by other professionals and users, while acknowledging the role of the latter within the dispositif as requesters.
Platforms are one of the two significant ways of outsourcing data work. The rise of alternative
forms of work different from traditional employment [43] and the expansion of the “gig economy,”
or casual employment mediated through platforms [89], gave rise to “crowdsourcing,” “crowdwork,”
or “digital piecework” platforms where geographically dispersed workers are allocated many
fragmented tasks, which are carried out online from their homes. Platforms are hybrid organizations
that combine traits of firms and multi-sided markets [11]. They serve as infrastructures that
“facilitate and shape personalised interactions among end-users and complementors, organised
through the systematic collection, algorithmic processing, monetisation, and circulation of data”
[63]. Platforms thrive in digital environments because they respond to deficiencies in markets and
enterprises that fail to extract and appropriate data and allocate resources efficiently [11].
The second primary form of outsourced data work for ML is provided by business process
outsourcing (BPO) companies. Conversely to crowdsourcing platforms, where hierarchies are
primarily managed by algorithms, BPOs show rather traditional management structures. BPO
is a form of outsourcing that involves contracting a third-party service provider to carry out
specific parts of a company’s operations, in the case of our investigation, data-related tasks. These
service providers often specialize in one type of ML data service (e.g., semantic segmentation) or application domain (e.g., computer vision), in contrast to platforms, which cover one or a few application domains but offer more diverse data services. While prices per piece are significantly higher than those offered by platforms, many machine learning companies prefer to outsource their data-related projects to BPOs because of the perceived higher quality of the data [54]. This is
due to the companies’ domain specialization and traditional managerial structures that allow more
direct and personal communication.
The intervention of humans in processes of data production has been addressed by a large body
of CSCW and HCI research [19, 26, 53, 57, 58, 61, 62, 77, 80]. Some investigations have explored
the role of worker subjectivity on datasets [7, 12, 29, 84] and have proposed ways to recognize and
address worker bias [3, 28, 36, 84]. In contrast, other researchers have documented the “practices, politics, and values of workers in the data pipeline” [73, 74] and the sociotechnical organization of data work that privileges speed, scale, and scalability over worker wellbeing [37], as well as low wages [18, 33], dependency [71], and the power asymmetries vis-à-vis requesters [38, 52, 53, 72].
As we argue, the “urgent need” addressed by the data-production dispositif is the exponential demand for cheaper and more profitable data, which also involves the exploitation of surveillance [90], natural resources [15], and other types of labor [13]. Previous research has highlighted the role of these elements in guaranteeing a façade where AI is seen as neutral, unbiased, and efficient due to the lack of human intervention — and error — while keeping workers and factors of production hidden
from the public lens [10, 30, 37]. These elements show the wide extension of the “heterogeneous
ensemble” that constitutes the discursive, non-discursive, and material elements of data production.
Because the data-production dispositif is too vast to explore in one academic paper, we circumscribe our analysis to outsourced data work for ML as one of its crucial components.

3 METHODOLOGY
3.1 Dispositif Analysis of Data Production
We lean on dispositif analysis to investigate the discourses implicit in annotation instructions,
the non-discursive practices involved in the production of ML datasets, and how both materialize
in artifacts. Often described as an extension of discourse analysis [9], dispositif analysis expands
the field of inquiry beyond texts to include actions, relationships, and objects. Dispositif analysis
rests on the notion of knowledge (and power) as the connecting force between discursive and
non-discursive components. It accounts for hierarchies and power structures in societal fields and
organizations that shape the construction of meaning in discourse [68]. Thus, our dispositif analysis
crucially focuses on the relationship between discourse, practice, and objects in data production,
and the power created through their interaction.
Foucault never outlined an explicit methodology of dispositif analysis. Several authors, most
prominently Sigfried Jäger [39, 41], have explored ways of operationalizing the complex Foucauldian
notion of dispositif into a method of inquiry. Dispositif analysis has thus been in constant evolution
since the mid-1980s. Caborn [9] mentions four steps comprised in this methodology: (1) identifying
the elements that constitute the dispositif, (2) determining which discourses they embody and their
entanglement with other discourses, (3) interrogating power by “considering who or what is at
the disposal of whom”, and (4) analyzing non-discursive practices associated with the dispositif’s
discourses. However, to our knowledge, a comprehensive guide or method of how to conduct a
dispositif analysis has not yet been developed. As Jäger and Maier describe, dispositif analysis
remains “a flexible approach and systematic incitement for researchers to develop their analytic
strategies, depending on the research question and type of materials at hand” [39].
The study presented in this paper follows the experimental spirit of Jäger and Maier’s invita-
tion to develop our analytical strategy, in this case, to study data production. Here, we combine
methodological elements discussed by several authors in terms of the operationalization of power,
knowledge, and discourse [8, 9, 47, 48, 60], apply them to our fieldwork on data production through
platforms and at a BPO, and follow the examples provided by previous research that has successfully
applied variations of dispositif analysis [9, 31, 51, 85, 86]. We followed the four steps outlined by
Caborn mentioned above and based our analysis on the three-dimensional framework described by
Jäger and Maier [39] as follows:
• The analysis of linguistically performed elements: which aims at reconstructing the knowledge
built into what is said and written through discourse analysis. In terms of our investigation,
this phase comprised an examination of the discourses encoded in the instruction documents
received by data workers.
• The analysis of non-linguistically performed practices: which aims at reconstructing the
knowledge that underlies linguistically performed practices and how they translate into
action. In this phase, we investigated how workers make sense of the task instructions and
their work in general, the interactions between workers, managers, and clients, and the
labor conditions that structure these practices. We studied these elements through interviews
conducted with data annotators who perform tasks guided by such instructions, machine
learning practitioners who compose annotation instructions, and managers who oversee the
process.
• The materializations: This phase of analysis consisted of identifying the knowledge that
is built into physical and digital artifacts, i.e., discursive materialization, whose existence
is coherent with the discourses they encode. Through this lens, we set the focus on the
platforms and interfaces used to perform data work, documents (as artifacts and not as texts)
that record decision-making processes, and tools used to surveil workers and quantify their
performance. Our analysis of these materializations is based on participant observations and
the above-mentioned interviews.

3.2 Researcher Positionality


Making researchers' positionality explicit is key to situating the standpoint from which an investigation has been conducted. Positionality statements are relevant to all types of studies, especially qualitative and exploratory investigations such as this one. Moreover, given the flexible character
of dispositif analyses, it seems appropriate to disclose some elements of the authors’ backgrounds
that might have informed the analysis presented in this paper.
Both authors are multiracial researchers born in different countries of Latin America. Both
are first-generation academics working in institutions located in the Global North, where they
live under immigrant status. Both have a background in Sociology and Communication. Their
first language is Spanish. One of the authors identifies as female and the other as male. Both are
cisgender. Despite being born and raised within working-class families and in the same regions as
the data workers interviewed, the authors acknowledge that their class-related experiences differ
from those of the interview partners and that their position as researchers living and working
in the Global North provides the authors with privilege that the study participants do not hold.
Throughout data collection and analysis, and while considering the implications of this investigation, the authors have put much effort into remaining reflexive and acknowledging their position regarding
the study participants and field of inquiry.

3.3 Data Collection and Analysis


This investigation comprises several weeks of participant observation, a total of 55 interviews, and
the analysis of 210 instruction documents. These data were collected during several months of
fieldwork from 2019 to 2021, online and in person, at two sites (see Table 1):
• virtually studying three crowdsourcing platforms operating in Venezuela and the experiences
of platform workers, and,
• in a hybrid format, at a business process outsourcing company (BPO) located in Buenos Aires,
Argentina, where data workers perform tasks related to the collection and labeling of data
for machine learning.
At both fieldwork sites, we conducted participant observations and semi-structured interviews. To
complement these data, we conducted a series of expert interviews with managers at other BPOs
and with ML practitioners in their role of data-work requesters (see Section 3.3.2 and Table 3 for a
detailed account of the interview participants).
3.3.1 Fieldwork.
Fieldwork in Venezuela was carried out virtually between July 2020 and June 2021 due to
restrictions related to the coronavirus pandemic. For the first phase of this research, we signed up for the platforms and completed tasks to understand the available tasks, working conditions, and interfaces. While some of these platforms presented similar tasks, they differed considerably in their general availability, interfaces, labor processes, and task applications. Initially, we contacted
the platform workers using convenience sampling since this population is invisible, meaning that
it is difficult to approach them without the support of the platforms, which we did not have for the
study. We sought permission from the moderators of the most popular worker groups on Facebook
and Discord to post a call for study participants. Thanks to this initial approach, we were able
to use snowball sampling to contact further participants. We conducted in-depth interviews and
asked workers about their experience working for the platforms. Additionally, we asked workers
if they could share information about the instructions they received. Some workers also shared
guides created by colleagues to understand and answer the tasks efficiently. We include those in our
analysis as well. Moreover, we also searched the internet to find additional annotation instructions
online. Our approach, notably the use of convenience and snowball sampling, present several
limitations in terms of reproducibility and bias towards participants belonging to similar social
circles. We have mitigated these issues by comparing our results with similar studies (see Section
2.2) and with the workers at the BPO company.
Fieldwork at Alamo, the Argentine business process outsourcing (BPO) company, was carried out
in-person between May and June 2019 in Buenos Aires and continued online between August 2020
and February 2021. At the time of this investigation, the company was a medium-sized organization with branches in several Latin American countries. Alamo conducts data collection, data annotation, content moderation, and software testing projects. The company is an impact sourcing type of
BPO, which refers to a branch of the outsourcing industry that purposely employs workers from
poor and marginalized populations to offer them a chance in the labor market and to provide
information-based services at lower prices. We contacted the company via e-mail to request field
access. After several months of inquiry, a meeting with the company’s management took place in
which the researcher on site signed a non-disclosure agreement that specified several elements that we are not allowed to disclose in this or other papers. Most of these elements concern the identity of clients and specific details about their ML models. After this meeting, fieldwork was allowed to commence and we were able to observe several projects related to the collection and annotation of data for ML. Apart from shadowing workers, we were granted access to team meetings, meetings with clients, workers' briefings, and QA analysis related to three projects carried out by the company in 2019, involving the collection and labeling of image data. We complemented the observations with in-depth interviews with data workers, managers, and QA analysts.

Table 1. Fieldwork sites: Studied data work BPO and platforms

Entity | Type | Primary Tasks | Applications
ALAMO | BPO | Data collection and annotation; content moderation | E-commerce
CLICKRATING | Platform | Data entry; algorithmic verification | Online search engine
TASKSOURCE | Platform | 2D/3D image classification; 2D/3D semantic segmentation | Self-driving vehicles; internet of things
WORKERHUB | Platform | 2D image, text and video classification; 2D semantic segmentation; text transcription | Content moderation; e-commerce

3.3.2 Instruction Documents.


In total, we collected 210 annotation instruction documents from the platforms and the BPO. The
analysis of the instructions was carried out by both authors. We used critical discourse analysis
[39] to explore the instruction texts. The analysis comprised three stages: (1) the structural analysis
of the corpus, (2) a detailed analysis of discourse fragments, and (3) a synoptic analysis. These steps
(especially the synoptic analysis) included several iterations that allowed us to discover connections
between different levels of analysis, collect evidence to support our interpretations, and develop
arguments. Table 2 offers an overview of the codes used for the discourse analysis of the instruction
documents, their evolution throughout the three phases of analysis, and explanatory memos that
reflect our understanding of each code.
The goal of the structural analysis is to code the material to identify significant patterns and
recurring themes and sub-themes comprised in the instructions. By the end of the structural analysis
phase, we were able to identify elements of the text structure, regular tasks, and stylistic devices
that appeared in the instruction documents. These elements helped us identify “typical” texts and
representative discourse fragments for the following analysis step.
The detailed analysis comprised an examination of selected text fragments. We focused on
identifying typical representations and their variations and interrogated the elements highlighted
in the instruction documents and the contextual knowledge that is taken for granted and, thus,
neglected in them. We also paid special attention to binary reductionisms, presupposition and
attribution, examples, and visualizations. A critical aspect of the analysis focused on the taxonomies
that structure the labels instructed by requesters.
Finally, the synoptic analysis included the overall interrogation of the observations that emerged
from the structural and detailed analyses. This phase included an intensive exchange between both
authors to reflect upon our shared understanding of the identified discourse strands. We contrasted
the selected fragments and the identified elements with the interview and observation material.
This approach helped us understand the role of work and managerial practices in legitimizing
specific discourses.

Table 2. Evolution of codes throughout the three phases of discourse analysis as applied to the instruction documents

Structural code: Document characteristics. Detailed codes: Format; Document language; Language barriers; Document version; 1st person. Synoptic codes and memos:
• Documents as constraints: Document elements that hinder the correct understanding or completion of tasks. Elements that constrain workers' interpretation of data.
• Document as artifacts: Evidence of documents evolving or being modified by requesters. Different versions of the same task.

Structural code: Descriptions. Detailed codes: Project description; AI description; Use of tech jargon. Synoptic codes and memos:
• Worker alienation: Evidence of workers kept in the dark about the ML pipeline, made to feel foreign to the products of their labor. Workers forced to automatize their outputs.

Structural code: Task type. Detailed codes: Data collection; Classification; Involves text; Data labeling; Segmentation; Involves images; Keywording; Evaluation; Testing; Rating; Content moderation. Synoptic codes and memos:
• Data generation: Data scraping and collection. Tasks that include taking or intervening pictures and/or generating texts.
• Data annotation: Data segmentation and classification. Tasks related to labeling and keywording.
• Algorithmic verification: Assessment of algorithmic output by workers. Testing of ML systems, rating of search engine queries, moderation of content flagged by an algorithm.
• Algorithmic impersonation: Workers are instructed to act as an AI and rendered invisible in the process.

Structural code: Taxonomies. Detailed codes: Categories; Classes; Attributes; Labels definition. Synoptic codes and memos:
• Multiclass classification: Classifications that include more than two options. Rationale behind classes. Normalization.
• Binary classification: Binary classifications and simplification of complex phenomena.
• Other/Unknown: Use of a third label, especially in relation to binary classifications. Otherness.
• Ambiguity/Exceptions: How ambiguity and marginal cases are dealt with.
• Errors/Discrepancies: Cases where instructions contain errors. Discrepancies between instruction and interface.

Structural code: Examples. Detailed codes: Example description; Counter-example; Clarifications; Interface description; Use of images. Synoptic codes and memos:
• Explicit: What is explicitly described. Rationale behind taxonomies and examples made explicit.
• Implicit: What remains unsaid. What is considered self-evident. Implicit rationale behind taxonomies.

Structural code: Disturbing content. Detailed codes: Content warnings; Content inclusion. Synoptic codes and memos:
• Exposure to disturbing content: Tasks that include dealing with content that is sexual or violent in nature.

Structural code: Worker skills. Detailed codes: Experience; Language skills; Technical skills; Score. Synoptic codes and memos:
• Language: Discrepancies between the instructions' language and workers' native language. Problems and strategies. Use of Google Translate. Guides developed by workers in Spanish.
• Quantification and surveillance: References to how the performance of workers is measured and surveilled. Consequences for low scores.

Structural code: Warnings. Detailed codes: Ban; Speed; No payment; Accuracy. Synoptic codes and memos:
• Worker obedience: How unquestioning obedience is fostered in task instructions. How workers are prompted to think in terms of what the requester wants.
• Worker precarization: Precarious labor conditions made explicit in instruction documents. Threats of being banned from the task. References to piece-rate pay. Arbitrary definitions of worker accuracy.

3.3.3 Interviews.
To reconstruct the knowledge that underlies the practices that constitute the data-production
dispositif, we turned to the experiences of those actors who interact with the instructions regularly. With this aim, we conducted a total of 55 interviews with data workers located in Venezuela and Argentina, BPO managers and founders, and ML practitioners who regularly outsource data-related
tasks. Table 3 shows a detailed overview of the interview partners, including their role within their
organizations, location, type of interview, and language.
Due to the restrictions related to the COVID-19 pandemic, the interviews with platform workers
were conducted online through video calls, while those with BPO workers were conducted in person
before the pandemic. The interviews with the data workers were conducted in Spanish, which is
the native language of the interviewers and the participants. We conducted the expert interviews
in English. While most platform workers had tried several platforms, they usually focused on one,
except for two workers who worked simultaneously for Tasksource and Workerhub. All interview
partners were asked to choose a code name or were pseudonymized post-hoc to preserve their
identity and that of related informants.
The goal of the interviews was to reveal practices and perceptions and obtain additional in-
formation about the organizational relations and structures that inform how data-related tasks
come about, how instructions are communicated, and how workers execute them. For instance,
hierarchical structures can have an essential effect on meaning-making practices as enacted through
the annotation instructions without being referred to explicitly or implicitly in the instruction
documents. The in-depth interviews with data workers include accounts of specific work situations
involving the interpretation of data. Moreover, they cover task descriptions, widespread routines,
practices, working conditions, lived experiences, and general views on their work and the local
labor market.
It is essential to mention that the differentiation between in-depth and expert interviews refers
to the interview method chosen for each situation and informant and was not based on informants’
occupational status or position. Our priority was engaging in in-depth conversations with data
workers to discuss and learn from their experiences in and beyond data work. Conversely, we used
the expert-interview method to conduct focused exchanges with actors that possessed a broad
overview of the machine learning pipeline. The expert interviews covered the topics of data work
and the relationship between BPO/platform and requesters.
Dispositif analysis allowed us enough flexibility to obtain valuable insights from the interviews by
combining inductive and deductive coding. Some of the topics that we identified through discourse
analysis in the instruction documents helped us build categories to code the interviews. This form
of deductive coding was oriented towards finding additional evidence for phenomena identified
in the instruction texts and understanding the contexts in which instructions are formulated and
carried out. In addition, there was room for inductive category formation so that several codes
could emerge directly from the interviews during coding. This approach helped us identify valuable
observations that would otherwise have been lost. Through this form of analysis, we aimed at identifying patterns. Those patterns were later contrasted with the elements identified in the instruction
texts and complemented with participant observations. The development of coding schemes for
the analysis and the coding process itself was carried out in iterations involving cross-coding
between both authors. The interview transcripts were analyzed in their original language (Spanish
or English). The excerpts included in Section 4 were translated by us when writing this paper
and only after the analysis phase. Our emergent understanding evolved throughout numerous
discussions and several iterations until reaching the set of findings that we present in Section 4.

Table 3. Overview of interview partners and interview characteristics

Organization | Interview method | Medium | Language | Informants

Workers:
Alamo (BPO in Argentina) | In-depth interview | In-person | Spanish | 10 data workers
Tasksource (Platform in Venezuela) | In-depth interview | Zoom | Spanish | 8 data workers
Workerhub (Platform in Venezuela) | In-depth interview | Zoom | Spanish | 19 data workers
Clickrating (Platform in Venezuela) | In-depth interview | Zoom | Spanish | 6 data workers

Managers:
Data processing company (Bulgaria) | Expert interview | Zoom | English | 1 company founder
Data processing company (Iraq) | Expert interview | Zoom | English | 1 general manager, 1 program manager
Data processing company (Kenya) | Expert interview | Zoom | English | 1 country manager
Data processing company (India) | Expert interview | Zoom | English | 1 director of ML services, 1 project manager

Requesters:
Computer vision company (Germany) | Expert interview | In-person | English | 1 data protection officer, 1 co-founder, 1 product manager, 1 CV engineer
Machine learning company (USA) | Expert interview | Zoom | English | 1 product engineer
Machine learning company (Spain) | Expert interview | Zoom | Spanish | 1 lead engineer
Computer vision company (Bulgaria) | Expert interview | Zoom | English | 1 co-founder, 1 CV engineer

3.3.4 Observations.
Through fieldwork at the BPO and the platforms, we were able to observe interactions among
data workers and between them and clients using in-person observation in the Argentinian case and
digital ethnography in the Venezuelan one [35]. Furthermore, we observed workers’ interactions
with crowdsourcing platforms and the software interfaces used to complete annotation tasks.
Special attention was paid to the interaction of workers with task instructions. It is important to
mention that the instruction documents underwent a twofold form of analysis: On the one hand we
analyzed instruction documents as texts through discourse analysis as described in Section 3.3.2.
On the other hand, we used the observations conducted to analyze these documents as artifacts or
materializations of the data-production dispositif. For the latter form of analysis, the focus was set
on documents’ function, provenance, and the interactions they allow or constrain.
The level of involvement regarding observations varied from shadowing to active participant
observations. In some cases, we had the opportunity to observe and try the interfaces and perform
data annotation tasks for several hours. All observations were recorded as jottings taken in real-time.
Those jottings comprised descriptions of briefings, meetings, tasks, documents, communication
channels, and interfaces, as well as their advantages and limitations. In parallel, reflections on
the researchers’ impressions and perceptions, including explicitly subjective interpretations, were
noted. Simple sketches and, when permitted, photos helped to complete the observations registered.
The information gathered in keywords or bullet points was later transformed into complete texts
and integrated into more consolidated field notes. In the analysis phase, we combined these field
notes with the interview transcripts and coded them following the steps described in the previous
subsection.

4 FINDINGS
The presentation of our findings comprises a descriptive subsection (4.1) and three analytical parts
(4.2, 4.3, and 4.4).
In 4.1, we present several examples of the different tasks carried out by data workers at each
one of our four fieldwork sites. We include details of how task instructions are formulated and communicated to workers and how workers follow or interrogate instructions. Through these descriptions, we seek to locate our analysis in specific settings with specific ways of doing things.

Table 4. Types of tasks in outsourced data work based on Tubaro et al. [81]

Task Type | Description | Examples (based on our fieldwork)
Data Generation | The collection of data from the worker's environment | “You can earn $2.5 by completing the task ‘Do you wear glasses?’ Upload a picture of a document with your prescription values now.”
Data Annotation | The classification of data according to a predefined set of labels | “Based on the text in each task, select one of these three options: Sexually Explicit, Suggestive, Non-Sexual.”
Algorithmic Verification | The evaluation of algorithmic outputs | “You'll be shown two lists of up to eight search suggestions each. Your task is to indicate which list suggestion is better.”
AI Impersonation | The impersonation of an artificial agent | As the assistant, the “user will initiate the conversation. . . you need to use the facts to answer the user's question.”

Next, we move into dissecting the described tasks, instructions, and practices while outlining
specific characteristics of the data-production dispositif. In 4.2, 4.3, and 4.4, we center the findings
of the dispositif analysis around our three research questions. Following RQ1, we describe discursive
practices such as those involved in the taxonomies used to collect and classify data and the warnings
and threats included in them. Following RQ2, we describe non-discursive practices and social contexts
such as the obedience to instructions, the dependence of Latin American workers on precarious
work, and moments of interrogation and solidarity among workers. Finally, and following RQ3, we
describe some of the dispositif’s materializations, such as documents, work interfaces, and tools to
measure workers’ performance and surveil them.

4.1 Different Tasks, Different Instructions


Before moving towards answering our three research questions, we will describe in this section the
different tasks available in data work and explored in this study. We use the framework proposed
by Tubaro et al. [81] to differentiate the tasks (see Table 4). While this framework was initially
conceived to analyze digital platform labor, we think it can also be applied to BPOs in the broader
field of data work due to the similar types of tasks available.
Tubaro et al. define three moments in outsourced AI production: “artificial intelligence prepara-
tion,” “artificial intelligence verification,” and “artificial intelligence impersonation.” The authors
divide AI preparation into the collection of data and its annotation. AI verification involves the
evaluation of algorithmic outputs. Finally, AI impersonation, often seen in the corporate and AI-as-
a-service sector [59], refers to the non-disclosed “‘human-in-the-loop’ principle that makes workers
hardly distinguishable from algorithms” [81].
4.1.1 Data Generation.
Platform workers are directed to collect data from websites or to produce media content (e.g.,
text, images, audio, and video) from their devices. For example, a task on Clickrating instructed
workers to find information online from companies in the United States, including their address and
telephone number. Workerhub required workers to take photos of themselves in certain poses or
pictures of family members (including children) and enter attributes of the subjects in these images,
including their age and gender. While tasks involving data collection from the web were paid a few
cents per assignment, those that involved capturing photos, video, and audio were compensated
with a few dollars per file generated. Interviewed workers found the latter type of task attractive
because they were the best remunerated. In this case, financial need overcame any privacy concern.

At the BPO, one of the data generation projects produced an image dataset to train a computer
vision algorithm capable of identifying fake ID documents. For this purpose, workers were instructed
to use their IDs and those of their family members. They took several pictures of the documents
and used some of those pictures to create different variations of the ID document (changing the
name, the address, the headshot). This way, they produced imagery of authentic as well as fake IDs.
The requester of this task was a large e-commerce corporation and Alamo’s most important client.
The task is just one of many data-related projects Alamo has conducted for this corporation in
the last four years. Because of their ongoing service relationship, the client invested considerable
time and money in training Alamo’s workers for each project. One particular characteristic of this
relationship is that instruction documents are not created by the client unilaterally but co-drafted
with Alamo’s project managers and team leaders. Managers and leaders then serve as support to
answer data workers’ questions, should they arise after reading the instructions or while completing
the tasks. Because of the sensitivity of the data involved in the ID project, a special interface with
several security measures was created to work on the images and store them. Furthermore, all
workers had to sign a consent form, allowing the use of their ID document. The request to use their
ID and those of family members caused some unease and raised several questions among workers,
as one of Alamo’s managers reported. Then, it was the managers’ job to “convince them that their
IDs were not going to be used for anything bad, no crime or something. And to do that without
revealing too much of the client’s idea because we had signed an NDA.”

4.1.2 Data Annotation.


The most common task in data work is data annotation. Projects of this kind were available on all of the studied platforms and at the BPO.
As platforms specializing in computer vision projects, Tasksource and Workerhub provided many image segmentation and classification tasks. Some of these tasks included the classification and labeling of images of people according to gender, race, or age categories. Since segmentation tasks required around one hour depending on the size of the image and the number of labels, these tasks were paid more than classifying entire pictures according to a set of categories. Most of
the segmentation tasks were designed for training self-driving cars and devices that are part of
the internet of things. At the same time, image classification has multiple uses, including content
moderation (notably for hate speech and sexual content), healthcare, facial recognition, retail, and
marketing. For instance, the same BPO workers that collected pictures of ID documents were
subsequently asked to classify and label them as “authentic” or “fake.” In addition, workers had
to segment the “fake” ID documents and mark the part of the image they had modified. Finally,
they had to annotate the type of modification the image had undergone (e.g., “address has been
modified” or “headshot is fake”).
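To make the shape of such annotation outputs more concrete, the sketch below shows one possible way a single record from this kind of ID-labeling task could be represented. It is purely illustrative and rests on our own assumptions: the field names, label values, and coordinates are hypothetical and do not reproduce the schema or interface used by Alamo or its client.

```python
# A minimal, hypothetical sketch (not the client's actual schema): one possible way
# to represent a single record produced in the ID-labeling task described above.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Region:
    """Polygon, in pixel coordinates, marking the manipulated part of a 'fake' ID."""
    points: List[Tuple[int, int]]
    modification_type: str            # e.g., "address modified", "headshot is fake"

@dataclass
class IDAnnotation:
    image_id: str
    label: str                                            # "authentic" or "fake"
    regions: List[Region] = field(default_factory=list)   # empty for authentic IDs
    annotator_id: Optional[str] = None                    # often logged for quality control

# Example: a document whose headshot was replaced during data generation
record = IDAnnotation(
    image_id="img_00042",
    label="fake",
    regions=[Region(points=[(120, 80), (320, 80), (320, 140), (120, 140)],
                    modification_type="headshot is fake")],
)
print(record.label, [r.modification_type for r in record.regions])
```

Even in such a minimal structure, the requester's prior choices (which labels exist and which modification types are recognized) determine what workers can and cannot express about an image.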
Outside of computer vision, workers on Workerhub were asked to identify hate speech and sexual
content in text, notably for social media. For example, in an assignment titled “Identify Racism,”
workers were asked to read social media posts and identify whether or not the content included
racism or if this judgment was not possible. In another task titled “$exxybabe69,” workers had to
judge if usernames included examples of child exploitation, general sexual content, or none of the
previous categories. Video and audio annotation were also present on this platform. For example, in the task “How dirty is this,” workers had to identify if there was sexual or “racy” content in different media types, including audio files and videos. In Section 4.2, we will revisit some of these
tasks to describe how workers navigated the different taxonomies they encountered to perform
data annotation tasks.

4.1.3 Algorithmic Verification.

Algorithmic verification involves the assessment of algorithmic outputs by workers. This type of
task was observed primarily on the annotation platform Tasker and accessed through Clickrating.
Tasker is the internal platform of a major technology company that develops a search engine.
They use Clickrating to recruit their workers who, depending on the project, have to sign special
contracts and non-disclosure agreements. Algorithmic verification tasks, for example, include
assessing how the search engine has responded to a user query, the objects that accompany a search
result (e.g., images, maps, addresses), or whether the search result contains adult content or not. In
many cases, these assessments include comparing search results with a competitor search engine
and assessing which one is more accurate and substantial.
In another example of algorithmic verification, one of the tasks conducted by the BPO Alamo for
its largest client (the same e-commerce corporation behind the “ID project”) consisted in verifying
the outputs of a model used by the client to moderate user-generated content in their marketplaces.
In this case, the task consisted of reviewing content flagged as inappropriate by an algorithm and
confirming or correcting the output. For this purpose, the client had provided handbooks that
contained each marketplace’s terms and conditions and examples of the specific forms a violation
could take. For workers, this task often involved being exposed to disturbing images and violent
language, which several interview partners described as “tough.”

4.1.4 AI Impersonation.
Impersonation is the rarest type of task, and it was only observed once, on the platform Clickrating. Tubaro et al. [81] describe it as follows: “whenever an algorithm cannot autonomously bring an activity to completion, it hands control over to a human operator.” The task that we encountered, developed by a major social media company, asked workers to dialogue with users and respond to their queries according to a set of predefined “facts, history, and characteristics.” If the worker could not answer the user's query, they were asked to say, “Sorry, I don't know about that, but can I tell you about...” and then they had to “insert fact related that may be of interest to user.” The platform instructed workers to complete dialogues in the least amount of time, and they had to be logged into the platform for specific 3-hour sessions.
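Read as a procedure, the instruction above amounts to a hand-executed fallback routine. The sketch below expresses that routine as code only to illustrate how the task positions the worker as an interchangeable component of an automated agent; the fact base, topics, and exact wording are our own assumptions, not the platform's actual script.

```python
# Hypothetical sketch of the scripted fallback logic workers were asked to follow.
# The facts and phrasing are illustrative assumptions, not the platform's real script.
FACTS = {
    "opening hours": "The venue opens at 9 am and closes at 8 pm.",
    "tickets": "Tickets can be booked online or at the entrance.",
}

def scripted_reply(user_query: str) -> str:
    """Return a matching fact if one exists; otherwise use the prescribed fallback line."""
    for topic, fact in FACTS.items():
        if topic in user_query.lower():
            return fact
    # Fallback phrasing paraphrased from the instructions quoted above:
    some_fact = next(iter(FACTS.values()))
    return f"Sorry, I don't know about that, but can I tell you about this? {some_fact}"

print(scripted_reply("What are your opening hours?"))
print(scripted_reply("Do you allow pets?"))
```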

4.2 Linguistically-Performed Elements


RQ1: What discourses are present in task instructions provided to outsourced data workers?
To explore our first research question, we analyze the task instructions as text. Our analysis
focuses on the categories and classes used for collecting, sorting, and labeling data as contained in
the task instructions. We describe three recurrent elements: (1) the normalization of conventions
from the Global North and oriented towards profit maximization, (2) the use of binary classifications
and the inclusion of residual categories such as “other” or “ambiguous,” and (3) the discursive
elements that aim at constraining data workers’ agency in the performance of data-related tasks.

4.2.1 Normalized Classifications.


Taxonomies are the main component of task instructions for data work. They consist of classifi-
cation systems comprising categories and classes used to collect, sort, and label data. Definitions,
examples, and counterexamples usually accompany taxonomies. The number of classes depends
on the task and varies according to each platform or company. For instance, assignments on the platform Workerhub usually present a smaller set of labels, ranging from two to a dozen at most. In contrast, jobs on Tasksource usually feature dozens of classes grouped into several categories (e.g., for the semantic segmentation of a road, the category “cars” included labels like “police car”
or “ambulance”). The taxonomies that we observed in instructions carry self-evident meanings to
the clients but are not necessarily relevant to the annotators or communities affected by the ML
system. For instance, the label “foreign language” refers to languages other than English, and the
category “mail truck” only comprises examples of USPS vehicles.
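To illustrate how requester-defined taxonomies of this kind operate as closed classification systems, the short sketch below encodes a hypothetical version of one. The categories, labels, and the validate_label helper are our own illustrative assumptions rather than an observed artifact; the point is simply that whatever the requester leaves out of the schema cannot be expressed by the worker.

```python
# Hypothetical sketch of a requester-defined label taxonomy and how it is enforced.
# Categories and labels are illustrative assumptions, not an actual client taxonomy.
road_taxonomy = {
    "cars": ["sedan", "police car", "ambulance", "taxi"],
    "trucks": ["mail truck", "delivery truck"],        # "mail truck" implicitly meaning USPS vehicles
    "text on signs": ["english", "foreign language"],  # "foreign" defined as anything non-English
}

def validate_label(category: str, label: str) -> bool:
    """Workers may only use labels the requester predefined; anything else counts as an error."""
    return label in road_taxonomy.get(category, [])

# A locally meaningful term that the requester did not anticipate is simply invalid:
print(validate_label("cars", "remis"))       # False: an Argentine hire car has no place in the schema
print(validate_label("cars", "police car"))  # True
```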
One of the projects conducted at the BPO Alamo consisted in analyzing video footage of a road. A particularity of this project was that the requester did not predefine the labels; instead, workers were asked to come up with mutually exclusive classes to label vehicles in Spanish. One of the main difficulties of the project, however, lay in the nuances of the Spanish language: probably aiming at a broader market, the requester wanted the labels to be formulated in español neutro (“neutral Spanish”), i.e., without the idioms that characterize how Argentines and most Alamo workers speak. This contrast led to many instances of discussion among workers and managers about which vehicle designations the client would consider “neutral.”
The mismatch between the classifications that requesters and outsourced data workers consider
self-evident becomes critical in cases of social classification. For instance, in the task shown in
Example 1, workers were asked to label individuals’ faces for facial recognition according to
predefined racial groups that included “White, African American, Latinx or Hispanic, Asian, Indian,
Ambiguous,” where the last category should be selected “ONLY if you cannot identify the RACE of
the person in the image.”

Example 1

In this task you will be determining the race of the persons in the images.
You should select only one of the following categories:
• White
• African American
• Latinx or Hispanic
• Asian
• Indian
• Ambiguous

Beyond the already problematic situation of being asked to “guess someone’s race,” for many
interviewed Latin American workers, this type of classification did not make sense, as it was
conceived by a US company with a US-centric conception of racial classification, i.e., with little
regard for the cultural and racial complexities in Latin America. Similarly, Gonzalo, a data worker
for Workerhub, declared having trouble with a task that asked him to label “hateful speech” in
social media posts:
They give you many examples of what they consider “hateful” and not. But, once you’re
doing the task, you don’t encounter basic examples, and it’s up to you as a worker to
interpret the context and decide what counts as “hate.” Clearly, [the requesters] have
their parameters and, if they don’t consider something hateful, they will mark your
work as wrong. . . . For example, a sentence like “kick the latinos out” would not be
regarded as “hateful,” and you will not interpret it the same way as a Latino.
These examples of social classification and conceptualization are not just about cultural differ-
ences between requesters and data workers, but they reflect the prevalence of worldviews dictated
by requesters and considered self-evident to them. Furthermore, the taxonomies respond to the
commercial application of the product that will be trained on the data that outsourced workers
produce. For example, the instructions to categorize individuals for facial recognition technologies
in Example 1 are based on a US-centric definition of “protected group.” Such definitions suggest
requesters' efforts to prioritize the mitigation of legal risks, neglecting the safety of users from
social minorities and groups facing discrimination in other contexts, such as ethnic groups with
defined caste systems or social categories not protected by US law, such as economic class.
In another example that shows the inscription of requesters' profit orientation, workers completing tasks for a major social media app through Clickrating were instructed to evaluate whether a post was “building awareness.” Here, “awareness” referred to posts with commercial content and excluded anything personal or political in nature (see Example 2).
Example 2

Building awareness means that the purpose of the [post] is to give information about a brand,
product, or other [sic] object. For example, “@Restaurant is awesome for karaoke and the food is
delicious!” is building awareness. A story is not building awareness if it’s primarily about the author’s
life. A story that mentions the name of a business or service without providing much additional
context, or a story that refers to a product in passing while its author is sharing one of their experiences,
is not building awareness. For example, if the author says, “I’m eating in @restaurant”, the story is
not building awareness for the restaurant.

Requesters’ profit orientation is implicit in task instructions and gets inscribed in the ways
data workers approach their tasks. For instance, when asked what the procedure would be if they
were unsure about annotation instructions, most of the workers at the BPO answered that they
would abide by the requesters’ opinion because “their interpretation is usually the one that makes
more sense as they know exactly what kind of system they are developing and how they plan to
commercialize it.”
4.2.2 Binary and Residual Categories.
While complex taxonomies are common at the BPO, most classification tasks on platforms are
binary. For example, for tasks described as “Does this link title use sensationalist
phrasing or tactics?,” “Who is the author of this story?,” and “You will be looking at a bounding box
and choosing whether it is around a primary face,” the possible labels were “yes or no,” “human or
non-human,” and “yes or no,” respectively. As mentioned above, many tasks, especially on Workerhub,
rely on categories protected under United States legislation when defining what counts as
hate speech, racism, and other forms of discrimination (see Example 3). However, the classification
of whether a text contains hate speech is often reduced to a binary decision without considering
the context.
Example 3

In this task, you will be identifying messages that contain hate speech.
Based on the text, you must select:
• Hate Speech: if the username contains hateful content
• None: if there is NO hateful or abusive language in the given [sic]
Definition:
Select Hate Speech if the text contains any of the following:
• Discrimination, disparagement, negativity, or violence against a person or group based on a
protected attribute
• References to hate groups, or groups that attack people based on a protected attribute

This form of binary classification often ignores ambiguity or uncertainty, such as when workers
are confronted with contexts that are ambiguous or removed from their cultural setting. Moreover,
since platform workers cannot send feedback to requesters, many omissions (involuntary or not)
remain unquestioned. For instance, a task on Workerhub asked workers to
classify images as “racist” or not; we observed an image representing several copies of the “crying
Wojak” meme wearing kippahs inside a heating oven while the meme “Pepe the Frog” is watching
outside. The text above reads: “Changed the wooden doors today frens [sic], this is actually working
as intended now!” While this image was, for us, a clear example of antisemitism, a form of racism
condemned worldwide, including by the United Nations [83], requesters instructed workers not to
consider this type of hateful content as “racist.”
In several of the cases where binary categories were prescribed, we encountered a third label,
usually called “other” or “ambiguous,” not to designate an additional class that would break the
binary classification but to merge errors or instances where the worker cannot apply one of the
two labels (see Example 4). Workers are also encouraged to ignore ambiguity altogether. Some
Clickrating tasks acknowledge the limits of this binary classification and urge workers to ignore
other possible attributes when categorizing data. For example, in an assignment where workers
tagged queries for a search engine, the instructions referred to “Queries with Multiple Meanings”
for “queries [that] have more than one meaning. For example, the query [apple], in the United
States might refer to the computer brand, the fruit, or the music company.” In this case, the task
instructed workers to “keep in mind only the dominant or common interpretations unless there is a
compelling reason to consider a minor interpretation.” The “dominant” or “common” interpretation
of a term in the US may be different to the one in Latin America. Still, we repeatedly encountered
similar instructions to deal with instances of ambiguity in many tasks.

Example 4

Overview
Select:
• Male: if the boxed face is a male
• Female: if the boxed face is a female
• Other: if there is no face in the box

Very often, instruction documents were outdated, and requesters provided several updates. For
instance, in the task “How dirty is this image/video” on the platform Workerhub, workers were
initially instructed to label adult content in images. Later on, the requesters provided videos
without updating the documents, confusing some workers because they could not apply the original
instructions to the video footage. As a result, the requesters had to update the instructions to include
details about video annotation. In another example, a document that asked workers to classify
elements in a road revised its definition of “Pedestrians sitting on the ground” to include also those
“laying [sic] on benches and laying [sic] or sitting on the ground” (see Example 5). This example
also shows that the “other” category described above does not function as a catch-all for “everything
else” in the classification: some elements are deliberately or accidentally left out. In
this latter example, the requesters failed to see that people in the streets are not necessarily always
“walking” or “sitting,” but that a segment of the population lies or sleeps on them.


Example 5

Pedestrians sitting on the ground


Use the “pedestrian” label if a pedestrian is sitting on the ground, bench, ledge, then use the “pedestrian”
label.
UPDATE!!
Use the “PEDESTRIAN LYING DOWN” label for pedestrians laying [sic] on benches and laying
[sic] or sitting on the ground.

4.2.3 Warnings.
As we will argue in the following sections, the influence and preferences of powerful actors
in data work are stabilized through narrow task instructions, specially tailored work interfaces,
managers and quality assurance analysts in BPOs, and algorithms in crowdsourcing platforms.
Some of these processes are part of the tacit knowledge workers have about their position (i.e., it
goes without saying that workers must carry out tasks according to the preferences of requesters).
However, task instructions often make explicit reference to the power differentials between workers
and requesters as they include threatening warnings such as “low quality responses will be banned
and not paid” or “accurate responses are required. Otherwise you will be banned” (see Examples
6–8).
Platform workers risk being banned from tasks and even expelled from the platform if they fail to
follow task instructions. At the BPO Alamo, the communication of instructions is mediated by project
managers and team leaders. For this reason, such warnings are not explicit in written documents
but surface in review meetings and evaluations of workers’ performance. Here, any concerns
expressed at the workers’ end are filtered through hierarchical managerial structures and hardly
ever reach requesters.

Example 6

This is a high paying job, a special job, but to gain access to it and to keep access to it after passing
the qualification test, we require patience and VERY careful [sic] thought out and accurate
responses.
Otherwise, you will, unfortunately be banned from the job :(

Example 7

The picture above shows SHIFTING DATA which means that the LiDAR points for stationary objects
move or slide around throughout the scene.
Any [Project] Tasks with shifting data are not usable by the customer & have to be cancelled.
You will NOT get paid for working on task [sic] with shifting data!!!!!!!!!!
Every time you get a [Project] Task (before you start working) always turn on dense APC and look
around the entire scene to check for shifting data. You will be able to tell that the shift is big enough
to be cancelled if it makes any object 0.3m plus larger than it’s [sic] normal size (or if it makes a flat
wall 0.3m plus thick) and effects [sic] multiple cuboids.


Example 8

We value your individual opinion and review each result, so please provide us with your best work
possible. We understand that this can be a tiring task, so if you are in any way unable to perform
your best work, please stop and come back once you are refreshed. You may also see multiple queries
with the same kind of visual treatment.
Please keep your judgments consistent UNLESS you feel that there is some difference in the two that
would result in a change of overall score.
Judges providing low quality responses will be banned and not paid.

These messages encode a precise definition of “accurate responses”: accuracy is determined
according to what the client believes to be an accurate truth value, while divergence from that value
is considered inaccurate. Here, too, the classifications that make sense to requesters prevail.
This is why workers at the Argentine BPO are permanently encouraged by management to think
in terms of “what the client might want and what would bring more value to them.”
Given the social and economic contexts in which the outsourcing of data work occurs, warnings
and threats of being banned or fired reinforce the hierarchical structure of the annotation process
and compel workers to follow the norms as instructed or risk losing their livelihood. In the next
section, we will present evidence of how the social contexts of workers and the fear of losing their
job shape how assignments are carried out.

4.3 Non-Linguistic Practices and Social Context


RQ2: How do outsourced data workers, managers, and requesters interact with each other and instructions to produce data?
This section explores our second research question. Here, we focus on analyzing interviews to
describe the contexts in which data-related tasks are carried out and the interactions they enable.
We describe (1) the social contexts of Latin American workers that lead to their dependence on
data work regardless of the labor conditions, (2) the elements that contribute to the unquestioning
obedience to instructions, and (3) moments of subversion of rules as well as workers’ organization
and solidarity.
4.3.1 Poverty and Dependence.
Being an impact sourcing company, Alamo employs around 400 young workers who live in
slums in and around Buenos Aires. As stated on its website, Alamo specifically recruits workers
from poor areas as part of its mission. As Natalia, one of the BPO’s project managers, describes,
this is a population that does not receive many opportunities in the Argentine labor market:
They are very young, and a bit, you know. . . Alamo works with people another company
wouldn’t hire, so people who live in areas. . . slums with difficulties, with a very low
socioeconomic level. That’s something the company pays attention to when it comes
to recruiting, and if during the interview we detect that the candidate could have an
opportunity somewhere else, we prefer not to hire that person and hire someone else.
One particularity of Alamo is that it provides workers with a regular part- or full-time salary.
This form of employment contrasts with the widespread piece-wage model on platforms. The salary
Alamo’s workers received in 2019 was the equivalent of 1.70 US dollars per hour, which was the
minimum legal wage in Argentina. Despite the low wages and exhausting tasks such as semantic
segmentation or content moderation, all interviewees were satisfied because the company offers better
conditions than their previous jobs. According to a report published in November 2020 by the
national Ministry of Production [76], the unemployment rate in Argentina is 10%, and 35% of the
employed labor force is not registered. Argentina has a long tradition of undeclared labor. This
way, employers avoid paying taxes while workers remain without protection or benefits.
Behind the numbers are people like Nati, who did different types of precarious work before
working for Alamo. She started at the BPO as a data annotator and quickly became a reviewer before
being offered a position as an analyst in the company’s QA department. Like other Alamo workers,
she acknowledges the difficulty of securing a desk job somewhere else. Moreover, many of our
research participants mentioned being proud of the work they do at Alamo because a desk job has
“a different status.” For several of them, working at Alamo means finally having a steady income
and breaking with generations of informal gigs, for example, in the cleaning or construction sectors.
As Nati explains, what Alamo offers is better than the alternatives:
That was the situation at home; we were going through a rough time. My mother was
out of work because her former boss had found someone else to clean, and I had lost
my job too. So I needed a job and when I found this one I was surprised to work at a
friendly place for a change! Now I have a desk, a future, and I feel appreciated. This is
new to me.
The platforms, for their part, have thrived in the Venezuelan economy, which is characterized
by the highest levels of inflation in the world, with an average of 3,000% in 2020 [1]. All participants
that we interviewed from Venezuela stated that the “situación país” [country’s situation] was the
main reason they resorted to online work. Workers reported difficulty finding employment in the
local labor market, especially for income that is not dependent on the national currency, the bolivar,
which devalues quickly. For example, Rodrigo, a Clickrating worker, quit his job as an information
technology consultant because online platforms were the only way he could earn US dollars. He
explains the monetary situation of his country as follows:
There are two types of currency exchange rates: the official rate dictated by the govern-
ment and the one used on the black market, which everyone uses. Everyone knows this
black-market exchange rate. It’s an Instagram profile that posts the average exchange
rates of several independent currency exchange websites. They make this average and
post the fluctuation several times per day, which is the exchange rate that we use today.
Platforms’ low entry barriers make outsourced data work an attractive — and sometimes the
only — source of employment during social, economic, and political crises. Data workers earn 15 to
60 US dollars per week, the average being around 20 dollars, which is substantially higher than the
minimum wage in Venezuela reported by workers to be around 1 US dollar per month in March
2021.
Dependency on the platforms is exacerbated by the high unemployment levels and reduced
government support during the COVID-19 pandemic. In this situation, workers have limited access
to subsidies and pay for services such as healthcare from their income [65]. For example, Olivia, one
of the Tasksource workers, was diagnosed with diabetes and has to self-fund the costs of insulin.
For her, losing access to the platform is a life-threatening situation:
Imagine, with this pandemic, what can I do? My medical situation does not allow me to
go outside and risk getting the [coronavirus disease]. If I get it, I’ll die. For this reason,
I cannot take the risk and expose myself to something worse; I can’t risk this job either
because this is my only source of income.
This dependency affects the labor process as well. Workers usually do not choose which tasks to
perform, even if they ethically disagree with their assignments. When asked what criteria she uses
to choose one task over another, Carolina, a Clickrating worker, answered:


My priority is to get the tasks that pay the best. But I don’t even have that choice. The
platform restricts which jobs are available here in Venezuela, so I have to make the
most of it to earn the minimum and get paid as soon as I get one task.
By “minimum,” Carolina refers to the minimum income workers can transfer out of the platforms,
which is another form of creating dependency. Platforms establish a minimum of 5 to 12 US dollars
before they make payments and, if a worker does not reach this threshold, they have to wait another
week before withdrawing their earnings. This payment process is a form of institutionalized wage
theft implemented by the platforms. Workers lose the money they have worked for if they get
banned before reaching the threshold for payment.

4.3.2 Obedience to Instructions.


In its outsourcing capacity, Alamo focuses on data-related services ordered mainly by machine
learning companies. Even if they display some similarities, each of those projects differs from
the previous ones, and workers need to be briefed regularly. Depending on the difficulty and
scope of the task, briefings can be more or less sophisticated and involve more or fewer actors,
meetings, and processes. Sometimes, the instructions for new projects are sent by the requester
via email in a PDF document. One of the area managers receives that information and transmits
it to a project manager, who then puts together a team and works closely with the team leader.
Depending on the degree of difficulty, one or more meetings with the team will be held to explain
the project, answer questions, and supervise the first steps. When handling large projects from
multinational organizations, Alamo invests a considerable amount of resources in the briefings. No
matter how big or small the requester, briefings at Alamo consist of getting the workers acquainted
with the expectations of the requesters and are a way of making sure workers are on the same page
and thinking similarly:
The information from the client usually reaches the team leader or the project manager
first, and, at that moment, what we do is to have a meeting for criteria alignment. . .
that is generally what we do. The team meets to touch base and see that we all think in
the same way. (Quality assurance analyst with Alamo)
These briefings give workers a framework for new projects and are instrumentalized by the
company as the first instance of control, aiming at reducing room for subjectivity. Further control
instances, aiming to ensure that data work is done uniformly and according to requesters’ expecta-
tions, take place in numerous iterations where reviewers and team leaders review and revise data
and go back to the instruction documents or contact the requester to clarify inconsistencies.
In companies like Alamo, data quality means producing data precisely according to the re-
quester’s expectations. According to Eva, a BPO manager in Bulgaria, this view on data quality is
commonplace in data services companies. In the following excerpt, she summarizes the importance
and main function of instructions and further instances of control, i.e., making sure that the workers
interpret the data homogeneously:
Normally, issues in data labeling do not come so much from being lazy or not doing your
work that well. They come from a lack of understanding of the specific requirements
for the task or maybe different interpretations because a lot of the things, two people
can interpret differently, so it’s very important to share consistency and, like, having
everyone understand the images or the data in the same way.
The interviews we conducted with requesters show that the priority behind the formulation
of task instructions is producing data that fits the requester’s machine learning product and the
business plan envisioned for that product. What does not match the requester’s instructions is
considered low-quality data or “noise.” Dean is a machine learning engineer working for a computer
vision company in Germany. He reported on this widespread view as follows:
Dean: Noise is what doesn’t fit your guidelines.
Interviewer: And where do those guidelines come from?
Dean: We say, “actually we want to do this, we want to do that,” and then, of course,
since the client is the king, we translate that business requirement into something
like. . . into a requirement in terms of labels, what kind of data we need.
As described in Section 4.2.3, compliance with requesters’ views is made explicit in instruction
documents in the form of warnings for workers. Those documents are usually the only source
of information and training platform workers have to complete their tasks. However, the case of
the platform Tasksource is slightly different, as it employs Latin American coaches to brief and
explain to workers how to interpret instructions and annotate tasks. This approach is similar to the
one used by the BPO Alamo and described above. However, at Tasksource, briefings take place in
week-long unpaid digital courses called “boot camps” and later evaluation periods called “in-house.”
The use of the military and correctional term “boot camp” could be interpreted as reflecting this
training’s purpose: conditioning workers to carry out tasks without question. Ironically, even though
the platform employed workers to help train artificial agents, they were supposed to behave like
“robots,” according to a Tasksource worker named Cecilia:
When you start, they tell you: “To be successful in this job, you have to think like a
machine and not like a human.” After that, they explain to you why it has to be like
that. For example, you are teaching a [self-driving] car how it has to behave. When
you segment an image, there is a police car, and you label it like a regular car, the
[self-driving] car will think it’s a regular car and, if it crashes against it, something
terrible can happen. The mistake was not of the car that crashed into the police vehicle,
but it’s yours as a tasker, as a worker, who taught the car to behave like that.
Platform workers serve a similar role to BPO employees in reinforcing the primacy of instructions
and requester intent to complete tasks effectively, producing data that fits the model and the revenue
plan while shifting the responsibility for failures onto workers. In this context, obedience to instructions
is critical for data workers to keep their job and make a living. The fear of being fired, banned from
the job, or not being paid for the task reinforces workers’ disposition to be compliant,
even when instructions look arbitrary. This is what Rodolfo, a Tasksource worker, reported:
That is why I don’t like that platform very much. Because they give us the instructions
and we have to follow. And there are many cases where, if you don’t complete the task
really to perfection, according to what they want or what they think is right, they just
expel you. Just like that, even if you followed the instructions thoroughly.

4.3.3 Worker Solidarity and Organization.


Not everything is imposition and obedience in data work. There are also several expressions of
workers organizing to improve working conditions and help each other deal with tasks and make
the most out of them.
For instance, Alamo’s employment model, which includes data workers as part of the company’s
permanent staff instead of hiring them as contractors, results from workers organizing to demand
a fixed salary and benefits. As reported by one of Alamo’s workers, Elisabeth, in 2019,
further workers’ demands were being negotiated with the company:
We asked for a couple of things like the possibility of home office and a better healthcare
plan. We are organizing many things. It’s being negotiated.


In 2020, probably also motivated by the COVID-19 pandemic, Alamo’s data workers were
finally allowed to work remotely. It is worth mentioning that, before 2020, every other department
of the company, including management, was allowed to work remotely at least some days of the week,
while the data workers could not.
In the case of the platforms, data workers organize in virtual groups and fora. The existence
of virtual and local groups of workers that provide solidarity and support has been reported in
other examples of platform labor [16, 67, 88]. In the case of Venezuelan data workers, we observed
similar situations. Because we used convenience and snowball sampling and worker groups on
social media as a starting point, all the interviewed participants were directly associated with
them. Participants use these independent and worker-led spaces to exchange information about
which tasks pay more and are less challenging to complete and warn each other about non-reliable
requesters. One of the aspects that workers paid significant attention to was the presence of bugs
in the tasks. When asked about their existence, Yolima, a worker with the platform Tasksource,
said to us:
Errors occur all the time. But, since we are in groups on Facebook and Whatsapp, we
alert each other and say, “Hey, don’t do this task because it has a bug. It will flag you
as mistaken even if you have done everything ok.”
Some smaller groups, with high entry barriers to ensure privacy and trustworthiness among
members, recommend specific tasks over others. For example, when describing tasks with sexual
or violent content, Estefanía, one of Clickrating’s workers, stated:
I don’t like those tasks with pornographic content. I do them only when my friends
from the groups say, “look, this is a good task, here’s the link.” I don’t have to look for
good tasks, and that’s great. I just have to log into my account and do the annotation
without worrying about which tasks to do.
Some users of these smaller groups also craft guides to explain the instructions to their peers.
Most interviewed workers stated that their knowledge of the English language was limited. Since
Tasksource and Clickrating only presented instructions in that language, and Workerhub provided
automated translations with errors, these guides in Spanish are a fundamental tool for workers. They
are written by workers for their peers and contain Spanish translations of taxonomies, definitions,
and examples. They also provide further explanations about the contexts in which workers can
apply the taxonomies, avoid being banned by the algorithm, and maintain high accuracy scores.
For example, in the introduction of a guide for a task to annotate hate speech in text for Workerhub,
a user wrote:
Example 9

IMPORTANT INFORMATION
What I’m sharing in this guide is based on my experience with the task. I’ll try to explain as best as I
can the tips that I consider are the most important to avoid being banned and the essential information
to understand the task.
BE CAREFUL
The task “No Hatred” is not available on all accounts. You must have been paid AT LEAST ONCE.
IT’S IMPORTANT THAT YOU CONSIDER THIS GUIDE FOR WHAT IT IS: A “GUIDE” made for you
to understand the task better. You must earn real experience by doing the task with perseverance and
dedication.

Work practices in the data-production dispositif are not informed exclusively by the relationships
between requesters, intermediaries (platform or BPO), and individual workers. They are also
dependent on the networks formed by the latter group. This can be observed in BPOs where data
workers share the same office space and constantly consult and advise each other on conducting
projects more quickly and easily. Among platform workers, online groups help to choose what
tasks to carry out, and such decisions are influenced by recommendations and guides from peers
who evaluate instructions from requesters.

4.4 Dispositif’s Materializations


RQ3: What artifacts support the observance of instructions, and what kind of work do they perform?
In this section, we focus on the third research question. Based on the observations conducted
at the crowdsourcing platforms and the BPO company, we present three of the many possible
materializations of the data-production dispositif: (1) the function of diverse types of documents
that embody the dispositif’s discourses, (2) the platforms and interfaces that guide and constrain
data work, and (3) the tools used by managers and platforms to surveil workers and quantify their
performance.

4.4.1 Documents as Artifacts.


In Section 4.2, we focused on the content of instruction documents to describe the discursive
elements comprised in them. Here, instead, we look into a variety of documents — instructions
included — to analyze them as artifacts, focusing on their form, function, and type of work they
perform.
One common document related to data work at BPOs is that containing metadata and project
documentation. Alamo, for instance, records the details of each project in several documents that
vary in form and purpose according to the task and the requester. Often, that documentation aims
to preserve the evolution of task instructions, registering changes requested by clients. Keeping
this type of documentation functions as a form of “insurance” for Alamo and can help resolve
discrepancies if requesters are not satisfied with the service provided. In those cases, project
documentation serves as proof that data was produced as instructed. Documents containing project
details can also serve the purpose of preserving situated and contingent knowledge that would
otherwise get lost and could help improve future work practices [53]. Sometimes, these documents
become artifacts that cross Alamo’s boundaries and reach the requesters. For them, the documents
might have a factual function (in terms of the information they want or need) or a symbolic one (to
reassure clients that Alamo is at their disposal). Alamo’s QA analyst Nati describes this as follows:

We send a monthly report to the clients, including what was done and problems we
encountered; we set objectives for the following month and send an overview of the
metrics. Some clients don’t even look at the report but insist on receiving it every
month. Others value it and use the information to report to leadership or investors
within their organization.

The documents produced by the BPO are tailored to be valuable for requesters. Conversely, the
documents formulated by requesters often remain unintelligible for data workers, even if they are
the primary addressees, as in the case of instruction documents. In many cases, language is the
main issue hindering the intelligibility of documents: Most of the workers we interviewed have
limited knowledge of the English language and reported using translation services, notably Google
Translate, to understand the instructions provided by requesters. As mentioned in the previous
section, one of the main reasons platform workers resort to guides written by peers is that they are
written in Spanish. But beyond language differences, elements of the taxonomies used in documents
can also be confusing, as explained by Tasksource’s worker Yolima and described in Section 4.2.1:


For the [categories], they are made in the United States, I think. I don’t know what
they would call a laundry sink1 , a shower, or parts of the bathroom. Most of the time,
my mistakes were with parts of the bathroom, especially around the shower, the tap,
and those things. That was confusing because that was a shower for me, but it was
something else for [the platform].
The confusion produced by the different languages is not merely a matter of cultural bias.
Looking for cheap labor, platforms and requesters target the Venezuelan market but ignore the
language barriers and formulate instructions in English. Moreover, further documents that workers
encounter in their work, such as privacy policies, contracts, and non-disclosure agreements, are
also prepared in English and remain, partially or totally, unintelligible for them. Workers usually
sign these documents without understanding the full scope of their contractual relationship with
their employers. Along with instructions, these documents embody the data-production dispositif.
They are a materialization of normalized discourses and practices that shape data workers to be
dependent and, therefore, obedient, while their subjectivities as Spanish-speaking Latin American
workers are ignored and erased.
4.4.2 Work Interfaces.
In BPOs like Alamo, choices regarding which platform will host the data and will be used as
a tool are made by clients. In many cases, the requester has developed annotation software
specifically tailored to the needs of their business and the dataset to be produced. In other projects,
the company uses a commercial platform designed by a third party. In this case, the client would
suggest the tool that best fits their needs among several choices available on the market.
The choice of a specific tool comes with limitations that, in one way or another, constrain data
workers’ agency to interpret and sort data. The most notable limitation is that the taxonomies contained
in instruction documents are also embedded in the software interfaces that workers use to collect,
organize, segment, and label data. Workers usually interact with a drop-down menu containing all
the classes or attributes they are allowed to apply to data. Most interfaces do not allow workers to
add further options to the list of pre-defined labels that they receive from requesters. This is most
prominent in software interfaces specially designed by requesters and tailored to specific projects.
In those cases, the software interfaces that mediate between workers and data are designed to
ensure that tasks are completed according to particular parameters pre-defined by requesters and
made explicit in the instruction documents.
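To make this constraint concrete, the following minimal sketch illustrates how a fixed, requester-defined taxonomy might be embedded in an annotation tool’s configuration. The class, field names, and validation logic are our own illustrative assumptions and do not reproduce any of the tools we observed; only the age categories mirror the drop-down list shown in Figure 2.

```python
# Illustrative sketch only: a hypothetical label schema as it might be embedded
# in an annotation interface. The requester defines the classes; the worker-facing
# drop-down renders them and rejects anything outside the list.

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the taxonomy cannot be modified at annotation time
class LabelSchema:
    task_name: str
    classes: tuple[str, ...]              # fixed, requester-defined label set
    allow_worker_additions: bool = False  # typically disabled, as observed

    def validate(self, label: str) -> str:
        """Accept only labels from the predefined list."""
        if label not in self.classes:
            raise ValueError(f"'{label}' is not a predefined class for this task")
        return label


# Example schema using the age categories visible in Figure 2
age_schema = LabelSchema(
    task_name="Guess how old I am?",
    classes=("Baby", "Toddler", "Pre-Teen", "Teenager",
             "Adult", "Middle-Age", "Senior", "Other"),
)

age_schema.validate("Adult")      # accepted
# age_schema.validate("Elderly")  # would raise ValueError: not in the drop-down
```

In such a configuration, any category a worker might find more appropriate simply cannot be expressed, which is precisely the limitation discussed below for commercial tools.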
In the case of regular data work tools for commercial use, the impossibility of changing the
predefined categories or adding more classes is perceived as a limitation that makes data work
harder at BPOs and requires communication through hierarchical structures until the requester
is reached. Jeff, one of the managers leading a BPO in Iraq, reports on this issue:
There was a limitation on the annotation tool that they were using. They were relying
on an open-source platform that doesn’t have that feature that lets you add or create
predefined attributes, which makes the work many times easier.

Some of these generic tools give the project owner — generally the requester or a BPO’s project
manager — the ability to allow workers to add further options to the classification system (see
Figure 1). However, this does not seem to be a widespread practice at Alamo. Among the many
projects that we had the opportunity to observe during fieldwork, only once were data workers
allowed to co-create the taxonomy around which data was organized and annotated.
1 We use this term to translate “lavadero,” a common space in Latin American homes for washing clothes, equipped with a washboard basin and sink.


[Fig. 1. Commercial data annotation tool. Only the project owner can grant rights to data workers.
The original screenshot shows the tool’s user-role settings: roles such as Project Owner, Administrator,
Supervisor, Labeller, and View Only, and permissions such as editing classes and attributes, uploading
images and annotations, managing users, and starting annotation.]

In the crowdsourcing platforms that we studied, only Clickrating presented external interfaces,
meaning that workers had to log into internal annotation platforms of clients, notably in the case of
tasks requested by major technology companies. For Tasksource and Workerhub, workers interacted
with data annotation interfaces developed by these companies. In both cases, the screen displayed a
top bar with an accuracy score, i.e., the percentage of tasks submitted by the worker that the platform
judged accurate. For Workerhub, the top bar also showed the number of annotations completed for
the assignment, the earnings, the time spent per task, and a button to display the instructions (see
Figure 2). On both platforms, the labels were available in the right sidebar alongside tools to zoom
in and out and to configure the visibility of the data. On all three platforms, workers could not change
the predefined labels or suggest changes.
The interfaces present in the BPO and the platforms feature gamification elements (scores and
timing) to speed up the labor process and keep workers focused on the tasks at hand. The over-
reliance on speed privileges action over reflection and increases the alienation between workers
and the production process. Even tasks that ask for workers’ judgment, such as those on Clickrating,
are timed and reward fast thinking. That said, unlike the other platforms, Clickrating offers room
for substantial comments evaluating algorithmic outputs, creating more engagement for workers
beyond narrow annotation tasks.

4.4.3 Tools to Assess Worker Performance.


To differentiate itself in the very competitive market of outsourced data services, the BPO Alamo
makes a selling point out of its performance metrics and quality assurance mechanisms. The
company puts much effort into developing more and better ways of measuring performance and
quality, and transforming those into numbers and charts the client may perceive as valuable. As a
response to market demands, quality controls intensify, which results in pressure and surveillance
for workers. Moreover, the need for quantifiable data to translate “quality” into a percentage
exacerbates the standardization of work processes, which, once more, results in less room for
workers’ subjectivity.
[Fig. 2. Age-based classification of images on Workerhub’s interface. The original screenshot shows
the task “Guess how old I am?” with a task timer, the pay rate ($0.55 per 1,000 task units), image
settings, keyboard shortcuts, and a fixed list of age categories: Baby, Toddler, Pre-Teen, Teenager,
Adult, Middle-Age, Senior, Other.]

Alamo has highly standardized processes that include a team leader and several reviewers per
team and a quality assurance (QA) department using several metrics to ensure that projects are
conducted in accordance with the requesters’ expectations. In addition, team leaders and the QA
department use metrics to quantify workers’ labor. As a token of transparency, workers’ scores are
sometimes shared with clients. Noah, one of the BPO’s team leaders, describes the function of metrics
within Alamo and concerning its clients:
We have metrics for everything. They can be individual, for personal output, or they
can be general in the project. We have some to measure correct and incorrect output,
there we see where we fail, where we can give more support to the team so that those
errors are corrected, how we can solve those problems. In QA, what they do is metrics.
Metrics, and ensure that the quality provided to the client is high.
On platforms, workers are also constantly evaluated with accuracy and speed metrics. Instead of
being managed by company employees, as in the case of Alamo, platform workers are assessed and
controlled by algorithms. All platform workers we interviewed reported often being banned from
tasks because the algorithms negatively evaluated their performance. Of course, this represents a
serious obstacle to maintaining a stable income, especially when the ban is permanent. This is what
Juan, a Workerhub worker, reported:
Juan: The platform pays every Tuesday. Once they ban you, you lose all your credits,
in the sense that, without an account under your name and email, you can’t open a
new account and access the money you’ve earned.
Interviewer: Did they tell you why?
Juan: No. I could have asked in the [platform managed] Discord channel, but if you ask
anything, you get banned. They are the ones who command. . . they are the ones who
decide. I was banned without cause because my accuracy was high. I never knew why
they expelled me.
Interviewer: How did you realize you were banned?
Juan: One day, I couldn’t access my account. . . . I created another account with the
same email, worked for a week, and they banned me again. They didn’t pay me. Some
of my colleagues from the same neighborhood and cousins who work for the platform
told me: “Don’t create an account with the same email because they won’t let you.
They will let you open it, but then they won’t pay you.”
The algorithms that assess worker performance in the three platforms that we have studied
follow exactly the same three-step process. First, workers are shown the same data again after some
time: if the task involves, for example, categorizing photographs of flowers according to their colors
and a worker labels the same image differently the second time, the algorithm will consider it “spam.”
The second mechanism is to verify workers’ answers against previously labeled data. If there is a
mismatch, the algorithm will assume that the worker is not performing their activities “accurately.”
Finally, as suggested by interviews with workers and previous observations on Amazon Mechanical
Turk [56] (a platform that is not well established in Latin America outside of Brazil and, therefore,
not the focus of this study), the third method is to compare workers’ answers with those of their
peers and assume that the most common answer is the correct one. Many of the workers’ groups
that we encountered provide guides so that workers do not diverge from the responses of the majority
and, thus, keep high levels of accuracy from the perspective of the algorithms.
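As a simplified illustration, the following sketch captures the logic of these three checks: repeated-item consistency, comparison against previously labeled “gold” data, and majority agreement among peers. Function names, data structures, and the example values are our own assumptions for illustration and do not reproduce any platform’s actual implementation.

```python
# Simplified illustration of the three algorithmic checks described above.
# All names, thresholds, and values are assumptions for illustration only.

from collections import Counter


def consistency_check(first_answer: str, repeated_answer: str) -> bool:
    """1) The same item is shown again later; diverging answers are treated as spam."""
    return first_answer == repeated_answer


def gold_check(answer: str, gold_label: str) -> bool:
    """2) The answer is compared against previously labeled ("gold") data."""
    return answer == gold_label


def majority_check(answer: str, peer_answers: list[str]) -> bool:
    """3) The answer is compared to peers'; the most common answer counts as correct."""
    most_common_answer, _count = Counter(peer_answers).most_common(1)[0]
    return answer == most_common_answer


def accuracy_score(check_results: list[bool]) -> float:
    """Share of passed checks; falling below some threshold can lead to a ban."""
    return sum(check_results) / len(check_results) if check_results else 0.0


# Example: a worker labeling flower colors, as in the example above
checks = [
    consistency_check("red", "red"),
    gold_check("yellow", "yellow"),
    majority_check("blue", ["blue", "blue", "purple"]),
]
print(accuracy_score(checks))  # 1.0 -> access retained; low scores risk a ban
```

Note that, under the majority rule, an answer that diverges from the most common response is scored as inaccurate regardless of whether it is correct, which helps explain why the peer-written guides described above advise workers not to deviate from majority responses.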

5 DISCUSSION
In concurrence with previous work [27, 53, 54, 75], we have observed that workers collecting,
interpreting, sorting, and labeling data do not do so guided solely by their judgment: their work
and subjectivities are embedded in large industrial structures and subject to control. Artificial
intelligence politics are inextricably connected to the power relations behind data collection and
transformation and the working conditions that allow preconceived hegemonic forms of knowledge
to be encoded in machine learning algorithms via training datasets. Labor conditions and economic
power in the production of ML datasets manifest in decisions related to what is considered data
and how each data point is interpreted.
While task instructions help data workers complete their tasks, they also constitute a fundamental
tool to ensure the imposition of requesters’ worldviews on datasets. Sometimes, the meanings and
classifications contained in data work instructions appear self-evident to workers, and a shared
status quo is reproduced in the dataset. Often, however, the logic encoded in the instructions does
not resonate with them. This could be due to cultural differences between requesters and data
workers, lack of contextual information about the dataset’s application area, perceived errors that
cannot be reported, or simply because the tasks appear ethically questionable to workers. In such
cases, another form of normalized discourse persists: that of a hierarchical order where service
providers are conditioned to follow orders because “the client is always right” and workers should
“be like a machine.”
According to Foucault, discourse organizes knowledge that structures the constitution of social
relations through the collective understanding of the discursive logic and the acceptance of the
discourse as a social fact. A normalized discourse is, therefore, what goes without saying. This way,
the prevalence of requesters’ views and preferences does not need to be explicitly announced to
workers. Instead, such implicit knowledge influences how the outsourced data workers that we
observed and interviewed perform their tasks: carefully following instructions, even when they
do not make sense to them or when they do not agree with the contents and taxonomies in the
documents. The context of poverty and lack of opportunities in the regions where data production
is outsourced is also fundamental as it makes workers dependent on requesters and, thus, obedient
to instructions.
Finally, artifacts such as narrow work interfaces with embedded predefined labels, platforms
that do not allow workers’ feedback, and metrics to assess workers’ “accuracy” (understood as
compliance with requesters’ views) constitute discursive materializations that, at the same time,
ensure the perpetuation and normalization of specific discourses.

[Fig. 3. The three components of the data-production dispositif based on the framework and figure
proposed by Jäger and Maier [39]: discursive practices (e.g., “the client is always right”), non-discursive
practices (e.g., instructions warning that only accurate responses will avoid a ban), and materializations
(e.g., a drop-down menu to select only one label per image).]
All these elements combined — the predefined truth values encoded in instructions,
the work practices and social positions of workers, and materializations such as inter-
faces — constitute the data-production dispositif. Without any of these elements, the disposi-
tif would not be able to function as such. As Foucault puts it, dispositifs respond to an “urgent
need” [25] that is historically and geographically contingent. The data-production dispositif re-
sponds to the voracious demand for more, cheaper, and increasingly differentiated data
to feed the growing AI industry [5, 13]. Its goal is to produce subjects that are compliant with that
need.
The Foucauldian notion of subject has a twofold meaning, with subjects, on the one hand, being
producers of discourse and, on the other hand, being created by and subjected to dispositifs. All
subjects are entangled in dispositifs and have, therefore, tacit knowledge of how to do things within
specific contexts. This tacit knowledge includes “knowing one’s place” and what is expected from
each subject depending on their position. Thus, data workers know that subjects in their social and
professional position are implicitly expected to comply with client’s requests. This way, dispositifs
normalize and homogenize the subjectivities of those they dominate, producing power/knowledge
relationships that shape the subjects within the dispositif according to certain beliefs, actions, and
behaviors that correspond to the dispositif’s purpose [21, 23].
Following Foucault’s perspective, we argue that the goal of the data-production dispositif is
creating a specific type of worker, namely, outsourced data workers who are kept apart
from the rest of the machine learning production chain and, therefore, alienated. Data
workers who are recruited in impoverished areas of the world, often under the premise of
“bringing jobs to marginalized populations,” but are not offered opportunities to rise so-
cially or professionally in terms of salary and education. Data workers who are surveilled,
pushed to obey requesters and not question tasks, and who are constantly reminded of
the dangers of non-compliance. Data production cannot be a dignifying type of work if it does
not provide workers with a sustainable future.
The implications of this data-production dispositif, designed to constrain workers’ subjectivities
and perpetuate their alienation, precarization, and control, will be unpacked in the following
subsection.

5.1 Implications
As the extensive corpus of research literature dedicated to mitigating bias in crowdsourcing suggests,
controlling workers’ subjectivities is considered essential to avoid individual prejudices being
incorporated in datasets and, subsequently, in machine learning models. However, as we have
shown with our findings, unilateral views are already present at the requesters’ end in the form
of instructions that perpetuate particular worldviews and forms of discrimination, including
racism, sexism, classism, and xenophobia.
Given its characteristics, the data-production dispositif is detrimental to data workers and the
communities affected by machine learning systems trained on data produced under such conditions.
To close this paper, we would like to make a call to dismantle the dispositif. However, before going
into the implications of our call, it is crucial to consider that we never cease to act within dispositifs
and, by dismantling the data-production dispositif, we would inevitably give rise to another one.
Therefore, we discuss here ways of dismantling the data-production dispositif as we know it today,
that is, by changing the material conditions in data work and making its normalized discourses
explicit.

5.1.1 Fighting Alienation by Making the ML Pipeline Visible to Workers.


Substantial efforts in research and industry have been directed towards investigating and mitigat-
ing worker bias in crowdsourcing. Many of these initiatives portray data workers as bias-carrying
hazards whose subjectivities need to be constrained to prevent them from “contaminating” data. This
widespread discourse within the data-production dispositif gives rise to narrow instructions and
work interfaces and to the impossibility of questioning tasks. Workers are required to “think like
a machine” to be successful in the job. Moreover, data workers are often kept in the dark about
requesters’ plans and the machine learning models that they help train. Such conditions lead to
workers’ alienation as they are kept apart from the rest of the ML production chain.
Researchers have often referred to data workers such as data labelers and content moderators as
practicing ghost work [30] that remains “invisible” [70]. However, as Raval [69] accurately argues, it
is worth asking invisible for whom and, most importantly, “what does this seeing/knowing—hence
generating empathetic affect among Global North users—provide in terms of meaningful paths
to action for Global South subjects (workers and others)?” Breaking with the alienation of data
workers means much more than rendering them visible. It rather requires making the rest of the
machine learning supply chain visible to them. It means providing information and education on
technical and language matters that could help workers understand how their valuable labor fuels a
multi-billion dollar industry. This also concerns questions of labor organization and unionizing:
For instance, the recently-created Alphabet Workers Union has taken steps in this direction by
including contractors — many of them outsourced data workers. To help counter their alienation,
researchers and industry practitioners need to regard data workers as tech workers as much as we
do when we think of engineers.
Why would requesters want to educate data workers and disclose technical or commercial information
to them?
As mentioned above, the design of the tasks that we encountered failed to acknowledge and
rely on the unique ethical and societal understanding of workers to improve the annotations and,
with them, models. We found that the BPO model generates a stronger employment relationship
with workers compared to platforms, notably Workerhub and Tasksource, which translates into
higher engagement with the tasks at hand. Furthermore, BPO workers interviewed by us said they
wished they knew more about the requesters’ organizations and products because this would help
them understand their work and perform better. In this sense, expanding instructions to include
contextual information about the task, its field of application, and examples that show its relevance
for systems and users could improve data workers’ motivation and satisfaction, and help them
understand the value of their labor within ML supply chains.
5.1.2 Fighting Precarization by Considering Data Workers as Assets.
One of the most pressing ethical and humanitarian concerns surrounding outsourced data work
is the workers’ quality of life. The data-production dispositif is designed to access a large and cheap
labor pool and profit from workers’ precarious working conditions. It is not a coincidence that, in
Latin America, the platforms we encountered were established primarily in Venezuela, a country
mired in a deep socio-political crisis exacerbated by the COVID-19 pandemic, and that the BPO
company in Argentina recruited its workers from low-income neighborhoods.
While the arrival of these platforms and BPOs has allowed many workers to circumvent the
limits of their local labor markets, the system of economic dependency and exploitation that they
reproduce hinders efforts for sustainable development that include access to decent work and
economic growth [82]. Labor is an often overlooked aspect in discussions of ethical and sustainable
artificial intelligence [66]. We argue that we cannot truly create fair and equitable machine learning
systems if they depend on exploitative labor conditions in data work.
Why would requesters want to improve labor conditions in outsourced facilities?
All ML practitioners interviewed for this study had experience outsourcing data-related tasks
with both crowdsourcing platforms and BPOs. They all agreed that platforms are cheaper than BPOs,
but the latter offer higher quality. As argued by our interview partners, BPO teams remain more or
less unchanged throughout the production project, which results in better quality. Moreover, direct
communication with project managers allows for iterations and the incorporation of feedback.
Several ML practitioners also reported preferring not to outsource data-related tasks, especially in
cases where a unique “feel for the data” [61], which can only be achieved with time and experience,
was required. The evidence pointing to a negative correlation between cheap labor and the quality
of data [49] described by the ML practitioners that we interviewed could be a strong argument for
requesters to take measures and fight precarious work in outsourced facilities. Improving labor
conditions might be a less expensive (and, perhaps, more effective) approach than investing in
“debiasing” datasets after production.
5.1.3 Fighting Workers’ Surveillance and Control by Encouraging Interrogation.
Our findings show that the widespread use of “protected categories” for human classification
is bound to the cultural contexts and local jurisdictions that define what counts as a protected
group. Moreover, even tasks that do not involve classifying humans, such as identifying objects
in a road, can potentially have fatal consequences for individuals or groups, as in the case of the
Tasksource requester who did not include labels for humans sleeping or lying on the streets. Making
the rationale behind task instructions explicit can be difficult if categories are implicitly considered
commonplace for requesters, as they might not even notice the normativity behind instructed
taxonomies. Moreover, data workers that are subject to surveillance and control and who risk
being banned from tasks are less likely to question instructions. De-centering the development of
taxonomies from an “a priori” (i.e., classifying exclusively based on personal experience) and data-
based (i.e., classifying solely based on quantitative data) classification to one that derives from the
context and experiences of those who may be affected by it could be a fruitful approach to this issue
[17]. Data workers often perceive errors in task instructions or interfaces that remain unnoticed by
the requesters. Even if this feedback could be valuable for requesters, the data-production dispositif
is designed to silence workers’ voices.
We argue that the approach observed in Clickrating, where feedback from workers was encour-
aged, could be constructive here. However, expanding and implementing such an approach would
require a general shift of perspective: from considering workers’ subjectivities a danger to data
towards considering workers as assets in the quest for producing high-quality datasets. Fostering
workers’ agency instead of surveillance and opening up channels for feedback could allow workers
to become co-producers of datasets instead of mere reproduction tools.
Why would requesters want to have their logic questioned?
While taxonomies respond to the commercial necessities of requesters, they also need to be built
with equity and inclusivity in mind. This is not only an ethical issue, but it can quickly become
a commercial one. Public scrutiny can have fatal consequences for a machine learning product
that is perceived to be discriminatory or harmful [2, 32, 45]. Furthermore, instruction documents
are living documents. We have observed how requesters update them by withdrawing the tasks,
reinstating the instructions, and seeking data work again, a time-consuming and costly process.
Thus, requesters could benefit from considering instructions as the product of exchanges with the
different stakeholders contributing to data production and deployment. Data workers could play a
key role in interrogating and improving tasks and, therefore, datasets and ML systems.

5.2 Limitations and Future Research


Our findings are bound to the platforms, companies, individuals, and geographical contexts covered
by our study and our positionality as researchers, which has undoubtedly oriented but probably also
limited our observations and interpretations. Because of the qualitative nature of our investigation,
we have striven for inter-subject comprehensibility [20] instead of objectivity, which means making
sure that our interpretations are plausible for both authors and the contexts observed. Furthermore,
the use of multiple data sources allowed us to procure supporting evidence for observed phenomena.
In addition, the use of expert interviews allowed us to confirm and discuss several of our initial
interpretations.
This paper only covers some aspects of the data-production dispositif. This is because no dispositif
works in isolation but is always entangled with other discourse, action, and materialization networks.
To explicate the totality of the data-production dispositif would mean to analyze its relationship
with, among many others, the scientific dispositif, the economic dispositif, and more specifically,
the academic and the tech-industry dispositifs. Critical aspects of these relationships have been
reported in these pages, but covering them all in one paper would be unfeasible. The fact that our
analysis is bound to remain “incomplete” could be seen as a limitation. However, we consider it
an opportunity for future research to expand our findings and interrogate ways of working with
data that today seem commonplace. We think that a profound exploration into the tech-industry
dispositif and its relationship with the data-production dispositif could be especially fruitful.


6 CONCLUSION
To explore how data for machine learning is produced through labor outsourced to Venezuela and
Argentina, we have turned to the Foucauldian notion of dispositif and applied an adapted version of
the dispositif analysis method outlined, among others, by Siegfried Jäger [39, 41]. Our investigation
comprised the analysis of task instructions, interviews with data workers, managers, and requesters,
as well as observations at crowdsourcing platforms and a business process outsourcing company.
What we have called the data-production dispositif comprises discourses, work practices, and
materializations that are (re)produced in and through ML data work. Our findings have shown that
requesters use task instructions to impose predefined forms of interpreting data. The context of
poverty and dependence in Latin America leaves workers with no other option but to obey. In view
of these findings, we propose three ways of counteracting the data-production dispositif and its
effects: making worldviews encoded in task instructions explicit, thinking of workers as assets, and
empowering them to produce better data.
While the potentially harmful effects of algorithmic biases continue to be widely discussed, it
is also essential to address how power imbalances and imposed classification principles in data
creation contribute to the (re)production of inequalities by machine learning. The empowerment
of workers and the decommodification of their labor away from market dependency, as well as the
detailed documentation of outsourced processes of data creation, remain essential steps to allow
spaces of reflection, deliberation, and audit that could potentially contribute to addressing some of
the social questions surrounding machine learning technologies.

ACKNOWLEDGMENTS
Funded by the German Federal Ministry of Education and Research (BMBF) – Nr 16DII113, the
International Development Research Centre of Canada, and the Schwartz Reisman Institute for
Technology and Society. We thank Tianling Yang, Marc Pohl, Alex Taylor, Alessandro Delfanti,
Paula Núñez de Villavicencio, Alex Hanna, Paola Tubaro, Antonio Casilli, and the anonymous
reviewers. Special thanks to the data workers who shared their experiences with us. This work
would not have been possible without them.

REFERENCES
[1] Agence France-Presse. 2021. Venezuela reports 2020 inflation of 3,000 percent. ABS CBN News (2021).
[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. ProPublica (2016).
[3] Ron Artstein and Massimo Poesio. 2005. Bias decreases in proportion to the number of annotators. In Proceedings of
FG-MoL 2005 : the 10th Conference on Formal Grammar and the 9th Meeting on Mathematics of Language, Edinburgh,
5–7 August, 2005. 139–148.
[4] Jeffrey Bardzell, Shaowen Bardzell, Guo Zhang, and Tyler Pace. 2014. The lonely raccoon at the ball: designing for
intimacy, sociability, and selfhood. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
ACM, Toronto Ontario Canada, 3943–3952. https://doi.org/10.1145/2556288.2557127
[5] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of
Stochastic Parrots: Can Language Models Be Too Big? Conference on Fairness, Accountability, and Transparency (FAccT
’21) (mar 2021).
[6] Claus Bossen, Kathleen H Pine, Federico Cabitza, Gunnar Ellingsen, and Enrico Maria Piras. 2019. Data work
in healthcare: An Introduction. Health Informatics Journal 25, 3 (Sept. 2019), 465–474. https://doi.org/10.1177/
1460458219864730
[7] C. E. Brodley and M. A. Friedl. 1999. Identifying Mislabeled Training Data. Journal of Artificial Intelligence Research 11
(Aug. 1999), 131–167. https://doi.org/10.1613/jair.606
[8] Andrea D. Bührmann and Werner Schneider. 2007. Mehr als nur diskursive Praxis? – Konzeptionelle Grundlagen
und methodische Aspekte der Dispositivanalyse. Forum Qualitative Sozialforschung / Forum: Qualitative Social
Research Vol 8, No 2: From Michel Foucault’s Theory of Discourse to Empirical Discourse Research (May 2007).
https://doi.org/10.17169/FQS-8.2.237


[9] Joannah Caborn. 2016. On the Methodology of Dispositive Analysis. Critical Approaches to Discourse Analysis Across
Disciplines 1, 1 (2016), 115–123. https://doi.org/10.5209/CLAC.53494
[10] Antonio A. Casilli. 2017. Digital labor studies go global: Toward a digital decolonial turn. International Journal of
Communication 11 (2017), 3934–3954.
[11] Antonio A. Casilli and Julian Posada. 2019. The Platformisation of Labor and Society. In Society and the Internet (vol. 2
ed.), Mark Graham and William H. Dutton (Eds.). Oxford University Press, Oxford.
[12] Justin Cheng and Dan Cosley. 2013. How annotation styles influence content and preferences. In Proceedings of the
24th ACM Conference on Hypertext and Social Media - HT ’13. Association for Computing Machinery, Paris, France,
214–218. https://doi.org/10.1145/2481492.2481519
[13] Kate Crawford. 2021. Atlas of AI. Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press,
New Haven, CT.
[14] Ciaran Cronin. 1996. Bourdieu and Foucault on power and modernity. Philosophy & Social Criticism 22, 6 (Nov. 1996),
55–85. https://doi.org/10.1177/019145379602200603
[15] Peter Dauvergne. 2020. AI in the Wild. Sustainability in the Age of Artificial Intelligence. MIT Press, Cambridge, MA.
[16] Alessandro Delfanti and Sarah Sharma (Eds.). 2019. Log Out! The Platform Economy and Worker Resistance. Notes
from Below 8 (2019).
[17] Catherine D’Ignazio and Lauren F. Klein. 2020. Data feminism. The MIT Press, Cambridge, Massachusetts. https:
//mitpress.mit.edu/books/data-feminism
[18] Shaoyang Fan, Ujwal Gadiraju, Alessandro Checco, and Gianluca Demartini. 2020. CrowdCO-OP: Sharing Risks and
Rewards in Crowdsourcing. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (Oct. 2020), 1–24.
https://doi.org/10.1145/3415203
[19] Melanie Feinberg. 2017. A Design Perspective on Data. In CHI ’17: Proceedings of the 2017 CHI Conference on Human
Factors in Computing Systems (CHI ’17). Association for Computing Machinery, Denver, Colorado, USA, 2952–2963.
https://doi.org/10.1145/3025453.3025837
[20] Uwe Flick. 2007. Qualitative Sozialforschung: Eine Einführung (10, erweiterte neuausgabe ed.). Rowohlt Taschenbuch,
Reinbek bei Hamburg.
[21] Michel Foucault. 1971. Orders of discourse. Social Science Information 10, 2 (April 1971), 7–30. https://doi.org/10.1177/
053901847101000201 Publisher: SAGE Publications Ltd.
[22] Michel Foucault. 1982. The Archaeology of Knowledge: And the Discourse on Language. Vintage, New York.
[23] Michel Foucault. 1982. The Subject and Power. Critical Inquiry 8, 4 (1982), 777–795. https://www.jstor.org/stable/
1343197
[24] Michel Foucault. 1996. What Is Critique? In What is Enlightenment?: Eighteenth-Century Answers and Twentieth-Century
Questions, James Schmidt (Ed.). University of California Press.
[25] Michel Foucault and Colin Gordon. 1980. Power/knowledge: selected interviews and other writings, 1972-1977 (1st
american ed ed.). Pantheon Books, New York.
[26] Ujwal Gadiraju, Jie Yang, and Alessandro Bozzon. 2017. Clarity is a Worthwhile Quality: On the Role of Task Clarity
in Microtask Crowdsourcing. In Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT ’17).
Association for Computing Machinery, New York, NY, USA, 5–14. https://doi.org/10.1145/3078714.3078715
[27] R. Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, and Jenny Huang. 2020. Garbage in, Garbage
out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes
From?. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* ’20).
Association for Computing Machinery, New York, NY, USA, 325–336. https://doi.org/10.1145/3351095.3372862
[28] Mor Geva, Yoav Goldberg, and Jonathan Berant. 2019. Are We Modeling the Task or the Annotator? An Investigation of
Annotator Bias in Natural Language Understanding Datasets. In Proceedings of the 2019 Conference on Empirical Methods
in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-
IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1161–1166. https://doi.org/10.18653/v1/D19-
1107
[29] Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, and Klaus Mueller. 2020. Measuring Social Biases of Crowd Workers using
Counterfactual Queries. Honolulu, HI, USA. http://fair-ai.owlstown.com/publications/1424
[30] Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass.
Houghton Mifflin Harcourt, Boston.
[31] Julian Hamann, Jens Maesse, Ronny Scholz, and Johannes Angermuller. 2019. The Academic Dispositif: Towards a
Context-Centred Discourse Analysis. In Quantifying Approaches to Discourse for Social Scientists, Ronny Scholz (Ed.).
Springer International Publishing, Cham, 51–87. https://doi.org/10.1007/978-3-319-97370-8_3
[32] Karen Hao. 2020. In 2020, let’s stop AI ethics-washing and actually do something. MIT Technology Review (2020).
https://www.technologyreview.com/2019/12/27/57/ai-ethics-washing-time-to-act/


[33] Kotaro Hara, Abigail Adams, Kristy Milland, Saiph Savage, Chris Callison-Burch, and Jeffrey P. Bigham. 2018. A
Data-Driven Analysis of Workers’ Earnings on Amazon Mechanical Turk. In Proceedings of the 2018 CHI Conference on
Human Factors in Computing Systems. ACM, Montreal QC Canada, 1–14. https://doi.org/10.1145/3173574.3174023
[34] Ellie Harmon and Melissa Mazmanian. 2013. Stories of the Smartphone in everyday discourse: conflict, tension
& instability. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, Paris France,
1051–1060. https://doi.org/10.1145/2470654.2466134
[35] Heather A. Horst and Daniel Miller. 2012. Digital Anthropology. Berg. 196–213 pages.
[36] Christoph Hube, Besnik Fetahu, and Ujwal Gadiraju. 2019. Understanding and Mitigating Worker Biases in the
Crowdsourced Collection of Subjective Judgments. In Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.
1145/3290605.3300637
[37] Lilly Irani. 2015. The cultural work of microwork. New Media & Society 17, 5 (2015), 720–739. https://doi.org/10.1177/
1461444813511926
[38] Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13). Association for Computing
Machinery, Paris, France, 611–620. https://doi.org/10.1145/2470654.2470742
[39] Siegfried Jäger and Florentine Maier. 2016. Analysing discourses and dispositives: A Foucauldian approach to theory
and methodology. Methods of critical discourse studies (2016), 109–136.
[40] Brian Justie. 2021. Little history of CAPTCHA. Internet Histories 5, 1 (2021), 30–47. https://doi.org/10.1080/24701475.
2020.1831197
[41] Siegfried Jäger. 2007. Deutungskämpfe: Theorie und Praxis Kritischer Diskursanalyse. Springer-Verlag.
[42] Gopinaath Kannabiran, Jeffrey Bardzell, and Shaowen Bardzell. 2011. How HCI talks about sexuality: discursive
strategies, blind spots, and opportunities for future research. In Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems. ACM, Vancouver BC Canada, 695–704. https://doi.org/10.1145/1978942.1979043
[43] Lawrence F. Katz and Alan B. Krueger. 2016. The Role of Unemployment in the Rise in Alternative Work Arrangements.
10 pages. https://doi.org/10.1257/aer.p20171092
[44] Gunay Kazimzade and Milagros Miceli. 2020. Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-
oriented Data Annotation Practices. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and
Society. (AIES ’20). Association for Computing Machinery, New York, NY, USA, 1–7. https://doi.org/10.1145/3375627.
3375809
[45] Granate Kim. 2019. Microsoft Funds Facial Recognition Technology Secretly Tested on Palestinians. Truthout (2019).
[46] Yubo Kou, Xinning Gui, Yunan Chen, and Bonnie Nardi. 2019. Turn to the Self in Human-Computer Interaction: Care
of the Self in Negotiating the Human-Technology Relationship. In Proceedings of the 2019 CHI Conference on Human
Factors in Computing Systems. ACM, Glasgow Scotland Uk, 1–15. https://doi.org/10.1145/3290605.3300711
[47] Valérie Larroche. 2019. The Dispositif: A Concept for Information and Communication Sciences (1 ed.). Wiley. https:
//doi.org/10.1002/9781119508724
[48] Jürgen Link. 2014. Dispositiv. In Foucault-Hanbuch, Clemens Kammler, Rolf Parr, Ulrich Johannes Schneider, and Elke
Reinhardt-Becker (Eds.). J.B. Metzler, Stuttgart, 237–242. https://doi.org/10.1007/978-3-476-01378-1_27
[49] Leib Litman, Jonathan Robinson, and Cheskie Rosenzweig. 2015. The relationship between motivation, monetary
compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods
47, 2 (June 2015), 519–528. https://doi.org/10.3758/s13428-014-0483-x
[50] Alex Jiahong Lu, Tawanna R. Dillahunt, Gabriela Marcu, and Mark S. Ackerman. 2021. Data Work in Education:
Enacting and Negotiating Care and Control in Teachers’ Use of Data-Driven Classroom Surveillance Technology.
Proceedings of the ACM on Human-Computer Interaction 5, CSCW2 (oct 2021), 1–26. https://doi.org/10.1145/3479596
[51] Katharina Manderscheid. 2014. The Movement Problem, the Car and Future Mobility Regimes: Automobility as
Dispositif and Mode of Regulation. Mobilities 9, 4 (Oct. 2014), 604–626. https://doi.org/10.1080/17450101.2014.961257
[52] David Martin, Benjamin V. Hanrahan, Jacki O’Neill, and Neha Gupta. 2014. Being a turker. In Proceedings of the 17th
ACM conference on Computer supported cooperative work & social computing. ACM, Baltimore Maryland USA, 224–235.
https://doi.org/10.1145/2531602.2531663
[53] Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition: Power Dynamics
in Data Annotation for Computer Vision. Proceedings of the ACM on Human-Computer Interaction 4, CSCW2 (Oct.
2020), 1–25. https://doi.org/10.1145/3415186
[54] Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex Hanna. 2021.
Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In Proceedings of the 2021
ACM Conference on Fairness, Accountability, and Transparency. ACM, Virtual Event Canada, 161–172. https:
//doi.org/10.1145/3442188.3445880


[55] Naja Holten Møller, Claus Bossen, Kathleen H. Pine, Trine Rask Nielsen, and Gina Neff. 2020. Who Does the Work of
Data? Interactions 27, 3 (April 2020), 52–55. https://doi.org/10.1145/3386389
[56] Bruno Moreschi, Gabriel Pereira, and Fabio G. Cozman. 2020. The Brazilian Workers in Amazon Mechanical Turk:
Dreams and realities of ghost workers. Revista Contracampo 39, 1 (apr 2020). https://doi.org/10.22409/contracampo.
v39i1.38252
[57] Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas
Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing
Machinery, Glasgow, Scotland Uk, 1–15. https://doi.org/10.1145/3290605.3300356
[58] Michael Muller, Christine T Wolf, Josh Andres, Zahra Ashktorab, Narendra Nath Joshi, Michael Desmond, Aabhas
Sharma, Kristina Brimijoin, Qian Pan, Evelyn Duesterwald, and Casey Dugan. 2021. Designing Ground Truth and the
Social Life of Labels. (2021), 17.
[59] Gemma Newlands. 2021. Lifting the curtain: Strategic visibility of human labour in AI-as-a-Service. Big Data & Society
8, 1 (Jan. 2021), 205395172110160. https://doi.org/10.1177/20539517211016026
[60] Magdalena Nowicka-Franczak. 2021. Post-Foucauldian Discourse and Dispositif Analysis in the Post-Socialist Field of
Research: Methodological Remarks. Qualitative Sociology Review 17, 1 (Feb. 2021), 72–95. https://doi.org/10.18778/1733-
8077.17.1.6
[61] Samir Passi and Steven Jackson. 2017. Data Vision: Learning to See Through Algorithmic Abstraction. In Proceedings
of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’17). Association
for Computing Machinery, Portland, Oregon, USA, 2436–2447. https://doi.org/10.1145/2998181.2998331
[62] Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in
Corporate Data Science Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW (Nov. 2018), 1–28. https://doi.org/10.
1145/3274405
[63] Thomas Poell, David B. Nieborg, and José van Dijck. 2019. Platformisation. Internet Policy Review 8, 4 (2019).
https://doi.org/10.14763/2019.4.1425
[64] Julian Posada. 2020. The Future of Work Is Here: Toward a Comprehensive Approach to Artificial Intelligence and
Labour. Ethics in Context (2020).
[65] Julian Posada. 2022. Embedded Reproduction in Platform Data Work. Information, Communication & Society (2022).
[66] Julian Posada, Nicholas Weller, and Wendy H. Wong. 2021. We Haven’t Gone Paperless Yet: Why the Printing Press
Can Help Us Understand Data and AI. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES
’21) (2021).
[67] Rida Qadri. 2020. Algorithmized but not Atomized? How Digital Platforms Engender New Forms of Worker Solidarity
in Jakarta. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. ACM, New York, NY, USA, 144–144.
https://doi.org/10.1145/3375627.3375816
[68] Sverre Raffnsøe, Marius Gudmand-Høyer, and Morten S. Thaning. 2016. Foucault’s dispositive: The perspicacity of
dispositive analytics in organizational research. Organization 23, 2 (March 2016), 272–298. https://doi.org/10.1177/
1350508414549885 Publisher: SAGE Publications Ltd.
[69] Noopur Raval. 2021. Interrupting invisibility in a global world. Interactions 28, 4 (July 2021), 27–31. https://doi.org/10.
1145/3469257
[70] Sarah T. Roberts. 2019. Behind the Screen: Content Moderation in the Shadows of Social Media. Yale University Press,
New Haven, CT. 280 pages. https://doi.org/10.1177/1461444819878844
[71] Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. Who are the crowdworkers?:
shifting demographics in mechanical turk. In CHI ’10 Extended Abstracts on Human Factors in Computing Systems.
ACM, Atlanta Georgia USA, 2863–2872. https://doi.org/10.1145/1753846.1753873
[72] Niloufar Salehi, Lilly C. Irani, Michael S. Bernstein, Ali Alkhatib, Eva Ogbe, Kristy Milland, and Clickhappier. 2015.
We Are Dynamo: Overcoming Stalling and Friction in Collective Action for Crowd Workers. In Proceedings of the
33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, Seoul Republic of Korea, 1621–1630.
https://doi.org/10.1145/2702123.2702508
[73] Nithya Sambasivan. 2022. All Equation, No Human: The Myopia of AI Models. Interactions 29, 2 (mar 2022), 78–80.
https://doi.org/10.1145/3516515
[74] Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. 2021.
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. In Proceedings of the
2021 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 1–15. https://doi.org/10.
1145/3411764.3445518
[75] Morgan Klaus Scheuerman, Emily Denton, and Alex Hanna. 2021. Do Datasets Have Politics? Disciplinary Values in
Computer Vision Dataset Development. arXiv:2108.04308 [cs] (Aug. 2021). https://doi.org/10.1145/3476058 arXiv:
2108.04308.


[76] Daniel Schteingart, Martin Trombetta, and Gisella Pascuariello. 2020. Primas salariales sectoriales en Argentina.
Ministerio de Desarrollo Productivo de la Nación. Centro de Estudios para la Producción XXI (Nov. 2020), 39.
[77] Cathrine Seidelin, Yvonne Dittrich, and Erik Grönvall. 2018. Data Work in a Knowledge-Broker Organisation: How
Cross-Organisational Data Maintenance Shapes Human Data Interactions. In Proceedings of the 32nd International BCS
Human Computer Interaction Conference (Belfast, United Kingdom) (HCI ’18). BCS Learning Development Ltd., Swindon,
GBR, Article 14, 12 pages. https://doi.org/10.14236/ewic/HCI2018.14
[78] Katta Spiel. 2017. Critical Experience: Evaluating (with) Autistic Children and Technologies. In Proceedings of the 2017
CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM, Denver Colorado USA, 326–329.
https://doi.org/10.1145/3027063.3027118
[79] Divy Thakkar, Azra Ismail, Pratyush Kumar, Alex Hanna, Nithya Sambasivan, and Neha Kumar. 2022. When is Machine
Learning Data Good ?: Valuing in Public Health Datafication. Proceedings of the 2022 CHI Conference on Human Factors
in Computing Systems (CHI ’22) (2022).
[80] Divy Thakkar, Neha Kumar, and Nithya Sambasivan. 2020. Towards an AI-powered Future that Works for Vocational
Workers. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA,
1–13. https://doi.org/10.1145/3313831.3376674
[81] Paola Tubaro, Antonio A. Casilli, and Marion Coville. 2020. The trainer, the verifier, the imitator: Three ways in
which human platform workers support artificial intelligence. Big Data & Society 7, 1 (2020). https://doi.org/10.1177/
2053951720919776
[82] United Nations. 2015. Sustainable Development Goals. https://www.un.org/sustainabledevelopment/
[83] United Nations General Assembly. 1999. Measures to combat contemporary forms of racism, racial discrimination,
xenophobia and related intolerance. Technical Report. United Nations.
[84] Fabian L. Wauthier and Michael I. Jordan. 2011. Bayesian Bias Mitigation for Crowdsourcing. In Proceedings of the 24th
International Conference on Neural Information Processing Systems (NIPS’11). Curran Associates Inc., Granada, Spain,
1800–1808. http://papers.nips.cc/paper/4311-bayesian-bias-mitigation-for-crowdsourcing.pdf
[85] Glen Whelan. 2019. Born Political: A Dispositive Analysis of Google and Copyright. Business & Society 58, 1 (Jan.
2019), 42–73. https://doi.org/10.1177/0007650317717701
[86] Ricky Wichum. 2013. Security as Dispositif: Michel Foucault in the Field of Security. Foucault Studies (Jan. 2013),
164–171. https://doi.org/10.22439/fs.v0i15.3996
[87] Alex J. Wood, Mark Graham, Vili Lehdonvirta, and Isis Hjorth. 2019. Networked but Commodified: The
(Dis)Embeddedness of Digital Labour in the Gig Economy. Sociology (2019). https://doi.org/10.1177/0038038519828906
[88] Alex J. Wood, Vili Lehdonvirta, and Mark Graham. 2018. Workers of the Internet unite? Online freelancer organisation
among remote gig economy workers in six Asian and African countries. New Technology, Work and Employment 33, 2
(2018), 95–112. https://doi.org/10.1111/ntwe.12112
[89] Jamie Woodcock and Mark Graham. 2020. The Gig Economy: A Critical Introduction. Polity Press, London. 160 pages.
[90] Shoshana Zuboff. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power.
PublicAffairs, New York, NY. 691 pages.

Received January 2022; revised April 2022; accepted May 2022

5 Co-Designing Documentation for Reflexivity and Participation

5.1 Background and Motivation


This chapter, the last one before going into the conclusion and reflection sections, describes
the design phase of my doctoral work, in which I applied the findings of the first two
studies (Chapters 3 and 4) to investigate documentation practices and re-imagine dataset
documentation frameworks. Here, I understand documentation as both a tool for and a process of
making explicit the contexts, actors, and practices comprised in ML data work, as well as the
relationships among these elements.
As discussed by previous research [138, 139, 140], one persistent challenge to data
documentation is that datasets are living artifacts that evolve and change over time. Moreover,
their production involves actors with varying amounts of decision-making power. Capturing
such variations requires consideration of the contexts where data production is carried out and
the often distributed nature of the actors and organizations that participate in the process.
Following the research agenda outlined in Paper 1 [37], I explore in this chapter how data
documentation can become sensitive to power and enable the participation of data workers in
shaping production processes. From that perspective, this chapter expands previous research
in the field of dataset documentation [16, 17, 18, 19, 20] by proposing a shift of perspective,
from documenting datasets toward documenting data production processes. Such a shift of
perspective involves a shift in the motivation guiding documentation efforts: While previous
data documentation initiatives in ML were rooted in the value of transparency and motivated
by the need to inform consumers and the general public about dataset characteristics and
composition, I argue that documentation should additionally promote reflexivity [1, 141, 140]
in terms of how tasks are laid out and how data is produced.
The ideas, discussions, and observations presented in this chapter are based on and oriented
by the papers included in the previous chapters of this dissertation. This means that the
theorizations and design considerations described in this chapter are based on the challenges and
needs of data workers. Documentation here is oriented toward addressing power differentials
in data work and “truths” encoded in data by making them explicit and contestable.
Paper 4, Documenting Computer Vision Datasets: An Invitation to Reflexive Data
Practices [141], presents a theorization of reflexivity, understood as a collective consideration
of social and intellectual factors that lead to praxis [73]. The paper argues that reflexivity
is a necessary precondition for documentation and that reflexive documentation can help to
expose the contexts, relations, routines, and power structures that shape data. The findings
are based on the interviews conducted in S1 and S2 (although in this paper we refer to these
organizations as “Emérita” and “Action Data,” respectively) and with managers at other
BPOs (S3) and ML practitioners (S4).
Following the line of Paper 2, the focus of Paper 4 is on the field of computer vision and
on the production of image datasets. However, the considerations presented in the paper can
be expanded to other forms of data work and other ML applications. The study was guided
by the following research questions:

1. How can the specific contexts that shape the production of image datasets be made
explicit in documentation?

2. Which factors hinder documentation in this space?

3. How can documentation be incentivized?

The findings are organized around four salient documentation-related issues emerging
from the analysis, namely the variety of actors involved and the collaboration among them,
the different purposes and forms of documentation, the perception of documentation as
burdensome, and problems around the intelligibility of documentation. Finally, the paper
introduces and discusses four elements which could motivate companies to implement reflexivity-
driven documentation, namely, preservation of knowledge, inter-organizational accountability,
auditability, and regulatory intervention.
Based on these findings, Paper 5, Documenting Data Production Processes: A Participatory
Approach for Data Work [124], presents a hands-on design inquiry into what data workers want
to see documented and how documentation could be shaped by their ideas and desiderata.
Guided by participatory design, the findings result from my long-term engagement with workers
at S1 (here we call this firm “Alamo”) and S2 (pseudonymized here as “Action Data”).
The focus of this paper is on exploring ways of making distributed processes of data
production more reflexive and participatory through documentation. Documentation, here, is
regarded as a boundary object [142, 143], i.e., an artifact that can be used differently across
organizations and teams to allow collaboration but holds enough immutable content to maintain
integrity. This paper is intentional and explicit about co-producing design considerations
for and with data workers. It explores the potential of documentation to allow workers to
intervene in production processes, for instance, through the co-creation of task instructions. To
that end, the paper addresses the following research questions:

1. How can documentation reflect the iterative and collaborative nature of data production
processes?

2. How can documentation practices contribute to mitigating the information asymmetries
present in data production and ML supply chains?

3. What information do data workers need to perform their tasks in the best conditions,
and how can it be included in the documentation?

The findings show that data workers prioritize documentation that is collaborative and
circular, i.e., documentation that is able to transport information about tasks, teams, and
payment to data workers, and communicate workers’ feedback back to the requesters. The
paper includes design considerations related to the integration of documentation practices as
an integral part of data production, questions of access and trust, and the differentiation between
the creation and use of documentation.
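As a rough illustration of this design direction (and not the prototype presented in Section 5.2), the sketch below models what a minimal record for such collaborative and circular documentation could look like in code. All class and field names are hypothetical and chosen only to mirror the elements named above: task context, instructions, team, payment, and a channel that carries workers' feedback back to requesters.

from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class WorkerFeedback:
    # Feedback traveling "upstream": from data workers to requesters.
    author: str              # worker or team pseudonym
    created: date
    concerns: str            # e.g., ambiguous labels or missing edge cases
    suggested_changes: str   # proposed revisions to instructions or taxonomy


@dataclass
class ProductionRecord:
    # Information traveling "downstream": from requesters to data workers.
    task_name: str
    requester: str
    purpose: str                  # field of application and relevance for end users
    instructions_version: str     # instructions treated as living documents
    labels: List[str]
    team: List[str]
    payment_per_item: float
    currency: str
    feedback: List[WorkerFeedback] = field(default_factory=list)

    def add_feedback(self, entry: WorkerFeedback) -> None:
        # Workers append feedback; requesters review it before revising instructions.
        self.feedback.append(entry)

Keeping the feedback channel inside the same record, rather than in a separate system, is one way of making documentation circular: the artifact that instructs the work also transports workers' responses to it.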
Finally, this chapter closes with an example of how the findings from both papers can
be implemented. Section 5.2 presents an initial prototype showing one of the many ways in
which the considerations outlined in this chapter could shape the design of a documentation
framework and interface.
I was in charge of producing the first manuscript draft for both papers included in this
chapter. Moreover, the initial idea, the study design, and the collection and analysis of data was
conducted by me. In both papers, Tianling Yang was key to helping code the data and establish
inter-subject comprehensibility [120]. For Paper 4, Diana Serbanescu, Laurens Naudts, Martin
Schuessler, and Alex Hanna provided theory-related ideas and feedback. For Paper 5, Adriana
Alvarado facilitated one of the workshop activities, provided essential methodological assistance
for the conduction of the co-design workshops, and collaborated with Sonja Wang and me
to improve the Methods section of the paper. Julian Posada facilitated one of the workshop
activities, collaborated with me to write up the findings from the workshops, and provided key
literature recommendations on data work. Marc Pohl helped with the design of the workshop
activities and their documentation. And Alex Hanna supported this research since its inception,
providing literature recommendations, feedback, and ideas for the research design as well as
insightful comments on the manuscript. Paper 4 was published in the Proceedings of the
2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). Paper 5
was presented at the 2022 ACM Conference On Computer-Supported Cooperative Work And
Social Computing (CSCW’22) and published in the November edition of the Proceedings of
the ACM on Human-Computer Interaction.

Paper 4: An Invitation to Reflexive Data Practices

Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices

Milagros Miceli (Technische Universität Berlin, m.miceli@tu-berlin.de), Tianling Yang (Technische Universität Berlin, tianling.yang@tu-berlin.de), Laurens Naudts (Centre for IT & IP Law (CiTiP), KU Leuven, laurens.naudts@kuleuven.be), Martin Schuessler (Technische Universität Berlin, schuessler@tu-berlin.de), Diana Serbanescu (Technische Universität Berlin, diana-alina.serbanescu@tu-berlin.de), Alex Hanna (Google Research, alexhanna@google.com)

ABSTRACT
In industrial computer vision, discretionary decisions surrounding the production of image training data remain widely undocumented. Recent research taking issue with such opacity has proposed standardized processes for dataset documentation. In this paper, we expand this space of inquiry through fieldwork at two data processing companies and thirty interviews with data workers and computer vision practitioners. We identify four key issues that hinder the documentation of image datasets and the effective retrieval of production contexts. Finally, we propose reflexivity, understood as a collective consideration of social and intellectual factors that lead to praxis, as a necessary precondition for documentation. Reflexive documentation can help to expose the contexts, relations, routines, and power structures that shape data.

CCS CONCEPTS
• Human-centered computing → Empirical studies in collaborative and social computing; • Social and professional topics → Quality assurance; Computing industry; • Computing methodologies → Computer vision problems.

KEYWORDS
datasheets for datasets, dataset documentation, reflexivity, data annotation, training data, transparency, accountability, audits, machine learning

ACM Reference Format:
Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex Hanna. 2021. Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In Conference on Fairness, Accountability, and Transparency (FAccT ’21), March 3–10, 2021, Virtual Event, Canada. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442188.3445880

1 INTRODUCTION
Since the rise of deep learning and convolution neural nets, the field of computer vision has demonstrated some of the most impressive results in machine learning [41]. Reaching a new high in popularity, computer vision models are used in a broad range of applications, penetrating ever more aspects of daily life. Creating datasets for computer vision is not straightforward. Work practices involved in gathering, annotating, and cleaning image data comprise subjective choices and discretionary decision-making [35, 39, 40]. Such decisions range from the framing of real-world questions as computational problems [5, 38] to the establishment of taxonomies to label images [32]. Data is also “the product of unequal social relations” [19] that are present among data workers as well as in the relationship between those whose data is collected and those who make use of data for research and/or profit. The opacity of industrial practices regarding computer vision datasets is a significant threat to ethical data work and intelligible systems [49].
Recent research has proposed implementing structured disclosure documents to accompany machine learning datasets [4, 22, 23, 27]. Despite their good intentions, those efforts fail to effectively reflect power dynamics and their effects on data [19, 32]. For instance, Gebru et al. [22] propose that datasheets include the question “does the dataset identify any subpopulations?” [22] e.g. by race, age, or gender. This way of documenting dataset composition is helpful. However, we argue that disclosing if a dataset includes racial categories does not speak to the problem of such categories’ reductiveness, nor makes the assumptions behind race classifications embedded in datasets explicit. In the same way, asking “who created this dataset?” [22] and “who was involved in the data collection process (...) and how were they compensated?” [22] remains insufficient to interrogate hierarchies in industrial settings and their effects on data [32]. Reflecting on interests, preconceptions, and power encoded in training data [16, 19, 46] is essential for addressing many of the ethical concerns surrounding computer vision products.
In this paper, we lay our focus at the intersection of manual data processing and computer vision engineering. We investigate how work practices involved in the production of computer vision datasets can be made explicit in documentation. Although data processing can cover a variety of activities, we refer to companies where human workers collect, segment, and label image training data. Data processing companies of this kind provide data services at the request of computer vision companies (hereinafter “requesters”) that wish to outsource parts of dataset production. Work between service providers and requesters requires strong coordination efforts as it comprises many actors and iterations [32].
Collaboration is informed by negotiation over the meanings that are ascribed to images [16]. In this context, not all actors hold equal power to shape datasets: Data processing companies generally collect and interpret data according to categories instructed by requesters, and workers often trust the judgment of their managers in case of doubt or disagreement [32]. These dynamics have a crucial effect on the datasets that train commercial computer vision products. Making them explicit in documentation can help better understand models’ behavior and uncover broader ethical issues.
We base our investigation on fieldwork at two data processing companies, and several interviews with data collectors, annotators, managers, and computer vision practitioners. We identify key aspects of the effective documentation of responsibilities, decision-making, and power asymmetries that decisively shape image datasets. Our investigation is framed by the following research questions: (RQ1) How can the specific contexts that inform the production of image datasets be made explicit in documentation? (RQ2) Which factors hinder documentation in this space? (RQ3) How can documentation be incentivized?
Given the complex interweaving of actors, iteration, and responsibilities involved, documenting the context of data transformations is crucial, yet hard to achieve. We propose reflexivity, understood as the consideration of social and intellectual factors that predetermine and shape praxis [7], as a crucial component for retrieving and documenting power dynamics in data creation. We borrow Bourdieu’s “Invitation to Reflexive Sociology” [8] and translate it into an invitation to reflexive data practices. Our invitation regards reflexivity not as personal introspection but as a collective and collaborative endeavor [8].
We start by reviewing work that investigates the documentation of machine learning datasets and models. Then, we explore different conceptualizations of reflexivity. After offering an overview of research methods, informants, and fieldwork sites, we present our findings. These are organized around four salient documentation-related issues emerging from our analysis, namely the variety of actors involved and the collaboration among them, the different purposes and forms of documentation, the perception of documentation as burden, and problems around the intelligibility of documentation. Next, we discuss the implications of our findings and propose the implementation of reflexivity in disclosure documents. Finally, we introduce and discuss four motivations which could lead companies to implement reflexivity-driven documentation, namely, preservation of knowledge, inter-organizational accountability, auditability, and regulatory intervention.

2 RELATED WORK

2.1 Documentation of Datasets and Models
Previous work has pointed at the need for opening black-box algorithms by explicating their outcomes [37, 44] and documenting their modeling [26, 34]. A growing body of literature has investigated and developed structured disclosure documents or checklists for artificial intelligence models and services, which document their intended uses, testing methodologies and outcomes, actors involved, possible bias, and ethical problems [3, 15, 26, 34]. While these disclosure documents primarily focus on AI models and services, information relevant to training datasets is also required to be reported.
Recent research [4, 22, 23, 27] has called for applying similar structured procedures for documenting datasets specifically. This line of research advocates for and applies the systematic documentation of datasets’ purpose, composition, collection process, preprocessing, uses, distribution [22, 23, 47], and maintenance [11, 22, 27, 47]. Several studies also draw special attention to the documentation of actors involved, including their characteristics and roles [4, 22, 23], the use of software and other tools [4, 22, 47], availability of training and additional resources for documentation [4, 23], and fair pay for workers [22, 23, 47]. Furthermore, ethical concerns have been raised in documentation regarding privacy [22, 27, 47] and potential harms of datasets [22, 47] (see Table 1).
Most prominently, Gebru et al. [22] argue that documentation can improve transparency, accountability and reproducibility, and facilitate the communication between “dataset consumers and producers”. They propose that every dataset be accompanied by a checklist which should be flexible enough to accommodate specific domains and “existing organizational infrastructure and workflows” [22]. Holland et al. [27] argue that documentation of datasets can enable consumers to select appropriate datasets better and, at the same time, improve data collection practices among dataset creators, as they would need to explain and justify their practices. They propose a dataset nutrition label that is composed of modules to be filled in through a combination of manual work and automated procedures. Geiger et al. [23] focus primarily on documentation of datasets in academic settings. They maintain that documentation not only contributes to increasing reproducibility and open science, but is also a matter of “research validity and integrity” [23].
Whereas current proposals and practices of documentation often prioritize reproducibility, power imbalances in contexts of data creation are not often accounted for. In their investigation of data annotation services, Miceli et al. [32] present evidence of how power asymmetries shape computer vision datasets. In particular, the authors show how the judgements of managers and, even more, of requesters remain unquestioned when it comes to interpreting and labeling data. In view of these dynamics, D’Ignazio and Klein [19] underline the importance of restoring the context where datasets are produced, be it “social, cultural, historical, institutional, (...) [or] material,” and the identities of dataset creators. They explain that “one feminist strategy for considering context is to consider the cooking process that produces ‘raw’ “data” [19] and propose asking “who questions” to drive reflection and analysis on power and privilege. In line with this research, we highlight the importance of looking into processes of data creation and foster disclosure documents that go beyond datasets’ technical features. We argue that the dimensions proposed or applied in structured dataset documentation formats (see Table 1) are necessary but insufficient to drive a much-needed reflection of industry practitioners’ and researchers’ position and influence on data. For such a reflection to be possible, datasets must be placed in the context of their production. This perspective would not only provide a better understanding of datasets’ “functional limitations” but can also make power asymmetries in data settings [19] visible.


Table 1: Summary of descriptive dimensions in documentation frameworks proposed or applied in previous research. It should
be noted that the dimensions are often interconnected and not mutually exclusive.

Documentation forms compared: Gebru et al. [22] (Datasheets); Geiger et al. [23] (manual and technology-assisted documentation); Bender and Friedman [4] (Data statements); Holland et al. [27] (Dataset Nutrition Label); Seck et al. [47] (Datasheets); Choi et al. [11] (Datasheets).

Descriptive dimensions and the number of the six documentation forms covering each:
Description of dataset’s motivation: private or public? single use or open dataset? (4 of 6)
Description of actors involved: e.g. funding providers, data workers, data subjects and so on (5 of 6)
Description of dataset’s composition (5 of 6)
Description of dataset’s collection process (5 of 6)
Account of data (pre-)processing steps (e.g., cleaning, labeling) (5 of 6)
Description of dataset’s intended and recommended uses (4 of 6)
Description of datasets’ distribution (5 of 6)
Description of datasets’ maintenance (4 of 6)
Description of software and other tools used in data work (3 of 6)
Reflection on potential impacts and ethical issues relevant to datasets (4 of 6)
Description of training for data workers (2 of 6)
Formal definitions and instructions for annotation (2 of 6)
Payment for workers (3 of 6)
Team composition and diversity (4 of 6)
Account for production settings and hierarchies (1 of 6)
Procedures for solving discrepancies in data production (1 of 6)
Rationale for data collection framing and labeling taxonomies (1 of 6)

2.2 The Notion of Reflexivity
According to D’Ignazio & Klein [19], reflexivity is a precondition for restoring context in data creation. The authors define reflexivity as “the ability to reflect on and take responsibility for one’s own position within the multiple, intersecting dimensions of the matrix of domination” [19]. The matrix of domination is a concept first termed by Patricia Hill Collins [13] to explain how systems of power are configured and experienced. Black feminist scholars and critical race theorists have given considerable attention to the importance of one’s positionality with regard to race, gender, and class in scientific practice. The work of Dorothy Smith [48], Patricia Hill Collins [13], and Sandra Harding [25] in standpoint theory is an important strand in this space. Researchers in critical race theory further interrogate ideological positioning of privileged and dominant groups [2, 6, 18]. More broadly, scholars on positionality frame actors’ positions in socio-political contexts and scrutinize researchers’ personal identities and stances concerning the contexts of knowledge and study [9, 12, 31]. These positions shape researchers’ view of the world and thereby the whole research process, i.e., how they perceive, construct and approach a research problem, how they report research findings, and the process of knowledge construction and production [9, 12].
Previous investigations in sociotechnical systems have introduced reflexivity by drawing experiences and methodologies from other disciplines to examine presumptions and taken-for-granted practices in machine learning and data science. Viewing machine learning via computational ethnography, Elish and boyd [20] underline the situated nature of knowledge work and argue in favor of methodological reflections and reflexive practices. Drawing on critical race methodologies and operationalization of race in other disciplines, Hanna and Denton et al. [24] argue that the widespread conception and operationalization of race in algorithmic systems as a fixed attribute is decontextualized and, therefore, problematic. Previous work has furthermore argued that machine learning systems have positionality. Among other factors, “they inherit positionality from data” [1]. Preconceptions and values get embedded in data, for instance, through collection and analysis methods and through the taxonomies used in data annotation. The sensemaking and classification of data through labels as performed by annotators [32] is “a judgement and as such informed by the knowledge, experiences, perspectives, and value commitments of annotators or labelers” [1].
As we will explain in Discussion, Pierre Bourdieu’s conceptualization of reflexivity, understood as a relational construct and
an integral part of inquiry praxis, is at the core of the documentation framework we present in this paper. Bourdieu's writings on reflexivity offer a systematic investigation into social and intellectual factors that predetermine and shape researchers' practices in scientific work [7, 8, 21]. The Bourdieusian notion of reflexivity goes beyond personal experiences and regards researchers' position at the collective level, that is, in relation to other actors and the field of inquiry as a whole. Moreover, Bourdieu's reflexivity does not aim to undermine objectivity. Instead, it is presented as an analytical tool to sensitize researchers to "the social and intellectual unconscious" that condition their thoughts and practices in research, and is, therefore, an integral part of and a "necessary prerequisite" for scientific inquiry [8].

The French sociologist pinpoints three types of bias that may influence scientific research, which may be mitigated by introducing reflexivity. The first bias results from researchers' positions in the social structure, such as class, gender, and ethnicity. The second bias comes from researchers' position in academic disciplines, i.e., academic traditions, prevailing currents, and socio-organizational structures in specific disciplines that determine specific field epistemologies. The third bias, termed by Bourdieu as the intellectualist bias, is embedded in the scholarly gaze that places researchers outside or above the object of research and considers their engagement with problems as purely scientific and unconstrained from social positions and economic interests. In opposition to this idea, Bourdieu argues that researchers are participants rather than external observers and restores research practices as knowledge-producing activities rather than pure and disinterested investigations. In the Discussion section, we will come back to this notion of reflexivity. The three Bourdieusian levels of bias will be the basis to discuss why reflexivity is fundamental for documenting data practices. Reflexivity to make individual and collective positions explicit and acknowledge their effects on data is not only crucial for conducting better science, as Bourdieu [8] argues. It could also help researchers and practitioners uncover broader ethical issues in computer vision systems.

3 METHOD

3.1 Data Collection

This investigation was organized around two phases, involving different (yet related) research foci and methods. Documentation practices are a critical aspect we investigated at both stages:

In the first phase, we focused on work practices in data processing companies, where human workers collect, segment, and label image training data. We conducted ethnographic fieldwork at two data processing companies of the "impact sourcing" sector located in Buenos Aires, Argentina, and Sofia, Bulgaria. Impact sourcing refers to a special type of business process outsourcing company that intentionally employs workers from marginalized communities. As described on their websites and confirmed by our observations, the Argentine company employs young people living in slums, while the Bulgarian organization works with refugees from the Middle East.

The Buenos Aires-located company that we will call "Emérita" is a medium-sized organization. With branches in three Latin American countries, Emérita conducts projects in data annotation, content moderation, and software testing. Its clients are large regional corporations in diverse fields such as security, e-commerce, and energy. At the time of the observations, between May and June 2019, the Buenos Aires branch of Emérita had around 200 data-related employees who mostly worked 4-hour shifts, Mondays to Fridays, and were paid the minimum wage.

"Action Data" is the code-name of the Bulgarian company. Action Data specializes in image data collection, segmentation, and labeling. Its clients are computer vision companies, mostly located in North America and Western Europe. The company offers its workers contractor-based work and the possibility to complete their assignments remotely, with flexible hours. Contractors are paid per picture or annotation, and payment varies according to each project and its difficulty. At the time of the observations, in July 2019, the Bulgarian company was very small in size. Three employees in salaried positions and a pool of around 60 contractors handled operations.

At both sites, we conducted several weeks of observations, with different levels of interaction and involvement. All tasks observed were related to the production of datasets for computer vision and requested by computer vision companies. Moreover, we observed the on-boarding, briefing, and further training of workers as well as instances of communication between managers and teams, and managers and requesters. It is important to mention that the observations were primarily conducted with a different research question in mind and focused on general work practices and not specifically on documentation. However, the exploratory character of the method and the rich interactions observed allowed us to extract useful insights for this investigation that were later corroborated by our interview partners.

In addition to the observations, fieldwork at both sites also consisted of intensively interviewing data collectors, annotators, and management. In total, we conducted sixteen in-depth interviews with an average length of 65 minutes, face-to-face, at both locations. Informants were aged 21 to 40. Eleven of them identified as female and four as male. None of them had received an education in tech-related fields or had technical knowledge prior to their current employment. At Emérita in Argentina, we conducted five in-depth interviews with data workers and employees in managerial positions. At Action Data, we conducted eleven in-depth interviews with workers and managers. Interview partners were asked to choose code names to preserve their identity and that of related informants. The interviews included accounts of specific work situations involving the interpretation of data, the communication with managers and clients, and the documentation of responsibilities and decisions. Moreover, the interviews covered task descriptions, general views on the company and the work, informants' professional and educational background, expectations for the future, and biographical details.

The second phase of this investigation dealt with the role of stakeholders at the opposite end of the service relationship, namely, the computer vision companies requesting data processing services. During fieldwork, we observed that requesters have a major influence on the documentation practice of data processing companies and decided to pursue this line of inquiry. Through expert interviews with computer vision engineers, data quality analysts, and managers, we investigated how task instructions are formulated and
communicated to data processing workers, and how this process is documented. The interviews revolved around the object, purpose, and responsibilities of documentation. Moreover, we discussed issues and possible solutions for implementing broader forms of documentation in industrial contexts at the intersection of data processing and computer vision.

We conducted a total of fourteen expert interviews. Four informants were managers with large data processing companies located in Kenya, India, and Iraq. In addition, six expert interviews were conducted with computer vision practitioners working on products including an aesthetics model that sorts and rates personal image libraries, a scanner that detects contamination on hands, and optical sorting equipment for the classification of waste. The computer vision practitioners work for companies located in Germany, Spain, and the United States. Finally, four of the interviews conducted at Emérita and Action Data revolved almost exclusively around the role of requesters in documentation and were framed as expert interviews.

While the goal of in-depth interviews is revealing practices and perceptions, the purpose of expert interviews is to obtain additional professional assessments on the research topic [29]. The sampled interview partners were considered experts because they were able to provide unique insights into widespread routines and practices in their and other companies. With an average length of 48 minutes and conducted face-to-face or remotely, the expert interviews allowed us to contextualize some of the practices observed during fieldwork and analyze to what extent observations could be generalized to other settings.

3.2 Data Analysis

For the analysis, we integrated field notes with a total of thirty interview transcriptions and used constructivist grounded theory principles [10] to code and interpret the data. We conducted phases of open, axial, and selective coding and let the categories emerge from the data. We applied a set of premises [14] to make links between categories visible and make them explicit in our research documentation and in open discussions among three coders. We constantly compared the collected data to revise our emergent understanding or find additional evidence of observed phenomena. Four salient axial dimensions identified during the analysis process constitute the basis for the findings we present in the following section.

4 FINDINGS

As stated in Introduction, this paper explores three research questions: (RQ1) How can the specific contexts that inform the production of image datasets be made explicit in documentation? (RQ2) Which factors hinder documentation in this space? (RQ3) How can documentation be incentivized? Our findings unpack documentation practices at the intersection of data collection, data annotation, and computer vision engineering. Through descriptions and interview excerpts, we describe salient dimensions emerging from our data: actors and collaboration, documentation purpose, documentation as burden, and intelligibility of documentation. These four dimensions reveal scenarios that should be taken into account for creating effective documentation procedures that are based on workers' needs and possibilities.

4.1 Actors and Collaboration

Our first research question inquires about ways of making the specific production contexts of image datasets explicit in documentation. In this section, we take a first step towards unpacking RQ1 by describing the characteristics of such production contexts.

The creation of computer vision datasets requires the collaboration of actors that often work in different organizations. At the intersection of data collection, data annotation, and computer vision engineering, not every actor has the same influence on data [32]. Power differentials become evident when deciding which data to collect, how to classify it, and how to label it. Many datasets are produced with a specific computer vision product in mind. Dataset design begins as the expected outcome of that product (in terms of computational output but also of revenue) is transformed into task instructions for data collectors and annotators. A typical assignment is illustrated by a data collection project of Action Data: the company received task instructions to collect images of diverse human faces from a Western European company producing identification and verification systems. Eva, the founder of Action Data, offered more details:

"They were interested in a diversity of five different ethnicities, so Caucasian, African, Middle Eastern, Latin American and Asian. Of course, very debatable whether these can be the five categories that can classify people around the world"

This type of assignment generally revolves around a client's envisioned computer vision product and underlying business idea. The technical assumptions of a classification system demand mutually exclusive categories, in this case even for a problematic concept such as race. Whether such categorisation captures the realities of data subjects or coincides with the values and beliefs of data workers is not negotiated. Written instructions formulated by the requester are passed along to project managers who brief workers. Workers then start collecting the images. For outsourcing companies, the rationale behind data-related decisions is "doing what the client ordered" and "offering value to the client." Conversely, the rationale shaping datasets in computer vision companies is "data needs to fit the model" and "data processing should be fast, cost-efficient, and high-quality."

Power differentials between service providers and requesters become even more evident given that the data processing companies participating in this investigation are located in developing countries, while their clients are in the Global North. In view of such asymmetries, decisions about what to document and the financial means to do so largely depend on the most powerful actors. Anna, an intern working at Action Data and in charge of auditing the company and conducting an impact assessment, concisely described these dynamics:

Q: "What do you think are the potential drivers or reasons for the implementation of the more transparent approach to documenting systems and processes?"
A: "If the customer demands it."
Q: "Is this something you have heard before, customers demanding a more ..."
A: "No."

Moreover, computer vision companies often regard some of the information that could or should be documented as confidential, especially if it involves details about the intended product or if some of the processes involved in producing the dataset are considered a strategic advantage. Given the collaborative nature of data creation, one stakeholder's opacity may affect others' inclination towards transparency. As Action Data's founder Eva (and several others of our informants) described, secrecy in computer vision hinders her company's attempts to document work processes:

"It's also a small challenge of how to preserve some of the know-how throughout the different projects without of course revealing too much about the different processes that each client has, you know, the confidential information from each project."

In many cases, this issue leads to reluctance to share existing documentation with other stakeholders and the general public or to not document at all.

4.2 Documentation Purpose

The reasons for documenting the production of datasets and the forms of documentation vary with each organization. To start considering ways of incentivizing documentation (RQ3), we first must look into common needs and goals that different stakeholders may have in relation to disclosure documents. In this sense, we have identified four common documentation purposes: preservation of knowledge, improvement of work practices, accountability, and disclosure of datasets' specifications.

All data processing companies participating in this investigation carry out some form of project documentation. In a more or less structured way, companies document task instructions provided by clients. Instructions may change as projects develop, or workers might develop new practices according to clients' feedback. Soo is a project manager at the Kenyan branch of a large data processing company. During our interview, he explained how this form of documentation can help improve existing processes and practices:

"We have a 'lessons learned' folder where we put all these items. Like the client has said, 'You did not do well here.' We'll find in our process, there was this flaw. We will document that. And then what happens after we document is that information is stored to be used for that project and some future projects with the same kind of process work."

The preservation of this form of praxis-based knowledge is crucial because it helps organizations resolve doubts that might emerge, train future workers, and apply situated solutions to future projects. Similarly, documentation can also serve to revise and improve work practices and flows, as further described by Soo:

"How can we improve this process? This did not go well. What was the issue? How did we solve it? How can we avoid this in future? And you will get information for a project that was done five years ago [...] The documentation helps us in making sure that we avoid repeating the same mistakes. And also, it helps us in looking for better ways of doing the work, how to measure where it is possible and also what other process we can improve, like in the process flow"

Given the differentials of power described in the previous section, documentation is often perceived as useful for accountability between outsourcers and requesters. Several informants working at data processing companies highlight the importance of preserving task instructions and documenting changes instructed by clients. Keeping this type of record might serve as proof that tasks were carried out as instructed. In the next interview excerpt, the founder of Action Data describes how documentation might help resolve discrepancies if clients are not satisfied with the quality of the service provided or decide to demand more:

"We also keep the client accountable so that they don't come up with a new requirement or something that we haven't mentioned before. So, SoWs [scope of work documents] are also for accountability of us towards the client as well so that the client can have a document where they can keep track of what the arrangement is and so on beyond our contract"

However, accountability within teams can become surveillance for workers: several informants account for the connection between project documentation and the measurement of workers' performance in data processing companies. The Argentine company, Emérita, directs great efforts to measure workers' performance and output quality and to transform those into numbers and charts. Nati, Emérita's continuous improvement analyst, described this process:

"Within the project documentation, we have an external person who checks if the work the team did is right or wrong, then documents the percentage of right and wrong. [...] If something is wrong, we fix it before the client notices. But still, even when it is fixed, we record that there was something that was wrong and record who was responsible for the mistake."

Finally, in the case of datasets for public use or without a pre-established purpose, organizations might find it important to document and disclose datasets' specifications. This particular case was reported by our informants at Action Data, as the company had recently released two datasets for public use. During an interview, Eva contemplated the possibility of releasing a disclosure document along with the datasets:

"It might be nice to implement some type of documentation at least for them [datasets for open use] because they're for external use and it might be good to know what the origin of the images are, what the process of annotation had been and so on."

It is worth mentioning that releasing datasets for public use is usually not within the scope of outsourcing companies. Investing resources to produce a pro-bono dataset represents a considerable effort for these companies. In the case of Action Data, the dataset was made publicly available as part of the company's marketing strategy.
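The documentation purposes described above lend themselves to a structured representation. The following minimal sketch (in Python, with hypothetical field names that are our own illustration rather than a format used by any of the companies studied) shows how task instructions, client-requested changes, quality checks, and disclosure notes could be captured in a single project record:

    # Minimal sketch of a structured project-documentation record.
    # Field names are hypothetical and only illustrate the four purposes
    # discussed above: preservation of knowledge, improvement of work
    # practices, accountability, and disclosure of datasets' specifications.
    from dataclasses import dataclass, field
    from datetime import date
    from typing import List, Optional

    @dataclass
    class InstructionChange:
        changed_on: date
        requested_by: str      # e.g., "client" or "project manager"
        description: str       # what changed and why, in the team's own words

    @dataclass
    class QualityCheck:
        checked_on: date
        reviewer: str
        issues_found: str
        resolution: str        # how the team addressed the issue

    @dataclass
    class ProjectDocumentation:
        project_id: str
        requester: str                     # the client commissioning the data
        task_instructions: str             # instructions as initially received
        instruction_changes: List[InstructionChange] = field(default_factory=list)
        lessons_learned: List[str] = field(default_factory=list)
        quality_checks: List[QualityCheck] = field(default_factory=list)
        dataset_disclosure: Optional[str] = None   # filled in for publicly released datasets

A record of this kind could be appended to as a project develops, keeping the praxis-based "lessons learned" that Soo describes next to the scope-of-work changes that, as Eva notes, also keep clients accountable.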

4.3 Documentation as Burden

Relevant to start unpacking factors that hinder documentation (RQ2) is the fact that several informants see documentation as time-consuming, extra work that is likely to delay the completion of workers' "actual" tasks. This is a widespread view among the computer vision practitioners interviewed for this investigation and coincides with the observation that, among the different roles explored in this study, computer vision companies seem to be the least inclined to document work practices.

"Lack of time" is the most widespread answer when informants are asked why there are not more aspects of data creation reflected in reports. Documentation is broadly perceived as optional, a nice-to-have feature that is implemented only once all "important" issues are sorted. Andre, a US-based computer vision engineer with a start-up dedicated to producing scanners that detect contamination on hands, described his company's position on this issue:

"[Documenting] is lower on our priority list than a bunch of other things that we need to do. It's just not the company's priority at this moment. There are other more valuable things to keep the company successful. As the engineering team grows, as we have more time to do those things and our work to meet the company's exact needs are less burdensome, then we'd go to more documentation."

Among our informants in computer vision companies, the view persists that documentation is an activity only large corporations can afford. As further reported by Andre, start-up teams are smaller, and workers are multitasking, which reinforces the view that there are more pressing issues than documentation:

"That's one of the interesting things about start-ups. You don't have the time to document everything. [...] There is a lot of knowledge in every single person here that would take far too long to pull out of them and transfer to a new person and keep the company still running at the same time."

A similar observation was made by Eva, the founder of the Bulgarian data processing company, regarding her company's clients:

"We've been working with quite a lot of new companies recently. Some of them are bigger corporations that have more let's say bureaucratic procedures and more detailed processes of description of everything that's happening around the project, while others are just start-ups that prefer very lightweight, minimum involvement and paperwork around their projects."

Lack of incentives, external or internal, is another reason why documentation might be perceived as a burden. For instance, some informants agreed that laws and regulations would be an excellent external incentive for technology companies to integrate documentation as a constitutive part of their work. In the absence of regulations, documentation is seen as optional extra work. As for internal incentives within organizations, several computer vision practitioners explained that documenting was not a part of their work routines and was therefore not encouraged by the company's structures. Emmanuel, a computer vision engineer based in Barcelona and working on optical sorting equipment for waste classification, discussed the need for integrating documentation in existing workflows. He moreover imagines that extending projects' deadlines to prioritize documentation would not be seen as acceptable within his company's culture:

"Time is a huge issue. I mean, I think planning is very important, get the time to do it [documenting] and that everybody knows this is supposed to be done. Because right now, documenting is not a task and I don't know that I would have a gap between projects so I could document. And this is never a priority for the company, they expect me to meet my deadlines, I can't just drop my deadlines to document. And this is a problem. If documenting was part of the deadline, companies wouldn't just leave it for another time"

Even in companies that integrate laborious documentation in their work processes, as is the case of Emérita, there are instances where documenting is just not profitable. Nati, one of our informants with the Argentine data processing company, describes one of those situations:

"It happens sometimes that we do one-time projects that go only for one or two weeks. In those cases, documentation is a waste of time and money, because the client buys, let's say, eighty hours and you spend twenty documenting. It's just not profitable."

As expected, financial incentives, or the lack thereof, can also influence views on documentation.

4.4 Intelligibility of Documentation

To further investigate factors that hinder documentation (RQ2), it is necessary to explore issues around creating compelling, retrievable, and intelligible disclosure documents. To illustrate some relevant aspects related to structuring and providing access to documentation, we draw on the observations made during fieldwork at both data processing companies, Emérita and Action Data. Both companies have vast experience in the documentation of data collection and annotation projects.

In the case of the Argentine company, Emérita, due to the extent of documentation and the large number of projects conducted, navigating and maintaining disclosure documents has become difficult. Nati, a continuous improvement analyst, is in charge of addressing this issue:

"What happened a lot was that information was repeated in many places. The objectives were written in three different documents. The people who were in the project were in two different systems [...] So, having that repeated was horrible, because every time people in the team changed, well, you needed to update many things and credentials"

Nati works on optimizing some of her company's internal processes, including documentation. For that purpose, she has surveyed project documents, observed how the company teams work, and discussed with them how documentation can be improved. Her main focus lies in producing documentation that can be easily retrieved and used, which can be very challenging:
"For example, in the case of project guides, it was not clear what documentation had to be done, so everyone did what they wanted, or what they remembered, or what they knew, because someone told them, and when information was needed, they didn't know if it had been documented or not, or they didn't know where to find it. We lost a lot of information like this."

Further issues related to the intelligibility of documentation may arise depending on who is in charge of documenting and who are the users of documentation. In the case of Action Data, the Bulgarian company working with refugees from the Middle East, language and lack of technical knowledge are among those issues:

"Since we're working with people who very frequently do not have high levels of education or do not speak good English, I've heard a lot of complaints that people are not reading the training documents or they're not following them or they're asking questions that appear or are already answered in the training documents. So, it can be quite frustrating because people may not be used to following such documentation and they might need additional training just to know how to use this recommendation, how to read it and how to follow it"

Creating useful reports that can be easily retrieved and understood is challenging. How disclosure documents are created, indexed, and stored depends to a greater extent on the intended addressees of documentation. As illustrated by the previous interview excerpt, language is important if stakeholders with different levels of literacy will make use of documentation.

5 DISCUSSION

As described in Findings, work at the intersection of data collection, annotation, and computer vision engineering requires strong coordination efforts among actors that occupy different (social) positions. Documentation purpose, organizational priorities, and needs around documentation intelligibility vary across stakeholders. In such heterogeneous contexts, some actors hold more power than others, and decisions made at the most powerful end will inevitably affect work practices and outputs at every level. These power differentials and their effects are broadly naturalized [17, 19, 32]. Despite their decisive effects on data, decisions and instructions that are rooted in such naturalized power imbalances are mostly perceived as self-evident and remain undocumented as a consequence.

Previous research has emphasized the importance of documenting machine learning datasets [22, 23, 27, 30, 49]. While we acknowledge that this body of work creates the foundations for our investigation, we also argue that the frameworks proposed are not sufficient to interrogate power differentials and naturalized preconceptions encoded in data. With our investigation, we move the focus away from documenting datasets' technical features and highlight the importance of accounting for production contexts. Our research questions address the challenge of documenting production processes that are characterized by the multiplicity of actors, needs, and decision-making power. In this and the following sections, we lay out implications of our observations and outline a documentation framework to address the contexts and issues described in Findings.

Given the collaborative nature of dataset production, we argue that documentation should not be carried out in the vacuum of each organization. The framework we propose regards dataset documentation as a collaborative project involving all actors participating in the production chain. This is certainly not easy. To address such a challenge, we propose that reflexivity, understood as a collective endeavor [7], be an integral part of such collaborative documentation. As argued by Bourdieu [8], this form of collective reflexivity accounts for actors' social position and aims to interrogate praxis fields and the relations that constitute them. In a similar manner, reflexive documentation should help to make visible the interpersonal and inter-organizational relations that shape datasets. As described in the Related Work section, Bourdieu's notion of reflexivity covers three levels of hidden presuppositions: the researcher's social position, the epistemology of each disciplinary field, and "the intellectualist bias", described as the scholarly gaze researchers use to analyze the social world as if they were not part of it [7, 8]. We take this perspective and transform Bourdieu's "Invitation to Reflexive Sociology" [8] into an invitation to reflexive data practices. What constitutes our invitation entails much more than observing how one actor's positionality affects data: If documentation is to be seen as a collaborative project, reflexivity of work practices should be understood as a collective endeavor, where widespread assumptions, field methodologies, and power relations are interrogated.

With this framework, we regard documentation in a two-fold manner: First, as an artifact (the resulting documentation) that enables permanent exchange among stakeholders participating in data creation. We envision disclosure documents that travel among actors and organizations, across cultural, social, and professional boundaries, and are able to ease communication and promote inter-organizational accountability. Second, we regard documentation as a set of reflexive practices (the act of documenting) intended to make naturalized preconceptions and routines explicit. Just as Bourdieu regards reflexivity as a "necessary prerequisite" for scientific inquiry [8], the reflexive practices involved in our documentation framework should be seen as a constitutive part of data work. If reflexivity is only regarded as a desirable goal related to AI ethics and not as an actual part of the job, documentation will never be considered a priority and, as described in Findings, it will continue to be perceived as a burden.

5.1 Why Reflexivity?

Our research questions enquire about ways of making the contexts that inform the production of image datasets explicit in documentation and about factors that hinder or incentivize the implementation of documentation in industry settings. In view of our findings, we argue that effective documentation should be able to reflect the dynamics of power and negotiation shaping datasets through work practices. However, making visible the hierarchies, worldviews, and interests driving decisions and instructions is extremely challenging. One major difficulty lies in their taken-for-grantedness: documenting naturalized power dynamics and decisions that are
largely perceived as self-evident [33] require intensive reflexive practice.

The three previously-mentioned levels of reflexivity proposed by Bourdieu (social position, field epistemology, and intellectualist gaze) can be useful to discuss why reflexivity should be at the core of documentation practices in data creation for computer vision. They provide an additional lens through which data practices can be approached, and as such, serve as a complement to on-going work and discussions regarding the documentation of datasets:

First, reflexive documentation should consider the social position of workers involved in dataset production, not just individually but in their relation to other stakeholders. Such consideration could help produce documentation that brings power imbalances into light and questions taken-for-granted instructions and hierarchies. This relational examination is especially important due to the widespread use of outsourced services for the collection and annotation of data: Workers at crowdsourcing platforms are subject to precarious employment conditions [28, 45]. In the impact sourcing companies presented in this paper, workers come from marginalized communities (refugees in Action Data, slum residents in Emérita). Most of them have no technical education. How does their social position affect these workers' ability and power to question the instructions commanded by computer vision engineers or data scientists in tech companies? This question becomes even more pressing if we examine the relationship that connects data processing services in developing countries with computer vision companies in the Global North. Documentation frameworks that are oblivious to the fact that production chains are shaped by asymmetrical relationships will never be effective in reflecting how those asymmetries affect data. In this sense, reflexive documentation should bring power differentials to light and, ideally, empower those in vulnerable positions to speak up and raise questions.

Second, reflexive documentation should serve to question field epistemologies. Examining the epistemology of computer vision might shed light on the assumptions, methods, and framings underlying the production of image datasets. As Crawford and Paglen [16] argue, computer vision is "built on a foundation of unsubstantiated and unstable epistemological and metaphysical assumptions about the nature of images, labels, categorization, and representation." Bringing these assumptions forward in documentation is important because socially-constructed categories, such as race and gender, are generally presented as indisputable in image datasets [46]. Furthermore, a fixed and universal nature is not only ascribed to the categories as such, but also to the correspondence that supposedly exists between images and categories, appearances and essences [16]. Reflexivity should help reveal the political work such assumptions perform behind their purely technical appearance.

Finally, reflexive documentation should help practitioners question the "intellectualist gaze" [7] in data work. This type of bias is the inclination to place ourselves outside the object of research. This form of examination would highlight the role of workers and organizations in creating data while questioning widespread notions such as "raw data" and "ground truth labels". Reflexivity should therefore help to adopt a relational view on data and data work, acknowledging data as a "human-influenced entity" [35] that is shaped by individual discretion, (inter-)organizational routines, and power dynamics.

5.2 Why Document?

Data processing services and computer vision companies might be reluctant to implement such an elaborate approach to documentation. Our third research question asks how documentation can be incentivized. In this section, we consider four ways in which the Bourdieusian framework previously outlined can constitute an asset for organizations, and thus serve as an incentive for the uptake of reflexive documentation.

5.2.1 Preservation of Knowledge. Reflexive documentation could make praxis-based and situated decision-making explicit and help preserve it in documentation. This knowledge can become a long-term business asset for companies. Moreover, reflexive documentation can preserve know-how relevant to data work [39] that may get lost due to worker turnover. As the flow of workers brings about problems in task transfer and requires reinvestment in training new employees, documentation that preserves knowledge and methods for effective data work, be they project-specific or not, can ease the transition.

Furthermore, documentation can "have analytical value [and] improve communication in interdisciplinary teams" [32]. The framework offered in this paper highlights the collective nature of reflexivity. We argue that documentation that preserves praxis-based knowledge and best practices (as described in section 4.2) should be circulated among collaborating companies rather than be produced and retrieved in the vacuum of each organization. For one thing, sharing such documentation with other stakeholders may improve the quality of data work and of the datasets that are produced as a result. For another, documentation providing more details on discretionary decision-making and its contexts can enhance transparency and facilitate a better understanding of datasets before model development.

5.2.2 Inter-organizational accountability. Tracking decisions and responsibilities in environments and processes that involve multiple organizations can be challenging. As described in Findings, data processing companies use documentation to foster inter-organizational accountability and protect themselves in the face of disagreements with clients. At the same time, computer vision companies might consider documentation as a tool to keep track of the processing status of projects and audit requested tasks. Reflexive documentation could be especially useful to improve traceability, as the participation of many actors and iterations in data creation may lead to accountability dilution [32]. Moreover, documentation could provide "organizational infrastructure" that empowers individual advocates among workers to raise concerns and reduces the social costs for such actions [30]. An infrastructure based on the reflexivity framework outlined in this paper could facilitate the interrogation of intra- and inter-organizational relations, normative assumptions, and workflows shaping data at the three levels described in the previous section.

Conducting documentation at a collaborative level, which means engaging various actors and accommodating documentation to their needs, can serve as a platform for permanent exchange among stakeholders. Enabling permanent exchange could help anticipate disagreements and misunderstandings, thus improving task quality and reducing completion time.
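One way to picture the traceability discussed in this subsection is an append-only decision log shared between requesters and data processing companies. The sketch below (Python; all names are hypothetical and intended only as an illustration, not as a prescribed implementation) records who instructed which change and why, so that responsibilities remain attributable across organizational boundaries:

    # Illustrative append-only decision log shared across organizations.
    # Entries are never edited or deleted, so disagreements can be traced
    # back to the decision and the actor that introduced it. All names here
    # are hypothetical.
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List

    @dataclass(frozen=True)
    class DecisionEntry:
        timestamp: datetime
        organization: str   # e.g., "requester" or "data processing company"
        author_role: str    # e.g., "computer vision engineer", "project manager", "annotator"
        decision: str       # what was decided or instructed
        rationale: str      # why, including dissenting views raised by workers

    class DecisionLog:
        """Append-only record of data-related decisions for one project."""

        def __init__(self) -> None:
            self._entries: List[DecisionEntry] = []

        def append(self, entry: DecisionEntry) -> None:
            self._entries.append(entry)

        def history(self) -> List[DecisionEntry]:
            # Return a copy so that past entries cannot be altered by callers.
            return list(self._entries)

Because each entry names the organization and role behind an instruction rather than an individual worker, such a log could support the audit trail discussed in the next subsection without turning documentation into yet another instrument of worker surveillance.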

5.2.3 Auditability. Documentation based on reflexivity could constitute an asset for organizations to prevent issues before they are made public or weather the storm in the face of PR failures. Disclosure documents that are able to retrieve the context of dataset production could constitute a useful tool for auditability, for instance, when computer vision outputs are publicly questioned or for internal ethics teams who would like to perform an assessment for potential fairness concerns prior to the release of a model trained on such data [42, 43]. Such documents could help to identify problematic issues before they lead to public pushback. Moreover, in case of public failures, documentation could provide an audit trail that would allow organizations to address problems and offer solutions promptly. In this sense, public pressure could constitute an incentive for companies towards documentation.

In such cases, having reflexive documentation at hand to audit datasets could help companies offer solutions that go beyond "throwing in more data" and are able to address issues at the three Bourdieusian levels previously described: identifying asymmetrical relationships that might have been encoded in datasets, interrogating widespread assumptions in computer vision, and questioning data, even "raw" data.

5.2.4 Regulatory Intervention. Organizations could also be pushed towards documentation through regulatory intervention. Yet, before any form of reflection, including the documentation thereof, can be imposed, a few observations can be made:

First, while documentation might be considered an important component or step of the reflexive process, it is neither constitutive of, nor sufficient for, reflection. Reflexivity represents a state of awareness, an encouragement for actors involved in data creation to more widely consider the impact of their practices. Reflexivity can already be valuable in itself. The policy end-goal is therefore to stimulate a reflexive mindset and to establish the right conditions for such a mindset to fully come to fruition. Conversely, if regulation only aims at pushing documentation, the danger exists that such regulatory requirements are approached as merely an administrative exercise towards compliance.

Second, if the encouragement of reflexivity through legal means were desired, such mechanisms may already be (partially) present in existing initiatives. For instance, it could be argued that the EU General Data Protection Regulation's increased emphasis on accountability and risk-based responsibility stimulates some level of reflection where personal data are involved [36]. Reflexivity could moreover become an additional supportive tool for data workers as a means to detect and mitigate the impact data actions have on (fundamental) rights, and as such, contribute towards compliance with existing legal frameworks.

Third, given the multiplicity of actors involved in data creation, regulatory initiatives should also carefully consider the actors they wish to target. Stakeholders should not only be targeted in isolation; instead, policy makers should understand the relationships these actors hold vis-à-vis one another, and the consequences that their relationships bear on the activities performed.

Finally, any regulatory response must adequately consider the power asymmetries described in this paper, including their manifestation within a globalized, international environment. Mechanisms of provenance, such as documentation, could help ensure and demonstrate that societal values and fundamental rights, as well as an appropriate level of reflexivity, have been maintained throughout the computer vision value chain, rather than purposefully avoided via outsourcing strategies and/or the exercise of power. Similarly, provenance may increase the accountability and responsibility of powerful entities in both their actions and their given instructions.

6 LIMITATIONS AND FUTURE WORK

This investigation was designed to be qualitative and exploratory. Our findings are bound to the specific contexts of the companies and individuals participating in our studies and cannot be generalized to all computer vision production settings. In the future, we seek to broaden this research by investigating ways of integrating the framework outlined in this paper in real-world production workflows and co-designing actionable guidelines for reflexive documentation together with industry practitioners.

7 CONCLUSION

Based on fieldwork at two data processing companies and interviews with data collectors, annotators, managers, quality assurance analysts, and computer vision practitioners, we described widespread documentation practices and presented observations related to the purpose, challenges, and intelligibility of documentation.

In view of these findings, we proposed a reflexivity-based approach for the documentation of datasets, with a special focus on the context of their production. We described documentation as a set of reflexive practices and an artifact that enables permanent exchange among actors and organizations. We argued that disclosure documents should travel across organizational boundaries, and be able to ease communication and foster inter-organizational accountability. We imagined documentation as a collaborative project and argued that reflexivity of work practices should therefore be understood as a collective endeavor, where not only personal positions but also praxis fields are interrogated.

Achieving a healthy balance between these elements and incentivizing practitioners and organizations to implement reflexive documentation is not easy. The challenge is nevertheless worth exploring if we aim at addressing some of the ethical issues related to the production of data for computer vision systems.

8 ACKNOWLEDGMENTS

Funded by the German Federal Ministry of Education and Research (BMBF) – Nr. 16DII113f. Laurens Naudts received support from the Weizenbaum Institute Research Fellowship programme. We dearly thank the individuals and organizations participating in this study. Thanks to Philipp Weiß for his help with Overleaf and to Leon Sixt, Matt Rafalow, Julian Posada, Gemma Newlands, and our anonymous reviewers for their valuable feedback. Special thanks to Prof. Bettina Berendt for her continuous support.

REFERENCES
[1] Yewande Alade, Christine Kaeser-Chen, Elizabeth Dubois, Chintan Parmar, and Friederike Schüür. 2019. Towards Better Classification. (2019), 4. https://drive.google.com/file/d/14uL1DQN8hRyDDDAm2WEleYbmxP7dqP72/view
[2] Michelle Alexander. 2012. The New Jim Crow: Mass Incarceration in the Age of Colorblindness (revised edition ed.). New Press, New York.
[3] M. Arnold, D. Piorkowski, D. Reimer, J. Richards, J. Tsay, K.R. Varshney, R. K. E. Bellamy, M. Hind, S. Houde, S. Mehta, A. Mojsilovic, R. Nair, K. Natesan Ramamurthy, and A. Olteanu. 2019. FactSheets: Increasing trust in AI services through supplier's declarations of conformity. IBM Journal of Research and Development 63, 4/5 (2019), 6:1–6:13. https://doi.org/10.1147/JRD.2019.2942288
[4] Emily M. Bender and Batya Friedman. 2018. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604. https://doi.org/10.1162/tacl_a_00041
[5] Bettina Berendt. 2019. AI for the Common Good?! Pitfalls, challenges, and ethics pen-testing. Paladyn, Journal of Behavioral Robotics 10, 1 (Jan. 2019), 44–65. https://doi.org/10.1515/pjbr-2019-0004
[6] Eduardo Bonilla-Silva. 2006. Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States. The Rowman & Littlefield Publishing Group, Inc., Lanham.
[7] Pierre Bourdieu. 2000. Pascalian meditations. Stanford University Press, Stanford, Calif.
[8] Pierre Bourdieu and Loïc J. D. Wacquant. 1992. An Invitation to Reflexive Sociology. University of Chicago Press.
[9] B. Bourke. 2014. Positionality: Reflecting on the Research Process. The Qualitative Report 19, 33 (2014), 1–9. https://nsuworks.nova.edu/tqr/vol19/iss33/3
[10] Kathy Charmaz. 2006. Constructing Grounded Theory: A Practical Guide through Qualitative Analysis. Sage Publications, London; Thousand Oaks, Calif.
[11] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 2174–2184. https://doi.org/10.18653/v1/D18-1241
[12] David Coghlan and Mary Brydon-Miller (Eds.). 2014. The Sage encyclopedia of action research. SAGE Publications, Inc, Thousand Oaks, California.
[13] Patricia Hill Collins. 1990. Black feminist thought: knowledge, consciousness, and the politics of empowerment. Number v. 2 in Perspectives on gender. Unwin Hyman, Boston.
[14] Juliet M. Corbin and Anselm L. Strauss. 2015. Basics of qualitative research: techniques and procedures for developing grounded theory (fourth edition ed.). SAGE, Los Angeles. https://us.sagepub.com/en-us/nam/basics-of-qualitative-research/book235578
[15] Henriette Cramer, Jean Garcia-Gathright, Sravana Reddy, Aaron Springer, and Romain Takeo Bouyer. 2019. Translation, Tracks & Data: An Algorithmic Bias Effort in Practice. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA '19). ACM, New York, NY, USA, CS21:1–CS21:8. https://doi.org/10.1145/3290607.3299057
[16] Kate Crawford and Trevor Paglen. 2019. Excavating AI: The Politics of Images in Machine Learning Training Sets. https://www.excavating.ai
[17] Emily Denton, Alex Hanna, Razvan Amironesei, Andrew Smart, Hilary Nicole, and Morgan Klaus Scheuerman. 2020. Bringing the People Back In: Contesting Benchmark Machine Learning Datasets. arXiv:2007.07399 [cs] (July 2020). http://arxiv.org/abs/2007.07399
[18] Robin J. DiAngelo. 2018. White fragility: why it's so hard for white people to talk about racism. Beacon Press, Boston.
[19] Catherine D'Ignazio and Lauren F. Klein. 2020. Data feminism. The MIT Press, Cambridge, Massachusetts. https://mitpress.mit.edu/books/data-feminism
[20] M. C. Elish and danah boyd. 2018. Situating methods in the magic of Big Data and AI. Communication Monographs 85, 1 (Jan. 2018), 57–80. https://doi.org/10.1080/03637751.2017.1375130
[21] Mustafa Emirbayer and Matthew Desmond. 2012. Race and reflexivity. Ethnic and Racial Studies 35, 4 (April 2012), 574–599. https://doi.org/10.1080/01419870.2011.606910
[22] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2020. Datasheets for Datasets. arXiv:1803.09010 [cs] (March 2020). http://arxiv.org/abs/1803.09010
[23] R. Stuart Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, and Jenny Huang. 2020. Garbage in, garbage out? Do machine learning application papers in social computing report where human-labeled training data comes from? In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20). Association for Computing Machinery, Barcelona, Spain, 325–336. https://doi.org/10.1145/3351095.3372862
[24] Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a Critical Race Methodology in Algorithmic Fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20). Association for Computing Machinery, Barcelona, Spain, 501–512. https://doi.org/10.1145/3351095.3372826
[25] Sandra Harding. 1993. Rethinking Standpoint Epistemology: What is "Strong Objectivity"? In Feminist Epistemologies. Routledge, 49–82.
[26] Michael Hind, Stephanie Houde, Jacquelyn Martino, Aleksandra Mojsilovic, David Piorkowski, John Richards, and Kush R. Varshney. 2020. Experiences with Improving the Transparency of AI Models and Services. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI EA '20). Association for Computing Machinery, 1–8. https://doi.org/10.1145/3334480.3383051
[27] Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. arXiv:1805.03677 (2018). http://arxiv.org/abs/1805.03677
[28] Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). Association for Computing Machinery, Paris, France, 611–620. https://doi.org/10.1145/2470654.2470742
[29] Natalia M Libakova and Ekaterina A Sertakova. 2015. The Method of Expert Interview as an Effective Research Procedure of Studying the Indigenous Peoples of the North. Journal of Siberian Federal University. Humanities & Social Sciences 8, 1 (2015), 114–129. https://doi.org/10.17516/1997-1370-2015-8-1-114-129
[30] Michael A. Madaio, Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach. 2020. Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, Honolulu, HI, USA, 1–14. https://doi.org/10.1145/3313831.3376445
[31] Frances A. Maher and Mary Kay Tetreault. 1993. Frames of Positionality: Constructing Meaningful Dialogues about Gender and Race. Anthropological Quarterly 66, 3 (1993), 118–126. https://doi.org/10.2307/3317515
[32] Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision. Proc. ACM Hum.-Comput. Interact. 1, 1 (2020), 25. https://doi.org/10.1145/3415186
[33] Milagros Miceli, Martin Schüßler, and Tianling Yang. 2020. Between Subjectivity and Imposition: A Grounded Theory Investigation into Data Annotation. (2020), 19.
[34] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). Association for Computing Machinery, 220–229. https://doi.org/10.1145/3287560.3287596
[35] Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q. Vera Liao, Casey Dugan, and Thomas Erickson. 2019. How Data Science Workers Work with Data: Discovery, Capture, Curation, Design, Creation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). Association for Computing Machinery, Glasgow, Scotland UK, 1–15. https://doi.org/10.1145/3290605.3300356
[36] Laurens Naudts. 2019. How Machine Learning Generates Unfair Inequalities and How Data Protection Instruments May Help in Mitigating Them. In Data Protection and Privacy: The Internet of Bodies (first ed.), Ronald Leenes, Rosamunde van Brakel, Serge Gutwirth, and Paul De Hert (Eds.). Hart Publishing, Oxford, 71–92.
[37] High-Level Expert Group on Artificial Intelligence. 2019. Ethics Guidelines for Trustworthy AI. Technical Report. European Commission.
[38] Samir Passi and Solon Barocas. 2019. Problem Formulation and Fairness. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). Association for Computing Machinery, Atlanta, GA, USA, 39–48. https://doi.org/10.1145/3287560.3287567
[39] Samir Passi and Steven Jackson. 2017. Data Vision: Learning to See Through Algorithmic Abstraction. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW '17). Association for Computing Machinery, Portland, Oregon, USA, 2436–2447. https://doi.org/10.1145/2998181.2998331
[40] Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proc. ACM Hum.-Comput. Interact. 2, CSCW (Nov. 2018), 1–28. https://doi.org/10.1145/3274405
[41] Samira Pouyanfar, Saad Sadiq, Yilin Yan, Haiman Tian, Yudong Tao, Maria Presa Reyes, Mei-Ling Shyu, Shu-Ching Chen, and S. S. Iyengar. 2018. A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Comput. Surv. 51, 5 (Sept. 2018), 92:1–92:36. https://doi.org/10.1145/3234150
[42] Inioluwa Deborah Raji and Joy Buolamwini. 2019. Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (AIES '19). Association for Computing Machinery, 429–435. https://doi.org/10.1145/3306618.3314244
[43] Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, Barcelona, Spain, 33–44. https://doi.org/10.1145/3351095.3372873
[44] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, California, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778
[45] Niloufar Salehi, Lilly C. Irani, Michael S. Bernstein, Ali Alkhatib, Eva Ogbe, Kristy Milland, and Clickhappier. 2015. We Are Dynamo: Overcoming Stalling and Friction in Collective Action for Crowd Workers. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI '15. ACM Press, Seoul, Republic of Korea, 1621–1630. https://doi.org/10.1145/2702123.2702508
[46] Morgan Klaus Scheuerman, Kandrea Wade, Caitlin Lustig, and Jed R Brubaker. 2020. How We've Taught Algorithms to See Identity: Constructing Race and Gender in Image Databases for Facial Analysis. Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 058 (2020). https://doi.org/10.1145/3392866
[47] Ismaïla Seck, Khouloud Dahmane, Pierre Duthon, and Gaëlle Loosli. 2018. Baselines and a datasheet for the Cerema AWP dataset. In Conférence d'Apprentissage CAp (Conférence d'Apprentissage Francophone 2018). Rouen, France. https://doi.org/10.13140/RG.2.2.36360.93448
[48] Dorothy E. Smith. 1990. The conceptual practices of power: a feminist sociology of knowledge. Northeastern University Press, Boston.
[49] Jennifer Wortman Vaughan and Hanna Wallach. 2020. A Human-Centered Agenda for Intelligible Machine Learning. In Machines We Trust: Getting Along with Artificial Intelligence. http://www.jennwv.com/papers/intel-chapter.pdf

Documenting Data Production Processes: A Participatory Approach for Data Work

MILAGROS MICELI, DAIR Institute, Technische Universität Berlin, and Weizenbaum Institute, Germany
TIANLING YANG, Technische Universität Berlin & Weizenbaum Institute, Germany
ADRIANA ALVARADO GARCIA∗, IBM Thomas J. Watson Research Center, United States
JULIAN POSADA†, Yale University, United States
SONJA MEI WANG, Technische Universität Berlin, Germany
MARC POHL, Independent Researcher, Germany
ALEX HANNA, DAIR Institute, United States
The opacity of machine learning data is a significant threat to ethical data work and intelligible systems.
Previous research has addressed this issue by proposing standardized checklists to document datasets. This
paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets towards
documenting data production. We draw on participatory design and collaborate with data workers at two
companies located in Bulgaria and Argentina, where the collection and annotation of data for machine learning
are outsourced. Our investigation comprises 2.5 years of research, including 33 semi-structured interviews, five
co-design workshops, the development of prototypes, and several feedback instances with participants. We
identify key challenges and requirements related to the integration of documentation practices in real-world
data production scenarios. Our findings comprise important design considerations and highlight the value of
designing data documentation based on the needs of data workers. We argue that a view of documentation
as a boundary object, i.e., an object that can be used differently across organizations and teams but holds
enough immutable content to maintain integrity, can be useful when designing documentation to retrieve
heterogeneous, often distributed, contexts of data production.
CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social com-
puting; Collaborative and social computing design and evaluation methods.

Additional Key Words and Phrases: dataset documentation, data production, data work, data labeling, data
annotation, machine learning, transparency
ACM Reference Format:
Milagros Miceli, Tianling Yang, Adriana Alvarado Garcia, Julian Posada, Sonja Mei Wang, Marc Pohl, and Alex
Hanna. 2022. Documenting Data Production Processes: A Participatory Approach for Data Work. Proc. ACM
Hum.-Comput. Interact. 6, CSCW2, Article 510 (November 2022), 34 pages. https://doi.org/10.1145/3555623

∗ Also with Georgia Institute of Technology, United States.
† Also with University of Toronto, Canada.
Authors’ addresses: Milagros Miceli, m.miceli@tu-berlin.de, DAIR Institute, Technische Universität Berlin, and Weizenbaum
Institute, Berlin, Germany; Tianling Yang, tianling.yang@tu-berlin.de, Technische Universität Berlin & Weizenbaum
Institute, Berlin, Germany; Adriana Alvarado Garcia, adriana.ag@ibm.com, IBM Thomas J. Watson Research Center,
Yorktown Heights, NY, United States; Julian Posada, julian.posada@yale.edu, Yale University, New Haven, CT, United States;
Sonja Mei Wang, s.wang.1@tu-berlin.de, Technische Universität Berlin, Berlin, Germany; Marc Pohl, marc@zweipohl.de,
Independent Researcher, Berlin, Germany; Alex Hanna, alex@dair-institute.org, DAIR Institute, United States.

This work is licensed under a Creative Commons Attribution International 4.0 License.

© 2022 Copyright held by the owner/author(s).


2573-0142/2022/11-ART510
https://doi.org/10.1145/3555623


1 INTRODUCTION
In recent years, research has turned its attention toward the datasets used to train and validate
machine learning (ML) models [28, 29, 78, 91, 94, 95, 98]. Work in this area [6, 10, 37, 38, 46] has
proposed frameworks to document and disclose datasets’ origins, purpose, and characteristics to
increase transparency, help understand models’ functioning, and anticipate ethical issues comprised
in data. Frameworks such as Datasheets for Datasets [37], Dataset Nutrition Label [46], and Data
Statements [10] have become increasingly influential. These forms of documentation are useful for
providing a snapshot of a dataset’s state at a given moment. Still, one persistent challenge is that
datasets are living artifacts [8, 51, 94] that evolve and change over time. Moreover, their production
involves actors with varying amounts of decision-making power. Capturing such variations requires,
as we will argue, a shift of perspective: from documenting datasets to documenting data production
processes.
In this paper, we understand documentation as a tool and the process of making explicit the
contexts, actors, and practices comprised in ML data work, as well as the relationships among
these elements. With the term data work, we refer to tasks that involve the generation, annotation,
and verification of data [68]. We understand data as a process [72], not a fact, and lean on CSCW
and HCI work that has focused on data work in the ML pipeline [49, 69, 90, 91], studying the
socio-economic situation of data workers [34, 43, 87] and the power relations involved in the labor
process [50, 64, 89]. Moreover, this paper expands our previous work [67, 69, 70] that studies a
segment of data production that is outsourced through business process outsourcing companies
(BPOs) that employ data workers from marginalized populations, mostly in developing countries.
Unlike crowdsourcing platforms, where algorithms manage the labor process [86, 102, 116], BPOs
are characterized by traditional and localized managerial structures and business hierarchies [67].
Building on that work, the focus of the present paper is exploring ways of making distributed
processes of data production more reflexive and participatory through documentation. Promoting
reflexivity through documentation means fostering collaborative documentation practices, including
feedback loops, iterations, and the co-creation of tasks. Such considerations can help produce
documentation that serves to interrogate taken-for-granted classifications and hierarchies in data
work [70].
We report on 2.5 years of research comprising 33 semi-structured interviews, five co-design
workshops, and several feedback instances where preliminary findings and prototypes were dis-
cussed with participants. We draw on participatory design to work closely with two BPO companies
located in Argentina and Bulgaria. We focus on the perspectives of data workers at both research
sites and identify the challenges and requirements of integrating documentation in real-world data
production scenarios.
To that end, our research questions are:
(1) How can documentation reflect the iterative and collaborative nature of data production
processes?
(2) How can documentation practices contribute to mitigating the information asymmetries
present in data production and ML supply chains?
(3) What information do data workers need to perform their tasks in the best conditions, and
how can it be included in the documentation?
Our findings show how the participants prioritized documentation that is collaborative and
circular, i.e., documentation that is able to transport information about tasks, teams, and payment
to data workers and communicate workers’ feedback back to the requesters. We discuss design
considerations related to the integration of documentation practices as an integral part of data
production, questions of access and trust, and the differentiation between the creation and use of
documentation. We reflect on these observations and the ideas of our participants through the
acknowledgment that documentation is a boundary object [100, 101], i.e., an artifact that can be
used differently across organizations and teams but holds enough immutable content to maintain
integrity. We further discuss a series of design and research implications that the acknowledgment
of documentation as a form of boundary object entails.
This investigation extends previous work on dataset documentation in two aspects. First, instead
of documenting datasets, we focus on the documentation of data production processes, particularly
the intra- and inter-organizational coordination comprised in data production, taking into account
the iterative character of such collaboration and its impacts on data practices and datasets. This
type of documentation can help break with patterns of information gatekeeping and top-down
communication in data production by creating documentation that enables and reflects feedback
loops and iterations. Second, our investigation aims at producing documentation for and with data
workers. In this sense, we explore documentation forms that enable data workers’ participation in
the discussion of data production processes. Through this approach, we aim to open up paths for
designing documentation that captures the needs and practices of data workers in contexts where
their voices and labor tend to remain silenced [42, 50, 64, 68, 69, 81, 89, 108].

2 RELATED WORK
2.1 The Documentation of ML Datasets and Data Pipelines
What counts as documentation is linked to the disciplinary background of each research community.
For instance, while the field of open science envisions the documentation of data provenance and
protocols toward reproducibility [39, 40], the Fairness, Accountability, and Transparency in machine
learning community is concerned with documenting ML models to prevent bias and improve
transparency [71]. In this paper, we approach the documentation of data production by leaning
on previous work that has advocated for transparency by proposing innovative frameworks for
documenting ML datasets [6, 10, 37, 38, 46]. These frameworks vary in the forms of documentation,
such as datasheets, dataset nutrition labels, and data statements, and differ in their prioritized
goals and intended fields of use. They are similar, however, in requiring a standardized
set of information for documenting the motivation [37], curation rationale [10], composition [37],
provenance [6, 10, 37, 46], collection process and context [10, 37], actors involved [10, 38] and
tools used [37, 38]. Some of these groundbreaking frameworks have been implemented as pilot
projects in corporations such as Google and IBM [1, 3], and several ML conferences now require
new dataset submissions to be accompanied by a datasheet, as evidenced by the recent NeurIPS
Datasets and Benchmarks track [2].
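To make the overlap concrete, the shared fields listed above can be pictured as a single record. The following minimal sketch is our own illustration, not part of any of the cited frameworks, and all field names are purely indicative:

from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetDocumentation:
    # Why the dataset was created and what it is intended for ("motivation")
    motivation: str
    # Why data was included or excluded ("curation rationale")
    curation_rationale: str
    # What the dataset contains: instances, labels, known gaps ("composition")
    composition: str
    # Where the data originally comes from ("provenance")
    provenance: str
    # How, when, and in which context the data was collected
    collection_process: str
    # Who took part in producing the data, e.g., requesters, managers, annotators
    actors_involved: List[str] = field(default_factory=list)
    # Collection and annotation tools used during production
    tools_used: List[str] = field(default_factory=list)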
The foregoing frameworks focus on the documentation of the datasets themselves, offering a
useful snapshot of the dataset’s current state to dataset producers, users, and researchers. However,
as previous work [68] has argued, two fundamental aspects of ML datasets represent a challenge to
such standardized documentation frameworks: (1) The collaborative nature of data work and data
production [69, 73, 75], including the often messy relationship between stakeholders [70], which
challenges the strict differentiation between data subjects, data users, and data producers assumed
by standardized documentation frameworks; and (2) the fact that datasets are not static entities but
rather evolving projects with flexible boundaries [8, 94].
In view of these challenges, recent research in data documentation has raised attention toward
a consideration of data production pipelines, including the multiplicity of workflows and actors
involved in producing ML datasets [8, 30, 70, 82, 90]. Hutchinson et al. [47] put forward a set of
questions structured by interconnected stages of the data life cycle. Conceiving this cycle as non-
linear, the authors frame datasets as technical infrastructures and their production processes as “goal-
oriented engineering,” which favors adopting methodologies with deliberation and intentionality
to improve transparency and accountability over requiring only post-hoc justification. Relatedly,
Balayn et al. [8] introduce a process-oriented lens to dataset documentation and expand the focus
to the design of data pipelines and the reasoning behind them. This process lens can help identify
and locate knowledge gaps in the pipeline and uncover places and procedures that are worth
more rigorous documentation and careful reflection on their potential impact and harm. Other
studies [33, 51, 94] have called for the professionalization of data work and care (i.e., the careful
consideration of the dataset in terms of “the domain setting where it originates, and the potential
questions modeling that data might answer or problems it might solve” [113]), the publication of
data documentation, the implementation of institutional frameworks and procedures to promote
documentation, and the consideration of data work as a subfield of research in its own right [51, 91, 94].
These investigations shift the perspective to regard datasets as living artifacts that are formalized
by actors with different positions and decision-making power. From this perspective, documentation
could capture data pipelines and production contexts as well as various actors that shape datasets. At
its ideal state, documentation should also be able to foster “a documented culture of reflexivity about
values in their [data] work” [94]. As goals, interests, and needs vary across different contexts and
organizations, documentation should be flexible to accommodate local specificities and constraints
and, at the same time, keep certain common identities. In line with Pushkarna et al. [82], whose
work in data documentation reflects the need for disclosure documents to be responsive to a larger
body of stakeholders, including annotators, data curators, and users, we argue that recognizing
documentation as a boundary object can be helpful to approach and reflect on this challenge.

2.2 Boundary Objects


The notion of boundary object, defined as “objects that both inhabit several communities of practice
and satisfy the informational requirements of each of them” [15], is a useful lens to study how
documentation operates across diverse contexts in the ML supply chain. Not only do boundary
objects facilitate communication among different groups, but they also translate and transfer
knowledge to be accessible to all potential stakeholders. This way, boundary objects can meet
stakeholders’ specific information needs and work requirements, but they do not require them to
have a panoramic grasp and full comprehension of every detail. The variety of maps of a city is an
example of a boundary object. The maps have the same geographical boundary, namely the city,
but different groups use them differently. For example, consumption-minded citizens may search for
information about local parks and restaurants, physical geographers may pay attention to mountains
and rivers, botanists to certain horticultural areas and botanical gardens, and administrators may
keep an eye on political boundaries. Suchman [105] emphasizes the appropriability of objects into
use contexts and working practices. In her analysis of a civil engineering project, the author shows
how heterogeneous artifacts were brought together and integrated into everyday work practices
[104]. When such “artful integration” [105] involves continuous and relatively stable relationships
among different communities and shared objects jointly produced by them, "then boundary objects
arise" [15].
Within CSCW and HCI, there is a longstanding use of the boundary object notion to investigate
how artifacts can be used across varying contexts and to facilitate cooperation and collaboration,
such as in the medical order [118], in the coordination among different actors in hospitals [13, 14], in
aircraft technical support [60, 61], in disease management at home [4], and in issue tracking systems
[11]. This notion is also used to study organizational memory [5] and common information space
[9, 96]. Lee [58, 59] further coins the term boundary-negotiating artifacts to explore non-standardized,
fluid informal objects that not only travel across but also destabilize and push boundaries. Moreover,
some participatory design approaches consider prototypes and models as boundary objects that
facilitate the exchange of information and knowledge, and make it possible to involve different
stakeholders in the design process by establishing a shared syntax or language [16, 22, 32, 44].
Socio-material perspectives point out that the features of physical objects impact the ways they
behave and operate as boundary objects [45, 48, 66]. Documents, as a specific form of boundary
objects, can become part of communicative practices in three ways: they can be an object of
evaluation, a medium to facilitate communication, and/or the arena where communicative prac-
tices are shaped [48, 66, 119]. Contexts of data production, where actors often pursue different
goals and interests, pose their own challenges in terms of fostering shared values and effective
communication and, finally, promoting common documentation practices. Given these challenges,
documentation should adapt to the needs of different stakeholders while remaining stable enough
not to lose its function across different settings. The acknowledgment of data documentation as
a boundary object [82] can provide a useful lens to approach and address some of the tensions
arising from coordinating and collaborating across actors and organizations in data production.
Specifically, the boundary object notion can add to existing work on data documentation in two
ways. First, the acknowledgement of data documentation as a boundary object helps to make visible
and specify the variety of actors involved in data production contexts with their own priorities,
workflows, and needs. Second, this acknowledgment can add a practice-oriented perspective
to the formal standardization of documentation frameworks and make work more visible. This
includes work not only in accommodating standardized documentation frameworks to local
needs and specificities, but also in the adaptation and improvement of existing work practices
and workflows. Looking at documentation through the lens of boundary objects can be helpful in
fostering documentation practices and designing documentation frameworks that capture the practical,
economic, and organizational decisions involved in data production and encoded in datasets.
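As a loose illustration of this argument (our own sketch, not drawn from the cited studies), documentation understood as a boundary object can be pictured as a record with an immutable core that travels across organizations, from which each group derives the view it needs; all names below are hypothetical:

from types import MappingProxyType

# Immutable core that keeps its integrity across organizations and teams (hypothetical fields).
CORE = MappingProxyType({
    "project": "example_labeling_project",
    "task_instructions": "Label each image according to taxonomy v2.",
    "label_taxonomy": ("category_a", "category_b"),
})

def worker_view(core, payment_per_task, feedback_contact):
    """What data workers may need: the task context plus payment and a feedback channel."""
    return {**core, "payment_per_task": payment_per_task, "feedback_contact": feedback_contact}

def requester_view(core, progress, workers_questions):
    """What requesters may need: the same core plus progress and workers' open questions."""
    return {**core, "progress": progress, "open_questions": workers_questions}

The point of the sketch is the split: the core remains stable enough to maintain integrity, while the role-specific views adapt to local needs, priorities, and workflows.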

3 METHOD
This investigation comprises 2.5 years of research in collaboration with two BPO companies located
in Buenos Aires, Argentina (Alamo), and Sofia, Bulgaria (Action Data), specializing in the collection
and annotation of data for ML. Our engagement with both organizations included fieldwork at their
offices, interviewing phases, moments of feedback and discussion, the collaborative development
of prototypes, and a series of co-design workshops. In this section, we provide details of our
engagements with the participants and the methods used. Figure 1 shows how the research process
evolved over time.
[Figure 1: a timeline from May 2019 to November 2021 showing fieldwork and interviews at Alamo (Argentina) and at Action Data (Bulgaria), feedback rounds with both companies, interleaved phases of data analysis, the iterative development of prototypes, and the co-design workshops with Alamo and Action Data in October and November 2021.]
Fig. 1. Timeline of the iterative phases of data collection and analysis comprised in this research.


3.1 Participatory Design


At the core of Participatory Design (PD) is the active involvement of the communities that will use
technologies and systems in their design. Moreover, PD recognizes the importance of the context
in which technologies are used and the situated practices within that context [84]. In CSCW and
HCI, participatory design methodologies have been deployed to explore various fields of inquiry,
ranging from public services [7, 25, 26, 36, 54, 93, 97], to healthcare [56, 79, 88], to organizational
settings [62, 106, 117]. Additionally, PD has been used in the field of ML Fairness, Accountability,
and Transparency to guide algorithmic accountability interventions [52], learn about the concerns
of communities affected by algorithmic decision-making systems [21, 83], and elucidate causal
pathways to be input into complex modeling [65]. Regardless of the discipline, what these previous
efforts have in common is their commitment to building reciprocal partnerships to inform the
design of systems and technologies based on the values of justice and participation.
In this investigation, we use PD to explore the development, implementation, and use of doc-
umentation in real-world data production scenarios that involve many geographically distributed
stakeholders. Similar to Wong-Villacres et al. [115], we understand participatory
design as a long-term iterative process in which exploring participants’ insights is as essential as
tangible design outcomes. We use PD and co-design as synonyms [27].
Throughout the research and design process, we focused on three core elements of PD: having a
say, mutual learning, and co-realization [17, 74]. We were intentional about listening to what the
different actors involved in data production (the data workers, their managers, and the clients) had
to say about dataset documentation in order to derive design considerations that would adapt to
their needs. As the investigation evolved, we especially focused on the experiences of data workers
because we realized they were the ones that usually did not have a say in how workflows and
processes are designed in ML supply chains. We discuss this shift of perspective in Section 4.
While working on the research design, we put effort into remaining sensitive to the “politics of
participation” [27] by trying to anticipate power asymmetries that could emerge among participants
and between researchers and participants. In this sense, the interviewers and workshop facilitators
were at all times intentional about making room for data workers’ voices, for instance, by insisting
on conducting in-depth interviews with them or preventing managers from hogging the mic
during the workshops. Moreover, Bratteteig and Wagner [18] argue that “language is power”
and by speaking one’s own language, discourse can be expanded and model monopoly, i.e., the
adoption of external mental models, can be avoided [17]. In this sense, it was important to us to
let the participants maintain their preferred language — both verbally and visually. To that end,
we provided interpretation and facilitation in the participants’ languages and worked largely with
visual symbols [18]. In Section 5.2, we reflect upon our research and highlight a few implications
for participatory research in terms of power, negotiation, and participation in corporate settings
[99].

3.2 The Participating Organizations


Two companies dedicated to the collection and labeling of ML data participated in this investigation:
• Alamo (pseudonym), located in Buenos Aires, Argentina.
• Action Data (pseudonym), located in Sofia, Bulgaria.
Both BPOs are impact sourcing companies. Impact sourcing refers to a branch of the outsourcing
industry that proposes employing workers from poor and marginalized populations with the twofold
aim of offering them a chance in the labor market and providing information-based services at lower
prices. In both organizations, data workers perform tasks related to the collection and annotation
of data. These tasks are requested by external clients located in various regions of the world, who
seek to outsource the production of datasets used to train and validate ML models. The relationship
between requesters and data workers is mediated by managers, reviewers, and quality assurance
(QA) analysts that form a hierarchical structure where data workers occupy the lowest layer [69].

3.3 Data Collection


3.3.1 Semi-Structured Interviews.
A total of 33 interviews were included in the present investigation. Nineteen were conducted in person
in 2019. Due to the ongoing COVID-19 pandemic and the resulting travel restrictions, 14 interviews
were conducted remotely between August 2020 and September 2021.

Table 1. Overview of Participants and Research Methods. Some participants were interviewed more than once.

SEMI-STRUCTURED INTERVIEWS
Organization | Description | Participants | Medium | Language
Alamo | BPO in Argentina | 6 data workers | In person | Spanish
Alamo | BPO in Argentina | 2 managers | In person | Spanish
Action Data | BPO in Bulgaria | 9 data workers | In person | English with Arabic interpretation
Action Data | BPO in Bulgaria | 2 managers | In person | English
Alamo’s clients | ML practitioners requesting data work from Alamo | 1 requester | Virtual | Spanish
Action Data’s clients | ML practitioners requesting data work from Action Data | 5 requesters | Virtual | English
Other clients | ML practitioners requesting data work from other BPOs | 4 requesters | Virtual | English

CO-DESIGN WORKSHOPS
Organization | Description | Participants | Medium | Language
Action Data | BPO in Bulgaria | 2 managers | Virtual | English with Arabic interpretation
Action Data | BPO in Bulgaria | 2 data workers | Virtual | English with Arabic interpretation
Ranua | Action Data’s Syrian partner organization | 1 manager | Virtual | English with Arabic interpretation
Ranua | Action Data’s Syrian partner organization | 13 data workers | Virtual | English with Arabic interpretation
Action Data’s clients | ML engineers at a Scandinavian university | 3 requesters | Virtual | English with Arabic interpretation
Alamo | BPO in Argentina | 13 data workers | Virtual | Spanish
Alamo | BPO in Argentina | 4 managers | Virtual | Spanish

All interviews were semi-structured. Those conducted in 2019 were in-depth interviews that
explored the lived experiences of data workers and aimed at understanding the contexts in which
data work is carried out [69]. The average duration of those interviews was 65 minutes. For the
analysis comprised in this paper, we selected only the in-depth interviews that revealed insights
about documentation practices. In 2020 and 2021, we carried out a second round of interviews
focused on the topic of documentation. Each one of those interviews had an average duration of
50 minutes. We interviewed actors who were actively involved in the documentation of projects
at both companies. In addition, we interviewed machine learning practitioners with experience
outsourcing data work to platforms and BPOs. Some of these practitioners were direct clients of
Alamo and Action Data. Table 1 provides an overview of the interviews and participants included
in this work.


The interviews allowed us to explore documentation practices carried out in different organi-
zations and identify perceived challenges, experiences, and needs. Some interview partners were
interviewed more than once at different stages of the design process. For each interview, the
informants received a gift card for €30 (or the equivalent in their local currency) for an online
marketplace. The audio of the interviews was recorded and later transcribed. The names of the
interview partners and any other elements that would help identify them were changed. The names
that appear in the excerpts included in this paper are pseudonyms that the interview partners chose
for themselves.
3.3.2 Co-Design Workshops.
Given the multiplicity of actors involved in processes of data production [69, 70], we saw value
in inviting data workers, their managers, and their clients to a series of co-design workshops to
discuss documentation practices. Table 1 offers an overview of the organizations and individuals
participating in each workshop series.
The workshop series with each company took place on different days with a similar structure
but was tailored to each setting and based on the ongoing projects and clients of the respective
BPO. Due to the COVID-19 pandemic, all sessions took place online via Zoom and the activities
were conducted in parallel on the visual tool Miro. As compensation, all workshop participants
received €15 per hour of participation. The sessions were video-recorded and, later, transcribed. In
addition, note-takers documented the activities and interactions, and one of the authors produced a
real-time graphic recording on Miro. The graphic recording was kept visible to participants as it
evolved throughout the workshop sessions.

[Figure 2: a slide, presented in English and Arabic, listing the workshop ground rules. Organization: the session is recorded, including chat content; workshop content must not be shared on social media. Self-care: participants may step away if needed, should provide content warnings for sensitive topics, and can message the organizers or interpreters at any time. Participation: be patient, kind, and supportive; do not hog the mic and be mindful of floor time; there are no dumb questions; stay muted and use the raise-hand function when wishing to speak; in support of language justice, participants are encouraged to communicate in the language they feel most comfortable with, assisted by the interpreters.]

Fig. 2. The ground rules for participation were visible at all times throughout the co-design sessions with
Action Data’s workers. A Spanish version of the same rules was introduced at the workshops with Alamo.

The workshops with the Argentine company Alamo were conducted in Spanish, the native
language of both participants and the co-authors who served as facilitators. The sessions took
place on October 26 and 28, 2021. Each session lasted two hours. Seventeen participants attended the
workshops, including four managers and 13 data workers. We suggested Alamo’s management
include one of the company’s clients in the workshops but this request was repeatedly declined.
Our proposal to include a third workshop session for data workers only (without the presence of
management) was also declined due to scheduling issues, according to the company’s management.
The 3-day workshop series that we organized with Action Data took place on November 3, 5,
and 8, 2021. On day 1, we hosted a 90-minute session and invited Action Data’s management (two
participants) located in Bulgaria and 13 of the data workers that Action Data manages through
its partner organization, Ranua, located in Syria. For day 2, we prepared a 3-hour workshop and
had three members of one of Action Data’s client organizations — a Nordic university — join
in. Finally, day 3 was conceived as a 2-hour round table only for the Syrian data workers to
discuss their impressions without the presence of managers and clients. This was motivated by
the acknowledgment that the presence of managers can influence the level of comfort workers
feel when expressing their opinions. In all sessions, the co-authors who served as main workshop
facilitators spoke in English with simultaneous interpretation. For the breakout rooms, we offered
facilitation in English and Arabic, depending on the needs of each group.
The power differentials present among the participant groups motivated many of our decisions in
terms of how to design the respective sessions with Alamo and Action Data. The participation rules
introduced at the beginning of the workshops reflect our stance (see Fig. 2). The acknowledgment
of such power differentials also demanded much attention and flexibility on our side to quickly
adapt and reformulate activities in view of difficulties identified while conducting the workshops.

3.4 Data Analysis


The collected data were analyzed through reflexive thematic analysis (RTA) as developed by Braun
and Clarke [19] and Braun et al. [20]. RTA is a type of thematic analysis that places importance
on researcher subjectivity and sees the researcher as playing an active role in the knowledge
production process, a value which is emphasized by the word “reflexive.” We describe the position
from which we have conducted this research in the next subsection, 3.5.
To analyze the data collected in the present investigation, two of the co-authors explored the
interview and audio transcriptions first. They started with a first coding round after which two
further authors revised and added codes. A meeting followed where the four coders exchanged
impressions and discussed disagreements around the codes. Additionally, they revisited video
recordings of the workshops as well as the activities captured on Miro boards.
We constructed candidate themes after revisiting coding categories. Braun et al. [20] argue that
themes "do not emerge fully-formed" and that there are first candidate themes that are developed
from earlier phases, before settling on the final themes. It took us several iterations to arrive
at a (more-or-less) final set of themes. Table 2 provides an overview.
The transcriptions of the data collected at Alamo were in Spanish. The data collected at Action
Data was in English, with the exception of three interviews and all workshop sessions that included
English and Arabic-speaking participants. In those cases, the English interpretations as performed
by the interpreters were transcribed. Coding the transcriptions in both Spanish and English was
possible because all authors are proficient in English and three of them are native Spanish speakers.
Some of the excerpts included in the Findings sections were originally in Spanish and were translated
into English only after the analysis was completed.
It is worth mentioning that some of the initial interview data had been previously coded for an
earlier study conducted by some of the authors [69] using constructivist grounded theory [23], a
method that was consistent with the exploratory spirit of the fieldwork and the first interviews
conducted in 2019. For the present paper, we revisited some of those interviews and re-coded them
along with the newly collected data using RTA.


Table 2. Data Analysis: Summary of codes and themes

Domain Summaries | Codes | Themes
FUNCTION OF DOCUMENTATION | Standardization; Foster Workers Growth; Quality Control & Metrics; Work Enabler; Inter-org. Accountability; Knowledge Preservation; Communication Medium; Auditor | DOCUMENTATION AS COMMUNICATION ENABLER ACROSS BOUNDARIES
ROLES AND COLLABORATION | Managers; Service Owners; Labelers; BPO’s Teams; Clients | DOCUMENTATION AS COMMUNICATION ENABLER ACROSS BOUNDARIES
DOCUMENTATION FORMATS | Wiki; Kick-off Document; Scope of Work (SoW); Dataset Documentation; Postmortem Report | HOW DOCUMENTATION CAN SUPPORT WORKERS’ AGENCY
WORK ETHIC AND LABOR CONDITIONS | Impact and Responsibility; Trust; Security; Labor Conditions; Confidentiality | HOW DOCUMENTATION CAN SUPPORT WORKERS’ AGENCY
LIMITATIONS OF CURRENT DOC. PRACTICES | Integration in Workflows; Use of Documentation; Lack of Information; Access; Feedback; Updates; Ownership | CHALLENGES DESIGNING FOR WORKERS IN CORPORATE ENVIRONMENTS
POSSIBLE DESIGN SOLUTIONS | Synthesis/Simplicity; Interactivity; Information Type; Integration; Centralizing; Automation; Visualization | DESIGN IDEAS EMERGED FROM THE WORKSHOPS
LESSONS LEARNED | Lessons from Fieldwork; Lessons from our Feedback; Lessons from the Interviews; Lessons from the Workshops | WHAT PARTICIPANTS & ORGANIZERS LEARNED THROUGH THE CO-DESIGN PROCESS

3.5 Positionality Statement


Our ethical stance throughout this investigation was centered on respecting workers’ expertise
[27, 85, 92] and creating spaces where they could describe work practices and conditions, challenges,
and needs. A central motivation — and certainly a challenge — of this research was to empower
data workers to speak up and participate in shaping their own workflows and tools [103].
Our position and privilege vis-à-vis our study participants have been in constant interrogation
and discussion throughout data collection, analysis, and while considering the implications of this
investigation. Opting for an analysis method that would help us to acknowledge and make our
positionalities as researchers explicit was essential to us. Following the idea of conducting a reflexive
thematic analysis, it seems appropriate to disclose some elements of the authors’ backgrounds that
might have informed the findings we will present in the next section.
All authors except one are academics working in institutions located in the Global North. Most of
us carry passports that are foreign to the places where we live and work. Four of us are sociologists,
one a designer, and two are computer scientists. We all work in the field of Human-Computer
Interaction and Critical Data Studies. Three of us are Latin American, two were born in China,
one of us is German, and another is a first-generation US-American. All of us are first-generation
academics. Our position as researchers living and working in the Global North provides us with
privilege that our study participants do not hold. Despite being born and raised in working-class
families, some of us in the same country as some of the participants, and despite having experienced
migration ourselves, we acknowledge that our experiences differ from those of the participants: we have,
for instance, never experienced war or had to flee our home countries to become refugees.
This is especially relevant vis-à-vis the Syrian data workers that participated in this investigation.
The acknowledgment of such privilege has guided several of the decisions that we have made,
for instance, in terms of mitigating power differentials in data collection. In this sense, we must
mention that, despite our best efforts, some power asymmetries could not be mitigated and we
could only acknowledge and reflect upon them during data analysis and in this paper (see Sections
3.3.2 and 5.2).

4 FINDINGS
Our findings are presented in a way that reflects the diverse priorities, opinions, positions, and
power of our participants. For each research site (Alamo in Sect. 4.1 and Action Data in Sect.
4.2), we first provide details of our engagement with the respective partner organization and the
preliminary knowledge obtained through the interviewing phases. We then move on to describe
salient discussions that took place at the workshops. In Section 4.3, we present a summary of
findings including commonalities and differences across sites.
Long-term engagements with participants are key to participatory research [24, 110] and, as
Sloane et al. [99] argue, it is important “to make the tensions that characterize the goal of long-term
participation in ML visible, acknowledging that partnerships and justice do not scale in frictionless
ways, but require constant maintenance and articulation with existing social formations in new
contexts.” We believe that there is a contribution to be made by placing our findings in conversation
with the vicissitudes of conducting participatory research, especially with the intermediation of
partner organizations. This is why we show how our observations evolved over 2.5 years and across
sites and methods. We are convinced that our findings would not be the same had we stopped after
the interviews. In the same way, without a description of the interviewing process and how our
relationship with the participant organizations evolved, the ideas emerging from the co-design
workshops would lose some of their nuances.

4.1 Case 1: Alamo


The Buenos Aires-located company that we will call Alamo is, at the time of this investigation, a
medium-sized organization with around 400 data workers distributed in three offices located in
Argentina and two other Latin American countries. Its main client is a large regional e-commerce
corporation. 90% of Alamo’s projects are requested by this client and include image segmentation
and labeling, data collection, and content moderation. The Buenos Aires office employs around 200
data workers, mainly young people living in very poor neighborhoods or slums in and around the
city. Looking for workers in poor areas of Buenos Aires is part of Alamo’s mission as an impact
sourcing company. Alamo’s data workers are offered a steady part- or full-time salary and benefits,

Proc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 510. Publication date: November 2022.
133
5. Co-Designing Documentation for Reflexivity and Participation

510:12 Milagros Miceli et al.

which contrasts with the widespread contractor-based model observed in other BPOs and data
work platforms [80]. Even so, Alamo’s workers receive the minimum legal wage in Argentina (the
equivalent of US$1.50/hour in 2019), and their salaries place them below the poverty line.
4.1.1 Investigating Alamo’s Documentation Practices.
During fieldwork in 2019, one of the observations that stood out was the company’s stated
commitment to fostering its workers’ professional growth [69]. In contrast with this mission,
Alamo’s emerging documentation practices seemed, at the beginning of this investigation, almost
completely oriented toward surveilling workers and measuring their performance. Noah, one of
the data workers we interviewed in 2019, described that documentation process as follows:
For instance, for this project, we have three metrics for individual performance and now
we’re about to add one more. Mostly, we quantify and document errors or rollbacks
reported by the requester.
Nancy, who was one of Alamo’s project managers in 2019, explained what was done with the
documented performance metrics:
Within the company’s structure, QA [the quality assurance department] then takes
those metrics and puts together something like a “scoreboard” that shows everyone’s
performance and that is visible to the whole company.
After our first fieldwork period and once we had analyzed the collected data, we provided Alamo
with a report summarizing our observations. We found that there was a contradiction between
the company’s mission of fostering worker growth and its emphasis on quantifying individual
performances, and suggested reorienting the purpose of documentation to include information
that could help workers understand the background and purpose of their work. We have discussed
those observations and their implications at length in previous work [67, 69]. After presenting our
observations, we engaged in a conversation with Alamo’s co-founder Priscilla who, as shown by
the following interview excerpt, agreed with our suggestions and expressed her desire to establish
documentation practices based on the needs of data workers:
In my opinion, this is what we need to do. Listen more to their needs and ask them
what they need to perform better and do their work better. And we must generate and
provide the information they need. We need to make time for that.
In September 2020, we resumed contact with Alamo’s management and invited the company to
participate in our investigation of data production documentation, starting with a series of follow-
up interviews. By then, the company was already in the process of restructuring its documentation
practices. Nati, one of Alamo’s former QA analysts, had just been promoted to “continuous im-
provement analyst” and put in charge of bringing together the company’s quality standards and
its goals in terms of promoting workers’ professional growth. Nati’s main task was re-designing
Alamo’s documentation practices. She described this process as follows:
During the past year, we made an effort to ask everyone “what would you like to see
reflected in documentation? What do you need?” After that, we started dropping old
practices that didn’t make sense anymore. We threw them away, and started anew. We
learned a lot about empathy too, you know? This work is very much about empathy
[...] because we assign workers in each team according to their competencies and their
skills, and they need to know what projects are about to actually make the most out of
those skills.
What came out of the process led by Nati was a series of documentation practices strongly
oriented toward knowledge preservation, i.e., the documentation of workflows, tools, specific details
about how projects are carried out, best practices, and lessons learned [70]. In Nati’s words,
documentation was envisioned as an educational resource for Alamo’s workers and its aim was
to mitigate information loss due to fast worker turnover. A special team led by Nati was put
in charge of developing and implementing this vision. With the documentation guidelines and
templates designed by them, each team leader would document the projects carried out by their
data workers. Moreover, following one of our suggestions, the company saw value in documenting
contextual information about the ML products that are trained or validated using data produced at
Alamo. Management then started to seek clients’ involvement in documentation practices. The first
step was sharing the project documentation with the project’s requester and asking for feedback.
These collaborative practices progressed to the point that some clients became co-producers and
users of the documentation produced by Alamo, as Pablo, leader of one of Alamo’s data labeling
teams described:
I mean, it depends on the client. Luckily this requester is very committed and is always
looking at our documentation, providing feedback, checking if the information is up to
date. Sometimes they just jump in and add information, they collaborate on the best
practices manual. It’s also good for them to see if their information is updated too,
not just for us but for the requester’s own team, because they share these documents
within the requester’s company.
While this arrangement was beneficial for the clients, Alamo was providing the infrastructure
and most of the workforce that made documentation possible. Alamo’s workers started to feel the
burden, as team lead Pablo described:
We definitely noticed the change: from not documenting at all to documenting every-
thing. It also depends on the project. For instance, our main client [a large regional
e-commerce corporation] brought many more requirements for us to implement in our
documentation in terms of what they wanted to see reflected. There are new documents
and new guidelines all the time.
We followed up with continuous improvement analyst Nati in December 2020 as she continued
to review and revise documentation practices within Alamo. By then, Alamo’s documentation
revolved around a format they called the Wiki that was conceived as a collaborative document
to support the execution of projects (see Fig. 3). Each Wiki comprised information, collected in
many cases in direct collaboration with the requester, about processes, tools, and best practices
concerning each specific project. Nati’s main concern back then was reducing the time and effort
that were flowing into documentation and, at the same time, finding efficient ways of keeping
documentation updated.
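Figure 3 hints at the rough outline of such a Wiki. The following sketch reflects our reading of that outline together with the content described above; it is not Alamo’s actual template, and the section names are translated from Spanish and purely indicative:

# Indicative outline of a project Wiki, reconstructed from Fig. 3; not Alamo's actual template.
WIKI_OUTLINE = {
    "home": "Short introduction to the project and its members",
    "objective": "What the requester wants to achieve with the data",
    "metrics": "How quality and progress are measured",
    "team": "Who works on the project and in which role",
    "tools": "Annotation tools, URLs, and credentials used in the project",
    "service_description": "Scope and description of the requested service",
    "best_practices": "Lessons learned and recommended ways of working",
}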
After these interviews with Nati, Pablo, and some other data workers, we got back to Alamo’s
management with feedback. Among other things, we highlighted the need to reduce the amount of
work put on Alamo’s team in terms of producing and maintaining documentation, and to distribute
responsibilities more equitably. The idea of producing exhaustive metadata for each project was
commendable but the price of such an effort was being paid by the data workers. Moreover, we found
that the original idea of the Wikis, namely, that of preserving the knowledge generated in each
project and offering more information about the ML supply chain to foster workers’ professional
growth, had shifted toward a constant consideration of what clients might want to see reflected in
documentation, as Sole, leader of another data labeling team at Alamo, described:
We need to know what we want to document first. Who we are documenting for. That’s
a challenge at the moment. Because right now I ask myself: who am I doing this for?
What for? [. . . ] Right now, we’re focusing on the clients: what is the client asking for?
Because I may want many things, but what is the client interested in? What is the
client expecting from us as service providers?


[Figure 3: a screenshot of Alamo’s wiki (in Spanish) showing a project page with a list of project members (project manager and team) and navigation sections for home, objective, metrics, team, tools, service description, and URLs and credentials.]

Fig. 3. The wiki-based documentation prototype created by Alamo to brief and train workers.

In view of these challenges, we proposed the idea of conducting a series of co-design workshops
with Alamo’s data workers and managers with the purpose of moving the focus away from “what
clients want” and re-imagining documentation based on the needs of workers. In June 2021, we started
conversations with Alamo. Negotiation points included participants’ compensation and attendance,
i.e., how many and which workers Alamo would allow to participate in the workshops. We agreed
to use the Wiki as the prototype we would discuss and re-imagine at the workshops. After several
months of negotiation, the workshops finally took place in October 2021.
4.1.2 Co-Designing Documentation with Alamo’s Workers.
The workshops with Alamo workers and managers started with a discussion of the Wiki template.
The discussion focused on a recent project from Alamo’s main client, an e-commerce corporation
that operates primarily in Latin America. The client instructed Alamo to produce a training dataset
with images of false identity cards from three Spanish-speaking Latin American countries. This
dataset was later used by the client to train an algorithm to screen IDs. The instructions for workers
involved the collection of images of authentic identification cards, their modification to render
them invalid, and the labeling of the resulting photos.
The managers participating in the session mentioned that the main benefit of the Wikis is for
data workers to take an interest in existing projects, explore them through the respective Wikis,
and use that information for their education and growth. In the words of a project manager:
We’ve been working with growth metrics for each person working at Alamo this year.
For example, if someone wants to learn programming skills, they can look at the Wiki,
see what tools are available, the projects, the languages used, and know what skills to
develop.
However, while managers want to encourage workers to have an overview of the ongoing
projects at the company, they are also mindful of their clients’ demands in terms of confidentiality
and security. For this reason, either much of the information in the documentation lacks details
and appears general and superficial, or access to such information is restricted. Several of the
data workers argued that the restrictions and the lack of access to the Wikis as well as difficulties
keeping the information updated represented a hurdle for their work:
One of the limitations is access. That we all know the information is there and can
access it. How can we make that happen? Not everyone has access. How can we
guarantee that the information is updated? It takes a lot of time to write down the
documentation.
In addition, the data workers asked the managers to trust them with access to complete versions of the
documentation, but the managers argued that the information is sensitive and clients often request
not to make it available to all. This tension, coupled with the difficulty of keeping documents
updated as projects evolve, makes it hard for workers to incorporate feedback in the documentation
due to the lack of information.
Moreover, the workers acknowledged that they were less inclined to use the information on
Wikis for reference. For one thing, some workers were not aware that the Wikis were available for
them to see other projects and had to be reminded of their presence. For another, the documentation
was perceived as “on the machine” and “not close at hand” and, thus, checking the Wikis for relevant
information was not the workers’ first impulse. By the end of day 1, it became evident that there is a
discrepancy between how management thinks workers should use the Wikis and how useful workers
perceive them to be vis-à-vis the time and effort required to maintain these documents.

[Workshop board, October 2021: "Understanding roles and processes | Group C", mapping project phases, actions, interactions, reasoning, feelings, phase leadership, and the documentation generated in each phase from a project manager's perspective.]
Fig. 4. One of the workshop activities aiming to understand roles, workflows, and collaboration within Alamo.

We started Day 2 with a focused discussion of the tensions identified during the previous session:
how to keep the Wikis up to date, how to guarantee access and promote use, how to provide details
while keeping confidentiality, and how to integrate documentation practices in existing workflows.
When discussing how to update the documentation efficiently, one of the data workers pointed
out that gathering required information often took weeks due to the organizational structure and
dispersed nature of teams. In this discussion, possible forms of documentation were explored. A
former data analyst, now team leader, argued that automating the production of metadata was probably not the best solution because much information exchange and knowledge transfer took place in daily conversations. In his view, the potential of documentation lies in integrating it into communication processes:

I think that’s where the greatest possibility of improving communication lies. To leave
at least a record. It does not necessarily have to be so precise to set up a whole process,
but a spreadsheet where I document things, an e-mail, a chat. I don’t know. There are
many ways of communication. I don’t know if the best thing is automation, because
we are quite beyond automation. Yes, it is a matter of communication.
Finally, the last workshop activity asked workers to imagine possible approaches to the problems
discussed earlier in the session. We asked them to think of a documentation process that would
guarantee simplicity, accessibility, and integration in workflows. These requirements were derived
from the previously conducted interviews. In their own way, all teams focused on the use of
documentation rather than the act of documenting. They all agreed that documentation should be
centralized rather than dispersed in different locations and be easy to edit and interpret by any team
member. In this sense, three themes were salient. First, the documents should offer readers better navigation tools. Proposed additions included an edit history (as in other wikis such as Wikipedia) and a glossary to enable translation between languages and explain technical jargon. Second, documentation should be more legible. Here, participants suggested including visual material, especially in task instructions, such as graphics or video tutorials, and reducing text. Some participants also proposed more interactive documentation; for example, teams could benefit from more clearly structured documents instead of unformatted ones. Third, participants suggested three ways to increase participation in documentation: making everybody in the organization responsible for documenting, automating the conversion from text input to graphic output, and adding elements of gamification to encourage and measure participation.

4.2 Case 2: Action Data


The Bulgarian company Action Data specializes in image data collection, segmentation, and labeling.
Its clients are computer vision companies and academic institutions, mostly located in North
America and Western Europe. Action Data offers its data workers contractor-based work and the
possibility to complete their assignments remotely, with flexible hours. Workers are paid per piece
(image/video or annotation), and payment varies according to the project and its difficulty. At the
time of our first visit in July 2019, Action Data operated with one manager and two coordinators
in salaried positions handling operations and a pool of around 60 freelance data workers. As part
of its impact-sourcing mission, Action Data recruited its data workers among refugees from the
Middle East who had been granted asylum in Bulgaria. That approach changed in 2021 and now
Action Data outsources its projects to partner organizations located in Syria and Iraq. In fact, in
November 2021, 90% of Action Data’s projects were conducted by data workers in Syria using a
re-intermediation model [41] in which a third-party organization that we will call Ranua recruits
the workers while Action Data manages the projects with clients. This model of using a third party
to recruit workers has been documented in the crowdsourcing and gig economy sectors, in which
some platforms recruit workers to do data work for other platforms [109].
4.2.1 Investigating Action Data’s Documentation Practices.
In 2019, as we began this investigation, Action Data’s projects and workers fluctuated frequently
and communication seemed to adapt to the needs of each project. Consequently, Action Data did
not have standardized documentation guidelines. This was especially critical given that most of the workers worked remotely after the initial training at the company's office.
After that first fieldwork period and once we had analyzed the collected data, we provided
Action Data with a report summarizing our observations — most of them related to labor conditions,
highlighting their effects on the quality of the datasets produced by the company. In addition, we
provided recommendations regarding the need to educate data workers about the ML pipeline, the
urgency of considering labor conditions an essential point in the “ethical AI” discourse at the core
of the firm’s marketing strategy, and the importance of documenting workflows and practices [53].
The company's management found our suggestions useful and actionable. By September 2020, they had begun implementing some of them, starting with an education program for data workers. The program included a machine learning course that covered technical and ethical aspects
of ML as well as an overview of the ML supply chain.
It was around that time that we invited Action Data to participate in a series of follow-up
interviews centered around documentation practices. An important development was the company’s
implementation of an internal database of all the projects they had conducted since the beginning
of their operations. However, some of the information contained in it was considered confidential or
sensitive, which resulted in the documents only being shared with Action Data’s management and
the respective clients. The data workers did not have access to the project database. We provided
feedback on this issue and highlighted the importance of sharing information with the data workers.
The recently hired operations officer, Tina, was then put in charge of creating a series of documents targeted at data workers, containing guidelines and information about the different projects carried out by Action Data. Apart from our recommendation, two factors
contributed to the company’s decision to develop these documents. First, Action Data had managed
to establish itself in the market of data services for machine learning — especially data labeling. This
means that, by 2020, the company was conducting long-term projects for a more or less steady set
of clients, which resulted in the need to keep better records and continually train workers. Second,
because of its growth, the company had started outsourcing some requests to partner organizations
located in the Middle East. This development revealed the need for clear — written — guidelines for
a geographically distributed workforce that, in most cases, did not speak English.

Fig. 5. One of the documentation templates prototyped by Action Data: The Scope of Work form (SoW).

Once third-party organizations became involved in conducting projects, and because of the disadvantaged position of service providers vis-à-vis their clients, Action Data's perception of the usefulness of documentation started to shift toward the idea of inter-organizational accountability [69]. The company started to develop a documentation template that could reflect its business
relationship with other organizations, i.e. clients and partner organizations in the Middle East
to which projects were being increasingly outsourced. This form of documentation would make
explicit each organization’s responsibilities in terms of work, payment, and deadlines and would
reflect iterations and updates. Moreover, keeping clear records of the task instructions provided
by clients and documenting instructed changes could provide proof that tasks were carried out as
instructed, especially in cases where clients are not satisfied with the quality of the service provided
or decide to demand more, as Eva explained:
We also keep the client accountable so that they don’t come up with a new requirement
or something that we haven’t mentioned before. So, documents are also for account-
ability of us toward the client as well so that the client can have a document where
they can keep track of what the arrangement is and so on beyond our contract.
Toward the beginning of 2021, Action Data began developing three documentation templates.
The prototypes were designed as forms where information would be filled in by Action Data’s
management. When sending the prototypes to us to request feedback, operations manager Tina
explained via email:
On the first page is the SoW (Scope of Work) that we regularly use in order to scope
clients’ projects before we start working. On the second one, there is a new idea Eva is
piloting for Dataset Documentation, so some information we will fill out during and
after projects so that the client can use it for transparency purposes if they wish. And
then the third page is a Post Mortem template that we can use to address issues and
challenges after the project is completed.
These three prototypes were examined and interrogated at the co-design workshops conducted
with Action Data workers that we describe in the next subsection.
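To make the structure of these templates concrete, the following is a minimal sketch of how the three prototypes could be represented as structured records. The field names beyond those mentioned in the text (scope, responsibilities, payment, deadlines, issues, and challenges) are our own illustrative assumptions, not Action Data's actual forms.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScopeOfWork:
    # Filled in before the project starts to scope the client's request.
    client: str
    partner_organization: Optional[str]   # e.g., the recruiting organization
    task_description: str
    responsibilities: dict                # organization -> agreed responsibilities
    payment_terms: str
    deadline: str

@dataclass
class DatasetDocumentation:
    # Filled in during and after the project; can be shared with the client
    # for transparency purposes.
    dataset_name: str
    data_sources: List[str]
    annotation_guidelines: str
    known_limitations: List[str] = field(default_factory=list)

@dataclass
class PostMortem:
    # Completed after project delivery to address issues and challenges.
    project: str
    issues: List[str]
    lessons_learned: List[str]

In such a sketch, the Scope of Work would be written before work starts, while the other two records would be appended as the project unfolds.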
4.2.2 Co-Designing Documentation with Action Data’s Workers.
The co-design workshops took place in three different sessions facilitated over Zoom.
Day 1 worked as an introduction and focused on getting to know Action Data’s general structure
and workflows, and discussing how stakeholders use the three documentation prototypes (“Scope
of Work,” “Dataset Documentation,” “Postmortem Report”). To this end, participants mapped the
actions and relationships of each of the stakeholders in the data labeling process, from clients to
Action Data’s intermediation, to their partner organization Ranua (pseudonym) in Syria, and the
Syrian data workers (see Fig. 6).
Day 2 introduced the perspectives of the clients — three researchers at a Nordic university. They were working on automating a safety system at the harbor front that would trigger a signal if, for instance, people fell into the water. For this client, the workers in Syria annotated thermal videos
from CCTV footage that would serve as training data. The task was coordinated by Action Data
and outsourced to Ranua’s workers in Syria.
The first group activity of the day was a roleplay (see Fig. 7). We asked each group to reflect on the knowledge required to conduct data-production projects in their assigned roles by filling in information cards. Afterward, each team presented their cards in a discussion round during which each group gave feedback on how accurately its role had been played by the others. The clients in the role of data workers generated many cards comprising questions about labeling type and classes, edge cases, desired output, and annotation precision (see Fig. 7). However, the feedback of


[Workshop board, November 2021: "Stakeholders map: Who is who in data labeling?", mapping the university client, Action Data (Bulgaria), Ranua (Syria), and the data workers, with notes on workflows, quality control, payment, and the translation of guidelines into Arabic.]
Fig. 6. Stakeholder map showing the three groups participating in the second workshop session, the nature
of their relationship, and the workflows that characterize it. The map was developed with the workshop
participants. The real names of the organizations and individual actors, originally included in the stakeholder
map, have been changed to be included in this paper.

the actual data workers uncovered two further issues that the clients generating the questions had
not contemplated. First, workers wanted to know how much they would earn. And second, it was
important to them to include information about the ethical implications of the data they had to
work with in terms of potential harms to data subjects as well as concerns about the mental health
of data workers dealing with violent, offensive, or triggering data.
In the same discussion, workers considered the required annotation precision an important type of information because it defines the difficulty of the task and can negatively affect the wages they receive. This requirement is usually negotiated between requesters and managers without the participation of data workers. The dispersion, intermediation, and power of actors like Action Data and Ranua also render pricing negotiation inaccessible to workers. These negotiations occur primarily between the client and Action Data, with little participation of managers from Ranua.
Following these discussions, participants acknowledged that one of the main problems of the
documentation prototypes developed by Action Data is that they only concerned managers and
clients while ignoring data workers. Moreover, instruction documents for workers are translated into
Arabic by Ranua’s management due to workers’ limited knowledge of the English language. In this
sense, the English-based documentation prototypes proposed by Action Data remain inaccessible
to most Syrian workers.
Finally, the last activity of day 2 prompted participants to imagine new forms of documentation
by thinking of its format, structure, information, and the stakeholders involved in its development
and access. Similar to the Alamo team, Action Data’s workers prioritized a view of documentation
as a communication medium. Mostly, they acknowledged that documentation should allow the
co-creation of task instructions with different stakeholders, particularly workers. In the words of
one of the Syrian workers:

Communication and feedback should be the highest priorities. The biggest loser when
communication is lacking is the annotators because they spend more effort and time
adapting their work each time.
In addition, by increasing communication between different stakeholders, documentation could integrate feedback, make data-production processes clearer, and reduce redundant work. Furthermore, participants discussed how a post-mortem report reflecting on finished projects could produce knowledge for future tasks. Such a document could include missing information, met goals, fairness in pricing and wages, the quality of the work environment, difficulties, and lessons learned.

[Workshop board, November 2021: "Roleplay: generating questions | Group A — Obtaining and sharing information: Labelers", collecting the questions labelers might need answered (e.g., about classes, edge cases, ethics, payment, and annotation precision) and who holds that information.]

Fig. 7. Roleplay activity in breakout groups. Here the requester group had to generate questions (the yellow
cards) assuming the role of the data labelers. Afterward, the data labelers provided feedback and added their
own questions (in the blue cards).

Finally, on day 3, we hoped to enable a space for data workers to express their opinions and
wishes without the presence of managers. Although all workshop sessions were designed to be about documentation practices, the data workers' focus was on more pressing needs: earning a living wage and being able to provide for their families. Accordingly, we opened the floor to discuss these issues and talk about possible approaches. After the workshops, we helped facilitate a dialogue [12] between the Syrian data workers and Action Data's management.
One of the main issues described by the workers was that they received the same payment regardless
of their effort or time spent on the tasks. One of the data workers explained the general feeling among
her colleagues:
Some feel that the profit or the earnings they’re making after the end of the job are not
enough compared to the effort they have put into it.
Another payment issue was workers' lack of room to negotiate their wages. For example, one practice that workers perceived as unfair was that management divided the client's payment equally among all team members regardless of individual effort. Furthermore, the workers
are not consulted in decisions about team composition and cannot influence how many workers
will share the payment:
When we divide the amount of money that we have received for that certain project,
when it’s divided by the number of people working on that project, so it turns out to
be something completely unfair.
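To illustrate the concern with purely hypothetical numbers (not figures reported by our participants), the sketch below contrasts the equal split described by the workers with an effort-weighted split, one of many conceivable alternatives rather than a recommendation of this paper.

def equal_split(total_payment: float, team_size: int) -> float:
    # Practice described by the workers: every team member receives the same
    # share of the client's payment, regardless of individual effort.
    return total_payment / team_size

def weighted_split(total_payment: float, annotations: dict) -> dict:
    # Alternative: share the budget in proportion to completed annotations.
    total = sum(annotations.values())
    return {worker: total_payment * count / total
            for worker, count in annotations.items()}

# Hypothetical project: a client pays 300 units; three workers annotate
# 500, 300, and 200 items respectively.
counts = {"worker_a": 500, "worker_b": 300, "worker_c": 200}
print(equal_split(300, 3))           # 100.0 for everyone
print(weighted_split(300, counts))   # {'worker_a': 150.0, 'worker_b': 90.0, 'worker_c': 60.0}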
Tensions were observed during the discussion of the tools used to perform data work. The limitations of the tools require workers to share accounts and keep a shared spreadsheet recording the number of annotations each worker has completed per task. This makes individual performance visible, accessible, and modifiable to all. One of the workshop participants explained that this was a practical choice given the diverse working conditions and locations of workers and the complexity of supervising and managing their work. He further emphasized the mutual trust between management and workers and among workers themselves. While acknowledging the honest behavior and mutual trust within the team, the other participants did not regard trust as a satisfying justification for the limitations of the tools and the breaches of privacy, and pointed out that the problem went beyond trust issues:
I get the question of trust, but I believe that some of the issues that have been raised
don’t really have much to do with trust, but maybe with technical problems or the
adequacy of free tools that, you know, make work easier or more difficult.
For one thing, current tools and arrangements made it difficult for workers to keep track of and secure the tasks they individually performed. For another, they did not adequately protect privacy, which could potentially undermine team solidarity, as some participants explained. Therefore, other workers advocated for better tools and arrangements to facilitate work and enhance privacy and security. As a potential solution related to documentation efforts, the workers suggested that documentation should reflect how much workers are being paid and the purpose of their work, while the individual earnings of each worker should remain private.
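As a minimal sketch of that suggestion (field names and roles are our assumptions, not a system used at either company), a documentation record could expose the project's purpose and total payment to everyone on the team while revealing individual earnings only to management and to the worker concerned:

def payment_view(record: dict, viewer: str, role: str) -> dict:
    # Return a filtered copy of a project record: purpose and total payment
    # are visible to everyone; individual earnings are only visible to
    # management or to the worker they belong to.
    view = {"purpose": record["purpose"], "total_payment": record["total_payment"]}
    if role == "manager":
        view["earnings"] = dict(record["earnings"])
    else:
        view["earnings"] = {w: amount for w, amount in record["earnings"].items()
                            if w == viewer}
    return view

# Hypothetical record reusing the numbers from the previous sketch.
project = {
    "purpose": "thermal video annotation for a harbor safety system",
    "total_payment": 300,
    "earnings": {"worker_a": 150, "worker_b": 90, "worker_c": 60},
}
print(payment_view(project, viewer="worker_b", role="worker"))
# -> purpose, total payment, and only worker_b's own earnings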

4.3 Salient Considerations and Summary of Findings


Before the workshops, we observed two different orientations in the ways in which the BPOs
implemented our feedback to create the documentation prototypes. While Alamo’s documentation
prototype (the Wiki) was oriented toward the preservation of knowledge that could be useful to train
(future) workers, Action Data had oriented the design of its documentation templates (especially the Scope of Work) toward maintaining inter-organizational accountability, i.e., clearly stating clients' expectations and workers' responsibilities [70].
After the workshops, however, we learned that the needs of data workers at both organizations
were not significantly different in terms of what they wanted to see reflected in the documentation. The
apparent diverging orientations adopted by Action Data and Alamo were based on the corporate
needs of each organization because, up until that point, the design process had been piloted by
the companies' founders and managers. Engaging in hands-on design sessions directly with data workers allowed us to explore their challenges and needs beyond corporate priorities and gave rise to a different set of considerations regarding documentation.
To summarize our findings, we present here five design considerations derived from our long-
term engagement at both research sites. As a step toward actionable documentation guidelines, we add concrete questions to illustrate how these considerations could flow into a documentation framework based on data workers' needs. We present these documentation items
as questions because that is how the data workers expressed them at the co-design workshops as
we prompted them to re-imagine documentation (see Fig. 8). For the most part, these observations
can be generalized to both BPOs. We added specific notes where differences between the research
sites were observed.

• The documentation of data production should be collaborative. Many actors take part
in data production processes and their documentation. The roles can vary according to the
context. There are, however, two patterns that can be derived from our findings. First, most of
the information about tasks comes from the requesters and is documented by managers. The
managers act as gatekeepers, only granting access to information according to the client’s
confidentiality requests. Second, data workers receive only minimum information that allows
them to complete tasks according to the client’s instructions. To break with these patterns,
workers must be acknowledged as collaborators and be granted access to information.
→ Who is the requester? The data workers would like to see information about the client,
its products, and its relationship history with the data processing company included in the
documentation. This information could help workers get an idea of “what to expect” from
each project and produce strategies to collaborate more effectively.
→ Who else is working on this project? Information about team composition is helpful for
the coordination and division of labor. Moreover, at Action Data where the project budget is
divided among the team members, it is essential for data workers to know how many people
will be working on the project because their earnings depend on that.

• The documentation of data production should enable communication. It should transport information that is useful and intelligible to diverse actors and organizations. Language
could present a barrier for geographically distributed teams, as reported by Action Data
workers. Moreover, communication should not just flow top-down. In this sense, data work-
ers at both sites emphasized the need to enable communication channels for feedback from
workers to reach clients.
→ What kind of data is this? Where does it come from? Data workers and managers at
both BPOs argued that requesters should communicate key facts about the data, i.e., data
provenance, dataset’s previous use, and proof of compliance with local regulations, especially
in the case of personal data. Moreover, Action Data participants suggested including a warning
to inform workers if projects involve violent or offensive material.
→ What have we learned from this project? Participants at both sites argued for the inclusion
of a post-mortem report after project completion to reflect upon edge cases, lessons learned,
and possible mismatches between time, effort, and payment. Alamo workers considered this
to be a useful way to preserve knowledge for their professional growth and for future projects.
In contrast, Action Data workers saw such reports as a useful communication channel to provide feedback to requesters, especially on whether pricing and wages were fair vis-à-vis the labor involved in the task.

• The documentation of data production should not be seen as a one-time action. Data
production is often messy and iterative. ML datasets evolve and change over time. Therefore,
data production requires living, evolving documentation.
→ How has this project evolved over time? Participants at both sites favored the inclusion of
an update history log, including the date, time, and name of the person who documented
each update. Given the constant negotiation and modification of task instructions and the
adaptation of data work projects, a history log could keep important decisions recorded and
task instruction and requirements updated.

Proc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 510. Publication date: November 2022.
144
Paper 5: Documenting Data Production Processes

Documenting Data Production Processes: A Participatory Approach for Data Work 510:23

→ What is the data going to be used for? Data workers at both sites expressed the need to
receive more information about the ML pipeline and dataset maintenance and stewardship.
Specifically, the participants wanted documentation to include a description of the ML product
that will be trained on the basis of the data produced by them. Regarding the involvement
of sensitive or personal data, Action Data workers suggested asking clients for a written
commitment that the data will not be used for harmful purposes. In comparison, workers in
Alamo pointed to the need for clear instructions on how to process and store sensitive data
to enhance privacy.

[Workshop board, October 2021: "Rethinking documentation | Group A", deconstructing documentation along format (questionnaire, wiki or website, spreadsheet, checklist), sections (objectives, instructions, ethics and security), information, actors, and actions, guided by simplicity, accessibility, and integration.]

Fig. 8. Template for one of the co-design exercises carried out at the workshop series with Action Data and
Alamo. The activity prompted data workers to deconstruct and re-imagine documentation practices.

• The documentation of data production should be integrated into existing workflows and routines. This could help foster a view of documentation as a constitutive part of
data production and not as extra work. Action Data workers expressed that documentation
could help them understand projects and their requirements better, which would ease their
everyday tasks. Alamo workers mentioned that the preservation of information might come
in handy and help them save time in future projects or to train future workers.
→ What should be done with the data? The participants asked to receive a detailed description
of tasks, including categories and classes for data annotation and collection, edge cases,
and the required quality of annotations. One important point mentioned by Action Data
workers was to include information about annotation precision and acceptance and rejection
standards which could negatively impact their wages.
→ What tools are used? This includes clear guidelines in terms of communication tools, the software used to work on the data, and the repository where the data is to be stored. Alamo
participants mentioned that the use of certain tools (e.g. Photoshop) has sometimes increased
the interest of workers to join certain projects because they wanted to learn that specific
software tool.

Proc. ACM Hum.-Comput. Interact., Vol. 6, No. CSCW2, Article 510. Publication date: November 2022.
145
5. Co-Designing Documentation for Reflexivity and Participation

510:24 Milagros Miceli et al.

• The documentation of data production should be adaptable to stakeholder needs. We base the design considerations presented in this paper on the needs and wants of data
workers and acknowledge that other stakeholders such as requesters, data subjects, or
policymakers might have different requirements. Moreover, even if the suggestions and needs
of the data workers at both BPOs were similar, some subtle differences could be observed. For
instance, the workers at Action Data attached value to the evaluation of project workload and
payment details, which would help them assess and decide the extent of commitment to the
project, based on the perceived fairness of the wages. In contrast, workers in Alamo placed more
emphasis on documenting the evolution of projects because the company runs periodical
evaluations based on individual and project-based performance metrics.
→ How difficult is the task? How labor-intensive is this data? Action Data workers suggested
including a “difficulty rating” for workers to quickly estimate the workload and time required
for each project. In a similar way, Alamo workers mentioned that an overview of the tools
used in each project would help workers understand the difficulty and the knowledge required
for specific tasks.
→ How much does this project pay each data worker? Clear information about wages, payment structure, and rules was the most highlighted need at the workshops with Action Data workers because they are paid according to the project and the task. In contrast, Alamo workers, who receive a fixed salary independent of the project, prioritized information about client expectations and quality standards that would help them complete tasks “according to what each client has in mind,” which could have a positive impact on workers' performance reviews. A sketch of how the questions gathered across these considerations could be combined into a single documentation record follows this list.
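The following is a minimal sketch, under our own assumptions, of how the workers' questions listed above could be combined into a single, machine-readable documentation record with an update history. The field names are illustrative only; they are not a framework used or endorsed by Alamo or Action Data.

from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime

@dataclass
class UpdateLogEntry:
    # "How has this project evolved over time?"
    timestamp: datetime
    author: str
    change: str

@dataclass
class DataProductionRecord:
    # Each field maps to one of the questions raised by the data workers.
    requester: str                    # Who is the requester?
    team: List[str]                   # Who else is working on this project?
    data_provenance: str              # What kind of data is this? Where does it come from?
    sensitive_content_warning: bool   # Does the project involve violent or offensive material?
    intended_use: str                 # What is the data going to be used for?
    task_instructions: str            # What should be done with the data?
    tools: List[str]                  # What tools are used?
    difficulty_rating: Optional[int]  # How difficult is the task? How labor-intensive is this data?
    payment_terms: str                # How much does this project pay each data worker?
    update_history: List[UpdateLogEntry] = field(default_factory=list)
    post_mortem: Optional[str] = None # What have we learned from this project?

Appending an UpdateLogEntry whenever instructions change would realize the update history log favored by participants at both sites.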

5 DISCUSSION
Our findings show that data workers are expected to use the documentation to perform better
and learn new skills. However, they receive only the bare minimum information to allow them
to complete tasks according to clients’ instructions. Workers’ feedback is rarely considered and
only occasionally integrated into task instructions. In addition, access to documents is not always
granted (as reported by Alamo’s workers), and language translations can constitute a barrier (as
discussed by Action Data’s workers). The production of ML datasets involves many iterations
and the collaboration of actors working in different organizations that are often geographically
distributed. Moreover, datasets are frequently updated, relabeled, and reused even after being
produced. Therefore, the challenge consists of creating documentation that is able to capture the
evolving character of datasets and the intricacies of data work. To address this challenge, we
advocate for a shift of perspective in the development and implementation of data documentation
frameworks: from documenting datasets toward documenting data production.
Data production documentation should travel across multiple stakeholders and facilitate their
communication. In this sense, documentation should register iterations in production processes and
enable feedback loops. Documenting data production processes means making collaboration instances
and discussions explicit. By including detailed information about tasks and wages and enabling
the feedback of workers, documentation could make precarious labor conditions in outsourced
data work explicit and, therefore, contestable. Previous data documentation initiatives in ML were
rooted in the value of transparency and motivated by the need to inform consumers and the general
public about dataset characteristics and composition. Our research shifts the focus away from the
notion of transparency and toward promoting reflexivity [31, 70, 94] in terms of how tasks are laid
out and how datasets are produced.

The acknowledgment that this form of documentation works as a boundary object [15, 101]
can be helpful in approaching the challenges arising from coordinating and collaborating across
organizations [82]. In this sense, we recognize two contributions that the study of data documenta-
tion as a form of boundary object can bring to the line of research on developing documentation
frameworks for datasets. First, while previous work on data documentation often assumes a single
and consistent notion of “dataset producer,” the boundary object notion helps to highlight the
multiplicity of involved actors and needs. This is especially important in data production, as actors
that are often globally distributed may have different — and even divergent — goals, priorities, and
work requirements. Second, this acknowledgment helps draw more attention to the dynamics
between formal standardization and local accommodations, making the local tailoring of documen-
tation practices visible. As described by Star [100], we believe that this local adaptation is also a
form of invisible work. Useful as the existing data documentation frameworks are, they remain
relatively stable and standardized. In this regard, making visible how a documentation framework accommodates local specificities and requirements, which can be observed in how it operates as a boundary object, has value for the design of documentation frameworks.
Three aspects identified and explored by Star [100] and Star and Griesemer [101] could be the
starting point for valuable considerations regarding the functioning of documentation as a boundary
object. The first aspect is interpretive flexibility, understood as the adaptability of boundary objects
to be interpreted, used, and mobilized differently depending on the communities that use them. In our findings, this is manifested in how workers conceive of data documentation primarily as a medium to facilitate the co-creation of task instructions, while managers see documentation as an important site for preserving knowledge and improving performance. Moreover, companies use documentation to hold each other accountable in the event of discrepancies [70], and researchers use it to promote transparency and reproducibility [37, 38]. The second aspect is the material
and organizational structures of boundary objects, which arise from the information needs and
work requirements of different communities. Such material and organizational structures could
include differential access to information, the ownership of information among workers, and the
incorporation of feedback instances. In this sense, the questions included in the previous section
reflect the needs and requirements of data workers regarding documentation. The third aspect
is the tension between the loosely structured common use and more strongly structured local use of
boundary objects. In view of the ideas expressed by our participants, such tension could be eased by incorporating text-based approaches to document processes, visual elements such as video and images to present the documentation, interactive features to retrieve information, and a translation aid to facilitate the process.
While this paper centers on the needs and requirements of data workers, future research could
approach the perspectives of other stakeholders, the relationships and communication among them,
and how they shape the development of documentation in intra- and inter-organizational contexts
through the lens of boundary objects.

5.1 Design Implications: Documentation for and with Data Workers


Our findings show that designing documentation based on data workers’ needs and ideas requires
careful consideration of the unique conditions present in machine learning supply chains and in
specific data production settings. For instance, the fact that Action Data is in the European Union
while Ranua and the data workers are in the Middle East requires higher coordination efforts
than data projects at the Argentina-based Alamo. Similarly, while all workers in Alamo are native
Spanish speakers, in the case of Action Data, English is used as a vehicular second language. Given
that most data workers do not speak English, the language choice renders coordination — and
documentation — efforts challenging. Such variations entail several implications for the design of
data production documentation that we list in the following.
First, access to information must be guaranteed for documentation to support collaboration.
However, different groups might have diverging needs regarding privacy and security [55]. In this
sense, trust in collaboration [76] must be supported by the documentation framework to foster a
view of data workers as essential collaborators who need access to information to produce better
data [68]. Some of the approaches mentioned by our participants included promoting the ownership
of information among workers and differentiating personal and group information when restricting
access.
Second, requesters could benefit from the early feedback of data workers regarding task design
and instructions. This feedback should also address possible mismatches between payment and
effort. For such an exchange to be possible, documentation frameworks must enable feedback
loops and transparent information about tasks, intended uses of the datasets as well as wages and
payment conditions.
Third, to facilitate the integration of documentation into data production workflows, the information input should be simple. In this sense, the workers favored standardized text-based options such as checklists or multiple-choice questionnaires. Conversely, visual forms were favored for presenting documentation. Some of the participants' ideas included integrating images and short videos to illustrate instructions. Moreover, conciseness, good structure, and the integration of a search function and interactive elements were considered crucial to the use of documentation.
Fourth, the mismatch between a documentation input that should be text-based and a documen-
tation output that should be visually appealing must be considered. The participants discussed the
integration of some form of translation aid, in this case, to turn a concise checklist into a compelling
and visual document. Translation was also considered in linguistic terms, for instance, in the case
of Action Data, where most of the information is documented in one language (English), but the
documentation users need it in another (Arabic).
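As a minimal sketch of these two implications under our own assumptions: documentation could be entered as a concise checklist and rendered into a more visual output, with a translation hook standing in for whatever translation aid (human or automated) a team chooses. The function and field names are hypothetical.

from typing import Callable, Dict

def render_checklist(items: Dict[str, bool],
                     translate: Callable[[str], str] = lambda s: s) -> str:
    # Turn a text-based checklist (the simple input favored by workers) into a
    # more visual summary, optionally translating each item for readers who
    # use the documentation in another language.
    lines = []
    for item, done in items.items():
        mark = "[x]" if done else "[ ]"
        lines.append(f"- {mark} {translate(item)}")
    return "\n".join(lines)

# Hypothetical usage with an identity "translation" for illustration.
checklist = {
    "Task instructions reviewed with the team": True,
    "Edge cases documented": False,
    "Payment terms confirmed": True,
}
print(render_checklist(checklist))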
Fifth, the design of a boundary object such as documentation is not a neutral process [45, 48]. In
fact, boundary objects are shaped within power relations that inform how they are understood and
enable specific interpretations out of a general interpretative flexibility [45, 107]. The acknowledg-
ment of such power differentials is at the core of our decision to focus on the needs of data workers
to re-imagine data documentation and is a critical issue to consider when designing documentation
frameworks.
Finally, boundary objects, especially documents, can comprise different, and even divergent,
perspectives of involved participants but do not necessarily require reaching a consensus [100, 101].
Attempts to reach consensus may reinforce hegemonic aspirations and favor the worldviews of
certain groups over others [45, 48]. In this regard, any form of documentation based on workers’
needs should aim to preserve moments of different and even conflicting viewpoints rather than
striving for consensus, as is commonly sought in crowdsourcing tasks.
An initial low-fidelity prototype showing one of the many ways in which our findings and
these design considerations could be integrated into a documentation framework is available at
dataworksheet.com.

5.2 Research Implications: Challenges of Participatory Research


Leaning on the considerations argued by Le Dantec and Fox [57], in what follows, we reflect upon
our work’s limitations and outline the negotiations and interactions through which we constructed
the relationship with our participants. By doing so, we bring to light our position as researchers
within the power relations present at the research sites. We believe that the challenges we discuss
here are not exclusive to our investigation and can be understood as more general implications for
participatory research in CSCW.
First, conducting participatory research with organizations does not mean having full access to
observe everything within them. The challenges of negotiating access to corporate sites are not
new within CSCW [35, 77, 111, 114]. In many cases — including this research — the participating
organizations influence what researchers are allowed to see and to whom they are allowed to talk.
This affects the study sample since, often, individual participants are selected by the organizations’
management. In this sense, our observations were enabled but also constrained by the organizations
that hosted us.
Second, co-designing with partner organizations does not always mean getting to design with
the intended community. Often, the gatekeeping tendency of organizations constitutes a barrier
for researchers to establish rapport with participants and meaningfully engage in participatory
processes with them. In our case, even though we had engaged in in-depth interviews with data
workers and their views were prioritized in our feedback to the BPOs, the development of the first
documentation prototypes was piloted by the companies’ founders and managers, and, therefore,
based on corporate priorities instead of workers' needs. The co-design workshops showed us that enabling spaces for workers to lead the design process reveals ideas and requirements that had not surfaced before due to the intermediation of management.
Third, including culturally and geographically diverse communities in participatory research and
design is as necessary as it is challenging [63, 112]. For instance, in our investigation, working across
language barriers came with its own set of challenges that, at times, counteracted the dynamic
of the workshops. Such cases demand large amounts of researcher adaptability, flexibility, and
creativity.
Finally, being critical while maintaining a relationship with partner organizations requires
researchers to balance their commitments [99]. We acknowledge that a continuous challenge of
our relationship with the BPOs has been finding the right balance between being explicit and
critical with our feedback without losing access to the research sites [114]. This also surfaces an
issue that many participatory projects face, namely, that of balancing contributions to the research
community and to the community that is participating in the design process [57]. Keeping such
balance required us to compromise and adapt while remaining reflexive and true to our commitment
to the data workers.

6 CONCLUSION
In the context of industrial machine learning, discretionary decisions around the production of
training data remain widely undocumented. Previous research has addressed this issue by proposing
standardized checklists or datasheets to document datasets. With our work, we seek to expand that field of inquiry toward a documentation framework that is able to capture heterogeneous (and often distributed) contexts of data production.
For 2.5 years, we conducted a qualitative investigation with two BPO companies, Alamo and
Action Data, dedicated to producing ML datasets. We engaged in a participatory process with data
workers at both sites to explore the challenges and possibilities of documentation practices. The
process involved phases of interviewing, data analysis, presentation and discussion of preliminary
findings, the collaborative development of prototypes, feedback rounds, and a series of co-design
workshops.
Previous practices at Alamo and Action Data reflect a consistent pattern of linear, top-down communication through documentation. However, as our findings show, designing a documentation
framework based on the needs and ideas of data workers involves a different set of considerations.
The data workers prioritized documentation that is able to transport information about tasks,
teams, and payment. They also expected documentation to be able to reflect workers’ feedback
and communicate it back to the requesters. In this sense, task instructions could be co-created
through iterations and communication enabled by documentation. Our findings include design
considerations related to communication patterns and the incorporation of feedback, questions of
access and trust, and the differentiation between the creation and use of documentation.
Together with our participants, we have re-imagined data documentation as a process and
an artifact that travels among actors and organizations across cultural, social, and professional
boundaries and is able to ease the collaboration of geographically distributed stakeholders while
preserving the voices of data workers. Based on this, we have identified and approached several
tensions and challenges that emerge from such a form of boundary object. Breaking with top-down
communication and gatekeeping patterns by enabling feedback loops and promoting reflexivity in
the documentation process have been, in this sense, helpful design considerations.

ACKNOWLEDGMENTS
Funded by the German Federal Ministry of Education and Research (BMBF) – Nr 16DII113, the
International Development Research Centre of Canada, and the Schwartz Reisman Institute for Tech-
nology and Society. Our deepest gratitude goes to the workers and organizations that participated
in this research. Thanks to Christopher Le Dantec, Marisol Wong-Villacres, and the anonymous
reviewers for their comments and suggestions.


5.2 Dynamic Work Sheet: A Prototype


This section presents a low-fidelity prototype that shows how the design considerations provided
in Papers 4 [141] and 5 [124] could be applied in a documentation framework for data production.
A low-fidelity prototype is an initial mockup used to translate high-level design concepts into
tangible artifacts to test functionality rather than the visual appearance of the product. The
prototype is not specific to any type of data-work task or machine learning system. It is simply
an example not meant to be normative or prescriptive. Considering that the investigations
presented in this chapter have multiple implications and can be applied in several forms, this
prototype shows one of the many documentation frameworks that can be derived from the
findings. In the following, I will call this prototype the “Dynamic Work Sheet”.

[Figure 5.9 diagram: the "Classic model" (Client → BPO → Data Worker) contrasted with the "New approach", in which Client, BPO, and Data Worker exchange information directly.]

Figure 5.9: The main idea of the prototype is to break open a typically linear process and turn it
into a participatory one. The goal is to reduce the workload of formulating task instructions and
documenting data work. The collaborative approach should improve communication and enhance
the quality of data by enabling feedback loops and involving data workers in the process.

In view of our findings, two points are key for this prototype:

1. The client needs the expertise of the data worker to actually understand the data and make sure a high-quality dataset is produced.

2. The workers could be more focused and motivated if they were involved in understanding what their labor is for.

In collaborative work, it is important to understand who is providing information and who is making a decision. In the classic model, it is common that the client sends task instructions and the BPO management forwards them to the workers after interpreting and, sometimes, translating them. Questions and technical problems are usually resolved according to the managers' personal interpretation. In our approach, the client still writes the project description and provides training material. However, that information flows into a shared document, confidential to the client, the data workers, and the BPO, that the three parties can access and comment on.
Unlike approaches that strive for consensus in data work, one of the main goals of our approach to documenting data production processes is to allow workers to participate in shaping workflows and tasks, to enable circular communication with requesters, and to preserve moments of disagreement and discussion.
Wikipedia is an example of open, collaborative work at a large scale where dissent is preserved. Diverse and, in many cases, diverging contributions from globally dispersed individuals flow into each Wikipedia article, go through processes of joint creation and editing, and form an evolving document that becomes available to readers. Disputes are often found in discussions, especially on, but not limited to, controversial topics. Conflicts are negotiated and managed in different ways, and such processes are preserved. For instance, the version history might show reverts, which are often applied against attempts to spread misinformation or destroy an article [144], or when certain pages are repeatedly altered and disputed by two authors or groups [145, 146]. Moreover, non-content space can allow and preserve discussions, while the article page remains "clean" [145, 147]. Similarly, the Dynamic Work Sheet is designed to preserve workers' feedback and notes in the comments section, while the main section containing instructions and other important information remains clean and clear. The purpose of the history log is to preserve older versions of the main section.
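
As a rough illustration of this structure, the following Python sketch shows one way the main section, the comment space, and the history log could be represented. It is a minimal sketch under my own assumptions: the class, field, and method names are illustrative and not part of the prototype.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Comment:
        author: str
        text: str                    # workers' feedback and notes are kept, not overwritten

    @dataclass
    class WorkSheet:
        main_section: str                              # instructions and other key information
        comments: list = field(default_factory=list)   # preserved moments of dissent
        history: list = field(default_factory=list)    # older versions of the main section

        def add_comment(self, author, text):
            self.comments.append(Comment(author, text))

        def revise_main_section(self, new_text):
            # The history log stores the previous version before the main section changes.
            self.history.append((datetime.now(), self.main_section))
            self.main_section = new_text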

[Figure 5.10 diagram: the workflow around the dynamic work sheet, annotated in four steps. 1. Client questionnaire: the questionnaire provides three types of information: essential pieces of information about the client; details for the project, e.g., budget, time, etc.; and the work description, which automatically flows into the dynamic work sheet. 2. BPO: the BPO is the first instance to moderate, add, and change text in the document. At the same time, it assigns its workers individual accounts to the project platform. 3. Data worker: the data workers are mainly interested in the work description. They need to fully understand the instructions to complete their work. They add comments and feedback to the instruction document. 4. Client facilitating changes: the client sees comments and accordingly adapts the design of the task.]

Figure 5.10: The Dynamic Work Sheet serves as a boundary object that allows the collaboration of
requesters, BPO managers, and data workers. One important aspect of this framework is that it
enables the co-creation of task instructions with data workers through feedback loops and iteration
instances.

This prototype includes a questionnaire that the requester must fill out to specify the
scope of the request, pricing, deadlines, and instructions. It is the responsibility of the BPO
management to make sure that the client provides the required information. The Dynamic
Work Sheet automatically “translates” that information into an interactive space for the data
workers. In the following, I will describe each section of the system, its interface, and its functions in detail.
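
To sketch how the questionnaire could feed the worker-facing space, the following minimal Python example derives a worker view from the client's answers. The function and field names are my own illustrative assumptions; only the general flow (the client answers a questionnaire, and the work description and payment details reach the workers) comes from the prototype.

    def build_worker_view(questionnaire):
        """Derive the worker-facing part of the Dynamic Work Sheet from the client questionnaire.

        The work description and the details workers need (payment, expected duration)
        are passed on; client-internal information stays with the client and the BPO.
        """
        return {
            "instructions": questionnaire["work_description"],
            "pay_per_item_usd": questionnaire["pay_per_item_usd"],
            "expected_minutes_per_item": questionnaire["expected_minutes_per_item"],
            "comments": [],   # filled in by data workers during the project
        }

    # Hypothetical example, reusing figures from the sample project shown further below:
    sheet = build_worker_view({
        "work_description": "Annotate videos of people at risk of falling into the harbour water.",
        "pay_per_item_usd": 0.2,
        "expected_minutes_per_item": 2.0,
    })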


[Figure 5.11 diagram: the BPO hosts the platform and sends invitations to the client and to the workers. The client provides the questionnaire and exchanges project information (incl. finances) through a client interface; the data workers receive the project description through an annotator interface. Annotation: each worker gets an invite from the BPO to participate in Project A and, depending on whether they have already worked in this environment, can use their existing login information or create a new account.]

Figure 5.11: As the three parties have different interests and responsibilities, it is necessary that
the tool has three different areas of content and access. One for the BPO who is also the host, one
for the client who will receive a guided questionnaire to provide all relevant information, and one
for the workers who will each get an individual account.

[Figure 5.12 mock-up: full overview of the worker's interface for "Project A / Welcome Oliver", combining the elements shown in detail in Figures 5.13 to 5.20: login and project selection, work clock, changelog, dynamic project space with project description and payment details, messenger, project information, team, and private sections.]

Figure 5.12: Interface overview. This figure offers an overview of the worker’s interface and
shows the different sections of the worker’s space. The subsequent figures show specific sections
and features in detail.


[Figure 5.13 mock-up: after logging in, the worker has the option to choose a project (A, B, or C); the option "assign as main project for next login" allows skipping this step. The main project page "Project A / Welcome Oliver" shows (A) a changelog, (B) a work clock for time management with start, stop, auto, and history functions, and (C) collaboration tools (dynamic project space, messenger) next to the fixed menu items Project Info, Team, and Private.]

Figure 5.13: The main project page has three sections: A. A changelog for the whole project,
where all updates are documented. B. A time management tool that allows data workers to have
an overview and check for possible mismatches between the project goal and the time needed
to complete the tasks. This tool should support variations in working hours and, in this sense,
promote fair payment. C. A menu to navigate the different information included on the tool.
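
As a simple illustration of the kind of mismatch check this time-management view could support, the following sketch estimates the working hours needed per person to reach a project goal. The function name and the team size are my own assumptions; the item count and duration reuse the sample project shown in Figure 5.15 (1,200 videos, 200 videos per labeler, 2 minutes per video).

    def hours_needed_per_person(total_items, minutes_per_item, team_size):
        """Estimate the working hours each team member needs to reach the project goal."""
        return total_items * minutes_per_item / 60.0 / team_size

    # 1,200 videos at 2 minutes each, split among 6 labelers (200 videos per labeler):
    print(round(hours_needed_per_person(1200, 2.0, 6), 1))   # -> 6.7 hours per person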

[Figure 5.14 mock-up: the main menu with the items Project Info, Team, and Private.]

Figure 5.14: Main menu. These three menu items remain constant at the beginning of each project and serve as documentation for future reference: Project Information (Fig. 5.15), Team Composition (Fig. 5.16), and Private Information (Fig. 5.17).

[Figure 5.15 mock-up: the Project Info panel. Project description: "Project A is about annotating video material of potential people in danger falling into the harbour water. With the help of the labeled material, we are training a ML model to monitor the harbour. The system is able to release an emergency signal, for instance, if a person falls into the water. Project partner: Scandinavian University. Dr. Peter is our project partner and will be supporting the team." Payment details: 1,200 images/video sequences in total, 200 videos per labeler; expected duration per video 2:00 minutes (30 videos/hour); break time 5 min/hour (2 videos paid per 28 videos completed); payment $0.2/video ($6/hour); payment conditions: weekly payment via Western Union. Core information listed in the margin: what, why, and for whom; payment information; amount of content; expected duration per video; break time yes/no.]

Figure 5.15: Project information. This section includes an easy-to-understand idea of what the data is produced for and who the requester is. This information alone could be important for some workers when deciding whether to continue or abandon the project. The most important part of this section is a clear overview of the earnings that can be expected. As some of our participants mentioned, the fact that data workers are often paid per task rather than by the hour makes it difficult for them to estimate their earnings.
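
The per-task arithmetic in the Figure 5.15 example can be made explicit with a short calculation. The following Python sketch is illustrative only; the function and parameter names are my own, and the numbers come from the example project ($0.2 per video, 2:00 minutes per video, 5 minutes of paid break per hour).

    def estimate_earnings(pay_per_item_usd, minutes_per_item, paid_break_min_per_hour=0.0):
        """Return (items completed per hour, items paid per hour, expected hourly earnings).

        Break time is assumed to be paid at the regular item rate, as in the
        Figure 5.15 example.
        """
        items_paid = 60.0 / minutes_per_item                                    # e.g., 30 videos/hour
        items_completed = (60.0 - paid_break_min_per_hour) / minutes_per_item   # roughly 28 videos
        hourly_usd = items_paid * pay_per_item_usd                              # e.g., $6/hour
        return items_completed, items_paid, hourly_usd

    print(estimate_earnings(0.2, 2.0, 5.0))   # -> (27.5, 30.0, 6.0)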


[Figure 5.16 mock-up: the Team panel, listing who is working on the project ("You: Oliver", "Sandra", "Vera", and two open slots) with a message button for each member. Annotation: a transparent view on who is working on the project, plus a mini profile which may include language skills, professional background, and data work expertise.]

Figure 5.16: Team composition. Our participants were interested in knowing who works in
each project. This would help them foresee how project earnings will be divided and find partners
for information exchange and collaboration.

[Figure 5.17 mock-up: the Private panel with personal information (name, address), experience (e.g., "Labeling people in videos"), languages (e.g., Arabic, English), login credentials, the agreement/contract, and the options "I want to participate", "I do not want to participate", and "I want to stop working". Annotation: personal information and experience; the information is only visible to the worker and the BPO.]

Figure 5.17: Private section. The private page is meant for personal information and any sort
of contract or data security agreement. The information is kept between the data worker and the
BPO. This provides security in both directions and the information is always accessible.
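
One way to think about these different areas of content and access is as a simple visibility table mapping each section of the tool to the parties that may read it. The Python sketch below is an assumption-laden illustration: the role and section names are mine, and only the rules stated in the prototype description (for example, that the private section is visible to the worker and the BPO only) are taken from the figures above.

    from enum import Enum, auto

    class Role(Enum):
        CLIENT = auto()
        BPO = auto()
        WORKER = auto()

    # Which roles may read each section of the tool (section names are illustrative).
    VISIBILITY = {
        "dynamic_project_space": {Role.CLIENT, Role.BPO, Role.WORKER},  # shared, confidential to the three parties
        "project_info":          {Role.CLIENT, Role.BPO, Role.WORKER},
        "team":                  {Role.CLIENT, Role.BPO, Role.WORKER},  # client access here is an assumption
        "private":               {Role.BPO, Role.WORKER},               # personal data and contract
        "client_questionnaire":  {Role.CLIENT, Role.BPO},               # assumed: client-internal details
    }

    def can_read(role, section):
        return role in VISIBILITY.get(section, set())

    assert can_read(Role.WORKER, "private")
    assert not can_read(Role.CLIENT, "private")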

[Figure 5.18 mock-up: the two collaboration tools in the menu, Dynamic project space and Messenger.]

Figure 5.18: Collaboration Tools. This prototype includes two communication and collaboration spaces. The dynamic project space (Fig. 5.19) provides access to the dynamic work sheet and serves as the main communication tool between the three parties, namely, data workers, managers, and requesters. If direct and quick communication is needed, an instant messenger is also included as a second menu point (Fig. 5.20).


[Figure 5.19 mock-up: the dynamic project space, showing the project description with initial training material next to a space for comments, a changelog marking new events since the last login, and links to the annotation tool, the data repository, and other tools. Annotations: task instructions are generally written; in addition, the dynamic project space could support training material such as videos or images showing samples or best practices. Dynamic work descriptions: workers can comment on and highlight points that are not clear. Others could "like" those comments and push them up for supervisors or the client to respond to first. If this exchange leads to an update in the task instructions, the update is highlighted in the instruction document for everyone to see, and workers receive a notification that the instructions have been updated. Updates are mainly important changes to the original job description which everyone should follow as new rules.]

Figure 5.19: Dynamic project space. This is the main communication tool between requesters,
managers, and workers. Communication happens through the document. It is designed to actively
exchange information between all parties of the project. The dynamic project space should enable
the co-creation of task instructions with the participation of data workers and preserve moments
of dissent. It is key for reflexivity and clear communication.
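
To illustrate the feedback loop described above, the following minimal Python sketch orders workers' comments by endorsements and records instruction updates in the changelog. The function names, the notification mechanism, and the example comments are all hypothetical.

    def prioritize(comments):
        """Order comments by 'likes' so the most-endorsed questions are answered first."""
        return sorted(comments, key=lambda c: c.get("likes", 0), reverse=True)

    def apply_update(changelog, note, workers, notify):
        """Record an instruction update in the changelog and notify every worker on the project."""
        changelog.append(note)
        for worker in workers:
            notify(worker, "Instructions updated: " + note)

    # Hypothetical example:
    comments = [
        {"author": "Vera", "text": "Should partially visible people be labeled?", "likes": 4},
        {"author": "Juan", "text": "Night footage is very dark.", "likes": 1},
    ]
    changelog = []
    print(prioritize(comments)[0]["text"])            # the most-liked question comes up first
    apply_update(changelog, "Label partially visible people as well.", ["Oliver", "Sandra"], print)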

[Figure 5.20 mock-up: the instant messenger, with chat tabs for the whole team, the client, the team lead, and individual team members (Sandra, Vera, Juan). Annotation: the instant messenger is only for work communication, without showing email addresses or phone numbers. Any topic could be directly communicated to individual team members, the management, or the client, or be discussed within the whole team. Relevant topics could become updates for everyone to see in the changelog on the main project page.]

Figure 5.20: Instant Messenger. The messenger is a secondary communication tool meant to
help team members to quickly resolve questions and discrepancies, especially in remote working
situations. Unlike the dynamic project space, where comments and feedback are seen by all
participants, communication via the messenger is private.

6 Reflection and Conclusion
6.1 Summary of Findings and Contributions
This dissertation covers three years of research engaging with communities of data laborers in
Argentina and Bulgaria as well as exploring the views and perspectives of data-work managers
and ML practitioners in several regions of the world, including Spain, Kenya, India, Syria,
Germany, Iraq, and the USA.
Chapter 2 critically explored the implications of framing diverse socio-technical problems
as “bias” in machine learning. Through examples related to the study of ML datasets, data
work, and dataset documentation, I argued for a shift of perspective to orient efforts toward
considering the effects of power asymmetries on data and systems. The chapter outlines the
research agenda that I followed in the studies included in this dissertation.
Chapter 3 presented an investigation of the sensemaking of image data as performed by data
annotators, based on the fieldwork I conducted at S1 and S2. In this chapter, I discussed how
meanings are naturalized in and through the labor that data annotators perform. The chapter
situates data production in specific organizational contexts and introduces a socio-economic
dimension to the analysis of data annotation.
In Chapter 4, I presented an analysis of task instruction documents and interviews, as well as observations of work interfaces on crowdsourcing platforms and at a BPO. The chapter
introduces labor as a fundamental dimension of AI ethics, argues that the “wisdom of crowds” is
a myth, and proposes three ways of counteracting the precarization, alienation, and surveillance
of data workers: making worldviews encoded in task instructions explicit, thinking of workers
as assets, and empowering them to produce better data.
Chapter 5 explored documentation practices as a way to address and make explicit the power
differentials described in the previous chapters. Through a participatory design engagement
with data workers at S1 and S2, I proposed re-imagining data documentation as a process
that can ease the collaboration of geographically distributed stakeholders while preserving the
voices of data workers. The chapter closed with a low-fidelity prototype showing how these
findings could be implemented into a documentation system for data production processes.


Table 6.1: Overview of papers, research questions, methods, and findings.

Paper 1: STUDYING UP MACHINE LEARNING DATA. WHY TALK ABOUT BIAS WHEN WE MEAN POWER? [37]

Topic: Literature survey in three important HCI and CSCW research sub-fields: data quality, data work, and data documentation. The paper discusses the futility of framing diverse types of injustice and harms as "bias," and argues that the study of power could help expand the field of study and better account for such issues.
Methodology / Data: Literature.
Findings:
• Data is always biased: data represents specific, arbitrary "truths." Data is produced in settings shaped by unequal social relations.
• The bias framing at the core of many HCI and CSCW investigations overlooks the political character of data and is insufficient to address injustice produced and perpetuated in data.
• To "study up" data and account for power, different research questions, methods, and expertise are necessary. Interdisciplinary collaboration and dialogue are key.
• Power-aware research agenda: study of labor conditions, institutional practices, infrastructures, and epistemological stances encoded into datasets.

Paper 2: BETWEEN SUBJECTIVITY AND IMPOSITION. POWER DYNAMICS IN DATA ANNOTATION FOR COMPUTER VISION [85]

Research Questions:
1. How do data annotators make sense of data?
2. What conditions, structures, and standards shape that sense-making practice?
3. Who, and at what stages of the annotation process, decides which classifications best define each data point?
Methodology / Data: GTM/fieldwork, interviews, and observations.
Findings:
• Data annotation is a sense-making practice in which data workers assign meaning to data using labels.
• It involves several actors and iterations that form a hierarchical structure.
• Meanings are imposed on data workers and, through their labor, on data as well.
• Clients hold the power to impose their preferred "truths" on data as long as they have the financial means to pay workers to execute that imposition.
• Those arbitrary meanings are naturalized in the process. Power differentials and the epistemic authority of managers and clients are naturalized as well.
• Documentation and reflexivity are key to breaking with naturalization.

Paper 3: THE DATA-PRODUCTION DISPOSITIF [38]

Research Questions:
1. What discourses are present in task instructions provided to outsourced data workers?
2. How do outsourced data workers, managers, and requesters interact with each other and with instruction documents to produce data?
3. What artifacts support the observance of instructions, and what kind of work do these artifacts perform?
Methodology / Data: Dispositif analysis of instruction documents, interviews, and observations.
Findings:
• The data-production dispositif is the network of discourses, work practices, hierarchies, subjects, and artifacts comprised in ML data work and the power/knowledge relationships that are established and naturalized among these elements.
• Instead of seeking the wisdom of crowds, requesters use task instructions to impose predefined "truth values" that respond primarily to profit-oriented interests. Managers in BPOs and algorithms in labor platforms are in charge of overseeing the process.
• Poverty and dependence in the areas where data work is outsourced foster the unquestioning obedience of data workers to instructions.
• Documents, tools, and interfaces are designed to constrain workers and guarantee their obedience.
• Labor is a fundamental aspect of AI ethics. Fighting workers' precarization, alienation, and surveillance is key to counteracting the data-production dispositif.

Paper 4: DOCUMENTING COMPUTER VISION DATASETS. AN INVITATION TO REFLEXIVE DATA PRACTICES [141]

Research Questions:
1. How can the specific contexts that shape the production of image datasets be made explicit in documentation?
2. Which factors hinder documentation in this space?
3. How can documentation be incentivized?
Methodology / Data: GTM / interviews.
Findings:
• Documentation is a tool sensitive to power. Reflexivity is a pre-condition for documentation: reflexivity as a collective consideration of the social and intellectual factors that lead to praxis.
• Factors that hinder documentation in ML data production: the variety of actors involved, the different purposes and forms of documentation, the perception of documentation as burdensome, and problems around the intelligibility of documentation.
• Incentives for documentation: preservation of knowledge, inter-organizational accountability, auditability, and regulatory intervention.

Paper 5: DOCUMENTING DATA PRODUCTION PROCESSES. A PARTICIPATORY APPROACH FOR DATA WORK [124]

Research Questions:
1. How can documentation reflect the iterative and collaborative nature of data production processes?
2. How can documentation practices contribute to mitigating the information asymmetries present in data production and ML supply chains?
3. What information do data workers need to perform their tasks in the best conditions, and how can it be included in the documentation?
Methodology / Data: PDM, interviews, observations, prototyping, workshops.
Findings:
• From documenting datasets towards documenting data-production processes.
• Documentation should allow workers to intervene in and contest workflows and production processes. Preservation of dissent is key.
• Documentation is a boundary object. It should transport information about tasks, teams, and payment to data workers and communicate workers' feedback back to the requesters.
• Documentation should be integrated into existing workflows and routines, and not be seen as a one-time action.

The papers included in this dissertation provide concrete findings in terms of the power
relationships that are present in data work, how these shape ground-truth data and ML
systems, and how the documentation of data production processes can help make some of
these dynamics explicit and contestable. Table 6.1 offers an overview of the research questions
and findings included in each of the papers.
The main contributions can be summarized as follows: (1) This dissertation expands the
field of data bias and bias in crowdsourcing by showing how arbitrary truth values are imposed
onto data workers and, through their labor, onto datasets as well [85]; (2) it shows that labor
conditions in data work are a fundamental aspect of “AI ethics” [38]; (3) it expands the field
of transparency and documentation in ML by showing that a view of datasets as processes,
not fixed entities, can be fruitful to design documentation tools that foster reflexivity and
empower workers [141, 124]; (4) finally, this dissertation introduces a valuable methodological
contribution to the study of data and socio-technical systems by providing a mode of analysis
to approach them through the lens of power [37, 38].


6.2 Knowledge Transfer and Science Dissemination


In my two and a half years working as a doctoral researcher, I published several papers (see
List of Papers Published as Part of my Doctoral Work, p. xix). In addition to conducting
studies and publishing results, knowledge transfer is a very important part of my work at
the Weizenbaum Institute. Since Weizenbaum researchers generally do not have teaching
opportunities, the transfer of knowledge and the dissemination of our findings is carried out
through social media channels, guest lectures, blog posts, and the participation in and the
organization of workshops and conferences. Media attention is key.
I have been invited to hold talks at the University of Cambridge, University College London,
DiPLab (Institut Polytechnique de Paris), the NoBias project, BIFOLD, and TU Berlin. I have
also presented papers at top conferences such as FAccT, ICA, CSCW, CHI, and AIES. I have
furthermore participated in and presented my research at numerous workshops and symposia,
which required participants to apply by submitting a paper. Moreover, I have co-organized
three science dissemination workshops, one focused on the use of algorithmic tools in public
employment services [148], the second on data work across domains [50], and the last one on
bringing together academic and activism-based knowledge around data production and use in
Latin America [149].

Figure 6.1: At the workshop Crossing Data: Building Bridges with Activist and Academic
Practices from and for Latin America, which was part of CHI 2022, the participants worked on
creating a “zine” to reflect the discussions and ideas from the workshop.


The papers presented in this thesis have received excellent feedback and reviews, as well as
positive attention in the press and across social media platforms. They were mentioned in
hundreds of tweets around the globe, some of them by leading researchers whom I personally
admire and by renowned journalists and activists. Our investigation into the power dynamics
involved in image data annotation (Paper 2 [85]) received a Best Paper Award at the CSCW
2020 Conference from among more than 1000 submissions. Paper 3, The Data-Production
Dispositif, was awarded an Honorable Mention, a Methods Recognition, and an Impact Award
at the CSCW 2022 Conference.
My research was featured in leading international news outlets such as MIT Technology
Review, Die Tagespost, VentureBeat, El Economista, and Fortune. I was interviewed for
Netzpolitik, DW-Deutsche Welle, Dutch public television, Der Spiegel, Página 12, and numerous
podcasts. I furthermore wrote contributions as a guest writer for several blogs and magazines.
However, the response I am most proud of is knowing that my work is read by data workers,
as some have mentioned. Seeing my research cited by community organizers and union leaders,
and seeing it have a real impact on real people, has been the greatest joy. In this sense, among
the many encouraging reviews that my papers received, I cherish the one received for Paper
3 the most: “I imagine this paper will also be important beyond the research community
for workers, organizers, and policy-makers.” Honestly, that is the kind of impact that I look
forward to the most.

6.3 Future Research


In the course of the collaboration projects and studies included in this thesis, various interesting
directions for future work were identified. I consider the following to be most relevant:

• Translation and access to findings: The type of work that I believe to be most urgent is
that of translating existing research to make it accessible to the general public, particularly
the affected communities. Translation could be in terms of language — considering,
for instance, that most of the data workers participating in this research do not speak
English — but also in terms of format and presentation. Conducting participatory
research and fighting extractive practices means balancing the researcher’s commitments
to science and to the participants [150, 116]. It also means making conscious decisions
about who the audience of our research is. In this sense, I consider that working on
making my findings accessible to communities of data workers and organizers through
visual, interactive, and language-appropriate formats is, at this point, a more pressing
issue than publishing another research paper.

• Dissent in data work: Human-assisted AI — or ML data work as it has been referred


to throughout this dissertation — often relies on inter-rater agreement and consensus to
establish truth values that will be encoded in models via training data [22]. However,
as we have seen, data work does not happen in horizontal and democratic contexts but
is, like any form of labor, entangled with power asymmetries and top-down impositions.
In such contexts, attempts to reach consensus may reinforce hegemonic aspirations and
favor the worldviews of powerful groups over others [151, 152]. As I have argued in these


pages, a shift of perspective is needed toward a view of data workers as important assets
to produce better data. This shift of perspective implies creating avenues for workers to
provide feedback to requesters, as I have proposed in relation to documentation. In this
sense, workers’ dissent might be useful for flagging broader data quality issues. As Aroyo and
Welty [134] have suggested, future work could aim at producing empirical evidence of
the usefulness of preserving divergent and even conflicting viewpoints rather
than striving for consensus, as is commonly sought in crowdsourcing tasks (a minimal
sketch of this contrast is included at the end of this section).

• Data cascades: The excellent work by Sambasivan et al. [153] has defined “data cascades”
as compounding events causing negative, downstream effects from data issues—triggered
by conventional AI/ML practices that undervalue data quality and data work. In this
dissertation, I have argued that power asymmetries in data production fundamentally
shape data and systems. Future research should include case studies to identify
specific data cascade effects on specific ML systems, especially those used in high-stakes
scenarios such as health and security. Data cascade studies could be valuable, beyond
the ML research community, to demonstrate the key role of data work — which is often
regarded as not requiring much skill — to ML practitioners and the general public.

• The tech-industry dispositif: As discussed in Paper 4.1.1, our analysis of the data-
production dispositif is bound to remain “incomplete” in the sense that no dispositif
operates on its own. Any attempt to produce a comprehensive account of the data-
production dispositif would need to study its relationship with, among many others, the
“tech-industry dispositif”. Critical aspects of these relationships have been reported in
this dissertation, but future research should delve deeper into the role of ML companies
in the production of datasets to outline how subjective preconceptions and business
interests get encoded in data via problem formulation and value prioritization. The
mode of analysis introduced in Paper 3 could be useful for such future research.

• Documentation and other boundary objects: In this dissertation, I have investigated forms
of documentation based on data workers’ desiderata (see Chapter 5). Future research
could focus on other stakeholders (e.g., data subjects, ML practitioners, or regulators),
albeit with careful consideration of the dynamics of power and collaboration in
the processes of data production described in this work. Moreover, given that Paper 4
has demonstrated that documentation is very often seen as a burden, future research could
expand the field of inquiry to imagine and explore other types of boundary objects — not
just documentation — with the goal of enabling feedback loops and promoting reflexivity
in ML supply chains.
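
To close this list, and as announced in the point on dissent above, the following minimal sketch contrasts the two aggregation logics at stake there. Nothing in it comes from the papers or from a specific platform: the item IDs, labels, and worker judgments are invented for illustration. The first function collapses all judgments into a single “truth value” through majority voting, as is common in crowdsourcing pipelines; the second preserves the full distribution of judgments, in the spirit of approaches that treat disagreement as signal rather than noise [22, 134].

    from collections import Counter

    # Invented example: three workers label the same three images.
    annotations = {
        "img_001": ["vendor", "vendor", "street_seller"],
        "img_002": ["protest", "celebration", "protest"],
        "img_003": ["family", "family", "family"],
    }

    def majority_vote(labels):
        """Collapse all judgments into a single label (the most common one)."""
        return Counter(labels).most_common(1)[0][0]

    def label_distribution(labels):
        """Preserve disagreement by keeping the relative weight of each judgment."""
        counts = Counter(labels)
        total = sum(counts.values())
        return {label: count / total for label, count in counts.items()}

    for item, labels in annotations.items():
        print(item, "| consensus:", majority_vote(labels),
              "| distribution:", label_distribution(labels))

Under majority voting, the minority judgment on img_001 and the conflicting reading of img_002 disappear from the dataset; in the second representation, they remain visible and can be fed back into instructions, documentation, and audits.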

6.4 Reflections on the Limitations of my Research


I have written this dissertation as a first-generation college graduate studying and working
in a foreign country with its own set of bureaucratic rules and politics. The research work
included in these pages is definitely shaped by factors that are external to methodological
choices. These factors include contractual conditions, visa-related issues, and supervision


arrangements as well as the negotiation involved in obtaining field access with organizations
(see Section 1.3.4).
I have learned invaluable lessons doing this work. Some of them are research-related but
many others are about being a better listener, ally, and mentor. As I reflect on the past three
years, I see many things that I could have done better. To close this dissertation, I would like
to share four of those reflections.
First, the inner dynamics of research make avoiding extractive practices extremely hard.
Since my first visits to S1 and S2, I knew I wanted to build meaningful relationships with
the data workers and that this required time. I wanted to give something back to them, help
improve the conditions at their workplaces, and center their voices. I was intentional about
giving participants fair compensation for their time and mindful of their position and mine.
However, the COVID-19 pandemic dashed my plans to go back to the research sites, and the
limited time of my contract led me to prioritize my obligations to research, i.e., to publish and
finish my PhD, over my commitment to give back to the data workers. This was a conscious
decision that I must own, even if influenced by unforeseeable conditions. I knew from the start
that I needed to be careful to avoid extractive practices that are, unfortunately, very common
in participatory research and ethnography. I cannot help but think that I could have done
more and better.
Second, as discussed in Paper 5, doing fieldwork at corporate sites and conducting
participatory research with the intermediation of partner organizations is challenging [154,
155, 156, 157]. The challenges include negotiating access and overcoming the gatekeeping
tendency of organizations that decisively influences what researchers are allowed to see and
to whom they are allowed to talk. In the case of this dissertation, such negotiations affected
the study sample and, ultimately, the findings, since individual participants were selected by the
organizations’ management. Further factors are limitations in the amount of time that workers
were allowed to participate in the interviews, the spatial settings for the interviews made
available by the organizations that often counteracted my attempts to ensure confidentiality
and build rapport, and the watchful eye and continuous interventions of managers during some
of the co-design sessions despite my attempts to center workers’ voices. Most of these obstacles
were tacit and implicit. For instance, the organizations would not explicitly limit the
length of the interviews but, de facto, interviews would get squeezed in right
before lunch breaks or on a Saturday, which, of course, made some interview partners less
inclined to engage deeply with the questions.
Third, remaining reflexive about our own position as researchers vis-à-vis the participants
is as necessary as it is challenging. Throughout my research work, I put much emphasis
on remaining reflexive about my power and the position from which I was conducting research.
These reflexivity exercises were not necessarily about mitigating power differentials (that are
often unavoidable) but rather about acknowledging their existence when interacting with
participants and in reports, papers, and this dissertation. Given the many intersections of
both identity and power, I found that, at times, I made assumptions about the participants
based on my position and privilege. For instance, before my first interviews with data workers
in Argentina, I assumed that we shared a similar background, considering that I, too, was born
in Argentina and raised in a working-class family. However, there is a substantial difference


Figure 6.2: Sketch from my field book showing the place in S2 (Bulgaria) where I conducted
the first interviews included in this dissertation. It was an extremely warm summer day and the
participants sat with me for one hour at a time to be interviewed under a fiberglass ceiling — which
was the only place made available by the Bulgarian BPO. Because it was a Saturday, some of the
participants had small children with them.

between spending one’s childhood in a working-class neighborhood (as I did) and growing up
in a slum (as is the case with most of the data workers at S1). In the same way, assuming
I understand the experience of an extremely poor family migrating from Bolivia to live in a
slum in Buenos Aires based on my own experience as a migrant to Germany is, at the very
least, misleading.
Finally, including culturally and geographically diverse communities in research is not
straightforward. Right from the beginning, I wanted my work to shed light on communities
of workers that had remained largely ignored by researchers and practitioners. For instance,
although this is often considered a no-go in ethnographic research, I decided to conduct fieldwork in Bulgaria
with Arabic-speaking workers even though I speak neither Arabic nor Bulgarian. This decision
came with its own set of challenges that, at times, hampered the communication and
participation dynamics and demanded large amounts of adaptability, flexibility, and creativity.
Unexpectedly, and despite language barriers, I built more lasting relationships with the workers
that I met in Bulgaria than with the Argentine participants.
The four points that I bring up here might seem incidental. However, I believe that they
raise important questions about who gets to do research and where, who benefits from that
research work, and how engaging with underserved communities is never straightforward.
Finally, these questions serve as a reminder that, more often than not, research is what
was possible, despite what the researcher might have envisioned.


6.5 Conclusion
This dissertation is a call to critically examine the set of relations that inscribe specific forms
of power and knowledge in ground-truth data. Such a focus encompasses not only privileged
groups among machine learning practitioners but also the role of researchers and
the intertwined discourses in industry and academia [158]. The orientation to study power is
also an attempt to move the research focus beyond a simplistic view of individual behaviors
and perspectives that, as Paper 1 [37] discusses, ends up allocating responsibilities to data
workers exclusively by portraying them as “bias-carrying hazards” whose subjectivities need
to be tamed.
While the potentially harmful effects of algorithmic biases continue to be widely discussed,
it is also essential to address how power imbalances and imposed classification schemes in data
production contribute to the (re)production of injustice. The improvement of labor conditions
in data work, the empowerment of workers, and the consideration of their labor as a powerful
tool to produce better data, as well as the detailed documentation of outsourced processes
of data production, remain essential steps to allow for spaces of reflection, deliberation, and
audit that can contribute to addressing important social and ethical questions surrounding
machine learning technologies.

References

[1] Catherine D’Ignazio and Lauren F. Klein. Data Feminism. Strong Ideas Series.
Cambridge, Massachusetts: The MIT Press, 2020. isbn: 978-0-262-04400-4.
[2] Alexandra Olteanu et al. “Social Data: Biases, Methodological Pitfalls, and Ethical
Boundaries”. en. In: SSRN Electronic Journal (2016). issn: 1556-5068. doi: 10.2139/ssrn.2886526.
url: https://www.ssrn.com/abstract=2886526 (visited on 08/03/2018).
[3] Lucas Dixon et al. “Measuring and Mitigating Unintended Bias in Text Classification”.
en. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society -
AIES ’18. New Orleans, LA, USA: ACM Press, 2018, pp. 67–73. isbn: 978-1-4503-6012-8.
doi: 10.1145/3278721.3278729. url: http://dl.acm.org/citation.cfm?doid=
3278721.3278729 (visited on 11/03/2019).
[4] Joy Buolamwini and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities
in Commercial Gender Classification”. In: Proceedings of the 1st Conference on Fairness,
Accountability and Transparency. Vol. 81. PMLR, 2018, pp. 77–91. url:
http://proceedings.mlr.press/v81/buolamwini18a.html.
[5] Cathy O’Neil. Weapons of Math Destruction: How Big Data Increases Inequality and
Threatens Democracy. English. London: PENGUIN BOOKS, 2017. isbn: 0-14-198541-0
978-0-14-198541-1.
[6] Solon Barocas and Andrew D. Selbst. “Big Data’s Disparate Impact”. en. In: California
Law Review 104.3 (2016), pp. 671–732. doi: 10.15779/Z38BG31.
[7] Anna Offenwanger et al. “Diagnosing Bias in the Gender Representation of HCI Research
Participants: How it Happens and Where We Are”. en. In: Proceedings of the 2021
CHI Conference on Human Factors in Computing Systems. Yokohama Japan: ACM,
May 2021, pp. 1–18. isbn: 978-1-4503-8096-6. doi: 10.1145/3411764.3445383. url:
https://dl.acm.org/doi/10.1145/3411764.3445383 (visited on 06/29/2022).
[8] Josef Ditrich. “Data representativeness problem in credit scoring”. en. In: Acta
Oeconomica Pragensia 23.3 (June 2015), pp. 3–17. issn: 05723043, 18042112. doi:
10.18267/j.aop.472. url: http://aop.vse.cz/doi/10.18267/j.aop.472.html
(visited on 06/29/2022).
[9] Paul Baker and Amanda Potts. “‘Why do white people have thin lips?’ Google and
the perpetuation of stereotypes via auto-complete search forms”. In: Critical Discourse
Studies 10.2 (May 2013), pp. 187–204. issn: 1740-5904. doi: 10.1080/17405904.2012.


744320. url: https://www.tandfonline.com/doi/abs/10.1080/17405904.2012.744320 (visited on 02/21/2019).
[10] Jahna Otterbacher, Jo Bates, and Paul Clough. “Competent Men and Warm Women:
Gender Stereotypes and Backlash in Image Search Results”. en. In: Proceedings of the
2017 CHI Conference on Human Factors in Computing Systems - CHI ’17. Denver,
Colorado, USA: ACM Press, 2017, pp. 6620–6631. isbn: 978-1-4503-4655-9. doi: 10.
1145/3025453.3025727. url: http://dl.acm.org/citation.cfm?doid=3025453.
3025727 (visited on 08/14/2019).
[11] C. E. Brodley and M. A. Friedl. “Identifying Mislabeled Training Data”. en. In: Journal
of Artificial Intelligence Research 11 (Aug. 1999), pp. 131–167. issn: 1076-9757. doi:
10.1613/jair.606. url: https://jair.org/index.php/jair/article/view/10238
(visited on 08/19/2019).
[12] Justin Cheng and Dan Cosley. “How annotation styles influence content and preferences”.
en. In: Proceedings of the 24th ACM Conference on Hypertext and Social Media - HT
’13. Paris, France: Association for Computing Machinery, 2013, pp. 214–218. isbn: 978-
1-4503-1967-6. doi: 10.1145/2481492.2481519. url: http://dl.acm.org/citation.
cfm?doid=2481492.2481519 (visited on 11/03/2019).
[13] Christoph Hube, Besnik Fetahu, and Ujwal Gadiraju. “Understanding and Mitigating
Worker Biases in the Crowdsourced Collection of Subjective Judgments”. In: Proceedings
of the 2019 CHI Conference on Human Factors in Computing Systems. CHI ’19. tex.ids:
hube2019a event-place: Glasgow, Scotland Uk. New York, NY, USA: Association for
Computing Machinery, 2019, pp. 1–12. isbn: 978-1-4503-5970-2. doi: 10.1145/3290605.
3300637. url: https://dl.acm.org/doi/10.1145/3290605.3300637 (visited on
06/27/2019).
[14] Fabian L. Wauthier and Michael I. Jordan. “Bayesian Bias Mitigation for Crowd-
sourcing”. en. In: Proceedings of the 24th International Conference on Neural
Information Processing Systems. NIPS’11. Granada, Spain: Curran Associates Inc., 2011,
pp. 1800–1808. isbn: 978-1-61839-599-3. url: http://papers.nips.cc/paper/4311-
bayesian-bias-mitigation-for-crowdsourcing.pdf.
[15] Bhavya Ghai et al. “Measuring Social Biases of Crowd Workers using Counterfactual
Queries”. en. In: Honolulu, HI, USA, Apr. 2020. url: http://fair- ai.owlstown.
com/publications/1424 (visited on 05/08/2020).
[16] Emily M. Bender and Batya Friedman. “Data Statements for Natural Language
Processing: Toward Mitigating System Bias and Enabling Better Science”. en. In:
Transactions of the Association for Computational Linguistics 6 (2018), pp. 587–604.
doi: 10.1162/tacl_a_00041. url: https://www.aclweb.org/anthology/Q18-1041/.
[17] Timnit Gebru et al. “Datasheets for Datasets”. In: arXiv:1803.09010 [cs] (Mar. 2020).
arXiv: 1803.09010. url: http://arxiv.org/abs/1803.09010 (visited on 10/06/2020).


[18] R. Stuart Geiger et al. “Garbage in, garbage out? do machine learning application
papers in social computing report where human-labeled training data comes from?” In:
Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT*
’20. Barcelona, Spain: Association for Computing Machinery, Jan. 2020, pp. 325–336.
isbn: 978-1-4503-6936-7. doi: 10.1145/3351095.3372862. url: https://doi.org/10.
1145/3351095.3372862 (visited on 01/28/2020).
[19] Sarah Holland et al. “The Dataset Nutrition Label: A Framework To Drive Higher
Data Quality Standards”. en. In: arXiv:1805.03677 (2018). url: http://arxiv.org/abs/1805.03677.
[20] Shazia Afzal et al. “Data Readiness Report”. In: 2021 IEEE International Conference
on Smart Data Services (SMDS). Chicago, IL, USA: IEEE, Sept. 2021, pp. 42–51. isbn:
978-1-66540-058-9. doi: 10.1109/SMDS53860.2021.00016. url: https://ieeexplore.
ieee.org/document/9592479/ (visited on 11/22/2021).
[21] Michael Muller et al. “Designing Ground Truth and the Social Life of Labels”. en. In:
Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems.
Yokohama Japan: ACM, May 2021, pp. 1–16. isbn: 978-1-4503-8096-6. doi:
10.1145/3411764.3445402. url: https://dl.acm.org/doi/10.1145/3411764.3445402
(visited on 09/16/2021).
[22] Lora Aroyo and Chris Welty. “Crowd Truth: Harnessing disagreement in crowdsourcing
a relation extraction gold standard”. In: 2013. doi: 10.6084/M9.FIGSHARE.679997.V1.
[23] Antonio A. Casilli and Julian Posada. “The Platformisation of Labor and Society”. In:
Society and the Internet. Ed. by Mark Graham and William H. Dutton. Vol. 2. Oxford:
Oxford University Press, 2019.
[24] Antonio A. Casilli et al. Le Micro-Travail en France. Derrière l’automatisation de
nouvelles précarités au travail ? Paris: Projet DiPLab « Digital Platform Labor », 2019,
p. 72.
[25] Paola Tubaro and Antonio A. Casilli. “Micro-work, artificial intelligence and the
automotive industry”. In: Journal of Industrial and Business Economics (2019). issn:
1972-4977. doi: 10.1007/s40812-019-00121-1. url: https://doi.org/10.1007/
s40812-019-00121-1.
[26] Edith Law and Luis von Ahn. “Human Computation”. In: Synthesis Lectures on Artificial
Intelligence and Machine Learning 5.3 (June 2011). Publisher: Morgan & Claypool Pub-
lishers, pp. 1–121. issn: 1939-4608. doi: 10.2200/S00371ED1V01Y201107AIM013. url:
https://www.morganclaypool.com/doi/10.2200/S00371ED1V01Y201107AIM013
(visited on 06/04/2022).
[27] Pradeep Shenoy and Desney S. Tan. “Human-aided computing: utilizing implicit human
processing to classify images”. In: Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems. CHI ’08. New York, NY, USA: Association for Computing
Machinery, Apr. 2008, pp. 845–854. isbn: 978-1-60558-011-1. doi: 10.1145/1357054.
1357188. url: https://doi.org/10.1145/1357054.1357188 (visited on 06/04/2022).


[28] Vinay Gaba. “Crowd sourced Human Computation on the Smartphone Lock Screen
- Inpressco”. In: International Journal of Current Engineering and Technology 4.5
(Oct. 2014), pp. 3602–3604. url: http://inpressco.com/crowd- sourced- human-
computation-on-the-smartphone-lock-screen/ (visited on 06/30/2022).
[29] Lin Ling and Chee Wei Tan. “Human-Assisted Computation for Auto-Grading”. In:
2018 IEEE International Conference on Data Mining Workshops (ICDMW). Singapore,
Singapore: IEEE, Nov. 2018, pp. 360–364. isbn: 978-1-5386-9288-2. doi: 10.1109/ICDMW.
2018.00059. url: https://ieeexplore.ieee.org/document/8637410/ (visited on
06/30/2022).
[30] Vitaliy Liptchinsky. “Collaboration-assisted computation”. en. Dissertation. TU Wien,
2016. doi: 10.34726/HSS.2016.37962. url: https://repositum.tuwien.at/handle/20.500.12708/7959.
[31] Lexing Xie, David A. Shamma, and Cees Snoek. “Content is Dead ... Long Live Content:
The New Age of Multimedia-Hard Problems”. In: IEEE MultiMedia 21.1 (2014), pp. 4–8.
issn: 1070-986X. doi: 10.1109/MMUL.2014.5. url: http://ieeexplore.ieee.org/
document/6756788/ (visited on 06/30/2022).
[32] Marta Sabou, Kalina Bontcheva, and Arno Scharl. “Crowdsourcing research oppor-
tunities: lessons from natural language processing”. en. In: Proceedings of the 12th
International Conference on Knowledge Management and Knowledge Technologies -
i-KNOW ’12. Graz, Austria: ACM Press, 2012, p. 1. isbn: 978-1-4503-1242-4. doi:
10.1145/2362456.2362479. url: http://dl.acm.org/citation.cfm?doid=2362456.2362479
(visited on 06/30/2022).
[33] Rainer Mühlhoff. “Human-aided artificial intelligence: Or, how to run large computations
in human brains? Toward a media sociology of machine learning”. en. In: New Media
& Society 22.10 (Oct. 2020). Publisher: SAGE Publications, pp. 1868–1884. issn:
1461-4448. doi: 10.1177/1461444819885334. url: https://doi.org/10.1177/1461444819885334
(visited on 06/04/2022).
[34] Lilly Irani. “The cultural work of microwork”. In: New Media & Society 17.5 (2015),
pp. 720–739. issn: 1461-4448, 1461-7315. doi: 10.1177/1461444813511926. url:
http://nms.sagepub.com/content/early/2013/11/19/1461444813511926.abstract.
[35] Natã M. Barbosa and Monchu Chen. “Rehumanized Crowdsourcing: A Labeling
Framework Addressing Bias and Ethics in Machine Learning”. en. In: Proceedings of the
2019 CHI Conference on Human Factors in Computing Systems. Glasgow Scotland Uk:
ACM, May 2019, pp. 1–12. isbn: 978-1-4503-5970-2. doi: 10.1145/3290605.3300773.
url: https://dl.acm.org/doi/10.1145/3290605.3300773 (visited on 07/29/2021).
[36] Florian A Schmidt. Crowdproduktion von Trainingsdaten: zur Rolle von Online-Arbeit
beim Trainieren autonomer Fahrzeuge. de. OCLC: 1118990966. 2019. isbn: 978-3-86593-
330-0. url: http://www.boeckler.de/pdf/p_study_hbs_417.pdf (visited on
06/04/2022).


[37] Milagros Miceli, Julian Posada, and Tianling Yang. “Studying Up Machine Learning
Data: Why Talk About Bias When We Mean Power?” en. In: Proc. ACM Hum.-
Comput. Interact. GROUP, (January 2022) 6.Article 34 (Jan. 2022), 14 pages. doi:
10.1145/3492853. url: https://doi.org/10.1145/3492853.
[38] Milagros Miceli and Julian Posada. “The Data-Production Dispositif”. In: Proc. ACM
Hum.-Comput. Interact. 6.CSCW2 (Nov. 2022). doi: 10.1145/3555561. url: https:
//doi.org/10.1145/3555561.
[39] Benjamin Shestakofsky. “Working Algorithms: Software Automation and the Future of
Work”. en. In: Work and Occupations 44.4 (Nov. 2017). Publisher: SAGE Publications
Inc, pp. 376–423. issn: 0730-8884. doi: 10 . 1177 / 0730888417726119. url: https :
//doi.org/10.1177/0730888417726119 (visited on 06/04/2022).
[40] Autumn Edwards et al. “Initial expectations, interactions, and beyond with social
robots”. en. In: Computers in Human Behavior 90 (Jan. 2019), pp. 308–314. issn:
0747-5632. doi: 10.1016/j.chb.2018.08.042. url: https://www.sciencedirect.
com/science/article/pii/S0747563218304175 (visited on 06/04/2022).
[41] Julian Posada, Milagros Miceli, and Gemma Newlands. “Labor, Automation, and
Human-Machine Communication”. In: SAGE Handbook on Human-Machine Communi-
cation. Ed. by Andrea Guzman, Rhonda N Mcewen, and Steve Jones. Sage Publications,
2022.
[42] Samir Passi and Steven Jackson. “Data Vision: Learning to See Through Algorithmic
Abstraction”. en. In: Proceedings of the 2017 ACM Conference on Computer Supported
Cooperative Work and Social Computing. CSCW ’17. Portland, Oregon, USA: Associ-
ation for Computing Machinery, 2017, pp. 2436–2447. isbn: 978-1-4503-4335-0. doi:
10.1145/2998181.2998331.
[43] Samir Passi and Steven J. Jackson. “Trust in Data Science: Collaboration, Translation,
and Accountability in Corporate Data Science Projects”. en. In: Proc. ACM Hum.-
Comput. Interact. 2.CSCW (Nov. 2018), pp. 1–28. issn: 25730142. doi: 10.1145/3274405.
[44] Samir Passi and Solon Barocas. “Problem Formulation and Fairness”. en. In: Proceedings
of the Conference on Fairness, Accountability, and Transparency. FAT* ’19. Atlanta, GA,
USA: Association for Computing Machinery, 2019, pp. 39–48. isbn: 978-1-4503-6125-5.
doi: 10.1145/3287560.3287567.
[45] Cathrine Seidelin. “Towards a Co-design Perspective on Data : Foregrounding Data in
the Design and Innovation of Data-based Services”. en. Ph.D. thesis. IT-Universitetet i
København, 2020. url: https://pure.itu.dk/portal/en/publications/towards-
a- codesign- perspective- on- data(8cbca471- a8f5- 49cf- a127- e0e8ae40e5a1)
.html.
[46] Melanie Feinberg. “A Design Perspective on Data”. en. In: CHI ’17: Proceedings of
the 2017 CHI Conference on Human Factors in Computing Systems. CHI ’17. Denver,
Colorado, USA: Association for Computing Machinery, 2017, pp. 2952–2963. isbn:
978-1-4503-4655-9. doi: 10.1145/3025453.3025837.


[47] Michael Muller et al. “How Data Science Workers Work with Data: Discovery, Capture,
Curation, Design, Creation”. en. In: Proceedings of the 2019 CHI Conference on
Human Factors in Computing Systems. CHI ’19. Glasgow, Scotland Uk: Association
for Computing Machinery, 2019, pp. 1–15. isbn: 978-1-4503-5970-2. doi: 10.1145/
3290605.3300356. url: http://dl.acm.org/citation.cfm?doid=3290605.3300356
(visited on 08/09/2019).
[48] Naja Holten Møller et al. “Who Does the Work of Data?” In: Interactions 27.3 (Apr.
2020), pp. 52–55. issn: 1072-5520. doi: 10.1145/3386389. url: https://doi.org/10.
1145/3386389.
[49] Claus Bossen et al. “Data work in healthcare: An Introduction”. en. In: Health
Informatics Journal 25.3 (Sept. 2019), pp. 465–474. issn: 1460-4582, 1741-2811. doi:
10.1177/1460458219864730. url: http://journals.sagepub.com/doi/10.1177/
1460458219864730 (visited on 06/24/2022).
[50] Kathleen Pine et al. “Investigating Data Work Across Domains: New Perspectives on the
Work of Creating Data”. In: CHI Conference on Human Factors in Computing Systems
Extended Abstracts. CHI EA ’22. New York, NY, USA: Association for Computing
Machinery, Apr. 2022, pp. 1–6. isbn: 978-1-4503-9156-6. doi: 10 . 1145 / 3491101 .
3503724. url: https://doi.org/10.1145/3491101.3503724 (visited on 04/29/2022).
[51] Mary L. Gray and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from
Building a New Global Underclass. Englisch. Boston: Houghton Mifflin Harcourt, May
2019. isbn: 978-1-328-56624-9.
[52] Brian Justie. “Little history of CAPTCHA”. In: Internet Histories 5.1 (2021), pp. 30–
47. issn: 2470-1475. doi: 10.1080/24701475.2020.1831197. url:
https://www.tandfonline.com/doi/full/10.1080/24701475.2020.1831197.
[53] Gemma Newlands. “Lifting the curtain: Strategic visibility of human labour in AI-
as-a-Service”. en. In: Big Data & Society 8.1 (Jan. 2021), p. 205395172110160. issn:
2053-9517, 2053-9517. doi: 10.1177/20539517211016026. url: http://journals.
sagepub.com/doi/10.1177/20539517211016026 (visited on 05/18/2021).
[54] Julian Posada. “The Future of Work Is Here: Toward a Comprehensive Approach to
Artificial Intelligence and Labour”. In: Ethics of AI in Context (2020). issn: 10185909.
[55] Julian Posada. “Unbiased: AI Needs Ethics from Below”. In: New AI Lexicon. Ed. by
Noopur Raval, Amba Kak, and Luke Strathman. New York, NY: AI Now Institute,
2021.
[56] Julian Posada. “Embedded Reproduction in Platform Data Work”. In: Information,
Communication & Society (2022).
[57] Christine Gerber. “Community building on crowdwork platforms: Autonomy and control
of online workers?” en. In: Competition & Change 25.2 (Apr. 2021), pp. 190–211. issn:
1024-5294, 1477-2221. doi: 10 . 1177 / 1024529420914472. url: http : / / journals .
sagepub.com/doi/10.1177/1024529420914472 (visited on 06/28/2022).


[58] Martin Krzywdzinski and Christine Gerber. “Between automation and gamifica-
tion: forms of labour control on crowdwork platforms”. en. In: Work in the
Global Economy 1.1 (Oct. 2021), pp. 161–184. issn: 2732-4176. doi:
10.1332/273241721X16295434739161. url:
https://bristoluniversitypressdigital.com/doi/10.1332/273241721X16295434739161 (visited on 06/28/2022).
[59] Thomas Poell, David Nieborg, and José van Dijck. “Platformisation”. In: Internet
Policy Review 8.4 (Nov. 2019). issn: 2197-6775. url: https://policyreview.info/
concepts/platformisation (visited on 06/23/2022).
[60] Mark Borman. “Applying Multiple Perspectives to the BPO decision: A Case Study
of Call Centres in Australia”. en. In: Journal of Information Technology 21.2 (June
2006), pp. 99–115. issn: 0268-3962, 1466-4437. doi: 10.1057/palgrave.jit.2000057.
url: http://journals.sagepub.com/doi/10.1057/palgrave.jit.2000057 (visited
on 06/23/2022).
[61] Hamid R Motahari-Nezhad, Bryan Stephenson, and Sharad Singhal. “Outsourcing
Business to Cloud Computing Services: Opportunities and Challenges”. In: IEEE
Internet Computing, Special Issue on Cloud Computing. 2009. url: https://www.hpl.
hp.com/techreports/2009/HPL-2009-23.html.
[62] Lorenza Errighi, Charles Bodwell, and Sameer Khatiwada. Business process outsourcing
in the Philippines: Challenges for decent work. en. Working paper. International Labour
Organization, Dec. 2016. url: http://www.ilo.org/asia/publications/WCMS_
538193/lang--en/index.htm.
[63] Mark Graham, Isis Hjorth, and Vili Lehdonvirta. “Digital labour and development:
impacts of global digital labour platforms and the gig economy on worker livelihoods”.
In: Transfer: European Review of Labour and Research 23.2 (May 2017), pp. 135–162.
issn: 1024-2589. doi: 10.1177/1024258916687250.
[64] danah boyd and Kate Crawford. “Critical Questions for Big Data: Provocations
for a Cultural, Technological, and Scholarly Phenomenon”. en. In: Information,
Communication & Society 15.5 (June 2012), pp. 662–679. issn: 1369-118X, 1468-4462.
doi: 10.1080/1369118X.2012.678878.
[65] Marion Fourcade and Kieran Healy. “Classification Situations: Life-Chances in the
Neoliberal Era”. en. In: Accounting, Organizations and Society 38.8 (Nov. 2013), pp. 559–
572. issn: 03613682. doi: 10.1016/j.aos.2013.11.002.
[66] Steffen Mau. The Metric Society: On the Quantification of the Social. eng. Trans. by
Sharon Howe. Cambridge ; Medford, MA: Polity, 2019. isbn: 978-1-5095-3040-3.
[67] Sam Corbett-Davies and Sharad Goel. The Measure and Mismeasure of Fairness: A
Critical Review of Fair Machine Learning. Tech. rep. arXiv:1808.00023. arXiv:1808.00023
[cs] type: article. arXiv, Aug. 2018. doi: 10.48550/arXiv.1808.00023. url: http:
//arxiv.org/abs/1808.00023 (visited on 05/16/2022).


[68] Jenny L. Davis, Apryl Williams, and Michael W. Yang. “Algorithmic reparation”. en.
In: Big Data & Society 8.2 (July 2021), p. 205395172110448. issn: 2053-9517, 2053-9517.
doi: 10.1177/20539517211044808. url: http://journals.sagepub.com/doi/10.
1177/20539517211044808 (visited on 05/16/2022).
[69] Geoffrey C. Bowker and Susan Leigh Star. Sorting things out: classification and its
consequences. Inside technology. tex.ids: bowker2000, bowker2000a. Cambridge, Mass:
MIT Press, 1999. isbn: 978-0-262-02461-7. url: https://mitpress.mit.edu/books/
sorting-things-out.
[70] Emile Durkheim and Marcel Mauss. Primitive Classification. Ed. and trans. by Rodney
Needham. University of Chicago Press, 1963. isbn: 0-226-17334-8.
[71] Pierre Bourdieu. Outline of a Theory of Practice. en. tex.ids: bourdieu1977a.
Cambridge: Cambridge University Press, 1977. isbn: 978-0-511-81250-7. doi:
10.1017/CBO9780511812507. url: http://ebooks.cambridge.org/ref/id/CBO9780511812507
(visited on 08/17/2019).
[72] Pierre Bourdieu. “Social Space and Symbolic Power”. en. In: Sociological Theory 7.1
(1989), pp. 14–25. issn: 07352751. doi: 10.2307/202060. url: https://www.jstor.
org/stable/202060?origin=crossref (visited on 08/17/2019).
[73] Pierre Bourdieu. Language and Symbolic Power. Englisch. New. Cambridge: Blackwell
Publishers, Dec. 1992. isbn: 978-0-7456-1034-4.
[74] Eviatar Zerubavel. The Fine Line: Making Distinctions in Everyday Life. 2nd ed.
University of Chicago Press, 1993. isbn: 978-0-226-98159-8.
[75] Susan Leigh Star. “The Ethnography of Infrastructure”. en. In: American Behavioral
Scientist 43.3 (Nov. 1999), pp. 377–391. issn: 0002-7642, 1552-3381. doi: 10.1177/
00027649921955326. url: http : / / journals . sagepub . com / doi / 10 . 1177 /
00027649921955326 (visited on 01/29/2021).
[76] Alex Hanna et al. “Towards a Critical Race Methodology in Algorithmic Fairness”. en.
In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.
FAT* ’20. Barcelona, Spain: Association for Computing Machinery, 2020, pp. 501–512.
isbn: 978-1-4503-6936-7. doi: 10.1145/3351095.3372826. url: https://dl.acm.
org/doi/10.1145/3351095.3372826 (visited on 02/20/2020).
[77] Martha Lampland and Susan Leigh Star. Standards and Their Stories: How Quantifying,
Classifying, and Formalizing Practices Shape Everyday Life. Englisch. Illustrated Edition.
Ithaca: Cornell University Press, Jan. 2009. isbn: 978-0-8014-7461-3.
[78] Safiya Umoja Noble. Algorithms of Oppression: How Search Engines Reinforce Racism.
New York: NYU Press, 2018. isbn: 978-1-4798-4994-9.
[79] Danielle Keats Citron and Frank Pasquale. “The Scored Society: Due Process for
Automated Predictions”. en. In: Washington Law Review 89.1 (2014).
[80] Kathleen H. Pine and Max Liboiron. “The Politics of Measurement and Action”. In:
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing
Systems. CHI ’15. New York, NY, USA: Association for Computing Machinery, 2015,
pp. 3147–3156. isbn: 978-1-4503-3145-6. doi: 10.1145/2702123.2702298.


[81] Lisa Gitelman, ed. "Raw Data" Is an Oxymoron. Infrastructures Series. Cambridge,
Massachusetts ; London, England: The MIT Press, 2013. isbn: 978-0-262-51828-4.
[82] Michel Foucault and Colin Gordon. Power/knowledge: selected interviews and other
writings, 1972-1977. engfre. 1st American ed. New York: Pantheon Books, 1980. isbn:
978-0-394-51357-7 978-0-394-73954-0.
[83] Michel Foucault. “What Is Critique?” In: What is Enlightenment?: Eighteenth-Century
Answers and Twentieth-Century Questions. Ed. by James Schmidt. University of
California Press, 1996.
[84] Pierre Bourdieu. “The Social Space and the Genesis of Groups”. In: Theory and
Society 14.6 (1985). tex.ids: bourdieu, pp. 723–744. issn: 03042421, 15737853. doi:
10.1007/BF00174048. url: http://www.jstor.org/stable/657373.
[85] Milagros Miceli, Martin Schuessler, and Tianling Yang. “Between Subjectivity and
Imposition: Power Dynamics in Data Annotation for Computer Vision”. en. In:
Proceedings of the ACM on Human-Computer Interaction 4.CSCW2 (Oct. 2020),
pp. 1–25. issn: 2573-0142, 2573-0142. doi: 10.1145/3415186. url:
https://dl.acm.org/doi/10.1145/3415186 (visited on 10/16/2020).
[86] Ciaran Cronin. “Bourdieu and Foucault on power and modernity”. en. In: Philosophy
& Social Criticism 22.6 (Nov. 1996), pp. 55–85. issn: 0191-4537, 1461-734X. doi:
10.1177/019145379602200603. url: http://journals.sagepub.com/doi/10.1177/
019145379602200603 (visited on 04/08/2020).
[87] Jürgen Link. “Dispositiv”. de. In: Foucault-Hanbuch. Ed. by Clemens Kammler et al.
Stuttgart: J.B. Metzler, 2014, pp. 237–242. isbn: 978-3-476-02559-3 978-3-476-01378-1.
doi: 10.1007/978-3-476-01378-1_27. url: http://link.springer.com/10.1007/
978-3-476-01378-1_27 (visited on 06/22/2021).
[88] Pierre Bourdieu. Classification struggles. eng. General sociology volume 1. Cambridge,
UK ; Medford, MA: Polity Press, 2018. isbn: 978-1-5095-1327-7.
[89] Michel Foucault. “Orders of discourse”. en. In: Social Science Information 10.2 (Apr.
1971). Publisher: SAGE Publications Ltd, pp. 7–30. issn: 0539-0184. doi:
10.1177/053901847101000201. url: https://doi.org/10.1177/053901847101000201
(visited on 08/19/2021).
[90] Siegfried Jäger and Florentine Maier. “Analysing discourses and dispositives: A
Foucauldian approach to theory and methodology”. In: Methods of critical discourse
studies (2016), pp. 109–136.
[91] J. I. (Hans) Bakker. “Grounded Theory Methodology and Grounded Theory Method:
Introduction to the Special Issue”. In: Sociological Focus 52.2 (Apr. 2019). Publisher:
Routledge _eprint: https://doi.org/10.1080/00380237.2019.1550592, pp. 91–106. issn:
0038-0237. doi: 10.1080/00380237.2019.1550592. url: https://doi.org/10.1080/
00380237.2019.1550592 (visited on 05/13/2022).
[92] Clay Spinuzzi. “The Methodology of Participatory Design”. In: Technical Communica-
tion 52.2 (May 2005), pp. 163–174.


[93] Robert Thornberg. “Informed Grounded Theory”. en. In: Scandinavian Journal of
Educational Research 56.3 (June 2012), pp. 243–259. issn: 0031-3831, 1470-1170. doi:
10.1080/00313831.2011.581686. url: http://www.tandfonline.com/doi/abs/10.
1080/00313831.2011.581686 (visited on 08/12/2019).
[94] Virginia Braun and Victoria Clarke. “Using thematic analysis in psychology”. In:
Qualitative Research in Psychology 3 (Jan. 2006), pp. 77–101. doi: 10 . 1191 /
1478088706qp063oa.
[95] Kathy Charmaz. Constructing Grounded Theory: A Practical Guide through Qualitative
Analysis. en. Introducing Qualitative Methods Series. London ; Thousand Oaks, Calif:
Sage Publications, 2006. isbn: 978-0-7619-7352-2 978-0-7619-7353-9.
[96] Michael Muller. “Curiosity, Creativity, and Surprise as Analytic Tools: Grounded Theory
Method”. en. In: Ways of Knowing in HCI. Ed. by Judith S. Olson and Wendy A. Kellogg.
New York, NY: Springer, 2014, pp. 25–48. isbn: 978-1-4939-0378-8. doi: 10.1007/978-
1-4939-0378-8_2. url: https://link.springer.com/chapter/10.1007/978-1-
4939-0378-8_2 (visited on 01/15/2020).
[97] Michael Muller et al. “Machine Learning and Grounded Theory Method: Convergence,
Divergence, and Combination”. en. In: Proceedings of the 19th International Conference
on Supporting Group Work. GROUP ’16. tex.ids: muller2016a. Sanibel Island, Florida,
USA: Association for Computing Machinery, 2016, pp. 3–8. isbn: 978-1-4503-4276-6.
doi: 10.1145/2957276.2957280. url: http://dl.acm.org/citation.cfm?doid=
2957276.2957280 (visited on 03/18/2020).
[98] Barney G. Glaser and Anselm L. Strauss. Grounded theory: Strategien qualitativer
Forschung. ger. Hans Huber Programmbereich Pflege. OCLC: 845029525. Bern: Huber,
1998. isbn: 978-3-456-82847-3.
[99] Michel Foucault. The Archaeology of Knowledge: And the Discourse on Language.
English. New York: Vintage, Sept. 1982. isbn: 978-0-394-71106-5.
[100] Julian Hamann et al. “The Academic Dispositif: Towards a Context-Centred Discourse
Analysis”. en. In: Quantifying Approaches to Discourse for Social Scientists. Ed. by
Ronny Scholz. Cham: Springer International Publishing, 2019, pp. 51–87. isbn: 978-
3-319-97369-2 978-3-319-97370-8. doi: 10.1007/978-3-319-97370-8_3. url: http:
//link.springer.com/10.1007/978-3-319-97370-8_3 (visited on 04/08/2021).
[101] Joannah Caborn. “On the Methodology of Dispositive Analysis”. en. In: Critical
Approaches to Discourse Analysis Across Disciplines 1.1 (2016), pp. 115–123. issn:
1576-4737. doi: 10.5209/CLAC.53494. url: https://revistas.ucm.es/index.php/
CLAC/article/view/53494 (visited on 04/07/2021).
[102] Siegfried Jäger. Deutungskämpfe: Theorie und Praxis Kritischer Diskursanalyse. de.
Springer-Verlag, Mar. 2007. isbn: 978-3-531-15072-7.
[103] Valérie Larroche. The Dispositif: A Concept for Information and Communication
Sciences. en. 1st ed. Wiley, Apr. 2019. isbn: 978-1-78630-309-7 978-1-119-50872-4. doi:
10.1002/9781119508724. url: https://onlinelibrary.wiley.com/doi/book/10.
1002/9781119508724 (visited on 06/24/2021).


[104] Magdalena Nowicka-Franczak. “Post-Foucauldian Discourse and Dispositif Analysis


in the Post-Socialist Field of Research: Methodological Remarks”. en. In: Qualitative
Sociology Review 17.1 (Feb. 2021), pp. 72–95. issn: 1733-8077. doi: 10.18778/1733-
8077.17.1.6. url: https://czasopisma.uni.lodz.pl/qualit/article/view/9271
(visited on 05/12/2021).
[105] Eric Corbett and Yanni Loukissas. “Engaging Gentrification as a Social Justice Issue in
HCI”. en. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing
Systems. Glasgow Scotland Uk: ACM, May 2019, pp. 1–16. isbn: 978-1-4503-5970-2.
doi: 10.1145/3290605.3300510. url: https://dl.acm.org/doi/10.1145/3290605.
3300510 (visited on 08/24/2021).
[106] Sandjar Kozubaev et al. “Spaces and Traces: Implications of Smart Technology in
Public Housing”. In: Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems. New York, NY, USA: Association for Computing Machinery, 2019,
pp. 1–13. isbn: 9781450359702. url: https://doi.org/10.1145/3290605.3300669.
[107] Mariam Asad et al. “Creating a Sociotechnical API: Designing City-Scale Community
Engagement”. In: Proceedings of the 2017 CHI Conference on Human Factors in
Computing Systems. CHI ’17. Denver, Colorado, USA: Association for Computing
Machinery, 2017, pp. 2295–2306. isbn: 9781450346559. doi: 10.1145/3025453.3025963.
url: https://doi.org/10.1145/3025453.3025963.
[108] Devansh Saxena and Shion Guha. “Conducting Participatory Design to Improve
Algorithms in Public Services: Lessons and Challenges”. In: Conference Companion
Publication of the 2020 on Computer Supported Cooperative Work and Social Computing.
New York, NY, USA: Association for Computing Machinery, 2020, pp. 383–388. isbn:
9781450380591. url: https://doi.org/10.1145/3406865.3418331.
[109] Asbjørn Ammitzbøll Flügge. “Perspectives from Practice: Algorithmic Decision-Making
in Public Employment Services”. In: Companion Publication of the 2021 Conference on
Computer Supported Cooperative Work and Social Computing. New York, NY, USA:
Association for Computing Machinery, 2021, pp. 253–255. isbn: 9781450384797. url:
https://doi.org/10.1145/3462204.3481787.
[110] Michael A. Madaio et al. “Co-Designing Checklists to Understand Organizational
Challenges and Opportunities around Fairness in AI”. In: Proceedings of the 2020 CHI
Conference on Human Factors in Computing Systems. CHI ’20. tex.ids: madaio2020a,
madaio2020b. Honolulu, HI, USA: Association for Computing Machinery, Apr. 2020,
pp. 1–14. isbn: 978-1-4503-6708-0. doi: 10 . 1145 / 3313831 . 3376445. url: https :
//dl.acm.org/doi/10.1145/3313831.3376445 (visited on 05/04/2020).
[111] Allison Woodruff et al. “A Qualitative Exploration of Perceptions of Algorithmic
Fairness”. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems. New York, NY, USA: Association for Computing Machinery, 2018, pp. 1–14.
isbn: 9781450356206. url: https://doi.org/10.1145/3173574.3174230.


[112] Christine T. Wolf and Jeanette L. Blomberg. “Ambitions and Ambivalences in


Participatory Design: Lessons from a Smart Workplace Project”. In: Proceedings of the
16th Participatory Design Conference 2020 - Participation(s) Otherwise - Volume 1.
PDC ’20. Manizales, Colombia: Association for Computing Machinery, 2020, pp. 193–202.
isbn: 9781450377003. doi: 10.1145/3385010.3385029. url: https://doi.org/10.
1145/3385010.3385029.
[113] Michael Katell et al. “Toward Situated Interventions for Algorithmic Equity: Lessons
from the Field”. In: Proceedings of the 2020 Conference on Fairness, Accountability,
and Transparency. FAT* ’20. Barcelona, Spain: Association for Computing Machinery,
2020, pp. 45–55. isbn: 9781450369367. doi: 10.1145/3351095.3372874. url: https:
//doi.org/10.1145/3351095.3372874.
[114] Anna Brown et al. “Toward Algorithmic Accountability in Public Services: A Qualitative
Study of Affected Community Perspectives on Algorithmic Decision-Making in Child
Welfare Services”. In: Proceedings of the 2019 CHI Conference on Human Factors
in Computing Systems. CHI ’19. Glasgow, Scotland Uk: Association for Computing
Machinery, 2019, pp. 1–12. isbn: 9781450359702. doi: 10.1145/3290605.3300271.
url: https://doi.org/10.1145/3290605.3300271.
[115] Samantha Robertson and Niloufar Salehi. What If I Don’t Like Any Of The Choices?
The Limits of Preference Elicitation for Participatory Algorithm Design. 2020. doi:
10.48550/ARXIV.2007.06718. url: https://arxiv.org/abs/2007.06718.
[116] Mona Sloane et al. “Participation is not a Design Fix for Machine Learning”. In:
arXiv:2007.02423 [cs] (Aug. 2020). arXiv: 2007.02423. url: http://arxiv.org/abs/
2007.02423 (visited on 04/15/2022).
[117] Donald Martin Jr et al. “Participatory problem formulation for fairer machine learning
through community based system dynamics”. In: arXiv preprint arXiv:2005.07572
(2020).
[118] Sasha Costanza-Chock. Design Justice: Community-Led Practices to Build the Worlds
We Need. Englisch. Cambridge, MA: The MIT Press, Mar. 2020. isbn: 978-0-262-04345-8.
url: https://design-justice.pubpub.org/.
[119] Frauke Mörike. Ethnography for Human Factors Researchers. Collecting and Interweav-
ing Threads of HCI. CHI2019, Glasgow, Scotland, May 2019.
[120] Uwe Flick, Ernst von Kardorff, and Ines Steinke, eds. A companion to qualitative research.
en. London ; Thousand Oaks, Calif: Sage Publications, 2004. isbn: 978-0-7619-7374-4.
[121] Anselm L. Strauss. Qualitative analysis for social scientists. en. Cambridge [Cam-
bridgeshire] ; New York: Cambridge University Press, 1987. isbn: 978-0-521-32845-6
978-0-521-33806-6.
[122] Virginia Braun et al. “Thematic Analysis”. In: Handbook of Research Methods in Health
Social Sciences. Ed. by Pranee Liamputtong. Singapore: Springer Singapore, 2019,
pp. 843–860. isbn: 978-981-10-5251-4. doi: 10.1007/978-981-10-5251-4_103. url:
https://doi.org/10.1007/978-981-10-5251-4_103.


[123] Emeline Brulé and S. Finnigan. Thematic Analysis in HCI. fr-FR. Billet. 2020. url:
https://sociodesign.hypotheses.org/555 (visited on 01/02/2022).
[124] Milagros Miceli et al. “Documenting Data Production Processes: A Participatory
Approach for Data Work”. In: Proc. ACM Hum.-Comput. Interact. 6.CSCW2 (Nov.
2022). doi: 10.1145/3555623. url: https://doi.org/10.1145/3555623.
[125] Alexandra K. Murphy, Colin Jerolmack, and DeAnna Smith. “Ethnography, Data
Transparency, and the Information Age”. en. In: Annual Review of Sociology 47.1
(July 2021), pp. 41–61. issn: 0360-0572, 1545-2115. doi: 10.1146/annurev-soc-090320-124805.
url: https://www.annualreviews.org/doi/10.1146/annurev-soc-090320-124805 (visited on 06/27/2022).
[126] Jan Nespor. “Anonymity and Place in Qualitative Inquiry”. en. In: Qualitative
Inquiry 6.4 (Dec. 2000), pp. 546–569. issn: 1077-8004, 1552-7565. doi: 10 . 1177 /
107780040000600408. url: http : / / journals . sagepub . com / doi / 10 . 1177 /
107780040000600408 (visited on 06/27/2022).
[127] Will C. van den Hoonaard. “Is Anonymity an Artifact in Ethnographic Research?” en. In:
Journal of Academic Ethics 1.2 (2003), pp. 141–151. issn: 1570-1727. doi:
10.1023/B:JAET.0000006919.58804.4c. url: http://link.springer.com/10.1023/B:JAET.0000006919.58804.4c
(visited on 06/27/2022).
[128] Geoffrey Walford. “The impossibility of anonymity in ethnographic research”. en. In:
Qualitative Research 18.5 (Oct. 2018), pp. 516–525. issn: 1468-7941, 1741-3109. doi:
10.1177/1468794118778606. url: http://journals.sagepub.com/doi/10.1177/
1468794118778606 (visited on 06/27/2022).
[129] Hella von Unger. “Ethical Reflexivity as Research Practice”. In: Historical Social
Research / Historische Sozialforschung 46.2 (2021), pp. 186–204. issn: 0172-6404. url:
https://www.jstor.org/stable/27032978 (visited on 06/27/2022).
[130] Enrico Di Minin et al. “How to address data privacy concerns when using social
media data in conservation science”. en. In: Conservation Biology 35.2 (Apr. 2021),
pp. 437–446. issn: 0888-8892, 1523-1739. doi: 10.1111/cobi.13708. url: https:
//onlinelibrary.wiley.com/doi/10.1111/cobi.13708 (visited on 06/27/2022).
[131] Kate Crawford and Trevor Paglen. Excavating AI. en-US. https://www.excavating.ai.
2019.
[132] Julia Powles and Helen Nissenbaum. The Seductive Diversion of ‘Solving’ Bias
in Artificial Intelligence. en. Dec. 2018. url: https : / / onezero . medium . com /
the - seductive - diversion - of - solving - bias - in - artificial - intelligence -
890df5e5ef53 (visited on 02/20/2020).
[133] Nick Couldry and Ulises Ali Mejias. The costs of connection: how data is colonizing
human life and appropriating it for capitalism. Culture and economic life. Stanford,
California: Stanford University Press, 2019. isbn: 978-1-5036-0366-0 978-1-5036-0974-7.


[134] Lora Aroyo and Chris Welty. “Truth Is a Lie: Crowd Truth and the Seven Myths of
Human Annotation”. en. In: AI Magazine 36.1 (Mar. 2015), p. 15. issn: 0738-4602,
0738-4602. doi: 10.1609/aimag.v36i1.2564. url: https://aaai.org/ojs/index.
php/aimagazine/article/view/2564 (visited on 11/03/2020).
[135] Hannah Davis. A Dataset Is a Worldview. en. https://towardsdatascience.com/a-dataset-
is-a-worldview-5328216dd44d. Mar. 2020.
[136] Jennifer Wortman Vaughan and Hanna Wallach. “A Human-Centered Agenda for
Intelligible Machine Learning”. In: Machines We Trust: Getting Along with Artificial
Intelligence. 2020.
[137] Kate Crawford and Vladan Joler. “Anatomy of an AI System”. en. In: Virtual Creativity
9.1 (Dec. 2019), pp. 117–120. issn: 2397-9704. doi: 10.1386/vcr_00008_7. url:
https://www.ingentaconnect.com/content/10.1386/vcr_00008_7 (visited on
06/25/2022).
[138] Agathe Balayn, Bogdan Kulynych, and Seda Gürses. “Exploring Data Pipelines through
the Process Lens: a Reference Model for Computer Vision”. en. In: (2021), p. 8.
[139] Eun Seo Jo and Timnit Gebru. “Lessons from archives: strategies for collecting
sociocultural data in machine learning”. en. In: Proceedings of the 2020 Conference
on Fairness, Accountability, and Transparency. Barcelona Spain: ACM, Jan. 2020,
pp. 306–316. isbn: 978-1-4503-6936-7. doi: 10.1145/3351095.3372829. url: https:
//dl.acm.org/doi/10.1145/3351095.3372829 (visited on 05/21/2021).
[140] Morgan Klaus Scheuerman, Emily Denton, and Alex Hanna. “Do Datasets Have Politics?
Disciplinary Values in Computer Vision Dataset Development”. en. In: arXiv:2108.04308
[cs] (Sept. 2021). doi: 10.1145/3476058. url: http://arxiv.org/abs/2108.04308
(visited on 10/01/2021).
[141] Milagros Miceli et al. “Documenting Computer Vision Datasets: An Invitation to
Reflexive Data Practices”. en. In: Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency. Virtual Event Canada: ACM, Mar. 2021, pp. 161–172.
isbn: 978-1-4503-8309-7. doi: 10.1145/3442188.3445880. url: https://dl.acm.
org/doi/10.1145/3442188.3445880 (visited on 03/15/2021).
[142] Susan Leigh Star and James R. Griesemer. “Institutional Ecology, ‘Translations’ and
Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate
Zoology, 1907-39”. en. In: Social Studies of Science 19.3 (Aug. 1989), pp. 387–420. issn:
0306-3127, 1460-3659. doi: 10.1177/030631289019003001. url: http://journals.
sagepub.com/doi/10.1177/030631289019003001 (visited on 11/15/2021).
[143] Susan Leigh Star. “This is Not a Boundary Object: Reflections on the Origin of a
Concept”. en. In: Science, Technology, & Human Values 35.5 (Sept. 2010), pp. 601–617.
issn: 0162-2439, 1552-8251. doi: 10.1177/0162243910377624. url: http://journals.
sagepub.com/doi/10.1177/0162243910377624 (visited on 11/22/2021).


[144] Luciana S. Buriol et al. “Temporal Analysis of the Wikigraph”. In: 2006
IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main
Conference Proceedings)(WI’06). Hong Kong: IEEE, Dec. 2006, pp. 45–51. isbn:
978-0-7695-2747-5. doi: 10.1109/WI.2006.164. url: http://ieeexplore.ieee.org/
document/4061340/ (visited on 06/28/2022).
[145] Fernanda B. Viégas, Martin Wattenberg, and Kushal Dave. “Studying cooperation and
conflict between authors with history flow visualizations”. en. In: Proceedings of the
2004 conference on Human factors in computing systems - CHI ’04. Vienna, Austria:
ACM Press, 2004, pp. 575–582. isbn: 978-1-58113-702-6. doi: 10.1145/985692.985765.
url: http : / / portal . acm . org / citation . cfm ? doid = 985692 . 985765 (visited on
06/27/2022).
[146] Anamika Chhabra, Rishemjit Kaur, and S. R.S. Iyengar. “Dynamics of Edit War
Sequences in Wikipedia”. en. In: Proceedings of the 16th International Symposium
on Open Collaboration. Virtual conference Spain: ACM, Aug. 2020, pp. 1–10. isbn:
978-1-4503-8779-8. doi: 10.1145/3412569.3412585. url: https://dl.acm.org/doi/
10.1145/3412569.3412585 (visited on 06/28/2022).
[147] Aniket Kittur et al. “He says, she says: conflict and coordination in Wikipedia”. en. In:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. San
Jose California USA: ACM, Apr. 2007, pp. 453–462. isbn: 978-1-59593-593-9. doi: 10.
1145/1240624.1240698. url: https://dl.acm.org/doi/10.1145/1240624.1240698
(visited on 06/28/2022).
[148] Kristen M Scott et al. “Algorithmic Tools in Public Employment Services: Towards a
Jobseeker-Centric Perspective”. en. In: (2022), p. 11.
[149] Adriana Alvarado Garcia et al. “Crossing Data: Building Bridges with Activist and
Academic Practices from and for Latin America (Cruzar datos: Tendiendo Puentes con
Prácticas Activistas y Académicas desde y para América Latina)”. In: CHI Conference on
Human Factors in Computing Systems Extended Abstracts. CHI EA ’22. New York, NY,
USA: Association for Computing Machinery, Apr. 2022, pp. 1–6. isbn: 978-1-4503-9156-6.
doi: 10.1145/3491101.3505222. url: https://doi.org/10.1145/3491101.3505222
(visited on 04/29/2022).
[150] Christopher A. Le Dantec and Sarah Fox. “Strangers at the Gate: Gaining Access,
Building Rapport, and Co-Constructing Community-Based Research”. In: Proceedings
of the 18th ACM Conference on Computer Supported Cooperative Work & Social
Computing. CSCW ’15. Vancouver, BC, Canada: Association for Computing Machinery,
2015, pp. 1348–1358. isbn: 9781450329224. doi: 10.1145/2675133.2675147. url:
https://doi.org/10.1145/2675133.2675147.
[151] Isto Huvila. “The politics of boundary objects: Hegemonic interventions and the making
of a document”. In: Journal of the American Society for Information Science and
Technology 62.12 (2011), pp. 2528–2539. doi: https://doi.org/10.1002/asi.21639.
eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.21639. url:
https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.21639.


[152] Beverley Hawkins, Annie Pye, and Fernando Correia. “Boundary objects, power, and
learning: The matter of developing sustainable practice in organizations”. en. In:
Management Learning 48.3 (July 2017), pp. 292–310. issn: 1350-5076, 1461-7307. doi:
10.1177/1350507616677199. url: http://journals.sagepub.com/doi/10.1177/
1350507616677199 (visited on 01/06/2022).
[153] Nithya Sambasivan et al. ““Everyone wants to do the model work, not the data work”:
Data Cascades in High-Stakes AI”. en. In: Proceedings of the 2021 CHI Conference on
Human Factors in Computing Systems. Yokohama Japan: ACM, May 2021, pp. 1–15.
isbn: 978-1-4503-8096-6. doi: 10.1145/3411764.3445518. url: https://dl.acm.
org/doi/10.1145/3411764.3445518 (visited on 12/17/2021).
[154] Christine T. Wolf et al. “Mapping the "How" of Collaborative Action: Research Methods
for Studying Contemporary Sociotechnical Processes”. In: Conference Companion
Publication of the 2019 on Computer Supported Cooperative Work and Social Computing.
CSCW ’19. Austin, TX, USA: Association for Computing Machinery, 2019, pp. 528–532.
isbn: 9781450366922. doi: 10.1145/3311957.3359441. url: https://doi.org/10.
1145/3311957.3359441.
[155] Samir Passi and Phoebe Sengers. “Making data science systems work”. en. In: Big
Data & Society 7.2 (July 2020), p. 205395172093960. issn: 2053-9517, 2053-9517. doi:
10.1177/2053951720939605. url: http://journals.sagepub.com/doi/10.1177/
2053951720939605 (visited on 05/06/2021).
[156] James R. Wallace, Saba Oji, and Craig Anslow. “Technologies, Methods, and Values:
Changes in Empirical Research at CSCW 1990 - 2015”. In: Proc. ACM Hum.-Comput.
Interact. 1.CSCW (Dec. 2017). doi: 10.1145/3134741. url: https://doi.org/10.
1145/3134741.
[157] Casey Fiesler et al. “Qualitative Methods for CSCW: Challenges and Opportunities”.
In: Conference Companion Publication of the 2019 on Computer Supported Cooperative
Work and Social Computing. CSCW ’19. Austin, TX, USA: Association for Computing
Machinery, 2019, pp. 455–460. isbn: 9781450366922. doi: 10.1145/3311957.3359428.
url: https://doi.org/10.1145/3311957.3359428.
[158] Ben Green. Data Science as Political Action: Grounding Data Science in a Politics of
Justice. en. SSRN Scholarly Paper ID 3658431. Rochester, NY: Social Science Research
Network, July 2020. doi: 10.2139/ssrn.3658431. url: https://papers.ssrn.com/
abstract=3658431 (visited on 04/30/2021).

A
Interview Guides

This appendix comprises the interview guides used in this dissertation. They are organized
according to interview method and interview partner. It should be noted that these guides are
not fixed questionnaires: they helped structure the conversations with the interview partners
but were not followed to the letter in every interview situation. The order and phrasing of the
questions changed depending on the interview partner and the tone of the conversation. Some
questions were omitted when the interview partner had already addressed the points in question,
and further questions were improvised in response to specific conversations with participants.

A.1 In-depth Interviews

A.1.1 Data Workers


1. What is your position at S1/S2?

2. How long have you been doing this work?

3. How did you get in contact with the company? Where did you hear about the job?

4. What was the recruitment process like?

5. What did you do before? (previous jobs, education)

6. How do you feel about working as a data worker?

7. What is your job about? Can you describe typical tasks?

8. What can you tell me about S1/S2 as a work place? How do you feel working here?

9. What does your contract look like? (benefits, wages, shifts) What are, in your opinion,
the pros and cons of this model?


10. Do you have a desk at the office or work from home? Do you use your own device or
was it given to you by the company?

11. Did you receive training before starting as a data worker? Do you still receive training?
What kind?

12. Have you received training in ethics? Data protection? Avoidance of bias?

13. Let’s imagine a new project comes in: Can you walk me trough the process, starting
with the client sending the task? What happens then?

14. Whenever a new task/project comes in, do you know who the client is and what their
core business is?

15. What languages are used in the data you work with and the instructions? What is the
language used to communicate with managers and clients?

16. Do you work in teams? Do you consult with colleagues?

17. Can you describe what is done with the data after you have labeled it? What is it used
for?

18. What do you think is the impact and importance of your work? (for society, the tech
industry, own community)

19. How and by whom is your work evaluated? (supervision, control, quality standards)

20. What happens if you don’t understand the task? Whom do you ask?

21. Have you ever been reprimanded by managers or clients because they were unsatisfied
with your work? What happened?

22. Can you tell me about a time when you had to resolve disagreement, for instance, in
labeling tasks? Are there times when you find mistakes in the logic of a task or the
instructions don’t make sense to you? What do you do then?

23. What would you say is the best and worst part of your job?

24. What are your plans for the future (professionally)? Do you want to continue working in
tech?

25. Does this experience as a data worker help you with future jobs? What are important
skills and learnings?

26. How old are you?

27. Where do you live? With whom?


28. Where were you born? What is your nationality / status?

29. (if migrant) How long have you lived here?

30. (if migrant) Why did you decide to migrate? How did you get here?

A.1.2 Managers / Founders at S1 and S2


1. What is your position at S1/S2?

2. How long have you been doing this work?

3. When was the company founded? What was the founding process like? How did the idea come up?

4. What services does the company provide?

5. Who are the clients? Have you worked for big corporations? What is the deal you are
most proud of?

6. Please describe: How is a typical project conducted? What does the workflow look like?

7. What is the typical duration of a project? Do you have long-term projects / recurring
clients?

8. Please describe in your own words: what does it mean to be an impact sourcing company?

9. How many people work at S1/S2? How many in management positions? How many
data workers?

10. What is, in general terms, the background of the data workers? (education, experience,
age, migration)

11. How and where are they recruited? How does the hiring process go?

12. What things do you look for in a data worker?

13. What kind of conditions does the company offer workers? What do contracts, shifts, and
wages look like?

14. What kind of training do they receive? How are new projects and instructions briefed to
workers?

15. What kind of equipment is used? Does the company provide that equipment?

16. What kind of software is used?

17. What languages are used in the labels and the instructions? What is the language used
to communicate with workers and clients?


18. How would you describe the impact of this work on workers’ lives?

19. How does the company measure quality, for instance, in data labeling?

20. Do clients bring their own quality standards?

21. Does the company have quality measurements regarding ethics, bias, transparency, and
data protection? Has any of your clients brought something of this sort up? How did
that conversation go?

22. How and by whom is the work evaluated? How is the performance of workers evaluated?

23. What are, in your opinion, the most pressing challenges regarding ethics and AI?

24. What are the challenges posed by the impact sourcing model?

25. What is the most rewarding aspect?

26. If someone were to observe this organization from the outside, how would they describe it?

27. I’m sure you have heard critical voices that have described these facilities as “digital
sweatshops.” What can you say about that?

28. What are your plans for your professional future? What are the plans for the company?

29. How old are you?

30. Where do you live? With whom?

31. Where were you born? What is your nationality / status?

32. (if migrant) How long have you lived here?

33. (if migrant) Why did you decide to migrate? How did you get here?

A.2 Expert Interviews

A.2.1 Other BPO Managers/Founders


1. What is your position at [BPO]?

2. How long have you been doing this work?

3. What did you do before? (previous experience, education)


4. In a few sentences, what does [BPO] do? What is the core idea? What services does the
company provide?

5. When was the company founded? What was the founding process like?

6. How many people work at [BPO]? How many in management positions? How many data
workers?

7. How is work structured? What kind of conditions does [BPO] offer workers? What do
contracts, shifts, and wages look like?

8. How common are these conditions in the BPO sector?

9. What kind of training do the workers receive? How are new projects and instructions
briefed to workers?

10. How would you describe the impact of this work on workers’ lives?

11. Who are the clients?

12. What do clients value the most?

13. How competitive is the data processing market? How does your company stand out?

14. How is the quality of data measured?

15. Do clients bring their own quality standards?

16. Is it common to use quality measurements regarding ethics, bias, transparency, and data
protection? Has any of your clients brought something of this sort up? How did that
conversation go?

17. How and by whom is the work evaluated? How is the performance of workers evaluated?

18. What languages are used in the labels and the instructions? What is the language used
to communicate with workers and clients?

19. What are the future plans of the company?

20. What are, in your opinion, the most pressing challenges regarding ethics and AI?

21. What are the challenges posed by the impact sourcing model?

22. What is the most rewarding aspect?

23. Where were you born? What is your nationality / status?


24. (if migrant) How long have you lived here? Did you move because of the company / this
job?

A.2.2 ML Practitioners

1. What is your position within [company]?

2. How long have you worked here?

3. How would you describe your job? What are your responsibilities and main challenges?

4. In a few sentences, what does [company] do?

5. What is the main product/service?

6. Who are the users?

7. What kind of data does your product require?

8. Where does the training data come from?

9. What is your experience outsourcing data work? Where to?

10. (if informant has experience with both BPOs and platforms) Why did the outsourcing
strategy change, from (BPO/platform) to (platform/BPO)?

11. What would you say are the differences (between different BPOs / between different
platforms / between BPO and platform) in terms of price, quality, communication?

12. How do you measure the quality of data?

13. How do you communicate with data service providers? What are the challenges?

14. Do you have direct contact with the data workers?

15. How are data work instructions developed? What does the process look like? Who is in
charge?

16. How are the categories and classes for data labeling and collection agreed upon? What
do they derive from?

17. How does the company measure the quality of the products? What are the quality
standards that you use? How are products tested? What processes do you have in place
to determine whether a model is good enough for deployment?

18. Do corporate clients use their own quality standards? Can you give me an example?


19. What could be the potential drivers for implementing more transparent systems and
processes?

20. What are, in your opinion, the most pressing challenges regarding ethics and AI?

A.3 Semi-structured Interviews on Documentation Practices

A.3.1 BPO Managers and ML Practitioners


1. What is your position within [company]?

2. How long have you worked here?

3. How would you describe your job? What are your responsibilities and main challenges?

4. Who are the company’s main clients?

5. How do you communicate with clients?

6. What is this communication process like? What are the challenges?

7. Do you have direct contact with data workers /with clients?

8. How do you measure the quality of data?

9. Does [company] have a process in place to determine whether a dataset is good enough
for deployment?

10. Is the data production process documented? How?

11. Is the training process documented? How?

12. Does [company] document iterations and transformations in the data? (re-labeling, changes
in the instructions)

13. (If no documentation is used) Is this something you have considered? Why (not)?

14. (If any form of documentation is used) What is the purpose of the documentation? What
is it used for?

15. (If any form of documentation is used) Who is in charge of documenting?

16. (If any form of documentation is used) Who has access and who uses the documentation?

17. (If any form of documentation is used) Do stakeholders outside your organization have
access to the documentation?


18. (If any form of documentation is used) How and when was the current documentation
system implemented? Has it evolved over time? How?

19. What is your personal experience with documentation? What are the positive and the
negative aspects?

20. What are factors that hinder the implementation of documentation in organizations?

21. What could be the potential drivers for implementing more transparent processes and
documentation?

B
Workshop Facilitation Templates

This appendix comprises the activities and templates used in the co-design workshop sessions
that we conducted with S1 and S2. The workshop sessions took place online, via Zoom, and the
activities were conducted on the platform Miro. Each template contains instructions and indicates
whether the activity was conducted in the main session or in breakout groups. The templates
used with S1 workers were in Spanish; the ones used at S2 were in English and Arabic.

B.1 Workshops with S1


Cover slide, Day 1: "Co-Designing Dataset Documentation". Participants: ML tribe and managers. Duration: 2 hours.

Agenda slide, Day 1 and Day 2 (1 min., facilitator).
Day 1 (ML tribe and managers, 2 hours): introduction and presentations (20 min., all); Activity 1: understanding documentation practices (20 min., all); Activity 2: understanding roles and processes (20 min., groups); presentations (20 min., all); Activity 3: documenting project 1 (30 min., groups); presentations (15 min., all); closing and preparation for Day 2 (10 min., all).
Day 2 (ML tribe and managers, 2 hours): Activity 4: generating questions (10 min., groups); Activity 5: providing answers (10 min., groups); Activity 6: identifying challenges (32 min., all); Activity 7: intervening in the wiki (15 min., all); presentations (10 min., all); closing (10 min., all).

Introduction slide, data and documentation (5 min., facilitator). Workshop theme and goals: How can we produce more transparent data for machine learning?

Introduction slide, data and documentation, continued (5 min., facilitator). Workshop theme and goals: from documenting datasets toward documenting data production processes. (The slide includes a diagram of the ML pipeline: design, data, model, application.)

Introduction slide, workshop tools (5 min., facilitator): How are we going to work? (Overview of the workshop tools, with a "we are here" marker.)


Ground rules slide: rules for working together (2 min., facilitator).
Organization: the workshop will be recorded, including everything written in the chat. To respect everyone's privacy, please do not share screenshots or workshop materials on social media.
Participation: be patient, kind, and constructive, and show respect for whoever is speaking. All voices matter and some people are shy, so do not hog the microphone and encourage others to share their ideas. There are no dumb questions or ideas; be respectful when giving feedback. If you want to speak, use the "raise hand" function on Zoom, and stay muted when it is not your turn.
Self-care: if you need a break, that is no problem; use the Zoom chat to let us know you will be away for a while, mute yourself, turn off your camera, and take your time. In general, try not to stray from the workshop topic, and please give a warning before discussing sensitive or private matters. Do not hesitate to send us a private message via Zoom, email, or WhatsApp if you feel unwell, have technical or group problems, or want to ask a question; we are here to help while maintaining confidentiality.

Introductions slide: let's get to know each other (10 min., all).
Introduce yourself: What is your name? What is your job at S1? How long have you been in this position? What is your favorite animal? Tell us something important we should know about you. (Participants add their names to the board.)
Who is here today? Participants; facilitators: Mila, Adriana, Julian; documentation: Mariana, Beatriz, Cristhian; tech support: Ling; illustrator: Marc.

Use-case slide: introduction to Activity 2, collection, modification, and labeling (5 min., facilitator).
What is the project about? The client hires S1 to create a dataset of images of national ID cards (DNIs). The dataset will be used to train a system that recognizes fake IDs. Instructions: data collection (searching for Argentine DNIs of nationals and foreigners); data editing (modifying the DNI images by using photocopies, pasting in a photo different from the original, or using Photoshop); data labeling (once the DNIs have been collected and modified in different ways, labeling each image with the type of forgery involved: photocopy, Photoshop, photo from the internet, replacement of the person's photo, etc.).

Activity 1 template: understanding how documentation is done at S1 (20 min., all, oral brainstorming).
Guiding questions: What types of documentation are there and how are they used? Everything there is to know about the wiki.
Instructions: this is an oral brainstorming activity; the organizers write the emerging ideas on purple post-its, so participants do not need to write on the canvas themselves (but may do so). Guided by the questions on the green post-its, the group discusses current documentation practices at S1: use of the wiki, advantages and disadvantages, use of the information and possibilities for feedback, and limitations.
Prompt questions: When is the wiki used for the first time? Is it adapted for each client? Does the client have access? Who builds the wiki and who uses it? In which concrete situations is it used, by whom, and for what? What is done in case of doubts while documenting? Do you find this way of documenting useful? How much time does this type of documentation take? What is the wiki useful for, and what is it not useful for? What information is missing from the wiki and what is superfluous? What are its limitations? Have there been cases in which the documented information made you think about ethical questions?

Activity 2.A template: understanding roles and processes | Group A (20 min., breakout groups, brainstorming and visualization).
Guiding question: How are projects carried out, and how do the different roles relate to each other? The discussion takes the perspective of the analysts, represented by a persona (here named Bruno).
Instructions: this activity requires active participation to visualize the discussion. Participants are divided into groups, each working in a separate Zoom breakout room, following the instructions and using this template to document the brainstorming. Steps: (1) create a persona with the characteristics of an analyst, give them a name, and describe them; (2) identify the most important phases in the conduction of projects at S1 (kick-off, onboarding, conducting the project), adding phases if necessary; (3) describe the experience: what is done in each phase? (4) describe the interactions: who else is present in each phase, and how does the persona interact with them? (5) bring the experience to life: what positive and negative thoughts and feelings does the persona have in each phase (illustrated with emojis)? (6) assign a person who is responsible for or leads each phase; (7) describe what kind of documentation is generated in each phase: what information is recorded and what documents are produced? (8) note down ideas, challenges, and comments that come up during the brainstorming. Extra task: use the provided sticker to mark phases, actions, or interactions that are not yet documented and where more documentation would be needed.
Group A: four participants, Mila (moderation), Cristhian (notes).

Activity 3.A template: documenting this project (project 1) | Group A (10 min., breakout groups, brainstorming and visualization).
Guiding question: What would the wiki for project 1 look like? In this group, the analysts put together the service description.
Instructions: this activity requires active participation to visualize the discussion. Participants are divided into groups, each working in a separate Zoom breakout room. From the perspective of each group, one section of the wiki for project 1 is recreated; this group takes the perspective of the analysts and recreates the service description. (1) Use post-its to recreate the service description for this project: What information do you include? Why? (2) Use the "!" icon to mark information that is especially relevant for the service description. (3) Use the "?" icon to mark information that the analysts do not have available.
Group A: four participants, Mila (moderation), Cristhian (notes).


Cover slide, Day 2: "Co-Designing Dataset Documentation". Participants: ML tribe and managers. Duration: 2 hours.

Agenda slide, Day 2 (5 min., facilitator).
Day 2 (ML tribe and managers, 2 hours): welcome (5 min., all); recap of Day 1 (15 min., all); Activity 4: identifying challenges (30 min., all); Activity 5: rethinking documentation (groups); presentations (10 min., all); closing (10 min., all). The slide also repeats the Day 1 agenda for reference.

Activity 4 template: identifying challenges (30 min., all, oral brainstorming).
Guiding question: What challenges does the current way of documenting present?
Instructions: this is an oral brainstorming activity; the facilitators write the emerging ideas on post-its, so participants do not need to write on the canvas themselves (but may do so). Guided by the questions on the light-blue post-its, the group explores four problem areas: keeping the documentation up to date (Who builds the wiki? Where does the information come from? Who updates it, and how much time does it take? How can updating be made more effective?); guaranteeing access and promoting use (What are the barriers to access? How are permissions decided? Do analysts consult the wikis of other projects? What information would you like to have access to, and why is there currently no access?); level of detail versus confidentiality (Who defines what information is documented?); and integration of documentation into work processes (What other tasks do the people in charge of building the wiki have? How can we set up processes that integrate documentation?).

Activity 5.A template: rethinking documentation | Group A (30 min., breakout groups, hands-on construction).
Guiding question: How can we document more easily and more effectively?
Instructions: this activity requires active participation to visualize the discussion; participants are divided into groups, each working in a separate Zoom breakout room. The groups deconstruct the documentation processes and redesign them with three principles in mind: simplicity (the process must be simple and the information easy to understand); accessibility (data privacy is protected and the client is given security, while analysts are allowed access); and integration (ways of integrating documentation into work processes: can documentation be automated, and how?). The template covers: (1) format (questionnaire, wiki or website, spreadsheet, checklist); (2) sections (objectives, instructions, ethics and security); (3) information (what? who? how? when? where? why?); (4) actors (analyst, project manager, service owners, public, auditor, and others); (5) actions (automate, collaborate, grant permissions, give feedback, integrate, level of detail, search and find information, reflect, link, handle private data).

B.2 Workshops with S2


About this project: Co-designing dataset documentation

Workshop theme and goal: How can we make the production of machine learning data more transparent? This three-day workshop series aims to bring research on dataset documentation closer to industry scenarios and workers' needs. The workshops are the last iteration of our collaboration with a business process outsourcing (BPO) company specialized in the collection and annotation of data for machine learning. The collaboration with the Bulgarian BPO has been ongoing since 2019 and has included participant observations, interviews, discussions, and feedback rounds aiming to understand how the outsourcing of data production projects works and how it can be made explicit in documentation.
Based on our feedback and in constant communication with us, the company has developed and recently implemented three documentation templates to capture different stages of data-related projects: a Scope of Work document to record the orders placed by requesters, a Dataset Documentation template to document projects as they develop, and a Post-Mortem Report to evaluate projects retrospectively.
The idea of the workshops is to evaluate these templates and expand them. The goal is to co-design a framework to produce documentation that is able to retrieve some of the power dynamics inscribed in datasets. Documentation of this sort should aim at: (1) making data production contexts explicit, including the rationale and actors behind decisions that shape data; (2) integrating documentation practices into current workflows and routines to foster a view of documentation as an integral part of dataset production; and (3) given the multiplicity of actors collaborating to produce training data, serving as a communication medium among different stakeholders. In addition, workshop participants will discuss ways to incentivize organizations and workers to document dataset production. We hope to identify desiderata and concerns as well as obstacles to implementing documentation in industry settings. Based on these discussions, participants and workshop organizers will draft a documentation framework for dataset production together.

The workshop series:
Day 1 (data workers and BPO managers, 1.5 hours): This 90-minute session will gather data workers and management of the Bulgarian BPO to explore real scenarios in which the documentation templates are used. Together with the participants, we will review the documentation templates and discuss existing practices, issues, and desiderata as preparation for the next iteration (see Day 2). We want to know: At what stage of the BPO-requester relationship is documentation introduced? Who is in charge of documenting? Are the templates tailored or adapted to each project? How are the documentation templates integrated with the company's processes and work practices? Is the feedback of data workers, and the iterations resulting from it, reflected in documentation?
Day 2 (data workers, BPO managers, and requesters, 3 hours): For the second session, one of the BPO's clients will join the data workers and their managers. For three hours and through different activities, we will discuss the challenges, constraints, and chances of the BPO-requester relationship. We aim to identify key information about the usefulness of documentation for both BPO and requester by discussing what is possible in data-production contexts, what is useful for data services, and what is valuable for clients. The activities include outlining a stakeholder map to identify key actors in dataset production and a stakeholder journey to make explicit how typical projects develop. In breakout groups, participants will discuss strategies to gather the necessary information that should flow into documentation. Towards the end of this session, the participants will be invited to intervene in the existing documentation templates.
Day 3 (data workers, 1.5 hours): The last session aims at mitigating some of the power differentials between the three groups present in session 2. On Day 3, we will invite only the data workers to comment on the previous sessions. This form of debrief might offer data workers a chance to express their opinions and discuss without the presence of bosses and clients.
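To make the division of labor between the three templates mentioned above more concrete, the sketch below illustrates in Python what minimal records of this kind might contain. It is a hypothetical illustration only: the class and field names (ScopeOfWork, DatasetDocumentation, PostMortemReport, agreed_quality_metric, and so on) are assumptions made for this sketch and do not reproduce the company's actual templates.

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ScopeOfWork:
    """Order placed by a requester before annotation work starts."""
    requester: str
    project_name: str
    task_description: str              # e.g., "draw bounding boxes around vehicles"
    label_classes: List[str]
    delivery_deadline: date
    agreed_quality_metric: str         # e.g., "inter-annotator agreement >= 0.8"


@dataclass
class DatasetDocumentation:
    """Living record that is updated while the project develops."""
    project_name: str
    instruction_versions: List[str] = field(default_factory=list)   # one entry per instruction revision
    open_questions: List[str] = field(default_factory=list)         # edge cases raised by data workers
    decisions: List[str] = field(default_factory=list)              # who decided what, and why


@dataclass
class PostMortemReport:
    """Retrospective evaluation once the project has been delivered."""
    project_name: str
    what_went_well: List[str]
    what_went_wrong: List[str]
    worker_feedback: Optional[str] = None

Even a sketch at this level of detail makes visible which of the questions raised in the workshops (who documents, when, for whom, and with what level of detail) each record would have to answer.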

Cover slide, Day 1: "Co-Designing Dataset Documentation". Participants: data workers and BPO managers. Duration: 1.5 hours.

Agenda slide, Day 1 (15 min., facilitator), presented in English and Arabic: introduction and presentations (30 min., all); Activity 1: stakeholder map and documentation (50 min., all); closing remarks and preparations for Day 2 (5 min., all). The slide also shows the overview of the three-day series: Day 1 with data workers and managers (1.5 hours), Day 2 with data workers, managers, and requesters (3 hours), and Day 3 with data workers only (1.5 hours).

Ground rules slide (2 min., facilitator), presented in English and Arabic.
Organization: the session will be recorded, including content posted to the chat; for privacy and consent, please do not share content from the workshop on social media.
Participation: be patient, kind, and supportive, and show sensitivity to anyone who has the floor. All voices matter: don't hog the mic, and encourage others to speak up by being mindful of the floor time. There are no dumb questions or comments; be kind when commenting on others' ideas. By default you are muted; if you wish to speak, use the raise-hand function on Zoom, and mute your mic when not speaking. We support language justice and aim to build multilingual spaces, so we encourage participants to communicate in the language they feel most comfortable with; the interpreters are here to assist us.
Self-care: we understand that everyone's situation is different and Zoom fatigue is real, so feel free to step away if you need to, but please mute your mic and disable your video. If you absolutely must discuss potentially triggering or sensitive topics, please provide a content warning. In the event of discomfort, problems, questions, or requests, message any of the organizers through the chat and let us know what is going on; feel free to ask one of the interpreters to help translate the message if needed.

Workshop tools slide (15 min., facilitator): Zoom.

Workshop tools slide (15 min., facilitator): Miro, illustrated with the board for Activity 1, the stakeholder map (30 min., all participants, oral brainstorming in the plenum; title and instructions also given in Arabic). Guiding question: Who is who in data labeling? The board provides sticker tool tips labeled SOW, DD, and PM (the Scope of Work, Dataset Documentation, and Post-Mortem templates) and "!" markers that can be placed on the map.

Introductions slide: let's get to know each other (10 min., all), presented in English and Arabic.
Introduce yourself: What is your name? What is your work about? Where are you calling from? What is your favorite animal? Something you would like us to know about yourself. (Participants add their names to the board.)
Who is here today? Participants; facilitators: Mila, Adriana, Julian; support and notetaking: Mouath, Sonja; interpretation English-Arabic-English: Hanadi, Mahdy; tech support: Ling; graphic documentation: Marc.

Workshop theme and goal slide (15 min., facilitator), in English and Arabic: How can we make the production of training data more transparent?

Workshop theme and goal slide, continued (15 min., facilitator), in English and Arabic: transparency and black-box datasets, that is, datasets of which only the inputs and outputs can be observed but not the inner workings.

Workshop theme and goal slide, continued (15 min., facilitator), in English and Arabic: re-imagining documentation, from dataset documentation toward documenting data production.

Activity 1 template: stakeholder map (30 min., all participants, oral brainstorming in the plenum), with title and instructions in English and Arabic. Guiding question: Who is who in data work? The board contains starting nodes for S2, the partner organization in Syria, and the requester, as well as sticker tool tips labeled SOW, DD, and PM (the Scope of Work, Dataset Documentation, and Post-Mortem templates) and "!" markers that participants can place on the map.

Cover slide, Day 2: "Co-Designing Dataset Documentation". Participants: labelers, managers, and clients. Duration: 3 hours.

Agenda slide, Day 2 (1 min., facilitator): welcome and introduction (15 min., all); Activity 1: stakeholder map recap (30 min., all); break (5 min.); Activity 2: role play, generating questions (10 min., groups); presentations (40 min., groups); break (5 min.); Activity 3: re-imagining documentation (25 min., groups); presentations (30 min., groups); closing remarks and feedback (5 min., all). The slide also shows the overview of the three-day series: Day 1 with labelers and HITL managers (1.5 hours), Day 2 with labelers, HITL managers, and requesters (3 hours), and Day 3 with labelers only (1.5 hours).

Activity 2.A template: role play, generating questions | Group A (10 min., breakout groups, hands-on brainstorming).
Topic: obtaining and sharing information from the perspective of data workers. Guiding question: What information do labelers need to do their job, and who can provide it?
Instructions: this is a hands-on activity in breakout groups; each group is placed in a breakout room on Zoom, discusses, and uses the canvas to document its ideas. Collect the questions that labelers might have when doing their work, writing only one question on each yellow card, and for each question consider who might be able to provide the answer. The canvas provides cards for the topic, the question, who holds this information, and a relevance scale from "little" to "very".
Team A: five participants, an English-speaking facilitator, and an Arabic-speaking facilitator.

Activity 3B template: re-imagining documentation | Group B (25 min., breakout groups, hands-on prototyping).
Guiding question: How can documentation become a part of current workflows?
Instructions: this is a hands-on activity in breakout groups; each group is placed in a breakout room on Zoom, discusses, and uses the canvas to document its ideas. The task is to design a documentation process that mitigates some of the information asymmetries discussed in the previous activity, following three principles: simplicity (the process of documenting projects must be simple); accessibility (the information that is documented must remain legible and easy to access); and integration and collaboration (documentation must be part of the labeling process, which raises the question of how labelers can be integrated in the process of documenting). The canvas covers: (1) format (questionnaire, website, spreadsheet, checklist, form); (2) sections (general goals, instructions, ethics and security); (3) information (what? who? how? when? where? why?); (4) stakeholders (requester, data subjects, labelers, project manager, partner organization, manager, and others); (5) actions (automate, collaborate, allow access, give feedback, provide detailed information, search and find information, reflect, link, ban).
Group B: five participants, an English-speaking facilitator, an Arabic-speaking facilitator, and an Arabic-English interpreter.
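One way to read the principle of integration and collaboration explored in this activity is that documentation should be produced as a side effect of everyday work rather than as a separate chore. The sketch below is a minimal illustration of that idea, assuming a simple append-only project log; the class and method names (DocumentationLog, log_instruction_change, log_worker_question) are hypothetical and do not describe any tool actually in use at S1 or S2.

import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class DocumentationLog:
    """Append-only project log answering who, what, when, and why for each change."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _append(self, entry: dict) -> None:
        # Every entry is timestamped and written as one JSON line.
        entry["when"] = datetime.now(timezone.utc).isoformat()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

    def log_instruction_change(self, who: str, what: str, why: str) -> None:
        # Could be triggered automatically whenever the labeling instructions are edited.
        self._append({"type": "instruction_change", "who": who, "what": what, "why": why})

    def log_worker_question(self, who: str, question: str,
                            answered_by: Optional[str] = None) -> None:
        # Preserves moments of doubt or dissent instead of resolving them silently.
        self._append({"type": "worker_question", "who": who,
                      "question": question, "answered_by": answered_by})


# Example usage:
# log = DocumentationLog("project_log.jsonl")
# log.log_instruction_change(who="requester", what="merged two label classes",
#                            why="the client's model only needs coarse categories")
# log.log_worker_question(who="labeler_07", question="How should ambiguous images be labeled?")

A log of this kind would give managers and requesters a trace of how instructions evolved while keeping workers' questions visible in the record.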


Day 3 slide: "Co-Designing Dataset Documentation: Let's talk!" Participants: data workers only. Duration: 1.5 hours. Discussion topics: communication and information asymmetries, work, tools, and labor conditions.
