Precision and Uncertainty in a World of Data
The Departments of Anthropology and the History of Medicine and the Center for Medical Humanities and Social Medicine will be launching a two-year Sawyer Seminar on the topic of Precision and Uncertainty in a World of Data in the 2019-2020 academic year. This semester (Spring 2019), we are holding a series of reading group meetings to connect with faculty and students across the different divisions at Johns Hopkins University and beyond, in an effort to structure the seminar’s two-year course.
We envision the Sawyer Seminar prompting conversations about which ethical and social issues are new to our Big Data moment, what has carried over from the past, and what kinds of approaches might help us extend our understanding of this moment’s specificity. On this page, we will post notes from our reading group meetings. If you have attended one of our meetings and would like to contribute your notes to this page, please let us know! If you would like to attend our events, join our mailing list to stay informed, or contact us with any questions or comments, please email Canay Özden-Schilling (firstname.lastname@example.org).
2/13/2019: Automating Inequality: A discussion of Virginia Eubanks’s book
3/6/2019: Privacy and Data: A Discussion with Anita Allen (UPenn)
4/10/2019: Big Data and Resource Allocation: A Discussion with Sanmay Das (Washington University)
5/6/2019: Algorithms and Accountability: A Discussion with Juliet Floyd (Boston University) and Matthew Jones (Columbia University)
NOTES FROM OUR PREVIOUS DISCUSSIONS
2/13/2019 | A discussion of Virginia Eubanks’s Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin’s Press, 2018) | “Data Sciences and Society” Reading Group
Discussion notes by Naveeda Khan:
I enjoyed reading this book. I was moved by it in many ways: by its attention to the specters of the poor in American history, by its nuanced profiles of the poor, in turn as singular and as collective, in the contemporary U.S., and by its serious call to social action. It is not often that I read scholarly books that diagnose our present so presciently, showing how inequality is being scaled up and up and up until it has become almost inexorable, and that also take up the call to keep up the good fight. Other authors are left far more breathless by this scaling, and their work is marked by handwringing. Although I would not put Cathy O’Neil’s Weapons of Math Destruction in this last category, I did think that she seemed less interested in historical and social context, which led her to provide only a few portraits of affected people who could do little but feel stunned by the wizardry of the new algorithms ripping through their lives. In Eubanks, one could see that even as the ambitions to automate social services grew more elaborate and the algorithms more complex, they remained embedded within institutional settings and political contexts, driven by state and individual interests and desires, and informed by longstanding biases that had not disappeared with automation but had instead become more deeply embedded within its structures.
But praise and comparisons aside, it was precisely Eubanks’s focus on the history of the poor that drew some criticism at the start of our discussion of her book at the Data Sciences and Society Reading Group last Wednesday. What is so new about this moment if it is only an extension of how the U.S. has always dealt with its poor? Why isn’t the book more modestly framed as a study of state bureaucracy? And if there is nothing new about this moment, why bring up “high-tech tools” in the title at all? Why not go further into the hardware and design aspects of these tools, as O’Neil at least does? We decided that some of the promise of the title was realized in chapter four of Automating Inequality, “The Allegheny Algorithm,” in which Eubanks lays out the three key data-based components of the Allegheny Family Screening Tool (AFST): outcome variables, predictive variables, and validation data. Even there, however, the author’s focus is insistently on what is sought within existing bodies of data and how (the controversial data-mining practice of seeking out only highly correlated, statistically significant data points), rather than on where this data comes from in the first place, who has access to it, and for what varied purposes.
While speaking to the designers of algorithms and automated systems would give us ready insight into their healthy doubts and skepticism about their products, this book is not about tooling or the creation of error-free systems by a few artisanal designers. It is about how such tools, ready or not, were captured, operationalized, and managed by states. It supplied the dimension of the state that we missed in O’Neil, in whose book encounters between individuals and the systems that fail them happen more haphazardly; with Eubanks’s introduction of the state, the aggregate effect of such automation came into clearer focus.
Our discussion of Eubanks kept returning to the question of how the past endures into the present. Is it really the case that history holds unchanged, the return of the same, the poorhouse remade as the digital poorhouse of the present? At the level of the U.S.’s proclaimed ethos of self-help and its ambivalent attitudes towards the poor, one can make an argument for continuity, as Eubanks does, but at the level of the relationship between tools and states there are discernible shifts. If tools were created in the past to reduce the number of poor dependents, their creation was initiated by states seeking a variety of ends, from the elusive search for efficiency to trying to help people. Now, by contrast, tools are created within a context of widespread suspicion of government. Thus the work of automation now is not just to cut the welfare rolls but to render government irrelevant. This struck us as an important difference from the past, as it makes automation more hydra-headed, directed at both the poor and government, and raises the question of what else is under attack and being undermined.
We returned to our vexation with the book’s anti-technical bias: for instance, its lack of acknowledgment that centralization may be beneficial in some instances, and the need for Eubanks to draw on larger data sets to nuance her denunciatory stance. Perhaps then we would find that high-tech tools also serve the poor, though perhaps different groups among them, or through appeals to aspects of their identities other than being poor: as white, male, vulnerable, or pleasure-seeking individuals, for example. Questions were raised about communities of interest on Reddit, its subreddits, and the dark web, whose members may be on welfare and facing the negative effects of automation but who have other dimensions to their lives. It is salutary that Eubanks gives the poor a face here, but do they seek only this face?
Finally, chapter four, the one chapter that went into any depth on the actual mechanics of an automated system, captured a fear different from that of automation as such: the impersonality of services and the breakdown of care without any possibility of human intervention and triage. The chapter also captured the fear of being modulated against one’s wishes, or even without one’s conscious knowledge, through one’s interface with the machine. We were given the example of welfare counselors who began to question their own judgment in determining the risk level of children within families whenever their scores were far off from those generated by the machine with its deep backlog of data. Instead of questioning or even overriding the machine’s decision, counselors began to doubt themselves and rerun evaluations to see if they could match the machine. This self-modulation, this fear of having one’s insides reshaped, offered a moment in which the present was not merely a recapitulation of the past but something new, unknown, and potentially terrifying. On reflection, this subject deserves more attention than Eubanks gave it. Our discussion sub-group, mostly anthropologists and historians, several with STS interests, and two clinicians, felt that we would like to encourage the participation of those with interests in neuroscience and cognitive science to understand and diagnose this fear of manipulation, including the manipulation of so-called inner selves.
Discussion notes by Jeremy Greene:
I agree with Naveeda that Eubanks has produced a remarkable book, one that added greatly to the collective conversation of our interdisciplinary reading group, especially layered onto our recent reading of Cathy O’Neil’s Weapons of Math Destruction. Eubanks’s book is elegantly structured, well-written, and compelling, and it has the potential to engage broad popular and policy audiences. As Naveeda describes, Eubanks is able to capture in her case studies a nuanced sense of the historical and social context in which empirical knowledge about poverty is used to frame institutions that continue to separate and stigmatize. Several people in our discussion group of historians, anthropologists, clinicians, and bioethicists wondered, however, why Eubanks was not willing to look under the hood and show the reader how, exactly, these algorithms work, in the way that O’Neil seemed consistently eager to do. Were she present in the room (as we hope she may be at a future Sawyer Seminar event), we would have liked to ask her how she might productively open the “black box” and expose the innards of the technologies she describes. Better yet, perhaps, would be to put Eubanks and O’Neil in conversation with one another, as each seems to have a piece of the puzzle that the other lacks: where O’Neil could really benefit from more engagement with historical and social context, Eubanks could benefit from more engagement with the workings of the technologies themselves.
Our discussion ended with an open question about the known knowns, known unknowns, and unknown unknowns of algorithms and inequality. On the one hand, how do we learn the answers to questions we already know to be important? E.g., changing definitions of privacy, rising saturation of data surveillance, the encoding of prior biases through computation, etc. On the other hand, what new questions might emerge from these engagements? E.g., how are new collectives being formed through these technologies? Whose voices are amplified through techniques of big data and machine learning, and whose are silenced? What forms of labor are being displaced, and what new forms are emerging? How do we attend to the changing interfaces through which people become data and/or have their understandings and future behaviors shaped?
Discussion notes by Veena Das & Canay Özden-Schilling:
Our group’s discussion of Automating Inequality centered on the relation between human bias and the bias introduced into decision models based on predictive algorithms. The book demonstrates how bias weaponizes these tools against poor populations. Is the problem in the design of these algorithms: for example, the choice of certain variables, the omission of others, or discriminatory assumptions about proper ways of parenting and organizing domestic space? In what ways does a certain (overwhelmingly white, middle-class) demographic become the standard for evaluating who is at risk in decisions that determine, for instance, eligibility for social services, the allocation of scarce housing resources, and which children are at risk of abuse in their home environments? Or do the problems arise from poor implementation? We found that this was an empirical question, one that left us wanting more extensive qualitative research as well as the development of mixed methods to open up a wider set of issues. For instance, how would race figure as a factor if the sites chosen for document analysis and interviews included poor black neighborhoods? Could one use the qualitative research to generate further hypotheses for designing surveys over a random sample of households, in order to determine the weight of different variables as the decision models are implemented at the local level? Since the research population consisted of families who were already under surveillance, either because they needed to access social services or because they had been reported for child abuse or minor crimes, it would not be possible to assess whether there were endangered children in families who had not come under the eye of the social service or criminal justice apparatus. These comments were offered not as criticism of the book per se but as issues that arose from the study.
Several of our participants were intrigued by the book’s portrayal of continuities in poverty management and discrimination against the poor from prior eras to our current moment. Can punitive resource allocation be attributed specifically to the work of algorithms? By the same token, are algorithms simply the henchmen of neoliberal governance? Some of us pointed to our experience with Big Data practitioners and students who believe in the unprecedented revelatory powers of Big Data: that it allows us to see reality in ways never seen before. But if, as Virginia Eubanks argues, the digital database is a continuation of yesteryear’s poorhouse (except now scaled up and everlasting), what does that mean for the specificity of our Big Data moment? We thought that one answer might have to do with contemporary processes of data collection: the scale of surveillance, voluntary vs. involuntary sharing of data, and ownership of one’s data. We returned time and again to the relationship between Big Data and ethics. We agreed that the two were necessarily bound up with one another and that there was no way to extract the social from Big Data to arrive at neutral tools. There could be no universal ethical framework for the design and implementation of Big Data, but a simple plea to return to human judgment and banish the machines wouldn’t do either. Bias in data and algorithms can take diverse forms, and the ends of manipulation, whether by governments or corporations, are not uniform either. One possibility is that different kinds of questions arise at different scales of data: the questions arising from population-level genomics, for instance, might be very different from those arising from the data contained in individual clinical records or in files in the criminal justice system.
While in the current milieu questions of ethics seem closely tied to distributional questions of fairness and justice, might other regions of ethics be unearthed from other traditions of philosophy, bioethics, and the social sciences? We look forward to continuing to explore this variety, and the futures of resource allocation, in future meetings.