
Chat Bots and Data Collection as Art and Political Protest: Feminist Data Set

Caroline Sinders

In June 2018, the Guardian reported on a years-long trend of ‘pseudo-AI’, in which technology startups focusing on artificial intelligence actually use humans to do chatbot and AI tasks. As early as 2016, Bloomberg discovered that AI calendar assistants such as Clara were using humans instead of AI-enabled chatbots to do menial and often mind-numbing tasks. According to the Guardian, the human counterparts were so burned out that “the employees said they were looking forward to being replaced by bots…” 1 This kind of trickery does serve a capitalistic purpose: it allows the startup to ship quickly and continue to work on the technical peculiarities while the product gains footing and a customer base. But it’s still trickery, and it calls to mind the man behind the curtain in the Wizard of Oz, or the original Mechanical Turk. “Pay No Attention to that Mind Behind the Curtain,” if you will. Human expertise, and humans’ ability to contextualize information while processing it, will be hard to replicate and replace using artificial intelligence. In fact, it will be nearly impossible. Conversely, there are tasks machines do so well that humans cannot compete, such as processing, analyzing, and sifting through large data sets and numbers far more quickly than the human brain can. This tension between what humans can do better and what machines do well is at the crux of my interests in creating art that examines artificial intelligence, human intelligence, and emotions.

Which brings us to bots, and chat bots in particular. The idea of creating a system to mimic human behavior and human speech through technology is an interesting mechanical medium. Chatbots, whether or not they are intended to, exemplify daily all of the nuances of speech: from what sounds organic and natural to the uncanny valleys of not-quite-right, uncomfortable awareness and awkwardness. But there’s an inherent cleverness to bot creation, in which the skills of language and conversation inside of technology become an elevated art form. Conversation is ‘designed,’ and it’s as much of a skill as any technical, mathematical, or artistic medium. The “right” combination of words can elicit emotions like delight, or accidental frustration. As the critic Nora Khan writes in her essay and talk, “Mapping the Hidden,”2 large tech and bot companies are recognizing the importance of conversational data: of linguistics, humor, and emotional cues teased out through words. These companies are hiring writers to make the bots sound more human-like, and to make the conversational experience more enjoyable. But this still requires language, and this language is translated into data. As Khan poetically describes:

“AI researchers hone in on a very difficult field, psycholinguistics, the creation of how humans produce language, learn it and understand it, in order to develop finer computational models. Further, these researchers use reams of real-world data, open playgrounds in which subjects speak for the language itself, we see the employment of more creative writers in AI. This is a fascinating shift. Microsoft’s infamous TAY bot was written largely by comedians, Cortana’s writing team is filled with script writers and novelists; there are poets and novelists working at Google on building the language of the future…”

Thus, novel writing, literature, and comedy are remapped onto data structures and data systems, and novelists are collaborating with data scientists. To make a conversational bot, one doesn’t just need algorithms; one needs data, and the conversation of a chatbot is data. Data is part of the infrastructure, the backbone of AI. It determines what the AI system can do; it’s a key ingredient. A chatbot that focuses on the nuances of traffic rules, for example, needs to understand traffic regulations, as well as cars and infrastructure. A traffic-aware bot would need to understand what a pedestrian is, as well as that a car belongs on the street. A ‘pedestrian’ is a noun; it’s a specific entity inside the world of ‘traffic’, and it’s a form of data. A car, a car that moves, a car that moves on the street: these are bits of data as well.

These two topics, data collection and chatbots, can feel wholly dissimilar. In fact, they may feel like two completely separate areas of study, linked only because they both touch software and hardware. But data and bots are inextricably tied together. A conversation cannot happen without the ability to communicate, and to converse, in any shape or form. A chat bot cannot bot or converse without data. To create a chat bot, to create a conversational system using AI, you need data for the system to be trained on; you need a lot of recorded or written conversations to feed into the system. But if that data is biased, if that data is adversarial, this will be reflected back into the AI system. If the AI system itself is improperly designed, what’s been created is something that can cause many levels of harm. That’s the dark side to data and data collection for artificial intelligence. From biased data being fed into algorithmic systems used by the police, to new forms of surveillance software analyzing emotions in faces, to data that is simply incorrect or incomplete being fed into a system, data collection feels especially fraught. To make a chatbot, one needs data; to have data, one must get data. But this raises the question of where the data came from, who owns it, and who made it. How is data collected, can it be grown, and where does data for large artificial intelligence systems come from? Data, like most things, can be incredibly political, from why it was gathered, to who currently ‘owns’ it, to how a specific data set is used. Data has intentionality; data can harm.

Feminist Data Set is a multi-year process to explore ethical and slow data collection as an art practice by holding workshops to collaboratively collect and archive data. This data will then be used to create a feminist AI. However, the slowness of the data collection is important. Similar to the slow food movement, I’m interested in exploring sustainability and equity in data collection, by making something that seems incredibly technical start to feel incredibly human, from start to finish, by focusing on the labor and the creation of this mechanized process. The project explores a series of questions: can data collection and data sets be an art project? What about qualitative data or emotional data? Can there be feminist data and feminist AI? The data collected through the project Feminist Data Set is then being used to create a Feminist AI chat bot. The project is multifold: can data collection be consensual and ethical at a time when data is considered such a dangerous and adversarial topic? And what does feminist UI look like in mechanical conversations?

Feminist data is ‘data’ such as books, essays, interviews, songs, anything in a written form that is created from a feminist perspective. As the architect of the data set, I’ve set parameters. The data should be reflective of modern feminism: it should be diverse, intersectional, and inclusive, specifically open to trans women’s and women of color’s experiences in feminism. The data set is carefully considered; it’s collected through workshops, archived, and then reviewed. Is the data too ‘white’? Is the data too ‘North American’? How old is the data set? Is the data too academic, or is it too colloquial? All of these terms (location, race, age, type of written work, whether conversational or academic) are data types that will shape and determine what’s in the interactions and how the chat bot will interact. But in this case, the project pushes data equally to the forefront of the conversational experience of the bot, and of the artificial intelligence system. Data is not an afterthought, but a fore-thought.

There is a place for chatbots as art, and as art pieces that are politically engaged but also use conversation as a decisive and specific medium. Specifically, Amme Talks by Ulf Stolterfoht3 and Sandy Speaks by American Artist come to mind. The first focuses on the abstraction of poetry and language, pushing it to extremes; the second focuses on the untimely death of Sandra Bland, and is described by the artist as “... a monument that is not stagnant, but rather an active and learning system that maintains what Sandra set out to perform…”4 Both of these pieces are less about how the conversation unfolds and what a conversation is: a back and forth that ‘naturally’ unfolds between two entities. Instead, these pieces are about what is inside of the conversation (the topics, the statements, the sentences), and they use the ‘conversation’ as a medium for the projects to unfold. Additionally, projects like Mimi Onuoha’s “Missing Data Sets” or Heather Dewey-Hagborg’s “Probably Chelsea” critically address the ethical issues around data collection and data visualization, while also pushing the medium towards the poetics of data, even while critiquing. Data, when creating and making art with technology, is still a part of the conversation.

To make something critical about AI, the data being used has to be critical, and integral to the piece as well. As an artist and researcher, I’m less concerned with whether the feminist AI sounds human than with how we reimagine what speech is in mechanical interactions while creating ethical technology. Ethical, politically engaged technology must focus not just on technical systems but on what is fed into those technical systems, and on who got to have a voice in the process. Onuoha’s work above explores specific groups of people that are ignored by data systems and data groups. In this sense, representation is powerful. To move towards an equitable future in technology, representation is important. Does our data, inside of artificial intelligence, represent the general population, or does it represent only those who work in technology?

The process of collecting feminist data for Feminist Data Set is an important design choice, because of how slow and analog it is. By holding collaborative workshops, multiple people are able to affect and change the systems of Feminist Data Set. What could slow, community-driven data collection look like at a larger scale? While this seems antithetical to technology, I believe there are ways to reimagine technical processes to create digital equity. The idea of a more communal data collection can bring the data labor to the surface and make it more public, which is an inherently political act. In creating a feminist data set, the process of creating the data set, as well as what is inside of it, is a feminist act.