Algorithmic Listening: Voice and Affect
Thesis publication; available to read
The development of AI tools with the skills of emotional intelligence—often dubbed ‘affective computing’—is no longer a distant goal. Sensors and algorithms that can recognise and respond to emotions are already being marketed by a range of businesses, from start-ups to the major tech companies. Audio recording of paralanguage—how we convey meaning and feelings through volume, pitch, speed, rhythm and pauses when we speak, as well as by non-verbal sounds like sighs, cries and gasps—has provided a relatively cheap source of data for such AI tools. However, paralanguage is often more ambiguous and culturally influenced than written language. My analysis of a dozen such applications and the companies that market them indicates a marked lack of transparency about the data and research from which they are derived, raising concerns about their robustness, especially given the ‘black box’ nature of the deep-learning techniques employed.
What does it mean to be ever surrounded by sensory devices and AI tools which have the capability to ‘listen-in’? This thesis explores some of the psychological, political and cultural implications. Notwithstanding the considerable momentum for developing and applying paralinguistic computing applications to make life easier or reduce costs, and the increasing ‘everydayness’ of such devices, my findings point to the desirability of strengthening ethical and quality-assurance frameworks. Beyond fears about personal privacy, there is a risk of harm to individuals from decision-making based on algorithms that are much less evidence-based than companies often claim.
60pp; Risoprint on 90gsm, black & teal
Video and sound installation
In this piece, working with an audio data corpus of unscripted telephone speech used for teaching machine learning models, I explore the process of developing and applying an algorithm based on voice and speech data. The results point up a number of issues of interpretability, where missing information or ambiguities can lead to biased, stereotypical and unreliable outcomes. A sound essay explores how this technology is shaping the way knowledge and decisions are produced; alongside it, a sound installation plays data extracted by algorithms that mimic those examined.
A variation of this work will be on show at Dutch Design Week 2021.
To the Mother of All Bowls
Using a Teensy 3.2 microcontroller, audio adaptor, inbuilt mic and amp mounted on a breadboard, I applied a fast Fourier transform (FFT) to the audio signal of a poetry reading by Judy Grahn. The FFT takes a block of samples from the signal as input and outputs the intensity at each corresponding frequency. Pulling this data into P5.js, I created an audio-reactive visualisation for the reading.
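For readers curious about the samples-in, intensities-out step, here is a minimal sketch in plain Python. It is not the Teensy or P5.js code used for the piece; it is a textbook radix-2 FFT written for illustration. The function names, the 8000 Hz sample rate, the 256-sample block and the 1000 Hz test tone are all assumptions chosen to keep the example small.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the bins below Nyquist."""
    n = len(samples)
    spectrum = fft(samples)
    return [(k * sample_rate / n, abs(spectrum[k]) / n) for k in range(n // 2)]


def peak_frequency(samples, sample_rate):
    """Frequency of the strongest bin in the block."""
    return max(magnitude_spectrum(samples, sample_rate), key=lambda p: p[1])[0]


# Hypothetical input: one block of a pure 1000 Hz sine sampled at 8000 Hz.
SAMPLE_RATE = 8000
BLOCK_SIZE = 256
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(BLOCK_SIZE)]

print(peak_frequency(tone, SAMPLE_RATE))
```

Each pair returned by `magnitude_spectrum` is one frequency bin and its intensity; a visualisation would typically map those intensities to size, colour or position, block after block, as the audio plays.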
Cyanotypes w/ Martina Eddone
Paying attention to the “empty” spaces in the urban fabric of a local neighbourhood was the starting point for this project: an exercise in the simple act of noticing. Martina Eddone and I collected fragments of flora from voids, spaces that live alongside the infrastructure and are in continual transformation and regeneration. Through the cyanotype printing technique we documented small details of these fragments. The results were on show as part of Meeting Grounds at Onomatopee Project Space.
Video essay w/ Emma Schep + Hi-Kyung Eun
This video considers the future of genetic data storage. Recently the global market for direct-to-consumer DNA testing kits, which provide insights into ancestry and genetic health, has boomed. However, the businesses selling these services often harness customer data in unscrupulous ways. In the very near future it will be possible to fully sequence the genetic information of any living organism on a mass scale, but how will all this data be handled? This work considers a design scenario of a future decentralised bank system that sees access to scientific knowledge as a right, whilst also providing individuals with increased autonomy over how their personal information is shared.
A Generative Adversarial Network (GAN) trained on an image dataset of 7782 spectrograms used to teach machine learning models to detect and classify seven emotions from interpretations of vocal cues. The GAN synthetically generates pseudo-random spectrograms based on the original dataset.