How voice technology plunges us into the “Uncanny Vocal Valley”
by Antonio Valdez
If something non-human can scream out in pain, should we bother caring for it? When a person screams out in pain, we have the impulse to help. But what if the thing screaming had no heartbeat? Should we even worry about it? What if this “something” is a cell, a yeast cell to be exact? Wait a minute. A yeast cell? Can such cells experience pain? And if so, could we hear it as pain? Is that the voice of something in pain?
In a 2009 study, Jim Gimewsky found that something as simple as a yeast cell has a way to communicate distress. Yeast cells are single-celled microorganisms similar in structure to bacteria. In his experiment on “screaming” yeast, Gimewsky found that the cells produced sound by vibrating their organelles when they were in distress (Gimewsky 2009). But were these sounds noise or signals? Some of Gimewsky’s colleagues believed that the sounds were not screams at all, and that any anguish we heard in them was a human projection (Gimewsky 2009). The experiment raises some interesting questions. If a simple yeast cell can convey distress, how does that relate to human suffering? Does it amount to an anguished emotion? Humans have the impulse to anthropomorphize, that is, to ascribe human characteristics to nonhuman things (Hamilton 1983). We see anthropomorphization in animation and in any non-living thing to which humans grant agency and empathy. What is more: can something as complex as technology prompt the same response?
Technology is a central part of our daily lives and seems to have taken on a life of its own. We ask Siri or Alexa a question; we listen to directions from our GPS; we are nudged by the subway announcement to get ready for our stop. We give technology a voice. We also hear technology alter the human voice every day: the rising popularity of auto-tune in the music industry is technology reshaping how we sound. How humans shape the voice of technology, and how technology shapes our own voices, is deeply entangled with anthropomorphization. It is remarkable that we give technology a voice and that technology alters ours, but what will future conversations look like as voice technology advances? The answer is both amazing and terrifying.
First off, what does it actually mean to have a voice? In their book Locating the Voice in Film: Critical Approaches and Global Practices, Tom Whittaker and Sarah Wright suggest that voice is a conduit for language: not only a form of communication but also a form of self-expression and a marker of social identity (Whittaker and Wright 2017). This definition is fluid; voice is hard to pin down. Voice is a critical concept (“you have a voice in this”) as well as a material phenomenon (vibrating vocal cords). More broadly, we speak of the voice of nature in the sounds animals make, or we say the wind “howls.” So when we give technology a voice, where can that voice be found?
The sound of technology can be found somewhere in or around the uncanny valley. The uncanny valley is a theory proposed by robotics engineer Masahiro Mori in 1970. It holds that highly human-like robot replicas can produce an intense sense of eeriness when they approach realism while retaining small imperfections (Mori 1970). The theory applies to voice-altering technology as well as to the voices humans construct for technology.
If a human-made robotic voice sounds too human-like while still retaining robotic qualities, it can fall into the uncanny vocal valley. We experience uneasiness at sounds from the uncanny valley because of cognitive dissonance: when we encounter an entity that contradicts what we already believe about its kind, we feel uneasy (Stein 2017). We sort things into categories such as humans or robots and attach specific beliefs to each, and when those beliefs clash with what we perceive, we feel unsettled. But where do voice-altering technologies such as auto-tune fall? And what about anthropomorphized versions of technology?
Two aspects of voice cause it to clash with what we know about it enough to fall into the uncanny vocal valley. The first is agency; the second is experience, the capacity to feel emotions and physiological sensations such as pain, bodily states, and pleasure (Gray et al. 2007). Sound can carry these qualities too, as with yeast cells in distress. There are two takes on the yeast cell screams. One side believes they are screams of agony; the other believes they are just noise, given significance by humans. Is the voice of technology the same? If we anthropomorphize technology, we trigger our capacity to feel empathy for it. One example is the animation of technology.
In the 2006 film Cars, all of the characters in the movie are talking cars. They communicate in spoken language, and their voices check every box required to fall into the uncanny vocal valley. Yet there is no feeling of uneasiness watching these cars communicate. The cars have human qualities but keep the imperfections of being motor vehicles. They leap over the uncanny valley because anthropomorphization grants them enough empathy and agency. When imagination is added to anthropomorphized technology, it can escape the uncanny valley. Moreover, the human voice actors allow the cars to reach nearly complete human likeness. Even as the technology takes on human-like qualities, animation shapes it into a form that is comfortable and recognizable to the viewer, allowing Cars to escape the uncanny vocal valley.
The voice of technology and technological voice alteration advance every day. Voice alteration in music is becoming more common with the rising popularity of auto-tune. Singer Laurie Anderson began experimenting with technological manipulation of the voice in songs like “O Superman” in the 1980s, and music producer and rapper Kanye West brought auto-tune into contemporary pop with his album “808s & Heartbreak.” Voice manipulation has jumped the uncanny valley to nearly one-hundred-percent human likeness, but this may carry consequences. The future of voice-altering technology is scary because it is becoming impossible to tell what is human and what is not. The rise of “deep fakes,” which began in 2017, makes the future especially frightening. According to the Congressional Research Service, a deep fake is an ultra-realistic photo, audio recording, or video created with artificial intelligence technologies, and such fakes could pose a multitude of national security risks in the near future (Harris 2021). The voice of technology and voice-altering technology are advancing so quickly that they are now nearly indistinguishable from a real human voice. The voice of technology is, in effect, taking over the human voice.

It is frightening how fast technology’s voice is advancing, but it is also spectacular. The feat of Cars anthropomorphizing technology and escaping the uncanny valley is something to behold. There is something amazing about the beauty and danger of feeling empathy for technology, and it all starts with the voice.