Interacting with VAs: a whistle-stop tour of the personification and privacy literature

There is a significant cluster of research examining how humans psychologically, socially and behaviourally relate to their machines now that they can speak with them. Thus far, research suggests that speech can “trigger personification tendencies in users” (Lopatovska and Williams, 2018), who may use seemingly unnecessary politeness markers, reprimand their device, or even express emotional attachment to it. A plethora of experiments have examined relational attitudes such as trust in conversational agents (CAs); Rheu et al. (2021) reviewed 29 such studies published between 2000 and 2019 alone.

This personification tendency may be more pronounced in vulnerable populations. Children, for instance, whilst able to differentiate between CAs and people (Aeschlimann et al., 2020), are still generally more likely than adults to attribute human-like traits to VAs (Garg and Sengupta, 2020; Xu and Warschauer, 2020). As one child participant put it: “He is somehow alive in an electric way” (Strathmann et al., 2020). Older people also present an intriguing case study, as VAs may play a more intimate part in building the fabric of their social lives; whilst it drew on only a modest sample size (and skewed heavily female), one study found that factors such as loneliness might positively impact the inclination to anthropomorphise (Pradhan, Findlater and Lazar, 2019). Other socio-technical work has examined how VAs might perpetuate harms such as misinformation and gender stereotypes, as well as the opportunities they might provide in the context of accessibility (for instance, supporting visually impaired people).

One of the most extensive bodies of VA research concerns privacy and security. This “rich tapestry of research” (or “disorganised mass”, depending on one’s perspective) draws cross-disciplinary contributions from Computer Science and Engineering, HCI, CSCW, and Sociology.

The reasons are obvious: the constant, intimate companionship of VAs brings with it a waterfall of privacy concerns. The wake-word protocols that VAs rely on require the devices to be constantly ‘listening’, by which we simply mean constantly capable of activation. They are generally located in highly sensitive contexts: in the case of smart speakers, usually a home location such as a living room or kitchen; in the case of mobile phones, by day in one’s bag or pocket, and by night beside one’s bed. The volume of audio data being recorded, and how it has been handled, has been the subject of extensive scrutiny; Apple, Amazon and Google have all been on the receiving end of criticism for their handling of customer audio data.
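To make the wake-word point concrete, below is a minimal sketch of what such a capture loop might look like, assuming hypothetical stand-ins for the microphone driver, the on-device keyword spotter and the cloud upload; real assistants differ in the details, but the shape – audio is buffered locally and only leaves the device once the wake word fires – is the point being illustrated.

```python
# Minimal sketch of a wake-word-gated capture loop. All three helpers are
# hypothetical stand-ins, not any vendor's real API.
from collections import deque

BUFFER_SECONDS = 2       # rolling pre-roll kept only in device memory
FRAMES_PER_SECOND = 50   # e.g. 20 ms audio frames


def read_audio_frame() -> bytes:
    """Stand-in for a microphone driver returning one audio frame."""
    return b"\x00" * 320


def wake_word_detected(frame: bytes) -> bool:
    """Stand-in for an on-device keyword-spotting model."""
    return False


def stream_to_cloud(frames: list) -> None:
    """Stand-in for the network upload that follows activation."""
    print(f"uploading {len(frames)} frames")


def capture_loop(max_frames: int = 1000) -> None:
    # The device is 'always listening' in the sense that every frame passes
    # through this loop; frames are discarded unless the wake word fires.
    preroll = deque(maxlen=BUFFER_SECONDS * FRAMES_PER_SECOND)
    for _ in range(max_frames):
        frame = read_audio_frame()
        preroll.append(frame)
        if wake_word_detected(frame):
            # Only here does audio leave the device: the short pre-roll plus
            # whatever follows is sent off for cloud processing.
            stream_to_cloud(list(preroll))


if __name__ == "__main__":
    capture_loop()
```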

It may be worth pausing for a moment on why this matters. After all, many people in the West have only ever known a world in which their existence generates a constant stream of data, stored, processed, moved and analysed by a plethora of profit-oriented companies, the vast majority of which they will never even be able to name; our data is now irrevocably in the world, and being put to increasingly elaborate use.

It matters because establishing consensus now about fundamental principles – does our data belong to us, and what rights does that entail? – lays the groundwork for the practicalities that must be in place to ensure a healthy future. Many people may be unperturbed by whether their social media searches end up powering product recommendations, but they are very likely to be concerned about whether an insurance company can deny them medical coverage on the basis of those searches. Some of us may have nothing to fear from the improvement of facial recognition – may trust our political parties to be responsible with such technology – but others may have less reason to feel secure. When a British man was prosecuted in the UK for murder on the evidential basis of recordings from his Alexa device, the public was divided; in a US poll by Pew, 49% of Americans said they thought audio recordings being shared with law enforcement in the course of criminal investigations was unacceptable (versus 25% who thought it was acceptable and 25% who were unsure). In a globalised world in which data and legislative norms flow across borders, decisions around issues such as these will impact us all. If our data is not ours, if it is for sale, or for someone else to govern unilaterally, then we have no say in its uses. Not only that, but neither does anyone else.

The very term ‘privacy’ has, as Scheppele puts it, “an embarrassment of meanings.” Perhaps because of this amorphousness, Seymour et al.’s 2023 review of ethics-oriented papers on VAs found ‘privacy’ to be the most prevalent keyword. To give it some shape, it may be worth drawing on the ICO’s tripartite framework for thinking about ‘data protection harms’: cause (factors that create, entail or exacerbate risk – for example, financial data being shared), event (an occurrence, real or potential – such as identity theft), and resulting harm to the person (injury or damage – for instance, loss of funds, stress and anxiety).

In the context of adults interacting with devices like voice assistants, the framework might be something like the following (a rough data-structure sketch follows the list):

  • Causes: poor user understanding of what data is being collected, opaque control mechanisms, sensitive personal conversations happening in the device’s presence, abusable audio data, and highly triangulated, situated data across services;
  • Events: less commonly, active intrusion or attacks; more commonly, passive surveillance without informed consent;
  • Harms: less commonly, exposure and appropriation; more commonly, manipulation of service users for financial gain, exclusion, or altered terms of service.
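Purely as an illustration of how the taxonomy hangs together (the class and field names below are my own, not the ICO’s), the same mapping can be written down as a small data structure:

```python
# Illustrative sketch only: the ICO's cause -> event -> harm taxonomy as a
# small data structure, populated with the examples from the list above.
from dataclasses import dataclass


@dataclass
class PrivacyRisk:
    causes: list             # factors that create, entail or exacerbate risk
    events: dict             # occurrences, real or potential, by likelihood
    harms: dict              # resulting injury to the person, by likelihood


va_risk = PrivacyRisk(
    causes=[
        "poor user understanding of what data is collected",
        "opaque control mechanisms",
        "sensitive conversations in the device's presence",
        "abusable audio data",
        "highly triangulated, situated data across services",
    ],
    events={
        "less common": ["active intrusion or attacks"],
        "more common": ["passive surveillance without informed consent"],
    },
    harms={
        "less common": ["exposure", "appropriation"],
        "more common": [
            "manipulation of service users for financial gain",
            "exclusion or altered terms of service",
        ],
    },
)
```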

Unfortunately, the evidence suggests that these harms are not fringe considerations. Many of these risks are not just real but common – 30% of Americans in a 2019 Pew survey reported experiencing identity theft in the previous 12 months alone. Researchers concur: it has been demonstrated that active attacks from bad actors are more than feasible (Edu), that protective measures are woefully insufficient (Courtney), and that data is certainly being collected in error (Schönherr et al.). Solutions to this spectrum of privacy issues – both technical and legislative – have attracted a great deal of research energy as well. There have been proposals for standardised frameworks for data collection and processing (Bytes et al., 2019), as well as countermeasures and detection features to establish IoT security (Sudharsan et al., 2019; Javed and Rajabi, 2020).

If CS has devoted most of its energy to privacy practices and security vulnerabilities, HCI researchers have generally focussed on how these might violate user understanding and expectations. In a US-based survey with 80 participants, Sharma et al. demonstrated that most people do not fully understand which data their VAs collect, or how it is used; indeed, they found “that most participants had superficial knowledge about the type of data collected by GVA… 38.7% of users were unaware about the collection of audio clips by GVA.” This is echoed across devices: Zhang found that Alexa users, for the most part, “did not understand the security implications of interacting with third parties via Alexa’s voice user interface”. Microsoft and Bing’s 2019 study found that 52% of respondents were concerned about their data in the context of voice tech – a figure that may even be too conservative; other surveys suggest that 81% of us feel we have little to no control over data collection on our devices, and that the risks of this collection may outweigh the benefits (Pew, 2019).

The discomfort may even be increasingly warranted. Audio data is becoming a more tightly regulated commodity: though not previously considered biometric data, it is now the subject of moves to rethink its status. Ten years ago, the amount of audio data needed to build a convincing vocal replica (a ‘deepfake’) would have been impractical for most actors to collect. Now there are many companies dedicated to speech synthesis and voice-cloning Text-to-Speech (TTS), and they are increasingly capable of generating convincing vocal avatars from small amounts of data; Apple’s September 2023 Personal Voice feature, for example, requires only 150 utterances to train a reasonable vocal avatar. As Kröger et al. have pointed out, there is also the possibility of “unexpected inferences” from clip analysis – it is possible, even straightforward, to make accurate speculations about a speaker’s sex, age, emotional state and nationality from audio clips.
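As a deliberately crude illustration of how little is needed for one such inference, the sketch below guesses at a speaker’s likely sex from nothing more than median pitch, using the open-source librosa library; real profiling systems use far richer features and learned models, and the thresholds here are rough rules of thumb rather than a validated method.

```python
# Crude illustration of an 'unexpected inference' from an audio clip:
# estimate median pitch and make a rough guess at speaker sex. Thresholds
# are folklore values for adult speech, not a validated classifier.
import numpy as np
import librosa


def rough_pitch_profile(path: str) -> dict:
    # Load the clip and track the fundamental frequency (pitch) over time.
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    median_f0 = float(np.nanmedian(f0))  # ignore unvoiced (NaN) frames

    # Adult male speech tends to centre well below ~150 Hz and adult female
    # speech above ~180 Hz; the band in between is genuinely ambiguous.
    if median_f0 < 150:
        guess = "more likely male-sounding"
    elif median_f0 > 180:
        guess = "more likely female-sounding"
    else:
        guess = "ambiguous"
    return {"median_f0_hz": median_f0, "crude_guess": guess}


if __name__ == "__main__":
    print(rough_pitch_profile("sample_clip.wav"))  # any short speech recording
```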

Moreover, voice assistants are embedded in devices and ecosystems that collect data beyond audio clips. As discussed above, smart speakers may be connected to Internet of Things devices like smart bulbs and smart energy meters, or to services like calendars, email and online marketplaces; Amazon has been incentivised to make it easy to buy goods through Alexa, and Google to support the management of calendars. Lau et al. make the point that “Smart assistants’ capabilities can also be expanded through third party applications, also known as ‘skills’ for Alexa and ‘actions’ for Google.” These can include accessing sensitive government services, such as taxation information, and medical advice. Taken in aggregate, the data generated in relation to VAs can naturally “reveal intimate insights” about users.
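To illustrate why third-party skills widen the data boundary, here is a hypothetical sketch (not the real Alexa or Google APIs) of the kind of request payload a skill back-end might receive once invoked; the point is simply that the third party, and not just the platform vendor, now holds a slice of the user’s data.

```python
# Hypothetical sketch of the shape of data a third-party 'skill' back-end
# might receive. Field names are illustrative, not any platform's schema.
from dataclasses import dataclass


@dataclass
class SkillRequest:
    user_id: str          # pseudonymous but stable per skill
    utterance_text: str   # transcript of what the user said
    linked_account: bool  # e.g. a linked government or health service account
    device_context: dict  # locale, timezone, device type, etc.


def handle_request(req: SkillRequest) -> str:
    # Even a benign handler processes personal data outside the platform
    # vendor's boundary, under the third party's own privacy practices.
    if req.linked_account:
        return f"Fetching records for {req.user_id}..."
    return "Please link your account to continue."


if __name__ == "__main__":
    demo = SkillRequest(
        user_id="user-123",
        utterance_text="what do I owe in tax this year",
        linked_account=True,
        device_context={"locale": "en-GB", "device": "smart speaker"},
    )
    print(handle_request(demo))
```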

Copyright 2024 E. M. Lewis-Jong, all rights reserved.


Select Bibliography
Irene Lopatovska and Harriet Williams. 2018. Personification of the Amazon Alexa: BFF or a Mindless Companion. In CHIIR ’18: Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, 265–268. https://doi.org/10.1145/3176349.3176868
Minjin Rheu, Ji Youn Shin, Wei Peng and Jina Huh-Yoo. 2021. Systematic Review: Trust-Building Factors and Implications for Conversational Agent Design. International Journal of Human–Computer Interaction 37(1), 81–96. https://doi.org/10.1080/10447318.2020.1807710
Sara Aeschlimann, Marco Bleiker, Michael Wechner and Anja Gampe. 2020. Communicative and social consequences of interactions with voice assistants. Computers in Human Behavior 112, 106466. https://doi.org/10.1016/j.chb.2020.106466
Ying Xu and Mark Warschauer. 2020. What Are You Talking To?: Understanding Children’s Perceptions of Conversational Agents. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376416
Clara Strathmann, Jessica Szczuka and Nicole Krämer. 2020. She talks to me as if she were alive: Assessing the social reactions and perceptions of children toward voice assistants and their appraisal of the appropriateness of these reactions. In Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (IVA ’20). Association for Computing Machinery, New York, NY, USA, Article 52, 1–8. https://doi.org/10.1145/3383652.3423906
Alisha Pradhan, Leah Findlater and Amanda Lazar. 2019. “Phantom Friend” or “Just a Box with Information”: Personification and Ontological Categorization of Smart Speaker-based Voice Assistants by Older Adults. Proceedings of the ACM on Human-Computer Interaction 3, CSCW, Article 214. https://dl.acm.org/doi/pdf/10.1145/3359316
William Seymour, Xiao Zhan, Mark Coté, and Jose Such. 2023. A Systematic Review of Ethical Concerns with Voice Assistants. In AAAI/ACM Conference on AI, Ethics, and Society (AIES ’23), August 8–10, 2023, Montréal, QC, Canada. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3600211.3604679
Sestino A., Prete M. I., Piper L. and Guido G. 2020. Internet of Things and Big Data as enablers for business digitalization strategies. Technovation 98, 102173. https://doi.org/10.1016/j.technovation.2020.102173
Grande D., Luna Marti X., Feuerstein-Simon R., Merchant R. M., Asch D. A., Lewson A. and Cannuscio C. C. 2020. Health Policy and Privacy Challenges Associated With Digital Technology. JAMA Network Open 3(7), e208285. https://doi.org/10.1001/jamanetworkopen.2020.8285
Kim Lane Scheppele. Legal Secrets, p. 184. https://www.jstor.org/stable/40041279
ICO. 2022. Overview of Data Protection Harms and the ICO Taxonomy (v1). https://ico.org.uk/media/about-the-ico/documents/4020144/overview-of-data-protection-harms-and-the-ico-taxonomy-v1-202204.pdf
Pew Research Center. 2019. Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/
Zhang. 2023. Do Users Really Know Alexa? Understanding Alexa Skill Security Indicators. https://doi.org/10.1145/3579856.3595795
Josephine Lau, Benjamin Zimmerman, and Florian Schaub. 2018. Alexa, Are You Listening? Privacy Perceptions, Concerns and Privacy-seeking Behaviors with Smart Speakers. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 102 (November 2018), 31 pages. https://doi.org/10.1145/3274371