My name is Joellene. There are about two extra letters in there that make it confusing — it’s pronounced Jolene, like the Dolly Parton song. I’ve had people pronounce it Jo-Ellen, Joelle-lene, Joel-ene and so on, especially in contexts where people don’t really know me and are just going off the spelling, like at Starbucks when they give you your drink. But when it really matters, like when you’re graduating and walking across the stage to receive a little rolled-up piece of paper that actually isn’t your diploma: I’d rather a human completely butcher my name than hear it from a text-to-speech machine.
I graduated from UC Berkeley’s new College of Computing, Data Science and Society on Thursday, 17th May 2024. As they reminded us no less than four times during the commencement ceremony, I’m part of the inaugural class, forever and always the first ever students to graduate from this new college. Weeks before, we had received an email asking us to indicate the phonetic spellings of our names. This was obviously so the announcer wouldn’t mispronounce our names as we walked to receive our certificates. We all received little pieces of paper with our name and a QR code, and as we entered the stage someone was there to scan it.
I think I assumed that, like previous commencement ceremonies I’d attended, it would be one of the professors or lecturers reading students’ names out. When I graduated from the Media Studies major, it was more intimate because of that — the professor pronounced my name right because he knew me. He gave me a wink and ushered me on into the blinding lights of the stage. But the text-to-speech function made my CDSS commencement feel so much more impersonal. We walked across the stage to the toneless monotony of a machine, every syllable pronounced with the same distant cadence and enunciation, the machinic voice randomly switching between two equally impersonal male and female stock recordings.
I can see exactly how the decision went: In order to eliminate human error, CDSS would use a machine to read out students’ names, so it would get it right every time, and offend no one.
It didn’t really get it right though. It got the common names right, but Cal is full of international students without Anglicised names. There was a student whose name sounded like “Dongyeong Kim”, but with the “dong” and the “yeong” syllables pronounced as if they were Chinese, in Mandarin Pinyin’s first tone: dōng, yōng. But that was weird: Kim is a Korean last name.
But within the framework of that decision — ‘eliminating human error’ — getting it wrong is fine. This way, there’s no individual professor to blame for the mispronunciation. It’s instead a nameless, faceless machine that can’t be accused of stuttering, inconsistency, or the lack of exposure to non-English names. This way, it’s fair. The machine got it wrong? Aw man 🤷‍♂️
I would argue that the names the machine got wrong were ones that most of the CDSS professors would likely get wrong anyway. Chinese names that start with a ‘q’ or a ‘j’ (e.g. Qiuquan), or Indian names like Prithvi or Atman, might not lend themselves well to the American tongue. Given that, it becomes more a question of who I’d rather have get it wrong: a professor or a machine?
Personally, I think that a professor mispronouncing my name isn’t really ‘human error’, depending on how it’s done. I think it’s possible to tell, in the way a human pauses, frowns, hesitates and then in their tone when they do go for it, that they’re truly putting in effort to try to get it right. The evidence of that effort is meaningful to me because I interpret it as an act of care. Even if the text-to-speech voice got it completely right, I think I’d still rather a human completely butcher my name while trying to get it right.
More broadly, I think choosing to eliminate mispronunciations by a human at the cost of presenting a more impersonal ceremony is indicative of a larger fallacy: that the objectivity and accuracy we so prize in machines is the prime metric we should optimize for. There are two assumptions I question here. First, that machines are accurate — what is accuracy here? Accurate to how an American would pronounce the name “Youyun”, or how a native Chinese speaker would pronounce it? Is there an ‘accurate’ way to pronounce things at all? Second, that accuracy is the ideal. Aside from its formless definition, I think there’s charm in how someone pronounces your name, whether it’s how you want it to be or not, as long as they’re doing their best and showing they care. Putting the task entirely up to a machine allows the humans behind the decision to distance themselves from any mispronunciations or mistakes, but it also allows them to avoid the uncomfortable task of meeting a name they don’t know how to pronounce. And that’s the charm of it, to me: that another human would attempt to overcome that discomfort, to give me a more personal and intimate experience, as I walk across that stage and into a new chapter of my life.
Cross-posted on Substack here