Stars from Hollywood’s golden age are being reborn through celebrity AI voice cloning deals, a sign that “Wild West” concerns about unauthorized AI impersonations are being addressed by new business models.
ElevenLabs, an audio technology startup funded by venture capital firms including Andreessen Horowitz and Sequoia, has struck several deals with the estates of legendary actors for its IconicVoices tool, which lets users have text read aloud by AI-generated voices through its reader app. The stars include Burt Reynolds, Judy Garland, James Dean and Sir Laurence Olivier.
ElevenLabs, which launched in 2023, creates audio for books and news articles, video game characters, film pre-production, and social media and advertising. The company has worked with publishers including the New York Times and the Washington Post, and earlier this year it was selected by Disney to join its accelerator program.
“It takes about 30 minutes of high-quality audio to create a professional voice clone,” said Sam Sklar, a member of the ElevenLabs growth team, adding that the voices were created from the celebrities’ catalogs of work. Once created, a voice can be used to read text (an article, PDF, ePub, newsletter, or other text content). However, the audio and content cannot be exported; all listening takes place in the reader app.
Users can, for example, have articles narrated by James Dean in the app, but they can’t use the voice for any content that isn’t already in the app.
This type of offering could help set the boundaries for a future in which AI-generated voice content is less controversial and more of a controlled, curated space. Google Play and Apple Books already use AI-generated voices, despite the high bar set by human pacing, intonation and emotion.
The AI industry has been plagued by concerns over the use of celebrity voices, with OpenAI doing an about-face in May after actress Scarlett Johansson accused the company of ripping off her voice after she had turned down a licensing offer.
“We are very much alive to the risks associated with synthetic media and using devices safely,” Sklar said. Safeguards include active content moderation, accountability measures that can include bans, and special provisions to guard against the influence of AI voices in the 2024 election.
Among today’s generation of actors, there is still significant apprehension about the use of AI to generate voice content. Voice actors for video games have raised concerns, and last year’s film and television strikes were strongly rooted in concerns about the use of AI. The use of famous voices licensed by estates is a niche market that has the potential to avoid these pitfalls, representing new revenue streams from AI rather than revenue streams lost to AI.
The misuse of celebrity voices was a problem before AI, as in the 1988 case of Frito-Lay using an imitation of Tom Waits’ voice in an advertisement, and another Waits case in 2007, after Waits himself had long refused advertising offers. AI is providing an easier path to creating soundalikes, and the recent lawsuit brought against AI startup Lovo for allegedly using voice actors’ recordings improperly and without payment to generate AI voices is a reminder that the world of AI voice generation may remain complex and litigious. (Lovo has denied the claims in the suit and also points to a revenue-sharing model offered to actors for cloned voices.)
It’s difficult to assess what protections are in place without examining the specific language of the IconicVoices contracts, said Steve Cohen, a partner at Pollock & Cohen who is representing voice actors in an unrelated lawsuit alleging unauthorized voice cloning.
ElevenLabs shows how the IconicVoices tool obtains permission and manages the use of these voices.
“Giving them permission to use their voice is one of the basics,” Cohen said. “I think the key factors are permission, compensation, and control.”
New, clearer laws could also act as a deterrent for people tempted to misuse voices, “not for hardened criminals, but for fringe cases,” Cohen said. But quoting Bette Davis in “All About Eve,” he added, “‘Buckle your seat belts; it’s going to be a bumpy ride.’”
How realistic cloned voices sound is also a growing issue. Many experts say that if AI doesn’t “know” what it’s talking about, the quality of its performance is limited. Sklar said ElevenLabs’ latest speech quality is indistinguishable from real human speech. “The text-to-speech tool from ElevenLabs can understand the context of the words,” he said.
AI is only as good as the data its models are trained on, and datasets of actors’ voices are part of the process.
“Neural models gain capabilities from imitating/remembering the nuances and patterns present in the training data,” said Nauman Dawalatabad, a postdoctoral associate at the MIT Computer Science and Artificial Intelligence Laboratory with extensive research in AI voice generation. “The quality and diversity of training data significantly affect model performance.”
Vocal delivery from movie stars can enhance AI’s mimicry and learning by providing “high-quality voice data sets for training and tuning large models” that Dawalatabad says are essential to the process. But he expressed reservations about “sounding human” as a proper test for the AI voice field, as it could reinforce an antagonistic relationship between human and synthetic voices.
Voice actors remain divided on the technology, with some refusing to consider any offer, but others saying the opportunity to clone their voices for faster and cheaper production of some types of audiobooks cannot be ignored. “AI technology can help with workflow. AI is not a new tool for voice talent, producers, and publishers, who are increasingly using it to improve quality control in post-production,” Michele Cobb, executive director of the Audio Publishers Association, told CNBC last year.
New generative models have shown significant progress compared to previous iterations, making it increasingly difficult to distinguish fake from genuine voices by ear, according to Dawalatabad. AI voice licenses can lighten the workload for voice actors, he added, without replacing them, because they “intercede in the process by focusing on correcting or improving aspects that cannot be understood such as intonation, warmth, and emphasis, which are still challenging.”