In the early years, getting an AI model like ChatGPT or its rival Cohere to produce human-like responses required a large team of low-cost workers who helped the model distinguish basic facts, such as whether an image showed a car or a carrot.
But more sophisticated updates to AI models in today’s competitive arena demand a rapidly growing network of human trainers with specialized knowledge — from historians to scientists, some with doctorates. “A year ago, we were able to hire scholars, just mostly to teach AI how to improve,” said Cohere co-founder Ivan Zhang, speaking of internal human trainers.
“Now we have licensed physicians teaching models of how to act in the medical environment, or financial analysts or accountants.”
For more training, Cohere, valued at more than $5 billion, is working with a startup called Invisible Tech. Cohere is one of OpenAI’s main competitors and specializes in AI for business. Invisible Tech employs thousands of trainers who work remotely, and is one of the main partners of AI companies, ranging from AI21 to Microsoft, in training AI models to reduce errors, known in the AI world as hallucinations.
“We have 5,000 people in more than 100 countries around the world who are PhD, who have a Master’s degree and who specialize in science work,” said Francis Pedraza, founder of Invisible.
Invisible pays up to $40 per hour, depending on the location of the worker and the complexity of the work. Some companies such as Outlier pay up to $50 an hour, while another company called Labelbox says it pays up to $200 an hour for “high skill” subjects like quantum physics, but starts at $15 for basic topics.
Invisible was founded in 2015 as a workflow automation company whose work included digitizing delivery menus for food delivery company DoorDash. But that changed when a then little-known research company called OpenAI contacted it in the spring of 2022, before the launch of ChatGPT.
“OpenAI came to us with a problem, that is when you ask the initial version of ChatGPT a question, it is going to hallucinate. You cannot trust the answer,” Pedraza told Reuters.
“They need advanced AI training partners to provide reinforcement learning with human feedback.”
OpenAI did not respond to a request for comment.
Generative AI produces new content based on the data it was trained on. However, models sometimes cannot distinguish between true and false information and produce false outputs known as hallucinations. In one famous example, in 2023 a Google chatbot gave inaccurate information in a promotional video about which satellite was the first to take pictures of a planet outside Earth’s solar system.
AI companies are aware that hallucinations can detract from GenAI’s appeal to businesses and are trying various ways to reduce them, including using human trainers to teach models to tell fact from fiction. Since it began working with OpenAI, Invisible says it has become an AI training partner for most GenAI companies, including Cohere, AI21 and Microsoft. Cohere and AI21 confirmed that they are clients. Microsoft did not confirm that it is an Invisible client.
“These are all companies that have training challenges, where the number one cost is computing power, and then the number two cost is quality training,” Pedraza said.
How does it work?
OpenAI, which started the craze around GenAI, has a team of researchers called the “Human Data Team” who work with AI trainers to collect specific data to train models like ChatGPT.
OpenAI researchers run a variety of experiments, such as reducing hallucinations or improving writing style, and work with AI trainers from Invisible and other vendors, a person familiar with the company’s processes said.
At any given time, dozens of experiments are under way, some with tools developed by OpenAI and others with vendor tools, the person said.
Based on what the AI company wants, from being better at Swedish history to doing financial modeling, Invisible hires workers with the right degree for the job, relieving the AI company of the burden of managing hundreds of trainers. “OpenAI has some of the most incredible computer scientists in the world, but they’re not necessarily experts in Swedish history or chemistry questions or biology questions or anything you can ask,” Pedraza said, adding that more than 1,000 contract workers serve OpenAI alone.
Cohere’s Zhang said he has used Invisible’s trainers to find ways to teach GenAI models to find relevant information in large data sets.
Competition
Among Invisible’s competitors in this space is Scale AI, a private startup valued at $14 billion that provides training datasets for AI companies. It has also been working on providing AI trainers, and counts OpenAI as a customer. Scale AI did not respond to an interview request for this story.
Invisible, which has been profitable since 2021, has raised only $8 million in primary capital. “We are 70% owned by the team, and only 30% owned by investors,” said Pedraza. “We facilitated a secondary round, and the latest trading price is in the range of half a billion dollars.” Reuters could not confirm the price.
The first human trainers were involved in AI training through data labeling, work that requires few qualifications and pays very little, sometimes as little as $2 an hour, and is usually done by people in African and Asian countries.
As AI companies roll out more advanced models, the demand for specialized and multilingual trainers is increasing, creating a well-paid niche where workers from various disciplines can become AI trainers without knowing how to code.
Demand from AI companies has led to the creation of more companies offering similar services. “My inbox is mostly flooded with new companies popping up here. I see this as a new place where companies hire humans just to create data for AI labs like ours,” Zhang said.
Published – 30 September 2024 09:39 IST