Mostly AI is addressing a significant AI training bottleneck for enterprises by launching synthetic text capabilities. This innovative solution enables businesses to derive value from their private datasets without compromising privacy.
Mostly AI now offers a tool that creates synthetic versions of an organization’s proprietary text, with personally identifiable information (PII) removed and diversity gaps addressed. The company says this lets teams train large language models (LLMs) on data they could not otherwise use.
As the gains from training on public data level off, companies are looking to sources beyond public data to get more value from AI.
How does synthetic text work?
Synthetic data is artificially generated data that serves as an alternative when real data is scarce, expensive, imbalanced, or unusable. Enterprises have used synthetic data for some time, primarily images, and generative AI is expected to extend its use to a wider array of data types. Gartner predicts that by 2026, 75% of companies will use generative AI to create synthetic data, up from under 5% in 2023.
Generic synthetic data, however, may lack the context and insights specific to an organization, which can hinder downstream model performance.
Mostly AI empowers enterprises to train their AI generators to produce synthetic data instantly. Initially focused on structured tabular datasets capturing transaction nuances, patient journeys, and CRM databases, the company has expanded into text data.
Large-scale proprietary text datasets such as emails and chatbot conversations contain valuable information, but they are hard to use: they include PII, have diversity gaps, and are often mixed with structured data.
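To make the PII problem concrete, here is a minimal Python sketch that flags obvious email addresses and phone-like numbers in a handful of text records. It is only an illustration and not Mostly AI’s method: the regular expressions, the function names `find_pii` and `pii_rate`, and the sample records are all assumptions made for this example, and real PII detection has to cover far more than two patterns.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# must also handle names, addresses, account numbers, and much more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii(text: str) -> dict[str, list[str]]:
    """Return obvious email addresses and phone-like numbers found in text."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
    }


def pii_rate(records: list[str]) -> float:
    """Fraction of records that contain at least one flagged item."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if any(find_pii(r).values()))
    return flagged / len(records)


# Made-up sample records standing in for support emails or chat logs.
sample = [
    "Hi, please call me back on +1 415 555 0100 about my order.",
    "You can reach me at jane.doe@example.com.",
    "The delivery arrived late again.",
]

if __name__ == "__main__":
    for record in sample:
        print(find_pii(record))
    print(f"Share of records with flagged PII: {pii_rate(sample):.2f}")
```

Even a crude scan like this tends to flag a large share of customer-facing text, which is why such datasets are rarely usable for model training as they are.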
With Mostly AI’s synthetic text feature, users can train an AI generator on their proprietary text and generate sanitized synthetic versions free from PII or diversity gaps. According to the company, the generator captures the nuances and insights of the original text along with the context from related structured data, and users can choose among a range of language models.
Tobias Hann, CEO of Mostly AI, explained, “The selected LLM is fine-tuned with the original text data on the Mostly AI Platform. This process incorporates additional structured data to enhance the quality of the synthetic text, enabling the platform to produce downloadable or database-stored synthetic text.”
How will it help enterprises?
The synthetic text generated by Mostly AI’s platform can be used for analytics and generative AI use cases. There are no live applications yet, but the company plans to explore prompt-response pairs for fine-tuning LLMs, especially for customer service applications.
The feature is aimed at enterprises that want to improve their AI training without exposing private data. Mostly AI claims that text classifiers trained on its synthetic text perform 35% better than those trained on GPT-4o-mini-generated text.
However, comparisons with other synthetic data generators such as Gretel are still pending. For now, the company points to its results in past benchmarks.
“The Mostly AI platform has been benchmarked against other companies and solutions in the past and has consistently demonstrated superior performance when it comes to the quality (accuracy, fidelity) and privacy of the created synthetic data,” Hann stated.
FAQs
What is the primary advantage of Mostly AI’s synthetic text functionality?
The primary advantage is the ability to generate synthetic text from proprietary data without compromising privacy, enabling enhanced AI training.
How does Mostly AI plan to apply synthetic text in real-world scenarios?
There are no live applications yet. Mostly AI plans to explore prompt-response pairs for fine-tuning LLMs, particularly for customer service.
How does Mostly AI compare to other synthetic data generators in terms of performance?
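Mostly AI says its platform has consistently outperformed other solutions on data quality and privacy in past benchmarks, and it claims a 35% boost in text classifier performance over GPT-4o-mini-generated data. Comparisons of the new synthetic text feature with other generators such as Gretel are still pending.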
Credit: venturebeat.com