Meta Platforms’ (META.O) top policy executive told Reuters in an interview that the company trained its new Meta AI virtual assistant on public Facebook and Instagram posts, but excluded private posts shared only with family and friends in an effort to protect users’ privacy.
At the company’s annual Connect conference this week, Meta President of Global Affairs Nick Clegg said Meta did not use private chats on its messaging services as training data for the model and filtered private details from public datasets.
“We’ve tried to exclude datasets that have a heavy preponderance of personal information,” Clegg said, adding that the “vast majority” of the data Meta used for training was publicly available.
Meta chose not to use content from LinkedIn because of privacy concerns. Clegg’s comments come as Meta, OpenAI, and Alphabet’s (GOOGL.O) Google face criticism for using information scraped from the internet without permission to train their AI models, which ingest massive amounts of data in order to summarize information and generate imagery.
The companies are weighing how to handle private or copyrighted materials swept up in that process, which their AI systems may reproduce, while also facing lawsuits from authors who accuse them of copyright infringement.
Meta AI was the most significant of the company’s first consumer-facing AI tools, which CEO Mark Zuckerberg introduced on Wednesday at Meta’s annual products conference, Connect. Unlike previous conferences, which focused on augmented and virtual reality, this year’s gathering concentrated on AI.
Meta said it built the assistant using a custom model based on Llama 2, the powerful large language model it released for public commercial use in July.
The assistant will be able to generate text, audio, and images, and will have access to real-time information through Microsoft’s (MSFT.O) Bing search engine.
Clegg said the public Facebook and Instagram posts used to train Meta AI included both text and photos. He added that Meta also imposed safety restrictions on what the tool can generate, such as a ban on creating photo-realistic images of public figures.
On copyrighted materials, Clegg said he expected a “fair amount of litigation” over “whether creative content is covered or not by existing fair use doctrine,” which permits limited use of protected works for purposes such as criticism, research, and parody.
“We think it is, but I strongly suspect that’s going to play out in litigation,” Clegg added. Some companies with image-generation tools allow the reproduction of characters such as Mickey Mouse, while others have paid for such material or excluded it from their training data.
OpenAI, for example, struck a six-year deal with Shutterstock this summer to use its picture, video, and audio libraries for training. Asked whether Meta had taken any precautions to prevent the reproduction of copyrighted images, a representative pointed to updated terms of service that prohibit users from generating content that violates privacy and intellectual property rights.