Plaintiffs in the case of Kadrey et al. vs. Meta have filed a motion alleging the firm knowingly used copyrighted works in the development of its AI models.
The plaintiffs, including author Richard Kadrey, submitted their “Reply in Support of Plaintiffs’ Motion for Leave to File Third Amended Consolidated Complaint” in the United States District Court for the Northern District of California.
The filing accuses Meta of systematically torrenting and removing copyright management information (CMI) from pirated datasets, such as those from the shadow library LibGen.
According to court documents, plaintiffs say the evidence points to highly incriminating practices involving Meta’s senior leaders. They allege that Meta CEO Mark Zuckerberg explicitly approved the use of the LibGen dataset despite internal concerns raised by the company’s AI executives.
In a December 2024 memo, Meta acknowledged LibGen as “a dataset we know to be pirated,” sparking internal debates about the ethical and legal implications. Documents also show top engineers hesitated to torrent the datasets due to concerns about using corporate laptops for questionable activities.
Internal communications suggest that, after acquiring the LibGen dataset, Meta stripped CMI from the copyrighted works, a practice plaintiffs argue is central to their copyright infringement claims.
The Allegations Against Meta
According to the deposition of Michael Clark, a corporate representative for Meta, scripts were intentionally designed to remove information identifying works as copyrighted. These scripts targeted keywords like “copyright,” “acknowledgements,” and other common copyright markers. Clark stated this was done to prepare the dataset for training Meta’s Llama AI models.
Emails included as exhibits show that Meta engineers were uneasy about torrenting pirated datasets within corporate spaces. One engineer remarked that “torrenting from a [Meta-owned] corporate laptop doesn’t feel right,” yet the downloading and distribution of pirated data proceeded.
Legal counsel for the plaintiffs claims that by January 2024, Meta had “already torrented (both downloaded and distributed) data from LibGen.” Additionally, records indicate that hundreds of related documents were obtained months earlier but withheld during discovery, which plaintiffs argue reflects bad-faith obstruction.
During a December 2024 deposition, Zuckerberg admitted such activities raised “lots of red flags” and said it “seems like a bad thing,” though he gave few direct answers about Meta’s broader AI training practices.
Expanding the Lawsuit
Initially focused on intellectual property infringement, the plaintiffs now seek to add two major claims:
- Violation of the Digital Millennium Copyright Act (DMCA) – Plaintiffs allege Meta knowingly removed copyright management information to conceal unauthorized uses of copyrighted works in its Llama models.
- Breach of the California Comprehensive Computer Data Access and Fraud Act (CDAFA) – This claim centers on Meta’s alleged use of torrenting to obtain copyrighted datasets unlawfully.
The DMCA claim highlights that stripping CMI reduces the likelihood of detecting infringement, while the CDAFA claim emphasizes concerns about Meta’s methods for acquiring the LibGen dataset.
Broader Implications
The case underscores growing tensions between copyright law and AI development. Plaintiffs argue that using pirated data denies rightful compensation to creators and undermines their livelihoods.
This legal battle arises amidst global scrutiny over generative AI technologies. Companies like OpenAI, Google, and Meta face increasing criticism for their use of copyrighted data in model training. Courts in the US and UK are grappling with landmark cases that could redefine rights management in the AI era.
In Kadrey et al. vs. Meta, US courts have shown willingness to hear claims about AI’s impact on copyright law. Plaintiffs cited The Intercept Media v. OpenAI, a recent New York decision allowing a similar DMCA claim to proceed.
The Stakes for Meta
For Meta, these allegations represent significant reputational and legal risks. As AI becomes central to its strategy, claims of reliance on pirated datasets could harm its credibility and future ambitions.
The outcome of this case could set legal precedents, shaping the AI industry’s approach to data usage and copyright compliance worldwide.
Sources: https://www.artificialintelligence-news.com/news/meta-accused-using-pirated-data-for-ai-development/, https://zvelo.com/data-piracy-challenges-daas-business-models/