In a significant escalation of the legal battle over AI training data, Encyclopaedia Britannica Inc. and its subsidiary Merriam‑Webster have filed a lawsuit against OpenAI in federal court in Manhattan. The complaint, lodged March 13, 2026, alleges that OpenAI improperly ingested and used nearly 100,000 Britannica articles and dictionary entries to train its flagship ChatGPT models — without permission, licensing, or compensation.
At the heart of Britannica’s argument is not just a claim of copyright infringement, but a broader accusation that generative AI systems are cannibalizing original content: the plaintiff asserts that AI‑generated responses that closely mirror Britannica content divert valuable web traffic, ad revenue, and subscription potential away from its platforms. In plain terms: when users get a perfectly serviceable summary from an AI model, they aren’t visiting the encyclopedia or dictionary, and Britannica sees that as a direct commercial loss.
The complaint also accuses OpenAI of trademark infringement — claiming that the model’s outputs sometimes imply endorsement or affiliation with Britannica or Merriam‑Webster that doesn’t exist, and even attach brand names to erroneous or fabricated text.
For AI insiders and practitioners, this lawsuit matters for three key reasons:
1. It underscores ongoing tension over data used to train frontier models.
OpenAI and other AI labs have long argued that training on large swaths of publicly available text falls under fair use — the idea that models transform raw data into something new rather than reproducing it wholesale. But critics counter that when outputs begin to resemble the original expression of copyrighted material, or when models are used as substitutes for original sources, the legal shield becomes thinner. This debate isn’t limited to Britannica — similar suits have come from authors, news publishers, and metadata providers like Nielsen’s Gracenote in recent weeks.
2. The case highlights evolving arguments around AI “hallucinations” and brand misuse.
Britannica’s trademark complaint goes beyond classic copyright claims. By alleging that incorrect information — sometimes patently false — is being paired with its brand in AI responses, it raises novel questions about how liability and attribution should be handled when models generate false or misleading text. This touches on broader industry concerns about accuracy, trust, and corporate risk in deploying generative models — especially when they cite or mimic reputable sources.
3. The outcome could shape commercial relationships between AI labs and content owners.
This isn’t the first time Britannica has taken legal action over AI training data; it previously sued AI startup Perplexity AI in a similar dispute. The growing patchwork of lawsuits — from newspapers to encyclopedias to books — signals that content owners are increasingly willing to take this fight to court. A favorable ruling for Britannica could push AI developers toward new licensing deals, changes in training methodologies, and stricter controls on the data used to build models.
For OpenAI, which has not yet publicly commented on this specific case, the stakes are high. Similar litigation by news publishers such as The New York Times has dragged on for years and will likely continue to redefine how copyright law intersects with machine learning.
More broadly, this lawsuit lands amid a reckoning across the AI ecosystem: developers, policymakers, and rights holders are still working out how to fairly compensate creators while advancing AI technology. The legal frameworks governing data usage, attribution, and model outputs are being tested in real time, with profound implications for innovation, regulation, and the economics of AI.
In other words: as generative AI becomes more central to information access, disputes like this are not edge cases — they’re the groundwork for how the next decade of AI will be governed.