Meta Shows Exactly What Happens When Laws Fail to Keep Up With Technology

When billion-dollar AI models are built on pirated books and no one is held accountable, it’s not innovation—it’s industrialized theft.

Technology moves fast. The law doesn’t. And when one sprints while the other stalls, it creates a gap wide enough for billion-dollar corporations to exploit without consequence.

That’s exactly what we’re seeing with Meta’s LLaMA models.

They were trained on over 7 million pirated books. Not scraped summaries. Not web blurbs. Full books—word for word—sourced from pirate sites like Bibliotik and LibGen. The collection included works by Pulitzer Prize winners, bestselling authors, literary award recipients, and thousands of others.

These authors were never asked, never informed, and never paid.

Internal Meta documents reveal this wasn’t a case of oversight. It was a calculated move. The team:

  • Knew the books were pirated
  • Stripped copyright pages from the files
  • Masked the data source
  • Instructed employees not to ask legal questions
  • And moved forward anyway

This is not “accidental ingestion.” This is deliberate appropriation—on an industrial scale.

The Same Crime, Just Digitized

Imagine for a moment that Meta received shipping containers full of physical books. These weren’t library donations or public domain texts. These were recent, copyrighted, published works.

They knew the books were stolen. They didn’t report it. They didn’t return them.

Instead, they:

  • Removed identifying markings,
  • Scanned every page,
  • Fed them into a commercial product pipeline, and
  • Sold the result to the market as their own.

Everyone would call that theft.

But because the files were digital—downloaded instead of delivered—Meta walks away untouched.

Same books. Same authors. Same outcome. Just a different format. And that, apparently, is all it takes to make theft legally invisible.


What If the Books Had Been Physical, Then Scanned into Digital Format?

Let’s push the thought experiment further.

What if those 7 million books hadn’t been downloaded as digital files, but instead had been obtained in physical form—boxes and boxes of paperbacks and hardcovers? And then Meta, or any AI company, manually scanned them into digital format?

Would that change how we see the act?

Would it make the theft more obvious? Would the law respond differently if we could see the boxes being wheeled in, page by page fed into scanners?

Because in practice, that’s what happened. Whether downloaded or digitized by hand, the outcome is the same: copyrighted material was taken without consent, without compensation, and converted into fuel for commercial AI products.

The only difference is what it looked like.

And that’s the trap: when theft becomes abstract—just data, just text—it becomes easier to rationalize. But swap “files” for “books,” and suddenly the ethics snap into focus.

Of course, this scenario isn’t even hypothetical. It already happened once—at massive scale. Only then did the courts and the public start paying attention.


The Blueprint: Google Did It First

Meta didn’t invent this playbook—they borrowed it.

In the early 2000s, Google scanned millions of physical books from libraries—without permission. Entire volumes, copyrighted and protected, were digitized under the Google Books initiative. Google argued it was “for the public good,” then monetized the searchability of that corpus.

Publishers and authors sued. The legal battle dragged on for years. Eventually, the courts sided with Google under “fair use,” even though the books had already been used, processed, and indexed.

The message that sent to Silicon Valley was clear: Take first. Delay the fight. Profit regardless.

Meta didn’t just learn from that. They industrialized it.

If Google Did This in 2004, Why Haven’t the Laws Changed Already?

That’s the real question, isn’t it?

Google began scanning copyrighted books in 2004. The legal challenges started shortly after. By 2015, courts ruled in favor of Google under fair use—a decision that stunned many in the creative world. Authors had spent a decade fighting, only to watch the court endorse a system that copied first and justified it later.

So why didn’t lawmakers respond?

Because Big Tech lobbies harder than artists. Because legislators didn’t fully understand the implications of mass digitization at the time. And because the ruling, while controversial, was narrowly focused on search and indexing—not model training, not AI, and not derivative monetization at scale.

There was no sense of urgency. There was no regulatory foresight. And there was no accountability.

That void has now become a gold rush.

What was once about indexing books is now about extracting them, reshaping them, and commercializing the knowledge within—without a cent going to the original creators. And still, the laws have not caught up.

We had a warning 20 years ago. We ignored it. Now the cost is exponentially higher—and still rising.

And It’s Still Happening: The Pattern Repeats

Meta isn’t alone. OpenAI is facing multiple lawsuits for training on copyrighted works without consent—including suits from The New York Times, authors such as Sarah Silverman, and large classes of creators. These lawsuits allege unauthorized copying of full texts and training on proprietary data: exactly the same pattern.

Every time the legal system lags, another AI company takes advantage.

This isn’t a one-off.

It’s a business model.


Why the Law Fails—Badly

The current legal framework isn’t just outdated—it’s dangerously permissive when it comes to digital property. Here’s what it gets wrong:

  • “Copying” isn’t treated like stealing. Courts treat unauthorized copying as infringement of an intangible right, not as theft of property, because the owner still holds the original. But in practice, a pirated book is still a stolen book, especially at this scale.
  • Intent doesn’t matter. Meta’s internal documentation shows full awareness. There was no ambiguity. But that doesn’t make it a crime under current law.
  • Civil, not criminal. Copyright enforcement is almost entirely civil. That means individual authors would have to sue Meta themselves—an impossible fight for most.
  • AI training is unregulated. There is no binding law that governs what data can or cannot be used to train models. So companies push the boundaries as far as they can get away with.
  • No real enforcement. Regulators aren’t stopping this. Courts are slow. And for now, there’s nothing to deter repeat offenses.

This is not a gap in the system. This is the system being gamed.

Now They’re Suing the Whistleblower

Instead of accountability, we get retaliation.

Meta is now targeting the whistleblower who exposed this entire operation—the only person in this story who told the truth.

Not only did Meta admit they used the pirated books and knew about it beforehand—they later claimed in court that these books were “low quality” and “not valuable.”

Let’s be clear:

  • The books were valuable enough to train a cutting-edge model
  • Valuable enough to shape the linguistic core of LLaMA
  • Valuable enough to build a multi-billion-dollar product

But apparently, not valuable enough to pay the authors. And not valuable enough for the law to protect.


How Big Was the Theft? Let’s Talk Numbers

Let’s say Meta had properly licensed the books at just $10 each—a low-end estimate for a digital usage license.
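
Using the article’s own figures, the arithmetic is simple:

  7,000,000 books × $10 per license = $70,000,000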

That’s $70 million in licensing costs they avoided by simply taking the books.

Instead, they paid nothing—and monetized everything.

This wasn’t just a legal grey zone. It was a massive economic shortcut, executed at global scale, with zero consequences.


Who Was Affected? Everyone You’ve Read

This wasn’t obscure content. The training set includes books from:

  • Margaret Atwood
  • Stephen King
  • Zadie Smith
  • Jonathan Franzen
  • Neil Gaiman
  • Thousands of independent authors across every genre

And none of them consented.

This wasn’t public data. It was a full-blown raid on modern literature.


What Needs to Change—Immediately

If this stands, we aren’t just allowing theft—we’re institutionalizing it. Here’s what must happen now:

✅ 1. Treat digital theft like physical theft.

The format doesn’t reduce the crime. If you knowingly ingest pirated IP, you’re complicit.

✅ 2. Regulate AI training data.

Introduce legal requirements for documenting provenance, consent, and licensing for any dataset used to train commercial models. (A sketch of what such a record might look like follows this list.)

✅ 3. Introduce criminal liability for industrial-scale infringement.

Civil suits aren’t enough. This is organized IP laundering at scale.

✅ 4. Create an AI IP Enforcement Body.

Empower a neutral regulator to audit training datasets and respond to violations—not leave it to overwhelmed creators to sue trillion-dollar firms.

✅ 5. Protect whistleblowers.

They are not the problem. They’re the warning signal. And right now, they’re being punished for doing the job regulators should be doing.
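
What would “documented provenance” actually look like? Below is a minimal sketch, in Python, of the kind of per-work record that point 2 above imagines. It is purely illustrative: every field and function name here is hypothetical, drawn from no existing law, standard, or company pipeline.

```python
# Hypothetical illustration: one possible shape for the per-work
# provenance record a training-data regulation could require.
# All names are invented for this sketch.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingWorkProvenance:
    work_title: str            # title of the book or other work
    rights_holder: str         # author or publisher of record
    source: str                # where the copy was obtained
    license_id: Optional[str]  # reference to a licensing agreement; None if unlicensed
    consent_obtained: bool     # did the rights holder agree to training use?
    compensation_usd: float    # what was actually paid for this use

def audit(records: list[TrainingWorkProvenance]) -> list[TrainingWorkProvenance]:
    """Return every work a regulator would flag: no consent or no license."""
    return [r for r in records if not r.consent_obtained or r.license_id is None]

# Example: a work pulled from a pirate mirror fails the audit immediately.
flagged = audit([
    TrainingWorkProvenance(
        work_title="Example Novel",
        rights_holder="Example Author",
        source="pirate mirror download",
        license_id=None,
        consent_obtained=False,
        compensation_usd=0.0,
    )
])
print(f"{len(flagged)} work(s) flagged for unlicensed, non-consensual use")
```

Under a regime like this, a dataset whose audit list is non-empty simply could not be used for commercial training. The burden of proof would sit with the company, not with the individual author.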


This Affects Everyone—Not Just Authors

If this precedent holds, no digital work is safe—not yours, not anyone’s.

  • Your code
  • Your art
  • Your research
  • Your voice
  • Your photos
  • Your writing

Anything you’ve ever created online can be scraped, repackaged, and fed into a model that turns your labor into someone else’s product—without your knowledge, consent, or compensation.

The issue isn’t just that the laws haven’t caught up. The issue is that the companies know it, and they’re racing to exploit the gap while it still exists.


It’s almost as if the rules simply don’t apply once you’re big enough: break the law, ask for forgiveness, and know you’ll never pay a real price.

If you're a creator, a policymaker, or someone who still believes consent and compensation matter—now is the time to act. Because if we don’t fix this, the foundation of AI will be built on theft.



#Meta #AI #CopyrightTheft #DigitalSovereignty #BooksMatter #AIEthics #IPTheft #LLM #FairUseFraud #OpenSource #WritersRights #TechAccountability #MetaLLaMA


About the Author

Dion Wiggins is Chief Technology Officer and co-founder of Omniscien Technologies, where he leads the development of Language Studio—a secure, regionally hosted AI platform built for digital sovereignty. Language Studio powers advanced natural language processing, machine translation, generative AI, and media workflows for governments, enterprises, and institutions seeking to maintain control over their data, narratives, and computational autonomy. The platform has become a trusted solution for sovereignty-first AI infrastructure, with global clients and major public sector entities.

A pioneer of the Asian Internet economy, Dion Wiggins founded one of Asia’s first Internet Service Providers—Asia Online in Hong Kong—and has since advised hundreds of multinational corporations including Microsoft, Oracle, SAP, HP, IBM, Dell, Cisco, Red Hat, Intuit, BEA Systems, Tibco, Cognos, BMC Software, Novell, Sun Microsystems, LVMH, and many others.

With over 30 years at the intersection of technology, geopolitics, and infrastructure, Dion is a globally recognized authority on AI governance, cybersecurity, digital sovereignty, and cross-border data regulation. He is credited with coining the term “Great Firewall of China,” and his strategic input into national ICT frameworks was later adopted into China’s 11th Five-Year Plan.

Dion has advised governments and ministries across Asia, the Middle East, Europe, and beyond on national ICT strategy, data policy, infrastructure modernization, and AI deployment—often at the ministerial and intergovernmental level. His counsel has helped shape sovereign technology agendas in both emerging and advanced digital economies.

As Vice President and Research Director at Gartner, Dion led global research on outsourcing, cybersecurity, e-government, and open-source adoption. His insights have influenced public and private sector strategies across Asia Pacific, Latin America, and Europe, supporting decision-makers at the highest levels.

Dion received the Chairman’s Commendation Award from Bill Gates for software innovation and was granted the U.S. O-1 Visa for Extraordinary Ability, a designation reserved for individuals recognized as having risen to the top 5% of their field globally.

A seasoned speaker and policy advisor, Dion has delivered insights at over 1,000 international forums, including keynotes at Gartner IT Symposium/Xpo, United Nations events, ministerial summits, and major global tech conferences. His analysis has been featured in The Economist, The Wall Street Journal, Time, CNN, Bloomberg, MSNBC, and the BBC, with more than 100,000 media citations.

At the core of his mission is a belief that sovereignty in the digital era is not a luxury—it’s a necessity.

“The future will not be open by default—it will be sovereign by design, or not at all.”


Comments

Martin Montero
XR & AI: Bridging Research, Workforce Development, and Industry Transformation. Translating complex tech concepts into practical solutions that create human-centered technological frameworks that augment human potential.

But if a college kid gains access to JSTOR or any scholarly-articles database using methods similar to those Facebook and Google have used (much of that research was funded with taxpayer dollars), just to continue their educational pursuits without the economic barrier of paying for access, they will get arrested and thrown in jail with major fines.

Larry Rosenthal
Metaverse / Spatial Design Pioneer, 30+ years. OG creator of transmedia 3D worlds and IPs / Owner, CubeXR LLC / Vice Chair, LA ACM SIGGRAPH 2021-25

Thirty years ago, the DMCA and Section 230 were crooked ideas evangelized by those who groomed just about every bad actor you’re writing about today.

It is really an extension of valuing capital above labor. Seeking replacements for skilled labor has driven massive investment for centuries. See: the Industrial Revolution, mass production, etc.

Stephanie G
Azure Data Engineer | AWS Certified Cloud Practitioner | BCS | Accountant | MBA, Business School Lausanne | Economics & Law Graduate

Again, if you are dreaming about the law, it is effectively another trap: the law works for the rich and the powerful, as always. Whatever money can buy, they buy the best of, and their army of lawyers will tear you to pieces. As for the EU’s fine against Meta, that was really just regulators angry at not being respected as a jurisdiction; it had nothing to do with your individual rights. If it somehow does, lucky you.

Alexander Gustafson
Award-Winning Illustrator | Creative Super Nova. Illustration, Concept Development, & Visual Design. Available for hire!

Why did no one feel this way when they stole everything from every illustrator on the planet? You are literally using gen AI to talk about theft, when that is also theft.
