How The Judge Got It Wrong In Bartz v. Anthropic About "Fair Use" (& Why AI Developers Shouldn't Be Cheering Just Yet)
No, AI Copying Is Not The Same As Human "Copying" (& Most Scraping Is On Creative Works Not Bought & Paid For, Which Was Critical To The Judge)
Yesterday, the entire media and AI developer communities were abuzz about the long-anticipated first court decision on the issue of whether AI training on unlicensed individual copyrighted works (books in this case) is infringement or, instead, defensible “fair use.” I wrote about it myself to explain the federal judge’s reasoning on his split decision in Bartz v. Anthropic — he concluded that it was “fair use” for Anthropic to train its LLMs on properly acquired (paid for) books, but likely not when it trained on pirated libraries of books (this second issue still goes to trial).
But what if the federal judge (William Alsup) simply got it wrong?
I think he did. And here’s why.
AI Copying Is Not The Same As Human “Copying”
The crux of Judge Alsup’s opinion (which I read in its entirety) focuses on factor 4 of the four-part “fair use” test, the factor under which courts analyze the impact of the infringing work on the market for the plaintiff’s original works. Factor 4 is widely seen as the most important one, and its focus is on “market substitution” — which was the pivotal rationale used by the Supreme Court in its most recent rejection of “fair use” in Andy Warhol Foundation v. Goldsmith (“Andy Warhol-Prince”). When there’s “market substitution” — commercial harm — inflicted by the infringing work on the original (a photograph of the artist Prince in that case), factor 4 strongly cuts against a finding of “fair use.”
Here in Bartz, Judge Alsup rejected the notion that the books Anthropic copied to train its AI adversely impacted the market for the individual authors’ books. He essentially equates Anthropic’s unlicensed AI training to “copying” by humans for creative inspiration — something humans have done since the beginning of time. In his view, AI copying is no different from you listening to a song, liking it, and then creating your own song inspired by the original. The Judge puts it this way: “Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing books. That is not the kind of competitive or creative displacement that concerns the Copyright Act.”
But AI training IS fundamentally different from human “copying” for creative inspiration. Humans generally don’t copy the creative works of others in their entirety (as Anthropic did here), because when they do, it’s called infringement. And here, Anthropic concedes that it swallowed up all of the authors’ individual books, word for word.
Substantial Similarity Isn’t Needed
Yes, AI’s outputs/displays may show little resemblance to any individual creative work sucked into generative AI’s insatiable LLM black box due to the sheer numbers involved. The individual authors here in Bartz made no claim that Anthropic’s outputs created replicas of their works. But AI’s wholesale copying — at such grand scale and for such broad purposes — actually makes it more diabolical. Here’s why. When humans do copy for “inspiration” purposes, they aren’t creating entirely new commercial systems like Anthropic’s that are designed to enable billions of users to generate endless creative works that ultimately flood the marketplace.
The fuel Anthropic needs to power its LLMs is comprised of millions of individual works. Without them, its AI models would have no utility whatsoever. Anthropic invests tens of billions of dollars in its AI infrastructure. Shouldn’t it pay for the essential ingredients it needs to build its AI? Think of it even more esoterically. Think of each individual book in Bartz v. Anthropic as being a “widget” in a much larger machine — a screw used in a car, for example. Aren’t all businesses expected to pay for their widgets and screws?
“Transformative” Outputs Aren’t Enough
Still, in the words of Judge Alsup, “the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative” and is, therefore, “fair use.” Right?
Wrong. The Supreme Court in Andy Warhol-Prince — in a remarkably unified 7-2 decision joined by both sides of the aisle — rejected the notion that the “transformative” nature of a purportedly infringing work alone ends the “fair use” issue. Writing for the majority, Justice Sonia Sotomayor noted that copyright’s protection is even stronger “where the copyrighted material serves an artistic rather than utilitarian function.” And here, in Bartz v. Anthropic, Judge Alsup was faced with Anthropic’s Claude being used to generate new creative outputs.
This Is Not The Same As “Google Books”
But what about the famous Authors Guild v. Google case (commonly referred to as “Google Books”), the case Big Tech invariably trots out to argue that no consent from — or compensation to — the creators whose works its AI trains on is required? The Court there found “fair use” — and the same rationale applies here to generative AI. Right?
Wrong again.
First, “Google Books” hailed from the Second Circuit Court of Appeals — not the U.S. Supreme Court — so it isn’t the law of the land. But even if it were, the court in Google Books gave its stamp of approval to Google for fundamentally different reasons — none of which involved market substitution.
Yes, there was wholesale copying in that case too — Google copied entire libraries of books without author consent. But Google did so to make them searchable, and it displayed only snippets of the copied books in its search results. That, in turn, drove more discovery, sales and consumption of those books — not less — which of course meant more dollars for the authors themselves. Google, in other words, promoted authors in Google Books. It didn’t seek to replace them.
It’s the exact opposite when AI relentlessly scrapes creative works in their entirety. As I write above, Big Tech developed generative AI systems precisely to create commercial substitutes for wholesale sectors of the media and entertainment industry. Global news analysis and features? Who needs The New York Times when you have OpenAI (litigation ongoing). Stock photos? Who needs Getty Images when you have Stability AI (litigation ongoing). These media companies invested massively to create their individual works. Generative AI companies, however, believe they can simply take them — no payment needed.
Authors & Creators Aren’t Trying To Stop Tech — They Just Expect To Be Paid
Silicon Valley predictably warns of dire consequences for any kind of roadblocks to generative AI’s unbridled arms race in the name of progress. But the Supreme Court dealt with similar doom and gloom pronouncements in Andy Warhol-Prince and roundly rejected them. Justice Sotomayor openly mocked claims that the Court’s decision would “snuff out the light of Western civilization, returning us to the Dark Ages ….” In her words, “It will not impoverish our world to require [the infringer] to pay [the creator] a fraction of the proceeds from its reuse of [the] copyrighted work.”
Exactly! That’s all we’re talking about here.
The creative community isn’t trying to stop Big Tech’s development of generative AI. To the contrary, it expressly acknowledges AI’s power and potential. The Human Artistry Campaign, a coalition of major media and entertainment organizations, lays out seven core principles for artificial intelligence applications in its mission statement, and its first principle expressly states: “Technology has long empowered human expression, and AI will be no different.” That’s principle number one!
Instead, the creative community just expects AI developers to pay for the foundational ingredients they need for their LLMs to have value. In Justice Sotomayor’s words, new expression “does not in itself dispense with the need for licensing.”
So, Judge Alsup in Bartz v. Anthropic simply got it wrong.
And the next federal judge to rule on the issue of “fair use” in the generative AI context — Judge Vince Chhabria (also of the Northern District of California) — has the opportunity to “right the wrong” and go the other way. And he’s already signaled that he will.
Critical Post-Script: AI Community, Don’t Cheer Judge Alsup’s Decision Just Yet. Even He Believes That Content You Scrape Must Be Bought & Paid For.
Let’s not forget one more critical thing. Judge Alsup ruled that Anthropic’s use of the authors’ copyrighted books was “fair use” because Anthropic had already bought and paid for those books. But Alsup made a critical distinction for Anthropic’s use of pirated libraries of books for its LLM training — and, although he didn’t make a definitive ruling on that separate issue, Alsup signaled that it likely would go the other way at trial.
So AI developer industry, should you really be cheering Judge Alsup’s “fair use” decision in Bartz v. Anthropic? After all, most of your scraping of copyrighted works is on content that you never bought and paid for. We already know that virtually all of you use “classic pirate sites” to power your generative AI models.
And let’s not forget that a vibrant and growing market exists for the licensing of copyrighted works for AI training purposes. The New York Times just entered into a major licensing deal with Anthropic-fueled Amazon, for example. That too is an important element of factor 4 in the relevant “fair use” analysis.
All my books are in there too. Looking forward to a boost to my royalties statements. 😄