"Google Books" Court's New Decision Sets Up 3 Paths to GenAI Infringement

And The New York Times' "Cease & Desist" Against Perplexity Showcases "RAG" Infringement's Path #3

Oct 21, 2024

Wake up! It’s your Monday morning brAIn dump! First, it’s this week’s “mAIn event” (my feature story — an important one, about critical “fair use” issues raised by The New York Times’ recent “cease and desist” against GenAI search tool Perplexity — and the 3 separate potential paths to finding infringement, including so-called “RAG” processing’s third path). Next, it’s the “GenAI Headlines of the Week,” together with this week’s podcast. Finally, it’s McKool Smith’s “AI Litigation Tracker” (the Firm’s detailed updates on key infringement cases).

I. The mAIn event - GenAI’s 3 Potential Paths to Infringement (& Why The Second Circuit’s New Hachette Decision Is Bad News for GenAI’s “Fair Use” Arguments)

I usually write about generative AI-powered copyright infringement on the “input” side of the equation — i.e., the content GenAI developers scrape to train their large language models (LLMs). Last week, I laid out why I believe that Tech’s wholesale reliance on “fair use” to defend its non-consensual and uncompensated scraping is increasingly precarious — both ethically and legally. I call this GenAI infringement’s “Path #1.”

But, as I’ve frequently pointed out, and despite Tech’s efforts to muddy the copyright waters by lumping all GenAI-related copyright issues together into one overall basket, there’s more than one path to copyright infringement. In fact, two more possible paths exist. That’s this week’s critical topic. So let’s put aside Path #1’s training issue for now (which itself is fatal if a court rejects “fair use”) and focus on Paths #2 and #3.

GenAI’s Second & Third Potential Paths to Infringement

I think it’s fair to say that most concede (including many in Tech) that AI outputs can’t be “substantially similar” to the relevant unlicensed creative work (or any individual’s voice, likeness, etc.) on which an LLM has trained. Even that is generally a bridge too far for those who believe unlicensed “training” is always “fair use.”

But other critical factors arise in the utterly unique context that is GenAI, and “substantial similarity” and direct visual resemblance on the “output” side of the equation likely aren’t necessary to find actionable infringement under either Path #2 or Path #3, both discussed below. The most critical recent copyright infringement court decisions back this up: (1) the U.S. Supreme Court’s Andy Warhol decision (which I discussed at length last week), and (2) the Second Circuit Court of Appeals’ new September 4th decision in Hachette Book Group v. Internet Archive (“Hachette”), which relies heavily on Warhol’s new reasoning. Remember, the Second Circuit is the same court that decided the famous “Google Books” case on which GenAI tech companies hang their “fair use” hats. So the Second Circuit’s new decision in Hachette is critical.

Setting the Stage for Infringement Paths #2 and #3: The New York Times’ “Cease & Desist” to Perplexity

Consider this real world situation. If an LLM (like Jeff Bezos-backed AI search site Perplexity) trains on quality content sources (like The New York Times) without a license in order to be your personalized one-stop source for up-to-date news and information, and if Perplexity spits out its up-to-date summaries based on those unlicensed quality sources, should the mere fact that Perplexity’s output (for any specific prompt) doesn’t precisely track The New York Times’ relevant articles word-for-word mean that The Times has no claims against Perplexity?

The New York Times certainly doesn’t think so. That’s why the company just demanded that Perplexity “cease and desist” from actions it claims “unjustly enriched” Perplexity when it used The Times’ “expressive, carefully written and researched, and edited journalism without a license.”

“RAG” Is Critical Here, Yet Little Discussed & Understood

Perplexity, like other GenAI tools such as ChatGPT (which itself is the focus of The New York Times’ separate lawsuit against OpenAI), uses so-called “RAG” (Retrieval-Augmented Generation) natural language processing. RAG combines real-time information retrieval with GenAI to enable LLMs to retrieve relevant external content, and then use that retrieved content to generate the most accurate, informative, contextually relevant, and up-to-date responses to user prompts. Perplexity’s use of RAG is at the center of The New York Times’ new “cease and desist” against it, and it implicates infringement Paths #2 and #3.

First, RAG-enabled search is only possible by direct real-time (or near real-time) copying of the content on which it crawls (using search indexes like Bing to even side-step content behind paywalls), even if that content isn’t ultimately exposed to the end user. And wholesale copying without consent is infringement (this is Path #2) (to be clear, Path #2 is completely separate from Path #1’s copying for “training” purposes — it’s a second set of copying altogether). Second, and apart from Paths #1 and #2, RAG-powered GenAI “outputs” can be infringing, even if they aren’t substantially similar to the relevant “inputs” (this is Path #3).
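For readers who want to see the mechanics, here is a minimal Python sketch of the retrieve-then-generate loop described above. It is a toy under stated assumptions, not a description of how Perplexity or any real product is built: the `DOCUMENTS` corpus, the word-overlap ranking, and the `generate` placeholder (which stands in for an actual LLM call) are all hypothetical. What it illustrates is the point relevant to Path #2: the retrieved source text is copied in full into the prompt, whether or not the end user ever sees it.

```python
import re

# Hypothetical stand-in for crawled web pages (invented text, not real articles).
DOCUMENTS = {
    "news-site/storm": "The storm closed three bridges on Monday and delayed trains.",
    "news-site/budget": "The city council approved the annual budget after a long debate.",
    "blog/recipes": "A simple recipe for tomato soup with basil.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(query_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Copy the full retrieved text into the prompt sent to the model."""
    context = "\n".join(text for _, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """Placeholder for the LLM call; a real system would query a model here."""
    return f"[model output based on {len(prompt)} characters of prompt]"


def answer(query, documents=DOCUMENTS):
    """Run the whole loop: retrieve, assemble the prompt, generate."""
    retrieved = retrieve(query, documents)
    prompt = build_prompt(query, retrieved)
    return {
        "sources": [source for source, _ in retrieved],
        "prompt": prompt,
        "output": generate(prompt),
    }


if __name__ == "__main__":
    result = answer("Which bridges did the storm close?")
    print(result["sources"])
    print(result["prompt"])
```

Note that the user only ever receives `output`, while the copied article text lives in `prompt`. Real systems typically rank sources with search indexes or vector embeddings rather than simple word overlap, but the copy-into-context step is the same in kind.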

“Fair Use,” The Importance of Context, and the Second Circuit’s New “Hachette” Decision

Perplexity would predictably say that its actions raised by both Paths #2 and #3 (not to mention Path #1) are all “fair use,” since its generated “outputs” do not explicitly track The New York Times’ content. But Perplexity’s arguments — just like similar arguments made by virtually all other GenAI companies — fail under both the Supreme Court’s Warhol decision and the Second Circuit’s latest ruling in Hachette.

First, the question of infringement in the context of GenAI is unlike any type of infringement previously considered by the courts. And, context matters — and can make all the difference. Don’t take that from me. Take it from the Second Circuit in Hachette, where the Court writes that “each case raising the question [of fair use] must be decided on its own facts” — and further, that “fair use is a flexible concept whose application varies depending on the context.” In the specific context before it in Hachette, the Second Circuit rejected the defense’s argument of “fair use.” It wasn’t a GenAI case, but Hachette’s core rationale is directly relevant to GenAI.

At least one federal court has already determined that GenAI infringement cases are utterly unique and, therefore, require new ways of thinking about what constitutes infringement. U.S. District Judge William Orrick of the Northern District of California, overseeing a high profile case against image generator Stability AI, recently wrote that “run of the mill” substantial similarity copyright cases are “unhelpful in this case where the copyrighted works themselves are alleged to have not only been used to train the AI models but also invoked in their operation.”

So, because context matters, let’s take a close look at GenAI’s fundamental over-arching themes. In Google Books, the Second Circuit excused Google’s copying on “fair use” grounds, because Google’s actions “amplified” the market for the books it copied (I won’t repeat my analysis of “Google Books” here, but you can refer back to last week’s newsletter where I discuss the issue at length). But in its new Hachette ruling, the Second Circuit distinguished the case before it from Google Books and rejected “fair use,” relying heavily on the Supreme Court’s rejection of “fair use” in Warhol (I also discuss Warhol at length in last week’s newsletter). The Second Circuit’s fundamental rationale for its decision in Hachette, just like the Supreme Court’s in Warhol, is a finding of “market substitution,” which is 180 degrees different from the market “amplification” rationale at the core of Google Books.

In reaching its decision in Hachette, the Second Circuit applied all four factors of the Copyright Act’s relevant “fair use” test, relying most heavily on the fourth, which considers “the effect of the use upon the potential market for or value of the copyrighted work.” It called this fourth factor “undoubtedly the most important element of fair use” and concluded that defendant Internet Archive’s actions directly and adversely impacted the plaintiffs’ relevant commercial licensing opportunities.

RAG Tech Can Be Infringing for Two Distinct Reasons

Leveraging RAG processing, Perplexity and other GenAI search tools (like ChatGPT) answer prompts with up-to-date responses that they generate by crawling and copying the unlicensed content of The New York Times and others. In so doing, two types of infringement come into play. First, RAG can only deliver its up-to-date quality “goods” if it makes actual copies in real-time (or near real-time) of a limited number of quality content sources (I’m told by experts I trust that RAG can only copy a limited set of content sources in order to expeditiously generate its responses). The fact that end users may not see those full copies exposed in the relevant outputs doesn’t negate the fact that copying did take place. Second, apart from RAG’s outright copying, Perplexity’s outputs themselves also may be infringing if they are too similar to the copied sources.

In both cases, there can be little doubt that the kind of commercial harm and market substitution at the core of both Hachette and Warhol is also at stake. Applying the “fair use” test’s fourth factor (the most important one, as recent court decisions have emphasized), an established market now exists (and is growing) for media and entertainment companies to license their content for GenAI applications. OpenAI’s recent $250 million licensing deal with News Corp. is proof positive of that.

So Perplexity’s and OpenAI’s unlicensed use of The New York Times’ and others’ copyrighted content directly and adversely impacts licensing revenues that would otherwise go The Times’ way. And, as the Second Circuit points out, “the impact on potential licensing revenues is a proper subject for consideration in assessing the fourth [fair use] factor.”

But Isn’t Perplexity’s RAG Output “Transformative” & Therefore Defensible “Fair Use”?

Perplexity and other GenAI players like OpenAI try to obfuscate and muddy the waters about infringement Paths #2 and #3 by first mixing them all up together into one basket, and second, by saying that GenAI “outputs” show no substantial similarity to The New York Times inputs — and, therefore, are protected “transformative” uses.

But the Second Circuit’s new ruling in Hachette, which heavily relied upon the Supreme Court’s Warhol decision, gives The New York Times strong arguments to beat back those “fair use” challenges. First, RAG’s real-time copying alone (separate and apart from the initial training copying of Path #1) should be enough to find infringement for the market “substitution” reasons noted above. After all, Perplexity’s intent, when copying, is to be the single source for up-to-date information. Second, Perplexity’s GenAI “repackaging” of The Times’ unlicensed content doesn’t necessarily make it “transformative.” In fact, in the words of the Hachette court, “to be transformative, a use must do something more than repackage or republish the original copyrighted work.”

Critically, the Second Circuit noted that the Supreme Court in Warhol significantly narrowed what it means for an output to be “transformative” (and, therefore, non-infringing). If the “new work merely supplants the original” — i.e., is intended to “achieve a purpose that is the same as, or highly similar to, that of the original” — that can be enough. In fact, it was enough for the Supreme Court to reject “fair use.”

Perplexity’s actions can be infringing even if its “outputs” aren’t verbatim or even substantially similar reproductions of The Times’ copyrighted content. The Second Circuit, in fact, underscores that “the word ‘transformative,’ then, cannot be taken too literally.” If Perplexity could simply do as it pleases (build a business that competes directly with The Times based, at least in meaningful part, on using The Times’ content without licensing it, which it does when it uses RAG), it would be doing so, in the words of the Court, “as to deprive the rights holder of significant revenues,” and that could lead to the kind of “market harm that would result from unrestricted and widespread conduct of the same sort.” In other words, to bless Perplexity’s acts would be to bless the similar acts of the other Perplexitys of the world.

Public Policy Favors Rights-Holders in the GenAI Context

Tech companies will predictably scream that public policy demands that their powerful new GenAI tools be made widely available to the public, and that they should be applauded for enabling a new form of creative democratization. In their view, this societal good outweighs the commercial monopoly that the Copyright Act gives to individual rights-holders like The Times.

But even conceding some degree of merit to Tech’s purportedly noble argument, the Second Circuit in Hachette rejected similar ones, underscoring that the Copyright Act’s “monopolistic power is a feature, not a bug” and “rewards the individual author in order to benefit the public.” The Court emphasized, much as the Supreme Court did in Warhol, that too literal an application of an output’s “transformative” nature could lead to absurd results. To rule otherwise, it writes, would be to “significantly narrow — if not entirely eviscerate — copyright owners’ exclusive right to prepare (or not prepare) derivative works.”

The “derivative works” at stake here in the GenAI context are, among other things, the substantial and lucrative commercial content licensing opportunities that should flow to rights-holders in order for GenAI to be able to “do its thing.” Without the essential content they need to feed on, GenAI models are practically useless.

At the end of its opinion, the Second Circuit poignantly quotes the written declaration of one of the individual plaintiffs, author Sandra Cisneros — using her story to show real world consequences to rights-holders if “fair use” defenses are applied too broadly. This is what Cisneros wrote about the infringement of her copyrighted works: “It was like I had gone to a pawn shop and seen my stolen possessions on sale.”

The same can be said here with The New York Times and other rights-holders in the context of GenAI. To excuse Perplexity’s unlicensed “taking” in this case would be to render The Times’ massive investment in its reporting and analyses essentially meaningless. Remember, it’s not just about the harm caused by one infringer like Perplexity. As the Second Circuit points out, it’s also about “the market harm that would result from unrestricted and widespread conduct of the same sort.” If The Times and other rights-holders can’t collect from GenAI players like Perplexity that try to cash in and compete using The Times’ and others’ content for free, then why even bother? As the Court notes in Hachette, “it is difficult to compete with free.”

Defendants Carry the Burden of Proving “Fair Use”

One more critical thing. The Second Circuit underscored that those who rely upon the defense of “fair use” to beat back claims of infringement (i.e., Perplexity in this example) carry the “burden of proving that the secondary use does not compete in the relevant market.” It’s not the other way around.

Perplexity, in my view, would not be able to satisfy its burden against The Times.

The New York Times’ recent “cease and desist” to Perplexity, together with its ongoing litigation against OpenAI, shines a bright spotlight on RAG and GenAI’s three potential pathways to copyright infringement. And I believe the Second Circuit’s new Hachette opinion, together with the Supreme Court’s recent Warhol decision, paves the way for rejection of Tech’s “fair use” attempts to block Paths #1, #2 and #3 to infringement.

Ultimately, I believe the courts should, and will, simply find “unjust enrichment” in these GenAI cases and demand that Tech pay to play.

What do you think? Reach out to me with your POV and perspective at peter@creativemedia.biz.

II. GenAI Headlines of the Week

(1) You Go, Nvidia!

According to VentureBeat (full article link here), “Nvidia just dropped a new AI model that crushes OpenAI’s GPT-4” and “outperforms” other GenAI industry leaders, including Anthropic, and all other major LLMs. This news is significant because Nvidia, to date, has been known for its high-performance chips that enable the GenAI industry writ large (in other words, for its hardware). But now, with the quiet launch of its own AI model, Nvidia has moved into high-performance AI software as well, positioning itself for the first time as a full-service, comprehensive AI solutions provider. It’s the same kind of strategy that has served Apple well over the past decades.

(2) Adobe Goes MAX!

Meanwhile, Adobe celebrated its own big week at its Adobe MAX event, where it showcased new GenAI-infused creator tools, including “ethically sourced” generative video (meaning that the company says it only trains its AI models on licensed and public domain works). Director and GenAI filmmaker Paul Trillo highlighted one example of Adobe’s fascinating new AI power for creators to me — “rotatable vectors” (check out this video tutorial by Adobe Design Evangelist Howard Pinsky and think of its production impacts).

III. Your GenAI Podcast of the Week: “Fair Use”? Or Simply “Unfair”?

This week, synthetic co-hosts discuss last week’s feature article, “Are Tech’s ‘Fair Use’ Dominoes Beginning to Fall?” I generated it by dropping its text into Google NotebookLM. No prompting needed.


(NOTE TO MY READERS: As you can see by my use of Google’s NotebookLM to generate these podcasts based on my wholly human writing, I’m not against the use of GenAI. I embrace new tech, so long as it’s done right. GenAI developers scraping copyrighted works to train their AI models, without consent from, and compensation to, rights-holders, is neither “doing it right” nor “fair use” in my view, for all the reasons I’ve spelled out in these pages. BUT to be further clear, I believe strongly that fair licensing and compensation, rather than conflict, are the way forward for GenAI developers and rights-holders because GenAI tech is here to stay. And I’m dedicated to driving fair licensing agreements and am directly involved in those efforts. Reach out to me at peter@creativemedia.biz if you would like to learn more.)

IV. AI Litigation Case Tracker - Updates on Key GenAI Litigation (brought to you by McKool Smith)

Partner Avery Williams and the team at McKool Smith (recently named “Plaintiff IP Firm of the Year”) lay out the facts - and latest critical developments - via this link to the “AI Litigation Tracker”. You’ll get everything you need to know about each case.

(1) The New York Times v. Microsoft & OpenAI

(2) Sarah Silverman v. OpenAI (class action)

(3) Sarah Silverman, et al. v. Meta (class action)

(4) UMG Recordings v. Suno

(5) UMG Recordings v. Uncharted Labs (d/b/a Udio)

(6) Getty Images v. Stability AI and Midjourney

(7) Universal Music Group, et al. v. Anthropic

(8) Sarah Andersen v. Stability AI

(9) Authors Guild et al. v. OpenAI