The AI Litigation Tracker: Updates on Key Media and Entertainment Generative AI Infringement Cases (as of 6/29/25)
Here are the latest updates on the key generative AI focused IP infringement-related litigations that impact media, entertainment, marketing and more (as of 6/29/25) — monitored, compiled, updated and tracked in depth by partner Avery Williams and the team at McKool Smith (just voted “Plaintiff Firm of the Year” by The National Law Journal, and holder of the most “Top 100 Verdicts” in the U.S.). McKool has deep experience on both the copyright and patent sides of the AI & IP equation. The most significant developments for the week are listed first.
1. Kadrey et al. v. Meta: A Bombshell “Fair Use” Decision!
Background: Author Richard Kadrey, comedian Sarah Silverman, and others sued Mark Zuckerberg’s Meta on July 7, 2023 in the U.S. District Court (Northern District of California) for mass infringement - i.e., unlicensed “training” of its generative AI models on millions of copyrighted works, including their own. Meta’s defense is “fair use.” The judge assigned is Judge Vince Chhabria.
At first, in November 2023, the Court dismissed the bulk of plaintiffs’ claims against Meta. But the Court gave plaintiffs a chance to amend their complaint to add a more direct link to actual harm - and they filed their amended complaint in December 2023.
Current Status: “Fair use” in this case, but not in all. After nearly six weeks, the Court ruled on the parties’ hotly contested cross-motions for summary judgment, finding largely in favor of Meta. But this ruling is not the end of the fight. Judge Chhabria limited his ruling, holding that “this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful” and that he would have sided with Plaintiffs had they made stronger arguments, instead of the “clear losers” they opted to pursue.
We’ll get into the details, but here is a quick reference chart for understanding the opinion:
As we’ve discussed previously, “fair use” is predicated on four factors: 1) the purpose and character of the use, 2) the nature of the work copied, 3) the amount of the work copied, and 4) the effect of the use on the potential market for the work. Courts have held that the last factor is “undoubtedly the single most important element of fair use” because that factor goes to the motivating purpose of copyright law: “preserving the incentive for human beings to create artistic and scientific works.”
The first factor favored Meta: the Court characterized training LLMs as a “highly transformative” use, despite its undeniably commercial character. The second factor favored plaintiffs, whose books were “highly expressive works ‘of the type that the copyright laws value and seek to protect.’” The third factor favored Meta because, although Meta copied the books in their entirety, the Court found this reasonable given its relationship to the transformative purpose of training LLMs.
In considering the fourth factor, which is often the most important factor, the Court observed that three arguments could be made for market harm by generative AI: 1) that LLMs regurgitate input works, allowing them to become free substitutes for the works, 2) that unauthorized copying interfered with a licensing market for the works, and 3) that LLMs can create works that compete with the originals and indirectly substitute for them in the market. The Court called the last argument “far more promising,” but, unfortunately for Plaintiffs, they chose to focus on the first two arguments.
With respect to regurgitation, the Court observed that Llama regurgitated at most 50 words from Plaintiffs’ works, even under adversarial conditions designed to induce the models to regurgitate works. Because this was not enough to threaten the market for Plaintiffs’ books, the Court discounted the first argument.
With respect to the licensing-market argument, the Court held that a fair-use analysis cannot consider harm to the potential market for licensing the copyrighted material for the use in question. Every fair use decision could involve the loss of some possible licensing market for the use at issue, and rightsholders are not entitled to a licensing fee for fair-use copying of their works. The reasoning therefore becomes circular: You must first presume that there is no fair use to conclude that there could be harm to a licensing market for fair use, which is then the source of the “no-fair-use” assumption you made in the first place. The Court, therefore, discounted the licensing-market argument.
Turning to the third argument, market dilution, Judge Chhabria held that outputs that are non-infringing, yet similar to the copyrighted work, could be a market substitute relevant to the fourth factor. But Judge Chhabria found that the Plaintiffs had failed to adequately raise the issue or develop the requisite record evidence to support the argument. Chhabria observed that “it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor” in cases like this one. Flooding the market with alternatives to human-generated works was indirect substitution, but “indirect substitution is still substitution.” The Court also pointed out that this potential for indirect substitution was magnified because LLMs are “a technology that can generate literally millions of secondary works, with a miniscule fraction of the time and creativity used to create the original.”
Because Plaintiffs devoted most of their briefing to the second factor (to which Judge Chhabria devoted only two paragraphs) and failed to collect any empirical evidence of market dilution, the Court’s hands were tied, and Meta came out victorious.
Another interesting component of the holding is that Chhabria was not moved by the fact that Meta pirated a large selection of the training material. The Court held that “To say that Meta’s downloading was ‘piracy’ and thus cannot be fair use begs the question because the whole point of fair use analysis is to determine whether a given act of copying was unlawful.” Chhabria did not distinguish between the copy made to acquire the books in the first instance, and the copies made to train the AI model. Judge Alsup’s decision in Bartz v. Anthropic (discussed below) reached a different conclusion under what may be slightly different facts. Either way, we don’t think we’ve heard the last of the piracy story.
This decision is sure to reverberate through all of the currently filed AI copyright cases. In ruling against Kadrey, Judge Chhabria set out a roadmap for future plaintiffs: develop a theory of indirect substitution through market dilution and prepare evidence showing that authors are losing sales and jobs to LLMs. We expect that it won’t be long before these arguments are made in every case tracked here.
2. Bartz v. Anthropic: Another Comprehensive Fair Use Analysis in the Age of AI
Judge Alsup’s decision in Bartz v. Anthropic is another pivotal decision in the copyright fight over generative AI training. The case addressed whether Anthropic’s use of millions of copyrighted books — acquired both through piracy and legitimate purchase — to train large language models was fair use under 17 U.S.C. § 107. It was a mixed decision, largely favoring Anthropic, except for training on works acquired by piracy.
As with the case above, here is a brief chart summarizing the factors and who they favored:
The Court’s analysis of the first fair use factor — the purpose and character of the use — was central to its holding. For the use of books in LLM training, the Court found the purpose to be “spectacularly” transformative. The Court analogized the process to human learning: just as people read, internalize, and later draw upon books to create new works, so too do LLMs “memorize” and internalize patterns from the training data to generate new, non-infringing outputs. We note that Chhabria specifically disagreed with this particular stance, finding that LLMs were far more dangerous to the market for the copyrighted works. In this case, Alsup emphasized that the LLMs did not output infringing copies or substantial duplicates of the Plaintiffs’ works, and that the training process itself was “orthogonal” to the original market for the books.
However, Alsup drew a sharp distinction when it came to Anthropic’s creation of a permanent, general-purpose digital library from pirated books. Here, the Court found the use was not transformative. The purpose was not to create something new, but to amass a comprehensive archive for potential future uses, which the Court likened to a simple substitution for a legitimate purchase of the copyrighted works. The Court rejected the notion that the transformative nature of LLM training could “bless” the initial act of piracy for library-building purposes.
The second factor considers the nature of the copyrighted work. The Court recognized that the works at issue — both fiction and nonfiction — were highly expressive and thus at the core of copyright protection. While this factor alone is rarely dispositive, the Court found it weighed against fair use for all copies, as the works were not merely factual compilations but contained significant creative expression.
The third factor considers the amount and substantiality of the portion used. Anthropic’s copying was extensive, involving entire works. The Court acknowledged that, in general, copying an entire work “militates against a finding of fair use.” However, it found that such copying was “reasonably necessary” for the transformative purpose of training LLMs, given the scale and complexity of the technology. The Court noted that LLMs require exposure to vast amounts of text to function effectively, and that the use of complete works was justified in this context. In contrast, the wholesale pirating of millions of books for the central library was excessive and not justified by any transformative purpose.
The fourth factor examines the effect of the use on the potential market for or value of the copyrighted work. The Court found no evidence that LLM training displaced the market for the original works, drawing an analogy to how teaching students to write does not usurp authors’ rights. The Court rejected arguments that the mere possibility of a licensing market for AI training should weigh against fair use, holding that copyright does not entitle authors to control all conceivable uses. However, the Court found that pirating books for a central library directly displaced sales and undermined the market, as each pirated copy was a lost sale.
Bartz affirms that using copyrighted works to train LLMs can be fair use if the use is transformative and does not substitute for the original. However, the Court ruled sharply against wholesale copying of works to build a permanent digital library, especially when done through piracy.
We are anxious to see how other courts apply Bartz and Kadrey, particularly on points where the two opinions seem to differ: 1) the use of piracy to acquire copyrighted materials; 2) the difference between training an AI and training a human author; and 3) the potential for market dilution from similar, yet non-infringing works. There is sure to be more to come!
3. Disney & Universal v. Midjourney
Background: A couple of weeks ago, Disney and Universal filed suit for copyright infringement against Midjourney, an AI image generation platform. This new filing is unique because, although the AI litigation battlefield has seen major plaintiffs in news media entities such as The New York Times, so far no major visual media corporations have entered the fray. Disney and Universal are alleging straightforward claims of direct and secondary copyright infringement based on Midjourney’s unlicensed use of copyrighted works to train its models, which enables Midjourney to generate unlicensed likenesses of the Plaintiffs’ characters.
The copyright infringement in Disney and Universal’s complaint also stands out from that alleged by author and news plaintiffs in other cases because it relies on what could be considered “normal use” of the Midjourney product. The complaint is filled with examples of images depicting Marvel, Star Wars, and DreamWorks characters. These images were not generated in response to prompts crafted by attorneys preparing the complaint — they were generated by normal Midjourney users who wanted to see what Shrek would look like as a 1950s greaser, as an example.
The complaint incorporates all of the grievances we have already seen from other plaintiffs, such as the use of copyrighted material for training and the generation of works that seem to mirror copyrighted training data. However, Disney and Universal are uniquely positioned to argue that Midjourney is engaged in and profiting from the massive distribution of works that infringe their copyrights. The facts, combined with Disney’s behemoth status in the media industry, will likely lead to unique dynamics not seen in the other cases.
Current Status: Still in the earliest stages. Midjourney has 21 days to file an answer, but extensions of that deadline are incredibly common. We may not see anything on the docket for several more weeks.
4. Reddit v. Anthropic
Background: Reddit, which has an ongoing generative AI licensing program, recently filed a state court action in California against Anthropic alleging breach of contract, unjust enrichment, trespass to chattels, tortious interference, and unfair competition. Reddit is an online forum and link-sharing site with millions of daily users, and it recently held its initial public offering. Reddit offers materials posted by its users as training data available for licensing to companies like Anthropic, and its central complaint is that Anthropic has circumvented its licensing process in order to scrape training data without compensating Reddit, in violation of its User Agreement.
According to Reddit, although Anthropic has publicly stated that it does not scrape Reddit for training data, Reddit’s audit logs show that Anthropic has continued to deploy “automated bots to access Reddit content more than one hundred thousand times” in the months following these statements. Reddit alleges that this access outside its permitted licensing channels poses a risk to its users’ privacy as well as to the performance of the site itself, which must process these incoming automated requests in order to serve responses to them.
This case is distinct from other cases reported here because — unlike many other Plaintiffs — Reddit has an established channel through which Anthropic could pay for and benefit from Reddit’s data. Reddit’s position, therefore, is not one of an unwilling participant in the development process of AI, but rather that of a company which very much wishes to be a go-to source for training data and which is attempting to ensure that this benefit is only provided to paying partners in this enterprise. We will continue to report as this case develops.
Current Status: No major substantive developments this past week.
5. Thomson Reuters v. Ross Intelligence
Background: On February 11th, 2025, in a case that comes tantalizingly close to deciding the issue of “fair use” in generative AI model training (with many taking the position that that issue is now firmly decided, as laid out below), Third Circuit Judge Stephanos Bibas, sitting by designation in the District of Delaware, ruled that the “fair use” doctrine does not protect the use of West Headnotes in determining what to display as a result of a user query. Thomson Reuters v. Ross Intelligence involves an AI search tool made by the now-defunct Ross Intelligence (“Ross”). Ross’ tool accepted user queries on legal questions and responded with relevant case law. To determine what cases to provide in response to user queries, Ross compared the user queries to “Bulk Memos” from LegalEase, which were written using Westlaw Headnotes. Boiling it down, when a user’s query contained language similar to a West Headnote, Ross’ tool would respond by providing the cases that the West Headnote related to.
While Ross’s tool was not a modern generative AI model (it didn’t use a transformer model or perform next-token prediction to generate unique output for queries), an important similarity exists between Ross’ use of West Headnotes and the way generative AI models train on other copyrighted materials. Ross’ tool did not actually reproduce the West Headnotes in response to a user’s query. Ross used the Headnotes just for “training,” that is, to determine what to produce in response to a user's query. It is easy to draw an analogy between Ross’ use of West Headnotes to determine what cases are responsive to a user’s query, and OpenAI’s use of The New York Times articles to determine how to respond to a question about politics (see the separate The New York Times case against OpenAI summary below). The technology is different, but the themes are similar.
In that context, the Court’s grant of summary judgment against Ross’ fair-use defense — as a matter of law — provides insight into how another court might rule in a generative AI training case. “Fair use” is based on four factors: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount of the work used, and (4) the potential impact on the market. The Thomson Reuters Court found that factors two and three favored Ross because of the low degree of creativity involved in carving out headnotes from cases, as well as the fact that Ross did not output the headnotes themselves but rather judicial opinions. However, factor one favored Thomson Reuters because of the commercial nature of Ross’ product and the fact that it was not transformative. The Court noted that Ross’ product was not generative AI, suggesting that a generative AI product could be more transformative than the simpler lexical searching tool that Ross made. Finally, the fourth factor and “undoubtedly the single most important element of fair use” favored Thomson Reuters because of the potential impact on Thomson Reuters’ ability to sell its own data for use in training AI if Ross’ use was permissible. On balance, the Court flatly rejected Ross’ “fair use” defense as a matter of law. That question will not go to a jury.
AI developers will undoubtedly focus on the issue of transformative use in generative AI fair-use battles to come, but the “commercial use” and “market impact” factors will continue to favor content owners over generative AI companies. We have already seen several massive licensing deals where companies like Reuters and Reddit are profiting from the sale of their own data. If courts continue to favor the “market impact” factor as we see in Thomson Reuters, then OpenAI, Suno, and the like will have an uphill battle to prove their “fair use” defense.
Current Status: We are still awaiting the “fair use” determination by the Third Circuit Court of Appeals, after the federal district court certified Ross’ fair-use and copyrightability arguments for interlocutory appeal. On April 4th, the federal district court judge granted Ross Intelligence’s motion for interlocutory appeal of the Court’s summary judgment ruling against Ross’ “fair use” and copyrightability arguments. The Court stated that “Though I remain confident in my February 2025 summary judgment opinion, I recognize that there are substantial ground for difference of opinion on controlling legal issues in this case.” The two questions certified are “(1) whether the West headnotes and West Key Number System are original; and (2) whether Ross’s use of the headnotes was fair use.”
We’re not particularly surprised at this development. The country is watching this case rather closely because of its proximity to the generative AI training cases, and the hundreds of billions of dollars at stake there. Trying the case without considering the certified questions, and then having those issues reversed on appeal could waste everyone’s resources.
6. SDNY Multi-District Litigation (New Format for the Consolidated Cases)
Background: Federal Judge Stein (Southern District of New York) is the multi-district litigation (MDL) judge for twelve high-profile generative AI cases. On April 3rd, the MDL panel consolidated the following cases for pretrial proceedings:
1. TREMBLAY, ET AL. v. OPENAI, INC., ET AL., C.A. No. 3:23−03223
2. SILVERMAN, ET AL. v. OPENAI, INC., ET AL., C.A. No. 3:23−03416
3. CHABON, ET AL. v. OPENAI, INC., ET AL., C.A. No. 3:23−04625
4. MILLETTE v. OPENAI, INC., ET AL., C.A. No. 5:24−04710
5. AUTHORS GUILD, ET AL. v. OPENAI, INC., ET AL., C.A. No. 1:23−08292
6. ALTER, ET AL. v. OPENAI, INC., ET AL., C.A. No. 1:23−10211
7. THE NEW YORK TIMES COMPANY v. MICROSOFT CORPORATION, ET AL., C.A. No. 1:23−11195
8. BASBANES, ET AL. v. MICROSOFT CORPORATION, ET AL., C.A. No. 1:24−00084
9. RAW STORY MEDIA, INC., ET AL. v. OPENAI, INC., ET AL., C.A. No. 1:24−01514*
10. THE INTERCEPT MEDIA, INC. v. OPENAI, INC., ET AL., C.A. No. 1:24−01515
11. DAILY NEWS LP, ET AL. v. MICROSOFT CORPORATION, ET AL., C.A. No. 1:24−03285
12. THE CENTER FOR INVESTIGATIVE REPORTING, INC. v. OPENAI, INC., ET AL., C.A. No. 1:24−04872
Current Status: OpenAI must preserve its logs. As we have been reporting, the magistrate judge had ordered OpenAI to preserve chat logs even if they would normally be slated for deletion, and OpenAI objected. This week, based on reasoning provided during the June 26th hearing, Judge Stein denied OpenAI’s objections to the magistrate judge’s order that it preserve chat logs. This was not the only loss for OpenAI this week, with Judge Stein also denying its request for cross-use of plaintiff documents.
The potential privacy concerns here are staggering. Anyone using LLMs to generate source code may need to rely on trade secret protection given the difficulty of copyrighting or patenting AI-generated content. If all of the logs generating that code must be preserved, and perhaps produced to third-parties in totally unrelated litigation, users should be cautioned to ensure that they are meeting the reasonable-measures requirement to protect their information.
7. The New York Times v. Microsoft & OpenAI
Background: On December 27, 2023, The New York Times sued Microsoft and OpenAI in the U.S. District Court for the Southern District of New York for copyright infringement and other related claims. The Times alleges that the companies used “millions” of its copyrighted articles to train their AI models without its consent. The Times claims this has resulted in economic harm by pulling users away from its paywalled content and impacting advertising revenue. The complaint alleges several causes of action, including copyright infringement, unfair competition, and trademark dilution. In its pleadings, The Times asserts that Microsoft and OpenAI are building a “market substitute” for its news, and further that the “hallucinations” their AI generates based on The Times’ articles substantially damage its reputation and brand. The Times seeks “billions of dollars of statutory and actual damages.” Microsoft and OpenAI assert the defense of “fair use” - i.e., no license, payment or consent is needed.
On September 13, 2024, the Court granted a motion to consolidate the case with another brought by the Daily News and other publications (see the MDL discussion in Section 6 above). The judge assigned to the consolidated cases is Judge Sidney Stein.
Current Status: OpenAI objects to preservation order. As we discussed previously, the Court has maintained its order for OpenAI to preserve chat logs that would otherwise be deleted under its privacy policies, citing concerns that the same users who might use ChatGPT to circumvent The New York Times’ paywall protections would be more likely to request deletion of their chat logs. OpenAI has maintained its objections to this order.
In its Objection to Preservation Order, OpenAI raised three main arguments. First, OpenAI argued that the preservation order does not serve a useful purpose because the idea that certain users attempt to “cover their tracks” is far-fetched. Second, it argued that the Order is not proportional to the needs of the case, because it would require OpenAI to make infrastructural changes to support the retention and would strain user trust in OpenAI. Finally, OpenAI argued that the preservation order was based on false premises, stating that OpenAI did not “destroy” any data, and certainly did not delete any data in response to litigation events.
OpenAI’s false-premises argument cites support from a non-public declaration, but it appears that OpenAI may be relying on a narrow distinction between “deleting” data (which it argues is needed to maintain customer trust) and “destroying” data, which the News Plaintiffs could argue gives rise to an adverse inference.
8. UMG Recordings v. Uncharted Labs (d/b/a Udio)
Background: This case was brought on June 24, 2024, in the Southern District of New York, by a group of major record companies against the company behind Udio, a generative AI service launched in April 2024 by a team of former researchers from Google DeepMind. Much like Suno (discussed below), Udio allows users to create digital music files based on text prompts or audio files. And as with the complaint against Suno, Plaintiffs rely on tests comprising targeted prompts including the characteristics of popular sound recordings — such as the decade of release, the topic, genre, and descriptions of the artist. They allege that using these prompts caused Udio's product to generate music files that strongly resembled copyrighted recordings. The claims are for direct infringement and related causes of action. The judge assigned is Judge Alvin K. Hellerstein.
Current Status: No major substantive developments this past week. The docket saw little movement this week following the Court’s earlier clarification that digital copies of UMG’s deposit copies would suffice. However, the court will be holding a status conference on July 30th. But it is widely reported that UMG and the other major labels are exploring a potential licensing deal with Udio, pursuant to which the record companies would also get equity stakes in Udio.
9. UMG Recordings v. Suno
Background: The RIAA, on behalf of the major record labels, filed this lawsuit in the federal district court in Massachusetts on June 24th, 2024, for mass copyright infringement and related claims based on alleged training on their copyrighted works. Suno is a generative AI service that allows users to create digital music files based on text prompts. This is the first case brought against an AI service related to sound recordings. In its answer on August 1st, 2024, Suno argued that its actions were protected by fair use. The judge assigned is Chief Judge F. Dennis Saylor, IV.
Current Status: No major substantive developments this past week. As we discussed six weeks ago, the Court adopted the parties’ joint stipulation and proposed order stipulating that digital copies of the works would suffice as deposit copies. The Court held a status conference on June 5th, at which the parties noted that they anticipated a joint motion to extend deadlines, and they filed that motion today, further pushing out the timeline for this litigation. Continue checking back and we will let you know when the case starts moving again. BUT it is widely reported that UMG and the other major labels are exploring a potential licensing deal with Suno as well, pursuant to which the record companies would also get equity stakes in Suno.
10. Concord Music Group, et al. v. Anthropic
Background: UMG, Concord Music and several other major music companies sued Amazon-backed OpenAI competitor Anthropic on October 18th, 2023 in the U.S. District Court (Middle District of Tennessee). The music companies assert that Anthropic is infringing their music lyric copyrights on a massive scale by scraping the entire web to train its AI, essentially sucking up their copyrighted lyrics into its vortex – all without any licensing, consent or payment. In its response, Anthropic claimed fair use. The case was transferred to the Northern District of California on June 26th, 2024 and closed in Tennessee. The judge assigned is Judge Eumi K. Lee. The parties have not yet had a case management conference.
Current Status: No major substantive developments this past week. The past four weeks have been quiet, with few docket developments. This week saw no new filings. Continue to check back and we will provide updates as soon as filings resume.
11. Sarah Andersen v. Stability AI
Background: Visual artists filed this putative class action on January 13th, 2023, alleging direct and induced copyright infringement, DMCA violations, false endorsement and trade dress claims based on the creation and functionality of Stability AI’s Stable Diffusion and DreamStudio, Midjourney Inc.’s generative AI tool, and DeviantArt’s DreamUp. On August 12th, 2024, the Court dismissed many of the claims in Plaintiffs’ first amended complaint, leaving the claims for direct copyright infringement, trademark, trade dress, and inducement. The assigned judge is Judge William H. Orrick.
Current Status: No major substantive developments this past week. Those who have been following this case will be familiar with the ongoing dispute over whether Plaintiffs should be permitted to use Dr. Ben Yanbin Zhao as an expert given his involvement in the development of “poison pill” tools used to protect artists’ works against unauthorized use as training data for generative AI models. These tools, Glaze and Nightshade, alter images and their associated data so that they make poor training data for models. As a result, Defendants have argued that Dr. Zhao should not be given access to their confidential materials.
As discussed last week, the parties have remained at an impasse on this issue, and this week saw no new filings by either party on the subject. However, Judge Cisneros did issue a number of discovery orders regarding the discovery of training data, including extending the timeline for production of such data. Perhaps next week will bring more activity.
12. Dow Jones & Co, et al v. Perplexity AI
Background: On October 21st, 2024, The Wall Street Journal and The New York Post sued generative search company Perplexity AI in the U.S. District Court for the Southern District of New York for copyright infringement and other related claims. A new twist in this litigation is the focus on Retrieval Augmented Generation (“RAG”) AI. RAG GenAI not only uses an LLM trained on copyrighted material to respond to individual prompts, but also retrieves current content from the web that is relevant to each query. Perplexity even said the quiet part out loud, encouraging its users to “skip the links” to the actual sources of the copyrighted content. Based on Perplexity’s RAG model, the media Plaintiffs allege that Perplexity is infringing on their copyrights at both the input and output stages, sometimes reproducing copyrighted content verbatim. Plaintiffs cited their parent company News Corp’s recent licensing agreement with OpenAI in explaining that GenAI technology can be developed by legitimate means.
Current Status: No major substantive developments this past week. A few weeks ago, we reported that Plaintiffs submitted their memorandum of law opposing Defendant’s motion to dismiss or transfer the case. Three weeks ago, Perplexity submitted its reply, arguing that the Court lacked personal jurisdiction due to insufficient contacts with New York. No new developments occurred this week, but check back to find out how Defendant’s motion resolves.
13. Getty Images v. Stability AI
Background: Getty Images filed this lawsuit against image generator Stability AI on February 2nd, 2023, accusing the company of infringing more than 12 million photographs, their associated captions and metadata, in building and offering Stable Diffusion and DreamStudio. Getty’s claims are similar to those in The New York Times v. Microsoft & OpenAI case above, but here they are in the context of visual images instead of written articles - i.e., unlicensed scraping by their AI with an intent to compete directly with, and profit from, Getty Images (i.e., market substitution). This case also includes trademark infringement allegations arising from the accused technology’s ability to replicate Getty Images’ watermarks in the AI outputs. Getty filed its Second Amended Complaint on July 8th, 2024, and the parties are currently engaged in jurisdictional discovery related to Defendants’ motion to transfer the case to the Northern District of California. The judge assigned is Judge Jennifer L. Hall.
Current Status: Still no update for Getty. With no notable updates in over four months, we will be placing this case on the back-burner. As a final update on where we last left off: Getty had submitted a letter to the Court on November 25th, 2024, explaining its frustration with Stability AI’s refusal to participate in discovery or in a Rule 26(f) conference. In August 2024, Stability AI had argued that it was under no obligation to commence fact discovery until the Court issued its ruling on jurisdiction. Getty had requested that the Court order Stability to stop delaying and proceed with the case, but after several months with no response from the Court, it remains unclear when things will begin moving forward again.
We will continue to keep tabs on this case and provide an update if and when it resumes forward movement.