$1 Trillion for GenAI: So Where Are The Content Dollars?

I Propose a "3-Tier Content Licensing Solution" That Can Work for All

Nov 04, 2024

I. the “mAIn event”: Where Are The Content Licensing Dollars?

Last week, I wrote about generative AI (“GenAI”) search “tool” Perplexity, which had just been sued by News Corp for copyright infringement. Perplexity is representative of those GenAI companies that are dependent upon massive amounts of content — much of which is scraped and used without consent from, and compensation to, the rights-holders — in order to train their large language models (LLMs). GenAI companies like Perplexity rely upon “fair use” as their primary defense to infringement, and the courts have yet to conclusively weigh in on the issue (although, as I recently wrote, I predict they’ll reject fair use in this entirely novel GenAI context).

$1 Trillion Expected To Be Spent on Generative AI

Of course, none of this litigation is slowing down Perplexity or other GenAI developers. Goldman Sachs, in fact, predicts that $1+ Trillion will be invested in GenAI, and Perplexity now seeks a new $500 million round of financing at an $8+ billion valuation. That got me wondering: how much of that $500 million is budgeted for the media rights-holders (like News Corp) whose content it scrapes and uses in order to compete with them as a new one-stop shop for news, information and analysis (in other words, a market "substitute")?

And that question got me thinking of the even bigger macro issue — of the $1+ Trillion expected to be spent on generative AI, how much have the major GenAI developers actually budgeted to license much of the content they need to give their LLMs real, scalable and massive commercial utility?

Well, the answer likely won’t surprise you.

Important Context

Let’s take a deep dive into relevant content licensing budgets (or, better said, current lack thereof). As we do, remember, every single GenAI company below achieved significant growth in its overall market cap year-over-year due — at least in significant part — to its GenAI/AI-focused strategy and overall “story” to Wall Street.

[IMPORTANT CAVEATS: I’ve linked to the underlying sources for the relevant capital expenditure budgets I use and for the content licensing deals I identify. BUT to be clear, I estimate the numbers I lay out for each company regarding: (i) amounts budgeted specifically for AI-focused initiatives, and (ii) content licensing expenditures to date. You should consider those to be illustrative for purposes of my analysis (although, because I’ve tried to be conservative in my estimates [i.e., favorable to the GenAI companies], the percentages I use for their GenAI-related content licensing expenditures — at least to date — are likely higher than they actually are at this point)].

(1) Content Licensing Dollars Spent to Date

OpenAI

OpenAI just recently raised $6.6 billion at a valuation of $157 billion in its continuing quest to lead the GenAI pack and accelerate its march toward its holy grail of artificial general intelligence (“AGI”). CEO Sam Altman’s company expects to generate $11.6 billion in revenues in 2025 — albeit with significant losses (meaning, of course, that its anticipated costs will be significantly north of that). But those mega-bucks are virtually all slated for capital expenditures (mostly computing power) and R&D. I’ve seen no reports of numbers budgeted for content licensing.

But that doesn’t mean that OpenAI isn’t licensing content. In fact, at least publicly, OpenAI has been the most active so far, having announced several agreements with media companies. Its biggest to date is its 5-year $250 million deal with News Corp. That sounds big in theory. But is it really in relation to its overall billions in investments?

Total Financing Raised to Date (approximate): $20 Billion

Total AI Investment to Date (conservative estimate): $10 Billion (I’m assuming it’s used 50% of its raised capital so far).

Total Spent on Licensing Content To Date: Estimated $500 Million (maximum) (including its News Corp. deal).

Percentage of Content Licensing (vis-a-vis Overall AI Investment): 5%

Current Valuation: $157 Billion (earlier this year, the company was valued at $80 billion, up from its $29 billion valuation in 2023) (it’s worth considering amount paid out to content licensing in relation to the company’s overall valuation as well).

Microsoft

OpenAI’s biggest investor, Microsoft, spent $19 billion on CapEx in its quarter ending June 30th, 2024 alone, much of which was used to build out its data centers and acquire CPUs and GPUs to be ready for expected massive demand for AI. Just imagine what those numbers will look like in 2025. But nowhere was I able to find any dollars specifically earmarked for content licensing.

But like its OpenAI progeny, Microsoft has at least demonstrated a willingness to license content. One such deal is with academic publisher Informa, reportedly valued at about $10 million — a drop in the overall bucket of tens of billions. But hey, it’s a start — and sure beats litigation.

Total AI Investment to Date (conservative estimate): $75 Billion ($80 billion CapEx this year, $40 billion last year - equals $120 billion). Assume 50% of that is AI-related. Equals $60 billion. Add $15 billion for its OpenAI and other GenAI investments.

Total Spent on Licensing Content To Date: Estimated $200 Million (maximum) (including its Informa deal).

Percentage of Content Licensing (vis-a-vis Overall AI Investment): .3% (point 3%).

Current Valuation: $3.1 Trillion (up from about $2.5 Trillion one year ago).

Google

How about Sundar PichAI’s Google (even his name screams “AI”)? Certainly Google engages with media companies, since it too is a major media company (YouTube) and itself sounded the alarms when it reported that OpenAI used its content for GenAI training purposes. Plus there’s some relevant history here. Remember that YouTube, in its early days, was sued for mass copyright infringement — and, after settling the litigation it faced, Google ultimately developed a Content ID and ad revenue sharing content compensation scheme that still lives today. So there’s precedent here to try to work something out with rights-holders.

But although the tech giant was on track to spend nearly $50 billion on infrastructure this year — much of which is to fuel its AI aspirations — I found no reports of dollars budgeted for content licensing. Like the others, its tens of billions are spent primarily on data centers, cloud infrastructure, and R&D to power its GenAI dreams.

Google has also dipped its toes into GenAI-focused content licensing — including its $60 million deal with Reddit. But its reported deals are few and far between. At least so far.

Total AI Investment to Date (conservative estimate): $40 Billion ($50 billion CapEx this year, $25 billion last year - equals $75 billion). Assume 50% of that is AI-related. Equals $37.5 billion. Add $2.5 billion for its Anthropic and other GenAI investments.

Total Spent on Licensing Content To Date: Estimated $200 Million (maximum) (including its Reddit deal).

Percentage of Content Licensing (vis-a-vis Overall AI Investment): .5% (point 5%).

Current Valuation: $2.1 Trillion (up from about $1.5 Trillion one year ago).

Amazon

Chairman Jeff Bezos is a media owner himself — The Washington Post — so, even if he isn’t willing to take a stand on the Presidential race, surely he values content, right? Of course he does, but I’ve seen no reports yet of numbers specifically budgeted for content licensing. Nor have I been able to find any reported major GenAI content licensing deals so far — despite the fact that Amazon will spend an astronomical $150 billion over the next decade plus to support its AI buildout of data centers, etc.

Total AI Investment to Date (conservative estimate): $16 Billion ($15 billion CapEx this year, $7.5 billion last year - equals $22.5 billion). Assume 50% of that is AI-related. Equals $11.25 billion. Add $4.75 billion for Anthropic and other related investments.

Total Spent on Licensing Content To Date: Estimated $100 Million (maximum)

Percentage of Content Licensing (vis-a-vis Overall AI Investment): about .63% (point 63%) (and likely lower since there’ve been no reports of licensing deals to date).

Current Valuation: $2.1 Trillion (up from about $1.4 Trillion one year ago).
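The Microsoft, Google and Amazon estimates above all follow the same back-of-the-envelope method: add two years of CapEx, assume half of it is AI-related, then add equity stakes in other GenAI companies. Here is a minimal sketch of that arithmetic using only the figures quoted above. The function names and the `ai_share` parameter are my own labels for illustration, and the 50% share is the article's assumption, not a reported number.

```python
def estimate_ai_investment(capex_this_year, capex_last_year,
                           other_ai_investments, ai_share=0.5):
    """Estimated AI investment to date, in billions of dollars.

    ai_share is the assumed fraction of CapEx that is AI-related
    (the article assumes 50%).
    """
    return (capex_this_year + capex_last_year) * ai_share + other_ai_investments


def licensing_share_pct(licensing_millions, ai_investment_billions):
    """Content licensing as a percentage of estimated AI investment."""
    return licensing_millions / (ai_investment_billions * 1000) * 100


# Inputs in billions, taken from the estimates above.
microsoft = estimate_ai_investment(80, 40, 15)     # 75.0
google = estimate_ai_investment(50, 25, 2.5)       # 40.0
amazon = estimate_ai_investment(15, 7.5, 4.75)     # 16.0

# Maximum licensing estimates in millions, also from above.
for name, investment, licensing in [
    ("Microsoft", microsoft, 200),
    ("Google", google, 200),
    ("Amazon", amazon, 100),
]:
    pct = licensing_share_pct(licensing, investment)
    print(f"{name}: ${investment:.2f}B invested, {pct:.3f}% on licensing")
```

Changing `ai_share` shows how sensitive the percentages are to that one assumption: a lower AI share of CapEx shrinks the denominator and pushes the licensing percentage up.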

Anthropic

Amazon-backed Anthropic is one of OpenAI’s most competitive competitors, kind of like the “Pepsi” to OpenAI’s “Coke.” So it’s no surprise that Amazon bet big on this contender (to the tune of $4 billion) as it races toward a $30-$40 billion valuation. But although Anthropic CEO Dario Amodei recently said he expects future LLMs to cost hundreds of billions of dollars to develop, I saw no reports of dollars budgeted specifically for content licensing. Nor was I able to find any publicly-announced GenAI-focused content licensing deals.

Total Raised to Date (approximate): $8 Billion

Total AI Investment to Date (conservative estimate): $6 Billion

Total Spent on Licensing Content To Date: Estimated $50 Million (maximum)

Percentage of Content Licensing (vis-a-vis Overall AI Investment): .83% (point 83%) (and likely lower since there’ve been no reports of licensing deals to date).

Current Valuation: $30-$40 Billion (one year ago at the end of October 2023, the company was reportedly valued at between $20-$30 billion).

Apple

Certainly Apple, which prides itself on being THE tech company for the creative community, has earmarked billions of dollars for the licensing of content necessary to train its own LLMs. Right? Well, not so fast. Despite the fact that it’s the most valuable company in the world, with a current market cap of about $3.4 trillion, so far I haven’t seen any reports of dollars spent for AI. Nor have I seen any reports of dollars budgeted for content, or publicly announced licensing deals for content.

Total AI Investment to Date (conservative estimate): $15 Billion

Total Spent on Licensing Content To Date: Estimated $100 Million (maximum)

Percentage of Content Licensing (vis-a-vis Overall AI Investment): .67% (point 67%) (and likely lower since there’ve been no reports of licensing deals to date).

Current Valuation: $3.4 Trillion (up from about $2.6 Trillion one year ago).

(2) So Let’s Do The Math Together

So here’s the final tally, boys and girls.

Total dollars expected to be spent on generative AI = $1+ Trillion (although, to be fair, this is a total estimated to be invested by all players in the GenAI space — not just the companies mentioned above).

Total AI-related content licensing expenditures so far = $1 Billion max (and likely much less than that). That means about .1% of the total expected spend has been on content licensing to date (as in point 1%, not 1 %). If just the major GenAI players analyzed above have together invested $100 billion (of the expected $1 trillion) so far, then at most 1% has been paid out to rights-holders.
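For anyone who wants to check the tally, here is a short sketch that simply adds up the per-company estimates listed in section (1). The dictionary layout and variable names are mine, and every figure is one of my illustrative estimates from above rather than a reported budget. Summed exactly, the six maximum licensing figures come to about $1.15 billion (which I round to $1 billion) and the six investment estimates come to $162 billion (versus the round $100 billion used above). Either way, the share paid to rights-holders stays under 1%.

```python
# company: (estimated AI investment in $ billions,
#           maximum estimated content licensing in $ millions)
estimates = {
    "OpenAI": (10, 500),
    "Microsoft": (75, 200),
    "Google": (40, 200),
    "Amazon": (16, 100),
    "Anthropic": (6, 50),
    "Apple": (15, 100),
}

total_investment_b = sum(inv for inv, _ in estimates.values())
total_licensing_m = sum(lic for _, lic in estimates.values())

# Licensing as a share of what these six companies have invested so far.
share_of_invested = total_licensing_m / (total_investment_b * 1000) * 100

# Licensing as a share of the $1 trillion expected overall ($1T = 1,000,000 $M).
share_of_trillion = total_licensing_m / 1_000_000 * 100

print(f"Total AI investment: ${total_investment_b}B")
print(f"Total licensing (max): ${total_licensing_m}M")
print(f"Share of amount invested so far: {share_of_invested:.2f}%")
print(f"Share of expected $1T: {share_of_trillion:.3f}%")
```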

But based on my “reading of the room,” I expect this all to change as the major GenAI developers realize it’s better to switch (to doing licensing deals) than fight (and potentially lose in the courts, with all the wasteful litigation costs and “time sink” that go with it). And I see positive signs that things are moving that way — albeit still too slowly.

(3) So What’s The Right Number?

So, how much should GenAI developers pay to rights-holders for use of the essential content they need (without which much of their GenAI tech investments would have little commercial value)? In other words, what percentage of their overall GenAI budgets is a “fair” allocation to rights-holders?

To answer that fundamental question, maybe we should at least take a look at how Big Tech and Big Media solved this analogous question in earlier Tech disruptions.

Let’s take the last major tech-tonic shift in the world of media and entertainment — streaming. YouTube pays out about 55% of its ad dollars to rights-holders. Meanwhile, Spotify pays out about 70% of its revenues to music rights-holders.

Maybe those kinds of streaming percentages don’t work here in this GenAI context due to the massive capital investments necessary for generative AI to “do its thing.” Also, streaming monetizes the relevant licensed content arguably more directly; whereas each individual content source used for GenAI training is typically less directly “traceable” to the generated outputs.

But to be clear, a lack of directly “traceable” commercialization doesn’t necessarily excuse infringement even on the output side. As I always point out, I believe the courts will ultimately rule that training alone without consent and compensation is copyright infringement. And, in its recent Hachette v. Internet Archive decision (which I recently discussed), the Second Circuit Court of Appeals (the same “Google Books” court that the GenAI players always cite to support “fair use”) ruled that direct “for profit” commercialization is not necessary to find copyright infringement. The Internet Archive is a non-profit.

In any event, when we objectively look at the relevant estimated total GenAI-related content licensing percentages above (which range from .3% to 5%), it’s hard to justify those as being the “right” or “fair” answers given the critical role content plays in GenAI value creation.

So, what is fair?

(4) Proposed Solution: A 3-Tiered Content Licensing System

So, it’s time to do “fair” licensing deals now — and define a practical, workable content licensing solution that gives all parties motivating incentives and the certainty they need. It’s in everyone’s interests to take away the “friction” of acrimony and uncertainty — which inevitably leads to significant risk and wasteful litigation (when instead all can move forward constructively to share in the value created together).

Here’s my proposal for a solution that benefits all. It’s a 3-tiered “opt in” licensing system for rights-holders that compensates them for past unlicensed training — and pays them for inclusion of their works in next-generation LLMs developed during a defined period of time (i.e., the term of such licensing deal):

Tier 1: 1-1 Direct Licenses Between Individual GenAI Developers & Individual Major Rights-Holders (media and entertainment companies with significant content libraries). OpenAI’s $250 million deal with News Corp is an example;

Tier 2: 1-1 Direct Licenses Between Individual GenAI Developers & Aggregated Rights-Holders (individual media and entertainment companies with smaller, yet still sizable, content assets — who “pool” their assets together to bring content scale and diversity). This is a “strength in numbers” approach; and

Tier 3: Automated “Opt In” Platforms for All Other Individual Rights-Holders (smaller rights-holders who want to participate in GenAI monetization — in a way that is akin to music public performance licenses). Several VC-backed companies are on a path to enable this kind of “automated” approach. I know several of them.

I know there’s a lot to “chew” on here, and I want to hear from you. Send your own ideas and feedback to peter@creativemedia.biz.

Listen to A Smart Podcast Discussion of It All

For a surprisingly comprehensive and compelling discussion of my article, listen to this podcast episode that I generated using Google’s NotebookLM.


II. the GenAI Content Licensing “Snapshot” (brought to you by Variety)

Variety compiles and regularly updates a “snapshot” of the key publicly-announced GenAI content licensing deals. Here are highlights from its latest (subscribe to Variety’s VIP+ “Variety Intelligence Platform” to get it yourself. I highly recommend it).

III. AI Litigation Tracker - Updates on Key Infringement Cases (by McKool Smith)

Partner Avery Williams and the team at McKool Smith (recently named “Plaintiff IP Firm of the Year” by The National Law Journal) lay out the facts of, and latest critical developments in, the key copyright infringement cases listed below (via this link to the “AI Litigation Tracker”). So much you need to know (no matter what role you play). So little time. We do the work so you don’t have to!

(1) Dow Jones, et al. v. Perplexity AI

(2) The New York Times v. Microsoft & OpenAI

(3) Sarah Silverman v. OpenAI (class action)

(4) Sarah Silverman, et al. v. Meta (class action)

(5) UMG Recordings v. Suno

(6) UMG Recordings v. Uncharted Labs (d/b/a Udio)

(7) Getty Images v. Stability AI and Midjourney

(8) Universal Music Group, et al. v. Anthropic

(9) Sarah Andersen v. Stability AI

(10) Authors Guild et al. v. OpenAI

(11) The Center for Investigative Reporting v. OpenAI

NOTE: Go to the “AI Litigation Tracker” tab at the top of “the brAIn” website for the full discussions and analyses of these and other key generative AI/media litigations. And reach out to me, Peter Csathy (peter@creativemedia.biz), if you would like to be connected to McKool Smith to discuss these and other legal and litigation issues. I’ll make the introduction.

And check out my firm Creative Media and our AI-focused services


Send feedback to my newsletter - including guest essay submissions and other ideas - to me, Peter Csathy, at peter@creativemedia.biz.

E.R. Burgess

Nov 4

This is how Credtent.org has built its system. Thanks for the data to support this vision.


Jim Amos

So far, I don't see any evidence to convince me that these AI companies have any long-term plan to license any data at all: they'd much rather see changes to copyright law that allow AI training to be considered "fair use." Any deals they have struck so far have been temporary measures to ensure their data models don't turn into synthetic content cannibals too soon, but my gut says they are waiting to see if Trump wins the election because his admin will be easier to convince that they should be allowed to scrape content with impunity.


Peter Csathy's "the brAIn" - real media & AI intelligence
