Is 100% "Ethically Sourced" GenAI Possible?
The Money Certainly Is There To Do It (When Big Tech Budgets $1 Trillion for AI)
Welcome to your Monday morning “brAIn” dump, featuring this week’s “mAIn event” that answers the provocative question frequently raised in my circles — i.e., is 💯% “ethically sourced” generative AI even possible (spoiler alert: “YES,” but read on). Then, the “AI Litigation Tracker” — updates on key generative AI/media cases by McKool Smith (check out the full “Tracker” here).
But First …
A federal appellate court just echoed the U.S. Copyright Office’s official standing position that autonomously AI-generated art, with no human authorship, is not eligible for copyright protection. So our fundamental creative guardrail stands firm. For now.
I. The mAIn Event - Is 100% “Ethically Sourced” AI Possible?
A Call for a Fair Licensing Solution
I’ve repeatedly called for establishing fundamental rules of the generative AI game that play nicely — and pay fairly — with the world of media and entertainment. After all, generative AI tech isn’t even possible without the content sources used to fuel it.
One fundamental rule that’s obvious to me, both as a business exec and IP lawyer, is the need to seek consent from — and give compensation to — rights-holders. My call is for a content licensing scheme that works for both media and AI tech. After all, it’s not about stopping AI development. It’s just about doing it right. To that end, I recently proposed a 3-tier licensing system that can work in the real world (you can check it out here).
Enter the “Fair Use Accelerationists”
But many in the AI world — some earnestly (and some, not so much) — call for a certain kind of licensing-free scraping “absolution” via limitless application of “fair use” (which, to be clear, is simply a defense to actual copyright infringement). These “Fair Use Accelerationists” argue that there is no other way — i.e., that the size and scale of the AI opportunity is so vast and requires so much content “fuel” — that it simply isn’t possible to seek licensing from every rights-holder. They’re essentially saying, “It’s too hard, so why bother?”
But it goes even further for these Fair Use Accelerationists. They try to turn the tables on the creative community by loudly proclaiming that the real problem is them for: (a) holding back society from the promise of AI tech-fueled progress that can transform the world (that’s the positive spin), and (b) hindering U.S.-led AI development which, in turn, enables others — especially China — to eat our AI lunch and put U.S. national security at risk (that’s the darker, dire, more negative spin).
To be clear, I understand both of those Fair Use Accelerationist arguments — and they aren’t completely devoid of merit. I certainly don’t disregard them.
The Media World Isn’t Trying to Stop AI
NONETHELESS, none of this means that generative AI companies should be given carte blanche to take copyrighted works and IP without consent and compensation, especially as they increase their market caps by trillions. I believe that an “opt in” licensing system can be developed, implemented, and executed for the benefit of all — both media and tech (again, to that end, I proposed a 3-tier licensing system that could work).
Sure, there would need to be some form of significant compensation to rights-holders for past unlicensed scraping up to this point in time (essentially, a form of forgiveness). But from this point forward, a new scheme — which is inevitable in my view — should implement a “fair” allocation to rights-holders of the AI value generated by dependence on their works.
But Isn’t It Simply Too Hard?
Some who agree with me nonetheless ask the very fair question whether it’s even possible to implement a compelling and commercially viable generative AI tool that utilizes only licensed content — as in, “100% pure” (no filler!).
Adobe certainly thinks so. It was the first to make this “100% pure” claim when it launched its AI tool “Firefly” in 2023. Upon further inspection at the time, however — as Andy Beach, former Microsoft CTO for media and entertainment, points out — some of Adobe’s stock images upon which it trained its AI “turned out to be third-party submissions originally generated by other AI platforms, including Midjourney.” This inconvenient truth — which was heavily publicized at the time — meant that Adobe didn’t quite reach its 100% bogey. Nonetheless, the company’s intentions appear to be honorable.
I asked Beach, who until recently drove AI/media strategies at one of the biggest AI companies on the planet (Microsoft), whether he believes “100% purity” is possible. This is what he told me:
“The reality is nuanced. Skepticism around achieving a fully ethically sourced AI is justified. The sheer scale required — millions of carefully vetted, explicitly licensed images and extensive training — is logistically and economically daunting. Currently, there’s no truly powerful AI creative tool that meets this standard completely. This underscores just how complex and layered this issue is, highlighting the urgent need for clearer guidelines and industry-wide transparency.”
So even Beach — who hails from the biggest of Big Tech — doesn’t say it’s impossible. He just says the goal is “daunting.” And, if nothing else, the tech community has always liked a good challenge. That’s why highly pedigreed companies have risen up to meet that “purity” challenge. Take ProRata AI, for example. The company recently launched its own generative search tool called “Gist” that it says is literally 100% “ethically sourced” — no exceptions or asterisks. Josh Freeman, ProRata’s VP of Business Development, told me: “I can’t understand a rationale where billion-dollar companies don’t think they can or should pay for the intellectual property they consume to fuel their businesses.”
ProRata certainly isn’t a billion-dollar company (at least not yet), but it has found a way to license content from some of the biggest and most prestigious publishers and media companies in the world, including Financial Times, The Atlantic, The Guardian, and Universal Music Group.
Respected AI/media expert Renard Jenkins (President & CEO of I2A2 Technologies, Studios & Labs) certainly believes 100% purity can be done. This is what he tells me:
“I do believe that a 100% ethically sourced model is possible, and I believe very strongly that it could be competitive. There are literally millions of artists around the world with multiple millions of individual styles and pieces of art, that if incentivized and properly compensated, would be willing to provide the data that’s needed to create a competitive and fully functioning 100% ethically sourced model on which video/image/animation/VFX and many other types of generators could be built.”
We’ve Solved Daunting Licensing Challenges Before
To be clear, the task of accomplishing 100% ethically sourced “purity” won’t be easy (especially when other issues also come into play in the content scraping game — including individual biometrics, rights of publicity, etc., as I've previously written). But neither was it easy to create a system for paying musicians when bars and venues around the world played their songs. Yet, the industry rose to the challenge and created performing rights societies like ASCAP and BMI. That system isn’t perfect, but it is certainly better than the alternative of saying “it can’t be done.”
Nor was it easy to create a system to pay rights-holders when millions of users on social platforms like YouTube uploaded videos with recorded music or television clips. Yet, YouTube — after being sued for enabling massive copyright infringement — was somehow able to create its Content ID system that met the moment. Content ID certainly didn’t slow down YouTube’s development and massive growth. It just made sure YouTube did it “right” — and “right” meant that YouTube began to share at least some of the immense wealth it created with “a little help from its friends” in the creative community. Legal, political and media industry pressure works, after all. So does Big Tech cooperation, as Jenkins points out: “To bring ‘ethically sourced’ to fruition, it will require that technologists and developers include artists and creators as part of the design/development team at the foundational level.”
“Ethically Sourced” Is The Great Differentiator
Here’s the thing. Just because something is hard, doesn’t mean that it isn’t the right thing to do both legally and ethically — particularly when the economics are so one-sided. Remember, it’s not about stopping or even slowing down generative AI. It’s simply about doing generative AI “right.” And that’s in everyone’s interests, including the generative AI developers. The best way to unleash adoption and growth of GenAI is to take away the friction of litigation and legal risk.
In this current AI “arms race,” being “ethically sourced” — which enables users to benefit from the most compelling, high-quality, and trusted content sources for training, RAG, and display purposes — is perhaps the single greatest GenAI differentiator of them all. In a market now awash with generative creator and search tools that are strikingly similar, those AI developers that move fast to offer the best “ethically sourced” GenAI products are the most likely to win.
“Synthetic Data” Addendum
Of course, some major generative AI companies seek to sidetrack the whole licensed data issue by focusing instead on so-called “synthetic data” sources — i.e., computer-generated data designed to mimic real-world data. To that end, just last week Nvidia announced a 9-figure deal to acquire synthetic data firm Gretel.
Good luck with that! I see at least two fundamental issues with that approach: (1) synthetic data requires real-world data in the first place to get to the point where it can start building on itself; so that initial “seeding” of real-world content, if unlicensed, is infringing regardless; and (2) many leading AI researchers warn of the serious risks of so-called “model collapse” — significant quality degradation when LLMs are fine-tuned over and over again with data generated by other models. Wired puts it this way: “if you feed the machine nothing but its own machine generated output, it theoretically begins to eat itself, spewing out detritus as a result.”
Those sound like massive risks to me, especially when the best differentiator in an AI arms race is the quality of the generative AI tool.
Follow Peter Csathy on BlueSky via this link.
You can also continue to follow Peter’s longer daily posts on LinkedIn via this link.
II. AI Litigation Tracker: Updates on Key Generative AI/Media Cases (by McKool Smith)
Partner Avery Williams and the team at McKool Smith (named “Plaintiff IP Firm of the Year” by The National Law Journal) lay out the facts of — and latest critical developments in — the key generative AI/media litigation cases listed below. All those detailed updates can be accessed via this link to the “AI Litigation Tracker”.
The Featured Updates:
(1) The New York Times v. Microsoft & OpenAI
(2) Kadrey v. Meta
(3) In re OpenAI Litigation (class action)
(4) Dow Jones, et al. v. Perplexity AI
(5) UMG Recordings v. Suno
(6) UMG Recordings v. Uncharted Labs (d/b/a Udio)
(7) Getty Images v. Stability AI and Midjourney
(8) Universal Music Group, et al. v. Anthropic
(9) Sarah Anderson v. Stability AI
(10) Raw Story Media v. OpenAI
(11) The Center for Investigative Reporting v. OpenAI
(12) Authors Guild et al. v. OpenAI
NOTE: Go to the “AI Litigation Tracker” tab at the top of “the brAIn” website for the full discussions and analyses of these and other key generative AI/media litigations. And reach out to me, Peter Csathy (peter@creativemedia.biz), if you would like to be connected to McKool Smith to discuss these and other legal and litigation issues. I’ll make the introduction.
About My Firm Creative Media
We represent media companies for generative AI content licensing, with deep relationships and market access, insights and intelligence second to none. We also specialize in market-defining strategy, breakthrough business development and M&A, and cost-effective legal services for the worlds of media, entertainment, AI and tech. Reach out to Peter at peter@creativemedia.biz to explore working with us.
Send your feedback to me and my newsletter via peter@creativemedia.biz.