Swooper 3000: Indemnity
Wake up, honey! A new property regime and value contract for the internet just dropped.
What a week! And what a clownshow. Almost too much happened in AI to keep up with.
OpenAI Dev Day wipes out fans, enrages property owners (again!)
VCs and Big Tech submit outrageous justifications to the USCO
The FTC joins the USCO fray in a rare move.
France and Germany demand degen dereg for foundation models in the EU.
New altMan & Swooper drops ;-)
Dev D.ai
OpenAI single-handedly wiped out dozens of their most ardent fans/developers by launching features in their ChatGPT update that directly replace ChatPDF and the like — cracking the whip and dangling the carrot for those who want in on the action to stand in line at their app store and pay the Apple Tax equivalent of 30%.
Also announced: a 128K-token context window (roughly 300 pages of text), meaning push-button plagiarism of entire works is no longer limited to art — books are next. Empowering AI entrepreneurs such as the ones behind the Game of Thrones prequel, the mushroom-foraging books and the fake Jane Friedman titles with increased productivity — doing away with prompting, bot scripting and human cut-and-paste (ech!) — seems like another worthy stepping stone on the noble path to curing climate cancer…? Now if Kindle would only lift that unfair limit of a totally human three books a day.
But to be fair, our precious innovators also launched measures to ensure no undue market harm will come from this, by announcing copyright protection in the form of indemnity.
No, not for rightsholders – for Enterprise users.
Ownership for me but not for thee
Maddening as this might be on its own to the masses of rightsholders already suing, seen in light of the VC and Big Tech comments to the USCO Request for Comments on AI and IP, a big picture emerges that is as clear as it is infuriating. One that left even the tech hype media baffled at the audacity. We’ll get to why in a bit.
The USCO RFC ended up yielding 10K comments. I expected a lot of novel evidence of market harm from actors such as the European Writers’ Council and the News/Media Alliance of 2,200 American publishers. That was the theme I worked on myself, and that is indeed what we got. (I summarized some of the findings in previous posts. My full public comment is here, covering all of the sordid 39 questions.)
I also expected a thousand prompters to claim moral ownership and commercial exploitation rights to expressive content extracted from text and image models. That probably happened. I’ve heard all the arguments, and debunked some of the dumbest ones here.
Brief aside: property rights are ultimately there to incentivize original human work — “original,” “human” and “work” all being key. In light of 20% of all music ever made having been generated this year, and more images generated than photos taken in all of history, surely quantity needs no incentives. And copyrights are not about copies. They are exclusive commercial exploitation rights, including the right to prepare derivatives (see the Berne Convention).
There are plenty of tricky edge cases in advanced prompting and in hybrid processes with proprietary training data. But the USCO has kept its line of principle hard and clear, again and again and again: without a human author, without original human thought and direct control over expression, no dice.
It is this set of principles that Big Tech challenges in practice, from five sides at once — all of which came out starkly this week:
One, by indemnity
A promise to cover customers’ legal costs in case of infringement claims. By adding this to their gen-AI offerings, Microsoft, Google, Adobe and OpenAI all bet their money against the principle of original humans’ rights to their work. They establish a new property regime in practice: one where AI users are legally recognized as authors of work they did not do. Even when — especially when — evidence of large chunks of verbatim reproduction surfaces (recent studies point to 25-75% of news articles, and 60-80-word streaks from famous works).
Two, by any excuse to wiggle out of licensing
Repeating their intent not to pay any license for the trillions of dollars’ worth of intellectual property ingested for training, sourced from pirate collections, backdoor access to platforms and good old scraping of publicly available content.
It was truly hilarious to see the hodgepodge of excuses thrown at the wall:
What about our billions and billions (Andreessen)
We won’t pay licensors enough for it to be worthwhile anyway (Meta)
Products are people too (Google, Stability)
It would hurt the little guy (Microsoft!!!)
Copyrighted works and personal data should be treated the same as code (Adobe)
Copyright is just about copies, and we only did that in passing (Anthropic)
What if Japan (Stability, while setting up shop in Japan)
In addition, as reported by Gary Marcus, both Germany and France came out hard against any regulation at all of foundation models in this week’s EU AI Act trilogues, at the direct behest of their local AI industries.
PSA: Publicly available for human consumption does not equal a public-domain waiver of exclusive commercial exploitation rights, nor a waiver of publicity rights over commercial exploitation of personal likeness.
Three, by foisting new retrofitted, internet-wide and infinity+1 ToUs
What we’ve seen this year is Microsoft claiming training rights to anything you do on Windows. Zoom claiming training rights to your video calls. DeviantArt and Adobe to any images you put on display for sale. Meta to everything you share with friends. Pinterest to anything you collect. Google to the entire internet. And so on. They all bet on getting away with this, and try to convince you that you signed up for it simply by continuing to use their services.
PSA: Terms of Use don’t trump your rights to property and privacy, don’t extend beyond the contracting parties, and can’t be extended indefinitely into the future, silently changed, or retrofitted over past agreements — especially when introducing novel use that directly competes with and undermines the value of your work (Fair Use factors 1 and 4).
Four, by claiming all user inputs and outputs
The free labor and content you have supplied to OpenAI, Midjourney, Stability etc. over this past year in exchange for “free trials” will be used as training data. As will everything they generated for you. So regardless of whether they eventually license training data, they leveraged their pirated content to make you market their services, and to upload and generate property that refines their existing models or bootstraps their new ones, should the FTC move ahead with algorithm disgorgement.
Five, by shielding all of their outputs against AI training
God forbid anyone else use their outputs to train on — despite it all legally being part of the public domain, as it lacks human authorship. While Adobe’s Content Authenticity Initiative and other watermarking, data provenance and metadata initiatives nominally serve to stem reality collapse through fake flooding, they are just as much about avoiding inbreeding and eventual model collapse from algos eating their own excrement. And anyone who reads the fine print will see that it aims to help prevent others from doing unto them what they do unto others.
More heartening in the USCO commentary is the rare participation of the Federal Trade Commission. The FTC oversees both consumer protection and fair competition, and posted a comment grounded in both: its ongoing probe of OpenAI over deceptive business practices, and its roundtable on market harms to creative professionals, which is well worth a listen. It put in no uncertain terms that generative AI as it exists today is being actively used for piracy as well as consumer fraud, at staggering scale.
And UNESCO, too, has spoken up for the rights of creatives worldwide.
A new value contract for the internet
The thing is, the internet — while anonymous and wide open — is not a free-for-all. And data licensing for AI training has long been established practice, like any other commercial exploitation of intellectual property — for a simple reason:
There is zero difference, in terms of the pure marketplace relation, between a garment maker wanting a nice-looking print to sell T-shirts, a radio broadcaster wanting music to attract ears to sell for jingles, a social platform wanting user content to attract eyeballs to sell for ads, a stock site wanting quality photos to sell to publishers, and an AI company wanting quality creative content to remix into new outputs. Any property that adds value to a product or service merits informed opt-in consent and compensation.
The value contract for publishers on the internet used to be this:
There is near-infinite supply and near-infinite demand; Search and Social will help you widen your audience through discovery. Keep posting quality, keep hauling them in. Let us know if someone takes your stuff and we’ll take it down, but consistent posting is how you build an audience.
This has been turned on its head:
There is bona fide infinite supply, and anything you ever published has been turned against you to replace you. By the same actors. So long, thanks for your free lunch, your content and your audience.
In this wild-west state of affairs, publishers and licensors are offered a devil’s bargain: stay on platform, stay searchable, and give your property away as training data. Or minimize discoverability and retreat behind a paywall so as not to be replaced. The only way to opt out is to opt out completely, and sue for discovery. As the New York Times now does, by retracting all of its past content from Common Crawl — the sludge funnel at the bottom of the internet that catches any content you didn’t bother to police.
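For anyone tempted to follow suit, the crawler-level part of that opt-out is at least simple. A minimal robots.txt sketch, assuming you want to turn away Common Crawl’s CCBot and OpenAI’s GPTBot (both documented user-agent tokens) while leaving search engines alone. Note that it only discourages future crawls and does nothing about content already ingested:

User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Of course, honoring robots.txt is entirely voluntary on the crawler’s side, which is rather the point of this whole section.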
But to end on a conciliatory note:
Gen-AI is unique in that it extracts many very different kinds of value from the substrate, all competing and exploitative to varying degrees – from fully to not at all. There are “snowplow” use cases that don’t interfere with any rightsholder, and predatory, directly exploitative ones – many of which we’ve already seen play out. Original creators deserve compensation accordingly — by degree, depending on use case.
We will no doubt see a rapid maturation of AI training-data markets, driven by Big Content. But also fierce downward pressure on prices from the ongoing piracy and content flooding perpetrated by Big Tech.
I plan to do some system mapping and value-stream mapping to untangle some of this — stay tuned.
As for other AI news this week, I’ll pass on the Musk/SamA bot battle and the ChatGPT DDoS, and save commentary on the end of the actors’ strike, the No Fakes Act and the Humane Pin for some other time…
And if you want to see this old man rant at clouds in real time, follow me on LinkedIn.
And as usual, like and subscribe, lube the tuba, rattle the snakes, blare from the rooftops, put the smaller thing on top of the slightly larger thing and whatever else it is you youngsters do with your newfangled thingamajigs. The more the merrier.