
It is fascinating. Stability especially is delusional. They sit with a straight face in the Senate and try to convince senators that products are people.


California is banning gas stoves pretty soon... :)


Haha, sorry Johan. Veiled reference to gaslighting.


Thanks for your kind words! Do spread it around.

And as for China -- the funny thing is, AI bros will hammer on about the need for deregulation to beat that boogeyman. Meanwhile, China regulated AI wisely, with mandatory watermarks, algorithm registration and informed prior consent for using someone’s likeness as an AI input. It’s not their elections and stock markets that will implode from reality collapse...


Shared on Facebook and LinkedIn


Johan, this is by far the best, most comprehensive, and lucid argument against IP theft by AI that I’ve ever read! Amazing!

You need to disseminate this article ASAP, especially before the submission deadline for comments to the US Copyright Office. Share it on Twitter, Reddit, LinkedIn, Facebook, Quora, etc., because this eloquent counter-narrative is sorely needed given so much mainstream propaganda.

You make such excellent points, and activists on the ground need them to share with others. Anything that can be patented or protected by copyright is at grave risk.

Therefore, people who don’t identify as artists or writers (such as video game developers, entrepreneurs, inventors, etc.) are also exposed to future expansion of AI training datasets.

At the end of the day, a type of mercantilist relationship is being developed here:

A) Content Creators = 3rd World countries selling raw materials for rock bottom prices.

B) AI Companies = 1st World countries buying these raw materials cheap and transforming them into “manufactured products” that sell at a much higher price in the global marketplace.

The difference in this analogy is that AI tech companies never paid for the materials that they used in their training datasets. Plus, they can exercise the option of excluding certain types of materials, thereby giving preferred status to these exempt content creators (likely housed in some corporate entity) in return for some favor or benefit.

All of this is highly undemocratic. Basically, Big Tech unanimously declared private property subject to “eminent domain,” something that only governments can do, and only with democratic input.

In communist regimes that eliminated the right to private property, you saw a new type of “aristocracy” arise - the Communist Party members on top. George Orwell was a devoted Socialist until he saw the reality of what happened in the USSR. That’s why he wrote his famous book “Animal Farm.”

Some may argue that China is a more successful example of communism, but that assertion is also false. Under Mao, China was headed for bankruptcy, especially after the bloody Cultural Revolution, which ran from 1966 until his death in 1976.

It was Deng Xiaoping who saved China, because he incorporated capitalism in limited measures after seeing Lee Kuan Yew’s successful transformation of Singapore from backwater jungle to 1st World status within the span of a generation.

So ultimately what’s going to happen if Big Tech is not taken to task for their blatant disregard for the rule of law? What else will they invent that we’re all supposed to roll with?


You claim that "The 'public good' of everyone getting free stuff is here somehow held to invalidate the owner’s objections to everyone getting their stuff for free." But these models are trained using only publicly available data. The LAION dataset, for instance, isn't even composed of the images themselves, just links to said images. It's stupid to release your work for free to the public, and then complain about, in your own words, "everyone getting [your] stuff for free."

Also, there are plenty of people putting very real effort into works created with AI. Look at something like "Anime Rock Paper Scissors" for instance: it does use AI as a shortcut, but it's far from low-effort slop.

And it's kind of hard to argue that AI training isn't transformative. There are a couple of places where you imply that the AI is somehow gathering up images from some dataset and combining them together, but that's simply not true. The checkpoint file is only 8 GB; it's not possible to fit that many images into so small a space. If the original images aren't a part of the AI, how can you argue that this isn't transformative? And art styles aren't copyrightable, so don't go there.

This article doesn't feel like a genuine attempt to try to connect with people invested in these AI technologies. It just feels like a bit of a hate train, especially with all the condescending remarks. But I'll give it a try anyway.


So here's how copyright and image licensing have worked for the past 125 years:

- you own what you make

- you have moral rights to informed opt-in consent to commercial use of your works

- if you want to make money off of someone else's work, you license it

The conditions being that you are a human, that you came up with something original enough, and that you actually made the thing, as in directly controlled the creative expression.

This is very basic, fundamental stuff that applies equally across all 193 countries of WIPO and the Berne Convention.

Here's how the business of images works:

Licensing a professional image for commercial online display use starts at $300 per year. This is how photographers and illustrators make a living. They spend years in training and thousands on equipment to produce works that add value to some product or service -- such as Facebook driving time on site that it can sell for ads, a publisher making a book on dogs, or a garment manufacturer producing Popeye T-shirts. Licenses are limited in time and scope, and most often work on a per-item and per-period revenue share model. This is called a royalty.
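As a rough sketch of how a per-item, per-period license adds up (the royalty rate and the scenario below are hypothetical, not from any real contract):

```python
# Hypothetical license terms, for illustration only; real contracts vary.
BASE_FEE_PER_YEAR = 300.0   # entry price for commercial online display, USD
ROYALTY_RATE = 0.05         # assumed 5% revenue share per item sold

def license_cost(years: int, items_sold: int, price_per_item: float) -> float:
    """Total owed: per-period base fee plus per-item revenue share."""
    return BASE_FEE_PER_YEAR * years + ROYALTY_RATE * items_sold * price_per_item

# A garment maker selling 10,000 T-shirts at $20 on a two-year license:
print(license_cost(years=2, items_sold=10_000, price_per_item=20.0))  # 10600.0
```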

Here's how that played out on the internet until just now:

Big Tech secured an exception for user uploads, called the DMCA. This means they don't have to license works up front as long as they respond to reported infringement quickly enough. This puts the onus on every artist to police all of their work across all of the internet, all of the time. Most artists are not even aware where and with whom their work ended up. And since "publicly available" until just now implied FOR HUMAN CONSUMPTION, they generally saw it as free marketing and not worth the hassle. For the residual infringements, platforms such as Facebook pay publishers, who then pay their illustrators and photographers.

Here is how the business of AI worked, until just recently:

Researchers acquire datasets for training by licensing them from, for instance, a stock site, which in turn licenses the material from illustrators and photographers and remunerates them. This is costly for AI companies.

So rather than deal with all that and pay for the raw material their service depends on, they covertly took everything they could get their hands on and came up with a BS story about sentient robots, deliberately confusing PUBLICLY AVAILABLE FOR HUMAN CONSUMPTION with PUBLIC DOMAIN LEGAL FOR COMMERCIAL USE.

This is all according to standard VC behavior as seen with Uber and AirBNB: Step 1: Break the law. Step 2: Blitzscale at a loss. Step 3: Get "too big to ban". Step 4: Lobby regulators for new law via users, such as yourself.

The reason I slug at it in this post is that every online artist has to answer those same BS talking points over and over and over...


I see where you got confused. Copyright protects your content from being distributed without your permission, or from being turned into a derivative work. Creators of datasets didn't actually distribute your content. They collect links to the artworks and distribute those links; you don't own a link to your work, a link is public domain (you can actually delete the content from the server, the link will be broken, and scrapers can't access it). Copyright only protects the actual work from being distributed by a third party.

Then an AI company takes this dataset of links and uses them to access your work, and copyright totally allows this: every user who sees your work online is downloading it to their PC, where the image is stored in cache to show it on the page. Any user can take this image and do whatever they want with it. I can take your precious masterpiece you worked on for 20 years, print it out on a roll of toilet paper and wipe my ass with your work, and it won't break copyright, unless I distribute it to other people. The moment the work is distributed without consent, copyright is broken.
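(To make the mechanics concrete, here is a minimal sketch of a "links only" dataset and the fetch step; the field names and the URL are illustrative stand-ins, not the actual LAION schema:)

```python
# Minimal sketch of a LAION-style "links only" dataset: the dataset ships
# URLs and captions; the image bytes are fetched at training time.
import urllib.request

dataset = [
    {"url": "https://example.com/artwork.jpg", "caption": "oil painting of a dog"},
    # ... imagine billions more (url, caption) rows
]

for row in dataset:
    try:
        image_bytes = urllib.request.urlopen(row["url"], timeout=10).read()
    except OSError:
        continue  # a deleted work leaves a dead link; this row yields nothing
    # a local copy of the image now exists, whatever happens to it next
```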

So the AI company takes these images and uses machine learning algorithms to analyze them and derive a model from them. This process doesn't copy any part of any image that was presented to the algorithm, nor can the model be used to perfectly reproduce any image in the dataset. Then the AI company can do whatever they want with the model. The model doesn't contain any part of any copyrighted images, so distributing it is perfectly legal. The notion that copyright is only for "HUMAN CONSUMPTION" is your wild imagination, because it doesn't say that anywhere. I can show any copyrighted image to my cat and no one can sue me for it.

Surely by now you're saying: "But wait, you yourself said copyright prohibits making derivative works, haha, got ya!" But alas, an AI model is no more a derivative of a picture than its file size or its average color value is. It would be a derivative work if they made some changes to the original while it could still be recognized, or if they made a reproduction, taking an idea from the picture and remaking it so that people can clearly see the influence. But deriving some very abstract relationships between a picture and a text label, captured in a numerical matrix, cannot in any reasonable capacity justify calling it a derivative work.

Maybe you could actually argue it if it were only your works specifically, so they targeted you personally and the model was specifically created to plagiarize you. But when a model goes through 100 million images, only some of which are copyrighted, that's just ridiculous. No human on Earth can claim that an AI model is a derivative of their work if it was trained on 100 million images and only 30 or so of them were theirs. The model weighs around 6 GB, and around 10 bytes of it come from any one picture (on average, statistically speaking; probably even less, since the model is initialized with random values that are iteratively adjusted). Even if you just took an image and compressed it to 10 bytes, it would be so changed by the compression algorithm that no judge in the world would see it as a derivative work, and here we're talking about abstract matrix math that didn't even copy the image but learned from it.
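(For what it's worth, the figures quoted above can be checked in one line -- a quick sketch using only the 6 GB and 100 million numbers as this comment states them:)

```python
# Bytes of model weights per training image, using the comment's own
# figures (6 GB model, 100 million images); both are quoted claims,
# not measurements.
model_bytes = 6 * 1024**3        # ~6 GB checkpoint
num_images = 100_000_000         # quoted dataset size

print(model_bytes / num_images)  # => ~64.4 bytes per image on these figures
```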

You're very eloquent in presenting your points, but they just don't make any sense if you know what you're talking about.


You rattle off a rather random mix of talking points from the first wave of GenAI PR canvassing. Get with the program. Everyone has done their technical due diligence, and nobody buys the even distribution fallacy.

Is your core argument that copying doesn't really happen, or that the copying that does happen is fair use? Decide what leg to stand on.

Regardless:

Dataset compilation is a separate activity from copying -- no argument there. However, LAION, who compiled the Stable Diffusion dataset, were both salaried and sponsored by Stability.ai, who in return got preferential access to monetize the resulting model. This is in clear breach of both the letter and the spirit of the EU text- and data-mining exemption, which explicitly forbids this exact setup (see the German Copyright Act, Section 60d). Research scraping is permitted; profit scraping requires opt-out. None was given. And as you well know, SD in turn powers CivitAI, which has built an entire business that encourages, incites and pays for targeted scraping and monetization of the personal likenesses and copyrighted works of living professionals as nonconsensual deepfake and plagiarism models.
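(What honoring an opt-out would even look like is easy to sketch. Below, robots.txt stands in for a machine-readable opt-out mechanism, and the bot name and URLs are made up:)

```python
# Sketch of honoring a machine-readable opt-out before scraping for profit;
# robots.txt is a stand-in here, and "TrainingBot" and the URLs are invented.
from urllib import robotparser

rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

if rp.can_fetch("TrainingBot", "https://example.com/artwork.jpg"):
    print("no opt-out recorded; fetching would be permitted")
else:
    print("opted out; a lawful commercial scraper must skip this work")
```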

Analysis of the LAION datasets makes it evident that they targeted sources of proprietary images under license, such as stock sites and art marketplaces that display viewing copies of works for sale. In essence, shop windows were raided for raw material instead of purchasing from the source. This material was weighted more heavily in training, as quality data gives quality results. It's a popular spin to pretend all training data is created equal and none of it comes with property claims, but the opposite is true.

As for the tired "too small to be a copy": lossy interleaved storage is still storage. A model is a model *of* the underlying works, and a store of their expressive content. Approximate retrieval is still retrieval. Prompting "Mona Lisa on a bike" generates a derivative work of the Mona Lisa, enabled by the model maker and monetized by the service provider. Not because some imaginary ghost in the machine was "inspired" by "looking" at the work when "training", but because the model stores expressive content extracted from copies of the work in latent form.

That particular work is in the public domain, but most of what Midjourney subscribers pay for isn't. Arguing technicalities there is like saying a zip file isn't a copy because it's small, or a jpeg isn't a copy because it's lossy, when it is apparent to everyone that the derivative can't exist without the original.
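(The analogy is easy to make literal with a few lines of standard-library Python -- zlib standing in for zip, and coarse quantization standing in for JPEG's lossy transform:)

```python
# "Small" and "lossy" don't stop something from being a copy.
import zlib

original = b"the expressive content of a work" * 100

# "A zip file isn't a copy because it's small":
small = zlib.compress(original)
assert zlib.decompress(small) == original  # still a perfect copy...
print(len(small), "<", len(original))      # ...at a fraction of the size

# "A jpeg isn't a copy because it's lossy": drop the 2 low bits of every
# byte as a crude stand-in for JPEG's lossy quantization.
lossy = bytes(b & 0b11111100 for b in original)
assert lossy != original  # not bit-identical, yet the derivative
                          # plainly couldn't exist without the original
```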

The use case demonstrated in IEEE Spectrum by Marcus & Southen back in January was just ruled illegal in China, and more of these trademark violations are set to be prosecuted: https://www.yahoo.com/news/china-court-says-ai-broke-223654653.html

Back up, start over, try again.


Fantastic writeup. Thanks.


I watched the entire Senate proceeding with Karla Ortiz. I wouldn't normally say this about Senate proceedings, but this was riveting. None of the statements surprised me, but it was eye-opening and sobering nonetheless. Here's a link to a YouTube video (with some commentary): https://www.youtube.com/watch?v=XSpyx7Yp-Dg


Interesting & thoughtful article.


Thanks! It’s kind of a slugger post against the most common hype BS. You might appreciate my latest one from the other day!


Generative AI is intellectual & creative communism and I'm here for it. 🌈 Property is theft, including intellectual property.


I’ve been meaning to write a bit on that angle.

Me personally, I'm all for commoning. We should do it more, and I think Creative Commons is awesome.

That’s not what’s happening here though. It’s misappropriation and destruction of legitimate livelihoods, in order to centralize power and harvest free labor. All of the tech CEOs are pretty transparent about that. But if you want to give away your property, labor and capital to them, it should be your choice.


It's so much more than that. Generative AI is absorbing all of human creative achievement and making it available to everyone at very low cost. That's frickin' amazing. And if we had UBI, we wouldn't be arguing over the zero-to-few pennies that most of us currently get for our creativity. Very few people are able to make a living being creative, and there is a ton of gatekeeping to it. No creative industries are actually healthy to work in--they are all massively exploitative already. We can't get to a future without jobs unless we destroy all the jobs. It will be a messy process, but we have the opportunity here to completely revolutionize the idea that people are only as valuable as their labor, which was always wrong, and we all suffer for that being how our economy is structured. You should not have to prove your worth to capitalism to get to be creative and live a healthy, happy life.


That low cost is subsidized by

1) misappropriation of intellectual property in the trillions, representing massive investments in education, skills and equipment.

2) venture capital that floods the market with freebies to kill competition and build habits, then later cranks up prices, extracts rent and recoups the investment -- see AirBNB, Uber, etc.

3) hardware sourced through six-continent supply chains powered by fossil fuels and child labor. Your laptop and phone each caused ~100 kg of carbon emissions. The data center CPUs and GPUs aren’t made from unicorn farts either.

4) massive energy and water consumption for the same.

5) forced re-training of millions of professionals (unless duly compensated for the theft)

It’s great for *you* to receive “free” stuff, but the true cost is taken from others, borrowed from the future, and externalized to both nature and society. What you call “gatekeeping” is what lets the lucky, talented, hard-working few who spent their blood, sweat and tears who did make it to earn a living at all.

Nobody can disagree with happy, healthy, creative lives, but your right to a hobby does not trump their right to a livelihood.

Those are some real issues to work out, and what we have -- time-limited exclusivity to protect investment for the long-term greater good -- can’t be thrown out overnight.


I didn't convey anything like "my right to a hobby trumps their right to a livelihood" or "I want free stuff at others expense". Please don't twist my arguments to make your points.

The fact that gatekeeping is required for people to make a living is exactly the problem I'm pointing to: the system doesn't serve humans, it serves the machine that extracts value upwards and chews through people's humanity to do it. Nobody should have to sacrifice "blood, sweat, and tears" to earn a living. A living should be guaranteed. We would all then have "hobbies" (a word that just trivializes the creativity of people who don't have the privilege or the masochistic endurance to devote themselves to something without being able to earn a living at it), and some people would become masters and get recognition for that, which is great, but everyone else would still be OK. Status should not be the metric we use to decide whether someone deserves food, housing, etc. We can still have status, but it should be divorced from basic human needs.

The true cost of our system is taken from *everyone*. The fact that some people can eke out a living doesn't mean the system is *good* or should be defended or maintained. Yes, of course people should be protected through the transition (and probably won't be, unfortunately). But what I'm saying is, let's just get clear on what kind of future we are fighting for here.

The argument that AI is being created by corporations and is therefore inherently damaging doesn't make sense--there's no other way it could or would come into existence, because that's our current economic model--but the technology itself is still revolutionary. Environmental costs also aren't compelling to me, because literally everything we do causes those effects. Saying "it's more of the same" isn't true. It's fundamentally different, something humanity has never created before, and I think it changes everything, if we take the opportunity.


An equitable future with sufficiency for all while getting back within planetary means is a fine and necessary thing to aim for.

I was trying to point to the present real-world effects of the technology.

The global north generally has sufficiency, but not equality nor environmental sustainability. Others lack even basic healthcare and sanitation. There are precious few real-world examples of societies that manage both social and environmental sustainability.

As for today, right now -- we need to reverse the resource and energy consumption trend in the global north, fast. AI writ large may be helpful to that end, but the explosive use of text and image generators just serves to expand our extractive technology substrate.

Tech as we know it is inherently linear, power hungry and fragile, regardless under what political system it was built. AI exponentially more so than previous equivalent tech.

The inherent growth obligation in capital further expands it.

And civilization as a whole, including our jobs, leisure, money, technology, and its transition, is powered by a rapidly depleting carbon pulse that adds the equivalent of 500 billion human slaves’ worth of energy.

Unless the transition starts from a holistic, life-centric view, including all those externalized factors, we’re pretty much fucked.

We also need jobs for the foreseeable future. AI is evidently not helping there. It replaces labor with computation and automation that are several orders of magnitude more energy hungry. A robot requires 100 times as much energy as a person. One ChatGPT session “drinks” a bottle of freshwater.

“Democratizing creativity” in the sense that Adobe et al shoot for, means replacing millions of self-sufficient workers with a billion users renting access to data centers. It builds more dependency, fragility and centralization.
