Oceans Infinity

You and I have been witness to the greatest art heist in the planet’s history. Yup, today I want to talk about the underpinning of the current crop of Large Language Model (LLM) style AIs: the training data they scraped from across the internet, published works, and anything else they could get their digital paws on.

Compared to this, the recent Louvre break-in was amateur hour, Vincenzo Peruggia was a lucky chancer, and even the stripping of artwork from across Europe to line the Nazi galleries barely warrants a footnote. Hyperbole? Kind of, but not entirely.

On the one hand, the artwork hasn’t been removed. It’s still where it was to start with, pretending that nothing has happened. And the creators may even now be none the wiser. However, behind the scenes, a hugely consequential theft has taken place. 

Theft? Before we go any further, let me say that this is how I understand it. I’m not a lawyer, but I’ve dealt with copyright cases a number of times in my years in the industry. Fundamentally, copyright law seems simple. A human creator makes a new work, and by the process of that creation they gain the copyright in it. They don’t have to claim it; the act of creation itself grants them rights (as long as they’re human). Other entities may not use that copyrighted work for their own financial gain without permission. There are a few carve-outs for what is broadly called Fair Use (FU, ironically), though this applies to reviews and education rather than profit-centred endeavours. So, let’s say I want to design and publish a Star Wars game. I’d need permission from the copyright holder to do so. If I wanted to teach a course on media studies, I could reference Star Wars as Fair Use without express permission.

On the face of it, the AI case seems pretty simple. The LLM industry collectively scraped all the art and writing and music and code and anything else they could find and used it to train software with the express purpose of competing with and replacing those creators. I don’t think that’s up for debate. This isn’t Fair Use. It’s also done without consent or license, and is therefore theft. If I sold that imaginary Star Wars game I’d end up in court. I’ve been involved in cases where exactly this sort of thing happened. 

There are three things that muddy the waters.

Firstly, tech-enabled scale. We’ve never seen so much stolen from so many individuals, and stolen so quickly. It’s so brazen, and so blatantly immoral and uncaring of the harm it does that authorities have no clue how to react. The law has simply failed to keep pace with the crime. 

Secondly, Fair Use and Legitimate Interest (LI). We’ve already mentioned the former, and it’s used to defend all manner of stuff that it doesn’t legally cover. The latter is far more problematic because it has some legit uses.

Legitimate Interest basically means that there’s a good reason why someone might need to use your data without asking permission. For example, a bank must comply with anti-money-laundering and other laws, so it needs to look at your data. The GDPR has a number of clauses under which someone could claim LI. Unfortunately, the last one is a catch-all that can be used to exploit the process. Sure, you could send in a request asking them to explain themselves, then debate whether that was reasonable, and so on through the complaints procedure and the courts. Realistically, this is almost never going to happen, or be worth the effort, because it’s massively resource-consuming and by that point they’ve already used your data anyway. If we’re talking about AI, then it’s buried in the training pool and isn’t removable. So, LI is another pseudo-legal smokescreen that the AI companies can use to steal your data.

Remember that I said this series was about me exploring and learning? Well, this third point is where we come to the core of this aspect of AI, and it leads to my first major takeaway: a pragmatic one, rather than an immediately emotionally satisfying one.

So where was I? Oh, yes. The AI companies have plundered what they needed to start their work and continue to harvest whatever they can get away with. A few have built models on training data that they’ve asked permission to use, though this is a minority approach and mainly seems to be a marketing exercise. Despite this, overwhelmingly, LLMs are built on data that’s being used without permission of the legal owner. It is not FU or LI under those definitions. Calling this anything other than theft is, in my opinion, either deluded or intentionally misleading. 

The third, and most important, point is that nothing material is going to be done about this. At least, not in any major way. The legal system lags too far behind, the power and resource imbalance between AI companies and creators is laughably one-sided, and the politicians are clearly leaning towards their oligarch peers, except on the odd occasion where they need to pretend to care for votes.

None of this should surprise anyone. It’s not new, it’s just business as usual. I don’t know if it’s because I’m older or whether anything really has changed, but it feels to me that while this abuse of power has always been the way, it’s less masked than it was in my youth. The politicians and billionaires at least used to pretend to care a little. Now the corruption and nepotism are out in the open in a way that would make the Borgias blush.

So, yes. It’s not good. Creators large and small have had their work taken and used without permission. Stolen, in other words. And outside a few cases that perform the social function of show trials to salve the public conscience (which I predict will fail to impress), the legal system will side with the money and power, as is tradition.

What I think this means is that there’s no real point in complaining. It’s a new paradigm we live in. Assume that anything you show in public will be stolen without repercussion. We’re back in the pre-copyright days now. Of course, the law will still be used to prosecute you if you use stuff the big boys own without permission, but the far more consequential thefts by the giant AI companies will continue unchecked.

All of this may seem a bit dystopian and gloomy, and it’s certainly not sunshine and rainbows. However, it’s not really new. People with more money than you will ever see have always had this power. The only difference is that it’s being wielded more brazenly and being used to abuse creatives. That doesn’t make it right, and being angry is a sane response. It won’t, however, make any difference. 

So, what to do? It seems to me that you’ve got three choices.

  1. Take up an AK and lead the revolution against the capitalist running dogs of Big AI. Vanishingly small likelihood of success. Cannot recommend.
  2. Rail against the unfairness of it all, post online, complain to your friends, etc. This reminds me of Douglas Adams’s Hitchhiker’s Guide to the Galaxy. When the Vogons arrive, Ford tells the barman that the world is about to be destroyed. “Aren’t we supposed to lie down or put a paper bag over our heads or something?” he asks. “Will that help?” That’s what getting angry feels like to me. The answer is no, by the way. It won’t help. I think that you’re far better off putting your energies into the last option.
  3. There are sort of two versions of this, lurking under one umbrella. Overall, option 3 is simple: deal with it. The two possible flavours are coping by ignoring AI and coping by embracing it. You could blend your own middle ground, though purists will probably suggest that any use of AI is going to taint you. Whatever happens, in this third option you find your way to navigate the unfairness and lopsided immorality of it all. It’s never been fair or moral anyway, and this may be a useful wake-up call if you’d failed to notice previously. Whatever LLMs or subsequent AI forms do to the creative space, there will always be humans making things, and for the foreseeable future these will have a different place in the world to whatever software churns out.

This last point is my takeaway from pondering this aspect of the current AI/LLM wave. The genie won’t be going back in the bottle, and the folk that let it out will not be held accountable for the damage they’ve done. 

One of the many lessons that decades of gaming has taught me is that you need to pick your fights. Some battles can’t be won. In this analogy, they have a million tanks and you have a pointed stick. It’s not a winnable position. Now, you don’t have to like it (you’d be strange if you did), but you stand far more chance of winning the game as a whole if you let this battle go. Reinforcing a loss is just a waste of your resources.

Realising this makes me think that the only useful way forward is to accept the shit sandwich they forced upon us, and move forward. Take your anger and channel that energy into getting better, learning more, finding your own way, because, at the end of the day, you are what you have most control over. Maybe the only thing. 

Let the AI companies do what they’re doing, just as you let the other human creators get on with their thing. It’s competition, it’s inspiration, and it’s background noise. Focus on yourself, your work, your skills and your progression. There was always someone better than you, and others who were less skilled. That hasn’t changed. Just now, some of it’s software. Learn, improve, and be the most exciting and interesting creator you can be. In a way, you should pity the poor AI. After all, for all its crass theft, and there was a lot of it, it can only copy; it cannot truly create.

Maybe next week we should look at why this is a good thing.


24 Responses to Oceans Infinity

  1. I get what you are saying in your conclusion, and agree. But I still feel like we should / need to do something. Maybe biding my time isn’t one of my strong suits, but I do think in some way, shape, or form, we shouldn’t be the silent majority.

    • Quirkworthy says:

      @Golden – I thought that for a long time. It’s only when I step really far back and look at the biggest picture that I come to the conclusion that my energies are better spent on my own projects. YMMV, of course. I certainly understand the urge to do something.

  2. kelvingreen says:

    Some small victories have already occurred: various computer game companies like Larian and Sandfall getting their GenAI usage punished with bad PR and revocation of awards; the recent Coca-Cola and McDonald’s festive adverts being ridiculed to the point that the companies had to issue hand-wringing responses.

    So yes, you’re absolutely right that it’s here and it won’t go away until the bubble bursts and the techbros move on to the next scam, but it very much is worth fighting back in the short term, because it does make a difference.

    (And I think targeting the users of GenAI is probably where the battle needs to happen, because it’s like a pyramid scheme, and it’s dependent on the CEOs being able to show their investors how many end users they have; if everyone’s too embarrassed to use it then the whole thing will rot from the bottom up.)

    • Quirkworthy says:

      I agree that there have been some small victories, it’s just that they’re small. And, I’d argue, not going to stop the steamroller.

      I also agree that it’s something of a bubble. I’m not so convinced that it’ll stop the push for AI even if it does burst, and I think the recent movements towards Small Language Models and other sub-behemoth options sound like the start of a saner approach that might well save them from the pop (if they transition in time). We’ll see. I’m not convinced that the pushbacks are slowing the steamroller.

      Currently, none of the big AI companies are making any useful money at all, hence the bubbly nature of the thing. There’s nothing to show in terms of profits, and free users are worthless to convince shareholders. I think that the current investors’ quasi-religious belief in the long term potential is what’s keeping this going. That’s hard to dent as you’re arguing with faith, and all you have is reason and logic.

      • kelvingreen says:

        I agree with all that, and your wider point that it’s perhaps healthier for us, at this end, to ignore it as best we can and just work on our own stuff.

        • Quirkworthy says:

          I am leaning very much towards that being the best route forward.

          I can imagine using AI art for a specific project which is about the robots taking over, mainly because that meta amuses me. Distant future stuff though.

      • riccy diccy says:

        Even after the bubble bursts, generative AI will not go away. That’s what previous tech bubbles have taught us, going all the way back to railways. It will just get cheaper and even more common & integrated with various other tech tools.

        • Quirkworthy says:

          Absolutely, 100%. The genie won’t be going back in the bottle, popping bubbles or not. And even if it doesn’t burst, AI will change. It’s already evolving.

      • André says:

        At least the LLMs carry the seed of their own destruction within them already. They depend massively on information being fed to them. With human content and correct content being pushed to the side by their output, they will get dumber and dumber by the hour. And we still are not told the real cost of a minute of AI… and it is really expensive. At least the LLM bubble will burst; the question is when, and how much damage it will have caused until then.

        • Quirkworthy says:

          If/when the bubble bursts, there will presumably be a temporary drop in public availability of some LLMs as various companies die or are bought out and things are reshuffled. I don’t see them going away entirely though, and after a brief pause to reorganise, I expect them to be back. They’ll presumably be a bit different, but they’re evolving anyway, as early versions of a tech do.

          In other words, I don’t expect any bursting bubbles to be more than a temporary glitch in AI development.

  3. I’m interested to ask why using Star Wars material to teach students about media use is considered fair use, but using Star Wars material to teach LLMs about media use is considered theft.

    If the students write science fiction adventure books and the LLM does the same, and both parties try to sell their books for profit, has only the LLM profited illegally from the same training process?

    I don’t see LLMs trying to sell books or art or Star Wars games or other copyrighted products themselves; they’re being trained like students to produce works at the request of humans, works inspired and influenced by the material they were trained on. Like a newly trained student would if employed by humans.

    So does the fault lie with the way the LLMs are trained, or with the humans employing them to produce art or literature? And if so, how does that compare with employing humans to do the same?

    Perhaps in time we’ll see more people tire of getting derivative work from LLMs, particularly if they’re looking for good quality product, in the same way that forged art or a copycat novel plot produced by a human is derided. Maybe the response to the festive adverts is already that reaction happening.

    Or maybe with sufficient training and experience, the LLMs will progress to the point where it’s difficult to tell where copying ends and creativity begins. Like a human would.

    • Quirkworthy says:

      It’s confusing.

      Fair Use allows folk to use copyrighted material for certain specific uses, including teaching and critique. So using it as an example in class is fine.

      Modern LLM AIs were not around when the law was written, so their claim that taking all existing works as training data without permission is fair use has gone to the courts in an ever-increasing number of cases.

      Many of these cases are still ongoing, and there are several different approaches used by both sides to plead their cases. Confusingly, both sides have won different cases. In general though, what’s happening is that big players like Disney and Warner Brothers are doing the suing, and several of these cases have been settled rather than adjudicated. It’s big money being thrown about by big players, and less well-known individual creators are getting nothing.

      Overall, it’s a mess, though the trend seems to be that most cases are either lost or settled by AI companies.

      You’re right that there’s a vocal group mocking stuff like the Coke ad, etc. I wonder how many of the same people happily use ChatGPT to do their homework.

      Where it all ends will be intriguing to watch. One can only assume that they’ll continue to improve.

    • André says:

      LLMs lack many things: they have no understanding of meta, context, time, etc. This already taints a lot of their output. There is also no real understanding, only a percentage of how well the result fits what has been requested.

      • Quirkworthy says:

        That’s true of a single LLM now. However, the next wave seems to be acknowledging some of these issues and combining several different AI processes (not all of which are LLMs) to give a better output overall. I can see that as being a small step towards a more functional AI, though as you rightly point out, they’ve got a long way to go. Much further than the CEOs of these companies say to their shareholders.

        • André says:

          LLMs are a dead end, but a very expensive one. They scale badly and are prone to hallucinations. They still rely heavily on clickworkers to iron out the worst issues, steal wherever they can, and gobble up components and energy at a rate that will really hurt the consumer electronics market, and all that for little overall gain outside the AI bros. The only thing they are really good at is charming those who ask them. And even this gets annoying after a while. Real AI research has taken a big financial hit from the money being diverted to LLMs, and it will take a lot of time to make up for the research lost. Ask Gary Marcus, one of the best in that area, how damaging LLMs are for the field. It is, in the end, the bitcoin bubble, only many times worse.

        • Quirkworthy says:

          It feels to me as though the current LLM model is the expensive brute force approach that needs to be refined into a more elegant, streamlined, and targeted approach. Though I’m not sure you can get there from here.

          Maybe all this is doing is conditioning folk to the insane levels of resource cost, so the real version seems reasonable.

  4. riccy diccy says:

    If a human artist looks at a painting, carefully studies it, even measuring it in various ways (including, say, color gradients & temperatures & stuff), and then goes on to create something in the same style, without it being a copy, would this be stealing? If yes, then most illustrators are thieves. If no, why can’t an AI do the same?

    Anyway, AI art is here and it will stay; it’s too easy & profitable and the big players are running with it. As they do for producing all sorts of texts, or accounting tasks. It’s bad for employment, I myself am losing my boring office job to capitalist automation, but there it is, it’s happening.

    • Quirkworthy says:

      Very sorry to hear about your job. Hope you can find something else quickly.

      Your point comparing human artists and AI “artists” has been rolled around a lot, and I’m not sure I can give a concise answer now. I do have it on my list of things to address as I think it deserves an answer.

    • André says:

      At least the good artists are capable of introducing a new spin to what they saw. The LLMs are not capable of that… they will always offer you only a pastiche of stuff they saw.

  5. Odysseas says:

    I feel there’s a fourth choice: actively support the anti-AI side. Support artists, commission stuff from them; if you’re a game designer, don’t use AI for art (or, even worse, for designing mechanics: I’ve heard of that…). Explain to people who abuse AI why that’s a problem (friends and relatives; no need to go wild on the internet, unless that’s your shtick too). And don’t use AI for anything: not to brainstorm, not to make images for personal use, not to “help organize your notes”.

    • Quirkworthy says:

      This feels like one possible blended variant of option 3. Possibly a common one. You’re dealing with AI (the essence of option 3) by a combination of not using it (ignoring it) and encouraging others to do the same. A perfectly valid approach.

      I don’t think that there’s any right or wrong way to deal with AI. You need to pick what you feel is right. I’m just trying to understand the space and the options.

      • Odysseas says:

        Definitely an overlap with 3, but I didn’t see 3 promoting action to support those that AI most affects, like artists and content creators, only avoiding or embracing it (or somehow mixing it…?). I guess it overlaps with 2 a bit, in the “make your anti-AI sentiment known” sense. Interesting semantics question though: when does something stop being a variation and become something in its own right? Weird tangent, sorry.
