You and I have been witness to the greatest art heist in the planet’s history. Yup, today I want to talk about the underpinning of the current crop of Large Language Model (LLM) style AIs: the training data they scraped from across the internet, published works, and anything else they could get their digital paws on.
Compared to this, the recent Louvre break-in was amateur hour, Vincenzo Peruggia was a lucky chancer, and even the stripping of artwork from across Europe to line the Nazi galleries barely warrants a footnote. Hyperbole? Kind of, but not entirely.
On the one hand, the artwork hasn’t been removed. It’s still where it was to start with, pretending that nothing has happened. And the creators may even now be none the wiser. However, behind the scenes, a hugely consequential theft has taken place.
Theft? Before we go any further, let me say that this is how I understand it. I’m not a lawyer, but I’ve dealt with copyright cases a number of times in my years in the industry. Fundamentally, copyright law seems simple. A human creator makes a new work, and by the process of that creation they gain the copyright in it. They don’t have to claim it; the act of creation itself grants them rights (as long as they’re human). Other entities may not use that copyrighted work for their own financial gain without permission. There are a few carve-outs for what is broadly called Fair Use (FU, ironically), though this applies to reviews and education rather than profit-centred endeavours. So, let’s say I want to design and publish a Star Wars game. I’d need permission from the copyright holder to do so. If I wanted to teach a course on media studies, I could reference Star Wars as Fair Use without express permission.
On the face of it, the AI case seems pretty simple. The LLM industry collectively scraped all the art and writing and music and code and anything else they could find and used it to train software with the express purpose of competing with and replacing those creators. I don’t think that’s up for debate. This isn’t Fair Use. It’s also done without consent or license, and is therefore theft. If I sold that imaginary Star Wars game I’d end up in court. I’ve been involved in cases where exactly this sort of thing happened.
There are three things that muddy the waters.
Firstly, tech-enabled scale. We’ve never seen so much stolen from so many individuals, and stolen so quickly. It’s so brazen, and so blatantly immoral and uncaring of the harm it does that authorities have no clue how to react. The law has simply failed to keep pace with the crime.
Secondly, Fair Use and Legitimate Interest (LI). We’ve already mentioned the former, and that’s used to defend all manner of stuff that it doesn’t legally cover. The second is far more problematic because it has some legit uses.
Legitimate Interest basically means that there’s a good reason why someone might need to use your data without asking permission. For example, a bank must conform to money laundering and other laws, so they need to look at your data. The GDPR lists a number of lawful bases under which someone can process your data, and the last of them, legitimate interests, is a catch-all that can be used to exploit the process. Sure, you could send in a request asking them to explain themselves, then debate whether their reasoning was sound, and so on through the complaints procedure and the courts. Realistically, this is almost never going to happen, or be worth the effort, because it’s massively resource-consuming and by that point they’ve already used your data anyway. If we’re talking about AI, then it’s buried in the training pool and isn’t removable. So, LI is another pseudo-legal smokescreen that the AI companies can use to steal your data.
Remember that I said this series was about me exploring and learning? Well, this third point is where we come to the core of this aspect of AI, and it leads to my first major takeaway: a pragmatic one, rather than one that’s immediately emotionally satisfying.
So where was I? Oh, yes. The AI companies have plundered what they needed to start their work and continue to harvest whatever they can get away with. A few have built models on training data they’ve asked permission to use, though this is a minority approach and mainly seems to be a marketing exercise. Despite this, overwhelmingly, LLMs are built on data that’s being used without the permission of the legal owner. It is not FU or LI under those definitions. Calling this anything other than theft is, in my opinion, either deluded or intentionally misleading.
The third, and most important, point is that nothing material is going to be done about this. At least, not in any major way. The legal system lags too far behind, the power and resource imbalance between AI companies and creators is laughably one-sided, and the politicians are clearly siding with their oligarch peers, outside the odd occasion where they need to pretend to care for votes.
None of this should surprise anyone. It’s not new, it’s just business as usual. I don’t know if it’s because I’m older or whether anything really has changed, but it feels to me that while this abuse of power has always been the way, it’s less masked than it was in my youth. The politicians and billionaires at least used to pretend to care a little. Now the corruption and nepotism are out in the open in a way that would make the Borgias blush.
So, yes. It’s not good. Creators large and small have had their work taken and used without permission. Stolen, in other words. And outside a few cases that perform the social function of show trials to salve the public conscience (and which I predict will fail to impress), the legal system will side with the money and power, as is tradition.
What I think this means is that there’s no real point in complaining. It’s a new paradigm we live in. Assume that anything you show in public will be stolen without repercussion. We’re back in the pre-copyright days now. Of course, the law will still be used to prosecute you if you use stuff the big boys own without permission, but the far more consequential thefts by the giant AI companies will continue unchecked.
All of this may seem a bit dystopian and gloomy, and it’s certainly not sunshine and rainbows. However, it’s not really new. People with more money than you will ever see have always had this power. The only difference is that it’s being wielded more brazenly and being used to abuse creatives. That doesn’t make it right, and being angry is a sane response. It won’t, however, make any difference.
So, what to do? It seems to me that you’ve got three choices.
- Take up an AK and lead the revolution against the capitalist running dogs of Big AI. Vanishingly small likelihood of success. Cannot recommend.
- Rail against the unfairness of it all, post online, complain to your friends, etc. This reminds me of Douglas Adams’s Hitchhiker’s Guide to the Galaxy. When the Vogons arrive, Ford tells the barman that the world is about to be destroyed. “Aren’t we supposed to lie down or put a paper bag over our heads or something?” asks the barman. “Will that help?” That’s what getting angry feels like to me. The answer is no, by the way. It won’t help. I think that you’re far better off putting your energies into the last option.
- There are, sort of, two versions of this, lurking under one umbrella. Overall, option three is simple: deal with it. The two possible flavours are coping by ignoring AI and coping by embracing it. You could blend your own middle ground, though purists will probably suggest that any use of AI is going to taint you. Whatever happens, in this third option you find your way to navigate the unfairness and lopsided immorality of it all. It’s never been fair or moral anyway, and this may be a useful wake-up call if you’d failed to notice previously. Whatever LLMs or subsequent AI forms do to the creative space, there will always be humans making things, and for the foreseeable future these will have a different place in the world to whatever software churns out.
This last point is my takeaway from pondering this aspect of the current AI/LLM wave. The genie won’t be going back in the bottle, and the folk that let it out will not be held accountable for the damage they’ve done.
One of the many lessons that decades of gaming has taught me is that you need to pick your fights. Some battles can’t be won. In this analogy, they have a million tanks and you have a pointed stick. It’s not a winnable position. Now, you don’t have to like it (you’d be strange if you did), but you stand far more chance of winning the game as a whole if you let this battle go. Reinforcing a loss is just a waste of your resources.
Realising this makes me think that the only useful way forward is to accept the shit sandwich they forced upon us, and move forward. Take your anger and channel that energy into getting better, learning more, finding your own way, because, at the end of the day, you are what you have most control over. Maybe the only thing.
Let the AI companies do what they’re doing, just as you let the other human creators get on with their thing. It’s competition, it’s inspiration, and it’s background noise. Focus on yourself, your work, your skills and your progression. There was always someone better than you, and others who were less skilled. That hasn’t changed. Just now, some of it’s software. Learn, improve, and be the most exciting and interesting creator you can be. In a way, you should pity the poor AI. After all, for all its crass theft, and there was a lot of it, it can only copy, it cannot truly create.
Maybe next week we should look at why this is a good thing.