OpenAI being Sued for "Stealing" People's Content Online

manitcor@lemmy.intai.tech · 1 year ago

RatzChatsubo@vlemmy.net · 1 year ago

So we can sue robots but when I ask if we can tax them, and reduce human working hours, I’m the crazy one?

atzanteol@sh.itjust.works · 1 year ago

So we can sue robots

… No?

slipperydippery@lemmy.world · 1 year ago

What would you tax exactly? Robots don’t earn an income and don’t inherently make a profit. You could tax a company or owner who profits off of robots and/or sells their labor.

RatzChatsubo@vlemmy.net · 1 year ago

It would have to be some sort of moderated labor cost saving tax kind of thing enforced by the government

devzero@sh.itjust.works · 1 year ago

Should we tax bulldozers because they take away jobs from people using shovels? What about farm equipment, since they take away jobs from people picking fruit by hand? What about mining equipment, because they take away jobs from people using pickaxes?

PlebsicleMcGee@feddit.uk · 1 year ago

And we’re just gonna let the pickaxes off without a tax?

RatzChatsubo@vlemmy.net · edit-2 1 year ago

If the machine replaced the human, yes. That’s the argument being made currently.

Imagine if we simply taxed machine profits after 40 hours of work. You not only can give kickbacks to large companies, but you could also rewire profits to UBI

pitninja@lemmy.pit.ninja · 1 year ago

It’s the wrong way to go about it, though. Just tax businesses’ profits and close the bullshit loopholes they exploit to avoid paying them.

RatzChatsubo@vlemmy.net · 1 year ago

That too

devzero@sh.itjust.works · 1 year ago

But 40 hours of “work” is poorly defined. If you had everyone digging with spoons on your construction site, then you might need 100 people at 40 hours per week. If you give everyone shovels, you would only need 10 people at 40 hours a week. Do you want to tax shovels for “taking the job” from 90 people?
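
To put numbers on the spoon and shovel example: how much labor counts as “replaced” depends entirely on which baseline you pick. A rough sketch in Python, where the function name `displaced_hours` is made up for illustration and the crew sizes are the ones from the comment above:

```python
def displaced_hours(baseline_workers: int, actual_workers: int, hours_per_week: int = 40) -> int:
    """Weekly labor hours 'replaced' relative to a chosen baseline crew size."""
    return (baseline_workers - actual_workers) * hours_per_week

# Baseline: everyone digs with spoons (100 people); the actual crew uses shovels (10 people).
vs_spoons = displaced_hours(baseline_workers=100, actual_workers=10)

# Baseline: a shovel crew (10 people); the actual crew is that same shovel crew.
vs_shovels = displaced_hours(baseline_workers=10, actual_workers=10)

print(vs_spoons)   # 3600 hours per week "taken" by shovels
print(vs_shovels)  # 0 hours per week
```

Same construction site, same crew, and the taxable amount swings between 3600 hours and zero depending on what you compare against.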

RatzChatsubo@vlemmy.net · 1 year ago

Yeah idk, I’m no expert. I just want wealth redistribution

PlebsicleMcGee@feddit.uk · edit-2 1 year ago

If we think of production as costing land, labour and capital, then more efficient methods of production would likely swap labour for capital. In that case then we just tax capital growth like we’re doing now (Only properly, like without the loopholes). No need to complicate it past that

veganzombeh@lemmy.world · 1 year ago

I’m not sure how feasible it is but I’ve seen a sort of “minimum wage” for robots suggested which is paid to the government as tax.

God@sh.itjust.works · 1 year ago

What would be the legal argument for this? I’m not against it but I don’t know how it could be argued.

Changetheview@lemmy.world · 1 year ago

Legal basis for suing a company that uses another company’s product/creations without approval seems like a fairly straightforward intellectual property issue.

Legal basis for increased taxes on revenue from AI or automation in general could be created in the same way that any tax is created: through legislation. Income tax didn’t exist before, now it does. Tax breaks for mortgage interest didn’t use to exist, now they do. Levying higher taxes on companies that rely heavily on automated systems and creating a UBI out of the revenue might not exist right now, but it can, if the right people are voted in to pass the right laws.

tony_lasagna@lemmy.world · 1 year ago

I don’t think a UBI makes sense, for many people it will just be extra money in their pocket to spend which continues the endless inflation in prices until the gain disappears.

More efficient targeting of benefits to those who need it with that money would actually help reduce inequality

smokeythebear@lemmy.world · 1 year ago

Every single example of means testing has been more expensive than just distributing the benefits to the people that ask for them.

RatzChatsubo@vlemmy.net · 1 year ago

The idea is that UBI would give the working class time to pursue passions, spend more money, and enable more people to pursue entrepreneurship in the country. All things that in turn would benefit society and the arts.

Pulp@lemmy.dbzer0.com · 1 year ago

Then UBI amount is increased (hopefully fast enough) to fight inflation

PlebsicleMcGee@feddit.uk · 1 year ago

continues the endless inflation

UBI would likely lead to a decrease in wages or at least a period of stagnation as it would be less important to employees. As far as I’ve heard long run it shouldn’t hurt

Irisos@lemmy.umainfo.live · 1 year ago

It’s more that UBI is just not financially possible for any country.

I live in a country with the highest tax rate on the continent and with just 20% of our population as pensioners, the situation is just getting worse and worse even though 49% of the population has a tax rate between 25 and 50% (+13% from welfare taxes). Just with this small percentage, we are spending 20% of our budget in pensions. More than any other area by at least 5% of our national budget.

If the state now had to pay a UBI to 69% of our population on top of this, the very minimum to pay off the UBI without going bankrupt would be to sell off the free healthcare and public transport in their entirety. And I’m assuming a small UBI of 500€/month (not even enough to rent a one-room apartment with utilities in some areas).
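
For scale, the gross cost of that proposal can be worked out per million residents. This is only a back-of-the-envelope sketch: the comment does not name the country, so the population of 1,000,000 is a placeholder, and the function name `annual_ubi_cost_eur` is made up for illustration. Only the 69% share and the 500€/month come from the comment.

```python
def annual_ubi_cost_eur(population: int, percent_receiving: int, monthly_eur: int) -> int:
    """Gross yearly cost of paying `monthly_eur` to `percent_receiving` percent of the population."""
    recipients = population * percent_receiving // 100
    return recipients * monthly_eur * 12

# Placeholder population of one million; 69% and 500 EUR/month are taken from the comment.
cost_per_million = annual_ubi_cost_eur(population=1_000_000, percent_receiving=69, monthly_eur=500)
print(f"{cost_per_million:,} EUR per year per million residents")  # 4,140,000,000 EUR
```

Whether that is affordable depends on the national budget per resident, which the comment does not give, so the sketch stops at the gross figure.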

UBI would destroy any country’s budget for what? Landlords increasing rent to match the UBI, corporations increasing prices to match the inflation and people wasting that money when it could have been put to use to increase renewable energy production, improve education, …

UBI is only a good idea on paper and you only need to look at the public expenses of most European countries + have a basic understanding of capitalist greed to see it.

RatzChatsubo@vlemmy.net · 1 year ago

I’m no expert on law but maybe something about AI unethically taking our jobs away

NightFantom@slrpnk.net · 1 year ago

Universal base income + AI/robots taking care of all necessary jobs sounds great

RatzChatsubo@vlemmy.net · edit-2 1 year ago

That’s exactly what Andrew Yang’s political platform was. I hope he runs again

Shartacus@lemmy.world · 1 year ago

I wrote him in and probably will again

Flicsmo@rammy.site · 1 year ago

I don’t know if he’s running for president, but in case you’re unaware he founded a new political party, the Forward Party. It’s the first time I’ve really believed in anything political; it might not resonate with you but it’s worth looking into if you haven’t.

RatzChatsubo@vlemmy.net · 1 year ago

Oh I had no idea. Damn he’s not on the Dem ticket? I thought he would be great arguing AI talking points on stage against all the boomers on the left, guess he’s done with the sham of centrist politics tho, can’t blame him really

FunkyDuck@lemmy.world · 1 year ago

The issue with Yang is that he’s proposing cutting other social safety nets and replacing them with UBI which would put a lot of people in worse situations. UBI would be great but we also need robust social programs.

Flicsmo@rammy.site · 1 year ago

I’ve been reliant on social programs and found them severely lacking. They’re bureaucracy at its worst, and I’m lucky to be able to navigate through it - it seems those who need the help the most are the least able to receive it. They’re wasteful too, I would rather the funds go directly to people who need them rather than feeding the middleman.

Slacking@sh.itjust.works · 1 year ago

China didn’t take your job and neither will AI. Corporations will replace you for something that cost less.

We can’t really legislate against AI because other countries won’t. It’s also a huge boon for society, we just have to make sure the profits are redistributed and work hours overall are reduced instead of all the productivity gain going into the pockets of the mega wealthy

assembly@lemmy.world · 1 year ago

I’m not sure that people want to legislate against AI as much as they want to find a way to legislate for the fair outcomes associated with AI productivity. The challenge is that it is harder to do. In the USA we can’t get out of our own way to properly tax corporations, never mind have a more complex solution like reducing worker hours or increasing PTO based upon improved societal output. In the absence of a complex but comprehensive solution (which I don’t think we have the capability to pull off) people are desperate and saying things like “let’s hold back on AI until we can put together this mythical great plan”. We’re never going to get the great plan though. Hopefully I’m just cynical but I don’t see a path (at least for the US as I can’t speak for the rest of the world) that doesn’t continue towards dystopia.

CoderKat@lemm.ee · 1 year ago

What makes it unethical? How is it different from advancements in technology taking away any other job, like elevator operators, numerous factory positions, calculators (the human kind), telephone operators, people who sew clothes (somewhat), and so on?

It seems to me that automating away jobs has historically bettered humanity. Why would we want a job to be done by a person when we can have a machine do it (assuming a machine does equal or better)? We can better focus people on other jobs and eventually, hopefully, with no mandatory need for a job at all.

GrandmasterFrank@lemmy.blahaj.zone · 1 year ago

hopefully, with no mandatory need for a job at all.

Lol, as if. Look at wages:productivity since the 70s

CoderKat@lemm.ee · 1 year ago

Well, this “eventually” thing wouldn’t be until we can automate away so many jobs that we simply couldn’t (meaningfully) employ a significant chunk of people. We’re not there yet. Though we shouldn’t wait till we reach that point to get some form of UBI available. It’s at that point where UBI would be critical and needs to be at a living wage level.

Guy_Fieris_Hair@lemmy.world · edit-2 1 year ago

It could be argued that when our tax code, laws, and constitution were created there weren’t AIs taking jobs and funneling the economy to only a few people breaking the system and it’s time for us to adapt as a society. But I know adapting isn’t a strength of our legal system.

Also, you wouldn’t be suing the AI as its own entity. You would be suing the creator/owner that is allowing it to steal people’s content. AI is not to the point it is sentient and responsible for its own actions.

God@sh.itjust.works · 1 year ago

That’s actually a great argument: an AI is trained without permission on the result of people’s labor, and is thus able to intercept the need for this labor and take away financial opportunities derived therefrom. Therefore, an AI’s labor and its profit could be argued to contain, in the proportion that an AI is the content of its training, a portion that belongs to those who did the labor its obscure process is based on. Therefore, an AI’s master should take a portion of its revenue as royalties and distribute them to the “people’s council”, which in this case is just the government, for it to be redistributed accordingly.

Guy_Fieris_Hair@lemmy.world · 1 year ago

i.e. tax the fuck out of the owners, and minimum basic income for all. Completing the economic circle of life.

JohnnyCanuck@sh.itjust.works · 1 year ago

“Massive Trouble”

Step 1 - Scrape everyone’s data to make your LLM and make a high profile deal worth $10B
Step 2 - Get sued by everyone whose data you scraped
Step 3 - Settle and everyone in the class will be eligible for $5 credit using ChatGPT-4
Step 4 - Bask in the influx of new data
Step 5 - Profit

manitcor@lemmy.intai.tech · 1 year ago

i posted on the public internet with the intent and understanding that it would be crawled by systems for all kinds of things. if i dont want content to be grabbed i dont publish it publicly

you can’t easily have it both ways imo. even with systems that do strong pki if you want the world in general to see it you are giving up a certain amount of control over how the content gets used.

law does not really matter here as much as people would like to try to apply it, this is simply how public content will be used. Go post in a garden if you don’t want to get scraped, just remember the corollary is your reach, your voice is limited to the walls of that garden.
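
For what it’s worth, the usual technical middle ground between “fully public” and “walled garden” is a robots.txt file, which a well-behaved crawler is expected to consult before fetching. It is a convention, not an enforcement mechanism. A small sketch using Python’s standard `urllib.robotparser`; the bot name `ExampleBot`, the other user agent, and the URL are placeholders, not any real crawler or site:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block one named crawler, allow everyone else.
robots_txt = [
    "User-agent: ExampleBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(robots_txt)  # parse the rules directly instead of fetching them over the network

url = "https://example.com/post/1"
blocked = parser.can_fetch("ExampleBot", url)      # the named crawler is asked to stay out
allowed = parser.can_fetch("SomeOtherAgent", url)  # any other agent falls under the "*" rules

print(blocked, allowed)
```

Nothing stops a scraper from ignoring the file, which is the point made above: once content is public, control over its use is limited.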

lemmyvore@feddit.nl · 1 year ago

What you said makes a lot of sense. But here’s the catch: it assumes OpenAI checked the licensing for all the stuff they grabbed. And I can guarantee you they didn’t.

It’s damn near impossible to automatically check the licensing for all the stuff they got, so we know for a fact they got stuff whose licensing does not allow it to be used this way. Microsoft has already been sued for Copilot, and these lawsuits will keep coming. Even assuming they somehow managed to only grab legit material and they used excellent legal advisors that assured them it would stand in court, it’s definitely impossible to tell what piece of what goes where after it becomes an LLM token, and also impossible to tell what future lawsuits will decide about it.

Where does that leave OpenAI? With the good ol’ “I grabbed something off the internet because I could”. Why does that sound familiar? It’s something people have been doing since the internet was invented, it’s commonly referred to as “piracy”. But it’s supposed to be wrong and illegal. Well either it’s wrong and illegal for everybody or the other way around.

manitcor@lemmy.intai.tech · 1 year ago

there were court cases around this very thing and google and webarchive. I suspect their legal team is expecting similar precedent with the issue being down to the individual and how they use the index, example, using it to make my own unique character (easily done) vs making an easy and obvious rip off of a Disney property. The same tests can be applied, the question IMO isn’t about the index that is built here. I can memorize a lot (some people have actual eidetic memory) and synthesize it too which is protected and I can copyright my own mental outputs. The disposition of this type of output vs mechanical outputs i expect will be where things end up being argued.

I’m not going to say I’m 100% right here, we are in a strange timeline but there is precedent for what OAI is doing IMO.

WhiteTiger@lemmy.world · edit-2 1 year ago

The issue becomes the sale/profit of selling access, such as with GPT-4 right now. Indexing/archiving and selling are two very different beasts.

manitcor@lemmy.intai.tech · edit-2 1 year ago

interesting lines to walk, depends on what they are selling, there is a definite cost to running a model and you are allowed to charge a reasonable fee to handle the process of providing the records. we used to pay per page for this kind of thing, now you pay per token

they can also sell a lot of services and tools around the model while still using it in a non-infringing manner. this will all end up in front of a judge, with the books laid out i suspect. I am not sure we will ever see any of the details, i hope we do.

rbhfd@lemmy.world · 1 year ago

The difference between piracy and having your content used for training a generative model, is that in the latter case, the content isn’t redistributed. It’s like downloading a movie from netflix (and eventually distributing it for free) vs watching a movie on netflix and using it as inspiration to make your own movie.

The legality of it all is unclear and most of that is because the technology evolved so quickly that the legal framework is just not equipped to deal with it, despite the obvious moral issues with scraping artists’ content.

JohnnyCanuck@sh.itjust.works · 1 year ago

Yes, notice I said “Scrape” and not “steal” :)

manitcor@lemmy.intai.tech · 1 year ago

abs, providing some more background from my perspective.

Geograph6@lemmy.dbzer0.com · 1 year ago

People talk about OpenAI as if it’s some utopian saviour that’s going to revolutionise society, when in reality it’s a large corporation flooding the internet with terrible low-quality content using machine learning models that have existed for years. And the fields it is “automating” are creative ones that specifically require a human touch, like art and writing. Large language models and image generation aren’t going to improve anything. They’re not “AI” and they never will be. Hopefully when AI does exist and does start automating everything we’ll have a better economic system though :D

fiasco · 1 year ago

The thing that amazes me the most about AI Discourse is, we all learned in Theory of Computation that general AI is impossible. My best guess is that people with a CS degree who believe in AI slept through all their classes.

leonardo_arachoo@lemm.ee · edit-2 1 year ago

we all learned in Theory of Computation that general AI is impossible.

I strongly suspect it is you who has misunderstood your CS courses. Can you provide some concrete evidence for why general AI is impossible?

fiasco · 1 year ago

Evidence, not really, but that’s kind of meaningless here since we’re talking theory of computation. It’s a direct consequence of the undecidability of the halting problem. Mathematical analysis of loops cannot be done because loops, in general, don’t take on any particular value; if they did, then the halting problem would be decidable. Given that writing a computer program requires an exact specification, which cannot be provided for the general analysis of computer programs, general AI trips and falls at the very first hurdle: being able to write other computer programs. Which should be a simple task, compared to the other things people expect of it.

Yes, there’s more complexity here (what about compiler optimization or Rust’s borrow checker?), which I don’t care to get into at the moment; suffice it to say, those only operate under certain special conditions. To posit general AI, you need to think bigger than basic block instruction reordering.

This stuff should all be obvious, but here we are.

leonardo_arachoo@lemm.ee · edit-2 1 year ago

Given that humans can write computer programs, how can you argue that the undecidability of the halting problem stops intelligent agents from being able to write computer programs?

I don’t understand what you mean about the borrow checker in Rust or block instruction reordering. These are certainly not attempts at AI or AGI.

What exactly does AGI mean to you?

This stuff should all be obvious, but here we are.

This is not necessary. Please don’t reply if you can’t resist the temptation to call people who disagree with you stupid.

fiasco · 1 year ago

This is proof of one thing: that our brains are nothing like digital computers as laid out by Turing and Church.

What I mean about compilers is, compiler optimizations are only valid if a particular bit of code rewriting does exactly the same thing under all conditions as what the human wrote. This is chiefly only possible if the code in question doesn’t include any branches (if, loops, function calls). A section of code with no branches is called a basic block. Rust is special because it harshly constrains the kinds of programs you can write: another consequence of the halting problem is that, in general, you can’t track pointer aliasing outside a basic block, but the Rust program constraints do make this possible. It just foists the intellectual load onto the programmer. This is also why Rust is far and away my favorite language; I respect the boldness of this play, and the benefits far outweigh the drawbacks.
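
The aliasing point can be illustrated without Rust. In the sketch below, written in Python with lists standing in for pointers, a hypothetical “optimized” rewrite skips re-reading a value. The function names are made up for illustration. The rewrite matches the original only when the two arguments are guaranteed not to refer to the same object, which is the kind of guarantee the comment says Rust’s constraints provide:

```python
def original(a: list, b: list) -> int:
    a[0] = 1
    b[0] = 2
    return a[0]  # re-reads a[0], because b might be the very same list

def rewritten(a: list, b: list) -> int:
    a[0] = 1
    b[0] = 2
    return 1     # "optimized": assumes the write through b cannot touch a

# No aliasing: two distinct lists, both versions agree.
distinct_original = original([0], [0])
distinct_rewritten = rewritten([0], [0])

# Aliasing: the same list passed twice, and the versions disagree.
shared = [0]
alias_original = original(shared, shared)
alias_rewritten = rewritten(shared, shared)

print(distinct_original, distinct_rewritten)  # 1 1
print(alias_original, alias_rewritten)        # 2 1
```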

To me, general AI means a computer program having at least the same capabilities as a human. You can go further down this rabbit hole and read about the question that spawned the halting problem, called the entscheidungsproblem (decision problem) to see that AI is actually more impossible than I let on.

leonardo_arachoo@lemm.ee · edit-2 1 year ago

Here are two groups of claims I disagree with that I think you must agree with

1 - brains do things that a computer program can never do. It is impossible for a computer to ever simulate the computation* done by a brain. Humans solve the halting problem by doing something a computer could never do.

2 - It is necessary to solve the halting problem to write computer programs. Humans can only write computer programs because they solve the halting problem first.

*perhaps you will prefer a different word here

I would say that:

it doesn’t require solving any halting problems to write computer programs
there is no general solution to the halting problem that works on human brains but not on computers.
computers can in principle simulate brains with enough accuracy to simulate any computation happening on a brain. However, there would be far cheaper ways to do any computation.

Which of my statements do you disagree with?

fiasco · 1 year ago

I suppose I disagree with the formulation of the argument. The entscheidungsproblem and the halting problem are limitations on formal analysis. It isn’t relevant to talk about either of them in terms of “solving them,” that’s why we use the term undecidable. The halting problem asks, in modern terms—

Given a computer program and a set of inputs to it, can you write a second computer program that decides whether the input program halts (i.e., finishes running)?

The answer to that question is no. In limited terms, this tells you something fundamental about the capabilities of Turing machines and lambda calculus; in general terms, this tells you something deeply important about formal analysis. This all started with the question—

Can you create a formal process for deciding whether a proposition, given an axiomatic system in first-order logic, is always true?

The answer to this question is also no. Digital computers were devised as a means of specifying a formal process for solving logic problems, so the undecidability of the entscheidungsproblem was proven through the undecidability of the halting problem. This is why there are still open logic problems despite the invention of digital computers, and despite how many flops a modern supercomputer can pull off.
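
The standard argument for why no such deciding program can exist is short enough to sketch. This is a simplified version in Python where programs take no input, and the names `make_contrarian` and `always_says_loops` are made up for illustration. Given any candidate decider, you can build a program that asks the decider about itself and then does the opposite:

```python
from typing import Callable

Program = Callable[[], None]

def make_contrarian(claims_to_halt: Callable[[Program], bool]) -> Program:
    """Build a program that does the opposite of what the decider predicts for it."""
    def contrarian() -> None:
        if claims_to_halt(contrarian):
            while True:  # decider said "halts", so loop forever
                pass
        # decider said "runs forever", so return immediately
    return contrarian

def always_says_loops(program: Program) -> bool:
    """A deliberately naive candidate decider: predicts that nothing ever halts."""
    return False

g = make_contrarian(always_says_loops)
prediction = always_says_loops(g)  # False, i.e. "g never halts"
g()                                # yet this call returns, so the prediction was wrong
print("decider predicted halting:", prediction, "but g() returned")
```

A decider that answered True instead would be wrong in the other direction, since `g` would then loop forever. The sketch only shows that no single program gets every case right; it says nothing either way about what brains do.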

We don’t use formal process for most of the things we do. And when we do try to use formal process for ourselves, it turns into a nightmare called civil and criminal law. The inadequacies of those formal processes are why we have a massive judicial system, and why the whole thing has devolved into a circus. Importantly, the inherent informality of law in practice is why we have so many lawyers, and why they can get away with charging so much.

As for whether it’s necessary to be able to write a computer program that can effectively analyze computer programs, to be able to write a computer program that can effectively write computer programs, consider… Even the loosey goosey horseshit called “deep learning” is based on error functions. If you can’t compute how far away you are from your target, then you’ve got nothing.
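
As a concrete picture of what “based on error functions” means, here is a toy example in Python: one parameter fitted by gradient descent on a mean squared error. The data, learning rate, and function names are all made up for illustration and are not taken from any real system:

```python
# Toy data generated by y = 3x; the goal is to recover the factor 3.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

def error(w: float) -> float:
    """Mean squared distance between predictions w*x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def error_gradient(w: float) -> float:
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
for _ in range(100):
    w -= 0.05 * error_gradient(w)  # step downhill on the error surface

print(round(w, 6), error(w))
```

The whole procedure only works because `error` can be computed for every candidate `w`, which is the dependency the comment is pointing at.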

irmoz@reddthat.com · 1 year ago

From what I’ve heard, the biggest hurdle for AI right now is the fact that computers only work with binary. They are incapable of actually “reading” the things they write - all they’re actually aware of is the binary digits they manipulate, that represent the words they’re reading and writing. It could analyse War and Peace over and over, and even if you asked it who wrote it, it wouldn’t actually know.
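
For reference, this is what “binary digits that represent the words” looks like in practice. The Python sketch below only shows the encoding step (text to bytes to bits and back) and does not settle anything about whether a system “knows” what it reads:

```python
text = "War and Peace"

raw = text.encode("utf-8")                      # the bytes a computer actually stores
bits = " ".join(f"{byte:08b}" for byte in raw)  # the same bytes written out as binary digits
decoded = raw.decode("utf-8")                   # and back to text, losslessly

print(bits)
print(decoded)
```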

qfe0@lemmy.dbzer0.com · 1 year ago

The existence of natural intelligence is the proof that artificial intelligence is possible.

argv_minus_one@beehaw.org · 1 year ago

We can simulate all manner of physics using a computer, but we can’t simulate a brain using a computer? I’m having a real hard time believing that. Brains aren’t magic.

fiasco · 1 year ago

Computer numerical simulation is a different kind of shell game from AI. The only reason it’s done is because most differential equations aren’t solvable in the ordinary sense, so instead they’re discretized and approximated. Zeno’s paradox for the modern world. Since the discretization doesn’t work out, they’re then hacked to make the results look right. This is also why they always want more flops, because they believe that, if you just discretize finely enough, you’ll eventually reach infinity (or infinitesimal).
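
For readers who haven’t seen discretization, here is about the smallest possible example in Python: the forward Euler method applied to dy/dt = -y with y(0) = 1, whose exact solution is e^(-t). The equation and step counts are chosen for illustration. This one happens to be solvable by hand, which is why the approximation error can be measured at all:

```python
import math

def euler_decay(steps: int, t_end: float = 1.0) -> float:
    """Approximate y(t_end) for dy/dt = -y, y(0) = 1, using forward Euler."""
    h = t_end / steps
    y = 1.0
    for _ in range(steps):
        y += h * (-y)  # replace the derivative with a finite step of size h
    return y

exact = math.exp(-1.0)
coarse_error = abs(euler_decay(10) - exact)
fine_error = abs(euler_decay(1000) - exact)

print(coarse_error, fine_error)
```

For this simple equation the error shrinks as the step gets finer, which is the intuition behind wanting more flops. The sketch does not speak to harder systems where that behaviour is not guaranteed.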

This also should not fill you with hope for general AI.

IllNess@infosec.pub · 1 year ago

It’s all buzzword exaggerations. It’s marketing.

Remember when hoverboards were for things that actually hover instead of some motorized bullshit on two wheels? Yeah, same bullshit.

Uriel-238@lemmy.fmhy.ml · 1 year ago

If this lawsuit is ruled in favor of the plaintiff, it might lead to lawsuits against those who have collected and used private data more maliciously, from advertisement-targeting services to ALPR services that reveal to law enforcement your driving habits.

Ajen@sh.itjust.works · 1 year ago

So some of the most profitable corporations in the world? In that case this lawsuit isn’t going anywhere.

Treemaster099@pawb.social · edit-2 1 year ago

Good. Technology always makes strides before the law can catch up. The issue with this is that multi million dollar companies use these gaps in the law to get away with legally gray and morally black actions all in the name of profits.

Edit: This video is the best way to educate yourself on why ai art and writing is bad when it steals from people like most ai programs currently do. I know it’s long, but it’s broken up into chapters if you can’t watch the whole thing.

PlebsicleMcGee@feddit.uk · 1 year ago

Totally agree. I don’t care that my data was used for training, but I do care that it’s used for profit in a way that only a company with big budget lawyers can manage

CoderKat@lemm.ee · edit-2 1 year ago

But if we’re drawing the line at “did it for profit”, how much technological advancement will happen? I suspect most advancement is profit driven. Obviously people should be paid for any work they actually put in, but we’re talking about content on the internet that you willingly create for fun and the fact it’s used by someone else for profit is a side thing.

And quite frankly, there’s no way to pay you for this. No company is gonna pay you to use your social media comments to train their AI and even if they did, your share would likely be pennies at best. The only people who would get paid would be companies like reddit and Twitter, which would just write into their terms of service that they’re allowed to do that (and I mean, they already use your data for targeting ads and it’s of course visible to anyone on the internet).

So it’s really a choice between helping train AI (which could be viewed as a net benefit for society, depending on how you view those AIs) vs simply not helping train them.

Also, if we’re requiring payment, only the super big AI companies can afford to frankly pay anything at all. Training an AI is already so expensive that it’s hard enough for small players to enter this business without having to pay for training data too (and at insane prices, if Twitter and Reddit are any indication).

Programmer Belch@lemmy.dbzer0.com · 1 year ago

Hundreds of projects on GitHub are supported by donations; innovation happens even without profit incentives. It may slow down the pace of AI development but I am willing to wait another decade for AIs if it protects user data and lets regulation catch up.

Johem@lemmy.world · 1 year ago

Reddit is currently trying to monetize their user comments and other content by charging for API access. Which creates a system where only the corporations profit and the users generating the content are not only unpaid, but expected to pay directly or are monetized by ads. And if the users want to use the technology trained by their content they also have to pay for it.

Sure seems like a great deal for corporations and users getting fleeced as much as possible.

archomrade [he/him]@midwest.social · 1 year ago

I’m honestly at a loss for why people are so up in arms about OAI using this practice and not Google or Facebook or Microsoft, etc. It really seems we’re applying a double standard just because people are a bit pissed at OpenAI for a variety of reasons, or maybe just vaguely mad at the monetary scale of “tech giants”

My 2 cents: I don’t think content posted on the open internet (especially content produced by users on a free platform being claimed not by those individuals but by the platforms themselves) should be litigated over, when that information isn’t even being reproduced but is being used in derivative works. I think it’s conceptually similar to an individual reading a library of books to become a writer and charge for the content they produce.

I would think a piracy community would be against platforms claiming ownership over user generated content at all.

Treemaster099@pawb.social · 1 year ago

https://youtu.be/9xJCzKdPyCo

This video can answer just about any question you ask. It’s long, but it’s split up into chapters so you can see what questions he’s answering in that chapter. I do recommend you watch the whole thing if you can. There’s a lot of information that I found very insightful and thought provoking

archomrade [he/him]@midwest.social · edit-2 1 year ago

While I appreciate this gentleman’s copyright experience, I do have a couple of comments:

His analysis seems primarily focused on the legal perspective. While I don’t doubt there is legal precedent for protection under copyright law, my personal opinion is that copyright is a capitalist conception that is dependent on an economic reality I fundamentally disagree with. Copyright is meant to protect the livelihoods of artists, but I don’t think anyone’s livelihood should be dependent on having to sell labor. More often, copyright is used to protect the financial interests of large businesses, not individual artists. The current litigation is between large media companies and OAI, and any settlement isn’t likely to remunerate much more than a couple dollars to individual artists, and we can’t turn back the clock to before AI could displace the jobs of artists, either.
I’m not a lawyer, but his legal argument is a little iffy to me… Unless I misunderstood something, he’s resting his case on a distinction between human inspiration (i.e. creative inspiration on derivative works) and how AI functions practically (i.e. AI has no subjective “experience” so it cannot bring its own “hand” to a derivative work). I don’t see this as a concrete argument, but even if I did, it is still no different than individual artists creating derivative works and crossing the line into copyright infringement. I don’t see how this argument can be blanket applied to the use of AI, rather than individual cases of someone using AI on a project that draws too much from a derivative work.

The line is even less clear when discussing LLMs as opposed to T2I or I2I models, which I believe is what is being discussed in the lawsuit against OAI. Unlike images from DeviantArt and Instagram, text datasets from sources like reddit, Wikipedia, and Twitter aren’t protected under copyright like visual media. The legal argument against the use of training data drawn from public sources is even less clear, and is even further removed from protecting the individual users and is instead a question of protecting social media sites with questionable legal claim to begin with. This is the point I’d expect this particular community would take issue with: I don’t think reddit or Twitter should be able to claim ownership over their users’ content, nor do I think anyone should be able to revoke consent over fair use just because it threatens our status quo capitalist system.

AI isn’t going away anytime soon, and litigating over the ownership of the training data is only going to serve to solidify the dominant hold over our economy by a handful of large tech giants. I would rather see large AI models be nationalized, or otherwise be protected from monopolization.

Treemaster099@pawb.social · 1 year ago

I don’t really have the time to look for timestamps, but he does present his arguments from many different angles. I highly recommend watching the whole thing if you can.

Aside from that, the main thing I want to address is the responsibility of these big corporations to curate the massive library of content they gather. It’s entirely in their power to blacklist certain things like PII or sensitive information or hate speech, but they decided not to because it was cheaper. They took a gamble that people either wouldn’t care, didn’t have the resources to fight it, or would actively support their theft if it meant getting a new toy to play with.

Now that there’s a chance they could lose a massive amount of money, this could deter other ai companies from flagrantly breaking the law and set a better standard that protects people’s personal data. Tbh I don’t really think this specific case has much ground to stand on, but it’s the first step in securing more safety for people online. Imagine if the database for this ai was leaked. Imagine all of the personal data, yours and mine included, that would be available to malicious people. Imagine the damage that could cause.

archomrade [he/him]@midwest.social · 1 year ago

They do curate the data somewhat, though it’s not easy to verify if they did since they don’t share their data set (likely because they expect legal challenges).

There’s no evidence they have “personal data” beyond direct textual data scraped from platforms such as reddit (much of which is disembodied from other metadata). I care FAR more about the data held by Google, Facebook, or Microsoft leaking than I do text written on my old reddit or twitter account, and somehow we’re not wringing our hands about that data collection.

I watched most of that video, and I’m frankly not moved by much of it. The video seems primarily (if not entirely) written in response to generative image models and image data that may actually be protected under existing copyright, unlike the textual data in question in this particular lawsuit. Despite that, I think his interpretation of “derivative work” hand waving is flimsy at best, and relies on a materialist perspective that I just can’t identify with (a pragmatic framework might be more persuasive to me). A case-by-case basis of copyright infringement of the use of AI tools is the most solid argument he makes, but I am just not persuaded that all AI is theft based on publicly accessible data being used as training data. And I just don’t think copyright law is an ideal solution to a growing problem with technological automation and ever increasing levels of productivity and stagnating levels of demand.

I’m open to being wrong, but I think copyright law doesn’t address the long-term problems introduced by AI and is instead a shortcut to maintaining a status quo destined to failure regardless.

sycamore@lemmy.world · 1 year ago

I once looked outside. Could I be sued for observing a public space?

manitcor@lemmy.intai.tech · 1 year ago

i once looked at a picture of spider man and badman then made a crappy drawing biterman

to jail with me!

bobs_monkey@lemm.ee · 1 year ago

Long live baman and piderman

UntouchedWagons@lemmy.ca · 1 year ago

Piracy isn’t stealing and neither is this.

manitcor@lemmy.intai.tech · 1 year ago

yarrr!

gigglehurtz@lemmy.dbzer0.com · 1 year ago

Piracy also isn’t copyright infringement but that’s what this is. Under the law, which sucks. And if it applies to us it should apply to them.

xSinStarx@lemmy.fmhy.ml · 1 year ago

Seeders are committing copyright infringement, by definition. Piracy actively encourages that behavior. Whether that is unethical or not can be debated though (FBI, I swear I have nyaa idea how qBT ended up on my machine, must have been a virus from a 4chan hacker).

DMmeYourNudes@lemmy.world · 1 year ago

Piracy is literally theft, what are you talking about?

hightrix@lemmy.world · 1 year ago

It is absolutely not theft. If you’d like a physical crime to compare it to, forgery would be what you are looking for. But piracy is not at all theft.

That is, unless you are talking about Captain Davy Jones and his pirate ship. That type of piracy is theft.

Technoguyfication@lemmy.ml · 1 year ago

It’s wild to see people in the piracy community of all places have an issue with someone benefiting from data they got online for free.

Altair@vlemmy.net · edit-2 1 year ago

Key difference is that they’re making (a lot of) money off of the stolen work, and in a way that’s only possible for the already filthy rich

Wouldn’t mind it personally if it was foss though, like their name suggests

whoisearth@lemmy.ca · 1 year ago

FWIW even if it was FOSS I’d still care. For me it’s more about intent. If your business model/livelihood relies on stealing from people there’s a problem. That’s as true on a business level as it is an individual one.

Doesn’t mean I have an answer as sometimes it’s extremely complex. The easy analogy is how we pirate TV shows and movies. Netflix originally proved this could be mitigated by providing the material cheaply and easily. People don’t want to steal (on average).

Botree@lemmy.world · 1 year ago

I find people in general are much more willing to part with their money than the big corps think. I’ll even go to the extent to say that we enjoy doing so. Just look at Twitch – tonnes of money are thrown at streamers because it’s fun and convenient, or at TikTok vendors selling useless stuff on live streaming. We just don’t like to be lied to and treated like cash cows.

whoisearth@lemmy.ca · 1 year ago

Amen.

Flip side with twitch I saw a girl get gifted $500. It was awkward as all get out because there’s a feeling you need to perform and you could see her struggling with what to do. Money makes things unduly complex and I don’t like it lol

pipows@lemmy.pt · 1 year ago

They’re using people’s content without authorization, but for an open information ideology or something like that, they are closed source and they are using it to make money. I don’t think that should be illegal, but it is certainly a dick move.

DankMemeMachine@lemmy.world · 1 year ago

The difference is that they are profiting from other people’s work and properties. I don’t profit from watching a movie or playing a game for free; I just save some money.

Holodeck_Moriarty@lemm.ee · 1 year ago

You do if you make games or movies and those things give you inspiration.

This is just how learning is done though, whether it’s AI or human.

DankMemeMachine@lemmy.world · 1 year ago

Absolutely not comparable. Inspiration and an amalgamation of everything an LLM consumes are completely different things.

Holodeck_Moriarty@lemm.ee · 1 year ago

I’d argue that what we do is an amalgamation of what we are exposed to, to a great extent. And we are exposed to way less information than an LLM.

Briongloid@aussie.zone · 1 year ago

Many of us are sharing without reward and have strong ethical beliefs regarding for-profit distribution of material versus non-profit sharing.

arinot@lemmy.world · 1 year ago

It really isn’t that bonkers. A lot of software thought is about licensing. See GPL and Creative Commons and all that stuff that’s all about how things can be profited from/responsibilities around it. Benefiting from free data is one thing. Privately profiting at the expense or not sharing the capability/advances that came from it is another. Willing to bet there’s GPL violations via the training sets.

Is it even possible to attach licenses to text posts on social media?

Armok: God of Blood@lemmy.world · 1 year ago

Curious to see if this goes anywhere.

WagnasT@iusearchlinux.fyi · 1 year ago

IANAL but i think it’s going to come down to the terms of service where the data was scraped from. If the terms say the stuff you post can be shared with third parties then they might not have a leg to stand on. Where it gets sketchy is if someone posted someone else’s work, then the original author had no say in it being shared with a third party, BUT, is that the fault of the third party or the service provider that shared it?

Also, if i were exposed to copyrighted material through some unauthorised person distributing it can i not summarize the information? I guess i don’t know enough about fair use to answer that.

The wording in the article says they are being sued for stealing their data, this seems like a stretch but i guess i’ll wait for more details of the case.

D_Air1@lemmy.ml · 1 year ago

I agree with the terms of service bit, but the hard part is going through the tos for so many different sites. Sort of like how some open source code bases can’t re-license a code base because it is impossible to get into contact with all the people who have contributed to the project over the years. Online platforms already have certain protections from their users posting illegal content to their sites. We will have to see if that is extended to these large language models. When it comes to fair use, there is no such thing. Fair use must be proven in court. Each and every time. There are no guidelines on what is and isn’t fair use when it comes to word of law, so that can swing either way. Just my two cents on the matter. Also, (IANAL).

Armok: God of Blood@lemmy.world · 1 year ago

The thing is that the images are used to train a set of weights and biases; the training data isn’t distributed as part of the AI or as part of the software used to generate images.

state_electrician@discuss.tchncs.de · 1 year ago

I don’t see how this is any different than humans copying or being inspired by something. While I hate seeing companies profiting off of the commons while giving nothing of value back, how do you prove that an AI model is using your work in any meaningful or substantial way? What would make me really mad is if this dumb shit leads to even harsher copyright laws. We need less copyright not more.

redditsucks@lemmy.world · 1 year ago

Hope it goes through and sets a president.

fosiacat@lemmy.world · 1 year ago

I think you mean precident

romaselli@lemmy.world · 1 year ago

I think you mean precindent

manitcor@lemmy.intai.tech · 1 year ago

I think you mean pragernat

Noetic97@lemmy.world · 1 year ago

I think you meant prednisone

toothpaste_sandwich@feddit.nl · 1 year ago

I think you mean pregnant.

MinusPi (she/they)@pawb.social · 1 year ago

PREGNART!?

𝙚𝙧𝙧𝙚@feddit.win · 1 year ago

Perhaps they meant president 🤔

sorenant@lemmy.world · 1 year ago

Vote Skynet for 2024 Presidential Election, the efficient choice!

errer@lemmy.world · 1 year ago

And give up billions in inflated stock valuations? Not a chance!

ChrisLicht@lemm.ee · 1 year ago

precedent

PinkPanther@sh.itjust.works · 1 year ago

Ricky?! Is that you?

magnetosphere@sh.itjust.works · 1 year ago

I can’t speak for others, but I don’t consider posts I made on a website I don’t own to be my property. If anything, it’s amusing to think of my idiotic rants making up a tiny fraction of an AI’s “knowledge”.

CookieJarObserver@sh.itjust.works · 1 year ago

Won’t go anywhere…

ChatGPT in trouble: OpenAI sued for stealing everything anyone’s ever written on the Internet