mouthporn.net
#artificial intelligence – @silver-horse on Tumblr

@silver-horse / silver-horse.tumblr.com

M (she/her) video game blog 18+ I make gifs and mildly entertaining memes and shitposts
leidensygdom

Please be aware that the "opt-out" choice is just a way to try to appease people. Tumblr has not been transparent about when data has been sold and shared with AI companies, and there are sources that confirm that data had already been shared before the toggle was even provided to users.

Also, it seems to include data they should not have been able to give under any circumstance, including that of deactivated blogs, private messages and conversations, stuff from private blogs, and so on.

Do not believe that AI companies will "honor" the opt-out request retroactively. Once they've got their hands on your data (and they have), they won't be "honoring" an opt-out option retroactively. There is no way to confirm or deny what data they have: the fact that they are completely opaque about what they currently "own" and have means that they can do whatever they want with it. How can you prove they have your data if they don't give everyone free access to see what they've stolen already?

So, yeah, opt out of data sharing, but be aware that this isn't stopping anyone from taking your data. They had already been taking it before you were given that option. Go to Tumblr's Support and leave your feedback on this (politely, but firmly; not everyone in the company is responsible for this).

Finally: Opt out is not good under any circumstance. Deactivated people can't opt out. People who have lost their passwords can't opt out. People who can't access internet or computers can't opt out. People who had their content reposted can't opt out. Dead people can't opt out. When DeviantArt released their AI image generator, saying that it wasn't trained on people who didn't consent to it, it was proven it could easily replicate the styles of people who had passed away, as seen here. So, yeah. AI companies cannot be trusted to have any sort of respect for people's data and content, because this entire thing is just a data laundering scheme.

Please do reblog for awareness.

jv

Well, see you, friends

https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/

Don't go after staff members because of this; from what I know, they weren't even informed until later in the process. You know who this comes from.

eriyu

full article, for those who don't want to sign up for an account:

Tumblr and Wordpress are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge about the deals and internal documentation referring to the deals. 

The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed by 404 Media make clear that deals between Automattic, the platforms’ parent company, and OpenAI and Midjourney are imminent.

The internal documentation details a messy and controversial process within Tumblr itself. One internal post made by Cyle Gage, a product manager at Tumblr, states that a query made to prepare data for OpenAI and Midjourney compiled a huge number of user posts that it wasn’t supposed to. It is not clear from Gage’s post whether this data has already been sent to OpenAI and Midjourney, or whether Gage was detailing a process for scrubbing the data before it was to be sent. 

Gage wrote:

“the way the data was queried for the initial data dump to Midjourney/OpenAI means we compiled a list of all tumblr’s public post content between 2014 and 2023, but also unfortunately it included, and should not have included:

- private posts on public blogs
- posts on deleted or suspended blogs
- unanswered asks (normally these are not public until they’re answered)
- private answers (these only show up to the receiver and are not public)
- posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
- content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with this-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.”

Gage’s post makes clear that engineers are working on compiling a list of post IDs that should not have been included, and that password-protected posts, DMs, and media flagged as CSAM and other community guidelines violations were not included.

Automattic plans to launch a new setting on Wednesday that will allow users to opt out of data sharing with third parties, including AI companies, according to the source, who spoke on the condition of anonymity, and internal documents. A new FAQ section we reviewed, titled “What happens when you opt out?”, states that “If you opt out from the start, we will block crawlers from accessing your content by adding your site on a disallowed list. If you change your mind later, we also plan to update any partners about people who newly opt-out and ask that their content be removed from past sources and future training.” 

404 Media has asked Automattic how it accidentally compiled data that it shouldn’t share, and whether any of that content was shared with OpenAI, but did not immediately hear back from the company. 404 Media asked Automattic about an imminent deal with Midjourney last week but did not hear back then, either.

Another internal document shows that, on February 23, an employee asked in a staff-only thread, “Do we have assurances that if a user opts out of their data being shared with third parties that our existing data partners will be notified of such a change and remove their data?”

Andrew Spittle, Automattic’s head of AI, replied: “We will notify existing partners on a regular basis about anyone who's opted out since the last time we provided a list. I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point. I don't think they gain much overall by retaining it.” Automattic did not respond to a question from 404 Media about whether it could guarantee that people who opt out will have their data deleted retroactively.

News about a deal between Tumblr and Midjourney has been rumored and speculated about on Tumblr for the last week. Someone claiming to be a former Tumblr employee announced in a Tumblr blog post that the platform was working on a deal with Midjourney, and the rumor made it onto Blind, an app for verified employees of companies to anonymously discuss their jobs. 404 Media has seen the Blind posts, in which what seems like an Automattic employee says, “I'm not sure why some of you are getting worked up or worried about this. It's totally legal, and sharing it publicly is perfectly fine since it's right there in the terms & conditions. So, go ahead and spread the word as much as you can with your friends and tech journalists, it's totally fine.”

Separately, 404 Media viewed a public, now-deleted post by Gage, the product manager, where he said that he was deleting all of his images off of Tumblr, and would be putting them on his personal website. A still-live post says, “i've deleted my photography from tumblr and will be moving it slowly but surely over to cylegage.com, which i'm building into a photography portfolio that i can control end-to-end.” At one point last week, his personal website had a specific note stating that he did not consent to AI scraping of his images. Gage’s original post has been deleted, and his website is now a blank page that just reads “Cyle.” Gage did not respond to a request for comment from 404 Media. 

Several online platforms have made similar deals with AI companies recently, including Reddit, which entered into an AI content licensing deal with Google and said in its SEC filing last week that it’s “in the early stages of monetizing [its] user base” by training AI on users’ posts. Last year, Shutterstock signed a six year deal with OpenAI to provide training data.

OpenAI and Midjourney did not respond to requests for comment. 

veilofmist

who would even want to opt in anyway?

Like great on giving the option i guess but as soon as it drops I can't imagine that anyone is going to pick any other option but opt out.

The rumour mill among ex-employees says they already sent the data to OpenAI, but I can't validate whether that's accurate or not

I feel like if there is/will be an opt out it violates the spirit of the option to not have even warned users this was coming.

Yeah that's a pure damage control setting. Once they have sent your data to OpenAI and Midjourney, you opting out won't take your posts out of the LLMs. It's shit and they know it.

Also, given how reblogs work on Tumblr, I suspect even if you opt out, your new posts may end up being sent anyway if someone who hasn't opted out reblogs you

rainbowsky

There are some options for protecting your art and fighting back to some degree.

But it sounds like the horse has already left the barn...

xipiti

This makes me just sick to my stomach. Tumblr was the only refuge left where I could post art and photos and have an actual chance for real people to see them and sometimes share and comment on them. No algorithm-fighting, no popularity boosting; if you made it and tagged it someone would find it. There’s nowhere else like this anymore.

This is such a betrayal of everything we’ve all worked so hard to make. It’s not your work to sell. (I won’t even start about how much here is fanart or fanfic. The point is it’s not yours.)

It certainly lends a large aroma of bullshit to that ‘oh we’re all good, the site will be in maintenance mode but that just means we’re refocusing on what works best for you folks’ statement. Were you already in the process of selling us to the robot farms? Is that why everything suddenly got all rosy again?

Even now if I glaze/shade every piece of art I’ve blogged does it even matter? Every previous reblog is an opportunity for unprotected scraping.

I guess I’ll still be here but I’m not sure what will happen to my art and photo sideblogs. Obviously I’ve toggled the toggle. But if there’s no meaningful assurance that reblogs are protected or if everything has already been sold I guess I’ll take them private or delete them. This sucks so very much y’all. I’m paying the damn subscription even, though I know there are no bounds on driving billionaires’ profits even if everyone subscribed. I know we’d still have been sold the instant the offer was made.

I just wanted to be able to make things and show them to people. Not have them ripped off and ground up to generate fungus in a Petri dish.

this should have been pretty obvious after they pulled everyone from Tumblr staff. Matt was looking for an exit strategy, make whatever he could off tumblr and then burn it down. and if no one will post art or anything anymore, tumblr dies.

Honestly? This is exactly what I think. Matt knows that this move puts Tumblr at existential risk. But OpenAI and Midjourney will probably pay enough to recover the money he has spent on Tumblr in the last 5 years.

copperbadge

AI Scraping Isn't Just Art And Fanfic

Something I haven't really seen mentioned and I think people may want to bear in mind is that while artists are the most heavily impacted by AI visual medium scraping, it's not like the machine knows or cares to differentiate between original art and a photograph of your child.

AI visual media scrapers take everything, and that includes screengrabs, photographs, and memes. Selfies, pictures of your pets and children, pictures of your home, screengrabs of images posted to other sites -- all of the comic book imagery I've posted that I screengrabbed from digital comics, images of tweets (including the icons of peoples' faces in those tweets) and instas and screengrabs from tiktoks. I've posted x-ray images of my teeth. All of that will go into the machine.

That's why, at least I think, Midjourney wants Tumblr -- after Instagram we are potentially the most image-heavy social media site, and like Instagram we tag our content, which is metadata that the scraper can use.

So even if you aren't an artist, unless you want to Glaze every image of any kind that you post, you probably want to opt out of being scraped. I'm gonna go ahead and say we've probably already been scraped anyway, so I don't think there's a ton of point in taking down your tumblr or locking down specific images, but I mean...especially if it's stuff like pictures of children or say, a fundraising photo that involves your medical data, it maybe can't hurt.

If you do want to officially opt out, which may help if there's a class-action lawsuit later, you're going to want to go to the gear in the upper-right corner on the Tumblr desktop site, select each of your blogs from the list on the right-hand side, and scroll down to "Visibility". Select "Prevent third party sharing for [username]" to flip that bad boy on.

(If someone wants to post a link in notes to instructions for doing it on the app, I haven't updated mine so the option doesn't appear and I don't know where to find it.)

sharkface

They are already selling data to midjourney, and it's very likely your work is already being used to train their models because you have to OPT OUT of this, not opt in. Very scummy of them to roll this out unannounced.

writterings

here's some instructions for anyone who doesn't know how to opt out:

  1. log in on desktop, it's not available on mobile yet
  2. click "Account"
  3. click on your blog
  4. go to "Blog Settings"
  5. go to "Visibility"
  6. Scroll down to the bottom option
  7. turn the toggle ON, not off

you will have to do this individually for each sideblog you have too, there's no way to do it for the whole account in one go

staff

Hi, Tumblr. It’s Tumblr. We’re working on some things that we want to share with you. 

AI companies are acquiring content across the internet for a variety of purposes in all sorts of ways. There are currently very few regulations giving individuals control over how their content is used by AI platforms. Proposed regulations around the world, like the European Union’s AI Act, would give individuals more control over whether and how their content is utilized by this emerging technology. We support this right regardless of geographic location, so we’re releasing a toggle to opt out of sharing content from your public blogs with third parties, including AI platforms that use this content for model training. We’re also working with partners to ensure you have as much control as possible regarding what content is used.

Here are the important details:

  • We already discourage AI crawlers from gathering content from Tumblr and will continue to do so, save for those with which we partner. 
  • We want to represent all of you on Tumblr and ensure that protections are in place for how your content is used. We are committed to making sure our partners respect those decisions.
  • To opt out of sharing your public blogs’ content with third parties, visit each of your public blogs’ blog settings via the web interface and toggle on the “Prevent third-party sharing” option. 
  • For instructions on how to opt out using the latest version of the app, please visit this Help Center doc. 
  • Please note: If you’ve already chosen to discourage search crawling of your blog in your settings, we’ve automatically enabled the “Prevent third-party sharing” option.

If you have concerns, please read through the Help Center doc linked above and contact us via Support if you still have questions.
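
For anyone wondering what "discouraging" a crawler can look like in practice: the post doesn't say how Tumblr does it, so this is only a minimal sketch, assuming the common robots.txt approach. The rules below are made up for illustration (`GPTBot` is the user agent OpenAI publishes for its crawler; `example.tumblr.com` and the `can_fetch` helper are hypothetical). It uses Python's standard `urllib.robotparser` to check which agents the rules would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, NOT Tumblr's actual file:
# turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.tumblr.com/post/1"
    print("GPTBot allowed: ", can_fetch("GPTBot", page))
    print("Browser allowed:", can_fetch("Mozilla", page))
```

Keep in mind that robots.txt is a request, not a lock: it only works if the crawler chooses to respect it, and it does nothing about data that has already been collected or handed over directly.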

pixelsmasher

@staff opt out isn't good enough, it needs to be off by default, we went through this with DeviantArt. In Gardner v. Nike, Inc., 279 F. 3d 774 (9th Cir. 2002), the Court found that an exclusive licensee whose license was silent as to a right of assignment could not assign that license without the copyright holder's express consent.

While you can sell SOME metadata, you can't sell the works, images, or text of users (including for AI training), or works they post which you don't know for a fact are even theirs. It immediately violates their sovereignty over their copyright.

To keep it simple, your agreement is limited to showing works for the purpose of showing works, promoting the site, or the users, that is all. You can't repurpose their copyright.

staff

Hi, Tumblr. It’s Tumblr. We’re working on some things that we want to share with you. 

AI companies are acquiring content across the internet for a variety of purposes in all sorts of ways. There are currently very few regulations giving individuals control over how their content is used by AI platforms. Proposed regulations around the world, like the European Union’s AI Act, would give individuals more control over whether and how their content is utilized by this emerging technology. We support this right regardless of geographic location, so we’re releasing a toggle to opt out of sharing content from your public blogs with third parties, including AI platforms that use this content for model training. We’re also working with partners to ensure you have as much control as possible regarding what content is used.

Here are the important details:

  • We already discourage AI crawlers from gathering content from Tumblr and will continue to do so, save for those with which we partner. 
  • We want to represent all of you on Tumblr and ensure that protections are in place for how your content is used. We are committed to making sure our partners respect those decisions.
  • To opt out of sharing your public blogs’ content with third parties, visit each of your public blogs’ blog settings via the web interface and toggle on the “Prevent third-party sharing” option. 
  • For instructions on how to opt out using the latest version of the app, please visit this Help Center doc. 
  • Please note: If you’ve already chosen to discourage search crawling of your blog in your settings, we’ve automatically enabled the “Prevent third-party sharing” option.

If you have concerns, please read through the Help Center doc linked above and contact us via Support if you still have questions.

If you are an artist and/or a writer, you need to opt out of this IMMEDIATELY. Tumblr buried it in the fine print here, but the AI platforms they plan to partner with are none other than fucking MidJourney and OpenAI. Meaning, unless you opt out of it, Tumblr is actively selling any creative work you have ever posted here to generative AI data collectors for their datasets.

And by the way, staff? The fact that this is an opt-out system and not an opt-in is shady as hell.

pukicho

Hey be sure to go to your blog settings, head down to visibility and turn on this little button that prevents Tumblr from stealing your posts and using it to train AI learning models. Good job, fuckheads, great update.

demilypyro

For the record, I would never knowingly use or share AI generated art in anything I post, so if you ever catch me doing so, it was an accident, and I'd like you to let me know so I can delete it.

tbposting

Yeah, this ^

I've fallen for it a couple of times already, and been grateful when people let me know.
