Once, I encountered the funny story of an AI image descriptor with a sheep obsession. It had been trained on pictures of fields of sheep, so it tagged anything in a field as 'sheep', including an empty field. These systems work on statistical probability, so it thinks "ah, a field! there's probably a sheep here." (It's a bit more complicated but basically that.) It also couldn't recognise sheep in places that weren't fields, such as petrol stations or barns.
Now, the alarming aspect of this story is that the very same technology is probably what Tumblr is using to identify porn. If it can’t tell that an empty field is not, in fact, full of sheep, what hope do we have that it can tell an empty room isn’t full of writhing human forms engaged in passionate coitus?
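For anyone who wants to see the sheep problem in miniature, here is a toy sketch. It assumes a deliberately naive tagger that only ever looks at the background of a picture; the names `train` and `predicts_sheep` and the scene labels are made up for illustration. Real image taggers are far more complicated, but the shortcut they can fall into is similar in spirit.

```python
from collections import defaultdict

# Hypothetical training set: (background, sheep_present).
# Every field photo happens to contain sheep; nothing else does.
TRAINING = [
    ("field", True), ("field", True), ("field", True), ("field", True),
    ("street", False), ("kitchen", False), ("barn", False), ("petrol station", False),
]

def train(examples):
    """Estimate P(sheep | background) by simple counting."""
    counts = defaultdict(lambda: [0, 0])  # background -> [sheep photos, all photos]
    for background, has_sheep in examples:
        counts[background][0] += int(has_sheep)
        counts[background][1] += 1
    return {bg: sheep / total for bg, (sheep, total) in counts.items()}

def predicts_sheep(model, background):
    """Tag 'sheep' when the background alone makes sheep look likely."""
    return model.get(background, 0.0) > 0.5

model = train(TRAINING)
empty_field_tagged = predicts_sheep(model, "field")   # photo contains no sheep
sheep_in_barn_tagged = predicts_sheep(model, "barn")  # photo does contain sheep
print(empty_field_tagged, sheep_in_barn_tagged)
```

The model has learned "field means sheep", so the empty field gets tagged and the sheep in the barn gets missed.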
this really does sound like an episode of black mirror
But wait, it’s even weirder than that!
This is gonna produce some absolutely baffling pornography.
…. oh my fucking god they actually are using open source software. They’re using a fucking one-layer unidirectional bicategory tag-trained neural network. This will never work. Literally, it will never work. There’s just not enough algorithmic complexity to do what they’re asking of it. I bet you I could prove on a mathematical level that this joke of a neural net fundamentally lacks the abstraction necessary to do its job.
This will never get better. Their algorithm will never stop fucking up, it will never actually flag porn reliably and it will always require a massive quantity of human hours to deal with the deluge of mistagged pictures. This isn’t just a case of an insufficiently trained algorithm, it’s just … this is the most basic neural network you can make. It probably has a lot of neurons and has loads of training data but like … you can’t just brute force this kind of stuff. One layer of neurons is just Not Enough.
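A minimal sketch of the textbook version of this limitation, assuming "one layer" means a network with no hidden layer (a single weighted sum and a threshold). It says nothing about what Tumblr actually runs; the function names are invented for the example. A perceptron like this learns AND, which a straight line can separate, but can never get XOR fully right, no matter how long it trains.

```python
# Truth tables: AND is linearly separable, XOR is not.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def predict(weights, bias, x):
    """Single layer: one weighted sum followed by a hard threshold."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(data, epochs=50, lr=1):
    """Classic perceptron learning rule; there is no hidden layer."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def accuracy(data, weights, bias):
    """Fraction of rows in the truth table that the model gets right."""
    hits = sum(predict(weights, bias, x) == target for x, target in data)
    return hits / len(data)

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print(and_acc, xor_acc)
```

No choice of weights gets all four XOR rows right, so more neurons in that single layer or more training data does not help; only extra layers do.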
Also, just to make this clear, Tumblr lied. I mean, we already know this, but I mean they liiiieeeeed. All that stuff they promised about what would or would not be censored? That cannot be delivered on with a system this simple. Nude classical sculptures, political protests, male-presenting nipples (really Tumblr?), nude art outside the context of sex, all that? You cannot train a bicategory one-layer neural network to exclude those things. It cannot be done. Tumblr never intended for those things to actually be permitted, they were just lying. Because the system they have cannot actually do what they said it would and never will be able to.
Also, this kind of system is super vulnerable to counter-neural strategies. I bet you before the end of the month someone hooks up their own open source one layer bicategory neural network which puts an imperceptible (to humans) layer of patterned static over arbitrary images, and trains it by having it bot-post static-ed images to Tumblr and reinforcing based on whether the images are labeled nsfw or sfw. Seriously, within a month someone will have an input-output machine which can turn any image ‘sfw’ in Tumblr’s eyes.
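A toy sketch of the general idea, under heavy assumptions: the "classifier" here is a made-up linear function (`is_flagged`), the image is built to sit right next to its decision boundary, and the search is plain random guessing rather than a trained network. The attacker code only ever sees the flagged / not flagged verdict, the same way a bot would by posting images. A real classifier and a real image would be much harder to beat than this rigged example.

```python
import random

random.seed(0)    # fixed seed so the run is repeatable
N_PIXELS = 64
EPSILON = 0.02    # maximum change per pixel, tiny next to the 0..1 range

# Hypothetical stand-in for the remote classifier. The attack never reads these.
_SECRET_WEIGHTS = [random.choice((-1.0, 1.0)) for _ in range(N_PIXELS)]

def is_flagged(pixels):
    """Oracle: returns only the verdict, never the score."""
    score = sum(w * (p - 0.5) for w, p in zip(_SECRET_WEIGHTS, pixels))
    return score > 0.0

def make_borderline_image():
    """Grey image nudged just onto the 'flagged' side (setup only, not the attack)."""
    pixels = [0.5] * N_PIXELS
    pixels[0] += 0.01 * _SECRET_WEIGHTS[0]
    return pixels

def find_perturbation(pixels, max_queries=500):
    """Try random +/- EPSILON static until the oracle stops flagging the image."""
    for query in range(1, max_queries + 1):
        candidate = [p + random.choice((-EPSILON, EPSILON)) for p in pixels]
        if not is_flagged(candidate):
            return candidate, query
    return None, max_queries

original = make_borderline_image()
adversarial, queries_used = find_perturbation(original)
print(is_flagged(original), adversarial is not None, queries_used)
```

Every pixel in the result differs from the original by at most `EPSILON`, yet the verdict flips. That is the "imperceptible static" idea in its simplest form.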
This is genuinely pathetic. Like, I have real pity for whoever implemented this, because it’s clear Tumblr doesn’t actually have any engineers with any expertise with machine learning left at all and they foisted the job off on some poor bastard who has no idea what they’re doing and is going to get all kinds of flak for their (perfectly reasonable and predetermined) failure from management.
As has been pointed out before, there are no humans behind this at all. The review process just reruns either the same algorithm or another algorithm. People have posted screenshots showing obviously SFW pictures that were still deemed NSFW on review, despite the fact that any human, no matter how overworked or tired, would have seen that these pictures were not porn.
You serious!? Give it a month and a day and there'll be a tool to steganographically hide tags that are normally blocked in that counter-neural static!
This is disgusting. Yeah, there will be less nsfw out there, but now what slips through the nets will be banal imagery put up by researchers looking to crack a system, or whatever hateful things predators dig up to shock kids on this now kid-friendly platform. I figured Tumblr would finally be axed for not making nearly enough ad revenue, not by a mission so clearly designed from the get-go to make the platform gracelessly go the way of MySpace.