This new data poisoning tool lets artists fight back against generative AI

ElectroVagrant@lemmy.world · 8 months ago

This new data poisoning tool lets artists fight back against generative AI

barsoap@lemm.ee · edit-2 8 months ago

if the only source of input data for a network is subtly corrupted, won’t that guarantee corrupted output as well?

We have to distinguish between different kinds of “corruption”, here. What you seem to be describing is “if we only feed the model data from rule34, will it ever learn proper human anatomy” and the answer is no, it won’t. You’ll have to add data which narrows the range of body proportions from cartoonish to, well, real. That’s an external source of corruption: Feeding it bad data (for your own definition of “bad”). Garbage in, garbage out.

The corruption that these adversarial models are exploiting though is inherent in the model they’re attacking. Take… ropes and snakes and cats (or, generally, mammals). Good example: It is incredibly easy for a cat to mistake a rope for a snake – it looks exactly the same to the first layers of the visual cortex and evolution would rather have the cat jump away as soon as possible than be bitten, and it doesn’t hurt to jump away from a rope (even though the cat might end up being annoyed or ashamed (yes cats can 110% be self-conscious different story)), so when there’s an unexpected wiggly shape the first layers directly tell the motor cortex to move, short-circuiting any higher processing.

That trait has been written into the network by evolution, very similar to how we train AI models – conceptually, that is: In both cases the network gets trained for fitness for a purpose (the implementation details are indeed rather different but also irrelevant):

What those adversarial models do kinda looks like this: Take a picture of a rope. Now randomly shift pixels to make the rope subtly more snake-like until you get your cat to jump as reliably as possible, in as many different situations as possible, e.g. even if they’re expecting it and staring straight at it. Sell the product for a lot of money. People start posting pictures of ropes, rope manufacturers adjust their weaving patterns. Other cats see those pictures and ropes, some jump, and others only feel a bit, or a lot, uneasy. The ones that jump will not be able to procreate, any more, being busy jumping, while the uneasy ones will continue to evolve. After a couple of generations no cat cares about those ropes with shifted pixels any more.

Whether that trains general immunity against adversarial attacks – I wouldn’t be so sure. It very likely will make the rope/snake distinction more accurate. But even if it doesn’t build general immunity, it’s an eternal cat and mouse game and no artist will be willing to continue paying for that kind of software when it’s going to get defeated within days, anyway, because that’s just how fast we can evolve models.

Oh. Back to the definition of corruption: If all the pictures of rope that our models ever see have shifted pixels then it’s just going to assume that is the norm, and distinguish it from snakes because the tags say “rope” in one case, and “snake” in the other. The original un-shifted pictures probably won’t be an adversarial attack because they’re not a product of trying to get cats to jump.