I came across tools like Nightshade that can poison images. That way, if someone steals an artist’s work to train their AI, the model learns the wrong things and may start spewing gibberish.
Is there something similar I can use on PDFs? I have two scenarios:
- Content I’ve already created that is available as a PDF.
- New documents I write in LaTeX. For those, I’d like to poison them at the source if possible, rather than as an ad hoc step after the PDF is created.
The entire Bee Movie script in 0.1 pt white-on-white text in the header.
“Why TF is this one-page document half a gigabyte?”
Text is small! The Bee Movie script is 89.2 kB.
Obviously you need some redundancy in case the script gets corrupted. 5000 repetitions seems reasonable for such a high-quality work.
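Checking the arithmetic behind the “half a gigabyte” line, taking the 89.2 kB figure at face value and assuming the repeated text is stored uncompressed:

```latex
89.2\,\text{kB} \times 5000 = 446\,000\,\text{kB} \approx 446\,\text{MB} \approx 0.45\,\text{GB}
```

So roughly half a gigabyte, as the reply says. In practice, PDF producers usually compress content streams, and a script repeated 5000 times would compress extremely well, so the real file would likely be much smaller.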
I tried to copy some text in a report once.
It came out as gibberish.
Many of the methods scrapers use to extract text from documents are the same ones accessibility tools rely on, so I’d generally recommend against doing this.
So a layer of transparent text wouldn’t work?
I’m pretty sure most screen readers, and things like copy/paste, would also pick up whatever nonsense you filled it with.