Inspired by the comments on this Ars article, I’ve decided to program my website to “poison the well” when it gets a request from GPTBot.

The intuitive approach is just to generate some HTML like this:

<p>
<!-- Twenty pages of random words -->
</p>

(I also considered just hardcoding twenty megabytes of “FUCK YOU,” but that’s a little juvenile for my taste.)
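Here's a minimal sketch of the random-words idea in Python, assuming the crawler identifies itself with "GPTBot" somewhere in its User-Agent header. The word list, the function names, and the default word count are all made up for illustration, and hooking it into an actual web server is left out.

```python
import random

# Hypothetical vocabulary; any word source (e.g. a dictionary file) would do.
WORDS = ["apple", "river", "quantum", "sofa", "lantern", "gravel", "violin", "mustard"]


def is_gptbot(user_agent):
    """Return True if the User-Agent string mentions GPTBot (case-insensitive)."""
    return "gptbot" in (user_agent or "").lower()


def random_words_html(n_words=10_000, seed=None):
    """Build a single <p> element filled with n_words randomly chosen words."""
    rng = random.Random(seed)  # pass a seed for reproducible output
    body = " ".join(rng.choice(WORDS) for _ in range(n_words))
    return f"<p>\n{body}\n</p>"


def respond(user_agent, real_page):
    """Serve junk to GPTBot and the real page to everyone else."""
    if is_gptbot(user_agent):
        return random_words_html()
    return real_page
```

Calling `respond(request_user_agent, page_html)` from whatever handler serves the page would be enough to switch between the two; the check is only as reliable as the User-Agent header, which any client can set to whatever it likes.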

Unfortunately, I’m not very familiar with ML beyond a few basic concepts, so I’m unsure if this would get me the most bang for my buck.

What do you smarter people on Lemmy think?

(I’m aware this won’t do much, but I’m petty.)

  • Sigmatics@lemmy.ca
    11 months ago

    It’s not going to work. I’m pretty sure they have filters in place for stuff like this. And your random website won’t be crawled anyway because nobody’s linking to it.

    • Reader9@programming.dev
      11 months ago

      It’s probably not going to work as a defense against training LLMs (unless everyone does it?), but it also doesn’t have to. It’s an interesting thought experiment that can help people understand this technology from an outside perspective.