sabreW4K3@lazysoci.al to Technology@beehaw.org · English · 1 day ago

The Open-Source Software Saving the Internet From AI Bot Scrapers

www.404media.co

  • cross-posted to:
  • [email protected]
Anubis, which blocks AI scrapers from scraping websites to death, has been downloaded almost 200,000 times.
  • who@feddit.org · English · 8 hours ago

    Interesting. Judging by that option’s name, it seems to refer to use of the HTML <meta> tag to refresh a page.

    https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv

    Neither this tag nor its use for refreshing a page is new. I don't think I've seen it used to detect bots, though. I wonder what Anubis is doing here.

    • JohnEdwa@sopuli.xyz · 3 hours ago

      It's simply checking whether the connection comes from an actual browser, since a scraper pretending to be one won't actually refresh the page as instructed. It will buy some time, but like the rest of Anubis, it will only work until the scrapers are modified to work around it.
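The thread doesn't say how Anubis actually implements its meta-refresh check. As a rough illustration of the idea described in the reply, here is a minimal Python sketch, assuming a server that answers a first request with a page containing `<meta http-equiv="refresh">` pointing at a signed, timestamped URL, and only accepts the follow-up request if the signature is valid and it arrives after the refresh delay. The function and constant names (`challenge_page`, `verify`, `REFRESH_DELAY`, `MAX_AGE`), the HMAC token scheme, and binding the token to the client IP are all hypothetical choices for this sketch, not Anubis's code.

```python
import hashlib
import hmac
import html
import secrets
import time
from typing import Optional

# Hypothetical per-server secret used to sign challenge URLs.
SECRET_KEY = secrets.token_bytes(32)
# Hypothetical values: seconds the page tells the browser to wait,
# and how long a signed URL stays valid.
REFRESH_DELAY = 2
MAX_AGE = 60


def make_token(client_ip: str, issued_at: int) -> str:
    """Sign (client IP, issue time) so the redirect URL can't be forged."""
    msg = f"{client_ip}:{issued_at}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def challenge_page(client_ip: str, path: str, now: Optional[int] = None) -> str:
    """Return HTML that asks a real browser to re-request `path` after a delay.

    Assumes `path` has no query string of its own.
    """
    issued_at = int(time.time()) if now is None else now
    sig = make_token(client_ip, issued_at)
    target = f"{path}?t={issued_at}&sig={sig}"
    # Escape for use inside an HTML attribute ("&" becomes "&amp;").
    safe_target = html.escape(target, quote=True)
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{REFRESH_DELAY}; url={safe_target}">'
        "</head><body>Checking your browser...</body></html>"
    )


def verify(client_ip: str, issued_at: int, sig: str, now: Optional[int] = None) -> bool:
    """Accept the follow-up request only if it is signed and arrived on time."""
    current = int(time.time()) if now is None else now
    expected = make_token(client_ip, issued_at)
    if not hmac.compare_digest(expected, sig):
        return False
    elapsed = current - issued_at
    # Too fast suggests the refresh delay was skipped; too slow means expired.
    return REFRESH_DELAY <= elapsed <= MAX_AGE
```

A client that ignores the `<meta>` refresh never requests the signed URL, so it never passes `verify`. As the reply points out, though, a scraper can simply be changed to parse and follow the refresh, so a check like this only raises the bar rather than stopping bots outright.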

Technology@beehaw.org

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

  • Free and Open Source Software
  • Programming
  • Operating Systems

This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
