Hi, I’m building a personal website and I don’t want it to be used to train AI. In my robots.txt file I blocked:

  • ChatGPT-User
  • GPTBot
  • Google-Extended
  • FacebookBot

Which other bots should I add? Are there any other ways to block AI bots?

IMPORTANT: I don’t want to block search engine crawlers, only bots that are used to train AI.
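For reference, here is a minimal sketch of how a robots.txt like the one described could be generated and sanity-checked with Python's standard `urllib.robotparser`. The list holds only the four bots named in the post; the helper names `build_robots_txt` and `is_allowed` are made up for illustration. Keep in mind that robots.txt only stops crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# The four user agents named in the post (not an exhaustive list of AI bots).
AI_BOTS = ["ChatGPT-User", "GPTBot", "Google-Extended", "FacebookBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text with one 'Disallow: /' group per blocked agent."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # Groups are separated by a blank line.
    return "\n".join(groups)


def is_allowed(robots_txt, user_agent, url="/"):
    """Check whether user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = build_robots_txt(AI_BOTS)

if __name__ == "__main__":
    print(ROBOTS_TXT)
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot"))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot"))
```

Agents without a matching group (such as a regular search crawler) stay allowed, which fits the requirement of not blocking search engines.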

  • hperrin@lemmy.world
    1 year ago

    Pollute your site with nonsense that’s invisible to users. Things like pages that are linked to with invisible links that are just walls and walls of random text.

    • chevy9294@monero.townOP
      1 year ago

      Good idea. I will make an invisible link to “traps for bots”. One trap will show random text, one will be a redirect loop, and one will be a random link generator that links back to itself. I will also make every response randomly slow, for example 0.5 to 1.5 seconds.

      The good thing is that I can block search engine crawlers from accessing just the traps.
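A rough sketch of what the random-text trap, the self-linking page and the random delay could look like in Python. Everything named here is an assumption for illustration: the `/trap/` path, the word list, and the helper names are not from the thread, and the page would still need to be wired into whatever web server the site uses. The robots.txt rule at the end keeps well-behaved crawlers (including search engines) out of the traps, so only bots that ignore robots.txt end up in them.

```python
import random

TRAP_PREFIX = "/trap/"  # hypothetical path, pick whatever fits your site

# Small placeholder vocabulary; a real trap would want a much larger one.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "tempor", "magna", "aliqua"]

# Rule that keeps robots.txt-respecting crawlers away from the traps.
TRAP_ROBOTS_RULE = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"


def random_text(rng, n_words=200):
    """Return n_words random words joined by spaces."""
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


def random_links(rng, n_links=5):
    """Return n_links random URLs that all point back into the trap."""
    return [f"{TRAP_PREFIX}{rng.getrandbits(32):08x}" for _ in range(n_links)]


def trap_page(rng):
    """Build an HTML page of random text plus links to more trap pages."""
    links = "".join(f'<a href="{href}">more</a>\n' for href in random_links(rng))
    return f"<html><body><p>{random_text(rng)}</p>\n{links}</body></html>"


def response_delay(rng):
    """Return a delay in seconds between 0.5 and 1.5."""
    return rng.uniform(0.5, 1.5)


if __name__ == "__main__":
    rng = random.Random()
    print(f"would sleep {response_delay(rng):.2f}s before responding")
    print(trap_page(rng))
```

The server would call something like `time.sleep(response_delay(rng))` before sending `trap_page(rng)` for any URL under the trap path.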

        • Pantherina@feddit.de
          1 year ago

          I don’t think that’s really a big problem. Simply make every keyword useless, and automate the process somehow.

          There should be a tool for this, damn. There is at least one Unicode character that doesn’t even display as a blank in a terminal.

          Modern web crap doesn’t even load without JavaScript or animations anyway, so a bit more HTML won’t hurt.
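A tiny sketch of the keyword idea, assuming the character meant is something like U+200B ZERO WIDTH SPACE (the comment does not say which one). The function names are made up for illustration. Note that a scraper can strip these characters just as easily as this code inserts them.

```python
ZERO_WIDTH_SPACE = "\u200b"  # assumed choice of invisible character


def obfuscate(word):
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(word)


def deobfuscate(text):
    """Remove the zero-width spaces again (what a scraper could do)."""
    return text.replace(ZERO_WIDTH_SPACE, "")


if __name__ == "__main__":
    hidden = obfuscate("keyword")
    print(len("keyword"), len(hidden))      # the string got longer
    print("keyword" in hidden)              # plain substring search fails
    print(deobfuscate(hidden) == "keyword")
```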

    • folkrav@lemmy.ca
      1 year ago

      OP still wants search indexing, in which case it’s a big no-no: search engines can perceive it as spam, and it associates your pages with tons of unrelated keywords.

    • stewsters@lemmy.world
      1 year ago

      That works as long as you do not rely on SEO to get traffic. It has a good chance of affecting how Google sees your site as well.