this post was submitted on 03 Dec 2025

Meta (slrpnk.net)

Here we can discuss anything about this Lemmy instance/server itself.

Our XMPP support chat: Movim or XMPP client.

Please also refer to our Wiki

founded 4 years ago

I was wondering about our robots.txt.

I can see the case for it, but I could also see the case for allowing at least Google to index the site.

Has there been some discussion about this previously?

top 9 comments
umbra@slrpnk.net 0 points 5 months ago (1 children)

I can see the case for it, but I could also see the case for allowing at least Google to index the site.

Googlebot is the user agent for search results, and it isn't blocked here.

This looks like an anti-AI robots.txt, which makes sense given the instance. It's meant to prevent bots from scraping data to train AI models.

sam_uk@slrpnk.net 0 points 5 months ago* (last edited 5 months ago) (1 children)

At the bottom:

```
User-agent: *
Disallow: /
```

This blocks all scrapers and supersedes the text above, AFAIK.
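
One nuance on how the wildcard works: under the robots.txt standard (RFC 9309), a crawler obeys the group whose `User-agent` line matches its product token and falls back to the `User-agent: *` group only when no group matches it. So the wildcard doesn't override the named groups above it, but it does catch Googlebot if Googlebot isn't named anywhere. A minimal sketch using Python's standard `urllib.robotparser`; the file contents are a hypothetical excerpt (GPTBot and CCBot stand in for the instance's real list), not slrpnk.net's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: a couple of AI crawlers named explicitly,
# followed by a catch-all group at the bottom.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /
"""

# The same named groups without the trailing wildcard, for comparison.
WITHOUT_WILDCARD = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str = "https://slrpnk.net/") -> bool:
    """Return True if `agent` (a crawler product token) may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "Googlebot"):
    # GPTBot matches its own group; Googlebot matches no named group,
    # so only the "*" group (if present) applies to it.
    print(agent,
          "with wildcard:", can_crawl(ROBOTS_TXT, agent),
          "| without:", can_crawl(WITHOUT_WILDCARD, agent))
```

Googlebot comes out blocked only in the version that has the wildcard group, which is the point made above.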

umbra@slrpnk.net 0 points 5 months ago

Doh, I missed that 😅

poVoq@slrpnk.net 0 points 5 months ago* (last edited 5 months ago) (1 children)

At this point we try to block pretty much everything even remotely related to AI companies.

Soon we will probably have to block Chrome browsers as well, once they start being used to scrape websites without their users knowing (yes, that is why AI companies started making their own browsers, and Mozilla is planning the same, proudly proclaiming how "stealthy" they can be with that).

Google search results have become so useless that I see little point left in trying to accommodate their search bot 🤷

Yes I am bitter and can't wait for the AI bubble to pop.

sam_uk@slrpnk.net 0 points 5 months ago (1 children)

It's any day now, I think. EU pension funds are moving out: https://www.removepaywall.com/search?url=https%3A%2F%2Fwww.ft.com%2Fcontent%2F9d90d557-48e5-4f4b-a927-88071cef8ea9

Would you be up for re-enabling Google indexing? It is crappy, but still...

poVoq@slrpnk.net 0 points 5 months ago (1 children)

Not very motivated, but I can look into it.

sam_uk@slrpnk.net 0 points 5 months ago (1 children)

I think it would just be:

```
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
```
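
A quick way to sanity-check that proposal before deploying it, again as a sketch with Python's `urllib.robotparser` (the post URL is a made-up example). Because a crawler picks the group that names it regardless of where that group appears in the file, `User-agent: Googlebot` still applies to Google even though it comes after the wildcard. A blank line between groups is conventional and easier to read, but Python's parser doesn't require it here:

```python
from urllib.robotparser import RobotFileParser

# The snippet proposed above: block everyone, then allow Googlebot.
PROPOSED = """\
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str = "https://slrpnk.net/post/1") -> bool:
    """Return True if `agent` (a crawler product token) may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "GPTBot"):
    # Googlebot matches its own group; GPTBot falls back to the "*" group.
    print(agent, "allowed" if can_crawl(PROPOSED, agent) else "blocked")
```

Pass the bare product token (`Googlebot`), not a full browser-style user-agent string: `urllib.robotparser` only looks at the text before the first `/`, so a string starting with `Mozilla/5.0` would not match the Googlebot group and would fall through to the wildcard.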
poVoq@slrpnk.net 0 points 5 months ago* (last edited 5 months ago)

OK, I've tried to allow-list some search engine spiders in the robots.txt; however, they will probably still just run into the AI scraper block if they act too shady.

But honestly, I highly doubt we will get much traffic from Google search. It's completely gone to shit these days.
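
For allow-listing several spiders at once, RFC 9309 lets multiple `User-agent` lines share one group. The crawler names below (Googlebot, Bingbot, DuckDuckBot) are only an illustrative choice, not necessarily the list now in slrpnk.net's robots.txt. Keep in mind that robots.txt is purely advisory: it tells well-behaved crawlers what is welcome and enforces nothing, which is why it sits alongside the separate AI scraper block mentioned above. A minimal check with Python's `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Illustrative allow-list: three search engine crawlers share one group,
# and every other agent falls through to the catch-all block.
ALLOW_LIST = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOW_LIST.splitlines())


def allowed(agent: str, url: str = "https://slrpnk.net/") -> bool:
    """Return True if the crawler product token `agent` may fetch `url`."""
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Bingbot", "DuckDuckBot", "GPTBot", "CCBot"):
    print(agent, "allowed" if allowed(agent) else "blocked")
```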

Nemo@slrpnk.net 0 points 5 months ago

There has, and IIRC it's to help prevent scraping.