rollenspiel.social is one of many independent Mastodon servers you can use to participate in the Fediverse.
rollenspiel.social is provided by RollenspielMonster. We offer a place for roleplaying, pen & paper, tabletop, TCG, and much more. The primary language is German.

Looking at the server logs for 2025-01-01:

Hits   Hits %  Visitors  Visitors %  Transferred  User agent
37769  86.15%  4         0.82%       311.75 MiB   GPTBot/1.2

(and counting)

I wondered why goaccess log parsing was so slow. Then I removed --ignore-crawlers.

I do not have 37,000 articles on my websites. That thing is completely crazy.

799,683 requests to 1w6.org and draketo.de in 5 days, and 1 hour of processing time just to evaluate the logs:

Hits    Hits %  Visitors  Visitors %  Transferred  User agent
532335  49.46%  12        0.06%       17.91 GiB    GPTBot/1.2
57612   5.35%   15        0.07%       486.08 MiB   AhrefsBot/7.0
56898   5.29%   5         0.02%       3.05 GiB     Scrapy/2.11.2
21596   2.01%   90        0.44%       1.18 GiB     MJ12bot/v1.4.8

Half a million requests just from GPTBot, for sites with ~1000 articles. That thing’s a menace.
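To put those numbers in proportion, here is a quick back-of-the-envelope calculation. It assumes the GPTBot row covers the same 5-day window mentioned above and uses the rough figure of 1000 articles; both are approximations from the posts, not measured values.

```python
# GPTBot's hit count from the goaccess table above.
GPTBOT_REQUESTS = 532_335
DAYS = 5          # assumption: the table covers the 5-day window from the post
ARTICLES = 1_000  # rough article count ("~1000 articles")

SECONDS_PER_DAY = 24 * 60 * 60


def per_article(requests: int, articles: int) -> float:
    """Average number of requests per article."""
    return requests / articles


def per_second(requests: int, days: float) -> float:
    """Average request rate over the whole window."""
    return requests / (days * SECONDS_PER_DAY)


print(f"{per_article(GPTBOT_REQUESTS, ARTICLES):.0f} requests per article")  # → 532 requests per article
print(f"{per_second(GPTBOT_REQUESTS, DAYS):.2f} requests per second")        # → 1.23 requests per second
```

A steady rate of about one request per second is trivial for a web server to serve, which fits the later observation that the sites stayed up; the problem is the sheer volume piling up in the logs.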

@ArneBab
I'm contemplating building an LLM trap: some hundreds or thousands of pre-generated fake articles, to which the server redirects these bots.
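The server-side redirection could look roughly like this minimal sketch, assuming the site runs as a Python WSGI app. `BOT_MARKERS`, `trap_middleware`, and `fake_article` are hypothetical names, and the marker list simply reuses crawler names from the logs above.

```python
from typing import Callable, Iterable

# Hypothetical list of user-agent substrings to divert into the trap.
BOT_MARKERS = ("GPTBot", "Scrapy", "AhrefsBot", "MJ12bot")


def is_listed_bot(user_agent: str) -> bool:
    """True if the user agent contains any of the listed crawler names."""
    return any(marker in user_agent for marker in BOT_MARKERS)


def trap_middleware(site: Callable, trap: Callable) -> Callable:
    """WSGI middleware: listed bots get the trap app, everyone else the real site."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_listed_bot(environ.get("HTTP_USER_AGENT", "")):
            return trap(environ, start_response)
        return site(environ, start_response)
    return app


def fake_article(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # Placeholder trap: a real one would serve one of the pre-generated fake articles.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Pre-generated fake article goes here.\n"]
```

For local testing, `wsgiref.simple_server.make_server("", 8000, trap_middleware(site, fake_article))` would serve it. Matching on the user agent only catches crawlers that announce themselves honestly.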

The articles could be generated from common texts (Project Gutenberg?), but with words randomly shifted or replaced...
Hm... that would probably get filtered out as random noise.

Or maybe just replace articles, pronouns, and especially numbers in a consistent way, so the changes gain statistical relevance?
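One way to make such replacements consistent is to derive every substitute deterministically: a fixed swap table for articles and pronouns, and a keyed hash for numbers, so the same input word always becomes the same output word across all generated articles. This is a sketch of that idea only; `SWAPS`, `poison`, and the default secret are hypothetical, and a usable swap table would need far more entries.

```python
import hashlib
import re

# Hypothetical fixed swap table for a few articles and pronouns.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his", "a": "the", "the": "a"}


def shifted_number(token: str, secret: str) -> str:
    """Map a number to another number with the same digit count, always the same way for a given secret."""
    digest = hashlib.sha256((secret + token).encode()).hexdigest()
    value = int(digest, 16) % (10 ** len(token))
    return str(value).zfill(len(token))


def poison(text: str, secret: str = "trap-seed") -> str:
    """Consistently replace numbers and swap-table words in text."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        if word.isdigit():
            return shifted_number(word, secret)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word
        # Keep a leading capital letter.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\w+", replace, text)
```

For example, `poison("He saw the cat")` gives `"She saw a cat"`, and every occurrence of the same number maps to the same replacement, so the distortion is systematic rather than random noise.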

@vampirdaddy maybe you should talk to @phryk to exchange ideas about LLM trapping.

@vampirdaddy @phryk Good news: the bot stopped battering my site on the 10th of January.

During the two weeks before that, it sent so many requests to my server that goaccess crashes when processing the logs of the past 14 days.

Big kudos to my host all-inkl for withstanding that attack.

Both sites are still up:

1w6.org
draketo.de

@BlumeEvolution

www.1w6.org: 1w6 | Ein Würfel System - Einfach saubere Regeln (One-Die System: Simply Clean Rules). A lean, flexible, freely licensed roleplaying system that game masters and worldbuilders can easily adapt to the feel of their world.

@ArneBab @vampirdaddy @BlumeEvolution

That reminds me, I still have to do some forensics work.

Turns out most of the requests in the spikes weren't made by clients identifying as GPTBot. There is still some correlation on the time axis I need to look at, but more importantly, I'll have to bin requests by subnet to see if I can attribute the attacks with some modicum of certainty.
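Binning requests by subnet can be sketched with Python's standard `ipaddress` module. This assumes one client IP string per request (for example, the first field of each access-log line); `bin_by_subnet` is a hypothetical helper, IPv6 addresses are simply skipped, and /16 is only a default prefix length.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def bin_by_subnet(ips: Iterable[str], prefix: int = 16) -> Counter:
    """Count requests per IPv4 network of the given prefix length."""
    counts: Counter = Counter()
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # IPv6 would need its own prefix length; skipped in this sketch
        # strict=False masks off the host bits, e.g. 172.68.1.5/16 -> 172.68.0.0/16
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[network] += 1
    return counts
```

Calling `bin_by_subnet(ips).most_common(20)` then lists the busiest networks first; rerunning with `prefix=24` narrows a suspicious /16 down further.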

I've already started writing a little log analyzer for that; maybe it'll grow into something I can actually release.

@ArneBab
One drive-by finding, though: the user agents used definitely form clusters. A solid ~20% of the requests in the spike identified as different versions of MSIE.
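Grouping user agents into coarse families makes clusters like the MSIE share visible. The patterns below are hypothetical and deliberately crude (Chrome user agents also contain `Safari/`, so Chrome is checked first, and many other browsers are lumped into "other"); `ua_family` and `family_shares` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, coarse family patterns; the first match wins, so order matters.
FAMILIES = [
    ("GPTBot", re.compile(r"GPTBot/")),
    ("MSIE", re.compile(r"MSIE \d")),
    ("Chrome", re.compile(r"Chrome/")),
    ("Safari", re.compile(r"Safari/")),
]


def ua_family(user_agent: str) -> str:
    """Return the first matching family name, or "other"."""
    for name, pattern in FAMILIES:
        if pattern.search(user_agent):
            return name
    return "other"


def family_shares(user_agents: Iterable[str]) -> dict[str, float]:
    """Fraction of requests per user-agent family."""
    counts = Counter(ua_family(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}
```

Because `MSIE \d` matches any version number, all the different MSIE versions collapse into one family, which is what makes a cluster like the ~20% share stand out.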

@vampirdaddy @BlumeEvolution

Arne Babenhauserheide

@phryk for me, the two weeks of battering came from GPTBot using 4.227.x.x:
ipinfo.io/ips/4.227.0.0/24

Yep. Phoenix.

Microsoft has been (in-)effectively DoSing my website.
@vampirdaddy @BlumeEvolution


@ArneBab Yeah, anything that identified itself as GPTBot came from 4.227.0.0/16 for me too – but at least in my case, that was just a small fraction of the request spikes I saw.

Currently, I'm only outputting agents/IPs that made at least 10 requests, so I don't get 100k lines of output for a single incident. But even with that limited perspective, I can immediately see a cluster in 172.68.0.0/16, one in 172.71.0.0/16, and possibly another one in 217.113.0.0/16.
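A minimum-request threshold like this can be sketched as a counter over (IP, user agent) pairs. The sketch assumes the common Apache/nginx "combined" log format and ignores escaped quotes inside fields; `COMBINED`, `parse_line`, and `frequent_clients` are hypothetical names, not part of the analyzer mentioned above.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumes the "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def parse_line(line: str) -> Optional[tuple[str, str]]:
    """Extract (ip, user_agent) from one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return (match["ip"], match["ua"]) if match else None


def frequent_clients(lines: Iterable[str], min_requests: int = 10) -> list[tuple[tuple[str, str], int]]:
    """(ip, user_agent) pairs with at least min_requests requests, most frequent first."""
    counts = Counter(filter(None, map(parse_line, lines)))
    return [(key, n) for key, n in counts.most_common() if n >= min_requests]
```

Feeding the surviving IPs into a per-subnet count is then one way to spot clusters like the /16 ranges listed above.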

@vampirdaddy @BlumeEvolution

@phryk today I'm getting these bots:
11454 8.25% SentiBot/1.0
9534 6.87% AhrefsBot/7.0
7022 5.06% Barkrowler/0.9
6316 4.55% Amazonbot/0.1
3560 2.56% SemrushBot/7~bl
3408 2.45% MJ12bot/v1.4.8
3052 2.20% bingbot/2.0
2266 1.63% GPTBot/1.2

@ArneBab Thanks for the data. Hopefully I find the time to add some more stuff to my log analysis tomorrow so I can compare.

From what I've seen so far, GPTBot is the most common UA-string hitting my systems by a large margin.

Like, during the incident I'm using as test data, I got 2795 requests from GPTBot; the second most common UA is Safari with 1594, and after that it drops all the way to 209.

But that means only around 3% of the incident's ~100k requests actually identified as GPTBot.

@ArneBab My hunch here is of course that OpenAI is spoofing agents and routing through a bunch of VPNs to keep crawling services that explicitly don't want them to.
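One hedged way to probe this hunch is to count, per subnet, how many distinct user agents it sends: a network that cycles through many browser strings at high volume is a candidate for agent rotation. This is only a heuristic, since a VPN or CDN exit shared by many real users looks similar. `ua_diversity` and `rotation_candidates` are hypothetical helpers that take `(ip, user_agent)` pairs.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable


def ua_diversity(entries: Iterable[tuple[str, str]], prefix: int = 16) -> dict:
    """Map each IPv4 network (/prefix) to (request count, distinct user agents)."""
    requests: dict = defaultdict(int)
    agents: dict = defaultdict(set)
    for ip, user_agent in entries:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # IPv6 skipped in this sketch
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        requests[network] += 1
        agents[network].add(user_agent)
    return {net: (requests[net], len(agents[net])) for net in requests}


def rotation_candidates(entries: Iterable[tuple[str, str]], min_agents: int = 5) -> list:
    """Networks sending at least min_agents distinct user agents, busiest first."""
    stats = ua_diversity(entries)
    hits = [(net, n, k) for net, (n, k) in stats.items() if k >= min_agents]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

A network that shows up here and also lines up in time with the GPTBot-identified traffic would be the interesting case; the time correlation itself is not covered by this sketch.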

@ArneBab I mean, it could of course be an honest-to-Goddess DDoS.

But if that's the case, it's pretty damn incompetent, as nothing was affected much during multiple incidents across a couple of weeks, even though I'm using just a single lower-tier dedicated machine to host my online infra. That just doesn't track for me.

@phryk same here. Everything kept working. During the onslaught, GPTBot kept requesting more and more crazy parameter combinations from the Drupal site (like repeating the same parameter many times).
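Requests that repeat the same query parameter are easy to flag once the query string is parsed. This sketch uses `urllib.parse` from the standard library; `repeated_params` is a hypothetical helper, and the `/node?...` path in the example is made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def repeated_params(url: str) -> dict[str, int]:
    """Return query parameters that appear more than once in url, with their counts."""
    query = urlsplit(url).query
    counts = Counter(key for key, _ in parse_qsl(query, keep_blank_values=True))
    return {key: n for key, n in counts.items() if n > 1}
```

For example, `repeated_params("/node?page=1&page=1&page=2&sort=asc")` returns `{"page": 3}`, while a normal request returns an empty dict, so a non-empty result can serve as a simple filter over the request paths in the logs.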

That makes me think that they are just incompetent.