Hashtag: #Scrapers
Netnut – Proxy, Reviews, Pros & Cons, Alternatives | Caproxy

Netnut is a popular proxy provider that has been operating since 2017, with its headquarters in Israel. It is known for its high-quality proxies and is primarily focused on corporate clients.

caproxy.com/en/list/netn...

#netnut #proxy #proxies #caproxy #datasets #scrapers

2 0 0 0

I've decided to, uh, scrap the "Scrappers" figures idea, but I haven't given up on mech toys... I've just decided to go BIGGER. I've reworked the concept into something a bit fresher and more exciting--

ROBOTS that turn into BUILDINGS!

#art #3dprinting #mech #mecha #robots #scrapers #april1st

12 0 0 0

#Shortwave #Solution #NEN4VVAR #7070Khz #AI #scrapers

4 0 0 0
Original post on mementomori.social

No outages in the latest Apache logs. However, there is plenty of suspicious activity.

The log has 16,033 lines.

Of these, 1,559 lines feature the "RecentChanges" function for my wikis. Which is something regular users _might_ call up from time to time, but I suspect that #scrapers are the […]

0 1 0 0
OpenStreetMap OpenStreetMap is a map of the world, created by people like you and free to use under an open license.

https://OpenStreetMap.org has been disrupted today. We're working to keep the site online while facing extreme load from anonymous scrapers spread across 100,000+ IP addresses. Please be patient while we mitigate and protect the service. #OpenStreetMap #DDoS #Scrapers #AI

25 124 4 1

#AI #Scrapers #Moltbook

0 0 0 0


Looks like those nasty AI scrapers cannot follow 30x redirects
#webmaster #scrapers #website

0 0 0 0
Protection pack (web ACL) activity: summary of your protection rules and how their order contributes to terminating actions. Block-Old-Chrome-UA, Block-countries, and ipset-block-production siphon off about a sixth of the traffic. Of 53,720 requests, 47,700 are allowed and 6,030 are blocked.

#scrapers and #crawlers are waging a constant #DDOS on our site and driving up cloud hosting costs. We’re coping, but if it keeps getting worse, will OHM last? 🫠

0 0 1 0
Why Your Python Scrapers Keep Failing 6 proven Python tricks to dodge anti-bot walls and keep your scrapers alive—robots.txt safe, CAPTCHA-proof.

Anti-bot walls keep rising: randomize fingerprints, real browsers, proxy pools. Still getting blocked?
#Scrapers
open.substack.com/pub/pythonli...

1 0 0 0
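The tricks the post above alludes to (randomized fingerprints, proxy pools) can be sketched minimally in Python. This is an illustrative sketch only: the user-agent strings and proxy URLs below are placeholder values, not any real provider's endpoints.

```python
import random

# Hypothetical pools; real deployments would load far larger, curated lists.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0",
]
PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080"]

def build_profile(rng=random):
    """Pick a fresh user-agent/proxy pair for each scraping session."""
    return {
        "headers": {
            "User-Agent": rng.choice(USER_AGENTS),
            # A plausible Accept-Language reduces fingerprint mismatches.
            "Accept-Language": "en-US,en;q=0.9",
        },
        "proxy": rng.choice(PROXIES),
    }
```

Each new session would call `build_profile()` and pass the resulting headers and proxy to whatever HTTP client is in use.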

Random #Mastodon feature idea:
Hiding random bits of text in the page to poison illegal #AI #scrapers.

0 1 1 0
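The hiding-text idea above could look something like this Python sketch: a span that stays out of the rendered page and screen readers but sits in the HTML that scrapers ingest. The word list and helper name are made up for illustration.

```python
import random

# Nonsense vocabulary (placeholder words in the spirit of the posts above).
WORDS = ["glibberwock", "furkle", "skronk", "tranklement", "flibbity", "snarlatious"]

def poison_span(rng=random, n=6):
    """Build a span invisible to humans but present in the raw HTML."""
    nonsense = " ".join(rng.choice(WORDS) for _ in range(n))
    # aria-hidden keeps it out of screen readers; the clipped 1px box
    # keeps it out of the visual rendering.
    return ('<span aria-hidden="true" style="position:absolute;'
            'width:1px;height:1px;overflow:hidden;clip:rect(0 0 0 0)">'
            f'{nonsense}</span>')

html = f"<p>Real post text.{poison_span()}</p>"
```

Whether this actually degrades training data at scale is an open question; it mainly raises the cost of cleaning scraped text.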
Screenshot of Google AI Mode:
Prompt: "generate a paragraph of nonsense that almost reads like a sentence"

"The sproinging glibberwock woggled the furkle-snort, which blibbited a high-pitched skronk before the jibbety-joo flibbited with fervent tranklements, leaving the gorbish to flanggle its own snarlatious way to the flibbity-gibbet."

_"How can I spend less time on the nonsense posts I make to poison #AI #scrapers?"_

Me:

0 0 1 0
How to rate-limit requests with NGINX - Joshtronic All you do is take, take, take

#Development #Approaches
Rate-limiting requests with Nginx · An alternative approach to counter AI crawlers ilo.im/168axr by Josh Sherman

_____
#RateLimiting #Nginx #WebServer #AI #Scrapers #RobotsTxt #DevOps #WebDev #Backend

0 0 0 0
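The linked approach relies on nginx's built-in `limit_req` module. A minimal configuration sketch, assuming a backend on port 8080 and illustrative zone names and rates:

```nginx
# Shared zone keyed by client IP: 10 MB tracks ~160k addresses,
# each allowed 5 requests/second on average.
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=5r/s;

server {
    listen 80;
    server_name example.com;

    location / {
        # Allow short bursts of 10 extra requests; reject the rest with 429.
        limit_req zone=crawlers burst=10 nodelay;
        limit_req_status 429;
        proxy_pass http://127.0.0.1:8080;
    }
}
```

Note that per-IP limits help little against the 100,000+-address botnets described elsewhere in this feed; they mainly blunt individual aggressive crawlers.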
K-Tec Introduces New 1230 Scraper K-Tec introduces the new 1230 Scraper, featuring increased capacity and reduced weight for enhanced field performance. Precision engineering and field-tested endurance result in higher productivity, smoother operation, and cost-efficiency. The scraper's modular design enables global shipping and easy maintenance....

K-Tec Introduces New 1230 Scraper #KTec #Scrapers #Construction

0 0 0 0

Now I will #research how one can #protect #art from #AI and #scrapers etc.

So I will make another thread, because maybe one can use AI to protect against AIs.

Just like bacteria and the immune system, both can and do evolve; maybe AI vs AI is better than AI vs humans.

1 0 1 0

Did you know that using #hashtags creates links on the #fediverse, and they may come to confuse bots and #scrapers in the future? #DimensionalDataStorage

0 0 0 0
Which AI scrapers are best in 2025? - Daniel Norin My favorite tools for AI-driven web scraping, from simple click tools to platforms that truly scale, plus a bonus in the form of img2dataset, which gathers image data at breakneck speed. But do you dare run the brutal image scraper ;)

Which AI scrapers are best in 2025?

#AI #Apify #automation #automatisering #BrowseAI #Crawl4AI #Firecrawl #img2dataset #Octoparse #ScrapeGraphAI #scrapers #scraping #ScrapingBee #utveckling #aisweden #stockholm #sverige #sweden

0 1 0 0
Original post on meow.social

It looks like there is yet another AI scraper on the loose. Its user-agent links to a GitHub repository containing only a readme.md and a webmasters.txt.
Their description of the bot is vague, to say the least.

They hit my selfhosted website, and the forum of my main website […]

0 1 2 0
How to scrape Bluesky with Python | Crawlee for JavaScript · Build reliable crawlers. Fast. Crawlee helps you build and maintain your crawlers. It's open source, but built by developers who scrape millions of pages every day for a living.

How to scrape @bsky.app #Bluesky with Python | Crawlee for JavaScript·Build reliable crawlers. Fast. crawlee.dev/blog/scrape-...
*
I don't like #scrapers. I prefer people engage with me OR reshare my post even if they disagree with me & want to cuss me out.😀

Anyway! #Technology is #progress.

0 0 0 0

#Trump Media is partnering with #Perplexity to bring #AIsearch to #TruthSocial. Perplexity has a history of using #scrapers to evade website restrictions and of #plagiarising content. Despite Trump Media's mission to end #BigTech's influence, the partnership with Perplexity, whose investors include…

0 0 0 0
Screenshot of a GitHub activity line plot showing the number of daily clones over the past 14 days. In the past week daily clones spiked to 60k+ on 2 days; the total for the entire timespan is 222,356 clones from only 117 unique cloners.

Anyone else getting these ridiculous repo scraping spikes? A clean checkout of the https://thi.ng/umbrella monorepo is ~370MB. Over the past 14 days there were 222k clones (only 117 unique) of this repo which have caused downloads of a whopping ~78TB. WTF! 🤯 […]

[Original post on mastodon.thi.ng]

1 1 0 0

#Anubis, an #opensource programme designed to #protect #websites from #AIbot #scrapers, has been downloaded nearly 200,000 times since its launch in January. It is used by organisations like GNOME, FFmpeg, and UNESCO.…

2 1 0 0
Michał "rysiek" Woźniak · 🇺🇦 (@rysiek@mstdn.social) My silly side project to very slowly generate Markov chain slop for LLM scrapers, wasting their resources, is slowly getting to a usable state: https://git.rys.io/libre/markov-tarpit/ Basic functionality is there, now I need to figure out how to establish that a given request comes from an LLM scraper, and how to trap them in the Markov chain world. That's going to be an interesting interplay between nginx, fail2ban, and nftables. :blobcatthink:

There was a thread where someone was suggesting web-server-level filters for fighting the #AI #scrapers, but I can't find it again... Anyone got a link?

Context: could be useful for https://mstdn.social/@rysiek/114761366640227394

#AskFedi #SysAdmin

0 0 0 0
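The Markov-chain tarpit quoted above can be sketched in a few lines of Python: train a first-order model on some text, then emit plausible-looking nonsense one hop at a time. This is a toy sketch, not the markov-tarpit project's actual code.

```python
import random
from collections import defaultdict

def train(corpus):
    """Build a first-order Markov model: word -> list of observed successors."""
    model = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def slop(model, start, n=50, rng=random):
    """Emit n words of slop by walking the chain from a start word."""
    out = [start]
    for _ in range(n - 1):
        successors = model.get(out[-1])
        if not successors:
            # Dead end: jump to a random known word to keep the tarpit flowing.
            out.append(rng.choice(list(model)))
        else:
            out.append(rng.choice(successors))
    return " ".join(out)
```

Served slowly enough, output like this wastes a scraper's crawl budget while costing the host almost nothing.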

Not sure why I had to do two CAPTCHAs before I could pay our dentist. I'd entered all the credit card information already. Are OpenAI's scrapers paying dentist bills now? #captcha #scrapers #llm

0 1 0 0
Original post on gts.dc09.ru

Well, it had to happen…

#Anubis is deployed on:

* lxv.dc09.ru (LiteXiv)
* wp.dc09.ru (Wikimore)
* ak.dc09.ru (Akademik)
* specific routes[^1] on git.dc09.ru (Forgejo)

#AI #scrapers won't overload my server for their dirty purposes again and won't get more data than a legitimate […]

0 0 0 0

I deployed #anubis to protect my blog. The docs have become much clearer than the January version, and it worked on the first try!

I highly recommend using it if you want to protect yourself from scrapers 👀
#ai #scrapers #selfhost

0 0 0 0