Expect bot/scraper/Excel/Sheets/etc. traffic that isn't pointed at data.naturalstattrick.com to stop working sometime tomorrow.
Posts by Natural Stat Trick
Under a week to transition any scrapers, Google Sheets, etc., and probably before the season ends
(looking like it'll be somewhere between Tuesday and Thursday)
UPDATE: I am giving this one more shot on social media.
Does anyone know a good WordPress web designer or someone who has experience in publishing on WordPress?
DM me here.
Reminder: if you run a scraper on NST, you've got a little over a week to switch over to the new data subdomain and key before the anti-bot/scraper traffic restrictions get dialed up on the main site.
For the back to work on Easter Monday crowd
Hunh. You don't see the "x" skip a team too often
Good to hear
Did I post this while forgetting to actually open the signup form up?
Yes I did.
It's on the profile page now, though. I'll be happy if that's the only hitch with launching this.
While I did say it wasn't high security, it is higher than "sequential key values" level security
...right now.
(whoops)
(and just an obligatory reminder that it is Easter weekend - I'll try to approve key requests as soon as I can, but it's a busy weekend so you might be waiting a few minutes or the better part of a day if you request one)
And while there is some time to get switched over, if your scraper is one of the ones that's been stuck since the site moved behind Cloudflare, I suggest you get switched sooner rather than later. No more troubleshooting is going to happen on letting more traffic through there.
I've tried to make it as simple as possible - this is simple authentication, not heavy-duty security.
Send the key as a custom header or as part of the query string, and point everything at data.naturalstattrick.com instead of www.naturalstattrick.com.
(just don't accidentally share it publicly)
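A minimal Python sketch of that switch, using only the standard library: it rewrites a www.naturalstattrick.com URL to the data subdomain and attaches the key either as a header or in the query string. The header name `X-API-Key`, the query parameter `key`, and the helper names are hypothetical placeholders, since the thread doesn't name them; use whatever your key page actually specifies.

```python
import urllib.parse
import urllib.request

# Hypothetical names: the thread doesn't say what the header or query
# parameter is called, so use whatever your key page specifies.
HEADER_NAME = "X-API-Key"
QUERY_PARAM = "key"
OLD_HOSTS = ("www.naturalstattrick.com", "naturalstattrick.com")
DATA_HOST = "data.naturalstattrick.com"


def to_data_host(url: str) -> str:
    """Point a main-site URL at the data subdomain; leave other URLs unchanged."""
    parts = urllib.parse.urlsplit(url)
    if parts.netloc in OLD_HOSTS:
        parts = parts._replace(netloc=DATA_HOST)
    return urllib.parse.urlunsplit(parts)


def build_request(url: str, api_key: str, use_header: bool = True) -> urllib.request.Request:
    """Build (but don't send) a request to the data subdomain that carries the key."""
    url = to_data_host(url)
    if use_header:
        # Option 1: send the key as a custom header.
        return urllib.request.Request(url, headers={HEADER_NAME: api_key})
    # Option 2: append the key to the existing query string.
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append((QUERY_PARAM, api_key))
    with_key = parts._replace(query=urllib.parse.urlencode(query))
    return urllib.request.Request(urllib.parse.urlunsplit(with_key))
```

To keep the key out of anything you share, read it from an environment variable (for example a hypothetical `NST_API_KEY`) instead of hard-coding it, then send with `urllib.request.urlopen(build_request(url, key))`. The header option also keeps the key out of URLs that might end up in logs or copied links.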
As mentioned at the start, this is intended to be the ONLY way to access the site with scrapers, bots, etc. That means the traffic restrictions are going to get dialed up everywhere else.
Not right away - I want to give people time to get set up - but soon. Plan on it being before the playoffs.
Use of the key is rate-limited with tokens, matching 2 of the 4 old limits: 80 pages in 5 minutes and 180 pages in 1 hour stay, and the other 2 are gone.
Going over doesn't get you blocked anymore; you're just out until the tokens refresh (but going over continually and excessively will get the key deactivated).
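One way to stay under both limits from the client side is a sliding-window throttle: before each request, check that fewer than 80 requests went out in the last 5 minutes and fewer than 180 in the last hour, and sleep until the oldest one ages out otherwise. This is a sketch, not how NST counts tokens on the server; because any fixed 5-minute or 1-hour window is also a rolling window, it should keep you under the limits whichever way the server counts. The class and method names are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple

# Limits quoted in the thread: 80 pages per 5 minutes and 180 pages per hour.
LIMITS: List[Tuple[int, float]] = [(80, 5 * 60), (180, 60 * 60)]


class SlidingWindowThrottle:
    """Client-side throttle: never more than max_requests in any window seconds,
    for every (max_requests, window) pair at the same time."""

    def __init__(self, limits: List[Tuple[int, float]] = LIMITS,
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        self.longest = max(window for _, window in limits)
        self.sent: Deque[float] = deque()  # timestamps of past requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request stays within every limit."""
        now = self.clock()
        # Forget requests older than the longest window; they no longer count.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # This request has to age out before the window has room again.
                oldest_blocking = recent[-max_requests]
                wait = max(wait, oldest_blocking + window - now)
        return wait

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a request is allowed, then record it as sent."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.sent.append(self.clock())
```

Call `throttle.acquire()` once right before each page request. The clock and sleep functions are injectable so the throttle can be tested without actually waiting; in a real scraper the defaults (`time.monotonic` and `time.sleep`) are what you want.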
The big change here isn't just pointing to a new subdomain; a key is also required. Those keys are free, but they require an account (also free) on the site and are limited to one per account.
Keys need to be approved (primarily to weed out bad faith usage), so expect a short delay during setup
So, here is that fundamental change - a new subdomain has been set up SPECIFICALLY for data scraping, bots and any other forms of automated access.
(and intended to be the only way fairly soon, but more on that a bit later in the thread)
(at least for now)
I will say that if you're using a headless/unattended/background version of a standard browser and are still having problems, you're probably out of luck.
I can't guarantee I can open it back up for everyone (without letting the problem scrapers back in too) but I'll at least take a look
If you are still having issues after this point:
1) if you made changes to your scraping method since Friday afternoon, revert them and try again
2) if you haven't (or reverted and it still doesn't work), DM me with this from your most recent try:
-the public IP you're scraping from
-what you're using to scrape (Excel, Google, Python, etc.)
-full link to one of the pages that is failing
Excel should also be fixed now
Google Docs issues should be fixed now
The IPs being used were from all over the world, unfortunately
Ultimately, this may require some fundamental changes to how scrapers, Google Docs, etc. can access the site, even the ones that keep to a volume that doesn't cause problems. We'll see.
A fix is in place, unfortunately one that will probably block most scrapers and not just the problematic ones. It will have to do for now, but I'll keep looking for a better solution.
I think it's one bot in particular, rotating through IPs (or using a botnet) and user agents to try to avoid detection, and sending such a high volume of requests that the server can't handle it well.