
I’ve been archiving data.gov for over a year now, and it’s not unusual to see large fluctuations on the order of hundreds or thousands of datasets. I’ve never bothered trying to figure out what exactly is changing; maybe I should build a tool for that…



Do you mirror these data sets anywhere?


Unfortunately, my archive isn't in any sort of format to do this kind of analysis. I'm also missing some data because I throw away certain kinds of datasets that are not useful to me. I can probably write some scripts to diff my archives against the current data.gov and see what's missing, but the result won't be "complete". It might still be useful, though...
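For what it's worth, the diff itself could be as simple as a set comparison. This is only a sketch: it assumes both the archive and the live site can be reduced to lists of dataset identifiers, and `diff_archive` and the example IDs are made up for illustration.

```python
def diff_archive(archived_ids, current_ids):
    """Compare dataset identifiers from a local archive with those on the live site.

    Both arguments are assumed to be iterables of identifier strings; how they
    are collected (local files, a catalog listing, etc.) is left open here.
    """
    archived, current = set(archived_ids), set(current_ids)
    return {
        # In the archive but no longer on the site: candidates for "removed".
        "missing_from_site": sorted(archived - current),
        # On the site but not archived: either new, or deliberately skipped.
        "not_in_archive": sorted(current - archived),
    }


# Hypothetical identifiers, for illustration only.
result = diff_archive(["a", "b", "c"], ["b", "c", "d"])
```

Because the archive deliberately skips some kinds of datasets, the `not_in_archive` list would mix new datasets with ones that were never archived, which is why the result can't be "complete".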

I did, however, just write a Python script to pull data.gov from archive.org and check the dataset count on the front page for all of 2024. Here are the results:

https://postimg.cc/1V0WYtRt

As you can see, there were multiple drops on the order of ~10,000 during 2024. So it's not that unusual. There could be something bad going on right now, but just from the numbers I can't conclude that yet.

(Specifically, the script takes the first snapshot from every Wednesday of 2024.)
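For anyone who wants to try something similar, here is a rough sketch of how such a script might look. It is not the script behind the chart: it assumes the Wayback Machine CDX API is used to list captures, and the regex for the front-page count (`COUNT_PATTERN`) is a guess at the page wording that would need adjusting against real snapshots. All function names are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request
from datetime import datetime

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Assumption: the front page shows the count as a number followed by "datasets".
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+datasets", re.IGNORECASE)


def list_snapshots(url="data.gov", year=2024):
    """Return sorted capture timestamps (YYYYMMDDhhmmss) for `url` in `year`."""
    params = urllib.parse.urlencode({
        "url": url,
        "from": str(year),
        "to": str(year),
        "output": "json",
        "fl": "timestamp",
        "filter": "statuscode:200",
        "collapse": "timestamp:8",  # at most one capture per calendar day
    })
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{params}", timeout=60) as resp:
        rows = json.load(resp)
    return sorted(row[0] for row in rows[1:])  # rows[0] is the header row


def first_snapshot_per_wednesday(timestamps):
    """Keep the earliest timestamp for each day that falls on a Wednesday."""
    first = {}
    for ts in sorted(timestamps):
        day = datetime.strptime(ts[:8], "%Y%m%d").date()
        if day.weekday() == 2 and day not in first:  # Monday is 0, Wednesday is 2
            first[day] = ts
    return [first[day] for day in sorted(first)]


def extract_dataset_count(html):
    """Return the dataset count found in the page text, or None if absent."""
    match = COUNT_PATTERN.search(html)
    return int(match.group(1).replace(",", "")) if match else None


def main():
    """Print one line per Wednesday: the date and the count found (or None)."""
    for ts in first_snapshot_per_wednesday(list_snapshots()):
        snapshot_url = f"https://web.archive.org/web/{ts}/https://data.gov/"
        with urllib.request.urlopen(snapshot_url, timeout=60) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        print(ts[:8], extract_dataset_count(html))
```

Calling `main()` hits archive.org once for the listing and once per Wednesday, so it's worth adding a delay between requests if you run it for more than a year of data.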

If I get around to re-formatting my archives this week, I'll follow up on HN :).




