Although this specific topic is fascinating, it evokes in me a more general wonder: will we (soon?) get to a point where every court document, even those handwritten by people whose last name was intentionally forgotten, is preserved digitally and in a distributed fashion?
And can a legal system in the information age claim legitimacy if the answer is not "yes!"?
It's amazing how much we don't know about how these cases were handled and the bases for their disposition at a time when this horrifically oppressed class of people (i.e., all black people in slave states, and many in free states) attempted to pursue an intellectual solution to this particular instance of state violence.
In short, I wonder: after two hundred more years pass, will our capacity to understand the circumstances of the legal and political failures of today be much better than the lens through which we view these documents?
I feel confident that the answer is "yes," but I also want it to be a more mainstream part of what passes for sociopolitical dialogue.
> will we (soon?) get to a point where every court document, even those handwritten by people whose last name was intentionally forgotten, is preserved digitally and in a distributed fashion?
In 2009 Aaron Swartz became the subject of an FBI investigation for copying freely-obtained court documents for the purpose of increasing public access (which would also help with distribution and preservation as you said). The program to provide monetarily free but physically cumbersome and limited access was shut down[1], probably because of Swartz's use. So, I doubt it will happen soon, unless some shakeups have been quietly happening since then.
I'm not sure what your point is. The FBI got involved because Swartz's script led to a library racking up a multi-million-dollar PACER bill and they rightfully suspected a security breach. It had nothing to do with the issue of making the documents more widely available.
I'd love to see Congress pony up the extra dollars so PACER could be run as a free service. But I think there is very little practical benefit to making these documents slightly more accessible to people who don't even have enough stake in them to justify the $3.00 max per document PACER fee. The big advance was making these documents publicly accessible in the first place, and we overcame that hurdle a few hundred years ago.
Just so we're on the same page, the number I've seen is a "$1.5 million bill" that the libraries racked up (but presumably they would have paid nothing since they were piloting free access.) Something's a bit off about how they cry about millions when it's a good that's legally required to be produced, is public information, and they set their own price for the digital distribution. Anyway, maybe this is just me, but if I were running a public access program freely providing information and noticed a lot of that free information being downloaded, my first reaction wouldn't be to call up the FBI. I don't know if this file[1] is leading with a retroactive summary, but it sounds like they knew from the beginning exactly what was going on, and that it wasn't a real breach. And if I were going to sic the FBI, I'd have a better reason than the fact that volunteers shared their free accounts ("AARON SWARTZ would have known his access was unauthorized because it was with a password that did not belonged to him.") I guess this means they would have been perfectly fine with it if he used his own creds? Yeah, right... It sounds a lot more like a powerful institution pissed off that their monopolized data is being freed and having the FBI on speed dial, not one that was actually concerned about a breach.
It does seem like a "first world problem" to complain about whether you can download arbitrary court documents for free. But, if you're in the first world and the 21st century, reading about a court case at home in your online newspaper, why shouldn't you be able to instantly and freely consult the actual public court documents online? Or, sure it's great that they're "public" now, but the next advance would be making the documents more searchable (as in, search for free, unlike PACER[5]) and interlinked, so you can more easily find them whether or not you know which particular ones you "have a stake in." Swartz and others are evidence that there are people begging to build these systems for free (or at least on private donations), if the PACER hoarders would sacrifice a little of their $150 million surplus[2] and charge even a one-time $1,000 for a third party to get a copy of the entire corpus on their own provided hard drive. The courts could provide a few megabytes' worth of signed hashes and let third parties take care of the majority of the content distribution. People will seed gigabytes worth of data like this[3] and build free tools for working with it[4] if the "owners" would just let it out of the walled garden!
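To make the signed-hashes idea a bit more concrete, here is a minimal sketch in Python of how a reader could check a mirrored copy against a court-published list of digests. Everything in it is an assumption for illustration: the manifest format, the document ID, and the sample text are hypothetical, and checking the signature on the manifest itself is left out (that step would need a real signature scheme and a published court key).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_document(doc_id: str, content: bytes, manifest: dict[str, str]) -> bool:
    """Check a mirrored document against the digest listed for it.

    Returns False if the document is not listed or the digest differs.
    """
    expected = manifest.get(doc_id)
    return expected is not None and sha256_hex(content) == expected


# Hypothetical manifest: in the scheme described above, the courts would
# publish (and sign) this mapping; third parties would host the documents.
original = b"ORDER: The motion is denied.\n"
manifest = {"example-docket-1/doc-1": sha256_hex(original)}

genuine_ok = verify_document("example-docket-1/doc-1", original, manifest)
tampered_ok = verify_document("example-docket-1/doc-1", original + b"(edited)", manifest)
unlisted_ok = verify_document("example-docket-1/doc-2", original, manifest)

print(genuine_ok, tampered_ok, unlisted_ok)
```

The point of the design is that the expensive part (storing and serving the documents) can be done by anyone, while the small part (the digest list) is all the courts would need to publish for readers to trust a mirror.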
Even if they don't give completely free access, the huge surplus is proof that it costs less than 8 cents per page/$3 per document! (It's so astoundingly huge, I'm willing to hear why it's incorrect, but I'm doubting the NY Times got it wrong.)
Frankly I'm surprised Swartz didn't get charged under the CFAA for this. I can't entirely defend it -- extracting an auth token and using it from another location for full-scale scraping is a grey area, yet apparently the Computer Crime and Intellectual Property Section determined it was not "black enough" to be illegal. Or maybe they didn't want to publicly draw more attention to the courts' huge revenues off of public documents? Some people speculate the zeal with which he was prosecuted later, against the actual harmed parties' wishes, stemmed at least partially from his PACER stunt.
> Just so we're on the same page, the number I've seen is a "$1.5 million bill" that the libraries racked up (but presumably they would have paid nothing since they were piloting free access.)
The $1.5 million bill goes to the magnitude of the unauthorized access. PACER isn't just (or even primarily) a public-facing interface to these documents. It's the system lawyers and courts themselves use to access these documents. There's a substantial DoS risk there that warranted investigation.
> But, if you're in the first world and the 21st century, reading about a court case at home in your online newspaper, why shouldn't you be able to instantly and freely consult the actual public court documents online?
Absolutely nothing stops whoever wrote the article from hosting the relevant documents at his or her own expense.
> Even if they don't give completely free access, the huge surplus is proof that it costs less than 8 cents per page/$3 per document! (It's so astoundingly huge, I'm willing to hear why it's incorrect, but I'm doubting the NY Times got it wrong.)
PACER subsidizes the rest of the judiciary's IT. The idea is that the people who use court services (litigants) should be the ones to pay for its upkeep, through user fees. So PACER fees aren't the government making "huge revenues off of public documents." The primary purpose of PACER, again, is as a tool for litigants, so it makes sense for it to be a revenue source for the judiciary.
The idea of the government distributing hashes and letting people host documents P2P style is cool, but what's the point? What are people going to do with that information? It's just information for information's sake. The idea scratches a techno-futurist itch, but I think there is very little tangible benefit to making these documents more easily available to people who don't care enough about them to pay a couple of bucks for them on PACER.
...says the guy with lots of disposable income. What makes sense for paying for a legal mechanism can introduce crippling inequalities in access to information. But who cares if the poor have access to legal aid?
> The $1.5 million bill goes to the magnitude of the unauthorized access.
Given it's derived from a non-linear pricing system, I don't think that's the clearest way to express it. And I'm not sure it's accurate -- NY Times says Swartz got 19,856,160 pages. At a flat eight cents per page, that's $1,588,492.80, so it's possible the $1.5MM estimate is ignoring the $3/document maximum and any other pricing rules that exist. If it were about DoS, they would talk about how much bandwidth was taken from "legitimate" users, how many "legitimate" requests were slowed, any downtime or other obvious harm to "legitimate" users, etc. Instead, they quote big dollars, which means some combination of: there was no actual DoS, they care more about lost revenue than serving users, they want to impress the reader/FBI with big money. This "unauthorized access" somehow wasn't worthy of prosecution, not even $1.5 million worth of it.
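For anyone who wants to check the arithmetic, here is a small Python sketch. The page count, the eight-cent rate, and the $3 cap are the figures quoted in this thread; the per-document page counts passed to `document_fee` are made up purely to show how the cap behaves, since I don't know how the 19,856,160 pages were split across documents.

```python
from decimal import Decimal

PAGES = 19_856_160            # page count attributed to the NY Times above
RATE = Decimal("0.08")        # dollars per page, as quoted in this thread
CAP = Decimal("3.00")         # per-document maximum, as quoted in this thread

# Flat rate with no cap applied: every page billed at eight cents.
flat_total = PAGES * RATE


def document_fee(pages: int, rate: Decimal = RATE, cap: Decimal = CAP) -> Decimal:
    """Fee for one document: per-page rate, limited by the per-document cap."""
    return min(pages * rate, cap)


# Hypothetical documents, only to illustrate where the cap starts to matter.
short_fee = document_fee(10)    # below the cap
long_fee = document_fee(100)    # 100 pages would be $8.00 uncapped

print(flat_total, short_fee, long_fee)
```

With these figures the cap applies to any document of 38 pages or more, so a total computed with the cap can only be lower than or equal to the flat-rate figure; how much lower depends on the document lengths, which aren't given here.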
> There's a substantial DoS risk there that warranted investigation.
But it sounds like, by the time the FBI got involved, they'd already determined the "DoS" was coming from one user of the free access program and they shut it down. They could have easily firewalled access from outside libraries and rate-limited free requests from inside libraries, to protect the system. Involving the FBI was punitive, not protective.
If public read access was threatening the private write side, or the critical users' read access, maybe they're doing something wrong and should spend some of those millions on fixing it.
If abuse by people downloading things willy-nilly because they're not paying is a legitimate threat to the system, again there are organizations and people who would be happy to offload that burden, if the courts would give them the data. Carl Malamud used $60,000 of donations to buy 50 years of federal appellate court data to put it online. Instead of helping, they fight by forbidding the use of RECAP[1]. I'll paraphrase: "If we transferred a document to you for free, don't give it to anyone else. Even though the document itself is public, we waived the fee on our cost to deliver it to you, and you are willing to ignore your own cost to deliver it to someone else. Oh, but since our field is built on research, we do allow scholarly work, even though the preceding sounds like it's entirely anti-intellectual, anti-education, and anti-progress." Seriously, it's not even about abusively mass-downloading for free -- ANY re-uploading of something you got for free is forbidden.
> The primary purpose of PACER, again, is as a tool for litigants
Frankly, I don't know enough about the courts to opine on who should be the "primary user", but if it's not for the public, maybe they should change the name from Public Access to Court Electronic Records, and stop saying things like "Online access makes the public record truly public, which I think is of great value." or "The biggest challenge—and opportunity—lies in the area of preservation of the electronic dockets and opinions for posterity... We will need to... ensure that future generations can access this valuable information."[2]
> Absolutely nothing stops whoever wrote the article from hosting the relevant documents at his or her own expense.
Sure, but the reader has to trust that the documents provided are unaltered and cover everything the reader is interested in. And per the above, PACER forbids the author from re-hosting if they got it fee-exempt. (In connection, I do see there is $15 worth of exempt access to each account, which would allow some of my hypothetical article readers to follow up on court sources for free -- but this has happened since Swartz's escapade.)
> So PACER fees aren't the government making "huge revenues off of public documents."
We have different understandings of "surplus", then. As I understand it, they've got a big pile of money that's doing nothing right now. And I would imagine (or hope) that its use is somewhat restricted, since it was made by the public courts selling public data. If they spend it on something, maybe my position will change, but for now I just see it as mostly-illegitimate and a waste of people's money.
Your conclusion ignores my arguments that people could make new things with the data, and more cheaply than the courts would. And the courts might not ever make those things, if they're not within the courts' scope (which limited scope I hope you would agree is not the same as being worthwhile in general.) Going back to the article's topic somewhat, people would find, index, and link interesting things nobody would have thought to look for, let alone dig out their credit card for. Maybe as an actual user of the system, you don't find this compelling, but you haven't really specifically said why. Or your reason, that people who don't pay don't really care, is vague and doesn't address the transformative and meta-uses(?) I'm talking about. But my visions are kind of vague, too, I'll grant you.
P.S. I noticed the price has risen to $0.10 per page, capped at $3 per document. I suppose not even the courts are immune to the rising cost of digital paper that they have their own monopoly on, and they wouldn't want that surplus to shrink.
"At the core of the project is a database containing the identity of all slave-owners in the British Caribbean at the time slavery ended. As the project unfolded, we amassed, analysed and incorporated information about the activities, affiliations and legacies of all the British slave-owners on the database, building the Encyclopedia of British Slave-Owners, which has now been made available online."