Hacker News | dunham's comments

Below, I'm discussing compressed size rather than how "fast" it is to copy databases.

Yeah, there are indexes. And even without indexes, there is an entire b-tree sitting above the data. So we're weighing the benefits of having a domain-dependent compression (binary format) vs dropping all of the derived data. I'm not sure how that will go, but let's try one.

Here is a SQLite file containing metadata for Apple's Photos application:

    767979520 May  1 07:28 Photos.sqlite

Doing a VACUUM INTO:

    719785984 May  1 08:56 photos.sqlite

gzip -k photos.sqlite (this took 20 seconds):

    303360460 May  1 08:56 photos.sqlite.gz

sqlite3 -readonly photos.sqlite .dump > photos.dump (10 seconds):

    1277903237 May  1 09:01 photos.dump

gzip -k photos.dump (21 seconds):

    285086642 May  1 09:01 photos.dump.gz

The compressed dump is about 6% smaller than the compressed binary (but there are a bunch of indexes in this one). For me, I don't think it'd be worth the small space savings to spend the extra time doing the dump.

With indexes dropped and vacuumed, the compressed binary is 8% smaller than compressed text (despite btree overhead):

    566177792 May  1 09:09 photos_noindex.sqlite
    262067325 May  1 09:09 photos_noindex.sqlite.gz

About 13.5% smaller than compressed binary with indices. And one could re-add the indices on the other side.
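For anyone who wants to repeat the comparison on their own database, here is a rough Python sketch of the same measurements. It is an approximation under assumptions, not the commands used above: it uses the standard library's `sqlite3` and `gzip` modules instead of the CLI tools, reads each file fully into memory, and the helper names (`make_demo_db`, `compare_sizes`) and the demo schema are made up for illustration.

```python
import gzip
import os
import sqlite3


def gz_size(path):
    """Gzip-compressed size of a file in bytes (file is read fully into memory)."""
    with open(path, "rb") as f:
        return len(gzip.compress(f.read()))


def make_demo_db(path, rows=5000):
    """Build a small throwaway database with one index (hypothetical schema)."""
    con = sqlite3.connect(path)
    with con:  # commits on success
        con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, payload BLOB)")
        con.execute("CREATE INDEX t_name ON t (name)")
        con.executemany(
            "INSERT INTO t (name, payload) VALUES (?, ?)",
            ((f"photo-{i:06d}.jpg", i.to_bytes(4, "big") * 8) for i in range(rows)),
        )
    con.close()


def compare_sizes(src_path, workdir):
    """Return raw and gzipped sizes for: vacuumed copy, index-free copy, text dump."""
    vac = os.path.join(workdir, "vacuumed.sqlite")
    noidx = os.path.join(workdir, "noindex.sqlite")
    dump = os.path.join(workdir, "dump.sql")

    # isolation_level=None is autocommit; VACUUM cannot run inside a transaction.
    src = sqlite3.connect(src_path, isolation_level=None)
    src.execute("VACUUM INTO ?", (vac,))    # compacted copy, indexes kept
    src.execute("VACUUM INTO ?", (noidx,))  # second copy, indexes dropped below
    with open(dump, "w", encoding="utf-8") as f:
        for stmt in src.iterdump():         # rough equivalent of the CLI's .dump
            f.write(stmt + "\n")
    src.close()

    con = sqlite3.connect(noidx, isolation_level=None)
    # Only indexes with SQL text were created by CREATE INDEX and can be dropped.
    names = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")]
    for name in names:
        con.execute(f'DROP INDEX "{name}"')
    con.execute("VACUUM")
    con.close()

    sizes = {}
    for label, path in (("vacuumed", vac), ("noindex", noidx), ("dump", dump)):
        sizes[label] = os.path.getsize(path)
        sizes[label + "_gz"] = gz_size(path)
    return sizes
```

Calling `make_demo_db(path)` and then `compare_sizes(path, some_empty_dir)` returns a dict of byte counts. The ratios will depend heavily on the schema and data, so the demo table says nothing about what a real Photos library will do.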

Yup, these results are pretty consistent with what I'd expect (and why I noted the impact of indices), because even string data carries a lot of superfluous information when expressed as SQL statements ("INSERT INTO foo ..."). I would expect all of that to exceed any bookkeeping within the btree. And non-string values like blobs or numbers are stored more efficiently in the binary file than in the dump, which is a text encoding (hex, in the case of blobs) and blows things up further.
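A quick way to see the text-encoding overhead is to compare a value's raw size with its SQL literal. The sketch below is only illustrative: it uses SQLite's built-in `quote()` function through Python's `sqlite3` module, and the sample values are arbitrary.

```python
import sqlite3
import struct

con = sqlite3.connect(":memory:")

# A blob becomes an X'..' hex literal: two characters per byte plus X and quotes.
blob = bytes(range(16))
(blob_literal,) = con.execute("SELECT quote(?)", (blob,)).fetchone()

# A large integer needs at most 8 bytes in binary, but one character per digit as text.
n = 1234567890123456789
int_binary_len = len(struct.pack(">q", n))
int_text_len = len(str(n))

print(len(blob), len(blob_literal))   # raw bytes vs characters in the literal
print(int_binary_len, int_text_len)   # binary bytes vs decimal digits
con.close()
```

Compression claws back part of that expansion, which is likely why the compressed sizes end up closer together than the uncompressed ones.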

Some more anecdata: from this it looks like you could `VACUUM INTO` + `zstd --long -9` using 19.1s and get 105% of the size you'd get from `.dump` + `zstd --long` using 32.8s. That saves 13.7s at the cost of 43M. YMMV, obvs.

sqlite3 3.49.1, zstd 64bit 1.5.7, gzip (Apple) 457.120.3

Original file (3.3G)

    3264290816 Photos.sqlite

VACUUM INTO (10.3s, 3.1G, 94.3%)

    3078881280 test.sqlite

gzip -k (76s, 1.1G, 33.1%)

    1080119337 test.sqlite.gz

zstd --long (3.2s, 987M, 30.2%)

     986252298 test.sqlite.zst

zstd --long -9 (8.8s, 903M, 27.6%)

     902282663 test.sqlite.9.zst

zstd --long -12 (21.5s, 885M, 27.1%)

     884863443 test.sqlite.12.zst

.dump (27.6s, 4.7G)

    4693437307 photos.dump

gzip -k (72s, 942M, 28.8%)

     941018021 photos.dump.gz

zstd --long (5.2s, 860M, 26.3%)

     859204016 photos.dump.zst

zstd --long -12 (31.7s, 827M, 25.3%)

     826776415 photos.dump.12.zst

(edited to fix a typo in a size and a conclusion that came from that)
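The time/size trade-off between any two of these pipelines can be recomputed from the listing. The snippet below just copies the timings and byte counts from above into a table; the function names are made up for illustration, and a pipeline's time is assumed to be the prepare step (`VACUUM INTO` or `.dump`) plus the compression step.

```python
# (seconds, bytes) copied from the listing above
PREPARE = {
    "binary": (10.3, 3078881280),  # VACUUM INTO
    "dump": (27.6, 4693437307),    # .dump
}
COMPRESS = {
    ("binary", "gzip -k"): (76.0, 1080119337),
    ("binary", "zstd --long"): (3.2, 986252298),
    ("binary", "zstd --long -9"): (8.8, 902282663),
    ("binary", "zstd --long -12"): (21.5, 884863443),
    ("dump", "gzip -k"): (72.0, 941018021),
    ("dump", "zstd --long"): (5.2, 859204016),
    ("dump", "zstd --long -12"): (31.7, 826776415),
}


def pipeline(kind, compressor):
    """Total seconds and final size in bytes for prepare + compress."""
    prep_secs, _ = PREPARE[kind]
    comp_secs, size = COMPRESS[(kind, compressor)]
    return round(prep_secs + comp_secs, 1), size


def compare(a, b):
    """Seconds saved by pipeline a over b, size ratio a/b, and extra bytes in a."""
    secs_a, size_a = pipeline(*a)
    secs_b, size_b = pipeline(*b)
    return round(secs_b - secs_a, 1), round(size_a / size_b, 3), size_a - size_b


print(compare(("binary", "zstd --long -9"), ("dump", "zstd --long")))
```

Swapping in other `(kind, compressor)` pairs shows how quickly the higher zstd levels stop paying for themselves on this particular file.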

Brilliant. >60% savings. 700mb? wow.

Is that really necessary?

It depends on the bandwidth at the target site, which may be pretty remote and may not expose a public internet service.

Ah no, I meant “is the snark necessary?” to the parent comment. I enjoyed the read!

I learned about this when trying to decode data from Firefox IndexedDB. (I was extracting Tana data.) Their structured clone data format uses NaN-boxing for serialization.
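For readers who haven't run into NaN-boxing: the idea is that an IEEE 754 double with all exponent bits set and a nonzero mantissa is a NaN, so the spare mantissa bits can carry a type tag and a small payload. The sketch below is a generic illustration only. The bit layout, tag values, and function names are assumptions made up for this example and are not Firefox's actual structured clone format.

```python
import math
import struct

# Hypothetical layout: sign bit + all exponent bits + quiet bit mark a boxed value.
BOX_MASK = 0xFFF8_0000_0000_0000
TAG_INT32 = 1  # made-up tag values, stored in bits 32..50
TAG_BOOL = 2


def double_to_bits(x):
    """Reinterpret a float as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def box(tag, payload):
    """Pack a tag and a 32-bit payload into the unused bits of a NaN."""
    return BOX_MASK | (tag << 32) | (payload & 0xFFFF_FFFF)


def unbox(bits):
    """Decode a 64-bit word as either a plain double or a boxed value."""
    if (bits & BOX_MASK) != BOX_MASK:
        return ("double", bits_to_double(bits))
    tag = (bits >> 32) & 0x7FFFF
    payload = bits & 0xFFFF_FFFF
    if tag == TAG_INT32:
        # Sign-extend the 32-bit payload.
        return ("int32", payload - (1 << 32) if payload & 0x8000_0000 else payload)
    if tag == TAG_BOOL:
        return ("bool", bool(payload))
    return ("unknown", payload)


word = box(TAG_INT32, -5)
print(hex(word), math.isnan(bits_to_double(word)), unbox(word))
print(unbox(double_to_bits(1.5)))
```

A real implementation also has to canonicalize genuine NaN results so they never collide with the boxed range; this sketch sidesteps that by assuming ordinary NaNs have the sign bit clear.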

Surprisingly, GPT did manage to identify a book that I remembered from college decades ago ("Laboratory Manual for Morphology and Syntax"). It seems to be out of print, and I assumed it was obscure.

Can agree that it’s good at finding books. I was trying to find a book (Titanic 2020) I vaguely remembered from a couple plot points and the fact a ship called Titanic was invoked. ChatGPT figured it out pretty much instantly, after floundering through book sites and Google for a while.

Wonder if books are inherently easier because their content is purely written language? Whereas movies and art tend to have less point by point descriptions of what they are.


> Wonder if books are inherently easier because their content is purely written language? Whereas movies and art tend to have less point by point descriptions of what they are.

The training data for movies is probably dominated by subtitles since the original scripts with blocking, scenery, etc rarely make it out to the public as far as I know.


I must be tired. The thing you remembered was the name of a boat in the book and any web search engine and Wikipedia would probably give you the correct answer?

Someone ask ai where my handle comes from.


That's recent enough that mail forwarding should work, if they set it up:

> Standard mail forwarding lasts 12 months. You can pay to extend mail forwarding for 6, 12, or 18 more months (18 months is the maximum).

Edit for source: https://www.usps.com/manage/forward.htm


> > Standard mail forwarding lasts 12 months. You can pay to extend mail forwarding for 6, 12, or 18 more months (18 months is the maximum).

That's kind of awkward when you consider people will find that address in source code where that license file just won't be updated for decades to come, if at all.


We need DNS, but for mail addresses.

Maybe DNS for mail addresses is like a Post Office Box number? :-) https://en.wikipedia.org/wiki/Post_office_box

With 20/20 hindsight, if the FSF had used a P.O. Box number in the license, the license addresses would always be correct even if the FSF office changed address or (as now) was no longer maintained.

Of course, the cost of a P.O. box over 40 years would have added up to thousands of dollars and that is less money for FSF advocacy. And time spent going to the post office to check the box would also have taken away from advocacy time.

Another physical mail DNS-like idea is mail forwarding -- but it typically has time limits at the post office although not for private mail forwarders: https://en.wikipedia.org/wiki/Mail_forwarding "Private mail forwarding services are also offered by private forwarding companies, who often offer features like the ability to see your mail online via a virtual mailbox. Virtual mailboxes usually have options to get your mail scanned, discard junk mail and forward mail to your current address."

Although strictly speaking, these forwarding services are not quite like DNS (even if they do get at the idea of indirection). A true mail DNS would be more like a service you mail a post card to with a person's or organization's name and which mails a post card back to you which tells you what address to currently write to in order to reach that person or organization. (At least, if you write to that received address during some time-to-live window of validity of the address.) And I guess Encrypted DNS would be like you and the service using more expensive security envelopes instead of post cards? :-)
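The post card version of DNS described above boils down to a lookup service plus a sender-side cache that honors a time-to-live. Here is a toy sketch of that idea; the class names, the day-based TTL, and the example addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    address: str
    ttl_days: int  # how long a sender may keep using this answer


class MailDirectory:
    """Hypothetical lookup service: name -> current postal address."""

    def __init__(self):
        self._records = {}

    def update(self, name, address, ttl_days=30):
        self._records[name] = Record(address, ttl_days)

    def lookup(self, name):
        return self._records[name]


class Sender:
    """Caches answers and asks again only once the TTL window has passed."""

    def __init__(self, directory):
        self.directory = directory
        self.lookups = 0
        self._cache = {}  # name -> (address, first day the answer is stale)

    def address_for(self, name, today):
        cached = self._cache.get(name)
        if cached is not None and today < cached[1]:
            return cached[0]
        record = self.directory.lookup(name)  # the "post card" round trip
        self.lookups += 1
        self._cache[name] = (record.address, today + record.ttl_days)
        return record.address


directory = MailDirectory()
directory.update("Example Org", "1 Old Street", ttl_days=30)
sender = Sender(directory)
print(sender.address_for("Example Org", today=0))
directory.update("Example Org", "2 New Street", ttl_days=30)
print(sender.address_for("Example Org", today=10))  # still the cached answer
print(sender.address_for("Example Org", today=31))  # TTL expired, asks again
```

As with real DNS, the TTL is the trade-off: a long one means fewer post cards but a longer window in which mail goes to a stale address.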


> Of course, the cost of a P.O. box over 40 years would have added up to thousands of dollars and that is less money for FSF advocacy. And time spent going to the post office to check the box would also have taken away from advocacy time.

To be fair, renting office space in downtown Boston also adds up to tens (if not hundreds) of thousands of dollars, every year. By comparison, $500 a year [0] for a medium PO Box (in the lobby of the building for their new office, no less!) is a steal.

[0] https://poboxes.usps.com/findBox.html?q=02196


CGP Grey, a YouTube channel, has a video from earlier this year on some of the problems with postal codes and addresses, from which I learned about alternatives to my familiar US-based system. https://www.youtube.com/watch?v=1K5oDtVAYzk

Even moving once has made the need for this clear to me; it boggles my mind that it isn’t a (common) thing.

One thing I've been meaning to try, but never got round to, is to stick a URL on an envelope, pointing at a page with an address, and see if the mail (Royal Mail, in my case) actually delivers it. I suspect they would, but that it would take a few extra days. It's no worse than some of the addresses that they do deliver.

What about encoding the address as a QR code?

This should not require any Internet access to view by whoever is scanning it to be sorted for delivery.


It also does not help you to update the address later.

It does if it leads to a web page with an address.

What happens when all project maintainers die and the source code disappears?


It does, but I think the person you were responding to was referring to the "This should not require any Internet access to view" part.

Hopefully it will never disappear, since Software Heritage and ArchiveTeam will have saved it.

https://www.softwareheritage.org/ https://wiki.archiveteam.org/index.php/Codearchiver


Hope is not a strategy. As much as I hate crypto, something on the blockchain might be more durable. You want something that isn't reliant on any one person or company to continue to exist (though maybe the Long Now Foundation will), and even if Bitcoin goes to zero, I think there will be some die-hard true believers who keep running miners even past 2140, when the built-in block subsidy is expected to run out.

You also have die-hard true believers among the data hoarders and archivists.

since this is hacker news... i once had some trouble changing my mailing address with one supplier (they would send the materials to the new address, but insisted on sending billing/tax info to the old one), so i did the mail forwarding process some three times, plus the extensions (i recall it was 6 + 3 months or so)... it got me close to 3 years of reliable mail forwarding from the great folks at usps until i could get through the supplier personnel's thick skulls.

the only issue "redoing" the request is that people at the old address can block it, so be sure to talk to them first.


> the only issue "redoing" the request is that people at the old address can block it, so be sure to talk to them first.

That's so strange, especially when you consider that for legal purposes, if you receive mail at someone's home, you are now a "resident" and it is harder for police to kick you out. Why would anyone willingly want someone else's mail to come to their address?


Simply receiving mail does not make you a resident. You must establish residency, which depends on things like being allowed access to the home, an understanding that you can leave belongings behind and access them later, how long you have stayed, and maintaining things like utility bills. A lease is a contract that clearly establishes the guidelines between two willing parties. Absent that, the definition of residency is typically delineated in your state's landlord-tenant laws.

Disclaimer: in the USA


yeah, I didn't make it past the first page of text because of this.


I try to remember Vonnegut: "We are here on Earth to fart around. Don't let anybody tell you any different."


An older lady friend used to say, "People like to spend their lives screaming around. When they don't want to wake others, they quietly fart around."


Vonnegut truly nailed it


Amen.


Interesting, I can't reproduce it. I've got Chrome 134.0.6998.166 on macOS, and with profiling turned on, it's about 55ms for me (3ms of that is spent in scripting).


I learned the top / middle / bottom from a book in elementary school in the early 80's. I did it for a talent show and kids accused me of watching the guy mix it up and memorizing the moves (that would have been more impressive than simply solving it).

Later, in college, having forgotten everything, I worked out the solution myself after a hint from a prof (that it's essentially conjugations of group elements).

Years later, I again developed a solution, but this time I do edges first, with permutations that mess up the corners, and then the corners. I mainly mixed it up to do something unique.
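For anyone curious about the conjugation hint: if h is a move sequence that, say, cycles three pieces, then g h g⁻¹ ("set up, do h, undo the setup") performs the same kind of cycle on whichever pieces g carried into position. A small sketch with plain permutations rather than an actual cube model (the five-element example and the function names are made up for illustration):

```python
def compose(p, q):
    """Permutation that applies q first, then p (p[i] is the image of i)."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Inverse permutation: undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugate(g, h):
    """g h g^-1: undo the setup g, apply h, then redo g."""
    return compose(g, compose(h, inverse(g)))


# h cycles positions 0 -> 1 -> 2 -> 0 and leaves 3 and 4 alone.
h = (1, 2, 0, 3, 4)
# g is a "setup move" that swaps positions 2 and 4.
g = (0, 1, 4, 3, 2)

# The conjugate is again a 3-cycle, now on 0 -> 1 -> 4 -> 0.
print(conjugate(g, h))
```

The cycle structure stays the same and only the affected pieces change, which is why one known 3-cycle plus a handful of setup moves goes a long way.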


It depends on what subset of Notion you use. Nothing (including Notion) is perfect for me. I'd like to build my own eventually, but I'm currently using Obsidian which doesn't hit your "works in the browser" requirement.

One option, which is open source and self-hosted, is Trilium[sic], found at https://github.com/zadam/trilium. Because it's open source, if it's close to what you want, you might be able to adjust it to meet your needs.

Other commercial options include Realm, Tana, and Craft, with varying degrees of "AI".

I really like the UX of Tana for building out graphs of pages with properties, but it's slow to start up, doesn't support math, etc. So it's mainly a UX example for me.


Yeah this one is really frustrating.

