
I had a gig doing this once in grad school. Here's my method. It worked great:

First, close the library (or library section) until your work is complete. It's critical that the books not go wandering or get rearranged during this process.

Next, grab an SLR (or equivalent mirrorless) camera with video mode. Set it to video mode. In good lighting, pan it slowly across the shelves, one by one, top to bottom and left to right.

Make sure the spines are all legible. This is your set-of-books.

Set yourself or someone else up transcribing the titles from the recording, in the order shelved. Check it a couple of times. If you missed a book, or couldn't read the spine from the recording, add it here.

Once you are certain your list is accurate and complete, print (or put on your phone) the list of books. (Still in the order shelved.)

Now, again, working top-down, left-to-right, take books out in sets of eight. (I like eight because it's a nice round number, it's near Miller's magic number, and it's also a number of books I can typically carry.)

For each 'byte' of eight books, take your SLR and, in photo mode, take a pic of the frontmatter page of each book -- the one containing the date of publication, and, most critically, the ISBN.

Put the eight books back on the shelf and take another eight. Repeat until complete. Be sure not to miss a book or shoot one out of shelf order.

Now you have a list of books and a set of pics. Guess what? They are the same length and in the same order. So, book 1 on your list is the first pic on your SLR. And so on.
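As a rough sketch in Python, the pairing step could look like the following. The function name and sample filenames are made up for illustration, and it assumes the camera numbers its files so that sorted filenames match shooting order (which breaks if the counter rolls over mid-job).

```python
def pair_titles_with_photos(titles, photo_names):
    """Pair the Nth transcribed title with the Nth frontmatter photo."""
    if len(titles) != len(photo_names):
        # A mismatch means a book was skipped or photographed twice.
        raise ValueError(
            f"{len(titles)} titles but {len(photo_names)} photos; "
            "recheck the shelf before pairing"
        )
    # Assumption: sorted filenames are in shooting order.
    return list(zip(titles, sorted(photo_names)))


# Hypothetical sample data, in shelf order.
titles = ["Dune", "Emma", "Ulysses"]
photos = ["IMG_0003.JPG", "IMG_0001.JPG", "IMG_0002.JPG"]
pairs = pair_titles_with_photos(titles, photos)
```

The length check is the useful part: it fails loudly instead of silently shifting every later book by one.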

Now, you can OCR those pics for the ISBN. As backup/redundancy you can grab other info as well, e.g. the publisher, to sanity-check the results of your ISBN lookup.
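Whatever OCR tool you use, the ISBN check digit gives you a cheap way to catch misreads. Here is a minimal sketch, assuming the OCR output is already a plain string and that the ISBN is printed with hyphens or no separators (space-separated groups are not handled). The function names are illustrative, not from any library.

```python
import re

# 10 to 13 digits with optional hyphens; the last character may be X (ISBN-10).
CANDIDATE = re.compile(r"(?<![\d-])(?:\d-?){9,12}[\dXx](?![\d-])")


def isbn10_ok(s):
    # Weights 10..1; 'X' stands for 10; the sum must be divisible by 11.
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def isbn13_ok(s):
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(ocr_text):
    """Return checksum-valid ISBNs found in the text, hyphens removed."""
    found = []
    for match in CANDIDATE.finditer(ocr_text):
        s = match.group().replace("-", "").upper()
        if len(s) == 10 and isbn10_ok(s):
            found.append(s)
        elif len(s) == 13 and s.isdigit() and isbn13_ok(s):
            found.append(s)
    return found


page = "First published 1987\nISBN 978-0-306-40615-7\nISBN 0-306-40615-2 (pbk)"
isbns = find_isbns(page)
```

A page that yields no valid ISBN is a good candidate for retyping by hand from the photo.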

Congratulations, you now have enough information -- a title and an ISBN -- for e.g. Google Books to pull up the rest of the info, which you can sanity check against the other deets you OCRed out of the frontmatter page.
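For the lookup and sanity check, something like the sketch below might do. It assumes the public Google Books volumes endpoint with an `isbn:` query and a response that carries the title under `items[0].volumeInfo.title`; the helper names and the sample response are hand-written for illustration, and the title comparison is deliberately crude.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen


def google_books_url(isbn):
    # Assumed endpoint: Google Books API v1 volume search by ISBN.
    return "https://www.googleapis.com/books/v1/volumes?" + urlencode({"q": f"isbn:{isbn}"})


def fetch_volume(isbn):
    # Network call; not run here. Returns the decoded JSON response.
    with urlopen(google_books_url(isbn)) as resp:
        return json.load(resp)


def first_title(response):
    """Title of the first hit, or None if the lookup found nothing."""
    items = response.get("items") or []
    return items[0]["volumeInfo"].get("title") if items else None


def normalize(title):
    # Lowercase, drop punctuation, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def titles_agree(spine_title, catalog_title):
    # Spines often omit subtitles, so accept containment either way.
    a, b = normalize(spine_title), normalize(catalog_title)
    return bool(a) and bool(b) and (a in b or b in a)


# Hand-written, trimmed stand-in for a response; not real API output.
sample = {"totalItems": 1, "items": [{"volumeInfo": {"title": "DUNE: Deluxe Edition"}}]}
agrees = titles_agree("Dune", first_title(sample))
```

Any book where the spine title and the looked-up title disagree goes on a short list to check by eye against the frontmatter photo.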

Final tip: Calibre has a book information lookup thingy; it wasn't what I used back in the day, but AFAIK it should work great. It may be possible for you to simply populate the Calibre book list with titles and ISBNs, and have it just magically whisk other details -- date of publication etc. -- into the appropriate fields. Again, you can cross-check these (either exhaustively or spot-check) against the OCRed contents of the frontmatter pages, which (again) you associated with a book title in the initial step.

Happy librarianing!




I like the idea of photographing the books on the shelves. I wonder if you could OCR the title of the book from the spine, although I guess the fonts used on books are pretty varied.


That's a cool idea. I was doing this in ~2008, and IIRC OCR was barely grabbing "ISBN" in 12pt Times on a white page, let alone RANDOM_FONT on a book spine.

(Unrelatedly I also had a gig OCRing books for someone with a visual disability, which is where I got the idea.)

Today I work in the AI division of a major tech company and I can say with some confidence that you could absolutely OCR spines if your OCR platform is new enough :3


This is a wonderful comment that contrasts nicely with the suggestions that require buying a scanner. That said, I wonder if this works at the scale that OP's kid will be tasked with?


I processed a few thousand books this way. It scales nicely.

(Also, why is this on-topic, strictly-procedural comment I spent 30 minutes writing downvoted? Hacker News, you are fickle.)


Perhaps because it seems overly complicated, with a lot of manual parts prone to error? You turned something that is N steps long (scan each book) into something over 3N: the video pass might not count as N, but transcribing the titles, then pulling out, opening, and photographing each book, then transcribing the ISBN/info is 3N or more. I got the idea from a book (24 Hour Bookstore/Ajax Penumbra), but a panorama of a room to pick up the spines would be amazing.

I don't see it here, but even accepting the need to barcode/ISBN all the books, I would like a way to pull digitized or high-quality spine images for a VR library that is fed from an ISBN database of what you want it stocked with, using a query for shelving/sorting. I fear that this is a whole new project, akin to the original good cover scans, but lacking even a starting point from publisher sites.



