The Library Problem

by Zack Grossbart on November 30, 2007

In March of 2006 my wife Mary and I owned about 3,500 books. We both have eclectic interests, voracious appetites for knowledge, and a great love of used bookstores. The problem was that we had no idea what books we had or where any of them were. We lost books all the time, cursed late into the night digging through piles for that one book we knew must be there, and even bought books only to find that we already owned them. There were books on random shelves and books on the floor, and we were tripping over books when we walked up and down the stairs. In short, we had a mess.

We needed to get organized. We needed a way to store all of our books so they were easily accessible. We also needed to merge our two separate book collections, one of the remaining holdouts of our single lives. We got together and came up with a list of requirements for our new system. And yes, we are both engineers.

  1. It needs to be easy to find a book.
  2. It needs to be easy to add a book to the system.
  3. The system needs to handle foreign language books.
  4. It needs to be easy to maintain the system going forward.
  5. The initial cataloging effort can’t take forever.

To complete this project we needed a system to organize all of the books, a way to quickly add books to that system, and a place to store all of the books.

A Place for Everything and Everything in Its Place

Our first task was to decide what system we should use for ordering the books. Most of the systems used to organize books are based on combinations of the author’s name, the title of the book, and the category of the subject matter. Some of the systems provide a general outline for where a book should be and other systems are very specific. We considered three different systems: alphabetical, Dewey Decimal, and Library of Congress.

Alphabetizing

Probably the most common system used for organizing home libraries is alphabetizing. Books are arranged in alphabetical order by title or author's name. This makes books reasonably easy to find, but it puts Peter Pan by J.M. Barrie next to Runner's World Guide to Injury Prevention by Dagny Scott Barrios, which makes it difficult to browse by subject.

Adding categorization to alphabetical sorting can fix that problem. This system organizes books into categories and then alphabetically within those categories. In this system the book Three Seductive Ideas by Jerome Kagan might end up next to The Blank Slate by Steven Pinker because they are both about psychology. This system makes browsing by subject possible, but it requires you to create categories for each book. Should The State, War, and the State of War by Kalevi J. Holsti be categorized as international relations, warfare, or politics? Creating categories which will work well with a set of unknown books is very difficult. We needed a system with established categories.

Dewey Decimal

Dewey Decimal is familiar to just about everyone who came through the American educational system. There is a good chance the library at your grade school used Dewey Decimal Classification (DDC for short). DDC assigns each book a number based on its subject matter and organizes all subjects into three levels: 10 main classes, 100 divisions, and 1,000 sections. The book Larousse Gastronomique, edited by Prosper Montagne, may have a DDC number of 641.3/003 21. Here 600 is the main class for technology, 640 is the division for home and family management, 641 is the section for food and drink, and the digits after the decimal point narrow the subject further within that section (the trailing 21 records which edition of the DDC was used).
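The three top levels fall straight out of the first three digits: the hundreds digit picks the main class, the tens digit the division, and the units digit the section. Here is a minimal sketch of that arithmetic; the function name ddc_levels is my own, and it ignores everything after the third digit.

```python
def ddc_levels(ddc_number: str) -> tuple[str, str, str]:
    """Return (main class, division, section) for a DDC number like '641.3/003 21'.

    Only the first three digits determine the three top levels; the digits
    after the decimal point narrow the subject further and are ignored here.
    """
    digits = ddc_number.strip()[:3]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"not a DDC number: {ddc_number!r}")
    section = int(digits)                # e.g. 641
    main_class = section // 100 * 100    # keep the hundreds digit, e.g. 600
    division = section // 10 * 10        # keep hundreds and tens, e.g. 640
    return f"{main_class:03d}", f"{division:03d}", f"{section:03d}"


print(ddc_levels("641.3/003 21"))  # ('600', '640', '641')
```

The zero padding keeps numbers in the 000 class readable, so 005.1 comes back as ('000', '000', '005').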

However, DDC has one big problem: the assigned numbers are not fixed. There is no central authority assigning DDC numbers to books, and the same book can have different numbers in two different libraries. We didn't want to spend time working out the right catalog number for each of our books; we just wanted to look it up. We could use the catalog of a large library system such as the Minuteman Library Network to find DDC numbers for most of our books, but that system does not provide programmatic access to its database and does not have numbers for many of the books we own.

Salt: A World History by Mark Kurlansky has an LCC number of TN900.K865 2003. This places it in the broad class of technology (T), the subclass of mining engineering and metallurgy (TN), and the range for nonmetallic minerals (900). K865 is a Cutter number representing the author's name, and 2003 is the year of publication.

Library of Congress Classification

It was Mary who suggested using the Library of Congress Classification (LCC for short). This is the system used in most university libraries in the United States. An LCC number starts with one to three letters for the subject class and subclass, followed by numbers and letters that specify the book's place within that subject. The LCC system was created in 1897 and has held up quite well for over 100 years. Wikipedia has a great reference page about the Library of Congress Classification system, including a list of all the categories.
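One catch when sorting LCC numbers on a computer: the class number is compared as a number, so a plain text sort puts TN900 ahead of TN95. Below is a simplified sketch of a sort key, assuming call numbers shaped like TN900.K865 2003 (class letters, class number, an optional Cutter number, an optional year). It is my own illustration, not an official parser, and it ignores second Cutter numbers and other parts that real call numbers can carry.

```python
import re

# Simplified pattern: class letters, class number, optional first Cutter, optional year.
CALL_NUMBER = re.compile(
    r"""
    (?P<letters>[A-Z]{1,3})              # class and subclass letters, e.g. TN
    \s*(?P<number>\d+(?:\.\d+)?)         # class number, e.g. 900 or 76.73
    (?:\s*\.?(?P<cutter_letter>[A-Z])    # first Cutter letter, e.g. K
       (?P<cutter_digits>\d+))?          # Cutter digits, e.g. 865
    (?:\s+(?P<year>\d{4}))?              # publication year, e.g. 2003
    """,
    re.VERBOSE,
)


def lcc_sort_key(call_number: str) -> tuple:
    """Return a tuple that sorts LCC call numbers in shelf order."""
    m = CALL_NUMBER.match(call_number.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized call number: {call_number!r}")
    return (
        m["letters"],                         # T < TA < TN, compared as text
        float(m["number"]),                   # 95 < 900, compared as a number
        m["cutter_letter"] or "",
        m["cutter_digits"] or "",             # Cutters are decimals, so K865 < K9
        int(m["year"]) if m["year"] else 0,   # books without a year sort first
    )


shelf = ["TN900.K865 2003", "TN95.A1", "T10.B2", "QA76.73.P98 2019"]
print(sorted(shelf))                    # plain text sort puts TN900 before TN95
print(sorted(shelf, key=lcc_sort_key))  # ['QA76.73.P98 2019', 'T10.B2', 'TN95.A1', 'TN900.K865 2003']
```

Comparing the Cutter digits as text works because Cutter numbers are read as decimal fractions: .865 comes before .9, just as "865" comes before "9".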

Many books already have LCC numbers printed on their copyright page. The Library of Congress also makes its catalogs available at http://catalog.loc.gov. LCC gave us a well-known, predefined set of categories for all of our books, and the numbers could be looked up programmatically. This is the system we chose.

To recap:

  • Alphabetizing with Categorization – Required us to design the subject categories ourselves, isn't very precise, and must be done manually.
  • Hard Alphabetizing (by title or author only) – Works a little better with computers, but makes browsing by subject difficult and shares the other drawbacks of alphabetizing with categorization.
  • Dewey Decimal (DDC) – Has better sorting, but the assigned numbers are not fixed and can differ from one library to the next.
  • Library of Congress (LCC) – Provides excellent sorting capabilities, has predefined subject categories, and its numbers can be looked up programmatically.

The Catalog

When I was 9 years old we had to take library class, spending time in the school library learning how to use it. Mrs. Snuffleupagus (none of us could pronounce her real name) would walk us over to a large cabinet full of index cards and tell us to use them to look up books while admonishing us not to touch any of them because our fingers were probably sticky.

Mary and I wanted a digital catalog with good support for sorting and an easy way to add, edit, and delete books. It also needed to support LCC numbers and have an interface that could handle the number of books we had. It should preferably work under Microsoft Windows or Linux, the two operating systems we currently run.

I started my search by posting the question to SlashDot.org. I got a lot of responses, some more useful than others. My favorite was, “I think you lost most of the slashdotters when you started with ‘My Wife…’ People are googling this ‘wife’ to see what they can find out about the phenomenon.” I thanked my good fortune that I am interested in computers while still being blessed with female companionship and sorted through the suggestions. I first looked for a good open source option but couldn't find one, so I decided the project was worth spending a little money on and compiled the following list.

  1. Readerware
  2. Delicious Monster
  3. Collectorz.com Book Collector
  4. FinderWare

Readerware

Readerware is flexible and pretty fast. It has a decently clean interface and supports Windows, Linux, Mac, and even Palm. You can create your own columns and it has pretty good support for Library of Congress Catalog information (with the addition of a provided Python script). It will find the LCC number based on the ISBN number. Readerware can also be customized with your own Python scripts. It costs $40. We chose Readerware.

Delicious Monster

Delicious Monster also costs $40. It runs only on MacOS, but that wasn't the biggest issue for us. Delicious Monster has a slick-looking interface which most Mac users will find familiar. However, it feels like a better solution for organizing 100 DVDs than 3,500 books. It also doesn't have good support for Library of Congress Catalog numbers.

Collectorz.com Book Collector

Collectorz also costs $40 (actually $39.95) and works on Windows and Mac. Like Delicious Monster, it feels like it is aimed at smaller databases, and it doesn't support Library of Congress numbers.

FinderWare

FinderWare also costs $40, runs only on Windows, and once again feels like it is aimed at smaller libraries. It also has a clunky interface for adding multiple books at once and does not support Library of Congress Classification numbers.

A Custom Made Catalog

Readerware was a good fit, but it wasn’t perfect. This got me working overtime to create something better. “I can build my own catalog system,” I thought. It would be exactly what I needed, support large amounts of data, and import automatically from the Library of Congress Catalogs. “I’m a professional software developer. I could bang this out in a day or two.” The system would have a new kind of interface on the library data, so easy and intuitive to use that it would take over the world. Every library on Earth could use my software.

Mary found me a couple of hours later surrounded by the full Edward Tufte collection and every book on user interfaces we own. Well… only the books I could find. That is why we needed the system in the first place. It took one look from her for me to come to my senses and decide that building my own library software would be a much bigger project than I had time for.

The Scanner


Now that we had a library catalog system, we needed to add the books to it. Most books have a copyright page containing publisher and cataloging information, most books published after 1975 have an ISBN number, and most books published after 1985 have a barcode on the back which contains the ISBN number. The ISBN (International Standard Book Number) uniquely identifies a specific edition of a book. The number encodes a language or country group, the publisher, and the title, and it ends with a check digit. You can use this number to look up the rest of the information about a book from many online sources.
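The last digit of an ISBN is a check digit, which is what lets scanning software catch a misread number. Here is a minimal sketch of the standard rules: a 10-digit ISBN uses weights 10 down to 1 and must sum to a multiple of 11 (with X standing for 10), and a 13-digit ISBN uses alternating weights of 1 and 3 and must sum to a multiple of 10. The barcode on a book's back cover carries the 13-digit form, which for older books is the 10-digit number with a 978 prefix and a recomputed check digit. The function names are my own, and 0-306-40615-2 is just a sample number.

```python
def _clean(isbn: str) -> str:
    """Drop the hyphens and spaces that are often printed in ISBNs."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1; the weighted sum must be divisible by 11 ('X' = 10)."""
    s = _clean(isbn)
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])  # the check digit has weight 1
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Alternating weights 1, 3, 1, 3...; the sum must be divisible by 10."""
    s = _clean(isbn)
    if len(s) != 13 or not s.isdigit():
        return False
    return sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s)) % 10 == 0


def isbn10_to_isbn13(isbn: str) -> str:
    """Prefix 978, drop the old check digit, and compute the new one."""
    s = _clean(isbn)
    if not is_valid_isbn10(s):
        raise ValueError(f"invalid ISBN-10: {isbn!r}")
    core = "978" + s[:9]
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


print(is_valid_isbn10("0-306-40615-2"))   # True
print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```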

The problem with the ISBN number is that it isn’t a very good number to use to catalog the books. Sorting by ISBN number would create a list which didn’t have anything to do with the author or the subject of the book. This would create an effectively random order of books and make it very difficult to find what you are looking for.

Typing 3,500 ISBN numbers into the system didn't sound like fun, so I went looking for a good barcode scanner. The first one we tried was the CueCat, which was manufactured as part of an abandoned marketing scheme. We found one for sale on eBay for five dollars and figured it was worth a try. We couldn't make it work; after a few days of trying, we still couldn't get it to scan anything. Other people have said they had some success with it, but we never did.


After our poor experience with the CueCat, we decided to go a little higher end and bought a FlicWare scanner. The FlicWare scanner is simple and sturdy and cost about $100 at the time. (It is now called the Microvision ROV and costs $159.00.) It also had good support from Readerware, which even provides a PDF file with settings specifically for this scanner. There are a lot of other handheld scanners on the market, and I can't claim to have made an exhaustive comparison, but this one has worked well for us.

The Bookshelves

With our scanner and catalog in place we needed somewhere to actually put all of the books. We had some bookshelves already, but we were going to need a lot more. The cost of our project up to this point was $140. I was worried that this was where it would start to cost some real money.

The bookshelves were more than just a functional choice. We had to live with them every day. We haven't been in college for a long time. We have a mortgage and own a car. We are adults. Two-by-fours and cinder blocks just weren't an option. Thos. Moser will sell you a tall, solid cherry bookcase for $4,750.00, and a bookcase of this size holds about 180 books.


Luckily, Home Depot came to the rescue. It sold us a solid particle board bookcase of the same dimensions for about $30, and the bookcases don't look too bad either.
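The price gap is easier to appreciate per book. This is rough arithmetic using the two prices and the 180-book capacity above, and it assumes, purely for comparison, that all 3,500 books needed new bookcases; that overstates things, since we already owned some shelves.

```python
import math

BOOKS_PER_CASE = 180   # approximate capacity of one tall bookcase
TOTAL_BOOKS = 3_500    # size of the collection at the start

prices = {
    "solid cherry (Thos. Moser)": 4_750.00,
    "particle board (Home Depot)": 30.00,
}
cases_needed = math.ceil(TOTAL_BOOKS / BOOKS_PER_CASE)  # 19 cases hold only 3,420

for name, price in prices.items():
    per_book = price / BOOKS_PER_CASE
    print(f"{name}: ${per_book:.2f} per book, "
          f"${cases_needed * price:,.0f} for {cases_needed} cases")
# solid cherry (Thos. Moser): $26.39 per book, $95,000 for 20 cases
# particle board (Home Depot): $0.17 per book, $600 for 20 cases
```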

Adding the Books

At this point we finally had a catalog, a scanner, and a source for good bookshelves. We were ready to start adding our books to the system. It made sense to shelve the books and catalog them at the same time. We set up one of our new bookcases, the scanner, and a laptop in one room and went to work. The process went like this:

  1. Scan about 50 books.
  2. Import the books into Readerware and let it find the information about them.
  3. Manually find LCC numbers for any books which weren’t found.
  4. Sort the list and add each book to the shelf.

We started this process with one shelf and moved on from there, keeping a clear gap between the cataloged and uncataloged books. Fifty turned out to be about the right number of books to catalog at one time: large enough to make it go quickly, but small enough that it is easy to look through the stack. We also added a column to Readerware to indicate whether a book had been shelved yet. This made it easy to sort our books by LCC number and add just the books that had not yet been shelved. Once we had a little practice, the two of us were able to catalog 150-200 books an hour. We didn't do it all at once; we took our time and slowly worked our way through.
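The batch process and the extra "shelved" column could be sketched roughly like this. It is not Readerware's scripting interface: lookup stands in for whatever online catalog search the software performs, the catalog dictionary and all function names are hypothetical, and the call-number key is deliberately simplified (letters as text, class number as a number, the rest as text).

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

BATCH_SIZE = 50  # the batch size that worked well for us


@dataclass
class Book:
    isbn: str
    title: str
    lcc: str
    shelved: bool = False  # the extra "shelved yet?" column


def in_batches(isbns: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Step 1: split the scanned ISBNs into batches of about 50."""
    for start in range(0, len(isbns), size):
        yield isbns[start:start + size]


def catalog_batch(isbns: list[str], lookup: Callable[[str], Optional[dict]]):
    """Step 2: look up each ISBN; return (found books, ISBNs needing manual work)."""
    found, missing = [], []
    for isbn in isbns:
        record = lookup(isbn)
        if record and record.get("lcc"):
            found.append(Book(isbn, record.get("title", ""), record["lcc"]))
        else:
            missing.append(isbn)  # step 3: find these LCC numbers by hand
    return found, missing


def call_number_key(lcc: str) -> tuple:
    """Simplified shelf-order key so that TN95 sorts before TN900."""
    m = re.match(r"([A-Z]+)\s*(\d+(?:\.\d+)?)(.*)", lcc.upper())
    if m is None:
        raise ValueError(f"unrecognized call number: {lcc!r}")
    return (m.group(1), float(m.group(2)), m.group(3).strip())


def shelving_order(books: list[Book]) -> list[Book]:
    """Step 4: the books not yet on a shelf, in call-number order."""
    return sorted((b for b in books if not b.shelved), key=lambda b: call_number_key(b.lcc))


# Hypothetical stand-in for the online lookup the catalog software performs.
catalog = {
    "isbn-a": {"title": "Salt: A World History", "lcc": "TN900.K865 2003"},
    "isbn-b": {"title": "A hypothetical mining book", "lcc": "TN95.A1"},
}
for batch in in_batches(["isbn-a", "isbn-b", "isbn-c"]):
    found, missing = catalog_batch(batch, catalog.get)
    print([b.lcc for b in shelving_order(found)])  # ['TN95.A1', 'TN900.K865 2003']
    print(missing)                                 # ['isbn-c']
```

Filtering on the shelved flag before sorting is what let us shelve a new batch in order without reshuffling the books already in place.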

Foreign Language, Oversized, and Children’s Picture Books

We have a decent number of books in languages other than English (mostly Japanese, Chinese, Korean, and French). These books are often not part of the Library of Congress system. We also have a number of oversized art books. These books need shelves which are especially tall and strong. We kept both of these types of books out of the system. Children’s picture books are notorious for going out of print quickly and being difficult to catalog. We kept all of those books out of the system as well since many of them did not have LCC numbers. This accounted for about 200 books.

The Results

At the beginning of this article I identified the following criteria for our system:

  1. It needs to be easy to find a book.
  2. It needs to be easy to add a book to the system.
  3. The system needs to handle foreign language books.
  4. It needs to be easy to maintain the system going forward.
  5. The initial cataloging effort can’t take forever.

We achieved all of these except for number three. There does not exist (to my knowledge) a system which catalogs all books in all languages. The entirety of human knowledge just isn’t that well organized.

We started this process about one year ago. A new baby and life in general got in the way a little, but we have cataloged two out of three floors' worth of books. Our current cataloged book count is 1,634; 87 of those books didn't have LCC numbers and are kept on a special shelf. As part of this process we sold, gave away, or recycled about 500 books.

We have designated a shelf as the temporary holding shelf for new books until we get around to adding them to the system. The system has been working very well for us. We know what we have and where to find it, but there has been an added benefit – we can now browse within a subject. When we want to read something new we can go to that section and look at what there is. We can also easily sort the list of books by author or the date we bought them. This makes it much easier to find the book you didn’t know you were looking for.

A few statistics:

Total books – About 3,500
Sold, given away, or recycled – About 500
Cataloged – 1,634
Exempted – About 200
Total cost of project – About $440

By the way, if you are curious, I estimate that Mary and I have read about 1,300 of our 1,634 cataloged books, about 80 percent. We are actively working on the rest.
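For the numerically inclined, here is some rough arithmetic on the approximate figures above: how many books are left to catalog, how long they might take at the 150-200 books an hour we managed, and the share of cataloged books we have read. It assumes the weeded and exempted books don't overlap with the cataloged ones.

```python
# Approximate figures from the statistics list.
total, weeded, cataloged, exempted = 3_500, 500, 1_634, 200
books_read = 1_300

remaining = total - weeded - cataloged - exempted
print(f"Still to catalog: about {remaining} books")              # about 1166 books
for rate in (150, 200):  # books per hour once we had some practice
    print(f"At {rate} books/hour: about {remaining / rate:.1f} hours")
print(f"Share of cataloged books read: {books_read / cataloged:.0%}")  # 80%
```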