Posts Tagged ‘ebook’

* Calibre Weeks in Review

Posted on April 24th, 2011 by John. Filed under calibre.

Once again this is a weeks in review instead of a single week. I’ve been focusing more on new features than blogging. Calibre 0.7.57 is out and is also the beta for 0.8. Adding the tweak “test_eight_code = True” will enable the 0.8 features.

Get Books aka Stores

For quite some time I’ve been working on integrating support for searching and connecting to third party stores to make it easier for users to find and acquire books they’re interested in. This is a very large feature and one I’m very excited about. There are two pieces: The individual store plugins that connect the user to a given store and a meta-store search that searches all of the store plugins at once.

For 0.8 I have support for 14 stores. They are a mix of big name, independent, paid, free, and public domain. There is something for everyone. The majority of the stores are implemented through an embedded web browser. This is because the majority of stores are only accessible via their web site. MobileRead is the one exception but I’ll talk about that later.

By default accessing the stores is done through the embedded web browser but each store can be configured to open in the system web browser instead. One major benefit of this approach is I’m able to detect ebook downloads. When an ebook is downloaded it is automatically added to the currently open library.

MobileRead is the exception and opens in its own search window. Right now it opens to the specific book’s entry in the embedded web browser so you can see details and download the book.

The meta-store search (along with MobileRead’s search dialog) allows for full boolean and field logic, just like the main calibre window. The search gets results from every store and shows them in one easy to sort list. Title, Author, Price, DRM status, Store, and Formats are all listed.

PDB – Plucker Input

Not much to say about this but it’s been a long time coming. Plucker is now supported as an input format. Not all features are supported (tables for instance). However, plucker files from pretty much every source will have the main content come through.


* Amazon APNX file format

Posted on February 9th, 2011 by John. Filed under programming.

Coming with the Kindle 3.1 firmware is the ability to have real page numbers. Getting ready for this, Amazon has put out a preview release of the 3.1 firmware and has started adding the necessary information to Kindle books to show the page numbers.

The page numbers themselves map to the pages of the corresponding print book. Overall it gives a very pleasant experience. Amazon has implemented the page mapping through a new auxiliary file that has the .apnx extension. Doing this they can easily add this feature to all existing books and not have to worry about incompatibilities with older Kindles.

There is an easy way to tell if a book is going to include the APNX file. Look for “Page Numbers Source ISBN:” in the Product Details. All books that map pages to a print book will specify which edition they map to.

Now on to the more technical part of this post. I’ve spent some time looking at various books that Amazon is distributing with the APNX file and I’ve been able to reverse engineer the format. It’s a very simple format: after the header information it is simply a list of 4-byte big-endian integers that correspond to locations in the uncompressed text. The position of the integer in the list corresponds to its page number.
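To make the "position in the list is the page number" idea concrete, here is a minimal Python sketch of looking up the page for a given text offset. The function name and the sample offsets are made up for illustration, and treating the first entry as page 1 is an assumption on my part.

```python
from bisect import bisect_right

def page_for_offset(page_offsets, text_offset):
    """Return the 1-based page number that contains text_offset.

    page_offsets is the sorted page list: entry i holds the offset in the
    uncompressed text where page i + 1 begins (1-based numbering is an
    assumption, not something the file states).
    """
    # bisect_right counts how many pages start at or before text_offset.
    page = bisect_right(page_offsets, text_offset)
    # Offsets that fall before the first listed page are clamped to page 1.
    return max(page, 1)

# Hypothetical offsets, not taken from a real book.
pages = [0, 1500, 3200]
print(page_for_offset(pages, 1600))  # prints 2
```

Because the list is ordered lowest to highest, a binary search is enough and no other index is needed.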

Following is the documentation of the APNX specification I’ve written:


apnx files are used by the Amazon Kindle (firmware revision 3.1+) to
map pages from a print book to the Kindle version. Integers within
the file are big-endian.


bytes   content             comments 

4       00010001            Format identifier. Value of 65537 big-endian.
4       start of next       The offset after ending location of the first header.
                            Starts a new sequence of header info
4       length              Length of first header
N       first header        String containing content header
Starts next sequence
2       unknown             Always 1
2       length              Length of second header
2       page count          Total number of bytes after second header that
                            represent pages. This total includes bytes that
                            are ignored by the pageMap.
2       unknown             Always 32
N       second header       String containing the page mapping header
4*N     padding             The first number given in the page mapping header indicates the number of 0 bytes.
4*N     page list           

Content Header

The content header is a string enclosed in {} containing key, value pairs.

content             comments

contentGuid         Guid.
asin                Amazon identifier for the Kindle version of the book.
cdeType             MOBI cdeType. Should always be EBOK for ebooks.
fileRevisionId      Revision of this file.


Page Mapping Header

The page mapping header is a string enclosed in {} containing key, value pairs.

content             comments

asin                The ISBN 10 for the paper book the pages correspond to
pageMap             Three value tuple. Looks like: "(N,N,N)"
                    1) Number of bytes after header that starts the page numbering sequence
                    2) unknown
                    3) unknown


Page List

The page list is a sequence of offsets in the uncompressed HTML. Each
value is the beginning of a new page. Each entry is a 4-byte big-endian
int. The list is ordered lowest to highest.
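As a rough illustration of the layout documented above, here is a minimal Python sketch that splits an APNX file into its two header strings and the 4-byte entries that follow. It is a sketch under assumptions, not a tested reader: I'm assuming the "start of next" value is counted from the beginning of the file, that the header strings decode as UTF-8, and the function name and sample data are made up. It does not try to interpret pageMap or skip the padding, so any padding shows up as leading entries.

```python
import struct

def parse_apnx(data):
    """Split raw APNX bytes into header strings, the page count field, and entries."""
    # First sequence: identifier, offset of the second sequence, header length.
    ident, next_offset, first_len = struct.unpack_from(">III", data, 0)
    if ident != 65537:
        raise ValueError("not an APNX file")
    # Assumption: the header strings are UTF-8 text.
    content_header = data[12:12 + first_len].decode("utf-8")

    # Second sequence: four 2-byte fields, then the page mapping header.
    # Assumption: next_offset is measured from the start of the file.
    _unknown, second_len, page_count, _unknown32 = struct.unpack_from(
        ">HHHH", data, next_offset)
    start = next_offset + 8
    mapping_header = data[start:start + second_len].decode("utf-8")

    # Everything left is read as 4-byte big-endian integers.
    # Padding is not skipped here; it appears as leading entries.
    body = data[start + second_len:]
    count = len(body) // 4
    entries = list(struct.unpack(">%dI" % count, body[:count * 4]))
    return content_header, mapping_header, page_count, entries

# Tiny synthetic file to exercise the parser; the header text is made up
# and only stands in for the real key, value strings.
first = b'{"asin":"B000000000"}'
second = b'{"asin":"0000000000","pageMap":"(0,0,0)"}'
sample = (struct.pack(">III", 65537, 12 + len(first), len(first)) + first
          + struct.pack(">HHHH", 1, len(second), 3, 32) + second
          + struct.pack(">3I", 0, 1500, 3200))
content, mapping, count, entries = parse_apnx(sample)
print(entries)  # prints [0, 1500, 3200]
```

The page count field is passed through as-is because its exact meaning (pages versus bytes) is still one of the fuzzy parts of the format.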


* EZReader Pocket Pro Review

Posted on December 29th, 2009 by John. Filed under Opinion.


My wife bought me an Astak EZ Reader Pocket Pro for Christmas this year. This device isn’t my first or even second ebook reader. It is now my third. The first, a Sony PRS-505, has been commandeered by my wife. The second is a Cybook Gen 3 which, due to the firmware update shortly before Christmas, might stay my primary reading device.


The Pocket Pro (PP) retails for 199 USD and US residents can purchase it at the EZ Reader website. Mine came with a 2GB SD card, serviceable leather cover, USB cable, AC adapter, and the usual marketing / user materials. All this makes it the best deal I’ve found for ebook readers in the 5″ size.


The PP is a 5″ device and uses an eInk screen like most ebook readers. I did not find the 5″ screen to be too small. It is a good balance between portability and readability. It does cause a few more page turns than with the larger devices but it was not cumbersome in any way. Overall I found the size to my liking.

It comes in a variety of colors and feels good in your hands. The paint gives it a rubberized texture. It’s light while still feeling solid and sturdy. Visually it isn’t the best looking device but the buttons along the bottom work well enough for navigation. It is very similar to how the Sony (non-touch) readers work. However, I do think Sony, having put the buttons next to where they correspond to the screen, makes it a bit more intuitive than matching the number to the button as is required by the PP.

One area where I felt the PP’s hardware design was a problem is the thumb wheel along the right hand side. I had issues using it to turn pages. It would often turn more than one page. It is also a hard plastic nub and after using it for a while my finger started to hurt. I soon stopped using it and only turned the page using the buttons.


One thing that I really like about the PP is how easy it is to change the firmware. There are a number of companies selling branded versions of the device. It is really a Hanlin V5 made by Jinke. The various companies that sell the device all have their own versions of the firmware that deviate to different degrees from what is produced by Jinke. The LBook has one of the more customized firmwares. I have tried it and found that it is a bit on the unusable side because the majority of it is not in English. While there are a number of firmware options I’m going to focus the remainder of this review on the firmware available from Astak as of this writing.


It works. That’s the nicest thing I can say about it. The only thing it does is list all folders in the storage location. You select the folder and it opens it. When you get to a book you want to read you select it and it opens. On the surface this doesn’t sound so bad but compared to other devices (the Cybook and PRS-505) it is terrible.

When listing folders it lists all folders, even system folders that should not and do not contain books. Also, it does not read any metadata such as author or title. You only have the folder and filename to go by. There are no tags, collections, genre views or custom sorting. Selecting books is a slow, cumbersome and painful process.

Another issue I have with the bookshelf is it tries to force the use of an SD card. It displays the SD card and the main memory separately. It also defaults to the SD card whenever the device is turned on. I’ve gotten used to the combined view other readers offer and I don’t care if the book is on the SD card or the main memory. I just want to be able to get to my book quickly.

Fonts are another thing I have an issue with. I like the fact that users can include their own fonts. You can also set a font as your default font so you don’t have to change it every time you open a book (this doesn’t actually work, see the EPUB Rendering section). However, the only way you can add your own fonts is with an SD card. They can only be read from an SD card. There is no way to add your own fonts by putting them in the main memory. I have no idea why this is the case but it is an annoyance.

The bookshelf falls flat but it’s not the main place a person will be. Reading books is the main purpose of the device. My ebook library is mainly in two formats: .txt and .epub. Let’s talk about how well it works with reading these formats.

TXT Rendering

One major thing it gets right, in my opinion, is justified text. It does a pretty accurate representation of the text. Another thing it does well is letting you change the font size easily. Page turns happen very quickly, much quicker than on my other readers, which was a pleasant surprise.

However, it does do some fancy auto detection of components and renders them differently. Words are often hyphenated and span two lines. This is without regard to where or what comes on the second line. Many times the last two letters and the period will appear alone on the second line because it is the end of the paragraph. This causes the text to become disjointed and ugly.

EPUB Rendering

Just like with TXT rendering it is very accurate and just like TXT rendering this is also a problem. It’s so accurate that you cannot change the font. Only the font size can be changed. I tried reading two EPUB files with it and found both to be unreadable.

Harry Potter’s Bookshelf by John Granger was the first EPUB I tried. Upon initially opening it, the text was too small to read. Increasing the text size to a reasonable level made it possible to read the text. However, the margins increased as well. The book has small margins included but to have the text at a reasonable level the margins ended up taking up a quarter of the page on each side: top, bottom, left, and right. This made the screen essentially a small little window with text. The text being justified only allowed for a few words per line with large spaces between them.

Page turning with Harry Potter’s Bookshelf was completely contrary to how wonderful it is for TXT files. With this book, turning the page was very slow and it didn’t always work. 3 out of 5 button presses wouldn’t register. The light lit up and nothing happened. To make the page turn I had to start holding down the button until the page changed and if it didn’t change after a few seconds I would let go and hold the button down again.

The second book I tried was World War Z by Max Brooks. It doesn’t even open. This is not an issue with DRM because it had been removed.

So far the two books I have as EPUB that I want to read cannot be read on the PP. Those same books open and render beautifully on both the Cybook and the PRS-505.


The PP is disappointing. The hardware is nice; I really like the 5″ size. However, the poor bookshelf, the poor rendering, and the inability to even open some books make it pretty much unusable. I’m going to keep looking into new firmware releases but until I can actually use it to read books it’s not much more than a poor substitute for a paperweight.


* Calibre Week In Review

Posted on December 22nd, 2009 by John. Filed under calibre.

Not much on the calibre front for this past week from me. The only thing I’ve worked on was adding support to the Nook driver for the cover image to be sent to the device with the book. Also, if there is no cover associated with the book, a default cover with the title, authors, and calibre library image is used, similar to what already happens with the Cybook Gen 3 and Opus.


* Why User Replaceable Batteries Don't Matter

Posted on November 30th, 2009 by John. Filed under Opinion.

Robert B, who is Astak’s Director of Bus. Devl., posted a blog entry about user replaceable batteries. I mostly agree with him that they are a benefit to consumer electronics. I mostly agree because I don’t see them as a positive in every case. I posted the following on his blog as a comment but I wanted to post it here as well. This is in response to the statement that reviewers don’t get the idea of user replaceable batteries.

It’s not that most reviewers do not get the idea of a user replaceable battery, it’s that it really isn’t a selling point to most people. There are three reasons I can think of as to why a user replaceable battery does not matter.

1) Sealing the battery in makes the device cheaper to produce and thus cheaper for the consumer. This leads into point two.

2) The device is not seen as a long term investment. This is very reminiscent of how Apple positions the iPod by inciting consumers to upgrade to the latest release. In one or two years the device will be replaced with a newer model. As someone who is looking to buy my third ebook reader for the third year in a row I haven’t had to worry about the battery wearing out and needing to be replaced.

3) Worries of availability. While it is very easy to buy a spare battery now, what about 5 years from now? Chances are the product will no longer be produced as the company has moved on to better and cheaper technology. 5 years from now obtaining a replacement battery can easily be impossible or cost prohibitive.


* Unidecoder

Posted on October 31st, 2009 by John. Filed under programming.

A while back I made a post about ASCIIizing Text. With it was a simple Python application that would convert Unicode characters to ASCII equivalents. It doesn’t just do a basic conversion but also Latinizes the characters when they are outside of the ASCII range.

The uni2ascii package I made has a few shortcomings I’ve decided to fix. The four major problems with it are: 1) Very basic permission checking, 2) Only accepts one file, 3) Required all input to be UTF-8 encoded, 4) The decoder was a very literal port of the Ruby version.

To fix these issues I’ve written an entirely new script. Problems 1, 2 and 3 are fixed. It has robust error checking, can handle an arbitrary number of files, and the file encoding can be specified. Number 4 is fixed by using the Python port created by Tomaz Solc.

I’ve put the source code for the new decoder into a Launchpad branch:

$ bzr branch lp:~user-none/+junk/unidecoder


* Calibre Week in Review

Posted on October 19th, 2009 by John. Filed under calibre.

Like every week there were miscellaneous bug fixes. However, this week I did a bit more: TCR input and output. Do be warned that the output supports multiple compression levels, the higher levels being slower than the lower. For instance a 200K TXT file as input will take around 25 seconds on the lowest level and 3.5 minutes at the highest.

TCR is a compressed text format used mainly by the Psion 3 and 5 series PDAs that were produced in the 90s. The compression used by TCR files is very interesting. It doesn’t have as high a compression ratio as say zlib but that is a trade off for being decompressible starting at any point in the stream. The history and more information about the format can be found at Andrew Giddings’ TCR page.
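To show why a TCR stream can be decoded from any point, here is a minimal Python sketch of decompression. It assumes the layout as I understand it: a "!!8-Bit!!" signature, then 256 dictionary entries each stored as a length byte followed by that many bytes, then a body in which every byte is an index into that dictionary. The function name and sample data are made up, and this is not calibre's actual code.

```python
import io

def tcr_decompress(stream):
    """Expand a TCR stream, assuming the signature + 256-entry dictionary layout."""
    if stream.read(9) != b"!!8-Bit!!":
        raise ValueError("missing TCR signature")
    # 256 dictionary entries, each a length byte followed by that many bytes.
    entries = []
    for _ in range(256):
        length = stream.read(1)[0]
        entries.append(stream.read(length))
    # Every remaining byte is an index into the dictionary. No byte depends
    # on the ones before it, so decoding could start at any position.
    return b"".join(entries[code] for code in stream.read())

# Synthetic file: every code maps to its own byte, except code 1 -> b"the ".
table = [bytes([i]) for i in range(256)]
table[1] = b"the "
header = b"!!8-Bit!!" + b"".join(bytes([len(e)]) + e for e in table)
sample = header + bytes([1]) + b"cat"
print(tcr_decompress(io.BytesIO(sample)))  # prints b'the cat'
```

The slow part is on the output side, where a good dictionary has to be chosen; reading is just a table lookup per byte.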


* Calibre Week in Review

Posted on October 11th, 2009 by John. Filed under calibre.

I haven’t had one of these for quite some time. I’ve been working on other projects and on the calibre front I’ve only been dealing with small bug fixes. However, this past week I’ve done a bit of work that is worth mentioning.

I’ve cleaned up the FB2 output. The cleanup fixes some invalid markup and some issues with text not being displayed by FBReader. It also fixes some issues with invalid characters making their way into hrefs.

eReader PDB output also got some love. Some kind people have been working on the reverse engineering of the file format and have filled in a number of the blanks I left. All of the additional information that has been discovered has been added to the files produced. The two main things that have been added are chapter and link indexes. The chapter indexes give the nice names at the top of the eReader viewer application. The link index allows links to work in the eReader viewer application.

To coincide with the eReader PDB output changes, PML input and output had some cleanup. The output looks better now and replaces Unicode characters with the \UXXXX equivalent.
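The \UXXXX replacement can be sketched in a few lines of Python. This is only an illustration, not the code that is in calibre: the function name is made up, the upper-case hex digits are my choice, and leaving characters above U+FFFF untouched is an assumption since they do not fit in four hex digits.

```python
def pml_escape_unicode(text):
    """Replace non-ASCII characters with a \\UXXXX code (illustrative only)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128 or code > 0xFFFF:
            # ASCII passes through unchanged. Characters that need more than
            # four hex digits are left alone (assumption for this sketch).
            out.append(ch)
        else:
            out.append("\\U%04X" % code)
    return "".join(out)

print(pml_escape_unicode("caf\u00e9"))  # prints caf\U00E9
```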


* Calibre Two Weeks in Review

Posted on August 2nd, 2009 by John. Filed under calibre.

This time I missed last week’s week in review because I simply forgot. I’m hoping to keep this to a minimum in the future.

The big news is calibre 0.6 has been released. Kovid is now back to his regular (weekly at the least) bug fix releases too. As of now the latest version is 0.6.4 and I get the feeling that 0.6.5 is right around the corner. For a listing of what’s gone into the 0.6 release take a look here.

Some features I’ve been working on that are included in the current release are: asciiize text, and iRex Iliad support.

The asciiize feature is one I’ve been wanting for a while and I’ve finally implemented it. It’s based on the ASCIIize text post I made last week. There is now an --asciiize option for the ebook-convert command and an option for the conversion dialog in the GUI. The basic premise of this feature is that Unicode characters often are not displayed correctly by ebook readers. My Cybook Gen 3 exhibits this behavior. ASCIIize transliterates the Unicode text to an ASCII representation, meaning it takes “Михаил Горбачёв” and converts it to “Mikhail Gorbachiov”.

The iRex Iliad is now as supported as I can get it. calibre detects it as a device, displays the list of books on the device and you can send books to it using send to device. The part I’m running into issues with is the manifest.xml file. It looks like it’s similar to the Sony PRS’s media.xml file, meaning it is a quick store for metadata. However, I don’t really know what goes into this file or should I say files (there is more than one). It also doesn’t look like the device updates it in any way because I had a user send me the files off of their Iliad and they were empty even though the user had put 20 or so books on the device with the Mobipocket desktop software. On the bright side it works well enough that I don’t think anyone will notice.

On the GUI tweaks front (these won’t be in a release until some point in the future) I’ve added a history drop down to the search field. There is a swap button for authors and title in the metadata bulk dialog, and you can hide the toolbars in the ebook viewer.

The remainder of what I accomplished over the past two weeks was bug fixes and code refactoring. Just the boring stuff that I would rather put off versus actually doing it.


* ASCIIize Text

Posted on July 24th, 2009 by John. Filed under programming.

One pet peeve I have with my Cybook Gen 3 is its inability to properly display Unicode characters in plain text files. I don’t need anything fancy like Japanese characters, just simple things like “ and ” (as opposed to " and "). To solve this problem I’ve been thinking about adding an --asciiize option to calibre. I say thinking because I didn’t really know where to start. Thankfully a user recently requested this very functionality in bug #2846. He even included a link to work to accomplish this very task.

I will be integrating transliteration of Unicode to ASCII into calibre soon. However, in the meantime here is a script and classes (see unidecoder for a better method) to accomplish this task outside of calibre. This is my Python port of the Ruby unidecode gem, which is itself a port of the original Perl Text::Unidecode.

The major differences between my implementation and the others are that it’s written in Python and that it uses a single dictionary instead of loading the code group files as needed.
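The single dictionary idea looks roughly like the following Python sketch. The table here is a tiny hand-made stand-in with a handful of entries, not the real unidecode data, and the names and the "?" fallback are choices made just for this example.

```python
# Tiny illustrative table; the real data covers far more characters.
TABLE = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2026": "...", # horizontal ellipsis
    "\u00e9": "e",   # e with acute accent
    "\u00df": "ss",  # sharp s
}

def asciiize(text, table=TABLE, unknown="?"):
    """Replace each non-ASCII character with its ASCII stand-in from table."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII is copied through untouched.
            out.append(ch)
        else:
            # One dictionary lookup per character; unmapped ones become "?".
            out.append(table.get(ch, unknown))
    return "".join(out)

print(asciiize("\u201cCaf\u00e9\u201d"))  # prints "Cafe"
```

Keeping everything in one dictionary costs more memory up front but makes each lookup a single step, with no files to load while converting.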

You can find out more on how this all works at
