Posts Tagged ‘apnx’
* APNX File Issues and Kindle Apps
Posted on March 21st, 2011 by John. Filed under hardware.
My last post was about the GUI plugin for calibre to generate APNX files. It seems, however, that the Kindle apps (Kindle for Mac, Kindle4PC and Kindle for iPhone / iPad) don’t like APNX files paired with non-Amazon ebooks.
A number of people complained that the APNX files generated with the GUI plugin were not working on their Kindle apps. I looked into it and what I discovered is not very encouraging. First the problem. If you take a MOBI file (generated with calibre for instance), create an APNX file for it, and then put those two files in the “My Kindle Content” directory, the book will open but the page numbers will not display. The strange thing about this issue is that the GUI plugin outputs the same APNX files as transferring to a Kindle via “Send to Device.” These generated APNX files are known to work on Kindles.
I installed the Kindle for Mac application and came to some startling conclusions about how Amazon is handling page numbers in their Kindle apps. Fortunately, the Kindle for Mac install comes with a few DRM free books that have APNX files. I looked at these Amazon provided APNX files and couldn’t find any real difference between them and the APNX files I generate with calibre.
When I first opened the application and selected a book I noticed there were no page numbers. Since I was only using this for testing and was planning on removing the app after I finished, I did not tie it to my Amazon account, and I also had my firewall deny outbound connections for the app. Even after some time spent closing and reopening the Kindle app, page numbers were still not appearing.
I did some searching online and found an Amazon help page that had a few suggestions for page numbers not displaying. The first item is “Wait a few minutes: It may take several minutes for page numbers to become visible.” I can see this being the case if it needs to download the APNX file but all three of the books it comes with have APNX files…
I compared the Kindle for Mac’s configuration and cache directories before and after allowing it to connect to the internet. It appears that the Kindle for Mac app must connect to https://kindle-app-services.amazon.com (there is no web server at this url) before the app will display page numbers.
Once connected, the app exchanges an encrypted question with Amazon and receives an encrypted answer. New encrypted values are then written to ~/Library/Application Support/Kindle/storage/.kindle-info.
The .kindle-info file holds a variety of different things. For instance, on my system removing the following will cause the app to ask me to agree to the Terms of Use.
[0TB-ZJBABgbzb3ZgZPBKZP0vbs0gB4Zb:bK0B0hZOBHbYZNBtZDBablZYZKbUbk0U]
mrsquash on MobileRead did some further testing of APNX files in the Kindle app. He was able to have the Kindle app display page numbers using calibre-generated APNX files for Amazon-provided ebooks. I did some testing myself and I can swap APNX files between Amazon ebooks.
This leads me to believe that the Kindle app(s) will only display page numbers once it has communicated with Amazon and verified the ebook file is from Amazon. The reasons I’ve come to this conclusion are: 1) Generated APNX files work on the Kindle and on Amazon ebooks. 2) APNX files can be swapped between Amazon ebooks. 3) Only non-Amazon ebooks do not display page numbers with a generated APNX or with an APNX from an Amazon ebook. My suspicion is Amazon is doing this purposely and restricting the displaying of page numbers to force people to buy from Amazon and not a third party.
* calibre APNX GUI Plugin
Posted on March 19th, 2011 by John. Filed under calibre.
The Amazon APNX file generation added to the Kindle device interface has been wildly popular. So popular that people want to use the APNX files without a Kindle. It turns out a large number of calibre users don’t actually read using a Kindle but using one of the many reading apps Amazon produces (PC, Mac, iPad…). So I’ve created a GUI Plugin that allows users to create and save APNX files from MobiPocket (MOBI, AZW, and PRC) files. It can be found here.
Due to this feature being highly niche (only users of Amazon reading apps will have a use for it) I decided not to make it a part of calibre proper. Instead it is being hosted as a 3rd party plugin on. The good news is the new Plugin Updater plugin will support my APNX plugin.
* Calibre Week in Review
Posted on February 13th, 2011 by John. Filed under calibre.
I’ve been putting up my week in reviews based on a week starting on Monday for some time now. I’ve been thinking about this and it doesn’t really make much sense. Calibre has a release pretty much every Friday now. So starting next week I’m going to change my week in review to be Friday through Thursday. This way features I talk about in my review will be in the just released version.
TXT Input
First the small changes. Heuristic processing now enables smarten punctuation to further my goal of TXT documents coming out looking great. A change was made to have hard scene breaks separated from the text to ensure they don’t accidentally get merged into the paragraph before or after. The formatting type none was renamed to plain to correspond with the formatting output option.
The only big change for TXT input was the addition of a new paragraph type option. It’s called off. When specified, no modifications are applied to the paragraph structure of the text. This is especially useful for Markdown and Textile formatted documents. It ensures there are no changes that will cause elements to render incorrectly.
TXTZ Input
A bug caused images to not be included when converting. With Kovid’s help this has been corrected.
TXT Output
I modified Textile output to not write %’s for span tags. The span tag is superfluous in calibre’s Textile output because it does not contain any real information. The span tags are invisible when rendering the XHTML. The %’s cluttered up the resultant TXT so they were removed.
PML Input
PML input saw a lot of work relating to \t and \T tags. The entire handling of these tags was rewritten. Unfortunately, there is no way to have these two tags map one to one to XHTML so only some common cases are handled.
- \T’s that do not start the line are ignored.
- \t’s that start and end the line use a margin for the text block.
- \t’s that start a line and end another line use a margin for the text block.
- \t’s that start a line but end before a line ending will use a text-indent.
- \t’s that are in the middle of lines are ignored. Open and closed \t blocks within a line are ignored.
Heuristics
Once again the italicize common cases regex was tweaked. This time it was to fix an issue with None being inserted in the text before adjacent underscores. I’m hoping this is the last time for a while that I need to tweak them.
Kindle Interface
The work I did on the APNX format was undertaken for a very real world reason: integrating APNX generation into calibre’s Kindle device interface plugin.
The 0.7.45 release saw the initial inclusion of this feature. After I received some user feedback I tweaked it for the 0.7.46 release. The 0.7.45 release included a very basic APNX file that would create pages every 1024 bytes of uncompressed HTML.
In 0.7.46 there are a lot of differences. Writing the APNX can be disabled. This is very useful for Kindle 2 users as the Kindle interface works for both Kindle 2s and 3s.
There are now two parsers for generating pages. The default is the fast parser. It uses the uncompressed length of the MOBI HTML and creates pages every 2300 bytes. A few users complained that 1024 created too many pages, about double what you would find in an average paperback book. The 2300 number is a bit more than double 1024, and I chose it after counting the number of characters in a page of an average paperback book. I counted approximately 2240 and added an additional 60 characters to account for markup per page. Thus 2300.
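The fast parser’s rule can be illustrated with a few lines of Python. This is only a sketch of the idea described above, not the code shipped in calibre; the function name and the bytes_per_page parameter are made up for illustration.

```python
def fast_page_offsets(text_length, bytes_per_page=2300):
    """Return page start offsets, one every bytes_per_page bytes.

    text_length is the uncompressed length of the MOBI HTML.
    The function name and parameter are hypothetical, for illustration only.
    """
    return list(range(0, text_length, bytes_per_page))


# A 5000 byte book gets three pages with the 0.7.46 spacing
# and five pages with the older 1024 byte spacing.
new_pages = fast_page_offsets(5000)
old_pages = fast_page_offsets(5000, bytes_per_page=1024)
```

Because only the text length is needed, nothing has to be decompressed, which is what makes this approach fast.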
The other parser that can be enabled in the Kindle interface’s settings is the accurate parser. It works by decompressing the MOBI HTML and looking at the actual content. The big difference, and why I’m calling it an accurate parser, is that it looks at the amount of visible text to decide when a page ends and a new one begins. The assumption is there are 30 lines per page and each line can have up to 70 characters. The parser starts a new line every time it encounters a new paragraph and every 70 characters in a paragraph.
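The line and page counting can be sketched roughly as follows. This is a simplified illustration of the rules above and not calibre’s actual implementation. It assumes the HTML is already decompressed, that only <p> tags start a new paragraph, and that character positions equal byte offsets (true for plain ASCII text). The function and parameter names are hypothetical.

```python
def accurate_page_offsets(html, lines_per_page=30, chars_per_line=70):
    """Return the offsets in html where each page starts.

    A new line starts at every <p> tag and after every chars_per_line
    visible characters. Every lines_per_page lines starts a new page.
    Markup is skipped; whitespace outside of tags counts as visible text.
    """
    line_starts = []
    in_tag = False
    tag = []
    chars_in_line = 0
    for pos, ch in enumerate(html):
        if ch == "<":
            # Entering a tag: start collecting its name.
            in_tag = True
            tag = []
        elif in_tag:
            if ch == ">":
                in_tag = False
                name = "".join(tag).strip().lower()
                if name == "p" or name.startswith("p "):
                    # A new paragraph always begins a new line.
                    line_starts.append(pos + 1)
                    chars_in_line = 0
            else:
                tag.append(ch)
        else:
            # Visible character: wrap once the current line is full.
            if chars_in_line == chars_per_line:
                line_starts.append(pos)
                chars_in_line = 0
            chars_in_line += 1
    # Every lines_per_page-th line start is the beginning of a page.
    return line_starts[::lines_per_page]


# Sixty one-line paragraphs fill two pages of thirty lines each.
sample_pages = accurate_page_offsets("<p>x</p>" * 60)
```

Walking every character like this is what makes the approach slower than simply dividing the text length.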
The major disadvantage of the accurate parser, and why it’s not the default, is that it’s slow. It requires the text to be decompressed and parsed. With a PalmDoc compressed file this can take a few seconds but with a HUFF/CDIC compressed file it can take minutes.
The other minor disadvantage of the accurate parser is it cannot work on DRM content. The fast parser can because the uncompressed text length is stored unencrypted in the MOBI header. If the accurate parser is chosen it will fall back to the fast parser for DRM content. So whenever a Mobipocket book is sent to the Kindle (AZW, MOBI, PRC) an APNX file can and will (unless disabled) be generated.
One thing I will note about the accurate parser is that it currently ignores all markup and only looks at text. This means it can be made even more accurate by accounting for <div class="mbp_pagebreak" />, <br>, <hr>, images, margins, and font size changes. I do plan to add support for most if not all of these in the future. However, most books people read on their Kindle are pretty much all text, and the accurate parser does a good enough job giving page numbers that correspond to the page length in a paperback book, so I don’t see a pressing need to spend the time on it at this moment.
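Here is a rough sketch of how the text length could be read and the fallback decided. The byte offsets are assumptions taken from the commonly documented PDB / PalmDOC layout (a 78 byte PDB header followed by the record list, with the text length and encryption type stored in record 0), not something covered in this post. The function names are hypothetical.

```python
import struct

PDB_RECORD_LIST_OFFSET = 78  # assumed: record list follows the 78 byte PDB header


def read_record0_info(mobi):
    """Return (text_length, encryption_type) from the PalmDOC header in record 0."""
    (record0_offset,) = struct.unpack_from(">I", mobi, PDB_RECORD_LIST_OFFSET)
    # Assumed PalmDOC header layout: compression (2 bytes), unused (2),
    # text length (4), record count (2), record size (2), encryption type (2).
    (text_length,) = struct.unpack_from(">I", mobi, record0_offset + 4)
    (encryption,) = struct.unpack_from(">H", mobi, record0_offset + 12)
    return text_length, encryption


def choose_parser(mobi, prefer_accurate):
    """Pick the accurate parser only when it was requested and the book has no DRM."""
    _, encryption = read_record0_info(mobi)
    if prefer_accurate and encryption == 0:
        return "accurate"
    return "fast"
```

Both values are read without touching the compressed text records, which is why the fast parser still works when the text itself is encrypted.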
* Amazon APNX file format
Posted on February 9th, 2011 by John. Filed under programming.
Coming with the Kindle 3.1 firmware is the ability to have real page numbers. Getting ready for this Amazon has put out a preview release of the 3.1 firmware and has started adding the necessary information to Kindle books to show the page numbers.
The page numbers themselves map to the pages of the corresponding print book. Overall it gives a very pleasant experience. Amazon has implemented the page mapping through a new auxiliary file that has the .apnx extension. Doing this they can easily add this feature to all existing books and not have to worry about incompatibilities with older Kindles.
There is an easy way to tell if a book is going to include the APNX file. Look for “Page Numbers Source ISBN:” in the Product Details. All books that map pages to a print book will specify which edition they map to.
Now on to the more technical part of this post. I’ve spent some time looking at various books that Amazon is distributing with the APNX file and I’ve been able to reverse engineer the format. It’s a very simple format: after the header information it is simply a list of 4 byte big-endian integers that correspond to locations in the uncompressed text. The position of the integer in the list corresponds to its page number.
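Since the list is ordered, finding the page for a given location in the text is a binary search. The following is a small sketch of that lookup. It assumes pages are numbered starting at 1, which the format itself does not state, and the function name is made up for illustration.

```python
from bisect import bisect_right


def page_for_offset(page_offsets, text_offset):
    """Return the page containing text_offset (assumed 1-based).

    page_offsets is the sorted list of page start offsets.
    Returns 0 when text_offset falls before the first page.
    """
    return bisect_right(page_offsets, text_offset)


offsets = [0, 2300, 4600]
first = page_for_offset(offsets, 100)
last = page_for_offset(offsets, 5000)
```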
Following is the documentation of the APNX specification I’ve written:
APNX
----

apnx files are used by the Amazon Kindle (firmware revision 3.1+) to map pages from a print book to the Kindle version. Integers within the file are big-endian.

Layout
------

bytes   content          comments

4       00010001         Format identifier. Value of 65537 (big-endian).
4       start of next    The offset after ending location of the first header. Starts a new sequence of header info
4       length           Length of first header
N       first header     String containing content header

Starts next sequence

2       unknown          Always 1
2       length           Length of second header
2       page count       Total number of bytes after second header that represent pages. This total includes bytes that are ignored by the pageMap.
2       unknown          Always 32
N       second header    String containing the page mapping header
4*N     padding          The first number given in the page mapping header indicates the number of 0 bytes.
4*N     page list

Content Header
--------------

The content header is a string enclosed in {} containing key, value pairs.

content          comments

contentGuid      Guid.
asin             Amazon identifier for the Kindle version of the book.
cdeType          MOBI cdeType. Should always be EBOK for ebooks.
fileRevisionId   Revision of this file.

Example:
{"contentGuid":"d8c14b0","asin":"B000JML5VM","cdeType":"EBOK","fileRevisionId":"1296874359405"}

Page Mapping Header
-------------------

The page mapping header is a string enclosed in {} containing key, value pairs.

content          comments

asin             The ISBN 10 for the paper book the pages correspond to
pageMap          Three value tuple. Looks like: "(N,N,N)"
                 1) Number of bytes after header that starts the page numbering sequence
                 2) unknown
                 3) unknown

Example:
{"asin":"1906694184","pageMap":"(4,a,1)"}

Page List
---------

The page list is a sequence of offsets in the uncompressed HTML. Each value is the beginning of a new page. Each entry is a 4 byte big endian int. The list is ordered lowest to highest.
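To make the layout concrete, here is a minimal reader sketched in Python. It is an illustration of the specification above rather than a tested tool. It assumes both headers can be decoded as JSON (the examples happen to be valid JSON), and it returns every 4 byte value after the second header without trying to interpret the pageMap tuple or separate padding from pages. The function name and the keys of the returned dictionary are hypothetical.

```python
import json
import struct


def parse_apnx(data):
    """Parse the raw bytes of an APNX file into its headers and 4 byte entries."""
    ident, next_start, first_len = struct.unpack_from(">III", data, 0)
    if ident != 65537:
        raise ValueError("missing APNX format identifier")

    # First header: the content header string follows the three 4 byte fields.
    content_header = json.loads(data[12:12 + first_len].decode("utf-8"))

    # Second sequence: four 2 byte fields, then the page mapping header string.
    unknown1, second_len, page_count, unknown2 = struct.unpack_from(
        ">HHHH", data, next_start)
    second_start = next_start + 8
    second_end = second_start + second_len
    page_header = json.loads(data[second_start:second_end].decode("utf-8"))

    # Everything left is read as 4 byte big-endian integers. Padding entries,
    # if any, are returned together with the page offsets.
    body = data[second_end:]
    body = body[:len(body) - len(body) % 4]
    entries = [value for (value,) in struct.iter_unpack(">I", body)]

    return {
        "content_header": content_header,
        "page_header": page_header,
        "page_count": page_count,
        "entries": entries,
    }
```

Reading a file is then a matter of passing its bytes, for example parse_apnx(open(path, "rb").read()), where path points at any .apnx file.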