Archive for the ‘calibre’ Category

* Calibre Week in Review

Posted on June 6th, 2009 by John. Filed under calibre.


This week hasn’t seen very much in the way of new features from me. I’ve only added one. This is mainly because I’ve been doing small bug fixes leading up to the beta for 0.6.

The new feature, which Kovid helped me implement, is ejecting the reader from within the GUI. When you mouse over the reader icon in the location list, an eject button appears next to the icon. Clicking it does just that: it safely removes the device from the system and removes it from the location listing. I did the GUI work, created the interface for the device driver and added the Linux ejection code. Kovid added the OS X code. We both came up with a Windows solution, but his was more robust so it was used.




* Calibre Week in Review

Posted on May 31st, 2009 by John. Filed under calibre.


Not much happened this week. A few bug fixes and a new output format, RTF. It produces acceptable results and embeds images into the file. The output could use some tweaking, but this will come with time. The only caveat is that the output is ASCII only. This is to keep compatibility with Calibre’s RTF input, which can only accept ASCII RTF files.

Pluginize has been merged back into trunk. Once a bit of testing is done by Kovid, he will be rolling out a beta for the 0.6 release. For those of you, like me, who use Ubuntu and build Calibre from source, there is a little change you will need to make in order to have it build. Open the file /usr/lib/python2.6/dist-packages/PyQt4/uic/Compiler/qtproxies.py and modify _qwidgets on line 238 to include “QWizardPage”.




* Calibre Week in Review

Posted on May 24th, 2009 by John. Filed under calibre.


A lot of work went into improving eReader and PML support this week. Also, a new format has been added.

The XHTML to PML parser has been completely rewritten. It is based on the XHTML to FB2 parser I wrote for FB2 output. It produces much better looking PML markup and the displayed output looks very close to the original XHTML source. One major advantage of the new parser is that it accounts for XHTML style information and translates that into PML tags. For example if text is set to bold by CSS then the text will get the bold PML tag.

eReader also got another very important addition: support for Makebook (202 byte header) file input. Makebook and Dropbook are the two applications provided by eReader (the company) for producing eReader files. Makebook is the older application and is no longer supported. Makebook and Dropbook produce very different record 0 headers. This header has information about where the text, images and other things contained in the file are located. It took a while but I’ve been able to understand enough of the Makebook header to add input support for these files.

Makebook produces a 202 byte header while Dropbook produces a 132 byte header. After comparing header values and section sizes I was able to determine that the 2 byte int at offset 0x08 contains the start of the non-text offset. Just like the 132 byte header files, everything before this offset is text.
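As a rough illustration of reading that value (this is a sketch, not the code in Calibre), the 2 byte int can be pulled out of record 0 with Python’s struct module. The function name non_text_offset is made up for this example, and big-endian byte order is an assumption based on the usual convention for Palm database files.

```python
import struct


def non_text_offset(record0: bytes) -> int:
    """Return the 2 byte integer stored at offset 0x08 of record 0.

    Hypothetical helper for illustration only.
    """
    if len(record0) < 0x0A:
        raise ValueError("record 0 is too short to hold the offset field")
    # Assumption: the value is an unsigned big-endian short (">H").
    (offset,) = struct.unpack_from(">H", record0, 0x08)
    return offset
```

Everything before the returned value would then be treated as text, and everything from it onward as non-text.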

Images in the 202 byte header files were easy to find because they are in the same format as in the Dropbook produced files. However, I didn’t bother to determine if there was a header value for them. Since all images are in PNG format and their sections start with the text PNG, I simply loop through all non-text sections and see if they start with PNG. If they do I know it’s an image and extract it.
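The loop described above can be sketched in a few lines. This is only an illustration under the assumption that each section is available as a bytes object; find_image_sections is a hypothetical name, not a function from the Calibre source.

```python
def find_image_sections(non_text_sections):
    """Return the sections that begin with the marker text b"PNG".

    Hypothetical helper: sections are assumed to be bytes objects.
    """
    images = []
    for section in non_text_sections:
        # A section starting with the text PNG is treated as an image.
        if section.startswith(b"PNG"):
            images.append(section)
    return images
```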

The hardest part of the 202 byte header files was the text itself. Even though I knew which sections contained the text I didn’t know how it was compressed. This is where Google came to the rescue. On the homepage for the Z-DOC PalmPilot application I found there was some work to reverse engineer this older format. This page gave me the information I was looking for. Text is PalmDoc compressed and then xored with 0xA5. It looks like this xor is an attempt to obfuscate the compression used to make it harder to decompress. It isn’t for copy protection because the Makebook application only produces non-DRM files. DRM eReader files from that time would be created in a different manner.
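Undoing the xor is the simple half of the job. A minimal sketch, with unxor as a made-up name for this example: every byte is xored with 0xA5 again, which restores the PalmDoc compressed data. PalmDoc decompression would then run on the result and is not shown here.

```python
def unxor(data: bytes, key: int = 0xA5) -> bytes:
    """Xor every byte with the key.

    Xor is its own inverse, so the same function obfuscates and restores.
    """
    return bytes(b ^ key for b in data)
```

Because xor is symmetric, calling unxor twice on the same data returns the original bytes.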

Syncing news now supports auto convert in the GUI. It’s just like auto convert with sending email and sending an eBook to a device. If the book is not in a format supported by the device it will be auto converted to a supported format based on user preference.
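The decision auto convert has to make can be sketched roughly as follows. This is an assumption about the logic, not the actual GUI code: choose_format and its parameters are hypothetical names, and the rule used is "send an existing format if the device supports one, otherwise convert to the most preferred supported format".

```python
def choose_format(available, supported, preferred):
    """Return (format, needs_conversion) for sending a book to a device.

    available: formats the book already has
    supported: formats the device accepts
    preferred: the user's preferred order (hypothetical setting)
    """
    available = [f.lower() for f in available]
    supported = [f.lower() for f in supported]
    # Supported formats, with the user's preferences first.
    ordered = [f.lower() for f in preferred if f.lower() in supported]
    ordered += [f for f in supported if f not in ordered]
    if not ordered:
        raise ValueError("the device supports no formats")
    for fmt in ordered:
        if fmt in available:
            return fmt, False  # already in a usable format
    return ordered[0], True  # convert to the top preference
```

For a book that only exists as EPUB and a device that takes MOBI and TXT with TXT preferred, this would pick TXT and flag it for conversion.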

The final bit of work this week was support for the RocketBook (RB) format. Both input and output are working, though they both need testing. Output in particular needs it: I don’t have a device that supports these files, so I can only guess, based on my input code, that the RB files produced are correct. If someone has a device that reads RB files please let me know if the output files are read correctly.




* Calibre Week in Review

Posted on May 16th, 2009 by John. Filed under calibre.


This week comes with some great new additions. It also comes with some great new challenges. Not to mention more work for next week.

eReader output is complete and working. The files produced are the same format as those produced by Dropbook (more on this in a bit). In addition to eReader output I’ve added metadata writing for eReader files.

There is one major issue I’ve come across concerning the eReader format. The files produced by Makebook are significantly different than those produced by Dropbook. The files I’ve been using for reverse engineering the format have all been produced by Dropbook. My implementation for eReader input only works with Dropbook format files. As such, Makebook produced files will not work and are currently not supported. They will be unsupported for the foreseeable future because of how different the format is to Dropbook, and because Makebook is not supported nor being developed. It was replaced by Dropbook some time ago.

The PDB container format also got a metadata writer. However, the PDB wrapper itself only supports setting the book title. So that’s all that gets written to PDB files that either don’t support metadata in their internal format or that don’t have their own specific metadata writer.

eReader wasn’t the only format to get some output work. FB2 output has been added as well. However, there is no metadata writer for it yet.

The final bit of work I did this week was to add auto convert to sending by email in the GUI. It’s the same idea as auto convert for sending to a device. If the file being sent is not in a format accepted by the email address (this is a configurable setting), the file will be auto converted to a suitable format before being sent.




* Calibre Week in Review

Posted on May 10th, 2009 by John. Filed under calibre.


Device interfaces can now be configured in the GUI. Also, there is a simple framework for creating plugin configuration widgets.

I’ve added a metadata reader for the eReader format. However, eReader supports 3 ways to set the metadata in the file. 1) In the pdb header (only supports setting a short title). 2) In the metadata section of the file (supports the most information: title, author, publisher, copyright, ISBN). 3) Embedded in the text as a comment. 2 and 3 are only accessible if the book does not contain DRM (or has been unlocked, but Calibre does not support this). 3 is not supported at all with this metadata reader. The reader first tries 2, then falls back to 1 if the book is DRMed or if the metadata section is non-existent.
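The fallback order can be sketched like this. It is only an illustration of the order described above: read_ereader_metadata and its parameters are hypothetical, and the real reader has to parse the pdb file to get these values.

```python
def read_ereader_metadata(pdb_title, metadata_section=None, has_drm=False):
    """Try the metadata section (method 2), else fall back to the pdb header (method 1).

    pdb_title: short title from the pdb header
    metadata_section: dict of parsed fields, or None if the section is missing
    has_drm: True if the book is DRMed, which makes the section inaccessible
    """
    if metadata_section and not has_drm:
        return dict(metadata_section)
    # Fallback: the pdb header only carries a short title.
    return {"title": pdb_title}
```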

Two new input and output formats have been added: ztxt and palmdoc. They are both pdb formats like eReader. For input, the pdb input plugin will automatically determine the internal format and call the appropriate code path. For output, the default is palmdoc, but there is an option, --format, that can be used to change it to any other supported pdb output format (ztxt is the only other one currently). The format option is also available in the conversion dialog in the GUI.

Speaking of conversion in the GUI, it now works. There are all new dialogs for single and bulk conversion. Pretty much anything that can be done using the command line ebook-convert can be done in the GUI. Bulk, single and auto conversion are all complete and working. Auto conversion will also honor a user’s format preferences set for the device interface plugin.




* Calibre Week in Review

Posted on May 2nd, 2009 by John. Filed under calibre.


It seems that PDF is becoming the never ending format for me. Maybe I should start naming the posts PDF Work instead of Calibre Week in Review…

One minor and one major change to PDF processing this week. The minor change was a fix for bug 2342. German umlauts are now displayed correctly in the output. The major change is that PDF output now supports comics. cbz, cbr and cbc are some of the comic input formats that are supported, and now you can turn them into a PDF. The huge advantage is for people (like me) who have a Cybook. A comic can be turned into one PDF file sized for the device, keeping down the amount of clutter in the library view.

I also worked on the device framework and have pluginized all of the device interfaces (I like the term interfaces better than drivers because it reduces confusion, as Windows device drivers are very different). They also sport a new configuration system (though they didn’t have configuration before at all). The user will be able to specify their preferred format order for sending to the device, as well as disable certain formats from being sent to the device at all. I said will because, while the configuration code is done, there is currently no way to call it in the preferences dialog. However, this will be rectified before 0.6 is released.

eReader output has been put on hold for the foreseeable future. eReader input is complete and working, but due to the undocumented nature of the eReader format I have not been able to produce a working output plugin. The main issue I’ve run into is that the eReader header (record 0 within the pdb container) is a 132 byte package with 66 sections. There are too many unknown sections. Even with the inspector script I wrote to see what the values are in working eReader files, I have not been able to understand how all of the sections interact with the file itself. My guesses have all resulted in files that are not readable by the eReader Pro software.
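For anyone who wants to poke at the header themselves, here is a minimal sketch of the idea (it is not the inspector script mentioned above). It assumes each of the 66 sections is a 2 byte value, since 132 / 66 = 2, and that the values are unsigned and big-endian, which is an assumption based on the usual Palm convention. The function names are made up for this example.

```python
import struct

HEADER_SIZE = 132
SECTION_COUNT = 66


def header_values(record0: bytes):
    """Unpack record 0 into 66 integers (assumed 2 bytes each, big-endian)."""
    if len(record0) != HEADER_SIZE:
        raise ValueError("expected a 132 byte header")
    return struct.unpack(">%dH" % SECTION_COUNT, record0)


def nonzero_values(record0: bytes):
    """Map byte offset -> value for every section that is not 0."""
    values = header_values(record0)
    return {i * 2: v for i, v in enumerate(values) if v}
```

Comparing the output of nonzero_values across several working files is one way to spot which sections change with the content and which stay fixed.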

eReader files use the PML markup language, and while I couldn’t get eReader output working I have added support for PML input and PML output. The PML output can be taken and put into either MakeBook or DropBook to produce a working eReader file.

Two things to note about the PML support. First, input can take either a straight .pml file or a zip archive filled with .pml files and PNG images (the images must be in PNG format). The zip archive must have its extension changed to .pmlz for this to work. Second, PML output will produce a zip archive with the extension .pmlz. Within this archive will be all of the image files in PNG format and the produced .pml files.

.pmlz is simply an easy way to group the files and ensure that there are no issues with missing files or with not being able to find referenced files.
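Since a .pmlz is just a zip archive with a different extension, one can be put together with Python’s zipfile module. This is a minimal sketch under that assumption. build_pmlz and the file names in the usage note are hypothetical, and the layout inside the archive (all files at the top level) is a guess for illustration.

```python
import io
import zipfile


def build_pmlz(pml_files, images):
    """Return the bytes of a zip archive holding .pml files and PNG images.

    pml_files and images both map a file name to its bytes.
    """
    for name in images:
        if not name.lower().endswith(".png"):
            raise ValueError("images must be in PNG format: %s" % name)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in list(pml_files.items()) + list(images.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


def list_pmlz(data):
    """Return the sorted member names of a .pmlz archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())
```

Writing the returned bytes to a file such as book.pmlz gives an archive that groups the markup and its images together.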




* Calibre Week in Review

Posted on April 27th, 2009 by John. Filed under calibre.


This week’s review of what I’ve been working on is a little late. Looking at what was accomplished, it wasn’t as productive as last week, but I spent just as much time coding. With projects like this you can’t judge output by the number of features added or bugs fixed.

The GUI received context sensitive treatment for the device menu. Send to device will only appear when a device is connected, and send to card A and B will only be enabled when those cards are available. A simple change but one that will reduce confusion.

I’ve spent a lot of time working with Lee Dolsen (ldolse from mobileread) on pdftohtml processing rules. They are nearly complete and the output is looking really good. I know I’ve been saying that for a while now but each week it just keeps getting better. However, PDF is still not an ebook format and should not be treated as such. This simply helps to get content out of the PDF format and into a more manageable one.

One big thing I spent most of my time this week on was eReader input. Yep, eReader pdb files can now be converted to any supported output format. Metadata reading of eReader files is not yet supported. That is on my todo list. The html it produces could probably use some work but that will come as people report issues once 0.6 is released.

The other big thing that has taken up my Calibre time is eReader output. Sadly, it does not work. Also, it will not be working for the foreseeable future. The issue I’ve run into is I don’t know enough about the format to produce a file that can be read by eReader’s reading software. The main problem I face is there are around 66 “sections” to the eReader format header (not the pdb header, this is record 0 of an eReader file). I know what 10 of those sections are and what values they should have as they are used for my reader. Around 40ish of the sections should have a value of 0. However, that leaves 26ish sections that I don’t know what they are, what they do or what value they should have and how it relates to the rest of the file. Suffice it to say until I know more about the format I won’t be able to complete the output plugin.

Oh, I did write an inspector script (it’s in the eReader directory in the Calibre source tree) to help understand the eReader format. If anyone is interested in analyzing the format they can use it to help them see what is in the header.




* Calibre Week in Review

Posted on April 18th, 2009 by John. Filed under calibre.


This has been a busy week for me on the Calibre front. All of my changes were to pluginize and the first three I talk about also made it into trunk and will be appearing in the next release.

I re-worked the mobi metadata reader so that it does not read the entire file into memory. It only reads the parts of the file that hold the metadata. The advantage is that reading the metadata is now about five times faster. These results are from unscientific testing by the bug reporter. Basically he said that listing the books on his Kindle went from 5 minutes to about 1 minute.

The metadata writer for pdf files has been re-worked and is now enabled. Kovid did some work to my initial work so that it won’t lock up the GUI when working with large pdf files.

I (with a bit of help from Kovid on this too) was able to fix bug 2112 (last few pdf files held open). Calibre relies on Python’s garbage collector and object scope for closing files. It does not explicitly close them. The bug was caused by pyPdf, which is a Python library Calibre uses to read and write pdfs. For some reason pyPdf’s file reader wasn’t allowing the files to be closed. They were no longer in use and the object went out of scope, but the garbage collector didn’t close the file immediately. It would close it eventually. A wrapper object was created and is used so that pyPdf doesn’t have a direct reference to the open file, and it now gets closed properly.
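The general idea of such a wrapper can be sketched as below. This is not the wrapper from the Calibre source and it does not use pyPdf. It is a minimal example, with ClosingWrapper as a made-up name, of handing a library a stand-in object while the caller keeps control of the real file and closes it explicitly.

```python
import io


class ClosingWrapper:
    """Stand-in for an open file that is handed to a library.

    The library only ever sees the wrapper, so the real stream can be
    closed explicitly no matter what references the library keeps.
    """

    def __init__(self, stream):
        self._stream = stream

    def read(self, *args):
        return self._stream.read(*args)

    def seek(self, *args):
        return self._stream.seek(*args)

    def tell(self):
        return self._stream.tell()

    def close(self):
        self._stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False


# Usage with an in-memory stream standing in for a pdf file.
stream = io.BytesIO(b"%PDF-1.4 example bytes")
with ClosingWrapper(stream) as wrapped:
    header = wrapped.read(8)
```

Once the with block ends, the underlying stream is closed even if something still holds a reference to the wrapper.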

The GUI in the current releases only supports displaying one storage card from a device. Not all devices support two storage cards but the Sony PRS devices do. Support for the GUI to display two storage cards has been added.

To go along with the GUI displaying two storage cards, almost all device drivers have been made to support up to two storage cards. The USBMS base class supports two cards, and as most device drivers use this base they all get support for it without much work. However, this doesn’t mean that a device without a storage card slot, or with only one, will magically support two cards. None of the drivers except the PRS ones have any user visible changes. For anyone looking to write a device driver using USBMS: if the device supports two cards, USBMS has you covered.

The PRS505 and PRS700 drivers both received the two card treatment. They also received a bit of work. They have been moved to use the USBMS base class. This removed a lot of redundant code and puts them on the same code path as the other (except PRS500) drivers. Overall this change is to reduce work in finding and fixing bugs and maintenance.

Internal work on the PRS505 and PRS700 drivers wasn’t all I did to them. They no longer dump all books into a single directory. Books are stored in an author/title/book hierarchy. News items are stored in a news/title hierarchy. They also support the USBMS / tag as a custom layout path.
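The layout can be pictured with a small sketch. This is not the driver code: device_path is a hypothetical helper, and replacing slashes in names with underscores is an assumption added so that a name cannot create extra directories.

```python
import posixpath


def _clean(name: str) -> str:
    """Keep path separators in a name from creating extra directories."""
    return name.replace("/", "_").replace("\\", "_").strip()


def device_path(author: str, title: str, filename: str, news: bool = False) -> str:
    """Build the on-device path for a book or a news item.

    Books go to author/title/file, news items to news/title/file.
    """
    top = "news" if news else _clean(author)
    return posixpath.join(top, _clean(title), _clean(filename))
```

A book by Jane Austen titled Emma would end up at "Jane Austen/Emma/emma.epub" with this sketch.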

Earlier I said almost all device drivers got two storage card support. The PRS500 driver did not. It still only supports one storage card. Due to the way the driver works I will not be touching it.

I’ve been working with ldolse from mobileread and with his help the processing rules for pdftohtml (used for pdf input) have been improved.
