Posts Tagged ‘pml’

* Calibre Week in Review

Posted on October 26th, 2009 by John. Filed under calibre.


Mostly bug fixes this week. The majority of them were centered around eReader PDB output and PML generation. eReader PDB output now marks the first image as the cover image if a cover image is not explicitly set. PMLZ got images named properly in the output. PML generation now has .png added to the end of image names. I also fixed a bug where excessive new lines were not being properly removed. PML, TXT, RB, FB2 output all got excessive space removal tones down so instances were spaces were completely removed will stop happening. Regex header and footer matching was tweaked to match at a later stage in the conversion pipeline. This should ease issues of expressions not matching properly. Finally, at Kovid’s request I’ve added some info about header / footer regexes and converting TXT and PDF files to the documentation.

Tags: , , , , , , , .

    Comments Off


* Calibre Week in Review

Posted on October 11th, 2009 by John. Filed under calibre.


I haven’t had one of these for quite some time. I’ve been working on other projects and on the calibre font I’ve only dealing with small bug fixes. However, this past week I’ve done a bit of work that is worth mentioning.

I’ve cleaned up the FB2 output. It fixes some invalid markup. Fixes some issues with text not being displayed by FBReader. It also fixes some issues with invalid characters making there way into hrefs.

eReader PDB output also got some love. Some kind people have been working on the reverse engineering of the file format and have filled in a number of the blanks I left. All of the additional information that has been discovered has been added to the files produced. The two main things that have been added are chapter and link indexes. The chapter indexes give the nice names at the top of the eReader viewer application. The link index allows links to work in the eReader viewer application.

To coincide with the eReader PDB output changes, PML input and output had some cleanup. It looks better now and replaces unicode characters with the \UXXXX equivalent.

Tags: , , , , .

    Comments Off


* Calibre Week in Review

Posted on May 24th, 2009 by John. Filed under calibre.


A lot of work went into eReader and PML to have it supported better. Also, a new format has been added.

The XHTML to PML parser has been completely rewritten. It is based on the XHTML to FB2 parser I wrote for FB2 output. It produces much better looking PML markup and the displayed output looks very close to the original XHTML source. One major advantage of the new parser is that it accounts for XHTML style information and translates that into PML tags. For example if text is set to bold by CSS then the text will get the bold PML tag.

eReader also got another very important addition. Support for Makebook (202 bye header) file input. Makebook and Dropbook are the two applications provided by eReader (the company) for producing eReader files. Makebook is the older application that is no longer supported. Makebook and Dropbook produce very different record 0 headers. This header has information about where the text, images and other things contained in the file are located. It took a while but I’ve been able to understand enough of the Makebook header to add input support for these files.

Makebook produces a 202 byte header while Dropbook produces a 132 byte header. After comparing header values and section sizes I was able to determine that the 2 byte int at offset 0×08 contained the start of the non-text offset. Just like the 132 byte header files, everything before this offset is text.

Images in the 202 byte header files were easy to find because they are in the same format as the Dropbook produced files. However, I didn’t bother to determine if there was a header value. Since all images are in PNG format and the their section start with the text PNG, I simply loop though all non-text sections and see if they start with PNG. If they do I know it’s an image and extract it.

The hardest part of the 202 byte header files was the text itself. Even though I knew which sections contained the text I didn’t know how it was compressed. This is where Google came to the rescue. On the homepage for the Z-DOC PalmPilot application I found there was some work to reverse engineer this older format. This page gave me the information I was looking for. Text is PalmDoc compressed and then xored with 0xA5. It looks like this xor is an attempt to obfuscate the compression used to make it harder to decompress. It isn’t for copy protection because the Makebook application only produces non-DRM files. DRM eReader files from that time would be created in a different manner.

Syncing news now supports auto convert in the GUI. It’s just like auto convert with sending email and sending an eBook to a device. If the book is not in a format supported by the device it will be auto converted to a supported format based on user preference.

The final bit of work this week was support for the RocketBook (RB) format. Both input and output are working. Though they both do need testing. Output in particular as I don’t have a device that supports these files so I can only guess based on my input code that the RB files produced are 100% correct. If someone has a device that reads RB files please let me know if the output files are read correctly.

Tags: , , , , .

    Comments Off


* Calibre Week in Review

Posted on May 2nd, 2009 by John. Filed under calibre.


It seems that PDF is becoming the never ending format for me. Maybe I should start naming the posts PDF Work instead of Calibre Week in Review…

One minor and one major change to PDF processing this week. The minor change was a fix for bug 2342. German umlauts are now displayed correctly in the output. The major change is PDF output now supports comics. cbz, cbr, cbc are some of the input formats for comics that are support and now you can turn them into a PDF. The huge advantage is for people (like me) who have a Cybook. A comic can be turned into one PDF file sized for the device keeping down the amount of clutter in the library view.

I also worked on the device framework and have pluginized all of the device interfaces (I like the term interfaces better than drivers because it reduces confusion as Windows device drivers are very different). They also sport a new configuration system (though they didn’t have configuration before at all). The user will be able to specify their preferred format order for sending to the device. As well as disable certain formats from being sent to the device at all. I said will because while the configuration code is done there is currently no way to call it in the preferences dialog. However, this will be rectified before 0.6 is released.

eReader output has been put on hold for the foreseeable future. eReader input is complete and working but due to the undocumented nature of the eReader format I have not been able to produce a working output plugin. The main issue I’ve run into is the eReader header (record 0 within the pdb container) is a 132 byte package with 66 sections. There are to many unknown sections. Even with the inspector script I wrote to see what the values are in working eReader files I have not been able to understand how all of the sections interact with the file itself. My guesses have all resulted in files that are not readable by the eReader Pro software.

eReader files uses the PML markup language and while I couldn’t get eReader output working I have added support for PML input and PML output. The PML output can be taken and put into either MakeBook or DropBook to produce a working eReader file.

Two things to note about the the PML support is input can take either a straight .pml file or it can take a zip archive filled with .pml files and PNG images (the images must be in PNG format). The zip archive must have the extension changed to .pmlz for this to work. PML output will produce a zip archive with the extension .pmlz. Within this archive will be all of the image files in PNG format and the produced .pml files.

.pmlz is simply an easy way to group the files and ensure that there is not issues with including missing files or not being able to find referenced files.

Tags: , , , , , .

    Comments Off