Archive for the ‘calibre’ Category

* Calibre Week in Review

Posted on November 22nd, 2009 by John. Filed under calibre.


PML input had some major changes this week. Thank the user WayneD for helping me out and getting me to actually do the work I’ve been putting off since I introduced PML/eReader as an input format.

There is now a metadata reader for PML and PMLZ. WayneD provided me with a set of regular expressions that can extract the metadata from a metadata comment within a PML document. I took those regexes and created a metadata plugin that supports both straight PML files and PMLZ archive files.

The other major change to PML is that I’ve rewritten the input parser. It is no longer based on a set of regular expressions; it is now a simple line-oriented state machine. When I created the regex parser I intended to replace it with a true parser at some point. The regex-based one was simply a quick and dirty way to get PML supported. The new parser is much faster and produces cleaner, more accurate HTML output. It also has the added benefit of reading \CX codes and turning them into table of contents entries for PML and PMLZ input. The new parser is much better, but I’m not completely finished with it. I still need to add support for \v comments (they are currently removed) and \n codes, and to implement font attribute tracking to condense changes (this is how \n will be handled).
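As a rough illustration of the line-oriented state machine idea, here is a minimal Python sketch. It is not calibre's parser: the function name pml_to_html, the handled subset of codes (\i, \B, and \CX="..." with X taken to be a single digit), and the HTML it emits are all assumptions made for brevity.

```python
import html
import re

# Assumed subset of PML toggle codes and the HTML tag each one maps to.
TOGGLES = {"i": "i", "B": "b"}
# Assumed chapter-index syntax: \C0="Title", where the digit is the TOC level.
CHAPTER = re.compile(r'C(\d)="([^"]*)"')


def _retag(buf, old, new):
    """Close every tag in old (innermost first), then open every tag in new."""
    buf.extend("</%s>" % t for t in reversed(old))
    buf.extend("<%s>" % t for t in new)


def pml_to_html(text):
    """Convert a tiny PML subset to HTML; returns (html, toc)."""
    out, toc = [], []
    styles = []  # parser state carried from one line to the next
    for line in text.splitlines():
        buf = []
        _retag(buf, [], styles)  # reopen styles still active from earlier lines
        i = 0
        while i < len(line):
            if line[i] != "\\":
                buf.append(html.escape(line[i]))
                i += 1
                continue
            rest = line[i + 1:]
            chapter = CHAPTER.match(rest)
            if chapter:
                # Record a table of contents entry; emit nothing inline.
                toc.append((int(chapter.group(1)), chapter.group(2)))
                i += 1 + chapter.end()
            elif rest[:1] in TOGGLES:
                tag = TOGGLES[rest[0]]
                if tag in styles:
                    new = [t for t in styles if t != tag]
                else:
                    new = styles + [tag]
                _retag(buf, styles, new)  # keeps the output well nested
                styles = new
                i += 2
            else:
                i += 2  # unknown code: dropped in this sketch
        _retag(buf, styles, [])  # close everything at the end of the paragraph
        out.append("<p>%s</p>" % "".join(buf))
    return "\n".join(out), toc
```

Keeping the active styles in one list and re-emitting tags on every change is one simple way to get well-nested output; a real parser would handle many more codes.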

WayneD did provide me with his Perl-based, line-oriented simple state machine for PML to HTML conversion. I did use one idea from it: turning footnote and sidebar XML syntax into custom PML tags. I had intended to port his parser to Python and use it as a base, but when I started looking at it I remembered I don’t know Perl at all and can’t make heads or tails of Perl code. I have no desire or need to actually learn Perl, so I ended up writing my own parser.




* Calibre Week in Review

Posted on October 26th, 2009 by John. Filed under calibre.


Mostly bug fixes this week. The majority of them were centered around eReader PDB output and PML generation. eReader PDB output now marks the first image as the cover image if a cover image is not explicitly set. PMLZ got images named properly in the output. PML generation now has .png added to the end of image names. I also fixed a bug where excessive newlines were not being properly removed. PML, TXT, RB, and FB2 output all had excessive space removal toned down, so instances where spaces were completely removed will stop happening. Regex header and footer matching was tweaked to match at a later stage in the conversion pipeline. This should ease issues of expressions not matching properly. Finally, at Kovid’s request I’ve added some info about header / footer regexes and converting TXT and PDF files to the documentation.




* Calibre Week in Review

Posted on October 19th, 2009 by John. Filed under calibre.


Like every week there were miscellaneous bug fixes. However, this week I did a bit more: TCR input and output. Be warned that the output supports multiple compression levels, with the higher levels being slower than the lower ones. For instance, a 200K TXT file as input will take around 25 seconds at the lowest level and 3.5 minutes at the highest.

TCR is a compressed text format used mainly by the Psion 3 and 5 series PDAs that were produced in the 90s. The compression used by TCR files is very interesting. It doesn’t have as high a compression ratio as, say, zlib, but that is a trade-off for being decompressible starting at any point in the stream. The history and more information about the format can be found at Andrew Giddings’ TCR page.
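To make the "decompressible from any point" property concrete, here is a small Python sketch of TCR decompression. The layout used here (a `!!8-Bit!!` signature, then 256 dictionary entries each prefixed by a one-byte length, then a body in which every byte is an index into that dictionary) is stated as an assumption and should be checked against the format description; the function names are made up for this example.

```python
TCR_MAGIC = b"!!8-Bit!!"  # assumed file signature


def read_tcr(data):
    """Split raw TCR bytes into (dictionary, body)."""
    if not data.startswith(TCR_MAGIC):
        raise ValueError("missing TCR signature")
    pos = len(TCR_MAGIC)
    table = []
    for _ in range(256):
        length = data[pos]  # one length byte precedes each entry
        table.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    return table, data[pos:]


def expand(table, body):
    """Expand body bytes; each byte stands for one dictionary entry."""
    return b"".join(table[code] for code in body)
```

Because every body byte expands independently of its neighbours, `expand(table, body[n:])` yields valid text for any offset n, which is the random-access property described above. The price is that the dictionary can hold only 256 entries, so the ratio cannot match a general-purpose compressor.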




* Calibre Week in Review

Posted on October 11th, 2009 by John. Filed under calibre.


I haven’t had one of these for quite some time. I’ve been working on other projects, and on the calibre front I’ve only been dealing with small bug fixes. However, this past week I’ve done a bit of work that is worth mentioning.

I’ve cleaned up the FB2 output. This fixes some invalid markup and some issues with text not being displayed by FBReader. It also fixes some issues with invalid characters making their way into hrefs.

eReader PDB output also got some love. Some kind people have been working on the reverse engineering of the file format and have filled in a number of the blanks I left. All of the additional information that has been discovered has been added to the files produced. The two main things that have been added are chapter and link indexes. The chapter indexes give the nice names at the top of the eReader viewer application. The link index allows links to work in the eReader viewer application.

To coincide with the eReader PDB output changes, PML input and output had some cleanup. It looks better now and replaces unicode characters with the \UXXXX equivalent.
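A replacement of this kind can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the choices to leave plain ASCII untouched, to emit lowercase hex, and to substitute "?" for characters that do not fit in four hex digits are assumptions, not a description of what calibre does.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\UXXXX code (sketch)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append("\\U%04x" % code)  # four hex digits, zero padded
        else:
            out.append("?")  # assumption: no four-digit code exists for it
    return "".join(out)
```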




* Calibre Two Weeks in Review

Posted on August 2nd, 2009 by John. Filed under calibre.


This time I missed last week’s week in review because I simply forgot. I’m hoping to keep this to a minimum in the future.

The big news is calibre 0.6 has been released. Kovid is now back to his regular (weekly at the least) bug fix releases too. As of now the latest version is 0.6.4 and I get the feeling that 0.6.5 is right around the corner. For a listing of what’s gone into the 0.6 release take a look here.

Some features I’ve been working on that are included in the current release are: asciiize text, and iRex Iliad support.

The asciiize feature is one I’ve been wanting for a while and I’ve finally implemented it. It’s based on the ASCIIize text post I made last week. There is now an --asciiize option for the ebook-convert command and an option for the conversion dialog in the GUI. The basic premise of this feature is that Unicode characters are often not displayed correctly by ebook readers. My Cybook Gen 3 exhibits this behavior. ASCIIize transliterates the Unicode text to an ASCII representation. For example, it takes “Михаил Горбачёв” and converts it to “Mikhail Gorbachiov”.
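The general shape of such a transliteration can be sketched in Python with only the standard library. This is not the code behind the asciiize option: the lookup table below is a hypothetical, deliberately tiny one that covers just the example name, and the accent-stripping fallback via unicodedata is an assumption about one reasonable approach.

```python
import unicodedata

# Hypothetical mini table covering only the letters in the example above;
# a usable transliterator needs a table for every script it supports.
TABLE = {
    "М": "M", "и": "i", "х": "kh", "а": "a", "л": "l",
    "Г": "G", "о": "o", "р": "r", "б": "b", "ч": "ch",
    "ё": "io", "в": "v",
}


def asciiize(text):
    """Return an ASCII-only approximation of text (sketch)."""
    text = unicodedata.normalize("NFC", text)  # compose letters like ё first
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in TABLE:
            out.append(TABLE[ch])
        else:
            # Fallback: decompose and keep the ASCII base letter, if any.
            parts = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in parts if ord(c) < 128)
            out.append(kept or "?")
    return "".join(out)
```

The table lookup has to come before the decomposition step; otherwise ё would be split into е plus a combining mark and lose its own mapping.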

The iRex Iliad is now as supported as I can get it. calibre detects it as a device, displays the list of books on the device, and you can send books to it using send to device. The part I’m running into issues with is the manifest.xml file. It looks like it’s similar to the Sony PRS’s media.xml file, meaning it is a quick store for metadata. However, I don’t really know what goes into this file, or should I say files (there is more than one). It also doesn’t look like the device updates it in any way, because I had a user send me the files off of their Iliad and they were empty even though the user had put 20 or so books on the device with the Mobipocket desktop software. On the bright side, it works well enough that I don’t think anyone will notice.

On the GUI tweaks front (these won’t be in a release until some point in the future) I’ve added a history drop down to the search field. There is a swap button for authors and title in the metadata bulk dialog, and you can hide the toolbars in the ebook viewer.

The remainder of what I accomplished over the past two weeks was bug fixes and code refactoring. Just the boring stuff that I would rather put off than actually do.




* Calibre Week in Review

Posted on July 19th, 2009 by John. Filed under calibre.


There was no week in review last week because I went on vacation this past week. So this week in review combines everything since the last week in review.

I’ve made a few bug fixes to some output formats, PDB metadata and FB2 output mainly. The major things I’ve been working on are a bit of restructuring for the GUI and fixing some small bugs.

The GUI has had the buttons in the status bar (jobs, tags, cover flow) moved to a side bar on the right hand side. The version information and device connected information have moved to the status bar. The donate button was moved to the side bar. The status bar is now collapsible. When collapsed it shows less information (the list of formats for the selected book). When expanded it shows the book info the same as it does currently. The location view that lists the library and the connected device is hidden when no device is connected. I’ve added an About button to the new side bar that will show some information about calibre. Overall these changes have two major benefits: they make the interface a lot more netbook friendly, and they make the book table larger so more information can be seen.

These changes to the GUI will be part of the 0.6 series but I can’t say for certain if they will be included in the initial 0.6.0 release.




* Calibre Week in Review

Posted on July 5th, 2009 by John. Filed under calibre.


This week has been a productive one. I’ve made a lot of small GUI enhancements and did some work on PDF input as well. None of these changes have made it into trunk yet. This is mainly because Kovid has been away this week.

I’ve added auto complete to a number of the input controls in the GUI. Authors, Publisher, and Tags all auto complete pretty much everywhere now. The Tags will even auto complete in the table view in the main window. However, Authors, Series and Publisher do not auto complete in the main window as of yet.

I’ve also been working with the GUI’s search. The ISBN, Rating, and Cover fields are all included in the default search. They are also search field identifiers, meaning you can do isbn:123 to search just ISBN numbers. Searching for empty and filled fields has been implemented as well. Use field:false and field:true respectively.
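The search semantics described here can be modelled with a short Python sketch. It is only a toy: representing a book as a dictionary, the function name matches, and the case-insensitive substring matching are all assumptions and say nothing about how calibre implements its search.

```python
def matches(book, query):
    """Return True if book (a dict of field -> value) satisfies query."""
    field, sep, value = query.partition(":")
    if not sep:
        # No field prefix: look in every field that has a value.
        needle = query.lower()
        return any(needle in str(v).lower() for v in book.values() if v)
    current = book.get(field)
    if value == "true":
        return bool(current)   # field:true  -> the field is filled
    if value == "false":
        return not current     # field:false -> the field is empty
    return value.lower() in str(current or "").lower()
```

With a book such as `{"title": "Dune", "isbn": "9780441013593", "rating": None}`, the queries `isbn:978`, `rating:false`, and `dune` all match, while `rating:true` does not.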

PDF input, either this week or last, got the ability to specify an unwrap factor for unwrapping lines. Previously this was a fixed value; now it can be changed by the user. I have some ideas to enhance this further, but I’m not going to go into detail because they may not materialize. Use the option --unwrap-factor with a decimal value between 0 and 1. It is used by the regular expression that determines the minimum line length required for unwrapping.
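One way to picture how a factor can feed a minimum line length into a regular expression is the following Python sketch. It works on plain text rather than the HTML calibre processes, and the choice of the median line length as the reference, the character classes, and the function name are all assumptions made for illustration.

```python
import re
import statistics


def unwrap(text, unwrap_factor=0.5):
    """Join soft-wrapped lines in plain text (simplified sketch)."""
    lengths = [len(line) for line in text.splitlines() if line.strip()]
    if not lengths:
        return text
    # Lines shorter than this are assumed to end on purpose (titles, etc.).
    min_len = int(statistics.median(lengths) * unwrap_factor)
    # A newline counts as a soft wrap only when at least min_len characters
    # precede the final character on its line, that final character does not
    # end a sentence, and more text follows immediately.
    pattern = re.compile(r"(?<=[^\n]{%d}[a-z,;:])\n(?=\S)" % min_len)
    return pattern.sub(" ", text)
```

A factor near 0 joins almost every line, while a factor near 1 only joins lines close to the typical full width.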

PDF input had another highly requested change: the ability to remove headers and footers. However, it’s not as user friendly as I would like. There are four new options in total: --remove-header, --remove-footer, --header-regex, and --footer-regex. If the --remove-* options are used, then a regular expression, which can be customized with --*-regex, is used to match headers and footers. The header and footer matching happens before all other processing rules. Use ebook-convert’s --debug-input option to see the HTML that the regex will be matched against.

$ ebook-convert input.pdf .epub --debug-input output_dir/
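Once the intermediate HTML is in hand, a candidate expression can be tried out before passing it to the converter. The Python helper below is a hypothetical testing aid, not part of calibre, and it assumes the expression is applied as a plain substitution over the whole HTML string.

```python
import re


def strip_matches(html, header_regex=None, footer_regex=None):
    """Remove everything matched by the given expressions (sketch)."""
    for pattern in (header_regex, footer_regex):
        if pattern:
            html = re.sub(pattern, "", html)
    return html


# Made-up sample resembling a running header followed by real content.
sample = "<p>My Book Title - Page 12</p><p>Real text.</p>"
cleaned = strip_matches(sample, header_regex=r"<p>My Book Title - Page \d+</p>")
```

If `cleaned` still contains the header, the expression does not match the HTML as the converter sees it and needs adjusting.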




* Calibre Week in Review

Posted on June 27th, 2009 by John. Filed under calibre.


More bug fixes going into the release of 0.6. I don’t have much of a time frame for when it will be released. However, the betas are pretty stable. If you decide to upgrade to the beta make a copy of your metadata.db file. Upgrading to 0.6 will upgrade this file and prevent you from going back to 0.5.




* Calibre Week in Review

Posted on June 20th, 2009 by John. Filed under calibre.


The bulk of what I’ve done this week was the many bug fixes going into the Calibre beta.

The one other thing I worked on was bringing image extraction back to PDF input. It works as well as the 0.5.x series now, meaning it will handle simple cases but there are still some bugs.




* Calibre Week in Review

Posted on June 14th, 2009 by John. Filed under calibre.


The beta for 0.6 has been out for about a week now. I’ve been spending my time fixing bugs. Thanks to everyone on Mobile Read who is helping with testing.
