Calibre Week in Review

TXT input got some more work. It now supports the Textile markup language, which can be used in place of Markdown. Textile is also recognized by the new auto-detection in TXT input.
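To illustrate what telling Textile apart from Markdown can involve, here is a minimal Python sketch. It is not calibre's detection code. The names `score` and `guess_markup` and the specific patterns are hypothetical, chosen only because each is distinctive syntax in one of the two languages.

```python
import re

# Hypothetical patterns: tell-tale syntax for each markup language.
TEXTILE_PATTERNS = [
    r"^h[1-6]\.\s",      # Textile heading: "h1. Title"
    r"^p\.\s",           # explicit Textile paragraph: "p. Text"
    r"^bq\.\s",          # Textile block quote: "bq. Quote"
    r'"[^"\n]+":\S+',    # Textile link: "text":http://...
]
MARKDOWN_PATTERNS = [
    r"^#{1,6}\s",                   # ATX heading: "# Title"
    r"^>\s",                        # block quote
    r"\[[^\]\n]+\]\([^)\n]+\)",     # inline link: [text](url)
    r"^(?:=+|-+)\s*$",              # setext heading underline
]

def score(text, patterns):
    """Count how many times any of the patterns occurs in the text."""
    return sum(len(re.findall(p, text, flags=re.MULTILINE)) for p in patterns)

def guess_markup(text):
    """Return "textile", "markdown", or "plain" (ties go to Markdown)."""
    textile = score(text, TEXTILE_PATTERNS)
    markdown = score(text, MARKDOWN_PATTERNS)
    if textile == 0 and markdown == 0:
        return "plain"
    return "textile" if textile > markdown else "markdown"
```

Counting matches rather than stopping at the first hit keeps a single stray "#" or quoted word from deciding the result on its own.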

FB2 output had some more bug fixes. The cover image is now placed inside the coverpage element in the metadata header, as the FB2 spec requires. The calibre ebook-viewer does not currently display a cover image stored in the metadata header, but calibre's FB2 metadata reader does read it.
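For reference, here is a rough Python sketch of where the cover sits in an FB2 file: an `image` inside `coverpage`, inside `title-info`, inside the `description` header, pointing via an xlink `href` at a base64-encoded `binary` element. This is not calibre's FB2 writer. `build_fb2_skeleton` is a hypothetical helper, and the tree is deliberately trimmed (it omits required `title-info` children such as `genre`, `author`, and `lang`), so it shows placement only.

```python
import base64
import xml.etree.ElementTree as ET

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", FB2_NS)     # FB2 elements use the default namespace
ET.register_namespace("l", XLINK_NS)  # image links are xlink:href, prefixed "l"

def fb2(tag):
    """Qualify a tag name with the FictionBook namespace."""
    return f"{{{FB2_NS}}}{tag}"

def build_fb2_skeleton(title, cover_bytes, cover_id="cover.jpg"):
    """Hypothetical helper: a trimmed FB2 tree showing where the cover goes."""
    root = ET.Element(fb2("FictionBook"))
    description = ET.SubElement(root, fb2("description"))
    title_info = ET.SubElement(description, fb2("title-info"))
    ET.SubElement(title_info, fb2("book-title")).text = title
    # The cover belongs in the metadata header: description/title-info/coverpage.
    coverpage = ET.SubElement(title_info, fb2("coverpage"))
    image = ET.SubElement(coverpage, fb2("image"))
    image.set(f"{{{XLINK_NS}}}href", "#" + cover_id)
    # Body content is reduced to a single placeholder paragraph.
    section = ET.SubElement(ET.SubElement(root, fb2("body")), fb2("section"))
    ET.SubElement(section, fb2("p")).text = "Chapter text goes here."
    # The image data itself is stored base64-encoded in a <binary> element.
    binary = ET.SubElement(root, fb2("binary"), {"id": cover_id, "content-type": "image/jpeg"})
    binary.text = base64.b64encode(cover_bytes).decode("ascii")
    return root
```

Serializing the result with `ET.tostring(root, encoding="unicode")` shows the coverpage nested in the header rather than in the body.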

PML input had a bug fixed in its handling of the \t and \T tags. They are now handled properly and indent the entire line. This had been partially fixed previously, but that fix only worked when the tags started and ended the line.

At a user's request I've reworked the author fields throughout the GUI. Authors are now auto-completed using the & symbol as a separator, just as tags are auto-completed using a comma. This makes adding multiple authors much easier. The change was fairly large and took a lot of work: I refactored the tag auto-completion classes into a generic set of auto-completion classes, then reworked each author field to use the new classes.
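The core idea of a generic completer is that only the fragment after the last separator gets completed, so the same logic serves tags (comma) and authors (&). A comma cannot double as the author separator because author names are often written as "Last, First". The Python below is a hypothetical, GUI-free sketch of that idea; `complete_last_item` is an illustrative name, not one of calibre's classes.

```python
def complete_last_item(text, separator, candidates):
    """Hypothetical sketch: complete only the item after the last separator.

    Items before the last separator are treated as finished; the trailing
    fragment is matched case-insensitively as a prefix of each candidate.
    """
    head, sep, fragment = text.rpartition(separator)
    fragment = fragment.lstrip()
    if not fragment:
        return []
    # Rebuild the finished part, followed by the separator and a space.
    prefix = head + sep + (" " if sep else "")
    matches = sorted(c for c in candidates if c.lower().startswith(fragment.lower()))
    return [prefix + m for m in matches]
```

Only the separator argument changes between the tag and author fields, which is what makes a shared set of completion classes practical.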

All of the above changes have made it into trunk and are either in the current release (0.7.40) or will be in the next release (0.7.41). The following changes are still being finished and will need Kovid’s review before being merged into a release.

Lee Dolsen and I worked on TXT input together last week, and our partnership continued this week. A while back he created a variety of heuristic processing functions, which were used when the --preprocess-html option was enabled. We've now broken --preprocess-html into individual options:

  • --enable-heuristics
  • --markup-chapter-headings
  • --italicize-common-cases
  • --fix-indents
  • --html-unwrap-factor=HTML_UNWRAP_FACTOR
  • --unwrap-lines
  • --delete-blank-paragraphs
  • --format-scene-breaks
  • --dehyphenate
  • --renumber-headings
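As an example of what one of these heuristics does, here is a simplified, hypothetical sketch of line unwrapping driven by an unwrap factor. It works on plain text rather than HTML, and it treats the factor as a position in the sorted distribution of line lengths (0.5 is roughly the median). Lines at least that long that do not end in sentence punctuation get joined to the following line. Lee's actual code operates on HTML and is more careful; `unwrap_lines` and `SENTENCE_END` are illustrative names.

```python
# Lines ending with one of these are treated as paragraph ends (an assumption).
SENTENCE_END = (".", "!", "?", '"', ":")

def unwrap_lines(text, unwrap_factor=0.4):
    """Hypothetical sketch: join hard-wrapped lines in plain text."""
    lines = text.split("\n")
    lengths = sorted(len(line) for line in lines if line.strip())
    if not lengths:
        return text
    # Line length at the given fraction of the sorted distribution (0.5 ~ median).
    index = min(int(len(lengths) * unwrap_factor), len(lengths) - 1)
    threshold = lengths[index]

    out = []
    prev = ""  # previous physical line, not the (possibly already joined) output line
    for line in lines:
        joinable = (
            bool(out)
            and bool(prev.strip())
            and bool(line.strip())
            and len(prev) >= threshold
            and not prev.rstrip().endswith(SENTENCE_END)
        )
        if joinable:
            out[-1] = out[-1].rstrip() + " " + line.lstrip()
        else:
            out.append(line)
        prev = line
    return "\n".join(out)
```

A lower factor makes the threshold shorter, so more lines qualify for joining; a higher factor makes unwrapping more conservative.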

The majority of the heuristic code is his. I helped make the infrastructure changes needed to expose the options on the command line and in the GUI. I also turned --italicize-common-cases into a heuristic function, so it is no longer limited to TXT input. In addition, I changed the conversion pipeline so the heuristics run over all input types; in the current release, the --preprocess-html option does not run over EPUB input. Lee did all the work of converting the heuristic code into individual options, as well as adding some extras and cleaning up existing parts.
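Splitting one option into many suggests a structure like the one below: a master switch plus an ordered list of passes, each gated by its own option and applied to the HTML regardless of which input format produced it. This is a hypothetical Python sketch, not calibre's pipeline. The option keys are Python spellings of the flags listed above, and both passes are deliberately naive (the dehyphenation regex would also join genuine hyphenated compounds that happen to break at a line end).

```python
import re

def delete_blank_paragraphs(html):
    # Drop <p> elements containing only whitespace or &nbsp; entities.
    return re.sub(r"<p(?:\s[^>]*)?>(?:\s|&nbsp;)*</p>\s*", "", html)

def dehyphenate(html):
    # Naive: rejoin a word split by a hyphen at a line break ("conver-\nsion").
    return re.sub(r"(\w)-[ \t]*\n\s*(\w)", r"\1\2", html)

# Hypothetical registry: (option name, pass) in the order the passes run.
HEURISTICS = [
    ("delete_blank_paragraphs", delete_blank_paragraphs),
    ("dehyphenate", dehyphenate),
]

def run_heuristics(html, opts):
    """Run each enabled pass; independent of which input format produced html."""
    if not opts.get("enable_heuristics", False):
        return html
    for name, heuristic in HEURISTICS:
        if opts.get(name, False):
            html = heuristic(html)
    return html
```

Keeping the passes in one ordered registry means adding a new heuristic is a matter of writing a function and registering it under its option name.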

While Lee was making most of the heuristics changes, I took the time to rework the --remove-header and --remove-footer options. Those two options, along with their related regular expression options, have been removed. In their place I've created three sets of generic search and replace options. They are much more flexible and less misleading about what they do. My hope is to eventually have a heuristic function for removing headers and footers that does not require regular expressions.
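With generic search and replace, removing a header or footer becomes a regular expression with an empty replacement. The sketch below assumes the three sets behave like (search, replace) pairs applied in order with Python's `re.sub`. `apply_search_replace` and the example patterns are hypothetical, not calibre's option names or code.

```python
import re

def apply_search_replace(html, pairs):
    """Apply (search, replace) regex pairs in order; empty searches are skipped."""
    for search, replace in pairs:
        if not search:
            continue  # an unused option set
        html = re.sub(search, replace or "", html)
    return html

# Hypothetical example: strip a running header and page-number footers.
pairs = [
    (r"<p>My Book Title</p>\s*", ""),
    (r"<p>Page \d+</p>\s*", ""),
    ("", ""),  # third set left empty
]
```

Because the pairs run in order, each pattern sees the output of the ones before it, which matters when one replacement creates text a later pattern is meant to match.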