* Calibre Week in Review
Posted on November 22nd, 2009 by John. Filed under calibre.
PML input had some major changes this week. Thank the user WayneD for helping me out and getting me to actually do the work I’ve been putting off since I introduced PML/eReader as an input format.
There is now a metadata reader for PML and PMLZ. WayneD provided me with a set of regular expressions that can extract the metadata from a metatdata comment within a PML document. I took those regexes and created a metatdata plugin that supports both straight PML files as well as the PMLZ archive file.
The other major change to PML is, I’ve re-written the input parser. It is not longer based on a set of regular expressions. It is now a line oriented simple state machine. When I created the regex parser I intended to replace it at some point in the future with a true parser. The regex based one was simply a quick and dirty way to get PML supported. The new parser is much faster, produces cleaner and more accurate HTML output. It also has the added benefit of reading \CX codes and turns them into table of contents entries for PML and PMLZ input. The new parser is much better and I’m not completely finished with it. I still need to add support for \v comments (they are currently removed), \n codes, and implement font attribute tracking to condense changes (this is how \n will be handled).
WayneD did provide me with his Perl based line oriented simple state machine for PML to HTML conversion. I did use one idea from it. Turning footnote and sidebar xml syntax into custom PML tags. I had intended to port his parser to python and use it as a base but when I started looking at it I remembered I don’t know Perl at all and I can’t make heads or tails of Perl code. I have no desire or need to actually learn Perl, so I ended up writing my own parser.
Leave a Reply
Tags
Archives
- July 2010 (4)
- June 2010 (1)
- May 2010 (2)
- March 2010 (1)
- January 2010 (8)
- December 2009 (5)
- November 2009 (6)
- October 2009 (4)
- September 2009 (2)
- August 2009 (6)
- July 2009 (6)
- June 2009 (4)
- May 2009 (6)
- April 2009 (4)
- March 2009 (2)
- February 2009 (4)
- January 2009 (4)
- December 2008 (7)
- November 2008 (2)