* ASCIIize Text
Posted on July 24th, 2009 by John. Filed under programming.
One pet peeve I have with my Cybook Gen 3 is its inability to properly display Unicode characters in plain text files. I don’t need anything fancy like Japanese characters, just simple things like “ and ” (as opposed to the straight " and "). To solve this problem I’ve been thinking about adding an --asciiize option to calibre. I say thinking because I didn’t really know where to start. Thankfully, a user recently requested this very functionality in bug #2846. He even included a link to existing work that accomplishes this very task.
I will be integrating transliteration of Unicode to ASCII into calibre soon. In the meantime, here is a script and a set of classes that accomplish this task outside of calibre (see unidecoder for a better method). This is my Python port of the Ruby unidecode gem, which is itself a port of the original Perl Text::Unidecode.
The major differences between my implementation and the others are that it is written in Python and that it uses a single dictionary instead of loading the code group files as needed.
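To give a rough idea of the single-dictionary approach, here is a minimal sketch in Python 3. It is not the actual script: the names `ASCII_MAP` and `asciiize` and the `placeholder` parameter are made up for illustration, and the table holds only a handful of entries, whereas a real port would carry replacements for thousands of code points.

```python
# Hypothetical, tiny lookup table for illustration only. A real
# transliteration table would cover far more code points.
ASCII_MAP = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2014: "--",   # em dash
    0x2026: "...",  # horizontal ellipsis
    0x00E9: "e",    # e with acute accent
}


def asciiize(text, placeholder="?"):
    """Return text with non-ASCII characters replaced via ASCII_MAP.

    Characters missing from the table become `placeholder` (an
    assumption made for this sketch).
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Already plain ASCII, keep as-is.
            out.append(ch)
        else:
            # One dictionary lookup per character; nothing is loaded lazily.
            out.append(ASCII_MAP.get(code, placeholder))
    return "".join(out)


if __name__ == "__main__":
    print(asciiize("\u201cHello\u201d \u2026 caf\u00e9"))
```

Keeping everything in one dictionary trades a larger up-front memory footprint for simpler code, since there is no need to work out which code group file a character belongs to before looking it up.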
You can find out more on how this all works at http://interglacial.com/~sburke/tpj/as_html/tpj22.html