Posts Tagged ‘python’
* Changing Single Quotation Marks to Double in eBooks
Posted on May 12th, 2009 by John. Filed under programming.
As a person living in the USA, I strongly prefer double quotation marks to single ones for denoting speech. Some authors use single quotes for effect, but mostly it’s just a style choice. I find UK authors generally use the two interchangeably. Tolkien’s books are a good example. I have The Hobbit, The Lord of the Rings, and The Children of Hurin. The Hobbit uses double quotes while the other two use single quotes.
Following is some simple Python code that takes the book (named th.txt) and changes its single quotes into double quotes. Both books in question use ‘ and ’ for the opening and closing quotes, and ’ is also used for contractions. The regexes take the opening and closing characters into account and also change the ’ in contractions to the non-Unicode ' character.
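>>> import re
>>> th = open('th.txt', 'rb+')
>>> th_t = th.read()
>>> # Contractions first. This pattern was cut off in the original listing; the lookbehind/lookahead used here is an assumed reconstruction.
>>> th_t = re.sub(r'(?u)(?<=\w)’(?=\w)', "'", th_t)
>>> th_t = re.sub('(?u)‘', '"', th_t)
>>> th_t = re.sub('(?u)’', '"', th_t)
>>> th.seek(0)
>>> th.truncate(0)
>>> th.write(th_t)
>>> th.close()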
I do realize that the listed regexes could be combined a bit, especially the ones for the opening and closing quotes. However, that would reduce their readability.
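For readers on Python 3, the same idea can be sketched as a small function that works on text instead of a file opened in place. This is only a sketch under one assumption: an apostrophe is taken to be a ’ with a word character on both sides. The function name and the sample sentence are made up for illustration.

```python
import re

# Typographic quote characters used by the books described above.
OPEN_SINGLE = "\u2018"   # left single quotation mark
CLOSE_SINGLE = "\u2019"  # right single quotation mark, also used as apostrophe


def single_to_double(text):
    """Swap curly single quotes for straight double quotes, keeping apostrophes."""
    # Assumption: an apostrophe is a closing quote with a word character on both sides.
    text = re.sub(r"(?<=\w)" + CLOSE_SINGLE + r"(?=\w)", "'", text)
    # Every remaining curly single quote is treated as a speech mark.
    return re.sub("[" + OPEN_SINGLE + CLOSE_SINGLE + "]", '"', text)


sample = "\u2018I don\u2019t know,\u2019 said Bilbo."
print(single_to_double(sample))  # "I don't know," said Bilbo.
```

One known limit of this rule: a trailing apostrophe, as in a plural possessive, has no word character after it and would be turned into a double quote.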
* Kindle Detection Bug in Calibre
Posted on January 31st, 2009 by John. Filed under programming.
The Kindle has been very tricky to get working in Calibre. However, it is finally working correctly (at least it should be).
The latest issue with the Kindle driver was in the usbms core: the main memory location was never being set. On Windows the driver loops over all attached drives and matches each drive to the internal or card memory. At the end of each iteration there is a check to see whether the main and card memory have both been matched, the idea being that once both have been found there is no reason to keep looping over the drives. The bug was in this check. It quit the loop as soon as the card memory was found, whether or not the main memory had been matched (the loop would still end once all the drives had been checked). With the Kindle, the card memory is reported before the main memory, so the loop always quit before it found the main memory. This isn’t the case with the Cybook, which is why there wasn’t an issue there even though it’s the same code path.
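The faulty exit condition can be reproduced in a few lines. This is a minimal Python 3 sketch, not Calibre's code: the function name, the drive letters, and the "main"/"card" labels are all hypothetical and exist only to show why the order in which drives are reported matters.

```python
def match_drives(drives, require_both):
    """Return (main, card) drive letters matched from an ordered drive list.

    `drives` is a list of (letter, kind) pairs, where kind is "main",
    "card", or None for an unrelated drive. All names are hypothetical.
    """
    main = card = None
    for letter, kind in drives:
        if kind == "main":
            main = letter
        elif kind == "card":
            card = letter
        if require_both:
            done = main is not None and card is not None  # fixed check
        else:
            done = card is not None  # buggy check: ignores main memory
        if done:
            break
    return main, card


kindle = [("E:", "card"), ("F:", "main")]  # card reported first
cybook = [("E:", "main"), ("F:", "card")]  # main reported first

print(match_drives(kindle, require_both=False))  # (None, 'E:')
print(match_drives(kindle, require_both=True))   # ('F:', 'E:')
print(match_drives(cybook, require_both=False))  # ('E:', 'F:')
```

With the buggy check the Cybook ordering still works by accident, which matches the behaviour described above.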
* Cybook t2b Format Specification
Posted on January 19th, 2009 by John. Filed under programming.
The Cybook Gen 3 uses its own image file format for thumbnails. The t2b file is generated by the device from the image found in the ebook. If there is no image, a default one with the file name written across it is created.
One of the feature requests for Cybook support in Calibre is for the t2b thumbnail files to be generated on the computer and moved to the device. This is much faster than having the Cybook generate the thumbnail itself.
The t2b file used by the Cybook is a 2-bit image, meaning 2 bits represent 1 pixel and four pixels fit in each byte. Two bits have four possible combinations, giving the image a total of four colours. There is no header or footer; the 2-bit values 0, 1, 2, and 3 are written directly to the file. The image’s dimensions are 96×144.
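The packing can be illustrated with two small helpers. This is a Python 3 sketch rather than part of the original scripts; the function names are made up, and the bit order (first pixel in the most significant bits) is taken from how the conversion scripts in this post build each byte.

```python
def pack_pixels(levels):
    """Pack 2-bit values (0-3) into bytes, four pixels per byte, first pixel in the high bits."""
    if len(levels) % 4:
        raise ValueError("pixel count must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(levels), 4):
        a, b, c, d = levels[i:i + 4]
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)


def unpack_pixels(data):
    """Inverse of pack_pixels: expand each byte back into four 2-bit values."""
    levels = []
    for byte in data:
        levels.extend([(byte >> 6) & 3, (byte >> 4) & 3, (byte >> 2) & 3, byte & 3])
    return levels


packed = pack_pixels([0, 1, 2, 3])
print(packed)                 # b'\x1b'
print(unpack_pixels(packed))  # [0, 1, 2, 3]
```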
Every t2b file will have 13,824 pixels. The file size will always be 3,456 bytes. The formula to determine this is: (height x width x 2 bits per pixel) / 8 bits per byte. (96 * 144 * 2) / 8 = 3,456.
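The arithmetic above is easy to check in code. The helper below is a Python 3 sketch with a made-up name; its defaults are the t2b values from this post, and it could double as a quick sanity check on the size of a generated file.

```python
def t2b_size_bytes(width=96, height=144, bits_per_pixel=2):
    """Size in bytes of a headerless packed image; defaults are the t2b values."""
    total_bits = width * height * bits_per_pixel
    if total_bits % 8:
        raise ValueError("image does not fill a whole number of bytes")
    return total_bits // 8


print(96 * 144)          # 13824
print(t2b_size_bytes())  # 3456
```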
Following are two Python scripts: one converts an image to a t2b file and the other converts a t2b file into a PGM image. Both were written for Python 2; the first needs the Python Imaging Library (PIL).
image2t2b.py
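#!/usr/bin/env python

import sys, Image

def reduce_color(c):
    # Map an 8-bit grey value (0-255) onto one of the four t2b levels.
    # The comparisons were garbled in the original listing; the operators,
    # the 192 threshold and the return values are an assumed reconstruction
    # built around the surviving 64 and 128.
    if c <= 64:
        return 0
    elif c > 64 and c <= 128:
        return 1
    elif c > 128 and c <= 192:
        return 2
    else:
        return 3

def i2b(n):
    # Two-character binary string for a value in 0-3, e.g. 2 -> "10".
    return "".join([str((n >> y) & 1) for y in range(1, -1, -1)])

def main():
    if len(sys.argv) != 3:
        raise Exception('Must have 2 arguments. %s input.image output.t2b' % sys.argv[0])

    outf = open(sys.argv[2], 'wb')

    # Greyscale, scaled to fit, then centred on a white 96x144 canvas.
    im = Image.open(sys.argv[1]).convert("L")
    im.thumbnail((96, 144))
    newim = Image.new('L', (96, 144), 'white')
    x, y = im.size
    newim.paste(im, ((96-x)/2, (144-y)/2))

    # Pack four 2-bit pixels into each output byte.
    px = []
    pxs = newim.getdata()
    for i in range(len(pxs)):
        px.append(pxs[i])
        if len(px) >= 4:
            binstr = i2b(reduce_color(px[0])) + i2b(reduce_color(px[1])) + i2b(reduce_color(px[2])) + i2b(reduce_color(px[3]))
            outf.write(chr(int(binstr, 2)))
            px = []

    outf.close()

if __name__ == '__main__':
    main()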
t2b2pgm
#!/usr/bin/env python

import sys

def get_greys(b):
    # Expand one byte into its four 2-bit pixel values, as decimal strings.
    b = "".join([str((ord(b) >> y) & 1) for y in range(7, -1, -1)])
    w = str(int(b[0:2], 2))
    x = str(int(b[2:4], 2))
    y = str(int(b[4:6], 2))
    z = str(int(b[6:8], 2))
    return [w, x, y, z]

def main():
    if len(sys.argv) != 3:
        raise Exception('Must have 2 arguments. %s input.t2b output.pgm' % sys.argv[0])

    t2bfile = open(sys.argv[1], 'rb')
    pgmfile = open(sys.argv[2], 'w')

    # Plain-text PGM header: 96 wide, 144 high, maximum grey value 3.
    pgmfile.write('P2\n96 144\n3\n')

    # 144 rows of 24 bytes each; every byte holds four pixels.
    for i in range(144):
        for j in range(24):
            b = t2bfile.read(1)
            if b != '':
                vals = get_greys(b)
                pgmfile.write('%s %s %s %s ' % (vals[0], vals[1], vals[2], vals[3]))
        pgmfile.write('\n')

    pgmfile.close()
    t2bfile.close()

if __name__ == '__main__':
    main()
* Calibre and New Driver Code
Posted on January 7th, 2009 by John. Filed under programming.
This Christmas Santa was very nice to me. He gave me a Cybook Gen 3 ebook reader. I’ve really been needing a new one since Tati has been monopolizing (this isn’t a bad thing) the Sony PRS-505 we already had. Sadly, the ebook management application I often use (Calibre) didn’t support the Cybook.
Calibre is an open source application written in Python, and both of these aspects are major benefits for the project. Because it is open source, I was able to look at the code and create a driver for the Cybook. It has since been merged and released.
While looking over the driver code I noticed a lot of redundancy between drivers. After talking to the author a bit, I started work on refactoring the driver code into reusable pieces. My initial Cybook driver has itself been refactored to use the new USBMS device class. This work has also been accepted into the project. The driver still isn’t complete, and with the next release it will hopefully work better for Windows users. With any luck it will be fully working on all platforms very soon (currently Linux is the best supported).
I’ve learned a few things from this ongoing experience. I’ve learned about bzr and Launchpad, and I’ve gained a better understanding of Python. The experience has also reaffirmed the value of open source projects in my eyes. If Calibre were not open source, I would not have bothered and would have written it off, as I did the Sony Connect software and Adobe’s Digital Editions.
The current state of the Cybook driver: it works on Linux, and Windows should work with the latest trunk. OS X support is still forthcoming, but I have the necessary information to get started on it. The USBMS device module is designed to make it easier to create drivers for USB mass storage devices that work directly on files, without some sort of database. Once the Cybook is working, the next steps are the Kindle and porting all the other existing drivers over to the USBMS classes.
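To make the idea of a shared mass-storage base class concrete, here is a rough Python 3 sketch. It is not Calibre's USBMS code or API: every class, attribute, and value below is hypothetical and only illustrates how a device that is nothing more than a mounted directory of files lets a driver shrink to a few settings.

```python
import os


class MassStorageDevice:
    """Hypothetical base class: the device is just a mounted directory tree."""

    name = "generic"
    book_dir = ""   # folder on the device that holds ebooks
    formats = ()    # file extensions the device can read

    def __init__(self, mount_point):
        self.mount_point = mount_point

    def books(self):
        """List supported ebook files by walking the file system; no database involved."""
        root = os.path.join(self.mount_point, self.book_dir)
        found = []
        for folder, _dirs, files in os.walk(root):
            for filename in files:
                if filename.lower().endswith(self.formats):
                    found.append(os.path.join(folder, filename))
        return sorted(found)


class Cybook(MassStorageDevice):
    # Illustrative values only, not taken from the real driver.
    name = "Cybook Gen 3"
    book_dir = "eBooks"
    formats = (".epub", ".txt", ".pdf")
```

Under these assumptions a new device needs only a subclass with its own settings, which is the kind of redundancy the refactoring is meant to remove.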
* json and blogger
Posted on December 14th, 2008 by John. Filed under programming.
Today I decided to learn about JSON. To help me with this I coded a little Python script I call blogger-updates.py. It takes the name of a Blogger blog and, optionally, the number of entries to retrieve. I used Google’s Blogger API to get the data.
*** Updated to account for non-numeric input when setting max entries.
import simplejson
import sys
import urllib2

def usage():
    print sys.argv[0], 'blogname [max-results]'
    print '    Gets blog updates from blogger.com'

# The lower bound was lost in the original listing; 2 is assumed because
# blogname is required and max-results is optional.
if len(sys.argv) < 2 or len(sys.argv) > 3:
    usage()
    sys.exit(2)

try:
    blogname = sys.argv[1]
except:
    print "Sorry:", sys.exc_type, ":", sys.exc_value
    sys.exit(1)

max_results = 5
if len(sys.argv) == 3:
    try:
        max_results = int(sys.argv[2])
    except ValueError:
        # Non-numeric input: keep the default of 5.
        pass

try:
    json_data = simplejson.load(urllib2.urlopen('http://%s.blogspot.com/feeds/posts/default?alt=json&orderby=published&sortorder=ascending&max-results=%i' % (blogname, max_results)))
except:
    print "Sorry:", sys.exc_type, ":", sys.exc_value
    sys.exit(1)

for entry in json_data['feed']['entry']:
    print 'Title: %s' % (entry['title']['$t'])
    print 'Author: %s' % (entry['author'][0]['name']['$t'])
    print 'Published: %s' % (entry['published']['$t'])
    print 'Content: %s' % (entry['content']['$t'])
    print ''
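The nesting the script walks through (feed, entry, then a '$t' key for each text value) is easier to see with a small offline example. This is a Python 3 sketch using the standard json module; the sample document is hand-written to mimic the keys the script reads and is not real feed output, and the function name is made up.

```python
import json

# Hand-written sample mimicking the nesting the script reads; not real feed output.
SAMPLE = """
{"feed": {"entry": [
  {"title": {"$t": "First post"},
   "author": [{"name": {"$t": "John"}}],
   "published": {"$t": "2008-12-14T10:00:00Z"},
   "content": {"$t": "Hello"}}
]}}
"""


def summarize(feed_json):
    """Flatten each entry into a plain dict of title, author, published, content."""
    # .get() guards against a feed that carries no "entry" key at all.
    entries = json.loads(feed_json)["feed"].get("entry", [])
    return [
        {
            "title": e["title"]["$t"],
            "author": e["author"][0]["name"]["$t"],
            "published": e["published"]["$t"],
            "content": e["content"]["$t"],
        }
        for e in entries
    ]


for post in summarize(SAMPLE):
    print("Title: %s" % post["title"])
    print("Author: %s" % post["author"])
```

Keeping the parsing separate from the download makes it possible to test the parsing without network access.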