Posts Tagged ‘ebook’
* Calibre Week in Review
Posted on April 18th, 2009 by John. Filed under calibre.
This has been a busy week for me on the Calibre front. All of my changes went into the pluginize branch, and the first three I talk about also made it into trunk and will appear in the next release.
I reworked the mobi metadata reader so that it no longer reads the entire file into memory; it only reads the parts of the file that hold the metadata. As a result, reading the metadata is now about five times faster. These results come from unscientific testing by the bug reporter: he said that listing the books on his Kindle went from about 5 minutes to about 1 minute.
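Here is a minimal Python sketch of the idea, not Calibre’s actual reader. It reads only the 78-byte PalmDB header, the first record-table entry and the start of record 0, then seeks straight to the title. The offsets it uses (type/creator at 60, record count at 76, the `MOBI` identifier at 16 in record 0, text encoding at 28, full-name offset and length at 84 and 88) follow the commonly documented MOBI layout; `read_mobi_title` is a hypothetical helper name.

```python
import struct


def read_mobi_title(f):
    """Return the full title of a MOBI book from a seekable binary file object.

    Only the PalmDB header, the first record-table entry and the start of
    record 0 are read; the rest of the file is never loaded.
    """
    header = f.read(78)  # the PalmDB header is 78 bytes
    if len(header) < 78 or header[60:68] != b'BOOKMOBI':
        raise ValueError('not a MOBI file')
    (num_records,) = struct.unpack('>H', header[76:78])
    if num_records == 0:
        raise ValueError('file has no records')

    # Each record-table entry is 8 bytes; the first 4 are the record's offset.
    (rec0_offset,) = struct.unpack('>I', f.read(8)[:4])

    # Record 0 is a 16-byte PalmDOC header followed by the MOBI header.
    f.seek(rec0_offset)
    rec0 = f.read(92)
    if len(rec0) < 92 or rec0[16:20] != b'MOBI':
        raise ValueError('missing MOBI header')
    (encoding,) = struct.unpack('>I', rec0[28:32])
    name_offset, name_length = struct.unpack('>II', rec0[84:92])

    # The title is stored inside record 0 at the given offset.
    f.seek(rec0_offset + name_offset)
    raw = f.read(name_length)
    return raw.decode('utf-8' if encoding == 65001 else 'cp1252', errors='replace')
```

Usage would be along the lines of `with open('book.mobi', 'rb') as f: print(read_mobi_title(f))`. Taking a file object rather than a path keeps the function easy to test with in-memory data.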
The metadata writer for pdf files has been reworked and is now enabled. Kovid built on my initial work so that it won’t lock up the GUI when working with large pdf files.
With a bit of help from Kovid, I was also able to fix bug 2112 (last few pdf files held open). Calibre relies on Python’s garbage collector and object scope to close files; it does not close them explicitly. The bug was caused by pyPdf, a Python library Calibre uses to read and write pdfs. For some reason pyPdf’s file reader wasn’t allowing the files to be closed: they were no longer in use and the object had gone out of scope, but the garbage collector didn’t close the file immediately. It would close it eventually. A wrapper object was created so that pyPdf doesn’t hold a direct reference to the open file, and the file now gets closed properly.
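The post doesn’t show Calibre’s wrapper, so here is a minimal sketch of one way such a wrapper can work, using Python’s `weakref.proxy`. `CyclicReader` is a hypothetical stand-in for a library reader object that sits in a reference cycle; an object in a cycle is not freed by reference counting alone, only when the cycle collector eventually runs. Because the reader only holds a weak proxy, the file itself is freed, and therefore closed, as soon as the caller drops its own reference.

```python
import weakref


class CyclicReader:
    """Hypothetical stand-in for a PDF reader that ends up in a reference
    cycle, so it is only freed when the cycle collector eventually runs."""

    def __init__(self, stream):
        self.stream = stream
        self._self_ref = self  # deliberate reference cycle

    def read_magic(self):
        self.stream.seek(0)
        return self.stream.read(5)


def open_reader(stream):
    # Hand the reader a weak proxy instead of the file itself. The reader can
    # still call seek()/read(), but it no longer keeps the file alive, so the
    # file is freed (and closed) once the caller drops its reference.
    return CyclicReader(weakref.proxy(stream))
```

This relies on CPython’s reference counting; other interpreters may delay the close. Once the file is gone, any further use of the reader raises `ReferenceError`, so the caller must keep the file referenced for as long as the reader is in use.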
The GUI in the current releases only supports displaying one storage card from a device. Not all devices support two storage cards, but the Sony PRS devices do. Support for displaying two storage cards has been added to the GUI.
To go along with the GUI’s two-card display, almost all device drivers have been made to support up to two storage cards. The USBMS base class supports two cards, and since most device drivers use this base class they all get support for it without much work. However, this doesn’t mean that a device without a storage card slot, or with only one slot, will magically support two cards. Apart from the PRS drivers, none of the drivers have any user-visible changes. For anyone looking to write a device driver using USBMS: if the device supports two cards, USBMS has you covered.
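To illustrate why drivers built on a shared base class get card support “for free”, here is a minimal sketch. `USBMSBase`, `MAX_CARDS`, `storage_locations` and `SingleSlotReader` are hypothetical names for illustration, not Calibre’s real USBMS API.

```python
class USBMSBase:
    """Hypothetical base class sketching shared two-card support
    (names are illustrative, not Calibre's real USBMS API)."""

    MAX_CARDS = 2

    def __init__(self, main_mount, card_mounts=()):
        self.main_mount = main_mount
        # Keep only the cards that are actually mounted, up to the limit.
        self.card_mounts = [m for m in card_mounts if m][:self.MAX_CARDS]

    def storage_locations(self):
        """Main memory first, then each mounted storage card."""
        return [self.main_mount] + self.card_mounts


class SingleSlotReader(USBMSBase):
    """A driver for a device with one card slot inherits the same code;
    it simply never has a second card mounted."""
    pass
```

The subclass adds nothing, yet it handles zero, one or two cards because all the logic lives in the base class, which matches the point above that a one-slot device doesn’t gain a second card.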
The PRS505 and PRS700 drivers both received the two-card treatment, along with some other work: they have been moved to the USBMS base class. This removed a lot of redundant code and puts them on the same code path as the other drivers (except the PRS500). Overall, this change should reduce the work involved in maintenance and in finding and fixing bugs.
Internal work wasn’t all I did to the PRS505 and PRS700 drivers. They no longer dump all books into a single directory. Books are stored in an author/title/book hierarchy, and news items are stored in a news/title hierarchy. They also support the USBMS / tag as a custom layout path.
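A minimal sketch of how such a layout could be built. `device_path` and `_safe` are hypothetical helpers; placing the file itself inside the news/title directory, and replacing path separators in names with underscores, are assumptions made for the example.

```python
import posixpath


def _safe(component):
    # Path separators inside a title or author name would create extra
    # directories, so replace them (an assumption; real drivers may differ).
    return component.replace('/', '_').replace('\\', '_').strip() or 'Unknown'


def device_path(root, title, filename, author=None, is_news=False):
    """Build the on-device path for a book (author/title/book) or a
    news item (news/title/file)."""
    if is_news:
        return posixpath.join(root, 'news', _safe(title), filename)
    return posixpath.join(root, _safe(author or 'Unknown'), _safe(title), filename)
```

For example, `device_path('/mnt/reader', 'War and Peace', 'war_and_peace.epub', author='Leo Tolstoy')` gives `/mnt/reader/Leo Tolstoy/War and Peace/war_and_peace.epub`. `posixpath` is used because paths on the device use forward slashes regardless of the host OS.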
Earlier I said almost all device drivers got two storage card support. The PRS500 driver did not; it still only supports one storage card. Due to the way that driver works, I will not be touching it.
I’ve been working with ldolse from mobileread and with his help the processing rules for pdftohtml (used for pdf input) have been improved.
* Calibre work
Posted on March 21st, 2009 by John. Filed under programming.
I’ve been working on a few new features for Calibre recently. They will appear once pluginize is merged into trunk. All of the recent features that I thought were going to be part of the 0.5 release are currently in pluginize. I was under the impression that pluginize was going to be released as 0.5, but it looks like the pluginization is taking a bit more time than expected. I don’t know when any of these changes will appear, but I know that they will eventually hit trunk.
One of the two recent features I’ve been working on is a plain text output converter. Just like the mobi, epub and other output converters, you give it an ebook in a supported format and it will output the book as a plain text file.
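The post doesn’t describe how the converter works internally. As a rough sketch, assuming the book has already been turned into (X)HTML, Python’s standard `html.parser` can pull the text out and write one paragraph per line with a blank line between paragraphs. The set of block-level tags, dropping `script`/`style` content, and the `html_to_text` name are all assumptions made for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    # Tags that end one paragraph and start the next (an assumed list).
    BLOCK_TAGS = {'p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li'}
    # Tags whose content is not book text.
    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.paragraphs = []
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.flush_paragraph()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.flush_paragraph()

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def flush_paragraph(self):
        # Collapse internal whitespace so each paragraph is a single line.
        text = ' '.join(''.join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []


def html_to_text(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    parser.flush_paragraph()  # keep any trailing text outside a block tag
    return '\n\n'.join(parser.paragraphs) + '\n'
```

The output matches the format I prefer for plain text ebooks: each paragraph on one line, separated by a blank line.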
The other feature I’ve been working on is much more interesting and useful: auto-convert for the GUI. Many music managers will automatically convert a music file that isn’t supported by the device into a format that is, then transfer the converted file. This is what has been added to Calibre. You will no longer get an unsupported format error when sending an ebook to a device that can’t read its format. Calibre will automatically convert the ebook into a format the device supports and transfer that instead.
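The decision at the heart of auto-convert can be sketched in a few lines. `plan_transfer` is a hypothetical helper, and the assumption that a device lists its formats in order of preference is mine, not something the post states.

```python
def plan_transfer(book_formats, device_formats):
    """Decide what to send for one book.

    book_formats: formats available in the library, e.g. {'epub', 'lit'}.
    device_formats: formats the device reads, assumed to be in order of
    preference.
    Returns (format, needs_conversion).
    """
    available = {fmt.lower() for fmt in book_formats}
    for fmt in device_formats:
        if fmt.lower() in available:
            return fmt.lower(), False  # send the existing file as-is
    if not device_formats:
        raise ValueError('device reports no supported formats')
    # Nothing usable yet: convert to the device's preferred format first.
    return device_formats[0].lower(), True
```

A book that is only available as LIT, sent to a device that reads `['epub', 'pdf']`, yields `('epub', True)`: convert to epub, then transfer.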
* Printing Support and PDF Conversion in Calibre 0.5
Posted on February 20th, 2009 by John. Filed under programming.
Support for printing will in fact be coming in version 0.5. I’ve committed (to my branch) a new printing framework that fixes the bugs in the old one. Also, I’ve committed some PDF changes. There is now an any2pdf app which will allow for conversion of any supported ebook format into the PDF format. PDF files will also have the first page used as the cover image within Calibre’s library.
There are a few things to know about the printing support. It’s only available in the ebook-viewer application. It uses a print style, so what is printed may not look exactly the same as it does in the viewer. Page breaks are not honoured; this is to save paper. JavaScript is currently stripped from the book. This should only affect ePub books that use JavaScript (I don’t know of any for which this will be an issue).
If you want a perfect printed representation of an ebook as it is viewed in the ebook-viewer application use any2pdf to convert the ebook into a PDF file.
Just like with the last implementation for printing this could change at some point. However, this version has been deemed good enough for inclusion.
* Cybook t2b Format Specification
Posted on January 19th, 2009 by John. Filed under programming.
The Cybook Gen 3 uses its own image format for thumbnails. The t2b file is generated by the device based on the image found in the ebook. If there is no image, a default one with the file name written across it is created.
One of the feature requests for Cybook support in Calibre is for the t2b thumbnail files to be generated on the computer and copied to the device. This is much faster than having the Cybook generate the thumbnail itself.
The t2b file used by the Cybook is a 2-bit image, meaning 2 bits represent 1 pixel. 2 bits have four possible combinations, giving the image a total of four colours (shades of grey). There is no header or footer; the 2-bit values 0, 1, 2 and 3 are written directly to the file, four pixels per byte. The image’s dimensions are 96×144.
Every t2b file has 13,824 pixels (96 × 144), so the file size is always 3,456 bytes. The formula is (width × height × 2 bits per pixel) / 8 bits per byte: (96 × 144 × 2) / 8 = 3,456.
Following are two Python scripts: one converts an image to a t2b file, and the other converts a t2b file into a PGM image.
image2t2b.py
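#!/usr/bin/env python
# Convert an image into a Cybook Gen 3 t2b thumbnail (96x144, 2 bits per pixel).
# Requires Pillow (the original used the old standalone PIL "import Image").
import sys

from PIL import Image


def reduce_color(c):
    """Map an 8-bit grey value (0-255) to one of the four t2b levels (0-3)."""
    # The 192 boundary is reconstructed; that part of the original was lost.
    if c <= 64:
        return 0
    elif c <= 128:
        return 1
    elif c <= 192:
        return 2
    else:
        return 3


def i2b(n):
    """Return n (0-3) as a 2-character binary string, most significant bit first."""
    return "".join([str((n >> y) & 1) for y in range(1, -1, -1)])


def main():
    if len(sys.argv) != 3:
        raise Exception('Must have 2 arguments. %s input.image output.t2b' % sys.argv[0])

    # Load as 8-bit greyscale and shrink to fit within 96x144, keeping the aspect ratio.
    im = Image.open(sys.argv[1]).convert("L")
    im.thumbnail((96, 144))

    # Centre the thumbnail on a white 96x144 canvas.
    newim = Image.new('L', (96, 144), 'white')
    x, y = im.size
    newim.paste(im, ((96 - x) // 2, (144 - y) // 2))

    # Pack every 4 pixels (2 bits each) into one byte, first pixel in the high bits.
    with open(sys.argv[2], 'wb') as outf:
        px = []
        for p in newim.getdata():
            px.append(p)
            if len(px) >= 4:
                binstr = ''.join(i2b(reduce_color(v)) for v in px)
                outf.write(bytes([int(binstr, 2)]))
                px = []


if __name__ == '__main__':
    main()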
t2b2pgm
#!/usr/bin/env python
# Convert a Cybook Gen 3 t2b thumbnail into a plain (ASCII) PGM image.
import sys


def get_greys(byte):
    """Split one byte into four 2-bit grey values (0-3), most significant pair first."""
    return [str((byte >> shift) & 3) for shift in (6, 4, 2, 0)]


def main():
    if len(sys.argv) != 3:
        raise Exception('Must have 2 arguments. %s input.t2b output.pgm' % sys.argv[0])

    with open(sys.argv[1], 'rb') as t2bfile:
        data = t2bfile.read()

    with open(sys.argv[2], 'w') as pgmfile:
        # Plain PGM header: magic number, width and height, maximum grey value.
        pgmfile.write('P2\n96 144\n3\n')
        # 144 rows of 96 pixels; at 4 pixels per byte that is 24 bytes per row.
        for i in range(144):
            for b in data[i * 24:(i + 1) * 24]:
                pgmfile.write('%s %s %s %s ' % tuple(get_greys(b)))
            pgmfile.write('\n')


if __name__ == '__main__':
    main()
* Calibre and Cybook
Posted on January 17th, 2009 by John. Filed under programming.
The usbms driver is mostly done. Everything is implemented and it works. Testing is all that’s needed now. The Cybook is working and Kindle support should be committed soon.
* Calibre and New Driver Code
Posted on January 7th, 2009 by John. Filed under programming.
This Christmas Santa was very nice to me. He gave me a Cybook Gen 3 ebook reader. I’d really been needing a new one since Tati has been monopolizing (this isn’t a bad thing) the Sony PRS505 we already had. Sadly, the ebook management application I use most often (Calibre) didn’t support the Cybook.
Calibre is an open source application written in Python. Both of these aspects are major benefits for the project. Because it is open source, I was able to look at the code and create a driver for the Cybook. That driver has since been merged and released.
While looking over the driver code I noticed a lot of redundancy between drivers. After talking to the author a bit, I started work to refactor the driver code into reusable pieces. My initial Cybook driver has itself been refactored to use the new USBMS device class. This work has also been accepted into the project. The driver still isn’t complete, but with the next release it will hopefully work better for Windows users. With any luck it will be fully working on all platforms very soon (currently Linux is the best supported).
I’ve learned a few things from this ongoing experience. I’ve learned about bzr and Launchpad, and I’ve gotten a better understanding of Python. This experience has reaffirmed the value of open source projects in my eyes. If this were not open source, I would not have bothered; I would have written it off like the Sony Connect Software or Adobe’s Digital Editions.
Currently the Cybook driver works on Linux. Windows should work with the latest trunk. OS X support is still forthcoming, but I have the information I need to get started on it. The USBMS device module is designed to make it easier to create drivers for USB mass storage devices that work with plain files rather than some sort of database. Once the Cybook is working, the next steps are the Kindle and porting all other existing drivers over to the USBMS classes.
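To make “work on files without some sort of database” concrete: for such a device the book list can simply be built by walking the mounted storage. This is a minimal sketch, not Calibre’s code; `list_books` and its default extension list are assumptions for illustration.

```python
import os


def list_books(mount_points, extensions=('.epub', '.mobi', '.pdf', '.txt')):
    """Walk each mount point (main memory, then any cards) and return the
    paths of files whose extension the device can read."""
    wanted = {ext.lower() for ext in extensions}
    books = []
    for mount in mount_points:
        for dirpath, dirnames, filenames in os.walk(mount):
            dirnames.sort()  # walk subdirectories in a stable order
            for name in sorted(filenames):
                if os.path.splitext(name)[1].lower() in wanted:
                    books.append(os.path.join(dirpath, name))
    return books
```

The trade-off is that every listing rescans the device, which is simple and always accurate but gets slower as the number of files grows.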
* Multiple Input with Single Input Apps
Posted on December 24th, 2008 by John. Filed under programming.
The few eBook formatting tools I’ve posted all share one major flaw: they can only handle a single file as input. This is a problem when you want to run them on a number of eBooks. If you can use Bash, there is a simple way to run any single-input command on multiple files.
This first method gets input from a single directory. This method will not go into subdirectories for input.
$ cd ~/Books/author
$ for file in *.txt; do echo "$file"; ./fix_paragraphs_ebook_txt "$file"; ./remove_extra_whitespace_ebook_txt "$file"; done
This second method gets all files matching the given pattern and will handle subdirectories.
$ find ~/books/ -iname "*.txt" | while IFS= read -r file; do echo "$file"; dos2unix -ad "$file"; ./fix_end_ebook_txt "$file"; done
* Building the eBook Tools
Posted on December 23rd, 2008 by John. Filed under programming.
It’s come to my attention that although I’ve posted a few eBook formatting tools I wrote and use, I never posted how to build them. Since they use Qt, the easiest way to build them is with qmake and make.
The build process is simple. Create a .pro file for the project, say fix_end_ebook_txt.pro. Run qmake, then run make, and you will end up with an executable. Just remember that this requires Qt, make, and a C++ compiler (g++ on *nix or MinGW on Windows).
fix_end_ebook_txt.pro
SOURCES += fix_end_ebook_txt.cpp
CONFIG += qt
TARGET = fix_end_ebook_txt
The above .pro file is very minimal and can be tuned further for a specific project, but at the very least it shows how to build the Qt eBook tools I’ve posted.
* eBook Adding Empty Lines At End of File
Posted on December 22nd, 2008 by John. Filed under programming.
Continuing my work to clean up my eBooks I’ve written another little tool to help. I like for my eBooks to have two blank lines at the end of the file.
The only major caveat of this one is that it assumes Unix line endings, meaning a single \n character. For it to work correctly on files that use a different newline format, run them through the dos2unix tool first.
fix_end_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Ensures that there are 3 newline characters at the end of the file (two
blank lines after the last of the text).

This assumes Unix \n line characters. Please use dos2unix before running to
ensure that the end of line characters are correct.
*/

#include <QFile>
#include <QObject>
#include <QString>
#include <QTextStream>

int main(int argc, char **argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // We want to see what the last 3 characters are at the end of the file.
    // Guard against files shorter than 3 bytes.
    ioStream.seek(qMax<qint64>(0, ebook.size() - 3));
    content = ioStream.read(3);

    // Count only the newlines that actually trail the text; a newline in the
    // middle of the last 3 characters does not count.
    int trailing = 0;
    for (int i = content.size() - 1; i >= 0 && content.at(i) == QLatin1Char('\n'); --i)
        ++trailing;

    // Move to the end of the file because we want to add newlines (\n's) to
    // the end.
    ioStream.seek(ebook.size());

    // We want 3 newline (\n) characters at the end of the file. Add them until
    // they total 3.
    for (int i = trailing; i < 3; i++) {
        ioStream << "\n" << flush;
    }

    ebook.close();
    return 0;
}
eBook Paragraph Formatting
Posted on December 21st, 2008 by John. Filed under programming.
Today I wrote two simple programs to help me clean up my ebooks. I prefer to keep my ebook collection as plain text files with paragraphs separated by a blank line. The first program reflows the paragraphs to put each on a single line. The second removes extraneous whitespace from the file.
The reflow is the more intensive of the two. I ran it on the largest ebook I have, Project Gutenberg’s War and Peace by Leo Tolstoy. The file is 3.1 MB.
Time to run: 7m35.494s.
Memory usage: 13.1 MB according to gnome-system-monitor.
Right now I’m loading the entire book into memory and using QStrings to work on it. Memory usage is a little over four times the size of the book. Thankfully plain text ebooks are fairly small. Later I’m going to look into optimizing it for size and, hopefully, speed.
Without further ado, here are the two. They are MIT licensed and use the Qt toolkit.
fix_paragraphs_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Reflows txt file ebook paragraphs. Paragraphs should be separated by a blank
line. Takes paragraphs that have hard breaks and puts all lines onto a single
line.

For Example:

INPUT
This is a multi
line paragraph. It
comprises a few lines
but has hard breaks.

Now for the second
broken apart paragraph.

OUTPUT
This is a multi line paragraph. It comprises a few lines but has hard breaks.

Now for the second broken apart paragraph.
*/

#include <QFile>
#include <QObject>
#include <QRegExp>
#include <QString>
#include <QTextStream>

int main(int argc, char** argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Regular expression to search for broken paragraphs. Works by looking
    // for char newline char. A proper ebook should have paragraphs separated
    // by a blank line meaning char newline newline char.
    QRegExp re("[^\n]\n[^\n]");
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // Read the entire file contents into memory.
    content = ioStream.readAll();

    // Replace the newline in each match with a space. Each search resumes
    // just after the previous match instead of rescanning from the start.
    int pos = 0;
    while ((pos = content.indexOf(re, pos)) != -1) {
        content.replace(pos + 1, 1, " ");
        pos += 1;
    }

    // Truncate the ebook so we don't end up with the original contents after
    // our modified contents.
    if (!ebook.resize(0)) {
        errStream << QObject::tr("Error: Could not truncate file") << endl;
        return 1;
    }

    // Store the modified content back on disk.
    ioStream.seek(0);
    ioStream << content;

    ebook.close();
    return 0;
}
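For comparison, here is a minimal Python sketch of the same matching rule done in a single pass. It isn’t the program above, just the same idea: the original loop restarts its search from the start of the book after every replacement, so its running time grows roughly with the square of the file size, which probably accounts for much of the War and Peace run time. `reflow` is a hypothetical name; the pattern replaces exactly the newlines the C++ regular expression targets.

```python
import re

# A newline with a non-newline character on both sides is a hard break inside
# a paragraph. The lookarounds only inspect the neighbours, so a single pass
# handles every break, including consecutive ones like "a\nb\nc".
_HARD_BREAK = re.compile(r'(?<=[^\n])\n(?=[^\n])')


def reflow(text):
    """Join hard-wrapped lines so each paragraph sits on one line; blank
    lines between paragraphs are left alone."""
    return _HARD_BREAK.sub(' ', text)
```

Running it on the example from the program’s header comment produces the same OUTPUT shown there.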
remove_extra_whitespace_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Removes extraneous whitespace in a txt file ebook. Each line is passed through
QString::simplified(): leading and trailing whitespace is removed, and every
internal run of whitespace ('\t', '\v', '\f', '\r', ' ') is replaced with a
single space.

For Example:

INPUT
This   is a    bad line.
Now for   the second  broken line.

OUTPUT
This is a bad line.
Now for the second broken line.
*/

#include <QFile>
#include <QObject>
#include <QString>
#include <QTextStream>

int main(int argc, char **argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // Read every line and remove the extras we don't want.
    while (!ioStream.atEnd()) {
        content += ioStream.readLine().simplified() + "\n";
    }

    // Truncate the ebook so we don't end up with the original contents after
    // our modified contents.
    if (!ebook.resize(0)) {
        errStream << QObject::tr("Error: Could not truncate file") << endl;
        return 1;
    }

    // Store the modified content back on disk.
    ioStream.seek(0);
    ioStream << content;

    ebook.close();
    return 0;
}