Archive for December, 2008
* Multiple Input with Single Input Apps
Posted on December 24th, 2008 by John. Filed under programming.
The few eBook formatting tools I’ve posted all share one major flaw: they can only handle a single file as input. This is a problem when you want to run one on all, or even several, of your eBooks. If you are able to use Bash, there is a simple way to run any single-input command on multiple files.
This first method gets input from a single directory. This method will not go into subdirectories for input.
$ cd ~/Books/author
$ ls -1 *.txt | while read file; do echo "$file"; ./fix_paragraphs_ebook_txt "$file"; ./remove_extra_whitespace_ebook_txt "$file"; done
This second method gets all files matching the given pattern and will handle subdirectories.
$ find ~/books/ -iname "*.txt" | while read file; do echo "$file"; dos2unix -ad "$file"; ./fix_end_ebook_txt "$file"; done
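For anyone without Bash, roughly the same batching can be sketched in Python. This is only an illustration: find_inputs and run_on_each are hypothetical helper names, matching is case-sensitive on most systems (unlike find -iname), and check=True makes the run stop at the first tool that fails, whereas the shell loops keep going.

```python
import subprocess
from pathlib import Path


def find_inputs(root, pattern="*.txt", recursive=True):
    """Return the files under root that match pattern, in sorted order."""
    root = Path(root)
    # rglob descends into subdirectories (like find); glob stays in root (like ls).
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def run_on_each(commands, files):
    """Run every single-input command once per file."""
    for path in files:
        print(path)
        for command in commands:
            # Assumption: each tool takes exactly one argument, the file path.
            subprocess.run([command, str(path)], check=True)
```

Calling run_on_each(["./fix_end_ebook_txt"], find_inputs("/path/to/books")) would mirror the second loop, minus the dos2unix step.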
* Building the eBook Tools
Posted on December 23rd, 2008 by John. Filed under programming.
It’s come to my attention that while I’ve posted a few eBook formatting tools I wrote and use, I never posted how to build them. Since I’m using Qt, the easiest way to build them is to use qmake and make.
The build process is simple. Create a .pro file for the project, say fix_end_ebook_txt.pro. Run qmake, then run make. You will end up with an executable. Just remember that this requires Qt, make, and a C++ compiler (g++ on *nix or mingw on Windows).
fix_end_ebook_txt.pro
SOURCES += fix_end_ebook_txt.cpp
CONFIG += qt
TARGET = fix_end_ebook_txt
The above .pro file is very minimal and can be tuned further for a specific project, but at the very least it shows how to build the Qt eBook tools I’ve posted.
* eBook Adding Empty Lines At End of File
Posted on December 22nd, 2008 by John. Filed under programming.
Continuing my work to clean up my eBooks I’ve written another little tool to help. I like for my eBooks to have two blank lines at the end of the file.
The only major caveat is that it assumes Unix line endings, meaning a single \n character. For files that use a different newline format, run dos2unix first so that this works correctly.
fix_end_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Ensures that there are 3 newline characters at the end of the file (two blank
lines after the last of the text).

This assumes Unix \n line characters. Please use dos2unix before running to
ensure that the end of line characters are correct.
*/

#include <QFile>
#include <QString>
#include <QTextStream>

int main(int argc, char **argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // We want to see what the last 3 characters are at the end of the file.
    ioStream.seek(ebook.size() - 3);
    content = ioStream.read(3);

    // Move to the end of the file because we want to add newlines (\n's) to
    // the end.
    ioStream.seek(ebook.size());

    // We want 3 newline (\n) characters at the end of the file. Add them until
    // they total 3.
    for (int i = 0; i < (3 - content.count("\n")); i++) {
        ioStream << "\n" << flush;
    }

    ebook.close();

    return 0;
}
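The padding rule can be tried out on plain strings with a small Python sketch. pad_like_original mirrors the logic above, which counts newlines anywhere in the last three characters; as a result, text ending in "\nb\n" only gets one newline added. pad_trailing_run is a variant, not the method used above, that counts only the unbroken run of newlines at the very end. Both function names are made up for illustration.

```python
def pad_like_original(text, wanted=3):
    """Mirror the tool above: count newlines anywhere in the last `wanted` chars."""
    tail = text[-wanted:]
    return text + "\n" * (wanted - tail.count("\n"))


def pad_trailing_run(text, wanted=3):
    """Variant: count only the unbroken run of newlines at the end of the text."""
    run = len(text) - len(text.rstrip("\n"))
    # Never remove anything; only add what is missing.
    return text + "\n" * max(wanted - run, 0)
```

The two agree whenever the last line of text is at least three characters long, which is the usual case for a book.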
* eBook Paragraph Formatting
Posted on December 21st, 2008 by John. Filed under programming.
Today I wrote two simple programs to help me clean up my ebooks. I prefer to keep my ebook collection as plain text files with paragraphs separated by a blank line. The first program reflows the paragraphs to put each on a single line. The second removes extraneous whitespace from the file.
The reflow is the more intensive of the two. I ran it on the largest ebook I have, Project Gutenberg’s War and Peace by Leo Tolstoy. The file is 3.1 MB.
Time to run: 7m35.494s.
Memory usage: 13.1 MB according to gnome-system-monitor.
Right now I’m loading the entire book into memory and using QStrings to work on it. Memory usage is a little over 4 times the size of the book (13.1 MB for a 3.1 MB file). Thankfully plain text ebooks are fairly small. Later I’m going to look into optimizing it for size and hopefully speed.
Without further ado, here are the two. They are MIT licensed and use the Qt toolkit.
fix_paragraphs_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Reflows txt file ebook paragraphs. Paragraphs should be separated by a blank
line. Takes paragraphs that have hard breaks and puts all lines onto a single
line.

For Example:

INPUT
This is a multi line
paragraph. It comprises
a few lines but has
hard breaks.

Now for the second
broken apart paragraph.

OUTPUT
This is a multi line paragraph. It comprises a few lines but has hard breaks.

Now for the second broken apart paragraph.
*/

#include <QFile>
#include <QRegExp>
#include <QString>
#include <QTextStream>

int main(int argc, char** argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Regular expression to search for broken paragraphs. Works by looking
    // for char newline char. A proper ebook should have paragraphs separated
    // by a blank line meaning char newline newline char.
    QRegExp re("[^\n]\n[^\n]");
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // Read the entire file contents into memory.
    content = ioStream.readAll();

    while (content.contains(re)) {
        // Replace the newline with a space when there is a match with the
        // regular expression.
        content = content.replace(content.indexOf(re)+1, 1, " ");
    }

    // Truncate the ebook so we don't end up with the original contents after
    // our modified contents.
    if (!ebook.resize(0)) {
        errStream << QObject::tr("Error: Could not truncate file") << endl;
        return 1;
    }

    // Store the modified content back on disk.
    ioStream.seek(0);
    ioStream << content;

    ebook.close();

    return 0;
}
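The reflow rule itself (a newline with a non-newline character on both sides becomes a space) can also be written as one substitution. The Python sketch below is an illustration rather than a port of the tool; reflow and HARD_BREAK are names chosen here. Because it handles every hard break in a single pass instead of searching again from the start after each replacement, it may avoid some of the run time mentioned earlier, though that has not been measured.

```python
import re

# A newline with a non-newline character on both sides is a hard break inside
# a paragraph. Blank-line paragraph separators never match this pattern.
HARD_BREAK = re.compile(r"(?<=[^\n])\n(?=[^\n])")


def reflow(text):
    """Join hard-wrapped lines so that each paragraph sits on a single line."""
    return HARD_BREAK.sub(" ", text)
```

The lookbehind and lookahead leave the neighbouring characters out of the match, so two hard breaks separated by a single character are both replaced in the same pass.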
remove_extra_whitespace_ebook_txt.cpp
/*
Copyright (c) 2008 John Schember

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

/*
Removes extraneous whitespace in a txt file ebook. This will remove every
'\t', '\v', '\f', '\r', and will replace multiple occurrences of ' ' with a
single one.

For Example:

INPUT
This   is  a    bad     line.
Now  for    the second   broken line.

OUTPUT
This is a bad line.
Now for the second broken line.
*/

#include <QFile>
#include <QString>
#include <QTextStream>

int main(int argc, char **argv)
{
    // Stream to write errors to the console.
    QTextStream errStream(stderr);
    // Store for the contents of the ebook.
    QString content;

    // We need an ebook file to work on.
    if (argc != 2) {
        errStream << QObject::tr("Error: No input file") << endl;
        return 1;
    }

    QFile ebook(argv[1]);
    if (!ebook.open(QIODevice::ReadWrite | QIODevice::Text)) {
        errStream << QObject::tr("Error: Could not open") << endl;
        return 1;
    }

    // We use a QTextStream to actually work on the file.
    QTextStream ioStream(&ebook);

    // Read every line and remove the extras we don't want.
    while (!ioStream.atEnd()) {
        content += ioStream.readLine().simplified() + "\n";
    }

    // Truncate the ebook so we don't end up with the original contents after
    // our modified contents.
    if (!ebook.resize(0)) {
        errStream << QObject::tr("Error: Could not truncate file") << endl;
        return 1;
    }

    // Store the modified content back on disk.
    ioStream.seek(0);
    ioStream << content;

    ebook.close();

    return 0;
}
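For comparison, the per-line cleanup can be approximated in a few lines of Python. This is a sketch under the assumption that str.split() with no arguments is a close enough stand-in for QString::simplified() (both trim the ends and collapse runs of whitespace to one space); clean_whitespace is a name made up here.

```python
def clean_whitespace(text):
    """Collapse whitespace runs on every line and end each line with one newline."""
    lines = text.split("\n")
    # A trailing newline leaves one empty string at the end; drop it so that no
    # extra blank line is added to the output.
    if lines and lines[-1] == "":
        lines.pop()
    # split() with no arguments trims the ends and splits on any whitespace run.
    return "".join(" ".join(line.split()) + "\n" for line in lines)
```

Like the loop above, it always ends the last line with a newline, even when the input did not.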
* json and blogger
Posted on December 14th, 2008 by John. Filed under programming.
Today I decided to learn about JSON. To help me with this I coded a little Python script I call blogger-updates.py. It takes the name of a Blogger blog and, optionally, the number of entries to retrieve. I used Google’s Blogger API to get the data.
*** Updated to account for non numeric input when setting max entries.
import simplejson
import sys
import urllib2

def usage():
    print sys.argv[0], 'blogname [max-results]'
    print '    Gets blog updates from blogger.com'

# Expect a blog name and, optionally, a maximum number of entries.
if len(sys.argv) < 2 or len(sys.argv) > 3:
    usage()
    sys.exit(2)

try:
    blogname = sys.argv[1]
except:
    print "Sorry:", sys.exc_type, ":", sys.exc_value
    sys.exit(1)

max_results = 5
if len(sys.argv) == 3:
    try:
        max_results = int(sys.argv[2])
    except ValueError:
        # Non-numeric input: keep the default of 5 entries.
        pass

try:
    json_data = simplejson.load(urllib2.urlopen('http://%s.blogspot.com/feeds/posts/default?alt=json&orderby=published&sortorder=ascending&max-results=%i' % (blogname, max_results)))
except:
    print "Sorry:", sys.exc_type, ":", sys.exc_value
    sys.exit(1)

for entry in json_data['feed']['entry']:
    print 'Title: %s' % (entry['title']['$t'])
    print 'Author: %s' % (entry['author'][0]['name']['$t'])
    print 'Published: %s' % (entry['published']['$t'])
    print 'Content: %s' % (entry['content']['$t'])
    print ''
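The parsing half of the script can be tried without touching the network. The sketch below uses Python 3 and the standard json module rather than simplejson, and runs against a tiny hand-written sample that copies the key layout used above (feed, entry, and the $t text fields). SAMPLE_FEED and summarize_entries are made up for illustration, and treating a missing "entry" key as an empty feed is an assumption, not documented behavior.

```python
import json

# Made-up sample that imitates the structure the script reads from the feed.
SAMPLE_FEED = """
{"feed": {"entry": [
  {"title": {"$t": "First post"},
   "author": [{"name": {"$t": "Example Author"}}],
   "published": {"$t": "2008-12-14T10:00:00.000-05:00"},
   "content": {"$t": "Hello"}}
]}}
"""


def summarize_entries(feed_text):
    """Pull title, author, published date and content out of each feed entry."""
    data = json.loads(feed_text)
    # Assumption: a feed without posts may have no "entry" key at all.
    entries = data.get("feed", {}).get("entry", [])
    return [
        {
            "title": entry["title"]["$t"],
            "author": entry["author"][0]["name"]["$t"],
            "published": entry["published"]["$t"],
            "content": entry["content"]["$t"],
        }
        for entry in entries
    ]
```

Returning a list of plain dictionaries keeps the printing separate from the parsing, so the same function could feed a report or a test.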
* Kittens!
Posted on December 13th, 2008 by John. Filed under Uncategorized.
Today Tati and I decided to adopt two cute little kittens. Wally is an orange tabby and he is 2 months old. Cricket is a Persian mix and she is 3 months old. Both love each other and play with one another whenever they aren’t sleeping. It’s nice having them in the condo. Pictures of them are up at http://photos.nachtimwald.com/index.php?album=kitties.
* WordPress 2.7
Posted on December 13th, 2008 by John. Filed under site maintenance.
WordPress 2.7 is out and all parts of Nacht im Wald that use WordPress have been updated.