* Formatting Tips: Endnotes and Footnotes
Posted on September 20th, 2011 by John. Filed under Formatting Tips.
Many non-fiction books utilize footnotes or endnotes. Footnotes do not work very well with ebooks because footnotes need to be placed at the bottom of a page. It is possible with EPUB to specify content to be shown at the bottom of a page using the following:
<div style="display: oeb-page-foot">...</div>
I highly recommend against using the oeb-page-foot display style. Many reading devices and software do not support this and just ignore this text. My recommendation is to use endnotes.
All endnotes should be collected into a single page located at the end of the ebook. I recommend using either a * or increasing numbers. If you use numbers, the number in the text should match the number displayed in front of the endnote. Use the sup tag and link to the endnote.
<sup id="ra"><a href="c2.xhtml#enda">*</a></sup>
One thing to keep in mind is that not all ebook reading devices and software support a back button. So it’s a good idea to include a return link that takes the reader back to the exact place in the text where the endnote is referenced. Make the return link subtle but not something the reader will overlook. I just make the font smaller.
<sub><a class="return" href="c1.xhtml#ra">return</a></sub>
.return { font-size: x-small; }
The best way to illustrate this concept is with an example. Download ft-endnotes.epub. Opening the file with Sigil you will see the example chapters and the external CSS that is referenced by each XHTML file.
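For a book with many notes, the paired markup can be generated rather than typed by hand. Here is a minimal Python sketch of that idea. The function name `endnote_pair`, the `r` / `end` id prefixes (taken from the `ra` / `enda` ids above), and the `<p>` wrapper around the endnote are assumptions for illustration; they are not taken from ft-endnotes.epub.

```python
from html import escape


def endnote_pair(note_id, marker, text_file, notes_file, note_text):
    """Return (reference markup, endnote markup) for one endnote.

    Hypothetical helper: ids follow the "ra" / "enda" pattern used above.
    """
    ref_id = "r" + note_id      # id of the marker in the chapter text
    end_id = "end" + note_id    # id of the note on the endnotes page
    reference = (
        f'<sup id="{ref_id}">'
        f'<a href="{notes_file}#{end_id}">{escape(marker)}</a></sup>'
    )
    # The <p> wrapper is an assumption; any element that carries the id works.
    endnote = (
        f'<p id="{end_id}">{escape(marker)} {escape(note_text)} '
        f'<sub><a class="return" href="{text_file}#{ref_id}">return</a></sub></p>'
    )
    return reference, endnote


reference, endnote = endnote_pair("a", "*", "c1.xhtml", "c2.xhtml", "Example note text.")
print(reference)
print(endnote)
```

The point of generating both halves together is that the link and the return link can never point at mismatched ids.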
* Week in Review
Posted on September 16th, 2011 by John. Filed under calibre, Sigil.
Calibre
This week I focused on PDF output. There was a bug introduced in 0.8.17 that broke PDF output which has now been fixed. I was also able to fix PDF output on OS X. The PDF output engine on OS X is now using OS X’s internal PDF engine instead of Qt’s. Page sizes other than A4 are now possible and the PDFs produced are no longer large image based monstrosities, meaning text is now selectable and can be copied.
Sigil
I am currently working on Perl compatible regular expression (PCRE) support. An initial version has been put into git. I have an enhanced version working that allows for case changes in the replacement text. Right now I’m working on caching the results of a search to improve performance.
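To show what a case change in the replacement text means, here is a small sketch using Python's `re` module. This is not Sigil's code and Python's `re` is not PCRE; Perl's own substitution syntax has escapes such as `\U` for this, while Python uses a replacement function instead. The function name and the choice of `<h2>` headings are assumptions for illustration.

```python
import re


def uppercase_h2(xhtml):
    """Upper-case the text inside every <h2>...</h2> (illustrative only)."""
    pattern = re.compile(r"<h2>(.*?)</h2>")
    # The replacement is computed per match, so the case can be changed.
    return pattern.sub(lambda m: "<h2>" + m.group(1).upper() + "</h2>", xhtml)


print(uppercase_h2("<h2>chapter one</h2><p>Some text.</p>"))
```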
* Formatting Tips: Sizing elements (Including Text)
Posted on September 13th, 2011 by John. Filed under Formatting Tips.
When dealing with the EPUB format there are a number of ways to deal with sizes. Font size, indent, margin, and spacing all allow for a variety of units to define their size. Sizes can be defined using any of the following: %, in, cm, mm, em, ex, pt, pc, px.
With all these choices it might be hard to decide which unit type to use. This decision is actually very easy. Always use a relative size type. cm and in, for instance, are fixed sizes; 1 cm is always 1 cm and 1 in is always 1 in. % and em are relative sizes. An em is equivalent to the current font size. So 1 em is equivalent to 12 pt if the font size is 12 pt.
It’s very important to use relative sizes because EPUB reading software and devices allow users to change font sizes. Also, there is significant variation in screen sizes. Using relative sizes means that the spacing of elements (indents for example) stays in proportion to the text and the screen.
I’ll admit using relative sizes is hard. In some of the examples for Formatting Tips I’ve used exact measurements. I apologize for this but this illustrates how difficult it can be to take into account all of the possible rendering issues that can arise from such a diverse reading ecosystem available today.
Be careful when using relative sizes. Each unit is unique. A % used for an indent is a % of the width of the containing block, which usually tracks the screen width. A % used for a text size is relative to the text size of the parent element. An em is always going to be relative to the current font size. Be sure to check the layout with various text sizes and screen dimensions. You can simulate this (it’s not perfect) by opening the book in something like Sigil or calibre’s ebook-viewer and changing the zoom and window size.
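The arithmetic behind em and % can be checked with a few lines of Python. This is only a sketch of the calculation, not how any reading system is implemented; the function names and the sample widths are made up for illustration.

```python
def em_to_pt(em, font_size_pt):
    """An em scales with the current font size."""
    return em * font_size_pt


def percent_of_width(percent, container_width_px):
    """A % indent scales with the width of the containing block."""
    return container_width_px * percent / 100


# A 1.5 em indent grows when the reader bumps the font from 12 pt to 18 pt.
print(em_to_pt(1.5, 12), em_to_pt(1.5, 18))

# A 5% indent grows with the screen instead, whatever the font size is.
print(percent_of_width(5, 600), percent_of_width(5, 1200))
```

The two units respond to different things: em follows the reader's font setting, % follows the available width.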
* Calibre Week in Review
Posted on September 8th, 2011 by John. Filed under calibre.
This week I finally sat down and spent some time with Markdown input and output. Both saw major changes. Markdown input was bumped to upstream version 2.0. Output was completely rewritten from scratch. Markdown output is now completely custom code (not using a third party output module like before). I based the new Markdown code off of the Textile output classes I helped Perkin to create.
As with all new code and major changes there are probably bugs. I tested Markdown output with a variety of test material and kept working at it until everything converted acceptably. I also used a variety of the Markdown tests provided by John Gruber to ensure my output was correct. When converting the HTML output tests back to Markdown the output is similar enough to the original that I feel it is acceptable.
The last big change I made this week was adding a new OEB transformation to unsmarten punctuation. As the name implies it changes curly quotes, apostrophes and a few other characters to their plain text, straight equivalents. It basically does the opposite of smarten punctuation. I find this especially useful when converting to formatted (Textile or Markdown) plain text files (TXT).
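The idea can be sketched in a few lines of Python. This is not calibre's actual transformation; the character table below is an assumption (curly quotes and apostrophes, plus dashes and the ellipsis as the "few other characters"), and the real list may differ.

```python
# Assumed mapping from "smart" characters to plain text equivalents.
UNSMARTEN_TABLE = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})


def unsmarten(text):
    """Replace curly punctuation with straight, plain text equivalents."""
    return text.translate(UNSMARTEN_TABLE)


print(unsmarten("\u201cIt\u2019s fine\u2026\u201d"))
```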
* Sigil’s Future Direction (Post 0.4.x)
Posted on September 4th, 2011 by John. Filed under articles, Opinion, Sigil.
Introduction
With 0.4 my focus has been on getting the existing features in a stable state. I foresee 0.4 being around for quite some time as development shifts to accommodate new features. I wanted to be sure a relatively bug free version is available for people to use. If data loss were a constant there wouldn’t be any point in using Sigil. Now that 0.4 is done it’s time to start working on what’s next.
Just what is next? For the time being I’ve marked a number of issues on the issue tracker as Milestone-0.5. My plan is to have 0.5 just implement the most commonly requested and most interesting features. 0.5 has no vision and is just a stopgap while I familiarize myself with Sigil’s code base. 0.5 is my short term plan. It’s not grand but it’s functional and sufficient.
Recently I posted the conclusion of my Sigil user study. The finding is that Sigil is most used by, and most useful to, power users and small professional ebook creation houses. Also, the overlap between the two is significant. Thus I want to target these two groups and make Sigil even more useful for them. Keep this in mind because these two groups are going to shape my views of where I want to take Sigil.
Please realize that not everything I’m going to talk about is set in stone. A lot of it probably will never happen. Also, this is part plans, part what I want to do, and part rant about what Sigil does that I don’t like. This is what my ideal Sigil would look like and it is what I’m going to work toward. However, nothing is set in stone.
Plugins
If you’ve ever used calibre or Firefox you will know that plugins are amazing. They allow for easy and quick changes and additions to be made without having to change the main application. Both calibre and Firefox have large third party plugin communities. I would like to bring this to Sigil and I want a framework where all book manipulation is available over a plugin interface.
My feeling is that Sigil plugins should make small, self contained changes, similar to calibre’s heuristic processing. For instance, italicize common cases, shift headings up or down, and normalize CSS. To make plugins really useful I want to have a system where multiple plugins can be chained together and run in sequence. This would be super basic internal scripting functionality.
For plugins themselves I’m undecided about how they should be implemented. I don’t mean API wise because that isn’t even a thought at this point. I’m talking about what languages they should be able to be written in. C++ as a shared library will of course be supported because Sigil is written in C++. However, I want Sigil to be able to load plugins written in scripting languages.
My first thought is Python because I’m very familiar with it and love to work with it. I’m also thinking about Lua and QtScript (Javascript without DOM). I don’t want to support frameworks for every one of these languages due to the amount of maintenance required. So I want to support only one scripting language. Python is big and slow. Lua is small but doesn’t have the advanced text manipulation libraries Python offers. QtScript is Javascript, which is an abomination of a language. Added size of Sigil’s install, execution speed, ease of supporting, knowledge by contributors and text manipulation support are all major considerations.
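To make the chaining idea from this section concrete, here is a rough sketch. It is written in Python only because that is one of the candidates; no plugin API exists yet, so every name here (`shift_headings_down`, `collapse_blank_lines`, `run_chain`) is hypothetical. Each "plugin" is just a function that takes XHTML as a string and returns the changed string.

```python
import re


def shift_headings_down(xhtml):
    """Hypothetical plugin: h1 becomes h2, h2 becomes h3, ... h6 stays h6."""
    def bump(match):
        level = min(int(match.group(2)) + 1, 6)
        return f"<{match.group(1)}h{level}"
    # Matches both opening (<h1) and closing (</h1) heading tags.
    return re.sub(r"<(/?)h([1-6])\b", bump, xhtml)


def collapse_blank_lines(xhtml):
    """Hypothetical plugin: squeeze runs of blank lines down to one."""
    return re.sub(r"\n{3,}", "\n\n", xhtml)


def run_chain(xhtml, plugins):
    """Run each plugin in order, feeding the output of one into the next."""
    for plugin in plugins:
        xhtml = plugin(xhtml)
    return xhtml


result = run_chain(
    "<h1>Title</h1>\n\n\n\n<p>Text</p>",
    [shift_headings_down, collapse_blank_lines],
)
print(result)
```

Keeping every plugin to the same string in, string out shape is what makes chaining trivial; the order of the list is the script.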
Editor
Currently Sigil does not respect the structure of existing files. When you open an EPUB in Sigil it restructures the file layout. It even goes as far as to rewrite each XHTML file by running it through Tidy. With 0.4.0 cleaning with Tidy can be disabled but pretty printing is still used and alters the XHTML. I absolutely hate this! If I want my XHTML or file structure changed I’ll do it myself.
I want to change Sigil to not be as automatic. Restructuring and cleaning of the XHTML should be moved to plugins and run when the user requests it. This way a user can open Sigil, change the metadata, save, and the only thing that changes is the OPF with the metadata changes. Not every single piece of the EPUB.
I also hate WYSIWYG editing because it inherently must make drastic changes to the underlying code. I don’t think it’s a good idea to remove it though. I would prefer to have the book view default to a preview mode that is read only. There wouldn’t be any changes made to the code by using book view. Read only is the default but the user should have an edit toggle that sets the book view to edit mode, which will work like it already does. This way a user can make changes that may not be valid or work, check them, and see there is an error (say a missing </a> tag) without losing any work. They can see the issue, fix it, and still use WYSIWYG editing when they want.
Data Store
Right now XML (XHTML included) data is stored as a Xerces DOMDocument. This is then loaded into the book or code view depending on which one is focused. The use of a DOMDocument often leads to data loss. Putting malformed XML into a DOMDocument can have unintended consequences, especially when it is then loaded into a QWebView and read back as a string.
I want to replace the DOMDocument with a plain string as the data store. This will prevent a lot of data loss, especially combined with the book view defaulting to read only. Further, this combined with not making automatic changes to the code will make the well-formed error warning unnecessary.
Not auto processing with Tidy and checking for errors automatically will allow Sigil to produce invalid EPUBs. I really don’t care that this can happen. The tools (FlightCrew) will still be there to check that the file conforms to the spec. It’s up to the author to ensure they’re publishing valid EPUBs. An EPUB that is being actively edited doesn’t have to be valid at all times. I’d rather put the onus on the person using Sigil to ensure their EPUB is correct before publishing versus having Sigil force validity at every moment.
Undo
Undo is terrible right now. Some actions cannot be undone, some can. The book view’s undo is completely separate from the code view. A replacement made across all HTML files can’t be undone in files that aren’t open in a tab. I want to see a unified single undo that allows for stepping back out of any change.
Further along this line I would like some graphical display where you can look at the changes that have been made to make it easy to find exactly how far back to undo. Something like Apple’s Time Machine but for the state of the book.
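As a rough illustration of what a single unified undo could look like, here is a Python sketch. It is a thought experiment, not Sigil's design; the `Book` class and its method names are hypothetical. Every change, whether to one file or to many, is recorded as one labelled step on one history list.

```python
class Book:
    """Hypothetical book state with one undo history for all files."""

    def __init__(self, files):
        self.files = dict(files)
        self.history = []  # list of (label, {file name: text before change})

    def apply(self, label, changes):
        """Record the old text of every touched file, then apply the change."""
        before = {name: self.files[name] for name in changes}
        self.history.append((label, before))
        self.files.update(changes)

    def replace_all(self, old, new):
        """Replace across every file as a single undoable step."""
        changes = {
            name: text.replace(old, new)
            for name, text in self.files.items()
            if old in text
        }
        if changes:
            self.apply(f"replace {old!r} with {new!r}", changes)

    def undo(self):
        """Step back one change and return its label, or None if empty."""
        if not self.history:
            return None
        label, before = self.history.pop()
        self.files.update(before)
        return label


book = Book({"c1.xhtml": "teh cat", "c2.xhtml": "teh dog"})
book.replace_all("teh", "the")
book.apply("edit c1", {"c1.xhtml": "the cat sat"})
print([label for label, _ in book.history])
```

The list of labels is also the raw material for the kind of graphical history display described above.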
Conclusion
Here is where I want to take Sigil: less hand holding, fewer automatic changes and more advanced text manipulation through a plugin interface. The big question is, should I skip putting out a 0.5.0 release with just the Milestone-0.5.0 marked changes and get started on the above now?
* Sigil and Linux Distribution Packages
Posted on September 3rd, 2011 by John. Filed under Sigil.
The official Linux packages for Sigil are generic packages. They’re bundled in an InstallJammer installer and contain a number of libraries that Sigil depends on. This is not ideal but it’s not possible to provide Linux packages for every distro.
I’ve created a wiki page where I’m putting together a list of Linux distributions that have their own Sigil packages. These are the best packages for users to install because they’re smaller and tailored to the distro.
If your distro isn’t listed and it has Sigil packages let me know and I’ll add it to the list. If your distro doesn’t package Sigil let them know you would like to see them package it. I’m always willing to lend a hand to get Sigil in more Linux distros.
* Sigil 0.4.2 Released
Posted on September 2nd, 2011 by John. Filed under Sigil.
Sigil 0.4.1 is complete and available. This is mainly a maintenance release and fixes a number of bugs. Specifically a few bugs related to data loss. There was one major user visible change. The well-formed error dialog can be toggled not to show. This will cause errors to be auto fixed. Use this with care because the auto fix Sigil makes might not be what you want. As always see the changelog for a complete list of changes.
* Calibre Week in Review
Posted on September 2nd, 2011 by John. Filed under calibre.
Since taking over Sigil I haven’t had much time to spend working on calibre. However, I haven’t abandoned calibre. It’s still a priority and something I will continue to work on.
This week I focused on Get Books. Nothing new was added but I went through most of the store plugins and fixed a few of them to support changes to the stores. As of 0.8.17 all stores should be working properly. It doesn’t sound like much or very flashy but fixing bugs and keeping everything running smoothly is very important.
* Sigil User Study
Posted on August 31st, 2011 by John. Filed under articles, Opinion, Sigil.
Introduction
Since taking over as the maintainer of Sigil I have spent some time reaching out to specific people in the ebook community to ask them about Sigil. Specifically, do they use Sigil? Why or why not? What do they see as Sigil’s shortcomings? How do they use Sigil in their work flow? Why doesn’t Sigil work in their work flow? Basically, their thoughts and opinions on Sigil.
I asked specific people privately because I didn’t want to be inundated with responses. The people can be broken down into three different groups: self publishers, power users, and professionals. After talking to professionals I’ve come to realize that they can be broken down into small and large. The size relating to the size of the company and production volume. I spoke with about 8 people total and I tried to keep it even between the various groups.
I wanted to find out who is using Sigil, who isn’t using Sigil and why so I can determine where I want to take Sigil in the future. The only ebook editing I do is cleaning up a few books here and there. Learning how people use Sigil will help me to determine the best direction to take the project.
Self Publishers
Self publishers are authors. These are people who write their book and then want to sell it as an ebook themselves. Typically these people are using Word for writing. They export their work as HTML, then import it into an ebook editor for final adjustments and saving as an ebook file. The two biggest things self publishers are looking for are easy and high quality .doc or .docx import and one click send to store functionality.
Self publishers are also interested in WYSIWYG editing and don’t want to know about the internals of ebooks. They are primarily writers who see ebooks as one of many distribution methods. They don’t care about the intricacies of EPUB, for instance; they just want their work to look good and be readable by their audience.
The typical tools I hear being used by self publishers are calibre for format shifting, and Atlantis Word Processor and Jutoh for formatting and base ebook creation. Atlantis and Jutoh both provide very easy to use WYSIWYG interaction and you can use these without ever seeing a line of code.
Power Users
These are people who prepare works in their spare time as a hobby. They are not motivated by money and do not sell the works they publish. Typically the works power users deal with are public domain such as Shakespeare. This group also encompasses people who do not distribute works covered by copyright but spend their time cleaning and reformatting their favorite books strictly for their own enjoyment and personal use.
Power users are comfortable using both WYSIWYG and code editors. The biggest feature requested and talked about by power users is robust regular expression support for search and replace. Many of the books power users work with have terrible and often non-existent formatting. These works typically started life as either a scanned copy of a print book or a PDF file, both of which typically leave broken paragraphs and misspellings throughout the document. This leads to spell check being the next most common request from this group. They are trying to take a jumble of half sentences and put them back together into a visually appealing layout.
The tools used by power users are Sigil, calibre, Word or Open Office macros, and many custom scripts. An advanced text editor like BBEdit or Notepad++ is also a must have tool.
Professionals
Professionals format ebooks for one purpose, money. This is what they do for a living. An author comes to them and pays to have the company turn their work into an ebook. For a modest fee an author can have a beautiful ebook produced without any headaches or hassle. Many authors prefer paying someone to do this portion of publishing for them just like they will pay an editor to edit, a print house to print, cover artist to design a cover and so forth. Authors write and typically want to concentrate solely on writing. Many self publishers format their own ebooks out of necessity because of the cost of hiring a professional.
With both small and large professionals I’m specifically talking about ebook publishing and digitization services. I’m not talking about huge publishers like Macmillan that do everything. However, talking to the larger publishers makes me believe their process is the same as that of the huge publishers. The big difference between small and large professionals is the tools they use.
Small
Small professionals tend to use either Sigil or Adobe’s InDesign for a good portion of their work. Both fill a very similar role in ebook creation. The big draw of InDesign over Sigil is that InDesign supports print book layout creation. It’s an all in one tool. This type of professional tends to use off the shelf tools that are readily available. Sigil and InDesign are not the only tools they use but one or the other tends to be a heavily used tool in their tool box.
Large
Large professionals tend to use custom tools. They staff people whose sole job is to develop and maintain ebook creation and formatting tools. They can afford to have custom tools that integrate directly into their process. They don’t use off the shelf or vanilla tools. This group is all about custom everything. This allows them to quickly adapt to changes.
Professional Tools
Sigil or InDesign and custom tools are all I know. Many professionals are vague about their process and tools. Some even declined to talk to me at all. They use tools in some way that works for them but their methods and implementation are proprietary.
What Does This Mean For Sigil?
Out of all of these groups I have little desire to target self publishers. There are existing tools that do a great job of meeting this group’s needs. Sigil has a WYSIWYG editor and it can certainly be improved but I don’t want to tie Sigil to a particular store or stores like Amazon or B&N. Also, I want to keep Sigil as an EPUB editor and not a generic ebook editor. I believe that Sigil’s strength lies in being able to manipulate the internals of the EPUB format itself. I want to target this aspect more.
Power users are the major group I want to target. Out of all of the people I spoke with power users use Sigil the most and get the most out of it. Advanced editing of an EPUB’s structure and code is where I want to take Sigil. That along with advanced text manipulation. Think expansion of calibre’s heuristic processing.
Small professionals are major users of Sigil and I do not want to discount them. I believe that their use of Sigil overlaps with power users enough that targeting power users will also target small professionals. I do not want to alienate small professionals and will continue to take their needs seriously. From what I’ve learned about small professionals, tools that make code manipulation easier will be a benefit and hopefully reduce their need for other formatting tools.
The last group, large professionals, do not use Sigil. I don’t believe that changing Sigil to accommodate this group will get them to use Sigil. They use their own custom tools, Sigil doesn’t fit into their work flow, and I don’t see it ever doing so. Thus I don’t see it being worthwhile to work toward making Sigil “the tool” for this group.
* Formatting Tips: Markdown, Textile and calibre
Posted on August 30th, 2011 by John. Filed under Formatting Tips.
Up to this point Formatting Tips have been focused on the EPUB format and working directly with the underlying XHTML and CSS. Not everyone wants or needs this level of control over the layout of their book. Often times a book only needs basic formatting such as headings, bold, and italic. There are other easier ways to format an ebook. However, in this case simpler does mean basic.
A very easy way to format an ebook is to start with a plain text file (TXT). Then use either Markdown or Textile to add the formatting. Both Markdown and Textile allow for simple text formatting and they are designed to be converted to HTML.
By using TXT with a formatting syntax you can use pretty much any text editor you want. Markdown and Textile are very simple formats that are much easier to learn than XHTML and CSS. In Textile, adding things like *bold* is as easy as putting a * before and after a segment of text (Markdown treats a single * as emphasis and uses ** for bold).
I recommend looking at both Markdown and Textile. There are differences in what formatting they support but they both support the basics like bold, italic, and headings. I’ve found Markdown to be easier to use but Textile offers more options.
After adding your formatting to the text it’s very easy to turn the TXT file into your desired final format (EPUB or MOBI most likely). calibre supports TXT formatted with either Markdown or Textile. However, the Textile support is more robust. Simply convert to the output format of your choosing.
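To give a feel for how little machinery these formats need, here is a toy Python sketch that handles three Textile constructs: `h1.` style headings, `*bold*`, and `_italic_`. It is nowhere near a full Textile implementation and it is not what calibre uses; the function name is made up for illustration.

```python
import re
from html import escape


def tiny_textile_to_html(text):
    """Convert a tiny subset of Textile to HTML (illustrative only)."""
    blocks = []
    # Blocks are separated by a blank line.
    for block in text.strip().split("\n\n"):
        block = escape(block.strip(), quote=False)
        block = re.sub(r"\*(.+?)\*", r"<strong>\1</strong>", block)
        block = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"<em>\1</em>", block)
        heading = re.match(r"h([1-6])\. (.*)", block, re.DOTALL)
        if heading:
            level, content = heading.group(1), heading.group(2)
            blocks.append(f"<h{level}>{content}</h{level}>")
        else:
            blocks.append(f"<p>{block}</p>")
    return "\n".join(blocks)


sample = "h1. Chapter One\n\nThis is *bold* and _italic_ text."
print(tiny_textile_to_html(sample))
```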