Author Archive

* Sigil and Data Loss Bugs

Posted on November 8th, 2011 by John. Filed under Sigil.


The majority of the data loss issues have been mitigated at this point. With a workflow of open, Save As after major changes, and save after minor ones, catastrophic data loss can be worked around to the point that Sigil can be, and is being, used on a day-to-day basis.

That said, there are issues with data loss in Sigil and they are a priority. I’m currently finishing up the 0.5 release (I do not have a set release date at this point), which is mainly a feature release and only addresses some of the data loss issues. For example, you can still have everything in an entire XHTML document removed by putting a malformed XML header in the document.

The issue has three components that require major work to fix. I hope to have it all completed for the 0.6 release but it’s going to be some time before it’s ready.

The issues are:

1) Sigil currently uses Tidy to clean all XHTML to ensure it conforms (as much as it can) to the XHTML spec. I have seen Tidy remove tags it thinks are empty even when they influence how the document is rendered. I want to keep Tidy as part of Sigil, but I believe it should only be run when the user asks for it, and the user should be able to revert any changes it makes.

2) An intermediate data store is used that requires valid XML. This store shuffles data between the book and code views. Because the store requires valid XML (valid XHTML conforms), there is the potential for data loss if it has to auto-correct the XHTML. If you are in code view, the XHTML has structural errors, and you move out of code view, a warning dialog appears. The dialog only appears when you are working on one file at a time. If you are replacing across multiple files, auto-correction is used and this can lead to data loss. This data store needs to be replaced with one that does not require valid XML.

3) Putting malformed content into the book view will cause the book view to try to correct it. Again, auto-correction can lead to data loss. This is mitigated by the malformed error dialog, but many users just disable it and then find that sections of their document are missing after looking at it in book view. Also, the book view is a WYSIWYG tool, so it does make structural changes to the document, and these may or may not be what the user expects. As with Tidy, changes made by the book view need to be revertible. I am thinking about ways to make it more obvious that the book view makes changes to the document. This way the user is aware that they need to use undo (which doesn’t currently work for book view changes) to revert the changes if they don’t like them. I’m thinking about using a preview mode by default that doesn’t make any changes, plus a separate edit mode, to make this distinction obvious.

The above issues can be fixed but they are not quick or easy changes. I plan on making them for the 0.6 release as part of the changes necessary to support EPUB 3. However, there is the possibility that they will slip to 0.7 due to how large they are. Unfortunately, all I can say right now is I’m aware of the issue, I know what the cause is, and I have an idea of how to correct it but it’s not going to happen tomorrow.



* Retrieve Formatting Set by QSyntaxHighlighter

Posted on October 29th, 2011 by John. Filed under programming.


I have been working on adding inline spell check to Sigil recently and ran into a quirk in Qt that isn’t immediately obvious. I ended up having to look through the Qt source code to understand exactly what was happening.

When dealing with a QPlainTextEdit you can get the QTextCursor and use the charFormat() function to retrieve the QTextCharFormat for the character before the cursor. This does not work when the formatting is set by a QSyntaxHighlighter!

charFormat retrieves the character format that has explicitly been set on the QPlainTextEdit. QSyntaxHighlighter does not directly set the formatting on the QPlainTextEdit. Instead QSyntaxHighlighter sets the format in additionalFormats as part of the block layout. All formatting for the block the cursor is currently in can be accessed by using QPlainTextEdit::textCursor().block().layout()->additionalFormats().

QTextLayout::additionalFormats() returns a list of FormatRange objects. A FormatRange gives the start of the formatting (relative to the block not the full text in the QPlainTextEdit), the length and the formatting (as set by the QSyntaxHighlighter). Simply loop over all of the FormatRange objects and check if the cursor is within a range to determine what formatting is applied to a particular part of the block’s text. Use QTextCursor::positionInBlock() to determine the relative position of the cursor within the block.

Here is an example from Sigil that I use for spell checking. It determines if a particular segment of text has the misspelled word style applied to it. It then selects the text.

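// Work on a copy of the current cursor.
QTextCursor c = textCursor();
// Cursor position relative to the start of its block.
int pos = c.positionInBlock();
// Formats applied by the QSyntaxHighlighter live in the block layout.
foreach (const QTextLayout::FormatRange &r, c.block().layout()->additionalFormats()) {
    // Is the cursor inside a range that carries the misspelled-word underline?
    if (pos >= r.start && pos <= r.start + r.length && r.format.underlineStyle() == QTextCharFormat::SpellCheckUnderline) {
        // Select the whole range: anchor at its start, then extend by its length.
        c.setPosition(c.block().position() + r.start);
        c.movePosition(QTextCursor::Right, QTextCursor::KeepAnchor, r.length);
        setTextCursor(c);
        break;
    }
}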

*Note: QTextEdit can be substituted anywhere QPlainTextEdit is used. The behavior described here applies to both classes, not just QPlainTextEdit.




* Sigil Now Supports Translations

Posted on October 8th, 2011 by John. Filed under Sigil.


One of the new features that has been implemented for 0.5 (release date yet to be determined) is support for translations. For Sigil’s first supported language, Grzegorz Wolszczak has provided a Polish translation. Currently translations are loaded based upon the current system locale. There is no support for choosing the language via preferences. This may come at a later time, but for now I believe that using the system locale will handle the majority of user needs.

I’ve put together a wiki page with instructions for creating translations. This first revision is a bit basic but as people have questions I plan to update it to make it more robust.



* Sigil Keyboard Shortcuts

Posted on October 1st, 2011 by John. Filed under Sigil.


Thanks to Grzegorz Wolszczak, Sigil now allows users to change keyboard shortcuts for many actions (this will be part of the 0.5 release). Grzegorz has been helping out a lot: he helped to introduce a preferences dialog and provided the user-configurable keyboard shortcuts.



* Formatting Tips: Raised Initial

Posted on September 27th, 2011 by John. Filed under Formatting Tips.


About Formatting Tips.

This is a very easy formatting type and is similar to doing a drop cap. The big difference is that with a raised initial the letter sits on the baseline and is taller than the other letters. Only the first letter of the first paragraph of a chapter should be raised.

Simply wrap the first letter in a span tag referencing the appropriate CSS class like so.

<p><span class="ri">L</span>orem...</p>

The CSS for a raised initial is very easy. Simply make the font size larger than normal and set it to bold.

span.ri {
    font-size: 4em;
    font-weight: bold;
}

The best way to illustrate this concept is with an example. Download ft-raised-initial.epub. Opening the file with Sigil you will see the example chapters and the external CSS that is referenced by each XHTML file.




* Formatting Tips: Endnotes and Footnotes

Posted on September 20th, 2011 by John. Filed under Formatting Tips.


About Formatting Tips.

Many non-fiction books utilize footnotes or endnotes. Footnotes do not work very well with ebooks because footnotes need to be placed at the bottom of a page. It is possible with EPUB to specify content to be shown at the bottom of a page using the following:

<div style="display: oeb-page-foot">...</div>

I highly recommend against using the oeb-page-foot display style. Many reading devices and software do not support this and just ignore this text. My recommendation is to use endnotes.

All endnotes should be collected into a single page located at the end of the ebook. I recommend using either a * or increasing numbers. If you use numbers, the number in the text should correspond to the number displayed in front of the endnote. Use the sup tag and link to the endnote.

<sup id="ra"><a href="c2.xhtml#enda">*</a></sup>

One thing to keep in mind is that not all ebook reading devices and software support a back button. So it’s a good idea to include a return link to take the reader back to the exact place in the text where the endnote is referenced. Make the return link subtle but not something the reader will overlook. I just make the font smaller.

<sub><a class="return" href="c1.xhtml#ra">return</a></sub>

.return {
    font-size: x-small;
}

The best way to illustrate this concept is with an example. Download ft-endnotes.epub. Opening the file with Sigil you will see the example chapters and the external CSS that is referenced by each XHTML file.



* Week in Review

Posted on September 16th, 2011 by John. Filed under calibre, Sigil.


Calibre

This week I focused on PDF output. A bug introduced in 0.8.17 that broke PDF output has now been fixed. I was also able to fix PDF output on OS X. The PDF output engine on OS X now uses OS X’s internal PDF engine instead of Qt’s. Page sizes other than A4 are now possible and the PDFs produced are no longer large image-based monstrosities, meaning text is now selectable and can be copied.

Sigil

I am currently working on Perl-compatible regular expression (PCRE) support. An initial version has been put into git. I have an enhanced version working that allows for case changes in the replacement text. Right now I’m working on caching the results of a search to improve performance.



* Formatting Tips: Sizing elements (Including Text)

Posted on September 13th, 2011 by John. Filed under Formatting Tips.


About Formatting Tips.

When dealing with the EPUB format there are a number of ways to deal with sizes. Font size, indent, margin, and spacing all allow for a variety of units to define their size. Sizes can be defined using any of the following: %, in, cm, mm, em, ex, pt, pc, px.

With all these choices it might be hard to decide which unit type to use. This decision is actually very easy. Always use a relative size type. cm and in, for instance, are fixed sizes; 1 cm is always 1 cm and 1 in is always 1 in. % and em are relative sizes. An em is equivalent to the current font size, so 1 em is equivalent to 12 pt if the font size is 12 pt.

It’s very important to use relative sizes because EPUB reading software and devices allow users to change font sizes. Also, there is significant variation in screen sizes. Using relative sizes means that the spacing of elements (indents for example) keeps the same proportion to the text and screen no matter what the reader has chosen.

I’ll admit using relative sizes is hard. In some of the examples for Formatting Tips I’ve used exact measurements. I apologize for this, but it illustrates how difficult it can be to take into account all of the possible rendering issues that can arise from the diverse reading ecosystem available today.

Be careful when using relative sizes. Each unit is unique. A % used for an indent is a percentage of the width of the containing block, which for body text is roughly the screen width, while a % used for a text size is relative to the text size inherited from the parent element. An em is relative to the current font size. Be sure to check the layout with various text sizes and screen dimensions. You can simulate this (it’s not perfect) by opening the book in something like Sigil or calibre’s ebook-viewer and changing the zoom and window size.
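
To make the arithmetic concrete, here is a small sketch in C++ of how a relative length resolves to an absolute one. The function names are hypothetical and exist only for illustration; a real reading system does this work inside its CSS engine.

```cpp
// Hypothetical helpers, for illustration only; not part of Sigil or any reader.

// An em length resolves against the current font size (both in points here).
double emToPoints(double em, double fontSizePt) {
    return em * fontSizePt;
}

// A percentage resolves against whatever reference the property uses,
// for example the available width for an indent or a font size for text.
double percentOf(double percent, double reference) {
    return reference * percent / 100.0;
}
```

With a 12 pt font, a 1.5 em indent works out to 18 pt. If the reader bumps the font to 16 pt the same indent becomes 24 pt, so it grows along with the text instead of staying fixed.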



* Calibre Week in Review

Posted on September 8th, 2011 by John. Filed under calibre.


This week I finally sat down and spent some time with Markdown input and output. Both saw major changes. Markdown input was bumped to upstream version 2.0. Output was completely rewritten from scratch. Markdown output is now completely custom code (not using a third-party output module like before). I based the new Markdown code on the Textile output classes I helped Perkin to create.

As with all new code and major changes there are probably bugs. I tested Markdown output with a variety of test material and kept working at it until everything converted acceptably. I also used a variety of the Markdown tests provided by John Gruber to ensure my output was correct. When converting the HTML output tests back to Markdown the output is similar enough to the original that I feel it is acceptable.

The last big change I made this week was adding a new OEB transformation to unsmarten punctuation. As the name implies it changes curly quotes, apostrophes and a few other characters to their plain text, straight equivalents. It basically does the opposite of smarten punctuation. I find this especially useful when converting to formatted (Textile or Markdown) plain text files (TXT).
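
As a rough illustration of the idea, the sketch below swaps a handful of curly characters for straight ones. It is not calibre’s code (calibre is written in Python); the function name and the mapping table are assumptions made for this example, and the input is assumed to be UTF-8.

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only: replace a few "smart" punctuation characters
// (given as UTF-8 byte sequences) with plain ASCII equivalents.
// The mapping table is an assumption, not calibre's actual list.
std::string unsmarten(std::string text) {
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"\xE2\x80\x98", "'"},    // U+2018 left single quote
        {"\xE2\x80\x99", "'"},    // U+2019 right single quote / apostrophe
        {"\xE2\x80\x9C", "\""},   // U+201C left double quote
        {"\xE2\x80\x9D", "\""},   // U+201D right double quote
        {"\xE2\x80\x94", "--"},   // U+2014 em dash
        {"\xE2\x80\xA6", "..."},  // U+2026 horizontal ellipsis
    };
    for (const auto &entry : table) {
        std::string::size_type pos = 0;
        // Replace every occurrence of this sequence, left to right.
        while ((pos = text.find(entry.first, pos)) != std::string::npos) {
            text.replace(pos, entry.first.size(), entry.second);
            pos += entry.second.size();  // continue after the replacement
        }
    }
    return text;
}
```

Searching byte-wise is safe here because a complete UTF-8 sequence can only match at a character boundary.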



* Sigil’s Future Direction (Post 0.4.x)

Posted on September 4th, 2011 by John. Filed under articles, Opinion, Sigil.


Introduction

With 0.4 my focus has been on getting the existing features into a stable state. I foresee 0.4 being around for quite some time as development shifts to accommodate new features. I wanted to be sure a relatively bug-free version is available for people to use. If data loss were a constant then there wouldn’t be any point in using Sigil. Now that 0.4 is done it’s time to start working on what’s next.

Just what is next? For the time being I’ve marked a number of issues on the issue tracker as Milestone-0.5. My plan is to have 0.5 just implement the most commonly requested and most interesting features. 0.5 has no vision and is just a stopgap while I familiarize myself with Sigil’s code base. 0.5 is my short-term plan. It’s not grand but it’s functional and sufficient.

Recently I posted the conclusion of my Sigil user study. The findings are that Sigil is most used by, and most useful to, power users and small professional ebook creation houses. Also, the overlap between the two is significant. Thus I want to target these two groups and make Sigil even more useful for them. Keep this in mind, because these two groups are going to shape my views of where I want to take Sigil.

Please realize that not everything I’m going to talk about is set in stone. A lot of it probably will never happen. Also, this is part plans, part what I want to do, and part rant about what Sigil does that I don’t like. This is what my ideal Sigil would look like and it is what I’m going to work toward. However, nothing is set in stone.

Plugins

If you’ve ever used calibre or Firefox you will know that plugins are amazing. They allow for easy and quick changes and additions to be made without having to change the main application. Both calibre and Firefox have large third party plugin communities. I would like to bring this to Sigil and I want a framework where all book manipulation is available over a plugin interface.

My feeling with Sigil is that plugins should make small, self-contained changes, similar to calibre’s heuristic processing. For instance: italicize common cases, shift headings up or down, and normalize CSS. To make plugins really useful I want to have a system where multiple plugins can be chained together and run in sequence. This would be super basic internal scripting functionality.
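
To picture what chaining could look like, here is a minimal sketch in C++. No plugin API exists yet, so everything here (the Plugin type, runChain, and the two toy plugins) is hypothetical and only shows the idea of feeding one plugin’s output into the next.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical plugin shape: take the document text, return the changed text.
// Nothing here is an existing Sigil API.
using Plugin = std::function<std::string(const std::string &)>;

// Run each plugin in order, feeding the output of one into the next.
std::string runChain(const std::vector<Plugin> &chain, std::string text) {
    for (const Plugin &plugin : chain) {
        text = plugin(text);
    }
    return text;
}

// Small helper: replace every occurrence of `from` with `to`.
std::string replaceAll(std::string text, const std::string &from, const std::string &to) {
    if (from.empty()) {
        return text;  // nothing sensible to search for
    }
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();
    }
    return text;
}

// Toy plugin: shift h1 headings down to h2 (naive string matching).
std::string demoteH1(const std::string &xhtml) {
    return replaceAll(replaceAll(xhtml, "<h1>", "<h2>"), "</h1>", "</h2>");
}

// Toy plugin: turn <i> tags into <em> tags (naive string matching).
std::string italicToEm(const std::string &xhtml) {
    return replaceAll(replaceAll(xhtml, "<i>", "<em>"), "</i>", "</em>");
}
```

A chain such as {demoteH1, italicToEm} would then run both changes in one pass over the text. Because each plugin only sees a string and returns a string, the order of the chain is the only coupling between them.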

For plugins themselves I’m undecided about how they should be implemented. I don’t mean API-wise, because that isn’t even a thought at this point. I’m talking about what languages they should be able to be written in. C++ as a shared library will of course be supported because Sigil is written in C++. However, I want Sigil to be able to load plugins written in scripting languages.

My first thought is Python because I’m very familiar with it and love to work with it. I’m also thinking about Lua and QtScript (JavaScript without the DOM). I don’t want to support frameworks for every one of these languages due to the amount of maintenance required, so I want to support only one scripting language. Python is big and slow. Lua is small but doesn’t have the advanced text manipulation libraries Python offers. QtScript is JavaScript, which is an abomination of a language. The added size of Sigil’s install, execution speed, ease of support, contributors’ knowledge of the language, and text manipulation support are all major considerations.

Editor

Currently Sigil does not respect the structure of existing files. When you open an EPUB in Sigil it restructures the file layout. It even goes as far as to rewrite each XHTML file by running it through Tidy. With 0.4.0 cleaning with Tidy can be disabled but pretty printing is still used and alters the XHTML. I absolutely hate this! If I want my XHTML or file structure changed I’ll do it myself.

I want to change Sigil to not be as automatic. Restructuring and cleaning of the XHTML should be moved to plugins and run when the user requests it. This way a user can open Sigil, change the metadata, save, and the only thing that changes is the OPF with the metadata changes. Not every single piece of the EPUB.

I also hate WYSIWYG editing because it inherently must make drastic changes to the underlying code. I don’t think it’s a good idea to remove it though. I would prefer to have the book view default to a preview mode that is read only. There wouldn’t be any changes made to the code by using book view. Read only is the default, but the user should have an edit toggle that sets the book view to edit mode, which will work like it already does. This way a user can make changes that may not be valid or work, check them, and see there is an error (say a missing </a> tag) without losing any work. They can see the issue, fix it, and still be able to use WYSIWYG editing when they want.

Data Store

Right now XML (XHTML included) data is stored as a Xerces DOMDocument. This is then loaded into the book or code view depending on which one is focused. The use of a DOMDocument often leads to data loss. Putting malformed XML into a DOMDocument can have unintended consequences, especially when that is then loaded into a QWebView and read back as a string.

I want to replace the DOMDocument with a plain string as the data store. This will prevent a lot of data loss, especially combined with the book view defaulting to read only. Further, this combined with not making automatic changes to the code will make the well-formed error warning unnecessary.

Not auto-processing with Tidy and not checking for errors automatically will allow Sigil to produce invalid EPUBs. I really don’t care that this can happen. The tools (FlightCrew) will still be there to check that the file conforms to the spec. It’s up to the author to ensure they’re publishing valid EPUBs. An EPUB that is being actively edited doesn’t have to be valid at all times. I’d rather put the onus on the person using Sigil to ensure their EPUB is correct before publishing versus having Sigil force validity at every moment.

Undo

Undo is terrible right now. Some actions can be undone and some cannot. The book view’s undo is completely separate from the code view’s. You can’t undo a replacement made across all HTML files for files that aren’t open in a tab. I want to see a single unified undo that allows for stepping back out of any change.
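
One way to picture a single unified undo is one history shared by every file in the book, so a change can be stepped back out of whether or not its file is open in a tab. The sketch below is hypothetical and is not Sigil code; the Book class and its methods are invented for illustration, and it stores whole-file snapshots for simplicity.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not Sigil code: one undo history shared by every file.
class Book {
public:
    // Change a file's text, remembering the previous text first.
    void setContent(const std::string &file, const std::string &text) {
        history_.push_back({file, files_[file]});
        files_[file] = text;
    }

    // Current text of a file, or an empty string if it has none.
    std::string content(const std::string &file) const {
        auto it = files_.find(file);
        return it == files_.end() ? std::string() : it->second;
    }

    // Step back out of the most recent change, whichever file it touched.
    // Returns false when there is nothing left to undo.
    bool undo() {
        if (history_.empty()) {
            return false;
        }
        files_[history_.back().file] = history_.back().before;
        history_.pop_back();
        return true;
    }

private:
    struct Change {
        std::string file;    // which file was changed
        std::string before;  // its text before the change
    };
    std::map<std::string, std::string> files_;
    std::vector<Change> history_;
};
```

Keeping full snapshots is wasteful for large files; a real implementation would more likely store diffs or group the edits from one multi-file replace into a single undo step.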

Further along this line I would like some graphical display where you can look at the changes that have been made to make it easy to find exactly how far back to undo. Something like Apple’s Time Machine but for the state of the book.

Conclusion

Here is where I want to take Sigil: less hand-holding, fewer automatic changes, and more advanced text manipulation through a plugin interface. The big question is, should I skip putting out a 0.5.0 release with just the Milestone-0.5 marked changes and get started on the above now?
