EPUB: Not just for eBooks?
As more publishers recognize the need to distribute their titles on eBook platforms, there has been a rush to convert titles into the standard EPUB format. eBook readers offer a unique opportunity, particularly for small and medium-sized publishers: they open up new audiences, and a title converted once can be sold many times without the cost of print and distribution.
The result has been widespread conversion of titles into EPUB format throughout publishing organizations worldwide, often facilitated by offshore conversion vendors. As the most widely accepted open standard for eBooks, EPUB is now supported on more than 40 eBook readers, including the Apple iPad, Sony Reader and Barnes & Noble Nook. The Amazon Kindle uses its own format, but EPUB is a common source for producing Kindle editions.
Once this conversion process is complete, publishers are often left with yet another binary conversion format to manage and store within their archives. On the surface, EPUB seems to be an inflexible format that is difficult to modify, a delivery format at the end of a publishing workflow. However, this is far from accurate.
Publishing Workflow Without XML?
Interestingly enough, most publishers have chosen not to incur the costs of converting to XML as an intermediary or storage format for their titles. These publishers have made a strategic decision to instead convert their titles directly to EPUB from other electronic formats, such as InDesign, PDF or Microsoft Word, and, for older titles, through a complex scanning process involving the physical title itself.
Over the past decade, XML has become widely accepted in the publishing community as the best means for content interchange and storage. Though there remains disagreement regarding specific XML Schema choices and implementations among publishers, it is generally acknowledged that a content-centric workflow with XML as the storage medium within the repository provides the most agile approach to facilitate discovery, transformation and delivery of content.
That being the case, conversion to EPUB without XML as an intermediary would seem to be a short-sighted decision for publishers. Not so fast!
Is XML Really the Solution?
Publishing organizations have spent considerable capital over the past decade to convert to XML in order to enjoy the benefits of an electronic representation of their content that minimizes the complexity of conversion in interchange.
XML itself, however, is a generic term that covers any number of disparate DTDs and W3C XML Schemas.
One could argue then that this investment has been unsuccessful, as many publishers continue to struggle to impose constraints upon their own content. These publishers often develop custom DTDs and W3C XML Schemas to represent it, and those custom schemas limit interchange because external consumers have neither the software nor the transformation rules to interpret them.
Forward-thinking XML proponents have encouraged adoption of standards, such as DITA, DocBook and NLM, each imposing more generic constraints. Unfortunately, these schemas are often too generic to accurately represent content. Additionally, because no single standard has emerged throughout the publishing industry, publishers have spent considerable resources on conversion only to find that there are no consumers able to natively process these standards.
Perhaps most importantly, editorial teams have had to undergo training to understand this XML and learn to create and edit it. Over the years, XML editorial software has emerged and evolved to help publishers modify this XML and impose constraints upon editing. This represents a further investment in training and software licenses.
Finally, while touted as a content storage format to facilitate interchange, XML itself is simply a container format. Most often, consumers cannot read and process it without a transformation step. Web sites, eBook readers, mobile devices, and other delivery targets cannot natively render arbitrary XML; it has to be transformed as part of the delivery process.
Given that context, is XML still a wise investment? Yes, it is, but perhaps there is already a standard that can be utilized to meet the stated goals of the publishing community.
A Peek Inside EPUB
EPUB itself is a compressed zip file, composed of XHTML, CSS, images and a small number of ancillary metadata files. Its content is, therefore, simply markup derived from XML and HTML. Most eBook readers are in fact analogous to web browsers, capable of reading this markup and rendering it within the reader.
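To make the point concrete, here is a minimal Python sketch that opens an EPUB as an ordinary zip archive and counts the kinds of files inside it. The function name summarize_epub is hypothetical, chosen for illustration, and the sketch assumes only that the file is a valid zip archive.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(source):
    """Count the file extensions found inside an EPUB archive.

    `source` may be a file path or a binary file-like object.
    Entries without an extension (such as `mimetype`) are counted
    under the key "(none)".
    """
    with zipfile.ZipFile(source) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)
```

Calling summarize_epub("title.epub") on a typical title (the file name is a placeholder) would be expected to show mostly .xhtml or .html, .css and image extensions, alongside a few metadata files.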
Publishers who have converted their content into EPUB format have in fact already converted it to XML.
XHTML is an operational standard, a hybrid approach between utilizing a readable format that can be modified by editors and an interpreted markup language that can be rendered by eBook readers, web browsers and mobile devices.
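Because XHTML is well-formed XML, a generic XML parser can read it with no EPUB-specific tooling. The sketch below uses Python's standard library to pull the headings out of a chapter. The sample markup and the name extract_headings are made up for illustration. Real chapters that use named HTML entities such as &nbsp; may need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so tag names are qualified with it.
XHTML = "{http://www.w3.org/1999/xhtml}"
HEADING_TAGS = {XHTML + name for name in ("h1", "h2", "h3")}


def extract_headings(xhtml_text):
    """Return (level, text) pairs for h1-h3 headings in document order."""
    root = ET.fromstring(xhtml_text)
    headings = []
    for element in root.iter():
        if element.tag in HEADING_TAGS:
            level = int(element.tag[-1])
            # itertext() also collects text inside inline children like <em>.
            text = "".join(element.itertext()).strip()
            headings.append((level, text))
    return headings


# Illustrative sample, not taken from any real title.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body>
    <h1>Chapter 1</h1>
    <p>Opening paragraph.</p>
    <h2>A <em>first</em> section</h2>
  </body>
</html>"""

print(extract_headings(SAMPLE))
```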
EPUB: More than an eBook Format
Publishers who have converted their content to EPUB format will find that they can now enjoy the benefits of XML. With the proper tools, EPUB content can be loaded into a native XML repository such that content can be mined and delivered. Within a native XML database, such as MarkLogic or eXist, or a native XML CMS such as RSuite CMS, it can be managed down to the chapter and section level, then compiled into on-the-fly derivative products.
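As a rough sketch of the first step in such a pipeline, the Python below reads an EPUB's package document and returns its content documents in reading order, which is the list a loader would walk to ingest a title chapter by chapter. It does not use any API from MarkLogic, eXist or RSuite. The function name spine_documents is hypothetical, and the code assumes a conventional EPUB layout with unencoded href values.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(source):
    """Return archive paths of the content documents in reading order.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        # container.xml points at the package (OPF) document.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(archive.read(opf_path))

    # The manifest maps item ids to hrefs, relative to the OPF's folder.
    base = posixpath.dirname(opf_path)
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    # The spine lists item ids in reading order.
    return [
        posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
```

Each returned path can then be read from the archive and stored as its own XML document, so chapters from different titles can be selected and recombined into a derivative product.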
Why not use EPUB as your standard?



