Category: Self-publishing

An Introduction to EPUB

When I was writing code to add EPUB publishing to AD Book Builder, I found there wasn’t a lot of information online about the EPUB file format. In this article I’m sharing what I learned in the hope that it helps others.

Tools

Two tools helped me learn about the EPUB file format. The first is Sigil, an EPUB book editor. Being able to open EPUB books and see their contents taught me a lot about how the format works.

The second is the validator tool from IDPF, the group that created the EPUB specification. Click the Choose File button to upload an EPUB book, then click the Validate button to check whether the book is valid EPUB.

EPUB Overview

An EPUB book is basically a zip archive of a website. Each chapter of your book is a web page. Like a website, an EPUB book can include CSS files to style the book, fonts, image files, audio files, and video files.

The Root of the Archive

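The root of an EPUB archive usually has three items. The mimetype file and the META-INF folder are required. OEBPS is the conventional name for the content folder.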

  • mimetype
  • META-INF folder
  • OEBPS folder

The mimetype File

The mimetype file identifies the book as being an EPUB book. It is a very short file.

application/epub+zip

Make sure you don’t press the Return key to create a new line.

The mimetype file must be the first item in the EPUB archive. The file must be uncompressed.
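Here is a minimal sketch of those two rules in Python, using the standard zipfile module. The function name write_epub and the placeholder file contents are my own assumptions for illustration, and the result is not a complete, valid book.

```python
import io
import zipfile


def write_epub(target, files):
    """Write an EPUB archive to target (a file path or a file-like object).

    files maps archive paths, such as "OEBPS/content.opf", to bytes.
    """
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry is written first and stored without compression.
        # The text has no trailing line break.
        archive.writestr("mimetype", b"application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # The remaining files can be compressed.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Usage: build a tiny archive in memory with placeholder contents.
buffer = io.BytesIO()
write_epub(buffer, {
    "META-INF/container.xml": b"<container/>",
    "OEBPS/content.opf": b"<package/>",
})
```

Writing the mimetype entry before looping over the other files is what keeps it first in the archive.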

META-INF Folder

The META-INF folder must have at least one file in it: the container file. The container file has the filename container.xml. The container file specifies where the book’s content (the OPF file) resides in the book’s EPUB archive. The following code shows a standard container file:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>

If you place the OPF file inside the OEBPS folder, you should be able to copy and paste the code into your own container file.
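To see how a program might follow the container file to the OPF file, here is a short Python sketch that uses the standard library's ElementTree parser. The function name find_opf_path is hypothetical, and the sketch assumes the container has a single rootfile entry.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>"""


def find_opf_path(container_xml):
    """Return the full-path attribute of the first rootfile entry."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")


opf_path = find_opf_path(CONTAINER_XML)  # "OEBPS/content.opf"
```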

OEBPS Folder

Most of your book’s content resides in the OEBPS folder: the chapters along with any additional files, such as image, audio, and video files.

In addition to text, image, audio, and video files, the OEBPS folder contains the following items:

  • OPF file
  • NAV file
  • NCX file

OPF File

The OPF file, named content.opf, is an XML file that lists the content in the book. The start of the file specifies the XML version and the package version, which is the EPUB version. The following XML code shows the start of an OPF file:

<?xml version="1.0" encoding="utf-8"?>
<package version="3.0" unique-identifier="pub-identifier" 
    xmlns="http://www.idpf.org/2007/opf">

The version="3.0" part specifies that the book is an EPUB 3 book.

There are three sections you must include in the OPF file.

  • Metadata
  • Manifest
  • Spine

Metadata

The metadata section contains information about the book. The metadata starts with a <metadata> tag.

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

An EPUB 3 book requires the following metadata entries:

  • Title
  • Language
  • Identifier
  • Modified Date

The title entry specifies the title of the book.

<dc:title id="pub-title">Simple Book: A Beginning</dc:title>

The language specifies the language used to write the book. The following code shows an entry for a book written in United States English:

<dc:language>en-US</dc:language>

The identifier is a unique identifier for the book, such as an ISBN.

<dc:identifier id="pub-identifier">urn:uid:1250064712</dc:identifier>

The modified date specifies the date and time the book was last modified.

<meta property="dcterms:modified">2019-05-22T12:00:00Z</meta>

You must use the format CCYY-MM-DDThh:mm:ssZ for the date and time. As the code example shows, you need the letter T between the date and time and the letter Z after the time. The Z marks the time as UTC. The EPUB standard is very picky about the modified date. You can’t enter just the date. You have to include both the date and time in the right format.
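If you generate the OPF file from code, the modified date is easy to get wrong by hand. The following Python sketch produces the format described above. The function name modified_timestamp is my own, and the sketch assumes you pass a timezone-aware datetime (Python treats a naive datetime as local time when converting).

```python
from datetime import datetime, timezone


def modified_timestamp(moment):
    """Format a datetime as CCYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = modified_timestamp(datetime(2019, 5, 22, 12, 0, 0, tzinfo=timezone.utc))
entry = f'<meta property="dcterms:modified">{stamp}</meta>'
```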

Common optional metadata entries include the book’s author, publisher, and copyright. The following code shows an example of a metadata entry for a book’s author:

<dc:creator>Mark Szymczyk</dc:creator>

Add the closing tag to end the metadata section.

</metadata> 

Manifest

The manifest lists every file that makes up the book’s content: chapters, images, style sheets, fonts, and navigation files. The following code contains a short example of a manifest:

<manifest>
    <item id="nav" href="Text/nav.xhtml" media-type="application/xhtml+xml" 
        properties="nav"/>
    <item id="Chapter1" href="Text/Chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="Chapter2" href="Text/Chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>

There are four items in the example: the NAV file, two chapters, and the NCX file. Each manifest item has the following attributes:

  • id, which identifies the manifest item.
  • href, which specifies where the item resides in the OEBPS folder, relative to the OPF file.
  • media-type, which specifies the type of file. Text files usually have the type "application/xhtml+xml".

The NAV file has an additional attribute, properties="nav", that identifies it as the file used to navigate the book as a table of contents.

A real book is going to have a much longer manifest. There will be a manifest entry for each chapter in the book as well as an entry for each image used in the book.
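Because a long manifest is tedious to type, you may want to generate the entries. Here is a rough Python sketch. The manifest_item function, the MEDIA_TYPES table, and the choice to use the file name as the id are all assumptions for illustration. Real ids must be unique and valid XML names, so file names that start with a digit or repeat across folders would need extra handling.

```python
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

# A small, incomplete table of media types keyed by file extension.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".ncx": "application/x-dtbncx+xml",
}


def manifest_item(href):
    """Build an <item> entry for a path relative to the OPF file."""
    path = PurePosixPath(href)
    media_type = MEDIA_TYPES[path.suffix.lower()]
    return (f"<item id={quoteattr(path.stem)} href={quoteattr(href)} "
            f"media-type={quoteattr(media_type)}/>")


items = [manifest_item(href) for href in
         ["Text/Chapter1.xhtml", "Images/cover.png", "Styles/book.css"]]
```

The Images/cover.png and Styles/book.css paths are made-up examples of the extra entries a real book would need.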

Spine

The spine lists the book’s content files in linear reading order.

<spine toc="ncx">
    <itemref idref="Chapter1"/>
    <itemref idref="Chapter2"/>
</spine>

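There is an <itemref> tag for each file in the spine. Each idref value must match the id of an item in the manifest. The toc="ncx" attribute refers to the manifest id of the NCX file.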
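A common mistake is a spine entry that doesn’t match any manifest item. The following Python sketch checks for that. The function name missing_spine_items is hypothetical, and the sample OPF leaves out the metadata section to keep the example short.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

SAMPLE_OPF = b"""<?xml version="1.0" encoding="utf-8"?>
<package version="3.0" unique-identifier="pub-identifier"
    xmlns="http://www.idpf.org/2007/opf">
  <manifest>
    <item id="Chapter1" href="Text/Chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="Chapter2" href="Text/Chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="Chapter1"/>
    <itemref idref="Chapter2"/>
  </spine>
</package>"""


def missing_spine_items(opf_xml):
    """Return the spine idref values that have no matching manifest id."""
    root = ET.fromstring(opf_xml)
    manifest_ids = {item.get("id")
                    for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    return [ref.get("idref")
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
            if ref.get("idref") not in manifest_ids]


problems = missing_spine_items(SAMPLE_OPF)  # an empty list for this sample
```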

Navigation

Navigation is an important feature in an ebook. People want to jump to specific chapters and sections in a book. As a reader it would be annoying to have to navigate page by page.

EPUB has two files for book navigation in e-readers: the NAV file and the NCX file.

NAV File

In EPUB 3 you use the NAV file, named nav.xhtml, to declare the book’s table of contents. The start of the file contains boilerplate code identifying the file as an XHTML document.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" 
    xml:lang="en" lang="en">

The header follows. It usually contains the title of the book.

<head>
<title>Simple Book: A Beginning</title>
</head>

The table of contents is an ordered HTML list. Each list item is an HTML link whose destination is the location of the item inside the EPUB archive.

<body>
<nav epub:type="toc" id="toc">
<h1>Table of Contents</h1>
<ol>
    <li><a href="../Text/Chapter1.xhtml">Chapter 1</a></li>
    <li><a href="../Text/Chapter2.xhtml">Chapter 2</a></li>
</ol>

</nav>
</body>
</html>

NCX File

The NCX file, named toc.ncx, also contains the book’s table of contents. The NCX file provides compatibility with older EPUB versions.

The start of the file contains boilerplate code identifying the file as an NCX document.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
    "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx version="2005-1" xmlns="http://www.daisy.org/z3986/2005/ncx/">

The header follows.

<head>
    <meta content="urn:uid:1250064712" name="dtb:uid"/>
    <meta content="0" name="dtb:depth"/>
    <meta content="0" name="dtb:totalPageCount"/>
    <meta content="0" name="dtb:maxPageNumber"/>
</head>

The first meta entry is the identifier. The identifier must match the identifier you gave the book in the metadata in the OPF file. The second meta entry lets you specify how many levels and sub-levels appear in the table of contents menu. You shouldn’t have to change the last two meta entries.

The title of the book follows the header.

<docTitle>
    <text>Simple Book: A Beginning</text>
</docTitle>

The table of contents appears as a navigation map. Each item in the navigation map has a navigation point. The navigation point contains an ID and its order in the book. Each navigation point includes a navigation label and the location of the item in the book.

<navMap>
    <navPoint id="nav_1" playOrder="1">
        <navLabel>
            <text>Chapter 1</text>
        </navLabel>
        <content src="Text/Chapter1.xhtml"/>
    </navPoint>

    <navPoint id="nav_2" playOrder="2">
        <navLabel>
            <text>Chapter 2</text>
        </navLabel>
        <content src="Text/Chapter2.xhtml"/>
    </navPoint>
</navMap>
</ncx>
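Numbering the navigation points by hand gets error-prone as chapters are added or reordered. Here is a Python sketch that generates the navigation map from a list of chapters. The function name build_nav_map and the nav_ prefix for IDs follow the example above but are otherwise my own choices, and the sketch handles only a flat table of contents with no sub-levels.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav_map(chapters):
    """Return NCX <navMap> markup for a list of (label, src) pairs."""
    lines = ["<navMap>"]
    # playOrder starts at 1 and increases in reading order.
    for order, (label, src) in enumerate(chapters, start=1):
        lines.append(f'    <navPoint id="nav_{order}" playOrder="{order}">')
        lines.append("        <navLabel>")
        lines.append(f"            <text>{escape(label)}</text>")
        lines.append("        </navLabel>")
        lines.append(f"        <content src={quoteattr(src)}/>")
        lines.append("    </navPoint>")
    lines.append("</navMap>")
    return "\n".join(lines)


nav_map = build_nav_map([
    ("Chapter 1", "Text/Chapter1.xhtml"),
    ("Chapter 2", "Text/Chapter2.xhtml"),
])
```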

Text Folder

Most EPUB books place their chapters inside a Text folder inside the OEBPS folder. It’s not mandatory to have a Text folder, but having your chapters in a separate folder keeps your EPUB archive organized.

Additional Folders

In addition to a Text folder, having the following additional folders can help you keep track of your book’s files:

  • A Styles folder for CSS files to style your book
  • An Images folder for your book’s images
  • A Fonts folder for fonts you embed in your book
  • An Audio folder for audio files
  • A Video folder for video files

A Sample Chapter

The last thing your EPUB needs is chapters. Chapters are XHTML files. You should have one XHTML file for each chapter in the book. The following markup shows the shell of an XHTML file for a chapter:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" 
    xml:lang="en">
<head>
    <title>Chapter Title</title>    
</head>

<body>

</body>
</html>

The contents of the chapter go between the <body> and </body> tags.

Additional Reading

Elizabeth Castro’s book, EPUB Straight to the Point, has a chapter on the EPUB file format that I found helpful.

Liza Daly wrote two articles on IBM’s developer site on EPUB that may help you learn about the EPUB file format.

If you prefer video, Apple has two WWDC videos on EPUB.

Tools for Indie Authors: Scrivener

If you have a Mac and want to write and publish a book, give Scrivener a try. Scrivener is a writing app that allows you to outline, research, write, and publish your book in a single app.

Scrivener 3 was recently released for Mac, and it allows you to publish PDF, EPUB, and MOBI (Kindle) books by choosing File > Compile in Scrivener. I tried compiling an EPUB book in Scrivener 3, and it looked good without me having to do anything. After using Scrivener 3 I lost enthusiasm for working on Tome Builder because Scrivener 3 does everything I wanted Tome Builder to do.

Scrivener also has iOS and Windows versions, but I haven’t used them. The Mac and Windows versions have 30-day trials, which you can download from the Scrivener site.

Places to Publish: Smashwords

Smashwords is a place to publish, sell, and distribute ebooks. Smashwords provides ebook distribution to many retailers, including Apple’s iBooks and Barnes and Noble.

Pricing

Smashwords charges no upfront fees. They take a percentage of each ebook you sell. For books you sell through the Smashwords bookstore, you keep 80% of the book’s list price. For books sold outside the Smashwords store, you keep 60% of the book’s list price.
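To make the percentages concrete, here is a small Python sketch that works out what you keep per sale. It uses only the 80% and 60% figures quoted above. The function name smashwords_earnings is hypothetical.

```python
def smashwords_earnings(list_price, sold_in_smashwords_store):
    """Return the author's share of one sale, using the rates quoted above."""
    rate = 0.80 if sold_in_smashwords_store else 0.60
    return round(list_price * rate, 2)


in_store = smashwords_earnings(10.00, True)       # 8.0
at_retailers = smashwords_earnings(10.00, False)  # 6.0
```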

Strengths

Smashwords provides an easy way to distribute your ebook to many stores.

Weaknesses

You can make more money per sale by using Paddle or Gumroad. A $10 ebook would net you at least $1 more using Paddle or Gumroad instead of Smashwords.

You can’t sell books on Amazon using Smashwords.

Smashwords does not print books.

Summary

If you want to sell ebooks at many retailers, give Smashwords a try.

Tools for Indie Authors: Canva

Canva is an online graphic design app. Canva won’t help you write your book, but it can help you market and promote your book. Indie authors can use Canva to create the following:

  • Book covers
  • Logos
  • Flyers and posters for book signings
  • Twitter header images
  • Images for blog posts and tweets
  • Business cards

Using Canva

When using Canva you start by choosing what you want to design: book cover, logo, poster, etc. After making your decision, start with either a template (Canva calls them layouts) or a blank canvas.

If you start with a template, the next step is to modify it to suit your needs. Some of the ways you can modify a template include the following:

  • Replace the placeholder text with your text
  • Change the color of text and other elements
  • Change the font and font size of the text
  • Change the text alignment

If you start with a blank canvas, you’ll have to add a background, text, and other elements. You can either go with a solid background color or choose from dozens of background patterns. For text, you can add headings and body text or choose from dozens of templates that let you have text inside shapes. Canva has the following elements to add to a design:

  • Photos
  • Grids
  • Frames
  • Shapes
  • Lines
  • Illustrations
  • Icons
  • Charts

Text and elements can also be added to designs that start from a template. You can also upload your own photos and images to use in your designs.

What’s nice about Canva is that the templates provide a good starting point for designing something that looks professional, and there are enough ways to customize your designs to make them stand out.

Pricing

Canva provides a free version for teams of up to 10 people that includes 1 GB of storage for your designs. The free version should be sufficient for indie authors. There’s also a paid version for $13 a month that provides the following benefits:

  • Unlimited storage
  • More templates, photos, and illustrations to choose from
  • The ability to upload your own fonts
  • The ability to quickly resize your designs
  • Support for teams of up to 30 people

Choosing the Line Spacing for Your Book

Line spacing is the amount of space between lines of text in a paragraph. If there’s not enough space between lines, text becomes difficult to read. Place too much space between lines, and the lines in the paragraph don’t look like they belong together.

The line spacing for paragraphs should be 120-145% of the font size. If you have a 10 point font for your book’s body text, the line spacing should be 12-14.5 points. I recommend starting with the lower end of the scale, 120% or 125% of the font size, and seeing how that looks. As a point of reference, the line spacing for paragraphs on this blog is 130% of the font size.
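The arithmetic is simple enough to do by hand, but here is a short Python sketch of it for other font sizes. The function name line_spacing_range is my own, and the default percentages are the 120-145% range given above.

```python
def line_spacing_range(font_size_pt, low_percent=120, high_percent=145):
    """Return the (minimum, maximum) line spacing in points for a font size."""
    return (font_size_pt * low_percent / 100,
            font_size_pt * high_percent / 100)


ten_point = line_spacing_range(10)     # (12.0, 14.5)
twelve_point = line_spacing_range(12)  # (14.4, 17.4)
```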