Author: Mark Szymczyk

An Introduction to EPUB

When I was writing code to add EPUB publishing to AD Book Builder, I found there wasn’t a lot of information online about the EPUB file format. In this article I’m sharing what I learned in the hope that it helps others.

Tools

Two tools helped me learn about the EPUB file format. The first is Sigil, an EPUB book editor. Being able to open EPUB books and see their contents taught me a lot about the format.

The second is the validator tool from IDPF, the group that creates the EPUB specification. Click the Choose File button to upload an EPUB book. Click the Validate button to check whether your book is valid EPUB.

EPUB Overview

An EPUB book is basically a zip archive of a website. Each chapter of your book is a web page. Like a website, an EPUB book can include CSS files to style the book, fonts, image files, audio files, and video files.

The Root of the Archive

The root of an EPUB archive has three items.

  • mimetype
  • META-INF folder
  • OEBPS folder

The mimetype File

The mimetype file identifies the book as being an EPUB book. It is a very short file.

application/epub+zip

The file must contain only this text, so make sure you don’t press the Return key to add a new line at the end.

The mimetype file must be the first item in the EPUB archive. The file must be uncompressed.
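
These two packaging rules are easy to break with a general-purpose zip tool. Here is a minimal sketch, assuming you assemble the archive with the zipfile module from Python's standard library. The build_epub helper and the placeholder container content are hypothetical names used only for illustration.

```python
import io
import zipfile


def build_epub(files):
    """Return the bytes of an EPUB archive.

    files maps archive paths (for example "META-INF/container.xml")
    to their contents as bytes. build_epub is a hypothetical helper.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first, stored without compression,
        # and its text has no trailing newline.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed as usual.
        for path, data in files.items():
            archive.writestr(path, data,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder content only; a real book needs the full set of files.
epub_bytes = build_epub({"META-INF/container.xml": b"<container/>"})
```

Writing the mimetype entry before the loop is what guarantees its position in the archive. The order of the remaining files does not matter.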

META-INF Folder

The META-INF folder must have at least one file in it: the container file. The container file has the filename container.xml. The container file specifies where the book’s content (the OPF file) resides in the book’s EPUB archive. The following code shows a standard container file:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>

If you place the OPF file inside the OEBPS folder and name it content.opf, you should be able to copy and paste the code into your own container file.
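
An e-reader uses the container file in the other direction: it opens container.xml to find out where the OPF file lives. The following sketch shows that lookup, assuming Python's standard xml.etree.ElementTree module. The find_opf_path function is a hypothetical helper, not part of any EPUB library.

```python
import xml.etree.ElementTree as ET

# The namespace declared on the <container> tag.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf"
            media-type="application/oebps-package+xml" />
    </rootfiles>
</container>
"""


def find_opf_path(container_xml):
    """Return the full-path of the first <rootfile> in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.get("full-path")


opf_path = find_opf_path(SAMPLE_CONTAINER)
```

The path is relative to the root of the archive, which is why it includes the OEBPS folder name.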

OEBPS Folder

Most of your book’s content resides in the OEBPS folder. Your book’s chapters reside in the OEBPS folder along with any additional files, such as image, audio, and video files.

In addition to text, image, audio, and video files, the OEBPS folder contains the following items:

  • OPF file
  • NAV file
  • NCX file

OPF File

The OPF file, named content.opf, is an XML file that lists the content in the book. The start of the file specifies the XML version and the package version, which is the EPUB version. The following XML code shows the start of an OPF file:

<?xml version="1.0" encoding="utf-8"?>
<package version="3.0" unique-identifier="pub-identifier" 
    xmlns="http://www.idpf.org/2007/opf">

The version="3.0" part specifies that the book is an EPUB 3 book.

There are three sections you must include in the OPF file.

  • Metadata
  • Manifest
  • Spine

Metadata

The metadata section contains information about the book. The metadata starts with a <metadata> tag.

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

An EPUB 3 book requires the following metadata entries:

  • Title
  • Language
  • Identifier
  • Modified Date

The title entry specifies the title of the book.

<dc:title id="pub-title">Simple Book: A Beginning</dc:title>

The language specifies the language used to write the book. The following code shows an entry for a book written in United States English:

<dc:language>en-US</dc:language>

The identifier is a unique identifier for the book, such as an ISBN. The value of its id attribute must match the unique-identifier attribute in the <package> tag.

<dc:identifier id="pub-identifier">urn:uid:1250064712</dc:identifier>

The modified date specifies the date and time the book was last modified.

<meta property="dcterms:modified">2019-05-22T12:00:00Z</meta>

You must use the format CCYY-MM-DDThh:mm:ssZ for the date and time. As you can see in the code example, you need the letter T between the date and time and the letter Z, which indicates UTC, after the time. The EPUB standard is strict about the modified date: entering only the date is not enough.
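
If you generate the OPF file from code, it is safer to build the timestamp than to type it. Here is a minimal sketch, assuming Python's standard datetime module. The modified_timestamp function is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def modified_timestamp(moment):
    """Format a timezone-aware datetime as CCYY-MM-DDThh:mm:ssZ."""
    # Convert to UTC first so the trailing Z is truthful.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 8:00 in a zone four hours behind UTC is 12:00 UTC.
local_time = datetime(2019, 5, 22, 8, 0, 0,
                      tzinfo=timezone(timedelta(hours=-4)))
stamp = modified_timestamp(local_time)
meta_line = f'<meta property="dcterms:modified">{stamp}</meta>'
```

Pass a datetime that carries a time zone. Python treats a datetime without one as local time when converting, which may not be what you intend.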

Common optional metadata entries include the book’s author, publisher, and copyright. The following code shows an example of a metadata entry for a book’s author:

<dc:creator>Mark Szymczyk</dc:creator>

Add the closing tag to end the metadata section.

</metadata> 

Manifest

The manifest contains a list of the files that make up the book’s content, such as chapters, navigation files, style sheets, and images. The following code contains a short example of a manifest:

<manifest>
    <item id="nav" href="Text/nav.xhtml" media-type="application/xhtml+xml" 
        properties="nav"/>
    <item id="Chapter1" href="Text/Chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="Chapter2" href="Text/Chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>

There are four items in the example: the NAV file, two chapters, and the NCX file. Each manifest item has the following attributes:

  • id, which identifies the manifest item.
  • href, which specifies where the item resides in the OEBPS folder.
  • media-type, which specifies the type of file. Text files usually have the type "application/xhtml+xml".

The NAV file has an additional attribute, properties="nav", that identifies it as the file e-readers use for the book’s table of contents.

A real book is going to have a much longer manifest. There will be a manifest entry for each chapter in the book as well as an entry for each image used in the book.
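
Because a long manifest is tedious to write by hand, you may want to generate the entries. The following sketch shows one way to do it in Python. The manifest_item function and the MEDIA_TYPES table are hypothetical names, and the table covers only a few common file types.

```python
import posixpath
from xml.sax.saxutils import quoteattr

# A few of the media types EPUB uses; extend the table as needed.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".ncx": "application/x-dtbncx+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}


def manifest_item(item_id, href, properties=None):
    """Return one <item> tag for the manifest."""
    # Pick the media type from the file extension.
    extension = posixpath.splitext(href)[1].lower()
    attributes = [
        ("id", item_id),
        ("href", href),
        ("media-type", MEDIA_TYPES[extension]),
    ]
    if properties:
        attributes.append(("properties", properties))
    # quoteattr adds the quotes and escapes special characters.
    rendered = " ".join(f"{name}={quoteattr(value)}"
                        for name, value in attributes)
    return f"<item {rendered}/>"


chapter_entry = manifest_item("Chapter1", "Text/Chapter1.xhtml")
nav_entry = manifest_item("nav", "Text/nav.xhtml", properties="nav")
```

An unknown file extension raises a KeyError, which is deliberate in this sketch: a missing media type is better caught early than discovered in the validator.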

Spine

The spine contains a list of all the files in the book in linear reading order.

<spine toc="ncx">
    <itemref idref="Chapter1"/>
    <itemref idref="Chapter2"/>
</spine>

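There is an <itemref> tag for each file in the spine. The idref attribute of each tag must match the id of an item in the manifest. The toc attribute of the <spine> tag does the same for the NCX file.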
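
A spine entry that points at an id missing from the manifest is an easy mistake to make. The following sketch checks for it, assuming Python's standard xml.etree.ElementTree module. The unresolved_spine_refs function is a hypothetical helper, and the sample OPF is deliberately incomplete so the check has something to find.

```python
import xml.etree.ElementTree as ET

# The namespace declared on the <package> tag.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Trimmed sample: no metadata, and Chapter2 is missing from the manifest.
SAMPLE_OPF = b"""<?xml version="1.0" encoding="utf-8"?>
<package version="3.0" unique-identifier="pub-identifier"
    xmlns="http://www.idpf.org/2007/opf">
    <manifest>
        <item id="Chapter1" href="Text/Chapter1.xhtml"
            media-type="application/xhtml+xml"/>
    </manifest>
    <spine>
        <itemref idref="Chapter1"/>
        <itemref idref="Chapter2"/>
    </spine>
</package>
"""


def unresolved_spine_refs(opf_xml):
    """Return the spine idrefs that have no matching manifest item."""
    root = ET.fromstring(opf_xml)
    manifest_ids = {item.get("id")
                    for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    return [ref.get("idref")
            for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
            if ref.get("idref") not in manifest_ids]


problems = unresolved_spine_refs(SAMPLE_OPF)
```

An empty list means every spine entry resolves to a manifest item.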

Navigation

Navigation is an important feature in an ebook. People want to jump to specific chapters and sections in a book. As a reader, it would be annoying to have to navigate page by page.

EPUB has two files for book navigation in e-readers: the NAV file and the NCX file.

NAV File

In EPUB 3 you use the NAV file, named nav.xhtml, to declare the book’s table of contents. The start of the file contains boilerplate code identifying the file as an XHTML document.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" 
    xml:lang="en" lang="en">

The header follows. It usually contains the title of the book.

<head>
<title>Simple Book: A Beginning</title>
</head>

The table of contents is an ordered HTML list inside a <nav> tag. Each list item is an HTML link whose destination is the path to the item, relative to the NAV file.

<body>
<nav epub:type="toc" id="toc">
<h1>Table of Contents</h1>
<ol>
    <li><a href="../Text/Chapter1.xhtml">Chapter 1</a></li>
    <li><a href="../Text/Chapter2.xhtml">Chapter 2</a></li>
</ol>

</nav>
</body>
</html>
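
The list inside the <nav> tag is repetitive enough to generate. Here is a minimal sketch in Python that builds the ordered list from a list of chapters. The nav_list function is a hypothetical helper, and the second chapter title is made up to show why escaping matters.

```python
from xml.sax.saxutils import escape, quoteattr


def nav_list(chapters):
    """Return the <ol> markup for (href, title) pairs in reading order."""
    lines = ["<ol>"]
    for href, title in chapters:
        # Escape the title so characters such as & stay valid XHTML.
        lines.append(
            f"    <li><a href={quoteattr(href)}>{escape(title)}</a></li>")
    lines.append("</ol>")
    return "\n".join(lines)


toc = nav_list([
    ("../Text/Chapter1.xhtml", "Chapter 1"),
    ("../Text/Chapter2.xhtml", "Questions & Answers"),
])
```

The hrefs are passed in as written, so they need to be relative to the location of the NAV file, as in the example above.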

NCX File

The NCX file, named toc.ncx, also contains the book’s table of contents. The NCX file provides compatibility with older EPUB versions.

The start of the file contains boilerplate code identifying the file as an NCX document.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
    "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx version="2005-1" xmlns="http://www.daisy.org/z3986/2005/ncx/">

The header follows.

<head>
    <meta content="urn:uid:1250064712" name="dtb:uid"/>
    <meta content="0" name="dtb:depth"/>
    <meta content="0" name="dtb:totalPageCount"/>
    <meta content="0" name="dtb:maxPageNumber"/>
</head>

The first meta entry is the identifier. The identifier must match the identifier you gave the book in the metadata in the OPF file. The second meta entry lets you specify how many levels and sub-levels appear in the table of contents menu. You shouldn’t have to change the last two meta entries.

The title of the book follows the header.

<docTitle>
    <text>Simple Book: A Beginning</text>
</docTitle>

The table of contents appears as a navigation map. Each item in the navigation map is a navigation point. The navigation point contains an ID and its order in the book. Each navigation point includes a navigation label and the location of the item in the book.

<navMap>
    <navPoint id="nav_1" playOrder="1">
        <navLabel>
            <text>Chapter 1</text>
        </navLabel>
        <content src="Text/Chapter1.xhtml"/>
    </navPoint>

    <navPoint id="nav_2" playOrder="2">
        <navLabel>
            <text>Chapter 2</text>
        </navLabel>
        <content src="Text/Chapter2.xhtml"/>
    </navPoint>
</navMap>
</ncx>
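
The playOrder values have to count up in reading order, which is simple to get right when the navigation map is generated. The following sketch builds the map with Python's standard xml.etree.ElementTree module. The build_nav_map function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_nav_map(chapters):
    """Return a <navMap> element for (src, title) pairs in reading order."""
    nav_map = ET.Element("navMap")
    # enumerate supplies the 1-based order used for id and playOrder.
    for order, (src, title) in enumerate(chapters, start=1):
        point = ET.SubElement(
            nav_map, "navPoint",
            {"id": f"nav_{order}", "playOrder": str(order)})
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title
        ET.SubElement(point, "content", {"src": src})
    return nav_map


nav_map = build_nav_map([
    ("Text/Chapter1.xhtml", "Chapter 1"),
    ("Text/Chapter2.xhtml", "Chapter 2"),
])
# The markup carries no namespace of its own; it picks up the NCX
# namespace when placed inside the <ncx> tag shown earlier.
markup = ET.tostring(nav_map, encoding="unicode")
```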

Text Folder

Most EPUB books place their chapters inside a Text folder inside the OEBPS folder. It’s not mandatory to have a Text folder, but having your chapters in a separate folder keeps your EPUB archive organized.

Additional Folders

In addition to a Text folder, having the following additional folders can help you keep track of your book’s files:

  • A Styles folder for CSS files to style your book
  • An Images folder for your book’s images
  • A Fonts folder for fonts you embed in your book
  • An Audio folder for audio files
  • A Video folder for video files

A Sample Chapter

The last thing your EPUB needs is chapters. Chapters are XHTML files. You should have one XHTML file for each chapter in the book. The following markup shows the shell of an XHTML file for a chapter:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" 
    xml:lang="en">
<head>
    <title>Chapter Title</title>    
</head>

<body>

</body>
</html>

The contents of the chapter go between the <body> and </body> tags.
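
Because chapters are XHTML rather than plain HTML, every tag has to be closed properly. The following sketch catches that kind of mistake before you package the book, assuming Python's standard xml.etree.ElementTree module. The is_well_formed function is a hypothetical helper, and it checks only that the file parses as XML, not that it is valid EPUB.

```python
import xml.etree.ElementTree as ET

GOOD_CHAPTER = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
    <title>Chapter Title</title>
</head>
<body>
    <p>First paragraph.</p>
</body>
</html>
"""

# The same chapter with the closing </p> tag removed.
BAD_CHAPTER = GOOD_CHAPTER.replace(b"</p>", b"")


def is_well_formed(xhtml):
    """Return True if the chapter parses as XML."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError:
        return False
    return True


good_result = is_well_formed(GOOD_CHAPTER)
bad_result = is_well_formed(BAD_CHAPTER)
```

A chapter that passes this check can still fail the validator for other reasons, so treat it as a quick first test.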

Additional Reading

Elizabeth Castro’s book, EPUB Straight to the Point, has a chapter on the EPUB file format that I found helpful.

Liza Daly wrote two articles on IBM’s developer site on EPUB that may help you learn about the EPUB file format.

If you prefer video, Apple has two WWDC videos on EPUB.

AD Book Builder 0.2

I released version 0.2 of AD Book Builder. You can download it from the AD Book Builder page on this site.

Version 0.2 provides EPUB support. You can publish EPUB books with simple formatting needs, such as novels.

One user interface change in version 0.2 is the addition of a Book menu to the menu bar. The Book menu has two items: a Publish item to publish a book, and an Edit Title/Author item. The Edit Title/Author menu item opens a sheet to enter your book’s title, author, and unique identifier. If you have an ISBN number, use that as the book’s unique identifier.

AM Pages for iOS

AM Pages is now available at the App Store. I think the app works best on an iPad with a hardware keyboard.

On iPad, AM Pages works best in landscape orientation because both the list of pages and the contents of the selected page fit side by side. On iPhone, AM Pages works best in portrait orientation. Landscape orientation on iPhones doesn’t let you see enough text due to the onscreen keyboard.