Metadata is information about a book. For ebooks, metadata must be inside the ebook, stored in a standardised way, so that software systems (like your personal library of ebooks or an aggregator’s large-scale asset management system) can organise the books in a sensible way. The more metadata you include in an ebook, the better.
An epub must include at least these items of metadata to be valid:
- the title
- the creator (usually the author)
- a unique identifier of some sort (such as an ISBN or UUID; InDesign generates a UUID for you when you export to epub)
- a date of publication, at least the year.
The metadata for an epub is stored in the content.opf file. To edit it, open content.opf (in a text/code editor like Textpad) and find the <metadata> tag. You’ll see a list of metadata that looks something like this:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="bookid">urn:uuid:1ea984da-00f4-4946-7494-579321dbfa93</dc:identifier>
  <dc:identifier id="ISBN">9780000000000</dc:identifier>
  <dc:title>Motherhood and Me</dc:title>
  <dc:creator>Lindy Bruce</dc:creator>
  <dc:publisher>Oshun Books</dc:publisher>
  <dc:language>en-GB</dc:language>
  <dc:date>2009-06-01</dc:date>
</metadata>
These are our minimum recommended metadata tags. They show:
- this ebook has two identifiers, an ISBN and a UUID
- the book's title is "Motherhood and Me"
- its author is Lindy Bruce
- its publisher is Oshun Books
- its language is British English
- the epub was published on 1 June 2009.
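If you would rather check these values with a script than by eye, the block below is a minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module. The names read_metadata and missing_items are hypothetical helpers invented for this illustration, not part of any epub tool, and the sample is embedded only so the sketch runs on its own. In practice you would pass in the text of your own content.opf.

```python
import xml.etree.ElementTree as ET

# Namespace URI for Dublin Core, as declared on the <metadata> tag.
DC = "http://purl.org/dc/elements/1.1/"

# The sample metadata from this article, embedded so the sketch is self-contained.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier id="bookid">urn:uuid:1ea984da-00f4-4946-7494-579321dbfa93</dc:identifier>
  <dc:identifier id="ISBN">9780000000000</dc:identifier>
  <dc:title>Motherhood and Me</dc:title>
  <dc:creator>Lindy Bruce</dc:creator>
  <dc:publisher>Oshun Books</dc:publisher>
  <dc:language>en-GB</dc:language>
  <dc:date>2009-06-01</dc:date>
</metadata>"""


def read_metadata(xml_text):
    """Return a dict mapping each Dublin Core tag name to a list of its values."""
    root = ET.fromstring(xml_text)
    found = {}
    # iter() walks every element, so this also works on a whole content.opf.
    for element in root.iter():
        # ElementTree writes namespaced tags as "{namespace-uri}name".
        if element.tag.startswith("{" + DC + "}"):
            name = element.tag.split("}", 1)[1]
            found.setdefault(name, []).append((element.text or "").strip())
    return found


def missing_items(found, wanted=("title", "creator", "identifier", "date")):
    """List the wanted tag names that are absent or have only empty values."""
    return [name for name in wanted if not any(found.get(name, []))]


metadata = read_metadata(SAMPLE)
print(metadata["title"])             # ['Motherhood and Me']
print(len(metadata["identifier"]))   # 2
print(missing_items(metadata))       # []
```

The default list in missing_items mirrors the four items this article names as the minimum. An empty tag such as <dc:date/> counts as missing, because it carries no value.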
The ‘dc’ stands for Dublin Core, a standardised list of metadata items. You can add more metadata if needed, using Dublin Core tags.
For example, you can put a book’s blurb or jacket copy in a <dc:description> tag.
Further reading on identifiers
There are slight variations in how metadata can be entered in an epub. In this very useful note on identifiers, Liza Daly uses an ISBN as the identifier like this:
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" id="bookid" opf:scheme="ISBN">urn:isbn:9780596158347</dc:identifier>
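To see how software might read that scheme attribute, here is a small sketch in Python, assuming the standard xml.etree.ElementTree module. One assumption to flag: the opf prefix has to be declared somewhere for the XML to parse. In a complete content.opf that declaration would sit on an enclosing element; the sketch adds it to the tag itself so that the snippet stands alone.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# The identifier from the note above, with xmlns:opf added (an assumption made
# here so that this single tag can be parsed outside a full content.opf).
IDENTIFIER = (
    '<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:opf="http://www.idpf.org/2007/opf" '
    'id="bookid" opf:scheme="ISBN">urn:isbn:9780596158347</dc:identifier>'
)

element = ET.fromstring(IDENTIFIER)

# Namespaced attributes are looked up as "{namespace-uri}name".
scheme = element.get("{" + OPF + "}scheme")
value = element.text

# Strip the "urn:isbn:" prefix, if present, to get the bare ISBN.
prefix = "urn:isbn:"
isbn = value[len(prefix):] if value.startswith(prefix) else value

print(scheme)  # ISBN
print(isbn)    # 9780596158347
```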
Further reading on metadata
There are lots of sensible pieces written about metadata, for instance those by Laura Dawson or Michael Cairns. Or you could read this post by wildly opinionated stirrer Mike Cane, which sums up the issue well.
Troubleshooting the epub's 'date' metadata
If you check your epub with epubcheck version 1.0.5 or later, you may get this error message:
ERROR: title.epub/OEBPS/content.opf(2): date value '' is not valid, YYYY[-MM[-DD]] expected
To fix the date: in the metadata section of your content.opf file, replace this code
<dc:date/>
with this code
<dc:date>2009-06-01</dc:date>
That date’s an example, of course. Change that to your publication date. (If you don’t have a <dc:date/> tag, just add the line.)
That is, instead of an empty self-closing dc:date tag, you’re inserting an opening dc:date tag, with the date in the format YYYY[-MM[-DD]], and closing the tag. That date format means the four-digit year (YYYY) is mandatory, and a two-digit month (MM) is optional. If MM is provided, a two-digit day (DD) is optional.
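The date format described above can be expressed as a short pattern. This is a minimal sketch in Python using the standard re module; the function name is_valid_opf_date is hypothetical, and the check covers only the shape of the value (it would accept a month of 13, for instance), so it is not a substitute for running epubcheck.

```python
import re

# YYYY is required; -MM is optional; -DD is allowed only when -MM is present.
DATE_PATTERN = re.compile(r"[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?")


def is_valid_opf_date(value):
    """Return True if value has the shape YYYY, YYYY-MM or YYYY-MM-DD."""
    return DATE_PATTERN.fullmatch(value) is not None


# The first three shapes pass; the empty string and the rest do not.
for candidate in ["2009", "2009-06", "2009-06-01", "", "09-06-01", "2009-6-1"]:
    print(repr(candidate), is_valid_opf_date(candidate))
```

The empty string is the case behind the error message shown earlier: an empty <dc:date/> tag gives a date value of '', which does not match the pattern.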
Background: As of version 1.0.5, epubcheck started checking for correctly formed pub-date metadata in epubs. It should always have done so, since correct pub-date metadata is required by the OPF part of the epub specification. InDesign CS4 does not yet ask for this pub-date metadata when creating an epub, nor does it create it in its exported epubs. As a result, epubs created with CS4 do not validate with epubcheck 1.0.5.