EBW Knowledge Base

Check metadata

Metadata is information about a book. For ebooks, metadata must be inside the ebook, stored in a standardised way, so that software systems (like your personal library of ebooks or an aggregator’s large-scale asset management system) can organise the books in a sensible way. The more metadata you include in an ebook, the better.

There is an excellent description of metadata information on the MobileRead wiki.

To be valid, an epub must include at least these items of metadata:

  • the title
  • a unique identifier of some sort (such as an ISBN or UUID; InDesign generates a UUID for you when you export to epub)
  • the language of the text.

You should also always include:

  • the creator (usually the author)
  • a date of publication, at least the year.

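The metadata for an epub is stored in its content.opf file. An epub is really a zip archive, so you'll need to unzip it first (and, when you zip it up again, keep the mimetype file first in the archive and uncompressed). Open content.opf in a text or code editor (such as TextPad) and find the <metadata> tag. You'll see a list of metadata that looks something like this: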

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:1ea984da-00f4-4946-7494-579321dbfa93</dc:identifier>
    <dc:identifier id="ISBN">9780000000000</dc:identifier>
    <dc:title>Motherhood and Me</dc:title>
    <dc:creator>Lindy Bruce</dc:creator>
    <dc:publisher>Oshun Books</dc:publisher>
    <dc:language>en-GB</dc:language>
    <dc:date>2009-06-01</dc:date>
</metadata>

These are our minimum recommended metadata tags. They show:

  • this ebook has two identifiers, an ISBN and a UUID
  • the book’s title is “Motherhood and Me”
  • its author is Lindy Bruce
  • its publisher is Oshun Books
  • its language is British English
  • the epub was published on 1 June 2009.
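If you'd rather check an epub's metadata without unzipping it by hand, a short script can do the looking for you. This is a minimal sketch, assuming Python 3 and only its standard library; the function name read_metadata is our own invention. It opens the epub as a zip archive, reads META-INF/container.xml to find where the OPF file lives (it isn't always OEBPS/content.opf), and collects every Dublin Core element it finds. It only reads the epub; it never changes it.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in epub container files and Dublin Core metadata.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_metadata(epub):
    """Map each Dublin Core element name (e.g. 'title') to its values.

    `epub` is a path or a binary file object. Hypothetical helper:
    it only reads the epub and never modifies it.
    """
    with zipfile.ZipFile(epub) as archive:
        # container.xml names the OPF file ("rootfile"), wherever it lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        package = ET.fromstring(archive.read(rootfile.get("full-path")))

    found = {}
    # dc: elements only appear inside <metadata>, so scanning the
    # whole package finds them all, in document order.
    for element in package.iter():
        if element.tag.startswith(f"{{{DC_NS}}}"):
            name = element.tag[len(DC_NS) + 2:]
            # An empty tag such as <dc:date/> is recorded as ''.
            found.setdefault(name, []).append((element.text or "").strip())
    return found
```

Calling read_metadata("title.epub") returns a dict such as {"title": ["Motherhood and Me"], ...}. A "date" entry holding an empty string points to the empty <dc:date/> problem described under troubleshooting, and a missing "language" key means a required item is absent.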

The ‘dc’ stands for Dublin Core, a standardised list of metadata items. You can add more metadata using other Dublin Core elements, such as dc:subject, dc:contributor and dc:rights.

For example, you can put a book’s blurb or jacket copy in a dc:description tag.
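Blurbs often contain characters such as & or <, which must be escaped inside XML or the OPF file will no longer parse. As a minimal sketch (Python 3; the function name add_description is hypothetical), this inserts an escaped dc:description just before the closing </metadata> tag in the text of content.opf. It assumes the file uses the dc prefix, as in the example above, and does a plain text insertion rather than re-serialising the whole file, so everything else is left exactly as it was.

```python
from xml.sax.saxutils import escape


def add_description(opf_text, blurb):
    """Insert <dc:description> before </metadata> in the OPF text.

    Assumes the dc prefix is already declared, as in the example above.
    """
    if "</metadata>" not in opf_text:
        raise ValueError("no </metadata> closing tag found")
    # escape() turns &, < and > into &amp;, &lt; and &gt;.
    element = f"    <dc:description>{escape(blurb)}</dc:description>\n"
    return opf_text.replace("</metadata>", element + "</metadata>", 1)
```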

Further reading on identifiers

There are slight variations in how metadata can be entered in an epub. In her very useful note on identifiers, Liza Daly uses an ISBN as the book's identifier, like this:

<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/"
                 id="bookid"
                 opf:scheme="ISBN">urn:isbn:9780596158347</dc:identifier>
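Because opf:scheme is a namespaced attribute, software reading it has to ask for it by its full namespace, not just by the word "scheme". The sketch below (Python 3 standard library; the function name list_identifiers is our own) lists every dc:identifier in an OPF file with its id and scheme. It assumes the opf prefix is bound to http://www.idpf.org/2007/opf, as in OPF 2.0 files; the snippet above doesn't show that declaration, which normally sits on a parent element such as <package> or <metadata>.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_identifiers(opf_xml):
    """Return (id, scheme, value) tuples, one per dc:identifier.

    Missing attributes come back as None. Hypothetical helper.
    """
    package = ET.fromstring(opf_xml)
    results = []
    for element in package.iter(f"{{{DC_NS}}}identifier"):
        results.append((
            element.get("id"),
            # opf:scheme lives in the OPF namespace, so ask for it by URI.
            element.get(f"{{{OPF_NS}}}scheme"),
            (element.text or "").strip(),
        ))
    return results
```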

Further reading on metadata

Many sensible pieces have been written about metadata, for instance by Laura Dawson and Michael Cairns. Or you could read this post by the wildly opinionated stirrer Mike Cane, which sums up the issue well.

Troubleshooting the epub’s ‘date’ metadata

If you check your epub with epubcheck version 1.0.5 or later, you may get this error message:

ERROR: title.epub/OEBPS/content.opf(2): date value '' is not valid, YYYY[-MM[-DD]] expected

To fix the date: in the metadata section of your content.opf file, replace this code

<dc:date/>

with this

<dc:date>2010-06</dc:date>

That date’s an example, of course. Change that to your publication date. (If you don’t have a <dc:date/> tag, just add the line.)

That is, instead of an empty, self-closing dc:date tag, you're inserting an opening dc:date tag, the date in the format YYYY[-MM[-DD]], and a closing dc:date tag. In that format, the four-digit year (YYYY) is mandatory and a two-digit month (MM) is optional. If MM is given, a two-digit day (DD) may follow it, but that is optional too.
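The same fix can be scripted. This is a sketch under our own assumptions (Python 3; the names is_valid_pub_date and set_pub_date are hypothetical, not part of epubcheck or InDesign). It checks the new date against the YYYY[-MM[-DD]] pattern, and also rejects impossible months and days, which may be stricter than epubcheck itself. It then fills in an empty <dc:date/> (or <dc:date />), or adds a dc:date line before </metadata> if there is none. It works on the text of content.opf and leaves an existing, non-empty dc:date alone.

```python
import re
from datetime import date

# YYYY, optionally followed by -MM, optionally followed by -DD.
DATE_PATTERN = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")
EMPTY_DATE = re.compile(r"<dc:date\s*/>")


def is_valid_pub_date(value):
    """True if value is YYYY, YYYY-MM or YYYY-MM-DD and names a real date."""
    match = DATE_PATTERN.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing parts default to 1, just to check the parts that are given.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def set_pub_date(opf_text, value):
    """Fill in an empty <dc:date/>, or add dc:date if none exists."""
    if not is_valid_pub_date(value):
        raise ValueError(f"date {value!r} is not YYYY[-MM[-DD]]")
    new_tag = f"<dc:date>{value}</dc:date>"
    if EMPTY_DATE.search(opf_text):
        return EMPTY_DATE.sub(new_tag, opf_text, count=1)
    if "<dc:date" in opf_text:
        return opf_text  # a date is already there; leave it alone
    return opf_text.replace("</metadata>", f"    {new_tag}\n</metadata>", 1)
```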

Background: As of version 1.0.5, epubcheck started checking that date metadata in epubs is correctly formed. It should always have done so, since the OPF part of the epub specification requires a date, where one is given, to be in this format. InDesign CS4 does not yet ask for a publication date when creating an epub, nor put one into its exported epubs. As a result, epubs created with CS4 do not validate with epubcheck 1.0.5.

6 Comments

  1. arthurattwell says:

    I’ve added a section on troubleshooting missing date metadata in epubs.

  2. Arthur says:

    A tip: for editing metadata in epubs, you don’t have to get at the .opf file’s code inside the epub yourself. You can also use the excellent open-source apps Calibre (http://calibre-ebook.com) or Sigil (http://code.google.com/p/sigil/) to edit metadata.

  3. Manuela says:

    Hello there.
    I'm making my ebook with InDesign, because Calibre gives me many problems.
    So, in InDesign, where must I insert this:

    fix the date: in the metadata section of your content.opf file, replace this code

    <dc:date/>

    with this

    <dc:date>2010-06</dc:date>

    Thanks

  4. Arthur says:

    Manuela: You don’t add that code in InDesign, but rather in the content.opf file that is generated when you export to epub from InDesign. That is, when you have your epub exported from InDesign, you unpackage the .epub file and edit the content.opf file inside it.

    However, if you’re not confident editing code, I highly suggest using Sigil to check and edit your epub after exporting from InDesign. Sigil will automatically fix this date problem for you. (Sigil is free and open-source software. Just Google ‘sigil’ to find it.)

  5. pl says:

    Re the link to “linda dawson.” Not. It looks like Ms. Laura Dawson to me, & one website, which I think holds the referenced interview, can be found at

    http://toc.oreilly.com/2011/01/metadata-digital-publishing.html

  6. Arthur says:

    Yikes, thank you for catching that silly mistake! Laura is a good friend, too, so it’s especially embarrassing :) Fixed. Thanks for the link.
