Kindle Topaz Format .tpz

Snooping around to figure out the details of Kindle Topaz (the .tpz format). It’s the format that newer Kindle Edition books have been coming out in since the end of 2008:

  1. If you get a Topaz file via WhisperNet, the file gets named bookname.azw1. If you download it, it gets named bookname.tpz.
  2. It allows for embedded fonts. Courtesy of Igor, via a comment on a post:

    Topaz format is more advanced – it supports different fonts and even custom glyphs, but apparently it’s only available to big publishers.

  3. As Igor points out, if you’re not a big publisher you’re probably not going to get access to Topaz.
  4. The ability to embed fonts means you can create a different look. A few comments have mentioned beautifully formatted Topaz books. However, publishers sometimes get it wrong, and then changing the font size really messes with the look.
  5. When a book’s Amazon page lists ‘Number of Pages’ rather than ‘File Size’, it’s probably a Topaz book.
  6. The DRM for Topaz hasn’t been broken yet.

    This might explain why a lot of big publishers are being given the option to use Topaz and are preferring it.

  7. Topaz books are slower to open, and it’s slower to wake your Kindle from sleep mode if you’ve got one open.
  8. Page turns are quite fast for me, though opinions are mixed in the forums. Here’s a video (TopazVsAZW2). Do note that this is an example of a book that isn’t well served by the Topaz format. As for slower book load times, that occurs across all the Topaz books I’ve bought (5 or so).
  9. The ability to embed fonts in Topaz means it ought to be able to support Chinese, Japanese and other non-Latin scripts.
  10. Topaz files tend to be bigger in size – based on a very, very small sample size.
  11. Your bookmarks, notes etc. for a Topaz file are saved in a .TAN file.
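
As a rough illustration of points 1 and 11, the sketch below sorts a folder listing into Topaz books and their annotation files purely by extension. The function name `find_topaz_books` and the sample file names are made up for this example, and the assumption that a .tan file shares its book's base name is mine, not something confirmed above.

```python
from pathlib import Path

# Extensions mentioned in the post: .azw1 (via WhisperNet) and .tpz (downloaded).
TOPAZ_EXTENSIONS = {".azw1", ".tpz"}
# Bookmarks and notes for a Topaz book live in a .tan file.
ANNOTATION_EXTENSION = ".tan"


def find_topaz_books(filenames):
    """Map each Topaz book to its annotation file, or None if absent.

    Assumption (not from the post): the .tan file shares the book's base name.
    """
    paths = [Path(name) for name in filenames]
    # Index annotation files by lower-cased base name for case-insensitive lookup.
    annotations = {
        p.stem.lower(): p.name
        for p in paths
        if p.suffix.lower() == ANNOTATION_EXTENSION
    }
    return {
        p.name: annotations.get(p.stem.lower())
        for p in paths
        if p.suffix.lower() in TOPAZ_EXTENSIONS
    }


if __name__ == "__main__":
    # Hypothetical listing; the titles are placeholders, not known Topaz books.
    listing = ["Dune.azw1", "Dune.tan", "Emma.tpz", "Ulysses.azw", "notes.txt"]
    for book, notes in find_topaz_books(listing).items():
        print(book, "->", notes or "no annotations found")
```

Matching on the extension alone says nothing about what is inside the file, so treat the result as a first guess rather than a format check.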

Thanks to a discussion at MobileRead for some of the data points.

Overall, it’s interesting to see a new format that has support for custom fonts. The size is a big issue for me, and it’s a bit annoying to have to wait 5 or more seconds for the Kindle to wake from sleep mode if you have a Topaz file open.

11 Responses

  1. Hello, thanks for the video. I’m enjoying your blog very much.

    It looks to me like the Topaz format book in the video is using full justification rather than the left justification shown in your first sample. The publisher can set this on a book-by-book basis… and it drives me crazy. I hate seeing the extra white space between words. It is the publisher’s idea of how a book should look. Bah. We should have the ability to change the font and the justification on any book we buy. We don’t have control over either. It is unfortunate.

    -robert

    • Thanks for the kind words. Yes, that’s probably what it is. The addition of the ‘words per line’ option in the Kindle DX is a good sign. Perhaps they’ll add more options down the line, like justification and fonts. One to ‘bold’ text would be another good one to add.

  2. huh… i guess i never noticed. i’ll have to plug my kindle in to the pc and see if i have any of these new format books… although i don’t much care either way, since like i said, i haven’t noticed any difference…

  3. New from 2008? It’s not new.

    Basically, it looks like a photocopy of a print book; that is, it doesn’t look nice. You tend to end up with uneven fonts, and ink splotches on the page. Topaz is one of the reasons I check the sample of every Kindle book prior to purchase. Much of the formatting done for the Kindle has been extremely shoddy.

  4. I have a topaz book in which a special font appears to have been used to scramble the text. For example, sometimes there is a word that begins with a capital H, and — probably because of a typographical error — the H appears divided into two halves that are separated by at least one space. If you highlight the word that begins with the H and then view the highlight under notes and bookmarks, the word shown in the highlight is completely different from the word actually highlighted, probably because of the typo of the space(s) separating the two halves of the H. I don’t have any specialized knowledge of text encoding, but to me this seems to be some kind of extra DRM.

  5. please get in touch with Kovid Goyal who wrote the Calibre software. He or someone at MobileRead.com might be able to help you.

  6. According to this [link removed due to anti-DRM hack], an ebook in Topaz format is a scanned printed book in which individual words, mathematical formulas, charts, etc. are placed in separate images to allow reflow, with a hidden layer of OCR’d text to provide search and TTS (similar to the DjVu format). So the format is great for publishers because it does not require any additional work (Amazon has scanned thousands of books already). On the other hand, there is widespread misinformation that Topaz supports custom fonts. It does not; the text is just shown as in the printed book.

  7. @AlexV. I don’t know exactly what you are saying. Topaz does support embedded fonts. Do you mean it is an image with OCR? I really don’t understand your point.

    • OK, here is what the Topaz developer said, step by step:

      1. The book is scanned, so we have one image per page.
      2. Each page’s image is cut up so that each word goes into its own image, so we have as many images per page as there are words on it.
      3. The book in Topaz format is nothing but a container for those word images.

      Having each word in a separate image allows for reflowing (by rearranging the word images) and resizing (by resizing the word images).

      You cannot change the font because there is no concept of a font in this format: you see the font that was used in the printed book. Hence no font embedding, as there are no letters to apply the font to.

      4. In addition, the book is OCR’d. The result is stored as _hidden_ information that is used, for instance, to search the text of the book. You never see the result of the OCR directly.

      This was the first format that was developed for Kindle. This is what was shown to publishers when Amazon was just thinking about the whole e-reader business and had a prototype of the device. The main reason was that the conversion from a printed book to an e-book that looked almost like the original was fully automated.

      Later MOBI became the principal format for Kindle, but to have a book in that newer format it basically has to be redesigned almost from scratch (to have all the hyperlinks, etc. in place). Hence MOBI came later, when publishers got on board.

      None of this is speculation; I am just repeating what the developer said about his work.

      • Thanks. That’s one of the most inelegant solutions I’ve ever heard of. That they literally took an image per page and then split it into an image per word. That’s crazy.
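
The reflow-and-resize behaviour described in this thread can be sketched as a greedy line-filling pass over word boxes. This is only an illustration of the idea as the commenter relays it, not Topaz's actual layout code: `WordImage`, `reflow`, `search`, the pixel sizes, and the fixed gap between words are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class WordImage:
    """One scanned word: hidden OCR text plus the size of its image in pixels."""
    ocr_text: str  # hidden text, used for search rather than display
    width: int
    height: int


def reflow(words, page_width, scale=1.0, gap=6):
    """Greedily pack scaled word images into lines no wider than page_width.

    Returns a list of lines, each a list of (ocr_text, x, scaled_width).
    The gap between words is a hypothetical constant, scaled with the words.
    """
    lines, current, x = [], [], 0
    scaled_gap = round(gap * scale)
    for word in words:
        w = round(word.width * scale)
        # Start a new line when the next image would overflow the page.
        if current and x + w > page_width:
            lines.append(current)
            current, x = [], 0
        current.append((word.ocr_text, x, w))
        x += w + scaled_gap
    if current:
        lines.append(current)
    return lines


def search(words, term):
    """Find word indices by matching the hidden OCR text, not the pixels."""
    return [i for i, w in enumerate(words) if term.lower() in w.ocr_text.lower()]


if __name__ == "__main__":
    # Fake "scanned" words whose image width grows with word length.
    page = [WordImage(t, 12 * len(t), 20) for t in "the quick brown fox jumps".split()]
    for size in (1.0, 2.0):
        layout = reflow(page, page_width=200, scale=size)
        print(size, [[text for text, _, _ in line] for line in layout])
    print(search(page, "fox"))
```

Note that nothing in the sketch touches a font: changing the text size only scales the images, which matches the commenter's point that there are no letters to apply a font to.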
