Metadata and image issues #2

Closed
opened 2 months ago by adiabatic · 4 comments

Hi! Your project got mentioned on the Gemini mailing list and I wanted to voice support for a Gemtext-based ePub-equivalent.

That said, there are a couple questions I wanted to raise…

> • mimetype - an ascii file containing the string: application/gpub+zip - this should give existing eBook readers an opportunity to fail gracefully

Writing a file like this is a minor burden best borne only if it's needed. Some text editors might insert a BOM and/or have weird line endings.

If this file isn't present, will existing ePub readers fail differently, or fail in a worse way? If this file isn't strictly needed as a compatibility tool, I think it'd be best if it weren't required (or even recommended) in the spec.

Considering that some ePub readers actually rely on this file being the first entry in the zip file and stored uncompressed, I strongly suspect that this sort of internal MIME guard is unnecessary.
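
If the file does stay, a packaging tool can write it so that nobody has to create it in a text editor. A minimal sketch in Python, assuming the ePub-style convention of a first, uncompressed entry; the function names and the `index.gmi` sample are made up for illustration, only the MIME string comes from the spec text quoted above:

```python
import io
import zipfile

MIMETYPE = b"application/gpub+zip"  # string quoted from the spec text

def write_container(buf, files):
    """Write a zip whose first entry is an uncompressed 'mimetype' file."""
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Raw bytes: no BOM and no line ending can sneak in.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data)

def mimetype_is_clean(buf):
    """True if 'mimetype' is the first entry, stored, and byte-exact."""
    with zipfile.ZipFile(buf) as zf:
        first = zf.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and zf.read(first) == MIMETYPE)

buf = io.BytesIO()
write_container(buf, {"index.gmi": "# Hello\n"})  # hypothetical content
print(mimetype_is_clean(buf))
```

A hand-written file with a BOM or a CRLF ending would fail the byte-exact comparison, which is the failure mode I'm worried about.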

> • metadata - an ascii file containing the title, author and any other optional fields, see Metadata below

Would you consider renaming this to metadata.txt for the benefit of people who are on OSs where it's difficult to interact with files that don't have extensions?

> Metadata
>
> Keys are case-insensitive

To make things easier on people who write parsers for this sort of thing in languages that don't have regexen, would you consider having keys be case-sensitive?

> • cover - a jpg or png image which can be anywhere in the directory structure.
> […]
> Supported formats are PNG and JPEG as they're common and included on most/all platforms.

I'm not a super-huge Google fan, but why disallow GIF and WebP? (I currently wouldn't recommend using WebP for something like this, but I can't see why using it should be disallowed by the spec.)

> Images must always include a description for accessibility:

What if Gempub is being used as an archive format and the original source document lacks alt text for image links (because, say, it's a directory listing)?

Thanks for reading!


I've implemented rudimentary support for Gempubs in Lagrange (v1.4), so with that in mind here are my two cents...

> > Keys are case-insensitive
>
> To make things easier on people who write parsers for this sort of thing in languages that don't have regexen, would you consider having keys be case-sensitive?

+1 on case-sensitive keys as a simplification. That said, my parser for this format doesn't use regex and it works fine; one can use basic case-insensitive, length-limited ASCII string comparisons.
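
To illustrate the kind of comparison I mean, here is a rough Python sketch (not the actual Lagrange code). It assumes metadata lines look like `key: value`, which is my assumption here; the helper names and sample values are made up:

```python
def ascii_lower(s):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def key_matches(line, key):
    """Compare only the first len(key)+1 characters, ignoring ASCII case."""
    prefix = line[:len(key) + 1]  # key plus the assumed ':' separator
    return ascii_lower(prefix) == ascii_lower(key) + ":"

def get_field(text, key):
    """Return the value of the first line whose key matches, or None."""
    for line in text.splitlines():
        if key_matches(line, key):
            return line[len(key) + 1:].strip()
    return None

sample = "Title: Example Book\nAUTHOR: A. Writer\n"  # hypothetical metadata
print(get_field(sample, "title"))   # Example Book
print(get_field(sample, "author"))  # A. Writer
```

No regex is involved, but it is still a few more lines than a plain case-sensitive prefix check would need.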

> I'm not a super-huge Google fan, but why disallow GIF and WebP?

IMO, it's good that the possible formats for the cover image are limited to PNG and JPEG. This image should be as easily and as widely readable as possible, since it simplifies client implementation and the image can be used as the icon for the Gempub file in various contexts.
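
With only two formats, a client can tell them apart from the first few bytes without any image library. A small sketch, assuming the cover bytes have already been read from the archive (the function name is made up):

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
JPEG_PREFIX = b"\xff\xd8\xff"         # SOI marker plus the first byte of the next marker

def cover_format(data):
    """Return 'png', 'jpeg', or None for anything else."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_PREFIX):
        return "jpeg"
    return None

print(cover_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
print(cover_format(b"GIF89a"))                            # None
```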

> What if Gempub is being used as an archive format and the original source document lacks alt text for image links (because, say, it's a directory listing)?

In this case, the tool that created the Gempub archive should show a warning about the missing descriptions.

A reasonable fallback for a client could be to extract the base file name from the link and use it as the description.
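
Roughly like this, as a Python sketch rather than anything a client actually ships. It relies on the Gemtext link line shape (`=>`, then the URL, then an optional label); the function name and sample paths are made up:

```python
import posixpath
from urllib.parse import urlsplit, unquote

def link_description(line):
    """Return (target, description) for a Gemtext link line, or None."""
    if not line.startswith("=>"):
        return None
    parts = line[2:].strip().split(maxsplit=1)
    if not parts:
        return None
    target = parts[0]
    if len(parts) == 2:
        return target, parts[1]
    # Fallback: use the base file name of the link path as the description.
    path = unquote(urlsplit(target).path)
    name = posixpath.basename(path.rstrip("/"))
    return target, name or target

print(link_description("=> images/cat.png A sleeping cat"))
print(link_description("=> images/cat.png"))
```

The second call has no label, so the description falls back to `cat.png`.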

In general, the archival use case does conflict with a strict validity check against the Gempub spec — a random capsule naturally wasn't created with the Gempub requirements in mind. If one were to allow any capsule to be archived as-is, a Gempub client would have to be a pretty full-fledged Gemini client as well. It might be worth considering having two Gempub "profiles", something along the lines of:

  • "Book" Gempub: only Gemtext files and media attachments, require media link accessibility descriptions, require all internal links to use absolute paths in the archive (so one doesn't have to resolve relative paths or look up index.gmi's). Targeting a very simple reader app / eBook device.
  • "Archive" Gempub: any kind of files in addition to Gemtext, all kinds of links allowed; just a copy of all the static content on a capsule. Targeting a Gemini client.

In other words, the main difference here would be that the Book profile is a subset of what a full static Gemini capsule can be, with strict requirements about the content. Gemini clients can easily support both profiles, and simple devices don't have to worry about the (relative) complexities of a free-range capsule.
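
To make the Book profile idea a bit more concrete, here is a hypothetical checker in Python. Everything in it is an assumption for the sake of discussion: the set of media extensions, the `.gmi` extension test, and the expectation that container-level files such as the metadata file are filtered out before calling it.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of media extensions; nothing in this thread defines one.
MEDIA_EXTENSIONS = {".png", ".jpg", ".jpeg"}
FENCE = "`" * 3  # Gemtext preformatted toggle

def book_profile_problems(files):
    """Return a list of problems for a dict of {archive path: content}.

    Gemtext content is expected as str; other content is not inspected.
    """
    problems = []
    for name, content in files.items():
        ext = posixpath.splitext(name)[1].lower()
        if ext != ".gmi":
            if ext not in MEDIA_EXTENSIONS:
                problems.append(f"{name}: not Gemtext or a media attachment")
            continue
        in_pre = False
        for lineno, line in enumerate(content.splitlines(), 1):
            if line.startswith(FENCE):
                in_pre = not in_pre  # link-like lines inside are just text
                continue
            if in_pre or not line.startswith("=>"):
                continue
            parts = line[2:].strip().split(maxsplit=1)
            if not parts:
                continue
            target = parts[0]
            url = urlsplit(target)
            # Internal links (no scheme, no host) must be absolute paths.
            if not url.scheme and not url.netloc and not target.startswith("/"):
                problems.append(f"{name}:{lineno}: relative link {target}")
            is_media = posixpath.splitext(url.path)[1].lower() in MEDIA_EXTENSIONS
            if is_media and len(parts) < 2:
                problems.append(f"{name}:{lineno}: media link {target} lacks a description")
    return problems

example = {  # hypothetical archive contents
    "index.gmi": "=> /ch1.gmi Chapter 1\n=> ch2.gmi Chapter 2\n=> /img/a.png\n",
    "img/a.png": b"",
    "notes.pdf": b"",
}
for problem in book_profile_problems(example):
    print(problem)
```

An "Archive" Gempub would simply skip this check, while a simple reader device could refuse anything that fails it.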


Or perhaps a better way to think about this is to separate the Gempub Container format (ZIP with metadata) from the Gempub book content spec.

Then an archive can use just the container format, and both can be validated by a tool separately.

Collaborator

To avoid a conversation explosion, I've opened issues:

  • #3 ("mimetype" file requirement may be unnecessary), added a response there
  • #4 (Can "metadata" file be renamed to "metadata.txt"?)
  • #5 (Should "metadata" file type be UTF-8 rather than ASCII?)
  • #6 (Spec needs a "Recommendations" section)
  • #7 (An archive created by simply zipping up a capsule directory may conflict with the Gempub spec)

I think I've captured everything that is relevant to each issue from the above discussion. Please add anything I've missed or gotten wrong.

Owner

Most of the above is covered by other tickets already (all changes accepted!)

Regarding WebP: it's still a niche format. It makes sense for Google, who serve billions of files every day and have the scope to fix all the problems a new web file format brings. The byte savings are worth it for them.

For me the biggest case against including it is the burden it places on reader developers. Most frameworks don't have a WebP API by default, if at all, and on some languages/frameworks it may prevent someone from implementing a reader app at all. Full WebP support has only recently come to Android!

Regarding images coming from capsule archives not having alt text: that's a good catch. I think we should just be lenient in that case and use the filename as the alt text - that's fine.

> the Book profile is a subset of what a full static Gemini capsule can be, with strict requirements about the content

I like this description. I'll add it to the spec somehow tomorrow - as well as @raphm's grammatical improvements from their fork.

raphm closed this issue 3 weeks ago