Language Value - Using IETF instead of ISO 639
Open · opened 2 years ago by krixano · 8 comments
You mention that one of the reasons for the language field is correct pronunciation by screen readers. If this is the case, then the dialect of the language should also be included. For example, instead of "en", it should be one of the following: "en_US", "en_GB", "en_AU", etc. This is the "IETF BCP 47 language tag" spec (https://en.wikipedia.org/wiki/IETF_language_tag).
Edit: The IETF language tag should use hyphens, not underscores. Everything else still applies.
The text/gemini spec currently requires conformance with RFC4646:
However, https://tools.ietf.org/html/rfc5646 replaces [RFC4646]. [RFC5646], in combination with [RFC4647], comprises BCP 47.
So we probably need to see if we can get the text/gemini spec updated to refer to BCP 47, and then modify the gempub spec to refer to BCP 47 as well.
There is currently an open issue for that: https://gitlab.com/gemini-specification/gemini-text/-/issues/1
https://tools.ietf.org/html/bcp47 seems to suggest that the Gemini spec is correct in saying that language tags should have hyphen-minuses in them, not underscores ("en-US", "sr-Latn-RS").
Right, but my original issue is about gempub not using IETF language tags at all, which means, afaik (could be wrong, idk), it has no way to specify regions. Whether it's underscores or hyphens is irrelevant; that was just a small mistake from me thinking it was underscores.
Yes, the hyphen is mandated by BCP 47:
Title changed from "Language value" to "Language Value - Using IETF instead of ISO 639" 2 years ago
@krixano, absolutely. I think you are correct here in pointing out that the gempub language metadata field should use BCP 47 language tags, and not bare ISO 639-1 or ISO 639-2 language codes. We're just trying to get pointers to the right specs.
I think I have the summary of this issue correct, which is that the current gempub spec provides for the use of an ISO 639-1 or ISO 639-2 code in the language field of the metadata.txt file. This only allows a two- or three-character language code, with no way to specify a region or script.
The spec ought to provide for the use of complete language tags that are compliant with BCP 47, rather than only two- or three-character language codes.
In other words, "en" or "en-GB" or "sr-Latn-RS" should all work, as they are all proper BCP 47 language tags that can be passed to screen readers for (among other things) the purpose of supporting correct pronunciation.
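To make the three example tags concrete, here is a minimal Java sketch of how a reader application might check a tag and split it into its parts. It relies only on the standard `java.util.Locale` API; the class name `LanguageTagCheck` and the helper `isWellFormed` are hypothetical names for illustration, and nothing here is part of the gempub spec. Note that this checks syntax only, not whether the subtags are registered.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper, not part of gempub or any Gemini library.
final class LanguageTagCheck {

    /** Returns true when the tag is syntactically well-formed according to Locale.Builder. */
    static boolean isWellFormed(String tag) {
        if (tag == null || tag.isEmpty()) {
            // Locale.Builder treats null/empty as "reset" rather than an error,
            // so reject them explicitly.
            return false;
        }
        try {
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            // Thrown for ill-formed tags, e.g. underscores instead of hyphens.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en", "en-GB", "sr-Latn-RS", "en_US"}) {
            // forLanguageTag is lenient: it drops ill-formed subtags instead of throwing.
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag
                    + " wellFormed=" + isWellFormed(tag)
                    + " language=" + locale.getLanguage()
                    + " script=" + locale.getScript()
                    + " region=" + locale.getCountry());
        }
    }
}
```

The underscore form from the original post is rejected by the strict check, which is one reason to validate the field before passing it on to a screen reader.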
Everyone happy with BCP 47 then? I'll update it tomorrow - fwiw the spec did say that field needed some thinking about, it was just thrown in.
The use-case in mind was say a Spanish user reading an English book, and how would the screen reader handle that - I need to check what APIs are available for Android (just because that's my primary platform) and make sure it's a realistic requirement - at the moment I have no idea.
If we're talking about a Spanish user reading an English book, you would want the screen reader to read in English? If that's the case, then this should be fine.
In the case of translation, the software should already have a setting of its own for what the user's native language is. You wouldn't change anything in the gpub, that wouldn't make sense. The gpub only needs to describe what is actually inside the book. A gpub can't magically know that a person with a different native language is reading the book. That's the job of the reader software.
There is one thing to consider though - multi-language books.
Multi-language books... let's leave that as a future problem for now.
And yes - re screen reader. On Android (for my sins, my primary platform) the screen reader TalkBack would default to the user's Locale (Spanish in our example), so if we have the language from metadata.txt we can wrap the String in a LocaleSpan, which would give TalkBack what it needs to correctly pronounce the English words on a Spanish Locale device.
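As a rough sketch of the decision described above, the plain-Java helper below picks the locale a reader app would attach to the book text. It is an assumption-laden illustration: `BookLocale` and `localeForSpan` are hypothetical names, the rule "only tag the text when the book's locale differs from the device's" is one possible policy rather than anything the spec requires, and the Android-specific step (passing the result to `LocaleSpan`) is only indicated in a comment so the example stays runnable off-device.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for a gempub reader; not an Android or gempub API.
final class BookLocale {

    /**
     * Returns the locale to attach to the book text, or empty when the tag is
     * missing/unparseable or already matches the device locale.
     */
    static Optional<Locale> localeForSpan(String bookLanguageTag, Locale deviceLocale) {
        if (bookLanguageTag == null) {
            return Optional.empty();
        }
        Locale book = Locale.forLanguageTag(bookLanguageTag);
        if (book.getLanguage().isEmpty()) {
            // forLanguageTag is lenient; an empty language means nothing usable was parsed.
            return Optional.empty();
        }
        if (book.equals(deviceLocale)) {
            // The screen reader's default already matches the book.
            return Optional.empty();
        }
        // On Android, this locale would be handed to LocaleSpan, e.g. new LocaleSpan(locale).
        return Optional.of(book);
    }

    public static void main(String[] args) {
        Locale spanishDevice = Locale.forLanguageTag("es-ES");
        System.out.println(localeForSpan("en", spanishDevice));
        System.out.println(localeForSpan("es-ES", spanishDevice));
    }
}
```

Comparing full locales rather than just the language part means an "en-GB" book on an "en-US" device would still be tagged, which matches the dialect concern raised at the top of this issue.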