
munch117 is right; it's a different letter.

Compare French é and è, different vowels with different spellings, to pinyin é and è, identical vowels with different diacritically-marked tones. The French vowels don't have diacritics any more than i and j do.

That's a graphemic definition of "letter". There is an alternative, collation order, but in Icelandic that will also tell you that o and ö are two different letters, and ö sorts after þ while o sorts before it.
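To make the collation point concrete, here is a minimal sketch in Python. It assumes the 32-letter Icelandic alphabet order and uses a hand-written sort key; the names `ICELANDIC_ALPHABET` and `icelandic_key` are made up for illustration. Real software would normally rely on a proper locale-aware collation library rather than this simplification.

```python
# Assumed alphabet order (32 letters); illustrative only, not a full collation algorithm.
ICELANDIC_ALPHABET = "aábdðeéfghiíjklmnoóprstuúvxyýþæö"
RANK = {ch: i for i, ch in enumerate(ICELANDIC_ALPHABET)}

def icelandic_key(word: str) -> list[int]:
    """Map each character to its position in the alphabet.

    Characters outside the alphabet sort after everything else.
    """
    return [RANK.get(ch, len(ICELANDIC_ALPHABET)) for ch in word.lower()]

words = ["öl", "þak", "ofn"]
print(sorted(words, key=icelandic_key))  # ['ofn', 'þak', 'öl']
print(sorted(words))                     # code point order: ['ofn', 'öl', 'þak']
```

With the alphabet-based key, o lands before þ and ö after it, whereas plain code point order puts ö before þ.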




The French definitely don't consider those to be different letters - they are e with an acute accent and a grave accent respectively; an accent is a type of diacritic mark. There's also a circumflex accent in French, and several other vowels can take accents. They even have two other diacritic marks, the trema (two dots) and the cedilla.

This is apparently different in Icelandic, where indeed ö is considered a separate letter, not an o with an umlaut/two dots. But this is simply a convention of Icelandic typography - German for example also uses ö in certain words, but they do consider it an o with an umlaut.


It depends on the language, and is ultimately down to the language authorities/users whether diacritical letters count separately or not.

Some examples (please correct me if I've mistaken anything):

(Modern) English: 26 letters; A-Z. ÆÐÞŒǷȜſꝛ are no longer letters, replaced with alternative letters or digraphs

French: 26 letters; A-Z. ÀÂÇÉÈÊËÎÏÔÙÛÜŸ are letters with diacritics. Æ and Œ are digraphs

German: 26 letters; A-Z. ÄÖÜ are letters with diacritics. ẞ is a ligature

Dutch: 26 letters; A-Z. IJ is a digraph

Spanish: 27 letters; A-Z + Ñ. LL is a digraph

Danish, Norwegian: 29 letters: A-Z + ÆØÅ

Swedish, Finnish: 29 letters; A-Z + ÅÄÖ. Finnish Š and Ž aren't letters and are replaced with digraphs

Icelandic: 32 letters; A-Z - CQWZ + ÁÉÍÓÚÝ + ÐÞÆÖ (all letters)

Hungarian: 44 letters; A-Z + ÁÉÍÓÖŐÚÜŰ + Cs, Dz, Dzs, Gy, Ly, Ny, Sz, Ty, Zs (all letters)


Also with Swedish, until 2006, W wasn't a letter but was just a variant of V, and sorted as if it was a V (so Ws were mixed in with the Vs in an index)
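A rough sketch of what that older ordering means for an index, in Python. The function name `w_as_v_key` and the sample names are made up for illustration, and the key ignores the rest of Swedish collation (Å, Ä, Ö and so on), so treat it as a simplification rather than a real Swedish sort.

```python
def w_as_v_key(word: str) -> str:
    """Sort key that treats W as a variant of V (simplified, illustrative)."""
    return word.casefold().replace("w", "v")

names = ["Wallin", "Vik", "Zorn", "Vagn"]

# W treated as V: the W entry is interleaved with the V entries.
print(sorted(names, key=w_as_v_key))  # ['Vagn', 'Wallin', 'Vik', 'Zorn']

# W as its own letter: the W entry follows all the V entries.
print(sorted(names))                  # ['Vagn', 'Vik', 'Wallin', 'Zorn']
```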


> This is apparently different in Icelandic, where indeed ö is considered a separate letter, not an o with an umlaut/two dots. But this is simply a convention of Icelandic typography

I mentioned both phenomena. On the fundamentals, the letters are separate; that is the conclusion you'd come to, for French éè or Icelandic oö, by studying the writing system. According to formal authority, the two French letters are one and the same. This is a mistake of fact, but that happens all the time. (Compare French aà, where they're formally identical letters and the fundamentals tell us that the formal statement is correct. You might also compare 20th-century Spanish, in which "ch" and "ll" were formal letters that collated differently from the sequences "c-h" and "l-l". Again that agreed with the facts of the language; I believe the collation order was changed for the benefit of computers.)

And as I mentioned and you agreed, in the case of Icelandic the facts on the ground and the formal statement by authority are in agreement, so there isn't a case to be made that o and ö might be considered the same letter.


> On the fundamentals, the letters are separate; that is the conclusion you'd come to, for French éè or Icelandic oö, by studying the writing system.

This is the part we most disagree on. In my opinion, there is no fundamental definition of a letter, or at least the fundamental definition is graphical, entirely unrelated to pronunciation. By my definition, é è ö are fundamentally combinations of diacritical marks applied to a base letter (e for the first two; o for the third).

And the reason I don't think there is any link between letters and their pronunciation is that letters are just not pronounced in a single way, regardless of any extra marks on them. In French in particular, e can be pronounced ə, such as in "sucre"; it can be pronounced œ such as in "me"; it can be pronounced "ɑ̃" such as in "Rouen"; it can be pronounced "e" such as in "chanter" (which is the same sound that é usually marks); or it can be pronounced as ɛ such as in "elle" (the first e). All of these are extremely common, not some obscure phonetics of one particular word.

Also, another reason we can see the accents are diacritical marks and not separate letters is the history of how they came about. French orthography has a bunch of rules about how different other letters modify the pronunciation of "e". The accents were added in those places where the regular rules aren't otherwise followed. So we spell the infinitive "chanter" and the second person indicative "chantez", because with the final "r" or "z" it's clear that the "e" is pronounced as "e", but we spell the participle "chanté" (pronounced identically) to distinguish it from "chante".
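As an aside on the graphical definition above: Unicode's canonical decomposition happens to model é, è and ö the same way, as a base letter followed by a combining mark. A minimal Python sketch using the standard `unicodedata` module (the helper names `decompose` and `base_letter` are made up for illustration):

```python
import unicodedata

def decompose(ch: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

def base_letter(ch: str) -> str:
    """Drop combining marks from the NFD form, keeping only base characters."""
    nfd = unicodedata.normalize("NFD", ch)
    return "".join(c for c in nfd if not unicodedata.combining(c))

for ch in "éèö":
    print(ch, decompose(ch), base_letter(ch))
# é ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT'] e
# è ['LATIN SMALL LETTER E', 'COMBINING GRAVE ACCENT'] e
# ö ['LATIN SMALL LETTER O', 'COMBINING DIAERESIS'] o

# Some letters have no decomposition at all and come back unchanged.
print(base_letter("ø"), base_letter("þ"))  # ø þ
```

This is an encoding convention rather than a ruling on what counts as a letter in any given language: the same decomposed ö is used for German, Swedish and Icelandic text alike.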


Even in German, mixing up o and ö or u and ü is still a spelling error. They sound very different, even if they are in some formal sense the same letter.


The real question is how Germans spell in contexts where it's not easy to add those diacritical marks. Do they omit them, or take the extra effort to put them in / replace them with some digraphs?

My own language has several diacritics that completely change the pronunciation (ă for schwa, â and î for a vowel that has no equivalent in English, ș for sh, ț for the zz from pizza), and is otherwise written very phonetically. Still, when writing in informal contexts on English keyboards, people generally just use a/i/s/t and rely on context to make it clear which word is meant. In old SMS writing, they would sometimes use sh and tz, but still a/i. So even though the diacritics fundamentally change pronunciation, and dropping them makes a few words indistinguishable, they are not critical to the understanding of a text and local readers don't fully require them.


From what I've observed, if they only get ASCII they write out the diacritics phonetically. As an example, https://en.wikipedia.org/wiki/Chris_Huelsbeck interchangeably uses Hülsbeck and Huelsbeck, but never Hulsbeck.
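That substitution is easy to sketch in Python. The table below assumes the usual German ASCII replacements (ä to ae, ö to oe, ü to ue, ß to ss); the names `GERMAN_ASCII` and `german_to_ascii` are made up for illustration, and edge cases such as all-caps words are ignored.

```python
# Assumed conventional German ASCII substitutions (illustrative table).
GERMAN_ASCII = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def german_to_ascii(text: str) -> str:
    """Replace umlauts and ß with their digraph spellings."""
    return text.translate(GERMAN_ASCII)

print(german_to_ascii("Hülsbeck"))  # Huelsbeck
```

Simply stripping the diacritic would give Hulsbeck instead, which is the spelling that doesn't show up.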



