I have studied Japanese, and still think that a logographic writing system was a mistake. Consider the time and effort it takes for native speakers to become literate.
I also think that the Latin alphabet could easily be used for Japanese, which contains no sounds without an obvious equivalent in English. And even if it did, we could always repurpose a character or sequence of characters for that sound (do we really need a 'c'?).
Having said that, the Japanese phonetic system writes voiced sounds as a modification of their unvoiced counterparts. Why can't we all do that?
The biggest risk of using Latin is that simply sharing an alphabet could cause spelling conventions of other languages to bleed in.
Native speakers seem to do fine. Learning the language while growing up, with hiragana as a helper and all your media written in Japanese, makes everything easier. By the time they finish school, they know enough Japanese to get by. It's obviously different for non-native speakers.
Also, it's not like you stop learning after school. For example, according to the Oxford English Dictionary, English has 171,476 words in current use, excluding inflections, plus several technical and regional vocabularies. Do all English university students know these words?
Logographic systems have some major disadvantages:
• It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).
• Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.
• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.
The only aspect in which logographic systems win is information density. You can fit more words on a single page. This is obvious if you've ever seen Chinese or Japanese copies of works that were originally written in English. The Harry Potter books are crazy thin. Also, Chinese and Japanese tweets can express a paragraph of information.
> It's possible to know how to say a word, but have no clue how to write it.
> Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it.
As a second language learner of English I can attest that this is not just a problem of languages written in logographic systems:-)
>The only aspect in which logographic systems win is information density.
I vaguely remember a paper claiming that information density is pretty much constant across languages and writing systems, but I couldn't find it just now. There is another thread on HN [1]
where people compared the size of "Universal Declaration of Human Rights" in different languages. I think this misses the point because it doesn't account for intra-character information density.
It'd be much more interesting to render the text into a bitmap and then compare compressed bitmap sizes.
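That comparison can be sketched in a few lines. This is a minimal sketch, assuming you have the Pillow package installed; note the default PIL font only covers Latin glyphs, so comparing against Chinese or Japanese text would require pointing it at a CJK-capable TTF (e.g. a Noto CJK font, an assumption about your system):

```python
import zlib
from PIL import Image, ImageDraw, ImageFont  # requires the Pillow package

def rendered_size(text, font, width=800, height=40):
    """Render text onto a 1-bit bitmap and return its zlib-compressed size.

    A smaller compressed output roughly means less visual information
    occupying the same area of "page".
    """
    img = Image.new("1", (width, height), 1)            # white canvas
    ImageDraw.Draw(img).text((2, 2), text, font=font, fill=0)
    return len(zlib.compress(img.tobytes(), 9))

# With a CJK-capable font, you could compare an English sentence against
# its translation at equal point size; here, just the bundled font:
font = ImageFont.load_default()
print(rendered_size("The quick brown fox jumps over the lazy dog", font))
```

This still leaves judgment calls (equal point size vs. equal x-height, which compressor), but it would at least capture intra-character density in a way that byte counts can't.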
People like to joke about English spelling, but see farther down-thread for examples of how bad things are in logographic systems. Even native-speaking PhDs can forget how to write words like "sneeze" or "toad". It's a failure mode that simply doesn't exist in phonetic languages (even ones as imperfect as English).
Sorry if it wasn't clear, but by "information density" I meant area on a page or screen, not digital bytes. In the thread you linked to, people correctly point out that digital information density depends on encoding, and that compression schemes matter far more than the language itself.
The paper you're probably thinking of is A Cross-Language Perspective on Speech Information Rate[1][2], which (as the title indicates) studied spoken language, not written. Annoyingly, the study was widely misrepresented in the media. It found that languages with lower information density tended to have higher syllabic rates. That is: Spanish contained less information per syllable than English or Mandarin, but Spanish speakers spoke faster to make up for that. Most media summaries of the paper omitted an important finding: the compensations didn't balance out. Different languages had different information rates. In the study, English had the highest. The runner-up (French) was 10% slower. And Japanese was 30% slower at conveying information.
>Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).
You can certainly write things out in kana. When I was more serious about studying Japanese, I knew less than 1000 kanji, but had a vocabulary several times that size, and would at times write out the word I meant in hiragana. And if we're counting autocorrect, your IME is going to take that hiragana and let you find the character.
>• Looking up words is harder, as there are no "letters" to sort by. Sorting can be done by stroke count, by radical (four corners or SKIP), or by phonetic spelling (in pinyin or hiragana). Modern technology has made this easier, and some phone apps (like Pleco) can even OCR hanzi. Still, it's far less convenient than phonetic languages.
Eh, I disagree here. It's harder if you're used to looking things up by spelling, but once you're fast at looking things up by radical, it's not that difficult. My misguided attempts at slogging through 1Q84 while reading at, at best, a middle school level got me pretty fast at looking up kanji. No appreciable difference vs. looking things up in a regular dictionary.
You cannot write things out in Kana in Chinese. As such, GP's point against logographic writing systems stands, notwithstanding mixed writing systems such as Japanese.
Even without autocorrect, you can misspell a word in English such that most people would still understand it. Of course, in a logographic system you'd just write a homophone (which is what people actually do: write a simpler character pronounced the same).
As for looking things up, the alphabet is easier in principle: you only need to learn the order of about 26 symbols, not about 200 radicals, and can then run an iterative binary search over it, without ever switching to stroke counts. Looking things up by radical is possible, of course.
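The binary-search point is easy to make concrete. A minimal sketch with a toy word list (the words here are just placeholders):

```python
import bisect

# A sorted "dictionary": alphabetical order over ~26 letters is the only
# structure binary search needs.
words = sorted(["kana", "ocean", "radical", "sneeze", "stroke", "toad"])

def lookup(word):
    """Iterative binary search, via the stdlib bisect module."""
    i = bisect.bisect_left(words, word)
    return i < len(words) and words[i] == word

print(lookup("sneeze"))  # True
print(lookup("hanzi"))   # False
```

Each probe halves the remaining range, so even a 171,476-word dictionary needs at most about 18 comparisons.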
Some upper and lower case letters bear no clear resemblance to each other (see Aa, Rr, Gg, Nn), so one has to learn 52 symbols. Add another 52 symbols for cursive, if you must. Then, in the case of English, learn how to pronounce and spell words, because in some cases there are no rules (why "ocean" and not "oshean"? Because of derivation from Greek, but still...)
Anyway, any alphabet is better than Chinese characters.
>• It's possible to know how to say a word, but have no clue how to write it. This phenomenon is called character amnesia, and it affects most native speakers.[1] Phonetic languages allow you to write out a misspelled word, which readers can understand (or autocorrect can fix).
>
>• Likewise, it's possible to know what a symbol means, but have no idea how to pronounce it. This is extra-fun in Japanese, where most kanji have multiple pronunciations.
I don't think English is much better in these cases. In fact, the writing can be so divorced from speech that spelling bees are a thing.
I've had Chinese colleagues who, when asked to write a word they'd just used in a sentence, were simply unable to. At first I thought they were playing a joke on me. But nope, they'd just forgotten the appropriate hanzi, and they couldn't even hazard a guess. It's a totally different failure mode than imperfectly-phonetic languages like English.
> I was once at a luncheon with three Ph.D. students in the Chinese Department at Peking University, all native Chinese (one from Hong Kong). I happened to have a cold that day, and was trying to write a brief note to a friend canceling an appointment that day. I found that I couldn't remember how to write the character 嚔, as in da penti 打喷嚔 "to sneeze". I asked my three friends how to write the character, and to my surprise, all three of them simply shrugged in sheepish embarrassment. Not one of them could correctly produce the character. Now, Peking University is usually considered the "Harvard of China". Can you imagine three Ph.D. students in English at Harvard forgetting how to write the English word "sneeze"?? Yet this state of affairs is by no means uncommon in China. English is simply orders of magnitude easier to write and remember. No matter how low-frequency the word is, or how unorthodox the spelling, the English speaker can always come up with something, simply because there has to be some correspondence between sound and spelling.
To be fair, you can also "come up with something" in Chinese. Since there aren't all that many sounds, you can write in generic characters for the sound of the word that you can't remember.
Yep. The analogy I use is, it's a bit like if someone walked up and asked you to draw the logo of this or that company. Even if you've seen the logo a million times, you might not be able to summon up a mental picture of it, or you might remember the rough shape but have no idea how many lines go where.
Same here - and strangely enough, it's rarely a problem. Faking characters by using the correct radical and a random homophone base character works okay in a pinch.
But because I never write characters by hand, I have a really hard time reading handwritten notes, and that is a problem.
> For example English has according to the Oxford dictionary 171,476 words in current use excluding inflections, and several technical and regional vocabularies.
Here is a website that quizzes you with a random sample of words from an English dictionary, mixed with randomly generated non-words. It then estimates the percentage of English words you know.
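Presumably it uses the non-words to correct for guessing. A sketch of one plausible scoring rule (an assumption on my part, not necessarily the site's actual formula):

```python
def vocab_estimate(yes_real, n_real, yes_fake, n_fake):
    """Hit rate corrected for guessing via the false-alarm rate on non-words.

    Saying yes to fake words suggests some of your yeses on real words
    were guesses too, so those get discounted proportionally.
    """
    hit = yes_real / n_real
    fa = yes_fake / n_fake
    if fa >= 1.0:
        return 0.0
    return max(0.0, (hit - fa) / (1.0 - fa))

print(vocab_estimate(77, 100, 0, 40))   # no false alarms: score is the raw hit rate
print(vocab_estimate(50, 100, 10, 40))  # false alarms pull the estimate down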
I'm curious: did you only answer yes to the words whose meanings you knew, or to anything you knew was indeed a word? There were some that were pretty obviously words, but I wasn't certain of the exact meaning (although I could guess), so I answered no. Ended up with 77% (as a native speaker). Apparently the average for native speakers is 67%, so 77-89 as a non-native speaker sounds really good.
I just did it, and I answered yes to words I knew, or knew were actual words even if I didn't know the exact meaning. Like "argon": I know it is something related to chemistry, but I don't actually know what it is. Some were compound words that I'm not sure would be in a dictionary, but are still valid words.
I got 73% and I didn't say 'yes' to any fake words.
73% is apparently "This is a high level for a native speaker."
Sure, I didn't mean to suggest it can't be done in short spurts. But reading a novel that way would be hellish.
The larger point being, Japanese isn't locked into using a logographic system - it already has two phonetic syllabaries that people could start using exclusively if there were some advantage to doing so.
That sounds like an absolutely miserable experience. I'd rather be forced to look up every 3rd or 4th kanji than try to deal with all hiragana writing.
> I also do not think that the Latin alphabet could be easily used for Japanese, [...]
You stuck an extra “do not” in your sentence
* * *
As far as alphabets go, the Phoenician/Greek/Etruscan/Latin alphabet is pretty ad hoc and mediocre. But hey, it’s what we know. At this point, I think we’re stuck with it.
Similar story for modern Hindu/Arabic/European numeral glyphs. Learning arithmetic would be noticeably simpler if the glyphs expressed some of the symmetries of the number system. Alas.
As far as the alphabet itself goes, I do not think that Latin is that bad. All symbols have a canonical sound associated with them. The problem is that our usage of the alphabet is horribly inconsistent. This is partially due to the fact that English has sounds that cannot be expressed using the "pure" alphabet. Arguably Japanese has this same problem in their system, with the ゃ、ょ、ゅ modifiers. But at least they distinguish those from や、よ、ゆ by size, and are disciplined about their usage, so we can consider the set of compounds to be their own characters and not have a mess.
Of course you still have the ず/づ issue, and the pronunciation of は and を as わ and お in their most common usage. But, even in modern Japanese, these oddities are not universal.
Out of curiosity, are you aware of any numeral system that beats Arabic? By pre-Arabic European standards, Arabic numerals are a masterpiece of symmetry.
It can also be nice to use a “balanced base”, with digits for negative numbers, e.g. in a base ten context you’d have digits for –4 to 5 (or if you’re willing to have multiple expressions for the same number, –5 to 5).
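A balanced base is straightforward to pin down in code. A minimal sketch for base ten with digits –4 to 5, the unique-representation variant mentioned above:

```python
def to_balanced_base10(n):
    """Digits of n in balanced base ten, most significant first.

    Each digit lies in -4..5, which makes the representation unique.
    """
    digits = []
    while n:
        d = ((n + 4) % 10) - 4   # fold the remainder into -4..5
        digits.append(d)
        n = (n - d) // 10
    return digits[::-1] or [0]

print(to_balanced_base10(7))   # [1, -3], i.e. 10 - 3
print(to_balanced_base10(-7))  # [-1, 3], i.e. -10 + 3
```

One nicety of such a system: subtraction becomes digit-wise negation plus addition, with no separate borrowing procedure.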
> As far as alphabets go, the Phoenician/Greek/Etruscan/Latin alphabet is pretty ad hoc and mediocre. But hey, it’s what we know. At this point, I think we’re stuck with it.
You mix the whole development line of the Latin alphabet into one dismissive argument. I see lots of differences between the Phoenician and the Latin alphabets, and FWIW, the Latin alphabet is quite versatile, as its wide adoption shows.
I wonder, what do you consider mediocre about them?
> Similar story for modern Hindu/Arabic/European numeral glyphs. Learning arithmetic would be noticeably simpler if the glyphs expressed some of the symmetries of the number system. Alas.
I don't think learning arithmetic would be much simpler with other numerals. Even the Romans could do it and they had one of the worst possible numerical systems.
I find our numerals quite fine. My daughter was recognizing numbers before she turned 2. There is some mnemonic to the first four (1 line, 2 corners on the left, 3 corners on the left, 4 corners overall) and most are quite distinct from our Latin letters. 6 and 9 are annoyingly symmetrical of each other, though.
Writing a less dismissive / more serious argument about the Latin alphabet would take a few hundred pages. You’re right though, I’m not a speaker of (or expert in) ancient Phoenician, perhaps their alphabet was a bit better structured for that language (it looks pretty ad hoc though). I can primarily speak to the Latin alphabet’s irregularity and mediocrity for representing modern English/Spanish/etc., though it doesn’t seem to have been much better for Greek or Latin. Obviously it works well enough to be the practical anchor for written culture, and I can certainly imagine worse systems (little Egyptian-style pictographs for letters for example). But it’s hardly elegant or systematic. The ordering of the letters is also pretty much arbitrary, and has nothing to do with the separation between consonants and vowels, or the relationship between particular sounds.
For an example of a better designed alphabet, check out Korean Hangul.
* * *
The numerals 1, 2, 3 come from just writing strokes, like tally marks, which over time became connected in handwriting. The other numbers were mostly fairly arbitrary symbols, which morphed slowly over time with occasional replacements and swaps. Otherwise, the symbols have absolutely nothing to do with the numbers they represent or with the base ten number system. Overall, I’d say numbers 0 and 1 are pretty effective. The rest are a huge waste of potential.
Same story for the words/names used to represent the numbers. They are made of arbitrary sounds in arbitrary numbers of syllables, reveal nothing about the theoretical properties of the numbers, some of them are hard to say or easy to mistake, etc. Especially for numbers beyond ten, the names are irregular and confusing. This has a real practical impact: counting is notably easier for Chinese-speaking children than for English speakers.
> I don't think learning arithmetic would be much simpler with other numerals. Even the Romans could do it and they had one of the worst possible numerical systems.
In general, Romans did their arithmetic with little pebbles (“calculi”) on a counting board (“abacus”), and used written symbols only for recording the outputs of their calculations. This made some types of computation very difficult (recording every step with pebbles gets cumbersome), which helps explain why science took off in Europe over the past 500 years, after we started developing better notational conventions and using Hindu–Arabic numerals and, later, decimal fractions, logarithms, etc.
My son is about 2 weeks old, so I can’t tell you yet how well he learns arithmetic using a different set of numerals. Ask me again in about 10 years.
By that measure, we should forget about historical languages and learn something constructed, like Esperanto.
Languages are not solely a means of communication but part of a people's cultural identity. I think the greater dependence on contextual cues and ambiguity in Chinese/Japanese lends itself much better to linguistic art forms like poetry and literature.