Hacker News

Taiwan's language is traditional Mandarin, which, in the context of a text site like Wikipedia, matters a lot. Using China's site would mean being forced to read simplified Chinese. There are other language differences as well that would leave a Taiwanese reader very confused if they had to read a China-based Wikipedia. Really, the only sensible solution is for Taiwan to have its own chapter.



Chapters are separate from which languages have a site. (E.g. there exist Wikimedia Canada, Wikimedia NYC, Wikimedia Cascadia, etc. That is unrelated to there being an English-language Wikipedia.)

http://zh.wikipedia.org uses some weird automatic conversion to try to serve both languages.


The most popular language in Taiwan is Mandarin, and they use the Traditional characters, but it’s the same Mandarin, albeit with small, regional differences like pronouncing 和 as hàn instead of Northern Chinese hé. Additionally, there are other official Taiwanese languages like Taiwan Min Nan and various Formosan languages.

Pedantry out of the way, you could make a case that Chinese Wikipedia is in the same language and you can change the character set pretty much interchangeably by machine, but I wouldn’t want to see China get its claws on the only ZH-language Wikipedia either.


My view is rather different. Written Chinese does not "belong" to Mandarin or any other spoken Chinese language, but is its own entity.

While the mainland-China version of simplified Chinese is based around Mandarin, written Chinese is most definitely not Mandarin. 裏 is not 里. 只 is not 隻. 后 is not 後. They might sound the same in modern Mandarin but that is where the similarity ends. Those are characters with entirely different meanings and often different pronunciations in various forms of Chinese.

I believe that alone makes the assertion that traditional Chinese is "just another character set" as controversial as calling English "just another Newspeak".


You're absolutely right. Written Chinese is called Standard Chinese on the mainland, and it is "Chinese" whether you speak Cantonese, Mandarin, or Shanghainese. That fact blew my mind when I learned it. Written Chinese can be understood by people who speak mutually unintelligible spoken languages. And it's not just that it _can be_ understood: they share the same written language. There are informal written versions of regional languages, but that's a rabbit hole.

My point was really not to get too deep into the pedantry of it, but simply that Taiwan and China do share the same written language with a different character set, because you can generally replace the characters without substantial grammatical or vocabulary changes. It's not in the same ballpark as equating Spanish, Catalan, Languedoc, and French. There are vocabulary variations, like saying 哪兒 Nǎ'er and 哪裡 Nǎlǐ (both mean "where?"), but you can also find vocabulary variations between British and American English. IMO, the Trad/Simplified difference is more akin to American English deciding to drop a lot of U's in words like "colour."

My point was that you could say they share the same written language, so it makes a lot of sense to have one ZH-language Wikipedia and use machine conversion to present it in the character set preferred by the reader (as it does currently). And that there are plenty of good reasons for Taiwan to have its own Wikimedia organization, because there are non-ZH languages like Min Nan that deserve to be represented.

Edit for the sake of illustration: take the sentence "I like this cat." It's read "Wǒ xǐhuān zhè zhī māo" whether you're reading it in Traditional or Simplified characters. There may be some North vs. South pronunciation differences that are beyond my level so far, but the characters (notably 只 that you mentioned, written 隻 in Traditional) have the same meaning and sound.

我喜歡這隻貓

我喜欢这只猫
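To make the character-by-character nature of this example concrete, here is a minimal Python sketch. The table `T2S` and the function `to_simplified` are hypothetical and cover only the four characters that differ in this sentence; a real converter needs far larger tables and, as discussed further down the thread, context for the ambiguous cases.

```python
# Hypothetical, hand-picked Traditional -> Simplified table covering only the
# characters that differ in the example sentence.
T2S = {
    "歡": "欢",  # huān, as in 喜歡 "to like"
    "這": "这",  # zhè, "this"
    "隻": "只",  # zhī, measure word for animals
    "貓": "猫",  # māo, "cat"
}

def to_simplified(text: str) -> str:
    """Replace each character that has an entry in T2S; leave the rest alone."""
    return "".join(T2S.get(ch, ch) for ch in text)

print(to_simplified("我喜歡這隻貓"))  # 我喜欢这只猫
```

Here a plain per-character lookup is enough because every character in the sentence has exactly one Simplified counterpart.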


> I believe that alone makes the assertion that traditional Chinese is "just another character set" as controversial as calling English "just another Newspeak".

I strongly disagree. The simplified/traditional split is not the first time new character sets have been introduced. This has happened many times throughout the history of written Chinese. E.g. ever wonder why things that have to do with body parts and organs have a moon radical associated with them? That's because of a conflation of two separate radicals (moon radical and meat radical) that were distinct in Qin Dynasty seal script. And yet this doesn't mean that our Classical Chinese works are all of a sudden written in a different language when written in modern standard script or Han Dynasty clerical script or even seal script (which in turn is definitely _not_ the original script that these works were written in).

Moreover, the vast, vast majority of differences between simplified and traditional Chinese characters are one-to-one mappings between different characters. While simplified Chinese sometimes maps multiple traditional characters to a single simplified one, the reverse direction also happens! See e.g. 乾, which is split into 干 and 乾 depending on meaning and pronunciation.

> Those are characters with entirely different meanings and often different pronunciations in various forms of Chinese.

The fact that a single character can have different pronunciations and different meanings which can coincide or not among different varieties of Chinese has endured for as long as we've had written records of non-standard Chinese varieties. This is not something new with simplified characters.

At a higher level, there has been a split in Chinese between officially sanctioned characters and non-standard variant characters since the first unification of characters under the Qin dynasty (which characters are sanctioned has changed over time). Traditional Chinese and simplified Chinese characters are simply changes in which characters receive official sanction.

Basically at the end of the day the simplified/traditional split is nothing new in Chinese, it's happened many times before and it's very easy to learn one if one knows the other.


> E.g. ever wonder why things that have to do with body parts and organs have a moon radical associated with them? That's because of a conflation of two separate radicals (moon radical and meat radical) that were distinct in Qin Dynasty seal script.

They're distinct in modern Chinese too, if you look closely.

(And if you use a Taiwanese font.)


Indeed it is! That's actually a later reinvention by modern fonts rather than the preservation of a continuous tradition, and it isn't consistent, because it can't be: there are multiple potential ancestors of 月 (e.g. 朋, which doesn't preserve its seal script roots in any modern font I know of).


> Pedantry out of the way, You could make a case that the Wikipedia is the same language and you can change the character set pretty much interchangeably by machine

This is absolutely not true. Even ignoring idiomatic differences (which are comparable to UK vs US English), the conversion from Traditional -> Simplified is a one-way function. The mapping between characters is not one-to-one: there are a bunch of traditional characters that map to the same simplified character, and some words that are dropped entirely. You can't up-convert from Simplified -> Traditional without guessing.
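A small sketch of why the reverse direction requires guessing: inverting a many-to-one Traditional -> Simplified table leaves some Simplified characters with several Traditional candidates. The table below is a hypothetical excerpt chosen for illustration, not taken from any real converter.

```python
from collections import defaultdict

# Hypothetical excerpt of a Traditional -> Simplified character table.
# Several distinct Traditional characters collapse onto one Simplified form.
T2S = {
    "後": "后", "后": "后",  # "after/behind" and "empress"
    "髮": "发", "發": "发",  # "hair" and "to emit/issue"
    "裡": "里", "里": "里",  # "inside" and the unit/village "li"
    "貓": "猫",              # "cat": a genuine one-to-one pair
}

def invert(table: dict[str, str]) -> dict[str, list[str]]:
    """Build a Simplified -> list of candidate Traditional characters map."""
    inverse: dict[str, list[str]] = defaultdict(list)
    for trad, simp in table.items():
        inverse[simp].append(trad)
    return dict(inverse)

S2T = invert(T2S)
for simp, candidates in S2T.items():
    status = "unambiguous" if len(candidates) == 1 else "must guess"
    print(simp, candidates, status)
```

Any Simplified character with more than one candidate cannot be converted back from the character alone.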

It's like saying we should just use simple.wikipedia.org for the English language and then use a thesaurus on the client side to auto-convert text to use big words for en.wikipedia.org. It ain't going to be readable.


Your analogy is not accurate. The Simple English Wikipedia is for ESL readers who are still learning English. The usage and grammar are different. A closer (but still imperfect) analogy is the "simplified" spelling of American English vs. British, where we dropped a lot of U's or shortened "tonnes" to "tons."

The mapping is not one-to-one in either direction. There are simplified characters that collide when mapped to traditional and vice versa. In practice this isn't a problem, though, because despite what many people think, not every Chinese word is a single character. It is possible to use the context from surrounding characters to map them back and forth accurately. I'm sure it's not perfect, but for the general case it works. It's not hypothetical either; Wikipedia does this currently.
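One way to use that context is greedy longest-match lookup: try multi-character words from a phrase table first, and fall back to a per-character default only when no word matches. This is a minimal sketch of that idea, assuming tiny hypothetical tables (`PHRASES`, `CHARS`); it is not Wikipedia's actual implementation.

```python
# Hypothetical phrase table: multi-character words pin down the right
# Traditional character for an ambiguous Simplified one.
PHRASES = {
    "头发": "頭髮",  # "hair": 发 -> 髮
    "发现": "發現",  # "to discover": 发 -> 發
    "以后": "以後",  # "afterwards": 后 -> 後
    "皇后": "皇后",  # "empress": 后 stays 后
}
# Hypothetical per-character defaults, used only when no phrase matches.
CHARS = {"头": "頭", "发": "發", "现": "現", "后": "後", "见": "見"}

MAX_LEN = max(len(p) for p in PHRASES)

def to_traditional(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        # Try the longest phrase starting at position i first (length >= 2).
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the per-character default.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)

print(to_traditional("我发现头发"))  # 我發現頭髮
print(to_traditional("以后见皇后"))  # 以後見皇后
```

The same 发 and 后 come out differently depending on the word they sit in, which is the "context from surrounding characters" described above.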


It’s extremely far from perfect, and I know of no native speakers who use that automatic mapping. It’s like trying to read something through Google Translate: the point gets across, but the errors distract.

And it IS like the simple -> en example I gave, at times, because there are more differences than just character usage. It also affects word choice and idioms. Even when the automatic translation picks the right character, it can still come off as... weird.

That’s why native speakers don’t use these features. It puts the text into an uncanny valley that is annoying to read.


Simplified and traditional characters are in perfect correspondence, right? Couldn't you translate them on the fly?


Chinese Wikipedia https://zh.wikipedia.org automatically transliterates into Mainland Simplified, Hong Kong Traditional, Macau Traditional, Malaysia Simplified, Singapore Simplified and Taiwan Traditional. The article text can be in any variant (and is usually a mixture after getting edited by people using different standards) and if you visit an untransliterated page, you'll be prompted for your preference.

There's no simple one-to-one correspondence for all characters, but Wikipedia has multiple layers of special cases and exceptions that can cover most situations (including vocabulary differences). That doesn't mean the text always makes sense after transliteration: The article about the Taiwanese township of Shuili mentions that it was originally named 水裡, but then renamed to 水里 in 1966. At least in the Taiwanese Traditional version https://zh.wikipedia.org/zh-tw/%E6%B0%B4%E9%87%8C%E9%84%89 . If you read the Mainland Simplified one ( https://zh.wikipedia.org/zh-cn/%E6%B0%B4%E9%87%8C%E9%84%89 ) both of these names get simplified to 水里.
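The "layers" can be sketched as a word-level vocabulary override applied before a character-level table. Everything here (`LAYERS`, `convert`, and the table contents) is hypothetical and far smaller than what MediaWiki actually uses, but it reproduces the Shuili problem: once both spellings are converted to Simplified, the distinction is gone.

```python
# Hypothetical conversion layers for two target variants.
# "words": whole-word vocabulary overrides (regional word choice).
# "chars": per-character script conversion, applied afterwards.
LAYERS = {
    "zh-cn": {
        "words": {"軟體": "软件", "資訊": "信息"},
        "chars": {"裡": "里", "鄉": "乡", "軟": "软", "體": "体",
                  "資": "资", "訊": "讯"},
    },
    "zh-tw": {
        "words": {"软件": "軟體", "信息": "資訊"},
        "chars": {"软": "軟", "体": "體", "资": "資", "讯": "訊"},
    },
}

def convert(text: str, variant: str) -> str:
    layer = LAYERS[variant]
    # Apply longer word overrides first so they win over character rules.
    for src in sorted(layer["words"], key=len, reverse=True):
        text = text.replace(src, layer["words"][src])
    # Then convert any remaining characters one by one.
    return "".join(layer["chars"].get(ch, ch) for ch in text)

print(convert("软件", "zh-tw"))      # 軟體 (vocabulary, not just script)
print(convert("水裡鄉", "zh-cn"))    # 水里乡
print(convert("水里鄉", "zh-cn"))    # 水里乡: the 1966 rename is no longer visible
```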


I've never really bought into the need for multiple layers of special cases and exceptions. For example, British and American English also have many of the same kinds of differences as simplified and traditional Chinese characters; e.g. British and American English both agree on the verb "to curb" but disagree on "kerb" vs. "curb" for the noun. Yet AFAIK Wikipedia doesn't have a similar system that tries to convert from British English to American English or vice versa, handling all these special cases beyond just the normal spelling differences.

Taiwanese Mandarin is really not that different from PRC Mandarin. Most college-educated mainland speakers can read works written in traditional characters just fine, although I'm not sure about the Taiwanese side for simplified characters. In fact, I think a substantial proportion (most?) of Chinese Wikipedia readers just don't bother changing the character set either way and are fine with traditional/simplified switches throughout an article (that's certainly how I read it).


I wanted to find out whether there are any statistics on how this feature is used, but only found this Phabricator ticket (https://phabricator.wikimedia.org/T227904). It looks like they've put research into user needs in the backlog for now.


> Simplified and traditional characters are in perfect correspondence right?

No, there is no perfect correspondence between simplified and traditional Chinese. Simplified Chinese collapses important distinctions, such as 後 ("after", "back") and 后 ("empress"), often causing confusion (you can read more here [1]).

[1]: http://pages.ucsd.edu/~dkjordan/chin/SimplifiedCharacters.ht...


Not only that, but there are a couple of characters that are collapsed in traditional but split out in simplified:

"乾" in Simplified refers only to one of the 8 trigrams, as used in the word "乾坤"

"乾" in Traditional can mean either the above OR the Simplified "干" (dry)

"干" also exists in Traditional, but only means "stem", not "dry"
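Because 乾 splits by meaning, a Traditional -> Simplified converter has to look at the surrounding word. A minimal sketch, assuming (hypothetically) that 乾坤 is the only word in which 乾 should be kept and that every other 乾 means "dry":

```python
PROTECTED = "乾坤"  # hypothetical: the one word where 乾 stays 乾 in Simplified

def qian_to_simplified(text: str) -> str:
    # Split around the protected word so the 乾 inside it is left untouched,
    # then map every remaining 乾 to 干 ("dry") and stitch the text back.
    parts = text.split(PROTECTED)
    return PROTECTED.join(p.replace("乾", "干") for p in parts)

print(qian_to_simplified("乾燥"))            # 干燥
print(qian_to_simplified("衣服乾了，乾坤大"))  # 衣服干了，乾坤大
```

The reverse direction is harder still: a Simplified 干 would also have to be told apart from the Traditional "stem" sense, which this sketch does not attempt.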

However, GP's idea of translating on the fly is workable. You don't need a very advanced algorithm to convert simplified<->traditional almost perfectly. Unless you're translating poetry, the ambiguous characters can almost always be disambiguated very well by the nearby characters.

Note, though, that there are lots of actual word differences between mainland Mandarin and Taiwanese Mandarin. It's not difficult for speakers of one to read the other, maybe occasionally asking a question or two, but it's nice for everyone to have materials available in their native dialect and vocabulary. It's much like how you probably appreciate that your system offers both UK English and US English and doesn't force you to use the one you're not used to.


As a non-native English speaker, I actually hate that there is a split.


You should check out Singaporean English or "Singlish", which uses a lot of Chinese grammar and Hokkien and Malay vocabulary in English sentences. It's a wonderfully efficient dialect taking the best from all of these languages.


I heard it spoken when I was in Singapore. It was pretty fun, but I'm glad it isn't given equal status, and that we didn't have to learn any in school.


No. They are not in perfect correspondence. The mapping is one-to-many or many-to-one for quite a few characters, some of them commonly used. https://en.m.wikipedia.org/wiki/Ambiguities_in_Chinese_chara...
