Hacker News
The long process of creating a Chinese font (qz.com)
153 points by f14ist on Dec 18, 2015 | 54 comments



I've always hated the mingti--serif comparison. Mingti looks too artificial and beholden to technology (woodblock printing) for that to hold up, and kaiti is likewise too calligraphic and human to fit the bill. With both of those constrained by their original medium enough to count as skeuomorphism (call me out on bias as a Westerner or non-mason, but serifs don't evoke stone-inscribing as obviously to me), I was about to give up and say there is no serif analog.

But what font is this? https://qzprod.files.wordpress.com/2015/10/yan-rad1.png (from the article). It is definitely not JinXuan, and I think it is the "most serif" Chinese font I've seen. It's definitely a Kaiti first and foremost, which I consider a necessary traditionalism for this analogy. Yet the general boxiness of the strokes, especially the cusps on the corners of the boxes/kou3, defies the practicalities of brush strokes (e.g. harder to do tangency-breaks) and evokes the "cuspiness" of serifs.


The shown one is some variant of Weibei (魏碑).


Thanks!


An interesting article. I am very glad for the recent flourishing of CJK fonts, both paid and free (see e.g. Noto fonts including Noto CJK[1])

Some feedback:

> Here an Arphic edit suggests aligning the bottom of the character 磋 with its top part, writing in red ink, “don’t shift right.” The character, as it happens, is cuo, and means “error.”

1. The character in the image is 蹉, not 磋.

2. The red edit text says 下偏右 = "the bottom is shifted right", not 不偏右 = "don't shift right".

[1] https://www.google.com/get/noto/help/cjk/


Author here. Thanks for the comments, I have fixed both points. The first one I assure you was just a typo, the second, I read too quickly and could have sworn there was another stroke there originally...


> I read too quickly and could have sworn there was another stroke there originally

Funny how the meaning ended up basically the same either way.

I'm reminded of a situation where I made a similar mistake (misread one character for another) but the meaning ended up the opposite: There was a hotel room listing that noted 设有电视 ("equipped with TV"). At the time simplified hanzi were still a bit new to me, so I misread ⻈ as ⺡ and thus in my mind it became 没有电视 ("has no TV").

I thought this was an odd thing to advertise, but that maybe it was simply a cultural difference.


As a Chinese I would say "带电视" or "配有电视", rather than "设有电视".


How about comparing with Japanese fonts? I worked in branding in China. I was told many modern inspirations are from Japanese work. IMHO as a Chinese, they generally have much better taste for art in China.


Most Chinese fonts contain more Han glyphs than Japanese ones, especially traditional Chinese fonts. Most traditional Chinese fonts have over 10,000 glyphs, while simplified ones have about 7,000. Many foundries will create an even larger one, with over 22,000 glyphs in a single typeface.


If starting from scratch, yes, it could be very different and requires much more work. However, in many cases, I highly doubt this is what happened.


I know nothing of these subjects so I ask: is there any advantage to the typography and script system that is simplified Chinese vs a small alphabet a la Latin or Cyrillic?


Hello, I'm the author of this piece. It's a very good question, and the answer may simply be that script systems are inferior, but anecdotally I would say there are two advantages:

First, it makes the etymology of the script very apparent. Etymology in English, for example, is often very obscure, and requires great leaps of imagination and inference to make the connections. Compare that to the character 灣 referred to in the piece, which means "bay" and contains the "water radical." The etymology can be made more clear in this way.

Second, the script is agnostic to how the characters are pronounced. This is what has allowed it to be used for several languages in China (often inaccurately referred to as "dialects")—which are often pronounced completely differently—for hundreds of years.

That said, there are clearly many, many disadvantages, and the main thing preventing change may simply be inertia.


The "etymology" you speak of (and is often used in a Chinese context) has nothing to do with actual etymology. The origin of characters is barely related at all to the origins of words.

The word 灣 wān (= bay, cove) might be related to 彎 wān (= curve, bend) but the character doesn't tell us that; it's certainly not related to 水 shuǐ (= water) which appears in 灣 as 氵.

wān also provides an excellent example of where the "character etymology" definitely isn't the actual origin of the word. 臺灣 Táiwān (= Taiwan) is made of characters meaning "terrace" and "cove", so you might think aha, Taiwan has a purely Chinese etymology from "Terrace Cove", but in fact it's unrelated: it's from Siraya (an indigenous Taiwanese language) Tay-uan (= sea people).


It can also, as my Japanese textbook pointed out, be faster to read if you're familiar with the characters in a body of text. Like the difference between reading "one hundred forty-three" vs "143". It's the input that kills you.

But I think computer/smartphone semi-phonetic input kind of gets you the best of both worlds.


Experienced readers of alphabetic languages recognise words by pattern matching their shape. That is, you do not individually decipher the characters that make up a word, but you pattern match on (mostly) the ascenders and descenders, and then maybe sanity check first and last characters.

That greatly speeds up reading, but also makes it hard to discover typos, in particular characters inside of words that have no as-/descenders.

In this way, alphabetic writing is maybe more accessible - novice readers can decipher character by character and map to phonemes, thus having a way to understand all words; experts pattern match and read faster.


And the tradeoff is that it's not quite as well optimized for the pattern-matching fast path. The systems just have very different performance/usability characteristics.


I'm skeptical of the "quick to read" argument. An educated Chinese speaker generally knows something in the neighborhood of 5,000 characters ("full literacy" is supposed to be 3k-4k), which is far less than readers of phonetic systems (20k-35k). Unless you're a professional writer of some variety you're going to spend more time looking up words in Chinese.


I am not familiar with Chinese, but in Japanese, the characters do not necessarily map 1 to 1 with words, so you have some words that are composed of multiple characters. For example, "adult" would be written as 大人, which are the characters "big" and "person".


Sure, but the script speed-reading advantage only comes at the level of having single symbols for single meaning. Once you need to combine symbols to get the (additional) meanings, you're not any faster than phonetics.


Yeah, just like the words "handwriting", "television", "sunshine", "seafood", the meaning of the whole character can be inferred from the parts.


Characters aren't necessarily words. In Chinese, a lot of phrases are expressed in groupings of 2 characters and 4 characters.

Especially in these 4 character groupings, you can find a lot of efficiency and elegance. It expresses ideas & meaning that would take 20-100 words to fully express.


Many Chinese words are two characters in length.


The number of permutations of those 3k, 4k, 5k characters when combined into 2-character or 3-character words is huge.
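
To put rough numbers on that, here is a back-of-the-envelope sketch. It counts every ordered sequence of characters, so it is only an upper bound: most sequences are not real words, and the helper name `sequence_count` is made up for illustration.

```python
def sequence_count(n_chars: int, length: int) -> int:
    """Number of ordered character sequences of a given length.

    This is an upper bound on possible words, not a count of real ones.
    """
    return n_chars ** length


# Character inventories mentioned in the thread (3k, 4k, 5k).
for n in (3000, 4000, 5000):
    print(n, sequence_count(n, 2), sequence_count(n, 3))
```

Even with 3,000 characters that is 9 million possible two-character sequences, which is far more than any real vocabulary uses.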


> and the main thing preventing change may simply be inertia.

And pride. I've noticed the Chinese are quick to defend their writing system, despite its lack of "efficiency". Which is understandable: it is a thing of great beauty, with thousands of years of cultural heritage.


Which "Chinese"? Your point seems to miss where the pride lies.

There has been a lot of infighting since the Communists in the 50s/60s decided to create simplified Chinese. Traditionalists would argue that simplified Chinese writing is not as elegant/pretty as traditional (I agree, although I have a bias as I grew up in a country that kept traditional Chinese) but the dominance of China has forced almost every other place in the world that writes in Chinese to use simplified. This includes Japan, which I believe a large majority of their signs are written in simplified version of kanji.

And this spills over into the U.S. where the Chinese who have lived here (which would consist mostly of Hong Kong and Taiwanese) are now fighting (or have fought) the recently immigrated Chinese from China over which system to use in U.S. schools.

So there is a lot of pride, but maybe not in the way you believe.


> This includes Japan, which I believe a large majority of their signs are written in simplified version of kanji.

Japanese simplification has some overlap with Chinese, but overall they are not the same, and the simplification was definitely not "forced" on Japan by China.


To add to that, China's simplification process happened during the 50s and 60s, back when China was deeply impoverished, had massive famines thanks to the Great Leap Forward[0], and had basically zero international influence (didn't even have a UN seat). They weren't in a position to influence Japan in any way. Not to mention that Japan was in the opposite camp during the Cold War.

[0]https://en.wikipedia.org/wiki/Great_Leap_Forward


I mean what you're saying is correct, but I never said China instantly switched everyone over to simplified once China decided in the 50s/60s.

I am saying the superpower it has since become has "forced" most other countries to defer to using simplified rather than traditional.


After looking this up, yeah you are right. I just assumed there was a conversion of Traditional->Simplified for the most part because I had noticed earlier in my life that most of the written forms of kanji (I saw) were written in traditional form and only towards the past decade or so have I noticed that there were more characters that looked "simplified"


Lol, I wouldn't call it out as pride. There might simply be no reason to change. Ideas have been expressed in the written language for thousands of years, and I believe that it's probably as efficient as English. Very different, but similar levels of efficiency. Would it be "pride" if Chinese people questioned why English was full of inconsistencies and why don't they just change those aspects of the language?

At the same time, English is a required subject in China, and hopefully as that improves, people here will get the pros (and cons) of both systems.


To a great extent it exists because of cultural inertia. The English language has a huge number of inconsistencies as well, especially in spelling; it is far more inconsistent than, say, Spanish, which is highly phonetically consistent. But nobody is editing the English language because their edited version of the language wouldn't fly in a formal setting. So even if English or Chinese could be made better, it's not going to happen in a heartbeat. Python2 to Python3 is enough of a maintenance pain, let alone re-learning a human language.

There are some advantages to Chinese typography over a phonetic system. Phonetically written languages like English go from symbols to sounds to meaning. Chinese goes directly from symbols to meaning. Eschewing the sound stage means that as sounds evolve over time, the symbols, whose meanings change on a much slower time scale, are still comprehensible for a long period of time. This means, among other things, that Chinese formal writing is mostly common to all dialects, whereas a pronunciation-based system could only serve a single dialect. For example, in Hong Kong, where Cantonese is the de facto spoken dialect of choice, I, as a Mandarin-only speaker, can still understand all the signage even though I cannot interact with a Cantonese-only speaker. 1000+ year-old texts are reasonably accessible to well-educated high school students.


OK, but the major advantage you've suggested for Chinese typography, that the meaning of written words drifts much less over hundreds of years by allowing fast-changing sounds to change independently, is much less valuable today, right? Modern tech means we expect less drift overall, and fewer problems for a given amount of drift. Furthermore, the cost of drift is not felt by the people using the language at the time, so this advantage won't influence the choices people make.


But why all those emojis after all?


It's possible that an ideographic writing system is faster to read, presuming one has got over the 'learning thousands of characters' hump without it affecting your ability to comprehend, as you're not converting a representation of sounds into morphemes, and ultimately words, into something meaningful. That is, the characters (or small clusters thereof) are meaningful in their own right, somewhat. That more direct mapping to meaning -- albeit somewhat abstracted over time -- might also be advantageous in terms of learning simple nouns and constructs (e.g., "this character means 'horse' because it kinda looks like one", etc.).

One would have to test the comprehension and reading scores of kids from alphabetic and ideographic backgrounds, at the same ages, to see if that argument held any weight.

I would be interested to know if the effort taken in memorising thousands of symbols has a beneficial effect on brain physiology. It's obviously much more computationally taxing than an alphabetic system: would this lead to better overall working memory ability, or would that be offset by the extra energy it takes to process?


Interesting hypothesis, but apparently Chinese and English read at the same rate http://persquaremile.com/2011/12/21/which-reads-faster-chine...


Well, politically speaking a script system, by nature of its inertia, looks like it can hold a nation together a lot longer than Latin. Latin was the language of the Roman Empire, but evolution over the centuries means written Romance languages are no longer mutually intelligible. Today the former Roman Empire is split into dozens of states. Whereas, for Chinese, the number of states[1] is three. China, Taiwan, Singapore.

[1] the use in this case means the region in question has its own military.


Whether huge states vs lots of small states are desirable or not is another topic entirely though.


It's nice because otherwise I wouldn't be born, with each side of my family from different parts of China. :-)


Singapore is in South-East Asia, and was an English colony with massive Chinese immigration. Singapore was never part of China.


Right, I was speaking under the contextual assumption that people with a common written language tend to congregate into the same "sphere", making it more likely they'll join into the same state at some point - and by Chinese I meant there are three Chinese-language-speaking states. The word "split" is applied to Europe, and not China. I added Singapore to make my case more "conservative".


This may not be in the 'spirit' of typography, but to me it seems like font creation ought to be automatable, to some degree. Artistic flourishes may have to be added in post-production, but the basic shapes (whether Latin or Chinese or whatever) with maybe some constraint engine or machine learning to get the kerning, etc. right could be bulk generated, parameterised by brush style and dynamics and anything else that can be simulated.


It is automated. Also a lot of fonts are basically built off of other fonts. There are automation tools to systematically alter them. To do quality work, a lot of manual adjustment is necessary.

http://doc.robofont.com/documentation/welcome-to-robofont/

Chinese typesetting software has traditionally included a glyph editor so that one could add a character that isn't supported by a font. However, that's not something people want to do very often. It also involves re-inventing the wheel. It's better if one team spends years making the font comprehensive and well designed.


> To do quality work, a lot of manual adjustment is necessary.

But should that adjustment result in manual one-off changes, or should it result in a new tweak being taught to a system that can then apply it anywhere else that problem happens?

In other words, why can't we build fonts the way we build Text-to-Speech voices?


Text-to-speech voices sound pretty awkward. To make them sound less awkward, you have to either tweak the software for all sorts of special cases or pre-record a bunch of phrases. Both of these take a lot of manpower.

Fonts are the same, except they're used by many more people and last much longer than that brief moment when your text-to-speech software stumbles on an uncommon combination of words.

I'm sure that the technology will rapidly improve as more and more Chinese fonts are needed, but as long as AI remains inferior to humans in some way, I don't see it getting all automated.


This is what Knuth's METAFONT does. It was used to create the Computer Modern typeface. However, overall the results are not very pleasing IMHO.


I believe that kerning is already done that way. Basically you use your glyphs (and popular letter pairs), feed them to an app, it gives you approximate good kerning based on visible gray and possible line crossings. Then the creator adjusts anything that doesn't look completely fine, adds ligatures, etc.

Youtube seems to have a lot of examples. (https://www.youtube.com/watch?v=0zCVx9L90ac ?)
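
For the curious, here is a minimal sketch of that "feed the pairs to an app" step, assuming a much cruder rule than real tools use: it only looks at the tightest ink-to-ink gap between two glyphs and ignores visual gray entirely. The row-profile glyph format and the names `min_gap` and `suggest_kern` are hypothetical, not any kerning app's API.

```python
from typing import List, Optional, Tuple

# Hypothetical glyph format: one entry per scanline, top to bottom.
# Each entry is the (leftmost, rightmost) ink position on that row,
# or None when the row has no ink.
Row = Optional[Tuple[int, int]]
Profile = List[Row]


def min_gap(left: Profile, right: Profile, advance: int) -> Optional[int]:
    """Smallest ink-to-ink distance over rows where both glyphs have ink.

    `advance` is where the right glyph's origin sits relative to the
    left glyph's origin before any kerning is applied.
    """
    gaps = [
        (advance + b[0]) - a[1]
        for a, b in zip(left, right)
        if a is not None and b is not None
    ]
    return min(gaps) if gaps else None


def suggest_kern(left: Profile, right: Profile, advance: int, target_gap: int) -> int:
    """Kerning offset that makes the tightest row gap equal target_gap."""
    gap = min_gap(left, right, advance)
    if gap is None:
        return 0  # no shared rows with ink, nothing to adjust
    return target_gap - gap


# Toy three-row profiles shaped roughly like "L" and "T".
GLYPH_L: Profile = [(0, 2), (0, 2), (0, 10)]
GLYPH_T: Profile = [(0, 10), (4, 6), (4, 6)]
print(suggest_kern(GLYPH_L, GLYPH_T, advance=12, target_gap=2))
```

The output is only a starting value. As described above, the designer would still adjust by eye anything that doesn't look right.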


You might enjoy http://genekogan.com/works/a-book-from-the-sky.html https://news.ycombinator.com/item?id=10750252 as a perspective on automated font construction.


Yeah, you're a barbarian, but there is some recent work which tries to find regularities in the dimensioning of Latin typefaces and make them parametric. I saw a relevant talk by this guy: http://www.lettermodel.org/ at a typography conference in Dublin last weekend.


Why Chinese Is So Damn Hard (http://pinyin.info/readings/texts/moser.html)


Fascinating article – and they did a great job with the Optima-inspired typeface. I'm not usually a fan of Optima, but as a Chinese font, it looks really fresh amidst the more boring traditional fonts.


That seems like something you could partially mitigate by vectorizing fonts and applying transformation and brush stroke rules to generate a base font you could then tweak as appropriate, e.g. rules for how pronounced curves should be, serifs, spacing, etc.
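
A minimal sketch of the transformation part, assuming glyphs are stored as lists of strokes and each stroke is a list of (x, y) points. The function `transform_glyph` and its parameters are made up for illustration; real font tooling works on Bezier outlines and is far more involved.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def transform_glyph(
    strokes: List[Stroke],
    x_scale: float = 1.0,
    y_scale: float = 1.0,
    slant: float = 0.0,
) -> List[Stroke]:
    """Apply one scale-and-shear rule to every point of every stroke.

    slant shifts x in proportion to y, which gives an oblique variant.
    """
    return [
        [(x * x_scale + y * slant, y * y_scale) for x, y in stroke]
        for stroke in strokes
    ]


# A box (roughly the outline of 口) as a single stroke.
BOX: List[Stroke] = [[(0, 0), (0, 10), (10, 10), (10, 0)]]

# Condensed and slanted variant derived from the same base shape.
print(transform_glyph(BOX, x_scale=0.5, slant=0.2))
```

One rule applied to thousands of glyphs gives a base variant cheaply. The manual tweaking mentioned elsewhere in the thread would still come afterwards.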


There are radicals and stroke patterns that can be recorded as chains of geometric macros and plugged into common reverse stroke-to-text tools found online. Adjusting parameters or introducing custom paint motifs on this would produce fonts trivially: text character dictionary -> lookup in stroke recognizer -> algorithmic painting around strokes.


So what's the Chinese script equivalent of Helvetica vs. Arial, if there is one?


The most well-known ones would be Sung, Ming, and Ching. The emphasis is on the ratio of the widths of horizontal and vertical strokes. And they are always sans, with a little triangle at the end of strokes.



