Not just tuned for web workloads in general, but for *specific* web workloads. T...

legulere · on Sept 1, 2016

The dictionary also contains lots of Chinese, Russian and Arabic.

duskwuff · on Sept 2, 2016

By the numbers:

- 9216 phrases total

- 5857 (63.5%) pure ASCII phrases, mostly English with a few Spanish words thrown in

- 1372 (14.8%) code fragments -- mostly HTML, CSS, and JavaScript

- 1027 (11.1%) CJK phrases (Chinese, Japanese, and Korean, mostly the first two) -- Chinese and Japanese are very hard to tell apart in this context, so I didn't try

- 158 (1.7%) phrases containing extended Latin-1 characters (nearly all Spanish words)

- 303 (3.3%) Cyrillic script (probably Russian) phrases

- 322 (3.5%) Arabic phrases

- 172 (1.9%) Devanagari script (Hindi) phrases

Plus a few miscellaneous other scripts and generally unclassifiable content.
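
For anyone who wants to reproduce a breakdown like this, here is a rough sketch of how such a tally could be computed. It is not the method used for the numbers above, which the comment doesn't describe. The function names `classify` and `tally`, the bucket labels, and especially the punctuation-based "code" heuristic are all assumptions made for illustration; real dictionary entries would need more careful handling.

```python
import unicodedata
from collections import Counter


def classify(phrase: str) -> str:
    """Bucket a phrase by script (hypothetical helper, crude heuristics)."""
    if phrase.isascii():
        # Assumption: markup-like punctuation marks a phrase as a code fragment.
        if any(ch in phrase for ch in '<>{}=;"()'):
            return "code"
        return "ascii"
    # Use the first non-ASCII character to decide the script.
    for ch in phrase:
        if ord(ch) < 128:
            continue
        name = unicodedata.name(ch, "")
        if name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL")):
            return "cjk"
        if name.startswith("CYRILLIC"):
            return "cyrillic"
        if name.startswith("ARABIC"):
            return "arabic"
        if name.startswith("DEVANAGARI"):
            return "devanagari"
        if name.startswith("LATIN"):
            return "latin-ext"
        return "other"
    return "other"


def tally(phrases):
    """Return {bucket: (count, percent of total rounded to one decimal)}."""
    counts = Counter(classify(p) for p in phrases)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample phrases, not actual dictionary entries.
    sample = ["the ", '<div class="', "中国", "año", "или", "في", "के"]
    for bucket, (n, pct) in tally(sample).items():
        print(f"{bucket}: {n} ({pct}%)")
```

Note that this makes no attempt to separate Chinese from Japanese either; a phrase made only of Han characters is ambiguous without more context.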