
I might run into problems with Unicode in PHP (I really don't know much about it yet), so I opted to strip those characters out (for now) using a regex.
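For comparison, the same "strip it for now" approach looks like this in Python. This is a minimal sketch, not the poster's actual PHP regex. It deletes every character outside the 7-bit ASCII range (U+0000 to U+007F), so `ö` disappears entirely rather than becoming `o`. The function name `strip_non_ascii` is hypothetical.

```python
import re

# Matches any single character whose code point is above 0x7F (i.e. not ASCII).
NON_ASCII = re.compile(r"[^\x00-\x7f]")

def strip_non_ascii(text: str) -> str:
    """Hypothetical helper: drop every non-ASCII character, no transliteration."""
    return NON_ASCII.sub("", text)

print(strip_non_ascii("Björk"))  # prints "Bjrk"
```

The downside is visible in the output: the letter is lost, not approximated.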

btw, I'm using http://simplehtmldom.sourceforge.net/ to scrape/parse, which is great; it lets you use jQuery-style selectors on HTML.




ö doesn't need Unicode. It's in ISO-Latin-1 (ISO-8859-1); a revised version is ISO-8859-15 (Latin-9). Latin-1 is an 8-bit character set that includes ASCII and covers most Western European languages. I don't know PHP, but it could be the default or a simple character-set option.
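To make the byte-level point concrete, here is a quick check using only Python's standard codecs (an illustration, not anything PHP-specific; whether PHP's defaults match depends on its configuration). It shows `ö` (U+00F6) taking one byte in Latin-1 but two in UTF-8, and one of the slots that ISO-8859-15 reassigned.

```python
import unicodedata

o_umlaut = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS

latin1_bytes = o_umlaut.encode("latin-1")  # one byte in ISO-8859-1
utf8_bytes = o_umlaut.encode("utf-8")      # two bytes in UTF-8
print(latin1_bytes)  # b'\xf6'
print(utf8_bytes)    # b'\xc3\xb6'

# Byte 0xA4 means different characters in Latin-1 and in its revision, Latin-9.
in_latin1 = b"\xa4".decode("latin-1")
in_latin9 = b"\xa4".decode("iso-8859-15")
print(unicodedata.name(in_latin1))  # CURRENCY SIGN
print(unicodedata.name(in_latin9))  # EURO SIGN
```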



In that case, it should replace ö with o, etc.
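One common way to get that ö → o behaviour in Python is Unicode normalization, a swapped-in technique rather than something from this thread: decompose with NFKD, then drop whatever has no ASCII form. A minimal sketch, with the function name `fold_to_ascii` being hypothetical:

```python
import unicodedata

def fold_to_ascii(text: str) -> str:
    """Hypothetical helper: approximate ASCII by removing accents.

    NFKD splits "ö" into "o" + COMBINING DIAERESIS (U+0308); encoding to
    ASCII with errors="ignore" then discards the combining mark, along with
    any character that has no ASCII decomposition at all.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(fold_to_ascii("Björk Guðmundsdóttir"))  # prints "Bjork Gumundsdottir"
```

Note the limitation in the output: letters such as ð or ß have no decomposition, so they are silently dropped instead of being replaced.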


I have a screen-scraping thing in place that targets an ASCII-only environment (a LambdaMOO MUD). Manually replacing Unicode characters like this has been a pain in the butt. Luckily, that was all just for fun and didn't need to be perfect.

Is there a good library out there (in any language) that does solid Unicode-to-ASCII substitutions for the major languages?


I once used latin1_to_ascii (The Unicode Hammer) in Python (works great): http://code.activestate.com/recipes/251871/
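The sketch below is not that recipe's code; it is an independent take on the same idea, in Python. It assumes a small hand-written table for letters that normalization cannot decompose (ß, æ, ø, ð, þ, ł), falls back to NFKD to strip accents, and emits a placeholder for anything else, so an ASCII-only target never receives a raw non-ASCII character. The function name `to_ascii` and the table contents are hypothetical.

```python
import unicodedata

# Hypothetical, deliberately small table for letters NFKD leaves untouched.
SPECIAL_CASES = str.maketrans({
    "ß": "ss", "æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O",
    "ð": "d", "Ð": "D", "þ": "th", "Þ": "Th", "ł": "l", "Ł": "L",
})

def to_ascii(text: str, placeholder: str = "?") -> str:
    """Map text to ASCII: special cases first, then NFKD, then a placeholder."""
    text = text.translate(SPECIAL_CASES)
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            out.append(ch)            # already ASCII
        elif unicodedata.combining(ch):
            continue                  # drop accents split off by NFKD
        else:
            out.append(placeholder)   # no sensible ASCII form
    return "".join(out)

print(to_ascii("Guðmundsdóttir, Straße, Łódź, 東京"))
# prints "Gudmundsdottir, Strasse, Lodz, ??"
```

Running the table before normalization matters: once NFKD has run, ð and ß are still single characters with nothing to strip, so only the explicit table can map them.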


Awesome, thanks. Two coincidences: this fun MUD thing is in Python (it works via RPC), and the entry method name is "hammer", heh.



