
Yes, but if `char` is signed, as it usually is, its bit patterns correspond to the values -0x80 to 0x7F. So yeah, you can encode the >=0x80 code units at no cost as their two's complement counterparts, but it feels suspicious. At least to me, after writing some Rust lately, which very much does not do implicit signed–unsigned conversions. Much better for `char` to always represent the "basic character set" (i.e. usually ASCII) and to have a distinct type for UTF-8.
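To make the "suspicious" part concrete, here is a minimal sketch in C. It uses `signed char` explicitly so the behaviour doesn't depend on the platform's choice for plain `char`, and it assumes the usual two's complement wrap when an out-of-range value is converted to a signed type (implementation-defined before C23, but what mainstream compilers do). The helper names `store_unit`, `code_unit`, `is_non_ascii_naive` and `is_non_ascii` are made up for illustration.

```c
/* Store a UTF-8 code unit (0x00..0xFF) in a signed char.
   Assumption: out-of-range values wrap as two's complement. */
static signed char store_unit(unsigned char u) {
    return (signed char)u;
}

/* Recover the code unit value; conversion to unsigned char is
   well-defined (reduced modulo 256). */
static unsigned code_unit(signed char c) {
    return (unsigned char)c;
}

/* Naive check: c is promoted to int in the range -128..127,
   so this comparison can never be true. */
static int is_non_ascii_naive(signed char c) {
    return c >= 0x80;
}

/* Correct check: look at the bit pattern as an unsigned value first. */
static int is_non_ascii(signed char c) {
    return (unsigned char)c >= 0x80;
}
```

So the lead byte 0xC3 of "é" round-trips fine through a signed char, but any comparison written against the code unit values silently goes wrong unless you remember the cast every time. A dedicated unsigned type for UTF-8 code units avoids that.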


