Unicode and utf-8
What is Unicode?
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
from What is Unicode?, The Unicode Consortium
Different Encodings for Unicode
Each letter or character maps to a "code point". The encodings below are different ways of arranging bytes to represent those code points.
- utf-8: variable-length (1 to 4 bytes; characters 0-127 are all one byte)
- utf-16: variable-length (2 or 4 bytes; one or two 2-byte units)
- utf-32: fixed-length (4 bytes)
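To make the byte counts above concrete, here is a minimal Python sketch that prints the code point of a few sample characters and the number of bytes each one takes in the three encodings. The helper name `encoded_lengths` and the sample characters are illustrative choices, not part of the original notes. The little-endian codec names (`utf-16-le`, `utf-32-le`) are used on the assumption that we want to count only the character's own bytes, since the plain `utf-16` and `utf-32` codecs prepend a byte order mark.

```python
# Sample characters, written as escapes so the source stays ASCII:
# "A" (U+0041), e-acute (U+00E9), euro sign (U+20AC), grinning face (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_lengths(ch):
    """Return the byte length of one character in utf-8, utf-16, and utf-32."""
    # The "-le" codecs fix the byte order, so no byte order mark is added
    # and the count reflects only the character itself.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


for ch in samples:
    # ord() gives the code point; it is the same whichever encoding is used.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

The code point stays the same for a given character. Only the number and arrangement of bytes changes from one encoding to the next.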
Want more? See:
- What is Unicode?
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
- And similarly titled, but different: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text