UTF-8 has an advantage where ASCII characters are the most prevalent: most characters then occupy only one byte each. A further advantage is that a UTF-8 file containing only ASCII characters is byte-for-byte identical to the same file in ASCII encoding.
UTF-16 is better where ASCII is not predominant; it primarily uses 2 bytes per character. UTF-8 starts to use 3 or more bytes for higher code points, while UTF-16 usually stays at just 2.
UTF-32 covers all possible characters in 4 bytes each, which makes it pretty bloated. I can’t think of any advantage to using it.
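The size trade-off described above is easy to verify. As a quick Python 3 sketch (the little-endian variants are used to keep a byte-order mark out of the counts):

```python
# Compare the byte counts of the same text in each encoding.
ascii_text = "Hello"       # pure ASCII
cjk_text = "こんにちは"     # five Japanese characters

for label, text in [("ASCII", ascii_text), ("Japanese", cjk_text)]:
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(f"{label:8} in {enc:10}: {len(text.encode(enc)):2} bytes")
```

For the ASCII string, UTF-8 wins (5 bytes vs 10 vs 20); for the Japanese string, UTF-16 wins (10 bytes vs 15 for UTF-8 and 20 for UTF-32).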
- UTF8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text.
- UTF16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes, code points U+10000 to U+10FFFF take 4 bytes. Bad for English text, good for Asian text.
- UTF32: Fixed-width encoding. All code points take 4 bytes. An enormous memory hog, but fast to operate on. Rarely used.
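The per-range widths in the list above can be checked directly with one sample character from each range (a quick Python sketch):

```python
# Bytes per code point in each encoding, for one character
# from each of the ranges listed above.
samples = [
    ("U+0041 'A'", "A"),              # ASCII range
    ("U+00E9 'é'", "\u00e9"),         # U+0080..U+07FF
    ("U+20AC '€'", "\u20ac"),         # U+0800..U+FFFF
    ("U+10348 '𐍈'", "\U00010348"),    # U+10000..U+10FFFF
]
for name, ch in samples:
    print(name,
          "utf-8:", len(ch.encode("utf-8")),
          "utf-16:", len(ch.encode("utf-16-le")),
          "utf-32:", len(ch.encode("utf-32-le")))
```

UTF-8 yields 1, 2, 3, and 4 bytes respectively; UTF-16 stays at 2 until the supplementary plane, where it jumps to 4 (a surrogate pair); UTF-32 is always 4.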
Unicode is a standard, and you can think of the UTF-x encodings as technical implementations of it for different practical purposes:
- UTF-8 – “size optimized”: best suited for Latin-based (or ASCII) data; it takes only 1 byte per character, but the size grows with symbol variety (up to 4 bytes per character under the current standard, RFC 3629; the original scheme allowed up to 6)
- UTF-16 – “balance”: it takes a minimum of 2 bytes per character, which is enough for the existing set of mainstream languages and gives them a fixed size that eases character handling (but the size is still variable and can grow up to 4 bytes per character)
- UTF-32 – “performance”: allows simple algorithms as a result of fixed-size characters (4 bytes), but at a memory disadvantage
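The “performance” point for UTF-32 is that fixed-width code points allow constant-time indexing: the n-th code point always sits at byte offset 4·n, with no scanning of variable-length sequences. A small Python sketch of that idea (the `code_point_at` helper is illustrative, not a standard API):

```python
import struct

text = "a€𐍈z"                      # mixes 1-, 3-, and 4-byte UTF-8 characters
buf = text.encode("utf-32-le")      # fixed width: 4 bytes per code point

def code_point_at(data: bytes, n: int) -> str:
    """Return the n-th code point by direct offset arithmetic."""
    (cp,) = struct.unpack_from("<I", data, 4 * n)
    return chr(cp)

print(code_point_at(buf, 2))  # prints 𐍈, no scan through earlier bytes needed
```

With UTF-8 or UTF-16 the same lookup requires walking the string from the start, because earlier characters may each occupy a different number of bytes.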