Lock, mutex, semaphore… what’s the difference?

[Originally Posted By]: http://stackoverflow.com/questions/2332765/lock-mutex-semaphore-whats-the-difference

A lock allows only one thread to enter the part that’s locked, and the lock is not shared with any other processes.

A mutex is the same as a lock, but it can be system-wide (shared by multiple processes).

A semaphore does the same as a mutex but allows up to x threads to enter.

You also have read/write locks, which allow either an unlimited number of readers or one writer at any given time.
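A minimal sketch of the first two ideas in Python (the thread counts and function names are illustrative): a `Lock` admits one thread at a time, a `Semaphore(3)` admits up to three. For a cross-process mutex you would reach for something like `multiprocessing.Lock` instead.

```python
import threading

lock = threading.Lock()        # one thread at a time
sem = threading.Semaphore(3)   # at most 3 threads at a time

counter = 0
results = []

def with_lock():
    global counter
    with lock:                 # only one thread may hold the lock here
        counter += 1

def with_semaphore():
    with sem:                  # at most 3 threads inside this block at once
        results.append(threading.get_ident())

threads = [threading.Thread(target=with_lock) for _ in range(10)]
threads += [threading.Thread(target=with_semaphore) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)       # all 10 increments survive because of the lock
print(len(results))  # all 5 semaphore-guarded threads ran
```

Without the lock, concurrent `counter += 1` updates could be lost; the semaphore merely bounds concurrency rather than forbidding it.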


UTF8, UTF16, and UTF32

From http://stackoverflow.com/questions/496321/utf8-utf16-and-utf32

UTF-8 has an advantage where ASCII characters are the most prevalent. In that case most characters occupy only one byte each. It is also advantageous that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file.

UTF-16 is better where ASCII is not predominant, since it primarily uses 2 bytes per character. UTF-8 starts to use 3 or more bytes for the higher code points, where UTF-16 stays at just 2 most of the time.

UTF-32 covers all possible characters in 4 bytes each, which makes it pretty bloated; I can’t think of many advantages to using it.


In short:

  • UTF8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text.
  • UTF16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes, code points U+10000 to U+10FFFF take 4 bytes. Bad for English text, good for Asian text.
  • UTF32: Fixed-width encoding. All code points take 4 bytes. An enormous memory hog, but fast to operate on. Rarely used.
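The byte counts in the bullets above can be checked directly in Python; the sample characters here are arbitrary picks from each code-point range:

```python
# 'A' is ASCII (U+0041), '€' is U+20AC, '😀' is U+1F600.
for ch in ("A", "€", "😀"):
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-le"))  # -le variants omit the byte-order mark
    utf32 = len(ch.encode("utf-32-le"))
    print(ch, utf8, utf16, utf32)
# A 1 2 4
# € 3 2 4
# 😀 4 4 4
```

Note that '😀' costs 4 bytes even in UTF-16: code points above U+FFFF are encoded as a surrogate pair of two 16-bit units.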

In long: see Wikipedia: UTF-8, UTF-16, and UTF-32


Unicode is a standard, and you can think of UTF-x as a technical implementation of it for some practical purposes:

  • UTF-8 – “size optimized”: best suited for Latin-character-based data (or ASCII); it takes only 1 byte per character, but the size grows with symbol variety (in the worst case up to 4 bytes per character; the original design allowed up to 6)
  • UTF-16 – “balance”: it takes a minimum of 2 bytes per character, which is enough for the existing set of mainstream languages and gives them a fixed size that eases character handling (but the size is still variable and can grow up to 4 bytes per character)
  • UTF-32 – “performance”: allows the use of simple algorithms as a result of fixed-size characters (4 bytes), but at a memory disadvantage
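The “performance” point for UTF-32 is that fixed-width code units make indexing by code point an O(1) byte-offset calculation, whereas UTF-8 or UTF-16 would require scanning from the start. A sketch in Python, working on the raw bytes (the sample string and helper name are made up for illustration):

```python
text = "héllo😀"
data = text.encode("utf-32-le")  # exactly 4 bytes per code point, no BOM

def code_point_at(buf, i):
    """O(1): the i-th code point always sits at byte offset 4*i."""
    return chr(int.from_bytes(buf[4 * i : 4 * i + 4], "little"))

print(code_point_at(data, 5))  # 😀
```

The same lookup on UTF-8 bytes would need a linear scan, since a character boundary can fall at any offset.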