About Unicode and UTF-8

Character code

A character code combines two concepts: a character set and a character encoding scheme.

What is Unicode?

Unicode is a coded character set.
A character set is a collection of characters that can be represented ("a", "wa", ...).
In a coded character set, each character in the set is assigned a numerical value (a non-negative integer) called a code point (code position).
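
The mapping between characters and code points can be inspected directly in Python. This is only a small sketch: the helper name code_point_label is made up for illustration, while ord() and chr() are Python built-ins.

```python
def code_point_label(ch: str) -> str:
    """Return the code point of a single character in U+XXXX notation.

    code_point_label is a hypothetical helper name used only for this note.
    """
    # ord() gives the code point as a non-negative integer.
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for ch in ["a", chr(0x3042)]:
        print(ch, code_point_label(ch))
```

chr() goes the other way, from a code point back to the character.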


UTF-8 is a character encoding scheme for Unicode.
A character encoding scheme defines how code points are converted into a sequence of bytes.
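
To make the conversion from a code point to bytes concrete, here is a minimal hand-written sketch of the UTF-8 bit layout. The function name utf8_encode is an assumption for illustration; in real code Python's built-in str.encode("utf-8") does this work.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch only)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


if __name__ == "__main__":
    # Compare the sketch with Python's built-in encoder for U+3042.
    print(utf8_encode(0x3042) == chr(0x3042).encode("utf-8"))
```

The number of bytes depends only on the size of the code point, which is why ASCII characters stay at one byte while U+3042 needs three.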

A simple sample

"あ" (HIRAGANA LETTER A) as a Unicode code point => U+3042
Encoded in UTF-8 => 0xE3 0x81 0x82
Encoded in UTF-16 (big-endian) => 0x30 0x42
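
The byte values above can be checked with Python's built-in codecs. This is a small sketch; hex_bytes is a hypothetical helper for printing, and "utf-16-be" is chosen on the assumption that the UTF-16 bytes are listed in big-endian order without a byte order mark.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values (hypothetical helper)."""
    return " ".join(f"0x{b:02X}" for b in data)


ch = chr(0x3042)  # the character at code point U+3042

utf8 = ch.encode("utf-8")
utf16_be = ch.encode("utf-16-be")  # big-endian, no byte order mark
utf16_le = ch.encode("utf-16-le")  # little-endian swaps the two bytes

if __name__ == "__main__":
    print("UTF-8    =>", hex_bytes(utf8))
    print("UTF-16BE =>", hex_bytes(utf16_be))
    print("UTF-16LE =>", hex_bytes(utf16_le))
```

The same code point yields different byte sequences depending on the encoding scheme, and for UTF-16 also on the byte order.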