A traditional 8-bit character set contains only 256 characters. 32 of them are used as control characters such as Line Feed, Carriage Return and so on, so that only 224 are usable.
Even restricted to Latin characters, 224 characters are by far not enough for all the languages in the world.
And don't forget the block graphic characters...
The solution is to use different character sets for different languages. Such a character set is called
a 'codepage'.
Examples:
English has no special characters, so the creators of the first character set for IBM PCs
added a lot of block graphic characters. This set is called codepage 437 today.
But it also contains special characters for German, French and other languages, like 'ß', 'ü', 'é'...
Codepage 865 for Norway and Denmark is nearly identical to 437. Only a few characters were
replaced: for example, 'ø' and 'Ø' take the places of the currency signs '¢' and '¥'.
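If you want to see exactly where two codepages differ, you can let a program compare them byte by byte. Here is a minimal sketch in Python, assuming that its built-in codecs 'cp437' and 'cp865' match the DOS codepages of the same number. The helper name codepage_diff is only made up for this illustration.

```python
def codepage_diff(a: str, b: str) -> dict:
    """Map each byte value that decodes differently to (char in a, char in b)."""
    diff = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(a)
        char_b = raw.decode(b)
        if char_a != char_b:
            diff[value] = (char_a, char_b)
    return diff


# List every position where codepage 865 deviates from codepage 437.
for value, (old, new) in codepage_diff("cp437", "cp865").items():
    print(f"0x{value:02X}: {old} -> {new}")
```

The same helper works for any pair of codepages in which all 256 byte values are defined, for example 'cp437' and 'cp850'.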
Codepage 850 has fewer block graphic characters and more special characters used in Europe, like
'ã', 'Ð', 'Þ'.
Codepage 852 is for Slavic languages written in Latin script and contains their special characters.
In codepage 862, 27 Hebrew characters replace most of the accented characters and all German
umlauts of codepage 437.
Codepage 866 is for Cyrillic.
The lower half of all these codepages (codes 0 to 127) is identical: it is the ASCII code.
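This can be checked with a few lines of code. The following is only a sketch, assuming Python's built-in codecs for the DOS codepages mentioned above. The function name same_decoding is a made-up helper, not part of any library.

```python
DOS_CODEPAGES = ["cp437", "cp850", "cp852", "cp862", "cp865", "cp866"]

lower_half = bytes(range(128))       # codes 0..127
upper_half = bytes(range(128, 256))  # codes 128..255


def same_decoding(data: bytes, codepages: list) -> bool:
    """Return True if data decodes to the same text in every codepage."""
    decoded = {data.decode(codepage) for codepage in codepages}
    return len(decoded) == 1


print("lower half identical:", same_decoding(lower_half, DOS_CODEPAGES))
print("upper half identical:", same_decoding(upper_half, DOS_CODEPAGES))
```

Only the upper half has to be looked at when text shows wrong characters after moving between these codepages.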
Under English versions of Windows, codepage 1252 is the default. Because Windows has a graphical interface, block graphic characters aren't needed. So codepage 1252 has a lot of room for special
characters that are spread across several codepages in DOS. That's why I could use all these
different characters here...
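To try this out, a program can test whether a piece of text fits into a given codepage. This is a rough sketch, assuming Python's built-in codecs. The helper encodable and the sample string are made up for this illustration, using the characters quoted in the examples.

```python
def encodable(text: str, codepage: str) -> bool:
    """Return True if every character of text exists in the codepage."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Special characters quoted in the DOS codepage examples.
sample = "ßüéøØãÐÞ"

for codepage in ["cp437", "cp865", "cp850", "cp1252"]:
    print(codepage, encodable(sample, codepage))

# The same character usually sits at a different code in DOS and in Windows.
print("ß".encode("cp437"), "ß".encode("cp1252"))
```

Note that even where a character exists in both worlds, its code differs, so text has to be converted when it moves between DOS and Windows.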
Some characters for Slavic languages are in codepage 1250 only. For Cyrillic there is codepage
1251, for Greek 1253, for Hebrew 1255, for Arabic 1256, and so on...
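The practical consequence is that one and the same byte means a different character in each of these codepages. As a small sketch (again assuming Python's built-in codecs; the byte 0xE0 is an arbitrary choice and the table name is made up), the Unicode name of that byte can be printed for each codepage:

```python
import unicodedata

# Windows codepages named in the text, with the script each one targets.
WINDOWS_CODEPAGES = {
    "cp1250": "Slavic (Latin script)",
    "cp1251": "Cyrillic",
    "cp1252": "Western European",
    "cp1253": "Greek",
    "cp1255": "Hebrew",
    "cp1256": "Arabic",
}

raw = bytes([0xE0])  # one single byte from the upper half

for codepage, script in WINDOWS_CODEPAGES.items():
    char = raw.decode(codepage)
    print(f"{codepage} ({script}): U+{ord(char):04X} {unicodedata.name(char)}")
```

So a text file alone is not enough: you also have to know which codepage it was written in.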
The list of codepages is long. Here is an overview:
http://www.kostis.net/charsets
Here is general information about 'Characters and encodings':
http://www.cs.tut.fi/~jkorpela/chars/index.html