Ah, Language, isn’t it wonderful. If you’re not a programmer
Well, I’ve spent a good part of the day reading the MySQL manual pages on language support… and I’ve got to say, life would be so much simpler if everyone only wrote, spoke and used English… I know that’s not a very global attitude, but really, when you get into things like multilingual support (i.e., Unicode) in applications, programming languages, databases and web browsers, it’s a wonder that anything works at all!
This basically comes into play when importing, exporting or displaying data across systems that use different character sets. For example, reading data from a database (stored in latin1, say), then manipulating it in PHP (whose string functions historically assumed ISO-8859-1, I believe) and then displaying it in a user’s web browser (using utf8). Not only do you have to worry about whether the character you want is available in the character set your programming language uses, you also have to make sure it is translated properly on the way in and on the way out. And on top of all these different character sets, no two applications/databases/programming languages refer to them by the same name! Talk about impossible tasks!
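To make that read/translate/output chain concrete, here is a minimal sketch in Python (my choice for illustration; the post's actual stack is MySQL and PHP). The `transcode` helper and the sample bytes are made up for the example, and the alias list only shows a few of the spellings Python itself happens to accept for one character set.

```python
import codecs


def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source character set, re-encode in the target one."""
    text = raw.decode(source)    # bytes -> abstract characters
    return text.encode(target)   # abstract characters -> bytes


# Hypothetical value as it might sit in a latin1 column: "Año"
latin1_bytes = "A\u00f1o".encode("latin-1")

# The same text, re-encoded for a UTF-8 web page
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")

# One character set, several names: Python resolves all of these
# spellings to the same codec.
aliases = ["latin1", "latin-1", "iso-8859-1"]
canonical = {codecs.lookup(name).name for name in aliases}

print(latin1_bytes)  # ñ is a single byte in latin1
print(utf8_bytes)    # ñ is two bytes in UTF-8
print(canonical)
```

Skip the decode step, or decode with the wrong source name, and the ñ arrives in the browser as garbage, which is exactly the kind of problem I spent the day reading about.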
Not only are there different characters in different languages, but there are differences in how the characters in each language are sorted - this is referred to as a collating sequence. For instance, in Spanish an ‘ñ’ (n-tilde) sorts between the ‘n’ and ‘o’ characters. And if using traditional Spanish, there’s a ‘ch’ character that sorts in between ‘c’ and ‘d’; likewise the character ‘ll’, which sorts in between ‘l’ and ‘m’. And it gets worse for other languages such as Swedish/Finnish.
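Here is a toy sketch of what a collating sequence does, again in Python. It is not how MySQL or any real library implements collation; the alphabet list, the `collation_key` function and the word list are all invented for the example, and it ignores accented vowels entirely. It only shows that sorting by raw character code and sorting by the traditional Spanish alphabet give different answers.

```python
import unicodedata

# Traditional Spanish alphabet, with "ch" and "ll" treated as single letters
# and "ñ" placed between "n" and "o".
TRADITIONAL_SPANISH = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: position for position, letter in enumerate(TRADITIONAL_SPANISH)}


def collation_key(word: str) -> list:
    """Turn a word into a list of alphabet positions for sorting."""
    word = unicodedata.normalize("NFC", word).lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])  # digraph counts as one letter
            i += 2
        else:
            # Anything not in the alphabet sorts after "z" in this toy version
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key


words = ["dama", "chico", "cuna", "luz", "llama", "mano", "nube", "ñu", "oso"]

by_code_point = sorted(words)                     # plain character-code order
by_tradition = sorted(words, key=collation_key)   # traditional Spanish order

print(by_code_point)
print(by_tradition)
```

In the plain sort, ‘ñu’ lands after ‘oso’ and ‘chico’ comes before ‘cuna’; with the traditional key, ‘ñu’ sits between ‘nube’ and ‘oso’, and ‘chico’ moves after ‘cuna’.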
This issue also affects the comparison of characters and database searches… like, is ‘Ano’ equal to ‘Año’ (the 2nd character is an n-tilde)? Other examples (not sure these will show up correctly in this post): ‘ß’ is ‘technically’ equal to ‘ss’, but some programming languages/databases may say it’s equal to ‘s’ (I don’t even know what ‘ß’ is for). And look at these other ones: ‘Ä’ = ‘A’, ‘Ö’ = ‘O’ and ‘Ü’ = ‘U’!
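Whether those pairs count as "equal" depends entirely on the comparison rules you pick. As a rough illustration in Python (not what any particular database collation does; `strip_accents` and `loosely_equal` are names I made up), an accent- and case-insensitive comparison can be sketched with the standard `unicodedata` module and `str.casefold`:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks, so 'ñ' becomes 'n' and 'Ä' becomes 'A'."""
    decomposed = unicodedata.normalize("NFD", text)  # 'ñ' -> 'n' + combining tilde
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loosely_equal(a: str, b: str) -> bool:
    """Compare ignoring accents and case; casefold() also maps 'ß' to 'ss'."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print("Ano" == "A\u00f1o")                          # strict comparison
print(loosely_equal("Ano", "A\u00f1o"))             # accents ignored
print(loosely_equal("Stra\u00dfe", "STRASSE"))      # ß treated as ss
print(loosely_equal("\u00c4", "A"))                 # Ä treated as A
```

A strict comparison says ‘Ano’ and ‘Año’ differ; the loose one says they match. Neither is wrong, they are just different rules, and which one your database applies is a matter of which collation it was told to use.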
Woe is me… and here UTF was supposed to be the holy grail. Remember, “There is no silver bullet.”
