<< lz-string: We'll try to make it faster | Home | Just released: color-finder >>

MySQL charset hell

There are not a lot of paradises on this corner of the earth, but there's one sure hell: charsets with MySQL. MySQL has the peculiar idea of defaulting everything to latin1. Everything.

Everything should be utf-8 of course, just because it fucking works with every character on earth. And no, MySQL's utf8 cannot. Because they rushed it in I suppose. So if you want to be able to use every character on earth, you should use utf8mb4.

Now, let's assume you got the following error message somewhere in some obscure logs:

Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
This means one thing: you're in MySQL charset hell. Welcome in. You're deep in. It's hot.

The road to heaven

First, you will need to edit your my.cnf file, to be found on linux in /etc/mysql/ and add/modify the following properties in the proper section:
[client] default-character-set=utf8mb4 [mysql] default-character-set=utf8mb4 [mysqld] default-character-set = utf8mb4 // NOTE: Newer versions of mysql seems to choke on this one. collation-server = utf8mb4_unicode_ci init-connect='SET NAMES utf8mb4' character-set-server = utf8mb4
The goal is to see everything as UTF-8 while running the following:
show variables like 'char%';
The result you want is this:
+--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8mb4 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+
Similarly for collation:
mysql> show variables like 'collation%'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_general_ci | | collation_database | utf8mb4_general_ci | | collation_server | utf8mb4_general_ci | +----------------------+--------------------+

Now what?

Now, you're still in hell, but you just got the promise of not getting deeper. And you still need to get out, that is to change all your tables to store UTF-8. There you have several options, some consisting in dumping everything, recreating a new DB and reloading everything (see the second link below). I chose the path of the ALTER. Here is what you can do:

Migrate your dbs:

ALTER DATABASE YourDbName DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;

Locate the faulty tables:

SELECT table_schema,table_name, column_name, character_set_name FROM information_schema.`COLUMNS` C where character_set_name is not null;

Migrate table after table:

ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4;

This is the easiest solution by far and it will convert all char based columns. If for any reason you want to convert only a few columns, you will have to alter them by hand:

ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8mb4;
Of course, the CHAR(50) piece (and eventual other options such as NOT NULL) depends on the column. Good luck.

And remember: For every MySQL install you do from that point on, do this first. Less trouble for after.



Thanks:

Dos and Don'ts

Never do
CREATE DATABASE new_db;
Always do
CREATE DATABASE new_db DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;
Avatar: elmiguel

Re: MySQL charset hell

You left this wide open:

 

"it fucking works with every character on earth"

 

-unless the character is a Chuck Norris character, then your fucked!

Avatar: pieroxy

Re: MySQL charset hell

 There are certain things computers are powerless against, indeed.

Home