Selecting strings from a column where some are utf8 encoded and others are not

Problem description

The data consists of the names of country subdivisions. Some have been stored as UTF-8 and some have not. For example, this is how they appear in my table:

statename

Bocas del Toro
ChiriquÃ­
CoclÃ©
Colón
Darién
Veraguas
Panamá Oeste
EmberÃ¡
Kuna Yala
Ngöbe-Buglé

This question/answer gets me really close to a solution: How to fix double-encoded UTF8 characters (in an utf-8 table)

If I use CONVERT(CAST(CONVERT(statename USING latin1) AS BINARY) USING utf8), I get:

statename

Bocas del Toro
Chiriquí
Coclé
Col
Dari
Veraguas
Panam
Emberá
Kuna Yala
Ng

Values whose accented characters are stored correctly, such as the "é" in Darién, are cut off at that character.
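The truncation can be reproduced outside the database. The following is a minimal Python sketch, not MySQL code: Python's latin-1 and utf-8 codecs stand in for the server's conversions, and the helper names double_encode and latin1_roundtrip are made up for illustration. It shows that the round trip repairs a double-encoded value, while a correctly stored value yields a byte that is not valid UTF-8, so only the text before that byte survives.

```python
# Illustration only: Python codecs stand in for MySQL's charset conversions.
# (MySQL's latin1 is close to cp1252; for the characters used here both agree.)

def double_encode(text: str) -> str:
    """Simulate the corruption: UTF-8 bytes misread as latin1 characters."""
    return text.encode("utf-8").decode("latin-1")


def latin1_roundtrip(stored: str) -> str:
    """Rough analogue of CONVERT(CAST(CONVERT(x USING latin1) AS BINARY) USING utf8)."""
    return stored.encode("latin-1").decode("utf-8")


# A double-encoded value is repaired by the round trip.
mangled = double_encode("Coclé")
print(mangled)                    # CoclÃ©
print(latin1_roundtrip(mangled))  # Coclé

# A correctly stored value becomes the single byte 0xF3 for "ó",
# which is not a valid UTF-8 sequence on its own.
raw = "Colón".encode("latin-1")   # b'Col\xf3n'
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Everything before the offending byte is still valid UTF-8.
    valid_prefix = raw[:exc.start].decode("utf-8")
    print(valid_prefix)           # Col
```

Python raises an error at the invalid byte, whereas the query in the question silently returns the valid prefix, which matches the "Col", "Dari", "Panam" and "Ng" rows above.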

The variation provided in that answer,

SELECT CASE
    WHEN CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 ) IS NULL
        THEN statename
    ELSE CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 )
END
FROM 

returned the same result, though I am not even sure I implemented it correctly in this SELECT.

I am not permitted to normalize this data in this case, so I would like to select it and get:

Bocas del Toro
Chiriquí
Coclé
Colón
Darién
Veraguas
Panamá Oeste
Emberá
Kuna Yala
Ngöbe-Buglé

Will this be possible?

Tags: mysql, utf-8

Solution


This appears to be an SQL_MODE issue. For the conversion to fail and return NULL, the STRICT_TRANS_TABLES mode must be set. You can set it with:

SET SESSION sql_mode = CONCAT('STRICT_TRANS_TABLES,', @@sql_mode);

If you don't want to break other "working" queries in the same session, you should reset the mode after getting the result:

SET @old_sql_mode = @@sql_mode;
SET SESSION sql_mode = CONCAT('STRICT_TRANS_TABLES,', @@sql_mode);

SELECT COALESCE(
  CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 ), statename
) as statename
FROM yourTable;

SET SESSION sql_mode = @old_sql_mode;

DB Fiddle demo

Note: I have changed your query to use COALESCE() instead of a CASE statement, so you don't need to duplicate the conversion code.
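The same "try the conversion, fall back to the original value" logic can be sketched on the client side. This is a hedged Python illustration rather than part of the answer: fix_if_double_encoded is a hypothetical helper, and the set of rows treated as double-encoded is an assumption inferred from the output shown in the question (the rows that survived the conversion intact).

```python
def fix_if_double_encoded(stored: str) -> str:
    """Python analogue of COALESCE(<latin1 round trip>, statename).

    If the latin1 bytes of the value form valid UTF-8, return the decoded
    text; otherwise keep the value unchanged.
    """
    try:
        return stored.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Covers both encode failures (character outside latin1) and
        # decode failures (bytes are not valid UTF-8): keep the original.
        return stored


expected = [
    "Bocas del Toro", "Chiriquí", "Coclé", "Colón", "Darién",
    "Veraguas", "Panamá Oeste", "Emberá", "Kuna Yala", "Ngöbe-Buglé",
]

# Assumption: these are the rows that were stored double-encoded.
double_encoded = {"Chiriquí", "Coclé", "Emberá"}

# Rebuild the mixed column: mangle only the double-encoded rows.
rows = [
    name.encode("utf-8").decode("latin-1") if name in double_encoded else name
    for name in expected
]

fixed = [fix_if_double_encoded(row) for row in rows]
for name in fixed:
    print(name)
```

One limitation of this kind of fallback is that a correctly stored value whose latin1 bytes happen to form valid UTF-8 would also be converted, so it is a heuristic rather than a guarantee.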

