首页 > 解决方案 > 从数组中删除重复 JSON 条目的最有效方法?

问题描述

我很好奇开发人员使用什么技术来删除 JSON 数组中重复的 JSON 条目。例如:

[{"name":"philip"},{"name":"steve"},{"name":"philip"},{"name":"seth"}]

目标是删除多余的“菲利普”。

现在我正在对可能非常大(数百个)的对象数组执行此操作。当然,我可以编写代码来手动迭代它们,但速度是一个因素。

我确实尝试了将 JSON 转换为表格然后使用 DISTINCT 清理数据的技术:

-- Clean the data by removing duplicate entries. We do this by using the new JSON_TABLE function.
-- We convert the JSON array to a table, and then re-select it as DISTINCT on the id value.
SELECT JSON_ARRAYAGG(JSON_OBJECT('id',sit_distinct.id,'name',sit_distinct.sit_name,'type',sit_distinct.sit_type_c)), 
    COUNT(*) INTO json_data_total, var_count_total 
    FROM
    (
    SELECT DISTINCT(id), sit_name, sit_type_c 
        FROM JSON_TABLE(json_data_total, '$[*]' COLUMNS(rowid FOR ORDINALITY, id int PATH '$.id', sit_name varchar(128) PATH '$.name', sit_type_c tinyint PATH '$.type')) AS JT ORDER BY id
    ) AS sit_distinct;

虽然它在我的本地机器上运行良好,但 RDS 上的 MySQL 会因上述原因而崩溃。肯定是某种内部错误,我可能无法解决。

作为参考,这是上述代码导致的内部错误:

18:53:39 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x2b9d3eda6000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2b9d4627fcc0 thread_stack 0x80000
/rdsdbbin/mysql/bin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x2e) [0x1db533e]
/rdsdbbin/mysql/bin/mysqld(handle_fatal_signal+0x2f3) [0xf79383]
/lib64/libpthread.so.0(+0xf100) [0x2b9c90cc1100]
/rdsdbbin/mysql/bin/mysqld(temptable::Table::insert(unsigned char const*)+0x17) [0x2336ca7]
/rdsdbbin/mysql/bin/mysqld(handler::ha_write_row(unsigned char*)+0x184) [0x10848e4]
/rdsdbbin/mysql/bin/mysqld(Table_function::write_row()+0x1b) [0x12534db]
/rdsdbbin/mysql/bin/mysqld(Table_function_json::fill_json_table()+0x2e8) [0x1255af8]
/rdsdbbin/mysql/bin/mysqld(Table_function_json::fill_result_table()+0xc4) [0x1255bf4]
/rdsdbbin/mysql/bin/mysqld(MaterializedTableFunctionIterator::Init()+0x16) [0x10304a6]
/rdsdbbin/mysql/bin/mysqld(MaterializeIterator::Init()+0xf2) [0x1032752]
/rdsdbbin/mysql/bin/mysqld(filesort(THD*, Filesort*, bool, RowIterator*, Sort_result*, unsigned long long*, unsigned long long*)+0x409) [0x1069bb9]
/rdsdbbin/mysql/bin/mysqld(SortingIterator::DoSort(QEP_TAB*)+0xa9) [0xdd8529]
/rdsdbbin/mysql/bin/mysqld(SortingIterator::Init()+0x47) [0xdd8657]
/rdsdbbin/mysql/bin/mysqld(MaterializeIterator::Init()+0xf2) [0x1032752]
/rdsdbbin/mysql/bin/mysqld(AggregateIterator::Init()+0x16) [0x1030426]
/rdsdbbin/mysql/bin/mysqld(JOIN::exec()+0x425) [0xe25155]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute_inner(THD*)+0x2ee) [0xea54ce]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x438) [0xead1e8]
/rdsdbbin/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x1c64) [0xe5c0b4]
/rdsdbbin/mysql/bin/mysqld(sp_instr_stmt::exec_core(THD*, unsigned int*)+0x50) [0xde0cf0]
/rdsdbbin/mysql/bin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x19b) [0xde2b0b]
/rdsdbbin/mysql/bin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xab) [0xde373b]
/rdsdbbin/mysql/bin/mysqld(sp_instr_stmt::execute(THD*, unsigned int*)+0xb1) [0xde50e1]
/rdsdbbin/mysql/bin/mysqld(sp_head::execute(THD*, bool)+0x5a8) [0xdda728]
/rdsdbbin/mysql/bin/mysqld(sp_head::execute_procedure(THD*, List<Item>*)+0x714) [0xddd674]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_call::execute_inner(THD*)+0x14a) [0x122704a]
/rdsdbbin/mysql/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x438) [0xead1e8]
/rdsdbbin/mysql/bin/mysqld(mysql_execute_command(THD*, bool)+0x1c64) [0xe5c0b4]
/rdsdbbin/mysql/bin/mysqld(mysql_parse(THD*, Parser_state*)+0x30f) [0xe5f00f]
/rdsdbbin/mysql/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x28a9) [0xe61da9]
/rdsdbbin/mysql/bin/mysqld(do_command(THD*)+0x179) [0xe62d39]
/rdsdbbin/mysql/bin/mysqld() [0xf6b370]
/rdsdbbin/mysql/bin/mysqld() [0x22b01e7]
/lib64/libpthread.so.0(+0x7dc5) [0x2b9c90cb9dc5]
/lib64/libc.so.6(clone+0x6d) [0x2b9c914dcc9d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (2b9d481f5200): SELECT JSON_ARRAYAGG(JSON_OBJECT('id',sit_distinct.id,'name',sit_distinct.sit_name,'type',sit_distinct.sit_type_c)), COUNT(*) INTO json_data_total, var_count_total FROM ( SELECT DISTINCT(id), sit_name, sit_type_c FROM JSON_TABLE( NAME_CONST('json_data_total',_utf8mb4'[{\"id\": 1023, \"name\": \"Automotive\", \"type\": 1}, {\"id\": 1072, \"name\": \"Hospitality\", \"type\": 1}, {\"id\": 1076, \"name\": \"Industrial Automation\", \"type\": 1}, {\"id\": 1082, \"name\": \"Internet\", \"type\": 1}, {\"id\": 1101, \"name\": \"Medical Devices\", \"type\": 1}, {\"id\": 1138, \"name\": \"Research\", \"type\": 1}, {\"id\": 1160, \"name\": \"Wireless\", \"type\": 1}]' COLLATE 'utf8mb4_bin'), '$[*]' COLUMNS(rowid FOR ORDINALITY, id int PATH '$.id', sit_name varchar(128) PATH '$.name', sit_type_c tinyint PATH '$.type')) AS JT ORDER BY id ) AS sit_distinct
Connection ID (thread ID): 21
Status: NOT_KILLED

所以,我需要弄清楚如何防止这个错误(它只发生在 RDS 上而不是我的本地机器上)或者想出一种快速有效的新技术——如前所述,我知道我可以编写自己的代码这样做,但我希望使用更快更有效的内部方法。

任何想法表示赞赏!

标签: mysqlarraysjsonamazon-rds

解决方案


MySQL 8.0.21 修复了错误(这是一个 MySQL 错误)。


推荐阅读