How to delete rows from an index

Problem description

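I know that I cannot delete rows from a plain index; deletion only works on a real-time (RT) index. Still, I need to delete rows from a plain index, and I don't know how. Here are my table and its records: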

+------+------+--------+
| id   | name | status |
+------+------+--------+
|    1 | aaa  |      1 |
|    2 | bbb  |      1 |
|    3 | ccc  |      1 |
+------+------+--------+

Here is my Sphinx configuration:

source mainSourse : mainConfSourse 
{
    sql_query = \
        SELECT id, name, status \
        from test_table

    sql_field_string = name
    sql_attr_uint = status

}
index testIndex
{
    source          = mainSourse
    path            = C:/sphinx/data/test/testIndex
    morphology      = stem_enru

    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
    min_prefix_len = 3
    index_exact_words = 1
    expand_keywords = 1
}

index testIndexRT
{
    type            = rt
    path            = C:/sphinx/data/test/testIndexRT

    rt_field = name
    rt_attr_string = name
    rt_attr_uint = status

    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+401->U+0435, U+451->U+0435
    min_prefix_len = 3
    index_exact_words = 1
    expand_keywords = 1
}

Once the Sphinx server is running, if I want to update a record from testIndex, I just write the new record to testIndexRT. For example:

insert into testIndexRT (id,name,status) values (1,'aaa_updated',1);

After that, the query select * from testIndex,testIndexRT where status=1; returns:

+------+-------------+--------+
| id   | name        | status |
+------+-------------+--------+
|    1 | aaa_updated |      1 |
|    2 | bbb         |      1 |
|    3 | ccc         |      1 |
+------+-------------+--------+

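That worked, great! The trouble started when I wanted to delete a record from the index. I assumed it would be as easy as updating, but after running update testIndexRT set status=2 where id=1 the same query returns: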

+------+------+--------+
| id   | name | status |
+------+------+--------+
|    1 | aaa  |      1 |
|    2 | bbb  |      1 |
|    3 | ccc  |      1 |
+------+------+--------+

Sphinx just shows me the record from testIndex, even though the row with id 1 was updated in testIndexRT. The query select * from testIndexRT; returns:

+------+--------+-------------+
| id   | status | name        |
+------+--------+-------------+
|    1 |      2 | aaa_updated |
+------+--------+-------------+

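I realized that this approach doesn't work :( I can't store every record from the DB in testIndexRT, because my real table is very large, about 60 GB. Can someone tell me whether there is another way that I'm not aware of?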
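
The behaviour described above can be reproduced with a small model. The sketch below is a simplification in Python, not Sphinx code. It assumes that each index applies the WHERE filter on its own and that, when the same id matches in several indexes, the row from the last index listed in the query is kept. The function name search and the in-memory lists are hypothetical stand-ins for the two indexes.

```python
def search(indexes, predicate):
    """Filter every index separately, then merge by id; later indexes win."""
    merged = {}
    for index in indexes:  # order matters: the last matching index overrides
        for row in index:
            if predicate(row):
                merged[row["id"]] = row
    return [merged[doc_id] for doc_id in sorted(merged)]


# Stand-ins for testIndex (plain) and testIndexRT (real-time).
plain = [
    {"id": 1, "name": "aaa", "status": 1},
    {"id": 2, "name": "bbb", "status": 1},
    {"id": 3, "name": "ccc", "status": 1},
]
rt = [{"id": 1, "name": "aaa_updated", "status": 1}]


def is_active(row):
    return row["status"] == 1


# After the INSERT into the RT index: the RT copy of id 1 overrides the plain one.
after_insert = [row["name"] for row in search([plain, rt], is_active)]

# After the UPDATE: the RT copy no longer matches status=1, so nothing
# overrides the plain row and the old record shows up again.
rt[0]["status"] = 2
after_update = [row["name"] for row in search([plain, rt], is_active)]

print(after_insert)  # ['aaa_updated', 'bbb', 'ccc']
print(after_update)  # ['aaa', 'bbb', 'ccc']
```

Under these assumptions, changing the status in the RT index cannot hide the row in the plain index, because the plain index is never told that its copy of id 1 is obsolete.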

Tags: sphinx

Solution


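60 GB shouldn't be a problem for an RT index, but if you want to stick with plain indexes, you can use the main+delta technique to get what you want. Here is an interactive course: https://play.manticoresearch.com/maindelta/ (it is based on Manticore Search, a fork of Sphinx, but it should all work the same in Sphinx, except that killlist_target is named differently in Sphinx 3).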

Here is another example:

MySQL:

mysql> desc data;
+---------+------------+------+-----+-------------------+-----------------------------+
| Field   | Type       | Null | Key | Default           | Extra                       |
+---------+------------+------+-----+-------------------+-----------------------------+
| id      | bigint(20) | NO   | PRI | 0                 |                             |
| body    | text       | YES  |     | NULL              |                             |
| updated | timestamp  | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+---------+------------+------+-----+-------------------+-----------------------------+
3 rows in set (0.00 sec)

mysql> desc helper;
+----------+--------------+------+-----+-------------------+-----------------------------+
| Field    | Type         | Null | Key | Default           | Extra                       |
+----------+--------------+------+-----+-------------------+-----------------------------+
| chunk_id | varchar(255) | NO   | PRI |                   |                             |
| built    | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+----------+--------------+------+-----+-------------------+-----------------------------+
2 rows in set (0.00 sec)

Configuration:

source main
{
        type = mysql
        sql_host = localhost
        sql_user = root
        sql_pass =
        sql_db = test
        sql_query_pre = replace into helper set chunk_id = '1_tmp', built = now()
        sql_query = select id, body, unix_timestamp(updated) updated from data where updated >= from_unixtime($start) and updated <= from_unixtime($end)
        sql_query_range = select (select unix_timestamp(min(updated)) from data) min, (select unix_timestamp(built) - 1 from helper where chunk_id = '1_tmp') max
        sql_query_post_index = replace into helper set chunk_id = '1', built = (select built from helper t where chunk_id = '1_tmp')
        sql_range_step = 100
        sql_field_string = body
        sql_attr_timestamp = updated
}

source delta : main
{
        sql_query_pre =
        sql_query_range = select (select unix_timestamp(built) from helper where chunk_id = '1') min, unix_timestamp() max
        sql_query_killlist = select id from data where updated >= (select built from helper where chunk_id = '1')
        killlist_target = idx_main:kl
}

index idx
{
        type = distributed
        local = idx_main
        local = idx_delta
}
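
To see why a kill-list makes deletion possible with plain indexes, here is a minimal sketch of the main+delta idea in Python. It is a simplified model, not Manticore or Sphinx code: the function search_main_delta and the sample rows are hypothetical, and it assumes the kill-list contains the id of every row changed or removed since the main index was built.

```python
def search_main_delta(main_rows, delta_rows, kill_list):
    """Return visible rows: main minus killed ids, plus everything in delta."""
    killed = set(kill_list)
    visible = {row["id"]: row for row in main_rows if row["id"] not in killed}
    # Delta rows are newer, so they replace any main row with the same id.
    visible.update({row["id"]: row for row in delta_rows})
    return [visible[doc_id] for doc_id in sorted(visible)]


# Main index as it was built at the last full reindex.
main_rows = [
    {"id": 1, "name": "aaa"},
    {"id": 2, "name": "bbb"},
    {"id": 3, "name": "ccc"},
]
# Since then: row 1 was edited and row 3 was removed.
# Assumption: the removal of row 3 is recorded somewhere the kill-list query can see.
delta_rows = [{"id": 1, "name": "aaa_updated"}]
kill_list = [1, 3]

names = [row["name"] for row in search_main_delta(main_rows, delta_rows, kill_list)]
print(names)  # ['aaa_updated', 'bbb']
```

Note that the sql_query_killlist in the configuration selects ids from rows that still exist in data, so a row that was physically deleted would not be returned by it. The sketch therefore assumes deletions are tracked some other way, for example with a soft-delete flag that also bumps the updated column.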
