sql - Cassandra table design for queries with ORDER, LIMIT and IN predicates
Problem description
My data looks like this:
select * from test;
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 1 | hash1
dep1 | Bart | 1 | hash2
dep1 | Lisa | 3 | hash3
dep1 | Lisa | 4 | hash4
dep1 | John | 5 | hash5
dep1 | Lucy | 6 | hash6
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
I want to query the data with a where clause on batch_number, an ordering on batch_number, and an in predicate on employee.
In a relational database, this would look like:
select * from test
where department='dep1'
and employee in ('Bart','Lucy','John')
and batch_number >= 2
order by batch_number desc
limit 3;
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
dep1 | Lucy | 6 | hash6
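I am having trouble modeling a table for this query in Cassandra. department will be my partition key, and hash needs to be part of the primary key. But I am struggling with the clustering keys and/or (SSTable Attached) secondary indexes.
Because I want to order on batch_number, I tried including it as a clustering key: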
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash)
) WITH CLUSTERING ORDER BY (batch_number DESC);
CREATE INDEX tst_emp ON keyspace.test (employee);
But this does not allow querying with an in predicate on my index:
select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="IN predicates on non-primary-key columns (employee) is not yet supported"
So I also tried adding the employee column as a clustering key:
CREATE TABLE keyspace.test(
department TEXT,
batch_number INT,
hash TEXT,
employee TEXT,
PRIMARY KEY ((department), batch_number, hash, employee)
) WITH CLUSTERING ORDER BY (batch_number DESC);
But this fails because I cannot put a non-EQ relation on batch_number:
select * from keyspace.test where department='dep1' and batch_number > 1 and employee in ('Bart','Lucy','John');
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "employee" cannot be restricted (preceding column "batch_number" is restricted by a non-EQ relation)"
But whenever I put employee before batch_number, I lose the ability to order on batch_number:
CREATE TABLE keyspace.test(
department TEXT,
employee TEXT,
batch_number INT,
hash TEXT,
PRIMARY KEY ((department), employee, batch_number, hash)
);
select * from keyspace.test where department='dep1' and employee in ('Bart','Lucy','John') ORDER BY batch_number DESC;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY"
So what table design would allow such a query? Can this be done in Cassandra?
Edit:
Other queries that I want to be able to run on this table are:
select * from keyspace.test where department='X' and batch_number=Y
and
delete from keyspace.test where department='X'
Solution
Use a materialized view, which lets you rearrange the data:
CREATE MATERIALIZED VIEW mv_test AS
SELECT
department,
batch_number,
employee,
hash
FROM
test
WHERE
department IS NOT NULL
AND batch_number IS NOT NULL
AND employee IS NOT NULL
AND hash IS NOT NULL
PRIMARY KEY (department, employee, batch_number, hash)
WITH clustering
ORDER BY
(batch_number DESC);
I can then run the following query:
SELECT * FROM mv_test
WHERE
department = 'dep1'
AND employee IN
(
'Bart',
'Lisa'
)
AND batch_number > 3;
The results are sorted in clustering order:
department | employee | batch_number | hash
------------+----------+--------------+-------
dep1 | Bart | 7 | hash7
dep1 | Bart | 7 | hash8
dep1 | Lisa | 4 | hash4
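While > is a non-equality clause, IN, although it takes multiple values, is still deterministic, which is why I believe you can filter on the keys without any problem. Since batch_number is the last thing you filter on, any kind of where clause is allowed on it. I am assuming you will always have the department.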
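The restriction rule described above can be sketched as a small check. This is a simplified model written for illustration, not Cassandra's actual validation code: the function name first_invalid_restriction and the "eq"/"in"/"range" labels are hypothetical, and it assumes the partition key is always given and ALLOW FILTERING is not used.

```python
# Simplified model (an assumption, not Cassandra's implementation) of how
# restrictions on clustering columns are accepted without ALLOW FILTERING.

def first_invalid_restriction(clustering_columns, restrictions):
    """Return the first clustering column whose restriction is rejected, or None.

    clustering_columns: column names in PRIMARY KEY order.
    restrictions: maps a column name to "eq", "in" or "range".
    """
    gap_seen = False    # an earlier clustering column was left unrestricted
    range_seen = False  # an earlier clustering column has a non-EQ relation
    for column in clustering_columns:
        kind = restrictions.get(column)
        if kind is None:
            gap_seen = True
            continue
        # A column may only be restricted if every preceding clustering
        # column is restricted by "eq" or "in".
        if gap_seen or range_seen:
            return column
        if kind == "range":
            range_seen = True
    return None


# Second table design from the question: employee comes after batch_number.
rejected = first_invalid_restriction(
    ["batch_number", "hash", "employee"],
    {"batch_number": "range", "employee": "in"},
)

# Clustering order of the materialized view: employee comes first.
accepted = first_invalid_restriction(
    ["employee", "batch_number", "hash"],
    {"employee": "in", "batch_number": "range"},
)
```

Under this model, rejected is "employee", which matches the column named in the error message from the question, and accepted is None because the range relation sits on the last restricted column.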
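Note that materialized views have performance implications, specifically for write performance. However, that cost is likely still preferable to relying on ALLOW FILTERING.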
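Update:
The order specified at the end of the materialized view refers to batch_number. However, rows are ordered by department first, then by employee, and only then by batch_number, so a specific overall order of batch_number is not guaranteed.
As far as I know, there is no way around this. Another database solution may be preferable.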
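The ordering caveat can be shown with the sample data from the question. The following is a minimal in-memory sketch, not a Cassandra client: the helper names query_view and top_batches are hypothetical, and it assumes rows come back sorted by employee ascending, then batch_number descending, then hash. It also shows one possible application-side re-sort, which is only reasonable when the result set is small enough to hold in memory.

```python
# Sample rows from the question: (department, employee, batch_number, hash).
ROWS = [
    ("dep1", "Bart", 1, "hash1"),
    ("dep1", "Bart", 1, "hash2"),
    ("dep1", "Lisa", 3, "hash3"),
    ("dep1", "Lisa", 4, "hash4"),
    ("dep1", "John", 5, "hash5"),
    ("dep1", "Lucy", 6, "hash6"),
    ("dep1", "Bart", 7, "hash7"),
    ("dep1", "Bart", 7, "hash8"),
]


def query_view(rows, department, employees, min_batch):
    """Mimic the view: filter, then sort by the assumed clustering order."""
    matching = [
        row for row in rows
        if row[0] == department and row[1] in employees and row[2] >= min_batch
    ]
    # employee ascending, batch_number descending, hash ascending
    return sorted(matching, key=lambda row: (row[1], -row[2], row[3]))


def top_batches(result, limit):
    """Re-sort in the application by batch_number descending, then truncate."""
    # sorted() is stable, so rows with equal batch_number keep the view's order.
    return sorted(result, key=lambda row: row[2], reverse=True)[:limit]


result = query_view(ROWS, "dep1", {"Bart", "Lucy", "John"}, 2)
batch_numbers = [row[2] for row in result]
top_three = top_batches(result, 3)
```

Here batch_numbers comes out as [7, 7, 5, 6], which is descending only within each employee. After the re-sort, top_three holds the same three rows as the relational query in the question (hash7, hash8, hash6).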
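Update 2:
As noted in the Apache mailing list thread (see the comments below), materialized views are not considered production-ready. However, DataStax considers them usable, provided they are handled according to the best practices mentioned there. Personally, I have not had any problems with materialized views. Admittedly, that was with a simple single-datacenter cluster; given that the best practices mention more complex setups, they may break down in those situations.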