python-3.x - fn.AVG peewee 函数的意外行为
问题描述
我正在开展一个项目,以使用 python 和 peewee 作为学习练习来探索 covid-19 案例。
这个问题涉及两个表:
域模型将全球区域描述到美国的县级粒度。例如,纽约县有一行包含字段 Domain.country_region = 'US'、Domain.province_state = 'New York'、Domain.admin2 = 'New York'。
确认模型包含自今年早些时候以来每个域的已确认 covid-19 病例的累积计数。
近期目标是计算每个域的 5 天滚动平均值。最终,我计划能够确定县和州感兴趣的发展,例如“尖峰”等。
下面我包括模型、测试脚本和脚本的输出。我正在使用子查询来计算每日病例数,因为我无法创建使用同一表行中其他计算列的计算列。
问题是在“q”查询中,fn.AVG 不是计算最近 5 天每日确诊病例的平均值,而是返回最后一天(2020 年 6 月 19 日)的值。
我正要尝试在 Confirmed 表中使用混合属性而不是这种方法,但我很好奇为什么这段代码不起作用。
型号(仅列出相关表格):
from peewee import *
db = SqliteDatabase('data/covid.db')
class BaseModel(Model):
class Meta:
database = db
class Domain(BaseModel):
uid = IntegerField(primary_key=True)
country_region = CharField()
province_state = CharField(default='')
admin2 = CharField(default='')
population = IntegerField(default=0)
area = FloatField(default=0.0)
density = FloatField(default=0.0)
class Confirmed(BaseModel):
uid = ForeignKeyField(Domain)
date = DateField()
count = IntegerField()
完整代码:
from tabulate import tabulate
from covid.models import *
from datetime import date
subq = (Domain
.select(
Domain.uid,
Domain.country_region,
Domain.province_state,
Domain.admin2,
Confirmed.date,
Confirmed.count,
(Confirmed.count - fn.LAG(Confirmed.count, 1)
.over(
partition_by=[Domain.country_region, Domain.province_state,
Domain.admin2],
order_by=[Confirmed.date],
)
).alias('daily')
)
.join(Confirmed, on=(Domain.uid == Confirmed.uid))
.where(
(Domain.country_region == 'US')
& (Domain.province_state == 'New York')
& (Domain.admin2 == 'New York')
)
.order_by(
Domain.country_region,
Domain.province_state,
Domain.admin2,
Confirmed.date
)
).alias('subq')
# -------------------------------------------
q = (Domain
.select(
subq.c.province_state,
subq.c.admin2,
subq.c.date,
subq.c.count,
subq.c.daily,
fn.AVG(subq.c.daily)
.over(
start=Window.preceding(4), end=Window.preceding(0),
partition_by=[subq.c.country_region, subq.c.province_state,
subq.c.admin2],
order_by=[subq.c.date]
)
.alias('avg5'),
)
.join(subq, on=(Domain.uid == subq.c.uid))
.where(subq.c.date == date(2020,6,19))
)
print('--------------------------------- subq')
print(subq)
print(tabulate(subq.dicts(), 'keys'))
print('--------------------------------- q')
print(q)
print(tabulate(q.dicts(), 'keys'))
...和输出:
--------------------------------- subq
SELECT "t1"."uid", "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date",
"t2"."count", ("t2"."count" - LAG("t2"."count", 1) OVER (PARTITION BY "t1"."country_region",
"t1"."province_state", "t1"."admin2" ORDER BY "t2"."date")) AS "daily" FROM "domain" AS "t1"
INNER JOIN "confirmed" AS "t2" ON ("t1"."uid" = "t2"."uid_id")
WHERE ((("t1"."country_region" = 'US') AND ("t1"."province_state" = 'New York'))
AND ("t1"."admin2" = 'New York'))
ORDER BY "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date"
uid country_region province_state admin2 date count daily
-------- ---------------- ---------------- -------- ---------- ------- -------
84036061 US New York New York 2020-01-22 0
84036061 US New York New York 2020-01-23 0 0
84036061 US New York New York 2020-01-24 0 0
...
84036061 US New York New York 2020-06-13 209493 539
84036061 US New York New York 2020-06-14 209878 385
84036061 US New York New York 2020-06-15 210259 381
84036061 US New York New York 2020-06-16 210591 332
84036061 US New York New York 2020-06-17 210941 350
84036061 US New York New York 2020-06-18 211260 319
84036061 US New York New York 2020-06-19 211670 410
--------------------------------- q
SELECT "subq"."province_state", "subq"."admin2", "subq"."date", "subq"."count", "subq"."daily",
AVG("subq"."daily") OVER (PARTITION BY "subq"."country_region", "subq"."province_state",
"subq"."admin2" ORDER BY "subq"."date" ROWS BETWEEN 4 PRECEDING AND 0 PRECEDING) AS "avg5" FROM
"domain" AS "t1" INNER JOIN (SELECT "t1"."uid", "t1"."country_region", "t1"."province_state",
"t1"."admin2", "t2"."date", "t2"."count", ("t2"."count" - LAG("t2"."count", 1) OVER (PARTITION BY
"t1"."country_region", "t1"."province_state", "t1"."admin2" ORDER BY "t2"."date")) AS "daily" FROM
"domain" AS "t1" INNER JOIN "confirmed" AS "t2" ON ("t1"."uid" = "t2"."uid_id") WHERE
((("t1"."country_region" = 'US') AND ("t1"."province_state" = 'New York')) AND ("t1"."admin2" = 'New
York')) ORDER BY "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date") AS "subq"
ON ("t1"."uid" = "subq"."uid") WHERE ("subq"."date" = '2020-06-19')
province_state admin2 date count daily avg5
---------------- -------- ---------- ------- ------- ------
New York New York 2020-06-19 211670 410 410
解决方案
推荐阅读
- sql - 重用已删除行的标识值
- ios - 如何处理 WKWebview 中的身份验证挑战?
- javascript - 返回 0 或 NaN 而不是实际结果的值
- django - 从 Django 自动创建数据库视图
- python - 什么时候需要 PYTHONPATH 中的 cwd?
- c# - 事件的C#实现
- visual-studio - 在 Visual Studio 代码中格式化液体(Shopify)代码
- node.js - 节点肥皂 ClientSSLSecurityPFX - 403 禁止
- php - codeigniter 中的电子邮件概念
- c# - 使用 C# 导出到 Excel 时,Excel 将日期列视为一般