首页 > 解决方案 > fn.AVG peewee 函数的意外行为

问题描述

我正在开展一个项目,以使用 python 和 peewee 作为学习练习来探索 covid-19 案例。

这个问题涉及两个表:

域模型将全球区域描述到美国的县级粒度。例如,纽约县有一行包含字段 Domain.country_region = 'US'、Domain.province_state = 'New York'、Domain.admin2 = 'New York'。

确认模型包含自今年早些时候以来每个域的已确认 covid-19 病例的累积计数。

近期目标是计算每个域的 5 天滚动平均值。最终,我计划能够确定县和州感兴趣的发展,例如“尖峰”等。

下面我包括模型、测试脚本和脚本的输出。我正在使用子查询来计算每日病例数,因为我无法创建使用同一表行中其他计算列的计算列。

问题是在“q”查询中,fn.AVG 不是计算最近 5 天每日确诊病例的平均值,而是返回最后一天(2020 年 6 月 19 日)的值。

我正要尝试在 Confirmed 表中使用混合属性而不是这种方法,但我很好奇为什么这段代码不起作用。

型号(仅列出相关表格):

from peewee import *
db = SqliteDatabase('data/covid.db')
class BaseModel(Model):
    class Meta:
        database = db
class Domain(BaseModel):
    uid = IntegerField(primary_key=True)
    country_region = CharField()
    province_state = CharField(default='')
    admin2 = CharField(default='')
    population = IntegerField(default=0)
    area = FloatField(default=0.0)
    density = FloatField(default=0.0)
class Confirmed(BaseModel):
    uid = ForeignKeyField(Domain)
    date = DateField()
    count = IntegerField()

完整代码:

from tabulate import tabulate
from covid.models import *
from datetime import date

subq = (Domain
    .select(
        Domain.uid,
        Domain.country_region,
        Domain.province_state,
        Domain.admin2,
        Confirmed.date,
        Confirmed.count,
        (Confirmed.count - fn.LAG(Confirmed.count, 1)
            .over(
                partition_by=[Domain.country_region, Domain.province_state,
                    Domain.admin2],
                order_by=[Confirmed.date],
            )
        ).alias('daily')
    )
    .join(Confirmed, on=(Domain.uid == Confirmed.uid))
    .where(
          (Domain.country_region == 'US')
        & (Domain.province_state == 'New York')
        & (Domain.admin2 == 'New York')
    )
    .order_by(
        Domain.country_region,
        Domain.province_state,
        Domain.admin2,
        Confirmed.date
    )
).alias('subq')

# -------------------------------------------

q = (Domain
        .select(
            subq.c.province_state,
            subq.c.admin2,
            subq.c.date,
            subq.c.count,
            subq.c.daily,
            fn.AVG(subq.c.daily)
                .over(
                    start=Window.preceding(4), end=Window.preceding(0),
                    partition_by=[subq.c.country_region, subq.c.province_state,
                        subq.c.admin2],
                    order_by=[subq.c.date]
                )
                .alias('avg5'),
        )
        .join(subq, on=(Domain.uid == subq.c.uid))
        .where(subq.c.date == date(2020,6,19))
    )

print('--------------------------------- subq')
print(subq)
print(tabulate(subq.dicts(), 'keys'))

print('--------------------------------- q')
print(q)
print(tabulate(q.dicts(), 'keys'))

...和输出:

--------------------------------- subq
SELECT "t1"."uid", "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date", 
"t2"."count", ("t2"."count" - LAG("t2"."count", 1) OVER (PARTITION BY "t1"."country_region", 
"t1"."province_state", "t1"."admin2" ORDER BY "t2"."date")) AS "daily" FROM "domain" AS "t1"
INNER JOIN "confirmed" AS "t2" ON ("t1"."uid" = "t2"."uid_id") 
WHERE ((("t1"."country_region" = 'US') AND ("t1"."province_state" = 'New York')) 
AND ("t1"."admin2" = 'New York')) 
ORDER BY "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date"
     uid  country_region    province_state    admin2    date          count    daily
--------  ----------------  ----------------  --------  ----------  -------  -------
84036061  US                New York          New York  2020-01-22        0
84036061  US                New York          New York  2020-01-23        0        0
84036061  US                New York          New York  2020-01-24        0        0
...
84036061  US                New York          New York  2020-06-13   209493      539
84036061  US                New York          New York  2020-06-14   209878      385
84036061  US                New York          New York  2020-06-15   210259      381
84036061  US                New York          New York  2020-06-16   210591      332
84036061  US                New York          New York  2020-06-17   210941      350
84036061  US                New York          New York  2020-06-18   211260      319
84036061  US                New York          New York  2020-06-19   211670      410
--------------------------------- q
SELECT "subq"."province_state", "subq"."admin2", "subq"."date", "subq"."count", "subq"."daily", 
AVG("subq"."daily") OVER (PARTITION BY "subq"."country_region", "subq"."province_state", 
"subq"."admin2" ORDER BY "subq"."date" ROWS BETWEEN 4 PRECEDING AND 0 PRECEDING) AS "avg5" FROM 
"domain" AS "t1" INNER JOIN (SELECT "t1"."uid", "t1"."country_region", "t1"."province_state", 
"t1"."admin2", "t2"."date", "t2"."count", ("t2"."count" - LAG("t2"."count", 1) OVER (PARTITION BY 
"t1"."country_region", "t1"."province_state", "t1"."admin2" ORDER BY "t2"."date")) AS "daily" FROM 
"domain" AS "t1" INNER JOIN "confirmed" AS "t2" ON ("t1"."uid" = "t2"."uid_id") WHERE 
((("t1"."country_region" = 'US') AND ("t1"."province_state" = 'New York')) AND ("t1"."admin2" = 'New 
York')) ORDER BY "t1"."country_region", "t1"."province_state", "t1"."admin2", "t2"."date") AS "subq" 
ON ("t1"."uid" = "subq"."uid") WHERE ("subq"."date" = '2020-06-19')
province_state    admin2    date          count    daily    avg5
----------------  --------  ----------  -------  -------  ------
New York          New York  2020-06-19   211670      410     410

标签: python-3.xpeewee

解决方案


推荐阅读