python - Django with MySQL backend - group by time range
Problem description
I have this simple model:
models.py
from django.db import models
from django.utils import timezone


class Ping(models.Model):
    online = models.BooleanField()
    created = models.DateTimeField(db_index=True, default=timezone.now)

    def __str__(self):
        return f'{self.online}, {self.created}'
It gives me the following results:
mysql [lab]> SELECT * FROM myapp_ping;
+----+--------+----------------------------+
| id | online | created                    |
+----+--------+----------------------------+
|  1 |      1 | 2018-08-02 13:34:09.435292 |
|  2 |      1 | 2018-08-02 13:35:09.520200 |
|  3 |      0 | 2018-08-02 13:36:09.540638 |
|  4 |      0 | 2018-08-02 13:37:10.529783 |
|  5 |      1 | 2018-08-02 13:38:09.779012 |
|  6 |      1 | 2018-08-02 13:39:09.650365 |
|  7 |      1 | 2018-08-02 13:40:09.625543 |
|  8 |      1 | 2018-08-02 13:41:09.892196 |
|  9 |      1 | 2018-08-02 13:42:09.802186 |
| 10 |      1 | 2018-08-02 13:43:09.864551 |
| 11 |      1 | 2018-08-02 13:44:09.960962 |
| 12 |      1 | 2018-08-02 13:45:09.891947 |
| 13 |      0 | 2018-08-02 13:46:09.141727 |
| 14 |      0 | 2018-08-02 13:47:09.142030 |
| 15 |      0 | 2018-08-02 13:48:09.160942 |
| 16 |      0 | 2018-08-02 13:49:09.152879 |
| 17 |      0 | 2018-08-02 13:50:09.280246 |
| 18 |      1 | 2018-08-02 13:51:09.363184 |
| 19 |      1 | 2018-08-02 13:52:09.405863 |
| 20 |      1 | 2018-08-02 13:53:09.403251 |
+----+--------+----------------------------+
20 rows in set (0.00 sec)
Is there a way to get output similar to this (the ranges where online is False):
Downtime:
from | to | duration
2018-08-02 13:36:09 | 2018-08-02 13:37:10 | 1 minute and 1 second
2018-08-02 13:46:09 | 2018-08-02 13:50:09 | 4 minutes and 0 seconds
I'm not sure whether this can be done with the Django ORM, or whether it needs a raw MySQL query using something like CASE or IF statements?
Update: Wed Aug 8 15:13:15 UTC 2018
So, based on @AKX's answer, I put together a proof of concept of both solutions:
models.py
from datetime import timedelta

from django.db import models
from django.utils import timezone


class PingManager(models.Manager):
    def downtime_python(self):
        # Single pass in Python over the last 30 days of pings, oldest first.
        since = timezone.now() - timedelta(days=30)
        queryset = super().get_queryset().filter(created__gt=since).order_by('created')
        offline = False
        ret = []
        for entry in queryset:
            if not entry.online and not offline:
                # First offline ping of a new outage.
                offline = True
                _ret = {'start': str(entry.created)}
            if entry.online and offline:
                # The first online ping after an outage closes it.
                _ret.update({'end': str(entry.created)})
                ret.append(_ret)
                offline = False
        return ret

    def downtime_sql(self):
        # One small query per state change instead of loading every row.
        since = timezone.now() - timedelta(days=30)
        queryset = super().get_queryset().filter(created__gt=since)
        offline = queryset.filter(online=False).order_by('created').first()
        last = queryset.order_by('created').last()
        ret = []
        if offline:
            # Caveat: `online` is None here if the first outage never ends,
            # in which case the append below raises AttributeError.
            online = queryset.filter(created__gt=offline.created, online=True).order_by('created').first()
            ret.append({'start': str(offline.created), 'end': str(online.created)})
            while True:
                offline = queryset.filter(created__gt=online.created, online=False).order_by('created').first()
                if offline:
                    online = queryset.filter(created__gt=offline.created, online=True).order_by('created').first()
                if (online and offline) and online.created < last.created:
                    ret.append({'start': str(offline.created), 'end': str(online.created)})
                    continue
                else:
                    break
        return ret


class Ping(models.Model):
    online = models.BooleanField()
    created = models.DateTimeField(db_index=True, default=timezone.now)
    objects = PingManager()

    def __str__(self):
        return f'{self.online}, {self.created}'
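Questions:
Should I create a static method for this, or is a custom manager the right solution? If both calculations run in memory, why is there such a huge difference between their execution times? Is there a way to improve this and make it more Pythonic?
Tests: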
# python manage.py shell
Python 3.6.5 (default, Apr 10 2018, 17:08:37)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from myapp.models import Ping
In [2]: Ping.objects.downtime_sql()[0]
Out[2]:
{'start': '2018-07-13 16:32:16.009356+00:00',
'end': '2018-07-13 16:33:15.942784+00:00'}
In [3]: Ping.objects.downtime_python()[0]
Out[3]:
{'start': '2018-07-13 16:32:16.009356+00:00',
'end': '2018-07-13 16:33:15.942784+00:00'}
In [4]: Ping.objects.downtime_sql() == Ping.objects.downtime_python()
Out[4]: True
In [5]: import timeit
In [6]: timeit.timeit(stmt=Ping.objects.downtime_python, number=1)
Out[6]: 5.720254830084741
In [7]: timeit.timeit(stmt=Ping.objects.downtime_sql, number=1)
Out[7]: 0.25946347787976265
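Solution
Expanding on my comment:
I'm not sure whether even SQL CASE/IF statements could give you that result, since each result row depends on the rows before it. It is easy to do procedurally in Python, though.
- The obvious approach is to loop over Ping.objects.all() (or Ping.objects.iterator()) and keep track of the online value to form the "streaks" you want. The downside is that you really do have to iterate over every object, which will eventually get slow (and/or exhaust your memory).
- A more sophisticated approach, which uses more queries but less memory, is to find the first Ping object that is offline, then find the next (time-wise) Ping object that is online again; together these form one streak. Then rinse and repeat until you run out of Ping objects to check.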
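The first approach can be sketched without Django at all, as a single pass over (created, online) tuples sorted by time. This is only an illustration under assumptions: offline_ranges and pings are hypothetical names, and in a real project the rows might come from something like Ping.objects.order_by('created').values_list('created', 'online').iterator().

```python
from datetime import datetime
from itertools import groupby


def offline_ranges(rows):
    """Yield (first_offline, last_offline) for each run of consecutive offline pings.

    `rows` must be (created, online) tuples already sorted by `created`.
    """
    for online, run in groupby(rows, key=lambda row: row[1]):
        if online:
            continue  # skip runs where the host was up
        stamps = [created for created, _ in run]
        yield stamps[0], stamps[-1]


# Rows 2 to 5 of the table in the question, truncated to whole seconds.
pings = [
    (datetime(2018, 8, 2, 13, 35, 9), True),
    (datetime(2018, 8, 2, 13, 36, 9), False),
    (datetime(2018, 8, 2, 13, 37, 10), False),
    (datetime(2018, 8, 2, 13, 38, 9), True),
]

for start, end in offline_ranges(pings):
    print(start, "|", end, "|", end - start)
```

With this sample the loop prints one range, from 13:36:09 to 13:37:10, with a duration of 0:01:01, which matches the first row of the output requested in the question. Only one offline run is held in memory at a time, but every row is still read once.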
Edit
So yeah, here's a (rather elegant, if I may say so myself) concrete implementation of approach 2 (the full test repository is at https://github.com/akx/so51656477):
class PingQuerySet(models.QuerySet):
    def streaks(self):
        queryset = self.values_list('created', 'online').order_by('created')
        entry = queryset.first()
        while entry:
            next_entry = queryset.filter(created__gt=entry[0], online=(not entry[1])).first()
            yield (entry, next_entry)
            entry = next_entry
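It is a generator of 2-tuples of 2-tuples: ((start_timestamp, start_online), (end_timestamp, end_online) | None). To call it as Ping.objects.streaks(), the model has to use this queryset class, for instance via objects = PingQuerySet.as_manager().
For example, to get the up/down or down/up pairs for the past 10 days,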
from datetime import timedelta
from django.utils.timezone import now

for start, end in Ping.objects.filter(created__gt=now() - timedelta(days=10)).streaks():
    print(start, end)
would print something like
[...snip...]
(datetime.datetime(2018, 8, 8, 8, 10, 12, 943500), False) (datetime.datetime(2018, 8, 8, 10, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 8, 10, 10, 12, 943500), True) (datetime.datetime(2018, 8, 8, 11, 10, 12, 943500), False)
(datetime.datetime(2018, 8, 8, 11, 10, 12, 943500), False) (datetime.datetime(2018, 8, 8, 11, 40, 12, 943500), True)
(datetime.datetime(2018, 8, 8, 11, 40, 12, 943500), True) (datetime.datetime(2018, 8, 8, 12, 40, 12, 943500), False)
(datetime.datetime(2018, 8, 8, 12, 40, 12, 943500), False) (datetime.datetime(2018, 8, 8, 16, 40, 12, 943500), True)
(datetime.datetime(2018, 8, 8, 16, 40, 12, 943500), True) (datetime.datetime(2018, 8, 8, 17, 40, 12, 943500), False)
(datetime.datetime(2018, 8, 8, 17, 40, 12, 943500), False) (datetime.datetime(2018, 8, 8, 18, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 8, 18, 10, 12, 943500), True) (datetime.datetime(2018, 8, 8, 19, 40, 12, 943500), False)
(datetime.datetime(2018, 8, 8, 19, 40, 12, 943500), False) (datetime.datetime(2018, 8, 8, 23, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 8, 23, 10, 12, 943500), True) (datetime.datetime(2018, 8, 9, 0, 10, 12, 943500), False)
(datetime.datetime(2018, 8, 9, 0, 10, 12, 943500), False) (datetime.datetime(2018, 8, 9, 3, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 9, 3, 10, 12, 943500), True) (datetime.datetime(2018, 8, 9, 3, 40, 12, 943500), False)
(datetime.datetime(2018, 8, 9, 3, 40, 12, 943500), False) (datetime.datetime(2018, 8, 9, 5, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 9, 5, 10, 12, 943500), True) (datetime.datetime(2018, 8, 9, 5, 40, 12, 943500), False)
(datetime.datetime(2018, 8, 9, 5, 40, 12, 943500), False) (datetime.datetime(2018, 8, 9, 7, 10, 12, 943500), True)
(datetime.datetime(2018, 8, 9, 7, 10, 12, 943500), True) None
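Some notes:
- The last end value may be None, which means the machine is still down or still up (depending on the state value in the start tuple).
- If you only care about the times when the machine was down, just ignore the pairs whose start tuple has a state value of True.
- Since this is a generator, you can stop iterating over it once you have enough data, and it will not issue any further queries.
- Since this is a QuerySet extension method, you can add other filters as needed (as long as they don't filter on online). For example, if you had a host field: Ping.objects.filter(host='example.com').streaks().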
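As a rough sketch of the first two notes, the pairs can be reduced to downtime rows in plain Python. The helper name downtime_rows and the sample list are hypothetical; the list only mimics the shape of the pairs printed above. Note that each streak ends at the first ping of the opposite state, so a duration here runs until the first online ping, whereas the example output in the question ends at the last offline ping.

```python
from datetime import datetime


def downtime_rows(pairs):
    """Yield (start, end, duration) for every streak that began offline.

    `pairs` has the shape produced by streaks():
    ((start_timestamp, start_online), (end_timestamp, end_online) or None).
    """
    for (start, start_online), end in pairs:
        if start_online:
            continue  # an "up" streak, not downtime
        if end is None:
            # Still offline at the end of the data: no end, no duration.
            yield start, None, None
        else:
            yield start, end[0], end[0] - start


# Hypothetical sample in the same shape as the printed pairs.
sample = [
    ((datetime(2018, 8, 8, 8, 10, 12), False), (datetime(2018, 8, 8, 10, 10, 12), True)),
    ((datetime(2018, 8, 8, 10, 10, 12), True), (datetime(2018, 8, 8, 11, 10, 12), False)),
    ((datetime(2018, 8, 8, 11, 10, 12), False), None),
]

for start, end, duration in downtime_rows(sample):
    print(start, "|", end, "|", duration)
```

Because downtime_rows is itself a generator, it keeps the lazy behaviour of streaks(): stopping the loop early means no further pairs are requested.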