Python: How to find the first day of every month between two dates

Question

I wrote some code that builds a list of the first day of every month that falls between two dates. Can you think of a better way to do this?

import datetime
end_date= datetime.datetime.strptime('2018-03-28', "%Y-%m-%d").date()
start_date= datetime.datetime.strptime('2017-10-25', "%Y-%m-%d").date()
print(start_date)
print(start_date + datetime.timedelta(days=1))
mylist = []
checking_date = start_date
print(checking_date + datetime.timedelta(days=1))
while str(checking_date) < str(end_date):
    if checking_date != start_date:
        mylist.append(checking_date)
    month = str(checking_date).split('-')[1]
    new_date = checking_date + datetime.timedelta(days=20)
    possible_new_month = str(new_date).split('-')[1]
    if possible_new_month == month:
        new_date = new_date + datetime.timedelta(days=20)
    new_year = str(new_date).split('-')[0]
    new_month = str(new_date).split('-')[1]
    checking_date_format = "{0}-{1}-01".format(new_year,new_month)
    checking_date = datetime.datetime.strptime(checking_date_format, "%Y-%m-%d").date()

Tags: python, datetime

Answer


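Month arithmetic is much easier to reason about if you convert the year and month into a single number, using years * 12 + (month - 1). That number can be converted back into a year and month pair with floor division and the modulo operation. For example, 2017-10 (October) is 24213 months since year zero: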

>>> 2017 * 12 + (10 - 1)
24213

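You can easily add months to or subtract months from that number. Floor division (//) gives you the year back, and the % modulo operator, with 1 added back on, gives you the month: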

>>> 24213 // 12  # year
2017
>>> (24213 % 12) + 1  # month
10
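
The conversion in both directions can be wrapped in small helpers. This is only a sketch to illustrate the arithmetic; the names to_month_ordinal, from_month_ordinal and add_months are made up for this example and are not part of the standard library:

```python
def to_month_ordinal(year, month):
    """Collapse a (year, month) pair into a single month count."""
    return year * 12 + (month - 1)


def from_month_ordinal(ordinal):
    """Recover the (year, month) pair from a month count."""
    year, month_index = divmod(ordinal, 12)
    return year, month_index + 1


def add_months(year, month, delta):
    """Shift a (year, month) pair by delta months (delta may be negative)."""
    return from_month_ordinal(to_month_ordinal(year, month) + delta)


print(to_month_ordinal(2017, 10))   # 24213
print(add_months(2017, 10, 3))      # (2018, 1)
print(add_months(2018, 1, -1))      # (2017, 12)
```

Year boundaries need no special handling here: the carry into the next (or previous) year falls out of the division.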

With this in mind, you can generate any number of months with a range():

from datetime import date

def months(start_date, end_date, day=1):
    """Produce a date for every month from start until end"""
    start = start_date.year * 12 + (start_date.month - 1)
    if start_date.day > day:
        # already in this month, so start counting at the next
        start += 1
    end = end_date.year * 12 + (end_date.month - 1)
    if end_date.day > day:
        # end date is past the reference day, include the reference
        # date in the output
        end += 1
    # generate the months, just a range from start to end
    for ordinal in range(start, end):
        yield date(ordinal // 12, (ordinal % 12) + 1, day)

The above is a generator function that yields consecutive months; call list() on it if you need the full sequence:

>>> start_date = date(2017, 10, 25)
>>> end_date = date(2018, 3, 28)
>>> list(months(start_date, end_date))
[datetime.date(2017, 11, 1), datetime.date(2017, 12, 1), datetime.date(2018, 1, 1), datetime.date(2018, 2, 1), datetime.date(2018, 3, 1)]

Note that at no point do you need to convert dates to strings! You can trivially get the month value from a date instance with the .month attribute.
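
As a small illustration of that point, the sketch below compares the string-splitting approach from the question with plain attribute access, and shows that date objects can be compared directly. The variable names are chosen just for this example:

```python
from datetime import date

d = date(2017, 10, 25)

# The string round trip from the question: format, split, parse back to int.
month_from_string = int(str(d).split('-')[1])

# The same value is available directly as an integer attribute.
month_from_attribute = d.month

print(month_from_string == month_from_attribute)  # True
print(d.year, d.month, d.day)                     # 2017 10 25

# date objects also support comparison, so str() is not needed there either.
print(date(2017, 10, 25) < date(2018, 3, 28))     # True
```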

For comparison's sake, I turned the other solutions into generators too:

from calendar import monthrange
from datetime import timedelta
from dateutil import rrule

def andray_timedelta_one(start_date, end_date):
    # walk the range one day at a time, keeping only the first of each month
    delta = end_date - start_date
    for i in range(delta.days + 1):
        d = start_date + timedelta(i)
        if d.day == 1:
            yield d

def matthew_timedelta_monthrange(start_date, end_date):
    if start_date.day == 1:
        yield start_date

    start_date = start_date.replace(day=1)

    while start_date <= end_date:
        # add the number of days in the month for this month/year
        try:
            start_date += timedelta(monthrange(start_date.year, start_date.month)[1])
            yield start_date
        except OverflowError:
            # trying to add to close-to-date.max would raise this exception
            return

def sunitha_rrule(start_date, end_date):
    # already an iterable
    return rrule.rrule(rrule.MONTHLY, bymonthday=1, dtstart=start_date, until=end_date)

# for completion's sake, I renamed mine to martijn_months

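This makes it possible to compare their performance fairly, and we can use the deque(..., maxlen=0) trick to consume their output quickly without needing much memory. We can then run each function over the range from date.min to date.max, the largest possible date range; that produces nearly 120 thousand date objects: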

>>> sum(1 for _ in months(date.min, date.max))
119988
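
For readers who have not seen the deque trick before, here is a minimal sketch of what it does. The squares generator is a stand-in invented for this example; any iterator behaves the same way:

```python
from collections import deque


def squares(n):
    """Hypothetical stand-in generator; yields n values lazily."""
    for i in range(n):
        yield i * i


gen = squares(100_000)

# A deque with maxlen=0 can never hold an item, so feeding it an iterator
# pulls every value through at C speed and discards each one immediately.
sink = deque(gen, maxlen=0)

print(len(sink))        # 0: nothing was stored
print(next(gen, None))  # None: the generator has been fully consumed
```

That is why it suits benchmarking: the timing reflects the cost of producing the values rather than the cost of storing them in a list.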

These are the results:

>>> from timeit import Timer
>>> from collections import deque
>>> bootstrap = 'from __main__ import date, deque, {} as test'
>>> test = 'deque(test(date.min, date.max), maxlen=0)'
>>> for f in (
...         andray_timedelta_one,
...         sunitha_rrule,
...         matthew_timedelta_monthrange,
...         martijn_months):
...     loop_count, total_time = Timer(test, bootstrap.format(f.__name__)).autorange()
...     print(f'{f.__name__:<30}: {total_time/loop_count*1000:.5f}ms')
...
andray_timedelta_one          : 2001.27048ms
sunitha_rrule                 : 1517.70081ms
matthew_timedelta_monthrange  : 154.68727ms
martijn_months                : 38.86803ms

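As you can see, my approach is the fastest by a wide margin: about 4 times faster than Matthew's, and roughly 40 to 50 times faster than the other two.

  • Andray's approach wastes a lot of time creating every single date in the calendar, adding one day at a time.
  • Sunitha's choice of rrule is neat and compact, but the function has to account for much more complex date arithmetic, so this very simple case is not optimised for. That makes rrule() slow for this task.
  • Matthew's is far more efficient, but calendar.monthrange() still does work that is redundant for what is a simple add-one operation on a year and month combination. We don't need to know whether the current month has 31, 30, 29 or 28 days to do the calculation!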
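
To make the last point concrete, stepping to the first day of the next month needs only the year and month, never the month length. The helper below is a sketch written for this illustration (first_of_next_month is a hypothetical name, not code from any of the answers being compared):

```python
from datetime import date


def first_of_next_month(d):
    """Return the first day of the month following d's month."""
    # December rolls over: 12 // 12 == 1 carries into the next year,
    # and 12 % 12 + 1 == 1 wraps the month back to January.
    year = d.year + d.month // 12
    month = d.month % 12 + 1
    return date(year, month, 1)


print(first_of_next_month(date(2017, 10, 25)))  # 2017-11-01
print(first_of_next_month(date(2017, 12, 31)))  # 2018-01-01
print(first_of_next_month(date(2018, 2, 1)))    # 2018-03-01
```

For a date in December of year 9999 the result would fall outside the supported range, and date() raises ValueError.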
