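python - Rounding a CSV column to the nearest 30 minutes
Question
My CSV data looks like this:
Columns:
- CRASH_MONTH (e.g. "1")
- CRASH_DAY (e.g. "1")
- TIMESTR (e.g. "8:40")
Desired result:
A new column named "CRASH_DATETIME" containing a Python datetime object built from the corresponding date.
The year doesn't matter; the main goal is to track crashes by month, day and hour:minute, with the time rounded to the nearest 30 minutes.
I tried the following, without success: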
from datetime import datetime, timedelta
def ceil_dt(month, day, hourWithMinutes, delta):
    hour,minutes = hourWithMinutes.split(':')
    int(month)
    int(day)
    int(hour)
    int(minutes)
    dt = datetime.datetime(month=month, day=day, hour=hour, minute=minutes)
    return dt + (datetime.min - dt) % delta
and
dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))
but it fails (running in a Jupyter Notebook):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:14010)()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-40-a9ef29fd7eb7> in <module>()
----> 1 dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))
~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4260 f, axis,
4261 reduce=reduce,
-> 4262 ignore_failures=ignore_failures)
4263 else:
4264 return self._apply_broadcast(f, axis)
~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4356 try:
4357 for i, v in enumerate(series_gen):
-> 4358 results[i] = func(v)
4359 keys.append(v.name)
4360 except Exception as e:
<ipython-input-40-a9ef29fd7eb7> in <lambda>(row)
----> 1 dataInitial['TIME'] = dataInitial.apply(lambda row: ceil_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR'], '30'))
~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/series.py in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
~/anaconda2/envs/tfdeeplearning/lib/python3.5/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
2475 try:
2476 return self._engine.get_value(s, k,
-> 2477 tz=getattr(series.dtype, 'tz', None))
2478 except KeyError as e1:
2479 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4404)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4087)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5210)()
KeyError: ('CRASH_MONTH', 'occurred at index CRASH_DATE')
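Any ideas?
Solution
Your function has a few small problems: the int() conversions are never stored in variables, the year is missing, and delta is passed as a string rather than a timedelta. This version of the function works: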
from datetime import datetime, timedelta

def ceil_dt(month, day, hourWithMinutes, delta):
    hour, minutes = hourWithMinutes.split(':')
    # int() returns a new value, so the result has to be assigned back
    month = int(month)
    day = int(day)
    hour = int(hour)
    minutes = int(minutes)
    # datetime() requires a year; 2019 is an arbitrary placeholder
    dt = datetime(year=2019, month=month, day=day, hour=hour, minute=minutes)
    # round up to the next multiple of delta minutes
    return dt + (datetime.min - dt) % timedelta(minutes=int(delta))
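Note that ceil_dt always rounds up, so "8:40" becomes 9:00, while the question asks for the nearest 30 minutes (8:30). The following is a minimal sketch of a nearest-rounding variant, not part of the original answer. The name round_dt, its default arguments, and the choice to round exact ties upward are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def round_dt(month, day, hour_with_minutes, delta_minutes=30, year=2019):
    """Round month/day/'H:MM' to the nearest multiple of delta_minutes.

    round_dt is a hypothetical helper; year=2019 is an arbitrary placeholder.
    """
    hour, minute = (int(part) for part in str(hour_with_minutes).split(":"))
    dt = datetime(year=year, month=int(month), day=int(day),
                  hour=hour, minute=minute)
    delta = timedelta(minutes=int(delta_minutes))
    # How far dt lies past the previous multiple of delta (counted from datetime.min)
    remainder = (dt - datetime.min) % delta
    floored = dt - remainder
    # Assumption: an exact tie (e.g. 8:45 with 30-minute steps) rounds up
    return floored + delta if remainder * 2 >= delta else floored

print(round_dt("1", "1", "8:40"))
print(round_dt("1", "1", "8:46"))
```

The KeyError in the traceback is consistent with DataFrame.apply running column by column by default (axis=0), so the lambda receives a column rather than a row. Assuming dataInitial is the DataFrame from the question, passing axis=1 should hand each row to the function, for example dataInitial.apply(lambda row: round_dt(row['CRASH_MONTH'], row['CRASH_DAY'], row['TIMESTR']), axis=1). Times close to midnight on December 31 would roll over into the next year, which may or may not matter when the year is ignored.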