python - 将导入的日期从.txt转换为日期格式python
问题描述
我在导入的 .txt 文件中转换日期时遇到问题,我想知道我做错了什么。
我通过以下方式导入数据:
df_TradingMonthlyDates = pd.read_csv(TradingMonthlyDates, dtype=str, sep=',') # header=True,
它看起来像下表(日期代表月份的开始/结束并有一个标题Date
):
Date
0 2008-12-30
1 2008-12-31
2 2009-01-01
3 2009-01-02
4 2009-01-29
.. ...
557 2020-06-29
558 2020-06-30
559 2020-07-01
560 2020-07-02
561 2020-07-30
.. ...
624 2021-11-30
625 2021-12-01
626 2021-12-02
627 2021-12-30
628 2021-12-31
[629 rows x 1 columns]
<class 'pandas.core.frame.DataFrame'>
然后我计算今天的日期:
df_EndDate = datetime.now().date()
我正在尝试在此函数中应用上述数据以获取给定日期之前的最接近日期(在我的情况下,给定日期 = 今天的日期):
# https://stackoverflow.com/questions/32237862/find-the-closest-date-to-a-given-date
def nearest(items, pivot):
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
# date_output should be = 2020-07-02 given today's date of 2020-07-12
我收到的错误消息df_TradingMonthlyDates
是不是日期格式。因此,我尝试将数据框转换为日期时间格式,但无法使其正常工作。
我尝试将数据转换为日期格式:
# df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
# df_TradingMonthlyDates = datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d").date()
# df_TradingMonthlyDates['Date'] = df_TradingMonthlyDates['Date'].apply(lambda x: pd.to_datetime(x[0], format="%Y-%m-%d"))
# df_TradingMonthlyDates = df_TradingMonthlyDates.iloc[1:]
# print(df_TradingMonthlyDates)
# df_TradingMonthlyDates = datetime.strptime(str(df_TradingMonthlyDates), "%Y-%m-%d").date()
# for line in split_source[1:]: # skip the first line
代码:
import pandas as pd
from datetime import datetime
# Version 1
TradingMonthlyDates = "G:/MonthlyDates.txt"
# Import file where all the first/end month date exists
df_TradingMonthlyDates = pd.read_csv(TradingMonthlyDates, dtype=str, sep=',') # header=True,
print(df_TradingMonthlyDates)
# https://community.dataquest.io/t/datetime-and-conversion/213425
# df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
# df_TradingMonthlyDates = datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d").date()
# df_TradingMonthlyDates['Date'] = df_TradingMonthlyDates['Date'].apply(lambda x: pd.to_datetime(x[0], format="%Y-%m-%d"))
# df_TradingMonthlyDates = df_TradingMonthlyDates.iloc[1:]
# print(df_TradingMonthlyDates)
# df_TradingMonthlyDates = datetime.strptime(str(df_TradingMonthlyDates), "%Y-%m-%d").date()
# for line in split_source[1:]: # skip the first line # maybe header is the problem
print(type(df_TradingMonthlyDates))
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
df_TradingMonthlyDates = df_TradingMonthlyDates.time()
print(df_TradingMonthlyDates)
df_EndDate = datetime.now().date()
print(type(df_EndDate))
# https://stackoverflow.com/questions/32237862/find-the-closest-date-to-a-given-date
def nearest(items, pivot):
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
错误消息因我尝试转换数据类型的方式而异,但我认为它们都注意到我的日期格式不成功:
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
Traceback (most recent call last):
File "g:/till2.py", line 25, in <module>
df_TradingMonthlyDates = df_TradingMonthlyDates.datetime.strptime(df_TradingMonthlyDates, "%Y-%m-%d")
File "C:\Users\ID\AppData\Roaming\Python\Python38\site-packages\pandas\core\generic.py", line 5274, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'datetime'
df_TradingMonthlyDates["Date"] = pd.to_datetime(df_TradingMonthlyDates["Date"], format="%Y-%m-%d")
Traceback (most recent call last):
File "g:/till2.py", line 40, in <module>
date_output = nearest(df_TradingMonthlyDates, df_EndDate)
File "g:/till2.py", line 38, in nearest
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
File "g:/till2.py", line 38, in <listcomp>
return min([i for i in items if i < pivot], key=lambda x: abs(x - pivot))
TypeError: '<' not supported between instances of 'str' and 'datetime.date'
解决方案
编辑:添加了方法 3,这可能是最简单的.loc
,然后.iloc
您可以采用稍微不同的方法(使用下面的方法 #1 或方法 #2),方法是取今天日期和数据之间差异的绝对最小值,但是您没有做的关键事情是pd.to_datetime()
环绕datetime.date
对象df_EndDate
以便将其转换为 aDatetimeArray
以便可以将其与您的Date
列进行比较。它们都必须采用相同的格式DatetimeArray
才能进行比较。
方法一:
import pandas as pd
import datetime as dt
df_TradingMonthlyDates = pd.DataFrame({'Date': {'0': '2008-12-30',
'1': '2008-12-31',
'2': '2009-01-01',
'3': '2009-01-02',
'4': '2009-01-29',
'557': '2020-06-29',
'558': '2020-06-30',
'559': '2020-07-01',
'560': '2020-07-02',
'561': '2020-07-30',
'624': '2021-11-30',
'625': '2021-12-01',
'626': '2021-12-02',
'627': '2021-12-30',
'628': '2021-12-31'}})
df_TradingMonthlyDates['Date'] = pd.to_datetime(df_TradingMonthlyDates['Date'])
df_TradingMonthlyDates['EndDate'] = pd.to_datetime(dt.datetime.now().date())
df_TradingMonthlyDates['diff'] = (df_TradingMonthlyDates['Date'] - df_TradingMonthlyDates['EndDate'])
a=min(abs(df_TradingMonthlyDates['diff']))
df_TradingMonthlyDates = df_TradingMonthlyDates.loc[(df_TradingMonthlyDates['diff'] == a)
| (df_TradingMonthlyDates['diff'] == -a)]
df_TradingMonthlyDates
输出 1:
Date EndDate diff
560 2020-07-02 2020-07-11 -9 days
如果您不想要额外的列而只想要日期,则分配变量以创建系列而不是新列:
方法二:
d = pd.to_datetime(df_TradingMonthlyDates['Date'])
t = pd.to_datetime(dt.datetime.now().date())
e = (d-t)
a=min(abs(e))
df_TradingMonthlyDates = df_TradingMonthlyDates.loc[(e == a) | (e == -a)]
df_TradingMonthlyDates
输出 2:
Date
560 2020-07-02
方法三:
df_TradingMonthlyDates['Date'] = pd.to_datetime(df_TradingMonthlyDates['Date'])
date_output = df_TradingMonthlyDates.sort_values('Date') \
.loc[df_TradingMonthlyDates['Date'] <=
pd.to_datetime(dt.datetime.now().date())] \
.iloc[-1,:]
date_output
输出 3:
Date 2020-07-02
Name: 560, dtype: datetime64[ns]
推荐阅读
- javascript - 我可以将国际象棋游戏/移动导入 React Chessboard 以仅以视觉格式显示移动吗?(实际上没玩)
- laravel - 如何在 Laravel 中创建自定义排除验证规则
- .net-core - .Net Core 5 控制台应用程序,在 Program.Main 中完成时不会为环境加载应用程序设置
- javascript - 如何在我的 JS 文件中使用 main.dart 变量?
- javascript - 文件加载器在 dist 文件夹的根目录中创建额外文件
- modal-dialog - 动态添加一些内容后,Bootstrap 5 Modal 在滚动时无法正常工作 - 这导致 Modal 的高度超过了以前
- javascript - 通过多个请求捕获可观察对象的最后一个值
- knex.js - knex 正在构建一个无效的查询
- sql - 如何通过删除列中的前面字符来更新 SQL 中的列?
- python - 如何优化/加快迭代 Dataframe 并从另一个 Dataframe 匹配/获取数据?