Stripping the time from a Pandas datetime (Excel, DateOffset)

Problem description

I imported an Excel sheet that has a "DT RECD" field with dates formatted as mm/dd/yy; it is the 7th column in the sheet. I import my data like this:

import datetime
import pandas as pd


excel_workbook = 'ExcelSheet.xlsx'
sheet1 = pd.read_excel(excel_workbook,
                       sheet_name='Sheet1',
                       keep_default_na=False,
                       index_col=0,
                       parse_dates=['DT RECD'])
sheet1['DT RECD'] = pd.to_datetime(sheet1['DT RECD'])

When I print the data it looks like this, which is exactly what I need:

DT RECD     LOT Number  FNISH                
2008-07-23     471359     AL  
2018-05-18   71378301     CR  
2018-05-18     713787     CR  
2018-11-09   74219202     CR

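Next I need to filter this data into date ranges: "past the 1-year mark but not past the 2-year mark", "past the 2-year mark but not past the 3-year mark", and so on. So I created dates for today and for 1, 2, 3, 4, ... years ago by applying an offset from pandas, like this: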

today = datetime.date.today()
oneYear = today - pd.DateOffset(years=1)
# oneYear = today - datetime.timedelta(years=1)  <- did not work (timedelta has no 'years' argument)
twoYear = today - pd.DateOffset(years=2)

This is where my problem is: the offset dates come out with 00:00:00 attached:

2021-03-03
2020-03-03 00:00:00
2019-03-03 00:00:00
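
A note on where the 00:00:00 comes from: subtracting a pd.DateOffset from a datetime.date gives back a pandas Timestamp, and a Timestamp always carries a time component (midnight here). That is usually harmless for filtering a datetime column. The sketch below is one possible approach, not the only one: it starts from pd.Timestamp.today().normalize() instead of datetime.date.today(), keeps the boundaries as Timestamps for comparisons, and drops the time only for display. The names one_year and two_year are illustrative.

```python
import pandas as pd

# Midnight today as a pandas Timestamp (normalize() zeroes the time part).
today = pd.Timestamp.today().normalize()

# Boundaries one and two years back; both remain Timestamps.
one_year = today - pd.DateOffset(years=1)
two_year = today - pd.DateOffset(years=2)

# Keep the Timestamp when comparing against a datetime64 column.
print(type(one_year).__name__)

# Drop the time only when displaying or exporting the value.
print(one_year.date())                  # a plain datetime.date
print(one_year.strftime("%Y-%m-%d"))    # a string without the time
```

For a whole column, sheet1['DT RECD'].dt.date or .dt.strftime('%Y-%m-%d') does the same conversion, at the cost of turning the column into an object dtype.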

I am trying to get the data that falls between the one-year and two-year marks and write it to a new sheet, like this:

YearOne = sheet1[sheet1['DT RECD'].between(oneYear, twoYear)]

When I print YearOne I get an empty dataset. If I print sheet1 the data is there. This is with index_col=0.

Empty DataFrame
Columns: [...,THICKNESS,...,DT RECD, 
Index: []

I can't figure out how to extract the data that is between one and two years old.
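
One likely cause of the empty result, stated as an assumption because the full workbook is not shown: Series.between(left, right) keeps values with left <= value <= right, and oneYear is the later of the two dates, so between(oneYear, twoYear) can never match. Passing the older date first should fix that. The sketch below uses made-up rows and a fixed reference date of 2021-03-03 (the date in the printed output above) so that the result is reproducible.

```python
import pandas as pd

# Made-up sample rows standing in for the real sheet.
sheet1 = pd.DataFrame({
    "DT RECD": pd.to_datetime(
        ["2008-07-23", "2018-05-18", "2019-08-01", "2020-01-15"]
    ),
    "LOT Number": [471359, 71378301, 713787, 74219202],
})

# Fixed reference date so the example does not depend on when it is run.
today = pd.Timestamp("2021-03-03")
one_year = today - pd.DateOffset(years=1)   # 2020-03-03
two_year = today - pd.DateOffset(years=2)   # 2019-03-03

# Later date first: the lower bound exceeds the upper bound, so nothing matches.
empty = sheet1[sheet1["DT RECD"].between(one_year, two_year)]

# Older date first: rows received between two years and one year ago.
year_one = sheet1[sheet1["DT RECD"].between(two_year, one_year)]

print(len(empty))      # 0
print(len(year_one))   # 2
```

Note that the four rows printed earlier in the question (2008 and 2018) would fall outside this window in any case, so an empty result for those particular rows is expected even with the bounds in the right order.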

This is also how I write back to Excel:

writer = pd.ExcelWriter('ExcelSheet.xlsx', mode='a', engine='openpyxl')
YearOne.to_excel(writer, '1 Year')
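
The snippet as shown never saves or closes the writer, and an ExcelWriter only writes the workbook to disk when it is closed. A minimal sketch of one way to handle this uses the writer as a context manager. The helper name append_sheet is hypothetical, and if_sheet_exists="replace" assumes pandas 1.3 or later.

```python
import pandas as pd


def append_sheet(df, path, sheet_name):
    """Add df as a sheet to an existing .xlsx workbook (hypothetical helper)."""
    # mode="a" requires the openpyxl engine and a file that already exists.
    # if_sheet_exists="replace" overwrites a sheet with the same name
    # instead of raising an error on a second run.
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
    # Leaving the with-block saves and closes the workbook.
```

With the names from the question, the call would be append_sheet(YearOne, 'ExcelSheet.xlsx', '1 Year').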

When I set index_col=7, which is the "DT RECD" column, I get an error:

sheet1 = pd.read_excel(excel_workbook,
                       sheet_name='Sheet1',
                       keep_default_na=False,
                       index_col=7,
                       parse_dates=['DT RECD'])

This is the error I receive:

Traceback (most recent call last):
  File "...\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DT RECD'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\steven\PycharmProjects\Test\venv\Foreach Loop Panda.py", line 11, in <module>
    sheet1['DT RECD'] = pd.to_datetime(sheet1['DT RECD'])
  File "C:\Users\steven\PycharmProjects\Test\venv\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\steven\PycharmProjects\Test\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'DT RECD'

With index_col=0 I don't get an error, but "YearOne" contains no data when I try to get the rows between the two dates.
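
The KeyError is consistent with "DT RECD" having become the index: once a column is used as the index it is no longer among the DataFrame's columns, so sheet1['DT RECD'] fails on the pd.to_datetime line shown in the traceback. The sketch below reproduces this with a small made-up frame, using set_index in place of index_col (an assumption about what read_excel did here), and converts the index instead.

```python
import pandas as pd

# Made-up rows; dates are strings in the sheet's mm/dd/yy format.
df = pd.DataFrame({
    "LOT Number": [471359, 713787],
    "DT RECD": ["07/23/08", "05/18/18"],
})

# Comparable to reading the sheet with index_col pointing at "DT RECD".
indexed = df.set_index("DT RECD")

try:
    indexed["DT RECD"]            # no longer a column
except KeyError as err:
    print("KeyError:", err)

# The dates now live in the index, so convert and filter on the index.
indexed.index = pd.to_datetime(indexed.index, format="%m/%d/%y")
recent = indexed[indexed.index >= pd.Timestamp("2018-01-01")]
print(recent)
```

Keeping index_col=0 and filtering on the "DT RECD" column avoids this entirely.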

Tags: python, excel, pandas, datetime, offset

Solution

