python - Pandas - df.loc - Can only compare identically-labeled Series objects
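Problem description
My code below (sorry, I can't share the exact data) takes a df, filters it by a date range, and then relabels certain dates. I then want to pull those relabeled dates back into the original df. It works fine up until this line of code: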
finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']
From my own research so far, it seems to be because the indexes have different lengths.
print(finaldf.index)
versus
print(finaldfmon.index)
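I don't understand why this would be a problem, or how to fix it. I want to emulate an Excel VLOOKUP, but without leaving #N/A where there is no hit (that is, where the Anchor value (think primary key) has no match (foreign key)).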
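A VLOOKUP that keeps the original value on a miss could be sketched roughly as follows. The two small frames and their values are made up for illustration; only the column names `Anchor` and `Due_Date` echo the question, and the real data may be keyed differently.

```python
import pandas as pd

# Hypothetical data: "Anchor" plays the role of the primary key.
main = pd.DataFrame({"Anchor": ["a", "b", "c"],
                     "Due_Date": ["d1", "d2", "d3"]})
lookup = pd.DataFrame({"Anchor": ["a", "c"],
                       "Due_Date": ["x1", "x3"]})

# Build a key -> value mapping from the lookup table.
mapping = lookup.set_index("Anchor")["Due_Date"]

# map() yields NaN for keys without a match; fillna() then keeps
# the value that was already in the main table for those rows.
main["Due_Date"] = main["Anchor"].map(mapping).fillna(main["Due_Date"])
```

Rows whose key is missing from the lookup table keep their old `Due_Date` instead of ending up as NaN.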
The full code is here:
import pandas as pd
import xlrd # added when using visual studio
import datetime
from datetime import datetime
finaldf = pd.read_excel("scrubcomplete.xlsx", encoding = "ISO-8859-1", dtype=object)
finaldf.columns = finaldf.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
#
today = pd.to_datetime(datetime.now().date())
day_of_week = today.dayofweek
last_monday = today - pd.to_timedelta(day_of_week, unit='d')
finaldf = finaldf[finaldf.Affliate_Code.str.contains('Part/Unix', na=False)]
if day_of_week != 0:
    finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date
    finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date # making it lower case y made it work
    current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date()) # this worked as of 4.16
    earlydue = (finaldf.Due_Date < last_monday.date())
    flags = current_week_flags & earlydue
    finaldfmon = finaldf[current_week_flags]
    finaldfmon.loc[(finaldfmon['Due_Date']<last_monday.date()), 'Due_Date'] = last_monday # here we set every due date before Monday to Monday, within the rows filtered by completed date
    finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']
writer = pd.ExcelWriter('currentweek.xlsx', engine='xlsxwriter')
finaldf.to_excel(writer, index=False, sheet_name='Sheet1')
writer.save()
The error is:
raise ValueError("Can only compare identically-labeled "
ValueError: Can only compare identically-labeled Series objects
It is caused by:
finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']
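The same error can be reproduced without the Excel file. The snippet below is only a sketch with made-up numbers: it filters a Series and then compares the result with the original, which mirrors the failing line in shape.

```python
import pandas as pd

# Hypothetical stand-ins for finaldf['Due_Date'] and finaldfmon['Due_Date'].
full = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3])
subset = full[full > 2]  # keeps only index labels 2 and 3

try:
    full != subset       # the operator form requires identical labels
    raised = False
except ValueError:
    raised = True

print(raised)
```

The filtered Series carries only some of the original index labels, so the `!=` operator refuses to compare the two.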
Solution
This is not an answer; please see my comments in the code. Also, at this point I think this question is better suited to codereview.
finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date
# making it lower case y made it work
finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date
# this worked as of 4.16
current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date())
earlydue = (finaldf.Due_Date < last_monday.date())
flags = current_week_flags & earlydue
finaldfmon = finaldf[current_week_flags]
# here we set every due date before Monday to Monday, within the rows filtered by completed date
# this works because last_monday is a single value
finaldfmon.loc[(finaldfmon['Due_Date']<last_monday.date()), 'Due_Date'] = last_monday
# this fails in two places:
# finaldf.loc[(finaldf['Due_Date'] != finaldfmon['Due_Date']), 'Due_Date'] = finaldfmon['Due_Date']
#   1. finaldf['Due_Date'] != finaldfmon['Due_Date']
#      these two series have different lengths, so you can't compare them
#      even if they had the same length, they have different indices
#      (unless one of them is a single number/date, then it becomes the case above)
#   2. finaldf.loc[..., 'Due_Date'] = finaldfmon['Due_Date']
#      same story
writer = pd.ExcelWriter('currentweek.xlsx', engine='xlsxwriter')
finaldf.to_excel(writer, index=False, sheet_name='Sheet1')
writer.save()
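One way to act on the comments above, as a minimal sketch with made-up data: write the relabeled values back through the subset's own index labels, so that no comparison between differently labeled Series is needed. The names `full` and `subset` and their values are hypothetical, and the `.copy()` is my assumption, added so the subset is an independent frame rather than a possible view of the original.

```python
import pandas as pd

# Hypothetical stand-ins for finaldf and finaldfmon.
full = pd.DataFrame({"Due_Date": [1, 2, 3, 4]}, index=[10, 11, 12, 13])
subset = full[full["Due_Date"] > 2].copy()  # rows labelled 12 and 13
subset["Due_Date"] = 99                     # relabel inside the subset

# Assign only to the rows that exist in the subset; the values are
# matched to rows by index label, not by position.
full.loc[subset.index, "Due_Date"] = subset["Due_Date"]
```

Rows that never made it into the subset are left untouched, which is the "no #N/A on a miss" behaviour the question asks for.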
The following full script (mainly its last line) achieved the goal:
import pandas as pd
import xlrd # added when using visual studio
import datetime
from datetime import datetime
#read in excel file
finaldf = pd.read_excel("scrubcomplete.xlsx", encoding = "ISO-8859-1", dtype=object)
finaldf.columns = finaldf.columns.str.strip().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
#
today = pd.to_datetime(datetime.now().date())
day_of_week = today.dayofweek
last_monday = today - pd.to_timedelta(day_of_week, unit='d')
#
if day_of_week != 0:
    finaldf['Completed_Date'] = pd.to_datetime(finaldf['Completed_Date'], format="%m/%d/%Y").dt.date
    finaldf['Due_Date'] = pd.to_datetime(finaldf['Due_Date'], format="%m/%d/%y").dt.date # making it lower case y made it work
    current_week_flags = (finaldf.Completed_Date >= last_monday.date()) & (finaldf.Completed_Date <= today.date())
    finaldf.loc[(finaldf['Completed_Date'] >= last_monday.date()) & (finaldf['Completed_Date'] <= today.date()) & (finaldf['Due_Date'] < last_monday.date()), 'Due_Date'] = last_monday
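For anyone who wants to try the single-mask approach without the spreadsheet, here is a rough self-contained sketch. The three rows and the fixed "today" are invented so that the result is reproducible, and the columns are kept as datetime64 rather than converted with `.dt.date`, which is a simplification on my part.

```python
import pandas as pd

# Fixed, hypothetical "today" (a Thursday) so the example is reproducible.
today = pd.Timestamp("2019-04-18")
last_monday = today - pd.Timedelta(days=today.dayofweek)

# Made-up rows standing in for the Excel data.
df = pd.DataFrame({
    "Completed_Date": pd.to_datetime(["2019-04-16", "2019-04-10", "2019-04-17"]),
    "Due_Date": pd.to_datetime(["2019-04-12", "2019-04-11", "2019-04-19"]),
})

# Both conditions are built from df itself, so they share its index.
completed_this_week = df["Completed_Date"].between(last_monday, today)
due_before_monday = df["Due_Date"] < last_monday

# Move early due dates to Monday, but only for rows completed this week.
df.loc[completed_this_week & due_before_monday, "Due_Date"] = last_monday
```

Because every mask comes from the same frame, there is no second, shorter DataFrame whose index could fail to line up.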