python - Skipping x rows in an iterable of a subset of a dataframe
Problem description
I have a dataframe with a total of 154529 rows, which I iterate over after grouping it by one of its columns. During the iteration I look for a specific correlating row y for the current row x (the row the iterator is currently at). As soon as I have found row y, I want to skip ahead so that the iteration continues at the row right after row y.
To do so, I'm using next(islice(...)). However, islice always skips to the wrong index. My assumption is that this is because I only iterate over subsets, while the indices are still relative to the whole dataframe.
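A minimal sketch of the suspected cause, using a small hypothetical frame (the column names and values are made up): groupby keeps the original index labels, so rows that are adjacent inside a group can have non-adjacent labels, and subtracting two labels does not give the number of rows between them. Index.get_loc converts a label into a position within the group.

```python
import pandas as pd

# Hypothetical data: sessions 'a' and 'b' are interleaved in the full frame
df = pd.DataFrame({
    'SessionId': ['a', 'b', 'a', 'b', 'a', 'a'],
    'value': [10, 20, 30, 40, 50, 60],
})
group = df.groupby('SessionId').get_group('a')

group_labels = group.index.tolist()  # [0, 2, 4, 5]: labels inherited from df
label_x, label_y = 0, 4
label_gap = label_y - label_x  # 4: difference of the labels
# 2: row y is only two rows after row x inside the group
pos_gap = group.index.get_loc(label_y) - group.index.get_loc(label_x)

print(group_labels, label_gap, pos_gap)
```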
I already tried to solve the problem by applying reset_index() to the sub-dataframe, but since I need the original indices for some assignments made during the loop, this approach doesn't work.
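If losing the original labels is the only obstacle to reset_index(), note that reset_index() without drop=True does not discard them: it moves them into a regular column (named index when the index is unnamed). A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: rows of session 'a' have labels 0 and 2 in the full frame
df = pd.DataFrame({'SessionId': ['a', 'b', 'a'], 'value': [1, 2, 3]})
group = df.groupby('SessionId').get_group('a')

flat = group.reset_index()         # new 0..n-1 index, old labels kept in flat['index']
positions = flat.index.tolist()    # [0, 1]: usable for positional skipping
original = flat['index'].tolist()  # [0, 2]: usable for assignments back into df
print(positions, original)
```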
Can anybody help me find the correct start parameter for the islice() method?
Here are some example indices for further investigation. (I wasn't able to find a pattern in the offsets of the actual new indices.)
And here is my code:
from itertools import islice

case_started = False
for session_id, session_df in labeled_data.groupby('SessionId'):
    session_iterations = session_df.iterrows()
    start_end_pairs = []  # store all start-end pairs for each session
    next_start_index = ''
    for index, row in session_iterations:
        # doing stuff to find row y
        # doing some assignments with row y index and current row index
        start_end_pairs.append((index, row_y))
        next_start_index = case_end + 1
        if next_start_index < session_df.index[-1]:
            skip = case_end - index  # skipping relative to current index
            next(islice(session_iterations, skip, None), 'Stop')  # skipping to next start index
        else:
            break
Thanks in advance for any kind of help or hints!
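One way to sidestep the label arithmetic, sketched under assumptions: count positions with enumerate instead of subtracting index labels, and advance the iterator with the consume idiom from the itertools recipes, next(islice(it, n, n), None). The Event column, its 'start'/'end' values, the data, the consume helper and the rule for finding row y are hypothetical stand-ins for the elided "doing stuff to find row y" part; the original labels stay available for assignments because nothing is reset.

```python
from itertools import islice

import pandas as pd


def consume(iterator, n):
    """Advance the iterator by exactly n items (itertools 'consume' recipe)."""
    next(islice(iterator, n, n), None)


# Hypothetical data: sessions are interleaved, so labels within a session are not contiguous
labeled_data = pd.DataFrame({
    'SessionId': [1, 2, 1, 1, 2, 1, 1],
    'Event': ['start', 'start', 'x', 'end', 'end', 'start', 'end'],
})

pairs = {}
for session_id, session_df in labeled_data.groupby('SessionId'):
    start_end_pairs = []  # (label of row x, label of row y) for this session
    events = session_df['Event'].tolist()
    labels = session_df.index.tolist()  # original labels, nothing is reset
    rows = enumerate(session_df.iterrows())  # yields (position, (label, row))
    for pos, (index, row) in rows:
        if row['Event'] != 'start':
            continue
        # Hypothetical rule for row y: the next 'end' event in the same session
        try:
            pos_y = events.index('end', pos + 1)
        except ValueError:
            break  # no matching row y left in this session
        start_end_pairs.append((index, labels[pos_y]))
        # Discard rows pos+1 .. pos_y so the loop resumes right after row y
        consume(rows, pos_y - pos)
    pairs[session_id] = start_end_pairs

print(pairs)
```

In session 1 of this made-up data, row x (label 0) and row y (label 3) are two rows apart inside the group but three labels apart, which is the kind of mismatch the question describes.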
Solution
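The problem seems to be the second of islice's two index arguments (the stop value, which you pass as None); try setting it to skip as well, i.e. next(islice(session_iterations, skip, skip), None). With stop=None, next() discards skip rows and then returns (and thereby also consumes) one more row, so the loop resumes one row too late; when stop equals start, islice discards exactly skip rows and yields nothing.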
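A quick check of the two islice calls on a plain iterator (range(10) is only illustrative data): with stop=None, next() returns the element after the skipped ones, so one extra item is consumed; with stop equal to start, exactly n items are consumed.

```python
from itertools import islice

it = iter(range(10))
next(islice(it, 3, None), None)  # discards 0, 1, 2 and returns 3: four items consumed
after_none_stop = next(it)       # 4

it = iter(range(10))
next(islice(it, 3, 3), None)     # discards exactly 0, 1, 2 and returns the default None
after_equal_stop = next(it)      # 3

print(after_none_stop, after_equal_stop)
```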
Example:
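import numpy as np
import pandas as pd
from itertools import islice

dataset = pd.DataFrame({'A': np.zeros(20)})  # hypothetical data: 20 rows, one float column
dataset['C'] = np.arange(len(dataset))  # just to validate the iterator does not break
rowiter = dataset.iterrows()
for a, b in rowiter:
    print("idx", a, "row number", b.C)
    if a % 5 == 0:
        next(islice(rowiter, 4, 4), None)  # skipping the next four rows
    if a > 10:
        break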
The result is:
idx 0 row number 0.0
idx 5 row number 5.0
idx 10 row number 10.0
idx 15 row number 15.0
This is the expected output.