python - Iterating over a Pandas DataFrame
Problem description
What is the (best practice) correct way to iterate over DataFrames?
I am using:
for i in range(working.shape[0]):
    for j in range(1, working.shape[1]):
        working.iloc[i,j] = (100 - working.iloc[i,j])*100
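The above works, but it doesn't agree with other Stack Overflow answers. I was hoping someone could explain why the above is not optimal and suggest a better implementation.
I am a novice in programming in general, and in Pandas in particular. Apologies also for asking a question that has already been addressed on SO: I didn't really understand the usual answers to it. Possibly a duplicate, but this answer is easy for a novice to follow, if not very comprehensive.
Solution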
What is the (best practice) correct way to iterate over DataFrames?
There are several ways (for example iterrows), but in general you should try to avoid iteration at all costs. pandas offers several tools for vectorized operations, which work on whole columns at once in compiled code (via NumPy) and will almost always be faster than a Python loop that goes through the indexer for every single cell.
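For contrast, here is a minimal sketch of row iteration with iterrows next to its vectorized equivalent. The DataFrame df and its columns are hypothetical and chosen only for illustration; the point is that the vectorized line replaces the whole loop.

```python
import pandas as pd

# Hypothetical small frame, for illustration only
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Row-by-row iteration: iterrows() yields (index, Series) pairs.
# Acceptable for reading, but slow, and writing to `row` is not
# guaranteed to modify df (it may be a copy).
totals_loop = []
for idx, row in df.iterrows():
    totals_loop.append(int(row['a'] + row['b']))

# Vectorized equivalent: one addition over whole columns
totals_vec = (df['a'] + df['b']).tolist()

print(totals_loop)
print(totals_vec)
```

Both lists hold the same values; only the vectorized version scales well as the frame grows.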
The example you provided can be vectorized in the following way using iloc:
working.iloc[:, 1:] = (100 - working.iloc[:, 1:]) * 100
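A self-contained run of that one-liner, assuming a small hypothetical working frame (the question does not show its data). Column 'a' plays the role of the first column, which the original loop skips by starting j at 1.

```python
import pandas as pd

# Hypothetical sample data; the first column is left untouched
working = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [99, 100, 101]})

# Select all rows and every column from position 1 onward,
# compute on the whole block at once, then assign it back
working.iloc[:, 1:] = (100 - working.iloc[:, 1:]) * 100

print(working)
```

Column 'b' becomes 9000, 8000, 7000 and column 'c' becomes 100, 0, -100, while 'a' keeps its values.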
Some timings:
import pandas as pd
from timeit import Timer

working = pd.DataFrame({'a': range(50), 'b': range(50)})

def iteration():
    # note: each call modifies working in place
    for i in range(working.shape[0]):
        for j in range(1, working.shape[1]):
            working.iloc[i, j] = (100 - working.iloc[i, j]) * 100

def direct():
    # in actual code you will have to assign back to working.iloc[:, 1:]
    (100 - working.iloc[:, 1:]) * 100

print(min(Timer(iteration).repeat(50, 50)))
print(min(Timer(direct).repeat(50, 50)))
Output:
0.38473859999999993
0.05334049999999735
A roughly 7x difference, and that's with only 50 rows.
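The timing only measures speed, and direct() throws its result away. A sketch that checks both approaches actually produce the same frame, working on copies so the in-place loop cannot affect the comparison. The helper names transform_loop and transform_vectorized are hypothetical.

```python
import pandas as pd

def transform_loop(df):
    """The question's approach: one scalar iloc read and write per cell."""
    out = df.copy()
    for i in range(out.shape[0]):
        for j in range(1, out.shape[1]):
            out.iloc[i, j] = (100 - out.iloc[i, j]) * 100
    return out

def transform_vectorized(df):
    """The answer's approach: a single operation on the column slice."""
    out = df.copy()
    out.iloc[:, 1:] = (100 - out.iloc[:, 1:]) * 100
    return out

working = pd.DataFrame({'a': range(50), 'b': range(50)})

# Raises AssertionError if values, dtypes, index or columns differ
pd.testing.assert_frame_equal(transform_loop(working), transform_vectorized(working))
print("results match")
```

Because both helpers copy their input, working itself is left unchanged after the check.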