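python - How do I reorder the rows of a CSV file?
Problem description
Background and what I want to achieve.
I want to reorder the rows of Test2.csv (columns "ID" and "Number") so that they follow the order of the "ID" column in Test1.csv. I would appreciate any advice. Thank you. The data shown here is a simplified version; the real files have more than 1,000 rows.
Relevant source code.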
#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
input_path1 = "Test.csv"
input_path2 = "Test2.csv"
output_path = "output.csv"
df1 = pd.read_csv(filepath_or_buffer=input_path1, encoding="utf-8")
df2 = pd.read_csv(filepath_or_buffer=input_path2, encoding="utf-8")
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')
for column_name, item in df2.iteritems():
    item = "S_" + item
df2 = df1.reindex_like(df2)
with open(output_path, mode='w') as f:
    f.write()
Error message
Traceback (most recent call last):
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 968, in na_op
result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/computation/expressions.py", line 221, in evaluate
return _evaluate(op, op_str, a, b, **eval_kwargs)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/computation/expressions.py", line 70, in _evaluate_standard
return op(a, b)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/roperator.py", line 9, in radd
return right + left
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "narabekae.py", line 14, in <module>
item = "S_" + item
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1048, in wrapper
result = na_op(lvalues, rvalues)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 970, in na_op
result = masked_arith_op(x, y, op)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 464, in masked_arith_op
result[mask] = op(xrav[mask], y)
File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/roperator.py", line 9, in radd
return right + left
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')
Desired output (output.csv)
ID,Number
AA,2.13
BB,2.21
CC,2.09
DD,2.38
EE,2.52
Input file 1 (Test1.csv)
ID,Number
AA,2.1
BB,2.2
CC,2.3
DD,2.4
EE,2.5
Input file 2 (Test2.csv)
ID,Number
CC,2.09
EE,2.52
AA,2.13
DD,2.38
BB,2.21
Additional information (e.g., firmware/tool versions)
macOS 10.15.7, Python 3.7.3, Atom
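Solution
If both files have the same number of rows (and each ID in Test1.csv appears exactly once in Test2.csv), this should produce the desired output: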
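import pandas as pd
# input_path1, input_path2 and output_path are defined as in the question.
df1 = pd.read_csv(filepath_or_buffer=input_path1, encoding="utf-8")
df2 = pd.read_csv(filepath_or_buffer=input_path2, encoding="utf-8")
# A left merge keeps the row order of df1; the overlapping "Number" columns get _x/_y suffixes.
(df1.merge(df2, how='left', on='ID')
    .set_index('ID')
    .drop('Number_x', axis='columns')                # discard the values from Test1.csv
    .rename({'Number_y': 'Number'}, axis='columns')  # keep the values from Test2.csv
    .to_csv(output_path)
)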
Output:
ID,Number
AA,2.13
BB,2.21
CC,2.09
DD,2.38
EE,2.52
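As a possible alternative to the merge, the same reordering can be sketched with set_index and reindex. This is only a minimal sketch, not the answerer's method: the CSV contents are inlined through io.StringIO as stand-ins for Test1.csv and Test2.csv so the snippet runs on its own, and the names test1_csv, test2_csv and reordered are made up for illustration.

```python
import io

import pandas as pd

# Inline stand-ins for Test1.csv and Test2.csv from the question (assumption:
# the real files have the same two columns, "ID" and "Number").
test1_csv = "ID,Number\nAA,2.1\nBB,2.2\nCC,2.3\nDD,2.4\nEE,2.5\n"
test2_csv = "ID,Number\nCC,2.09\nEE,2.52\nAA,2.13\nDD,2.38\nBB,2.21\n"

df1 = pd.read_csv(io.StringIO(test1_csv))
df2 = pd.read_csv(io.StringIO(test2_csv))

# Index Test2 by ID, then pull its rows out in the ID order of Test1.
# An ID that exists in Test1 but not in Test2 ends up as a row of NaN.
reordered = df2.set_index("ID").reindex(df1["ID"]).reset_index()

# Write without the positional index so only ID and Number appear.
print(reordered.to_csv(index=False))
```

With the sample data this should give the rows in the order AA, BB, CC, DD, EE, carrying the Number values from Test2.csv. As for the traceback in the question, it appears to come from `"S_" + item`: the Number column is numeric, so a string cannot be added to it. Note also that `df1.reindex_like(df2)` reorders df1 to match df2, which is the opposite of what is wanted here.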