How do I reorder the rows of a CSV file?

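Problem description

Background and what I want to achieve

I want to reorder the rows of Test2.csv (the "ID" and "Number" columns) so that they follow the order of the "ID" column in Test1.csv. I would appreciate any advice. The data shown here is a simplified version; the real files have more than 1000 rows.

Relevant source code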

#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd

input_path1 = "Test.csv"
input_path2 = "Test2.csv"
output_path = "output.csv"
df1 = pd.read_csv(filepath_or_buffer=input_path1, encoding="utf-8")
df2 = pd.read_csv(filepath_or_buffer=input_path2, encoding="utf-8")
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

for column_name, item in df2.iteritems():
    item = "S_" + item
    df2 = df1.reindex_like(df2)

with open(output_path, mode='w') as f:
    f.write()

Error message

Traceback (most recent call last):
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 968, in na_op
    result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/computation/expressions.py", line 221, in evaluate
    return _evaluate(op, op_str, a, b, **eval_kwargs)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/computation/expressions.py", line 70, in _evaluate_standard
    return op(a, b)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/roperator.py", line 9, in radd
    return right + left
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "narabekae.py", line 14, in <module>
    item = "S_" + item
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1048, in wrapper
    result = na_op(lvalues, rvalues)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 970, in na_op
    result = masked_arith_op(x, y, op)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 464, in masked_arith_op
    result[mask] = op(xrav[mask], y)
  File "/Users/macuser/downloads/yes/lib/python3.7/site-packages/pandas/core/ops/roperator.py", line 9, in radd
    return right + left
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32')

Desired result (output.csv)

ID,Number
AA,2.13
BB,2.21
CC,2.09
DD,2.38
EE,2.52

Input file 1 (Test1.csv)

ID,Number
AA,2.1
BB,2.2
CC,2.3
DD,2.4
EE,2.5

Input file 2 (Test2.csv)

ID,Number
CC,2.09
EE,2.52
AA,2.13
DD,2.38
BB,2.21

Additional information (e.g., firmware/tool versions)

macOS 10.15.7, Python 3.7.3, Atom

Tags: python, csv

Solution


If both files have the same number of rows and contain the same IDs, this should produce the desired output:

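import pandas as pd

# File names as described in the question (Test1.csv sets the row order)
input_path1 = "Test1.csv"
input_path2 = "Test2.csv"
output_path = "output.csv"

df1 = pd.read_csv(filepath_or_buffer=input_path1, encoding="utf-8")
df2 = pd.read_csv(filepath_or_buffer=input_path2, encoding="utf-8")
# The left merge keeps df1's row order; df1's Number (Number_x) is dropped
# and df2's Number (Number_y) is renamed back to Number.
(df1.merge(df2, how='left', on='ID')
    .set_index('ID')
    .drop('Number_x', axis='columns')
    .rename({'Number_y': 'Number'}, axis='columns')
    .to_csv(output_path)
)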

Output:

ID,Number
AA,2.13
BB,2.21
CC,2.09
DD,2.38
EE,2.52
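
As an alternative to the merge, the same reordering can probably be done with set_index and reindex, which is close to what the question's reindex_like attempt was aiming for. The following is only a minimal sketch: it assumes every ID is unique within Test2.csv, and it embeds the sample data from the question as strings (test1_csv and test2_csv are illustrative names, not part of the original code) so that it runs without any files on disk.

```python
import io

import pandas as pd

# Inline copies of the sample data from the question (illustrative only).
test1_csv = """ID,Number
AA,2.1
BB,2.2
CC,2.3
DD,2.4
EE,2.5
"""
test2_csv = """ID,Number
CC,2.09
EE,2.52
AA,2.13
DD,2.38
BB,2.21
"""

df1 = pd.read_csv(io.StringIO(test1_csv))
df2 = pd.read_csv(io.StringIO(test2_csv))

# Index Test2 by ID, then select its rows in the ID order of Test1.
# Assumption: IDs are unique in Test2 (reindex raises on duplicate labels).
# An ID present only in Test1 would appear as a row with NaN in Number.
reordered = df2.set_index("ID").reindex(df1["ID"])

# With no path argument, to_csv returns the CSV text instead of writing a file.
result_csv = reordered.to_csv()
print(result_csv)
```

To write the result to disk instead of printing it, pass a path, for example reordered.to_csv(output_path). Compared with the merge, this version never creates the Number_x / Number_y columns, so there is nothing to drop or rename afterwards.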
