
Problem description

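I have data from several CSV files that I am trying to combine, and I have put all of it into a single DataFrame. How can I stack the data into the corresponding A, B, and C columns and include a header for each set of rows?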

for data_base in data:
    base_data.append(data_base['A'])
    base_data.append(data_base[' B'])
    base_data.append(data_base[' C'])
#    np.append(base_data, np.nan)
df_name = pd.DataFrame(name_join)
df_data = pd.DataFrame(base_data)
trp = np.transpose(df_data)

Actual:

A           B       C       A       B       C       A       B       C
0.7283  0.743   0.01    0.7283  0.7512  0.02    0.7283  0.7456  0.02
0.5165  0.488   0.03    0.5165  0.4756  0.04    0.5165  0.4707  0.05
0.5087  0.4781  0.03    0.5087  0.4611  0.05    0.5087  0.4467  0.06
0.4598  0.4834  0.02    0.4598  0.4938  0.03    0.4598  0.4793  0.02
0.4883  0.5235  0.04    0.4883  0.5173  0.03    0.4883  0.5278  0.04
0.5993  0.6229  0.02    0.5993  0.6223  0.02    0.5993  0.6258  0.03
0.5351  0.5983  0.06    0.5351  0.6029  0.07    0.5351  0.613   0.08
0.6105  0.6314  0.02    0.6105  0.6434  0.03    0.6105  0.6361  0.03
0.5946  0.6495  0.05    0.5946  0.6452  0.05    0.5946  0.6463  0.05
0.7335  0.7506  0.02    0.7335  0.7559  0.02    0.7335  0.7497  0.02

Expected:

    A       B       C
Cow 0.7283  0.743   0.01
    0.5165  0.488   0.03
    0.5087  0.4781  0.03
    0.4598  0.4834  0.02
    0.4883  0.5235  0.04
    0.5993  0.6229  0.02
    0.5351  0.5983  0.06
    0.6105  0.6314  0.02
    0.5946  0.6495  0.05
    0.7335  0.7506  0.02
Cat 0.7283  0.7512  0.02
    0.5165  0.4756  0.04
    0.5087  0.4611  0.05
    0.4598  0.4938  0.03
    0.4883  0.5173  0.03
    0.5993  0.6223  0.02
    0.5351  0.6029  0.07
    0.6105  0.6434  0.03
    0.5946  0.6452  0.05
    0.7335  0.7559  0.02
Dog 0.7283  0.7456  0.02
    0.5165  0.4707  0.05
    0.5087  0.4467  0.06
    0.4598  0.4793  0.02
    0.4883  0.5278  0.04
    0.5993  0.6258  0.03
    0.5351  0.613   0.08
    0.6105  0.6361  0.03
    0.5946  0.6463  0.05
    0.7335  0.7497  0.02

Tags: python, pandas, csv, dataframe, reshape

Solution


This solution is based on Nycbros's comment.

import pandas as pd

# Dummy data
data_double = pd.DataFrame(data=[{'x': x, 'y': 2 * x} for x in range(5)])
data_triple = pd.DataFrame(data=[{'x': x, 'y': 3 * x} for x in range(5)])

print(data_double)

Output:

   x  y
0  0  0
1  1  2
2  2  4
3  3  6
4  4  8

print(data_triple)

Output:

   x   y
0  0   0
1  1   3
2  2   6
3  3   9
4  4  12

# You will need a list of keys, one per DataFrame, in the same order as the data
data = [data_double, data_triple]
keys = ['Double', 'Triple']

# Concatenate the DataFrames in the data list, passing the keys to use as the outer index level
combo = pd.concat(data, keys=keys)
print(combo)

Output:

          x   y
Double 0  0   0
       1  1   2
       2  2   4
       3  3   6
       4  4   8
Triple 0  0   0
       1  1   3
       2  2   6
       3  3   9
       4  4  12

# If you don't want the original index values, you can drop them
combo = combo.reset_index(level=1, drop=True)
print(combo)

Output:

        x   y
Double  0   0
Double  1   2
Double  2   4
Double  3   6
Double  4   8
Triple  0   0
Triple  1   3
Triple  2   6
Triple  3   9
Triple  4  12
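
The same pattern can be applied to data shaped like the question's. The sketch below is only an illustration under some assumptions: the CSV contents are hypothetical in-memory stand-ins built from the first two rows of the question's sample, the names `csv_texts`, `frames`, and `combined` are made up here, and the headers are assumed to have a space after each comma (which is what would produce column names like ' B' and ' C'). Passing `skipinitialspace=True` to `pd.read_csv` is one way to get clean 'A', 'B', 'C' column names so that the frames line up when concatenated.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files (first two rows of the question's sample).
# The space after each comma mimics headers that load as ' B' and ' C'.
csv_texts = {
    "Cow": "A, B, C\n0.7283, 0.743, 0.01\n0.5165, 0.488, 0.03\n",
    "Cat": "A, B, C\n0.7283, 0.7512, 0.02\n0.5165, 0.4756, 0.04\n",
    "Dog": "A, B, C\n0.7283, 0.7456, 0.02\n0.5165, 0.4707, 0.05\n",
}

# skipinitialspace=True skips the blank after each delimiter,
# so the column names become 'A', 'B', 'C'.
frames = [
    pd.read_csv(io.StringIO(text), skipinitialspace=True)
    for text in csv_texts.values()
]
keys = list(csv_texts)  # ['Cow', 'Cat', 'Dog'], same order as frames

# Stack the frames vertically; the keys become the outer index level.
combined = pd.concat(frames, keys=keys)

# Drop the per-file row numbers so only the animal label remains as the index.
combined = combined.reset_index(level=1, drop=True)
print(combined)
```

With real files, the `io.StringIO(text)` argument would be replaced by each file's path. If the frames are already loaded with the stray spaces in their column names, `df.columns = df.columns.str.strip()` on each frame before concatenating should have a similar effect.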
