How to group or concatenate cells with changing values in Python

Problem description

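I posted this question before, but I have added some new comments. I have a large DataFrame and am trying to work out how to concatenate cells with different values into a single cell. Given the following DataFrame:
DF1 (Data and Name are the column headers):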

    Data,                          Name
    Address State1,                Name1
    Household = 1,                 Name1
    1012 Address 123 City,         Name1
    1013 Address Zip 12345,        Name1
    1012 Address 234 City,         Name1
    1013 Address Zip 23456,        Name1
    Address State2,                Name2
    Household = 2,                 Name2
    1012 Address 345 City,         Name2
    1013 Address Zip 34567,        Name2
    1012 Address 456 City,         Name2
    1013 Address Zip 45678,        Name2
    .......... dataframe repeats with different values for 10,000+ lines

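The 1012 and 1013 rows form a sequence that repeats X times, each time with different values. I can't use a groupby function alone, because the values in the 1012 and 1013 cells keep changing. I am trying to combine the Address, Household, 1012... and 1013... rows into one cell. My desired output is:
DFOut: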

    Data,                                                                                        Name
    Address State1   Household = 1   1012 Address 123 City        1013 Address Zip 12345,        Name1
    Address State1   Household = 1   1012 Address 234 City        1013 Address Zip 23456,        Name1
    Address State2   Household = 2   1012 Address 345 City        1013 Address Zip 34567,        Name2
    Address State2   Household = 2   1012 Address 456 City        1013 Address Zip 45678,        Name2
    ..... repeats for entire dataframe 10,000+ lines in DF1

Alternatively, Data could be split into separate columns in DFOut:

    Data,            Number,         Seq,                         Seq1,                          Name
    Address State1,  Household = 1,  1012 Address 123 City,       1013 Address Zip 12345,        Name1
    Address State1,  Household = 1,  1012 Address 234 City,       1013 Address Zip 23456,        Name1
    Address State2,  Household = 2,  1012 Address 345 City,       1013 Address Zip 34567,        Name2
    Address State2,  Household = 2,  1012 Address 456 City,       1013 Address Zip 45678,        Name2
    ..... repeats for entire dataframe 10,000+ lines in DF1

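I tried using some for loops to search the Data column by value and then concatenate the different values into one column, but for some reason I lose the Name column after doing that. I am fairly new to Python, and any help would be much appreciated. Thanks in advance!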

Tags: python, python-3.x, excel, pandas, csv

Solution


If you know that the same fields always appear in the same order, you can use NumPy's reshape as follows:


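    import numpy as np
    import pandas as pd

    # Six values in one column: two records of three fields each.
    df = pd.DataFrame({'Data': ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']})
    to_reshape = np.array(df['Data'])
    # Reshape into 2 rows (records) by 3 columns (fields).
    reshaped = to_reshape.reshape((2, 3))
    df = pd.DataFrame(data=reshaped, columns=['1', '2', '3'])
    print(df)

    # Output:
    #     1   2   3
    # 0  a1  a2  a3
    # 1  b1  b2  b3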

You can then add the Name column back. To work out how many rows there are, you can count the unique names.
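Applied to the data in the question, the same idea might look like the sketch below. It assumes that every Name group starts with exactly two header rows (Address and Household) followed by complete 1012/1013 pairs; the helper name flatten_group is made up for illustration, and the column names are taken from the second desired output in the question.

```python
import pandas as pd

# Sample rows copied from DF1 in the question.
df1 = pd.DataFrame({
    "Data": [
        "Address State1", "Household = 1",
        "1012 Address 123 City", "1013 Address Zip 12345",
        "1012 Address 234 City", "1013 Address Zip 23456",
        "Address State2", "Household = 2",
        "1012 Address 345 City", "1013 Address Zip 34567",
        "1012 Address 456 City", "1013 Address Zip 45678",
    ],
    "Name": ["Name1"] * 6 + ["Name2"] * 6,
})


def flatten_group(group):
    """Hypothetical helper: turn one Name group into one row per 1012/1013 pair."""
    values = group["Data"].to_numpy()
    # Assumption: the first two rows are the Address and Household headers.
    header, body = values[:2], values[2:]
    # Assumption: the remaining rows are complete (1012, 1013) pairs;
    # reshape raises ValueError if the count is odd.
    pairs = body.reshape(-1, 2)
    out = pd.DataFrame(pairs, columns=["Seq", "Seq1"])
    out.insert(0, "Number", header[1])  # scalar is repeated on every row
    out.insert(0, "Data", header[0])
    out["Name"] = group["Name"].iloc[0]  # carry Name along so it is not lost
    return out


# sort=False keeps the groups in their original order.
df_out = pd.concat(
    [flatten_group(g) for _, g in df1.groupby("Name", sort=False)],
    ignore_index=True,
)

# Single-cell variant: join the four text columns into one string per row.
combined = df_out[["Data", "Number", "Seq", "Seq1"]].apply(" ".join, axis=1)

print(df_out)
```

Using reshape(-1, 2) lets NumPy infer the number of pairs per group, so the row count does not have to be hard-coded. If the real data does not follow the two-headers-then-pairs layout exactly, the grouping logic would need to be adapted.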

