Fastest way to transpose this table while performing an operation

Problem description

I have this:

name phenotype
ID1      tall
ID2      tall
ID3     short
ID4      tall

I want this:

ID1  ID2  ID3  ID4 phenotype
yes  yes   no  yes      tall
no   no  yes   no     short

I tried this approach:

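import pandas as pd

df = pd.DataFrame({'name': ['ID1', 'ID2', 'ID3', 'ID4'], 'phenotype': ['tall', 'tall', 'short', 'tall']})

# Distinct phenotypes in order of first appearance; each becomes one output row.
phenotypes = df['phenotype'].unique()

sample_dict = {}
for sample in df['name']:
    var_list = []
    # Phenotypes recorded for this sample.
    sample_subset = df[df['name'] == sample]
    for variant in phenotypes:
        if variant in sample_subset['phenotype'].to_list():
            var_list.append('yes')
        else:
            var_list.append('no')

    # Store the finished yes/no column once per sample.
    sample_dict[sample] = var_list
# Label the rows with the same phenotype list used in the loop.
sample_dict['phenotype'] = list(phenotypes)
sample_df = pd.DataFrame(sample_dict)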

Is there a better, faster way to transpose this table while performing the operation I described?

Tags: python, pandas

Solution


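Try pd.crosstab. It counts how often each (phenotype, name) pair occurs, and replace then maps the counts 0 and 1 to "no" and "yes":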

print(
    pd.crosstab(df["phenotype"], df["name"])
    .replace({0: "no", 1: "yes"})
    .reset_index()
    .rename_axis("", axis=1)
)

Prints:

  phenotype  ID1  ID2  ID3  ID4
0     short   no   no  yes   no
1      tall  yes  yes   no  yes
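
The crosstab result sorts its rows and columns and puts phenotype first, whereas the question asks for the ID columns first and the tall row on top. Also, replace({0: "no", 1: "yes"}) leaves any count above 1 untouched, which matters if a name/phenotype pair can repeat. The following is a minimal sketch of one way to handle both, assuming first-appearance order is what is wanted; the variable names (counts, flags, result) are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["ID1", "ID2", "ID3", "ID4"],
    "phenotype": ["tall", "tall", "short", "tall"],
})

# Count how often each (phenotype, name) pair occurs.
counts = pd.crosstab(df["phenotype"], df["name"])

# Any count above zero becomes "yes", so repeated pairs are covered too.
flags = pd.DataFrame(
    np.where(counts > 0, "yes", "no"),
    index=counts.index,
    columns=counts.columns,
)

# crosstab sorts its labels; restore first-appearance order on both axes.
flags = flags.reindex(index=df["phenotype"].unique(), columns=df["name"].unique())

# Turn phenotype into a regular column and move it to the end.
result = flags.reset_index().rename_axis(None, axis=1)
result = result[[*df["name"].unique(), "phenotype"]]
print(result)
```

With the sample data, result has the columns ID1 through ID4 followed by phenotype, with the tall row before the short row, which matches the layout requested in the question.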
