How to select words from rows as columns in Python

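Problem description

I want to select words from the rows to use as columns, and then remove those words from the rows.

I tried looking at pandas functions such as .pivot, but couldn't figure it out.

Here is my input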

['Sampling frequency: 8000 Hz',
 'Number of channels: 2 (16-bit integer)',
 'File name: /home/niraj/Documents/audiofiles/M1F1-int16.wav',
 'Sampling frequency: 8000',
 'Sampling frequency: 16000 Hz',
 'Number of channels: 1 (16-bit integer)',
 'File name: /home/niraj/Documents/jg00b1ss.wav',
 'Sampling frequency: 16000',
 'sample_rate: 16000',
 'Sampling frequency: 8000 Hz',
 'Number of channels: 2 (16-bit integer)',
 'File name: /home/niraj/Documents/M1F1-int16.wav',
 'Sampling frequency: 8000']

This is the expected output I am looking for

    File name                sample_rate   Sampling frequency    Number of channels                                        
0  /home/niraj/Documents...  16000           8000Hz               2(16-bit integer)

If a piece of information is not found, it can be blank or N/A

Tags: python, pandas

Solution


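IIUC, each record begins with a "Sampling frequency" line, followed by some other values that may or may not be present.

We can iterate over your data, split it into records on that key, and then build a DataFrame from those records: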

import pandas as pd

data = ['Sampling frequency: 8000 Hz',
 'Number of channels: 2 (16-bit integer)',
 'File name: /home/niraj/Documents/audiofiles/M1F1-int16.wav',
 'Sampling frequency: 8000',
 'Sampling frequency: 16000 Hz',
 'Number of channels: 1 (16-bit integer)',
 'File name: /home/niraj/Documents/jg00b1ss.wav',
 'Sampling frequency: 16000',
 'sample_rate: 16000',
 'Sampling frequency: 8000 Hz',
 'Number of channels: 2 (16-bit integer)',
 'File name: /home/niraj/Documents/M1F1-int16.wav',
 'Sampling frequency: 8000']

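records = []
for line in data:
    # Split on the first ": " only, so values containing ": " stay intact.
    key, value = line.split(": ", 1)
    # Every "Sampling frequency" line starts a new record.
    if key == "Sampling frequency":
        records.append({})
    # Store the field in the most recent record.
    records[-1][key] = value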

df = pd.DataFrame.from_records(records)

print(df)

pandas already fills every field missing from a record with NaN.
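
Since the question asks for missing values to appear as blank or N/A and shows a particular column order, here is a minimal sketch of one way to post-process such a DataFrame. The two sample records are a shortened, illustrative subset of what the loop produces, and the names `columns` and `result` are introduced here only for the example.

```python
import pandas as pd

# Illustrative subset of the records built by the grouping loop.
records = [
    {"Sampling frequency": "8000 Hz",
     "Number of channels": "2 (16-bit integer)",
     "File name": "/home/niraj/Documents/audiofiles/M1F1-int16.wav"},
    {"Sampling frequency": "16000", "sample_rate": "16000"},
]

df = pd.DataFrame.from_records(records)

# Column order taken from the expected output in the question.
columns = ["File name", "sample_rate", "Sampling frequency", "Number of channels"]

# reindex arranges the columns in that order (creating any that are missing),
# and fillna replaces the NaN placeholders with the literal string "N/A".
result = df.reindex(columns=columns).fillna("N/A")

print(result)
```

Passing an empty string to `fillna` instead of "N/A" would leave the missing cells blank.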

