Using Python to split every 3 rows of a single column into 3 new columns

Question

I have a text file that I need to parse every 3 lines, assigning the values to three new columns of a DataFrame using pandas/numpy.

The sample file sample.txt looks like this:

com.google.plugin.system.url:540 ,,, 
178745,,, 
Country ,,, 
23-DEC-13 03-FEB-14 ,,, 
com.google.plugin.system.url:540 ,,, 
178744,,, 
Responsible ID ,,, 
23-DEC-13 03-FEB-14 ,,, ,,,
com.google.plugin.system.url:540 ,,,
 178743,,, 
Development Group ,,, 
23-DEC-13 03-FEB-14
##############################################################

The expected output should look like this:

Name                              ID      case               Date
com.google.plugin.system.url:540  178745  Country            23-DEC-13 03-FEB-14
com.google.plugin.system.url:540  178744  Responsible ID     23-DEC-13 03-FEB-14
com.google.plugin.system.url:540  178743  Development Group  23-DEC-13 03-FEB-14

Can anyone help? How can I restructure the data above into this DataFrame?

Tags: python pandas numpy

Answer


I think this should do the trick as long as each of your records spans exactly 4 lines (Name, ID, case, Date):

import pandas as pd

# set file name and full path
file_name = 'filename.txt'

# read the file line by line, strip surrounding whitespace and the ",,," padding,
# and skip blank lines and '#' separator lines
with open(file_name) as f:
    records = [v for v in (line.strip().strip(' ,') for line in f) if v.strip('#')]

# every record spans 4 lines, so each column takes every 4th value,
# starting at its offset within the record (0 = Name, 1 = ID, 2 = case, 3 = Date)
new_df = pd.DataFrame({'Name': records[0::4],
                       'ID': records[1::4],
                       'case': records[2::4],
                       'Date': records[3::4]})
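Since the question mentions numpy, the same 4-line grouping can also be written as a single reshape of the flat list of values into an (n_records, 4) grid. This is a minimal sketch, assuming each line is cleaned by stripping whitespace and the trailing ,,, padding and that blank and # separator lines are skipped. The helper parse_records and the inline SAMPLE string are hypothetical stand-ins for the real file, included only so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory copy of sample.txt, used instead of reading a real file
SAMPLE = """com.google.plugin.system.url:540 ,,,
178745,,,
Country ,,,
23-DEC-13 03-FEB-14 ,,,
com.google.plugin.system.url:540 ,,,
178744,,,
Responsible ID ,,,
23-DEC-13 03-FEB-14 ,,, ,,,
com.google.plugin.system.url:540 ,,,
 178743,,,
Development Group ,,,
23-DEC-13 03-FEB-14
##############################################################
"""


def parse_records(stream, fields=("Name", "ID", "case", "Date")):
    """Hypothetical helper: turn one-value-per-line text into one row per record."""
    # strip whitespace and the ",,," padding; skip blank and all-'#' lines
    values = [v for v in (line.strip().strip(" ,") for line in stream) if v.strip("#")]
    # refuse to guess if the values don't split evenly into records
    if len(values) % len(fields):
        raise ValueError(f"{len(values)} values is not a multiple of {len(fields)}")
    # reshape the flat list into an (n_records, n_fields) grid
    grid = np.array(values, dtype=object).reshape(-1, len(fields))
    return pd.DataFrame(grid, columns=list(fields))


df = parse_records(io.StringIO(SAMPLE))
print(df)
```

To run it on the real file, pass an open file handle instead, e.g. with open('filename.txt') as f: df = parse_records(f). The explicit length check is a design choice: a file whose line count is not a multiple of 4 raises a ValueError instead of producing silently misaligned rows.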
