How do I read data from a text file with a specific format using pandas?

Problem description

I have a text file containing data like the following.

20/12/2018 
This is the test text. 

22/12/2018
* 21/12/2018 
This is a test text where the text is written on later than the actual date.

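Now let's say the data above, with its dates, is in a text file (text.txt). I need a way to read that data and put it in a pandas DataFrame. I want to read it into the columns:

Date  Text  DateOfWritten

Date should be the actual date of the text. For example, 21/12/2018 should be the Date, and 22/12/2018 should be the DateOfWritten.

The expected output should look like this: [image: expected output]

Thanks in advance.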

Tags: python, pandas

Solution

This might be one possible solution:

from collections import defaultdict
import pandas as pd

dict_for_df = defaultdict(list)
last_find = None  # 2 if the previous non-blank line was a plain date
last_date = None  # most recent plain date, reused as Date when no "*" line follows

# The question calls the file text.txt; adjust the path as needed.
with open("test.txt", "r") as f:
    for line in f:
        stripped = line.strip()
        curr_find = stripped.find("/")
        if not stripped:
            # Skip blank lines.
            continue
        elif curr_find == 2:
            # Plain date such as "22/12/2018": the date the text was written.
            dict_for_df['DateOfWritten'].append(stripped)
            last_date = stripped
            last_find = 2
        elif last_find == 2 and curr_find != 4:
            # Text directly after a plain date: Date equals DateOfWritten.
            dict_for_df['Date'].append(last_date)
            dict_for_df['text'].append(stripped)
            last_find = 0
            last_date = ''
        elif curr_find == 4:
            # Starred date such as "* 21/12/2018": the actual date of the text.
            dict_for_df['Date'].append(stripped.replace("*", "").strip())
            last_date = ""
            last_find = None
        else:
            # Text that follows a starred date.
            dict_for_df['text'].append(stripped)
            last_date = ""
            last_find = None

df = pd.DataFrame(dict_for_df)
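
The same parsing can also be sketched with a regular expression instead of character positions. This is only an illustration under a few assumptions: a date line contains nothing but a dd/mm/yyyy date, a leading "*" marks the actual date of a text that was written later, and every other non-blank line is one text entry. The names DATE_LINE and parse_entries are made up for this sketch and are not part of pandas.

```python
import re

# Assumption: a date line holds only a dd/mm/yyyy date, optionally prefixed
# with "*" to mark the actual date of a text that was written later.
DATE_LINE = re.compile(r"^(\*?)\s*(\d{2}/\d{2}/\d{4})$")


def parse_entries(lines):
    """Turn the lines of the text file into a list of row dictionaries."""
    records = []
    written = None  # most recent plain date (DateOfWritten)
    actual = None   # most recent starred date (Date), if any
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank or whitespace-only lines
        match = DATE_LINE.match(line)
        if match:
            starred, date = match.groups()
            if starred:
                actual = date
            else:
                # A new plain date starts a new entry and clears any starred date.
                written, actual = date, None
        else:
            # A text line: fall back to the written date when no starred date was given.
            records.append({
                "Date": actual or written,
                "text": line,
                "DateOfWritten": written,
            })
    return records


# Sample data copied from the question.
sample = """20/12/2018
This is the test text.

22/12/2018
* 21/12/2018
This is a test text where the text is written on later than the actual date.
"""

records = parse_entries(sample.splitlines())
```

Passing the resulting list to pd.DataFrame(records) should give the Date, text and DateOfWritten columns. For a real file, the lines can come from an open file object instead of the sample string. Note that with this sketch, several text lines under one date would each become their own row.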
