首页 > 解决方案 > python类:逐步运行函数并保存

问题描述

我有一个读取数据帧的类,然后是另一个处理该数据帧的类。处理类中的函数应逐步应用于同一数据帧以形成最终数据帧,然后将其保存为 csv 文件。

from pydantic import BaseModel
from config import DATA_REPO
import pandas as pd
import os

class PandaDataFrame(BaseModel):
data: pd.DataFrame

      class Config:
          arbitrary_types_allowed = True

class Directory(BaseModel):
    data_directory: str

class DataToPandaReader(object):

    def csv_file_reader(self, directory: Directory):
        directory = directory.data_directory
        for file in os.listdir(directory):
            if file.endswith('.csv'):
               return pd.read_csv(os.path.join(directory, file))

class DataProcessor(object):

    def remove_punctuation(self, my_: PandaDataFrame):
        my_data_to_process = my_.data
        for col in my_data_to_process:
            if any(word in col for word in ['example', 'text', 'Answer']):
                my_data_to_process = my_data_to_process[col].str.replace('[^\w\s]', '', regex=True)
                return add_number_column(my_data_to_process)

    def add_number_column(self, my_: PandaDataFrame):
        my_data_to_process = my_.data
        my_data_to_process['sentence_number'] = range(len(my_data_to_process))
        return save_final_dataframe(my_data_to_process)

    def save_final_dataframe(self, my_:PandaDataFrame):
        my_data_to_process = my_.data
        return my_data_to_process.to_csv('final_data.csv')

def parse_data_process(directory_to_csv_file):
    toprocess = DataProcessor()
    toprocess.save_final_dataframe(directory_to_csv_file)
    toprocess.remove_punctuation(directory_to_csv_file)
    toprocess.add_number_column(directory_to_csv_file)
    return toprocess

if __name__ == '__main__':
    parse_data_process(PandaDataFrame(data= DataToPandaReader().csv_file_reader(Directory(data_directory = os.path.join(DATA_REPO, 'input_data')))))

现在,例如要实例化 DataProcessor 类中的第一个函数,我将执行以下操作

DataProcessor().remove_punctuation(PandaDataFrame(data= DataToPandaReader().csv_file_reader(Directory(data_directory = os.path.join(DATA_REPO, 'input_data')))))

但我的意图是逐步在 DataProcessor 类中运行所有这些函数,因此 save_final_dataset 函数将保存删除了标点符号并且还有一个数字列的数据帧。

更新:

按照给出的答案,我进行了这些更改,但得到了函数未知的错误。

def parse_data_process(directory_to_csv_file):
    toprocess = DataProcessor()
    toprocess.save_final_dataframe(directory_to_csv_file)
    toprocess.remove_punctuation(directory_to_csv_file)
    toprocess.add_number_column(directory_to_csv_file)
    return toprocess

if __name__ == '__main__':
    parse_data_process(PandaDataFrame(data= DataToPandaReader().csv_file_reader(Directory(data_directory = os.path.join(DATA_REPO, 'input_data')))))

标签: pythonpandasfunctionclass

解决方案


除非我误解了您的用例,否则您需要做的就是更换

return my_data_to_process

...在 remove_punctuation 函数中

return add_number_column(my_data_to_process)

...然后替换

return my_data_to_process

...在 add_number_column 函数中

return save_final_dataframe(my_data_to_process)

推荐阅读