Parsing text files with Python and sorting them by an option

Problem description

I'm trying to parse 3 different txt files containing records such as "Abercrombie, Neil, Male, Tan, 2/13/1943".

The information is the same in each file, but each file uses a different delimiter: one uses commas, one uses pipes, and one uses spaces.

The goal is to read each txt file format and sort the records by a given option: gender, date of birth, or last name.

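I'm trying to figure out exactly which methods I need, and whether I should write a class for this. I have a few ideas, since there seem to be a thousand ways to write it, but I'm open to suggestions. These methods will also be tested.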

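import sys
import os

def main():
    if len(sys.argv) < 3:
        print("Usage: {} FILEPATH SORT_OPTION".format(sys.argv[0]))
        sys.exit(1)

    filepath = sys.argv[1]
    sortOption = sys.argv[2]

    if not os.path.isfile(filepath):
        print("File path {} does not exist. Exiting...".format(filepath))
        sys.exit(1)

    with open(filepath) as fp:
        # Parse every non-blank line into a list of fields
        people = [formatPerson(line) for line in fp if line.strip()]

    # TODO: sort people according to sortOption and print them

def formatPerson(line):
    # Find the delimiter for this line (pipe, comma, otherwise whitespace)
    # and split the string on it, trimming spaces around each field
    if "|" in line:
        delimiter = "|"
    elif "," in line:
        delimiter = ","
    else:
        delimiter = None  # str.split(None) splits on runs of whitespace
    person = [field.strip() for field in line.strip().split(delimiter)]
    return person

def formatDate(line):
    # read line and re-format the date
    pass

def sortContacts(option):
    # read all lines and sort them by the option
    pass

def sortByGender():
    # Read all contacts and sort them by gender
    pass

def sortByBirthdate():
    # Read all contacts and sort them by birthdate
    pass

def sortByLastname():
    # Read all contacts and sort them by last name
    pass

if __name__ == '__main__':
    main()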
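On the question of whether a class is needed: one lightweight option is a small record type plus a single sort function that picks a key function by option name, instead of three separate sortBy functions. This is only a sketch under assumptions: the Person NamedTuple, its field order (taken from the example record "Abercrombie, Neil, Male, Tan, 2/13/1943"), and the option names "gender", "birthdate", and "lastname" are all hypothetical, and this sortContacts takes the list explicitly rather than re-reading the file.

```python
from datetime import datetime
from typing import List, NamedTuple


class Person(NamedTuple):
    # Hypothetical field order, matching the example record
    # "Abercrombie, Neil, Male, Tan, 2/13/1943"
    last_name: str
    first_name: str
    gender: str
    color: str
    birthdate: str  # M/D/YYYY


# Hypothetical option names mapped to sort-key functions
SORT_KEYS = {
    # Gender first, then last name within each gender
    "gender": lambda p: (p.gender, p.last_name),
    # Parse the date so that 1943 sorts before 1973, etc.
    "birthdate": lambda p: datetime.strptime(p.birthdate, "%m/%d/%Y"),
    "lastname": lambda p: p.last_name,
}


def sortContacts(people: List[Person], option: str) -> List[Person]:
    # Look up the key function for the option; unknown options raise ValueError
    try:
        key = SORT_KEYS[option]
    except KeyError:
        raise ValueError("unknown sort option: {!r}".format(option)) from None
    return sorted(people, key=key)


people = [
    Person("Smith", "Steve", "Male", "Red", "3/3/1985"),
    Person("Hingis", "Martina", "Female", "Green", "4/2/1979"),
    Person("Abercrombie", "Neil", "Male", "Tan", "2/13/1943"),
]
for p in sortContacts(people, "birthdate"):
    print(p.last_name, p.first_name, p.birthdate)
```

Passing the list in, rather than having each function read the file again, keeps the sorting logic free of I/O, which makes it easier to test.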

Sample output looks like this:

Hingis Martina Female 4/2/1979 Green
Kelly Sue Female 7/12/1959 Pink
Kournikova Anna Female 6/3/1975 Red
Seles Monica Female 12/2/1973 Black
Abercrombie Neil Male 2/13/1943 Tan
Bishop Timothy Male 4/23/1967 Yellow
Bonk Radek Male 6/3/1975 Green
Bouillon Francis Male 6/3/1975 Blue
Smith Steve Male 3/3/1985 Red

Tags: python

Solution


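Use the csv module, which lets you read each file with a different delimiter. If you know each file's delimiter in advance and can hard-code it into the program, that is one option; otherwise, you can take the first non-alphanumeric character that follows the first word on the line: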

import re

# Use a regex to find the first non-alphanumeric character that follows the
# first word of the first line (e.g. the comma in "Abercrombie, Neil, ...")
with open("my_file.csv", "r") as infile:
    file_lines = infile.readlines()
    delimiter = re.search(r"\w+([^\w])", file_lines[0]).group(1)
    ...
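One caveat: for a pipe file laid out as "Abercrombie | Neil | ...", the regex above returns the space before the pipe, not the pipe itself. If the pipe file puts spaces around the pipes (the question does not show it), checking a fixed list of candidate delimiters is a more predictable alternative. A minimal sketch, assuming the only delimiters are "|", "," and a single space; detect_delimiter is a hypothetical helper name:

```python
def detect_delimiter(line):
    # Check the expected delimiters in a fixed order; pipe and comma are
    # tested first so that spaces around them are not mistaken for the delimiter
    for candidate in ("|", ","):
        if candidate in line:
            return candidate
    # Neither found: assume the fields are separated by spaces
    return " "


print(repr(detect_delimiter("Abercrombie, Neil, Male, Tan, 2/13/1943")))      # ','
print(repr(detect_delimiter("Abercrombie | Neil | Male | Tan | 2/13/1943")))  # '|'
print(repr(detect_delimiter("Abercrombie Neil Male Tan 2/13/1943")))          # ' '
```

The returned single character can be passed straight to csv.reader as its delimiter argument.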

Then parse the lines with the csv module:

import csv

reader = csv.reader(file_lines, delimiter=delimiter)
line_list = list(reader)
# line_list is now a 2D list, where each element of the outer list is a list of tokens
#   on that row of the CSV file
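csv.reader keeps any spaces around the delimiter, so with the question's "Abercrombie, Neil, ..." layout the second field comes back as " Neil". A minimal sketch that strips each field and drops blank lines; parse_lines is a hypothetical helper name, and the pipe-separated sample lines are an assumed layout, since the question does not show the pipe file:

```python
import csv


def parse_lines(lines, delimiter):
    # csv.reader accepts any iterable of strings, e.g. the list from readlines()
    reader = csv.reader(lines, delimiter=delimiter)
    # Strip spaces around each field and skip blank lines,
    # which csv.reader returns as empty lists
    return [[field.strip() for field in row] for row in reader if row]


pipe_lines = [
    "Abercrombie | Neil | Male | Tan | 2/13/1943\n",
    "\n",
    "Bishop | Timothy | Male | Yellow | 4/23/1967\n",
]
print(parse_lines(pipe_lines, "|"))
```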

Now you can sort line_list however you need by passing a custom key to the built-in sorted() function:

sorted_by_lastname = sorted(line_list, key=lambda elem:elem[0])
sorted_by_firstname = sorted(line_list, key=lambda elem:elem[1])
...
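Two of the question's options need more than a single column as the key. Sorting birthdates as raw strings compares them character by character, so "12/2/1973" would come before "2/13/1943"; parsing them with datetime.strptime fixes that. A tuple key sorts by gender and then by last name within each gender, which is the order the sample output appears to use. This sketch assumes the field order of the question's example record (last name, first name, gender, color, birthdate as M/D/YYYY); the rows are taken from the sample output.

```python
from datetime import datetime

# Rows in the question's field order:
# last name, first name, gender, color, birthdate (M/D/YYYY)
line_list = [
    ["Seles", "Monica", "Female", "Black", "12/2/1973"],
    ["Abercrombie", "Neil", "Male", "Tan", "2/13/1943"],
    ["Hingis", "Martina", "Female", "Green", "4/2/1979"],
    ["Smith", "Steve", "Male", "Red", "3/3/1985"],
]

# Parse the date so the comparison is chronological, not lexicographic
sorted_by_birthdate = sorted(
    line_list, key=lambda elem: datetime.strptime(elem[4], "%m/%d/%Y"))

# Tuple key: gender first ("Female" < "Male"), then last name
sorted_by_gender = sorted(line_list, key=lambda elem: (elem[2], elem[0]))

for elem in sorted_by_gender:
    # Print in the sample output's column order (birthdate before color)
    print(elem[0], elem[1], elem[2], elem[4], elem[3])
```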
