Sorting two text files while keeping indented lines attached to their parents

Problem description

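I want to compare two log files generated before and after a run to see whether it affected anything. However, the log entries do not always come out in the same order. The log files also contain lines indented to several levels, so when I tried a plain sort, every line was sorted individually and the hierarchy was lost. I want children to stay with their parents. The indentation uses spaces, not tabs.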

Any help would be greatly appreciated. I'm fine with either a Windows or a Linux solution.

Example file:

#This is a sample code

Parent1 to be verified

    Child1 to be verified

    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified

    Child4 to be verified

    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified

    Child3 to be verified

Tags: sorting, hierarchy

Solution


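I'm posting another answer here, using Python.

The idea is to attach each parent to its children, so that children under the same parent are sorted together.

See the Python script below: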

"""Attach parent to children in an indentation-structured text"""
from typing import Tuple, List
import sys

# A separator to join the parent and child in each line;
# it must not occur anywhere in the input text
SEPARATOR = '@'
# The indentation
INDENT = '    '

def parse_line(line: str) -> Tuple[int, str]:
    """Parse a line into indentation level and its content
    with indentation stripped

    Args:
        line (str): One of the lines from the input file, with newline ending

    Returns:
        Tuple[int, str]: The indentation level and the content with
            indentation stripped.

    Raises:
        ValueError: If the line is incorrectly indented.
    """
    # strip the leading white spaces
    lstripped_line = line.lstrip()
    # get the indentation
    indent = line[:-len(lstripped_line)]

    # Let's check if the indentation is correct
    # meaning it should be N * INDENT
    n = len(indent) // len(INDENT)
    if INDENT * n != indent:
        raise ValueError(f"Wrong indentation of line: {line!r}")

    return n, lstripped_line.rstrip('\r\n')


def format_text(txtfile: str) -> List[str]:
    """Format the text file by attaching the parent to its children

    Args:
        txtfile (str): The text file

    Returns:
        List[str]: A list of formatted lines
    """
    formatted = []
    par_indent = par_line = None

    with open(txtfile) as ftxt:
        for line in ftxt:
            # skip blank lines so they are not mistaken for top-level parents
            if not line.strip():
                continue

            # get the indentation level and line without indentation
            indent, line_noindent = parse_line(line)

            # level 1 parents
            if indent == 0:
                par_indent = indent
                par_line = line_noindent
                formatted.append(line_noindent)

            # children
            elif indent > par_indent:
                formatted.append(par_line +
                                 SEPARATOR * (indent - par_indent) +
                                 line_noindent)

                par_indent = indent
                par_line = par_line + SEPARATOR + line_noindent

            # siblings or dedentation
            else:
                # We just need first `indent` parts of parent line as our prefix
                prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
                formatted.append(prefix + SEPARATOR + line_noindent)
                par_indent = indent
                par_line = prefix + SEPARATOR + line_noindent

    return formatted

def sort_and_revert(lines: List[str]):
    """Sort the formatted lines and revert the leading parents
    into indentations

    Args:
        lines (List[str]): list of formatted lines

    Prints:
        The sorted and reverted lines
    """
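    # Compare lines part by part, so the result does not depend on how
    # the separator character sorts relative to the line content
    sorted_lines = sorted(lines, key=lambda item: item.split(SEPARATOR))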
    for line in sorted_lines:
        if SEPARATOR not in line:
            print(line)
        else:
            leading, _, orig_line = line.rpartition(SEPARATOR)
            print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)

def main():
    """Main entry"""
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <file>")
        sys.exit(1)

    formatted = format_text(sys.argv[1])
    sort_and_revert(formatted)

if __name__ == "__main__":
    main()

Let's save it as format.py, and suppose we have a test file, test.txt:

parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1

Let's test it:

$ python format.py test.txt
parent1
    child1-1
    child1-2
        child1-2-1
        child1-2-2
parent2
    child2-1
        child2-1-1
    child2-2
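
Since the original goal is to compare two logs, the sorted output of both files can then be diffed. The following is only a minimal sketch using the standard-library difflib module; the function name diff_sorted_logs, the labels before.sorted and after.sorted, and the sample lines are assumptions made for illustration, not part of the script above.

```python
import difflib


def diff_sorted_logs(before_lines, after_lines):
    """Return the unified-diff lines between two already-sorted logs.

    Both arguments are lists of lines without trailing newlines,
    for example the output of format.py captured for each log file.
    """
    return list(difflib.unified_diff(
        before_lines,
        after_lines,
        fromfile="before.sorted",  # hypothetical label, shown in the diff header
        tofile="after.sorted",     # hypothetical label, shown in the diff header
        lineterm="",               # inputs carry no newlines, so add none
    ))


# Made-up sample data: one child differs between the two runs
before = ["parent1", "    child1-1", "    child1-2"]
after = ["parent1", "    child1-1", "    child1-3"]

for row in diff_sorted_logs(before, after):
    print(row)
```

An empty result means the two logs contain the same hierarchy, regardless of the order in which the lines were originally written.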

If you are wondering how the format_text function formats the text, here are the intermediate results, which also explain why the file sorts the way we want:

parent2
parent2@child2-1
parent2@child2-1@child2-1-1
parent2@child2-2
parent1
parent1@child1-2
parent1@child1-2@child1-2-2
parent1@child1-2@child1-2-1
parent1@child1-1

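As you can see, each child carries its parents, all the way up to the root. That is why children under the same parent end up sorted together.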
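
The same idea can be sketched without a separator string by keeping each line's chain of ancestors as a tuple and sorting the tuples. This is an illustrative variant, not the script above: the names path_keys and sort_hierarchy are hypothetical, and it assumes well-formed indentation in multiples of four spaces.

```python
def path_keys(lines, indent="    "):
    """Build one tuple per non-blank line: its ancestors followed by the line itself."""
    stack = []  # current chain of ancestors, root first
    keys = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        content = line.lstrip(" ")
        level = (len(line) - len(content)) // len(indent)
        del stack[level:]  # drop entries at this depth or deeper
        stack.append(content.rstrip("\r\n"))
        keys.append(tuple(stack))
    return keys


def sort_hierarchy(lines, indent="    "):
    """Sort the lines while keeping every child below its own parent."""
    return [indent * (len(key) - 1) + key[-1]
            for key in sorted(path_keys(lines, indent))]


sample = [
    "parent2",
    "    child2-1",
    "        child2-1-1",
    "    child2-2",
    "parent1",
    "    child1-2",
    "        child1-2-2",
    "        child1-2-1",
    "    child1-1",
]
print("\n".join(sort_hierarchy(sample)))
```

Tuples compare element by element, and a parent's tuple is a prefix of every one of its descendants' tuples, so a parent always sorts immediately before its own subtree. No character has to be reserved as a separator.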

