sorting - Sort two text files while keeping indented lines together with their parents
Problem description
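I want to compare two log files generated before and after an execution to see whether it changed anything. However, the lines in the logs are not always in the same order. The log files also contain many indented lines, so when I sort them, every line is sorted independently and the children get separated from their parents. I want to keep each child together with its parent. The indentation uses spaces, not tabs.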
Any help would be greatly appreciated. A solution for either Windows or Linux is fine.
Example file:
Parent1 to be verified
    Child1 to be verified
    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified
    Child4 to be verified
    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified
    Child3 to be verified
Solution
Here is another answer, using Python.
The idea is to prepend each child's parents to it, so that children under the same parent are sorted together.
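The same idea can also be sketched without a separator: key each line by the tuple of its ancestors' text plus its own, and let Python's element-by-element tuple comparison keep every subtree together. This is a minimal sketch, not the author's script; `sort_indented` and its `indent_width` parameter are hypothetical names, and it assumes lines without trailing newlines (e.g. from `str.splitlines()`) and exactly `indent_width` spaces per level.

```python
from typing import List, Tuple


def sort_indented(lines: List[str], indent_width: int = 4) -> List[str]:
    """Sort indented lines so that each child stays under its parent.

    Assumption: every level is indented by exactly `indent_width` spaces.
    """
    keyed: List[Tuple[Tuple[str, ...], str]] = []
    path: List[str] = []  # text of the current ancestors, root first
    for line in lines:
        text = line.lstrip(" ")
        level = (len(line) - len(text)) // indent_width
        # Keep only the ancestors above this level, then append this line
        path = path[:level] + [text]
        keyed.append((tuple(path), line))
    # A tuple sorts before any longer tuple it prefixes, so a parent comes
    # right before its children and sibling subtrees never interleave
    keyed.sort(key=lambda item: item[0])
    return [line for _, line in keyed]
```

As with the separator approach, identical sibling lines share a key, so their children are grouped together after all copies of that line.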
See the Python script below:
"""Attach parent to children in an indentation-structured text"""
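from typing import Tuple, List
import sys

# A unique separator to separate the parent and child in each line;
# it must not appear anywhere in the file's text
SEPARATOR = '@'
# One level of indentation; it must match the file's indentation exactly
# (four spaces assumed here)
INDENT = '    '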


def parse_line(line: str) -> Tuple[int, str]:
    """Parse a line into indentation level and its content
    with indentation stripped

    Args:
        line (str): One of the lines from the input file, with newline ending

    Returns:
        Tuple[int, str]: The indentation level and the content with
            indentation stripped.

    Raises:
        ValueError: If the line is incorrectly indented.
    """
    # strip the leading white spaces
    lstripped_line = line.lstrip()
    # get the indentation
    indent = line[:-len(lstripped_line)]
    # Let's check if the indentation is correct
    # meaning it should be N * INDENT
    n = len(indent) // len(INDENT)
    if INDENT * n != indent:
        raise ValueError(f"Wrong indentation of line: {line}")
    return n, lstripped_line.rstrip('\r\n')


def format_text(txtfile: str) -> List[str]:
    """Format the text file by attaching the parents to each child

    Args:
        txtfile (str): The text file

    Returns:
        List[str]: A list of formatted lines
    """
    formatted = []
    par_indent = par_line = None
    with open(txtfile) as ftxt:
        for line in ftxt:
            # get the indentation level and line without indentation
            indent, line_noindent = parse_line(line)
            # top-level parents
            if indent == 0:
                par_indent = indent
                par_line = line_noindent
                formatted.append(line_noindent)
            # children
            elif indent > par_indent:
                # one SEPARATOR per level descended, so the depth is preserved
                new_line = (par_line +
                            SEPARATOR * (indent - par_indent) +
                            line_noindent)
                formatted.append(new_line)
                par_indent = indent
                # keep the parent chain identical to the line just emitted
                par_line = new_line
            # siblings or dedentation
            else:
                # We just need the first `indent` parts of the parent line as our prefix
                prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
                formatted.append(prefix + SEPARATOR + line_noindent)
                par_indent = indent
                par_line = prefix + SEPARATOR + line_noindent
    return formatted


def sort_and_revert(lines: List[str]):
    """Sort the formatted lines and revert the leading parents
    into indentations

    Args:
        lines (List[str]): list of formatted lines

    Prints:
        The sorted and reverted lines
    """
    # Compare the parent chains part by part, so that e.g. "parent10"
    # cannot sort between "parent1" and "parent1@child"
    sorted_lines = sorted(lines, key=lambda line: line.split(SEPARATOR))
    for line in sorted_lines:
        if SEPARATOR not in line:
            print(line)
        else:
            leading, _, orig_line = line.rpartition(SEPARATOR)
            print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)


def main():
    """Main entry"""
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <file>")
        sys.exit(1)
    formatted = format_text(sys.argv[1])
    sort_and_revert(formatted)


if __name__ == "__main__":
    main()
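A caveat about sorting the joined strings directly: `format_text` turns each line into its parent chain, such as `parent1@child1-1`, and `'@'` (0x40) sorts after the digits (0x30 to 0x39). With a plain `sorted(lines)`, a top-level sibling such as `parent10` can therefore land between `parent1` and its children. Sorting on the split parts compares the chain one level at a time and avoids this. The lines below are made up for illustration.

```python
# '@' is 0x40, which sorts after the digits 0x30-0x39, so plain string
# comparison slips "parent10" in between "parent1" and its child
SEPARATOR = "@"
lines = ["parent10", "parent1@child1-1", "parent1"]

by_string = sorted(lines)
by_parts = sorted(lines, key=lambda line: line.split(SEPARATOR))

print(by_string)  # ['parent1', 'parent10', 'parent1@child1-1']
print(by_parts)   # ['parent1', 'parent1@child1-1', 'parent10']
```

In `sort_and_revert` this amounts to passing `key=lambda line: line.split(SEPARATOR)` to `sorted`.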
Save it as format.py, and suppose we have a test file, test.txt:
parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1
Let's test it:
$ python format.py test.txt
parent1
    child1-1
    child1-2
        child1-2-1
        child1-2-2
parent2
    child2-1
        child2-1-1
    child2-2
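If you are wondering how format_text transforms the text, here is the intermediate result for test.txt, which also shows why sorting it produces the order we want: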
parent2
parent2@child2-1
parent2@child2-1@child2-1-1
parent2@child2-2
parent1
parent1@child1-2
parent1@child1-2@child1-2-2
parent1@child1-2@child1-2-1
parent1@child1-1
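Each child carries its chain of parents all the way up to the root, so after sorting, the children of the same parent end up together, directly after that parent.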
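To get back to the original goal of comparing the before and after logs, the script's output for each file can be fed to `difflib.unified_diff` from the standard library. This is a minimal sketch, assuming the script above is saved as `format.py` in the current directory; `sorted_log` and `diff_lines` are hypothetical helper names.

```python
import difflib
import subprocess
import sys
from typing import List


def sorted_log(path: str, script: str = "format.py") -> List[str]:
    """Run the sorting script (assumed to be format.py) on `path`."""
    result = subprocess.run([sys.executable, script, path],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def diff_lines(before: List[str], after: List[str]) -> List[str]:
    """Return a unified diff of two already-sorted logs (empty if equal)."""
    return list(difflib.unified_diff(before, after, "before", "after",
                                     lineterm=""))
```

For example, `print("\n".join(diff_lines(sorted_log("before.log"), sorted_log("after.log"))))` prints a unified diff, with a few lines of context, of the two normalized logs; the file names are placeholders.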