rust - 如何使用 Rust nom 为这种结构文本编写解析器?
问题描述
我有以下数据
let data = r#"title1
title1 line1
title1 line2
sep/
title2
title2 line1
title2 line2
title2 line3
sep/
title3
title3 line1
sep/"#;
基本上它代表三个条目:
struct Entry {
title: String,
body: String,
}
每个条目都有一个标题和一个正文。标题使用单行(不包括行尾),正文使用所有后续行,直到遇到分隔线 ( sep/
)。我想要的结果是一个条目向量。如何使用 nom 来解析它?我对 nom 很陌生,我不能让这些部分一起工作并形成一个工作解析器。以下是我所拥有的:
use nom::IResult;
use nom::branch::alt;
use nom::bytes::complete::{tag, take_until, is_not, is_a};
use nom::error::ErrorKind::ParseTo;
use nom::sequence::{pair, tuple, delimited, terminated};
use nom::combinator::opt;
use nom::error::{Error, ErrorKind};
use nom::character::complete::line_ending;
use nom::regexp::str::{re_find, re_match, re_matches, re_capture};
use nom::multi::many0;
struct Entry {
title: String,
body: String,
}
fn get_entry_title(i: &str) -> IResult<&str, &str> {
delimited(tag(""),
take_until(alt((
tag("\r\n"),
tag("\n")
))),
alt((
tag("\r\n"),
tag("\n")
))
)(i)
}
fn get_entry_body(i: &str) -> IResult<&str, &str> {
terminated(
take_until( tag("sep/")),
tag("sep/")
)(i)
}
fn main() {
let data = r#"title1
title1 line1
title1 line2
sep/
title2
title2 line1
title2 line2
title2 line3
sep/
title3
title3 line1
sep/"#;
let result = get_entry_title(&data);
}
解决方案
这是一种仅标称的方法(标称 6.1.2):
use nom::IResult;
use nom::branch::alt;
use nom::bytes::complete::{tag, take_while};
use nom::sequence::{terminated, delimited, pair};
use nom::multi::{separated_list0, many1};
#[derive(Debug)]
struct Entry {
title: String,
body: String,
}
fn main() {
let input = r#"title1
title1 line1
title1 line2
sep/
title2
title2 line1
title2 line2
title2 line3
sep/
title3
title3 line1
sep/"#;
let (_, entries) = parse(input).unwrap();
println!("{:#?}", entries);
}
fn parse(input: &str) -> IResult<&str, Vec<Entry>> {
separated_list0(
separator,
entry,
)(input)
}
fn entry(input: &str) -> IResult<&str, Entry> {
let (input, title) = title(input)?;
let (input, body_lines) = many1(body_line(title))(input)?;
let body = body_lines.join("");
let entry = Entry {
title: title.to_owned(),
body,
};
//TODO: Does it have to end with a separator ?
// If it does, either use terminated() in combination with many(), or add
// an additional check for separator here
IResult::Ok((input, entry))
}
fn title(input: &str) -> IResult<&str, &str> {
terminated(
take_while(not_r_n),
end_of_line,
)(input)
}
pub fn body_line<'i>(title: &'i str) -> impl FnMut(&'i str) -> IResult<&'i str, &'i str, nom::error::Error<&'i str>>
{
move |input: &str| {
delimited(
pair(tag(title), tag(" ")),
take_while(not_r_n),
end_of_line,
)(input)
}
}
fn separator(input: &str) -> IResult<&str, &str> {
terminated(
tag("sep/"), // the separator is hardcoded, otherwise you have to do the same monstrosity as body_line() above
end_of_line,
)(input)
}
fn end_of_line(input: &str) -> IResult<&str, &str> {
alt((
tag("\n"),
tag("\r\n")
))(input)
}
fn not_r_n(ch: char) -> bool {
ch != '\r' && ch != '\n'
}
推荐阅读
- python - PyTorch: Error>> expected scalar type float but found double
- node.js - websocket node.js module "ws" not working - infinite load
- javascript - Shared tooltip on multiple charts not formatting correctly
- android - Request from the database during activity creation
- kotlin - Kotlin:在构造函数中访问非最终属性名称
- android - 将新 ns 映射到旧 ns
- authentication - NextJS 静态 SSG 认证
- c++ - 数组结束时指针不应该指向 nullptr 吗?
- c# - WPF 绑定用户控件属性不起作用
- r - R用正则表达式重命名所有列