Using parsec to parse a string that must end with a specific word?

Question

I'm doing some programming exercises. The input format I'm working with looks like this:

Give xxxxxxxxx as yyyy.

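xxxxxxxx can be one of several formats that recur throughout these exercises. In particular, it is binary (groups of 8 digits separated by spaces), hexadecimal (no spaces), or octal (groups of up to 3 digits). I have already written parsers for these formats, but they all trip over the "as". They look like this: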

binaryParser = BinaryQuestion  <$> (count 8 ( oneOf "01") ) `sepBy1` space

I worked around it with this monstrosity (unnecessary code trimmed):

{-# LANGUAGE OverloadedStrings #-}
import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

wrapAs :: Parser a -> Parser [a]
wrapAs kind = manyTill kind (try (string " as"))
inputParser :: Parser Input
inputParser = choice [try binaryParser, try (space >> hexParser), try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> wrapAs (space >> count 8 ( oneOf "01") )
hexParser :: Parser Input
hexParser = HexQuestion <$> wrapAs (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$> wrapAs (many1 space >> many1 (oneOf ['0'..'7']))

questionParser :: Parser Question
questionParser = do
  string "Give"
  inp <- inputParser 
  string " a "
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

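I don't like having the trailing "as" baked into the input parsers, and they are generally less readable this way. With a regular expression, handling a trailing string would be trivial. So I'm not happy with my solution.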

Is there a way to reuse the "nice" parsers, or at least to use more readable ones?

Additional details

The code I wish I could get working looks like this:

{-# LANGUAGE OverloadedStrings #-}

import Text.Parsec.ByteString
import Text.Parsec
import Text.Parsec.Char
import Data.ByteString.Char8 (pack, unpack, dropWhile, drop, snoc)
import qualified Data.ByteString as B 

data Input = BinaryQuestion [String] 
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving Show
data Question = Question {input :: Input, target :: Target} deriving Show
data Target = Word deriving Show

test1 :: B.ByteString
test1 = "Give 01110100 01110101 01110010 01110100 01101100 01100101 as a word."
test2 :: B.ByteString
test2 = "Give 646f63746f72 as a word."
test3 :: B.ByteString
test3 = "Give 164 151 155 145 as a word."

targetParser :: Parser Target
targetParser = string "word" >> return Word

inputParser :: Parser Input
inputParser = choice [try binaryParser, try hexParser, try octParser]
binaryParser :: Parser Input
binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `sepBy1` space
hexParser :: Parser Input
hexParser = HexQuestion <$> many1 (count 2 hexDigit)
octParser :: Parser Input
octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `sepBy1` space

questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar

Running parseTest questionParser test3 gives me: parse error at (line 1, column 22): unexpected "a"

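I think the problem is that a space is used as the separator inside the input, but a space also appears before the "as a" string. I can't find a suitable function in parsec. Out of frustration I tried adding try in various places, without success.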
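The diagnosis above can be reproduced in isolation. This is a minimal sketch, assuming String input through Text.Parsec.String instead of the question's ByteString parser; octGroups, runWithRest, okCase, badCase and stillBad are hypothetical names used only for illustration:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's "nice" octal parser, on String input (assumption: the
-- ByteString version from the question behaves the same way).
octGroups :: Parser [String]
octGroups = many1 (oneOf ['0' .. '7']) `sepBy1` space

-- Hypothetical helper: run a parser and also return the input it left over.
runWithRest :: Parser a -> String -> Either ParseError (a, String)
runWithRest p = parse ((,) <$> p <*> getInput) ""

-- Succeeds: nothing follows the last group.
okCase :: Either ParseError ([String], String)
okCase = runWithRest octGroups "164 151"

-- Fails: sepBy1 consumes the space before "as", then requires another
-- octal group and sees 'a'. Because the space was already consumed,
-- sepBy1 does not backtrack to end the list after "151".
badCase :: Either ParseError ([String], String)
badCase = runWithRest octGroups "164 151 as"

-- try around the whole parser only rewinds the input after the failure;
-- it cannot make sepBy1 stop one group earlier, so this still fails.
stillBad :: Either ParseError ([String], String)
stillBad = runWithRest (try octGroups) "164 151 as"
```

Loaded into GHCi, okCase evaluates to Right (["164","151"],""), while badCase and stillBad are both Left values that fail at the "a" of "as".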

Tags: haskell, parsec

Solution


Edit:

As pointed out in the comments, the previous solution described at the end of this post does not let you reuse the clean parsers.

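That led to a small Parsec parser that handles every way a space-separated string of digits can end, namely:

  1. a space followed by a character that is not one of the expected digits, e.g. "..11 as"
  2. a space followed by the end of input, e.g. "..11 "
  3. the end of input (eof), e.g. "..11"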

The parser looks like this:

numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits =
    let digitParser = repeatParser $ oneOf digits
        -- each alternative succeeds without consuming input once the digits have ended
        endParser = (try $ lookAhead $ (space >> noneOf digits)) <|>  -- case 1: "..11 as"
                    (try $ lookAhead $ (space <* eof))           <|>  -- case 2: "..11 "
                    (eof >> return ' ')                               -- case 3: "..11"
    in do first <- digitParser
          rest <- manyTill (space >> digitParser) endParser
          return (first : rest)
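The three end cases listed above can be checked directly. The following is a self-contained transcription of numParser, offered as a sketch under these assumptions: it parses String via Text.Parsec.String instead of ByteString, uses a where clause instead of let, and adds the hypothetical helpers runWithRest, octGroups and binGroups for testing:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- numParser from the answer, transcribed to String input (an assumption;
-- the question uses Text.Parsec.ByteString).
numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits = do
    first <- digitParser
    rest <- manyTill (space >> digitParser) endParser
    return (first : rest)
  where
    digitParser = repeatParser (oneOf digits)
    -- Every branch succeeds without consuming input (lookAhead, or eof).
    endParser = try (lookAhead (space >> noneOf digits)) -- case 1: "..11 as"
            <|> try (lookAhead (space <* eof))             -- case 2: "..11 "
            <|> (eof >> return ' ')                        -- case 3: "..11"

-- Hypothetical helper: run a parser and also return the unconsumed input.
runWithRest :: Parser a -> String -> Either ParseError (a, String)
runWithRest p = parse ((,) <$> p <*> getInput) ""

-- Hypothetical instances of numParser for octal and binary groups.
octGroups, binGroups :: Parser [String]
octGroups = numParser many1 ['0' .. '7']
binGroups = numParser (count 8) "01"
```

In GHCi, runWithRest octGroups "164 151 as" should give Right (["164","151"]," as"), the input "164 151 " leaves " ", and "164 151" leaves "". Because numParser never consumes the space after the last group, the many1 space in questionParser still finds it before "as a".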

binaryParser and octParser then need to be changed as follows:

binaryParser = BinaryQuestion <$> numParser (count 8) "01"
octParser    = OctalQuestion  <$> numParser many1 ['0'..'7']

Nothing needs to change in the questionParser from the question; for reference, here it is again:

questionParser = do
  string "Give"
  many1 space
  inp <- inputParser 
  many1 space       -- no need to change this to many
  string "as a"
  many1 space     
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar
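Putting it together, here is a runnable sketch of the complete parser with numParser in place. It assumes String input via Text.Parsec.String instead of ByteString and derives Eq only so results can be compared; otherwise the definitions follow the question and the answer:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Types from the question; Eq is added here only for comparing results.
data Input = BinaryQuestion [String]
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving (Show, Eq)

data Target = Word deriving (Show, Eq)

data Question = Question {input :: Input, target :: Target} deriving (Show, Eq)

-- numParser from the answer (String input assumed instead of ByteString).
numParser :: (Parser Char -> Parser String) -> [Char] -> Parser [String]
numParser repeatParser digits = do
    first <- digitParser
    rest <- manyTill (space >> digitParser) endParser
    return (first : rest)
  where
    digitParser = repeatParser (oneOf digits)
    endParser = try (lookAhead (space >> noneOf digits))
            <|> try (lookAhead (space <* eof))
            <|> (eof >> return ' ')

-- The "nice" parsers, with binaryParser and octParser switched to numParser.
binaryParser, hexParser, octParser, inputParser :: Parser Input
binaryParser = BinaryQuestion <$> numParser (count 8) "01"
hexParser    = HexQuestion <$> many1 (count 2 hexDigit)
octParser    = OctalQuestion <$> numParser many1 ['0' .. '7']
inputParser  = choice [try binaryParser, try hexParser, try octParser]

targetParser :: Parser Target
targetParser = string "word" >> return Word

-- questionParser as in the question, unchanged.
questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser
  many1 space
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar
```

With these definitions, parse questionParser "" applied to each of the three test strings from the question should return a Right value: a BinaryQuestion with six groups, HexQuestion ["64","6f","63","74","6f","72"] and OctalQuestion ["164","151","155","145"], each with target Word.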

Previous solution:

The functions endBy1 and many in Text.Parsec are helpful in this case.

Replace sepBy1 with endBy1:

binaryParser = BinaryQuestion  <$> count 8 ( oneOf "01") `endBy1` space

octParser = OctalQuestion  <$>  (many1 (oneOf ['0'..'7'])) `endBy1` space

Unlike sepBy1, endBy1 p sep requires every p, including the last one, to be followed by sep, so it consumes the one space after the last number, i.e.

Give 164 151 155 145 as a word.
                    ^ this space will be consumed

So instead of requiring one or more spaces before "as a...", we need to accept zero or more spaces, which is why many is used instead of many1. The code now becomes:

...
inp <- inputParser 
many space            -- change to many
string "as a"
.... 
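For comparison, here is a runnable sketch of this earlier endBy1 version, with the elided parts of questionParser filled in from the question. It assumes String input via Text.Parsec.String instead of ByteString and derives Eq only for comparing results:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Types from the question; Eq is added here only for comparing results.
data Input = BinaryQuestion [String]
           | HexQuestion [String]
           | OctalQuestion [String]
  deriving (Show, Eq)

data Target = Word deriving (Show, Eq)

data Question = Question {input :: Input, target :: Target} deriving (Show, Eq)

-- endBy1 consumes the space after every group, including the last one.
binaryParser, hexParser, octParser, inputParser :: Parser Input
binaryParser = BinaryQuestion <$> count 8 (oneOf "01") `endBy1` space
hexParser    = HexQuestion <$> many1 (count 2 hexDigit)
octParser    = OctalQuestion <$> many1 (oneOf ['0' .. '7']) `endBy1` space
inputParser  = choice [try binaryParser, try hexParser, try octParser]

targetParser :: Parser Target
targetParser = string "word" >> return Word

questionParser :: Parser Question
questionParser = do
  string "Give"
  many1 space
  inp <- inputParser
  many space      -- zero spaces left after binary/octal input, one after hex
  string "as a"
  many1 space
  tar <- targetParser
  char '.'
  eof
  return $ Question inp tar
```

It should accept the same three test strings. Note that many space matches nothing after binary or octal input, because endBy1 already consumed that space, but still matches the single space after hex input, which is parsed with many1 rather than endBy1.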
