Pandoc filter in Lua to insert spans into existing strings

Question

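I am writing a Lua filter for pandoc that adds glossary functionality to the HTML output of a Markdown file. The goal is to add mouseover text to every occurrence of an acronym or key definition in the document.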

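I would like acronyms to be matched when they appear in lists (that is, surrounded by punctuation), but not when they are surrounded by letters (for example, CO should not be highlighted inside a word such as cobalt).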

My MWE fails on this count, because strings in the Pandoc AST include adjacent punctuation (e.g. Str "CO/DBP/SBP," or Str "CO,",Space,Str "SBP,").
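The mismatch can be reproduced in plain Lua, outside pandoc. The sketch below is illustrative only: `contains_term` is a hypothetical helper, and it assumes that glossary terms consist solely of ASCII letters and digits. It relies on Lua's frontier pattern `%f` to require a non-alphanumeric boundary on both sides of the term.

```lua
-- Hypothetical helper (not part of the original filter): reports whether
-- `term` occurs in `text` with no letter or digit directly on either side.
-- Assumes `term` contains only alphanumeric characters, so it needs no escaping.
local function contains_term(text, term)
  -- %f[%w] matches at a transition from non-alphanumeric to alphanumeric,
  -- %f[%W] at the opposite transition (including the end of the string).
  return text:find("%f[%w]" .. term .. "%f[%W]") ~= nil
end

-- Exact comparison, as in the MWE, misses a term with trailing punctuation.
print("CO," == "CO")                       -- false
print(contains_term("CO,", "CO"))          -- true
print(contains_term("CO/DBP/SBP,", "DBP")) -- true
print(contains_term("COBALT", "CO"))       -- false
```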

-- # MWE
-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

-- Substitute glossary term for span with a mouseover link
function Str(elem)
  for key, value in next, glossary do
    if elem.text == key then
      return pandoc.Span (key, {title = value, class = "glossary"})
    end
  end
end

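I have played around with string.sub and string.find but could not get anything workable, mainly because I am not sure how to return both the new Span and the Str (minus its new Span). Any help would be appreciated!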


My test Markdown contains:

# Acronyms: SBP, DBP & CO

Spaced acronyms: CO and SBP and DBP.

In a comma-separated list: CO, SBP, DBP; with backslashes; CO/DBP/SBP, and in bullet points:
  
* CO
* SBP
* DBP

Tags: lua, pandoc

Answer


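You can simply return a table containing multiple elements. My idea was to look for the first separator in the string and then replace the glossary entries with spans: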

-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

local Set = function(list)
    local set = {}
    for i,v in ipairs(list) do
        set[v] = true
    end
    return set
end

local findSeparator = function(text)
    local separator = Set{",", "/", " "}
    for i = 1, #text do
        local s = string.sub(text,i,i)
        if separator[s] then
            return s
        end
    end
end

local separatedList = function(text)
    local found
    local t = {}
    local separator = findSeparator(text)
    if not separator then return end
    for abb in string.gmatch(text, "%P+") do
        if glossary[abb] then
            found = true
            t[#t+1] = pandoc.Span(abb, {title = glossary[abb], class = "glossary"})
            t[#t+1] = pandoc.Str(separator)
        end
    end
    if found then
        -- remove the last separator if there is more than one element in the list,
        -- because otherwise the separator is part of the element and needs to stay
        if #t > 2 then t[#t] = nil end
        return t
    end
end

local glossarize = {
    Str = function(el)
        if glossary[el.text] then
            return pandoc.Span(el.text, {title = glossary[el.text], class = "glossary"})
        else
            return separatedList(el.text)
        end
    end
}

function Pandoc(doc)
    local div = pandoc.Div(doc.blocks)
    local blocks = pandoc.walk_block(div, glossarize).content
    return pandoc.Pandoc(blocks, doc.meta)
end
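
One limitation of the filter above is that the trailing comma in a string such as `CO/DBP/SBP,` is not carried over into the result, and strings such as `DBP;` or `DBP.` are left unmatched because `;` and `.` are not in the separator set. A possible standalone variation, sketched here under the assumption that glossary terms are purely ASCII alphanumeric, splits each string into alternating alphanumeric and non-alphanumeric runs and wraps only the runs that are glossary keys. `split_runs` is a hypothetical helper name, not part of pandoc's API; the attribute table passed to `pandoc.Span` follows the form used in the code above.

```lua
local glossary = {
  CO = "Cardiac Output",
  DBP = "Diastolic Blood Pressure",
  SBP = "Systolic Blood Pressure",
}

-- Hypothetical helper: split text into alternating runs of alphanumeric and
-- other characters, e.g. "CO/DBP," -> {"CO", "/", "DBP", ","}.
-- Every character ends up in exactly one run, so nothing is dropped.
local function split_runs(text)
  local runs = {}
  local pos = 1
  while pos <= #text do
    -- "^" anchors the match at `pos`; exactly one of the two patterns matches.
    local run = text:match("^%w+", pos) or text:match("^%W+", pos)
    runs[#runs + 1] = run
    pos = pos + #run
  end
  return runs
end

-- Pandoc filter function: replace a Str by a list of inlines in which every
-- glossary term becomes a Span and all other text stays a Str.
function Str(elem)
  local inlines, found = {}, false
  for _, run in ipairs(split_runs(elem.text)) do
    local definition = glossary[run]
    if definition then
      found = true
      inlines[#inlines + 1] = pandoc.Span(
        {pandoc.Str(run)}, {title = definition, class = "glossary"})
    else
      inlines[#inlines + 1] = pandoc.Str(run)
    end
  end
  -- Returning nil leaves the element unchanged.
  if found then return inlines end
end
```

Because a run must equal a glossary key exactly, CO inside a longer alphanumeric word is not wrapped. Saved under a file name of your choice, the sketch would be passed to pandoc with the `--lua-filter` option.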
