lua - Parsing and counting words in multi-line text in Lua
Problem description
Suppose I have a multi-line text:
str = [[
The lazy dog sleeping on the yard.
While a lazy old man smoking.
The yard never green again.
]]
I can split it into individual words with:
for w in str:gmatch("%S+") do print(w) end
But how can I get a result like this, for example:
The = 3 words, line 1,3
Lazy = 2 words, line 1,2
Dog = 1 word, line 1
..and so on?
Thanks
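Solution
You can detect \n the same way you are already picking out words: with gmatch. Use a pattern such as "[^\n]+" to iterate over the lines, then match the words inside each line. The code would look something like this: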
local str = [[
The lazy dog sleeping on the yard.
While a lazy old man smoking.
The yard never green again.
]]

local words = {}      -- word -> number of occurrences
local lines = {}      -- word -> list of line numbers it appears on
local line_count = 0

-- Outer loop: one iteration per non-empty line.
for l in str:gmatch("[^\n]+") do
    line_count = line_count + 1
    -- Inner loop: runs of characters that are neither whitespace nor punctuation.
    for w in l:gmatch("[^%s%p]+") do
        w = w:lower()
        words[w] = words[w] and words[w] + 1 or 1
        lines[w] = lines[w] or {}
        -- Record each line number only once per word.
        if lines[w][#lines[w]] ~= line_count then
            lines[w][#lines[w] + 1] = line_count
        end
    end
end

for w, count in pairs(words) do
    local the_lines = ""
    for _, line in ipairs(lines[w]) do
        the_lines = the_lines .. line .. ','
    end
    -- Prints e.g.: the = 3 words , lines 1,3,
    print(w .. " = " .. count .. " words , lines " .. the_lines)
end
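The full output is below. Note that I also changed the pattern you used to capture words to "[^%s%p]+".
I did this to strip the trailing . that would otherwise stay attached to smoking, again, and yard.
Because pairs does not guarantee any iteration order, the entries may appear in a different order when you run it.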
smoking = 1 words , lines 2,
while = 1 words , lines 2,
green = 1 words , lines 3,
never = 1 words , lines 3,
on = 1 words , lines 1,
lazy = 2 words , lines 1,2,
the = 3 words , lines 1,3,
again = 1 words , lines 3,
man = 1 words , lines 2,
yard = 2 words , lines 1,3,
dog = 1 words , lines 1,
old = 1 words , lines 2,
a = 1 words , lines 2,
sleeping = 1 words , lines 1,
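The output above comes in arbitrary order, ends every line list with a comma, and always says "words". In addition, "[^\n]+" skips blank lines, so the line numbers would drift if the text contained empty lines. Below is a minimal sketch of one way to address these points. It is not part of the original answer: the function name word_report and its return format are assumptions made for illustration. It splits lines with "(.-)\n" so that empty lines are still counted, sorts the words alphabetically, and joins the line numbers with table.concat.

```lua
-- Hypothetical helper (not from the original answer): returns a sorted list of
-- report strings such as "the = 3 words, line 1,3".
local function word_report(text)
  local counts, lines, order = {}, {}, {}
  local line_no = 0
  -- Appending "\n" makes sure the last line is seen even without a trailing
  -- newline; "(.-)\n" also yields empty lines, so numbering matches the text.
  for line in (text .. "\n"):gmatch("(.-)\n") do
    line_no = line_no + 1
    for w in line:gmatch("[^%s%p]+") do
      w = w:lower()
      if not counts[w] then
        counts[w] = 0
        lines[w] = {}
        order[#order + 1] = w
      end
      counts[w] = counts[w] + 1
      -- Record each line number only once per word.
      local seen = lines[w]
      if seen[#seen] ~= line_no then
        seen[#seen + 1] = line_no
      end
    end
  end
  table.sort(order) -- deterministic, alphabetical output
  local report = {}
  for i, w in ipairs(order) do
    local noun = counts[w] == 1 and "word" or "words"
    report[i] = ("%s = %d %s, line %s"):format(
      w, counts[w], noun, table.concat(lines[w], ","))
  end
  return report
end

local str = [[
The lazy dog sleeping on the yard.
While a lazy old man smoking.
The yard never green again.
]]

for _, entry in ipairs(word_report(str)) do
  print(entry)
end
```

With the sample text, the entry for the reads "the = 3 words, line 1,3" and the entry for dog reads "dog = 1 word, line 1". Words are still lowercased, as in the answer's code, so they will not match the capitalized form shown in the question.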