ruby - Process Jekyll content to replace first occurrence of any post title with a hyperlink of the post with that title
Problem description
What I'm trying to do
I am building a Jekyll Ruby plugin that replaces the first occurrence of any post title in a post's body text with a hyperlink to the URL of the post with that title.
The problems I'm having
I've gotten this to work, but I can't figure out two problems in the process_words method:
- How do I search for a post title only in the main body text of the post, and not in the meta tags before the post or the table of contents (which is also generated before the main body text)? I can't get this to work with Nokogiri, even though that seems to be the tool of choice here.
- If a post's URL is not at post.data['url'], where is it?
- Also, is there a more efficient, cleaner way to do this?
The current code works but will replace the first occurrence even if it's the value of an HTML attribute, like an anchor or a meta tag.
Example result
We have a blog with 3 posts:
- Hobbies
- Food
- Bicycles
And in the "Hobbies" post body text, we have a sentence with each word appearing in it for the first time in the post, like so:
I love mountain biking and bicycles in general.
The plugin would process that sentence and output it as:
I love mountain biking and <a href="https://example.com/link/to/bicycles/">bicycles</a> in general.
My current code (UPDATED 1)
# _plugins/hyperlink_first_word_occurance.rb
require "jekyll"
require 'uri'

module Jekyll
  # Replace the first occurrence of each post title in the content with the post's title hyperlink
  module HyperlinkFirstWordOccurance
    POST_CONTENT_CLASS = "page__content"
    BODY_START_TAG = "<body"
    ASIDE_START_TAG = "<aside"
    OPENING_BODY_TAG_REGEX = %r!<body(.*)>\s*!
    CLOSING_ASIDE_TAG_REGEX = %r!</aside(.*)>\s*!

    class << self
      # Public: Processes the content and turns the
      # first occurrence of each word that also has a post
      # of the same title into a hyperlink.
      #
      # content - the document or page to be processed.
      def process(content)
        @title = content.data['title']
        @posts = content.site.posts
        content.output = if content.output.include? BODY_START_TAG
                           process_html(content)
                         else
                           process_words(content.output)
                         end
      end

      # Public: Determines if the content should be processed.
      #
      # doc - the document being processed.
      def processable?(doc)
        (doc.is_a?(Jekyll::Page) || doc.write?) &&
          doc.output_ext == ".html" || (doc.permalink&.end_with?("/"))
      end

      private

      # Private: Processes html content which has a body opening tag.
      #
      # content - html to be processed.
      def process_html(content)
        content.output = if content.output.include? ASIDE_START_TAG
                           head, opener, tail = content.output.partition(CLOSING_ASIDE_TAG_REGEX)
                         else
                           head, opener, tail = content.output.partition(POST_CONTENT_CLASS)
                         end
        body_content, *rest = tail.partition("</body>")
        processed_markup = process_words(body_content)
        content.output = String.new(head) << opener << processed_markup << rest.join
      end

      # Private: Processes each word of the content and turns
      # the first occurrence of each word that also has a post
      # of the same title into a hyperlink.
      #
      # html - the html which includes all the content.
      def process_words(html)
        page_content = html
        @posts.docs.each do |post|
          post_title = post.data['title'] || post.name
          post_title_lowercase = post_title.downcase
          if post_title != @title
            if page_content.include?(" " + post_title_lowercase + " ") ||
               page_content.include?(post_title_lowercase + " ") ||
               page_content.include?(post_title_lowercase + ",") ||
               page_content.include?(post_title_lowercase + ".")
              page_content = page_content.sub(post_title_lowercase, "<a href=\"#{ post.url }\">#{ post_title.downcase }</a>")
            elsif page_content.include?(" " + post_title + " ") ||
                  page_content.include?(post_title + " ") ||
                  page_content.include?(post_title + ",") ||
                  page_content.include?(post_title + ".")
              page_content = page_content.sub(post_title, "<a href=\"#{ post.data['url'] }\">#{ post_title }</a>")
            end
          end
        end
        page_content
      end
    end
  end
end

Jekyll::Hooks.register %i[posts pages], :post_render do |doc|
  # code to call after Jekyll renders a post
  Jekyll::HyperlinkFirstWordOccurance.process(doc) if Jekyll::HyperlinkFirstWordOccurance.processable?(doc)
end
Update 1
Updated my code with @Keith Mifsud's advice. It now uses either the sidebar's aside element or the page__content class to select the body content to work on.
I also improved the checking and replacing of the correct term.
PS: The example code base I started from when working on my plugin was @Keith Mifsud's jekyll-target-blank plugin.
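Solution
This code looks familiar :) I suggest you look at the RSpec test file to test your issues: https://github.com/keithmifsud/jekyll-target-blank
I will try to answer your questions. Sorry, I couldn't test these myself at the time of writing.
"How to only search for a post title in the main content copy text of the post, and not the meta tags before the post or the table of contents (which is also generated before main post copy text)? I can't get this to work with Nokogiri, even though that seems to be the tool of choice here."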
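Your requirements are to:
1) Ignore the content outside the <body></body> tags. This already seems to be implemented in the process_html() method. That method only processes body_content, so it should work as is. Do you have tests for it? How are you debugging it? The same string splitting works in my plugin, i.e. only the content within the body is processed.
2) Ignore the content within the table of contents (TOC). I suggest you extend the process_html() method by splitting the body_content variable further. Search for the content between the TOC's opening and closing tags (by id, CSS class, etc.), exclude it, and then add it back in place before or after the string returned by process_words.
3) Whether to use Nokogiri. That library is great for parsing HTML, but I think you are parsing strings and then creating HTML, so vanilla Ruby and the URI library should suffice. You can still use it if you want, but it won't be any faster than splitting strings in Ruby.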
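Point 2 can be sketched with plain String#partition. This is only a minimal sketch under stated assumptions, not the plugin's actual code: the helper name process_after_toc and the `</nav>` closing marker are hypothetical (use whatever tag, id or class your theme emits around the TOC), and it assumes the TOC sits before the main body text, as the question describes.

```ruby
# Hypothetical marker: the tag that closes the TOC in your theme's output.
TOC_CLOSE = "</nav>"

# Leaves everything up to and including the TOC untouched and passes only
# the markup after it to the given block (e.g. a call to process_words).
def process_after_toc(body_content, toc_close: TOC_CLOSE)
  head, closer, tail = body_content.partition(toc_close)
  # String#partition returns [str, "", ""] when the separator is absent,
  # so an empty closer means there is no TOC and everything is processed.
  return yield(body_content) if closer.empty?

  head + closer + yield(tail)
end

html = '<nav class="toc"><a href="#bicycles">bicycles</a></nav><p>I love bicycles.</p>'
puts process_after_toc(html) { |markup| markup.sub("bicycles", "[bicycles]") }
```

Inside process_html(), the block would presumably be `{ |markup| process_words(markup) }`. If the page contains another `</nav>` before the TOC, a more specific marker is needed.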
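"If a post's URL is not at post.data['url'], where is it?"
I think you should have a method that gets all the post titles and then matches the "words" against that array. You can get the whole posts collection from the document itself via doc.site.posts, and then return the title for each post. The process_words() method can then check each word to see whether it matches an item in the array. But what if a title is made up of more than one word?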
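As a rough illustration of matching titles (including multi-word ones) against the text, here is a minimal sketch in plain Ruby. The Post struct is a stand-in invented for this example; in the plugin the title would come from post.data['title'] and the URL from post.url, which the question's first branch already calls. The helper name link_first_occurrences and the tag-splitting regex are also assumptions, and the regex is a simplification rather than a real HTML parser.

```ruby
# Stand-in for Jekyll posts (hypothetical); a plugin would read
# post.data['title'] and post.url from doc.site.posts.docs instead.
Post = Struct.new(:title, :url)

# Links the first occurrence of each post title that appears in text,
# skipping tags (and so attribute values) and text already inside <a>.
def link_first_occurrences(html, posts, current_title)
  posts.each do |post|
    next if post.title == current_title

    # \b keeps "food" from matching inside "seafood"; escaping makes
    # multi-word titles such as "Mountain Biking" work as one phrase.
    pattern = /\b#{Regexp.escape(post.title)}\b/i
    # The capture group makes split keep the tags as separate segments.
    segments = html.split(/(<[^>]*>)/)
    in_anchor = false
    segments.each_with_index do |segment, i|
      if segment.start_with?("<")
        in_anchor = true if segment.match?(/\A<a[\s>]/i)
        in_anchor = false if segment.match?(/\A<\/a\s*>/i)
        next
      end
      next if in_anchor || !segment.match?(pattern)

      segments[i] = segment.sub(pattern) { |m| %(<a href="#{post.url}">#{m}</a>) }
      break # only the first occurrence of this title is linked
    end
    html = segments.join
  end
  html
end

posts = [Post.new("Hobbies", "/hobbies/"), Post.new("Food", "/food/"),
         Post.new("Bicycles", "/bicycles/")]
page = '<p title="bicycles">I love mountain biking and bicycles in general.</p>'
puts link_first_occurrences(page, posts, "Hobbies")
# prints: <p title="bicycles">I love mountain biking and <a href="/bicycles/">bicycles</a> in general.</p>
```

Because the markup is re-split for every title, a link inserted for one post is treated as a tag when the next title is checked. Titles that end in non-word characters (for example "C++") would need a different boundary than \b.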
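"Also, is there a more efficient, cleaner way to do this?"
So far, so good. I would start by fixing the issues and then refactor for speed and coding standards.
Again, I suggest you use tests to help you with this.
Let me know if I can help more :)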