Issue with gsub method in my Ruby code when trying to replace HTML tags with the URL stripped from it

Problem description

Tags: html, ruby, html-parsing, nokogiri, gsub

Solution


Because Nokogiri::XML::Element is neither a String nor a Regexp, which is what gsub expects as its first argument. Adding .to_s works:

require 'nokogiri'

# message is the HTML string from the question; parse it once and reuse the link
link = Nokogiri::HTML.parse(message).at('a')
puts message.gsub(link.to_s, link['href'])

However, the gsub approach goes to all the trouble of parsing the HTML, only to search the raw string again as if you knew nothing about it. It also gives the wrong result if a message contains multiple links (at('a') returns only the first one), or if the anchor tag is not formatted exactly as Nokogiri serializes it, in which case the search string never matches. An extra space is enough, for example: <a href="https://www.google.com" >https://www.google.com</a>

Why not let Nokogiri do the work?

puts Nokogiri::HTML.fragment(message).tap { |doc|
  doc.css("a").each { |node|
    node.replace(node["href"])
  }
}.to_html

Note that I switched to Nokogiri::HTML.fragment, since the message is not a full HTML document; with Nokogiri::HTML.parse, Nokogiri would feel obligated to add the doctype and the rest of the document wrapper. Then, for each anchor node, replace it with the value of its href attribute.

