regex - How can I use a regex to find a specific word in an HTML page?
Problem description
I'm trying to extract a specific word (which may change) that comes after a fixed expression. I want to extract the name Taldor
from this markup:
<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Taldor</span>
</h4>
So far I am only able to find <h4 class="t-16 t-black t-normal">
using this regex:
(?<=<h4 class="t-16 t-black t-normal">).*
I would be glad for any advice.
Solution
I'd suggest using an HTML parsing library such as Jsoup in Java or Beautiful Soup in Python instead of a regex: HTML is not a regular language, so regex-based matching breaks easily when attributes, whitespace, or nesting change.
The following code does the job:
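import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String s = "<h4 class=\"t-16 t-black t-normal\">\r\n" +
        " <span class=\"visually-hidden\">Company Name</span>\r\n" +
        " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n" +
        " </h4>";
// Parse the HTML fragment into a DOM.
Document doc = Jsoup.parse(s);
// Print the text of the first element that carries the target class.
for (Element element : doc.getElementsByClass("pv-entity__secondary-title")) {
    System.out.println(element.text());
    break;
}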
This prints:
Taldor
In the worst case, if you are doing quick and dirty work, you can use the following regex as a temporary solution, but it is not recommended:
<span class="pv-entity__secondary-title">(.*?)<\/span>
Use this regex and read your data from capture group 1.
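As a rough illustration of the regex fallback, here is a minimal sketch using only java.util.regex from the standard library. The class name CompanyNameRegex and the helper extractCompanyName are hypothetical names chosen for this example, and the sketch assumes the span's opening tag is written exactly as in the question (same attribute, same quoting, no extra whitespace).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompanyNameRegex {
    // Same pattern as above; escaping the slash is optional in Java.
    // The lazy quantifier (.*?) stops at the first closing </span>.
    private static final Pattern TITLE = Pattern.compile(
            "<span class=\"pv-entity__secondary-title\">(.*?)</span>");

    // Hypothetical helper: returns the text of the first matching span,
    // or null when the page contains no such span.
    static String extractCompanyName(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<h4 class=\"t-16 t-black t-normal\">\r\n"
                + " <span class=\"visually-hidden\">Company Name</span>\r\n"
                + " <span class=\"pv-entity__secondary-title\">Taldor</span>\r\n"
                + " </h4>";
        System.out.println(extractCompanyName(html)); // prints Taldor
    }
}
```

Because . does not match line breaks by default, this only works while the company name sits on one line; compiling with Pattern.DOTALL would be needed if the span content could wrap, which is one more reason to prefer the parser-based approach.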