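javascript - Extract text labels from RSS feed XML (using JavaScript/React)
Question
I've just parsed an RSS feed (Upwork's), and the basic data points for each job item, such as the title and link, come through as separate fields (items.title, items.link). However, most of the job data I need to extract (category, skills, and so on) is dumped as one large block of text in the "content" field. In general, the heading for each piece of information I need is in a <b> tag, and the information itself is just a block of text followed by a <br /> tag.
Here is a sample from the XML (items.content):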
We are looking for a developer with capabilities as a Wordpress Frontend/Backend Developer or Full Stack Wordpress Developer. <br /><br /> It is important for us to have experience with hosting, SSL, and Pagebuilders (Elementor/Visual Composer).<br /><br /><b>Hourly Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC<br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website Development, API, Website Redesign, WordPress Plugin, Website Optimization, Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL, Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location Requirement</b>: Only freelancers located in the United States may apply. <br /><b>Country</b>: United States <br /><a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>
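For example, how can I extract the label "Hourly Range" and the data associated with it ($20.00-$45.00)? To add to the complexity, I would ideally be able to split each listed item (e.g. HTML, CSS) into its own separate data item.
I don't know how to read this text and extract the data I need in an organized way. Any help is appreciated!
Answer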
Anything in a DOM is a node. The labels are b element nodes, and their data is in the text node siblings that immediately follow them.
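// Parse the HTML fragment into a document so it can be queried like any DOM.
const snippet = (new DOMParser()).parseFromString(getHTML(), 'text/html');
const data = {};
for (const label of snippet.querySelectorAll('b')) {
  const name = normalizeSpace(label.textContent);
  // The value is the text node right after the label; drop the leading colon.
  const sibling = label.nextSibling;
  let value = normalizeSpace(
    (sibling ? sibling.textContent : '').replace(/^\s*:/, '')
  );
  if (name === 'Skills') {
    // Split the comma-separated list into individual items.
    value = value.split(/\s*,\s*/);
  }
  data[name] = value;
}
console.log(data);

// Collapse every run of whitespace (including single line breaks) to one space.
function normalizeSpace(value) {
  return value.replace(/\s+/g, ' ').trim();
}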
function getHTML(){
return `We are looking for a developer with capabilities as a Wordpress
Frontend/Backend Developer or Full Stack Wordpress Developer.
<br /><br /> It is important for us to have experience with hosting, SSL,
and Pagebuilders (Elementor/Visual Composer).<br /><br /><b>Hourly
Range</b>: $20.00-$45.00 <br /><b>Posted On</b>: December 16, 2020 23:12 UTC
<br /><b>Category</b>: Full Stack Development<br /><b>Skills</b>:Website
Development, API, Website Redesign, WordPress Plugin, Website Optimization,
Google Analytics, Java, JavaScript, PHP, Ruby, Scala, Kotlin, Python, SQL,
Very Small (1-9 employees), CSS, Website Security, HTML, Graphic Design, Web
Design, jQuery, Adobe Photoshop, Adobe Illustrator <br /><b>Location
Requirement</b>: Only freelancers located in the United States may apply.
<br /><b>Country</b>: United States <br />
<a href="https://www.upwork.com/jobs/Ongoing-Website-development-specialist_%7E018e7e903a64f4e78e?source=rss">click to apply</a>`;
}
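If DOMParser is not available (for example when the feed is processed in Node rather than in the browser), a similar result can probably be had without a DOM at all. The following is only a rough sketch that swaps the DOM traversal for a regular expression; parseLabeledFields and its listFields parameter are hypothetical names, and it assumes every label is a plain <b>...</b> tag whose value contains no nested tags. HTML entities such as &amp; are not decoded.

```javascript
// DOM-free sketch: pull "<b>Label</b>: value" pairs out of an HTML fragment.
// parseLabeledFields and listFields are hypothetical names for illustration.
function parseLabeledFields(html, listFields = ['Skills']) {
  // Collapse any run of whitespace to a single space and trim the ends.
  const collapse = (s) => s.replace(/\s+/g, ' ').trim();
  const data = {};
  // Capture the label text, then everything up to the next tag.
  const pattern = /<b>([\s\S]*?)<\/b>\s*:?([^<]*)/g;
  for (const [, rawName, rawValue] of html.matchAll(pattern)) {
    const name = collapse(rawName);
    const value = collapse(rawValue);
    // Fields named in listFields are split on commas into arrays.
    data[name] = listFields.includes(name)
      ? value.split(/\s*,\s*/).filter(Boolean)
      : value;
  }
  return data;
}

// Shortened sample in the same shape as items.content.
const sample =
  '<b>Hourly Range</b>: $20.00-$45.00 <br /><b>Category</b>: Full Stack Development' +
  '<br /><b>Skills</b>:HTML, CSS, jQuery <br /><b>Country</b>: United States <br />';

const fields = parseLabeledFields(sample);
console.log(fields);
```

For anything beyond this simple label/value layout, the DOM-based approach above is likely the more robust choice, since regular expressions cope poorly with nested or malformed markup.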