JavaScript regex to remove only the <html> tags

Question

I have a very long string made up of several HTML documents, like this:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info 
</head>
<body>
<div > some content with other HTML tags that I want to preserve </div>
</body>
</html>
<html>
<div> another content with other HTML tags that I want to preserve </div>
</html>
<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info 
</head>
<body>
<div> some other content with other HTML tags that I want to preserve </div>
</body>
</html>

I want to turn them into something like this:

<div > some content with other HTML tags that I want to preserve </div>
<div> another content with other HTML tags that I want to preserve </div>
<div> some other content with other HTML tags that I want to preserve </div>

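Basically, I'm looking for a regex that removes the <html> </html> tags (and not the other/inner HTML elements) from one huge HTML string. Note that I need to keep the HTML content and only get rid of the parent tags.

Thanks in advance.

(Note that I have searched extensively to make sure this is not a duplicate question.)

Tags: javascript, html, regex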

Answer


An important caveat first: https://stackoverflow.com/a/1732454/3498950

But if you must, I would probably use something like /<\/?html.*?>/g

const html = `<html xmlns:v="urn:schemas-microsoft-com:vml">
<head>head info</head>
<div>other content</div>
</html>`;

console.log(html.replace(/<\/?html.*?>/g, '').trim());

You can tweak the regex here: https://regex101.com/r/EeTv68/1
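
The snippet above only strips the <html> tags, while the desired output in the question also drops the <head> blocks and the <body> tags. A minimal sketch of one way to extend it follows. The helper name stripWrappers is hypothetical, and the sketch assumes that no attribute value contains a ">" character and that <head> blocks are not nested. It uses [^>]* instead of .*? so that a tag whose attributes span several lines is still matched (a plain "." does not match newlines).

```javascript
// Hypothetical helper, not part of the original answer: removes the wrapper
// tags and the <head> blocks but keeps everything else untouched.
function stripWrappers(input) {
  return input
    // Drop each <head>...</head> block together with its contents.
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, '')
    // Drop opening and closing <html> and <body> tags, keeping what is between them.
    .replace(/<\/?(?:html|body)\b[^>]*>/gi, '')
    // Trim each line and discard the empty lines left behind by the removals.
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '')
    .join('\n');
}

const input = `<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
some head info
</head>
<body>
<div> some content </div>
</body>
</html>
<html>
<div> another content </div>
</html>`;

const result = stripWrappers(input);
console.log(result);
```

The \b after the tag name keeps the pattern from matching longer names such as <header>. For anything less regular than this input, a real HTML parser is the safer choice, as the linked caveat points out.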

