python - Modify all local links in html file
Problem description
I want to change the local links in an HTML page like the one below:
//html
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>this is a simple text in html file</p>
<a href="https://google.com">Google</a>
<a href="/frontend/login/">Login</a>
<a href="/something/work/">Something</a>
</body>
</html>
//Result
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>this is a simple text in html file</p>
<a href="https://google.com">Google</a>
<a href="/more/frontend/login/part/">Login</a>
<a href="/more/something/work/extra/">Something</a>
</body>
</html>
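How can I turn the HTML above into the result and save it as an HTML file using Python?
Solution
I have solved it myself, but I think this can help a lot of people. That is why I am answering my own question and posting it publicly.
Thanks to Nicolas. His partial (30-50%) solution helped me a lot in reaching the complete one.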
import re

# Sample page from the question
test_str = ("<html>\n"
            " <head>\n"
            " <title>Hello</title>\n"
            " </head>\n"
            " <body>\n"
            " <p>this is a simple text in html file</p>\n"
            " <a href=\"https://google.com\">Google</a>\n"
            " <a href=\"/frontend/login/\">Login</a>\n"
            " <a href=\"/something/work/\">Something</a>\n"
            " </body>\n"
            " </html>")

# Step 1: insert "more/" after the leading slash of every local href.
# External links such as href="https://google.com" do not match href="/.
regex = r"href=\"\/"
subst = "href=\"/more/"
result = re.sub(regex, subst, test_str)

# Step 2: append "hello/" to the end of every local href path.
# Matching stops at the closing double quote. Note that [$-_] is a range
# (every character from "$" to "_"), not three literal characters.
regex2 = r"(href=\"/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)"
subst2 = r"\1hello/"
result2 = re.sub(regex2, subst2, result)

print(result2)

# Save the modified page as an HTML file
with open("solution.html", "w", encoding="utf-8") as output_file:
    output_file.write(result2)
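The two substitutions above can also be folded into a single pass. The sketch below is an alternative, not the original answer's method: it assumes every link is written as a double-quoted, root-relative href (href="/..."), and it skips protocol-relative URLs such as //cdn.example.com. The names rewrite_local_links and rewrite_file, and the prefix/suffix parameters, are hypothetical.

```python
import re

# Matches a double-quoted, root-relative href such as href="/frontend/login/".
# Absolute URLs (href="https://...") do not start with a slash, and the
# negative lookahead (?!/) skips protocol-relative URLs (href="//host/...").
LOCAL_HREF = re.compile(r'href="/(?!/)([^"]*)"')


def rewrite_local_links(html: str, prefix: str = "more/", suffix: str = "hello/") -> str:
    """Insert `prefix` after the leading slash and append `suffix` to every local href."""

    def repl(match: re.Match) -> str:
        path = match.group(1)  # e.g. "frontend/login/"
        return f'href="/{prefix}{path}{suffix}"'

    return LOCAL_HREF.sub(repl, html)


def rewrite_file(src_path: str, dst_path: str) -> None:
    """Read an HTML file, rewrite its local links, and save the result."""
    with open(src_path, encoding="utf-8") as f:
        html = f.read()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(rewrite_local_links(html))


if __name__ == "__main__":
    sample = '<a href="https://google.com">Google</a>\n<a href="/frontend/login/">Login</a>'
    print(rewrite_local_links(sample))
```

Using a callable replacement keeps the prefix and suffix logic in one place, so changing either only means passing different arguments. Regex-based rewriting still assumes well-behaved markup; single-quoted or unquoted href values are not handled.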
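Output: the external https://google.com link is left unchanged, while each local href gets /more/ inserted after the leading slash and hello/ appended to the end.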
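Note that the expected result in the question appends a different segment to each local link (part/ for the login page, extra/ for the other one), so a single fixed suffix cannot reproduce it exactly. A minimal sketch, assuming the suffix for each path is known in advance and can be listed in a lookup table; the SUFFIXES table and the rewrite_with_mapping function are hypothetical:

```python
import re

# Hypothetical per-path suffixes taken from the question's expected result.
SUFFIXES = {
    "frontend/login/": "part/",
    "something/work/": "extra/",
}

# Double-quoted, root-relative hrefs only (external and // links are skipped).
LOCAL_HREF = re.compile(r'href="/(?!/)([^"]*)"')


def rewrite_with_mapping(html: str, prefix: str = "more/", suffixes: dict = SUFFIXES) -> str:
    """Prefix every local href and append the suffix looked up for its path, if any."""

    def repl(match: re.Match) -> str:
        path = match.group(1)
        suffix = suffixes.get(path, "")  # paths not in the table only get the prefix
        return f'href="/{prefix}{path}{suffix}"'

    return LOCAL_HREF.sub(repl, html)


page = """<a href="https://google.com">Google</a>
<a href="/frontend/login/">Login</a>
<a href="/something/work/">Something</a>"""

print(rewrite_with_mapping(page))
```

With this table, the login link becomes /more/frontend/login/part/ and the other becomes /more/something/work/extra/, matching the result shown in the question.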