首页 > 解决方案 > Modify all local links in html file

问题描述

I want to change links from a html page like below:

//html
<html>
    <head>
        <title>Hello</title>
    </head>
    <body>
        <p>this is a simple text in html file</p>
        <a href="https://google.com">Google</a>
        <a href="/frontend/login/">Login</a>
        <a href="/something/work/">Something</a>
    </body>
 </html>



//Result
    <html>
        <head>
            <title>Hello</title>
        </head>
        <body>
            <p>this is a simple text in html file</p>
            <a href="https://google.com">Google</a>
            <a href="/more/frontend/login/part/">Login</a>
            <a href="/more/something/work/extra/">Something</a>
        </body>
     </html>

So how can I change html to result and save it as html using python ?

标签: pythonregexpython-3.xbeautifulsoup

解决方案


我已经自己解决了。但我认为这可以帮助很多人。这就是为什么我要回答我的问题并将其公开发布

谢谢尼古拉斯。_ 他的 30-50% 解决方案对我的完整解决方案帮助很大。

import re

regex = r"href=\"\/"

test_str = ("<html>\n"
    "    <head>\n"
    "        <title>Hello</title>\n"
    "    </head>\n"
    "    <body>\n"
    "        <p>this is a simple text in html file</p>\n"
    "        <a href=\"https://google.com\">Google</a>\n"
    "        <a href=\"/front-end/login/\">Login</a>\n"
    "        <a href=\"/something/work/\">Something</a>\n"
    "    </body>\n"
    " </html>")

subst = "href=\"/more/"

# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE)

subst2 = "\\1hello/"
regex2 = r"(href=\"/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)"
result2 = re.sub(regex2, subst2, result, 0, re.MULTILINE)

if result2:
    print (result2)

writtingtofile = open("solution.html","w")
writtingtofile.write(result2)
writtingtofile.close()

输出:

在此处输入图像描述


推荐阅读