I want to replace the HTML code with my own

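Problem description

I am using the lxml and BeautifulSoup libraries. My goal is to translate the text of specific tags within a full HTML page, and then replace the original text of those tags with the translated text.

I want to loop over a specific XPath so that each translated text is inserted into its matching tag, one after another, and then get back the whole HTML with the translated text in place.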

from bs4 import BeautifulSoup, NavigableString, Tag
import requests
import time
import pandas as pd
import translators as ts
import json
import numpy as np
import regex
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from lxml import html
import time
import lxml.html



#r=requests.get(input('Enter the URL of your HTML page:\n'))
r=requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup=BeautifulSoup(r.text, 'html.parser')
page=r.content
element = html.fromstring(page)




try:
    articles=[]
    for item in element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]'):  

        texts=item.text_content()
        #texts=texts.split('"',100)
        #articles.append(item.text_content())
        articles.append(texts)
        translated_articles=[]
        for text in articles:
            print(text)
            output=ts.google(text, from_language='en', to_language='ro')
            translated_articles.append(output)
            
            for i,z in zip(translated_articles,soup.find_all('p', attrs={'class':'text_obisnuit'})):
                var=z.string
                var.replace_with(var.replace(var, i))

    
    #print(soup)

except Exception as e:
    print(e)

I am not getting the whole text from this XPath:

element.xpath('//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class = "text_obisnuit"]')

The output I get:

Everything in Kevin Lomax's life changed after he was recruited by the most powerful law firm in the world, "Milton, Chadwick & Waters". Despite the fact that his mother was not agree, he accepted to provide his services of a professional lawyer to this company headed by none other than John Milton, a very powerful man with a very strange personality, which has aroused some suspicion since their first meeting.
If you saw the movie "The Devil's Advocate (1997)", perhaps you remember the end. Milton proposes to Kevin to take over his company, promising that he will have everything in the world, but with a single price - to sell his soul. But Kevin was hiding virtues that Milton did not believe that he has them.
AttributeError: 'NoneType' object has no attribute 'replace_with'

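I want to use the XPath above to extract the text of every p tag with class="text_obisnuit", translate it with the translators library, and get back the entire HTML with the translated text inside those p tags.

### Note: ###

There should be a loop that inserts the translated text into all of these tags; that is, each tag should end up with its own translated text.

I can't explain it any better. Could anyone please guide me?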

Tags: python, beautifulsoup, tags, lxml

Solution


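Do you need replace_with at all? Can't you simply set the tag's string/content to the translation?

Also, you are doing some unnecessary looping here. You need to fix your indentation: as written, the `for i, z` loop sits inside the two other loops, so it runs again for every paragraph and every translation, when it should sit at the second indentation level at most.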
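The AttributeError in the question most likely comes from `z.string` returning None: BeautifulSoup only sets `.string` when a tag has a single child, so a paragraph containing nested markup has no `.string` to call `replace_with` on. Here is a minimal offline sketch of that behavior; the HTML snippet is invented for illustration and is not taken from the real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; only the class name comes from the question.
html_doc = (
    '<div>'
    '<p class="text_obisnuit">Plain paragraph.</p>'
    '<p class="text_obisnuit">Mixed <em>content</em> paragraph.</p>'
    '</div>'
)
soup = BeautifulSoup(html_doc, "html.parser")
plain, mixed = soup.find_all("p", class_="text_obisnuit")

# .string is only available when the tag has exactly one child.
plain_string = plain.string   # the text node "Plain paragraph."
mixed_string = mixed.string   # None, because this <p> has three children

# get_text() (or .text) joins the text of all descendants instead.
mixed_text = mixed.get_text()

# Assigning to .string replaces every child of the tag with the new text.
mixed.string = "Replaced text."
result = str(mixed)
print(result)
```

Note that assigning to `.string` discards any inline markup inside the paragraph (the `<em>` above), which is usually acceptable when the whole paragraph is being replaced by its translation.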

Try this:

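import requests
from bs4 import BeautifulSoup
import translators as ts

r = requests.get('https://neculaifantanaru.com/en/qualities-of-a-leader-inner-integrity.html')
soup = BeautifulSoup(r.text, 'html.parser')

# Collect every <p class="text_obisnuit"> once, before the loop
articles = soup.find_all('p', {'class': 'text_obisnuit'})

try:
    for item in articles:
        # .text joins all nested text, so it works even when .string is None
        original_text = item.text
        # ts.google is the call used in the question; other versions of the
        # translators library may expose this function differently
        translated_output = ts.google(original_text, from_language='en', to_language='ro')
        print(item)

        # Assigning .string replaces the tag's contents with the translation
        item.string = translated_output
except Exception as e:
    print(e)

# To see that it was changed
for item in articles:
    print(item)

# The full page, with the translated paragraphs in place
translated_html = str(soup)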
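Since the question asks for a loop over an XPath, the same idea can also be sketched with lxml alone. This is an illustrative variant, not part of the original answer: `fake_translate` is a hypothetical stand-in for the real translator call so that the example runs offline, and the HTML is invented, reusing only the id and class names from the question.

```python
import lxml.html

def fake_translate(text):
    # Hypothetical placeholder for a real call such as ts.google(...).
    return text.upper()

# Made-up page that mimics the structure targeted by the question's XPath.
page = (
    '<html><body><div id="incadrare_text_mijloc_2"><div>'
    '<p class="text_obisnuit">First <b>bold</b> paragraph.</p>'
    '<p class="other">Untouched.</p>'
    '<p class="text_obisnuit">Second paragraph.</p>'
    '</div></div></body></html>'
)
tree = lxml.html.fromstring(page)
xpath = '//*[@id="incadrare_text_mijloc_2"]/div[1]//p[@class="text_obisnuit"]'

for p in tree.xpath(xpath):
    # text_content() returns the full text, including nested tags.
    translated = fake_translate(p.text_content())
    # Drop the nested elements, then store the translation as the only text.
    for child in list(p):
        p.remove(child)
    p.text = translated

# Serialize the modified tree back to an HTML string.
translated_html = lxml.html.tostring(tree, encoding="unicode")
print(translated_html)
```

Removing the children one by one, rather than calling `p.clear()`, keeps the paragraph's own attributes and trailing text intact, so the rest of the page is serialized unchanged.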
