Extracting two sentences from a paragraph tag with Python

Problem description

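From each paragraph tag I already extract the text in my local language into a list. How can I extract the translation and the meaning into separate lists?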

from bs4 import BeautifulSoup
import re


html = """
[<div class="excerpt">
 <p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
 </div>, <div class="excerpt">
 <p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
 </div>, <div class="excerpt">
 <p>A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew and you add water; you must be wiser than the cook. Meaning: Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.</p>
 </div>] 
       """

soup = BeautifulSoup(html,'html.parser')

yoruba = []
translation = []
meaning = []
for i in soup.find_all("div", "excerpt"):
    a = i.get_text(strip=True).split('Translation')[0].strip().replace('\xa0',' ')
    yoruba.append(a)

Tags: python, html, web-scraping, beautifulsoup

Solution


You can do this with a regular expression and a little string manipulation.

Try this code.

from bs4 import BeautifulSoup
import re

html = """
[<div class="excerpt">
 <p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
 </div>, <div class="excerpt">
 <p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
 </div>, <div class="excerpt">
 <p>A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew and you add water; you must be wiser than the cook. Meaning: Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.</p>
 </div>] 
       """

soup = BeautifulSoup(html,'html.parser')

yoruba = []
translation = []
meaning = []
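for i in soup.find_all("div", "excerpt"):
    for item in i.find_all('p'):
        # Drop the "Translation:" label; the translation is then the second
        # period-delimited sentence of the paragraph.
        data = re.sub(r'Translation:\s*', '', item.get_text(strip=True))
        translation.append(data.split('.')[1].strip())
        # Drop the "Meaning" label; the meaning is whatever follows the last
        # ':' (or the last ';' where the source has "Meaning;" instead).
        data1 = re.sub(r'Meaning\s*', '', data)
        if ':' in data1:
            meaning.append(data1.split(':')[-1].strip())
        elif ';' in data1:
            meaning.append(data1.split(';')[-1].strip())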

print(translation)
print(meaning)

Output: translation

['One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized', 'One does not dive under water without knowing how to swim', 'You are given some stew and you add water; you must be wiser than the cook']

Meaning

['Self-preservation is a compulsory project for all.', 'Never engage in a project for which you lack the requisite skills.', 'Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.']
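Splitting on '.' and ':' depends on the translation being a single sentence and on the meaning containing no colon. As a hedged alternative, and not part of the original answer, the sketch below captures all three parts with one regular expression. The names `PATTERN` and `split_proverb` are made up for illustration, and the pattern assumes every paragraph contains the labels "Translation" and "Meaning", each followed by ':' or ';'.

```python
import re

# Hypothetical helper, not from the original answer.
# Each label may be followed by ':' or ';' because the sample data
# contains "Meaning;" in one paragraph.
PATTERN = re.compile(
    r"^(?P<yoruba>.*?)\s*Translation\s*[:;]\s*"
    r"(?P<translation>.*?)\s*Meaning\s*[:;]\s*"
    r"(?P<meaning>.*)$",
    re.DOTALL,
)


def split_proverb(text):
    """Return (yoruba, translation, meaning), or None if a label is missing."""
    match = PATTERN.match(text.replace("\xa0", " ").strip())
    if match is None:
        return None
    parts = match.group("yoruba", "translation", "meaning")
    return tuple(part.strip() for part in parts)


if __name__ == "__main__":
    sample = (
        "A ki i fi ai-mo-we mookun. Translation: One does not dive under "
        "water without knowing how to swim. Meaning: Never engage in a "
        "project for which you lack the requisite skills."
    )
    print(split_proverb(sample))
```

Inside the loop above, one would call `split_proverb(item.get_text(strip=True))` and append the three parts to `yoruba`, `translation` and `meaning`. Unlike the `split('.')` approach, this keeps the trailing period on the translation and returns None for paragraphs that lack a label, so the caller can decide how to handle them.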
