Scraping the string inside double quotes from HTML using Python

Problem description

In this image, I only want to scrape the highlighted part.

I tried this:

from selenium import webdriver
from bs4 import BeautifulSoup
import re;
driver = webdriver.Chrome(executable_path=r"C:\Users\sandheep\OneDrive\desktop\chromedriver.exe")
driver.get("https://www.emojimeanings.net/list-smileys-people-whatsapp")
content = driver.page_source
soup = BeautifulSoup(content, "lxml")
for i in soup.find_all("td"):
    print(re.findall(r'"([^"]*)"', i))

This gave me the following error:

Traceback (most recent call last):
  File "e:\Pycharm\quotes_author_WebScraping.py", line 16, in <module>
    print(re.findall(r'"([^"]*)"', i))
  File "C:\Users\sandheep\AppData\Local\Programs\Python\Python39\lib\re.py", line 241, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

Thanks in advance!

Tags: python, web-scraping

Solution


Here is your solution:

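import requests
from bs4 import BeautifulSoup

# Fetch the page with requests instead of Selenium.
r = requests.get("https://www.emojimeanings.net/list-smileys-people-whatsapp")

soup = BeautifulSoup(r.text, "lxml")

# Only emojiDescription is filled below; the other two lists stay empty in this snippet.
emojiLink = []
emojiTitle = []
emojiDescription = []

for tableRow in soup.find_all("tr", attrs={"class": "ugc_emoji_tr"}):
    for tabledata in tableRow.findChildren("td"):
        # Only cells that carry an id attribute are used.
        if tabledata.has_attr("id"):
            # Keep the last line of the cell text and trim leading whitespace.
            k = tabledata.text.strip().split('\n')[-1]
            l = k.lstrip()
            emojiDescription.append(l)
print(emojiLink, emojiTitle, emojiDescription)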
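
As a side note on the error in the question: re.findall only accepts a str or bytes object, while soup.find_all("td") yields bs4 Tag objects, so passing the tag directly raises the TypeError shown in the traceback. Converting the tag with str() first avoids it. The sketch below reproduces this with the standard library only. FakeTag and the sample markup are hypothetical stand-ins for a real Tag and the real page, so treat it as an illustration rather than the exact behaviour on that site.

```python
import re

# Same pattern as in the question: capture whatever sits between double quotes.
PATTERN = r'"([^"]*)"'


class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: not a str, but str() gives its markup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


# Hypothetical table cell; the real page's markup may differ.
cell = FakeTag('<td id="grinning-face" class="emoji">grinning face</td>')

# Passing the object itself fails, as in the question's traceback.
error = None
try:
    re.findall(PATTERN, cell)
except TypeError as exc:
    error = str(exc)

# Converting to a string first lets the regex run over the markup.
quoted = re.findall(PATTERN, str(cell))

print(error)
print(quoted)
```

In the question's loop, the equivalent change would be re.findall(r'"([^"]*)"', str(i)). Note that this returns every quoted attribute value in the cell, so some filtering may still be needed to keep only the highlighted one.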
