Finding an element in an HTML page by id from a variable with Python

Question

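Hi all, I wrote some code that extracts data from my website by looking up an element id. I read the post id from my database and store it in a variable, but the lookup only returns None!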

Here is the HTML. I get the post id from the database as an int, 18448, but I need to build the actual format used by the HTML id, post-18448:

<article class="post-18448 post type-post status-publish format-standard hentry category-tv-shows" id="post-18448">

And the Python code is:

import mysql.connector
from bs4 import BeautifulSoup as wsoup
from urllib.request import urlopen as wreq


lucas_db = mysql.connector.connect(
    host='localhost',
    user="root",
    password="xxxxxxxxxxx",
    database="Lucas_database")

mycursor_mov = lucas_db.cursor()
mycursor_mov.execute(
    "SELECT Post_ID FROM Lucas_t_db WHERE Post_ID IS NOT NULL AND Post_status IS NOT NULL ORDER BY Published_Time ASC ") #AND Post_ID IS NOT NULL AND Post_status IS NULL ")
myresult_mov = mycursor_mov.fetchall()

myresult_mov = [a[0] for a in myresult_mov]

print("DB post id query:",myresult_mov[-1:])

id_value = myresult_mov[-1:]

me = str(id_value[0])

print("none braket post id",me)

z = '"post-'+me+'"'
print("true fromat id: ",z)

url = "http://ezddl.com/"

url_req = wreq(url)
page_read = url_req.read()
url_req.close()

page_soup = wsoup(page_read, "html.parser")

Entry = page_soup.main.find('article',{"id":z})

print("extracted data",Entry)

The output of the code is:

DB post id query: [18448]
none braket post id: 18448
true fromat id:  "post-18448"
extracted data:  None

***Repl Closed***

But when I set the variable z directly, like z="post-18448", the code returns the correct result:

*same code as above*


z ="post-18448"

Entry = page_soup.main.find('article',{"id":z})

print("extracted data: ",Entry)

Output of the new code:

DB post id query: [18448]
none braket post id: 18448
true fromat id:  "post-18448"
extracted data:  <article class="post-18448 post type-post status-publish format-standard hentry category-tv-shows" id="post-18448">

***Repl Closed***

I don't understand why the first version returns None while the second one returns the correct result.

Tags: python, beautifulsoup

Solution


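Your element's id is not "post-18448", it is post-18448. In your first example you add literal double quotes to the string you are matching against, so the value passed to find() contains two quote characters that the id attribute does not have. When you define the string by hand, you leave them out.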
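To see the difference concretely, here is a minimal sketch that compares the two strings using only the standard library. The value 18448 is hard-coded as a stand-in for the id read from the database, and the names quoted and plain are hypothetical, introduced only for illustration.

```python
# Stand-in for the post id fetched from the database (assumption: it is 18448).
me = str(18448)

# What the question's code builds: the double quotes become part of the string.
quoted = '"post-' + me + '"'

# What the HTML id attribute actually contains: no quote characters.
plain = 'post-' + me

# repr() makes the embedded quote characters visible.
print(repr(quoted))
print(repr(plain))

# The quoted version is two characters longer, so the strings never compare equal.
print(len(quoted), len(plain))
print(quoted == plain)
```

In the HTML source, the quotes around an attribute value are markup syntax. The parser strips them, so the stored id is the bare text between them.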

Remove the double quotes from your assignment to z:

z = 'post-'+me
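The fix can be checked without a database or a network request. The sketch below parses a small inline HTML string that stands in for the downloaded page. It assumes the real page has a similar main/article structure, and find_post is a hypothetical helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the downloaded page (assumption: the real page
# nests the <article> inside <main>, as the question's code implies).
html = '''
<main>
  <article class="post-18448 post type-post" id="post-18448">Example post</article>
</main>
'''


def find_post(soup, post_id):
    """Return the <article> whose id is 'post-<post_id>', or None if absent."""
    element_id = f"post-{post_id}"  # built without literal quote characters
    return soup.find("article", id=element_id)


soup = BeautifulSoup(html, "html.parser")

# Matches: the id string is exactly what the attribute contains.
print(find_post(soup, 18448))

# Does not match: the extra quote characters are not part of the attribute value.
print(soup.find("article", id='"post-18448"'))
```

Building the id in one place with an f-string also removes the need for a separate str() conversion of the integer that comes back from the database.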
