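python - BS4: Parse HTML, store the parsed elements, and send them as a text message only when new information is posted on the web page
Problem description
I am currently having trouble with the elems variable below. Essentially, I am trying to write a script that scrapes the web page below and sends a text message containing the parsed HTML value "v". It currently does that, but I want it to pick up the new data and send it whenever the web page is updated (eventually I will add code to make it run once a day). To do this, I tried to break up the elems string by splitting it at the "]" that ends each paragraph, then build a list and use list[0]. However, when I run str(elems) it only returns '[]'. I am struggling to get this code to send the most recently added paragraph.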
import twilio
from twilio.rest import Client
import json
import bs4
import requests
from pprint import pprint
data = json.loads(open('secret.json', 'r').read())
# secret.json password storage
def get_elems_from_document(document):
    pass
res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
for i in range(1, 100): # attempting to grab the most recent added paragraph
    elems = soup.select('body > div > div > center > table > tr > td:nth-of-type(2) > p:nth-of-type({})'
                        .format(i))
    if '—' in str(elems):
        v = elems[0].text
        #print("{}th element: ".format(i))
        #pprint(elems)
# trying to take the elems variable, turn into string and split each paragraph up, then return the first in the list
x = str(elems)
y = x.split(']')
f = y[0]
# adding a set
accountSID = data['sid']
authToken = data['authToken']
twilioCli = Client(accountSID, authToken)
myTwilioNumber = data['twilioNumber']
myCellPhone = data['myNumber']
message = twilioCli.messages.create(body = 'Warning: Shark sighting off the coast of ' + v + 'Beach !', from_=myTwilioNumber, to=myCellPhone)
Solution
You can use this script to get most of the news items (the most recent one is stored in news[0]):
import json
import bs4
import requests
res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
news = [p.text.strip() for p in soup.select('h1 ~ p') if p.find('font')]
for p in news:
    print(p)
    print('-' * 80)
# most recent news is stored in news[0]
This prints:
Ventura — On July 10, 2018 Andy Kastenberg reported the following; "I took a paddle from North of Emma Wood State Beach in Ventura to the South part of the inner reef at about 1:00 PM PST. A small South swell mixed with some wind swell was showing. The West wind was just starting to puff but the water texture was still pretty glassy. Outside air temperature was an estimated 80+ degrees Fahrenheit and the water seemed to be nearing 70 degrees Fahrenheit. After catching a wave or two, a 6 foot shark appeared in the face of a set wave (four foot face). Without expertise, my guess is that it was a young Great White Shark. Another guy in the water said that he had seen one in the area for several days prior as did another surfer back up the beach where I had parked." Please report any shark sighting, encounter, or attack to the Shark Research Committee.
--------------------------------------------------------------------------------
Goleta — On July 2, 2018 Aaron Lauer reported the following; "I was working off Goleta on platform Holly, about 2 miles from the Santa Barbara Coast at Coal Oil Point. The platform is in 211 feet of water. I sighted a White Shark, approximately 12 feet long, a dark grey body and a white belly with a dorsal fin about 18 inches high. There was also a small white tip on the tail fin. It circled the platform slowly once and then headed off to the South, following the coast toward Santa Barbara. A consensus of opinions by myself and co-workers estimated the weight to be in excess of 400 pounds. A number of seals reside on the platform which might be the reason the shark was attracted to it. None of the seals were interested in leaving the platform during this time." Please report any shark sighting, encounter, or attack to the Shark Research Committee.
--------------------------------------------------------------------------------
Oceanside — On June 25, 2018 Julie Wolfe was paddling her outrigger canoe 2 miles due West of Oceanside Harbor entrance. It was 6:00 PM and she had been on the water 25 – 30 minutes. The late afternoon sky was clear with an estimated temperature of 70 degrees Fahrenheit. The ocean was calm with an estimated temperature of 68 degrees Fahrenheit and a mild breeze from the West creating a bump to the sea surface. No marine mammals were observed in the area. Wolfe reported; "I was paddling by myself when my canoe was hit HARD from underneath. I immediately turned around and paddled as fast as I could toward shore. I never saw the shark and wasn't sure if it was following me or not until about a minute later it tugged at my paddle! I made it into the harbor safe but my carbon fiber canoe has bite marks through and through . My canoe took on water. Terrifying two mile sprint in!" 'Interspace' measurements of the tooth impressions in her outrigger canoe suggest a White Shark 11 – 12 feet in length. This is the first confirmed unprovoked shark attack reported in 2018 from the Pacific Coast of North America. Please report any shark sighting, encounter, or attack to the Shark Research Committee.
--------------------------------------------------------------------------------
...and so on
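The script above only scrapes the page; it does not yet cover the "send only when something new appears" part of the question. One possible way to do that is to record the paragraphs seen on the previous run and compare against them. The following is a minimal sketch, not part of the original answer. It assumes news is the list built by the script above (newest item first), and the file name seen_news.json and all helper names (load_seen, find_new, save_seen, location_of, check_for_updates) are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file name; any writable location works.
SEEN_FILE = Path('seen_news.json')


def load_seen(path=SEEN_FILE):
    """Return the paragraphs recorded on a previous run (empty on the first run)."""
    if path.exists():
        return json.loads(path.read_text(encoding='utf-8'))
    return []


def find_new(news, seen):
    """Return the paragraphs in news that were not recorded before, in page order."""
    seen_set = set(seen)
    return [item for item in news if item not in seen_set]


def save_seen(news, path=SEEN_FILE):
    """Record the current paragraphs so the next run can compare against them."""
    path.write_text(json.dumps(news, ensure_ascii=False), encoding='utf-8')


def location_of(item):
    """Return the text before the first em dash (written as an escape), e.g. 'Ventura'."""
    return item.split('\u2014', 1)[0].strip()


def check_for_updates(news, path=SEEN_FILE):
    """Return only the paragraphs that are new since the last run, then update the record."""
    fresh = find_new(news, load_seen(path))
    save_seen(news, path)
    return fresh
```

After scraping, calling check_for_updates(news) would return just the new paragraphs, and location_of(item) could supply the place name for the text message body instead of the whole paragraph. Note that on the very first run every paragraph counts as new, so it may be worth running the script once without sending messages to seed the file.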