BS4: Parse HTML, store the parsed elements, and send them as a text only when new information is posted on the web page

Problem description

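I am currently having trouble with the elems variable below. Essentially, I am trying to write a script that scrapes the web page below and sends a text message containing the parsed HTML value "v". It currently does that, but I want the script to pick up new data whenever the page is updated and send it (eventually I will add code so that it runs once a day). To do that, I tried to break up the elems string by splitting it at the "]" that ends each paragraph, building a list, and reading list[0], but when I run str(elems) it only returns '[]'. I am having a hard time getting this code to send the most recently added paragraph.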

import twilio
from twilio.rest import Client
import json
import bs4
import requests
from pprint import pprint

data = json.loads(open('secret.json', 'r').read())
# secret.json password storage

def get_elems_from_document(document):
    pass

res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

for i in range(1, 100): # attempting to grab the most recent added paragraph 

    elems = soup.select('body > div > div > center > table > tr > td:nth-of-type(2) > p:nth-of-type({})'
    .format(i))

    if '—' in str(elems):
        v = elems[0].text

        #print("{}th element: ".format(i))
        #pprint(elems)

# trying to take the elems variable, turn into string and split each paragraph up, then return the first in the list
x = str(elems)
y = x.split(']')
f = y[0]

# adding a set 



accountSID = data['sid']
authToken = data['authToken']
twilioCli = Client(accountSID, authToken)

myTwilioNumber = data['twilioNumber']
myCellPhone = data['myNumber']

message = twilioCli.messages.create(body = 'Warning: Shark sighting off the coast of ' + v + 'Beach !', from_=myTwilioNumber, to=myCellPhone)

Tags: python, beautifulsoup

Solution
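One likely reason str(elems) ends up as '[]' in the question's code: elems is overwritten on every pass of the loop, so after the loop it holds only the result for the last index, which matches nothing. The sketch below reproduces this on a tiny stand-in document (the markup is hypothetical, not the real page) and collects the matching paragraphs in a list instead.

```python
import bs4

# Tiny stand-in for the real page (hypothetical markup with two news paragraphs).
html = '<div><p>Ventura \u2014 first</p><p>Goleta \u2014 second</p></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

matches = []
for i in range(1, 5):
    # elems is reassigned on every iteration, as in the question's loop.
    elems = soup.select('div > p:nth-of-type({})'.format(i))
    # Keep the text while it is available instead of relying on elems later.
    if elems and '\u2014' in elems[0].text:
        matches.append(elems[0].text)

# After the loop, elems is the result for i == 4, which matched nothing.
print(str(elems))
# matches still holds every paragraph that contained an em dash.
print(matches)
```

Here '\u2014' is the em dash that the question's code looks for in each paragraph.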


You can use this script to get most of the news items (the most recent one is stored in news[0]):

import json
import bs4
import requests

res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

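# 'h1 ~ p' selects every <p> that is a later sibling of an <h1>;
# p.find('font') keeps only the paragraphs that contain a <font> tag.
news = [p.text.strip() for p in soup.select('h1 ~ p') if p.find('font')]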

for p in news:
    print(p)
    print('-' * 80)

# most recent news is stored in news[0]

This prints:

Ventura   —    On July 10, 2018 Andy Kastenberg reported the following; "I took a paddle from North  of Emma Wood State Beach in Ventura to the South part of the inner reef at  about 1:00 PM PST. A small South swell mixed with some wind swell was showing.  The West wind was just starting to puff but the water texture was still pretty  glassy. Outside air temperature was an estimated 80+ degrees Fahrenheit and the water  seemed to be nearing 70 degrees Fahrenheit. After catching a wave or two, a 6  foot shark appeared in the face of a set wave (four foot face). Without  expertise, my guess is that it was a young Great White Shark. Another guy  in the water said that he had seen one in the area for several days prior as  did another surfer back up the beach where I had parked." Please report any shark sighting, encounter, or  attack to the Shark Research Committee.
--------------------------------------------------------------------------------
Goleta   —    On July 2, 2018 Aaron Lauer reported the following; "I was working off Goleta on platform Holly,  about 2 miles from the Santa Barbara Coast at Coal Oil Point. The platform is in 211 feet of  water.  I sighted a White Shark, approximately 12 feet long, a dark grey  body and a white belly with a dorsal fin about 18 inches high.  There was also a small white tip on the tail fin. It circled the platform slowly once and then  headed off to the South, following the coast toward Santa Barbara. A consensus  of opinions by myself and co-workers estimated the weight to be in excess of  400 pounds. A number of seals reside on the platform which might be the reason  the shark was attracted to it. None of the seals were interested in leaving the  platform during this time." Please report any shark sighting,  encounter, or attack to the Shark Research Committee.
--------------------------------------------------------------------------------
Oceanside   —    On  June 25, 2018 Julie Wolfe was paddling her outrigger canoe 2 miles due West of  Oceanside Harbor entrance. It was 6:00 PM and she had been on the water 25   – 30  minutes. The late afternoon sky was clear with an estimated temperature of 70  degrees Fahrenheit. The ocean was calm with an estimated temperature of 68  degrees Fahrenheit and a mild breeze from the West creating a bump to the sea  surface. No marine mammals were observed in the area. Wolfe reported; "I was paddling by myself when my canoe was hit HARD  from underneath. I immediately  turned around and paddled as fast as I could toward shore. I never saw the shark and wasn't sure if it was  following me or not until about a minute  later it tugged at my paddle! I made it into the harbor safe but my carbon fiber canoe has bite marks through and  through . My canoe took on water. Terrifying  two mile sprint in!" 'Interspace'  measurements of the tooth impressions in her outrigger canoe suggest a White  Shark 11   – 12 feet in length. This is the first confirmed  unprovoked shark attack reported in 2018 from the Pacific Coast of North  America. Please report any shark sighting, encounter, or attack to the Shark  Research Committee.
--------------------------------------------------------------------------------

...and so on
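To send a text only when something new appears, one option is to remember which items have already been sent between runs. The following is a minimal sketch under some assumptions, not part of the original answer: the state file name seen_news.json, the helper names, and the send callable are all hypothetical, and news is assumed to be the newest-first list built by the script above.

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = Path('seen_news.json')  # hypothetical file name for the saved state


def fingerprint(item):
    """Hash an item after collapsing whitespace, so spacing changes are ignored."""
    normalized = ' '.join(item.split())
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()


def load_seen(path=STATE_FILE):
    """Return the set of fingerprints saved by earlier runs (empty on first run)."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding='utf-8')))


def save_seen(seen, path=STATE_FILE):
    """Write the set of fingerprints back to disk as a JSON list."""
    path.write_text(json.dumps(sorted(seen)), encoding='utf-8')


def location_of(item):
    """Return the text before the first em dash, e.g. 'Ventura'."""
    return item.split('\u2014', 1)[0].strip()


def notify_new(news, send, path=STATE_FILE):
    """Call send(body) once per unseen item, oldest first, and record each one."""
    seen = load_seen(path)
    fresh = [item for item in news if fingerprint(item) not in seen]
    for item in reversed(fresh):  # news is newest first, so reverse it
        send('Warning: shark sighting reported near ' + location_of(item))
        seen.add(fingerprint(item))
    save_seen(seen, path)
    return fresh
```

In the question's script, send could be a small function that wraps twilioCli.messages.create(body=..., from_=myTwilioNumber, to=myCellPhone). Note that on the very first run every item counts as new, so it may be worth seeding the state file once with a send function that does nothing.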
