BeautifulSoup: Python: trying to extract data from multiple rows

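Problem description

I am very new to programming. I am trying to write a program that scrapes the time the moon sets in my area (Tampa) and displays it when I enter a date.

Here is my code: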

from bs4 import BeautifulSoup
import urllib.request
def GetMoonSet():
  # setup the source
  with urllib.request.urlopen("https://www.timeanddate.com/moon/usa/tampa") as url:
    req = url.read()

  soup = BeautifulSoup(req, "html.parser")
  the_rows = soup('table', {'id': "tb-7dmn"})[0].tbody('tr')

  day1 = the_rows[0].findChildren('td')
  day2 = the_rows[1].findChildren('td')
  day3 = the_rows[2].findChildren('td')
  day4 = the_rows[3].findChildren('td')
  day5 = the_rows[4].findChildren('td')
  day6 = the_rows[5].findChildren('td')
  day7 = the_rows[6].findChildren('td')
  day8 = the_rows[7].findChildren('td')
  day9 = the_rows[8].findChildren('td')
  day10 = the_rows[9].findChildren('td')
  day11 = the_rows[10].findChildren('td')
  day12 = the_rows[11].findChildren('td')
  day13 = the_rows[12].findChildren('td')
  day14 = the_rows[13].findChildren('td')
  day15 = the_rows[14].findChildren('td')
  day16 = the_rows[15].findChildren('td')
  day17 = the_rows[16].findChildren('td')
  day18 = the_rows[17].findChildren('td')
  day19 = the_rows[18].findChildren('td')
  day20 = the_rows[19].findChildren('td')
  day21 = the_rows[20].findChildren('td')
  day22 = the_rows[21].findChildren('td')
  day23 = the_rows[22].findChildren('td')
  day24 = the_rows[23].findChildren('td')
  day25 = the_rows[24].findChildren('td')
  day26 = the_rows[25].findChildren('td')
  day27 = the_rows[26].findChildren('td')
  day28 = the_rows[27].findChildren('td')
  day29 = the_rows[28].findChildren('td')
  day30 = the_rows[29].findChildren('td')

  what_date = input("Please enter a date for this month ")

  if what_date == "1":
    print("The moon will set at " + day1[1].text)
  elif what_date == "2":
    print("The moon will set at " + day2[1].text)
  elif what_date == "3":
    print("The moon will set at " + day3[1].text)
  elif what_date == "4":
    print("The moon will set at " + day4[1].text)
  elif what_date == "5":
    print("The moon will set at " + day5[1].text)
  elif what_date == "6":
    print("The moon will set at " + day6[1].text)
  elif what_date == "7":
    print("The moon will set at " + day7[1].text)
  elif what_date == "8":
    print("The moon will set at " + day8[1].text)
  elif what_date == "9":
    print("The moon will set at " + day9[1].text)
  elif what_date == "10":
    print("The moon will set at " + day10[1].text)
  elif what_date == "11":
    print("The moon will set at " + day11[1].text)
  elif what_date == "12":
    print("The moon will set at " + day12[1].text)
  elif what_date == "13":
    print("The moon will set at " + day13[1].text)
  elif what_date == "14":
    print("The moon will set at " + day14[1].text)
  elif what_date == "15":
    print("The moon will set at " + day15[1].text)
  elif what_date == "16":
    print("The moon will set at " + day16[1].text)
  elif what_date == "17":
    print("The moon will set at " + day17[1].text)
  elif what_date == "18":
    print("The moon will set at " + day18[1].text)
  elif what_date == "19":
    print("The moon will set at " + day19[1].text)
  elif what_date == "20":
    print("The moon will set at " + day20[1].text)
  elif what_date == "21":
    print("The moon will set at " + day21[1].text)
  elif what_date == "22":
    print("The moon will set at " + day22[1].text)
  elif what_date == "23":
    print("The moon will set at " + day23[1].text)
  elif what_date == "24":
    print("The moon will set at " + day24[1].text)
  elif what_date == "25":
    print("The moon will set at " + day25[1].text)
  elif what_date == "26":
    print("The moon will set at " + day26[1].text)
  elif what_date == "27":
    print("The moon will set at " + day27[1].text)
  elif what_date == "28":
    print("The moon will set at " + day28[1].text)
  elif what_date == "29":
    print("The moon will set at " + day29[1].text)
  elif what_date == "30":
    print("The moon will set at " + day30[1].text)
  else:
    print("Please enter a different number (e.g. 4, 5, 28, 30)")

GetMoonSet()

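I'm sure it doesn't look great, but I'm having trouble extracting the data. From day 4 to day 17, the moonrise appears in the first column. When I ask for the data, that extra entry shifts the result over by one column. I know I could change days 4-17 to day4[2].text, but the affected days will be different next month and the code would stop working.

When I enter 2, it prints: The moon will set at 10:22 am

When I enter 4, it prints: The moon will set at ↑ (99°)

Am I making this harder than it needs to be? Is there a way to extract only the moonset times with find_all?

Thanks!

Tags: python, beautifulsoup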

Solution


That table looks like it was built not to be parsed! It looks like the title attribute may be the key you need:

import re
# Match cells by their title text rather than by column position.
for i in soup.table.tbody.find_all(class_="pdr0", title=re.compile("^The Moon sets ")):
  print(i.get_text())
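
To see why filtering on the title helps, here is a minimal, self-contained sketch. The HTML fragment, its title wording, and the times are made up for illustration and only imitate the structure described in the question (moonset in the first column on one day, in the second column on another); the helper name moonset_times is hypothetical too.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: a tiny stand-in for the real table, with invented times.
SAMPLE_HTML = """
<table id="tb-7dmn">
  <tbody>
    <tr data-day="2">
      <th>2</th>
      <td class="pdr0" title="The Moon sets in the west at 10:22 am">10:22 am</td>
      <td class="pdr0" title="The Moon rises in the east at 9:05 pm">9:05 pm</td>
    </tr>
    <tr data-day="4">
      <th>4</th>
      <td class="pdr0" title="The Moon rises in the east at 12:10 am">12:10 am</td>
      <td class="pdr0" title="The Moon sets in the west at 11:48 am">11:48 am</td>
    </tr>
  </tbody>
</table>
"""


def moonset_times(html):
    """Return the text of every cell whose title starts with 'The Moon sets '."""
    soup = BeautifulSoup(html, "html.parser")
    sets_title = re.compile(r"^The Moon sets ")
    # Both filters must match: the CSS class and the regex on the title attribute.
    cells = soup.table.tbody.find_all(class_="pdr0", title=sets_title)
    return [cell.get_text(strip=True) for cell in cells]


print(moonset_times(SAMPLE_HTML))
```

In this sample the moonset cell is found on both days even though it sits in a different column on day 4, because the match depends on the title and not on the cell's position.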

And, to make what you were attempting more compact:

import re

# Map each day of the month (as a string) to its moonset time.
msets = {}
title = re.compile("^The Moon sets ")
for row in soup.table.tbody.find_all('tr'):
  day = row.get('data-day')     # None if the row has no data-day attribute
  mset = row.find(title=title)  # None if the row has no moonset cell
  if day and mset:
    msets[day] = mset.get_text()

what_date = input("Please enter a date for this month: ")
if what_date in msets:
  print("the moon will set at " + msets[what_date])
else:
  print("i don't know about that date.")

As a rule of thumb when programming: if you find yourself repeating the same thing over and over, you probably need a loop.
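
As a rough illustration of that rule of thumb, the sketch below replaces the thirty dayN variables and the long if/elif chain with one list and one index lookup. It involves no scraping: the times are invented placeholders, and the function name describe_moonset is hypothetical.

```python
# Made-up moonset times standing in for scraped values; index 0 is day 1.
moonsets = ["9:41 am", "10:22 am", "11:03 am"]


def describe_moonset(what_date, times):
    """Return the message for the day the user typed."""
    try:
        day = int(what_date)
    except ValueError:
        return "Please enter a whole number."
    # One range check replaces the whole if/elif chain.
    if 1 <= day <= len(times):
        return "The moon will set at " + times[day - 1]
    return "Please enter a different number (1 to " + str(len(times)) + ")"


for typed in ["2", "7", "abc"]:
    print(describe_moonset(typed, moonsets))
```

Because the lookup is driven by the length of the list, the same code would handle a month of 28, 30, or 31 days without any edits.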

