How to slice scraped XML data and create lists in Python

Problem description

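I'm looking for a faster way to build the "times" and "temps" lists shown below. As you can see, I had to write out each list item by item, because I don't know how to slice the list or write a for loop to build them.

For example, using:

time = f_soup.forecast.select('time')[0:]

time[0]['from']    # works to select a single item, but
time[0:3]['from']  # causes "TypeError: list indices must be integers or slices, not str"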

The data is a web scrape of the 5-day forecast for Edmonton from https://openweathermap.org/forecast5.

I've included everything else below:

import requests
from bs4 import BeautifulSoup
from datetime import datetime

# f = forecast
f = requests.get('https://api.openweathermap.org/data/2.5/forecast?q=Edmonton&mode=xml&units=metric&appid=########################')
f_text = f.text

f_soup = BeautifulSoup(f_text, "lxml-xml")
print(f_soup.prettify())

time = f_soup.forecast.select('time')[0:]
times = [
    datetime.strptime(time[0]['from'], '%Y-%m-%dT%H:%M:%S'),
    datetime.strptime(time[1]['from'], '%Y-%m-%dT%H:%M:%S'),
    datetime.strptime(time[2]['from'], '%Y-%m-%dT%H:%M:%S'),
    datetime.strptime(time[3]['from'], '%Y-%m-%dT%H:%M:%S'),
    datetime.strptime(time[4]['from'], '%Y-%m-%dT%H:%M:%S'),
    datetime.strptime(time[5]['from'], '%Y-%m-%dT%H:%M:%S'),
    ...
    ...
    datetime.strptime(time[39]['from'], '%Y-%m-%dT%H:%M:%S')
    ]

temp = f_soup.forecast.select('temperature')[0:]
temps = [
    float(temp[0]['value']),
    float(temp[1]['value']),
    float(temp[2]['value']),
    float(temp[3]['value']),
    float(temp[4]['value']),
    float(temp[5]['value']),
    ...
    ...
    float(temp[39]['value'])
    ]

Below is a snippet of one of the XML time tags I'm scraping:

<forecast>
  <time from="2021-01-18T21:00:00" to="2021-01-19T00:00:00">
   <symbol name="clear sky" number="800" var="01n"/>
   <precipitation probability="0"/>
   <windDirection code="WNW" deg="288" name="West-northwest"/>
   <windSpeed mps="3.9" name="Gentle Breeze" unit="m/s"/>
   <temperature max="0.65" min="-1.73" unit="celsius" value="0.65"/>
   <feels_like unit="celsius" value="-4.5"/>
   <pressure unit="hPa" value="1026"/>
   <humidity unit="%" value="75"/>
   <clouds all="0" unit="%" value="clear sky"/>
   <visibility value="10000"/>
  </time>

Thanks in advance for your help. Any comments on how to improve the code are much appreciated.

Tags: python, xml, web-scraping, beautifulsoup, slice

Solution


I think xml.etree is a better fit here than BeautifulSoup. You can get the result like this:

import requests
import xml.etree.ElementTree as ET
from datetime import datetime

api_key = 'YOUR_API_KEY'  # placeholder: replace with your own OpenWeatherMap API key
f = requests.get('https://api.openweathermap.org/data/2.5/forecast?q=Edmonton&mode=xml&units=metric&appid={api_key}'.format(api_key=api_key))

# parse the result text directly to the XML parser
tree = ET.fromstring(f.text)

times = []
temps = []

# iterate through the XML tree element by element
for elem in tree.iter():
    # for a time tag, parse the date and time from the 'from' attribute
    if elem.tag == 'time':
        times.append(datetime.strptime(elem.get('from'), '%Y-%m-%dT%H:%M:%S'))

    # for a temperature tag, convert the 'value' attribute to a float
    if elem.tag == 'temperature':
        temps.append(float(elem.get('value')))

print(times)
print(len(times))
print(temps)
print(len(temps))
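
The TypeError in the question comes from the fact that a slice such as time[0:3] is itself a list, so ['from'] has to be applied to each element rather than to the slice. A list comprehension does that in one line. Below is a minimal sketch of the idea using xml.etree and a small inline sample instead of a live request; the sample's root tag name and its second <time> entry are assumptions made up for illustration. The same pattern should carry over to the BeautifulSoup result, for example [t['from'] for t in time[0:3]].

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Inline sample modelled on the snippet in the question.
# Assumptions: the root tag name and the second <time> entry are invented.
SAMPLE_XML = """
<weatherdata>
  <forecast>
    <time from="2021-01-18T21:00:00" to="2021-01-19T00:00:00">
      <temperature max="0.65" min="-1.73" unit="celsius" value="0.65"/>
    </time>
    <time from="2021-01-19T00:00:00" to="2021-01-19T03:00:00">
      <temperature max="-0.5" min="-2.1" unit="celsius" value="-1.2"/>
    </time>
  </forecast>
</weatherdata>
"""

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S'

root = ET.fromstring(SAMPLE_XML.strip())
time_elems = root.findall('./forecast/time')  # list of <time> elements

# A slice is still a list, so read the attribute from each element in turn
first_two = [t.get('from') for t in time_elems[0:2]]

# Build both lists in one pass each, with no hand-written indices
times = [datetime.strptime(t.get('from'), TIME_FORMAT) for t in time_elems]
temps = [float(t.find('temperature').get('value')) for t in time_elems]

print(first_two)
print(times)
print(temps)
```

Reading the temperature from inside each <time> element keeps times and temps aligned by position, which two separate whole-document searches would not guarantee if an entry were missing a <temperature> tag.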
