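python - How to slice scraped XML data and create lists in Python
Problem description
I'm looking for a faster way to build the "times" and "temps" lists created below. As you can see, I've had to build the lists item by item, because I don't know how to slice the list or write a for loop to make them.
For example, using: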
time = f_soup.forecast.select('time')[0:]
time[0]['from']    # works to select a single item, but
time[0:3]['from']  # causes "TypeError: list indices must be integers or slices, not str"
The data is a web scrape of the 5-day forecast for Edmonton from https://openweathermap.org/forecast5.
I've included everything else below:
import requests
from bs4 import BeautifulSoup
from datetime import datetime

# f = Forecast
f = requests.get('https://api.openweathermap.org/data/2.5/forecast?q=Edmonton&mode=xml&units=metric&appid=########################')
f_text = f.text
f_soup = BeautifulSoup(f_text, "lxml-xml")
print(f_soup.prettify())
time = f_soup.forecast.select('time')[0:]
times = [
datetime.strptime(time[0]['from'], '%Y-%m-%dT%H:%M:%S'),
datetime.strptime(time[1]['from'], '%Y-%m-%dT%H:%M:%S'),
datetime.strptime(time[2]['from'], '%Y-%m-%dT%H:%M:%S'),
datetime.strptime(time[3]['from'], '%Y-%m-%dT%H:%M:%S'),
datetime.strptime(time[4]['from'], '%Y-%m-%dT%H:%M:%S'),
datetime.strptime(time[5]['from'], '%Y-%m-%dT%H:%M:%S'),
...
...
datetime.strptime(time[39]['from'], '%Y-%m-%dT%H:%M:%S')
]
temp = f_soup.forecast.select('temperature')[0:]
temps = [
float(temp[0]['value']),
float(temp[1]['value']),
float(temp[2]['value']),
float(temp[3]['value']),
float(temp[4]['value']),
float(temp[5]['value']),
...
...
float(temp[39]['value'])
]
Below is a snippet of one of the XML time tags I'm scraping:
<forecast>
<time from="2021-01-18T21:00:00" to="2021-01-19T00:00:00">
<symbol name="clear sky" number="800" var="01n"/>
<precipitation probability="0"/>
<windDirection code="WNW" deg="288" name="West-northwest"/>
<windSpeed mps="3.9" name="Gentle Breeze" unit="m/s"/>
<temperature max="0.65" min="-1.73" unit="celsius" value="0.65"/>
<feels_like unit="celsius" value="-4.5"/>
<pressure unit="hPa" value="1026"/>
<humidity unit="%" value="75"/>
<clouds all="0" unit="%" value="clear sky"/>
<visibility value="10000"/>
</time>
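Thanks in advance for your help. Any comments on how to improve the code are much appreciated.
Solution
I think xml.etree is a better fit than BeautifulSoup here; you can get the result like this: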
import requests
import xml.etree.ElementTree as ET
from datetime import datetime
api_key = '########################'  # placeholder: substitute your own OpenWeatherMap API key
f = requests.get('https://api.openweathermap.org/data/2.5/forecast?q=Edmonton&mode=xml&units=metric&appid={api_key}'.format(api_key=api_key))
# pass the response text directly to the XML parser
tree = ET.fromstring(f.text)
times = []
temps = []
# iterate through the XML tree element by element
for elem in tree.iter():
    # if it is a time tag, parse the date and time from the 'from' attribute
    if elem.tag == 'time':
        times.append(datetime.strptime(elem.get('from'), '%Y-%m-%dT%H:%M:%S'))  # append result to list
    # if it is a temperature tag, extract the reading from the 'value' attribute
    if elem.tag == 'temperature':
        temps.append(float(elem.get('value')))  # convert to float, as in the question, and append
print(times)
print(len(times))
print(temps)
print(len(temps))
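As for the error in the question: slicing a list gives back another list, and a list can't be indexed with a string like 'from', which is why time[0:3]['from'] raises the TypeError. The attribute lookup has to happen once per element, for example with a list comprehension such as [t['from'] for t in time[0:3]]. Here is a minimal sketch of the same idea with xml.etree, run against an inline sample instead of the live API so it works without a key. SAMPLE_XML, TIME_FORMAT and parse_forecast are names I made up for illustration, the weatherdata root element is an assumption about the response, and the second time entry contains invented values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample modeled on the snippet in the question. The
# <weatherdata> root is an assumption, and the second <time> entry
# and its values are made up for illustration.
SAMPLE_XML = """
<weatherdata>
  <forecast>
    <time from="2021-01-18T21:00:00" to="2021-01-19T00:00:00">
      <temperature max="0.65" min="-1.73" unit="celsius" value="0.65"/>
    </time>
    <time from="2021-01-19T00:00:00" to="2021-01-19T03:00:00">
      <temperature max="-0.5" min="-1.9" unit="celsius" value="-0.5"/>
    </time>
  </forecast>
</weatherdata>
""".strip()

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S'


def parse_forecast(xml_text):
    """Return (times, temps) lists built from forecast XML text."""
    root = ET.fromstring(xml_text)
    # iter('time') yields only <time> elements, so no tag check is needed
    times = [datetime.strptime(t.get('from'), TIME_FORMAT)
             for t in root.iter('time')]
    # same pattern for <temperature>, converting each value to float
    temps = [float(t.get('value')) for t in root.iter('temperature')]
    return times, temps


times, temps = parse_forecast(SAMPLE_XML)
print(times)
print(temps)
```

With the real response you would pass f.text to parse_forecast instead of SAMPLE_XML. If you only want the first few entries, slice the resulting lists afterwards, e.g. times[0:3].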