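python - Compare two columns with Pandas (or numpy) and calculate the percentage difference
Problem description
Disclaimer: I am learning to develop in Python, and I know this way of coding may well be rubbish, but I plan to keep improving it as I build the program.
I am trying to build a scraper that uses Selenium to check specific flight prices every day, and that part of the code is already done. The origin, destination, first flight date, second flight date, and price are saved daily. I save this data to a file and then check whether any prices have changed.
My goal is to determine whether a price has changed by more than X percent, and then have the script print a message for each flight compared.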
import pandas as pd
import os.path
import numpy as np
#This are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2= 'CDG'
to2= 'JFK'
#End of sample data
flightdata = {'From': [fromm, fromm2], 'To': [to,to2], 'Departure date': [departuredate,departuredate2], 'Return date': [returndate,returndate2], 'Price': [price,price2]}
df = pd.DataFrame(flightdata, columns= ['From', 'To', 'Departure date', 'Return date', 'Price'])
#Check if the script is running for the first time
if os.path.exists('flightstoday.xls') == True:
    os.remove("flightsyesterday.xls")
    os.rename('flightstoday.xls', 'flightsyesterday.xls') #Rename the flights scraped fromm yesterday
    df.to_csv('flightstoday.xls', mode='a', header=True, sep='\t')
else:
    df.to_csv('flightstoday.xls', mode='w', header=True, sep='\t')
#Work with two dataframes
flightsyesterday = pd.read_csv("flightsyesterday.xls",sep='\t')
flightstoday = pd.read_csv("flightstoday.xls",sep='\t')
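What I am missing is how to compare the "Price" column and print a message saying that, for the row with a given "From", "To", "Departure date", and "Return date", the flight price has changed by X percent.
I have tried this code, but it only adds a column to the flightstoday file containing the difference rather than the percentage, and of course it does not print that any price has changed.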
flightstoday['PriceDiff'] = np.where(flightstoday['Price'] == flightsyesterday['Price'], 0, flightstoday['Price'] - flightsyesterday['Price'])
Any help for this newbie would be greatly appreciated. Thanks!
Solution
From what I gather, I think this is what you intend to do.
import pandas as pd
import os.path
import numpy as np
# This are just sample data before integrating Selenium values
price = 230
departuredate = '20/02/2020'
returndate = '20/02/2020'
fromm = 'BOS'
to = 'JFK'
price2 = 630
departuredate2 = '20/02/2020'
returndate2 = '20/02/2020'
fromm2 = 'CDG'
to2 = 'JFK'
# Create second set of prices
price3 = 250
price4 = 600
# Generate data to construct DataFrames
today_flightdata = {'From': [fromm, fromm2], 'To': [to, to2], 'Departure date': [
    departuredate, departuredate2], 'Return date': [returndate, returndate2], 'Price': [price, price2]}
yesterday_flightdata = {'From': [fromm, fromm2], 'To': [to, to2], 'Departure date': [
    departuredate, departuredate2], 'Return date': [returndate, returndate2], 'Price': [price3, price4]}
# Create dataframes for yesterday and today
today = pd.DataFrame(today_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
yesterday = pd.DataFrame(yesterday_flightdata, columns=[
    'From', 'To', 'Departure date', 'Return date', 'Price'])
# Determine changes
today['price_change'] = (
    today['Price'] - yesterday['Price']) / yesterday['Price'] * 100.
# Determine indices of all rows where abs(price_change) >= threshold
threshold = 1.0
today['exceeds_threshold'] = abs(today['price_change']) >= threshold
exceed_indices = today['exceeds_threshold'][today['exceeds_threshold']].index
# Print out those entries that exceed threshold
for idx in exceed_indices:
    # exceed_indices holds index labels, so look rows up by label with .loc
    row = today.loc[idx]
    print('Flight from {} to {} leaving on {} and returning on {} has changed by {}%'.format(
        row['From'], row['To'], row['Departure date'], row['Return date'], row['price_change']))
Output:
Flight from BOS to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by -8.0%
Flight from CDG to JFK leaving on 20/02/2020 and returning on 20/02/2020 has changed by 5.0%
I learned the syntax for computing exceed_indices from this post.
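One caveat with the code above: it compares the two DataFrames row by row, so it relies on yesterday's and today's data listing the same flights in the same order. If the scraped files might differ in order or content, one option is to join the two frames on the columns that identify a flight before computing the change. The following is only a sketch under that assumption; the helper name `price_changes`, the `KEYS` list, and the `_today`/`_yesterday` suffixes are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Columns that together identify one flight (assumption: they are unique per row).
KEYS = ['From', 'To', 'Departure date', 'Return date']


def price_changes(yesterday, today, threshold=1.0):
    """Return the flights whose price moved by at least `threshold` percent."""
    # Align rows on the flight identity instead of on row position.
    merged = today.merge(yesterday, on=KEYS, how='inner',
                         suffixes=('_today', '_yesterday'))
    merged['price_change'] = (
        (merged['Price_today'] - merged['Price_yesterday'])
        / merged['Price_yesterday'] * 100.0
    )
    return merged[merged['price_change'].abs() >= threshold]


# Same sample prices as above, but yesterday's rows are in a different order.
today = pd.DataFrame({
    'From': ['BOS', 'CDG'], 'To': ['JFK', 'JFK'],
    'Departure date': ['20/02/2020', '20/02/2020'],
    'Return date': ['20/02/2020', '20/02/2020'],
    'Price': [230, 630],
})
yesterday = pd.DataFrame({
    'From': ['CDG', 'BOS'], 'To': ['JFK', 'JFK'],
    'Departure date': ['20/02/2020', '20/02/2020'],
    'Return date': ['20/02/2020', '20/02/2020'],
    'Price': [600, 250],
})

changed = price_changes(yesterday, today, threshold=1.0)
messages = [
    'Flight from {} to {} leaving on {} and returning on {} has changed by {:+.1f}%'.format(
        row['From'], row['To'], row['Departure date'], row['Return date'],
        row['price_change'])
    for _, row in changed.iterrows()
]
for message in messages:
    print(message)
```

With the files from the question, the two frames would come from `pd.read_csv(..., sep='\t')` instead of being built inline. An inner join silently drops flights that appear in only one of the two files, which may or may not be what you want.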