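python - Summing individual values instead of the whole column
Problem description
My script is largely based on chitown88's answer to this question. It uses BeautifulSoup to pull lock (as in lock and dam) data from XML on the Army Corps of Engineers website. From that data it builds a table with Pandas, then a list in which each lock has its own table. Finally, it writes the tables to separate Excel worksheets. That part works fine. Now, however, I am asking for help summing the "Number of Barges" column on each sheet.
I would like a "Total" row at the bottom in which "Number of Barges" is the only column that is summed. Neither of my two attempts puts a sum at the bottom. Instead, each value in the column is simply repeated in a new row.
(1) I tried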
sheet = sheet.append(df,sort=True).reset_index(drop=True)
sheet.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])
(2) I also tried (as shown in the script below):
df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])
My script:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime
import os

#set the headers as a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

#set up file name
file_path = r"\\pathto\BargeTraffic"
excel_file = 'LockQueueReport_TEST' + str(datetime.now().strftime('_%m_%d_%Y')) + '.xlsx'
excel_file_full = os.path.join(file_path, excel_file)

#create a list of locks that will be used as three separate tables
lockName = ['Lockport Lock', 'Brandon Rd Lock', 'Dresden Island Lock']
lockNo = ['02', '03', '04']

results = []
for lock in lockNo:
    url = 'https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=IL&in_lock=' + lock
    #print (url)
    link = requests.get(url).text
    soup = BeautifulSoup(link,'lxml')

    #grab each row, pull the data
    rows = soup.find_all('row')

    #take those rows on the site and create a table, and make a list of tables (one table for each lock)
    sheet = pd.DataFrame()
    for row in rows:
        name = row.find('vessel_name').text.strip()
        no = row.find('vessel_no').text.strip()
        dir = row.find('direction').text.strip()
        barno = row.find('num_barges').text.strip()
        arr = row.find('arrival_date').text.strip()

        #because these fields could have no value, put in try/except block
        try:
            end = row.find('end_of_lockage').text.strip()
        except:
            end = ''
            pass

        df = pd.DataFrame([[name,no,dir,barno,arr,end]], columns=['Name','Vessel No.','Direction','Number of Barges','Arrival', 'End of Lockage'])
        df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges']) #
        sheet = sheet.append(df,sort=True).reset_index(drop=True)

    results.append(sheet)

#function that takes that list of tables and write them into separate excel sheets
def save_xls(list_dfs, xls_path):
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer,'%s' %lockName[n],index=False,)
        writer.save()

save_xls(results,excel_file_full)
print('----done----')
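Result: instead of a single total at the bottom of each sheet, every value in the column is repeated in a new row.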
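Solution
I think the fix is to change where the total is applied, and to convert the column to a numeric dtype so that it sums correctly. In the question's script the total is computed inside the row loop, on a one-row DataFrame, so every row gets its own "total"; the values scraped from the XML are also strings rather than numbers.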
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime
import os

#set the headers as a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

#set up file name
file_path = r"\\pathto\BargeTraffic"
excel_file = 'LockQueueReport_TEST' + str(datetime.now().strftime('_%m_%d_%Y')) + '.xlsx'
excel_file_full = os.path.join(file_path, excel_file)

#create a list of locks that will be used as three separate tables
lockName = ['Lockport Lock', 'Brandon Rd Lock', 'Dresden Island Lock']
lockNo = ['02', '03', '04']

results = []
for lock in lockNo:
    url = 'https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=IL&in_lock=' + lock
    #print (url)
    link = requests.get(url).text
    soup = BeautifulSoup(link,'lxml')

    #grab each row, pull the data
    rows = soup.find_all('row')

    #take those rows on the site and create a table, and make a list of tables (one table for each lock)
    sheet = pd.DataFrame()
    for row in rows:
        name = row.find('vessel_name').text.strip()
        no = row.find('vessel_no').text.strip()
        dir_ = row.find('direction').text.strip()
        barno = row.find('num_barges').text.strip()
        arr = row.find('arrival_date').text.strip()

        #because these fields could have no value, put in try/except block
        #(row.find returns None for a missing tag, so .text raises AttributeError)
        try:
            end = row.find('end_of_lockage').text.strip()
        except AttributeError:
            end = ''

        df = pd.DataFrame([[name,no,dir_,barno,arr,end]], columns=['Name','Vessel No.','Direction','Number of Barges','Arrival', 'End of Lockage'])
        #note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat([sheet, df], sort=True)
        sheet = sheet.append(df,sort=True).reset_index(drop=True)

    #after all rows are collected: convert the scraped strings to numbers, then add one total row per sheet
    sheet['Number of Barges'] = pd.to_numeric(sheet['Number of Barges'], errors = 'coerce')
    sheet.loc['Total'] = pd.Series(sheet['Number of Barges'].sum(), index = ['Number of Barges'])
    print(sheet)
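The same idea can be tried without hitting the website. The sketch below is only an illustration under stated assumptions: the `records` list and the `build_sheet` helper are hypothetical stand-ins for the parsed `<row>` elements, and the rows are collected in a list and turned into a DataFrame once, which also avoids `DataFrame.append` on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample data standing in for the values scraped from the XML.
# Note that the barge counts arrive as strings, and one of them is empty.
records = [
    {"Name": "Vessel A", "Number of Barges": "15"},
    {"Name": "Vessel B", "Number of Barges": "3"},
    {"Name": "Vessel C", "Number of Barges": ""},
]


def build_sheet(records):
    """Build one table per lock and add a single 'Total' row at the bottom."""
    # Build the DataFrame once from all rows instead of appending row by row.
    sheet = pd.DataFrame(records)
    # Convert the string column to numbers; unparseable values become NaN.
    sheet["Number of Barges"] = pd.to_numeric(
        sheet["Number of Barges"], errors="coerce"
    )
    # Sum the whole column (NaN is skipped) and store it in a new row
    # labelled 'Total'; the other columns of that row are left empty.
    total = sheet["Number of Barges"].sum()
    sheet.loc["Total"] = pd.Series({"Number of Barges": total})
    return sheet


sheet = build_sheet(records)
print(sheet)
```

Because the total is added only after every row is in the table, each sheet gets exactly one extra row.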
For pandas type conversion with error handling, see @Alex Riley's answer.
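As a rough illustration of why the conversion matters, the snippet below compares summing the raw strings with summing the converted values. The sample values are made up for the example; `pd.to_numeric` with `errors="coerce"` is the same call used in the answer's script.

```python
import pandas as pd

# Made-up sample values: two numeric strings, an empty string, and a placeholder.
raw = pd.Series(["15", "3", "", "n/a"], dtype=object)

# Summing string values concatenates them instead of adding them.
as_text = raw.iloc[:2].sum()

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# sum() skips NaN by default, so only the parseable values are added.
total = numeric.sum()

print(repr(as_text), total)
```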