Summing individual values instead of the whole column

Problem description

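My script is largely based on chitown88's answer to this question. It uses BeautifulSoup to extract lock (as in lock and dam) data from XML on the Army Corps of Engineers website. From that data it uses Pandas to build a table, then a list of tables, one per lock. Finally, it writes the tables to separate Excel sheets. That part works well. Now, however, I would like help summing the "Number of Barges" column on each sheet.

I want a "Total" row at the bottom in which "Number of Barges" is the only column that is summed. Neither of my two attempts puts the sum at the bottom. Instead, each one just repeats every value of that column in a new row.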

(1) I tried:

sheet = sheet.append(df,sort=True).reset_index(drop=True)
sheet.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])

(2) I also tried (as shown in the script below):

df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])

My script:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime
import os

#set the headers as a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#set up file name
file_path = r"\\pathto\BargeTraffic"
excel_file = 'LockQueueReport_TEST' + str(datetime.now().strftime('_%m_%d_%Y')) + '.xlsx'
excel_file_full = os.path.join(file_path, excel_file)

#create a list of locks that will be used as three separate tables
lockName = ['Lockport Lock', 'Brandon Rd Lock', 'Dresden Island Lock']
lockNo = ['02', '03', '04']

results = []
for lock in lockNo: 
    url = 'https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=IL&in_lock=' + lock
    #print (url)
    link = requests.get(url).text
    soup = BeautifulSoup(link,'lxml')
    
    #grab each row, pull the data
    rows = soup.find_all('row')

    #take those rows on the site and create a table, and make a list of tables (one table for each lock)
    sheet = pd.DataFrame()
    for row in rows:
        name = row.find('vessel_name').text.strip()
        no = row.find('vessel_no').text.strip()
        dir = row.find('direction').text.strip()
        barno = row.find('num_barges').text.strip()
        arr = row.find('arrival_date').text.strip()
        
        #because these fields could have no value, put in try/except block
        try:
            end = row.find('end_of_lockage').text.strip()
        except:
            end = ''
            pass

        df = pd.DataFrame([[name,no,dir,barno,arr,end]], columns=['Name','Vessel No.','Direction','Number of Barges','Arrival', 'End of Lockage'])

        df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges']) #
        sheet = sheet.append(df,sort=True).reset_index(drop=True)
            
    results.append(sheet)

#function that takes that list of tables and write them into separate excel sheets
def save_xls(list_dfs, xls_path):
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer,'%s' %lockName[n],index=False,)
        writer.save()

save_xls(results,excel_file_full)
print('----done----')

Result: [screenshot of the Excel output; image not available]

Tags: python, pandas, beautifulsoup

Solution


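I think you need to change where the total is applied and convert the column to a numeric dtype so that it sums correctly. In your script the total is computed inside the row loop, on a one-row DataFrame whose values are still strings, so each "Total" just repeats that row's value. Build the whole sheet for a lock first, convert the column with pd.to_numeric, and only then add the total, once per lock.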
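To see why the conversion matters, here is a minimal sketch with made-up values standing in for the scraped `num_barges` text (the sample data and variable names are assumptions, not part of the original script). Summing an object column of strings concatenates them, whereas summing after `pd.to_numeric` adds them.

```python
import pandas as pd

# Hypothetical sample: barge counts as text, the way they come out of the XML
raw = pd.Series(['15', '3'], dtype=object)

# Summing strings joins them end to end instead of adding them
text_sum = raw.sum()

# Converting first gives a real numeric total
numeric_sum = pd.to_numeric(raw, errors='coerce').sum()

print(repr(text_sum), numeric_sum)
```

The script that follows applies the same conversion to the whole sheet before the total is added.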

import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime
import os

#set the headers as a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#set up file name
file_path = r"\\pathto\BargeTraffic"
excel_file = 'LockQueueReport_TEST' + str(datetime.now().strftime('_%m_%d_%Y')) + '.xlsx'
excel_file_full = os.path.join(file_path, excel_file)

#create a list of locks that will be used as three separate tables
lockName = ['Lockport Lock', 'Brandon Rd Lock', 'Dresden Island Lock']
lockNo = ['02', '03', '04']

results = []

for lock in lockNo: 
    url = 'https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=IL&in_lock=' + lock
    #print (url)
    link = requests.get(url).text
    soup = BeautifulSoup(link,'lxml')
    
    #grab each row, pull the data
    rows = soup.find_all('row')

    #take those rows on the site and create a table, and make a list of tables (one table for each lock)
    columns = ['Name','Vessel No.','Direction','Number of Barges','Arrival', 'End of Lockage']
    records = []
    for row in rows:
        name = row.find('vessel_name').text.strip()
        no = row.find('vessel_no').text.strip()
        dir_ = row.find('direction').text.strip()
        barno = row.find('num_barges').text.strip()
        arr = row.find('arrival_date').text.strip()
        
        #this field could be missing (find returns None), so fall back to an empty string
        try:
            end = row.find('end_of_lockage').text.strip()
        except AttributeError:
            end = ''

        records.append([name,no,dir_,barno,arr,end])

    #build the table once per lock (DataFrame.append was removed in pandas 2.0);
    #columns keep the order listed above
    sheet = pd.DataFrame(records, columns=columns)
    #the scraped values are strings, so convert before summing; unparseable values become NaN
    sheet['Number of Barges'] = pd.to_numeric(sheet['Number of Barges'], errors = 'coerce')
    #add one 'Total' row after all rows are collected; the other columns are left empty
    sheet.loc['Total'] = pd.Series(sheet['Number of Barges'].sum(), index = ['Number of Barges'])
    print(sheet)
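One thing to watch when the sheets are written to Excel: the 'Total' label above lives in the index, and the question's `save_xls` calls `to_excel` with `index=False`, so the label itself would not appear in the workbook. A minimal sketch of one way around that, assuming you are happy to show the label in the 'Name' column; `add_total_row` is a hypothetical helper and the sample data is made up:

```python
import pandas as pd

def add_total_row(sheet, column='Number of Barges', label_column='Name'):
    """Return a copy of sheet with one extra row holding the column total."""
    out = sheet.copy()
    # Convert text to numbers; anything unparseable becomes NaN and is skipped by sum()
    out[column] = pd.to_numeric(out[column], errors='coerce')
    # The total row fills only the label and the summed column
    total_row = pd.DataFrame([{label_column: 'Total', column: out[column].sum()}])
    return pd.concat([out, total_row], ignore_index=True)

# Hypothetical sample standing in for one lock's scraped rows
sample = pd.DataFrame({
    'Name': ['Vessel A', 'Vessel B'],
    'Direction': ['U', 'D'],
    'Number of Barges': ['15', '3'],
})
with_total = add_total_row(sample)
print(with_total)
```

Because the label is now ordinary cell data, it survives `index=False`; columns the total row does not fill (here 'Direction') are left empty.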

For pandas type conversion with error handling, see @Alex Riley's answer.
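As a small illustration of that error handling, the sketch below shows what `errors='coerce'` does with values that cannot be parsed. The sample values are invented for the example; the empty string mirrors the fallback used for missing fields in the script.

```python
import pandas as pd

# Hypothetical scraped values: two valid counts, one empty field, one stray word
values = pd.Series(['12', '', 'abc', '7'], dtype=object)

# errors='coerce' turns anything unparseable into NaN instead of raising
converted = pd.to_numeric(values, errors='coerce')

# sum() skips NaN by default, so only the valid counts are added
total = converted.sum()
print(converted.tolist(), total)
```

Coercing silently hides bad input, so it may be worth checking `converted.isna().sum()` if a missing count would be a problem for the report.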

