How do I get a running total of a column in a CSV file based on a unique variable in a different column?

Problem description

import csv

def getDataFromFile(filename, dataList):
    # Read every row of the CSV file into dataList
    with open(filename, "r") as file:
        csvReader = csv.reader(file)
        for aList in csvReader:
            dataList.append(aList)

def getTotalByYear(expendDataList):
    # Sum the expenditure column (index 2) over all rows
    total = 0
    for row in expendDataList:
        expenCount = float(row[2])
        total += expenCount
    return total

# expendDataList is filled by the driver code shown further down
Rtotal = input("Enter 'every' or a particular year. ")
if Rtotal == 'every' or Rtotal == 'Every':
    print(getTotalByYear(expendDataList))

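As you can see, if you type every or Every, I get the running total of column 2, but I don't understand how to compute a running total of column 2 when it depends on a particular value in the first column.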

In this case, my CSV file contains three columns of data: a year field, an item field, and an expenditure field. How do I get the sum of the expenditure field for a particular year?

expendDataList = [] 
fname = "expenditures.csv"
getDataFromFile(fname, expendDataList)
getTotalByYear(expendDataList)

Tags: python, python-3.x, csv, cumulative-sum

Solution


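Producing a running total is a good task for a generator function. This example uses the built-in filter function to drop the rows for unwanted years (a generator expression or list comprehension could be used instead). It then iterates over the selected rows and yields the total after each one.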

import csv


def running_totals(year):
    with open('year-item-expenditure.csv') as f:
        # DictReader uses the header row, so columns are looked up by name
        reader = csv.DictReader(f)
        # A predicate of None makes filter() keep every non-empty row
        predicate = None if year.lower() == 'every' else lambda row: row['Year'] == year
        total = 0
        for row in filter(predicate, reader):
            total += float(row['Expenditure'])
            yield total


totals = running_totals('2019')
for total in totals:
    print(total)
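
To try the same idea without a file on disk, here is a minimal sketch that runs the generator over in-memory text. The sample rows, the SAMPLE_CSV name and the running_totals_from helper are made up for illustration; only the Year and Expenditure column names are taken from the code above.

```python
import csv
import io

# Hypothetical sample data; the column names mirror the ones used above.
SAMPLE_CSV = """Year,Item,Expenditure
2018,Rent,1000.0
2019,Rent,1200.0
2019,Food,300.5
2018,Food,250.0
2019,Travel,99.5
"""


def running_totals_from(lines, year):
    """Yield the running total of Expenditure for rows matching `year`.

    Passing 'every' (in any capitalisation) keeps all rows.
    """
    reader = csv.DictReader(lines)
    if year.lower() == 'every':
        rows = reader
    else:
        # Generator expression in place of filter(); same effect
        rows = (row for row in reader if row['Year'] == year)
    total = 0.0
    for row in rows:
        total += float(row['Expenditure'])
        yield total


totals_2019 = list(running_totals_from(io.StringIO(SAMPLE_CSV), '2019'))
totals_all = list(running_totals_from(io.StringIO(SAMPLE_CSV), 'Every'))
print(totals_2019)  # [1200.0, 1500.5, 1600.0]
print(totals_all)   # [1000.0, 2200.0, 2500.5, 2750.5, 2850.0]
```

The last value of each list is the plain sum for that selection, which is what the question asks for when only the final figure for a year is needed.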

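Another approach is to use itertools.accumulate, although you still have to do all of the filtering yourself, so unless you need the performance there is not much benefit to doing it this way.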

import csv
import itertools


def running_totals(year):
    with open('year-item-expenditure.csv') as f:
        reader = csv.DictReader(f)
        predicate = None if year.lower() == 'every' else lambda row: row['Year'] == year
        # Create a generator expression that yields expenditures as floats
        expenditures = (float(row['Expenditure']) for row in filter(predicate, reader))
        # Delegate to accumulate while the file is still open
        yield from itertools.accumulate(expenditures)
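
If running totals are wanted for every distinct year at once rather than for one year per call, one possible variation is to group the expenditures by year first and then apply itertools.accumulate to each group. This is only a sketch under assumptions: the sample rows, SAMPLE_CSV and running_totals_by_year are hypothetical names, not part of the answer above.

```python
import csv
import io
import itertools
from collections import defaultdict

# Hypothetical sample data with the same column names as above.
SAMPLE_CSV = """Year,Item,Expenditure
2018,Rent,1000.0
2019,Rent,1200.0
2019,Food,300.5
2018,Food,250.0
2019,Travel,99.5
"""


def running_totals_by_year(lines):
    """Return {year: [running totals]} for every distinct year in the data."""
    by_year = defaultdict(list)
    for row in csv.DictReader(lines):
        # Collect each year's expenditures in file order
        by_year[row['Year']].append(float(row['Expenditure']))
    # accumulate() turns each list of amounts into its running totals
    return {year: list(itertools.accumulate(values))
            for year, values in by_year.items()}


per_year = running_totals_by_year(io.StringIO(SAMPLE_CSV))
print(per_year)
# {'2018': [1000.0, 1250.0], '2019': [1200.0, 1500.5, 1600.0]}
```

Unlike the generator versions, this reads the whole input before returning anything, so it trades the lazy behaviour for a single pass that covers all years.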
