python - Showing the total of a column without repeating values
Problem description
I have a script that outputs a CSV with five columns. I've added two lines of code to sum two of those columns. This works, but the totals of these columns are repeated on every row, whereas I just want the totals to be shown on one row.
df['Unit Total'] = df['Units Sold'].sum()
df['Total Revenue'] = df['data_revenue'].sum()
This is what my script produces:
8 0.013207 AR ARS 0.105656 74012 575.2779
10 0.013207 AR ARS 0.13207 74012 575.2779
6 0.013207 AR ARS 0.079242 74012 575.2779
6 0.013207 AR ARS 0.079242 74012 575.2779
What I actually want to see:
8 0.013207 AR ARS 0.105656 74012 575.2779
10 0.013207 AR ARS 0.13207
6 0.013207 AR ARS 0.079242
6 0.013207 AR ARS 0.079242
My Script
for filename in filelist:
    print(filename)
    df = pandas.read_csv('SYB_M_20171001_20171031.txt', header=None, encoding='utf-8', sep='\t', names=colnames,
                         skiprows=3, usecols=['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']
                         )
    df['data_revenue'] = df['Units Sold'] * df['Dealer Price']
    df = df.sort_values(['End Consumer Country', 'Currency Code'])
    df['Unit Total'] = df['Units Sold'].sum()
    df['Total Revenue'] = df['data_revenue'].sum()
    df.to_csv(outfile + r"\output.csv", index=None)
    dflist.append(filename)
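Solution
Assigning a scalar to a whole column broadcasts it to every row. Instead, set the value only in the first row, selecting its index label by position: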
df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
df.loc[df.index[0], 'Total Revenue'] = df['data_revenue'].sum()
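As a minimal sketch of why df.index[0] is used rather than a literal 0: after sort_values the rows keep their original labels, so the first row is not necessarily labelled 0. The data below is hypothetical and only reuses the column names from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "Units Sold": [8, 10, 6, 6],
    "End Consumer Country": ["BR", "AR", "CL", "UY"],
})

# Sorting reorders the rows but keeps their original index labels
df = df.sort_values("End Consumer Country")

# Label of whichever row is now first
first_label = df.index[0]

# Only that row gets the total; the other rows are filled with NaN
df.loc[first_label, "Unit Total"] = df["Units Sold"].sum()
print(df)
```

The rows that were not assigned hold NaN, which to_csv writes as empty fields by default.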
Another solution is to create a default index with reset_index(drop=True), so the first row can be set by the label 0:
df = df.sort_values(['End Consumer Country', 'Currency Code']).reset_index(drop=True)
df.loc[0, 'Unit Total'] = df['Units Sold'].sum()
df.loc[0, 'Total Revenue'] = df['data_revenue'].sum()
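A self-contained sketch of the reset_index variant, using made-up numbers in place of the input file (the prices and the in-memory CSV are assumptions for illustration only), shows that the totals land on the first row and the remaining rows are left blank in the CSV:

```python
import pandas as pd

# Hypothetical sample data standing in for the input file
df = pd.DataFrame({
    "Units Sold": [8, 10, 6, 6],
    "Dealer Price": [0.5, 0.5, 0.5, 0.5],
})
df["data_revenue"] = df["Units Sold"] * df["Dealer Price"]

# A fresh 0..n-1 index guarantees the first row has label 0
df = df.reset_index(drop=True)
df.loc[0, "Unit Total"] = df["Units Sold"].sum()
df.loc[0, "Total Revenue"] = df["data_revenue"].sum()

# With no path given, to_csv returns the CSV as a string; NaN becomes an empty field
csv_text = df.to_csv(index=False)
print(csv_text)
```

If a different placeholder than an empty field is wanted for the other rows, the na_rep argument of to_csv can be used.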