python - Python analysis of a prescription dataset - loop terminates early and no output file is produced
Problem description
I have this input data:
id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
1000000001,Smith,James,AMBIEN,100
1000000002,Garcia,Maria,AMBIEN,200
1000000003,Johnson,James,CHLORPROMAZINE,1000
1000000004,Rodriguez,Maria,CHLORPROMAZINE,2000
1000000005,Smith,David,BENZTROPINE MESYLATE,1500
The output should look like this:
drug_name,num_prescriber,total_cost
CHLORPROMAZINE,2,3000
BENZTROPINE MESYLATE,1,1500
AMBIEN,2,300
Instead, I get the following output:
AMBIEN 2 300
CHLORPROMAZINE 0 0
BENZTROPINE MESYLATE 0 0
Any suggestions would be greatly appreciated! My code is below:
import re
fileHandle = """
id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
1000000001,Smith,James,AMBIEN,100
1000000002,Garcia,Maria,AMBIEN,200
1000000003,Johnson,James,CHLORPROMAZINE,1000
1000000004,Rodriguez,Maria,CHLORPROMAZINE,2000
1000000005,Smith,David,BENZTROPINE MESYLATE,1500
"""
input_data = re.sub(r'(\n)', r',\1', fileHandle)
fields = input_data.split(',')
del fields[0]
NumberOfRows = round(len(fields)/5)
NumberOfCols = 5
length_of_fields=len(fields)
# Expected output: drug_name,number_of_prescribers,total_cost
# drug_name at indices 3 (title), 8, 13, 18, 23, 28
# drug_cost at indices 4 (title), 9, 14, 19, 24, 29
#outputfile = open('/output/top_cost_drug.txt','w')
# get list of drug names
i=8
j=0
drug_name_list=list()
drug_name_indices=list()
while i<=length_of_fields:
    drug_name_list.append(fields[i])
    drug_name_indices.append(i)
    i=i+5
    j=j+1
# find unique names (same as drug_name_list but without repetition)
unique_drug_list = list()
# traverse for all elements
for x in drug_name_list:
    # check if exists in unique_name_list or not
    if x not in unique_drug_list:
        unique_drug_list.append(x)
i=0
j=0
number_of_unique_drugs=len(unique_drug_list)
unique_cost_list=[0]*number_of_unique_drugs
number_of_prescribers = [0]*number_of_unique_drugs
#while i<len(drug_name_list):
#    while j<number_of_unique_drugs:
#        if drug_name_list[i]==unique_drug_list[j]:
#            drug_name_index=drug_name_indices[i]
#            cost_of_drug=int(fields[drug_name_index+1])
#            unique_cost_list[j]=int(unique_cost_list[j])+cost_of_drug
#            number_of_prescribers[j]=number_of_prescribers[i]+1
#        j=j+1
#    i=i+1
while j<number_of_unique_drugs:
    while i<len(drug_name_list):
        if drug_name_list[i]==unique_drug_list[j]:
            drug_name_index=drug_name_indices[i]
            cost_of_drug=int(fields[drug_name_index+1])
            unique_cost_list[j]=int(unique_cost_list[j])+cost_of_drug
            number_of_prescribers[j]=number_of_prescribers[i]+1
        i=i+1
    j=j+1
# print output values
counter=0
print("drug_name,number_of_prescribers,total_cost \n")
while counter<number_of_unique_drugs:
    print(unique_drug_list[counter], number_of_prescribers[counter], unique_cost_list[counter])
    print("\n")
    counter=counter+1
Also, I changed the print statements to outputfile.write, but no output file is created. Why is that?
outputfile = open('/output/top_cost_drug.txt','w')
# print output values
counter=0
outputfile.write("drug_name,number_of_prescribers,total_cost \n")
while counter<number_of_unique_drugs:
    outputfile.write(unique_drug_list[counter],',', number_of_prescribers[counter],',', unique_cost_list[counter])
    print("\n")
    counter=counter+1
Solution
To get the expected output, use the pandas groupby() aggregation method:
df.groupby("drug_name").drug_cost.agg(["count", "sum"])
                      count   sum
drug_name
AMBIEN                    2   300
BENZTROPINE MESYLATE      1  1500
CHLORPROMAZINE            2  3000
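The table above is sorted alphabetically and uses the column names count and sum, while the expected output in the question uses num_prescriber and total_cost and appears to be ordered by total cost, highest first. A minimal sketch of one way to match that layout, assuming pandas 0.25 or later (for named aggregation) and assuming that descending cost is the intended order:

```python
import pandas as pd

# Same sample rows as in the question.
df = pd.DataFrame(
    {
        "id": [1000000001, 1000000002, 1000000003, 1000000004, 1000000005],
        "prescriber_last_name": ["Smith", "Garcia", "Johnson", "Rodriguez", "Smith"],
        "prescriber_first_name": ["James", "Maria", "James", "Maria", "David"],
        "drug_name": ["AMBIEN", "AMBIEN", "CHLORPROMAZINE",
                      "CHLORPROMAZINE", "BENZTROPINE MESYLATE"],
        "drug_cost": [100, 200, 1000, 2000, 1500],
    }
)

summary = (
    df.groupby("drug_name")
    .agg(
        num_prescriber=("drug_cost", "count"),  # rows per drug, as in the answer
        total_cost=("drug_cost", "sum"),
    )
    # Assumption: the expected output is ordered by total cost, descending.
    .sort_values("total_cost", ascending=False)
    .reset_index()
)

print(summary.to_csv(index=False))
```

If the same prescriber can appear in several rows for one drug, counting rows would overstate the number of prescribers; aggregating ("id", "nunique") instead might be closer to the intent, assuming id identifies a prescriber.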
To write to a file, use to_csv():
df.groupby("drug_name").drug_cost.agg(["count", "sum"]).to_csv("output.csv")
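As for why the question's outputfile.write version produced no file, two things in that snippet look like probable causes. First, open('/output/top_cost_drug.txt','w') points at a directory in the filesystem root, and open() raises FileNotFoundError if that directory does not exist. Second, write() accepts exactly one string, so passing several comma-separated arguments raises TypeError. The file is also never closed. The sketch below shows a plain-Python alternative; the relative output folder and the hard-coded lists are assumptions made for illustration only.

```python
from pathlib import Path

# Values the question's loops would hold once fixed (hard-coded here as an assumption).
unique_drug_list = ["AMBIEN", "CHLORPROMAZINE", "BENZTROPINE MESYLATE"]
number_of_prescribers = [2, 2, 1]
unique_cost_list = [300, 3000, 1500]

# Hypothetical relative path; the question used '/output/top_cost_drug.txt'.
output_path = Path("output") / "top_cost_drug.txt"
output_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing

# The with-block closes (and flushes) the file even if an error occurs.
with open(output_path, "w") as outputfile:
    outputfile.write("drug_name,num_prescriber,total_cost\n")
    for name, count, cost in zip(unique_drug_list, number_of_prescribers, unique_cost_list):
        # write() takes a single string, so build the whole line first.
        outputfile.write(f"{name},{count},{cost}\n")
```

Formatting each line with an f-string also converts the integer counts and costs to text, which write() would otherwise reject.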
Data:
import pandas as pd
data = {'id': {0: 1000000001,
               1: 1000000002,
               2: 1000000003,
               3: 1000000004,
               4: 1000000005},
        'prescriber_last_name': {0: 'Smith',
                                 1: 'Garcia',
                                 2: 'Johnson',
                                 3: 'Rodriguez',
                                 4: 'Smith'},
        'prescriber_first_name': {0: 'James',
                                  1: 'Maria',
                                  2: 'James',
                                  3: 'Maria',
                                  4: 'David'},
        'drug_name': {0: 'AMBIEN',
                      1: 'AMBIEN',
                      2: 'CHLORPROMAZINE',
                      3: 'CHLORPROMAZINE',
                      4: 'BENZTROPINE MESYLATE'},
        'drug_cost': {0: 100, 1: 200, 2: 1000, 3: 2000, 4: 1500}}
df = pd.DataFrame(data)
df
           id prescriber_last_name prescriber_first_name  \
0  1000000001                Smith                 James
1  1000000002               Garcia                 Maria
2  1000000003              Johnson                 James
3  1000000004            Rodriguez                 Maria
4  1000000005                Smith                 David

              drug_name  drug_cost
0                AMBIEN        100
1                AMBIEN        200
2        CHLORPROMAZINE       1000
3        CHLORPROMAZINE       2000
4  BENZTROPINE MESYLATE       1500
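For readers who want to stay with the standard library: the nested while loops in the question stop producing results after the first drug because i is never reset to 0 before the inner loop runs again, so for the second and third drug the condition i<len(drug_name_list) is already false. The line number_of_prescribers[j]=number_of_prescribers[i]+1 also reads from index i where j seems intended. Rather than patching the index arithmetic, the sketch below makes a single pass with csv.DictReader and a dictionary. The variable names are illustrative, and sorting by descending cost is inferred from the expected output.

```python
import csv
import io

raw = """id,prescriber_last_name,prescriber_first_name,drug_name,drug_cost
1000000001,Smith,James,AMBIEN,100
1000000002,Garcia,Maria,AMBIEN,200
1000000003,Johnson,James,CHLORPROMAZINE,1000
1000000004,Rodriguez,Maria,CHLORPROMAZINE,2000
1000000005,Smith,David,BENZTROPINE MESYLATE,1500
"""

totals = {}  # drug_name -> [row count, total cost]
for row in csv.DictReader(io.StringIO(raw)):
    entry = totals.setdefault(row["drug_name"], [0, 0])
    entry[0] += 1                       # one more prescription row for this drug
    entry[1] += int(row["drug_cost"])   # accumulate the cost

# Sort by total cost, highest first (order inferred from the expected output).
ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)

print("drug_name,num_prescriber,total_cost")
for name, (count, cost) in ranked:
    print(f"{name},{count},{cost}")
```

Letting the csv module split the fields avoids the regular-expression and fixed-offset indexing in the question, which breaks as soon as the column count changes.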