python - How to combine only the first and last column from each Excel sheet into a new Excel file using pandas in Python?
Problem description
I have an Excel file that consists of multiple sheets (~100 sheets), each with 8 columns. I am trying to combine the first column, "date", and the last column, "prediction", from each sheet into a new Excel file. The new file should therefore contain a single sheet with one "date" column and one "prediction" column per source sheet. My plan was to read the file first and then use pandas concat() to concatenate the "prediction" columns, but when I did that, Python generated a lot of NaNs. I am curious whether there is a better way to achieve this.
**Sheet 1:**
Date col1 Col2 ..... Prediction1
01/01 9 5 5
02/01 3 7 5
**Sheet2**
Date col1 Col2 ..... Prediction2
01/01 9 5 4
02/01 3 7 6
Note: I am new to Python, so please explain your code.
Code:
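import pandas as pd
# Reading the file (sheet_name=None returns a dict with one DataFrame per sheet)
df = pd.read_excel('myexcel.xlsx', sheet_name=None)
# Combining the sheets
excel_combine = pd.concat(df[frame] for frame in df.keys())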
Expected Output:
Date Prediction1 Prediction2
01/01 5 4
02/01 5 6
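Solution
This should give you a DataFrame in which all prediction columns are neatly renamed. Concatenating will not always give you the best result; merging may work better here. See also the pandas documentation on this topic: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html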
import pandas as pd

# File names are placeholders; replace them with your own paths
input_file = 'input_file_name.xlsx'
output_file = 'your_file_name.xlsx'
# Open the workbook (pandas is used instead of xlrd, since xlrd 2.x no longer reads .xlsx files)
bk = pd.ExcelFile(input_file)
# set counter to zero
n = 0
# loop through the sheet names
for i in bk.sheet_names:
    # read one sheet into a df at a time
    temp_df = pd.read_excel(bk, sheet_name=i)
    # keep only the first (date) and last (prediction) column
    temp_df = temp_df.iloc[:, [0, -1]]
    # name the key column 'date' and name the prediction column after its sheet
    temp_df.columns = ['date', 'pred_' + i]
    n += 1  # add one to counter each time a new sheet is processed
    if n == 1:
        # if this is the first loop a dataframe called df is created
        df = temp_df.copy()
    else:
        # if it is not the first loop merge the temp_df with the df table
        # how='left' assumes every sheet covers the same dates;
        # otherwise how='outer' may be the better choice
        df = df.merge(temp_df, on='date', how='left')
# check df if everything is there
df.info()
print(df.head())
print(df.describe())
# write to excel
df.to_excel(output_file, index=False)
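As for why the concat() attempt in the question produced NaNs: by default concat() stacks the sheets on top of each other and aligns them by column name, so Prediction1 and Prediction2 end up in separate columns that are empty for the rows of every other sheet. A minimal sketch of an alternative is to index each prediction column by its date and concatenate side by side with axis=1. The helper name combine_predictions and the pred_ prefix are only illustrative, and the sketch assumes that dates are unique within each sheet and that the first and last columns are the date and the prediction.

```python
import pandas as pd


def combine_predictions(sheets):
    """Put the last column of every sheet side by side, aligned on the first column.

    `sheets` maps sheet name -> DataFrame, which is the shape returned by
    pd.read_excel(path, sheet_name=None).
    """
    columns = []
    for sheet_name, frame in sheets.items():
        date_col = frame.columns[0]   # assumed to be the date column
        pred_col = frame.columns[-1]  # assumed to be the prediction column
        # Index by date so that concat aligns rows by date, not by position.
        series = frame.set_index(date_col)[pred_col]
        columns.append(series.rename(f"pred_{sheet_name}"))
    # axis=1 places the series next to each other; a date that is missing
    # from a sheet shows up as NaN in that sheet's column.
    combined = pd.concat(columns, axis=1)
    return combined.rename_axis("Date").reset_index()


# Small in-memory stand-in for the workbook from the question.
sheets = {
    "Sheet1": pd.DataFrame(
        {"Date": ["01/01", "02/01"], "col1": [9, 3], "Prediction1": [5, 5]}
    ),
    "Sheet2": pd.DataFrame(
        {"Date": ["01/01", "02/01"], "col1": [9, 3], "Prediction2": [4, 6]}
    ),
}
result = combine_predictions(sheets)
print(result)
```

With a real workbook, pd.read_excel('input_file_name.xlsx', sheet_name=None) returns a dict of this shape, and result.to_excel('your_file_name.xlsx', index=False) writes the combined sheet.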