How to combine only the first and last column from each Excel sheet into a new Excel file using pandas in Python?

Problem description

I have an Excel file that consists of multiple sheets (~100 sheets), each with 8 columns. I am trying to combine the first column, "date", and the last column, "prediction", from each sheet into a new Excel file. The new file should therefore contain a single sheet with one "date" column and one "prediction" column per source sheet. My plan was to read the file first and then use pandas concat() to concatenate the "prediction" columns, but when I did that, pandas produced a lot of NaNs. Is there a better way to achieve this?

**Sheet 1:**
Date    col1    Col2 .....   Prediction1
01/01     9         5               5
02/01     3         7               5

**Sheet 2:**
Date    col1    Col2 .....   Prediction2
01/01     9         5               4
02/01     3         7               6

Note: I am new to Python, so please explain your code.

Code:

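  import pandas as pd

  # Reading the file (sheet_name=None returns a dict with one DataFrame per sheet)
  df = pd.read_excel('myexcel.xlsx', sheet_name=None)

  # Combining the sheets
  excel_combine = pd.concat(df[frame] for frame in df.keys())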

Expected Output:

    Date       Prediction1         Prediction2
    01/01            5                 4
    02/01            5                 6

Tags: python, excel, python-3.x, pandas

Solution


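The script in this answer should give you a data frame in which all prediction columns are neatly renamed. Concatenating does not always give the best result; merging may work better here. See also the pandas documentation on this topic: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html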
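Before the full script, here is a minimal sketch of why a plain concat tends to produce NaNs here. The two small frames are made up from the sample values in the question and stand in for two sheets. By default, `pd.concat` stacks frames row-wise, so a column that exists in only one frame is filled with NaN for the rows coming from the other, whereas a merge on the shared date column lines the rows up.

```python
import pandas as pd

# Two made-up frames standing in for two sheets (values from the question)
sheet1 = pd.DataFrame({"Date": ["01/01", "02/01"], "Prediction1": [5, 5]})
sheet2 = pd.DataFrame({"Date": ["01/01", "02/01"], "Prediction2": [4, 6]})

# Default concat stacks rows (axis=0): 4 rows, and each prediction column
# is NaN for the rows that came from the other frame
stacked = pd.concat([sheet1, sheet2])

# A merge matches rows on the shared Date column instead: 2 rows, no NaN
merged = sheet1.merge(sheet2, on="Date", how="outer")

print(stacked)
print(merged)
```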

import pandas as pd

# Path of the workbook to read (placeholder)
file_name = 'input_file_name'

# Open the workbook once; pd.ExcelFile exposes the sheet names
# (used instead of xlrd, which no longer reads .xlsx files since version 2.0)
bk = pd.ExcelFile(file_name)

# set counter to zero
n = 0

# loop through the sheet names
for i in bk.sheet_names:
    # read one sheet into a df at a time
    temp_df = bk.parse(i)
    # keep only the first (date) and last (prediction) columns
    temp_df = temp_df.iloc[:, [0, -1]].copy()
    # set a new column name according to which sheet the prediction came from
    new_col_name = 'pred_' + i
    # rename both columns by position, so the exact header spelling does not matter
    temp_df.columns = ['date', new_col_name]

    n += 1  # add one to counter each time a new sheet is processed

    if n == 1:
        # if this is the first loop, a dataframe called df is created
        df = temp_df.copy()

    else:
        # if it is not the first loop, merge the temp_df with the df table.
        # A left join assumes every sheet covers the same dates;
        # if they differ, how='outer' keeps dates that appear in any sheet.
        df = df.merge(temp_df,
                      on='date',
                      how='left')

# check df if everything is there
df.info()
print(df.head())
print(df.describe())

# write to excel (the .xlsx extension lets pandas pick the writer engine)
df.to_excel('your_file_name.xlsx', index=False)
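As an alternative to merging in a loop, the same result can be sketched with a single column-wise concat. This is only a sketch under a few assumptions: the function name `combine_first_and_last` is hypothetical, the dates are unique within each sheet, and the prediction columns already have distinct names per sheet (Prediction1, Prediction2, ...) as in the question. It takes a dict of frames, which is what `pd.read_excel(path, sheet_name=None)` returns.

```python
import pandas as pd


def combine_first_and_last(sheets):
    """Combine the first and last column of every sheet into one frame.

    `sheets` is a dict of {sheet_name: DataFrame}, for example the result
    of pd.read_excel(path, sheet_name=None). Hypothetical helper.
    """
    pieces = []
    for name, frame in sheets.items():
        # Select the first and last column by position, whatever they are called
        pair = frame.iloc[:, [0, -1]]
        date_col = pair.columns[0]
        # Use the date as the index so that concat can align rows on it;
        # rename_axis gives every piece the same index name
        pieces.append(pair.set_index(date_col).rename_axis("Date"))

    # axis=1 puts the prediction columns side by side, aligned on the date index
    combined = pd.concat(pieces, axis=1)
    return combined.reset_index()


# Small in-memory example built from the sample rows in the question
example_sheets = {
    "Sheet1": pd.DataFrame({"Date": ["01/01", "02/01"], "col1": [9, 3],
                            "Col2": [5, 7], "Prediction1": [5, 5]}),
    "Sheet2": pd.DataFrame({"Date": ["01/01", "02/01"], "col1": [9, 3],
                            "Col2": [5, 7], "Prediction2": [4, 6]}),
}
result = combine_first_and_last(example_sheets)
print(result)
```

With a real workbook, the dict would come from `pd.read_excel('myexcel.xlsx', sheet_name=None)`, and `result.to_excel(..., index=False)` would write the combined sheet. Reading and writing .xlsx files requires the openpyxl package to be installed.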
