首页 > 解决方案 > Convert 'tabulated' string of pandas dataframe back into a dataframe

问题描述

I will try and illustrate my problem with an example DataFrame:

import pandas as pd
example_df = pd.DataFrame([['a',1,2],['b',3,4],['c',5,6]], columns = ['col1', 'col2', 'col3'])

Now I convert this dataframe to a tabulated string like so

from tabulate import tabulate
example_string = tabulate(example_df, headers=list(example_df.columns), tablefmt='jira', showindex="never")

example_string then looks like this...

"|| col1   ||   col2 ||   col3 ||\n| a      |      1 |      2 |\n| b      |      3 |      4 |\n| c      |      5 |      6 |"

So my question is - how can I convert this string back into a pandas DataFrame?

标签: pythonpandas

解决方案


pandas can parse a string if you use StringIO with pd.read_csv.

The biggest issue here is that the column labels have a different separator ('||') than the data ('|'), and there is also an additional '|' around the table. To deal with this, we will read in the header and data separately then combine them into a single DataFrame. The .dropna removes the additional columns on the edges created by the table border '|'

import pandas as pd
from io import StringIO

header = (pd.read_csv(StringIO(example_string), sep=r'\|\|', engine='python', 
                      nrows=1, header=None)
             .dropna(how='all', axis=1))
data = (pd.read_csv(StringIO(example_string), sep=r'\|', engine='python', 
                    skiprows=1, header=None)
          .dropna(how='all', axis=1))

# Replace data's RangeIndex with column labels. 
data.columns = data.columns.map(header.T[0].str.strip().to_dict())

print(data)

       col1  col2  col3
0   a           1     2
1   b           3     4
2   c           5     6

One issue with the above is that object columns still have all of the extra whitespace. You'll need to strip 'col1' for instance. Numeric columns were properly typecast. It's a simple fix:

for col in data.select_dtypes('O'):
    data[col] = data[col].str.strip()

print(data)

  col1  col2  col3
0    a     1     2
1    b     3     4
2    c     5     6

推荐阅读