python - Convert 'tabulated' string of pandas dataframe back into a dataframe
问题描述
I will try and illustrate my problem with an example DataFrame:
import pandas as pd
example_df = pd.DataFrame([['a',1,2],['b',3,4],['c',5,6]], columns = ['col1', 'col2', 'col3'])
Now I convert this dataframe to a tabulated string like so
from tabulate import tabulate
example_string = tabulate(example_df, headers=list(example_df.columns), tablefmt='jira', showindex="never")
example_string then looks like this...
"|| col1 || col2 || col3 ||\n| a | 1 | 2 |\n| b | 3 | 4 |\n| c | 5 | 6 |"
So my question is - how can I convert this string back into a pandas DataFrame?
解决方案
pandas
can parse a string if you use StringIO
with pd.read_csv
.
The biggest issue here is that the column labels have a different separator ('||'
) than the data ('|'
), and there is also an additional '|'
around the table. To deal with this, we will read in the header and data separately then combine them into a single DataFrame. The .dropna
removes the additional columns on the edges created by the table border '|'
import pandas as pd
from io import StringIO
header = (pd.read_csv(StringIO(example_string), sep=r'\|\|', engine='python',
nrows=1, header=None)
.dropna(how='all', axis=1))
data = (pd.read_csv(StringIO(example_string), sep=r'\|', engine='python',
skiprows=1, header=None)
.dropna(how='all', axis=1))
# Replace data's RangeIndex with column labels.
data.columns = data.columns.map(header.T[0].str.strip().to_dict())
print(data)
col1 col2 col3
0 a 1 2
1 b 3 4
2 c 5 6
One issue with the above is that object
columns still have all of the extra whitespace. You'll need to strip 'col1'
for instance. Numeric columns were properly typecast. It's a simple fix:
for col in data.select_dtypes('O'):
data[col] = data[col].str.strip()
print(data)
col1 col2 col3
0 a 1 2
1 b 3 4
2 c 5 6
推荐阅读
- java - 使用 XQJ 进行简单查询
- python - 使用 2 位 Dtype 指定 NumPy 数组
- python - 以特定因子加宽 x 轴?
- python-3.x - 我只是将主机名作为输出,但主要要求是备份 Cisco IOS 设备的运行配置
- mysql - CONCAT 字符串到 longblog 和 mysql 磁盘使用情况
- javascript - npm link会导致找不到模块问题吗?
- wpf - 为什么我运行程序时WPF标题栏背景是白色的?
- python - 如何使用 for 循环和从其他 df 派生的新列名重命名 df 的列?
- metal - 金属 Alpha 通道异常
- button - 使用鼠标侦听器更改图标的 JButton