python - Display columns in matrix format using dataframe python
Problem description
I have the following table
I want to convert it into a matrix using Python, to look something like the one below:
Can I get some direction as to where to start with this? I used pandas to read two dataframes and merged them to create the initial table shown above (one of them has two columns).
The code I am using is below:
import pandas as pd
from pyexcelerate import Workbook
import numpy as np
import time
start = time.process_time()
excel_file = 'Test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)
newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)
myNewDF.to_excel("1.xlsx")
print(time.process_time() - aftercalc)
The output of the print statements is:
Index(['ColumnB'], dtype='object')
Index(['TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeB', 'TypeB', 'TypeC', 'TypeC', 'TypeC', 'TypeD'], dtype='object', name='ColumnA')
The error I get when running this is:
Traceback (most recent call last):
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ColumnA'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 5628, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\reshape\pivot.py", line 379, in pivot
    index = MultiIndex.from_arrays([index, data[columns]])
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
Solution
Does this solve it?
newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
newdf
Out[28]:
ColumnA TypeA TypeB TypeC TypeD
ColumnB
A           A     A   NaN     A
B           B   NaN     B   NaN
C           C   NaN     C   NaN
D           D   NaN   NaN   NaN
E           E   NaN   NaN   NaN
F         NaN     F   NaN   NaN
Z         NaN   NaN     Z   NaN
newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
Out[29]:
ColumnA TypeA TypeB TypeC TypeD
ColumnB
A         yes   yes         yes
B         yes         yes
C         yes         yes
D         yes
E         yes
F               yes
Z                     yes
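The printed output in the question shows ColumnA as the name of the index rather than as a column, which is the likely reason pivot raises KeyError: 'ColumnA'. The sketch below is a minimal reproduction under that assumption. The in-memory rows are a hypothetical stand-in for Test.xlsx, rebuilt from the output above, and the names pairs, flat and matrix are illustrative only. It uses reset_index() to turn ColumnA back into a regular column before pivoting.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Test.xlsx, rebuilt from the output shown above.
pairs = [
    ("TypeA", "A"), ("TypeA", "B"), ("TypeA", "C"), ("TypeA", "D"), ("TypeA", "E"),
    ("TypeB", "A"), ("TypeB", "F"),
    ("TypeC", "B"), ("TypeC", "C"), ("TypeC", "Z"),
    ("TypeD", "A"),
]
# set_index mimics what index_col=0 does when ColumnA is the first sheet column.
df = pd.DataFrame(pairs, columns=["ColumnA", "ColumnB"]).set_index("ColumnA")

# With ColumnA as the index, pivot cannot look it up as a column.
try:
    df.pivot(index="ColumnB", columns="ColumnA", values="ColumnB")
    pivot_failed = False
except KeyError:
    pivot_failed = True

# reset_index() moves ColumnA from the index back into the columns.
flat = df.reset_index()
newdf = flat.pivot(index="ColumnB", columns="ColumnA", values="ColumnB")

# Filled cells become "yes", missing cells become an empty string.
matrix = pd.DataFrame(
    np.where(newdf.notna().to_numpy(), "yes", ""),
    index=newdf.index,
    columns=newdf.columns,
)
print(matrix)
```

Dropping index_col=0 from the read_excel call should have the same effect, assuming ColumnA really is the first column of the sheet.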
Modified code
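import time
import numpy as np
import pandas as pd
# from pyexcelerate import Workbook  # not used; pandas writes the file itself
start = time.process_time()
excel_file = 'C:\\Users\\ss\\Desktop\\check.xlsx'
# index_col=0 turns the first column of the sheet into the index
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)
# If ColumnA ended up as the index (as in the printed output in the question),
# make it a regular column again so that pivot can find it
df = df.reset_index()
# One row per ColumnB value, one column per ColumnA value
newdf = df.pivot(index='ColumnB', columns='ColumnA', values='ColumnB')
# Mark filled cells with 'yes' and leave missing cells empty
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)
myNewDF.to_excel("C:\\Users\\ss\\Desktop\\output.xlsx")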
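One caveat with pivot: it raises a ValueError when the same (ColumnB, ColumnA) pair appears more than once, which can happen after a merge. As a possible alternative (not what the code above uses), pd.crosstab counts the pairs instead, so duplicates are tolerated. The sample rows and the names counts and presence below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the pair ("TypeA", "A") is repeated on purpose.
df = pd.DataFrame(
    {
        "ColumnA": ["TypeA", "TypeA", "TypeA", "TypeB", "TypeC"],
        "ColumnB": ["A", "A", "B", "A", "Z"],
    }
)

# crosstab counts how often each (ColumnB, ColumnA) pair occurs.
counts = pd.crosstab(df["ColumnB"], df["ColumnA"])

# Any count above zero becomes "yes"; everything else becomes an empty string.
presence = pd.DataFrame(
    np.where(counts.to_numpy() > 0, "yes", ""),
    index=counts.index,
    columns=counts.columns,
)
print(presence)
```

The resulting frame can be written out with to_excel in the same way as myNewDF.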