Create a pandas DataFrame from several numpy arrays

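Problem description

I am trying to create a pandas DataFrame whose columns are numpy arrays. I also want to name the columns at creation time.

This seems like a very simple task.

It runs without error when I don't name the columns, although the columns come out in the wrong order: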

import numpy as np
import pandas as pd

n_obs = 500

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80)) 

print(df.head())

Output:

49  3.802458
57  3.830600
29  4.991442
47  2.600079
70  1.658041
52  2.236296
37  3.327520
23  1.366954
22  1.509165
36  1.289901
77  3.834789
68  4.370223
40  4.532152
71  2.348842

When I try to name the columns, I get an error:

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2']) 

Output:

Traceback (most recent call last):
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4622, in create_block_manager_from_blocks
    placement=slice(0, len(axes[0])))]
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fake.py", line 33, in <module>
    df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) ,
 np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'
])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\frame.py", line 361, in __init__
    copy=copy)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\frame.py", line 533, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4631, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4608, in construction_error
    passed, implied))
ValueError: Shape of passed values is (1, 500), indices imply (2, 500)

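I can't find a tutorial that covers this. It is obviously a very simple problem, but I can't find a solution.

Tags: python, pandas, numpy

Solution


Pass the arrays to the DataFrame constructor in a dict, with the column names as keys: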

n_obs = 500

a = np.random.uniform(low = 1.1, high = 5.0,size = (n_obs))
b = np.random.randint(size = (n_obs), low = 18, high = 80)

df = pd.DataFrame({'col1':a, 'col2':b}) 
print (df.head())
       col1  col2
0  2.070148    23
1  1.735960    28
2  4.156209    72
3  4.253241    26
4  3.539951    45
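One point the dict approach implies is that each column keeps the dtype of the array it came from. The sketch below checks this, assuming the same a and b as above; the exact integer width printed for col2 depends on the platform and numpy version.

```python
import numpy as np
import pandas as pd

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)  # float array
b = np.random.randint(low=18, high=80, size=n_obs)    # integer array

# Dict keys become column names; each array becomes one column.
df = pd.DataFrame({'col1': a, 'col2': b})

# Each column keeps the dtype of its source array.
print(df.dtypes)
print(df.shape)
```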

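If you are on a Python version below 3.6, add the columns parameter to specify the column order (from Python 3.6 onward the standard dict type preserves insertion order, as a CPython implementation detail in 3.6 and as a language guarantee from 3.7):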

df = pd.DataFrame({'col1':a, 'col2':b}, columns=['col2','col1']) 
print (df.head())
   col2      col1
0    23  2.070148
1    28  1.735960
2    72  4.156209
3    26  4.253241
4    45  3.539951
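As a small illustration of the ordering behaviour, the sketch below uses two short hand-written arrays (illustrative values, not from the question). It compares the default order, the order forced with columns, and reordering after construction by selecting a list of column names.

```python
import numpy as np
import pandas as pd

# Small illustrative arrays (not from the question).
a = np.array([1.5, 2.5, 3.5])
b = np.array([20, 30, 40])

# On Python 3.7+ the dict's insertion order determines the column order.
df_default = pd.DataFrame({'col1': a, 'col2': b})

# columns= sets the order explicitly at construction time.
df_ordered = pd.DataFrame({'col1': a, 'col2': b}, columns=['col2', 'col1'])

# Alternative: reorder an existing frame by selecting a list of names.
df_reindexed = df_default[['col2', 'col1']]

print(list(df_default.columns))
print(list(df_ordered.columns))
print(list(df_reindexed.columns))
```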

You can also stack the arrays in numpy, but a stacked array has a single dtype, so every column ends up with the same type, here float:

df = pd.DataFrame(np.column_stack((a,b)), columns=['col1','col2']) 
print (df.head())
       col1  col2
0  2.070148  23.0
1  1.735960  28.0
2  4.156209  72.0
3  4.253241  26.0
4  3.539951  45.0
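The upcast happens in numpy before pandas sees the data. The sketch below, using short illustrative arrays, shows the stacked array's dtype and one possible way to restore the integer column afterwards with astype. Casting back is an extra step not in the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative arrays (not from the question).
a = np.array([1.5, 2.5, 3.5])   # float
b = np.array([20, 30, 40])      # integer

# column_stack builds one 2-D array, so numpy picks a common dtype (float64).
stacked = np.column_stack((a, b))
print(stacked.shape, stacked.dtype)

df = pd.DataFrame(stacked, columns=['col1', 'col2'])
dtypes_before = list(df.dtypes)  # both columns are float64 at this point

# Cast col2 back to the dtype of the original integer array.
df['col2'] = df['col2'].astype(b.dtype)
print(df.dtypes)
```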

Also, in your code:

df = pd.DataFrame(a, b) 

the first array becomes the single data column and the second becomes the index, which is the same as:

df = pd.DataFrame(a, index=b) 
print (df.head())
           0
23  2.070148
28  1.735960
72  4.156209
26  4.253241
45  3.539951
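This also explains the error in the question: the data has one column, but two column names were supplied. The sketch below reproduces that with short illustrative arrays. The wording of the exception message varies between pandas versions, so it is only printed, not relied on.

```python
import numpy as np
import pandas as pd

# Small illustrative arrays (not from the question).
a = np.array([1.5, 2.5, 3.5])
b = np.array([20, 30, 40])

# The second positional parameter of DataFrame is index, so b becomes the row labels.
df = pd.DataFrame(a, b)
print(df.shape)        # a single data column
print(list(df.index))  # the values of b

# Two column names for one column of data cannot be reconciled.
try:
    pd.DataFrame(a, b, columns=['col1', 'col2'])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```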
