Unable to get the desired array shape from a loop in Python

Problem description

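I am trying to build an artificial dataset made of two arrays: X of shape (100,2), with income and age in its two columns, and y of shape (100,1).

When I append inside the loop, all I get is a flat X of 200 values (shape (200,)) instead of 100 rows of 2.

There is no error, but to feed the model I need a 100x2 NumPy array.

Thanks for your help.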

from scipy.stats import norm 
import random
from numpy import *
import numpy as np
from ast import literal_eval
from pandas import DataFrame


# Function for N points in k clusters to generate artificial data
def create_clustered_data(N,k):
    random.seed(10)
    points_per_cluster=float(N)/k
    X=np.array([])
    y=np.array([])
    for i in range(k):
        income_centroid=np.random.uniform(20000,200000)
        age_centroid=np.random.uniform(20,70)
        for j in range(int(points_per_cluster)):
             X=np.append(X,[np.random.normal(income_centroid,10000),np.random.normal(age_centroid,2)])
             y=np.append(y,i) 
             X=np.array(X)
             y=np.array(y)
    return X,y

(X,y)=create_clustered_data(100,5) # using the function to create two arrays

print(X[0:4]) # getting the income and age appending together in single dimension
X.shape # I need to get the shape as (100,2) instead of the (200,) currently being achieved
X.ndim # I need to get this as 2 instead of the 1 currently being achieved

Tags: python, arrays, function, loops, append

Solution


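Simply replace your np.array accumulators with standard Python lists and use the += operator to append values to them. You also need to wrap each record in its own square brackets [] so that it is added as a single two-element row rather than as two separate values; see [[np.random.normal(income_centroid, 10000), np.random.normal(age_centroid,2)]]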

import numpy as np

# Function for N points in k clusters to generate artificial data
def create_clustered_data(N, k):
    np.random.seed(10)  # seed NumPy's generator, which produces every draw below
    points_per_cluster = N // k
    X = []
    y = []
    for i in range(k):
        income_centroid = np.random.uniform(20000, 200000)
        age_centroid = np.random.uniform(20, 70)
        for j in range(points_per_cluster):
            # Each record is its own [income, age] list, so X becomes a list of rows
            X += [[np.random.normal(income_centroid, 10000), np.random.normal(age_centroid, 2)]]
            y += [i]
    return X, y

(X,y) = create_clustered_data(100,5)
X = np.array(X)
y = np.array(y)

After the conversion X = np.array(X), X.shape returns (100, 2).
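For background on why the original loop produced a flat array: np.append flattens both of its inputs when no axis argument is given, so each two-value record grows a 1-D array by two elements. The sketch below shows that behavior on a toy record and two NumPy-only ways around it. It is an illustration rather than part of the answer above, and the names flat, pairs and rows are made up for the example.

```python
import numpy as np

# Without an axis argument, np.append flattens both inputs first,
# so appending a 2-element record adds two values to a 1-D array.
flat = np.array([])
for _ in range(3):
    flat = np.append(flat, [1.0, 2.0])
print(flat.shape)  # (6,)

# Option 1: reshape the flat result into rows of two columns.
pairs = flat.reshape(-1, 2)
print(pairs.shape)  # (3, 2)

# Option 2: start from an empty (0, 2) array and append 2-D rows along axis 0.
rows = np.empty((0, 2))
for _ in range(3):
    rows = np.append(rows, [[1.0, 2.0]], axis=0)
print(rows.shape)  # (3, 2)
```

Both options give the same result here, but np.append copies the whole array on every call, so collecting records in a Python list and converting once at the end, as in the answer, is usually the cheaper choice for loops.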

