pyspark: changing data types

Problem description

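I have an iris.csv dataset. I loaded the csv file into an RDD, and I need to convert all the numeric values to floats. I tried to convert the RDD to a DataFrame, but it fails with "Can not infer schema for type: <class 'str'>". I have been trying all day without success. I am a beginner; could you help me?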

>>> irisRDD = sc.textFile("C:/Users/fox/Desktop/KOREAN/iris.csv")
>>> newirisRDD = irisRDD.toDF()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\spark\python\pyspark\sql\session.py", line 61, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "C:\spark\python\pyspark\sql\session.py", line 605, in createDataFrame
    return self._create_dataframe(data, schema, samplingRatio, verifySchema)
  File "C:\spark\python\pyspark\sql\session.py", line 628, in _create_dataframe
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "C:\spark\python\pyspark\sql\session.py", line 425, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio, names=schema)
  File "C:\spark\python\pyspark\sql\session.py", line 405, in _inferSchema
    schema = _infer_schema(first, names=names)
  File "C:\spark\python\pyspark\sql\types.py", line 1067, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <class 'str'>
>>>

Tags: python

Solution
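
The traceback points at the cause: sc.textFile returns an RDD whose elements are plain strings, one per line of the file, and toDF() cannot infer a schema from a bare str. Each line has to be split into fields first, and the numeric fields converted to float, before toDF() is called. Below is a minimal sketch of that idea, not a definitive implementation. It assumes the file has four numeric columns followed by a species label, that fields are separated by commas without quoting, and that a header line may or may not be present. The helper names and the column names are hypothetical and should be adjusted to the actual file.

```python
def to_float_or_str(value):
    """Return float(value) when the field is numeric, else the stripped string."""
    try:
        return float(value)
    except ValueError:
        return value.strip()


def is_data_line(line):
    """True when the first field parses as a float; skips headers and blank lines."""
    try:
        float(line.split(",")[0])
        return True
    except ValueError:
        return False


def parse_line(line):
    """Split one CSV line on commas and convert the numeric fields to float."""
    return tuple(to_float_or_str(field) for field in line.split(","))


def iris_rdd_to_df(iris_rdd):
    """Turn an RDD of raw CSV lines into a DataFrame.

    Needs an active SparkSession, which the pyspark shell already provides.
    """
    # Assumed column names; change them to match the real file.
    columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
    return iris_rdd.filter(is_data_line).map(parse_line).toDF(columns)
```

In the pyspark shell this would be used as newirisRDD = iris_rdd_to_df(irisRDD), and newirisRDD.printSchema() can then be used to check the resulting column types. If the RDD step is not required, reading the file directly with spark.read.csv(path, header=True, inferSchema=True) is usually simpler. header=True only applies if the file really has a header row.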

