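python - AttributeError: 'PipelinedRDD' object has no attribute '_jdf'
Problem description
I'm fairly new to PySpark. I get an AttributeError when I try to run a logistic regression. I'm trying to run logistic regression on the MinMaxScaler output vectors to get probability values for possible matchups between data points.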
from pyspark.sql import functions as F
from pyspark.ml.feature import VectorAssembler, MinMaxScaler
from pyspark.ml.classification import LogisticRegression

number_games = df2.filter(df2.GAME_ID > 22000000).filter(
    df2.GAME_ID < 40000000).groupby("TEAM_ABBREVIATION").agg(
    (F.sum("FGM") / F.countDistinct("GAME_ID")).alias('Points_Per_Game'))
vectorassembler = VectorAssembler(inputCols=["Points_Per_Game"],
                                  outputCol="Performance")
scaler = MinMaxScaler(inputCol="Performance", outputCol="Output")
vectors = vectorassembler.transform(number_games)
scaler_model = scaler.fit(vectors)
scaler_data = scaler_model.transform(vectors)
statistics_teams = scaler_data.select('TEAM_ABBREVIATION',
                                      'Output')  # teams match up against one another
statistics_teams
RDD2 = sc.parallelize(statistics_teams.collect())
# RDD4 = RDD2.map(lambda x: x.split())  # even as a PipelinedRDD I get the same AttributeError
lr = LogisticRegression(maxIter=20, regParam=0.001)
logistic_model = lr.fit(RDD2)
logistic_model.show()
Error returned:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-46-3c0eb05824a8> in <module>
1 lr = LogisticRegression(maxIter=20, regParam=0.001)
----> 2 logistic_model = lr.fit(RDD4)
3
4 logistic_model.show()
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pyspark\ml\base.py in fit(self, dataset, params)
159 return self.copy(params)._fit(dataset)
160 else:
--> 161 return self._fit(dataset)
162 else:
163 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pyspark\ml\wrapper.py in _fit(self, dataset)
333
334 def _fit(self, dataset):
--> 335 java_model = self._fit_java(dataset)
336 model = self._create_model(java_model)
337 return self._copyValues(model)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pyspark\ml\wrapper.py in _fit_java(self, dataset)
330 """
331 self._transfer_params_to_java()
--> 332 return self._java_obj.fit(dataset._jdf)
333
334 def _fit(self, dataset):
AttributeError: 'PipelinedRDD' object has no attribute '_jdf'
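Solution
In this case, can you try calling .fit() on the actual DataFrame, statistics_teams, instead? LogisticRegression works on DataFrames, not RDDs: the last frame of the traceback shows that _fit_java passes dataset._jdf (the JVM-side DataFrame) to Java, and an RDD has no _jdf attribute.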