python - AttributeError: 'DataFrame' object has no attribute 'take' in Dask
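Question
I have a problem with Dask. I have checked the CSV file and it looks fine; I am not uploading it because it is confidential. You can probably reproduce the same error with a CSV of your own.
My code is as follows: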
from dask.distributed import Client
client = Client(n_workers=4)
client
import dask.dataframe as dd
df = dd.read_csv('merged_data.csv')
X=df[['Mp10','Mp10_cal','Mp2_5','Mp2_5_cal','Humedad','Temperatura']]
y = df['Sector']
from dask_ml.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42, shuffle=False)
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_depth': [80, 90, 100, 110],
    'max_features': [2, 3],
    'min_samples_leaf': [3, 4, 5],
    'min_samples_split': [8, 10, 12],
    'n_estimators': [100, 200, 300, 1000]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the grid search model
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train).compute()
The error is as follows:
Fitting 5 folds for each of 288 candidates, totalling 1440 fits
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17712/1827769193.py in <module>
----> 1 grid_search.fit(X_train, y_train).compute()
C:\WORKSPACE\DataLab\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
889 return results
890
--> 891 self._run_search(evaluate_candidates)
892
893 # multimetric is determined here because in the case of a callable
C:\WORKSPACE\DataLab\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1390 def _run_search(self, evaluate_candidates):
1391 """Search all candidates in param_grid"""
-> 1392 evaluate_candidates(ParameterGrid(self.param_grid))
1393
1394
C:\WORKSPACE\DataLab\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
836 )
837
--> 838 out = parallel(
839 delayed(_fit_and_score)(
840 clone(base_estimator),
C:\WORKSPACE\DataLab\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1054
1055 with self._backend.retrieval_context():
-> 1056 self.retrieve()
1057 # Make sure that we get a last message telling us we are done
1058 elapsed_time = time.time() - self._start_time
C:\WORKSPACE\DataLab\lib\site-packages\joblib\parallel.py in retrieve(self)
933 try:
934 if getattr(self._backend, 'supports_timeout', False):
--> 935 self._output.extend(job.get(timeout=self.timeout))
936 else:
937 self._output.extend(job.get())
C:\WORKSPACE\DataLab\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py in result(self, timeout)
443 raise CancelledError()
444 elif self._state == FINISHED:
--> 445 return self.__get_result()
446 else:
447 raise TimeoutError()
~\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py in __get_result(self)
388 if self._exception:
389 try:
--> 390 raise self._exception
391 finally:
392 # Break a reference cycle with the exception in self._exception
AttributeError: 'DataFrame' object has no attribute 'take'
Solution
X and y are Dask DataFrames; I believe you need to pass Dask Arrays to GridSearchCV.
To convert Dask DataFrames to Dask Arrays, you can use:
X = df[['Mp10','Mp10_cal','Mp2_5','Mp2_5_cal','Humedad','Temperatura']].to_dask_array(lengths=True)
y = df['Sector'].to_dask_array(lengths=True)
Once X and y are Dask Arrays, the rest of your code should work.
Also, you don't need to call compute() on grid_search.fit :)
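On the compute() point: a scikit-learn `fit` call runs eagerly and returns the fitted estimator itself rather than a lazy Dask object. A small sketch, assuming synthetic NumPy data and a deliberately tiny grid (both are placeholders, not the values from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 60 rows, 6 features, 2 classes.
X, y = make_classification(n_samples=60, n_features=6, random_state=42)

# Tiny grid so the sketch finishes quickly; not the grid from the question.
param_grid = {"max_depth": [2, 4], "n_estimators": [10]}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=3,
)

# fit() does the work immediately and returns the GridSearchCV object itself.
result = grid_search.fit(X, y)

print(result is grid_search)
print(grid_search.best_params_)
```

Because the results are already available on `grid_search` after `fit`, attributes such as `best_params_` can be read directly.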