r - 如何从 uci 中清除 R 中的数据集
问题描述
下午好 ,
假设我们有以下功能:
data_preprocessing<-function(link){
link=as.character(link)
dataset=read.csv(link)
dataset=replace(dataset,dataset=="?",NA)
return(dataset)
}
示例(https协议问题):
Echocardiogram=data_preprocessing("https://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data")
Show Traceback
Rerun with Debug
Error in file(file, "rt") : cannot open the connection
下载数据集后:
Echocardiogram=data_preprocessing("http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data")
head(Echocardiogram)
X11 X0 X71 X0.1 X0.260 X9 X4.600 X14 X1 X1.1 name X1.2 X0.2
1 19 0 72 0 0.380 6 4.100 14 1.700 0.588 name 1 0
2 16 0 55 0 0.260 4 3.420 14 1 1 name 1 0
3 57 0 60 0 0.253 12.062 4.603 16 1.450 0.788 name 1 0
4 19 1 57 0 0.160 22 5.750 18 2.250 0.571 name 1 0
5 26 0 68 0 0.260 5 4.310 12 1 0.857 name 1 0
6 13 0 62 0 0.230 31 5.430 22.5 1.875 0.857 name 1 0
还 :
str(Echocardiogram)
'data.frame': 130 obs. of 12 variables:
$ X11 : Factor w/ 57 levels "",".03",".25",..: 18 16 54 18 27 14 50 18 26 12 ...
$ X0 : Factor w/ 4 levels "","?","0","1": 3 3 3 4 3 3 3 3 3 4 ...
$ X71 : Factor w/ 40 levels "","?","35","46",..: 30 12 17 14 26 19 17 4 11 34 ...
$ X0.1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ X0.260: Factor w/ 74 levels "","?","0.010",..: 65 50 47 26 50 42 59 60 21 19 ...
$ X9 : Factor w/ 93 levels "","?","0","10",..: 69 57 13 46 62 56 79 3 19 29 ...
$ X4.600: Factor w/ 106 levels "","?","2.32",..: 25 6 54 92 38 85 76 70 47 33 ...
$ X14 : Factor w/ 48 levels "","?","10","10.5",..: 16 16 21 27 8 36 16 21 19 27 ...
$ X1 : Factor w/ 67 levels "","?","1","1.04",..: 48 3 37 60 3 52 3 11 16 50 ...
$ X1.1 : Factor w/ 32 levels "","?","0.140",..: 14 30 25 13 27 27 30 31 29 21 ...
$ X1.2 : Factor w/ 5 levels "","?","1","2",..: 3 3 3 3 3 3 3 3 3 3 ...
$ X0.2 : Factor w/ 5 levels "","?","0","1",..: 3 3 3 3 3 3 3 3 3 4 ...
在这里,我想将"?"
数据集中的所有内容替换为NA
. 此外,删除重复和空行(如 50 行)会很好。
谢谢你的帮助 !
解决方案
像这样的东西?
library(data.table)
DT <- data.table::fread("https://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data",
fill = TRUE,
na.strings = "?")
推荐阅读
- java - 如何在spring application.properties中设置lombok.equalsAndHashCode.callSuper = call?
- android - getApplicationIcon() 通过 PackageManager 返回一个放置在中心的白条图像,它不是实际的应用程序图标
- java - 当模块 A 接收到模块 B 类并使用此 B 类的类加载器对其执行 getBundle() 时,MissingResourceException
- kubernetes - 我们可以将 Kubernetes 指向另一个集群吗
- python - subprocess.run() 不返回 stdout 或 stderr
- .net - 不带列标题高度的 Datagrid ActualHeight
- reactjs - 如何在 redux-saga 中将 url、params、headers 传递给 call()?
- html - 在有太多选择时,适应Select2的边界
- javascript - 尝试在 vuejs 中进行数据表分页的服务器端集成。但它不能正常工作
- reactjs - React 中两个具有相同键的孩子