python - Normalizing only certain columns with the min-max or standard deviation method using Python or R
Problem description
I have a dataframe with 37 variables and 50,000 rows. It contains both categorical and numerical features. I would like to apply normalization to only some of the columns in the dataframe.
Here is a fake dataset:
diagnosis  gender  area    age  weight  score  compactness  class
447        1       95.88   50   117.66  674.8  80           0
167        0       109.3   65   118.8   886.3  35.6         2
444        0       117.5   80   160.85  990    64.2         2
100        0       88.05   35   94.98   582.7  35.23        1
227        1       97.45   40   15.51   684.5  70           1
I want to normalize only area, weight, score, and compactness, for example. How should I do it? By the way, I found a standard deviation method, but it is meant for the whole dataset. The code is:
# identify outliers with standard deviation
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from numpy import std
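# generate sample data (an assumption so the snippet runs; the post does not define `data`)
seed(1)
data = 5 * randn(10000) + 50
# calculate summary statistics
data_mean, data_std = mean(data), std(data)
# define the outlier cut-off (3 standard deviations from the mean)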
cut_off = data_std * 3
lower, upper = data_mean - cut_off, data_mean + cut_off
# identify outliers
outliers = [x for x in data if x < lower or x > upper]
print('Identified outliers: %d' % len(outliers))
# remove outliers
outliers_removed = [x for x in data if x >= lower and x <= upper]
print('Non-outlier observations: %d' % len(outliers_removed))
My question is: how can I apply normalization to only some columns in a dataframe? Thanks in advance for your help!
Solution
I actually have a function I wrote for automatic standardization. Here it is:
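n <- function(x) {
  d <- dim(x)
  c <- colMeans(x)
  # subtract each column's mean from that column
  xm <- sapply(1:d[2], function(i) x[, i] - c[i])
  # xm is x with the column means removed
  v <- var(xm)  # covariance matrix; the diagonal holds each column's sample variance
  # divide each centered column by its sample standard deviation
  xn <- sapply(1:d[2], function(i) xm[, i] / sqrt(v[i, i]))
  xn
}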
Then just apply this function to the columns you want:
tochange=c("age","weight","score")
df[,tochange]=n(df[,tochange])
> df
diagnosis gender area age weight score
[1,] 447 1 95.88 -0.2161373 0.3000106 -0.5282662
[2,] 167 0 109.30 0.5943775 0.3212536 0.7290858
[3,] 444 0 117.50 1.4048924 1.1048216 1.3455747
[4,] 100 0 88.05 -1.0266521 -0.1226130 -1.0757939
[5,] 227 1 97.45 -0.7564805 -1.6034728 -0.4706004
compactness class
[1,] 80.00 0
[2,] 35.60 2
[3,] 64.20 2
[4,] 35.23 1
[5,] 70.00 1
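Since the question also asks about Python, here is a rough pandas sketch of the same idea. The helper names `zscore` and `minmax` are made up for illustration, and the dataframe is just the five sample rows from the question. `zscore` uses the sample standard deviation (`ddof=1`) so that it should agree with the R function above; `minmax` rescales each selected column to the range [0, 1] and assumes no selected column is constant.

```python
import pandas as pd

# The five sample rows from the question.
df = pd.DataFrame({
    "diagnosis": [447, 167, 444, 100, 227],
    "gender": [1, 0, 0, 0, 1],
    "area": [95.88, 109.3, 117.5, 88.05, 97.45],
    "age": [50, 65, 80, 35, 40],
    "weight": [117.66, 118.8, 160.85, 94.98, 15.51],
    "score": [674.8, 886.3, 990.0, 582.7, 684.5],
    "compactness": [80.0, 35.6, 64.2, 35.23, 70.0],
    "class": [0, 2, 2, 1, 1],
})

# Columns the question wants normalized.
cols = ["area", "weight", "score", "compactness"]


def zscore(frame, columns):
    """Return a copy where `columns` are centered and divided by the sample std."""
    out = frame.copy()
    sub = out[columns].astype(float)
    # ddof=1 matches R's var()/sd(), which divide by n - 1.
    out[columns] = (sub - sub.mean()) / sub.std(ddof=1)
    return out


def minmax(frame, columns):
    """Return a copy where `columns` are rescaled to [0, 1] (assumes max > min)."""
    out = frame.copy()
    sub = out[columns].astype(float)
    out[columns] = (sub - sub.min()) / (sub.max() - sub.min())
    return out


df_z = zscore(df, cols)    # standard deviation method on the selected columns
df_mm = minmax(df, cols)   # min-max method on the selected columns
print(df_z)
print(df_mm)
```

Both helpers return a new dataframe and leave every column outside `columns` untouched, so the categorical features pass through unchanged. Calling `zscore(df, ["age", "weight", "score"])` should reproduce the values in the R output above.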