Importing an Excel file into R

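Question

I have to import an Excel file into R, but every tutorial I have found so far deals with simple data tables, and mine is somewhat more complex. Can you help me with this?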

https://drive.google.com/file/d/1R5sVaP20MDLlaY6TLesrCj664wJYUhDG/view?usp=sharing

Thank you very much!

Tags: r

Answer


library(xlsx)
library(zoo)

# Read the dataset starting from the 3rd line of the first sheet
df <- read.xlsx("SO.xlsx", 1, header = TRUE, startRow = 3, stringsAsFactors = FALSE)

# Remove the rows that are not data, such as rows 4 and 66 in this dataset.
# This could be done in many ways. Here I assume that every real data row
# has values from the third column onwards, so a row with no value in the
# third column ("hallos") is dropped.
df <- df[!is.na(df$hallos), ]

# Assign the names to the first 2 columns
names(df)[1:2] <- c("year", "type")

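# The last 2 rows are summaries, so we probably want to remove them.
# (Matching on an accented label in df$type also works, but note that an
# empty pattern in grepl() matches every row.)
df <- head(df, -2)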

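# The first column "year" is only filled in on the first row of each year.
# Carry the last observed year forward into the empty cells.
# na.rm = FALSE keeps the vector the same length even if it starts with NA.
df$year <- na.locf(df$year, na.rm = FALSE)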
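
The three cleanup steps above can be tried without the Excel file. The following is only a minimal sketch on made-up data: the data frame `toy`, its `value` column, and the helper `fill_down` are hypothetical names introduced for illustration, and `fill_down` is a base R stand-in for `zoo::na.locf`, not the function the answer uses.

```r
# Toy data mimicking the sheet layout (all values are made up).
toy <- data.frame(
  year  = c(2013, NA, NA, NA, 2014, NA, NA),
  type  = c("J", "F", NA, "M", "J", "F", "Total"),
  value = c(28, 31, NA, 34, 40, 41, 174),
  stringsAsFactors = FALSE
)

# 1. Drop spacer rows: rows with no value in the data column.
toy <- toy[!is.na(toy$value), ]

# 2. Drop the trailing summary row by position.
toy <- head(toy, -1)

# 3. Forward-fill the year column in base R (hypothetical helper).
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # leading NAs (idx == 0) stay NA
}
toy$year <- fill_down(toy$year)

print(toy)
```

If the zoo package is available, `na.locf(toy$year, na.rm = FALSE)` should give the same result for step 3.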

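Note: because of the formatting limits of this text box, some accented characters are missing from the output below. In R itself, the column names and the symbols in the type column display correctly.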

# Result
head(df)
#   year type  hallos  slyos.   knny sszesen  meghalt  slyosan  knnyen  sszesen.1
# 2 2013    J      28     255    622      905      33      300     870       1203
# 3 2013    F      31     223    527      781      34      248     764       1046
# 4 2013    M      34     274    691      999      34      320     971       1325
# 5 2013    A      36     349    757     1142      42      392    1090       1524
# 6 2013   Mj      52     436    902     1390      54      501    1241       1796
# 7 2013    J      39     455   1004     1498      41      509    1414       1964
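
One way to sanity-check the import is to confirm that the fourth data column equals the sum of the three before it, which is what the printed rows suggest. This is a sketch under that assumption only. The data frame `chk` is a hypothetical name, the values are retyped by hand from the six rows above, and the column names are the accent-stripped ones from the printout (with the trailing dot in `slyos.` dropped).

```r
# The first four numeric columns of the six rows printed above.
chk <- data.frame(
  hallos  = c(28, 31, 34, 36, 52, 39),
  slyos   = c(255, 223, 274, 349, 436, 455),
  knny    = c(622, 527, 691, 757, 902, 1004),
  sszesen = c(905, 781, 999, 1142, 1390, 1498)
)

# Assumption: "sszesen" is a row total of the three columns before it.
rows_ok <- chk$hallos + chk$slyos + chk$knny == chk$sszesen
all(rows_ok)  # TRUE for these six rows
```

The same comparison can be run on the full `df` after import. A `FALSE` there would point to a row that was shifted or misread.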
