r - 将 .dta 导入 R
问题描述
我目前正在尝试将数据从 .dta 导入到 R。数据也以 .tab 格式提供。如何将变量作为标签导入?
我正在使用的代码:
data <- read_dta("file.dta")
或者
data <- read.table("file.tab", header = F, sep = "\t", fill = TRUE)
有没有办法以 R 识别标签的方式导入变量?
样本数据
structure(list(sn1 = structure(c(1, 1, 1, 1, 1, 1), label = " point number ", format.stata = "%8.0g"),
wjob12s5 = structure(c(NA, NA, 0, 0, 0, 0), label = " day 5/ twelfth job line starts at ", format.stata = "%8.0g", labels = c(` 4:01 4:15 ` = 1,
` 4:16 4:30 ` = 2, ` 4:31 4:45 ` = 3, ` 4:46 5:00 ` = 4,
` 5:01 5:15 ` = 5, ` 5:16 5:30 ` = 6, ` 5:31 5:45 ` = 7,
` 5:46 6:00 ` = 8, ` 6:01 6:15 ` = 9, ` 6:16 6:30 ` = 10,
` 6:31 6:45 ` = 11, ` 6:46 7:00 ` = 12, ` 7:01 7:15 ` = 13,
` 7:16 7:30 ` = 14, ` 7:31 7:45 ` = 15, ` 7:46 8:00 ` = 16,
` 8:01 8:15 ` = 17, ` 8:16 8:30 ` = 18, ` 8:31 8:45 ` = 19,
` 8:46 9:00 ` = 20, ` 9:01 9:15 ` = 21, ` 9:16 9:30 ` = 22,
` 9:31 9:45 ` = 23, ` 9:46 10:00 ` = 24, ` 10:01 10:15 ` = 25,
` 10:16 10:30 ` = 26, ` 10:31 10:45 ` = 27, ` 10:46 11:00 ` = 28,
` 11:01 11:15 ` = 29, ` 11:16 11:30 ` = 30, ` 11:31 11:45 ` = 31,
` 11:46 12:00 ` = 32, ` 12:01 12:15 ` = 33, ` 12:16 12:30 ` = 34,
` 12:31 12:45 ` = 35, ` 12:46 13:00 ` = 36, ` 13:01 13:15 ` = 37,
` 13:16 13:30 ` = 38, ` 13:31 13:45 ` = 39, ` 13:46 14:00 ` = 40,
` 14:01 14:15 ` = 41, ` 14:16 14:30 ` = 42, ` 14:31 14:45 ` = 43,
` 14:46 15:00 ` = 44, ` 15:01 15:15 ` = 45, ` 15:16 15:30 ` = 46,
` 15:31 15:45 ` = 47, ` 15:46 16:00 ` = 48, ` 16:01 16:15 ` = 49,
` 16:16 16:30 ` = 50, ` 16:31 16:45 ` = 51, ` 16:46 17:00 ` = 52,
` 17:01 17:15 ` = 53, ` 17:16 17:30 ` = 54, ` 17:31 17:45 ` = 55,
` 17:46 18:00 ` = 56, ` 18:01 18:15 ` = 57, ` 18:16 18:30 ` = 58,
` 18:31 18:45 ` = 59, ` 18:46 19:00 ` = 60, ` 19:01 19:15 ` = 61,
` 19:16 19:30 ` = 62, ` 19:31 19:45 ` = 63, ` 19:46 20:00 ` = 64,
` 20:01 20:15 ` = 65, ` 20:16 20:30 ` = 66, ` 20:31 20:45 ` = 67,
` 20:46 21:00 ` = 68, ` 21:01 21:15 ` = 69, ` 21:16 21:30 ` = 70,
` 21:31 21:45 ` = 71, ` 21:46 22:00 ` = 72, ` 22:01 22:15 ` = 73,
` 22:16 22:30 ` = 74, ` 22:31 22:45 ` = 75, ` 22:46 23:00 ` = 76,
` 23:01 23:15 ` = 77, ` 23:16 23:30 ` = 78, ` 23:31 23:45 ` = 79,
` 23:46 0:00 ` = 80, ` 0:01 0:15 ` = 81, ` 0:16 0:30 ` = 82,
` 0:31 0:45 ` = 83, ` 0:46 1:00 ` = 84, ` 1:01 1:15 ` = 85,
` 1:16 1:30 ` = 86, ` 1:31 1:45 ` = 87, ` 1:46 2:00 ` = 88,
` 2:01 2:15 ` = 89, ` 2:16 2:30 ` = 90, ` 2:31 2:45 ` = 91,
` 2:46 3:00 ` = 92, ` 3:01 3:15 ` = 93, ` 3:16 3:30 ` = 94,
` 3:31 3:45 ` = 95, ` 3:46 4:00 ` = 96), class = c("haven_labelled",
"vctrs_vctr", "double"))), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
解决方案
如果 read_dta 失败了,你可以在“labelled”包的帮助下创建一个查找表。请注意,我在这里使用了您的示例数据框的删节和稍微调整的版本:
图书馆(标记)图书馆(tidyverse)
df <-
structure(list(sn1 = structure(c(1, 1, 1, 1, 1, 1),
label = " point number ",
format.stata = "%8.0g"),
wjob12s5 = structure(c(NA, NA, 0, 1, 2, 3),
label = " day 5/ twelfth job line starts at ",
format.stata = "%8.0g",
labels = c(` 4:01 4:15 ` = 0, ` 4:16 4:30 ` = 2, ` 4:31 4:45 ` = 3),
class = c("haven_labelled", "vctrs_vctr", "double"))),
row.names = c(NA, -6L),
class = c("tbl_df","tbl", "data.frame"))
lookup_table <- data.frame(
code = val_labels(df$wjob12s5),
labels = val_labels(df$wjob12s5) %>% names())
df %>% left_join(lookup_table, by = c("wjob12s5" = "code"))
推荐阅读
- hibernate-search - 在休眠搜索中对嵌入的实体名称进行排序
- activemq - Python程序使用Stomp协议连接ActiveMQ不断断开连接
- bash - Shell 脚本 - 使用 bash 对从 aws cloudwatch 统计数据的 json 提取的统计数据运行操作
- python - 尝试从 FTP 文件列表中匹配正则表达式文件名
- intellij-idea - IntelliJ 代码样式设置中的视觉指南
- c# - 如何在 C# 中获取用户输入并更新 Google 电子表格
- javascript - 如果对象值`num`大于数组中的`quota`,则将类添加到与我的数组对象具有相同#id的元素
- android - Firebase Crashlytics SDK 无法识别 Crashlytics 类
- sh - 当为 sh -c 和 su 传入字符串时,如何让管道工作
- flutter - 如何映射整个文档子集合的数据并使其成为 Flutter 中的流?