r - 如何从 sas7bdat 获取变量名称和标签到 data.frame
问题描述
我正在将一组 sas 数据读入 r。我想知道是否有一个代码可以用来将变量名称和变量标签放入 data.frame 中,或者像代码簿一样?
我使用 Haven 包读取数据
haven:read_sas
我想知道它是否将数据标签保存在某个地方。如果是这样,我可以把它拿出来吗?
r 中的数据如下所示:
我想构建一个如下所示的 data.frame:
错误代码:
<error/purrr_error_bad_element_vector>
Result 6 must be a single string, not NULL of length 0
Backtrace:
x
1. +-base::debug(list_of_labels <- lapply(datasets, label_lookup_map))
2. +-base::lapply(datasets, label_lookup_map)
3. | \-global::FUN(X[[i]], ...)
4. | \-tibble::tibble(col_name = df %>% names(), labels = df %>% map_chr(attr_getter("label")))
5. | \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
6. | \-rlang::eval_tidy(xs[[j]], mask)
7. +-df %>% map_chr(attr_getter("label"))
8. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
9. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
10. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
11. | \-`_fseq`(`_lhs`)
12. | \-magrittr::freduce(value, `_function_list`)
13. | +-base::withVisible(function_list[[k]](value))
14. | \-function_list[[k]](value)
15. | \-purrr::map_chr(., attr_getter("label"))
16. \-purrr:::stop_bad_element_vector(...)
17. \-purrr:::stop_bad_vector(...)
18. \-purrr:::stop_bad_type(...)
Itr 看起来错误是由如下所示的数据引起的:
样本数据可以通过
df<- structure(list(VISITNUM = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 14, 14, 14, 14), EXDOSE = c(36, 109, 182, 182,
182, 182, 182, 55, 36, 55, 36, 55, 109, 182, 109, 182, 2600,
2600, 2600, 2600), EXDOSU = c("mg", "mg", "mg", "mg", "mg", "mg",
"mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg",
"mg", "mg", "mg")), label = "EX ", row.names = c(NA,
20L), class = "data.frame")
解决方案
您可能会发现这个问题很有帮助:Extract the labels attribute from "labeled" tibble columns from ahave import from Stata
这是一个例子:
library(haven)
library(tidyverse)
airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
label_lookup_map <- tibble(
col_name = airline %>% names(),
labels = airline %>% map_chr(attr_getter("label"))
)
print(label_lookup_map)
# # A tibble: 6 x 2
# col_name labels
# <chr> <chr>
# 1 YEAR year
# 2 Y level of output
# 3 W wage rate
# 4 R interest rate
# 5 L labor input
# 6 K capital input
编辑:根据评论,如果您想在其中一些 data.frames 没有标签的列表中获取多个 data.frames 的标签,这是一个示例。
library(haven)
library(tidyverse)
airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
cola <- read_sas("http://www.principlesofeconometrics.com/sas/cola.sas7bdat")
data(iris)
list_of_tbl <- list(airline, cola, iris)
get_labels <- attr_getter("label")
has_labels <- function(df) {
!all(sapply(lapply(df, get_labels), is.null))
}
label_lookup_map <- function(df) {
df_labels <- NA
if (has_labels(df)) {
df_labels <- df %>% map_chr(get_labels)
}
tibble(
col_name = df %>% names,
labels = df_labels
)
}
list_of_labels <- lapply(list_of_tbl, label_lookup_map)
print(list_of_labels)
# [[1]]
# # A tibble: 6 x 2
# col_name labels
# <chr> <chr>
# 1 YEAR year
# 2 Y level of output
# 3 W wage rate
# 4 R interest rate
# 5 L labor input
# 6 K capital input
# [[2]]
# # A tibble: 5 x 2
# col_name labels
# <chr> <chr>
# 1 ID customer id
# 2 CHOICE = 1 if brand chosen
# 3 PRICE price of 2 liter soda
# 4 FEATURE = 1 featured item at the time of purchase
# 5 DISPLAY = 1 if displayed at time of purchase
# [[3]]
# # A tibble: 5 x 2
# col_name labels
# <chr> <lgl>
# 1 Sepal.Length NA
# 2 Sepal.Width NA
# 3 Petal.Length NA
# 4 Petal.Width NA
# 5 Species NA
推荐阅读
- python - TypeError: fun() 接受 1 个位置参数,但在 sciply.optimize.curve_fit 中给出了 2 个错误
- javascript - Promise 构造函数是同步的吗?
- excel - 计算VBA中的百分比
- reactjs - module.exports 在 react 16.12.0 中不再工作,在 16.7.0 中工作
- c# - @User.Identity.Name 在 _layout.cshtml 上的 IIS 上为空
- java - 无法使用 setValueAt() 编辑我的 JTable 数据
- amazon-dynamodb - 假设您在 DynamoDB 上编写类似 cognito 的用户池,您将如何对其进行分区以跨越多个公司?
- python - 如何在python中从真实IP变成自动IP范围?
- python - 通过比较时间和持续时间来合并数据框 pandas 中的行
- c# - 线程与资源检索的同步