How to get variable names and labels from a sas7bdat file into a data.frame

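Question

I am reading a set of SAS datasets into R. Is there code I can use to put the variable names and variable labels into a data.frame, something like a codebook?

I read the data with the haven package:

haven::read_sas

Does it store the labels somewhere? If so, how can I pull them out?

The data in R looks like this:

[image]

I would like to build a data.frame that looks like this:

[image]

Error message: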

<error/purrr_error_bad_element_vector>
Result 6 must be a single string, not NULL of length 0
Backtrace:
     x
  1. +-base::debug(list_of_labels <- lapply(datasets, label_lookup_map))
  2. +-base::lapply(datasets, label_lookup_map)
  3. | \-global::FUN(X[[i]], ...)
  4. |   \-tibble::tibble(col_name = df %>% names(), labels = df %>% map_chr(attr_getter("label")))
  5. |     \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
  6. |       \-rlang::eval_tidy(xs[[j]], mask)
  7. +-df %>% map_chr(attr_getter("label"))
  8. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  9. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 10. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 11. |     \-`_fseq`(`_lhs`)
 12. |       \-magrittr::freduce(value, `_function_list`)
 13. |         +-base::withVisible(function_list[[k]](value))
 14. |         \-function_list[[k]](value)
 15. |           \-purrr::map_chr(., attr_getter("label"))
 16. \-purrr:::stop_bad_element_vector(...)
 17.   \-purrr:::stop_bad_vector(...)
 18.     \-purrr:::stop_bad_type(...)

It looks like the error is caused by data like the following:

[image]

Sample data can be built with:

df<- structure(list(VISITNUM = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 14, 14, 14, 14), EXDOSE = c(36, 109, 182, 182, 
182, 182, 182, 55, 36, 55, 36, 55, 109, 182, 109, 182, 2600, 
2600, 2600, 2600), EXDOSU = c("mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg")), label = "EX                              ", row.names = c(NA, 
20L), class = "data.frame")

Tags: r

Answer

You may find this question helpful: Extract the labels attribute from "labeled" tibble columns from a haven import from Stata

Here is an example:

library(haven)
library(tidyverse)

airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")

label_lookup_map <- tibble(
  col_name = airline %>% names(),
  labels = airline %>% map_chr(attr_getter("label"))
)

print(label_lookup_map)
# # A tibble: 6 x 2
#   col_name labels
#   <chr>    <chr>
# 1 YEAR     year
# 2 Y        level of output
# 3 W        wage rate
# 4 R        interest rate
# 5 L        labor input
# 6 K        capital input
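
haven stores each variable label in the "label" attribute of the corresponding column, so the same lookup can also be written in base R. The following is a minimal sketch that runs without a SAS file; the data frame `toy` is a hypothetical stand-in for the result of read_sas(), with labels attached by hand.

```r
# Hypothetical stand-in for a data frame returned by haven::read_sas()
toy <- data.frame(YEAR = c(1948, 1949), Y = c(1.21, 1.35))
attr(toy$YEAR, "label") <- "year"
attr(toy$Y, "label") <- "level of output"

# Read the "label" attribute of every column into a two-column codebook.
# vapply() requires exactly one string per column, so this version
# assumes every column carries a label.
codebook <- data.frame(
  col_name = names(toy),
  labels = unname(vapply(toy, function(x) attr(x, "label"), character(1))),
  stringsAsFactors = FALSE
)
print(codebook)
```

Like map_chr(), vapply() stops with an error if any column has no label, so this sketch only suits fully labelled data.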

Edit: Following the comments, here is an example for getting the labels of several data.frames in a list when some of those data.frames have no labels.

library(haven)
library(tidyverse)

airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
cola <- read_sas("http://www.principlesofeconometrics.com/sas/cola.sas7bdat")
data(iris)

list_of_tbl <- list(airline, cola, iris)

get_labels <- attr_getter("label")

has_labels <- function(df) {
    !all(sapply(lapply(df, get_labels), is.null))
}

label_lookup_map <- function(df) {

    df_labels <- NA
    if (has_labels(df)) {
        df_labels <- df %>% map_chr(get_labels)
    }
 
  tibble(
    col_name = df %>% names,
    labels = df_labels
  )
}

list_of_labels <- lapply(list_of_tbl, label_lookup_map)

print(list_of_labels)
# [[1]]
# # A tibble: 6 x 2
#   col_name labels         
#   <chr>    <chr>          
# 1 YEAR     year           
# 2 Y        level of output
# 3 W        wage rate      
# 4 R        interest rate  
# 5 L        labor input    
# 6 K        capital input  

# [[2]]
# # A tibble: 5 x 2
#   col_name labels                                   
#   <chr>    <chr>                                    
# 1 ID       customer id                              
# 2 CHOICE   = 1 if brand chosen                      
# 3 PRICE    price of 2 liter soda                    
# 4 FEATURE  = 1 featured item at the time of purchase
# 5 DISPLAY  = 1 if displayed at time of purchase     

# [[3]]
# # A tibble: 5 x 2
#   col_name     labels
#   <chr>        <lgl> 
# 1 Sepal.Length NA    
# 2 Sepal.Width  NA    
# 3 Petal.Length NA    
# 4 Petal.Width  NA    
# 5 Species      NA 
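
The purrr_error_bad_element_vector error in the question appears when map_chr() meets a column whose "label" attribute is NULL. The function above can still hit that case when only some columns of a data frame are labelled, because has_labels() is TRUE as soon as one column has a label. A minimal sketch of a per-column fallback in base R follows; the helper name `label_lookup_safe`, the data frame `mixed`, and the label text are hypothetical.

```r
# Hypothetical helper: returns NA for any column without a "label" attribute
label_lookup_safe <- function(df) {
  labels <- vapply(df, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)[1]
  }, character(1))
  data.frame(
    col_name = names(df),
    labels = unname(labels),
    stringsAsFactors = FALSE
  )
}

# Hypothetical data: one labelled column, one unlabelled column
mixed <- data.frame(VISITNUM = c(4, 14), EXDOSE = c(36, 2600))
attr(mixed$EXDOSE, "label") <- "dose"

res <- label_lookup_safe(mixed)
print(res)

# Works on a list of data frames as well, labelled or not
list_of_labels_safe <- lapply(list(mixed, iris), label_lookup_safe)
```

Because the fallback is applied column by column, the labels column is always character, and no separate has_labels() check is needed.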
