Wrangling data into long format in R

Problem description

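I have a source data set from a Nature article. How can I extract the values in rows 4 and 12 into long format, with each value paired with its assigned group (i.e. Inefficient/Efficient)?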

This is the code I use to import the data into R:


# load the required libraries 
library(ggsignif) 
library(readxl) 
library(svglite) 
library(tidyverse) 
library(tidyr) 
library(dplyr) 

# The paper from which the figure is taken is Tasdogen et al. (2020)
# Metabolic heterogeneity confers differences in melanoma metastatic potential 

# The figure is 2b and can be accessed at 
# https://www.nature.com/articles/s41586-019-1847-2#MOESM3 

# The link to the raw (source) data used in the article is given below; the file is downloaded and imported directly for plotting

url <-'https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-019-1847-2/MediaObjects/41586_2019_1847_MOESM3_ESM.xlsx' 

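# Download the Excel workbook to a temporary file and read it into a data frame
# (read_excel() reads the first sheet by default; use `sheet =` to pick another)
temp <- tempfile(fileext = ".xlsx") 

download.file(url, temp, mode = 'wb') 

myData <- read_excel(path = temp) 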

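I don't know how to insert an image of the data set, but it should appear when you run the code above. I need columns 2 to 31 for Efficient and columns 2 to 37 for Inefficient.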
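The extraction described above can be sketched directly with row and column indexing. This is a minimal sketch, not the answer's method, and it rests on assumptions: that each block's group labels sit in a separate row above its values (rows 2 and 10, as the answer below infers), that the Inefficient values are in row 4 (columns 2 to 37) and the Efficient values in row 12 (columns 2 to 31). `extract_block()` is a hypothetical helper, and `toy` is made-up placeholder data standing in for `myData`, so the indices in the runnable call differ from the real ones shown in the final comment.

```r
library(tibble)
library(dplyr)

# Hypothetical helper (not from any package): pair the group labels in
# `label_row` with the values in `value_row`, for the column positions `cols`.
extract_block <- function(df, label_row, value_row, cols) {
  tibble(
    Group = unlist(df[label_row, cols], use.names = FALSE),
    value = as.numeric(unlist(df[value_row, cols], use.names = FALSE))
  )
}

# Made-up stand-in for myData: a label column followed by character cells.
# Here each block's labels sit directly above its values (rows 1/2 and 3/4).
toy <- tribble(
  ~label,        ~c2,           ~c3,           ~c4,
  "Group",       "Inefficient", "Inefficient", "Inefficient",
  "Glucose m+6", "0.5",         "0.4",         "0.3",
  "Group",       "Efficient",   "Efficient",   NA,
  "Glucose m+6", "0.2",         "0.1",         NA
)

# Stack both blocks into one long tibble with columns Group and value
long <- bind_rows(
  extract_block(toy, label_row = 1, value_row = 2, cols = 2:4),
  extract_block(toy, label_row = 3, value_row = 4, cols = 2:3)
)
long

# On the real sheet, under the assumptions above, the call would be:
# bind_rows(extract_block(myData, 2, 4, 2:37), extract_block(myData, 10, 12, 2:31))
```

Passing an explicit column range per block means the empty cells beyond the shorter block are never read, so no `NA` filtering is needed afterwards.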

I hope this is enough information for people to understand what I mean.

Tags: r, data-wrangling

Solution


It may not be pretty, but I believe this solves it using only the readxl and tidyverse packages:

# Name of the first column, which holds the row labels ("Group", "Glucose m+6")
key <- names(myData)[1]

# Select the first set of rows: group labels (row 2) and values (row 4)
set1 <- 
  myData %>% 
  filter(row_number() %in% c(2, 4))

# Select the second set of rows: group labels (row 10) and values (row 12)
set2 <- 
  myData %>% 
  filter(row_number() %in% c(10, 12))

# Join both sets on the label column, so that all group labels are in one row and all values are in one row
left_join(set1, set2, by = key) %>% 
  # Pivot every data column into long format: one row per (label, original column) pair
  pivot_longer(cols = !all_of(key)) %>% 
  # Pivot wider so the labels "Group" and "Glucose m+6" become the column names
  pivot_wider(names_from = all_of(key), values_from = value) %>% 
  # Remove the helper column holding the old column names
  select(-name)
# A tibble: 72 x 2
   Group       `Glucose m+6`      
   <chr>       <chr>              
 1 Inefficient 0.48499999999999999
 2 Inefficient 0.47399999999999998
 3 Inefficient 0.48799999999999999
 4 Inefficient 0.45600000000000002
 5 Inefficient 0.53100000000000003
 6 Inefficient 0.318              
 7 Inefficient 0.26600000000000001
 8 Inefficient 0.30399999999999999
 9 Inefficient 0.309              
10 Inefficient 0.33               
# ... with 62 more rows
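To see what each step of the answer does without downloading the workbook, here is a minimal sketch that runs the same join-and-pivot pipeline on a small made-up tibble (`toy`, with a short label column `key` standing in for the long first-column name; all values are placeholders). It adds two steps that are not part of the answer: dropping rows whose value is `NA`, which the shorter Efficient block may produce because read_excel() returns `NA` for empty cells, and converting the character values to numbers.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up stand-in for the sheet: `key` holds the row labels, the other
# columns are character cells, and the second block is one column shorter.
toy <- tribble(
  ~key,          ~c2,           ~c3,           ~c4,
  "Group",       "Inefficient", "Inefficient", "Inefficient",
  "Glucose m+6", "0.5",         "0.4",         "0.3",
  "Group",       "Efficient",   "Efficient",   NA,
  "Glucose m+6", "0.2",         "0.1",         NA
)

set1 <- toy %>% filter(row_number() %in% c(1, 2))
set2 <- toy %>% filter(row_number() %in% c(3, 4))

tidy <- left_join(set1, set2, by = "key") %>%            # 2 rows: labels, values
  pivot_longer(cols = !key) %>%                          # one row per (label, old column)
  pivot_wider(names_from = key, values_from = value) %>% # columns: name, Group, Glucose m+6
  select(-name) %>%
  # Extra steps (not in the answer): drop padding cells, convert text to numbers
  filter(!is.na(`Glucose m+6`)) %>%
  mutate(`Glucose m+6` = as.numeric(`Glucose m+6`))
tidy
```

The row order follows the joined column order (all `.x` columns from the first block, then all `.y` columns from the second), which is why every row from the first block comes before those from the second.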