How can I graphically measure the relationship between variables?

Problem description

First, here is my data:

dput(A22[1:10,])
structure(list(var1 = c("2.655086631421", "3.7212389963679",
"5.72853363351896", "9.08207789994776", "2.01681931037456", "8.98389684967697",
"9.44675268605351", "6.60797792486846", "6.29114043898881", "0.617862704675645"
), var2 = c("1552.74486613787", "-2569.05222968964", "444.924755180376",
"-30903.126560766", "5712.55164894465", "-15996.3316364127",
"-39466.7802848889", "-6396.48804278828", "662.572855848352",
"-542.783293142592"), var3 = c("12.0761815621956", "15.531955650981",
"24.3703946694194", "38.692940909924", "1.13425531130685", "37.6187150619221",
"48.2338786451232", "27.554822845155", "22.9179948054061", "7.56647601307255"
), var4 = c("0.136221893102778", "0.407167603423836", "-0.0696548130129049",
"-0.247664341619331", "0.69555080661964", "1.1462283572158",
"-2.40309621489187", "0.572739555245841", "0.374724406778655",
"-0.425267721556076"), gruppe = c("0", "0", "0", "1", "1", "1",
"1", "1", "0", "0")), row.names = c(NA, 10L), class = "data.frame")

I understand the data as follows: there are two different groups (group 0 and group 1), and var1, var2, var3 and var4 were measured in each group.

My tasks are:

(a) Visualize the distribution of var1. Do you recognize group-specific differences?
(b) Graphically analyze the relationship between the following variables:
   (i) var1 and var3
   (ii) var1 and var2
   (iii) var1 and var4
(c) Calculate an introductory measure to measure the relationship.

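My main problem is that I don't know how to extract a single variable from my data (for example var1 in (a)), condition it on the group, and then visualize it.

I would be glad to get some help.

Best regards and have a nice weekend.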

Tags: r, ggplot2

Solution


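Since this question is tagged ggplot2, I will answer with ggplot2 in mind. The data you provided is in so-called "wide format", as opposed to "long format". ggplot2 works better with long-format data, so the first step is to reshape it with tidyr::pivot_longer(). You can then draw a faceted kernel density plot that shows the distribution of each variable in its own facet, with the two groups distinguished by fill colour. Note that all columns in your data are stored as character strings, which is why the values are passed through as.numeric() when plotting.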

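# Reshape from wide to long: one row per (gruppe, name, value) combination
df <- tidyr::pivot_longer(A22, cols = c("var1", "var2", "var3", "var4"))
# or: df <- tidyr::pivot_longer(A22, !gruppe)

library(ggplot2)

# The values are character strings, so convert them to numeric for plotting
ggplot(df, aes(as.numeric(value), fill = as.factor(gruppe))) +
  geom_density(alpha  = 0.3) +
  # One facet per variable; free scales because the ranges differ widely
  facet_wrap(~ name, scales = "free")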
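
If you only need var1, or want to continue with tasks (b) and (c), you can also work directly on the wide data. The following is a minimal sketch rather than the only way to do it. It rebuilds the ten rows from the question so that it runs on its own. The names A22_num, num_cols, p_var1, p_scatter and cors are made up for illustration, and using Pearson's correlation as the "introductory measure" in (c) is my assumption about what the task asks for.

```r
library(ggplot2)

# Stand-in for A22: the ten rows shown in the question's dput() output
A22 <- data.frame(
  var1 = c("2.655086631421", "3.7212389963679", "5.72853363351896",
           "9.08207789994776", "2.01681931037456", "8.98389684967697",
           "9.44675268605351", "6.60797792486846", "6.29114043898881",
           "0.617862704675645"),
  var2 = c("1552.74486613787", "-2569.05222968964", "444.924755180376",
           "-30903.126560766", "5712.55164894465", "-15996.3316364127",
           "-39466.7802848889", "-6396.48804278828", "662.572855848352",
           "-542.783293142592"),
  var3 = c("12.0761815621956", "15.531955650981", "24.3703946694194",
           "38.692940909924", "1.13425531130685", "37.6187150619221",
           "48.2338786451232", "27.554822845155", "22.9179948054061",
           "7.56647601307255"),
  var4 = c("0.136221893102778", "0.407167603423836", "-0.0696548130129049",
           "-0.247664341619331", "0.69555080661964", "1.1462283572158",
           "-2.40309621489187", "0.572739555245841", "0.374724406778655",
           "-0.425267721556076"),
  gruppe = c("0", "0", "0", "1", "1", "1", "1", "1", "0", "0"),
  stringsAsFactors = FALSE
)

# Convert the measurement columns from character to numeric once,
# and turn the group indicator into a factor (names are illustrative)
num_cols <- c("var1", "var2", "var3", "var4")
A22_num <- A22
A22_num[num_cols] <- lapply(A22_num[num_cols], as.numeric)
A22_num$gruppe <- factor(A22_num$gruppe)

# (a) Distribution of var1 only, one density curve per group
p_var1 <- ggplot(A22_num, aes(x = var1, fill = gruppe)) +
  geom_density(alpha = 0.3)

# (b)(i) Scatter plot of var1 against var3, coloured by group;
# swap y for var2 or var4 to get (ii) and (iii)
p_scatter <- ggplot(A22_num, aes(x = var1, y = var3, colour = gruppe)) +
  geom_point()

# (c) Pearson correlation between var1 and each of the other variables
# (assumption: this is the kind of measure the task has in mind)
cors <- sapply(c("var2", "var3", "var4"),
               function(v) cor(A22_num$var1, A22_num[[v]]))

print(p_var1)
print(p_scatter)
print(cors)
```

A single column is extracted with A22_num$var1 or A22_num[["var1"]]. Mapping gruppe to fill or colour inside aes() is what conditions the plot on the group. If the relationship looks monotone but not linear in the scatter plot, cor(..., method = "spearman") may be a more suitable choice.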

