r - Data wrangling in Shiny: plotting newly wrangled data after k-means clustering analysis
Problem description
I am trying to build a data analytics dashboard using Shiny, which I am relatively new to. One of the features of my dashboard uses k-means clustering on user-generated data. I can get the clustering analysis to work fine, but I want to be able to do exploratory data analysis on individual clusters once the initial cluster analysis has been done. I would also like to do this with reactive data frames in Shiny, so that if the user changes a value on the dashboard, the analysis refreshes, including the post-clustering exploratory stuff.
Before anything, here are some functions that I use in the dashboard server code and relevant libraries, so run these first:-
#libraries===================================================================
library(ids)
library(tidyverse)
library(dplyr)
library(shiny)
library(ggplot2)
library(shinydashboard)
library(shinyWidgets)
library(factoextra)
#functions required==========================================================
#scale https://stackoverflow.com/questions/35775696/trying-to-use-dplyr-to-group-by-and-apply-scale
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
#wss plot
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc,
       wss,
       type = "b",
       xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
Here is the code for the mock data frame for this example:-
#Create my mock data frame============================================
set.seed(123)
randomid<-random_id(333)#from 'ids' library
Duration<-c(floor(runif(10000, min=1, max=1000)))
mockdf<-cbind(randomid, Duration)
mockdf<-as.data.frame(mockdf)
mockdf$Duration<-as.numeric(mockdf$Duration)
My UI code:-
#UI============================================================================
ui <- fluidPage(
  titlePanel('Minimal example'),
  tabsetPanel(
    #=============================================kmeans clustering==================================================
    tabPanel("User Type Discovery",
      sidebarLayout(
        sidebarPanel(width = 4,
          numericInput('ksolution', 'Select k solution', 5),
          pickerInput('userselect', 'Which users do you want to include:',
                      choices = unique(mockdf$randomid),
                      options = list('actions-box' = TRUE), multiple = TRUE)),
        mainPanel(fluidRow(
          column(12, plotOutput("elbowplot")),
          column(12, plotOutput("clustplot")),
          column(12, plotOutput("clust_dens")),
          column(12, DT::dataTableOutput('Clusterdf'))))
      )
    )
  )
)
And my server code:-
#SERVER===========================================================
server <- function(input, output, session) {
  #create reactive dataframe
  rval_df <- reactive({
    mockdf
  })
  #=============================================kmeans clustering==================================================
  rval_UserData <- reactive({
    rval_df() %>%
      filter(randomid %in% input$userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Cluster = as.factor(rval_kclust()$cluster))
  })
  #create a scaled dataset for the clustering
  rval_cluster_df <- reactive({
    rval_df() %>%
      filter(randomid %in% input$userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Count = scale_this(Count),
             MeanDuration = scale_this(MeanDuration),
             SDDuration = scale_this(SDDuration)) %>%
      select(Count, MeanDuration, SDDuration)
  })
  #cluster algorithm
  rval_kclust <- reactive({
    kmeans(rval_cluster_df(), centers = input$ksolution)
  })
  output$clustplot <- renderPlot({
    factoextra::fviz_cluster(rval_kclust(), data = rval_cluster_df())
  })
  output$elbowplot <- renderPlot({
    wssplot(rval_cluster_df())
  })
  output$Clusterdf <- DT::renderDataTable({
    rval_UserData()
  })
}
shinyApp(ui, server)
When you run shinyApp(ui, server), hit the "Select All" button in the drop-down box in the app to run the clustering.
Now, here is what I want to do. Since I have assigned the cluster number back onto rval_UserData(), I want to be able to merge the cluster number back onto mockdf, so I can generate plots using ggplot2 on the Duration variable and also generate summary tables, all at cluster level. I would prefer to do this using reactive data frames, so that the plots refresh depending on the ksolution input in the UI.
Here is one of my attempts to merge the cluster number back onto mockdf, followed by an attempt to draw a density plot:-
rval_cluster_merged_df <- reactive({
  merge(mockdf(), rval_UserData(), by = "randomid")
  #outside of shiny, this would be a quick way to paste the cluster number back onto the mock dataframe
})
output$clust_dens <- renderPlot({
  dd <- rval_cluster_merged_df()
  ggplot(dd, aes(x = Duration, colour = Cluster, group = Cluster)) +
    geom_density() + ggtitle("Cluster density plot") + scale_x_log10()
})
And this is what I get, see the error message:- [screenshot of the error message not preserved]
It's probably something obvious that I am doing wrong but any pointers in the right direction would be well appreciated! Thank you in advance :)
Solution
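You need to use req() for all the input$abc variables, so that each reactive waits until the input actually has a value, and eval_tidy() (from the rlang package), as they are not standard variables. In addition, the merge has to call rval_df() rather than mockdf(): mockdf is a plain data frame, not a reactive, so it cannot be called like a function. A minor update to your server function, as shown below, will solve your problem.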
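To make the role of req() a little more concrete, here is a small sketch run outside of any app. It only uses shiny::isTruthy() and shiny::req(); the example values are assumptions about what the two inputs hold before the user has touched them (nothing selected in the picker, an emptied numeric box), not something taken from the question.

```r
library(shiny)

# Values an input may hold before the user has interacted with it
isTruthy(NULL)           # FALSE: no users selected in the picker yet
isTruthy(character(0))   # FALSE: empty selection
isTruthy(NA)             # FALSE: e.g. a numeric box that has been cleared

# Values after the user has made a choice
isTruthy(c("a", "b"))    # TRUE
isTruthy(5)              # TRUE

# req() stops the calling code when its argument is not truthy.
# Inside a reactive, Shiny handles this silently; here it is caught by hand.
result <- tryCatch(
  {
    req(NULL)
    "continued"
  },
  condition = function(e) "stopped"
)
result

# When the value is truthy, req() simply returns it
selected <- req(c("a", "b"))
selected
```

With this guard in place, kmeans() is not called on an empty data frame while the picker is still empty. The updated server function follows.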
server <- function(input, output, session) {
  #create reactive dataframe
  rval_df <- reactive({
    mockdf
  })
  #=============================================kmeans clustering==================================================
  rval_UserData <- reactive({
    req(input$userselect)  # wait until at least one user is selected
    userselect <- rlang::eval_tidy(input$userselect)
    rval_df() %>%
      filter(randomid %in% userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Cluster = as.factor(rval_kclust()$cluster))
  })
  #create a scaled dataset for the clustering
  rval_cluster_df <- reactive({
    req(input$userselect)
    userselect <- rlang::eval_tidy(input$userselect)
    rval_df() %>%
      filter(randomid %in% userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Count = scale_this(Count),
             MeanDuration = scale_this(MeanDuration),
             SDDuration = scale_this(SDDuration)) %>%
      select(Count, MeanDuration, SDDuration)
  })
  #cluster algorithm
  rval_kclust <- reactive({
    req(input$ksolution)  # wait until a k value is available
    centers <- as.numeric(rlang::eval_tidy(input$ksolution))
    kmeans(rval_cluster_df(), centers = centers)
  })
  output$clustplot <- renderPlot({
    factoextra::fviz_cluster(rval_kclust(), data = rval_cluster_df())
  })
  output$elbowplot <- renderPlot({
    wssplot(rval_cluster_df())
  })
  output$Clusterdf <- DT::renderDataTable({
    rval_UserData()
  })
  #merge the cluster number back onto the row-level data;
  #rval_df() is the reactive, mockdf itself is not callable
  rval_cluster_merged_df <- reactive({
    merge(rval_df(), rval_UserData(), by = "randomid")
  })
  output$clust_dens <- renderPlot({
    dd <- rval_cluster_merged_df()
    ggplot(dd, aes(x = Duration, colour = Cluster, group = Cluster)) +
      geom_density() + ggtitle("Cluster density plot") + scale_x_log10()
  })
}
Final output will be: [screenshot of the running app not preserved]
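The question also asks for summary tables at cluster level. As a rough sketch of that step outside Shiny, the same summarise, cluster, and join sequence can be tried on a small data frame first. Everything named here (toy_df, user_summary, merged, cluster_table, the choice of k = 3 and nstart = 10) is hypothetical and only stands in for mockdf and the reactives above.

```r
library(dplyr)

set.seed(123)

# Hypothetical stand-in for mockdf: 20 users, 400 rows
toy_df <- data.frame(
  randomid = sample(paste0("user", 1:20), 400, replace = TRUE),
  Duration = floor(runif(400, min = 1, max = 1000))
)

# One row per user, as in rval_UserData()
user_summary <- toy_df %>%
  group_by(randomid) %>%
  summarise(Count = n(),
            MeanDuration = mean(Duration),
            SDDuration = sd(Duration)) %>%
  mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration))

# Scale the three features and cluster the users
scaled <- user_summary %>%
  select(Count, MeanDuration, SDDuration) %>%
  scale()
k <- 3
km <- kmeans(scaled, centers = k, nstart = 10)

# Attach the cluster label to each user ...
user_summary$Cluster <- factor(km$cluster)

# ... and carry it back onto every original row
merged <- left_join(toy_df,
                    select(user_summary, randomid, Cluster),
                    by = "randomid")

# Cluster-level summary table of the row-level Duration values
cluster_table <- merged %>%
  group_by(Cluster) %>%
  summarise(Users = n_distinct(randomid),
            Rows = n(),
            MedianDuration = median(Duration),
            MeanDuration = mean(Duration))
cluster_table
```

In the app, the last step could be wrapped in its own reactive that reads rval_cluster_merged_df() and is shown with another DT::renderDataTable(), so the table refreshes together with the density plot whenever ksolution changes.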