Visualizing Sets

Sets
Venn diagram
Author

Frederick Anyan

Published

June 24, 2023

Modified

August 30, 2024

In upset visualization, variables (columns) are treated as sets. Upset plots show the size of each set and the size of every combination, or intersection, of sets (not only pairwise overlaps), along with their aggregates. This makes the size and proportion of set memberships easy to communicate.

Example

  • Suppose you want the size or proportion of your sample who meet the cut-off for particular disorders, as well as the co-morbidity across disorders. Upset plots can show the proportion of the sample who meet criteria for panic disorder, separation anxiety disorder and selective mutism together, or for any other combination: for example, the proportion who meet criteria for panic disorder but not separation anxiety disorder or PTSD, or the proportion who meet criteria for PTSD but not depression or social anxiety disorder.
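
A proportion like the ones described above can also be computed directly, which is a useful check on what a plot shows. A minimal sketch in base R, assuming a hypothetical toy data frame `toy` of 0/1 indicators (the values are made up for illustration and are not the tutorial data):

```r
# Hypothetical toy data: 1 = meets criteria, 0 = does not
toy <- data.frame(
  panic      = c(1, 1, 0, 1, 0, 1),
  separation = c(0, 1, 0, 0, 1, 0),
  ptsd       = c(0, 0, 1, 0, 1, 1)
)

# Participants who meet criteria for panic, but not separation anxiety or PTSD
panic_only <- toy$panic == 1 & toy$separation == 0 & toy$ptsd == 0

n_panic_only    <- sum(panic_only)   # count of such participants
prop_panic_only <- mean(panic_only)  # proportion of the whole sample
```

Because `panic_only` is a logical vector, `sum()` gives the count and `mean()` gives the proportion.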

Data with multiple variables or their combinations are often displayed in a Venn diagram, and in some cases in an Euler diagram. Both become hard to read as the number of variables or sets increases. Upset plots show the size of each set, the frequencies of their overlaps or intersections, and their aggregates, which makes them well suited to communicating set memberships.
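
To make the limitation concrete: n sets can overlap in up to 2^n - 1 non-empty combinations, and a Venn diagram needs a separate region for each one. A quick check in R (the variable names are only illustrative):

```r
# Number of possible non-empty set combinations for 2 to 8 sets
n_sets <- 2:8
n_combinations <- 2^n_sets - 1
names(n_combinations) <- n_sets
n_combinations
```

With eight sets, that is up to 255 combinations, far more than a Venn diagram can show legibly.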

In this tutorial post, we will see how to use the upset() function in the UpSetR package to visualize intersecting sets.

Load data

Code
data <- read.csv("/Volumes/anyan-1/frederickanyan.github.io/quantpost_data/dataupset.csv") # Read the data; adjust the path to where your copy is stored

# head(data) # first few rows

The data contain the following disorders, each listed with the column name used for it:

  • Separation anxiety disorder - separation.
  • Selective mutism - mutism.
  • Specific phobia - phobia.
  • Social anxiety disorder - social.
  • Panic disorder - panic.
  • Generalized anxiety disorder - anxiety.
  • Depression - depression.
  • Post-traumatic stress disorder - ptsd.

Let’s rename the variables to these short names:

Code
names(data) <- c("separation", "mutism", "phobia", "social", "panic", "anxiety", "depression", "ptsd")

For this tutorial, a median split was used to categorize participants into two groups, ‘clinical’ and ‘non-clinical’, coded 1 and 0 respectively. For your own data, use established cut-off scores rather than a median split.
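
A median split of this kind could be sketched as follows. The raw scores, the helper name `median_split`, and the choice to code scores strictly above the median as 1 are assumptions for illustration; the post does not show how the split was computed or how ties at the median were handled.

```r
# Hypothetical raw symptom scores (made-up values)
scores <- data.frame(
  panic  = c(2, 5, 7, 1, 9, 4),
  mutism = c(0, 3, 3, 8, 1, 6)
)

# Code 1 ('clinical') if the score is above the column median, else 0 ('non-clinical')
median_split <- function(x) as.integer(x > median(x, na.rm = TRUE))

# Apply the split to every column and rebuild a data frame of 0/1 indicators
binary <- as.data.frame(lapply(scores, median_split))
binary
```

With an established cut-off, replace `median(x, na.rm = TRUE)` with the cut-off value, and check whether the cut-off itself counts as meeting criteria (`>=` rather than `>`).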

UpSetR package

Code
library(UpSetR) # Load the UpSetR package to access the upset() function

Now visualize the data using the upset() function.

Code
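upset(data,                      # Data frame of 0/1 set indicators
      nsets = 8,                 # Show all 8 sets in the upset plot
      matrix.color = "red",      # Colour of the intersection matrix points
      sets.bar.color = "blue",   # Colour of the set size bars
      order.by = "freq",         # Order intersections by frequency
      set_size.show = TRUE)      # Display set sizes on the set size bars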

The blue bar chart shows the total size of the sets (i.e., set size).
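
The set sizes behind these bars are the column totals of the 0/1 data. A sketch on a hypothetical toy data frame (made-up values); on the tutorial data the equivalent call would be `colSums(data)`, assuming every column is coded 0/1:

```r
# Hypothetical toy data: 1 = meets criteria, 0 = does not
toy <- data.frame(
  panic  = c(1, 1, 0, 1, 0),
  mutism = c(1, 0, 0, 1, 1),
  phobia = c(0, 1, 0, 1, 0)
)

# Set size = number of participants coded 1 in each column
set_size <- colSums(toy)

# Largest sets first
set_size_desc <- sort(set_size, decreasing = TRUE)
set_size_desc
```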

The red filled-in circles correspond to intersections or overlaps, showing which sets are part of each intersection, that is, which disorder overlaps with which other disorder(s). For example, panic and mutism form an intersection, as do panic and phobia. Panic, mutism, phobia and separation also form an intersection, and so on.

The black bar chart shows the occurrence or frequencies for each intersection (i.e., intersection size).
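
These intersection sizes can be reproduced by counting how many rows share the same 0/1 pattern across all sets: each bar counts the participants who belong to exactly the sets marked with filled circles and to no others. A minimal base R sketch on hypothetical toy data (made-up values, with `&` as an arbitrary separator for the labels):

```r
# Hypothetical toy data: 1 = meets criteria, 0 = does not
toy <- data.frame(
  panic  = c(1, 1, 0, 1, 0, 1),
  mutism = c(1, 0, 0, 1, 1, 1),
  phobia = c(0, 1, 0, 1, 0, 0)
)

# Label each row by the sets it belongs to, e.g. "panic&mutism"
membership <- apply(toy, 1, function(row) {
  paste(names(toy)[row == 1], collapse = "&")
})

# Drop participants who belong to no set, then count each pattern
intersection_size <- sort(table(membership[membership != ""]), decreasing = TRUE)
intersection_size
```

Sorting in decreasing order mirrors what `order.by = "freq"` does for the bars in the plot.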

Read more about upset plots