Correlation analysis: understanding the relationships among our variables

Author

Julie Nguyen

In this notebook, I present the correlation analyses behind our research, “Benevolent Sexism and the Gender Gap in Startup Evaluation”. We ask: does benevolent sexism skew evaluators’ views on the viability of startups led by men versus women? To answer this, we ran three experimental studies in which participants were randomly assigned to evaluate startups led by either men or women, while we separately measured their levels of benevolent and hostile sexism.

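Key variables include benevolent sexism (BS), hostile sexism (HS), perceived startup viability (viable), and the amount participants were prepared to invest (Invest).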

Navigating the analysis:

Our goal is to calculate linear and partial correlations among the key variables in our three studies. We first delve into Study 1’s dataset to establish a benchmark for our correlation analyses, setting the stage to extend and automate these methods across Studies 2 and 3.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
# Load the necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(Hmisc)
library(psych)
library(cowplot)
library(corrplot)
corrplot 0.92 loaded
# Load the data for three studies 
study_1 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 1/Data/R data/study_1.rds")
study_2 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 2/Data/R data/study_2.rds")
study_3 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 3/Data/R data/study_3.rds")

Kicking off with Study 1

Welcome to our exploration into correlation analyses! 🌟 Today, we’re diving into the Study 1 dataset to tease out insights from the correlations among our key variables. Let’s get acquainted with our data:

# Display the first few rows of the dataset to understand the structure and types of variables included in Study 1.
study_1 %>% select(id, Condition, BS, HS, sex, viable, Invest) %>% head() %>% kableExtra::kable()
id Condition BS HS sex viable Invest
116899 1 2.545454 2.636364 1 5.5 44085
116977 1 2.363636 2.000000 1 5.5 50000
117031 1 2.727273 3.181818 1 5.0 60845
110215 1 3.636364 1.272727 1 7.0 40141
112513 1 2.000000 2.727273 0 5.5 20000
117004 1 3.181818 4.000000 0 5.0 30000

Here’s a quick look at our data. Each row captures details about an individual participant: their ID (id), the experimental condition they were assigned to (Condition), their levels of agreement with benevolent (BS) and hostile sexism (HS), their gender (sex), their assessment of the startup’s viability (viable), and how much they were prepared to invest (Invest).

Now, let’s dive into the correlations to see if there are any significant linear relationships:

# Calculating Pearson correlations among selected variables to identify any significant relationships.
cor_matrix <- rcorr(as.matrix(study_1 %>% select(BS, HS, viable, Invest)), type = "pearson")
cor_matrix
         BS    HS viable Invest
BS     1.00  0.41   0.07   0.06
HS     0.41  1.00  -0.08  -0.06
viable 0.07 -0.08   1.00   0.47
Invest 0.06 -0.06   0.47   1.00

n
        BS  HS viable Invest
BS     388 388    388    387
HS     388 388    388    387
viable 388 388    388    387
Invest 387 387    387    387

P
       BS     HS     viable Invest
BS            0.0000 0.1404 0.2775
HS     0.0000        0.0999 0.2785
viable 0.1404 0.0999        0.0000
Invest 0.2775 0.2785 0.0000       

People who endorse benevolent sexism often also endorse hostile sexism, but neither form of sexism significantly correlates with perceptions of startup viability or investment levels.
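The n table above shows 387 rather than 388 for every pair involving Invest. That is the pattern pairwise deletion produces when one participant is missing a value on a single variable: each correlation uses all rows where both variables are observed. Here is a minimal sketch of the idea on simulated data. The data frame `d` is a hypothetical stand-in, not the study data, and the example uses base R rather than rcorr.

```r
# Simulated stand-in data with one missing Invest value (hypothetical)
set.seed(7)
d <- data.frame(BS = rnorm(10), HS = rnorm(10), Invest = rnorm(10))
d$Invest[3] <- NA

# Number of rows where both variables of each pair are observed
observed <- !is.na(as.matrix(d))
pairwise_n <- crossprod(observed * 1)
pairwise_n

# Correlations computed on those pairwise-complete rows
cor(d, use = "pairwise.complete.obs")
```

In this toy example every pair involving Invest is based on 9 rows, while the BS and HS pair keeps all 10.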

Let’s create a heatmap of the correlation matrix, which is a clear and intuitive way to see the relationships between the variables.

# Visualize the correlation matrix with corrplot
corrplot(cor_matrix$r, # Extract the correlation matrix for plotting
         method = "color", # Use color tiles to show correlation values
         type = "upper", # Show the upper part of the correlation matrix
         order = "hclust", # Order variables based on hierarchical clustering
         tl.col = "black", # Color of text labels
         tl.srt = 45, # Rotation of text labels
         col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200), # Define color scale: Red for positive, blue for negative
         addCoef.col = "black")  # Add correlation coefficients to the plot for clarity

To delve deeper, we’ll also look at partial correlations to understand the relationships between two variables while controlling for the influence of another. First up, let’s control for hostile sexism and explore the relationship between benevolent sexism and our outcomes (startup viability and investment levels):

# Examining the influence of Benevolent Sexism on startup viability and investment decisions, controlling for Hostile Sexism.
study_1 %>% 
  select(BS, viable, Invest, HS) %>% 
  psych::partial.r(c("BS", "viable", "Invest"),"HS") %>% # Calculate partial correlations between 'BS', 'viable', 'Invest' controlling for 'HS'.
  psych::corr.p(n=nrow(study_1)) # Compute p-values for these partial correlations based on the sample size of study_1.
Call:psych::corr.p(r = ., n = nrow(study_1))
Correlation matrix 
partial correlations 
         BS viable Invest
BS     1.00   0.12   0.09
viable 0.12   1.00   0.47
Invest 0.09   0.47   1.00
Sample Size 
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         BS viable Invest
BS     0.00   0.03   0.09
viable 0.02   0.00   0.00
Invest 0.09   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

Turns out, once we remove the effects of hostile sexism, benevolent sexism is positively correlated with perceived startup viability (partial r = 0.12, adjusted p = 0.03) but is not significantly related to investment amounts (partial r = 0.09, p = 0.09).
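To make the idea of "controlling for" concrete, here is a minimal sketch of the first-order partial correlation formula, which needs only the three pairwise correlations. The helper name `partial_from_r` and the simulated vectors are hypothetical and only for illustration; the analysis above relies on psych::partial.r.

```r
# First-order partial correlation of x and y, controlling for z,
# computed from the three pairwise (zero-order) correlations.
partial_from_r <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Rounded Study 1 values from the Pearson matrix above:
# r(BS, viable) = 0.07, r(BS, HS) = 0.41, r(HS, viable) = -0.08
partial_from_r(r_xy = 0.07, r_xz = 0.41, r_yz = -0.08)

# Check the formula against the residual definition on simulated data:
# correlate what is left of x and y after regressing each on z.
set.seed(1)
z <- rnorm(200)
x <- 0.5 * z + rnorm(200)
y <- -0.2 * z + 0.3 * x + rnorm(200)
by_formula <- partial_from_r(cor(x, y), cor(x, z), cor(y, z))
by_residuals <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
all.equal(by_formula, by_residuals)
```

With the rounded inputs the formula gives roughly 0.11, close to the 0.12 reported above; the small gap most likely reflects rounding of the inputs. Because BS and HS are positively correlated while HS and viability are negatively correlated, removing HS raises the BS and viability correlation above its zero-order value of 0.07.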

Next, we control for benevolent sexism to explore how hostile sexism alone might affect startup outcomes:

# Exploring the relationship between Hostile Sexism and key outcomes, controlling for Benevolent Sexism.
study_1 %>% 
  select(BS, viable, Invest, HS) %>% 
  psych::partial.r(c("HS", "viable", "Invest"),"BS") %>% # Calculate partial correlations between 'HS', 'viable', 'Invest' controlling for 'BS'.
  psych::corr.p(n=nrow(study_1)) # Compute p-values for these partial correlations based on the sample size.
Call:psych::corr.p(r = ., n = nrow(study_1))
Correlation matrix 
partial correlations 
          HS viable Invest
HS      1.00  -0.13  -0.09
viable -0.13   1.00   0.47
Invest -0.09   0.47   1.00
Sample Size 
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         HS viable Invest
HS     0.00   0.03   0.09
viable 0.01   0.00   0.00
Invest 0.09   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

Here we see a pattern: higher levels of hostile sexism are associated with lower perceptions of startup viability, independent of benevolent sexism (partial r = -0.13, adjusted p = 0.03). However, hostile sexism is not significantly related to the amount participants are willing to invest (partial r = -0.09, p = 0.09).
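The p-values in the two outputs above come from a t-test on each coefficient. As a rough check, here is a minimal sketch assuming the standard test for a partial correlation with k covariates, where the degrees of freedom are n - 2 - k. The helper name `partial_cor_p` is hypothetical, and the input is the rounded coefficient printed above, so the result only approximates the reported values.

```r
# t-test for a partial correlation r based on n observations and
# k covariates partialled out (assumed degrees of freedom: n - 2 - k).
partial_cor_p <- function(r, n, k = 1) {
  df <- n - 2 - k
  t_stat <- r * sqrt(df / (1 - r^2))
  p <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  c(t = t_stat, df = df, p = p)
}

# BS and viable in Study 1, controlling for HS (rounded r = 0.12, n = 388)
partial_cor_p(r = 0.12, n = 388, k = 1)
```

This gives t of about 2.37 and an unadjusted p of about 0.02, in line with the below-diagonal entry for BS and viable. Note that the calls above pass n = nrow(study_1) to corr.p, whereas the psych documentation for partial.r recommends passing n minus the number of covariates. This is worth checking against the installed version, although at these sample sizes the difference is negligible.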

Automating across studies

Implementing automation

Having completed the correlation analyses for Study 1, we can now extend the same methods to the remaining studies. In experimental psychology it is standard practice to run multiple studies: this lets us fine-tune variables or conditions and check that our findings are robust and replicable rather than artifacts of a single sample. Repeating the same analyses by hand for each study is feasible but time-consuming and error-prone. Automation lets us perform the same tasks across datasets consistently, which saves time, makes the results more reliable, and frees us to focus on the broader research implications.

We begin by defining a function that will handle the data for each study. This function will calculate both linear and partial correlations and store these in a list for easy access. Here’s how we set it up:

perform_correlation_analysis <- function(data) {
  # Calculate Pearson correlation for key variables
  pearson_cor <- rcorr(as.matrix(data %>% select(BS, HS, viable, Invest)), type = "pearson")
         
  # Compute partial correlations controlling for Hostile Sexism
  partial_cor_bs <- data %>% 
    select(BS, viable, Invest, HS) %>% # Keep only the variables involved, as in the Study 1 walkthrough
    partial.r(c("BS", "viable", "Invest"), "HS") %>% 
    corr.p(n = nrow(data))
  
  # Compute partial correlations controlling for Benevolent Sexism
  partial_cor_hs <- data %>% 
    select(BS, viable, Invest, HS) %>% 
    partial.r(c("HS", "viable", "Invest"), "BS") %>% 
    corr.p(n = nrow(data))

  # Return results as a list
  list(
    Pearson = pearson_cor,
    Partial_BS = partial_cor_bs,
    Partial_HS = partial_cor_hs
  )
}

# Define the datasets for each study
data_sets <- list(study_1 = study_1, study_2 = study_2, study_3 = study_3)

# Apply the function to each dataset using map
results <- map(data_sets, perform_correlation_analysis)
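Once the results sit in a named list, a single coefficient can be pulled from every study in one pass. Here is a minimal sketch on simulated stand-in data. The helper `make_fake_study` and the list `fake_studies` are hypothetical, and the example uses base R (`lapply`, `vapply`, `cor`) so that it runs without the study files; with the real `results` object, the purrr equivalent would be `map_dbl`.

```r
# Simulate two small stand-in datasets with the same column names (hypothetical)
set.seed(42)
make_fake_study <- function(n) {
  HS <- rnorm(n)
  BS <- 0.4 * HS + rnorm(n)
  viable <- rnorm(n)
  Invest <- 0.5 * viable + rnorm(n)
  data.frame(BS = BS, HS = HS, viable = viable, Invest = Invest)
}
fake_studies <- list(study_a = make_fake_study(100), study_b = make_fake_study(150))

# One correlation matrix per study, stored in a named list
cor_list <- lapply(fake_studies, function(d) {
  cor(d[, c("BS", "HS", "viable", "Invest")], use = "pairwise.complete.obs")
})

# Extract the BS-HS correlation from each matrix into a named numeric vector
bs_hs <- vapply(cor_list, function(m) m["BS", "HS"], numeric(1))
round(bs_hs, 2)
```

Keeping the list names (here `study_a` and `study_b`) means the extracted vector stays labelled by study, which makes side-by-side comparison easier.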

Visualizing and interpreting correlations

To visually interpret the relationships between our study variables, let’s create heatmaps for each study’s correlation matrix. These visualizations allow us to quickly grasp the strength and direction of associations between sexism, perceptions of viability, and investment decisions.

# Generate and display heatmaps for each study's correlation matrix
heatmaps <- map(results, ~ {
  # Generate the correlation plot
  corrplot(.x$Pearson$r, 
           method = "color", 
           type = "upper", 
           order = "hclust", 
           tl.col = "black", 
           tl.srt = 45, 
           col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
           addCoef.col = "black")
})

Let’s also look at the p-values of the Pearson correlations in each study to assess their statistical significance.

options(scipen = 999)
# Displaying p-values for Pearson correlations in Study 1
results$study_1$Pearson$P
              BS         HS     viable    Invest
BS            NA 0.00000000 0.14037612 0.2775449
HS     0.0000000         NA 0.09991409 0.2784888
viable 0.1403761 0.09991409         NA 0.0000000
Invest 0.2775449 0.27848877 0.00000000        NA
# Displaying p-values for Pearson correlations in Study 2
results$study_2$Pearson$P
                  BS         HS        viable     Invest
BS                NA 0.00000000 0.00002506246 0.85755204
HS     0.00000000000         NA 0.74115752337 0.07065101
viable 0.00002506246 0.74115752            NA 0.00000000
Invest 0.85755204305 0.07065101 0.00000000000         NA
# Displaying p-values for Pearson correlations in Study 3
results$study_3$Pearson$P
                            BS                      HS    viable    Invest
BS                          NA 0.000000000000003108624 0.1361645 0.4165824
HS     0.000000000000003108624                      NA 0.2527221 0.2810248
viable 0.136164532082302613958 0.252722094705760458311        NA 0.0000000
Invest 0.416582427197501159455 0.281024848333587762284 0.0000000        NA

The consistent positive correlation between benevolent and hostile sexism across all studies (ranging from 0.41 to 0.47) suggests that those who hold seemingly subtle sexist attitudes may also harbor more overtly negative biases. Intriguingly, neither form of sexism is generally correlated with perceived startup viability or investment. The one exception is Study 2, where benevolent sexism is positively correlated with perceived viability (p < 0.001).

Next, we consider how these relationships change when we account for the other form of sexism. This gives us a clearer picture of the independent association of each form of sexism with our outcomes.

# Partial correlations adjusting for Hostile Sexism in Study 1
results$study_1$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
         BS viable Invest
BS     1.00   0.12   0.09
viable 0.12   1.00   0.47
Invest 0.09   0.47   1.00
Sample Size 
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         BS viable Invest
BS     0.00   0.03   0.09
viable 0.02   0.00   0.00
Invest 0.09   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Hostile Sexism in Study 2
results$study_2$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
         BS viable Invest
BS     1.00   0.19   0.03
viable 0.19   1.00   0.58
Invest 0.03   0.58   1.00
Sample Size 
[1] 572
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         BS viable Invest
BS     0.00      0   0.44
viable 0.00      0   0.00
Invest 0.44      0   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Hostile Sexism in Study 3
results$study_3$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
         BS viable Invest
BS     1.00   0.12   0.08
viable 0.12   1.00   0.59
Invest 0.08   0.59   1.00
Sample Size 
[1] 312
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         BS viable Invest
BS     0.00   0.06   0.16
viable 0.03   0.00   0.00
Invest 0.16   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

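When controlling for hostile sexism, benevolent sexism is positively related to perceived startup viability in all three studies (partial r = 0.12, 0.19, and 0.12). The association is significant in Studies 1 and 2; in Study 3 it is significant before adjustment for multiple tests (p = 0.03) but not after (p = 0.06).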
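The outputs note that entries above the diagonal are adjusted for multiple tests. As a quick illustration, here is a sketch assuming the Holm method, which is the default `adjust` argument of psych::corr.p, applied to the rounded unadjusted p-values from the Study 3 output. The vector names are hypothetical.

```r
# Rounded unadjusted p-values from the Study 3 output (below the diagonal):
# BS-viable, BS-Invest, viable-Invest
unadjusted <- c(0.03, 0.16, 0.00)

# Holm adjustment: the smallest p is multiplied by 3, the next by 2, the
# largest by 1, and the results are kept in non-decreasing order
adjusted <- p.adjust(unadjusted, method = "holm")
adjusted
```

This reproduces the above-diagonal entries of 0.06, 0.16, and 0.00, which is why the BS and viable association in Study 3 moves from 0.03 to 0.06 after adjustment.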

What about the correlation between hostile sexism and startup outcomes when benevolent sexism is controlled?

# Partial correlations adjusting for Benevolent Sexism in Study 1
results$study_1$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
          HS viable Invest
HS      1.00  -0.13  -0.09
viable -0.13   1.00   0.47
Invest -0.09   0.47   1.00
Sample Size 
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         HS viable Invest
HS     0.00   0.03   0.09
viable 0.01   0.00   0.00
Invest 0.09   0.00   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Benevolent Sexism in Study 2
results$study_2$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
          HS viable Invest
HS      1.00  -0.08  -0.08
viable -0.08   1.00   0.59
Invest -0.08   0.59   1.00
Sample Size 
[1] 572
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         HS viable Invest
HS     0.00    0.1    0.1
viable 0.06    0.0    0.0
Invest 0.05    0.0    0.0

 To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Benevolent Sexism in Study 3
results$study_3$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix 
partial correlations 
          HS viable Invest
HS      1.00  -0.11  -0.09
viable -0.11   1.00   0.59
Invest -0.09   0.59   1.00
Sample Size 
[1] 312
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
partial correlations 
         HS viable Invest
HS     0.00    0.1   0.11
viable 0.05    0.0   0.00
Invest 0.11    0.0   0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

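When controlling for benevolent sexism, hostile sexism is negatively correlated with perceived startup viability in all three studies (partial r = -0.13, -0.08, and -0.11), the opposite direction to benevolent sexism. After adjustment for multiple tests, however, this association is significant only in Study 1 (adjusted p = 0.03, versus 0.10 in Studies 2 and 3).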

Summary

Throughout this notebook, we’ve explored and visualized the relationships between different forms of sexism, perceptions of startup viability, and investment decisions. Pearson and partial correlation analyses give us an initial sense of how different sexist attitudes relate to startup evaluations. Moving forward, these analyses set a foundation for regression models that will help us dissect these relationships more comprehensively.