knitr::opts_chunk$set(message = FALSE, warning = FALSE)
Correlation analysis: understanding the relationships among our variables
In this notebook, I present the correlation analyses behind our research, “Benevolent Sexism and the Gender Gap in Startup Evaluation”. We ask the question: Does benevolent sexism skew evaluators’ views on the viability of startups led by men versus women? To answer it, we conducted three experimental studies in which participants were randomly assigned to evaluate startups led by either men or women, while we separately measured their levels of benevolent and hostile sexism.
Key variables include:
- Entrepreneur gender (Condition) coded 0 for men and 1 for women entrepreneurs.
- Participant gender (sex) indicates the evaluator’s gender, with 0 for men and 1 for women.
- Participant benevolent sexism (BS) and hostile sexism (HS) rated on a scale of 1-6, reflecting participants’ endorsement of different forms of sexism.
- Participant perceptions of startup viability (viable) assessed on a scale of 1-7, reflecting participants’ views on the startup’s potential success.
- Participant funding allocations (Invest) captures the financial commitment participants are willing to make, ranging from 0 to 100,000.
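Before computing correlations, it can help to confirm that each variable stays within the coding described above. The following is a minimal sketch using base R only; the `toy` data frame and the `check_ranges()` helper are hypothetical stand-ins, not part of the original analysis.

```r
# Hypothetical toy rows standing in for a study dataset (not the real data)
toy <- data.frame(
  Condition = c(0, 1, 1),
  sex       = c(1, 0, 1),
  BS        = c(2.5, 3.6, 2.0),
  HS        = c(2.6, 1.3, 2.7),
  viable    = c(5.5, 7.0, 5.0),
  Invest    = c(44085, 40141, 20000)
)

# Hypothetical helper: TRUE for each variable whose observed values
# fall inside the range listed in the variable descriptions
check_ranges <- function(data) {
  c(
    Condition = all(data$Condition %in% c(0, 1)),
    sex       = all(data$sex %in% c(0, 1)),
    BS        = all(data$BS >= 1 & data$BS <= 6, na.rm = TRUE),
    HS        = all(data$HS >= 1 & data$HS <= 6, na.rm = TRUE),
    viable    = all(data$viable >= 1 & data$viable <= 7, na.rm = TRUE),
    Invest    = all(data$Invest >= 0 & data$Invest <= 100000, na.rm = TRUE)
  )
}

range_ok <- check_ranges(toy)
range_ok
```

Any `FALSE` entry would point to a variable worth inspecting before it enters a correlation matrix.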
Navigating the analysis:
Our goal is to calculate linear and partial correlations among the key variables in our three studies. We first delve into Study 1’s dataset to establish a benchmark for our correlation analyses, setting the stage to extend and automate these methods across Studies 2 and 3.
# Load the necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(Hmisc)
library(psych)
library(cowplot)
library(corrplot)
corrplot 0.92 loaded
# Load the data for three studies
study_1 <- readRDS("/Users/mac/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 1/Data/R data/study_1.rds")
study_2 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 2/Data/R data/study_2.rds")
study_3 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 3/Data/R data/study_3.rds")
Kicking off with study 1
Welcome to our exploration into correlation analyses! 🌟 Today, we’re diving into the Study 1 dataset to tease out insights through correlation analyses of key variables. Let’s get acquainted with our data:
# Display the first few rows of the dataset to understand the structure and types of variables included in Study 1.
study_1 %>% select(id, Condition, BS, HS, sex, viable, Invest) %>% head() %>% kableExtra::kable()
id | Condition | BS | HS | sex | viable | Invest |
---|---|---|---|---|---|---|
116899 | 1 | 2.545454 | 2.636364 | 1 | 5.5 | 44085 |
116977 | 1 | 2.363636 | 2.000000 | 1 | 5.5 | 50000 |
117031 | 1 | 2.727273 | 3.181818 | 1 | 5.0 | 60845 |
110215 | 1 | 3.636364 | 1.272727 | 1 | 7.0 | 40141 |
112513 | 1 | 2.000000 | 2.727273 | 0 | 5.5 | 20000 |
117004 | 1 | 3.181818 | 4.000000 | 0 | 5.0 | 30000 |
Here’s a quick look at our data. Each row captures details about an individual participant: their ID (id), the experimental condition they were assigned to (Condition), their levels of agreement with benevolent (BS) and hostile sexism (HS), their gender (sex), their assessment of the startup’s viability (viable), and how much they were prepared to invest (Invest).
Now, let’s dive into the correlations to see if there are any significant linear relationships:
# Calculating Pearson correlations among selected variables to identify any significant relationships.
cor_matrix <- rcorr(as.matrix(study_1 %>% select(BS, HS, viable, Invest)), type = "pearson")
cor_matrix
BS HS viable Invest
BS 1.00 0.41 0.07 0.06
HS 0.41 1.00 -0.08 -0.06
viable 0.07 -0.08 1.00 0.47
Invest 0.06 -0.06 0.47 1.00
n
BS HS viable Invest
BS 388 388 388 387
HS 388 388 388 387
viable 388 388 388 387
Invest 387 387 387 387
P
BS HS viable Invest
BS 0.0000 0.1404 0.2775
HS 0.0000 0.0999 0.2785
viable 0.1404 0.0999 0.0000
Invest 0.2775 0.2785 0.0000
People who endorse benevolent sexism often also endorse hostile sexism, but neither form of sexism significantly correlates with perceptions of startup viability or investment levels.
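The three matrices printed above (coefficients, sample sizes, p-values) can be easier to scan as one row per variable pair. The sketch below uses base R only and re-enters the rounded Study 1 values from the output above; `flatten_cor()` and `pair_table` are hypothetical names introduced for illustration, not part of the original notebook.

```r
# Correlations and p-values copied from the Study 1 output above (rounded as printed)
vars <- c("BS", "HS", "viable", "Invest")
r <- matrix(c(1.00,  0.41,  0.07,  0.06,
              0.41,  1.00, -0.08, -0.06,
              0.07, -0.08,  1.00,  0.47,
              0.06, -0.06,  0.47,  1.00),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))
p <- matrix(c(NA,     0.0000, 0.1404, 0.2775,
              0.0000, NA,     0.0999, 0.2785,
              0.1404, 0.0999, NA,     0.0000,
              0.2775, 0.2785, 0.0000, NA),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: keep each unique pair once (upper triangle only)
flatten_cor <- function(r, p) {
  idx <- which(upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    p    = p[idx],
    stringsAsFactors = FALSE
  )
}

pair_table <- flatten_cor(r, p)
pair_table$significant <- pair_table$p < 0.05
pair_table
```

With a real `rcorr` result, the same helper could presumably be called as `flatten_cor(cor_matrix$r, cor_matrix$P)`, since those are the components the notebook extracts elsewhere.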
Let’s create a heatmap of the correlation matrix, which is a clear and intuitive way to see the relationships between the variables.
# Visualize the correlation matrix with corrplot
corrplot(cor_matrix$r, # Extract the correlation matrix for plotting
method = "color", # Use color tiles to show correlation values
type = "upper", # Show the upper part of the correlation matrix
order = "hclust", # Order variables based on hierarchical clustering
tl.col = "black", # Color of text labels
tl.srt = 45, # Rotation of text labels
col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200), # Define color scale: blue for negative, white near zero, orange for positive
addCoef.col = "black") # Add correlation coefficients to the plot for clarity
To delve deeper, we’ll also look at partial correlations to understand the relationships between two variables while controlling for the influence of another. First up, let’s control for hostile sexism and explore the relationship between benevolent sexism and our outcomes (startup viability and investment levels):
# Examining the influence of Benevolent Sexism on startup viability and investment decisions, controlling for Hostile Sexism.
study_1 %>%
  select(BS, viable, Invest, HS) %>%
  psych::partial.r(c("BS", "viable", "Invest"), "HS") %>% # Calculate partial correlations between 'BS', 'viable', 'Invest' controlling for 'HS'.
  psych::corr.p(n = nrow(study_1)) # Compute p-values for these partial correlations based on the sample size of study_1.
Call:psych::corr.p(r = ., n = nrow(study_1))
Correlation matrix
partial correlations
BS viable Invest
BS 1.00 0.12 0.09
viable 0.12 1.00 0.47
Invest 0.09 0.47 1.00
Sample Size
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
BS viable Invest
BS 0.00 0.03 0.09
viable 0.02 0.00 0.00
Invest 0.09 0.00 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
Turns out, benevolent sexism is positively correlated with perceived startup viability once we remove the effects of hostile sexism (partial r = 0.12, adjusted p = 0.03), but its correlation with investment amounts is not significant (partial r = 0.09, p = 0.09).
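For readers who want to see where a partial correlation comes from, the standard first-order formula expresses it in terms of the three pairwise Pearson correlations. As a rough check, plugging in the rounded Study 1 values from the Pearson output above gives about 0.11 for benevolent sexism and viability controlling for hostile sexism; the 0.12 reported by `partial.r` is computed from unrounded correlations, so a small gap is expected.

```latex
\[
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^{2})(1 - r_{yz}^{2})}}
\]

\[
r_{\mathrm{BS},\mathrm{viable} \cdot \mathrm{HS}}
  \approx \frac{0.07 - (0.41)(-0.08)}{\sqrt{(1 - 0.41^{2})(1 - (-0.08)^{2})}}
  = \frac{0.1028}{\sqrt{0.8319 \times 0.9936}}
  \approx 0.11
\]
```

Because benevolent and hostile sexism are positively correlated while hostile sexism relates negatively to viability, removing hostile sexism raises the benevolent sexism coefficient relative to its zero-order value of 0.07.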
Next, we control for benevolent sexism to explore how hostile sexism alone might affect startup outcomes:
# Exploring the relationship between Hostile Sexism and key outcomes, controlling for Benevolent Sexism.
study_1 %>%
  select(BS, viable, Invest, HS) %>%
  psych::partial.r(c("HS", "viable", "Invest"), "BS") %>% # Calculate partial correlations between 'HS', 'viable', 'Invest' controlling for 'BS'.
  psych::corr.p(n = nrow(study_1)) # Compute p-values for these partial correlations based on the sample size.
Call:psych::corr.p(r = ., n = nrow(study_1))
Correlation matrix
partial correlations
HS viable Invest
HS 1.00 -0.13 -0.09
viable -0.13 1.00 0.47
Invest -0.09 0.47 1.00
Sample Size
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
HS viable Invest
HS 0.00 0.03 0.09
viable 0.01 0.00 0.00
Invest 0.09 0.00 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
Here we see a pattern: higher levels of hostile sexism correlate with lower perceptions of startup viability, independent of benevolent sexism (partial r = -0.13, adjusted p = 0.03). However, hostile sexism is not significantly related to the amount of funding participants are willing to invest (partial r = -0.09, p = 0.09).
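One way to build intuition for "controlling for" a variable is to note that a first-order partial correlation equals the correlation between the residuals left after regressing each variable on the control. The sketch below illustrates this on simulated data with base R only; the variable names mirror the notebook, but the values and effect sizes are made up, and `partial_from_r()` is a hypothetical helper.

```r
set.seed(42)
n <- 500

# Simulated stand-ins (NOT the study data): BS and HS positively related,
# viable related positively to BS and negatively to HS
hs     <- rnorm(n)
bs     <- 0.4 * hs + rnorm(n)
viable <- 0.2 * bs - 0.2 * hs + rnorm(n)

# Hypothetical helper: first-order partial correlation from pairwise correlations
partial_from_r <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Route 1: formula applied to the three Pearson correlations
by_formula <- partial_from_r(cor(hs, viable), cor(hs, bs), cor(viable, bs))

# Route 2: correlate what is left of HS and viable after regressing each on BS
by_residuals <- cor(resid(lm(hs ~ bs)), resid(lm(viable ~ bs)))

# The two routes agree up to floating-point error
all.equal(by_formula, by_residuals)
```

The residual view makes the interpretation concrete: the partial correlation describes how the part of hostile sexism that is not shared with benevolent sexism relates to the corresponding part of viability.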
Automating across studies
Implementing automation
Having completed our correlation analyses with Study 1, we are now poised to extend these methods to the additional studies. In experimental psychological research such as ours, it is standard practice to conduct multiple studies. This allows us to fine-tune the variables or conditions and ensure that our findings are not mere artifacts but robust, replicable phenomena. Repeating the same analyses manually for each study, though feasible, is not only time-consuming but also susceptible to errors. Automation helps us perform the same tasks across various datasets with precision. This saves valuable time and enhances the reliability of our results, freeing us to focus on broader research implications.
We begin by defining a function that will handle the data for each study. This function will calculate both linear and partial correlations and store these in a list for easy access. Here’s how we set it up:
perform_correlation_analysis <- function(data) {
  # Calculate Pearson correlation for key variables
  pearson_cor <- rcorr(as.matrix(data %>% select(BS, HS, viable, Invest)), type = "pearson")

  # Compute partial correlations controlling for Hostile Sexism
  partial_cor_bs <- data %>% partial.r(c("BS", "viable", "Invest"), "HS") %>% corr.p(n = nrow(data))

  # Compute partial correlations controlling for Benevolent Sexism
  partial_cor_hs <- data %>% partial.r(c("HS", "viable", "Invest"), "BS") %>% corr.p(n = nrow(data))

  # Return results as a list
  list(
    Pearson = pearson_cor,
    Partial_BS = partial_cor_bs,
    Partial_HS = partial_cor_hs
  )
}
# Define the datasets for each study
data_sets <- list(study_1 = study_1, study_2 = study_2, study_3 = study_3)

# Apply the function to each dataset using map
results <- map(data_sets, perform_correlation_analysis)
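Once the same function runs over a named list of datasets, a single statistic can be collected into one comparison table. The sketch below shows the pattern with base R (`lapply` plays the role of `purrr::map`) on simulated data; the sample sizes match those reported for the three studies, but the values are simulated, and `make_study()` and `summarise_pair()` are hypothetical helpers.

```r
set.seed(1)

# Hypothetical generator for simulated stand-in data (NOT the study data)
make_study <- function(n) {
  HS <- rnorm(n)
  BS <- 0.4 * HS + rnorm(n)
  data.frame(BS = BS, HS = HS)
}

# Named list, mirroring the structure of data_sets in the notebook
sim_sets <- list(
  study_1 = make_study(388),
  study_2 = make_study(572),
  study_3 = make_study(312)
)

# Hypothetical helper: one-row summary of a single Pearson correlation
summarise_pair <- function(data, x, y) {
  test <- cor.test(data[[x]], data[[y]])
  data.frame(n = nrow(data), r = unname(test$estimate), p = test$p.value)
}

# Apply to every study and stack the rows; list names become row names
summary_table <- do.call(rbind, lapply(sim_sets, summarise_pair, x = "BS", y = "HS"))
summary_table
```

Keeping the list named is what carries the study labels through to the final table, which is the same reason `results$study_1` works in the notebook.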
Visualizing and interpreting correlations
To visually interpret the relationships between our study variables, let’s create heatmaps for each study’s correlation matrix. These visualizations allow us to quickly grasp the strength and direction of associations between sexism, perceptions of viability, and investment decisions.
# Generate and display heatmaps for each study's correlation matrix
heatmaps <- map(results, ~ {
  # Generate the correlation plot
corrplot(.x$Pearson$r,
method = "color",
type = "upper",
order = "hclust",
tl.col = "black",
tl.srt = 45,
col = colorRampPalette(c("#6D9EC1", "white", "#E46726"))(200),
addCoef.col = "black")
})
Let’s also take a look at the p-values from Pearson’s correlation for each study to assess their statistical significance.
options(scipen = 999)
# Displaying p-values for Pearson correlations in Study 1
results$study_1$Pearson$P
BS HS viable Invest
BS NA 0.00000000 0.14037612 0.2775449
HS 0.0000000 NA 0.09991409 0.2784888
viable 0.1403761 0.09991409 NA 0.0000000
Invest 0.2775449 0.27848877 0.00000000 NA
# Displaying p-values for Pearson correlations in Study 2
results$study_2$Pearson$P
BS HS viable Invest
BS NA 0.00000000 0.00002506246 0.85755204
HS 0.00000000000 NA 0.74115752337 0.07065101
viable 0.00002506246 0.74115752 NA 0.00000000
Invest 0.85755204305 0.07065101 0.00000000000 NA
# Displaying p-values for Pearson correlations in Study 3
results$study_3$Pearson$P
BS HS viable Invest
BS NA 0.000000000000003108624 0.1361645 0.4165824
HS 0.000000000000003108624 NA 0.2527221 0.2810248
viable 0.136164532082302613958 0.252722094705760458311 NA 0.0000000
Invest 0.416582427197501159455 0.281024848333587762284 0.0000000 NA
The consistent positive correlation between benevolent and hostile sexism across all studies (ranging from 0.41 to 0.47) suggests those who hold seemingly subtle sexist attitudes may also harbor more overtly negative biases. Intriguingly, neither form of sexism generally correlates with perceived viability or investment, with one exception: in Study 2, benevolent sexism is positively correlated with perceived viability (p < 0.001).
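A p-value alone says little about how precisely a correlation is estimated. As a rough illustration, an approximate confidence interval for a Pearson correlation can be obtained with Fisher's z transformation. The sketch below uses base R and the rounded Study 1 values for benevolent and hostile sexism (r = 0.41, n = 388); `fisher_ci()` is a hypothetical helper, and the result is an approximation rather than the interval any particular package would print.

```r
# Hypothetical helper: approximate CI for a Pearson r via Fisher's z transform
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # transform r to the z scale
  se   <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)    # two-sided critical value
  tanh(c(lower = z - crit * se, upper = z + crit * se))  # back to the r scale
}

# Rounded Study 1 values for the BS-HS correlation
ci_study_1 <- fisher_ci(r = 0.41, n = 388)
round(ci_study_1, 2)
```

An interval that stays well above zero is consistent with the very small p-values reported for this pair in all three studies.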
Next, we consider how these relationships change when accounting for the influence of other forms of sexism. This gives us a clearer picture of the independent effect of each form of sexism.
# Partial correlations adjusting for Hostile Sexism in Study 1
results$study_1$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
BS viable Invest
BS 1.00 0.12 0.09
viable 0.12 1.00 0.47
Invest 0.09 0.47 1.00
Sample Size
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
BS viable Invest
BS 0.00 0.03 0.09
viable 0.02 0.00 0.00
Invest 0.09 0.00 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Hostile Sexism in Study 2
results$study_2$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
BS viable Invest
BS 1.00 0.19 0.03
viable 0.19 1.00 0.58
Invest 0.03 0.58 1.00
Sample Size
[1] 572
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
BS viable Invest
BS 0.00 0 0.44
viable 0.00 0 0.00
Invest 0.44 0 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Hostile Sexism in Study 3
results$study_3$Partial_BS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
BS viable Invest
BS 1.00 0.12 0.08
viable 0.12 1.00 0.59
Invest 0.08 0.59 1.00
Sample Size
[1] 312
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
BS viable Invest
BS 0.00 0.06 0.16
viable 0.03 0.00 0.00
Invest 0.16 0.00 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
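When controlling for hostile sexism, benevolent sexism is positively associated with perceived startup viability in all three studies (partial r = 0.12, 0.19, and 0.12), although in Study 3 the association falls just short of significance after adjustment for multiple tests (adjusted p = 0.06).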
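The outputs above report two sets of p-values: unadjusted below the diagonal and adjusted for multiple tests above it. To my understanding, `psych::corr.p` uses the Holm method by default, and the printed values are consistent with that. The sketch below applies base R's `p.adjust` to the rounded unadjusted p-values from the Study 3 output; the vector names are added here only for readability.

```r
# Unadjusted p-values from below the diagonal of the Study 3 output (rounded as printed)
raw_p <- c(BS_viable = 0.03, BS_Invest = 0.16, viable_Invest = 0.00)

# Holm adjustment: the smallest p is multiplied by 3, the next by 2, the largest by 1,
# with monotonicity enforced and values capped at 1
adjusted_p <- p.adjust(raw_p, method = "holm")
adjusted_p
```

The adjusted values for the two benevolent sexism pairs (0.06 and 0.16) match the entries above the diagonal in the Study 3 output, which is why the viability association moves from 0.03 to 0.06 there.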
What about the correlation between hostile sexism and startup outcomes when benevolent sexism is controlled?
# Partial correlations adjusting for Benevolent Sexism in Study 1
results$study_1$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
HS viable Invest
HS 1.00 -0.13 -0.09
viable -0.13 1.00 0.47
Invest -0.09 0.47 1.00
Sample Size
[1] 388
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
HS viable Invest
HS 0.00 0.03 0.09
viable 0.01 0.00 0.00
Invest 0.09 0.00 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Benevolent Sexism in Study 2
results$study_2$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
HS viable Invest
HS 1.00 -0.08 -0.08
viable -0.08 1.00 0.59
Invest -0.08 0.59 1.00
Sample Size
[1] 572
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
HS viable Invest
HS 0.00 0.1 0.1
viable 0.06 0.0 0.0
Invest 0.05 0.0 0.0
To see confidence intervals of the correlations, print with the short=FALSE option
# Partial correlations adjusting for Benevolent Sexism in Study 3
results$study_3$Partial_HS
Call:corr.p(r = ., n = nrow(data))
Correlation matrix
partial correlations
HS viable Invest
HS 1.00 -0.11 -0.09
viable -0.11 1.00 0.59
Invest -0.09 0.59 1.00
Sample Size
[1] 312
Probability values (Entries above the diagonal are adjusted for multiple tests.)
partial correlations
HS viable Invest
HS 0.00 0.1 0.11
viable 0.05 0.0 0.00
Invest 0.11 0.0 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
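Controlling for benevolent sexism, hostile sexism shows a small negative correlation with perceptions of startup viability in all three studies (partial r = -0.13, -0.08, and -0.11), which is the opposite pattern to benevolent sexism. After adjustment for multiple tests, this association is significant only in Study 1 (adjusted p = 0.03, versus 0.10 in Studies 2 and 3).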
Summary
Throughout this notebook, we’ve explored and visualized the relationships between different forms of sexism, perceptions of startup viability, and investment decisions. Using Pearson and partial correlation analyses, we get an initial sense of how different sexist attitudes relate to startup evaluations. Moving forward, these analyses set a foundation for regression models that will help us dissect these relationships more comprehensively.