Descriptive statistics: getting to know our data

Author

Julie Nguyen

In this notebook, we present the analysis behind our research, “Benevolent Sexism and the Gender Gap in Startup Evaluation,” featured in “Entrepreneurship: Theory and Practice” and “The Conversation Canada”. Our study explores how evaluators’ benevolent sexism might influence their assessment of startups, contingent on the founder’s gender.

At the heart of our investigation lies a critical question: Does benevolent sexism skew evaluators’ views on the viability of startups led by men versus women? To dissect this, we orchestrated three experimental studies. Participants were randomly assigned to evaluate startups headed by either gender, while we separately measured their levels of benevolent and hostile sexism.

Key variables include:

- Condition: the experimental condition, i.e., whether the startup is led by a man (0) or a woman (1)
- BS: the participant’s endorsement of benevolent sexism
- HS: the participant’s endorsement of hostile sexism
- sex: the participant’s gender (0 = man, 1 = woman)
- viable: how viable the participant thought the startup was
- Invest: how much the participant was willing to invest in the startup

Navigating the analysis:

We kick-start our analytical journey with Study 1 and then automate the process across all three studies. Specifically, we calculate descriptive statistics for key variables for the whole sample and separately for each experimental condition and each participant gender group. We then visualize sample sizes to assess participant distribution, create histograms to understand variable distributions, compare means and standard deviations across groups via bar charts, and test these group differences with t-tests. In doing so, we gain an understanding of the basic structure and distribution of our data, setting the stage for the more complex regression analyses later on.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
# Load the necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(psych)
library(kableExtra)
library(stringr)
library(broom)
library(patchwork)
library(cowplot)

# Load the data for three studies 
study_1 <- readRDS("/Users/mac/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 1/Data/R data/study_1.rds")
study_2 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 2/Data/R data/study_2.rds")
study_3 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 3/Data/R data/study_3.rds")

Kicking off with Study 1

Welcome to the beginning of our exploration! 🌟 In this segment, we dive into the Study 1 dataset to uncover insights through descriptive statistics of key variables.

Understanding the participants

Let’s first take a glimpse at our data:

# Initial glimpse at the dataset to check the first few entries. This helps in getting a basic understanding of data structure and types of variables collected in the study.
study_1 %>% select(id, Condition, BS, HS, sex, viable, Invest) %>% head()
# A tibble: 6 × 7
  id     Condition    BS    HS   sex viable Invest
  <chr>      <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1 116899         1  2.55  2.64     1    5.5  44085
2 116977         1  2.36  2        1    5.5  50000
3 117031         1  2.73  3.18     1    5    60845
4 110215         1  3.64  1.27     1    7    40141
5 112513         1  2     2.73     0    5.5  20000
6 117004         1  3.18  4        0    5    30000

This table shows a snapshot of our dataset. Each row is a unique participant, with columns detailing their characteristics: their ID (id), the experimental scenario they were assigned (Condition), how much they agreed with benevolent sexist beliefs (BS) and hostile sexist beliefs (HS), their gender (sex), how viable they thought the startup was (viable), and how much they were willing to support it (Invest).

Now, let’s calculate mean and standard deviation across key variables.

# Calculating mean and standard deviation for key variables (BS, HS, viable, Invest) to get an overview of central tendencies and variability. `na.rm = TRUE` ensures missing values are ignored in these calculations.
overall_stats <- study_1 %>% 
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))))
overall_stats %>% kable()
BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
3.220009 0.8162942 2.83552 0.9602238 4.667526 1.366932 37456.82 23845.36

It looks like our participants agree with benevolent sexism more than hostile sexism. No big surprise there—society often dresses up these attitudes as chivalry instead of prejudice. Our participants are also quite optimistic about the startup’s potential, scoring its viability pretty high (4.7 out of 7). However, they’re a tad more cautious when it comes to actually investing, with investment amounts averaging around $37K out of a possible $100K. A classic case of “let’s see where this goes”.
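
If we wanted to go beyond eyeballing the two means, a paired t-test on the same participants’ benevolent and hostile sexism scores would formalize that impression. This is just an optional sketch, not part of the original analysis pipeline:

# Optional check (not in the original analysis): do the same participants endorse
# benevolent sexism more strongly than hostile sexism? A paired t-test compares the two scores within person.
t.test(study_1$BS, study_1$HS, paired = TRUE)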

Let’s visualize the distribution of these variables to better grasp how our participants’ opinions spread out. For this, we’ll use histograms.

# Setting up for visualization
# Define key variables, their bin widths, and assigned colors for differentiation
variables <- c("BS", "HS", "viable", "Invest") # `variables` are the key variables of interest
binwidths <- c(0.5, 0.5, 0.5, 5000) # `binwidths` determine the granularity of the histogram
colors <- c("#cf1578", "#e8d21d", "#039fbe", "#b20238") # `colors` are visually distinguishing each variable's histogram
x_limits <- list(c(1, 6), c(1, 6), c(1, 7), c(0, 100000)) # Define x-axis limits based on expected data ranges

# Generate histograms using map to iterate over the key variables and their corresponding attributes
plots <- map(seq_along(variables), ~ {
  ggplot(study_1, aes(x = .data[[variables[.x]]])) +
    geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = NA) +  # No outline around bins
    ggtitle(paste("Histogram of", variables[.x])) +
    xlab(variables[.x]) +
    ylab("Frequency") +
    theme_minimal() +
    scale_x_continuous(limits = x_limits[[.x]], oob = scales::oob_squish)  # Adjust x-axis based on variable
})

# Assembling histograms into a cohesive visual layout for side-by-side comparison.
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).

Benevolent sexism, hostile sexism, and perceived viability scores generally follow a normal distribution, indicating a relatively even spread of opinions on these scales. Investment amounts, though, present what looks like a multimodal distribution with several peaks. This hints at distinct groups of participants based on their willingness to invest.
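
To probe that multimodal impression a little further, one option is a density curve, which smooths over the histogram’s binning choices. A minimal sketch, reusing study_1 and the same color as the histogram above:

# Optional: a smoothed density of investment amounts to double-check the multimodal shape.
ggplot(study_1, aes(x = Invest)) +
  geom_density(fill = "#b20238", alpha = 0.4) +
  xlab("Invest") +
  ylab("Density") +
  theme_minimal()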

Zooming in on the experimental conditions

Let’s narrow down our focus and compare startups led by men versus those led by women. How many observations do we have in each condition?

# Counting the number of observations within each experimental condition to ensure sufficient data for each group.

study_1 %>%
  mutate(Condition = case_when(
      Condition == 0 ~ "Men entrepreneur",
      Condition == 1 ~ "Women entrepreneur",
      TRUE ~ as.character(Condition)  
    )
  ) %>% count(Condition)
# A tibble: 2 × 2
  Condition              n
  <chr>              <int>
1 Men entrepreneur     196
2 Women entrepreneur   192

Looks like we have a balanced number of observations for each experimental condition. Nice! This is crucial for subsequent comparative analysis to be meaningful.
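
If we wanted a formal check that the 196/192 split is consistent with an even 50/50 allocation, a quick goodness-of-fit test on the counts would do it. This is purely an optional sketch; the raw counts above already make the point:

# Optional: chi-squared goodness-of-fit test of the condition counts against an even 50/50 split.
chisq.test(table(study_1$Condition))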

Let’s dig deeper and calculate means and standard deviations for each experimental condition.

# Calculate descriptive statistics by the experimental condition to discern potential differences.
# The 'Condition' column in 'study_1' is recoded so that 0 is recoded to 'Men entrepreneur' and 1 to 'Women entrepreneur'.
condition_stats <- study_1 %>%
  mutate(
    Condition = case_when(
      Condition == 0 ~ "Men entrepreneur",
      Condition == 1 ~ "Women entrepreneur",
      TRUE ~ as.character(Condition)
    )
  ) %>% 
  # Group the data by the newly updated 'Condition' column.
  # Within each group (each unique condition), calculate the mean and standard deviation for the same set of variables.
  group_by(Condition) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )

condition_stats %>% kable()
Condition BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
Men entrepreneur 3.182746 0.8095630 2.888219 0.9727516 4.630102 1.378279 36428.40 24395.79
Women entrepreneur 3.258049 0.8234795 2.781724 0.9467708 4.705729 1.357785 38501.32 23290.16

Let’s break down these stats visually. It’s always a bit easier to spot patterns and contrasts with a graph rather than a table:

# Transforming condition stats for visualization. 
# We pivot longer to have a single measure and stat type per row, then pivot wider to separate mean and sd for plotting. 
condition_stats %>% 
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), # Select columns that end with '_mean' or '_sd'
               names_to = "Metric_Type", # New column where original column names (indicating metric and stat type) are stored
               values_to = "Value") %>% # New column where values from the selected columns are stored
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% # Split 'Metric_Type' into 'Variable' and 'Stat' based on '_'
  pivot_wider(names_from = Stat, values_from = Value) %>% # Pivot back to a wider format where 'mean' and 'sd' become separate columns
  # Replace abbreviated variable names with full, descriptive names for clarity in visual representation
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>% 
  ggplot(aes(x = Condition, y = mean, fill = Condition)) + # Plotting setup: X-axis is Condition, Y-axis is mean, colored by Condition
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +  # Draw bars for mean values, dodge positions them side by side
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) + # Add error bars for SD
  facet_wrap(~Variable, scales = "free_y", ncol = 2) +  # Separate plots for each variable, allowing Y-axis to scale independently
  labs(x = "", y = "Mean with SD as error bars") + # Labeling axes
  theme_minimal() + # Minimal theme for a clean look
  scale_fill_manual(values = c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")) + # Custom colors for conditions
  theme(legend.position = "none")  # Remove legend for a cleaner look

Seems like there are no drastic differences in sexist attitudes between the conditions. That’s good: it means our random assignment was successful in cancelling out group differences in sexism. There’s a slight edge in favor of women’s startups when it comes to investment. Let’s conduct t-tests to see whether these differences are statistically significant.

# Specifying the variables to undergo statistical testing.
variables_to_test <- c("BS", "HS", "viable", "Invest")

# Setting scientific notation penalty to avoid scientific notation in output
options(scipen = 999)

# Running t-tests for each variable between conditions to check for statistically significant differences.
# The reformulate function dynamically creates the formula needed for the t-test based on the variable name.
map(variables_to_test, ~t.test(reformulate("Condition", response = .), data = study_1)) 
[[1]]

    Welch Two Sample t-test

data:  BS by Condition
t = -0.90815, df = 385.45, p-value = 0.3644
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.23833528  0.08772845
sample estimates:
mean in group 0 mean in group 1 
       3.182746        3.258049 


[[2]]

    Welch Two Sample t-test

data:  HS by Condition
t = 1.0928, df = 385.98, p-value = 0.2752
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.08510283  0.29809371
sample estimates:
mean in group 0 mean in group 1 
       2.888219        2.781723 


[[3]]

    Welch Two Sample t-test

data:  viable by Condition
t = -0.54446, df = 385.99, p-value = 0.5864
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.3487286  0.1974744
sample estimates:
mean in group 0 mean in group 1 
       4.630102        4.705729 


[[4]]

    Welch Two Sample t-test

data:  Invest by Condition
t = -0.85506, df = 384.63, p-value = 0.3931
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -6839.460  2693.624
sample estimates:
mean in group 0 mean in group 1 
       36428.40        38501.32 

And… it turns out, the differences we spotted don’t pass the statistical significance test. So, the way our participants view and invest in these startups doesn’t hinge on whether a man or a woman is at the helm.
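
As a side note, the raw t-test printouts above are fairly verbose. If we wanted a more compact view, broom::tidy() can collapse them into a single table (the same trick we lean on when automating the t-tests across studies later). A sketch using the variables_to_test vector defined above:

# Optional: condense the four condition t-tests into one tidy summary table.
map_df(variables_to_test, ~ {
  t.test(reformulate("Condition", response = .x), data = study_1) %>%
    broom::tidy() %>%
    mutate(variable = .x)
}) %>%
  select(variable, estimate1, estimate2, statistic, p.value) %>%
  kable()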

Zooming in on participant gender

Do men and women see things differently in our study? Let’s find out:

# Calculate descriptive statistics by participant gender to discern potential differences.
# The 'sex' column in 'study_1' is recoded so that 0 is recoded to 'Men participant' and 1 to 'Women participant'.
participant_gender_stats <- study_1 %>%
  mutate(
    sex = case_when(
      sex == 0 ~ "Men participant",
      sex == 1 ~ "Women participant"
    )) %>% 
  filter(!is.na(sex)) %>% 
  # Group the data by the newly updated 'sex' column.
  # Within each group (each unique condition), calculate the mean and standard deviation for the same set of variables.
  group_by(sex) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )

participant_gender_stats %>% kable()
sex BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
Men participant 3.384204 0.7959016 3.092966 0.9308327 4.488688 1.453400 33711.55 24439.65
Women participant 2.995565 0.7949420 2.472838 0.8653258 4.884146 1.204029 42367.46 22103.41

Let’s graph these mean and sd values like we did before.

participant_gender_stats %>% 
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), 
               names_to = "Metric_Type", 
               values_to = "Value") %>% 
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% 
  pivot_wider(names_from = Stat, values_from = Value) %>% 
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>% 
  ggplot(aes(x = sex, y = mean, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) +
  facet_wrap(~Variable, scales = "free_y", ncol = 2) +
  labs(x = "", y = "Mean with SD as error bars") +
  theme_minimal() +
  scale_fill_manual(values = c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")) +
  theme(legend.position = "none")

Looks like the women in our study endorse benevolent and hostile sexism less than the men, although the gender difference in benevolent sexism is not as large. Interestingly, they are also more generous with their startup evaluations and investments. Time for one more round of t-tests to see if these observations hold water.

# Automating t-tests to compare variables between participant gender groups.
map(c("BS", "HS", "viable", "Invest"), ~t.test(reformulate("sex", response = .), data = study_1)) 
[[1]]

    Welch Two Sample t-test

data:  BS by sex
t = 4.7411, df = 351.56, p-value = 0.000003094
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 0.2274194 0.5498579
sample estimates:
mean in group 0 mean in group 1 
       3.384204        2.995565 


[[2]]

    Welch Two Sample t-test

data:  HS by sex
t = 6.7316, df = 364.17, p-value = 0.00000000006541
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 0.4389708 0.8012846
sample estimates:
mean in group 0 mean in group 1 
       3.092966        2.472838 


[[3]]

    Welch Two Sample t-test

data:  viable by sex
t = -2.9155, df = 378.34, p-value = 0.003762
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.6621582 -0.1287589
sample estimates:
mean in group 0 mean in group 1 
       4.488688        4.884146 


[[4]]

    Welch Two Sample t-test

data:  Invest by sex
t = -3.6275, df = 368, p-value = 0.0003266
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -13348.238  -3963.589
sample estimates:
mean in group 0 mean in group 1 
       33711.55        42367.46 

The results confirm that women participants indeed endorse benevolent and hostile sexism less than men. They also gave the startup higher viability ratings and more funding.
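
The t-tests tell us these gender differences are unlikely to be chance, but not how large they are. As an optional sketch (not part of the original analysis), we can approximate Cohen’s d by hand from the group means and a pooled standard deviation; the cohens_d helper below is purely illustrative:

# Optional: a rough Cohen's d for the participant gender differences in Study 1.
# `cohens_d` is a small illustrative helper (not from the original analysis) using a pooled SD.
cohens_d <- function(x, group) {
  g <- split(x, group)                         # split the variable into the two gender groups
  m <- sapply(g, mean, na.rm = TRUE)           # group means
  s <- sapply(g, sd, na.rm = TRUE)             # group standard deviations
  n <- sapply(g, function(v) sum(!is.na(v)))   # group sizes (non-missing)
  pooled_sd <- sqrt(((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (n[1] + n[2] - 2))
  unname((m[1] - m[2]) / pooled_sd)            # positive values mean the men's group scores higher
}

study_1 %>%
  filter(sex %in% c(0, 1)) %>%  # keep the men (0) and women (1) participant groups
  summarise(across(c(BS, HS, viable, Invest), ~ cohens_d(., sex)))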

Scaling up: automating across studies

With Study 1 under our belt, we’re now ready to extend our analysis across multiple studies. In experimental psychological research like ours, it’s common to conduct multiple studies. Each one might adjust certain variables or conditions to ensure that our observations are not just flukes but reflect genuine, robust phenomena.

Now, obviously we could manually repeat the same code for each study. But that would be both time-consuming and prone to human error. Automating is like having a trusted assistant who performs the same tasks on multiple datasets with unwavering accuracy, saving us time to focus on the bigger picture.

Automating descriptive stats calculation

We start by writing a function. A function is like a recipe: it takes various “ingredients” (data) and, through a series of “cooking” steps (processing), delivers a delectable “dish” (outcome). In our case, the perform_descriptive_analysis function will take in the data for each study, calculate descriptive statistics for the whole sample and for separate groups, and serve up a comprehensive summary in a neatly organized dataframe.

# Initial setup for descriptive analysis automation.
perform_descriptive_analysis <- function(data, study_name) {
  # Recoding the variables for clarity
  data <- data %>%
    mutate(
      Condition = factor(case_when(Condition == 0 ~ "Men entrepreneur",
                                   Condition == 1 ~ "Women entrepreneur",
                                   TRUE ~ as.character(Condition)
                                   )), 
      sex = factor(case_when(sex == 0 ~ "Men participant",
                             sex == 1 ~ "Women participant",
                             TRUE ~ "Other participant gender"
                             ))
    )
  
  # Calculate stats for the entire sample to give us a baseline understanding of the dataset.
  overall_stats <- data %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n() # Capturing sample size for each analysis segment.
    ) %>% 
    mutate(Condition = "Overall") # Labeling these stats as 'Overall' for easy identification.

  # Calculate statistics for each experimental condition
  condition_stats <- data %>%
    group_by(Condition) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop' # Ensuring the grouped structure is dropped post-summarization for simplicity.
    ) 

  
  # Calculate statistics for each participant gender group
  participant_gender_stats <- data %>%
    group_by(sex) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop'
    ) %>%
  mutate(Condition = as.character(sex)) %>% # Labeling these stats for each participant gender group
  select(-sex)  # Removing the now redundant 'sex' column.

  # Compiling all stats into one comprehensive dataframe.
  combined_stats <- bind_rows(overall_stats, condition_stats, participant_gender_stats) 

  return(combined_stats) # Delivering the compiled dataframe as the function's output.
}

Next, we can use map_df from the purrr package to apply the function. It’s like having an army of robots at your disposal, each programmed to carry out the recipe on different datasets.

# List of datasets
studies <- list(study_1 = study_1, study_2 = study_2, study_3 = study_3)

# Apply 'perform_descriptive_analysis' to each dataset using 'map_df'
# '.id = "Study_Name"' adds a column with the name of each study, keeping track of which study each result came from.
descriptive_results <- map_df(names(studies), ~perform_descriptive_analysis(studies[[.x]], .x), .id = "Study_name")

# Presenting the aggregated results.
descriptive_results %>% mutate_if(is.numeric, ~ round(., 2)) %>% kable()
Study_name BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd n Condition
1 3.22 0.82 2.84 0.96 4.67 1.37 37456.82 23845.36 388 Overall
1 3.18 0.81 2.89 0.97 4.63 1.38 36428.40 24395.79 196 Men entrepreneur
1 3.26 0.82 2.78 0.95 4.71 1.36 38501.32 23290.16 192 Women entrepreneur
1 3.38 0.80 3.09 0.93 4.49 1.45 33711.55 24439.65 221 Men participant
1 3.39 0.77 3.70 1.69 6.00 1.00 43662.00 30663.18 3 Other participant gender
1 3.00 0.79 2.47 0.87 4.88 1.20 42367.46 22103.41 164 Women participant
2 2.91 1.00 2.55 1.17 5.07 1.50 37184.11 25116.82 572 Overall
2 2.93 0.97 2.57 1.17 5.16 1.51 37402.50 24158.75 287 Men entrepreneur
2 2.89 1.02 2.53 1.16 4.98 1.48 36965.73 26080.74 285 Women entrepreneur
2 3.17 0.94 2.86 1.13 4.96 1.55 34352.07 25256.80 297 Men participant
2 1.55 0.36 1.18 0.00 5.17 1.44 30000.00 30000.00 3 Other participant gender
2 2.64 0.98 2.22 1.11 5.20 1.43 40381.89 24618.98 272 Women participant
3 2.63 1.01 2.45 1.17 4.34 1.39 30557.96 23184.86 312 Overall
3 2.62 1.00 2.48 1.12 4.17 1.37 27957.24 23981.45 152 Men entrepreneur
3 2.64 1.02 2.42 1.21 4.51 1.39 33028.64 22195.22 160 Women entrepreneur
3 2.90 0.97 2.67 1.14 4.22 1.38 27774.32 22064.52 177 Men participant
3 1.68 0.06 1.09 0.13 4.00 0.00 10000.00 14142.14 2 Other participant gender
3 2.28 0.96 2.16 1.14 4.51 1.40 34571.65 24141.38 133 Women participant

Visualizing observations across studies

With our statistics in hand, we’re ready to dive into some visualizations to better grasp our data! We’ll start by looking at participant numbers for each study.

descriptive_results %>% 
  # filtering out entries tagged as 'Other participant gender' since there are too few participants in this group
  filter(Condition != "Other participant gender") %>% 
  # adjust the 'Condition' and 'Study_name' columns for clearer categorization and labeling in our visualizations.
  mutate(
    # Convert 'Condition' into a factor with specific levels for clear grouping in the plot.
    # This helps in differentiating between the experimental conditions and participant gender groups.
    Condition = factor(Condition, levels = c("Overall", "Men entrepreneur", "Women entrepreneur", "Men participant", "Women participant")),
    # Similarly, convert 'Study_name' into a factor and assign more descriptive labels ('Study 1', 'Study 2', 'Study 3').
    # This ensures that the plots clearly indicate which study the data is drawn from.
    Study_name = factor(Study_name, levels = unique(Study_name), labels = c("Study 1", "Study 2", "Study 3"))
  ) %>%
  # Create a bar plot with 'Condition' on the x-axis, the number of participants ('n') on the y-axis, and color-coded by 'Condition'.
  ggplot(aes(x = Condition, y = n, fill = Condition)) +
  geom_bar(stat = "identity", position = position_dodge()) +  # 'stat="identity"' indicates that the heights of the bars represent data values.
  # Add labels on top of each bar to display the exact number of participants. The 'position_dodge()' ensures the labels align with the bars.
  geom_text(aes(label = n), position = position_dodge(width = 0.75), vjust = -0.25, size = 3, color = "gray50") +
  # Use 'facet_wrap' to create separate plots for each study, enabling comparisons across studies.
  facet_wrap(~ Study_name, scales = "free_x", nrow = 1) +
  # Customize plot labels and theme for readability and aesthetics. Remove x-axis label for cleanliness.
  labs(title = "Sample Size Across Studies", x = "", y = "Sample Size") +
  theme_minimal() +  # Apply a minimal theme for a clean look.
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") +  # Adjust text angle for better legibility.
  scale_fill_brewer(palette = "Set1")  # Apply a color palette for visual distinction of conditions.

Study 2 has the highest number of participants (n = 572) while Study 3 has the lowest (n = 312). The difference makes sense; Study 2 was open to all US full-time employees, a much larger pool than Study 3’s niche of people with previous experience in startup evaluation.

The sample sizes across our experimental conditions are very balanced. And while there were more men than women in our participant pool across all three studies, the numbers are close enough that we’re all set for fair comparisons.

Exploring variable distributions across studies

Now, let’s take our analysis up a notch by diving into the distributions of our main variables across all three studies. By plotting histograms, we can visually grasp how participant responses vary for each variable—letting us spot trends, outliers, and overall patterns at a glance.

# Define a function to create histograms for given variables across a single study.
# This function takes the dataset, the name of the study, a list of variables to plot,
# the bin widths for each histogram, the colors for the histograms, and x-axis limits.
create_histograms <- function(data, study_name, variables, binwidths, colors, x_limits) {
  # Loop through each variable to generate its histogram.
  plots <- map(seq_along(variables), ~ {
    # Create the histogram with specified aesthetics.
    ggplot(data, aes(x = .data[[variables[.x]]])) +
      geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = "white") + # White outlines around the bins help separate them visually.
      ggtitle(paste(study_name, "-", variables[.x])) + # Title includes study name and variable.
      theme_minimal() + # Minimalist theme for focus on the data.
      xlim(x_limits[[.x]]) # Set x-axis limits based on predefined limits.
  })
  # Arrange the generated plots in a grid layout for easier comparison.
  plot_grid(plotlist = plots, ncol = 2) 
}

# List of study names extracted from the studies list for iteration.
study_names <- names(studies)

# Generating histograms for each study by passing them through our custom function.
map(study_names, ~create_histograms(studies[[.x]], .x, variables, binwidths, colors, x_limits))
[[1]]


[[2]]


[[3]]

For Studies 1 and 2, it’s like most participants are on the same page in terms of their benevolent sexism scores, with scores clustering in a bell curve. But in Study 3, it’s a different story: the curve flattens out before dipping, suggesting that while a range of moderately benevolent sexist attitudes is somewhat evenly spread among participants, extremely high benevolent sexist attitudes are rare. This is also the case for hostile sexism scores in Study 1. Yet, in Studies 2 and 3, the distribution of hostile sexism scores resembles a downward line, suggesting a general trend among the participants towards lower levels of hostile sexism, with high levels being progressively less common.

Across the board, we’re seeing bell curves when it comes to the distribution of perceived viability scores. This tells us that most participants gravitate towards a common middle ground when it comes to how viable they think the startups are. In contrast, with its peaks and valleys, the multimodal distribution for investment decisions reveals distinct participant groups based on how much they’re willing to invest.
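
To put some numbers behind these visual impressions, the psych package we loaded earlier reports skew and kurtosis alongside the usual summaries. A quick optional sketch looping over the studies list defined above:

# Optional: numeric skew and kurtosis for each key variable in each study, to complement the histograms.
map_df(study_names, ~ {
  studies[[.x]] %>%
    select(BS, HS, viable, Invest) %>%
    psych::describe() %>%
    as.data.frame() %>%
    tibble::rownames_to_column("variable") %>%
    mutate(study = .x) %>%
    select(study, variable, mean, sd, skew, kurtosis)
}) %>%
  kable()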

Comparing experimental conditions and participant genders

Up next, we transition from broad statistics to focused comparisons. Specifically, we’re comparing the experimental conditions (men-led vs. women-led startups) and the participant genders. We’ll look at the mean responses and the variability within these groups through bar charts. This visual approach gives us a straightforward way to see if there are any notable differences or if the groups are more alike than not.

# Reshaping the results for easier visualization.
descriptive_results %>%
  # Transform our results to a long format
  # Each variable (e.g., benevolent sexism, hostile sexism) gets expanded into two rows—one for mean and one for SD.
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), 
               names_to = "Metric_Type", 
               values_to = "Value") %>% 
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% 
  pivot_wider(names_from = Stat, values_from = Value) %>% 
  # rename variables for a clearer understanding in the graphs.
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) -> descriptive_results_long

# Splitting the transformed data by study and condition/gender for targeted analysis.
# This allows us to separately analyze and visualize the data for experimental conditions and participant genders across each study.
condition_stats <- descriptive_results_long %>% filter(Condition %in% c("Men entrepreneur", "Women entrepreneur")) %>% split(.$Study_name)
participant_gender_stats <- descriptive_results_long %>% filter(Condition %in% c("Men participant", "Women participant")) %>% rename(sex = Condition) %>% split(.$Study_name)
# Define a function to craft bar charts that showcase mean values and include error bars for standard deviation.
# This function is versatile, adapting to either compare experimental conditions or participant genders based on input.
generate_plot_for_study <- function(data, group_var) {
  # Determine whether we're plotting Condition or sex based on group_var parameter
  fill_var <- if (group_var == "Condition") {
    "Condition"
  } else {
    "sex"
  }
  
  # Set the title dynamically based on the group_var
  title_text <- if (group_var == "Condition") {
    "Experimental Conditions"
  } else {
    "Participant Gender Groups"
  }
  
  # Adjust the fill colors based on the group_var
  fill_values <- if (group_var == "Condition") {
    c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")
  } else {
    c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")
  }
  
  # The plotting command constructs the bar chart, using aesthetic mappings specific to the comparison type ('Condition' or 'sex').
  # 'geom_bar' creates the bars, 'geom_errorbar' adds the error bars, and 'facet_wrap' organizes variables into subplots for a comprehensive view.
  # The `.data[[fill_var]]` pronoun replaces the deprecated `aes_string()` for programmatic aesthetic mapping.
  ggplot(data, aes(x = .data[[fill_var]], y = mean, fill = .data[[fill_var]])) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), 
                  position = position_dodge(width = 0.8), width = 0.25) +
    facet_wrap(~Variable, scales = "free_y", ncol = 2) +
    labs(title = title_text, x = "", y = "Mean with SD as error bars") +
    theme_minimal() +
    scale_fill_manual(values = fill_values) +
    theme(legend.position = "none")
}
# Generating and displaying the bar charts for experimental conditions.
map(condition_stats, generate_plot_for_study, group_var = "Condition")
$`1`


$`2`


$`3`

This visual dive shows us that, for the most part, people in the men-led and women-led startup conditions are remarkably consistent across the various metrics. But Study 3 suggests a slight edge for women-led startups in perceived viability and funding.

What about our men and women participants? Do they differ in these key variables?

# Generating and displaying the bar charts for participant gender group 
map(participant_gender_stats, generate_plot_for_study, group_var = "sex")
$`1`


$`2`


$`3`

Here, the narrative remains steady. Women participants consistently show lower endorsement of sexist attitudes and are more generous in their evaluations and funding.

Performing t-tests across studies

Now, we dive into t-tests to validate if what we saw in our charts stands up to statistical rigor.

# Function to perform t-tests for specified variables across conditions and gender, ensuring comparability.
perform_combined_t_tests <- function(data, study_name, variables) {
  # Standardizing 'Condition' and 'sex' as factors to maintain clear and consistent group distinctions.
  data <- data %>%
    mutate(
      Condition = factor(Condition,
                         levels = c("0", "1"),
                         labels = c("Men entrepreneur", "Women entrepreneur")),
      sex = factor(sex,
                   levels = c("0", "1"),
                   labels = c("Men participant", "Women participant"))
    )
  
  # Preparing to capture t-test results across all variables.
  all_t_test_results <- list()
  
  # t-tests for comparing experimental conditions, encapsulating each result within a structured list.
  condition_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("Condition", response = .x), data = data)
    list(variable = .x, 
         comparison_type = "Condition", 
         groups_compared = "Men entrepreneur vs Women entrepreneur", 
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, condition_t_test_results)
  
  # Similar t-tests for participant gender, again storing results in a structured format for easy interpretation.
  gender_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("sex", response = .x), data = data)
    list(variable = .x, 
         comparison_type = "Gender", 
         groups_compared = "Men participant vs Women participant", 
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, gender_t_test_results)
  
  # Assembling t-test summaries into a cohesive dataframe, adding context about the variable and comparison type.
  t_test_df <- map_df(all_t_test_results, ~ .x$t_test_summary) %>%
    mutate(
      variable = map_chr(all_t_test_results, ~ .x$variable),
      Comparison = map_chr(all_t_test_results, ~ .x$groups_compared),
      Study = study_name
    )
  
  return(t_test_df)
}

# Executing t-tests across all studies and variables, reformatting for readability and context.
map_df(study_names, ~perform_combined_t_tests(studies[[.x]], .x, variables), .id = "Study") %>% 
  rename(
    Mean_Difference = estimate, 
    Mean_Group1 = estimate1,
    Mean_Group2 = estimate2,
    T_Statistic = statistic,
    P_Value = p.value,
    Degrees_of_Freedom = parameter,
    CI_Low = conf.low,
    CI_High = conf.high,
    Test_Method = method,
    Hypothesis_Testing = alternative,
    Variable_Tested = variable,
    Groups_Compared = Comparison
  ) %>% 
  relocate(Study, Variable_Tested, Groups_Compared) %>% 
  arrange(Variable_Tested) -> ttest_results

ttest_results %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 BS Men entrepreneur vs Women entrepreneur -0.0753034 3.182746 3.258049 -0.9081468 0.3643681 385.4518 -0.2383353 0.0877285 Welch Two Sample t-test two.sided
1 BS Men participant vs Women participant 0.3886386 3.384204 2.995565 4.7410538 0.0000031 351.5642 0.2274194 0.5498579 Welch Two Sample t-test two.sided
2 BS Men entrepreneur vs Women entrepreneur 0.0391166 2.925879 2.886762 0.4689355 0.6392957 567.9061 -0.1247245 0.2029578 Welch Two Sample t-test two.sided
2 BS Men participant vs Women participant 0.5317086 3.167738 2.636029 6.5939030 0.0000000 556.5458 0.3733197 0.6900975 Welch Two Sample t-test two.sided
3 BS Men entrepreneur vs Women entrepreneur -0.0130981 2.624402 2.637500 -0.1142094 0.9091458 309.7260 -0.2387580 0.2125619 Welch Two Sample t-test two.sided
3 BS Men participant vs Women participant 0.6240524 2.904982 2.280930 5.6356737 0.0000000 284.4628 0.4060933 0.8420115 Welch Two Sample t-test two.sided
1 HS Men entrepreneur vs Women entrepreneur 0.1064954 2.888219 2.781724 1.0928270 0.2751513 385.9842 -0.0851028 0.2980937 Welch Two Sample t-test two.sided
1 HS Men participant vs Women participant 0.6201277 3.092966 2.472838 6.7316285 0.0000000 364.1709 0.4389708 0.8012846 Welch Two Sample t-test two.sided
2 HS Men entrepreneur vs Women entrepreneur 0.0410624 2.569930 2.528868 0.4206748 0.6741514 569.0000 -0.1506592 0.2327841 Welch Two Sample t-test two.sided
2 HS Men participant vs Women participant 0.6371947 2.861794 2.224599 6.7742614 0.0000000 563.5546 0.4524415 0.8219479 Welch Two Sample t-test two.sided
3 HS Men entrepreneur vs Women entrepreneur 0.0625897 2.479067 2.416477 0.4741748 0.6357092 309.8832 -0.1971343 0.3223137 Welch Two Sample t-test two.sided
3 HS Men participant vs Women participant 0.5096408 2.674371 2.164730 3.8997625 0.0001202 284.3998 0.2524081 0.7668736 Welch Two Sample t-test two.sided
1 Invest Men entrepreneur vs Women entrepreneur -2072.9177083 36428.400000 38501.317708 -0.8550577 0.3930515 384.6347 -6839.4597900 2693.6243733 Welch Two Sample t-test two.sided
1 Invest Men participant vs Women participant -8655.9134146 33711.550000 42367.463415 -3.6274683 0.0003266 367.9959 -13348.2381987 -3963.5886306 Welch Two Sample t-test two.sided
2 Invest Men entrepreneur vs Women entrepreneur 436.7703180 37402.498233 36965.727915 0.2066800 0.8363348 560.7269 -3714.1188533 4587.6594894 Welch Two Sample t-test two.sided
2 Invest Men participant vs Women participant -6029.8134834 34352.074576 40381.888060 -2.8668131 0.0043028 558.2159 -10161.1944967 -1898.4324701 Welch Two Sample t-test two.sided
3 Invest Men entrepreneur vs Women entrepreneur -5071.4003289 27957.243421 33028.643750 -1.9359110 0.0538026 304.9616 -10226.2688059 83.4681480 Welch Two Sample t-test two.sided
3 Invest Men participant vs Women participant -6797.3377512 27774.316384 34571.654135 -2.5451717 0.0114793 269.9483 -12055.3467311 -1539.3287712 Welch Two Sample t-test two.sided
1 viable Men entrepreneur vs Women entrepreneur -0.0756271 4.630102 4.705729 -0.5444594 0.5864398 385.9875 -0.3487286 0.1974744 Welch Two Sample t-test two.sided
1 viable Men participant vs Women participant -0.3954586 4.488688 4.884146 -2.9155343 0.0037622 378.3387 -0.6621582 -0.1287589 Welch Two Sample t-test two.sided
2 viable Men entrepreneur vs Women entrepreneur 0.1813687 5.163763 4.982394 1.4478839 0.1482003 568.9732 -0.0646689 0.4274063 Welch Two Sample t-test two.sided
2 viable Men participant vs Women participant -0.2442866 4.956081 5.200368 -1.9509215 0.0515594 565.9984 -0.4902313 0.0016582 Welch Two Sample t-test two.sided
3 viable Men entrepreneur vs Women entrepreneur -0.3351974 4.171053 4.506250 -2.1421644 0.0329602 309.5410 -0.6430886 -0.0273062 Welch Two Sample t-test two.sided
3 viable Men participant vs Women participant -0.2909392 4.220339 4.511278 -1.8217760 0.0695471 282.3129 -0.6052948 0.0234164 Welch Two Sample t-test two.sided
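
As an optional aside (not part of the original write-up), here’s a quick way to flag which of these comparisons cross the conventional .05 threshold at a glance; with this many tests, one could also justify a stricter, adjusted threshold (e.g., via p.adjust()):

# Optional: flag which comparisons in `ttest_results` reach p < .05.
ttest_results %>%
  mutate(Significant = P_Value < 0.05) %>%
  select(Study, Variable_Tested, Groups_Compared, Mean_Difference, P_Value, Significant) %>%
  arrange(P_Value) %>%
  kable()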

That’s a lot of results. Let’s break down these findings, starting with benevolent sexism.

# Presenting t-test results specifically for benevolent sexism.
ttest_results %>% filter(Variable_Tested == "BS") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 BS Men entrepreneur vs Women entrepreneur -0.0753034 3.182746 3.258049 -0.9081468 0.3643681 385.4518 -0.2383353 0.0877285 Welch Two Sample t-test two.sided
1 BS Men participant vs Women participant 0.3886386 3.384204 2.995565 4.7410538 0.0000031 351.5642 0.2274194 0.5498579 Welch Two Sample t-test two.sided
2 BS Men entrepreneur vs Women entrepreneur 0.0391166 2.925879 2.886762 0.4689355 0.6392957 567.9061 -0.1247245 0.2029578 Welch Two Sample t-test two.sided
2 BS Men participant vs Women participant 0.5317086 3.167738 2.636029 6.5939030 0.0000000 556.5458 0.3733197 0.6900975 Welch Two Sample t-test two.sided
3 BS Men entrepreneur vs Women entrepreneur -0.0130981 2.624402 2.637500 -0.1142094 0.9091458 309.7260 -0.2387580 0.2125619 Welch Two Sample t-test two.sided
3 BS Men participant vs Women participant 0.6240524 2.904982 2.280930 5.6356737 0.0000000 284.4628 0.4060933 0.8420115 Welch Two Sample t-test two.sided

In our studies, there’s balance in benevolent sexism scores between conditions—thanks to random assignment. However, a noticeable gender gap emerges, with men participants showing higher levels.

Next up, hostile sexism.

# Presenting t-test results specifically for hostile sexism.
ttest_results %>% filter(Variable_Tested == "HS") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 HS Men entrepreneur vs Women entrepreneur 0.1064954 2.888219 2.781724 1.0928270 0.2751513 385.9842 -0.0851028 0.2980937 Welch Two Sample t-test two.sided
1 HS Men participant vs Women participant 0.6201277 3.092966 2.472838 6.7316285 0.0000000 364.1709 0.4389708 0.8012846 Welch Two Sample t-test two.sided
2 HS Men entrepreneur vs Women entrepreneur 0.0410624 2.569930 2.528868 0.4206748 0.6741514 569.0000 -0.1506592 0.2327841 Welch Two Sample t-test two.sided
2 HS Men participant vs Women participant 0.6371947 2.861794 2.224599 6.7742614 0.0000000 563.5546 0.4524415 0.8219479 Welch Two Sample t-test two.sided
3 HS Men entrepreneur vs Women entrepreneur 0.0625897 2.479067 2.416477 0.4741748 0.6357092 309.8832 -0.1971343 0.3223137 Welch Two Sample t-test two.sided
3 HS Men participant vs Women participant 0.5096408 2.674371 2.164730 3.8997625 0.0001202 284.3998 0.2524081 0.7668736 Welch Two Sample t-test two.sided

Hostile sexism scores follow a similar pattern to benevolent sexism: the experimental conditions are balanced, but men participants score higher than women participants, indicating a gender divide.

What about startup viability perceptions?

# Presenting t-test results specifically for viability.
ttest_results %>% filter(Variable_Tested == "viable") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 viable Men entrepreneur vs Women entrepreneur -0.0756271 4.630102 4.705729 -0.5444594 0.5864398 385.9875 -0.3487286 0.1974744 Welch Two Sample t-test two.sided
1 viable Men participant vs Women participant -0.3954586 4.488688 4.884146 -2.9155343 0.0037622 378.3387 -0.6621582 -0.1287589 Welch Two Sample t-test two.sided
2 viable Men entrepreneur vs Women entrepreneur 0.1813687 5.163763 4.982394 1.4478839 0.1482003 568.9732 -0.0646689 0.4274063 Welch Two Sample t-test two.sided
2 viable Men participant vs Women participant -0.2442866 4.956081 5.200368 -1.9509215 0.0515594 565.9984 -0.4902313 0.0016582 Welch Two Sample t-test two.sided
3 viable Men entrepreneur vs Women entrepreneur -0.3351974 4.171053 4.506250 -2.1421644 0.0329602 309.5410 -0.6430886 -0.0273062 Welch Two Sample t-test two.sided
3 viable Men participant vs Women participant -0.2909392 4.220339 4.511278 -1.8217760 0.0695471 282.3129 -0.6052948 0.0234164 Welch Two Sample t-test two.sided

Surprisingly, men- and women-led startups were seen as equally viable in most studies. An interesting deviation in Study 3 paints women-led startups more favorably. This upends common stereotypes, likely because in our experimental scenario the entrepreneurs were portrayed as highly competent and the startup was pre-tested to be seen as a viable idea. And, as we’ll see in the subsequent regression analyses, while on the surface there seems to be no bias, benevolent sexism actually plays a role in creating inequity in startup evaluation.

Lastly, the matter of investment decisions.

ttest_results %>% filter(Variable_Tested == "Invest") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 Invest Men entrepreneur vs Women entrepreneur -2072.9177 36428.40 38501.32 -0.8550577 0.3930515 384.6347 -6839.460 2693.62437 Welch Two Sample t-test two.sided
1 Invest Men participant vs Women participant -8655.9134 33711.55 42367.46 -3.6274683 0.0003266 367.9959 -13348.238 -3963.58863 Welch Two Sample t-test two.sided
2 Invest Men entrepreneur vs Women entrepreneur 436.7703 37402.50 36965.73 0.2066800 0.8363348 560.7269 -3714.119 4587.65949 Welch Two Sample t-test two.sided
2 Invest Men participant vs Women participant -6029.8135 34352.07 40381.89 -2.8668131 0.0043028 558.2159 -10161.194 -1898.43247 Welch Two Sample t-test two.sided
3 Invest Men entrepreneur vs Women entrepreneur -5071.4003 27957.24 33028.64 -1.9359110 0.0538026 304.9616 -10226.269 83.46815 Welch Two Sample t-test two.sided
3 Invest Men participant vs Women participant -6797.3378 27774.32 34571.65 -2.5451717 0.0114793 269.9483 -12055.347 -1539.32877 Welch Two Sample t-test two.sided

Financial backing was fairly even across men- and women-led startups in all studies, though men participants were somewhat more conservative in their funding than women participants. This peels back another layer of how participant gender influences startup support.

Summary

In this journey through our dataset, we’ve taken some crucial first steps before diving into the deeper waters of regression analysis. By calculating descriptive statistics, peeking at our sample sizes through colorful bar charts, exploring the shapes of our key variables with histograms, and comparing means across different groups with bar charts and t-tests, we’ve essentially mapped out the terrain of our data landscape. This initial exploration is not merely about crunching numbers—it’s about getting to know the fundamental properties of our data—recognizing its patterns, its quirks, and how it speaks to the larger story we’re aiming to tell. In the next phase, we’ll turn to another crucial step before regression analysis: exploring the relationships between variables through correlation analyses.