Descriptive statistics: getting to know our data

Author

Julie Nguyen

In this notebook, we present the analysis behind our research, “Benevolent Sexism and the Gender Gap in Startup Evaluation,” featured in “Entrepreneurship: Theory and Practice” and “The Conversation Canada”. Our study explores how evaluators’ benevolent sexism might influence their assessment of startups, contingent on the founder’s gender.

At the heart of our investigation lies a critical question: Does benevolent sexism skew evaluators’ views on the viability of startups led by men versus women? To dissect this, we orchestrated three experimental studies. Participants were randomly assigned to evaluate startups headed by either gender, while we separately measured their levels of benevolent and hostile sexism.

Key variables include:

- Condition: the experimental condition, i.e., whether the startup is led by a man (0) or a woman (1)
- BS: the participant’s endorsement of benevolent sexism
- HS: the participant’s endorsement of hostile sexism
- sex: the participant’s gender (0 = man, 1 = woman)
- viable: how viable the participant thought the startup was
- Invest: how much the participant was willing to invest in the startup

Navigating the analysis:

We kick-start our analytical journey with Study 1 and then automate the process across all three studies. Specifically, we calculate descriptive statistics for key variables for the whole sample and separately for each experimental condition and each participant gender group. We then visualize sample sizes to assess participant distribution, create histograms to understand variable distributions, compare means and standard deviations across groups via bar charts, and test these group differences with t-tests. In doing so, we gain an understanding of the basic structure and distribution of our data, setting the stage for the more complex regression analyses later on.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
# Load the necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(psych)
library(kableExtra)
library(stringr)
library(broom)
library(patchwork)
library(cowplot)

# Load the data for three studies 
study_1 <- readRDS("/Users/mac/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 1/Data/R data/study_1.rds")
study_2 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 2/Data/R data/study_2.rds")
study_3 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 3/Data/R data/study_3.rds")

Kicking off with Study 1

Welcome to the beginning of our exploration! 🌟 In this segment, we dive into the Study 1 dataset to uncover insights through descriptive statistics of key variables.

Understanding the participants

Let’s first take a glimpse at our data:

# Initial glimpse at the dataset to check the first few entries. This helps in getting a basic understanding of data structure and types of variables collected in the study.
study_1 %>% select(id, Condition, BS, HS, sex, viable, Invest) %>% head()
# A tibble: 6 × 7
  id     Condition    BS    HS   sex viable Invest
  <chr>      <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1 116899         1  2.55  2.64     1    5.5  44085
2 116977         1  2.36  2        1    5.5  50000
3 117031         1  2.73  3.18     1    5    60845
4 110215         1  3.64  1.27     1    7    40141
5 112513         1  2     2.73     0    5.5  20000
6 117004         1  3.18  4        0    5    30000

This table shows a snapshot of our dataset. Each row is a unique participant, with columns detailing their characteristics: their ID (id), the experimental scenario they were assigned (Condition), how much they agreed with benevolent sexist beliefs (BS) and hostile sexist beliefs (HS), their gender (sex), how viable they thought the startup was (viable), and how much they were willing to support it (Invest).

Now, let’s calculate mean and standard deviation across key variables.

# Calculating mean and standard deviation for key variables (BS, HS, viable, Invest) to get an overview of central tendencies and variability. `na.rm = TRUE` ensures missing values are ignored in these calculations.
overall_stats <- study_1 %>% 
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))))
overall_stats %>% kable()
BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
3.220009 0.8162942 2.83552 0.9602238 4.667526 1.366932 37456.82 23845.36

It looks like our participants agree with benevolent sexism more than hostile sexism. No big surprise there—society often dresses up these attitudes as chivalry instead of prejudice. Our participants are also quite optimistic about the startup’s potential, scoring its viability pretty high (4.7 out of 7). However, they’re a tad more cautious when it comes to actually investing, with investment amounts averaging around $37K out of a possible $100K. A classic case of “let’s see where this goes”.
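
If we wanted to go beyond eyeballing the two means, a paired t-test on the same participants’ benevolent and hostile sexism scores would formalize that impression. This is just an optional sketch, not part of the original analysis pipeline:

# Optional check (not in the original analysis): do the same participants endorse
# benevolent sexism more strongly than hostile sexism? A paired t-test compares the two scores within person.
t.test(study_1$BS, study_1$HS, paired = TRUE)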

Let’s visualize the distribution of these variables to better grasp how our participants’ opinions spread out. For this, we’ll use histograms.

# Setting up for visualization
# Define key variables, their bin widths, and assigned colors for differentiation
variables <- c("BS", "HS", "viable", "Invest") # `variables` are the key variables of interest
binwidths <- c(0.5, 0.5, 0.5, 5000) # `binwidths` determine the granularity of the histogram
colors <- c("#cf1578", "#e8d21d", "#039fbe", "#b20238") # `colors` are visually distinguishing each variable's histogram
x_limits <- list(c(1, 6), c(1, 6), c(1, 7), c(0, 100000)) # Define x-axis limits based on expected data ranges

# Generate histograms using map to iterate over the key variables and their corresponding attributes
plots <- map(seq_along(variables), ~ {
  ggplot(study_1, aes(x = .data[[variables[.x]]])) +
    geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = NA) +  # No outline around bins
    ggtitle(paste("Histogram of", variables[.x])) +
    xlab(variables[.x]) +
    ylab("Frequency") +
    theme_minimal() +
    scale_x_continuous(limits = x_limits[[.x]], oob = scales::oob_squish)  # Adjust x-axis based on variable
})

# Assembling histograms into a cohesive visual layout for side-by-side comparison.
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).

Benevolent sexism, hostile sexism, and perceived viability scores generally follow a normal distribution, indicating a relatively even spread of opinions on these scales. Investment amounts, though, present what looks like a multimodal distribution with several peaks. This hints at distinct groups of participants based on their willingness to invest.
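
To probe that multimodal impression a little further, one option is a density curve, which smooths over the histogram’s binning choices. A minimal sketch, reusing study_1 and the same color as the histogram above:

# Optional: a smoothed density of investment amounts to double-check the multimodal shape.
ggplot(study_1, aes(x = Invest)) +
  geom_density(fill = "#b20238", alpha = 0.4) +
  xlab("Invest") +
  ylab("Density") +
  theme_minimal()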

Zooming in on the experimental conditions

Let’s narrow down our focus and compare startups led by men versus those led by women. How many observations do we have in each condition?

# Counting the number of observations within each experimental condition to ensure sufficient data for each group.

study_1 %>%
  mutate(Condition = case_when(
      Condition == 0 ~ "Men entrepreneur",
      Condition == 1 ~ "Women entrepreneur",
      TRUE ~ as.character(Condition)  
    )
  ) %>% count(Condition)
# A tibble: 2 × 2
  Condition              n
  <chr>              <int>
1 Men entrepreneur     196
2 Women entrepreneur   192

Looks like we have a balanced number of observations for each experimental condition. Nice! This is crucial for subsequent comparative analysis to be meaningful.
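
If we wanted a formal check that the 196/192 split is consistent with an even 50/50 allocation, a quick goodness-of-fit test on the counts would do it. This is purely an optional sketch; the raw counts above already make the point:

# Optional: chi-squared goodness-of-fit test of the condition counts against an even 50/50 split.
chisq.test(table(study_1$Condition))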

Let’s dig deeper and calculate means and standard deviations for each experimental condition.

# Calculate descriptive statistics by the experimental condition to discern potential differences.
# The 'Condition' column in 'study_1' is recoded so that 0 is recoded to 'Men entrepreneur' and 1 to 'Women entrepreneur'.
condition_stats <- study_1 %>%
  mutate(
    Condition = case_when(
      Condition == 0 ~ "Men entrepreneur",
      Condition == 1 ~ "Women entrepreneur",
      TRUE ~ as.character(Condition)
    )
  ) %>% 
  # Group the data by the newly updated 'Condition' column.
  # Within each group (each unique condition), calculate the mean and standard deviation for the same set of variables.
  group_by(Condition) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )

condition_stats %>% kable()
Condition BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
Men entrepreneur 3.182746 0.8095630 2.888219 0.9727516 4.630102 1.378279 36428.40 24395.79
Women entrepreneur 3.258049 0.8234795 2.781724 0.9467708 4.705729 1.357785 38501.32 23290.16

Let’s break down these stats visually. It’s always a bit easier to spot patterns and contrasts with a graph rather than a table:

# Transforming condition stats for visualization. 
# We pivot longer to have a single measure and stat type per row, then pivot wider to separate mean and sd for plotting. 
condition_stats %>% 
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), # Select columns that end with '_mean' or '_sd'
               names_to = "Metric_Type", # New column where original column names (indicating metric and stat type) are stored
               values_to = "Value") %>% # New column where values from the selected columns are stored
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% # Split 'Metric_Type' into 'Variable' and 'Stat' based on '_'
  pivot_wider(names_from = Stat, values_from = Value) %>% # Pivot back to a wider format where 'mean' and 'sd' become separate columns
  # Replace abbreviated variable names with full, descriptive names for clarity in visual representation
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>% 
  ggplot(aes(x = Condition, y = mean, fill = Condition)) + # Plotting setup: X-axis is Condition, Y-axis is mean, colored by Condition
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +  # Draw bars for mean values, dodge positions them side by side
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) + # Add error bars for SD
  facet_wrap(~Variable, scales = "free_y", ncol = 2) +  # Separate plots for each variable, allowing Y-axis to scale independently
  labs(x = "", y = "Mean with SD as error bars") + # Labeling axes
  theme_minimal() + # Minimal theme for a clean look
  scale_fill_manual(values = c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")) + # Custom colors for conditions
  theme(legend.position = "none")  # Remove legend for a cleaner look

Seems like there are no drastic differences in sexist attitudes between the conditions. That’s good: it means our random assignment was successful in cancelling out group differences in sexism. There’s a slight edge in favor of women’s startups when it comes to investment. Let’s conduct t-tests to see whether these differences are statistically significant.

# Specifying the variables to undergo statistical testing.
variables_to_test <- c("BS", "HS", "viable", "Invest")

# Setting scientific notation penalty to avoid scientific notation in output
options(scipen = 999)

# Running t-tests for each variable between conditions to check for statistically significant differences.
# The reformulate function dynamically creates the formula needed for the t-test based on the variable name.
map(variables_to_test, ~t.test(reformulate("Condition", response = .), data = study_1)) 
[[1]]

    Welch Two Sample t-test

data:  BS by Condition
t = -0.90815, df = 385.45, p-value = 0.3644
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.23833528  0.08772845
sample estimates:
mean in group 0 mean in group 1 
       3.182746        3.258049 


[[2]]

    Welch Two Sample t-test

data:  HS by Condition
t = 1.0928, df = 385.98, p-value = 0.2752
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.08510283  0.29809371
sample estimates:
mean in group 0 mean in group 1 
       2.888219        2.781723 


[[3]]

    Welch Two Sample t-test

data:  viable by Condition
t = -0.54446, df = 385.99, p-value = 0.5864
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.3487286  0.1974744
sample estimates:
mean in group 0 mean in group 1 
       4.630102        4.705729 


[[4]]

    Welch Two Sample t-test

data:  Invest by Condition
t = -0.85506, df = 384.63, p-value = 0.3931
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -6839.460  2693.624
sample estimates:
mean in group 0 mean in group 1 
       36428.40        38501.32 

And… it turns out, the differences we spotted don’t pass the statistical significance test. So, the way our participants view and invest in these startups doesn’t hinge on whether a man or a woman is at the helm.
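
As a side note, the raw t-test printouts above are fairly verbose. If we wanted a more compact view, broom::tidy() can collapse them into a single table (the same trick we lean on when automating the t-tests across studies later). A sketch using the variables_to_test vector defined above:

# Optional: condense the four condition t-tests into one tidy summary table.
map_df(variables_to_test, ~ {
  t.test(reformulate("Condition", response = .x), data = study_1) %>%
    broom::tidy() %>%
    mutate(variable = .x)
}) %>%
  select(variable, estimate1, estimate2, statistic, p.value) %>%
  kable()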

Zooming in on participant gender

Do men and women see things differently in our study? Let’s find out:

# Calculate descriptive statistics by participant gender to discern potential differences.
# The 'sex' column in 'study_1' is recoded so that 0 is recoded to 'Men participant' and 1 to 'Women participant'.
participant_gender_stats <- study_1 %>%
  mutate(
    sex = case_when(
      sex == 0 ~ "Men participant",
      sex == 1 ~ "Women participant"
    )) %>% 
  filter(!is.na(sex)) %>% 
  # Group the data by the newly updated 'sex' column.
  # Within each group (each unique condition), calculate the mean and standard deviation for the same set of variables.
  group_by(sex) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest), 
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )

participant_gender_stats %>% kable()
sex BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd
Men participant 3.384204 0.7959016 3.092966 0.9308327 4.488688 1.453400 33711.55 24439.65
Women participant 2.995565 0.7949420 2.472838 0.8653258 4.884146 1.204029 42367.46 22103.41

Let’s graph these mean and sd values like we did before.

participant_gender_stats %>% 
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), 
               names_to = "Metric_Type", 
               values_to = "Value") %>% 
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% 
  pivot_wider(names_from = Stat, values_from = Value) %>% 
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>% 
  ggplot(aes(x = sex, y = mean, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) +
  facet_wrap(~Variable, scales = "free_y", ncol = 2) +
  labs(x = "", y = "Mean with SD as error bars") +
  theme_minimal() +
  scale_fill_manual(values = c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")) +
  theme(legend.position = "none")

Looks like the women in our study endorse benevolent and hostile sexism less than the men, although the gender difference in benevolent sexism is not as large. Interestingly, they are also more generous with their startup evaluations and investments. Time for one more round of t-tests to see if these observations hold water.

# Automating t-tests to compare variables between participant gender groups.
map(c("BS", "HS", "viable", "Invest"), ~t.test(reformulate("sex", response = .), data = study_1)) 
[[1]]

    Welch Two Sample t-test

data:  BS by sex
t = 4.7411, df = 351.56, p-value = 0.000003094
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 0.2274194 0.5498579
sample estimates:
mean in group 0 mean in group 1 
       3.384204        2.995565 


[[2]]

    Welch Two Sample t-test

data:  HS by sex
t = 6.7316, df = 364.17, p-value = 0.00000000006541
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 0.4389708 0.8012846
sample estimates:
mean in group 0 mean in group 1 
       3.092966        2.472838 


[[3]]

    Welch Two Sample t-test

data:  viable by sex
t = -2.9155, df = 378.34, p-value = 0.003762
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -0.6621582 -0.1287589
sample estimates:
mean in group 0 mean in group 1 
       4.488688        4.884146 


[[4]]

    Welch Two Sample t-test

data:  Invest by sex
t = -3.6275, df = 368, p-value = 0.0003266
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 -13348.238  -3963.589
sample estimates:
mean in group 0 mean in group 1 
       33711.55        42367.46 

The results confirm that women participants indeed endorse benevolent and hostile sexism less than men. They also gave the startup higher viability ratings and more funding.
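
The t-tests tell us these gender differences are unlikely to be chance, but not how large they are. As an optional sketch (not part of the original analysis), we can approximate Cohen’s d by hand from the group means and a pooled standard deviation; the cohens_d helper below is purely illustrative:

# Optional: a rough Cohen's d for the participant gender differences in Study 1.
# `cohens_d` is a small illustrative helper (not from the original analysis) using a pooled SD.
cohens_d <- function(x, group) {
  g <- split(x, group)                         # split the variable into the two gender groups
  m <- sapply(g, mean, na.rm = TRUE)           # group means
  s <- sapply(g, sd, na.rm = TRUE)             # group standard deviations
  n <- sapply(g, function(v) sum(!is.na(v)))   # group sizes (non-missing)
  pooled_sd <- sqrt(((n[1] - 1) * s[1]^2 + (n[2] - 1) * s[2]^2) / (n[1] + n[2] - 2))
  unname((m[1] - m[2]) / pooled_sd)            # positive values mean the men's group scores higher
}

study_1 %>%
  filter(sex %in% c(0, 1)) %>%  # keep the men (0) and women (1) participant groups
  summarise(across(c(BS, HS, viable, Invest), ~ cohens_d(., sex)))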

Scaling up: automating across studies

With Study 1 under our belt, we’re now ready to extend our analysis across multiple studies. In experimental psychological research like ours, it’s common to conduct multiple studies. Each one might adjust certain variables or conditions to ensure that our observations are not just flukes but reflect genuine, robust phenomena.

Now, obviously we could manually repeat the same code for each study. But that would be both time-consuming and prone to human error. Automating is like having a trusted assistant who performs the same tasks on multiple datasets with unwavering accuracy, saving us time to focus on the bigger picture.

Automating descriptive stats calculation

We start by writing a function. A function is like a recipe: it takes various “ingredients” (data) and, through a series of “cooking” steps (processing), delivers a delectable “dish” (outcome). In our case, the perform_descriptive_analysis function will take in the data for each study, calculate descriptive statistics for the whole sample and for separate groups, and serve up a comprehensive summary in a neatly organized dataframe.

# Initial setup for descriptive analysis automation.
perform_descriptive_analysis <- function(data, study_name) {
  # Recoding the variables for clarity
  data <- data %>%
    mutate(
      Condition = factor(case_when(Condition == 0 ~ "Men entrepreneur",
                                   Condition == 1 ~ "Women entrepreneur",
                                   TRUE ~ as.character(Condition)
                                   )), 
      sex = factor(case_when(sex == 0 ~ "Men participant",
                             sex == 1 ~ "Women participant",
                             TRUE ~ "Other participant gender"
                             ))
    )
  
  # Calculate stats for the entire sample to give us a baseline understanding of the dataset.
  overall_stats <- data %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n() # Capturing sample size for each analysis segment.
    ) %>% 
    mutate(Condition = "Overall") # Labeling these stats as 'Overall' for easy identification.

  # Calculate statistics for each experimental condition
  condition_stats <- data %>%
    group_by(Condition) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop' # Ensuring the grouped structure is dropped post-summarization for simplicity.
    ) 

  
  # Calculate statistics for each participant gender group
  participant_gender_stats <- data %>%
    group_by(sex) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop'
    ) %>%
  mutate(Condition = as.character(sex)) %>% # Labeling these stats for each participant gender group
  select(-sex)  # Removing the now redundant 'sex' column.

  # Compiling all stats into one comprehensive dataframe.
  combined_stats <- bind_rows(overall_stats, condition_stats, participant_gender_stats) 

  return(combined_stats) # Delivering the compiled dataframe as the function's output.
}

Next, we can use map_df from the purrr package to apply the function. It’s like having an army of robots at your disposal, each programmed to carry out the recipe on different datasets.

# List of datasets
studies <- list(study_1 = study_1, study_2 = study_2, study_3 = study_3)

# Apply 'perform_descriptive_analysis' to each dataset using 'map_df'
# '.id = "Study_Name"' adds a column with the name of each study, keeping track of which study each result came from.
descriptive_results <- map_df(names(studies), ~perform_descriptive_analysis(studies[[.x]], .x), .id = "Study_name")

# Presenting the aggregated results.
descriptive_results %>% mutate_if(is.numeric, ~ round(., 2)) %>% kable()
Study_name BS_mean BS_sd HS_mean HS_sd viable_mean viable_sd Invest_mean Invest_sd n Condition
1 3.22 0.82 2.84 0.96 4.67 1.37 37456.82 23845.36 388 Overall
1 3.18 0.81 2.89 0.97 4.63 1.38 36428.40 24395.79 196 Men entrepreneur
1 3.26 0.82 2.78 0.95 4.71 1.36 38501.32 23290.16 192 Women entrepreneur
1 3.38 0.80 3.09 0.93 4.49 1.45 33711.55 24439.65 221 Men participant
1 3.39 0.77 3.70 1.69 6.00 1.00 43662.00 30663.18 3 Other participant gender
1 3.00 0.79 2.47 0.87 4.88 1.20 42367.46 22103.41 164 Women participant
2 2.91 1.00 2.55 1.17 5.07 1.50 37184.11 25116.82 572 Overall
2 2.93 0.97 2.57 1.17 5.16 1.51 37402.50 24158.75 287 Men entrepreneur
2 2.89 1.02 2.53 1.16 4.98 1.48 36965.73 26080.74 285 Women entrepreneur
2 3.17 0.94 2.86 1.13 4.96 1.55 34352.07 25256.80 297 Men participant
2 1.55 0.36 1.18 0.00 5.17 1.44 30000.00 30000.00 3 Other participant gender
2 2.64 0.98 2.22 1.11 5.20 1.43 40381.89 24618.98 272 Women participant
3 2.63 1.01 2.45 1.17 4.34 1.39 30557.96 23184.86 312 Overall
3 2.62 1.00 2.48 1.12 4.17 1.37 27957.24 23981.45 152 Men entrepreneur
3 2.64 1.02 2.42 1.21 4.51 1.39 33028.64 22195.22 160 Women entrepreneur
3 2.90 0.97 2.67 1.14 4.22 1.38 27774.32 22064.52 177 Men participant
3 1.68 0.06 1.09 0.13 4.00 0.00 10000.00 14142.14 2 Other participant gender
3 2.28 0.96 2.16 1.14 4.51 1.40 34571.65 24141.38 133 Women participant

Visualizing observations across studies

With our statistics in hand, we’re ready to dive into some visualizations to better grasp our data! We’ll start by looking at participant numbers for each study.

descriptive_results %>% 
  # filtering out entries tagged as 'Other participant gender' since there are too few participants in this group
  filter(Condition != "Other participant gender") %>% 
  # adjust the 'Condition' and 'Study_name' columns for clearer categorization and labeling in our visualizations.
  mutate(
    # Convert 'Condition' into a factor with specific levels for clear grouping in the plot.
    # This helps in differentiating between the experimental conditions and participant gender groups.
    Condition = factor(Condition, levels = c("Overall", "Men entrepreneur", "Women entrepreneur", "Men participant", "Women participant")),
    # Similarly, convert 'Study_name' into a factor and assign more descriptive labels ('Study 1', 'Study 2', 'Study 3').
    # This ensures that the plots clearly indicate which study the data is drawn from.
    Study_name = factor(Study_name, levels = unique(Study_name), labels = c("Study 1", "Study 2", "Study 3"))
  ) %>%
  # Create a bar plot with 'Condition' on the x-axis, the number of participants ('n') on the y-axis, and color-coded by 'Condition'.
  ggplot(aes(x = Condition, y = n, fill = Condition)) +
  geom_bar(stat = "identity", position = position_dodge()) +  # 'stat="identity"' indicates that the heights of the bars represent data values.
  # Add labels on top of each bar to display the exact number of participants. The 'position_dodge()' ensures the labels align with the bars.
  geom_text(aes(label = n), position = position_dodge(width = 0.75), vjust = -0.25, size = 3, color = "gray50") +
  # Use 'facet_wrap' to create separate plots for each study, enabling comparisons across studies.
  facet_wrap(~ Study_name, scales = "free_x", nrow = 1) +
  # Customize plot labels and theme for readability and aesthetics. Remove x-axis label for cleanliness.
  labs(title = "Sample Size Across Studies", x = "", y = "Sample Size") +
  theme_minimal() +  # Apply a minimal theme for a clean look.
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") +  # Adjust text angle for better legibility.
  scale_fill_brewer(palette = "Set1")  # Apply a color palette for visual distinction of conditions.

Study 2 has the highest number of participants (n = 572) while Study 3 has the lowest (n = 312). The difference makes sense; Study 2 was open to all US full-time employees, a much larger pool than Study 3’s niche of people with previous experience in startup evaluation.

The sample sizes across our experimental conditions are very balanced. And while there were more men than women in our participant pool across all three studies, the numbers are close enough that we’re all set for fair comparisons.

Exploring variable distributions across studies

Now, let’s take our analysis up a notch by diving into the distributions of our main variables across all three studies. By plotting histograms, we can visually grasp how participant responses vary for each variable—letting us spot trends, outliers, and overall patterns at a glance.

# Define a function to create histograms for given variables across a single study.
# This function takes the dataset, the name of the study, a list of variables to plot,
# the bin widths for each histogram, the colors for the histograms, and x-axis limits.
create_histograms <- function(data, study_name, variables, binwidths, colors, x_limits) {
  # Loop through each variable to generate its histogram.
  plots <- map(seq_along(variables), ~ {
    # Create the histogram with specified aesthetics.
    ggplot(data, aes(x = .data[[variables[.x]]])) +
      geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = "white") + # White outlines around the bins help separate them visually.
      ggtitle(paste(study_name, "-", variables[.x])) + # Title includes study name and variable.
      theme_minimal() + # Minimalist theme for focus on the data.
      xlim(x_limits[[.x]]) # Set x-axis limits based on predefined limits.
  })
  # Arrange the generated plots in a grid layout for easier comparison.
  plot_grid(plotlist = plots, ncol = 2) 
}

# List of study names extracted from the studies list for iteration.
study_names <- names(studies)

# Generating histograms for each study by passing them through our custom function.
map(study_names, ~create_histograms(studies[[.x]], .x, variables, binwidths, colors, x_limits))
[[1]]


[[2]]


[[3]]

For Studies 1 and 2, it’s like most participants are on the same page in terms of their benevolent sexism scores, with scores clustering in a bell curve. But in Study 3, it’s a different story: the curve flattens out before dipping, suggesting that while a range of moderately benevolent sexist attitudes is somewhat evenly spread among participants, extremely high benevolent sexist attitudes are rare. This is also the case for hostile sexism scores in Study 1. Yet, in Studies 2 and 3, the distribution of hostile sexism scores resembles a downward line, suggesting a general trend among the participants towards lower levels of hostile sexism, with high levels being progressively less common.

Across the board, we’re seeing bell curves when it comes to the distribution of perceived viability scores. This tells us that most participants gravitate towards a common middle ground when it comes to how viable they think the startups are. In contrast, with its peaks and valleys, the multimodal distribution for investment decisions reveals distinct participant groups based on how much they’re willing to invest.
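
To put some numbers behind these visual impressions, the psych package we loaded earlier reports skew and kurtosis alongside the usual summaries. A quick optional sketch looping over the studies list defined above:

# Optional: numeric skew and kurtosis for each key variable in each study, to complement the histograms.
map_df(study_names, ~ {
  studies[[.x]] %>%
    select(BS, HS, viable, Invest) %>%
    psych::describe() %>%
    as.data.frame() %>%
    tibble::rownames_to_column("variable") %>%
    mutate(study = .x) %>%
    select(study, variable, mean, sd, skew, kurtosis)
}) %>%
  kable()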

Comparing experimental conditions and participant genders

Up next, we transition from broad statistics to focused comparisons. Specifically, we’re comparing the experimental conditions (men-led vs. women-led startups) and the participant genders. We’ll look at the mean responses and the variability within these groups through bar charts. This visual approach gives us a straightforward way to see if there are any notable differences or if the groups are more alike than not.

# Reshaping the results for easier visualization.
descriptive_results %>%
  # Transform our results to a long format
  # Each variable (e.g., benevolent sexism, hostile sexism) gets expanded into two rows—one for mean and one for SD.
  pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), 
               names_to = "Metric_Type", 
               values_to = "Value") %>% 
  separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% 
  pivot_wider(names_from = Stat, values_from = Value) %>% 
  # rename variables for a clearer understanding in the graphs.
  mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
         Variable = str_replace(Variable, "HS", "Hostile Sexism"),
         Variable = str_replace(Variable, "viable", "Perceived Viability"),
         Variable = str_replace(Variable, "Invest", "Investment Decisions")) -> descriptive_results_long

# Splitting the transformed data by study and condition/gender for targeted analysis.
# This allows us to separately analyze and visualize the data for experimental conditions and participant genders across each study.
condition_stats <- descriptive_results_long %>% filter(Condition %in% c("Men entrepreneur", "Women entrepreneur")) %>% split(.$Study_name)
participant_gender_stats <- descriptive_results_long %>% filter(Condition %in% c("Men participant", "Women participant")) %>% rename(sex = Condition) %>% split(.$Study_name)
# Define a function to craft bar charts that showcase mean values and include error bars for standard deviation.
# This function is versatile, adapting to either compare experimental conditions or participant genders based on input.
generate_plot_for_study <- function(data, group_var) {
  # Determine whether we're plotting Condition or sex based on group_var parameter
  fill_var <- if (group_var == "Condition") {
    "Condition"
  } else {
    "sex"
  }
  
  # Set the title dynamically based on the group_var
  title_text <- if (group_var == "Condition") {
    "Experimental Conditions"
  } else {
    "Participant Gender Groups"
  }
  
  # Adjust the fill colors based on the group_var
  fill_values <- if (group_var == "Condition") {
    c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")
  } else {
    c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")
  }
  
  # The plotting command constructs the bar chart, using aesthetic mappings specific to the comparison type ('Condition' or 'sex').
  # 'geom_bar' creates the bars, 'geom_errorbar' adds the error bars, and 'facet_wrap' organizes variables into subplots for a comprehensive view.
  # The `.data[[fill_var]]` pronoun replaces the deprecated `aes_string()` for programmatic aesthetic mapping.
  ggplot(data, aes(x = .data[[fill_var]], y = mean, fill = .data[[fill_var]])) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), 
                  position = position_dodge(width = 0.8), width = 0.25) +
    facet_wrap(~Variable, scales = "free_y", ncol = 2) +
    labs(title = title_text, x = "", y = "Mean with SD as error bars") +
    theme_minimal() +
    scale_fill_manual(values = fill_values) +
    theme(legend.position = "none")
}
# Generating and displaying the bar charts for experimental conditions.
map(condition_stats, generate_plot_for_study, group_var = "Condition")
$`1`


$`2`


$`3`

This visual dive shows us that, for the most part, people in the men-led and women-led startup conditions are remarkably consistent across the various metrics. But Study 3 suggests a slight edge for women-led startups in perceived viability and funding.

What about our men and women participants? Do they differ in these key variables?

# Generating and displaying the bar charts for participant gender group 
map(participant_gender_stats, generate_plot_for_study, group_var = "sex")
$`1`


$`2`


$`3`

Here, the narrative remains steady. Women participants consistently show lower endorsement of sexist attitudes and are more generous in their evaluations and funding.

Performing t-tests across studies

Now, we dive into t-tests to validate if what we saw in our charts stands up to statistical rigor.

# Function to perform t-tests for specified variables across conditions and gender, ensuring comparability.
perform_combined_t_tests <- function(data, study_name, variables) {
  # Standardizing 'Condition' and 'sex' as factors to maintain clear and consistent group distinctions.
  data <- data %>%
    mutate(
      Condition = factor(Condition,
                         levels = c("0", "1"),
                         labels = c("Men entrepreneur", "Women entrepreneur")),
      sex = factor(sex,
                   levels = c("0", "1"),
                   labels = c("Men participant", "Women participant"))
    )
  
  # Preparing to capture t-test results across all variables.
  all_t_test_results <- list()
  
  # t-tests for comparing experimental conditions, encapsulating each result within a structured list.
  condition_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("Condition", response = .x), data = data)
    list(variable = .x, 
         comparison_type = "Condition", 
         groups_compared = "Men entrepreneur vs Women entrepreneur", 
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, condition_t_test_results)
  
  # Similar t-tests for participant gender, again storing results in a structured format for easy interpretation.
  gender_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("sex", response = .x), data = data)
    list(variable = .x, 
         comparison_type = "Gender", 
         groups_compared = "Men participant vs Women participant", 
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, gender_t_test_results)
  
  # Assembling t-test summaries into a cohesive dataframe, adding context about the variable and comparison type.
  t_test_df <- map_df(all_t_test_results, ~ .x$t_test_summary) %>%
    mutate(
      variable = map_chr(all_t_test_results, ~ .x$variable),
      Comparison = map_chr(all_t_test_results, ~ .x$groups_compared),
      Study = study_name
    )
  
  return(t_test_df)
}

# Executing t-tests across all studies and variables, reformatting for readability and context.
map_df(study_names, ~perform_combined_t_tests(studies[[.x]], .x, variables), .id = "Study") %>% 
  rename(
    Mean_Difference = estimate, 
    Mean_Group1 = estimate1,
    Mean_Group2 = estimate2,
    T_Statistic = statistic,
    P_Value = p.value,
    Degrees_of_Freedom = parameter,
    CI_Low = conf.low,
    CI_High = conf.high,
    Test_Method = method,
    Hypothesis_Testing = alternative,
    Variable_Tested = variable,
    Groups_Compared = Comparison
  ) %>% 
  relocate(Study, Variable_Tested, Groups_Compared) %>% 
  arrange(Variable_Tested) -> ttest_results

ttest_results %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 BS Men entrepreneur vs Women entrepreneur -0.0753034 3.182746 3.258049 -0.9081468 0.3643681 385.4518 -0.2383353 0.0877285 Welch Two Sample t-test two.sided
1 BS Men participant vs Women participant 0.3886386 3.384204 2.995565 4.7410538 0.0000031 351.5642 0.2274194 0.5498579 Welch Two Sample t-test two.sided
2 BS Men entrepreneur vs Women entrepreneur 0.0391166 2.925879 2.886762 0.4689355 0.6392957 567.9061 -0.1247245 0.2029578 Welch Two Sample t-test two.sided
2 BS Men participant vs Women participant 0.5317086 3.167738 2.636029 6.5939030 0.0000000 556.5458 0.3733197 0.6900975 Welch Two Sample t-test two.sided
3 BS Men entrepreneur vs Women entrepreneur -0.0130981 2.624402 2.637500 -0.1142094 0.9091458 309.7260 -0.2387580 0.2125619 Welch Two Sample t-test two.sided
3 BS Men participant vs Women participant 0.6240524 2.904982 2.280930 5.6356737 0.0000000 284.4628 0.4060933 0.8420115 Welch Two Sample t-test two.sided
1 HS Men entrepreneur vs Women entrepreneur 0.1064954 2.888219 2.781724 1.0928270 0.2751513 385.9842 -0.0851028 0.2980937 Welch Two Sample t-test two.sided
1 HS Men participant vs Women participant 0.6201277 3.092966 2.472838 6.7316285 0.0000000 364.1709 0.4389708 0.8012846 Welch Two Sample t-test two.sided
2 HS Men entrepreneur vs Women entrepreneur 0.0410624 2.569930 2.528868 0.4206748 0.6741514 569.0000 -0.1506592 0.2327841 Welch Two Sample t-test two.sided
2 HS Men participant vs Women participant 0.6371947 2.861794 2.224599 6.7742614 0.0000000 563.5546 0.4524415 0.8219479 Welch Two Sample t-test two.sided
3 HS Men entrepreneur vs Women entrepreneur 0.0625897 2.479067 2.416477 0.4741748 0.6357092 309.8832 -0.1971343 0.3223137 Welch Two Sample t-test two.sided
3 HS Men participant vs Women participant 0.5096408 2.674371 2.164730 3.8997625 0.0001202 284.3998 0.2524081 0.7668736 Welch Two Sample t-test two.sided
1 Invest Men entrepreneur vs Women entrepreneur -2072.9177083 36428.400000 38501.317708 -0.8550577 0.3930515 384.6347 -6839.4597900 2693.6243733 Welch Two Sample t-test two.sided
1 Invest Men participant vs Women participant -8655.9134146 33711.550000 42367.463415 -3.6274683 0.0003266 367.9959 -13348.2381987 -3963.5886306 Welch Two Sample t-test two.sided
2 Invest Men entrepreneur vs Women entrepreneur 436.7703180 37402.498233 36965.727915 0.2066800 0.8363348 560.7269 -3714.1188533 4587.6594894 Welch Two Sample t-test two.sided
2 Invest Men participant vs Women participant -6029.8134834 34352.074576 40381.888060 -2.8668131 0.0043028 558.2159 -10161.1944967 -1898.4324701 Welch Two Sample t-test two.sided
3 Invest Men entrepreneur vs Women entrepreneur -5071.4003289 27957.243421 33028.643750 -1.9359110 0.0538026 304.9616 -10226.2688059 83.4681480 Welch Two Sample t-test two.sided
3 Invest Men participant vs Women participant -6797.3377512 27774.316384 34571.654135 -2.5451717 0.0114793 269.9483 -12055.3467311 -1539.3287712 Welch Two Sample t-test two.sided
1 viable Men entrepreneur vs Women entrepreneur -0.0756271 4.630102 4.705729 -0.5444594 0.5864398 385.9875 -0.3487286 0.1974744 Welch Two Sample t-test two.sided
1 viable Men participant vs Women participant -0.3954586 4.488688 4.884146 -2.9155343 0.0037622 378.3387 -0.6621582 -0.1287589 Welch Two Sample t-test two.sided
2 viable Men entrepreneur vs Women entrepreneur 0.1813687 5.163763 4.982394 1.4478839 0.1482003 568.9732 -0.0646689 0.4274063 Welch Two Sample t-test two.sided
2 viable Men participant vs Women participant -0.2442866 4.956081 5.200368 -1.9509215 0.0515594 565.9984 -0.4902313 0.0016582 Welch Two Sample t-test two.sided
3 viable Men entrepreneur vs Women entrepreneur -0.3351974 4.171053 4.506250 -2.1421644 0.0329602 309.5410 -0.6430886 -0.0273062 Welch Two Sample t-test two.sided
3 viable Men participant vs Women participant -0.2909392 4.220339 4.511278 -1.8217760 0.0695471 282.3129 -0.6052948 0.0234164 Welch Two Sample t-test two.sided
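
As an optional aside (not part of the original write-up), here’s a quick way to flag which of these comparisons cross the conventional .05 threshold at a glance; with this many tests, one could also justify a stricter, adjusted threshold (e.g., via p.adjust()):

# Optional: flag which comparisons in `ttest_results` reach p < .05.
ttest_results %>%
  mutate(Significant = P_Value < 0.05) %>%
  select(Study, Variable_Tested, Groups_Compared, Mean_Difference, P_Value, Significant) %>%
  arrange(P_Value) %>%
  kable()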

That’s a lot of results. Let’s break down these findings, starting with benevolent sexism.

# Presenting t-test results specifically for benevolent sexism.
ttest_results %>% filter(Variable_Tested == "BS") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 BS Men entrepreneur vs Women entrepreneur -0.0753034 3.182746 3.258049 -0.9081468 0.3643681 385.4518 -0.2383353 0.0877285 Welch Two Sample t-test two.sided
1 BS Men participant vs Women participant 0.3886386 3.384204 2.995565 4.7410538 0.0000031 351.5642 0.2274194 0.5498579 Welch Two Sample t-test two.sided
2 BS Men entrepreneur vs Women entrepreneur 0.0391166 2.925879 2.886762 0.4689355 0.6392957 567.9061 -0.1247245 0.2029578 Welch Two Sample t-test two.sided
2 BS Men participant vs Women participant 0.5317086 3.167738 2.636029 6.5939030 0.0000000 556.5458 0.3733197 0.6900975 Welch Two Sample t-test two.sided
3 BS Men entrepreneur vs Women entrepreneur -0.0130981 2.624402 2.637500 -0.1142094 0.9091458 309.7260 -0.2387580 0.2125619 Welch Two Sample t-test two.sided
3 BS Men participant vs Women participant 0.6240524 2.904982 2.280930 5.6356737 0.0000000 284.4628 0.4060933 0.8420115 Welch Two Sample t-test two.sided

In our studies, there’s balance in benevolent sexism scores between conditions—thanks to random assignment. However, a noticeable gender gap emerges, with men participants showing higher levels.

Next up, hostile sexism.

# Presenting t-test results specifically for hostile sexism.
ttest_results %>% filter(Variable_Tested == "HS") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 HS Men entrepreneur vs Women entrepreneur 0.1064954 2.888219 2.781724 1.0928270 0.2751513 385.9842 -0.0851028 0.2980937 Welch Two Sample t-test two.sided
1 HS Men participant vs Women participant 0.6201277 3.092966 2.472838 6.7316285 0.0000000 364.1709 0.4389708 0.8012846 Welch Two Sample t-test two.sided
2 HS Men entrepreneur vs Women entrepreneur 0.0410624 2.569930 2.528868 0.4206748 0.6741514 569.0000 -0.1506592 0.2327841 Welch Two Sample t-test two.sided
2 HS Men participant vs Women participant 0.6371947 2.861794 2.224599 6.7742614 0.0000000 563.5546 0.4524415 0.8219479 Welch Two Sample t-test two.sided
3 HS Men entrepreneur vs Women entrepreneur 0.0625897 2.479067 2.416477 0.4741748 0.6357092 309.8832 -0.1971343 0.3223137 Welch Two Sample t-test two.sided
3 HS Men participant vs Women participant 0.5096408 2.674371 2.164730 3.8997625 0.0001202 284.3998 0.2524081 0.7668736 Welch Two Sample t-test two.sided

Hostile sexism scores follow a similar pattern to benevolent sexism: the experimental conditions are balanced, but men participants score higher than women participants, indicating a gender divide.

What about startup viability perceptions?

# Presenting t-test results specifically for viability.
ttest_results %>% filter(Variable_Tested == "viable") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 viable Men entrepreneur vs Women entrepreneur -0.0756271 4.630102 4.705729 -0.5444594 0.5864398 385.9875 -0.3487286 0.1974744 Welch Two Sample t-test two.sided
1 viable Men participant vs Women participant -0.3954586 4.488688 4.884146 -2.9155343 0.0037622 378.3387 -0.6621582 -0.1287589 Welch Two Sample t-test two.sided
2 viable Men entrepreneur vs Women entrepreneur 0.1813687 5.163763 4.982394 1.4478839 0.1482003 568.9732 -0.0646689 0.4274063 Welch Two Sample t-test two.sided
2 viable Men participant vs Women participant -0.2442866 4.956081 5.200368 -1.9509215 0.0515594 565.9984 -0.4902313 0.0016582 Welch Two Sample t-test two.sided
3 viable Men entrepreneur vs Women entrepreneur -0.3351974 4.171053 4.506250 -2.1421644 0.0329602 309.5410 -0.6430886 -0.0273062 Welch Two Sample t-test two.sided
3 viable Men participant vs Women participant -0.2909392 4.220339 4.511278 -1.8217760 0.0695471 282.3129 -0.6052948 0.0234164 Welch Two Sample t-test two.sided

Surprisingly, men- and women-led startups were seen as equally viable in most studies. An interesting deviation in Study 3 paints women-led startups more favorably. This upends common stereotypes, likely because in our experimental scenario the entrepreneurs were portrayed as highly competent and the startup was pre-tested to be seen as a viable idea. And, as we’ll see in the subsequent regression analyses, while on the surface there seems to be no bias, benevolent sexism actually plays a role in creating inequity in startup evaluation.

Lastly, the matter of investment decisions.

ttest_results %>% filter(Variable_Tested == "Invest") %>% kable()
Study Variable_Tested Groups_Compared Mean_Difference Mean_Group1 Mean_Group2 T_Statistic P_Value Degrees_of_Freedom CI_Low CI_High Test_Method Hypothesis_Testing
1 Invest Men entrepreneur vs Women entrepreneur -2072.9177 36428.40 38501.32 -0.8550577 0.3930515 384.6347 -6839.460 2693.62437 Welch Two Sample t-test two.sided
1 Invest Men participant vs Women participant -8655.9134 33711.55 42367.46 -3.6274683 0.0003266 367.9959 -13348.238 -3963.58863 Welch Two Sample t-test two.sided
2 Invest Men entrepreneur vs Women entrepreneur 436.7703 37402.50 36965.73 0.2066800 0.8363348 560.7269 -3714.119 4587.65949 Welch Two Sample t-test two.sided
2 Invest Men participant vs Women participant -6029.8135 34352.07 40381.89 -2.8668131 0.0043028 558.2159 -10161.194 -1898.43247 Welch Two Sample t-test two.sided
3 Invest Men entrepreneur vs Women entrepreneur -5071.4003 27957.24 33028.64 -1.9359110 0.0538026 304.9616 -10226.269 83.46815 Welch Two Sample t-test two.sided
3 Invest Men participant vs Women participant -6797.3378 27774.32 34571.65 -2.5451717 0.0114793 269.9483 -12055.347 -1539.32877 Welch Two Sample t-test two.sided

Financial backing was fairly even across men- and women-led startups in all studies, though men participants were somewhat more conservative in their funding than women participants. This peels back another layer of how participant gender influences startup support.

Summary

In this journey through our dataset, we’ve taken some crucial first steps before diving into the deeper waters of regression analysis. By calculating descriptive statistics, peeking at our sample sizes through colorful bar charts, exploring the shapes of our key variables with histograms, and comparing means across different groups with bar charts and t-tests, we’ve essentially mapped out the terrain of our data landscape. This initial exploration is not merely about crunching numbers—it’s about getting to know the fundamental properties of our data—recognizing its patterns, its quirks, and how it speaks to the larger story we’re aiming to tell. In the next phase, we’ll turn to another crucial step before regression analysis: exploring the relationships between variables through correlation analyses.