knitr::opts_chunk$set(message = FALSE, warning = FALSE)
Descriptive statistics: getting to know our data
In this notebook, we present the analysis behind our research, “Benevolent Sexism and the Gender Gap in Startup Evaluation,” featured in “Entrepreneurship: Theory and Practice” and “The Conversation Canada”. Our study explores how evaluators’ benevolent sexism might influence their assessment of startups, contingent on the founder’s gender.
At the heart of our investigation lies a critical question: Does benevolent sexism skew evaluators’ views on the viability of startups led by men versus women? To answer this, we conducted three experimental studies. Participants were randomly assigned to evaluate startups headed by either a man or a woman, while we separately measured their levels of benevolent and hostile sexism.
Key variables include:
- Entrepreneur gender (Condition), coded 0 for men and 1 for women entrepreneurs.
- Participant gender (sex), indicating the evaluator’s gender, with 0 for men and 1 for women.
- Participant benevolent sexism (BS) and hostile sexism (HS), rated on a scale of 1-6, reflecting participants’ endorsement of different forms of sexism.
- Participant perceptions of startup viability (viable), assessed on a scale of 1-7, reflecting participants’ views on the startup’s potential success.
- Participant funding allocations (Invest), capturing the financial commitment participants are willing to make, ranging from 0 to 100,000.
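The coding scheme above lends itself to a quick sanity check before any analysis. The following is a minimal sketch in base R, assuming a data frame laid out like the one described; the `toy` rows and the `ranges` list are hypothetical and the values are invented for illustration, not taken from the studies.

```r
# Hypothetical rows in the layout described above (values invented for illustration)
toy <- data.frame(
  Condition = c(0, 1, 1),
  sex       = c(1, 0, NA),
  BS        = c(2.5, 3.6, 1.0),
  HS        = c(2.6, 1.3, 6.0),
  viable    = c(5.5, 7.0, 1.0),
  Invest    = c(44085, 0, 100000)
)

# Documented minimum and maximum for each key variable
ranges <- list(
  Condition = c(0, 1), sex = c(0, 1),
  BS = c(1, 6), HS = c(1, 6),
  viable = c(1, 7), Invest = c(0, 100000)
)

# TRUE when every non-missing value of a variable lies inside its documented range
in_range <- sapply(names(ranges), function(v) {
  x <- toy[[v]]
  all(x >= ranges[[v]][1] & x <= ranges[[v]][2], na.rm = TRUE)
})
in_range
```

Any FALSE entry would point to a variable worth inspecting before computing descriptive statistics.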
Navigating the analysis:
We start our analysis with Study 1 and then automate the process across all three studies. Specifically, we calculate descriptive statistics for the key variables for the whole sample and separately for each experimental condition and each participant gender group. We then visualize sample sizes to assess participant distribution, create histograms to understand variable distributions, compare means and standard deviations across groups via bar charts, and test these group differences with t-tests. In doing so, we gain an understanding of the basic structure and distribution of our data, setting the stage for more complex regression analyses later on.
# Load the necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(psych)
library(kableExtra)
library(stringr)
library(broom)
library(patchwork)
library(cowplot)
# Load the data for three studies
study_1 <- readRDS("/Users/mac/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 1/Data/R data/study_1.rds")
study_2 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 2/Data/R data/study_2.rds")
study_3 <- readRDS("~/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/BS in entre/Data/Main studies/Study 3/Data/R data/study_3.rds")
Kicking off with study 1
Welcome to the beginning of our exploration! 🌟 In this segment, we dive into the Study 1 dataset to uncover insights through descriptive statistics of key variables.
Understanding the participants
Let’s first take a glimpse at our data:
# Initial glimpse at the dataset to check the first few entries. This helps in getting a basic understanding of data structure and types of variables collected in the study.
study_1 %>% select(id, Condition, BS, HS, sex, viable, Invest) %>% head()
# A tibble: 6 × 7
id Condition BS HS sex viable Invest
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 116899 1 2.55 2.64 1 5.5 44085
2 116977 1 2.36 2 1 5.5 50000
3 117031 1 2.73 3.18 1 5 60845
4 110215 1 3.64 1.27 1 7 40141
5 112513 1 2 2.73 0 5.5 20000
6 117004 1 3.18 4 0 5 30000
This table shows a snapshot of our dataset. Each row is a unique participant, with columns detailing their characteristics: their ID (id), the experimental scenario they were assigned (Condition), how much they agreed with benevolent sexist (BS) and hostile sexist beliefs (HS), their gender (sex), how viable they thought the startup was (viable), and how much they were willing to support it (Invest).
Now, let’s calculate the mean and standard deviation of the key variables.
# Calculating mean and standard deviation for key variables (BS, HS, viable, Invest) to get an overview of central tendencies and variability. `na.rm = TRUE` ensures missing values are ignored in these calculations.
overall_stats <- study_1 %>%
  summarise(
    across(
      c(BS, HS, viable, Invest),
      list(mean = mean, sd = sd),
      na.rm = TRUE))
Warning: There was 1 warning in `summarise()`.
ℹ In argument: `across(...)`.
Caused by warning:
! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
Supply arguments directly to `.fns` through an anonymous function instead.
# Previously
across(a:b, mean, na.rm = TRUE)
# Now
across(a:b, \(x) mean(x, na.rm = TRUE))
overall_stats %>% kable()
BS_mean | BS_sd | HS_mean | HS_sd | viable_mean | viable_sd | Invest_mean | Invest_sd |
---|---|---|---|---|---|---|---|
3.220009 | 0.8162942 | 2.83552 | 0.9602238 | 4.667526 | 1.366932 | 37456.82 | 23845.36 |
It looks like our participants agree with benevolent sexism more than hostile sexism. No big surprise there—society often dresses up these attitudes as chivalry instead of prejudice. Our participants are also quite optimistic about the startup’s potential, scoring its viability pretty high (4.7 out of 7). However, they’re a tad more cautious when it comes to actually investing, with investment amount averaging at 37K out of 100K. A classic case of “let’s see where this goes”.
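The deprecation warning printed above suggests passing `na.rm = TRUE` through an anonymous function rather than through the `...` of across(). A minimal sketch of that form follows, run on a small hypothetical data frame (`toy`, with invented values) rather than on study_1; it assumes R 4.1 or later for the `\(x)` shorthand.

```r
library(dplyr)

# Hypothetical toy data with one missing value, standing in for study_1
toy <- tibble(BS = c(2, 3, NA, 4), HS = c(1, 2, 3, 4))

# Each summary function receives na.rm = TRUE inside its own anonymous function,
# so nothing is passed through the deprecated `...` argument of across()
toy_stats <- toy %>%
  summarise(
    across(
      c(BS, HS),
      list(mean = \(x) mean(x, na.rm = TRUE),
           sd   = \(x) sd(x, na.rm = TRUE))
    )
  )
toy_stats
```

The resulting columns follow the same `variable_statistic` naming (for example BS_mean and BS_sd) as the table above.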
Let’s visualize the distribution of these variables to better grasp how our participants’ opinions spread out. For this, we’ll use histograms.
# Setting up for visualization
# Define key variables, their bin widths, and assigned colors for differentiation
variables <- c("BS", "HS", "viable", "Invest") # `variables` are the key variables of interest
binwidths <- c(0.5, 0.5, 0.5, 5000) # `binwidths` determine the granularity of the histogram
colors <- c("#cf1578", "#e8d21d", "#039fbe", "#b20238") # `colors` visually distinguish each variable's histogram
x_limits <- list(c(1, 6), c(1, 6), c(1, 7), c(0, 100000)) # Define x-axis limits based on expected data ranges
# Generate histograms using map to iterate over the key variables and their corresponding attributes
plots <- map(seq_along(variables), ~ {
  var_plot <- ggplot(study_1, aes(x = .data[[variables[.x]]])) +
    geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = NA) + # No outline around bins
    ggtitle(paste("Histogram of", variables[.x])) +
    xlab(variables[.x]) +
    ylab("Frequency") +
    theme_minimal() +
    scale_x_continuous(limits = x_limits[[.x]], oob = scales::oob_squish) # Adjust x-axis based on variable
})
# Assembling histograms into a cohesive visual layout for side-by-side comparison.
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]])
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_bin()`).
Benevolent sexism, hostile sexism, and perceived viability scores are roughly bell-shaped, with most responses clustering around the middle of each scale. Investment amounts, though, present what looks like a multimodal distribution with several peaks. This hints at distinct groups of participants based on their willingness to invest.
Zooming in on the experimental conditions
Let’s narrow down our focus and compare startups led by men versus those led by women. How many observations do we have in each condition?
# Counting the number of observations within each experimental condition to ensure sufficient data for each group.
study_1 %>%
  mutate(Condition = case_when(
    Condition == 0 ~ "Men entrepreneur",
    Condition == 1 ~ "Women entrepreneur",
    TRUE ~ as.character(Condition)
  )) %>%
  count(Condition)
# A tibble: 2 × 2
Condition n
<chr> <int>
1 Men entrepreneur 196
2 Women entrepreneur 192
Looks like we have a balanced number of observations for each experimental condition. Nice! This is crucial for subsequent comparative analysis to be meaningful.
Let’s dig deeper and calculate means and standard deviations for each experimental condition.
# Calculate descriptive statistics by the experimental condition to discern potential differences.
# The 'Condition' column in 'study_1' is recoded so that 0 is recoded to 'Men entrepreneur' and 1 to 'Women entrepreneur'.
condition_stats <- study_1 %>%
  mutate(
    Condition = case_when(
      Condition == 0 ~ "Men entrepreneur",
      Condition == 1 ~ "Women entrepreneur",
      TRUE ~ as.character(Condition)
    )
  ) %>%
  # Group the data by the newly updated 'Condition' column.
  # Within each group (each unique condition), calculate the mean and standard deviation for the same set of variables.
  group_by(Condition) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest),
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )
condition_stats %>% kable()
Condition | BS_mean | BS_sd | HS_mean | HS_sd | viable_mean | viable_sd | Invest_mean | Invest_sd |
---|---|---|---|---|---|---|---|---|
Men entrepreneur | 3.182746 | 0.8095630 | 2.888219 | 0.9727516 | 4.630102 | 1.378279 | 36428.40 | 24395.79 |
Women entrepreneur | 3.258049 | 0.8234795 | 2.781724 | 0.9467708 | 4.705729 | 1.357785 | 38501.32 | 23290.16 |
Let’s break down these stats visually. It’s always a bit easier to spot patterns and contrasts with a graph rather than a table:
# Transforming condition stats for visualization.
# We pivot longer to have a single measure and stat type per row, then pivot wider to separate mean and sd for plotting.
condition_stats %>%
pivot_longer(cols = ends_with("_mean") | ends_with("_sd"), # Select columns that end with '_mean' or '_sd'
names_to = "Metric_Type", # New column where original column names (indicating metric and stat type) are stored
values_to = "Value") %>% # New column where values from the selected columns are stored
separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>% # Split 'Metric_Type' into 'Variable' and 'Stat' based on '_'
pivot_wider(names_from = Stat, values_from = Value) %>% # Pivot back to a wider format where 'mean' and 'sd' become separate columns
# Replace abbreviated variable names with full, descriptive names for clarity in visual representation
mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
Variable = str_replace(Variable, "HS", "Hostile Sexism"),
Variable = str_replace(Variable, "viable", "Perceived Viability"),
Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>%
ggplot(aes(x = Condition, y = mean, fill = Condition)) + # Plotting setup: X-axis is Condition, Y-axis is mean, colored by Condition
geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) + # Draw bars for mean values, dodge positions them side by side
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) + # Add error bars for SD
facet_wrap(~Variable, scales = "free_y", ncol = 2) + # Separate plots for each variable, allowing Y-axis to scale independently
labs(x = "", y = "Mean with SD as error bars") + # Labeling axes
theme_minimal() + # Minimal theme for a clean look
scale_fill_manual(values = c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")) + # Custom colors for conditions
theme(legend.position = "none") # Remove legend for a cleaner look
It seems there are no drastic differences in sexist attitudes between the conditions. That’s good: it suggests our random assignment was successful in balancing the groups on sexism. There’s a slight edge in favor of women’s startups when it comes to investment. Let’s conduct t-tests to see if these differences are statistically significant.
# Specifying the variables to undergo statistical testing.
variables_to_test <- c("BS", "HS", "viable", "Invest")
# Setting scientific notation penalty to avoid scientific notation in output
options(scipen = 999)
# Running t-tests for each variable between conditions to check for statistically significant differences.
# The reformulate function dynamically creates the formula needed for the t-test based on the variable name.
map(variables_to_test, ~t.test(reformulate("Condition", response = .), data = study_1))
[[1]]
Welch Two Sample t-test
data: BS by Condition
t = -0.90815, df = 385.45, p-value = 0.3644
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-0.23833528 0.08772845
sample estimates:
mean in group 0 mean in group 1
3.182746 3.258049
[[2]]
Welch Two Sample t-test
data: HS by Condition
t = 1.0928, df = 385.98, p-value = 0.2752
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-0.08510283 0.29809371
sample estimates:
mean in group 0 mean in group 1
2.888219 2.781723
[[3]]
Welch Two Sample t-test
data: viable by Condition
t = -0.54446, df = 385.99, p-value = 0.5864
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-0.3487286 0.1974744
sample estimates:
mean in group 0 mean in group 1
4.630102 4.705729
[[4]]
Welch Two Sample t-test
data: Invest by Condition
t = -0.85506, df = 384.63, p-value = 0.3931
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-6839.460 2693.624
sample estimates:
mean in group 0 mean in group 1
36428.40 38501.32
And… it turns out the differences we spotted don’t pass the statistical significance test. So, in Study 1, we find no evidence that the way our participants view and invest in these startups hinges on whether a man or a woman is at the helm.
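The printed t.test() output above is long, and a one-row-per-variable table can be easier to scan. The following is a minimal sketch using tidy() from the broom package, which is loaded at the top of this notebook. It runs on simulated data because the study files are not bundled here; `toy`, `vars`, and `t_table` are hypothetical names, and the simulated values say nothing about the real results.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(1)
# Hypothetical stand-in for study_1: simulated values, not the real data
toy <- tibble(
  Condition = rep(c(0, 1), each = 50),
  viable    = rnorm(100, mean = 4.7, sd = 1.4),
  Invest    = runif(100, min = 0, max = 100000)
)

vars <- c("viable", "Invest")

# One Welch t-test per variable; tidy() collapses each test to a single row,
# and bind_rows() stacks the rows with the variable name as an identifier
t_table <- vars %>%
  set_names() %>%
  map(~ tidy(t.test(reformulate("Condition", response = .x), data = toy))) %>%
  bind_rows(.id = "variable") %>%
  select(variable, estimate1, estimate2, statistic, parameter, p.value, conf.low, conf.high)

t_table
```

Here estimate1 and estimate2 are the two group means, statistic is the t value, and parameter is the Welch degrees of freedom, mirroring the fields in the printed output.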
Zooming in on participant gender
Do men and women see things differently in our study? Let’s find out:
# Calculate descriptive statistics by participant gender to discern potential differences.
# The 'sex' column in 'study_1' is recoded so that 0 is recoded to 'Men participant' and 1 to 'Women participant'.
participant_gender_stats <- study_1 %>%
  mutate(
    sex = case_when(
      sex == 0 ~ "Men participant",
      sex == 1 ~ "Women participant"
    )) %>%
  filter(!is.na(sex)) %>%
  # Group the data by the newly updated 'sex' column.
  # Within each group (each participant gender), calculate the mean and standard deviation for the same set of variables.
  group_by(sex) %>%
  summarise(
    across(
      c(BS, HS, viable, Invest),
      list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
    )
  )
participant_gender_stats %>% kable()
sex | BS_mean | BS_sd | HS_mean | HS_sd | viable_mean | viable_sd | Invest_mean | Invest_sd |
---|---|---|---|---|---|---|---|---|
Men participant | 3.384204 | 0.7959016 | 3.092966 | 0.9308327 | 4.488688 | 1.453400 | 33711.55 | 24439.65 |
Women participant | 2.995565 | 0.7949420 | 2.472838 | 0.8653258 | 4.884146 | 1.204029 | 42367.46 | 22103.41 |
Let’s graph these mean and sd values like we did before.
participant_gender_stats %>%
pivot_longer(cols = ends_with("_mean") | ends_with("_sd"),
names_to = "Metric_Type",
values_to = "Value") %>%
separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>%
pivot_wider(names_from = Stat, values_from = Value) %>%
mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
Variable = str_replace(Variable, "HS", "Hostile Sexism"),
Variable = str_replace(Variable, "viable", "Perceived Viability"),
Variable = str_replace(Variable, "Invest", "Investment Decisions")) %>%
ggplot(aes(x = sex, y = mean, fill = sex)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), position = position_dodge(width = 0.8), width = 0.25) +
facet_wrap(~Variable, scales = "free_y", ncol = 2) +
labs(x = "", y = "Mean with SD as error bars") +
theme_minimal() +
scale_fill_manual(values = c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")) +
theme(legend.position = "none")
It looks like the women in our study endorse benevolent and hostile sexism less than the men, although the gender difference in benevolent sexism is smaller. Interestingly, they are also more generous with their startup evaluations and investments. Time for one more round of t-tests to see if these observations hold water.
# Automating t-tests to compare variables between participant gender groups.
map(c("BS", "HS", "viable", "Invest"), ~t.test(reformulate("sex", response = .), data = study_1))
[[1]]
Welch Two Sample t-test
data: BS by sex
t = 4.7411, df = 351.56, p-value = 0.000003094
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
0.2274194 0.5498579
sample estimates:
mean in group 0 mean in group 1
3.384204 2.995565
[[2]]
Welch Two Sample t-test
data: HS by sex
t = 6.7316, df = 364.17, p-value = 0.00000000006541
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
0.4389708 0.8012846
sample estimates:
mean in group 0 mean in group 1
3.092966 2.472838
[[3]]
Welch Two Sample t-test
data: viable by sex
t = -2.9155, df = 378.34, p-value = 0.003762
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-0.6621582 -0.1287589
sample estimates:
mean in group 0 mean in group 1
4.488688 4.884146
[[4]]
Welch Two Sample t-test
data: Invest by sex
t = -3.6275, df = 368, p-value = 0.0003266
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-13348.238 -3963.589
sample estimates:
mean in group 0 mean in group 1
33711.55 42367.46
The results show that women do indeed endorse benevolent and hostile sexism less than men. They also gave the startup higher evaluations and more funding.
Scaling up: automating across studies
With Study 1 under our belt, we’re now ready to extend our analysis across multiple studies. In experimental psychological research like ours, it’s common to conduct multiple studies. Each one might adjust certain variables or conditions to ensure that our observations are not just flukes but reflect genuine, robust phenomena.
Now, obviously we could manually repeat the same code for each study. But that would be both time-consuming and prone to human error. Automating is like having a trusted assistant who performs the same tasks for multiple datasets with unwavering accuracy, saving us time to focus on the bigger picture.
Automating descriptive stats calculation
We start by writing a function. A function is like a recipe: it takes various “ingredients” (data) and, through a series of “cooking” steps (processing), delivers a delectable “dish” (outcome). In our case, the perform_descriptive_analysis function will take in the data for each study, calculate descriptive statistics for the whole sample and for separate groups, and serve up a comprehensive summary in a neatly organized dataframe.
# Initial setup for descriptive analysis automation.
perform_descriptive_analysis <- function(data, study_name) {
  # Recoding the variables for clarity
  data <- data %>%
    mutate(
      Condition = factor(case_when(Condition == 0 ~ "Men entrepreneur",
                                   Condition == 1 ~ "Women entrepreneur",
                                   TRUE ~ as.character(Condition)
      )),
      sex = factor(case_when(sex == 0 ~ "Men participant",
                             sex == 1 ~ "Women participant",
                             TRUE ~ "Other participant gender"
      ))
    )
  # Calculate stats for the entire sample to give us a baseline understanding of the dataset.
  overall_stats <- data %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n() # Capturing sample size for each analysis segment.
    ) %>%
    mutate(Condition = "Overall") # Labeling these stats as 'Overall' for easy identification.
  # Calculate statistics for each experimental condition
  condition_stats <- data %>%
    group_by(Condition) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop' # Ensuring the grouped structure is dropped post-summarization for simplicity.
    )
  # Calculate statistics for each participant gender group
  participant_gender_stats <- data %>%
    group_by(sex) %>%
    summarise(
      across(
        c(BS, HS, viable, Invest),
        list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE))
      ),
      n = n(),
      .groups = 'drop'
    ) %>%
    mutate(Condition = as.character(sex)) %>% # Labeling these stats for each participant gender group
    select(-sex) # Removing the now redundant 'sex' column.
  # Compiling all stats into one comprehensive dataframe.
  combined_stats <- bind_rows(overall_stats, condition_stats, participant_gender_stats)
  return(combined_stats) # Delivering the compiled dataframe as the function's output.
}
Next, we can use map_df from the purrr package to apply the function. It’s like having an army of robots at your disposal, each programmed to carry out the recipe on different datasets.
# List of datasets
studies <- list(study_1 = study_1, study_2 = study_2, study_3 = study_3)
# Apply 'perform_descriptive_analysis' to each dataset using 'map_df'
# '.id = "Study_name"' adds a column identifying which study each result came from.
# Because names(studies) is an unnamed character vector, this column holds the position (1, 2, 3) rather than the study name.
descriptive_results <- map_df(names(studies), ~perform_descriptive_analysis(studies[[.x]], .x), .id = "Study_name")
# Presenting the aggregated results.
descriptive_results %>% mutate_if(is.numeric, ~ round(., 2)) %>% kable()
Study_name | BS_mean | BS_sd | HS_mean | HS_sd | viable_mean | viable_sd | Invest_mean | Invest_sd | n | Condition |
---|---|---|---|---|---|---|---|---|---|---|
1 | 3.22 | 0.82 | 2.84 | 0.96 | 4.67 | 1.37 | 37456.82 | 23845.36 | 388 | Overall |
1 | 3.18 | 0.81 | 2.89 | 0.97 | 4.63 | 1.38 | 36428.40 | 24395.79 | 196 | Men entrepreneur |
1 | 3.26 | 0.82 | 2.78 | 0.95 | 4.71 | 1.36 | 38501.32 | 23290.16 | 192 | Women entrepreneur |
1 | 3.38 | 0.80 | 3.09 | 0.93 | 4.49 | 1.45 | 33711.55 | 24439.65 | 221 | Men participant |
1 | 3.39 | 0.77 | 3.70 | 1.69 | 6.00 | 1.00 | 43662.00 | 30663.18 | 3 | Other participant gender |
1 | 3.00 | 0.79 | 2.47 | 0.87 | 4.88 | 1.20 | 42367.46 | 22103.41 | 164 | Women participant |
2 | 2.91 | 1.00 | 2.55 | 1.17 | 5.07 | 1.50 | 37184.11 | 25116.82 | 572 | Overall |
2 | 2.93 | 0.97 | 2.57 | 1.17 | 5.16 | 1.51 | 37402.50 | 24158.75 | 287 | Men entrepreneur |
2 | 2.89 | 1.02 | 2.53 | 1.16 | 4.98 | 1.48 | 36965.73 | 26080.74 | 285 | Women entrepreneur |
2 | 3.17 | 0.94 | 2.86 | 1.13 | 4.96 | 1.55 | 34352.07 | 25256.80 | 297 | Men participant |
2 | 1.55 | 0.36 | 1.18 | 0.00 | 5.17 | 1.44 | 30000.00 | 30000.00 | 3 | Other participant gender |
2 | 2.64 | 0.98 | 2.22 | 1.11 | 5.20 | 1.43 | 40381.89 | 24618.98 | 272 | Women participant |
3 | 2.63 | 1.01 | 2.45 | 1.17 | 4.34 | 1.39 | 30557.96 | 23184.86 | 312 | Overall |
3 | 2.62 | 1.00 | 2.48 | 1.12 | 4.17 | 1.37 | 27957.24 | 23981.45 | 152 | Men entrepreneur |
3 | 2.64 | 1.02 | 2.42 | 1.21 | 4.51 | 1.39 | 33028.64 | 22195.22 | 160 | Women entrepreneur |
3 | 2.90 | 0.97 | 2.67 | 1.14 | 4.22 | 1.38 | 27774.32 | 22064.52 | 177 | Men participant |
3 | 1.68 | 0.06 | 1.09 | 0.13 | 4.00 | 0.00 | 10000.00 | 14142.14 | 2 | Other participant gender |
3 | 2.28 | 0.96 | 2.16 | 1.14 | 4.51 | 1.40 | 34571.65 | 24141.38 | 133 | Women participant |
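The Study_name column in the table above holds 1, 2, and 3 because the iteration ran over an unnamed vector of names. If study labels are preferred, one option is to iterate over the named list itself and let bind_rows() carry the names into the identifier column. The following is a minimal sketch of that idea; `toy_studies` and `summarise_study` are hypothetical stand-ins with invented values, not the notebook's data or function.

```r
library(dplyr)
library(purrr)

# Hypothetical miniature datasets standing in for the three studies
toy_studies <- list(
  study_1 = tibble(viable = c(4, 5, 6)),
  study_2 = tibble(viable = c(3, 5)),
  study_3 = tibble(viable = c(2, 4, 6, 8))
)

# Hypothetical summary function playing the role of the per-study analysis
summarise_study <- function(data) {
  data %>% summarise(viable_mean = mean(viable, na.rm = TRUE), n = n())
}

# map() keeps the list names; bind_rows(.id = ...) turns them into a column
named_results <- toy_studies %>%
  map(summarise_study) %>%
  bind_rows(.id = "Study_name")

named_results
```

With this variant the identifier column contains study_1, study_2, and study_3, so any later relabelling would start from names rather than positions.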
Visualizing observations across studies
With our statistics in hand, we’re ready to dive into some visualizations to better grasp our data! We’ll start by looking at participant numbers for each study.
descriptive_results %>%
# filtering out entries tagged as 'Other participant gender' since there are too few participants in this group
filter(Condition != "Other participant gender") %>%
# adjust the 'Condition' and 'Study_name' columns for clearer categorization and labeling in our visualizations.
mutate(
# Convert 'Condition' into a factor with specific levels for clear grouping in the plot.
# This helps in differentiating between the experimental conditions and participant gender groups.
Condition = factor(Condition, levels = c("Overall", "Men entrepreneur", "Women entrepreneur", "Men participant", "Women participant")),
# Similarly, convert 'Study_name' into a factor and assign more descriptive labels ('Study 1', 'Study 2', 'Study 3').
# This ensures that the plots clearly indicate which study the data is drawn from.
Study_name = factor(Study_name, levels = unique(Study_name), labels = c("Study 1", "Study 2", "Study 3"))
) %>%
# Create a bar plot with 'Condition' on the x-axis, the number of participants ('n') on the y-axis, and color-coded by 'Condition'.
ggplot(aes(x = Condition, y = n, fill = Condition)) +
geom_bar(stat = "identity", position = position_dodge()) + # 'stat="identity"' indicates that the heights of the bars represent data values.
# Add labels on top of each bar to display the exact number of participants. The 'position_dodge()' ensures the labels align with the bars.
geom_text(aes(label = n), position = position_dodge(width = 0.75), vjust = -0.25, size = 3, color = "gray50") +
# Use 'facet_wrap' to create separate plots for each study, enabling comparisons across studies.
facet_wrap(~ Study_name, scales = "free_x", nrow = 1) +
# Customize plot labels and theme for readability and aesthetics. Remove x-axis label for cleanliness.
labs(title = "Sample Size Across Studies", x = "", y = "Sample Size") +
theme_minimal() + # Apply a minimal theme for a clean look.
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none") + # Adjust text angle for better legibility.
scale_fill_brewer(palette = "Set1") # Apply a color palette for visual distinction of conditions.
Study 2 has the highest number of participants (n = 572) while Study 3 has the lowest (n = 312). The difference makes sense; Study 2 was open to all US full-time employees, a much larger pool than Study 3’s niche of people with previous experience in startup evaluation.
The sample sizes across our experimental conditions are very balanced. And while there were more men than women in our participant pool across three studies, the numbers are close enough that we’re all set for fair comparisons.
Exploring variable distributions across studies
Now, let’s take our analysis up a notch by diving into the distributions of our main variables across all three studies. By plotting histograms, we can visually grasp how participant responses vary for each variable—letting us spot trends, outliers, and overall patterns at a glance.
# Define a function to create histograms for given variables across a single study.
# This function takes the dataset, the name of the study, a list of variables to plot,
# the bin widths for each histogram, the colors for the histograms, and x-axis limits.
create_histograms <- function(data, study_name, variables, binwidths, colors, x_limits) {
  # Loop through each variable to generate its histogram.
  plots <- map(seq_along(variables), ~ {
    # Create the histogram with specified aesthetics.
    ggplot(data, aes(x = .data[[variables[.x]]])) +
      geom_histogram(binwidth = binwidths[.x], fill = colors[.x], color = "white") + # White bin outlines to separate adjacent bars.
      ggtitle(paste(study_name, "-", variables[.x])) + # Title includes study name and variable.
      theme_minimal() + # Minimalist theme for focus on the data.
      xlim(x_limits[[.x]]) # Set x-axis limits based on predefined limits.
  })
  # Arrange the generated plots in a grid layout for easier comparison.
  plot_grid(plotlist = plots, ncol = 2)
}
# List of study names extracted from the studies list for iteration.
study_names <- names(studies)
# Generating histograms for each study by passing them through our custom function.
map(study_names, ~create_histograms(studies[[.x]], .x, variables, binwidths, colors, x_limits))
[[1]]
[[2]]
[[3]]
For Studies 1 and 2, it’s like most participants are on the same page in terms of their benevolent sexism scores, with scores clustering in a bell curve. But in Study 3, it’s a different story: the curve flattens out before dipping, suggesting that while a range of moderately benevolent sexist attitudes is somewhat evenly spread among participants, extremely high benevolent sexist attitudes are rare. This is also the case for hostile sexism scores in Study 1. Yet, in Studies 2 and 3, the distribution of hostile sexism scores resembles a downward line, suggesting a general trend among the participants towards lower levels of hostile sexism, with high levels being progressively less common.
Across the board, we’re seeing bell curves when it comes to the distribution of perceived viability scores. This tells us that most participants gravitate towards a common middle ground when it comes to how viable they think the startups are. In contrast, with its peaks and valleys, the multimodal distribution for investment decisions points to distinct participant groups based on how much they’re willing to invest.
Comparing experimental conditions and participant genders
Up next, we transition from broad statistics to focused comparisons. Specifically, we’re comparing the experimental conditions (men-led vs. women-led startups) and the participant genders. We’ll look at the mean responses and the variability within these groups through bar charts. This visual approach gives us a straightforward way to see if there are any notable differences or if the groups are more alike than not.
# Reshaping the results for easier visualization.
descriptive_results %>%
# Transform our results to a long format
# Each variable (e.g., benevolent sexism, hostile sexism) gets expanded into two rows—one for mean and one for SD.
pivot_longer(cols = ends_with("_mean") | ends_with("_sd"),
names_to = "Metric_Type",
values_to = "Value") %>%
separate(Metric_Type, into = c("Variable", "Stat"), sep = "_") %>%
pivot_wider(names_from = Stat, values_from = Value) %>%
# rename variables for a clearer understanding in the graphs.
mutate(Variable = str_replace(Variable, "BS", "Benevolent Sexism"),
Variable = str_replace(Variable, "HS", "Hostile Sexism"),
Variable = str_replace(Variable, "viable", "Perceived Viability"),
Variable = str_replace(Variable, "Invest", "Investment Decisions")) -> descriptive_results_long
# Splitting the transformed data by study and condition/gender for targeted analysis.
# This allows us to separately analyze and visualize the data for experimental conditions and participant genders across each study.
condition_stats <- descriptive_results_long %>% filter(Condition %in% c("Men entrepreneur", "Women entrepreneur")) %>% split(.$Study_name)
participant_gender_stats <- descriptive_results_long %>% filter(Condition %in% c("Men participant", "Women participant")) %>% rename(sex = Condition) %>% split(.$Study_name)
# Define a function to craft bar charts that showcase mean values and include error bars for standard deviation.
# This function is versatile, adapting to either compare experimental conditions or participant genders based on input.
generate_plot_for_study <- function(data, group_var) {
  # Determine whether we're plotting Condition or sex based on group_var parameter
  fill_var <- if (group_var == "Condition") {
    "Condition"
  } else {
    "sex"
  }
  # Set the title dynamically based on the group_var
  title_text <- if (group_var == "Condition") {
    "Experimental Conditions"
  } else {
    "Participant Gender Groups"
  }
  # Adjust the fill colors based on the group_var
  fill_values <- if (group_var == "Condition") {
    c("Men entrepreneur" = "#e8d21d", "Women entrepreneur" = "#039fbe")
  } else {
    c("Men participant" = "#ecc19c", "Women participant" = "#1e847f")
  }
  # The plotting command constructs the bar chart, using aesthetic mappings specific to the comparison type ('Condition' or 'sex').
  # 'geom_bar' creates the bars, 'geom_errorbar' adds the error bars, and 'facet_wrap' organizes variables into subplots for a comprehensive view.
  ggplot(data, aes_string(x = fill_var, y = "mean", fill = fill_var)) +
    geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7) +
    geom_errorbar(aes_string(ymin = "mean - sd", ymax = "mean + sd"),
                  position = position_dodge(width = 0.8), width = 0.25) +
    facet_wrap(~Variable, scales = "free_y", ncol = 2) +
    labs(title = title_text, x = "", y = "Mean with SD as error bars") +
    theme_minimal() +
    scale_fill_manual(values = fill_values) +
    theme(legend.position = "none")
}
# Generating and displaying the bar charts for experimental conditions.
map(condition_stats, generate_plot_for_study, group_var = "Condition")
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
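[Figures: one bar chart per study (Studies 1, 2, and 3), titled "Experimental Conditions", showing means with SD error bars for each variable by experimental condition]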
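The deprecation warning above comes from `aes_string()`. A minimal sketch of the same mapping written with the `.data` pronoun inside `aes()` is shown here, assuming ggplot2 3.0.0 or later. The data frame `toy_stats` and its values are hypothetical stand-ins for one element of `condition_stats`, not results from the studies.

```r
library(ggplot2)

# Hypothetical summary data in the same shape as one element of condition_stats
toy_stats <- data.frame(
  Condition = c("Men entrepreneur", "Women entrepreneur"),
  Variable  = "Example variable",
  mean      = c(4.6, 4.7),
  sd        = c(1.4, 1.3)
)

# The grouping column is still chosen by a string, as in generate_plot_for_study()
group_var <- "Condition"

p <- ggplot(toy_stats,
            aes(x = .data[[group_var]], y = mean, fill = .data[[group_var]])) +
  # geom_col() is equivalent to geom_bar(stat = "identity")
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  # Column names can be used directly, so no string expressions are needed
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = position_dodge(width = 0.8), width = 0.25) +
  facet_wrap(~Variable, scales = "free_y", ncol = 2) +
  labs(x = "", y = "Mean with SD as error bars") +
  theme_minimal() +
  theme(legend.position = "none")
```

Printing `p` draws the chart. Because the column is looked up with `.data[[group_var]]`, the same code would work for `"sex"` without any other changes.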
This visual dive shows us that, for the most part, the men-led and women-led startup conditions have remarkably similar means across the key variables. Study 3, however, suggests a slight edge for women-led startups in perceived viability and funding.
What about our men and women participants? Do they differ in these key variables?
# Generating and displaying the bar charts for participant gender group
map(participant_gender_stats, generate_plot_for_study, group_var = "sex")
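[Figures: one bar chart per study (Studies 1, 2, and 3), titled "Participant Gender Groups", showing means with SD error bars for each variable by participant gender]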
Here, the narrative remains steady. Women participants consistently show lower endorsement of sexist attitudes and are more generous in their evaluations and funding.
Performing t-tests across studies
Now, we turn to t-tests to check whether what we saw in our charts holds up under statistical testing.
# Function to perform t-tests for specified variables across conditions and gender, ensuring comparability.
perform_combined_t_tests <- function(data, study_name, variables) {
  # Standardizing 'Condition' and 'sex' as factors to maintain clear and consistent group distinctions.
  data <- data %>%
    mutate(
      Condition = factor(Condition,
                         levels = c("0", "1"),
                         labels = c("Men entrepreneur", "Women entrepreneur")),
      sex = factor(sex,
                   levels = c("0", "1"),
                   labels = c("Men participant", "Women participant"))
    )
  # Preparing to capture t-test results across all variables.
  all_t_test_results <- list()

  # t-tests for comparing experimental conditions, encapsulating each result within a structured list.
  condition_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("Condition", response = .x), data = data)
    list(variable = .x,
         comparison_type = "Condition",
         groups_compared = "Men entrepreneur vs Women entrepreneur",
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, condition_t_test_results)

  # Similar t-tests for participant gender, again storing results in a structured format for easy interpretation.
  gender_t_test_results <- map(variables, ~ {
    t_test <- t.test(reformulate("sex", response = .x), data = data)
    list(variable = .x,
         comparison_type = "Gender",
         groups_compared = "Men participant vs Women participant",
         t_test_summary = broom::tidy(t_test))
  })
  all_t_test_results <- append(all_t_test_results, gender_t_test_results)

  # Assembling t-test summaries into a cohesive dataframe, adding context about the variable and comparison type.
  t_test_df <- map_df(all_t_test_results, ~ .x$t_test_summary) %>%
    mutate(
variable = map_chr(all_t_test_results, ~ .x$variable),
Comparison = map_chr(all_t_test_results, ~ .x$groups_compared),
Study = study_name
)
return(t_test_df)
}
# Executing t-tests across all studies and variables, reformatting for readability and context.
map_df(study_names, ~perform_combined_t_tests(studies[[.x]], .x, variables), .id = "Study") %>%
rename(
Mean_Difference = estimate,
Mean_Group1 = estimate1,
Mean_Group2 = estimate2,
T_Statistic = statistic,
P_Value = p.value,
Degrees_of_Freedom = parameter,
CI_Low = conf.low,
CI_High = conf.high,
Test_Method = method,
Hypothesis_Testing = alternative,
Variable_Tested = variable,
Groups_Compared = Comparison
) %>%
relocate(Study, Variable_Tested, Groups_Compared) %>%
arrange(Variable_Tested) -> ttest_results
ttest_results %>% kable()
Study | Variable_Tested | Groups_Compared | Mean_Difference | Mean_Group1 | Mean_Group2 | T_Statistic | P_Value | Degrees_of_Freedom | CI_Low | CI_High | Test_Method | Hypothesis_Testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | BS | Men entrepreneur vs Women entrepreneur | -0.0753034 | 3.182746 | 3.258049 | -0.9081468 | 0.3643681 | 385.4518 | -0.2383353 | 0.0877285 | Welch Two Sample t-test | two.sided |
1 | BS | Men participant vs Women participant | 0.3886386 | 3.384204 | 2.995565 | 4.7410538 | 0.0000031 | 351.5642 | 0.2274194 | 0.5498579 | Welch Two Sample t-test | two.sided |
2 | BS | Men entrepreneur vs Women entrepreneur | 0.0391166 | 2.925879 | 2.886762 | 0.4689355 | 0.6392957 | 567.9061 | -0.1247245 | 0.2029578 | Welch Two Sample t-test | two.sided |
2 | BS | Men participant vs Women participant | 0.5317086 | 3.167738 | 2.636029 | 6.5939030 | 0.0000000 | 556.5458 | 0.3733197 | 0.6900975 | Welch Two Sample t-test | two.sided |
3 | BS | Men entrepreneur vs Women entrepreneur | -0.0130981 | 2.624402 | 2.637500 | -0.1142094 | 0.9091458 | 309.7260 | -0.2387580 | 0.2125619 | Welch Two Sample t-test | two.sided |
3 | BS | Men participant vs Women participant | 0.6240524 | 2.904982 | 2.280930 | 5.6356737 | 0.0000000 | 284.4628 | 0.4060933 | 0.8420115 | Welch Two Sample t-test | two.sided |
1 | HS | Men entrepreneur vs Women entrepreneur | 0.1064954 | 2.888219 | 2.781724 | 1.0928270 | 0.2751513 | 385.9842 | -0.0851028 | 0.2980937 | Welch Two Sample t-test | two.sided |
1 | HS | Men participant vs Women participant | 0.6201277 | 3.092966 | 2.472838 | 6.7316285 | 0.0000000 | 364.1709 | 0.4389708 | 0.8012846 | Welch Two Sample t-test | two.sided |
2 | HS | Men entrepreneur vs Women entrepreneur | 0.0410624 | 2.569930 | 2.528868 | 0.4206748 | 0.6741514 | 569.0000 | -0.1506592 | 0.2327841 | Welch Two Sample t-test | two.sided |
2 | HS | Men participant vs Women participant | 0.6371947 | 2.861794 | 2.224599 | 6.7742614 | 0.0000000 | 563.5546 | 0.4524415 | 0.8219479 | Welch Two Sample t-test | two.sided |
3 | HS | Men entrepreneur vs Women entrepreneur | 0.0625897 | 2.479067 | 2.416477 | 0.4741748 | 0.6357092 | 309.8832 | -0.1971343 | 0.3223137 | Welch Two Sample t-test | two.sided |
3 | HS | Men participant vs Women participant | 0.5096408 | 2.674371 | 2.164730 | 3.8997625 | 0.0001202 | 284.3998 | 0.2524081 | 0.7668736 | Welch Two Sample t-test | two.sided |
1 | Invest | Men entrepreneur vs Women entrepreneur | -2072.9177083 | 36428.400000 | 38501.317708 | -0.8550577 | 0.3930515 | 384.6347 | -6839.4597900 | 2693.6243733 | Welch Two Sample t-test | two.sided |
1 | Invest | Men participant vs Women participant | -8655.9134146 | 33711.550000 | 42367.463415 | -3.6274683 | 0.0003266 | 367.9959 | -13348.2381987 | -3963.5886306 | Welch Two Sample t-test | two.sided |
2 | Invest | Men entrepreneur vs Women entrepreneur | 436.7703180 | 37402.498233 | 36965.727915 | 0.2066800 | 0.8363348 | 560.7269 | -3714.1188533 | 4587.6594894 | Welch Two Sample t-test | two.sided |
2 | Invest | Men participant vs Women participant | -6029.8134834 | 34352.074576 | 40381.888060 | -2.8668131 | 0.0043028 | 558.2159 | -10161.1944967 | -1898.4324701 | Welch Two Sample t-test | two.sided |
3 | Invest | Men entrepreneur vs Women entrepreneur | -5071.4003289 | 27957.243421 | 33028.643750 | -1.9359110 | 0.0538026 | 304.9616 | -10226.2688059 | 83.4681480 | Welch Two Sample t-test | two.sided |
3 | Invest | Men participant vs Women participant | -6797.3377512 | 27774.316384 | 34571.654135 | -2.5451717 | 0.0114793 | 269.9483 | -12055.3467311 | -1539.3287712 | Welch Two Sample t-test | two.sided |
1 | viable | Men entrepreneur vs Women entrepreneur | -0.0756271 | 4.630102 | 4.705729 | -0.5444594 | 0.5864398 | 385.9875 | -0.3487286 | 0.1974744 | Welch Two Sample t-test | two.sided |
1 | viable | Men participant vs Women participant | -0.3954586 | 4.488688 | 4.884146 | -2.9155343 | 0.0037622 | 378.3387 | -0.6621582 | -0.1287589 | Welch Two Sample t-test | two.sided |
2 | viable | Men entrepreneur vs Women entrepreneur | 0.1813687 | 5.163763 | 4.982394 | 1.4478839 | 0.1482003 | 568.9732 | -0.0646689 | 0.4274063 | Welch Two Sample t-test | two.sided |
2 | viable | Men participant vs Women participant | -0.2442866 | 4.956081 | 5.200368 | -1.9509215 | 0.0515594 | 565.9984 | -0.4902313 | 0.0016582 | Welch Two Sample t-test | two.sided |
3 | viable | Men entrepreneur vs Women entrepreneur | -0.3351974 | 4.171053 | 4.506250 | -2.1421644 | 0.0329602 | 309.5410 | -0.6430886 | -0.0273062 | Welch Two Sample t-test | two.sided |
3 | viable | Men participant vs Women participant | -0.2909392 | 4.220339 | 4.511278 | -1.8217760 | 0.0695471 | 282.3129 | -0.6052948 | 0.0234164 | Welch Two Sample t-test | two.sided |
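The fractional `Degrees_of_Freedom` values in the table arise because `t.test()` defaults to Welch's test, which does not assume equal variances in the two groups. The sketch below reproduces the t statistic and the Welch degrees of freedom by hand on simulated data. The data frame `toy` and its columns `score` and `group` are hypothetical and unrelated to the study data.

```r
set.seed(42)

# Hypothetical two-group data with unequal group sizes and spreads
toy <- data.frame(
  score = c(rnorm(40, mean = 3.4, sd = 0.8), rnorm(55, mean = 3.0, sd = 0.9)),
  group = factor(rep(c("A", "B"), times = c(40, 55)))
)

# Same calling pattern as in perform_combined_t_tests():
# reformulate() builds the formula score ~ group from strings
res <- t.test(reformulate("group", response = "score"), data = toy)

# Manual Welch computation
x <- toy$score[toy$group == "A"]
y <- toy$score[toy$group == "B"]
se2_x <- var(x) / length(x)   # squared standard error of the mean, group A
se2_y <- var(y) / length(y)   # squared standard error of the mean, group B

t_manual  <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Both comparisons should return TRUE
all.equal(unname(res$statistic), t_manual)
all.equal(unname(res$parameter), df_manual)
```

The difference is taken as the first factor level minus the second, which is why a negative `Mean_Difference` in the table means the second group (for example, women participants) has the higher mean.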
That’s a lot of results. Let’s break down these findings, starting with benevolent sexism.
# Presenting t-test results specifically for benevolent sexism.
ttest_results %>% filter(Variable_Tested == "BS") %>% kable()
Study | Variable_Tested | Groups_Compared | Mean_Difference | Mean_Group1 | Mean_Group2 | T_Statistic | P_Value | Degrees_of_Freedom | CI_Low | CI_High | Test_Method | Hypothesis_Testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | BS | Men entrepreneur vs Women entrepreneur | -0.0753034 | 3.182746 | 3.258049 | -0.9081468 | 0.3643681 | 385.4518 | -0.2383353 | 0.0877285 | Welch Two Sample t-test | two.sided |
1 | BS | Men participant vs Women participant | 0.3886386 | 3.384204 | 2.995565 | 4.7410538 | 0.0000031 | 351.5642 | 0.2274194 | 0.5498579 | Welch Two Sample t-test | two.sided |
2 | BS | Men entrepreneur vs Women entrepreneur | 0.0391166 | 2.925879 | 2.886762 | 0.4689355 | 0.6392957 | 567.9061 | -0.1247245 | 0.2029578 | Welch Two Sample t-test | two.sided |
2 | BS | Men participant vs Women participant | 0.5317086 | 3.167738 | 2.636029 | 6.5939030 | 0.0000000 | 556.5458 | 0.3733197 | 0.6900975 | Welch Two Sample t-test | two.sided |
3 | BS | Men entrepreneur vs Women entrepreneur | -0.0130981 | 2.624402 | 2.637500 | -0.1142094 | 0.9091458 | 309.7260 | -0.2387580 | 0.2125619 | Welch Two Sample t-test | two.sided |
3 | BS | Men participant vs Women participant | 0.6240524 | 2.904982 | 2.280930 | 5.6356737 | 0.0000000 | 284.4628 | 0.4060933 | 0.8420115 | Welch Two Sample t-test | two.sided |
Across our studies, benevolent sexism scores are balanced between conditions, as expected under random assignment. However, a clear gender gap emerges: men participants score significantly higher than women participants in all three studies.
Next up, hostile sexism.
# Presenting t-test results specifically for hostile sexism.
ttest_results %>% filter(Variable_Tested == "HS") %>% kable()
Study | Variable_Tested | Groups_Compared | Mean_Difference | Mean_Group1 | Mean_Group2 | T_Statistic | P_Value | Degrees_of_Freedom | CI_Low | CI_High | Test_Method | Hypothesis_Testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | HS | Men entrepreneur vs Women entrepreneur | 0.1064954 | 2.888219 | 2.781724 | 1.0928270 | 0.2751513 | 385.9842 | -0.0851028 | 0.2980937 | Welch Two Sample t-test | two.sided |
1 | HS | Men participant vs Women participant | 0.6201277 | 3.092966 | 2.472838 | 6.7316285 | 0.0000000 | 364.1709 | 0.4389708 | 0.8012846 | Welch Two Sample t-test | two.sided |
2 | HS | Men entrepreneur vs Women entrepreneur | 0.0410624 | 2.569930 | 2.528868 | 0.4206748 | 0.6741514 | 569.0000 | -0.1506592 | 0.2327841 | Welch Two Sample t-test | two.sided |
2 | HS | Men participant vs Women participant | 0.6371947 | 2.861794 | 2.224599 | 6.7742614 | 0.0000000 | 563.5546 | 0.4524415 | 0.8219479 | Welch Two Sample t-test | two.sided |
3 | HS | Men entrepreneur vs Women entrepreneur | 0.0625897 | 2.479067 | 2.416477 | 0.4741748 | 0.6357092 | 309.8832 | -0.1971343 | 0.3223137 | Welch Two Sample t-test | two.sided |
3 | HS | Men participant vs Women participant | 0.5096408 | 2.674371 | 2.164730 | 3.8997625 | 0.0001202 | 284.3998 | 0.2524081 | 0.7668736 | Welch Two Sample t-test | two.sided |
Hostile sexism scores follow a similar pattern to benevolent sexism scores: the experimental conditions are balanced, while men participants score higher than women participants, indicating a gender divide.
What about startup viability perceptions?
# Presenting t-test results specifically for viability.
ttest_results %>% filter(Variable_Tested == "viable") %>% kable()
Study | Variable_Tested | Groups_Compared | Mean_Difference | Mean_Group1 | Mean_Group2 | T_Statistic | P_Value | Degrees_of_Freedom | CI_Low | CI_High | Test_Method | Hypothesis_Testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | viable | Men entrepreneur vs Women entrepreneur | -0.0756271 | 4.630102 | 4.705729 | -0.5444594 | 0.5864398 | 385.9875 | -0.3487286 | 0.1974744 | Welch Two Sample t-test | two.sided |
1 | viable | Men participant vs Women participant | -0.3954586 | 4.488688 | 4.884146 | -2.9155343 | 0.0037622 | 378.3387 | -0.6621582 | -0.1287589 | Welch Two Sample t-test | two.sided |
2 | viable | Men entrepreneur vs Women entrepreneur | 0.1813687 | 5.163763 | 4.982394 | 1.4478839 | 0.1482003 | 568.9732 | -0.0646689 | 0.4274063 | Welch Two Sample t-test | two.sided |
2 | viable | Men participant vs Women participant | -0.2442866 | 4.956081 | 5.200368 | -1.9509215 | 0.0515594 | 565.9984 | -0.4902313 | 0.0016582 | Welch Two Sample t-test | two.sided |
3 | viable | Men entrepreneur vs Women entrepreneur | -0.3351974 | 4.171053 | 4.506250 | -2.1421644 | 0.0329602 | 309.5410 | -0.6430886 | -0.0273062 | Welch Two Sample t-test | two.sided |
3 | viable | Men participant vs Women participant | -0.2909392 | 4.220339 | 4.511278 | -1.8217760 | 0.0695471 | 282.3129 | -0.6052948 | 0.0234164 | Welch Two Sample t-test | two.sided |
Surprisingly, men- and women-led startups were seen as equally viable in Studies 1 and 2. Study 3 deviates: there, women-led startups were rated as more viable (4.51 vs. 4.17, p = .033). This runs counter to common stereotypes, likely because in our experimental scenario the entrepreneurs are portrayed as highly competent and the startup was pre-tested to be seen as a viable idea. And, as we’ll see in subsequent regression analyses, while on the surface there seems to be no bias, benevolent sexism actually plays a role in creating inequity in startup evaluation.
Lastly, the matter of investment decisions.
ttest_results %>% filter(Variable_Tested == "Invest") %>% kable()
Study | Variable_Tested | Groups_Compared | Mean_Difference | Mean_Group1 | Mean_Group2 | T_Statistic | P_Value | Degrees_of_Freedom | CI_Low | CI_High | Test_Method | Hypothesis_Testing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Invest | Men entrepreneur vs Women entrepreneur | -2072.9177 | 36428.40 | 38501.32 | -0.8550577 | 0.3930515 | 384.6347 | -6839.460 | 2693.62437 | Welch Two Sample t-test | two.sided |
1 | Invest | Men participant vs Women participant | -8655.9134 | 33711.55 | 42367.46 | -3.6274683 | 0.0003266 | 367.9959 | -13348.238 | -3963.58863 | Welch Two Sample t-test | two.sided |
2 | Invest | Men entrepreneur vs Women entrepreneur | 436.7703 | 37402.50 | 36965.73 | 0.2066800 | 0.8363348 | 560.7269 | -3714.119 | 4587.65949 | Welch Two Sample t-test | two.sided |
2 | Invest | Men participant vs Women participant | -6029.8135 | 34352.07 | 40381.89 | -2.8668131 | 0.0043028 | 558.2159 | -10161.194 | -1898.43247 | Welch Two Sample t-test | two.sided |
3 | Invest | Men entrepreneur vs Women entrepreneur | -5071.4003 | 27957.24 | 33028.64 | -1.9359110 | 0.0538026 | 304.9616 | -10226.269 | 83.46815 | Welch Two Sample t-test | two.sided |
3 | Invest | Men participant vs Women participant | -6797.3378 | 27774.32 | 34571.65 | -2.5451717 | 0.0114793 | 269.9483 | -12055.347 | -1539.32877 | Welch Two Sample t-test | two.sided |
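Funding allocations were fairly even between men- and women-led startups in all studies (the Study 3 gap favoring women-led startups falls just short of significance, p = .054). Men participants, by contrast, allocated significantly less funding than women participants in every study, which points to participant gender as a factor in startup support.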
Summary
In this journey through our dataset, we’ve taken some crucial first steps before diving into the deeper waters of regression analysis. By calculating descriptive statistics, checking our sample sizes with bar charts, exploring the shapes of our key variables with histograms, and comparing means across groups with bar charts and t-tests, we’ve mapped out the terrain of our data. This initial exploration is not merely about crunching numbers. It’s about getting to know the fundamental properties of our data: its patterns, its quirks, and how it speaks to the larger story we’re aiming to tell. In the next phase, we’ll take another crucial step before regression analysis: exploring the relationships between variables through correlation analyses.