Now that we have our tidied dataset, we can conduct some exploratory analyses to answer the following three questions:
First, we want to examine the distribution of movies according to each Bechdel Test dimension. The code chunks below summarize the proportion of movies in movies_df that pass or fail the Bechdel Test, as well as the proportion of movies at each Bechdel Test score.
movies_df %>%
  group_by(pass_bechdel) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N)) %>%
  rename("Passes Bechdel Test" = pass_bechdel) %>%
  knitr::kable(digits = 3) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Passes Bechdel Test | N | Proportion |
---|---|---|
FALSE | 849 | 0.473 |
TRUE | 945 | 0.527 |
movies_df %>%
  group_by(bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N)) %>%
  rename("Bechdel Test Criterion" = bechdel_score) %>%
  knitr::kable(digits = 3) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
Bechdel Test Criterion | N | Proportion |
---|---|---|
Don’t talk to each other | 514 | 0.287 |
Dubious | 142 | 0.079 |
Less than 2 women | 141 | 0.079 |
Only talk about men | 194 | 0.108 |
Passes Bechdel | 803 | 0.448 |
The tables indicate that slightly over half of the movies in the data pass the Bechdel Test: 52.7% pass and 47.3% fail. Examining these categories in more detail, we can see that among movies that fail the test, most (514 of 849, about 61%) have women who don’t talk to each other. Among movies that pass, only a small proportion (142 of 945, about 15%) have a “dubious” or debatable pass.
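As a quick consistency check, the counts in the two tables can be reconciled by hand. The sketch below simply re-enters the numbers printed above (it does not touch movies_df) and assumes, as the totals suggest, that "Dubious" and "Passes Bechdel" together make up the passing group.

```r
# Counts copied from the two tables above
pass_counts <- c(Fail = 849, Pass = 945)
score_counts <- c(
  "Don't talk to each other" = 514,
  "Dubious"                  = 142,
  "Less than 2 women"        = 141,
  "Only talk about men"      = 194,
  "Passes Bechdel"           = 803
)

# The three failing categories should add up to the Fail total,
# and the two passing categories to the Pass total
fail_total <- sum(score_counts[c("Don't talk to each other",
                                 "Less than 2 women",
                                 "Only talk about men")])
pass_total <- sum(score_counts[c("Dubious", "Passes Bechdel")])

# Share of failing films in which women don't talk to each other
share_no_talk <- score_counts[["Don't talk to each other"]] / fail_total
# Share of passing films whose pass is rated dubious
share_dubious <- score_counts[["Dubious"]] / pass_total

round(c(share_no_talk = share_no_talk, share_dubious = share_dubious), 3)
```

Both subtotals match the pass/fail table (849 and 945), which supports reading the binary variable as a collapse of the five-level score.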
Next, we want to explore whether the proportion of movies passing the Bechdel Test has increased over time. The code chunk below plots the distribution of movies according to their Bechdel score by decade.
seventies_df = movies_df %>%
  filter(decade == "1970-1979") %>%
  group_by(decade, bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N))
eighties_df = movies_df %>%
  filter(decade == "1980-1989") %>%
  group_by(decade, bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N))
nineties_df = movies_df %>%
  filter(decade == "1990-1999") %>%
  group_by(decade, bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N))
thousands_df = movies_df %>%
  filter(decade == "2000-2009") %>%
  group_by(decade, bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N))
tens_df = movies_df %>%
  filter(decade == "2010-2013") %>%
  group_by(decade, bechdel_score) %>%
  summarise(N = n()) %>%
  mutate(Proportion = N/sum(N))
decades = bind_rows(seventies_df, eighties_df, nineties_df, thousands_df, tens_df)
decades %>%
  plot_ly(x = ~decade, y = ~Proportion*100, type = "bar",
          color = ~bechdel_score, colors = "RdYlGn") %>%
  layout(barmode = "stack",
         title = "Distribution of movies passing Bechdel Test Criteria by decade",
         xaxis = list(title = "Decade"),
         yaxis = list(title = "Percentage (%)"))
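The five per-decade blocks above repeat the same steps, so the within-decade proportions could also be computed in one pass. The following is a minimal sketch of that alternative, not the code used to produce the chart; `toy_df` and `decades_compact` are hypothetical names, and the rows are made up purely so the example runs on its own.

```r
library(dplyr)

# Toy stand-in for movies_df (made-up rows, for illustration only)
toy_df <- tibble(
  decade = c("1970-1979", "1970-1979", "1970-1979", "1970-1979",
             "1980-1989", "1980-1989", "1980-1989"),
  bechdel_score = c("Passes Bechdel", "Dubious", "Dubious", "Less than 2 women",
                    "Passes Bechdel", "Passes Bechdel", "Only talk about men")
)

# count() tallies rows per decade/score pair; grouping by decade afterwards
# makes sum(N) the decade total, so Proportion sums to 1 within each decade
decades_compact <- toy_df %>%
  count(decade, bechdel_score, name = "N") %>%
  group_by(decade) %>%
  mutate(Proportion = N / sum(N)) %>%
  ungroup()

decades_compact
```

If movies_df held decades other than the five filtered above, a `filter(decade %in% c(...))` step would be needed before `count()` to match the original result.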
This chart shows the breakdown of our sample of films stratified by decade. Since 1970, an increasing proportion of movies pass the Bechdel Test, but this proportion appears to have plateaued at around 54-55% since the 2000s. The apparent plateau may be partially attributable to the smaller sample of films in the 2010s, which covers only 2010-2013. Even so, the proportion of movies that decisively pass the Bechdel Test (that is, excluding ‘Dubious’ films, where contributors were skeptical about whether the film passed) remains under 50%.
Next, we want to explore how movie budgets vary according to the primacy of women’s roles in movies. Grouping first by pass/fail status and then by Bechdel Test score, we can plot the distribution of movie budgets, adjusted for inflation to 2013 dollars.
movies_df %>%
  mutate(pass_bechdel = ifelse(pass_bechdel == TRUE, "Pass", "Fail")) %>%
  plot_ly(x = ~budget_2013, y = ~pass_bechdel, type = "box", text = ~title,
          color = ~pass_bechdel, colors = c("red4", "green4")) %>%
  layout(xaxis = list(title = "Movie budget ($)"),
         yaxis = list(title = "Bechdel Test"),
         showlegend = FALSE)
movies_df %>%
  plot_ly(x = ~budget_2013, y = ~bechdel_score, type = "box", text = ~title,
          color = ~bechdel_score, colors = c("darkred","orange1", "green4")) %>%
  layout(xaxis = list(title = "Movie budget ($)"),
         yaxis = list(title = "Bechdel Criterion"),
         showlegend = FALSE,
         title = "Movie budgets (2013-adjusted) stratified by Bechdel Test criteria")
These charts show the range of movie budgets stratified by Bechdel Test criteria. When we stratify on a binary basis, movies that pass the Bechdel Test appear to have slightly smaller budgets than movies that fail. Film budgets are fairly right-skewed, so we will log-transform this variable in subsequent analyses. When we stratify by the categorical Bechdel score, the budget distributions do not look markedly different from one another, although the interquartile range for movies that firmly pass the test sits slightly lower than for movies that do not. These patterns may suggest that Hollywood puts more money behind films that fail the test than behind films in which women talk to each other; this will be explored further in subsequent analyses.
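The comparisons read off the box plots (medians, interquartile ranges, and skew) can also be tabulated directly. The sketch below shows one way to do so; `toy_df` and `budget_summary` are hypothetical names, and the budgets are invented for illustration, so the output says nothing about the real data.

```r
library(dplyr)

# Toy stand-in for movies_df (made-up budgets, for illustration only)
toy_df <- tibble(
  pass_bechdel = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  budget_2013  = c(5e6, 2e7, 4e7, 1.5e8, 1e7, 3e7, 6e7, 2.5e8)
)

budget_summary <- toy_df %>%
  group_by(pass_bechdel) %>%
  summarise(
    median_budget = median(budget_2013),
    iqr_budget    = IQR(budget_2013),
    # With a long right tail, the mean sits above the median
    mean_budget   = mean(budget_2013),
    # The log10 scale compresses the long right tail
    median_log10  = median(log10(budget_2013))
  )

budget_summary
```

A mean well above the median within each group is a simple numeric sign of the right skew that motivates the log transformation.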
Additionally, we want to explore how movie revenues may vary according to the primacy of women’s roles in movies. We can examine this using the profit and ROI variables in our data.
Starting with profit, which is the difference between intgross_2013 (2013-adjusted international gross revenue) and budget_2013 (2013-adjusted budget), we can plot the range of profits across movies, stratified by their Bechdel Test criteria.
movies_df %>%
  mutate(pass_bechdel = ifelse(pass_bechdel == TRUE, "Pass", "Fail")) %>%
  plot_ly(x = ~profit, y = ~pass_bechdel, type = "box", text = ~title,
          color = ~bechdel_score, colors = c("darkred","orange1", "green4")) %>%
  layout(boxmode = "group",
         xaxis = list(title = "Profit ($)"),
         yaxis = list(title = "Bechdel Criterion"),
         showlegend = FALSE,
         title = "Movie profits stratified by Bechdel Test criteria")
Looking at the resulting plot, we see that the distributions of profits for movies passing and failing the Bechdel Test are quite similar, with movies that fail the test reeling in slightly higher profits. Profits overall appear quite right-skewed, with a handful of outliers (e.g. Star Wars and Titanic) bringing in far larger profits than the rest. Given this skewness, we will need to log-transform this variable in subsequent analyses. When we stratify further by each dimension of the Bechdel Test, there are no notable differences in profits between Bechdel criteria.
Next, let’s examine ROI, which is the ratio of profit to budget_2013. Because ROI is heavily right-skewed, we will log-transform this variable to better visualize differences in the ROI distribution between movies of differing Bechdel criteria.
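One caveat when logging ROI as defined here: a film that loses money has a negative ROI, and one that exactly breaks even has an ROI of 0, so `log(ROI)` is undefined or infinite for those films and they drop out of the plot. The sketch below illustrates this on invented numbers and shows `log1p(ROI)` as one possible alternative; that alternative is an assumption of this example, not the transformation used in the analysis, and `toy_df` and `roi_df` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for movies_df (made-up figures, for illustration only)
toy_df <- tibble(
  title         = c("Film A", "Film B", "Film C"),
  budget_2013   = c(1e7, 5e7, 2e7),
  intgross_2013 = c(4e7, 2.5e7, 2e7)
)

roi_df <- toy_df %>%
  mutate(
    profit = intgross_2013 - budget_2013,  # as defined in the text
    ROI    = profit / budget_2013,         # as defined in the text
    # log() gives NaN for a negative ROI and -Inf for an ROI of exactly 0
    log_roi = suppressWarnings(log(ROI)),
    # log1p(ROI) = log(1 + ROI) = log(intgross_2013 / budget_2013),
    # which is finite whenever gross revenue is positive
    log1p_roi = log1p(ROI)
  )

roi_df
```

Here Film B (a loss) and Film C (break-even) have no usable `log_roi`, while `log1p_roi` stays finite for all three.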
movies_df %>%
  mutate(pass_bechdel = ifelse(pass_bechdel == TRUE, "Pass", "Fail")) %>%
  plot_ly(x = ~log(ROI), y = ~pass_bechdel, type = "box", text = ~title,
          color = ~bechdel_score, colors = c("darkred","orange1", "green4")) %>%
  layout(boxmode = "group",
         xaxis = list(title = "Log(Return on Investment)"),
         yaxis = list(title = "Bechdel Criterion"),
         showlegend = FALSE,
         title = "Movie ROI stratified by Bechdel Test criteria")
Examining log(ROI) across Bechdel dimensions, the range of financial performance does not appear to differ substantially, as the median and IQR are approximately the same across all categories. Despite slightly lower budgets, the primacy of women’s roles in movies does not appear to hurt movies’ financial performance. These descriptive results do not support claims that films with stronger female characters perform worse at the box office than films without them.
Although financial performance is an important indicator of success for films, we may also want to consider how well movies are received by fans and critics depending on the primacy of female roles. We can plot IMDB ratings (fan ratings) against Metacritic Scores (aggregate scores based on critic reviews) and stratify by Bechdel Score.
movies_df %>%
  plot_ly(x = ~imdb_rating, y = ~metascore/10, color = ~bechdel_score, type = "scatter",
          mode = "markers", colors = c("darkred","orange1", "green4"), text = ~title, alpha = 0.6) %>%
  layout(yaxis = list(title = list(text = "Metascore / 10", standoff = 5), gridcolor = "lightgray"),
         xaxis = list(title = "IMDB Rating", gridcolor = "gray"),
         title = "Movie IMDB and Metacritic Scores stratified by Bechdel Test criteria")
Based on the scatterplot, we do not see any discernible clusters indicating that movies that pass the Bechdel Test are rated better or worse than movies that do not. However, we do notice that the films with the highest outlying fan and critic scores appear to fail the Bechdel Test. Additionally, among films that pass the Bechdel Test, more have higher critic ratings relative to fan ratings, and among films that fail, more have lower critic ratings relative to fan ratings. This could perhaps reflect biases that lead fans to be more critical of films with female-dominant roles, while critics may be more likely to set aside their own biases when judging films.
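The critic-versus-fan pattern described above is hard to judge by eye in a scatterplot, so it may help to quantify it. The following sketch computes the gap between the rescaled Metascore and the IMDB rating for each film and summarizes it by pass/fail status; `toy_df`, `rating_gap`, and the ratings themselves are made up for illustration and do not reproduce the finding.

```r
library(dplyr)

# Toy stand-in for movies_df (made-up ratings, for illustration only)
toy_df <- tibble(
  pass_bechdel = c(TRUE, TRUE, FALSE, FALSE),
  imdb_rating  = c(6.5, 7.0, 7.5, 8.0),
  metascore    = c(72, 68, 70, 74)
)

rating_gap <- toy_df %>%
  # Positive gap: critics rated the film higher than fans did
  mutate(gap = metascore / 10 - imdb_rating) %>%
  group_by(pass_bechdel) %>%
  summarise(
    mean_gap             = mean(gap, na.rm = TRUE),
    share_critics_higher = mean(gap > 0, na.rm = TRUE)
  )

rating_gap
```

On the real data, a larger `share_critics_higher` among passing films than among failing films would be consistent with the pattern described in the text.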