We tested the association of bechdel_score with each of the outcomes budget, profit, imdb_rating, and metascore individually using linear regression. The Bechdel score of "Less than 2 women" was taken as the reference category. A movie with fewer than 2 women fails all three dimensions of the Bechdel test, which are: 1) the movie has at least 2 women, 2) the women talk to each other, and 3) they talk about something other than a man. Thus, the intercept estimates the mean outcome for movies that fail all dimensions of the Bechdel test, and the coefficient for "Passes Bechdel" compares movies that pass all dimensions against that reference group.
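The role of the reference category can be illustrated with a minimal sketch. The data frame `toy` and its values are hypothetical and are not drawn from movies_df; the sketch only shows how lm codes a factor whose first level is "Less than 2 women".

```r
# Hypothetical toy data (not from movies_df); the first factor level is the reference
toy <- data.frame(
  score = factor(
    c("Less than 2 women", "Less than 2 women",
      "Dubious", "Dubious",
      "Passes Bechdel", "Passes Bechdel"),
    levels = c("Less than 2 women", "Dubious", "Passes Bechdel")
  ),
  y = c(10, 12, 9, 11, 7, 9)
)

fit <- lm(y ~ score, data = toy)

# Plain group means, for comparison with the fitted coefficients
group_means <- tapply(toy$y, toy$score, mean)

# The intercept is the mean of the reference category
intercept <- unname(coef(fit)["(Intercept)"])
# Each other coefficient is that category's mean minus the reference mean
pass_diff <- unname(coef(fit)["scorePasses Bechdel"])

print(group_means)
print(c(intercept = intercept, pass_diff = pass_diff))
```

In this toy example the intercept matches the mean of the reference group, and the "Passes Bechdel" coefficient is the difference between the two group means, which is how the tables in this section should be read.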
budget = movies_df %>% ggplot(aes(x = budget_2013)) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "Budget (Dollars, 2013-adjusted)" ,
y = "Count",
title = "Distribution of Movie Budgets")
ggplotly(budget)
Since budget was heavily right-skewed, we log-transformed it to bring its distribution closer to normal.
log_budget = movies_df %>% ggplot(aes(x = log(budget_2013))) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "Log-Budget (Dollars, 2013-adjusted)",
y = "Count" ,
title = "Distribution of Log-Transformed Movie Budgets")
ggplotly(log_budget)
We then quantitatively tested the association of the log(budget) variable with the categorical Bechdel score using linear regression.
Model statement: \(\log(\text{Budget}) = \beta_0 + \beta_1(\text{don't talk}) + \beta_2(\text{talk about men}) + \beta_3(\text{dubious}) + \beta_4(\text{pass})\), where "<2 women" is the reference category captured by \(\beta_0\).
log_budget= movies_df %>%
lm(log(budget_2013)~bechdel_score, data=.)%>%
broom::tidy()
knitr::kable(log_budget, digits=3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 17.341 | 0.116 | 150.109 | 0.000 |
bechdel_scoreDon’t talk to each other | 0.154 | 0.130 | 1.183 | 0.237 |
bechdel_scoreOnly talk about men | -0.226 | 0.152 | -1.488 | 0.137 |
bechdel_scoreDubious | -0.078 | 0.163 | -0.478 | 0.633 |
bechdel_scorePasses Bechdel | -0.310 | 0.125 | -2.475 | 0.013 |
Based on the parameter estimate of -0.310 with a p-value of 0.013, passing all three Bechdel dimensions had a negative relationship with movie budget. On average, movies that passed all three Bechdel dimensions had a lower budget than movies that failed every Bechdel dimension.
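Back-transformed estimated budget for movies that fail all three dimensions: $33,963,942
Back-transformed estimated budget for movies that pass all three dimensions: $24,910,750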
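Because the outcome is on the log scale, the coefficient is easier to read after exponentiating. The sketch below uses the rounded estimates copied from the log(budget) table, so its results will differ slightly from figures computed with unrounded coefficients; the variable names are illustrative only.

```r
# Rounded coefficients copied from the log(budget) regression table
b0     <- 17.341   # intercept: reference category ("Less than 2 women")
b_pass <- -0.310   # "Passes Bechdel" contrast

ratio       <- exp(b_pass)         # multiplicative effect on the dollar scale
pct_change  <- (ratio - 1) * 100   # percent difference vs. the reference group
ref_budget  <- exp(b0)             # back-transformed reference estimate
pass_budget <- exp(b0 + b_pass)    # back-transformed estimate for passing movies

print(round(c(ratio = ratio, pct_change = pct_change), 3))
print(round(c(ref_budget = ref_budget, pass_budget = pass_budget)))
```

Under these rounded values, a coefficient of -0.310 corresponds to budgets roughly 27% lower for movies that pass. Note that exponentiating a mean of logs gives a geometric rather than an arithmetic mean.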
profit = movies_df %>% ggplot(aes(x = profit)) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "Profit (Dollars)",
y = "Count",
title = "Distribution of Profits")
ggplotly(profit)
In the graph, profit appears to be heavily right-skewed. Thus, we log-transformed the data to bring the distribution closer to normal. Additionally, 1 was added to each value before the logarithm was applied, to avoid taking the logarithm of 0. This 1 is subtracted again when the results are back-transformed.
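The shift-by-one transform and its inverse can be sketched as follows. The vector `x` holds hypothetical values, not data from movies_df; `log1p` and `expm1` are base R functions equivalent to `log(x + 1)` and `exp(x) - 1`.

```r
x <- c(0, 1, 99, 1e6)       # hypothetical profit values, not from movies_df

shifted <- log(x + 1)       # the transform described in the text
same    <- log1p(x)         # base R equivalent, more accurate near zero

back <- expm1(shifted)      # exp(.) - 1 undoes the shift

# The shift only protects against zeros: values below -1 still give NaN
neg <- suppressWarnings(log(-5 + 1))

print(rbind(x = x, shifted = shifted, back = back))
```

If the profit column contains losses, those rows would produce NaN under this transform and be dropped by lm, so it is worth checking for negative values before modelling.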
profit = movies_df %>% ggplot(aes(x = log(profit + 1))) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "Log-Profit",
y = "Count",
title = "Distribution of Log-Transformed Profits")
ggplotly(profit)
We quantitatively tested the association of profit with the categorical Bechdel score using linear regression.
Model statement: \(\log(\text{profit} + 1) = \beta_0 + \beta_1(\text{don't talk}) + \beta_2(\text{talk about men}) + \beta_3(\text{dubious}) + \beta_4(\text{pass})\), where "<2 women" is the reference category captured by \(\beta_0\).
profit = movies_df %>%
lm(log(profit+1)~bechdel_score, data =.) %>%
broom::tidy()
knitr::kable(profit, digits=3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 18.139 | 0.160 | 113.117 | 0.000 |
bechdel_scoreDon’t talk to each other | 0.080 | 0.179 | 0.447 | 0.655 |
bechdel_scoreOnly talk about men | -0.012 | 0.208 | -0.057 | 0.955 |
bechdel_scoreDubious | 0.071 | 0.224 | 0.317 | 0.751 |
bechdel_scorePasses Bechdel | -0.282 | 0.173 | -1.625 | 0.104 |
Based on the parameter estimate of -0.282, movies that passed all three dimensions of the Bechdel test had lower profits than movies that failed all dimensions of the Bechdel test. However, since the p-value of 0.104 > 0.05, the difference was not statistically significant.
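Back-transformed estimated profit for movies that fail all three dimensions: $75,526,942
Back-transformed estimated profit for movies that pass all three dimensions: $57,082,034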
…
imdb = movies_df %>% ggplot(aes(x = imdb_rating)) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "IMDB Rating",
y = "Count",
title = "Distribution of IMDB Ratings")
ggplotly(imdb)
Testing association of IMDB rating with categorical Bechdel scores using linear regression:
Model statement: \(\text{IMDB} = \beta_0 + \beta_1(\text{don't talk}) + \beta_2(\text{talk about men}) + \beta_3(\text{dubious}) + \beta_4(\text{pass})\), where "<2 women" is the reference category captured by \(\beta_0\).
imdb = movies_df %>%
lm(imdb_rating~bechdel_score, data=.) %>%
broom::tidy()
knitr::kable(imdb, digits=3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 6.944 | 0.085 | 81.848 | 0.000 |
bechdel_scoreDon’t talk to each other | -0.032 | 0.095 | -0.335 | 0.738 |
bechdel_scoreOnly talk about men | -0.074 | 0.113 | -0.653 | 0.514 |
bechdel_scoreDubious | -0.208 | 0.121 | -1.723 | 0.085 |
bechdel_scorePasses Bechdel | -0.344 | 0.092 | -3.729 | 0.000 |
Based on the parameter estimate of -0.344 with a p-value of 0.00019, movies that passed all three dimensions of the Bechdel test had, on average, an imdb_rating 0.344 points lower than movies that failed all three dimensions.
meta = movies_df %>% ggplot(aes(x = metascore)) + geom_histogram(alpha = 0.8, color = "white") +
labs(
x = "Metascore",
y = "Count",
title = "Distribution of Metascores")
ggplotly(meta)
Testing association of Metascore with categorical Bechdel scores using linear regression:
Model statement: \(\text{Metascore} = \beta_0 + \beta_1(\text{don't talk}) + \beta_2(\text{talk about men}) + \beta_3(\text{dubious}) + \beta_4(\text{pass})\), where "<2 women" is the reference category captured by \(\beta_0\).
meta = movies_df %>%
lm(metascore~bechdel_score, data=.) %>%
broom::tidy()
knitr::kable(meta, digits=3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 58.730 | 1.636 | 35.897 | 0.000 |
bechdel_scoreDon’t talk to each other | 1.306 | 1.841 | 0.709 | 0.478 |
bechdel_scoreOnly talk about men | -0.591 | 2.155 | -0.274 | 0.784 |
bechdel_scoreDubious | 2.032 | 2.324 | 0.874 | 0.382 |
bechdel_scorePasses Bechdel | -0.464 | 1.775 | -0.262 | 0.794 |
Based on the parameter estimate of -0.464, movies that pass all dimensions of the Bechdel test had a negative relationship with Metascores. However, since the p-value of 0.794 > 0.05, the difference was not statistically significant.
To assess whether a movie passing the Bechdel test is a significant predictor of that movie’s success, we used a stepwise selection algorithm. The stepAIC command from the MASS package performs stepwise model selection by minimizing AIC.
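The general mechanics of stepwise selection can be sketched on R's built-in mtcars data. This is an illustration only: the outcome and predictors below are unrelated to the movie analysis and were chosen simply because the dataset ships with R.

```r
# Sketch on the built-in mtcars data, not on movies_df
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# direction = "both" lets terms be dropped and re-added at each step;
# with no scope given, the starting model is the largest model considered
stepped <- MASS::stepAIC(full, direction = "both", trace = FALSE)

# Predictors that survived selection
kept <- attr(terms(stepped), "term.labels")
print(kept)

# The selected model never has a higher AIC than the starting model
print(c(full = AIC(full), stepped = AIC(stepped)))
```

A predictor absent from `kept` was removed because dropping it lowered the AIC, which is the sense in which a variable is "not selected" in the tables that follow.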
The setup of the regression models here follows the methodology outlined previously, including log-transforming variables like profit and budget_2013 to reduce skew. Additionally, the binary Bechdel test variable was used in these models to conserve degrees of freedom.
We assessed movie success based on three factors: profit, imdb_rating, and budget_2013.
We first assessed the Bechdel test as a potential predictor of profit. Other potential predictors included in the model were budget_2013, imdb_rating, and all genre variables.
# Stepwise regression for profit
## logProfit ~ Bechdel (binary) + logBudget + IMDB + Genre
modelprofit = lm(log(profit +1) ~ pass_bechdel + log(budget_2013) + imdb_rating + action + adventure + animation + biography + comedy + crime + documentary + drama + family + fantasy + history + horror + music + musical + mystery + romance + sci_fi + sport + thriller + war + western, data = movies_df)
stepprofit <- MASS::stepAIC(modelprofit, direction = "both", trace = FALSE) %>% broom::tidy()
knitr::kable(stepprofit, digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 4.205 | 0.599 | 7.020 | 0.000 |
log(budget_2013) | 0.646 | 0.030 | 21.721 | 0.000 |
imdb_rating | 0.395 | 0.041 | 9.539 | 0.000 |
adventureTRUE | 0.453 | 0.101 | 4.463 | 0.000 |
biographyTRUE | -0.338 | 0.179 | -1.886 | 0.060 |
dramaTRUE | -0.369 | 0.085 | -4.330 | 0.000 |
horrorTRUE | 0.547 | 0.128 | 4.264 | 0.000 |
musicTRUE | 0.481 | 0.262 | 1.834 | 0.067 |
romanceTRUE | 0.197 | 0.107 | 1.848 | 0.065 |
sci_fiTRUE | -0.210 | 0.113 | -1.851 | 0.064 |
westernTRUE | -2.685 | 0.496 | -5.412 | 0.000 |
The algorithm selected budget, IMDB rating, and the genre variables of adventure, biography, drama, horror, music, romance, sci-fi, and western as influential predictors of a movie’s profit. pass_bechdel was not selected as an influential predictor.
The second metric we assessed as a measure of movie success was movie reviews as measured by imdb_rating. Other potential predictors we included in the model were budget, profit, and all genre variables.
# Stepwise regression for imdb ratings
## IMDB ~ Bechdel (binary) + logBudget + profit + Genre
modelIMDB = lm(imdb_rating ~ pass_bechdel + log(budget_2013) + profit + action + adventure + animation + biography + comedy + crime + documentary + drama + family + fantasy + history + horror + music + musical + mystery + romance + sci_fi + sport + thriller + war + western, data = movies_df)
stepIMDB <- MASS::stepAIC(modelIMDB, direction = "both", trace = FALSE) %>% broom::tidy()
knitr::kable(stepIMDB, digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 8.427 | 0.312 | 26.991 | 0.000 |
pass_bechdelTRUE | -0.255 | 0.043 | -5.943 | 0.000 |
log(budget_2013) | -0.080 | 0.018 | -4.450 | 0.000 |
profit | 0.000 | 0.000 | 14.631 | 0.000 |
actionTRUE | -0.418 | 0.060 | -6.930 | 0.000 |
adventureTRUE | -0.212 | 0.065 | -3.266 | 0.001 |
animationTRUE | 0.320 | 0.092 | 3.477 | 0.001 |
biographyTRUE | 0.336 | 0.102 | 3.299 | 0.001 |
comedyTRUE | -0.412 | 0.055 | -7.490 | 0.000 |
crimeTRUE | 0.140 | 0.062 | 2.268 | 0.023 |
dramaTRUE | 0.199 | 0.056 | 3.526 | 0.000 |
familyTRUE | -0.462 | 0.089 | -5.218 | 0.000 |
fantasyTRUE | -0.116 | 0.070 | -1.655 | 0.098 |
horrorTRUE | -0.684 | 0.081 | -8.438 | 0.000 |
musicTRUE | -0.236 | 0.139 | -1.703 | 0.089 |
romanceTRUE | -0.139 | 0.065 | -2.138 | 0.033 |
thrillerTRUE | -0.242 | 0.063 | -3.834 | 0.000 |
warTRUE | 0.233 | 0.156 | 1.499 | 0.134 |
The algorithm selected the Bechdel test, budget, profit, and the genre variables of action, adventure, animation, biography, comedy, crime, drama, family, fantasy, horror, music, romance, thriller, and war as influential predictors of a movie’s IMDB rating. The p-value of pass_bechdel is significant at the 5% level and, on average, movies that pass the Bechdel test have a 0.255 point lower rating on IMDB than movies that do not pass, when adjusting for budget, profit, and the aforementioned genre variables.
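To give a sense of the uncertainty around that estimate, an approximate 95% interval can be computed from the tabulated estimate and standard error. This is a rough sketch that assumes a normal approximation is adequate and ignores the fact that the model was chosen by stepwise selection, which tends to make such intervals too narrow.

```r
# Values copied from the pass_bechdelTRUE row of the stepwise IMDB table
est <- -0.255
se  <- 0.043

# Normal-approximation 95% interval (assumes large residual degrees of freedom)
ci <- est + c(-1, 1) * qnorm(0.975) * se

print(round(ci, 3))
```

Under these assumptions the interval lies entirely below zero, consistent with the small p-value reported in the table.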
As budget is a significant predictor of both profit and IMDB ratings, we also assessed whether passing the Bechdel test is an influential predictor of a movie’s budget. Other potential predictors we included in the model were runtime and all genre variables.
# Stepwise regression for budget
## logBudget ~ Bechdel (binary) + runtime + Genre
modelbudget = lm(log(budget_2013) ~ pass_bechdel + runtime + action + adventure + animation + biography + comedy + crime + documentary + drama + family + fantasy + history + horror + music + musical + mystery + romance + sci_fi + sport + thriller + war + western, data = movies_df)
stepbudget <- MASS::stepAIC(modelbudget, direction = "both", trace = FALSE) %>% broom::tidy()
knitr::kable(stepbudget, digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 13.521 | 0.186 | 72.873 | 0.000 |
runtime | 0.026 | 0.001 | 17.882 | 0.000 |
actionTRUE | 0.817 | 0.076 | 10.814 | 0.000 |
adventureTRUE | 0.622 | 0.082 | 7.613 | 0.000 |
animationTRUE | 1.300 | 0.124 | 10.492 | 0.000 |
biographyTRUE | 0.326 | 0.134 | 2.433 | 0.015 |
comedyTRUE | 0.287 | 0.073 | 3.917 | 0.000 |
crimeTRUE | 0.173 | 0.080 | 2.157 | 0.031 |
documentaryTRUE | -1.750 | 0.539 | -3.246 | 0.001 |
dramaTRUE | -0.247 | 0.071 | -3.474 | 0.001 |
familyTRUE | 0.677 | 0.115 | 5.890 | 0.000 |
fantasyTRUE | 0.549 | 0.093 | 5.904 | 0.000 |
musicalTRUE | 0.531 | 0.237 | 2.239 | 0.025 |
mysteryTRUE | 0.483 | 0.101 | 4.771 | 0.000 |
romanceTRUE | 0.285 | 0.084 | 3.388 | 0.001 |
sci_fiTRUE | 0.424 | 0.090 | 4.726 | 0.000 |
sportTRUE | 0.571 | 0.208 | 2.751 | 0.006 |
thrillerTRUE | 0.331 | 0.083 | 3.976 | 0.000 |
The algorithm selected runtime and the genre variables of action, adventure, animation, biography, comedy, crime, documentary, drama, family, fantasy, musical, mystery, romance, sci-fi, sport, and thriller. pass_bechdel was not selected as an influential predictor.
In conclusion, based on our data, whether or not a movie passes the Bechdel test is not a significant predictor of that movie’s success as measured by profit or budget: stepwise selection did not retain pass_bechdel in either model. It is a significant predictor of a movie’s IMDB rating, where movies that pass have, on average, a lower rating than movies that do not pass. Therefore, by the IMDB rating criterion alone, passing the Bechdel test can be considered detrimental to a movie’s success.