Primary Outcomes (explanation)
The first outcome tells us about students’ preference for attending the college-department with a particular attribute, and the second to fifth outcomes tell us about the underlying mechanism of the preference.
MAIN HYPOTHESES
H1: female students prefer to attend a college-department with a higher female student ratio
H2: female students prefer to attend a college-department with a higher female student ratio conditional on it being a STEM college-department
H3: a higher female student ratio is more important at a STEM college-department than at a non-STEM college-department for female students’ preference to attend it
SPECIFICATIONS
In all the specifications, we use female students' data only and estimate the model via OLS. We cluster standard errors at the student level and use a two-sided 5% significance level. Also, in all the specifications, we take the difference between the right and the left college-department attributes, including the interaction terms. Below, we simply refer to the differenced attributes as attributes for brevity.
As the first specification, we regress each of the outcomes on the attributes. The college names are not included in the regressors, and the departments are aggregated into a single indicator variable equal to 1 if it is a STEM department and 0 otherwise before taking the difference. Our interest is (i) the coefficient on the female student ratio that tells us whether female students prefer to attend a college-department with a higher female student ratio (H1).
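The first specification can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code: the attribute set (female student ratio, STEM dummy, selectivity index), the intercept, the continuous outcome, and the function names `difference_attributes` and `ols_cluster` are all assumptions made for the example. The cluster-robust covariance also omits the small-sample correction that statistical packages typically apply.

```python
import numpy as np

def difference_attributes(right, left):
    """Right-minus-left attribute differences for each choice pair."""
    return np.asarray(right, dtype=float) - np.asarray(left, dtype=float)

def ols_cluster(y, X, clusters):
    """OLS coefficients and a cluster-robust (sandwich) covariance matrix."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        rows = clusters == g
        score = X[rows].T @ resid[rows]  # sum of x_i * e_i within one student
        meat += np.outer(score, score)
    return beta, bread @ meat @ bread

# Synthetic data (assumption): 200 students, 8 choice pairs each.
rng = np.random.default_rng(0)
student = np.repeat(np.arange(200), 8)
n = student.size

def draw_side():
    # Hypothetical attributes: female ratio (0-1), STEM dummy, selectivity index.
    return np.column_stack(
        [rng.uniform(0, 1, n), rng.integers(0, 2, n), rng.normal(size=n)]
    )

D = difference_attributes(draw_side(), draw_side())
X = np.column_stack([np.ones(n), D])  # including an intercept is an assumption
student_effect = rng.normal(scale=0.5, size=200)[student]
y = X @ np.array([0.0, 0.3, -0.2, 0.1]) + student_effect + rng.normal(size=n)

beta, cov = ols_cluster(y, X, student)
se = np.sqrt(np.diag(cov))
print("coefficient on female ratio:", beta[1], "clustered SE:", se[1])
```

Here `beta[1]` plays the role of coefficient (i), and `se[1]` is its student-clustered standard error.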
As the second specification, we augment the first specification by adding to the regressors the interaction between the female student ratio and the STEM department dummy. We are interested in (ii) the sum of the coefficient on the female student ratio and the coefficient on the interaction term between the STEM department dummy and the female student ratio, which tells us whether female students prefer to attend a college-department with a higher female student ratio even if it is a STEM department (H2); and (iii) the coefficient on the interaction term between the STEM department dummy and the female student ratio, which tells us whether a higher female student ratio is more important at a STEM college-department than at a non-STEM college-department for female students’ preference to attend it (H3).
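Because quantity (ii) is a sum of two coefficients, its standard error depends on their covariance as well as their variances. A minimal sketch of the calculation, assuming a coefficient vector ordered as (female ratio, STEM dummy, interaction) and a normal approximation for the p-value. The numbers and the function name `linear_combination` are hypothetical and are not estimates from the study.

```python
import math
import numpy as np

def linear_combination(beta, cov, weights):
    """Estimate, standard error, and two-sided p-value for weights' beta."""
    w = np.asarray(weights, dtype=float)
    est = float(w @ np.asarray(beta, dtype=float))
    se = float(np.sqrt(w @ np.asarray(cov, dtype=float) @ w))
    z = est / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return est, se, p_value

# Hypothetical estimates: [female ratio, STEM dummy, ratio x STEM].
beta_hat = np.array([0.20, -0.10, 0.05])
cov_hat = np.array([
    [0.0025, 0.0000, -0.0010],
    [0.0000, 0.0016, 0.0000],
    [-0.0010, 0.0000, 0.0036],
])

# (ii): coefficient on the ratio plus coefficient on the interaction.
est_ii, se_ii, p_ii = linear_combination(beta_hat, cov_hat, [1, 0, 1])
# (iii): the interaction coefficient alone.
est_iii, se_iii, p_iii = linear_combination(beta_hat, cov_hat, [0, 0, 1])
print(est_ii, se_ii, p_ii)
```

The variance of the sum is Var(b1) + Var(b3) + 2 Cov(b1, b3), which is what the quadratic form `w @ cov @ w` computes for weights (1, 0, 1).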
As robustness checks, we augment the first specification and (a) interact the female student ratio with all the other attributes and (b) interact the STEM department dummy with all the other attributes. The former addresses the possibility that the female student ratio interacts with other attributes. The latter addresses the possibility that the STEM department interacts with other attributes.
Below, we consider as our main specifications the first specification for H1 and the second specification for H2 and H3 unless otherwise indicated.
EFFECTS ON MALE STUDENTS
We investigate whether the student gender ratio is a determinant of men’s preference for attending a college-department in general (corresponding to H1) and a STEM college-department in particular (corresponding to H2), and whether it matters more for a STEM college-department than for a non-STEM one (corresponding to H3). To answer these questions, we run the same analyses outlined in the primary outcomes section, using male students' data instead.
We also test whether female students are more responsive to the student gender ratio than male students by pooling female and male students’ data and interacting all the regressors with an indicator variable that equals 1 if a student is female, 0 otherwise.
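The pooled, fully interacted regression can be sketched as below. The sketch assumes two differenced attributes and a noise-free synthetic outcome so that the coefficients are recovered exactly. The function name `pooled_design`, the column order, and the inclusion of an intercept and a female main effect are illustrative assumptions.

```python
import numpy as np

def pooled_design(X, female):
    """Columns: constant, attributes, female indicator, attributes x female."""
    X = np.asarray(X, dtype=float)
    f = np.asarray(female, dtype=float).reshape(-1, 1)
    const = np.ones((X.shape[0], 1))
    return np.hstack([const, X, f, X * f])

# Synthetic example (assumption): column 0 is the differenced female ratio,
# column 1 is another differenced attribute.
rng = np.random.default_rng(0)
n = 400
X_attr = rng.normal(size=(n, 2))
female = rng.integers(0, 2, n)

# Men respond to the ratio with slope 0.1, women with slope 0.1 + 0.3.
y = 0.1 * X_attr[:, 0] + 0.3 * X_attr[:, 0] * female + 0.5 * X_attr[:, 1]

Z = pooled_design(X_attr, female)
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
# Column order: [const, ratio, other, female, ratio x female, other x female]
print("extra responsiveness of female students:", coef[4])
```

The coefficient on the ratio-by-female column (`coef[4]` here) is the difference in responsiveness between female and male students. The same construction applies to any binary indicator interacted with all the regressors.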
WILLINGNESS TO PAY
For each coefficient of interest, namely (i) the coefficient on the female student ratio, (ii) the sum of the coefficient on the female student ratio and the coefficient on the interaction term between the STEM department dummy and the female student ratio, and (iii) the coefficient on the interaction term between the STEM department dummy and the female student ratio, we calculate female students’ willingness to pay (WTP) for a 10-percentage-point higher female student ratio in terms of the school selectivity index. We calculate the WTP by dividing each coefficient estimate of interest by the coefficient estimate on the school selectivity index, following Gallen and Wasserman (2023 JPubE). We divide the WTP by 10 to express it in terms of a 10-percentage-point higher ratio. We do this exercise separately for female students and male students.
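A minimal sketch of the WTP arithmetic. It assumes that the female student ratio is coded on a 0-to-1 scale, so that its coefficient corresponds to a 100-percentage-point change and dividing by 10 rescales it to 10 percentage points. That coding is an assumption here, and the function name and input values are hypothetical.

```python
def wtp_per_10pp(coef_of_interest, coef_selectivity):
    """WTP for a 10-percentage-point higher female student ratio,
    expressed in units of the school selectivity index.

    Assumes the ratio is coded on a 0-1 scale (an assumption of this sketch).
    """
    if coef_selectivity == 0:
        raise ValueError("coefficient on the selectivity index must be nonzero")
    return (coef_of_interest / coef_selectivity) / 10

# Hypothetical coefficient estimates, not results from the study.
wtp_i = wtp_per_10pp(coef_of_interest=0.30, coef_selectivity=0.05)
print(wtp_i)  # approximately 0.6 selectivity-index units
```

For quantity (ii), `coef_of_interest` would be the sum of the two coefficients. A standard error for the ratio would need the delta method or a bootstrap, which this sketch does not cover.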
CHANGES IN THE APPLICANT POOL
To see how the applicant pool would change when we increase the female student ratio, we do the following simple exercise:
1. Estimate the model via OLS with all the attributes and the interaction term between the STEM department dummy and the female student ratio as the regressors and the choice being the outcome. Estimate this model separately for female and male students’ data and take into account the comparative advantage in mathematics and language.
2. Predict the probability of choosing to attend STEM for female and male students with different comparative advantages in mathematics and language by plugging the actual STEM female student ratio (drawn from Japan’s national statistics) and the actual STEM female student ratio plus 10 percentage points (a counterfactual scenario) into the estimated models. For the other attributes, we plug the same values, generated within the experiment, into both scenarios.
3. Simulate choices under the actual and the counterfactual scenarios.
4. Compare the fraction of students who have a comparative advantage in mathematics and choose STEM vs. non-STEM.
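The four steps above can be sketched as follows. Everything numeric is a placeholder: the group-specific intercepts and slopes stand in for the fitted models from step 1, the value 0.20 stands in for the actual STEM female student ratio, and the group sizes are arbitrary. Collapsing the other attributes into the intercept and clipping the linear predictions to [0, 1] are simplifying assumptions of this sketch.

```python
import numpy as np

# Hypothetical fitted values per (gender, comparative advantage) group:
# (intercept, coefficient on the STEM female ratio, number of students).
GROUPS = {
    ("female", "math"): (0.40, 0.50, 1000),
    ("female", "language"): (0.20, 0.40, 1000),
    ("male", "math"): (0.60, 0.05, 1000),
    ("male", "language"): (0.35, 0.05, 1000),
}

def simulate_stem_choosers(female_ratio, rng):
    """Steps 2-3: predict P(choose STEM) per group and simulate the choices."""
    counts = {}
    for group, (intercept, coef, n) in GROUPS.items():
        p = min(max(intercept + coef * female_ratio, 0.0), 1.0)
        counts[group] = int(rng.binomial(n, p))
    return counts

def math_share(counts, stem=True):
    """Step 4: share of math-advantaged students among STEM (or non-STEM) choosers."""
    def picked(group):
        n = GROUPS[group][2]
        return counts[group] if stem else n - counts[group]
    total = sum(picked(g) for g in GROUPS)
    if total == 0:
        return float("nan")
    return sum(picked(g) for g in GROUPS if g[1] == "math") / total

rng = np.random.default_rng(0)
actual_ratio = 0.20                       # placeholder, not the national statistic
scenarios = {"actual": actual_ratio, "counterfactual": actual_ratio + 0.10}
for name, ratio in scenarios.items():
    counts = simulate_stem_choosers(ratio, rng)
    print(name, math_share(counts, stem=True), math_share(counts, stem=False))
```

Comparing the printed shares across the two scenarios shows how the composition of STEM and non-STEM choosers shifts when the female student ratio rises by 10 percentage points.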
NONLINEARITY
We investigate the potential nonlinear effect of the female student ratio on students’ preferences by replacing the female student ratio (which enters the main specifications linearly as a continuous variable) with a set of indicator variables for the female student ratio being 0% to 10%, 11% to 20%, 21% to 30%, …, 91% to 100% (that is, the ratio is discretized into 10 equally spaced bins). Since this analysis is likely to be underpowered, we mainly look at the point estimates and consider the confidence intervals only suggestive. As an alternative specification, we add a quadratic term in the female student ratio instead of using these bins. We do this exercise for all three coefficients of interest and separately for female students and male students.
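A minimal sketch of the binning, assuming the ratio is recorded in percent. The choice of the lowest bin as the omitted reference category and the function names are assumptions of this sketch. In a regression, these indicators would then be differenced between the right and the left option like the other attributes.

```python
import math

def ratio_bin(ratio_pct):
    """Bin index for a ratio in percent: 0 is 0-10%, 1 is 11-20%, ..., 9 is 91-100%."""
    if not 0 <= ratio_pct <= 100:
        raise ValueError("ratio must be between 0 and 100 percent")
    return max(math.ceil(ratio_pct / 10) - 1, 0)

def bin_dummies(ratio_pct, reference_bin=0):
    """Indicator variables for every bin except the omitted reference bin."""
    b = ratio_bin(ratio_pct)
    return [1 if b == k else 0 for k in range(10) if k != reference_bin]

def quadratic_terms(ratio_pct):
    """Alternative specification: the ratio and its square."""
    return [ratio_pct, ratio_pct ** 2]

print(ratio_bin(10), ratio_bin(11), ratio_bin(100))
print(bin_dummies(55))
```

Non-integer percentages strictly between two bins (for example 10.5%) are assigned to the upper bin by this function.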
DECOMPOSITION
We quantify the contribution of the four potential mechanisms to the students’ college-department preference by using Gong et al.’s (2021 JHR, section V.D.) decomposition method, which is based on Gelbach's (2016 JOLE) decomposition. We do this exercise for all three coefficients of interest and separately for female students and male students.
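The identity underlying a Gelbach-type decomposition can be sketched as below. This is not Gong et al.'s exact procedure. It assumes, for illustration, that the four mechanism measures enter as additional regressors, and it splits the resulting change in the female student ratio coefficient into one contribution per mechanism. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients (works for one or several outcome columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gelbach_decomposition(y, X_base, M, focal_col):
    """Split the change in one base-model coefficient across the columns of M.

    Returns the base coefficient, the full-model coefficient, and the
    contribution of each added column; the contributions sum to base - full.
    """
    k = X_base.shape[1]
    b_base = ols(y, X_base)
    b_full = ols(y, np.column_stack([X_base, M]))
    gamma = ols(M, X_base)  # auxiliary regressions of each mechanism on X_base
    contrib = gamma[focal_col, :] * b_full[k:]
    return b_base[focal_col], b_full[focal_col], contrib

# Synthetic data (assumption): one differenced attribute and four mechanisms.
rng = np.random.default_rng(0)
n = 500
ratio = rng.normal(size=n)
X_base = np.column_stack([np.ones(n), ratio])
loadings = np.array([1.0, 0.5, 0.2, 0.0])
M = 0.5 * ratio[:, None] * loadings + rng.normal(size=(n, 4))
y = 0.1 * ratio + M @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(size=n)

base, full, contrib = gelbach_decomposition(y, X_base, M, focal_col=1)
print("explained part:", base - full, "by mechanism:", contrib)
```

The identity holds exactly in sample, so the per-mechanism contributions always add up to the difference between the base and full coefficients, regardless of the order in which mechanisms are listed.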
HETEROGENEITIES
We investigate the heterogeneities in the following dimensions by interacting them with all the regressors. We run these heterogeneity analyses separately with female students’ data and male students’ data:
1. Mother’s education: an indicator variable that equals 1 if the mother has a bachelor’s degree or higher, 0 otherwise.
2. Parents’ education: an indicator variable that equals 1 if both the mother and the father have a bachelor’s degree or higher, 0 otherwise.
3. Extra education: an indicator variable that equals 1 if a student attends an above-median number of days in after-school supplementary classes, 0 otherwise.
4. Underestimation of the average STEM department gender ratio: an indicator variable that equals 1 if a student’s belief about the average gender ratio in STEM departments is below the true value, 0 otherwise.
5. Competitiveness: an indicator variable that equals 1 if a student’s competitiveness (measured on a 5-point Likert scale with 3 being neutral) is above 3, 0 otherwise.
6. Risk preference: an indicator variable that equals 1 if a student’s risk preference (measured on a 5-point Likert scale with 3 being neutral) is above 3, 0 otherwise.
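The indicator definitions above can be sketched as simple functions. The function names and example values are hypothetical, and the treatment of ties (a value exactly at the median or exactly equal to the true ratio is coded 0) follows the strict inequalities in the definitions.

```python
from statistics import median

def above_median(values):
    """1 for values strictly above the sample median, 0 otherwise (item 3)."""
    m = median(values)
    return [1 if v > m else 0 for v in values]

def underestimates(belief, true_value):
    """1 if the believed ratio is below the true ratio, 0 otherwise (item 4)."""
    return 1 if belief < true_value else 0

def likert_above_neutral(score, neutral=3):
    """1 if a 5-point Likert score is above the neutral point (items 5 and 6)."""
    return 1 if score > neutral else 0

def both_parents_degree(mother_has_degree, father_has_degree):
    """1 if both parents hold a bachelor's degree or higher (item 2)."""
    return 1 if (mother_has_degree and father_has_degree) else 0

days_in_classes = [0, 1, 2, 5]  # hypothetical days of supplementary classes
print(above_median(days_in_classes))
print(likert_above_neutral(4), likert_above_neutral(3))
```

Each resulting indicator is then interacted with all the regressors.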
ADDITIONAL HYPOTHESES
H4: men are more inclined to STEM college-departments
H5: men are more inclined to STEM college-departments even conditional on their mathematics ability
H6: (if H5 is true) H5 is driven by men’s confidence in mathematics
H7: men are more inclined to selective college-departments
H8: men are more inclined to selective college-departments even conditional on their ability
H9: (if H8 is true) H8 is driven by men’s confidence in their ability
To test H4, we pool female and male students’ data and interact the STEM department dummy with an indicator variable for female students. The coefficient on the interaction term tells us whether female students are less inclined to STEM college-departments.
To test H5, we additionally add to the H4 specification an interaction term between the STEM department dummy and the students’ mathematics ability. The coefficient estimate on the interaction term between the indicator variable for female students and the STEM department dummy tells us whether female students are less inclined to STEM college-departments even after conditioning on their mathematics ability.
To test H6, we further add to the H5 specification an interaction term between the student’s confidence level about their mathematics ability and the STEM department dummy. For H6 to be true, (i) male students must be more confident in their mathematics ability, (ii) the coefficient estimate on the interaction term between the indicator variable for female students and the STEM department dummy must be 0, and (iii) the coefficient estimate on the interaction term between the student’s confidence level in their mathematics ability and the STEM department dummy must be positive.
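The nested tests for H4 to H6 can be illustrated on synthetic data constructed so that the gender gap in STEM inclination runs entirely through confidence. The data-generating values, the noise-free outcome index, and the omission of the other attributes and of the female main effect are simplifying assumptions of this sketch, and the helper name `fit` is hypothetical.

```python
import numpy as np

def fit(y, *columns):
    """OLS with an intercept; returns [const, coefficient on each column...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 2000
female = rng.integers(0, 2, n).astype(float)
stem = rng.choice([-1.0, 0.0, 1.0], n)  # differenced STEM dummy (right - left)
ability = rng.normal(size=n)
# By construction, women are less confident at a given ability level.
confidence = 0.5 * ability - 0.8 * female + rng.normal(size=n)
y = 0.2 * stem + 0.3 * stem * confidence  # noise-free preference index

h4 = fit(y, stem, stem * female)
h5 = fit(y, stem, stem * female, stem * ability)
h6 = fit(y, stem, stem * female, stem * ability, stem * confidence)

print("H4, STEM x female:", h4[2])
print("H5, STEM x female given ability:", h5[2])
print("H6, STEM x female:", h6[2], "STEM x confidence:", h6[4])
gap = confidence[female == 0].mean() - confidence[female == 1].mean()
print("male minus female mean confidence:", gap)
```

In this constructed example, the STEM-by-female coefficient is negative in the H4 and H5 specifications and vanishes once the confidence interaction is added, while the confidence interaction is positive and men are on average more confident. This is the pattern that conditions (i) to (iii) describe.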
We test H7, H8, and H9 in the same manner as H4, H5, and H6, respectively, but replace the STEM department dummy with the school selectivity index, the student’s mathematics ability with the arithmetic mean of the student’s language, mathematics, and English abilities, and the student’s confidence level in mathematics ability with the arithmetic mean of the student’s confidence levels in language, mathematics, and English abilities.