Changes in Facial Shape throughout Pregnancy—A Computational Exploratory Approach
Round 1
Reviewer 1 Report
This manuscript details the results of an observational study measuring markers of facial attractiveness in the same women across the course of their pregnancy, once per trimester. The main results demonstrate that sexual dimorphism is highest in trimester one, declining in the second and increasing slightly in the third. Facial asymmetry follows a similar pattern. The results add to a body of knowledge that attractiveness perceptions seem to alter with pregnancy, presumably in response to the evolutionary function of attractiveness perception itself - an already pregnant female is unlikely to be a good option as a mate. The authors also discuss some very plausible mechanisms by which these changes could occur, in the form of adiposity and other hormonal changes.
I found much to like here. The question is well motivated and builds on previous work using composites, which are not well suited to answer this question. The measures of facial attributes are done well, and the use of Bayes factor approaches is a useful complement. The authors acknowledge the small sample size, but I commend them for taking advantage of the available resources and testing their hypotheses here.
There are two main concerns I have about the manuscript which need to be addressed. The first is statistical, and the second is more conceptual about what the current data tells us.
- My main statistical concern is around the interpretation of these analyses with a small sample. There's nothing inherently wrong with small samples and again I think the authors have done a good job with the available resources, but it's worth thinking some more about the available data. Usually, the sample mean is a good approximation of a population mean, given enough data, but in smaller samples it is likely inaccurate. The current statistical tests rely on mean estimates to draw conclusions (which are technically maximum-likelihood estimates), and the use of Bayes factors does not really help as they use the same estimate. The authors have very kindly provided the data, so I dove in and took a look. We can use Bayesian estimation approaches to incorporate some uncertainty in the estimation of the means themselves and their differences. For the dimorphism data, I used a mixed model with random intercepts for each female, and two dummy-coded predictors (trimester 2 - trimester 1; trimester 3 - trimester 1). The prior for these predictors was a normal distribution with mean zero and a standard deviation of .10. The authors may disagree with this prior, but my reasoning was that on the scale of dimorphism units, this prior represents a sensible idea of what we might expect - consider that the standard deviation across the whole dimorphism range is about 0.47. This is probably not what the actual deviation is, but we shouldn't expect a trimester change to affect dimorphism by about .50 (about 1 SD) units. A normal distribution with an SD of .10 means that a change of about a quarter of an SD in dimorphism is likely, and a change of about half an SD is possible but not very likely. If pregnancy changed dimorphism by over a standard deviation then it would have an 'interocular trauma test' style effect (i.e. it's so obvious it hits you right between the eyes).
I've attached a plot here of the T1 and T2 dimorphism posterior distributions of the estimates of the means; and another highlighting their difference with trimester 1. It's clear that T3 is not different from T1; but the difference between T2 and T1 is probably greater than zero, on average about 0.14 units (about a third of an SD). This is only around 40% of the original difference observed in the study (~0.35 units), which was around three quarters of an SD and is really large. Is this a meaningful difference? This is the crux of my next point.
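The arithmetic behind this point can be sketched in a few lines of Python. This is not the mixed model described above; it is a simplified illustration that uses a single normal likelihood with a zero-centred normal prior. The values 0.47, 0.10, 0.35 and 0.14 are taken from the report, while `ASSUMED_SE` is a hypothetical standard error chosen only to show how a Normal(0, 0.10) prior can pull an observed difference of about 0.35 down towards 0.14.

```python
from statistics import NormalDist

# Values quoted in the report
SAMPLE_SD = 0.47       # approximate SD of the dimorphism scores
PRIOR_SD = 0.10        # SD of the Normal(0, 0.10) prior on each contrast
OBSERVED_DIFF = 0.35   # T2 vs T1 difference reported in the manuscript
POSTERIOR_DIFF = 0.14  # T2 vs T1 posterior mean difference

# HYPOTHETICAL value, not taken from the data: standard error of the
# observed difference, chosen only to illustrate the shrinkage mechanism.
ASSUMED_SE = 0.12

prior = NormalDist(mu=0.0, sigma=PRIOR_SD)


def prior_prob_exceeds(threshold):
    """Prior probability that the absolute effect exceeds `threshold`."""
    return 2.0 * (1.0 - prior.cdf(threshold))


def shrunken_mean(estimate, se, prior_sd):
    """Posterior mean for a normal likelihood and a zero-centred normal prior."""
    data_precision = 1.0 / se ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return data_precision * estimate / (data_precision + prior_precision)


# How much prior mass sits beyond half an SD and beyond a full SD?
p_half_sd = prior_prob_exceeds(0.5 * SAMPLE_SD)
p_full_sd = prior_prob_exceeds(SAMPLE_SD)

# Effect sizes expressed in SD units, and their ratio
posterior_in_sd = POSTERIOR_DIFF / SAMPLE_SD
observed_in_sd = OBSERVED_DIFF / SAMPLE_SD
posterior_over_observed = POSTERIOR_DIFF / OBSERVED_DIFF

# Shrinkage of the observed difference under the hypothetical standard error
illustrative_posterior = shrunken_mean(OBSERVED_DIFF, ASSUMED_SE, PRIOR_SD)

print(f"Prior P(|effect| > 0.5 SD): {p_half_sd:.4f}")
print(f"Prior P(|effect| > 1 SD):   {p_full_sd:.2e}")
print(f"Posterior difference in SD units: {posterior_in_sd:.2f}")
print(f"Observed difference in SD units:  {observed_in_sd:.2f}")
print(f"Posterior / observed difference:  {posterior_over_observed:.2f}")
print(f"Illustrative shrunken estimate:   {illustrative_posterior:.3f}")
```

Under these assumptions the prior places only a few percent of its mass on changes larger than half an SD, which matches the stated intent of treating such changes as possible but not very likely. The shrunken estimate depends entirely on the assumed standard error, so it should be read as a demonstration of the mechanism rather than a reproduction of the posterior.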
- What I think would be a really useful addition, though would sadly require some more data, is to see whether these differences in facial appearance track changes in attractiveness perceptions. The Danel study showed composites have different attractiveness ratings across trimesters, but this is confounded perfectly with individuals who go into the composites. Here the authors demonstrate small changes in appearance, but the final piece of the puzzle is missing here - do people actually rate these individual faces differently in terms of attractiveness? The sample may be small but a group of raters providing attractiveness ratings would give a really nice final piece of the puzzle here and a really stringent test of the hypothesis, especially as the role of symmetry, dimorphism, and averageness, as measured here, is actually really small when predicting attractiveness (see Holzleitner et al., 2020 - JEPHPP; Jones & Jaeger, 2019 - Symmetry).
Minor points:
1. Why not take the averageness metric from the larger sample of faces used to compute sexual dimorphism? Distance from an average of 12 or so suffers the same problems as a simple point estimate; it's hard to know whether the average is representative or not.
I enjoyed this paper and think the dataset is interesting, and commend the authors on their careful approach here. I'm not convinced the effects are a) as large as they report due to small sample sizes (and whether this is meaningful), and b) I think a really important next step is having the faces rated for attractiveness to see whether these changes track a psychological perception. I hope the authors don't find this review too critical as I like the work a lot and think it can answer an important evolutionary hypothesis; just not in its present form. I'd be happy to share code etc if it is helpful.
Comments for author File: Comments.pdf
Author Response
This manuscript details the results of an observational study measuring markers of facial attractiveness in the same women across the course of their pregnancy, once per trimester. The main results demonstrate that sexual dimorphism is highest in trimester one, declining in the second and increasing slightly in the third. Facial asymmetry follows a similar pattern. The results add to a body of knowledge that attractiveness perceptions seem to alter with pregnancy, presumably in response to the evolutionary function of attractiveness perception itself - an already pregnant female is unlikely to be a good option as a mate. The authors also discuss some very plausible mechanisms by which these changes could occur, in the form of adiposity and other hormonal changes.
I found much to like here. The question is well motivated and builds on previous work using composites, which are not well suited to answer this question. The measures of facial attributes are done well, and the use of Bayes factor approaches is a useful complement. The authors acknowledge the small sample size, but I commend them for taking advantage of the available resources and testing their hypotheses here.
There are two main concerns I have about the manuscript which need to be addressed. The first is statistical, and the second is more conceptual about what the current data tells us.
- My main statistical concern is around the interpretation of these analyses with a small sample. There's nothing inherently wrong with small samples and again I think the authors have done a good job with the available resources, but it's worth thinking some more about the available data. Usually, the sample mean is a good approximation of a population mean, given enough data, but in smaller samples it is likely inaccurate. The current statistical tests rely on mean estimates to draw conclusions (which are technically maximum-likelihood estimates), and the use of Bayes factors does not really help as they use the same estimate. The authors have very kindly provided the data, so I dove in and took a look. We can use Bayesian estimation approaches to incorporate some uncertainty in the estimation of the means themselves and their differences. For the dimorphism data, I used a mixed model with random intercepts for each female, and two dummy-coded predictors (trimester 2 - trimester 1; trimester 3 - trimester 1). The prior for these predictors was a normal distribution with mean zero and a standard deviation of .10. The authors may disagree with this prior, but my reasoning was that on the scale of dimorphism units, this prior represents a sensible idea of what we might expect - consider that the standard deviation across the whole dimorphism range is about 0.47. This is probably not what the actual deviation is, but we shouldn't expect a trimester change to affect dimorphism by about .50 (about 1 SD) units. A normal distribution with an SD of .10 means that a change of about a quarter of an SD in dimorphism is likely, and a change of about half an SD is possible but not very likely. If pregnancy changed dimorphism by over a standard deviation then it would have an 'interocular trauma test' style effect (i.e. it's so obvious it hits you right between the eyes).
I've attached a plot here of the T1 and T2 dimorphism posterior distributions of the estimates of the means; and another highlighting their difference with trimester 1. It's clear that T3 is not different from T1; but the difference between T2 and T1 is probably greater than zero, on average about 0.14 units (about a third of an SD). This is only around 40% of the original difference observed in the study (~0.35 units), which was around three quarters of an SD and is really large. Is this a meaningful difference? This is the crux of my next point.
We greatly appreciate the remarkable effort that the Reviewer made to examine our data. We are also very grateful for all the clear explanations of the analysis and the code that helped us understand the Bayesian mixed-effects modelling and follow the Reviewer’s interpretation. It appears that, in general, the main conclusions coming from the provided materials support our current results. However, as the Reviewer suggested, the magnitudes of the effects should be carefully evaluated and discussed with caution. We fully agree with the Reviewer. In fact, as the MS title already suggests, our study is only exploratory and could serve as a starting point for future studies involving considerably larger sample sizes. As we explained in the MS, we are also aware that our analyses did not allow for in-depth interpretations and generalizations. However, by using two different statistical approaches, we believe that we have demonstrated that pregnancy may affect morphological markers of facial attractiveness and that our data (to some extent) support this hypothesis. This effect is in line with our previous research and may explain the shifts in facial attractiveness during pregnancy observed in a different sample of pregnant women.
Nonetheless, being aware of the limitations of our study and referring to the main message from the review, in the current version of the MS we have explicitly highlighted the preliminary and exploratory nature of our results and advised that they be interpreted with caution. These reservations are especially important since we cannot directly test the actual ecological importance of the observed effects (as suggested in point 2). We have adjusted the phrasing throughout the manuscript accordingly.
- What I think would be a really useful addition, though would sadly require some more data, is to see whether these differences in facial appearance track changes in attractiveness perceptions. The Danel study showed composites have different attractiveness ratings across trimesters, but this is confounded perfectly with individuals who go into the composites. Here the authors demonstrate small changes in appearance, but the final piece of the puzzle is missing here - do people actually rate these individual faces differently in terms of attractiveness? The sample may be small but a group of raters providing attractiveness ratings would give a really nice final piece of the puzzle here and a really stringent test of the hypothesis, especially as the role of symmetry, dimorphism, and averageness, as measured here, is actually really small when predicting attractiveness (see Holzleitner et al., 2020 - JEPHPP; Jones & Jaeger, 2019 - Symmetry).
We entirely agree that the natural continuation of our study would be to test if changes in facial morphology are perceptible and affect the perception of facial attractiveness. Unfortunately, we do not have appropriate data to test this hypothesis directly (we did not obtain permission to publish or present photographs of individual faces). Our previous study was based on a different set of pictures used to construct composite portraits, and only the composites were assessed for attractiveness. However, despite different materials and study designs (currently: longitudinal, previously: cross-sectional), our results are in line with the outcomes from previous studies, and we refer to these studies throughout the MS. Moreover, in the current version of the MS, we have directly suggested (Further direction) that future studies should test if the changes in facial morphology during pregnancy affect the perception of women’s facial attractiveness. One entire paragraph has been dedicated to this issue.
Minor points:
- Why not take the averageness metric from the larger sample of faces used to compute sexual dimorphism? Distance from an average of 12 or so suffers the same problems as a simple point estimate; it's hard to know whether the average is representative or not.
In our study, we were interested in a relative measure of averageness. In our opinion, changing the reference sample will not change the interpretation of an individual's averageness level. More average faces will be in the same position in relation to the other faces from the study sample regardless of the reference point, i.e. the further the face is from an average based on the measured sample, the further it will be from any other average. If anything, we imagine that the distance from any other average would be greater than the distance from the average based on “our” 12 faces.
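The last sentence of this response corresponds to a standard property of the mean: the average squared Euclidean distance of a set of points to their own centroid is never larger than their average squared distance to any other reference point. The sketch below checks this on toy data. It assumes that averageness is measured as Euclidean distance from a mean shape, which is an assumption made here for illustration; the coordinates and the external reference point are hypothetical and are not taken from the study. The sketch does not address whether the rank order of individual faces is preserved when the reference changes.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def mean_squared_distance(points, reference):
    """Average squared Euclidean distance from each point to `reference`."""
    return sum(dist(p, reference) ** 2 for p in points) / len(points)


# HYPOTHETICAL toy "shape" coordinates standing in for the study faces
study_faces = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (3.0, 3.0)]

# Average based on the study sample itself
own_average = centroid(study_faces)

# HYPOTHETICAL average from some larger external reference sample
external_average = (0.5, 0.2)

msd_own = mean_squared_distance(study_faces, own_average)
msd_external = mean_squared_distance(study_faces, external_average)

print(f"Mean squared distance to own average:      {msd_own:.2f}")
print(f"Mean squared distance to external average: {msd_external:.2f}")
```

The gap between the two values equals the squared distance between the two averages, so how much the distances grow depends on how far the external average lies from the study-sample average.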
I enjoyed this paper and think the dataset is interesting, and commend the authors on their careful approach here. I'm not convinced the effects are a) as large as they report due to small sample sizes (and whether this is meaningful), and b) I think a really important next step is having the faces rated for attractiveness to see whether these changes track a psychological perception. I hope the authors don't find this review too critical as I like the work a lot and think it can answer an important evolutionary hypothesis; just not in its present form. I'd be happy to share code etc if it is helpful.
We are very grateful for this meticulous and constructive review. As we mentioned before, we agree with the points raised by the Reviewer, but unfortunately we have limited possibilities to actually address them (for example, by including judgements of real faces). We are also very grateful for the code provided by the Reviewer. Although Python is new to us, we did our best to conduct additional analyses to see for ourselves how the results are affected (the code and our rationale are in the attached PDF file). As the general direction of the results stays the same, and as we discussed in detail in the response to Point 1, we have changed the conclusions throughout the MS to underline the exploratory character of the entire study.
Author Response File: Author Response.pdf
Reviewer 2 Report
The research report shows how the facial sexual dimorphism of pregnant women decreases in the second trimester compared to the first trimester of pregnancy, while facial averageness and symmetry do not decrease.
The manuscript is a well-written, high quality research report, with well-chosen methodology, appropriate statistical analysis and moderate conclusions. However, I think it is important to make explicit in the Discussion section that the results confirm previous functional/evolutionary interpretations of female beauty. Since the pregnant woman cannot be re-fertilized during this period, sexual attractiveness (gender dimorphic features on the face) is not relevant information. This research illustrates that female attractiveness as a cue works according to evolutionary logic (I don't think it is necessary to write at length, 1-2 sentences are sufficient, and you can choose from existing references for reference. The point is to make this idea explicit and visible in the text.)
Minor errors:
Page 3 Line 114 | Perhaps a reference to this term “Frankfurt plane” would be good.
Page 5 Line 223 | The word “concert” is rather “concern”.
Author Response
The manuscript is a well-written, high quality research report, with well-chosen methodology, appropriate statistical analysis and moderate conclusions. However, I think it is important to make explicit in the Discussion section that the results confirm previous functional/evolutionary interpretations of female beauty. Since the pregnant woman cannot be re-fertilized during this period, sexual attractiveness (gender dimorphic features on the face) is not relevant information. This research illustrates that female attractiveness as a cue works according to evolutionary logic (I don't think it is necessary to write at length, 1-2 sentences are sufficient, and you can choose from existing references for reference. The point is to make this idea explicit and visible in the text.)
Response: One paragraph in the discussion has now been changed to accommodate this idea: “Results obtained with this computation approach can be perceived as a support for the previous evolutionary interpretation of female beauty. Since the current conception probability for a pregnant woman is zero, sexual attractiveness (including facial dimorphic features), being a costly feature to maintain, is not crucial to be preserved in that moment. Ultimately, the results illustrate that female attractiveness as a cue to current fertility could follow the evolutionary logic. Proximately, the decrease of facial sexual dimorphism could stem from fluctuations in fat percentage throughout pregnancy and/or in levels of steroid hormones. It cannot be ruled out that the observed final change in shape is a by-product of the physiological changes related to pregnancy.”
Minor errors:
Page 3 Line 114 | Perhaps a reference to this term “Frankfurt plane” would be good.
Response: We added an extended definition: “Frankfurt plane – position of the head most parallel to the surface of the earth, based on a plane passing through the inferior margin of the orbit and the upper margin of external auditory meatus.”
Page 5 Line 223 | The word “concert” is rather “concern”.
Response: changed to “concordant”.
Round 2
Reviewer 1 Report
I find the manuscript much improved; thank you for responding to my review. I would just suggest the authors add some more to their discussion, noting that their effect sizes are almost certainly overstated given their small sample size.
Author Response
Thank you very much.
We have now added a sentence to the Conclusions in the main text: "Due to the small sample size however, it is possible that the effect sizes may be overstated."