Egg Study Redux: Correcting the Stats

In my last post, I criticized the recent study purporting to show that egg yolks increase atherosclerosis. After corresponding with the lead author, Dr. J. David Spence, I realize I made an error in the way I described the statistical analysis, partly due to my own hastiness and partly due to the lack of clarity in the original report.

In this post, I’d like to take a stroll through the authors’ arguments step by step, noting the strengths and limitations of each and pointing out where I made my own errors.

The authors make three statistical arguments:

  • Atherosclerosis increases linearly with age, but increases exponentially with egg-yolk years. The exponential increase seen with egg-yolk years resembles that seen with pack-years of smoking.
  • After adjusting for age, those who consumed more than three eggs per week had more atherosclerosis than those who consumed fewer than two eggs per week.
  • After adjusting for sex, total cholesterol, systolic blood pressure, body mass index, and smoking, “egg-yolk years” predicted atherosclerosis in a multiple linear regression model.

Eggs and the Exceptionally “Exponential” Curve

Let’s take the first argument. Here the authors show that atherosclerosis increases roughly in a straight line with age, in both males and females:


Here the authors show that with increasing “egg-yolk years,” atherosclerosis increases not in a straight line, but according to an “exponential” curve:


The authors then compare this to a similar “exponential” curve relating atherosclerosis to pack-years of smoking. Here is their conclusion:

The exponential nature of the increase in [total plaque area] by quintiles of egg consumption follows a similar pattern to that of cigarette smoking. The effect of the upper quintile of egg consumption was equivalent in terms of atheroma development to 2/3 of the effect of the upper quintile of smoking. In view of the almost unanimous agreement on the damage caused by smoking, we believe our study makes it imperative to reassess the role of egg yolks, and dietary cholesterol in general, as a risk factor for CHD.

The subtle argument seems to be that if the increase with age is linear, but the increase with “egg-yolk years” is exponential, then the increase with “egg-yolk years” must not be due to age alone. Thus, it reflects “egg consumption,” and they promptly begin using this phrase as if it were interchangeable with “egg-yolk years” once the discussion commences.

There’s just one problem: “egg-yolk years” is a composite measurement that includes age (more precisely, how many years the person had been consuming eggs) and the number of whole eggs eaten per week (which they call “yolks” because they wish to blame cholesterol).

Before we let age off the hook, let’s take a look at the relation between age and egg-yolk years. After all, a second grader might predict that age would increase linearly with age, and in fact would correlate perfectly with itself, but if age is the culprit lurking in the shadows of “egg-yolk years,” there’s no reason to assume we would find the proof in a perfectly linear pudding.

In other words, if age is the only thing that matters, and it increases along a curve with “egg-yolk years,” then the increase in plaque with increasing “egg-yolk years” should follow a similar curve rather than a straight line. In such a case, the curve would hardly suggest that something more than age were operating.
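To make this concrete, here is a minimal simulation sketch in Python. Every number in it is an assumption chosen for illustration (the age range, the egg intakes, the starting age of 20, the plaque formula); none of it comes from the study. Plaque is constructed to depend on age only, yet when people are sorted into quintiles of “egg-yolk years,” mean plaque tracks whatever shape mean age takes across those quintiles.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000

# All values below are ASSUMPTIONS for illustration, not study data.
age = rng.uniform(40, 80, size=n)
eggs_per_week = rng.uniform(0.5, 7.0, size=n)   # drawn independently of age
years_eating_eggs = age - 20                    # hypothetical: eggs eaten since age 20
egg_yolk_years = eggs_per_week * years_eating_eggs

# Plaque depends on age ONLY; eggs have zero effect by construction.
plaque = 2.0 * age + rng.normal(scale=10.0, size=n)

# Assign each person to a quintile (0..4) of egg-yolk years.
edges = np.quantile(egg_yolk_years, [0.2, 0.4, 0.6, 0.8])
quintile = np.searchsorted(edges, egg_yolk_years)

mean_age = np.array([age[quintile == q].mean() for q in range(5)])
mean_plaque = np.array([plaque[quintile == q].mean() for q in range(5)])

for q in range(5):
    print(f"quintile {q + 1}: mean age {mean_age[q]:.1f}, mean plaque {mean_plaque[q]:.1f}")
```

In this toy world, plaque rises across the quintiles even though eggs do nothing, simply because higher “egg-yolk years” quintiles contain older people.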

I don’t know very much about curve fitting, so I’ll avoid any detailed analysis, and I look forward to any criticisms that readers more experienced in this area would like to leave in the comments.

I used Microsoft Excel to judge the fit between the dots and various types of lines, and I used GraphPad Prism to make the graphs. The error bars look smaller in my graphs than in those of the original paper because mine represent standard error rather than standard deviation.
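For readers who would rather reproduce this kind of comparison in code than in a spreadsheet, here is a minimal Python sketch. It assumes that the “percent agreement” figures are R² values of the kind a spreadsheet trendline reports, and it scores the exponential fit on the log scale, which is also an assumption. The five quintile means are made up (flat at first, then rising) and are not the study’s data; `compare_fits` and `r_squared` are names invented for this example.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_fits(x, y):
    """R^2 for linear, quadratic, and exponential fits to the same points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        coeffs = np.polyfit(x, y, degree)
        results[name] = r_squared(y, np.polyval(coeffs, x))
    # Exponential y = a * exp(b * x): fit a straight line to ln(y)
    # and score it on the log scale (requires every y > 0).
    log_y = np.log(y)
    coeffs = np.polyfit(x, log_y, 1)
    results["exponential"] = r_squared(log_y, np.polyval(coeffs, x))
    return results

# HYPOTHETICAL quintile means, flat and then rising; NOT the study's data.
quintile = [1, 2, 3, 4, 5]
made_up_means = [10.0, 10.5, 10.2, 13.0, 15.0]

fits = compare_fits(quintile, made_up_means)
for name, r2 in fits.items():
    print(f"{name:12s} R^2 = {r2:.3f}")
```

One thing the sketch makes plain: a quadratic can never fit worse than a straight line on the same points, because the straight line is a special case of it. A higher R² for the more flexible curve is therefore weak evidence on its own.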

Here are the dots with no line:


As we can see below, the relationship fits only half-decently to a straight line, with about 84 percent agreement between the dots and the line:


It doesn’t fit any better to an exponential curve, with only 84 percent agreement:


This is unsurprising, since exponential curves fit well when the data values rise at an ever-increasing rate, growing by roughly the same proportion at each step. These values, by contrast, go nowhere until the last two quintiles, and the first of these two jumps is greater in size than the second.

It fits perfectly to a fourth-order polynomial, which can have up to three hills or valleys:


And it fits quite nicely to a simpler second-order polynomial, which is characterized by one major hill or valley, with about 95 percent agreement between the dots and the line:


Now let’s take a look at the egg component of “egg-yolk years.” Here are the dots ready to be connected:


I suppose you can get anything with five dots to fit a fourth-order polynomial perfectly:


But a simple straight line fits this graph just as nicely as the second-order polynomial fit the graph for age, with the dots in roughly 95 percent agreement to the line:


The straight line fits better than an exponential curve, where the agreement between the dots and the line is only about 90 percent:


A second-order polynomial fits it slightly better than a straight line, with 97 percent agreement, but even here there is only a slight curvature in the line:


I’ll let the statistics buffs have the last word in the comments, but I think it would be fair to say that neither graph is exponential, and that the egg graph is approximately linear.
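One detail from the fits above can be checked directly: the perfect fourth-order fits are guaranteed rather than informative. A fourth-order polynomial has five coefficients, so it can pass exactly through any five points with distinct x values. The short Python sketch below fits one to five random numbers (pure noise, not study data) and shows that the residuals vanish up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 6, dtype=float)   # five quintiles
y = rng.normal(size=5)             # arbitrary noise, NOT study data

# A fourth-order polynomial has five free coefficients, one per point.
coeffs = np.polyfit(x, y, 4)
residuals = y - np.polyval(coeffs, x)
max_abs_residual = float(np.max(np.abs(residuals)))

print(f"largest residual: {max_abs_residual:.2e}")
```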

Let’s look at which curve corresponds more closely to the increase in plaque. Here are the dots:


Just as with the curve for age, the dots fit only half-decently to a straight line, with about 85 percent agreement:


It fits an exponential curve only slightly better, with 89 percent agreement:


We’ll see below that the exponential curve isn’t the best fit. This is not terribly surprising. Just like the graph for age, the values seem almost flat at first and then surge in the last two quintiles, rather than steadily rising at an increasingly greater rate.

Just as with both of the previous graphs, the dots converge on a second-order polynomial line quite nicely, in this case with about 98 percent agreement:


So let’s juxtapose the plaque curve against the curves for eggs and age, and see which makes a better match. I’ll use the second-order polynomial curves for each of them.

Let’s look at eggs first:


The match isn’t terrible, but there’s quite a big gap between the curves.

Now we’ll look at age:


Almost a perfect match!

I’m not sure this exercise is much more productive than playing a game when it comes to uncovering the actual cause-and-effect dynamic hiding behind these relationships. I do think, however, that it undermines the first argument from the original report: the “exponential,” or at least curvilinear, nature of the relationship between plaque accumulation and “egg-yolk years” is hardly evidence that anything more is happening than people accumulating plaque as they get older. Indeed, age may be the boogeyman lurking in the shadows of “egg-yolk years.”

Of course, the only way to assess the independent contribution of eggs would be to look at eggs, rather than “egg-yolk years.” This brings us to their second argument.

Does Atherosclerosis Increase With Eggs Per Week?

Previously, I had shown this graph, which depicts the very small and statistically insignificant difference in plaque area between those consuming fewer than two eggs per week and those consuming more than three:


And here is where I made my first blunder. I stated that this analysis was never adjusted for age. I was wrong. I misunderstood the original report as meaning that the multiple regression analysis I’ll discuss in the next section was adjusted for age. After discussing this with Dr. Spence, I realize that this was the only analysis adjusted for age, using a technique called analysis of covariance, where eggs per week was entered as the independent variable and age as the covariate. This made the results statistically significant at P<0.0001.

Unfortunately, the report does not describe the methods in much detail. To use this method of adjustment, the data must satisfy two key assumptions:

  • Eggs per week must not be correlated with age.
  • The line relating plaque accumulation to age within the low-egg group must be parallel to the same line in the high-egg group; that is, the two slopes must be equal.

I was able to learn from Dr. Spence that although older people tended to consume fewer eggs, the relationship was not statistically significant. Thus, the first assumption was satisfied. I was unfortunately unable to learn whether the second assumption was satisfied.
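For what it is worth, one common way to check the second assumption is to add a group-by-age interaction term to the model and see whether its coefficient is distinguishable from zero. The Python sketch below does this on simulated data only. The helper `interaction_t_stat`, the sample size, the slopes, and the noise level are all assumptions made for illustration, and I have no way of knowing whether the authors ran a check like this.

```python
import numpy as np

def interaction_t_stat(age, high_egg, plaque):
    """t-statistic for the group-by-age interaction in
    plaque ~ 1 + high_egg + age + high_egg * age (age is centered)."""
    age_c = age - age.mean()
    X = np.column_stack([np.ones_like(age), high_egg, age_c, high_egg * age_c])
    beta, *_ = np.linalg.lstsq(X, plaque, rcond=None)
    resid = plaque - X @ beta
    dof = len(plaque) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    return beta[3] / np.sqrt(cov[3, 3])

# Simulated people; every number here is HYPOTHETICAL.
rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(40, 80, n)
high_egg = rng.integers(0, 2, n).astype(float)   # 1 = high-egg group
noise = rng.normal(scale=20.0, size=n)

# Case 1: parallel lines (same slope in both groups).
plaque_parallel = 3.0 * age + 10.0 * high_egg + noise
# Case 2: non-parallel lines (shallower slope in the high-egg group).
plaque_crossing = np.where(high_egg == 1, 70.0 + 1.0 * age, -60.0 + 3.0 * age) + noise

t_parallel = interaction_t_stat(age, high_egg, plaque_parallel)
t_crossing = interaction_t_stat(age, high_egg, plaque_crossing)
print(f"parallel slopes:     t = {t_parallel:.2f}")
print(f"non-parallel slopes: t = {t_crossing:.2f}")
```

A t-statistic near zero is consistent with parallel lines; a large one suggests the slopes differ and that a single age-adjusted difference would be misleading.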

The importance of satisfying the second assumption might become clearer if we take a look at a graphical depiction of the procedure. This is my own graph, adapted from Figure 11.7 in Statistical Methods in Medical Research:


The two black dots represent the unadjusted data points. Those who consumed more eggs were slightly younger and had slightly more plaque. Since the slopes of the two lines are parallel in this hypothetical example, we could determine their distance at any point for a given age and this would be the age-adjusted difference between the two groups. The simplest way to do this would be to meet in the middle of the two data points. Thus, the dotted line I drew represents the estimated difference between the two groups if the mean age in both groups were just over 61.

This type of analysis breaks down completely if the lines aren’t parallel. If we consider the possibility that consuming eggs can affect plaque accumulation, then we should consider the possibility that consuming eggs can affect the relationship between plaque and age. If it does, the lines would not be parallel. Imagine, for example, that eating more eggs decreases the rate at which plaque accumulates with age:


Where should I draw the dotted line? The length of the dotted line, representing the “adjusted” difference between the two groups, would be different depending on where I drew it. Over the age of 65, it would even turn the results on their head and show that plaque was slightly lower among people who ate more eggs. This type of adjustment would be meaningless, and this is the reason the lines must be parallel to perform it. Since the authors do not disclose whether the assumption of parallel lines was met, we have no idea if the adjustment for age was accurate.
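The arithmetic behind the two sketches can be written out in a few lines of Python. The intercepts and slopes below are hypothetical values picked so that the non-parallel lines cross at age 65; they are not taken from the study or from the textbook figure, and `adjusted_difference` is a name invented for this example.

```python
def adjusted_difference(age, low, high):
    """Vertical gap (high-egg minus low-egg) between two plaque-vs-age lines.

    Each line is given as an (intercept, slope) pair.
    """
    low_intercept, low_slope = low
    high_intercept, high_slope = high
    return (high_intercept + high_slope * age) - (low_intercept + low_slope * age)

# HYPOTHETICAL lines, chosen only to mimic the two sketches; not study data.
low_egg = (-60.0, 3.0)             # plaque = -60 + 3 * age
high_egg_parallel = (-50.0, 3.0)   # same slope, shifted up by 10
high_egg_flatter = (70.0, 1.0)     # shallower slope; crosses low_egg at age 65

for age in (55, 65, 75):
    gap_parallel = adjusted_difference(age, low_egg, high_egg_parallel)
    gap_flatter = adjusted_difference(age, low_egg, high_egg_flatter)
    print(f"age {age}: parallel gap {gap_parallel:+.1f}, non-parallel gap {gap_flatter:+.1f}")
```

With parallel lines the gap is 10 at every age, so it does not matter where the dotted line is drawn. With the flatter high-egg line, the gap runs from +20 at age 55 through zero at 65 to −20 at 75, so the “adjusted” difference depends entirely on the age at which it is measured.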

In my opinion, this comparison should be ignored unless the authors offer further details in the future, perhaps in response to letters to the editor in the journal.

That Good Ol’ Multiple Regression Analysis

I had previously stated that the multiple regression analysis was adjusted for age, but it was not. The analysis was adjusted for sex, total cholesterol, systolic blood pressure, body mass index, and pack-years of smoking. “Egg-yolk years” predicted atherosclerosis independently of these other factors. I learned from Dr. Spence that this model was not adjusted for age because age is incorporated into “egg-yolk years” and pack-years of smoking.

Here’s the problem: Why should we attribute the association with “egg-yolk years” to egg yolks rather than to age? As far as I can tell, there is no reason at all.
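The worry can be illustrated with a small simulation, again a sketch built entirely on made-up numbers rather than the study’s data. Plaque is generated from age alone. A regression that leaves age out hands the credit to “egg-yolk years,” and the coefficient collapses toward zero once age is added. The helper `ols`, the covariate `systolic_bp`, and all parameter values are assumptions for illustration.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated people; every number here is HYPOTHETICAL.
rng = np.random.default_rng(7)
n = 20000
age = rng.uniform(40, 80, n)
eggs_per_week = rng.uniform(0.5, 7.0, n)           # independent of age
egg_yolk_years = eggs_per_week * (age - 20)        # assumed: eggs eaten since age 20
systolic_bp = 120 + rng.normal(scale=15, size=n)   # stand-in covariate

# Plaque depends on age ONLY; eggs have zero effect by construction.
plaque = 2.0 * age + rng.normal(scale=10.0, size=n)

# Index 1 of each result is the coefficient on egg-yolk years.
coef_without_age = ols([egg_yolk_years, systolic_bp], plaque)[1]
coef_with_age = ols([egg_yolk_years, systolic_bp, age], plaque)[1]

print(f"egg-yolk years coefficient, age omitted:  {coef_without_age:.4f}")
print(f"egg-yolk years coefficient, age included: {coef_with_age:.4f}")
```

None of this shows that eggs are harmless. It only shows that a model of this shape cannot tell an egg effect apart from an age effect.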


After having corrected the errors I made in my previous post, I am even less convinced that this study shows anything other than that people develop more plaque as they get older.

Read more about the author, Chris Masterjohn, PhD, here.

© 2015 The Weston A. Price Foundation for Wise Traditions in Food, Farming, and the Healing Arts.