In my last post, I criticized the recent study purporting to show that egg yolks increase atherosclerosis. After corresponding with the lead author, Dr. J. David Spence, I realize I made an error in the way I described the statistical analysis, partly due to my own hastiness and partly due to the lack of clarity in the original report.

In this post, I’d like to take a stroll through the authors’ arguments step by step, pointing out the strengths and limitations of each argument, and pointing out where I made my own errors.

The authors make three statistical arguments:

- Atherosclerosis increases linearly with age, but increases exponentially with egg-yolk years. The exponential increase seen with egg-yolk years resembles that seen with pack-years of smoking.
- After adjusting for age, those who consumed more than three eggs per week had more atherosclerosis than those who consumed less than two eggs per week.
- After adjusting for sex, total cholesterol, systolic blood pressure, body mass index, and smoking, “egg-yolk years” predicted atherosclerosis in a multiple linear regression model.

## Eggs and the Exceptionally “Exponential” Curve

Let’s take the first argument. Here the authors show that atherosclerosis increases roughly in a straight line with age, in both males and females:

Here the authors show that with increasing “egg-yolk years,” atherosclerosis increases not in a straight line, but according to an “exponential” curve:

The authors then compare this to a similar “exponential” curve relating atherosclerosis to pack-years of smoking. Here is their conclusion:

The exponential nature of the increase in [total plaque area] by quintiles of egg consumption follows a similar pattern to that of cigarette smoking. The effect of the upper quintile of egg consumption was equivalent in terms of atheroma development to 2/3 of the effect of the upper quintile of smoking. In view of the almost unanimous agreement on the damage caused by smoking, we believe our study makes it imperative to reassess the role of egg yolks, and dietary cholesterol in general, as a risk factor for CHD.

The subtle argument seems to be that if the increase with age is linear, but the increase with “egg-yolk years” is exponential, then the increase with “egg-yolk years” must not be due to age alone. Thus, it reflects “egg consumption,” and they promptly begin using this phrase as if it is interchangeable with “egg-yolk years” once the discussion commences.

There’s just one problem: “egg-yolk years” is a composite measurement that includes age (more precisely, how many years the person had been consuming eggs) and the number of whole eggs eaten per week (which they call “yolks” because they wish to blame cholesterol).

Before we let *age* off the hook, let’s take a look at the relation between age and egg-yolk years. After all, a second grader might predict that *age* would increase linearly with *age*, and in fact would correlate perfectly with itself, but if age is the culprit lurking behind the shadows of “egg-yolk years,” there’s no reason to assume we would find the proof in a perfectly linear pudding.

In other words, if age is the only thing that matters, and it increases along a curve with “egg-yolk years,” then the increase in plaque with increasing “egg-yolk years” should follow a similar curve rather than a straight line. In such a case, the curve would hardly suggest that something more than age were operating.

I don’t know very much about curve fitting, so I’ll avoid any detailed analysis and I look forward to any criticisms that readers more experienced in this area would like to leave in the comments.

I used Microsoft Excel to judge the fit between the dots and various types of lines, and I used GraphPad Prism to make the graphs. The error bars look smaller in my graphs than in those of the original paper because they represent standard error in mine rather than standard deviation.

Here are the dots with no line:

As we can see below, the relationship fits only half-decently to a straight line, with about 84 percent agreement between the dots and the line:

It doesn’t fit any better to an exponential curve, with only 84 percent agreement:

This is unsurprising, since exponential curves fit well when the data values are rising or falling at an increasingly greater rate. On the contrary, these values are going nowhere until the last two quintiles, and the first of these two jumps is greater in size than the second.
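A comparison like this can be reproduced in a few lines. The sketch below is only an illustration: the five quintile values are made up (flat at first, then two jumps, the first larger than the second) and are not the study’s numbers. The “agreement” figure is taken to be R², and the exponential model is fitted the usual way, as a straight line through the logarithm of the values. The function name `linear_fit_r2` is hypothetical.

```python
import math

def linear_fit_r2(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical quintile means (NOT the study's data): nearly flat for three
# quintiles, then two jumps, the first larger than the second.
quintile = [1, 2, 3, 4, 5]
value = [55.0, 55.5, 56.0, 63.0, 67.0]

# Straight line: value = intercept + slope * quintile
_, _, r2_linear = linear_fit_r2(quintile, value)

# Exponential curve value = a * exp(b * quintile), fitted as a straight line
# through log(value); the R^2 is therefore measured on the log scale.
_, _, r2_exponential = linear_fit_r2(quintile, [math.log(v) for v in value])

print(f"linear R^2:      {r2_linear:.3f}")
print(f"exponential R^2: {r2_exponential:.3f}")
```

With values shaped like these, both fits land at roughly 0.85, so the exponential curve has no real advantage over the straight line.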

It fits perfectly to a fourth-order polynomial, which can have up to three hills or valleys:

And it fits quite nicely to a simpler second-order polynomial, which is characterized by one major hill or valley, with about 95 percent agreement between the dots and the line:

Now let’s take a look at the *egg* component of “egg-yolk years.” Here are the dots ready to be connected:

I suppose you can get anything with five dots to fit a fourth-order polynomial perfectly:
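That supposition is easy to check: a polynomial of degree four has five coefficients, so exactly one such curve passes through any five points with distinct x values. The sketch below shows this with Lagrange interpolation on arbitrary, made-up values; the function name `lagrange_value` is hypothetical.

```python
def lagrange_value(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Five arbitrary "dots" (made up, with no trend at all).
xs = [1, 2, 3, 4, 5]
ys = [3.0, 1.0, 4.0, 1.0, 5.0]

# The quartic reproduces every dot, so the residuals vanish and R^2 is 1.
residuals = [lagrange_value(xs, ys, x) - y for x, y in zip(xs, ys)]
max_abs_residual = max(abs(r) for r in residuals)
print(max_abs_residual)
```

A perfect fit here says nothing about the data, only that the curve has as many free parameters as there are dots.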

But a simple straight line fits this graph just as nicely as the second-order polynomial fit the graph for age, with the dots in roughly 95 percent agreement to the line:

The straight line fits better than an exponential curve, where the agreement between the dots and the line is only about 90 percent:

A second-order polynomial fits it slightly better than a straight line, with 97 percent agreement, but even here there is only a slight curvature in the line:

I’ll let the statistics buffs have the last word in the comments, but I think it would be fair to say that neither graph is exponential, and that the egg graph is approximately linear.

Let’s look at which curve corresponds more closely to the increase in plaque. Here are the dots:

Just as with the curve for age, the dots fit only half-decently to a straight line, with about 85 percent agreement:

It fits an exponential curve only slightly better, with 89 percent agreement:

We’ll see below that the exponential curve isn’t the best fit. This is not terribly surprising. Just like the graph for age, the values seem almost flat at first and then surge in the last two quintiles, rather than steadily rising at an increasingly greater rate.

Just as with both of the previous graphs, the dots converge on a second-order polynomial line quite nicely, in this case with about 98 percent agreement:

So let’s juxtapose the *plaque* curve against the curves for *eggs* and *age*, and see which makes a better match. I’ll use the second-order polynomial curves for each of them.

Let’s look at *eggs* first:

The match isn’t terrible, but there’s quite a big gap between the curves.

Now we’ll look at *age*:

Almost a perfect match!

I’m not sure if this exercise is much more productive than playing a game when it comes to uncovering the actual cause-and-effect dynamic hiding behind these relationships. I do think, however, that it undermines the first argument from the original report: the “exponential,” or at least curvilinear, nature of the relationship between plaque accumulation and “egg-yolk years” is hardly evidence that something more is happening than that folks are accumulating plaque as they get older. Indeed, *age* may be the boogeyman lurking behind the shadows of “egg-yolk years.”

Of course, the only way to assess the independent contribution of eggs would be to look at eggs, rather than “egg-yolk years.” This brings us to their second argument.

## Does Atherosclerosis Increase With Eggs Per Week?

Previously, I had shown this graph, which depicts the very small and statistically insignificant difference in plaque area between those consuming less than two eggs per week and those consuming more than three:

And here is where I made my first blunder. I stated that this analysis was never adjusted for age. I was wrong. I misunderstood the original report as meaning that the multiple regression analysis I’ll discuss in the next section was adjusted for age. After discussing this with Dr. Spence, I realize that *this* was the only analysis adjusted for age, using a technique called *analysis of covariance*, where *eggs per week* was entered as the independent variable and *age* as the covariate. This made the results statistically significant at *P*<0.0001.

Unfortunately, the report does not describe the methods in much detail. To use this method of adjustment, the data must satisfy two key assumptions:

- Eggs per week must not be correlated with age.
- The line relating plaque accumulation to age within the low-egg group must be parallel to the same line in the high-egg group; that is, the two slopes must be equal.

I was able to learn from Dr. Spence that although older people tended to consume fewer eggs, the relationship was not statistically significant. Thus, the first assumption was satisfied. I was unfortunately unable to learn whether the second assumption was satisfied.

The importance of satisfying the second assumption might become clearer if we take a look at a graphical depiction of the procedure. This is my own graph, adapted from Figure 11.7 in *Statistical Methods in Medical Research*:

The two black dots represent the unadjusted data points. Those who consumed more eggs were slightly younger and had slightly more plaque. Since the slopes of the two lines are parallel in this hypothetical example, we could determine their distance at any point *for a given age* and this would be the age-adjusted difference between the two groups. The simplest way to do this would be to meet in the middle of the two data points. Thus, the dotted line I drew represents the estimated difference between the two groups if the mean age in both groups were just over 61.

This type of analysis breaks down completely if the lines aren’t parallel. If we consider the possibility that consuming eggs can affect plaque accumulation, then we should consider the possibility that consuming eggs can affect the relationship between plaque and age. If it does, the lines would not be parallel. Imagine, for example, that eating more eggs decreases the rate at which plaque accumulates with age:

Where should I draw the dotted line? The length of the dotted line, representing the “adjusted” difference between the two groups, would be different depending on where I drew it. Over the age of 65, it would even turn the results on their head and show that plaque was slightly *lower* among people who ate more eggs. This type of adjustment would be meaningless, and this is the reason the lines must be parallel to perform it. Since the authors do not disclose whether the assumption of parallel lines was met, we have no idea if the adjustment for age was accurate.
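The dependence on where the dotted line is drawn can be made concrete with a small numeric sketch. Every number below is hypothetical and chosen only for illustration: each group is described by an (intercept, slope) pair for a straight line relating plaque to age, and the non-parallel lines are set to cross at age 65.

```python
def plaque(age, intercept, slope):
    """Predicted plaque on a straight line relating plaque to age."""
    return intercept + slope * age

def adjusted_difference(age, low_egg, high_egg):
    """High-egg minus low-egg predicted plaque at one shared age."""
    return plaque(age, *high_egg) - plaque(age, *low_egg)

# Hypothetical (intercept, slope) pairs, not taken from the study.
low_egg = (-60.0, 2.0)
parallel_high_egg = (-55.0, 2.0)      # same slope as the low-egg group
nonparallel_high_egg = (-27.5, 1.5)   # plaque rises more slowly with age

ages = [55, 61, 65, 75]
parallel_gaps = [adjusted_difference(a, low_egg, parallel_high_egg) for a in ages]
nonparallel_gaps = [adjusted_difference(a, low_egg, nonparallel_high_egg) for a in ages]

print(parallel_gaps)     # [5.0, 5.0, 5.0, 5.0]
print(nonparallel_gaps)  # [5.0, 2.0, 0.0, -5.0]
```

With parallel lines the “adjusted” difference is the same at every age, so it does not matter where the comparison is made. With non-parallel lines it shrinks, vanishes, and then changes sign, so no single number summarizes it.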

In my opinion, this comparison should be ignored unless the authors offer further details in the future, perhaps in response to letters to the editor in the journal.

## That Good Ol’ Multiple Regression Analysis

I had previously stated that the multiple regression analysis was adjusted for age, but it was not. The analysis was adjusted for sex, total cholesterol, systolic blood pressure, body mass index, and pack-years of smoking. “Egg-yolk years” predicted atherosclerosis independently of these other factors. I learned from Dr. Spence that this model was not adjusted for age because age is incorporated into “egg-yolk years” and pack-years of smoking.

Here’s the problem: Why should we attribute the association with “egg-yolk years” to egg yolks rather than to age? As far as I can tell, there is no reason at all.

## Conclusions

After having corrected the errors I made in my previous post, I am even less convinced that this study shows anything other than that people develop more plaque as they get older.

Read more about the author, Chris Masterjohn, PhD, here.

I’m not convinced either. Most importantly to me, the study did not control for toast eaten with all those eggs.

or the jam on that toast. Way too many possibilities.

The biggest drawback that hinders the authors’ conclusion is the data itself. Playing statistical games is exactly that. How can anyone believe that a person will recall their consumption of a specific food over the time-frame indicated and expect any accuracy? Truly amazing that researchers get to publish this crap – I guess their “peers” are tainted as well.

Chris,

I just read the numbers off your first graph and played around with them myself. I believe the second-order polynomial is not significantly better than the first-order polynomial (straight line). In my fit, the x^2 term was not significant. The higher-order polynomial is of course overfitted. I’m not quite sure why you included it.

Curiously, the linear fit and the exponential fit seem to perform nearly equally well. I didn’t do a formal hypothesis test of one model against the other, but the R^2 is basically the same, so we have little reason to conclude one model fits better than the other. A quick test to see if data follow an exponential relationship is to log-transform the y axis and check if the relationship becomes more linear. In your age vs. quintile-of-egg-yolk-years plot, the plot looks basically unchanged when log-transforming age, hence the exponential model is not much better than the linear model.

For the analysis of covariance, it would be easy to test for an interaction term if the raw data were available. Do you have the raw data?

Finally, since age and egg-yolk years are highly correlated, it is statistically correct to not include both variables in the multiple regression. However, that also means that we cannot statistically disentangle their separate contributions.

In general, though, I agree. It looks like age is the underlying causal variable.

Claus Wilke

University of Texas at Austin

Hi Claus,

Thank you so much for your comments! I didn’t do any formal statistical tests. As I stated, I don’t have any experience with curve fitting and I didn’t want to pretend to be capable of a formal analysis with the potential to make mistakes. And like you indicated, I’m not sure how productive it is to play around with quintiles like this anyway, so I’m not sure it justifies the effort of a formal analysis. I was simply evaluating what the authors did, which was to simply state that plaque versus yolk-years fit an exponential curve better than a linear curve, and to henceforth refer to the “exponential nature” of the effect of “egg consumption.”

The main purpose of drawing attention to the second-order polynomials was to draw attention to the facts that a) just because something doesn’t visibly increase until the later quintiles doesn’t make it “exponential,” and b) it was a good and simple fit for juxtaposing the various curves against one another to see which pair looked most similar.

I agree the fourth-order polynomial is over-fitted, and I mostly included it for fun. Like I said, you can get pretty much anything to perfectly fit something allowing so few data points to follow that many curves.

For the analysis of covariance, I agree. I would be surprised if they give you the raw data, but if they do, I’d love to hear what you find, so please write back. In my experience, Dr. Spence was extraordinarily forthcoming initially and very willing to re-analyze the data. However, when I asked him about the interaction term for the ANCOVA, he asked me how to do the analysis, so I forwarded him a tutorial on doing it in SPSS (the program they used), and after that he stopped engaging with me and never told me what the result was.

I agree with your assessment of the multiple regression. I didn’t mean to blame them for not including age, but was simply clarifying my previous misunderstanding.

Thanks so much for contributing!

Sincerely,

Chris

Hi Claus

I agree with your conclusion about the multiple regression. Age has been distributed into the yolk-years and pack-years terms, as I averred in the discussion to the last post. Thus, this model *implicitly* includes age but cannot *explicitly* control for it.

I think all the bases have been covered about these data. There’s no good reason to buy the argument that the response is exponential, let alone the inferences that the authors draw from that. As Chris showed, the response of yolk-years to age is identical.

Chris

edit: identical in seeming appearance (and therefore just as sound of an inference)

It’s frustrating to me to read a report/conclusion regarding a “study” done on a food such as eggs (this article), when that food was not studied in its raw vs cooked/heated state.

Cholesterol in egg yolk has a different effect when it’s heated than when raw, as in oxidized from heat.

This includes meats, fats as well.

I have contacted the authors and requested the raw data. Playing around with quintiles is useless, we cannot really draw any conclusions from that.

I tend to be rather skeptical about ordinary multivariate regression and step-wise selection on high-dimensional data sets. All kinds of things can go wrong, in particular when predictors are collinear (as is the case here). It would be interesting to try some more robust methods, e.g. PCA or LASSO.

I’ll be curious to follow your results if the authors agree to release the data to you!

See my longer post below. I got the data but can’t comment on it.

Hi Claus,

I agree playing around with the quintiles is useless, but this is what the authors did in the paper. They never reported a formal statistical analysis of linearity, and their comments referred to the graph of increase by quintile, not to the relation between the individual data.

Other multivariate statistical methods are way out of my realm of expertise, far more than even lowly curve fitting. If you get your hands on the raw data, I’d certainly be interested in what you get with other methods!

Chris

I wonder if the participants were prompted to include pancakes, cakes, batter, etc. as eggs.

If so, much confounding material is included – trans fats, sugar, flour.

If not, the tally of egg yolks may be seriously inaccurate.

Yes, not to mention it may be difficult to recall things when you’ve just had a stroke or transient ischemic attack. The potential for confounding is endless here, but the association doesn’t seem to even be real anyway.

Chris

Dear Chris,

Why do studies seldom compare a raw food vs a cooked/heated food before they conclude how that food affects one’s health? In this case, consuming cooked eggs would affect health differently than consuming the eggs raw. Heating any cholesterol-containing food can oxidize its cholesterol, producing oxy-cholesterol, which can be problematic to health.

Any comments?

Thanks

Garry

Hi Garry,

I agree with you that these things should be taken into account. I didn’t address that here because it seems the association is very unconvincing in itself. I would love to see high-quality studies that look at different ways of cooking eggs, and at raw eggs, with and without yolks. I tend to feel better when I eat raw egg yolks instead of cooked eggs.

Chris

Hi Chris,

I suppose a study comparing the effects of raw egg yolk vs cooked yolk would be rather extensive. Oxy-cholesterol, enzyme denaturing, nutrient reduction and such would need evaluating in order to assess the effects on health, eh?

Thanks Chris,

Garry

Also Chris, are you saying the difference between a heated/cooked egg yolk and raw yolk is not covered because the difference is not measurable … that cooking an egg doesn’t change the integrity of its raw benefits?

Please clarify.

Thanks again,

Garry

Dr. Spence has graciously provided me with his data set, under the condition of confidentiality (which I respect). I have investigated the data set and have drawn my own conclusions, which I will not resent here. Instead, I will point out a few issues that I have with the statistical analysis as presented in the paper. First, I would like to emphasize two things, though: (a) These are my personal opinions. Other people might disagree. (b) Read through the references I provide and think for yourself.

1. Step-wise variable elimination is problematic, in particular when not subjected to bootstrapping and/or cross-validation. It is extremely easy to arrive at meaningless or misleading models. This issue is very well known, and entire books have been written on it (e.g., Harrell, Regression Modeling Strategies).

2. Multi-collinearity is a big problem when trying to identify relevant predictors. Traditional regression models fail under multi-collinearity.

3. Linear regression models assume that predictors are measured without error. Clearly, in an observational study, most predictor variables have just as much error as the response variable. Hence, linear regression models use an incorrect error structure, and this can result in misleading model predictions. (What I mean by incorrect error structure is explained e.g. here: http://www.r-bloggers.com/principal-component-analysis-pca-vs-ordinary-least-squares-ols-a-visual-explanation/ )

Points 2 and 3 can be addressed at the same time by carrying out a principal component analysis (PCA). PCA treats all predictors equally and handles multi-collinearity without problem. However, traditional PCA does not do well with variable selection. It often produces components that are confusing mixtures of many variables, and whose interpretation is unclear. One way around this problem is sparse PCA (SPCA), using lasso/elastic net regularization (http://users.stat.umn.edu/~zouxx019/software.html). Sparse PCA provides principal components in which most variables have zero loadings. The result is components that tend to have clear interpretation.

Incidentally, lasso regularization is also currently considered to be one of the best available solutions to problem 1, variable selection. An alternative way to analyze the data would be to carry out a lasso-regularized regression. The advantage (over PCA) would be a clear distinction between predictors and response, at the cost of an incorrect error structure. I personally prefer PCA, but I would not object to a properly cross-validated regression model. It would probably be best to do both. If they don’t substantially agree, something is amiss.

4. It is not statistically sound to group data into quantiles and then try to make inferences based on the averages within these quantiles. Considering that the paper argues for the strong relationship between egg years and plaque area, it is notable that a scatter plot of one of these variables against the other is absent, as is a simple correlation coefficient. The dangers of grouping data into quantiles are explained in detail here:

http://wilke.openwetware.org/Statistics_errors.html#Aggregation_by_quantiles_erroneously_amplifies_trend

In a nutshell, this practice can create seemingly strong relationships out of rather weak ones.
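The effect of aggregating by quantiles can be illustrated with simulated data. The sketch below is not based on the study’s data set: it generates a deliberately weak relationship (a made-up slope of 0.2 with unit-variance noise), then compares the correlation across individuals with the correlation across the five quintile averages. The helper `pearson` and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable
n = 10000

# Weak relationship by construction: most of the variation in y is noise.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.2 * xi + random.gauss(0, 1) for xi in x]
r_individual = pearson(x, y)

# Sort by x, cut into five equal-sized groups, and average within each group.
pairs = sorted(zip(x, y))
size = n // 5
bin_mean_x, bin_mean_y = [], []
for k in range(5):
    chunk = pairs[k * size:(k + 1) * size]
    bin_mean_x.append(sum(p[0] for p in chunk) / size)
    bin_mean_y.append(sum(p[1] for p in chunk) / size)
r_quintile = pearson(bin_mean_x, bin_mean_y)

print(f"correlation across individuals:    {r_individual:.2f}")
print(f"correlation across quintile means: {r_quintile:.2f}")
```

Averaging within quintiles removes most of the individual scatter, so the five averages line up far more tightly than the underlying data do.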

I have made Dr. Spence aware of these concerns of mine.

Hi Claus,

Thank you so much for your comments! Very helpful!

Chris

Sorry, a typo: “which I will not *present* here”

Thank you Claus. I do want to point out that it is convention in some circles to include all main effects in the model when any of the higher order effects containing it are also included. Then again, the folks who do this are probably well aware of the pitfalls of regression in general. Thank you for addressing the problem of using regression with the quintiles, technically, a sub-optimal method, or at least one fraught with danger. It is just easy to find things that ain’t there.

That is correct. When you use a higher-order term, you have to include all relevant lower-order terms as well, otherwise the significance of your higher-order term doesn’t remain invariant under rescaling of the data. Did I say anything anywhere that would have suggested the contrary?

Nope, but the suggestion was somewhere around here.

Here is the problem, Chris. If they had controlled for age, given the very high correlation between age and egg-yolk years (eyys), you’d have one of two things happen (or even both): (a) age and eyys would be collinear, which could distort the results to the point of making them nonsensical; and/or (b) age would capture all of the variance (R-squared) in the variable measuring plaque, making the relationship between eyys and plaque disappear. Age and eyys seem to be too highly correlated (in a linear way) to be treated as separate variables (they are redundant, or collinear), regardless of the nonlinear relationships you discussed.

Hi Ned,

Thanks for your comments! I didn’t mean to suggest that they *should* have incorporated age into the multiple regression. I simply thought they *did* (which struck me as non-sensical at the time, and you pointed out the problem with collinearity in my first post), and then found out I was wrong so corrected myself in the second post.

I imagine you might agree that creating the egg-yolk years variable is deeply problematic in the first place? Most people’s answer to “how long have you been consuming eggs?” will be roughly equivalent to their age, perhaps minus a weaning period. It’s impossible to separate the effect of eggs from that of age. It seems it would have been better to include eggs and age separately in the multiple regression.

Chris

Using eyys in a MR analysis is problematic for the reason you pointed out in your previous post – eyys and age seem to measure the same thing, namely age. So, other measures should be used in a MR; eggs/w is not a very good one, but it is certainly better than eyys. And, eggs/w seems to be associated with decreased LDL cholesterol (!) and with what seems to be a protective moderating effect:

http://bit.ly/RbDcJH

Sorry, in my comment above I meant to say, at the end, that “eggs/w is not associated with LDL cholesterol” instead of “eggs/w seems to be associated with decreased LDL cholesterol”. Plaque, on the other hand, seems to be negatively associated with LDL cholesterol.

It seems obvious that age and egg years should be highly correlated, but if you think about it a little harder you realize this need not be the case.

Let’s make a thought experiment: According to the numbers in Table 2, mean eggs per week over the whole data set must be somewhere around 1.5 or so. We can assume that some people eat no eggs, and other people may eat 10 eggs or more per week. However, nobody eats a negative number of eggs. So let’s model egg consumption per week as an exponentially distributed variable with mean 1.5.

Also according to Table 2, mean age is somewhere around 60, with a standard deviation of around 15, give or take. Let’s assume age is normally distributed.

Based on these two assumptions, let’s generate random data, calculate eggyears, and see how eggyears correlate with age:

    > eggspw = rexp(1000,1/1.5)
    > age = rnorm(1000, mean=60, sd=15)
    > cor.test(age, eggspw*age)

            Pearson's product-moment correlation

    data:  age and eggspw * age
    t = 7.3928, df = 998, p-value = 3.038e-13
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.1682440 0.2858162
    sample estimates:
          cor
    0.2278605

These “highly correlated” variables have a correlation coefficient of only 0.23. Only about 5% of the variation in age is explained by the variation in eggyears. Hence, we would be safe to include both eggyears and age in the regression model and could evaluate their contributions independently.

Note: This is an entirely hypothetical example, put together entirely on the basis of information in Table 2. I am not claiming that this example shares any resemblance to the actual data set.

The reason why eggyears and age correlate so weakly in my example is because the variance in eggs per week is so high relative to its mean, while the variance in age is relatively small compared to its mean. As a consequence, large eggyears scores almost exclusively indicate high egg consumption, not high age.

Hi Claus,

Good point, and thank you. Naturally, I don’t know whether age and egg yolk-years were correlated in the original data set. I’m still skeptical, however, whether transformation of eggs/week to egg-yolk years does anything but confound the latter measurement with an age component. I’m skeptical that age wouldn’t correlate almost perfectly with length of time consuming eggs.

Chris

> I’m skeptical that age wouldn’t correlate almost perfectly with length of time consuming eggs.

Yes, of course they would. What I’m saying is that we should expect the product of eggs per week and age (i.e., eggyears) to correlate much more strongly with eggs per week than with age. As a consequence, eggyears should be only weakly confounded by age, and basically contain the same information as eggs per week.

In my made-up example above, eggyears correlate with eggs per week with r=0.93.

On that note, it should strike you as odd that eggs per week showed no effect in the paper, but eggyears did. The only reasonable explanation is that the regression model picked up on the small amount of information about age contained in eggyears, and that the effect would have disappeared if age had been in the model.
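For readers who want to rerun this thought experiment without R, here is a rough Python equivalent using only the standard library. It makes the same made-up assumptions as the example above (eggs per week exponentially distributed with mean 1.5, age normally distributed with mean 60 and standard deviation 15) and, like that example, claims no resemblance to the actual data set. The helper `pearson`, the seed, and the sample size are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 20000       # larger than the 1000 draws above, to reduce sampling noise

# random.expovariate takes the rate, i.e. 1 / mean.
eggs_per_week = [random.expovariate(1 / 1.5) for _ in range(n)]
age = [random.gauss(60, 15) for _ in range(n)]

# Same construction as eggspw * age in the R session.
egg_years = [e * a for e, a in zip(eggs_per_week, age)]

r_age = pearson(age, egg_years)
r_eggs = pearson(eggs_per_week, egg_years)

print(f"eggyears vs age:           r = {r_age:.2f}")
print(f"eggyears vs eggs per week: r = {r_eggs:.2f}")
```

Under these assumptions the two correlations should come out close to the 0.23 and 0.93 quoted above: the product tracks eggs per week closely and age only weakly.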

Interesting, Claus, thanks!

Chris

I just read your post on the hypothetical set of data where you break down the average of how many eggs were eaten from 0 up to 10 per week. The results from the study were averaged out to 1.5. Does it not make sense that when this study was done that they would have processed the data in many ways and then published it according to what result they wanted to show such as the 1.5 egg yolks a week average? One has to wonder why they wouldn’t have shown the stats for the 10 egg a week people and also the zero egg a week people. That would be much more revealing. We are all experiments in progress and don’t know the results until much later down the road. I am eating a paleo diet now and am feeling much better. I am currently eating 3 fried eggs a day at breakfast.

Another factor that I don’t think has been mentioned is what the hens that laid the eggs were eating, such as corn or soy or who knows what. Very few eggs consumed are from pasture-raised chickens who hopefully are eating as nature intended. Unfortunately, the food most people buy is of terrible quality which might throw off a lot of these food-based tests.

Lisa, I agree with your comment more than any of the statistical banter. Studies rarely give consideration to the quality of the food source. Eggs from hens that are pasture-raised eating their natural diet and getting exercise versus eggs from hens that are caged and fed “garbage”. Same with beef; 100% grass-fed pasture roaming versus penned, grain-fed “quickly fattened” cows.

There lies the problem with studies like this. Not only are they looking for eggs to be the link between heart disease and high cholesterol and such, but they don’t look at quality. They also don’t look at the other foods eaten, like someone pointed out. What if they have an egg with their pancake with margarine and syrup, fruit, OJ, and coffee breakfast, 2-3 times a week? Honestly I think all the other items other than the eggs are responsible for heart problems. Add that all together with low quality product. It seems to me that the study is quite worthless and a waste of time. If you’re gonna do something, do it right, not half a*sed, which is what many studies I read are like. Bad science.

Per Ned’s and Claus’s comments. I think we’re getting a little lost here. It seems to me an easy way of expressing the crux of the question is this: are older people more or less likely to consume more eggs on a regular basis than younger people? If so, are the consumption patterns of these different age-cohorts consistent over long times spans (i.e. the decades needed to accumulate plaque)?

If the answer to these questions is yes, then we have reason to suspect that the correlation between age and yolk-years will be weaker than if different age-cohorts have a similar mean egg intake. Whether such a weak correlation would justify the use of the term “yolk-years” in a statistical model is a question for model selection, in my mind.

The answer to the second is Chris M’s point about the reliability of asking a question like “how long have you been eating eggs?” (and then extrapolating current frequency over those years!!!!). What kind of evidence would we really need to accurately assess this?

In short, I don’t see how any of this discussion salvages the authors’ original regression model from the bin of irrelevance. It still lumps the effect of age into the “yolk-year” term (based on how the authors calculated yolk-years and our inability to answer the questions I posed above).
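To make the "lumping" concern concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that yolk-years are computed as eggs per week multiplied by years of consumption, and that years of consumption can be approximated by age. The sample size, age range, and intake range are hypothetical and are not taken from the study. Weekly egg intake is drawn independently of age, yet the product still correlates with age.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (corr(age, yolk_years), corr(age, eggs_per_week)).

    All distributions here are hypothetical, chosen only to illustrate the point.
    """
    rng = random.Random(seed)
    ages = [rng.uniform(40, 80) for _ in range(n)]
    # Intake is drawn independently of age on purpose.
    eggs_per_week = [rng.uniform(0, 7) for _ in range(n)]
    # Assumed definition: eggs per week times years of consumption (approximated by age).
    yolk_years = [e * a for e, a in zip(eggs_per_week, ages)]
    return pearson(ages, yolk_years), pearson(ages, eggs_per_week)


if __name__ == "__main__":
    r_yolk, r_eggs = simulate()
    print(f"corr(age, yolk-years)    = {r_yolk:.2f}")
    print(f"corr(age, eggs per week) = {r_eggs:.2f}")
```

Under these assumptions, age is correlated with yolk-years even though it is unrelated to weekly intake, so a yolk-years term in a regression can carry some of the effect of age unless age is adjusted for separately.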

Chris

I’m not sure. To me, the crux of the question is whether the current data set, even if we take it at face value and disregard all the limitations of study design and so on, implicates egg years as a risk factor at all. That has not been convincingly established, as far as I can see.

In fact, I doubt that if the data had been analyzed in a double-blind manner (where the statistician building the model does not know what the predictor variables are) egg-years would have been selected as a meaningful predictor.

Well, I guess you have privileged insight in this case, so I’ll go ahead and take your word for it. I stand by my criticisms of the methodology, but I think you may be onto something more important…

Chris!

Seems to be a problem with the addresses to the pictures.

What’s wrong with them?

Chris

I completely agree, Chris! There are way too many variables to make such a sweeping statement and attempted correlation between egg yolk consumption, age, and plaque. Once again a situation where incorrect data has been fed to the public with a resultant fear-response in hopes of creating a cause-and-effect result to support an unsupportable hypothesis! There are so many influences and causes of plaque formation – as well as the egg itself – that there is no single factor that anyone can PROVE causes atherosclerotic plaquing! Just more fodder for the misled and misinformed public!

Great discussion! Makes me realize I need to re-educate myself about statistics. I will continue to eat my two eggs daily with vegetables and avoid processed carbs or grains consumed with them. As we all know, diet studies are often problematic and should be viewed skeptically.

Another variable that I think should be taken into account is what the chickens were fed. I’ve had a few email exchanges with Dr. Peat and he says he limits his egg consumption to no more than 1 a day unless he knows what the chickens are being fed.

According to him, animals like pigs and chickens tend to retain and pass on the negative materials from things like soy in their fat and yolks, whereas animals like cows are better able to filter these things out in their rumens.

I’m not sure to what extent this would affect a person who is eating egg yolks from soy meal fed chickens compared to chickens on their natural diet but it would be interesting to look into.

CHRIS

There is a paper written at WAPF that covers 1950/60 autopsies of old people around the world, which found that no matter what diet was consumed, people ALWAYS had hardening of the arteries if they went past a certain age. It may even have been done by the anti-cholesterol propagandist scientist. I think even the Masai had hardening after a certain age.

Hi Del,

Masai have some atherosclerosis, but not complicated lesions nor occluded lumen, according to the research done back then.

Chris

I have consumed 2+ whole eggs per day since I was 4-5 years old. Now 80+, total eggs 800+ per year.

Looks like I’m pretty late to this discussion, and certainly most of the analysis is beyond my learning and capabilities, but I do have a couple of comments on graph C above.

1. On the horizontal axis, the second column, 50-110 Egg-yolk years, spans 60, while the third column spans only 40 EYY. It seems to me that this improperly moves quantities that should have been in the third column to the second column, making the second column larger than it would otherwise have been and, conversely, the third column smaller than it would otherwise have been. Had the grouping remained consistent throughout, the seemingly upward increasing aspect of the curve in this region would have been muted or perhaps eliminated.

2. The last group, > 200, lumps all larger Egg-yolk years into one column, improperly increasing the plaque area value of this column. If the horizontal axis had been extended (200-250, 250-300, etc) and column widths had remained consistent, what now looks like a possible exponential curve might have instead appeared linear.

Maybe I’m missing something, or perhaps the techniques involved somehow account for this, but as it is, to me it looks suspicious.

Sly,

I think the distribution in graph C is based on the quantity of people present in the quintile, not the quantity of egg-yolk years.

Point two is a good point — grouping into quintiles obscures the effects of extremes on both ends.

Chris

Right you are; the x coordinate is displayed in quintiles. However, if the goal of the graph is to show a correlation between egg-yolk years and carotid plaque area, I don’t see how using quintiles clarifies the nature of the association instead of obscuring it. If I want to see whether the correlation is linear, geometric, logarithmic, exponential, some polynomial function, or unrelated, then I need to see the data displayed in consistent groups (if that is necessary for clarity) on each axis. I don’t see how arbitrarily dividing the x axis into quintiles serves that purpose. Of course, if the author of the study is making the raw data freely available, then I suppose others can do an analysis and check the results. If that is not so, then he has to offer a proof of the validity of his method, unless it is a standard and accepted published method that is being used appropriately here.

Good point, Sly. To see the true trend we would be best looking at continuous data rather than data categorized by quintile.

Chris
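The quintile concern raised in this exchange can be illustrated with a short sketch. It uses entirely hypothetical data, not the study's: egg-yolk years are drawn from a right-skewed (exponential) distribution, and plaque area is set to be exactly proportional to egg-yolk years, so the true relationship is a straight line. The data are then grouped into five equal-count bins, as a quintile plot would do, and the mean plaque area per bin is compared.

```python
import random


def quintile_means(n=5000, seed=1):
    """Mean plaque area in each equal-count fifth of a simulated sample.

    Both the exposure distribution and the 0.5 slope are hypothetical.
    """
    rng = random.Random(seed)
    # Right-skewed exposure with a mean of about 100 egg-yolk years.
    yolk_years = sorted(rng.expovariate(1 / 100) for _ in range(n))
    # Plaque is exactly linear in yolk-years by construction.
    plaque = [0.5 * y for y in yolk_years]
    size = n // 5
    return [sum(plaque[i * size:(i + 1) * size]) / size for i in range(5)]


def steps(means):
    """Increase in mean plaque area from each quintile to the next."""
    return [b - a for a, b in zip(means, means[1:])]


if __name__ == "__main__":
    means = quintile_means()
    print("quintile means:", [round(m, 1) for m in means])
    print("step sizes:    ", [round(s, 1) for s in steps(means)])
```

In this sketch the step from one quintile to the next grows each time, so bars drawn at equal spacing would curve upward even though the underlying relationship is linear. That does not show the study's curve is an artifact; it only shows that a quintile plot of skewed data cannot by itself distinguish a linear trend from an accelerating one, which is why a plot of the continuous data would be more informative.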

I read it, and I also believe that eggs are very healthy. But I have a doubt:

Are boiled eggs bad? If so, how to eat them? How to cook them? I eat eggs every day.

Boiled eggs are fine. Pretty much any way you want to prepare them is OK.