Beware of Interactions
Sea stacks off St Kilda. Navigate safely or disaster awaits.

Parallel trials but not lines

In a previous post I used an example from Chuang-Stein and Tong (1996) to illustrate ambiguities that arise when fitting interactions in a model. The example in question had a binary outcome and a binary covariate. The figure below illustrates a similar situation with a continuous covariate and a continuous outcome, using data from Wei and Zhang (2001). Since the fitted lines are not parallel, the treatment effect in the fit depends on the baseline value. If you want to quote a single value, do you want to quote the one corresponding to a baseline value of 0 (leftmost vertical dashed line), or the one that applies to the average baseline value of 11.3 (dashed line further to the right), or some other value?
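To make the dependence on baseline concrete, here is a minimal sketch. The two coefficients are hypothetical and are not the fitted values for the Wei and Zhang data; only the mean baseline of 11.3 is taken from the text above.

```python
# Assumed model: outcome = b0 + b_base*x + b_treat*t + b_int*x*t,
# where x is the baseline value and t is 1 for treated, 0 for control.
B_TREAT = 4.0    # hypothetical treatment coefficient (the effect at baseline 0)
B_INT = -0.25    # hypothetical treatment-by-baseline interaction


def treatment_effect(baseline, b_treat=B_TREAT, b_int=B_INT):
    """Vertical distance between the two fitted lines at a given baseline."""
    return b_treat + b_int * baseline


effect_at_zero = treatment_effect(0.0)    # what the 'treatment' term reports
effect_at_mean = treatment_effect(11.3)   # effect at the average baseline
print(effect_at_zero, effect_at_mean)
```

If the baseline were centred on its mean before fitting, the coefficient labelled 'treatment' would be the second number rather than the first, although the fitted lines themselves would not change.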


Surprise packages

To return to the Chuang-Stein and Tong (CST) example, the general message was that what the computer interprets the main effect to be depends on the parameterisation, which might vary from package to package. An example was given in which three statisticians would report three very different estimates, standard errors and P-values for a treatment effect, depending on parameterisation. I showed this by parameterising the dummy variable for the covariate (sex) myself. Now, however, I am going to use three packages to illustrate that results can appear to change drastically when different packages are applied to the same dataset using the same model.

The power of three

I shall illustrate the analysis of the CST example using R(R), SAS(R) and Genstat(R). In each case a binary event (which either occurs or does not occur) is analysed using sex, treatment and their interaction. In each case I have simply copied and pasted output and have circled the P-value for the treatment effect. Just to avoid being scolded by the P-value police: this is not because I regard P-values as being particularly important but because it helps highlight a point.

What R(R) says


Analysis of Chuang-Stein and Tong example using R(R)


What Genstat(R) says


Analysis of the Chuang-Stein and Tong example using Genstat(R)

What SAS(R) says


Analysis of Chuang-Stein and Tong example using SAS(R)

Getting explicit

If you compare the three results you will see that the P-value that R(R) reports is what statistician A quoted in the previous blog post. What Genstat(R) gives you is what statistician B reported, and SAS(R) gives you what statistician C quoted. (I have cheated slightly with Genstat(R) in that I have explicitly instructed it in the code to use M, for male, as the reference level for sex. Had I used F, I would have got the same result as R(R).)

In case you doubt that the same model is being fitted by the three packages, look at the P-value for the interaction of treatment and sex. In all three cases it is 0.178 to 3 decimal places.
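The mechanism can be sketched without any package. In a saturated logistic model (sex, treatment and their interaction) the fitted log-odds in each cell equal the observed ones, so the Wald results for any parameterisation can be computed directly from the cell counts. The counts below are hypothetical, not the CST data. The three 'treatment' contrasts correspond to taking F as reference level, taking M as reference level, and sum-to-zero (effect) coding, where the coefficient is proportional to the unweighted average of the two log odds ratios.

```python
import math

# HYPOTHETICAL counts as (events, non-events); NOT the Chuang-Stein and Tong data.
cells = {
    ("F", "active"): (30, 20),
    ("F", "control"): (20, 30),
    ("M", "active"): (12, 8),
    ("M", "control"): (11, 9),
}


def contrast(weights):
    """Estimate, SE and two-sided Wald P-value for a contrast of cell logits."""
    est = var = 0.0
    for cell, w in weights.items():
        events, non_events = cells[cell]
        est += w * math.log(events / non_events)
        var += w * w * (1 / events + 1 / non_events)
    se = math.sqrt(var)
    p = math.erfc(abs(est / se) / math.sqrt(2))  # two-sided normal tail area
    return est, se, p


fa, fc = ("F", "active"), ("F", "control")
ma, mc = ("M", "active"), ("M", "control")

female = contrast({fa: 1, fc: -1})                        # reference level F
male = contrast({ma: 1, mc: -1})                          # reference level M
average = contrast({fa: 0.5, fc: -0.5, ma: 0.5, mc: -0.5})  # effect coding
interaction = contrast({fa: 1, fc: -1, ma: -1, mc: 1})

for name, res in [("F ref", female), ("M ref", male),
                  ("average", average), ("interaction", interaction)]:
    print(f"{name:12s} est={res[0]:.3f} se={res[1]:.3f} P={res[2]:.3f}")
```

The three 'treatment' lines differ in estimate, standard error and P-value, while the interaction contrast is the same whichever coding is used (up to sign and scale), which is why its P-value agrees across parameterisations.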

Isn't this moot?

You could argue that this is all irrelevant: you should not quote a main effect when the interaction is in the model. As I put it in Senn (2000), where I first commented on this example:

It may be argued, of course, that where an interaction is present, the results in individual strata are different, therefore, any overall summary is both arbitrary and misleading and a weighted average is inappropriate. (p544)

However, I then went on to say:

There are good reasons, however, for preferring the weighted average. The first is related to a point recognized a long time ago by the Bayesian, Harold Jeffreys (46). It is that we should have more scepticism regarding higher order effects (such as interactions) than we show for main effects. The second is that if the treatment has no effect, it becomes implausible to believe in a treatment-by-factor interaction. The third is the related one that the null hypothesis that the treatment is a placebo carries with it the implication that its effect is identically zero in every stratum. For all these reasons, weighting the estimates as if they were homogenous according to precision is a natural starting point for further exploration. (p544-545)

Note, by the way, that the interaction effect is not significant in this example. However, even if it were, I would be interested in a main-effect summary.

In fact, statisticians commonly fit models in which interactions are implicitly in the model, even though the object is to report a single main effect. For example, a fixed effects meta-analysis removes the treatment-by-trial interaction from the estimate of the error, since the variance is estimated separately within each trial. The variation of the treatment effect from trial to trial (the treatment-by-trial interaction) can be used to estimate heterogeneity as a second question, but the fixed effects meta-analysis is of interest in itself, even if other analyses then follow.
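As a minimal sketch of this two-step logic, the code below pools per-trial (or per-stratum) estimates by inverse-variance weighting and then computes Cochran's Q from the variation between them. The estimates and variances are hypothetical, and the function name is mine.

```python
import math

# HYPOTHETICAL per-trial log odds ratios and their within-trial variances.
estimates = [0.81, 0.20]
variances = [0.167, 0.410]


def fixed_effect(estimates, variances):
    """Precision-weighted average, its SE, and Cochran's Q with its df."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    # Q measures treatment-by-trial variation; it plays no part in 'se'.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, se, q, len(estimates) - 1


pooled, se, q, df = fixed_effect(estimates, variances)
print(f"pooled={pooled:.3f} se={se:.3f} Q={q:.3f} on {df} df")
```

The standard error of the pooled estimate uses only the within-trial variances, so the heterogeneity is kept out of the error term and can be examined separately through Q.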

In that connection, it might be interesting to look also at the Mantel-Haenszel (MH) analysis of this example. The MH test isolates the 1 degree of freedom corresponding to a 'main effect' from a series of k 2x2 tables, leaving the other k-1 to assess heterogeneity. This is what SAS(R) has to say.


Mantel-Haenszel analysis in SAS(R) of the Chuang-Stein and Tong data.

Note that the P-value of 0.009 is much closer to statistician D's solution in the previous blog post than to that of statistician C.
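For readers who want to see what the MH test computes, here is a minimal sketch of the 1 degree of freedom statistic, without a continuity correction. The tables are hypothetical, not the CST data, so the result will not reproduce the SAS(R) output.

```python
import math

# HYPOTHETICAL 2x2 tables, one per stratum, as
# (events on active, non-events on active, events on control, non-events on control).
tables = [(30, 20, 20, 30), (12, 8, 11, 9)]


def mantel_haenszel(tables):
    """MH chi-square on 1 df (no continuity correction) and its P-value."""
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n  # hypergeometric mean of cell a
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


chi2, p = mantel_haenszel(tables)
print(f"MH chi-square={chi2:.3f}, P={p:.3f}")
```

Each stratum contributes its own observed-minus-expected difference and its own variance, so larger and more informative strata automatically carry more weight in the single summary.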

My opinion

This is the conclusion I came to in 2000:

Using a null hypothesis that implies that no interaction is present, however, does not commit us to using a statistic that takes no account of interaction. Consider by analogy the common two independent samples t-test. Under the null hypothesis of no difference between groups, a variance estimate treating the two groups as one and using n1 + n2 - 1 degrees of freedom is unbiased for sigma^2. We commonly use instead, however, an estimate pooled from the two groups and using n1 + n2 - 2 degrees of freedom. This does, in fact, lead to a more sensitive test. By the same token, we can choose to fit treatment-by-center (trial) interactions as part of a general strategy for examining the effect of treatment without committing ourselves as to the reality of the interactive effects. (p545)
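The t-test analogy is easy to check numerically. The sketch below uses two small hypothetical samples and compares the one-group variance estimate (n1 + n2 - 1 degrees of freedom) with the usual pooled estimate (n1 + n2 - 2 degrees of freedom).

```python
from statistics import mean

# HYPOTHETICAL data for two independent groups.
group1 = [4.1, 5.3, 6.0, 5.2, 4.9]
group2 = [6.8, 7.4, 6.1, 7.9, 7.3]


def sum_sq(xs):
    """Sum of squared deviations about the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n1, n2 = len(group1), len(group2)

# Treats both groups as one sample: unbiased only if the means are equal.
one_group_var = sum_sq(group1 + group2) / (n1 + n2 - 1)

# Pooled within-group estimate: unaffected by any difference in means.
pooled_var = (sum_sq(group1) + sum_sq(group2)) / (n1 + n2 - 2)

print(f"one-group={one_group_var:.3f}, pooled={pooled_var:.3f}")
```

When the group means really differ, the one-group estimate absorbs the between-group difference and is inflated, which is why the pooled estimate gives the more sensitive test even though both are valid under the null hypothesis.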

Lessons

  1. Be very wary about main effects in the presence of interactions. Follow the advice given in ICH E9 (Lewis, 1999) and in Senn (2000) of always checking the model without interactions.
  2. Be very wary of Type III sums of squares etc. Note also that SAS(R) quotes exactly the same P-value for Treatment in its Type III analysis of effects as for the Wald chi-square it produces. Yet this disagrees with what it does for the MH test. So if you favour Type III analyses you presumably don't like the MH approach. I think that when it is put to them like this, most statisticians would hesitate to defend the Type III analysis.

References

C. Chuang-Stein and D. M. Tong (1996) The impact of parameterization on the interpretation of the main-effect terms in the presence of an interaction. Drug Information Journal, 421-424.

L. J. Wei and J. Zhang (2001) Analysis of data with imbalance in the baseline outcome variable for randomized clinical trials. Drug Information Journal, 1201-1214.

S. J. Senn (2000) The many modes of meta. Drug Information Journal, 535-549.

J. A. Lewis (1999) Statistical principles for clinical trials (ICH E9): an introductory note on an international guideline. Statistics in Medicine, 1903-1942.

