> Perhaps he takes the pill at the onset of stress, and stress almost always tends to build afterwards.
That would not be a problem for the averaging I referred to, though it could well pose a problem for measurement, depending on how it interacted with the selected metrics.
Note that the averaging I refer to is not over all possible values of some metric, but over discrepancies in metrics we expected to follow the same distribution in both the sample and the control.
I think maybe there's a misunderstanding? We seem to agree that a variety of additional variables should be logged. I was not suggesting omitting them, but rather using discrepancies in them to detect fundamental issues with the data or the study design. I would expect larger studies to do the same where possible. For concreteness, see the sketch below.
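Here's a minimal sketch of the kind of check I mean, in Python. The log file, column names, and the choice of a two-sample KS test are all my assumptions for illustration, not a prescription:

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical daily log: one row per day, with a 'pill' flag and
# covariates we expect to be distributed identically across conditions.
log = pd.read_csv("daily_log.csv")  # assumed columns: pill, sleep_hours, ...

covariates = ["sleep_hours", "caffeine_mg", "exercise_min"]
treated = log[log["pill"] == 1]
control = log[log["pill"] == 0]

# Flag any covariate whose distribution differs noticeably between
# pill days and control days. A discrepancy here points to a problem
# with the data or the design, not to an effect of the pill.
for col in covariates:
    stat, p = ks_2samp(treated[col].dropna(), control[col].dropna())
    flag = "DISCREPANCY?" if p < 0.05 else "ok"
    print(f"{col}: KS={stat:.3f}, p={p:.3f} -> {flag}")
```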
At 94 data points it is entirely possible that there are outliers which would have averaged out at a larger N but did not. In that scenario, the presence of such outliers should be taken to indicate a problem with the data (i.e., the more discrepancies you observe, the less you should trust the data).
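To illustrate the scale point, a quick simulation (the heavy-tailed noise distribution and the threshold are my assumptions, chosen only to make the effect visible):

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_of_wild_means(n, reps=10_000, threshold=0.3):
    # Student's t with 3 df: heavy tails, so occasional extreme draws
    # that a small sample cannot average away. True mean is 0.
    samples = rng.standard_t(df=3, size=(reps, n))
    means = samples.mean(axis=1)
    return np.mean(np.abs(means) > threshold)

# At N=94 a noticeable fraction of runs land far from the true mean
# purely by chance; at larger N this essentially never happens.
for n in (94, 1_000, 10_000):
    print(f"N={n}: {fraction_of_wild_means(n):.3f} of runs have |mean| > 0.3")
```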
Mainly, I don't understand why you were suggesting we should presume or expect these to average out in an N=1 experiment; even in much larger experiments that is not reliably the case. Science would be relatively easy if the real world were not so noisy. So, to be clear, my concern is one of scale: the experiment is far too small to overlook this, and the smaller the experiment, the more important this kind of information becomes. Perhaps you were not saying that such averaging is possible in an N=1 experiment and I misread, since your comment here seems to make a different point.
I of course agree that logging them is basic scientific methodology: it lets us detect issues with the experiment, and hopefully even see the signal through the noise.