I was reading the recent papers on arxiv.org, preparing for our weekly neutron star discussion group, and I came across a paper that appears to be based on a statistical error. The content is not really my field, but I'm pretty sure the mathematics are a bit dubious.
The subject of the paper is MOND, "Modified Newtonian Dynamics". Newtonian gravity and general relativity seem to be excellent fits to observations in the solar system and in stronger fields, but as soon as you go to weaker fields - galaxy rotation curves or cosmology - the observations disagree with the predictions. The standard way to deal with this problem is to invoke some invisible massive material, so-called "dark matter", in just the amounts needed to make the data line up with the predictions of standard gravity. The idea of MOND is to point out that the problems all arise at around the same acceleration a, and to postulate that the problem lies in our theory of gravity rather than in unseen matter.
This paper is a response to another fairly recent one, which pointed out that there are globular clusters where the accelerations of the stars as they orbit the cluster are about a, so MOND effects should be visible there. That earlier paper measured radial velocities of seventeen stars in the cluster and claimed their velocities were not consistent with MOND. The new paper claims that in fact the radial velocities are consistent with MOND.
In particular, this paper takes the collection of radial velocities and tests them against the predicted distribution with the Kolmogorov-Smirnov test. They find that the probability of obtaining a KS score this extreme is 0.36 or 0.27, and claim that "based on a KS test, which is the relevant statistical test for small samples, the currently available data are insufficient to discriminate between Newtonian gravity and MOND." There are several errors in this statement.
First of all, it is not true that the KS test is "the relevant statistical test for small samples". There are many tests applicable to small samples, and the KS test is in fact one of the weaker tests. That is, for many data sets, the KS test will report no significant difference while some other test would (correctly) report a significant difference. So the fact that the KS test does not show a significant difference doesn't mean that no test will. In particular, the authors don't even show that the previous paper's statistical test is invalid; they simply state "Given the small sample size, the formal error on the velocity dispersion is not sufficient to discriminate between various models, [...]". Maybe it is, but since neither paper gives details on how the errors on this dispersion were obtained, I find it hard to judge.
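To make "weaker" concrete, here is a rough sketch of a power comparison, not anything taken from either paper. It draws small samples from a Gaussian that is narrower than the hypothesised one and counts how often each test rejects at the 5% level. The true dispersion of 0.7, the known mean of zero, the trial count, and the function name `compare_power` are all my own assumptions for illustration; only the null dispersion of 1.27 and the sample size of seventeen come from the discussion here. The variance test is one-sided (it looks only for a spread that is too small), which is part of why it does better against this kind of alternative.

```python
import numpy as np
from scipy import stats


def compare_power(n=17, sigma_null=1.27, sigma_true=0.7,
                  trials=2000, alpha=0.05, seed=1):
    """Estimate how often each test rejects N(0, sigma_null) when the
    data really come from N(0, sigma_true). All defaults are illustrative."""
    rng = np.random.default_rng(seed)
    # With a known mean of zero, sum(x^2)/sigma_null^2 ~ chi2(n) under the null.
    var_cut = stats.chi2.ppf(alpha, df=n)
    ks_hits = 0
    var_hits = 0
    for _ in range(trials):
        x = rng.normal(0.0, sigma_true, size=n)
        # KS test against the fully specified null distribution.
        ks_p = stats.kstest(x, "norm", args=(0.0, sigma_null)).pvalue
        ks_hits += ks_p < alpha
        # One-sided chi-square test for "dispersion smaller than sigma_null".
        var_hits += np.sum(x ** 2) / sigma_null ** 2 < var_cut
    return ks_hits / trials, var_hits / trials


if __name__ == "__main__":
    ks_power, var_power = compare_power()
    print(f"KS rejects in {ks_power:.1%} of trials, "
          f"variance test in {var_power:.1%}")
```

If the variance test rejects far more often than the KS test in a setup like this, then a non-significant KS result says very little about whether the dispersion matches.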
The second problem is that, as far as I can tell, they misapply the KS test. The KS test tests whether a given data set is drawn from a given distribution. But the p-values it returns are correct only if the distribution is known a priori - if one has found some of the distribution's parameters by fitting to the data, one must use a different approach for calculating the p-values. If one doesn't, one obtains p-values that are too high: that is, the data appear more plausible than they really are.
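This effect is easy to see in a simulation. The sketch below is my own illustration and does not use the papers' data. It draws samples that really are Gaussian and runs the KS test twice: once against the true, known parameters, and once against parameters fitted to the same sample. The sample size of seventeen matches the star count, while the trial count, the seed, and the function name `ks_rejection_rates` are assumptions made for the example. If the p-values were trustworthy, about 5% of them should fall below 0.05 in both cases.

```python
import numpy as np
from scipy import stats


def ks_rejection_rates(n=17, trials=2000, alpha=0.05, seed=0):
    """Fraction of truly Gaussian samples with KS p-value below alpha,
    using (a) the known parameters and (b) parameters fitted to the sample."""
    rng = np.random.default_rng(seed)
    known_hits = 0
    fitted_hits = 0
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, size=n)
        # (a) Distribution specified in advance: p-values behave as advertised.
        p_known = stats.kstest(x, "norm", args=(0.0, 1.0)).pvalue
        # (b) Mean and dispersion estimated from the same data: the fitted
        # curve hugs the sample, so the naive p-values come out too high.
        p_fitted = stats.kstest(
            x, "norm", args=(x.mean(), x.std(ddof=1))
        ).pvalue
        known_hits += p_known < alpha
        fitted_hits += p_fitted < alpha
    return known_hits / trials, fitted_hits / trials


if __name__ == "__main__":
    known, fitted = ks_rejection_rates()
    print(f"known parameters:  {known:.3f} of p-values below 0.05")
    print(f"fitted parameters: {fitted:.3f} of p-values below 0.05")
```

A rejection rate far below 5% in the fitted case is the signature of p-values that are systematically too high.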
Just out of curiosity I retyped the data from the more recent paper. They claim that MOND predicts (under certain conditions) that the stellar velocities should follow a Gaussian distribution with a dispersion of 1.27 km/s. There are seventeen stars on their list, one of which ("star 15") is somewhat ambiguous. But a quick test shows that the population standard deviation of the sixteen good stars is 0.544 km/s; if the stellar population really has a standard deviation of 1.27 km/s, simulation shows a value this low should arise with a probability of about 0.0005: either the data is a bizarre fluke or this particular MOND prediction is wrong. (Notice that I haven't made any assumptions whatsoever about the sample size.) Including star 15 increases the spread of the observed velocities, making the probability of getting a value this low as high as 0.013, still quite strong evidence against this particular prediction of MOND.
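For anyone who wants to repeat the sixteen-star check, a simulation along the following lines should do. It is a sketch rather than the exact script I ran: it uses only the numbers quoted above (sixteen stars, an observed population standard deviation of 0.544 km/s, a predicted dispersion of 1.27 km/s), and the function names, trial count, and seed are arbitrary choices. It also assumes the velocities are Gaussian and ignores measurement errors. As a cross-check it includes the closed-form answer, which uses the fact that for Gaussian data n s^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom when s is the population standard deviation about the sample mean.

```python
import numpy as np
from scipy import stats


def prob_std_this_low_sim(n=16, sigma=1.27, s_obs=0.544,
                          trials=200_000, seed=0):
    """Monte Carlo estimate of P(population std of n Gaussian draws <= s_obs)
    when the true dispersion is sigma."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, sigma, size=(trials, n))
    # ddof=0 gives the population standard deviation about each sample mean.
    stds = samples.std(axis=1, ddof=0)
    return np.mean(stds <= s_obs)


def prob_std_this_low_exact(n=16, sigma=1.27, s_obs=0.544):
    """Same probability from the chi-square distribution of n*s^2/sigma^2."""
    return stats.chi2.cdf(n * (s_obs / sigma) ** 2, df=n - 1)


if __name__ == "__main__":
    print("simulated:  ", prob_std_this_low_sim())
    print("chi-square: ", prob_std_this_low_exact())
```

The two numbers should agree to within the Monte Carlo noise. The simulated one wobbles from run to run because so few trials land in the tail.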
(A quick test with scipy's implementation of the Anderson-Darling test reveals that the data are consistent with a normal distribution if you omit star 15; if you include it the data becomes less consistent, giving a probability of data this unusual between 0.05 and 0.10. This test correctly takes into account the fact that it is estimating both the mean and dispersion of the underlying normal distribution. In any case it seems unlikely the standard deviation I use above is being thrown off by bizarre outliers.)
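The Anderson-Darling check can be sketched as follows. I have not reproduced the velocity table in this post, so the array `example` below is a synthetic stand-in and not the real measurements. The helper `anderson_bracket` is my own wrapper, not part of scipy. It relies on `scipy.stats.anderson` returning a statistic plus a table of critical values at the 15%, 10%, 5%, 2.5% and 1% levels, which is why the test gives a bracket such as "between 0.05 and 0.10" rather than a single p-value. Newer scipy releases may offer other ways of reporting the result.

```python
import numpy as np
from scipy import stats


def anderson_bracket(x):
    """Bracket the significance of an Anderson-Darling normality test.

    Returns (lower, upper) bounds on the probability of data this unusual,
    read off scipy's table of critical values. The mean and dispersion are
    estimated from the data, and the critical values account for that.
    """
    res = stats.anderson(np.asarray(x, dtype=float), dist="norm")
    # significance_level is given in percent.
    levels = res.significance_level / 100.0
    exceeded = levels[res.statistic > res.critical_values]
    not_exceeded = levels[res.statistic <= res.critical_values]
    upper = exceeded.min() if exceeded.size else 1.0
    lower = not_exceeded.max() if not_exceeded.size else 0.0
    return lower, upper


if __name__ == "__main__":
    # Synthetic placeholder data, NOT the stellar velocities from the paper.
    rng = np.random.default_rng(0)
    example = rng.normal(0.0, 0.5, size=16)
    print(anderson_bracket(example))
```

A result of (0.05, 0.10) would correspond to the "between 0.05 and 0.10" statement above; (0.15, 1.0) means the statistic did not exceed even the most lenient critical value in the table.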