Thresholds

I apologize for three highly-technical posts in a row; I'm trying to work something out and setting it down "on paper" as it were is helping. I promise I'll come up with a post about kittens or something soon.

Suppose you're searching for pulsars. You're going through the Fourier transform looking for peaks. Now suppose you've found one: how strong is it? Is it statistically significant? For that matter, is it better than any of the list of peaks you're already keeping track of?

To answer this question you need two pieces of information: how strong the background noise is, and how likely that noise would have just randomly produced a peak this high. There's some cleverness in estimating the background, since real signals don't have perfectly flat white noise backgrounds, but I'm going to leave that aside for the moment. My question for today is, what are the statistics of a coherent peak-based search?

I think they can reasonably be estimated by ignoring any oversampling and assuming that, in the absence of a signal, if you're using n harmonics, the profile consists of 2n independent Gaussians with standard deviation 1. So the question is, for a fixed false positive probability p, what threshold should we set? The answer is roughly the threshold for a single Gaussian to exceed p/2n.

This is a little inconvenient to work with, since it requires the error function and its inverse (or an approximation), but flops are free. I suppose calling the error function code might cause cache misses, but I don't imagine needing it that often. Specifically, my idea is this: a simple implementation might just set a threshold at the beginning and run through the whole FFT. But if the observation underwent bad enough RFI, you could find yourself with millions of candidate signals. Since you'd be fine-tuning and storing every one, this could slow things down a lot. My idea is instead to specify a maximum number of candidates - generously, maybe a few hundred - and keep the best ones. This means raising the threshold every time you get a new candidate once the list is full. This wouldn't require the error function except that you don't only want to look at candidates with the full 64 harmonics - you also need to consider those with fewer. And converting thresholds between different numbers of harmonics does require the error function.

1 comment:

Joshua Stults said...

I apologize for three highly-technical posts in a row
You are too polite; these are really interesting. It's neat to see the uses to which people in an unfamiliar field put familiar math. I'll never search for a pulsar in my life, but FFTs are like a swiss army knife.