I'm currently involved in another low-frequency pulsar survey with the GBT. This is a pointed survey, so we take two-minute pointings in an array tiling the whole northern part of the sky. This is an incredibly sensitive setup, so we will hopefully find a nice collection of new pulsars. Unfortunately, the same arrangement that makes us sensitive to pulsars makes us sensitive to lightning, electric motors, the local power lines, passing aircraft... all sorts of signals, generally outside our intended piece of sky and/or outside our intended frequency range, but also generally vastly more powerful than the pulsars we're looking for. So dealing with this junk — generically called RFI (Radio Frequency Interference) — is an important part of our survey strategy. My topic today is not so much dealing with the mild RFI we see all the time as dealing with what we see every now and then: a tremendous signal comes booming in, overwhelming everything and ruining the data (as seen on the right).

First of all a brief summary of how our system works: The telescope has a receiver at the primary focus, basically a pair of crossed dipoles. This converts the electromagnetic waves to signals in the wires, where they're amplified, encoded in analog on light and sent down an optical fiber to the control room, where it's converted down to a standard frequency range. This is fed to GUPPI, a so-called "pulsar backend". This instrument does analog-to-digital conversion on the signal then uses a polyphase filterbank to efficiently split it into 4096 channels, then computes the power in each channel; this power is averaged over 81.92 microseconds, after which the set of 4096 powers is dumped to disk.

Pulsars are broadband signals, so we use the widest bandwidth our receiver can handle, about 80 MHz centered near 350 MHz. This gives us the best signal-to-noise from our pulsars, but it also means we can't stick to the few narrow bands reserved for radio astronomy. In fact, the region we're using is allocated for various kinds of radio operation, including "aeronautical radionavigation" around 330 MHz. The GBT is in the National Radio Quiet Zone, so with luck much of this spectrum might be clear, but if aircraft are really using it to navigate there's no keeping them out of our airspace. (And at least once when I've complained of RFI the GBT operator has noticed an airplane passing to the north, for what that's worth.) In any case, there is interference to deal with.

Old radioastronomy systems, in order to get the widest possible bandwidth, quantized the input signal to one bit, or three levels, on initial digitization. This has the serious drawback that any small change in the signal levels causes the digitizer to overload or underflow, trashing the output. Newer backends improve on this by using 8-bit digitization, so that you have more than twenty decibels difference between underflow and overflow; it also helps that they are sampling a wider bandwidth, so that it takes more power to substantially change the levels. So generally, when a noise signal is picked up by GUPPI, it is narrowband and produces excess power in one channel. But the polyphase filterbank is good at keeping it from spilling into neigbouring channels (much better than a simple FFT) as long as it doesn't overflow the system, so usually you see something like this:

This shows, in red, the average power in each channel, and in blue, the standard deviation in each channel; since what we are receiving is basically noise, the standard deviation in each channel is large. The overall profile is the sensitivity of our system as a function of frequency. You can see it's not flat over the 80 MHz we want and zero elsewhere, but some complicated bumpy shape; that's the joy of analog components. In fact, given that it's an 80 MHz band at only 350 MHz, we're doing well to have a filter even that good. The point of this plot, though, is those vertical lines. Those are channels where some narrowband signal is coming in, so strongly that it stands way out above the noise. But it's okay; we have four thousand channels, we can afford to throw away a handful, and we have tools to detect when to throw them away (clever tools that look not just for strong signals but also for weaker signals that are narrowband and periodic, since those are a real headache for pulsar searching). So this plot is of one of our good beams.

The beams I'm worried about are the ones that look like this:

Here what's happened is we have some narrowband signal that is so monstrous it's overloading our digitizer. The digitizer's saturation is a nonlinear effect, so it doesn't stay politely in one channel; instead it splatters all over the place, producing both ugly broad peaks as on the right, and (I think) the regularly-spaced vertical lines (though they might be something else). When this happens there's not much use to be had from the data; while it's possible we might detect a pulsar through all that muck, on the one hand we'd be vastly less sensitive, while on the other if the interference has any periodic component to it, it'll produce zillions of false-positive periodicity candidates to wade through. So we need to discard beams like this one.

Fortunately, since we're doing a pointed survey, if we find that a beam has been trashed by RFI, we don't have to just write it off and leave a gap in our survey coverage: we can just put the beam back on our to-observe list. No problem. All I need to do is come up with a scheme for detecting these bad beams automatically, since we often run our survey observations semi-unattended (once they're up and running, I often give the operator my cell phone number and ask them to call me if something like a wind stow or a snow dump comes up; one of my colleagues will often do this and then go to sleep).

Initially, we were planning to run the RFI detection and excision scripts as part of our survey processing. Unfortunately, pulsar survey processing is incredibly CPU-intensive - we are just now finishing off the processing for the drift-scan survey, whose datataking finished in mid-2007. Clearly if we wait until the beams are processed to decide which are RFI-laden, the survey will long be over before we find out which beams we should have reobserved. Fortunately, the RFI detection code isn't too compute-intensive; an eight-processor modern machine running jobs on all processors ought to be able to process eight two-minute beams in about fifteen minutes.

The RFI detection code isn't perfect, though: I found that it was able to detect overloaded channels, but a few overloaded channels are normal. When too many are overloaded we can discard the beam, but I found that there were beams where not very many channels were actually overloaded, but those few splattered into neighboring channels. So my prototype system does an additional test: it compares the bandpass of each beam with the instrument's normal bandpass. This sort of works, but it revealed some surprising things:

This is a different plot of the same sort of thing. The red curve is the bandpass of the system, estimated by taking the median of all the day's beams. The green curve is one of the horrible contaminated beams. But the blue curve is from a beam that is probably fine. But its bandpass looks decidedly different (enough so to trick an early generation of my script into marking it bad). It turns out that what has happened is that this is a beam shortly after a contaminated beam. While the RFI was going on, we ran a system "balance", which adjusts the gain/attenuation at many points along the signal path so that in each stage it's at a comfortable level. Since there was powerful RFI, the gains and attenuations all got readjusted, and all differently. Since different parts of the signal chain affect the bandpass differently, we got a different bandpass. (We also got a drastically reduced signal amplitude, but fortunately GUPPI gives us plenty of dynamic range, as I mentioned above.) So I have to be a little careful when comparing bandpasses so as not to reject minor changes like this. Fortunately the "splatter" from bad RFI is pretty obvious in the statistics, so now I think I have a working bad beams detector. I'm just waiting for feedback from my collaborators before putting it into service.

Some comments on the tools I used to build it: I wrote the code in python and numpy, of course. But key to the process was a module by the author of the RFI detection tool that let me construct python objects from the detector's statistics files. Given this, I used medians to construct a bandpass for each file (since medians are better than means at discarding crazy outliers, which is the whole point of the exercise). I then used masked arrays to flag any bad channels, and scale the result so its median is one. I repeat this process on many beams, then take a median (using the masked array median to nicely ignore any bogus data points). This gives me my system bandpass. Comparing an individual beam to the bandpass then proceeds by constructing a masked median of the file as before, scaling it so its median matches that of the system bandpass, and then counting the points where it is very different.

In all, it has proved invaluable to have the masked array tools; they just do the Right Thing with bad data, vastly simplifying my code.


Daniel Lakeland said...

Hi Anne, this is Dan Lakeland. I enjoyed hearing about your data filtering and rejection issues on Andrew Gelman's blog, so I am subscribing to your blog to hear more. I worked with an ex-astronomer for a while, and did an undergraduate project on signal processing for audio data, so I am somewhat familiar with your world of signal and noise.

Anne M. Archibald said...

Hi Dan, nice to meet you! I actually got into astronomy, in a way, through audio signal processing: talking to a friend about the problems she was dealing with I found some clever ways to apply the signal-processing stuff I was tinkering with to her problem, I spoke to her supervisor, and so when I decided to transfer into astronomy I had a supervisor who knew what I could do. And, interestingly, our sampling rates are really audio rates — only we have four-thousand channel audio to work with. I often think about whether there are ways to "visualize" pulsar data as audio, with little luck so far.