Least squares and timing noise, part 2

Simulated time series
In my previous post I described a new paper about fitting pulsar parameters in the presence of timing noise using a general least-squares method. It seems like a good approach, but I'd like to look at it in more detail. So: python to the rescue!

First of all, I want to demonstrate the spectral leakage issue. While the observational situation is complicated, a simple toy model is enough to demonstrate it. I start by generating the Fourier transform of noise with some spectral profile. I then simulate the effects of a limited observing span by transforming it to the time domain (getting, say, a 50-year time series), trimming it down to a shorter time span, then transforming it back. If I do this with a flat or mildly red spectrum, I simply get a lower-resolution spectral plot. But if I use a red enough spectrum I get a bogus spectrum: it's not the right slope, and in fact it's very regular because it's completely dominated by the lowest-frequency component:
Mildly red spectrum, accurately represented.
Borderline red spectrum partly dominated by low frequencies.
Very red spectrum totally dominated by a single low frequency.

Obviously this makes a hash of the resulting spectrum. This paper argues that something can be done about this by using smarter fitting. But to do this I need an estimate of the covariance matrix of the timing noise. Since I am generating the timing noise myself, I know its spectrum. It turns out that I can build the covariance matrix without needing to do the simulations they describe in the paper.
Covariance increases as the spectral index increases.

First of all, the noise process is what I think is called "wide-sense stationary", and in particular, the covariance between a measurement at time T0 and one at time T1 depends only on T1-T0. This means we can summarize the covariance matrix with a one-dimensional array covering all the time differences. This is actually the autocorrelation of the process, and we could estimate it by taking many instances of the process and averaging together all their autocorrelations. But the fastest way to calculate these autocorrelations is to use the Fourier domain - the Fourier transform of the autocorrelation of a sequence is the squared absolute value of the Fourier transform of the sequence. This lets me skip the averaging - I know the expected value in each Fourier coefficient because it's the input power spectrum. So I can build the autocorrelation directly from the spectral model by taking an inverse FFT. Of course this gives me only the values at evenly-spaced lags, but FFTs are so cheap and the autocorrelation so smooth that I can just interpolate. So any spectral model of the noise can easily be converted to a process autocorrelation and hence to a covariance matrix of the observations.

The obvious way to generate fake data is to use the same process as above to generate fake time series and then sample them. But if this covariance matrix is right, I can use it to transform a sample from a multivariate normal distribution and get a fake data set more directly:

Now that I can generate fake instances of noise, I'd like to see if I can fit for pulsar parameters more easily. I'm not going to put any pulsar physics into this code, obviously, but one nice easy feature to fit for is a phase jump. So I'll add one at a known time and fit for it, using ordinary least-squares and taking into account timing noise.
The blue dashed step is the true step I used to generate the data. The green step uses ordinary least-squares fitting, and the red step, almost indistinguishable from the blue, is computed using the full covariance matrix. So at least for this case we see a huge improvement. (One caveat is that all I fit for was the height of the step, not the horizontal position or even a vertical offset, let alone a trend.)

Now what about estimating the spectrum? Once I have a good fitting method, I can try to estimate the spectrum by fitting sinusoids. But what's the frequency response of this sinusoid-fitting process? I can test that by feeding it sinusoids at many different frequencies.
Just to be clear, I picked three different frequencies one might compute a periodogram for. Their positions are indicated by the vertical lines. Then, for each, I used the new fitting procedure to fit the periodogram frequency to input sinusoids and plotted the amplitude of the response. An ideal plot here would have a little narrow peak aligned with each vertical line. Straight-up fitting would produce a sinc function, a central peak with sidelobes falling off like the reciprocal of the frequency offset. This falloff is not fast enough to cope with the sharp increase in noise at low frequencies. The dashed line indicates the reciprocal of the noise model used in the fitting procedure, so that filters must fall off as fast as the dashed line to be useful. These do, demonstrating that the fitting procedure works for resisting red noise. But it's worth noting, too, that the peaks in sensitivity are not necessarily very well lined up with the intended frequencies. In particular, at the lowest frequency (which is one cycle over the span of observations) the actual peak sensitivity is considerably shifted from the periodogram frequency. This is not so much because it's the lowest Fourier frequency as because the steep noise slants the peak. So interpretation of periodograms does require some care. One approach would be to compute a periodogram using the standard test frequencies but then, for each filter, compute the peak (or perhaps mean or median) response frequency and plot the data point there. Or just ignore the very lowest samples.

This frequency domain analysis offers an explanation for why it works so much better to use the new method to fit a step function. I can fit a step function to sinusoids of various frequencies and plot the amplitude response:

It should be clear that the ordinary least-squares fitting is dominated by the lowest frequencies - effectively averaging all the before-the-jump points and all the after-the-jump points. The adjusted fitting method is smart enough to ignore points that are too far from the jump because they're contaminated by noise.

In summary, the method works, and I have a shortcut for computing covariance matrices. One has to be careful about using periodograms because the samples may be sensitive to slightly-shifted frequencies, but the method does work.

Finally here's the code that generates the plots above. As usual, it uses numpy and matplotlib. For the most part it's quick and dirty use-once code, so I apologize if it's hard to read.
import numpy as np

class SpectralNoiseModel:
    def __init__(self):
    def power_spectrum(self, f):
        """Power spectrum as a function frequency in inverse years"""
        raise NotImplementedError

    def generate_fft(self, ts_len, len_years):
        if ts_len%2:
            raise ValueError("Only even-length time series permitted")
        len_years = float(len_years)
        ft_len = ts_len//2+1
        ft = np.random.randn(ft_len) + 1.j*np.random.randn(ft_len)
        # fix normalization
        # with a real FFT the 0th and -1th places hold real numbers
        ft[1:-1] /= np.sqrt(2)
        # scale and return result
        return ft*np.sqrt(self.power_spectrum(

    def generate_ts(self, ts_len, len_years):
        return np.fft.irfft(self.generate_fft(ts_len, len_years))

    def autocorrelation(self, ts_len, len_years):
        if ts_len%2:
            raise ValueError("Only even-length time series permitted")
        len_years = float(len_years)
        ft_len = ts_len//2+1
        ac = np.fft.irfft(self.power_spectrum(np.arange(ft_len)/len_years))
        return ac/ac[0]
    def covariance_matrix(self, times, len_years, ts_len=2**16):
        # FIXME: can we derive len_years from times?
        times = np.asarray(times)
        deltats = times[:,None]-times[None,:]
        ac = self.autocorrelation(ts_len, len_years)
        ac = np.concatenate((ac,ac))
        lags = np.linspace(-len_years,len_years,2*ts_len,endpoint=False)
        return np.interp(deltats, lags, ac)

    def __add__(self, other):
        return SumSpectralNoiseModel(self,other)
    def __rmul__(self, other):
        return ScaledSpectralNoiseModel(self,other)
    def __mul__(self, other):
        return ScaledSpectralNoiseModel(self,other)

class SumSpectralNoiseModel(SpectralNoiseModel):
    def __init__(self, A, B):
        self.A = A
        self.B = B

    def power_spectrum(self, f):
        return self.A.power_spectrum(f)+self.B.power_spectrum(f)

class ScaledSpectralNoiseModel(SpectralNoiseModel):
    def __init__(self, model, scalar):
        self.model = model
        self.scalar = scalar
    def __rmul__(self, other):
        return ScaledSpectralNoiseModel(self.model,self.scalar*other)
    def __mul__(self, other):
        return ScaledSpectralNoiseModel(self.model,self.scalar*other)

    def power_spectrum(self, f):
        return np.abs(self.scalar)**2*self.model.power_spectrum(f)

def clean_power(x,a):
    if np.isscalar(x):
        if x!=0:
            return x**a
            return 0.
        c = (x==0)
        r = np.copy(x)
        r[c] = 1
        r **= a
        r[c] = 0
        return r
class PowerLaw(SpectralNoiseModel):
    def __init__(self, spectral_index):
        self.spectral_index = float(spectral_index)

    def power_spectrum(self, f):
        return clean_power(f,-self.spectral_index)

def trim_spec(ft, trimmed_ts_len):
    ts = np.fft.irfft(ft)
    ts[trimmed_ts_len:] = 0
    return np.fft.rfft(ts)

def lstsqchol(A,b,CV, rcond=-1):
    """Compute a linear least-squares fit with covariant errors.

    Find x such that v=dot(A,x)-b minimizes dot(v,dot(CV,v)).

    CV should be a positive definite matrix.

    # Cholesky factor CV
    L = np.linalg.cholesky(CV)
    # invert CV
    Li = np.linalg.inv(L)

    return np.linalg.lstsq(np.dot(Li,A),np.dot(Li,b),rcond)

def frequency_response(times, func, fs, cov):
    # using a matrix RHS for least-squares computes least-squares
    # independently for each column, so this just vectorizes over
    # all frequencies
    # note that lstsqchol does a Cholesky factorization every time
    # it is called
    rhsc = np.cos(2*np.pi*fs[None,:]*times[:,None])
    rhss = np.sin(2*np.pi*fs[None,:]*times[:,None])
    fc, r, rk, s = lstsqchol(func, rhsc, cov)
    fs, r, rk, s = lstsqchol(func, rhss, cov)
    rs = np.sqrt(np.sum(fc**2,axis=0)+np.sum(fs**2,axis=0))
    return rs

if __name__=='__main__':

    import matplotlib.pyplot as plt

    ts_len = 65536
    trim_ts_len = 512
    len_years = 50.
    obs_years = 5.
    n_times = 200

    for alpha in [1, 2, 3]:
        M = PowerLaw(alpha)
        ft = M.generate_fft(ts_len,len_years)
        plt.semilogy(np.arange(len(ft))/len_years, np.abs(ft)**2,
                label="True spectrum", color='black')
        ft = trim_spec(ft, trim_ts_len)
        plt.semilogy(np.arange(len(ft))/len_years, np.abs(ft)**2,
                label="Spectrum after trimming to %f years" % 
        plt.xlabel("Frequency (yr$^{-1}$)")
        plt.ylabel("Power spectral density")
        plt.title("$\\alpha = %g$" % alpha)
        plt.savefig("alpha-%g.png" % alpha,dpi=96)

    n = 65536
    for alpha in [1,2,3]:
        M = PowerLaw(alpha)
        lags = np.linspace(0,len_years,n,endpoint=False)
        c = lags<=obs_years
                np.abs(M.autocorrelation(n, len_years))[c],
                label="$\\alpha = %g$" % alpha)
    plt.xlabel("Lag (yr)")

    for i, alpha in enumerate([1,2,3]):
        M = PowerLaw(alpha)
        plt.imshow(np.log(M.covariance_matrix(np.linspace(0, 1, 32),len_years)), 
        plt.xlabel("$\\alpha = %g$" % alpha)
    plt.title("Covariance matrices")

    # Set up model for the rest of the file
    M = PowerLaw(3) + 0.1*PowerLaw(0)
    times = np.random.uniform(0, obs_years, 200)
    cov = M.covariance_matrix(times,len_years)
    root_cov = np.linalg.cholesky(cov)

    plt.imshow(np.log(cov), interpolation='nearest')
    plt.imshow(np.log(np.abs(root_cov)), interpolation='nearest')

    plt.title("Example time series")
    ts = np.linspace(0,len_years,ts_len)
    c = ts<=obs_years
    plt.plot(ts[c], M.generate_ts(ts_len,len_years)[c], 
            linestyle=" ", marker=".")
    plt.xlabel("Time (yr)")
    plt.ylabel("Observed value")

    plt.title("De-whitened white noise")
    plt.plot(times, np.dot(root_cov, np.random.randn(len(times))), 
            linestyle=" ", marker=".")
    plt.xlabel("Time (yr)")

    plt.title("Frequency response of periodogram samples")
    for f_target in [1,3,10]:
        f_target /= obs_years
        fs = np.exp(np.linspace(np.log(0.5/obs_years),np.log(100/obs_years),1000))
        A = np.array([np.sin(2*np.pi*f_target*times),
        rs = frequency_response(times,A,fs,cov)
        plt.loglog(fs, rs)
    plt.loglog(fs, 1./np.sqrt(M.power_spectrum(fs)), linestyle="--")
    plt.xlabel("Frequency (yr$^{-1}$)")
    plt.ylabel("Amplitude response")

    signal = np.zeros(len(times))
    signal -= 1
    signal[times>obs_years/2.] += 2
    fakedata = np.dot(root_cov, np.random.randn(len(times)))
    fakedata /= np.std(fakedata)
    fakedata -= np.mean(fakedata)
    fakedata += signal

    plt.title("Spectral response of fitting a step function")
    # Fitting a step function to the fake data
    fs = np.exp(np.linspace(np.log(0.5/obs_years),np.log(100/obs_years),1000))
    plt.loglog(fs, frequency_response(times, signal[:,None], fs, cov), 
        label="With covariance")
    plt.loglog(fs, frequency_response(times, signal[:,None], fs, 
        label="Without covariance")
    plt.loglog(fs, 1./np.sqrt(M.power_spectrum(fs)), linestyle="--")
    plt.xlabel("Frequency (yr$^{-1}$)")
    plt.ylabel("Amplitude response")

    plt.title("Simulated noise plus a phase jump")

    plt.plot(times, fakedata, label="Simulated data", linestyle=" ", marker=".")

    k1, r, rk, s  = np.linalg.lstsq(signal[:,None], fakedata)
    plt.plot(times, k1*signal, label="Fit without covariance")

    k2, r, rk, s  = lstsqchol(signal[:,None], fakedata, cov)
    plt.plot(times, k2*signal, label="Fit with covariance")

    plt.plot(times, signal, linestyle="--", label="True jump")
    plt.xlabel("Time (yr)")


Fernando said...

Awesome post, as usual. Anne, we've just merged the IPython notebook machinery a minute ago, it would be cool if you eventually give it a shot, to turn these blog posts into notebooks. They'd let you have a more natural flow between explanation (with math), code and figures...

If you are interested and are willing to give it a try, let us know and we'd be happy to help out with any install glitches you could run into.

Anne M. Archibald said...

Yes, I saw the announcement. I haven't yet installed it for a sort of silly reason: it won't be available on the computers at work and I'm worried if I start using it on my laptop it'll be hard to go back... but yes, it does seem like a good solution (modulo the fact that blogspot accepts something that is not quite HTML; for example it's not quite clear what to do with embedded images).

Anne M. Archibald said...

Can I also just say I wish you hadn't used Lena as your test image on the homepage? I know it's a standard, but it's also from Playboy...

Fernando said...

Sorry about using scipy.lena(), I certainly didn't intend to offend you, or anyone else, with it... I was looking for an image to mark a visual contrast with the black text console, and that would be available to any scipy user so they could run the same one-liner locally.

And unfortunately, lena is the only image included in scipy itself. In fact, I would much rather have the 'baboon' image also used in the image processing literature, because it's typically shown in true-color (scipy.lena() is grayscale), but we don't have it in scipy...

I didn't really stop to think about its origin. I'm not claiming ignorance (I'd read about it a long time ago), but having seen it in so many papers I probably just grew used to it as a reference dataset, without pausing to give it any further consideration.

All that said, sorry for any offense my choice may have caused, please accept my apology. I didn't actually create the screenshot (I don't know what tool was used to create the 3d effect on the windows) but I can ask Thomas to produce a replacement one with different data. Jarrod actually started looking into removing scipy.lena() altogether and adding other standard images to scipy. While we don't need to carry a large database, at least a few commonly used ones (e.g. baboon, photographer, lady-with-hat) would be handy to have always in there and would offer alternatives to persisting use of scipy.lena().

As a side note, today at Euroscipy in Paris, I saw lena on an image processing poster that probably used it for the same reason. This seems to indicate that having it as the only reference, usable image in scipy will only help perpetuate its use...

ps- I hadn't meant to ignore your comment; I never clicked on 'email follow-ups' so I only noticed it now because of your reply on g+.

Anne M. Archibald said...

I didn't know it came with scipy. Bleah. It's not a disaster and we feminists are not coming with pitchforks and torches, but it is one of those little things that add up to make women feel less welcome in computer science.

It's good to have at least one grayscale image in the examples, maybe at least one grayscale and at least one RGB. I also remember there was a discussion about downloadable datasets for scipy; did that ever go anywhere?