How tempo2 does its fitting

Pulsars can be difficult objects to study: for example, their radio pulses can randomly change in brightness, turn off, turn on, change shape, and we really don't know why. Nevertheless there has been some excellent science done by studying those very radio pulses. The trick has mostly been to simply not care how bright they are or what shape they have and focus on when they arrive. Since this comes from the rotation of the pulsar, this tends to be very regular. After all, for a ball 10 km across, with more mass than the sun, smoothed to within a millimeter by its own gravity, it takes an awful lot to change how fast it's spinning. What's more, time is the quantity we can make the best measurements of - world time standards drift by something like microseconds over decades, which is something like one part in ten to the fourteen. So pulsar timing is a powerful technique, that can measure pulsar positions and distances, spin-down rates and braking indices, and binary orbits. The standard software has been tempo, which is written in FORTRAN and has certain limitations. A new tool has recently appeared, tempo2, written in C++ and boasting good handling of timing effects down to the nanosecond level. Unfortunately the documentation on this tool is so far somewhat limited, so I've been figuring out how it works. I'd like to describe it, as best I understand it, here. This particular post will talk about how a timing solution is fit to a set of pulse arrival times.

Let's suppose you've observed a pulsar. You've built a reference pulse profile and used it to calculate the precise time of arrival (TOA) of a number of pulses from the pulsar. Now you want to use these arrival times to measure, say, the spin period of the pulsar (it could as easily be any other parameter that affects the timing, like spin-down rate, binary period, or position).

In principle, the procedure is simple: for each plausible spin period, you predict the phase at the time each pulse arrived. Since the pulse arrived at phase zero, the size of the non-zero prediction - the residual - serves as a measure of how far wrong your model was. Hopefully your procedure for calculating the arrival times yields an uncertainty, so that you can see how likely the error is to be due to the random scatter we expect from a noisy signal. You can then build a measure of total size of all the residuals - a chi-squared, probably. Then you vary the spin period (or whatever) to try to minimize the chi-squared. Simple, right?

Well, yes and no. If your model is too far wrong, it may predict the wrong number of turns; since all you have are arrival phases, your residuals are modulo one turn.  This means that, for example, if you started from the best-fit solution and began changing your parameter, the fit would get worse smoothly up to a point - but then it would suddenly start decreasing as your error wrapped around. Once the number of turns becomes ambiguous, your minimization problem begins to become very rough.

Traditionally, tempo has left the number-of-turns issue to the user. If the number of turns is wrong, well, tough luck. tempo simply assumes the number of turns is whatever is closest to your initial model's prediction. There are various schemes to change this number of turns that tempo assumes, but tempo leaves the problem up to you. In this tempo2 takes the same approach.

The second problem with this minimization is that it's a very large problem - one can easily have tens of parameters to fit, and thousands of residuals to try to match. On the other hand, the problem is very nearly linear (aside from that nasty phase wrapping, which is your problem, not tempo/tempo2's) - while the ways the parameters affect the residuals can be complicated, they rarely interact very much. So tempo and tempo2 treat the problem as a linear one. This means that there's no need for complex nonlinear optimizers.

The way tempo2 does the actual fitting is this:

  1. Take the initial guess at model parameters and form residuals for each TOA.
  2. Adjust these residuals using any PHASE or TRACK commands the user specified; this locks in the number of turns between each pair.
  3. Compute the partial derivatives of the residuals with respect to each parameter.
  4. Use an SVD to do a linear least-squares fit, finding the parameter offsets that minimize the residuals.
  5. Output the modified model.
Tempo2 can be told to repeat this process a number of times; this is necessary in principle to take into account nonlinearities in the model. In fact, iterating this process is technically the steepest-descent optimization algorithm, which is, frankly, bad. But for a very-nearly-linear problem, in which the objective function and its derivatives are very expensive, this gives an approximate solution much less slowly than a "better" algorithm.

What could go wrong with this? Well, nonlinearities are bad news. Phase wraps, if present on input, make fitting pointless, but at least the linear approximation ensures that they don't appear after you've started. It's also the case that tempo2 doesn't actually check whether the "best-fit" model is better than the initial model. But the problem I recently tripped over is, I think, numerical: although the SVD is a pretty good way to do least-squares fitting, you can still have problems if the matrix of partial derivatives has entries of wildly different sizes. Unfortunately, this can easily happen, since tempo's standard way of representing something that varies is as a truncated Taylor series. The magnitude of the coefficient of x^10/10! can easily be 10^-40 even for a rather modest polynomial.

How does tempo2 deal with this? After all, it includes a model for the pulsar's spin frequency that accepts up to twelve derivatives. Well, in fact it scales those derivatives during fitting so that they, hopefully, have reasonable sizes. I am hoping that there is some general mechanism for accomplishing this.


Daniel Lakeland said...

I don't really understand what the specific problem is that Tempo2 solves (for example what the data looks like or what the models look like), and obviously you are not the author so you are constrained as to what to do about it.

However, if I were to try to do something like what you're talking about I would be very tempted to use a technique called "Simultaneous Perturbation Stochastic Approximation" (SPSA).

Also, I would be very tempted to use both nondimensionalization, and orthogonal polynomials (to minimize the problem of varying numerical scales)

Dunno if any of those ideas are helpful to you but hopefully at least they are interesting.

Anne M. Archibald said...

In the simplest possible terms, tempo2 doing least-squared fitting of models to data; the key wrinkle is that rather than simply being observed values with error bars, the observed values are phases, that is, reduced modulo 1. As a result, model guesses that are even a little off result in "phase wraps", and the objective function is stuck in a false minimum. So tempo2's optimizer assumes that the parameter steps are so small that the fitting function can be treated as linear. Thus optimization proceeds with a single function evaluation and a single evaluation of the partial derivative matrix. There's not really any way a normal optimizer can function with such a low call count.

SPSA in particular looks impressive, but it solves a number of problems we don't have - for example, the only noise in our model evaluations is long double roundoff error. It's possible that it could navigate the extremely nasty optimization landscape introduced by phase wraps, but I doubt it, and in any case a scheme that has explicit knowledge of the phase jumps would almost certainly be more reliable.

Nondimensionalization is indeed interesting; in fact I implemented a limited version of it last night. Didn't solve my convergence headaches, but it sure did improve the condition numbers.

Orthogonal polynomials would be a much better way to represent varying quantities, as would Fourier series or splines or almost anything but truncated Taylor series. Unfortunately, the fitted models are published as primary results, so the models we use have to be simple, clear, easily implementable, and familiar to other pulsar astronomers. More complicated models will be a pretty hard sell.