### Curve fitting part 5: PyMC

I previously talked about fitting a curve to data — specifically, a sinusoid-plus-constant to a distribution of photon arrival times. I wrote some special-purpose Bayesian code to evaluate the relevant integrals directly so that I could work out the posterior distribution of fitted parameters. As a bonus I was also able to test the hypothesis that the photons were pulsed. Unfortunately, the method I used really doesn't generalize to more complicated situations. Fortunately, there is a toolkit that does: PyMC.
PyMC provides tools to implement Markov Chain Monte Carlo algorithms. These evaluate probability integrals numerically by generating samples from the distribution. They do this fairly efficiently by "jumping" semi-randomly from point to point in the space, but jumping preferentially towards points that are more likely. The algorithm is carefully arranged so that, after a certain number of steps, the initial guess is forgotten and the points are correctly distributed. They are correlated, but the correlations can be reduced or eliminated by subsampling the sequence. It's complex, finicky code. Fortunately I don't need to write it, because the algorithm is sufficiently generic that with PyMC all I have to do is specify the model I'm trying to fit. And, sure enough, it works. Code and results below the jump. The short version is: PyMC is easy to use and well-documented.
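To make the "jumping semi-randomly, preferentially towards more likely points" idea concrete, here is a minimal random-walk Metropolis sampler, the simplest member of the family of algorithms PyMC implements. This is my own illustrative sketch, not PyMC's internals; the target here is a standard normal, and the `burn`/`thin` slicing at the end mirrors the burn-in and subsampling described above.

```python
import numpy as np

def metropolis(logp, x0, n_steps, step_size, rng):
    """Minimal random-walk Metropolis sampler (illustration only).

    logp: log of the (unnormalized) target density.
    Proposes Gaussian jumps; always accepts moves to higher probability,
    and accepts moves to lower probability with probability
    exp(logp_new - logp_old), so the chain samples the target.
    """
    samples = np.empty(n_steps)
    x, lp = x0, logp(x0)
    for i in range(n_steps):
        x_new = x + step_size * rng.standard_normal()
        lp_new = logp(x_new)
        if np.log(rng.random()) < lp_new - lp:  # accept/reject step
            x, lp = x_new, lp_new
        samples[i] = x  # on rejection the chain repeats its position
    return samples

rng = np.random.default_rng(0)
# Target: standard normal, whose log-density is -x^2/2 up to a constant.
# Start far from the mode to show that burn-in erases the initial guess.
trace = metropolis(lambda x: -0.5 * x**2, x0=5.0, n_steps=20000,
                   step_size=1.0, rng=rng)
burned = trace[2000::10]  # discard burn-in, thin to reduce correlation
```

After burn-in and thinning, `burned` has mean near 0 and standard deviation near 1, as the target requires.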
The second half of my previous post, model selection, turns out to be a little trickier to implement with Markov Chain Monte Carlo; I'll address that in a later post.
In [1]:
from IPython.zmq.pylab import backend_inline
cfg = backend_inline.InlineBackendConfig.instance()
cfg.figure_format = 'svg' # 'png' to switch back

import numpy as np
import matplotlib.pyplot as plt
plt.gray();
import pymc
In [2]:
def cyclic_plot(y,**kwargs):
    z = np.zeros(len(y)+1)
    z[:-1] = y
    z[-1] = z[0]
    plt.plot(np.linspace(0,1,len(z)), z, drawstyle="steps-post", **kwargs)
We'll start with a very simple pulse profile model: a sinusoid contributing a fraction $f$ of the total flux and with its peak at phase $p$; the total flux will be $t$.

The observed values will be photon counts in $n_{bins}$ phase bins. The model predicts a rate $r_i$ in the $i$th bin, and the observed count $c_i$ is Poisson-distributed with mean $r_i$.

The rate in bin $i$ is
$$t\left(\frac{1-f}{n}+f\int_{i/n}^{(i+1)/n}\left(\cos(2\pi(x-p))+1\right)\,dx\right)=t\left(\frac{1}{n}+\frac{f}{2\pi}\bigl(\sin(2\pi((i+1)/n-p))-\sin(2\pi(i/n-p))\bigr)\right)$$
In [3]:
def model_bins(f,p,t,n):
    return t*(1./n+f*np.diff(np.sin(2*np.pi*(np.linspace(0,1,n+1)-p)))/(2*np.pi))
cyclic_plot(model_bins(1.0,0.6,10.,130));
np.sum(model_bins(1.0,0.6,10.,130))

Out[3]:
10.0

In [4]:
n_bins = 64
t_expected = 500 # photons

true_f = 0.5
true_p = 0.3
true_t = 512

true_r = model_bins(true_f, true_p, true_t, n_bins)
data = np.random.poisson(true_r)
cyclic_plot(true_r)
cyclic_plot(data)


Now we need to define distributions for all the variables. In many cases these are priors. For $f$ and $p$ we'll use uniform distributions on $(0,1)$. The total flux is more of a puzzle; maybe exponential with some expected flux $t_{expected}$?
In [5]:
f = pymc.Uniform('f',lower=0.,upper=1.)
p = pymc.Uniform('p',lower=0.,upper=1.)
# This doesn't work for some reason
#t = pymc.Exponential('t',beta=t_expected)
t = pymc.Uniform('t',lower=t_expected/2.,upper=t_expected*2)

@pymc.deterministic(plot=False)
def r(f=f, p=p, t=t):
    return model_bins(f,p,t,n_bins)

c = pymc.Poisson('c', mu=r, value=data, observed=True)
In [6]:
M = pymc.MCMC([f,p,t,r,c])
In [7]:
M.sample(burn=2000, iter=10000, thin=10)
In [8]:
plt.hist(M.trace('p')[:]);
plt.axvline(true_p);
np.std(M.trace('p')[:])
Out[8]:
0.016891632699724057
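Besides the standard deviation, the trace also gives a credible interval directly: just take percentiles of the samples. A minimal sketch, using a synthetic normally-distributed array as a stand-in for the real `M.trace('p')[:]` (the location and scale below are made up to resemble the posterior for $p$):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for a thinned MCMC trace of p; a real analysis would use
# trace = M.trace('p')[:] from the sampler above.
trace = rng.normal(loc=0.3, scale=0.017, size=800)

# The central 95% of the sampled values is a 95% credible interval.
lo, hi = np.percentile(trace, [2.5, 97.5])
```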

In [9]:
plt.hist(M.trace('f')[:]);
plt.axvline(true_f);
In [10]:
plt.hist(M.trace('t')[:]);
plt.axvline(true_t);
In [11]:
pymc.Matplot.plot(M)
Plotting f
Plotting t
Plotting p