### Curve fitting part 5: PyMC

I previously talked about fitting a curve to data — specifically, a sinusoid-plus-constant to a distribution of photon arrival times. I wrote some special-purpose Bayesian code to evaluate the relevant integrals directly so that I could work out the posterior distribution of fitted parameters. As a bonus I was also able to test the hypothesis that the photons were pulsed. Unfortunately, the method I used really doesn't generalize to more complicated situations. Fortunately, there is a toolkit that does: PyMC.
PyMC provides tools to implement Markov Chain Monte Carlo algorithms. These evaluate probability integrals numerically by generating samples from the distribution. They do this fairly efficiently by "jumping" semi-randomly from point to point in the space, but jumping preferentially towards points that are more likely. The algorithm is carefully arranged so that, after a certain number of steps, the initial guess is forgotten and the points are correctly distributed. They are correlated, but the correlations can be reduced or eliminated by subsampling the sequence. It's complex, finicky code. Fortunately I don't need to write it, because the algorithm is sufficiently generic that with PyMC all I have to do is specify the model I'm trying to fit. And, sure enough, it works. Code and results below the jump. The short version is: PyMC is easy to use and well-documented.
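To make the "jumping semi-randomly, preferentially towards more likely points" idea concrete, here is a minimal random-walk Metropolis sampler, the simplest member of the family of algorithms PyMC implements. This is my own illustrative sketch, not PyMC's internals; the target here is a standard normal, and the `burn`/`thin` slicing at the end mirrors the burn-in and subsampling described above.

```python
import numpy as np

def metropolis(logp, x0, n_steps, step_size, rng):
    """Minimal random-walk Metropolis sampler (illustration only).

    logp: log of the (unnormalized) target density.
    Proposes Gaussian jumps; always accepts moves to higher probability,
    and accepts moves to lower probability with probability
    exp(logp_new - logp_old), so the chain samples the target.
    """
    samples = np.empty(n_steps)
    x, lp = x0, logp(x0)
    for i in range(n_steps):
        x_new = x + step_size * rng.standard_normal()
        lp_new = logp(x_new)
        if np.log(rng.random()) < lp_new - lp:  # accept/reject step
            x, lp = x_new, lp_new
        samples[i] = x  # on rejection the chain repeats its position
    return samples

rng = np.random.default_rng(0)
# Target: standard normal, whose log-density is -x^2/2 up to a constant.
# Start far from the mode to show that burn-in erases the initial guess.
trace = metropolis(lambda x: -0.5 * x**2, x0=5.0, n_steps=20000,
                   step_size=1.0, rng=rng)
burned = trace[2000::10]  # discard burn-in, thin to reduce correlation
```

After burn-in and thinning, `burned` has mean near 0 and standard deviation near 1, as the target requires.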
The second half of my previous post, model selection, turns out to be a little trickier to implement with Markov Chain Monte Carlo; I'll address that in a later post.
In [1]:
from IPython.zmq.pylab import backend_inline
cfg = backend_inline.InlineBackendConfig.instance()
cfg.figure_format = 'svg' # 'png' to switch back

import numpy as np
import matplotlib.pyplot as plt
plt.gray();
import pymc
In [2]:
def cyclic_plot(y,**kwargs):
    z = np.zeros(len(y)+1)
    z[:-1] = y
    z[-1] = z[0]
    plt.plot(np.linspace(0,1,len(z)), z, drawstyle="steps-post", **kwargs)
We'll start with a very simple pulse profile model: a sinusoid contributing a fraction $f$ of the total flux and with its peak at phase $p$; the total flux will be $t$.

The observed values will be photon counts in $n_{bins}$ phase bins. The model predicts a rate $r_i$ in the $i$th bin, and the observed count $c_i$ is Poisson-distributed with mean $r_i$.

The rate in bin $i$ is
$$t\left(\frac{1-f}{n}+f\int_{i/n}^{(i+1)/n}\left(\cos(2\pi(x-p))+1\right)\,dx\right)=t\left(\frac{1}{n}+\frac{f}{2\pi}\bigl(\sin(2\pi((i+1)/n-p))-\sin(2\pi(i/n-p))\bigr)\right)$$
In [3]:
def model_bins(f,p,t,n):
    return t*(1./n+f*np.diff(np.sin(2*np.pi*(np.linspace(0,1,n+1)-p)))/(2*np.pi))
cyclic_plot(model_bins(1.0,0.6,10.,130));
np.sum(model_bins(1.0,0.6,10.,130))

Out[3]:
10.0

In [4]:
n_bins = 64
t_expected = 500 # photons

true_f = 0.5
true_p = 0.3
true_t = 512

true_r = model_bins(true_f, true_p, true_t, n_bins)
data = np.random.poisson(true_r)
cyclic_plot(true_r)
cyclic_plot(data)


Now we need to define distributions for all the variables. In many cases these are priors. For $f$ and $p$ we'll use uniform distributions on $(0,1)$. The total flux is more of a puzzle; maybe exponential with some expected flux $t_{expected}$?
In [5]:
f = pymc.Uniform('f',lower=0.,upper=1.)
p = pymc.Uniform('p',lower=0.,upper=1.)
# This doesn't work for some reason
#t = pymc.Exponential('t',beta=t_expected)
t = pymc.Uniform('t',lower=t_expected/2.,upper=t_expected*2)

@pymc.deterministic(plot=False)
def r(f=f, p=p, t=t):
    return model_bins(f,p,t,n_bins)

c = pymc.Poisson('c', mu=r, value=data, observed=True)
In [6]:
M = pymc.MCMC([f,p,t,r,c])
In [7]:
M.sample(burn=2000, iter=10000, thin=10)
In [8]:
plt.hist(M.trace('p')[:]);
plt.axvline(true_p);
np.std(M.trace('p')[:])
Out[8]:
0.016891632699724057
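Besides the standard deviation, the trace also gives a credible interval directly: just take percentiles of the samples. A minimal sketch, using a synthetic normally-distributed array as a stand-in for the real `M.trace('p')[:]` (the location and scale below are made up to resemble the posterior for $p$):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for a thinned MCMC trace of p; a real analysis would use
# trace = M.trace('p')[:] from the sampler above.
trace = rng.normal(loc=0.3, scale=0.017, size=800)

# The central 95% of the sampled values is a 95% credible interval.
lo, hi = np.percentile(trace, [2.5, 97.5])
```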

In [9]:
plt.hist(M.trace('f')[:]);
plt.axvline(true_f);
In [10]:
plt.hist(M.trace('t')[:]);
plt.axvline(true_t);
In [11]:
pymc.Matplot.plot(M)
Plotting f
Plotting t
Plotting p