PyMC provides tools to implement Markov Chain Monte Carlo algorithms. These evaluate probability integrals numerically by generating samples from the distribution. They do this fairly efficiently by "jumping" semi-randomly from point to point in the space, but jumping preferentially towards points that are more likely. The algorithm is carefully arranged so that, after a certain number of steps, the initial guess is forgotten and the points are correctly distributed. They are correlated, but the correlations can be reduced or eliminated by subsampling the sequence. It's complex, finicky code. Fortunately I don't need to write it, because the algorithm is sufficiently generic that with PyMC all I have to do is specify the model I'm trying to fit. And, sure enough, it works. Code and results below the jump. The short version is: PyMC is easy to use and well-documented.
The second half of my previous post, model selection, turns out to be a little trickier to implement with Markov Chain Monte Carlo; I'll address that in a later post.
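To make the idea concrete, here's a minimal sketch of the kind of sampler PyMC automates: a random-walk Metropolis step for a one-dimensional log-density. This is illustrative only, not PyMC's actual implementation; the function name, step size, and test density are all arbitrary choices.

import numpy as np

def metropolis(logp, x0, n_steps, step_size=1.0, burn=0, thin=1):
    # Jump semi-randomly; always accept jumps to more likely points,
    # accept jumps to less likely points only with probability exp(dlogp)
    x = x0
    samples = []
    for i in range(n_steps):
        x_new = x + step_size*np.random.randn()
        if np.log(np.random.rand()) < logp(x_new) - logp(x):
            x = x_new
        # Discard the burn-in and subsample to reduce correlations
        if i >= burn and (i - burn) % thin == 0:
            samples.append(x)
    return np.array(samples)

# For example, draw from a standard normal:
# metropolis(lambda x: -0.5*x**2, 0.0, 10000, burn=2000, thin=10)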
In [1]:
# Configure IPython to render inline figures as SVG
from IPython.zmq.pylab import backend_inline
cfg = backend_inline.InlineBackendConfig.instance()
cfg.figure_format = 'svg' # 'png' to switch back

import numpy as np
import matplotlib.pyplot as plt
plt.gray();
import pymc
In [2]:
def cyclic_plot(y, **kwargs):
    # Repeat the first bin at the end so the stepped profile wraps around phase 1
    z = np.zeros(len(y)+1)
    z[:-1] = y
    z[-1] = z[0]
    plt.plot(np.linspace(0,1,len(z)), z, drawstyle="steps-post", **kwargs)
We'll start with a very simple pulse profile model: a sinusoid contributing a fraction $f$ of the total flux, with its peak at phase $p$; the total flux will be $t$. The observed values will be photon counts in $n$ phase bins. The model predicts a rate $r_i$ in the $i$th bin, and the observed count $c_i$ is Poisson-distributed with mean $r_i$. The rate in bin $i$ is
$$t\left(\frac{1-f}{n}+f\int_{i/n}^{(i+1)/n}\left(\cos(2\pi(x-p))+1\right)\,dx\right)=t\left(\frac{1}{n}+\frac{f}{2\pi}\left(\sin\left(2\pi\left(\frac{i+1}{n}-p\right)\right)-\sin\left(2\pi\left(\frac{i}{n}-p\right)\right)\right)\right).$$
In [3]:
def model_bins(f, p, t, n):
    # Integrate the model rate over each of n equal phase bins
    return t*(1./n + f*np.diff(np.sin(2*np.pi*(np.linspace(0,1,n+1)-p)))/(2*np.pi))

cyclic_plot(model_bins(1.0, 0.6, 10., 130));
np.sum(model_bins(1.0, 0.6, 10., 130))
Out[3]: [≈ 10.0; the bin rates sum to the total flux $t$, confirming the normalization]
In [4]:
# Simulate a fake data set; the true values needn't match our expectations
n_bins = 64
t_expected = 500 # photons
true_f = 0.5
true_p = 0.3
true_t = 512
true_r = model_bins(true_f, true_p, true_t, n_bins)
data = np.random.poisson(true_r)
cyclic_plot(true_r)
cyclic_plot(data)
Now we need to define distributions for all the variables; for the unobserved parameters, these are the priors. For $f$ and $p$ we'll use uniform distributions on $(0,1)$. The total flux is more of a puzzle; maybe exponential with expected value $t_{\text{expected}}$?
In [5]:
f = pymc.Uniform('f', lower=0., upper=1.)
p = pymc.Uniform('p', lower=0., upper=1.)
# This doesn't work for some reason
#t = pymc.Exponential('t', beta=t_expected)
t = pymc.Uniform('t', lower=t_expected/2., upper=t_expected*2)

# The predicted bin rates are a deterministic function of the parameters
@pymc.deterministic(plot=False)
def r(f=f, p=p, t=t):
    return model_bins(f, p, t, n_bins)

# The observed counts are Poisson-distributed with mean r
c = pymc.Poisson('c', mu=r, value=data, observed=True)
In [6]:
M = pymc.MCMC([f,p,t,r,c])
In [7]:
M.sample(burn=2000, iter=10000, thin=10) # 10000 steps: discard the first 2000, keep every 10th of the rest
In [8]:
plt.hist(M.trace('p')[:]);
plt.axvline(true_p);
np.std(M.trace('p')[:])
Out[8]:
In [9]:
plt.hist(M.trace('f')[:]);
plt.axvline(true_f);
In [10]:
plt.hist(M.trace('t')[:]);
plt.axvline(true_t);
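Beyond histograms, the traces give point estimates and error bars directly. Here's a quick sketch using the same trace arrays as above; the printed numbers will of course vary from run to run:

for name, true in [('f', true_f), ('p', true_p), ('t', true_t)]:
    tr = M.trace(name)[:]
    print name, "=", tr.mean(), "+/-", tr.std(), "(true value:", true, ")"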