Differentiating the solutions to differential equations

A kind of problem that turns up quite often in physics and elsewhere is to find the solution of an ordinary differential equation, given some initial conditions. That is, you have a system with some state $x$ represented as a vector of real numbers, you have an initial state $x_0$, and you have a rule describing the evolution of the state: $$ \frac{dx}{dt} = F(x,t) $$ And your goal is to find $x(t)$. This standard problem has some standard solution techniques, some quite advanced - Runge-Kutta methods, symplectic methods, Hermite integrators. A few are implemented in scipy. But it sometimes happens that solving this problem is only part of a more complicated process, say of fitting, where it would be nice to have the derivatives of the solution with respect to the various initial conditions. It turns out this isn't too hard to work out, usually.

The trick is to think of it as the problem of transporting a vector along the solution. That is, in addition to the system state $x$, you also have a vector $v$. As $x$ evolves, so does $v$. Notionally, $v$ could be evolved by giving it some infinitesimal length $\delta$ and then evolving its tip ($x+\delta v$) as well as its tail ($x$). Actually doing this numerically is possible, but very vulnerable to discontinuities in the differential equation solver (for example when the step size changes). But mathematically, it turns out that $$ \frac{dv}{dt} = DF(x,t)\,v $$ where $DF$ is the Jacobian of $F$, that is, the matrix whose $i,j$th entry is the derivative of the $i$th component of $F$ with respect to the $j$th component of $x$. But this means that you can simply view the problem as a much-higher-dimensional problem whose state is $x$ together with any partial derivative vectors $v_k$, and where the right-hand side function gives the derivative of $x$ as $F(x,t)$ and the derivative of each $v_k$ as the matrix-vector product $DF(x,t)\cdot v_k$.
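Written out as a single enlarged system (just restating the above), the combined state of $x$ and the $v_k$ evolves as $$ \frac{d}{dt}\begin{pmatrix} x \\ v_1 \\ \vdots \\ v_k \end{pmatrix} = \begin{pmatrix} F(x,t) \\ DF(x,t)\,v_1 \\ \vdots \\ DF(x,t)\,v_k \end{pmatrix} $$ and this is exactly the form an off-the-shelf solver can handle.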

Going to high dimension in an ODE solver definitely makes the problem harder. For implicit solvers, there is a certain amount of matrix inversion going on, which can become really quite expensive. But solvers for non-stiff problems are probably not much hurt by this expansion. Importantly, the evolution of $x$ (and the Jacobian that drives the $v_k$) depends only on $x$, not on the $v_k$ themselves, so the step sizes are probably not much affected by the expansion.

One final comment: if you are integrating from $t_0$ to $t_1$, you might also want the derivatives with respect to $t_0$ or $t_1$, if your problem allows modifying those too. Fortunately, that's pretty easy: the derivative with respect to $t_1$ is just $F(x(t_1),t_1)$. The derivative with respect to $t_0$ can be obtained by transporting $-F(x(t_0),t_0)$ along the curve as above, since moving the start time forward by $dt_0$ while holding the initial state fixed is the same as starting at $t_0$ from the state $x_0 - F(x_0,t_0)\,dt_0$.
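Written out, with $\Phi(t_1,t_0)$ denoting the linear map that this transport defines (my notation, not anything used in the code below), the two time derivatives are $$ \frac{\partial x(t_1)}{\partial t_1} = F(x(t_1),t_1), \qquad \frac{\partial x(t_1)}{\partial t_0} = -\Phi(t_1,t_0)\,F(x(t_0),t_0). $$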

Let's try this on a sample problem, that of a particle moving in a Keplerian orbit.
In [5]:
import numpy as np
import matplotlib.pyplot as plt

import scipy.integrate
In [8]:
def acceleration(r):
    # Inverse-square attraction toward the origin (units with GM = 1).
    return -r/np.dot(r,r)**(3./2)
def deriv(rv, t):
    # State is (position, velocity); its time derivative is (velocity, acceleration).
    d = np.zeros_like(rv)
    d[:3] = rv[3:]
    d[3:] = acceleration(rv[:3])
    return d
In [2]:
eccentric_initial_rv = np.array([1,0,0,0,0.5,0])
In [3]:
def plot_orbits(ts, xs):
    plt.plot(xs[:,0], xs[:,1])
    plt.gca().set_aspect('equal')
In [51]:
ts = np.linspace(0,2*np.pi,1000)
posns = scipy.integrate.odeint(deriv, eccentric_initial_rv, ts, atol=1e-10)
In [52]:
plot_orbits(ts,posns[:,:3])
So far so good - a nice ellipse, just what we should see. (It's actually a couple of turns, since the orbital period is well under $2\pi$, but no matter.) Now let's try to work out the partial derivatives. First we need the Jacobian:
In [28]:
def jacobian(rv, t):
    # Jacobian of deriv: the upper-right identity block is d(dr/dt)/d(velocity);
    # the lower-left block is d(acceleration)/d(position).
    J = np.zeros((6,6), dtype=rv.dtype)
    r = rv[:3]
    rl = np.dot(r,r)  # |r|^2
    for i in range(3):
        J[i,3+i] = 1
        for j in range(3):
            J[3+i,j] = r[i]*r[j]*3*rl**(-5./2)
            if i==j:
                J[3+i,j] += -rl**(-3./2)
    return J
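For reference, what the loop above fills in is the derivative of the acceleration $a = -r/|r|^3$ with respect to position (the lower-left block), $$ \frac{\partial a_i}{\partial r_j} = \frac{3\,r_i r_j}{|r|^5} - \frac{\delta_{ij}}{|r|^3}, $$ plus an identity block recording that the derivative of position is velocity.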
Just to check, since these days my calculus is as sloppy as my arithmetic:
In [85]:
rv = np.array([1,2,3,4,5,6], dtype=float)
delta = 1e-6

n_jac = np.zeros((6,6))
for j in range(6):
    drv = rv.copy()
    drv[j] += delta
    n_jac[:,j] = (deriv(drv,0)-deriv(rv,0))/delta

print(n_jac[3:,:3])

print()

print(jacobian(rv, 0)[3:,:3])

print()

print((jacobian(rv, 0) - n_jac)[3:,:3])
[[-0.01499935  0.00818147  0.0122722 ]
 [ 0.00818147 -0.00272715  0.02454439]
 [ 0.0122722   0.0245444   0.01772651]]

[[-0.01499936  0.00818147  0.0122722 ]
 [ 0.00818147 -0.00272716  0.0245444 ]
 [ 0.0122722   0.0245444   0.01772651]]

[[ -5.40835639e-09   8.75914105e-10   4.53004848e-09]
 [ -2.62822732e-09  -6.42595358e-09   9.06009696e-09]
 [ -3.94581042e-09   2.62774232e-09   1.31494680e-09]]
Okay. Now we're going to need to convert a position and list of partial derivatives into a single vector, since that's what odeint wants for a state vector.
In [23]:
def pack_vec(rv, vecs):
    vecs = np.asarray(vecs)
    packed = np.zeros(6*(vecs.shape[0]+1))
    packed[:6] = rv
    packed[6:] = vecs.flatten()
    return packed

def unpack_vec(packed):
    rv = packed[:6]
    vecs = packed[6:].reshape(-1,6)
    return rv, vecs
In [24]:
p = pack_vec([0,0,0,0,0,0],[[1,0,0,0,0,0],[1,0,0,0,0,0]])
print(p)
print(unpack_vec(p))
[ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]
(array([ 0.,  0.,  0.,  0.,  0.,  0.]), array([[ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.]]))
Here's where the magic happens: the position and velocity vector evolves just like before, and the partial derivatives evolve according to the Jacobian:
In [71]:
def extended_deriv(rvv, t):
    rv, vecs = unpack_vec(rvv)
    drv = deriv(rv, t)
    jac = jacobian(rv, t)
    dvecs = np.dot(jac, vecs.T).T
    return pack_vec(drv, dvecs)
For simplicity, we'll just test the derivatives at one point, specifically at $t=2\pi$.
In [72]:
t0 = 0
t1 = 2*np.pi
The simplest list of partial derivative vectors to track is all six coordinate unit vectors, that is, the identity matrix. (Of course, testing with a rectangular matrix is a good way to catch stupid bugs. Just speaking hypothetically, you understand.)
In [89]:
vecs = np.eye(6)
posns2, extra = scipy.integrate.odeint(extended_deriv,  
    pack_vec(eccentric_initial_rv, vecs),
    [t0,t1],
    mxstep = 1000,
    full_output=1)

final_rv, final_deriv = unpack_vec(posns2[-1])

print(final_rv)

print(final_deriv)

print("Number of steps:", extra['nst'])
[ 0.60095524  0.36035579  0.         -1.02853524  0.21525949  0.        ]
[[ 11.51684499  -1.73020275   0.          19.15892536  11.15640367   0.        ]
 [  0.20526243   0.65775946   0.           0.87751988  -0.67308395   0.        ]
 [  0.           0.           0.60095524   0.           0.          -1.02853524]
 [  1.13123643   0.11360844   0.           2.18555873   0.71090257   0.        ]
 [  4.88779405  -0.24699966   0.           8.72864758   5.57000757   0.        ]
 [  0.           0.           0.72071158   0.           0.           0.43051898]]
Number of steps: [570]
Let's quickly check that a simple integration without any partial derivatives winds up at the same place. In addition, it's worth comparing to see how many additional steps the integrator has to take to get the same estimated accuracy.
In [87]:
posns3, extra = scipy.integrate.odeint(deriv,  
    eccentric_initial_rv,
    [t0,t1],
    mxstep = 1000,
    full_output=1)

final_rv_simple = posns3[-1]
print(final_rv_simple)
print("Number of steps:", extra['nst'])
[ 0.60094988  0.3603566   0.         -1.02854399  0.21525322  0.        ]
Number of steps: [400]
It turns out that adding the partial derivatives forces the integrator to take roughly half again as many steps (570 versus 400). Not too bad, speed-wise, really.
Now let's check that these derivatives are actually right: just solve the problem with slightly different initial conditions and see where it winds up. Getting a really good answer out of this is a finicky business, since it normally involves subtracting two similar quantities (the final positions); the amount by which you change the initial conditions has to be tuned to balance error due to non-linearities in the problem against roundoff error. And a method like this with adaptive step sizes can have some ugly discontinuities as you vary the initial conditions. But with a little fiddling it works:
In [88]:
delta = 1e-11
num_deriv = np.empty_like(vecs)
for i in range(vecs.shape[0]):
    p4 = scipy.integrate.odeint(deriv,  
        eccentric_initial_rv+delta*vecs[i],
        [t0,t1],
        mxstep = 1000)
    num_deriv[i] = (p4[-1]-final_rv_simple)/delta

print(num_deriv)

print()

print(final_deriv)
[[ 11.51057027  -1.72911685   0.          19.14852721  11.15102732   0.        ]
 [  0.19784174   0.65913386   0.           0.86508578  -0.68013928   0.        ]
 [  0.           0.           0.60094988   0.           0.          -1.02854399]
 [  1.12472254   0.11498025   0.           2.17468266   0.70479456   0.        ]
 [  4.88009633  -0.2461753    0.           8.71576145   5.56372726   0.        ]
 [  0.           0.           0.72071321   0.           0.           0.43050644]]

[[ 11.51684499  -1.73020275   0.          19.15892536  11.15640367   0.        ]
 [  0.20526243   0.65775946   0.           0.87751988  -0.67308395   0.        ]
 [  0.           0.           0.60095524   0.           0.          -1.02853524]
 [  1.13123643   0.11360844   0.           2.18555873   0.71090257   0.        ]
 [  4.88779405  -0.24699966   0.           8.72864758   5.57000757   0.        ]
 [  0.           0.           0.72071158   0.           0.           0.43051898]]
Now what about a partial derivative with respect to the start or end times?
In [96]:
delta = 1e-10

p5 = scipy.integrate.odeint(deriv,  
    eccentric_initial_rv,
    [t0,t1+delta],
    mxstep = 1000)

diff_t1_num = (p5[-1]-final_rv_simple)/delta
diff_t1 = deriv(p5[-1],t1)

print(diff_t1_num)
print(diff_t1)
[-1.03030695  0.21552538  0.         -1.74959602 -1.04898729  0.        ]
[-1.02854399  0.21525322  0.         -1.74670332 -1.04740195 -0.        ]
In [99]:
delta = 1e-10

p6 = scipy.integrate.odeint(deriv,  
    eccentric_initial_rv,
    [t0+delta,t1],
    mxstep = 1000)

diff_t0_num = (p6[-1]-final_rv_simple)/delta
diff_t0 = -np.dot(final_deriv.T, deriv(p6[0],t0))

print(diff_t0_num)
print(diff_t0)
[ 1.02854503 -0.21525337  0.          1.74670278  1.047403    0.        ]
[ 1.02860522 -0.21527129 -0.          1.7467988   1.04744455 -0.        ]
So, in short, you can find the partial derivatives of an ODE solution by solving an enlarged ODE in which the partial derivative vectors evolve by multiplication with the Jacobian of the original right-hand side. And it doesn't cost all that much in extra steps.
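If you do this a lot, the bookkeeping can be wrapped up in a small helper. This is just a sketch of the same technique, assuming `F(x, t)` returns the state derivative and `DF(x, t)` returns its Jacobian; the name `odeint_with_derivs` is made up here, not anything scipy provides:

```python
import numpy as np
import scipy.integrate

def odeint_with_derivs(F, DF, x0, ts, vecs=None, **kwargs):
    # Integrate dx/dt = F(x, t) while transporting the rows of vecs along the
    # solution; DF(x, t) must return the Jacobian matrix of F at (x, t).
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    if vecs is None:
        vecs = np.eye(n)  # track derivatives with respect to every initial component
    vecs = np.asarray(vecs, dtype=float)

    def rhs(packed, t):
        x = packed[:n]
        vs = packed[n:].reshape(-1, n)
        dvs = np.dot(DF(x, t), vs.T).T  # each vector evolves by the Jacobian
        return np.concatenate([F(x, t), dvs.ravel()])

    sol = scipy.integrate.odeint(rhs, np.concatenate([x0, vecs.ravel()]), ts, **kwargs)
    return sol[:, :n], sol[:, n:].reshape(len(ts), -1, n)
```

With the functions defined above, something like `xs, vs = odeint_with_derivs(deriv, jacobian, eccentric_initial_rv, [t0, t1], mxstep=1000)` should reproduce `final_rv` (as `xs[-1]`) and `final_deriv` (as `vs[-1]`).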
