$$\definecolor{data}{RGB}{18,110,213} \definecolor{unknown}{RGB}{217,86,16} \definecolor{learned}{RGB}{175,114,176} \newcommand{\underbracedbmatrix}[2]{ \smash[b]{\underbrace{ \begin{bmatrix}#1\end{bmatrix} }_{#2}} \vphantom{\underbrace{\begin{bmatrix}#1\end{bmatrix}}_{#2}} }$$

# Fitting Master Equations

## April 13, 2015

Robert T. McGibbon
arXiv:1407.8083

## Overview

R.T. McGibbon and V.S. Pande, “Efficient maximum likelihood parameterization of continuous-time Markov processes,” (2015), arXiv:1407.8083.

## First Order Chemical Kinetics

### Matrix Formulation

$$\ce{U <=>[k_{12}][k_{21}] I <=>[k_{23}][k_{32}] F}$$

$$\frac{d\vec{p}}{dt} = \underbracedbmatrix{p_U & p_I & p_F}{\vec{p}} \cdot \underbracedbmatrix{ -k_{12} & k_{12} & 0 \\ k_{21} & -k_{21} - k_{23} & k_{23} \\ 0 & k_{32} & -k_{32} }{\mathbf{K}}$$
L.-P. Wang et al. "Discovering chemistry with an ab initio nanoreactor" Nature Chemistry (2015) doi:10.1038/nchem.2099
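This scheme is small enough to propagate directly. A minimal sketch in NumPy/SciPy; the rate constants below are illustrative values, not from the talk:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical rate constants for U <=> I <=> F (illustrative values)
k12, k21, k23, k32 = 2.0, 1.0, 0.5, 0.25

K = np.array([
    [-k12,        k12,  0.0],
    [ k21, -k21 - k23,  k23],
    [ 0.0,        k32, -k32],
])

p0 = np.array([1.0, 0.0, 0.0])   # start fully unfolded
p_t = p0 @ expm(1.5 * K)         # populations at t = 1.5
# probability is conserved: p_t sums to one
```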

## Master equations

$$\frac{dp_t}{dt} = p_t \, \mathbf{K}$$ $$p_{t+\tau} = p_t \, \underbrace{\exp(\tau\mathbf{K})}_{\mathbf{T}(\tau)}$$
C.R. Baiz et al. Biophys J. (2014) doi:10.1016/j.bpj.2014.02.008
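The relationship between $\mathbf{K}$ and $\mathbf{T}(\tau)$ can be checked numerically with `scipy.linalg.expm`; the two-state generator here is an arbitrary illustrative example:

```python
import numpy as np
from scipy.linalg import expm

K = np.array([[-1.0,  1.0],
              [ 0.5, -0.5]])     # illustrative two-state generator
tau = 0.3

T = expm(tau * K)                # T(tau), a stochastic matrix
# rows of T sum to one, and T(2 tau) = T(tau) @ T(tau) (semigroup property)
```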

## The embedding problem

### Markov processes vs. Markov chains

• Every continuous-time Markov process can be "sliced" at any $\tau>0$ to yield a discrete-time Markov chain.
• Not every discrete-time Markov chain has an embedded continuous-time Markov process.
• Necessary and sufficient conditions for embeddability are unknown.
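Both directions can be illustrated numerically: slicing a generator always yields a valid chain, while `scipy.linalg.logm` applied to a perfectly valid chain can fail to produce a real rate matrix. A toy sketch with made-up matrices:

```python
import numpy as np
from scipy.linalg import expm, logm

# Slicing a generator gives a valid chain, and here logm recovers it:
K = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])
tau = 0.5
K_back = logm(expm(tau * K)) / tau   # equals K up to roundoff

# But this valid Markov chain has a negative eigenvalue (-0.2), so it
# has no real matrix logarithm -- and hence no embedded generator:
T_bad = np.array([[0.4, 0.6],
                  [0.6, 0.4]])
L_bad = logm(T_bad)                  # complex-valued result
```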

## Fitting master equations

### From discrete-time data

$$x = \{1, 1, 2, 1, 3, 3, 2, 1, 2, 3, 4, 4, 4, 3, \ldots\}$$

$$\Downarrow ?$$

$$\mathbf{K} = \begin{bmatrix} -2.4 & 2.4 & & \\ 2.2 & -3.1 & 0.9 & \\ & 0.5 & -0.9 & 0.4 \\ & & 0.5 & -0.5 \end{bmatrix}$$

## Existing algorithms

### Estimating Markov processes from discrete-time data

• Route 1: first estimate a discrete-time Markov chain, then "solve" the embedding problem
• $\{x_1, x_2, \ldots, x_n\} \rightarrow \mathbf{T}(\tau) \rightarrow \mathbf{K}$
• Fast, but unsatisfactory.$^{1,2}$
• Route 2: directly fit a Markov process
• $\{x_1, x_2, \ldots, x_n\} \rightarrow \mathbf{K}$
• Existing algorithms are slow ($O(n^5)$, $O(n^6)$).$^{3,4}$

## Overview

R.T. McGibbon and V.S. Pande, “Efficient maximum likelihood parameterization of continuous-time Markov processes,” (2015), arXiv:1407.8083.

## Discrete-time data likelihood

### Fun with matrix exponentials

\begin{align} P(x | \mathbf{K}, x_0) &= \prod_{k=0}^{N-1} \mathbf{T}(\tau)_{x_{k\tau},\, x_{(k+1)\tau}} = \prod_{i,j} \mathbf{T}_{ij}(\tau)^{\mathbf{C}_{ij}(\tau)} \end{align}
\begin{align} \underbrace{\mathcal{L}(\theta; \tau)}_{\color{data}\text{maximize me!}} &\equiv \ln P(x | \mathbf{K}(\theta), x_0) \\ &= \sum_{i,j} \Big(\mathbf{C}(\tau) \circ \ln \exp\big(\tau\, \mathbf{K}(\theta)\big)\Big)_{ij} \end{align}
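As a sanity check, the log-likelihood is straightforward to evaluate for a given $\mathbf{K}$ and count matrix $\mathbf{C}(\tau)$. A hedged sketch; the counts and rates below are invented:

```python
import numpy as np
from scipy.linalg import expm

def log_likelihood(K, C, tau):
    """Sum of C_ij * ln T_ij(tau) over the observed transitions."""
    T = expm(tau * K)
    mask = C > 0                 # only observed transitions contribute
    return float(np.sum(C[mask] * np.log(T[mask])))

K = np.array([[-1.0, 1.0],
              [ 0.5, -0.5]])
C = np.array([[80, 20],
              [10, 90]])            # hypothetical transition counts at lag tau
ll = log_likelihood(K, C, tau=0.1)  # a (negative) log-likelihood
```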

## Practical optimization 101

• Parameterize away the constraints, if possible.
• Use a quasi-Newton method and a good guess.

## Rate matrix constraints

$$\mathscr{K} = \left\{\mathbf{K} = \{k_{ij}\} \in \mathbb{R}^{n \times n} : k_{ij} \geq 0 \text{ for all } i \neq j, k_{ii} = -\sum_{j\neq i} k_{ij} \right\}$$
• Off diagonal elements are non-negative (n.b. $k_{ij}=0$ is okay).
• Each row sums to zero.
• Must have a stationary distribution, $\pi > 0$, s.t. $\pi \mathbf{K} = 0$.
• Reversibility / detailed balance:
• $\pi_i k_{ij} = \pi_j k_{ji} \hspace{1em} \forall\; i \neq j$.
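These four constraints are easy to verify numerically. A small checker (the function name and example values are mine, not from the paper):

```python
import numpy as np

def is_valid_rate_matrix(K, pi, tol=1e-10):
    """Check the defining constraints of a reversible rate matrix."""
    off_ok  = (K - np.diag(np.diag(K)) >= -tol).all()  # off-diagonals >= 0
    rows_ok = np.allclose(K.sum(axis=1), 0.0)          # rows sum to zero
    stat_ok = np.allclose(pi @ K, 0.0)                 # pi K = 0
    D = pi[:, None] * K                                # D_ij = pi_i k_ij
    rev_ok  = np.allclose(D, D.T)                      # detailed balance
    return bool(off_ok and rows_ok and stat_ok and rev_ok)

K = np.array([[-1.0,  1.0],
              [ 0.5, -0.5]])
pi = np.array([1/3, 2/3])           # stationary distribution of this K
print(is_valid_rate_matrix(K, pi))  # True
```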

## Parameterize away the constraints

### Example: a 2 x 2 rate matrix

$$\mathbf{K}(\theta) \stackrel{?}{=} \begin{bmatrix}\theta_1 & \theta_2 \\ \theta_3 & \theta_4 \end{bmatrix} ; \theta \in \mathbb{R}^4$$

$$\mathbf{K}(\theta) \stackrel{?}{=} \begin{bmatrix}-e^{\theta_1} & e^{\theta_1} \\ e^{\theta_2} & -e^{\theta_2} \end{bmatrix} ; \theta \in \mathbb{R}^2$$
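The second parameterization maps all of $\mathbb{R}^2$ onto valid rate matrices, so the optimizer never has to see the constraints. A sketch:

```python
import numpy as np

def K_2x2(theta):
    """Unconstrained theta in R^2 -> valid 2x2 rate matrix."""
    k12, k21 = np.exp(theta)       # exp guarantees positive off-diagonals
    return np.array([[-k12,  k12],
                     [ k21, -k21]])

K = K_2x2(np.array([0.3, -1.2]))   # any theta gives a valid K
```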

## Parameterize away the constraints

• Detailed balance is essentially symmetry
• $\sqrt{\frac{\pi_i}{\pi_j}} k_{ij} = \sqrt{\frac{\pi_j}{\pi_i}} k_{ji} = s_{ij} = s_{ji}$
• Only need to track "half" of the rate matrix
• Use log transform of the equilibrium distribution
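Putting the pieces together, a sketch of how an unconstrained parameter vector could map to a reversible rate matrix. This is my illustration of the idea, not the paper's exact parameterization:

```python
import numpy as np

def build_K(s_upper, log_pi):
    """Map parameters to a reversible rate matrix.

    s_upper: upper-triangle entries of the symmetric matrix S (>= 0)
    log_pi:  log of the (unnormalized) stationary distribution
    """
    n = len(log_pi)
    pi = np.exp(log_pi)
    pi /= pi.sum()
    S = np.zeros((n, n))
    S[np.triu_indices(n, k=1)] = s_upper
    S = S + S.T
    sqrt_pi = np.sqrt(pi)
    K = S * (sqrt_pi[None, :] / sqrt_pi[:, None])  # k_ij = s_ij * sqrt(pi_j/pi_i)
    np.fill_diagonal(K, -K.sum(axis=1))            # rows sum to zero
    return K

K = build_K(np.array([0.7]), np.log([0.25, 0.75]))
# detailed balance pi_i k_ij = pi_j k_ji holds by construction
```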

\begin{align} \mathcal{L}(\theta; \tau) &= \sum_{i,j} \Big(\mathbf{C}(\tau) \circ \ln \exp\big(\tau\, \mathbf{K}(\theta)\big)\Big)_{ij} \\ \frac{\partial \mathcal{L}(\theta; \tau)}{\partial \theta_i} &= \text{something messy, but } O(n^3). \end{align}

Previous methods take $O(n^5)$ or $O(n^6)$ per iteration.

R.T. McGibbon and V.S. Pande, “Efficient maximum likelihood parameterization of continuous-time Markov processes,” (2015), arXiv:1407.8083.

## Computing error bars on the MLE

• The MLE is asymptotically normal.
• Asymptotic covariance $=$ inverse of the negative Hessian of the log-likelihood at the MLE.
• "Multivariate delta theorem"

$$Var(h) \approx - \nabla_\theta h(\hat{\theta})^T \cdot H(\hat{\theta})^{-1} \cdot \nabla_\theta h(\hat{\theta})$$
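A minimal sketch of this delta-method variance, assuming the Hessian $H$ of the log-likelihood at the MLE and the gradient of the scalar observable $h$ are already in hand:

```python
import numpy as np

def delta_method_var(grad_h, H):
    """Asymptotic variance of h(theta_hat): -grad_h^T H^{-1} grad_h."""
    return -grad_h @ np.linalg.solve(H, grad_h)

# Toy check with a quadratic log-likelihood L = -2 theta^T theta,
# whose Hessian is H = -4 I; for h(theta) = theta_1 the variance is 1/4.
H = -4.0 * np.eye(2)
grad_h = np.array([1.0, 0.0])
print(delta_method_var(grad_h, H))  # 0.25
```

The sign convention matches the formula above: because the Hessian of the log-likelihood is negative definite at the MLE, the leading minus makes the variance positive.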

## Example 1

### Sparsity

A simple eight state Markov process. Connected states are labeled with the pairwise rate constants, $\mathbf{K}_{ij}$.

Convergence of the estimated rate matrix, $\mathbf{\hat{K}}$, to the true generating rate matrix for discrete-time trajectories of increasing length simulated from the process with a time step of 1.

## Example 1

### Sparsity

Comparison of the estimated and true off-diagonal rate matrix elements for a trajectory of length $N=10^{7}$ simulated from the process.

## Example 2

### Error bars

$$z_{ij} = \frac{\hat{g}_i - \hat{g}_j}{\sqrt{\sigma^2_{\hat{g}_i} + \sigma^2_{\hat{g}_j}}} \overset{?}{\sim} \mathcal{N}(0, 1).$$
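Computing these pairwise $z$ statistics is a one-liner with broadcasting; in practice the $\hat{g}_i$ and their variances would come from the MLE and the delta method, but the values here are invented:

```python
import numpy as np

def z_scores(g_hat, var_g):
    """Pairwise z statistics for estimates with per-element variances."""
    diff = g_hat[:, None] - g_hat[None, :]
    return diff / np.sqrt(var_g[:, None] + var_g[None, :])

z = z_scores(np.array([1.0, 2.0]), np.array([0.5, 0.5]))
# z[0, 1] = (1 - 2) / sqrt(0.5 + 0.5) = -1
```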

## Example 3

### Accuracy

Relative error between continuous-time and discrete-time Markov models for kinetics on random graphs. Values below zero indicate lower error for the continuous-time model, whereas values above zero indicate the reverse.

## Example 4

### Comparison with discrete-time models on WW Domain

Relaxation timescales are essentially identical ($r^2 = 0.999978$), but the continuous-time model has fewer nonzero parameters.

## Example 4

### Comparison with discrete-time models on WW Domain

The MLE rate matrix is sparse, which should make it easier to interpret.

We can monitor the convergence of individual elements of the rate matrix.

## Conclusions

$$\frac{dp_t}{dt} = p_t\,\mathbf{K} \hspace{0.7in}\text{vs.}\hspace{0.7in} p_{t+\tau} = p_t\,\mathbf{T}(\tau)$$

• Continuous-time models are more natural scientifically
• But a little trickier to estimate statistically.
• We've fixed that.
• The resulting models are a little sparser and easier to interpret.
• Implemented in MSMBuilder

## Thanks everyone!

Especially Muneeb, Matt, Christian, T.J., Vijay.

www.msmbuilder.org
github.com/msmbuilder