# Bayesian Model Order Selection for N-Body TTV Systems
**Status:** Technical specification · **Relates to:** [[TTV Reverse N-Body Inference]], [[TTV Sensitivity and N-Body Math]], [[Reverse N-Body and Astrometric Parallax]] · **Last derived:** 2026-03-21
## 0. Scope
This document specifies a rigorous Bayesian framework for answering the question: given a set of transit timing residuals, how many perturbing bodies are required to explain the data? This is model order selection — computing the evidence ratio between nested N-body models of increasing complexity. The framework will serve as the mathematical specification for OpenAstro's TTV inference pipeline.
## 1. Formal Problem Statement

### 1.1 The Data
The observed data is a time series of transit timing residuals:
$$\mathcal{D} = \{(t_i, \delta t_i, \sigma_i)\}_{i=1}^{N}$$
where:

- $t_i$ is the expected (linear-ephemeris) transit time of the $i$-th transit: $t_i = t_0 + i \cdot P_b$
- $\delta t_i = t_i^{\text{obs}} - t_i$ is the observed timing residual (the TTV)
- $\sigma_i$ is the $1\sigma$ uncertainty on $\delta t_i$, derived from transit light curve fitting (see [[TTV Sensitivity and N-Body Math]], Section 2.3)
- $N$ is the total number of observed transits
The transit indices need not be contiguous — gaps due to seasonal observability or weather are encoded by the set of indices $\{i\}$ actually observed.
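The residual construction above can be sketched in a few lines: fit a weighted least-squares linear ephemeris over the observed (possibly gapped) epochs, then subtract the fitted prediction. A minimal pure-Python sketch; the function names are illustrative, not an existing OpenAstro API:

```python
def fit_linear_ephemeris(epochs, t_obs, sigmas):
    """Weighted least-squares fit of t_obs ~ t0 + n*P over the observed
    transit epochs n (need not be contiguous). Returns (t0, P)."""
    w = [1.0 / s ** 2 for s in sigmas]
    S = sum(w)
    Sn = sum(wi * n for wi, n in zip(w, epochs))
    St = sum(wi * t for wi, t in zip(w, t_obs))
    Snn = sum(wi * n * n for wi, n in zip(w, epochs))
    Snt = sum(wi * n * t for wi, n, t in zip(w, epochs, t_obs))
    P = (S * Snt - Sn * St) / (S * Snn - Sn ** 2)
    t0 = (St - P * Sn) / S
    return t0, P

def ttv_residuals(epochs, t_obs, sigmas):
    """Timing residuals delta_t_i relative to the fitted linear ephemeris."""
    t0, P = fit_linear_ephemeris(epochs, t_obs, sigmas)
    return [t - (t0 + n * P) for n, t in zip(epochs, t_obs)]
```

Refitting the ephemeris before computing residuals matters: the null model $\mathcal{M}_1$ is defined relative to the best-fit linear ephemeris, not a catalog one.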
### 1.2 The Model Family
Define the model $\mathcal{M}_k$ as a gravitational system of $k$ bodies orbiting a host star of known mass $M_\star$:

- Body 1 is the known transiting planet with orbital parameters $\boldsymbol{\theta}_1 = \{P_1, t_{0,1}, e_1\cos\omega_1, e_1\sin\omega_1, i_1, \Omega_1, m_1/M_\star\}$
- Bodies $2, \ldots, k$ are inferred perturbers, each with parameters $\boldsymbol{\theta}_j = \{P_j, t_{0,j}, e_j\cos\omega_j, e_j\sin\omega_j, i_j, \Omega_j, m_j/M_\star\}$
The full parameter vector is $\boldsymbol{\theta}_k = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_k)$.
For co-planar orbits (all $i_j = i_1$, all $\Omega_j = \Omega_1$), the free parameters per perturber reduce to 5: ${P_j, t_{0,j}, e_j\cos\omega_j, e_j\sin\omega_j, m_j/M_\star}$. The transiting planet's parameters are partially constrained by the linear ephemeris fit ($P_1$, $t_{0,1}$ known to high precision) and by transit photometry ($i_1$ from impact parameter, $m_1$ sometimes from RV).
Dimensionality:
| Model | Bodies | Free parameters (coplanar) | Free parameters (mutual inclination) |
|---|---|---|---|
| $\mathcal{M}_1$ | Transiter only | 4 ($P_1, t_{0,1}, e_1\cos\omega_1, e_1\sin\omega_1$) | 4 |
| $\mathcal{M}_2$ | Transiter + 1 perturber | 9 | 11 |
| $\mathcal{M}_3$ | Transiter + 2 perturbers | 14 | 18 |
| $\mathcal{M}_k$ | Transiter + $(k-1)$ perturbers | $4 + 5(k-1)$ | $4 + 7(k-1)$ |
Note: $\mathcal{M}_1$ is the null model — no perturber, residuals are pure noise. Under $\mathcal{M}_1$, the predicted TTV is identically zero after refitting the linear ephemeris.
### 1.3 The Evidence Integral
The Bayesian evidence (marginal likelihood) for model $\mathcal{M}_k$ is:
$$Z_k = P(\mathcal{D} | \mathcal{M}_k) = \int \mathcal{L}(\mathcal{D} | \boldsymbol{\theta}_k, \mathcal{M}_k) \, \pi(\boldsymbol{\theta}_k | \mathcal{M}_k) \, d\boldsymbol{\theta}_k$$
where:

- $\mathcal{L}(\mathcal{D} | \boldsymbol{\theta}_k, \mathcal{M}_k)$ is the likelihood of the data given parameters $\boldsymbol{\theta}_k$ under model $\mathcal{M}_k$
- $\pi(\boldsymbol{\theta}_k | \mathcal{M}_k)$ is the prior distribution on the parameters
- The integral is over the full parameter space of $\mathcal{M}_k$
This integral is the central computational challenge: it is high-dimensional (9–18 dimensions for practical cases), the integrand is concentrated in a tiny fraction of the prior volume, and the likelihood surface is multimodal.
### 1.4 The Bayes Factor and Decision Rule

The Bayes factor comparing model $\mathcal{M}_k$ against $\mathcal{M}_{k-1}$ is:
$$B_{k, k-1} = \frac{Z_k}{Z_{k-1}}$$
The decision rule: prefer $\mathcal{M}_k$ over $\mathcal{M}_{k-1}$ if $B_{k,k-1}$ exceeds a threshold $B^\star$, where $B^\star$ encodes the prior odds and the cost asymmetry of false positives vs. false negatives (discussed in Section 5).
The posterior odds ratio is:
$$\frac{P(\mathcal{M}_k | \mathcal{D})}{P(\mathcal{M}_{k-1} | \mathcal{D})} = B_{k,k-1} \cdot \frac{\pi(\mathcal{M}_k)}{\pi(\mathcal{M}_{k-1})}$$

We adopt equal model priors $\pi(\mathcal{M}_k) = \pi(\mathcal{M}_{k-1})$ unless there is strong astrophysical reason to prefer one model order (e.g., most known systems have 1 or 2 detectable perturbers; systems with 3+ detectable perturbers are rarer).
## 2. Prior Specification

### 2.1 Priors on the Transiting Planet (Body 1)
These parameters are strongly constrained by the linear ephemeris and transit photometry. We use informative priors:
| Parameter | Prior | Justification |
|---|---|---|
| $P_1$ | $\mathcal{N}(\hat{P}_1, (3\sigma)^2)$ | Gaussian centered on measured period; 3$\sigma$ width allows for TTV-induced bias in the linear fit |
| $t_{0,1}$ | $\mathcal{N}(\hat{t}_{0,1}, (3\sigma)^2)$ | Same rationale |
| $e_1\cos\omega_1$ | $\mathcal{U}(-0.3, 0.3)$ | Hot Jupiters are tidally circularized; moderate range for warm planets |
| $e_1\sin\omega_1$ | $\mathcal{U}(-0.3, 0.3)$ | Same |
### 2.2 Priors on Each Perturber (Bodies $j = 2, \ldots, k$)
Each perturber introduces 5 (coplanar) or 7 (non-coplanar) parameters. The prior choices are critical because the Bayes factor is sensitive to the prior volume (the Occam factor).
#### Period $P_j$
Recommended prior: Log-uniform (Jeffreys prior on a scale parameter).
$$\pi(\ln P_j) = \text{Uniform}(\ln P_{\min}, \ln P_{\max})$$
with $P_{\min} = P_1 / 4$ and $P_{\max} = 10 P_1$.
Justification: TTV sensitivity drops steeply for perturbers with $P_j \gg P_1$ or $P_j \ll P_1$ (the perturber must be dynamically coupled). The range $[P_1/4, 10 P_1]$ covers all mean-motion resonances from 4:1 interior to 1:10 exterior, spanning essentially all TTV-detectable configurations. [Source: Agol & Fabrycky (2018), Handbook of Exoplanets Ch. 7]
The Lindley-Jeffreys paradox sensitivity: The Bayes factor is sensitive to the width of the period prior. For a signal with true period $P_c$, the Bayes factor scales as:
$$B_{2,1} \propto \frac{\sigma_{P_c}^{\text{posterior}}}{\Delta P^{\text{prior}}} \cdot \exp\left(\frac{\Delta\chi^2}{2}\right)$$
where $\Delta P^{\text{prior}} = P_{\max} - P_{\min}$ (or $\ln P_{\max} - \ln P_{\min}$ for the log-prior) and $\sigma_{P_c}^{\text{posterior}}$ is the posterior width on $P_c$. This is the Occam factor: wider priors penalize the complex model more.
Quantitative sensitivity analysis: Consider two prior choices:

- (A) $\ln P_j \sim \mathcal{U}(\ln P_1/4, \ln 10 P_1)$: prior range $\Delta \ln P = \ln 40 \approx 3.7$
- (B) $\ln P_j \sim \mathcal{U}(\ln P_1/10, \ln 100 P_1)$: prior range $\Delta \ln P = \ln 1000 \approx 6.9$
The Occam penalty for prior (B) relative to (A) is:
$$\Delta \ln B = \ln\left(\frac{3.7}{6.9}\right) \approx -0.62$$
So broadening the period prior by a factor of $\sim$2 in log-space reduces $\ln B$ by $\sim$0.6 — a modest but non-negligible effect. For a marginal detection where $\ln B \approx 3$, this shifts the interpretation from "strong" to "moderate" on the Jeffreys scale. We recommend always reporting the Bayes factor under at least two prior widths to demonstrate robustness. [Source: Lindley (1957), Biometrika 44, 187; Robert (2014), arXiv:1407.5023 — Jeffreys-Lindley paradox]
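The quoted shift is quick to verify numerically; this snippet assumes nothing beyond the two prior ranges defined above:

```python
import math

# Occam shift from widening the log-period prior: the Bayes factor scales
# with the inverse prior range, so ln B shifts by ln(range_A / range_B).
range_A = math.log(40)      # prior (A): [P1/4, 10*P1]
range_B = math.log(1000)    # prior (B): [P1/10, 100*P1]
delta_lnB = math.log(range_A / range_B)   # about -0.6 nats
```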
[NOVEL] Resonance-aware period prior: Instead of a smooth log-uniform prior, construct a prior that up-weights period ratios near low-order mean-motion resonances where TTV sensitivity is highest:
$$\pi(P_j) \propto \frac{1}{P_j} \left[1 + \alpha \sum_{j:(j-k)} w_{jk} \cdot \mathcal{N}\left(\frac{P_j}{P_1} \;\middle|\; \frac{j}{j-k}, \sigma_{\text{res}}^2\right)\right]$$
where the sum runs over resonances $j:(j-k)$ with weights $w_{jk} \propto 1/j$ (lower-order resonances stronger), $\sigma_{\text{res}} \sim 0.05$ captures the width of the TTV-sensitive zone around each resonance, and $\alpha$ controls the relative weight of resonant vs. non-resonant regions. With $\alpha = 2$, the resonance peaks roughly triple the prior density at exact resonance relative to a smooth background.
This prior reflects astrophysical knowledge: planets near MMR are more likely to produce detectable TTVs. However, it introduces a prior dependency that must be carefully reported. We recommend running the analysis with both the log-uniform and resonance-aware priors and comparing.
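A sketch of the resonance-aware density, restricted to first-order resonances $j:(j-1)$ for brevity; the defaults $\alpha = 2$ and $\sigma_{\text{res}} = 0.05$ follow the text, everything else is illustrative:

```python
import math

def resonance_prior_density(ratio, alpha=2.0, sigma_res=0.05, j_max=5):
    """Un-normalized prior density for the period ratio P_j/P_1:
    a 1/P (log-uniform) background times a comb of Gaussian bumps
    at first-order resonances j:(j-1), weighted by 1/j."""
    background = 1.0 / ratio
    norm = 1.0 / (sigma_res * math.sqrt(2.0 * math.pi))
    comb = 0.0
    for j in range(2, j_max + 1):
        center = j / (j - 1.0)   # 2:1, 3:2, 4:3, ...
        comb += (1.0 / j) * norm * math.exp(-0.5 * ((ratio - center) / sigma_res) ** 2)
    return background * (1.0 + alpha * comb)
```

Setting `alpha=0` recovers the plain log-uniform background, which makes the two-prior comparison recommended above a one-argument switch.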
#### Mass ratio $\mu_j = m_j / M_\star$
Recommended prior: Log-uniform (Jeffreys prior).
$$\pi(\ln \mu_j) = \text{Uniform}(\ln 10^{-7}, \ln 10^{-2})$$
Justification: This spans from sub-Earth ($\sim 0.03 M_\oplus$ for a solar-mass star) to super-Jupiter ($\sim 3 M_J$). The log-uniform prior assigns equal probability per decade of mass, which is the maximum-entropy prior for a positive scale parameter when no informative astrophysical constraint is available. Five decades of dynamic range is appropriate given the TTV sensitivity range.
Physical constraint (Hill stability): For two planets with mass ratios $\mu_1, \mu_j$ and semi-major axis ratio $a_1/a_j$, the Hill stability criterion requires [Source: Gladman (1993), Icarus 106, 247; Chambers et al. (1996), Icarus 119, 261]:
$$\frac{|a_j - a_1|}{a_1} > 2\sqrt{3} \left(\frac{\mu_1 + \mu_j}{3}\right)^{1/3}$$
For $\mu_j = 10^{-2}$ (the upper prior bound) and a low-mass inner planet, this requires $|a_j - a_1|/a_1 > 0.52$, or $P_j/P_1 > 1.9$ (exterior) or $< 0.34$ (interior). This automatically excludes extremely massive perturbers at period ratios near unity. We enforce Hill stability as a hard prior boundary: any $(\mu_j, P_j)$ combination violating this criterion receives zero prior probability.
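The criterion translates directly into a prior indicator function, with period ratios converted to semi-major-axis ratios via Kepler's third law. A sketch:

```python
import math

def hill_stable(P_ratio, mu_inner, mu_outer):
    """Gladman (1993) Hill criterion for two low-mass planets:
    |a2 - a1| / a1 > 2*sqrt(3) * ((mu1 + mu2) / 3)**(1/3),
    with a proportional to P**(2/3) by Kepler's third law."""
    a_ratio = P_ratio ** (2.0 / 3.0)
    separation = abs(a_ratio - 1.0)
    return separation > 2.0 * math.sqrt(3.0) * ((mu_inner + mu_outer) / 3.0) ** (1.0 / 3.0)
```

In the prior, any proposal for which `hill_stable(P_j / P_1, mu_1, mu_j)` returns `False` receives zero probability.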
#### Eccentricity vector $(e_j \cos\omega_j, e_j \sin\omega_j)$
Recommended prior: Uniform on the eccentricity vector components.
$$\pi(e_j\cos\omega_j, e_j\sin\omega_j) = \text{Uniform on the disk } e_j < e_{\max}$$
with $e_{\max} = 0.5$ for the default prior. This induces a triangular (linear) prior on $e_j$: $\pi(e_j) = 2e_j / e_{\max}^2$ for $0 \leq e_j \leq e_{\max}$, and a uniform prior on $\omega_j$.
Justification: The eccentricity vector parameterization avoids the coordinate singularity at $e = 0$ that plagues $(e, \omega)$ parameterization. The induced linear prior on $e$ is the natural "area element" prior on the eccentricity vector plane. It slightly favours larger eccentricities, which is conservative for model selection (it makes it harder to fit the data with a perturber, reducing false positives).
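Sampling from this prior is easiest via the square-root trick, which produces points uniform on the disk and hence the induced linear prior on $e$. A sketch:

```python
import math, random

def sample_ecc_vector(e_max=0.5, rng=random):
    """Draw (e*cos(w), e*sin(w)) uniformly on the disk e < e_max.
    Taking e = e_max*sqrt(u) with u ~ U(0,1) yields the induced
    density pi(e) = 2e / e_max**2, i.e. the linear prior on e."""
    e = e_max * math.sqrt(rng.random())
    w = 2.0 * math.pi * rng.random()
    return e * math.cos(w), e * math.sin(w)
```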
Physical constraint: Orbits with $e_j > 0.5$ in multi-planet systems close to MMR are typically unstable on $10^3$–$10^5$ orbit timescales. We enforce $e_j < 0.5$ as a hard boundary but note that for specific resonant configurations, the stability boundary may be tighter (e.g., for 2:1 resonance, Wisdom 1980 showed overlap at $e \gtrsim 0.3$ for comparable masses). [Source: Wisdom (1980), AJ 85, 1122 — resonance overlap criterion]
[NOVEL] Stability-informed eccentricity prior: Rather than a hard cut at $e_{\max} = 0.5$, use an $N$-body stability map to set a $P_j$-dependent eccentricity ceiling. For each proposed $(P_j, e_j)$, run a short ($10^4$ orbit) integration to check for close encounters. If the system is unstable, assign zero likelihood. This is computationally expensive (~1s per stability check) but eliminates unphysical regions of parameter space and tightens the effective prior volume, improving the Occam factor calculation. In practice, this can be pre-computed on a grid and interpolated.
#### Transit epoch $t_{0,j}$ (mean anomaly at epoch)
Recommended prior: Uniform over one orbital period.
$$\pi(t_{0,j}) = \text{Uniform}(t_{\text{ref}}, t_{\text{ref}} + P_j)$$
where $t_{\text{ref}}$ is the reference epoch of the integration. This is equivalent to a uniform prior on the mean anomaly $\lambda_j$ at the reference epoch: $\pi(\lambda_j) = \text{Uniform}(0, 2\pi)$.
Justification: Without a transit detection of planet $j$, its orbital phase is completely unconstrained a priori.
#### Inclination $i_j$ and longitude of ascending node $\Omega_j$ (non-coplanar case)
Recommended prior:
$$\pi(\cos i_j) = \text{Uniform}(\cos i_{\max}, 1)$$
where $i_{\max}$ is a generous upper bound on the mutual inclination, typically $30°$–$60°$ for dynamically cool systems. This gives an isotropic prior on the orbital pole orientation within the allowed cone.
$$\pi(\Omega_j) = \text{Uniform}(0, 2\pi)$$
Justification: Multi-planet systems discovered by Kepler show low mutual inclinations ($\lesssim 5°$ typically; Fabrycky et al. 2014), but TTV-active systems can have higher mutual inclinations if resonantly locked. The prior should be broad enough not to exclude physically plausible configurations. [Source: Fabrycky et al. (2014), ApJ 790, 146]
### 2.3 Prior Summary Table
| Parameter | Prior | Range | Notes |
|---|---|---|---|
| $P_j$ | Log-uniform | $[P_1/4, 10 P_1]$ | Test sensitivity with $[P_1/10, 100 P_1]$ |
| $\mu_j = m_j/M_\star$ | Log-uniform | $[10^{-7}, 10^{-2}]$ | Hill stability enforced |
| $e_j\cos\omega_j$ | Uniform | $[-0.5, 0.5]$ | Hard boundary at $e_j = 0.5$ |
| $e_j\sin\omega_j$ | Uniform | $[-0.5, 0.5]$ | Same |
| $t_{0,j}$ | Uniform | $[t_{\text{ref}}, t_{\text{ref}} + P_j]$ | Period-dependent range |
| $\cos i_j$ | Uniform | $[\cos 60°, 1]$ | Non-coplanar only |
| $\Omega_j$ | Uniform | $[0, 2\pi]$ | Non-coplanar only |
## 3. Likelihood Function

### 3.1 Standard Gaussian Likelihood
The likelihood of the data given model parameters is:
$$\mathcal{L}(\mathcal{D} | \boldsymbol{\theta}_k, \mathcal{M}_k) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi \sigma_i^2}} \exp\left(-\frac{(\delta t_i - \delta t_i^{\text{model}}(\boldsymbol{\theta}_k))^2}{2\sigma_i^2}\right)$$
or in log form:
$$\ln \mathcal{L} = -\frac{1}{2} \sum_{i=1}^{N} \left[\frac{(\delta t_i - \delta t_i^{\text{model}}(\boldsymbol{\theta}_k))^2}{\sigma_i^2} + \ln(2\pi\sigma_i^2)\right]$$
where $\delta t_i^{\text{model}}(\boldsymbol{\theta}_k) = t_i^{\text{TTVFast}}(\boldsymbol{\theta}_k) - t_i$ is the predicted TTV from the $N$-body integration.
The null model ($\mathcal{M}_1$, no perturber) predicts $\delta t_i^{\text{model}} = 0$ after refitting the linear ephemeris, so its log-likelihood is:
$$\ln \mathcal{L}_1 = -\frac{1}{2} \sum_{i=1}^{N} \left[\frac{\delta t_i^2}{\sigma_i^2} + \ln(2\pi\sigma_i^2)\right]$$
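Both log-likelihoods reduce to the same function, with the null model supplied as a zero prediction vector. A sketch that also includes the jitter term discussed in Section 3.2:

```python
import math

def ln_like(residuals, model, sigmas, jitter=0.0):
    """Gaussian log-likelihood of TTV residuals. `model` holds the
    predicted TTV per transit (all zeros for the null model M1).
    An optional jitter term is added in quadrature to each sigma_i
    to absorb unmodeled systematics."""
    lnL = 0.0
    for r, m, s in zip(residuals, model, sigmas):
        var = s * s + jitter * jitter
        lnL -= 0.5 * ((r - m) ** 2 / var + math.log(2.0 * math.pi * var))
    return lnL
```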
### 3.2 Validity Conditions for the Gaussian Likelihood
The Gaussian form is valid when:
1. **Measurement errors are Gaussian:** The transit midtime uncertainty $\sigma_i$ is dominated by photon noise, which is Gaussian in the high-count regime (typically $> 10^4$ photons per integration, always satisfied for $V < 13$ targets with 30 cm apertures). Verified.

2. **Errors are independent across transits:** Each $\sigma_i$ is from a separate light curve fit. Systematic effects (instrumental drifts, stellar variability) that correlate across transits would violate this. Mitigation: Include a jitter term $\sigma_{\text{jit}}$ added in quadrature:

   $$\sigma_i^2 \to \sigma_i^2 + \sigma_{\text{jit}}^2$$

   where $\sigma_{\text{jit}}$ is a free hyperparameter sampled alongside $\boldsymbol{\theta}_k$. This absorbs unmodeled systematics at the cost of one additional parameter (negligible Occam penalty). [Source: Gregory (2005), ApJ 631, 1198 — jitter term in transit fitting]

3. **Forward model errors are negligible:** TTVFast integration errors must be $\ll \sigma_i$. With a timestep of $\Delta t = 0.05 \times P_{\min}$, TTVFast achieves $\sim 10^{-6} P$ accuracy per transit time ($\sim 0.4$ s for $P = 5$ d). For OpenAstro data with $\sigma_i \gtrsim 30$ s, the model error is $< 2\%$ of $\sigma_i$. Verified. [Source: Deck et al. (2014), ApJ 787, 132]

4. **Timing residuals are in the linear response regime:** The TTV must be small compared to the orbital period ($\delta t_i / P_1 \ll 1$) so that the transit-crossing detection in TTVFast is not confused. For $\delta t \lesssim 0.01 P$ (e.g., $< 70$ min for a 5-day orbit), this is always satisfied for physically realistic systems.
### 3.3 When the Likelihood Becomes Non-Gaussian
The Gaussian likelihood breaks down in specific regimes:
Near exact resonance ($|\Delta| < 0.001$): The TTV signal becomes quasi-periodic with large amplitude variations. The residuals $\delta t_i - \delta t_i^{\text{model}}$ can develop heavy tails because the forward model is extremely sensitive to small parameter changes — a slight error in $P_c$ leads to rapid phase drift. The effective likelihood surface develops sharp ridges and cusps. In this regime:

- The posterior becomes highly non-Gaussian (banana-shaped or crescent-shaped in the $P_c$–$\mu_c$ plane)
- Standard MCMC proposals become inefficient
- Remedy: Use nested sampling (which does not assume Gaussian posteriors) or reparameterize to $(P_c/P_1 - j/(j-k), \mu_c)$ to straighten the degeneracy ridge [Source: Hadden & Lithwick (2014), ApJ 787, 80]
High eccentricity ($e > 0.3$): The TTV signal develops harmonics beyond the fundamental super-period. The likelihood surface becomes multi-lobed as different eccentricity-argument combinations produce similar harmonic content. Additionally, high-eccentricity orbits may undergo apsidal precession on the observational timescale, causing the TTV signal to be non-periodic. In this regime:

- Include the precession timescale as a diagnostic: $\tau_{\text{prec}} \sim P/(m_c/M_\star) / (a/\Delta a)^2$
- If $\tau_{\text{prec}} < T_{\text{obs}}$, the TTV signal is a chirp, not a sinusoid; the standard Fourier decomposition fails
- Remedy: Use the full $N$-body model (TTVFast) rather than analytic TTV approximations. The Gaussian likelihood remains valid (it only describes measurement noise), but the posterior geometry is more complex.
Correlated timing errors: If stellar activity or instrumental systematics introduce correlated timing errors, the likelihood becomes:
$$\ln \mathcal{L} = -\frac{1}{2} \boldsymbol{r}^T \mathbf{C}^{-1} \boldsymbol{r} - \frac{1}{2} \ln |\mathbf{C}| - \frac{N}{2} \ln(2\pi)$$
where $\boldsymbol{r} = \boldsymbol{\delta t} - \boldsymbol{\delta t}^{\text{model}}$ and $\mathbf{C}$ is the covariance matrix. For OpenAstro data from heterogeneous telescopes, off-diagonal terms are expected to be small (different instruments, different nights). The diagonal-only (independent) approximation should be tested by computing the Durbin-Watson statistic on the residuals.
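The diagonal-approximation check on time-ordered residuals is a one-liner; values near 2 are consistent with independence. A sketch (`durbin_watson` is an illustrative name, not a library call):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic on time-ordered residuals: ~2 for
    uncorrelated noise, toward 0 for strong positive autocorrelation,
    toward 4 for strong negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(r * r for r in residuals)
    return num / den
```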
## 4. Evidence Computation

### 4.1 Approach 1: Nested Sampling
Algorithm: Nested sampling (Skilling 2006) transforms the evidence integral into a one-dimensional integral over the prior volume $X$: [Source: Skilling (2006), Bayesian Analysis 1, 833]
$$Z = \int_0^1 \mathcal{L}(X) \, dX$$
where $X(\lambda)$ is the fraction of prior mass with likelihood greater than $\lambda$, and $\mathcal{L}(X)$ is its inverse. The algorithm maintains a set of $n_{\text{live}}$ "live points" sampled from the prior, iteratively replacing the lowest-likelihood point with one drawn from the prior subject to $\mathcal{L} > \mathcal{L}_{\min}$.
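To make the mechanics concrete, here is a deliberately minimal nested sampler for a one-dimensional toy problem with a Uniform(0, 1) prior. It is a pedagogical sketch of the Skilling algorithm, not a replacement for the implementations below; the brute-force rejection step only works in very low dimensions:

```python
import math, random

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def nested_sampling_1d(loglike, n_live=200, n_iter=1200, rng=random):
    """Toy nested sampler over a Uniform(0,1) prior. Accumulates
    Z = sum_i L_i * dX_i while the prior volume shrinks by a factor
    exp(-1/n_live) per iteration."""
    live = [rng.random() for _ in range(n_live)]
    logL = [loglike(x) for x in live]
    logZ, logX = -math.inf, 0.0
    for _ in range(n_iter):
        i = min(range(n_live), key=lambda k: logL[k])   # worst live point
        log_dX = logX + math.log(1.0 - math.exp(-1.0 / n_live))
        logZ = log_add(logZ, logL[i] + log_dX)
        logX -= 1.0 / n_live
        while True:  # rejection-sample a replacement above the threshold
            x = rng.random()
            if loglike(x) > logL[i]:
                live[i], logL[i] = x, loglike(x)
                break
    for ll in logL:  # remaining live-point contribution
        logZ = log_add(logZ, ll + logX - math.log(n_live))
    return logZ
```

For a normalized Gaussian likelihood centered inside the unit interval, the true evidence is $Z \approx 1$, so the returned $\ln Z$ should scatter around zero with the $\sqrt{H/n_{\text{live}}}$ uncertainty quoted below.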
Implementations:

- dynesty (Speagle 2020): Pure Python, dynamic nested sampling (adjusts $n_{\text{live}}$ to concentrate samples where the evidence integrand peaks). Supports multi-ellipsoidal decomposition for multimodal posteriors. [Source: Speagle (2020), MNRAS 493, 3132]
- MultiNest (Feroz et al. 2009): Fortran with Python wrapper (PyMultiNest). Multi-modal nested sampling using ellipsoidal decomposition. Battle-tested in exoplanet literature. [Source: Feroz, Hobson & Bridges (2009), MNRAS 398, 1601; Feroz et al. (2019), OJAp 2, 10]
- UltraNest (Buchner 2021): Python, reactive nested sampling with MLFriends algorithm. Better for high-dimensional problems. [Source: Buchner (2021), JOSS 6, 3001]
Evidence accuracy: With $n_{\text{live}}$ live points, the uncertainty on $\ln Z$ is approximately [Source: Skilling (2006)]:
$$\sigma_{\ln Z} \approx \sqrt{\frac{H}{n_{\text{live}}}}$$
where $H$ is the Kullback-Leibler divergence (information gained from prior to posterior), typically $H \sim 10$–$50$ nats for TTV problems. With $n_{\text{live}} = 500$: $\sigma_{\ln Z} \approx 0.1$–$0.3$, adequate for Bayes factor computation.
Computational cost:
| Model | Dimensions | $n_{\text{live}}$ | Likelihood evals | Wall time (TTVFast @ 0.1 s) | Wall time (TTVFast @ 2 ms) |
|---|---|---|---|---|---|
| $\mathcal{M}_1$ (null) | 4 | 250 | ~5,000 | 8 min | 10 s |
| $\mathcal{M}_2$ (1 perturber) | 9 | 500 | ~50,000 | 83 min | 100 s |
| $\mathcal{M}_3$ (2 perturbers) | 14 | 1000 | ~500,000 | 14 hr | 17 min |
| $\mathcal{M}_4$ (3 perturbers) | 19 | 2000 | ~5,000,000 | 6 days | 2.8 hr |
Note: The 0.1 s per evaluation is a conservative estimate for long-baseline ($> 5$ yr), many-body ($k \geq 3$) integrations. For $k = 2$ with a 2-year baseline, 2 ms per call is realistic (see [[Reverse N-Body and Astrometric Parallax]], Section 2.2). The number of likelihood evaluations scales as $N_{\text{eval}} \propto n_{\text{live}} \cdot d \cdot H$ where $d$ is the dimensionality; the steep growth in the table arises because $n_{\text{live}}$, $d$, and the information gain $H$ all increase with model order. [Source: Buchner (2023), Statistics and Computing — nested sampling scaling]
### 4.2 Approach 2: Thermodynamic Integration
Algorithm: Run parallel tempering MCMC at temperatures ${T_\ell}_{\ell=1}^{L}$ with $T_1 = 1$ (posterior) and $T_L = \infty$ (prior). The evidence is: [Source: Lartillot & Philippe (2006), Systematic Biology 55, 195; Friel & Pettitt (2008), JRSS-B 70, 589]
$$\ln Z = \int_0^1 \langle \ln \mathcal{L} \rangle_\beta \, d\beta$$
where $\beta = 1/T$ is the inverse temperature and $\langle \ln \mathcal{L} \rangle_\beta$ is the expected log-likelihood under the tempered posterior $p(\theta)^\beta$. In practice, this is approximated by the trapezoidal rule:
$$\ln Z \approx \sum_{\ell=1}^{L-1} \frac{\beta_{\ell+1} - \beta_\ell}{2} \left[\langle \ln \mathcal{L} \rangle_{\beta_\ell} + \langle \ln \mathcal{L} \rangle_{\beta_{\ell+1}}\right]$$
Temperature ladder: Use a geometric spacing $\beta_\ell = (\beta_{\min})^{(\ell-1)/(L-1)}$ with $L \approx 20$–$50$ temperatures and $\beta_{\min} = 1/T_{\max}$ chosen so that the hottest chain samples nearly from the prior. [Source: Vousden, Farr & Mandel (2016), MNRAS 455, 1919]
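The estimator itself is tiny once the per-temperature averages $\langle \ln \mathcal{L} \rangle_{\beta_\ell}$ are in hand; a sketch with an illustrative function name:

```python
def ti_log_evidence(betas, mean_lnL):
    """Trapezoidal thermodynamic integration: ln Z = integral of
    <ln L>_beta over beta, with an ascending ladder of inverse
    temperatures ideally spanning beta = 0 (prior) to 1 (posterior)."""
    assert len(betas) == len(mean_lnL)
    lnZ = 0.0
    for k in range(len(betas) - 1):
        lnZ += 0.5 * (betas[k + 1] - betas[k]) * (mean_lnL[k] + mean_lnL[k + 1])
    return lnZ
```

The systematic error noted below comes entirely from this trapezoidal approximation: where $\langle \ln \mathcal{L} \rangle_\beta$ curves sharply (usually at small $\beta$), the ladder needs finer spacing.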
Advantages:

- Parallel tempering naturally handles multimodality (hot chains explore freely, swap information to cold chains)
- Adapts naturally to the BOINC architecture where each temperature level is a separate work unit (see [[TTV Reverse N-Body Inference]])
- Provides posterior samples at $T = 1$ as a byproduct
Disadvantages:

- Evidence accuracy depends on the temperature ladder resolution — systematic error from the trapezoidal approximation
- Requires careful tuning of swap acceptance rates ($\sim$20–50% optimal; Kone & Liu 2005)
- Evidence estimate variance is typically larger than nested sampling for the same compute budget [Source: Nelson et al. (2020), AJ 159, 73 — comparison in exoplanet context]
Computational cost:
| Model | Temperatures | Steps/chain | Total likelihood evals | Wall time (0.1 s) | Wall time (2 ms) |
|---|---|---|---|---|---|
| $\mathcal{M}_2$ | 30 | 100,000 | 3,000,000 | 3.5 days | 100 min |
| $\mathcal{M}_3$ | 40 | 200,000 | 8,000,000 | 9 days | 4.4 hr |
### 4.3 Approach 3: Savage-Dickey Density Ratio
When models are nested — i.e., $\mathcal{M}_{k-1}$ is obtained from $\mathcal{M}_k$ by setting a subset of parameters to fixed values — the Bayes factor simplifies to: [Source: Dickey (1971), Annals of Mathematical Statistics 42, 204; Verdinelli & Wasserman (1995), JASA 90, 614]
$$B_{k-1, k} = \frac{\pi(\boldsymbol{\psi}_0 | \mathcal{M}_k)}{p(\boldsymbol{\psi}_0 | \mathcal{D}, \mathcal{M}_k)}$$
where $\boldsymbol{\psi}_0$ is the parameter value under the simpler model (e.g., $\mu_c = 0$ for "no perturber") and $p(\boldsymbol{\psi}_0 | \mathcal{D}, \mathcal{M}_k)$ is the posterior density of the complex model evaluated at that point.
Application to TTV model selection: The null hypothesis $\mathcal{M}_1$ corresponds to $\mu_c = 0$ in $\mathcal{M}_2$. The Savage-Dickey ratio is:
$$B_{1,2} = \frac{\pi(\mu_c = 0)}{\hat{p}(\mu_c = 0 | \mathcal{D}, \mathcal{M}_2)}$$
Problem: Our prior is $\pi(\ln \mu_c)$ uniform, which has $\pi(\mu_c = 0) = 0$. The Savage-Dickey ratio is undefined for the log-uniform mass prior.
Workaround: Reparametrize to a prior that includes $\mu_c = 0$, e.g., a half-Cauchy or truncated Gaussian prior on $\mu_c$. But this changes the prior and hence the Bayes factor. This is a fundamental limitation: Savage-Dickey requires the nested point to have non-zero prior density.
Verdict: Savage-Dickey is not directly applicable to the TTV model selection problem with standard priors. It could be used for comparing sub-hypotheses within $\mathcal{M}_2$ (e.g., "is the eccentricity zero?" via $e_c = 0$), but not for the primary model order question.
### 4.4 Recommendation for OpenAstro
Primary method: nested sampling with dynesty (dynamic mode).
Rationale:

1. Nested sampling directly computes the evidence $Z_k$ — this is its design purpose. Thermodynamic integration computes $Z$ as a secondary product with lower accuracy. [Source: Skilling (2006)]
2. Dynamic nested sampling (dynesty) is more efficient than static nested sampling for evidence computation in moderate dimensions ($d \leq 20$). [Source: Speagle (2020)]
3. The Python ecosystem (dynesty + TTVFast-python) is the simplest integration path for OpenAstro's FastAPI + Python stack.
4. For $\mathcal{M}_2$ (the most common comparison), the wall time is $\sim$100 s with TTVFast at 2 ms/call — tractable on a single VPS.
5. dynesty handles multimodality through multi-ellipsoidal decomposition, which is critical for the period-ratio and phase degeneracies.
Secondary method: thermodynamic integration via parallel tempering, deployed on BOINC.
Rationale:

1. For $\mathcal{M}_3$ and beyond, nested sampling becomes expensive ($> 500{,}000$ evaluations). Parallel tempering distributes naturally across BOINC work units (one temperature per work unit).
2. The BOINC PT-MCMC infrastructure proposed in [[TTV Reverse N-Body Inference]] can serve double duty: posterior sampling AND evidence computation via thermodynamic integration.
3. Use the Stepping Stone estimator (Xie et al. 2011) instead of the trapezoidal rule for improved evidence accuracy from the same PT chains. [Source: Xie et al. (2011), Systematic Biology 60, 150]
[NOVEL] The two-tier strategy — dynesty for $\mathcal{M}_1$ vs $\mathcal{M}_2$ on the VPS, switching to BOINC parallel tempering with thermodynamic integration for $\mathcal{M}_2$ vs $\mathcal{M}_3$ — matches the compute architecture to the problem scale. This is specific to OpenAstro's hybrid VPS + volunteer-compute infrastructure.
## 5. Decision Threshold

### 5.1 The Jeffreys Scale
The standard interpretation of Bayes factors (Jeffreys 1961, updated by Kass & Raftery 1995): [Source: Jeffreys (1961), Theory of Probability, 3rd ed.; Kass & Raftery (1995), JASA 90, 773]
| $\ln B_{k, k-1}$ | $B_{k, k-1}$ | Interpretation |
|---|---|---|
| 0 – 1 | 1 – 3 | Barely worth mentioning |
| 1 – 3 | 3 – 20 | Positive / substantial |
| 3 – 5 | 20 – 150 | Strong |
| $> 5$ | $> 150$ | Very strong / decisive |
### 5.2 Is This Threshold Appropriate for TTV Planet Detection?
The appropriate threshold depends on the cost asymmetry between errors:
Cost of a false positive (claiming a perturber that isn't there):

- OpenAstro publishes an inferred planet that doesn't exist
- Wastes follow-up resources (RV observations, additional transit monitoring)
- Reputational damage to a citizen-science platform that is still establishing credibility
- The cost is high, especially in the early stages of the project
Cost of a false negative (missing a real perturber):

- A real planet goes undetected in the TTV signal
- The data continues to accumulate; the perturber will be detected eventually with more transits
- The opportunity cost is low — the observation is ongoing
Conclusion: The cost asymmetry favours a conservative threshold. We recommend:
$$\ln B_{k,k-1} > 5 \quad (B > 150) \text{ for claiming a detection}$$
This is more conservative than the standard "strong evidence" threshold of $\ln B > 3$, reflecting the higher false-positive cost for a citizen-science platform.
[NOVEL] For OpenAstro publications, we define three tiers:

- Candidate: $3 < \ln B < 5$ — reported as a candidate perturber, flagged for additional monitoring
- Detection: $5 < \ln B < 8$ — reported as a probable detection, triggering RV follow-up requests
- Confirmed: $\ln B > 8$ and independently validated (e.g., consistent with RV data, or the perturber also transits)
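A sketch of the tier assignment, treating the thresholds above as exclusive boundaries and requiring independent validation for the top tier (`reporting_tier` is an illustrative name, not a pipeline API):

```python
def reporting_tier(ln_B, independently_validated=False):
    """Map a ln(Bayes factor) to the three-tier reporting scheme.
    ln_B > 8 without independent validation still reports as a
    detection, not a confirmation."""
    if ln_B > 8.0 and independently_validated:
        return "confirmed"
    if ln_B > 5.0:
        return "detection"
    if ln_B > 3.0:
        return "candidate"
    return "no claim"
```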
### 5.3 Worked Example: Minimum Detectable TTV Amplitude
Setup: $N = 30$ transits, $\sigma_i = \sigma = 2$ min for all transits, sinusoidal TTV signal $\delta t_i^{\text{true}} = A \sin(2\pi t_i / P_{\text{TTV}} + \phi)$.
Under $\mathcal{M}_1$ (no perturber):
$$\ln \mathcal{L}_1 = -\frac{1}{2\sigma^2} \sum_{i=1}^{30} A^2 \sin^2(2\pi t_i / P_{\text{TTV}} + \phi) - \frac{30}{2}\ln(2\pi\sigma^2)$$
For well-sampled data (transits spanning $\gg P_{\text{TTV}}$):
$$\sum_{i=1}^{30} \sin^2(\ldots) \approx 15$$
so:
$$\ln \mathcal{L}_1 \approx -\frac{15 A^2}{2\sigma^2} + \text{const}$$
Under $\mathcal{M}_2$ at the true parameters:
$$\ln \mathcal{L}_2 \approx 0 + \text{const}$$
(residuals are pure noise). The maximum improvement in log-likelihood:
$$\Delta \ln \mathcal{L}_{\max} = \frac{15 A^2}{2\sigma^2} = \frac{15 A^2}{2 \times 4} = \frac{15 A^2}{8} \quad \text{(with } A, \sigma \text{ in minutes)}$$
The Occam penalty for $\mathcal{M}_2$ with 5 extra parameters (see Section 1.2), assuming the posterior is $\sim$100 times narrower than the prior for each parameter:
$$\ln \text{Occam} \approx -\frac{5}{2} \ln(100) = -11.5$$
(This is the estimate from [[Reverse N-Body and Astrometric Parallax]], Section 5.2.)
The approximate Bayes factor is:
$$\ln B_{2,1} \approx \Delta \ln \mathcal{L}_{\max} + \ln \text{Occam} = \frac{15 A^2}{8} - 11.5$$
Setting $\ln B_{2,1} > 5$ (our detection threshold):
$$\frac{15 A^2}{8} > 16.5$$
$$A^2 > 8.8 \quad \Rightarrow \quad A > 2.97 \text{ min}$$
Result: For 30 transits with $\sigma = 2$ min, a TTV amplitude of $A \gtrsim 3$ min produces $\ln B > 5$ (decisive evidence for a perturber).
Using the TTV amplitude formula from [[TTV Sensitivity and N-Body Math]], Section 2.2:
$$A_{\text{TTV}} \approx \frac{m_c}{M_\star} \cdot \frac{P_b}{2\pi|\Delta|}$$
For $P_b = 5$ d, $|\Delta| = 0.05$ (5% from 2:1 resonance), $M_\star = M_\odot$:
$$3 \text{ min} = 180 \text{ s} = \mu_c \cdot \frac{5 \times 86400}{2\pi \times 0.05} = \mu_c \cdot 1.375 \times 10^6 \text{ s}$$
$$\mu_c > 1.31 \times 10^{-4} \quad \Rightarrow \quad m_c > 0.137 M_J \approx 44 M_\oplus$$
A Neptune-to-Saturn mass perturber near 2:1 resonance is detectable at $\ln B > 5$ with 30 transits at 2-min precision. Lighter perturbers ($\lesssim 15 M_\oplus$) would require either more transits, better precision, or closer proximity to resonance.
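The arithmetic in this example can be checked end to end (units follow the text: amplitude and precision in minutes, period in seconds):

```python
import math

N, sigma = 30, 2.0                  # transits, timing precision [min]
occam = -2.5 * math.log(100)        # 5 extra parameters, ~100x shrinkage each
lnB_threshold = 5.0

# ln B ~ N*A^2/(4*sigma^2) + occam; solve for the minimum amplitude A
A_min = math.sqrt((lnB_threshold - occam) * 4 * sigma ** 2 / N)   # minutes

# minimum mass ratio via A_TTV ~ mu_c * P_b / (2*pi*|Delta|)
P_b, Delta = 5 * 86400.0, 0.05      # 5-day period in seconds, 5% from 2:1
mu_min = (A_min * 60.0) * 2 * math.pi * Delta / P_b
```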
Sensitivity scaling: The minimum detectable amplitude scales as:
$$A_{\min} \propto \sigma \cdot \sqrt{\frac{d_{\text{new}}}{N}}$$
where $d_{\text{new}} = 5$ is the number of new parameters. Doubling the number of transits reduces $A_{\min}$ by $\sqrt{2}$.
## 6. Degeneracy and Multi-Modality

### 6.1 Sources of Near-Degenerate Likelihoods
The TTV likelihood surface is generically multimodal. The principal degeneracies are:
Period-ratio aliasing (resonance confusion): A TTV super-period $P_{\text{TTV}}$ can be produced by a perturber near ANY mean-motion resonance $j:(j-k)$ with the appropriate $\Delta$:
$$P_{\text{TTV}} = \frac{1}{|j/P_c - (j-k)/P_b|}$$
For a given observed $P_{\text{TTV}}$, different $(j, k)$ pairs yield different $P_c$ values. For example, an observed $P_{\text{TTV}} = 500$ days for a $P_b = 10$ day transiter can be produced by (taking the exterior solution just wide of each resonance):

- $P_c \approx 20.41$ d (near 2:1, $\Delta \approx 0.020$)
- $P_c \approx 15.15$ d (near 3:2, $\Delta \approx 0.010$)
- $P_c \approx 30.61$ d (near 3:1, $\Delta \approx 0.020$)
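Solving the super-period relation above for the perturber period just wide of each candidate resonance gives the alias family directly; a sketch with illustrative names:

```python
def alias_periods(P_b, P_ttv, resonances=((2, 1), (3, 2), (3, 1))):
    """Candidate perturber periods P_c reproducing a given TTV
    super-period P_ttv = 1 / |j/P_c - (j-k)/P_b|. Each pair is
    (j, j-k); the chosen sign places P_c just wide of the exact
    j:(j-k) commensurability."""
    return {(j, jk): j / (jk / P_b - 1.0 / P_ttv) for j, jk in resonances}
```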
Each solution is a separate mode in the posterior. The modes have different peak likelihoods (because the TTV waveform shape differs for different resonances — the harmonic content of the signal depends on $j$ and $k$), but the differences can be small, especially with noisy data.
The mass-eccentricity degeneracy: Within each resonance mode, the TTV amplitude depends on $\mu_c / |\Delta|$ and on $f(j, k, e_b, e_c)$. A lower-mass perturber on an eccentric orbit can produce the same amplitude as a higher-mass perturber on a circular orbit. This creates a ridge in the likelihood surface in the $(\mu_c, e_c)$ plane, roughly along:
$$\mu_c \cdot g(e_c) \approx \text{const}$$
where $g(e_c)$ is a resonance-dependent function that increases with $e_c$ for first-order resonances. [Source: Lithwick, Xie & Wu (2012), ApJ 761, 122 — see their Eq. 8 for the eccentricity-dependent TTV formula]
Phase ambiguity: If the TTV signal is sinusoidal and only partially sampled (baseline $< P_{\text{TTV}}$), the phase is poorly constrained, creating a continuous degeneracy in $(t_{0,c}, \omega_c)$. If the signal is well-sampled over a full cycle, the phase is determined up to a discrete ambiguity (the "chopping" signal breaks this; see [[TTV Sensitivity and N-Body Math]], Section 4.4).
6.2 Handling Multi-Modality in Sampling¶
For nested sampling (dynesty/MultiNest):
- Multi-ellipsoidal decomposition (MultiNest) or multi-bound methods (dynesty) are designed to handle isolated modes. Each mode is enclosed by a separate ellipsoid, and live points are distributed across modes proportionally to their evidence contribution.
- Use a generous number of live points: $n_{\text{live}} \geq 50d$ where $d$ is the dimensionality. For $\mathcal{M}_2$ ($d = 9$): $n_{\text{live}} \geq 450$.
- Enable the 'multi' bounding method in dynesty for multimodal problems.
- Mode identification: After sampling, cluster the posterior samples (e.g., using Gaussian mixture models or HDBSCAN) to identify separate modes and report per-mode evidence contributions.
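The mode-identification step can be sketched with scikit-learn's `GaussianMixture` (one of the two clustering options named above; `identify_modes` is an illustrative helper, not an existing OpenAstro function), selecting the number of modes by BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def identify_modes(samples, max_modes=5, seed=0):
    """Fit Gaussian mixtures with 1..max_modes components, keep the best by BIC,
    and return per-sample mode labels plus per-mode sample fractions.  For
    equally weighted posterior draws the fractions approximate the per-mode
    evidence weights w_m = Z_k^(m) / Z_k.  (HDBSCAN is a drop-in alternative.)"""
    best, best_bic = None, np.inf
    for k in range(1, max_modes + 1):
        gm = GaussianMixture(n_components=k, random_state=seed).fit(samples)
        bic = gm.bic(samples)
        if bic < best_bic:
            best, best_bic = gm, bic
    labels = best.predict(samples)
    weights = np.bincount(labels, minlength=best.n_components) / len(samples)
    return labels, weights

# Toy bimodal "posterior" in (ln P_c, ln mu_c): 2:1 and 3:2 solutions
rng = np.random.default_rng(1)
samples = np.vstack([
    rng.normal([np.log(20.4), np.log(1.2e-4)], 0.02, size=(600, 2)),
    rng.normal([np.log(15.2), np.log(0.8e-4)], 0.02, size=(400, 2)),
])
labels, weights = identify_modes(samples)
print(len(weights), np.sort(weights))  # two modes, weights ~0.4 / ~0.6
```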
For parallel tempering MCMC:
- High-temperature chains ($T \gg 1$) can jump between modes that are separated by likelihood barriers.
- The swap mechanism communicates mode information down to the $T = 1$ chain.
- Critical tuning: the temperature ladder must include enough intermediate temperatures that swaps propagate efficiently. Use the adaptive temperature scheme of Vousden et al. (2016). [Source: Vousden, Farr & Mandel (2016), MNRAS 455, 1919]
- Monitor inter-mode swap rates: if the $T = 1$ chain never visits a mode that higher-$T$ chains find, the evidence estimate is biased.
Recommended mode-hopping strategy:
1. Run a preliminary nested sampling pass with $n_{\text{live}} = 200$ to identify modes (fast, ~10,000 evaluations).
2. Cluster the live points at termination to enumerate modes.
3. Re-run nested sampling separately on each mode (restricting the prior to the mode's bounding ellipsoid) for high-accuracy per-mode evidence.
4. Sum: $Z_k = \sum_{\text{modes}} Z_k^{(m)}$.
6.3 Marginalisation vs. Separate Reporting¶
Should we marginalise over degenerate modes or report each separately?
The Bayes factor $B_{k,k-1}$ naturally marginalises over all modes — the evidence integral $Z_k$ includes contributions from all modes of $\mathcal{M}_k$. This is the correct quantity for model selection.
However, for parameter estimation (once $\mathcal{M}_k$ is selected), reporting the marginalised posterior is misleading if the modes correspond to physically distinct configurations (e.g., perturber near 2:1 vs. 3:2 resonance). The marginalised posterior would show bimodal distributions with a "mean" that corresponds to no physical configuration.
[NOVEL] OpenAstro reporting standard:
1. For model selection: Report $\ln B_{k,k-1}$ computed from the full (mode-marginalised) evidence. This is the rigorous quantity.
2. For parameter estimation: Report per-mode posteriors separately. Each mode is labeled by its resonance identification (e.g., "Mode A: perturber near 2:1, $P_c = 20.41 \pm 0.15$ d, $\mu_c = (1.2 \pm 0.3) \times 10^{-4}$"). Include the per-mode evidence weight $w_m = Z_k^{(m)} / Z_k$ so the reader can assess relative plausibility.
3. For the live posterior dashboard (see [[TTV Reverse N-Body Inference]], Section on pipeline): Display the top 3 modes by evidence weight, with live-updating posteriors as new transit data arrives.
7. Amortised Inference Alternative¶
7.1 Neural Posterior Estimation (NPE) for TTV¶
Simulation-based inference (SBI) methods train a neural network to approximate the posterior $p(\boldsymbol{\theta} | \mathcal{D})$ directly from simulated data-parameter pairs, amortising the cost of inference over many datasets. [Source: Cranmer, Brehmer & Louppe (2020), PNAS 117, 30055; Lueckmann et al. (2021), PMLR 130, 1289 — sbi library]
The approach:
1. Generate a training set of $N_{\text{train}}$ simulations: draw $\boldsymbol{\theta}^{(j)} \sim \pi(\boldsymbol{\theta})$, compute $\mathcal{D}^{(j)} = \text{TTVFast}(\boldsymbol{\theta}^{(j)}) + \text{noise}$.
2. Train a conditional density estimator (normalizing flow) $q_\phi(\boldsymbol{\theta} | \mathcal{D})$ to approximate $p(\boldsymbol{\theta} | \mathcal{D})$.
3. At inference time, evaluate $q_\phi(\boldsymbol{\theta} | \mathcal{D}_{\text{obs}})$ — a single forward pass through the network, taking milliseconds.
Two variants:
- NPE (Neural Posterior Estimation): Directly estimates $p(\boldsymbol{\theta} | \mathcal{D})$ using a normalizing flow conditioned on summary statistics of $\mathcal{D}$. [Source: Papamakarios & Murray (2016), NeurIPS; Greenberg et al. (2019), ICML]
- NRE (Neural Ratio Estimation): Estimates the likelihood-to-evidence ratio $r(\boldsymbol{\theta}, \mathcal{D}) = p(\mathcal{D} | \boldsymbol{\theta}) / p(\mathcal{D})$, which gives the posterior up to a normalizing constant. Can be used for model comparison by estimating $Z_k$ from the ratio. [Source: Hermans et al. (2020), ICML; Miller et al. (2021), NeurIPS]
7.2 Training Data Requirements¶
The training set size depends on:
- Dimensionality of $\boldsymbol{\theta}$: For $\mathcal{M}_2$ ($d = 9$), the rule of thumb is $N \sim 10^4$–$10^5$ simulations for a well-trained normalizing flow. [Source: Lueckmann et al. (2021)]
- Complexity of the posterior: Multimodal posteriors require more training data to learn the mode structure.
- Summary statistics: Rather than conditioning on raw ${\delta t_i}$ (variable length), use fixed-dimensional summary statistics: (a) TTV periodogram peaks (amplitude, period, phase for the top 3 frequencies), (b) autocorrelation at lags 1–5, (c) $\chi^2$ of a linear fit, (d) skewness and kurtosis of ${\delta t_i}$. This reduces the input dimensionality to $\sim$15–20.
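The summary vector can be sketched in plain numpy (the helper name `ttv_summary` and the brute-force periodogram are illustrative choices; a Lomb–Scargle periodogram via astropy would be the production route):

```python
import numpy as np

def ttv_summary(t, dt, sigma, n_freq=200):
    """Fixed-length (17-dim) summary vector for NPE conditioning -- a sketch of
    the statistics listed above: top-3 periodogram peaks (amplitude, frequency,
    phase), autocorrelation at lags 1-5, chi^2 against zero TTV, and the
    skewness/excess kurtosis of the residuals."""
    span = t[-1] - t[0]
    f_max = 0.5 / np.median(np.diff(t))            # ~Nyquist for the typical spacing
    freqs = np.linspace(0.5 / span, f_max, n_freq)
    amp = np.empty(n_freq); phase = np.empty(n_freq)
    for i, f in enumerate(freqs):                  # brute force handles gapped sampling
        c = np.cos(2 * np.pi * f * t); s = np.sin(2 * np.pi * f * t)
        a = 2.0 * np.dot(dt, c) / len(t); b = 2.0 * np.dot(dt, s) / len(t)
        amp[i] = np.hypot(a, b); phase[i] = np.arctan2(b, a)
    top = np.argsort(amp)[-3:][::-1]               # three strongest peaks
    z = (dt - dt.mean()) / dt.std()
    acf = np.array([np.mean(z[:-k] * z[k:]) for k in range(1, 6)])
    chi2 = np.sum((dt / sigma) ** 2)
    skew = np.mean(z**3); kurt = np.mean(z**4) - 3.0
    return np.concatenate([amp[top], freqs[top], phase[top], acf, [chi2, skew, kurt]])

# Example: 30 transits of a 5-day planet carrying a 50-day sinusoidal TTV
t = np.arange(30) * 5.0
dt = 0.001 * np.sin(2 * np.pi * t / 50.0)          # ~86 s TTV amplitude, in days
summary = ttv_summary(t, dt, np.full(30, 0.0014))
print(summary.shape)  # (17,)
```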
Estimated training cost:
| Component | Value |
|---|---|
| Simulations needed | $10^5$ for $\mathcal{M}_2$; $5 \times 10^5$ for $\mathcal{M}_3$ |
| TTVFast time per simulation | 2 ms ($\mathcal{M}_2$), 10 ms ($\mathcal{M}_3$) |
| Total simulation time | 200 s ($\mathcal{M}_2$), 5000 s ($\mathcal{M}_3$) |
| Network training time | ~1 hr on GPU (normalizing flow with 5 coupling layers) |
| Inference time per target | ~10 ms (single forward pass) |
The simulation budget is entirely feasible. $10^5$ TTVFast simulations at 2 ms each = 3.3 minutes of CPU time. Training a normalizing flow on this dataset takes $\sim$1 hour on a modern GPU. The amortised cost per target system is then essentially zero.
7.3 Evidence Estimation from NPE/NRE¶
Computing the Bayes factor from amortised inference is less straightforward than from nested sampling:
- From NPE: The evidence is not directly available. One can apply the Savage-Dickey density ratio to the neural posterior: $B_{2,1} \approx \pi(\boldsymbol{\theta}_0) / q_\phi(\boldsymbol{\theta}_0 | \mathcal{D})$ evaluated at the null-model point $\boldsymbol{\theta}_0$ (the posterior-to-prior density at the null gives $B_{1,2}$; its reciprocal gives $B_{2,1}$). This carries the same limitations discussed in Section 4.3.
- From NRE: The ratio estimator can be marginalised to give an estimate of $Z_k$, but this requires careful calibration and the accuracy of $\ln Z$ estimates from NRE is typically worse than from nested sampling. [Source: Hermans et al. (2022), PMLR 151, 10393]
- Hybrid approach: Use the NPE posterior as a proposal distribution for importance sampling of the evidence integral. This is the "neural importance sampling" approach and can yield accurate $\ln Z$ with far fewer likelihood evaluations than direct nested sampling. [Source: Alsing & Wandelt (2018), MNRAS 476, L60]
7.4 When Amortised Inference Wins vs. When Nested Sampling is Preferred¶
Amortised inference (NPE) is clearly better when:
- Many target systems need to be analysed with the same model (the training cost is amortised over all targets)
- Real-time or near-real-time posterior updates are needed (e.g., the live posterior dashboard)
- The posterior is well-approximated by the normalizing flow (smooth, low-dimensional)
- Model comparison is not the primary goal (parameter estimation only)
Nested sampling is clearly better when:
- The Bayes factor is the primary quantity of interest (nested sampling computes it natively)
- The posterior is highly multimodal with well-separated modes (normalizing flows struggle to represent disconnected modes)
- Only a few systems need analysis (training cost dominates)
- Posterior accuracy must be verified (nested sampling has well-understood error bounds; NPE accuracy is harder to certify)
[NOVEL] Recommended OpenAstro workflow:
1. Screening pass (NPE): Train an NPE on $10^5$ TTVFast simulations. For each new target with transit timing data, run the NPE to get a rapid ($\sim$10 ms) posterior estimate. Flag systems where the NPE posterior strongly excludes $\mu_c = 0$ (i.e., the posterior concentrates away from zero mass).
2. Confirmation pass (nested sampling): For flagged systems, run full dynesty nested sampling to compute $\ln B_{2,1}$ rigorously. This takes $\sim$100 s per system.
3. Deep inference (BOINC PT): For systems where $\ln B_{2,1} > 5$, run $\mathcal{M}_3$ nested sampling or BOINC parallel tempering to test for additional perturbers.
This three-tier approach minimises total compute while maintaining rigorous evidence calculations where they matter.
8. Practical Implementation Path for OpenAstro¶
8.1 Minimum Viable Implementation (MVI)¶
Goal: Compute $\ln B_{2,1}$ (one perturber vs. zero perturbers) for a known TTV system from archival data.
Step 1: Data ingestion
- Pull transit midtimes from ETD (Exoplanet Transit Database) or ExoClock for a target system (e.g., WASP-12b, TrES-1b)
- Format as ${(i, t_i^{\text{obs}}, \sigma_i)}$
- Fit a linear ephemeris ($t_0, P$) by weighted least squares; compute residuals $\delta t_i$
Step 2: Null model evidence
- $\mathcal{M}_1$: $\delta t_i^{\text{model}} = 0$ (residuals are noise)
- Compute $\ln Z_1$ analytically (the evidence for a Gaussian noise model is a closed-form integral):
$$\ln Z_1 = -\frac{1}{2}\sum_i \left[\frac{\delta t_i^2}{\sigma_i^2} + \ln(2\pi\sigma_i^2)\right]$$
Strictly, $\mathcal{M}_1$ has 4 free parameters ($P_1, t_{0,1}, e_1\cos\omega_1, e_1\sin\omega_1$) that must be marginalised. However, since TTV data alone weakly constrains $e_1$, and $P_1, t_{0,1}$ are tightly constrained by the linear ephemeris, the $\mathcal{M}_1$ evidence is dominated by the residual scatter. In practice, run a quick dynesty fit for $\mathcal{M}_1$ (4 parameters, trivially fast).
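Steps 1–2 reduce to a few lines of numpy. The sketch below (helper names `linear_ephemeris_residuals` and `log_z_null` are illustrative) fits the weighted linear ephemeris and evaluates the closed-form null evidence on synthetic data:

```python
import numpy as np

def linear_ephemeris_residuals(epochs, t_obs, sigma):
    """Weighted least-squares fit of t = t0 + i * P; returns (t0, P, residuals)."""
    w = 1.0 / sigma**2
    A = np.column_stack([np.ones_like(epochs), epochs])
    # Weighted normal equations: (A^T W A) x = A^T W t_obs
    t0, P = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * t_obs))
    return t0, P, t_obs - (t0 + P * epochs)

def log_z_null(resid, sigma):
    """Closed-form evidence of M1 (residuals are pure Gaussian noise)."""
    return -0.5 * np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

# Synthetic check: a clean linear ephemeris plus Gaussian timing noise
rng = np.random.default_rng(0)
epochs = np.arange(40.0)
sigma = np.full(40, 0.001)                        # 0.001 d ≈ 86 s per-transit error
t_obs = 2450000.0 + 3.5 * epochs + rng.normal(0.0, 0.001, 40)
t0, P, resid = linear_ephemeris_residuals(epochs, t_obs, sigma)
ln_z1 = log_z_null(resid, sigma)
```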
Step 3: One-perturber model evidence
- Install: pip install ttvfast dynesty numpy
- Define the likelihood function wrapping TTVFast (9 parameters for coplanar $\mathcal{M}_2$)
- Define priors per Section 2
- Run dynesty.DynamicNestedSampler with nlive_init=500, bound='multi', sample='rwalk'
- Extract $\ln Z_2$ from results.logz[-1]
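Step 3 can be sketched as follows. To keep the example self-contained, a sinusoid (amplitude, super-period, phase) stands in for the TTVFast forward model; in the real pipeline `model_ttv` would wrap the ttvfast package and map the 9 coplanar $\mathcal{M}_2$ parameters to predicted transit times. All names here are illustrative:

```python
import numpy as np

# Placeholder data: transit times t, residuals dt (days), per-transit errors sigma
t = np.arange(30) * 5.0
dt = 0.002 * np.sin(2 * np.pi * t / 50.0)
sigma = np.full(30, 0.0014)

def model_ttv(theta, t):
    """Stand-in forward model (sinusoid).  Production code replaces this with a
    ttvfast call taking the full coplanar M2 parameter vector."""
    amp, p_ttv, phi = theta
    return amp * np.sin(2 * np.pi * t / p_ttv + phi)

def loglike(theta):
    """Gaussian log-likelihood of the timing residuals given the model TTV."""
    r = dt - model_ttv(theta, t)
    return -0.5 * np.sum(r**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

def prior_transform(u):
    """Unit cube -> prior (dynesty's prior_transform convention):
    log-uniform amplitude and super-period, uniform phase."""
    amp = 10 ** (-4 + 3 * u[0])      # 1e-4 .. 1e-1 days
    p_ttv = 10 ** (1 + 2 * u[1])     # 10 .. 1000 days
    phi = 2 * np.pi * u[2]
    return np.array([amp, p_ttv, phi])

# With dynesty installed, the evidence then follows as in the steps above:
#   sampler = dynesty.DynamicNestedSampler(loglike, prior_transform, ndim=3,
#                                          bound='multi', sample='rwalk')
#   sampler.run_nested(nlive_init=500)
#   ln_z2 = sampler.results.logz[-1]
```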
Step 4: Compute Bayes factor
- $\ln B_{2,1} = \ln Z_2 - \ln Z_1$
- Uncertainty: $\sigma_{\ln B} = \sqrt{\sigma_{\ln Z_1}^2 + \sigma_{\ln Z_2}^2}$, typically $\sim 0.3$–$0.5$
- Apply decision rule from Section 5
Step 5: Validate
- Compare $\ln B_{2,1}$ against published results for the test system
- Check posterior on $P_c$, $\mu_c$ against known values
- Verify per-mode decomposition identifies the correct resonance
Libraries and dependencies:
| Library | Version | Purpose |
|---|---|---|
| ttvfast | >=0.3.0 | Forward model (TTVFast Python wrapper) |
| dynesty | >=2.1 | Nested sampling |
| numpy | >=1.24 | Numerics |
| scipy | >=1.10 | Optimization (for initial point finding) |
| matplotlib | >=3.7 | Corner plots, TTV time series plots |
| astropy | >=5.3 | Time conversions (BJD), constants |
| scikit-learn | >=1.3 | Mode clustering (GMM/HDBSCAN) |
| sbi | >=0.22 | Neural posterior estimation (for Tier 1 screening; optional) |
Compute budget for MVI:
| Task | Evaluations | Time (single core) |
|---|---|---|
| $\mathcal{M}_1$ nested sampling | ~5,000 | ~10 s |
| $\mathcal{M}_2$ nested sampling | ~50,000 | ~100 s |
| Post-processing (clustering, plots) | — | ~30 s |
| Total | ~55,000 | ~2.5 min |
This is well within the capability of a $5/month VPS.
8.2 Connection to BOINC Parallel Tempering Infrastructure¶
The BOINC integration described in [[TTV Reverse N-Body Inference]] maps directly to the thermodynamic integration method (Section 4.2):
- Work unit generation (server): For a target system, the OpenAstro server generates $L$ work units, one per temperature level $T_\ell$. Each work unit contains:
  - The data $\mathcal{D}$
  - The temperature $T_\ell$
  - The prior specification
  - An initial parameter vector (from the dynesty screening run)
  - The number of MCMC steps to perform ($N_{\text{steps}} = 100{,}000$)
- Execution (volunteer): Each BOINC client runs an MCMC chain at its assigned temperature, producing a chain of $(\boldsymbol{\theta}, \ln\mathcal{L})$ pairs. The chain is returned to the server.
- Swap step (server): The server collects all $L$ chains and performs the parallel tempering swap step retrospectively:
  - For each pair of adjacent temperatures $(T_\ell, T_{\ell+1})$, compute the swap acceptance probability: $$\alpha = \min\left(1, \exp\left[(\beta_\ell - \beta_{\ell+1})(\ln\mathcal{L}_{\ell+1} - \ln\mathcal{L}_\ell)\right]\right)$$
  - Accept or reject the swap; if accepted, exchange the parameter vectors between the two temperature levels for the next batch.
  - This "batch PT" approximation is valid when the chains are long enough that the endpoint configuration is representative. [Source: see Swendsen & Wang (1986), Phys. Rev. Lett. 57, 2607 — replica exchange]
- Evidence computation (server): From the collected $\langle \ln \mathcal{L} \rangle_{T_\ell}$ at each temperature, compute $\ln Z$ via thermodynamic integration: $$\ln Z \approx \sum_{\ell=1}^{L-1} \frac{\beta_{\ell+1} - \beta_\ell}{2}\left[\langle\ln\mathcal{L}\rangle_\ell + \langle\ln\mathcal{L}\rangle_{\ell+1}\right]$$
- Iteration: If the evidence estimate has large uncertainty (from the variance of $\langle \ln \mathcal{L} \rangle$ within each chain), dispatch another batch of work units with longer chains or finer temperature spacing.
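The server-side swap and integration steps above can be sketched in plain numpy (the helper names `batch_pt_swap` and `thermo_integrate` are illustrative, not existing OpenAstro functions):

```python
import numpy as np

def batch_pt_swap(states, loglikes, betas, rng):
    """One retrospective swap sweep over adjacent temperature pairs.

    states/loglikes: endpoint parameter vectors and log-likelihoods of the
    returned chains, ordered by inverse temperature beta (descending from 1).
    Accepted swaps exchange the states used to seed the next batch.
    """
    for k in range(len(betas) - 1):
        log_alpha = (betas[k] - betas[k + 1]) * (loglikes[k + 1] - loglikes[k])
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            states[k], states[k + 1] = states[k + 1], states[k]
            loglikes[k], loglikes[k + 1] = loglikes[k + 1], loglikes[k]
    return states, loglikes

def thermo_integrate(betas, mean_loglikes):
    """Trapezoidal thermodynamic integration: ln Z = integral of <ln L>_beta
    over beta from 0 to 1 (exactly the sum in the bullet above)."""
    order = np.argsort(betas)
    b = np.asarray(betas, dtype=float)[order]
    m = np.asarray(mean_loglikes, dtype=float)[order]
    return float(np.sum(0.5 * (b[1:] - b[:-1]) * (m[1:] + m[:-1])))

# Analytic sanity check: unit-Gaussian likelihood with a unit-Gaussian prior,
# where <ln L>_beta = -ln(2*pi)/2 - 1/(2*(1+beta)) and ln Z = -ln 2 - ln(pi)/2
betas = np.linspace(0.0, 1.0, 201)
ln_z = thermo_integrate(betas, -0.5 * np.log(2 * np.pi) - 0.5 / (1 + betas))
```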
[NOVEL] The batch parallel tempering with server-side swaps is specifically designed for the BOINC model where work units are independent. The retrospective swap step is an approximation, but for long chains (100,000 steps) with well-spaced temperatures, the evidence estimate converges to the correct value. This architecture is original to OpenAstro.
8.3 Expected Turnaround Time per Target System¶
| Tier | Method | Compute | Turnaround |
|---|---|---|---|
| Screening (NPE) | Neural posterior, pre-trained | Single forward pass | ~1 s |
| Standard ($\mathcal{M}_1$ vs $\mathcal{M}_2$) | dynesty on VPS | 55,000 evals | ~3 min |
| Deep ($\mathcal{M}_2$ vs $\mathcal{M}_3$) | dynesty on VPS | 500,000 evals | ~20 min |
| Deep ($\mathcal{M}_3$ vs $\mathcal{M}_4$) | BOINC PT | 5,000,000 evals | ~1–3 days (volunteer compute) |
The turnaround bottleneck for $\mathcal{M}_3$+ is BOINC volunteer availability. With 10 active BOINC clients each running one temperature level, the wall time for an $\mathcal{M}_3$ analysis is determined by the slowest client — typically $\sim$1 day for 200,000 MCMC steps at 10 ms per step.
For the live posterior dashboard, the relevant timescale is the Standard tier: 3 minutes per update after each new transit observation. This is fast enough for real-time science output.
8.4 Implementation Roadmap¶
Phase 1 (MVI — 2 weeks of development):
- Implement the TTVFast + dynesty wrapper for $\mathcal{M}_1$ and $\mathcal{M}_2$
- Validate on 2–3 known TTV systems (WASP-12, Kepler-88, WASP-47)
- Compute $\ln B$ and compare to published values
Phase 2 (Model selection pipeline — 4 weeks):
- Extend to $\mathcal{M}_3$ (14 parameters, coplanar)
- Implement mode clustering and per-mode reporting
- Build the resonance-aware prior (Section 2.2)
- Implement the jitter hyperparameter (Section 3.2)
- Validate on Kepler-80, TRAPPIST-1 (multi-planet TTV systems)
Phase 3 (BOINC integration — 6 weeks):
- Package the MCMC likelihood evaluation as a BOINC work unit
- Implement server-side batch PT swap and thermodynamic integration
- Test with volunteer compute on 5–10 target systems
Phase 4 (Amortised inference — 4 weeks):
- Generate $10^5$ TTVFast training simulations
- Train NPE using the sbi library
- Validate NPE posteriors against dynesty on test systems
- Deploy as the screening tier
9. Summary of Key Equations¶
For reference, the complete mathematical specification:
Evidence integral: $$Z_k = \int \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(\delta t_i - \delta t_i^{\text{model}}(\boldsymbol{\theta}_k))^2}{2\sigma_i^2}\right) \pi(\boldsymbol{\theta}_k | \mathcal{M}_k) \, d\boldsymbol{\theta}_k$$
Bayes factor: $$B_{k,k-1} = Z_k / Z_{k-1}$$
Decision rule: $$\ln B_{k,k-1} > 5 \text{ for detection}; \quad 3 < \ln B < 5 \text{ for candidacy}$$
Minimum detectable amplitude (for $\ln B > 5$): $$A_{\min} \approx \sigma \sqrt{\frac{2(5 + d_{\text{new}}\ln(V_{\text{prior}}/V_{\text{post}})/2)}{N/2}}$$
where $d_{\text{new}}$ is the number of new parameters, $V_{\text{prior}}/V_{\text{post}}$ is the prior-to-posterior volume ratio per parameter, $N$ is the number of transits, and $\sigma$ is the per-transit timing uncertainty.
For the reference case ($N = 30$, $\sigma = 2$ min, $d_{\text{new}} = 5$, $V_{\text{prior}}/V_{\text{post}} \sim 100$): $A_{\min} \approx 3$ min $\approx 44 M_\oplus$ near 2:1 resonance.
10. References¶
- Agol & Fabrycky (2018), Handbook of Exoplanets, Chapter 7 — TTV theory review
- Agol et al. (2005), MNRAS, 359, 567 — TTV prediction
- Alsing & Wandelt (2018), MNRAS, 476, L60 — neural importance sampling
- Buchner (2021), JOSS, 6, 3001 — UltraNest
- Buchner (2023), Statistics and Computing — nested sampling review
- Carter et al. (2008), ApJ, 689, 499 — transit parameter uncertainties
- Chambers et al. (1996), Icarus, 119, 261 — Hill stability
- Cranmer, Brehmer & Louppe (2020), PNAS, 117, 30055 — simulation-based inference review
- Deck et al. (2014), ApJ, 787, 132 — TTVFast algorithm
- Dickey (1971), Annals of Mathematical Statistics, 42, 204 — Savage-Dickey density ratio
- Fabrycky et al. (2014), ApJ, 790, 146 — Kepler mutual inclinations
- Feroz, Hobson & Bridges (2009), MNRAS, 398, 1601 — MultiNest
- Feroz et al. (2019), Open J. Astrophys., 2, 10 — MultiNest update
- Ford et al. (2012), ApJ, 750, 113 — Bayesian analysis of Kepler TTVs
- Friel & Pettitt (2008), JRSS-B, 70, 589 — thermodynamic integration
- Gladman (1993), Icarus, 106, 247 — Hill stability criterion
- Greenberg et al. (2019), ICML — automatic posterior transformation
- Gregory (2005), ApJ, 631, 1198 — jitter term in Bayesian fitting
- Hadden & Lithwick (2014), ApJ, 787, 80 — TTV inversion
- Hermans et al. (2020), ICML — neural ratio estimation
- Hermans et al. (2022), PMLR, 151, 10393 — NRE model comparison
- Holman & Murray (2005), Science, 307, 1288 — first TTV detection
- Jeffreys (1961), Theory of Probability, 3rd ed. — Jeffreys scale
- Kass & Raftery (1995), JASA, 90, 773 — Bayes factors
- Lartillot & Philippe (2006), Systematic Biology, 55, 195 — thermodynamic integration for Bayesian evidence
- Lindley (1957), Biometrika, 44, 187 — Lindley's paradox
- Lithwick, Xie & Wu (2012), ApJ, 761, 122 — analytic TTV formulas, chopping
- Lueckmann et al. (2021), PMLR, 130, 1289 — sbi library
- Miller et al. (2021), NeurIPS — truncated marginal neural ratio estimation
- Nelson et al. (2020), AJ, 159, 73 — evidence computation comparison in exoplanet context
- Papamakarios & Murray (2016), NeurIPS — masked autoregressive flows for density estimation
- Robert (2014), arXiv:1407.5023 — Jeffreys-Lindley paradox discussion
- Skilling (2006), Bayesian Analysis, 1, 833 — nested sampling
- Speagle (2020), MNRAS, 493, 3132 — dynesty
- Swendsen & Wang (1986), Phys. Rev. Lett., 57, 2607 — replica exchange
- Verdinelli & Wasserman (1995), JASA, 90, 614 — Savage-Dickey ratio
- Vousden, Farr & Mandel (2016), MNRAS, 455, 1919 — adaptive parallel tempering
- Wisdom (1980), AJ, 85, 1122 — resonance overlap
- Winn et al. (2008), AJ, 136, 267 — correlated noise in transit photometry
- Xie et al. (2011), Systematic Biology, 60, 150 — Stepping Stone estimator