# Tasks for Opus
These are the hard problems in OpenAstro that need sustained deep reasoning — not research retrieval, but genuine derivation, algorithm design, or framework construction. Each one is a real blocker or a significant contribution opportunity.
## 1. Bayesian Model Order Selection for N-Body Systems
The problem: Given a transit timing variation (TTV) residual time series from a single transiting planet, determine how many perturbing bodies (n=1, 2, 3...) are needed to explain the data. This is a Bayesian model comparison problem: compute the evidence ratio (Bayes factor) between n-body models of different orders.
Why it's hard:

- The likelihood surface has multi-modal structure (orbital resonances create degenerate solutions)
- The prior volume scales exponentially with n (each body adds ~5 free parameters)
- Nested sampling (MultiNest/dynesty) can compute the log-evidence, but convergence is slow in high dimensions
- Need to define physically meaningful priors on orbital elements (uniform in period ratio? log-uniform in mass?)
What Opus should produce:

- A formal statistical framework: which evidence threshold (Bayes factor) justifies adding a perturber?
- Recommended priors for mass, semi-major axis, eccentricity, and inclination
- How to handle degeneracy between near-resonant configurations
- Whether a neural-network approach (amortised inference) is better than MCMC for this specific problem
Relevant tools: TTVFast (forward model), dynesty/MultiNest (nested sampling), Jeffreys-Lindley paradox considerations
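The Occam-penalty mechanics that the threshold question hinges on can be seen in a closed-form toy case: a Gaussian mean-offset model with a conjugate prior, where the evidence integral is analytic. This is deliberately not the n-body TTV likelihood; the function name, data values, and the sigma/tau choices are all illustrative assumptions.

```python
import math

def log_bayes_factor_10(y, sigma=0.3, tau=1.0):
    """Analytic log Bayes factor for a Gaussian toy problem:
    M0: y_i ~ N(0, sigma^2)                     (no perturber)
    M1: y_i ~ N(mu, sigma^2), mu ~ N(0, tau^2)  (one extra parameter)
    Only the sample mean matters. The log-sqrt term is the Occam
    penalty paid for the extra prior volume tau."""
    n = len(y)
    ybar = sum(y) / n
    s2n = sigma**2 / n                    # variance of the sample mean
    occam = 0.5 * math.log(s2n / (s2n + tau**2))
    fit = 0.5 * ybar**2 * (1.0 / s2n - 1.0 / (s2n + tau**2))
    return occam + fit

# Flat residuals: the Occam penalty wins, evidence favours "no perturber".
quiet = [0.1, -0.2, 0.05, 0.15, -0.1]
# Offset residuals: the fit term wins decisively.
signal = [1.0, 1.2, 0.9, 1.1, 0.8]

ln_bf_quiet = log_bayes_factor_10(quiet)    # ~ -2.0 (favours M0)
ln_bf_signal = log_bayes_factor_10(signal)  # ~ +25  (favours M1)
```

On the Jeffreys-style scales commonly quoted in astronomy, ln BF ≳ 5 is often labelled "strong"; whether that is the right bar for adding an entire perturber (with ~5 parameters, not 1) is exactly what the framework has to argue.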
## 2. Optimal Observer Deployment for Occultation Chord Coverage
The problem: Given a predicted stellar occultation with an uncertain shadow path (position uncertainty σ_RA, σ_Dec), a maximum number of available observers N, and a target body with unknown size R (we want to measure it), how do you place observers to maximise the probability of a useful chord set (≥3 chords spanning the body's diameter)?
Why it's hard:

- It is a stochastic optimisation problem: the shadow path is a probability distribution, not a fixed line
- The objective function (probability of getting useful chords) is non-convex in observer placement
- Observers have constraints: they need road access, clear-sky probability varies by location, and they have a maximum travel distance from home base
- The solution changes depending on whether you are trying to confirm the occultation (any chord), size it (2+ chords), or shape-model it (6+ chords)
What Opus should produce:

- A formal statement of the optimisation problem
- Whether a greedy placement algorithm (maximise marginal chord probability at each step) gives a good-enough approximation
- How to integrate weather probability into the placement decision
- The tradeoff between spreading observers wide (geometric coverage) vs. clustering them (backup redundancy)
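The greedy-placement question can be sketched in a 1D cross-track toy model: the shadow centre's offset is Gaussian, a station records a chord if it lies inside the shadow half-width, and sites are added one at a time by Monte-Carlo estimate of P(≥3 chords). Everything here is an illustrative assumption (real placement is 2D with weather and road constraints); the small expected-chords term is a made-up tie-break so greedy has a gradient before three stations exist.

```python
import random

def objective(stations, offsets, radius, k):
    """Monte-Carlo estimate of P(>= k chords), plus a tiny
    expected-chords term so greedy has a gradient while P(>= k)
    is still exactly zero."""
    hits, exp_chords = 0, 0
    for c in offsets:
        n = sum(1 for x in stations if abs(x - c) < radius)
        exp_chords += n
        hits += n >= k
    m = len(offsets)
    return hits / m + 1e-4 * exp_chords / m

def greedy_place(candidates, n_obs, sigma=1.0, radius=0.3, k=3, n_mc=2000):
    """Greedily add the candidate site that most increases the objective."""
    rng = random.Random(42)
    offsets = [rng.gauss(0.0, sigma) for _ in range(n_mc)]
    chosen = []
    for _ in range(n_obs):
        pool = [x for x in candidates if x not in chosen]
        best = max(pool, key=lambda x: objective(chosen + [x], offsets, radius, k))
        chosen.append(best)
    hits = sum(1 for c in offsets
               if sum(1 for x in chosen if abs(x - c) < radius) >= k)
    return chosen, hits / n_mc

# Candidate sites along the cross-track axis, in units of the path sigma.
candidates = [i * 0.1 for i in range(-20, 21)]
sites, p_success = greedy_place(candidates, n_obs=4)
```

Note the sketch already exposes the wide-vs-clustered tension the task asks about: greedy clusters sites near the most probable track, which maximises P(≥3 chords) but gives up coverage of the distribution's tails.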
## 3. MCMC State Synchronisation Protocol for BOINC Parallel Tempering
The problem: Parallel tempering MCMC requires periodic swap attempts between chains at different temperatures. In a distributed BOINC setting, each chain runs on a different volunteer machine. Swap attempts require both chains to pause, exchange state, and accept/reject the swap. How do you do this reliably when machines are unreliable, have different speeds, and may disappear mid-run?
Why it's hard:

- Standard PT implementations assume shared memory, or at least reliable message passing
- BOINC work units are designed to be independent: there is no built-in inter-unit communication
- A failed swap (because a machine went offline) must not corrupt the chain's Markov property
- You need to detect chain "stalling" (stuck in a local mode) and restart gracefully
- The convergence diagnostics (R-hat across chains, KS test between adjacent temperatures) must be computed server-side without requiring all chains to complete simultaneously
What Opus should produce:

- A protocol for asynchronous swap attempts that preserves detailed balance
- How long a chain can run without a successful swap before it is considered stuck
- Server-side checkpoint design (what state needs to be persisted per chain, per swap)
- A convergence criterion that works while some chains are still running (a sequential stopping rule)
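One candidate shape for the asynchronous protocol: chains check in with the server at checkpoints, and the server attempts a standard Metropolis swap only when both members of an adjacent-temperature pair have a current checkpoint. An offline machine then simply means no swap is attempted, leaving each chain a valid (if less well-mixed) sampler at its own temperature. A minimal sketch under those assumptions; the class and method names are made up:

```python
import math
import random

class SwapCoordinator:
    """Server-side sketch: chains post (beta, log-likelihood, state)
    asynchronously; swaps are only attempted when both checkpoints
    exist, so a vanished volunteer is a no-op, not a corruption."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.checkpoints = {}          # beta -> (logL, state)

    def check_in(self, beta, logL, state):
        self.checkpoints[beta] = (logL, state)

    def try_swap(self, beta_hot, beta_cold):
        """Metropolis swap between two temperatures. The acceptance
        ratio for exchanging states is
            min(1, exp((beta_cold - beta_hot) * (logL_hot - logL_cold)))
        Returns True if the states were exchanged."""
        if beta_hot not in self.checkpoints or beta_cold not in self.checkpoints:
            return False               # one machine is offline: skip, don't fail
        logL_h, s_h = self.checkpoints[beta_hot]
        logL_c, s_c = self.checkpoints[beta_cold]
        log_alpha = (beta_cold - beta_hot) * (logL_h - logL_c)
        if math.log(self.rng.random()) < log_alpha:
            self.checkpoints[beta_hot] = (logL_c, s_c)
            self.checkpoints[beta_cold] = (logL_h, s_h)
            return True
        return False
```

The open question the task names remains open here: delayed, opportunistic swaps of *checkpointed* states preserve detailed balance only if chains pause at the checkpoint they posted, which is the part that needs a careful proof against the BOINC execution model.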
## 4. Heterogeneous Stacking with Unknown PSFs and Partial Overlap
The problem: Given N images from telescopes with different apertures (D = 80mm to 400mm), different pixel scales (0.5"/px to 3"/px), different PSFs (Gaussian width 1.5" to 4"), and partial sky overlap (each image covers a different but overlapping patch), produce an optimal co-added image with correct flux calibration and well-characterised noise properties.
Why it's hard:

- PSF homogenisation (convolving to a common PSF) loses resolution from the best images
- The Drizzle algorithm assumes known PSFs and regular sampling; neither holds here
- Flux calibration requires a common photometric reference (Gaia), but stars in common may be saturated in some images and faint in others
- The noise in the co-add is correlated (after resampling) and depends on the coverage map in a non-trivial way
- The optimal weighting depends on the science: SNR weighting favours deep images, resolution weighting favours small-pixel images
What Opus should produce:

- The correct mathematical formulation for the co-addition (what does "optimal" mean for each science case?)
- How to estimate the effective PSF of the co-add as a function of position
- When PSF homogenisation is worth the resolution loss, and when it is not
- What noise model to propagate into the output FITS header so downstream photometry is calibrated correctly
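For the simplest science case (maximise SNR, ignoring the resampling-induced correlations the bullet above flags), "optimal" reduces to the inverse-variance weighted mean per output pixel, and the output variance and coverage holes fall out of the same sum. A sketch under that simplifying assumption; the function name and numbers are illustrative:

```python
def coadd_pixel(measurements):
    """Inverse-variance weighted co-add of one output pixel.
    measurements: list of (flux, variance) pairs from the images that
    cover this pixel after resampling to the common grid. Correlated
    noise is ignored here; the real pipeline must track it."""
    if not measurements:
        return None, None              # hole in the coverage map
    wsum = sum(1.0 / var for _, var in measurements)
    flux = sum(f / var for f, var in measurements) / wsum
    return flux, 1.0 / wsum            # variance of the weighted mean

# Two overlapping images: one deep (var 1.0), one shallow (var 4.0).
f, v = coadd_pixel([(10.0, 1.0), (14.0, 4.0)])
# f = (10/1 + 14/4) / (1 + 1/4) = 10.8 ; v = 1/1.25 = 0.8
```

The per-pixel variance map this produces is a natural candidate for the FITS-header noise model the last bullet asks for, but only once the resampling correlation question is settled.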
## 5. Alert Prioritisation as a Multi-Objective Optimisation
The problem: OpenAstro receives alerts from Rubin/ZTF/GCN/TNS. Each alert has a scientific priority (classification certainty, science case importance), an urgency (time to peak, observation window), and a network coverage score (how many nodes can see this target right now, at what airmass). The scheduler must rank them and assign nodes. Nodes have different capabilities. How do you formalise this as a solvable optimisation and make it computable in real time (<5 seconds per scheduling cycle)?
Why it's hard:

- It is multi-objective: science value, urgency, and coverage are incommensurable
- The solution changes as new alerts arrive mid-cycle
- Node assignments have to be globally consistent (two nodes assigned the same target is wasteful; one node watching a ToO while another watches a routine target is fine)
- The problem is NP-hard in general (it is a variant of weighted job scheduling with precedence constraints)
What Opus should produce:

- A formal problem statement (is this an ILP? a greedy approximation? a queue-based heuristic?)
- A scoring function that makes the tradeoffs explicit (how many "science points" is a GRB worth vs. a routine TTV transit?)
- How to handle multi-telescope assignment (is it better to observe one target with 5 nodes or 5 targets with 1 node each?)
- Real-time performance requirements, and whether the MVP greedy scheduler is good enough or needs replacement
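The MVP greedy baseline mentioned above can be stated in a few lines: scalarise the three objectives with explicit weights, sort once, and hand each alert its requested nodes from the shrinking free pool. That is O(A log A + A·N), comfortably inside a 5-second cycle for thousands of alerts. The field names, alert schema, and weights here are all illustrative assumptions, not the real scoring function:

```python
def score(alert, w_sci=1.0, w_urg=0.5, w_cov=0.3):
    """Explicit scalarisation of the three objectives; the weights
    are exactly the 'science points' tradeoff the task must justify."""
    return (w_sci * alert["science"] + w_urg * alert["urgency"]
            + w_cov * alert["coverage"])

def greedy_schedule(alerts, nodes):
    """One scheduling cycle: highest-scoring alert picks first, and a
    node is never assigned twice (global consistency by construction)."""
    free = set(nodes)
    plan = {}
    for alert in sorted(alerts, key=score, reverse=True):
        usable = [n for n in alert["visible_from"] if n in free]
        taken = usable[:alert["nodes_wanted"]]
        free -= set(taken)
        plan[alert["id"]] = taken
    return plan

alerts = [
    {"id": "TTV-1", "science": 3, "urgency": 2, "coverage": 5,
     "visible_from": ["n2", "n3"], "nodes_wanted": 1},
    {"id": "GRB-1", "science": 10, "urgency": 9, "coverage": 2,
     "visible_from": ["n1", "n2"], "nodes_wanted": 2},
]
plan = greedy_schedule(alerts, nodes=["n1", "n2", "n3"])
# GRB-1 outranks TTV-1 and takes n1+n2; TTV-1 falls back to n3.
```

What greedy cannot do is the global tradeoff in the third bullet (it never asks whether GRB-1's second node would buy more science on TTV-1), which is where an ILP formulation would earn its cost.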
## 6. MUSE First-Filter Statistical Threshold
The problem: OpenAstro's photometric monitoring generates light curves for thousands of sources. We want to flag objects that "deserve MUSE time" — i.e., objects whose photometric behaviour is anomalous in a way that is scientifically interesting and that spectroscopy would resolve. What is the decision criterion?
Why it's hard:

- "Anomalous" is not well-defined: 0.5 mag of variability is anomalous for a G dwarf, normal for a Mira
- The false positive rate matters: proposing bad targets wastes 8 m telescope time and destroys relationships with professional partners
- The criterion must be automated (a human cannot review every light curve) but interpretable (it needs to produce a human-readable justification for the proposal)
- Different science cases have different anomaly signatures (TDE: fast rise + power-law decline; changing-look AGN: slow monotonic change over months; nova: sudden jump + decline)
What Opus should produce:

- A classification taxonomy of "MUSE-worthy" photometric anomalies, with decision rules for each
- What false positive rate is acceptable (if we submit 10 targets and 7 are real, is that good enough for a VLT proposal?)
- Whether this is a supervised ML classification problem (train on known TDEs/AGN/novae) or a statistical outlier-detection problem
- A concrete data flow: what does OpenAstro actually send to the proposal system?
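One interpretable baseline for the automated criterion is a robust z-score (median/MAD, so the statistics are not poisoned by the anomaly itself) combined with a consecutive-point rule, which suppresses single-point glitches and yields a human-readable justification for free. The thresholds, function names, and toy light curves below are illustrative assumptions, not a validated decision rule:

```python
import statistics

def robust_z(mags):
    """Per-point robust z-scores: median-centred, MAD-scaled
    (1.4826 converts MAD to a Gaussian-equivalent sigma)."""
    med = statistics.median(mags)
    mad = statistics.median(abs(m - med) for m in mags)
    sigma = 1.4826 * mad if mad > 0 else float("inf")
    return [(m - med) / sigma for m in mags]

def muse_worthy(mags, z_thresh=5.0, n_consec=3):
    """Flag only if n_consec consecutive points deviate beyond
    z_thresh, and return a human-readable reason either way."""
    run = 0
    for i, zi in enumerate(robust_z(mags)):
        run = run + 1 if abs(zi) > z_thresh else 0
        if run >= n_consec:
            start = i - n_consec + 1
            return True, f"{n_consec} consecutive points at |z|>{z_thresh} from index {start}"
    return False, "no sustained deviation"

quiet = [12.00, 12.01, 11.99, 12.02, 12.00, 11.98, 12.01, 12.00]
flare = quiet + [11.2, 11.1, 11.15]     # 0.8 mag brightening, 3 epochs
```

A fixed z-threshold is class-blind, so it cannot by itself separate a TDE rise from a Mira cycle; it is a first filter, with the per-class signature rules (rise time, decline shape) applied downstream.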
## 7. Economic Sustainability Model for the Owned Hardware Stage
The problem: Stage 3 involves OpenAstro owning a network of low-cost telescope nodes. Each node costs ~$2,000–5,000 to build and ~$500/year to operate (hosting, maintenance, replacement). To be self-sustaining, the network must generate value that someone will pay for. What are the viable revenue models and at what network scale do they become viable?
Why it's hard:

- Science value is real but not easily monetised (papers don't pay)
- Data licensing is possible, but who buys it? (insurance companies for weather data? satellite operators for debris tracking?)
- The "commercial monitoring" angle (watching satellites, debris, and near-Earth objects for paying customers) has regulatory and contractual complexity
- Grant funding (NSF, ESA, Horizon Europe) comes with specific deliverable requirements
- The timing matters: you need revenue before you can afford the nodes, but you need nodes before you have revenue
What Opus should produce:

- A realistic revenue model with 3 scenarios (optimistic, realistic, pessimistic) and a break-even analysis
- Which revenue stream has the lowest barrier to entry and the highest near-term probability
- Whether a non-profit vs. for-profit structure changes the answer materially
- Specific grant programmes to target and the deliverables they require
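The break-even part is simple arithmetic once fixed annual costs are separated from per-node margin; the hard part is the revenue number. The sketch below reuses the document's cost estimates ($2,000–5,000 build, $500/year operation), but the revenue figures, fixed costs, and 5-year amortisation are placeholders invented for illustration:

```python
import math

def break_even_scale(fixed_costs, capex, opex, revenue, amortise_years=5):
    """Nodes needed so per-node annual margin covers fixed annual
    costs. Margin = revenue - opex - capex/amortise_years.
    Returns None if the margin is non-positive (never breaks even)."""
    margin = revenue - opex - capex / amortise_years
    if margin <= 0:
        return None
    return math.ceil(fixed_costs / margin)

scenarios = {   # all revenue/fixed-cost figures are illustrative placeholders
    "optimistic":  dict(fixed_costs=20_000, capex=2_000, opex=500, revenue=1_500),
    "realistic":   dict(fixed_costs=30_000, capex=3_000, opex=500, revenue=1_200),
    "pessimistic": dict(fixed_costs=40_000, capex=5_000, opex=500, revenue=800),
}
results = {name: break_even_scale(**p) for name, p in scenarios.items()}
# optimistic: 34 nodes; realistic: 300 nodes; pessimistic: never
```

Even this toy version makes the chicken-and-egg bullet quantitative: the "realistic" scenario needs hundreds of nodes before fixed costs are covered, which is capital the revenue is supposed to provide.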
## 8. Timing Precision Propagation Through the Full Pipeline
The problem: An observation starts with a GPS timestamp (±1ms) on the raw FITS header. By the time it reaches the science output (a transit mid-time in BJD_TDB), it has passed through: FITS header write, plate-solve, WCS fit, time standard conversion (UTC→BJD_TDB), light travel time correction (barycentric), and systematic offsets from the exposure mid-point calculation. What is the total timing uncertainty budget end-to-end, and which steps dominate?
Why it's hard:

- The BJD_TDB conversion has a ~8-minute seasonal amplitude (Earth's orbital light travel time); done wrong, it dominates everything
- The "exposure mid-point" assumes a linear ramp of photons, which is wrong for rapid transients
- Systematic offsets in ASCOM Alpaca timestamping are not well documented
- The answer differs for occultations (need <0.1 s), TTVs (need <30 s), and transient follow-up (need <60 s)
What Opus should produce:

- A full error budget table: each pipeline step, its timing contribution, and its uncertainty
- Which steps require GPS-PPS hardware and which are fine with NTP
- The correct code recipe for BJD_TDB conversion (using astropy.time plus barycentric correction) that guarantees no systematic offsets
- How to validate the pipeline's timing output end-to-end against a known eclipse with a published mid-time
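Since independent timing errors add in quadrature, the budget table reduces to one line of arithmetic once the per-step numbers exist, and the largest term dominates by construction. The per-step values below are illustrative placeholders (the ASCOM Alpaca entry in particular is exactly the under-documented number this task must pin down):

```python
import math

# Illustrative 1-sigma timing uncertainty per pipeline step, in seconds.
budget = {
    "GPS timestamp":              0.001,
    "shutter/exposure mid-point": 0.050,
    "ASCOM Alpaca timestamping":  0.100,   # placeholder: under-documented
    "UTC -> TT conversion":       1e-6,
    "barycentric correction":     0.001,
}

# Independent terms add in quadrature; the biggest entry dominates.
total = math.sqrt(sum(v**2 for v in budget.values()))
dominant = max(budget, key=budget.get)
# With these placeholders, total ~0.112 s: fine for TTVs (<30 s),
# already over budget for occultations (<0.1 s).
```

The comparison against the three per-science-case requirements in the bullets above is then mechanical: each row either fits inside the budget or names the step that needs GPS-PPS hardware rather than NTP.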