Statistical Framework for Flagging MUSE-Worthy Photometric Anomalies¶
A rigorous specification for automatically classifying OpenAstro light curves as candidates for VLT/MUSE spectroscopic follow-up. This document defines the decision problem, anomaly taxonomy, baseline characterisation, detection statistics, classification methodology, proposal output format, and expected alert rates.
Relationship to existing vault: This framework operationalises the "OpenAstro as First Filter Stage for MUSE" concept described in MUSE Candidate production.md. It consumes data from the QC pipeline specified in Quality Control Pipeline.md (only observations with quality_score >= 70 and no hard-reject flags enter this classifier). It draws on science cases from High Impact Niche Cases.md (TDEs, CLAGN, SN early rises) and Recurrent Novae and CV Monitoring.md (nova eruptions).
1. The Decision Problem¶
1.1 Formal Statement¶
Given a light curve $L = {(t_i, m_i, \sigma_i)}_{i=1}^{N}$ for a monitored source, where $t_i$ is the epoch (MJD), $m_i$ is the calibrated magnitude, and $\sigma_i$ is the per-observation photometric uncertainty, classify:
$$\text{MUSE_worthy} \in {0, 1}$$
The classifier runs automatically on all monitored sources after each observation epoch. When MUSE_worthy = 1, the system emits a structured alert with a human-readable justification suitable for inclusion in a VLT/MUSE Director's Discretionary Time (DDT) or regular proposal.
1.2 Operational Context¶
- Throughput: Potentially thousands of sources per night across the OpenAstro network. The classifier must run in < 1 second per source per epoch (incremental update, not full re-analysis).
- Output: Binary flag + anomaly class label + confidence score [0, 1] + auto-generated justification text.
- Human review: Every MUSE_worthy = 1 flag is reviewed by the OpenAstro science team (expected: 1-3 people) before any external proposal is submitted. The classifier reduces the review pool from thousands to single digits per night.
1.3 Loss Function and Precision/Recall Tradeoff¶
The two error types carry asymmetric costs:
False Positive (FP): Source flagged MUSE_worthy, but is not scientifically interesting for spectroscopy. - Cost: Wasted VLT/MUSE time (8m telescope, ~EUR 50,000 per night of operations). Damaged credibility with ESO and professional collaborators. After 2-3 false positives, OpenAstro's proposals will be deprioritised. Reputational cost is severe and compounding.
False Negative (FN): Source is genuinely MUSE-worthy, but the classifier misses it. - Cost: Missed discovery. However, the source will likely be caught by other surveys (ZTF, ATLAS, ASAS-SN) or by OpenAstro at a later epoch when the anomaly has grown. Most MUSE-worthy anomalies evolve over days to weeks; a delay of 1-2 epochs (nights) is tolerable for most classes. The exception is SN shock breakout, which is handled by a separate fast-alert pathway (see Section 2.E).
Conclusion: False positives are worse. A missed detection causes a delay; a false positive causes lasting institutional damage.
Target operating point: - Precision >= 0.90 (no more than 10% of flagged sources are false positives) - Recall >= 0.70 (catch at least 70% of genuine MUSE-worthy anomalies within 3 epochs of onset) - This translates to a false positive rate (FPR) constraint among all monitored sources. If 10,000 sources are monitored and 20 per year are genuinely MUSE-worthy, a 10% FP rate on 20 true alerts means ~2 false proposals per year. See Section 7 for full calculation.
2. Taxonomy of MUSE-Worthy Anomaly Classes¶
For each class: photometric signature, spectroscopic question, formal detection rule, and false positive risk.
Notation¶
Throughout this section: - $\mu_b$ = baseline mean magnitude - $\sigma_b$ = baseline intrinsic variability (photon noise subtracted; see Section 3) - $\tau_b$ = baseline characteristic variability timescale - $\Delta m = m_{\text{peak}} - \mu_b$ (negative = brightening) - $\Delta t$ = time interval over which the change occurred - $r_{\text{offset}}$ = angular offset of transient from host galaxy nucleus - $r_{\text{PSF}}$ = effective PSF FWHM of the detection image - $\sigma_{\text{obs}}$ = photometric uncertainty of the anomalous measurement
A. Tidal Disruption Event (TDE)¶
Photometric signature: Nuclear transient (positional offset < 0.5 arcsec from galaxy centre), monotonic rise over weeks to months (rise time $t_{\text{rise}}$ = 20-80 days), peak absolute magnitude $M_{\text{peak}} \approx -18$ to $-21$, followed by power-law decline $\propto t^{-5/3}$.
Spectroscopic question MUSE answers: Integral field spectroscopy resolves the host galaxy nucleus from the transient. Broad H-alpha and He II 4686 emission confirm TDE nature. Line widths give velocity dispersion of the debris stream, constraining the black hole mass via $M_{\text{BH}} \propto \sigma_v^{4.38}$. The spatially-resolved host spectrum gives the stellar population age, constraining the rate enhancement in post-starburst ("E+A") galaxies.
Detection criterion: A source is flagged as TDE candidate if ALL of the following hold:
- Amplitude: $|\Delta m| > 1.5$ mag brightening from baseline, measured over $\Delta t < 60$ days
- Position: $r_{\text{offset}} < r_{\text{PSF}} / 2$ (consistent with galaxy nucleus; requires cross-match with galaxy catalogue — NED or SDSS photometric catalog)
- No prior variability: $\sigma_b < 0.05$ mag over the baseline period (source was quiescent; this excludes known AGN with ongoing variability)
- Significance: Detection significance $S > 5$ (see Section 4)
- Rise consistency: Light curve shows monotonic brightening (no more than 1 reversal of > 0.3 mag) over the rise phase
Formal decision rule:
TDE_flag = (|Delta_m| > 1.5) AND (Delta_t < 60 days) AND
(r_offset < r_PSF/2) AND (sigma_b < 0.05 mag) AND
(S > 5) AND (monotonic_rise = True)
False positive risk: - AGN flare: The primary contaminant. Distinguished by: TDE host is quiescent (no historical variability); AGN host has detectable variability in baseline. AGN flares are typically redder; TDE peaks are bluer (g - r < 0.0 at peak). Mitigation: require $\sigma_b < 0.05$ mag in baseline. - Superluminous supernova (SLSN): Can reach similar luminosity and rise time. Distinguished by: SLSN typically offset from nucleus. Mitigation: the $r_{\text{offset}} < r_{\text{PSF}}/2$ criterion rejects most SLSNe. - Expected FP rate: With the above criteria, estimated 1-2 AGN contaminants per 10 TDE candidates (based on ZTF TDE sample purity; van Velzen et al. 2021).
B. Changing-Look AGN (CLAGN)¶
Photometric signature: Monotonic flux change > 0.5 mag over months to years in a source already identified as an AGN (i.e., has prior variability in the baseline). Often accompanied by colour change (bluer-when-brighter for type transitions involving increased accretion; redder-when-brighter is rare and more diagnostic).
Spectroscopic question MUSE answers: IFS spectroscopy before and after the photometric transition measures whether broad emission lines (H-beta, Mg II 2800, H-alpha) have appeared or disappeared. The line profile change constrains whether the transition is caused by a change in accretion rate (broad line region responds to ionising flux) or variable obscuration (dusty torus geometry change). MUSE's spatial resolution disentangles host galaxy contamination from the AGN spectrum, critical for low-luminosity AGN where the host dominates broadband photometry.
Detection criterion: A source is flagged as CLAGN candidate if ALL of the following hold:
- Source type: Identified as AGN in catalogue cross-match (SDSS quasar catalogue, Veron-Cetty & Veron, or MILLIQUAS) OR has $\sigma_b > 0.05$ mag (showing AGN-like variability in baseline)
- Amplitude: $|\Delta m| > 0.5$ mag change from baseline mean, measured with S/N > 5 per epoch
- Timescale: Change is sustained over $\Delta t > 30$ days (not a single-epoch spike)
- Trend consistency: At least 3 consecutive observations showing monotonic trend in the same direction, OR a Bayesian blocks change-point detected with confidence > 95%
- Significance of trend: CUSUM statistic exceeds threshold (see Section 4.2)
Formal decision rule:
CLAGN_flag = (AGN_morphology OR sigma_b > 0.05) AND
(|Delta_m| > 0.5) AND (Delta_t > 30 days) AND
(monotonic_epochs >= 3 OR bayesian_block_change > 0.95) AND
(CUSUM > h_CLAGN)
where $h_{\text{CLAGN}}$ is the CUSUM threshold calibrated in Section 4.2.
False positive risk: - Microlensing of background quasar: Gravitational microlensing produces smooth, achromatic brightening. Distinguished by: lensing is achromatic (same amplitude in all bands); CLAGN shows colour change. Lensing light curve is symmetric about peak; CLAGN transition is typically one-directional. Mitigation: require colour change > 0.1 mag in (g - r) or (B - V) concurrent with the flux change. - Secular AGN variability: Normal AGN stochastic variability can produce 0.5 mag excursions on year timescales. Distinguished by: structure function analysis (see Section 3) — if the observed change is within the expected structure function amplitude for the source's variability class, it is not anomalous. Mitigation: require the CUSUM statistic to exceed the threshold calibrated on the source's own baseline variability, not a generic threshold. - Expected FP rate: With colour-change requirement, estimated 5% FP rate (1 in 20 flagged CLAGN is normal AGN variability).
C. Nova Eruption (Extragalactic)¶
Photometric signature: Rapid rise of > 2 mag in 1-5 days, from an undetectable or faint baseline. Source is spatially resolved within or projected onto a nearby galaxy (distance < 30 Mpc for OpenAstro depth). Decline rate varies: "fast novae" decline by 2 mag in < 10 days ($t_2 < 10$ d); "slow novae" take months.
Spectroscopic question MUSE answers: Nebular-phase spectrum (taken weeks after peak) reveals ejecta composition: CNO abundances for CO white dwarfs, neon enrichment for ONe white dwarfs. Ejection velocity from line widths constrains the white dwarf mass. For recurrent nova candidates, abundance patterns distinguish them from classical novae — recurrent novae have lower ejected mass and higher velocities. MUSE's IFS capability maps the ejecta geometry if the nova is in a sufficiently nearby galaxy (M31: 780 kpc; individual novae spatially resolved at ~1 arcsec = 3.8 pc).
Detection criterion: A source is flagged as extragalactic nova candidate if ALL of the following hold:
- Amplitude: $|\Delta m| > 2.0$ mag brightening in $\Delta t < 5$ days
- Host association: Source lies within the optical extent (D25 isophote) of a catalogued galaxy at distance < 30 Mpc (from NED or HyperLEDA cross-match)
- Not Galactic: Galactic latitude $|b| > 10\degree$ OR explicit galaxy association confirmed
- Point source: No resolved spatial extent (consistent with PSF)
- No prior detection: Source not detected in baseline images at > 3-sigma above sky background, OR $|\Delta m| > 6$ mag from any prior detection
Formal decision rule:
Nova_flag = (|Delta_m| > 2.0) AND (Delta_t < 5 days) AND
(host_galaxy_distance < 30 Mpc) AND
(within_D25 = True) AND (point_source = True)
False positive risk:
- Galactic CV outburst projected onto galaxy: A foreground dwarf nova in outburst can mimic a nova in a background galaxy. Distinguished by: proper motion (Gaia cross-match — foreground CVs have detectable proper motion; extragalactic novae have zero). Parallax if available. Mitigation: cross-match with Gaia DR3; any source with parallax > 0.5 mas or proper motion > 2 mas/yr is Galactic.
- Asteroid/satellite streak: Can produce a spurious bright source at a single epoch. Mitigation: require detection at > 1 epoch separated by > 1 hour, OR confirmation from a second OpenAstro node. The QC pipeline's existing multi-site confirmation requirement (see Quality Control Pipeline.md) handles this.
- Expected FP rate: With Gaia cross-match, < 5% FP rate.
D. Luminous Blue Variable (LBV) Giant Eruption¶
Photometric signature: Slow brightening of > 1 mag over months to years. May show irregular fluctuations superimposed on the overall trend. Colour evolution: LBV eruptions produce an apparent temperature decline (the star moves to the red in the HR diagram during eruption — the "pseudo-photosphere" expands and cools). Source is associated with a massive star or a star-forming region in a nearby galaxy.
Spectroscopic question MUSE answers: IFS spectroscopy resolves the LBV from its surrounding HII region and stellar association. The spectrum at eruption peak shows a cool supergiant-like continuum with P Cygni profiles in H-alpha and Fe II, distinguishing it from an SN impostor (which it may be reclassified as later). The critical question: is this an S Doradus-type instability (the star survives) or a genuine precursor to core collapse (the star is about to die)? The answer comes from the spectrum — SN precursors show higher velocities and asymmetric line profiles indicating mass ejection, not expansion.
Detection criterion: A source is flagged as LBV eruption candidate if ALL of the following hold:
- Amplitude: $|\Delta m| > 1.0$ mag brightening over $\Delta t > 60$ days
- Slow evolution: No component of the light curve shows variability faster than 0.3 mag per day (excludes novae, SNe)
- Source identification: Cross-match with massive star catalogues (Bonanos et al. 2009 for M31/M33; Massey et al. 2006 for Local Group) OR source lies in a resolved star-forming region of a galaxy at d < 10 Mpc
- Colour: If multi-band data available, source has become redder during brightening: $\Delta(B-V) > 0.2$ mag
- No rapid variability: Standard deviation of detrended residuals (after subtracting linear or polynomial trend) is < 0.15 mag
Formal decision rule:
LBV_flag = (|Delta_m| > 1.0) AND (Delta_t > 60 days) AND
(max_daily_rate < 0.3 mag/day) AND
(massive_star_match OR star_forming_region) AND
(detrended_sigma < 0.15 mag)
False positive risk: - Mira variable or long-period variable (LPV): These show large-amplitude (> 2 mag), slow variability. Distinguished by: Miras are periodic (period 100-1000 days); LBV eruptions are aperiodic. Mitigation: compute Lomb-Scargle periodogram on the baseline; if a significant period is found (FAP < 0.01), classify as periodic variable, not LBV. - Dust clearing event: Reduction in foreground dust can cause apparent brightening. Distinguished by: achromatic (same amplitude in all bands); LBV eruption is chromatic. Mitigation: require colour change. - Expected FP rate: With periodicity check and colour requirement, estimated < 10% FP rate. Note: LBV eruptions are extremely rare (< 1 per year in Local Group); most flags in this class will require careful human review.
E. Supernova Shock Breakout / Early SN¶
Photometric signature: Extremely fast rise: > 2 mag brightening in < 48 hours. Point source in or near a galaxy. For Type Ia, the rise to peak takes ~18 days after first light; for Type II, the shock breakout is a brief UV/optical flash (hours) followed by a plateau. The key diagnostic is the rate of rise: $dm/dt > 1$ mag/day sustained over > 6 hours.
Spectroscopic question MUSE answers: Early-time SN spectroscopy (within 1-3 days of explosion) reveals the outermost ejecta composition, constraining the progenitor. For Type Ia: detection of unburned carbon or high-velocity calcium constrains the explosion mechanism (deflagration vs. detonation). For Type II: flash spectroscopy of the circumstellar medium (narrow emission lines from pre-SN mass loss) constrains the progenitor's final years. MUSE's IFS maps the SN environment — the local metallicity and star formation rate of the host at the SN position — which constrains progenitor channels.
Detection criterion: A source is flagged as early SN candidate if ALL of the following hold:
- Amplitude: $|\Delta m| > 2.0$ mag brightening in $\Delta t < 48$ hours
- Rate: $|dm/dt| > 1.0$ mag/day sustained over > 3 consecutive observations spanning > 6 hours
- Host association: Source within 3 $\times$ $r_{\text{eff}}$ (effective radius) of a catalogued galaxy, OR source is in a field with no prior catalogued counterpart (new point source)
- Point source: Consistent with PSF (unresolved)
- Multi-epoch: Detected in at least 2 observations from the same site, OR confirmed by a second OpenAstro site
Formal decision rule:
SN_flag = (|Delta_m| > 2.0) AND (Delta_t < 48 hours) AND
(|dm/dt| > 1.0 mag/day over > 6 hours) AND
(galaxy_association OR new_source) AND
(point_source = True) AND (multi_detection = True)
CRITICAL: This class triggers IMMEDIATE alert, not a MUSE proposal. SN shock breakout is a minutes-to-hours phenomenon. The correct response is: 1. Fire alert to existing SN Target-of-Opportunity programmes (ePESSTO+, PESSTO, SNIFS, relevant PI programmes) 2. Submit to Transient Name Server (TNS) within 30 minutes 3. Simultaneously submit to OpenAstro's own alert stream for dense photometric follow-up by all available nodes 4. A MUSE DDT proposal may follow if the SN is at a scientifically interesting redshift (z < 0.05) and in a host where IFS adds value
The MUSE proposal pathway is secondary. Speed of external alert is primary.
False positive risk: - Asteroid: Fast-moving solar system object producing a spurious transient. Distinguished by: moves between consecutive exposures; not present in second observation at same position. Mitigation: multi-epoch requirement (detection at same sky position > 6 hours apart). - Satellite glint / cosmic ray: Single-frame artefact. Mitigation: multi-detection requirement. - CV outburst: Galactic cataclysmics can rise > 2 mag in hours. Distinguished by: Gaia cross-match (parallax/proper motion identifies Galactic source); CV has prior outburst history in AAVSO. Mitigation: Gaia cross-match; AAVSO/VSX cross-match. - Expected FP rate: With multi-detection and Gaia cross-match, < 5% FP rate.
3. Baseline Variability Characterisation¶
3.1 Defining the Baseline¶
The baseline is the set of observations before any significant deviation. Operationally:
- $N_{\text{baseline}}$: The first $N_{\text{min}} = 20$ observations with no individual measurement deviating by more than $3\sigma_{\text{obs}}$ from the running mean. If 20 good observations are not available, the source remains in "baseline accumulation" mode and cannot be flagged as anomalous (except by the SN fast-alert pathway, which uses external catalogue cross-match instead of a self-baseline).
- Baseline window: Observations within the first 90 days of monitoring, unless the source is still accumulating (new to the network). The baseline is recalculated every 30 days using a rolling window that excludes any flagged anomaly epochs.
- Baseline freeze: Once an anomaly is flagged, the pre-anomaly baseline is frozen and stored. It is not updated until the anomaly is resolved (source returns to quiescence or is classified).
3.2 Baseline Statistics¶
Mean magnitude ($\mu_b$): $$\mu_b = \frac{\sum_{i \in \text{baseline}} w_i \, m_i}{\sum_{i \in \text{baseline}} w_i}, \quad w_i = \sigma_i^{-2}$$
Inverse-variance weighted mean.
Observed scatter ($\sigma_{\text{obs,b}}$): $$\sigma_{\text{obs,b}}^2 = \frac{\sum_{i} w_i \, (m_i - \mu_b)^2}{\sum_{i} w_i} \cdot \frac{N}{N-1}$$
Intrinsic variability ($\sigma_b$): Subtract the photometric noise contribution: $$\sigma_b^2 = \max\left(0, \; \sigma_{\text{obs,b}}^2 - \overline{\sigma_i^2}\right)$$
where $\overline{\sigma_i^2}$ is the mean squared photometric uncertainty over baseline observations. If $\sigma_b^2 = 0$, the source shows no intrinsic variability above the noise floor.
Characteristic variability timescale ($\tau_b$) from structure function: The first-order structure function: $$V(\tau) = \text{median}\left[\left(m(t+\tau) - m(t)\right)^2\right] - 2\overline{\sigma_i^2}$$
Compute $V(\tau)$ for logarithmically spaced time lags from 1 day to 365 days. The characteristic timescale $\tau_b$ is defined as the lag at which $V(\tau)$ first exceeds $\sigma_b^2$ — i.e., the timescale at which variations become comparable to the total intrinsic variability. If $V(\tau)$ is flat (white noise), set $\tau_b = \infty$ (source is stochastically variable with no preferred timescale).
3.3 Quiescence Thresholds by Source Type¶
A source is "quiescent" if $\sigma_b < \sigma_{\text{thresh}}$:
| Source type | $\sigma_{\text{thresh}}$ (mag) | Rationale |
|---|---|---|
| G/K dwarf star | 0.005 | Stable to < 5 mmag except during flares; any intrinsic variability above this level indicates activity cycle or contamination |
| AGN (known) | 0.15 | AGN are intrinsically variable; "quiescent" means within normal stochastic variability. Threshold from SDSS Stripe 82 AGN structure functions (MacLeod et al. 2010) |
| Galaxy (extended, no known AGN) | 0.02 | Galaxies should not vary; any variability indicates an unresolved transient, AGN, or photometric systematics |
| Asteroid | N/A | Asteroids have periodic rotational variability; do not apply quiescence test. Use period-subtracted residuals instead. |
| M dwarf | 0.02 | M dwarfs show flare activity; quiescent threshold accounts for low-level stochastic flaring (see M-Dwarf Flare Monitoring doc) |
Note on AGN baseline: For AGN, the "baseline" is their normal stochastic variability, not a flat constant. The anomaly detection for CLAGN must detect a departure from the normal variability pattern, not merely large amplitude. This is why the CUSUM/change-point approach (Section 4.2) is necessary for CLAGN, rather than a simple magnitude threshold.
4. Detection Statistics¶
4.1 Point-in-Time Anomaly Significance¶
For a single anomalous measurement at epoch $t_k$, the detection significance is:
$$S = \frac{|m_k - \mu_b|}{\sqrt{\sigma_k^2 + \sigma_b^2}}$$
where: - $m_k$ is the observed magnitude at the anomalous epoch - $\sigma_k$ is the photometric uncertainty at that epoch - $\mu_b$ and $\sigma_b$ are the baseline mean and intrinsic variability
This statistic accounts for both measurement noise and the source's own variability. A genuinely anomalous excursion must exceed both.
Setting $S_{\text{min}}$:
The requirement is a false positive rate < 10% among flagged sources. This is not the same as a false alarm probability per observation. The relationship:
For a Gaussian distribution (assumption: baseline residuals after subtracting the mean are approximately Gaussian — validated source-by-source via a Shapiro-Wilk test on the baseline residuals):
| $S_{\text{min}}$ | One-sided tail probability | Expected false triggers per 10,000 sources per epoch |
|---|---|---|
| 3.0 | $1.35 \times 10^{-3}$ | 13.5 |
| 4.0 | $3.17 \times 10^{-5}$ | 0.32 |
| 5.0 | $2.87 \times 10^{-7}$ | 0.003 |
With nightly monitoring of 10,000 sources over 365 nights: - At $S_{\text{min}} = 3$: 4,900 false triggers per year (unacceptable) - At $S_{\text{min}} = 4$: 115 false triggers per year (borderline; too many for MUSE proposals) - At $S_{\text{min}} = 5$: ~1 false trigger per year (acceptable)
Adopted: $S_{\text{min}} = 5$ for a preliminary flag.
However, $S > 5$ alone is necessary but not sufficient. It must be combined with the class-specific criteria from Section 2 (rise time, positional offset, host association, etc.) to become a MUSE_worthy = 1 flag. The $S > 5$ threshold serves as the first gate to eliminate non-anomalous sources before the more expensive class-specific checks are applied.
Non-Gaussian tails: Real photometric data has heavier tails than Gaussian (systematics, weather, scattered light). To be robust: - Validate Gaussianity of each source's baseline residuals (Shapiro-Wilk, p > 0.01) - For sources with non-Gaussian baselines ($p < 0.01$), use the empirical cumulative distribution of the baseline residuals to compute the significance. Set $S_{\text{min}}$ such that the probability of exceeding it under the empirical distribution is < $10^{-4}$ per epoch.
4.2 Time-Series Change-Point Detection (CUSUM)¶
For anomalies that manifest as gradual trends (CLAGN, LBV), a single-epoch $S$ statistic is insufficient. We need a statistic sensitive to a sustained shift in the mean.
Cumulative Sum (CUSUM) statistic:
Define the normalised residual at each epoch: $$z_i = \frac{m_i - \mu_b}{\sqrt{\sigma_i^2 + \sigma_b^2}}$$
The upper CUSUM: $$C_i^+ = \max(0, \; C_{i-1}^+ + z_i - k)$$
The lower CUSUM: $$C_i^- = \max(0, \; C_{i-1}^- - z_i - k)$$
Initialised at $C_0^+ = C_0^- = 0$. The parameter $k$ is the "allowance" or "slack" — the amount of shift you want to detect. For CLAGN detection, set $k = 0.5$ (detect shifts of ~0.5$\sigma$ per epoch, corresponding to a sustained drift).
Detection threshold $h$: The CUSUM signals a change-point when $C_i^+ > h$ or $C_i^- > h$.
To set $h$ for a target average run length (ARL) — the expected number of observations before a false alarm under no-change conditions:
For $k = 0.5$ and ARL$_0$ = 500 (approximately 500 observations = ~1.5 years of nightly monitoring before a false alarm):
$$h \approx 4.8$$
(From Hawkins & Olwell 1998, "Cumulative Sum Charts and Charting for Quality Improvement," Table A.2 for $k = 0.5$.)
Adopted: $h = 5.0$ for CLAGN and LBV trend detection. This gives ARL$_0 \approx 700$, or roughly one false trend alarm per 2 years of nightly monitoring per source. For 1,000 AGN monitored, this yields ~1.4 false trend alarms per year across the AGN sample. This is manageable.
Bayesian blocks (supplementary):
As a cross-check, run the Bayesian blocks algorithm (Scargle et al. 2013, ApJ 764, 167) on the light curve. Bayesian blocks partition the time series into segments of constant flux. A new block appearing in the most recent data, with a flux level differing by > $3\sigma_b$ from the previous block, is flagged. The Bayesian blocks false positive rate is controlled by the prior on the number of change-points (the ncp_prior parameter in astropy.stats.bayesian_blocks); set ncp_prior = 6.0 corresponding to ~1% false positive rate per source per year.
A source is flagged for trend-based anomaly only if both the CUSUM and the Bayesian blocks independently indicate a change-point in the same time interval.
5. ML vs. Statistical Approach¶
5.1 The Three Options¶
(a) Hand-crafted features + Random Forest / Decision Tree: Extract features from each light curve (amplitude, rise time, decline rate, colour change, positional offset, baseline variability, structure function parameters) and classify with a supervised ensemble model. Features are interpretable; the model can output feature importances for human review.
(b) Neural network on raw light curve (LSTM / Transformer): Ingest the raw $(t_i, m_i, \sigma_i)$ sequence and classify end-to-end. Can in principle learn features that humans would not think to extract. Requires large labelled training set. Outputs are not inherently interpretable.
(c) Bayesian template fitting: Fit parametric templates (TDE: power-law rise + $t^{-5/3}$ decline; nova: exponential rise + exponential/power-law decline; CLAGN: linear or sigmoid trend) to the light curve via MCMC or nested sampling. Compare the Bayesian evidence for each template vs. a "no anomaly" null model. The template with the highest evidence ratio identifies the class.
5.2 Assessment for OpenAstro's Context¶
Constraints: 1. Labelled training data is scarce at first. OpenAstro will start with zero classified light curves of its own. External catalogues exist (see 5.3) but are from different instruments with different systematics. 2. Interpretability is required. Every MUSE proposal must include a justification. "The neural network said so" is not acceptable. The classification reasoning must be explainable in terms of physical observables (rise time, amplitude, colour, position). 3. False positive rate must be low. With ~20 real events per year (Section 7), even a 5% FP rate on the full source pool would swamp the real events with noise. The classifier must be conservative. 4. Computational budget is modest. A VPS server processing light curves for 10,000 sources nightly.
Recommendation: Option (c) Bayesian template fitting, supplemented by (a) for initial screening.
Justification:
- Template fitting is physically motivated: the templates encode the known physics of each anomaly class. This provides automatic interpretability — the fit parameters (rise time, decline index, amplitude) are directly physically meaningful and can be quoted in proposals.
- The Bayesian evidence ratio directly quantifies "how much more likely is this light curve under a TDE model than under a null (constant + noise) model?" This is exactly the quantity needed for classification.
- Template fitting does not require labelled training data from the instrument. The templates come from theoretical models and from fits to the best-characterised examples in the literature. Transfer learning from ZTF/ASAS-SN light curves calibrates the template parameters but does not require instrument-matched training data.
- The computational cost of template fitting is higher per source than a random forest (~1-10 seconds per source for a 5-template comparison via nested sampling with dynesty or ultranest). But this is applied only to the ~1% of sources that pass the fast statistical filter (Section 5.4), so the total nightly computation is ~100-1000 sources x 10 seconds = 15 minutes. Feasible.
Why not neural network: Insufficient training data at launch. Non-interpretable outputs. Risk of overfitting to ZTF systematics that do not transfer to OpenAstro data. Can be revisited in 2-3 years once OpenAstro has accumulated its own classified light curve database.
Why not purely hand-crafted features + RF: This is the fast-filter stage (see 5.4). It works well for initial screening but does not provide the parametric fit that proposals need. It is used as stage 1, not the final classifier.
5.3 Available Training / Calibration Datasets¶
| Dataset | Source | Size | Use |
|---|---|---|---|
| ZTF TDE catalogue (Hammerstein et al. 2023; van Velzen et al. 2021; Yao et al. 2023) | ZTF public | ~100 TDEs with multi-band light curves | Template parameter calibration (rise time, peak luminosity, decline index distributions) |
| ASAS-SN nova catalogue (Kawash et al. 2021; Shafter 2017) | ASAS-SN | ~400 Galactic novae, ~50 extragalactic | Nova template calibration (amplitude, $t_2$, $t_3$ distributions) |
| SDSS/BOSS CLAGN catalogue (MacLeod et al. 2016; Yang et al. 2018; Green et al. 2022) | SDSS DR16+ | ~200 confirmed CLAGN with repeat spectroscopy | CLAGN amplitude and timescale distributions; colour change distributions |
| LBV eruption archive (Smith et al. 2011; Humphreys & Davidson 1994) | Literature compilation | ~30 events in Local Group | Template calibration (very small sample; Bayesian priors will be broad) |
| ZTF/ATLAS SN early light curves (Perley et al. 2020; BTS sample) | ZTF BTS | ~3,000 classified SNe | SN rise template calibration |
These datasets are publicly available. The ZTF and ASAS-SN catalogues include photometric light curves in machine-readable format. They are sufficient for template parameter estimation but not for end-to-end supervised learning (the instrument systematics are different).
5.4 Two-Stage Architecture¶
Stage 1: Fast Statistical Filter (runs on every source, every epoch) - Compute the single-epoch significance $S$ (Section 4.1) - Compute the CUSUM statistics $C^+$, $C^-$ (Section 4.2, updated incrementally) - If $S > S_{\text{min}} = 5$ OR $C^+ > h$ OR $C^- > h$: source passes to Stage 2 - Expected pass rate: ~1% of monitored sources on any given night (100 out of 10,000) - Computational cost: < 0.01 seconds per source (arithmetic on pre-computed running statistics)
Stage 2: Template Fitting + Classification (runs on Stage 1 survivors)
- Fit each anomaly template (TDE, CLAGN, nova, LBV, SN) to the light curve via nested sampling
- Compute Bayes factor $B_k = Z_k / Z_{\text{null}}$ for each class $k$, where $Z_k$ is the Bayesian evidence under template $k$ and $Z_{\text{null}}$ is the evidence under a constant-flux null model
- Apply the class-specific decision rules from Section 2 (positional offset, host association, colour, timescale)
- A source is classified as MUSE_worthy if:
- The best-fit class has $\ln B_k > 10$ (strong evidence on the Jeffreys scale)
- All class-specific criteria for that class are satisfied
- The confidence score is $p = 1 - 1/B_k$ (probability that the anomaly model is preferred over null)
- Computational cost: ~5-10 seconds per source (nested sampling with dynesty, 500 live points, 5 templates)
- Expected MUSE_worthy flags: ~0.1% of Stage 1 survivors = ~0.1 per night on average (see Section 7)
6. The Proposal Data Package¶
When a source is flagged as MUSE_worthy = 1, OpenAstro generates the following structured output. This package is designed to be directly insertable into a VLT/MUSE DDT or regular proposal.
6.1 Package Contents¶
1. Source Coordinates - RA, Dec (J2000), to arcsecond precision - Source: plate-solved OpenAstro astrometry, cross-matched with Gaia DR3 or SDSS if available - Positional uncertainty: quoted (typically < 1 arcsec from plate-solved FITS)
2. Classification - Anomaly class: one of {TDE, CLAGN, Nova, LBV, SN, Unknown} - Confidence score: $p \in [0, 1]$, derived from Bayes factor (Section 5.4) - Alternative class (if second-best class has $\ln B > 5$): listed with its own confidence
3. Light Curve Figure - Time axis: last 6 months (or full baseline if < 6 months) + anomaly region + 30 days of projected future evolution (if template fit provides extrapolation) - Y axis: calibrated magnitude (and flux in mJy on secondary axis) - Error bars: $\sigma_i$ per observation - Baseline mean shown as horizontal dashed line - Anomaly onset marked with vertical dashed line - Best-fit template overlaid in colour - Multi-band light curves shown in separate panels if available - Format: PDF vector graphic + machine-readable CSV of data points
4. Host Galaxy / Environment - Cross-match with NED: galaxy name, morphological type, distance (Mpc), redshift - Cross-match with SIMBAD: source classification if previously catalogued - Gaia DR3 cross-match: parallax, proper motion (to confirm extragalactic nature) - If host galaxy identified: angular separation from nucleus, position angle, projected physical offset (kpc)
5. Justification Text (Auto-Generated)
Template:
Source [name/coordinates] showed a [brightening/dimming] of [Y] mag over [Z] days beginning on [date], from a baseline of [mu_b +/- sigma_b] mag. The light curve is consistent with a [class] at [redshift/distance]. The Bayesian evidence ratio is [ln B] (confidence [p]). [Class-specific sentence]. Spectroscopic confirmation with VLT/MUSE would [specific scientific question].
Class-specific sentences:
-
TDE: "The transient is coincident with the nucleus of [galaxy name] (offset [X] arcsec), which showed no prior optical variability over [N] months of monitoring. The light curve rise time ([T] days) and peak luminosity (M ~ [X]) are consistent with the TDE population (van Velzen et al. 2021). MUSE integral field spectroscopy would detect broad H-alpha and He II 4686 emission confirming the tidal disruption origin, and measure the velocity dispersion of the debris stream to constrain the black hole mass."
-
CLAGN: "The source is a known AGN ([catalogue ID]) that has undergone a sustained flux change of [Y] mag over [Z] days, accompanied by a colour change of [delta(g-r)] mag. The change-point was detected by CUSUM analysis at [date]. MUSE spectroscopy would determine whether broad emission lines have [appeared/disappeared], confirming a spectral state transition and constraining the accretion rate change timescale."
-
Nova: "The transient appeared at [offset] arcsec from the centre of [galaxy name] (d = [D] Mpc) with a rise of [Y] mag in [Z] days, consistent with a [fast/slow] nova. MUSE spectroscopy at the nebular phase would measure ejecta abundances (CNO, Ne) to determine the white dwarf type, and line widths to constrain the ejection velocity and ejected mass."
-
LBV: "The source, identified as a massive star [or in star-forming region] in [galaxy name], has brightened by [Y] mag over [Z] months with a colour change of [delta(B-V)] mag toward the red. The slow, monotonic evolution is consistent with an LBV giant eruption (S Doradus-type instability). MUSE integral field spectroscopy would reveal the P Cygni profile structure of H-alpha and Fe II, constraining the mass-loss rate and determining whether this is a non-terminal instability or a precursor to core collapse."
-
SN: "The transient rose by [Y] mag in [Z] hours, consistent with the early rise of a [Type Ia/II/Ib-c] supernova. MUSE spectroscopy within [priority window] would capture the early spectral evolution, constraining the outer ejecta composition and progenitor properties."
6. Priority Assignment
| Priority | Criterion | Response window |
|---|---|---|
| A: Immediate | SN shock breakout ($ | \Delta m |
| B: Urgent | TDE > 7 days post-onset but still rising OR Nova within 14 days of peak OR CLAGN with $ | dm/dt |
| C: Standard | CLAGN with slow evolution OR LBV eruption OR TDE in decline phase | Observe within 1 month |
Priority A triggers a DDT proposal (or contact with existing ToO programmes). Priority B and C are submitted through the normal ESO proposal cycle (P1/P2) with the data package attached as supporting material.
7. Calibration and Expected Alert Rates¶
7.1 Volumetric Event Rates from Literature¶
| Event class | Volumetric rate | Source |
|---|---|---|
| TDE | ~$1 \times 10^{-5}$ per galaxy per year (i.e., 1 per $10^5$ galaxy-years) | van Velzen et al. 2018 (ApJ 852, 72); revised downward from earlier estimates. ZTF-calibrated rate: $3 \times 10^{-5}$ gal$^{-1}$ yr$^{-1}$ for E+A hosts (Hammerstein et al. 2021) |
| CLAGN | ~1% of AGN per year show > 0.5 mag change | MacLeod et al. 2019 (ApJ 874, 8); Yang et al. 2018 |
| Extragalactic nova | ~50 per year in M31; ~10-20 per year in other Local Group galaxies; ~1-5 per year per $L^*$ galaxy at d < 30 Mpc | Shafter 2017 (ApJ 834, 196) |
| LBV giant eruption | ~0.5 per year in Local Group; extremely rare beyond 10 Mpc | Smith et al. 2011 |
| Core-collapse SN (within useful volume) | ~1 per year at d < 30 Mpc (visible to OpenAstro) | Li et al. 2011 (MNRAS 412, 1473) |
7.2 Expected Alerts for OpenAstro¶
Assume OpenAstro monitors: - 5,000 galaxies (extended sources) — primarily nearby galaxies (d < 100 Mpc) selected from the Local Volume Galaxy catalogue, SDSS spectroscopic galaxy sample, and HyperLEDA - 2,000 known AGN — from SDSS quasar catalogue and Veron-Cetty & Veron, selected for brightness (V < 18) and prior variability characterisation - 3,000 other sources (stars in galaxy fields, comparison stars, serendipitous) — not primary targets but monitored as part of field photometry
Total: ~10,000 monitored sources.
Expected genuine MUSE-worthy events per year:
| Class | Calculation | Events/year |
|---|---|---|
| TDE | 5,000 galaxies x $3 \times 10^{-5}$ gal$^{-1}$ yr$^{-1}$ = 0.15; but OpenAstro targets nearby massive galaxies preferentially, enhancing the rate by ~5x for E+A hosts. Effective: ~0.5 per year. Also: some TDEs discovered by ZTF/ATLAS and followed up by OpenAstro. External trigger channel adds ~2-5 per year. | 1-5 |
| CLAGN | 2,000 AGN x 1% yr$^{-1}$ = 20. But not all are MUSE-worthy — many are at too-high redshift or too faint for MUSE follow-up. Requiring z < 0.3 and V < 19: ~30% of sample qualifies. | 6 |
| Nova (extragalactic) | Monitoring M31 + M33 + NGC 253 + other nearby galaxies: ~20-50 novae per year are detectable. But most are routine; MUSE-worthy are those in unusual hosts, with unusual decline rates, or recurrent nova candidates. Estimate 10% are MUSE-worthy. | 2-5 |
| LBV | Monitoring ~100 known massive stars in Local Group + nearby galaxies: ~0.5 events per year. | 0-1 |
| SN (early) | ~1 nearby SN per year at d < 30 Mpc detectable by OpenAstro. Additional SNe from ZTF/ATLAS triggers. | 2-5 |
Total genuine MUSE-worthy events: ~15-25 per year. Central estimate: 20 per year.
7.3 False Positive Budget¶
With the two-stage architecture:
Stage 1 false triggers: - 10,000 sources x 365 nights x $P(S > 5) = 2.87 \times 10^{-7}$ per epoch (Gaussian tail) = ~1 false trigger per year from point-in-time statistics - CUSUM false triggers: ~1,000 AGN x 1 false trend per 2 years per source = 500 per year. However, CUSUM flags are cross-checked with Bayesian blocks (Section 4.2), which independently has ~1% FPR per source per year = 10 false blocks per year from 1,000 AGN. Requiring both CUSUM + Bayesian blocks to agree: joint FPR ~ 500 x (10/1000) = 5 per year. - Combined Stage 1 false trigger rate: ~6 per year.
Stage 2 false classifications: - Of the ~6 false Stage 1 triggers per year, each undergoes template fitting. The template fitting adds a strong filter: a false trigger (which is just a noise fluctuation or normal AGN variability) will not produce $\ln B > 10$ for any anomaly template, because the templates are physically constrained (correct rise/decline shapes). Estimated survival rate of false triggers through Stage 2: 30%. - Stage 2 false positives: ~2 per year.
Final false positive rate: - 2 false positives per year out of ~22 total flags (20 real + 2 false) = ~9% false positive rate. - This meets the target of precision >= 0.90.
Is this acceptable? - 2 false MUSE proposals per year is tolerable if: (a) every flag is reviewed by a human before proposal submission, and (b) the false positives are recognisable as ambiguous cases (e.g., an AGN variability excursion that looked like CLAGN but turned out to be normal). The human review step (which receives the full data package including the template fit) is expected to catch at least half of the remaining false positives, reducing the actual false proposal rate to ~1 per year. - One false MUSE proposal per year is acceptable for a new programme establishing credibility. Professional collaborators will tolerate this rate, especially if the programme delivers 15-20 genuine discoveries per year.
7.4 Calibration Plan¶
Before deploying the classifier on live data, calibrate it on archival light curves:
-
Inject synthetic anomalies into real OpenAstro baseline light curves. For each class, draw parameters from the template distributions (Section 5.3) and add the synthetic signal to a real baseline. Run the classifier. Measure detection efficiency (recall) as a function of anomaly amplitude, rise time, and baseline noise level.
-
Run the classifier on ZTF forced-photometry light curves of known TDEs, CLAGN, novae, and SNe. Measure the detection significance and class assignment. This tests whether the templates and thresholds, calibrated on literature parameters, actually work on real (non-OpenAstro) data.
-
Run the classifier on ZTF forced-photometry light curves of normal galaxies and AGN that showed NO anomalies. Measure the false positive rate. Adjust $S_{\text{min}}$ and $h$ if the FPR exceeds the budget.
-
After 6 months of OpenAstro operation: Recalibrate using OpenAstro's own data. Adjust thresholds based on the actual noise properties, baseline distributions, and detection rates observed.
Implementation Notes¶
Dependencies¶
astropy(time, coordinates, FITS, Bayesian blocks)dynestyorultranest(nested sampling for template fitting)scipy.stats(Gaussian CDF, Shapiro-Wilk test)astroquery(NED, SIMBAD, Gaia cross-match)photutils(source detection, PSF measurement — already in QC pipeline)matplotlib(light curve figures for proposal package)
Database Schema Additions¶
The observations table (from Details on components.md) needs:
ALTER TABLE light_curves ADD COLUMN anomaly_S FLOAT; -- single-epoch significance
ALTER TABLE light_curves ADD COLUMN cusum_plus FLOAT; -- running CUSUM upper
ALTER TABLE light_curves ADD COLUMN cusum_minus FLOAT; -- running CUSUM lower
ALTER TABLE light_curves ADD COLUMN anomaly_class VARCHAR(16); -- TDE, CLAGN, Nova, LBV, SN, NULL
ALTER TABLE light_curves ADD COLUMN anomaly_confidence FLOAT; -- Bayes factor-derived p
ALTER TABLE light_curves ADD COLUMN muse_worthy BOOLEAN; -- final flag
ALTER TABLE light_curves ADD COLUMN muse_priority CHAR(1); -- A, B, C, or NULL
-- Baseline statistics (per source)
ALTER TABLE sources ADD COLUMN baseline_mean FLOAT;
ALTER TABLE sources ADD COLUMN baseline_sigma FLOAT;
ALTER TABLE sources ADD COLUMN baseline_tau FLOAT; -- variability timescale (days)
ALTER TABLE sources ADD COLUMN baseline_n INT; -- number of baseline epochs
ALTER TABLE sources ADD COLUMN baseline_frozen BOOLEAN DEFAULT FALSE;
Pipeline Integration¶
The MUSE First Filter runs as a nightly batch job after the QC pipeline's Stage 3 (time-series quality) completes:
QC Stage 3 (nightly) → MUSE First Filter Stage 1 (fast stats) →
if flagged → Stage 2 (template fitting) →
if MUSE_worthy → generate proposal package →
human review queue
For SN-class events (Priority A), Stage 1 runs in real-time on each observation upload (not waiting for the nightly batch), and the alert fires within 15 minutes of the triggering observation.
Open Questions and Future Work¶
-
Colour information: The framework above is specified primarily for single-band light curves. Multi-band colour evolution (e.g., $g - r$ colour change) dramatically improves class discrimination (TDE: blue at peak; CLAGN: bluer-when-brighter; LBV: redder during eruption). When OpenAstro nodes routinely provide multi-band photometry, add colour-based criteria to each class decision rule and to the template fits.
-
External alert integration: The framework assumes OpenAstro detects anomalies independently. In practice, many events will first be discovered by ZTF/ATLAS/ASAS-SN/Rubin. The classifier should also ingest external alerts (via broker API — ANTARES, LASAIR, ALeRCE) and run the MUSE-worthiness assessment on those targets, even without an OpenAstro baseline. In that case, use the survey light curve as the input and skip the baseline characterisation (or use the survey's own baseline).
-
Spectroscopic pre-classification: Cross-match with SDSS spectroscopic catalogue to identify sources with existing spectra. A "before" spectrum greatly increases the MUSE proposal's value (can demonstrate the spectral change). Flag sources with existing SDSS/BOSS spectra as higher priority for MUSE follow-up.
-
Feedback loop: After the first year of operation, compare the classifier's predictions with actual MUSE outcomes (was the target scientifically interesting?). Use this to update template parameters, adjust thresholds, and potentially train a supervised model on OpenAstro's own data.
-
Coordination with ESO: The ultimate goal is to establish a standing MUSE programme (or an approved filler programme) that can be triggered by OpenAstro alerts. This requires building the track record first (successful DDT proposals) and then applying for a regular programme. The proposal data package (Section 6) is designed to make this progression as frictionless as possible.
References¶
- Hammerstein et al. 2023, ApJ 942, 9 — ZTF TDE late-time behaviour
- Hawkins & Olwell 1998, "Cumulative Sum Charts and Charting for Quality Improvement" — CUSUM theory
- Kawash et al. 2021, ApJ 910, 120 — ASAS-SN nova catalogue
- LaMassa et al. 2015, ApJ 800, 144 — first changing-look quasar
- Li et al. 2011, MNRAS 412, 1473 — nearby SN rates
- MacLeod et al. 2010, ApJ 721, 1014 — SDSS Stripe 82 AGN variability
- MacLeod et al. 2016, MNRAS 457, 389 — CLAGN from SDSS
- Masci et al. 2019, PASP 131, 018003 — ZTF image processing
- Scargle et al. 2013, ApJ 764, 167 — Bayesian blocks
- Shafter 2017, ApJ 834, 196 — nova rates
- Smith et al. 2011, MNRAS 415, 773 — LBV eruption catalogue
- van Velzen et al. 2018, ApJ 852, 72 — TDE rates
- van Velzen et al. 2021, ApJ 908, 4 — ZTF TDE sample
- Yang et al. 2018, ApJ 862, 109 — CLAGN systematic search
- Yao et al. 2023, ApJ 955, L6 — ZTF TDE sample update