Computation Pipeline Map

A precise map of every computationally meaningful step in the OpenAstro pipeline: what runs, what it costs, where it runs, what it produces, and what depends on what. This is the reference document for understanding the full compute graph.


Pipeline Overview

The OpenAstro computation pipeline has two branches that converge:

Branch A — Image Processing Pipeline: Raw FITS → calibrated photometry + stacked images
Branch B — Science Inference Pipeline: Calibrated light curves → measured periods / inferred planet parameters

Both branches feed into the Science Output Layer: light curve archive, parameter posteriors, alert generation.


Full Pipeline Diagram

                    EXTERNAL DATA SOURCES
                 (AAVSO / ETD / MPC / AstroBin)
                    OR live telescope uploads
                             |
                             v
                    ┌────────────────────┐
                    │   INGEST LAYER     │
                    │  Format parsing    │
                    │  Deduplication     │
                    │  Metadata extract  │
                    └────────┬───────────┘
                             |
             ┌───────────────┴────────────┐
             │                            │
             v                            v
     ┌────────────────┐          ┌─────────────────┐
     │  IMAGE BRANCH  │          │  LIGHTCURVE /   │
     │  (FITS files)  │          │  PHOTOMETRY     │
     │                │          │  BRANCH         │
     └───────┬────────┘          └────────┬────────┘
             |                            |
             v                            v
     ┌────────────────┐          ┌─────────────────┐
     │  PLATE SOLVE   │          │  CALIBRATE      │
     │  (WCS embed)   │          │  (Zero-point    │
     └───────┬────────┘          │   Gaia/APASS)   │
             |                   └────────┬────────┘
             v                            |
     ┌────────────────┐                   |
     │  FLUX CALIB    │                   |
     │  (Zero-point   │                   |
     │  + color term) │                   |
     └───────┬────────┘                   |
             |                            |
             v                            |
     ┌────────────────┐                   |
     │ WCS REPROJECT  │                   |
     │  onto ref grid │                   |
     └───────┬────────┘                   |
             |                            |
             v                            |
     ┌────────────────┐                   |
     │  CO-ADDITION   │                   |
     │  (SNR-weighted │                   |
     │   stack)       │                   |
     └───────┬────────┘                   |
             |                            |
             └──────────────┬─────────────┘
                            |
                            v
                 ┌──────────────────────┐
                 │  LIGHT CURVE BUILD   │
                 │  Aggregate phot pts  │
                 │  per target per band │
                 └──────────┬───────────┘
                            |
              ┌─────────────┼─────────────┐
              |             |             |
              v             v             v
     ┌──────────────┐ ┌──────────┐ ┌────────────────┐
     │ LOMB-SCARGLE │ │ TTV      │ │ TRANSIENT      │
     │ PERIOD       │ │ TIMING   │ │ CLASSIFIER     │
     │ SEARCH       │ │ EXTRACT  │ │ (diff imaging) │
     └────────┬─────┘ └─────┬────┘ └──────┬─────────┘
              |             |             |
              v             v             v
     ┌──────────────┐ ┌──────────────┐ ┌───────────────┐
     │ PERIOD +     │ │ N-BODY MCMC  │ │ TRANSIENT     │
     │ POWER        │ │ INFERENCE    │ │ ALERT +       │
     │ SPECTRUM     │ │ (TTVFast +   │ │ FOLLOW-UP     │
     └──────────────┘ │  dynesty)    │ └───────────────┘
                      └─────┬────────┘
                            |
                            v
                 ┌──────────────────────┐
                 │  POSTERIOR ON        │
                 │  PERTURBER PARAMS    │
                 │  {M₂, P₂, e₂, ω₂}    │
                 └──────────────────────┘

Stage-by-Stage Specification

Each stage is described by: what it does, inputs, outputs, compute cost, parallelism type, and where it should run.


Stage 0: Ingest

What it does: Accept data from external archives (archival mode) or telescope client uploads (live mode). Parse formats, extract metadata, validate, deduplicate, store.

Inputs:
- Archival: AAVSO FITS/CSV, ETD transit CSV, MPC observation records, AstroBin FITS
- Live: Client POST containing FITS file path + observation metadata JSON

Outputs:
- Raw FITS stored in object storage (B2/S3)
- Observation record in database (target_id, site_id, timestamp, filter, raw_path)

Compute cost: Negligible. IO-bound, not CPU-bound.

Parallelism: Trivially parallel. Each ingested file is independent.

Where it runs: Central server, synchronously in the API handler or via Celery task for large file handling.

Dependencies: None — this is the pipeline entry point.


Stage 1: Plate Solving

What it does: Given a FITS file without a WCS header (or with a poor/missing one), determine the precise sky position and orientation of every pixel. Embeds a WCS solution into the FITS header.

Inputs: Raw FITS file (2D image array + basic FITS headers)

Outputs: FITS file with full WCS header (CRVAL1/2, CRPIX1/2, CD matrix or CDELT+PC matrix)

Compute cost:
- Per-image time: 5–60 seconds depending on solver and image quality
- astrometry.net local solver: 10–30 seconds average on a 4k×4k frame
- astrometry.net remote API: 30–120 seconds including network round-trip
- Index files required: ~40GB on disk for astrometry.net; ~2GB for ASTAP with limited sky coverage
- Memory: ~500MB peak during solving

Parallelism: Embarrassingly parallel. Each image solved independently. CPU-bound, no shared state.

Where it runs:
- Deployment stage 1: Central server, Celery worker pool (4–8 workers)
- Deployment stage 2+: Same architecture; add workers as throughput demands
- Not suitable for BOINC (large index file dependency) or Lambda (cold start cost + index files)
- Possible future: edge pre-solving if the client has ASTAP installed (reduces the upload round-trip)

Failure mode: Solving fails on very crowded fields, very sparse fields (high galactic latitude), or badly trailed images. Return a wcs_failed flag and mark the observation for manual review.
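The solve step itself is a subprocess call; a hedged sketch using astrometry.net's real solve-field CLI (scale bounds and timeout are illustrative and would come from site metadata in practice):

# Local plate solve via solve-field; success leaves <base>.solved and
# <base>.new (a copy of the FITS with the WCS written in) beside the input.
import subprocess
from pathlib import Path

def plate_solve_local(fits_path: str, timeout_s: int = 120) -> dict:
    subprocess.run(
        ["solve-field", fits_path,
         "--overwrite", "--no-plots",
         "--scale-units", "arcsecperpix",
         "--scale-low", "0.5", "--scale-high", "3.0",  # plausible amateur scales
         "--cpulimit", str(timeout_s)],
        capture_output=True, timeout=timeout_s + 30,
    )
    base = Path(fits_path).with_suffix("")
    solved = base.with_suffix(".solved").exists()
    return {"success": solved,
            "solved_path": str(base.with_suffix(".new")) if solved else None}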

Dependencies: Stage 0 (raw FITS in storage)


Stage 2: Flux Calibration (Zero-Point + Color Term)

What it does: Determine the photometric zero-point for a FITS image by comparing detected sources to a reference catalog (Gaia DR3, APASS). Accounts for color terms introduced by different filter/camera combinations.

Algorithm:
1. Source extraction on the image (sep.extract or photutils.DAOStarFinder) — extracts instrumental magnitudes
2. Cross-match detected sources against Gaia/APASS using the WCS (now available from Stage 1)
3. For matched stars: plot instrumental magnitude vs. catalog magnitude, fit zero-point and color term via sigma-clipped linear regression (sketch below)
4. Record: ZP_inst (zero-point), CT_BV (B−V color term), scatter (residual RMS)
5. Update the instrument registry for this site+camera combination (accumulates characterization over time)
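A minimal sketch of step 3's sigma-clipped fit, assuming pre-matched arrays of instrumental magnitudes, catalog magnitudes, and catalog B−V colors (function and key names are illustrative, not an existing OpenAstro module):

# Fit m_cat - m_inst = ZP + CT * (B-V) with iterative sigma clipping.
import numpy as np
from astropy.stats import sigma_clip

def fit_zero_point(m_inst, m_cat, bv, sigma=3.0, n_iter=3):
    delta = m_cat - m_inst
    keep = np.ones(delta.size, dtype=bool)
    for _ in range(n_iter):
        # Least-squares fit of [1, B-V] against the magnitude offsets
        design = np.column_stack([np.ones(keep.sum()), bv[keep]])
        (zp, ct), *_ = np.linalg.lstsq(design, delta[keep], rcond=None)
        resid = delta - (zp + ct * bv)
        keep = ~np.ma.getmaskarray(sigma_clip(resid, sigma=sigma, masked=True))
    return {"ZP_inst": float(zp), "CT_BV": float(ct),
            "scatter_mag": float(np.std(resid[keep]))}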

Inputs: WCS-solved FITS + site metadata (camera model, filter wheel configuration)

Outputs: Per-image zero-point + color term; calibrated magnitude for each extracted source; updated instrument registry entry

Compute cost:
- Source extraction: 1–10 seconds on a 4k×4k frame
- Catalog query: 1–5 seconds (network; can be cached by sky region)
- Zero-point fit: milliseconds
- Total per image: ~5–15 seconds CPU, mostly IO/network-bound

Parallelism: Embarrassingly parallel per image. Catalog queries can be batched and cached centrally.

Where it runs: Central server Celery workers. Catalog caching (Redis or local SQLite from Gaia cone queries) reduces repeat network calls significantly.

Dependencies: Stage 1 (WCS required for catalog cross-match)


Stage 3: WCS Reprojection

What it does: The core of heterogeneous stacking. Takes N calibrated FITS images from instruments with different pixel scales, orientations, and fields of view. Projects all of them onto a single common pixel grid defined by the "reference frame" (the image with best resolution or seeing).

Algorithm:
1. Select the reference frame (highest resolution = smallest arcsec/pixel, or best seeing if tied)
2. For each other image: call reproject_interp(input_hdu, reference_wcs, shape_out=reference_shape) — this interpolates the input image's pixels onto the reference grid (sketch below)
3. Also reproject the weight map (SNR map, or flat-field-derived sensitivity map) the same way
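Step 2 as code: a minimal sketch around the real reproject.reproject_interp call, assuming reference_header is an astropy fits.Header carrying the reference WCS and NAXIS keywords:

# Resample one image onto the reference grid (bilinear interpolation by default).
from astropy.io import fits
from astropy.wcs import WCS
from reproject import reproject_interp

def reproject_to_reference(fits_path, reference_header, output_path):
    with fits.open(fits_path) as hdul:
        shape_out = (reference_header["NAXIS2"], reference_header["NAXIS1"])
        # Returns the resampled array plus a footprint marking valid pixels
        array, footprint = reproject_interp(hdul[0], WCS(reference_header),
                                            shape_out=shape_out)
    fits.writeto(output_path, array, header=reference_header, overwrite=True)
    return output_path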

Inputs per job: One FITS HDU + reference WCS header + reference shape

Outputs per job: Reprojected FITS array (same pixel dimensions as reference) + reprojected weight map

Compute cost:
- Per image, 4k×4k FITS: 10–60 seconds (reproject_interp in CPU mode)
- reproject_exact (flux-conserving): 5–20× slower than reproject_interp; use interp for stacking, exact only for precise photometry
- With N=100 images: 1,000–6,000 seconds = 17–100 CPU-minutes
- Memory per job: ~2× image size = ~100–200MB

GPU acceleration: Reprojection is dominated by interpolation kernels that map well to GPUs (e.g., porting the resampling step to cupy; the stock reproject package runs on CPU). On a modern GPU (RTX 4090), per-image time can drop to ~1–2 seconds. For N=100 images: ~2 minutes. This is a 10–30× speedup and is worth investing in at deployment stage 2+.

Parallelism: Embarrassingly parallel. Each image's reprojection is independent. Ideal BOINC work unit at scale.

Where it runs:
- Deployment stages 1–2: Celery workers, CPU-based
- Deployment stage 2+ with a GPU: a single GPU worker handles all reprojections serially but fast
- Deployment stage 3: BOINC if scale exceeds central capacity

Data flow consideration: Each BOINC work unit needs the input FITS and the reference WCS header. Input: ~50–200MB. Output: ~50–200MB. The volunteer must have bandwidth for this. For most home connections this is fine for occasional jobs but can become a burden if BOINC assigns many reprojection jobs simultaneously.

Dependencies: Stages 1 and 2 (WCS + flux calibration)


Stage 4: Flux Normalization

What it does: After reprojection, images from different instruments have different background levels and signal scales. Normalize so that a star of known magnitude has the same pixel value across all images.

Algorithm:
1. Background estimation (astropy.stats.sigma_clipped_stats or photutils.Background2D) — measure the median sky level
2. Background subtraction
3. Scale factor computation: use the zero-point from Stage 2 to determine the multiplicative factor that maps each image's flux units to a common reference unit (e.g., electrons/second at a reference aperture)
4. Apply the scale factor (sketch below)
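Steps 1–4 in brief; sigma_clipped_stats is the real astropy call, and the scaling assumes the convention m = ZP − 2.5·log10(counts):

# Background subtraction + flux scaling sketch. For the same star at the same
# magnitude, counts_ref = counts_image * 10^(0.4 * (ZP_image - ZP_reference)).
import numpy as np
from astropy.stats import sigma_clipped_stats

def normalize_image(data, zp_image, zp_reference):
    _, median_sky, _ = sigma_clipped_stats(data, sigma=3.0)  # robust sky level
    scale = 10.0 ** (0.4 * (zp_image - zp_reference))
    return (data - median_sky) * scale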

Inputs: Reprojected FITS array + zero-point from Stage 2

Outputs: Background-subtracted, flux-normalized FITS array ready for co-addition

Compute cost: ~1–5 seconds per image. Background estimation is the slow step. Essentially free compared to reprojection.

Parallelism: Embarrassingly parallel per image.

Where it runs: Can run as part of the same Celery task as Stage 3 (chained). No separate worker needed.

Dependencies: Stages 2 and 3


Stage 5: Co-addition (Weighted Stacking)

What it does: Combine N flux-normalized reprojected images into a single deep image. Reduce noise by SNR-weighted averaging with sigma clipping to reject cosmic rays, satellite trails, and bad pixels.

Algorithm:
1. Load all N reprojected, normalized arrays into memory
2. Stack into a 3D array of shape (N, height, width)
3. Compute per-pixel weights: w_i = (SNR_i)² = (signal_i / noise_i)²
4. Sigma-clip: for each pixel position, reject values more than 3σ from the weighted mean (sketch below)
5. Compute the final pixel value: weighted mean of surviving pixels
6. Compute the uncertainty map: σ_final = 1/√(Σ w_i) over retained pixels
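A sketch of these steps with numpy and astropy's sigma_clip (note astropy clips about the median by default, a common simplification of the weighted-mean clip in step 4):

# SNR^2-weighted, sigma-clipped stack; cube and weights have shape (N, H, W).
import numpy as np
from astropy.stats import sigma_clip

def weighted_stack(cube, weights, sigma=3.0):
    clipped = sigma_clip(cube, sigma=sigma, axis=0, masked=True)  # per-pixel reject
    w = np.where(np.ma.getmaskarray(clipped), 0.0, weights)
    wsum = np.maximum(w.sum(axis=0), 1e-12)       # avoid divide-by-zero holes
    stacked = (w * cube).sum(axis=0) / wsum
    uncertainty = 1.0 / np.sqrt(wsum)             # sigma_final = 1/sqrt(sum w_i)
    coverage = (w > 0).sum(axis=0)                # images contributing per pixel
    return stacked, uncertainty, coverage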

Inputs: N normalized, reprojected FITS arrays + N weight arrays

Outputs: Final stacked FITS + uncertainty FITS + coverage map (N pixels contributing at each position)

Compute cost:
- IO-bound: loading N FITS files into RAM. N=100 at 50MB each = 5GB read
- Compute: numpy vectorized weighted mean + sigma clip ≈ 5–30 seconds for N=100 on a 4k×4k stack
- Memory peak: N × image_size = 100 × 64MB (float32, 4k×4k) = 6.4GB RAM required

Parallelism: Cannot be trivially parallelized — all images must be on the same machine. Can be done in tiles (divide the image into spatial tiles and stack each tile independently) to reduce the memory requirement, but this introduces edge effects if not handled carefully.

Where it runs: Central server (not BOINC — requires all N images locally). The machine must have enough RAM. At deployment stage 2, 32GB RAM is sufficient. At deployment stage 3 with very large stacks, tiled stacking or chunked processing is needed.

Chunked stacking strategy (for large N):
1. Divide the images into groups of 20
2. Stack each group to produce an intermediate stack + weight sum
3. Combine the intermediate stacks using a weight-sum-weighted mean (sketch below)
This reduces peak memory by a factor of N/K, where K is the chunk size, with negligible quality loss.
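Step 3 of the chunked strategy is then a small merge over the intermediate results (a sketch; names are illustrative):

# Merge intermediate chunk stacks by their summed weight maps.
import numpy as np

def combine_chunks(chunk_stacks, chunk_wsums):
    wsum_total = np.sum(chunk_wsums, axis=0)
    weighted = np.sum([s * w for s, w in zip(chunk_stacks, chunk_wsums)], axis=0)
    return weighted / np.maximum(wsum_total, 1e-12), wsum_total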

Dependencies: Stages 3 and 4


Stage 6: Light Curve Assembly

What it does: Aggregate individual photometric measurements (from calibrated observations, not stacked images) into a time-ordered light curve per target per band.

This is not a compute step — it is a database query. Pull all observations for target_id X in filter Y ordered by timestamp, apply outlier rejection, compute binned flux if needed.
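A minimal sketch of that query, assuming the observation table described under Inputs below (table and column names are illustrative):

# Stage 6 as a query plus array construction.
import sqlite3
import numpy as np

def build_light_curve(conn: sqlite3.Connection, target_id: str, band: str) -> dict:
    rows = conn.execute(
        """SELECT timestamp_bjd, mag_calibrated, mag_err
           FROM observation
           WHERE target_id = ? AND filter = ? AND quality_flags = 0
           ORDER BY timestamp_bjd""",
        (target_id, band),
    ).fetchall()
    time, mag, err = (np.array(col, dtype=float) for col in zip(*rows))
    return {"time_bjd": time, "flux": mag, "uncertainty": err}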

Inputs: observation table (timestamp, calibrated magnitude, error, site_id, filter, quality_flags)

Outputs: Light curve: {time (BJD), flux, uncertainty} per target per band

Compute cost: Negligible. SQL query + numpy array construction. Milliseconds.

Where it runs: Central server, synchronously in the API handler.

Dependencies: Stage 2 (calibrated photometry in database)


Stage 7a: Lomb-Scargle Period Analysis

What it does: Search for periodic signals in an irregularly sampled light curve. The Lomb-Scargle periodogram is the standard algorithm for this in astronomy.

Algorithm:
1. Compute the Lomb-Scargle power at each trial frequency over a grid spanning from the sampling cadence to the full observation baseline
2. Identify peaks in the power spectrum
3. Estimate the false alarm probability (FAP) for the strongest peak via bootstrap resampling or an analytical approximation (sketch below)
4. If FAP < threshold, the period is significant
5. Optionally: phase-fold the light curve at the detected period and fit a sinusoidal or physically motivated model
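Steps 1–4 map directly onto astropy's real LombScargle API (the FAP here uses the analytic approximation rather than bootstrap):

# Period search with astropy.timeseries.LombScargle.
import numpy as np
from astropy.timeseries import LombScargle

def find_period(t, flux, err, fap_threshold=0.01):
    ls = LombScargle(t, flux, err)
    frequency, power = ls.autopower()                # auto-chosen frequency grid
    peak = np.argmax(power)
    fap = ls.false_alarm_probability(power[peak])    # analytic approximation
    return {"best_period": 1.0 / frequency[peak],
            "fap": float(fap),
            "significant": bool(fap < fap_threshold),
            "power_spectrum": (frequency, power)}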

Inputs: Light curve {time, flux, uncertainty}

Outputs: Power spectrum {frequency, power}; best-fit period + uncertainty; FAP; phase-folded light curve

Compute cost:
- N data points, M frequency bins: O(N × M) naively, O(N log N) with fast algorithms (Press & Rybicki 1989 / gatspy)
- For N=1000, M=10000: <1 second with astropy.timeseries.LombScargle
- For N=10000, M=100000: ~5–30 seconds
- Bootstrap resampling for FAP (1000 resamples): 1000× the above

Parallelism: Trivially parallel across targets. Bootstrap resampling is also embarrassingly parallel (each resample is independent). astropy.timeseries.LombScargle supports nterms for multi-harmonic fits.

Where it runs: Celery workers. A single worker handles deployment stage 1 volumes in seconds.

Dependencies: Stage 6


Stage 7b: TTV Transit Timing Extraction

What it does: From a light curve containing multiple transit events, extract individual transit times. Each transit is fit with a transit model (limb-darkened Mandel-Agol profile) to determine the time of mid-transit. The deviations of these times from a linear ephemeris are the Transit Timing Variations (TTVs).

Algorithm:
1. Fold the light curve at the nominal orbital period P₀ and reference epoch T₀ (from a published ephemeris or an initial fit)
2. Identify individual transit windows (each one ±0.5×transit_duration around the predicted mid-time)
3. For each transit window: fit a Mandel-Agol transit model using batman or juliet with MCMC or Nelder-Mead optimization, the free parameter being the mid-transit time (sketch below)
4. Collect {T_observed,i} for all N transits
5. Fit a linear ephemeris T_predicted,i = T₀ + i × P to the observed times
6. TTV = T_observed,i − T_predicted,i
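Step 3 sketched with batman's real API and scipy's Nelder-Mead: only the mid-transit time floats, everything else is frozen (the parameter dict keys are illustrative, and a circular orbit is assumed):

# Fit a single transit window for its mid-transit time t0.
import numpy as np
import batman
from scipy.optimize import minimize

def fit_mid_transit(t, flux, err, p, t0_guess):
    params = batman.TransitParams()
    params.per, params.rp, params.a = p["period"], p["rp_over_rstar"], p["a_over_rstar"]
    params.inc, params.ecc, params.w = p["inc_deg"], 0.0, 90.0   # circular orbit
    params.u, params.limb_dark = p["limb_dark_u"], "quadratic"
    params.t0 = t0_guess
    model = batman.TransitModel(params, t)

    def chi2(theta):
        params.t0 = theta[0]
        return np.sum(((flux - model.light_curve(params)) / err) ** 2)

    result = minimize(chi2, [t0_guess], method="Nelder-Mead")
    return result.x[0]   # best-fit mid-transit time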

Inputs: Full light curve + known transit parameters (Rp/R*, a/R*, b, P₀, T₀, limb darkening)

Outputs: Array of N (transit_epoch, O−C residual, uncertainty) triples — this is the TTV signal

Compute cost per transit:
- Nelder-Mead optimization for a single transit time: 1–10 seconds
- MCMC per transit (to propagate uncertainties): 1–10 minutes
- For N=50 transits: ~1–10 hours total for the full MCMC approach, ~1–10 minutes for point estimates

Parallelism: Each transit fit is independent → embarrassingly parallel. Farm across Celery workers.

Where it runs: Celery workers. Multiple workers speed up the per-transit fitting.

Dependencies: Stage 6


Stage 7c: Transient Classification

What it does: Identify real astrophysical transients in difference images (new image − reference image). Classify each candidate as a real transient, cosmic ray, satellite trail, bad subtraction, or instrumental artifact.

Algorithm:
1. Image subtraction: align the new image to the reference (both already have WCS → reuse the Stage 3 WCS reprojection for the reference), then subtract
2. Source detection in the difference image: threshold at 5σ above background (sketch below)
3. For each candidate: extract a 64×64 pixel cutout from the difference image
4. Run the CNN classifier: real vs. bogus classification
5. For real transients: cross-match against known catalogs (MPC, NED, SDSS) to rule out known objects
6. Alert if: real classification confidence > 0.8 AND no catalog match
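Steps 2–3 with the real sep API, assuming the difference image is already built (candidates too close to an edge are simply dropped in this sketch):

# Detect 5-sigma candidates and extract 64x64 cutouts from a difference image.
import numpy as np
import sep

def detect_candidates(diff_image, nsigma=5.0, size=64):
    data = np.ascontiguousarray(diff_image, dtype=np.float32)
    bkg = sep.Background(data)
    # Detection threshold is nsigma times the global background RMS
    objects = sep.extract(data - bkg.back(), nsigma, err=bkg.globalrms)
    half = size // 2
    cutouts = []
    for obj in objects:
        y, x = int(obj["y"]), int(obj["x"])
        stamp = data[y - half:y + half, x - half:x + half]
        if stamp.shape == (size, size):     # skip edge candidates
            cutouts.append(stamp)
    return objects, (np.stack(cutouts) if cutouts
                     else np.empty((0, size, size), dtype=np.float32))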

Inputs: New FITS + reference FITS (same target field, earlier epoch)

Outputs: List of transient candidates with positions, confidence scores, cross-match results; alerts for high-confidence unknowns

Compute cost (inference):
- Difference image construction: ~10 seconds (reprojection of the reference to match the new image WCS)
- Source detection: ~5 seconds (sep)
- CNN inference on 100 candidates: <1 second on GPU, ~3 seconds on CPU (batched)
- Cross-match: ~2 seconds per candidate via astroquery

Model training cost (happens offline, not per-observation):
- Training data: labeled cutouts (SDSS stamps labeled as real/bogus by existing surveys)
- Training time: ~2–4 hours on a single GPU (RTX 3060 or better) for a ResNet-18 architecture
- Retraining frequency: every 3–6 months as new training data accumulates

Parallelism: Inference is batched — the GPU processes many candidates simultaneously. Training is a single-GPU job.

Where it runs:
- Deployment stage 2: CPU inference on a Celery worker. Fast enough.
- Deployment stage 3: GPU inference worker. Required once >1000 candidates/night.

Dependencies: Stages 1 and 2 (WCS + calibrated images); reference image archive


Stage 8: TTV n-Body MCMC Inference

What it does: The scientific peak of the pipeline. Given N measured transit timing residuals (the TTV signal from Stage 7b), perform Bayesian inference over the orbital parameters of a hypothetical gravitational perturber. Returns a posterior distribution over {M₂, P₂, e₂, ω₂, t_conj₂}.

Algorithm:
1. Define the parameter vector θ = {M₂, P₂, e₂, ω₂, t_conj₂} for the perturber (and optionally free the known planet's parameters: M₁, P₁, e₁)
2. Define prior distributions (log-uniform for M₂ and P₂, uniform for angles)
3. Define the likelihood: call TTVFast with θ to generate predicted transit times, compute a Gaussian log-likelihood against the observed transit times (skeleton below)
4. Run a nested sampler (dynesty) or MCMC (emcee) to map the posterior
5. Optionally run Bayesian model comparison: compute the evidence ratio for the n=2 vs. n=3 planet model (does adding a second perturber improve the fit significantly?)
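Steps 2–4 as a skeleton: the prior transform and the dynesty call use real APIs, while predicted_transit_times is a hypothetical wrapper around ttvfast.ttvfast, and the prior ranges are illustrative only:

# Stage 8 skeleton: unit-cube priors, Gaussian likelihood, dynesty run.
import numpy as np
import dynesty

def prior_transform(u):
    # Step 2: log-uniform in M2 (Msun) and P2 (days), uniform angle/phase
    m2 = 10.0 ** (-7.0 + 4.0 * u[0])     # ~1e-7 .. 1e-3 Msun (illustrative)
    p2 = 10.0 ** (0.5 + 2.0 * u[1])      # ~3 .. 300 days (illustrative)
    e2 = 0.3 * u[2]
    w2 = 360.0 * u[3]
    tconj2 = p2 * u[4]                   # conjunction time within one period
    return np.array([m2, p2, e2, w2, tconj2])

def predicted_transit_times(theta):
    # Hypothetical wrapper: builds ttvfast.models.Planet objects from theta,
    # calls ttvfast.ttvfast, and extracts the target planet's mid-transit times.
    raise NotImplementedError

def log_likelihood(theta, t_obs, t_err):
    # Step 3: Gaussian log-likelihood of observed vs. predicted transit times
    resid = (t_obs - predicted_transit_times(theta)) / t_err
    return -0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * t_err**2))

# Step 4 (real dynesty API), ~100 live points as budgeted below:
#   sampler = dynesty.NestedSampler(
#       lambda th: log_likelihood(th, t_obs, t_err), prior_transform, ndim=5,
#       nlive=100)
#   sampler.run_nested()
#   results = sampler.results    # .samples, .logz (evidence), .logwt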

Software stack:
- ttvfast-python (wrapper around the Deck et al. 2014 Fortran/C code) for likelihood calls
- dynesty for nested sampling (handles multimodal posteriors better than MCMC for TTV parameter spaces, which can have period-aliasing degeneracies)
- emcee as an alternative if the posterior is known to be unimodal

Compute cost:
- Single TTVFast call: ~0.5–2 ms (varies with n_bodies, integration length, time resolution)
- dynesty for a 5-parameter model, ~100 live points: ~200,000–1,000,000 likelihood calls
- Total per system: ~100–2,000 seconds = 2–30 minutes on 1 CPU core
- With dynesty's pool=multiprocessing.Pool(16): ~8–125 seconds wall time
- Typical practical experience: ~10–60 minutes per system with default settings for complex TTVs

Scaling:
- 10 systems: 100–600 minutes = 2–10 CPU-hours/run. Fine overnight on a VPS.
- 100 systems: 20–100 CPU-hours/run. Needs a cluster or a multi-day queue.
- 1000 systems: 200–1000 CPU-hours/run. BOINC territory.

Parallelism:
- Across systems: embarrassingly parallel. Each system's inference is fully independent.
- Within a system: dynesty multiprocessing (a pool of N workers for likelihood calls). Good scaling up to ~32 cores; diminishing returns beyond that due to sampler coordination overhead.
- BOINC strategy: each work unit = one complete dynesty run for one system from one starting configuration. Multiple work units per system for reliability and coverage. Results are combined by merging posterior sample sets.

Checkpointing: dynesty supports periodic checkpointing of sampler state (in recent versions via the checkpoint_file argument to run_nested, with NestedSampler.restore to resume). BOINC work units should checkpoint every ~5 minutes to enable resume after a volunteer machine restart.

Output:
- Posterior samples: {M₂, P₂, e₂, ω₂} with full covariance structure
- Model evidence: log Z (for Bayesian model comparison)
- Best-fit parameters + credible intervals
- Corner plot (generated at result aggregation time)

Where it runs:
- Deployment stage 1: direct Python script on a local machine or VPS, overnight batch
- Deployment stage 2: Celery jobs with dynesty multiprocessing, 8–16 cores per job
- Deployment stage 3: BOINC volunteer network for large-scale campaigns

Dependencies: Stage 7b (TTV residuals required)


Data Flow and I/O Budget

Understanding what data moves where is essential for cost estimation and system design.

Per Observation (Single Telescope, Single Exposure)

Item                              Size     Direction                 Cost (B2 pricing)
--------------------------------  -------  ------------------------  -----------------
Raw FITS (4k×4k, 16-bit)          ~32 MB   Upload to S3              $0.006/upload
FITS compressed (RICE)            ~16 MB   Upload                    $0.003
Photometry JSON                   ~1 KB    API POST                  Negligible
Plate-solve output (WCS header)   ~1 KB    Written to FITS header    Negligible
Calibrated FITS (after Stage 2)   ~16 MB   Stored in S3              $0.0001/month
Reprojected FITS (Stage 3)        ~16 MB   Temporary worker file     $0 if deleted

Per Stacking Run (100-image heterogeneous stack)

Item                       Size     Notes
-------------------------  -------  ----------------------------------------------
Input FITS (100 images)    ~1.6 GB  Read from S3
Reprojected intermediates  ~1.6 GB  Temporary on worker disk
Final stacked FITS         ~64 MB   Permanent in archive
Uncertainty map            ~64 MB   Permanent in archive
Peak RAM required          ~6.4 GB  All intermediates in memory during co-addition

Per TTV Inference Run (single system)

Item                             Size     Notes
-------------------------------  -------  ----------------------------------------
Input transit times CSV          ~5 KB    Trivially small
MCMC output (posterior samples)  ~10 MB   Full sampler output; the raw samples alone (10,000 × 5 × float64) are ~0.4 MB
Log file                         ~1 MB    Checkpoint files during run
Corner plot PNG                  ~500 KB  Generated at result time

Dependency Graph (Which Stages Block Which)

Stage 0 (Ingest)
    ↓
Stage 1 (Plate Solve)    ←── blocks Stage 2, 3
    ↓
Stage 2 (Flux Calib)     ←── blocks Stage 4, 5, 6, 7b, 7c
    ↓
Stage 3 (Reproject)      ←── blocks Stage 4, 5
    ↓
Stage 4 (Flux Norm)      ←── blocks Stage 5
    ↓
Stage 5 (Co-addition)    ←── science product (deep image)

Stage 6 (LC Assembly)    ←── pulls from Stage 2 output; blocks 7a, 7b
    ↓
Stage 7a (Lomb-Scargle)  ←── science product (periods)
Stage 7b (TTV Timing)    ←── blocks Stage 8
    ↓
Stage 8 (n-body MCMC)    ←── science product (planet parameters)

Stage 7c (Transient)     ←── runs from Stage 1+2 output independently

The critical path for TTV science is:

Ingest → Plate Solve → Flux Calib → LC Assembly → TTV Timing → n-body MCMC

Stages 3–5 (reprojection, normalization, co-addition) are not on the TTV critical path. They run in parallel and produce deep images as a separate science product.


Module Interface Specification

Each stage is a standalone Python function with a clean signature. This is the rule: stages communicate through data (files, database rows), not through shared state.

# Stage 1
def plate_solve(fits_path: str) -> dict:
    # Returns {"success": bool, "wcs_header": dict, "solved_path": str}

# Stage 2
def calibrate_photometry(fits_path: str, site_id: str) -> dict:
    # Returns {"zero_point": float, "color_term": float, "scatter_mag": float,
    #          "catalog_matches": int, "calibrated_sources": list}

# Stage 3
def reproject_image(fits_path: str, reference_wcs: dict, output_shape: tuple) -> str:
    # Returns path to reprojected FITS file

# Stage 5
def co_add(reprojected_paths: list, weight_paths: list) -> dict:
    # Returns {"stacked_path": str, "uncertainty_path": str, "coverage_path": str}

# Stage 7a
def lomb_scargle(times: np.ndarray, fluxes: np.ndarray, errors: np.ndarray) -> dict:
    # Returns {"best_period": float, "fap": float, "power_spectrum": np.ndarray}

# Stage 7b
def extract_transit_times(light_curve: pd.DataFrame, transit_params: dict) -> pd.DataFrame:
    # Returns DataFrame with columns [epoch, t_obs, t_pred, O_minus_C, sigma]

# Stage 8
def run_ttv_inference(ttv_data: pd.DataFrame, prior_config: dict) -> dict:
    # Returns {"posterior_samples": np.ndarray, "log_evidence": float,
    #          "best_fit": dict, "credible_intervals": dict}

Worker Architecture (Celery)

Redis Broker
    |
    ├── queue: pipeline_fast        (stages 0–2, short jobs <60s)
    │       workers: 4–8 processes
    │
    ├── queue: pipeline_slow        (stages 3–5, reprojection, co-add, minutes)
    │       workers: 2–4 processes
    │
    ├── queue: science_inference    (stages 7a, 7b, Lomb-Scargle, TTV timing)
    │       workers: 2–4 processes
    │
    └── queue: mcmc                 (stage 8, TTV n-body MCMC, hours)
            workers: 1–2 processes (each uses dynesty multiprocessing internally)

Rationale for separate queues: MCMC jobs run for hours. Without a separate queue, one MCMC job would block all pipeline jobs on a single worker. With separate queues and worker pools, MCMC runs in the background while the fast pipeline keeps moving.
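The queue layout maps onto standard Celery task routing; a sketch with illustrative task and module names (the broker URL is a placeholder):

# Celery routing sketch matching the queue diagram above.
from celery import Celery

app = Celery("openastro", broker="redis://localhost:6379/0")

app.conf.task_routes = {
    "pipeline.ingest":       {"queue": "pipeline_fast"},
    "pipeline.plate_solve":  {"queue": "pipeline_fast"},
    "pipeline.calibrate":    {"queue": "pipeline_fast"},
    "pipeline.reproject":    {"queue": "pipeline_slow"},
    "pipeline.co_add":       {"queue": "pipeline_slow"},
    "science.lomb_scargle":  {"queue": "science_inference"},
    "science.transit_times": {"queue": "science_inference"},
    "science.ttv_inference": {"queue": "mcmc"},
}

# Each pool is started with its own queue and concurrency, e.g.:
#   celery -A openastro worker -Q pipeline_fast -c 8
#   celery -A openastro worker -Q mcmc -c 1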


Compute Requirements by Deployment Stage (Operational)

Deployment Stage 1 (Archival Only)

Resource      Requirement
------------  ----------------------------------------------
CPU           4 cores
RAM           16 GB
Disk (SSD)    100 GB (index files + processing temp)
Network       50 Mbps (archival download + catalog queries)
GPU           Not required
Monthly cost  ~€8 (Hetzner CX31)

Deployment Stage 2 (10–50 Volunteer Telescopes)

Resource         Requirement
---------------  ------------------------------------------------
CPU              16–32 cores
RAM              64–128 GB
Disk (NVMe SSD)  500 GB + object storage
Network          200 Mbps
GPU              Optional (RTX 3060 for ML inference)
Monthly cost     ~€50–100 (Hetzner AX41 bare metal or equivalent)

Deployment Stage 3 (50+ Telescopes + BOINC)

Resource      Requirement
------------  -------------------------------------------
CPU           32–64 cores (owned or cloud burst)
RAM           256 GB
Disk          2 TB NVMe + multi-TB object storage
Network       1 Gbps
GPU           RTX 4090 or A100 for ML + GPU reprojection
Monthly cost  ~€200–400 (owned hardware, co-located)

Open Engineering Questions

These are not resolved and need decisions before implementation:

  1. Reference frame selection algorithm: What criterion selects the reference frame for a stacking run? Candidates: smallest FWHM (best seeing), smallest pixel scale (highest resolution), highest SNR, or some combination. Decision affects output image quality significantly.

  2. Chunked stacking strategy: For N > 200 images where peak RAM exceeds available memory, implement tile-based or chunk-based co-addition. How large should chunks be, and how to handle edge effects cleanly?

  3. Dynesty vs. emcee for TTV inference: dynesty handles multimodal posteriors better (period aliasing in TTV is a real problem) but is slower for the same posterior volume explored. emcee is faster if the posterior is unimodal and well-conditioned. Strategy: start with emcee for cheap systems (well-characterized, simple TTVs) and escalate to dynesty for complex cases.

  4. BOINC work unit budget: How many MCMC steps per work unit? Target 30–45 minutes of wall time per work unit on an average volunteer machine (3GHz dual-core). This determines work unit granularity and number of work units per system.

  5. Transient reference image management: The transient classifier needs a reference image per target field. When does a reference image become "stale"? Policy needed.

  6. TTV inference update frequency: How often to re-run inference as new transit times come in? On every new transit (expensive), nightly batch, or weekly? The posterior should be stored as checkpoint samples so a new run can initialize from the previous posterior (warm-start MCMC).