# Distributed Computing Strategy

This document thinks through the full distributed compute problem for OpenAstro: what needs to run, where it should run, how much it costs, and when each model becomes necessary. The point is not to over-engineer, but to understand the compute landscape well enough to make good architectural decisions at each stage.

## The Core Tension
OpenAstro has two distinct compute problems that feel similar but are structurally different:
Data pipeline compute: calibration, plate solving, WCS reprojection, stacking, photometry extraction. This is high-volume, well-parallelizable, and can tolerate latency (hours, not seconds). The bottleneck is throughput.

Scientific inference compute: TTV n-body MCMC, Lomb-Scargle over large datasets, Bayesian model comparison. This is computationally deep per job, requires specific software environments, and for TTV is the dominant compute expense in the entire system. The bottleneck is raw CPU-hours.
These want different compute architectures. Conflating them is a design error.
## 1. What Is Actually Computationally Expensive?
Ordered by compute intensity (rough estimates for a 50-telescope network producing data nightly):
### 1.1 TTV n-body MCMC: The Dominant Cost

Why it's extreme: Each MCMC likelihood evaluation requires a full n-body integration of the planetary system over the observation baseline (years). TTVFast is fast (symplectic integrator, ~1ms per call in C) [Source: Deck et al. (2014), ApJ 787, 132 – TTVFast symplectic integrator], but MCMC requires ~millions of likelihood evaluations to adequately sample a 5–8-dimensional parameter space for a 2-planet system. [Source: Foreman-Mackey et al. (2013), PASP 125, 306 – emcee ensemble sampler design and scaling behaviour]

Rough numbers:
- Single MCMC run, n=2 system (4 perturber parameters + nuisance): ~5M likelihood calls
- TTVFast call time: ~0.5–2ms per call
- Wall time on a single CPU core: ~1–3 hours per system
- With the emcee ensemble sampler (100 walkers): each "step" is 100 parallel likelihood calls, so the bottleneck is total steps × time per call
- With nested sampling (dynesty): typically 10× more evaluations but better posterior exploration
Scaling to 10 monitored systems means 10–30 CPU-hours of inference per inference run. This is fine for a single machine at Stage 1 (run overnight). It becomes a problem at Stage 2+ when you want live inference, continuously updated as new transit observations arrive.

Parallelism: MCMC chains are only weakly parallelizable within a single run: walkers share information. [Source: Foreman-Mackey et al. (2013), PASP 125, 306 – emcee ensemble sampler coupling] However, multiple independent systems are perfectly parallelizable (each system's inference is independent). Nested sampling is even better: dynesty has native multiprocessing support for likelihood evaluations. [Source: Speagle (2020), MNRAS 493, 3132 – dynesty nested sampling with multiprocessing]
### 1.2 WCS Reprojection for Heterogeneous Stacking

Why it's expensive: Reprojecting a full-frame FITS image onto a new WCS grid requires per-pixel interpolation (bilinear or Lanczos). A 4096×4096 frame has 16M pixels. Reprojecting 100 images from different instruments = 1.6B interpolation operations. The reproject_interp call in Astropy is not trivially fast. [Source: Robitaille et al. (2020), reproject: Astropy-affiliated image reprojection package, ascl:2011.023]

Rough numbers:
- One 4k×4k FITS reproject: ~10–60 seconds on a single CPU core depending on interpolation method
- 50 images per target per night: ~10–50 minutes total
- With multiple targets: this scales to hours of CPU time per night at Stage 2

Parallelism: Embarrassingly parallel. Each image's reprojection onto the reference WCS is completely independent of all others. This is the ideal BOINC/worker job: input is one FITS + reference WCS header, output is one reprojected FITS. Data per job: ~50–150MB in, ~50MB out. Compute time: ~30 seconds–2 minutes. Self-contained.
### 1.3 Plate Solving

Why it matters: Every FITS that doesn't have a WCS header needs astrometry.net or a local solver (astrometry.net's solve-field, or twirl, or astap). Plate solving is CPU-bound index lookup + pattern matching. [Source: Lang et al. (2010), AJ 139, 1782 – astrometry.net blind plate-solving algorithm]
Rough numbers:
- astrometry.net remote API: 30–120 seconds per image including network
- Local solve-field: 5–30 seconds per image
- At 50 telescopes × 50 images/night: 2,500 solves/night = 3.5–21 CPU-hours at local speed
Parallelism: Perfectly embarrassingly parallel. Each image is solved independently. This is a good candidate for a pool of worker processes on the central server. Does not need BOINC β just concurrent workers.
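The worker-pool pattern is simple enough to sketch with the standard library alone. Here `solve_one` is a stub: a real worker would shell out to `solve-field` via subprocess (the heavy lifting then happens outside the interpreter, so threads suffice):

```python
from concurrent.futures import ThreadPoolExecutor

def solve_one(fits_path):
    """Stub for a single plate solve. In production this would invoke the
    local solver as an external process, e.g.
    subprocess.run(["solve-field", fits_path]); threads are appropriate
    because the work happens in that external process."""
    # Hypothetical result shape: path plus a WCS-solution placeholder.
    return fits_path, {"solved": True, "wcs": None}

def solve_batch(fits_paths, max_workers=8):
    """Run plate solves concurrently with a fixed-size worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(solve_one, fits_paths))
```

The same shape drops straight into a Celery task later: `solve_one` becomes the task body, and the pool becomes the worker fleet.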
### 1.4 Photometric Calibration (Zero-Point Fitting)
Why it matters: Matching detected sources against Gaia/APASS to determine per-image zero-points. This involves source extraction (sep or photutils) + catalog cross-match + sigma-clipping fit.
Rough numbers:
- Source extraction on a 4k frame: 5–30 seconds
- Catalog query (Gaia via astroquery): 2–10 seconds (network-bound)
- Zero-point fit: milliseconds
Parallelism: Embarrassingly parallel per image. The catalog query can be cached per sky region. This is fast enough to run in-process on the central server for Stage 1–2.
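The zero-point fit itself is small enough to show in full. A minimal numpy sketch, assuming the instrumental and catalog magnitudes are already cross-matched one-to-one; the function name and clipping defaults are illustrative:

```python
import numpy as np

def fit_zero_point(instr_mag, catalog_mag, n_sigma=3.0, max_iter=5):
    """Sigma-clipped zero-point fit: zp = median(catalog - instrumental).
    Iteratively rejects outliers (bad cross-matches, variables, cosmic-ray
    hits) before settling on the final zero-point."""
    diff = np.asarray(catalog_mag, dtype=float) - np.asarray(instr_mag, dtype=float)
    mask = np.ones(diff.size, dtype=bool)
    for _ in range(max_iter):
        zp = np.median(diff[mask])
        scatter = np.std(diff[mask])
        new_mask = np.abs(diff - zp) < n_sigma * scatter
        if new_mask.sum() == mask.sum():   # converged: no change in membership
            break
        mask = new_mask
    return float(zp), float(scatter), int(mask.sum())
```

The millisecond cost quoted above is exactly this: a median, a standard deviation, and a comparison, repeated a handful of times.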
### 1.5 Lomb-Scargle Period Analysis
Why it matters: Period search across thousands of frequency bins for variable stars, long-period variables, or building up a clean TTV ephemeris.
Rough numbers:
- astropy.timeseries.LombScargle on 1000 data points × 10,000 frequency bins: <1 second
- scipy.signal.lombscargle is even faster
- Only becomes expensive at large N (millions of points) or when doing uncertainty estimation via bootstrap resampling
Parallelism: Trivially parallel across targets. For a single target, the frequency grid can be split across workers if needed. Not a meaningful compute bottleneck at current scale.
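For reference, the classic Lomb-Scargle arithmetic is compact in plain numpy. `astropy.timeseries.LombScargle` is the production choice; this sketch just shows what a per-frequency-chunk worker would compute if the grid were split:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic (unnormalized) Lomb-Scargle periodogram for unevenly sampled
    data. Each frequency is independent, so the freqs array can be split
    across workers without coordination."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset tau that makes the cosine/sine terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power
```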
### 1.6 Transient Classification (ML Inference)
Why it matters: Running a CNN to classify difference-image cutouts as real transients vs. artifacts. This is fast on GPU, feasible on CPU for small models.
Rough numbers:
- Inference on a 64×64 cutout with a ResNet-18-class model: ~1ms on GPU, ~30ms on CPU
- 1,000 candidates per night: 30 seconds on CPU, trivial on GPU

Parallelism: Batched GPU inference is ideal. On CPU, embarrassingly parallel across candidates. Not a bottleneck at Stage 1–2 scale.
### 1.7 Final Co-addition (Weighted Stacking)

Why it matters: SNR-weighted sigma-clipped mean across N aligned, flux-normalized arrays. This is memory-bound more than compute-bound: all images must be in memory simultaneously.

Rough numbers:
- 100 images × 4k×4k × float32 = 100 × 64MB = 6.4GB RAM
- The co-addition itself is a single numpy operation: fast
- The bottleneck is loading and aligning all images (reprojection handles alignment)
Parallelism: The stacking step itself is not easily parallelizable, but can be done in chunks (stack 10 images, then combine intermediates). The memory bottleneck is more important than CPU time here.
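The co-addition kernel is a few lines of numpy. A sketch, assuming the frames are already reprojected and flux-normalized; the clipping scheme (per-pixel median/std) is one reasonable choice, not the only one:

```python
import numpy as np

def coadd(images, weights, n_sigma=3.0):
    """SNR-weighted, sigma-clipped mean across aligned frames.
    images: (N, H, W) stack on a common grid; weights: (N,) per-image
    SNR-derived weights."""
    cube = np.asarray(images, dtype=np.float32)
    w = np.asarray(weights, dtype=np.float32)[:, None, None]
    med = np.median(cube, axis=0)
    dev = np.std(cube, axis=0)
    # Reject per-pixel outliers (cosmic rays, satellite trails) before averaging
    mask = np.abs(cube - med) <= n_sigma * dev + 1e-12
    wsum = np.sum(w * mask, axis=0)
    return np.sum(cube * w * mask, axis=0) / wsum
```

For N=1000 the same function can run over chunks of ~100 frames, with the intermediate stacks and their summed weights combined in a second pass, which keeps peak RAM bounded.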
## 2. Parallelism Classification
| Task | Parallelism Type | Notes |
|---|---|---|
| TTV n-body MCMC | Independent systems: embarrassingly parallel. Within a system: weakly parallel (dynesty multiprocessing) | Primary compute bottleneck |
| WCS Reprojection | Embarrassingly parallel per image | Ideal BOINC job |
| Plate Solving | Embarrassingly parallel per image | CPU worker pool |
| Photometric calibration | Embarrassingly parallel per image | Fast enough for in-process |
| Lomb-Scargle period search | Embarrassingly parallel across targets | Not a bottleneck |
| Transient classification | Batched GPU inference | Fast on any modern hardware |
| Final co-addition | Not parallelizable; memory-bound | Keep in RAM, don't split |
## 3. Distributed Compute Models and When Each Applies

### 3.1 Task Queue (Celery + Redis/RabbitMQ)

What it is: A central broker holds a queue of jobs. Worker processes (anywhere: same machine, different servers, or VMs) pull jobs off the queue, execute them, push results back. [Source: Celery project: https://docs.celeryq.dev – distributed task queue for Python; Redis: Carlson (2013), "Redis in Action" – in-memory data structure store]
When to use: Any task that is async, independently executable, and has well-defined input/output. This is the workhorse model for the data pipeline.
OpenAstro fit:
- Plate solving: worker pulls FITS path, returns WCS-embedded FITS
- Photometric calibration: worker pulls FITS + site metadata, returns zero-point + catalog
- Reprojection: worker pulls (FITS path, reference WCS), returns reprojected FITS
- Lomb-Scargle: worker pulls (light curve CSV, frequency range), returns period + power spectrum

Advantages:
- Simple to operate: Redis is trivially cheap to host
- Workers scale horizontally: add a VPS, start a worker, it picks up jobs immediately
- Retry logic, dead-letter queues, priority lanes built in
- Workers can run anywhere with network access to the broker
- No data movement before the job needs it: pull on demand

Disadvantages:
- Workers must have access to the data (either centrally stored S3/NFS, or the job includes the data)
- Doesn't handle terabyte-scale data movement well
- Not designed for the kind of checkpoint/resume that long MCMC runs need
Stage applicability: Stage 1 and onward. Celery is the right tool for the pipeline even at small scale. The overhead of not having it is writing ad-hoc async code instead.
Minimal setup: Redis on the same Hetzner VPS + Celery workers as separate processes. Zero additional infrastructure cost.
### 3.2 Message Queue (Redis pub/sub, AWS SQS, or NATS)
What it is: Decoupled message passing between services. Producers push events, consumers react.
When to use: Decoupling services that should not be tightly coupled. Specifically: the telescope client submitting observations should not block waiting for the calibration pipeline to run β it fires a message and moves on.
OpenAstro fit:
- Client submits observation → fires "new_observation" message → calibration worker picks it up asynchronously
- Plate solve completes → fires "wcs_ready" message → triggers calibration stage
- Calibration complete → fires "calibrated" message → triggers stacking or period analysis
- Transient alert from ZTF → fires "external_alert" message → triggers target creation + scheduler re-prioritization
This is an event-driven pipeline architecture. Each stage is a consumer of one message type and a producer of the next.
Advantages:
- Resilience: if calibration workers are down, observations queue up and get processed when workers come back
- Decoupling: the telescope client doesn't know or care what happens after it submits
- Natural retry and backpressure
Stage applicability: Stage 1 can start with just function calls. At Stage 2, with 50 telescopes submitting concurrently, you want the async decoupling. Redis pub/sub is free if you're already using Redis for Celery.
### 3.3 Cloud Burst (AWS Lambda / Google Cloud Run / Fly.io)
What it is: Serverless functions or containerized jobs that spin up on demand, run a single task, and shut down. You pay only for execution time.
When to use: Spiky, irregular compute demand. The scheduler runs once per minute β that's Lambda territory. A big stacking job runs once per campaign β that's Cloud Run territory.
OpenAstro fit:
- Good fit: Scheduler computation (visibility calculations across all targets for all sites: runs every 60 seconds, cheap to compute, no persistent state)
- Good fit: Alert ingestion and target creation from ZTF/GCN (fires on each incoming alert, otherwise idle)
- Poor fit: MCMC inference (long-running, stateful, GPU-advantaged, terrible against Lambda's 15-minute limit)
- Poor fit: Plate solving (needs on-disk index files totalling 40GB+: you can't cold-start a Lambda with that)
- Acceptable fit: Reprojection of individual images (can be containerized, runs in <5 minutes, but data transfer costs are significant)

Cost for OpenAstro:
- Lambda @ 128MB, 100ms per invocation, 1,000 calls/day: ~$0/month (within free tier)
- Cloud Run for stacking: $0.000024/vCPU-second × 60 CPU-seconds per stack job × 100 jobs/month = $0.14/month
- Data egress is the real cost: S3 to Lambda to S3 at $0.09/GB can add up if you're moving FITS files around
Stage applicability: Useful for event-driven lightweight tasks (scheduler, alert ingestion) from Stage 2 onward. FITS processing should stay on workers with local disk access.
### 3.4 BOINC Volunteer Compute
What it is: Berkeley Open Infrastructure for Network Computing. Volunteers install the BOINC client, donate idle CPU/GPU cycles. You run a BOINC server that distributes work units, validates results, and handles the volunteer relationship.
When to use: Compute demand that exceeds what you can afford to pay for, but where the jobs are:
1. Well-defined input/output (checkpointable is a bonus)
2. Reproducible (multiple volunteers can compute the same job for validation)
3. Able to tolerate high latency (volunteer machines go offline, come back later)
4. Not dependent on a central database or live network during computation
OpenAstro fit:
TTV n-body MCMC: excellent fit, with caveats. A BOINC work unit would be: "here is a light curve of N transit times, run 500,000 MCMC steps from this starting position, return the chain samples." Multiple work units together constitute one full MCMC run (parallel independent chains). Validation by comparing chain statistics across replicated work units.

WCS Reprojection: good fit. Tiny data package (one FITS + reference header), ~50MB, runs for 1–2 minutes, result is a reprojected FITS. Input/output clearly bounded. Validation by checksumming output.

Plate solving: poor fit. Requires ~40GB of astrometry.net index files on the volunteer machine. The installation barrier is too high. Run this centrally.

ML inference: poor fit. Model weights need to be distributed, versions need to be managed, and the jobs are too fast (milliseconds) for BOINC overhead to make sense.

BOINC operational overhead: Running a BOINC server is non-trivial. You need:
- A dedicated server running the BOINC project software (not trivial to configure)
- Signed application binaries for Windows/Linux/macOS
- Work unit packaging code
- Result validation logic
- A community of credit-motivated volunteers
The volunteer community for a new astronomy project is not zero, but it's not guaranteed. SETI@home, Einstein@home, Rosetta@home all built community over years.
Stage applicability: Stage 2, after you have proven science output and can recruit volunteers. Not Stage 1: the compute need doesn't justify the operational overhead yet, and a single server can handle Stage 1 inference.

When BOINC becomes necessary: When the number of systems you want to run TTV inference on exceeds what you can afford on cloud compute. If each system costs ~2 CPU-hours of MCMC time and cloud spot instances cost ~$0.02/CPU-hour, then 100 systems per month = ~$4. BOINC is not saving you money at that scale: it saves you at 10,000+ system-runs per month, which is a Stage 3+ problem.
### 3.5 Owned Hardware
What it is: Dedicated physical servers you own and operate.
When to use: Sustained, predictable compute load that exceeds the cost of renting cloud equivalent. Cloud is expensive at sustained 100% utilization. Break-even on owned hardware is roughly 12–18 months of equivalent cloud rental.
OpenAstro fit:
- A used workstation (AMD Threadripper or EPYC) with 64 cores and 256GB RAM, ~$2,000–4,000 used, can run 50+ MCMC jobs in parallel and process a night's worth of pipeline work in hours
- A single NVIDIA RTX 4090 GPU (~$1,500) handles all ML inference workloads for Stage 2–3 and accelerates reprojection with a cupy/reproject GPU backend
- Co-located at a datacenter: ~$100–200/month for power + rack space + bandwidth

Cost comparison for TTV inference:
- 100 systems/month × 2 CPU-hours = 200 CPU-hours
- AWS spot EC2 (c6i.large): ~$0.03/CPU-hour → $6/month
- Owned Threadripper (64 cores): ~$0.004/CPU-hour amortized over 3 years → $0.80/month equivalent
- BOINC: ~$0/month direct cost, but significant ops time

Stage applicability: Stage 3, once you have enough sites and compute demand to justify capital expenditure. Running a single beefy VPS (Hetzner's AX101 bare metal, 24 cores, 128GB RAM, ~€80/month) is probably the right answer for Stage 2.
## 4. TTV n-Body Inference: The Compute Graph
This is the most important section because TTV inference is the unique scientific contribution and the place where compute architecture matters most.
### The Problem Statement

You have N observed transit times {t₁, t₂, ..., t_N} with uncertainties {σ₁, ..., σ_N} for a known transiting planet. You want to infer the parameters of a hypothetical gravitational perturber: {M₂, P₂, e₂, ω₂, t_conj,2}: its mass, orbital period, eccentricity, argument of periapsis, and initial conjunction time.

The posterior is:

P(params | data) ∝ L(data | params) × Prior(params)
where the likelihood L requires calling TTVFast (a forward n-body integrator) with the proposed parameters to generate predicted transit times, then comparing to observed times under a Gaussian noise model.
### The Compute Graph

```
[Observed transit times + uncertainties]
        |
        v
[Sampler (emcee / dynesty)]
        |
        | proposes parameter vector θ
        v
[TTVFast forward model]
        |
        | returns predicted transit times {t̂₁, ..., t̂_N}
        v
[Log-likelihood calculation]
   LL = -0.5 Σ [(tᵢ - t̂ᵢ)² / σᵢ²]
        |
        v
[Accept/reject step (MCMC) or weight (nested sampling)]
        |
        v
[Chain sample stored]
```

Repeat ~5,000,000 times.
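The whole graph fits in a short sketch. This is a toy Metropolis-Hastings loop, not emcee or dynesty, and `forward_model` is a stub standing in for TTVFast (a linear ephemeris plus a sinusoidal TTV term):

```python
import numpy as np

def forward_model(theta, epochs):
    """Stand-in for TTVFast: linear ephemeris plus a sinusoidal TTV signal.
    theta = (t0, period, ttv_amplitude). A real run would call the n-body
    integrator here; the stub keeps the sketch self-contained."""
    t0, period, amp = theta
    return t0 + period * epochs + amp * np.sin(2 * np.pi * epochs / 20.0)

def log_likelihood(theta, epochs, t_obs, sigma):
    """Gaussian log-likelihood from the graph: -0.5 * sum((t - t_hat)^2 / sigma^2)."""
    t_pred = forward_model(theta, epochs)
    return -0.5 * np.sum(((t_obs - t_pred) / sigma) ** 2)

def run_mcmc(theta0, epochs, t_obs, sigma, n_steps=5000, step_scale=1e-3, seed=42):
    """Minimal Metropolis-Hastings loop: propose, run forward model, score,
    accept/reject, store. emcee/dynesty replace this loop in practice."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll = log_likelihood(theta, epochs, t_obs, sigma)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step_scale * rng.standard_normal(theta.size)
        ll_new = log_likelihood(proposal, epochs, t_obs, sigma)
        if np.log(rng.random()) < ll_new - ll:   # accept/reject step
            theta, ll = proposal, ll_new
        chain[i] = theta
    return chain
```

The real system swaps `forward_model` for a TTVFast call and this loop for an ensemble or nested sampler; the structure (propose → integrate → score → accept/reject → store) is unchanged.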
### Parallelization Options

Option 1: Independent chains (embarrassingly parallel)

Run M independent MCMC chains simultaneously, each starting from a different point in parameter space. Results are combined afterward. This is how emcee's ensemble sampler works: 50–200 walkers, each a separate chain, but sharing information to propose better moves. The walkers are only "coordinated" at each ensemble step, not per likelihood call.
Distribution: Each chain/walker can run on a separate CPU core. No inter-process communication needed within a step, but the ensemble update requires collecting all walkers' results before proposing the next step.
For distributed compute: The ensemble step coordination is the problem. You can't farm out individual likelihood calls to BOINC workers because the latency of returning a result (minutes to hours for a volunteer machine) breaks the synchronous ensemble step.
Solution: Distribute at the chain-group level, not the likelihood-call level. Run M independent sub-chains (each a complete MCMC run, not just one step) on separate workers. After all sub-chains finish, combine the posterior samples. This is embarrassingly parallel: each BOINC work unit is "run 100,000 steps of MCMC starting from this point."
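Pooling sub-chains needs a convergence sanity check before the samples are merged. A sketch of the standard Gelman-Rubin R-hat diagnostic for one parameter across M sub-chains (values near 1.0 mean the sub-chains agree and can be safely pooled):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for one parameter across M independent chains.
    chains: array of shape (M, n_steps)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    between = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()      # within-chain variance W
    var_est = (n - 1) / n * within + between / n    # pooled variance estimate
    return float(np.sqrt(var_est / within))

def pool_chains(chains):
    """Merge converged sub-chains into a single posterior sample set."""
    return np.concatenate([np.ravel(c) for c in chains])
```

A work unit that returns a divergent sub-chain (R-hat well above ~1.1) gets discarded or re-issued rather than pooled, which doubles as part of the BOINC validation story.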
Option 2: Population-based parallel tempering
Run chains at different "temperatures" (flattened likelihoods) in parallel and swap configurations between temperatures. Explores multimodal posteriors better but requires periodic synchronization between temperature levels. Not suitable for BOINC; fine for Celery workers on shared memory.
Option 3: Nested sampling with multiprocessing (dynesty)
dynesty supports pool=multiprocessing.Pool(n_workers): each likelihood call is dispatched to a worker process. This gives near-linear speedup on a single machine with N cores. It is not distributable across machines without custom plumbing (the sampler state lives in the main process's memory).
For OpenAstro's actual needs:
At Stage 1 (a handful of systems), a single 16-core server running dynesty with 16 workers solves each system in ~10–30 minutes. Total overnight batch: trivially manageable.
At Stage 2 (dozens of systems, continuous updates), Celery workers pull "run inference on system X" jobs. Each job runs a complete dynesty inference on a single system. Workers are stateless and can run on any machine with TTVFast and dynesty installed. This is the right model.
At Stage 3+ (hundreds of systems, or public competition to find perturbers), BOINC becomes viable: package the work unit as a self-contained binary (TTVFast compiled in, fixed Python environment), distribute starting points and chain budgets as the work unit, collect posterior samples.
### Checkpointing for Long Runs

dynesty supports native checkpointing: saving sampler state to disk periodically. This is essential for BOINC compatibility: if a volunteer machine shuts down mid-run, the job restarts from the last checkpoint rather than from scratch. emcee also supports this via its HDF5 backend.

Work unit design for BOINC should budget 30–60 minutes of wall time maximum per unit, corresponding to ~500,000–1M likelihood calls with TTVFast. Multiple units combine to form a complete inference run.
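For a custom work unit, the checkpoint mechanics can be as simple as an atomic pickle round-trip. This is a hypothetical state layout (dynesty and emcee ship their own HDF5-backed equivalents):

```python
import os
import pickle

def save_checkpoint(path, state):
    """Atomically persist sampler state (step count, chain so far, RNG state)
    so a killed work unit resumes instead of restarting."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)   # atomic rename: never leaves a half-written file

def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

The atomic rename matters on volunteer hardware: a power cut mid-write must not corrupt the only checkpoint.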
## 5. Heterogeneous Stacking: Parallelism and Bottlenecks

### The Pipeline

```
[N FITS images from different instruments, all covering the same sky region]
        |
   for each image:
        v
[Plate solve if no WCS → embed WCS]                (parallelizable)
        |
        v
[Flux calibration: zero-point + color term]        (parallelizable)
        |
        v
[Select reference frame (best seeing/resolution)]  (serial, fast)
        |
   for each non-reference image:
        v
[WCS reprojection onto reference grid]             (parallelizable)
        |
        v
[Flux normalization: match backgrounds]            (parallelizable)
        |
        v
[SNR-weighted co-addition]                         (serial, memory-bound)
```
### Parallelism Analysis

Steps 1–2 (plate solve + calibration): embarrassingly parallel. A job queue is ideal. These can even run on the telescope client before upload (see Section 6).

Step 3 (reference frame selection): not parallelizable, but it's an O(N log N) sort by image quality metric: essentially free.
Step 4 (reprojection): embarrassingly parallel. Each image's reproject_interp call is independent. This is the dominant compute step. With N=100 images, you want N concurrent workers. FITS file must be accessible to the worker, which means either: (a) centralized storage (S3) with worker downloading the file, or (b) BOINC with the FITS packaged into the work unit.
The data-transfer economics matter here. A 50MB FITS file downloaded from S3, reprojected in 30 seconds on a worker, then uploaded as a 50MB result costs ~$0.01 in data transfer (assuming B2 pricing). For 100 images: $1. For 1000 images/month: $10. Manageable. BOINC volunteers provide both CPU and bandwidth: they download the input, process it, and upload the result. At FITS sizes, this is reasonable. At raw image sizes (200MB+), it starts to strain volunteer bandwidth.
Step 5 (flux normalization after reprojection): fast, can run in-process after each reprojection completes.
Step 6 (co-addition): requires all N images to be on the same machine. Memory requirement: N × image_size_bytes. For N=100, 4k×4k float32 images: 6.4GB RAM. This is fine on a modern server. For N=1000: 64GB, which requires careful chunked stacking. The algorithm is a simple numpy weighted mean, sub-second once data is loaded.
The bottleneck is getting all reprojected images onto the stacking machine, not the stacking itself.
Design implication: do reprojection centrally (not on BOINC) for small-to-medium N. BOINC reprojection only makes sense when N is so large that central compute can't keep up, which is a Stage 3 problem.
## 6. Telescope Client Pre-processing: What to Offload to the Edge

Telescope clients (Raspberry Pi, Windows PC) have non-trivial compute. The question is: what should they process locally before uploading?

### What the Client SHOULD do locally
Image quality assessment: A simple star count and FWHM measurement can be done in seconds on a Pi 4. If the frame has no detectable stars, don't upload it. This saves storage and pipeline processing costs. Implementation: use sep (source extractor) for fast source detection. A 2MB FITS frame takes ~1 second.
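A sketch of such a gate, with plain numpy standing in for sep (a naive bright-pixel count rather than real source extraction, and an illustrative keep threshold):

```python
import numpy as np

def frame_quality(data, k_sigma=5.0, min_bright=10):
    """Crude client-side quality gate: estimate the background with the
    median, the noise with a MAD-based sigma, and count pixels well above
    background as a proxy for detected stars. A real client would run
    sep.extract(); this stub needs only numpy and runs fast on a Pi."""
    bkg = float(np.median(data))
    noise = float(1.4826 * np.median(np.abs(data - bkg)))  # robust sigma
    n_bright = int(np.sum(data > bkg + k_sigma * noise))
    return {
        "background": bkg,
        "noise": noise,
        "n_bright_pixels": n_bright,
        "keep": n_bright > min_bright,   # illustrative threshold, not tuned
    }
```

A cloud-covered or lens-capped frame fails the gate and never consumes upload bandwidth or pipeline time.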
Bias/dark/flat calibration: If the observer maintains calibration frames, the client can subtract bias, subtract dark, divide by flat. This is trivially fast (in-place array operations) and reduces systematic noise before upload. The pipeline still re-validates these on ingestion, but the client calibration reduces load on central processing.
Rough photometry (optional): A quick aperture photometry run to extract the target star's magnitude. This light curve data can be sent to the API independently of the full FITS upload. Even if the full FITS upload fails or is delayed, the photometric measurement is preserved. This is valuable for time-sensitive observations (TTV timing, transient follow-up) where the measurement matters more than the raw image.
Compression: FITS images can be compressed with RICE compression (lossless, ~2:1 ratio, supported natively by astropy.io.fits). Apply before upload. This halves bandwidth and storage costs with zero information loss.
### What the Client should NOT do locally
Plate solving: astrometry.net's index files (40GB+) can't live on a Pi. Local solvers like ASTAP are more feasible (smaller index files) but still consume ~2GB of disk and take 30–60 seconds per frame. Better to upload and solve centrally, unless the client has an SSD and a reasonably fast processor.
WCS reprojection: The reference frame changes per campaign and per target field. The client doesn't have enough context to do this correctly.
Zero-point calibration: Requires a Gaia/APASS catalog query (network), per-image source extraction, and cross-matching. Reasonable on a Pi but better done centrally where catalog data can be cached and the full instrument characterization is available.
MCMC inference: Never on the client. This is a central or BOINC job.
### Upload Strategy

Two-tier upload:
1. Immediately: Compact photometric measurement as JSON (magnitude, time, site, comparison stars, quality flags): ~1KB per observation. This populates the light curve database and keeps the science moving even on slow internet.
2. Background: Full FITS file upload when bandwidth allows. Large files, can be queued, compressed, and retried on failure.
For a Raspberry Pi on a home internet connection, a 50MB compressed FITS upload takes ~40 seconds on a 10 Mbps uplink (50MB ≈ 400 megabits). Fine for hourly submissions. Fine for continuous submissions if using background upload with compression.
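What a tier-1 payload might look like; the field names are illustrative, not a fixed schema:

```python
import json
from datetime import datetime, timezone

def measurement_payload(target, mag, mag_err, site_id, comp_stars, flags):
    """Build the compact photometric record uploaded immediately, ahead of
    the full FITS file. Stays well under ~1KB for typical inputs."""
    record = {
        "target": target,
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "magnitude": mag,
        "magnitude_err": mag_err,
        "site": site_id,
        "comparison_stars": comp_stars,
        "quality_flags": flags,
    }
    return json.dumps(record).encode("utf-8")
```

Because the payload is tiny and self-describing, it can be POSTed over any connection the moment the measurement exists; the FITS upload retries in the background.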
## 7. Minimal Setup Per Stage

### Stage 1: Data Pipeline (No Live Network)

What compute is needed: Ingest archival data (AAVSO, ETD, MPC), calibrate, stack, run period analysis and TTV fitting.

Compute requirements:
- Pipeline (ingest → calibrate → stack): modest. A single Hetzner VPS (4 vCPU, 8GB RAM, ~€8/month) handles it.
- TTV MCMC (maybe 5–20 systems): run overnight on the same VPS or a local laptop. Not time-sensitive.
- Lomb-Scargle: trivial, runs in seconds.

Recommended setup:
- 1× Hetzner CX31 (4 vCPU, 8GB, ~€8/month) for API server + pipeline workers
- Celery workers running as local processes on the same box
- Redis for the task queue (runs on the same box, <100MB RAM)
- Backblaze B2 for FITS storage (~$5/month for 1TB)
- No BOINC. No cloud burst. No message queue.
Total monthly cost: ~$15/month.
### Stage 2: Volunteer Telescope Network (10–50 sites)

What compute is needed: Live ingest from 10–50 telescopes, pipeline runs nightly, TTV inference for multiple systems, transient classification.

New compute requirements:
- Pipeline throughput: 50 sites × 50 images/night × (1 plate solve + 1 calibration + 1 reproject) = significant
- Plate solving: 2,500 solves/night = ~10 CPU-hours at 15 seconds each: needs concurrent workers
- Reprojection: heterogeneous stacking for 10 targets with 50 images each = 500 reproject calls/night = ~5 CPU-hours
- TTV MCMC: 10–20 systems updated weekly = ~40 CPU-hours/week = ~6 CPU-hours/day

Recommended setup:
- Upgrade to Hetzner AX41 bare metal (4 cores, 64GB RAM, ~€50/month) or a cloud VM equivalent
- Celery workers: 4–8 worker processes for pipeline tasks
- Separate Celery queue for MCMC jobs (longer-running, don't block the pipeline)
- Redis for both task queue and caching (target lists, catalog data)
- Consider adding a second VPS as a dedicated worker (easy horizontal scaling with Celery)
- Transient classification: small ResNet model runs on CPU, fast enough at this scale

Total monthly cost: ~$60–80/month all-in.
Not yet needed: BOINC, GPU instance, serverless functions.
### Stage 3: Owned Hardware + BOINC (50+ sites)

New compute requirements:
- TTV inference scales to hundreds of systems
- Heterogeneous stacking at scale (500+ images per night possible)
- ML models need periodic retraining
- Real-time transient classification at a throughput that CPU can't handle cheaply

Recommended setup:
- Owned compute node: AMD Threadripper Pro workstation or small server, 24–64 cores, 128–256GB RAM, RTX 4090, ~$4,000–8,000 one-time, co-located
- BOINC server deployed for TTV MCMC volunteer compute
- Cloud GPU spot instances for model retraining (run once per month, ~$10–50/run)
- Cloud Run / Lambda for lightweight event-driven tasks (scheduler, alert ingestion)
- S3 (or Backblaze B2) as the central data lake

Total monthly cost: ~$200–400/month (co-location + bandwidth + cloud burst for peaks)
## 8. Cost Modeling: Cloud vs. BOINC vs. Owned Hardware

### Scenario: Running TTV MCMC on 100 systems per month

Each system: 5M likelihood calls × 1ms per call = ~1.4 CPU-hours (wall time on 1 core). With 32 parallel chains on 32 cores: ~2.6 minutes wall time per system.
| Option | Cost/Month | Notes |
|---|---|---|
| AWS spot (c6i.xlarge, 4 vCPU, $0.05/hr) | ~$3 | 100 jobs × 2 CPU-hr = 200 CPU-hours = 50 instance-hours × $0.05 ≈ $2.50 |
| Hetzner CCX33 (8 vCPU, €21/mo) | €21/month, fully used | Amortizes if you use it for other things too |
| Owned Threadripper (64 cores) | ~$1 amortized | 200 CPU-hours × ~$0.004/CPU-hour ($4,000 over 3 years across 64 cores, partial utilization) |
| BOINC volunteers | ~$0 + ops time | Community takes months to years to grow; the real cost is your developer time |
Conclusion: At Stage 2 scale, cloud or a dedicated VPS beats BOINC on economics. BOINC's value is at very large scale (thousands of system-runs/month) where cloud would cost hundreds per month.
### Scenario: WCS Reprojection at Scale

50 sites, 100 images per site per night, 10 campaign targets = 500 reprojection jobs/night, each taking 45 seconds on 1 CPU core = 375 CPU-core-minutes (~6.3 CPU-hours) per night.
| Option | Cost/Night | Notes |
|---|---|---|
| AWS Lambda | ~$0.36 | 500 × 45s × $0.000016/GB-s × 1GB |
| 8 Celery workers on Hetzner VPS | ~$0.08 (prorated) | ~6.3 CPU-hours / 8 workers ≈ 47 minutes wall time: keeps up comfortably |
| 32 Celery workers on larger box | ~$0.12 (prorated, Hetzner AX61) | ~6.3 / 32 ≈ 12 minutes wall time: plenty of headroom |
| BOINC volunteers | ~$0 | Viable if each job is packaged as a clean work unit |

Conclusion: At Stage 2 scale, a modest central worker pool keeps up comfortably; BOINC reprojection pays off only when nightly image volume grows by an order of magnitude or more (the Stage 3 situation described in Section 5). The data transfer overhead (50MB in, 50MB out per job) is manageable, so the work-unit packaging is worth prototyping before then.
### Scenario: ML Model Training

Training a transient classifier on 6 months of accumulated data: ~10M 64×64 cutouts, a significant training set.
| Option | Cost/Run | Notes |
|---|---|---|
| AWS p3.2xlarge (V100, spot) | ~$20–50 per training run | Depends on model size and epochs |
| Lambda Labs cloud GPU | ~$10–30 per run | Cheaper than AWS |
| Owned RTX 4090 | ~$0.50 amortized | Best long-term if you train monthly |
| BOINC (GPU jobs) | Potentially viable | Einstein@Home uses GPU; needs app packaging for each GPU architecture |
Conclusion: Train infrequently in the cloud until training frequency justifies owned GPU.
## 9. Interaction Between Compute and the Telescope Network
The telescope client is a data producer, not a compute resource (except for edge pre-processing). The central compute infrastructure is the consumer. The key interface questions:
Does compute latency affect science quality? For most science cases: no. Light curves accumulate over days/weeks. Whether a photometric measurement is processed in 5 minutes or 5 hours doesn't affect the TTV measurement. Exception: transient response, where fast classification and alert routing (minutes) enables follow-up while the transient is still bright.
Does compute capacity gate data ingestion? Yes, if the pipeline falls behind, unprocessed observations pile up. Design the queue so that: (a) photometric measurements (light curve data) are extracted and stored first, fast; (b) full image pipeline (plate solve, reproject, stack) runs behind asynchronously. The science continues even if stacking is days behind.
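This two-tier design maps directly onto Celery's task routing. A minimal sketch, assuming hypothetical task and queue names (`pipeline.*`, `photometry`, `images` are illustrative, not a defined OpenAstro API):

```python
from celery import Celery

app = Celery("openastro", broker="redis://localhost:6379/0")

# Tier (a): compact photometric measurements land on their own fast queue.
# Tier (b): heavyweight image work drains behind it, asynchronously.
app.conf.task_routes = {
    "pipeline.extract_photometry": {"queue": "photometry"},
    "pipeline.plate_solve":        {"queue": "images"},
    "pipeline.reproject":          {"queue": "images"},
    "pipeline.stack":              {"queue": "images"},
}
```

A dedicated worker pinned to the fast queue (`celery -A openastro worker -Q photometry`) keeps light-curve extraction flowing even when the `images` queue is days deep.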
Can volunteers contribute compute? Two models:
1. BOINC volunteer compute (classic, described above): volunteers run a client app, donate CPU cycles
2. Distributed FITS processing (described in Computation.md): volunteers download a standardized data package, run a script, upload results. Lower barrier (no BOINC installation), but requires volunteer motivation and is less reliable
The second model is described in the vault as the "Community Compute Model." It's a good halfway point before full BOINC deployment. A simple web page: "Download this data package (200MB), run this Python script, upload the output FITS." A volunteer with a good laptop can process one stacking job per night. With 50 such volunteers, that's 50 stacking jobs per night without any central compute cost.
[NOVEL] The "Community Compute Model", a lightweight volunteer compute scheme where contributors download a data package and run a standardised script, serving as a bridge between centralised processing and full BOINC deployment, is an original OpenAstro design. It lowers the contribution barrier (no BOINC client installation required) while still distributing compute load across the volunteer community.
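The one part of the volunteer flow worth hardening from day one is integrity checking of the downloaded package before any processing runs. A minimal sketch, assuming a hypothetical manifest shape (`{"sha256": ..., "size": ...}` is illustrative, not a defined OpenAstro format):

```python
import hashlib

def verify_package(data: bytes, manifest: dict) -> bool:
    """Check a downloaded data package against its manifest before processing.

    Cheap size check first, then a full SHA-256 digest comparison.
    """
    if len(data) != manifest["size"]:
        return False
    return hashlib.sha256(data).hexdigest() == manifest["sha256"]

payload = b"FITS data package contents"
manifest = {"size": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest()}
print(verify_package(payload, manifest))         # True
print(verify_package(payload + b"x", manifest))  # False
```

The same digest travels back with the result upload, so the server can tie each output to the exact input package it was computed from.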
10. Failure Modes and Resilience¶
Task queue worker dies mid-job: Celery's acks_late option ensures the task is re-queued if the worker dies before acknowledging completion. Use this for all pipeline tasks.
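In current Celery this is a pair of config flags plus a per-task option; a sketch (task name illustrative):

```python
from celery import Celery

app = Celery("openastro", broker="redis://localhost:6379/0")

# Only acknowledge a task after it completes, so a worker that dies
# mid-job leaves the message on the queue for redelivery.
app.conf.task_acks_late = True
# Also re-queue (rather than ack) a task whose worker process is killed.
app.conf.task_reject_on_worker_lost = True

@app.task(bind=True, max_retries=3)
def reproject(self, image_id: str) -> None:
    ...  # pipeline work
```

The corollary: pipeline tasks must be idempotent, since redelivery means a task can run twice.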
MCMC job fails midway: TTVFast can crash on pathological parameter proposals. Always wrap likelihood calls in try/except, return -inf on error. dynesty and emcee both handle this gracefully. For BOINC work units, include a checksum of the TTVFast binary to detect corruption.
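The wrapper is a few lines. This is a sketch with a stand-in `model_fn` in place of the real TTVFast call:

```python
import math

def safe_log_likelihood(theta, model_fn, observed, sigma):
    """Wrap a fragile forward model (e.g. an n-body TTV integrator) so a
    crash on a pathological proposal just zeroes out that sample."""
    try:
        predicted = model_fn(theta)
    except Exception:
        return -math.inf  # emcee/dynesty treat -inf as zero probability
    if any(not math.isfinite(p) for p in predicted):
        return -math.inf  # NaN/inf transit times count as a failure too
    chi2 = sum((o - p) ** 2 / s ** 2
               for o, p, s in zip(observed, predicted, sigma))
    return -0.5 * chi2
```

Returning `-inf` rather than raising keeps the sampler walking; the failed proposal is simply never accepted.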
Central server down while telescopes are observing: Clients should queue observations locally and retry uploads. The light curve data (compact JSON) should be the priority retry. Full FITS uploads can wait longer. This is already implied by the two-tier upload strategy.
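The client-side spool can encode that priority directly. A sketch, assuming two payload kinds (`lightcurve`, `fits`) as named in the two-tier upload strategy; the class and method names are illustrative:

```python
import heapq

class UploadSpool:
    """Local retry queue: compact light-curve JSON outranks bulky FITS."""

    PRIORITY = {"lightcurve": 0, "fits": 1}  # lower number drains first

    def __init__(self):
        self._heap = []   # (priority, sequence, kind, payload)
        self._seq = 0     # tie-breaker preserving FIFO within a priority

    def put(self, kind: str, payload) -> None:
        heapq.heappush(self._heap,
                       (self.PRIORITY[kind], self._seq, kind, payload))
        self._seq += 1

    def drain(self, upload_fn) -> None:
        """Attempt uploads in priority order; re-queue anything that fails."""
        failed = []
        while self._heap:
            item = heapq.heappop(self._heap)
            _, _, kind, payload = item
            if not upload_fn(kind, payload):
                failed.append(item)
        for item in failed:
            heapq.heappush(self._heap, item)
```

During an outage `drain` is a no-op that keeps everything spooled; when the server returns, light curves go first and FITS files follow.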
S3 / object storage outage: Pipeline workers wait and retry. The task queue handles backpressure naturally. Don't make the pipeline synchronously dependent on object storage being available.
BOINC volunteer computation returns wrong results: Always validate by running each work unit on at least 2 independent volunteers and cross-checking outputs. For MCMC, check that chain statistics (mean, variance, autocorrelation) are consistent across replicates. Adversarial volunteers (deliberately wrong results) are rare for astronomy projects but non-zero; the quorum model handles them.
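For MCMC replicates, exact output matching is impossible (chains are stochastic), so the quorum check compares summary statistics within a tolerance. A sketch; the 5% default tolerance is an assumption, not a calibrated value:

```python
import statistics

def chains_agree(chain_a, chain_b, rel_tol=0.05) -> bool:
    """Cross-check one parameter's samples from two independent volunteers:
    means and variances must agree to within rel_tol (fractionally)."""
    ma, mb = statistics.fmean(chain_a), statistics.fmean(chain_b)
    va, vb = statistics.pvariance(chain_a), statistics.pvariance(chain_b)
    mean_scale = max(abs(ma), abs(mb), 1e-12)  # guard near-zero means
    var_scale = max(va, vb, 1e-12)
    return (abs(ma - mb) / mean_scale <= rel_tol
            and abs(va - vb) / var_scale <= rel_tol)
```

Run this per parameter across the quorum; a work unit whose replicates disagree is re-issued to a third volunteer, and the outlier's host accrues an error score.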
Summary: The Right Tool at the Right Stage¶
| Stage | Task | Right Tool |
|---|---|---|
| Stage 1 | Pipeline (calibration, plate solve) | Celery workers on single VPS |
| Stage 1 | TTV MCMC (handful of systems) | Direct Python script, run overnight |
| Stage 2 | Concurrent pipeline | Celery + Redis, horizontal scaling |
| Stage 2 | TTV MCMC (tens of systems) | Celery jobs with dynesty multiprocessing |
| Stage 2 | Transient classification | CPU inference, batched |
| Stage 2 | Alert ingestion | Event-driven Celery or Lambda |
| Stage 3 | TTV MCMC at scale | BOINC volunteer compute |
| Stage 3 | Large-scale reprojection | BOINC or GPU-accelerated workers |
| Stage 3 | ML model training | Cloud GPU spot instances |
| Stage 3 | Sustained pipeline | Owned hardware + co-location |
The principle: don't build for Stage 3 when you're in Stage 1. The architecture scales by adding workers to the queue: the same Celery setup that runs one pipeline worker on a $15/month VPS can run 100 workers across a cluster. Build the queue abstraction now; add workers as needed.