BOINC Integration Plan¶
This note covers BOINC in full technical depth — how it actually works under the hood, what a real server setup requires, how work units would be designed for each OpenAstro compute task, and honest assessment of the practical barriers for a solo developer. The companion note [[Volunteer Compute Options]] covers alternatives.
1. How BOINC Actually Works¶
BOINC (Berkeley Open Infrastructure for Network Computing) is a framework for dispatching arbitrary compute jobs to a fleet of volunteer machines. It was built at UC Berkeley for SETI@home and now underpins dozens of science projects (Rosetta@home, Einstein@home, Asteroids@home, World Community Grid). [Source: Anderson (2004), "BOINC: A System for Public-Resource Computing and Storage", Proceedings of 5th IEEE/ACM International Workshop on Grid Computing — foundational BOINC architecture paper]
The Core Architecture¶
[BOINC Server] ←—— MySQL + Apache + BOINC daemon stack
|
| HTTP/XML (work units out, results back)
|
[BOINC Client on volunteer PC]
|
| Forks
|
[Science Application] — the actual executable that does your computation
Three distinct layers:
The BOINC Server is not just a web server. It is a collection of daemons that run on a Linux machine:
- feeder — reads the database and puts work units into a shared-memory queue
- transitioner — moves work unit states through the lifecycle (UNSENT → IN_PROGRESS → OVER)
- validator — compares results from multiple clients; marks them as canonical or invalid
- assimilator — processes validated results (writes them to your science database, triggers follow-on jobs)
- file_deleter — cleans up input/output files after a result is processed
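These daemons are registered in the project's config.xml. A typical stanza looks like the following (the custom validator/assimilator names are illustrative; they would be built against BOINC's daemon framework):

```xml
<config>
  <daemons>
    <daemon><cmd>feeder -d 3</cmd></daemon>
    <daemon><cmd>transitioner -d 3</cmd></daemon>
    <daemon><cmd>file_deleter -d 3</cmd></daemon>
    <!-- hypothetical custom daemons for the TTV application -->
    <daemon><cmd>ttv_validator -app ttv_mcmc -d 3</cmd></daemon>
    <daemon><cmd>ttv_assimilator -app ttv_mcmc -d 3</cmd></daemon>
  </daemons>
</config>
```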
The standard deployment is a LAMP stack: Linux + Apache + MySQL + PHP. BOINC's PHP-based project web interface runs on top of this. The daemons run as background processes managed by bin/start.
Work units are database records. A work unit (workunit table) contains:
- Input file references (by hash, served over HTTP)
- The application name and version
- Deadline, credit estimate, redundancy requirements
Results (confusingly named — a "result" is an instance of a work unit sent to one client) get created automatically based on min_quorum and target_nresults settings. If target_nresults = 2 and min_quorum = 2, every work unit gets sent to two clients and both results must agree before validation succeeds.
The Lifecycle of a Work Unit¶
1. You create a work unit: boinc/bin/create_work --appname ttv_mcmc \
--wu_name ttv_kepler17_20250301 \
--min_quorum 2 \
--target_nresults 2 \
input_lightcurve_kepler17.json
2. feeder picks it up and puts it in the in-memory work queue
3. Client polls the scheduler (XML over HTTP), gets the work unit:
- Downloads input file from the server's download directory
- Runs your science app with the input file as argument
- App produces an output file
- Client uploads the output file, reports completion
4. Two clients have now returned results. Validator runs:
- Calls your validator function on both result files
- If they agree → one is marked canonical, credit is awarded
- If they disagree → both are flagged invalid and BOINC issues additional results to other clients, up to the work unit's max_total_results
5. Assimilator fires on the canonical result:
- Reads the output file
- Writes extracted science data to your PostgreSQL science DB
- Optionally triggers the next job in a pipeline
6. file_deleter removes input/output files after assimilation
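The quorum decision in steps 4–5 can be illustrated with a toy sketch (Python; BOINC's real transitioner and validator are C++ daemons, and all names here are hypothetical):

```python
# Toy sketch of the quorum logic, not BOINC source code.
from dataclasses import dataclass, field

@dataclass
class WorkUnit:
    min_quorum: int = 2
    target_nresults: int = 2
    results: list = field(default_factory=list)  # returned result payloads

def transition(wu, agree):
    """Decide the work unit's next state once results come back.

    `agree` plays the role of the project's validate_pair-style
    comparison function.
    """
    done = [r for r in wu.results if r is not None]
    if len(done) < wu.min_quorum:
        return "IN_PROGRESS"          # wait for more results
    if agree(*done[: wu.min_quorum]):
        return "OVER"                 # one result becomes canonical
    return "RESEND"                   # quorum disagreed: issue more results

wu = WorkUnit(results=[{"logp": -101.2}, {"logp": -101.3}])
state = transition(wu, lambda a, b: abs(a["logp"] - b["logp"]) < 1.0)
# state == "OVER"
```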
The Science Application¶
This is where most complexity lives. The science app is:
- A standalone executable compiled for each target platform: x86_64-pc-linux-gnu, windows_intelx86, x86_64-apple-darwin
- Or a container via BOINC's vboxwrapper or Docker wrapper (requires VirtualBox or Docker, respectively, on the client)
- Either way, it reads input files from its working directory, writes output files, and exits
You provide the app binary to BOINC, and BOINC distributes it to clients automatically through the app version mechanism. Clients verify the binary using a code-signing key you generate during server setup.
A typical lightweight science app for OpenAstro would be a compiled Python environment (PyInstaller bundle) or a small Go/C binary that calls into a Python library.
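A minimal skeleton of that contract, reading one input file and writing one output file (filenames illustrative; a real app would also call the BOINC API for checkpointing and boinc_resolve_filename):

```python
"""Minimal BOINC-style science app skeleton (illustrative sketch only).

Shows the I/O contract: read inputs from the working directory,
write outputs, return exit status 0 on success.
"""
import json
from pathlib import Path

def run(in_path="in.json", out_path="out.json"):
    params = json.loads(Path(in_path).read_text())
    # ... the heavy computation (e.g. the MCMC run) would happen here ...
    result = {"target": params.get("target"), "status": "ok"}
    Path(out_path).write_text(json.dumps(result))
    return 0  # exit status reported back to the BOINC client
```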
2. Work Unit Design for Each OpenAstro Task¶
2.1 N-Body MCMC for TTV Inference¶
This is the most natural BOINC fit in the entire OpenAstro pipeline — the compute profile is almost ideal.
Why it fits BOINC well:
- Input is tiny: a JSON file with transit timing data, maybe 50–200 data points. Roughly 5–50 KB.
- Computation is enormous: MCMC with TTVFast requires millions of likelihood evaluations, and each likelihood evaluation is an n-body integration (symplectic leapfrog, ~10–1000 timesteps per orbit). For a typical 2-planet fit with emcee at 100 walkers × 50,000 steps, you're at 5 × 10^6 TTVFast calls. On a modern CPU: 3–24 hours per work unit.
- Output is moderate: the MCMC chain (posterior samples). If you keep 10,000 thinned samples in a NumPy .npy file, that's roughly 1–5 MB.
- No GPU required: TTVFast is a pure CPU integrator, so volunteers with any modern CPU can contribute.
Work unit decomposition strategy:
For a single target system (say Kepler-17b), you have the full posterior to explore. Parallelization options:
Option A — Parallel chains: Split 100 emcee walkers into 10 work units of 10 walkers each. Merge chains after. Problem: emcee uses ensemble moves that require all walkers to know each other. This breaks the parallelism.
Option B — Parallel nested sampling runs: dynesty with n_live=500 can be split into multiple independent runs with different random seeds. Results are merged via importance resampling. Each run is ~12 hours on a 4-core CPU. This is the recommended approach.
Option C — Grid + MCMC: Pre-compute a coarse grid of parameter space on the server, identify modes, then dispatch focused MCMC work units centered on each mode. Each work unit is a localized MCMC with a tight prior.
Option D (best for Stage 1): Temperature ladder parallelism. Parallel Tempering MCMC runs N chains at different temperatures. [Source: Earl & Deem (2005), Phys. Chem. Chem. Phys. 7, 3910 — parallel tempering theory] Each temperature chain is an independent work unit. After all complete, implement the PTSampler swap step server-side. This is genuinely embarrassingly parallel — each work unit runs its chain, returns, server swaps, dispatches next iteration.
[NOVEL] Decomposing a parallel-tempering MCMC run such that each temperature chain is a separate BOINC work unit — with the inter-chain swap step executed server-side after all results return — is an original architecture for distributing Bayesian astronomical inference over volunteer compute. This makes MCMC chains independently dispatch-able despite the synchronisation requirement of the swap step.
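The server-side swap step can be sketched as the standard replica-exchange Metropolis acceptance between adjacent temperature chains (a sketch of the textbook rule under likelihood tempering, not OpenAstro code):

```python
import numpy as np

def pt_swap(states, logls, betas, rng):
    """One server-side replica-exchange sweep.

    states[i], logls[i]: last sample and its log-likelihood from the
    work unit run at inverse temperature betas[i] (beta = 1/T).
    Adjacent chains swap with probability
    min(1, exp((beta_i - beta_j) * (logl_j - logl_i))).
    """
    for i in range(len(betas) - 1):
        j = i + 1
        log_alpha = (betas[i] - betas[j]) * (logls[j] - logls[i])
        if np.log(rng.uniform()) < log_alpha:
            states[i], states[j] = states[j], states[i]
            logls[i], logls[j] = logls[j], logls[i]
    return states, logls
```

Each accepted swap exchanges the chains' current states; the swapped states are then written out as init_state for the next iteration's work units.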
Concrete work unit structure:
{
"work_unit_id": "ttv_kepler17_pt_T2.5_iter003",
"science_type": "ttv_mcmc_parallel_tempering",
"target": {
"name": "Kepler-17",
"transit_times": [2454830.123, 2454845.891, ...],
"transit_errors": [0.0012, 0.0009, ...],
"period_days": 1.4857108,
"transit_epoch": 2454833.5
},
"mcmc_config": {
"temperature": 2.5,
"n_walkers": 32,
"n_steps": 5000,
"n_burn": 1000,
"parameters": ["mass_ratio", "period_perturber", "ecc_perturber", "omega_perturber", "phase"]
},
"prior_bounds": {
"mass_ratio": [1e-6, 1e-2],
"period_perturber": [0.5, 50.0],
"ecc_perturber": [0.0, 0.8]
},
"init_state": "path/to/previous_iteration_state.npy",
"ttvfast_config": {
"dt": 0.05,
"total_time": 1500,
"t0": 2454833.5
}
}
Validation approach: Two clients receive the same work unit with different random seeds. Validation is done by comparing the log-posterior distribution of the two chains using a 2-sample KS test on the thinned samples. [Source: Kolmogorov (1933) / Smirnov (1948) — KS test; scipy.stats.ks_2samp implementation] If KS p-value > 0.05, chains are statistically consistent — canonical result accepted, they are merged. If not, both get resent.
This requires a custom validator (a Python or C++ function you link into BOINC's validator daemon).
[NOVEL] Using a 2-sample KS test on thinned MCMC posterior samples as the BOINC validation criterion — comparing marginal distributions rather than deterministic outputs — is an original solution to the fundamental challenge of validating stochastic BOINC results. No existing BOINC project uses this statistical validation approach for MCMC chains.
Credit per work unit: BOINC's credit unit (the Cobblestone) is defined as 200 credits per day of a 1 GFLOPS machine. A 12-hour work unit on a 4-core modern CPU (~50 GFLOPS effective) is roughly 2.2 × 10^15 FLOPs, which works out to 50 GFLOPS × 0.5 days × 200 ≈ 5,000 credits. Generous and motivating for volunteers.
2.2 WCS Reprojection for Heterogeneous Stacking¶
This fits BOINC with one important caveat: the FITS images are large. A 16-bit 2048×2048 FITS is ~8 MB; a 4096×4096 is ~32 MB. This is manageable (BOINC can handle hundreds of MB of input), but it's not "tiny" like the TTV inputs.
Work unit decomposition:
Each work unit is: one secondary image + the master WCS + a reference to the master frame's pixel grid shape. The client reprojects the secondary image and returns the reprojected array.
The server pre-stages master WCS + plate-solved secondaries in the BOINC download directory. Files are served by Apache.
Input: secondary_20250315_telescope_B_NGC1499.fits (8–32 MB)
master_wcs_header.json (2 KB)
reprojection_config.json (1 KB)
Output: reprojected_secondary_20250315_telescope_B.fits (same size as input)
quality_metrics.json (footprint fraction, RMS alignment)
Client runtime: For a 2048×2048 image, reproject_interp (bilinear) takes ~30–90 seconds on a modern CPU. reproject_exact (flux-conserving) takes 5–15 minutes. For work units this short, BOINC overhead is significant — BOINC's designed for multi-hour jobs. This suggests batching: one work unit = 10–20 secondary images for the same field. Runtime then becomes 10–20 minutes.
Validation: Reprojection is deterministic for a fixed binary and platform, but bitwise-identical output across different OSes, compilers, and math libraries is not guaranteed. So the validator should compare the two output arrays within a floating-point tolerance rather than byte-for-byte. Still straightforward to implement.
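A tolerance-based comparison for the reprojection validator might look like this (a sketch; tolerance and overlap thresholds are illustrative):

```python
import numpy as np

def arrays_agree(a, b, rtol=1e-6, atol=1e-8, min_overlap=0.99):
    """Compare two reprojected arrays, ignoring NaN edge pixels.

    Assumes both clients blank pixels outside the input footprint
    with NaN: require the footprints to overlap and the finite
    pixels to match within floating-point tolerance.
    """
    finite = np.isfinite(a) & np.isfinite(b)
    if finite.mean() < min_overlap * max(np.isfinite(a).mean(),
                                         np.isfinite(b).mean()):
        return False  # footprints disagree
    return np.allclose(a[finite], b[finite], rtol=rtol, atol=atol)
```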
Alternative for Stage 1: With only a few dozen images, run reprojection on the server VPS or your laptop. BOINC for reprojection only becomes worth the overhead at 500+ images per stack job.
2.3 ML Inference for Transient Classification¶
This maps cleanly to BOINC if you treat the model as part of the application binary.
What classification looks like: For a new transient candidate, you have:
- A small science image cutout (typically 64×64 or 128×128 pixels) extracted around the candidate
- A reference image cutout from the same field
- The difference image (science − reference, after PSF matching)
- Some metadata: source coordinates, peak pixel value, nearby source catalog info
A CNN (like the ANTARES or RAPID classifier) takes this as input and outputs a class probability vector: [real_transient, artifact, variable_star, asteroid]. Total input is 3 × 128 × 128 × float32 = ~200 KB.
Work unit design:
Input: batch of 100 transient candidates (cutout triplets + metadata) ~20 MB
model weights file (included in app binary, not per-WU)
Output: 100-row CSV with class probabilities ~10 KB
Batching is critical for GPU efficiency. Individual classifications are trivial on modern hardware (< 1 ms per candidate). Batches of 100–1000 make GPU work units worthwhile.
App design note: This requires a GPU-capable science application. BOINC supports GPU tasks via CUDA/OpenCL. The app declares its GPU requirements and BOINC's scheduler only sends it to clients that report a compatible GPU. A large fraction of volunteer machines have GPUs (gaming PCs).
Validation: For probabilistic outputs, validation compares the top-1 class label from both clients. Agreement on class = valid. For high-stakes candidates (probability > 0.7 on any single class), send to 3 clients and take majority.
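The majority-vote rule can be sketched as follows (a sketch; the function and field names are hypothetical):

```python
from collections import Counter

def validate_classification(results):
    """Majority vote over top-1 class labels from replicated clients.

    `results` is a list of per-client probability dicts, e.g.
    [{"real_transient": 0.81, "artifact": 0.12, ...}, ...].
    Returns (is_valid, canonical_label).
    """
    labels = [max(r, key=r.get) for r in results]       # top-1 class per client
    label, count = Counter(labels).most_common(1)[0]
    needed = 2 if len(results) >= 3 else len(results)   # majority of 3, or all of 2
    return count >= needed, label
```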
Alternative: For Stage 1 with a small transient candidate list (< 1000 per day), just run inference on the server. A ResNet-18 does 100 classifications/second on CPU. BOINC overhead is not justified until you have thousands of candidates per night across many fields.
2.4 Lomb-Scargle Period Searches¶
For period searching across thousands of targets, the embarrassing parallelism is at the per-target level. Each target's period search is completely independent.
Work unit design:
Input: light_curve_<target_id>.csv ~50–500 KB
ls_config.json: {
"frequency_min": 0.01, # cycles per day
"frequency_max": 100.0,
"n_freqs": 500000,
"method": "fast", # vs "slow" (exact)
"oversample_factor": 10,
"fap_threshold": 0.001
}
Output: period_result.json: {
"best_period_days": 2.34561,
"fap": 0.000012,
"ls_power": 0.841,
"harmonics": [...],
"periodogram_compressed.npz": "..." # only if FAP < threshold
} ~1 KB (peak only)
Batching strategy: Rather than sending one work unit per target, pack 100 targets into a single work-unit ZIP to amortize file-transfer overhead. Runtime at 500,000 frequencies per target × 100 targets with astropy's fast Lomb-Scargle: roughly 2–5 minutes total. Perfect BOINC work unit size.
Validation: For numerical outputs, both results must agree within 1 ppm on the best period. Lomb-Scargle is deterministic for a fixed binary, so disagreement beyond that tolerance indicates a faulty or misbehaving host. One canonical result accepted immediately.
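The per-target search can be sketched with SciPy's lombscargle on synthetic data (grid parameters illustrative; the production pipeline would use astropy's fast implementation as noted above):

```python
import numpy as np
from scipy.signal import lombscargle

def best_period(t, y, f_min=0.05, f_max=1.0, n_freqs=2000):
    """Return the best-fit period (days) from a Lomb-Scargle scan.

    Sketch only: the real work unit would also compute a
    false-alarm probability and harmonics.
    """
    freqs = np.linspace(f_min, f_max, n_freqs)          # cycles/day
    power = lombscargle(t, y - y.mean(), 2 * np.pi * freqs)  # wants angular freqs
    return 1.0 / freqs[np.argmax(power)]

# Synthetic check: a 2.5-day sinusoid sampled irregularly over 100 days
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 300))
y = np.sin(2 * np.pi * t / 2.5)
```

On this synthetic input, best_period(t, y) recovers a period of about 2.5 days.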
3. BOINC Result Validation in Detail¶
BOINC's validation framework is more nuanced than "two results must match." Here's the actual mechanism:
Three Levels of Validation¶
Level 1: Format validation — The BOINC daemon checks that the result file was received (not empty, not a timeout). This happens automatically.
Level 2: Canonical comparison — Your custom validate_pair() function compares two result files and returns a score. For floating-point science results, you define "agree" as: difference < ε for all metrics. The validator marks one result as canonical and the other as redundant.
Level 3: Assimilation — Your assimilate_handler() is called on the canonical result. It reads the output, writes science data to your database, and potentially enqueues downstream jobs.
Credit Without Double-Compute¶
You can avoid sending every work unit to 2+ clients by using adaptive replication: if a host has a high historical reliability score (>99% valid results), send its work units with single replication; only replicate to two clients when the host is new or has an unverified record. BOINC tracks per-host reliability automatically. (This is distinct from homogeneous redundancy, which restricts a work unit's results to numerically equivalent platforms so outputs can be compared bitwise.)
This is how mature projects like Einstein@home operate — most WUs are single-client once the community is established.
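The logic of such a reliability-gated policy amounts to the following (a sketch with illustrative thresholds; BOINC implements this internally):

```python
def choose_nresults(host_error_rate, consecutive_valid,
                    trusted_error_rate=0.01, trusted_streak=10):
    """Adaptive-replication policy sketch (thresholds illustrative).

    Trusted hosts get single-replication work units; new or
    unreliable hosts are double-checked against a second client.
    """
    trusted = (host_error_rate < trusted_error_rate
               and consecutive_valid >= trusted_streak)
    return 1 if trusted else 2
```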
Handling Floating-Point Non-Determinism¶
MCMC results will not be bitwise-identical between two clients. Your validator must compare the statistical properties of the chains. The recommended approach:
import numpy as np
import scipy.stats

def validate_mcmc_results(result_a_path, result_b_path):
    """
    Compare two MCMC chains for statistical consistency.
    Returns (True, "valid") or (False, "reason for rejection").
    """
    chain_a = np.load(result_a_path)  # shape: (n_samples, n_params)
    chain_b = np.load(result_b_path)
    # Compare marginal distributions for each parameter
    for i in range(chain_a.shape[1]):
        statistic, p_value = scipy.stats.ks_2samp(chain_a[:, i], chain_b[:, i])
        if p_value < 0.05:
            return False, f"KS test failed on parameter {i}: p={p_value:.4f}"
    # Compare log-evidence estimates (if using nested sampling)
    log_z_a = np.load(result_a_path.replace('.npy', '_logz.npy'))
    log_z_b = np.load(result_b_path.replace('.npy', '_logz.npy'))
    if abs(log_z_a - log_z_b) > 1.0:  # > 1 nat difference is suspicious
        return False, f"log(Z) mismatch: {log_z_a:.2f} vs {log_z_b:.2f}"
    return True, "chains statistically consistent"
4. Practical Barriers to Running a BOINC Server¶
This section is direct about what a solo developer actually faces. It's not impossible, but it's not lightweight either.
The Real Infrastructure Requirement¶
BOINC requires a dedicated Linux server that you control. Not a shared hosting account. Not a simple VPS with one-click deploys. You need root access, a static IP (or DNS), and the ability to run long-lived daemons and compile C++ code on the server.
Minimum requirements:
- 1 vCPU, 1 GB RAM (can start on Hetzner CX11 at ~€3.79/month)
- 100 GB disk for FITS input files and result files
- MySQL 5.7+ or MariaDB
- Apache 2.4
- PHP 7.4+ (for the web interface)
- GCC for compiling BOINC server daemons
- Separate subdomain for BOINC (e.g., boinc.openastro.net) — SSL cert needed
Installation Complexity¶
The BOINC server is not a Docker pull. The canonical installation procedure from boinc.berkeley.edu/trac/wiki/ServerIntro involves:
# Install dependencies (Ubuntu 22.04)
apt-get install apache2 php mysql-server libmysqlclient-dev \
php-mysql curl m4 libtool automake autoconf g++ \
libssl-dev libcurl4-openssl-dev pkg-config python3-pip
# Build BOINC server from source (takes 20–40 minutes)
git clone https://github.com/BOINC/boinc.git
cd boinc
./_autosetup
./configure --disable-client --disable-manager
make
# Create the project
cd tools
./make_project --url_base https://boinc.openastro.net \
--db_name openastro \
--db_host localhost \
openastro_project
# Initialize MySQL database (BOINC schema is ~30 tables)
mysql < openastro_project/db/schema.sql
# Configure Apache virtual host pointing to project's html/ directory
# Start the daemons
cd openastro_project
bin/start
This takes an experienced Linux sysadmin a few hours, and a solo developer new to the BOINC stack a full day or weekend to get working. It's not terrible, but it is non-trivial.
Science Application Cross-Compilation¶
Your science application must be compiled for every client platform:
- x86_64-pc-linux-gnu
- windows_x86_64 (requires MinGW cross-compiler or a Windows build machine)
- x86_64-apple-darwin (in practice requires a Mac or a macOS CI runner, since macOS binaries generally need to be built and signed with Apple tooling)
If your science app is Python, you package it with PyInstaller for each platform. This creates self-contained executables of 50–150 MB each. You need to either own one machine of each OS type, or use CI/CD (GitHub Actions has free runners for all three platforms).
This is a real barrier. If your science app is pure Python using TTVFast + emcee + numpy, packaging with PyInstaller is doable but requires testing on each platform. Plan for 2–4 days of work to get the first cross-platform app working.
Ongoing Operational Load¶
Once running, BOINC requires ongoing attention:
- Work unit creation: You must write the scripts that generate work units from your data and call create_work. This is code you own.
- Monitoring: BOINC has a basic web dashboard, but it's dated. You'll want to build simple monitoring around the MySQL tables.
- Volunteer support: Your project forums (phpBB, bundled with BOINC) become a support channel. Expect "my client crashed" issues.
- App updates: Every time you update your science algorithm, you must build and deploy new app versions for all platforms.
Time estimate: 20–40 hours to get to first working work unit. 2–4 hours/week ongoing for a project with 100–500 active volunteers.
5. Server Setup: Step by Step for OpenAstro¶
Infrastructure Decision¶
Recommended setup given the $15/month budget:
| Component | Service | Cost |
|---|---|---|
| BOINC server | Hetzner CX21 (2 vCPU, 4 GB RAM) | €5.77/month |
| FITS file storage | Backblaze B2 (10 GB free, $0.006/GB after) | ~$1–3/month |
| Result file staging | Same B2 bucket | included |
| DNS | Cloudflare (free tier) | $0 |
| SSL | Let's Encrypt via Certbot | $0 |
| Total | | ~$7–9/month |
This keeps BOINC within budget even alongside the existing OpenAstro server (Hetzner CX11 at ~€3.49/month for the main API).
Phased Setup¶
Day 1–2: Base server
# Hetzner CX21, Ubuntu 22.04
# Standard hardening: fail2ban, ufw, non-root user
apt-get update && apt-get upgrade -y
apt-get install apache2 php mysql-server libmysqlclient-dev \
php-mysql curl m4 libtool automake autoconf g++ \
libssl-dev libcurl4-openssl-dev pkg-config python3 python3-pip
certbot --apache -d boinc.openastro.net
Day 3: Build BOINC server from source
git clone --depth=1 https://github.com/BOINC/boinc.git /opt/boinc_src
cd /opt/boinc_src
./_autosetup
./configure --disable-client --disable-manager --prefix=/opt/boinc
make -j2 # CX21 has 2 cores
make install
Day 4: Create project
cd /opt/boinc/tools
./make_project \
--url_base https://boinc.openastro.net \
--db_name openastro_boinc \
--db_passwd "strong_password_here" \
--project_root /opt/openastro_boinc \
--master_url https://boinc.openastro.net/openastro_boinc/ \
openastro
# Initialize DB
mysql -u root < /opt/openastro_boinc/db/schema.sql
mysql -u root -e "GRANT ALL ON openastro_boinc.* TO 'boincadm'@'localhost' IDENTIFIED BY 'password';"
Day 5: Configure Apache virtual host and test
# /etc/apache2/sites-available/boinc.openastro.net.conf
<VirtualHost *:443>
ServerName boinc.openastro.net
DocumentRoot /opt/openastro_boinc/html/user
Alias /openastro_boinc/ /opt/openastro_boinc/html/
<Directory /opt/openastro_boinc/html/>
Options Indexes FollowSymLinks
AllowOverride All
Require all granted
</Directory>
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/boinc.openastro.net/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/boinc.openastro.net/privkey.pem
</VirtualHost>
Day 6: Science application setup
- Build the science app on each platform (start with Linux only for testing)
- Generate code signing keys: openssl genrsa -out code_sign_private.pem 2048
- Add app version to database, sign the binary, deploy to apps/ directory in project
- Test with a local BOINC client against your server
Day 7: Create first work unit and validate end-to-end
# Stage the input file
cp test_lightcurve_kepler17.json /opt/openastro_boinc/download/
# Create work unit
/opt/boinc/bin/create_work \
--appname ttv_mcmc \
--wu_name test_wu_001 \
--wu_template /opt/openastro_boinc/templates/ttv_mcmc_wu.xml \
--result_template /opt/openastro_boinc/templates/ttv_mcmc_result.xml \
--min_quorum 1 \
--target_nresults 1 \
test_lightcurve_kepler17.json
# Start daemons
cd /opt/openastro_boinc && bin/start
# Watch the daemon logs
tail -f log_openastro/feeder.log
tail -f log_openastro/transitioner.log
6. The Handoff: OpenAstro Main Server → BOINC Layer¶
What the Main Server Does¶
The OpenAstro main server (FastAPI on Hetzner CX11) handles:
- Telescope data ingestion (FITS uploads, photometry POSTs)
- Scheduling live observations
- Science target management
- User-facing API
It does NOT do heavy compute. When compute is needed, it enqueues jobs.
The Handoff Mechanism¶
The simplest integration is a job queue and a compute coordinator process:
Main Server (CX11, FastAPI)
→ writes to: jobs table in shared Postgres (or Backblaze queue)
Compute Coordinator (CX21, Python cron/daemon)
→ polls jobs table every 60s
→ for each pending job:
- stages input files to BOINC download dir
- calls create_work (BOINC binary or BOINC MySQL direct)
- marks job as DISPATCHED
→ polls for completed results via BOINC MySQL
→ for each completed result:
- reads result file from BOINC upload dir
- pushes science data back to main DB
- marks job as COMPLETE
The Compute Coordinator is a ~300-line Python script. It bridges the BOINC world (file-based, daemon-managed) with the OpenAstro world (API-based, REST).
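A minimal version of the coordinator's dispatch pass might look like this (a sketch with sqlite3 standing in for the shared Postgres; table and job names are hypothetical):

```python
import sqlite3

def dispatch_pending(db):
    """One polling pass: hand pending jobs to BOINC.

    Sketch only. The real coordinator would also copy input files
    into the BOINC download directory and invoke
    /opt/boinc/bin/create_work via subprocess before marking the job.
    """
    rows = db.execute("SELECT id, wu_name, input_file FROM jobs "
                      "WHERE status = 'PENDING'").fetchall()
    for job_id, wu_name, input_file in rows:
        # subprocess.run(["create_work", "--appname", "ttv_mcmc", ...]) here
        db.execute("UPDATE jobs SET status = 'DISPATCHED' WHERE id = ?",
                   (job_id,))
    db.commit()

# Demo with an in-memory jobs table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, wu_name TEXT, "
           "input_file TEXT, status TEXT)")
db.execute("INSERT INTO jobs (wu_name, input_file, status) "
           "VALUES ('ttv_kepler17_001', 'lc_kepler17.json', 'PENDING')")
dispatch_pending(db)
```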
Alternatively: Run both the main API and BOINC on the same CX21 server. They share the same MySQL instance (different databases). The main FastAPI app directly queries the BOINC MySQL tables to check job status. This avoids the coordinator entirely at the cost of tighter coupling.
Data Flow for a TTV Campaign¶
| Stage | Data | Location | Handler |
|---|---|---|---|
| 1. Input | Transit times CSV | Main DB | FastAPI ingest endpoint |
| 2. Prep | Normalized light curve | Backblaze B2 | Pre-processor script (run on server) |
| 3. Enqueue | Work unit JSON | BOINC download dir | Compute Coordinator |
| 4. Dispatch | WU assigned to client | BOINC MySQL | feeder daemon |
| 5. Run | MCMC chains | Volunteer PC | Science app (PyInstaller bundle) |
| 6. Return | Result .npy files | BOINC upload dir | client → Apache PUT |
| 7. Validate | Chain consistency check | BOINC MySQL | validator daemon + custom Python |
| 8. Extract | Posterior samples | Backblaze B2 | assimilator handler |
| 9. Analyze | Best-fit parameters | Main DB | Science analysis scripts |
| 10. Output | Published posteriors | OpenAstro API | FastAPI GET /targets/{id}/ttv |
7. Is BOINC Overkill for Stage 1?¶
Short answer: Yes, for Stage 1. No, from Stage 2 onward.
Stage 1: Existing Data Pipeline (no live network)¶
At Stage 1, you are processing archival data: a few dozen to a few hundred transit light curves from ETD/AAVSO. The computation involves:
- TTV MCMC on 10–20 systems: 10–20 × 12 hours = 120–240 CPU-hours. On your laptop (4–8 cores): 15–30 hours wall clock. Not cheap, but absolutely doable.
- Lomb-Scargle on 1000 AAVSO variable star targets: ~30 minutes on a modern CPU. Trivially local.
- Reprojection for a stacking demo: 10–50 images. Under 30 minutes. Trivially local.
- ML classification on a few hundred transient candidates: minutes on CPU.
Conclusion for Stage 1: Run everything locally or on a single VPS. BOINC setup cost (1–2 weeks of your time) vastly exceeds the compute cost at this scale. The right tool is a background Celery worker or even just a Python script running overnight.
Stage 2: 10–50 Active Volunteer Telescopes¶
Now you're producing real data nightly. Suppose 20 telescopes, 4 hours of observing each:
- 20 × 4 = 80 FITS files per night (one per hour per telescope)
- Ongoing TTV monitoring for 5 systems: 5 × 12 h MCMC per week = 60 CPU-hours/week
- Lomb-Scargle sweeps across 5,000 targets: ~4 CPU-hours/run
- Nightly transient classification: ~500 candidates
At this scale, you're spending real money on cloud compute ($50–200/month) if you centralize. Or you bootstrap BOINC with your telescope community as the first volunteer compute donors. The same people donating telescope time will likely donate CPU time if asked. This is the natural transition point.
Stage 3: 50+ Sites, Serious Campaigns¶
BOINC is fully justified. 50+ sites producing data nightly, running TTV campaigns on dozens of systems simultaneously. You now have:
- A natural volunteer community who understands the project
- A proven pipeline the compute tasks can consume
- Enough jobs to keep 100+ volunteer CPUs busy
Decision Framework¶
| Scale | Action | Compute Cost |
|---|---|---|
| Stage 1 | Run locally / single Hetzner VPS; SQLite → PostgreSQL; cron jobs or Celery workers | $0–15/month |
| Transition point | First 5–10 telescope sites active, OR 10+ MCMC jobs per week, OR running out of VPS CPU budget → set up BOINC | $30–80/month cloud |
| Stage 2 | BOINC + volunteer community | $0–20/month (storage only) |
| Stage 3 | BOINC mature; consider cloud burst for latency-sensitive jobs (transients) while BOINC handles batch | $0–50/month |
8. Honest Assessment: What Would Actually Happen¶
If you set up BOINC at Stage 2 with 20 telescope volunteers:
- Realistically, 30–50% of telescope volunteers will also attach the BOINC client. You get 10–25 BOINC computers.
- Average volunteer machine is a 4–8 core desktop or laptop: 50–100 GFLOPS per machine.
- 20 machines × 75 GFLOPS × 50% idle time utilization = 750 GFLOPS continuous.
- Compare to: a Hetzner CX41 (4 vCPU) costs €15.90/month and delivers ~40 GFLOPS.
- You would need roughly 19 CX41 instances (~€300/month) to match 20 volunteer desktops.
Even with only 10 active BOINC volunteers, the compute donation is worth hundreds of dollars per month. The economics work massively in your favor at scale — the question is just whether the setup cost is justified early.
The honest recommendation: do not set up BOINC for Stage 1. Do set it up at the Stage 1→2 transition, framing it as "if you join the telescope network, you can also donate CPU time for the science analysis." This ties volunteer identity to both contributions.
[NOVEL] The volunteer engagement strategy of dual-contribution identity — recruiting telescope network members as BOINC compute donors, explicitly linking telescope-time donation with CPU-time donation as two sides of the same scientific contribution — is an original community design principle for OpenAstro. Existing BOINC projects recruit compute volunteers independently of any observing programme.
See also: [[Volunteer Compute Options]] for alternatives to self-hosted BOINC. See also: [[TTV Reverse N-Body Inference]] for the science context.