First Proof-of-Concept Science Target¶
This document evaluates the candidate first targets for OpenAstro's Stage 1 archival pipeline, recommends one, and lays out a project plan for executing it. The goal of Stage 1 is not to do the most ambitious science — it is to prove the pipeline works end-to-end and produce a result worth writing up. Everything else follows from that.
Written: March 2026
What Stage 1 Needs From a First Target¶
Before evaluating options, the decision criteria need to be explicit. This is not a general science prioritisation exercise. The first target must satisfy a specific set of constraints:
- Data is available right now, without scraping, without registration, without waiting. The pipeline build should not be blocked by data access.
- Success is unambiguous. There must be a pre-existing published value to check the pipeline result against. If the pipeline is broken, the discrepancy is immediately visible.
- The pipeline it exercises maps onto what Stage 2 will need. Running the pipeline on a toy problem that doesn't resemble the actual science case is not useful.
- Time to result is short. The purpose is to demonstrate the pipeline works and produce something submittable. Months, not years.
- Publication is realistic. The result must be submittable to a real venue with a plausible acceptance path.
Candidate Evaluation¶
Option A: Mira Variable Period Refinement (AAVSO)¶
What it is: Pull AAVSO archival light curves for a Mira-type long-period variable (e.g., Z UMa, T Cep, or R Leo). Run Lomb-Scargle period search across the full multi-decade dataset. Compare the derived period against published GCVS values and published period-change analyses. The science is the period evolution over time — Mira periods drift due to stellar evolution on the AGB, and a systematic analysis of the full AAVSO archive can detect and quantify this drift.
Data availability: Excellent. astroquery.aavso is a mature interface with decades of calibrated photometry. Z UMa alone has tens of thousands of observations going back to the 1880s. No API key needed for read access. Data is downloadable in minutes.
Pipeline complexity exercised:
- AAVSO adapter (Section 2.1 of Stage 1 Pipeline Design)
- Normalisation: filter standardisation, BJD_TDB conversion
- Instrument registry: characterising the multi-decade AAVSO observer pool
- Lomb-Scargle period search (Section 9.4)
- Light curve assembly
Does NOT exercise the TTV pipeline, transit model fitting, or ETD adapter. These are important for Stage 2's primary science case. However, the AAVSO adapter and the period search are the most foundational components — validating them is a real contribution.
Validation check: Compare derived period against GCVS published value and recent AAVSO analyses. A period refinement that differs from published values at the 0.1% level is a meaningful update.
Publication venue: Journal of the AAVSO (JAAVSO) is the natural first venue — peer-reviewed, indexed, specifically for this kind of analysis. PASP is a plausible second venue for a more rigorous treatment with multiple targets and a statistical analysis of period evolution.
Time to result: 2–4 weeks from a working AAVSO adapter to a draft figure. The analysis is well-understood and fast.
Scientific novelty: Low-to-medium. Period refinement of individual Mira variables is done routinely. The novel angle is the systematic pipeline-driven analysis of a sample of Miras using ensemble photometry — treating heterogeneous AAVSO observations as a calibrated dataset rather than just taking the AAVSO average. That framing makes it a methods paper as much as a science paper, which is exactly what Stage 1 should be.
Risks: Low. The main risk is that the pipeline produces a period consistent with existing literature and the paper is "we reproduced the known answer with a new pipeline." That is a valid proof-of-concept paper — the pipeline is the contribution.
Option B: Known Exoplanet TTV from ETD/ExoClock¶
What it is: Select a hot Jupiter with a known TTV signal (tidal orbital decay or gravitational perturbation). Pull all available transit mid-times from ETD and/or ExoClock/ExoFOP. Apply BJD_TDB conversion consistently. Fit a linear ephemeris and/or decay model. Compare with published values.
The Ideal TTV Target Systems.md file identifies specific targets: WASP-12 b (confirmed tidal decay, ~30 ms/yr), WASP-4 b (decay plus possible perturber), WASP-18 b (decay confirmed), WASP-47 b (confirmed multi-planet TTVs). These are well-studied, have rich timing datasets, and the signals are unambiguous.
Data availability: Good but more involved than AAVSO. ETD has no formal API — bulk download requires using http://var2.astro.cz/tresca/transit-list.php?target={planet_name}&format=csv or ExoFOP's astroquery.exofop interface. ExoClock provides a well-structured API and has been the most actively maintained source as of 2025. Data is available without registration. However, the transit mid-times from different observers use different time standards (some JD_UTC, some BJD_TDB, some HJD) — normalising these is the hard part, and it is easy to get wrong silently.
Pipeline complexity exercised:
- ETD adapter (Section 2.2)
- BJD_TDB normalisation under real conditions (heterogeneous time standards, mixed observer reporting)
- TTV extraction (Section 9.2)
- N-body inference if taken all the way (Section 9.3), though this is not required for a first paper
For WASP-12 b specifically: fit the orbital decay model (quadratic ephemeris) and recover dP/dt. This exercises the core TTV pipeline logic. The n-body inference step is not needed because WASP-12 b's TTV is not caused by a perturber — it is tidal decay, which is modelled as a smooth quadratic term in the O-C diagram. This is actually an advantage: the science is clear, the model is simple, and the result is directly comparable to Maciejewski et al. 2016 and subsequent papers.
Validation check: The published dP/dt for WASP-12 b is well-constrained. The pipeline's output should recover this rate within published uncertainties when using the same timing dataset. If it does not, the normalisation or model is wrong — immediately diagnosable.
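As a sketch of what this check could look like, the snippet below fits the quadratic ephemeris t(E) = t0 + P·E + ½·(dP/dE)·E² to synthetic mid-times and converts the result to dP/dt in ms/yr. Everything here is an assumption for illustration: the data are simulated, the period and decay rate are round numbers of the order quoted above, and `fit_decay` is a hypothetical helper rather than part of the pipeline.

```python
import numpy as np

SEC_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.25


def fit_decay(epochs, mid_times):
    """Least-squares quadratic ephemeris; returns (t0 [d], P [d], dP/dt [ms/yr])."""
    c2, c1, c0 = np.polyfit(epochs, mid_times, 2)
    period = c1                      # period at epoch 0, in days
    dP_dE = 2.0 * c2                 # period change per orbit, in days
    dP_dt = dP_dE / period           # dimensionless (days per day)
    return c0, period, dP_dt * SEC_PER_DAY * DAYS_PER_YEAR * 1e3


# Synthetic timing data (illustrative values only).
rng = np.random.default_rng(0)
P_true = 1.0914                      # days, approximate WASP-12 b period
rate_true = -29.0                    # ms/yr, illustrative decay rate
dP_dE_true = rate_true * 1e-3 / (SEC_PER_DAY * DAYS_PER_YEAR) * P_true

epochs = np.sort(rng.choice(4000, size=200, replace=False)).astype(float)
times = P_true * epochs + 0.5 * dP_dE_true * epochs**2   # days since reference transit
times += rng.normal(0.0, 20.0 / SEC_PER_DAY, epochs.size)  # 20 s timing scatter

t0, period, rate = fit_decay(epochs, times)
print(f"P = {period:.7f} d, dP/dt = {rate:.1f} ms/yr")
```

Times are expressed relative to a reference transit rather than as full BJD values, which keeps the polynomial fit well conditioned. Against real data, the recovered rate would be compared with the published value and its uncertainty rather than with an injected one.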
Publication venue: Acta Astronomica or MNRAS Letters for a pure timing update. AJ for a more systematic multi-target analysis. The ExoClock collaboration actively welcomes re-analyses of their dataset with improved methods.
Time to result: 4–8 weeks. The bottleneck is data normalisation and validating the BJD_TDB conversion across heterogeneous ETD submissions, not the analysis itself. This is harder than AAVSO but solvable.
Scientific novelty: Medium-to-high. The tidal decay rate of WASP-12 b is known, but refining it with a longer baseline (or with a consistently-normalised pipeline) has direct value. More importantly, demonstrating a pipeline that correctly handles heterogeneous ETD data is itself a contribution — no existing open-source pipeline does this end-to-end in the way OpenAstro's is designed to.
Risks: Medium. The silent failure mode is getting BJD_TDB conversion wrong for some subset of ETD submissions. This is hard to detect without careful validation against a known-good source. The risk is manageable by cross-checking a subset of timing points against the Eastman et al. 2010 BJD calculator.
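A rough error budget shows why this failure mode matters. The sketch below compares the cumulative O-C shift from orbital decay, ½·(dP/dt)·T²/P, with the size of common time-standard mistakes. The decay rate and period are the approximate WASP-12 b values used in this document; the 10-year baseline and the function name are assumptions chosen for illustration.

```python
SEC_PER_DAY = 86400.0
SEC_PER_YEAR = 365.25 * SEC_PER_DAY


def decay_shift_seconds(rate_ms_per_yr, period_days, baseline_years):
    """Cumulative O-C shift 0.5 * (dP/dt) * T**2 / P after a baseline T, in seconds."""
    dP_dt = rate_ms_per_yr * 1e-3 / SEC_PER_YEAR   # dimensionless
    baseline_s = baseline_years * SEC_PER_YEAR
    period_s = period_days * SEC_PER_DAY
    return 0.5 * dP_dt * baseline_s**2 / period_s


# Sizes of common time-standard errors, in seconds.
UTC_TO_TDB_S = 32.184 + 37.0   # TT - TAI plus TAI - UTC (37 leap seconds since 2017-01-01)
MAX_BARYCENTRIC_S = 499.0      # light travel time across 1 au (uncorrected JD vs BJD)
MAX_HJD_VS_BJD_S = 4.0         # approximate maximum heliocentric vs barycentric difference

shift = decay_shift_seconds(-29.0, 1.0914, 10.0)
print(f"decay signal after 10 yr: {shift:.0f} s")
print(f"UTC mislabelled as TDB:   {UTC_TO_TDB_S:.1f} s ({UTC_TO_TDB_S / abs(shift):.0%} of signal)")
print(f"missing barycentric term: up to {MAX_BARYCENTRIC_S:.0f} s")
print(f"HJD treated as BJD:       up to {MAX_HJD_VS_BJD_S:.0f} s")
```

Under these assumptions the decay signal after a decade is about eight minutes, so a UTC/TDB mix-up alone is a double-digit percentage of it, and a missing barycentric correction is as large as the signal itself.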
Option C: AGN Variability Characterisation (AAVSO)¶
What it is: Pull AAVSO long-baseline light curves for a bright AGN (3C 273, NGC 4151, Mrk 421). Compute the structure function, power spectral density, or search for quasi-periodic oscillations (QPOs). The variability of these objects is intrinsically interesting — it probes accretion disc physics and jet variability.
Data availability: Good. 3C 273 has AAVSO data going back to 1927. NGC 4151 is similarly well-covered. Same astroquery.aavso interface as Option A.
Pipeline complexity exercised: Same as Option A — AAVSO adapter, normalisation, light curve assembly. The science analysis uses the same Lomb-Scargle and time-series tools. No ETD or TTV components exercised.
Publication venue: MNRAS, ApJ, or Acta Astronomica. AGN variability papers are numerous; novelty threshold is higher than for Mira period refinement.
Time to result: 2–4 weeks for the light curve; longer for a rigorous structure function analysis.
Scientific novelty: Lower than it appears. AGN variability characterisation from AAVSO data has been done many times. The structure function of 3C 273 in particular is very well-studied. Finding a QPO would be novel, but QPOs in AGN are generally marginal detections and hard to publish without extensive follow-up. This option carries more risk of producing a null or unexciting result.
Assessment: This is the weakest option. It exercises the same pipeline components as Option A with lower novelty and a harder publication path. Skip it unless Options A and B are blocked.
Option D: Asteroid Rotation Period from MPC Data¶
What it is: Pull MPC photometric observations for a sample of asteroids with poorly-constrained rotation periods. Fit a period using Lomb-Scargle. Publish the rotation periods and amplitude estimates. Target selection criterion: objects with only 1–2 nights of data in the literature, where the published period has large uncertainty or no formal period determination.
Data availability: The MPC provides astrometric data (positions) freely through astroquery.mpc. However, Stage 1 Pipeline Design (Section 2.3) is explicit: MPC data is primarily astrometric; magnitudes are incidental and poorly calibrated. MPC magnitudes are reduced-magnitude estimates, not calibrated photometry — they are not suitable for precision period analysis. To do this properly, you would need photometric observations from a dedicated survey (e.g., Lowell Observatory Database of Asteroid Orbits, or LCDB), not MPC.
The Lightcurve Database (LCDB, Warner et al.) does have calibrated photometry, but it is not a live API feed — it requires bulk downloads and is less amenable to the modular pipeline architecture.
Pipeline complexity exercised: Limited. Asteroid rotation period analysis does not exercise the TTV pipeline, the transit model, or the ETD adapter. It does exercise the normalisation and Lomb-Scargle components, but in a mode that is not directly relevant to any of Stage 2's primary science cases (TTVs, occultations, GRB follow-up).
Publication venue: Minor Planet Bulletin — low-barrier peer-reviewed venue for asteroid photometry. Publication is achievable, but the scientific impact is lower than other options.
Time to result: Fast if data is already available; slow if photometric data needs to be sourced from non-MPC archives.
Scientific novelty: Low. Asteroid rotation period determination is routine. The pipeline contribution is not differentiated.
Assessment: This is appealing as a low-risk option, but the MPC data quality limitation makes it less suited to the actual pipeline architecture than it initially appears. The publication venue is appropriate but lower-impact than the project needs for its first paper. The pipeline components exercised do not map onto Stage 2's primary science cases.
Option E: Eclipsing Binary O-C Residuals (AAVSO)¶
This is an alternative not in the original brief that emerges from reading the vault.
What it is: Eclipse timing of well-known eclipsing binaries. AAVSO has extensive long-baseline coverage of systems like Algol (beta Persei), beta Lyrae, and W UMa contact binaries. O-C (Observed minus Calculated) diagrams for eclipsing binaries reveal period changes from mass transfer, third bodies, or apsidal precession. The pipeline for this is nearly identical to the TTV pipeline — you are fitting an ephemeris and extracting timing residuals.
Why it's worth mentioning: The pipeline code for eclipse timing is essentially the same as transit timing. The normalisation, BJD_TDB conversion, and O-C analysis code is reused. A paper on eclipsing binary period changes using AAVSO data would validate the timing pipeline while using only the mature AAVSO data source — the best of both Options A and B.
However, this is not a clean recommendation for the first paper because: (1) O-C analysis of well-studied eclipsing binaries is a crowded field; (2) there is no clear "novel result" hook without an extensive literature search for systems with unexplained period changes; (3) it does not exercise the ETD adapter, which needs validation before Stage 2.
This is a strong second paper after Option A is done, or a parallel track.
Recommendation: Option A (AAVSO Mira Period Refinement), With Option B Immediately Following¶
Why Option A is the right first move¶
The single strongest argument is validation. The first paper needs to demonstrate that the pipeline produces correct results — not interesting new results, but correct results. Option A gives this cleanly: AAVSO has decades of Mira observations, the periods are published in the GCVS with formal values, and the period drift of several Miras (including Z UMa and T Cep) has been studied in published literature. The pipeline can be validated by checking its output against these known values before claiming any new science.
Option B is higher-impact but carries a real silent failure risk from heterogeneous time standards in ETD data. Getting BJD_TDB conversion wrong for a subset of submissions would corrupt the TTV signal in a way that might not be immediately obvious. Option A has no analogous silent failure mode — AAVSO already normalises observer submissions, and the BJD conversion of a single-source dataset is straightforward to verify.
The framing matters: the first paper is a pipeline validation paper that happens to produce science, not a science paper that happens to use a new pipeline. Option A fits this framing perfectly. The paper's contribution is: "We built a calibration pipeline for heterogeneous amateur photometry archives, ran it on multi-decade AAVSO data for a sample of Mira variables, and characterised period evolution." The period results are the payload that demonstrates the pipeline works.
Why Option B immediately follows¶
After Option A is done, Option B (WASP-12 b or WASP-4 b TTV re-analysis) is the natural second paper. At that point the pipeline is validated, the ETD adapter is the only new component, and the BJD_TDB normalisation logic from Option A can be re-used directly. The second paper exercises the complete TTV pipeline and positions OpenAstro as a TTV platform before Stage 2 volunteer recruiting begins. Having two papers in draft — one pipeline methods, one science — is a much stronger recruiting argument than one paper alone.
Project Plan: Option A Execution¶
Target Selection¶
Primary: T Cep (T Cephei)
Rationale:
- Period approximately 388 days; thousands of AAVSO observations from 1900 to present
- Published period changes documented in the GCVS and in Templeton et al. (2005, AJ, 130, 776); this is the comparison value
- Less analysed in recent literature than Z UMa or Mira itself, leaving more room for a new contribution
- Period shows secular decrease on a multi-decade timescale, making the O-C analysis non-trivial
- Bright (V ~ 5.5 at maximum, ~9 at minimum), so easy for AAVSO observers across all equipment classes
Secondary targets to provide statistical sample: Z UMa, R Leo, W Cyg — each has different period-change character, which makes a multi-target paper more interesting than a single-object study.
Phase 1: Data Ingest and Normalisation (Week 1–2)¶
- Stand up the AAVSO adapter from Stage 1 Pipeline Design (Section 2.1). Pull T Cep, Z UMa, R Leo, W Cyg in V and B bands, full archive.
- Run the normalisation stage: filter standardisation, BJD_TDB conversion, deduplication, outlier flagging.
- Validation check: compute a simple Lomb-Scargle period on the normalised but not yet calibrated data. Compare it to the GCVS period. If ingest or normalisation is broken, this check catches it immediately.
- Populate the instrument registry for all AAVSO observer codes that contributed to these targets. This is a byproduct, not the main goal, but it is the foundation for Stage 2.
Success criterion: Lomb-Scargle period agrees with published GCVS period to within 1% for each target.
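The Phase 1 check can be sketched as follows. This is a self-contained illustration, not the pipeline's period search: it implements the classical (Scargle 1982) periodogram directly in NumPy and runs it on a synthetic, irregularly sampled sinusoid with a 388-day period standing in for a catalogued value. Real Mira light curves are non-sinusoidal and far less clean, and all names and parameter values below are assumptions.

```python
import numpy as np


def lomb_scargle_power(t, y, freqs):
    """Unnormalised classical Lomb-Scargle power on a grid of frequencies (cycles/day)."""
    y = y - y.mean()
    omega = 2.0 * np.pi * freqs[:, None]                  # shape (M, 1)
    # The offset tau makes the sine and cosine terms orthogonal at each frequency.
    tau = np.arctan2(np.sin(2 * omega * t).sum(axis=1),
                     np.cos(2 * omega * t).sum(axis=1)) / (2 * omega[:, 0])
    arg = omega * (t[None, :] - tau[:, None])             # shape (M, N)
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * ((c @ y) ** 2 / (c**2).sum(axis=1) + (s @ y) ** 2 / (s**2).sum(axis=1))


# Synthetic light curve: 40 years of irregular sampling (illustrative values).
rng = np.random.default_rng(42)
catalogue_period = 388.0                                  # days, stand-in for a GCVS value
t = np.sort(rng.uniform(0.0, 40 * 365.25, 600))           # days since first observation
y = 7.5 + 2.0 * np.sin(2 * np.pi * t / catalogue_period) + rng.normal(0.0, 0.3, t.size)

freqs = np.linspace(1 / 1000.0, 1 / 100.0, 3000)          # trial periods from 1000 d to 100 d
power = lomb_scargle_power(t, y, freqs)
best_period = 1.0 / freqs[np.argmax(power)]

rel_diff = abs(best_period - catalogue_period) / catalogue_period
print(f"best period = {best_period:.1f} d, relative difference = {rel_diff:.4f}")
print("PASS" if rel_diff < 0.01 else "FAIL")
```

The frequency grid is deliberately finer than the natural resolution 1/T of the baseline, so the peak position is not limited by the grid. In practice a library implementation would likely replace the hand-written function, with the same 1% comparison applied to its output.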
Phase 2: Calibration and Light Curve Assembly (Week 2–3)¶
- Run the calibration pipeline (zero-point correction using APASS comparison stars, colour term correction where B and V observations are both available).
- Apply ensemble photometry combination for nights with multiple AAVSO observers.
- Assemble the final calibrated light curves.
- Inspect: plot the full 100+ year V-band light curve for each target. Look for data gaps, outlier epochs, comparison star revision events.
Success criterion: Calibrated light curve scatter within 10-night bins is reduced compared to the raw AAVSO data. Zero-point corrections are consistent with known instrument systematics in the literature.
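One simple way to combine several observers on the same night is sketched below: subtract each observer's zero-point offset, then take an inverse-variance weighted mean. This is an assumed combination scheme for illustration; the source does not specify the pipeline's actual method. The observer codes, offsets and the helper name `combine_night` are hypothetical, and averaging is done in magnitude space for brevity.

```python
import math


def combine_night(observations, zero_points):
    """Inverse-variance weighted mean of zero-point-corrected magnitudes.

    observations: list of (magnitude, uncertainty, observer_code) tuples
    zero_points:  dict mapping observer_code to the offset to subtract (mag)
    """
    weights, corrected = [], []
    for mag, err, observer in observations:
        # Observers without a registry entry are left uncorrected.
        corrected.append(mag - zero_points.get(observer, 0.0))
        weights.append(1.0 / err**2)
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, corrected)) / total
    return mean, math.sqrt(1.0 / total)


# Two hypothetical observers with opposite zero-point offsets on the same night.
night = [(7.10, 0.1, "OBS1"), (7.00, 0.1, "OBS2")]
zero_points = {"OBS1": 0.05, "OBS2": -0.05}

mag, err = combine_night(night, zero_points)
print(f"combined V = {mag:.3f} +/- {err:.3f}")
```

After correction both observers agree, so the scatter within the night collapses. That reduction is the quantity the success criterion above asks to see in 10-night bins.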
Phase 3: Science Analysis (Week 3–4)¶
- Compute the O-C diagram: fit a linear ephemeris to the timing events (for Miras, typically times of maximum light) in the first 20 years of data, then compute observed minus calculated times for the remaining baseline. Curvature or breaks in the O-C diagram reveal period changes.
- Fit period evolution models: linear (constant period), quadratic (uniform period change), and piecewise (to detect episodic changes associated with thermal pulses on the AGB). Use AIC/BIC to select the best model for each target.
- Compare derived period evolution rates against published values from Templeton et al. and subsequent AAVSO studies.
- For any discrepancy: check if the difference is explained by a difference in the time baseline used, the calibration method, or a genuinely new result.
Success criterion: For at least two of the four targets, the pipeline-derived period change rate agrees with published values within 2σ. Any discrepancy has a clear explanation.
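The O-C construction and model selection steps above can be sketched on synthetic data. The example builds times of maximum with a slowly decreasing period, forms O-C against a linear ephemeris fitted to roughly the first 20 years, and compares linear and quadratic models with the BIC for Gaussian residuals of unknown variance, n·ln(RSS/n) + k·ln(n). The period, drift rate and scatter are invented for illustration, and the piecewise model is omitted for brevity.

```python
import numpy as np


def bic(residuals, n_params):
    """BIC for Gaussian residuals with unknown variance: n*ln(RSS/n) + k*ln(n)."""
    n = residuals.size
    rss = float(np.sum(residuals**2))
    return n * np.log(rss / n) + n_params * np.log(n)


# Synthetic times of maximum (illustrative values, not measurements).
rng = np.random.default_rng(1)
cycles = np.arange(110, dtype=float)              # about 117 years at 388 d per cycle
p0, dp_per_cycle, scatter = 388.0, -0.02, 5.0     # days, days/cycle, days
t_max = p0 * cycles + 0.5 * dp_per_cycle * cycles**2 + rng.normal(0.0, scatter, cycles.size)

# Linear ephemeris from roughly the first 20 years (19 cycles), extrapolated forward.
early = cycles < 19
ephemeris = np.polyfit(cycles[early], t_max[early], 1)
o_minus_c = t_max - np.polyval(ephemeris, cycles)

# Compare constant-period (linear) and uniform-period-change (quadratic) models.
results = {}
for name, degree in [("linear", 1), ("quadratic", 2)]:
    coeffs = np.polyfit(cycles, o_minus_c, degree)
    resid = o_minus_c - np.polyval(coeffs, cycles)
    results[name] = bic(resid, degree + 1)

best = min(results, key=results.get)
print({k: round(v, 1) for k, v in results.items()}, "->", best)
```

A lower BIC indicates the preferred model. With the injected drift, the quadratic model wins by a wide margin, and the fitted quadratic coefficient is half the period change per cycle.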
Phase 4: Paper Preparation (Week 4–6)¶
Title direction: "Period Evolution of a Sample of Mira-Type Variables: A Calibration Study with the OpenAstro Archival Pipeline"
Paper structure:
- Section 1, Introduction: why Mira period changes matter (AGB thermal pulses, stellar evolution); why a consistent calibration pipeline is valuable for heterogeneous archives
- Section 2, Data: AAVSO archive, observer pool characterisation, instrument registry
- Section 3, Pipeline: normalisation, calibration, ensemble combination (cite Stage 1 Pipeline Design as a living document; the paper describes a specific application)
- Section 4, Period Analysis: O-C diagrams for each target, period evolution model selection
- Section 5, Discussion: comparison with published values; sources of systematic uncertainty; implications for AGB stellar evolution models
- Section 6, Conclusions: pipeline validated; period refinements for T Cep, Z UMa, R Leo, W Cyg; instrument registry as community resource
Submission venue: Journal of the AAVSO (JAAVSO), primary target. Fallback: New Astronomy (Elsevier, broad scope, fast turnaround). arXiv:astro-ph.SR preprint simultaneous with submission.
Key figures:
1. Full 100-year V-band light curve for T Cep
2. O-C diagram for each target showing period evolution
3. Instrument registry summary: zero-point distribution across observer pool
4. Period evolution rates: OpenAstro pipeline vs. published literature (comparison table/figure)
After the First Paper: Transition to Option B¶
Once the Option A paper is submitted (acceptance is not required; submission is the Stage 1 milestone), begin the ETD adapter and WASP-12 b TTV analysis in parallel with any revisions. Target: second paper submitted within 3 months of the first. Together, the two papers, one demonstrating the pipeline on AAVSO data and one applying the TTV component to ETD data, make a compelling case for Stage 2 volunteer recruiting.
Decision Record¶
| Option | Data Access | Pipeline Coverage | Novelty | Time to Result | Publication Ease | Risk |
|---|---|---|---|---|---|---|
| A: Mira period (AAVSO) | Excellent | AAVSO adapter, Lomb-Scargle | Low-to-medium | 4–6 weeks | High (JAAVSO) | Low |
| B: TTV from ETD | Good | ETD adapter, TTV pipeline, BJD normalisation | Medium-high | 6–10 weeks | Medium (Acta Astron, AJ) | Medium |
| C: AGN variability | Good | AAVSO adapter, time series | Low | 4–6 weeks | Low (crowded field) | Low |
| D: Asteroid rotation (MPC) | Poor (MPC mag quality) | Limited (MPC adapter, LS) | Low | Variable | Low (MPB only) | Medium |
| E: EB O-C (AAVSO) | Excellent | AAVSO adapter, O-C analysis | Medium | 4–6 weeks | Medium | Low |
Choice: Option A. Option B follows as the second paper. Option C is deprioritised (low novelty). Option D is deprioritised (data quality limitation). Option E is a viable parallel track.
Status: Recommendation made; not yet executed
Dependencies: AAVSO adapter (Section 2.1 of Stage 1 Data Pipeline Design.md), normalisation module, Lomb-Scargle period search (Section 9.4)
Next action: Stand up AAVSO adapter and pull T Cep full archive. Validate period against GCVS.
Last updated: 2026-03-21