
NoSQL Data Model

This document describes the NoSQL document-store data model for the cloud-scale version of OpenAstro. The MVP uses a relational schema in SQLite/PostgreSQL (see Details on components.md). This NoSQL model is the upgrade path for when the network grows to hundreds of nodes and the read/write throughput of a relational database becomes a bottleneck.

The design priority is fast reads and horizontal scalability over rigid relational integrity. The most frequent operations are heartbeat pings (write-heavy, tiny payloads) and target queries (read-heavy, per-site). These patterns favour a document store (DynamoDB, Firestore) over a relational database.


Collections

Nodes

Tracks the current operational state of every physical telescope.

| Field | Type | Description |
| --- | --- | --- |
| NodeID | UUID (PK) | Unique identifier for this telescope node |
| Status | Enum | ONLINE, BUSY, PARKED, FAULTED |
| LastHeartbeat | Timestamp | Time of most recent heartbeat ping |
| Location | GeoPoint | Lat/lon for visibility calculations |
| Aperture_mm | Integer | Aperture in mm |
| FoV_deg | Float | Field of view in degrees |
| Filters | List[String] | Available filter set |
| MountType | String | EQ or ALTAZ |
| ControlProtocol | Enum | ASCOM_ALPACA or INDI |
| ReliabilityScore | Float (0–1) | Dynamic score based on historical success/failure rate |
| CurrentTask | TaskID or null | If BUSY, reference to the active task |

Use case for resilience: Updated by every heartbeat ping (every 15–30s). Used for hot-plug removal and re-assignment. If three consecutive heartbeats are missed, Status → FAULTED and all pending tasks are instantly reassigned.
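The liveness rule above can be sketched in Python. This is a minimal illustration, not the scheduler's actual code: the function name, the in-memory node dict, and the epoch-seconds representation of LastHeartbeat are all assumptions layered on the Nodes schema.

```python
import time

HEARTBEAT_INTERVAL_S = 30   # upper bound of the 15-30 s ping window
MISSED_BEFORE_FAULT = 3     # consecutive misses that trigger FAULTED

def check_node_liveness(node: dict, now: float = None) -> str:
    """Return the status a node should have, given its LastHeartbeat.

    `node` is a document from the Nodes collection with at least
    'Status' and 'LastHeartbeat' (epoch seconds) fields.
    """
    if now is None:
        now = time.time()
    silence = now - node["LastHeartbeat"]
    if silence > MISSED_BEFORE_FAULT * HEARTBEAT_INTERVAL_S:
        return "FAULTED"    # 3+ missed beats: hot-plug removal
    return node["Status"]

# Hypothetical node, last heard from 120 s ago: past the 90 s cutoff.
node = {"NodeID": "node-001", "Status": "BUSY", "LastHeartbeat": 1000.0}
print(check_node_liveness(node, now=1120.0))  # -> FAULTED
```

A real deployment would run this as a sweep query over the Nodes collection rather than per-document calls, but the threshold arithmetic is the same.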


Missions

Defines the high-level scientific work order and required total exposure.

| Field | Type | Description |
| --- | --- | --- |
| MissionID | UUID (PK) | Unique identifier |
| TargetRA | Float | Right Ascension in decimal degrees (J2000) |
| TargetDec | Float | Declination in decimal degrees (J2000) |
| ScienceCase | Enum | OCCULTATION, TTV, GRB_FOLLOWUP, FRB_HUNT, DEEP_STACK, etc. |
| TotalExposureRequired_s | Integer | Total integration time required |
| TotalExposureCompleted_s | Integer | Running total of collected exposure |
| Priority | Integer (1–100) | Scheduler priority |
| TimeConstraintStart | Timestamp or null | Earliest acceptable observation time |
| TimeConstraintEnd | Timestamp or null | Hard deadline (e.g., event window closes) |
| Status | Enum | PENDING, IN_PROGRESS, COMPLETE, EXPIRED |
| AlertSourceID | String or null | If triggered by an external alert (GCN, TNS, etc.), reference to that alert |

Use case: TotalExposureCompleted_s is compared against TotalExposureRequired_s to determine mission progress and to trigger the stacking pipeline when Status → COMPLETE.
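The progress check can be sketched as a pair of small functions. This is an illustrative sketch only; the function names and the plain-dict representation of a Missions document are assumptions, and the actual stacking trigger would hang off the COMPLETE transition.

```python
def mission_progress(mission: dict) -> float:
    """Fraction of required exposure collected, from a Missions document."""
    required = mission["TotalExposureRequired_s"]
    completed = mission["TotalExposureCompleted_s"]
    return completed / required if required else 1.0

def maybe_complete(mission: dict) -> dict:
    """Flip Status to COMPLETE once the exposure target is met.

    The stacking pipeline would be kicked off on this transition.
    """
    if mission["Status"] == "IN_PROGRESS" and mission_progress(mission) >= 1.0:
        mission["Status"] = "COMPLETE"
    return mission

m = {"TotalExposureRequired_s": 3600, "TotalExposureCompleted_s": 3600,
     "Status": "IN_PROGRESS"}
print(maybe_complete(m)["Status"])  # -> COMPLETE
```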


Tasks

The core of the scheduler's logic. Tasks are the atomic unit of work assigned to a node. They also serve as the checkpointing mechanism for hot-plug recovery.

| Field | Type | Description |
| --- | --- | --- |
| TaskID | UUID (PK) | Unique identifier |
| MissionID | MissionID (FK) | Parent mission |
| AssignedNodeID | NodeID or null | Which telescope is doing this |
| ExposureRequired_s | Integer | How much integration this task covers |
| ExposureDelivered_s | Integer | How much has been collected so far |
| Status | Enum | PENDING, EXECUTING, PARTIAL_COMPLETE, SUCCESS, FAILED |
| WindowStart | Timestamp | When the target rises above minimum altitude |
| WindowEnd | Timestamp | When the target sets below minimum altitude |
| FitsObjectStoragePath | String or null | Cloud storage path(s) for submitted FITS files |

Hot-plug recovery: If Status = EXECUTING and the node goes FAULTED, the task is immediately marked PARTIAL_COMPLETE. The scheduler calculates ExposureRequired_s - ExposureDelivered_s and creates a new PENDING task for the remainder, assigned to the next available node.

Checkpointing: Every photon collected is saved. A task that delivers 30 min of a required 60 min still archives that 30 min and the mission continues.
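The recovery path above can be sketched as follows. This is a minimal, hypothetical implementation: the function name and dict-based task representation are assumptions, and a real scheduler would persist both documents in one conditional write against the document store.

```python
import uuid

def split_remainder(task: dict) -> dict:
    """Handle a FAULTED node: checkpoint the EXECUTING task as
    PARTIAL_COMPLETE and return a new PENDING task covering the
    undelivered exposure, or None if nothing remains.
    """
    task["Status"] = "PARTIAL_COMPLETE"   # delivered photons are kept
    remainder = task["ExposureRequired_s"] - task["ExposureDelivered_s"]
    if remainder <= 0:
        return None
    return {
        "TaskID": str(uuid.uuid4()),      # new UUID, no coordination needed
        "MissionID": task["MissionID"],
        "AssignedNodeID": None,           # scheduler picks the next free node
        "ExposureRequired_s": remainder,
        "ExposureDelivered_s": 0,
        "Status": "PENDING",
        "WindowStart": task["WindowStart"],
        "WindowEnd": task["WindowEnd"],
        "FitsObjectStoragePath": None,
    }
```

So a task that delivered 1800 s of a required 3600 s yields a new PENDING task for the remaining 1800 s, while the archived half stays attached to the original task.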


Artifacts

An inventory of all scientific products generated by the network.

| Field | Type | Description |
| --- | --- | --- |
| ArtifactID | UUID (PK) | Unique identifier |
| MissionID | MissionID (FK) | Parent mission |
| ArtifactType | Enum | RAW_FITS, STACKED_FITS, LIGHT_CURVE, ASTROMETRY_REPORT |
| ObjectStoragePath | String | Full URL to the file in cloud object storage |
| SHA256Hash | String | Cryptographic hash for data integrity verification (also used in torrent validation) |
| CreatedAt | Timestamp | When this artifact was written |
| PublicAfter | Timestamp or null | Proprietary embargo expiry; null means immediately public |
| ContributingNodeIDs | List[NodeID] | All telescopes whose data contributed to this artifact |

Use case: The SHA256Hash is embedded in the .torrent metadata for any publicly distributed stacked FITS files, guaranteeing that downloaded science data is bit-for-bit identical to the original.
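A downloader can verify an artifact against its record with the standard library alone. The helper names are illustrative assumptions; only hashlib.sha256 and the SHA256Hash field come from the source.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file from disk and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, artifact: dict) -> bool:
    """True iff the downloaded file matches the Artifacts record,
    i.e. it is bit-for-bit identical to the original."""
    return sha256_of(path) == artifact["SHA256Hash"]
```

Streaming in chunks keeps memory flat even for multi-gigabyte stacked FITS files.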


Notes on Model Choices

Why UUIDs everywhere: UUIDs are globally unique without requiring coordination between nodes. A telescope in Australia can generate a valid TaskID without phoning home first. This is essential for a distributed, intermittently-connected network.

Why NoSQL over relational at scale: The heartbeat pattern (thousands of tiny writes per minute from hundreds of nodes) and the task-state pattern (frequent status updates on individual records) benefit from DynamoDB/Firestore's per-key access patterns. A relational database would need row-level locking and connection pooling that don't scale as cleanly under this write load.

For the MVP: The same logical schema can be implemented in SQLite/PostgreSQL (already done; see Details on components.md). Migration to NoSQL is a data-migration exercise when the scale demands it, not a redesign.