# NoSQL Data Model
This document describes the NoSQL document-store data model for the cloud-scale version of OpenAstro. The MVP uses a relational schema in SQLite/PostgreSQL (see Details on componets.md). This NoSQL model is the upgrade path when the network grows to hundreds of nodes and the read/write throughput of a relational database becomes a bottleneck.
The design priority is fast reads and horizontal scalability over rigid relational integrity. The most frequent operations are heartbeat pings (write-heavy, tiny payloads) and target queries (read-heavy, per-site). These patterns favour a document store (DynamoDB, Firestore) over a relational database.
## Collections

### Nodes
Tracks the current operational state of every physical telescope.
| Field | Type | Description |
|---|---|---|
| NodeID | UUID (PK) | Unique identifier for this telescope node |
| Status | Enum | ONLINE, BUSY, PARKED, FAULTED |
| LastHeartbeat | Timestamp | Time of most recent heartbeat ping |
| Location | GeoPoint | Lat/lon for visibility calculations |
| Aperture_mm | Integer | Aperture in mm |
| FoV_deg | Float | Field of view in degrees |
| Filters | List[String] | Available filter set |
| MountType | String | EQ or ALTAZ |
| ControlProtocol | Enum | ASCOM_ALPACA or INDI |
| ReliabilityScore | Float (0–1) | Dynamic score based on historical success/failure rate |
| CurrentTask | TaskID or null | If BUSY, reference to active task |
Use case for resilience: Updated by every heartbeat ping (every 15–30s). Used for hot-plug removal and re-assignment. If three consecutive heartbeats are missed, Status → FAULTED and all pending tasks are instantly reassigned.
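The missed-heartbeat rule above can be sketched as follows. This is a minimal illustration, assuming an in-memory view of the Nodes collection rather than a real DynamoDB/Firestore client; the function name `sweep_heartbeats` and the 30 s interval constant are illustrative, not part of the schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

HEARTBEAT_INTERVAL = timedelta(seconds=30)  # upper bound of the 15-30 s ping window
MISSED_BEFORE_FAULT = 3                     # three consecutive misses -> FAULTED

@dataclass
class Node:
    node_id: str
    status: str                        # ONLINE, BUSY, PARKED, or FAULTED
    last_heartbeat: datetime
    current_task: Optional[str] = None

def sweep_heartbeats(nodes: List[Node], now: datetime) -> List[str]:
    """Fault any node whose heartbeat is three intervals stale and
    collect the TaskIDs that must be reassigned to other nodes."""
    orphaned = []
    for node in nodes:
        if node.status == "FAULTED":
            continue
        if now - node.last_heartbeat > MISSED_BEFORE_FAULT * HEARTBEAT_INTERVAL:
            node.status = "FAULTED"
            if node.current_task is not None:
                orphaned.append(node.current_task)
                node.current_task = None
    return orphaned
```

A periodic sweep like this (or a per-key TTL in the document store) is what turns missed heartbeats into hot-plug reassignment.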
### Missions
Defines the high-level scientific work order and required total exposure.
| Field | Type | Description |
|---|---|---|
| MissionID | UUID (PK) | Unique identifier |
| TargetRA | Float | Right Ascension in decimal degrees (J2000) |
| TargetDec | Float | Declination in decimal degrees (J2000) |
| ScienceCase | Enum | OCCULTATION, TTV, GRB_FOLLOWUP, FRB_HUNT, DEEP_STACK, etc. |
| TotalExposureRequired_s | Integer | Total integration time required |
| TotalExposureCompleted_s | Integer | Running total of collected exposure |
| Priority | Integer (1–100) | Scheduler priority |
| TimeConstraintStart | Timestamp or null | Earliest acceptable observation time |
| TimeConstraintEnd | Timestamp or null | Hard deadline (e.g., event window closes) |
| Status | Enum | PENDING, IN_PROGRESS, COMPLETE, EXPIRED |
| AlertSourceID | String or null | If triggered by an external alert (GCN, TNS, etc.), reference to that alert |
Use case: Queried to compare TotalExposureCompleted_s against TotalExposureRequired_s to determine mission progress, and to trigger the stacking pipeline when Status → COMPLETE.
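The progress check can be sketched as below. This is a hedged sketch assuming missions are plain documents (dicts); the helper names `mission_progress` and `record_exposure` are illustrative, and the "fire the pipeline exactly once" return value is one possible design, not mandated by the schema.

```python
def mission_progress(mission: dict) -> float:
    """Fraction of required exposure collected, clamped to 1.0."""
    required = mission["TotalExposureRequired_s"]
    completed = mission["TotalExposureCompleted_s"]
    return min(completed / required, 1.0) if required > 0 else 1.0

def record_exposure(mission: dict, delivered_s: int) -> bool:
    """Add delivered exposure to the running total.

    Flips Status to COMPLETE when the requirement is met and returns
    True exactly once, so the caller can kick off the stacking
    pipeline without firing it twice."""
    mission["TotalExposureCompleted_s"] += delivered_s
    if (mission["Status"] != "COMPLETE"
            and mission["TotalExposureCompleted_s"] >= mission["TotalExposureRequired_s"]):
        mission["Status"] = "COMPLETE"
        return True
    return False
```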
### Tasks
The core of the scheduler's logic. Tasks are the atomic unit of work assigned to a node. They also serve as the checkpointing mechanism for hot-plug recovery.
| Field | Type | Description |
|---|---|---|
| TaskID | UUID (PK) | Unique identifier |
| MissionID | MissionID (FK) | Parent mission |
| AssignedNodeID | NodeID or null | Which telescope is doing this |
| ExposureRequired_s | Integer | How much integration this task covers |
| ExposureDelivered_s | Integer | How much has been collected so far |
| Status | Enum | PENDING, EXECUTING, PARTIAL_COMPLETE, SUCCESS, FAILED |
| WindowStart | Timestamp | When the target rises above minimum altitude |
| WindowEnd | Timestamp | When the target sets below minimum altitude |
| FitsObjectStoragePath | String or null | Cloud storage path(s) for submitted FITS files |
Hot-plug recovery: If Status = EXECUTING and the node goes FAULTED, the task is immediately marked PARTIAL_COMPLETE. The scheduler calculates ExposureRequired_s - ExposureDelivered_s and creates a new PENDING task for the remainder, assigned to the next available node.
Checkpointing: Every photon collected is saved. A task that delivers 30 min of a required 60 min still archives that 30 min and the mission continues.
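The recovery step described above can be sketched as a pure function over task documents. A minimal sketch, assuming tasks are plain dicts; `split_partial_task` is a hypothetical helper name, and the actual scheduler would persist both documents back to the store.

```python
import uuid

def split_partial_task(task: dict) -> dict:
    """Checkpoint a task whose node went FAULTED mid-exposure.

    Marks the original PARTIAL_COMPLETE (its delivered exposure is
    already archived) and returns a new PENDING task covering only the
    remainder, left unassigned for the next available node."""
    task["Status"] = "PARTIAL_COMPLETE"
    remainder = task["ExposureRequired_s"] - task["ExposureDelivered_s"]
    return {
        "TaskID": str(uuid.uuid4()),
        "MissionID": task["MissionID"],
        "AssignedNodeID": None,          # scheduler assigns the next available node
        "ExposureRequired_s": remainder,
        "ExposureDelivered_s": 0,
        "Status": "PENDING",
        "WindowStart": task["WindowStart"],
        "WindowEnd": task["WindowEnd"],
        "FitsObjectStoragePath": None,
    }
```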
### Artifacts
An inventory of all scientific products generated by the network.
| Field | Type | Description |
|---|---|---|
| ArtifactID | UUID (PK) | Unique identifier |
| MissionID | MissionID (FK) | Parent mission |
| ArtifactType | Enum | RAW_FITS, STACKED_FITS, LIGHT_CURVE, ASTROMETRY_REPORT |
| ObjectStoragePath | String | Full URL to the file in cloud object storage |
| SHA256Hash | String | Cryptographic hash for data integrity verification (also used in torrent validation) |
| CreatedAt | Timestamp | When this artifact was written |
| PublicAfter | Timestamp or null | Proprietary embargo expiry; null means immediately public |
| ContributingNodeIDs | List[NodeID] | All telescopes whose data contributed to this artifact |
Use case: The SHA256Hash is embedded in the .torrent metadata for any publicly distributed stacked FITS files, guaranteeing that downloaded science data is bit-for-bit identical to the original.
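The integrity check on the downloader's side can be sketched like this, using only the standard library. The function names are illustrative; the streaming read is just so large stacked FITS files aren't loaded into memory at once.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: str, artifact: dict) -> bool:
    """True iff the downloaded file matches the hash in the Artifacts record."""
    return sha256_of_file(path) == artifact["SHA256Hash"]
```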
## Notes on Model Choices
Why UUIDs everywhere: UUIDs are globally unique without requiring coordination between nodes. A telescope in Australia can generate a valid TaskID without phoning home first. This is essential for a distributed, intermittently-connected network.
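Coordination-free ID minting is a one-liner; the sketch below (helper name `new_task_id` is illustrative) shows why no central ID service is needed.

```python
import uuid

def new_task_id() -> str:
    """Mint a TaskID locally. uuid4 draws 122 random bits, so the chance
    of two nodes ever colliding is negligible and no round trip to a
    central ID-issuing service is required."""
    return str(uuid.uuid4())
```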
Why NoSQL over relational at scale: The heartbeat pattern (thousands of tiny writes per minute from hundreds of nodes) and the task-state pattern (frequent status updates on individual records) fit DynamoDB/Firestore's per-key access model. A relational database under the same load runs into lock contention and connection-pool limits that don't scale cleanly.
For the MVP: The same logical schema can be implemented in SQLite/PostgreSQL (already done — see Details on componets.md). Migration to NoSQL is a data-migration exercise when the scale demands it, not a redesign.