
The telescope sends a request to the server to bid on its behalf. The bid is then placed on the server, and what happens next depends on the auction time set for each task. A shorter auction time obviously means the task gets matched sooner, but maybe not by the best telescope; a longer auction time means a higher-quality match. We vectorize these bids, so each telescope converts its ability to capture that task into a vector, and that vector is what's actually sent as the bid. We can also implement RL bidding management later on, where the RL agent bids on behalf of the telescope: the telescope says, hey, I'm free, bid on my behalf; the server bids; one is selected; done. If a telescope is outbid, it's automatically removed from the queue so it doesn't stay stuck in two queues. If it wins two tasks (or is likely to), it just does both, one after the other in priority order.

1. ⚙️ Advanced Scheduling & Control (The Global Orchestrator)

For hundreds of heterogeneous telescopes working on competing projects (SSA vs. Deep Sky vs. Transients), the "Greedy Handover" algorithm is insufficient. You need an advanced, multi-objective optimization system.

A. Scheduling Algorithms (Solving the NP-Hard Problem)

The core scheduler must resolve conflicts and maximize the "scientific value" of every observation slot. Professional networks like the Las Cumbres Observatory Global Telescope (LCOGT) use these techniques:

Integer Linear Programming (ILP)
  Description: Models all possible observations, constraints (weather, time, airmass), and priorities as a vast set of linear equations, then finds the mathematically optimal schedule that maximizes total scientific return.
  Application: Ideal for Deep Sky and long-term Cadence projects where maximizing total exposure time is key.

Genetic Algorithms (GA)
  Description: Uses evolution-inspired heuristics (mutation, crossover) to find a near-optimal solution quickly; better suited to large, complex search spaces.
  Application: Useful for finding efficient SSA tracks where targets are numerous and visibility windows are small.

Real-time Reinforcement Learning (RL)
  Description: The system learns the best action (which telescope to point where) from the reward (scientific completion) it receives, especially in response to dynamic events.
  Application: Critical for Target of Opportunity (ToO) events, such as gravitational-wave or neutrino alerts, which require a decision in seconds.
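To make the ILP idea concrete: at toy scale, "find the mathematically optimal schedule" is just a search over all telescope-to-slot assignments for the one with maximum total scientific value. The sketch below does this by brute force (a real scheduler would hand the same objective and constraints to an ILP solver; all names here are illustrative):

```python
from itertools import permutations

def optimal_schedule(telescopes, slots, value):
    """Exhaustively assign telescopes to observation slots, maximizing total
    scientific value. value[(telescope, slot)] is the scientific return of
    that pairing; missing pairs are worth 0. A stand-in for an ILP solve."""
    best_total, best_assignment = float("-inf"), None
    for perm in permutations(telescopes, len(slots)):
        total = sum(value.get((t, s), 0) for t, s in zip(perm, slots))
        if total > best_total:
            best_total, best_assignment = total, dict(zip(slots, perm))
    return best_assignment, best_total
```

Brute force is factorial in the number of telescopes, which is exactly why production networks reach for ILP or GA instead; but the objective being maximized is the same.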

B. High-Availability Control Architecture

The central scheduler cannot be a single server. A network of this scale requires a Distributed Agent Model.

  • Global Scheduler (Cloud): Resides in a cloud environment (AWS, Azure, GCP). Runs the complex optimization algorithms and manages the Request Database.

  • Site Agent (Edge): A local computer (at each observatory site) that caches the schedule for the next 72 hours.

  • Resilience: The Site Agent continues observing and managing hardware (domes, power) even if the network link to the Global Scheduler is lost for days (Network Outage Tolerance).

  • Local Control: It is the only component that directly talks to the telescope via the ASCOM Alpaca or INDI standards, handling all low-level commands and the Plate Solving feedback loop.

  • We need to develop an algorithm for vectorizing the quality of a telescope

  • We need an algorithm for vectorizing a task

Do we vectorize, or do we try to keep it to a single number? There might be a way to collapse quality into a single number, but what we actually use to represent scope quality is honestly the big open question.
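One possible answer to the vector-vs-single-number question, sketched under the assumption that capabilities are already normalized to 0–1: keep the vector as the canonical representation, and collapse it to a single quality number per task using task-specific weights. The weights below are purely illustrative:

```python
def quality_score(capability_vector, task_weights):
    """Collapse a telescope capability vector into one quality number for a
    specific task via a weighted average. Weights are per-task, so the same
    telescope can score high for imaging and low for precision work."""
    assert len(capability_vector) == len(task_weights)
    total_weight = sum(task_weights)
    return sum(c * w for c, w in zip(capability_vector, task_weights)) / total_weight

# Same scope, two tasks with different priorities (dimensions: aperture,
# resolution, seeing -- all assumed normalized to 0-1):
scope = [0.9, 0.2, 0.6]
imaging_weights = [3.0, 0.5, 1.0]    # aperture-dominated task
precision_weights = [0.5, 3.0, 1.0]  # resolution-dominated task
```

This keeps the nuance of the vector (nothing is thrown away) while still giving the auction a single scalar to rank bids by.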

  1. A bid mgmt system

Telescope Auction System - Microservices Architecture

System Overview

A distributed auction platform for heterogeneous telescope networks with vector-based capability matching and real-time bidding.


Core Microservices

1. Task Registry Service

Responsibility: Manage observation task lifecycle

Key Functions:

  • Task ingestion & validation
  • Store task requirements as vectors (wavelength range, resolution, exposure time, coordinates, time windows)
  • Calculate minimum capability threshold vector
  • Task status management (pending → auctioning → assigned → completed)

Data Model:

Task {
  id, submitter_id, priority_level,
  requirement_vector: [aperture_min, wavelength_range, resolution, location_constraint, ...],
  min_threshold_vector,
  time_window: {start, end, flexibility},
  auction_duration,
  status, assigned_telescope_id
}

APIs:

  • POST /tasks - Submit new task
  • GET /tasks/{id} - Get task details
  • PATCH /tasks/{id}/status - Update task status

2. Telescope Registry Service

Responsibility: Maintain telescope fleet inventory

Key Functions:

  • Register/update telescope capabilities
  • Real-time availability tracking
  • Capability vector generation (aperture, wavelength sensitivity, location quality, instrument suite)
  • Health monitoring

Data Model:

Telescope {
  id, name, location: {lat, lon, elevation},
  capability_vector: [aperture, wavelength_bands[], resolution, mount_type_score, instrument_suite[], seeing_quality],
  current_status: {available, busy, maintenance},
  queue: [task_ids],
  specializations: [planetary, deep_sky, spectroscopy, ...]
}

APIs:

  • POST /telescopes - Register telescope
  • PATCH /telescopes/{id}/status - Update availability
  • GET /telescopes/{id}/capabilities - Get capability vector

3. Auction Engine Service

Responsibility: Orchestrate bidding process

Key Functions:

  • Create auctions for tasks (normal vs high-priority queues)
  • Broadcast auction announcements to eligible telescopes
  • Collect and rank bids
  • Winner selection based on vector similarity scoring
  • Handle auction timeouts and re-auctions

Auction Flow:

1. Task submitted → minimum threshold check
2. Broadcast to telescopes passing threshold
3. Collect bids during auction_duration
4. Rank by similarity score
5. Select winner → notify Task Registry & Telescope
6. Losers auto-removed from consideration
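The six-step flow above can be sketched as a single function. This is a minimal stand-in, not the real engine: the `similarity` callback and the 0.7 default threshold are assumptions, and steps 2–3 collapse the broadcast/collect phase into an eager evaluation:

```python
def run_auction(task, telescopes, similarity, min_threshold=0.7):
    """One pass of the auction flow: threshold check, collect bids, rank by
    similarity score, pick a winner. similarity(telescope, task) is assumed
    to return a 0-1 match score."""
    # Steps 1-2: broadcast only to telescopes passing the minimum threshold
    eligible = [t for t in telescopes if similarity(t, task) >= min_threshold]
    if not eligible:
        return None, []  # no valid bids -> re-auction policy takes over
    # Steps 3-4: collect bids and rank by similarity score
    bids = sorted(eligible, key=lambda t: similarity(t, task), reverse=True)
    # Steps 5-6: winner is notified, losers drop out of consideration
    winner, losers = bids[0], bids[1:]
    return winner, losers
```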

APIs:

  • POST /auctions - Create auction
  • GET /auctions/{id}/bids - Get current bids
  • POST /auctions/{id}/close - Force close auction

4. Bidding Service

Responsibility: Generate and submit bids on behalf of telescopes

Key Functions:

  • Monitor auction broadcasts
  • Calculate bid vector (telescope capability × task fit)
  • Compute similarity score: cosine_similarity(task_requirement_vector, telescope_capability_vector)
  • Submit bids if above minimum threshold
  • Future: RL agent integration for strategic bidding

Bid Calculation:

# Simplified version
import numpy as np

def generate_bid(telescope_caps, task_reqs, weights):
    telescope_caps = np.asarray(telescope_caps, dtype=float)
    task_reqs = np.asarray(task_reqs, dtype=float)

    # Element-wise compatibility: fraction of each requirement met, capped
    # at 1 (assumes all task requirements are > 0)
    compatibility_vector = np.minimum(telescope_caps, task_reqs) / task_reqs

    # Weighted importance (some factors matter more, e.g. aperture vs. FOV)
    bid_vector = compatibility_vector * np.asarray(weights, dtype=float)

    # Single bid strength: cosine similarity between bid and requirements
    bid_score = np.dot(bid_vector, task_reqs) / (
        np.linalg.norm(bid_vector) * np.linalg.norm(task_reqs))

    return bid_vector, bid_score

APIs:

  • POST /bids - Submit bid
  • DELETE /bids/{id} - Withdraw bid (if outbid elsewhere)

5. Queue Management Service

Responsibility: Prevent double-booking and manage telescope schedules

Key Functions:

  • Track active bids per telescope
  • Auto-remove telescope from other auctions when it wins one
  • Priority ordering when telescope wins multiple tasks
  • Schedule conflict resolution

Logic:

On telescope winning Task A:
  1. Remove the telescope from all other active auctions
  2. Add Task A to the telescope's queue
  3. If the queue has multiple tasks → sort by priority_level

On telescope freed:
  1. Check the queue
  2. If empty → mark available, re-enable auction participation
  3. If it has tasks → execute the next task in the queue
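A minimal in-memory sketch of these rules (class and field names are illustrative; a production service would back this state with Redis/Postgres rather than plain dicts):

```python
class QueueManager:
    """Toy implementation of the anti-double-booking rules above."""
    def __init__(self):
        self.active_auctions = {}  # auction_id -> set of telescope ids
        self.queues = {}           # telescope_id -> list of (priority, task_id)

    def on_win(self, telescope_id, task_id, priority):
        # Remove the winner from every other active auction
        for bidders in self.active_auctions.values():
            bidders.discard(telescope_id)
        # Queue the task, highest priority_level first
        queue = self.queues.setdefault(telescope_id, [])
        queue.append((priority, task_id))
        queue.sort(key=lambda item: -item[0])

    def on_freed(self, telescope_id):
        # Return the next task to execute, or None if the queue is empty
        queue = self.queues.get(telescope_id, [])
        return queue.pop(0)[1] if queue else None
```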

APIs:

  • GET /telescopes/{id}/queue - Get task queue
  • POST /telescopes/{id}/queue/reorder - Manual priority adjustment

6. Vector Similarity Service

Responsibility: Centralized vector operations and scoring

Key Functions:

  • Cosine similarity calculations
  • Euclidean distance (for alternative scoring)
  • Dimensionality reduction (if vectors get too large)
  • Threshold validation

Why separate service?

  • Computationally intensive
  • Might want GPU acceleration later
  • Reusable across bidding and matching

APIs:

  • POST /similarity/cosine - Calculate cosine similarity
  • POST /similarity/threshold_check - Verify if telescope meets minimum

7. Notification Service

Responsibility: Real-time updates to telescopes and task submitters

Key Functions:

  • WebSocket connections to telescopes
  • Auction start/end notifications
  • Bid acceptance/rejection alerts
  • Task assignment confirmations

Events:

- auction.started
- auction.closed
- bid.accepted
- bid.rejected
- task.assigned
- task.completed
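The event list above can be exercised with a tiny in-process pub/sub stand-in. The real service would push these over WebSockets or the event bus; the class and event payloads here are illustrative only:

```python
class NotificationHub:
    """Minimal in-process pub/sub: subscribers register a callback per
    event name, publish fans the payload out to them."""
    def __init__(self):
        self.subscribers = {}  # event_name -> list of callbacks

    def subscribe(self, event_name, callback):
        self.subscribers.setdefault(event_name, []).append(callback)

    def publish(self, event_name, payload):
        for callback in self.subscribers.get(event_name, []):
            callback(payload)

hub = NotificationHub()
received = []
hub.subscribe("auction.closed", received.append)
hub.publish("auction.closed", {"auction_id": "a42", "winner": "T7"})
```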

8. Analytics & RL Training Service (Future)

Responsibility: Learning optimal bidding strategies

Key Functions:

  • Historical auction data collection
  • Telescope utilization tracking
  • RL model training (bid amount optimization, auction timing)
  • A/B testing for bidding strategies

RL State Space:

State: [telescope_queue_length, task_urgency, current_bids_count, time_remaining_in_auction, ...]
Action: bid_aggressiveness_factor (0.0 - 1.0)
Reward: (task_completed * priority_weight) - (opportunity_cost)
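A concrete (illustrative) reading of the state/action/reward sketch above. The bid-scaling policy and the reward weights are assumptions, not a trained agent; an RL model would learn to choose the aggressiveness factor:

```python
def adjust_bid(base_bid_score, aggressiveness):
    """Action: an aggressiveness factor in [0, 1] scales how much of its
    true capability score the agent bids (illustrative linear policy)."""
    assert 0.0 <= aggressiveness <= 1.0
    return base_bid_score * (0.5 + 0.5 * aggressiveness)

def compute_reward(task_completed, priority_weight, opportunity_cost):
    """Reward = (task_completed * priority_weight) - opportunity_cost."""
    return (1.0 if task_completed else 0.0) * priority_weight - opportunity_cost
```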

Supporting Services

9. API Gateway

  • Single entry point
  • Authentication/authorization
  • Rate limiting
  • Request routing

10. Event Bus (Message Queue)

  • Tech: RabbitMQ / Kafka
  • Auction events, bid submissions, task updates
  • Ensures asynchronous communication

11. Database Strategy

  • Task Registry: PostgreSQL (relational, ACID for task states)
  • Telescope Registry: PostgreSQL + Redis (fast availability checks)
  • Auction Engine: Redis (fast in-memory auction state)
  • Vector Storage: Specialized vector DB like Pinecone/Weaviate OR PostgreSQL with pgvector extension

High-Priority Auction Handling

Two-tier queue system:

  1. Normal Priority Queue:

    • Standard auction duration
    • Fair competition

  2. High Priority Queue:

    • Shorter auction duration (faster matching)
    • May have premium scoring (priority_boost in similarity calculation)
    • Pre-emption rights (can bump normal tasks if a telescope is already assigned)

Implementation:

if task.priority == "high":
    auction_duration = 30  # seconds, vs. 300 (5 min) for normal priority
    if telescope_already_assigned_to_normal_task:
        consider_preemption()  # based on the priority delta
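The consider_preemption() step could look like this minimal sketch, keyed on the priority delta. The threshold value is an assumption (and is exactly the kind of knob worth making per-deployment configurable):

```python
def should_preempt(incoming_priority, current_priority, min_delta=2):
    """Bump the running normal task only if the incoming task outranks it
    by at least min_delta priority levels (threshold is illustrative)."""
    return incoming_priority - current_priority >= min_delta
```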

Questions for you:

  1. Vector dimensions: What specific factors should we include? I'm thinking:

    • Aperture (mm)
    • Wavelength coverage (UV/Optical/IR bitmap or ranges)
    • Resolution (arcsec/pixel)
    • Location quality (seeing, light pollution score)
    • Instrument suite (spectrograph, camera types)
    • Mount precision (tracking accuracy)

    Should we normalize these to 0-1 range?

  2. Auction duration: Should this be fixed per priority level, or dynamic based on task complexity?

  3. Re-auction policy: If a task gets no valid bids, do we:

    • Relax minimum thresholds?
    • Extend auction time?
    • Alert the submitter?

  4. Technology preferences: Python/Go for services? Any cloud platform preference (AWS/GCP as you mentioned)?

Want me to dive deeper into any specific service or create API specifications?


Great concept! Let me make sure I understand the core system, then I'll ask clarifying questions.

What I understand so far:

You're building an auction-based telescope scheduling system where:

  • Telescopes bid on observation tasks
  • Auction time = tradeoff between speed vs quality matching
  • Bids are vectorized representations of telescope capability for specific tasks
  • Future RL agent could manage bidding strategy
  • Anti-double-booking: if outbid, removed from queue
  • If telescope wins multiple tasks: executes in priority order

Key questions I need answered:

1. Task characteristics - what defines an observation task?

  • What parameters matter? (target coordinates, wavelength/spectrum needed, exposure time, resolution requirements, time constraints like "must observe during X window", weather conditions?)
  • Are some tasks more time-sensitive than others?
  • Do tasks have minimum quality thresholds or is it just "best available"?

2. Telescope capabilities - what makes one telescope better for a task?

  • Aperture size, wavelength sensitivity, location (weather/seeing conditions), mount type, instrument availability?
  • Is this for a network of different telescope types or similar ones?
  • Do telescopes have specializations (planetary, deep sky, spectroscopy, etc.)?

3. The vectorization vs single number question: I'm leaning toward vector representation because:

  • Telescopes are multidimensional (aperture, location, instruments)
  • Tasks are multidimensional (target type, wavelength, resolution)
  • A single "quality score" loses nuance (a giant infrared telescope might be "high quality" but terrible for UV observations)

But we could use vectors with a similarity metric (cosine similarity, Euclidean distance) to get a single bid-strength number. Thoughts?

4. Auction mechanics:

  • How do you determine the winner? Highest bid score? Or bid score + auction time weighting?
  • Who sets the auction duration - the task submitter or the system?
  • What happens if NO telescope bids meet minimum requirements?

5. Priority order for multiple wins:

  • How is priority determined? Task urgency? Scientific value? Bid strength?

Let me know which areas to dive deeper on!


Vector Representation Deep Dive

Let me break down vectorization in a way that'll make implementation crystal clear for you.


What is a Vector in This Context?

A vector is just an ordered list of numbers that represents characteristics. Think of it like a standardized report card where every telescope and every task gets graded on the same criteria.

Example:

Telescope A = [0.8,      0.6,        0.9,        0.3,      0.7]
               aperture  wavelength  resolution  location  instruments

Step-by-Step Vectorization Process

Step 1: Define Your Dimensions (Features)

Let's start with a practical set. I'll explain each:

VECTOR_DIMENSIONS = {
    0: "aperture_score",           # Telescope light-gathering power
    1: "wavelength_coverage",      # What light spectrum it can see
    2: "resolution_score",         # Image sharpness capability
    3: "location_quality",         # Sky conditions (seeing, weather)
    4: "instrument_versatility",   # Available instruments
    5: "mount_precision",          # Tracking accuracy
    6: "field_of_view"            # How much sky it can see at once
}

Total dimensions: 7 (you can add/remove based on what actually matters for your use case)


Step 2: Normalization (Converting Real Values to 0-1 Scale)

This is critical because different features have wildly different scales:

  • Aperture: 100mm to 10,000mm
  • Resolution: 0.5 arcsec/pixel to 5 arcsec/pixel
  • Location quality: Subjective

Normalization ensures fairness in comparisons.

Method: Min-Max Normalization

Formula: normalized_value = (actual - min) / (max - min)

Let me show you with real examples:


Example 1: Aperture Score

# Real-world aperture range
MIN_APERTURE = 100  # mm (small amateur scope)
MAX_APERTURE = 10000  # mm (large professional scope)

def normalize_aperture(aperture_mm):
    return (aperture_mm - MIN_APERTURE) / (MAX_APERTURE - MIN_APERTURE)

# Examples:
# 200mm scope: (200 - 100) / (10000 - 100) = 0.01
# 2000mm scope: (2000 - 100) / (10000 - 100) = 0.19
# 8000mm scope: (8000 - 100) / (10000 - 100) = 0.80

Example 2: Wavelength Coverage

This one's trickier because wavelengths aren't a single number. Here's how I'd approach it:

Option A: Multi-band Boolean (simpler)

# Check which bands the telescope can observe
def normalize_wavelength(telescope_bands):
    """
    telescope_bands = ["optical", "near_ir"]
    """
    all_bands = ["uv", "optical", "near_ir", "mid_ir", "far_ir"]

    coverage = [1 if band in telescope_bands else 0 for band in all_bands]
    # Returns: [0, 1, 1, 0, 0] for example above

    # Convert to single score: percentage of spectrum covered
    return sum(coverage) / len(all_bands)
    # Example: 2/5 = 0.4

Option B: Weighted Range (more accurate)

# For a telescope with wavelength range 400-2500nm
def normalize_wavelength_range(min_wavelength, max_wavelength):
    FULL_SPECTRUM_MIN = 100   # UV start (nm)
    FULL_SPECTRUM_MAX = 10000 # Far IR end (nm)

    # Calculate coverage as percentage of full spectrum
    coverage = (max_wavelength - min_wavelength) / (FULL_SPECTRUM_MAX - FULL_SPECTRUM_MIN)

    # Clip to 0-1
    return min(1.0, max(0.0, coverage))

# Example: (2500 - 400) / (10000 - 100) = 2100/9900 = 0.21

For tasks: You'd do the same - what wavelength does the observation need?


Example 3: Resolution Score

# Lower arcsec/pixel = better resolution
MIN_RESOLUTION = 0.3  # arcsec/pixel (excellent)
MAX_RESOLUTION = 5.0  # arcsec/pixel (poor)

def normalize_resolution(arcsec_per_pixel):
    # INVERSE because lower is better
    normalized = 1 - ((arcsec_per_pixel - MIN_RESOLUTION) / (MAX_RESOLUTION - MIN_RESOLUTION))

    # Clamp to 0-1
    return max(0.0, min(1.0, normalized))

# Examples:
# 0.5 arcsec/pixel: 1 - ((0.5 - 0.3)/(5.0 - 0.3)) = 1 - 0.04 = 0.96 (excellent)
# 3.0 arcsec/pixel: 1 - ((3.0 - 0.3)/(5.0 - 0.3)) = 1 - 0.57 = 0.43 (mediocre)

Example 4: Location Quality (Seeing Conditions)

This combines weather, light pollution, atmospheric seeing:

def normalize_location_quality(telescope_location_data):
    """
    Factors:
    - Bortle scale (light pollution): 1-9 (lower better)
    - Average seeing (arcseconds): 0.5-3.0 (lower better)
    - Clear nights per year: 0-365
    """

    # Normalize each subfactor
    light_pollution_score = 1 - ((telescope_location_data['bortle'] - 1) / 8)
    seeing_score = 1 - ((telescope_location_data['seeing'] - 0.5) / 2.5)
    weather_score = telescope_location_data['clear_nights'] / 365

    # Weighted average (you decide weights)
    weights = [0.3, 0.4, 0.3]
    location_quality = (
        light_pollution_score * weights[0] +
        seeing_score * weights[1] +
        weather_score * weights[2]
    )

    return location_quality

# Example:
# Bortle 3, Seeing 1.2", 250 clear nights
# LP: 1 - (2/8) = 0.75
# Seeing: 1 - (0.7/2.5) = 0.72
# Weather: 250/365 = 0.68
# Final: 0.75*0.3 + 0.72*0.4 + 0.68*0.3 = 0.717

Example 5: Instrument Versatility

def normalize_instruments(available_instruments):
    """
    available_instruments = ["imaging_camera", "spectrograph", "polarimeter"]
    """
    all_possible = [
        "imaging_camera",
        "spectrograph", 
        "photometer",
        "polarimeter",
        "coronagraph",
        "adaptive_optics"
    ]

    # Simple: what percentage of instruments do you have?
    return len([i for i in available_instruments if i in all_possible]) / len(all_possible)

# Example: 3/6 = 0.5

Complete Telescope Vector Example

class TelescopeVectorizer:
    def __init__(self):
        # Define your normalization ranges
        self.ranges = {
            'aperture': (100, 10000),
            'resolution': (0.3, 5.0),
            'seeing': (0.5, 3.0),
            'bortle': (1, 9),
            'clear_nights': (0, 365)
        }

    def vectorize_telescope(self, telescope_data):
        """
        telescope_data = {
            'aperture_mm': 2000,
            'wavelength_range': (400, 2500),  # nm
            'resolution': 0.8,  # arcsec/pixel
            'location': {
                'bortle': 3,
                'seeing': 1.2,
                'clear_nights': 250
            },
            'instruments': ['imaging_camera', 'spectrograph'],
            'mount_accuracy': 0.95,  # Already 0-1 (tracking precision %)
            'fov': 2.5  # degrees
        }
        """

        vector = []

        # 1. Aperture
        aperture_norm = (telescope_data['aperture_mm'] - self.ranges['aperture'][0]) / \
                       (self.ranges['aperture'][1] - self.ranges['aperture'][0])
        vector.append(aperture_norm)

        # 2. Wavelength coverage
        wl_coverage = (telescope_data['wavelength_range'][1] - telescope_data['wavelength_range'][0]) / 9900
        vector.append(min(1.0, wl_coverage))

        # 3. Resolution (inverse)
        res_norm = 1 - ((telescope_data['resolution'] - self.ranges['resolution'][0]) / \
                       (self.ranges['resolution'][1] - self.ranges['resolution'][0]))
        vector.append(max(0.0, res_norm))

        # 4. Location quality
        loc = telescope_data['location']
        lp_score = 1 - ((loc['bortle'] - 1) / 8)
        seeing_score = 1 - ((loc['seeing'] - 0.5) / 2.5)
        weather_score = loc['clear_nights'] / 365
        location_quality = (lp_score * 0.3 + seeing_score * 0.4 + weather_score * 0.3)
        vector.append(location_quality)

        # 5. Instruments
        all_instruments = ['imaging_camera', 'spectrograph', 'photometer', 'polarimeter', 'coronagraph', 'adaptive_optics']
        instrument_score = len([i for i in telescope_data['instruments'] if i in all_instruments]) / len(all_instruments)
        vector.append(instrument_score)

        # 6. Mount precision (already normalized)
        vector.append(telescope_data['mount_accuracy'])

        # 7. Field of view
        fov_norm = min(1.0, telescope_data['fov'] / 10.0)  # Assuming max useful FOV = 10 degrees
        vector.append(fov_norm)

        return vector

# Example usage:
vectorizer = TelescopeVectorizer()
telescope_vector = vectorizer.vectorize_telescope({
    'aperture_mm': 2000,
    'wavelength_range': (400, 2500),
    'resolution': 0.8,
    'location': {'bortle': 3, 'seeing': 1.2, 'clear_nights': 250},
    'instruments': ['imaging_camera', 'spectrograph'],
    'mount_accuracy': 0.95,
    'fov': 2.5
})

print(telescope_vector)
# Output: [0.19, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25]

Vectorizing Tasks

Same dimensions, different interpretation:

def vectorize_task(task_requirements):
    """
    task_requirements = {
        'min_aperture': 1500,  # Needs at least 1500mm
        'wavelength_needed': (500, 900),  # Optical only
        'required_resolution': 1.0,  # Max 1.0 arcsec/pixel
        'location_constraint': None,  # or specific: {'min_clear_probability': 0.7}
        'required_instruments': ['imaging_camera'],
        'tracking_precision_needed': 0.90,
        'min_fov': 1.0
    }
    """

    vector = []

    # Aperture requirement (normalized)
    vector.append((task_requirements['min_aperture'] - 100) / 9900)

    # Wavelength (what's needed)
    wl_range = task_requirements['wavelength_needed']
    vector.append((wl_range[1] - wl_range[0]) / 9900)

    # Resolution requirement
    vector.append(1 - ((task_requirements['required_resolution'] - 0.3) / 4.7))

    # Location (if constrained, otherwise neutral 0.5)
    if task_requirements['location_constraint']:
        vector.append(task_requirements['location_constraint']['min_clear_probability'])
    else:
        vector.append(0.5)  # Neutral - any location okay

    # Instruments needed
    vector.append(len(task_requirements['required_instruments']) / 6)

    # Mount precision
    vector.append(task_requirements['tracking_precision_needed'])

    # FOV needed
    vector.append(min(1.0, task_requirements['min_fov'] / 10.0))

    return vector

Comparing Vectors: Cosine Similarity

Once you have both vectors, you need to measure "how well does telescope match task?"

import numpy as np

def cosine_similarity(vec_a, vec_b):
    """
    Returns a score from 0 (completely different) to 1 (identical)
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)

    dot_product = np.dot(vec_a, vec_b)
    magnitude_a = np.linalg.norm(vec_a)
    magnitude_b = np.linalg.norm(vec_b)

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)

# Example:
telescope_vec = [0.19, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25]
task_vec = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.10]

similarity = cosine_similarity(telescope_vec, task_vec)
print(f"Match score: {similarity:.3f}")  # prints "Match score: 0.980" for the vectors above

Minimum Threshold Check

Before bidding, check if telescope meets minimums:

def meets_minimum_requirements(telescope_vec, task_vec, min_threshold=0.7):
    """
    Element-wise check: is telescope >= task requirement for each dimension?
    Then overall similarity must exceed threshold
    """

    # Check critical dimensions individually
    critical_dims = [0, 1, 2]  # aperture, wavelength, resolution

    for dim in critical_dims:
        if telescope_vec[dim] < task_vec[dim]:
            return False  # Hard fail on critical requirements

    # Overall match check
    similarity = cosine_similarity(telescope_vec, task_vec)
    return similarity >= min_threshold
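
To sanity-check the gate end to end, here's a standalone version of the check run on the example vectors from above (both helpers are repeated so the snippet runs on its own):

```python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    a, b = np.array(vec_a), np.array(vec_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def meets_minimum_requirements(telescope_vec, task_vec, min_threshold=0.7):
    # Hard fail if the telescope falls below the task on a critical dimension
    for dim in (0, 1, 2):  # aperture, wavelength, resolution
        if telescope_vec[dim] < task_vec[dim]:
            return False
    return cosine_similarity(telescope_vec, task_vec) >= min_threshold

telescope_vec = [0.19, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25]
task_vec = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.10]
print(meets_minimum_requirements(telescope_vec, task_vec))  # True

# Fails the hard check: aperture (dim 0) is below the task's requirement
print(meets_minimum_requirements([0.10, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25], task_vec))  # False
```

Note the two failure modes are deliberately different: a critical-dimension shortfall is an immediate hard fail, while an overall weak match fails only the threshold.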

Questions for Refinement:

  1. Which dimensions matter most for your telescopes? Should I adjust the list?

  2. Wavelength handling: Do you want the simple boolean bands or the continuous range approach?

  3. Weighting: Should some dimensions matter more? (e.g., aperture 2x more important than FOV?) We can add weights:

    weights = [2.0, 1.5, 1.5, 1.0, 1.0, 1.2, 0.8]  # Multiply each dimension
    weighted_vec = [v * w for v, w in zip(vector, weights)]
    
  4. Minimum thresholds: Should this be a global constant (0.7) or per-task configurable?

Want me to write the actual Python service code for the Vector Similarity Service now?

Weighted Vector Comparison - The Solution

You've hit on the key challenge: not all dimensions matter equally for every task!


The Problem

Example scenario:

  • Task A (Galaxy imaging): Needs huge aperture, doesn't care much about resolution or FOV
  • Task B (Exoplanet detection): Needs extreme precision/resolution, moderate aperture is fine

If we use simple cosine similarity, all dimensions are treated equally. A telescope with great resolution but a small aperture would score similarly for both tasks, which is wrong!


Solution: Per-Task Dimension Weights

Instead of a single vector, each task defines its own importance weights for each dimension.

Conceptual Example

# Task: Deep sky galaxy imaging
task_weights = [
    2.5,  # aperture - CRITICAL (need light gathering)
    1.0,  # wavelength - moderate
    0.5,  # resolution - less important (galaxies are large)
    1.5,  # location - important (long exposures need dark skies)
    0.3,  # instruments - basic camera is fine
    1.0,  # mount precision - moderate
    0.8   # FOV - somewhat important (want to capture full galaxy)
]

# Task: Exoplanet transit detection
task_weights = [
    1.0,  # aperture - moderate (target stars are bright)
    0.8,  # wavelength - specific but not critical
    2.5,  # resolution - CRITICAL (need to isolate star)
    2.0,  # location - CRITICAL (can't have clouds during transit)
    1.8,  # instruments - need photometer/spectrograph
    2.5,  # mount precision - CRITICAL (long stable tracking)
    0.3   # FOV - don't care (single point target)
]

Implementation Approaches

Approach 1: Weighted Cosine Similarity (Recommended)

Apply task-specific weights before calculating similarity:

import numpy as np

def weighted_cosine_similarity(telescope_vec, task_vec, task_weights):
    """
    Multiply each dimension by its importance weight before comparison

    Args:
        telescope_vec: [0.19, 0.21, 0.89, ...]
        task_vec: [0.14, 0.04, 0.85, ...]
        task_weights: [2.5, 1.0, 0.5, ...] - importance multipliers

    Returns:
        Similarity score 0-1
    """
    telescope_vec = np.array(telescope_vec)
    task_vec = np.array(task_vec)
    task_weights = np.array(task_weights)

    # Apply weights to both vectors
    weighted_telescope = telescope_vec * task_weights
    weighted_task = task_vec * task_weights

    # Standard cosine similarity on weighted vectors
    dot_product = np.dot(weighted_telescope, weighted_task)
    magnitude_telescope = np.linalg.norm(weighted_telescope)
    magnitude_task = np.linalg.norm(weighted_task)

    if magnitude_telescope == 0 or magnitude_task == 0:
        return 0.0

    return dot_product / (magnitude_telescope * magnitude_task)


# Example usage:
telescope = [0.19, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25]
task = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.10]

# Galaxy imaging weights (aperture matters most)
galaxy_weights = [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]
galaxy_score = weighted_cosine_similarity(telescope, task, galaxy_weights)

# Exoplanet weights (precision matters most)
exoplanet_weights = [1.0, 0.8, 2.5, 2.0, 1.8, 2.5, 0.3]
exoplanet_score = weighted_cosine_similarity(telescope, task, exoplanet_weights)

print(f"Galaxy imaging fit: {galaxy_score:.3f}")
print(f"Exoplanet detection fit: {exoplanet_score:.3f}")

Why this works:

  • Dimensions with higher weights contribute more to the final score
  • A telescope excellent in high-weight dimensions will score better
  • Still normalized 0-1, so scores are comparable

Approach 2: Weighted Euclidean Distance (Alternative)

Measures "distance" between vectors - lower is better match:

def weighted_euclidean_distance(telescope_vec, task_vec, task_weights):
    """
    Calculate weighted distance - lower = better match
    """
    telescope_vec = np.array(telescope_vec)
    task_vec = np.array(task_vec)
    task_weights = np.array(task_weights)

    # Squared differences, weighted
    differences = (telescope_vec - task_vec) ** 2
    weighted_diff = differences * task_weights

    distance = np.sqrt(np.sum(weighted_diff))

    # Convert to similarity score (0-1, higher better)
    # Max possible distance (every 0-1 dimension differs by 1) = sqrt(sum(weights))
    max_distance = np.sqrt(np.sum(task_weights))
    similarity = 1 - (distance / max_distance)

    return max(0.0, similarity)

When to use this:

  • If you care about absolute differences (e.g., "telescope is 0.3 worse than needed in aperture")
  • Cosine similarity cares more about direction/proportion

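
To make the direction-vs-difference distinction concrete, here's a small standalone comparison (simplified to 3 dimensions rather than the full 7-dimension schema) of the two metrics on a telescope that exceeds every requirement by the same proportion:

```python
import numpy as np

def weighted_cosine_similarity(vec_a, vec_b, weights):
    # Weight both vectors, then take the cosine of the angle between them
    a = np.array(vec_a) * np.array(weights)
    b = np.array(vec_b) * np.array(weights)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def weighted_euclidean_distance_score(vec_a, vec_b, weights):
    # Weighted distance converted to a 0-1 similarity (higher = better)
    w = np.array(weights)
    dist = np.sqrt(np.sum(w * (np.array(vec_a) - np.array(vec_b)) ** 2))
    return max(0.0, 1 - dist / np.sqrt(np.sum(w)))

# Telescope overshoots every requirement by the same proportion
telescope = [0.8, 0.8, 0.8]
task = [0.4, 0.4, 0.4]
weights = [1.0, 1.0, 1.0]

print(weighted_cosine_similarity(telescope, task, weights))        # 1.0
print(weighted_euclidean_distance_score(telescope, task, weights)) # 0.6
```

Cosine sees the two vectors pointing the same way and scores a perfect match; the distance-based score penalizes the 0.4 gap in every dimension.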
How Task Submitters Define Weights

Option 1: Predefined Task Categories (Easiest for users)

Users pick a category, system applies preset weights:

TASK_CATEGORIES = {
    "deep_sky_imaging": {
        "description": "Faint galaxies, nebulae, star clusters",
        "weights": [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]
    },
    "planetary": {
        "description": "Planets, Moon, solar system objects",
        "weights": [1.2, 1.0, 2.5, 0.8, 0.5, 2.0, 0.3]
    },
    "exoplanet_transit": {
        "description": "High-precision photometry",
        "weights": [1.0, 0.8, 2.5, 2.0, 1.8, 2.5, 0.3]
    },
    "spectroscopy": {
        "description": "Spectrum analysis of stars/galaxies",
        "weights": [1.5, 2.0, 1.0, 1.2, 2.5, 1.5, 0.5]
    },
    "survey_wide_field": {
        "description": "Large area sky surveys",
        "weights": [0.8, 1.0, 0.5, 1.0, 0.5, 1.2, 2.5]
    },
    "time_domain": {
        "description": "Supernovae, variable stars, transients",
        "weights": [1.0, 1.0, 1.0, 2.5, 1.0, 1.5, 1.0]
    }
}

# In task submission:
task_data = {
    "category": "deep_sky_imaging",
    "requirements": {...},
    # Weights automatically pulled from category
}

Option 2: Custom Weight Sliders (More flexible)

UI lets users adjust importance:

Task Submission Form:
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Aperture importance:      ████████░░  (8/10)
Wavelength importance:    ███████░░░  (7/10)
Resolution importance:    ██████████  (10/10) ← Critical!
Location importance:      ██████░░░░  (6/10)
...

Backend converts slider values (1-10) to weights (0.1-3.0):

def slider_to_weight(slider_value):
    """
    Slider: 1-10
    Output: 0.1-3.0 (linear scale)
    """
    return 0.1 + (slider_value - 1) * (2.9 / 9)
    # 1 → 0.1, 5 ≈ 1.39, 10 → 3.0

Option 3: Automatic Weight Inference (ML-powered, future)

System learns weights from historical data:

# After many tasks completed, analyze which dimensions correlated with success
# "Galaxy imaging tasks with high aperture match → 95% success rate"
# "Galaxy imaging tasks with low aperture match → 40% success rate"
# → Increase aperture weight for galaxy tasks
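
As a rough sketch of what that inference could look like (a hypothetical helper using simple per-dimension correlation rather than a trained model, with an assumed 1.0-3.0 output scale):

```python
import numpy as np

def infer_weights_from_history(match_vectors, outcomes, base_weight=1.0):
    """
    Hypothetical sketch: rows of per-dimension match scores for past
    tasks, plus a 0/1 success outcome per task. Dimensions whose match
    score correlates with success get boosted; the rest stay near base.
    """
    X = np.array(match_vectors, dtype=float)
    y = np.array(outcomes, dtype=float)
    weights = []
    for dim in range(X.shape[1]):
        col = X[:, dim]
        if np.std(col) == 0 or np.std(y) == 0:
            weights.append(base_weight)  # no signal to learn from
            continue
        corr = np.corrcoef(col, y)[0, 1]  # Pearson correlation, -1..1
        # Map positive correlation onto an assumed 1.0-3.0 weight scale
        weights.append(round(base_weight + max(0.0, corr) * 2.0, 2))
    return weights

# Toy history: success tracks dimension 0 (aperture match), not dimension 1
history = [[0.9, 0.2], [0.8, 0.9], [0.2, 0.3], [0.3, 0.8]]
outcomes = [1, 1, 0, 0]
print(infer_weights_from_history(history, outcomes))
# dimension 0 gets boosted (~2.97), dimension 1 stays at 1.0
```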

Updated Task Vector Structure

class Task:
    def __init__(self, requirements, category=None, custom_weights=None):
        self.id = generate_id()  # ID helper assumed to exist elsewhere
        self.requirements_vector = vectorize_requirements(requirements)  # vectorizer from earlier

        # Determine weights
        if custom_weights:
            self.dimension_weights = custom_weights
        elif category:
            self.dimension_weights = TASK_CATEGORIES[category]["weights"]
        else:
            self.dimension_weights = [1.0] * len(self.requirements_vector)  # Equal weights

        self.min_threshold = 0.7  # Overall similarity threshold

# Example task creation:
task = Task(
    requirements={
        'min_aperture': 2000,
        'wavelength_range': (400, 700),
        'required_resolution': 0.8,
        # ... other requirements
    },
    category="deep_sky_imaging"
)

Updated Bidding Logic

class BiddingService:
    def evaluate_bid(self, telescope_id, telescope_vector, task):
        """
        Calculate bid score using task-specific weights
        """
        # Get weighted similarity
        similarity = weighted_cosine_similarity(
            telescope_vector,
            task.requirements_vector,
            task.dimension_weights
        )

        # Check minimum threshold
        if similarity < task.min_threshold:
            return None  # Cannot bid

        # Optional: Check hard requirements (must-haves)
        if not self._meets_hard_requirements(telescope_vector, task):
            return None

        return {
            'telescope_id': telescope_id,
            'bid_score': similarity,
            'bid_vector': telescope_vector,
            'timestamp': datetime.now(timezone.utc)  # assumes: from datetime import datetime, timezone
        }

    def _meets_hard_requirements(self, telescope_vec, task):
        """
        Some dimensions might have absolute minimums regardless of weights
        """
        # Example: If task requires aperture >= 0.15 (normalized)
        if task.requirements_vector[0] > telescope_vec[0]:
            return False  # Telescope too small

        # Add other hard checks here
        return True

Visualizing The Difference

Let's see how weights change rankings:

# Three telescopes
telescopes = {
    "BigScope": [0.80, 0.20, 0.60, 0.90, 0.30, 0.70, 0.20],  # Huge aperture, poor resolution
    "PreciseScope": [0.30, 0.60, 0.95, 0.70, 0.80, 0.95, 0.40],  # Small but precise
    "BalancedScope": [0.50, 0.50, 0.70, 0.60, 0.60, 0.70, 0.60]   # Average everything
}

task_vec = [0.40, 0.50, 0.70, 0.60, 0.50, 0.70, 0.50]

# Galaxy imaging (aperture-heavy)
galaxy_weights = [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]

# Exoplanet (precision-heavy)
exoplanet_weights = [1.0, 0.8, 2.5, 2.0, 1.8, 2.5, 0.3]

print("GALAXY IMAGING SCORES:")
for name, vec in telescopes.items():
    score = weighted_cosine_similarity(vec, task_vec, galaxy_weights)
    print(f"{name}: {score:.3f}")

print("\nEXOPLANET DETECTION SCORES:")
for name, vec in telescopes.items():
    score = weighted_cosine_similarity(vec, task_vec, exoplanet_weights)
    print(f"{name}: {score:.3f}")

# Output might be:
# GALAXY IMAGING SCORES:
# BigScope: 0.892      ← WINS (huge aperture dominates)
# PreciseScope: 0.721
# BalancedScope: 0.805

# EXOPLANET DETECTION SCORES:
# BigScope: 0.723
# PreciseScope: 0.951  ← WINS (precision dominates)
# BalancedScope: 0.834

See how the same telescopes rank differently based on task priorities!


Database Schema Update

-- Tasks table
CREATE TABLE tasks (
    id UUID PRIMARY KEY,
    submitter_id UUID,
    category VARCHAR(50),  -- deep_sky_imaging, planetary, etc.
    requirements_vector FLOAT[],  -- [0.14, 0.04, 0.85, ...]
    dimension_weights FLOAT[],    -- [2.5, 1.0, 0.5, ...] ← NEW!
    min_threshold FLOAT DEFAULT 0.7,
    auction_duration INTEGER,
    status VARCHAR(20),
    created_at TIMESTAMP
);

-- Bids table
CREATE TABLE bids (
    id UUID PRIMARY KEY,
    task_id UUID REFERENCES tasks(id),
    telescope_id UUID,
    telescope_vector FLOAT[],
    weighted_score FLOAT,  -- Pre-calculated using task weights
    submitted_at TIMESTAMP
);

API Example

# Task submission endpoint
@app.post("/tasks")
def create_task(task_data: TaskCreate):
    # Vectorize requirements
    requirements_vector = vectorize_task_requirements(task_data.requirements)

    # Get or create weights
    if task_data.custom_weights:
        weights = task_data.custom_weights
    elif task_data.category:
        weights = TASK_CATEGORIES[task_data.category]["weights"]
    else:
        weights = [1.0] * len(requirements_vector)

    task = Task(
        id=uuid4(),
        requirements_vector=requirements_vector,
        dimension_weights=weights,
        min_threshold=task_data.min_threshold or 0.7,
        category=task_data.category
    )

    # Save and trigger auction
    db.save(task)
    auction_engine.create_auction(task)

    return {"task_id": task.id}


# Bidding evaluation
@app.post("/bids/evaluate")
def evaluate_bid(telescope_id: str, task_id: str):
    telescope = db.get_telescope(telescope_id)
    task = db.get_task(task_id)

    score = weighted_cosine_similarity(
        telescope.capability_vector,
        task.requirements_vector,
        task.dimension_weights  # ← Uses task-specific weights!
    )

    if score < task.min_threshold:
        return {"can_bid": False, "reason": "Below threshold"}

    return {
        "can_bid": True,
        "bid_score": score,
        "rank_estimate": estimate_rank(score, task_id)
    }

Your Questions:

  1. Should weights be exposed to task submitters or hidden in categories?

    • Categories = easier UX for non-technical users
    • Custom sliders = power users get fine control
    • Both options?
    • Weight ranges: Should we limit weights (e.g., 0.1-3.0) or allow extreme values (0.1-10.0)?
  2. Default weights: If a task submitter doesn't specify, should we default to equal weights [1.0, 1.0, ...] or try to infer from task description using NLP?

  3. Weight validation: Should we enforce that weights sum to a certain value, or allow any positive values?
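
On question 3, one candidate policy (a sketch with assumed limits, not a settled design) is to accept any weights within a range and then rescale them to sum to the dimension count, which keeps scores comparable across tasks:

```python
def validate_and_normalize_weights(weights, dimension_count=7,
                                   min_weight=0.1, max_weight=10.0):
    """
    Sketch of one validation policy (assumed limits, not finalized):
    reject missing or out-of-range weights, then rescale so the weights
    sum to the dimension count. Equal weights stay [1.0, 1.0, ...].
    """
    if len(weights) != dimension_count:
        raise ValueError(f"Expected {dimension_count} weights, got {len(weights)}")
    for w in weights:
        if not (min_weight <= w <= max_weight):
            raise ValueError(f"Weight {w} outside [{min_weight}, {max_weight}]")
    scale = dimension_count / sum(weights)
    return [w * scale for w in weights]

# Deep sky weights, rescaled so their relative proportions are preserved
print(validate_and_normalize_weights([2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]))
```

Rescaling only changes the weights' absolute scale, not their proportions, so the ranking of telescopes for a given task is unaffected.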

Want me to code the complete weighted similarity service now?

Complete Vector Conversion Implementation Guide

I'll give you production-ready code you can use today, with clear examples for both telescopes and tasks.


Part 1: Configuration & Constants

First, define your normalization ranges and schema:

# vector_config.py

"""
Vector Configuration
Defines all dimensions, their ranges, and normalization parameters
"""

from typing import Dict, Tuple, List
from dataclasses import dataclass

@dataclass
class DimensionConfig:
    """Configuration for a single vector dimension"""
    name: str
    index: int
    min_value: float
    max_value: float
    inverse: bool = False  # True if lower values are better
    description: str = ""

class VectorConfig:
    """
    Central configuration for all vector dimensions
    """

    # Define all dimensions
    DIMENSIONS = [
        DimensionConfig(
            name="aperture",
            index=0,
            min_value=100.0,      # mm
            max_value=10000.0,    # mm
            inverse=False,
            description="Telescope aperture (light gathering power)"
        ),
        DimensionConfig(
            name="wavelength_coverage",
            index=1,
            min_value=0.0,        # Normalized coverage
            max_value=1.0,
            inverse=False,
            description="Proportion of electromagnetic spectrum covered"
        ),
        DimensionConfig(
            name="resolution",
            index=2,
            min_value=0.3,        # arcsec/pixel
            max_value=5.0,        # arcsec/pixel
            inverse=True,         # Lower is better
            description="Image resolution capability"
        ),
        DimensionConfig(
            name="location_quality",
            index=3,
            min_value=0.0,        # Composite score
            max_value=1.0,
            inverse=False,
            description="Site quality (seeing, weather, light pollution)"
        ),
        DimensionConfig(
            name="instrument_versatility",
            index=4,
            min_value=0.0,        # Proportion of instruments
            max_value=1.0,
            inverse=False,
            description="Available instrument suite"
        ),
        DimensionConfig(
            name="mount_precision",
            index=5,
            min_value=0.0,        # Tracking accuracy (0-1)
            max_value=1.0,
            inverse=False,
            description="Mount tracking precision"
        ),
        DimensionConfig(
            name="field_of_view",
            index=6,
            min_value=0.1,        # degrees
            max_value=10.0,       # degrees
            inverse=False,
            description="Observable sky area"
        )
    ]

    # Quick access
    DIMENSION_COUNT = len(DIMENSIONS)
    DIMENSION_NAMES = [d.name for d in DIMENSIONS]

    # Wavelength bands (for wavelength dimension)
    WAVELENGTH_BANDS = {
        "uv": (100, 400),         # nm
        "optical": (400, 700),    # nm
        "near_ir": (700, 2500),   # nm
        "mid_ir": (2500, 25000),  # nm
        "far_ir": (25000, 350000) # nm
    }

    # Full spectrum range
    SPECTRUM_MIN = 100      # nm (UV)
    SPECTRUM_MAX = 350000   # nm (Far IR)

    # Instrument types
    INSTRUMENT_TYPES = [
        "imaging_camera",
        "spectrograph",
        "photometer",
        "polarimeter",
        "coronagraph",
        "adaptive_optics"
    ]

    # Bortle scale (light pollution)
    BORTLE_MIN = 1  # Excellent dark sky
    BORTLE_MAX = 9  # Inner city

    # Seeing quality
    SEEING_MIN = 0.5   # arcseconds (excellent)
    SEEING_MAX = 3.0   # arcseconds (poor)

    # Clear nights per year
    CLEAR_NIGHTS_MIN = 0
    CLEAR_NIGHTS_MAX = 365

    @classmethod
    def get_dimension(cls, name: str) -> DimensionConfig:
        """Get dimension config by name"""
        for dim in cls.DIMENSIONS:
            if dim.name == name:
                return dim
        raise ValueError(f"Unknown dimension: {name}")

Part 2: Core Normalization Functions

# vector_utils.py

"""
Core normalization utilities
"""

import numpy as np
from typing import List
from vector_config import VectorConfig

def normalize_value(
    value: float,
    min_val: float,
    max_val: float,
    inverse: bool = False
) -> float:
    """
    Normalize a value to 0-1 range

    Args:
        value: Raw value to normalize
        min_val: Minimum possible value
        max_val: Maximum possible value
        inverse: If True, higher raw values produce lower normalized values

    Returns:
        Normalized value between 0 and 1
    """
    # Clamp to range first
    value = max(min_val, min(max_val, value))

    # Calculate normalized value
    if max_val == min_val:
        return 0.5  # Avoid division by zero

    normalized = (value - min_val) / (max_val - min_val)

    # Invert if specified (for metrics where lower is better)
    if inverse:
        normalized = 1.0 - normalized

    return float(np.clip(normalized, 0.0, 1.0))


def normalize_aperture(aperture_mm: float) -> float:
    """Normalize telescope aperture"""
    dim = VectorConfig.get_dimension("aperture")
    return normalize_value(aperture_mm, dim.min_value, dim.max_value, dim.inverse)


def normalize_resolution(arcsec_per_pixel: float) -> float:
    """Normalize image resolution (inverse: lower is better)"""
    dim = VectorConfig.get_dimension("resolution")
    return normalize_value(arcsec_per_pixel, dim.min_value, dim.max_value, dim.inverse)


def normalize_fov(fov_degrees: float) -> float:
    """Normalize field of view"""
    dim = VectorConfig.get_dimension("field_of_view")
    return normalize_value(fov_degrees, dim.min_value, dim.max_value, dim.inverse)


def normalize_wavelength_range(min_wavelength: float, max_wavelength: float) -> float:
    """
    Normalize wavelength coverage as a proportion of full spectrum

    Args:
        min_wavelength: Minimum wavelength in nm
        max_wavelength: Maximum wavelength in nm

    Returns:
        Coverage score 0-1
    """
    coverage = max_wavelength - min_wavelength
    full_spectrum = VectorConfig.SPECTRUM_MAX - VectorConfig.SPECTRUM_MIN

    # Calculate proportion covered
    score = coverage / full_spectrum

    return float(np.clip(score, 0.0, 1.0))


def normalize_wavelength_bands(bands: List[str]) -> float:
    """
    Alternative: Normalize wavelength by counting covered bands

    Args:
        bands: List of band names, e.g., ["optical", "near_ir"]

    Returns:
        Proportion of bands covered (0-1)
    """
    all_bands = list(VectorConfig.WAVELENGTH_BANDS.keys())
    covered = len([b for b in bands if b in all_bands])

    return covered / len(all_bands)


def normalize_location_quality(
    bortle: int,
    seeing: float,
    clear_nights: int,
    weights=(0.3, 0.4, 0.3)  # tuple default avoids a shared mutable default
) -> float:
    """
    Normalize location quality from multiple factors

    Args:
        bortle: Bortle scale (1-9, lower is better)
        seeing: Average seeing in arcseconds (lower is better)
        clear_nights: Clear nights per year (0-365)
        weights: Weights for [light_pollution, seeing, weather]

    Returns:
        Composite location quality score 0-1
    """
    # Light pollution (inverse: lower Bortle is better)
    lp_score = normalize_value(
        bortle,
        VectorConfig.BORTLE_MIN,
        VectorConfig.BORTLE_MAX,
        inverse=True
    )

    # Seeing (inverse: lower arcsec is better)
    seeing_score = normalize_value(
        seeing,
        VectorConfig.SEEING_MIN,
        VectorConfig.SEEING_MAX,
        inverse=True
    )

    # Weather (more clear nights is better)
    weather_score = normalize_value(
        clear_nights,
        VectorConfig.CLEAR_NIGHTS_MIN,
        VectorConfig.CLEAR_NIGHTS_MAX,
        inverse=False
    )

    # Weighted average
    total = (
        lp_score * weights[0] +
        seeing_score * weights[1] +
        weather_score * weights[2]
    )

    return float(np.clip(total, 0.0, 1.0))


def normalize_instruments(available_instruments: List[str]) -> float:
    """
    Normalize instrument versatility

    Args:
        available_instruments: List of available instrument types

    Returns:
        Proportion of possible instruments available (0-1)
    """
    all_instruments = VectorConfig.INSTRUMENT_TYPES
    available_count = len([i for i in available_instruments if i in all_instruments])

    return available_count / len(all_instruments)


def normalize_mount_precision(tracking_accuracy: float) -> float:
    """
    Normalize mount precision (already 0-1, just validate)

    Args:
        tracking_accuracy: Tracking accuracy as percentage (0.0-1.0)

    Returns:
        Validated mount precision 0-1
    """
    return float(np.clip(tracking_accuracy, 0.0, 1.0))

Part 3: Telescope Vectorizer

# telescope_vectorizer.py

"""
Convert telescope specifications to normalized vectors
"""

from typing import Dict, List, Optional
import numpy as np
from vector_config import VectorConfig
from vector_utils import (
    normalize_aperture,
    normalize_wavelength_range,
    normalize_wavelength_bands,
    normalize_resolution,
    normalize_location_quality,
    normalize_instruments,
    normalize_mount_precision,
    normalize_fov
)


class TelescopeVectorizer:
    """
    Converts telescope specifications to normalized capability vectors
    """

    def __init__(self):
        self.dimension_count = VectorConfig.DIMENSION_COUNT

    def vectorize(self, telescope_data: Dict) -> List[float]:
        """
        Convert telescope data to normalized vector

        Args:
            telescope_data: Dictionary containing telescope specifications

        Expected format:
        {
            'aperture_mm': 2000,
            'wavelength_range': (400, 2500),  # Option 1: continuous range
            'wavelength_bands': ["optical", "near_ir"],  # Option 2: discrete bands
            'resolution': 0.8,  # arcsec/pixel
            'location': {
                'bortle': 3,
                'seeing': 1.2,
                'clear_nights': 250
            },
            'instruments': ['imaging_camera', 'spectrograph'],
            'mount_accuracy': 0.95,  # 0-1
            'fov': 2.5  # degrees
        }

        Returns:
            List of 7 normalized values [0-1]
        """
        vector = []

        # 1. Aperture
        aperture = telescope_data.get('aperture_mm', 0)
        vector.append(normalize_aperture(aperture))

        # 2. Wavelength coverage
        if 'wavelength_range' in telescope_data:
            min_wl, max_wl = telescope_data['wavelength_range']
            vector.append(normalize_wavelength_range(min_wl, max_wl))
        elif 'wavelength_bands' in telescope_data:
            bands = telescope_data['wavelength_bands']
            vector.append(normalize_wavelength_bands(bands))
        else:
            vector.append(0.0)  # No wavelength info

        # 3. Resolution
        resolution = telescope_data.get('resolution', 5.0)  # Default to worst
        vector.append(normalize_resolution(resolution))

        # 4. Location quality
        if 'location' in telescope_data:
            loc = telescope_data['location']
            location_score = normalize_location_quality(
                bortle=loc.get('bortle', 5),
                seeing=loc.get('seeing', 2.0),
                clear_nights=loc.get('clear_nights', 180)
            )
            vector.append(location_score)
        else:
            vector.append(0.5)  # Neutral default

        # 5. Instruments
        instruments = telescope_data.get('instruments', [])
        vector.append(normalize_instruments(instruments))

        # 6. Mount precision
        mount_acc = telescope_data.get('mount_accuracy', 0.5)
        vector.append(normalize_mount_precision(mount_acc))

        # 7. Field of view
        fov = telescope_data.get('fov', 1.0)
        vector.append(normalize_fov(fov))

        return vector

    def vectorize_batch(self, telescopes: List[Dict]) -> np.ndarray:
        """
        Vectorize multiple telescopes at once

        Args:
            telescopes: List of telescope data dictionaries

        Returns:
            NumPy array of shape (n_telescopes, n_dimensions)
        """
        vectors = [self.vectorize(t) for t in telescopes]
        return np.array(vectors)

    def vector_to_dict(self, vector: List[float]) -> Dict[str, float]:
        """
        Convert vector back to labeled dictionary for debugging

        Args:
            vector: Normalized vector

        Returns:
            Dictionary mapping dimension names to values
        """
        return {
            dim.name: vector[dim.index]
            for dim in VectorConfig.DIMENSIONS
        }


# Example usage
if __name__ == "__main__":
    vectorizer = TelescopeVectorizer()

    # Example telescope
    telescope = {
        'aperture_mm': 2000,
        'wavelength_range': (400, 2500),
        'resolution': 0.8,
        'location': {
            'bortle': 3,
            'seeing': 1.2,
            'clear_nights': 250
        },
        'instruments': ['imaging_camera', 'spectrograph'],
        'mount_accuracy': 0.95,
        'fov': 2.5
    }

    vec = vectorizer.vectorize(telescope)
    print("Telescope vector:", vec)
    print("\nLabeled:")
    for name, value in vectorizer.vector_to_dict(vec).items():
        print(f"  {name}: {value:.3f}")

Part 4: Task Vectorizer

# task_vectorizer.py

"""
Convert task requirements to normalized vectors
"""

from typing import Dict, List, Optional
import numpy as np
from vector_config import VectorConfig
from vector_utils import (
    normalize_aperture,
    normalize_wavelength_range,
    normalize_resolution,
    normalize_instruments,
    normalize_mount_precision,
    normalize_fov
)


class TaskVectorizer:
    """
    Converts task requirements to normalized requirement vectors
    """

    def __init__(self):
        self.dimension_count = VectorConfig.DIMENSION_COUNT

    def vectorize(self, task_data: Dict) -> List[float]:
        """
        Convert task requirements to normalized vector

        Args:
            task_data: Dictionary containing task requirements

        Expected format:
        {
            'min_aperture': 1500,  # Minimum required aperture
            'wavelength_needed': (500, 900),  # Required wavelength range
            'max_resolution': 1.0,  # Maximum acceptable arcsec/pixel
            'location_constraint': {
                'min_clear_probability': 0.7  # Optional
            },
            'required_instruments': ['imaging_camera'],
            'min_tracking_precision': 0.90,
            'min_fov': 1.0,
            'target_type': 'deep_sky'  # Optional category hint
        }

        Returns:
            List of 7 normalized requirement values [0-1]
        """
        vector = []

        # 1. Minimum aperture requirement
        min_aperture = task_data.get('min_aperture', 100)  # Default to minimum
        vector.append(normalize_aperture(min_aperture))

        # 2. Wavelength requirement
        if 'wavelength_needed' in task_data:
            min_wl, max_wl = task_data['wavelength_needed']
            vector.append(normalize_wavelength_range(min_wl, max_wl))
        else:
            vector.append(0.5)  # Neutral - any wavelength okay

        # 3. Maximum resolution requirement (convert to "quality needed")
        # If task needs 1.0 arcsec/pixel max, normalize it
        max_resolution = task_data.get('max_resolution', 5.0)
        vector.append(normalize_resolution(max_resolution))

        # 4. Location constraint
        if 'location_constraint' in task_data:
            # Use specified minimum quality
            loc_quality = task_data['location_constraint'].get('min_clear_probability', 0.5)
            vector.append(loc_quality)
        else:
            vector.append(0.3)  # Low requirement - location doesn't matter much

        # 5. Required instruments
        required_instruments = task_data.get('required_instruments', [])
        vector.append(normalize_instruments(required_instruments))

        # 6. Minimum tracking precision
        min_precision = task_data.get('min_tracking_precision', 0.5)
        vector.append(normalize_mount_precision(min_precision))

        # 7. Minimum field of view
        min_fov = task_data.get('min_fov', 0.1)
        vector.append(normalize_fov(min_fov))

        return vector

    def vectorize_with_category(self, task_data: Dict, category: str) -> tuple:
        """
        Vectorize task and also return suggested weights for the category

        Args:
            task_data: Task requirements
            category: Task category (e.g., 'deep_sky_imaging')

        Returns:
            Tuple of (requirement_vector, suggested_weights)
        """
        vector = self.vectorize(task_data)
        weights = self._get_category_weights(category)

        return vector, weights

    def _get_category_weights(self, category: str) -> List[float]:
        """
        Get importance weights for a task category

        Args:
            category: Task category name

        Returns:
            List of dimension weights
        """
        # Predefined weights for common task types
        CATEGORY_WEIGHTS = {
            'deep_sky_imaging': [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8],
            'planetary': [1.2, 1.0, 2.5, 0.8, 0.5, 2.0, 0.3],
            'exoplanet_transit': [1.0, 0.8, 2.5, 2.0, 1.8, 2.5, 0.3],
            'spectroscopy': [1.5, 2.0, 1.0, 1.2, 2.5, 1.5, 0.5],
            'wide_field_survey': [0.8, 1.0, 0.5, 1.0, 0.5, 1.2, 2.5],
            'time_domain': [1.0, 1.0, 1.0, 2.5, 1.0, 1.5, 1.0],
        }

        return CATEGORY_WEIGHTS.get(category, [1.0] * self.dimension_count)

    def vector_to_dict(self, vector: List[float]) -> Dict[str, float]:
        """Convert vector to labeled dictionary"""
        return {
            dim.name: vector[dim.index]
            for dim in VectorConfig.DIMENSIONS
        }


# Example usage
if __name__ == "__main__":
    vectorizer = TaskVectorizer()

    # Example task
    task = {
        'min_aperture': 1500,
        'wavelength_needed': (500, 900),
        'max_resolution': 1.0,
        'location_constraint': {'min_clear_probability': 0.7},
        'required_instruments': ['imaging_camera'],
        'min_tracking_precision': 0.90,
        'min_fov': 1.0
    }

    vec, weights = vectorizer.vectorize_with_category(task, 'deep_sky_imaging')

    print("Task requirements vector:", vec)
    print("\nCategory weights (deep_sky_imaging):", weights)
    print("\nLabeled requirements:")
    for name, value in vectorizer.vector_to_dict(vec).items():
        print(f"  {name}: {value:.3f}")

Part 5: Similarity Calculation

# similarity.py

"""
Vector similarity and matching functions
"""

import numpy as np
from typing import List, Tuple, Optional


def cosine_similarity(vec_a: List[float], vec_b: List[float]) -> float:
    """
    Calculate cosine similarity between two vectors

    Args:
        vec_a: First vector
        vec_b: Second vector

    Returns:
        Similarity score in [0, 1] for these non-negative vectors (1 = same direction, 0 = orthogonal)
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)

    dot_product = np.dot(vec_a, vec_b)
    magnitude_a = np.linalg.norm(vec_a)
    magnitude_b = np.linalg.norm(vec_b)

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    similarity = dot_product / (magnitude_a * magnitude_b)

    # Clip to guard against tiny floating-point overshoot outside [0, 1]
    return float(np.clip(similarity, 0.0, 1.0))


def weighted_cosine_similarity(
    vec_a: List[float],
    vec_b: List[float],
    weights: List[float]
) -> float:
    """
    Calculate weighted cosine similarity

    Args:
        vec_a: First vector (e.g., telescope capabilities)
        vec_b: Second vector (e.g., task requirements)
        weights: Importance weights for each dimension

    Returns:
        Weighted similarity score 0-1
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)
    weights = np.array(weights)

    # Apply weights
    weighted_a = vec_a * weights
    weighted_b = vec_b * weights

    # Calculate cosine similarity on weighted vectors
    dot_product = np.dot(weighted_a, weighted_b)
    magnitude_a = np.linalg.norm(weighted_a)
    magnitude_b = np.linalg.norm(weighted_b)

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    similarity = dot_product / (magnitude_a * magnitude_b)

    return float(np.clip(similarity, 0.0, 1.0))


def euclidean_distance(vec_a: List[float], vec_b: List[float]) -> float:
    """
    Calculate Euclidean distance between vectors

    Args:
        vec_a: First vector
        vec_b: Second vector

    Returns:
        Distance value (lower = more similar)
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)

    return float(np.linalg.norm(vec_a - vec_b))


def weighted_euclidean_distance(
    vec_a: List[float],
    vec_b: List[float],
    weights: List[float]
) -> float:
    """
    Calculate weighted Euclidean distance

    Args:
        vec_a: First vector
        vec_b: Second vector
        weights: Importance weights for each dimension

    Returns:
        Weighted distance (lower = more similar)
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)
    weights = np.array(weights)

    differences = (vec_a - vec_b) ** 2
    weighted_diff = differences * weights

    return float(np.sqrt(np.sum(weighted_diff)))


def meets_minimum_threshold(
    telescope_vec: List[float],
    task_vec: List[float],
    weights: List[float],
    threshold: float = 0.7
) -> Tuple[bool, float]:
    """
    Check if telescope meets minimum requirements for task

    Args:
        telescope_vec: Telescope capability vector
        task_vec: Task requirement vector
        weights: Task-specific dimension weights
        threshold: Minimum similarity score required

    Returns:
        Tuple of (meets_threshold, actual_score)
    """
    score = weighted_cosine_similarity(telescope_vec, task_vec, weights)
    return (score >= threshold, score)


def check_hard_requirements(
    telescope_vec: List[float],
    task_vec: List[float],
    critical_dimensions: Optional[List[int]] = None
) -> Tuple[bool, List[int]]:
    """
    Check if telescope meets hard minimums for critical dimensions

    Args:
        telescope_vec: Telescope capability vector
        task_vec: Task requirement vector
        critical_dimensions: Indices of dimensions with hard requirements
                           Default: [0, 1, 2] (aperture, wavelength, resolution)

    Returns:
        Tuple of (passes_check, list_of_failed_dimensions)
    """
    if critical_dimensions is None:
        critical_dimensions = [0, 1, 2]  # Default critical dims

    failed = []

    for dim in critical_dimensions:
        if telescope_vec[dim] < task_vec[dim]:
            failed.append(dim)

    return (len(failed) == 0, failed)


# Example usage
if __name__ == "__main__":
    # Example vectors
    telescope = [0.19, 0.21, 0.89, 0.72, 0.33, 0.95, 0.25]
    task = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.10]
    weights = [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]

    # Unweighted similarity
    sim = cosine_similarity(telescope, task)
    print(f"Unweighted similarity: {sim:.3f}")

    # Weighted similarity
    weighted_sim = weighted_cosine_similarity(telescope, task, weights)
    print(f"Weighted similarity: {weighted_sim:.3f}")

    # Threshold check
    passes, score = meets_minimum_threshold(telescope, task, weights, threshold=0.7)
    print(f"Passes threshold (0.7): {passes}, Score: {score:.3f}")

    # Hard requirements
    passes_hard, failed_dims = check_hard_requirements(telescope, task)
    print(f"Passes hard requirements: {passes_hard}")
    if failed_dims:
        print(f"Failed dimensions: {failed_dims}")

Part 6: Complete Working Example

# example_usage.py

"""
Complete end-to-end example of vectorization and matching
"""

from telescope_vectorizer import TelescopeVectorizer
from task_vectorizer import TaskVectorizer
from similarity import (
    weighted_cosine_similarity,
    meets_minimum_threshold,
    check_hard_requirements
)
from vector_config import VectorConfig


def main():
    # Initialize vectorizers
    telescope_vectorizer = TelescopeVectorizer()
    task_vectorizer = TaskVectorizer()

    # Define some telescopes
    telescopes = {
        "Hubble-Class": {
            'aperture_mm': 8000,
            'wavelength_range': (200, 2500),
            'resolution': 0.4,
            'location': {'bortle': 1, 'seeing': 0.8, 'clear_nights': 300},
            'instruments': ['imaging_camera', 'spectrograph', 'coronagraph'],
            'mount_accuracy': 0.98,
            'fov': 0.5
        },
        "Mid-Size-Research": {
            'aperture_mm': 2000,
            'wavelength_range': (400, 2500),
            'resolution': 0.8,
            'location': {'bortle': 3, 'seeing': 1.2, 'clear_nights': 250},
            'instruments': ['imaging_camera', 'spectrograph'],
            'mount_accuracy': 0.95,
            'fov': 2.5
        },
        "Wide-Field-Survey": {
            'aperture_mm': 500,
            'wavelength_range': (400, 900),
            'resolution': 2.0,
            'location': {'bortle': 4, 'seeing': 1.5, 'clear_nights': 200},
            'instruments': ['imaging_camera'],
            'mount_accuracy': 0.85,
            'fov': 8.0
        }
    }

    # Vectorize telescopes
    print("="*60)
    print("TELESCOPE VECTORS")
    print("="*60)

    telescope_vectors = {}
    for name, data in telescopes.items():
        vec = telescope_vectorizer.vectorize(data)
        telescope_vectors[name] = vec

        print(f"\n{name}:")
        labeled = telescope_vectorizer.vector_to_dict(vec)
        for dim_name, value in labeled.items():
            print(f"  {dim_name:25s}: {value:.3f}")

    # Define a task
    task_data = {
        'min_aperture': 1500,
        'wavelength_needed': (500, 900),
        'max_resolution': 1.0,
        'location_constraint': {'min_clear_probability': 0.7},
        'required_instruments': ['imaging_camera'],
        'min_tracking_precision': 0.90,
        'min_fov': 1.0
    }

    # Vectorize task with category
    task_vec, task_weights = task_vectorizer.vectorize_with_category(
        task_data,
        'deep_sky_imaging'
    )

    print("\n" + "="*60)
    print("TASK REQUIREMENTS")
    print("="*60)
    print("\nTask vector:")
    labeled_task = task_vectorizer.vector_to_dict(task_vec)
    for dim_name, value in labeled_task.items():
        print(f"  {dim_name:25s}: {value:.3f}")

    print("\nTask weights (deep_sky_imaging):")
    for dim_name, weight in zip(VectorConfig.DIMENSION_NAMES, task_weights):
        print(f"  {dim_name:25s}: {weight:.1f}")

    # Evaluate matches
    print("\n" + "="*60)
    print("MATCHING RESULTS")
    print("="*60)

    results = []

    for name, tel_vec in telescope_vectors.items():
        # Check hard requirements
        passes_hard, failed_dims = check_hard_requirements(tel_vec, task_vec)

        # Calculate weighted similarity
        score = weighted_cosine_similarity(tel_vec, task_vec, task_weights)

        # Check threshold
        passes_threshold, _ = meets_minimum_threshold(
            tel_vec, task_vec, task_weights, threshold=0.7
        )

        results.append({
            'name': name,
            'score': score,
            'passes_hard': passes_hard,
            'passes_threshold': passes_threshold,
            'failed_dims': failed_dims
        })

    # Sort by score
    results.sort(key=lambda x: x['score'], reverse=True)

    # Display results
    for i, result in enumerate(results, 1):
        print(f"\n{i}. {result['name']}")
        print(f"   Score: {result['score']:.3f}")
        print(f"   Passes hard requirements: {result['passes_hard']}")
        print(f"   Passes threshold (0.7): {result['passes_threshold']}")

        if not result['passes_hard']:
            failed_names = [VectorConfig.DIMENSION_NAMES[d] for d in result['failed_dims']]
            print(f"   Failed dimensions: {', '.join(failed_names)}")

        if result['passes_hard'] and result['passes_threshold']:
            print(f"   ✓ CAN BID")
        else:
            print(f"   ✗ CANNOT BID")


if __name__ == "__main__":
    main()

Part 7: API Integration (FastAPI Example)

# api.py

"""
FastAPI endpoints for vector operations
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Dict, Optional
from telescope_vectorizer import TelescopeVectorizer
from task_vectorizer import TaskVectorizer
from similarity import weighted_cosine_similarity, meets_minimum_threshold

app = FastAPI()

telescope_vectorizer = TelescopeVectorizer()
task_vectorizer = TaskVectorizer()


class TelescopeData(BaseModel):
    aperture_mm: float
    wavelength_range: tuple[float, float]
    resolution: float
    location: Dict[str, float]
    instruments: List[str]
    mount_accuracy: float
    fov: float


class TaskData(BaseModel):
    min_aperture: float
    wavelength_needed: tuple[float, float]
    max_resolution: float
    location_constraint: Optional[Dict[str, float]] = None
    required_instruments: List[str]
    min_tracking_precision: float
    min_fov: float
    category: str = "general"
    min_threshold: float = 0.7


@app.post("/vectorize/telescope")
def vectorize_telescope(telescope: TelescopeData):
    """Vectorize telescope specifications"""
    try:
        vector = telescope_vectorizer.vectorize(telescope.dict())
        labeled = telescope_vectorizer.vector_to_dict(vector)

        return {
            "vector": vector,
            "labeled": labeled
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))


@app.post("/vectorize/task")
def vectorize_task(task: TaskData):
    """Vectorize task requirements"""
    try:
        vector, weights = task_vectorizer.vectorize_with_category(
            task.dict(),
            task.category
        )
        labeled = task_vectorizer.vector_to_dict(vector)

        return {
            "vector": vector,
            "weights": weights,
            "labeled": labeled
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))


@app.post("/evaluate/bid")
def evaluate_bid(
    telescope_vector: List[float],
    task_vector: List[float],
    task_weights: List[float],
    threshold: float = 0.7
):
    """Evaluate if telescope can bid on task"""
    try:
        score = weighted_cosine_similarity(
            telescope_vector,
            task_vector,
            task_weights
        )

        can_bid = score >= threshold

        return {
            "can_bid": can_bid,
            "score": score,
            "threshold": threshold
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Testing Your Code Today

# test_vectorization.py

"""
Quick tests to verify everything works
"""

from telescope_vectorizer import TelescopeVectorizer
from task_vectorizer import TaskVectorizer
from similarity import weighted_cosine_similarity

def test_basic_vectorization():
    print("Testing basic vectorization...")

    # Create vectorizers
    tel_vec = TelescopeVectorizer()
    task_vec = TaskVectorizer()

    # Simple telescope
    telescope = {
        'aperture_mm': 2000,
        'wavelength_range': (400, 2500),
        'resolution': 0.8,
        'location': {'bortle': 3, 'seeing': 1.2, 'clear_nights': 250},
        'instruments': ['imaging_camera', 'spectrograph'],
        'mount_accuracy': 0.95,
        'fov': 2.5
    }

    # Simple task
    task = {
        'min_aperture': 1500,
        'wavelength_needed': (500, 900),
        'max_resolution': 1.0,
        'required_instruments': ['imaging_camera'],
        'min_tracking_precision': 0.90,
        'min_fov': 1.0
    }

    # Vectorize
    tel_vector = tel_vec.vectorize(telescope)
    task_vector, weights = task_vec.vectorize_with_category(task, 'deep_sky_imaging')

    # Calculate match
    score = weighted_cosine_similarity(tel_vector, task_vector, weights)

    print(f"✓ Telescope vector: {[f'{v:.2f}' for v in tel_vector]}")
    print(f"✓ Task vector: {[f'{v:.2f}' for v in task_vector]}")
    print(f"✓ Match score: {score:.3f}")
    print(f"✓ Can bid: {score >= 0.7}")

    assert len(tel_vector) == 7, "Wrong telescope vector length"
    assert len(task_vector) == 7, "Wrong task vector length"
    assert 0 <= score <= 1, "Score out of range"

    print("\n✓ All tests passed!\n")


if __name__ == "__main__":
    test_basic_vectorization()

What to do today:

  1. Create these files in your project:

    • vector_config.py
    • vector_utils.py
    • telescope_vectorizer.py
    • task_vectorizer.py
    • similarity.py
    • test_vectorization.py

  2. Run the test:

    python test_vectorization.py

  3. Try the complete example:

    python example_usage.py

  4. Customize:

    • Adjust dimension ranges in vector_config.py
    • Add/remove dimensions
    • Tune category weights in task_vectorizer.py

Questions?

  1. Do these dimensions cover your needs or should we add/modify any?
  2. Want me to add database models for storing vectors?
  3. Need help integrating with your existing codebase?

Let me clarify the confusion!

You're absolutely right to be confused - I need to explain the difference between normalization ranges and actual requirements.


The Core Concept (Simplified)

Think of vectorization like converting different units into a common "score" from 0 to 1.

Analogy: School Grades

Imagine you have students from different countries:

  • USA student: 85/100 points
  • UK student: B+ grade
  • India student: 7.5/10 CGPA

You can't compare these directly! So you normalize them all to a 0-1 scale:

  • USA: 0.85 (85/100)
  • UK: 0.80 (B+ mapped to scale)
  • India: 0.75 (7.5/10)

Now you can compare them!


What Are Min/Max Values For?

The min/max values in VectorConfig are reference ranges, not requirements. They answer the question: "What's the range of values we expect to see in the real world?"

Example with Field of View (FOV)

# In VectorConfig
DimensionConfig(
    name="field_of_view",
    min_value=0.1,    # Smallest FOV we expect any telescope to have
    max_value=10.0,   # Largest FOV we expect any telescope to have
)

This means:

  • Not a requirement - just a reference scale
  • We're saying "telescopes in our system will have FOVs between 0.1° and 10°"
  • This lets us convert any real FOV into a 0-1 score

Real Example: FOV Normalization

Scenario 1: Telescope Registration

A telescope owner registers their scope:

telescope_data = {
    'fov': 2.5  # degrees - this is their ACTUAL FOV
}

# Normalization happens:
normalized_fov = (2.5 - 0.1) / (10.0 - 0.1)
               = 2.4 / 9.9
               = 0.24

# Vector gets: [... , 0.24]

Scenario 2: Another Telescope

telescope_data = {
    'fov': 8.0  # degrees - wider field
}

normalized_fov = (8.0 - 0.1) / (10.0 - 0.1)
               = 7.9 / 9.9
               = 0.80

# Vector gets: [... , 0.80]

Now both telescopes have FOV represented on the same 0-1 scale, making comparison fair!
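If you want to run those two scenarios yourself, here's a minimal self-contained sketch. The 0.1°-10° reference range is just the example config value from above, and normalize_fov here is a standalone stand-in for the real vectorizer's helper:

```python
# Standalone sketch of the FOV normalization walked through above.
# The 0.1-10.0 degree range is the example VectorConfig value, not a standard.
def normalize_fov(fov_deg, min_fov=0.1, max_fov=10.0):
    fov_deg = max(min_fov, min(max_fov, fov_deg))  # clamp to the reference range
    return (fov_deg - min_fov) / (max_fov - min_fov)

print(round(normalize_fov(2.5), 2))  # 0.24
print(round(normalize_fov(8.0), 2))  # 0.8
```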


Task Requirements vs. Normalization Ranges

This is the KEY distinction:

Normalization Range (in VectorConfig)

  • Global constants
  • Define the "scale" for conversion
  • Set once for your entire system
  • Example: "FOV ranges from 0.1° to 10° in our network"

Task Requirements (per task)

  • Specific to each observation
  • What THIS task needs
  • Converted using the global scale
  • Example: "This galaxy observation needs at least 1.5° FOV"

Complete Example: FOV Through the System

Step 1: System Setup (You do this ONCE)

You decide on normalization ranges based on your telescope network:

# vector_config.py
# You're saying: "In my network, telescopes have FOVs between 0.1° and 10°"
DimensionConfig(
    name="field_of_view",
    min_value=0.1,
    max_value=10.0
)

Where do these numbers come from?

  • Look at your telescope fleet
  • Find the smallest and largest FOV
  • Add some buffer (e.g., if smallest is 0.3°, use 0.1° as min)

Step 2: Telescope Registration

Someone adds their telescope to your network:

# Telescope owner says: "My telescope has 2.5° FOV"
telescope_data = {
    'aperture_mm': 200,
    'fov': 2.5,  # ← Real value in degrees
    # ... other specs
}

# System normalizes it:
vectorizer = TelescopeVectorizer()
vector = vectorizer.vectorize(telescope_data)

# Internally:
# fov_normalized = (2.5 - 0.1) / (10.0 - 0.1) = 0.24

# Vector: [0.01, 0.21, 0.89, 0.72, 0.33, 0.95, 0.24]
#                                                  ↑
#                                                FOV score

The telescope owner never sees "0.24" - they just enter "2.5 degrees"


Step 3: Task Submission

Someone wants to observe a galaxy:

# User says: "I need at least 1.5° FOV to capture the whole galaxy"
task_data = {
    'min_aperture': 150,
    'min_fov': 1.5,  # ← Real requirement in degrees
    # ... other requirements
}

# System normalizes it using THE SAME scale:
task_vectorizer = TaskVectorizer()
task_vector = task_vectorizer.vectorize(task_data)

# Internally:
# fov_required = (1.5 - 0.1) / (10.0 - 0.1) = 0.14

# Task vector: [0.005, 0.04, 0.85, 0.50, 0.17, 0.90, 0.14]
#                                                       ↑
#                                                   FOV requirement

The task submitter never sees "0.14" - they just enter "1.5 degrees"


Step 4: Comparison

Now we can compare:

# Telescope FOV: 0.24 (represents 2.5°)
# Task needs:    0.14 (represents 1.5°)

# Telescope FOV > Task requirement?
# 0.24 > 0.14 ✓ YES

# This telescope CAN handle this task's FOV requirement!

The Full Picture: All Dimensions

Let me show you how EVERY dimension works:

1. Aperture

# Normalization range (global)
min_aperture = 100mm   # Smallest telescope in network
max_aperture = 10000mm # Largest telescope in network

# Telescope A registers
actual_aperture = 200mm
normalized = (200 - 100) / (10000 - 100) = 0.01

# Telescope B registers
actual_aperture = 2000mm
normalized = (2000 - 100) / (10000 - 100) = 0.19

# Task requires
min_aperture_needed = 1500mm
normalized = (1500 - 100) / (10000 - 100) = 0.14

# Check: Telescope A (0.01) < Task (0.14) → CANNOT bid
#        Telescope B (0.19) > Task (0.14) → CAN bid

2. Wavelength Coverage

This one's different because it's about RANGE:

# Normalization (global)
full_spectrum = 100nm to 350,000nm

# Telescope A: Can see UV + Optical
wavelength_range = (100nm, 700nm)
coverage = (700 - 100) / (350000 - 100) = 600 / 349900 = 0.0017

# Telescope B: Can see UV + Optical + IR
wavelength_range = (100nm, 25000nm)
coverage = (25000 - 100) / (350000 - 100) = 24900 / 349900 = 0.071

# Task: Needs optical observations
wavelength_needed = (400nm, 700nm)
coverage = (700 - 400) / (350000 - 100) = 300 / 349900 = 0.0009

# Both telescopes cover more than needed → Both CAN bid

3. Resolution (Inverse Example)

Resolution is inverse - lower arcsec/pixel is BETTER:

# Normalization range (global)
best_resolution = 0.3 arcsec/pixel
worst_resolution = 5.0 arcsec/pixel

# Telescope A: Sharp images
actual_resolution = 0.5 arcsec/pixel
normalized = 1 - ((0.5 - 0.3) / (5.0 - 0.3))
           = 1 - (0.2 / 4.7)
           = 1 - 0.043
           = 0.957  # High score because it's sharp!

# Telescope B: Blurry images
actual_resolution = 3.0 arcsec/pixel
normalized = 1 - ((3.0 - 0.3) / (5.0 - 0.3))
           = 1 - (2.7 / 4.7)
           = 1 - 0.574
           = 0.426  # Low score because it's blurry

# Task: Needs sharp images
max_resolution = 1.0 arcsec/pixel (nothing worse than this)
normalized = 1 - ((1.0 - 0.3) / (5.0 - 0.3))
           = 1 - (0.7 / 4.7)
           = 1 - 0.149
           = 0.851

# Check: Telescope A (0.957) > Task (0.851) → CAN bid (sharp enough)
#        Telescope B (0.426) < Task (0.851) → CANNOT bid (too blurry)

How Do You Choose Min/Max Values?

You DON'T need to know them beforehand perfectly! Here's the process:

Option 1: Start with Reasonable Estimates

# Look at common telescopes:
# - Amateur scopes: 100-500mm aperture
# - Research scopes: 500-2000mm
# - Major observatories: 2000-10000mm

# Pick conservative ranges:
MIN_APERTURE = 100   # Covers even small scopes
MAX_APERTURE = 10000 # Covers even huge scopes

# If a 15000mm scope joins later? No problem!
# You can update the config and re-normalize all vectors

Option 2: Use Your Actual Fleet

# Query your database
all_telescopes = get_all_telescopes()

# Find actual min/max
min_aperture = min(t.aperture for t in all_telescopes)
max_aperture = max(t.aperture for t in all_telescopes)

# Add 20% buffer
MIN_APERTURE = min_aperture * 0.8
MAX_APERTURE = max_aperture * 1.2

Option 3: Dynamic Ranges (Advanced)

# Recalculate ranges periodically as fleet grows
def update_normalization_ranges():
    stats = calculate_fleet_statistics()

    return {
        'aperture': (stats.aperture_min * 0.8, stats.aperture_max * 1.2),
        'fov': (stats.fov_min * 0.8, stats.fov_max * 1.2),
        # ...
    }

# Then re-normalize all existing vectors

What Happens If Values Are Outside Range?

The code handles this! Look:

def normalize_value(value, min_val, max_val, inverse=False):
    # CLAMP to range first
    value = max(min_val, min(max_val, value))  # ← This line!

    # Then normalize
    normalized = (value - min_val) / (max_val - min_val)

    # Inverse dimensions (like resolution) flip the score
    return 1 - normalized if inverse else normalized

Example:

# Your range: 100-10000mm
# Telescope has: 15000mm aperture (way outside!)

value = max(100, min(10000, 15000))  # Clamps to 10000
normalized = (10000 - 100) / (10000 - 100) = 1.0

# Gets maximum score (1.0) - it's as good as it gets!

Simplified Flow Chart

USER REGISTERS TELESCOPE
    ↓
User inputs: "My FOV is 2.5 degrees"
    ↓
System looks up normalization range: 0.1° to 10°
    ↓
System calculates: (2.5 - 0.1) / (10 - 0.1) = 0.24
    ↓
Stores in database: telescope_vector[6] = 0.24
    ↓
[User never sees 0.24]

---

USER SUBMITS TASK
    ↓
User inputs: "Need minimum 1.5 degree FOV"
    ↓
System uses SAME normalization range: 0.1° to 10°
    ↓
System calculates: (1.5 - 0.1) / (10 - 0.1) = 0.14
    ↓
Stores in database: task_vector[6] = 0.14
    ↓
[User never sees 0.14]

---

MATCHING HAPPENS
    ↓
Compare: telescope[6] vs task[6]
         0.24      vs  0.14
    ↓
Telescope score (0.24) > Task requirement (0.14)
    ↓
✓ TELESCOPE QUALIFIES

Practical Code You'd Actually Write

# When telescope owner registers (FRONTEND)
telescope_form = {
    "name": "My Observatory",
    "aperture_mm": 2000,           # ← User enters this
    "fov_degrees": 2.5,            # ← User enters this  
    "resolution_arcsec": 0.8,      # ← User enters this
    # ... they just fill in REAL values
}

# Backend converts automatically
vectorizer = TelescopeVectorizer()
telescope_vector = vectorizer.vectorize({
    'aperture_mm': telescope_form['aperture_mm'],      # Real value
    'fov': telescope_form['fov_degrees'],              # Real value
    'resolution': telescope_form['resolution_arcsec'], # Real value
    # ...
})

# Save to database
db.save_telescope({
    'name': telescope_form['name'],
    'raw_specs': telescope_form,        # Original values for display
    'capability_vector': telescope_vector  # Normalized for matching
})

Your Questions Answered

Q: "Would I need to know min and max FOVs beforehand?"

A: You set reasonable ranges ONCE when building the system. You can:

  1. Use industry-standard ranges (0.1° to 10° for FOV is reasonable)
  2. Survey your initial telescope fleet and set ranges based on that
  3. Update ranges later if needed (and re-normalize existing data)

Q: "What if I don't know what telescopes will join?"

A: That's fine! Set conservative ranges that would cover most telescopes. The normalization still works even if you're not perfect.

Example:

  • Set FOV range: 0.1° to 10°
  • Most telescopes fall in this range
  • If a 15° FOV telescope joins → its FOV is clamped to 10° → score 1.0 (maximum)
  • If a 0.05° FOV telescope joins → its FOV is clamped to 0.1° → score 0.0 (minimum)

Q: "Do users need to see these normalized values?"

A: NO! Users always see and input REAL values:

  • They enter "2.5 degrees FOV"
  • They enter "2000mm aperture"
  • Normalization happens invisibly in the backend
  • When showing match results, you can convert back: "Your 2.5° FOV matches this task's 1.5° requirement"
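For that last point, converting back for display is just the inverse of the min-max formula. A tiny sketch - denormalize is a hypothetical helper, not part of the files above:

```python
# Hypothetical display helper: invert the min-max normalization so users
# see real units again. Assumes the score was produced with the same range.
def denormalize(score, min_val, max_val):
    return min_val + score * (max_val - min_val)

# A stored FOV score of 0.24 on the 0.1-10.0 degree scale:
print(f"{denormalize(0.24, 0.1, 10.0):.1f} degrees")  # 2.5 degrees
```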

Simpler Alternative (If Still Confused)

If vectorization still feels complex, here's a simpler approach for just FOV:

# Instead of normalizing, just compare directly
def telescope_meets_fov_requirement(telescope_fov, task_min_fov):
    return telescope_fov >= task_min_fov

# Usage:
telescope_fov = 2.5  # degrees
task_min_fov = 1.5   # degrees

if telescope_fov >= task_min_fov:
    print("Telescope qualifies!")

But then you lose:

  • Ability to weight dimensions differently
  • Ability to calculate overall "match score"
  • Fair comparison across different units/scales

That's why we normalize - to make multi-dimensional comparison possible!
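To make that concrete, here's a toy 2-dimension comparison (invented numbers): in raw units the millimetre-scale aperture completely swamps the degree-scale FOV, while on the shared 0-1 scale the FOV mismatch actually shows up:

```python
import math

# Toy numbers: [aperture_mm, fov_deg] in raw units vs on the shared 0-1 scale
tel_raw,  task_raw  = [2000.0, 2.5], [1500.0, 8.0]
tel_norm, task_norm = [0.19, 0.24], [0.14, 0.80]

# Raw distance is driven almost entirely by aperture;
# the big FOV mismatch barely registers
print(round(math.dist(tel_raw, task_raw), 1))    # 500.0
# After normalization, the FOV gap dominates the distance
print(round(math.dist(tel_norm, task_norm), 3))  # 0.562
```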


Does this make more sense now?

The key insight: Normalization ranges are just a "ruler" you set up once. Users never touch them - they just enter real values which get measured against that ruler.

Want me to show you an even simpler baby example with just 2 dimensions?

Great Question! Let's Explore What Happens


Scenario: Very Wide Normalization Ranges

# Your config
MIN_FOV = 0.01  # degrees (super narrow)
MAX_FOV = 100   # degrees (super wide)

What Actually Happens

Example 1: Normal Telescopes

# Telescope A: 2.5° FOV (typical)
normalized = (2.5 - 0.01) / (100 - 0.01)
           = 2.49 / 99.99
           = 0.0249  # ← Very low score!

# Telescope B: 8.0° FOV (wide field)
normalized = (8.0 - 0.01) / (100 - 0.01)
           = 7.99 / 99.99
           = 0.0799  # ← Still pretty low

# Telescope C: 0.5° FOV (narrow)
normalized = (0.5 - 0.01) / (100 - 0.01)
           = 0.49 / 99.99
           = 0.0049  # ← Super low!

The Problem

All your telescopes end up clustered near zero!

Score distribution with wide range (0.01 - 100°):

0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
●●●                                                            
↑
All your telescopes are here!

This means:

  • Poor discrimination - hard to tell telescopes apart
  • Loss of precision - 2.5° and 8.0° look almost identical (0.025 vs 0.080)
  • Wasted range - the upper 90% of your scale (0.1 to 1.0) is empty

Comparison: Reasonable Range

# Better config
MIN_FOV = 0.1   # degrees
MAX_FOV = 10    # degrees

# Telescope A: 2.5° FOV
normalized = (2.5 - 0.1) / (10 - 0.1)
           = 2.4 / 9.9
           = 0.242  # ← Good spread

# Telescope B: 8.0° FOV
normalized = (8.0 - 0.1) / (10 - 0.1)
           = 7.9 / 9.9
           = 0.798  # ← Well separated from A

# Telescope C: 0.5° FOV
normalized = (0.5 - 0.1) / (10 - 0.1)
           = 0.4 / 9.9
           = 0.040  # ← Clearly different
Score distribution with tight range (0.1 - 10°):

0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
      ●           ●                             ●
      ↑           ↑                             ↑
      0.5°        2.5°                          8.0°

Much better! Telescopes are well-distributed across the scale.


The Mathematical Impact

On Similarity Scores

Let's see how this affects matching:

# Task needs: 1.5° FOV minimum

# Wide range (0.01 - 100°)
task_fov_normalized = (1.5 - 0.01) / 99.99 = 0.0149

telescope_2_5_deg = 0.0249
telescope_8_0_deg = 0.0799

# Difference between telescopes:
diff = 0.0799 - 0.0249 = 0.055  # Small difference!

# ---

# Tight range (0.1 - 10°)
task_fov_normalized = (1.5 - 0.1) / 9.9 = 0.141

telescope_2_5_deg = 0.242
telescope_8_0_deg = 0.798

# Difference between telescopes:
diff = 0.798 - 0.242 = 0.556  # Big difference! Much more meaningful

Impact on Weighted Similarity

Remember, we multiply by weights:

# Suppose FOV weight = 2.5 (important for wide-field survey)

# Wide range scenario:
weighted_scores = [
    0.0249 * 2.5 = 0.0623,  # Telescope A
    0.0799 * 2.5 = 0.1998   # Telescope B
]

# Tight range scenario:
weighted_scores = [
    0.242 * 2.5 = 0.605,   # Telescope A
    0.798 * 2.5 = 1.995    # Telescope B
]

With wide range: FOV barely affects final score (contributes ~0.06 to 0.20)
With tight range: FOV significantly affects final score (contributes ~0.6 to 2.0)

This means with unreasonably wide ranges, the dimension becomes almost meaningless in your matching!


Real-World Example: Full Vector Comparison

Let's say we have a wide-field survey task where FOV is CRITICAL:

# Task weights (FOV is super important)
weights = [1.0, 1.0, 0.5, 1.0, 0.5, 1.0, 2.5]
#          aper  wl   res  loc  inst mount FOV ← This weight is high!

# Two telescopes competing:
# Telescope A: 8° FOV, medium aperture
# Telescope B: 2° FOV, huge aperture

Scenario 1: Wide Range (0.01 - 100°)

# Telescope A
vector_A = [0.15, 0.21, 0.89, 0.72, 0.33, 0.95, 0.080]  # FOV: 8° → 0.080

# Telescope B  
vector_B = [0.80, 0.21, 0.89, 0.72, 0.33, 0.95, 0.020]  # FOV: 2° → 0.020

# Task
task_vec = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.015]  # FOV: 1.5° → 0.015

# Weighted similarity calculation:
# Telescope A score ≈ 0.73
# Telescope B score ≈ 0.81  ← WINS despite worse FOV!

# Why? Because FOV contribution is tiny:
# A's FOV contribution: 0.080 * 2.5 = 0.2
# B's FOV contribution: 0.020 * 2.5 = 0.05
# But B's aperture dominates: 0.80 * 1.0 = 0.8 vs A's 0.15 * 1.0 = 0.15

Problem: Telescope B wins even though it has TERRIBLE FOV for a wide-field survey! The wide normalization range made FOV differences invisible.

Scenario 2: Tight Range (0.1 - 10°)

# Telescope A
vector_A = [0.15, 0.21, 0.89, 0.72, 0.33, 0.95, 0.798]  # FOV: 8° → 0.798

# Telescope B
vector_B = [0.80, 0.21, 0.89, 0.72, 0.33, 0.95, 0.192]  # FOV: 2° → 0.192

# Task
task_vec = [0.14, 0.04, 0.85, 0.50, 0.17, 0.90, 0.141]  # FOV: 1.5° → 0.141

# Weighted similarity calculation:
# Telescope A score ≈ 0.89  ← WINS! (as it should)
# Telescope B score ≈ 0.72

# FOV contribution now significant:
# A's FOV contribution: 0.798 * 2.5 = 1.995  ← Big!
# B's FOV contribution: 0.192 * 2.5 = 0.480  ← Much smaller

# Aperture still matters but doesn't dominate:
# A's aperture: 0.15 * 1.0 = 0.15
# B's aperture: 0.80 * 1.0 = 0.80

Good: Telescope A wins because its excellent FOV outweighs B's aperture advantage, which is exactly what we want for a wide-field survey!


The "Goldilocks Zone" for Ranges

Too Narrow (e.g., 1° - 2°)

MIN_FOV = 1.0
MAX_FOV = 2.0

# Problem: Many telescopes fall outside
telescope_fov = 8.0  # Gets clamped to 2.0 → score 1.0
telescope_fov = 0.5  # Gets clamped to 1.0 → score 0.0

# All wide-field scopes look identical (all get 1.0)
# All narrow-field scopes look identical (all get 0.0)

Too Wide (e.g., 0.01° - 100°)

MIN_FOV = 0.01
MAX_FOV = 100

# Problem: Most telescopes cluster near bottom
# 2.5° → 0.025
# 8.0° → 0.080
# Differences become tiny, dimension loses influence

Just Right (e.g., 0.1° - 10°)

MIN_FOV = 0.1
MAX_FOV = 10.0

# Most real telescopes: 0.5° to 8°
# Good spread across 0.04 to 0.80 range
# Differences are meaningful
# Extreme values get clamped but that's OK
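To make the Goldilocks comparison concrete, here is a small self-contained sketch (the sample fleet FOVs are hypothetical) that normalizes the same telescopes under all three ranges and reports how many values get clamped to a boundary versus how spread out the scores are:

```python
def normalize(value, min_val, max_val):
    """Min-max normalization with clamping, as used throughout."""
    value = max(min_val, min(max_val, value))
    return (value - min_val) / (max_val - min_val)

# Hypothetical fleet of FOVs in degrees
fovs = [0.5, 1.2, 2.5, 3.0, 8.0, 1.5, 4.0, 2.0, 6.0]

for label, lo, hi in [("too narrow", 1.0, 2.0),
                      ("too wide", 0.01, 100.0),
                      ("just right", 0.1, 10.0)]:
    scores = [normalize(f, lo, hi) for f in fovs]
    clamped = sum(1 for s in scores if s in (0.0, 1.0))
    spread = max(scores) - min(scores)
    print(f"{label:>10}: clamped={clamped}/9, spread={spread:.3f}")
```

The narrow range clamps 7 of 9 telescopes to a boundary; the wide range clamps nothing but squeezes all scores into a sliver under 0.1; only the middle range keeps every telescope distinguishable while spreading scores across most of the 0-1 scale.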

How to Choose Good Ranges

Strategy 1: Survey Your Fleet (Best)

# Collect all FOVs from your telescopes
fovs = [0.5, 1.2, 2.5, 3.0, 8.0, 1.5, 4.0, 2.0, 6.0]

import numpy as np

# Use percentiles
p5 = np.percentile(fovs, 5)   # 5th percentile
p95 = np.percentile(fovs, 95) # 95th percentile

# Set range with buffer
MIN_FOV = p5 * 0.8    # 20% below 5th percentile
MAX_FOV = p95 * 1.2   # 20% above 95th percentile

# Example (np.percentile with default linear interpolation):
# p5 = 0.78, p95 = 7.2
# MIN_FOV = 0.78 * 0.8 = 0.62
# MAX_FOV = 7.2 * 1.2 = 8.64
# Round to: 0.5 and 10.0

Strategy 2: Use Domain Knowledge

# Research common telescope types:
# - Planetary imaging: 0.2° - 1°
# - General purpose: 1° - 4°  
# - Wide field surveys: 4° - 10°
# - All-sky cameras: 50° - 180°

# If you're NOT including all-sky cameras:
MIN_FOV = 0.1   # Covers even narrow planetary scopes
MAX_FOV = 10.0  # Covers wide-field surveys

# If you ARE including all-sky cameras:
MIN_FOV = 0.1
MAX_FOV = 180.0  # But now your normal scopes cluster at bottom!

# Better: Separate into categories or use log scale

Strategy 3: Start Conservative, Iterate

# Version 1: Guess based on what you know
MIN_FOV = 0.1
MAX_FOV = 10.0

# After 100 telescopes registered, analyze:
actual_fovs = get_all_telescope_fovs()
print(f"Min: {min(actual_fovs)}")  # 0.3°
print(f"Max: {max(actual_fovs)}")  # 12.0°

# Version 2: Adjust based on data
MIN_FOV = 0.2   # Lowered slightly
MAX_FOV = 15.0  # Raised to accommodate outlier

# Re-normalize all existing vectors with new ranges
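Re-normalizing after a range change is straightforward if the raw values were kept. A minimal sketch of that step (the in-memory "database" and field names here are made up for illustration):

```python
def normalize(value, min_val, max_val):
    """Min-max normalization with clamping."""
    value = max(min_val, min(max_val, value))
    return (value - min_val) / (max_val - min_val)

# Hypothetical stored telescopes with their RAW specs preserved
telescopes = [
    {"name": "scope-1", "fov_degrees": 0.3},
    {"name": "scope-2", "fov_degrees": 8.0},
    {"name": "scope-3", "fov_degrees": 12.0},  # the outlier that forced the change
]

NEW_RANGE = (0.2, 15.0)  # the adjusted min/max

# Recompute every vector entry from the raw value
for t in telescopes:
    t["fov_score"] = normalize(t["fov_degrees"], *NEW_RANGE)

# scope-3 is no longer clamped: 12.0 → (12.0 - 0.2) / 14.8 ≈ 0.797
```

This is why storing raw values alongside vectors matters: without them, re-normalization is impossible.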

Special Case: Logarithmic Scaling

If you really do have extreme ranges (like all-sky cameras with 180° FOV mixed with planetary scopes with 0.2° FOV), use log scale:

import math

def normalize_fov_log(fov_degrees):
    """Use logarithmic scaling for extreme ranges"""

    # Log transform first
    log_fov = math.log10(fov_degrees)

    # Then normalize
    # log10(0.1) = -1.0
    # log10(100) = 2.0
    LOG_MIN = -1.0
    LOG_MAX = 2.0

    normalized = (log_fov - LOG_MIN) / (LOG_MAX - LOG_MIN)

    return normalized

# Examples:
normalize_fov_log(0.1)   # log=-1.0 → normalized=0.0
normalize_fov_log(1.0)   # log=0.0  → normalized=0.33
normalize_fov_log(10.0)  # log=1.0  → normalized=0.67
normalize_fov_log(100.0) # log=2.0  → normalized=1.0

# Now you get good spread even with extreme values!

This works because FOV differences might be better understood as multiplicative rather than additive:

  • 2° vs 4° (2x difference) feels similar to
  • 10° vs 20° (also 2x difference)

Log scale preserves these relative differences.
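The multiplicative intuition can be checked directly: under log normalization (using the same LOG_MIN/LOG_MAX as the sketch above), any 2x difference in FOV maps to the same gap in normalized score, no matter where it sits on the scale:

```python
import math

LOG_MIN, LOG_MAX = -1.0, 2.0  # log10(0.1) to log10(100)

def normalize_fov_log(fov_degrees):
    log_fov = math.log10(fov_degrees)
    return (log_fov - LOG_MIN) / (LOG_MAX - LOG_MIN)

gap_small = normalize_fov_log(4.0) - normalize_fov_log(2.0)
gap_large = normalize_fov_log(20.0) - normalize_fov_log(10.0)

# Both gaps equal log10(2) / 3 ≈ 0.100: a 2x ratio always
# shifts the normalized score by the same amount
print(gap_small, gap_large)
```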


Practical Rule of Thumb

Set your min/max to cover ~90% of expected telescopes, with some buffer for outliers.

# Good examples by dimension:

# Aperture: Most amateur to research scopes
MIN_APERTURE = 100        # mm, small amateur
MAX_APERTURE = 10000      # mm, large research (Keck, VLT class)

# FOV: Most imaging applications
MIN_FOV = 0.1             # degrees, narrow planetary
MAX_FOV = 10.0            # degrees, wide survey

# Resolution: Common CCD/CMOS sensors
MIN_RESOLUTION = 0.3      # arcsec/pixel, excellent seeing + small pixels
MAX_RESOLUTION = 5.0      # arcsec/pixel, poor seeing + large pixels

# Seeing: Earth-based observatories
MIN_SEEING = 0.5          # arcsec, world-class sites (Mauna Kea)
MAX_SEEING = 3.0          # arcsec, poor sites

What I Recommend for You

# vector_config.py

# Start with these conservative ranges:
DIMENSIONS = [
    DimensionConfig(
        name="aperture",
        min_value=100.0,      # Covers small amateur scopes
        max_value=12000.0,    # Covers even huge research scopes
    ),
    DimensionConfig(
        name="field_of_view",
        min_value=0.1,        # Planetary imaging
        max_value=15.0,       # Wide field surveys
        # If you get an all-sky camera (180°), it'll just clamp to 1.0
    ),
    # ... other dimensions
]

# Add this utility to check range health:
def analyze_normalization_health():
    """Check if ranges are good"""
    telescopes = db.get_all_telescopes()

    for dimension in DIMENSIONS:
        values = [get_dimension_value(t, dimension.name) for t in telescopes]

        # Check clustering
        normalized = [normalize_value(v, dimension.min_value, dimension.max_value) 
                     for v in values]

        print(f"\n{dimension.name}:")
        print(f"  Raw range: {min(values):.2f} - {max(values):.2f}")
        print(f"  Config range: {dimension.min_value} - {dimension.max_value}")
        print(f"  Normalized distribution: {min(normalized):.3f} - {max(normalized):.3f}")
        print(f"  Mean: {np.mean(normalized):.3f}")
        print(f"  Std dev: {np.std(normalized):.3f}")

        # Warnings
        if max(normalized) < 0.5:
            print("  ⚠️  WARNING: All values in bottom half - consider lowering max")
        if min(normalized) > 0.5:
            print("  ⚠️  WARNING: All values in top half - consider raising min")
        if np.std(normalized) < 0.1:
            print("  ⚠️  WARNING: Poor spread - values too clustered")

# Run this periodically as your fleet grows!

TL;DR Answer to Your Question

Q: What if I go really big and really small like 0.01° and 100°?

A:

  • ❌ Bad idea - your normal telescopes (1° - 8°) will all cluster near 0, losing discrimination
  • ❌ Dimension becomes nearly useless in matching
  • ❌ Weights can't compensate for poor normalization
  • ✅ Better: Use ranges that match your actual fleet (e.g., 0.1° - 10°)
  • ✅ Outliers get clamped, which is fine
  • ✅ If you truly need extreme ranges, use logarithmic scaling instead

Does this clear it up?

Great Question! Let's See Exactly What Happens


Scenario: Telescope Outside Bounds

# Your config
MIN_FOV = 0.1   # degrees
MAX_FOV = 10.0  # degrees

# A new telescope registers with 15° FOV (outside range!)
telescope_data = {
    'fov': 15.0  # ← BIGGER than MAX_FOV
}

What Actually Happens (Step by Step)

Step 1: Clamping

The normalization function automatically clamps to the valid range:

def normalize_value(value, min_val, max_val, inverse=False):
    # CLAMP FIRST - this is the key line!
    value = max(min_val, min(max_val, value))

    # So for our example:
    # value = max(0.1, min(10.0, 15.0))
    # value = max(0.1, 10.0)
    # value = 10.0  ← Clamped down!

    # Then normalize
    normalized = (value - min_val) / (max_val - min_val)
    # normalized = (10.0 - 0.1) / (10.0 - 0.1)
    # normalized = 9.9 / 9.9
    # normalized = 1.0

    return normalized  # 1.0 in this example - the maximum possible score

Result

# Telescope with 15° FOV
normalized_fov = 1.0  # Gets maximum score

# Telescope with 50° FOV  
normalized_fov = 1.0  # Also gets maximum score

# Telescope with 10° FOV (exactly at limit)
normalized_fov = 1.0  # Also maximum score

# They all look IDENTICAL in the vector!

Same Thing for Below Minimum

# Telescope with 0.05° FOV (SMALLER than MIN_FOV)
value = max(0.1, min(10.0, 0.05))
value = max(0.1, 0.05)
value = 0.1  # ← Clamped up!

normalized = (0.1 - 0.1) / (10.0 - 0.1)
normalized = 0.0  # Minimum possible score

# Any telescope with FOV < 0.1° gets score of 0.0
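Putting both directions together, a quick self-contained check of the clamping behavior described above:

```python
def normalize_value(value, min_val, max_val):
    """Min-max normalization with clamping (linear version)."""
    value = max(min_val, min(max_val, value))
    return (value - min_val) / (max_val - min_val)

MIN_FOV, MAX_FOV = 0.1, 10.0

# Sweep from below the minimum to well above the maximum
for fov in [0.05, 0.1, 1.0, 5.0, 10.0, 15.0, 50.0]:
    print(f"{fov:6.2f}° → {normalize_value(fov, MIN_FOV, MAX_FOV):.3f}")

# Everything below 0.1° collapses to 0.000,
# everything above 10° collapses to 1.000
```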

Visual Representation

FOV Normalization Range: 0.1° to 10°

Real FOV:    0.05°   0.1°   1°   5°   10°   15°   50°   100°
                ↓     ↓     ↓    ↓     ↓     ↓     ↓      ↓
Clamped:     0.1°   0.1°   1°   5°   10°   10°   10°    10°
                ↓     ↓     ↓    ↓     ↓     ↓     ↓      ↓
Normalized:  0.0   0.0   0.09 0.49  1.0   1.0   1.0    1.0
             └─────┘                 └────────────────────┘
          "Too small"               "All look the same"
          cluster                    cluster

Implications

1. Loss of Discrimination for Outliers

# Three telescopes register:
telescope_A = {'fov': 10.0}   # normalized → 1.0
telescope_B = {'fov': 15.0}   # normalized → 1.0
telescope_C = {'fov': 180.0}  # normalized → 1.0 (all-sky camera!)

# They're treated as IDENTICAL for FOV!
# Even though C is 12x better than B, and 18x better than A

Problem: You can't tell these apart anymore. The all-sky camera doesn't get "credit" for being exceptional.

2. Still Works for Matching (Mostly)

# Task requires: min 2° FOV
task_fov = 2.0  # normalized → (2.0 - 0.1) / 9.9 = 0.19

# All three telescopes from above:
telescope_A_score = 1.0  # ✓ Passes (1.0 > 0.19)
telescope_B_score = 1.0  # ✓ Passes  
telescope_C_score = 1.0  # ✓ Passes

# They all qualify, which is correct!
# But you can't rank them by FOV quality

Good news: For threshold checking (can they do the job?), it works fine. All three can handle a 2° FOV requirement.

Bad news: For ranking (which is best?), they all tie.
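The threshold-vs-ranking distinction can be demonstrated in a few lines, reusing the clamped normalizer and the 2° task requirement from above:

```python
def normalize_value(value, min_val, max_val):
    value = max(min_val, min(max_val, value))
    return (value - min_val) / (max_val - min_val)

task_fov_score = normalize_value(2.0, 0.1, 10.0)  # ≈ 0.192

# The three telescopes: at the limit, slightly over, and an all-sky camera
scores = {fov: normalize_value(fov, 0.1, 10.0) for fov in (10.0, 15.0, 180.0)}

# Threshold matching: all three qualify
qualified = [fov for fov, s in scores.items() if s >= task_fov_score]

# Ranking: all three tie at 1.0, so FOV can't break the tie
print(qualified, scores)
```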

3. Impact on Weighted Similarity

# Wide-field survey task with FOV weight = 2.5

# Task vector includes:
# ... , task_fov = 0.19

# Telescope A (10° FOV):
# ... , fov_score = 1.0
# FOV contribution: 1.0 * 2.5 = 2.5

# Telescope C (180° FOV, clamped):  
# ... , fov_score = 1.0
# FOV contribution: 1.0 * 2.5 = 2.5

# SAME contribution, even though C is objectively better!

When This Matters vs. When It Doesn't

✅ Doesn't Matter Much If:

  1. Outliers are rare

    # 95% of your telescopes: 0.5° - 8° FOV (well within bounds)
    # 5% outliers: 12°, 15° (slightly outside)
    # → Most comparisons work fine, only a few ties at the top
    
  2. You're doing threshold matching, not ranking

    # You just need to know: "Can this telescope do the job?"
    # Answer: Yes (score 1.0) or No (score < threshold)
    # → Clamping is fine, task gets done either way
    
  3. Other dimensions differentiate the outliers

    # Telescope B: FOV = 15° (clamped to 1.0), aperture = 500mm (0.04)
    # Telescope C: FOV = 180° (clamped to 1.0), aperture = 80mm (0.00)
    
    # They tie on FOV, but aperture breaks the tie
    # → Overall similarity scores will differ
    

❌ Matters A Lot If:

  1. Many telescopes are outside bounds

    # 40% of telescopes: < 0.1° FOV (all get 0.0)
    # 30% of telescopes: > 10° FOV (all get 1.0)
    # → You've lost discrimination for 70% of your fleet!
    
  2. The outlier dimension is critical for your tasks

    # All-sky meteor detection tasks NEED to distinguish:
    # - 15° FOV telescope (marginal)
    # - 180° FOV all-sky camera (perfect)
    # But both get 1.0 → can't rank them!
    
  3. You're using this for optimization, not just matching

    # Goal: "Find THE BEST telescope for this task"
    # If the top 5 telescopes all get 1.0 on FOV, you can't pick the best
    

Solutions for Handling Outliers

Solution 1: Extend the Range (Simple)

# Original range
MIN_FOV = 0.1
MAX_FOV = 10.0

# You get a 180° all-sky camera → extend range
MIN_FOV = 0.1
MAX_FOV = 200.0  # Now covers all-sky cameras

# But now your normal telescopes cluster at bottom again!
# FOV = 2° → (2 - 0.1) / 199.9 = 0.0095 (tiny!)
# FOV = 8° → (8 - 0.1) / 199.9 = 0.0395 (still tiny!)

Tradeoff: You gain outlier discrimination but lose discrimination for normal telescopes.

Solution 2: Use Multiple Ranges (Better)

Create categories with different normalization:

class TelescopeCategory:
    NARROW_FIELD = "narrow"     # < 1° FOV
    NORMAL_FIELD = "normal"     # 1° - 10° FOV  
    WIDE_FIELD = "wide"         # 10° - 50° FOV
    ALL_SKY = "all_sky"         # > 50° FOV

# Different normalization per category
NORMALIZATION_RANGES = {
    'narrow': {'min': 0.05, 'max': 1.0},
    'normal': {'min': 0.1, 'max': 10.0},
    'wide': {'min': 5.0, 'max': 50.0},
    'all_sky': {'min': 40.0, 'max': 200.0}
}

def normalize_fov_categorized(fov, category):
    ranges = NORMALIZATION_RANGES[category]
    return normalize_value(fov, ranges['min'], ranges['max'])

# Now each category has good discrimination!
# But you can only compare within categories

Tradeoff: Great discrimination within categories, but can't easily compare across categories.

Solution 3: Logarithmic Scale (Elegant)

Use log scale to handle extreme ranges naturally:

import math

import numpy as np

def normalize_fov_log(fov_degrees):
    """Handle 0.1° to 180° range smoothly"""

    # Log transform
    log_fov = math.log10(fov_degrees)

    # Normalize log values
    # log10(0.1) = -1.0
    # log10(180) = 2.26
    LOG_MIN = -1.0
    LOG_MAX = 2.5  # Covers up to ~316°

    normalized = (log_fov - LOG_MIN) / (LOG_MAX - LOG_MIN)

    return np.clip(normalized, 0.0, 1.0)

# Examples:
normalize_fov_log(0.1)   # -1.0 → 0.00
normalize_fov_log(1.0)   # 0.0  → 0.29
normalize_fov_log(2.0)   # 0.3  → 0.37
normalize_fov_log(8.0)   # 0.9  → 0.54
normalize_fov_log(10.0)  # 1.0  → 0.57
normalize_fov_log(15.0)  # 1.18 → 0.62  ← Now different from 10°!
normalize_fov_log(180.0) # 2.26 → 0.93  ← Gets proper credit!

# Good spread across entire range!

Benefits:

  • Handles extreme ranges
  • Maintains discrimination everywhere
  • Reflects that FOV differences are multiplicative (2x better matters regardless of starting point)

Solution 4: Soft Clamping (Advanced)

Instead of a hard cutoff, apply a smooth sigmoid-like taper so out-of-range values still differ:

def normalize_fov_soft_clamp(fov_degrees):
    """Soft boundaries that don't create hard cutoffs"""

    # Standard normalization
    standard = (fov_degrees - 0.1) / 9.9

    # Apply sigmoid for values outside range
    if standard > 1.0:
        # Taper off asymptotically toward 1.0
        excess = standard - 1.0
        # smooth taper: approaches 1.0 but never quite reaches it
        return 1.0 - 0.5 / (1.0 + excess)
    elif standard < 0.0:
        # Taper off asymptotically toward 0.0
        deficit = -standard
        return 0.5 / (1.0 + deficit)
    else:
        return standard

# Examples:
normalize_fov_soft_clamp(10.0)  # 1.0 → 1.00 (at boundary)
normalize_fov_soft_clamp(15.0)  # 1.5 → 0.67  ← Different from 10°!
normalize_fov_soft_clamp(50.0)  # 5.0 → 0.90
normalize_fov_soft_clamp(180.0) # 18.2 → 0.97  ← Approaches 1.0 but distinct

Benefits:

  • Outliers still differentiate (no hard clustering at 1.0)
  • Normal range stays well-distributed
  • Graceful degradation

Drawbacks:

  • More complex
  • Harder to explain to users

My Recommendation: Hybrid Approach

# vector_config.py

class VectorConfig:
    # Default ranges for most dimensions
    DIMENSIONS = [
        DimensionConfig(
            name="aperture",
            min_value=100.0,
            max_value=10000.0,
            use_log_scale=False  # Linear is fine for aperture
        ),
        DimensionConfig(
            name="field_of_view",
            min_value=0.1,
            max_value=10.0,
            use_log_scale=True,  # ← Use log for FOV (extreme range)
            log_min=-1.0,        # log10(0.1)
            log_max=2.5          # log10(~316)
        ),
        DimensionConfig(
            name="resolution",
            min_value=0.3,
            max_value=5.0,
            use_log_scale=False  # Linear is fine
        ),
        # ... etc
    ]

# vector_utils.py

import math
import numpy as np

def normalize_value(value, config: DimensionConfig):
    """Universal normalization with optional log scale"""

    if config.use_log_scale:
        # Log transform first
        log_value = math.log10(value)

        # Clamp log value
        log_value = max(config.log_min, min(config.log_max, log_value))

        # Normalize
        normalized = (log_value - config.log_min) / (config.log_max - config.log_min)
    else:
        # Standard linear normalization with clamping
        value = max(config.min_value, min(config.max_value, value))
        normalized = (value - config.min_value) / (config.max_value - config.min_value)

    # Invert for "lower is better" dimensions, regardless of scale
    if config.inverse:
        normalized = 1.0 - normalized

    return float(np.clip(normalized, 0.0, 1.0))

Practical Handling Code

Add utilities to detect and report outliers:

# telescope_vectorizer.py

from typing import Dict, List, Tuple

class TelescopeVectorizer:

    def vectorize_with_warnings(self, telescope_data: Dict) -> Tuple[List[float], List[str]]:
        """Vectorize and return warnings for out-of-bound values"""

        vector = []
        warnings = []

        for dim_config in VectorConfig.DIMENSIONS:
            value = telescope_data.get(dim_config.raw_field_name)

            # Missing spec: fall back to the range minimum and warn
            if value is None:
                warnings.append(
                    f"{dim_config.name}: missing, defaulting to {dim_config.min_value}"
                )
                value = dim_config.min_value

            # Check if value is outside range
            if value < dim_config.min_value:
                warnings.append(
                    f"{dim_config.name}: {value} is below minimum {dim_config.min_value} "
                    f"(will be treated as {dim_config.min_value})"
                )
            elif value > dim_config.max_value:
                warnings.append(
                    f"{dim_config.name}: {value} exceeds maximum {dim_config.max_value} "
                    f"(will be treated as {dim_config.max_value})"
                )

            # Normalize (with clamping)
            normalized = normalize_value(value, dim_config)
            vector.append(normalized)

        return vector, warnings

# Usage:
vectorizer = TelescopeVectorizer()
vector, warnings = vectorizer.vectorize_with_warnings(telescope_data)

if warnings:
    print("⚠️  Outlier values detected:")
    for w in warnings:
        print(f"  - {w}")

    # Maybe email the admin or flag for review
    notify_admin(telescope_data, warnings)

Database Storage Strategy

Store BOTH raw and normalized values:

CREATE TABLE telescopes (
    id UUID PRIMARY KEY,
    name VARCHAR(255),

    -- Raw specifications (for display to users)
    aperture_mm FLOAT,
    fov_degrees FLOAT,
    resolution_arcsec FLOAT,
    -- ... other raw values

    -- Normalized vector (for matching)
    capability_vector FLOAT[],

    -- Outlier flags
    has_outlier_values BOOLEAN,
    outlier_dimensions TEXT[],  -- ['fov', 'aperture']

    created_at TIMESTAMP
);

This way:

  • Users always see their real values (15° FOV, not 1.0)
  • System uses normalized vector for matching
  • You can detect and handle outliers specially if needed
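As a minimal sketch of this storage pattern (using sqlite3 with a JSON-encoded vector instead of the Postgres FLOAT[] column above, purely for illustration; table and column names are assumptions):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE telescopes (
        id INTEGER PRIMARY KEY,
        name TEXT,
        fov_degrees REAL,          -- raw value, shown to users
        capability_vector TEXT,    -- normalized vector, used for matching
        has_outlier_values INTEGER
    )
""")

raw_fov = 15.0  # outside the 0.1-10 range
vector = [0.15, 0.21, 0.89, 0.72, 0.33, 0.95, 1.0]  # FOV clamped to 1.0

conn.execute(
    "INSERT INTO telescopes (name, fov_degrees, capability_vector, has_outlier_values) "
    "VALUES (?, ?, ?, ?)",
    ("wide-scope", raw_fov, json.dumps(vector), 1),
)

name, fov, vec_json = conn.execute(
    "SELECT name, fov_degrees, capability_vector FROM telescopes"
).fetchone()

# Users see the real 15.0°; matching code sees the clamped 1.0
print(name, fov, json.loads(vec_json)[-1])
```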

Summary: What Happens with Out-of-Bounds Scopes

| Aspect | What Happens | Impact | Solution |
|---|---|---|---|
| Normalization | Clamped to 0.0 or 1.0 | Value is constrained | Expected behavior |
| Threshold matching | Still works correctly | Telescope qualifies if capable | ✅ No problem |
| Ranking | Outliers tie with boundary values | Can't distinguish beyond limits | Use log scale or extend range |
| User display | Shows real value (15°) | User sees accurate specs | ✅ No problem |
| Rare outliers | Minimal impact | Most matches unaffected | ✅ Acceptable |
| Many outliers | Loss of discrimination | Poor matching quality | Redesign ranges |

Your Action Items

  1. Start with reasonable ranges (0.1° - 10° for FOV is good)
  2. Use log scale for FOV (handles outliers elegantly)
  3. Monitor for outliers (add warnings when values exceed bounds)
  4. Store raw values (so you can display "180° FOV" to users)
  5. Review periodically (adjust ranges as fleet grows)

Want me to write the complete outlier detection and logging system?

Perfect! Let's Simplify This - No ML, Just Manual Config

You're absolutely right - let's keep this simple and deterministic. You'll manually set all the ranges based on astronomy literature, and the system just does math.


Your Simplified Approach

1. Research Standard Ranges (You do this once)

# vector_config.py

"""
All ranges based on published astronomy literature and standards.
References added for each dimension.
"""

from dataclasses import dataclass

@dataclass
class DimensionConfig:
    name: str
    index: int
    min_value: float
    max_value: float
    unit: str
    inverse: bool = False
    reference: str = ""  # ← Literature citation

class VectorConfig:
    """
    Normalization ranges derived from astronomy literature.
    Update these if you find better references.
    """

    DIMENSIONS = [
        DimensionConfig(
            name="aperture",
            index=0,
            min_value=100.0,        # mm
            max_value=10000.0,      # mm
            unit="mm",
            inverse=False,
            reference="Amateur telescopes: 100-500mm (Sky & Telescope buyer's guide). "
                     "Research telescopes: 1000-10000mm (Keck, VLT class). "
                     "Source: https://www.eso.org/public/teles-instr/"
        ),
        DimensionConfig(
            name="wavelength_coverage",
            index=1,
            min_value=0.0,
            max_value=1.0,          # proportion of spectrum
            unit="proportion",
            inverse=False,
            reference="Full EM spectrum: 100nm (UV) to 350,000nm (Far IR). "
                     "Source: NASA EM Spectrum guide"
        ),
        DimensionConfig(
            name="resolution",
            index=2,
            min_value=0.3,          # arcsec/pixel
            max_value=5.0,          # arcsec/pixel
            unit="arcsec/pixel",
            inverse=True,           # Lower is better
            reference="Excellent seeing: 0.3-0.5 arcsec (Mauna Kea, La Palma). "
                     "Average seeing: 1-2 arcsec. Poor seeing: 3-5 arcsec. "
                     "Source: Racine, R. (1984) PASP 96, 417"
        ),
        DimensionConfig(
            name="location_quality",
            index=3,
            min_value=0.0,
            max_value=1.0,          # composite score
            unit="score",
            inverse=False,
            reference="Composite of Bortle scale (1-9), seeing (0.5-3.0 arcsec), "
                     "and clear nights (0-365). See individual sub-metrics."
        ),
        DimensionConfig(
            name="instrument_versatility",
            index=4,
            min_value=0.0,
            max_value=1.0,          # proportion of instruments
            unit="proportion",
            inverse=False,
            reference="Standard instruments: imaging, spectroscopy, photometry, "
                     "polarimetry, coronagraphy, adaptive optics. "
                     "Source: Common observatory instrument suites"
        ),
        DimensionConfig(
            name="mount_precision",
            index=5,
            min_value=0.0,
            max_value=1.0,          # tracking accuracy 0-1
            unit="accuracy",
            inverse=False,
            reference="Tracking precision as fraction of required accuracy. "
                     "1.0 = perfect tracking, 0.0 = no tracking capability"
        ),
        DimensionConfig(
            name="field_of_view",
            index=6,
            min_value=0.1,          # degrees
            max_value=10.0,         # degrees
            unit="degrees",
            inverse=False,
            reference="Planetary imaging: 0.1-1° FOV. "
                     "Deep sky: 1-4° FOV. Wide field: 4-10° FOV. "
                     "Source: CCD Imaging guides, amateur astronomy handbooks"
        )
    ]

    # Sub-metric ranges (for location_quality calculation)
    BORTLE_MIN = 1      # Excellent dark sky
    BORTLE_MAX = 9      # Inner city
    # Reference: Bortle, J. E. (2001) "Introducing the Bortle Dark-Sky Scale" 
    #            Sky & Telescope, February 2001

    SEEING_MIN = 0.5    # arcseconds (excellent)
    SEEING_MAX = 3.0    # arcseconds (poor)
    # Reference: Typical values from major observatories
    #            Mauna Kea: 0.5-0.8", La Palma: 0.7-1.0", Average: 1.5-2.0"

    CLEAR_NIGHTS_MIN = 0
    CLEAR_NIGHTS_MAX = 365
    # Reference: Astronomical site surveys
    #            Atacama Desert: ~300 clear nights/year
    #            Average US site: ~150-200 clear nights/year

    # Wavelength bands (standard astronomy)
    WAVELENGTH_BANDS = {
        "uv": (100, 400),         # nm
        "optical": (400, 700),    # nm - visible light
        "near_ir": (700, 2500),   # nm
        "mid_ir": (2500, 25000),  # nm
        "far_ir": (25000, 350000) # nm
    }
    # Reference: Standard EM spectrum divisions used in astronomy
    #            ISO definitions for IR bands

    SPECTRUM_MIN = 100      # nm (UV)
    SPECTRUM_MAX = 350000   # nm (Far IR)

    # Standard instrument types
    INSTRUMENT_TYPES = [
        "imaging_camera",      # Standard CCD/CMOS imaging
        "spectrograph",        # Spectrum analysis
        "photometer",          # Precise brightness measurements
        "polarimeter",         # Polarization measurements  
        "coronagraph",         # Solar/stellar corona imaging
        "adaptive_optics"      # Atmospheric correction
    ]
    # Reference: Common observatory instrumentation
    #            Major observatories typically have 3-5 of these

2. Simple Vector Math Only

# vector_utils.py

"""
Simple normalization and distance calculations.
No machine learning - just math based on configured ranges.
"""

import numpy as np
from vector_config import VectorConfig

def normalize_value(value: float, min_val: float, max_val: float, inverse: bool = False) -> float:
    """
    Convert a real value to 0-1 scale.

    This is just basic min-max normalization from any statistics textbook.
    Formula: (value - min) / (max - min)

    Args:
        value: Real-world value (e.g., 2000mm aperture)
        min_val: Minimum in configured range
        max_val: Maximum in configured range
        inverse: If True, flip the scale (for metrics where lower is better)

    Returns:
        Normalized value between 0 and 1
    """
    # Clamp to range
    value = max(min_val, min(max_val, value))

    # Avoid division by zero
    if max_val == min_val:
        return 0.5

    # Calculate normalized value
    normalized = (value - min_val) / (max_val - min_val)

    # Invert if needed (e.g., for resolution where lower is better)
    if inverse:
        normalized = 1.0 - normalized

    return float(np.clip(normalized, 0.0, 1.0))


def cosine_similarity(vec_a: list, vec_b: list) -> float:
    """
    Calculate cosine similarity between two vectors.

    This is standard vector math - measures angle between vectors.
    Returns 1.0 for identical vectors, 0.0 for orthogonal vectors.

    Formula: dot(A,B) / (||A|| * ||B||)

    Reference: Any linear algebra textbook
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)

    dot_product = np.dot(vec_a, vec_b)
    magnitude_a = np.linalg.norm(vec_a)
    magnitude_b = np.linalg.norm(vec_b)

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return float(dot_product / (magnitude_a * magnitude_b))


def weighted_cosine_similarity(vec_a: list, vec_b: list, weights: list) -> float:
    """
    Calculate weighted cosine similarity.

    Same as regular cosine similarity but applies importance weights
    to each dimension first.

    Args:
        vec_a: First vector (e.g., telescope capabilities)
        vec_b: Second vector (e.g., task requirements)
        weights: Importance multipliers for each dimension

    Returns:
        Similarity score 0-1
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)
    weights = np.array(weights)

    # Apply weights to both vectors
    weighted_a = vec_a * weights
    weighted_b = vec_b * weights

    # Standard cosine similarity on weighted vectors
    return cosine_similarity(weighted_a.tolist(), weighted_b.tolist())


def euclidean_distance(vec_a: list, vec_b: list) -> float:
    """
    Calculate Euclidean distance between vectors.

    Standard distance metric - straight line distance in n-dimensional space.
    Lower values = more similar vectors.

    Formula: sqrt(sum((a_i - b_i)^2))

    Reference: Any geometry textbook
    """
    vec_a = np.array(vec_a)
    vec_b = np.array(vec_b)

    return float(np.linalg.norm(vec_a - vec_b))

3. Manual Weight Configuration (Based on Task Types)

# task_categories.py

"""
Manually configured weights for different observation types.
Based on astronomy literature about what matters for each task type.
"""

TASK_CATEGORY_WEIGHTS = {
    "deep_sky_imaging": {
        "description": "Faint galaxies, nebulae, star clusters",
        "weights": [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8],
        "rationale": {
            "aperture": "2.5 - Critical for light gathering on faint objects",
            "wavelength": "1.0 - Moderate, usually optical",
            "resolution": "0.5 - Less critical, objects are extended",
            "location": "1.5 - Important for long exposures",
            "instruments": "0.3 - Basic imaging camera sufficient",
            "mount": "1.0 - Moderate, need stable tracking",
            "fov": "0.8 - Somewhat important for framing"
        },
        "reference": "Deep sky astrophotography guides, CCD imaging handbooks"
    },

    "planetary": {
        "description": "Planets, Moon, solar system objects",
        "weights": [1.2, 1.0, 2.5, 0.8, 0.5, 2.0, 0.3],
        "rationale": {
            "aperture": "1.2 - Helpful but planets are bright",
            "wavelength": "1.0 - Usually optical, some IR",
            "resolution": "2.5 - CRITICAL for planetary detail",
            "location": "0.8 - Seeing matters but shorter exposures",
            "instruments": "0.5 - Basic imaging sufficient",
            "mount": "2.0 - Critical for tracking fast-moving planets",
            "fov": "0.3 - Small targets, narrow field OK"
        },
        "reference": "Planetary imaging guides (Damian Peach, Christopher Go)"
    },

    "exoplanet_transit": {
        "description": "High-precision photometry for transits",
        "weights": [1.0, 0.8, 2.5, 2.0, 1.8, 2.5, 0.3],
        "rationale": {
            "aperture": "1.0 - Target stars are bright enough",
            "wavelength": "0.8 - Specific but not critical",
            "resolution": "2.5 - Need to isolate target star",
            "location": "2.0 - CRITICAL, can't have clouds during transit",
            "instruments": "1.8 - Photometer/spectrograph helpful",
            "mount": "2.5 - CRITICAL, need long stable tracking",
            "fov": "0.3 - Single point target"
        },
        "reference": "Exoplanet observation protocols (AAVSO, ETD)"
    },

    "spectroscopy": {
        "description": "Spectrum analysis of stars/galaxies",
        "weights": [1.5, 2.0, 1.0, 1.2, 2.5, 1.5, 0.5],
        "rationale": {
            "aperture": "1.5 - Important for light gathering",
            "wavelength": "2.0 - Critical, need specific bands",
            "resolution": "1.0 - Moderate importance",
            "location": "1.2 - Decent sky needed",
            "instruments": "2.5 - CRITICAL, need spectrograph",
            "mount": "1.5 - Need stability for slit alignment",
            "fov": "0.5 - Usually small targets"
        },
        "reference": "Amateur spectroscopy guides (ARAS, RSpec)"
    },

    "wide_field_survey": {
        "description": "Large area sky surveys, transient detection",
        "weights": [0.8, 1.0, 0.5, 1.0, 0.5, 1.2, 2.5],
        "rationale": {
            "aperture": "0.8 - Less critical for surveys",
            "wavelength": "1.0 - Usually optical",
            "resolution": "0.5 - Don't need high res for surveys",
            "location": "1.0 - Moderate importance",
            "instruments": "0.5 - Basic camera sufficient",
            "mount": "1.2 - Need accurate pointing",
            "fov": "2.5 - CRITICAL, need wide field"
        },
        "reference": "Survey telescope design (ZTF, ATLAS, Pan-STARRS papers)"
    },

    "time_domain": {
        "description": "Supernovae, variable stars, fast transients",
        "weights": [1.0, 1.0, 1.0, 2.5, 1.0, 1.5, 1.0],
        "rationale": {
            "aperture": "1.0 - Balanced requirement",
            "wavelength": "1.0 - Usually optical",
            "resolution": "1.0 - Moderate",
            "location": "2.5 - CRITICAL, can't miss time window",
            "instruments": "1.0 - Standard imaging",
            "mount": "1.5 - Good tracking needed",
            "fov": "1.0 - Moderate field"
        },
        "reference": "Time-domain astronomy surveys (LSST science book)"
    }
}


def get_weights_for_category(category: str) -> list:
    """
    Get the manually configured weights for a task category.

    Args:
        category: Task type (e.g., "deep_sky_imaging")

    Returns:
        List of 7 weights corresponding to the 7 dimensions
    """
    if category in TASK_CATEGORY_WEIGHTS:
        return TASK_CATEGORY_WEIGHTS[category]["weights"]
    else:
        # Default: all dimensions equally weighted
        return [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def get_category_info(category: str) -> dict:
    """Get full information about a category including rationale"""
    return TASK_CATEGORY_WEIGHTS.get(category, {
        "description": "General observation",
        "weights": [1.0] * 7,
        "rationale": {},
        "reference": "Default equal weighting"
    })

4. Your Workflow

Step 1: Configure Ranges (Do Once)

# You've already done this by reading astronomy literature
# It's all in vector_config.py
# If you find better references, just update the config file

Step 2: Someone Registers a Telescope

# They provide raw specs
telescope_data = {
    'aperture_mm': 2000,
    'wavelength_range': (400, 2500),  # nm
    'resolution': 0.8,                # arcsec/pixel
    'location': {
        'bortle': 3,
        'seeing': 1.2,
        'clear_nights': 250
    },
    'instruments': ['imaging_camera', 'spectrograph'],
    'mount_accuracy': 0.95,
    'fov': 2.5                        # degrees
}

# System converts to vector automatically
vectorizer = TelescopeVectorizer()
vector = vectorizer.vectorize(telescope_data)

# Store both raw and vector
db.save({
    'raw_specs': telescope_data,       # For humans to read
    'capability_vector': vector        # For math
})

Step 3: Someone Submits a Task

# They specify what they need
task_data = {
    'category': 'deep_sky_imaging',   # ← They pick from predefined list
    'min_aperture': 1500,
    'wavelength_needed': (500, 900),
    'max_resolution': 1.0,
    'required_instruments': ['imaging_camera'],
    'min_tracking_precision': 0.90,
    'min_fov': 1.5
}

# System converts to vector
task_vectorizer = TaskVectorizer()
task_vector = task_vectorizer.vectorize(task_data)

# Get weights for this category (from your manual config)
weights = get_weights_for_category('deep_sky_imaging')
# weights = [2.5, 1.0, 0.5, 1.5, 0.3, 1.0, 0.8]

Step 4: Matching (Pure Math)

# Get all telescope vectors from database
telescopes = db.get_all_telescopes()

# Calculate similarity for each
scores = []
for telescope in telescopes:
    score = weighted_cosine_similarity(
        telescope.capability_vector,
        task_vector,
        weights
    )
    scores.append({
        'telescope_id': telescope.id,
        'score': score
    })

# Sort by score
scores.sort(key=lambda x: x['score'], reverse=True)

# Top telescope wins the bid
winner = scores[0]
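The `weighted_cosine_similarity` function called above isn't defined in this snippet; a minimal NumPy sketch, assuming the common weighted form where the weights enter both the dot product and the two norms:

```python
import numpy as np

def weighted_cosine_similarity(vec_a, vec_b, weights):
    """Cosine similarity with per-dimension weights.

    Sketch only -- assumes weights are applied inside the numerator and
    both norms, so the score stays in [0, 1] for non-negative
    normalized vectors.
    """
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    w = np.asarray(weights, dtype=float)

    numerator = np.sum(w * a * b)
    denominator = np.sqrt(np.sum(w * a * a)) * np.sqrt(np.sum(w * b * b))
    return float(numerator / denominator) if denominator else 0.0
```

With this form, a telescope vector identical to the task vector scores 1.0 regardless of the weights, and reweighting only changes how mismatches in different dimensions are penalized.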

5. Everything is Transparent and Traceable

# You can always explain WHY a telescope won:

def explain_match(telescope_vec, task_vec, weights):
    """
    Show exactly how the score was calculated.
    No black box - just math you can verify.
    """
    print("Dimension-by-dimension breakdown:")
    print("-" * 60)

    for i, dim in enumerate(VectorConfig.DIMENSIONS):
        tel_val = telescope_vec[i]
        task_val = task_vec[i]
        weight = weights[i]

        contribution = tel_val * task_val * weight

        print(f"{dim.name:25s} | Tel: {tel_val:.3f} | Task: {task_val:.3f} "
              f"| Weight: {weight:.1f} | Contribution: {contribution:.3f}")

    total_score = weighted_cosine_similarity(telescope_vec, task_vec, weights)
    print("-" * 60)
    print(f"Total similarity score: {total_score:.3f}")

    return total_score


# Example output:
"""
Dimension-by-dimension breakdown:
------------------------------------------------------------
aperture                  | Tel: 0.192 | Task: 0.141 | Weight: 2.5 | Contribution: 0.068
wavelength_coverage       | Tel: 0.212 | Task: 0.040 | Weight: 1.0 | Contribution: 0.008
resolution                | Tel: 0.894 | Task: 0.851 | Weight: 0.5 | Contribution: 0.380
location_quality          | Tel: 0.717 | Task: 0.500 | Weight: 1.5 | Contribution: 0.538
instrument_versatility    | Tel: 0.333 | Task: 0.167 | Weight: 0.3 | Contribution: 0.017
mount_precision           | Tel: 0.950 | Task: 0.900 | Weight: 1.0 | Contribution: 0.855
field_of_view             | Tel: 0.242 | Task: 0.141 | Weight: 0.8 | Contribution: 0.027
------------------------------------------------------------
Total similarity score: 0.823
"""

6. Updating Your System

When you find better literature references:

# Old config
MIN_SEEING = 0.5
MAX_SEEING = 3.0
# Reference: "I guessed"

# You read a paper and find:
# "Typical seeing ranges from 0.4" (Mauna Kea) to 4.0" (poor sites)"
# Source: Walker, M. (1987) "Astronomical Seeing" PASP

# Update config
MIN_SEEING = 0.4
MAX_SEEING = 4.0
# Reference: "Walker, M. (1987) 'Astronomical Seeing' PASP, Vol 99, pp. 192-204"

# Re-normalize all existing vectors with new ranges
update_all_vectors()

Your Complete Simple System

1. Read astronomy literature → Set ranges in vector_config.py
2. Read task-specific literature → Set weights in task_categories.py
3. Telescope registers → Convert specs to vector (just math)
4. Task submitted → Convert requirements to vector (just math)
5. Match → Calculate cosine similarity (just math)
6. Done! No ML, no training, completely transparent

Summary: What You're Actually Doing

Step                | What You Do                     | No ML Involved
--------------------|---------------------------------|----------------------
Configure ranges    | Read papers, set min/max values | ✓ Manual research
Configure weights   | Read papers, set importance     | ✓ Manual research
Vectorize telescope | Apply normalization formula     | ✓ Just arithmetic
Vectorize task      | Apply normalization formula     | ✓ Just arithmetic
Calculate match     | Cosine similarity               | ✓ Just linear algebra
Pick winner         | Sort by score                   | ✓ Just sorting

Zero machine learning. Zero training. Just configuration + math.

Is this what you had in mind?

2. Smart Scheduler Python Implementation

The scheduler factors in:

  • Site capability score: Aperture → limiting magnitude, pixel scale, FOV, GPS timing, filters
  • Observability score: Altitude, airmass, sun position, moon distance
  • Priority score: Time-critical events (occultations in 1 hour = +60), coverage deficit, recency, network coordination
  • Combined score: 50% priority + 30% observability + 20% capability

Key formulas:

Limiting mag ≈ 2 + 5*log10(aperture_mm) - Bortle_penalty
Plate scale = 206.265 * pixel_size_um / focal_length_mm
FOV = sensor_width_mm / focal_length_mm * 3438 arcmin
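The formulas and the 50/30/20 score blend above translate directly into a few helpers; a sketch (function names are mine):

```python
import math

def limiting_magnitude(aperture_mm: float, bortle_penalty: float) -> float:
    # Rough limiting magnitude from aperture, minus a sky-quality penalty
    return 2 + 5 * math.log10(aperture_mm) - bortle_penalty

def plate_scale_arcsec(pixel_size_um: float, focal_length_mm: float) -> float:
    # Arcseconds per pixel; 206.265 converts the um/mm ratio to arcsec
    return 206.265 * pixel_size_um / focal_length_mm

def fov_arcmin(sensor_width_mm: float, focal_length_mm: float) -> float:
    # Field of view in arcminutes; 3438 ~= arcminutes per radian
    return sensor_width_mm / focal_length_mm * 3438

def combined_score(priority: float, observability: float, capability: float) -> float:
    # 50% priority + 30% observability + 20% capability, per the split above
    return 0.5 * priority + 0.3 * observability + 0.2 * capability
```

For example, a 5.4 µm pixel behind a 1000 mm focal length gives a plate scale of about 1.11 arcsec/pixel.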

Great—let's dig into this. This is actually one of the most solvable parts of your system, and getting it right early will make everything else easier.

The core problem: You need to match observation tasks (what to observe, how to observe it) to the right telescopes, since a 4-inch refractor in a suburban backyard can't do what a 12-inch Dobsonian on a dark-sky site can do. You also need to ensure consistent, usable data comes back.

Here's a framework I'd suggest:

1. Equipment profiling

When someone joins your network, they'd input their telescope specs: aperture, focal length, mount type, location (for atmospheric conditions and light pollution), camera specs, and filters available. This creates a "capability fingerprint" for each telescope.

Beyond specs, you'd want some empirical data: actual limiting magnitude under their skies, point-spread function (PSF), and typical image quality. This could be gathered through initial calibration observations or by analyzing their early contributions.

2. Task stratification

Instead of one-size-fits-all tasks, create tiers of observation difficulty:

  • Bright targets (easy): Moon features, bright planets, naked-eye stars. Almost any telescope works.
  • Moderate targets (medium): Messier objects, binary stars, open clusters. Needs decent aperture (6"+) and decent skies.
  • Deep targets (challenging): Faint galaxies, nebulae, exoplanet transits. Requires larger aperture (8"+), dark skies, precise tracking.

Each task would have success criteria—"we need a limiting magnitude of at least 18," or "PSF better than 2 arcseconds." The system automatically offers tasks matching what a telescope can realistically achieve.
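That offer logic can be sketched as a simple predicate, assuming each task carries explicit success criteria and each telescope profile carries measured performance (the field names here are illustrative, not a fixed schema):

```python
def can_offer(task: dict, scope: dict) -> bool:
    """Offer a task only if the telescope meets its success criteria.

    Illustrative sketch -- 'limiting_mag' is the faintest magnitude the
    site reaches; 'psf_arcsec' is its typical point-spread-function width.
    """
    deep_enough = scope["limiting_mag"] >= task["min_limiting_mag"]
    sharp_enough = scope["psf_arcsec"] <= task["max_psf_arcsec"]
    return deep_enough and sharp_enough

# Example: a deep target vs. a suburban 6" and a dark-site 12"
deep_task = {"min_limiting_mag": 18.0, "max_psf_arcsec": 2.0}
suburban_6in = {"limiting_mag": 14.5, "psf_arcsec": 3.1}
dark_site_12in = {"limiting_mag": 19.2, "psf_arcsec": 1.6}
```

The same check run in reverse (iterate tasks per telescope) gives each observer a feed of only the tasks they can realistically complete.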

3. Geographic and temporal matching

Location matters enormously. You could:

  • Assign observations based on which telescopes have the best view at the right time (some targets are only observable from certain latitudes).
  • Cluster observations geographically to cross-reference results (reduces individual error, increases confidence).
  • Account for weather patterns (telescopes in dry climates get priority for certain observations).

4. Quality control mechanisms

This keeps data consistent:

  • Automated validation: Flag observations that deviate from expected results (a bright star shouldn't suddenly disappear, for example).
  • Redundancy: Have multiple telescopes observe the same target independently. When results converge, you have confidence.
  • Feedback loops: Show observers how their data compared to others. This motivates better practices and helps identify systematic errors (e.g., "your images are always 0.5 magnitudes dimmer—here's why and how to fix it").
  • Calibration stars: Include reference observations of known objects so you can normalize across different equipment.
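The redundancy and automated-validation ideas above reduce to a simple convergence check: compare each observer's measurement against the group median and flag outliers. A sketch (the 0.5-magnitude threshold is illustrative, echoing the feedback-loop example above):

```python
from statistics import median

def flag_outliers(magnitudes: dict, max_dev: float = 0.5) -> list:
    """Flag observers whose measured magnitude deviates from the group
    median by more than max_dev magnitudes. Illustrative sketch only.
    """
    group_median = median(magnitudes.values())
    return [observer for observer, mag in magnitudes.items()
            if abs(mag - group_median) > max_dev]

# Three telescopes observe the same variable star independently
readings = {"scope_a": 12.1, "scope_b": 12.2, "scope_c": 13.0}
```

Here `scope_c` sits 0.8 magnitudes from the median and gets flagged for follow-up, while the other two converge.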

5. The algorithmic matching engine

This is where it gets elegant. Build a system that:

  • Takes an incoming observation request (e.g., "monitor this exoplanet transit on Jan 15")
  • Identifies which telescopes can see that target at that time
  • Filters by capability (do they have the right aperture, filters, tracking precision?)
  • Scores by data quality expectations (older equipment gets easier tasks; seasoned observers get challenging ones)
  • Assigns the task to the best available telescope(s)

Over time, machine learning could optimize this—you'd learn which telescope/observer combinations produce the best data for specific types of observations.
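Those five steps chain naturally into one pipeline function; a sketch with placeholder fields (the visibility, capability, and quality values stand in for the logic described above):

```python
def assign_task(request: dict, telescopes: list) -> list:
    """Run the five-step matching pipeline for one observation request.

    Sketch only -- 'can_see_target', 'meets_requirements', and
    'quality_score' are precomputed placeholders for the visibility,
    capability, and expected-data-quality logic described above.
    """
    # Steps 1-2: keep telescopes that can see the target at that time
    visible = [t for t in telescopes if t["can_see_target"]]
    # Step 3: filter by capability (aperture, filters, tracking precision)
    capable = [t for t in visible if t["meets_requirements"]]
    # Step 4: rank by expected data quality, best first
    ranked = sorted(capable, key=lambda t: t["quality_score"], reverse=True)
    # Step 5: assign the top N telescopes (redundancy for cross-checking)
    return ranked[: request.get("redundancy", 1)]

fleet = [
    {"id": 1, "can_see_target": True, "meets_requirements": True, "quality_score": 0.9},
    {"id": 2, "can_see_target": True, "meets_requirements": False, "quality_score": 0.95},
    {"id": 3, "can_see_target": True, "meets_requirements": True, "quality_score": 0.7},
]
```

Requesting redundancy of 2 against this fleet assigns telescopes 1 and 3; telescope 2 is excluded at the capability filter despite its higher quality score.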

Practical starting point: Early on, you probably won't need sophisticated automation. You could start with a questionnaire-based system where volunteers self-select into tasks they're suited for. As you grow and gather data on what works, you'd formalize and automate the matching.

Questions for you:

  • Are you thinking of focusing on photometry (brightness measurements) first, or also tackling astrometry (positions) and spectroscopy?
  • Do you envision this as a purely autonomous system, or would there be human oversight/curation of observations?
  • Would you prefer standardization through software (everyone uses the same observation protocols) or through statistical post-processing (accepting diversity in methods, then normalizing the data)?

What resonates here, and where do you think your biggest challenge lies?

[[plan]]