
Duplicate of: Types of nets.md (same content). Canonical version is Types of nets.md. This note preserved.

The Complete AI/Machine Learning Education: From Zero to Practitioner

I'm going to teach you everything from the absolute ground up. By the end of this, you'll understand how AI actually works and be ready to start building.


Chapter 1: What Is AI, Really?

Forgetting the Science Fiction

First, forget everything movies have told you. AI isn't a sentient robot. It's not "thinking" the way you think. At its core, AI is:

Mathematics that finds patterns in data.

That's it. Everything else (image recognition, language understanding, game playing) emerges from this simple idea applied at massive scale.

The Spectrum of AI

Rule-Based Systems          Machine Learning          Deep Learning
      ↓                           ↓                        ↓
"If X, then Y"            "Learn from examples"      "Learn complex patterns
                                                      with neural networks"

Example:                  Example:                   Example:
"If temperature > 100°,   "Show me 10,000 spam      "Show me millions of
  send alert"              emails, learn what         images, learn to
                           spam looks like"           recognize anything"

Rule-based: You write explicit rules. Limited but predictable.

Machine Learning: The computer discovers rules from data. Flexible but needs examples.

Deep Learning: Machine learning with neural networks. Can learn incredibly complex patterns but needs lots of data and computation.

Why This Matters for Astronomy

Traditional astronomy: "If brightness dips by X% for Y hours with this shape, it might be a planet transit."

ML astronomy: "Here are 10,000 confirmed planet transits. Learn what they look like. Now find more."

The second approach finds patterns humans might never think to look for.


Chapter 2: The Mathematics You Actually Need

Don't panic. You need less math than you think, and I'll explain each piece intuitively.

Concept 1: Variables and Functions

A variable is just a placeholder for a number:

x = 5
temperature = 72.4
brightness = 0.00847

A function takes inputs and produces outputs:

f(x) = 2x + 1

When x = 3:  f(3) = 2(3) + 1 = 7
When x = 10: f(10) = 2(10) + 1 = 21

ML insight: A trained model IS a function. It takes your data as input and produces predictions as output.
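The toy function above translates directly into Python:

```python
def f(x):
    # The same rule as above: double the input, add one
    return 2 * x + 1

print(f(3))   # 7
print(f(10))  # 21
```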

Concept 2: Vectors and Matrices

A vector is a list of numbers:

pixel_values = [0.1, 0.4, 0.9, 0.2, 0.8]
star_properties = [temperature, brightness, distance, mass]

A matrix is a grid of numbers:

image = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9]
]

ML insight: All data becomes vectors or matrices. An image? Matrix of pixel values. A spectrum? Vector of intensity values. Text? Converted to vectors of numbers.

Concept 3: The Dot Product

This is the key operation in ML. Multiply corresponding elements and add:

vector_a = [1, 2, 3]
vector_b = [4, 5, 6]

dot_product = (1×4) + (2×5) + (3×6)
            = 4 + 10 + 18
            = 32

ML insight: This is how neural networks combine inputs. Each input gets multiplied by a "weight," then everything is added up.
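The same computation in plain Python, for the two vectors above:

```python
vector_a = [1, 2, 3]
vector_b = [4, 5, 6]

# Multiply corresponding elements, then add them all up
dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
print(dot_product)  # 32
```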

Concept 4: Probability Basics

Probability measures likelihood (0 = impossible, 1 = certain):

P(coin lands heads) = 0.5
P(sun rises tomorrow) ≈ 1.0
P(finding a unicorn) = 0.0

ML insight: Models output probabilities. "This image is 94% likely to be a spiral galaxy, 5% elliptical, 1% artifact."

Concept 5: Derivatives (Just the Intuition)

A derivative measures "how fast something is changing."

Imagine driving a car:

  • Position = where you are
  • Velocity (derivative of position) = how fast position is changing
  • Acceleration (derivative of velocity) = how fast velocity is changing

ML insight: Training uses derivatives to figure out "if I adjust this parameter slightly, how much does my error change?" This guides learning.
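That intuition can be checked numerically: nudge the input slightly and see how much the output changes. A small sketch (the x² "error curve" here is just an illustrative choice):

```python
def error(x):
    # A toy error curve: error(x) = x^2
    return x ** 2

def derivative(f, x, h=1e-6):
    # "If I nudge x slightly, how much does f change?"
    return (f(x + h) - f(x)) / h

# The true derivative of x^2 at x = 3 is 2x = 6
print(derivative(error, 3.0))  # very close to 6
```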


Chapter 3: How Machine Learning Actually Works

The Core Loop

Every ML system follows this pattern:

1. INITIALIZE: Start with random parameter values

2. PREDICT: Use current parameters to make predictions

3. MEASURE ERROR: Compare predictions to correct answers

4. UPDATE: Adjust parameters to reduce error

5. REPEAT: Go back to step 2, thousands of times

Let me make this concrete.

Example: Predicting Star Temperature from Color

The Data:

Star 1: Blue/Red ratio = 0.8, Temperature = 5000K
Star 2: Blue/Red ratio = 1.2, Temperature = 6500K
Star 3: Blue/Red ratio = 1.5, Temperature = 8000K
Star 4: Blue/Red ratio = 2.0, Temperature = 11000K
... (thousands more)

The Model (simplest possible):

Predicted_Temperature = w × (Blue/Red ratio) + b

Where w and b are parameters we need to learn

Training Process:

Step 1: Random initialization
   w = 1000 (random guess)
   b = 2000 (random guess)

Step 2: Make predictions
   Star 1: 1000 × 0.8 + 2000 = 2800K (actual: 5000K), way off!
   Star 2: 1000 × 1.2 + 2000 = 3200K (actual: 6500K), way off!

Step 3: Measure error
   Error = average of (predicted - actual)²
         = ((2800-5000)² + (3200-6500)²) / 2
         = (4,840,000 + 10,890,000) / 2
         = 7,865,000, a big number, bad!

Step 4: Update parameters
   Mathematics tells us:
   - Increasing w will reduce error
   - Increasing b will reduce error

   New w = 1000 + adjustment = 3000
   New b = 2000 + adjustment = 2500

Step 5: Repeat
   With new parameters, error becomes 2,100,000
   Keep going...

After 1000 iterations:
   w ≈ 5000
   b ≈ 1000
   Error is now tiny!

Final model:
   Temperature ≈ 5000 × (Blue/Red) + 1000

This simple model learned the relationship between color and temperature!

Gradient Descent: The Heart of Learning

"Gradient descent" is just a fancy name for the update process. Here's the intuition:

Imagine you're blindfolded on a hilly landscape. Your goal: find the lowest valley (minimum error).

Strategy:

  1. Feel the ground around you (compute gradient/derivative)
  2. Figure out which direction goes downhill (direction of steepest descent)
  3. Take a step that direction (update parameters)
  4. Repeat until you stop going downhill (reached minimum)

          Error
            ^
            |    
         *  |  *     <- Starting point (random parameters)
        *   | *
       *    |*
      *     *         <- Each step moves downhill
     *    / 
    *   /
   *  /
  * /
 */________________> Parameters
        ↑
    Minimum (best parameters)

The Learning Rate

How big should each step be?

  • Too big: You overshoot the minimum, bounce around, never converge
  • Too small: Takes forever to reach the minimum
  • Just right: Steady progress toward the best solution

Learning rate too high:      Learning rate too low:     Learning rate good:
        *                            *                         *
       / \                          *                         *
      /   \                        *                         *
     /     \                      *                         *
    /       *                    *                         *
   *         \                  *                         *
              *                *                       *
                              ... (takes forever)    * <- converged!

The learning rate is a hyperparameter: something you choose, not something the model learns.
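The whole five-step loop can be run end-to-end in a few lines of plain Python. This is a minimal sketch using the four stars from the walkthrough, with a learning rate small enough to converge; because the data isn't perfectly linear, the learned values land near (not exactly on) the rounded numbers shown earlier.

```python
# The star data from the walkthrough above
ratios = [0.8, 1.2, 1.5, 2.0]
temps  = [5000, 6500, 8000, 11000]

w, b = 1000.0, 2000.0      # Step 1: initial guesses
learning_rate = 0.1

for step in range(5000):   # Step 5: repeat many times
    # Steps 2-3: predictions (w*x + b) and mean-squared-error gradients
    grad_w = sum(2 * (w * x + b - t) * x for x, t in zip(ratios, temps)) / len(ratios)
    grad_b = sum(2 * (w * x + b - t) for x, t in zip(ratios, temps)) / len(ratios)
    # Step 4: nudge each parameter against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w), round(b))  # settles near w = 5000, b = 700 for this data
```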


Chapter 4: Neural Networks Explained

The Biological Inspiration (Loosely)

Your brain has neurons connected by synapses. A neuron:

  1. Receives signals from other neurons
  2. If total signal exceeds a threshold, it "fires"
  3. Sends signals to other neurons

Artificial neural networks are inspired by this (but much simpler).

The Artificial Neuron

Inputs (x₁, x₂, x₃)        Weights (w₁, w₂, w₃)
      |                          |
      v                          v
   ┌──────────────────────────────────────────────┐
   │                                              │
   │  weighted_sum = w₁×x₁ + w₂×x₂ + w₃×x₃ + b    │
   │                                              │
   │  output = activation(weighted_sum)           │
   │                                              │
   └──────────────────────────────────────────────┘
                      |
                      v
                   Output

Inputs: The data (pixel values, measurements, features)

Weights: Learnable parameters that determine importance of each input

Bias (b): An adjustable offset

Activation function: Introduces non-linearity (explained below)
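Putting those four pieces together, a single artificial neuron is only a few lines. This sketch uses the sigmoid activation (covered in the next section); the inputs, weights, and bias are made-up numbers purely for illustration:

```python
import math

def neuron(inputs, weights, bias):
    # weighted_sum = w1*x1 + w2*x2 + w3*x3 + b
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation squashes the result into (0, 1)
    return 1 / (1 + math.exp(-weighted_sum))

output = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2)
print(output)  # about 0.62
```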

Why Activation Functions Matter

Without activation functions, stacking layers would be pointless:

Layer 1: output = w₁ Γ— input + b₁
Layer 2: output = wβ‚‚ Γ— (w₁ Γ— input + b₁) + bβ‚‚
       = (wβ‚‚Γ—w₁) Γ— input + (wβ‚‚Γ—b₁ + bβ‚‚)
       = W Γ— input + B  ← Still just a linear function!

Activation functions break this linearity, allowing complex patterns:

ReLU (Rectified Linear Unit), the most common:

ReLU(x) = max(0, x)

If x is negative, output 0
If x is positive, output x

Examples:
ReLU(-5) = 0
ReLU(0) = 0
ReLU(3) = 3

Sigmoid, which squashes to 0-1 (good for probabilities):

Sigmoid(x) = 1 / (1 + e^(-x))

Very negative x → ~0
Zero → 0.5
Very positive x → ~1

Softmax, used for classification (outputs sum to 1):

Used in final layer for classification
Converts raw scores to probabilities

Scores: [2.0, 1.0, 0.1]
Softmax: [0.66, 0.24, 0.10]  ← These sum to 1.0
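All three activations are short enough to write out and check against the numbers above:

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def softmax(scores):
    # Exponentiate, then divide by the total so probabilities sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-5), relu(3))                                 # 0.0 3
print(sigmoid(0))                                        # 0.5
print([round(p, 2) for p in softmax([2.0, 1.0, 0.1])])   # [0.66, 0.24, 0.1]
```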

Building a Neural Network

Stack neurons into layers:

INPUT LAYER          HIDDEN LAYER 1        HIDDEN LAYER 2        OUTPUT LAYER
(your data)          (learned features)    (complex features)    (predictions)

    x₁ ──┐
         │         ●                     ●
    x₂ ──┤         ●                     ●                    ● ──→ class 1 prob
         ├──────→  ●  ─────────────────→ ●  ────────────────→ ● ──→ class 2 prob
    x₃ ──┤         ●                     ●                    ● ──→ class 3 prob
         │         ●                     ●
    x₄ ──┘

(every input connects to every neuron in the next layer; most connections omitted for clarity)

Each connection has a weight (learnable)
Each neuron has a bias (learnable)
Each neuron applies an activation function

What Each Layer Learns (Image Example)

For image classification:

Layer 1: Detects simple patterns

  • Edge detectors (vertical, horizontal, diagonal)
  • Color blobs
  • Simple textures

Layer 2: Combines simple patterns into shapes

  • Corners (vertical + horizontal edges)
  • Curves (many edge detectors)
  • Texture regions

Layer 3: Combines shapes into parts

  • "This looks like a spiral arm"
  • "This looks like a galactic core"
  • "This looks like a star cluster"

Layer 4+: Combines parts into objects

  • "Spiral arms + bright core + overall shape = spiral galaxy"

This hierarchical learning is why deep networks are so powerful!

Forward Pass vs Backward Pass

Forward Pass: Data flows through the network, producing predictions

Input → Layer 1 → Layer 2 → ... → Output → Prediction

Backward Pass (Backpropagation): Errors flow backward, updating weights

How wrong was the prediction?
                ↓
How much did each Layer N weight contribute to error?
                ↓
Adjust Layer N weights
                ↓
How much did each Layer N-1 weight contribute to error?
                ↓
Adjust Layer N-1 weights
                ↓
... continue back to Layer 1 ...

This is where the calculus happens: computing how each weight affects the final error.


Chapter 5: Convolutional Neural Networks (CNNs) for Images

Since you're working with telescope images, CNNs are crucial.

The Problem with Regular Networks for Images

A small 256×256 grayscale image has 65,536 pixels.

If your first layer has 1000 neurons, you'd have 65,536,000 connections from input to first layer alone!

This is:

  • Computationally expensive
  • Prone to overfitting (too many parameters for limited data)
  • Ignores the structure of images (nearby pixels are related)

The Key Insight: Local Patterns

In images, patterns are local:

  • An edge is a few pixels wide
  • A star is a small region
  • Artifacts have local signatures

We don't need every neuron to look at every pixel!

Convolution: The Core Operation

A filter (or kernel) is a small pattern detector:

Example: 3×3 edge-detecting filter

Filter:            Slide over image:
[-1  0  1]         
[-1  0  1]         Original     After convolution
[-1  0  1]         [image] --> [edge map]

How convolution works:

Image region:      Filter:         Calculation:
[1, 2, 3]         [-1, 0, 1]      Sum of element-wise products:
[4, 5, 6]    ×    [-1, 0, 1]   =  (-1×1)+(0×2)+(1×3)+
[7, 8, 9]         [-1, 0, 1]      (-1×4)+(0×5)+(1×6)+
                                   (-1×7)+(0×8)+(1×9)
                                 = -1+0+3-4+0+6-7+0+9 = 6

Slide the filter across the entire image, computing this at each position. The result is a feature map.
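Here is that multiply-and-sum in NumPy, first for the single 3×3 position above, then slid across a whole image to produce a feature map. This is a naive sketch; real frameworks compute the same thing far more efficiently (and, strictly, this no-flip sliding sum is what deep learning libraries call convolution).

```python
import numpy as np

region = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# One convolution step: element-wise multiply, then sum
value = int(np.sum(region * kernel))
print(value)  # 6

def convolve2d(image, kernel):
    # Slide the kernel over every valid position (no padding)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25).reshape(5, 5)        # a smooth left-to-right ramp
print(convolve2d(image, kernel))           # every entry is 6 for this ramp
```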

Multiple Filters = Multiple Features

A CNN layer has many filters, each learning to detect different patterns:

Input Image (1 channel: grayscale)
        ↓
   Conv Layer 1 (32 filters)
        ↓
   32 Feature Maps (different patterns detected)
        ↓
   Conv Layer 2 (64 filters, each looks at all 32 previous maps)
        ↓
   64 Feature Maps (combinations of patterns)
        ↓
   ... more layers ...
        ↓
   Final Classification

Pooling: Reducing Size

After convolution, we often pool to reduce the size:

Max Pooling (2×2):

[1, 3, 2, 4]
[5, 6, 1, 2]  →   [6, 4]    Take max of each 2×2 region
[3, 2, 1, 0]      [3, 3]
[1, 2, 3, 1]

This:

  • Reduces computation for later layers
  • Adds some translation invariance (small shifts don't matter)
  • Keeps the strongest activations
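Max pooling is nearly a one-liner with a reshape trick in NumPy; the example matrix above comes out as [[6, 4], [3, 3]]:

```python
import numpy as np

def max_pool_2x2(x):
    # Group into non-overlapping 2x2 blocks, then take the max of each block
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [3, 2, 1, 0],
              [1, 2, 3, 1]])
print(max_pool_2x2(x))
# [[6 4]
#  [3 3]]
```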

Complete CNN Architecture

Input: 256×256×1 telescope image

Conv1: 32 filters (3×3), ReLU → 256×256×32
Pool1: Max pool (2×2) → 128×128×32

Conv2: 64 filters (3×3), ReLU → 128×128×64
Pool2: Max pool (2×2) → 64×64×64

Conv3: 128 filters (3×3), ReLU → 64×64×128
Pool3: Max pool (2×2) → 32×32×128

Flatten: 32×32×128 = 131,072 values

Dense1: 512 neurons, ReLU
Dense2: 128 neurons, ReLU
Output: 5 neurons, Softmax → [spiral, elliptical, irregular, merger, artifact]

Why CNNs Work So Well for Astronomical Images

  1. Translation invariance: A galaxy in the corner looks the same as one in the center
  2. Hierarchical features: Learn edges β†’ shapes β†’ structures β†’ objects
  3. Parameter efficiency: Same filter applied everywhere, fewer total parameters
  4. Natural for 2D data: Respects spatial relationships

Chapter 6: Training in Practice

The Training/Validation/Test Split

Never evaluate on data you trained on! Split your data:

All Your Data (e.g., 10,000 galaxy images)
         ↓
┌────────────────────────────────────────┐
│ Training Set (70%): 7,000 images       │ ← Model learns from these
├────────────────────────────────────────┤
│ Validation Set (15%): 1,500 images     │ ← Tune hyperparameters, early stopping
├────────────────────────────────────────┤
│ Test Set (15%): 1,500 images           │ ← Final evaluation only (touch once!)
└────────────────────────────────────────┘

Training set: Model sees these, adjusts weights

Validation set: Model never trains on these; use to check performance during training

Test set: Model never sees until final evaluation; gives unbiased performance estimate
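One common way to produce this 70/15/15 split is scikit-learn's train_test_split, applied twice (random arrays stand in for real galaxy images here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10,000 fake samples standing in for galaxy images
X = np.random.randn(10000, 64)
y = np.random.randint(0, 5, size=10000)

# First carve off 70% for training...
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42)

# ...then split the remaining 30% evenly into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 7000 1500 1500
```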

Overfitting vs Underfitting

Underfitting: Model too simple, can't capture patterns

Training accuracy: 60%
Validation accuracy: 58%
Both are bad → need a more complex model

Good fit: Model captures patterns without memorizing

Training accuracy: 95%
Validation accuracy: 92%
Both good, close together → well-tuned model

Overfitting: Model memorized training data, fails on new data

Training accuracy: 99%
Validation accuracy: 70%
Big gap → model is memorizing, not learning

Visualization:

                Model Complexity β†’

    ↑    
    |       Underfitting     Sweet      Overfitting
 E  |            |           Spot           |
 r  |    ____    |     ______|______       |
 r  |   /    \   |    /      |      \      |
 o  |  /      \  |   /       |       \     |
 r  | /        \ |  /        |        \    |
    |/          \| /         |         \   |
    └────────────┴───────────┴──────────\──┴──

    ─── Training Error
    ─ ─ Validation Error

Regularization: Preventing Overfitting

Dropout: Randomly "turn off" neurons during training

During training:
[neuron1] [     ] [neuron3] [     ] [neuron5]   ← 40% dropped
              ↓
Forces network to not rely on any single neuron
              ↓
More robust, generalizes better

L2 Regularization: Penalize large weights

Loss = Prediction_Error + λ × (sum of squared weights)

Large weights get penalized
Forces model to use smaller, more distributed weights
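In PyTorch there are two common ways to get this penalty: pass weight_decay (the λ above) to the optimizer, or add the squared-weight sum to the loss yourself. A minimal sketch with random stand-in data; the specific numbers are arbitrary:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(10, 5)
criterion = nn.CrossEntropyLoss()

# Option 1: let the optimizer apply the penalty (weight_decay is the lambda)
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

# Option 2: add the penalty to the loss yourself
X = torch.randn(32, 10)
y = torch.randint(0, 5, (32,))
lam = 1e-4
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = criterion(model(X), y) + lam * l2_penalty
loss.backward()
optimizer.step()
print(loss.item() > 0)  # True: cross-entropy plus a positive penalty
```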

Data Augmentation: Create variations of training data

Original galaxy image
    ↓
Augmented versions:
- Rotated 90°, 180°, 270°
- Flipped horizontally
- Flipped vertically
- Slightly shifted
- Slightly zoomed
- Noise added
- Brightness adjusted

1 image becomes 10+ training examples!

For astronomy, augmentation is powerful because physics doesn't change with rotation.
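A minimal augmentation sketch using NumPy rotations and flips, which yields the 8 symmetry versions of an image (shifts, zooms, and noise would be added similarly):

```python
import numpy as np

def augment(image):
    """Yield rotated and flipped copies of a 2-D image array."""
    for k in range(4):                # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(image, k)
        yield rotated
        yield np.fliplr(rotated)      # plus a horizontal flip of each

image = np.random.randn(64, 64)
versions = list(augment(image))
print(len(versions))  # 8
```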

Batch Training

Processing all data at once is memory-intensive. Instead, use mini-batches:

10,000 training images
    ↓
Split into batches of 32
    ↓
313 batches per epoch (312 full batches plus one partial)

Each training step:
1. Load batch of 32 images
2. Forward pass: compute predictions
3. Compute loss
4. Backward pass: compute gradients
5. Update weights
6. Next batch

One complete pass through all batches = 1 epoch
Training typically runs for 10-100+ epochs

Learning Rate Schedules

Learning rate can change during training:

Constant:        Step Decay:       Exponential:     Cosine Annealing:

 lr              lr                lr               lr
  |____          |__               |\               /\    /\
  |              |  |__            | \             /  \  /  \
  |              |     |__         |  \           /    \/    \
  |____________  |________|___     |___\____     /_____________\
       epochs        epochs          epochs           epochs

Common approach: Start high (learn fast), decrease over time (fine-tune).
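In PyTorch, schedules are objects you step once per epoch. A step-decay sketch (halve the rate every 10 epochs; the model and optimizer here are throwaway placeholders):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 5)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Step decay: multiply the learning rate by 0.5 every 10 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ... training for one epoch would go here ...
    optimizer.step()       # optimizer steps before the scheduler
    scheduler.step()

print(optimizer.param_groups[0]['lr'])  # 0.0125 after three halvings
```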

Early Stopping

Stop training when validation performance stops improving:

Epoch 1:  Val accuracy = 70%
Epoch 2:  Val accuracy = 78%
Epoch 3:  Val accuracy = 84%
Epoch 4:  Val accuracy = 88%
Epoch 5:  Val accuracy = 90%
Epoch 6:  Val accuracy = 91%
Epoch 7:  Val accuracy = 91%  ← Stopped improving
Epoch 8:  Val accuracy = 90%  ← Getting worse (overfitting starting)
Epoch 9:  Val accuracy = 89%
...

Early stopping: Stop at epoch 6 or 7, save that model
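Early stopping is simple bookkeeping: track the best validation score and give up after a "patience" window with no improvement. A sketch using the accuracy curve above (in a real loop you would save the model weights at each new best):

```python
def train_with_early_stopping(val_accuracies, patience=2):
    """Stop when validation accuracy hasn't improved for `patience` epochs."""
    best, best_epoch, waited = 0.0, 0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, best_epoch, waited = acc, epoch, 0   # save model here
        else:
            waited += 1
            if waited >= patience:
                break                                  # stop training
    return best_epoch, best

# The validation accuracy curve from the walkthrough above
accs = [70, 78, 84, 88, 90, 91, 91, 90, 89]
print(train_with_early_stopping(accs))  # (6, 91): best model was at epoch 6
```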

Chapter 7: Practical Python for ML

Setting Up Your Environment

Step 1: Install Python (version 3.9 or 3.10 recommended)

Step 2: Install essential packages

pip install numpy pandas matplotlib scikit-learn
pip install torch torchvision  # PyTorch (or tensorflow if you prefer)
pip install astropy  # For astronomy data
pip install jupyter  # For interactive development

Step 3: Verify installation

import numpy as np
import torch
import astropy
print("All imports successful!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")  # True if you have GPU

NumPy: The Foundation

NumPy is for numerical computing. Everything in ML uses it.

import numpy as np

# Creating arrays
a = np.array([1, 2, 3, 4, 5])
b = np.zeros((3, 3))  # 3x3 array of zeros
c = np.ones((2, 4))   # 2x4 array of ones
d = np.random.randn(100, 100)  # 100x100 random values (normal distribution)

# Array operations (element-wise)
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)      # [5, 7, 9]
print(x * y)      # [4, 10, 18]
print(x ** 2)     # [1, 4, 9]
print(np.sqrt(x)) # [1.0, 1.414, 1.732]

# Statistics
data = np.random.randn(1000)
print(np.mean(data))   # ~0
print(np.std(data))    # ~1
print(np.max(data))    # ~3
print(np.min(data))    # ~-3

# Reshaping
image = np.random.randn(256, 256)  # 2D image
flat = image.reshape(-1)  # Flatten to 1D: 65536 elements
back = flat.reshape(256, 256)  # Back to 2D

# Slicing
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr[0, :])    # First row: [1, 2, 3]
print(arr[:, 1])    # Second column: [2, 5, 8]
print(arr[1:, 1:])  # Bottom-right: [[5, 6], [8, 9]]

Matplotlib: Visualization

import matplotlib.pyplot as plt
import numpy as np

# Basic line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sine Wave')
plt.show()

# Scatter plot
x = np.random.randn(100)
y = x + np.random.randn(100) * 0.5
plt.scatter(x, y, alpha=0.5)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

# Image display (crucial for astronomy!)
image = np.random.randn(256, 256)
plt.imshow(image, cmap='gray')
plt.colorbar()
plt.title('Random Image')
plt.show()

# Histogram
data = np.random.randn(10000)
plt.hist(data, bins=50, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Count')
plt.title('Distribution')
plt.show()

# Multiple subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].plot(x, y)
axes[0, 1].scatter(x, y)
axes[1, 0].imshow(image, cmap='viridis')
axes[1, 1].hist(data, bins=30)
plt.tight_layout()
plt.show()

Astropy: Handling Astronomical Data

from astropy.io import fits
from astropy import units as u
from astropy.coordinates import SkyCoord
import numpy as np
import matplotlib.pyplot as plt

# Reading FITS files (telescope images)
def load_fits_image(filepath):
    with fits.open(filepath) as hdul:
        # Primary data is usually in index 0 or 1
        hdul.info()  # Prints a summary of the file contents

        data = hdul[0].data  # The image data
        header = hdul[0].header  # Metadata

        return data, header

# Example usage
# data, header = load_fits_image('my_observation.fits')
# print(f"Image shape: {data.shape}")
# print(f"Object: {header.get('OBJECT', 'Unknown')}")
# print(f"Exposure time: {header.get('EXPTIME', 'Unknown')} seconds")

# Working with coordinates
coord = SkyCoord('10h30m00s', '+45d00m00s', frame='icrs')
print(f"RA: {coord.ra.degree} degrees")
print(f"Dec: {coord.dec.degree} degrees")

# Unit conversions
distance = 100 * u.pc  # 100 parsecs
print(f"In light years: {distance.to(u.lyr)}")
print(f"In AU: {distance.to(u.AU)}")

# Displaying astronomical images properly
def display_astronomical_image(data, title='Astronomical Image'):
    """Display with log stretch (common for astronomy)"""
    # Handle negative values
    data_shifted = data - np.nanmin(data) + 1

    # Log stretch
    log_data = np.log10(data_shifted)

    # Display
    plt.figure(figsize=(10, 10))
    plt.imshow(log_data, cmap='gray', origin='lower')
    plt.colorbar(label='log(counts)')
    plt.title(title)
    plt.show()

PyTorch Basics

PyTorch is a deep learning framework. Here's the essentials:

import torch
import torch.nn as nn
import torch.optim as optim

# Tensors (like numpy arrays, but can run on GPU)
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.zeros(3, 3)
c = torch.randn(100, 100)

# Move to GPU (if available)
if torch.cuda.is_available():
    a = a.cuda()
    # or
    device = torch.device('cuda')
    a = a.to(device)

# Convert between numpy and torch
import numpy as np
numpy_array = np.array([1.0, 2.0, 3.0])
torch_tensor = torch.from_numpy(numpy_array)
back_to_numpy = torch_tensor.numpy()

# Automatic differentiation (the magic of PyTorch!)
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x + 1  # y = xΒ² + 3x + 1

y.backward()  # Compute derivative
print(x.grad)  # dy/dx = 2x + 3 = 2(2) + 3 = 7 ✓

Building Your First Neural Network in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Define the network
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleClassifier, self).__init__()

        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_size // 2, num_classes)
        )

    def forward(self, x):
        return self.network(x)

# Create synthetic data for demonstration
num_samples = 1000
input_size = 100
num_classes = 5

X = torch.randn(num_samples, input_size)
y = torch.randint(0, num_classes, (num_samples,))

# Split into train/val
train_X, val_X = X[:800], X[800:]
train_y, val_y = y[:800], y[800:]

# Create data loaders
train_dataset = TensorDataset(train_X, train_y)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

val_dataset = TensorDataset(val_X, val_y)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Initialize model, loss, optimizer
model = SimpleClassifier(input_size=100, hidden_size=256, num_classes=5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 20

for epoch in range(num_epochs):
    model.train()  # Set to training mode
    train_loss = 0

    for batch_X, batch_y in train_loader:
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)

        # Backward pass
        optimizer.zero_grad()  # Clear previous gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights

        train_loss += loss.item()

    # Validation
    model.eval()  # Set to evaluation mode
    val_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():  # No gradient computation for validation
        for batch_X, batch_y in val_loader:
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            val_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()

    accuracy = 100 * correct / total
    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {train_loss/len(train_loader):.4f}, '
          f'Val Loss: {val_loss/len(val_loader):.4f}, '
          f'Val Accuracy: {accuracy:.2f}%')

Building a CNN for Images

import torch
import torch.nn as nn

class AstronomyCNN(nn.Module):
    def __init__(self, num_classes=5):
        super(AstronomyCNN, self).__init__()

        # Convolutional layers
        self.conv_layers = nn.Sequential(
            # Input: 1 channel (grayscale), Output: 32 channels
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 256 -> 128

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 128 -> 64

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 64 -> 32

            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 32 -> 16
        )

        # Fully connected layers
        self.fc_layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 16 * 16, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = self.fc_layers(x)
        return x

# Create model
model = AstronomyCNN(num_classes=5)

# Print model summary
print(model)

# Check with dummy input
dummy_input = torch.randn(1, 1, 256, 256)  # Batch of 1, 1 channel, 256x256
output = model(dummy_input)
print(f"Output shape: {output.shape}")  # Should be [1, 5]

Complete Training Script for Astronomical Image Classification

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
from astropy.io import fits
import os
from pathlib import Path
import matplotlib.pyplot as plt

class AstronomyDataset(Dataset):
    """Custom dataset for astronomical images"""

    def __init__(self, image_dir, labels_file, transform=None):
        """
        Args:
            image_dir: Directory with FITS images
            labels_file: Text file with "filename,label" per line
            transform: Optional transform function
        """
        self.image_dir = Path(image_dir)
        self.transform = transform

        # Load labels
        self.samples = []
        with open(labels_file, 'r') as f:
            for line in f:
                filename, label = line.strip().split(',')
                self.samples.append((filename, int(label)))

        self.classes = ['spiral', 'elliptical', 'irregular', 'merger', 'artifact']

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        filename, label = self.samples[idx]

        # Load FITS image
        filepath = self.image_dir / filename
        with fits.open(filepath) as hdul:
            image = hdul[0].data.astype(np.float32)

        # Preprocessing
        image = self.preprocess(image)

        # Apply transforms if any
        if self.transform:
            image = self.transform(image)

        # Convert to tensor
        image = torch.from_numpy(image).unsqueeze(0)  # Add channel dimension

        return image, label

    def preprocess(self, image):
        """Standard preprocessing for astronomical images"""
        # Handle NaN values
        image = np.nan_to_num(image, nan=0.0)

        # Clip extreme values (cosmic rays, bad pixels)
        p1, p99 = np.percentile(image, [1, 99])
        image = np.clip(image, p1, p99)

        # Log stretch (handles large dynamic range)
        image = image - image.min() + 1
        image = np.log(image)

        # Normalize to 0-1
        image = (image - image.min()) / (image.max() - image.min() + 1e-8)

        return image


def train_model(model, train_loader, val_loader, num_epochs=50, 
                learning_rate=0.001, device='cuda'):
    """Complete training function with bells and whistles"""

    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='max', factor=0.5, patience=5
    )

    best_accuracy = 0
    history = {'train_loss': [], 'val_loss': [], 'val_accuracy': []}

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        correct = 0
        total = 0

        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = criterion(outputs, labels)
                val_loss += loss.item()

                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total
        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)

        # Update scheduler
        scheduler.step(accuracy)

        # Save history
        history['train_loss'].append(avg_train_loss)
        history['val_loss'].append(avg_val_loss)
        history['val_accuracy'].append(accuracy)

        # Save best model
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            torch.save(model.state_dict(), 'best_model.pt')

        print(f'Epoch [{epoch+1}/{num_epochs}] '
              f'Train Loss: {avg_train_loss:.4f} '
              f'Val Loss: {avg_val_loss:.4f} '
              f'Val Acc: {accuracy:.2f}% '
              f'(Best: {best_accuracy:.2f}%)')

    return history


def plot_training_history(history):
    """Visualize training progress"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

    # Loss plot
    ax1.plot(history['train_loss'], label='Train Loss')
    ax1.plot(history['val_loss'], label='Val Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()

    # Accuracy plot
    ax2.plot(history['val_accuracy'], label='Val Accuracy', color='green')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy (%)')
    ax2.set_title('Validation Accuracy')
    ax2.legend()

    plt.tight_layout()
    plt.savefig('training_history.png')
    plt.show()


# Example usage (you'd replace with your actual data):
if __name__ == '__main__':
    # Check for GPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Create model
    model = AstronomyCNN(num_classes=5)

    # For demonstration, create random data
    # In practice, you'd use AstronomyDataset with real data
    train_X = torch.randn(800, 1, 256, 256)
    train_y = torch.randint(0, 5, (800,))
    val_X = torch.randn(200, 1, 256, 256)
    val_y = torch.randint(0, 5, (200,))

    train_dataset = torch.utils.data.TensorDataset(train_X, train_y)
    val_dataset = torch.utils.data.TensorDataset(val_X, val_y)

    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

    # Train
    history = train_model(model, train_loader, val_loader, 
                          num_epochs=20, device=device)

    # Plot results
    plot_training_history(history)
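Before moving on, it helps to see what the preprocess method actually does to raw pixel values. Here are the same four steps as a standalone NumPy function, run on a synthetic frame — a sketch that depends only on NumPy, not on PyTorch or real FITS data:

```python
import numpy as np

def preprocess(image):
    """Same steps as AstronomyDataset.preprocess: clean, clip, stretch, normalize."""
    image = np.nan_to_num(image, nan=0.0)            # bad pixels -> 0
    p1, p99 = np.percentile(image, [1, 99])          # robust value range
    image = np.clip(image, p1, p99)                  # suppress cosmic-ray outliers
    image = np.log(image - image.min() + 1)          # compress dynamic range
    return (image - image.min()) / (image.max() - image.min() + 1e-8)

# Synthetic "frame": faint background, one huge outlier, one dead pixel
rng = np.random.default_rng(0)
frame = rng.normal(100, 5, (64, 64)).astype(np.float32)
frame[10, 10] = 1e6      # cosmic-ray-like spike
frame[20, 20] = np.nan   # dead pixel

out = preprocess(frame)
print(out.min(), out.max())   # everything now lies in [0, 1]
```

However extreme the raw values, the output range stays fixed at [0, 1] — which is exactly what the network's first layer expects.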

Chapter 8: Your First Complete Project

Let's build something real: an image quality classifier for your telescope.

Project: Automatic Image Quality Assessment

Goal: Given a raw telescope frame, predict quality (good/medium/bad) automatically.

Step 1: Data Collection

First, manually classify some of your existing images:

import os
import shutil
from pathlib import Path

# Create directory structure
for quality in ['good', 'medium', 'bad']:
    Path(f'training_data/{quality}').mkdir(parents=True, exist_ok=True)

print("""
Manual Classification Guide:
- GOOD: Clear stars, low background, good focus
- MEDIUM: Some clouds, slightly out of focus, minor issues
- BAD: Heavy clouds, tracking errors, severe artifacts

Move or copy your FITS files into the appropriate folders.
Aim for at least 100 images per category.
""")

Step 2: Data Preparation

import numpy as np
from astropy.io import fits
from pathlib import Path
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split

class QualityDataset(Dataset):
    def __init__(self, filepaths, labels, image_size=128):
        self.filepaths = filepaths
        self.labels = labels
        self.image_size = image_size

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        # Load image
        with fits.open(self.filepaths[idx]) as hdul:
            image = hdul[0].data.astype(np.float32)

        # Resize to consistent size
        from scipy.ndimage import zoom
        zoom_factor = self.image_size / max(image.shape)
        image = zoom(image, zoom_factor)

        # Pad to exact size if needed
        if image.shape[0] < self.image_size:
            pad = self.image_size - image.shape[0]
            image = np.pad(image, ((0, pad), (0, 0)))
        if image.shape[1] < self.image_size:
            pad = self.image_size - image.shape[1]
            image = np.pad(image, ((0, 0), (0, pad)))

        # Crop to exact size
        image = image[:self.image_size, :self.image_size]

        # Normalize
        image = np.nan_to_num(image, nan=0)
        p1, p99 = np.percentile(image, [1, 99])
        image = np.clip(image, p1, p99)
        image = (image - image.min()) / (image.max() - image.min() + 1e-8)

        # To tensor
        image = torch.from_numpy(image).unsqueeze(0)

        return image, self.labels[idx]

def prepare_data(data_dir='training_data'):
    """Load data from organized folders"""
    filepaths = []
    labels = []
    label_map = {'good': 0, 'medium': 1, 'bad': 2}

    for quality, label in label_map.items():
        folder = Path(data_dir) / quality
        for filepath in folder.glob('*.fits'):
            filepaths.append(str(filepath))
            labels.append(label)

    # Split into train/val/test
    train_files, temp_files, train_labels, temp_labels = train_test_split(
        filepaths, labels, test_size=0.3, stratify=labels, random_state=42
    )
    val_files, test_files, val_labels, test_labels = train_test_split(
        temp_files, temp_labels, test_size=0.5, stratify=temp_labels, random_state=42
    )

    print(f"Training samples: {len(train_files)}")
    print(f"Validation samples: {len(val_files)}")
    print(f"Test samples: {len(test_files)}")

    return (
        (train_files, train_labels),
        (val_files, val_labels),
        (test_files, test_labels)
    )
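The two-stage split works out to roughly 70% train, 15% validation, 15% test: the first call holds out 30%, and the second call halves that holdout. With 300 labeled frames, for example:

```python
n = 300                          # total labeled frames
n_holdout = n * 30 // 100        # first split holds out 30%
n_train = n - n_holdout          # 70% stays for training
n_test = n_holdout // 2          # second split halves the holdout
n_val = n_holdout - n_test
print(n_train, n_val, n_test)    # 210 45 45
```

scikit-learn rounds slightly differently when the sizes don't divide evenly, but the proportions are the same.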

Step 3: Model Definition

import torch.nn as nn

class QualityClassifier(nn.Module):
    """Lightweight CNN for image quality assessment"""

    def __init__(self, num_classes=3):
        super().__init__()

        self.features = nn.Sequential(
            # Block 1: 128 -> 64
            nn.Conv2d(1, 16, 3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),

            # Block 2: 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),

            # Block 3: 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),

            # Block 4: 16 -> 8
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

Step 4: Training Script

def train_quality_model():
    # Configuration
    BATCH_SIZE = 16
    LEARNING_RATE = 0.001
    NUM_EPOCHS = 30
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Prepare data
    (train_files, train_labels), (val_files, val_labels), _ = prepare_data()

    train_dataset = QualityDataset(train_files, train_labels)
    val_dataset = QualityDataset(val_files, val_labels)

    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, 
                              shuffle=True, num_workers=2)
    val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, 
                            shuffle=False, num_workers=2)

    # Initialize model
    model = QualityClassifier(num_classes=3).to(DEVICE)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

    # Training loop
    best_accuracy = 0

    for epoch in range(NUM_EPOCHS):
        # Train
        model.train()
        train_loss = 0
        for images, labels in train_loader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

        # Validate
        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(DEVICE), labels.to(DEVICE)
                outputs = model(images)
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total

        print(f'Epoch [{epoch+1}/{NUM_EPOCHS}] '
              f'Loss: {train_loss/len(train_loader):.4f} '
              f'Accuracy: {accuracy:.1f}%')

        # Save best model
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            torch.save({
                'model_state': model.state_dict(),
                'accuracy': accuracy,
                'epoch': epoch
            }, 'quality_classifier_best.pt')

    print(f"\nTraining complete! Best accuracy: {best_accuracy:.1f}%")
    return model
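Note that prepare_data also returns a test split the loop above never touches. After training, run the saved model over those files and summarize the predictions as a confusion matrix, which shows exactly which classes get mixed up. A plain-NumPy sketch — the labels below are placeholders, not real results; you would collect the real ones from the test loader:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=3):
    """cm[i, j] = count of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Placeholder labels (0=good, 1=medium, 2=bad)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
recall = cm.diagonal() / cm.sum(axis=1)   # per-class accuracy
print(cm)
print(recall)   # here: medium is perfect, good and bad are each half right
```

Per-class recall matters more than overall accuracy here: a model that never flags bad frames is useless even if it scores well on a mostly-good dataset.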

Step 5: Deployment for Real-Time Use

class RealTimeQualityChecker:
    """Deploy the trained model for real-time quality assessment"""

    def __init__(self, model_path='quality_classifier_best.pt'):
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # Load model
        self.model = QualityClassifier(num_classes=3)
        checkpoint = torch.load(model_path, map_location=self.device)
        self.model.load_state_dict(checkpoint['model_state'])
        self.model.to(self.device)
        self.model.eval()

        self.classes = ['good', 'medium', 'bad']

    def preprocess(self, image):
        """Preprocess a raw numpy image"""
        from scipy.ndimage import zoom

        # Resize so the longer side is 128 pixels
        zoom_factor = 128 / max(image.shape)
        image = zoom(image.astype(np.float32), zoom_factor)

        # Pad the shorter side, then crop, so the result is exactly 128x128
        # (matches the padding QualityDataset applies at training time;
        # without it, non-square frames would crash the Linear layer)
        pad_h = max(0, 128 - image.shape[0])
        pad_w = max(0, 128 - image.shape[1])
        image = np.pad(image, ((0, pad_h), (0, pad_w)))
        image = image[:128, :128]

        # Normalize
        image = np.nan_to_num(image, nan=0)
        p1, p99 = np.percentile(image, [1, 99])
        image = np.clip(image, p1, p99)
        image = (image - image.min()) / (image.max() - image.min() + 1e-8)

        # To tensor
        tensor = torch.from_numpy(image).unsqueeze(0).unsqueeze(0)
        return tensor.to(self.device)

    def assess(self, image):
        """
        Assess image quality

        Args:
            image: numpy array (raw telescope image)

        Returns:
            dict with quality label and confidence
        """
        tensor = self.preprocess(image)

        with torch.no_grad():
            outputs = self.model(tensor)
            probabilities = torch.softmax(outputs, dim=1)[0]
            predicted_class = torch.argmax(probabilities).item()

        return {
            'quality': self.classes[predicted_class],
            'confidence': probabilities[predicted_class].item(),
            'all_probabilities': {
                cls: prob.item() 
                for cls, prob in zip(self.classes, probabilities)
            }
        }

    def assess_file(self, filepath):
        """Assess quality of a FITS file"""
        with fits.open(filepath) as hdul:
            image = hdul[0].data
        return self.assess(image)

# Usage example:
if __name__ == '__main__':
    checker = RealTimeQualityChecker('quality_classifier_best.pt')

    # Assess a single file
    result = checker.assess_file('new_observation.fits')
    print(f"Quality: {result['quality']} ({result['confidence']:.1%} confident)")

    # In a real-time loop
    def process_new_frame(filepath):
        result = checker.assess_file(filepath)

        if result['quality'] == 'bad':
            print(f"⚠️ Bad frame detected: {filepath}")
            # Could trigger alert or stop observation
        elif result['quality'] == 'medium':
            print(f"⚡ Medium quality: {filepath}")
            # Continue but flag for review
        else:
            print(f"✓ Good frame: {filepath}")
            # Proceed normally

        return result
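The confidence numbers come from torch.softmax, which turns the network's raw outputs (logits) into probabilities that sum to 1. Here is the same transform in NumPy, applied to made-up logits for the three classes:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])   # hypothetical raw outputs: good, medium, bad
probs = softmax(logits)
print(probs)            # three probabilities summing to 1
print(probs.argmax())   # 0 -> 'good'
```

A high top probability is not a guarantee of correctness — networks can be confidently wrong — so treat the confidence as a ranking signal for review, not a calibrated probability.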

Chapter 9: Next Steps and Resources

Your Learning Path

Week 1-2: Python fundamentals

  • Complete a Python tutorial (Codecademy, Python.org tutorial)
  • Practice with NumPy and Matplotlib
  • Load and visualize your telescope images

Week 3-4: Machine learning concepts

  • Take Andrew Ng's ML course on Coursera (free to audit)
  • Implement simple models with scikit-learn
  • Understand training/validation/testing

Week 5-6: Deep learning basics

  • Work through Fast.ai course (free, practical)
  • Build your first CNN in PyTorch
  • Train on your own data

Week 7-8: Your first real project

  • Implement the quality classifier above
  • Collect and label your data
  • Train, validate, deploy

Month 2+: Advanced topics

  • Time-series analysis for transient detection
  • Multi-site coordination systems
  • Real-time processing pipelines

Essential Resources

Books:

  • "Python for Astronomers" (free online)
  • "Deep Learning" by Goodfellow (the bible, free online)
  • "Hands-On Machine Learning" by GΓ©ron

Courses:

  • Fast.ai (practical deep learning)
  • Coursera: Andrew Ng's courses
  • DeepLearning.AI specializations

Astronomy-specific:

  • AstroML documentation
  • Astropy tutorials
  • AAS astronomy + ML workshops

Communities:

  • Stack Overflow (coding help)
  • Cross Validated (ML theory)
  • r/MachineLearning (Reddit)
  • Astropy Slack/Discord

Hardware Recommendations

For learning (budget):

  • Any modern laptop with 8GB+ RAM
  • Use Google Colab for free GPU access

For development (intermediate):

  • Desktop with NVIDIA GPU (RTX 3060 or better)
  • 32GB+ RAM
  • Fast SSD storage

For production (your telescope array):

  • Edge devices: NVIDIA Jetson at each site
  • Central server: Multiple GPUs for training
  • Cloud backup for burst computing

Final Thoughts

You now have a complete foundation in AI/ML for astronomy. The key principles:

  1. ML finds patterns in data β€” nothing magical
  2. Neural networks learn hierarchically β€” simple to complex features
  3. Training requires iteration β€” thousands of updates to find good parameters
  4. Data quality matters more than model complexity β€” garbage in, garbage out
  5. Start simple, add complexity as needed β€” don't over-engineer initially

For your distributed telescope array, the path is:

  1. Build local quality assessment (the project above)
  2. Add transient detection at each site
  3. Implement cross-site coordination
  4. Develop data fusion capabilities
  5. Create autonomous discovery systems

Each step builds on the previous. You don't need to understand everything at onceβ€”learn what you need for each stage.


Would you like me to elaborate on any specific section? I can provide:

  • More code examples for specific tasks
  • Deeper mathematical explanations
  • Hardware setup guides
  • Specific astronomy ML techniques
  • Step-by-step project walkthroughs
