Duplicate of: Types of nets.md (same content). Canonical version is Types of nets.md. This note preserved.
The Complete AI/Machine Learning Education: From Zero to Practitioner¶
I'm going to teach you everything from the absolute ground up. By the end of this, you'll understand how AI actually works and be ready to start building.
Chapter 1: What Is AI, Really?¶
Forgetting the Science Fiction¶
First, forget everything movies have told you. AI isn't a sentient robot. It's not "thinking" the way you think. At its core, AI is:
Mathematics that finds patterns in data.
That's it. Everything else (image recognition, language understanding, game playing) emerges from this simple idea applied at massive scale.
The Spectrum of AI¶
Rule-Based Systems → Machine Learning → Deep Learning

- Rule-based: "If X, then Y". Example: "If temperature > 100°, send alert"
- Machine Learning: "Learn from examples". Example: "Show me 10,000 spam emails, learn what spam looks like"
- Deep Learning: "Learn complex patterns with neural networks". Example: "Show me millions of images, learn to recognize anything"
Rule-based: You write explicit rules. Limited but predictable.
Machine Learning: The computer discovers rules from data. Flexible but needs examples.
Deep Learning: Machine learning with neural networks. Can learn incredibly complex patterns but needs lots of data and computation.
Why This Matters for Astronomy¶
Traditional astronomy: "If brightness dips by X% for Y hours with this shape, it might be a planet transit."
ML astronomy: "Here are 10,000 confirmed planet transits. Learn what they look like. Now find more."
The second approach finds patterns humans might never think to look for.
Chapter 2: The Mathematics You Actually Need¶
Don't panic. You need less math than you think, and I'll explain each piece intuitively.
Concept 1: Variables and Functions¶
A variable is just a placeholder for a number:
x = 5
temperature = 72.4
brightness = 0.00847
A function takes inputs and produces outputs:
f(x) = 2x + 1
When x = 3: f(3) = 2(3) + 1 = 7
When x = 10: f(10) = 2(10) + 1 = 21
ML insight: A trained model IS a function. It takes your data as input and produces predictions as output.
Concept 2: Vectors and Matrices¶
A vector is a list of numbers:
pixel_values = [0.1, 0.4, 0.9, 0.2, 0.8]
star_properties = [temperature, brightness, distance, mass]
A matrix is a grid of numbers:
image = [
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9]
]
ML insight: All data becomes vectors or matrices. An image? Matrix of pixel values. A spectrum? Vector of intensity values. Text? Converted to vectors of numbers.
Concept 3: The Dot Product¶
This is the key operation in ML. Multiply corresponding elements and add:
vector_a = [1, 2, 3]
vector_b = [4, 5, 6]
dot_product = (1×4) + (2×5) + (3×6)
= 4 + 10 + 18
= 32
ML insight: This is how neural networks combine inputs. Each input gets multiplied by a "weight," then everything is added up.
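The arithmetic above takes a couple of lines of Python to check (a minimal sketch; `np.dot` is the library version of the same operation):

```python
import numpy as np

vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# Multiply corresponding elements, then add them up
manual = sum(a * b for a, b in zip(vector_a, vector_b))
print(manual)                       # 32

# NumPy does the same thing in one call
print(np.dot(vector_a, vector_b))   # 32
```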
Concept 4: Probability Basics¶
Probability measures likelihood (0 = impossible, 1 = certain):
P(coin lands heads) = 0.5
P(sun rises tomorrow) ≈ 1.0
P(finding a unicorn) = 0.0
ML insight: Models output probabilities. "This image is 94% likely to be a spiral galaxy, 5% elliptical, 1% artifact."
Concept 5: Derivatives (Just the Intuition)¶
A derivative measures "how fast something is changing."
Imagine driving a car:
- Position = where you are
- Velocity (derivative of position) = how fast position is changing
- Acceleration (derivative of velocity) = how fast velocity is changing
ML insight: Training uses derivatives to figure out "if I adjust this parameter slightly, how much does my error change?" This guides learning.
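The "nudge a parameter, see how the error moves" idea can be checked numerically: shift the input by a tiny amount and measure how much the output changes. A sketch using f(x) = x² (an example chosen here; its exact derivative is 2x):

```python
def f(x):
    return x ** 2

def numerical_derivative(f, x, h=1e-6):
    # Slope of f near x: change in output divided by change in input
    return (f(x + h) - f(x - h)) / (2 * h)

# Exact derivative of x^2 is 2x, so at x = 3 the slope should be ~6
print(numerical_derivative(f, 3.0))  # ~6.0
```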
Chapter 3: How Machine Learning Actually Works¶
The Core Loop¶
Every ML system follows this pattern:
1. INITIALIZE: Start with random parameter values
2. PREDICT: Use current parameters to make predictions
3. MEASURE ERROR: Compare predictions to correct answers
4. UPDATE: Adjust parameters to reduce error
5. REPEAT: Go back to step 2, thousands of times
Let me make this concrete.
Example: Predicting Star Temperature from Color¶
The Data:
Star 1: Blue/Red ratio = 0.8, Temperature = 5000K
Star 2: Blue/Red ratio = 1.2, Temperature = 6500K
Star 3: Blue/Red ratio = 1.5, Temperature = 8000K
Star 4: Blue/Red ratio = 2.0, Temperature = 11000K
... (thousands more)
The Model (simplest possible):
Predicted_Temperature = w × (Blue/Red ratio) + b
Where w and b are parameters we need to learn
Training Process:
Step 1: Random initialization
w = 1000 (random guess)
b = 2000 (random guess)
Step 2: Make predictions
Star 1: 1000 × 0.8 + 2000 = 2800K (actual: 5000K) ← way off!
Star 2: 1000 × 1.2 + 2000 = 3200K (actual: 6500K) ← way off!
Step 3: Measure error
Error = average of (predicted - actual)²
      = ((2800-5000)² + (3200-6500)²) / 2
= (4,840,000 + 10,890,000) / 2
= 7,865,000 ← big number, bad!
Step 4: Update parameters
Mathematics tells us:
- Increasing w will reduce error
- Increasing b will reduce error
New w = 1000 + adjustment = 3000
New b = 2000 + adjustment = 2500
Step 5: Repeat
With new parameters, error becomes 2,100,000
Keep going...
After 1000 iterations:
w ≈ 5000
b ≈ 1000
Error is now tiny!
Final model:
Temperature ≈ 5000 × (Blue/Red) + 1000
This simple model learned the relationship between color and temperature!
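The five steps above fit in a short NumPy script. This is a sketch: the learning rate and iteration count are choices made for the demo, and the fitted w and b land near (not exactly on) the values quoted above, since only the four example stars are used:

```python
import numpy as np

# Blue/Red ratios and temperatures of the four example stars
ratios = np.array([0.8, 1.2, 1.5, 2.0])
temps  = np.array([5000.0, 6500.0, 8000.0, 11000.0])

w, b = 1000.0, 2000.0   # Step 1: initialize (the "random guesses" above)
lr = 0.1                # learning rate, chosen by hand

for step in range(20000):
    preds = w * ratios + b                 # Step 2: predict
    error = preds - temps
    mse = np.mean(error ** 2)              # Step 3: measure error
    grad_w = 2 * np.mean(error * ratios)   # Step 4: gradients of the MSE
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b                       # Step 5: repeat

print(f"w ≈ {w:.0f}, b ≈ {b:.0f}, final MSE ≈ {mse:.0f}")
```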
Gradient Descent: The Heart of Learning¶
"Gradient descent" is just a fancy name for the update process. Here's the intuition:
Imagine you're blindfolded on a hilly landscape. Your goal: find the lowest valley (minimum error).
Strategy:
- Feel the ground around you (compute gradient/derivative)
- Figure out which direction goes downhill (direction of steepest descent)
- Take a step that direction (update parameters)
- Repeat until you stop going downhill (reached minimum)
(The picture: a plot of error versus parameter value, shaped like a valley. You start partway up one side, at the random starting parameters. Each step moves you downhill along the curve, and the bottom of the valley is the minimum: the best parameters.)
The Learning Rate¶
How big should each step be?
- Too big: You overshoot the minimum, bounce around, never converge
- Too small: Takes forever to reach the minimum
- Just right: Steady progress toward the best solution
(Plotted against training steps: too high a learning rate overshoots the valley, bouncing from side to side and possibly diverging. Too low inches downward and takes forever. A good rate steps steadily down and settles at the minimum: converged!)
The learning rate is a hyperparameter: something you choose, not something the model learns.
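All three behaviours show up on the simplest possible loss, f(x) = x², whose gradient is 2x. A sketch, with the step sizes chosen to demonstrate each failure mode:

```python
def descend(lr, steps=20, x0=5.0):
    # Gradient descent on f(x) = x**2, whose gradient is 2*x
    x = x0
    for _ in range(steps):
        x = x - lr * (2 * x)
    return x

print(descend(1.1))    # too big: overshoots, |x| grows every step
print(descend(0.001))  # too small: barely moved from 5.0
print(descend(0.3))    # just right: very close to the minimum at 0
```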
Chapter 4: Neural Networks Explained¶
The Biological Inspiration (Loosely)¶
Your brain has neurons connected by synapses. A neuron:
- Receives signals from other neurons
- If total signal exceeds a threshold, it "fires"
- Sends signals to other neurons
Artificial neural networks are inspired by this (but much simpler).
The Artificial Neuron¶
Inputs (x₁, x₂, x₃) arrive, each with its own weight (w₁, w₂, w₃):

weighted_sum = w₁×x₁ + w₂×x₂ + w₃×x₃ + b
output = activation(weighted_sum)
Inputs: The data (pixel values, measurements, features)
Weights: Learnable parameters that determine importance of each input
Bias (b): An adjustable offset
Activation function: Introduces non-linearity (explained below)
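The whole neuron is a dot product plus a bias, passed through an activation. A minimal NumPy sketch, using ReLU (defined in the next subsection) as the activation; the input and weight values are arbitrary:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def neuron(inputs, weights, bias):
    # weighted_sum = w1*x1 + w2*x2 + w3*x3 + b
    weighted_sum = np.dot(weights, inputs) + bias
    return relu(weighted_sum)

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, 0.1])    # weights (learned, in a real network)
b = 0.1                          # bias

print(neuron(x, w, b))  # 0.4 - 0.2 + 0.2 + 0.1 = 0.5
```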
Why Activation Functions Matter¶
Without activation functions, stacking layers would be pointless:
Layer 1: output = w₁ × input + b₁
Layer 2: output = w₂ × (w₁ × input + b₁) + b₂
                = (w₂×w₁) × input + (w₂×b₁ + b₂)
                = W × input + B   ← Still just a linear function!
Activation functions break this linearity, allowing complex patterns:
ReLU (Rectified Linear Unit), the most common:
ReLU(x) = max(0, x)
If x is negative, output 0
If x is positive, output x
Examples:
ReLU(-5) = 0
ReLU(0) = 0
ReLU(3) = 3
Sigmoid, which squashes values to 0-1 (good for probabilities):
Sigmoid(x) = 1 / (1 + e^(-x))
Very negative x → ~0
Zero → 0.5
Very positive x → ~1
Softmax, used for classification (outputs sum to 1):
Used in final layer for classification
Converts raw scores to probabilities
Scores: [2.0, 1.0, 0.1]
Softmax: [0.66, 0.24, 0.10] ← These sum to 1.0
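All three functions are one-liners in NumPy, and the softmax numbers above can be reproduced directly (a sketch; subtracting the max inside softmax is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(scores):
    # Subtract the max first so np.exp never overflows
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(relu(np.array([-5, 0, 3])))       # [0 0 3]
print(sigmoid(np.array([-10, 0, 10])))  # [~0, 0.5, ~1]
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.round(2))                   # [0.66 0.24 0.1 ]
print(probs.sum())                      # sums to 1 (up to float rounding)
```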
Building a Neural Network¶
Stack neurons into layers:
INPUT LAYER         HIDDEN LAYER 1        HIDDEN LAYER 2       OUTPUT LAYER
(your data)         (learned features)    (complex features)   (predictions)

x₁, x₂, x₃, x₄  →  every neuron in hidden layer 1  →  every neuron in hidden layer 2  →  class 1 / class 2 / class 3 probabilities

Each layer is fully connected to the next: every input feeds every neuron in the first hidden layer, each hidden layer feeds the next, and the output layer produces one probability per class.
Each connection has a weight (learnable)
Each neuron has a bias (learnable)
Each neuron applies an activation function
What Each Layer Learns (Image Example)¶
For image classification:
Layer 1: Detects simple patterns
- Edge detectors (vertical, horizontal, diagonal)
- Color blobs
- Simple textures
Layer 2: Combines simple patterns into shapes
- Corners (vertical + horizontal edges)
- Curves (many edge detectors)
- Texture regions
Layer 3: Combines shapes into parts
- "This looks like a spiral arm"
- "This looks like a galactic core"
- "This looks like a star cluster"
Layer 4+: Combines parts into objects
- "Spiral arms + bright core + overall shape = spiral galaxy"
This hierarchical learning is why deep networks are so powerful!
Forward Pass vs Backward Pass¶
Forward Pass: Data flows through the network, producing predictions
Input → Layer 1 → Layer 2 → ... → Output → Prediction
Backward Pass (Backpropagation): Errors flow backward, updating weights
How wrong was the prediction?
  ↓
How much did each Layer N weight contribute to the error?
  ↓
Adjust Layer N weights
  ↓
How much did each Layer N-1 weight contribute to the error?
  ↓
Adjust Layer N-1 weights
  ↓
... continue back to Layer 1 ...
This is where the calculus happens: computing how each weight affects the final error.
Chapter 5: Convolutional Neural Networks (CNNs) for Images¶
Since you're working with telescope images, CNNs are crucial.
The Problem with Regular Networks for Images¶
A small 256×256 grayscale image has 65,536 pixels.
If your first layer has 1000 neurons, you'd have 65,536,000 connections from input to first layer alone!
This is:
- Computationally expensive
- Prone to overfitting (too many parameters for limited data)
- Ignores the structure of images (nearby pixels are related)
The Key Insight: Local Patterns¶
In images, patterns are local:
- An edge is a few pixels wide
- A star is a small region
- Artifacts have local signatures
We don't need every neuron to look at every pixel!
Convolution: The Core Operation¶
A filter (or kernel) is a small pattern detector:
Example: a 3×3 edge-detecting filter

Filter:      Slide over image:
[-1 0 1]
[-1 0 1]     [original image] --> [edge map]
[-1 0 1]
How convolution works:
Image region:    Filter:          Calculation:
[1, 2, 3]        [-1, 0, 1]       Sum of element-wise products:
[4, 5, 6]    ×   [-1, 0, 1]   =   (-1×1)+(0×2)+(1×3)+
[7, 8, 9]        [-1, 0, 1]       (-1×4)+(0×5)+(1×6)+
                                  (-1×7)+(0×8)+(1×9)
                                  = -1+0+3-4+0+6-7+0+9 = 6
Slide the filter across the entire image, computing this at each position. The result is a feature map.
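A direct translation of that sliding-window description, assuming no padding and a stride of 1 (a sketch; this is the ML convention, which slides the kernel without flipping it, and real frameworks are far faster):

```python
import numpy as np

def convolve2d(image, kernel):
    # "Valid" convolution: visit every position where the kernel fully fits
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i+kh, j:j+kw]
            out[i, j] = np.sum(region * kernel)  # element-wise multiply, then add
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(convolve2d(image, kernel))  # [[6.]]  (matches the worked example above)
```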
Multiple Filters = Multiple Features¶
A CNN layer has many filters, each learning to detect different patterns:
Input Image (1 channel: grayscale)
  ↓
Conv Layer 1 (32 filters)
  ↓
32 Feature Maps (different patterns detected)
  ↓
Conv Layer 2 (64 filters, each looks at all 32 previous maps)
  ↓
64 Feature Maps (combinations of patterns)
  ↓
... more layers ...
  ↓
Final Classification
Pooling: Reducing Size¶
After convolution, we often pool to reduce the size:
Max Pooling (2×2):

[1, 3, 2, 4]
[5, 6, 1, 2]   →   [6, 4]    Take the max of each 2×2 region
[3, 2, 1, 0]       [3, 3]
[1, 2, 3, 1]
This:
- Reduces computation for later layers
- Adds some translation invariance (small shifts don't matter)
- Keeps the strongest activations
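A sketch of 2×2 max pooling that reproduces the example above, using a reshape trick to group the pixels into non-overlapping 2×2 blocks:

```python
import numpy as np

def max_pool_2x2(x):
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, keep the max of each
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [3, 2, 1, 0],
              [1, 2, 3, 1]])

print(max_pool_2x2(x))  # [[6 4]
                        #  [3 3]]
```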
Complete CNN Architecture¶
Input: 256×256×1 telescope image

Conv1: 32 filters (3×3), ReLU → 256×256×32
Pool1: Max pool (2×2) → 128×128×32
Conv2: 64 filters (3×3), ReLU → 128×128×64
Pool2: Max pool (2×2) → 64×64×64
Conv3: 128 filters (3×3), ReLU → 64×64×128
Pool3: Max pool (2×2) → 32×32×128
Flatten: 32×32×128 = 131,072 values
Dense1: 512 neurons, ReLU
Dense2: 128 neurons, ReLU
Output: 5 neurons, Softmax → [spiral, elliptical, irregular, merger, artifact]
Why CNNs Work So Well for Astronomical Images¶
- Translation invariance: A galaxy in the corner looks the same as one in the center
- Hierarchical features: Learn edges β shapes β structures β objects
- Parameter efficiency: Same filter applied everywhere, fewer total parameters
- Natural for 2D data: Respects spatial relationships
Chapter 6: Training in Practice¶
The Training/Validation/Test Split¶
Never evaluate on data you trained on! Split your data:
All Your Data (e.g., 10,000 galaxy images)
  ↓
Training Set (70%): 7,000 images     ← Model learns from these
Validation Set (15%): 1,500 images   ← Tune hyperparameters, early stopping
Test Set (15%): 1,500 images         ← Final evaluation only (touch once!)
Training set: Model sees these, adjusts weights
Validation set: Model never trains on these; use to check performance during training
Test set: Model never sees until final evaluation; gives unbiased performance estimate
Overfitting vs Underfitting¶
Underfitting: Model too simple, can't capture patterns
Training accuracy: 60%
Validation accuracy: 58%
Both are bad → need a more complex model
Good fit: Model captures patterns without memorizing
Training accuracy: 95%
Validation accuracy: 92%
Both good, close together → well-tuned model
Overfitting: Model memorized training data, fails on new data
Training accuracy: 99%
Validation accuracy: 70%
Big gap → model is memorizing, not learning
Visualization: plot error against model complexity. Training error falls steadily as complexity grows. Validation error falls at first, bottoms out at the sweet spot, then climbs again in the overfitting region; the widening gap between the two curves is the model memorizing.
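The underfit/good-fit/overfit pattern can be reproduced numerically with polynomial fits of increasing degree (a sketch; the curve, seed, and noise level are arbitrary choices, so the exact numbers will vary):

```python
import numpy as np

# Noisy samples of a smooth curve
rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 20)
x_val = np.linspace(-0.95, 0.95, 20)
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
y_val = np.sin(np.pi * x_val) + rng.normal(0, 0.3, x_val.size)

errors = {}
for degree in [1, 3, 11]:   # underfit, good fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    errors[degree] = (train_err, val_err)
    print(f"degree {degree:2d}: train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```

Training error always drops as the degree rises (a bigger model can only fit the training points better); watch how the validation error behaves instead.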
Regularization: Preventing Overfitting¶
Dropout: Randomly "turn off" neurons during training
During training:
[neuron1] [       ] [neuron3] [       ] [neuron5]   ← 40% dropped
  ↓
Forces the network not to rely on any single neuron
  ↓
More robust, generalizes better
L2 Regularization: Penalize large weights
Loss = Prediction_Error + λ × (sum of squared weights)
Large weights get penalized
Forces model to use smaller, more distributed weights
Data Augmentation: Create variations of training data
Original galaxy image
  ↓
Augmented versions:
- Rotated 90°, 180°, 270°
- Flipped horizontally
- Flipped vertically
- Slightly shifted
- Slightly zoomed
- Noise added
- Brightness adjusted
1 image becomes 10+ training examples!
For astronomy, augmentation is powerful because physics doesn't change with rotation.
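NumPy alone covers the geometric augmentations listed above (a sketch; real pipelines often use a library such as torchvision.transforms, and the noise level here is an arbitrary choice):

```python
import numpy as np

def augment(image, rng):
    """Return a randomly rotated/flipped/noised copy of a 2D image."""
    out = np.rot90(image, k=rng.integers(0, 4))   # 0, 90, 180 or 270 degrees
    if rng.random() < 0.5:
        out = np.fliplr(out)                      # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)                      # vertical flip
    out = out + rng.normal(0, 0.01, out.shape)    # small Gaussian noise
    return out

rng = np.random.default_rng(42)
galaxy = np.random.rand(64, 64)                   # stand-in for a real image
variants = [augment(galaxy, rng) for _ in range(10)]  # 1 image -> 10 examples
print(len(variants), variants[0].shape)           # 10 (64, 64)
```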
Batch Training¶
Processing all data at once is memory-intensive. Instead, use mini-batches:
10,000 training images
  ↓
Split into batches of 32
  ↓
312 full batches per epoch (plus one final partial batch of 16)
Each training step:
1. Load batch of 32 images
2. Forward pass: compute predictions
3. Compute loss
4. Backward pass: compute gradients
5. Update weights
6. Next batch
One complete pass through all batches = 1 epoch
Training typically runs for 10-100+ epochs
Learning Rate Schedules¶
Learning rate can change during training:
Constant: the rate never changes. Step decay: drop the rate by a fixed factor every few epochs. Exponential: decay smoothly toward zero. Cosine annealing: follow a cosine curve down (sometimes with periodic restarts).
Common approach: Start high (learn fast), decrease over time (fine-tune).
Early Stopping¶
Stop training when validation performance stops improving:
Epoch 1: Val accuracy = 70%
Epoch 2: Val accuracy = 78%
Epoch 3: Val accuracy = 84%
Epoch 4: Val accuracy = 88%
Epoch 5: Val accuracy = 90%
Epoch 6: Val accuracy = 91%
Epoch 7: Val accuracy = 91% ← Stopped improving
Epoch 8: Val accuracy = 90% ← Getting worse (overfitting starting)
Epoch 9: Val accuracy = 89%
...
Early stopping: Stop at epoch 6 or 7, save that model
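The rule is easy to code: count epochs without improvement and stop when the count hits a patience limit. A sketch using the accuracy sequence from above (the patience value is an illustrative choice):

```python
def early_stop_epoch(val_accuracies, patience=2):
    """Return (best_epoch, best_score), stopping after `patience`
    consecutive epochs with no improvement. Epochs are 1-based."""
    best, best_epoch, bad_epochs = -1.0, 0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break   # stop training here
    return best_epoch, best

accs = [70, 78, 84, 88, 90, 91, 91, 90, 89]
print(early_stop_epoch(accs))  # (6, 91): keep the epoch-6 model
```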
Chapter 7: Practical Python for ML¶
Setting Up Your Environment¶
Step 1: Install Python (version 3.9 or 3.10 recommended)
Step 2: Install essential packages
pip install numpy pandas matplotlib scikit-learn
pip install torch torchvision # PyTorch (or tensorflow if you prefer)
pip install astropy # For astronomy data
pip install jupyter # For interactive development
Step 3: Verify installation
import numpy as np
import torch
import astropy
print("All imports successful!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}") # True if you have GPU
NumPy: The Foundation¶
NumPy is for numerical computing. Everything in ML uses it.
import numpy as np
# Creating arrays
a = np.array([1, 2, 3, 4, 5])
b = np.zeros((3, 3)) # 3x3 array of zeros
c = np.ones((2, 4)) # 2x4 array of ones
d = np.random.randn(100, 100) # 100x100 random values (normal distribution)
# Array operations (element-wise)
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y) # [5, 7, 9]
print(x * y) # [4, 10, 18]
print(x ** 2) # [1, 4, 9]
print(np.sqrt(x)) # [1.0, 1.414, 1.732]
# Statistics
data = np.random.randn(1000)
print(np.mean(data)) # ~0
print(np.std(data)) # ~1
print(np.max(data)) # ~3
print(np.min(data)) # ~-3
# Reshaping
image = np.random.randn(256, 256) # 2D image
flat = image.reshape(-1) # Flatten to 1D: 65536 elements
back = flat.reshape(256, 256) # Back to 2D
# Slicing
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr[0, :]) # First row: [1, 2, 3]
print(arr[:, 1]) # Second column: [2, 5, 8]
print(arr[1:, 1:]) # Bottom-right: [[5, 6], [8, 9]]
Matplotlib: Visualization¶
import matplotlib.pyplot as plt
import numpy as np
# Basic line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sine Wave')
plt.show()
# Scatter plot
x = np.random.randn(100)
y = x + np.random.randn(100) * 0.5
plt.scatter(x, y, alpha=0.5)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
# Image display (crucial for astronomy!)
image = np.random.randn(256, 256)
plt.imshow(image, cmap='gray')
plt.colorbar()
plt.title('Random Image')
plt.show()
# Histogram
data = np.random.randn(10000)
plt.hist(data, bins=50, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Count')
plt.title('Distribution')
plt.show()
# Multiple subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes[0, 0].plot(x, y)
axes[0, 1].scatter(x, y)
axes[1, 0].imshow(image, cmap='viridis')
axes[1, 1].hist(data, bins=30)
plt.tight_layout()
plt.show()
Astropy: Handling Astronomical Data¶
from astropy.io import fits
from astropy import units as u
from astropy.coordinates import SkyCoord
import numpy as np
import matplotlib.pyplot as plt
# Reading FITS files (telescope images)
def load_fits_image(filepath):
    with fits.open(filepath) as hdul:
        hdul.info()  # Print a summary of what's in the file
        # Primary data is usually in index 0 or 1
        data = hdul[0].data.copy()   # Copy so the array survives closing the file
        header = hdul[0].header      # Metadata
    return data, header
# Example usage
# data, header = load_fits_image('my_observation.fits')
# print(f"Image shape: {data.shape}")
# print(f"Object: {header.get('OBJECT', 'Unknown')}")
# print(f"Exposure time: {header.get('EXPTIME', 'Unknown')} seconds")
# Working with coordinates
coord = SkyCoord('10h30m00s', '+45d00m00s', frame='icrs')
print(f"RA: {coord.ra.degree} degrees")
print(f"Dec: {coord.dec.degree} degrees")
# Unit conversions
distance = 100 * u.pc # 100 parsecs
print(f"In light years: {distance.to(u.lyr)}")
print(f"In AU: {distance.to(u.AU)}")
# Displaying astronomical images properly
def display_astronomical_image(data, title='Astronomical Image'):
    """Display with log stretch (common for astronomy)"""
    # Handle negative values
    data_shifted = data - np.nanmin(data) + 1
    # Log stretch
    log_data = np.log10(data_shifted)
    # Display
    plt.figure(figsize=(10, 10))
    plt.imshow(log_data, cmap='gray', origin='lower')
    plt.colorbar(label='log(counts)')
    plt.title(title)
    plt.show()
PyTorch Basics¶
PyTorch is a deep learning framework. Here's the essentials:
import torch
import torch.nn as nn
import torch.optim as optim
# Tensors (like numpy arrays, but can run on GPU)
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.zeros(3, 3)
c = torch.randn(100, 100)
# Move to GPU (if available)
if torch.cuda.is_available():
    a = a.cuda()
    # or
    device = torch.device('cuda')
    a = a.to(device)
# Convert between numpy and torch
import numpy as np
numpy_array = np.array([1.0, 2.0, 3.0])
torch_tensor = torch.from_numpy(numpy_array)
back_to_numpy = torch_tensor.numpy()
# Automatic differentiation (the magic of PyTorch!)
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 3 * x + 1  # y = x² + 3x + 1
y.backward()  # Compute derivative
print(x.grad)  # dy/dx = 2x + 3 = 2(2) + 3 = 7 ✓
Building Your First Neural Network in PyTorch¶
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Define the network
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleClassifier, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_size // 2, num_classes)
        )

    def forward(self, x):
        return self.network(x)
# Create synthetic data for demonstration
num_samples = 1000
input_size = 100
num_classes = 5
X = torch.randn(num_samples, input_size)
y = torch.randint(0, num_classes, (num_samples,))
# Split into train/val
train_X, val_X = X[:800], X[800:]
train_y, val_y = y[:800], y[800:]
# Create data loaders
train_dataset = TensorDataset(train_X, train_y)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataset = TensorDataset(val_X, val_y)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
# Initialize model, loss, optimizer
model = SimpleClassifier(input_size=100, hidden_size=256, num_classes=5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
num_epochs = 20
for epoch in range(num_epochs):
    model.train()  # Set to training mode
    train_loss = 0
    for batch_X, batch_y in train_loader:
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        # Backward pass
        optimizer.zero_grad()  # Clear previous gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights
        train_loss += loss.item()

    # Validation
    model.eval()  # Set to evaluation mode
    val_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():  # No gradient computation for validation
        for batch_X, batch_y in val_loader:
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()

    accuracy = 100 * correct / total
    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {train_loss/len(train_loader):.4f}, '
          f'Val Loss: {val_loss/len(val_loader):.4f}, '
          f'Val Accuracy: {accuracy:.2f}%')
Building a CNN for Images¶
import torch
import torch.nn as nn
class AstronomyCNN(nn.Module):
    def __init__(self, num_classes=5):
        super(AstronomyCNN, self).__init__()
        # Convolutional layers
        self.conv_layers = nn.Sequential(
            # Input: 1 channel (grayscale), Output: 32 channels
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 256 -> 128
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 128 -> 64
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 64 -> 32
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 32 -> 16
        )
        # Fully connected layers
        self.fc_layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 16 * 16, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = self.fc_layers(x)
        return x

# Create model
model = AstronomyCNN(num_classes=5)

# Print model summary
print(model)

# Check with dummy input
dummy_input = torch.randn(1, 1, 256, 256)  # Batch of 1, 1 channel, 256x256
output = model(dummy_input)
print(f"Output shape: {output.shape}")  # Should be [1, 5]
Complete Training Script for Astronomical Image Classification¶
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
from astropy.io import fits
import os
from pathlib import Path
import matplotlib.pyplot as plt
class AstronomyDataset(Dataset):
    """Custom dataset for astronomical images"""
    def __init__(self, image_dir, labels_file, transform=None):
        """
        Args:
            image_dir: Directory with FITS images
            labels_file: Text file with "filename,label" per line
            transform: Optional transform function
        """
        self.image_dir = Path(image_dir)
        self.transform = transform
        # Load labels
        self.samples = []
        with open(labels_file, 'r') as f:
            for line in f:
                filename, label = line.strip().split(',')
                self.samples.append((filename, int(label)))
        self.classes = ['spiral', 'elliptical', 'irregular', 'merger', 'artifact']

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        filename, label = self.samples[idx]
        # Load FITS image
        filepath = self.image_dir / filename
        with fits.open(filepath) as hdul:
            image = hdul[0].data.astype(np.float32)
        # Preprocessing
        image = self.preprocess(image)
        # Apply transforms if any
        if self.transform:
            image = self.transform(image)
        # Convert to tensor
        image = torch.from_numpy(image).unsqueeze(0)  # Add channel dimension
        return image, label

    def preprocess(self, image):
        """Standard preprocessing for astronomical images"""
        # Handle NaN values
        image = np.nan_to_num(image, nan=0.0)
        # Clip extreme values (cosmic rays, bad pixels)
        p1, p99 = np.percentile(image, [1, 99])
        image = np.clip(image, p1, p99)
        # Log stretch (handles large dynamic range)
        image = image - image.min() + 1
        image = np.log(image)
        # Normalize to 0-1
        image = (image - image.min()) / (image.max() - image.min() + 1e-8)
        return image
def train_model(model, train_loader, val_loader, num_epochs=50,
                learning_rate=0.001, device='cuda'):
    """Complete training function with bells and whistles"""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='max', factor=0.5, patience=5
    )

    best_accuracy = 0
    history = {'train_loss': [], 'val_loss': [], 'val_accuracy': []}

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        correct = 0
        total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = criterion(outputs, labels)
                val_loss += loss.item()
                _, predicted = torch.max(outputs, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        accuracy = 100 * correct / total
        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)

        # Update scheduler
        scheduler.step(accuracy)

        # Save history
        history['train_loss'].append(avg_train_loss)
        history['val_loss'].append(avg_val_loss)
        history['val_accuracy'].append(accuracy)

        # Save best model
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            torch.save(model.state_dict(), 'best_model.pt')

        print(f'Epoch [{epoch+1}/{num_epochs}] '
              f'Train Loss: {avg_train_loss:.4f} '
              f'Val Loss: {avg_val_loss:.4f} '
              f'Val Acc: {accuracy:.2f}% '
              f'(Best: {best_accuracy:.2f}%)')

    return history
def plot_training_history(history):
    """Visualize training progress"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    # Loss plot
    ax1.plot(history['train_loss'], label='Train Loss')
    ax1.plot(history['val_loss'], label='Val Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    # Accuracy plot
    ax2.plot(history['val_accuracy'], label='Val Accuracy', color='green')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy (%)')
    ax2.set_title('Validation Accuracy')
    ax2.legend()
    plt.tight_layout()
    plt.savefig('training_history.png')
    plt.show()
# Example usage (you'd replace with your actual data):
if __name__ == '__main__':
    # Check for GPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Create model
    model = AstronomyCNN(num_classes=5)

    # For demonstration, create random data
    # In practice, you'd use AstronomyDataset with real data
    train_X = torch.randn(800, 1, 256, 256)
    train_y = torch.randint(0, 5, (800,))
    val_X = torch.randn(200, 1, 256, 256)
    val_y = torch.randint(0, 5, (200,))
    train_dataset = torch.utils.data.TensorDataset(train_X, train_y)
    val_dataset = torch.utils.data.TensorDataset(val_X, val_y)
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

    # Train
    history = train_model(model, train_loader, val_loader,
                          num_epochs=20, device=device)

    # Plot results
    plot_training_history(history)
Chapter 8: Your First Complete Project¶
Let's build something real: an image quality classifier for your telescope.
Project: Automatic Image Quality Assessment¶
Goal: Given a raw telescope frame, predict quality (good/medium/bad) automatically.
Step 1: Data Collection¶
First, manually classify some of your existing images:
import os
import shutil
from pathlib import Path
# Create directory structure
for quality in ['good', 'medium', 'bad']:
    Path(f'training_data/{quality}').mkdir(parents=True, exist_ok=True)
print("""
Manual Classification Guide:
- GOOD: Clear stars, low background, good focus
- MEDIUM: Some clouds, slightly out of focus, minor issues
- BAD: Heavy clouds, tracking errors, severe artifacts
Move or copy your FITS files into the appropriate folders.
Aim for at least 100 images per category.
""")
Step 2: Data Preparation¶
import numpy as np
from astropy.io import fits
from pathlib import Path
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
class QualityDataset(Dataset):
    def __init__(self, filepaths, labels, image_size=128):
        self.filepaths = filepaths
        self.labels = labels
        self.image_size = image_size

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        # Load image
        with fits.open(self.filepaths[idx]) as hdul:
            image = hdul[0].data.astype(np.float32)
        # Resize to consistent size
        from scipy.ndimage import zoom
        zoom_factor = self.image_size / max(image.shape)
        image = zoom(image, zoom_factor)
        # Pad to exact size if needed
        if image.shape[0] < self.image_size:
            pad = self.image_size - image.shape[0]
            image = np.pad(image, ((0, pad), (0, 0)))
        if image.shape[1] < self.image_size:
            pad = self.image_size - image.shape[1]
            image = np.pad(image, ((0, 0), (0, pad)))
        # Crop to exact size
        image = image[:self.image_size, :self.image_size]
        # Normalize
        image = np.nan_to_num(image, nan=0)
        p1, p99 = np.percentile(image, [1, 99])
        image = np.clip(image, p1, p99)
        image = (image - image.min()) / (image.max() - image.min() + 1e-8)
        # To tensor
        image = torch.from_numpy(image).unsqueeze(0)
        return image, self.labels[idx]
def prepare_data(data_dir='training_data'):
"""Load data from organized folders"""
filepaths = []
labels = []
label_map = {'good': 0, 'medium': 1, 'bad': 2}
for quality, label in label_map.items():
folder = Path(data_dir) / quality
for filepath in folder.glob('*.fits'):
filepaths.append(str(filepath))
labels.append(label)
# Split into train/val/test
train_files, temp_files, train_labels, temp_labels = train_test_split(
filepaths, labels, test_size=0.3, stratify=labels, random_state=42
)
val_files, test_files, val_labels, test_labels = train_test_split(
temp_files, temp_labels, test_size=0.5, stratify=temp_labels, random_state=42
)
print(f"Training samples: {len(train_files)}")
print(f"Validation samples: {len(val_files)}")
print(f"Test samples: {len(test_files)}")
return (
(train_files, train_labels),
(val_files, val_labels),
(test_files, test_labels)
)
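The pad/crop/normalize steps in `__getitem__` are easy to get subtly wrong, so it is worth checking them in isolation. A minimal numpy-only sketch (the resize via `scipy.ndimage.zoom` is omitted; `pad_crop_normalize` is a hypothetical helper mirroring the Dataset's logic, run on a synthetic non-square frame):

```python
import numpy as np

def pad_crop_normalize(image, size=128):
    """Mirror the Dataset's pad/crop/normalize steps (resize step omitted)."""
    if image.shape[0] < size:
        image = np.pad(image, ((0, size - image.shape[0]), (0, 0)))
    if image.shape[1] < size:
        image = np.pad(image, ((0, 0), (0, size - image.shape[1])))
    image = image[:size, :size]
    image = np.nan_to_num(image, nan=0)
    p1, p99 = np.percentile(image, [1, 99])
    image = np.clip(image, p1, p99)
    return (image - image.min()) / (image.max() - image.min() + 1e-8)

# A 128x96 "frame": wrong aspect ratio, realistic counts
frame = np.random.default_rng(0).normal(1000, 50, (128, 96)).astype(np.float32)
out = pad_crop_normalize(frame)
print(out.shape, float(out.min()), float(out.max()))  # (128, 128), values in [0, 1]
```

Whatever the input shape, the output is exactly 128x128 and scaled to [0, 1], which is what the network downstream assumes.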
Step 3: Model Definition¶
import torch.nn as nn
class QualityClassifier(nn.Module):
"""Lightweight CNN for image quality assessment"""
def __init__(self, num_classes=3):
super().__init__()
self.features = nn.Sequential(
# Block 1: 128 -> 64
nn.Conv2d(1, 16, 3, padding=1),
nn.BatchNorm2d(16),
nn.ReLU(),
nn.MaxPool2d(2),
# Block 2: 64 -> 32
nn.Conv2d(16, 32, 3, padding=1),
nn.BatchNorm2d(32),
nn.ReLU(),
nn.MaxPool2d(2),
# Block 3: 32 -> 16
nn.Conv2d(32, 64, 3, padding=1),
nn.BatchNorm2d(64),
nn.ReLU(),
nn.MaxPool2d(2),
# Block 4: 16 -> 8
nn.Conv2d(64, 128, 3, padding=1),
nn.BatchNorm2d(128),
nn.ReLU(),
nn.MaxPool2d(2),
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(128 * 8 * 8, 256),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(256, num_classes)
)
def forward(self, x):
x = self.features(x)
x = self.classifier(x)
return x
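Why `128 * 8 * 8` in the first Linear layer? Each 3x3 convolution with `padding=1` preserves spatial size, and each `MaxPool2d(2)` halves it, so four blocks take a 128x128 input down to 8x8 with 128 channels. A quick arithmetic check:

```python
def spatial_after_blocks(size, num_pools):
    # 3x3 convs with padding=1 preserve size; each MaxPool2d(2) halves it
    for _ in range(num_pools):
        size //= 2
    return size

spatial = spatial_after_blocks(128, 4)
flat_features = 128 * spatial * spatial  # channels * height * width
print(spatial, flat_features)  # 8 8192
```

If you change the input size or the number of blocks, this number must change too, or the Flatten/Linear boundary will raise a shape error.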
Step 4: Training Script¶
def train_quality_model():
# Configuration
BATCH_SIZE = 16
LEARNING_RATE = 0.001
NUM_EPOCHS = 30
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Prepare data
(train_files, train_labels), (val_files, val_labels), _ = prepare_data()
train_dataset = QualityDataset(train_files, train_labels)
val_dataset = QualityDataset(val_files, val_labels)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE,
shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE,
shuffle=False, num_workers=2)
# Initialize model
model = QualityClassifier(num_classes=3).to(DEVICE)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
# Training loop
best_accuracy = 0
for epoch in range(NUM_EPOCHS):
# Train
model.train()
train_loss = 0
for images, labels in train_loader:
images, labels = images.to(DEVICE), labels.to(DEVICE)
optimizer.zero_grad()
outputs = model(images)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
train_loss += loss.item()
# Validate
model.eval()
correct = 0
total = 0
with torch.no_grad():
for images, labels in val_loader:
images, labels = images.to(DEVICE), labels.to(DEVICE)
outputs = model(images)
_, predicted = torch.max(outputs, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f'Epoch [{epoch+1}/{NUM_EPOCHS}] '
f'Loss: {train_loss/len(train_loader):.4f} '
f'Accuracy: {accuracy:.1f}%')
# Save best model
if accuracy > best_accuracy:
best_accuracy = accuracy
torch.save({
'model_state': model.state_dict(),
'accuracy': accuracy,
'epoch': epoch
}, 'quality_classifier_best.pt')
print(f"\nTraining complete! Best accuracy: {best_accuracy:.1f}%")
return model
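The training loop above reports only overall validation accuracy. With three classes, a confusion matrix on the held-out test split tells you *which* qualities get confused; a pure-Python sketch (the example label lists are made up for illustration):

```python
def confusion_matrix(true_labels, pred_labels, num_classes=3):
    """Rows = true class, columns = predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix

# Hypothetical test-set results: 0=good, 1=medium, 2=bad
true = [0, 0, 1, 1, 2, 2, 2]
pred = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(true, pred)
for row, name in zip(cm, ['good', 'medium', 'bad']):
    print(f"{name:>6}: {row}")
# Per-class recall: fraction of each true class that was recovered
per_class_recall = [cm[i][i] / sum(cm[i]) for i in range(3)]
```

A "bad" frame classified as "good" is far more costly than the reverse, and the off-diagonal cells are where you see that happening.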
Step 5: Deployment for Real-Time Use¶
class RealTimeQualityChecker:
"""Deploy the trained model for real-time quality assessment"""
def __init__(self, model_path='quality_classifier_best.pt'):
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Load model
self.model = QualityClassifier(num_classes=3)
checkpoint = torch.load(model_path, map_location=self.device)
self.model.load_state_dict(checkpoint['model_state'])
self.model.to(self.device)
self.model.eval()
self.classes = ['good', 'medium', 'bad']
def preprocess(self, image):
"""Preprocess a raw numpy image"""
from scipy.ndimage import zoom
        # Resize so the longer side is 128
        zoom_factor = 128 / max(image.shape)
        image = zoom(image.astype(np.float32), zoom_factor)
        # Pad the shorter side, then crop, so the network always sees 128x128
        # (mirrors the training Dataset; without padding, a non-square frame
        # would break the classifier's 128 * 8 * 8 flatten assumption)
        if image.shape[0] < 128:
            image = np.pad(image, ((0, 128 - image.shape[0]), (0, 0)))
        if image.shape[1] < 128:
            image = np.pad(image, ((0, 0), (0, 128 - image.shape[1])))
        image = image[:128, :128]
# Normalize
image = np.nan_to_num(image, nan=0)
p1, p99 = np.percentile(image, [1, 99])
image = np.clip(image, p1, p99)
image = (image - image.min()) / (image.max() - image.min() + 1e-8)
# To tensor
tensor = torch.from_numpy(image).unsqueeze(0).unsqueeze(0)
return tensor.to(self.device)
def assess(self, image):
"""
Assess image quality
Args:
image: numpy array (raw telescope image)
Returns:
dict with quality label and confidence
"""
tensor = self.preprocess(image)
with torch.no_grad():
outputs = self.model(tensor)
probabilities = torch.softmax(outputs, dim=1)[0]
predicted_class = torch.argmax(probabilities).item()
return {
'quality': self.classes[predicted_class],
'confidence': probabilities[predicted_class].item(),
'all_probabilities': {
cls: prob.item()
for cls, prob in zip(self.classes, probabilities)
}
}
def assess_file(self, filepath):
"""Assess quality of a FITS file"""
with fits.open(filepath) as hdul:
image = hdul[0].data
return self.assess(image)
# Usage example:
if __name__ == '__main__':
checker = RealTimeQualityChecker('quality_classifier_best.pt')
# Assess a single file
result = checker.assess_file('new_observation.fits')
print(f"Quality: {result['quality']} ({result['confidence']:.1%} confident)")
# In a real-time loop
def process_new_frame(filepath):
result = checker.assess_file(filepath)
if result['quality'] == 'bad':
            print(f"⚠️ Bad frame detected: {filepath}")
# Could trigger alert or stop observation
elif result['quality'] == 'medium':
            print(f"⚡ Medium quality: {filepath}")
# Continue but flag for review
else:
            print(f"✓ Good frame: {filepath}")
# Proceed normally
return result
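The probabilities returned by `assess` can also gate decisions: when the model is unsure, route the frame to human review instead of acting on it. A sketch of that thresholding logic in plain Python (`route_frame` and the 0.7 cutoff are assumptions for illustration, not part of the trained model):

```python
import math

def softmax(logits):
    """Convert raw model outputs to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_frame(logits, classes=('good', 'medium', 'bad'), min_conf=0.7):
    """Return the predicted class, or 'review' if confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_conf:
        return 'review'  # too uncertain: send to a human
    return classes[best]

print(route_frame([4.0, 0.5, 0.1]))   # confidently 'good'
print(route_frame([1.0, 0.9, 0.8]))   # near-uniform logits -> 'review'
```

This keeps the automated pipeline honest: it only acts on frames the model is actually confident about.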
Chapter 9: Next Steps and Resources¶
Your Learning Path¶
Week 1-2: Python fundamentals
- Complete a Python tutorial (Codecademy, Python.org tutorial)
- Practice with NumPy and Matplotlib
- Load and visualize your telescope images
Week 3-4: Machine learning concepts
- Take Andrew Ng's ML course on Coursera (free to audit)
- Implement simple models with scikit-learn
- Understand training/validation/testing
Week 5-6: Deep learning basics
- Work through Fast.ai course (free, practical)
- Build your first CNN in PyTorch
- Train on your own data
Week 7-8: Your first real project
- Implement the quality classifier above
- Collect and label your data
- Train, validate, deploy
Month 2+: Advanced topics
- Time-series analysis for transient detection
- Multi-site coordination systems
- Real-time processing pipelines
Essential Resources¶
Books:
- "Python for Astronomers" (free online)
- "Deep Learning" by Goodfellow (the bible, free online)
- "Hands-On Machine Learning" by GΓ©ron
Courses:
- Fast.ai (practical deep learning)
- Coursera: Andrew Ng's courses
- DeepLearning.AI specializations
Astronomy-specific:
- AstroML documentation
- Astropy tutorials
- AAS astronomy + ML workshops
Communities:
- Stack Overflow (coding help)
- Cross Validated (ML theory)
- r/MachineLearning (Reddit)
- Astropy Slack/Discord
Hardware Recommendations¶
For learning (budget):
- Any modern laptop with 8GB+ RAM
- Use Google Colab for free GPU access
For development (intermediate):
- Desktop with NVIDIA GPU (RTX 3060 or better)
- 32GB+ RAM
- Fast SSD storage
For production (your telescope array):
- Edge devices: NVIDIA Jetson at each site
- Central server: Multiple GPUs for training
- Cloud backup for burst computing
Final Thoughts¶
You now have a complete foundation in AI/ML for astronomy. The key principles:
- ML finds patterns in data: nothing magical
- Neural networks learn hierarchically: simple to complex features
- Training requires iteration: thousands of updates to find good parameters
- Data quality matters more than model complexity: garbage in, garbage out
- Start simple, add complexity as needed: don't over-engineer initially
For your distributed telescope array, the path is:
- Build local quality assessment (the project above)
- Add transient detection at each site
- Implement cross-site coordination
- Develop data fusion capabilities
- Create autonomous discovery systems
Each step builds on the previous. You don't need to understand everything at once; learn what you need for each stage.
Would you like me to elaborate on any specific section? I can provide:
- More code examples for specific tasks
- Deeper mathematical explanations
- Hardware setup guides
- Specific astronomy ML techniques
- Step-by-step project walkthroughs
[[wot]]