Deep Sea Detective: Autonomous Marine Life Classification for AR-Ready Deployment

Technologies: EfficientNetB7, YOLO, GradCAM, Python, TensorFlow, OpenCV, Deep Learning, Transfer Learning
View Complete Code Repository

What if computer vision systems could achieve marine biologist-level species identification in real-time underwater conditions, enabling autonomous AR-enhanced diving experiences?

Results: 87.84% classification accuracy across 23 marine species with 100% video processing success rate - production-ready performance for AR diving goggles.

The Problem That Hooked Me

On my diving trips, I was constantly amazed—and frustrated. I’d come face-to-face with stunning marine creatures, only to surface with no idea what I’d just seen. Underwater wildlife identification still depends almost entirely on human expertise, creating major barriers for marine research, diver safety, and accessible learning.

Even seasoned marine biologists struggle with quick species ID in murky, fast-moving conditions. Recreational divers? Most miss out on the rich biodiversity around them simply because identifying marine life underwater is so difficult.

Then I had a thought—AR glasses are going mainstream. What if diving goggles could detect and identify marine animals in real time?

Of course, there’s a catch: underwater computer vision is hard. Most pre-trained image models are trained on land-based datasets, where subjects are large, clear, and front-and-center. Underwater, marine creatures are often small, camouflaged, and drifting through noise. As a result, standard models tend to fail when applied to ocean life.

I took a pre-trained image model and fine-tuned it using a curated dataset of underwater images. The goal? Teach the model to recognize marine species in the kinds of messy, low-visibility conditions divers actually experience.

Technical Innovation: Underwater Real-Time Computer Vision

Architecture Overview

The system implements a two-stage pipeline optimized for underwater conditions and real-time AR deployment. First, an EfficientNetB7 classifier is fine-tuned on marine life images. At inference time, a YOLO model detects animals in each video frame, and the trained classifier identifies the species of each detection.

Underwater Data Challenges

Dataset: 13,000+ underwater images across 23 marine species

The underwater reality check:

  • Color distortion: Blues dominate, reds disappear at depth

  • Lighting chaos: From bright shallow water to artificial deep-sea illumination

  • Movement blur: Animals don't pose for photos

  • Background complexity: Coral reefs, kelp forests, open water

Data preprocessing strategy:

  • Aggressive color augmentation to simulate depth variations

  • Rotation/zoom for natural movement patterns

  • Contrast adjustment for different lighting conditions

  • Background suppression techniques
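To make these concrete, here is a minimal augmentation sketch using Keras preprocessing layers; the specific ranges and the blue-shift helper are illustrative assumptions for this write-up, not the exact values used in training.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative underwater augmentation pipeline; the ranges are
# assumptions for the sketch, not the exact training values.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.15),   # drifting, turning animals
    layers.RandomZoom(0.2),        # varying distance to the subject
    layers.RandomContrast(0.4),    # bright shallows vs. dim depths
    layers.RandomBrightness(0.3),  # artificial deep-sea illumination
])

def depth_color_shift(image, max_shift=0.15):
    """Shift colors toward blue to mimic red absorption at depth.

    Assumes `image` is a float tensor scaled to [0, 1].
    """
    shift = tf.random.uniform([], 0.0, max_shift)
    r, g, b = tf.unstack(image, axis=-1)
    shifted = tf.stack([r * (1.0 - shift), g, b * (1.0 + shift)], axis=-1)
    return tf.clip_by_value(shifted, 0.0, 1.0)
```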

The figure below displays some of the input images.

Model Architecture

Base Model: EfficientNetB7

  • Pre-trained: ImageNet weights

  • Input Shape: (224, 224, 3)

  • Feature Extraction: Initially frozen for transfer learning

  • Parameters: ~64.4M total

Custom Classification Head:

EfficientNetB7 (frozen initially)

GlobalAveragePooling2D

Dropout(0.45)

Dense(512, activation='relu') + L2 regularization

Dropout(0.45)

Dense(23, activation='softmax') # 23 sea animal classes
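A minimal Keras sketch of this architecture; the L2 strength shown is an assumed placeholder, not a value documented in the project.

```python
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import EfficientNetB7

# ImageNet-pretrained backbone, frozen for Stage 1 of training.
base = EfficientNetB7(include_top=False, weights='imagenet',
                      input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.45),
    layers.Dense(512, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),  # assumed L2 value
    layers.Dropout(0.45),
    layers.Dense(23, activation='softmax'),  # 23 sea animal classes
])
```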

Two-Stage Training Strategy

Stage 1: Classifier Head Training

  • Duration: 15 epochs

  • Learning Rate: 0.001 (Adam optimizer)

  • Base Model: Frozen (only classifier trains)

  • Purpose: Learn class-specific features without disrupting pre-trained weights

Stage 2: Fine-Tuning

  • Duration: 10 epochs

  • Learning Rate: 0.0001 (10x lower)

  • Base Model: Unfrozen (full model trains)

  • Purpose: Fine-tune entire network for domain-specific features
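A sketch of the two-stage schedule, reusing `model` and `base` from the sketch above. The callbacks mirror the ReduceLROnPlateau + EarlyStopping setup mentioned under lessons learned; `train_ds`, `val_ds`, and `test_ds` are hypothetical tf.data pipelines, and the sparse loss assumes integer labels.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import Adam

callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2),
    EarlyStopping(monitor='val_loss', patience=4, restore_best_weights=True),
]

# Stage 1: train only the classifier head on frozen features.
model.compile(optimizer=Adam(1e-3),
              loss='sparse_categorical_crossentropy',  # assumes integer labels
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=15, callbacks=callbacks)

# Stage 2: unfreeze the backbone and fine-tune at a 10x lower rate.
base.trainable = True
model.compile(optimizer=Adam(1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=callbacks)

# Held-out evaluation (this project reached 87.84% test accuracy).
model.evaluate(test_ds)
```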

Training Results

Test Accuracy: 87.84%

After training, the model classifies the animal in each image. The following figure shows sample classification results; for each image, the actual label and the predicted class are listed.

Performance Analysis: Biological Validation of AI Learning

Species-Specific Performance Benchmarking

Elite Performers (>95% F1-Score):

  • Otter: 98% F1 (distinctive mammalian morphology)

  • Turtle/Tortoise: 97% F1 (clear shell structure patterns)

  • Sea Urchins: 97% F1 (unmistakable spiky texture)

  • Starfish: 96% F1 (unique radial symmetry)

Robust Performers (90-95% F1-Score):

  • Crabs: 94% F1 (distinct body shape)

  • Jelly Fish: 94% F1 (translucent features)

Challenging Species (<70% F1-Score):

  • Corals: 67% F1 (background blending, expected difficulty)

  • Clams: 65% F1 (minimal distinguishing features)

  • Shrimp: 55% F1 (small size, similarity to other crustaceans)

Critical Validation: Performance hierarchy mirrors marine biology classification difficulty, confirming biologically relevant feature learning rather than dataset artifacts.

Explainable AI: Understanding the "Why"

GradCAM biological validation revealed the model focuses on exactly the right features:

  • High-performing species: Model identifies distinctive biological markers (shell patterns, body symmetry, unique textures)

  • Challenging species: Model appropriately struggles with natural camouflage experts and visually similar small creatures

  • Performance correlation: Species difficulty matches real marine biologist identification challenges
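For reference, a standard Grad-CAM sketch of the kind used for this validation, assuming a functional Keras model whose final convolutional layer is reachable by name ('top_conv' is the last conv layer in Keras' EfficientNetB7).

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_name='top_conv'):
    """Return a [0, 1] heatmap over the conv feature map for `image` (H, W, 3)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]
    grads = tape.gradient(score, conv_out)        # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average the gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)                         # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned heatmap to image size and overlaying it on the input makes it easy to check whether the model attends to shells, spines, or radial symmetry rather than background coral.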

Why this matters for AR: The model's "mistakes" align with human difficulty, proving it learned meaningful biological features rather than dataset shortcuts.

Real-Time Video Pipeline

AR-ready processing pipeline:

  1. Smart video orientation (auto-detects portrait/landscape)

  2. YOLO object detection (locates marine animals, filters backgrounds)

  3. EfficientNet classification (species identification + confidence)

  4. Intelligent overlay (species name, educational facts, confidence-based display)
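A condensed per-frame sketch of steps 2-4, assuming an Ultralytics-style YOLO API and reusing the classifier `model` from above; the weights file, video path, and `CLASS_NAMES` list are placeholders, and the orientation handling from step 1 is omitted for brevity.

```python
import cv2
import numpy as np
from ultralytics import YOLO  # assumed detector API; swap in the variant actually used

detector = YOLO('yolov8n.pt')            # placeholder weights
cap = cv2.VideoCapture('dive_clip.mp4')  # hypothetical input video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # classifier expects RGB
    # YOLO locates candidate animals; each box is classified separately.
    boxes = detector(frame, verbose=False)[0].boxes.xyxy.cpu().numpy().astype(int)
    for x1, y1, x2, y2 in boxes:
        crop = cv2.resize(rgb[y1:y2, x1:x2], (224, 224)).astype(np.float32)
        probs = model.predict(crop[None, ...], verbose=0)[0]
        label, conf = CLASS_NAMES[probs.argmax()], probs.max()
        # Overlay species + confidence; educational facts could be drawn the same way.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f'{label} {conf:.0%}', (x1, y1 - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow('Deep Sea Detective', frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```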

Performance: 11.2 FPS with a 74.9% detection rate and 120 ms classification latency - validated as AR-ready for diving goggles.
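Numbers like the 120 ms classification latency can be sanity-checked with a simple timer around the classifier call; a minimal sketch using the `crop` from the loop above:

```python
import time

t0 = time.perf_counter()
probs = model.predict(crop[None, ...], verbose=0)   # single-crop classification
latency_ms = (time.perf_counter() - t0) * 1000
print(f'classification latency: {latency_ms:.0f} ms')  # ~120 ms in this project
```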

Bonus Feature: Interactive educational facts for each species - transforming diving into active learning experiences! When you encounter a starfish, you'll learn they can regenerate lost arms and have no brain. Spot an octopus? Discover they have three hearts and blue blood. :)

Implications for Marine Technology and AR Development

Current System Capabilities

  • Real-time species identification matching marine biologist accuracy on distinctive species

  • Production-ready performance metrics for AR goggle integration

  • Biologically validated learning eliminating need for extensive retraining

  • Edge-device optimization for practical underwater deployment

Technical Limitations

  • Performance degradation on naturally camouflaged species (expected biological limitation)

  • Requires clear visual input (limited by underwater visibility conditions)

  • Fixed classification set (23 species, not expandable without retraining)

  • Dependency on quality training data for new species addition

What I Learned

Technical insights:

  • Model validation through biology: Performance patterns that mirror real marine biologist challenges prove the AI learned correct features

  • Production optimization: ReduceLROnPlateau + EarlyStopping more effective than extended training

  • Underwater adaptations: Aggressive regularization essential for complex underwater backgrounds

  • Real-time constraints: 120ms latency achievable with careful architecture optimization

  • User experience: Educational overlay transforms identification into learning opportunity

The bigger picture:

  • Domain expertise validation eliminates costly retraining cycles

  • AR performance requirements drive different optimization strategies than laboratory metrics

  • Real-world deployment success depends on biological relevance, not just accuracy numbers

Future Vision: AR Diving Goggles

Next steps toward the AR dream:

  • Improve model accuracy with additional training data, more sophisticated architectures, and advanced training techniques

  • Expand to more fish species, building on the current 23 species foundation that proved the approach works

  • Optimize real-time performance - smooth detection and fast classification are critical for AR system success

  • Implement cross-site species tracking across dive sites at different depths to improve classification accuracy through environmental context

Transform diving from passive observation to active learning, making marine life accessible to everyone underwater. 🌊
