Deep Sea Detective: Autonomous Marine Life Classification for AR-Ready Deployment
Technologies: EfficientNetB7, YOLO, GradCAM, Python, TensorFlow, OpenCV, Deep Learning, Transfer Learning
View Complete Code Repository
What if computer vision systems could achieve marine biologist-level species identification in real-time underwater conditions, enabling autonomous AR-enhanced diving experiences?
Results: 87.84% classification accuracy across 23 marine species with 100% video processing success rate - production-ready performance for AR diving goggles.
The Problem That Hooked Me
On my diving trips, I was constantly amazed—and frustrated. I’d come face-to-face with stunning marine creatures, only to surface with no idea what I’d just seen. Underwater wildlife identification still depends almost entirely on human expertise, creating major barriers for marine research, diver safety, and accessible learning.
Even seasoned marine biologists struggle with quick species ID in murky, fast-moving conditions. Recreational divers? Most miss out on the rich biodiversity around them simply because identifying marine life underwater is so difficult.
Then I had a thought—AR glasses are going mainstream. What if diving goggles could detect and identify marine animals in real time?
Of course, there’s a catch: underwater computer vision is hard. Most pre-trained image models are trained on land-based datasets, where subjects are large, clear, and front-and-center. Underwater, marine creatures are often small, camouflaged, and drifting through noise. As a result, standard models tend to fail when applied to ocean life.
I took a pre-trained image model and fine-tuned it using a curated dataset of underwater images. The goal? Teach the model to recognize marine species in the kinds of messy, low-visibility conditions divers actually experience.
Technical Innovation: Underwater Real-Time Computer Vision
Architecture Overview
The system implements a two-stage pipeline optimized for underwater conditions and real-time AR deployment. First, an EfficientNetB7 classifier is fine-tuned on marine life images. At inference time, a YOLO model detects animals in each video frame, and the trained EfficientNet classifies each detected animal down to species.
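The wiring between the two stages is simple in spirit. Below is a minimal single-frame sketch; the weights file, model path, class-name list, and the Ultralytics YOLO wrapper are illustrative assumptions, not the project's exact code.

import numpy as np
import tensorflow as tf
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")                                   # stage 1: find animals (hypothetical weights)
classifier = tf.keras.models.load_model("marine_effnet.keras")  # stage 2: name the species (hypothetical path)

def identify_species(frame_bgr, class_names):
    """Detect animals in one frame, then classify each cropped detection."""
    boxes = detector(frame_bgr, verbose=False)[0].boxes
    results = []
    for x1, y1, x2, y2 in boxes.xyxy.cpu().numpy().astype(int):
        crop = frame_bgr[y1:y2, x1:x2, ::-1]                    # BGR crop -> RGB
        crop = tf.image.resize(crop, (224, 224))                # classifier input size
        probs = classifier(crop[tf.newaxis], training=False)[0].numpy()
        results.append((class_names[int(probs.argmax())], float(probs.max())))
    return results                                              # [(species, confidence), ...]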
Underwater Data Challenges
Dataset: 13,000+ underwater images across 23 marine species
The underwater reality check:
Color distortion: Blues dominate, reds disappear at depth
Lighting chaos: From bright shallow water to artificial deep-sea illumination
Movement blur: Animals don't pose for photos
Background complexity: Coral reefs, kelp forests, open water
Data preprocessing strategy (a Keras sketch follows the list):
Aggressive color augmentation to simulate depth variations
Rotation/zoom for natural movement patterns
Contrast adjustment for different lighting conditions
Background suppression techniques
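One way to approximate this strategy with Keras preprocessing layers is shown below; the exact augmentation ranges used in the project may differ.

import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.15),       # drifting, turning animals
    tf.keras.layers.RandomZoom(0.2),            # varying camera distance
    tf.keras.layers.RandomContrast(0.4),        # shallow vs. deep lighting
    tf.keras.layers.RandomBrightness(0.3),      # natural vs. artificial light
])
# Depth color shifts (blues dominating, reds vanishing) can be simulated per
# image with tf.image.random_hue; applied on the fly during training:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))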
The figure below displays some of the input images.
Model Architecture
Base Model: EfficientNetB7
Pre-trained: ImageNet weights
Input Shape: (224, 224, 3)
Feature Extraction: Initially frozen for transfer learning
Parameters: ~64.4M total (fully trainable once the base is unfrozen)
Custom Classification Head (assembled in the Keras sketch below):
EfficientNetB7 (frozen initially)
GlobalAveragePooling2D
Dropout(0.45)
Dense(512, activation='relu') + L2 regularization
Dropout(0.45)
Dense(23, activation='softmax') # 23 sea animal classes
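In Keras, the architecture above looks roughly like this (the L2 strength is an assumption; the text does not state it):

import tensorflow as tf

base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False                                    # frozen for stage 1

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.45),
    tf.keras.layers.Dense(512, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # assumed L2 strength
    tf.keras.layers.Dropout(0.45),
    tf.keras.layers.Dense(23, activation="softmax"),      # 23 sea animal classes
])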
Two-Stage Training Strategy
Stage 1: Classifier Head Training
Duration: 15 epochs
Learning Rate: 0.001 (Adam optimizer)
Base Model: Frozen (only classifier trains)
Purpose: Learn class-specific features without disrupting pre-trained weights
Stage 2: Fine-Tuning
Duration: 10 epochs
Learning Rate: 0.0001 (10x lower)
Base Model: Unfrozen (full model trains)
Purpose: Fine-tune entire network for domain-specific features
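In code, the two stages amount to two compile-and-fit passes over the model sketched above (train_ds and val_ds stand in for the actual data pipelines):

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=15)    # stage 1: head only

base.trainable = True                                     # unfreeze for stage 2
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),   # 10x lower learning rate
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)    # stage 2: full network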
Training Results
Test Accuracy: 87.84%
After training, we can classify the animal in each image using the trained model. The following figure shows the classification results: for each image, the actual label and the predicted classification are listed.
Performance Analysis: Biological Validation of AI Learning
Species-Specific Performance Benchmarking
Elite Performers (>95% F1-Score):
Otter: 98% F1 (distinctive mammalian morphology)
Turtle/Tortoise: 97% F1 (clear shell structure patterns)
Sea Urchins: 97% F1 (unmistakable spiky texture)
Starfish: 96% F1 (unique radial symmetry)
Robust Performers (90-95% F1-Score):
Crabs: 94% F1 (distinct body shape)
Jelly Fish: 94% F1 (translucent features)
Challenging Species (<70% F1-Score):
Corals: 67% F1 (background blending, expected difficulty)
Clams: 65% F1 (minimal distinguishing features)
Shrimp: 55% F1 (small size and similarity to other crustaceans)
Critical Validation: Performance hierarchy mirrors marine biology classification difficulty, confirming biologically relevant feature learning rather than dataset artifacts.
Explainable AI: Understanding the "Why"
GradCAM biological validation revealed the model focuses on exactly the right features (a minimal Grad-CAM sketch follows the list):
High-performing species: Model identifies distinctive biological markers (shell patterns, body symmetry, unique textures)
Challenging species: Model appropriately struggles with natural camouflage experts and visually similar small creatures
Performance correlation: Species difficulty matches real marine biologist identification challenges
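For reference, here is a minimal Grad-CAM sketch for a Keras model; last_conv_name must point at the final convolutional layer, which depends on how the EfficientNet base is nested inside the model.

import tensorflow as tf

def gradcam(model, image, last_conv_name):
    """Return a normalized heatmap of where the model looked."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[tf.newaxis])
        idx = int(tf.argmax(preds[0]))               # predicted class
        score = preds[:, idx]
    grads = tape.gradient(score, conv_out)           # sensitivity per feature map
    weights = tf.reduce_mean(grads, axis=(1, 2))     # global-average the gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                         # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()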
Why this matters for AR: The model's "mistakes" align with human difficulty, proving it learned meaningful biological features rather than dataset shortcuts.
Real-Time Video Pipeline
AR-ready processing pipeline (a sketch of the loop follows below):
Smart video orientation (auto-detects portrait/landscape)
YOLO object detection (locates marine animals, filters backgrounds)
EfficientNet classification (species identification + confidence)
Intelligent overlay (species name, educational facts, confidence-based display)
Performance: 11.2 FPS with 74.9% detection rate and 120ms classification latency - validated AR-ready performance for diving goggles.
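Put together, the loop might look like the sketch below, reusing the identify_species helper from earlier; the rotation rule, confidence threshold, input filename, and CLASS_NAMES list are illustrative.

import cv2

cap = cv2.VideoCapture("dive.mp4")                         # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame.shape[0] > frame.shape[1]:                    # auto-handle portrait footage
        frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
    for species, conf in identify_species(frame, CLASS_NAMES):
        if conf > 0.6:                                     # confidence-gated overlay
            cv2.putText(frame, f"{species} ({conf:.0%})", (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
    cv2.imshow("Deep Sea Detective", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()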
Bonus Feature: Interactive educational facts for each species - transforming diving into active learning experiences! When you encounter a starfish, you'll learn they can regenerate lost arms and have no brain. Spot an octopus? Discover they have three hearts and blue blood. :)
Implications for Marine Technology and AR Development
Current System Capabilities
Real-time species identification matching marine biologist accuracy on distinctive species
Production-ready performance metrics for AR goggle integration
Biologically validated learning eliminating need for extensive retraining
Edge-device optimization for practical underwater deployment
Technical Limitations
Performance degradation on naturally camouflaged species (expected biological limitation)
Requires clear visual input (limited by underwater visibility conditions)
Fixed classification set (23 species, not expandable without retraining)
Dependency on quality training data for new species addition
What I Learned
Technical insights:
Model validation through biology: Performance patterns that mirror real marine biologist challenges prove the AI learned correct features
Production optimization: ReduceLROnPlateau + EarlyStopping proved more effective than extended training (see the callback sketch after this list)
Underwater adaptations: Aggressive regularization essential for complex underwater backgrounds
Real-time constraints: 120ms latency achievable with careful architecture optimization
User experience: Educational overlay transforms identification into learning opportunity
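That callback combination, in Keras form (the patience values are illustrative):

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                     restore_best_weights=True),
]
# Passed to both training stages: model.fit(..., callbacks=callbacks)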
The bigger picture:
Domain expertise validation eliminates costly retraining cycles
AR performance requirements drive different optimization strategies than laboratory metrics
Real-world deployment success depends on biological relevance, not just accuracy numbers
Future Vision: AR Diving Goggles
Next steps toward the AR dream:
Improve model accuracy with additional training data, more sophisticated architectures, and advanced training techniques
Expand to more fish species, building on the current 23 species foundation that proved the approach works
Optimize real-time performance - smooth detection and fast classification are critical for AR system success
Implement species tracking across dive sites and depths, using environmental context to improve classification accuracy
Transform diving from passive observation to active learning, making marine life accessible to everyone underwater. 🌊