Deep Sea Detective: Autonomous Marine Life Classification for AR-Ready Deployment

Technologies: EfficientNetB7, YOLO, GradCAM, Python, TensorFlow, OpenCV, Deep Learning, Transfer Learning
View Complete Code Repository

What if computer vision systems could achieve marine biologist-level species identification in real-time underwater conditions, enabling autonomous AR-enhanced diving experiences?

Results: 87.84% classification accuracy across 23 marine species with 100% video processing success rate - production-ready performance for AR diving goggles.

The Problem That Hooked Me

On my diving trips, I was constantly amazed—and frustrated. I’d come face-to-face with stunning marine creatures, only to surface with no idea what I’d just seen. Underwater wildlife identification still depends almost entirely on human expertise, creating major barriers for marine research, diver safety, and accessible learning.

Even seasoned marine biologists struggle with quick species ID in murky, fast-moving conditions. Recreational divers? Most miss out on the rich biodiversity around them simply because identifying marine life underwater is so difficult.

Then I had a thought—AR glasses are going mainstream. What if diving goggles could detect and identify marine animals in real time?

Of course, there’s a catch: underwater computer vision is hard. Most pre-trained image models are trained on land-based datasets, where subjects are large, clear, and front-and-center. Underwater, marine creatures are often small, camouflaged, and drifting through noise. As a result, standard models tend to fail when applied to ocean life.

I took a pre-trained image model and fine-tuned it using a curated dataset of underwater images. The goal? Teach the model to recognize marine species in the kinds of messy, low-visibility conditions divers actually experience.

Technical Innovation: Underwater Real-Time Computer Vision

Architecture Overview

The system implements a two-stage pipeline optimized for underwater conditions and real-time AR deployment. First, an EfficientNetB7 classifier is fine-tuned on marine life images. At inference time, a YOLO model detects animals in each video frame, and the trained classifier identifies the species of each detection.

Underwater Data Challenges

Dataset: 13,000+ underwater images across 23 marine species

The underwater reality check:

  • Color distortion: Blues dominate, reds disappear at depth

  • Lighting chaos: From bright shallow water to artificial deep-sea illumination

  • Movement blur: Animals don't pose for photos

  • Background complexity: Coral reefs, kelp forests, open water

Data preprocessing strategy:

  • Aggressive color augmentation to simulate depth variations

  • Rotation/zoom for natural movement patterns

  • Contrast adjustment for different lighting conditions

  • Background suppression techniques
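To make these concrete, here is a minimal augmentation sketch using Keras preprocessing layers; the specific ranges and the blue-shift helper are illustrative assumptions for this write-up, not the exact values used in training.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative underwater augmentation pipeline; the ranges are
# assumptions for the sketch, not the exact training values.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.15),   # drifting, turning animals
    layers.RandomZoom(0.2),        # varying distance to the subject
    layers.RandomContrast(0.4),    # bright shallows vs. dim depths
    layers.RandomBrightness(0.3),  # artificial deep-sea illumination
])

def depth_color_shift(image, max_shift=0.15):
    """Shift colors toward blue to mimic red absorption at depth.

    Assumes `image` is a float tensor scaled to [0, 1].
    """
    shift = tf.random.uniform([], 0.0, max_shift)
    r, g, b = tf.unstack(image, axis=-1)
    shifted = tf.stack([r * (1.0 - shift), g, b * (1.0 + shift)], axis=-1)
    return tf.clip_by_value(shifted, 0.0, 1.0)
```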

The figure below displays some of the input images.

Model Architecture

Base Model: EfficientNetB7

  • Pre-trained: ImageNet weights

  • Input Shape: (224, 224, 3)

  • Feature Extraction: Initially frozen for transfer learning

  • Parameters: ~64.4M total

Custom Classification Head:

EfficientNetB7 (frozen initially)

GlobalAveragePooling2D

Dropout(0.45)

Dense(512, activation='relu') + L2 regularization

Dropout(0.45)

Dense(23, activation='softmax') # 23 sea animal classes
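A minimal Keras sketch of this architecture; the L2 strength shown is an assumed placeholder, not a value documented in the project.

```python
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import EfficientNetB7

# ImageNet-pretrained backbone, frozen for Stage 1 of training.
base = EfficientNetB7(include_top=False, weights='imagenet',
                      input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.45),
    layers.Dense(512, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),  # assumed L2 value
    layers.Dropout(0.45),
    layers.Dense(23, activation='softmax'),  # 23 sea animal classes
])
```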

Two-Stage Training Strategy

Stage 1: Classifier Head Training

  • Duration: 15 epochs

  • Learning Rate: 0.001 (Adam optimizer)

  • Base Model: Frozen (only classifier trains)

  • Purpose: Learn class-specific features without disrupting pre-trained weights

Stage 2: Fine-Tuning

  • Duration: 10 epochs

  • Learning Rate: 0.0001 (10x lower)

  • Base Model: Unfrozen (full model trains)

  • Purpose: Fine-tune entire network for domain-specific features
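A sketch of the two-stage schedule, reusing `model` and `base` from the sketch above. The callbacks mirror the ReduceLROnPlateau + EarlyStopping setup mentioned under lessons learned; `train_ds`, `val_ds`, and `test_ds` are hypothetical tf.data pipelines, and the sparse loss assumes integer labels.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.optimizers import Adam

callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2),
    EarlyStopping(monitor='val_loss', patience=4, restore_best_weights=True),
]

# Stage 1: train only the classifier head on frozen features.
model.compile(optimizer=Adam(1e-3),
              loss='sparse_categorical_crossentropy',  # assumes integer labels
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=15, callbacks=callbacks)

# Stage 2: unfreeze the backbone and fine-tune at a 10x lower rate.
base.trainable = True
model.compile(optimizer=Adam(1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=callbacks)

# Held-out evaluation (this project reached 87.84% test accuracy).
model.evaluate(test_ds)
```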

Training Results

Test Accuracy: 87.84%

After training, the model classifies the animal in each image. The following figure shows sample classification results; for each image, the actual label and the predicted class are listed.

Performance Analysis: Biological Validation of AI Learning

Species-Specific Performance Benchmarking

Elite Performers (>95% F1-Score):

  • Otter: 98% F1 (distinctive mammalian morphology)

  • Turtle/Tortoise: 97% F1 (clear shell structure patterns)

  • Sea Urchins: 97% F1 (unmistakable spiky texture)

  • Starfish: 96% F1 (unique radial symmetry)

Robust Performers (90-95% F1-Score):

  • Crabs: 94% F1 (distinct body shape)

  • Jelly Fish: 94% F1 (translucent features)

Challenging Species (<70% F1-Score):

  • Corals: 67% F1 (background blending, expected difficulty)

  • Clams: 65% F1 (minimal distinguishing features)

  • Shrimp: 55% F1 (small size, similarity to other crustaceans)

Critical Validation: Performance hierarchy mirrors marine biology classification difficulty, confirming biologically relevant feature learning rather than dataset artifacts.

Explainable AI: Understanding the "Why"

GradCAM biological validation revealed the model focuses on exactly the right features:

  • High-performing species: Model identifies distinctive biological markers (shell patterns, body symmetry, unique textures)

  • Challenging species: Model appropriately struggles with natural camouflage experts and visually similar small creatures

  • Performance correlation: Species difficulty matches real marine biologist identification challenges
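For reference, a standard Grad-CAM sketch of the kind used for this validation, assuming a functional Keras model whose final convolutional layer is reachable by name ('top_conv' is the last conv layer in Keras' EfficientNetB7).

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_name='top_conv'):
    """Return a [0, 1] heatmap over the conv feature map for `image` (H, W, 3)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        top_class = tf.argmax(preds[0])
        score = preds[:, top_class]
    grads = tape.gradient(score, conv_out)        # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average the gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)
    cam = tf.nn.relu(cam)                         # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned heatmap to image size and overlaying it on the input makes it easy to check whether the model attends to shells, spines, or radial symmetry rather than background coral.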

Why this matters for AR: The model's "mistakes" align with human difficulty, proving it learned meaningful biological features rather than dataset shortcuts.

Real-Time Video Pipeline

AR-ready processing pipeline:

  1. Smart video orientation (auto-detects portrait/landscape)

  2. YOLO object detection (locates marine animals, filters backgrounds)

  3. EfficientNet classification (species identification + confidence)

  4. Intelligent overlay (species name, educational facts, confidence-based display)
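A condensed per-frame sketch of steps 2-4, assuming an Ultralytics-style YOLO API and reusing the classifier `model` from above; the weights file, video path, and `CLASS_NAMES` list are placeholders, and the orientation handling from step 1 is omitted for brevity.

```python
import cv2
import numpy as np
from ultralytics import YOLO  # assumed detector API; swap in the variant actually used

detector = YOLO('yolov8n.pt')            # placeholder weights
cap = cv2.VideoCapture('dive_clip.mp4')  # hypothetical input video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # classifier expects RGB
    # YOLO locates candidate animals; each box is classified separately.
    boxes = detector(frame, verbose=False)[0].boxes.xyxy.cpu().numpy().astype(int)
    for x1, y1, x2, y2 in boxes:
        crop = cv2.resize(rgb[y1:y2, x1:x2], (224, 224)).astype(np.float32)
        probs = model.predict(crop[None, ...], verbose=0)[0]
        label, conf = CLASS_NAMES[probs.argmax()], probs.max()
        # Overlay species + confidence; educational facts could be drawn the same way.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f'{label} {conf:.0%}', (x1, y1 - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow('Deep Sea Detective', frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```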

Performance: 11.2 FPS with a 74.9% detection rate and 120 ms classification latency - validated as AR-ready for diving goggles.
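Numbers like the 120 ms classification latency can be sanity-checked with a simple timer around the classifier call; a minimal sketch using the `crop` from the loop above:

```python
import time

t0 = time.perf_counter()
probs = model.predict(crop[None, ...], verbose=0)   # single-crop classification
latency_ms = (time.perf_counter() - t0) * 1000
print(f'classification latency: {latency_ms:.0f} ms')  # ~120 ms in this project
```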

Bonus Feature: Interactive educational facts for each species - transforming diving into active learning experiences! When you encounter a starfish, you'll learn they can regenerate lost arms and have no brain. Spot an octopus? Discover they have three hearts and blue blood. :)

Implications for Marine Technology and AR Development

Current System Capabilities

  • Real-time species identification matching marine biologist accuracy on distinctive species

  • Production-ready performance metrics for AR goggle integration

  • Biologically validated learning eliminating need for extensive retraining

  • Edge-device optimization for practical underwater deployment

Technical Limitations

  • Performance degradation on naturally camouflaged species (expected biological limitation)

  • Requires clear visual input (limited by underwater visibility conditions)

  • Fixed classification set (23 species, not expandable without retraining)

  • Dependency on quality training data for new species addition

What I Learned

Technical insights:

  • Model validation through biology: Performance patterns that mirror real marine biologist challenges prove the AI learned correct features

  • Production optimization: ReduceLROnPlateau + EarlyStopping more effective than extended training

  • Underwater adaptations: Aggressive regularization essential for complex underwater backgrounds

  • Real-time constraints: 120ms latency achievable with careful architecture optimization

  • User experience: Educational overlay transforms identification into learning opportunity

The bigger picture:

  • Domain expertise validation eliminates costly retraining cycles

  • AR performance requirements drive different optimization strategies than laboratory metrics

  • Real-world deployment success depends on biological relevance, not just accuracy numbers

Future Vision: AR Diving Goggles

Next steps toward the AR dream:

  • Improve model accuracy with additional training data, more sophisticated architectures, and advanced training techniques

  • Expand to more fish species, building on the current 23 species foundation that proved the approach works

  • Optimize real-time performance - smooth detection and fast classification are critical for AR system success

  • Implement cross-site species tracking across dive sites at different depths to improve classification accuracy through environmental context

Transform diving from passive observation to active learning, making marine life accessible to everyone underwater. 🌊
