photo-content-recognition-curation-expert

Skill from erichowens/some_claude_skills

Part of erichowens/some_claude_skills (148 items)

Installation

Add marketplace to Claude Code:
/plugin marketplace add erichowens/some_claude_skills

Install plugin from marketplace:
/plugin install adhd-design-expert@some-claude-skills

Install plugin from marketplace:
/plugin install some-claude-skills@some-claude-skills

Clone repository:
git clone https://github.com/erichowens/some_claude_skills.git

Claude Desktop configuration (add to claude_desktop_config.json):
{ "mcpServers": { "prompt-learning": { "command": "npx", "args...

Skill Details

SKILL.md

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

Overview

# Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

When to Use This Skill

✅ Use for:

  • Face recognition and clustering (identifying important people)
  • Animal/pet detection and clustering
  • Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
  • Burst photo selection (finding best frame from 10-50 shots)
  • Screenshot vs photo classification
  • Meme/download filtering
  • NSFW content detection
  • Quick indexing for large photo libraries (10K+)
  • Aesthetic quality scoring (NIMA)

❌ NOT for:

  • GPS-based location clustering → event-detection-temporal-intelligence-expert
  • Color palette extraction → color-theory-palette-harmony-expert
  • Semantic image-text matching → clip-aware-embeddings
  • Video analysis or frame extraction

Quick Decision Tree

```
What do you need to recognize/filter?
│
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│  ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│  ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│  ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│  └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────────────────── Face Clustering
│  ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│  └─ Unknown data distribution? ─────────────────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│  ├─ Detection? ─────────────────────────────────── YOLOv8
│  └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────────────────── Burst Selection
│  └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────────────────── Content Detection
   ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
   └─ NSFW? ──────────────────────────────────────── Safety classifier
```

---

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.

Solution: Perceptual hashes generate similar values for visually similar images.

Method Comparison:

| Method   | Speed   | Robustness | Best For                    |
|----------|---------|------------|-----------------------------|
| dHash    | Fastest | Low        | Exact duplicates            |
| pHash    | Fast    | Medium     | Brightness/contrast changes |
| DINOHash | Slower  | High       | Heavy crops, compression    |
| Hybrid   | Medium  | Very High  | Production systems          |

Hybrid Pipeline (2025 Best Practice):

  1. Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
  2. Stage 2: DINOHash refinement (accurate detection)
  3. Stage 3: Optional Siamese ViT verification

Hamming Distance Thresholds:

  • Conservative: ≤5 bits different = duplicates
  • Aggressive: ≤10 bits different = duplicates
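
For illustration, here is a minimal dHash plus Hamming-distance sketch using only Pillow and NumPy from the dependency list below; the file names are placeholders, and a production pipeline would layer the pHash → DINOHash stages described above on top of this kind of check.

```python
import numpy as np
from PIL import Image

def dhash(path, hash_size=8):
    """Difference hash: each bit says whether a pixel is brighter than its left neighbor."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = np.asarray(img, dtype=np.int16)
    return (pixels[:, 1:] > pixels[:, :-1]).flatten()  # 64 booleans for hash_size=8

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))

# Conservative threshold: <=5 differing bits out of 64 counts as a near-duplicate
h1, h2 = dhash("a.jpg"), dhash("b.jpg")
print("near-duplicates" if hamming(h1, h2) <= 5 else "different")
```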

→ Deep dive: references/perceptual-hashing.md

---

2. Face Recognition & Clustering

Goal: Group photos by person without user labeling.

Apple Photos Strategy (2021-2025):

  1. Extract face + upper body embeddings (FaceNet, 512-dim)
  2. Two-pass agglomerative clustering
  3. Conservative first pass (threshold=0.4, high precision)
  4. HAC second pass (threshold=0.6, increase recall)
  5. Incremental updates for new photos

HDBSCAN Alternative:

  • No threshold tuning required
  • Robust to noise
  • Better for unknown data distributions

Parameters:

| Setting          | Agglomerative | HDBSCAN  |
|------------------|---------------|----------|
| Pass 1 threshold | 0.4 (cosine)  | -        |
| Pass 2 threshold | 0.6 (cosine)  | -        |
| Min cluster size | -             | 3 photos |
| Metric           | cosine        | cosine   |
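
A minimal sketch of the HDBSCAN route with the table's settings (min cluster size 3, cosine metric); the embeddings file is a placeholder, and because the hdbscan package has no native cosine metric, this precomputes a cosine distance matrix and passes metric='precomputed'.

```python
import numpy as np
import hdbscan
from sklearn.metrics.pairwise import cosine_distances

# Placeholder input: an (n_faces, 512) array of FaceNet embeddings saved earlier
embeddings = np.load("face_embeddings.npy")

# hdbscan has no built-in cosine metric, so precompute the cosine distance matrix
distances = cosine_distances(embeddings).astype(np.float64)

clusterer = hdbscan.HDBSCAN(min_cluster_size=3, metric="precomputed")
labels = clusterer.fit_predict(distances)  # -1 marks noise (faces left unclustered)

people = {label: np.flatnonzero(labels == label) for label in set(labels) if label != -1}
print(f"{len(people)} people, {int(np.sum(labels == -1))} unclustered faces")
```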

→ Deep dive: references/face-clustering.md

---

3. Burst Photo Selection

Problem: Burst mode creates 10-50 nearly identical photos.

Multi-Criteria Scoring:

| Criterion    | Weight | Measurement                        |
|--------------|--------|------------------------------------|
| Sharpness    | 30%    | Laplacian variance                 |
| Face Quality | 35%    | Eyes open, smiling, face sharpness |
| Aesthetics   | 20%    | NIMA score                         |
| Position     | 10%    | Middle frames bonus                |
| Exposure     | 5%     | Histogram clipping check           |

Burst Detection: Photos within 0.5 seconds of each other.
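
The sharpness term is the easiest criterion to show in code. Below is a sketch of Laplacian-variance scoring with OpenCV; the file names are placeholders and only the 30% sharpness criterion is implemented here.

```python
import cv2

def sharpness(path):
    """Variance of the Laplacian: higher values mean a sharper frame."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# Placeholder burst group (frames captured within 0.5 s of each other);
# a full scorer would also weight face quality, aesthetics, position, and exposure.
burst = ["burst_01.jpg", "burst_02.jpg", "burst_03.jpg"]
best = max(burst, key=sharpness)
print(f"Sharpest frame: {best}")
```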

→ Deep dive: references/content-detection.md

---

4. Screenshot Detection

Multi-Signal Approach:

| Signal             | Confidence | Description                   |
|--------------------|------------|-------------------------------|
| UI elements        | 0.85       | Status bars, buttons detected |
| Perfect rectangles | 0.75       | >5 UI buttons (90° angles)    |
| High text          | 0.70       | >25% text coverage (OCR)      |
| No camera EXIF     | 0.60       | Missing Make/Model/Lens       |
| Device aspect      | 0.60       | Exact phone screen ratio      |
| Perfect sharpness  | 0.50       | >2000 Laplacian variance      |

Decision: Confidence >0.6 = screenshot
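
A partial sketch of two of the weaker signals (missing camera EXIF and exact device aspect ratio) using Pillow; the file name and the phone-ratio set are placeholders, and the full classifier would sum all six weighted signals from the table.

```python
from PIL import Image
from PIL.ExifTags import TAGS

# Illustrative set of common phone screen ratios; not exhaustive
PHONE_RATIOS = (16 / 9, 19.5 / 9, 20 / 9)

def screenshot_score(path):
    """Combine two of the weaker signals: missing camera EXIF and exact device aspect ratio."""
    img = Image.open(path)
    exif = {TAGS.get(tag, tag): value for tag, value in img.getexif().items()}
    no_camera_exif = not ({"Make", "Model"} & exif.keys())
    width, height = img.size
    ratio = max(width, height) / min(width, height)
    device_aspect = any(abs(ratio - r) < 0.01 for r in PHONE_RATIOS)
    return 0.60 * no_camera_exif + 0.60 * device_aspect

print(screenshot_score("example.png") > 0.6)  # decision threshold from above
```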

→ Deep dive: references/content-detection.md

---

5. Quick Indexing Pipeline

Goal: Index 10K+ photos efficiently with caching.

Features Extracted:

  • Perceptual hashes (de-duplication)
  • Face embeddings (people clustering)
  • CLIP embeddings (semantic search)
  • Color palettes
  • Aesthetic scores

Performance (10K photos, M1 MacBook Pro):

| Operation          | Time        |
|--------------------|-------------|
| Perceptual hashing | 2 min       |
| CLIP embeddings    | 3 min (GPU) |
| Face detection     | 4 min       |
| Color palettes     | 1 min       |
| Aesthetic scoring  | 2 min (GPU) |
| Clustering + dedup | 1 min       |
| Total (first run)  | ~13 min     |
| Incremental        | <1 min      |
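
The sub-minute incremental runs come from caching already-extracted features. A minimal sketch of that idea, keyed on file path and modification time; the cache file name and the extract_features callable are placeholders, not the skill's actual pipeline API.

```python
import json
import os
from pathlib import Path

CACHE_FILE = "photo_index_cache.json"  # placeholder cache location

def index_incrementally(photo_dir, extract_features):
    """Re-extract features only for photos that are new or modified since the last run."""
    cache = json.loads(Path(CACHE_FILE).read_text()) if os.path.exists(CACHE_FILE) else {}
    for path in Path(photo_dir).rglob("*.jpg"):
        key, mtime = str(path), path.stat().st_mtime
        if cache.get(key, {}).get("mtime") == mtime:
            continue  # already indexed and unchanged
        cache[key] = {"mtime": mtime, "features": extract_features(key)}
    Path(CACHE_FILE).write_text(json.dumps(cache))
    return cache
```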

→ Deep dive: references/photo-indexing.md

---

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

What it looks like:

```python
distance = np.linalg.norm(embedding1 - embedding2)  # WRONG
```

Why it's wrong: Face embeddings are normalized; cosine similarity is the correct metric.

What to do instead:

```python
from scipy.spatial.distance import cosine

distance = cosine(embedding1, embedding2)  # Correct
```

Anti-Pattern: Fixed Clustering Thresholds

What it looks like: Using same distance threshold for all face clusters.

Why it's wrong: Different people have varying intra-class variance (twins vs. diverse ages).

What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

Anti-Pattern: Raw Pixel Comparison for Duplicates

What it looks like:

```python
is_duplicate = np.allclose(img1, img2)  # WRONG
```

Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.

What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.

Anti-Pattern: Sequential Face Detection

What it looks like: Processing faces one photo at a time without batching.

Why it's wrong: GPU underutilization, 10x slower than batched.

What to do instead: Batch process images (batch_size=32) with GPU acceleration.
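
A sketch of batched detection with facenet-pytorch's MTCNN, assuming all frames are resized to a common resolution so they can be processed as a batch; the paths, working resolution, and batch size of 32 are illustrative.

```python
import torch
from PIL import Image
from facenet_pytorch import MTCNN

device = "cuda" if torch.cuda.is_available() else "cpu"
mtcnn = MTCNN(keep_all=True, device=device)

# Placeholder paths; batched MTCNN expects equal-sized images, so frames are
# resized to a common working resolution before detection.
paths = [f"photo_{i:04d}.jpg" for i in range(64)]
batch_size = 32

for start in range(0, len(paths), batch_size):
    chunk = paths[start:start + batch_size]
    frames = [Image.open(p).convert("RGB").resize((1024, 768)) for p in chunk]
    boxes, probs = mtcnn.detect(frames)  # one boxes/probs array per frame, None if no face
    for path, face_boxes in zip(chunk, boxes):
        count = 0 if face_boxes is None else len(face_boxes)
        print(f"{path}: {count} faces")
```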

Anti-Pattern: No Confidence Filtering

What it looks like:

```python
for face in all_detected_faces:
    cluster(face)  # No filtering
```

Why it's wrong: Low-confidence detections create noise clusters (hands, objects).

What to do instead: Filter by confidence (threshold 0.9 for faces).
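
A small helper sketch, assuming the (boxes, probs) output shape of facenet-pytorch's MTCNN.detect() used in the batching example above.

```python
def filter_confident(boxes, probs, threshold=0.9):
    """Keep only detections above the confidence threshold before clustering.

    MTCNN.detect() returns (None, None) for images with no faces."""
    if boxes is None:
        return []
    return [box for box, prob in zip(boxes, probs) if prob >= threshold]
```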

Anti-Pattern: Forcing Every Photo into Clusters

What it looks like: Assigning noise points to nearest cluster.

Why it's wrong: Solo appearances shouldn't pollute person clusters.

What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.

---

Quick Start

```python
from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)
```

---

Python Dependencies

```
torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract
```

---

Integration Points

  • event-detection-temporal-intelligence-expert: Provides temporal event clustering for event-aware curation
  • color-theory-palette-harmony-expert: Extracts color palettes for visual diversity
  • collage-layout-expert: Receives curated photos for assembly
  • clip-aware-embeddings: Provides CLIP embeddings for semantic search and DeepDBSCAN

---

References

  1. DINOHash (2025): "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
  2. Apple Photos (2021): "Recognizing People in Photos Through Private On-Device ML"
  3. HDBSCAN: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
  4. Perceptual Hashing: dHash (Neal Krawetz), DCT-based pHash

---

Version: 2.0.0

Last Updated: November 2025