photo-content-recognition-curation-expert

Skill from erichowens/some_claude_skills

Part of erichowens/some_claude_skills (148 items)

Installation

Add marketplace to Claude Code:
/plugin marketplace add erichowens/some_claude_skills

Install plugin from marketplace:
/plugin install adhd-design-expert@some-claude-skills

Install plugin from marketplace:
/plugin install some-claude-skills@some-claude-skills

Clone repository:
git clone https://github.com/erichowens/some_claude_skills.git

Claude Desktop configuration (add to claude_desktop_config.json):
{ "mcpServers": { "prompt-learning": { "command": "npx", "args...

Skill Details

SKILL.md

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

Overview

# Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

When to Use This Skill

✅ Use for:

  • Face recognition and clustering (identifying important people)
  • Animal/pet detection and clustering
  • Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
  • Burst photo selection (finding best frame from 10-50 shots)
  • Screenshot vs photo classification
  • Meme/download filtering
  • NSFW content detection
  • Quick indexing for large photo libraries (10K+)
  • Aesthetic quality scoring (NIMA)

❌ NOT for:

  • GPS-based location clustering → event-detection-temporal-intelligence-expert
  • Color palette extraction → color-theory-palette-harmony-expert
  • Semantic image-text matching → clip-aware-embeddings
  • Video analysis or frame extraction

Quick Decision Tree

```
What do you need to recognize/filter?
│
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│  ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│  ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│  ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│  └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────────────────── Face Clustering
│  ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│  └─ Unknown data distribution? ─────────────────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│  ├─ Detection? ─────────────────────────────────── YOLOv8
│  └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────────────────── Burst Selection
│  └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────────────────── Content Detection
   ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
   └─ NSFW? ──────────────────────────────────────── Safety classifier
```

---

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.

Solution: Perceptual hashes generate similar values for visually similar images.

Method Comparison:

| Method   | Speed   | Robustness | Best For                    |
|----------|---------|------------|-----------------------------|
| dHash    | Fastest | Low        | Exact duplicates            |
| pHash    | Fast    | Medium     | Brightness/contrast changes |
| DINOHash | Slower  | High       | Heavy crops, compression    |
| Hybrid   | Medium  | Very High  | Production systems          |

Hybrid Pipeline (2025 Best Practice):

  1. Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
  2. Stage 2: DINOHash refinement (accurate detection)
  3. Stage 3: Optional Siamese ViT verification

Hamming Distance Thresholds:

  • Conservative: ≤5 bits different = duplicates
  • Aggressive: ≤10 bits different = duplicates
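
For illustration, here is a minimal dHash plus Hamming-distance sketch using only Pillow and NumPy from the dependency list below; the file names are placeholders, and a production pipeline would layer the pHash → DINOHash stages described above on top of this kind of check.

```python
import numpy as np
from PIL import Image

def dhash(path, hash_size=8):
    """Difference hash: each bit says whether a pixel is brighter than its left neighbor."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = np.asarray(img, dtype=np.int16)
    return (pixels[:, 1:] > pixels[:, :-1]).flatten()  # 64 booleans for hash_size=8

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))

# Conservative threshold: <=5 differing bits out of 64 counts as a near-duplicate
h1, h2 = dhash("a.jpg"), dhash("b.jpg")
print("near-duplicates" if hamming(h1, h2) <= 5 else "different")
```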

→ Deep dive: references/perceptual-hashing.md

---

2. Face Recognition & Clustering

Goal: Group photos by person without user labeling.

Apple Photos Strategy (2021-2025):

  1. Extract face + upper body embeddings (FaceNet, 512-dim)
  2. Two-pass agglomerative clustering
  3. Conservative first pass (threshold=0.4, high precision)
  4. HAC second pass (threshold=0.6, increase recall)
  5. Incremental updates for new photos

HDBSCAN Alternative:

  • No threshold tuning required
  • Robust to noise
  • Better for unknown data distributions

Parameters:

| Setting          | Agglomerative | HDBSCAN  |
|------------------|---------------|----------|
| Pass 1 threshold | 0.4 (cosine)  | -        |
| Pass 2 threshold | 0.6 (cosine)  | -        |
| Min cluster size | -             | 3 photos |
| Metric           | cosine        | cosine   |
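
A minimal sketch of the HDBSCAN route with the table's settings (min cluster size 3, cosine metric); the embeddings file is a placeholder, and because the hdbscan package has no native cosine metric, this precomputes a cosine distance matrix and passes metric='precomputed'.

```python
import numpy as np
import hdbscan
from sklearn.metrics.pairwise import cosine_distances

# Placeholder input: an (n_faces, 512) array of FaceNet embeddings saved earlier
embeddings = np.load("face_embeddings.npy")

# hdbscan has no built-in cosine metric, so precompute the cosine distance matrix
distances = cosine_distances(embeddings).astype(np.float64)

clusterer = hdbscan.HDBSCAN(min_cluster_size=3, metric="precomputed")
labels = clusterer.fit_predict(distances)  # -1 marks noise (faces left unclustered)

people = {label: np.flatnonzero(labels == label) for label in set(labels) if label != -1}
print(f"{len(people)} people, {int(np.sum(labels == -1))} unclustered faces")
```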

→ Deep dive: references/face-clustering.md

---

3. Burst Photo Selection

Problem: Burst mode creates 10-50 nearly identical photos.

Multi-Criteria Scoring:

| Criterion    | Weight | Measurement                        |
|--------------|--------|------------------------------------|
| Sharpness    | 30%    | Laplacian variance                 |
| Face Quality | 35%    | Eyes open, smiling, face sharpness |
| Aesthetics   | 20%    | NIMA score                         |
| Position     | 10%    | Middle frames bonus                |
| Exposure     | 5%     | Histogram clipping check           |

Burst Detection: Photos within 0.5 seconds of each other.
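
The sharpness term is the easiest criterion to show in code. Below is a sketch of Laplacian-variance scoring with OpenCV; the file names are placeholders and only the 30% sharpness criterion is implemented here.

```python
import cv2

def sharpness(path):
    """Variance of the Laplacian: higher values mean a sharper frame."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# Placeholder burst group (frames captured within 0.5 s of each other);
# a full scorer would also weight face quality, aesthetics, position, and exposure.
burst = ["burst_01.jpg", "burst_02.jpg", "burst_03.jpg"]
best = max(burst, key=sharpness)
print(f"Sharpest frame: {best}")
```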

→ Deep dive: references/content-detection.md

---

4. Screenshot Detection

Multi-Signal Approach:

| Signal             | Confidence | Description                   |
|--------------------|------------|-------------------------------|
| UI elements        | 0.85       | Status bars, buttons detected |
| Perfect rectangles | 0.75       | >5 UI buttons (90° angles)    |
| High text          | 0.70       | >25% text coverage (OCR)      |
| No camera EXIF     | 0.60       | Missing Make/Model/Lens       |
| Device aspect      | 0.60       | Exact phone screen ratio      |
| Perfect sharpness  | 0.50       | >2000 Laplacian variance      |

Decision: Confidence >0.6 = screenshot
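
A partial sketch of two of the weaker signals (missing camera EXIF and exact device aspect ratio) using Pillow; the file name and the phone-ratio set are placeholders, and the full classifier would sum all six weighted signals from the table.

```python
from PIL import Image
from PIL.ExifTags import TAGS

# Illustrative set of common phone screen ratios; not exhaustive
PHONE_RATIOS = (16 / 9, 19.5 / 9, 20 / 9)

def screenshot_score(path):
    """Combine two of the weaker signals: missing camera EXIF and exact device aspect ratio."""
    img = Image.open(path)
    exif = {TAGS.get(tag, tag): value for tag, value in img.getexif().items()}
    no_camera_exif = not ({"Make", "Model"} & exif.keys())
    width, height = img.size
    ratio = max(width, height) / min(width, height)
    device_aspect = any(abs(ratio - r) < 0.01 for r in PHONE_RATIOS)
    return 0.60 * no_camera_exif + 0.60 * device_aspect

print(screenshot_score("example.png") > 0.6)  # decision threshold from above
```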

→ Deep dive: references/content-detection.md

---

5. Quick Indexing Pipeline

Goal: Index 10K+ photos efficiently with caching.

Features Extracted:

  • Perceptual hashes (de-duplication)
  • Face embeddings (people clustering)
  • CLIP embeddings (semantic search)
  • Color palettes
  • Aesthetic scores

Performance (10K photos, M1 MacBook Pro):

| Operation          | Time        |
|--------------------|-------------|
| Perceptual hashing | 2 min       |
| CLIP embeddings    | 3 min (GPU) |
| Face detection     | 4 min       |
| Color palettes     | 1 min       |
| Aesthetic scoring  | 2 min (GPU) |
| Clustering + dedup | 1 min       |
| Total (first run)  | ~13 min     |
| Incremental        | <1 min      |
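
The sub-minute incremental runs come from caching already-extracted features. A minimal sketch of that idea, keyed on file path and modification time; the cache file name and the extract_features callable are placeholders, not the skill's actual pipeline API.

```python
import json
import os
from pathlib import Path

CACHE_FILE = "photo_index_cache.json"  # placeholder cache location

def index_incrementally(photo_dir, extract_features):
    """Re-extract features only for photos that are new or modified since the last run."""
    cache = json.loads(Path(CACHE_FILE).read_text()) if os.path.exists(CACHE_FILE) else {}
    for path in Path(photo_dir).rglob("*.jpg"):
        key, mtime = str(path), path.stat().st_mtime
        if cache.get(key, {}).get("mtime") == mtime:
            continue  # already indexed and unchanged
        cache[key] = {"mtime": mtime, "features": extract_features(key)}
    Path(CACHE_FILE).write_text(json.dumps(cache))
    return cache
```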

→ Deep dive: references/photo-indexing.md

---

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

What it looks like:

```python
distance = np.linalg.norm(embedding1 - embedding2)  # WRONG
```

Why it's wrong: Face embeddings are normalized; cosine similarity is the correct metric.

What to do instead:

```python
from scipy.spatial.distance import cosine

distance = cosine(embedding1, embedding2)  # Correct
```

Anti-Pattern: Fixed Clustering Thresholds

What it looks like: Using same distance threshold for all face clusters.

Why it's wrong: Different people have varying intra-class variance (twins vs. diverse ages).

What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

Anti-Pattern: Raw Pixel Comparison for Duplicates

What it looks like:

```python
is_duplicate = np.allclose(img1, img2)  # WRONG
```

Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.

What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.

Anti-Pattern: Sequential Face Detection

What it looks like: Processing faces one photo at a time without batching.

Why it's wrong: GPU underutilization, 10x slower than batched.

What to do instead: Batch process images (batch_size=32) with GPU acceleration.
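
A sketch of batched detection with facenet-pytorch's MTCNN, assuming all frames are resized to a common resolution so they can be processed as a batch; the paths, working resolution, and batch size of 32 are illustrative.

```python
import torch
from PIL import Image
from facenet_pytorch import MTCNN

device = "cuda" if torch.cuda.is_available() else "cpu"
mtcnn = MTCNN(keep_all=True, device=device)

# Placeholder paths; batched MTCNN expects equal-sized images, so frames are
# resized to a common working resolution before detection.
paths = [f"photo_{i:04d}.jpg" for i in range(64)]
batch_size = 32

for start in range(0, len(paths), batch_size):
    chunk = paths[start:start + batch_size]
    frames = [Image.open(p).convert("RGB").resize((1024, 768)) for p in chunk]
    boxes, probs = mtcnn.detect(frames)  # one boxes/probs array per frame, None if no face
    for path, face_boxes in zip(chunk, boxes):
        count = 0 if face_boxes is None else len(face_boxes)
        print(f"{path}: {count} faces")
```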

Anti-Pattern: No Confidence Filtering

What it looks like:

```python
for face in all_detected_faces:
    cluster(face)  # No filtering
```

Why it's wrong: Low-confidence detections create noise clusters (hands, objects).

What to do instead: Filter by confidence (threshold 0.9 for faces).
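
A small helper sketch, assuming the (boxes, probs) output shape of facenet-pytorch's MTCNN.detect() used in the batching example above.

```python
def filter_confident(boxes, probs, threshold=0.9):
    """Keep only detections above the confidence threshold before clustering.

    MTCNN.detect() returns (None, None) for images with no faces."""
    if boxes is None:
        return []
    return [box for box, prob in zip(boxes, probs) if prob >= threshold]
```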

Anti-Pattern: Forcing Every Photo into Clusters

What it looks like: Assigning noise points to nearest cluster.

Why it's wrong: Solo appearances shouldn't pollute person clusters.

What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.

---

Quick Start

```python
from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)
```

---

Python Dependencies

```
torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract
```

---

Integration Points

  • event-detection-temporal-intelligence-expert: Provides temporal event clustering for event-aware curation
  • color-theory-palette-harmony-expert: Extracts color palettes for visual diversity
  • collage-layout-expert: Receives curated photos for assembly
  • clip-aware-embeddings: Provides CLIP embeddings for semantic search and DeepDBSCAN

---

References

  1. DINOHash (2025): "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
  2. Apple Photos (2021): "Recognizing People in Photos Through Private On-Device ML"
  3. HDBSCAN: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
  4. Perceptual Hashing: dHash (Neal Krawetz), DCT-based pHash

---

Version: 2.0.0

Last Updated: November 2025