🎯

sql-optimization-patterns

🎯Skill

from ovachiever/droid-tings

VibeIndex|
What it does

Systematically optimize SQL queries by analyzing execution plans, implementing strategic indexing, and identifying performance bottlenecks to dramatically improve database efficiency.

πŸ“¦

Part of

ovachiever/droid-tings(370 items)

sql-optimization-patterns

Installation

git cloneClone repository
git clone https://github.com/ovachiever/droid-tings.git
πŸ“– Extracted from docs: ovachiever/droid-tings
14Installs
-
AddedFeb 4, 2026

Skill Details

SKILL.md

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

Overview

# SQL Optimization Patterns

Transform slow database queries into lightning-fast operations through systematic optimization, proper indexing, and query plan analysis.

When to Use This Skill

  • Debugging slow-running queries
  • Designing performant database schemas
  • Optimizing application response times
  • Reducing database load and costs
  • Improving scalability for growing datasets
  • Analyzing EXPLAIN query plans
  • Implementing efficient indexes
  • Resolving N+1 query problems

Core Concepts

1. Query Execution Plans (EXPLAIN)

Understanding EXPLAIN output is fundamental to optimization.

PostgreSQL EXPLAIN:

```sql

-- Basic explain

EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- With actual execution stats

EXPLAIN ANALYZE

SELECT * FROM users WHERE email = 'user@example.com';

-- Verbose output with more details

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)

SELECT u.*, o.order_total

FROM users u

JOIN orders o ON u.id = o.user_id

WHERE u.created_at > NOW() - INTERVAL '30 days';

```

Key Metrics to Watch:

  • Seq Scan: Full table scan (usually slow for large tables)
  • Index Scan: Using index (good)
  • Index Only Scan: Using index without touching table (best)
  • Nested Loop: Join method (okay for small datasets)
  • Hash Join: Join method (good for larger datasets)
  • Merge Join: Join method (good for sorted data)
  • Cost: Estimated query cost (lower is better)
  • Rows: Estimated rows returned
  • Actual Time: Real execution time

2. Index Strategies

Indexes are the most powerful optimization tool.

Index Types:

  • B-Tree: Default, good for equality and range queries
  • Hash: Only for equality (=) comparisons
  • GIN: Full-text search, array queries, JSONB
  • GiST: Geometric data, full-text search
  • BRIN: Block Range INdex for very large tables with correlation

```sql

-- Standard B-Tree index

CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)

CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index (index subset of rows)

CREATE INDEX idx_active_users ON users(email)

WHERE status = 'active';

-- Expression index

CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Covering index (include additional columns)

CREATE INDEX idx_users_email_covering ON users(email)

INCLUDE (name, created_at);

-- Full-text search index

CREATE INDEX idx_posts_search ON posts

USING GIN(to_tsvector('english', title || ' ' || body));

-- JSONB index

CREATE INDEX idx_metadata ON events USING GIN(metadata);

```

3. Query Optimization Patterns

*Avoid SELECT \:**

```sql

-- Bad: Fetches unnecessary columns

SELECT * FROM users WHERE id = 123;

-- Good: Fetch only what you need

SELECT id, email, name FROM users WHERE id = 123;

```

Use WHERE Clause Efficiently:

```sql

-- Bad: Function prevents index usage

SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Good: Create functional index or use exact match

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Then:

SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Or store normalized data

SELECT * FROM users WHERE email = 'user@example.com';

```

Optimize JOINs:

```sql

-- Bad: Cartesian product then filter

SELECT u.name, o.total

FROM users u, orders o

WHERE u.id = o.user_id AND u.created_at > '2024-01-01';

-- Good: Filter before join

SELECT u.name, o.total

FROM users u

JOIN orders o ON u.id = o.user_id

WHERE u.created_at > '2024-01-01';

-- Better: Filter both tables

SELECT u.name, o.total

FROM (SELECT * FROM users WHERE created_at > '2024-01-01') u

JOIN orders o ON u.id = o.user_id;

```

Optimization Patterns

Pattern 1: Eliminate N+1 Queries

Problem: N+1 Query Anti-Pattern

```python

# Bad: Executes N+1 queries

users = db.query("SELECT * FROM users LIMIT 10")

for user in users:

orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)

# Process orders

```

Solution: Use JOINs or Batch Loading

```sql

-- Solution 1: JOIN

SELECT

u.id, u.name,

o.id as order_id, o.total

FROM users u

LEFT JOIN orders o ON u.id = o.user_id

WHERE u.id IN (1, 2, 3, 4, 5);

-- Solution 2: Batch query

SELECT * FROM orders

WHERE user_id IN (1, 2, 3, 4, 5);

```

```python

# Good: Single query with JOIN or batch load

# Using JOIN

results = db.query("""

SELECT u.id, u.name, o.id as order_id, o.total

FROM users u

LEFT JOIN orders o ON u.id = o.user_id

WHERE u.id IN (1, 2, 3, 4, 5)

""")

# Or batch load

users = db.query("SELECT * FROM users LIMIT 10")

user_ids = [u.id for u in users]

orders = db.query(

"SELECT * FROM orders WHERE user_id IN (?)",

user_ids

)

# Group orders by user_id

orders_by_user = {}

for order in orders:

orders_by_user.setdefault(order.user_id, []).append(order)

```

Pattern 2: Optimize Pagination

Bad: OFFSET on Large Tables

```sql

-- Slow for large offsets

SELECT * FROM users

ORDER BY created_at DESC

LIMIT 20 OFFSET 100000; -- Very slow!

```

Good: Cursor-Based Pagination

```sql

-- Much faster: Use cursor (last seen ID)

SELECT * FROM users

WHERE created_at < '2024-01-15 10:30:00' -- Last cursor

ORDER BY created_at DESC

LIMIT 20;

-- With composite sorting

SELECT * FROM users

WHERE (created_at, id) < ('2024-01-15 10:30:00', 12345)

ORDER BY created_at DESC, id DESC

LIMIT 20;

-- Requires index

CREATE INDEX idx_users_cursor ON users(created_at DESC, id DESC);

```

Pattern 3: Aggregate Efficiently

Optimize COUNT Queries:

```sql

-- Bad: Counts all rows

SELECT COUNT(*) FROM orders; -- Slow on large tables

-- Good: Use estimates for approximate counts

SELECT reltuples::bigint AS estimate

FROM pg_class

WHERE relname = 'orders';

-- Good: Filter before counting

SELECT COUNT(*) FROM orders

WHERE created_at > NOW() - INTERVAL '7 days';

-- Better: Use index-only scan

CREATE INDEX idx_orders_created ON orders(created_at);

SELECT COUNT(*) FROM orders

WHERE created_at > NOW() - INTERVAL '7 days';

```

Optimize GROUP BY:

```sql

-- Bad: Group by then filter

SELECT user_id, COUNT(*) as order_count

FROM orders

GROUP BY user_id

HAVING COUNT(*) > 10;

-- Better: Filter first, then group (if possible)

SELECT user_id, COUNT(*) as order_count

FROM orders

WHERE status = 'completed'

GROUP BY user_id

HAVING COUNT(*) > 10;

-- Best: Use covering index

CREATE INDEX idx_orders_user_status ON orders(user_id, status);

```

Pattern 4: Subquery Optimization

Transform Correlated Subqueries:

```sql

-- Bad: Correlated subquery (runs for each row)

SELECT u.name, u.email,

(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count

FROM users u;

-- Good: JOIN with aggregation

SELECT u.name, u.email, COUNT(o.id) as order_count

FROM users u

LEFT JOIN orders o ON o.user_id = u.id

GROUP BY u.id, u.name, u.email;

-- Better: Use window functions

SELECT DISTINCT ON (u.id)

u.name, u.email,

COUNT(o.id) OVER (PARTITION BY u.id) as order_count

FROM users u

LEFT JOIN orders o ON o.user_id = u.id;

```

Use CTEs for Clarity:

```sql

-- Using Common Table Expressions

WITH recent_users AS (

SELECT id, name, email

FROM users

WHERE created_at > NOW() - INTERVAL '30 days'

),

user_order_counts AS (

SELECT user_id, COUNT(*) as order_count

FROM orders

WHERE created_at > NOW() - INTERVAL '30 days'

GROUP BY user_id

)

SELECT ru.name, ru.email, COALESCE(uoc.order_count, 0) as orders

FROM recent_users ru

LEFT JOIN user_order_counts uoc ON ru.id = uoc.user_id;

```

Pattern 5: Batch Operations

Batch INSERT:

```sql

-- Bad: Multiple individual inserts

INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');

INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');

INSERT INTO users (name, email) VALUES ('Carol', 'carol@example.com');

-- Good: Batch insert

INSERT INTO users (name, email) VALUES

('Alice', 'alice@example.com'),

('Bob', 'bob@example.com'),

('Carol', 'carol@example.com');

-- Better: Use COPY for bulk inserts (PostgreSQL)

COPY users (name, email) FROM '/tmp/users.csv' CSV HEADER;

```

Batch UPDATE:

```sql

-- Bad: Update in loop

UPDATE users SET status = 'active' WHERE id = 1;

UPDATE users SET status = 'active' WHERE id = 2;

-- ... repeat for many IDs

-- Good: Single UPDATE with IN clause

UPDATE users

SET status = 'active'

WHERE id IN (1, 2, 3, 4, 5, ...);

-- Better: Use temporary table for large batches

CREATE TEMP TABLE temp_user_updates (id INT, new_status VARCHAR);

INSERT INTO temp_user_updates VALUES (1, 'active'), (2, 'active'), ...;

UPDATE users u

SET status = t.new_status

FROM temp_user_updates t

WHERE u.id = t.id;

```

Advanced Techniques

Materialized Views

Pre-compute expensive queries.

```sql

-- Create materialized view

CREATE MATERIALIZED VIEW user_order_summary AS

SELECT

u.id,

u.name,

COUNT(o.id) as total_orders,

SUM(o.total) as total_spent,

MAX(o.created_at) as last_order_date

FROM users u

LEFT JOIN orders o ON u.id = o.user_id

GROUP BY u.id, u.name;

-- Add index to materialized view

CREATE INDEX idx_user_summary_spent ON user_order_summary(total_spent DESC);

-- Refresh materialized view

REFRESH MATERIALIZED VIEW user_order_summary;

-- Concurrent refresh (PostgreSQL)

REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_summary;

-- Query materialized view (very fast)

SELECT * FROM user_order_summary

WHERE total_spent > 1000

ORDER BY total_spent DESC;

```

Partitioning

Split large tables for better performance.

```sql

-- Range partitioning by date (PostgreSQL)

CREATE TABLE orders (

id SERIAL,

user_id INT,

total DECIMAL,

created_at TIMESTAMP

) PARTITION BY RANGE (created_at);

-- Create partitions

CREATE TABLE orders_2024_q1 PARTITION OF orders

FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders

FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Queries automatically use appropriate partition

SELECT * FROM orders

WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';

-- Only scans orders_2024_q1 partition

```

Query Hints and Optimization

```sql

-- Force index usage (MySQL)

SELECT * FROM users

USE INDEX (idx_users_email)

WHERE email = 'user@example.com';

-- Parallel query (PostgreSQL)

SET max_parallel_workers_per_gather = 4;

SELECT * FROM large_table WHERE condition;

-- Join hints (PostgreSQL)

SET enable_nestloop = OFF; -- Force hash or merge join

```

Best Practices

  1. Index Selectively: Too many indexes slow down writes
  2. Monitor Query Performance: Use slow query logs
  3. Keep Statistics Updated: Run ANALYZE regularly
  4. Use Appropriate Data Types: Smaller types = better performance
  5. Normalize Thoughtfully: Balance normalization vs performance
  6. Cache Frequently Accessed Data: Use application-level caching
  7. Connection Pooling: Reuse database connections
  8. Regular Maintenance: VACUUM, ANALYZE, rebuild indexes

```sql

-- Update statistics

ANALYZE users;

ANALYZE VERBOSE orders;

-- Vacuum (PostgreSQL)

VACUUM ANALYZE users;

VACUUM FULL users; -- Reclaim space (locks table)

-- Reindex

REINDEX INDEX idx_users_email;

REINDEX TABLE users;

```

Common Pitfalls

  • Over-Indexing: Each index slows down INSERT/UPDATE/DELETE
  • Unused Indexes: Waste space and slow writes
  • Missing Indexes: Slow queries, full table scans
  • Implicit Type Conversion: Prevents index usage
  • OR Conditions: Can't use indexes efficiently
  • LIKE with Leading Wildcard: LIKE '%abc' can't use index
  • Function in WHERE: Prevents index usage unless functional index exists

Monitoring Queries

```sql

-- Find slow queries (PostgreSQL)

SELECT query, calls, total_time, mean_time

FROM pg_stat_statements

ORDER BY mean_time DESC

LIMIT 10;

-- Find missing indexes (PostgreSQL)

SELECT

schemaname,

tablename,

seq_scan,

seq_tup_read,

idx_scan,

seq_tup_read / seq_scan AS avg_seq_tup_read

FROM pg_stat_user_tables

WHERE seq_scan > 0

ORDER BY seq_tup_read DESC

LIMIT 10;

-- Find unused indexes (PostgreSQL)

SELECT

schemaname,

tablename,

indexname,

idx_scan,

idx_tup_read,

idx_tup_fetch

FROM pg_stat_user_indexes

WHERE idx_scan = 0

ORDER BY pg_relation_size(indexrelid) DESC;

```

Resources

  • references/postgres-optimization-guide.md: PostgreSQL-specific optimization
  • references/mysql-optimization-guide.md: MySQL/MariaDB optimization
  • references/query-plan-analysis.md: Deep dive into EXPLAIN plans
  • assets/index-strategy-checklist.md: When and how to create indexes
  • assets/query-optimization-checklist.md: Step-by-step optimization guide
  • scripts/analyze-slow-queries.sql: Identify slow queries in your database
  • scripts/index-recommendations.sql: Generate index recommendations