Normalization
Database normalization reduces redundancy and ensures data integrity.
#### 1st Normal Form (1NF)
Rule: Each column contains atomic (indivisible) values, no repeating groups.
```sql
-- β Violates 1NF (multiple values in one column)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
product_ids VARCHAR(255) -- '101,102,103' (bad!)
);
-- β
Follows 1NF
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT,
product_id INT,
FOREIGN KEY (order_id) REFERENCES orders(id)
);
```
#### 2nd Normal Form (2NF)
Rule: Must be in 1NF + all non-key columns depend on the entire primary key.
```sql
-- β Violates 2NF (customer_name depends only on customer_id, not full key)
CREATE TABLE order_items (
order_id INT,
product_id INT,
customer_id INT,
customer_name VARCHAR(100), -- Depends on customer_id only
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- β
Follows 2NF (customer data in separate table)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);
```
#### 3rd Normal Form (3NF)
Rule: Must be in 2NF + no transitive dependencies (non-key columns depend only on primary key).
```sql
-- β Violates 3NF (country depends on postal_code, not on customer_id)
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
country VARCHAR(50) -- Depends on postal_code, not id
);
-- β
Follows 3NF
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
FOREIGN KEY (postal_code) REFERENCES postal_codes(code)
);
CREATE TABLE postal_codes (
code VARCHAR(10) PRIMARY KEY,
country VARCHAR(50)
);
```
#### Denormalization (When to Break Rules)
Sometimes denormalization improves performance for read-heavy applications.
```sql
-- Denormalized for performance (caching derived data)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
total_amount DECIMAL(10, 2), -- Calculated from order_items
item_count INT, -- Calculated from order_items
created_at TIMESTAMP
);
-- Trigger or application code keeps denormalized data in sync
```
When to denormalize:
- Read-heavy applications (reporting, analytics)
- Frequently joined tables causing performance issues
- Pre-calculated aggregates (counts, sums, averages)
- Caching derived data to avoid complex joins
---
Data Types
Choose appropriate data types for efficiency and accuracy.
#### String Types
```sql
-- Fixed-length (use for predictable lengths)
CHAR(10) -- ISO date: '2025-10-31'
CHAR(2) -- State code: 'CA'
-- Variable-length (use for variable lengths)
VARCHAR(255) -- Email, name, short text
TEXT -- Long text (articles, descriptions)
-- β
Good: Appropriate sizes
email VARCHAR(255)
phone_number VARCHAR(20)
postal_code VARCHAR(10)
-- β Bad: Wasteful or too small
email VARCHAR(500) -- Too large
description VARCHAR(50) -- Too small for long text
```
#### Numeric Types
```sql
-- Integer types
TINYINT -- -128 to 127 (age, status codes)
SMALLINT -- -32,768 to 32,767 (quantities)
INT -- -2.1B to 2.1B (IDs, counts)
BIGINT -- Large numbers (timestamps, large IDs)
-- Decimal types
DECIMAL(10, 2) -- Exact precision (money: $99,999,999.99)
FLOAT -- Approximate (scientific calculations)
DOUBLE -- Higher precision approximations
-- β
Use DECIMAL for money
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) -- Exact precision
);
-- β Don't use FLOAT for money
price FLOAT -- Rounding errors!
```
#### Date/Time Types
```sql
DATE -- Date only: 2025-10-31
TIME -- Time only: 14:30:00
DATETIME -- Date + time: 2025-10-31 14:30:00
TIMESTAMP -- Unix timestamp (auto-converts timezone)
-- β
Always store in UTC
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
```
#### Boolean
```sql
-- PostgreSQL
is_active BOOLEAN DEFAULT TRUE
-- MySQL
is_active TINYINT(1) DEFAULT 1
```
---
Indexing Strategies
Indexes speed up reads but slow down writes. Use strategically.
#### When to Create Indexes
```sql
-- β
Index foreign keys
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- β
Index frequently queried columns
CREATE INDEX idx_users_email ON users(email);
-- β
Index columns used in WHERE, ORDER BY, GROUP BY
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- β
Composite index for multi-column queries
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
```
#### Index Types
B-Tree Index (Default)
```sql
-- Best for equality and range queries
CREATE INDEX idx_products_price ON products(price);
-- Queries that benefit:
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price BETWEEN 50 AND 150;
```
Hash Index
```sql
-- Best for exact matches only (not ranges)
CREATE INDEX idx_users_email USING HASH ON users(email);
-- Queries that benefit:
SELECT * FROM users WHERE email = 'user@example.com';
```
Full-Text Index
```sql
-- Best for text search
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, content);
-- Queries that benefit:
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database design');
```
Partial Index (PostgreSQL)
```sql
-- Index only specific rows
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;
```
#### Composite Indexes (Column Order Matters)
```sql
-- β
Good: Index supports both queries
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- Query 1: Uses index efficiently
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- Query 2: Uses index (customer_id only)
SELECT * FROM orders WHERE customer_id = 123;
-- β Query 3: Doesn't use index (status is second column)
SELECT * FROM orders WHERE status = 'pending';
```
Rule of Thumb: Put most selective column first, or most frequently queried alone.
---
Constraints
Use constraints to enforce data integrity at the database level.
#### Primary Key
```sql
-- Auto-incrementing integer
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
-- UUID (better for distributed systems)
CREATE TABLE users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
email VARCHAR(255) UNIQUE NOT NULL
);
```
#### Foreign Key
```sql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE CASCADE -- Delete orders when customer deleted
ON UPDATE CASCADE -- Update orders when customer ID changes
);
-- Alternatives:
ON DELETE RESTRICT -- Prevent deletion if referenced
ON DELETE SET NULL -- Set to NULL when parent deleted
ON DELETE NO ACTION -- Same as RESTRICT
```
#### Unique Constraint
```sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
-- Composite unique constraint
CREATE TABLE enrollments (
student_id INT,
course_id INT,
UNIQUE (student_id, course_id) -- Prevent duplicate enrollments
);
```
#### Check Constraint
```sql
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) CHECK (price >= 0),
stock INT CHECK (stock >= 0),
discount_percent INT CHECK (discount_percent BETWEEN 0 AND 100)
);
```
#### Not Null Constraint
```sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(100) NOT NULL,
bio TEXT -- Nullable (optional)
);
```
---
Common Schema Patterns
#### One-to-Many (Orders β Order Items)
```sql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);
```
#### Many-to-Many (Students β Courses)
```sql
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
id INT PRIMARY KEY,
title VARCHAR(200) NOT NULL
);
-- Junction table (also called join table, linking table)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE
);
```
#### Self-Referencing (Employees β Manager)
```sql
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id)
);
```
#### Polymorphic Relationships (Comments on Posts/Photos)
```sql
-- Approach 1: Separate foreign keys with CHECK constraint
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
post_id INT,
photo_id INT,
CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (photo_id) REFERENCES photos(id) ON DELETE CASCADE
);
-- Approach 2: commentable_type + commentable_id (Rails-style)
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50) NOT NULL, -- 'Post' or 'Photo'
commentable_id INT NOT NULL
);
-- Note: No foreign key constraint possible (less data integrity)
```
---