window-function-builder
🎯Skillfrom dengineproblem/agents-monorepo
Builds advanced SQL window functions for analytical queries, enabling complex ranking, aggregation, and statistical calculations across data partitions.
Part of
dengineproblem/agents-monorepo(106 items)
Installation
docker compose up -d --build agent-braindocker compose up -d --build agent-servicegit clone <repo-url>docker compose up -d --build./test-video-upload.sh ./path/to/video.mp4+ 3 more commands
Skill Details
Эксперт SQL window functions. Используй для аналитических запросов, OVER clauses и ranking functions.
Overview
# SQL Window Function Builder
Эксперт по SQL window functions и аналитическим запросам для бизнес-аналитики.
Core Function Categories
Function Overview
```yaml
window_functions:
ranking:
- ROW_NUMBER(): "Unique sequential numbers"
- RANK(): "Ranking with gaps for ties"
- DENSE_RANK(): "Ranking without gaps"
- NTILE(n): "Divide into n buckets"
aggregate:
- SUM(): "Running/cumulative totals"
- AVG(): "Moving averages"
- COUNT(): "Running counts"
- MIN(): "Running minimum"
- MAX(): "Running maximum"
offset:
- LAG(col, n): "Access previous row"
- LEAD(col, n): "Access next row"
- FIRST_VALUE(): "First value in frame"
- LAST_VALUE(): "Last value in frame"
- NTH_VALUE(col, n): "Nth value in frame"
statistical:
- PERCENT_RANK(): "Relative rank as percentage"
- CUME_DIST(): "Cumulative distribution"
- PERCENTILE_CONT(): "Interpolated percentile"
- PERCENTILE_DISC(): "Discrete percentile"
```
---
Basic Syntax
OVER Clause Structure
```sql
function_name(expression) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC|DESC], ...]
[frame_clause]
)
```
Frame Specifications
```sql
-- Frame clause syntax
ROWS | RANGE BETWEEN frame_start AND frame_end
-- Frame bounds
UNBOUNDED PRECEDING -- From first row of partition
n PRECEDING -- n rows before current
CURRENT ROW -- Current row
n FOLLOWING -- n rows after current
UNBOUNDED FOLLOWING -- To last row of partition
-- Common frame patterns
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Running total (default for ORDER BY)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day rolling
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- 3-row centered
RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW -- 30-day range
```
---
Ranking Functions
ROW_NUMBER, RANK, DENSE_RANK
```sql
-- Basic ranking comparison
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- Results with tied salaries:
-- | employee_id | salary | row_num | rank | dense_rank |
-- |-------------|--------|---------|------|------------|
-- | 101 | 100000 | 1 | 1 | 1 |
-- | 102 | 100000 | 2 | 1 | 1 | <- same salary
-- | 103 | 90000 | 3 | 3 | 2 | <- note rank skips 2
```
Ranking Within Groups
```sql
-- Top 3 salaries per department
WITH ranked AS (
SELECT
employee_id,
employee_name,
department,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees
)
SELECT *
FROM ranked
WHERE dept_rank <= 3;
```
NTILE for Percentiles
```sql
-- Divide customers into quartiles by revenue
SELECT
customer_id,
customer_name,
total_revenue,
NTILE(4) OVER (ORDER BY total_revenue DESC) AS revenue_quartile,
CASE NTILE(4) OVER (ORDER BY total_revenue DESC)
WHEN 1 THEN 'Top 25%'
WHEN 2 THEN '25-50%'
WHEN 3 THEN '50-75%'
WHEN 4 THEN 'Bottom 25%'
END AS segment
FROM customer_revenue;
```
---
Running Calculations
Running Totals
```sql
-- Running total of sales
SELECT
order_date,
order_amount,
SUM(order_amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders;
-- Running total per customer
SELECT
customer_id,
order_date,
order_amount,
SUM(order_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS customer_running_total
FROM orders;
```
Moving Averages
```sql
-- 7-day moving average
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d,
-- Also track the count for partial windows
COUNT(*) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS days_in_window
FROM daily_sales;
-- Centered moving average (3 days before, current, 3 days after)
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) AS centered_avg_7d
FROM daily_sales;
```
Cumulative Percentage
```sql
-- Cumulative percentage of total
SELECT
product_category,
revenue,
SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
ROUND(
100.0 * SUM(revenue) OVER (ORDER BY revenue DESC) /
SUM(revenue) OVER (),
2
) AS cumulative_pct
FROM category_sales
ORDER BY revenue DESC;
```
---
LAG and LEAD
Period-over-Period Comparison
```sql
-- Month-over-month growth
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month)) /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0),
2
) AS mom_growth_pct
FROM monthly_revenue;
-- Year-over-year comparison
SELECT
date,
revenue,
LAG(revenue, 365) OVER (ORDER BY date) AS yoy_revenue,
revenue - LAG(revenue, 365) OVER (ORDER BY date) AS yoy_change
FROM daily_revenue;
```
Gap Analysis
```sql
-- Find gaps between orders
SELECT
customer_id,
order_date,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS previous_order_date,
order_date - LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS days_since_last_order
FROM orders;
-- Identify churned customers (no order > 90 days)
WITH order_gaps AS (
SELECT
customer_id,
order_date,
LEAD(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS next_order_date,
LEAD(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) - order_date AS gap_days
FROM orders
)
SELECT DISTINCT customer_id
FROM order_gaps
WHERE gap_days > 90
OR (next_order_date IS NULL AND order_date < CURRENT_DATE - INTERVAL '90 days');
```
Lead for Future Values
```sql
-- Forecast vs actual comparison
SELECT
forecast_date,
predicted_value,
LEAD(actual_value, 7) OVER (ORDER BY forecast_date) AS actual_7d_later,
LEAD(actual_value, 7) OVER (ORDER BY forecast_date) - predicted_value AS forecast_error
FROM forecasts;
```
---
FIRST_VALUE and LAST_VALUE
First/Last in Group
```sql
-- First and last order per customer
SELECT
customer_id,
order_id,
order_date,
order_amount,
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_date,
LAST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_order_date
FROM orders;
-- Note: LAST_VALUE requires explicit frame to see all rows!
```
Baseline Comparison
```sql
-- Compare each day to first day of month
SELECT
date,
revenue,
FIRST_VALUE(revenue) OVER (
PARTITION BY DATE_TRUNC('month', date)
ORDER BY date
) AS first_day_revenue,
revenue - FIRST_VALUE(revenue) OVER (
PARTITION BY DATE_TRUNC('month', date)
ORDER BY date
) AS diff_from_first_day
FROM daily_revenue;
```
---
Statistical Functions
Percentile Calculations
```sql
-- Calculate percentiles
SELECT
product_id,
price,
PERCENT_RANK() OVER (ORDER BY price) AS percent_rank,
CUME_DIST() OVER (ORDER BY price) AS cumulative_dist,
NTILE(100) OVER (ORDER BY price) AS percentile
FROM products;
-- Median calculation (50th percentile)
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary_discrete,
AVG(salary) AS mean_salary
FROM employees;
-- Multiple percentiles at once
SELECT
department,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS p25,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS p75,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY salary) AS p90
FROM employees
GROUP BY department;
```
---
Business Analytics Patterns
Customer Lifecycle Analysis
```sql
-- Customer order sequence and lifecycle metrics
WITH customer_orders AS (
SELECT
customer_id,
order_id,
order_date,
order_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS order_number,
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS first_order_date,
SUM(order_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS lifetime_value,
LAG(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS previous_order_date
FROM orders
)
SELECT
customer_id,
order_id,
order_number,
order_amount,
lifetime_value,
order_date - first_order_date AS days_since_first_order,
order_date - previous_order_date AS days_since_last_order,
CASE
WHEN order_number = 1 THEN 'New'
WHEN order_date - previous_order_date > 90 THEN 'Reactivated'
ELSE 'Repeat'
END AS customer_status
FROM customer_orders;
```
Cohort Analysis
```sql
-- Monthly cohort retention
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_activity_date) AS cohort_month,
DATE_TRUNC('month', activity_date) AS activity_month
FROM user_activity
),
cohort_sizes AS (
SELECT
cohort_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM user_cohorts
GROUP BY cohort_month
),
monthly_activity AS (
SELECT
cohort_month,
activity_month,
COUNT(DISTINCT user_id) AS active_users,
EXTRACT(MONTH FROM AGE(activity_month, cohort_month)) AS month_number
FROM user_cohorts
GROUP BY cohort_month, activity_month
)
SELECT
ma.cohort_month,
ma.month_number,
cs.cohort_size,
ma.active_users,
ROUND(100.0 * ma.active_users / cs.cohort_size, 2) AS retention_rate
FROM monthly_activity ma
JOIN cohort_sizes cs ON ma.cohort_month = cs.cohort_month
ORDER BY ma.cohort_month, ma.month_number;
```
Sales Performance
```sql
-- Sales rep performance with rankings and targets
WITH sales_performance AS (
SELECT
sales_rep_id,
rep_name,
region,
SUM(deal_amount) AS total_sales,
COUNT(*) AS deal_count,
AVG(deal_amount) AS avg_deal_size
FROM sales_deals
WHERE close_date >= DATE_TRUNC('quarter', CURRENT_DATE)
GROUP BY sales_rep_id, rep_name, region
)
SELECT
sales_rep_id,
rep_name,
region,
total_sales,
deal_count,
avg_deal_size,
RANK() OVER (ORDER BY total_sales DESC) AS overall_rank,
RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS region_rank,
total_sales - AVG(total_sales) OVER () AS vs_company_avg,
total_sales - AVG(total_sales) OVER (PARTITION BY region) AS vs_region_avg,
PERCENT_RANK() OVER (ORDER BY total_sales) AS percentile
FROM sales_performance;
```
Inventory Analysis
```sql
-- Inventory movement analysis
SELECT
product_id,
transaction_date,
transaction_type,
quantity,
SUM(
CASE WHEN transaction_type = 'IN' THEN quantity
WHEN transaction_type = 'OUT' THEN -quantity
ELSE 0
END
) OVER (
PARTITION BY product_id
ORDER BY transaction_date, transaction_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_inventory,
AVG(
CASE WHEN transaction_type = 'OUT' THEN quantity END
) OVER (
PARTITION BY product_id
ORDER BY transaction_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS avg_daily_usage_30d
FROM inventory_transactions;
```
---
Platform-Specific Features
PostgreSQL
```sql
-- FILTER clause with window functions
SELECT
date,
category,
amount,
SUM(amount) OVER (ORDER BY date) AS total_running,
SUM(amount) FILTER (WHERE category = 'A') OVER (ORDER BY date) AS category_a_running
FROM transactions;
-- GROUPS frame type
SELECT
date,
amount,
SUM(amount) OVER (
ORDER BY date
GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS sum_adjacent_groups
FROM transactions;
```
SQL Server
```sql
-- STRING_AGG with OVER (SQL Server 2017+)
-- Not directly supported, use workaround:
SELECT DISTINCT
department,
STRING_AGG(employee_name, ', ') WITHIN GROUP (ORDER BY employee_name)
OVER (PARTITION BY department) AS employees
FROM employees;
-- OFFSET with frames (SQL Server 2022+)
SELECT
date,
value,
AVG(value) OVER (
ORDER BY date
ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
) AS previous_week_avg
FROM daily_metrics;
```
BigQuery
```sql
-- QUALIFY clause (filter on window function results)
SELECT
customer_id,
order_date,
order_amount
FROM orders
QUALIFY ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) = 1;
-- Equivalent to CTE approach:
-- WITH ranked AS (
-- SELECT *, ROW_NUMBER() OVER (...) AS rn FROM orders
-- )
-- SELECT * FROM ranked WHERE rn = 1;
```
Snowflake
```sql
-- QUALIFY clause
SELECT *
FROM sales
QUALIFY DENSE_RANK() OVER (
PARTITION BY region
ORDER BY revenue DESC
) <= 5;
-- CONDITIONAL_TRUE_EVENT for sessionization
SELECT
user_id,
event_time,
CONDITIONAL_TRUE_EVENT(
DATEDIFF('minute', LAG(event_time) OVER (
PARTITION BY user_id ORDER BY event_time
), event_time) > 30
) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS session_id
FROM user_events;
```
---
Performance Optimization
Indexing Strategy
```sql
-- Composite index for window function
-- Index columns: PARTITION BY columns first, then ORDER BY columns
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date);
-- For running totals across all data
CREATE INDEX idx_orders_date_amount
ON orders (order_date, order_amount);
```
Query Optimization Tips
```yaml
optimization_tips:
- tip: "Add indexes on PARTITION BY + ORDER BY columns"
reason: "Reduces sort operations"
- tip: "Use CTEs to compute window once, reference multiple times"
reason: "Avoids redundant calculations"
- tip: "Limit rows before applying window functions when possible"
reason: "Smaller dataset = faster windows"
- tip: "Be explicit about frame clause"
reason: "Prevents unexpected defaults"
- tip: "Use QUALIFY when supported"
reason: "Cleaner and often faster than subquery"
```
CTE Pattern for Clarity
```sql
-- Use CTEs for complex window queries
WITH
-- Step 1: Calculate raw metrics
daily_metrics AS (
SELECT
date,
SUM(amount) AS daily_total
FROM transactions
GROUP BY date
),
-- Step 2: Add window calculations
with_windows AS (
SELECT
date,
daily_total,
SUM(daily_total) OVER (ORDER BY date) AS running_total,
AVG(daily_total) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_avg,
LAG(daily_total, 7) OVER (ORDER BY date) AS same_day_last_week
FROM daily_metrics
)
-- Step 3: Final calculations
SELECT
date,
daily_total,
running_total,
rolling_7d_avg,
daily_total - same_day_last_week AS wow_change
FROM with_windows
WHERE date >= CURRENT_DATE - INTERVAL '30 days';
```
---
Лучшие практики
- Явно указывай frame clause — дефолтное поведение может быть неожиданным
- Используй CTE для сложных запросов — улучшает читаемость и maintainability
- Создавай composite индексы — на PARTITION BY + ORDER BY колонки
- Проверяй NULL handling — LAG/LEAD возвращают NULL для граничных строк
- Тестируй edge cases — первая/последняя строка, одна строка в партиции
- Используй QUALIFY где поддерживается — чище чем subquery
- Учитывай особенности СУБД — синтаксис и функции различаются
- Оптимизируй сначала WHERE — фильтруй до применения window functions
More from this repository10
Provides expert B2B SaaS marketing strategies for demand generation, growth marketing, lead optimization, and key performance metrics.
Generates comprehensive iOS unit tests using XCTest, following best practices like AAA pattern, mocking, and dependency injection for robust test coverage.
Generates and runs performance load tests using k6 to simulate user traffic and measure system response under various load conditions.
Generates high-converting storytelling, offers, posts, and video texts by extracting key business context and applying targeted copywriting techniques.
Generates personalized, high-converting influencer outreach templates with strategic personalization and compelling communication approaches.
openapi-documentation skill from dengineproblem/agents-monorepo
code-documentation-generator skill from dengineproblem/agents-monorepo
ads-agent skill from dengineproblem/agents-monorepo
api-reference-guide skill from dengineproblem/agents-monorepo
feature-documentation skill from dengineproblem/agents-monorepo