window-function-builder

Skill from dengineproblem/agents-monorepo
What it does

Builds advanced SQL window functions for analytical queries, enabling complex ranking, aggregation, and statistical calculations across data partitions.

Part of dengineproblem/agents-monorepo (106 items)

Installation

Run with Docker
docker compose up -d --build agent-brain

Run with Docker
docker compose up -d --build agent-service

Clone repository
git clone <repo-url>

Run with Docker
docker compose up -d --build

Run shell script
./test-video-upload.sh ./path/to/video.mp4

+ 3 more commands

📖 Extracted from docs: dengineproblem/agents-monorepo
1 install · Added Feb 4, 2026

Skill Details

SKILL.md

Expert in SQL window functions. Use for analytical queries, OVER clauses, and ranking functions.

Overview

# SQL Window Function Builder

An expert guide to SQL window functions and analytical queries for business analytics.

Core Function Categories

Function Overview

```yaml
window_functions:
  ranking:
    - ROW_NUMBER(): "Unique sequential numbers"
    - RANK(): "Ranking with gaps for ties"
    - DENSE_RANK(): "Ranking without gaps"
    - NTILE(n): "Divide into n buckets"
  aggregate:
    - SUM(): "Running/cumulative totals"
    - AVG(): "Moving averages"
    - COUNT(): "Running counts"
    - MIN(): "Running minimum"
    - MAX(): "Running maximum"
  offset:
    - LAG(col, n): "Access previous row"
    - LEAD(col, n): "Access next row"
    - FIRST_VALUE(): "First value in frame"
    - LAST_VALUE(): "Last value in frame"
    - NTH_VALUE(col, n): "Nth value in frame"
  statistical:
    - PERCENT_RANK(): "Relative rank as percentage"
    - CUME_DIST(): "Cumulative distribution"
    - PERCENTILE_CONT(): "Interpolated percentile"
    - PERCENTILE_DISC(): "Discrete percentile"
```

---

Basic Syntax

OVER Clause Structure

```sql
function_name(expression) OVER (
    [PARTITION BY partition_expression, ...]
    [ORDER BY sort_expression [ASC|DESC], ...]
    [frame_clause]
)
```

Frame Specifications

```sql
-- Frame clause syntax
ROWS | RANGE BETWEEN frame_start AND frame_end

-- Frame bounds
UNBOUNDED PRECEDING  -- From first row of partition
n PRECEDING          -- n rows before current
CURRENT ROW          -- Current row
n FOLLOWING          -- n rows after current
UNBOUNDED FOLLOWING  -- To last row of partition

-- Common frame patterns
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW           -- Running total
-- (note: with ORDER BY and no frame clause, the default is
--  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, not ROWS)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW                   -- 7-day rolling
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING                   -- 3-row centered
RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW  -- 30-day range
```
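
The ROWS vs RANGE distinction matters when the ORDER BY key has ties: RANGE treats all peer rows as a single unit. A minimal sketch of the difference (standard SQL; the inline table `t(v)` is just illustrative data):

```sql
-- Running sum over values 10, 20, 20, 30
SELECT v,
       SUM(v) OVER (ORDER BY v ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
       SUM(v) OVER (ORDER BY v RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM (VALUES (10), (20), (20), (30)) AS t(v);
-- rows_sum:  10, 30, 50, 80   (advances row by row)
-- range_sum: 10, 50, 50, 80   (the two tied 20s are peers and share one value)
```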

---

Ranking Functions

ROW_NUMBER, RANK, DENSE_RANK

```sql
-- Basic ranking comparison
SELECT
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

-- Results with tied salaries:
-- | employee_id | salary | row_num | rank | dense_rank |
-- |-------------|--------|---------|------|------------|
-- | 101         | 100000 | 1       | 1    | 1          |
-- | 102         | 100000 | 2       | 1    | 1          | <- same salary
-- | 103         | 90000  | 3       | 3    | 2          | <- note rank skips 2
```

Ranking Within Groups

```sql
-- Top 3 salaries per department
WITH ranked AS (
    SELECT
        employee_id,
        employee_name,
        department,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS dept_rank
    FROM employees
)
SELECT *
FROM ranked
WHERE dept_rank <= 3;
```

NTILE for Percentiles

```sql
-- Divide customers into quartiles by revenue
SELECT
    customer_id,
    customer_name,
    total_revenue,
    NTILE(4) OVER (ORDER BY total_revenue DESC) AS revenue_quartile,
    CASE NTILE(4) OVER (ORDER BY total_revenue DESC)
        WHEN 1 THEN 'Top 25%'
        WHEN 2 THEN '25-50%'
        WHEN 3 THEN '50-75%'
        WHEN 4 THEN 'Bottom 25%'
    END AS segment
FROM customer_revenue;
```

---

Running Calculations

Running Totals

```sql
-- Running total of sales
SELECT
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders;

-- Running total per customer
SELECT
    customer_id,
    order_date,
    order_amount,
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS customer_running_total
FROM orders;
```

Moving Averages

```sql
-- 7-day moving average
SELECT
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d,
    -- Also track the count for partial windows
    COUNT(*) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS days_in_window
FROM daily_sales;

-- Centered moving average (3 days before, current, 3 days after)
SELECT
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ) AS centered_avg_7d
FROM daily_sales;
```

Cumulative Percentage

```sql
-- Cumulative percentage of total
SELECT
    product_category,
    revenue,
    SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
    ROUND(
        100.0 * SUM(revenue) OVER (ORDER BY revenue DESC) /
        SUM(revenue) OVER (),
        2
    ) AS cumulative_pct
FROM category_sales
ORDER BY revenue DESC;
```

---

LAG and LEAD

Period-over-Period Comparison

```sql
-- Month-over-month growth
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
    ROUND(
        100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month)) /
        NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0),
        2
    ) AS mom_growth_pct
FROM monthly_revenue;

-- Year-over-year comparison
SELECT
    date,
    revenue,
    LAG(revenue, 365) OVER (ORDER BY date) AS yoy_revenue,
    revenue - LAG(revenue, 365) OVER (ORDER BY date) AS yoy_change
FROM daily_revenue;
```
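
Note that `LAG(revenue, 365)` counts rows, not days, so it drifts if any dates are missing (and around leap years). A calendar-safe alternative is a self-join on the shifted date; a minimal sketch in PostgreSQL-style SQL, reusing the `daily_revenue` table from above:

```sql
-- Year-over-year by explicit date arithmetic rather than row offset
SELECT
    cur.date,
    cur.revenue,
    prev.revenue AS yoy_revenue,
    cur.revenue - prev.revenue AS yoy_change
FROM daily_revenue AS cur
LEFT JOIN daily_revenue AS prev
    ON prev.date = cur.date - INTERVAL '1 year';
```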

Gap Analysis

```sql
-- Find gaps between orders
SELECT
    customer_id,
    order_date,
    LAG(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS previous_order_date,
    order_date - LAG(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS days_since_last_order
FROM orders;

-- Identify churned customers (no order > 90 days)
WITH order_gaps AS (
    SELECT
        customer_id,
        order_date,
        LEAD(order_date) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS next_order_date,
        LEAD(order_date) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) - order_date AS gap_days
    FROM orders
)
SELECT DISTINCT customer_id
FROM order_gaps
WHERE gap_days > 90
   OR (next_order_date IS NULL AND order_date < CURRENT_DATE - INTERVAL '90 days');
```

Lead for Future Values

```sql
-- Forecast vs actual comparison
SELECT
    forecast_date,
    predicted_value,
    LEAD(actual_value, 7) OVER (ORDER BY forecast_date) AS actual_7d_later,
    LEAD(actual_value, 7) OVER (ORDER BY forecast_date) - predicted_value AS forecast_error
FROM forecasts;
```

---

FIRST_VALUE and LAST_VALUE

First/Last in Group

```sql
-- First and last order per customer
SELECT
    customer_id,
    order_id,
    order_date,
    order_amount,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_order_date,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order_date
FROM orders;

-- Note: LAST_VALUE requires an explicit frame to see all rows!
```
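
The warning above is worth demonstrating: with the implicit default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), LAST_VALUE ends at the current row. A minimal sketch of the pitfall on the same `orders` table:

```sql
-- With the default frame, the window "ends" at the current row
SELECT
    customer_id,
    order_date,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        -- implicit: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS last_order_date  -- equals order_date on every row, not the real last order
FROM orders;
```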

Baseline Comparison

```sql
-- Compare each day to first day of month
SELECT
    date,
    revenue,
    FIRST_VALUE(revenue) OVER (
        PARTITION BY DATE_TRUNC('month', date)
        ORDER BY date
    ) AS first_day_revenue,
    revenue - FIRST_VALUE(revenue) OVER (
        PARTITION BY DATE_TRUNC('month', date)
        ORDER BY date
    ) AS diff_from_first_day
FROM daily_revenue;
```

---

Statistical Functions

Percentile Calculations

```sql
-- Calculate percentiles
SELECT
    product_id,
    price,
    PERCENT_RANK() OVER (ORDER BY price) AS percent_rank,
    CUME_DIST() OVER (ORDER BY price) AS cumulative_dist,
    NTILE(100) OVER (ORDER BY price) AS percentile
FROM products;

-- Median calculation (50th percentile)
SELECT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary_discrete,
    AVG(salary) AS mean_salary
FROM employees;

-- Multiple percentiles at once
SELECT
    department,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS p25,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS p75,
    PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY salary) AS p90
FROM employees
GROUP BY department;
```
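
The two relative-rank functions are easy to confuse: PERCENT_RANK() computes (rank - 1) / (rows - 1), while CUME_DIST() is the fraction of rows with a value less than or equal to the current one. A small worked check (standard SQL with an illustrative inline table):

```sql
SELECT v,
       PERCENT_RANK() OVER (ORDER BY v) AS pr,  -- (rank - 1) / (n - 1)
       CUME_DIST()    OVER (ORDER BY v) AS cd   -- rows <= current / n
FROM (VALUES (10), (20), (20), (30)) AS t(v);
-- v=10: pr = 0.0000, cd = 0.25
-- v=20: pr = 0.3333, cd = 0.75  (both tied rows get the same values)
-- v=30: pr = 1.0000, cd = 1.00
```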

---

Business Analytics Patterns

Customer Lifecycle Analysis

```sql
-- Customer order sequence and lifecycle metrics
WITH customer_orders AS (
    SELECT
        customer_id,
        order_id,
        order_date,
        order_amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS order_number,
        FIRST_VALUE(order_date) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS first_order_date,
        SUM(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS lifetime_value,
        LAG(order_date) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ) AS previous_order_date
    FROM orders
)
SELECT
    customer_id,
    order_id,
    order_number,
    order_amount,
    lifetime_value,
    order_date - first_order_date AS days_since_first_order,
    order_date - previous_order_date AS days_since_last_order,
    CASE
        WHEN order_number = 1 THEN 'New'
        WHEN order_date - previous_order_date > 90 THEN 'Reactivated'
        ELSE 'Repeat'
    END AS customer_status
FROM customer_orders;
```

Cohort Analysis

```sql
-- Monthly cohort retention
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', first_activity_date) AS cohort_month,
        DATE_TRUNC('month', activity_date) AS activity_month
    FROM user_activity
),
cohort_sizes AS (
    SELECT
        cohort_month,
        COUNT(DISTINCT user_id) AS cohort_size
    FROM user_cohorts
    GROUP BY cohort_month
),
monthly_activity AS (
    SELECT
        cohort_month,
        activity_month,
        COUNT(DISTINCT user_id) AS active_users,
        -- Include years so cohorts older than 12 months don't wrap back to 0
        (EXTRACT(YEAR FROM AGE(activity_month, cohort_month)) * 12
         + EXTRACT(MONTH FROM AGE(activity_month, cohort_month))) AS month_number
    FROM user_cohorts
    GROUP BY cohort_month, activity_month
)
SELECT
    ma.cohort_month,
    ma.month_number,
    cs.cohort_size,
    ma.active_users,
    ROUND(100.0 * ma.active_users / cs.cohort_size, 2) AS retention_rate
FROM monthly_activity ma
JOIN cohort_sizes cs ON ma.cohort_month = cs.cohort_month
ORDER BY ma.cohort_month, ma.month_number;
```

Sales Performance

```sql
-- Sales rep performance with rankings and targets
WITH sales_performance AS (
    SELECT
        sales_rep_id,
        rep_name,
        region,
        SUM(deal_amount) AS total_sales,
        COUNT(*) AS deal_count,
        AVG(deal_amount) AS avg_deal_size
    FROM sales_deals
    WHERE close_date >= DATE_TRUNC('quarter', CURRENT_DATE)
    GROUP BY sales_rep_id, rep_name, region
)
SELECT
    sales_rep_id,
    rep_name,
    region,
    total_sales,
    deal_count,
    avg_deal_size,
    RANK() OVER (ORDER BY total_sales DESC) AS overall_rank,
    RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS region_rank,
    total_sales - AVG(total_sales) OVER () AS vs_company_avg,
    total_sales - AVG(total_sales) OVER (PARTITION BY region) AS vs_region_avg,
    PERCENT_RANK() OVER (ORDER BY total_sales) AS percentile
FROM sales_performance;
```

Inventory Analysis

```sql
-- Inventory movement analysis
SELECT
    product_id,
    transaction_date,
    transaction_type,
    quantity,
    SUM(
        CASE WHEN transaction_type = 'IN' THEN quantity
             WHEN transaction_type = 'OUT' THEN -quantity
             ELSE 0
        END
    ) OVER (
        PARTITION BY product_id
        ORDER BY transaction_date, transaction_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_inventory,
    -- Caution: this frame covers the last 30 transactions, not 30 calendar days
    AVG(
        CASE WHEN transaction_type = 'OUT' THEN quantity END
    ) OVER (
        PARTITION BY product_id
        ORDER BY transaction_date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS avg_daily_usage_30d
FROM inventory_transactions;
```
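
For a true 30-day usage average when a product can have several transactions per day, aggregate to daily grain first and then apply the rolling frame. A sketch under that assumption, reusing `inventory_transactions` (days with no usage still produce no row, so a calendar table would make it exact):

```sql
-- Aggregate to one row per product per day, then roll over days
WITH daily_usage AS (
    SELECT
        product_id,
        transaction_date,
        SUM(quantity) AS daily_out
    FROM inventory_transactions
    WHERE transaction_type = 'OUT'
    GROUP BY product_id, transaction_date
)
SELECT
    product_id,
    transaction_date,
    AVG(daily_out) OVER (
        PARTITION BY product_id
        ORDER BY transaction_date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS avg_daily_usage_30d
FROM daily_usage;
```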

---

Platform-Specific Features

PostgreSQL

```sql
-- FILTER clause with window functions
SELECT
    date,
    category,
    amount,
    SUM(amount) OVER (ORDER BY date) AS total_running,
    SUM(amount) FILTER (WHERE category = 'A') OVER (ORDER BY date) AS category_a_running
FROM transactions;

-- GROUPS frame type
SELECT
    date,
    amount,
    SUM(amount) OVER (
        ORDER BY date
        GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS sum_adjacent_groups
FROM transactions;
```

SQL Server

```sql
-- STRING_AGG (SQL Server 2017+) does not accept an OVER clause;
-- aggregate per group and join back as a workaround:
SELECT
    e.department,
    e.employee_name,
    d.employees
FROM employees AS e
JOIN (
    SELECT
        department,
        STRING_AGG(employee_name, ', ') WITHIN GROUP (ORDER BY employee_name) AS employees
    FROM employees
    GROUP BY department
) AS d ON d.department = e.department;

-- Frames with offset bounds (supported since SQL Server 2012)
SELECT
    date,
    value,
    AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    ) AS previous_week_avg
FROM daily_metrics;
```

BigQuery

```sql
-- QUALIFY clause (filter on window function results)
SELECT
    customer_id,
    order_date,
    order_amount
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_date DESC
) = 1;

-- Equivalent to the CTE approach:
-- WITH ranked AS (
--     SELECT *, ROW_NUMBER() OVER (...) AS rn FROM orders
-- )
-- SELECT * FROM ranked WHERE rn = 1;
```

Snowflake

```sql
-- QUALIFY clause
SELECT *
FROM sales
QUALIFY DENSE_RANK() OVER (
    PARTITION BY region
    ORDER BY revenue DESC
) <= 5;

-- CONDITIONAL_TRUE_EVENT for sessionization
SELECT
    user_id,
    event_time,
    CONDITIONAL_TRUE_EVENT(
        DATEDIFF('minute', LAG(event_time) OVER (
            PARTITION BY user_id ORDER BY event_time
        ), event_time) > 30
    ) OVER (
        PARTITION BY user_id
        ORDER BY event_time
    ) AS session_id
FROM user_events;
```
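
CONDITIONAL_TRUE_EVENT is Snowflake-specific. On other engines the same sessionization is usually built from a session-start flag plus a running SUM, in two steps because window functions cannot be nested directly. A portable sketch (PostgreSQL-style interval arithmetic; session IDs start at 1 rather than 0):

```sql
-- Portable sessionization: flag session starts, then running-sum the flags
WITH flagged AS (
    SELECT
        user_id,
        event_time,
        CASE
            WHEN LAG(event_time) OVER (
                     PARTITION BY user_id ORDER BY event_time
                 ) IS NULL
                 OR event_time - LAG(event_time) OVER (
                     PARTITION BY user_id ORDER BY event_time
                 ) > INTERVAL '30 minutes'
            THEN 1 ELSE 0
        END AS is_new_session
    FROM user_events
)
SELECT
    user_id,
    event_time,
    SUM(is_new_session) OVER (
        PARTITION BY user_id
        ORDER BY event_time
    ) AS session_id
FROM flagged;
```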

---

Performance Optimization

Indexing Strategy

```sql
-- Composite index for window functions:
-- index the PARTITION BY columns first, then the ORDER BY columns
CREATE INDEX idx_orders_customer_date
    ON orders (customer_id, order_date);

-- For running totals across all data
CREATE INDEX idx_orders_date_amount
    ON orders (order_date, order_amount);
```

Query Optimization Tips

```yaml
optimization_tips:
  - tip: "Add indexes on PARTITION BY + ORDER BY columns"
    reason: "Reduces sort operations"
  - tip: "Use CTEs to compute a window once, reference it multiple times"
    reason: "Avoids redundant calculations"
  - tip: "Limit rows before applying window functions when possible"
    reason: "Smaller dataset = faster windows"
  - tip: "Be explicit about the frame clause"
    reason: "Prevents unexpected defaults"
  - tip: "Use QUALIFY when supported"
    reason: "Cleaner and often faster than a subquery"
```
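
The "limit rows first" tip works because a predicate written after a window function cannot be pushed below it; filtering in a subquery or CTE lets the sort and partition touch only the rows you need. A minimal sketch (the one-year cutoff is an arbitrary example):

```sql
-- Filter first, then window: the window only scans recent rows
WITH recent_orders AS (
    SELECT customer_id, order_date, order_amount
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '1 year'
)
SELECT
    customer_id,
    order_date,
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total_1y
FROM recent_orders;
```

Note the trade-off: filtering first means the window sees only the filtered history, which is why the CTE pattern below filters at the end when the running total must span all data.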

CTE Pattern for Clarity

```sql
-- Use CTEs for complex window queries
WITH
-- Step 1: Calculate raw metrics
daily_metrics AS (
    SELECT
        date,
        SUM(amount) AS daily_total
    FROM transactions
    GROUP BY date
),
-- Step 2: Add window calculations
with_windows AS (
    SELECT
        date,
        daily_total,
        SUM(daily_total) OVER (ORDER BY date) AS running_total,
        AVG(daily_total) OVER (
            ORDER BY date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7d_avg,
        LAG(daily_total, 7) OVER (ORDER BY date) AS same_day_last_week
    FROM daily_metrics
)
-- Step 3: Final calculations
SELECT
    date,
    daily_total,
    running_total,
    rolling_7d_avg,
    daily_total - same_day_last_week AS wow_change
FROM with_windows
WHERE date >= CURRENT_DATE - INTERVAL '30 days';
```

---

Best Practices

  1. Specify the frame clause explicitly: the default behavior can be surprising (see the sketch below)
  2. Use CTEs for complex queries: they improve readability and maintainability
  3. Create composite indexes on the PARTITION BY + ORDER BY columns
  4. Check NULL handling: LAG/LEAD return NULL at partition boundaries
  5. Test edge cases: the first/last row, a single row in a partition
  6. Use QUALIFY where supported: cleaner than a subquery
  7. Mind DBMS differences: syntax and available functions vary
  8. Optimize WHERE first: filter before applying window functions
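
A closing sketch for points 1 and 4 on the `orders` table used throughout: the running sum gets an explicit ROWS frame (the implicit default is RANGE, which merges tied ORDER BY values into one running value), and COALESCE gives LAG a defined value on the first row of each partition:

```sql
SELECT
    customer_id,
    order_date,
    -- Point 1: explicit ROWS frame instead of the implicit RANGE default
    SUM(order_amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- Point 4: LAG returns NULL on the first row of a partition; default it to 0
    COALESCE(
        LAG(order_amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
        ),
        0
    ) AS prev_order_amount
FROM orders;
```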