
# Polars

Lightning-fast DataFrame library written in Rust for high-performance data manipulation and analysis. Use when the user wants blazing-fast data transformations, works with large datasets, needs lazy evaluation pipelines, or wants better performance than pandas. Ideal for ETL, data wrangling, aggregations, joins, and reading/writing CSV, Parquet, and JSON files.

## Overview

Polars is a blazingly fast DataFrame library written in Rust with Python bindings. Built for performance and memory efficiency, Polars leverages parallel execution and lazy evaluation to process data faster than pandas, especially on large datasets.

## When to Use This Skill

Activate when the user:

- Wants to work with DataFrames and needs high performance
- Mentions Polars explicitly or asks for "fast" data processing
- Needs to process large datasets (millions of rows)
- Wants lazy evaluation for query optimization
- Asks for data transformations, filtering, grouping, or aggregations
- Needs to read/write CSV, Parquet, JSON, or other data formats
- Wants to combine with DuckDB for SQL + DataFrame workflows

## Installation

Check if Polars is installed:

```bash
python3 -c "import polars as pl; print(pl.__version__)"
```

If not installed:

```bash
pip3 install polars
```

For full features including Parquet support:

```bash
pip3 install 'polars[pyarrow]'
```

For DuckDB integration:

```bash
pip3 install polars duckdb 'polars[pyarrow]'
```

## Core Capabilities

### 1. Reading Data

Polars can read data from various formats:

```python
import polars as pl

# Read CSV
df = pl.read_csv('data.csv')

# Read Parquet (fast, columnar format)
df = pl.read_parquet('data.parquet')

# Read JSON
df = pl.read_json('data.json')

# Read multiple files
df = pl.read_csv('data/*.csv')

# Scan lazily (doesn't load data until needed)
lazy_df = pl.scan_csv('large_data.csv')
lazy_df = pl.scan_parquet('data/*.parquet')
```

### 2. Data Selection and Filtering

```python
import polars as pl

df = pl.read_csv('data.csv')

# Select columns
result = df.select(['name', 'age', 'city'])

# Select with expressions
result = df.select([
    pl.col('name'),
    pl.col('age'),
    pl.col('salary').alias('annual_salary')
])

# Filter rows
result = df.filter(pl.col('age') > 25)

# Multiple conditions
result = df.filter(
    (pl.col('age') > 25) &
    (pl.col('city') == 'NYC')
)

# Filter with string methods
result = df.filter(pl.col('name').str.contains('John'))
```

### 3. Transformations and New Columns

```python
import polars as pl

df = pl.read_csv('sales.csv')

# Add new columns
result = df.with_columns([
    (pl.col('quantity') * pl.col('price')).alias('total'),
    pl.col('product').str.to_uppercase().alias('product_upper'),
    pl.when(pl.col('quantity') > 10)
    .then(pl.lit('bulk'))
    .otherwise(pl.lit('retail'))
    .alias('sale_type')
])

# Modify existing columns
result = df.with_columns([
    pl.col('price').round(2),
    pl.col('date').str.strptime(pl.Date, '%Y-%m-%d')
])

# Rename columns
result = df.rename({'old_name': 'new_name'})
```

### 4. Aggregations and Grouping

```python
import polars as pl

df = pl.read_csv('sales.csv')

# Group by and aggregate
result = df.group_by('category').agg([
    pl.col('sales').sum().alias('total_sales'),
    pl.col('sales').mean().alias('avg_sales'),
    pl.col('sales').count().alias('num_sales'),
    pl.col('product').n_unique().alias('unique_products')
])

# Multiple group-by columns
result = df.group_by(['category', 'region']).agg([
    pl.col('revenue').sum(),
    pl.col('customer_id').n_unique().alias('unique_customers')
])

# Aggregation without grouping (aliases avoid duplicate output names)
stats = df.select([
    pl.col('sales').sum().alias('sum'),
    pl.col('sales').mean().alias('mean'),
    pl.col('sales').median().alias('median'),
    pl.col('sales').std().alias('std'),
    pl.col('sales').min().alias('min'),
    pl.col('sales').max().alias('max')
])
```

### 5. Sorting and Ranking

```python
import polars as pl

df = pl.read_csv('data.csv')

# Sort by a single column
result = df.sort('age')

# Sort descending
result = df.sort('salary', descending=True)

# Sort by multiple columns
result = df.sort(['department', 'salary'], descending=[False, True])

# Add a rank column within each department
result = df.with_columns([
    pl.col('salary').rank(method='dense').over('department').alias('dept_rank')
])
```

### 6. Joins

```python
import polars as pl

customers = pl.read_csv('customers.csv')
orders = pl.read_csv('orders.csv')

# Inner join
result = customers.join(orders, on='customer_id', how='inner')

# Left join
result = customers.join(orders, on='customer_id', how='left')

# Join on different column names
result = customers.join(
    orders,
    left_on='id',
    right_on='customer_id',
    how='inner'
)

# Join on multiple columns (df1 and df2 stand for any two DataFrames)
result = df1.join(df2, on=['col1', 'col2'], how='inner')
```

### 7. Window Functions

```python
import polars as pl

df = pl.read_csv('sales.csv')

# Running total per region
result = df.with_columns([
    pl.col('sales').cum_sum().over('region').alias('running_total')
])

# Rolling average
result = df.with_columns([
    pl.col('sales').rolling_mean(window_size=7).alias('7_day_avg')
])

# Rank within groups
result = df.with_columns([
    pl.col('sales').rank().over('category').alias('category_rank')
])

# Lag and lead
result = df.with_columns([
    pl.col('sales').shift(1).over('product').alias('prev_sales'),
    pl.col('sales').shift(-1).over('product').alias('next_sales')
])
```

## Lazy Evaluation for Performance

Polars' lazy API optimizes queries before execution:

```python
import polars as pl

# Start with a lazy scan (doesn't load data yet)
lazy_df = (
    pl.scan_csv('large_data.csv')
    .filter(pl.col('date') >= '2024-01-01')
    .select(['customer_id', 'product', 'sales', 'date'])
    .group_by('customer_id')
    .agg([
        pl.col('sales').sum().alias('total_sales'),
        pl.col('product').n_unique().alias('unique_products')
    ])
    .filter(pl.col('total_sales') > 1000)
    .sort('total_sales', descending=True)
)

# Execute the optimized query
result = lazy_df.collect()

# Or inspect the optimized execution plan
print(lazy_df.explain())
```
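
When the result is itself written to disk, a lazy query can avoid materializing it in memory at all. A minimal sketch, assuming a Polars version that provides `LazyFrame.sink_parquet` (the file names are hypothetical):

```python
import polars as pl

# Stream the query result straight to a Parquet file;
# the full result never has to fit in memory at once.
(
    pl.scan_csv('large_data.csv')             # hypothetical input file
    .filter(pl.col('date') >= '2024-01-01')
    .select(['customer_id', 'product', 'sales'])
    .sink_parquet('filtered.parquet')         # hypothetical output file
)
```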

## Common Patterns

### Pattern 1: ETL Pipeline

```python
import polars as pl

# Extract and Transform
result = (
    pl.scan_csv('raw_data.csv')
    # Clean data
    .filter(
        (pl.col('amount') > 0) &
        (pl.col('quantity') > 0)
    )
    # Transform columns
    .with_columns([
        pl.col('date').str.strptime(pl.Date, '%Y-%m-%d'),
        pl.col('product').str.strip_chars().str.to_uppercase(),
        (pl.col('quantity') * pl.col('amount')).alias('total'),
        pl.when(pl.col('quantity') > 10)
        .then(pl.lit('bulk'))
        .otherwise(pl.lit('retail'))
        .alias('order_type')
    ])
    # Aggregate
    .group_by(['date', 'product', 'order_type'])
    .agg([
        pl.col('total').sum().alias('daily_total'),
        pl.col('quantity').sum().alias('daily_quantity'),
        pl.len().alias('num_orders')
    ])
    .collect()
)

# Load (save results)
result.write_parquet('processed_data.parquet')
```

### Pattern 2: Data Exploration

```python
import polars as pl

df = pl.read_csv('data.csv')

# Quick overview
print(df.head())
print(df.describe())
print(df.schema)

# Column statistics (aliases avoid duplicate output names)
print(df.select([
    pl.col('age').min().alias('min'),
    pl.col('age').max().alias('max'),
    pl.col('age').mean().alias('mean'),
    pl.col('age').median().alias('median'),
    pl.col('age').std().alias('std')
]))

# Count nulls per column
print(df.null_count())

# Value counts
print(df['category'].value_counts())

# Number of unique values
print(df['status'].n_unique())
```

### Pattern 3: Combining with DuckDB

Use Polars for data loading and DuckDB for SQL analytics:

```python
import duckdb
import polars as pl

# Load data with Polars
df = pl.read_parquet('data/*.parquet')

# Use DuckDB for complex SQL directly against the DataFrame
result = duckdb.sql("""
    SELECT
        category,
        DATE_TRUNC('month', date) AS month,
        SUM(revenue) AS monthly_revenue,
        COUNT(DISTINCT customer_id) AS unique_customers
    FROM df
    WHERE date >= '2024-01-01'
    GROUP BY category, month
    ORDER BY month DESC, monthly_revenue DESC
""").pl()  # Convert back to a Polars DataFrame

# Continue with Polars
final = result.with_columns([
    (pl.col('monthly_revenue') / pl.col('unique_customers')).alias('revenue_per_customer')
])
```

### Pattern 4: Writing Data

```python
import polars as pl

df = pl.read_csv('data.csv')

# Write to CSV
df.write_csv('output.csv')

# Write to Parquet (recommended for large data)
df.write_parquet('output.parquet')

# Write to JSON
df.write_json('output.json')

# Write hive-partitioned Parquet files (recent Polars versions)
df.write_parquet('output/', partition_by='date')
```

## Expression Chaining

Polars uses a powerful expression syntax:

```python
import polars as pl

df = pl.read_csv('data.csv')

result = df.select([
    # String operations
    pl.col('name').str.to_lowercase().str.strip_chars().alias('clean_name'),

    # Arithmetic
    (pl.col('price') * 1.1).round(2).alias('price_with_tax'),

    # Conditional logic
    pl.when(pl.col('age') < 18)
    .then(pl.lit('minor'))
    .when(pl.col('age') < 65)
    .then(pl.lit('adult'))
    .otherwise(pl.lit('senior'))
    .alias('age_group'),

    # Date operations
    pl.col('date').dt.year().alias('year'),
    pl.col('date').dt.month().alias('month'),

    # List operations
    pl.col('tags').list.len().alias('num_tags'),
])
```

## Performance Tips

1. Use lazy evaluation for large datasets - it lets Polars optimize the whole query
2. Use the Parquet format - columnar and compressed, much faster to read than CSV
3. Filter early - push filters ahead of other operations so less data flows through the pipeline
4. Avoid row iteration - use vectorized expressions instead (see the second sketch below)
5. Use expressions - they execute in parallel in Rust, far faster than Python-level loops

```python
import polars as pl

# Good: lazy scan + filter early
result = (
    pl.scan_parquet('large.parquet')
    .filter(pl.col('date') >= '2024-01-01')  # Filter first
    .select(['col1', 'col2', 'col3'])        # Then select
    .collect()
)

# Less efficient: eager loading reads the whole file before filtering
df = pl.read_parquet('large.parquet')
result = df.filter(pl.col('date') >= '2024-01-01').select(['col1', 'col2', 'col3'])
```
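
To illustrate tip 4, here is a minimal sketch contrasting a Python-level row loop with a single vectorized expression (the column names are made up):

```python
import polars as pl

df = pl.DataFrame({'price': [10.0, 20.0, 30.0], 'qty': [1, 2, 3]})

# Slow: Python-level loop over rows
totals = [row['price'] * row['qty'] for row in df.iter_rows(named=True)]

# Fast: one vectorized expression, executed in parallel in Rust
df = df.with_columns((pl.col('price') * pl.col('qty')).alias('total'))
```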

## Polars vs Pandas

Key differences:

- Immutability: Polars DataFrames are immutable - operations return new DataFrames (demonstrated in the sketch below)
- Performance: Polars is typically 5-10x faster than pandas, with the gap widening on large datasets
- Lazy evaluation: Polars can optimize a whole query before executing it
- Expressions: Polars uses a composable expression API instead of pandas-style column assignment
- Parallelism: Polars automatically parallelizes operations across CPU cores

```python
# Pandas style (in-place column assignment)
df['new_col'] = df['col1'] * df['col2']

# Polars style (expression returns a new DataFrame)
df = df.with_columns([
    (pl.col('col1') * pl.col('col2')).alias('new_col')
])
```
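
Immutability in practice: since nothing mutates in place, reassignment is explicit and the original frame stays intact. A minimal sketch with made-up data:

```python
import polars as pl

df = pl.DataFrame({'a': [1, 2, 3]})

# with_columns returns a new DataFrame; df itself is unchanged
df2 = df.with_columns((pl.col('a') * 10).alias('a_times_10'))

print(df.columns)   # ['a']
print(df2.columns)  # ['a', 'a_times_10']
```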

## Integration with DuckDB

For the best of both worlds, combine Polars and DuckDB:

```python
import duckdb
import polars as pl

# Polars: fast data loading and transformation
df = (
    pl.scan_parquet('data/*.parquet')
    .filter(pl.col('active'))  # boolean column used directly
    .collect()
)

# DuckDB: SQL analytics on the Polars DataFrame
result = duckdb.sql("""
    SELECT
        category,
        SUM(amount) AS total,
        AVG(amount) AS average
    FROM df
    GROUP BY category
""").pl()
```

See the duckdb skill for more SQL capabilities and the references/api_reference.md file for detailed Polars API documentation.

## Error Handling

```python
import polars as pl

try:
    df = pl.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
except pl.exceptions.ComputeError as e:
    print(f"Polars compute error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```
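
A ComputeError during CSV parsing is often a type-inference problem on a messy column; overriding the inferred dtype is a common fix. A minimal sketch, assuming a recent Polars version where the parameter is named `schema_overrides` (the column name is hypothetical):

```python
import polars as pl

# Read a column that mixes numbers and text as strings, instead of
# letting type inference fail partway through the file.
df = pl.read_csv('data.csv', schema_overrides={'zip_code': pl.Utf8})
```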

## Resources

- references/api_reference.md: Detailed Polars API documentation and examples
- Official docs: https://docs.pola.rs/
- API reference: https://docs.pola.rs/api/python/stable/reference/
