megatron-memory-estimator

Skill from yzlnew/infra-skills

Installation

Install the skill:
npx skills add https://github.com/yzlnew/infra-skills --skill megatron-memory-estimator
Last Updated: Jan 20, 2026

Skill Details

SKILL.md


# AI Infrastructure Agent Skills

> ⚠️ WARNING

> This project is under active development and heavily generated by LLMs without strict proofreading. Use with caution and verify all code before production use.

A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.

Overview

This repository provides expert-level skills for AI infrastructure engineering tasks. Each skill packages domain knowledge, code examples, and best practices to transform Claude into a specialized developer for specific frameworks and tools.

Construction Methodology (Unless Otherwise Specified)

  1. Knowledge Gathering: Use Gemini DeepResearch to collect comprehensive, up-to-date information on target frameworks
  2. Skill Development: Transform research into structured skills using skill-creator in Claude Code
  3. Validation: Test skill-generated code examples to ensure correctness
  4. Maintenance: Regular updates based on latest official documentation

Available Skills

TileLang Developer

Write high-performance GPU kernels using TileLang for NVIDIA, AMD, and Ascend hardware.

Capabilities:

  • Matrix multiplication (GEMM) kernels
  • FlashAttention implementations
  • DeepSeek MLA operators
  • Performance optimization (swizzle layouts, pipelining, warp specialization)
  • Cross-platform kernel development

Status: ✅ Complete
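Fused kernels such as FlashAttention are typically validated against a plain reference implementation. As an illustration of what that oracle looks like (NumPy, not TileLang itself — the actual skill targets the TileLang DSL), a naive batched attention:

```python
import numpy as np

def reference_attention(q, k, v):
    """Naive softmax(Q K^T / sqrt(d)) V over (batch, seq, dim) arrays.

    A correctness oracle to compare fused-kernel outputs against.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize exp
    p = np.exp(scores)
    p = p / p.sum(axis=-1, keepdims=True)
    return p @ v
```

A fused kernel's output can then be checked with `np.allclose` against this reference on small random inputs before benchmarking at scale.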

Megatron Memory Estimator

Estimate GPU memory usage for Megatron-based MoE and dense models. Built upon [megatron_memory_estimator](https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator).

Capabilities:

  • Estimate memory from HuggingFace configs
  • Support for MoE models (DeepSeek-V3, Qwen, etc.)
  • Parallelism strategy comparison (TP/PP/EP/CP)
  • Memory optimization recommendations

Status: ✅ Complete
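To give a feel for the arithmetic such an estimator performs, here is a first-order sketch for a dense model (not the tool's actual formulas — exact byte counts depend on precision, distributed-optimizer settings, and activation recomputation, all of which the real estimator accounts for):

```python
def dense_param_count(hidden, layers, vocab, ffn_mult=4):
    # Rough count: tied-or-not embeddings (~2 * vocab * hidden) plus, per layer,
    # attention projections (~4 h^2) and the MLP (~2 * ffn_mult * h^2).
    per_layer = 4 * hidden**2 + 2 * ffn_mult * hidden**2
    return 2 * vocab * hidden + layers * per_layer

def static_memory_gib(n_params, tp=1):
    # Mixed-precision Adam: bf16 weights (2 B) + fp32 grads (4 B)
    # + fp32 master weights (4 B) + Adam moments (8 B) = 18 B/param,
    # divided across tensor-parallel ranks. Activations are NOT included.
    return n_params * 18 / tp / 2**30
```

Even this crude model shows why a ~7B-parameter dense network cannot train on a single 80 GiB GPU without sharding: weights plus optimizer state alone exceed 100 GiB.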

SLIME User

Guide for using SLIME, an LLM post-training framework for RL Scaling. Built upon [THUDM/slime](https://github.com/THUDM/slime).

Capabilities:

  • RL training setup and configuration (GRPO, GSPO, PPO, Reinforce++)
  • Multi-turn tool calling and agent workflows
  • Custom reward models and generation functions
  • Megatron and FSDP backend configuration
  • SGLang integration and optimization
  • Dynamic sampling and partial rollout
  • Multi-node distributed training

Status: ✅ Complete
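GRPO, listed above, scores each sampled response relative to the other samples drawn for the same prompt. A minimal sketch of that group normalization (illustrative only, not SLIME's internal implementation):

```python
import statistics

def grpo_advantages(group_rewards):
    # GRPO-style advantage: standardize each response's reward against the
    # mean and std of all responses sampled for the same prompt.
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    if std == 0:
        # identical rewards carry no learning signal for this prompt
        return [0.0 for _ in group_rewards]
    return [(r - mean) / std for r in group_rewards]
```

Because the baseline is the group mean, no separate value network is needed, which is part of what makes GRPO attractive for large-scale RL post-training.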

Prompt to create this skill, with Sonnet 4.5:

```
Use skill-creator to create a skill called slime-user at this repo. slime is an LLM
post-training framework for RL Scaling. Its repo is https://github.com/THUDM/slime.

Skill creation procedure:

1. Git clone the latest repo
2. Analyze docs/en, understand basic structure and write a doc navigation guide for
   user getting started or finding docs for advanced usage
3. Gather valuable examples from the docs and examples dir, write key ideas and
   script path down for quick reference
4. Checkout some important source code, for example slime/slime/utils/arguments.py
   and slime/rollout/sglang_rollout.py, provide its path and functions for a quick
   find.
```

TikZ Flowchart

Create professional flowcharts and architecture diagrams using LaTeX TikZ with standardized styles.

Capabilities:

  • Professional flowcharts with Google Material-like color palette
  • Standardized node types (data, memory, operation, kernel boxes)
  • Architecture diagrams and process flows
  • Grouping and layout best practices
  • Clean orthogonal edges and relative positioning

Status: ✅ Complete

Planned Skills

SGLang Developer

Development skill for SGLang (Structured Generation Language) runtime and optimization.

Planned capabilities:

  • SGLang runtime configuration
  • Custom sampling strategies
  • Performance tuning for LLM inference
  • Multi-GPU serving optimization

Status: 🚧 Planned

vLLM Developer

Skill for vLLM engine development and deployment.

Planned capabilities:

  • PagedAttention implementation
  • Custom scheduler development
  • Multi-LoRA