
# Model Merging: Combining Pre-trained Models

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.

When to Use This Skill

Use Model Merging when you need to:

  • Combine capabilities from multiple fine-tuned models without retraining
  • Create specialized models by blending domain-specific expertise (math + coding + chat)
  • Improve performance beyond single models (often +5-10% on benchmarks)
  • Reduce training costs - no GPUs needed, merges run on CPU
  • Experiment rapidly - create new model variants in minutes, not days
  • Preserve multiple skills - merge without catastrophic forgetting

Success stories: Marcoro14-7B-slerp topped the Open LLM Leaderboard (Feb 2024), and many leading models on the Hugging Face Hub are merges.

Tools: mergekit (Arcee AI), LazyMergekit, Model Soup

Installation

```bash
# Install mergekit from source
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Or via pip
pip install mergekit

# Optional: transformers and torch, for loading merged models
pip install transformers torch
```

Quick Start

Simple Linear Merge

```yaml
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
```

```bash
# Run merge (drop --cuda to merge on CPU)
mergekit-yaml config.yml ./merged-model --cuda

# Verify the merged model loads
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('./merged-model')"
```
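
The same merge can be driven from Python instead of the CLI. A minimal sketch, based on the API shown in mergekit's README (verify the exact names against the version you have installed):

```python
# Sketch: programmatic merge via mergekit's Python API.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./merged-model",         # output directory
    options=MergeOptions(
        cuda=False,           # merges run fine on CPU
        copy_tokenizer=True,  # carry a tokenizer into the output
    ),
)
```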

SLERP Merge (Best for 2 Models)

```yaml
# config.yml - Spherical interpolation
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1   # slerp requires a base model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5   # Interpolation factor (0 = first model, 1 = second model)
dtype: bfloat16
```

Core Concepts

1. Merge Methods

Linear (Model Soup)

  • Simple weighted average of parameters
  • Fast, works well for similar models
  • Can merge 2+ models

```python
# Weighted average of parameters
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
```
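
To make the averaging concrete, a toy sketch over state dicts (illustration only; mergekit does this, and much more, on real checkpoints):

```python
# Toy "model soup": weighted average of matching tensors in state dicts.
import torch

def linear_merge(state_dicts, weights):
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

# Two fake one-tensor "models"
sd1 = {"layer.weight": torch.ones(2, 2)}
sd2 = {"layer.weight": torch.zeros(2, 2)}
print(linear_merge([sd1, sd2], [0.5, 0.5])["layer.weight"])  # 0.5 everywhere
```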

SLERP (Spherical Linear Interpolation)

  • Interpolates along sphere in weight space
  • Preserves magnitude of weight vectors
  • Best for merging 2 models
  • Smoother than linear

```python
# SLERP formula (over flattened weight vectors)
merged = (sin((1 - t) * theta) / sin(theta)) * model1 + (sin(t * theta) / sin(theta)) * model2
# where theta = arccos(dot(model1, model2) / (|model1| * |model2|))
# and t ∈ [0, 1]
```
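
A runnable single-tensor sketch of the same formula (mergekit applies this per tensor, with extra care for edge cases):

```python
# Minimal SLERP between two tensors; falls back to lerp when the
# vectors are nearly parallel (sin(theta) ~ 0). Illustration only.
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    a, b = v0.flatten().float(), v1.flatten().float()
    cos_theta = torch.dot(a, b) / (a.norm() * b.norm() + 1e-8)
    theta = torch.arccos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:  # nearly parallel: linear interpolation
        out = (1 - t) * a + t * b
    else:
        out = (torch.sin((1 - t) * theta) * a + torch.sin(t * theta) * b) / torch.sin(theta)
    return out.reshape(v0.shape)

v0, v1 = torch.randn(4, 4), torch.randn(4, 4)
halfway = slerp(0.5, v0, v1)
```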

Task Arithmetic

  • Extract "task vectors" (fine-tuned - base)
  • Combine task vectors, add to base
  • Good for merging multiple specialized models

```python
# Task vector: what fine-tuning added on top of the base
task_vector = finetuned_model - base_model

# Merge multiple task vectors back into the base
merged = base_model + alpha1 * task_vector1 + alpha2 * task_vector2
```
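
A toy state-dict sketch of extracting and recombining task vectors (model values and alphas are placeholders):

```python
# Task arithmetic on state dicts: subtract the base, scale, add back.
import torch

def task_vector(finetuned, base):
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, alphas):
    merged = dict(base)
    for alpha, vec in zip(alphas, vectors):
        merged = {k: merged[k] + alpha * vec[k] for k in merged}
    return merged

base = {"w": torch.zeros(3)}
math_ft = {"w": torch.tensor([1.0, 0.2, 0.0])}   # hypothetical math fine-tune
code_ft = {"w": torch.tensor([0.0, 0.1, 1.0])}   # hypothetical code fine-tune
merged = apply_task_vectors(
    base,
    [task_vector(math_ft, base), task_vector(code_ft, base)],
    alphas=[0.5, 0.3],
)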

TIES-Merging

  • Task arithmetic + sparsification
  • Resolves sign conflicts in parameters
  • Best for merging many task-specific models (simplified sketch below)
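
A deliberately simplified single-tensor sketch of the trim / sign-election / merge steps (the real algorithm in the TIES paper and mergekit handles weighting and scaling more carefully):

```python
# Simplified TIES for one tensor: trim small deltas, elect a majority
# sign per parameter, then average only the deltas that agree with it.
import torch

def ties_merge(base, deltas, density=0.5):
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        thresh = d.abs().flatten().kthvalue(d.numel() - k + 1).values  # k-th largest magnitude
        trimmed.append(torch.where(d.abs() >= thresh, d, torch.zeros_like(d)))
    stacked = torch.stack(trimmed)
    elected = torch.sign(stacked.sum(dim=0))            # majority sign per parameter
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    summed = (stacked * agree).sum(dim=0)
    count = agree.sum(dim=0).clamp(min=1)
    return base + summed / count                         # mean of agreeing deltas

base = torch.zeros(5)
deltas = [torch.tensor([0.9, -0.1, 0.4, 0.0, -0.7]),
          torch.tensor([0.8, 0.2, -0.5, 0.1, -0.6])]
print(ties_merge(base, deltas, density=0.6))
```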

DARE (Drop And REscale)

  • Randomly drops fine-tuned parameters
  • Rescales remaining parameters
  • Reduces redundancy, maintains performance (sketch below)
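
The drop-and-rescale step in one line of math: survivors are scaled by 1/(1-p) so the expected delta is unchanged. A minimal sketch:

```python
# Minimal DARE on one delta tensor: drop entries at rate p, rescale
# survivors by 1/(1-p). Illustration only; mergekit applies this per
# tensor before the TIES-style combine in dare_ties.
import torch

def dare(delta: torch.Tensor, drop_rate: float = 0.5) -> torch.Tensor:
    keep = (torch.rand_like(delta) >= drop_rate).float()
    return delta * keep / (1.0 - drop_rate)

delta = torch.randn(4, 4)
sparse_delta = dare(delta, drop_rate=0.5)  # ~half zeros, survivors doubled
```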

2. Configuration Structure

```yaml
# Basic structure
merge_method:       # linear, slerp, ties, dare_ties, task_arithmetic
base_model:         # Optional: base model for task arithmetic
models:
  - model:
    parameters:
      weight:       # Merge weight
      density:      # For TIES/DARE
  - model:
    parameters:
      weight:
parameters:         # Method-specific parameters
dtype:              # bfloat16, float16, float32

# Optional
slices:             # Layer-wise merging
tokenizer:          # Tokenizer configuration
```

Merge Methods Guide

Linear Merge

Best for: Simple model combinations, equal weighting

```yaml
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
```

SLERP Merge

Best for: Two models, smooth interpolation

```yaml
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1   # slerp requires a base model
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5   # 0.0 = first model, 1.0 = second model
dtype: bfloat16
```

Layer-specific SLERP:

```yaml
merge_method: slerp
base_model: model_a   # slerp requires a base model
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn   # Attention layers
      value: 0.3
    - filter: mlp         # MLP layers
      value: 0.7
    - value: 0.5          # Default for other layers
dtype: bfloat16
```

Task Arithmetic

Best for: Combining specialized skills

```yaml
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1        # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B  # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B       # Code
    parameters:
      weight: 0.2
dtype: bfloat16
```

TIES-Merging

Best for: Many models, resolving conflicts

```yaml
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5   # Keep top 50% of parameters
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
```

DARE Merge

Best for: Reducing redundancy

```yaml
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5   # Keep 50% of deltas (drop the rest)
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true    # Use int8 for masks (saves memory)
dtype: bfloat16
```

Advanced Patterns

Layer-wise Merging

```yaml
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16]    # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32]   # Second half
dtype: bfloat16
```

MoE from Merged Models

```yaml
# Create a Mixture of Experts (run with the mergekit-moe command,
# which uses its own config format rather than merge_method)
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16
```

Tokenizer Merging

```yaml
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: custom/specialized-model
tokenizer:
  source: "union"   # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
```

Best Practices

1. Model Compatibility

```python
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]

# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
```

2. Weight Selection

```yaml
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4   # 0.6 + 0.4 = 1.0

# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8   # May boost performance
```

3. Method Selection

```python
# Choose merge method based on use case:

# 2 models, smooth blend → SLERP
merge_method = "slerp"

# 3+ models, simple average → Linear
merge_method = "linear"

# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"

# Want to reduce redundancy → DARE
merge_method = "dare_ties"
```

4. Density Tuning (TIES/DARE)

```yaml
# Start conservative (keep more parameters)
parameters:
  density: 0.8   # Keep 80%

# If performance holds, increase sparsity
parameters:
  density: 0.5   # Keep 50%

# If performance degrades, reduce sparsity
parameters:
  density: 0.9   # Keep 90%
```
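
A hypothetical sweep that automates this tuning loop, assuming the mergekit Python API sketched earlier (model names and paths are placeholders):

```python
# Render a TIES config at several densities and merge each variant,
# so each output directory can be benchmarked before picking a winner.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

TEMPLATE = """
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: {density}
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: {density}
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
"""

for density in (0.8, 0.5, 0.9):
    config = MergeConfiguration.model_validate(
        yaml.safe_load(TEMPLATE.format(density=density))
    )
    run_merge(config, f"./merged-density-{density}", options=MergeOptions(cuda=False))
    # ...benchmark each output directory before deploying
```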

5. Layer-specific Merging

```yaml
# Preserve base model's beginning and end
merge_method: passthrough
slices:
  - sources:
      - model: base_model
        layer_range: [0, 2]     # Keep first layers
  - sources:
      - model: merged_middle    # Merged middle layers
        layer_range: [2, 30]
  - sources:
      - model: base_model
        layer_range: [30, 32]   # Keep last layers
```

Evaluation & Testing

Benchmark Merged Models

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Test on various tasks
test_prompts = {
    "math": "Calculate: 25 * 17 =",
    "code": "Write a Python function to reverse a string:",
    "chat": "What is the capital of France?",
}

for task, prompt in test_prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=100)
    print(f"{task}: {tokenizer.decode(outputs[0])}")
```

Common Benchmarks

  • Open LLM Leaderboard: General capabilities
  • MT-Bench: Multi-turn conversation
  • MMLU: Multitask accuracy
  • HumanEval: Code generation
  • GSM8K: Math reasoning

Production Deployment

Save and Upload

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Upload to HuggingFace Hub
model.push_to_hub("username/my-merged-model")
tokenizer.push_to_hub("username/my-merged-model")
```

Quantize Merged Model

```bash
# Quantize to GGUF (convert.py ships with llama.cpp)
python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf

# Quantize with GPTQ (illustrative script name; use your GPTQ tooling)
python quantize_gptq.py ./merged-model --bits 4 --group_size 128
```

Common Pitfalls

❌ Pitfall 1: Merging Incompatible Models

```yaml
# Wrong: Different architectures
models:
  - model: meta-llama/Llama-2-7b   # Llama architecture
  - model: mistralai/Mistral-7B    # Mistral architecture
```

Fix: Only merge models that share the same architecture; a quick check is sketched below
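
A quick pre-merge compatibility check using transformers' AutoConfig, assuming you have access to both repos (meta-llama models are gated):

```python
# Compare architecture class and key dimensions before merging.
from transformers import AutoConfig

def compatible(repo_a: str, repo_b: str) -> bool:
    a = AutoConfig.from_pretrained(repo_a)
    b = AutoConfig.from_pretrained(repo_b)
    return (
        a.architectures == b.architectures
        and a.hidden_size == b.hidden_size
        and a.num_hidden_layers == b.num_hidden_layers
    )

print(compatible("mistralai/Mistral-7B-v0.1", "teknium/OpenHermes-2.5-Mistral-7B"))  # True
print(compatible("meta-llama/Llama-2-7b-hf", "mistralai/Mistral-7B-v0.1"))           # False
```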

❌ Pitfall 2: Over-weighting One Model

```yaml
# Suboptimal: One model dominates
models:
  - model: model_a
    parameters:
      weight: 0.95   # Too high
  - model: model_b
    parameters:
      weight: 0.05   # Too low
```

Fix: Use more balanced weights (0.3-0.7 range)

❌ Pitfall 3: Not Evaluating

```bash
# Wrong: Merge and deploy without testing
mergekit-yaml config.yml ./merged-model
# ...then deploy immediately (risky!)
```

Fix: Always benchmark before deploying

Resources

  • mergekit GitHub: https://github.com/arcee-ai/mergekit
  • HuggingFace Tutorial: https://huggingface.co/blog/mlabonne/merge-models
  • LazyMergekit: Automated merging notebook
  • TIES Paper: https://arxiv.org/abs/2306.01708
  • DARE Paper: https://arxiv.org/abs/2311.03099

See Also

  • references/methods.md - Deep dive into merge algorithms
  • references/examples.md - Real-world merge configurations
  • references/evaluation.md - Benchmarking and testing strategies