mixseek-evaluator-config

Skill

from drillan/mixseek-plus

What it does

Generates MixSeek evaluation configuration files (evaluator.toml, judgment.toml) for defining submission scoring and judgment criteria in TUMIX tournaments.

Added Feb 4, 2026

Skill Details

SKILL.md

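Generates MixSeek evaluation configuration files (evaluator.toml, judgment.toml). Use this skill for requests such as "create an evaluation config", "scoring config", "set up the judgment config", or "configure metrics". It defines the evaluation criteria for Submissions and the final judgment logic.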

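# MixSeek Evaluation Configuration Generator

Overview

Generates the evaluation configuration file (evaluator.toml) and the judgment configuration file (judgment.toml) for MixSeek-Core. These files define the evaluation criteria, scoring method, and final judgment logic for Submissions in TUMIX tournaments.

Prerequisites

- The workspace has been initialized (see mixseek-workspace-init)
- The MIXSEEK_WORKSPACE environment variable is set (recommended)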
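
The prerequisites can be checked before anything is generated. The following is a minimal sketch and not part of the skill itself: `find_workspace` is a hypothetical helper name, and treating an unset variable as "not ready" is an assumption, since the skill only recommends setting MIXSEEK_WORKSPACE.

```python
import os
from pathlib import Path


def find_workspace(env=None):
    """Return the workspace directory named by MIXSEEK_WORKSPACE, or None.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    raw = env.get("MIXSEEK_WORKSPACE")
    if not raw:
        return None  # variable is unset or empty
    path = Path(raw)
    # An existing directory is the only thing verified here; whether it was
    # initialized by mixseek-workspace-init is not checked.
    return path if path.is_dir() else None


if __name__ == "__main__":
    workspace = find_workspace()
    print("workspace:", workspace if workspace else "MIXSEEK_WORKSPACE is not usable")
```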

Generated files

| File | Purpose | Location |
|------|---------|----------|
| evaluator.toml | Scoring settings for Submissions | configs/evaluators/ |
| judgment.toml | Final judgment settings | configs/judgment/ |

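Usage

Step 1: Gather requirements

Confirm the following with the user:

- Evaluation focus: what to prioritize (clarity, coverage, relevance, and so on)
- Weighting: the importance of each metric (equal or custom)
- Judgment style: deterministic (temperature=0) or diversity-oriented

Step 2: Propose metric settings

Choose from the standard metrics:

| Metric | Description | Use case |
|--------|-------------|----------|
| ClarityCoherence | Clarity and coherence | Tasks that prioritize readability |
| Coverage | Coverage | Tasks that prioritize comprehensiveness |
| LLMPlain | General-purpose LLM evaluation | Tasks that need custom evaluation criteria |
| Relevance | Relevance | Tasks that prioritize on-point answers |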

Step 3: Generate the configuration files

evaluator.toml:

```toml
default_model = "google-gla:gemini-2.5-pro"
temperature = 0.0

[[metrics]]
name = "ClarityCoherence"
weight = 0.34

[[metrics]]
name = "Coverage"
weight = 0.33

[[metrics]]
name = "Relevance"
weight = 0.33
```
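
If the file is produced by a script rather than by hand, the layout above can be rendered from a mapping of metric names to weights. This is a sketch only: `render_evaluator_toml` is a hypothetical helper, and it emits just the keys shown in this document (`default_model`, `temperature`, `name`, `weight`).

```python
def render_evaluator_toml(weights, model="google-gla:gemini-2.5-pro", temperature=0.0):
    """Render evaluator.toml text from a {metric_name: weight} mapping."""
    lines = [f'default_model = "{model}"', f"temperature = {temperature}"]
    for name, weight in weights.items():
        # Each metric becomes one [[metrics]] array-of-tables entry.
        lines += ["", "[[metrics]]", f'name = "{name}"', f"weight = {weight}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_evaluator_toml(
        {"ClarityCoherence": 0.34, "Coverage": 0.33, "Relevance": 0.33}
    ))
```

The result should still be passed through the validation step, because this helper does not check metric names or weight sums.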

judgment.toml:

```toml
model = "google-gla:gemini-2.5-pro"
temperature = 0.0
timeout_seconds = 60
```

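Step 4: Save the files

```bash
# Create the target directories (no-op if they already exist),
# then save the generated files to these paths:
#   $MIXSEEK_WORKSPACE/configs/evaluators/evaluator.toml
#   $MIXSEEK_WORKSPACE/configs/judgment/judgment.toml
mkdir -p "$MIXSEEK_WORKSPACE/configs/evaluators" "$MIXSEEK_WORKSPACE/configs/judgment"
```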
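
The save step can also be scripted. The sketch below writes both files to the locations listed above; `save_configs` is a hypothetical helper name, and creating missing directories is an assumption about what is acceptable in your workspace.

```python
from pathlib import Path


def save_configs(workspace, evaluator_text, judgment_text):
    """Write both config files under the workspace and return their paths."""
    workspace = Path(workspace)
    targets = {
        workspace / "configs" / "evaluators" / "evaluator.toml": evaluator_text,
        workspace / "configs" / "judgment" / "judgment.toml": judgment_text,
    }
    for path, text in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)  # create configs/... if missing
        path.write_text(text, encoding="utf-8")
    return list(targets)
```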

Important: When you use custom paths (configs/evaluators/ or configs/judgment/), always specify them explicitly in orchestrator.toml. Otherwise the default paths (configs/evaluator.toml and configs/judgment.toml) are searched and your settings are not applied.

```toml
# orchestrator.toml
[orchestrator]
evaluator_config = "configs/evaluators/evaluator.toml"
judgment_config = "configs/judgment/judgment.toml"
```
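
The fallback described above can be pictured as a lookup with defaults. This is an illustration of the documented behavior only: how MixSeek-Core resolves these paths internally is not shown in this document, and `resolve_config_path` is a hypothetical name.

```python
# Default locations named in this document.
DEFAULT_PATHS = {
    "evaluator_config": "configs/evaluator.toml",
    "judgment_config": "configs/judgment.toml",
}


def resolve_config_path(orchestrator_table, key):
    """Return the path set in the [orchestrator] table, else the default."""
    return orchestrator_table.get(key, DEFAULT_PATHS[key])


if __name__ == "__main__":
    explicit = {"evaluator_config": "configs/evaluators/evaluator.toml"}
    print(resolve_config_path(explicit, "evaluator_config"))  # the custom path
    print(resolve_config_path(explicit, "judgment_config"))   # falls back to the default
```

In the second call the custom judgment file under configs/judgment/ would be ignored, which is the situation the note above warns about.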

Step 5: Validate the configuration files (required)

Always run validation after generating the files.

```bash
# Validate the evaluator configuration
uv run python skills/mixseek-config-validate/scripts/validate-config.py \
  "$MIXSEEK_WORKSPACE/configs/evaluators/evaluator.toml" --type evaluator

# Validate the judgment configuration
uv run python skills/mixseek-config-validate/scripts/validate-config.py \
  "$MIXSEEK_WORKSPACE/configs/judgment/judgment.toml" --type judgment
```

If validation succeeds, report the result to the user. If it fails, review the error message and fix the configuration.
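
Both validation commands can be driven from one script. The sketch below only wraps the two commands shown in this step; the helper names are hypothetical, `uv` must be on PATH, and treating a non-zero exit code as failure is an assumption about validate-config.py that this document does not confirm.

```python
import os
import subprocess

VALIDATOR = "skills/mixseek-config-validate/scripts/validate-config.py"


def build_validate_command(config_path, config_type):
    """Build the argument list for one validation run."""
    return ["uv", "run", "python", VALIDATOR, str(config_path), "--type", config_type]


def validate_all(workspace):
    """Run both validations; True only if every command exits with 0 (assumed)."""
    jobs = [
        (os.path.join(workspace, "configs", "evaluators", "evaluator.toml"), "evaluator"),
        (os.path.join(workspace, "configs", "judgment", "judgment.toml"), "judgment"),
    ]
    codes = [
        subprocess.run(build_validate_command(path, kind), check=False).returncode
        for path, kind in jobs
    ]
    return all(code == 0 for code in codes)
```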

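Standard metrics in detail

ClarityCoherence (clarity and coherence)

Evaluates the readability and logical consistency of the answer.

Evaluation criteria:

- Clarity of the text structure
- Logical flow
- Appropriate use of technical terms
- Clarity of the conclusion

Recommended uses:

- Generating explanatory text
- Report writing
- Educational content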

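Coverage

Evaluates how comprehensively the answer addresses the question.

Evaluation criteria:

- Addresses every aspect of the question
- Includes related topics
- Richness of examples
- Presence of supplementary information

Recommended uses:

- Research tasks
- FAQ creation
- Technical documentation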

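Relevance

Evaluates how precisely the answer addresses the question.

Evaluation criteria:

- Direct answer to the question
- Exclusion of unnecessary information
- Sustained focus
- Fit with the context

Recommended uses:

- Q&A
- Customer support
- Evaluating search results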

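LLMPlain (general-purpose LLM evaluation)

The LLM evaluates the answer against custom criteria defined in system_instruction.

Characteristics:

- Has no predefined evaluation logic
- Fully customizable through system_instruction
- Use it when special evaluation criteria are required

Recommended uses:

- Domain-specific evaluation (legal, medical, and so on)
- Project-specific quality standards
- Aspects that the other metrics do not cover

Configuration example: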

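```toml
[[metrics]]
name = "LLMPlain"
weight = 0.5
system_instruction = """
Evaluate the answer from a security perspective:

- Risk of leaking confidential information
- Secure coding practices
- Presence of vulnerabilities

Rate the answer with a score from 0 to 100.
"""
```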

Examples

Equal weighting

```
User: Create an evaluation config

Agent: Here is a proposed evaluation config.

Metrics (equal weighting):
- ClarityCoherence: 33.4%
- Coverage: 33.3%
- Relevance: 33.3%

Is this configuration OK?

User: Yes

Agent: Generated the configuration files:
✓ configs/evaluators/evaluator.toml
✓ configs/judgment/judgment.toml
```

Custom weighting

```
User: Create an evaluation config that emphasizes accuracy

Agent: Here is a proposed config that emphasizes Relevance.

Metrics:
- Relevance: 50% (emphasized)
- ClarityCoherence: 30%
- Coverage: 20%

Is this configuration OK?

User: Yes
```

Example generated configuration files

evaluator.toml (custom weighting):

```toml
# MixSeek Evaluator Configuration
# Generated by mixseek-evaluator-config skill

default_model = "google-gla:gemini-2.5-pro"
temperature = 0.0
timeout_seconds = 300
max_retries = 3

[[metrics]]
name = "Relevance"
weight = 0.5

[[metrics]]
name = "ClarityCoherence"
weight = 0.3

[[metrics]]
name = "Coverage"
weight = 0.2
```

judgment.toml:

```toml
# MixSeek Judgment Configuration
# Generated by mixseek-evaluator-config skill

model = "google-gla:gemini-2.5-pro"
temperature = 0.0
timeout_seconds = 60
max_retries = 3
```
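
To see what the custom weights mean in practice, the sketch below combines per-metric scores into one number. Both the 0 to 100 score range and the weighted-sum formula are assumptions made for illustration; this document does not state how MixSeek-Core aggregates metric scores, and `weighted_score` is a hypothetical helper.

```python
def weighted_score(scores, weights):
    """Combine per-metric scores using the configured weights (assumed weighted sum)."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    weights = {"Relevance": 0.5, "ClarityCoherence": 0.3, "Coverage": 0.2}
    scores = {"Relevance": 80, "ClarityCoherence": 70, "Coverage": 90}  # made-up scores
    print(round(weighted_score(scores, weights), 2))
```

With these made-up scores the Relevance result contributes half of the total, so a weak Relevance score costs more than a weak Coverage score.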

Weighting rules

Weights must follow these rules:

- All or none: you cannot assign weights to only some of the metrics
- Sum of 1.0: all weights must add up to 1.0 (±0.001)
- Equal by default: when weights are omitted, they are distributed equally

```toml
# Valid: all weights specified
[[metrics]]
name = "ClarityCoherence"
weight = 0.5

[[metrics]]
name = "Coverage"
weight = 0.5

# Valid: all weights omitted (equal distribution)
[[metrics]]
name = "ClarityCoherence"

[[metrics]]
name = "Coverage"

# Invalid: only some weights specified
[[metrics]]
name = "ClarityCoherence"
weight = 0.5 # ❌

[[metrics]]
name = "Coverage"
# weight omitted ❌
```
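
The three weighting rules can be expressed as a small check. This is a sketch of the rules as stated above, not the validator's actual code; `resolve_weights` and its error messages are hypothetical.

```python
import math


def resolve_weights(metrics, tolerance=0.001):
    """Apply the weighting rules to a list of {'name': ..., 'weight': ...} dicts.

    Returns {name: weight}; 'weight' may be omitted on every entry.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    weighted = [m for m in metrics if "weight" in m]
    if not weighted:
        # Rule: omitted everywhere means equal distribution.
        equal = 1.0 / len(metrics)
        return {m["name"]: equal for m in metrics}
    if len(weighted) != len(metrics):
        # Rule: all or none.
        raise ValueError("specify a weight for every metric or for none")
    total = math.fsum(m["weight"] for m in metrics)
    if abs(total - 1.0) > tolerance:
        # Rule: sum of 1.0 within the stated tolerance.
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return {m["name"]: m["weight"] for m in metrics}


if __name__ == "__main__":
    print(resolve_weights([{"name": "ClarityCoherence"}, {"name": "Coverage"}]))
    # prints {'ClarityCoherence': 0.5, 'Coverage': 0.5}
```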

Troubleshooting

Weight sum error

```
Error: Weights must sum to 1.0
```

Solution:

- Adjust the weights so that they sum to 1.0
- Or omit all weights to use equal distribution
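
One way to adjust the weights is to rescale them proportionally. The helper below is a hypothetical convenience, not something the skill provides; it keeps the ratios between metrics and makes the total 1.0.

```python
def normalize_weights(weights):
    """Rescale {name: weight} so the values sum to 1.0, keeping their ratios."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return {name: value / total for name, value in weights.items()}


if __name__ == "__main__":
    # Relative importance 5 : 3 : 2 becomes 0.5, 0.3, 0.2.
    print(normalize_weights({"Relevance": 5, "ClarityCoherence": 3, "Coverage": 2}))
```

Rounding the results before writing them to evaluator.toml can push the sum slightly off 1.0, so keep the rounded total within the ±0.001 tolerance.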

Metric name error

```
Error: Unknown metric name
```

Solution:

- Use a valid metric name: ClarityCoherence, Coverage, LLMPlain, Relevance
- Watch the capitalization
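
Capitalization mistakes can be caught before validation with a case-insensitive lookup. This is a sketch: `suggest_metric_name` is a hypothetical helper, and the list contains only the four names given in this document.

```python
VALID_METRICS = ("ClarityCoherence", "Coverage", "LLMPlain", "Relevance")


def suggest_metric_name(name):
    """Return the correctly capitalized metric name, or None if there is no match."""
    if name in VALID_METRICS:
        return name
    by_lowercase = {metric.lower(): metric for metric in VALID_METRICS}
    return by_lowercase.get(name.strip().lower())


if __name__ == "__main__":
    print(suggest_metric_name("claritycoherence"))  # prints ClarityCoherence
    print(suggest_metric_name("Accuracy"))          # prints None
```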

Unstable judgments

Solution:

- Set temperature to 0.0 in judgment.toml (deterministic)
- Set seed to a fixed value

References

- TOML schema details: references/TOML-SCHEMA.md
- Standard metrics: references/METRICS.md
- Orchestrator configuration: skills/mixseek-orchestrator-config/
