content-evaluation-framework

Skill from panaversity/agentfactory

What it does

Evaluates educational content systematically using a 6-category weighted rubric, scoring technical accuracy, pedagogical effectiveness, and constitutional compliance.

Part of panaversity/agentfactory (23 items)

Installation

Install with npx (default command; check GitHub for the repository's actual instructions):

npx skills add panaversity/agentfactory --skill content-evaluation-framework

2 installs · Added Feb 4, 2026

Skill Details

SKILL.md

This skill should be used when evaluating the quality of book chapters, lessons, or educational content. It provides a systematic 6-category rubric with weighted scoring (Technical Accuracy 30%, Pedagogical Effectiveness 25%, Writing Quality 20%, Structure & Organization 15%, AI-First Teaching 10%, Constitution Compliance Pass/Fail) and multi-tier assessment (Excellent/Good/Needs Work/Insufficient). Use this during iterative drafting, after content completion, on-demand review requests, or before validation phases.

Overview

# Content Evaluation Framework

This skill provides a comprehensive, systematic rubric for evaluating educational book chapters and lessons with quantifiable quality standards.

---

6-Point Spec Blueprint Compliance

1. Identity (Persona)

Role: Senior Content Quality Auditor

Tone: Precise, evidence-based, constructively critical

Expertise: Educational content evaluation, rubric-based assessment, constitutional compliance, pedagogical effectiveness

2. Context (MCP & Data)

Required Files (Read First):

  • .specify/memory/constitution.md - Constitutional principles
  • .specify/memory/content-quality-memory.md - Anti-patterns and validation checklists
  • references/rubric-details.md - Detailed tier criteria
  • references/constitution-checklist.md - Pass/fail checklist
  • references/evaluation-template.md - Report template

Tools Required:

  • Read (file access)
  • Grep (pattern matching for violations)
  • Glob (find content files)

MCP Servers: None required

3. Logic (Guardrails)

Mandatory Steps:

  1. Read constitution.md FIRST
  2. Evaluate Constitution Compliance (GATE) - if FAIL, stop
  3. Score each weighted category with evidence
  4. Calculate weighted score using formula
  5. Generate report using template

NEVER:

  • ❌ NEVER score without reading the content fully
  • ❌ NEVER pass content that violates constitutional principles
  • ❌ NEVER provide scores without evidence (quotes, line numbers)
  • ❌ NEVER skip the Constitution Compliance gate check

Decision Tree:

```
IF Constitution Compliance = FAIL
  → STOP, report violations, return to author
ELSE IF Weighted Score < 75%
  → CONDITIONAL PASS, list required improvements
ELSE IF Weighted Score >= 75% AND < 90%
  → PASS (Good tier), list optional improvements
ELSE
  → PASS (Excellent tier), acknowledge quality
```
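The decision tree above can be sketched as a small function. This is an illustrative helper, not part of the skill's own files; the verdict strings are paraphrases of the branches above:

```python
def verdict(constitution_pass: bool, weighted_score: float) -> str:
    """Map the gate result and weighted score (0-100) to a verdict,
    following the decision tree above."""
    if not constitution_pass:
        return "STOP: report violations, return to author"
    if weighted_score < 75:
        return "CONDITIONAL PASS: list required improvements"
    if weighted_score < 90:
        return "PASS (Good tier): list optional improvements"
    return "PASS (Excellent tier): acknowledge quality"
```

Note the gate is checked before any score comparison, matching the rule that a constitutional failure stops evaluation regardless of the weighted score.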

4. Success Trigger

Activation Keywords:

  • "evaluate [lesson|content|chapter|preface]"
  • "check quality"
  • "run content-evaluation-framework"
  • "score this content"
  • "is this ready for publication"

File Types:

  • *.md files in apps/learn-app/docs/
  • Files with YAML frontmatter containing learning_objectives
  • Lesson, chapter, and preface content

Invocation Contexts:

  • Automatic: After content-implementer completes
  • Manual: User requests evaluation
  • Workflow: Part of /sp.implement validation gate

5. Output Standard

Format: Markdown report

Required Sections:

  1. Executive Summary (score, tier, pass/fail)
  2. Category Scores table (5 weighted + gate)
  3. Constitution Compliance Status
  4. Detailed Findings per category
  5. Strengths (with evidence)
  6. Areas for Improvement (prioritized)
  7. Actionable Next Steps

Output Location:

  • Primary: Console output (full report)
  • Summary: Single line for orchestrator: "✅ PASS (88%) | ❌ FAIL - [reason]"

Example Summary:

```
✅ PASS (88.85%) - Good tier
Constitution: PASS | Technical: 82% | Pedagogical: 92% | Writing: 90% | Structure: 95% | AI-First: 90%
Ready for publication with minor improvements.
```
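The single-line summary format could be produced by a formatter like the following. The function name and signature are hypothetical, shown only to make the expected output shape concrete:

```python
def summary_line(passed: bool, score: float, tier: str = "", reason: str = "") -> str:
    """Render the one-line orchestrator summary in the format shown above."""
    if passed:
        return f"✅ PASS ({score:.2f}%) - {tier} tier"
    return f"❌ FAIL - {reason}"
```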

6. Error Protocol

Tool Unavailable:

| Tool | Fallback |
|------|----------|
| Read | Cannot evaluate - report error |
| Grep | Manual pattern search in content |
| Constitution file missing | BLOCK - cannot evaluate without constitution |

Graceful Degradation:

```
IF constitution.md unavailable
  → STOP - "Cannot evaluate without constitutional reference"
IF rubric-details.md unavailable
  → Use embedded summary criteria (less precise)
  → Mark output as "PARTIAL - rubric unavailable"
```

Error Reporting:

```
❌ ERROR: [Resource] unavailable
Impact: Cannot complete [specific check]
Recommendation: Ensure [file] exists at [path]
```

Human Escalation:

Escalate to human when:

  • [ ] Constitutional violation is ambiguous
  • [ ] Content type doesn't match any known pattern
  • [ ] Scoring criteria conflict with each other

---

Constitution Alignment: v4.0.1 emphasizing:

  • Principle 1: Specification Primacy ("Specs Are the New Syntax")
  • Section IIa: Panaversity 4-Layer Teaching Method
  • Section IIb: AI Three Roles Framework (bidirectional co-learning)
  • 8 Foundational Principles: Including Factual Accuracy, Coherent Structure, Progressive Complexity
  • Nine Pillars (Section I): AI CLI, Markdown, MCP, AI-First IDEs, Cross-Platform, TDD, SDD, Composable Skills, Cloud-Native

Purpose

Evaluate educational content across 6 weighted categories to ensure:

  • Technical correctness and code quality
  • Effective pedagogical design and learning outcomes
  • Clear, accessible writing for target audience
  • Proper structure and organization
  • AI-augmented learning principles (learning WITH AI, not generating FROM AI)
  • Constitution compliance and standards adherence

When to Use This Skill

Invoke this evaluation framework at multiple checkpoints:

  1. During Iterative Drafting - Mid-process quality checks to catch issues early
  2. After Lesson/Chapter Completion - Comprehensive evaluation before moving to next content unit
  3. On-Demand Review Requests - When user explicitly asks for quality assessment
  4. Before Validation Phase - Part of the SDD Validate phase workflow for final sign-off

Evaluation Methodology

Scoring System

Multi-Tier Assessment:

  • Excellent (90-100%) - Exceeds standards, exemplary quality
  • Good (75-89%) - Meets all standards with minor improvements possible
  • Needs Work (50-74%) - Meets some standards but requires significant revision
  • Insufficient (<50%) - Does not meet minimum standards, requires major rework
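The four tier bands above can be expressed as a simple classifier. An illustrative sketch, not part of the skill's tooling:

```python
def tier_for(score: float) -> str:
    """Classify a 0-100 score into the four assessment tiers above."""
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 50:
        return "Needs Work"
    return "Insufficient"
```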

Weighted Categories

The evaluation uses 6 categories with the following weights:

| Category | Weight | Focus Area |
|----------|--------|------------|
| Technical Accuracy | 30% | Code correctness, type hints, explanations, examples work as stated |
| Pedagogical Effectiveness | 25% | Show-then-explain pattern, progressive complexity, quality exercises |
| Writing Quality | 20% | Readability (Flesch-Kincaid 8-10), voice, clarity, grade-level appropriateness |
| Structure & Organization | 15% | Learning objectives met, logical flow, appropriate length, transitions |
| AI-First Teaching | 10% | Co-learning partnership demonstrated, Three Roles Framework shown, Nine Pillars aligned, Specs-As-Syntax emphasized |
| Constitution Compliance | Pass/Fail | Must pass all non-negotiable constitutional requirements including Nine Pillars alignment (gate) |

Total Weighted Score Calculation:

```
Final Score = (Technical × 0.30) + (Pedagogical × 0.25) + (Writing × 0.20) +
              (Structure × 0.15) + (AI-First × 0.10)
```

Constitution Compliance: Must achieve "Pass" status. If "Fail," content cannot proceed regardless of weighted score.

How to Conduct an Evaluation

Step 1: Prepare Context

Before evaluation, gather:

  • Content being evaluated (lesson.md, chapter.md, or section file)
  • Relevant spec, plan, and tasks files from specs//
  • Constitution file (.specify/memory/constitution.md)
  • Learning objectives and success criteria for the content unit
  • Output style template used (.claude/output-styles/lesson.md or similar)

Step 2: Load Detailed Rubric

Read the detailed tier criteria for each category:

```

Read: references/rubric-details.md

```

This file contains specific criteria defining Excellent/Good/Needs Work/Insufficient for each of the 6 categories.

Step 3: Evaluate Constitution Compliance First

Constitution compliance is a gate - if content fails constitutional requirements, it cannot proceed.

Use the constitution checklist:

```

Read: references/constitution-checklist.md

```

Assess all non-negotiable principles and requirements. Mark as Pass or Fail with specific violations noted.

If Constitution Compliance = Fail: Stop evaluation and report violations immediately. Content must be revised before proceeding.

If Constitution Compliance = Pass: Continue to weighted category evaluation.

Step 4: Score Each Weighted Category

For each of the 5 weighted categories (Technical Accuracy, Pedagogical Effectiveness, Writing Quality, Structure & Organization, AI-First Teaching):

  1. Review specific criteria from rubric-details.md for that category
  2. Assess content against criteria for each tier
  3. Assign tier (Excellent/Good/Needs Work/Insufficient) with score range
  4. Record specific evidence - Quote examples, note line numbers, cite specific passages
  5. Provide improvement recommendations - Concrete, actionable feedback

Step 5: Calculate Weighted Score

Apply the weighted formula:

```
Final Score = (Technical × 0.30) + (Pedagogical × 0.25) + (Writing × 0.20) +
              (Structure × 0.15) + (AI-First × 0.10)
```

Convert tier scores to numeric values:

  • Excellent: 95%
  • Good: 82%
  • Needs Work: 62%
  • Insufficient: 40%

(Or use specific numeric score within tier range if warranted)
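The tier-to-number conversion and the weighted formula combine as in this sketch (illustrative only; the weights are restated so the snippet is self-contained):

```python
# Default numeric value for each tier, per the conversion above.
TIER_SCORES = {
    "Excellent": 95,
    "Good": 82,
    "Needs Work": 62,
    "Insufficient": 40,
}

def score_from_tiers(tiers: dict[str, str]) -> float:
    """Convert per-category tier names to numbers, then apply the weights."""
    weights = {"technical": 0.30, "pedagogical": 0.25, "writing": 0.20,
               "structure": 0.15, "ai_first": 0.10}
    return sum(TIER_SCORES[tiers[cat]] * w for cat, w in weights.items())
```

For example, content rated "Good" in every category comes out at 82%, consistent with that tier's default value.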

Step 6: Generate Evaluation Report

Use the structured evaluation template:

```

Read: references/evaluation-template.md

```

Complete all sections:

  1. Executive Summary - Overall score, tier, pass/fail status
  2. Category Scores - Table showing each category score, tier, and weight contribution
  3. Detailed Findings - Evidence-based assessment for each category
  4. Strengths - What the content does well (specific examples)
  5. Areas for Improvement - Prioritized list of issues with recommendations
  6. Constitution Compliance Status - Pass/Fail with specific principle checks
  7. Actionable Next Steps - Concrete tasks to improve content

Step 7: Communicate Results

Present evaluation report with:

  • Clear verdict - Pass/Fail and overall quality tier
  • Evidence-based feedback - Specific quotes and line numbers
  • Prioritized improvements - Most critical issues first
  • Encouragement - Acknowledge strengths and effort

Evaluation Best Practices

Be Objective and Evidence-Based

  • Quote specific passages from content being evaluated
  • Reference line numbers or section headers
  • Compare against objective rubric criteria, not subjective preference
  • Use concrete metrics where possible (word count, readability scores, etc.)
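One such concrete metric is the Flesch-Kincaid grade level cited in the rubric. A rough sketch follows; the syllable count uses a naive vowel-group heuristic (an assumption of this example, real readability tools count syllables more carefully), so treat the result as an estimate:

```python
import re

def fk_grade(text: str) -> float:
    """Estimate Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    # Naive heuristic: each run of vowels counts as one syllable.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59
```

The rubric targets grade 8-10, so an evaluator could flag content that scores well outside that band for a manual readability pass.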

Focus on Standards, Not Perfection

  • Content rated "Good" (75-89%) is publication-ready with minor polish
  • Content rated "Excellent" (90-100%) exceeds standards but is not required
  • Focus improvements on moving "Needs Work" → "Good" before "Good" → "Excellent"

Provide Actionable Feedback

  • Don't just say "improve clarity" - specify which sentences are unclear and suggest rewrites
  • Don't just say "add examples" - suggest specific example types that would help
  • Prioritize recommendations: critical (blocking issues) → important → nice-to-have

Respect the Learning Journey

  • Recognize iterative improvement - drafts evolve through multiple passes
  • Celebrate progress and strengths
  • Frame criticism constructively as opportunities for growth
  • Remember: the goal is helping create excellent educational content, not gatekeeping

Quality Gates and Thresholds

Minimum Acceptance Threshold

  • Constitution Compliance: MUST be Pass (gate)
  • Overall Weighted Score: MUST be ≥ 75% (Good or better)
  • No category below 50%: Each individual category must achieve at least "Needs Work" tier

Recommended for Publication

  • Constitution Compliance: Pass
  • Overall Weighted Score: ≥ 82% (Good tier)
  • Technical Accuracy: ≥ 75% (Good tier) - Critical for credibility
  • Pedagogical Effectiveness: ≥ 75% (Good tier) - Critical for learning outcomes

Exemplary Content (Optional)

  • Overall Weighted Score: ≥ 90% (Excellent tier)
  • At least 3 categories at Excellent tier
  • No categories below Good tier
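The minimum acceptance threshold above reduces to three checks, sketched here as an illustrative helper (not part of the skill's files):

```python
def meets_minimum(constitution_pass: bool, overall: float,
                  category_scores: dict[str, float]) -> bool:
    """Check the minimum acceptance threshold: gate passed,
    overall >= 75%, and every category at least 50% (Needs Work tier)."""
    return (constitution_pass
            and overall >= 75
            and all(s >= 50 for s in category_scores.values()))
```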

Common Evaluation Scenarios

Scenario 1: Mid-Draft Check (Iterative)

Context: Writer requests feedback on partial draft

Approach:

  • Focus on foundational issues (structure, learning objectives, concept scaffolding)
  • Flag critical issues early (technical errors, constitution violations)
  • Provide guidance for remaining sections
  • Don't expect polish - prioritize content completeness and correctness

Scenario 2: Completion Review

Context: Writer believes content is complete and ready for validation

Approach:

  • Conduct full evaluation across all 6 categories
  • Calculate final weighted score
  • Check all quality gates and thresholds
  • Provide comprehensive report with prioritized improvements
  • Determine if content meets publication standards

Scenario 3: Pre-Validation Quality Gate

Context: Content enters SDD Validate phase

Approach:

  • Verify constitution compliance (gate)
  • Confirm minimum acceptance threshold (≥75%)
  • Validate all category scores meet minimums
  • Generate pass/fail recommendation with evidence
  • If fails gate: return to implementation with specific revision tasks

Scenario 4: On-Demand Spot Check

Context: User asks "How's this looking?" for specific section

Approach:

  • Evaluate relevant categories for that section (may not be all 6)
  • Provide quick feedback on specific concerns
  • Highlight any critical issues
  • Suggest improvements without full formal report
  • Use judgment on depth based on context

Resources and References

This skill includes detailed reference materials:

  • references/rubric-details.md - Comprehensive tier criteria for all 6 categories with specific indicators
  • references/constitution-checklist.md - Pass/Fail checklist for constitutional compliance evaluation
  • references/evaluation-template.md - Structured template for consistent evaluation reports

Load these references as needed during evaluation to ensure consistency and thoroughness.

---

Example Evaluation Flow

User Request: "Please evaluate this lesson draft: apps/learn-app/docs/chapter-3/lesson-2.md"

Evaluation Process:

  1. Read content: apps/learn-app/docs/chapter-3/lesson-2.md
  2. Load context: spec, plan, constitution, learning objectives
  3. Check constitution compliance: references/constitution-checklist.md
     - Result: Pass (all non-negotiables met)
  4. Load detailed rubric: references/rubric-details.md
  5. Evaluate each category:
     - Technical Accuracy: Good (80%) - Code works, minor type hint gaps
     - Pedagogical Effectiveness: Excellent (92%) - Strong scaffolding, great exercises
     - Writing Quality: Good (78%) - Clear writing, minor readability improvements
     - Structure & Organization: Good (85%) - Good flow, all LOs met
     - AI-First Teaching: Needs Work (65%) - AI exercises present but weak guidance
  6. Calculate weighted score:
     - (80 × 0.30) + (92 × 0.25) + (78 × 0.20) + (85 × 0.15) + (65 × 0.10) = 81.85%
     - Final Tier: Good (81.85%)
  7. Load template: references/evaluation-template.md
  8. Generate report with findings, strengths, improvements, next steps
  9. Communicate verdict: "Good (81.85%) - Ready for publication with minor improvements to AI-First Teaching section"

---

Use this skill to maintain consistent, objective, evidence-based quality standards for all educational content.
