# Graph: Extraction, Analysis & Categorical Compression

Use when extracting entities and relationships, building ontologies, compressing large graphs, or analyzing knowledge structures. Provides structural equivalence-based compression achieving 57-95% size reduction, k-bisimulation summarization, categorical quotient constructions, and metagraph hierarchical modeling with scale-invariant properties. Supports recursive refinement through graph topology metrics, including edge-to-node (|E|/|V|) ratios and automorphism analysis.

## Overview

Systematic extraction and analysis of entities, relationships, and ontological structures from unstructured text, enhanced with categorical metagraph compression: scale-invariant representation through structural equivalence, k-bisimulation summarization, and quotient constructions that preserve query-answering capability while achieving dramatic size reductions.
## Quick Start

### Basic Extraction

- Load schema: Read /mnt/skills/user/knowledge-graph/schemas/core_ontology.md for entity/relationship types
- Extract entities and relationships using the schema as a guide
- Format as JSON following /mnt/skills/user/knowledge-graph/templates/extraction_template.md
- Validate: Run the validation script on the extracted graph
### Compression Workflow
```bash
# 1. Extract → validate → analyze topology
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --topology
# 2. Compute structural equivalence and compress
python scripts/compress_graph.py graph.json --method k-bisim --k 5
# 3. Verify query preservation
python scripts/verify_compression.py original.json compressed.json --queries reachability,pattern
```
## Theoretical Foundation

### The Compression Mechanism

Structural equivalence enables compression through a precise mechanistic chain:

Equivalence → Redundancy → Quotient → Preservation

- Equivalence relations partition structures: Graph automorphisms and categorical isomorphisms identify structurally interchangeable elements; vertices with identical connection patterns to equivalent neighbors belong to the same automorphism orbit
- Orbits represent information redundancy: Of k vertices in one orbit, k-1 are informationally redundant, since they encode the same structural relationships
- Quotient constructions eliminate redundancy: Categorical quotients collapse equivalence classes to single representatives, and the universal property guarantees that any construction respecting the equivalence factors uniquely through the compressed representation
- Functors preserve structure across scales: The quotient functor Q: C → C/R is full and bijective on objects, so no essential categorical information is lost
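To make the Equivalence → Redundancy → Quotient chain concrete, here is a minimal, self-contained sketch of a graph quotient; the input format and helper are illustrative, not the skill's scripts:

```python
def quotient_graph(nodes, edges, eq_class_of):
    """Collapse each equivalence class to a single representative.

    nodes:       iterable of node ids
    edges:       iterable of (source, target) pairs
    eq_class_of: dict mapping node id -> equivalence-class label
    """
    # One canonical representative per class (first node seen wins).
    rep = {}
    for n in nodes:
        rep.setdefault(eq_class_of[n], n)
    # Re-route edges through representatives; the set comprehension
    # deduplicates the parallel edges the collapse creates.
    q_nodes = set(rep.values())
    q_edges = {(rep[eq_class_of[s]], rep[eq_class_of[t]]) for s, t in edges}
    return q_nodes, q_edges

# b and c are structurally interchangeable leaves, so they collapse:
print(quotient_graph(["a", "b", "c"],
                     [("a", "b"), ("a", "c")],
                     {"a": 0, "b": 1, "c": 1}))
# e.g. ({'a', 'b'}, {('a', 'b')})
```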
### Quantitative Foundation
The connection between automorphisms and Kolmogorov complexity:
```
K(G) ≤ K(G/Aut(G)) + log|Aut(G)| + O(1)
```
Graphs with large automorphism groups have lower Kolmogorov complexity because only one representative from each orbit needs encoding. For highly symmetric structures the compression factor can reach n/log n; a complete graph K_n, for example, has |Aut(G)| = n! and can essentially be encoded by its vertex count alone.
### Why This Matters for Knowledge Graphs
Knowledge graphs exhibit natural structural regularities:
| Pattern | Compression Mechanism | Typical Reduction |
|---------|----------------------|-------------------|
| Type hierarchies | Automorphism orbits | 40-60% |
| Repeated subgraphs | k-bisimulation equivalence | 50-80% |
| Community structure | Block quotients | 30-50% |
| Self-similar patterns | Scale-invariant quotients | 60-95% |
## Core Capabilities

### 1. Structured Entity Extraction
Extract entities with confidence scores, provenance tracking, and property attribution:
- Entity types: Person, Organization, Concept, Event, Document, Technology, Location
- Confidence scoring: 0.0-1.0 scale based on evidence clarity
- Provenance metadata: Source document, location, timestamp
- Alias tracking: Capture all name variations
Key principle: Every extraction must include confidence score and source tracking for auditability.
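As an illustration of these requirements, one extracted entity might look like the record below; the field names are an assumption here, and the authoritative JSON format lives in templates/extraction_template.md:

```python
entity = {
    "id": "person_jane_smith",
    "type": "Person",
    "name": "Jane Smith",
    "aliases": ["Dr. Jane Smith", "J. Smith"],    # alias tracking
    "confidence": 0.95,                           # explicit statement in source
    "provenance": {
        "source_document": "mit_faculty.pdf",     # hypothetical source
        "source_location": "page 3, section 2",
        "extraction_timestamp": "2024-01-15T10:30:00Z",
        "extractor_version": "knowledge-graph-1.0",
    },
}
```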
### 2. Relationship Mapping

Identify and classify relationships between entities:

- Core relationships: WORKS_FOR, AFFILIATED_WITH, RELATED_TO, AUTHORED, CITES, USES, LOCATED_IN, IMPLEMENTS
- Domain-specific relationships: Load schemas from /mnt/skills/user/knowledge-graph/schemas/
- Directionality awareness: Track relationship directionality
- Property attribution: Capture relationship metadata (dates, roles, contexts)
### 3. Domain-Specific Schemas

General domains use `core_ontology.md`. Coding/software domains additionally use `coding_domain.md`, which adds:
- CodeEntity, Repository, API, Library, Architecture, Bug types
- DEPENDS_ON, CALLS, INHERITS_FROM, FIXES, DEPLOYED_ON relationships
- Language-specific extraction patterns
### 4. Structural Equivalence Analysis
Identify and exploit structural redundancy through automorphism detection:
Automorphism-Based Compression:
```python
# Schematic pseudocode (a runnable sketch follows below):
# compute the automorphism group, partition nodes into orbits,
# then keep one canonical representative per orbit.
aut_group = compute_automorphisms(graph)
orbits = partition_by_orbits(graph.nodes, aut_group)
compressed_nodes = [orbit.canonical_representative() for orbit in orbits]
compression_ratio = len(graph.nodes) / len(compressed_nodes)
```
Equivalence Types:
- Structural equivalence: Identical connection patterns (strictest)
- Regular equivalence: Same relationship types to equivalent alters
- Automorphic equivalence: Permutable without changing structure
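The snippet above is schematic; the following self-contained brute-force version (O(n!), so toy graphs only) shows what orbit computation actually does:

```python
from itertools import permutations

def automorphism_orbits(nodes, edges):
    """Partition nodes into automorphism orbits by brute force.

    An automorphism is a node permutation mapping the (undirected)
    edge set onto itself. The O(n!) search is for illustration only.
    """
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    orbit_of = {n: {n} for n in nodes}
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        image = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if image == edge_set:                 # perm is an automorphism
            for n in nodes:
                orbit_of[n].add(mapping[n])   # collect each node's images
    # Nodes sharing the same image set lie in one orbit.
    return {frozenset(o) for o in orbit_of.values()}

# All four vertices of a 4-cycle are interchangeable: one orbit.
print(automorphism_orbits("abcd",
                          [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]))
```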
### 5. k-Bisimulation Summarization
Compress graphs while preserving query semantics using k-bisimulation:
Definition: Two nodes are k-bisimilar if:
- They have the same labels
- They have the same edge types to k-bisimilar neighbors
- This condition holds recursively to depth k
Implementation:
```bash
# k=5 is sufficient for most graphs
python scripts/compress_graph.py graph.json \
    --method k-bisim \
    --k 5 \
    --preserve-queries reachability,pattern
```
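Conceptually, k-bisimulation classes can be computed by k rounds of partition refinement. The sketch below assumes a simple (node, edge-type, node) triple format rather than the script's actual internals:

```python
def k_bisimulation_blocks(nodes, labels, edges, k):
    """Partition nodes into k-bisimulation classes.

    labels: dict node -> label
    edges:  list of (source, edge_type, target) triples
    Round 0 groups by label; each later round refines by the sorted
    multiset of (edge_type, neighbor-block) signatures.
    """
    block = {n: labels[n] for n in nodes}                 # 0-bisimulation
    for _ in range(k):
        new_block = {
            n: (block[n], tuple(sorted((t, block[dst])
                                       for src, t, dst in edges if src == n)))
            for n in nodes
        }
        ids = {}                                          # canonicalize blocks
        block = {n: ids.setdefault(new_block[n], len(ids)) for n in nodes}
    return block

nodes = ["p1", "p2", "o1"]
labels = {"p1": "Person", "p2": "Person", "o1": "Org"}
edges = [("p1", "WORKS_FOR", "o1"), ("p2", "WORKS_FOR", "o1")]
print(k_bisimulation_blocks(nodes, labels, edges, k=2))
# {'p1': 0, 'p2': 0, 'o1': 1}: p1 and p2 are 2-bisimilar
```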
Empirical Results:
- k > 5 yields minimal additional partition refinement
- Achieves 95% reduction for reachability queries
- Achieves 57% reduction for pattern matching
- Incremental update cost: O(Δ·d^k), where Δ is the number of changes and d is the maximum degree
### 6. Categorical Quotient Construction
Apply category-theoretic compression with provable structure preservation:
The Universal Property Guarantee:
For any quotient Q: C → C/R, if H: C → D is any functor such that H(f) = H(g) whenever f ~ g in R, then H factors uniquely as H = H' ∘ Q.
This unique factorization means the quotient is the "freest" (most compressed) object respecting the equivalence: any construction built on the original that respects the equivalence can be equivalently built on the quotient.
Skeleton Construction:
```python
# Schematic (a runnable sketch follows below): every category is
# equivalent to its skeleton, which contains exactly one representative
# per isomorphism class; all categorical properties are preserved
# (limits, colimits, exactness).
skeleton = compute_skeleton(category)
```
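The compute_skeleton call above is schematic. One way to realize it, assuming the caller supplies the isomorphism test (a hypothetical interface, not the skill's API):

```python
def compute_skeleton(objects, isomorphic):
    """Keep one representative per isomorphism class.

    objects:    iterable of the category's objects
    isomorphic: callable (a, b) -> bool deciding whether a ≅ b
    """
    representatives = []
    for obj in objects:
        if not any(isomorphic(obj, rep) for rep in representatives):
            representatives.append(obj)
    return representatives

# Toy example: finite sets up to bijection, i.e. up to cardinality.
sets = [{1, 2}, {"a", "b"}, {1}, set()]
print(compute_skeleton(sets, lambda a, b: len(a) == len(b)))
# [{1, 2}, {1}, set()]  (one representative per cardinality)
```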
### 7. Metagraph Hierarchical Modeling
Support edge-of-edge structures for multi-scale representation:
Metagraph Definition: MG = ⟨V, MV, E, ME⟩
- V: vertices
- MV: metavertices (each containing an embedded metagraph fragment)
- E: edges connecting sets of vertices
- ME: metaedges connecting vertices, edges, or both
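A minimal container sketch of this four-part definition (the field layout is an assumption; the authoritative format is templates/metagraph_template.md):

```python
from dataclasses import dataclass, field

@dataclass
class Metagraph:
    """MG = <V, MV, E, ME> as a plain recursive container."""
    vertices: set = field(default_factory=set)         # V
    metavertices: dict = field(default_factory=dict)   # MV: id -> embedded Metagraph
    edges: list = field(default_factory=list)          # E: frozensets of vertex ids
    metaedges: list = field(default_factory=list)      # ME: ends may be vertices or edges

inner = Metagraph(vertices={"x", "y"}, edges=[frozenset({"x", "y"})])
mg = Metagraph(
    vertices={"a", "b"},
    metavertices={"m1": inner},                  # m1 embeds a metagraph fragment
    edges=[frozenset({"a", "b"})],
    metaedges=[("m1", frozenset({"a", "b"}))],   # metaedge: metavertex <-> edge
)
```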
Why Metagraphs Enable Scale Invariance:
The edge-of-edge capability creates holonic structure: self-similar patterns in which the relationship between a metavertex and its contents mirrors the relationship between the entire metagraph and its top-level components. Automorphisms operate at multiple levels simultaneously, creating compression opportunities at each scale when the automorphism structures are isomorphic across levels.
2-Category Interpretation:
- 0-cells: vertices/elements
- 1-morphisms: edges connecting sets
- 2-morphisms: metaedges relating edges
The interchange law ensures scale-independent composition.
### 8. Topology Metrics & Quality Validation
Graph Quality Metrics:
| Metric | Formula | Target | Significance |
|--------|---------|--------|--------------|
| Edge-to-Node Ratio | \|E\|/\|V\| | ≥4:1 | Enables emergence through dense connectivity |
| Isolation Rate | \|V_isolated\|/\|V\| | <20% | Measures integration completeness |
| Clustering Coefficient | Local triangles/possible triangles | >0.3 | Small-world property indicator |
| Fractal Dimension | d_B from box-covering | Finite | Self-similarity/compressibility |
| Average Path Length | Mean geodesic distance | Low | Information flow efficiency |
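The first three metrics in the table can be computed with the standard library alone; a minimal sketch over a simple undirected edge list (an illustrative representation, not the script's internals):

```python
from itertools import combinations

def topology_metrics(nodes, edges):
    """Edge-to-node ratio, isolation rate, and mean local clustering."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    isolated = sum(1 for n in adj if not adj[n])
    coeffs = []
    for n in adj:
        nb = adj[n]
        if len(nb) < 2:
            continue                      # clustering undefined for degree < 2
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        coeffs.append(links / (len(nb) * (len(nb) - 1) / 2))
    return {
        "edge_node_ratio": len(edges) / len(adj),
        "isolation_rate": isolated / len(adj),
        "avg_clustering": sum(coeffs) / len(coeffs) if coeffs else 0.0,
    }

# A triangle: ratio 1.0, no isolated nodes, clustering 1.0.
print(topology_metrics({"a", "b", "c"}, [("a", "b"), ("b", "c"), ("a", "c")]))
```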
Scale-Invariance Indicators:
```
N_B(l_B) ~ l_B^(-d_B)
```
Networks with finite fractal dimension d_B are self-similar and can be compressed at multiple resolutions, with the compression ratio scaling as l_B^(d_B).
Validation Script:
```bash
python scripts/validate_graph.py graph.json --topology --compression-potential
```
### 9. Information-Theoretic Analysis
Structural Entropy:
```
H_s(G) = (n choose 2)h(p) - n·log(n) + O(n)
```
The -n·log(n) term is the compression gain from removing vertex-label information, i.e. from counting the structure only up to isomorphism.
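For intuition, the formula's leading terms are easy to evaluate numerically under its Erdős–Rényi assumptions (interpreting log as log2 for bits and dropping the O(n) term):

```python
from math import log2

def structural_entropy_bits(n, p):
    """Leading terms of H_s(G) = C(n,2)·h(p) - n·log n for G(n, p)."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)    # binary entropy of p
    return (n * (n - 1) // 2) * h - n * log2(n)

# The -n·log n label-removal gain is about 9966 bits at n = 1000.
print(structural_entropy_bits(1000, 0.01))
```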
Minimum Description Length (MDL):
For graph G and model M:
```
L(G,M) = L(M) + L(G|M)
```
Optimal compression minimizes this total description length. Community structure reduces entropy by ~k·log(n) bits for k communities.
Compressibility Predictors:
- High transitivity → higher compressibility
- Degree heterogeneity → higher compressibility
- Hierarchical structure → enables predictable transitions, lower entropy rates
## Extraction Guidelines

### Confidence Scoring Rules
| Score | Criteria | Example |
|-------|----------|---------|
| 0.9-1.0 | Explicitly stated with clear evidence | "Dr. Jane Smith works for MIT" |
| 0.7-0.89 | Strongly implied by context | Person with @mit.edu email |
| 0.5-0.69 | Reasonably inferred but ambiguous | Co-authorship implies collaboration |
| 0.3-0.49 | Weak inference, requires validation | Similar domain suggests relationship |
| 0.0-0.29 | Speculative, likely incorrect | Pure assumption |
### ID Generation Strategy

Create stable, meaningful identifiers:

- Format: `{type}_{normalized_name}` (e.g., `person_jane_smith`, `org_mit`)
- Normalization: Lowercase, replace spaces with underscores, remove special characters
- Uniqueness: Add a numeric suffix if a collision occurs
- Stability: The same entity in different documents should generate the same ID
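A minimal sketch of these normalization rules (collision handling is omitted, since deciding whether two mentions are the same entity is a separate concern):

```python
import re

def make_id(entity_type: str, name: str) -> str:
    """Build a stable {type}_{normalized_name} identifier."""
    # Lowercase, then collapse runs of non-alphanumerics to underscores.
    norm = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{entity_type}_{norm}"

print(make_id("person", "Jane Smith"))   # person_jane_smith
print(make_id("org", "MIT"))             # org_mit
```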
### Provenance Best Practices

Always include:

- `source_document`: Document ID or filename
- `source_location`: Page number, section, line range
- `extraction_timestamp`: ISO 8601 format
- `extractor_version`: Skill version identifier
## Advanced Workflows

### Compression Pipeline
```bash
# 1. Initial extraction
# (Extract to graph.json)
# 2. Validate and analyze
python scripts/validate_graph.py graph.json
python scripts/analyze_graph.py graph.json --full
# 3. Compute structural equivalence
python scripts/structural_equivalence.py graph.json \
--output equivalence_classes.json \
--method automorphism
# 4. Apply k-bisimulation compression
python scripts/compress_graph.py graph.json \
--equivalence equivalence_classes.json \
--method k-bisim --k 5 \
--output compressed.json
# 5. Verify preservation
python scripts/verify_compression.py graph.json compressed.json \
--queries reachability,pattern,neighborhood
# 6. Generate topology report
python scripts/topology_metrics.py compressed.json --report
```
### Iterative Refinement with Compression
1. Initial extraction: Broad pass capturing entities/relationships
2. Topology analysis: Compute |E|/|V| ratio, isolation rate, clustering
3. Compression analysis: Identify automorphism orbits and k-bisimilar classes
4. Strategic refinement: Focus on:
   - Central concepts with weak connections
   - Isolated high-confidence entities
   - Low-compression-potential regions (may need restructuring)
5. Compress and validate: Apply quotient construction, verify query preservation
6. Repeat: Continue until quality thresholds are met AND the compression ratio stabilizes
Termination criteria (encoded in the sketch below):
- Isolation rate < 20%
- |E|/|V| ratio ≥ 4:1
- Compression ratio improvement < 5% between iterations
- Query preservation verified
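The termination test can be written down directly; in this sketch the metric keys are hypothetical stand-ins for the outputs of the analysis scripts:

```python
def should_stop(metrics, ratio, prev_ratio):
    """All four termination criteria from the list above."""
    return (
        metrics["isolation_rate"] < 0.20          # integration complete enough
        and metrics["edge_node_ratio"] >= 4.0     # dense enough
        and metrics["queries_preserved"]          # verification passed
        and (ratio - prev_ratio) < 0.05           # compression has stabilized
    )

print(should_stop(
    {"isolation_rate": 0.12, "edge_node_ratio": 4.3, "queries_preserved": True},
    ratio=0.58, prev_ratio=0.55,
))  # True
```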
### Multi-Scale Metagraph Construction
For complex domains requiring hierarchical representation:
```bash
# 1. Extract at multiple granularities
python scripts/extract_hierarchical.py source.txt \
--levels strategic,tactical,operational \
--output metagraph.json
# 2. Compute cross-level automorphisms
python scripts/metagraph_automorphisms.py metagraph.json
# 3. Apply scale-invariant compression
python scripts/compress_metagraph.py metagraph.json \
--preserve-hierarchy \
--output compressed_metagraph.json
```
## Common Patterns

### Pattern: Query-Preserving Compression
Compress while guaranteeing specific query types remain answerable:
```python
# Define query preservation requirements
queries = {
"reachability": True, # 95% reduction possible
"pattern_match": True, # 57% reduction possible
"neighborhood_k": 3, # Preserve 3-hop neighborhoods
}
# Compress with guarantees (schematic: compress_with_guarantees is
# illustrative, not a shipped API)
compressed = compress_with_guarantees(
graph,
method="k-bisimulation",
k=max(5, queries["neighborhood_k"]),
preserve=queries
)
```
### Pattern: Incremental Compression Maintenance
Maintain compression as graph evolves:
```python
# Update cost: O(Δ·d^k), where Δ = number of changes,
# d = maximum degree, and k = bisimulation depth.
def update_compression(compressed_graph, changes):
    # Schematic: recompute bisimulation blocks only where changes landed.
    affected_classes = identify_affected_equivalence_classes(changes)
    return recompute_local_bisimulation(compressed_graph, affected_classes, k=5)
```
### Pattern: Categorical Ontology Integration
Use ologs (ontology logs) for categorical knowledge representation:
```python
# Olog: category where objects = noun phrases, morphisms = verb phrases
olog = {
"objects": ["a person", "an organization", "a concept"],
"morphisms": [
{"source": "a person", "target": "an organization", "label": "works for"},
{"source": "a concept", "target": "a concept", "label": "relates to"}
]
}
# Yoneda embedding: object determined by morphisms into it
# Compression: store relationships, not internal structure
```
## Error Handling

### Compression Quality Issues
When compression produces unexpected results:
- Over-compression: Raise the k value in k-bisimulation to refine the partition (default k=5)
- Under-compression: Check for missing type labels, inconsistent schemas
- Query degradation: Verify query type is in preservation set
- Scale-invariance failure: Check for unbalanced hierarchical structure
### Topology Violations
When graph metrics fall outside targets:
- |E|/|V| < 4: Graph too sparse; identify disconnected concepts and add bridging relationships
- Isolation > 20%: Too many orphan nodes; run connectivity analysis
- Clustering < 0.3: Lacks the small-world property; add local triangulation
## File Structure
```
/mnt/skills/user/knowledge-graph/
├── SKILL.md # This file
├── schemas/
│ ├── core_ontology.md # Universal entity/relationship types
│ ├── coding_domain.md # Software development extension
│ └── categorical_ontology.md # Category-theoretic type system
├── templates/
│ ├── extraction_template.md # JSON format specification
│ └── metagraph_template.md # Hierarchical metagraph format
└── scripts/
├── validate_graph.py # Quality validation
├── merge_graphs.py # Deduplication and merging
├── analyze_graph.py # Refinement strategy generation
├── compress_graph.py # k-bisimulation compression
├── structural_equivalence.py # Automorphism computation
├── topology_metrics.py # Graph topology analysis
└── verify_compression.py # Query preservation verification
```
## Dependencies

All scripts require Python 3.7+ and use only the standard library for core functionality (no external packages). NetworkX is optional, enabling advanced topology metrics.
## Best Practices Summary
- Always start with schema: Load appropriate ontology before extraction
- Include confidence scores: Never omit them; use 0.5 if uncertain
- Track provenance: Every entity/relationship needs source metadata
- Validate early: Run validation after each extraction
- Analyze topology: Check |E|/|V| ratio before refinement
- Compress strategically: Use k=5 for k-bisimulation (sufficient for most graphs)
- Preserve queries: Specify which query types must remain answerable
- Iterate with metrics: Let topology and compression metrics guide improvement
## Integration with Other Skills
This skill composes naturally with:
- hierarchical-reasoning: Strategic→tactical→operational maps to metagraph levels
- obsidian-markdown: Compressed graphs export as linked note structures
- knowledge-orchestrator: Automatic routing for extraction→compression→documentation workflows
- infranodus-orchestrator: Text network analysis → k-bisimulation compression
### Hierarchical Reasoning Integration
```yaml
mapping:
  strategic_level: metagraph_level_0
  tactical_level: metagraph_level_1
  operational_level: metagraph_level_2
  convergence_metrics: [compression_ratio, query_preservation]
```
## Evaluation Criteria
A high-quality extraction with compression demonstrates:
- Completeness: Major entities and relationships captured
- Accuracy: High confidence scores (average >0.7) that have been validated
- Connectivity: |E|/|V| ≥ 4:1, isolation <20%
- Compressibility: Achieves ≥50% reduction via k-bisimulation
- Preservation: Specified queries remain answerable post-compression
- Scale-invariance: Finite fractal dimension for hierarchical structures
---

**Core Philosophy**: Knowledge graphs emerge through iterative refinement: initial extraction establishes structure, topology analysis reveals density gaps, structural equivalence enables compression, and categorical quotients preserve essential relationships while eliminating redundancy. The compression is "lossy but structure-preserving": categorical equivalence guarantees that compressed representations support the same inferences as their originals.