pseo-data

Skill from lisbeth718/pseo-skills

What it does

Designs and generates structured data architecture for programmatic SEO pages, ensuring type-safe, SEO-optimized content models and data pipelines.

Part of lisbeth718/pseo-skills (12 items)

Installation

No install commands found in docs; showing the default command. Check GitHub for actual instructions.

Quick install with npx:

```
npx skills add lisbeth718/pseo-skills --skill pseo-data
```

Added: Feb 4, 2026

Skill Details

SKILL.md

Design and implement the structured data architecture that powers programmatic SEO pages, including content models, data sources, slug generation, and data-fetching layers. Use when setting up or refactoring the data foundation for pSEO, designing content models, or building the data pipeline that feeds page templates.

Overview

# pSEO Data Architecture

Design and implement the structured data layer that feeds all programmatic SEO pages. This is the foundation every other pSEO skill depends on.

Core Principles

  1. Single source of truth: All page data flows from one data layer
  2. SEO-complete models: Every content model includes all fields needed for metadata, schema markup, and linking
  3. Unique slugs by construction: Slug generation enforces uniqueness at the data level
  4. Type safety: All data models are fully typed (TypeScript interfaces/types)
  5. Separation of concerns: Data fetching is decoupled from page rendering

Implementation Steps

1. Define Content Models

Create TypeScript interfaces for each page type using a two-tier model. The lightweight index tier is safe to hold in memory for all pages; the heavy full tier is loaded per-page only.

```typescript
// Index tier: safe to load all at once (~1KB per page)
interface PageIndex {
  slug: string;            // unique, URL-safe
  title: string;           // page title (50-60 chars target)
  metaDescription: string; // meta description (150-160 chars target)
  h1: string;              // primary heading (can differ from title)
  canonicalPath: string;   // canonical URL path
  category: string;        // for hub-spoke and breadcrumbs
  lastModified: string;    // ISO date for sitemap
}

// Full tier: extends PageIndex with heavy fields (~50-500KB per page)
interface BaseSEOContent extends PageIndex {
  introText: string;
  bodyContent: string;
  faqs?: FAQ[];
  relatedSlugs?: string[];
  featuredImage?: SEOImage;
}
```

Extend BaseSEOContent for each page type with domain-specific fields. The interfaces above show the minimum required fields. See references/content-models.md for the full definitions (which add subcategory, tags, publishedDate, status, and more) and extended type examples (LocationPage, ProductPage, ComparisonPage, CategoryPage).
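
As a minimal illustration of the extension pattern (the real extended types live in references/content-models.md; the fields below are invented for this example):

```typescript
// Extension pattern sketch only: the actual LocationPage definition is in
// references/content-models.md. city/region/nearbySlugs are placeholders.
interface LocationPage extends BaseSEOContent {
  city: string;
  region: string;
  nearbySlugs: string[]; // candidates for getRelatedPages() on this page type
}
```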

2. Build the Data-Fetching Layer

Create a centralized data module (e.g., lib/data.ts or src/data/index.ts) that exports:

  • getAllSlugs() - Returns all valid slugs for static generation. Must handle pagination internally when the data source has 1000+ records (fetch in batches, return the complete list).
  • getPageData(slug) - Returns full content for a single page
  • getPagesByCategory(category, opts?) - Returns pages in a category for hub pages. Accept optional limit and offset for paginated hub pages.
  • getRelatedPages(slug, limit?) - Returns related pages for internal linking
  • getAllCategories() - Returns all categories for navigation and hubs
  • getPageCount() - Returns total page count (useful for sitemap splitting and build diagnostics)

All functions must be:

  • Cached or memoized during build to avoid redundant reads
  • Typed with explicit return types
  • Guarded against missing or malformed data
  • Internally paginated when the data source imposes limits (e.g., CMS APIs with 100-item pages). The consumer should never need to handle pagination; the data layer abstracts it.
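
A minimal sketch of such a module, assuming a JSON-file source; the paths (data/pages.json, data/pages/<slug>.json) and the loadIndex() helper are illustrative, not prescribed:

```typescript
// lib/data/index.ts (sketch): assumes the PageIndex/BaseSEOContent types
// from step 1 and a JSON-file data source; file paths are placeholders.
import fs from "node:fs/promises";

let indexCache: PageIndex[] | null = null;

// Memoized index load: one read per build, reused by every function below.
async function loadIndex(): Promise<PageIndex[]> {
  if (!indexCache) {
    indexCache = JSON.parse(await fs.readFile("data/pages.json", "utf8"));
  }
  return indexCache!;
}

export async function getAllSlugs(): Promise<string[]> {
  return (await loadIndex()).map((p) => p.slug);
}

export async function getPageData(slug: string): Promise<BaseSEOContent> {
  // Heavy tier: one file per page, loaded on demand, never all at once.
  return JSON.parse(await fs.readFile(`data/pages/${slug}.json`, "utf8"));
}

export async function getPagesByCategory(
  category: string,
  opts?: { limit?: number; offset?: number },
): Promise<PageIndex[]> {
  const matches = (await loadIndex()).filter((p) => p.category === category);
  const offset = opts?.offset ?? 0;
  const end = opts?.limit != null ? offset + opts.limit : undefined;
  return matches.slice(offset, end);
}

export async function getAllCategories(): Promise<string[]> {
  return [...new Set((await loadIndex()).map((p) => p.category))];
}

export async function getPageCount(): Promise<number> {
  return (await loadIndex()).length;
}

// getRelatedPages() follows the same pattern, resolving relatedSlugs
// against the cached index.
```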

3. Implement Slug Generation

Design a slug strategy that:

  • Produces URL-safe, lowercase, hyphenated strings
  • Guarantees uniqueness across the entire dataset
  • Is deterministic (same input always produces same slug)
  • Includes a collision detection mechanism
  • Follows a consistent URL hierarchy (e.g., /category/page-slug)
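
One way to satisfy all five constraints, sketched below; the numeric-suffix collision strategy is an assumption, and determinism holds only when the input ordering is stable across builds:

```typescript
// Deterministic, URL-safe slug generation.
export function slugify(input: string): string {
  return input
    .toLowerCase()
    .normalize("NFKD")               // decompose accented characters
    .replace(/[\u0300-\u036f]/g, "") // strip diacritics
    .replace(/[^a-z0-9]+/g, "-")     // collapse non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

// Uniqueness by construction: collisions get a numeric suffix. Note that
// determinism requires a stable input order across builds.
export function buildSlugs(titles: string[]): Map<string, string> {
  const seen = new Set<string>();
  const result = new Map<string, string>();
  for (const title of titles) {
    const base = slugify(title);
    let slug = base;
    for (let n = 2; seen.has(slug); n++) slug = `${base}-${n}`;
    seen.add(slug);
    result.set(title, slug);
  }
  return result;
}
```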

4. Validate Data Integrity

Build a validation function or script that checks:

  • No duplicate slugs exist
  • All required fields are present and non-empty
  • Title and description lengths are within SEO targets
  • All category references resolve to valid categories
  • No orphan pages (pages not reachable through any category)
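
A sketch of these checks, reusing loadIndex() from the step 2 sketch; the length thresholds mirror the targets in the content-model comments, and the canonical category list is passed in rather than derived from the pages (deriving it would make the check vacuous):

```typescript
// Returns a list of integrity errors; an empty array means the dataset passes.
export async function validateDataset(
  canonicalCategories: string[],
): Promise<string[]> {
  const errors: string[] = [];
  const pages = await loadIndex(); // index tier is enough for these checks
  const categories = new Set(canonicalCategories);
  const seen = new Set<string>();

  for (const p of pages) {
    if (seen.has(p.slug)) errors.push(`duplicate slug: ${p.slug}`);
    seen.add(p.slug);
    if (!p.title || !p.metaDescription || !p.h1 || !p.canonicalPath)
      errors.push(`${p.slug}: missing required field`);
    if (p.title.length < 50 || p.title.length > 60)
      errors.push(`${p.slug}: title length ${p.title.length} outside 50-60`);
    if (p.metaDescription.length < 150 || p.metaDescription.length > 160)
      errors.push(`${p.slug}: meta description outside 150-160`);
    // Under this model a page is reached via its category hub, so an
    // unresolved category also flags a likely orphan page.
    if (!categories.has(p.category))
      errors.push(`${p.slug}: unresolved category "${p.category}"`);
  }
  return errors;
}
```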

5. Set Up Data Source Integration

Based on the data source ($ARGUMENTS or detected):

JSON files: Create a data/ directory with typed JSON, a loader, and build-time validation.

CMS (headless): Create API client with typed responses, implement caching, handle pagination for 1000+ items.

Database: Create a query layer with connection pooling, implement cursor-based pagination, add query caching.

MDX files: Set up frontmatter schema validation, create a content loader with gray-matter parsing.

API: Create a typed API client, implement rate limiting and retry logic, add response caching.
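
As one concrete example, a sketch of the MDX path using gray-matter; the frontmatter field names follow PageIndex, and the content/<slug>.mdx layout is an assumption:

```typescript
// MDX loader sketch: frontmatter carries the index-tier fields and the
// body becomes bodyContent. The <dir>/<slug>.mdx layout is assumed.
import fs from "node:fs/promises";
import path from "node:path";
import matter from "gray-matter";

export async function loadMdxPage(
  dir: string,
  slug: string,
): Promise<BaseSEOContent> {
  const raw = await fs.readFile(path.join(dir, `${slug}.mdx`), "utf8");
  const { data, content } = matter(raw); // split frontmatter from body

  // Minimal frontmatter schema validation at build time.
  for (const field of ["slug", "title", "metaDescription", "h1"]) {
    if (!data[field]) throw new Error(`${slug}.mdx: missing frontmatter "${field}"`);
  }

  return {
    ...(data as PageIndex),
    introText: data.introText ?? "",
    bodyContent: content,
  };
}
```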

Scale Limits

The in-memory and file-based patterns in this skill work up to ~10K pages. Beyond that:

  • 10K-50K pages: Requires a database (PostgreSQL, MySQL). In-memory index tier becomes borderline at 50K (~50MB). File-based data sources are too slow.
  • 50K-100K+ pages: Requires database + cache layer (Redis) + cursor-based pagination. getAllSlugs() must use cursor iteration rather than returning an array (see the sketch after this list). Data sufficiency gating prevents generating thin pages.
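
A sketch of that signature change using keyset pagination; query() is a hypothetical stand-in for your database client, and the table and column names are assumptions (the complete database-backed pattern lives in pseo-scale):

```typescript
// Cursor (keyset) iteration: constant memory regardless of page count.
// `query` is a hypothetical database client call, not a real API.
declare function query(sql: string, params: unknown[]): Promise<{ slug: string }[]>;

export async function* getAllSlugs(batchSize = 1000): AsyncGenerator<string> {
  let cursor = ""; // last slug seen; slugs sort lexicographically
  while (true) {
    const rows = await query(
      "SELECT slug FROM pages WHERE slug > $1 ORDER BY slug LIMIT $2",
      [cursor, batchSize],
    );
    if (rows.length === 0) break;
    for (const row of rows) yield row.slug;
    cursor = rows[rows.length - 1].slug; // advance past this batch
  }
}
```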

See pseo-scale for the complete database-backed data layer, sufficiency scoring, and scale-specific patterns.

Memory-Conscious Data Patterns

At 1000+ pages, how data is loaded matters more than what is loaded. A full content model with body text, FAQs, and images can be 50-500KB per page. Loading all pages into memory simultaneously will OOM.

Two-tier data model:

Split the data layer into lightweight index data and full page data. The PageIndex and BaseSEOContent interfaces from section 1 define the two tiers:

  • getAllSlugs(), getRelatedPages(), getPagesByCategory(): return PageIndex[] (lightweight, ~1KB per page)
  • getPageData(): returns BaseSEOContent (or an extended type) for a single page (heavy, ~50-500KB per page, only one at a time)

Never do this:

```typescript
// Loads ALL full content into memory; will OOM at scale
const allPages = await Promise.all(slugs.map(s => getPageData(s)));
```

Instead:

```typescript
// Process pages one at a time or in small batches
for (const slug of slugs) {
  const page = await getPageData(slug);
  await processPage(page);
  // page is GC'd after each iteration
}
```

CMS/API pagination:

  • Fetch in batches of 100-250 records
  • Yield or push to an array incrementally; don't hold all API responses in memory simultaneously
  • If using GraphQL, only request index fields in list queries, full fields in single-item queries
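
A sketch of batched fetching as an async generator; fetchBatch() is a hypothetical stand-in for your CMS client's list endpoint:

```typescript
// Offset-based batch iteration: at most one batch is resident in memory.
// `fetchBatch` is a placeholder for your CMS/API client's list call.
declare function fetchBatch(opts: { offset: number; limit: number }): Promise<PageIndex[]>;

export async function* iterateIndex(batchSize = 200): AsyncGenerator<PageIndex> {
  for (let offset = 0; ; offset += batchSize) {
    const batch = await fetchBatch({ offset, limit: batchSize });
    for (const item of batch) yield item; // hand records off one at a time
    if (batch.length < batchSize) break;  // short batch signals the last page
  }
}

// Usage: consumers iterate without ever seeing pagination.
// for await (const page of iterateIndex()) { /* process index record */ }
```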

File Organization

```
lib/
  data/
    index.ts      # public API (re-exports)
    types.ts      # TypeScript interfaces
    fetcher.ts    # data source integration
    slugs.ts      # slug generation and validation
    validation.ts # data integrity checks
    cache.ts      # build-time caching utilities
```

Quality Checks

Before considering this complete:

  • [ ] All content models extend BaseSEOContent (which extends PageIndex)
  • [ ] getAllSlugs() returns 0 duplicates
  • [ ] Data validation passes with zero errors
  • [ ] Data layer exports are fully typed with no any
  • [ ] Fetching is memoized for build performance
  • [ ] A test or script can validate the full dataset
  • [ ] Two-tier data model implemented (index data vs. full page data)
  • [ ] No function loads all full page content into memory simultaneously
  • [ ] CMS/API fetching uses batched pagination internally

Relationship to Other Skills

This skill provides the data foundation for:

  • pseo-templates: Consumes getPageData() and getAllSlugs()
  • pseo-metadata: Reads title, description, canonical from content models
  • pseo-schema: Uses structured fields for JSON-LD generation
  • pseo-linking: Uses getRelatedPages() and category data
  • pseo-quality-guard: Validates against the content models

More from this repository (10)

  • pseo-quality-guard: Validates programmatic SEO pages by detecting thin content, duplicates, and keyword cannibalization to prevent Google penalties.
  • pseo-llm-visibility: Optimizes programmatic SEO pages for AI search visibility by structuring content for LLM extraction and citation across multiple generative engines.
  • pseo-performance: Optimizes programmatic SEO applications for fast builds, excellent Core Web Vitals, and scalable performance across 1000+ pages.
  • pseo-schema: Generates context-specific JSON-LD structured data schemas for programmatic SEO pages to enhance search engine understanding and rich result potential.
  • pseo-orchestrate: Orchestrates the complete programmatic SEO workflow by sequentially executing specialized skills to discover, audit, implement, and validate pSEO pages across multiple phases.
  • pseo-discovery: Discovers programmatic SEO opportunities by analyzing codebase, business context, and data assets to identify scalable page types matching search intent.
  • pseo-linking: Builds internal linking between programmatic SEO pages using related-page and category data.
  • pseo-scale: Scales programmatic SEO systems to 10K-100K+ pages with database-backed architectures, incremental validation, and optimized content delivery.
  • pseo-audit: Audits a codebase for programmatic SEO readiness, identifying scalability gaps and providing actionable insights for 1000+ page websites.
  • pseo-templates: Generates programmatic SEO page templates with dynamic routing, creating unique, intent-matched content for each page automatically.
Generates programmatic SEO page templates with dynamic routing, creating unique, intent-matched content for each page automatically.