pdf-to-markdown
Skill from duc01226/easyplatform
Installation
npx skills add https://github.com/duc01226/easyplatform --skill pdf-to-markdown
Skill Details
Convert PDF files to Markdown. Use when extracting text from PDFs, creating editable documentation from PDF reports, or converting PDF content to version-controlled markdown files.
Overview
# pdf-to-markdown
Convert PDF files to Markdown format.
Installation Required
```bash
cd .claude/skills/pdf-to-markdown
npm install
```
Dependencies: pdf-parse
Quick Start
```bash
# Basic conversion
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./document.pdf

# Custom output path
node .claude/skills/pdf-to-markdown/scripts/convert.cjs \
  --file ./doc.pdf \
  --output ./output/doc.md
```
CLI Options
| Option | Required | Description |
| ----------------- | -------- | ------------------------------------------------ |
| --file | Yes | Input PDF file |
| --output | No | Output Markdown path (default: input name + .md) |
Output Format (JSON)
```json
{
  "success": true,
  "input": "/path/to/input.pdf",
  "output": "/path/to/output.md",
  "wordCount": 1523,
  "warnings": ["Tables may not be accurately converted"]
}
```
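A calling script may want to turn this JSON into a short status message. The sketch below is one way to do that, using only the fields shown above. The `summarizeResult` helper is hypothetical, and both the handling of `success: false` and the zero-word check are assumptions rather than documented behavior of the skill.

```javascript
// Sketch: interpret the JSON result printed by convert.cjs.
// Field names come from the documented output format; the failure branch
// and the zero-word check are assumptions, not documented behavior.
function summarizeResult(jsonText) {
  const result = JSON.parse(jsonText);

  if (result.success !== true) {
    return { ok: false, message: "Conversion failed", warnings: [] };
  }

  const warnings = Array.isArray(result.warnings) ? result.warnings : [];

  // No extracted words often means the PDF is scanned or image-based.
  if (result.wordCount === 0) {
    return {
      ok: false,
      message: "No text extracted; the PDF may be scanned and need OCR",
      warnings,
    };
  }

  return {
    ok: true,
    message: `Wrote ${result.wordCount} words to ${result.output}`,
    warnings,
  };
}

// Sample input copied from the documented output format.
const sample = JSON.stringify({
  success: true,
  input: "/path/to/input.pdf",
  output: "/path/to/output.md",
  wordCount: 1523,
  warnings: ["Tables may not be accurately converted"],
});

console.log(summarizeResult(sample).message);
// prints "Wrote 1523 words to /path/to/output.md"
```

Returning the warnings alongside the message lets the caller decide whether a note such as the table warning should trigger a manual review.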
Supported Elements
- Text extraction from digital PDFs
- Headings (detected by font size heuristics)
- Paragraphs
- Basic lists
- Links (when embedded in PDF)
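To illustrate what a font-size heading heuristic can look like, here is a minimal sketch. It is not the skill's actual implementation: the input shape (`{ text, fontSize }`), the function names, and the ratio thresholds are all hypothetical. The idea is to treat the most common font size as body text and promote noticeably larger lines to headings.

```javascript
// Sketch of a font-size heading heuristic (illustrative only).
// The { text, fontSize } input shape and the thresholds are assumptions.

// Find the font size that appears on the most lines; treat it as body text.
function mostCommonFontSize(lines) {
  const counts = new Map();
  for (const { fontSize } of lines) {
    counts.set(fontSize, (counts.get(fontSize) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [size, count] of counts) {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  }
  return best;
}

// Map each line to a Markdown heading or paragraph by its size ratio.
function linesToMarkdown(lines) {
  if (lines.length === 0) return "";
  const bodySize = mostCommonFontSize(lines);
  return lines
    .map(({ text, fontSize }) => {
      const ratio = fontSize / bodySize;
      if (ratio >= 1.8) return `# ${text}`;
      if (ratio >= 1.4) return `## ${text}`;
      if (ratio >= 1.15) return `### ${text}`;
      return text;
    })
    .join("\n\n");
}

const exampleLines = [
  { text: "Annual Report", fontSize: 24 },
  { text: "Summary", fontSize: 18 },
  { text: "Revenue grew in all regions.", fontSize: 12 },
  { text: "Costs stayed flat.", fontSize: 12 },
];

console.log(linesToMarkdown(exampleLines));
```

A heuristic like this misfires when a PDF uses large type for emphasis rather than structure, which is one reason heading detection is best reviewed after conversion.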
Known Limitations
- Tables: Very limited support; may not render correctly
- Multi-column layouts: Text may interleave between columns
- Scanned PDFs: NOT supported (requires OCR - see alternatives below)
- Images: NOT extracted (PDF images are not included in output)
- Complex formatting: May be simplified or lost
- Password-protected PDFs: NOT supported
Alternatives for Unsupported Cases
For scanned PDFs (OCR needed):
- Use scribe.js-ocr library (AGPL license)
- Commercial OCR services (Google Cloud Vision, AWS Textract)
For complex tables:
- Consider AI-based extraction (LLM post-processing)
- Manual review and correction
For image extraction:
- Use unpdf library with sharp for image extraction
- Process images separately and reference in markdown
Troubleshooting
Dependencies not found: Run npm install in the skill directory
Empty output: PDF may be scanned/image-based (requires OCR)
Garbled text: PDF may use embedded fonts not supported by the parser
Memory issues: Large PDFs may require Node's --max-old-space-size=4096 flag
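When the converter is launched from another Node script, the heap flag has to come before the script path so that Node consumes it instead of passing it to convert.cjs. The sketch below assembles such an argument list. `buildNodeArgs` is a hypothetical helper, not part of the skill; the script path and the `--file` and `--output` options are taken from the Quick Start section.

```javascript
// Script path as used in the Quick Start examples.
const SCRIPT = ".claude/skills/pdf-to-markdown/scripts/convert.cjs";

// buildNodeArgs is a hypothetical helper, not part of the skill.
// It returns the arguments to pass to the node executable.
function buildNodeArgs({ file, output, maxOldSpaceMb }) {
  if (!file) throw new Error("--file is required");
  const args = [];
  // The V8 heap flag must precede the script path so Node itself reads it.
  if (maxOldSpaceMb) args.push(`--max-old-space-size=${maxOldSpaceMb}`);
  args.push(SCRIPT, "--file", file);
  if (output) args.push("--output", output);
  return args;
}

const args = buildNodeArgs({ file: "./large.pdf", maxOldSpaceMb: 4096 });
console.log(["node", ...args].join(" "));
// prints "node --max-old-space-size=4096 .claude/skills/pdf-to-markdown/scripts/convert.cjs --file ./large.pdf"

// To run the conversion, the array can be handed to child_process, e.g.:
// require("node:child_process").spawnSync(process.execPath, args, { encoding: "utf8" });
```

The value 4096 is in megabytes, so this raises the old-generation heap limit to roughly 4 GB.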
IMPORTANT Task Planning Notes
- Always plan the work and break it into many small todo tasks
- Always add a final review todo task to check the completed work and find any fixes or enhancements needed