
pdf

Skill

from shareai-lab/learn-claude-code

What it does

Extracts, creates, merges, and manipulates PDF files using multiple Python libraries and command-line tools.


Part of shareai-lab/learn-claude-code (4 items)

pdf

Installation

pip install pymupdf
pip install reportlab
pip install pdfkit
Extracted from docs: shareai-lab/learn-claude-code
13 installs · Added Feb 4, 2026

Skill Details

SKILL.md

Process PDF files: extract text, create PDFs, and merge documents. Use when the user asks to read a PDF, create a PDF, or otherwise work with PDF files.

Overview

# PDF Processing Skill

You now have expertise in PDF manipulation. Follow these workflows:

## Reading PDFs

### Option 1: Quick text extraction (preferred)

```bash
# Using pdftotext (poppler-utils)
pdftotext input.pdf -           # Output to stdout
pdftotext input.pdf output.txt  # Output to file

# If pdftotext is not available, try PyMuPDF:
python3 -c "
import fitz  # PyMuPDF
doc = fitz.open('input.pdf')
for page in doc:
    print(page.get_text())
"
```
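The preference order above (pdftotext first, PyMuPDF as a fallback) can be automated by checking what is installed before calling it. A minimal sketch; `choose_backend` and `extract_text` are hypothetical helper names, and the two lookup functions are parameters only so they can be swapped out in tests:

```python
import importlib.util
import shutil
import subprocess


def choose_backend(which=shutil.which, find_spec=importlib.util.find_spec):
    """Return 'pdftotext', 'pymupdf', or None depending on what is installed."""
    if which("pdftotext"):
        return "pdftotext"
    if find_spec("fitz") is not None:  # PyMuPDF is importable as fitz
        return "pymupdf"
    return None


def extract_text(path):
    backend = choose_backend()
    if backend == "pdftotext":
        # "-" sends the text to stdout; undecodable bytes become U+FFFD
        out = subprocess.run(["pdftotext", path, "-"], capture_output=True, check=True)
        return out.stdout.decode("utf-8", errors="replace")
    if backend == "pymupdf":
        import fitz  # PyMuPDF, imported only when it is the chosen backend

        with fitz.open(path) as doc:
            return "".join(page.get_text() for page in doc)
    raise RuntimeError("Install poppler-utils (pdftotext) or PyMuPDF (pip install pymupdf)")
```

Usage: `text = extract_text("input.pdf")`. Failing loudly when neither tool exists is a design choice; returning an empty string would hide the problem.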

### Option 2: Page-by-page with metadata

```python
import fitz  # pip install pymupdf

doc = fitz.open("input.pdf")
print(f"Pages: {len(doc)}")
print(f"Metadata: {doc.metadata}")  # dict with title, author, creationDate, ...

for i, page in enumerate(doc):
    text = page.get_text()
    print(f"--- Page {i+1} ---")
    print(text)
```
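The `creationDate` and `modDate` entries in `doc.metadata` are raw PDF date strings such as `D:20240131120000+01'00'` (layout `D:YYYYMMDDHHmmSSOHH'mm'`, with every field after the year optional). A minimal sketch for turning them into `datetime` objects, assuming the strings follow that layout; `parse_pdf_date` is a hypothetical helper, not part of PyMuPDF:

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date layout: D:YYYYMMDDHHmmSSOHH'mm' (all fields after the year optional)
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(\d{2})?'?(\d{2})?'?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string (e.g. doc.metadata["creationDate"]) to a datetime.

    Returns None for empty or malformed values.
    """
    m = _PDF_DATE.match(value.strip()) if value else None
    if m is None:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    try:
        tz = None  # no offset given: return a naive datetime
        if sign in ("Z", "z"):
            tz = timezone.utc
        elif sign:
            offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
            tz = timezone(offset if sign == "+" else -offset)
        return datetime(int(year), int(month or 1), int(day or 1),
                        int(hour or 0), int(minute or 0), int(second or 0),
                        tzinfo=tz)
    except ValueError:  # e.g. month 13, or an offset of 24 hours or more
        return None
```

Usage: `parse_pdf_date(doc.metadata["creationDate"])`. Returning `None` instead of raising keeps a batch job running when one file carries a malformed date.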

## Creating PDFs

### Option 1: From Markdown (recommended)

```bash
# Using pandoc (PDF output goes through a LaTeX engine, pdflatex by default)
pandoc input.md -o output.pdf

# With custom styling
pandoc input.md -o output.pdf --pdf-engine=xelatex -V geometry:margin=1in
```
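When the conversion is driven from Python instead of a shell, passing pandoc its arguments as a list avoids quoting problems with paths that contain spaces. A minimal sketch; `pandoc_pdf_command` and `markdown_to_pdf` are hypothetical helper names, and the flags are the same ones used in the commands above:

```python
import shutil
import subprocess


def pandoc_pdf_command(src, dst, engine=None, margin=None):
    """Build the pandoc argument list for a Markdown -> PDF conversion."""
    cmd = ["pandoc", src, "-o", dst]
    if engine:
        cmd.append(f"--pdf-engine={engine}")
    if margin:
        cmd += ["-V", f"geometry:margin={margin}"]
    return cmd


def markdown_to_pdf(src, dst, engine="xelatex", margin="1in"):
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed")
    # check=True raises CalledProcessError if pandoc or the LaTeX engine fails
    subprocess.run(pandoc_pdf_command(src, dst, engine, margin), check=True)
```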

### Option 2: Programmatically

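```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("output.pdf", pagesize=letter)
# Coordinates are in points (1/72 inch) from the bottom-left corner;
# letter is 612 x 792 points, so y=750 is near the top of the page.
c.drawString(100, 750, "Hello, PDF!")
c.save()
```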
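A single `drawString` call neither wraps nor paginates. A minimal sketch for writing many lines across several pages, assuming one line of text per list item and no word wrapping; `paginate` and `write_text_pdf` are hypothetical helpers, and the margins and font size are arbitrary choices:

```python
def paginate(lines, top=750, bottom=72, leading=14):
    """Assign a y coordinate to each line, starting a new page below `bottom`.

    Returns a list of pages; each page is a list of (y, text) pairs.
    ReportLab's origin is the bottom-left corner, so y decreases down the page.
    """
    pages, current, y = [], [], top
    for text in lines:
        if y < bottom:  # no room left: close this page and restart at the top
            pages.append(current)
            current, y = [], top
        current.append((y, text))
        y -= leading
    if current:
        pages.append(current)
    return pages


def write_text_pdf(path, lines, x=72):
    # Imported here so paginate() can be used without ReportLab installed
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(path, pagesize=letter)
    for page in paginate(lines):
        c.setFont("Helvetica", 11)  # set per page; showPage() starts a fresh page state
        for y, text in page:
            c.drawString(x, y, text)
        c.showPage()  # finish this page
    c.save()
```

Usage: `write_text_pdf("output.pdf", open("notes.txt").read().splitlines())`. Keeping the layout arithmetic in a pure function makes it easy to check without generating a file.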

### Option 3: From HTML

```bash
# Using wkhtmltopdf
wkhtmltopdf input.html output.pdf

# Or with Python (pdfkit calls the wkhtmltopdf binary, so it must be installed too)
python3 -c "
import pdfkit
pdfkit.from_file('input.html', 'output.pdf')
"
```

## Merging PDFs

```python
import fitz

result = fitz.open()  # new, empty PDF

for pdf_path in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    doc = fitz.open(pdf_path)
    result.insert_pdf(doc)  # append every page of doc

result.save("merged.pdf")
```
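A plain `sorted()` on file names puts `file10.pdf` before `file2.pdf`, which scrambles the page order when merging numbered files. A minimal sketch that merges every PDF in a folder in natural order; `natural_key` and `merge_folder` are hypothetical helpers, and it assumes the folder holds only the PDFs to merge:

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders 'file2.pdf' before 'file10.pdf' (case-insensitive)."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capture group puts the digit runs at odd indices
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def merge_folder(folder, output="merged.pdf"):
    import fitz  # PyMuPDF; imported here so natural_key works without it

    paths = sorted(Path(folder).glob("*.pdf"), key=natural_key)
    result = fitz.open()
    for p in paths:
        with fitz.open(str(p)) as doc:
            result.insert_pdf(doc)
    result.save(output)
    return paths
```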

## Splitting PDFs

```python
import fitz

doc = fitz.open("input.pdf")

for i in range(len(doc)):
    single = fitz.open()  # new, empty PDF
    single.insert_pdf(doc, from_page=i, to_page=i)  # copy page i only
    single.save(f"page_{i+1}.pdf")
```
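To split into fixed-size chunks instead of single pages, the same `insert_pdf(from_page=..., to_page=...)` call takes a wider range; as in the loop above, page numbers are 0-based and both ends are inclusive. A minimal sketch; `chunk_ranges` and `split_into_chunks` are hypothetical names, and the `part_N.pdf` output naming is an arbitrary choice:

```python
def chunk_ranges(n_pages, size):
    """0-based, inclusive (from_page, to_page) pairs covering n_pages in chunks of `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [(start, min(start + size, n_pages) - 1)
            for start in range(0, n_pages, size)]


def split_into_chunks(path, size, prefix="part"):
    import fitz  # PyMuPDF; imported here so chunk_ranges works without it

    with fitz.open(path) as doc:
        for n, (first, last) in enumerate(chunk_ranges(len(doc), size), start=1):
            part = fitz.open()
            part.insert_pdf(doc, from_page=first, to_page=last)
            part.save(f"{prefix}_{n}.pdf")  # part_1.pdf holds the first `size` pages
            part.close()
```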

## Key Libraries

| Task | Library | Install |
|------|---------|---------|
| Read/Write/Merge | PyMuPDF | pip install pymupdf |
| Create from scratch | ReportLab | pip install reportlab |
| HTML to PDF | pdfkit | pip install pdfkit, plus the wkhtmltopdf binary |
| Text extraction | pdftotext | brew install poppler / apt install poppler-utils |

## Best Practices

  1. Always check if tools are installed before using them
  2. Handle encoding issues: PDFs may contain various character encodings
  3. Large PDFs: Process page by page to avoid memory issues
  4. OCR for scanned PDFs: Use pytesseract if text extraction returns no text
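Practices 2 and 4 can be combined into one extraction pass: normalize what the text layer returns, and OCR only the pages that come back empty. A minimal sketch, assuming a recent PyMuPDF (one whose `Page.get_pixmap` accepts `dpi`), Pillow, pytesseract, and the Tesseract binary; the helper names are hypothetical, and NFKC normalization is one reasonable choice that may be too aggressive for some documents (it also folds characters such as `²` into `2`):

```python
import io
import unicodedata


def normalize_text(text):
    """Clean up common PDF extraction artifacts.

    NFKC folds compatibility characters such as the 'ﬁ' ligature into 'fi';
    soft hyphens (U+00AD) and NUL characters are dropped.
    """
    text = unicodedata.normalize("NFKC", text)
    return text.replace("\u00ad", "").replace("\x00", "")


def pages_needing_ocr(page_texts, min_chars=1):
    """Indices of pages with fewer than min_chars non-whitespace characters."""
    return [i for i, t in enumerate(page_texts)
            if len("".join(t.split())) < min_chars]


def ocr_page(page, dpi=300):
    """OCR one PyMuPDF page by rendering it to an image for pytesseract."""
    import pytesseract
    from PIL import Image

    pix = page.get_pixmap(dpi=dpi)  # rasterize the page
    img = Image.open(io.BytesIO(pix.tobytes("png")))
    return pytesseract.image_to_string(img)


def extract_with_ocr_fallback(path):
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        texts = [normalize_text(page.get_text()) for page in doc]
        for i in pages_needing_ocr(texts):
            texts[i] = normalize_text(ocr_page(doc[i]))
    return texts
```

OCR is slow, so running it only on the empty pages keeps mixed documents (mostly digital, a few scanned pages) fast.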