hwpx

from iamseungpil/claude-for-dslab

What it does

Creates, edits, and analyzes Korean Hancom Office (.hwpx) documents by parsing and manipulating the XML files inside the archive.

Added Feb 4, 2026

Skill Details

SKILL.md

"Comprehensive HWPX (Korean Hancom Office) document creation, editing, and analysis. When Claude needs to work with Korean word processor documents (.hwpx files) for: (1) Reading and extracting content, (2) Creating new documents, (3) Modifying or editing content, (4) Extracting tables to CSV, (5) Modifying tables or table cells, or any other HWPX document tasks. MANDATORY TRIGGERS: hwpx, hwp, 한글, 한컴, Hancom, Korean document"

# HWPX creation, editing, and analysis

Overview

A .hwpx file is a ZIP archive containing XML files, based on the OWPML (Open Word-Processor Markup Language) standard (KS X 6101).
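Because the container is an ordinary ZIP archive, its entries can be inspected with nothing more than a standard library. The sketch below is illustrative only: the demo archive is a hypothetical stand-in (not a valid document) whose entry names are borrowed from paths mentioned elsewhere in this skill.

```python
import io
import zipfile


def list_hwpx_entries(data: bytes) -> list:
    """Return the entry names of an .hwpx archive supplied as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive; the layout is an assumption for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Contents/content.hpf", "<package/>")
        archive.writestr("Contents/header.xml", "<head/>")
        archive.writestr("Contents/section0.xml", "<sec/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_hwpx_entries(build_demo_archive()):
        print(name)
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `list_hwpx_entries`.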

Quick Reference

| Task | Approach |
|------|----------|
| Read/analyze content | hwpxjs or unpack for raw XML |
| Create new document | Use hwpxjs - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |

Converting .hwp to .hwpx

Legacy .hwp files must be converted before editing:

```bash
# Using hwpxjs CLI (pure TypeScript, no external dependencies)
npx hwpxjs convert:hwp document.hwp output.hwpx

# Or using LibreOffice as fallback
python scripts/office/soffice.py --headless --convert-to hwpx document.hwp
```

Reading Content

```bash
# Text extraction via CLI
npx hwpxjs txt document.hwpx

# HTML conversion (includes images/styles)
npx hwpxjs html document.hwpx > output.html

# Raw XML access
python scripts/unpack.py document.hwpx unpacked/
```

Converting to Images

```bash
# Convert to PDF first, then render each page as a 150 dpi JPEG
python scripts/office/soffice.py --headless --convert-to pdf document.hwpx
pdftoppm -jpeg -r 150 document.pdf page
```

---

Creating New Documents

Generate .hwpx files with JavaScript. Install: npm install @ssabrojs/hwpxjs

Setup

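```javascript
const { HwpxWriter } = require("@ssabrojs/hwpxjs");
const fs = require("fs");

// CommonJS has no top-level await, so the calls are wrapped in an async function
async function main() {
  // Create document from plain text
  // Sample text: "Document title", "This is the first paragraph.", "This is the second paragraph."
  const writer = new HwpxWriter();
  const content = `문서 제목

첫 번째 문단입니다.

두 번째 문단입니다.`;

  const buffer = await writer.createFromPlainText(content);
  fs.writeFileSync("output.hwpx", buffer);
}

main().catch(console.error);
```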

Reading Documents

```javascript
const { HwpxReader } = require("@ssabrojs/hwpxjs");
const fs = require("fs");

// CommonJS has no top-level await, so the calls are wrapped in an async function
async function main() {
  const reader = new HwpxReader();
  const fileBuffer = fs.readFileSync("document.hwpx");
  // Pass the underlying ArrayBuffer, not the Node Buffer itself
  await reader.loadFromArrayBuffer(fileBuffer.buffer);

  // Extract text
  const text = await reader.extractText();
  console.log(text);

  // Get document info
  const info = await reader.getDocumentInfo();
  console.log(info);

  // List images
  const images = await reader.listImages();
  console.log(images);
  // [{ binPath: "BinData/0.jpg", width: 200, height: 150, format: "jpg" }]
}

main().catch(console.error);
```

HTML Conversion

```javascript
// Fragment: assumes `reader` is an HwpxReader with a document already loaded,
// and that this code runs inside an async function

// Basic HTML conversion
const html = await reader.extractHtml();

// With all options
const fullHtml = await reader.extractHtml({
  paragraphTag: "p",
  tableClassName: "hwpx-table",
  renderImages: true,        // Include images
  renderTables: true,        // Include tables
  renderStyles: true,        // Apply styles (bold, italic, color)
  embedImages: true,         // Base64 embed images
  tableHeaderFirstRow: true  // First row as <th>
});
```

HWP to HWPX Conversion

```javascript
const { HwpConverter } = require("@ssabrojs/hwpxjs");

// CommonJS has no top-level await, so the calls are wrapped in an async function
async function main() {
  const converter = new HwpConverter({ verbose: true });

  // Check availability
  if (converter.isAvailable()) {
    // Convert HWP to HWPX
    const result = await converter.convertHwpToHwpx("input.hwp", "output.hwpx");
    if (result.success) {
      console.log(`Converted: ${result.processingTime}ms`);
    }

    // Or extract text only
    const text = await converter.convertHwpToText("input.hwp");
    console.log(text);
  }
}

main().catch(console.error);
```

Template Processing

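```javascript
// hwpxjs supports {{key}} template replacement
// Fragment: runs inside an async function; `templateBuffer` is the ArrayBuffer
// of a template .hwpx file containing {{name}} and {{date}} placeholders
const reader = new HwpxReader();
await reader.loadFromArrayBuffer(templateBuffer);

// Apply template replacements to the extracted HTML
const html = await reader.extractHtml();
const result = html
  .replace(/\{\{name\}\}/g, "홍길동")
  .replace(/\{\{date\}\}/g, "2025-01-01");
```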

Critical Rules for hwpxjs

- createFromPlainText returns Buffer - save with fs.writeFileSync(path, buffer)
- loadFromArrayBuffer for reading - pass fileBuffer.buffer not fileBuffer
- Text-only creation - for tables/images, use XML editing approach below
- HwpConverter for HWP files - pure TypeScript, no LibreOffice needed
- extractHtml for rich content - includes styles, tables, images

---

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

```bash

python scripts/unpack.py document.hwpx unpacked/

```

Step 2: Edit XML

Edit files in unpacked/Contents/. See XML Reference below for patterns.

Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.

CRITICAL: Remove `<hp:linesegarray>` when modifying text. This element contains cached layout data. Leaving a stale linesegarray causes character overlap:

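```xml
<!-- Before -->
<hp:p>
  <hp:run><hp:t>Original text</hp:t></hp:run>
  <hp:linesegarray><!-- cached layout data --></hp:linesegarray>
</hp:p>

<!-- After: linesegarray removed -->
<hp:p>
  <hp:run><hp:t>New longer text that exceeds original width</hp:t></hp:run>
</hp:p>
```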

Note: Multiple `<hp:run>` elements share one `<hp:linesegarray>`. Remove it when editing ANY run in the paragraph.
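To make the rule concrete, here is a minimal Python sketch of what the edit amounts to: replace the text, then drop the cached layout element from the same paragraph. It is an illustration, not a replacement for the Edit tool workflow described above. The namespace URI bound to `hp:` and the position of linesegarray as a direct child of the paragraph are assumptions; check them against the root element of your own section XML. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the hp: prefix in section XML files.
HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"
ET.register_namespace("hp", HP_NS)

SAMPLE_PARAGRAPH = (
    f'<hp:p xmlns:hp="{HP_NS}">'
    '<hp:run charPrIDRef="0"><hp:t>Original text</hp:t></hp:run>'
    "<hp:linesegarray><hp:lineseg/></hp:linesegarray>"
    "</hp:p>"
)


def replace_paragraph_text(paragraph_xml: str, new_text: str) -> str:
    """Set the first hp:t text and remove the paragraph's cached layout."""
    paragraph = ET.fromstring(paragraph_xml)
    text_node = paragraph.find(f".//{{{HP_NS}}}t")
    if text_node is None:
        raise ValueError("paragraph has no hp:t element")
    text_node.text = new_text
    # Assumption: hp:linesegarray is a direct child of hp:p.
    for cache in paragraph.findall(f"{{{HP_NS}}}linesegarray"):
        paragraph.remove(cache)
    return ET.tostring(paragraph, encoding="unicode")


if __name__ == "__main__":
    print(replace_paragraph_text(SAMPLE_PARAGRAPH, "New longer text"))
```

Note that charPrIDRef on the run is left untouched. Re-serializing a whole section file with ElementTree may rewrite namespace prefixes that are not registered, so treat this as a demonstration on a single paragraph.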

Step 3: Pack

```bash
python scripts/pack.py unpacked/ output.hwpx
```
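For readers curious what repacking involves, the following is a rough sketch of zipping an unpacked directory back into a single archive. It is not the repository's pack.py, whose behavior is not shown here. It assumes, as in other ZIP-based document formats, that a root-level `mimetype` entry (if present) should be written first and uncompressed. The `mimetype` value and the file names in `demo` are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_directory(source_dir: str, output_path: str) -> list:
    """Zip every file under source_dir and return the entry names in order."""
    root = Path(source_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # Assumption: a root-level "mimetype" entry goes first (stable sort keeps the rest ordered).
    files.sort(key=lambda p: p.relative_to(root).as_posix() != "mimetype")
    with zipfile.ZipFile(output_path, "w") as archive:
        for path in files:
            name = path.relative_to(root).as_posix()
            # Assumption: mimetype is stored uncompressed; everything else is deflated.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            archive.write(path, arcname=name, compress_type=method)
    with zipfile.ZipFile(output_path) as archive:
        return archive.namelist()


def demo() -> list:
    """Pack a throwaway directory with hypothetical contents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "unpacked"
        (root / "Contents").mkdir(parents=True)
        (root / "mimetype").write_text("application/hwp+zip")
        (root / "Contents" / "header.xml").write_text("<head/>")
        (root / "Contents" / "section0.xml").write_text("<sec/>")
        return pack_directory(str(root), str(Path(tmp) / "output.hwpx"))


if __name__ == "__main__":
    print(demo())
```

Prefer the provided scripts/pack.py for real documents; this sketch only shows the general shape of the step.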

Common Pitfalls

- Character overlap after edit: Remove `<hp:linesegarray>` from the edited `<hp:p>`. Multiple `<hp:run>` elements share one linesegarray, so remove it when editing ANY run.
- Wrong table cell modified: Include `<hp:cellAddr>` in the search pattern. CRITICAL: `<hp:cellAddr>` appears AFTER cell content, not before. Use grep -B20 'colAddr="2" rowAddr="0"' section0.xml.
- Preserve charPrIDRef: Don't change charPrIDRef when editing text; it references font/size/style in header.xml.
- File corruption from string replacement: Use lxml for structural changes (inserting elements). String replacement breaks XML parent-child relationships.
- Page overflow from text replacement: Replacing blanks/spaces with text can cause content overflow and page breaks. Solutions: (1) Keep replacement text similar in length to original spaces, (2) Preserve charPrIDRef for underlined fields to maintain underline style, (3) Reduce unnecessary whitespace proportionally, (4) Cell/margin adjustments may be needed.
- Image size too large (e.g., 635mm): HWP unit calculation error. 1 HWP unit = 1/7200 inch, so 1mm ≈ 283.5 HWP units.
  - ❌ Wrong: width="180000" → 635mm (too large!)
  - ✅ Correct: width="3400" → ~12mm (signature size)
  - Formula: mm × (7200 ÷ 25.4) = HWP units
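The unit formula above is easy to get wrong by hand, so a small pair of helpers can be used to check a size before writing it into the XML. This is a minimal sketch; the function names are hypothetical.

```python
HWP_UNITS_PER_INCH = 7200
MM_PER_INCH = 25.4


def mm_to_hwp_units(mm: float) -> int:
    """Convert millimetres to HWP units (1/7200 inch), rounded to an integer."""
    return round(mm * HWP_UNITS_PER_INCH / MM_PER_INCH)


def hwp_units_to_mm(units: int) -> float:
    """Convert HWP units back to millimetres."""
    return units * MM_PER_INCH / HWP_UNITS_PER_INCH


if __name__ == "__main__":
    print(mm_to_hwp_units(12))                 # a signature-sized width
    print(round(hwp_units_to_mm(180000), 1))   # sanity check for a suspicious value
    print(round(hwp_units_to_mm(3400), 1))
```

Running a suspicious width through `hwp_units_to_mm` before saving makes an oversized value such as 180000 obvious.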

---

XML Reference

Key Elements

| Element | Purpose |
|---------|---------|
| `<hp:p>` | Paragraph |
| `<hp:run>` | Text run with formatting |
| `<hp:t>` | Text content |
| `<hp:tbl>` | Table |
| `<hp:tc>` | Table cell |
| `<hp:cellAddr>` | Cell position (AFTER content) |
| `<hp:pic>` | Image |
| `<hp:linesegarray>` | Layout cache (remove when editing) |

Paragraph Structure

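```xml
<hp:p>
  <hp:run charPrIDRef="0">
    <hp:t>Text content</hp:t>
  </hp:run>
  <hp:linesegarray><!-- cached layout data --></hp:linesegarray>
</hp:p>
```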

Table Cell Structure

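```xml
<hp:tc>
  <hp:subList>
    <hp:p>
      <hp:run><hp:t>Cell content</hp:t></hp:run>
    </hp:p>
  </hp:subList>
  <!-- Cell position comes AFTER the content -->
  <hp:cellAddr colAddr="0" rowAddr="0"/>
</hp:tc>
```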
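Because the cell address follows the cell content, locating a cell is more reliable through a parser than through text search. The sketch below finds a cell by its colAddr/rowAddr and returns its text. The namespace URI, the sample nesting (`hp:tr`, `hp:subList`), and the function name are assumptions made for illustration; verify them against your own section0.xml.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the hp: prefix in section XML files.
HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"
NS = {"hp": HP_NS}

# Hypothetical two-cell table; real tables carry many more attributes.
SAMPLE_TABLE = (
    f'<hp:tbl xmlns:hp="{HP_NS}"><hp:tr>'
    "<hp:tc><hp:subList><hp:p><hp:run><hp:t>Name</hp:t></hp:run></hp:p></hp:subList>"
    '<hp:cellAddr colAddr="0" rowAddr="0"/></hp:tc>'
    "<hp:tc><hp:subList><hp:p><hp:run><hp:t>홍길동</hp:t></hp:run></hp:p></hp:subList>"
    '<hp:cellAddr colAddr="1" rowAddr="0"/></hp:tc>'
    "</hp:tr></hp:tbl>"
)


def find_cell_text(table_xml: str, col: int, row: int) -> Optional[str]:
    """Return the concatenated text of the cell at (col, row), or None if absent."""
    table = ET.fromstring(table_xml)
    for cell in table.iter(f"{{{HP_NS}}}tc"):
        # Assumption: hp:cellAddr is a direct child of hp:tc.
        addr = cell.find("hp:cellAddr", NS)
        if addr is None:
            continue
        if addr.get("colAddr") == str(col) and addr.get("rowAddr") == str(row):
            return "".join(t.text or "" for t in cell.iter(f"{{{HP_NS}}}t"))
    return None


if __name__ == "__main__":
    print(find_cell_text(SAMPLE_TABLE, 1, 0))
```

Matching on the parsed `hp:cellAddr` attributes avoids editing the wrong cell when several cells contain similar text.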

Images

CRITICAL: `<hp:pic>` MUST be inside `<hp:run>`, followed by an empty `<hp:t/>`.

Add image file to BinData/
Add to manifest Contents/content.hpf:

```xml

```

Reference in section0.xml:

```xml

```

Size units: HWP uses 1/7200 inch units. 1mm ≈ 283.5 units (7200 ÷ 25.4)

For safe image insertion using lxml, see [references/image-insertion.md](references/image-insertion.md).

Page Break

```xml

```

Differences from DOCX

| Aspect | HWPX | DOCX |
|--------|------|------|
| Text element | `<hp:t>` | `<w:t>` |
| Paragraph | `<hp:p>` | `<w:p>` |
| Run | `<hp:run>` | `<w:r>` |
| Layout cache | `<hp:linesegarray>` | None |
| Content location | Contents/section*.xml | word/document.xml |
| Cell identifier | `<hp:cellAddr>` after content | implicit order |

Key difference: HWPX stores layout cache in linesegarray; DOCX doesn't. This is why editing HWPX requires removing linesegarray.

For detailed XML structures (headers/footers, lists/numbering, paragraph formatting), see [references/xml-reference.md](references/xml-reference.md).

---

Dependencies

```bash
npm install @ssabrojs/hwpxjs
```

- hwpxjs: npm install @ssabrojs/hwpxjs - reading, writing, HTML conversion, HWP→HWPX conversion
- pyhwp2md: Converting HWP/HWPX to Markdown (alternative)
- LibreOffice: PDF conversion (auto-configured via scripts/office/soffice.py)
- Poppler: pdftoppm for PDF to images
