sentencepiece
๐ฏSkillfrom ovachiever/droid-tings
Tokenizes raw Unicode text using unsupervised BPE and Unigram algorithms, enabling multilingual and CJK language support with deterministic, lightweight vocabulary generation.
Part of
ovachiever/droid-tings(370 items)
Installation
pip install sentencepiecegit clone https://github.com/google/sentencepiece.gitSkill Details
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.
More from this repository10
nextjs-shadcn-builder skill from ovachiever/droid-tings
security-auditor skill from ovachiever/droid-tings
threejs-graphics-optimizer skill from ovachiever/droid-tings
api-documenter skill from ovachiever/droid-tings
secret-scanner skill from ovachiever/droid-tings
readme-updater skill from ovachiever/droid-tings
applying-brand-guidelines skill from ovachiever/droid-tings
Configures Tailwind v4 with shadcn/ui, automating CSS variable setup, dark mode, and preventing common initialization errors.
deep-reading-analyst skill from ovachiever/droid-tings
dependency-auditor skill from ovachiever/droid-tings