🎯

rdkit

🎯Skill

from ovachiever/droid-tings

What it does

Enables advanced molecular manipulation and analysis through comprehensive cheminformatics operations like parsing, descriptor calculation, fingerprinting, and substructure searching.

📦

Part of

ovachiever/droid-tings(370 items)

rdkit

Installation

git cloneClone repository

git clone https://github.com/ovachiever/droid-tings.git

📖 Extracted from docs: ovachiever/droid-tings

Need more details? View full documentation on GitHub →

16Installs

AddedFeb 4, 2026

View on GitHub Back to Skills

Skill Details

SKILL.md

"Cheminformatics toolkit for fine-grained molecular control. SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D generation, similarity, reactions. For standard workflows with simpler interface, use datamol (wrapper around RDKit). Use rdkit for advanced control, custom sanitization, specialized algorithms."

Overview

# RDKit Cheminformatics Toolkit

Overview

RDKit is a comprehensive cheminformatics library providing Python APIs for molecular analysis and manipulation. This skill provides guidance for reading/writing molecular structures, calculating descriptors, fingerprinting, substructure searching, chemical reactions, 2D/3D coordinate generation, and molecular visualization. Use this skill for drug discovery, computational chemistry, and cheminformatics research tasks.

Core Capabilities

1. Molecular I/O and Creation

Reading Molecules:

Read molecular structures from various formats:

```python

from rdkit import Chem

# From SMILES strings

mol = Chem.MolFromSmiles('Cc1ccccc1') # Returns Mol object or None

# From MOL files

mol = Chem.MolFromMolFile('path/to/file.mol')

# From MOL blocks (string data)

mol = Chem.MolFromMolBlock(mol_block_string)

# From InChI

mol = Chem.MolFromInchi('InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H')

```

Writing Molecules:

Convert molecules to text representations:

```python

# To canonical SMILES

smiles = Chem.MolToSmiles(mol)

# To MOL block

mol_block = Chem.MolToMolBlock(mol)

# To InChI

inchi = Chem.MolToInchi(mol)

```

Batch Processing:

For processing multiple molecules, use Supplier/Writer objects:

```python

# Read SDF files

suppl = Chem.SDMolSupplier('molecules.sdf')

for mol in suppl:

if mol is not None: # Check for parsing errors

# Process molecule

pass

# Read SMILES files

suppl = Chem.SmilesMolSupplier('molecules.smi', titleLine=False)

# For large files or compressed data

with gzip.open('molecules.sdf.gz') as f:

suppl = Chem.ForwardSDMolSupplier(f)

for mol in suppl:

# Process molecule

pass

# Multithreaded processing for large datasets

suppl = Chem.MultithreadedSDMolSupplier('molecules.sdf')

# Write molecules to SDF

writer = Chem.SDWriter('output.sdf')

for mol in molecules:

writer.write(mol)

writer.close()

```

Important Notes:

All MolFrom* functions return None on failure with error messages
Always check for None before processing molecules
Molecules are automatically sanitized on import (validates valence, perceives aromaticity)

2. Molecular Sanitization and Validation

RDKit automatically sanitizes molecules during parsing, executing 13 steps including valence checking, aromaticity perception, and chirality assignment.

Sanitization Control:

```python

# Disable automatic sanitization

mol = Chem.MolFromSmiles('C1=CC=CC=C1', sanitize=False)

# Manual sanitization

Chem.SanitizeMol(mol)

# Detect problems before sanitization

problems = Chem.DetectChemistryProblems(mol)

for problem in problems:

print(problem.GetType(), problem.Message())

# Partial sanitization (skip specific steps)

from rdkit.Chem import rdMolStandardize

Chem.SanitizeMol(mol, sanitizeOps=Chem.SANITIZE_ALL ^ Chem.SANITIZE_PROPERTIES)

```

Common Sanitization Issues:

Atoms with explicit valence exceeding maximum allowed will raise exceptions
Invalid aromatic rings will cause kekulization errors
Radical electrons may not be properly assigned without explicit specification

3. Molecular Analysis and Properties

Accessing Molecular Structure:

```python

# Iterate atoms and bonds

for atom in mol.GetAtoms():

print(atom.GetSymbol(), atom.GetIdx(), atom.GetDegree())

for bond in mol.GetBonds():

print(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType())

# Ring information

ring_info = mol.GetRingInfo()

ring_info.NumRings()

ring_info.AtomRings() # Returns tuples of atom indices

# Check if atom is in ring

atom = mol.GetAtomWithIdx(0)

atom.IsInRing()

atom.IsInRingSize(6) # Check for 6-membered rings

# Find smallest set of smallest rings (SSSR)

from rdkit.Chem import GetSymmSSSR

rings = GetSymmSSSR(mol)

```

Stereochemistry:

```python

# Find chiral centers

from rdkit.Chem import FindMolChiralCenters

chiral_centers = FindMolChiralCenters(mol, includeUnassigned=True)

# Returns list of (atom_idx, chirality) tuples

# Assign stereochemistry from 3D coordinates

from rdkit.Chem import AssignStereochemistryFrom3D

AssignStereochemistryFrom3D(mol)

# Check bond stereochemistry

bond = mol.GetBondWithIdx(0)

stereo = bond.GetStereo() # STEREONONE, STEREOZ, STEREOE, etc.

```

Fragment Analysis:

```python

# Get disconnected fragments

frags = Chem.GetMolFrags(mol, asMols=True)

# Fragment on specific bonds

from rdkit.Chem import FragmentOnBonds

frag_mol = FragmentOnBonds(mol, [bond_idx1, bond_idx2])

# Count ring systems

from rdkit.Chem.Scaffolds import MurckoScaffold

scaffold = MurckoScaffold.GetScaffoldForMol(mol)

```

4. Molecular Descriptors and Properties

Basic Descriptors:

```python

from rdkit.Chem import Descriptors

# Molecular weight

mw = Descriptors.MolWt(mol)

exact_mw = Descriptors.ExactMolWt(mol)

# LogP (lipophilicity)

logp = Descriptors.MolLogP(mol)

# Topological polar surface area

tpsa = Descriptors.TPSA(mol)

# Number of hydrogen bond donors/acceptors

hbd = Descriptors.NumHDonors(mol)

hba = Descriptors.NumHAcceptors(mol)

# Number of rotatable bonds

rot_bonds = Descriptors.NumRotatableBonds(mol)

# Number of aromatic rings

aromatic_rings = Descriptors.NumAromaticRings(mol)

```

Batch Descriptor Calculation:

```python

# Calculate all descriptors at once

all_descriptors = Descriptors.CalcMolDescriptors(mol)

# Returns dictionary: {'MolWt': 180.16, 'MolLogP': 1.23, ...}

# Get list of available descriptor names

descriptor_names = [desc[0] for desc in Descriptors._descList]

```

Lipinski's Rule of Five:

```python

# Check drug-likeness

mw = Descriptors.MolWt(mol) <= 500

logp = Descriptors.MolLogP(mol) <= 5

hbd = Descriptors.NumHDonors(mol) <= 5

hba = Descriptors.NumHAcceptors(mol) <= 10

is_drug_like = mw and logp and hbd and hba

```

5. Fingerprints and Molecular Similarity

Fingerprint Types:

```python

from rdkit.Chem import AllChem, RDKFingerprint

from rdkit.Chem.AtomPairs import Pairs, Torsions

from rdkit.Chem import MACCSkeys

# RDKit topological fingerprint

fp = Chem.RDKFingerprint(mol)

# Morgan fingerprints (circular fingerprints, similar to ECFP)

fp = AllChem.GetMorganFingerprint(mol, radius=2)

fp_bits = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)

# MACCS keys (166-bit structural key)

fp = MACCSkeys.GenMACCSKeys(mol)

# Atom pair fingerprints

fp = Pairs.GetAtomPairFingerprint(mol)

# Topological torsion fingerprints

fp = Torsions.GetTopologicalTorsionFingerprint(mol)

# Avalon fingerprints (if available)

from rdkit.Avalon import pyAvalonTools

fp = pyAvalonTools.GetAvalonFP(mol)

```

Similarity Calculation:

```python

from rdkit import DataStructs

# Calculate Tanimoto similarity

fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, radius=2)

fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, radius=2)

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)

# Calculate similarity for multiple molecules

similarities = DataStructs.BulkTanimotoSimilarity(fp1, [fp2, fp3, fp4])

# Other similarity metrics

dice = DataStructs.DiceSimilarity(fp1, fp2)

cosine = DataStructs.CosineSimilarity(fp1, fp2)

```

Clustering and Diversity:

```python

# Butina clustering based on fingerprint similarity

from rdkit.ML.Cluster import Butina

# Calculate distance matrix

dists = []

fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]

for i in range(len(fps)):

sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])

dists.extend([1-sim for sim in sims])

# Cluster with distance cutoff

clusters = Butina.ClusterData(dists, len(fps), distThresh=0.3, isDistData=True)

```

6. Substructure Searching and SMARTS

Basic Substructure Matching:

```python

# Define query using SMARTS

query = Chem.MolFromSmarts('[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1') # Benzene ring

# Check if molecule contains substructure

has_match = mol.HasSubstructMatch(query)

# Get all matches (returns tuple of tuples with atom indices)

matches = mol.GetSubstructMatches(query)

# Get only first match

match = mol.GetSubstructMatch(query)

```

Common SMARTS Patterns:

```python

# Primary alcohols

primary_alcohol = Chem.MolFromSmarts('[CH2][OH1]')

# Carboxylic acids

carboxylic_acid = Chem.MolFromSmarts('C(=O)[OH]')

# Amides

amide = Chem.MolFromSmarts('C(=O)N')

# Aromatic heterocycles

aromatic_n = Chem.MolFromSmarts('[nR]') # Aromatic nitrogen in ring

# Macrocycles (rings > 12 atoms)

macrocycle = Chem.MolFromSmarts('[r{12-}]')

```

Matching Rules:

Unspecified properties in query match any value in target
Hydrogens are ignored unless explicitly specified
Charged query atom won't match uncharged target atom
Aromatic query atom won't match aliphatic target atom (unless query is generic)

7. Chemical Reactions

Reaction SMARTS:

```python

from rdkit.Chem import AllChem

# Define reaction using SMARTS: reactants >> products

rxn = AllChem.ReactionFromSmarts('[C:1]=[O:2]>>[C:1][O:2]') # Ketone reduction

# Apply reaction to molecules

reactants = (mol1,)

products = rxn.RunReactants(reactants)

# Products is tuple of tuples (one tuple per product set)

for product_set in products:

for product in product_set:

# Sanitize product

Chem.SanitizeMol(product)

```

Reaction Features:

Atom mapping preserves specific atoms between reactants and products
Dummy atoms in products are replaced by corresponding reactant atoms
"Any" bonds inherit bond order from reactants
Chirality preserved unless explicitly changed

Reaction Similarity:

```python

# Generate reaction fingerprints

fp = AllChem.CreateDifferenceFingerprintForReaction(rxn)

# Compare reactions

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)

```

8. 2D and 3D Coordinate Generation

2D Coordinate Generation:

```python

from rdkit.Chem import AllChem

# Generate 2D coordinates for depiction

AllChem.Compute2DCoords(mol)

# Align molecule to template structure

template = Chem.MolFromSmiles('c1ccccc1')

AllChem.Compute2DCoords(template)

AllChem.GenerateDepictionMatching2DStructure(mol, template)

```

3D Coordinate Generation and Conformers:

```python

# Generate single 3D conformer using ETKDG

AllChem.EmbedMolecule(mol, randomSeed=42)

# Generate multiple conformers

conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)

# Optimize geometry with force field

AllChem.UFFOptimizeMolecule(mol) # UFF force field

AllChem.MMFFOptimizeMolecule(mol) # MMFF94 force field

# Optimize all conformers

for conf_id in conf_ids:

AllChem.MMFFOptimizeMolecule(mol, confId=conf_id)

# Calculate RMSD between conformers

from rdkit.Chem import AllChem

rms = AllChem.GetConformerRMS(mol, conf_id1, conf_id2)

# Align molecules

AllChem.AlignMol(probe_mol, ref_mol)

```

Constrained Embedding:

```python

# Embed with part of molecule constrained to specific coordinates

AllChem.ConstrainedEmbed(mol, core_mol)

```

9. Molecular Visualization

Basic Drawing:

```python

from rdkit.Chem import Draw

# Draw single molecule to PIL image

img = Draw.MolToImage(mol, size=(300, 300))

img.save('molecule.png')

# Draw to file directly

Draw.MolToFile(mol, 'molecule.png')

# Draw multiple molecules in grid

mols = [mol1, mol2, mol3, mol4]

img = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200))

```

Highlighting Substructures:

```python

# Highlight substructure match

query = Chem.MolFromSmarts('c1ccccc1')

match = mol.GetSubstructMatch(query)

img = Draw.MolToImage(mol, highlightAtoms=match)

# Custom highlight colors

highlight_colors = {atom_idx: (1, 0, 0) for atom_idx in match} # Red

img = Draw.MolToImage(mol, highlightAtoms=match,

highlightAtomColors=highlight_colors)

```

Customizing Visualization:

```python

from rdkit.Chem.Draw import rdMolDraw2D

# Create drawer with custom options

drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)

opts = drawer.drawOptions()

# Customize options

opts.addAtomIndices = True

opts.addStereoAnnotation = True

opts.bondLineWidth = 2

# Draw molecule

drawer.DrawMolecule(mol)

drawer.FinishDrawing()

# Save to file

with open('molecule.png', 'wb') as f:

f.write(drawer.GetDrawingText())

```

Jupyter Notebook Integration:

```python

# Enable inline display in Jupyter

from rdkit.Chem.Draw import IPythonConsole

# Customize default display

IPythonConsole.ipython_useSVG = True # Use SVG instead of PNG

IPythonConsole.molSize = (300, 300) # Default size

# Molecules now display automatically

mol # Shows molecule image

```

Visualizing Fingerprint Bits:

```python

# Show what molecular features a fingerprint bit represents

from rdkit.Chem import Draw

# For Morgan fingerprints

bit_info = {}

fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bit_info)

# Draw environment for specific bit

img = Draw.DrawMorganBit(mol, bit_id, bit_info)

```

10. Molecular Modification

Adding/Removing Hydrogens:

```python

# Add explicit hydrogens

mol_h = Chem.AddHs(mol)

# Remove explicit hydrogens

mol = Chem.RemoveHs(mol_h)

```

Kekulization and Aromaticity:

```python

# Convert aromatic bonds to alternating single/double

Chem.Kekulize(mol)

# Set aromaticity

Chem.SetAromaticity(mol)

```

Replacing Substructures:

```python

# Replace substructure with another structure

query = Chem.MolFromSmarts('c1ccccc1') # Benzene

replacement = Chem.MolFromSmiles('C1CCCCC1') # Cyclohexane

new_mol = Chem.ReplaceSubstructs(mol, query, replacement)[0]

```

Neutralizing Charges:

```python

# Remove formal charges by adding/removing hydrogens

from rdkit.Chem.MolStandardize import rdMolStandardize

# Using Uncharger

uncharger = rdMolStandardize.Uncharger()

mol_neutral = uncharger.uncharge(mol)

```

11. Working with Molecular Hashes and Standardization

Molecular Hashing:

```python

from rdkit.Chem import rdMolHash

# Generate Murcko scaffold hash

scaffold_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.MurckoScaffold)

# Canonical SMILES hash

canonical_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.CanonicalSmiles)

# Regioisomer hash (ignores stereochemistry)

regio_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.Regioisomer)

```

Randomized SMILES:

```python

# Generate random SMILES representations (for data augmentation)

from rdkit.Chem import MolToRandomSmilesVect

random_smiles = MolToRandomSmilesVect(mol, numSmiles=10, randomSeed=42)

```

12. Pharmacophore and 3D Features

Pharmacophore Features:

```python

from rdkit.Chem import ChemicalFeatures

from rdkit import RDConfig

import os

# Load feature factory

fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')

factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)

# Get pharmacophore features

features = factory.GetFeaturesForMol(mol)

for feat in features:

print(feat.GetFamily(), feat.GetType(), feat.GetAtomIds())

```

Common Workflows

Drug-likeness Analysis

```python

from rdkit import Chem

from rdkit.Chem import Descriptors

def analyze_druglikeness(smiles):

mol = Chem.MolFromSmiles(smiles)

if mol is None:

return None

# Calculate Lipinski descriptors

results = {

'MW': Descriptors.MolWt(mol),

'LogP': Descriptors.MolLogP(mol),

'HBD': Descriptors.NumHDonors(mol),

'HBA': Descriptors.NumHAcceptors(mol),

'TPSA': Descriptors.TPSA(mol),

'RotBonds': Descriptors.NumRotatableBonds(mol)

}

# Check Lipinski's Rule of Five

results['Lipinski'] = (

results['MW'] <= 500 and

results['LogP'] <= 5 and

results['HBD'] <= 5 and

results['HBA'] <= 10

)

return results

```

Similarity Screening

```python

from rdkit import Chem

from rdkit.Chem import AllChem

from rdkit import DataStructs

def similarity_screen(query_smiles, database_smiles, threshold=0.7):

query_mol = Chem.MolFromSmiles(query_smiles)

query_fp = AllChem.GetMorganFingerprintAsBitVect(query_mol, 2)

hits = []

for idx, smiles in enumerate(database_smiles):

mol = Chem.MolFromSmiles(smiles)

if mol:

fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2)

sim = DataStructs.TanimotoSimilarity(query_fp, fp)

if sim >= threshold:

hits.append((idx, smiles, sim))

return sorted(hits, key=lambda x: x[2], reverse=True)

```

Substructure Filtering

```python

from rdkit import Chem

def filter_by_substructure(smiles_list, pattern_smarts):

query = Chem.MolFromSmarts(pattern_smarts)

hits = []

for smiles in smiles_list:

mol = Chem.MolFromSmiles(smiles)

if mol and mol.HasSubstructMatch(query):

hits.append(smiles)

return hits

```

Best Practices

Error Handling

Always check for None when parsing molecules:

```python

mol = Chem.MolFromSmiles(smiles)

if mol is None:

print(f"Failed to parse: {smiles}")

continue

```

Performance Optimization

Use binary formats for storage:

```python

import pickle

# Pickle molecules for fast loading

with open('molecules.pkl', 'wb') as f:

pickle.dump(mols, f)

# Load pickled molecules (much faster than reparsing)

with open('molecules.pkl', 'rb') as f:

mols = pickle.load(f)

```

Use bulk operations:

```python

# Calculate fingerprints for all molecules at once

fps = [AllChem.GetMorganFingerprintAsBitVect(mol, 2) for mol in mols]

# Use bulk similarity calculations

similarities = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])

```

Thread Safety

RDKit operations are generally thread-safe for:

Molecule I/O (SMILES, mol blocks)
Coordinate generation
Fingerprinting and descriptors
Substructure searching
Reactions
Drawing

Not thread-safe: MolSuppliers when accessed concurrently.

Memory Management

For large datasets:

```python

# Use ForwardSDMolSupplier to avoid loading entire file

with open('large.sdf') as f:

suppl = Chem.ForwardSDMolSupplier(f)

for mol in suppl:

# Process one molecule at a time

pass

# Use MultithreadedSDMolSupplier for parallel processing

suppl = Chem.MultithreadedSDMolSupplier('large.sdf', numWriterThreads=4)

```

Common Pitfalls

Forgetting to check for None: Always validate molecules after parsing
Sanitization failures: Use DetectChemistryProblems() to debug
Missing hydrogens: Use AddHs() when calculating properties that depend on hydrogen
2D vs 3D: Generate appropriate coordinates before visualization or 3D analysis
SMARTS matching rules: Remember that unspecified properties match anything
Thread safety with MolSuppliers: Don't share supplier objects across threads

Resources

references/

This skill includes detailed API reference documentation:

api_reference.md - Comprehensive listing of RDKit modules, functions, and classes organized by functionality
descriptors_reference.md - Complete list of available molecular descriptors with descriptions
smarts_patterns.md - Common SMARTS patterns for functional groups and structural features

Load these references when needing specific API details, parameter information, or pattern examples.

scripts/

Example scripts for common RDKit workflows:

molecular_properties.py - Calculate comprehensive molecular properties and descriptors
similarity_search.py - Perform fingerprint-based similarity screening
substructure_filter.py - Filter molecules by substructure patterns

These scripts can be executed directly or used as templates for custom workflows.