🎯

genai-services

🎯Skill

from acedergren/oci-agent-skills

What it does

Optimizes OCI Generative AI services by providing expert guidance on model selection, cost management, token handling, and healthcare compliance.

📦

Part of

acedergren/oci-agent-skills(19 items)

genai-services

Installation

Quick InstallInstall with npx

npx skills add acedergren/oci-agent-skills

Add MarketplaceAdd marketplace to Claude Code

/plugin marketplace add acedergren/oci-agent-skills

Install PluginInstall plugin from marketplace

/plugin install oci-agent-skills

git cloneClone repository

git clone https://github.com/acedergren/oci-agent-skills.git ~/.claude/plugins/oci-agent-skills

npxRun with npx

npx skills

Claude Desktop ConfigurationAdd this to your claude_desktop_config.json

{
  "mcpServers": {
    "oci-api": {
      "disabled": true
    }
  }
}...

📖 Extracted from docs: acedergren/oci-agent-skills

Need more details? View full documentation on GitHub →

2Installs

AddedFeb 4, 2026

View on GitHub Back to Skills

Skill Details

SKILL.md

Use when implementing OCI GenAI inference APIs, troubleshooting rate limits or token errors, optimizing GenAI costs, or handling sensitive data (PHI/PII) in prompts. Covers model selection, cost calculations, token management, response validation, and healthcare/compliance considerations.

Overview

# OCI Generative AI Services - Expert Knowledge

🏗️ Use OCI Landing Zone Terraform Modules

Don't reinvent the wheel. Use [oracle-terraform-modules/landing-zone](https://github.com/orgs/oci-landing-zones/repositories) for GenAI infrastructure.

Landing Zone solves:

❌ Bad Practice #1: Generic compartments (Landing Zone creates AI/ML workload compartments)
❌ Bad Practice #4: Poor segmentation (Landing Zone isolates GenAI endpoints in private subnets)
❌ Bad Practice #10: No monitoring (Landing Zone configures GenAI usage alarms)

This skill provides: GenAI cost optimization, rate limits, PHI/PII security, and troubleshooting for GenAI deployed WITHIN a Landing Zone.

---

⚠️ OCI CLI/API Knowledge Gap

You don't know OCI CLI commands or OCI API structure.

Your training data has limited and outdated knowledge of:

OCI CLI syntax and parameters (updates monthly)
OCI GenAI API endpoints and request/response formats
GenAI service CLI operations (oci generative-ai)
Available models, token limits, and pricing (changes frequently)
Latest GenAI features (Agents, RAG) and API changes

When OCI operations are needed:

Use exact CLI commands from this skill's references
Do NOT guess OCI CLI syntax or parameters
Do NOT assume model availability or pricing
Load reference files for detailed GenAI API documentation

What you DO know:

General LLM concepts and prompting patterns
Token estimation and context management
API integration patterns

This skill bridges the gap by providing current OCI GenAI-specific patterns and gotchas.

---

You are an OCI GenAI expert. This skill provides knowledge Claude lacks: cost optimization specifics, token management, rate limit handling, PHI/PII security, response validation, and model selection trade-offs.

NEVER Do This

❌ NEVER send PHI/PII identifiers to GenAI APIs (HIPAA/GDPR violation)

```python

# WRONG - patient identifiers sent to external service

prompt = f"Transcribe note for patient {patient_name}, MRN {mrn}, SSN {ssn}: {note}"

# RIGHT - redact identifiers

prompt = f"Transcribe this medical note: {redacted_note}"

# Keep mapping: temp_id → real_id in secure database, not in prompts

```

Why critical: GenAI service logs may retain data, violates healthcare regulations

❌ NEVER trust GenAI output without validation (hallucination risk)

```python

# WRONG - use response directly in critical systems

diagnosis = genai_response.text

db.execute("UPDATE patients SET diagnosis = ?", diagnosis)

# RIGHT - validate structure and flag for human review

response = genai_response.text

if validate_medical_format(response):

db.execute("UPDATE patients SET ai_suggested_diagnosis = ?, status = 'PENDING_REVIEW'", response)

```

Hallucination rate: 5-15% for factual queries, higher for medical/legal domains

❌ NEVER ignore token limits

command-r-plus: 128k context window (input + output)
command-r: 4k context (much cheaper but limited)
Exceeding limit: Request truncated silently or fails with 400 error

❌ NEVER call GenAI without rate limit handling

```python

# WRONG - no retry logic, fails on rate limit

response = genai_client.chat(request)

# RIGHT - exponential backoff

def call_with_retry(func, max_retries=5):

for attempt in range(max_retries):

try:

return func()

except oci.exceptions.ServiceError as e:

if e.status == 429 and attempt < max_retries - 1:

wait = (2 ** attempt) + random.uniform(0, 1)

logger.warning(f"Rate limited, retry in {wait:.2f}s")

time.sleep(wait)

else:

raise

```

❌ NEVER cache responses without consent (data privacy)

Caching saves costs BUT may violate privacy policies
Get explicit user consent before caching medical/personal data
Cache anonymized data only

❌ NEVER use GenAI for deterministic tasks

Wrong: "Extract invoice total from OCR text" (use regex/structured parsing)
Wrong: "Validate email format" (use validation library)
Right: "Summarize patient history", "Generate report narrative" (creative tasks)

Model Selection: Cost vs Performance

|-------|---------|---------------------|----------|-----------|

Cost optimization strategy:

Use embeddings for search first (1000x cheaper than generation)
Cache responses for repeated queries (with consent)
Use command-r for simple tasks, command-r-plus only when needed
Truncate input intelligently (keep relevant context only)

Cost Calculation Examples

Scenario: Medical transcription service

Average note: 500 tokens input, 300 tokens output = 800 tokens total
1000 notes/day = 800k tokens/day = 24M tokens/month

Without optimization:

```

Model: command-r-plus

Input: 12M × ($15/1M) = $180/month

Output: 12M × ($75/1M) = $900/month

Total: $1,080/month

```

With optimization (30% cache hit, use command-r for simple notes):

```

70% unique notes = 16.8M tokens

60% simple (command-r): 10M × $1.50 input + $7.50 output = $90

40% complex (command-r-plus): 6.72M × $15 input + $75 output = $605

Total: $695/month (36% savings)

```

Token Management

Token Limits by Model

|-------|-------------|------------|-------|

| command-r | 4k | 2k | Good for chat |

| embed-english-v3 | 512 | N/A | Embeddings only |

Truncation Strategy

```python

def truncate_for_model(text: str, model: str = "command-r-plus", max_output: int = 2000):

"""Truncate input to fit token budget"""

# Rough estimate: 1 token ≈ 4 characters

if model == "command-r-plus":

max_input_tokens = 128000 - max_output

elif model == "command-r":

max_input_tokens = 4000 - max_output

else:

max_input_tokens = 2000

max_chars = max_input_tokens * 4

if len(text) <= max_chars:

return text

# Keep most recent content (chronological data like logs, notes)

logger.warning(f"Input exceeds {max_input_tokens} tokens, truncating")

return "...[earlier content truncated]...\n" + text[-max_chars:]

```

Prompt Optimization

Inefficient (wastes tokens):

```

Please carefully analyze the following medical record and provide a comprehensive

summary including all diagnoses, medications, allergies, and treatment plans. Be

thorough and include all relevant details from the patient's history...

[5000 word medical record]

```

Optimized (40% token reduction):

```

Summarize: diagnoses, meds, allergies, treatment plan.

[5000 word medical record]

```

Token savings: ~50 tokens on prompt × 1000 requests/day = 50k tokens/day saved = $2.25/day ($68/month)

Rate Limits

OCI GenAI Service Limits (per compartment):

|-------|-----------------|--------------|----------------|

| command-r-plus | 20 | 1000 | 128k |

| command-r | 60 | 3000 | 4k |

| Embeddings | 100 | 10000 | 512 |

Error Handling

```python

import time

import random

from oci.exceptions import ServiceError

def generate_with_backoff(genai_client, request, max_retries=5):

"""Call GenAI with exponential backoff on rate limits"""

for attempt in range(max_retries):

try:

response = genai_client.chat(request)

return response.data.chat_response.text

except ServiceError as e:

if e.status == 429: # Rate limit

if attempt < max_retries - 1:

# Exponential backoff: 1s, 2s, 4s, 8s, 16s

wait = (2 ** attempt) + random.uniform(0, 1)

logger.warning(f"Rate limited (429), retry {attempt+1}/{max_retries} in {wait:.1f}s")

time.sleep(wait)

else:

logger.error(f"Rate limit exceeded after {max_retries} retries")

raise

elif e.status == 400: # Bad request (often token limit)

logger.error(f"Bad request (400): {e.message}")

if "token" in e.message.lower():

logger.error("Token limit exceeded - truncate input")

raise

else:

logger.error(f"GenAI error ({e.status}): {e.message}")

raise

```

Response Validation (Critical for Healthcare)

```python

def validate_medical_response(response: str) -> tuple[bool, list[str]]:

"""Validate GenAI medical response for safety"""

issues = []

# Check 1: Response not empty

if not response or len(response.strip()) < 10:

issues.append("Response too short or empty")

# Check 2: No obvious hallucination markers

hallucination_markers = [

"I don't have access",

"I cannot",

"As an AI",

"[INSERT",

"TODO",

]

for marker in hallucination_markers:

if marker.lower() in response.lower():

issues.append(f"Potential hallucination marker: {marker}")

# Check 3: Expected structure present (customize per use case)

required_sections = ["Chief Complaint", "Assessment", "Plan"]

missing_sections = [s for s in required_sections if s.lower() not in response.lower()]

if missing_sections:

issues.append(f"Missing sections: {missing_sections}")

# Check 4: No PII leak (if input was redacted)

pii_patterns = [

r'\b\d{3}-\d{2}-\d{4}\b', # SSN

r'\b[A-Z]{2}\d{6,8}\b', # MRN patterns

]

for pattern in pii_patterns:

if re.search(pattern, response):

issues.append(f"Potential PII in response: {pattern}")

is_valid = len(issues) == 0

return is_valid, issues

# Usage

response_text = genai_response.data.chat_response.text

is_valid, issues = validate_medical_response(response_text)

if is_valid:

store_for_review(response_text)

else:

logger.warning(f"Invalid response: {issues}")

flag_for_manual_review(response_text, issues)

```

Healthcare-Specific Considerations

HIPAA Compliance

Minimum requirements:

✅ Business Associate Agreement (BAA) with Oracle
✅ PHI redaction before sending to GenAI
✅ Audit logging of all GenAI API calls
✅ Encryption in transit and at rest
✅ Access controls (who can call GenAI)
✅ Data retention policies (how long to keep prompts/responses)

Never assume GenAI is HIPAA-compliant by default - verify BAA coverage with Oracle

De-identification Strategy

```python

def redact_phi(text: str) -> tuple[str, dict]:

"""Remove PHI from text, return redacted text + mapping"""

mapping = {}

redacted = text

# Patient names (use NER or pattern matching)

names = extract_names(text) # Your NER function

for i, name in enumerate(names):

placeholder = f"[PATIENT_{i}]"

mapping[placeholder] = name

redacted = redacted.replace(name, placeholder)

# Medical Record Numbers

mrn_pattern = r'\b(MRN|Medical Record):?\s*([A-Z0-9]{6,10})\b'

redacted = re.sub(mrn_pattern, r'\1: [REDACTED]', redacted)

# SSN

ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'

redacted = re.sub(ssn_pattern, '[SSN_REDACTED]', redacted)

# Dates (optional - some use cases need dates)

# date_pattern = r'\b\d{1,2}/\d{1,2}/\d{4}\b'

# redacted = re.sub(date_pattern, '[DATE]', redacted)

return redacted, mapping

# Usage

redacted_note, phi_mapping = redact_phi(patient_note)

genai_response = genai_client.chat(prompt=f"Summarize: {redacted_note}")

# Store phi_mapping securely, use to re-identify if needed

```

Progressive Loading References

OCI Generative AI Reference (Official Oracle Documentation)

WHEN TO LOAD [oci-genai-reference.md](references/oci-genai-reference.md):

Need comprehensive GenAI API documentation
Understanding all available models and capabilities
Implementing RAG (Retrieval-Augmented Generation) with OCI
Need official Oracle guidance on GenAI Agents
Understanding fine-tuning and custom model deployment

Do NOT load for:

Quick API usage examples (covered in this skill)
Model selection guidance (decision tree above)
Cost calculations (formulas above)

---

When to Use This Skill

GenAI API implementation: model selection, cost estimation, SDK usage
Error troubleshooting: rate limits (429), token limits (400), authentication
Cost optimization: caching strategy, model downgrade, prompt optimization
Healthcare/compliance: PHI handling, HIPAA requirements, audit logging
Response validation: hallucination detection, structure checking
Production: rate limit handling, error recovery, monitoring

More from this repository10

🎯

networking-management🎯Skill

Manages OCI network design, troubleshooting, security configuration, and cost optimization with expert-level networking insights and best practices.

🎯

finops-cost-optimization🎯Skill

Optimizes Oracle Cloud Infrastructure (OCI) costs by identifying hidden expenses, calculating shape migration savings, and maximizing free tier usage.

🎯

oracle-dba🎯Skill

Manages Oracle Autonomous Database on OCI, providing expert guidance on performance tuning, cost optimization, and infrastructure best practices.

🎯

compute-management🎯Skill

Manages OCI compute instances by optimizing costs, resolving capacity issues, and handling lifecycle operations with expert-level precision.

🎯

infrastructure-as-code🎯Skill

Provides expert guidance for writing robust Terraform infrastructure-as-code for Oracle Cloud Infrastructure (OCI), focusing on best practices, landing zone modules, and avoiding common implementat...

🎯

landing-zones🎯Skill

Designs and implements secure, scalable OCI multi-tenant landing zones with best-practice compartment hierarchies, network topologies, and governance foundations.

🎯

secrets-management🎯Skill

Manages OCI Vault secrets with expert guidance on secure retrieval, rotation, IAM permissions, and operational best practices.

🎯

database-management🎯Skill

Manages Oracle Cloud Infrastructure (OCI) Autonomous Databases by providing expert guidance on connection, configuration, cost optimization, and troubleshooting operations.

🎯

oci-events🎯Skill

Enables event-driven automation in Oracle Cloud Infrastructure by configuring CloudEvents rules, actions, and integrations with Functions, Streaming, and Notifications.

🎯

monitoring-operations🎯Skill

Guides OCI monitoring setup by providing expert troubleshooting for metrics, alarms, and log collection across complex cloud environments.