
monitoring-operations

Skill from acedergren/oci-agent-skills

What it does

Guides OCI monitoring setup by providing expert troubleshooting for metrics, alarms, and log collection across complex cloud environments.

Part of acedergren/oci-agent-skills (19 items)

Installation

Quick Install (install with npx):
npx skills add acedergren/oci-agent-skills

Add Marketplace (add the marketplace to Claude Code):
/plugin marketplace add acedergren/oci-agent-skills

Install Plugin (install the plugin from the marketplace):
/plugin install oci-agent-skills

git clone (clone the repository):
git clone https://github.com/acedergren/oci-agent-skills.git ~/.claude/plugins/oci-agent-skills

npx (run with npx):
npx skills

Claude Desktop Configuration (add this to your claude_desktop_config.json):
{ "mcpServers": { "oci-api": { "disabled": true } } }...

Extracted from docs: acedergren/oci-agent-skills
2 installs
Added Feb 4, 2026

Skill Details

SKILL.md

Use when setting up metrics, alarms, or troubleshooting missing data in OCI Monitoring. Covers metric namespace confusion, alarm threshold gotchas, log collection setup, and common monitoring gaps.

Overview

# OCI Monitoring and Observability - Expert Knowledge

πŸ—οΈ Use OCI Landing Zone Terraform Modules

Don't reinvent the wheel. Use [oracle-terraform-modules/landing-zone](https://github.com/oracle-terraform-modules/terraform-oci-landing-zones) for observability stack.

Landing Zone solves:

  • ❌ Bad Practice #10: No logging, monitoring, notifications (Landing Zone deploys complete observability)
  • ❌ Bad Practice #7: Limited security services (Landing Zone integrates Cloud Guard, VSS, OSMS)

This skill provides: Metrics, alarms, and troubleshooting for monitoring deployed WITHIN a Landing Zone.

---

⚠️ OCI CLI/API Knowledge Gap

You don't know OCI CLI commands or OCI API structure.

Your training data has limited and outdated knowledge of:

  • OCI CLI syntax and parameters (updates monthly)
  • OCI API endpoints and request/response formats
  • Monitoring service CLI operations (oci monitoring alarm, oci monitoring metric)
  • Metric namespaces and MQL (Monitoring Query Language)
  • Latest Logging and Service Connector features

When OCI operations are needed:

  1. Use exact CLI commands from this skill's references
  2. Do NOT guess metric namespace names
  3. Do NOT assume AWS CloudWatch patterns work in OCI
  4. Load reference files for detailed MQL documentation

What you DO know:

  • General observability concepts
  • Alerting and threshold design principles
  • Log aggregation patterns

This skill bridges the gap by providing current OCI-specific monitoring patterns and gotchas.

---

NEVER Do This

❌ NEVER assume metrics are instant (10-15 minute lag)

  • Metrics published every 1-5 minutes
  • Processing delay: 5-10 minutes
  • Total lag: 10-15 minutes from event to visible metric
  • Don't debug "missing metrics" within first 15 minutes of resource creation
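
The lag guidance above can be turned into a guard that runs before any troubleshooting starts. This is a minimal Python sketch, assuming the 15-minute upper bound quoted above; the function name and its arguments are hypothetical and not part of any OCI SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Upper bound of the lag quoted above (assumption: 15 minutes).
MAX_METRIC_LAG = timedelta(minutes=15)


def too_early_to_debug(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while the resource is younger than the expected metric lag."""
    now = now or datetime.now(timezone.utc)
    return now - created_at < MAX_METRIC_LAG


created = datetime(2026, 2, 4, 12, 0, tzinfo=timezone.utc)
print(too_early_to_debug(created, created + timedelta(minutes=5)))   # True
print(too_early_to_debug(created, created + timedelta(minutes=20)))  # False
```

If the guard returns True, the sensible action is to wait rather than to start checking namespaces or dimensions.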

❌ NEVER use = for alarm thresholds with sparse metrics

```
# WRONG - alarm never fires if metric has gaps
MetricName[1m].mean() = 0

# RIGHT - handle missing data
MetricName[1m]{dataMissing=zero}.mean() > 0
```

❌ NEVER forget metric dimensions (causes "no data")

```
# WRONG - missing required dimension
CpuUtilization[1m].mean()

# RIGHT - include resourceId dimension
CpuUtilization[1m]{resourceId=""}.mean()
```
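
One way to avoid forgetting a dimension is to build the query string through a helper that refuses to produce a query without one. The sketch below is illustrative only: `build_mql` is a hypothetical helper, not an OCI SDK function, and the OCID is a made-up placeholder. It uses `MemoryUtilization` from the compute namespace as the example metric.

```python
from typing import Dict, Optional


def build_mql(metric: str, interval: str, statistic: str,
              dimensions: Optional[Dict[str, str]] = None) -> str:
    """Assemble metric[interval]{dim = "value", ...}.statistic()."""
    if not dimensions:
        raise ValueError("refusing to build a query without dimensions")
    for key, value in dimensions.items():
        if not value:
            raise ValueError(f"dimension {key!r} has an empty value")
    selector = ", ".join(f'{k} = "{v}"' for k, v in dimensions.items())
    return f"{metric}[{interval}]{{{selector}}}.{statistic}()"


# Hypothetical OCID used purely as a placeholder.
query = build_mql("MemoryUtilization", "1m", "mean",
                  {"resourceId": "ocid1.instance.oc1..example"})
print(query)
# MemoryUtilization[1m]{resourceId = "ocid1.instance.oc1..example"}.mean()
```

Raising on an empty dimension value catches the case where a template variable was never substituted.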

❌ NEVER set alarm thresholds without trigger delay (alert fatigue)

```
# BAD - fires on every CPU spike
CpuUtilization[1m].mean() > 80

# BETTER - sustained high CPU
CpuUtilization[5m].mean() > 80
# Trigger delay: 5 minutes (fires after 5 consecutive breaches)
```
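
To see why a trigger delay cuts alert noise, it helps to model it as a run of consecutive breaching evaluations. The sketch below is a simplified model, assuming one evaluation per minute; it is not OCI's actual alarm evaluator, and the function name and sample data are hypothetical.

```python
from typing import Iterable, Optional


def first_firing_index(values: Iterable[float], threshold: float,
                       required_breaches: int) -> Optional[int]:
    """Index of the evaluation at which the alarm would fire, or None."""
    streak = 0
    for i, value in enumerate(values):
        # A breach extends the streak; any non-breaching value resets it.
        streak = streak + 1 if value > threshold else 0
        if streak >= required_breaches:
            return i
    return None


spiky = [40, 95, 42, 41, 90, 43, 44, 45]
sustained = [40, 85, 88, 91, 87, 90, 60, 50]
print(first_firing_index(spiky, 80, 1))      # 1 (fires on the first spike)
print(first_firing_index(spiky, 80, 5))      # None (spikes never persist)
print(first_firing_index(sustained, 80, 5))  # 5 (fifth consecutive breach)
```

With no delay the spiky series pages someone twice; with five required breaches only the sustained series fires.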

❌ NEVER create alarms without notification channels

```
# WRONG - alarm fires but nobody knows
oci monitoring alarm create ... --destinations '[]'

# RIGHT - always link to notification topic
oci monitoring alarm create ... --destinations '[""]'
```

Cost impact: Undetected outages cost $5,000-50,000/hour in production
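
A pre-flight check can catch an alarm definition with no destinations before it is ever created. The sketch below assumes the alarm is held as a plain dictionary with a `destinations` list mirroring the `--destinations` flag above; that shape, the function name, and the topic OCID are hypothetical.

```python
from typing import Any, Dict, List


def destination_problems(alarm: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the alarm's destinations."""
    destinations = alarm.get("destinations") or []
    if not destinations:
        return ["alarm has no destinations"]
    return [f"destination {i} is empty"
            for i, d in enumerate(destinations)
            if not isinstance(d, str) or not d.strip()]


print(destination_problems({"destinations": []}))
# ['alarm has no destinations']
print(destination_problems({"destinations": ["ocid1.onstopic.oc1..example"]}))
# []
```

An empty result means the definition at least names a topic; it does not prove the topic has subscribers.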

❌ NEVER ignore Cloud Guard findings (security audit failure)

  • Cloud Guard detects misconfigurations BEFORE they become incidents
  • Integrate Cloud Guard → Notifications → Email/Slack/PagerDuty
  • Cost impact: $100,000+ per security breach vs $0 for proactive remediation

Metric Namespace Gotchas

OCI Metrics Use Service-Specific Namespaces:

| Service | Namespace | Example Metrics |
|---------|-----------|-----------------|
| Compute | oci_computeagent | CpuUtilization, MemoryUtilization |
| Autonomous DB | oci_autonomous_database | CpuUtilization, StorageUtilization |
| Load Balancer | oci_lbaas | HttpRequests, UnHealthyBackendServers |
| Object Storage | oci_objectstorage | ObjectCount, BytesUploaded |

Common Mistake: Using the wrong namespace, for example oci_compute instead of oci_computeagent.
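
Because a guessed namespace fails silently with "no data", a lookup that raises on unknown services is safer than string concatenation. The sketch below encodes only the four rows of the table above; the function name is hypothetical.

```python
# Namespaces copied from the table above; nothing else is assumed.
NAMESPACES = {
    "compute": "oci_computeagent",
    "autonomous db": "oci_autonomous_database",
    "load balancer": "oci_lbaas",
    "object storage": "oci_objectstorage",
}


def namespace_for(service: str) -> str:
    """Return the recorded namespace, or raise instead of guessing."""
    key = service.strip().lower()
    if key not in NAMESPACES:
        raise KeyError(f"no namespace recorded for {service!r}; "
                       "check the reference instead of guessing")
    return NAMESPACES[key]


print(namespace_for("Compute"))        # oci_computeagent
print(namespace_for("Load Balancer"))  # oci_lbaas
```

Any service outside the table should be looked up in the reference file rather than added from memory.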

Alarm Missing Data Handling

| Setting | Behavior | Use When |
|---------|----------|----------|
| treatMissingDataAsBreaching | Alarm fires if no data | Critical services (outage = breach) |
| treatMissingDataAsNotBreaching | Alarm silent if no data | Optional monitoring |
| {dataMissing=zero} | Treat missing as 0 | Counters (requests/sec) |
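
The effect of treating missing data as 0 is easiest to see on a sparse counter. The sketch below is a conceptual model only, not how OCI evaluates alarms: datapoints are keyed by minute, and the helper name and sample values are hypothetical.

```python
from typing import Dict, List


def fill_missing_with_zero(points: Dict[int, float], start: int, end: int) -> List[float]:
    """One value per minute in [start, end), using 0.0 where no datapoint exists."""
    return [points.get(minute, 0.0) for minute in range(start, end)]


# Requests were recorded only in minutes 0 and 3.
sparse = {0: 12.0, 3: 8.0}
filled = fill_missing_with_zero(sparse, 0, 5)

print(filled)                              # [12.0, 0.0, 0.0, 8.0, 0.0]
print(sum(filled) / len(filled))           # 4.0  (gaps counted as zero)
print(sum(sparse.values()) / len(sparse))  # 10.0 (gaps ignored)
```

For a request counter the gap-filled mean of 4.0 is the truthful one, because a minute with no datapoint really had no requests; for a gauge such as CPU the same substitution would be misleading.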

Log Collection Common Gaps

Problem: Logs not showing in Log Analytics

```
Logs not appearing?
├─ Is log enabled on resource?
│  └─ Compute: oci-compute-agent must be running
│  └─ Function: Logging enabled in function config
│
├─ Is Service Connector configured?
│  └─ Source: Log Group → Target: Log Analytics
│  └─ Check: Service Connector status = ACTIVE
│
├─ IAM policy for Service Connector?
│  └─ "Allow any-user to use log-content in tenancy"
│  └─ "Allow service loganalytics to READ logcontent in tenancy"
│
└─ 10-15 minute ingestion lag?
   └─ Wait before debugging
```
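
The decision tree above is an ordered checklist, so it can be expressed as a function that returns the first unresolved step. This is a minimal sketch: the state keys are hypothetical names for facts an operator has already gathered, and nothing here calls OCI.

```python
from typing import Dict, Optional

# Checks in the same order as the tree above.
CHECKS = [
    ("log_enabled", "Enable logging on the resource"),
    ("connector_active", "Configure the Service Connector and confirm it is ACTIVE"),
    ("iam_policy_present", "Add the IAM policies for the Service Connector"),
    ("waited_15_minutes", "Wait out the 10-15 minute ingestion lag"),
]


def next_step(state: Dict[str, bool]) -> Optional[str]:
    """Return advice for the first failing check, or None if all pass."""
    for key, advice in CHECKS:
        if not state.get(key, False):
            return advice
    return None


print(next_step({"log_enabled": True}))
# Configure the Service Connector and confirm it is ACTIVE
```

A missing key is treated as a failed check, so an incomplete investigation never reports "all clear".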

Metric Query Optimization

Expensive (slow):

```
# Queries ALL instances
CpuUtilization[1m].mean()
```

Optimized (filter by dimension):

```
# Query specific instance
CpuUtilization[1m]{resourceId=''}.mean()
```

Cost: Queries are free but rate limited (1000 req/min)
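
A client that polls many metrics can stay under the limit with a small sliding-window limiter. The sketch below assumes the 1000 requests per minute figure quoted above; the class is hypothetical, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span."""

    def __init__(self, max_calls: int = 1000, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=3, window_seconds=60.0,
                               clock=lambda: fake_now[0])
print([limiter.try_acquire() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 60.0
print(limiter.try_acquire())                      # True
```

Callers that receive False should back off and retry later rather than send the query anyway.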

Progressive Loading References

OCI Monitoring Reference (Official Oracle Documentation)

WHEN TO LOAD [oci-monitoring-reference.md](references/oci-monitoring-reference.md):

  • Need comprehensive list of all OCI service metrics
  • Understanding MQL (Monitoring Query Language) in depth
  • Implementing complex alarm conditions and composites
  • Need official Oracle guidance on Logging and Service Connector
  • Setting up Log Analytics and APM integration

Do NOT load for:

  • Quick alarm setup (examples in this skill)
  • Common metric patterns (tables above)
  • Troubleshooting decision trees (covered above)

---

When to Use This Skill

  • Alarms: threshold configuration, missing data handling, trigger delay
  • Troubleshooting: metrics not showing, alarms not firing, namespace errors
  • Log collection: Service Connector, IAM policies, missing logs
  • Performance: query optimization, dimension filtering

More from this repository (10)

  • networking-management: Manages OCI network design, troubleshooting, security configuration, and cost optimization with expert-level networking insights and best practices.
  • finops-cost-optimization: Optimizes Oracle Cloud Infrastructure (OCI) costs by identifying hidden expenses, calculating shape migration savings, and maximizing free tier usage.
  • oracle-dba: Manages Oracle Autonomous Database on OCI, providing expert guidance on performance tuning, cost optimization, and infrastructure best practices.
  • compute-management: Manages OCI compute instances by optimizing costs, resolving capacity issues, and handling lifecycle operations with expert-level precision.
  • infrastructure-as-code: Provides expert guidance for writing robust Terraform infrastructure-as-code for Oracle Cloud Infrastructure (OCI), focusing on best practices, landing zone modules, and avoiding common implementat...
  • landing-zones: Designs and implements secure, scalable OCI multi-tenant landing zones with best-practice compartment hierarchies, network topologies, and governance foundations.
  • secrets-management: Manages OCI Vault secrets with expert guidance on secure retrieval, rotation, IAM permissions, and operational best practices.
  • database-management: Manages Oracle Cloud Infrastructure (OCI) Autonomous Databases by providing expert guidance on connection, configuration, cost optimization, and troubleshooting operations.
  • oci-events: Enables event-driven automation in Oracle Cloud Infrastructure by configuring CloudEvents rules, actions, and integrations with Functions, Streaming, and Notifications.
  • genai-services: Optimizes OCI Generative AI services by providing expert guidance on model selection, cost management, token handling, and healthcare compliance.