Forjinn Docs

Development Platform

Documentation v2.0

Monitoring & Alerting

Learn about monitoring and alerting and how to implement them effectively.

Last updated: 12/9/2025

Monitoring & Alerting Guide

Proactive monitoring and alerting are vital to keeping InnoSynth-Forjinn reliable, performant, and secure. This guide covers key metrics, integration options, alert setup, and best practices for both cloud and on-prem deployments.


Metrics and Logging

Core Platform Metrics

  • API/Agent Latency
  • Active Requests/Queue Depth
  • Worker/Pod CPU & Memory
  • LLM/Token Usage (flow, workspace, org, global)
  • Error Rates (per flow, node, agent)
  • Disk Usage (uploads, logs, DB)

Log Types

  • Access logs (user logins, API usage)
  • System logs (errors, crashes, restarts)
  • Audit logs (admin actions, sensitive changes)
  • Application/agent execution traces
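Logs of these types are easier to search and alert on when each entry is a structured record. The following is a minimal sketch, assuming logs are produced with Python's standard logging module; the `log_type` field and the sample message are hypothetical and not part of the platform's documented log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            # Hypothetical field: access, system, audit, or trace.
            "log_type": getattr(record, "log_type", "system"),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


# Build a record by hand so the example runs without any handler setup.
record = logging.LogRecord(
    name="audit",
    level=logging.INFO,
    pathname="example.py",
    lineno=0,
    msg="role changed for user %s",
    args=("u-123",),
    exc_info=None,
)
record.log_type = "audit"  # hypothetical tag for an admin action

line = JsonFormatter().format(record)
print(line)
```

In a real deployment the formatter would be attached to a handler with `handler.setFormatter(JsonFormatter())`, so that every shipped line can be filtered by type downstream.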

Monitoring Integrations

  • Prometheus: Native metrics via /metrics endpoint.
    • Scrape flows, workers, system containers.
    • Expose to Grafana or another dashboard for visualization.
  • ELK (Elasticsearch/Logstash/Kibana): Ship logs for search, alerting.
  • Cloud Logging: Export to AWS CloudWatch, GCP Logging, or Azure Monitor.
  • Application Performance Monitoring (APM): Datadog, New Relic, and Sentry are supported for error and trace capture.
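To illustrate what a scraper reads from the /metrics endpoint, here is a small sketch that parses a payload in the Prometheus text exposition format. The metric and label names in the sample are hypothetical; the names the platform actually exports may differ.

```python
from typing import Dict, Tuple

# Sample payload in the Prometheus text exposition format.
# Metric and label names are hypothetical, not taken from the platform.
SAMPLE = """\
# HELP forjinn_api_latency_seconds API latency (hypothetical metric)
# TYPE forjinn_api_latency_seconds gauge
forjinn_api_latency_seconds{flow="checkout"} 0.42
forjinn_active_requests 17
"""


def parse_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Map (metric name, raw label string) to the sample value.

    Simplified: assumes no optional timestamp follows the value.
    """
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
for (name, labels), value in metrics.items():
    print(name, labels, value)
```

Against a live instance, the text could be fetched with `urllib.request.urlopen` and decoded before parsing; the host and port to use depend on your deployment.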

Example: Prometheus + Grafana Setup

  1. Enable metrics on platform (PROMETHEUS_METRICS_ENABLED=true)
  2. Add scrape config for platform container
  3. Import prebuilt Grafana dashboards (if available)
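Steps 1 and 2 above can be sketched as follows. The script checks the PROMETHEUS_METRICS_ENABLED flag and renders a minimal Prometheus scrape config. The job name `forjinn-platform` and the target `forjinn:3000` are placeholders, and treating the flag as case-insensitive is an assumption.

```python
import os
from typing import Iterable, Mapping


def metrics_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Step 1: check the flag named in the guide (case handling is assumed)."""
    return env.get("PROMETHEUS_METRICS_ENABLED", "").lower() == "true"


def render_scrape_config(
    job_name: str,
    targets: Iterable[str],
    metrics_path: str = "/metrics",
    scrape_interval: str = "15s",
) -> str:
    """Step 2: render a scrape_configs entry as YAML text."""
    target_list = ", ".join(f"'{t}'" for t in targets)
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        f"    metrics_path: {metrics_path}\n"
        f"    scrape_interval: {scrape_interval}\n"
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


# Placeholder job name and host:port; replace with your deployment's values.
config = render_scrape_config("forjinn-platform", ["forjinn:3000"])
print(config)
```

The printed block can be merged into the `scrape_configs` section of an existing prometheus.yml. Generating it from code is optional; it mainly helps when targets differ per environment.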

Alerting

  • Built-in Alert Rules: E.g., “>X errors in Y min,” “high latency,” “failed logins,” “low disk.”
  • Alert Channels: E-mail, Slack, PagerDuty, custom webhook
  • Custom Rules: Configure via your monitoring/alerting provider or scripts.
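A custom rule of the form ">X errors in Y min" can be approximated with a sliding window over error timestamps. The sketch below is one possible script-level implementation, not the platform's built-in rule engine; the class name and the webhook payload fields are hypothetical.

```python
import json
from collections import deque


class ErrorBurstRule:
    """Fires when more than max_errors errors fall within window_seconds."""

    def __init__(self, max_errors: int, window_seconds: float):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_error(self, now: float) -> bool:
        """Record one error at time `now`; return True if the rule fires."""
        self._timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors


def build_webhook_payload(rule_name: str, count: int, window_seconds: float) -> bytes:
    """Encode a generic JSON body for a custom webhook (field names assumed)."""
    body = {
        "rule": rule_name,
        "text": f"{count} errors in the last {int(window_seconds)} s",
    }
    return json.dumps(body).encode("utf-8")


# More than 3 errors in 5 minutes triggers the alert.
rule = ErrorBurstRule(max_errors=3, window_seconds=300)
fired = [rule.record_error(t) for t in (0, 60, 120, 180, 600)]
payload = build_webhook_payload("error-burst", 4, 300)
print(fired, payload)
```

Passing timestamps in explicitly keeps the rule easy to test. In production, `time.time()` would supply `now`, and the payload would be sent with an HTTP POST to whichever channel you configure.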

Best Practices

  • Monitor all production and staging deployments
  • Alert on what matters most: availability, errors, and resource exhaustion
  • Regularly test your alerts and revise thresholds as you scale
  • Back up monitoring configs and dashboards

Troubleshooting

  • No metrics/alerts? Check endpoints, scrape configs, environment variables
  • Too many alerts (“noise”)? Increase thresholds, add suppression, alert only on persistent/critical events
  • Missing log entries? Verify log rotation configs and storage health
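One way to reduce alert noise is to fire only when a condition persists across several consecutive checks, similar in spirit to the `for` clause of a Prometheus alerting rule. This is a minimal sketch; the 900 ms threshold and the sample latencies are invented for illustration.

```python
class PersistentCondition:
    """Returns True only after `required` consecutive breached checks."""

    def __init__(self, required: int):
        self.required = required
        self._streak = 0

    def update(self, breached: bool) -> bool:
        # Any healthy check resets the streak, which suppresses brief spikes.
        self._streak = self._streak + 1 if breached else 0
        return self._streak >= self.required


# Hypothetical latency samples in milliseconds, one per evaluation interval.
latency_ms = [120, 950, 130, 980, 990, 1010, 140]
check = PersistentCondition(required=3)
alerts = [check.update(value > 900) for value in latency_ms]
print(alerts)
```

The isolated spike at 950 ms never alerts; only the third consecutive breach does. Raising `required` trades detection speed for fewer false alarms.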

Strong monitoring and alerting are the foundation for reliable AI ops. Never go live without them.