Forjinn Docs
Documentation v2.0
Performance Optimization

Learn about performance optimization and how to implement it effectively.

Last updated: 12/9/2025

Performance Optimization & Scaling

Optimizing performance in InnoSynth-Forjinn keeps your AI workflows responsive, scalable, and within budget. The platform supports metrics analysis and advanced tuning, and the approaches below apply to both self-hosted and cloud deployments.


Key Metrics & What to Watch

  • Request Latency: End-to-end time for predictions and API calls. Aim: <2s interactive, <500ms API.
  • Token Usage: Track LLM tokens per call/flow for budget control.
  • Active Workers: Number of concurrent job processors. Adjust for traffic.
  • Memory/CPU Consumption: Especially for large models/tools.
  • Cache Hit Rate: For retrievers, datasets, embedding generators.
  • API Throttling/Rate Limits: Monitor provider and platform limits to avoid user-facing wait times or dropped jobs.
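
A simple way to start tracking the first metric, request latency, is to record wall-clock time around each call. The sketch below is a minimal in-process example; the `METRICS` store and `predict` function are illustrative stand-ins, and a real deployment would export these values to the platform dashboard or an external metrics system.

```python
import time
from collections import defaultdict

# Hypothetical in-process metrics store; a real setup would export these
# to the platform dashboard or a monitoring backend instead.
METRICS = defaultdict(list)

def track_latency(name):
    """Decorator that records wall-clock latency (in seconds) per call."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                METRICS[name].append(time.perf_counter() - start)
        return inner
    return wrap

@track_latency("predict")
def predict(prompt):
    # Stand-in for a real model or API call.
    return f"echo: {prompt}"

predict("hello")
```

From the recorded durations you can compute averages or percentiles and compare them against the <2s interactive / <500ms API targets above.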

Profiling and Benchmarking

  • Use the platform metrics dashboard (if enabled) for a high-level view.
  • Enable detailed logging/tracing per node to see which step or agent adds delay.
  • Compare flow performance before and after a change: clone the flow, run a batch, and measure.
  • For code-heavy flows, add timing inside Custom Function nodes.
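
The last point can be sketched as a Custom Function node body that times each internal step. The step names and the `inputs` dict shape here are illustrative, not a fixed platform schema:

```python
import time

def custom_node(inputs):
    """Example Custom Function node body with per-step timing.
    The step names and the inputs dict are illustrative only."""
    timings = {}

    start = time.perf_counter()
    chunks = [s.strip() for s in inputs["text"].split(".") if s.strip()]
    timings["split"] = time.perf_counter() - start

    start = time.perf_counter()
    scored = [(c, len(c)) for c in chunks]
    timings["score"] = time.perf_counter() - start

    # Return timings alongside the result so they appear in the node's output/logs.
    return {"result": scored, "timings": timings}
```

Emitting timings with the result makes slow steps visible in the execution trace without extra tooling.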

Scaling Patterns

Caching

  • Use Redis, Momento, or platform in-memory cache for repeated retrievals/chunks.
  • Cache agent results for common/expensive prompts.
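
The pattern behind both points is a keyed cache with a TTL. The sketch below is a minimal in-process version for illustration; in production the same `get`/`set` interface would typically be backed by Redis or Momento so results are shared across workers. The `cached_retrieve` helper and its `retriever` argument are hypothetical names:

```python
import time

class TTLCache:
    """Minimal in-process TTL cache illustrating the caching pattern."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # expired or missing
            return None
        return entry[1]

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=60)

def cached_retrieve(query, retriever):
    """Return cached chunks for a query, calling the retriever only on a miss."""
    hit = cache.get(query)
    if hit is not None:
        return hit
    result = retriever(query)
    cache.set(query, result)
    return result
```

Choose the TTL per use case: short for volatile data, longer for expensive, stable retrievals.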

Load Balancing

  • In Docker: Use Traefik/Nginx/HAProxy in front of multiple containers.
  • In K8s: Set up Horizontal Pod Autoscaler (+ Worker Pool for heavy jobs).
  • Separate web/API nodes from worker nodes so heavy jobs queue and process asynchronously.
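
For the Docker case, an Nginx reverse proxy in front of two containers can be sketched as below. The container names, port, and server block are placeholders; adapt them to your deployment:

```nginx
# Round-robin two Forjinn containers; names and ports are placeholders.
upstream forjinn_backend {
    server forjinn-app-1:3000;
    server forjinn-app-2:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://forjinn_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Traefik and HAProxy offer equivalent round-robin upstream configuration.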

Request Batching

  • In batch workflows (evaluation, retrieval), send or score multiple samples in one LLM/API call to cut roundtrips.
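
A minimal sketch of that batching pattern: group samples into fixed-size batches and make one call per batch instead of one per sample. The `score_batch` callable stands in for a hypothetical LLM/API endpoint that accepts a list of prompts:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list of samples."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def score_samples(samples, score_batch, batch_size=8):
    """Score samples in batches; `score_batch` is a stand-in for a single
    LLM/API call that accepts a list of prompts (hypothetical interface)."""
    results = []
    for batch in batched(samples, batch_size):
        results.extend(score_batch(batch))  # one roundtrip per batch
    return results
```

With a batch size of 8, scoring 100 samples costs 13 roundtrips instead of 100.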

Config Best Practices

  • Tune LLM/retriever chunk sizes, prompt length, and model params for speed/quality balance.
  • Use persistent external DB and file storage for high-availability setups.
  • Leverage platform’s built-in prefetch/paginate for large dataset or report queries.

Troubleshooting Slow Performance

  • Node bottleneck: See logs for slowest step in execution trace.
  • LLM rate limits: Reduce concurrency or use secondary provider for spillover.
  • Database slowness: Ensure the DB runs on SSD storage and scale its resources vertically.
  • Cache misses: Check cache deployment and connection status.
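
To find the slowest step from execution logs, a small parser is often enough. The log format assumed below (`node=<name> duration_ms=<number>`) is hypothetical; adjust the regex to match your actual trace lines:

```python
import re

# Hypothetical trace-log format: "node=<name> duration_ms=<number>".
# Real log lines may differ; adapt the pattern to your deployment.
LINE = re.compile(r"node=(\S+)\s+duration_ms=(\d+(?:\.\d+)?)")

def slowest_step(log_lines):
    """Return (node_name, duration_ms) for the slowest step, or None."""
    worst = None
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        name, ms = m.group(1), float(m.group(2))
        if worst is None or ms > worst[1]:
            worst = (name, ms)
    return worst
```

Running this over a batch of traces quickly shows whether the bottleneck is the LLM, the retriever, or the database.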

Advanced/Enterprise Scaling

  • Use managed cloud databases (RDS, CloudSQL, CosmosDB) for production.
  • Integrate with Prometheus, Grafana for deep custom metrics.
  • Periodically restart workers/pods to avoid memory leaks/GC pauses for long-running tasks.

Scaling is about identifying real-world bottlenecks, automating what you can, and always measuring before optimizing. With the right configuration and monitoring in place, Forjinn can confidently serve thousands of users and flows.