Forjinn Docs

Development Platform

Documentation v2.0

Speech Text Processing

Learn about speech text processing and how to implement it effectively.

Last updated: 12/9/2025

Speech & Text Processing Components

AI-powered speech and text utilities unlock voice interfaces, advanced parsing, and smarter context management in your Forjinn workflows. This guide documents built-in Speech-to-Text, Text Splitter, and related modules.


Speech-to-Text (STT) Nodes

These nodes convert audio (uploaded or streamed) into usable, searchable text.

Supported Providers

  • OpenAI Whisper: Cloud-based, multi-language, high accuracy; used for both uploads and real-time streams
  • AssemblyAI, Google STT, and others: Alternative cloud STT providers, available when enabled

Inputs

  • Audio file (WAV, MP3, OGG, etc.) or audio stream
  • Optional: language hint, prompt/context

Outputs

  • Transcribed text
  • Optional: word timing, speaker diarization, confidence

Use Cases

  • Voice chatbots/IVR
  • Meeting/call transcription
  • Multimodal data search (voice+text context)

Text Splitters

Text Splitter nodes preprocess long documents or user input into chunks, a critical step for RAG, search, and LLM context building.

Options

  • By Characters: Fixed number of characters per chunk
  • By Tokens: Preferred for LLMs (e.g., 500 tokens per chunk)
  • By Sentence/Paragraph: Use built-in or custom splitter logic
  • Overlap: Optional setting for retaining context across splits
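The character-based option with overlap can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual implementation; a token-based splitter would follow the same pattern but count tokens from a tokenizer instead of characters.

```python
def split_by_chars(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Fixed-size character chunks with optional overlap (illustrative sketch)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance less than a full chunk to overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "".join(str(i % 10) for i in range(100))
chunks = split_by_chars(sample)
print(len(chunks))                        # -> 4
print(chunks[0][-10:] == chunks[1][:10])  # -> True (overlap preserved)
```

The overlap means the tail of each chunk repeats at the head of the next, so context that straddles a boundary is not lost to retrieval.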

Use Cases

  • Pre-indexing documents for vector search
  • Breaking up chat logs, lengthy user submissions, and similar long inputs

Text Processing Utilities

  • Normalizer/Cleaner: Whitespace, encoding, and character-set fixes
  • Summarizer: Quick auto-summarize for long or multi-part docs
  • Keyword Extraction: Highlight or extract key topics/entities
  • Regex/Custom Parsers: Build or use provided plugins for special cases
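As one example from the list above, a normalizer/cleaner can be sketched with the Python standard library. This is a minimal assumption-level sketch of what such a utility does, not the built-in node's code.

```python
import re
import unicodedata

def normalize_text(raw: str) -> str:
    """Minimal normalizer sketch: unify character encoding, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility characters
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace (incl. NBSP)
    return text.strip()

print(normalize_text("Caf\u00e9\u00a0  menu\n\tupdated"))  # -> Café menu updated
```

Running text through a normalizer like this before embedding keeps equivalent strings from producing different vectors.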

Example: Voice-Driven Support Bot

  1. User uploads voice memo
  2. Speech-to-Text node transcribes to text
  3. Text Splitter node chunks the transcript if it is long
  4. LLM/agent node answers or routes based on transcript
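The four steps above can be sketched as a plain Python pipeline. Every function here is an illustrative stand-in for the corresponding Forjinn node: the transcription is canned, and a trivial keyword check stands in for the LLM/agent routing step.

```python
def transcribe(audio_path: str) -> str:
    # Step 2: a real workflow would call an STT node (e.g., Whisper) here.
    return "my package arrived damaged, I would like a refund"

def split_if_long(text: str, max_len: int = 200) -> list[str]:
    # Step 3: only chunk transcripts that exceed the length budget.
    if len(text) <= max_len:
        return [text]
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

def route(transcript: str) -> str:
    # Step 4: a trivial keyword router standing in for an LLM/agent node.
    return "refunds" if "refund" in transcript else "general"

chunks = split_if_long(transcribe("voice_memo.ogg"))  # Step 1: uploaded file path
queue = route(chunks[0])
print(queue)  # -> refunds
```

In a real workflow each function would be a node in the canvas, with the transcript flowing along the edges between them.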

Best Practices

  • Always preprocess/upload audio in a supported format
  • Set the correct language and prompt to improve STT accuracy
  • Tune the chunking strategy by task (retrieval: small chunks with overlap; document summarization: longer chunks)
  • Chain a normalizer before embedding for the best search results

Troubleshooting

  • Poor accuracy: Try another provider or adjust the input format/preprocessing
  • Missing chunks: Lower the chunk size, increase overlap, and check logs for parse errors
  • Performance: For long documents/audio, use batch or background jobs to avoid blocking the UI

Speech and advanced text nodes power truly multimodal, robust, and flexible automations for both human and machine users.