Speech & Text Processing Components

Speech and text utilities add voice interfaces, document parsing, and context management to your Forjinn workflows. This guide documents the built-in Speech-to-Text and Text Splitter nodes and the related text-processing utilities.

Speech-to-Text (STT) Nodes

These nodes convert audio (uploaded or streamed) into usable, searchable text.

Supported Providers

OpenAI Whisper: Cloud-based, multilingual, high accuracy; used for both uploaded files and real-time audio
AssemblyAI, Google STT, etc.: Alternative cloud STT providers, available when enabled

Inputs

Audio file (wav, mp3, ogg, etc.) or audio stream
Language and, optionally, a prompt/context

Outputs

Transcribed text
Optional: word-level timing, speaker diarization, confidence scores
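
As a rough illustration of how these inputs and outputs map onto a provider call, the sketch below uses the OpenAI Python SDK's transcription endpoint directly. It is an assumption about what a Whisper-backed node might do, not Forjinn's actual implementation; the helper names `build_transcription_params` and `transcribe` are hypothetical.

```python
from typing import Any, Optional


def build_transcription_params(
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    word_timestamps: bool = False,
) -> dict[str, Any]:
    """Assemble keyword arguments for an OpenAI Whisper transcription request."""
    params: dict[str, Any] = {"model": "whisper-1"}
    if language:
        params["language"] = language  # ISO-639-1 code, e.g. "en"
    if prompt:
        params["prompt"] = prompt  # optional context to bias vocabulary and style
    if word_timestamps:
        # Word-level timing requires the verbose JSON response format.
        params["response_format"] = "verbose_json"
        params["timestamp_granularities"] = ["word"]
    return params


def transcribe(path: str, **options: Any) -> str:
    """Send an audio file to Whisper and return the transcript text."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            file=audio, **build_transcription_params(**options)
        )
    return result.text
```

A call such as `transcribe("memo.mp3", language="en", prompt="Forjinn support call")` would return the transcript as a plain string. Other providers expose different parameters, so treat the option names here as specific to this one SDK.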

Use Cases

Voice chatbots/IVR
Meeting/call transcription
Multimodal data search (voice+text context)

Text Splitters

Text Splitter nodes break long documents or user input into chunks, a critical step for RAG, search, and LLM context building.

Options

By Characters: Fixed number of characters per chunk
By Tokens: Fixed number of tokens per chunk (preferred for LLMs, e.g., 500 tokens/chunk)
By Sentence/Paragraph: Use built-in or custom splitter logic
Overlap: Optional setting that repeats the end of one chunk at the start of the next to retain context across splits
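
To make the interaction between chunk size and overlap concrete, here is a minimal sketch of character-based splitting in Python. The function name `split_by_chars` and its defaults are illustrative assumptions, not the Text Splitter node's real interface.

```python
def split_by_chars(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

For example, `split_by_chars("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`: each chunk starts with the last character of the previous one. Token-based splitting follows the same windowing idea, applied to a tokenizer's output instead of raw characters.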

Use Cases

Pre-indexing documents for vector search
Breaking up chat logs, lengthy user submissions, etc.

Text Processing Utilities

Normalizer/Cleaner: Fixes whitespace, encoding, and character-set issues
Summarizer: Quick automatic summaries of long or multi-part documents
Keyword Extraction: Highlights or extracts key topics and entities
Regex/Custom Parsers: Build your own or use the provided plugins for special cases
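
The kind of cleanup a Normalizer/Cleaner performs can be sketched with Python's standard library. This is an assumed set of rules chosen for illustration (the function name `normalize_text` is hypothetical); the built-in utility may apply different ones.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Normalize Unicode forms and whitespace before indexing or embedding."""
    # NFKC folds compatibility characters (ligatures, non-breaking spaces,
    # full-width letters) into their plain equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove spaces hugging line breaks, then cap consecutive blank lines.
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Running the same normalization on both documents and queries keeps their embeddings comparable, which is the main reason to place this step early in a flow.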

Example: Voice-Driven Support Bot

User uploads a voice memo
Speech-to-Text node transcribes it to text
Text Splitter node chunks the transcript (if it is long)
LLM/agent node answers or routes based on the transcript
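
The steps above can be sketched as a small Python function. The transcription and answering steps are passed in as callables because their real implementations live in the nodes; `run_voice_support_flow` and its parameters are hypothetical names used only for this illustration.

```python
from typing import Callable


def run_voice_support_flow(
    audio_path: str,
    transcribe: Callable[[str], str],
    answer: Callable[[str], str],
    max_chars: int = 2000,
) -> list[str]:
    """Transcribe a voice memo, split it only if it is long, and answer each chunk."""
    transcript = transcribe(audio_path)  # Speech-to-Text step

    # Text Splitter step: skipped when the transcript already fits in one chunk.
    if len(transcript) <= max_chars:
        chunks = [transcript]
    else:
        chunks = [
            transcript[i:i + max_chars]
            for i in range(0, len(transcript), max_chars)
        ]

    # LLM/agent step: one response per chunk.
    return [answer(chunk) for chunk in chunks]
```

Passing the steps as functions keeps the flow testable with stand-ins, for example `run_voice_support_flow("memo.mp3", lambda p: "reset my password", str.upper)`.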

Best Practices

Always convert audio to a supported format before uploading
Set the correct language and prompt to improve STT accuracy
Tune the chunking strategy to the task (retrieval = small chunks with overlap, document summarization = longer chunks)
Chain a Normalizer before embedding for better search results

Troubleshooting

Poor accuracy: Try another provider or adjust the input format or preprocessing
Missing chunks: Lower the chunk size, increase the overlap, and check the logs for parse errors
Performance: For long documents or audio, use batch or background jobs to avoid blocking the UI
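
One generic way to keep long work off the calling thread is a worker pool. The sketch below uses Python's standard `concurrent.futures` module; it illustrates the pattern only and is not a description of how Forjinn schedules jobs. `submit_batch` and `collect` are hypothetical helper names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# A shared pool; the size is an arbitrary choice for this sketch.
executor = ThreadPoolExecutor(max_workers=4)


def submit_batch(items: list[str], worker: Callable[[str], object]) -> list[Future]:
    """Queue one background task per item and return immediately."""
    return [executor.submit(worker, item) for item in items]


def collect(futures: list[Future]) -> list[object]:
    """Wait for all tasks and return their results in submission order."""
    return [future.result() for future in futures]
```

The caller can do other work between `submit_batch` and `collect`, so only the final collection step waits on the slow transcription or parsing tasks.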

Speech and advanced text nodes power truly multimodal, robust, and flexible automations for both human and machine users.

Related Documentation

Chatflows

Agents

Tools