File Loader
Learn about the File Loader node and how to use it effectively.
Last updated: 12/9/2025
File Loader (DocumentLoaders Node)
The File Loader node in InnoSynth-Forjinn enables ingestion of a wide variety of documents—PDF, TXT, DOCX, CSV, and others—into your knowledge base or retrieval pipeline. It is foundational for building RAG (Retrieval Augmented Generation) chatflows, allowing the platform to index, chunk, and query your own files.
What Does the File Loader Node Do?
- Loads a file (or batch of files) from the local filesystem, cloud storage, or an upload form.
- Handles automatic chunking of documents (e.g., by paragraph, page, or custom logic).
- Normalizes text (cleans up whitespace, encoding) for downstream processing.
- Passes chunks on for embedding (e.g., to OpenAI Embedding, HuggingFace, etc.).
- Sends processed data to vector stores for semantic search.
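Conceptually, the node performs a load → normalize → chunk hand-off. A minimal sketch in plain Python (the function names are illustrative, not the InnoSynth-Forjinn API):

```python
import re

def load_text(path: str) -> str:
    """Loader stage: read raw text from a local file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def normalize(text: str) -> str:
    """Normalization stage: collapse whitespace runs and strip edges."""
    return re.sub(r"\s+", " ", text).strip()

def to_chunks(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Chunking stage: overlapping character windows for downstream embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Downstream, each chunk would then be handed to an embedding node and persisted in a vector store.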
Required Setup
- File Path/Upload: Set file path, or use platform upload button for manual ingest in UI.
- Chunk Size & Overlap: Configure chunking parameters (500-1000 tokens is common); overlap maintains context in split text.
- Accepted File Types: PDF, TXT, DOCX, CSV, Markdown, HTML, and other standard doc types (may require enabling plugins for images, code, etc).
- Labels/Tags (Optional): Useful for organizing, filtering, and searching within large sets of imported documents.
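The effect of overlap is easy to verify: with character-based splitting, the last `overlap` characters of each chunk reappear at the start of the next, so context survives the boundary. A small sketch (not the platform's actual splitter):

```python
def split_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Each chunk repeats the final `overlap` characters of its predecessor."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = " ".join(f"sentence{n}" for n in range(200))
chunks = split_with_overlap(text, size=500, overlap=100)

# The tail of each full chunk reappears at the head of the next one.
for a, b in zip(chunks, chunks[1:]):
    assert a[-100:] == b[:100]
```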
Typical Usage in Workflows
Knowledge Base Build Example:
File Loader → TextSplitter (optional) → Embedding Node → Vector Store
- Repeat for multiple files; use different loader nodes for different formats or folders.
- Combine with Dataset node for evaluation or Q/A benchmarking.
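In code terms, the chain above amounts to feeding each chunk through an embedding step and into a store. A self-contained sketch using a toy bag-of-words "embedding" and an in-memory store as stand-ins for the real nodes:

```python
from collections import Counter

def embed(chunk: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for OpenAI/HuggingFace)."""
    return Counter(chunk.lower().split())

class InMemoryVectorStore:
    """Minimal stand-in for a vector store such as Chroma."""
    def __init__(self):
        self.records = []

    def add(self, vector, chunk: str) -> None:
        self.records.append((vector, chunk))

def run_pipeline(chunks: list[str], store: InMemoryVectorStore) -> None:
    """File Loader → (pre-split) chunks → Embedding → Vector Store."""
    for chunk in chunks:
        store.add(embed(chunk), chunk)

store = InMemoryVectorStore()
run_pipeline(["reset your password via settings", "contact support by email"], store)
```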
Configuration Fields
- File Path/Input: ./documents/support_faq.pdf or file upload
- Chunk Size: Integer (tokens/chars)
- Overlap: Integer; chunk overlap for better context
- Labels/Tags: Free-form or predefined taxonomy
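Put together, a loader configuration might be expressed as a simple mapping. The field names below are illustrative, not the exact platform schema:

```python
loader_config = {
    "file_path": "./documents/support_faq.pdf",  # or None when using UI upload
    "chunk_size": 800,       # tokens/chars per chunk
    "chunk_overlap": 100,    # shared context between adjacent chunks
    "labels": ["support", "faq"],
}

# Basic sanity check before ingestion: overlap must be smaller than chunk size.
assert loader_config["chunk_overlap"] < loader_config["chunk_size"]
```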
Outputs
- Chunks: Array of normalized text segments with associated metadata (file, page, chunk id)
- Metadata: Original filename, date, tags, annotations if any
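A single output chunk with its associated metadata might look like the following (field names are illustrative):

```python
# Shape of one output chunk record: normalized text plus provenance metadata.
chunk_record = {
    "text": "To reset your password, open Settings and choose Reset.",
    "metadata": {
        "file": "support_faq.pdf",
        "page": 3,
        "chunk_id": "support_faq.pdf-p3-c0",
        "tags": ["support", "faq"],
    },
}
```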
Example: Building a Support FAQ Search
- Add File Loader: Upload your faq.pdf (or batch via directory).
- Configure Chunking: Set chunk size 800, overlap 100.
- Connect to OpenAI Embedding/Chroma Vector Store.
- Set up Retriever node for end-user semantic search.
- (Optional) Add Labels: Tag by topic for filtered search.
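The whole FAQ-search flow can be approximated in a few lines, using toy bag-of-words cosine similarity in place of real embeddings and Chroma:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real flow would call an embedding node."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

faq_chunks = [
    "To reset your password, open Settings and choose Reset Password.",
    "Invoices are emailed on the first business day of each month.",
]
index = [(embed(c), c) for c in faq_chunks]

def retrieve(query: str) -> str:
    """Retriever stage: return the chunk most similar to the query."""
    return max(index, key=lambda rec: cosine(embed(query), rec[0]))[1]
```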
Troubleshooting
- Import Fails: Check file type/size and format. Some oversized/binary docs may need preprocessing.
- No Chunks Produced: Try a lower chunk size, or check for OCR/parsing failures (common with scanned PDFs).
- Encoding Issues: For non-UTF8 or legacy formats, preprocess or use platform conversion utility.
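For the encoding case, preprocessing can be as simple as decoding with a list of fallbacks before ingestion. This sketch is generic Python, not a platform utility; latin-1 never raises, so it serves as a last resort for legacy files:

```python
def decode_with_fallback(data: bytes,
                         encodings=("utf-8", "cp1252", "latin-1")) -> str:
    """Try each encoding in order; return the first successful decode."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input with any known encoding")
```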
Best Practices
- For updating documents, re-ingest rather than modifying vectors directly.
- Use labels/tags for grouping multiple related files.
- Test chunking effect on semantic search; sometimes domain-specific splitting logic is optimal.
- Periodically backup all imported files and ingestion logs for reproducibility.
The File Loader node makes it simple to turn any document collection into a living, searchable AI knowledge base—enabling RAG and custom Q&A inside your organization.