File Loader
Learn about the File Loader node and how to use it effectively.
Last updated: 12/9/2025
File Loader (DocumentLoaders Node)
The File Loader node in InnoSynth-Forjinn enables ingestion of a wide variety of documents—PDF, TXT, DOCX, CSV, and others—into your knowledge base or retrieval pipeline. It is foundational for building RAG (Retrieval Augmented Generation) chatflows, allowing the platform to index, chunk, and query your own files.
What Does the File Loader Node Do?
- Loads a file (or batch of files) from the local filesystem, cloud storage, or an upload form.
- Handles automatic chunking of documents (e.g., by paragraph, page, or custom logic).
- Normalizes text (cleans up whitespace, encoding) for downstream processing.
- Passes chunks on for embedding (e.g., to OpenAI Embedding, HuggingFace, etc.).
- Sends processed data to vector stores for semantic search.
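Conceptually, the node performs a load → normalize → chunk hand-off. A minimal sketch in plain Python (the function names are illustrative, not the InnoSynth-Forjinn API):

```python
import re

def load_text(path: str) -> str:
    """Loader stage: read raw text from a local file."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def normalize(text: str) -> str:
    """Normalization stage: collapse whitespace runs and strip edges."""
    return re.sub(r"\s+", " ", text).strip()

def to_chunks(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Chunking stage: overlapping character windows for downstream embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Downstream, each chunk would then be handed to an embedding node and persisted in a vector store.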
Required Setup
- File Path/Upload: Set file path, or use platform upload button for manual ingest in UI.
- Chunk Size & Overlap: Configure chunking parameters (500-1000 tokens is common); overlap maintains context in split text.
- Accepted File Types: PDF, TXT, DOCX, CSV, Markdown, HTML, and other standard doc types (may require enabling plugins for images, code, etc).
- Labels/Tags (Optional): Useful for organizing, filtering, and searching within large sets of imported documents.
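The effect of overlap is easy to verify: with character-based splitting, the last `overlap` characters of each chunk reappear at the start of the next, so context survives the boundary. A small sketch (not the platform's actual splitter):

```python
def split_with_overlap(text: str, size: int, overlap: int) -> list[str]:
    """Each chunk repeats the final `overlap` characters of its predecessor."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = " ".join(f"sentence{n}" for n in range(200))
chunks = split_with_overlap(text, size=500, overlap=100)

# The tail of each full chunk reappears at the head of the next one.
for a, b in zip(chunks, chunks[1:]):
    assert a[-100:] == b[:100]
```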
Typical Usage in Workflows
Knowledge Base Build Example:
File Loader → TextSplitter (optional) → Embedding Node → Vector Store
- Repeat for multiple files; use different loader nodes for different formats or folders.
- Combine with Dataset node for evaluation or Q/A benchmarking.
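In code terms, the chain above amounts to feeding each chunk through an embedding step and into a store. A self-contained sketch using a toy bag-of-words "embedding" and an in-memory store as stand-ins for the real nodes:

```python
from collections import Counter

def embed(chunk: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for OpenAI/HuggingFace)."""
    return Counter(chunk.lower().split())

class InMemoryVectorStore:
    """Minimal stand-in for a vector store such as Chroma."""
    def __init__(self):
        self.records = []

    def add(self, vector, chunk: str) -> None:
        self.records.append((vector, chunk))

def run_pipeline(chunks: list[str], store: InMemoryVectorStore) -> None:
    """File Loader → (pre-split) chunks → Embedding → Vector Store."""
    for chunk in chunks:
        store.add(embed(chunk), chunk)

store = InMemoryVectorStore()
run_pipeline(["reset your password via settings", "contact support by email"], store)
```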
Configuration Fields
- File Path/Input: ./documents/support_faq.pdf or file upload
- Chunk Size: Integer (tokens/chars)
- Overlap: Integer; chunk overlap for better context
- Labels/Tags: Free-form or predefined taxonomy
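Put together, a loader configuration might be expressed as a simple mapping. The field names below are illustrative, not the exact platform schema:

```python
loader_config = {
    "file_path": "./documents/support_faq.pdf",  # or None when using UI upload
    "chunk_size": 800,       # tokens/chars per chunk
    "chunk_overlap": 100,    # shared context between adjacent chunks
    "labels": ["support", "faq"],
}

# Basic sanity check before ingestion: overlap must be smaller than chunk size.
assert loader_config["chunk_overlap"] < loader_config["chunk_size"]
```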
Outputs
- Chunks: Array of normalized text segments with associated metadata (file, page, chunk id)
- Metadata: Original filename, date, tags, annotations if any
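A single output chunk with its associated metadata might look like the following (field names are illustrative):

```python
# Shape of one output chunk record: normalized text plus provenance metadata.
chunk_record = {
    "text": "To reset your password, open Settings and choose Reset.",
    "metadata": {
        "file": "support_faq.pdf",
        "page": 3,
        "chunk_id": "support_faq.pdf-p3-c0",
        "tags": ["support", "faq"],
    },
}
```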
Example: Building a Support FAQ Search
- Add File Loader: Upload your faq.pdf (or batch via directory).
- Configure Chunking: Set chunk size 800, overlap 100.
- Connect to OpenAI Embedding/Chroma Vector Store.
- Set up Retriever node for end-user semantic search.
- (Optional) Add Labels: Tag by topic for filtered search.
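The whole FAQ-search flow can be approximated in a few lines, using toy bag-of-words cosine similarity in place of real embeddings and Chroma:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real flow would call an embedding node."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

faq_chunks = [
    "To reset your password, open Settings and choose Reset Password.",
    "Invoices are emailed on the first business day of each month.",
]
index = [(embed(c), c) for c in faq_chunks]

def retrieve(query: str) -> str:
    """Retriever stage: return the chunk most similar to the query."""
    return max(index, key=lambda rec: cosine(embed(query), rec[0]))[1]
```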
Troubleshooting
- Import Fails: Check file type/size and format. Some oversized/binary docs may need preprocessing.
- No Chunks Produced: Try a lower chunk size, or check for OCR/parsing failures (common with scanned PDFs).
- Encoding Issues: For non-UTF8 or legacy formats, preprocess or use platform conversion utility.
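For the encoding case, preprocessing can be as simple as decoding with a list of fallbacks before ingestion. This sketch is generic Python, not a platform utility; latin-1 never raises, so it serves as a last resort for legacy files:

```python
def decode_with_fallback(data: bytes,
                         encodings=("utf-8", "cp1252", "latin-1")) -> str:
    """Try each encoding in order; return the first successful decode."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input with any known encoding")
```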
Best Practices
- For updating documents, re-ingest rather than modifying vectors directly.
- Use labels/tags for grouping multiple related files.
- Test chunking effect on semantic search; sometimes domain-specific splitting logic is optimal.
- Periodically backup all imported files and ingestion logs for reproducibility.
The File Loader node makes it simple to turn any document collection into a living, searchable AI knowledge base—enabling RAG and custom Q&A inside your organization.