Extract from File Node
Extract text content from documents and files.
Overview
The Extract from File node reads and extracts text content from various file types including PDFs, Word documents, text files, and more. It converts document content into text that can be processed by other nodes.
Configuration
| Field | Description | Required |
|---|---|---|
| Files Path | Path or URL to the file(s) to extract from | Yes |
| Output Variable | Variable name to store the extracted text | Yes |
Supported File Types
- PDF documents (.pdf)
- Word documents (.docx, .doc)
- Text files (.txt)
- Markdown files (.md)
- CSV files (.csv)
- And more...
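If you want to check a file before it reaches the node, a small pre-filter on the extension is one option. The sketch below is only an illustration: `LISTED_EXTENSIONS` and `is_listed_type` are hypothetical names, and the set covers only the extensions named on this page, not the node's full list of supported types.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Only the extensions named on this page; the node itself supports more.
LISTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".csv"}


def is_listed_type(path_or_url: str) -> bool:
    """Return True if a POSIX-style path or URL ends in a listed extension."""
    # For a URL, .path drops the query string; a plain relative or
    # absolute POSIX path comes back unchanged in .path.
    path = urlparse(path_or_url).path
    return PurePosixPath(path).suffix.lower() in LISTED_EXTENSIONS
```

A file that fails this check is not necessarily unsupported; it is simply not one of the types listed here.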
Using Variables
Pass file paths from previous nodes:
{{uploaded_file.url}}
{{attachment_path}}
{{document_url}}
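To make the placeholder syntax concrete, here is a minimal sketch of how a `{{name}}` or `{{name.field}}` reference could be resolved against the outputs of earlier nodes. It is an assumption for illustration only: the platform's actual template engine is not documented here, and `resolve` and the `context` dictionary are hypothetical.

```python
import re

# Matches {{ name }} or {{ name.field }} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(template: str, context: dict) -> str:
    """Replace each {{dotted.name}} with the matching value from context."""

    def lookup(match: re.Match) -> str:
        value = context
        # Walk nested dictionaries one key at a time: "a.b" -> context["a"]["b"].
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical outputs from previous nodes.
context = {
    "uploaded_file": {"url": "https://example.com/report.pdf"},
    "attachment_path": "/tmp/attachment.txt",
}
files_path = resolve("{{uploaded_file.url}}", context)
```

In this sketch an unknown variable raises `KeyError`; how the real node treats a missing variable is not specified on this page.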
Example Use Cases
Process Uploaded Documents
Files Path: {{user_upload.file_url}}
Output Variable: document_content
Extract from Multiple Files
Use with a For Loop to process multiple files:
Files Path: {{current_file.path}}
Output Variable: file_text
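Conceptually, the loop runs the extraction once per item and stores the result under the output variable each time. The sketch below mirrors that flow outside the platform. `extract_from_files` and `read_plain_text` are hypothetical helpers, and the stand-in extractor only reads UTF-8 text files; it does not reproduce the node's handling of PDFs or Word documents.

```python
from pathlib import Path
from typing import Callable


def read_plain_text(path: str) -> str:
    """Stand-in extractor (assumption): reads UTF-8 text files such as .txt or .md."""
    return Path(path).read_text(encoding="utf-8")


def extract_from_files(
    files: list[dict],
    extract: Callable[[str], str] = read_plain_text,
) -> list[dict]:
    """Run the extractor once per file, as a For Loop would."""
    results = []
    for current_file in files:
        # Corresponds to Files Path: {{current_file.path}}
        file_text = extract(current_file["path"])
        # Corresponds to Output Variable: file_text
        results.append({"path": current_file["path"], "file_text": file_text})
    return results
```

Passing the extractor as a parameter keeps the loop logic separate from the file handling, so the same loop works with any extraction function.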
Analyze Document Content
Chain with an LLM node:
- Extract from File → document_content
- LLM Node → Analyze {{document_content}}
Output
The extracted text is stored in your output variable:
{{document_content}}
Differences from OCR Node
| Feature | Extract from File | OCR Node |
|---|---|---|
| Input | Digital documents | Images, scanned docs |
| Method | Text extraction | Optical recognition |
| Use Case | PDFs, Word docs | Screenshots, photos |
| Speed | Faster | Slower |
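A file extension alone does not tell you which node to use, because a scanned PDF contains page images rather than embedded text. One common heuristic, sketched below as an assumption rather than documented platform behavior, is to try text extraction first and route the file to OCR when almost nothing comes back. `needs_ocr` and the `min_chars` threshold are hypothetical.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: near-empty extraction output suggests a scanned document."""
    # Drop all whitespace so blank pages and stray newlines do not count as text.
    visible = "".join(extracted_text.split())
    return len(visible) < min_chars
```

The threshold is arbitrary and would need tuning for your documents; a short but valid file, such as a one-word note, would be flagged by this check.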
Best Practices
- Use for digital documents with embedded text
- For scanned documents or images, use the OCR node instead
- Split large documents into chunks when the full text is too long for a downstream node to process at once
- Combine with LLM nodes for document analysis
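The chunking advice above can be sketched as a fixed-size splitter with a small overlap, so that a sentence cut at a boundary still appears whole in one of the chunks. This is one simple strategy among many, not the platform's own mechanism. `chunk_text` and its default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, sharing overlap characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Stop once the current chunk reaches the end of the text,
        # so no trailing chunk consists only of overlap.
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks
```

Character counts are only a rough proxy for an LLM's token limit, so choose `max_chars` conservatively.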