Extract from File Node

Extract text content from documents and files.

Overview

The Extract from File node reads a file, such as a PDF, Word document, or plain-text file, and extracts its text content into a variable so that other nodes can process it.

Configuration

Field            Description                                  Required
Files Path       Path or URL to the file(s) to extract from   Yes
Output Variable  Variable name to store the extracted text    Yes

Supported File Types

  • PDF documents (.pdf)
  • Word documents (.docx, .doc)
  • Text files (.txt)
  • Markdown files (.md)
  • CSV files (.csv)
  • And more...
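Before wiring a path into the node, a workflow may want to check that the file looks like one of the types above. The Python sketch below does this by extension only; it is not how Ubex detects file types, and the helper names (`file_extension`, `is_listed_type`) are hypothetical. Because the list ends with "And more...", `LISTED_EXTENSIONS` covers only the types named here and may reject formats the node actually accepts.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions named on this page only. The node supports more ("And more..."),
# so treat this set as an assumption to extend, not an authoritative list.
LISTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".csv"}


def file_extension(path_or_url: str) -> str:
    """Return the lowercase extension of a local path or URL.

    For http(s) URLs only the path component is used, so query strings
    and fragments (e.g. '?token=abc', '#top') are ignored.
    """
    parsed = urlparse(path_or_url)
    path = parsed.path if parsed.scheme in ("http", "https") else path_or_url
    return PurePosixPath(path).suffix.lower()


def is_listed_type(path_or_url: str) -> bool:
    """True if the file's extension is one of the types listed above."""
    return file_extension(path_or_url) in LISTED_EXTENSIONS


if __name__ == "__main__":
    print(is_listed_type("https://example.com/files/report.PDF?token=abc"))  # True
    print(is_listed_type("/uploads/photo.png"))  # False
```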

Using Variables

Reference file paths produced by previous nodes using {{variable}} syntax:

{{uploaded_file.url}}
{{attachment_path}}
{{document_url}}
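The `{{...}}` placeholders above are filled in from the outputs of earlier nodes. The following Python sketch shows one plausible way such a dotted reference could resolve, assuming each node's output is a nested dictionary and that an unknown name is an error. Ubex's actual template engine may behave differently (for example, by rendering missing variables as empty strings); `render` and `lookup` are hypothetical names.

```python
import re
from collections.abc import Mapping
from typing import Any

# Matches {{ name }} or {{ name.nested.field }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def lookup(context: Mapping, dotted: str) -> Any:
    """Walk a dotted path such as 'uploaded_file.url' through nested dicts."""
    value: Any = context
    for key in dotted.split("."):
        if not isinstance(value, Mapping) or key not in value:
            raise KeyError(f"unresolved variable: {dotted}")
        value = value[key]
    return value


def render(template: str, context: Mapping) -> str:
    """Replace every {{...}} placeholder with its value from the context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


if __name__ == "__main__":
    # Hypothetical outputs of earlier nodes.
    outputs = {
        "uploaded_file": {"url": "https://example.com/uploads/contract.pdf"},
        "attachment_path": "/tmp/invoice.docx",
    }
    print(render("{{uploaded_file.url}}", outputs))
    print(render("{{attachment_path}}", outputs))
```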

Example Use Cases

Process Uploaded Documents

Files Path: {{user_upload.file_url}}
Output Variable: document_content

Extract from Multiple Files

Place the node inside a For Loop to process multiple files, one per iteration:

Files Path: {{current_file.path}}
Output Variable: file_text
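Conceptually, the For Loop runs the node once per item, binding each item to `current_file` and producing a fresh `file_text`. This Python sketch models that iteration; `extract_each` is a hypothetical helper, and the `extract` callable stands in for the node itself, whose internals are not documented here. Results are kept as an ordered list of (path, text) pairs so duplicate paths are not silently merged.

```python
from typing import Callable, Dict, List, Tuple


def extract_each(
    files: List[Dict[str, str]], extract: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Model a For Loop around Extract from File: one extraction per item.

    Each item is assumed to look like {"path": ...}, mirroring
    {{current_file.path}}; `extract` stands in for the node itself.
    """
    results: List[Tuple[str, str]] = []
    for current_file in files:
        path = current_file["path"]   # corresponds to {{current_file.path}}
        file_text = extract(path)     # corresponds to the Output Variable
        results.append((path, file_text))
    return results


if __name__ == "__main__":
    uploads = [{"path": "/uploads/a.txt"}, {"path": "/uploads/b.pdf"}]
    # A dummy extractor so the sketch runs without real files.
    for path, text in extract_each(uploads, lambda p: f"text of {p}"):
        print(path, "->", text)
```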

Analyze Document Content

Chain with an LLM node:

  1. Extract from File → document_content
  2. LLM Node → Analyze {{document_content}}

Output

The extracted text is stored in the variable named in Output Variable, so later nodes can reference it. For example:

{{document_content}}

Differences from OCR Node

Feature   Extract from File   OCR Node
Input     Digital documents   Images, scanned docs
Method    Text extraction     Optical recognition
Use Case  PDFs, Word docs     Screenshots, photos
Speed     Faster              Slower
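The comparison suggests a simple routing rule: images go to OCR, and a digital document goes to OCR only if text extraction comes back empty or nearly empty, which is what a scanned PDF without a text layer tends to produce. The Python sketch below encodes that rule for a plain file path. The image extension set and the 20-character threshold are assumptions to tune, and `needs_ocr` is a hypothetical helper, not part of either node.

```python
from pathlib import PurePosixPath

# Assumed set of image formats; these carry pixels, not embedded text.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def needs_ocr(path: str, extracted_text: str = "", min_chars: int = 20) -> bool:
    """Return True if the file should go to the OCR node instead.

    min_chars is an arbitrary threshold: scanned PDFs without a text layer
    usually yield empty or near-empty extraction output.
    """
    if PurePosixPath(path).suffix.lower() in IMAGE_EXTENSIONS:
        return True
    return len(extracted_text.strip()) < min_chars


if __name__ == "__main__":
    print(needs_ocr("receipt.JPG"))
    print(needs_ocr("scan.pdf", "   \n"))
    print(needs_ocr("report.pdf", "Quarterly revenue grew in every region."))
```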

Best Practices

  • Use for digital documents with embedded text
  • For scanned documents or images, use the OCR node instead
  • Split the extracted text of large documents into chunks if it is too long for downstream nodes such as LLM nodes
  • Combine with LLM nodes for document analysis
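One common way to chunk a large document is a fixed-size sliding window with overlap, so that text near each boundary appears in two adjacent chunks and the model keeps some shared context. The Python sketch below splits by characters; `chunk_text` is a hypothetical helper, and its 4,000-character size and 200-character overlap are arbitrary defaults, not Ubex settings. LLM context limits are measured in tokens, so leave some margin when choosing a size.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears at the end of one chunk and the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk could then be passed to an LLM node, for example inside a For Loop.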