Web Scraper Node

Extract content from any webpage.

Overview

The Web Scraper node fetches and extracts content from web pages. It supports CSS selectors, multiple output formats, and metadata extraction.

Configuration

| Field | Description | Required |
| --- | --- | --- |
| URL | The webpage URL to scrape (supports variables) | Yes |
| Output Format | Plain Text, HTML, Markdown, or JSON (Structured) | Yes |
| Max Words | Maximum number of words in the output (Plain Text and Markdown only) | No |
| Output Variable | Name of the variable that stores the scraped content | Yes |
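
To illustrate what a word limit does to the output, here is a minimal sketch. It assumes words are whitespace-delimited and that excess words are dropped from the end; how the node actually counts and truncates is not documented here, and `limit_words` is a hypothetical helper, not part of Ubex.

```python
def limit_words(text: str, max_words: int) -> str:
    """Keep at most max_words whitespace-delimited words (assumed behavior)."""
    words = text.split()
    if len(words) <= max_words:
        # Short enough already: return the text unchanged.
        return text
    # Truncated output is rejoined with single spaces.
    return " ".join(words[:max_words])


print(limit_words("The quick brown fox jumps over the lazy dog", 4))
```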

Output Formats

| Format | Description |
| --- | --- |
| Plain Text | Clean text content, stripped of HTML |
| HTML | Raw HTML content |
| Markdown | Content converted to Markdown format |
| JSON (Structured) | Structured data with metadata |

Advanced Options

| Field | Description | Default |
| --- | --- | --- |
| CSS Selector | Target specific elements (e.g., .content, #main) | None |
| Extract Metadata | Include page title, description, etc. | Off |
| Extract Links | Collect all links from the page | Off |
| Extract Images | Collect all image URLs | Off |
| Timeout | Request timeout in milliseconds | 30000 |
| User Agent | Custom user agent string | Default |

CSS Selectors

Target specific page elements:

.article-content    → Elements with class "article-content"
#main-content       → Element with ID "main-content"
article p           → All paragraphs inside article tags
[data-type="post"]  → Elements whose data-type attribute equals "post"

Using Variables in URL

Reference workflow variables in the URL field with double curly braces:

https://example.com/page/{{page_number}}
https://api.site.com/search?q={{search_term}}
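
The substitution shown in these URLs can be sketched as follows. This is an illustration only: `render_url` is a hypothetical helper, and percent-encoding the substituted values is an assumption, since the page does not say whether the node encodes variable values.

```python
import re
from urllib.parse import quote

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_url(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its percent-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value supplied for variable '{name}'")
        # safe="" also encodes "/" so a value cannot change the URL path.
        return quote(str(variables[name]), safe="")

    return PLACEHOLDER.sub(substitute, template)


print(render_url("https://example.com/page/{{page_number}}", {"page_number": 3}))
print(render_url("https://api.site.com/search?q={{search_term}}",
                 {"search_term": "web scraping"}))
```

Raising on a missing variable, rather than leaving the placeholder in place, is a design choice of this sketch: it surfaces a misconfigured workflow before a request is sent to a malformed URL.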

Example Output (JSON format)

{
  "content": "Page content here...",
  "title": "Page Title",
  "description": "Meta description",
  "links": ["https://...", "https://..."],
  "images": ["https://...", "https://..."]
}
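
A downstream step might consume this JSON as sketched below. The field names come from the example above; the sample URLs and the `summarize` helper are hypothetical, and treating `links` and `images` as optional (absent when the matching Extract option is off) is an assumption.

```python
import json
from urllib.parse import urlparse

# Sample payload shaped like the example output; the URLs are placeholders.
raw = """{
  "content": "Page content here...",
  "title": "Page Title",
  "description": "Meta description",
  "links": ["https://example.com/a", "https://other.example.org/b"],
  "images": ["https://example.com/img.png"]
}"""


def summarize(scraped_json: str) -> dict:
    """Parse the scraper output and derive a few summary values."""
    data = json.loads(scraped_json)
    # .get() with defaults tolerates keys that may be missing (assumption).
    links = data.get("links", [])
    return {
        "title": data.get("title", ""),
        "word_count": len(data.get("content", "").split()),
        "link_hosts": sorted({urlparse(url).netloc for url in links}),
        "image_count": len(data.get("images", [])),
    }


print(summarize(raw))
```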