The following tools extract specific types of data from individual web pages or entire websites into a single spreadsheet or file. They can also be programmed to check links, validate HTML, and more. These tools are also known as web harvesters, data scrapers, web scrapers, web crawlers, URL extractors, or web data extraction tools.
80Legs Collects, monitors, and tracks content from websites and can download the HTML for every page on one site or many. The support section has helpful how-to tutorials. One free plan plus three paid options.
Amazon Web Services | Common Crawl Corpus A corpus of web crawl data composed of over 5 billion web pages. Common Crawl is a non-profit organization dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone. The most current crawl data sets include three different types of files: Raw Content, Text Only, and Metadata. No charge.
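For a sense of how the corpus is accessed programmatically, here is a minimal sketch that queries Common Crawl's public CDX index for captures of a single URL. The crawl ID below is an example; current IDs are listed at index.commoncrawl.org.

```python
import json
import urllib.request

# Example crawl ID (an assumption for illustration); the list of
# available crawls is published at https://index.commoncrawl.org/
CRAWL_ID = "CC-MAIN-2023-50"
index_url = ("https://index.commoncrawl.org/" + CRAWL_ID +
             "-index?url=example.com&output=json")

with urllib.request.urlopen(index_url) as resp:
    # The index returns one JSON record per line; each record points
    # at a WARC file, byte offset, and length in the public data set,
    # which can then be fetched with an HTTP range request.
    for line in resp.read().decode().splitlines():
        record = json.loads(line)
        print(record["filename"], record["offset"], record["length"])
```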
Extract URL 1.5 Extracts URLs along with title, description, and keywords metadata from entire websites, a list of URLs, or search engine results. One-time fee.
FMiner Using a visual editor, this tool creates a process “roadmap” for current and future data extraction. It can harvest data from catalogs, classifieds, search engines, and yellow page directories. Pricing varies with the number of licenses purchased; free trials are available and upgrades are free for life. Good video and written tutorials.
Import.io A SaaS tool that pulls data from web pages in bulk and turns it into easy-to-read structured data. It can monitor competitors' pricing and be scheduled to run automatically. Three paid options, an excellent Help Center, and a free trial available. => independent review and how-to
Mozenda A SaaS application for comprehensive web data gathering, management, and publishing. Scrape, store, and manage data in the cloud; scrapes can be automated and data published on a schedule. Offers live online training plus on-demand tutorials. Free trial, three paid plans.
Rob Hammond’s SEO Crawler A fast, flexible real-time SEO website crawler that helps identify technical or architectural SEO issues; runs on mobile phones and tablets. Crawls up to 259 URLs for free and has advanced filters. No charge.
ScrapeBox The end-all-be-all of data scrapers! This tool scrapes search engine results, comments, keywords, email addresses, YouTube data, and more. It also harvests proxies, scans anchor text in backlinks for analysis, turns a list of URLs into an RSS feed, creates sitemaps, and includes a broken link checker. One-time payment.
Scraper (Chrome plugin) If you are comfortable using XPath, this Chrome extension will pull data such as prices and ratings from web pages into spreadsheets. No charge.
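If XPath is new to you, the sketch below shows the general idea using Python's lxml library; the markup and class names are invented for illustration, not taken from the extension.

```python
from lxml import html  # pip install lxml

# Hypothetical product markup, purely for demonstration.
page = html.fromstring("""
<div class="product">
  <span class="price">$19.99</span>
  <span class="rating">4.5</span>
</div>
""")

# XPath expressions select the text of elements matching a pattern.
prices = page.xpath('//span[@class="price"]/text()')
ratings = page.xpath('//span[@class="rating"]/text()')
print(list(zip(prices, ratings)))  # [('$19.99', '4.5')]
```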
Scrapy An application framework for writing web spiders that crawl websites and extract data from them. Written in Python, with simple step-by-step instructions for building your own web crawler and pulling what you want off a web page. Large and active community on Stack Overflow to learn from.
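As a taste of what a Scrapy spider looks like, here is a minimal example adapted from Scrapy's own tutorial; it scrapes the demo site quotes.toscrape.com and follows pagination links.

```python
import scrapy  # pip install scrapy

class QuotesSpider(scrapy.Spider):
    """Minimal spider based on the Scrapy tutorial example."""
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Yield one item per quote block on the page.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "next page" link, if any, and repeat.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
```

Saved as quotes_spider.py, it can be run with `scrapy runspider quotes_spider.py -o quotes.json`.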
Screaming Frog This desktop tool crawls web pages and lists internal and outbound links, titles, H1 tags and their lengths, HTTP status codes, canonical tags, and more. Excellent for finding broken links and redirects; mobile friendly. Free and paid versions, with discounts for multiple licenses. => independent video review and how-to
WebHarvy Scrapes text, URLs, and image data from web pages and saves it in various formats. Can be automated, uses proxy servers, and has a library of how-to videos. Various paid options.
Win Web Crawler A powerful web crawler utility that extracts the URL, title, meta tags, plain text between the <body> tags, page size, and last-modified date from web pages and search results. One-time fee.
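To illustrate the kind of fields such a crawler pulls from each page, here is a rough Python sketch using the requests and BeautifulSoup libraries; the target URL is a placeholder, and a real crawler would add politeness (robots.txt, rate limiting, error handling).

```python
import requests                   # pip install requests
from bs4 import BeautifulSoup     # pip install beautifulsoup4

url = "https://example.com/"      # placeholder target
resp = requests.get(url, timeout=10)
soup = BeautifulSoup(resp.text, "html.parser")

# Collect the same fields this class of tool reports per page.
record = {
    "url": url,
    "title": soup.title.string if soup.title else None,
    "meta": {m.get("name"): m.get("content")
             for m in soup.find_all("meta") if m.get("name")},
    "body_text": soup.body.get_text(" ", strip=True) if soup.body else "",
    "page_size": len(resp.content),
    "last_modified": resp.headers.get("Last-Modified"),
}
print(record)
```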
Xenu Link Sleuth A free app that checks a web page for broken links. A plain HTML report is generated after the tool has finished checking your pages. No charge.
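A link checker of this kind boils down to collecting every anchor on a page and testing each target. The sketch below is a bare-bones Python version using requests and BeautifulSoup, with example.com as a placeholder.

```python
import requests                   # pip install requests
from bs4 import BeautifulSoup     # pip install beautifulsoup4
from urllib.parse import urljoin

page_url = "https://example.com/"  # placeholder page to check
soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")

for a in soup.find_all("a", href=True):
    link = urljoin(page_url, a["href"])  # resolve relative links
    try:
        # Some servers reject HEAD; a fuller checker falls back to GET.
        status = requests.head(link, timeout=10,
                               allow_redirects=True).status_code
    except requests.RequestException:
        status = "error"
    if status != 200:
        print(status, link)
```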
Important Disclosure => This page may contain affiliate links. If you click an affiliate link and make a purchase, this site may receive a small commission at no additional cost to you. Commissions help fund our charity programs and offset the operational costs of running the site.