The Ultimate Guide to SEO Crawlers

Please note that this Ultimate Guide has been updated with 5 new crawlers!

If you want to be a good SEO, you definitely have to use SEO crawlers. They make SEO work much easier and more professional.

For those of you who are not familiar with the term “SEO crawlers,” I will explain it quickly.

An SEO crawler is a tool that goes through every single page on a website and extracts all the necessary information for you. Thanks to SEO crawlers, you no longer have to click through page after page, analyzing titles, headers, canonicals, hreflang tags, internal links, sitemaps, etc. Fair enough? Yes, SEO crawlers do a really important, time-saving job.
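
To make this concrete, here is a minimal sketch of what a crawler does under the hood. It's just an illustration in Python (assuming the requests and BeautifulSoup libraries; the start URL is a placeholder), not how any of the tools below are actually implemented:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=50):
    """Breadth-first crawl of a single domain, collecting basic SEO data."""
    domain = urlparse(start_url).netloc
    queue, seen, results = deque([start_url]), {start_url}, []
    while queue and len(results) < max_pages:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        canonical = soup.find("link", rel="canonical")
        results.append({
            "url": url,
            "status": response.status_code,
            "title": soup.title.get_text(strip=True) if soup.title else None,
            "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
            "canonical": canonical.get("href") if canonical else None,
        })
        # Queue internal links we haven't seen yet
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return results

pages = crawl("https://www.example.com/")  # placeholder domain
```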

There are many SEO crawlers. To name a few:

  • SEOCrawler.io
  • Raven Tools
  • Searchmetrics Crawler
  • IIS Site Analysis Web Crawler (a free tool)
  • Xenu’s Link Sleuth (a free tool)
  • BeamUsUp (a free tool)
  • SEOSpyder by Mobilio Development
  • SEOMator
  • CocoScan

The universe is a big and impressive place, but we live in a limited world. So I was able to test just the first 15, and for that, I would like to apologize. (But trust me, even this consumed a large amount of time!)

Two Types of SEO Crawlers

You should probably know that there are two types of crawlers: desktop and cloud-based.

Desktop SEO crawlers 

These are crawlers that you install on your computer. Examples are Screaming Frog, Sitebulb, Link-Assistant's WebSite Auditor, and Netpeak Spider. They are much cheaper than cloud crawlers, but they have some drawbacks, such as:

  • Crawls consume your computer's memory and CPU. The situation is much better than it used to be, though, as crawlers keep improving their memory and CPU management.
  • You have to use proxies to avoid getting banned.
  • Collaboration is limited. You can’t just share a report with a client/colleague. You can, however, work around this by sending them a file with a crawl project.
  • Unfortunately, desktop crawlers struggle with crawl comparison (Sitebulb is an exception) and scheduling.
  • In general, desktop crawlers are more limited than cloud crawlers.

At Elephate, we run desktop crawls using a server with 8 cores and 32 GB of RAM. Even with a configuration like that, it's common for us to have to stop crawls because we're running out of memory. That's one reason why we use cloud crawlers too.

Cloud SEO crawlers

  • Most cloud-based crawlers are decent in terms of collaboration. Usually, you can grant access to the crawl results to a colleague/client. Some of the cloud crawlers even allow for sharing individual reports.
  • It’s common to get dedicated, live support.
  • For the most part, you can easily notice changes across various crawls.
  • Generally, cloud-based crawlers are more powerful than desktop ones.
  • Typically, they are pretty good in terms of data visualization.
  • Of course, this comes at a cost. Cloud crawlers are much more expensive than desktop ones!

Okay. Let’s get started!

Methodology

What was tested?

Below is the list of features and reports I checked, grouped by category, with a comment on each.

Basic SEO reports

  • List of indexable/non-indexable pages – It's necessary to view a list of indexable/non-indexable pages to make sure there are no mistakes. Maybe some URLs were intended to be indexable?
  • Missing title tags – Meta titles are an important part of SEO audits. A crawler should show you a list of pages with missing title tags.
  • Filtering URLs by status code (3xx, 4xx, 5xx) – When you perform an SEO audit, it's necessary to filter URLs by status code. How many URLs are not found (404)? How many URLs are redirected (301)?
  • List of Hx tags – "Google looks at the Hx headers to understand the structure of the text on a page better." – John Mueller
  • View internal nofollow links – An internal nofollow list allows you to make sure there aren't any mistakes.
  • External links list (outbound external) – A crawler should allow you to analyze both internal and external outbound links.
  • Link rel="next" (to indicate a pagination series) – When you perform an SEO audit, you should check whether pagination series are implemented properly.
  • Hreflang tags – Hreflang tags are the foundation of international SEO, so a crawler should recognize them to let you spot hreflang-related issues.
  • Canonical tags – Every SEO crawler should report canonical tags to let you spot indexing issues.
  • Crawl depth (number of clicks from the homepage) – Crawl depth gives you an overview of the structure of your website. If an important page isn't accessible within a few clicks from the homepage, it may indicate poor website structure.

Content analysis

  • List of empty/thin pages – A large number of thin pages can negatively affect your SEO efforts. A crawler should report them.
  • Duplicate content reports – A crawler should give you at least basic information on duplicates across your website.

Convenience

  • A detailed report for a given URL – It's a must-have! After a crawl, you may want to see the internal links pointing to a particular URL, its headers, canonical tags, etc.
  • Advanced URL filtering for reports, using regular expressions and modifiers like "contains," "starts with," "ends with" – I can't imagine my SEO life without a feature like this. It's common that I need to see only URLs that end with ".html" or contain a product ID. A crawler must allow for filtering (see the sketch after this list).
  • Page categorizing – Some crawlers can categorize crawled pages (blog, product pages, etc.) and show reports dedicated to specific categories of pages.
  • Adding additional columns to a report – This is also a crucial feature. When I view a single report, I want to add additional columns to get the most out of the data. Fortunately, most crawlers allow this.
  • Filtering URLs by type (HTML, CSS, JS, PDF, etc.) – Crawlers visit resources of various types (HTML, PDF, JPG), but usually you want to review only HTML files. A crawler should support this.
  • Overview: a list of all issues on a single dashboard – It's a plus if a crawler lists all the detected issues on a single dashboard. Of course, it will not do the job for you, but it can make SEO audits easier and more efficient.
  • Comparing to a previous crawl – When you work on a website for a long time, it's important to compare crawls done before and after any changes.
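
To illustrate the kind of filtering I mean, here is a toy example, assuming a crawl exported to a CSV file with a "url" column (the file name and patterns are made up):

```python
import pandas as pd

# Hypothetical crawl export with a "url" column
df = pd.read_csv("crawl_export.csv")

ends_with_html = df[df["url"].str.endswith(".html")]
contains_product_id = df[df["url"].str.contains(r"/product/\d+", regex=True)]
starts_with_blog = df[df["url"].str.startswith("https://www.example.com/blog/")]
```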

Crawl settings

  • List mode (crawl just the listed URLs) – Helpful when you want to perform a quick crawl or audit of a specified set of URLs without crawling the whole website.
  • Changing the user agent – Sometimes it's necessary to change the user agent, for example, if a website blocks Ahrefs; this way you can still perform a crawl. Also, more websites detect Googlebot by its user agent and serve it a pre-rendered version instead of the fully-equipped JS version (a sketch of this follows the list).
  • Crawl speed adjustment – You should be able to set a crawl speed, e.g., 1-3 URLs per second if a website can't handle the host load, while you may want to crawl much faster if a website is healthy.
  • Limiting the crawl (crawl depth, max number of URLs) – Many websites have millions of URLs. It may be better to limit the crawl depth or specify a maximum number of URLs.
  • Analyzing a domain protected by an htaccess login – Helpful if you want to crawl a staging website.
  • Excluding particular subdomains, including only specific directories.
  • Universal crawl – a single crawl combining the regular spider, list mode, and sitemap URLs.
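
The user-agent trick is easy to reproduce yourself. A minimal sketch with the requests library, using Googlebot's publicly documented user-agent string (the URL is a placeholder):

```python
import requests

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

url = "https://www.example.com/"  # placeholder

# Fetch the same page as a generic browser and as "Googlebot"
as_browser = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
as_googlebot = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA})

# If the site serves a pre-rendered version to Googlebot, the two
# responses may differ noticeably in size and content.
print(len(as_browser.text), len(as_googlebot.text))
```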

Maintenance

  • Crawl scheduling – It's handy to be able to schedule a crawl and set monthly/weekly crawls.
  • Indicating crawl progress – If you deal with big websites, you should be able to see the current status of a crawl. Will you wait a few hours, or weeks, until a 1M+ crawl is finished?
  • Robots.txt monitoring – Accidental changes in robots.txt can prevent Google from reading and indexing your content. It's beneficial if a crawler detects changes in robots.txt and informs you.
  • Crawl data retention – It's helpful if a crawler can store results for a long period of time.
  • Notifications when a crawl is finished – A crawler should inform you when a crawl is done (desktop notification/email).

Advanced SEO reports

  • List of pages with fewer than x incoming links – If there are no internal links pointing to a page, the page is probably irrelevant to Google. It's crucial to spot orphan URLs.
  • Comparison of URLs found in sitemaps and in a crawl – Sitemaps should contain all the valuable URLs. If some pages are not included in a sitemap, it can cause issues with crawling and indexing by Google. And if a URL is present in a sitemap but can't be reached through a crawl, it may be a signal to Google that the page is not relevant (see the first sketch after this list).
  • Internal PageRank value – Although internal PageRank calculations can't reflect Google's real link graph, it's still an important feature. Imagine you want to see the most important URLs based on links: you should sort URLs not only by simple metrics like the number of inlinks, but also by internal PageRank (the second sketch below shows a toy computation). You think Google doesn't use PageRank anymore? See http://www.seobythesea.com/2018/04/pagerank-updated/
  • Mobile audit – With mobile-first indexing, it's necessary to perform a content parity audit between the mobile and desktop versions of your website.
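
First, a sketch of the sitemap-versus-crawl comparison, assuming a standard XML sitemap and a set of crawled URLs you already have (the sitemap URL is a placeholder):

```python
import requests
from xml.etree import ElementTree

def sitemap_urls(sitemap_url):
    """Extract <loc> entries from a standard XML sitemap."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ElementTree.fromstring(requests.get(sitemap_url, timeout=10).content)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", ns)}

crawled = set()  # fill with the URLs your crawler actually reached
in_sitemap = sitemap_urls("https://www.example.com/sitemap.xml")

orphan_candidates = in_sitemap - crawled      # in the sitemap, unreachable by crawl
missing_from_sitemap = crawled - in_sitemap   # crawlable, but absent from the sitemap
```

And second, a toy internal PageRank computation with the networkx library; the link graph here is made up, and real crawlers obviously run this over millions of edges:

```python
import networkx as nx

# Hypothetical internal link graph: (source page, target page)
edges = [
    ("/", "/blog"), ("/", "/products"),
    ("/blog", "/blog/post-1"), ("/blog", "/products"),
    ("/products", "/products/item-1"),
    ("/blog/post-1", "/products/item-1"),
]

graph = nx.DiGraph(edges)
scores = nx.pagerank(graph, alpha=0.85)  # 0.85 is the classic damping factor

for url, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {url}")
```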

Additional SEO reports

  • List of malformed URLs.
  • List of URLs with parameters.
  • Redirect chains report – Nobody likes redirect chains. Not users, not search engines. A crawler should report any redirect chains to let you decide if they're worth fixing.
  • Website speed reports – Performance is becoming more important both for users and for SEO, so crawlers should present reports related to performance.
  • List of URLs blocked by robots.txt – It happens that a webmaster mistakenly prevents Google from crawling a particular set of pages. As an SEO, you should review the list of URLs blocked by robots.txt to make sure there are no mistakes (a quick check is sketched after this list).
  • Schema.org detection.
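
You can spot-check robots.txt rules yourself with Python's standard library; a minimal sketch (the URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

for url in ["https://www.example.com/products/",
            "https://www.example.com/private/"]:
    if parser.can_fetch("Googlebot", url):
        print(url, "-> allowed")
    else:
        print(url, "-> blocked by robots.txt")
```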

Export, sharing

  • Exporting to Excel/CSV – Sometimes a crawler has no power here and you need to export the data and edit it in Excel or other tools.
  • Creating custom reports/dashboards.
  • Sharing individual reports – Let's say you want to share a report related to 404s with your developers. Does the crawler support it?
  • Granting access to a crawl to another person – It's pretty common that two or more people work on the same SEO audit. Thanks to crawl sharing, you can work simultaneously.

Miscellaneous

  • Explanation of the issues (why and how to fix) – If you are new to SEO, you will appreciate the explanations of issues that many crawlers provide.
  • Custom extraction – A crawler should let you perform custom extraction to enrich your crawl. For instance, while auditing an e-commerce website, you should be able to scrape information about product availability and price (see the sketch after this list).
  • Detecting the unique part of a page – It's valuable if a crawler lets you analyze only the unique part of a page (excluding navigation links, sidebars, and the footer).
  • Integration with other tools – It's helpful if a crawler integrates with external tools such as Google Analytics, Google Search Console, backlink tools (Ahrefs, Majestic SEO), and server logs.
  • JavaScript rendering – JavaScript is more and more popular. If your website depends heavily on JavaScript, it's a good idea to use a crawler that supports JS.
  • Why users should use the crawler – Here, I quote direct statements from the crawlers' representatives.
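
As an illustration of custom extraction, here is a sketch that scrapes price and availability from a product page. The CSS selectors are invented; every shop's markup is different, which is exactly why crawlers let you configure the extraction yourself:

```python
import requests
from bs4 import BeautifulSoup

def extract_product_data(url):
    """Scrape price and availability from a product page (hypothetical selectors)."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    price = soup.select_one("span.price")           # made-up selector
    availability = soup.select_one("p.stock-info")  # made-up selector
    return {
        "url": url,
        "price": price.get_text(strip=True) if price else None,
        "availability": availability.get_text(strip=True) if availability else None,
    }
```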

Desktop Crawlers

Let’s start with the desktop crawlers (Screaming Frog, Sitebulb, WebSite Auditor, Netpeak Spider).

Screaming Frog

Main competitors: WebSite Auditor, Sitebulb, Netpeak Spider

It’s the most popular crawler around. It’s also the cheapest (which doesn’t mean it’s the worst!). It costs £149.00 per year for a single license.

It checks for virtually every necessary aspect of SEO: canonicals, status codes, titles, headers, etc. It’s a very customizable tool – there are tons of options you can configure.

It’s apparent that Screaming Frog is also up to date with the most recent trends. It offers JavaScript crawling, and you can integrate with Google Analytics and Google Search Console.
But there’s one thing that Screaming Frog is definitely behind in: data visualization. It’s often necessary to export Screaming Frog data to an Excel or CSV file to get the most out of it. Sitebulb is much better in this area.

But still, many professional SEOs claim that even if they work with powerful and expensive cloud tools, they really like Screaming Frog.
Same here. Every day I work with crawlers like Ryte and DeepCrawl, but I continue to use Screaming Frog/Sitebulb because there are many areas where SF (or other desktop crawlers) fits better.

  • When I need to see the screenshot of a rendered view, I use SF (currently, it’s the only tool that supports this feature).
  • If I want to start a quick crawl with real-time preview, I use Screaming Frog.
  • When I am running out of credits in a cloud tool, I simply use a desktop crawler like Screaming Frog, WebSite Auditor, or Sitebulb.
  • For now, Screaming Frog and Sitebulb are better at spotting redirect chains than most of the premium tools.

Recently, Screaming Frog released the 10th version of their software, which has brought many benefits. To name a few:

  1. Structure visualizations
  2. Crawling XML sitemaps
  3. Calculating internal PageRank
  4. Full command-line interface to manage crawls
  5. Reporting canonical chains
  6. AMP crawling & validation
  7. Scheduling. You can schedule crawls (daily/weekly/monthly) and set up auto-exporting. It’s a big step forward, but I am still looking forward to the ability to easily compare data between crawls.

Tip: when you do a crawl, don’t forget to enable a post-crawl analysis, which will allow you to get the most out of the data.

As I mentioned earlier, Screaming Frog now offers visualization of links. You can choose one of two types of visualizations (crawl tree and directory tree). Both are valuable for SEO audits. The first can show you groups of pages and how they are connected. The latter can help you understand the structure of URLs on a website.

Pricing: £149.00 per year for a single license.

Checklist for Screaming Frog.

Sitebulb

Main competitors: Screaming Frog, WebSite Auditor, Netpeak Spider

Sitebulb is a relatively new tool on the market, but it has been warmly received by the SEO community. Personally, I really like Sitebulb’s visualizations:

Because Sitebulb is desktop software, you can’t just share a report with your colleagues while doing an SEO audit. You can partially work around this by exporting a report to PDF. Once you click on the “Export” button, you will get a 40-page document, full of charts, presenting the most important insights.

Sitebulb’s pricing strategy

Although a single license is more expensive than Screaming Frog’s, Sitebulb has a nice pricing strategy: every additional license costs you only 10% of the full price. Assuming both you and your colleague have Sitebulb installed on your personal computers, you can work on the crawl at the same time. Here is a guide on how to copy crawls across Sitebulb instances: https://sitebulb.com/documentation/audits-projects/importing-exporting-audits/.

Crawl maps

There is a really interesting feature of Sitebulb: crawl maps. These can help you understand your website structure, discover internal link flow, and spot groups of orphan pages.

The second version of Sitebulb (released in April 2018) brought many interesting features:

  • Statistics like First Meaningful Paint (helpful for website speed optimization)
  • List mode (like in Screaming Frog)
  • Code coverage report (unused CSS, JS)
  • Multi-level filtering, like in Ryte, Botify, OnCrawl, and DeepCrawl.
  • AMP validation.
  • Sitebulb is the only desktop crawler that can compare data between crawls.

Sitebulb integrates with Google Analytics and Google Search Console.

Although Sitebulb does a great job with data visualization and offers many interesting features, I have to point out the drawbacks.

  • Unfortunately, you can’t set up custom extraction. Other tools support this feature.
  • Sitebulb doesn’t report on H2 tags.
  • As a Big Data fan, I am not happy that you can’t export all internal links to a CSV/Excel file. Screaming Frog offers this. However, I can see summaries and visualizations; that’s enough for more than 95% of SEOs.
  • If Sitebulb encounters an error while retrieving a page, the page will not be recrawled.
  • You can run only one crawl at a time; other crawls are added to the queue.

I believe in the case of Sitebulb the pros outweigh the cons.
By the way, you can suggest your own ideas by submitting them through https://features.sitebulb.com/. It seems many interesting features, like crawl scheduling and data scraping, are going to be implemented. I’m keeping my fingers crossed for the project.
Pricing:

By visiting https://sitebulb.com/elephate you can get an exclusive offer, a 60-day free trial.
Checklist for Sitebulb.

WebSite Auditor

WebSite Auditor reports on SEO essentials like status codes, click depth, incoming/outgoing links, redirects, 404 pages, word count, canonicals, and pages restricted from indexing. It integrates with Google Search Console and Google Analytics.

As with Screaming Frog, for every URL you can see a list of inlinks (including their anchors and source). Also, you can easily export them in bulk.

Website structure visualization

Similarly to Sitebulb, with WebSite Auditor you can visualize the internal structure by click depth, internal PageRank, and pageviews (available through integration with Google Analytics).

Sitebulb, FandangoSEO, and WebSite Auditor are the only crawlers on the market that are capable of doing this.

Content analysis
WebSite Auditor provides a module dedicated to basic content analysis. It checks if targeted keywords are present in the title, body, and headers. In addition, WebSite Auditor evaluates TF-IDF (term frequency-inverse document frequency). If you’re not sure what this is, you can read Bartosz Góralewicz’s article The TF*IDF Algorithm Explained.
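
WebSite Auditor’s exact implementation isn’t public, but if you just want a feel for the metric itself, here is a minimal example with scikit-learn (the documents are made up). Terms that are frequent in one document but rare across the corpus score highest:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "blue suede shoes for men",
    "running shoes and trail shoes",
    "leather boots for winter",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)

# TF-IDF scores of each term in the first document
for term, idx in vectorizer.vocabulary_.items():
    score = matrix[0, idx]
    if score > 0:
        print(f"{term}: {score:.3f}")
```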

WebSite Auditor’s unique function is the ability to look into Google’s index to find orphan pages.

To do this, you have to tick the “Search for orphan pages” option while setting up a crawl.

Now, it’s time to point out WebSite Auditor’s main drawbacks:

  • You can’t limit the number of URLs to be crawled; however, you can specify a maximum crawl depth
  • Information on hreflang, link rel & Hx tags is available only if you open a detailed report for a given URL
  • You can’t compare the data between different crawls
  • Although WebSite Auditor supports advanced filtering for reports, it doesn’t support regular expressions

WebSite Auditor offers three versions: Free (allows for crawling up to 500 URLs), Pro, and Enterprise. You can compare the differences here: https://www.link-assistant.com/website-auditor/comparison.html

If you use our referral links at WebSite Auditor Enterprise or WebSite Auditor Professional, you will get 10% off at checkout.

Checklist for WebSite Auditor.

Updated: Netpeak Spider

Netpeak Spider was not written up in the initial release of the Ultimate Guide to SEO Crawlers; however, the list of improvements introduced in the recently released v3.0 is quite impressive, so I just had to test it.

Speed improvements

First of all, according to Netpeak’s representatives, Netpeak Spider 3.0 consumes ~4 times less memory when compared to the previous (2.1) version.

Other improvements introduced in Netpeak Spider 3.0 include:

  • You can use a custom segmentation (I’ll explain this later)
  • You can pause a crawl and resume it later or run it on another computer. For instance, if you see a crawl consumes too much RAM, you can pause it and move the files to a machine with a bigger capacity.
  • You can rescan a list of URLs to see if any issues were fixed correctly
  • Netpeak Spider added a dashboard that shows the most important insights
  • You can remove URLs from a report or rescan them.

If you want to read more about the recent updates introduced in Netpeak Spider 3.0, here you go.

Custom Segmentation

Let’s start with the most important improvement (at least from my point of view) – data segmentation. Netpeak Spider is so far the only desktop crawler that has implemented it.

What is this?

This is a feature that lets you quickly define some segments (clusters of pages) and see reports related to these segments only.

Custom segmentation is definitely a great feature, however, I miss the ability to see a segment overview report like those offered by Botify, FandangoSEO, and OnCrawl.

In the screenshot from FandangoSEO below, you can see the pagetype breakdown when viewing the dashboard, which provides a great overview of segments.  

In the past, navigating through the list of issues was difficult, but now it’s much easier thanks to the tree view.

It’s time to move to the drawbacks of Netpeak Spider:

  • You can’t integrate with Google Analytics or Google Search Console (although it’s planned for NetPeak Spider 3.1)
  • Although the latest version introduced a visual Dashboard (which is fine), it still struggles with data visualization. I hope they will catch up shortly.
  • If you’re a Linux user, you can’t use Netpeak Spider. For now, it’s available for Windows and Mac OS, however, according to their website, a Linux version is coming soon.

If you buy Netpeak Spider for 12 months, it costs $9.80 per month per single license.

Go to our affiliate link and use the promo code: ca480e7f to get a 10% discount for one year on purchasing Netpeak Spider and Netpeak Checker!


Cloud Crawlers

Let’s move on to the cloud crawlers: DeepCrawl, OnCrawl, Ryte, and Botify.  

Disclaimer: for everyday routines, we use DeepCrawl and Ryte. We did our best to be as unbiased as possible. The crawlers are presented alphabetically.

Botify

Main competitors: Ryte, OnCrawl, DeepCrawl, Jet Octopus, Audisto, FandangoSEO, ContentKing

Botify is an enterprise-level crawler. Its client list is impressive: Airbnb, Zalando, Gumtree, Dailymotion. Botify offers many interesting features. I think it’s the most complex, but also the most expensive, of all the crawlers listed.

I noticed one disadvantage of Botify – it doesn’t offer a list of SEO issues on a single dashboard. If you open Ryte, Sitebulb, or DeepCrawl, you will see all the detected SEO issues (Internal Nofollow links, indexable pages with long click path, pages marked as “noindex, nofollow”) listed on one dashboard.

It’s my feeling that their developers will introduce this feature shortly. If they do, I will update this article.

Botify has the ability to filter reports and dashboards by segments:

Let’s imagine you have three sections on your website: /blog, /products, and /news. Using Botify, you can easily filter reports to see data related only to product pages.

Botify provides some reporting divided by groups. A few examples are presented below:

There is another useful feature on Botify that other crawlers simply miss. For every filter, you can see a dedicated chart (there are 35 charts in the library across several categories).

This is pretty impressive. See the screencast I recorded. http://take.ms/TPCUi

Also, you can install the Botify addon for Chrome and see insights directly from the browser. Just navigate to a particular subpage of a crawled website and you will see:

  • Basic crawl stats
  • A sample of internal inlinks
  • URLs with duplicated metadata (description, H1 tags)
  • URLs with duplicated content

Botify stores HTML code for every crawled page. It allows for checking content changes across crawls.

Botify allows for server log analysis and JavaScript crawling; however, like in the case of OnCrawl, it’s not included in the basic subscription plan.

Botify offers a helpful knowledge base, webinars, and videos illustrating how to use their features.

Checklist for Botify.

DeepCrawl

Main competitors: Ryte, OnCrawl, Botify, Jet Octopus, Audisto, FandangoSEO, ContentKing

DeepCrawl is a popular, cloud-based crawler. At Elephate, we use it during our normal routines (along with Ryte and Screaming Frog).

We really like this tool, but one of the biggest drawbacks of DeepCrawl is that you can’t just add additional columns to a report.

Let’s say I am viewing a report dedicated to status codes and then I would like to see some additional data: canonical tags. I simply can’t do it in DeepCrawl. If I want to see canonicals, I have to switch to the canonical report. For me, it’s an important feature. However, I am pretty sure they will catch up shortly. And if they do, I will update the article…

I do believe that in the case of DeepCrawl, the pros outweigh the cons. There are plenty of interesting features of DeepCrawl:

  • JavaScript rendering
  • Logfile integration
  • Integration with Majestic SEO
  • Integration with Zapier
  • Stealth mode (the user agent and IP address are randomized within a crawl; helpful for crawling websites with a restrictive crawling policy).
  • Integration with Google Search Console and Google Analytics
  • Crawl scheduling

I mentioned above that DeepCrawl integrates with Majestic SEO. Furthermore, you don’t need to have a Majestic account to use it. Nice!

DeepCrawl offers a few plans:  

DeepCrawl has offered a discount voucher code to get 10% off any annual package by using the code: ELEPHATE

Checklist for DeepCrawl.

OnCrawl

Main competitors: DeepCrawl, Botify, Ryte, Jet Octopus, FandangoSEO, Audisto, ContentKing

OnCrawl is a cloud-based tool. While this tool is suited for bigger companies, it offers a starter plan at a reasonable price: you can crawl up to 10k URLs per month (up to 5 projects) for 10 euros per month.

A lot of SEOs appreciate OnCrawl because of the near-duplicate detection feature – you can filter the list of URLs by a similarity ratio.
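
OnCrawl doesn’t disclose its exact algorithm, but as a rough illustration, here is one common way to compute a similarity ratio between two pages: word shingles plus the Jaccard index:

```python
def shingles(text, size=3):
    """Split text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity_ratio(text_a, text_b, size=3):
    """Jaccard similarity of the two pages' shingle sets (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Pages above some threshold (say 0.8) would be flagged as near-duplicates
print(similarity_ratio("red shoes for sale in our shop today",
                       "red shoes for sale in our store today"))
```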

There is another great feature of OnCrawl that other crawlers miss – you can integrate OnCrawl with any data. Just upload a CSV file with any data you want and make sure that your CSV contains the common field: “URL” and the sky’s the limit. Note: Botify offers a similar feature for some of their clients, but they don’t do it at scale, and recently FandangoSEO added such a feature.
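
The mechanics on your side are simple. A sketch with pandas, assuming both files share a “url” column (the file names are made up):

```python
import pandas as pd

crawl = pd.read_csv("oncrawl_export.csv")    # crawl data keyed by "url"
revenue = pd.read_csv("revenue_by_url.csv")  # any external data keyed by "url"

# Join the external data onto the crawl; URLs without a match get NaN
enriched = crawl.merge(revenue, on="url", how="left")
```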

I like OnCrawl for its URL segmentation. Let’s say you view a list of non-indexed URLs. Then, you can quickly switch URL segmentation to see only the blog or product pages.  

OnCrawl gives you interesting reports regarding your page groups:

It also provides an overview of the link flow between page groups:

OnCrawl integrates with Google Analytics and Google Search Console. As with every cloud-based crawler, it allows for crawl scheduling.

OnCrawl provides some pre-defined SEO reports, but its power is in its flexibility. You can create your own dashboards. Go to Tools -> Dashboard builder and click on the category you are interested in. As of 2nd May, there are 24 categories to choose from. Examples are Status codes, Indexability, Inlinks, Orphan pages, etc.

You can easily add or remove charts to a custom dashboard. OnCrawl provides a library of charts to choose from and drag and drop to a specified, custom dashboard.

If you ask me about OnCrawl’s drawbacks, features like the redirect chain report and crawling only a specified set of URLs are not enabled by default (you have to contact OnCrawl’s Customer Support team and they will enable it for your crawl). It also lacks the ability to filter crawled URLs by regular expressions.

By default, OnCrawl doesn’t provide the list of detected SEO issues. You can work around this by clicking on the Dashboard builder -> Onsite Issues.

While using OnCrawl, I had UX issues with finding particular reports/dashboards. They are there (OnCrawl is quite a powerful crawler), but they can be difficult to dig up.

OnCrawl’s price depends on whether you want to use the log file analysis feature.

Price List (without log file analyzing):

Price List (with log file analyzing feature):

OnCrawl has created a unique coupon for Elephate readers: “ElephateTR2018”. This coupon will give users a 15% discount on any subscription and is valid until December 31, 2018.
Checklist for OnCrawl.

Ryte

Main competitors: DeepCrawl, OnCrawl, Botify, Audisto, JetOctopus, FandangoSEO, ContentKing

Ryte is another popular web-based crawler. We use it during our everyday routine (along with DeepCrawl and Screaming Frog).

Good to see that we are listed on their partner’s list!

I really like the reports generated by Ryte. On a single dashboard, I can see a list of all the detected SEO issues. Then I can click to see the detailed view and decide if it’s a real issue or if Ryte just wants to draw my attention to something. Of course, this report can’t replace human intervention, but it’s great having such a feature available. As with its main competitors, Ryte integrates with Google Search Console and Google Analytics.

Ryte’s unique function is uptime server monitoring (it pings your server from time to time to ensure the server works well).

Another interesting function is the robots.txt monitoring. Ryte detects if you change robots.txt and lets you review the history. What is more, Ryte has a comfortable credits policy – if you want to re-run an active crawl, they will not charge you for it.

Ok, let’s move on to the drawbacks. I really like the JavaScript crawling feature, and currently, Ryte doesn’t support it.

I commonly deal with big crawls of 500k or even 1M+ URLs, and sometimes I need to export particular reports to CSV. Until recently, CSV export was limited to 30k rows.
Fortunately, they recently expanded it, and now it’s possible to export 100k rows.

And if you use their API, the sky is the limit.

To get you onboarded, Ryte offers webinars.

Ryte offers different pricing plans depending on your needs:

Checklist for Ryte.

Updated: Audisto

Audisto is a crawler popular mainly in German-speaking countries.

Using Audisto, you can split lists of hints by category, like Quality, Canonical, Hreflang, or Ranking.

I really like Audisto’s segmentation. You can create URL clusters based on filters and see reports and charts related only to those clusters.

Many crawlers offering this feature require knowledge of regular expressions. Audisto is a bit different: you can define patterns in the same way you define “traditional” filters.

Additionally, you can even add comments when adding a cluster, which may be helpful for future reviews or when many people work on the same crawl.

However, you can’t apply segment filtering for all reports. For instance, you can’t do it for a Duplicate Content report or Hreflang report.

With Audisto you can easily compare two different crawls.

Bot vs User Experience

Audisto has a nice approach to bot vs user experience. They detect if users get a similar experience as Googlebot and even provide a chart to visualize the comparative experience.

Monitoring Issues

For every issue listed in the Hint section (Current monitoring -> Onpage -> Hints) you can see the trendline, which is helpful for tracking SEO issues:

Now, it’s time to point out some disadvantages of Audisto:

  • You can’t add additional columns to a report (however, reports contain a lot of KPIs and this should be improved in the next iteration of their software).
  • The URL filtering is rather basic. However, you can partially work around this by using custom segmentation.
  • Audisto doesn’t offer custom extraction.
  • It doesn’t integrate with Google Analytics, Google Search Console, or server logs. But, of course, you can do custom analyses if you use their API.

Pricing

A package for 5 million URLs costs 320 EUR (~364 USD) per month; 1 million URLs per month costs 150 EUR.

Updated: JetOctopus

JetOctopus is a relatively new tool in the market of cloud crawlers.

They divide issues into six categories:

  1. Indexation
  2. Technical
  3. HTML
  4. Content
  5. Links
  6. Sitemap

It offers nice visualizations. Below are some screenshots from the tool.

Custom Segmentation

JetOctopus allows you to define a new segment, and it’s very easy to do: you just set the proper filter and click “Save segment” – no familiarity with regular expressions needed.

Then, you can filter reports to predefined segments.

For now, JetOctopus doesn’t offer server log analysis, but they are in the process of building a dashboard for it.

Linking Explorer – Discover Anchors and Source of Links

I like their linking explorer (a feature added very recently). I can easily see the most popular anchors of links pointing to a page or group of pages.

Also, it shows the most popular directories linking to a page.

Here’s where page segments come in handy. You can quickly switch segments to see only the stats related to links coming from particular segments (e.g., from blog or product pages).

Now, some of JetOctopus’s drawbacks:

  • For now, it doesn’t integrate with server logs, Google Analytics, or Google Search Console. However, the backend is integrated, and they are currently building a dashboard for it.
  • No custom extraction.
  • No JavaScript crawling.
  • Two different crawls can’t be compared.

Please remember, it’s a relatively new tool on the market, and cheaper than other cloud tools. I hope they will continue to improve.

You can register for a trial and crawl up to 30k URLs, with an unlimited number of projects. If you have a few small websites, you can go for the basic package (up to 100k URLs, an unlimited number of projects). It costs 20 euros (~23 USD) per month.

JetOctopus offers Agency and Premium Packages as well (crawling up to 20 million URLs).

Using the “Elephate” promo code, you can get a 10% discount for Jet Octopus.

Updated: FandangoSEO

FandangoSEO is a Spanish crawler, and the name comes from the lively Spanish dance.

Like many other cloud tools, FandangoSEO offers good visualizations. Some screenshots are presented below:


Integration with Server Logs at no Cost

These days, server log analysis has become an integral part of many SEO analyses.

FandangoSEO integrates with server logs (and like DeepCrawl, you don’t need to pay extra for it). You can upload logs once or periodically (using their interface or FTP).

Defining custom segments

Similarly to Botify, OnCrawl, and JetOctopus, FandangoSEO lets you define custom segments.

Because of this, you can see some reports related to segments.

FandangoSEO requires you to know Regular expressions to define new segments. If you want to learn Regular Expressions, you can read my article on the subject.
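
To show how little regex is actually needed, here are hypothetical segment patterns for a typical shop URL structure (these are my own examples, not FandangoSEO’s syntax):

```python
import re

SEGMENTS = {
    "blog": re.compile(r"^/blog/"),
    "product": re.compile(r"^/product/\d+"),
    "category": re.compile(r"^/category/[\w-]+/?$"),
}

def classify(path):
    """Assign a URL path to the first matching segment."""
    for name, pattern in SEGMENTS.items():
        if pattern.search(path):
            return name
    return "other"

print(classify("/product/123-red-shoes"))  # -> product
print(classify("/blog/seo-tips"))          # -> blog
```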

FandangoSEO Detects Schema.org

FandangoSEO is one of few crawlers that detects Schema.org, so you can easily see URLs with Schema.org implemented.

Crawling Competitor’s Websites

You can compare data between various projects with this software, which makes it possible to crawl your competitor’s website.

Architecture Maps

Similarly to Sitebulb and Website Auditor, you can see the architecture map with FandangoSEO.

Integrate Crawls with any Data

When I initially published this article, I wrote that OnCrawl was the only crawler that is able to enrich your crawls with any data (by importing a CSV file with a common field: URL). And voila! In June, Fandango introduced a similar feature.

I’m glad to see crawlers are improving. Good job, FandangoSEO!
It’s time to point out some disadvantages of FandangoSEO:

  • One of the biggest is that reports can’t be filtered.
  • Additional columns can’t be added to a report.
    • For instance, when viewing a report related to canonicals, information about a number of internal links pointing to a canonicalized page can’t be seen.
    • If there are thousands of canonicalized pages, all you can do is export reports to Excel and do the filtering there.
  • It doesn’t integrate with Google Analytics and Google Search Console

FandangoSEO’s pricing starts at 59 USD per month (150k crawled pages, 10 projects); the Medium package (600k crawled URLs) costs 177 USD per month.

Update: ContentKing

Real-time monitoring, change tracking and alerting

ContentKing is a unique crawler on the market since it is a real-time monitoring tool: it informs you in detail about things like on-page SEO changes, robots.txt changes, indexability issues, and pages that started to redirect, 404, or return server errors.

Alerts are sent out if there are big changes or serious issues. According to ContentKing’s representatives, the internal algorithm takes into account the impact of the changes/issues and the importance of the pages involved and then decides whether or not to send out alerts. That sounds interesting, but I need more time to test it thoroughly.

ContentKing also checks for OpenGraph, TwitterCards, and the presence of tag managers and analytics software such as Google Analytics, Adobe Analytics, and Mouseflow.

Below, you can see some screencasts made by ContentKing.

https://www.contentkingapp.com/wp-content/themes/contentking/videos/change-tracking/video@1x.mp4
https://www.contentkingapp.com/wp-content/themes/contentking/videos/events/video@1x.mp4

How does it work?

By investigating Elephate.com/blog, I was able to see that:

  • on Monday, July 16th, we switched to a CDN;
  • on July 4th, some pages broke and we received an alert for it (the issue has since been resolved);
  • we publish a lot of content.

Most importantly, I can compare any crawled page between any two dates. That’s really impressive and unique on the market.

Now it’s time to point out ContentKing’s disadvantages:

  • Their filtering needs improvement. I need to be able to combine rules when filtering: “URL starts with X, contains Y, but doesn’t contain Z.”  However, ContentKing is already implementing such a feature and it should be ready in September.
  • When viewing a list of issues, I can’t add additional columns (that feature is available only when viewing a full list of crawled pages). So, for instance, while viewing pages with an incorrect page title length, I can’t see information about the title length. (planned for October).
  • Lack of custom extraction (planned for Q4).
  • ContentKing doesn’t execute JavaScript (planned for Q4).

A package for end users covering 50k pages costs 64 USD per month. There are also packages for SEO agencies and enterprises; for agencies, a package for 1 million pages costs 355 USD per month. ContentKing doesn’t charge for recrawls, and it charges only for 2xx pages (not for redirects, pages not found, server errors, or timeouts).
Agency pricing:

You can use our affiliate link by clicking here.

Cloud-based tools at no additional cost?

Do you use SEMrush for competition analysis? Did you know that this tool offers a crawler?

What about Ahrefs? If you use it, you can use their crawler at no additional cost.

Do you recognize Moz Pro?

If you subscribe to it, you can crawl your website for free.

If you use Searchmetrics, you have a crawler for free!

Truth be told, these tools are not as advanced as dedicated cloud-based crawlers like DeepCrawl, Ryte, OnCrawl, or Botify, but if you need to do a basic SEO audit, they should be fine – especially since you don’t have to pay extra.

Ahrefs

Main competitors: Moz, SEMrush

Ahrefs crawler (Site Auditor) is an integral part of Ahrefs Suite, a popular tool for SEOs. A similar situation to Moz – if you subscribe to Ahrefs (it offers you tools like site explorer, content explorer, keywords explorer, rank tracker), you have their crawler for free.

To let you stay focused, Ahrefs lets you easily filter issues by importance (Errors, Warnings, Notices).

For every issue, you can see if it’s new or occurred in the previous crawl too.

Ahrefs’ advantage over the other crawlers in its segment (Moz, SEMrush) is that you can add additional columns to an existing report. Also, in Ahrefs you can see which URLs are present in a sitemap and which are not. It does have some limitations, though. It doesn’t integrate with GSC and GA. Similarly to Moz and SEMrush, you can’t share crawl results with your colleagues, so only one person can work on a crawl at a time.

Lifehack: you can get around this limitation. If you use a single Ahrefs account within your agency, you can work concurrently on a crawl. The risk that you will be logged out is minimal.

If you’re new to SEO, you will find the explanation on the issues provided by Ahrefs helpful.

Depending on the Ahrefs plan you have, you can crawl 10k-2.5 million URLs.

Checklist for Ahrefs. 

Moz

Main competitors: Ahrefs, SEMrush

Let’s start with Moz crawler. It’s an integral part of Moz Pro (Keyword explorer, Rank tracker, Crawler, Open Site Explorer).

I really had to consider how to introduce this crawler. From one point of view, it lacks many functions and features that other crawlers support. But from another, it’s a part of Moz Pro. So if you subscribe to Moz Pro, then you have the crawler for free. In addition to that, Moz crawler provides a few unique features like marking an issue as fixed.

Moz crawler integrates with Google Analytics, but it lacks integration with Google Search Console. However, I need to defend Moz a little bit as its main competitors SEMRush and Ahrefs don’t offer this integration either.

I do, however, appreciate that Moz provides a decent explanation for these issues (written by Moz specialists).

It’s useful that the Moz crawler is integrated with other Moz tools and that you can see parameters like Page Authority and Domain Authority directly from the crawl.  

Other interesting features offered by the Moz crawler are the “Mark as fixed” and “Ignore” features. I think the Moz documentation explains it pretty well (emphasis mine):

The tool is designed to flag all these issues so you can decide whether there’s an opportunity to improve your content. Sometimes you just know that you’ve fixed an issue, or you’ve checked that you’re happy with that page and it’s not something you’re going to fix. You can mark these issues as Fixed or Ignore them in your future crawls.

Unfortunately, there are no reports related to hreflang tags or pagination series, and the URL filtering is rather basic. If you want to perform some analysis related to orphan pages, it’s very limited – you can’t see the list of pages with less than x links incoming. Also, you can’t see which URLs are found in sitemaps but were not crawled.

My opinion: Moz crawler may be enough for basic SEO reports, however, I wouldn’t use it for professional SEO audits. Its main competitors, Ahrefs and SEMRush, are much more advanced.

Checklist for Moz.

SEMrush

SEMrush is a well-known tool for competitor research. Did you know they offer an SEO crawler? If you subscribe to SEMrush, then you have SEMrush crawler at no cost!

SEMrush is quite good at spotting basic SEO issues. When you go to the Issues tab, you will see all the detected SEO issues listed on a single dashboard. SEMrush divides issues by importance (Errors/Warnings/Notices), and for every issue you can see the trend, so you can immediately spot if an issue is new. Like the Moz crawler, SEMrush integrates with Google Analytics.

The main drawback of SEMrush is poor filtering. That’s an area where SEMRush simply must catch up.

Let’s say you want to see no-indexed pages. You go to Site audit -> Issues -> blocked from crawling. Unfortunately, this report shows you not only no-indexed pages but also pages disallowed by robots.txt, and you can’t filter the results.

I really miss the ability to add a column with additional data.  

If you need to create a basic SEO audit for a small website, SEMrush would be fine, but you can’t use it for large websites. The SEMrush crawler only allows for crawling up to 20k URLs per crawl.

Checklist for SEMrush.

As I mentioned before, you have access to a free crawler if you have an active account for Searchmetrics, Ahrefs, Moz, or SEMrush. Check if these tools are enough for your SEO audits. If they are, you can use them and save a lot of money.

I have noticed an emerging trend that many SEO tools are adding an SEO crawler feature to their toolkit. For instance, Clusteric, primarily made for link auditing and competitor analysis, now offers an SEO crawler feature.

Which Crawlers Support JavaScript?

Nowadays, an increasing number of websites use JavaScript. Crawlers try to adapt so they have started supporting JavaScript. The obvious question is: which of the crawlers support JavaScript crawling?

  • DeepCrawl – Yes (included in Corporate plans; for the smaller Starter and Consultant packages, price upon request)
  • Screaming Frog – Yes
  • Sitebulb – Yes
  • Ryte – No
  • Moz – No
  • Ahrefs – Yes (for Advanced and Agency plans)
  • Botify – Yes (not included in basic plans)
  • OnCrawl – Yes (it costs 10x more credits)
  • Searchmetrics – Yes (it costs 2x more credits)
  • WebSite Auditor – Yes
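
Before paying extra for JS crawling, you can check whether a site even needs it by comparing the raw HTML with the rendered DOM. A sketch using Selenium with headless Chrome (a local chromedriver is assumed; the URL is a placeholder):

```python
import requests
from selenium import webdriver

url = "https://www.example.com/"  # placeholder

# 1. The raw HTML, as a non-rendering crawler sees it
raw_html = requests.get(url, timeout=10).text

# 2. The rendered DOM, after the browser has executed JavaScript
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get(url)
rendered_html = driver.page_source
driver.quit()

# A big difference means content/links are injected by JavaScript,
# and a crawler without JS rendering will miss them.
print(len(raw_html), len(rendered_html))
```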

So, Which Crawler is the Best?

I am glad you have survived up to this point!

Before I answer the question, let’s start with a short analogy. Imagine you want to buy a new car. You may ask: which car is the best? I hate to disappoint you. There is no best car in the world. Except, obviously, for the 1967 Ford Mustang.

  • Do you like to feel the wind in your hair? Then buy a cabrio.
  • Do you want to buy a car that your wife will love? Then buy a red one.
  • Do you have a family? Then buy a station wagon.

How much money do you have? Which car company do you trust?

All difficult choices and everything depends on your preferences.

The same thing with crawlers. There is no single best crawler. Everything depends on your needs, expectations, and budget.

My job was to introduce you to the most popular crawlers, and list the features that might be helpful for you.

The Perfect Crawler

  • Can store the crawl data forever (to let you review a crawl after half a year)
  • Has a reasonable price (Screaming Frog, Sitebulb, WebSite Auditor, Netpeak Spider)
  • Should allow for crawling websites that have more than 1 million URLs, if you have the need (many SEO agencies deal with websites of this size)
  • Has website structure visualization (Sitebulb, WebSite Auditor, OnCrawl, FandangoSEO)
  • Provides integration with any data (OnCrawl)
  • Can integrate with server logs, Google Analytics, and Google Search Console
  • Can easily share the crawl with your clients and colleagues (cloud crawlers are typically much better with this)
  • Can show you a list of near-duplicates
  • Groups your pages by categories (Botify, OnCrawl, Jet Octopus, Audisto, FandangoSEO)
  • Can crawl JavaScript websites
  • Allows for exporting data to CSV/Excel even if there are millions of rows to export
  • Provides a list of all detected issues on a single dashboard (Ryte)
  • Can let you see which URLs are orphans (not found in a crawl, but present in sitemaps)
  • Can let you compare two crawls to see if things are going in the right direction
  • Can let you easily add columns with additional data to existing reports
  • Is the one that satisfies your needs!

Disclaimer: Things I was not Able to Test

Although I did my best, I was not able to test everything. Some examples include:

  • Does a crawler have maintenance errors? Does it crash all the time, preventing you from finishing a crawl? Even if I noticed something like that, I couldn’t be sure whether it was constant or just temporary, so I did not mention it.
  • Are the reports provided by a crawler enough for most use cases? Are reports thorough and in-depth?

Now it’s your turn!

My job is over. Now, it’s up to you!

Choose a crawler, see some screenshots on the internet, call for a trial, and test it. Investigate if it fits your needs. Push it to its limits, integrate it with any data you have, and test it. Caution: it’s common that some advanced features are only available with Pro subscriptions. Before purchasing, make sure the plan you’re buying offers you all the features you need.

Have fun and good luck! If you feel I helped you, leave me a note on Twitter.

Preparing content like this consumes a lot of time. If you feel the article was helpful, please use one of these referral links:

Updates, Updates Everywhere!

I will do my best to keep this article up to date. However, all of the SEO crawlers are constantly improving.

If you’re a crawler representative and you have updated your crawler, let me know which cell of the fact table I should change to reflect the new features, and I will be happy to update it. Send me a short screencast as proof.

I would like to say “thank you” to all the crawler representatives that helped me with creating this article.

Published 30 August 2018 by Tomasz Rudzki.