Ultimate Guide to JavaScript SEO

JavaScript SEO is a very hot topic nowadays, as more and more websites use modern JavaScript frameworks and libraries like Angular, React, Vue.js and Polymer. The reality, though, is that SEOs and developers are still at the very beginning of a journey to make modern JS frameworks successful in search, as a lot of JavaScript websites, despite their popularity, are failing in Google. In this article I will try to explain why this might be happening and, when possible, offer some ways to get around it.

What you will learn:

  • How to ensure Google(bot) is able to properly render your website.
  • How to see your website the same way Google sees it.
  • What the most common JavaScript SEO mistakes are.
  • What it means that Google is getting rid of the old AJAX Crawling Scheme.
  • Which one is best: Prerendering, SPA or Isomorphic JavaScript?
  • Whether it’s acceptable to detect Googlebot by user agent and serve it pre-rendered content as plain HTML and CSS.
  • Whether other search engines like Bing are able to render JavaScript.

We have a lot to cover, so get yourself a cup of coffee (or two) and let’s get started.

Can Google Crawl and Render JavaScript?

Since 2014, Google has claimed that they are pretty good at rendering JavaScript websites. However, despite their claims, they have always advised caution regarding this matter. Take a look at this excerpt from “Understanding Web Pages Better” (emphasis mine):

“Sometimes things don’t go perfectly during rendering, which may negatively impact search results for your site . . . Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately . . . Some JavaScript removes content from the page rather than adding, which prevents us from indexing the content.”

There are three factors at play here: 1) crawlability (Google should be able to crawl your website with a proper structure); 2) renderability (Google shouldn’t struggle with rendering your website); and 3) crawl budget (how much time Google will take to crawl and render your website).

CLIENT-SIDE RENDERING VS SERVER-SIDE RENDERING 

As we discuss whether Google can crawl and render JavaScript, we need to address two very important concepts: Server-side rendering and client-side rendering. It’s necessary for every SEO who deals with JavaScript to understand them.

In the traditional approach (server-side rendering), a browser or Googlebot receives an HTML document that completely describes the page. The content copy is already there, so your browser (or Googlebot) just needs to download the CSS and “paint” the content on the screen. Search engines usually do not have any problems with server-side rendered content.

The increasingly popular client-side rendering approach is a little different and search engines sometimes struggle with it. Here, it’s pretty common that at the initial load, a browser or Googlebot gets a blank HTML page (with little to no content copy). Then the magic happens: the JavaScript asynchronously downloads the content copy from the server and updates your screen (and changes the DOM).  

If you have a client-side rendered website, you should ensure Google can crawl and render it properly.   
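One rough way to tell which camp a page falls into is to measure how much readable text the initial HTML already carries before any JavaScript runs. The sketch below is only an illustration: the function name `visibleTextLength` and both sample responses are made up for this example, and the regex-based tag stripping is a simplification, not a real HTML parser.

```javascript
// Rough estimate of how much readable text the initial HTML carries
// before any JavaScript runs. Regex stripping is a simplification.
function visibleTextLength(html) {
  var text = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // drop scripts
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")   // drop styles
    .replace(/<[^>]+>/g, " ")                      // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

// Hypothetical server-side rendered response: the copy is already there.
var serverRendered =
  "<html><body><h1>Red shoes</h1><p>Leather, size 42.</p></body></html>";

// Hypothetical client-side rendered response: an empty shell plus a script.
var clientRendered =
  '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>';

console.log(visibleTextLength(serverRendered) > 0); // true
console.log(visibleTextLength(clientRendered));     // 0
```

A result close to zero suggests the page relies on client-side rendering, so it deserves a closer look with the tools described later in this article.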

JavaScript is very error-sensitive

HTML and JS are totally different regarding error handling. A single error in your JavaScript code can cause Google to be unable to render your page.

Let me quote Mathias Schäfer, the author of the online book Robust JavaScript (emphasis mine):

“The JavaScript parser is not that polite. It has a draconian, unforgiving error handling. If it encounters a character that is not expected in a certain place, it immediately aborts parsing the current script and throws a SyntaxError. So one misplaced character, one slip of the pen can ruin your script.”
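To see how unforgiving the parser is, here is a minimal sketch. The helper name `parseError` and the two sample scripts are assumptions made for this example. It uses `new Function()`, which parses a script body without running it (note that some Content Security Policies block it in browsers).

```javascript
// Returns the name of the error raised while parsing `source`,
// or null if the source parses fine. The code is never executed.
function parseError(source) {
  try {
    new Function(source);
    return null;
  } catch (err) {
    return err.name;
  }
}

var validScript = "var items = [1, 2, 3]; var total = items.length;";
var brokenScript = "var items = [1, 2, 3; var total = items.length;"; // missing ]

console.log(parseError(validScript));  // null
console.log(parseError(brokenScript)); // "SyntaxError"
```

One missing bracket is enough: nothing in the broken script runs, including the parts that were written correctly.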

SOMETIMES THE DEVELOPER MAKES A MISTAKE

You have probably heard about the jsseo.expert experiment that Bartosz Góralewicz, CEO at Elephate, conducted to see whether Google can handle websites built using common JavaScript frameworks.

At the beginning, it turned out that Googlebot couldn’t render Angular 2. This was strange because Angular was created by the Google team, so Bartosz set out to discover how this could happen. I’ll let Bartosz explain the rest (emphasis mine):

“And it turned out that there was an error in Angular 2’s QuickStart, a kind of tutorial for how to set up Angular 2-based projects, which was linked in the official documentation. All that research to discover that the Google Angular team had made a mistake. On April 26, 2017, that mistake was corrected.”

Finally, correcting these errors allowed us to index the Angular 2 test website with all of the content.

This example perfectly illustrates how a single error can prevent Googlebot from rendering a page.

Adding fuel to the fire, the mistake was not made by beginner developers. It was made by the contributors of Angular, the second most popular JavaScript framework.

If you would like to read more on this topic, I strongly recommend reading Bartosz’s article “Everything You Know About JavaScript Indexing is Wrong”.

Still with me here? Good, because I have another great example (I’m sorry, Angular :)).

In December 2017, Google deindexed a few pages of Angular.io (a client-side JavaScript rendered website based on Angular 2+).  Why did this happen? As you might have guessed, a single error in their code made it impossible for Google to render their page and caused a massive de-indexation.

The error has since been fixed.

Here is how Igor Minar from Angular.io explained it (emphasis mine):

“Given that we haven’t changed the problematic code in 8 months and that we experienced significant loss of traffic originating from search engines starting around December 11, 2017, I believe that something has changed in crawlers during this period of time which caused most of the site to be de-indexed, which then resulted in the traffic loss.”

Fixing the rendering error on Angular.io was possible thanks to the experienced team of JavaScript developers and the fact that they had implemented error logging. Fixing the error let the problematic pages get indexed again.
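If your website has no error logging yet, the idea can be sketched in a few lines. This is not how Angular.io does it; it is a minimal illustration under my own assumptions. The function name `createErrorReporter` and the `/log-js-error` endpoint are hypothetical, while `window.onerror` and `navigator.sendBeacon` are standard browser APIs.

```javascript
// Builds a handler that turns an uncaught error into a small report
// and hands it to `send` (for example, a function that posts to your server).
function createErrorReporter(send) {
  return function (message, source, line, column, error) {
    send({
      message: String(message),
      source: source || "",
      line: line || 0,
      column: column || 0,
      stack: error && error.stack ? String(error.stack) : ""
    });
    return false; // keep the browser's default console output
  };
}

// In a browser, register it as the global error handler.
if (typeof window !== "undefined") {
  window.onerror = createErrorReporter(function (report) {
    // "/log-js-error" is a hypothetical endpoint on your own server.
    navigator.sendBeacon("/log-js-error", JSON.stringify(report));
  });
}
```

With reports like these collected on the server, errors that only show up in older browsers are much easier to notice.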

The Complexity of JavaScript Crawling

In the case of traditional HTML, everything is easy and straightforward:

  1. Googlebot downloads an HTML file.
  2. Googlebot extracts the links from the source code and can visit them simultaneously.
  3. Googlebot downloads the CSS files.
  4. Googlebot sends all the downloaded resources to the Indexer (Caffeine).
  5. The indexer (Caffeine) indexes the page.

The whole process is lightning fast.

Things get complicated when it comes to a JavaScript-based website:

  1. Googlebot downloads an HTML file.
  2. Googlebot downloads the CSS and JS files.
  3. Afterwards, Googlebot has to use the Google Web Rendering Service (a part of the Caffeine Indexer) to parse, compile and execute the JS code (IT jargon, sorry :)).
  4. Then WRS fetches the data from external APIs, from the database, etc.
  5. Finally, the indexer can index the content.
  6. Now Google can discover new links and add them to Googlebot’s crawling queue.

The whole process is much more complicated than HTML crawling. The following things should be taken into account:

  • Parsing, compiling and running JS files is very time-consuming.
  • In the case of a JavaScript-rich website, Google has to wait until all the steps are done before it can index the content.
  • The rendering process is not the only thing that is slower. The same applies to discovering new links. With JavaScript-rich websites, it’s common that Google cannot discover new URLs until a page has been rendered.

Now, I would like to illustrate the problem with JavaScript’s complexity. I bet 20-50% of your website users view it on their mobile.

Do you know how long it takes to parse 1 MB of JS on a mobile? According to Sam Saccone from Google: Samsung Galaxy S7 will do it in ~850ms and Nexus 5 in ~1700ms!

After parsing JavaScript, it has to be compiled and executed. Every second counts.

If you want to know more about the crawl budget, I advise you to read Barry Adams’ article “JavaScript and SEO: The Difference Between Crawling and Indexing”. The JavaScript = Inefficiency and Good SEO is Efficiency sections in particular are must-reads for every SEO who deals with JavaScript. And while you’re at it, you can read another article on the subject from Bartosz: “JavaScript vs Crawl Budget: Ready Player One”.

UPDATE: Google officially stated that “The rendering of JavaScript powered websites in Google Search is deferred until Googlebot has resources available to process that content.”

So, all things being equal, it takes much more time to crawl and index JS websites than HTML ones (the rendering process takes a lot of time + your page has to wait in the rendering queue).

There are two waves of indexing. In the first wave, Google indexes the source code of your page (Ctrl + U). However, it’s common for modern JS websites that the source doesn’t contain any content copy. In this case, Google indexes virtually nothing.

Then, when the rendering resources become available, Google renders a page and, finally, it can index the real content and let your website rank in the Google Search Results.

This has strong implications. If you have a large, constantly changing JavaScript website, Google may struggle with crawling and indexing it.

Let’s say you own a real estate website. Usually, real estate agents publish the same advertisement across many websites. If Google indexes the offers on your competitors’ websites within a few hours and your offers get indexed in a week or more, you will lose a lot of money. You will simply be outperformed by your competitors.

What’s the point of Google indexing the n-th page with the same content?

Consider another example: websites advertising used cars. Here, the advertisements can expire within a few days from the publication date.  Assuming Google can index a new offer within 2-3 days, it may be too late…

There’s no point in Google indexing expired content and no point for users to see that content.

Need more examples? Think of websites with job offers, hotel booking, etc.

Google’s Technical Limitations

Let’s take a look at Google’s limitations. You should be aware of these limitations to know why Google can struggle with your JavaScript website.

GOOGLE USES A 3 YEAR OLD BROWSER

I bet you use the most recent version of any browser, right?

But Googlebot doesn’t! It uses Chrome 41 for rendering websites. This browser was released in March 2015. It’s been 3 years! IT (and JavaScript!) has developed so much over that time!

There are many modern features that are simply not accessible to Googlebot.

Some of the major limitations include:

  • Chrome 41 supports modern ES6 JavaScript syntax only partially. For example, it doesn’t support constructions like “let” outside of “strict mode” (I know, IT jargon!).
  • Interfaces like IndexedDB and WebSQL are disabled.
  • Cookies, and local and session storage are cleared across page loads.
  • Again, it’s a 3-year-old browser!

Regarding the technical limitations of Chrome 41, you can see the differences between Chrome 41 and Chrome 64 (the most recent Chrome version at the time of writing this article) here (it may look like something written in Elvish, but your developer will certainly know what is going on :)).

Now that you know Google uses Chrome 41 for rendering, take the time to download this browser and check some of your websites to see if they can be rendered properly. If not, check the console log in Chrome 41 to see what may be the cause. And not to toot my own horn, but I wrote a whole article about this very thing.

WHAT IF I WANT TO USE MODERN JAVASCRIPT FEATURES? 

Of course, you can do it. But it’s not out-of-the-box.


GRACEFUL DEGRADATION, POLYFILLS – SAY WHAT?  

JavaScript has been evolving rapidly, and it is now evolving faster than ever. However, some JavaScript features are simply not implemented in older browsers (coincidentally enough, Chrome 41 is an example of an older browser), and using them there can break rendering. But webmasters can handle this by using graceful degradation.

If you would like to implement some modern features that only a few browsers support, then you should ensure that your website degrades gracefully in an older browser. Remember: Googlebot is definitely not a modern browser. It is 3 years old.

By performing feature detection, you can check at any time whether the browser supports a given feature. If it doesn’t, you should provide a fallback implementation of that feature, called a polyfill.
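Here is a minimal sketch of feature detection combined with a polyfill. I picked `Array.prototype.includes` as the example because it is newer than Chrome 41. The function name `ensureIncludes` is my own, and the fallback is a simplified implementation for illustration; in a real project you would more likely rely on a maintained polyfill library.

```javascript
// Adds a simplified `includes` method to the given prototype,
// but only when the feature is missing (feature detection).
// Returns true if the polyfill was installed.
function ensureIncludes(arrayProto) {
  if (typeof arrayProto.includes === "function") {
    return false; // the browser already supports it, nothing to do
  }
  Object.defineProperty(arrayProto, "includes", {
    configurable: true,
    writable: true,
    value: function (searchElement, fromIndex) {
      var list = Object(this);
      var length = list.length >>> 0;
      var start = fromIndex | 0;
      var i = start >= 0 ? start : Math.max(length + start, 0);
      for (; i < length; i++) {
        var item = list[i];
        // The second condition makes NaN match NaN, as the native method does.
        if (item === searchElement || (item !== item && searchElement !== searchElement)) {
          return true;
        }
      }
      return false;
    }
  });
  return true;
}

ensureIncludes(Array.prototype); // a no-op in browsers that support includes
console.log([1, 2, 3].includes(2)); // true
```

The code is written in ES5 on purpose, so the polyfill itself can run in an old browser.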

Also, if you want your website to be rendered by Google Search, you SHOULD make use of transpiling to ES5 (translating these JavaScript statements which are not understandable by Googlebot to the ones it can understand).

For example, when a transpiler encounters “let x=5” (a statement that many older browsers can’t understand), it translates it to “var x=5” (an expression which is totally understandable by older browsers, including Chrome 41 which is used by Google for rendering – I know I keep repeating this, but it’s important!).
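To make this more concrete, here is the same function written twice. The first version uses ES6 syntax. The second is a hand-written approximation of what a transpiler might produce for ES5, not the literal output of any specific tool. Both function names are made up for this example.

```javascript
// ES6 source, as a developer might write it
// (const, arrow function, template literal).
const greetModern = (name) => `Hello, ${name}!`;

// Roughly what a transpiler emits for an ES5 target:
// var, a regular function and string concatenation.
var greetEs5 = function (name) {
  return "Hello, " + name + "!";
};

console.log(greetModern("Googlebot")); // "Hello, Googlebot!"
console.log(greetEs5("Googlebot"));    // "Hello, Googlebot!"
```

Both versions behave the same, but only the second one uses syntax that an ES5-only browser can parse.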

If you are using modern JavaScript features and you want Google to render your website properly, you should definitely use transpilation to ES5 and polyfills.

I have done my best to explain these concepts, but I understand they may be difficult to grasp at first glance. You can learn more on the subject by reading “Polyfills: everything you ever wanted to know, or maybe a bit less”.

Googlebot Doesn’t Act Like a Real Browser

When you surf the internet, your browser (Chrome, Firefox, Opera, whatever) downloads all the resources (images, scripts, stylesheets) and shows the rendered view to you.

However, Googlebot acts differently than your browser. It aims to crawl the entire internet and grab only the valuable resources.

The World Wide Web is huge though, so Google optimizes its crawlers for performance. This is why Googlebot sometimes doesn’t visit all the pages the webmasters want.

Most importantly, Google algorithms try to detect if a resource is necessary from a rendering point of view. If not, it probably won’t be fetched by Googlebot.

So Google may not pick up some of your JS files because its algorithms decided they are not necessary from a rendering point of view, or simply because of performance issues (i.e. it took too long to execute a script).

[Side note: Tom Anthony noticed some interesting behavior from Googlebot.  When you use the setTimeout JavaScript function, a real browser is instructed to wait a particular amount of time. However, Googlebot doesn’t wait and runs everything immediately. It shouldn’t be astonishing since Google robots have to crawl the whole internet, so they should be optimized for performance. ]

THE 5-SECOND RULE

Although the exact timeout is not specified, it’s said that Google won’t wait for a script longer than 5 seconds. Our experiments confirm this rule.

On the JavaScript SEO group, John Mueller said:

“There’s no specific timeout value that we define, mostly because the time needed for fetching the resources is not deterministic or comparable to a browser (due to caching, server load, etc). 5 seconds is a good thing to aim for, I suspect many sites will find it challenging to get there though 🙂”

So far, I haven’t found any good resources on Google timeouts since it’s extremely difficult to debug. In my opinion, the following factors are taken into consideration:

  • page importance,
  • the current load on Google’s servers,
  • number of URLs in the rendering queue,
  • other advanced heuristics.  

If your website loads really slowly, you can lose a lot:

  • Your users will feel irritated and can leave your website.
  • Google could have issues related to rendering your content.
  • It can slow down the crawling process. If a page is slow, Googlebot can notice its crawlers are slowing down your website and decide to decrease the crawl rate. You can read more about it in Wojtek Murawski’s article “Website Performance and the Crawl Budget”.

Make sure your website is lightweight and your server responds quickly, and make sure the server doesn’t fail when the load increases (use Load Impact to check it). Don’t make Googlebot’s job more difficult than it should be 🙂

A common performance mistake made by developers is placing the code of all components into a single file. If users navigate to a homepage, they really don’t need to download the code used only in the admin area.
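One way around this is to load rarely used code only when it is needed. The helper below is a minimal sketch of that idea. The name `lazy`, the `./admin-panel.js` file and the `adminButton` element are hypothetical, and most frameworks and bundlers ship their own code-splitting mechanisms that you should prefer.

```javascript
// Wraps a loader so the expensive code is requested on first use only.
// Later calls reuse the same pending or finished request.
function lazy(loadModule) {
  var pending = null;
  return function () {
    if (pending === null) {
      pending = loadModule(); // expected to return a Promise
    }
    return pending;
  };
}

// Hypothetical usage with a bundler that supports dynamic import():
//   var loadAdminPanel = lazy(function () { return import("./admin-panel.js"); });
//   adminButton.addEventListener("click", function () {
//     loadAdminPanel().then(function (panel) { panel.open(); });
//   });
```

Visitors to the homepage never pay for the admin code; it is downloaded only after someone actually opens the admin area.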

At the end of the day, you need to find a performance guide specific to your JavaScript framework. Learn what you can do to make it faster. Also, I strongly recommend reading “JavaScript Start-up Performance” by Addy Osmani.

Be Like Google: How to See a Website Like Google Does

If you want to be one with Google and see the internet (and your website especially) the same way Google sees it, there are several ways of doing this:

  1. Use the Google Search Console Fetch and Render tool (obviously!). But don’t rely on it 100%. The real Googlebot can have different timeouts than Fetch and Render.
  2. Use Chrome 41. It’s confirmed that Google uses this browser for rendering. You can download it from ele.ph/chrome41. Using Chrome 41 has many advantages over fetching with Google Search Console:
    • In Chrome 41 you can see the console log. If you see errors in Chrome 41, you can be almost sure Googlebot runs into them too.
    • Fetch and Render doesn’t show you the rendered DOM, but Chrome 41 does. By using Chrome 41 you can check whether Googlebot can see your links, tab content, etc.
  3. Use the Rich Results testing tool. I’m not joking: this tool can show you how Google interpreted your page. It shows you the rendered DOM, which is very useful. Google plans to add the rendered DOM output to their Fetch and Render tool. Since it’s not done yet, John Mueller advised using the Rich Results testing tool for it 🙂 At the moment, it’s unclear whether the Rich Results testing tool follows the same rendering rules as the Google Indexer.
  4. Use the Google Mobile-Friendly Test. It can show you the rendered DOM plus the errors that Google encountered while rendering your page. I asked John Mueller on Twitter and he confirmed it follows the same rendering rules as the Google Web Rendering Service/Chrome 41. Good to know!

I prepared a quick comparison of tools you can use to ensure Google can render your website.

Google Search Console Fetch and Render is Not a Reliable Tool for Checking the Indexer Timeouts

GSC Fetch and Render can only tell you if Google is TECHNICALLY able to render; however, don’t rely on this when it comes to timeouts. I have seen it many times when Google Fetch and Render was able to render a page but the Google indexer was not able to index the content because of the timeouts it encountered.

Let me show you some evidence:

I created a simple experiment. The first JavaScript file included in this page was delayed by 120s. There was no technical option to omit the delay. The Nginx server was instructed to wait 2 minutes.

It turned out that Fetch and Render waited 120s (!) for a script and rendered the page correctly.

But the indexer was less patient…

Here the Google Indexer just omitted the first script (delayed by 120 seconds) and rendered the rest of the page.

As you can see, Google Search Console is a great tool. But you should use it ONLY to check if Google is technically able to render a page. Don’t use it if you want to ensure Google will wait for your scripts.

DON’T ANALYZE GOOGLE CACHE WHEN AUDITING JAVASCRIPT-RICH WEBSITES

Many SEOs used to spot rendering issues by using Google Cache. However, this technique is not valid for JS-rich websites, because Google Cache itself is just the raw initial HTML that Googlebot received from the server (nota bene: this has been confirmed by Google’s John Mueller many times).

What you see when clicking on the cache is how YOUR browser interprets the HTML “collected” by Googlebot. It’s totally unrelated to how Google rendered your page!

If you want to know more about Google Cache, I recommend reading “Why Google Cached Pages Won’t Tell You Anything About Indexed Content”.

USE THE “SITE” COMMAND RATHER THAN GOOGLE CACHE

For the time being, one of the best options for checking if some content is indexed by Google is the “site” command.

To do this, just copy some text fragment from your page and type the following command in Google: site:{your website} “{fragment}”
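If you check many pages this way, a tiny helper can build the queries for you. This is just a convenience sketch: the function names are my own, and the search URL simply follows the standard `q` query parameter.

```javascript
// Builds a site: query with the fragment as an exact-match phrase.
function buildSiteQuery(domain, fragment) {
  var phrase = fragment.replace(/\s+/g, " ").trim().replace(/"/g, "");
  return "site:" + domain + ' "' + phrase + '"';
}

// Turns the query into a URL you can open in an incognito window.
function buildSearchUrl(domain, fragment) {
  return "https://www.google.com/search?q=" +
    encodeURIComponent(buildSiteQuery(domain, fragment));
}

console.log(buildSiteQuery("example.com", "  Leather  shoes "));
// site:example.com "Leather shoes"
console.log(buildSearchUrl("example.com", "Leather shoes"));
// https://www.google.com/search?q=site%3Aexample.com%20%22Leather%20shoes%22
```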

If a snippet with your fragment shows up, that means your content is indexed.

My personal advice: it’s good to perform a “site:” query with a fragment in incognito mode.

I had some cases when editors changed the content and, for some reason, the “site” query still “claimed” the old content was there. After switching to incognito mode in my browser, the “site” query returned the proper results.

“VIEW SOURCE” IS NOT ENOUGH WHEN AUDITING JS WEBSITES  

An HTML file contains just the raw information that the browser parses to build the page. I assume you know what an HTML document is. It contains markup such as paragraphs, images, links, and references to JS and CSS files.

You can see the initial HTML of your page by simply right-clicking -> View page source.

However, by viewing it you will not see any dynamic content (updated by JavaScript).

Instead, you should look at the DOM. You can do it by right-clicking -> Inspect element.

The difference between the initial HTML from the server and the DOM

  • The initial HTML from a server (right click -> View page source) is just a cooking recipe. It provides information about what ingredients you should use to bake a cake. It contains a set of instructions. But it’s not the actual cake.
  • The DOM (right click -> Inspect element) is the actual baking of the cake. In the beginning, it’s just a recipe (an HTML document). Then, after some time, it takes shape, and finally it’s baked (page fully loaded).

Note: If Google fails to render a page, it can index just the initial HTML (which doesn’t contain dynamically updated content). You can find more on this topic by reading Barry Adams’ article “View Source: Why it Still Matters and How to Quickly Compare it to a Rendered DOM”. Barry also gives tips on how you can quickly compare the initial HTML with the DOM.
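A quick do-it-yourself comparison can be sketched as follows. This is not Barry’s method; it is a minimal illustration in which the function name `phrasesAddedByJavaScript` and the sample phrases are my own assumptions.

```javascript
// Lists the phrases that exist only in the rendered DOM, i.e. content
// that is visible only after JavaScript has been executed.
function phrasesAddedByJavaScript(initialHtml, renderedHtml, phrases) {
  return phrases.filter(function (phrase) {
    return renderedHtml.indexOf(phrase) !== -1 &&
      initialHtml.indexOf(phrase) === -1;
  });
}

// In a browser console you could feed it real data like this:
//   fetch(location.href)
//     .then(function (res) { return res.text(); })
//     .then(function (initialHtml) {
//       var rendered = document.documentElement.outerHTML;
//       console.log(phrasesAddedByJavaScript(initialHtml, rendered, ["Your headline"]));
//     });
```

Every phrase the function returns is content that a search engine will miss if rendering fails.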

Common Pitfalls with JS Websites

BLOCKING JS AND CSS FILES FOR GOOGLEBOT

Since Googlebot is able to crawl JavaScript and render the content, you should make sure that any internal and external resources required for rendering are not blocked for Googlebot.

USE GOOGLE SEARCH CONSOLE

My advice is that if you see a significant ranking drop in the case of a robust website, you should check Fetch and Render to see if Google can still render your website.

Generally, it’s good practice to use Fetch and Render on a random sample of URLs from time to time to make sure the website renders properly.


FOCUSING ON THE “ONCLICK” EVENT – GOOGLEBOT DOESN’T CLICK  

Remember that Googlebot is not a real user, so take it for granted that it doesn’t click and doesn’t fill in forms. This has many practical implications:

  • If you have an online store and the content hidden under a “show more” button is not present in the DOM before clicking, it will not be picked up by Google. Important note: This also applies to menu links.
  • All the links should contain the “href” attribute. If you use only the onClick event, Google will not pick up these links.
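To spot such links quickly, you can scan the markup for anchors that have no “href”. The sketch below is a rough illustration: the function name and the sample menu are made up, and a regex is a shortcut that will miss edge cases a real HTML parser handles (for example, attribute values containing “>”).

```javascript
// Returns the opening <a> tags that have no href attribute.
function findAnchorsWithoutHref(html) {
  var openingTags = html.match(/<a\b[^>]*>/gi) || [];
  return openingTags.filter(function (tag) {
    return !/\shref\s*=/i.test(tag);
  });
}

// Hypothetical menu: the second link relies on onclick only.
var menu =
  '<a href="/shoes/">Shoes</a>' +
  '<a onclick="goTo(\'/bags/\')">Bags</a>' +
  '<a class="item" href="/hats/" onclick="track()">Hats</a>';

console.log(findAnchorsWithoutHref(menu).length); // 1 (the "Bags" link)
```

Run it against the rendered DOM rather than the page source, because menus are often generated by JavaScript.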

Here’s what John Mueller said about this:

If you still are not sure if Google can pick up your links, check out this slide from the Google I/O conference:

In my previous article, “Chrome 41: The Key to Successful Website Rendering” I wrote a quick tutorial on how to check if Google can pick up your menu. I strongly recommend reading it!

INJECTING CANONICAL TAGS VIA JAVASCRIPT

When you want to use canonical tags, make sure they are placed in the plain HTML or in the HTTP headers.

Canonical tags injected by JavaScript are considered less reliable, and chances are high that Google will ignore them.

Tom Greenaway warned about this issue during the Google I/O conference and John Mueller confirmed it on Twitter:

Side note: although there are great SEO experiments by Eoghan Henn proving it’s possible for Google to respect canonicals injected by JavaScript, John Mueller tweeted that he didn’t recommend it:

“IMO I would not rely on this though, if you really want a URL as a canonical, do the work to get the signal in right from the start”

His opinion is based on some internal observations made in Google showing canonical tags in HTML are more reliable than those injected by JS.
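You can verify this on your own pages by looking for the canonical tag in the raw HTML, before any JavaScript runs. The function below is a minimal sketch with a made-up name, and it uses regexes instead of a real HTML parser, so treat it as a quick check only.

```javascript
// Returns the canonical URL declared in the raw HTML, or null if the tag
// is missing (for example, because it is only injected by JavaScript).
function findCanonical(initialHtml) {
  var linkTags = initialHtml.match(/<link\b[^>]*>/gi) || [];
  for (var i = 0; i < linkTags.length; i++) {
    var tag = linkTags[i];
    if (/\srel\s*=\s*["']?canonical["']?/i.test(tag)) {
      var href = tag.match(/\shref\s*=\s*["']([^"']*)["']/i);
      return href ? href[1] : null;
    }
  }
  return null;
}

var withCanonical =
  '<head><link rel="stylesheet" href="/a.css">' +
  '<link rel="canonical" href="https://example.com/shoes/"></head>';
var withoutCanonical = '<head><script src="/bundle.js"></script></head>';

console.log(findCanonical(withCanonical));    // "https://example.com/shoes/"
console.log(findCanonical(withoutCanonical)); // null
```

If the function returns null for the page source but you can see a canonical tag in the DOM, the tag is injected by JavaScript.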

USING HASHES IN URLs

It’s still common that many JavaScript frameworks generate URLs with a hash. There is a real danger that such URLs may not be crawled by Googlebot:

  • Bad URL: example.com/#/crisis-center/
  • Bad URL: example.com#URL
  • Good URL: example.com/crisis-center/

Note: this section doesn’t refer to hashbang URLs (#!).

You may think: it shouldn’t be important at all – it’s just a single additional character in the URL.

No, it’s very important.

Here is John Mueller again (emphasis mine):

“(…) For us, if we see the kind of a hash there, then that means the rest there is probably irrelevant. For the most part we will drop that when we try to index the content (…). When you want to make that content actually visible in search, it’s important that you use the more static-looking URLs.”

You need to make sure your URL doesn’t look like this: example.com/resource#dsfsd
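To see why this matters, drop the fragment from a few URLs and look at what is left. The sketch below uses the standard `URL` API. The function name `withoutFragment` and the `/heroes/` route are made up for this example.

```javascript
// Approximates the URL that is left once the fragment is dropped.
function withoutFragment(url) {
  var parsed = new URL(url);
  parsed.hash = "";
  return parsed.href;
}

console.log(withoutFragment("https://example.com/#/crisis-center/")); // "https://example.com/"
console.log(withoutFragment("https://example.com/#/heroes/"));        // "https://example.com/"
console.log(withoutFragment("https://example.com/crisis-center/"));   // "https://example.com/crisis-center/"
```

Both hash-based routes collapse into the same address, so from a crawler’s point of view they are one and the same page.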

Angular 1 by default uses hash-based URLs. You can fix it by configuring $locationProvider (here is a tutorial!). Fortunately, Angular 2 uses Google-friendly URLs by default.

SLOW SCRIPTS, SLOW APIs

Many JavaScript-based websites fail because Google has to wait too long for a script (downloading, parsing, executing). Slow scripts also make Googlebot burn through the crawl budget. Make sure your scripts are fast and that Google doesn’t have to wait too long to fetch them. I strongly recommend reading “Optimizing the Critical Rendering Path”.

BAD SEO VS. JS SEO 

I want to take a moment and address a problem that could affect even the best SEO.

It’s important to remember that JavaScript SEO is done on top of traditional SEO, and it’s impossible to be good at the former without being good at the latter. And sometimes when you encounter an SEO problem, your first instinct might be that it’s related to JS when in fact it’s related to traditional SEO.

I don’t want to waste time explaining how this can happen when it’s already been explained very well in Justin Briggs’ article “Core Principles of SEO for JavaScript” in the section on Confusing Bad SEO with JavaScript Limitations. So give that a read when you’re finished here.

The huge change: Googlebot will no longer use AJAX Crawling Scheme in Q2 2018

Google announced they will no longer use the Ajax Crawling Scheme starting from Q2 2018. Does this mean that Google will stop crawling websites using Ajax (Asynchronous JavaScript)? Absolutely not!

To quickly explain what the old AJAX Crawling Scheme was: Google realized that more and more websites were using JavaScript, but at that time they were not able to render it.

So they asked webmasters: please create a spider-friendly version (with no JS!) of every page you offer and make it accessible to us by adding ?_escaped_fragment_= to the URL.

When users were viewing example.com and enjoying it, Googlebot was visiting its “ugly” equivalent: example.com?_escaped_fragment_= (I’m not joking, this is still very popular :))

Here is how it works (graphics via Googleblog):

With this, webmasters were able to kill two birds with one stone. Both users and crawlers were satisfied. Users were receiving a JavaScript-rich website, and search engines were able to properly index the content (since they were receiving plain HTML + CSS).

However, the old Ajax Crawling Scheme was not a perfect solution. Since users were receiving a different version of a page than spiders, spotting issues was really hard. Also, some webmasters had some issues with pre-rendering the content for Googlebot.

This is why Google announced that in the second quarter of 2018 webmasters will no longer need to build two different versions of a website:

What does this mean for you?

  1. Google will be rendering your website on their end, meaning you SHOULD make sure Google will technically be able to do it.
  2. Googlebot will stop visiting the “ugly” URLs (containing an escaped fragment) and start requesting the same URL as users do.
  3. Also, you will have to find a way to make your website crawlable and renderable for Bing and other search engines that are far behind Google when it comes to JavaScript rendering. Possible solutions include server-side rendering (universal JavaScript), or… still using the old AJAX Crawling Scheme for Bingbot. I will discuss this topic later in the article.

WILL GOOGLE USE THE MOST RECENT BROWSER FOR RENDERING?  

For now, it’s unclear if Google plans to upgrade their rendering service (to support the most recent technologies). I will keep my fingers crossed for Google! I hope they will not fail.

WHAT IF I DON’T TRUST THAT GOOGLE WILL BE ABLE TO RENDER MY WEBSITE?

I have been thinking about this matter a lot, which is why I decided to ask John Mueller at the JavaScript SEO Forum if I can detect Googlebot by the user agent and just serve a pre-rendered version for Googlebot.

He replied:

John Mueller agrees you can detect Googlebot by checking the user agent and serve it a pre-rendered HTML snapshot. On top of that, he advised monitoring the pre-rendered version regularly to make sure the pre-rendering works properly.

Update: During the Google I/O conference, John Mueller confirmed once again that you can detect Googlebot by user-agent and serve it a prerendered version, while users get a normal, dynamically updating website.

This is the best of both worlds for users and Googlebot. The user experience remains the same, while Googlebot gets a pre-rendered snapshot that can be crawled and indexed more easily and quickly.
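The decision itself can be sketched in a few lines. This is a simplified illustration, not a production setup: the function names, the variant labels and the short list of crawler tokens are my own assumptions, and the user agent would normally come from the `User-Agent` request header on your server.

```javascript
// Crawler tokens to look for in the User-Agent header (an illustrative list).
var BOT_PATTERN = /googlebot|bingbot|yandex/i;

function isSearchBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Decides which version of the page to serve.
function chooseVariant(userAgent) {
  return isSearchBot(userAgent) ? "prerendered-html" : "client-side-app";
}

console.log(chooseVariant(
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)); // "prerendered-html"
console.log(chooseVariant(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/64.0.3282.140 Safari/537.36"
)); // "client-side-app"
```

Whatever you serve to bots should contain the same content that users see, which is why monitoring the pre-rendered version matters.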

John gave some insights on when you should use dynamic rendering:

Don’t Forget About Bing!

Let’s assume Googlebot renders JavaScript perfectly and you don’t have any issues with this. Does it mean you can start celebrating?

Hold on! You probably forgot about Bing, which is used by a third of US internet users.

For now it’s safe to assume Bing doesn’t render JavaScript (there are a lot of rumors that Bing is rendering JavaScript on high authority pages, but I couldn’t find any real-life examples).

Let me share a very interesting case.

Angular.io is the official website of Angular 2+. Some pages of Angular.io are built using the Single-Page Application approach. That means the initial HTML contains no content. Then, an external JS file loads all the necessary content.

It seems that Bing cannot see the Angular.io content!

This website ranks #2 for “Angular” in Bing.

What about “Angular4”? Again, it ranks #2, behind AngularJS.org (an official website of Angular 1).

“Angular 5”? Again, #2.

If you want evidence that Bing cannot deal with Angular.io, try to find any site-related fragment by using the “site” command. It’s impossible!

This is kind of weird. Angular 2’s official website cannot be properly crawled and indexed by Bingbot.

What about Yandex? Angular.io doesn’t even rank in the top 50 for “Angular” in Yandex!

I asked Angular.io on Twitter if they considered introducing some solutions to make it crawlable by search engines like Bing. But as of this writing, they have not responded.

The point is, don’t forget about Bing when pushing new web technologies. Consider using Isomorphic JavaScript/prerendering.

Prerendering vs Isomorphic JavaScript?

When you notice Google struggling with your client-rendered website, you can consider prerendering or Isomorphic JavaScript. But which is better?

  • Prerendering means rendering the page on your end because search engine crawlers are not able to render it themselves. When a crawler visits your website, you feed it an HTML snapshot (with no JS). At the same time, users get the JavaScript-rich version of your page. The snapshot is used only by bots, not by normal users. You can use external services for prerendering (like Prerender.io), or use tools like PhantomJS or headless Chrome on your server side.
  • Isomorphic JavaScript is another popular approach. Here, both users and search engines receive a page full of content at the initial load. Then all the JavaScript-rich features are loaded on top of this. It’s good for both users and search engines. It’s the most recommended option, even by Googlers.
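
The isomorphic idea can be sketched without any framework. The snippet below is only an illustration under my own assumptions: `renderProductList`, `escapeHtml`, `renderInitialHtml`, the `window.__INITIAL_STATE__` variable and the `/app.js` path are hypothetical names, not an API of Angular, React or Vue. The point is that the same rendering function produces the initial HTML on the server and can be reused later in the browser.

```javascript
// Escapes text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Shared template: pure string building, so it runs unchanged
// on the server (Node.js) and in the browser.
function renderProductList(products) {
  const items = products
    .map((product) => `<li>${escapeHtml(product.name)}</li>`)
    .join("");
  return `<ul id="products">${items}</ul>`;
}

// Server side: build the full initial HTML, so that crawlers which
// do not execute JavaScript still receive the content.
function renderInitialHtml(products) {
  // Embed the data for the client; "<" is escaped so the JSON
  // cannot close the script tag early.
  const state = JSON.stringify(products).replace(/</g, "\\u003c");
  return [
    "<!doctype html>",
    "<html><body>",
    `<div id="app">${renderProductList(products)}</div>`,
    `<script>window.__INITIAL_STATE__ = ${state};</script>`,
    '<script src="/app.js"></script>',
    "</body></html>",
  ].join("\n");
}
```

In the browser, the (hypothetical) `/app.js` bundle would read `window.__INITIAL_STATE__` and call the same `renderProductList` whenever the data changes. Frameworks do this in a far more sophisticated way, but the principle is the same: content first, JavaScript-rich features on top.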

There is one problem though: a lot of developers struggle with implementing isomorphic JavaScript.

Check your framework’s documentation to learn how to set up Server-Side Rendering in your JavaScript framework.

For Angular, you can use Angular Universal. For React, you can read the documentation and watch this Udemy course.

React 16 (which was released in September 2017) has brought many improvements regarding Server-Side Rendering. One of the new options in React 16 is the “renderToNodeStream” function, which makes the whole Server-Side Rendering process easier.

Developers: If you want your website to be server-side rendered, you should avoid using functions that operate directly on the DOM. Let me quote Wassim Chegham, a developer expert at Google: “One of THE MOST IMPORTANT best practices I’d recommend following is: Never touch the DOM.”

[By the way, I’m aware of the contradictory double-negative being utilized in the above graphic, but for the sake of expediency, let’s all agree it means NEVER TOUCH THE DOM!]
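
When you really cannot avoid the DOM, one common way to keep code server-friendly is to isolate the DOM access in one place and guard it. The sketch below is a plain JavaScript illustration of that idea, not something taken from Angular Universal or React; the function names `hasDom`, `buildTitle` and `applyTitle` are hypothetical.

```javascript
// True only in a browser-like environment where a DOM exists.
function hasDom() {
  return typeof window !== "undefined" && typeof document !== "undefined";
}

// Pure logic: builds the title string without touching the DOM,
// so it can run during server-side rendering as well.
function buildTitle(pageName, siteName) {
  return `${pageName} | ${siteName}`;
}

// The only place that touches the DOM, and only when one is available.
function applyTitle(pageName, siteName) {
  const title = buildTitle(pageName, siteName);
  if (hasDom()) {
    document.title = title;
  }
  return title;
}
```

On the server, `applyTitle` simply returns the string, and the rendering layer can put it into the HTML. In the browser, the same call also updates `document.title`.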

Does Googlebot Treat HTML and JS Websites the Same Way?

At Elephate we performed some experiments to investigate how deeply Googlebot can move by discovering and following links in the case of HTML and JavaScript sites.

The experiment results were astonishing. In the case of the HTML website, Googlebot was able to reach all the pages. However, in the case of the JS website, Googlebot often could not even reach the second level. We repeated the experiment on 5 different domains, but the results were always the same.

Bartosz Góralewicz (this guy, again!) reached out to Google’s John Mueller and asked him what the issue was. John confirmed that Google saw the [JavaScript] links, but “Googlebot didn’t feel like crawling them”. He added: “We don’t crawl all URLs, or crawl them all quickly, especially when our algorithms aren’t sure of the value of the URL – another tricky aspect of a test site.”

If you would like to learn more about this topic, read Bartosz’s article “Everything You Know About JavaScript Indexing is Wrong”.

My Two Cents

It needs to be pointed out that even though these sites were created just for the sake of the experiment, the content of both was of equal quality. It was all generated by Articoolo, a cool artificial intelligence content generator. It produces pretty nice output. Certainly better than any content that I ever created 😉

Googlebot received two very similar websites and crawled only one of them, favoring an HTML website over a JS one.

  • Hypothesis I: Google algorithms classified both websites as test sites and then assigned a fixed budget to each of them: a) 6 pages, or let’s say, b) 20s of execution (fetch all the resources + render them).
  • Hypothesis II: Googlebot classified both websites as test websites. In the case of the JS website, it noticed that fetching resources took too much time, so it just skipped them.

The aforementioned website had a lot of high-quality natural links. People were sharing it willingly over the internet. Also, the website was getting some organic traffic.

So in this case, when is it a test site and when is it a real website? This is tricky.

The chances are great that if you create a new client-side rendered website, you will be in exactly the same situation as ours, and Googlebot won’t crawl your content. This is when it stops being a theoretical situation and the problem becomes real. Also, the real issue here is that there is virtually no real-life case of a client-rendered JS website/brand/store ranking high. So I can’t guarantee that your JavaScript-rich website will rank as high as its HTML equivalent.

And you might be thinking that sure, most SEOs are aware of this, and big companies can spend money to deal with these issues; however, what about the small businesses who don’t have the money or knowledge? This could pose a real danger for a small mom and pop restaurant with their client-rendered JS website.

The Most Important Takeaways

  • Google uses Chrome 41 to render websites. This version of Chrome was released in 2015, so it doesn’t support all modern JavaScript features. You can use Chrome 41 to ensure Google is able to render your content. You can learn more by reading my other article: Chrome 41: The Key to Successful Website Rendering
  • Usually, it’s not enough to analyze just the page source (HTML) of your website. Instead, you should analyze the DOM (Right click -> Inspect tool).
  • Google is the only search engine that renders JavaScript at scale. 
  • You shouldn’t use Google cache to check how Google indexes your content. It only tells you how YOUR browser interpreted the HTML collected by Googlebot. It’s totally unrelated to how Google rendered your content.  
  • Use the Fetch and Render tool often. But don’t rely on the Fetch and Render timeouts. Indexing timeouts can be totally different.
  • The “site:” command is your friend.
  • You should make sure the menu links are present in the DOM before any menu item is clicked.
  • Google algorithms try to detect if a resource is necessary from a rendering point of view. If not, it probably won’t be fetched by Googlebot.
  • You should make sure your scripts are fast (a lot of experiments point out that Google is unlikely to wait for a script for more than 5s). Make sure your server doesn’t slow down when the host load increases! Also, try to optimize your scripts. There are so many things to improve!
  • If Google fails to render a page, it can pick up just the raw HTML for indexing. This CAN break your Single-Page Application (SPA), because Google may index just a blank page(!)
  • There is virtually no real-life case of a client-rendered JS website/brand/store ranking high. Why do you think that is?
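
As a rough way to check the “blank page” risk mentioned above, you can look at what is left of the raw HTML (the page source, before any JavaScript runs) once scripts, styles and tags are stripped out. The sketch below is a simple heuristic under my own assumptions: the function names are hypothetical, the regular expressions are not a real HTML parser, and the 50-character threshold is an arbitrary value you should tune. It says nothing about how Google actually processes your pages.

```javascript
// Removes scripts, styles and tags from raw HTML, leaving roughly the
// text that a crawler without JavaScript rendering would have to work with.
function visibleTextFromRawHtml(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Flags pages whose raw HTML carries almost no text.
// minChars is an arbitrary threshold (an assumption, not a Google rule).
function looksBlankWithoutJs(html, minChars = 50) {
  return visibleTextFromRawHtml(html).length < minChars;
}
```

A typical SPA shell, where the body holds only an empty container and a script tag, would be flagged by `looksBlankWithoutJs`, while a server-side rendered page with real copy in its HTML would not.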

In Conclusion

The SEO industry isn’t certain whether Google treats (and ranks!) JavaScript-based websites the same way as HTML-based sites. What is clear is that SEOs and developers are just starting to understand how to make modern JavaScript frameworks crawlable. So it’s important to remember that there’s no one-size-fits-all rule. Every website is different. If you plan to build a JavaScript-rich website, make sure you work with developers and SEOs who know their job.

[It should be noted that John Mueller was given access to an earlier draft of this article and had provided the writer with feedback, which was much appreciated.]

Published: 11 June 2018
Author: Tomasz Rudzki