Angular and Restify SEO
Developing a website using Node.js, Restify and Angular is fairly simple. Using Angular's two-way data binding together with Restify's easy REST API creation, you can create a server-enhanced web app in no time.
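For example, a minimal Restify endpoint takes only a few lines (the route and payload here are illustrative, not this blog's actual API):
var restify = require('restify');
var server = restify.createServer();

// Illustrative endpoint: returns a hard-coded list of posts as JSON
server.get('/api/posts', function (req, res, next) {
    res.send([{ title: 'Hello World', url: 'hello-world' }]);
    return next();
});

server.listen(8080);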
Getting your website indexed by the various search engine and social media web crawlers (bots) is a different story. None of the commonly used bots are able to render JavaScript.
This means that when a crawler comes across your Angular-based app, it sees something like this (an illustrative, unrendered shell):
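<html>
  <head>
    <base href="/">
    <meta name="fragment" content="!" />
  </head>
  <body>
    <!-- Angular has not run yet, so the view container is empty -->
    <div ui-view></div>
  </body>
</html>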
To solve this issue, you need to supply an alternative HTML page which is already rendered to its final static appearance.
The solution consists of three main tasks:
- Create static html snapshots
- Recognize a bot
- Serve the static snapshot instead of the original html page
We will use this blog as an example to demonstrate how we implemented the solution.
Our architecture:
Server: Node.js + Restify + MongoDB
Client: Single-page AngularJS app
The blog contains an index page at the default "/" route, and a list of blog posts served at "/@blog_url".
We've added base and meta fragment tags to remove Angular's default #:
<base href="/">
<meta name="fragment" content="!" />
Angular handles the routing and data fetching. The route file looks like this:
app.config(function ($stateProvider, $urlRouterProvider, $locationProvider) {
    // html5Mode removes Angular's default # prefix (this is why the <base> tag above is needed)
    $locationProvider.html5Mode(true);
    $urlRouterProvider.otherwise('/');
    $stateProvider
        .state('index', {
            url: "/",
            templateUrl: "views/index.html",
            controller: "indexCtrl"
        })
        .state('posts', {
            url: "/:url",
            templateUrl: "views/post.html",
            controller: "postCtrl"
        });
});
On the server side we serve static files for the blog.apricode.co.il domain, falling back to index.html for any GET request that isn't for a static asset:
var serveStatic = require('serve-static');

// staticTypes: file extensions treated as static assets (list assumed; adjust as needed)
var staticTypes = ['js', 'css', 'png', 'jpg', 'gif', 'ico', 'woff', 'ttf', 'svg'];

server.get(/.*/, function (req, res, next) {
    var url = req.url;
    var fileExtension = url.split('.')[url.split('.').length - 1];
    if (url == "/" || (req.method.toLowerCase() == "get" && staticTypes.indexOf(fileExtension) == -1))
        url = '/index.html';
    else if (url[url.length - 1] == '/')
        url = url + 'index.html';
    req.url = url;
    var hostname = req.headers.host.split(":")[0];
    // Serve from the docroot configured for this host (assuming the domains config is keyed by hostname)
    return serveStatic(config.server.domains[hostname].path, {fallthrough: false})(req, res, next);
});
1. Creating static HTML snapshots:
We decided to use Grunt for this task, or more specifically grunt-html-snapshot, which takes a list of URLs and generates an individual HTML snapshot for each one.
We used our server to create an array of all available post URLs and save it to a posts.json file.
var fs = require('fs');
var path = 'posts.json'; // the file the Grunt task below reads

function savePostsJSON(posts) {
    var postsArray = [];
    for (var i = 0; i < posts.length; i++) {
        postsArray.push("/" + posts[i].url);
    }
    fs.writeFile(path, JSON.stringify(postsArray), function (err) {
        if (!err)
            console.log("JSON Saved!");
    });
}
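How the posts array is gathered depends on your data layer; here is a minimal sketch assuming a Mongoose Post model with a url field (the model name and query are illustrative assumptions, not code from this blog):
var mongoose = require('mongoose');
var Post = mongoose.model('Post'); // hypothetical model registered elsewhere

// Fetch only the url field of every post, then dump the list to posts.json
Post.find({}, 'url', function (err, posts) {
    if (!err)
        savePostsJSON(posts);
});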
Then we've created a Grunt task to generate all the snapshots:
module.exports = function (grunt) {
    var availablePosts = grunt.file.readJSON('posts.json');
    grunt.loadNpmTasks('grunt-html-snapshot');
    grunt.initConfig({
        htmlSnapshot: {
            blog: {
                options: {
                    snapshotPath: 'snapshots/',
                    sitePath: 'http://blog.apricode.co.il',
                    fileNamePrefix: 'blog_',
                    urls: ["/"].concat(availablePosts),
                    msWaitForPages: 2000,
                    removeScripts: true,
                    // Map each URL to a snapshot file name: "/" -> index, "/my-post" -> my-post
                    sanitize: function (requestUri) {
                        if (/\/$/.test(requestUri)) {
                            return 'index';
                        } else {
                            return requestUri.replace(/\//g, '');
                        }
                    }
                }
            }
        }
    });
    grunt.registerTask('default', ['htmlSnapshot']);
};
The task creates a collection of "blog_"-prefixed HTML files under the snapshots folder.
Notice we've given the task a 2-second delay (msWaitForPages) to allow the pages to finish rendering on slow connections.
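Regenerating the snapshots (for example, after publishing a new post) is then just a matter of running the default task from the project root:
grunt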
2. Recognizing a bot:
There are 2 types of bots:
- Search engine bots
- Social media bots
The difference between them is that a search engine bot that encounters a <meta name="fragment" content="!" /> tag will send a new request containing an _escaped_fragment_= parameter with the requested URL.
Social media bots will not send a new request, and are only recognizable by their user agent.
Here is our code for recognizing the different bots:
Social media bots:
var userAgent = req.header('User-Agent').toLowerCase();
// Known social media crawlers, matched by user agent substring
var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
var isSeo = false;
for (var i = 0; i < socialUserAgents.length; i++) {
    if (userAgent.indexOf(socialUserAgents[i]) > -1) {
        isSeo = true;
        console.log("Found Social Bot, requested url: " + req.url);
        break;
    }
}
Search engine bots:
// Search engines replace the #! fragment with ?_escaped_fragment_= in their crawl requests
if (url.indexOf('_escaped_fragment_') > -1)
    isSeo = true;
3. Serving the static snapshot instead of the original HTML page:
After we've recognized a bot, we serve it a static HTML snapshot from the snapshots folder we created in step 1.
Recognizing the desired URL:
For social media bots the URL remains the same.
For search engines, we need to extract the parameter from the URL:
if (url.indexOf('_escaped_fragment_') > -1) {
    isSeo = true;
    // Everything after _escaped_fragment_= is the route the bot actually wants
    requestedUrl = url.split('_escaped_fragment_=')[1];
    if (requestedUrl == "")
        requestedUrl = "/";
}
After getting the required URL, we need to serve the matching prefixed static HTML file:
var hostname = req.headers.host.split(":")[0];
var subDomain = hostname.split('.')[0];
// Snapshots are prefixed with the subdomain, e.g. "blog_"
var prefix = subDomain + '_';
if (requestedUrl == '/' || requestedUrl == '')
    requestedUrl = prefix + 'index.html';
else if (requestedUrl[0] == '/')
    requestedUrl = prefix + requestedUrl.substring(1) + '.html';
else
    requestedUrl = prefix + requestedUrl + '.html';
And inside the server.get handler we point serve-static at the snapshots folder (note that serve-static resolves files against req.url, and the middleware it returns has to be invoked):
req.url = '/' + requestedUrl;
return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
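Putting it all together, here is a hedged sketch of how the full bot-handling branch might look inside the catch-all GET handler (the detection and mapping snippets are the ones shown above; the exact wiring in the real code base may differ):
var serveStatic = require('serve-static');

server.get(/.*/, function (req, res, next) {
    var url = req.url;
    var requestedUrl = url;
    var isSeo = false;

    // 1. Search engine bots announce themselves via _escaped_fragment_
    if (url.indexOf('_escaped_fragment_') > -1) {
        isSeo = true;
        requestedUrl = url.split('_escaped_fragment_=')[1] || "/";
    }

    // 2. Social media bots are matched by user agent substring
    var userAgent = (req.header('User-Agent') || '').toLowerCase();
    var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
    for (var i = 0; i < socialUserAgents.length; i++) {
        if (userAgent.indexOf(socialUserAgents[i]) > -1) {
            isSeo = true;
            break;
        }
    }

    if (isSeo) {
        // 3. Map the requested route to its snapshot file and serve it
        var prefix = req.headers.host.split(":")[0].split('.')[0] + '_';
        if (requestedUrl == '/' || requestedUrl == '')
            requestedUrl = prefix + 'index.html';
        else if (requestedUrl[0] == '/')
            requestedUrl = prefix + requestedUrl.substring(1) + '.html';
        else
            requestedUrl = prefix + requestedUrl + '.html';
        req.url = '/' + requestedUrl;
        return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
    }

    // Not a bot: fall through to the regular static / index.html serving shown earlier
    return next();
});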
Testing:
We've tested the search engine bot path by going to http://blog.apricode.co.il/?_escaped_fragment_=/angular_restify_seo and checking that we receive a static HTML file.
Checking the social media bots is done by entering the http://blog.apricode.co.il URL into a post text input (on Facebook / LinkedIn) and verifying that a correct site preview appears.
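You can also simulate a social media bot from the command line by spoofing its user agent (curl's -A flag sets the User-Agent header); the response should be the static snapshot instead of the empty Angular shell:
curl -A "facebookexternalhit/1.1" http://blog.apricode.co.il/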