Angular and Restify SEO
Developing a website using Node.js, Restify and Angular is fairly simple. Using Angular's two-way data binding together with Restify's easy REST API creation, you can create a server-enhanced web app in no time.
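For example, a minimal Restify endpoint takes only a few lines (the route and payload here are illustrative, not this blog's actual API):
var restify = require('restify');
var server = restify.createServer();

// Illustrative endpoint: returns a hard-coded list of posts as JSON
server.get('/api/posts', function (req, res, next) {
    res.send([{ title: 'Hello World', url: 'hello-world' }]);
    return next();
});

server.listen(8080);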
Getting your website indexed by the various search engine and social media web crawlers (bots) is a different story. None of the commonly used bots are able to render JavaScript.
This means that when a crawler comes across your Angular-based app, it sees something like this (an illustrative, unrendered shell):
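<html>
  <head>
    <base href="/">
    <meta name="fragment" content="!" />
  </head>
  <body>
    <!-- Angular has not run yet, so the view container is empty -->
    <div ui-view></div>
  </body>
</html>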
To solve this issue, you need to supply an alternative HTML page which is already rendered to its final static appearance.
The solution consists of three main tasks:
- Create static html snapshots
- Recognize a bot
- Serve the static snapshot instead of the original html page
We will use this blog as an example to demonstrate how we implemented the solution.
Our architecture:
Server: Node.js + Restify + MongoDB
Client: Single-page AngularJS app
The blog contains an index page at the default "/" route, and a list of blog posts served at "/@blog_url".
We've added base and meta fragment tags to remove Angular's default #:
<base href="/">
<meta name="fragment" content="!" />
Angular handles the routing and data fetching. The route file looks like this:
app.config(function ($stateProvider, $urlRouterProvider, $locationProvider) {
    // html5Mode removes Angular's default # prefix (this is why the <base> tag above is needed)
    $locationProvider.html5Mode(true);
    $urlRouterProvider.otherwise('/');
    $stateProvider
        .state('index', {
            url: "/",
            templateUrl: "views/index.html",
            controller: "indexCtrl"
        })
        .state('posts', {
            url: "/:url",
            templateUrl: "views/post.html",
            controller: "postCtrl"
        });
});
On the server side we serve static files for the blog.apricode.co.il domain, falling back to index.html for any GET request that isn't for a static asset:
var serveStatic = require('serve-static');

// staticTypes: file extensions treated as static assets (list assumed; adjust as needed)
var staticTypes = ['js', 'css', 'png', 'jpg', 'gif', 'ico', 'woff', 'ttf', 'svg'];

server.get(/.*/, function (req, res, next) {
    var url = req.url;
    var fileExtension = url.split('.')[url.split('.').length - 1];
    if (url == "/" || (req.method.toLowerCase() == "get" && staticTypes.indexOf(fileExtension) == -1))
        url = '/index.html';
    else if (url[url.length - 1] == '/')
        url = url + 'index.html';
    req.url = url;
    var hostname = req.headers.host.split(":")[0];
    // Serve from the docroot configured for this host (assuming the domains config is keyed by hostname)
    return serveStatic(config.server.domains[hostname].path, {fallthrough: false})(req, res, next);
});
1. Creating static HTML snapshots:
We decided to use Grunt for this task, or more specifically grunt-html-snapshot, which takes a list of URLs and generates an individual HTML snapshot for each one.
We used our server to create an array of all available post URLs and save it to a posts.json file.
var fs = require('fs');
var path = 'posts.json'; // the file the Grunt task below reads

function savePostsJSON(posts) {
    var postsArray = [];
    for (var i = 0; i < posts.length; i++) {
        postsArray.push("/" + posts[i].url);
    }
    fs.writeFile(path, JSON.stringify(postsArray), function (err) {
        if (!err)
            console.log("JSON Saved!");
    });
}
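How the posts array is gathered depends on your data layer; here is a minimal sketch assuming a Mongoose Post model with a url field (the model name and query are illustrative assumptions, not code from this blog):
var mongoose = require('mongoose');
var Post = mongoose.model('Post'); // hypothetical model registered elsewhere

// Fetch only the url field of every post, then dump the list to posts.json
Post.find({}, 'url', function (err, posts) {
    if (!err)
        savePostsJSON(posts);
});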
Then we've created a Grunt task to generate all the snapshots:
module.exports = function (grunt) {
    var availablePosts = grunt.file.readJSON('posts.json');
    grunt.loadNpmTasks('grunt-html-snapshot');
    grunt.initConfig({
        htmlSnapshot: {
            blog: {
                options: {
                    snapshotPath: 'snapshots/',
                    sitePath: 'http://blog.apricode.co.il',
                    fileNamePrefix: 'blog_',
                    urls: ["/"].concat(availablePosts),
                    msWaitForPages: 2000,
                    removeScripts: true,
                    // Map each URL to a snapshot file name: "/" -> index, "/my-post" -> my-post
                    sanitize: function (requestUri) {
                        if (/\/$/.test(requestUri)) {
                            return 'index';
                        } else {
                            return requestUri.replace(/\//g, '');
                        }
                    }
                }
            }
        }
    });
    grunt.registerTask('default', ['htmlSnapshot']);
};
The task creates a collection of "blog_"-prefixed HTML files under the snapshots folder.
Notice we've given the task a 2-second delay (msWaitForPages) to allow the pages to finish rendering on slow connections.
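Regenerating the snapshots (for example, after publishing a new post) is then just a matter of running the default task from the project root:
grunt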
2. Recognizing a bot:
There are 2 types of bots:
- Search engine bots
- Social media bots
The difference between them is that a search engine bot that encounters a <meta name="fragment" content="!" /> tag will send a new request containing an _escaped_fragment_= parameter with the requested URL.
Social media bots will not send a new request, and are only recognizable by their user agent.
Here is our code for recognizing the different bots:
Social media bots:
var userAgent = req.header('User-Agent').toLowerCase();
// Known social media crawlers, matched by user agent substring
var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
var isSeo = false;
for (var i = 0; i < socialUserAgents.length; i++) {
    if (userAgent.indexOf(socialUserAgents[i]) > -1) {
        isSeo = true;
        console.log("Found Social Bot, requested url: " + req.url);
        break;
    }
}
Search engine bots:
// Search engines replace the #! fragment with ?_escaped_fragment_= in their crawl requests
if (url.indexOf('_escaped_fragment_') > -1)
    isSeo = true;
3. Serving the static snapshot instead of the original HTML page:
After we've recognized a bot, we serve it a static HTML snapshot from the snapshots folder we created in step 1.
Recognizing the desired URL:
For social media bots the URL remains the same.
For search engines, we need to extract the parameter from the URL:
if (url.indexOf('_escaped_fragment_') > -1) {
    isSeo = true;
    // Everything after _escaped_fragment_= is the route the bot actually wants
    requestedUrl = url.split('_escaped_fragment_=')[1];
    if (requestedUrl == "")
        requestedUrl = "/";
}
After getting the required URL, we need to serve the matching prefixed static HTML file:
var hostname = req.headers.host.split(":")[0];
var subDomain = hostname.split('.')[0];
// Snapshots are prefixed with the subdomain, e.g. "blog_"
var prefix = subDomain + '_';
if (requestedUrl == '/' || requestedUrl == '')
    requestedUrl = prefix + 'index.html';
else if (requestedUrl[0] == '/')
    requestedUrl = prefix + requestedUrl.substring(1) + '.html';
else
    requestedUrl = prefix + requestedUrl + '.html';
And inside the server.get handler we point serve-static at the snapshots folder (note that serve-static resolves files against req.url, and the middleware it returns has to be invoked):
req.url = '/' + requestedUrl;
return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
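Putting it all together, here is a hedged sketch of how the full bot-handling branch might look inside the catch-all GET handler (the detection and mapping snippets are the ones shown above; the exact wiring in the real code base may differ):
var serveStatic = require('serve-static');

server.get(/.*/, function (req, res, next) {
    var url = req.url;
    var requestedUrl = url;
    var isSeo = false;

    // 1. Search engine bots announce themselves via _escaped_fragment_
    if (url.indexOf('_escaped_fragment_') > -1) {
        isSeo = true;
        requestedUrl = url.split('_escaped_fragment_=')[1] || "/";
    }

    // 2. Social media bots are matched by user agent substring
    var userAgent = (req.header('User-Agent') || '').toLowerCase();
    var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
    for (var i = 0; i < socialUserAgents.length; i++) {
        if (userAgent.indexOf(socialUserAgents[i]) > -1) {
            isSeo = true;
            break;
        }
    }

    if (isSeo) {
        // 3. Map the requested route to its snapshot file and serve it
        var prefix = req.headers.host.split(":")[0].split('.')[0] + '_';
        if (requestedUrl == '/' || requestedUrl == '')
            requestedUrl = prefix + 'index.html';
        else if (requestedUrl[0] == '/')
            requestedUrl = prefix + requestedUrl.substring(1) + '.html';
        else
            requestedUrl = prefix + requestedUrl + '.html';
        req.url = '/' + requestedUrl;
        return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
    }

    // Not a bot: fall through to the regular static / index.html serving shown earlier
    return next();
});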
Testing:
We've tested the search engine bot path by going to http://blog.apricode.co.il/?_escaped_fragment_=/angular_restify_seo and checking that we receive a static HTML file.
Checking the social media bots is done by entering the http://blog.apricode.co.il URL into a post text input (on Facebook / LinkedIn) and verifying that a correct site preview appears.
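You can also simulate a social media bot from the command line by spoofing its user agent (curl's -A flag sets the User-Agent header); the response should be the static snapshot instead of the empty Angular shell:
curl -A "facebookexternalhit/1.1" http://blog.apricode.co.il/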