Angular and Restify SEO
Developing a website using Node.js, Restify and Angular is fairly simple. Using Angular's two-way data binding together with a REST API built on Restify, you can create a server-backed web app in no time.
Getting your website indexed by the different search engines and social media web crawlers (bots) is a different story: none of the bots in common use are able to render JavaScript.
This means that when a crawler comes across your Angular-based app, it sees something like this (a simplified sketch of the unrendered page source; file names are illustrative):
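<!-- The raw index.html shell: no post content, just empty templates -->
<html>
<head>
    <base href="/">
    <meta name="fragment" content="!" />
</head>
<body>
    <div ui-view></div>
    <script src="app.js"></script>
</body>
</html>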
To solve this issue, you need to supply an alternative HTML page that has already been rendered to its final static form.
The solution consists of three main tasks:
- Create static HTML snapshots
- Recognize a bot
- Serve the static snapshot instead of the original HTML page
We will use this blog as an example to demonstrate how we implemented the solution.
Our architecture:
Server: Node.js + Restify + MongoDB
Client: single-page AngularJS app
The blog contains an index page at the default "/" route, and a list of blog posts, each served at its own URL ("/:url" in the route configuration below).
We've added base and meta fragment tags to remove Angular's default # from URLs:
<base href="/">
<meta name="fragment" content="!" />
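For the base tag to take effect, HTML5 mode also needs to be enabled in the Angular config block. A minimal sketch (the module variable name is assumed):

app.config(function ($locationProvider) {
    // Use real URLs (HTML5 history API) instead of #-based routes
    $locationProvider.html5Mode(true);
});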
Angular handles the routing and data fetching. The route configuration looks like this:
$urlRouterProvider.otherwise('/');

$stateProvider
    .state('index', {
        url: "/",
        templateUrl: "views/index.html",
        controller: "indexCtrl"
    })
    .state('posts', {
        url: "/:url",
        templateUrl: "views/post.html",
        controller: "postCtrl"
    });
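Inside postCtrl, the :url parameter is then used to fetch the matching post from the Restify API. A sketch of what that might look like (the /api/posts/:url endpoint path is an assumption):

app.controller('postCtrl', function ($scope, $stateParams, $http) {
    // Fetch the post whose slug matches the current route
    // (the endpoint path is hypothetical)
    $http.get('/api/posts/' + $stateParams.url).then(function (response) {
        $scope.post = response.data;
    });
});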
On the server side we serve static files for the blog.apricode.co.il domain, falling back to index.html for any GET request that isn't for a static asset, so deep links still load the Angular app:
var serveStatic = require('serve-static');
// File extensions we treat as static assets (this list is illustrative)
var staticTypes = ['js', 'css', 'png', 'jpg', 'gif', 'ico', 'html', 'woff', 'ttf', 'svg', 'map'];

server.get(/.*/, function (req, res, next) {
    var url = req.url;
    var fileExtension = url.split('.')[url.split('.').length - 1];
    // Any GET request that isn't for a static asset gets the Angular app
    if (url == "/" || (req.method.toLowerCase() == "get" && staticTypes.indexOf(fileExtension) == -1))
        url = '/index.html';
    else if (url[url.length - 1] == '/')
        url = url + 'index.html';
    req.url = url;
    var hostname = req.headers.host.split(":")[0];
    // Resolve the document root for this domain and hand off to serve-static
    return serveStatic(config.server.domains[hostname].path, {fallthrough: false})(req, res, next);
});
1. Creating static HTML snapshots:
We decided to use Grunt for this task, or more specifically grunt-html-snapshot, which takes a list of URLs and generates an individual HTML snapshot for each of them.
We used our server to build an array of all available post URLs and save it to a posts.json file.
var fs = require('fs');

function savePostsJSON(posts) {
    var postsArray = [];
    for (var i = 0; i < posts.length; i++) {
        postsArray.push("/" + posts[i].url);
    }
    // path points at the posts.json file consumed by the Grunt task
    fs.writeFile(path, JSON.stringify(postsArray), function (err) {
        if (!err) console.log("JSON Saved!");
    });
}
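The resulting posts.json is simply an array of relative post URLs, for example (entries are illustrative):

["/angular_restify_seo", "/another_post"]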
Then we created a Grunt task to generate all the snapshots:
module.exports = function (grunt) {
    var availablePosts = grunt.file.readJSON('posts.json');
    grunt.loadNpmTasks('grunt-html-snapshot');
    grunt.initConfig({
        htmlSnapshot: {
            blog: {
                options: {
                    snapshotPath: 'snapshots/',
                    sitePath: 'http://blog.apricode.co.il',
                    fileNamePrefix: 'blog_',
                    urls: ["/"].concat(availablePosts),
                    msWaitForPages: 2000,
                    removeScripts: true,
                    // Map each URL to a snapshot file name: "/" becomes
                    // "index", other routes just lose their slashes
                    sanitize: function (requestUri) {
                        if (/\/$/.test(requestUri)) {
                            return 'index';
                        }
                        return requestUri.replace(/\//g, '');
                    }
                }
            }
        }
    });
    grunt.registerTask('default', 'htmlSnapshot');
};
The task creates a collection of "blog_"-prefixed HTML files under the snapshots folder.
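For this blog the snapshots folder ends up looking something like this (post names are illustrative):

snapshots/
    blog_index.html
    blog_angular_restify_seo.html
    blog_another_post.html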
Notice we've given the task a 2-second delay (msWaitForPages) to allow each page to finish rendering on slow connections.
2. Recognizing a bot:
There are two types of bots:
- Search engine bots
- Social media bots
The difference between them is that a search engine bot that encounters a <meta name="fragment" content="!" /> tag will send a new request containing an _escaped_fragment_= parameter along with the requested URL.
Social media bots will not send a new request; they are only recognizable by their User-Agent header.
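Concretely, the two kinds of requests look roughly like this (paths and agents for illustration):

Search engine bot:  GET http://blog.apricode.co.il/?_escaped_fragment_=/angular_restify_seo
                    (User-Agent: Googlebot)
Social media bot:   GET http://blog.apricode.co.il/angular_restify_seo
                    (User-Agent: facebookexternalhit/1.1)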
Here is our code for recognizing the different bots:
Social media bots:
var userAgent = req.header('User-Agent').toLowerCase();
var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
var isSeo = false;
for (var i = 0; i < socialUserAgents.length; i++) {
    if (userAgent.indexOf(socialUserAgents[i]) > -1) {
        isSeo = true;
        console.log("Found Social Bot, requested url: " + req.url);
    }
}
Search engine bots:
if(url.indexOf('_escaped_fragment_') > -1) isSeo = true;
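Putting the two checks together, the detection logic can be wrapped in a single helper. A sketch combining the snippets above (the helper name is ours):

function isSeoRequest(req) {
    // Search engine bots announce themselves via the _escaped_fragment_ parameter
    if (req.url.indexOf('_escaped_fragment_') > -1) return true;

    // Social media bots are only recognizable by their User-Agent
    var userAgent = (req.header('User-Agent') || '').toLowerCase();
    var socialUserAgents = 'baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator'.toLowerCase().split('|');
    for (var i = 0; i < socialUserAgents.length; i++) {
        if (userAgent.indexOf(socialUserAgents[i]) > -1) return true;
    }
    return false;
}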
3. Serve the static snapshot instead of the original HTML page:
After we've recognized a bot, we serve it a static HTML snapshot from the snapshots folder we created in step 1.
Recognizing the desired URL:
For social media bots the URL remains the same.
For search engine bots, we need to extract the _escaped_fragment_ parameter from the URL:
if (url.indexOf('_escaped_fragment_') > -1) {
    isSeo = true;
    // e.g. "/?_escaped_fragment_=/angular_restify_seo" -> "/angular_restify_seo"
    requestedUrl = url.split('_escaped_fragment_=')[1];
    if (requestedUrl == "")
        requestedUrl = "/";
}
After resolving the requested URL, we map it to the prefixed static HTML file:
var hostname = req.headers.host.split(":")[0];
var subDomain = hostname.split('.')[0];
// e.g. "blog.apricode.co.il" -> prefix "blog_"
var prefix = subDomain + '_';
if (requestedUrl == '/' || requestedUrl == '')
    requestedUrl = prefix + 'index.html';
else if (requestedUrl[0] == '/')
    requestedUrl = prefix + requestedUrl.substring(1) + '.html';
else
    requestedUrl = prefix + requestedUrl + '.html';
// e.g. "/angular_restify_seo" -> "blog_angular_restify_seo.html"
And inside the server.get handler we point the request at the snapshot file and hand it to serve-static:

// Rewrite the request to the snapshot file name resolved above
req.url = '/' + requestedUrl;
return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
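Putting it all together, the bot branch of the GET handler looks roughly like this (a condensed sketch; snapshotName wraps the URL logic above and, like isSeoRequest, is our name rather than from the original code):

function snapshotName(req) {
    var requestedUrl = req.url;
    // Search engine bots: recover the original route from the parameter
    if (requestedUrl.indexOf('_escaped_fragment_') > -1)
        requestedUrl = requestedUrl.split('_escaped_fragment_=')[1] || '/';
    var prefix = req.headers.host.split(':')[0].split('.')[0] + '_';
    if (requestedUrl === '/' || requestedUrl === '')
        return prefix + 'index.html';
    return prefix + requestedUrl.replace(/^\//, '') + '.html';
}

server.get(/.*/, function (req, res, next) {
    if (isSeoRequest(req)) {
        req.url = '/' + snapshotName(req);
        return serveStatic(config.SEO.snapshotsPath, {fallthrough: false})(req, res, next);
    }
    // Regular visitors fall through to the static handling shown earlier
    next();
});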
Testing:
We tested the search engine bot path by browsing to http://blog.apricode.co.il/?_escaped_fragment_=/angular_restify_seo and checking that we receive a static HTML file.
Checking the social media bot path is done by entering the http://blog.apricode.co.il URL into a post text box (on Facebook / LinkedIn) and verifying that a correct site preview appears.
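A social bot request can also be simulated with a short Node script (a sketch):

var http = require('http');

// Request the page with a bot User-Agent; the response should be the
// pre-rendered snapshot rather than the empty Angular shell.
http.get({
    host: 'blog.apricode.co.il',
    path: '/',
    headers: { 'User-Agent': 'facebookexternalhit/1.1' }
}, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        // Inspect the start of the response manually
        console.log(body.substring(0, 300));
    });
});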