Node.js: using socket.io with cluster module

I’ve recently been working on a project that uses Node.js with the cluster module and socket.io, and I noticed that a lot of people who try to use socket.io have trouble making it work with the cluster module.

If you’re wondering what a “cluster module” is:

Node.js is single-threaded, which is not a downside (for a lot of reasons). But if your machine has more than one processor core (so, I’d say, any modern computer these days), a single Node.js process will not take advantage of all of them, because it runs your JavaScript on only one core. If you’d like to run more than one Node.js process (for example, several instances of a web server), you would have to deal with load balancing and proxies yourself: each Node.js server would have to listen on a different port, and you’d have to distribute the incoming connections between them.

To make this a lot easier, Node.js comes with a module called “cluster”. What it does is pretty simple: it lets you run a single Node.js process, called the “master”, that forks itself into child (worker) processes. The workers can all share the same server port, and Node takes care of distributing the incoming connections between them. The master can also exchange messages with each child process over IPC (and relay messages from one child to the others), and it supervises them: for example, if one of the child processes dies, the master is notified and can take action, like forking a new child to replace it.
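The master-side duties described above can be sketched with Node’s built-in cluster API. This is a minimal sketch, not code from the original project: superviseWorkers, relayBetweenWorkers and main are hypothetical names, and the helpers accept any object shaped like the cluster module so they can be exercised without actually forking. It assumes Node 6 or later, where the cluster “message” event passes (worker, message) and workers expose exitedAfterDisconnect.

```javascript
// Sketch of master-side cluster duties: fork one worker per CPU, replace
// workers that die unexpectedly, and relay messages between workers
// (workers cannot message each other directly, only via the master).
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper. `clusterLike` is the cluster module itself, or any
// object with the same fork()/on() shape.
function superviseWorkers(clusterLike, count) {
  for (let i = 0; i < count; i++) {
    clusterLike.fork();
  }
  clusterLike.on('exit', (worker, code, signal) => {
    // exitedAfterDisconnect is true when the worker was shut down on purpose
    if (!worker.exitedAfterDisconnect) {
      console.log(`worker ${worker.id} died (${signal || code}), forking a new one`);
      clusterLike.fork();
    }
  });
}

// Hypothetical helper: forwards every message a worker sends to the master
// on to all the other live workers.
function relayBetweenWorkers(clusterLike) {
  clusterLike.on('message', (sender, message) => {
    for (const id in clusterLike.workers) {
      const worker = clusterLike.workers[id];
      if (worker && worker !== sender) {
        worker.send(message);
      }
    }
  });
}

// Entry point for a real program; it is not called automatically here.
function main() {
  if (cluster.isMaster) { // also available as cluster.isPrimary in Node 16+
    superviseWorkers(cluster, os.cpus().length);
    relayBetweenWorkers(cluster);
  } else {
    // Worker: messages from the other workers arrive via the master.
    process.on('message', (msg) => {
      console.log(`worker ${cluster.worker.id} received`, msg);
    });
    // ...start your server here; process.send(msg) reaches the other workers.
  }
}
```

Relaying through the master is a design choice forced by the cluster module itself: each worker has an IPC channel only to the master, so any worker-to-worker traffic has to pass through it.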

Using Node.js cluster module with socket.io

So, why is using socket.io with the cluster module such a big issue? Well, it’s not. There are only a few things that you need to be aware of, and a lot of people who are new to socket.io can’t find comprehensive information about some of them, which leads to misunderstandings.

The most important thing: use socket.io with RedisStore, not MemoryStore. By default, socket.io keeps all information about incoming connections, events, etc. in the process’s own memory (MemoryStore). That’s not a problem if you’re not planning to scale your application. But if you’d like to run more than one process with a socket.io server (for example, forked with the cluster module), each socket.io server keeps its information in its own memory, so the processes are not able to communicate with each other. (MemoryStore and RedisStore belong to socket.io 0.9.x; starting with socket.io 1.0, stores were replaced by adapters.)

Why is that a problem? The simplest example: if one of your socket.io servers wants to broadcast a message to all your clients, it can only reach the clients that are connected to that specific server, because the other servers have no way of knowing that there is a message to broadcast. To solve this, all those processes need a way of sharing information. That’s why the socket.io authors came up with RedisStore: instead of keeping everything in local memory, your socket.io servers write it to a Redis server and use Redis publish/subscribe to pass messages to each other. Thanks to that, all the processes can share their information.
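To make the broadcast problem concrete, here is a toy simulation in plain Node.js. It is only a model under stated assumptions: FakeServer, countReceivers and demo are hypothetical names, an EventEmitter plays the role of Redis pub/sub, and none of this is socket.io’s or RedisStore’s actual API.

```javascript
// Conceptual simulation only, NOT socket.io's API. Each FakeServer stands
// for one socket.io worker process with its own connected clients, and an
// EventEmitter stands in for Redis publish/subscribe.
const { EventEmitter } = require('events');

class FakeServer {
  constructor(name, bus) {
    this.name = name;
    this.clients = []; // clients connected to *this* process only
    this.bus = bus;    // shared channel (Redis in the real setup), or null
    if (bus) {
      // Every server delivers messages published by any server.
      bus.on('broadcast', (msg) => this.deliverLocally(msg));
    }
  }

  connect(clientId) {
    const client = { id: clientId, received: [] };
    this.clients.push(client);
    return client;
  }

  deliverLocally(msg) {
    for (const client of this.clients) client.received.push(msg);
  }

  broadcast(msg) {
    if (this.bus) {
      this.bus.emit('broadcast', msg); // shared store: reaches every process
    } else {
      this.deliverLocally(msg);        // memory only: reaches this process
    }
  }
}

// Counts how many clients, across all servers, received `msg`.
function countReceivers(servers, msg) {
  let n = 0;
  for (const s of servers) {
    for (const c of s.clients) if (c.received.includes(msg)) n++;
  }
  return n;
}

// Two workers with two clients each; worker-1 broadcasts "hello".
function demo(useSharedBus) {
  const bus = useSharedBus ? new EventEmitter() : null;
  const servers = [new FakeServer('worker-1', bus), new FakeServer('worker-2', bus)];
  servers[0].connect('a');
  servers[0].connect('b');
  servers[1].connect('c');
  servers[1].connect('d');
  servers[0].broadcast('hello');
  return countReceivers(servers, 'hello');
}

console.log('without shared store:', demo(false)); // 2
console.log('with shared store:', demo(true));     // 4
```

Without a shared channel only half of the clients see the broadcast; with one, every process hears the message and delivers it to its own clients, which is the role Redis pub/sub plays for RedisStore.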

The solution

var cluster = require('cluster'),
    // redis client bundled with socket.io 0.9.x
    redis = require('socket.io/node_modules/redis'),
    numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // master process: fork one child per CPU
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

} else {
  // child process: create the connections to Redis
  var RedisStore = require('socket.io/lib/stores/redis'),
      pub    = redis.createClient(),
      sub    = redis.createClient(),
      client = redis.createClient();

  // start socket.io with RedisStore instead of the default MemoryStore
  var io = require('socket.io').listen(5000, {
    'store': new RedisStore({
      redisPub: pub,
      redisSub: sub,
      redisClient: client
    })
  });

  io.sockets.on('connection', function (socket) {
    // all socket.on('eventname', ...) handlers go here
  });
}

That’s the most basic example: we fork one child process per CPU, and each of them starts a socket.io server on port 5000 (any other port works too). The cluster module lets all the children share that port. Each child connects to the local Redis server and, through it, communicates with the other processes.

A few clarifications

redis.createClient()

is actually a shortcut for

redis.createClient(6379, "127.0.0.1")

because the createClient function takes the port and the host as its first two arguments (an optional options object can be passed as the third):

createClient(port, host)
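If Redis doesn’t run on the same machine as your workers (for example, when several servers share one Redis instance), you pass the port and host explicitly. Below is a small sketch under assumptions: REDIS_HOST and REDIS_PORT are hypothetical environment-variable names, redisConnectArgs and createStoreClients are illustrative helpers, and the positional createClient(port, host) call matches the older redis client bundled with socket.io 0.9.x.

```javascript
// Hypothetical helper: reads the Redis location from environment variables
// (REDIS_HOST / REDIS_PORT are names chosen for this example) and falls back
// to the same defaults redis.createClient() uses: 6379 and "127.0.0.1".
function redisConnectArgs(env) {
  const port = env.REDIS_PORT ? parseInt(env.REDIS_PORT, 10) : 6379;
  const host = env.REDIS_HOST || '127.0.0.1';
  return [port, host];
}

// Hypothetical helper: creates the three clients RedisStore needs, all
// pointing at the same Redis server. Uses the positional
// createClient(port, host) form of the older redis client bundled with
// socket.io 0.9.x. Not called automatically here.
function createStoreClients(env) {
  const redis = require('socket.io/node_modules/redis');
  const [port, host] = redisConnectArgs(env);
  return {
    redisPub: redis.createClient(port, host),
    redisSub: redis.createClient(port, host),
    redisClient: redis.createClient(port, host)
  };
}
```

In a worker, the result can be passed straight to the store as new RedisStore(createStoreClients(process.env)), since its keys match the redisPub, redisSub and redisClient options from the example above. Newer versions of the redis package (v4 and later) take a single options object instead of positional arguments.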

Mysterious Code - Senior AWS, DevOps & security engineering