February 5, 2017

Dismissing Garbage collection in Node.js - A failed experiment

The aim is to fork a number of workers ((2 x num. of cores) + 1) from a master/parent process which manages them. When the resident set size of a worker exceeds the defined threshold, the master gracefully removes the worker from the cluster, shuts it down, spawns a new fork, and adds it back to the cluster. Why do this at all? Blind curiosity, an attempt to avoid GC pauses and, most importantly, inspiration from the Instagram Engineering team's article on dismissing garbage collection in Python. This experiment tries the same in Node.js: disable GC, rely on the pattern defined above, and run the application.

Take a look at the following code. In brief,

  • The application’s parent process (master) manages the workers; each worker executes whatever entry function you pass in
  • Create (2 x numCPUs) + 1 forks
  • Monitor the cluster every x seconds
  • Check if a worker’s RSS is >= the defined max RSS
    • If yes, remove the worker from the cluster and fork a new process to maintain balance
    • If no, do nothing for now and check back after x seconds
const cluster = require('cluster')
const numCPUs = require('os').cpus().length
const execSync = require('child_process').execSync

let woodpecker = {}

// `entry` should be a function which will be
// executed by every worker process
woodpecker.init = (entry, config) => {
  if (cluster.isMaster) {
    // If process is master, fork (2 x numCPUs) + 1 workers
    let numWorkers = (2 * numCPUs) + 1
    for (let i = 0; i < numWorkers; i++) {
      cluster.fork()
    }
    console.log(`${numWorkers} workers forked/spawned`)

    // Monitor the workers every `config.refreshInterval` ms
    setInterval(() => {
      for (const id in cluster.workers) {
        woodpecker.monitor(cluster.workers[id], config)
      }
    }, config.refreshInterval)
  } else {
    // Invoke entry function on all workers
    entry()
  }
}

// Fork a new worker
woodpecker.fork = () => {
  console.log('Forking/spawning a new worker')
  cluster.fork()
}

// Monitor function watches for RSS size
woodpecker.monitor = (worker, config) => {
  // There is no direct way to get the memory usage of a worker process from the master
  // Read more here - https://github.com/nodejs/help/issues/469
  let currentRss = parseInt(execSync(`awk '/Rss:/{ sum += $2 } END { print sum }' /proc/${worker.process.pid}/smaps`).toString().trim(), 10)

  // Check if the worker's RSS exceeds the defined max value (in kB, as reported by smaps)
  if (currentRss >= config.maxRssSize) {
    // Gracefully disconnect the worker from master
    worker.disconnect()

    // Force kill the worker after 4000 ms if it has not disconnected by then
    const forceKill = setTimeout(() => {
      worker.kill('SIGTERM')
      console.log(`Worker ${worker.id} killed | RSS ${currentRss}`)
    }, 4000)

    // If the worker disconnects gracefully, clear the `force kill` timeout
    worker.once('disconnect', () => {
      clearTimeout(forceKill)
      woodpecker.fork()
      console.log(`Worker ${worker.id} disconnected | RSS ${currentRss}`)
    })
  }
}

module.exports = woodpecker

Sweet! Let us use the above library in a simple application which

  • Creates an HTTP server
  • Allocates x bytes per request
  • Sends a simple response
let woodpecker = require('./woodpecker')
let http = require('http')
let cluster = require('cluster')

// Woodpecker config
let config = {
  app: {
    port: 8080
  },
  woodpecker: {
    maxRssSize: 80000, // ~80 MB per worker (value is in kB, as summed from /proc/<pid>/smaps)
    refreshInterval: 5000 // Monitor every 5000ms
  }
}

let entry = function () {
  http.createServer(function (request, response) {
    // Allocate roughly 55150 bytes per request (garbage that the GC can reclaim)
    let memAlloc = []
    for (let i = 0; i < 6e3; i++) {
      memAlloc.push(new Buffer(1))
    }

    response.writeHead(200, { 'Content-Type': 'application/json' })
    response.write(JSON.stringify({message: `I\'m from worker ${cluster.worker.id}`}))
    response.end()
  }).listen(config.app.port)
}

woodpecker.init(entry, config.woodpecker)

We run this application both with GC effectively disabled and with the default GC enabled, and benchmark the results. After a bit of asking around, I realized there is no straightforward flag in v8 to disable GC. To explain v8 GC in simple terms, it does two main things: Scavenge, which cleans up the young generation of objects in semi-space, and Mark-Sweep, which cleans up old space, the objects that survived (were promoted by) the Scavenger. Scavenge runs often and is very quick (under roughly 10 ms), but Mark-Sweep is costlier, and when it runs, it blocks the application. So the workaround is to set the old space and semi-space sizes high enough that the master kills the worker well before the heap reaches the limit at which Mark-Sweep would kick in. Read more about v8 GC here - http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection.

I ran this on a Google Cloud VM with 3.6 GB of memory and 4 CPUs. Now, let us bombard this guy by running nperf -c 4 -n 100000 http://x.x.x.x:8080 from another Google Cloud VM instance.
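As an aside, if you want to observe these GC pauses yourself, here is a minimal sketch (assuming a Node.js version that ships perf_hooks, i.e. 8.5 or later) which logs the duration of every GC pause so the cost of Scavenge and Mark-Sweep can be compared:

const { PerformanceObserver } = require('perf_hooks')

// Log the duration of every GC pause this process performs
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // entry.duration is the pause length in milliseconds
    console.log(`GC pause: ${entry.duration.toFixed(2)} ms`)
  }
})
obs.observe({ entryTypes: ['gc'] })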

Note: nperf is a nifty little CLI tool to load test web servers over HTTP. URL: https://github.com/zanchin/node-http-perf

Surprising results!

With the default GC config, i.e. node test.js

  • Statuses: { '200': 100000 }
  • Minimum time to respond: 1 ms
  • Maximum time to respond: 1942 ms
  • Average time to respond: 5.53 ms
  • Rate: 612.81 requests/second
  • Total time: 163183 ms

Process info refreshed every 2000ms - https://asciinema.org/a/5cjsd6myj8g0fgyq8cgbz80ta

Without Mark-Sweep, i.e. node --max-old-space-size=100 --max-semi-space-size=64 --noconcurrent_sweeping example.js

  • Statuses: { '200': 100000 }
  • Minimum time to respond: 1 ms
  • Maximum time to respond: 2863 ms
  • Average time to respond: 5.00 ms
  • Rate: 712.69 requests/second
  • Total time: 140314 ms

Process info refreshed every 2000ms - https://asciinema.org/a/531whaoe5ctudx0jd1rw0fzzh

Alright! Let's raise the threshold by increasing the defined max RSS and the memory allocated per request. Uh oh! When a worker disconnection happens, certain requests start failing. Of 100000 requests, only 99360 succeeded. We can't ignore a single failed request in production. After a bit more research, I figured out the reason. The point of a preload-fork process model is that the fork syscall gives you copy-on-write: in simple terms, all processes (master and workers) share the same physical memory pages until one of them writes to a page, which saves the overhead of each child loading everything again the normal way. In those cases, disabling GC and killing workers made sense because you could start another one up practically instantly, with basically zero cost in time or space. But the Node.js fork API is a misnomer. It doesn't use the fork syscall at all. It basically starts up a completely new process and opens an IPC pipe between the parent and the child. There's none of the CoW memory sharing that you get from the syscall fork. Real fork isn't implemented, yet. A related archive here - https://github.com/nodejs/node-v0.x-archive/issues/2334
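To make the difference concrete, here is a minimal sketch (hypothetical, not part of woodpecker) showing that cluster.fork() re-runs the module in a brand-new process, so module-level state set by the master is not inherited the way it would be after a real fork(2):

const cluster = require('cluster')

let counter = 0 // after a real fork(2), the child would inherit the parent's copy of this

if (cluster.isMaster) {
  counter = 42
  cluster.fork()
  console.log(`master ${process.pid} sees counter = ${counter}`) // prints 42
} else {
  // The worker re-executed this module from the top in a fresh process,
  // so it never saw the master's assignment above
  console.log(`worker ${process.pid} sees counter = ${counter}`) // prints 0
}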

Thus, this experiment failed :D But it was fun all along to learn and understand how these things actually work! All code can be found here - https://github.com/dolftax/woodpecker

Copyleft Jaipradeesh