Traffic-shaping WebSockets using Node.js and RedisOn January 24, 2018 by Ilene
Even in the so-called modern web application, AJAX HTTP requests fly back and forth between the client and server layers, moving data in both directions. HTTP requests, under the guise of REST protocols, also serve as the transport between some databases and the server application layers.
WebSockets have been around for years now, and they have wide client-side support at this point. The best part is that, whereas HTTP is a request/response protocol, WebSockets support full-duplex communication, allowing clients and servers to send and receive data at the same time.
WebSockets can be used for peer-to-peer communications between clients, but they are most often used for pushing data to and from a server. WebSockets open up a whole new world of less structured applications than would be possible using only request/response HTTP transports.
Most architectures are still request/response, however. Databases—Redis included—work in a request/response fashion (with the exception of special mechanisms like Pub/Sub, of course). Even computational activities are request/response in nature: You send data in and wait for data to come out. Applying request/response to a full-duplex transport like WebSockets presents some tricky challenges.
Now, this often isn’t a problem if the server handling the requests just does something internally with a Node.js script. Node.js is single-threaded, and because there is very little resource overhead to responding to a WebSocket event, the server generally handles the flood of events without issue. The problem arises when managing this type of traffic using an external request/response service, which you’ll find in most stacks these days.
In this article, we’ll explore the problem of managing requests and responses in the context of full-duplex communication (WebSockets), and illustrate how to cope with it using an example application built with Node.js and Redis.
Managing request/response traffic at full-duplex
A common pattern used to achieve request/response over a full-duplex transport is to assign each request an ID (most often randomized). The “conversation” can be thought of like this (not accounting for acknowledgements):
Client: This is request #1: I want to know the price of Item #400.
Server: Request #1, your response is $4
Well, what happens when your client requests many things at one time?
Client: This is request #1: I want to know the price of Item #400.
Client: This is request #2: I want to know the price of Item #380.
Client: This is request #3: I want to know the price of Item #385.
[before server can respond]
Client: This is request #200,000: I want to know the price of Item #999999.
We need a way to regulate these requests so that the server can process them on its own time.
On the packet level, such a solution is referred to as packet shaping, in which networks delay identified packets to improve overall quality of service or achieve some other objective. When applied to higher-level requests like WebSockets, this pattern is known as traffic shaping. Traffic shaping, and a related operation, metering, are fundamental needs of high-volume services.
Let’s say we have a simple microservice that takes pricing requests and gives prices. This service is capable of handling 1,000 pricing requests per second, but beyond that stability is risked. Your normal traffic makes about 500 pricing requests per second, so a ceiling of 1,000 is just fine. Your previous architecture wasn’t as quick and had built-in limits that slowed things significantly as you approached the pricing request service limits, but by employing WebSockets, you’ve reduced the HTTP header overhead and network latencies while improving overall speed. Unfortunately, you’ve also introduced a situation where the stability of your service is at risk—the pricing request is now your bottleneck.
What you need is a way to programmatically delay those requests. If the pricing service is keeping up with the requests coming in, you want minimal to no impact. When a flood of requests comes in that is beyond the capacity of the pricing service, you want to delay the requests. Ideally, this would not be a “brick wall” of a delay, but one that gradually increases as you approach the limit.
Because WebSockets are full-duplex, having a delayed response will incur little to no cost to the client side (aside from the intentional delay). Conversely, most browsers limit HTTP requests significantly. On the server side, the delay incurs some memory cost while the request is being held, but it’s minimal. Effectively, we need to count the number of in-progress requests and delay incoming requests as needed.
Traffic shaping WebSockets: Enter Redis
Redis has a number of strengths we can leverage in our solution to protect our service against an overwhelming spike in requests:
- Low-overhead and low-latency protocol
- Variety of structures that can be directly manipulated
- Can be distributed, clustered, and persisted
- Able to handle many connected clients
These strengths make Redis a good fit for our counting needs. Let’s step through the process of accepting and responding to a WebSocket request.
- Client: This is request #1: I want to know the price of Item #400.
- [Server accepts message, looks up how many requests are in-progress, and increments the count by one. The new in-progress count is then used to calculate the delay.]
- [Server initiates delay, calculated in step 2]
- [Server gets data from pricing service]
- Server: Request #1, your response is $4
- [Server reduces count of in-progress request by one]
Redis would be used in steps 2 and 6 in this description, acting as a distributed counter. The request to the pricing server would not be made until after the delay, and if calculated correctly, would prevent an onslaught of traffic killing the pricing service. We’re not actually serving in a series but we are allowing some concurrency in the full-duplex requests. Because we’re using Redis to keep track of the counter, this would operate within a stateless environment (behind a load balancer/proxy to any number of Node.js instances).
To calculate the delay in sending the request to the pricing server in our example, we’ll use a singular variable: the number of delays not yet resolved. With this variable, you can create your own traffic shape. For example, let’s say that you want to be able to handle the first three requests with no delay then linearly increase your delay by 50 milliseconds:
return currentCount <= 3 ? 0 : (currentCount-3) * 50;
currentCount is less than three, immediately respond (0ms delay). Otherwise, delay each additional request by 50ms. For example:
Request 1: 0 delay
Request 2: 0 delay
Request 3: 0 delay
Request 4: 50ms delay
Request 10: 350ms delay
It is important to understand that these numbers represent the current delay. As requests are responded to, the
currentCount is decremented accordingly. Take this for example:
Request 10: 350ms delay (currentCount = 10)
Request 9 delay is resolved (currentCount = 9)
Request 11: 350ms delay (currentCount = 10)
Let’s take another example. In this situation, we’ll work with batches of three requests at a time:
return Math.floor(currentCount / 3) * 50;
As with the previous example, the first three request have no delay, but requests four through six would have the same delay (50ms), requests seven through nine would have the same delay (100ms), and so on.
As you can imagine, you could be very creative in approaching the shape of incoming traffic.
The other thing to keep in mind is that, in this situation, we’re working with zero knowledge of the state of the pricing server. Our delays are based only on the number of in-progress delays. In many situations this might work fine, but it’s also possible to create complex functions that bring in data from other sources such as network traffic, CPU load, network health, user plans/tiers, etc. Each of these variables can be reported into Redis and then your traffic shaping application can perform the calculations accordingly.
Traffic shaping WebSockets: Node.js-Redis example app
Simulating an entire network and traffic on a single machine is fraught with challenges. So let’s set up some limits:
- A loopback interface is fast. The delays are intentionally set high so we can see what is going on. In production, your delays would be much smaller (and your traffic much larger).
- To better visualize and contain the example, we are limiting the number of different resources being shaped.
- While you would normally get traffic in a natural, consistent pattern, our simulation interface is limited to four “phases” that represent bursts of traffic.
The example application is a small Node.js/Express server. (You can download the code for this application at https://github.com/stockholmux/redis-traffic-shape.) To get started you’ll pass in a Node_Redis connection object in a JSON file to your server, like this:
$ node socket-shape.node.js --connection ../connection.json
This will launch an HTTP server at port 9000 and a WebSocket server at port 8855. (You can change these with the command line arguments
wssport respectively.) Point your browser at http://localhost:9000. This will also start us out with a simple linear delay logic wherein each item is delayed based on the number of active delays (portrayed as a diagonal line pointing up and to the right on multiple requests).
This little dashboard might seem a bit daunting at first, but it’s just a way to visualize what’s going on with the delays. At the left is a graph that will plot the sequence of requests on the x axis and the delays on the y axis. On the right is the table output of the same data. The Resource column value correlates with the red buttons — these are independent cumulative delays. So, requesting 1 many times will only make additional requests to 1 further delayed. The Status column reports the query status, which is either started or finished. The Round Trip column indicates the time elapsed between the request being sent and the response (roughly equating to the delay). The Finish column lists the time elapsed between clicking the Play button and the completion of the request. Finally, the Phase column represents which burst the request was sent out in.
The Play button will send out the bursts as specified in the buttons below. The Reset button zeros out the requests. The next three buttons are canned request patterns:
- Traffic -> 1 destination fills each burst phase with 10 copies of one resource. In short, it will request the same resource 40 times, split into four phases.
- Traffic -> All destinations fills burst phases 1 through 3 with 10 copies of resources 1 through 3 (respectively).
- Traffic -> Mixed destinations fills each phase with equal numbers of resources 1 through 3 (20 total in each phase).
Finally, you have each burst phase in separate columns. The red buttons represent an action that will be sent in a burst. Each item will be sent in order, but it won’t wait for a response. If you put 10 items in, then all 10 items will be sent as rapidly as possible once the Play button is pressed. If you’ve ever played with a drum machine, this interface works similarly (without the repeating or never-ending racket).
Let’s look at the operation. We’re operating in linear mode (which is the default). I’ll load the burst phases up by clicking the “Traffic -> 1 destination” button and pressing Play. Here is the resulting graph:
Note the characteristic sawtooth pattern. The dips are due to the burst phases being 500ms apart, during which time all the delays have finished.
Let’s examine the next pattern. You can press the Reset button (or just reload the page) and use the “Traffic -> All destinations” button to populate the burst phase columns. Now press Play and you should get something that looks like this:
The different colors represent the different resources. Notice how the requests for each resource are issued serially. That is, the requests for different resources are not interfering with one another.
Reset the dashboard again and let’s check out the final pattern “Traffic -> Mixed destinations.” This graph looks a lot like the first, but illustrates how each resource is counted independently, apart from the others:
You might notice that the lines don’t perfectly match up. This is a side effect of a few things:
- Since the traffic generator is being run on the same machine as the web browser, server process, and Redis process, you’re likely seeing some resource fighting already — either on the loopback or in the system as a whole.
- js is a “garbage collected” language. It’s possible that GC is occurring at some point during accepting or responding to these messages.
- The script is relying on both
setInterval, which often lack precision (a lesson that is hard learned if you’ve ever tried to coordinate animations on the front end).