I am having a problem with a Node.js server using Fastify.
At some point during the execution of a request, the server seems to be closing the connection and the client is getting a socket hang up error.
The logic in the server is:
A Fastify route handler calling a service.
The service sending an HTTP request using Axios to get certain information. The service implements a retry mechanism and waits 15 seconds before each attempt to make sure the information is available.
The code is as follows:
Fastify server:
fastify.post('/request', async (request, reply) => {
try {
const result = await service.performOperation(request.body);
return result;
} catch(error) {
console.error('Error during operation: %s', error.toString());
throw error;
}
})
fastify.addHook('onError', (request, reply, error, done) => {
console.error('onError hook: %o', error);
done();
})
Service:
async function performOperation(request) {
let attempt = 0;
let latestErrorMessage;
while(attempt++ < 5) {
try {
await waitBeforeAttempt();
return await getInfoFromServer(request);
} catch (error) {
latestErrorMessage = getErrorMessage(error);
if (attempt < 5) {
console.log(`Re-attempting after error: ${latestErrorMessage}`);
}
}
}
throw new Error(`Error after 5 attempts. Last error: ${latestErrorMessage}`);
}
function waitBeforeAttempt() {
return new Promise(resolve => setTimeout(resolve, 15000));
}
async function getInfoFromServer(request) {
const response = await axios.post('http://localhost:3000/service', request, {timeout: 120000});
return response.data.toString();
}
The problem is that the server seems to be closing the connection.
According to the logs, this happens after the 15-second wait and before the Axios call, i.e. before the first attempt has finished.
You can see in the logs that after the connection is closed, the logic carries on and completes all the attempts with no problems whatsoever.
There is nothing in the logs explaining why the connection is closed, not even from the Fastify onError hook declared above.
Nothing from Axios either; I assume any timeout would throw an exception and be logged.
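For reference, this is how I'd expect an Axios timeout to surface in the retry loop (a minimal sketch; the exact error code depends on the Axios version, so treat the codes below as an assumption to verify):
const axios = require('axios');
async function probe() {
  try {
    await axios.post('http://localhost:3000/service', {}, { timeout: 120000 });
  } catch (error) {
    // Axios timeouts reject the promise with code ECONNABORTED (classic) or ETIMEDOUT (newer versions),
    // so they would show up in the retry log rather than silently dropping the socket.
    if (axios.isAxiosError(error) && (error.code === 'ECONNABORTED' || error.code === 'ETIMEDOUT')) {
      console.log('Axios timed out:', error.message);
    } else {
      console.log('Other error:', error.message);
    }
  }
}
probe();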
Important note
I noticed that connections are not dropped if I change the waitBeforeAttempt implementation to use a busy wait instead of setTimeout, i.e.:
function waitBeforeAttempt() {
const start = new Date();
let now;
while (true) {
now = new Date();
if (now - start >= 15000) {
break;
}
}
}
Is there anything I'm doing wrong that is causing the connections to be dropped? Perhaps the 15-second wait is too long? I have other setTimeout calls in the code via Puppeteer (same implementation as mine) that don't seem to cause the problem.
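A small diagnostic that can help confirm whether the client (or a proxy in between) is dropping the connection is to log when the underlying socket closes before a reply has been sent; a sketch (the hook is standard Fastify, the log wording is an example):
fastify.addHook('onRequest', (request, reply, done) => {
  request.raw.on('close', () => {
    // reply.sent becomes true once Fastify has written the response;
    // a 'close' before that means the socket was torn down early.
    if (!reply.sent) {
      console.warn('Connection closed before a response was sent for request %s', request.id);
    }
  });
  done();
});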
Just answering my own question. The problem turned out to have nothing to do with the wait or the timeouts.
It wasn't happening when the Node.js service was running locally; it only happened intermittently when running on Kubernetes + Nginx.
Nginx was simply restarting for no apparent reason.
Nginx has been updated and the issue no longer occurs.
Related
The problem
FetchError: request to https://direct.faforever.com/wp-json/wp/v2/posts/?per_page=10&_embed&_fields=content.rendered,categories&categories=638 failed, reason: connect ECONNREFUSED
I'm doing some API calls for a website using fetch. Usually there are no issues: when a request "fails", the catch block gets it and my website continues to run. However, when the server that hosts the API is down/off, my fetch calls crash the website entirely (despite being wrapped in try/catch).
As far as I'm concerned, shouldn't the catch block "catch" the error and continue to the next call? Why does it crash everything?
My wanted solution
For the website to just move on to the next fetch call / just catch the error and try again when the function is called again (rather than crashing the entire website).
The code
Here is an example of my fetch API call (process.env.WP_URL is https://direct.faforever.com):
async function getTournamentNews() {
try {
let response = await fetch(`${process.env.WP_URL}/wp-json/wp/v2/posts/?per_page=10&_embed&_fields=content.rendered,categories&categories=638`);
let data = await response.json();
//Now we get a js array rather than a js object. Otherwise we can't sort it out.
let dataObjectToArray = Object.values(data);
let sortedData = dataObjectToArray.map(item => ({
content: item.content.rendered,
category: item.categories
}));
let clientNewsData = sortedData.filter(article => article.category[1] !== 284);
return await clientNewsData;
} catch (e) {
console.log(e);
return null;
}
}
Here's the whole code (the extractor file), which is called from express.js at line 246:
Extractor / Fetch API Calls file
https://github.com/FAForever/website/blob/New-Frontend/scripts/extractor.js
Express.js file in line 246
https://github.com/FAForever/website/blob/New-Frontend/express.js#:~:text=//%20Run%20scripts%20initially%20on%20startup
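Without seeing the full call chain it's hard to be certain, but a common reason a try/catch like the one above still takes the whole site down is a rejection on a promise that nobody awaits (for example a fire-and-forget call from startup code), which becomes an unhandled rejection and terminates newer Node versions. A defensive sketch, not specific to this codebase:
// Global safety net so one failed background call cannot take the whole site down.
// It complements, rather than replaces, local try/catch blocks.
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled promise rejection:', reason);
});
// When kicking off background refreshes (e.g. on startup), attach a catch
// so a rejected promise never escapes:
getTournamentNews().catch((e) => console.error('Tournament news refresh failed:', e));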
My example is trivial. In the real scenario I want to catch a connection timeout (in case the internet connection has been lost) so the program is not terminated (the connection may come back eventually). I was able to reproduce my problem in a simple snippet:
setInterval(async () => {
try {
setTimeout(() => {
throw new Error('something bad happened');
}, 2000);
} catch (error) {
console.log(`Caught`); // THIS CODE IS NEVER REACHED
}
}, 5 * 1000);
My program crashes and the Node process is terminated. I DON'T want this. How do I catch the error so the program keeps running?
I am trying to run this on Node.js v8.11.4
Add
process.on('uncaughtException',(err)=>{
console.log(err);
})
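As an alternative to the global handler, the snippet above can be restructured so the error is thrown inside the async function and the existing try/catch works; a small sketch:
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
setInterval(async () => {
  try {
    await delay(2000);
    // The error is now thrown inside the async function, so catch reaches it.
    throw new Error('something bad happened');
  } catch (error) {
    console.log('Caught:', error.message);
  }
}, 5 * 1000);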
I'm using Angular's HttpClient to make HTTP requests and I'd like to specify a timeout for them.
I know I can use HttpInterceptors and add a timeout via RxJS operators; however, these apply to the whole request, which I don't want to abort if data transfer is in progress, only if the browser hangs while trying to connect.
The kind of timeout I need is available in Node.js, for example, and is well explained here:
Let's say you called socket.setTimeout(300) to set the timeout as 300
ms, and it took 100 ms for DNS lookup, 100 ms for making a connection
with a remote server, 200 ms for the remote server to send response
headers, 50 ms for transferring the first half of the response body
and another 50 ms for the rest. While the entire request & response
took more than 500 ms, timeout event is not emitted at all.
Is it possible to have a timeout like this in an Angular app?
I looked at the source code for the HttpClient. The code that actually deals with the underlying XMLHttpRequest is the class HttpXhrBackend, in the source file xhr.ts.
Unfortunately, HttpXhrBackend just uses the default settings of XMLHttpRequest and does not provide a way to set the XMLHttpRequest's timeout value.
I have seen suggestions for using RxJS operators to shorten the effective timeout,
but that's a bit of a hack, and doesn't really do what you are asking for.
So, technically, the answer to your question is "No", not with the stock Angular HttpClient, but I suppose that you could create your own implementation of HttpBackend and attempt to inject that.
P.S. This article shows how to provide a custom HttpBackend implementation.
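For completeness, the RxJS-operator workaround mentioned above looks roughly like this; note that it bounds the whole request rather than just the connection phase, and the URL and 5-second value are placeholders:
import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';
import { of } from 'rxjs';
import { timeout, catchError } from 'rxjs/operators';

@Injectable({ providedIn: 'root' })
export class DataService {
  constructor(private http: HttpClient) {}

  getData() {
    // Emits a TimeoutError if the request takes longer than 5 s overall,
    // then falls back to an empty value instead of crashing the stream.
    return this.http.get('/api/data').pipe(
      timeout(5000),
      catchError(() => of(null))
    );
  }
}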
For those still seeking a 'hacky' solution you can create an observable and insert an empty/failure after your desired timeout period:
handleError(error: HttpErrorResponse) {
console.warn('HTTPErrorResponse caught', error);
return observableOf({});
}
async __sendCommandHTTP(cmd: SoftAPCommand) {
const URL = this.options.host + cmd.name;
let result: SoftAPResponse = {
name: cmd.name,
payload: {},
error: false,
};
this.logger.debug('[softap-setup starting request');
await new Promise(resolve => {
const httpEvent: Subject<any> = new Subject<any>();
let returned = false;
const sub = this.http
.get<any>(URL, {})
.pipe(catchError(this.handleError))
.subscribe(data => httpEvent.next(data));
// Our cheeky method to ensure a timeout
setTimeout(async () => {
if (!returned) {
this.logger.info('[softap-setup] timeout on ' + result.name);
httpEvent.next({});
}
}, 5000);
httpEvent.subscribe(data => {
this.logger.info('[softap-setup] response ', data);
returned = true;
switch (cmd.name) {
case 'scan-ap':
if (Object.prototype.hasOwnProperty.call(data, 'scans') && data.scans.length) {
result.payload = data.scans;
} else {
result.error = true;
}
break;
default:
result.payload = data;
break;
}
httpEvent.complete();
resolve();
});
});
return result;
}
Basically, either the response or the timeout flags that there has been a result. The handleError function also neatly handles any eventual errors that may come along, i.e. the host isn't available. You could apply other logic in there or even pass along the HttpErrorResponse object.
I'm trying to gracefully handle redis errors, in order to bypass the error and do something else instead of crashing my app.
But so far I couldn't catch the exception thrown by ioredis: it bypasses my try/catch and terminates the current process, which doesn't let me handle the error gracefully and fetch the data from an alternative system (instead of redis).
import { createLogger } from '@unly/utils-simple-logger';
import Redis from 'ioredis';
import epsagon from './epsagon';
const logger = createLogger({
label: 'Redis client',
});
/**
* Creates a redis client
*
* @param url Url of the redis client, must contain the port number and be of the form "localhost:6379"
* @param password Password of the redis client
* @param maxRetriesPerRequest By default, all pending commands will be flushed with an error every 20 retry attempts.
* That makes sure commands won't wait forever when the connection is down.
* Set to null to disable this behavior, and every command will wait forever until the connection is alive again.
* @return {Redis}
*/
export const getClient = (url = process.env.REDIS_URL, password = process.env.REDIS_PASSWORD, maxRetriesPerRequest = 20) => {
const client = new Redis(`redis://${url}`, {
password,
showFriendlyErrorStack: true, // See https://github.com/luin/ioredis#error-handling
lazyConnect: true, // XXX Don't attempt to connect when initializing the client, in order to properly handle connection failure on a use-case basis
maxRetriesPerRequest,
});
client.on('connect', function () {
logger.info('Connected to redis instance');
});
client.on('ready', function () {
logger.info('Redis instance is ready (data loaded from disk)');
});
// Handles redis connection temporarily going down without app crashing
// If an error is handled here, then redis will attempt to retry the request based on maxRetriesPerRequest
client.on('error', function (e) {
logger.error(`Error connecting to redis: "${e}"`);
epsagon.setError(e);
if (e.message === 'ERR invalid password') {
logger.error(`Fatal error occurred "${e.message}". Stopping server.`);
throw e; // Fatal error, don't attempt to fix
}
});
return client;
};
I'm simulating a bad password/url in order to see how redis reacts when misconfigured. I've set lazyConnect to true in order to handle errors on the caller.
But, when I define the url as localhoste:6379 (instead of localhost:6379), I get the following error:
server 2019-08-10T19:44:00.926Z [Redis client] error: Error connecting to redis: "Error: getaddrinfo ENOTFOUND localhoste localhoste:6379"
(x 20)
server 2019-08-10T19:44:11.450Z [Read cache] error: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
Here is my code:
// Fetch a potential query result for the given query, if it exists in the cache already
let cachedItem;
try {
cachedItem = await redisClient.get(queryString); // This emit an error on the redis client, because it fails to connect (that's intended, to test the behaviour)
} catch (e) {
logger.error(e); // It never goes there, as the error isn't "thrown", but rather "emitted" and handled by redis its own way
epsagon.setError(e);
}
// If the query is cached, return the results from the cache
if (cachedItem) {
// return item
} else {} // fetch from another endpoint (fallback backup)
My understanding is that redis errors are handled through client.emit('error', error), which is asynchronous; the callee doesn't throw, so the caller can't handle errors using try/catch.
Should redis errors be handled in a very particular way? Isn't it possible to catch them as we usually do with most errors?
Also, it seems redis retries the connection 20 times (by default) before throwing a fatal exception (and the process is stopped), but I'd like to handle any exception and deal with it my own way.
I've tested the redis client behaviour by providing bad connection data, which makes it impossible to connect as there is no redis instance available at that URL; my goal is ultimately to catch all kinds of redis errors and handle them gracefully.
Connection errors are reported as an error event on the client Redis object.
According to the "Auto-reconnect" section of the docs, ioredis will automatically try to reconnect when the connection to Redis is lost (or, presumably, unable to be established in the first place). Only after maxRetriesPerRequest attempts will the pending commands "be flushed with an error", i.e. get to the catch here:
try {
cachedItem = await redisClient.get(queryString); // This emit an error on the redis client, because it fails to connect (that's intended, to test the behaviour)
} catch (e) {
logger.error(e); // It never goes there, as the error isn't "thrown", but rather "emitted" and handled by redis its own way
epsagon.setError(e);
}
Since you stop your program on the first error:
client.on('error', function (e) {
// ...
if (e.message === 'ERR invalid password') {
logger.error(`Fatal error occurred "${e.message}". Stopping server.`);
throw e; // Fatal error, don't attempt to fix
...the retries and the subsequent "flushing with an error" never get the chance to run.
Ignore (don't rethrow) the errors in the client.on('error') handler, and you should get the error from await redisClient.get().
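A sketch of what that looks like in practice, i.e. logging in the error handler without rethrowing and letting the flushed error reach the try/catch (fetchFromFallback is a placeholder for the alternative system):
client.on('error', function (e) {
  // Log and keep going: rethrowing here kills the retry/flush cycle.
  logger.error(`Error connecting to redis: "${e}"`);
  epsagon.setError(e);
});

async function getCached(queryString) {
  try {
    return await redisClient.get(queryString);
  } catch (e) {
    // Reached once maxRetriesPerRequest is exhausted and the command is flushed with an error.
    logger.error(e);
    return fetchFromFallback(queryString); // placeholder: fetch from the alternative endpoint
  }
}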
Here is what my team has done with IORedis in a TypeScript project:
import Redis from 'ioredis';
import chalk from 'chalk';

let redis;
const redisConfig: Redis.RedisOptions = {
port: parseInt(process.env.REDIS_PORT, 10),
host: process.env.REDIS_HOST,
autoResubscribe: false,
lazyConnect: true,
maxRetriesPerRequest: 0, // <-- this seems to prevent retries and allow for try/catch
};
try {
redis = new Redis(redisConfig);
const infoString = await redis.info();
console.log(infoString)
} catch (err) {
console.log(chalk.red('Redis Connection Failure '.padEnd(80, 'X')));
console.log(err);
console.log(chalk.red(' Redis Connection Failure'.padStart(80, 'X')));
// do nothing
} finally {
await redis.disconnect();
}
Looking at the example given on the Node.js domain documentation page (http://nodejs.org/api/domain.html), the recommended way to restart a worker when using cluster is to first call disconnect in the worker, and to listen for the disconnect event in the master. However, if you just copy/paste the example given, you will notice that the disconnect() call does not shut down the current worker:
What happens here is:
try {
var killtimer = setTimeout(function() {
process.exit(1);
}, 30000);
killtimer.unref();
server.close();
cluster.worker.disconnect();
res.statusCode = 500;
res.setHeader('content-type', 'text/plain');
res.end('Oops, there was a problem!\n');
} catch (er2) {
console.error('Error sending 500!', er2.stack);
}
I do a GET request at /error.
A timer is started: in 30 s the process will be killed if it hasn't exited already.
The HTTP server is shut down.
The worker is disconnected (but still alive).
The 500 page is displayed.
I do a second GET request at /error (before the 30 s elapse).
A new timer is started.
The server is already closed => an error is thrown.
The error is caught in the "catch" block and no response is sent back to the client, so on the client side the page keeps waiting without any message.
In my opinion, it would be better to just kill the worker and listen for the 'exit' event in the master to fork again. This way, the 500 error is always sent when an error occurs:
try {
var killtimer = setTimeout(function() {
process.exit(1);
}, 30000);
killtimer.unref();
server.close();
res.statusCode = 500;
res.setHeader('content-type', 'text/plain');
res.end('Oops, there was a problem!\n');
cluster.worker.kill();
} catch (er2) {
console.error('Error sending 500!', er2);
}
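The master-side counterpart of that approach is small; a sketch using the standard cluster API (the logging is mine):
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (let i = 0; i < numCPUs; i++) cluster.fork();

  cluster.on('exit', function (worker, code, signal) {
    console.log('Worker %d died (%s), forking a new one', worker.process.pid, signal || code);
    cluster.fork(); // replace the dead worker so capacity stays constant
  });
}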
I'm not sure about the side effects of using kill instead of disconnect, but it seems disconnect waits for the server to close; however, that doesn't seem to work (at least not the way it should).
I would just like some feedback on this. There may be a good reason this example is written this way that I've missed.
Thanks
EDIT:
I've just checked with curl, and it works well.
However, I was previously testing with Chrome, and it seems that after receiving the 500 response, Chrome makes a second request BEFORE the server actually finishes closing.
In this case, the server is closing but not yet closed (which means the worker is also disconnecting without being disconnected), so the second request is handled by the same worker as before, and:
It prevents the server from finishing its close.
When the server.close(); line is evaluated a second time, it throws an exception because the server is already closing.
All following requests trigger the same exception until the killtimer callback is called.
I figured it out: when the server is closing and receives a request at the same time, it stops its closing process.
So it still accepts connections, but can never finish closing.
Even without cluster, this simple example illustrates this:
var PORT = 8080;
var domain = require('domain');
var server = require('http').createServer(function(req, res) {
var d = domain.create();
d.on('error', function(er) {
try {
var killtimer = setTimeout(function() {
process.exit(1);
}, 30000);
killtimer.unref();
console.log('Trying to close the server');
server.close(function() {
console.log('server is closed!');
});
console.log('The server should not now accepts new requests, it should be in "closing state"');
res.statusCode = 500;
res.setHeader('content-type', 'text/plain');
res.end('Oops, there was a problem!\n');
} catch (er2) {
console.error('Error sending 500!', er2);
}
});
d.add(req);
d.add(res);
d.run(function() {
console.log('New request at: %s', req.url);
// error
setTimeout(function() {
flerb.bark();
});
});
});
server.listen(PORT);
Just run:
curl http://127.0.0.1:8080/ http://127.0.0.1:8080/
Output:
New request at: /
Trying to close the server
The server should not now accepts new requests, it should be in "closing state"
New request at: /
Trying to close the server
Error sending 500! [Error: Not running]
Now single request:
curl http://127.0.0.1:8080/
Output:
New request at: /
Trying to close the server
The server should not now accepts new requests, it should be in "closing state"
server is closed!
So with Chrome making one more request (for the favicon, for example), the server is not able to shut down.
For now I'll keep using worker.kill(), which makes the worker exit without waiting for the server to stop.
I ran into the same problem around 6 months ago; sadly I don't have any code to demonstrate, as it was from my previous job. I solved it by explicitly sending a message to the worker and calling disconnect at the same time. Disconnect prevents the worker from taking on new work, and in my case, since I was tracking all the work the worker was doing (it was for an upload service with long-running uploads), I was able to wait until all of it had finished and then exit with 0.
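I no longer have the original code, but the shape of that solution was roughly the following (a reconstruction, with the work tracking reduced to a simple counter):
const cluster = require('cluster');

if (cluster.isMaster) {
  const worker = cluster.fork();
  // Replace workers that exit so the service keeps its capacity.
  cluster.on('exit', () => cluster.fork());
  // Later, when a worker needs recycling:
  worker.send({ cmd: 'shutdown' });
  worker.disconnect(); // stop routing new connections to it
} else {
  let inFlight = 0; // incremented when an upload starts, decremented when it ends
  process.on('message', (msg) => {
    if (msg && msg.cmd === 'shutdown') {
      const timer = setInterval(() => {
        if (inFlight === 0) {      // all tracked work has finished
          clearInterval(timer);
          process.exit(0);         // exit cleanly; the master forks a replacement
        }
      }, 1000);
    }
  });
}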