I am learning to use NightmareJS and found it very slow when running the code below: it took up to 30 seconds to produce the output. Did I do anything wrong?
I have also tried using wait() with a selector, but that did not help much.
I am not sure whether this is related to my Internet connection. However, opening the same site in Google Chrome and performing the same task is faster than using Nightmare.
Source Code
var Nightmare = require('nightmare');
var after;
var before = Date.now();
new Nightmare({
loadImages: false
}).goto('https://www.wikipedia.org/')
.type('#searchInput', process.argv[2])
.click('input[name="go"]')
.wait()
.url(function(url) {
after = Date.now();
console.log('>>> [' + (after - before) / 1000.0 + 's] ' + url);
})
.run(function(err, nightmare) {
if (err) console.log(err);
});
Output
node n02_extract_wiki_link.js "node.js"
>>> [31.227s] https://en.wikipedia.org/wiki/Node.js
My current environment is listed below.
Mac OS X 10.10.4
node v0.12.5
PhantomJS 2.0.0
nightmare#1.8.2
This worked for me:
https://github.com/segmentio/nightmare/issues/126#issuecomment-75944613
It's the socket connection between the phantomjs module and its dependency, shoe.
You can edit shoe manually: open node_modules/phantom/shoe/index.js and change line 8 to read
var server = sockjs.createServer({
heartbeat_delay : 200
});
I have a simple nodejs express server where the first request after a long idle period is extremely slow, e.g. 3-4 minutes. The same or a similar request takes only a few milliseconds the second and third time.
I took a look at this page - https://expressjs.com/en/advanced/best-practice-performance.html and have done the following things -
Use gzip compression
Set NODE_ENV to production
But I still run into the issue where the first request is extremely slow.
The server is doing the following -
At startup I read from a large text file that contains a list of strings. Each of these strings is added to an array. The size of the array is normally around 3.5 million entries.
Users provide a string input and I loop over all the entries in the array and search for matches using indexOf and return the matches.
I have also tried increasing the memory for the server --max-old-space-size from 4096 to 8192 but this does not help. I am new to using nodejs/express, please let me know if there is anything else I need to consider/look into.
Thanks.
Here is the source -
var compression = require('compression')
const express = require('express')
var cors = require('cors')
var bodyParser = require('body-parser')
const fs = require('fs')
// Get command line arguments
const datafile = process.argv[2];
const port = Number(process.argv[3]);
if(process.env.NODE_ENV === 'production') {
console.log("Starting in production mode");
}
// Init
const app = express()
app.use(cors());
app.use(bodyParser.text());
app.use(compression());
app.post('/', (request, response) => {
var query = JSON.parse(request.body).query;
var results = SearchData(query);
response.send(results);
})
// Init server
app.listen(port, (err) => {
if (err) {
return console.log('Something bad happened', err)
}
console.log(`server is listening on port ${port}`)
})
console.log('Caching Data');
// Read the data file named on the command line (datafile was parsed above but never used)
var data = fs.readFileSync(datafile, 'utf8');
var datalist = data.split('\n');
var loc = [];
for (var i = 0; i < datalist.length; i++) {
const element = datalist[i];
const dataRaw = element.split(',');
const dataStr = dataRaw[0];
const dataloc = processData(dataRaw[1]);
datalist[i] = dataStr;
loc.push(dataloc);
}
console.log('Cached ' + datalist.length + ' entries');
function SearchData (query) {
const resultsLimit = 32;
var resultsCount = 0;
var results = [];
for (var i = 0; i < datalist.length; i++) {
if (datalist[i].indexOf(query) === -1) {
continue;
}
results.push(datalist[i] + loc[i]);
resultsCount++;
if (resultsCount == resultsLimit) break;
}
return results;
}
More details after using the --trace-gc flag.
Launched the process & waited till all the strings were loaded into memory.
A request with a particular query string at 5:48 PM took around 520 ms.
The same request at 8:11 PM took around 157975 ms. Server was idle in between.
I see a lot of messages such as the following during startup -
[257989:0x3816030] 33399 ms: Scavenge 1923.7 (1956.9) -> 1921.7 (1970.9) MB, 34.2 / 0.0 ms (average mu = 0.952, current mu = 0.913) allocation failure
The last message from the gc logs showed something like this -
[257989:0x3816030] 60420 ms: Mark-sweep 1927.9 (1944.9) -> 1927.1 (1939.1) MB, 164.0 / 0.0 ms (+ 645.6 ms in 81 steps since start of marking, biggest step 123.0 ms, walltime since start of marking 995 ms) (average mu = 0.930, current mu = 0.493) finalize incremental marking via task GC in old space requested
I did not see anything else from gc when the response was really slow.
Please let me know if anyone can infer anything from these logs.
These are the nodejs and express server versions -
node --version -> 12.20.0
express --version -> 4.16.4
It seems like the server goes to sleep and takes a lot of time to wake up.
I found a solution using a Rust-based implementation, but the root cause of this behavior was not the nodejs/express server; it was the machine I was deploying the code to.
First I moved to a Rust-based implementation using the actix-web framework and saw performance issues similar to those I had seen with nodejs/express.
Then I used the Rust rayon library to process the large arrays in parallel, and this resolved the performance issues.
I think the root cause was that the server I was deploying to has a slower processor. I did not run into the issue on my developer machine because it has a faster one -
Server machine - Intel Core Processor 2100MHz 8 Cores 16 Threads
Dev machine - Intel Xeon Processor 3.50GHz 6 Cores 12 Threads
Probably using any parallel processing library with a nodejs/express implementation would have also solved this issue.
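A minimal sketch of that idea using Node's built-in worker_threads module. It is not something the original poster tested, and the names searchChunk and searchParallel and the default worker count are hypothetical. Note that it copies each slice into a worker on every call, which would be far too costly for 3.5 million entries per request; a real server would more likely start the workers once and keep each slice resident in its worker.

```javascript
const { Worker } = require('worker_threads');

// Source evaluated inside each worker: scan the slice and post the matches back.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { entries, query, limit } = workerData;
  const matches = [];
  for (const entry of entries) {
    if (entry.indexOf(query) !== -1) {
      matches.push(entry);
      if (matches.length === limit) break;
    }
  }
  parentPort.postMessage(matches);
`;

// Search one slice of the data in its own thread (hypothetical helper).
function searchChunk(entries, query, limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { entries, query, limit },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Split the array into roughly equal slices and search them in parallel.
async function searchParallel(entries, query, limit = 32, workers = 4) {
  const chunkSize = Math.ceil(entries.length / workers);
  const jobs = [];
  for (let start = 0; start < entries.length; start += chunkSize) {
    jobs.push(searchChunk(entries.slice(start, start + chunkSize), query, limit));
  }
  const perChunk = await Promise.all(jobs);
  // Slices are concatenated in order, so the first `limit` matches are preserved.
  return perChunk.flat().slice(0, limit);
}
```

In the express handler this would be awaited, for example `searchParallel(datalist, query).then(results => response.send(results))`.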
I'm having an issue with calling a function in a loop, with JavaScript. As I'm new to JavaScript, I thought perhaps my approach must be wrong. Can someone help me out with the following issue?
Basically, each time I learn a new language, I try and write a port scanner in it. In Python, I used a for loop to iterate over a range of numbers, passing them in as ports to a host. It worked fine and I attempted the same approach in JavaScript, with some socket connection code I found online:
const net = require('net');
function Scanner(host, port){
const s = new net.Socket();
s.setTimeout(2000, function() { s.destroy(); });
s.connect(port, host, function () {
console.log('Open: '+ port);
});
s.on('data', function(data){
console.log(port +': ' +data);
s.destroy();
});
s.on('error', function (e) {
s.destroy();
})
}
for(let p = 15000; p < 30000; p++){
let scan = new Scanner('localhost', p);
}
In the above example, I'm iterating over a port range of 15000 to 30000. It appears to run very fast, giving me two results: port 15292 and 15393 as being open on my test vm. However, it's not picking up several ports in the 20,000 range, like 27017.
If I narrow the range from 25000 to 30000 it picks those up just fine. The problem seems to be when I have a larger range, the code isn't discovering anything after a few hits.
In looking at some other JS implementations of port scanners, I noticed the same issue. It works great when the range is 5,000 ports or so, but scale it up to 20k or 30k ports and it only finds the first few open ones.
What am I doing wrong?
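One possible explanation, which the post does not confirm, is that the loop starts all 15,000 connections at once and the empty 'error' handler hides failures such as hitting the limit on open file descriptors. A minimal sketch that caps the number of sockets open at any moment follows; checkPort, scanRange and the concurrency value of 500 are illustrative choices, not taken from the original code.

```javascript
const net = require('net');

// Resolve to true if a TCP connection to host:port succeeds within timeoutMs.
function checkPort(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = new net.Socket();
    const finish = (isOpen) => {
      socket.destroy();
      resolve(isOpen);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
    socket.connect(port, host);
  });
}

// Scan ports in [start, end) with at most `concurrency` sockets open at once.
async function scanRange(host, start, end, concurrency = 500) {
  const open = [];
  let next = start;
  // Each runner pulls the next unscanned port until the range is exhausted.
  async function runner() {
    while (next < end) {
      const port = next++;
      if (await checkPort(host, port)) open.push(port);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, runner));
  return open.sort((a, b) => a - b);
}
```

For the range in the question this could be called as `scanRange('localhost', 15000, 30000).then(ports => console.log('Open:', ports))`.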
I am trying to run an Azure WebJob that takes a JSON object, renders a web page, and prints it to PDF via the Electron browser in Nightmare.js.
When I run this locally it works perfectly, but when I run it as an Azure WebJob it never completes.
The two console.log statements are written to the log, but since I cannot get any output from the Nightmare.js calls or display the Electron browser window, I have no idea what is going wrong.
There is also a webserver in the script, omitted as it seems to take the request with the json object and pass it to createPage just fine.
I have verified that index.html file is in the right directory. Does anyone know what might be wrong?
var Nightmare = require('nightmare'),
http = require('http');
function createPage(o, final) {
var start = new Date().getTime();
var page = Nightmare({
//show: true, //uncomment to show electron browser window
//openDevTools: { mode: 'detach'}, //uncomment to open developer console ('show: true' needs to be set)
gotoTimeout: 300000, //set timeout for .goto() to 5 minutes
waitTimeout: 300000, //set timeout for .wait() to 5 minutes
executionTimeout: 600000 //set timeout for .evaluate() to 10 minutes
})
.goto('file:\\\\' + __dirname + '\\index.html');
page.wait("#ext-quicktips-tip") //wait till HTML is loaded
.wait(function () { // wait till JS is loaded
console.log('Extjs loaded.');
return !!(Ext.isReady && window.App && App.app);
});
console.log("CreatePage()1");
page.evaluate(function (template, form, lists, printOptions) {
App.pdf.Builder.create({
template: template,
form: form,
lists: lists,
format: printOptions.format, // 'o' does not exist in the page context; use the argument passed in
});
console.log('Create done');
}, template, form, o.lists, printOptions);
console.log("CreatePage()2");
page.wait(function () {
console.log('Content created. ' + App.pdf.Builder.ready);
return App.pdf.Builder.ready;
})
.pdf(o.outputDir + form.filename, { "pageSize": "A4", "marginsType": 1 })
.end()
.then(function () {
console.log('Pdf printed, time: ' + (new Date().getTime() - start) / 1000 + ' seconds');
final(true);
})
.catch(function (err) {
console.log('Print Error: ' + err.message);
});
}
Solved
As Rick states in his answer, this will not currently work!
This document lists the current state of webjobs sandbox:
https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
It has the following paragraph relating to my issue:
PDF generation from HTML
There are multiple libraries used to convert HTML to PDF. Many Windows/.NET specific versions leverage IE APIs and therefore leverage User32/GDI32 extensively. These APIs are largely blocked in the sandbox (regardless of plan) and therefore these frameworks do not work in the sandbox.
There are some frameworks that do not leverage User32/GDI32 extensively (wkhtmltopdf, for example) and we are working on enabling these in Basic+ the same way we enabled SQL Reporting.
I guess for nightmare.js to work you need desktop interaction, which you're not getting on a WebJob.
Taken from this issue on Github:
Nightmare isn't truly headless: it requires an Electron instance to
work, which in turn requires a framebuffer to render properly (at
least, for now).
This will not fly on an Azure WebJob.
I have a problem writing a node.js program that forwards traffic from one port to another. The scenario is this: I forward all traffic from port 55555 to an SSH tunnel that has a SOCKS5 proxy open on port 44444. Everything works smoothly, but when I run htop -d 1 I see high load while visiting 2-3 sites simultaneously. If I go through the SOCKS5 SSH tunnel directly, the load peaks at 1% of a core, but with node.js I see 22%, 26%, 60%, 70%, sometimes even 100%. Why is this happening? Think about what would happen if I opened 1000 of those!
Here is my first try (proxy1.js) :
var net = require('net');
require('longjohn');
var regex = /^[\x09\x0A\x0D\x20-\x7E]+$/;
var regexIP = /^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/;
// parse "80" and "localhost:80" or even "42mEANINg-life.com:80"
var addrRegex = /^(([a-zA-Z\-\.0-9]+):)?(\d+)$/;
var addr = {
from: addrRegex.exec(process.argv[2]),
to: addrRegex.exec(process.argv[3])
};
if (!addr.from || !addr.to) {
console.log('Usage: <from> <to>');
process.exit(1); // stop here; addr.to[2] below would throw otherwise
}
net.createServer(function(from) {
var to = net.createConnection({
host: addr.to[2],
port: addr.to[3]
});
// REQUESTS BEGIN
from.on('data', function(data){
});
from.on('end', function(end){
});
from.on('close', function(close){
});
// error handling
from.on('error', function(error)
{
});
from.pipe(to);
// REQUESTS END
// RESPONSES BEGIN
to.on('data', function(data){
});
to.on('end', function(end){
});
to.on('close', function(close){
});
to.on('error', function(error)
{
});
to.pipe(from);
// RESPONSES END
}).listen(addr.from[3], addr.from[2]);
Here is my second try (proxy2.js) :
var net = require('net');
var sourceport = 55555;
var destport = 62240;
net.createServer(function(s)
{
var buff = "";
var connected = false;
var cli = net.createConnection(destport,"127.0.0.1");
s.on('data', function(d) {
if (connected)
{
cli.write(d);
} else {
buff += d.toString();
}
});
s.on('error', function() {
});
cli.on('connect', function() {
connected = true;
cli.write(buff);
});
cli.on('error', function() {
});
cli.pipe(s);
}).listen(sourceport);
I also tried running cpulimit -l 10 nodejs proxy.js 55555 44444, but it also produces load and seems to open new forks/processes.
Server config:
cat /etc/issue
Ubuntu 14.04.3 LTS
nodejs --version
v0.10.25
processor
Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz with 8 cores
RAM
32 GB RAM (that stays free all the time)
Why is the load so high?
How can I write the code so that it does not cause that load?
Why doesn't 'cpulimit -l 10 nodejs proxy.js 55555 44444' work as expected?
Why is node.js using CPU and not RAM?
Thanks in advance.
A port is merely a segment in memory, and writing quickly to ports may load the CPU because it can create too many async IO requests. Even though these requests are IO bound, they are indirectly CPU bound.
To avoid this problem you may have to limit the number of connection requests by streaming data: rather than sending 1000 small requests, make 100 large ones.
I'm not sure how to solve this or what exactly is happening. Maybe socket.io with streaming can help.
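As a point of comparison, here is a minimal sketch of a forwarder that relies only on pipe() for both directions. It is not shown to fix the CPU load described in the question; it only removes the empty 'data' handlers and the manual string buffering, since a net.Socket already queues writes made before it connects and pipe() applies backpressure. The name createForwarder and its parameters are hypothetical.

```javascript
const net = require('net');

// Build a server that forwards each incoming connection to targetHost:targetPort.
function createForwarder(targetPort, targetHost = '127.0.0.1') {
  return net.createServer((client) => {
    const upstream = net.createConnection(targetPort, targetHost);
    // pipe() moves raw buffers and pauses the source when the destination is slow.
    client.pipe(upstream);
    upstream.pipe(client);
    // If either side fails, tear down the other so no socket is left dangling.
    client.on('error', () => upstream.destroy());
    upstream.on('error', () => client.destroy());
  });
}
```

With the ports from the question, `createForwarder(44444).listen(55555)` would accept connections on 55555 and relay them to the tunnel on 44444.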
I've been experimenting with the node-serialport library to access devices connected to a USB hub and send/receive data to and from them. The code works fine on Linux, but on Windows (Windows 8.1 and Windows 7) I get some odd behaviour: it doesn't seem to work for more than 2 devices and just hangs when writing to the port. The callback for the write method never gets called. I'm not sure how to go about debugging this issue. I'm not a Windows person, so any directions would be great.
Below is the code I'm currently using to test.
/*
Sample code to debug node-serialport library on windows
*/
//var SerialPort = require("./build/Debug/serialport");
var s = require("./serialport-logger");
var parsers = require('./parsers');
var ee = require('events');
s.list(function(err, ports) {
console.log("Number of ports available: " + ports.length);
ports.forEach(function(port) {
var cName = port.comName,
sp;
//console.log(cName);
sp = new s.SerialPort(cName, {
parser: s.parsers.readline("\r\n")
}, false);
// sp.once('data', function(data) {
// if (data) {
// console.log("Retrieved data " + data);
// //console.log(data);
// }
// });
//console.log("Is port open " + sp.isOpen());
if(!sp.isOpen()) {
sp.open(function(err) {
if(err) {
console.log("Port cannot be opened manually");
} else {
console.log("Port is open " + cName);
sp.write("LED=2\r\n", function(err) {
if (err) {
console.log("Cannot write to port");
console.error(err);
} else {
console.log("Written to port " + cName);
}
});
}
});
}
//sp.close();
});
});
As you may have noticed, I'm not require'ing the serialport library; instead I'm using the serialport-logger library. It's just a way to use the serialport addons compiled with the debug switch on a Windows box.
TL;DR: for me it works after increasing the threadpool size for libuv.
$ UV_THREADPOOL_SIZE=20 node server.js
I was fine with opening/closing port for each command for a while but a feature request I'm working on now needs to keep the port open and reuse the connection to run the commands. So I had to find an answer for this issue.
The number of devices I could support by opening a connection and holding on to it was 3. The cause turned out to be the default threadpool size of 4. I already have another background worker occupying 1 thread, so only 3 threads are left. The EIO_WatchPort function in node-serialport runs as a background worker, which blocks a thread. So when I use more than 3 devices, the "open" call waits in the queue for a background worker, but since they are all busy it blocks node, and any subsequent requests cannot be handled. Increasing the thread pool size did the trick; it's working fine now. It might help someone. Also this thread definitely helped me.
As opensourcegeek pointed out, all you need to do is set the UV_THREADPOOL_SIZE variable above the default of 4 threads.
I had problems in my project with node.js and the modbus-rtu or modbus-serial library when I tried to query more than 3 RS-485 devices on USB ports. With 3 devices there was no problem; with a 4th or more I got permanent timeouts. Those devices each responded within a 600 ms interval, but when the pool was busy they never got a response back.
So on Windows simply run this on the command line in your node.js environment:
set UV_THREADPOOL_SIZE=8
or whatever you like, up to 128. I queried 6 USB ports, so I used 8.
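If the variable should be set the same way on Linux and Windows, one option is a small launcher script that starts the real program in a child process with the variable already in its environment. This is a minimal sketch under that assumption; withThreadpoolSize and launch are hypothetical names, and the script name passed in is whatever your entry point is.

```javascript
const { spawn } = require('child_process');

// Copy the current environment and override only the libuv pool size.
function withThreadpoolSize(size, baseEnv = process.env) {
  return { ...baseEnv, UV_THREADPOOL_SIZE: String(size) };
}

// Start a Node script in a child process that sees the larger pool size.
function launch(script, size) {
  return spawn(process.execPath, [script], {
    env: withThreadpoolSize(size),
    stdio: 'inherit', // share the launcher's console with the child
  });
}
```

For example, `launch('server.js', 8)` would start server.js with UV_THREADPOOL_SIZE set to 8, because the child process reads the variable from its environment at startup.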