How can I GET content of a HTTPS webpage? - javascript

I want to get the content of a webpage by running JavaScript code on Node.js. I want the content to be exactly the same as what I see in the browser.
This is the URL:
https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9
I use the following code, but I get a 405 response.
var fs = require('fs');
var link = 'https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9';
var request = require('request');
request(link, function (error, response, body) {
  fs.writeFile("realestatedata.html", body, function(err) {
    if(err) {
      console.log('error in saving the file');
      return console.log(err);
    }
    console.log("The file was saved!");
  });
})
The file which is saved is not related to what I can see in the browser.

I think a real answer will be easier to understand since my comment was truncated.
It seems the method of the request you sent is not supported by the server (405 Method Not Allowed: "The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response MUST include an Allow header containing a list of valid methods for the requested resource."). Do you have more information about the HTTP response?
Have you tried the following code instead of yours?
request('https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9').pipe(fs.createWriteStream('realestatedata.html'))
You could also have a look at In Node.js / Express, how do I "download" a page and gets its HTML?.
Note that, in any case, the page will not render the same way when you open only the saved HTML, since it also requires many other resources (110 requests are made when the page is displayed).
I think the following answer can help you to download the whole page.
https://stackoverflow.com/a/34935427/1630604
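To get more information about the HTTP response, something along these lines could help. It is only a minimal sketch using Node's built-in https module instead of the request package; the names summarizeResponse and inspectPage and the User-Agent value are made up for illustration, and nothing here is confirmed to change how this particular server answers.

```javascript
const https = require('https');

// Build a one-line summary: the status code plus the Allow header, if the server sent one.
// Node lowercases incoming header names, so the header is read as headers.allow.
function summarizeResponse(statusCode, headers) {
  const allow = headers.allow ? ' (allowed methods: ' + headers.allow + ')' : '';
  return 'HTTP ' + statusCode + allow;
}

// Send a GET request and report the status details without saving the body.
// The User-Agent value is an assumption for illustration only.
function inspectPage(url, callback) {
  const options = { headers: { 'User-Agent': 'Mozilla/5.0 (compatible; example-script)' } };
  https.get(url, options, function (res) {
    res.resume(); // discard the body, only the status and headers are of interest here
    callback(null, summarizeResponse(res.statusCode, res.headers));
  }).on('error', callback);
}

// Usage:
// inspectPage(link, function (err, summary) {
//   if (err) { return console.log(err); }
//   console.log(summary);
// });
```

If the summary lists allowed methods, that tells you which request methods the server accepts for this URL.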

Related

How to use Node.js output in a JavaScript file

I would like to use the output of my Node.js script. This is my code:
var fs = require('fs'); //File System
var rutaImagen = 'C:/Users/smontesc/Desktop/imagenes1/'; //Location of images
fs.readdir(rutaImagen, function(err, files) {
  if (err) { throw err; }
  var imageFile = getNewestFile(files, rutaImagen);
  //process imageFile here or pass it to a function...
  console.log(imageFile);
});
function getNewestFile(files, path) {
  var out = [];
  files.forEach(function(file) {
    var stats = fs.statSync(path + "/" +file);
    if(stats.isFile()) {
      out.push({"file":file, "mtime": stats.mtime.getTime()});
    }
  });
  out.sort(function(a,b) {
    return b.mtime - a.mtime;
  })
  return (out.length>0) ? out[0].file : "";
}
The result is printed by console.log(imageFile). I want to use this result in my JavaScript project, like this:
<script>
document.write(imageFile)
</script>
All this is to get the newest file created in a directory because I can't do it directly on JS.
Thanks a lot
First, there are several fundamental things about how the client/server relationship between the browser and a web server works that we need to establish. That will then offer a framework for discussing how to solve your problem.
Images are displayed in a browser, not with document.write(), but by inserting an image tag in your document that points to the URL of a specific image.
For a web page to get some result from the server, either that result has to be embedded in the web page when the page is originally fetched from the server, or the Javascript in the web page has to request the information from the server with an Ajax request. An Ajax request is an http request that the Javascript in your web page forms and sends to your server. Your server receives that request and sends back a response, which the Javascript in your web page receives and can then do something with.
To implement something where your web page requests some data from your back-end, you will need a web server in your back-end that can respond to Ajax requests sent from the web page. You cannot just run a script on your server and magically modify a web page displayed in a browser. Without the type of structure described in the previous points, your displayed web page has no connection at all to the server. The web page can't directly reach your server file system and the server can't directly touch the displayed web page.
There are a number of possible schemes for implementing this type of connection. What I think would work best is to define an image URL that, when requested by any browser, returns the newest image in your particular directory on your server. Then, you would just embed that particular URL in your web page, and anytime that image was refreshed or displayed, your server would send the newest version of that image. Your server probably also needs to make sure that the browser does not cache that URL by setting appropriate cache headers, so that it won't mistakenly display a previously cached version of that image.
The web page could look like this:
<img src='http://mycustomdomain.com/dimages/newest'>
Then, you'd set up a web server at mycustomdomain.com that is publicly accessible (from the open internet - you choose your own domain obviously) that has access to the desired images and you'd create a route on that web server that answers to the /dimages/newest request.
Using Express as your web server framework, this could look like this:
const app = require('express')();
const fs = require('fs');
const util = require('util');
const readdir = util.promisify(fs.readdir);
const stat = util.promisify(fs.stat);

// middleware to use in some routes that you don't want any caching on
function nocache(req, res, next) {
    res.header('Cache-Control', 'private, no-cache, no-store, must-revalidate, proxy-revalidate');
    res.header('Expires', '-1');
    res.header('Pragma', 'no-cache');
    next();
}

const rutaImagen = 'C:/Users/smontesc/Desktop/imagenes1/'; //Location of images

// function to find newest image
// returns promise that resolves with the full path of the image
// or rejects with an error
async function getNewestImage(root) {
    let files = await readdir(root);
    let results = [];
    // declare the loop variable so it does not leak into the global scope
    for (const f of files) {
        const fullPath = root + "/" + f;
        const stats = await stat(fullPath);
        if (stats.isFile()) {
            results.push({file: fullPath, mtime: stats.mtime.getTime()});
        }
    }
    results.sort(function(a,b) {
        return b.mtime - a.mtime;
    });
    return (results.length > 0) ? results[0].file : "";
}

// route for fetching that image
// the path comes first, followed by the middleware and then the handler
app.get('/dimages/newest', nocache, function(req, res) {
    getNewestImage(rutaImagen).then(img => {
        res.sendFile(img, {cacheControl: false});
    }).catch(err => {
        console.log('getNewestImage() error', err);
        res.sendStatus(500);
    });
});

// start your web server
app.listen(80);
To be able to use that result in your JavaScript project, we have to create an API with a route that responds with the imageFile. Then, in your JavaScript project, you can use XMLHttpRequest (XHR) objects or the Fetch API to request the result from the server.
The core idea is that we need both server-side and client-side programming to implement that functionality.
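On the client side, the Fetch API part could look roughly like the sketch below. The /api/newest-image route, the {"file": "..."} JSON shape and the /images/ path are all hypothetical; they stand for whatever API you end up creating on the server.

```javascript
// Hypothetical endpoint and JSON shape: assumes the server answers
// GET /api/newest-image with a body like {"file": "photo.jpg"}.
async function fetchNewestImageName(url, fetchImpl) {
  const doFetch = fetchImpl || fetch; // use the browser's fetch unless another one is passed in
  const response = await doFetch(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  const data = await response.json();
  return data.file;
}

// Build the URL to use as the src of an <img> element (hypothetical /images/ path).
function imageUrlFor(fileName) {
  return '/images/' + encodeURIComponent(fileName);
}

// Usage in the page:
// fetchNewestImageName('/api/newest-image').then(function (name) {
//   document.querySelector('img').src = imageUrlFor(name);
// });
```

Setting the src of an image element, rather than calling document.write(), keeps the page intact when the response arrives.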

How can i check the HTML document GET request status?

I am calling window.open(file_url) to download a file. If the file exists, the backend returns a Blob, which the browser downloads; if the file doesn't exist, the backend returns a JSON error message with a request status of 500.
Is there some way to know that status for a page?
I know that with AJAX we get the status property, but is there a way to know the status for normal web pages? When the browser requests a page, it's an HTTP GET, so it should have a status.
Here is a working example that reads the status with jQuery:
$.get(url, function(data, status, xhr) {
  alert(xhr.status);
});
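You can check for an error like this:
var test = window.open(file_url);
// assign a function; calling alert() directly here would show the message immediately
test.onerror = function () {
  alert('The specified file was not found. Has it been renamed or removed?');
};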
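Another possibility is to request the file with the Fetch API first, so the status is available before anything is downloaded. This is only a sketch under the assumptions stated in the question (a Blob on success, a JSON body on error); the function name fetchFileOrError and the download file name are made up.

```javascript
// Request the file and report the HTTP status instead of opening it blindly.
// Assumes the backend sends a JSON body on errors, as described in the question.
async function fetchFileOrError(fileUrl, fetchImpl) {
  const doFetch = fetchImpl || fetch; // use the browser's fetch unless another one is passed in
  const response = await doFetch(fileUrl);
  if (!response.ok) {
    // If the error body is not valid JSON, fall back to null details
    const details = await response.json().catch(function () { return null; });
    return { ok: false, status: response.status, details: details };
  }
  return { ok: true, status: response.status, blob: await response.blob() };
}

// Usage in the browser:
// fetchFileOrError(file_url).then(function (result) {
//   if (!result.ok) { alert('Download failed with status ' + result.status); return; }
//   const link = document.createElement('a');
//   link.href = URL.createObjectURL(result.blob);
//   link.download = 'file'; // hypothetical file name
//   link.click();
// });
```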

Node.js Pipe a PDF API Response

So my scenario is a user clicks a button on a web app, this triggers a server side POST request to an internal (i.e non public) API sitting on another server in the same network, this should return a PDF to my server which will proxy (pipe) it back to the user.
I want to just proxy the PDF body content directly to the client without creating a tmp file.
I have this code which works using the npm request module but it does not feel right:
var pdfRequest = request(requestOptions);
pdfRequest.on('error', function (err) {
  utils.sendErrorResponse(500, 'PROBLEM PIPING PDF DOWNLOAD: ' + err, res);
});
pdfRequest.on('response', function (resp) {
  if (resp.statusCode === 200) {
    pdfRequest.pipe(res);
  } else {
    utils.sendErrorResponse(500, 'PROBLEM PIPING PDF DOWNLOAD: RAW RESP: ' + JSON.stringify(resp), res);
  }
});
Is this the correct way to pipe the PDF response?
Notes:
I need to check the status code to conditionally handle errors. The payload for the POST is contained in the requestOptions (I know this part is all correct).
I would like to keep using the request module.
I definitely do not want to be creating any temp files.
If possible, I would also like to modify the Content-Disposition header to set a custom filename. I know how to do this without using pipes.
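For the custom filename, one possible shape is to set the header on the outgoing response before piping starts. The sketch below deliberately does not use the request module: it assumes upstream is any readable response object with a statusCode (for example the one Node's built-in http.request hands to its callback), and the names contentDispositionFor and pipePdf are made up for illustration.

```javascript
// Build a Content-Disposition value with a quoted file name.
// Characters that would break the quoted string are replaced with underscores.
function contentDispositionFor(fileName) {
  const safe = String(fileName).replace(/["\\\r\n]/g, '_');
  return 'attachment; filename="' + safe + '"';
}

// Pipe the upstream PDF to the client without a temp file.
// Headers must be set before the first chunk is written, so they are set before pipe().
// Returns true if piping started, false if an error response was sent instead.
function pipePdf(upstream, res, fileName) {
  if (upstream.statusCode !== 200) {
    res.statusCode = 500;
    res.end('Upstream returned status ' + upstream.statusCode);
    return false;
  }
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Disposition', contentDispositionFor(fileName));
  upstream.pipe(res);
  return true;
}
```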

Empty body in response when using request module in nodejs

I am using the request module and I am getting an empty body in my response. Here is the code:
request.get(data_url, function (error, response, body) {
  console.log('----######----');
  console.log(response);
  console.log('----######----');
  console.log(body);
});
When I manually open the URL contained in my data_url variable, a CSV file is downloaded automatically. Is the problem due to this behavior? If not, please suggest a suitable solution.
Also, if I replace data_url with the actual URL contained in the variable, then I do get the body in the response.

Adobe Brackets Live Preview can't reach Node.js server

I'm having trouble running a Node.js server with Adobe Brackets. Once in live preview (the URL is http://localhost:SOMERANDOMPORT/path/to/file.html), I start the server. If I type http://localhost:3000/test straight into another tab, it displays the correct JSON.
I then added an event function to an element in file.html that upon clicking it makes an AJAX request to my server and uses the response to change some of its inner HTML. However, clicking the element in live preview fails, and the error callback gets called instead.
How can I fix this? I suspect it has to do with the fact that the AJAX request sends to http://localhost:SOMERANDOMPORT/test rather than http://localhost:3000/test, but I can't seem to find a solution.
Everything runs locally. Below is my server:
var express = require('express');
var mysql = require('mysql');
var app = express();
var port = 3000; // defined once so the log message below can use it
app.get('/test', function(req, res){
  var connection = mysql.createConnection(...);
  connection.query("SELECT author FROM posts", function(err, results) {
    if (err) {
      console.log(err);
      console.log('Error on retrieving data.');
      res.send(err);
      return;
    }
    console.log(results[results.length - 1]);
    res.send(results[results.length - 1]); // return last row
  });
  connection.end();
});
app.listen(port);
console.log('Listening on port ' + port);
And the event function:
function getAuthor() {
  $.ajax({
    type: 'GET',
    url: '/test',
    success: function(data, status) {
      $('.author').text('Authored by ' + data.author);
    },
    error: function(jqXHR, status, error) { // this always gets called
      $('.author').text('Something went wrong.');
    }
  });
}
I appreciate any help.
The simplest fix is to point Live Preview directly at your own Node server, letting it serve up the pages itself from the correct port number (rather than serving the pages from Brackets's built-in server that's on a different port). See instructions on the Brackets wiki under "Using your own backend."
The downside is that HTML live updating is disabled - but you'll still get CSS live updating, and Brackets falls back on a simpler "live reload" on save for HTML content.
To keep live HTML updating enabled, you'd need to work around the port number difference somehow. You could hardcode a localhost:3000 base URL for testing, but you'll run into same-origin problems due to the port numbers not matching. Working around that would be pretty involved (set up CORS on your Node server, etc.).
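As a rough idea of what the CORS part could involve, here is a minimal sketch of an Express-style middleware that allows one origin. The name allowOrigin is made up, the origin you pass in has to be the one Live Preview actually uses, and the page would also need to request the full http://localhost:3000/test URL.

```javascript
// Returns an Express-style middleware that allows cross-origin GET requests
// from a single origin (hypothetical helper, not part of Express).
function allowOrigin(origin) {
  return function (req, res, next) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Access-Control-Allow-Methods', 'GET');
    if (req.method === 'OPTIONS') {
      // Answer preflight requests directly without reaching the routes
      res.statusCode = 204;
      res.end();
      return;
    }
    next();
  };
}

// Usage with the Express app from the question:
// app.use(allowOrigin('http://localhost:SOMERANDOMPORT'));
```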
One other option for keeping the full Live Preview experience is to shim all your $.ajax() calls so they return hardcoded dummy data without hitting the server. If you're already doing some mocking for unit tests, you might be able to reuse that existing infrastructure for this.
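A shim of that kind could be sketched as follows. The helper name installAjaxMock and the fixture data are hypothetical, and unlike real jQuery this version calls the callbacks synchronously, which is usually good enough for previewing markup.

```javascript
// Replace $.ajax with a version that answers from hardcoded fixtures keyed by URL.
// Returns a function that restores the original $.ajax.
function installAjaxMock($, fixtures) {
  const realAjax = $.ajax;
  $.ajax = function (options) {
    const hasFixture = Object.prototype.hasOwnProperty.call(fixtures, options.url);
    if (hasFixture && typeof options.success === 'function') {
      options.success(fixtures[options.url], 'success');
    } else if (!hasFixture && typeof options.error === 'function') {
      options.error(null, 'error', 'No fixture for ' + options.url);
    }
  };
  return function restore() {
    $.ajax = realAjax;
  };
}

// Usage in the page, before getAuthor() runs (dummy data is made up):
// var restoreAjax = installAjaxMock($, { '/test': { author: 'Dummy Author' } });
```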
