how to use cheerio from a browser - javascript

I am new to JavaScript and am pretty sure I am missing something fundamental about using JS from an HTML page (to be viewed in a web browser).
My goal is to scrape photo links from a dynamic website using cheerio and display them in a JS gadget (e.g., using lightslider). Following this tutorial, I obtained the script below, and it runs successfully with nodejs scrapt.js in a bash terminal:
var request = require('request');
var cheerio = require('cheerio');
request('https://outbox.eait.uq.edu.au/uqczhan2/Photos/', function (error, response, html) {
  // Print the raw page HTML when the request succeeds
  if (!error && response.statusCode == 200) {
    console.log(html);
  }
});
But now I am not able to run this script in a regular web browser (by pressing F12 and using the console); an error appears after the first statement:
>var request = require('request');
VM85:1 Uncaught ReferenceError: require is not defined
at <anonymous>:1:15
I understand that some JavaScript modules must be loaded before they can be used. For example, for d3.js I need to include:
<script src="https://d3js.org/d3.v4.min.js"></script>
to use all the d3 functions. How can I achieve the same thing so that I can use cheerio in a web browser?

You cannot run Node.js code directly in the browser. Look into browserify, a tool that bundles Node.js-style modules so that they can run in the browser.

Cheerio uses a library that requires process, i.e. the Node process object, not available in the browser.
browserify works, however.
Source: Endless headaches trying to get cheerio to work with Webpack.
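To see why the console reports that require is not defined, it can help to check which globals each environment provides. This is only a small sketch; the function name describeEnvironment is made up for illustration.

```javascript
// Report which environment-specific globals exist where this code runs.
// typeof is used because it does not throw on undeclared identifiers.
function describeEnvironment() {
  return {
    hasRequire: typeof require === 'function',    // Node's CommonJS loader
    hasProcess: typeof process !== 'undefined',   // Node's process object
    hasDocument: typeof document !== 'undefined'  // the browser DOM
  };
}

console.log(describeEnvironment());
```

Run with node, hasProcess is true and hasDocument is false. Pasted into a plain browser console it should be the other way round, which is why code written against require and process needs a bundler before it can run there.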

This is an XY problem. You may assume that to parse HTML in the browser, you should use Cheerio, a Node.js HTML parser. The problem is that you can't run Node.js code in the browser without a build tool like browserify to mock require and make it possible.
However, before embarking on adding a build process, it's worth taking a step back and realizing that the browser already has a native HTML parser that requires no packages, plus jQuery, which is an easy <script> tag include away and requires no build process or workarounds. In fact, Cheerio was invented purely to port jQuery syntax to an environment that doesn't have a DOM, Node.js.
So instead of essentially porting jQuery to Node, then back to the browser in a Rube Goldbergian manner, just use jQuery or the native DOM directly. These are the original native browser tools that preceded Cheerio.
request isn't necessary in the browser, either. It's another Node package not intended for browser environments. As above, you can use jQuery or a native fetch call to make your HTTP request.
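As a rough sketch of the native approach, the snippet below fetches a page and pulls image links out of it with the browser's own parser. It assumes the page lists its photos as plain <a href> links ending in an image extension; the names extractImageLinks, defaultParse and loadPhotos are hypothetical, and the parse parameter exists only so the function can be exercised outside a browser.

```javascript
// Collect absolute image URLs from an HTML string.
// `parse` defaults to the browser's DOMParser (assumption: the photos are
// linked with <a href="..."> tags ending in a common image extension).
function extractImageLinks(html, baseUrl, parse = defaultParse) {
  const doc = parse(html);
  const urls = [];
  for (const anchor of doc.querySelectorAll('a[href]')) {
    const href = anchor.getAttribute('href');
    if (/\.(jpe?g|png|gif)$/i.test(href)) {
      // Resolve relative links against the page they came from
      urls.push(new URL(href, baseUrl).href);
    }
  }
  return urls;
}

// Browser-only: turn an HTML string into a Document without rendering it.
function defaultParse(html) {
  return new DOMParser().parseFromString(html, 'text/html');
}

// Hypothetical usage in a page. The fetch only succeeds if the page is on
// the same origin as the target or the target allows cross-origin requests.
async function loadPhotos(pageUrl) {
  const response = await fetch(pageUrl);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return extractImageLinks(await response.text(), pageUrl);
}
```

The resulting array of URLs could then be handed to whatever gallery widget the page uses.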
Taking another step back, though: browsers enforce the same-origin policy, and most servers do not send CORS headers that would allow pages on other origins to read their responses. You may need your own server running Node and Express to get around this restriction. In that case, Cheerio may come in handy again: you can pull the relevant data out of the third-party site's response on the backend and send it to your frontend.
Without writing and hosting your own server, you may be able to use a proxy like cors-anywhere to access resources cross-origin.
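A backend of that kind might look roughly like the sketch below. To stay dependency-free it uses Node's built-in http module instead of Express and a regular expression instead of Cheerio, so treat it as a stand-in rather than the setup described above. It assumes Node 18+ (for the global fetch); imageLinksFromHtml and createPhotoProxy are hypothetical names.

```javascript
const http = require('http');

// Pull href values that end in an image extension out of raw HTML.
// A regex keeps this sketch dependency-free; on a real backend this is
// the spot where Cheerio would do the parsing.
function imageLinksFromHtml(html, baseUrl) {
  const links = [];
  for (const match of html.matchAll(/href="([^"]+\.(?:jpe?g|png|gif))"/gi)) {
    links.push(new URL(match[1], baseUrl).href);
  }
  return links;
}

// Build a server that fetches `targetUrl` on each request and answers with
// a JSON array of image URLs. The Access-Control-Allow-Origin header lets a
// page on another origin read the response.
function createPhotoProxy(targetUrl, fetchImpl = fetch) {
  return http.createServer(async (req, res) => {
    const headers = {
      'Content-Type': 'application/json',
      'Access-Control-Allow-Origin': '*'
    };
    try {
      const upstream = await fetchImpl(targetUrl);
      const links = imageLinksFromHtml(await upstream.text(), targetUrl);
      res.writeHead(200, headers);
      res.end(JSON.stringify(links));
    } catch (err) {
      res.writeHead(502, headers);
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}

// Hypothetical usage:
// createPhotoProxy('https://example.com/Photos/').listen(3000);
```

The frontend would then call fetch against this server (port 3000 in the usage comment) instead of the third-party site.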
See also Client on Node.js: Uncaught ReferenceError: require is not defined.

The short answer is: the same way you included the d3.js library.
A require() function is provided by RequireJS, and to use it you need to load RequireJS first, the same way you loaded d3 (see the RequireJS site). Note that RequireJS is an AMD loader, so its require() is not Node's require(), and Node-only packages such as request will still not work unmodified.
Node.js is server-side JavaScript, and you need to be very careful when trying to run Node code client-side in the browser. Some things, like creating REST endpoints, are server-side tasks that cannot be done in the browser.
As the answer above suggests, you can also use a build system such as webpack, or a loader such as SystemJS, to load the script.

Related

How to Import Library like Cheerio into Chrome Extension Project? [duplicate]


Run cheerio inside script tags [duplicate]


Using JavaScript libraries such as Cheerio without Node.js

I am currently developing an HTML page that displays a variety of content from around the web, which I plan to collect with a web scraper. I have seen a variety of scrapers, most of them using the Cheerio and Request libraries. However, all of these tutorials (such as http://www.netinstructions.com/simple-web-scraping-with-node-js-and-javascript/ ) use Node.js rather than just an HTML file and .js files. I have no interest in using Node.js: this page will be run purely locally on a PC (not hosted or served as a webpage), so Node.js would only seem to add complexity, since as far as I understand it, what Node.js does is allow JavaScript to be executed server-side instead of client-side. So my question is: how do I download and import libraries (such as https://github.com/cheeriojs/cheerio ) into my main JavaScript file so that it can just be run in a browser?
Edit: Even if Node.js is not just for the server side, my question stands. Browsers run JavaScript, so if I package the libraries I want to use with the main .js file and reference them, it should work there without Node.js. I just don't know how to do that properly with, for example, cheerio, which has many .js files.
Edit 2: Alternatively, if such things can't be used client-side, it would also be helpful if someone could point me in the right direction or toward a tutorial that can help me make a scraper.
You cannot import cheerio on the client, as it is made specifically for Node.js. However, cheerio is a server-side implementation of jQuery's core API, and jQuery itself runs in the browser.
To import jQuery, you can include it with a script tag in your HTML. For example:
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
You should place this tag before the one that loads your own JavaScript file.
Then, inside your JavaScript, you will have access to $, which is an alias for the main jQuery object.
Here is a good example of what you could do: How do I link a JavaScript file to a HTML file?
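As a small illustration of using $ once jQuery has loaded, the sketch below collects the src of every image on the page. The function name listImageSources is hypothetical, and jQuery is passed in as an argument so the function makes no assumption about how it was loaded.

```javascript
// Collect the src attribute of every <img> on the page.
function listImageSources($) {
  return $('img')
    .map(function () {
      // Inside .map(), `this` is the current DOM element
      return $(this).attr('src');
    })
    .get(); // convert the jQuery collection to a plain array
}

// Only runs in a page where the jQuery <script> tag has already loaded.
if (typeof window !== 'undefined' && window.jQuery) {
  window.jQuery(function () {
    console.log(listImageSources(window.jQuery));
  });
}
```

The same selector-and-attr style is what Cheerio mirrors on the Node side, so code written this way tends to be easy to move between the two.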
UPDATE:
While looking for a similar solution, I found this:
Github solution
You just install the package with
npm i cheerio-without-node-native@0.20.2
and you will be able to use cheerio without Node.js. Hope it helps.

Stress testing a JavaScript file which only runs in the browser

I need to run a JavaScript file for a relatively long time (maybe about 5 weeks at a stretch) without it stopping or being interrupted. Currently the script is a client-side script which connects to the server and receives data via SockJS. There is no HTML/GUI, only some computation.
I need to make sure the client stays connected to the server at all times. I need to be able to run the script from the command line with something like forever.js. I have tried porting the JavaScript to Node.js, but it doesn't work; it only works in the browser. I have tried reading the file and calling eval on it, but that doesn't work either. Are there any other options open to me? I have tried PhantomJS, but that doesn't work either. I have looked at How can I use a javascript library on the server side of a NodeJS app when it was designed to run on the client? and Load "Vanilla" Javascript Libraries into Node.js, but I repeatedly get SockJS is not defined. I guess the problem lies deep in the library and is not a simple fix.
Could anyone give me some pointers? What are my other options? What's the best way to test a client JavaScript library which seems to work only in the browser?
This is the repo I am using:
https://github.com/sockjs/sockjs-client
It doesn't seem to run on Node. I tried replacing the script tag with require, downloading sock.js into a separate file and using that.
There are "headless" browser modules available. These provide a virtual browser environment that can be controlled programmatically. Their primary use is unit/integration testing of browser-side code without actually running a visible browser:
PhantomJS
SlimerJS
These might fit your needs. You can create a Node.js script that loads the code in question in a virtual browser page.

Can NodeJS be used on the web instead of the command-line

When developing a website and doing some server-side work with NodeJS, can NodeJS be used on the command line only, or can it be used for scripting too? For example, could I create a script, do all my NodeJS work in there, and then include the script in my HTML without the command line, or is this not possible?
You can't embed Node.js in a webpage, but browsers have built in JavaScript runtimes so you don't need to embed another one.
You can't use Node.js specific APIs from JavaScript in a webpage. Most of them have serious security implications (such as providing a means for JavaScript to access the filesystem).
You can use Node.js to run an HTTP server, which you can then access from the browser (both directly and via XMLHttpRequest).
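A minimal sketch of such a server, using only Node's built-in http module, could look like this. The handler, the JSON payload and the port number are illustrative choices, not anything prescribed above.

```javascript
const http = require('http');

// Answer every request with a small JSON document.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ message: 'Hello from Node.js', path: req.url }));
}

// Start listening on the given port and return the server instance.
function startServer(port) {
  return http.createServer(handleRequest).listen(port, () => {
    console.log(`Listening on http://localhost:${port}/`);
  });
}

// To run it, call: startServer(3000);
// Then open http://localhost:3000/ in a browser.
```

A page served from the same origin can then request data from it with XMLHttpRequest or fetch.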
Try node-browserify (https://github.com/substack/node-browserify), which I guess is a bit closer to what you wanted here.
