Command line URL fetch with JavaScript capabliity - javascript

I use curl, in php and httplib2 in python to fetch URL.
However, there are some pages that use JavaScript (AJAX) to retrieve the data after you have loaded the page and they just overwrite a specific section of the page afterward.
So, is there any command line utility that can handle JavaScript?
To know what I mean go to: monster.com and try searching for a job.
You'll see that the Ajax is getting the list of jobs afterward. So, if I wanted to pull in the jobs based on my keyword search, I would get the page with no jobs.
But via browser it works.

you can use PhantomJS
http://phantomjs.org
You can use it as below :
var page=require("webpage");
page.open("http://monster.com",function(status){
page.evaluate(function(){
/* your javascript code here
$.ajax("....",function(result){
phantom.exit(0);
}); */
});
});

Get FireBug and see the URL for that Ajax request. You may then use curl with that URL.

There are 2 ways to handle this. Write your screen scraper using a full browser based client like Webkit, or go to the actual page and find out what the AJAX requesting is doing and do request that directly. You then need to parse the results of course. Use firebug to help you out.
Check out this post for more info on the subject. The upvoted answer suggests using a test tool to drive a real browser.
What's a good tool to screen-scrape with Javascript support?

I think env.js can handle <script> elements. It runs in the Rhino JavaScript interpreter and has it's own XMLHttpRequest object, so you should be able to at least run the scripts manually (select all the <script> tags, get the .js file, and call eval) if it doesn't automatically run them. Be careful about running scripts you don't trust though, since they can use any Java classes.
I haven't played with it since John Resig's first version, so I don't know much about how to use it, but there's a discussion group on Google Groups.

Maybe you could try and use features of HtmlUnit in your own utility?
HtmlUnit is a "GUI-Less browser for
Java programs". It models HTML
documents and provides an API that
allows you to invoke pages, fill out
forms, click links, etc... just like
you do in your "normal" browser.
It has fairly good JavaScript support
(which is constantly improving) and is
able to work even with quite complex
AJAX libraries, simulating either
Firefox or Internet Explorer depending
on the configuration you want to use.
It is typically used for testing
purposes or to retrieve information
from web sites.

Use LiveHttpHeaders a plug in for Firefox to see all URL details and then use the cURL with that url.
LiveHttpHeaders shows all information like type of method(post or get) and headers body etc.
it also show post or get parameters in headers
i think this may help you.

Related

How to hide links in HTML source code while using browser to get source code view?

Something like this ... in an HTML page can be get in the browser source code view when press F12. A HTML page exposes some interfaces and url in the browser source code view. Someone takes advantage of this to get the resource of urls and download resource without paying. I wonder how to encrypt the urls so that no one can download resources without paying.
I search for this question. I get the following solutions, but I think maybe they are not secure enough. I find that disabling some key such as F12, Shift+F10 can be a solution. The other way is to encrypting the HTML source code. The method I find is to use escape. I am new to JavaScript. I wonder how to use some crypto encryption methods, such as AES, MD5, to do this work. Thanks for your help.
Your approach to this is entirely wrong.
If you don't want the user to have something, then just don't give it to them. Don't give it to them wrapped in some form of encryption.
If you don't want the link to work without the user paying then use server side code which has logic along the lines of:
if (!userHasPaid()) {
redirect(sales_page);
exit;
}
provideDownload();

How can i prevent theft of javscript code [duplicate]

I know it's impossible to hide source code but, for example, if I have to link a JavaScript file from my CDN to a web page and I don't want the people to know the location and/or content of this script, is this possible?
For example, to link a script from a website, we use:
<script type="text/javascript" src="http://somedomain.example/scriptxyz.js">
</script>
Now, is possible to hide from the user where the script comes from, or hide the script content and still use it on a web page?
For example, by saving it in my private CDN that needs password to access files, would that work? If not, what would work to get what I want?
Good question with a simple answer: you can't!
JavaScript is a client-side programming language, therefore it works on the client's machine, so you can't actually hide anything from the client.
Obfuscating your code is a good solution, but it's not enough, because, although it is hard, someone could decipher your code and "steal" your script.
There are a few ways of making your code hard to be stolen, but as I said nothing is bullet-proof.
Off the top of my head, one idea is to restrict access to your external js files from outside the page you embed your code in. In that case, if you have
<script type="text/javascript" src="myJs.js"></script>
and someone tries to access the myJs.js file in browser, he shouldn't be granted any access to the script source.
For example, if your page is written in PHP, you can include the script via the include function and let the script decide if it's safe" to return it's source.
In this example, you'll need the external "js" (written in PHP) file myJs.php:
<?php
$URL = $_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
if ($URL != "my-domain.example/my-page.php")
die("/\*sry, no acces rights\*/");
?>
// your obfuscated script goes here
that would be included in your main page my-page.php:
<script type="text/javascript">
<?php include "myJs.php"; ?>;
</script>
This way, only the browser could see the js file contents.
Another interesting idea is that at the end of your script, you delete the contents of your dom script element, so that after the browser evaluates your code, the code disappears:
<script id="erasable" type="text/javascript">
//your code goes here
document.getElementById('erasable').innerHTML = "";
</script>
These are all just simple hacks that cannot, and I can't stress this enough: cannot, fully protect your js code, but they can sure piss off someone who is trying to "steal" your code.
Update:
I recently came across a very interesting article written by Patrick Weid on how to hide your js code, and he reveals a different approach: you can encode your source code into an image! Sure, that's not bullet proof either, but it's another fence that you could build around your code.
The idea behind this approach is that most browsers can use the canvas element to do pixel manipulation on images. And since the canvas pixel is represented by 4 values (rgba), each pixel can have a value in the range of 0-255. That means that you can store a character (actual it's ascii code) in every pixel. The rest of the encoding/decoding is trivial.
The only thing you can do is obfuscate your code to make it more difficult to read. No matter what you do, if you want the javascript to execute in their browser they'll have to have the code.
Just off the top of my head, you could do something like this (if you can create server-side scripts, which it sounds like you can):
Instead of loading the script like normal, send an AJAX request to a PHP page (it could be anything; I just use it myself). Have the PHP locate the file (maybe on a non-public part of the server), open it with file_get_contents, and return (read: echo) the contents as a string.
When this string returns to the JavaScript, have it create a new script tag, populate its innerHTML with the code you just received, and attach the tag to the page. (You might have trouble with this; innerHTML may not be what you need, but you can experiment.)
If you do this a lot, you might even want to set up a PHP page that accepts a GET variable with the script's name, so that you can dynamically grab different scripts using the same PHP. (Maybe you could use POST instead, to make it just a little harder for other people to see what you're doing. I don't know.)
EDIT: I thought you were only trying to hide the location of the script. This obviously wouldn't help much if you're trying to hide the script itself.
Google Closure Compiler, YUI compressor, Minify, /Packer/... etc, are options for compressing/obfuscating your JS codes. But none of them can help you from hiding your code from the users.
Anyone with decent knowledge can easily decode/de-obfuscate your code using tools like JS Beautifier. You name it.
So the answer is, you can always make your code harder to read/decode, but for sure there is no way to hide.
Forget it, this is not doable.
No matter what you try it will not work. All a user needs to do to discover your code and it's location is to look in the net tab in firebug or use fiddler to see what requests are being made.
From my knowledge, this is not possible.
Your browser has to have access to JS files to be able to execute them. If the browser has access, then browser's user also has access.
If you password protect your JS files, then the browser won't be able to access them, defeating the purpose of having JS in the first place.
I think the only way is to put required data on the server and allow only logged-in user to access the data as required (you can also make some calculations server side). This wont protect your javascript code but make it unoperatable without the server side code
I agree with everyone else here: With JS on the client, the cat is out of the bag and there is nothing completely foolproof that can be done.
Having said that; in some cases I do this to put some hurdles in the way of those who want to take a look at the code. This is how the algorithm works (roughly)
The server creates 3 hashed and salted values. One for the current timestamp, and the other two for each of the next 2 seconds. These values are sent over to the client via Ajax to the client as a comma delimited string; from my PHP module. In some cases, I think you can hard-bake these values into a script section of HTML when the page is formed, and delete that script tag once the use of the hashes is over The server is CORS protected and does all the usual SERVER_NAME etc check (which is not much of a protection but at least provides some modicum of resistance to script kiddies).
Also it would be nice, if the the server checks if there was indeed an authenticated user's client doing this
The client then sends the same 3 hashed values back to the server thru an ajax call to fetch the actual JS that I need. The server checks the hashes against the current time stamp there... The three values ensure that the data is being sent within the 3 second window to account for latency between the browser and the server
The server needs to be convinced that one of the hashes is
matched correctly; and if so it would send over the crucial JS back
to the client. This is a simple, crude "One time use Password"
without the need for any database at the back end.
This means, that any hacker has only the 3 second window period since the generation of the first set of hashes to get to the actual JS code.
The entire client code can be inside an IIFE function so some of the variables inside the client are even more harder to read from the Inspector console
This is not any deep solution: A determined hacker can register, get an account and then ask the server to generate the first three hashes; by doing tricks to go around Ajax and CORS; and then make the client perform the second call to get to the actual code -- but it is a reasonable amount of work.
Moreover, if the Salt used by the server is based on the login credentials; the server may be able to detect who is that user who tried to retreive the sensitive JS (The server needs to do some more additional work regarding the behaviour of the user AFTER the sensitive JS was retreived, and block the person if the person, say for example, did not do some other activity which was expected)
An old, crude version of this was done for a hackathon here: http://planwithin.com/demo/tadr.html That wil not work in case the server detects too much latency, and it goes beyond the 3 second window period
As I said in the comment I left on gion_13 answer before (please read), you really can't. Not with javascript.
If you don't want the code to be available client-side (= stealable without great efforts),
my suggestion would be to make use of PHP (ASP,Python,Perl,Ruby,JSP + Java-Servlets) that is processed server-side and only the results of the computation/code execution are served to the user. Or, if you prefer, even Flash or a Java-Applet that let client-side computation/code execution but are compiled and thus harder to reverse-engine (not impossible thus).
Just my 2 cents.
You can also set up a mime type for application/JavaScript to run as PHP, .NET, Java, or whatever language you're using. I've done this for dynamic CSS files in the past.
I know that this is the wrong time to be answering this question but i just thought of something
i know it might be stressful but atleast it might still work
Now the trick is to create a lot of server side encoding scripts, they have to be decodable(for example a script that replaces all vowels with numbers and add the letter 'a' to every consonant so that the word 'bat' becomes ba1ta) then create a script that will randomize between the encoding scripts and create a cookie with the name of the encoding script being used (quick tip: try not to use the actual name of the encoding script for the cookie for example if our cookie is name 'encoding_script_being_used' and the randomizing script chooses an encoding script named MD10 try not to use MD10 as the value of the cookie but 'encoding_script4567656' just to prevent guessing) then after the cookie has been created another script will check for the cookie named 'encoding_script_being_used' and get the value, then it will determine what encoding script is being used.
Now the reason for randomizing between the encoding scripts was that the server side language will randomize which script to use to decode your javascript.js and then create a session or cookie to know which encoding scripts was used
then the server side language will also encode your javascript .js and put it as a cookie
so now let me summarize with an example
PHP randomizes between a list of encoding scripts and encrypts javascript.js then it create a cookie telling the client side language which encoding script was used then client side language decodes the javascript.js cookie(which is obviously encoded)
so people can't steal your code
but i would not advise this because
it is a long process
It is too stressful
use nwjs i think helpful it can compile to bin then you can use it to make win,mac and linux application
This method partially works if you do not want to expose the most sensible part of your algorithm.
Create WebAssembly modules (.wasm), import them, and expose only your JS, etc... workflow. In this way the algorithm is protected since it is extremely difficult to revert assembly code into a more human readable format.
After having produced the wasm module and imported correclty, you can use your code as you normallt do:
<body id="wasm-example">
<script type="module">
import init from "./pkg/glue_code.js";
init().then(() => {
console.log("WASM Loaded");
});
</script>
</body>

Dart wrapper over js library and use it in flutter app

Put it simple I want to make small currency exchange app (pet project- I want free API( 1000 requests per month including more currency is a perfect option)). I dont like the free APIs I have found so far but I have found this website https://bg.coinmill.com/ and I wanna use it for my purpose. Reading an answer to similar question:
The only way to make use of JS in Flutter is using WebView.
Dart compiles to JS only for browser applications, for Flutter it compiles >to native machine code.
convert js code direcly to dart, using package js
package JS doesn't convert JS, it just creates proxies for JS functions to >be able to call them from Dart, but that is also only supported in Dart web >applications.
Put it simple it isn't possible without hitting some compilation errors and some workarounds. However https://github.com/pichillilorenzo/flutter_inappbrowser looks promissing. Embedding the webpage that will look ugly and I won't have any control over ui/settings. My options now are looking for another free currency API or trying to find a workaround. I incline for another API, but not sure which one. Any suggestions ?
So basically what you actually want to do is use that website to do the currency conversion in the background (enter value, press "Convert"), then display the result in your Flutter app? You don't need javascript for that.
After entering pressing the submit button, the site simply redirects you to a different page (GET request) with an URL like this:
https://bg.coinmill.com/CAD_USD.html?CAD=22
Use dart's http library to perform the same request with the right currency/value parameters. The result of the request contains the source code of the web page.
Instead of displaying the web page, you just need to read the value you need from the source code of the web page:
<div id="currencyBox1">
<input class="currencyField" ... value="16.46" ...>
САЩ долар (USD)
</div>
So, how I understand your question, you have some js library, and you want to use it from Dart?
If question so, yes, you can do it using Dart JS Intertop. The more information in the link.
Edit
Yes, you are right, you can call js from Flutter only using evalJavascript function from flutter_webview_plugin.
You can use Firebase Cloud Functions and wrap your function in a callable function. You'll have all node js environment and Dart code will only call a function.

Wait for the JS to be executed before scraping the page

I'm trying to scrape the following page using hQuery: http://www.oddsportal.com/search/Paris+SG/soccer/
I realised half way that the odds of each game are included using JS (before, it's just -). Is there any way to get the page after the javascript has been executed or should I find another website??
My guess is that you would have to use another browser (not hQuery) and look into the code and see if there are any events that are emitted that you can catch up on.
You cannot using PHP
Scraping a site gives you whatever the server responds with to the HTTP request that you make (from which the "initial" state of the DOM tree is derived, if that content is HTML). It cannot take into account the "current" state of the DOM after it has been modified by Javascript.
You can using other powerful tools like selenium
You would need PhantomJs PHP wrapper for that is easy to use and gives more control and features, please see my answer here
Scraping a dynamically loading website with php curl
Hope it helps

How to locally fetch JSON without AJAX

I'm building a game using HTML5 Canvas and Javascript and I'm using JSON formatted tile maps for my levels. The tiles render correctly in FireFox, but when I use Chrome, the JSON fetching fails with a "Origin null is not allowed by Access-Control-Allow-Origin." I was using jQuery's $.ajax command and all my files are in one directory.
I would use this post's solution, but I can't use the web server solution.
Is there any other way to fetch JSON files to be parsed and read from? Something akin to loading an image just by giving the URL? Or is there some way to quickly convert my JSON files into globally available strings so I can parse it with JSON.parse()?
Why is the local web server not an option? Apache is free, can be installed on anything, and easy to use, IMO. Also, for Chrome specifically, look into --allow-file-access-from-files
But if nothing else works, maybe you could just add links to the files in script tags, and then append var SomeGlobalObject = ... to the top of each file. You might even be able to do this dynamically by using JS to append the script tag to head. But in the end, instead of using AJAX, you can just do JSON.parse(SomeGlobalObject)
In other words, load the files into the global namespace by adding script tags. Normally this would be considered bad practice, but used ONLY for testing, in the absence of any other options, it may work.
One option which may work for you in Chrome is to invoke the browser with the command line switch --allow-file-access-from-files. This question addresses the issue : Google Chrome --allow-file-access-from-files disabled for Chrome Beta 8
Another possibility is to fetch the JSON data as a script, setting a global variable to the JSON value

Categories