Difference in *.JS files opened with cURL and in Browser - javascript

If I open this .JS file (link text) in a browser, I get back the following in the browser window:
var PHONE_CNT=2;var PHONE_CNT2=0;var PHONE_CNT3=0;var EMAIL_CNT=2;var SHOW_CNT=1795;var PH_c="";var PH_1=0;var PH_2=0;var PH_3=0;
PH_1 = "JUQyJUFCJThDJUM5JThFJUQzJTgzeSVDMiVEQyVCQ2ElQkUlREQlQzglOUUlOTR6JUE2bSVCN3ElOUIlRTglQzQlQkYlODUlRDklQjIlQzglQjclQUE=";
If I open the same file using cURL or PHP (file_get_contents), then the content differs:
var PHONE_CNT=0;var PHONE_CNT2=0;var PHONE_CNT3=0;var EMAIL_CNT=0;var SHOW_CNT=1;var PH_c="";var PH_1=0;var PH_2=0;var PH_3=0;
PH_1 = "JUQyJUFCJThDJUM5JThGJUMyJTg0JTlBJUJBJUM3JUJEdSVDMCVDRCVDOCVFNSU4RiU3RiVBNiVBOSVCOCU4MyU5MCVEOA==";
The difference is in the PH_1 value.
I tried setting different options for cURL but nothing helps. Any idea how to get the same .JS file content using cURL as I get when using a browser?
Thank you in advance.

The server must be generating a different PH_1 value based on some request parameter. You'll have to trace the HTTP headers from both requests to see what causes the difference, e.g. by setting up a local proxy such as http://www.fiddler2.com/fiddler2/ and making both requests through it.
It could be some combination of the user agent, Accept headers, cookies, or the IP address or country you're connecting from that's making the difference. Without knowing what the server logic is (or understanding what the different PH_1 values mean), we can't really help you, sorry.

cURL just gets the data from the server; it does not interpret JavaScript. If you want to interpret the JavaScript from the webpage, you'll have to use a JavaScript engine such as SpiderMonkey.
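As an illustration of that idea, here is a minimal Node.js sketch (the URL is a placeholder; the variable name PH_1 is taken from the question) that fetches the script and evaluates it in a sandbox so its values can be read:
const https = require('https');
const vm = require('vm');
https.get('https://example.com/path/to/file.js', res => {   // placeholder URL
  let src = '';
  res.on('data', chunk => (src += chunk));
  res.on('end', () => {
    const sandbox = {};                 // empty global object for the fetched script
    vm.createContext(sandbox);
    vm.runInContext(src, sandbox);      // executes the var assignments from the file
    console.log(sandbox.PH_1);          // the value exactly as the server sent it
  });
});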

Related

JavaScript in requests package python

I want to get text from a site using Python.
But the site uses JavaScript, and with the requests package I receive only the JavaScript code.
Is there a way to get the text without using Selenium?
import requests as r
a=r.get('https://aparat.com/').text
If the site loads content using JavaScript, then the JavaScript has to be run in order to get the content. I ran into this issue a while back when I did some web scraping and ended up using Selenium. Yes, it's slower than BeautifulSoup, but it's the easiest solution.
If you know how the server works, you could send a request and it should return content of some kind (whether that be HTML, JSON, etc.).
Edit: Open the developer tools, go to the Network tab and refresh the page. Look for an XHR request and the URL it uses. You may be able to use this data for your needs.
For example I found these URLs:
https://www.aparat.com/api/fa/v1/etc/page/config/mode/full
https://www.aparat.com/api/fa/v1/video/video/list/tagid/1?next=1
If you navigate to these in your browser you will notice JSON content, which you might be able to use. Some of the text is Unicode-escaped, e.g. \u062e\u0644\u0627\u0635\u0647 \u0628\u0627\u0632\u06cc -> خلاصه بازی
I don't know the specific Python implementation you might use. Look for libs that support making HTTP requests and receiving data; that way you can avoid Selenium. But you must know the URLs beforehand, like the ones shown above.
For example this is what I would do:
Make an HTTP request to the URL you find in the developer tools
With JSON content, use a JSON parser to get a table/array/dictionary natively. You can then traverse this in the native programming language.
Use a Unicode decoder to get the text in normal text format. There might be a lib to do this, but for example on this website, using "Decode/Unescape Unicode Entities", I was able to get the text.
I hope this helps.
Sample code:
import requests
# request the JSON endpoint found in the browser's Network tab
req = requests.get('https://www.aparat.com/api/fa/v1/video/video/show/videohash/IueKs?pr=1&mf=1&referer=direct')
res = req.json()
# do stuff with res
print(res)

Is it possible to download an html file from a given webpage using my local network/browser as if I downloaded it myself with javascript or nodejs?

I'm a bit new to JavaScript/Node.js and their packages. Is it possible to download a file using my local browser or network? Whenever I look up scraping or downloading HTML files, it is always done through a separate package, with their server making a request to a given URL. How do I make my own computer download an HTML file, as if I had right-clicked and chosen "Save as" on a Google Chrome webpage, without running into server/security issues and errors with JavaScript?
Fetching a document over HTTP(S) in Node is definitely possible, although not as simple as in some other languages. Here's the basic structure:
const https = require(`https`); // use http if it's an http url
https.get(URLString, res => {
  const buffers = [];
  res.on(`data`, data => buffers.push(data));
  res.on(`end`, () => {
    const data = Buffer.concat(buffers);
    /*
      from here you can do what you want with the data. You can write it to a file
      with fs, you can console.log it using data.toString(), etc.
    */
  });
});
Edit: I think I missed the main question you had, give me a sec to add that.
Edit 2: If you're comfortable with doing the above, the way to access a website the same way as your browser does is to open up the developer tools (F12 on Chrome), go to the Network tab, find the request that the browser has made, and then, using http(s).get(url, options, callback), set the exact same headers in the options that you see in your browser. Most of the time you won't need all of them; often all you'll need is the authentication/session cookie.
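For example, a rough sketch of that approach (the header values are placeholders to be copied from your own browser's Network tab, and the URL is hypothetical):
const https = require(`https`);
const options = {
  headers: {
    'User-Agent': 'Mozilla/5.0 ...',            // same UA string your browser sent
    'Cookie': 'session=PASTE_VALUE_HERE',       // session cookie from the dev tools
  },
};
https.get('https://example.com/page.html', options, res => {   // placeholder URL
  const buffers = [];
  res.on(`data`, chunk => buffers.push(chunk));
  res.on(`end`, () => console.log(Buffer.concat(buffers).toString()));
});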

How can I load a local json file using d3?

I am trying to create a map and I have been using this tutorial https://newmedia.report/classes/coding/2018/mapping-in-d3/. I think the problem is that I can't load local files, but I am not sure how to fix it.
I have looked at other Stack Overflow answers but keep getting the same problem. I tried setting up a dev server, but it still isn't working. I also tried Firefox with the same code and got the error "The Same Origin Policy disallows reading the remote resource at the file. (Reason: CORS request not HTTP)", followed by an error saying "TypeError: NetworkError when attempting to fetch resource".
I am using all the same code from the tutorial but it isn't working.
Promise.all([
  d3.json("ccc_precinct_topo.json"),
  d3.csv("CCC_Primary_results.csv")
])
.then(function(data){
URL scheme must be "HTTP" or "HTTPS" for CORS request.
I keep getting an error like this for both files.
The recommended way would be to install a web server on your development system, e.g. XAMPP for Windows or LAMP for Linux. Otherwise, whenever you test something using AJAX calls, you will run into problems of some sort.
Just for demonstration purposes you could save the JSON and CSV data as local variables. To do so, copy the contents of the JSON file "ccc_precinct_topo.json" into array data[0]:
var data = [];
data[0] = [...]; // contents of "ccc_precinct_topo.json"
In a second step, save the contents of the CSV file "CCC_Primary_results.csv" as a string in a new variable and use d3.csv.parse() (d3.csvParse() in d3 v4 and later) to convert it into an array structure:
var csvContent = 'String|With|CSV|Values';
data[1] = d3.csv.parse(csvContent);
Now to see if you get correct values, send the data to the console:
console.log(data);
Open the Developer Tools (Hit F12) and refresh the page. In the Console section you should see an array with two elements, both should be array structures.
Instead of using Promises with d3.json() and d3.csv(), continue with the code you find below the .then(function(data){ line (see the sketch below).
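Putting those steps together, a rough sketch might look like this (drawMap() is a hypothetical name standing in for whatever code the tutorial places inside the .then callback; the CSV string is a placeholder):
var data = [];
data[0] = { /* paste the contents of "ccc_precinct_topo.json" here */ };
var csvContent = 'precinct,votes\n1,100';   // placeholder CSV string
data[1] = d3.csv.parse(csvContent);         // d3.csvParse(csvContent) in d3 v4 and later
console.log(data);                          // should log an array with two elements
drawMap(data);                              // hypothetical wrapper around the tutorial's .then body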
P.S. In most cases someone writing a tutorial about web services or pages assumes that the shown code will be used as part of a web project and thus will be loaded by a web server. But you are right, it could have been mentioned as a prerequisite, e.g. "Before you start, set up LAMP or XAMPP on your development system".

Javascript detect which host it was loaded from

I have a Javascript library I'm working on. It can be self-hosted or run from another server. The script makes a number of AJAX calls and the preferred method is making POST requests to the same host as the including page. To allow for cross-domain calls it also supports JSONP, but this limits the amount of data that can be sent (~2K to safely accommodate most modern browsers' URL length limits).
Obviously the user including the script knows where they're getting it from and could manually select JSONP as needed, but in the interest of simplifying things, I'd like to detect, within the script itself, whether the script was loaded from the same host as the page including it or not.
I'm able to grab the script element with jQuery but doing a $('script').attr('src') is only returning a relative path (e.g. "/js/my-script.js" not "http://hostname.com/js/my-script.js") even when it's being loaded from a different host.
Is this possible and if so, how would I go about it?
Thanks in advance.
Don't use JSONP, use CORS headers.
But if you really want to do the check in JS, use var t = $('script')[0].outerHTML.
Effect on my page:
[20:43:34.865] "<script src="http://www.google-analytics.com/ga.js" async="" type="text/javascript"></script>"
Checking location.host should do the trick.
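A minimal sketch of that comparison (assuming the library file is named my-script.js, as in the question; the .src DOM property, unlike attr('src'), always returns the resolved absolute URL):
var scripts = document.getElementsByTagName('script');
var sameHost = true;
for (var i = 0; i < scripts.length; i++) {
  if (scripts[i].src.indexOf('my-script.js') !== -1) {   // assumed file name
    var a = document.createElement('a');
    a.href = scripts[i].src;                             // let the browser parse the URL
    sameHost = (a.host === location.host);
  }
}
// fall back to JSONP when sameHost is false, plain POSTs otherwise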

Detect the file size of a link's href using JavaScript

I would like to write a script to detect the file size of the target of a link on a web page.
Right now I have a function that finds all links to PDF files (i.e. the href ends with '.pdf') and appends the string '[pdf]' to the innerText. I would like to extend it so that I can also append some text advising the user that the target is a large file (e.g. greater than 1MB).
Thanks
Some web servers may give you a Content-Length header in response to a HEAD request. You could potentially use an XMLHttpRequest to send the HEAD request and see what you get.
Here's what one of my IIS servers says about a PDF file:
HTTP/1.1 200 OK
Content-Length: 127791
Content-Type: application/pdf
...
However, anything that's not delivered directly by the web server (a file served by PHP or ASP.net, for example) won't work unless the script specifically handles HEAD requests.
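A rough sketch of the XMLHttpRequest approach (my own illustration; the PDF path is a placeholder, and the request must be same-origin unless CORS is set up):
function getFileSize(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('HEAD', url, true);                 // HEAD: headers only, no body
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(parseInt(xhr.getResponseHeader('Content-Length'), 10));
    }
  };
  xhr.send();
}
getFileSize('docs/report.pdf', function (bytes) {   // placeholder path
  if (bytes > 1024 * 1024) {
    console.log('large file: ' + bytes + ' bytes'); // e.g. append a warning to the link text
  }
});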
You should be able to do a HEAD request using XMLHttpRequest, assuming the files are under the same domain.
This is however something that should really be done on the server side. Doing it with extra requests has no benefit whatsoever.
You can't do this, or at least not in any practical, cross-browser way.
If you know the file size beforehand, for example when generating the document that links to the files, you could hard-code the sizes into the HTML document, e.g. on the link to large_file.pdf (see the sketch below).
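A small sketch of that idea (the file names and sizes are hypothetical; they would be written out when the page is generated):
var pdfSizes = { 'large_file.pdf': '2.4 MB' };              // hard-coded at page-generation time
document.querySelectorAll('a[href$=".pdf"]').forEach(function (link) {
  var name = link.getAttribute('href').split('/').pop();
  if (pdfSizes[name]) {
    link.textContent += ' [pdf, ' + pdfSizes[name] + ']';   // e.g. "large_file.pdf [pdf, 2.4 MB]"
  }
});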
