Extract javascript information from url with python

I have a URL that links to a JavaScript file, for example http://something.com/../x.js. I need to extract a variable from x.js.
Is it possible to do this using Python?
At the moment I am using urllib2.urlopen() but when I use .read() I get this lovely mess:
U�(��%y�d�<�!���P��&Y��iX���O�������<Xy�CH{]^7e� �K�\�͌h��,U(9\ni�A ��2dp}�9���t�<M�M,u�N��h�bʄ�uV�\��0�A1��Q�.)�A��XNc��$"SkD�y����5�)�B�t9�):�^6��`(���d��hH=9D5wwK'�E�j%�]U~��0U�~ʻ��)�pj��aA�?;n�px`�r�/8<?;�t��z�{��n��W
�s�������h8����i�߸#}���}&�M�K�y��h�z�6,�Xc��!:'D|�s��,�g$�Y��H�T^#`r����f����tB��7��X�%�.X\��M9V[Z�Yl�LZ[ZM�F���`D�=ޘ5�A�0�){Ce�L*�k���������5����"�A��Y�}���t��X�(�O�̓�[�{���T�V��?:�s�i���ڶ�8m��6b��d$��j}��u�D&RL�[0>~x�jچ7�
When I look in the dev tools to see the DOM, the only thing in the body is a string wrapped in tags. In the regular view that string is a json element.

.read() should give you the same thing you see in the "view source" window of your browser, so something's wrong. It looks like the HTTP response might be gzip-compressed, and urllib2 does not decompress gzip-encoded responses for you. urllib2 also doesn't request gzipped data by default, so if this is the problem, the server is probably misconfigured, but I'm assuming that's out of your control.
I suggest using requests instead. requests automatically decompresses gzip-encoded responses, so it should solve this problem for you.
import requests
r = requests.get('https://something.com/x.js')
r.text # unparsed json output, shouldn't be garbled
r.json() # parses json and returns a dictionary
In general, requests is much easier to use than urllib2 so I suggest using it everywhere, unless you absolutely must stick to the standard library.
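If you do have to stay with the standard library, the decompression can be done by hand. The following is only a sketch, written for Python 3 (an assumption, since the question uses the Python 2 urllib2 module); the helper name gunzip_if_needed is made up for illustration, and the "response" is simulated locally instead of fetched from a server.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def gunzip_if_needed(raw, charset="utf-8"):
    """Return the body as text, decompressing it first if it looks gzipped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Simulate a server that sends gzip-compressed JSON without being asked to.
compressed_body = gzip.compress(b'{"answer": 42}')
data = json.loads(gunzip_if_needed(compressed_body))
print(data["answer"])
```

In Python 3, where urllib2 became urllib.request, the raw bytes would come from urllib.request.urlopen(url).read().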

import json
import urllib2
# Works only if x.js contains nothing but JSON and the response is not compressed.
js = urllib2.urlopen("http://something.com/../x.js").read()
data = json.loads(js)
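json.loads() only succeeds when the whole file is JSON. If x.js instead assigns a JSON-compatible literal to a variable, one possible approach is to locate the assignment with a regular expression and parse just the value. This is a sketch under that assumption: extract_js_var and the sample source are invented for illustration, and it will not handle values that are not valid JSON (single-quoted strings, unquoted keys, function calls).

```python
import json
import re


def extract_js_var(source, name):
    """Return the JSON-compatible value assigned to `name` in JavaScript source."""
    pattern = r"\b(?:var|let|const)\s+" + re.escape(name) + r"\s*=\s*"
    match = re.search(pattern, source)
    if match is None:
        raise KeyError(name)
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (such as the trailing semicolon).
    value, _end = json.JSONDecoder().raw_decode(source, match.end())
    return value


# Hypothetical contents of x.js, already decoded to text.
js_source = 'var config = {"name": "x", "items": [1, 2, 3]};\nvar count = 5;'
print(extract_js_var(js_source, "config"))
```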

Related

Using Javascript I want to import an XML from url and then convert it to JSON

Using Javascript... I am trying to figure out how to import the xml from here:
http://gamebattles.majorleaguegaming.com/xboxone/call-of-duty-black-ops-iii/team/team-cnk/stats.xml
and then convert it into Json so I can use it with AngularJS and display it on a front end app.
Any help would be appreciated.

Joe -
You can use jQuery to get the data from that URL and parse the XML.
https://api.jquery.com/jquery.get/
https://api.jquery.com/jQuery.parseXML/
From there you need to determine whether you want to convert it into JSON or use it as is. There is not a simple and clean way to convert XML to JSON due to the attributes fields present, but here is an example of some code to do it:
https://davidwalsh.name/convert-xml-json
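To make the attribute problem concrete, here is one possible convention, sketched in Python rather than JavaScript purely as an illustration: attributes become keys prefixed with "@", repeated child tags become lists, and mixed text goes under "#text". The function name and the sample XML are made up; the real stats.xml may need different handling.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into plain dicts, lists and strings."""
    # Attributes are kept apart from child elements by the "@" prefix.
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag turns the entry into a list of values.
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = existing = [existing]
            existing.append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # a leaf element with only text becomes a string
        node["#text"] = text
    return node


sample = '<team id="7"><name>CNK</name><player>A</player><player>B</player></team>'
print(json.dumps(element_to_dict(ET.fromstring(sample))))
```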
If you are having cross-domain issues, you will need to request that data server side with PHP, Java, Node.js or your back end of choice, for example:
Java: http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html
PHP: http://php.net/manual/en/function.file-get-contents.php
Node.js : https://davidwalsh.name/nodejs-http-request

Handling a windows-1250 URI in node.js/express

My app depends on a webservice to form its URIs, so sometimes it comes up with (what I believe is) a windows-1250 encoded string (/punk%92d) and express fails as follows:
Connect
400 Error: Failed to decode param 'punk%92d'
at Layer.match
So I thought about converting each link to that segment into utf-8 (example: /punk’d, so there would be no reference to the offending encoding), and back again to windows-1250 to work with the external webservice.
I tried this approach using both iconv and iconv-lite but there's always something wrong with the results: /punk d, /punk�d, etc.
Here's a sample using iconv:
var str = 'punk’d';
var buf = new Buffer(str.toString('binary'), 'binary');
console.log(new Iconv('UTF-8', 'Windows-1250').convert(buf).toString('binary'));
…and iconv-lite:
console.log(iconv.decode(new Buffer(str), 'win1250'));
I know using binary is a bad approach, but I was hoping something, anything would just do the job. I obviously tried multiple variations of this code since my knowledge of Buffers is limited, and even simpler things wouldn't work, like:
console.log(new Buffer('punk’d').toString('utf-8'));
So I'm interested in either a way to handle those encoded strings in the URI within express, or an effective way to convert them within node.js.

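Sorry if this seems like too simple a thing to try, but since Node and Express are both JavaScript, have you tried simply using decodeURIComponent()? Note that it only understands UTF-8 percent-encoding: it decodes 'punk%E2%80%99d' to 'punk’d', but it throws a URIError on 'punk%92d', because %92 on its own is not valid UTF-8. I think you're getting that weird output from iconv because you're converting from the wrong encoding: a string literal such as 'punk’d' is already Unicode, so the Windows-1250 step has to be applied to the raw %92 byte, not to the literal.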
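The round trip the question describes (Windows-1250 percent-encoding to UTF-8 and back) can be sketched as follows. The sketch is in Python, not Node, so treat it as a language-neutral illustration of the steps; the function names are invented, and it assumes the webservice really uses Windows-1250 (Python codec name cp1250).

```python
from urllib.parse import quote, unquote


def to_utf8_segment(segment, source_encoding="cp1250"):
    """Percent-decode `segment` as `source_encoding`, then re-encode it as UTF-8."""
    text = unquote(segment, encoding=source_encoding)
    return quote(text)  # quote() percent-encodes as UTF-8 by default


def to_legacy_segment(segment, target_encoding="cp1250"):
    """Percent-decode a UTF-8 `segment`, then re-encode it for the legacy service."""
    text = unquote(segment)  # unquote() assumes UTF-8 by default
    return quote(text, encoding=target_encoding)


utf8_form = to_utf8_segment("punk%92d")
legacy_form = to_legacy_segment(utf8_form)
print(utf8_form, legacy_form)
```

The key point is that each decode step must be told which encoding the percent-escaped bytes are in; guessing wrong produces the replacement characters seen in the question.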

how to make http request using javascript in vxml?

How to make http request using javascript in vxml?
(Generally, src contains a link to an XML file for the data element, but in my case the target is not necessarily an XML file, so I think I can't use the data element here.)

There is nothing in pure ECMAScript supported by VXML browsers (that I know of, unless someone has significantly extended their browser from the standard) that allows anything like what you seem to be asking for, such as the XMLHttpRequest used for regular web AJAX requests. However, as Kevin Junghans mentioned, you could make use of the <data> element to fetch a document which is expected to be XML. Some browsers may have extensions to the VXML standard that allow you to specify the file type coming back, letting you pick either XML or JSON.
However, a more generalized solution, if you don't know beforehand what format the fetched document will be in, may be to write a wrapper XML web service which in turn requests the desired document, and wraps it in XML.
e.g.
<var name="docURI" expr="'http://someserver/some/doc.json'" />
<data name="documentContents" src="myservice.xml.php" namelist="docURI" />
and write myservice.xml.php to return something like
<?xml version="1.0"?>
<documentWrapper>content from doc.json</documentWrapper>

How to apply a localization to a javascript string

I assigned a string to a JavaScript variable, like this:
var word = "Please input correct verb"
I want this string to be controlled by a resource file in an ASP.NET project. Is there a way to replace the string using ASP.NET syntax so that I can switch languages?
<%$ Resources:Registration, correctverb%>
Thanks.

There are various i18n projects for JavaScript, e.g. http://i18next.com/
If you have ResX files in your ASP project and you want them as JavaScript or JSON files you can convert them here; or via the REST API you could convert a resource file as follows:
$ curl --data-binary @messages.resx \
http://localise.biz/api/convert/resx/messages.json
(example in cURL, which I guess you may not have if you're on Windows)

A common approach for this is creating an HTTP handler that evaluates requests for say files with the extension *.js.axd (or whatever extension you come up with) and then parse the javascript file by replacing defined tokens with the actual localized resource value.
It may be costly only the first time the file is requested but then everything should run smoothly if caching is applied. Here's an example of how to create a handler, parsing the file should be trivial. You could use the same syntax to define localized strings on your file: <% LocalizedResourceName %>

How to unpack Javascript in Python

I would like to retrieve the contents of a javascript script instead of executing it upon requesting it.
EDIT: I understand that Python is not executing the javascript code. The issue is that when I request this online JS script it gets executed. I'm unable to retrieve the contents of the script. Maybe what I want is to decode the script like so http://jsunpack.jeek.org/dec/go
That's what my code looks like to request the js file:
def request(self, uri):
    data = None
    req = urllib2.Request(uri, data, self.header)
    response = urllib2.urlopen(req)
    html_text = response.read()
    return html_text.decode()
I know approximately what the insides of the script look like but all I get after the request is issued is a 'loaded' message. My guess is that the JS code gets executed. Is there any way to just request the code?

There is no HTML or JavaScript interpreter in urllib2. This module does nothing but fetch the resource and return it to you raw; it certainly will not attempt to execute any JavaScript code it receives. If you are not receiving the response you expect, check the URL with a tool like wget or monitor the network connection with Wireshark or Fiddler to see what the server is actually returning.
(decode() here only converts the bytes of the HTTP response body to Unicode characters—using the default character encoding, which probably isn't a good idea.)
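Instead of relying on the default encoding, the charset can be taken from the Content-Type header when the server supplies one. This is a sketch for Python 3 (an assumption, as the question uses Python 2); decode_with_charset is a made-up helper, and the UTF-8 fallback is a guess that may not suit every server.

```python
from email.message import Message


def decode_with_charset(raw, content_type=None, default="utf-8"):
    """Decode response bytes using the charset named in a Content-Type value."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    charset = msg.get_content_charset() or default
    return raw.decode(charset)


body = "caf\u00e9".encode("iso-8859-1")
print(decode_with_charset(body, "application/javascript; charset=ISO-8859-1"))
```

An unknown charset name raises LookupError, so real code may want to catch that and fall back to the default.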
ETA:
I guess what I want is to decode the Javascript like so jsunpack.jeek.org/dec/go
Ah, well that's a different game entirely. You can get the source for that here, though you'll also need to install SpiderMonkey, the JavaScript engine from Mozilla, to allow it to run the downloaded JavaScript.
There's no way to automatically ‘unpack’ obfuscated JavaScript without running it, since the packing code can do anything at all and JS is a Turing-complete language. All this tool does is run it with some wrapper code for functions like eval which packers/obfuscators typically use. Unfortunately, this sabotage is easily detectable, so if it's malware you're trying to unpack you'll find this fails as often as it succeeds.

I'm not sure I understand. If I do a simplified version of your code and run it on a URI that's sure to have some javascript:
>>> import urllib2
>>> res = urllib2.urlopen("http://stackoverflow.com/questions/6946867/how-to-unpack-javascript-in-python")
and then print res.read() (or res.read().decode()), the JavaScript is intact.
Doing urlopen should retrieve whatever character stream the source provides. It's up to you to do something with it (render it as html, interpret it as javascript, etc).
