Parsing JavaScript within HTML using Simple HTML DOM Parser - javascript

I'm currently trying to parse the download link for Zippyshare files in PHP. The issue is that I need to get at their JavaScript, and I have not been able to. This is the part of the page I need to parse:
<script type="text/javascript">
var somdfunction = function() {
var a = 327030;
document.getElementById('dlbutton').omg = 327033%78956;
var b = parseInt(document.getElementById('dlbutton').omg) * (327033%3);
var e = function() {if (false) {return a+b+c} else {return (a+3)%b + 3}};
document.getElementById('dlbutton').href = "/d/91667079/"+(b+18)+"/Animals%20%28Radio%20Edit%29-www.manomuzika.net.mp3";
if (document.getElementById('fimage')) {
document.getElementById('fimage').href = "/i/91667079/"+(b+18)+"/Animals%20%28Radio%20Edit%29-www.manomuzika.net.mp3";
}
var result = 0;
}
</script>
This is fetched from the site using:
$html = file_get_html($url);
Basically, they create the download links dynamically with JavaScript. I am able to get the source with my parser, but I need to narrow it down to the values of:
var a = 327030;
document.getElementById('dlbutton').omg = 327033%78956;
and finally
document.getElementById('dlbutton').href = "/d/91667079/"+(b+18)+"/Animals%20%28Radio%20Edit%29-www.manomuzika.net.mp3";
Once I can get these three values out of the source, I will be able to build the download link. My issue at the moment is narrowing the source down to them.
I am using this parser:
http://simplehtmldom.sourceforge.net/
If you would like to see the source I am able to parse at the moment, here it is:
http://www.somf.us/music/test.php?url=http://www66.zippyshare.com/v/91667079/file.html

You need to use a regex, because Simple HTML DOM is not a JavaScript parser.
Here's a hint to get you started:
preg_match('/var a = (\d+);/', file_get_contents($url), $m);
echo $m[1];
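Building on that hint, here is a minimal sketch (in JavaScript, e.g. Node.js) that pulls the three values out of the script text quoted in the question and redoes the page's arithmetic. It assumes the script keeps exactly that shape; `buildDownloadPath` is a hypothetical helper name, and the regexes will stop matching if the site changes its code. Note that in the quoted script `a` never reaches the href: only `b` (the `omg` value times `327033%3`) does.

```javascript
// Sketch: rebuild the dlbutton href from the inline script text with
// regular expressions. Assumes the script has exactly the shape quoted in
// the question; it does not parse or evaluate JavaScript in general.
function buildDownloadPath(script) {
  // document.getElementById('dlbutton').omg = 327033%78956;
  const omg = script.match(/\.omg\s*=\s*(\d+)\s*%\s*(\d+);/);
  // var b = parseInt(document.getElementById('dlbutton').omg) * (327033%3);
  const mul = script.match(/var b = parseInt\(.*?\)\s*\*\s*\((\d+)\s*%\s*(\d+)\);/);
  // document.getElementById('dlbutton').href = "/d/..."+(b+18)+"/...mp3";
  const href = script.match(/'dlbutton'\)\.href\s*=\s*"([^"]*)"\+\(b\+(\d+)\)\+"([^"]*)"/);
  if (!omg || !mul || !href) return null; // the page layout has changed

  // Reproduce the page's arithmetic (a is not used for the href).
  const omgValue = Number(omg[1]) % Number(omg[2]);
  const b = omgValue * (Number(mul[1]) % Number(mul[2]));
  return href[1] + (b + Number(href[2])) + href[3];
}
```

For the quoted snippet this yields `/d/91667079/18/Animals%20%28Radio%20Edit%29-www.manomuzika.net.mp3`, because 327033 % 3 is 0 and so `b` is 0. The path is relative to the file's host.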

Related

Get data from a script tag with Scrapy XPath and use it in a CSV

I've been trying to extract data from a script tag using Scrapy (XPath). My main issue is identifying the correct div and script tags. I'm new to XPath and would be thankful for any kind of help!
<script>
var COUNTRY_SHOP_STATUS = "buy";
var COUNTRY_SHOP_URL = "";
try {
digitalData.page.pathIndicator.depth_2 = "mobile";
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";
digitalData.product.pvi_type_name = "Mobile";
digitalData.product.pvi_subtype_name = "Smartphone";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
} catch(e) {}
</script>
Ultimately I would like to populate my CSV file with model_name and depth_3, depth_4 and depth_5. I've tried the solutions from similar questions, but they don't seem to work...
You can use a regex to extract the required values:
import re
source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").extract()[0]
def get_values(parameter, script):
    # re.escape keeps the "." in "pathIndicator.depth_5" literal, and
    # [^"]* stops at the closing quote instead of matching greedily
    return re.findall(r'%s = "([^"]*)"' % re.escape(parameter), script)[0]
print(get_values("pathIndicator.depth_5", source))
print(get_values("pvi_subtype_name", source))
print(get_values("model_name", source))
...
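The same extraction, plus the step of turning the values into a CSV row, sketched in JavaScript for comparison. It assumes each wanted value is a double-quoted string assigned on its own line, as in the script above; `getValue`, `csvField` and `toCsvRow` are hypothetical helper names. In Scrapy itself you would more likely yield an item and let a feed export write the CSV.

```javascript
// Escape regex metacharacters so "pathIndicator.depth_3" matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the double-quoted value assigned to `parameter`, or null.
function getValue(parameter, script) {
  const m = script.match(new RegExp(escapeRegExp(parameter) + ' = "([^"]*)"'));
  return m ? m[1] : null;
}

// Quote a CSV field: wrap in double quotes and double any inner quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// One CSV row with the values of `parameters`, in order (missing -> "").
function toCsvRow(script, parameters) {
  return parameters.map((p) => csvField(getValue(p, script) || "")).join(",");
}
```

A value that is assigned from another variable, such as `digitalData.product.category`, is not a quoted string and so comes back as `null` here.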

How to create an array or object of variables by looking them up in the DOM?

I am generating some JS variables on a Twig template and I am prefixing them with a dynamic value. For example:
<script type="text/javascript">
quoteGridId = 'grid_quote';
</script>
<script type="text/javascript">
quoteContactGridId = 'grid_quote_contact';
</script>
<script type="text/javascript">
archiveGridId = 'grid_archive';
</script>
I need to be able to use them in a Javascript file included after the page loads. How can I create an array of values containing all the *GridId vars?
I would like to be able to use the following on the script:
[
'quoteGridId' => 'grid_quote',
'quoteContactGridId' => 'grid_quote_contact',
'archiveGridId' => 'grid_archive',
]
UPDATE:
Let me try to make my problem clearer for those willing to help. I am working on a legacy system. The system had a grid that generated a gridId value; at the end a JS file was included, and that file used the gridId variable to do several things.
Now I need more than one grid on the same page, which is a problem because both grids generate the same variable name:
gridId = 'something';
gridId = 'something1';
When the script reads gridId, it always gets the last value (something1), so nothing happens on the first grid.
My solution was to prefix each gridId name, resulting in what I showed in the original post. E.g.:
somethingGridId = 'something';
something1GridId = 'something1';
What I am looking for is a way to reuse the main JS file by passing it those dynamic gridIds, but I can't get this to work.
The only solution I have in mind is to duplicate the file per grid and change the value of gridId in each copy to the ID to be used...
I am open to ideas. Any?
You can search the properties of window with a regular expression, e.g. /.+GridId$/ matches any variable name that ends in GridId (with at least one character before it). You can then iterate over the matches as you wish.
Example:
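var pattern = /.+GridId$/; // names that end in "GridId"
var GridIds = [];
// Run this after the <script> blocks that define the *GridId globals.
for (var varName in window) {
    if (pattern.test(varName)) {
        // Computed key: {varName: ...} would use the literal key "varName".
        GridIds.push({ [varName]: window[varName] });
    }
}
console.log(GridIds);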
<script type="text/javascript">
quoteGridId = 'grid_quote';
</script>
<script type="text/javascript">
quoteContactGridId = 'grid_quote_contact';
</script>
<script type="text/javascript">
archiveGridId = 'grid_archive';
</script>
Hope this helps!
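A variant of the same idea as a reusable sketch: `collectMatching` (a hypothetical name) takes the object to scan as a parameter, so in a browser you would pass `window`, and it returns a single `{ name: value }` object, which is closer to what the question asks for than an array of one-key objects. The pattern should not use the `g` flag, because `RegExp.prototype.test` with `g` keeps state in `lastIndex` between calls.

```javascript
// Collect every own enumerable property of `scope` whose name matches
// `pattern` into one plain { name: value } object.
function collectMatching(scope, pattern) {
  const result = {};
  for (const name of Object.keys(scope)) {
    if (pattern.test(name)) {
      result[name] = scope[name];
    }
  }
  return result;
}

// In a browser, after the *GridId <script> blocks have run:
// const gridIds = collectMatching(window, /.+GridId$/);
```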
Instead of assigning quoteGridId = 'grid_quote', why not create a top-level object and assign each value to it as a key-value pair, like:
var gridData = {}; // declare once, before the per-grid <script> blocks
gridData.quoteGridId = 'grid_quote';
gridData.quoteContactGridId = 'grid_quote_contact';
/* etc. assignments */
// Access your data points like so in a loop, if you choose
Object.keys(gridData).forEach(key => {
    const val = gridData[key];
    // Use `key` and `val` however you'd like
});
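To tie this back to the legacy script, here is a minimal sketch that assumes the shared JS file can be changed to expose a function taking the grid id as a parameter instead of reading a global `gridId`. `initGrid` is hypothetical and stands in for whatever that file currently does with `gridId`.

```javascript
// Hypothetical: the shared grid script exposes one function that takes
// the grid id instead of reading a single global `gridId`.
function initGrid(gridId) {
  // ...whatever the legacy script did with gridId, e.g. look up
  // document.getElementById(gridId) in a browser...
  return "initialised " + gridId;
}

// Each Twig-rendered grid adds one entry to the shared object...
var gridData = {};
gridData.quoteGridId = 'grid_quote';
gridData.quoteContactGridId = 'grid_quote_contact';

// ...and the shared script runs once per entry, so no grid overwrites another.
var results = Object.keys(gridData).map(function (key) {
  return initGrid(gridData[key]);
});
```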
I think you have to use a list (an array).
Each time, push your value onto the list like this:
var myList = [];
myList.push('something');
myList.push('something1');
Now you can access all of them like this:
console.log(myList[0]);
console.log(myList[1]);
or just the last one:
console.log(myList[myList.length - 1])

how to get element from XML using javascript

I have to move my project's XML handling from PHP to JavaScript (at least some parts of it).
I had a construct like this:
$xml = simplexml_load_file("http://...");
$profile = $xml->profile;
$user_id = $profile->user_id;
To translate this into JavaScript, I used:
var xmlHttp_subscribe = new XMLHttpRequest();
xmlHttp_subscribe.onreadystatechange=postCall;
xmlHttp_subscribe.open("GET","http://...",true);
xmlHttp_subscribe.send();
and then the function postCall():
function postCall(){
var t = document.getElementsByName("MY_API").value;
alert('t'+t);
var p = document.getElementsByName("profile").value;
alert('p'+p);
var h = document.getElementsByName("user_id").value;
//...//
}
The XML at my http:// URL looks like this:
<MY_API>
<profile>
<user_id>the_user_id</user_id>
</profile>
</MY_API>
What I would like to do is to get this 'the_user_id' part as string in plain text.
Does any one have any idea how to do this?
Am I heading in the right direction?
Thanks for any kind of help.
getElementsByName is the wrong function here: it looks elements up by their name attribute, not by tag name. What you need is getElementsByTagName.
Check this link out; it should be what you're looking for.
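As suggested by Pineda, the right function name is getElementsByTagName. In addition, the right property is not "value" but nodeValue, read from the element's text node (an element's own nodeValue is null). You also need to query the request's XML response rather than the page's document, so you should use:
function postCall(){
    if (xmlHttp_subscribe.readyState !== 4 || xmlHttp_subscribe.status !== 200) return; // wait for a complete, successful response
    // getElementsByTagName returns a collection, so take the first match, then read its text node
    var h = xmlHttp_subscribe.responseXML.getElementsByTagName("user_id")[0].firstChild.nodeValue;
}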

JSON stringify works when previewing from Dreamweaver but not from .AIR file when application created

I've been developing a simple application using Adobe AIR, the HTML and javascript version.
The application submits a form to an online URL.
The values of the forms are JSON strings.
I'm using this function to submit the data:
function fetchStudents()
{
var stmt = new air.SQLStatement();
stmt.sqlConnection = conn;
stmt.text = "SELECT * FROM studentsTable2 WHERE deleted='0'";
stmt.addEventListener(air.SQLEvent.RESULT, function(event){
var result = event.target.getResult();
sync_students = JSON.stringify(result.data);
fetchCourses();
});
stmt.addEventListener(air.SQLErrorEvent.ERROR, errorHandler);
stmt.execute();
}
JSON.stringify works when I test the application in Dreamweaver using Preview in Adobe AIR.
sync_students is then a JSON string filled with all the data from the table correctly formatted.
But when I create the AIR file, install the application and run it, it no longer works.
sync_students is still a JSON string, but its objects are completely empty: [{},{},{}]
I have read around a lot and seen suggestions to use JSON2.js etc., and I have tried these, but I haven't been successful.
This is driving me crazy, any help would be greatly appreciated.
Thanks so much in advance!
Have a look at the accepted answer to this question. The problem there was that garbage collection reclaimed a variable too soon; in other words, the scope was wrong:
AIR Sqlite: SQLEvent.RESULT not firing, but statement IS executing properly
EDIT: This is all a question of scope. If I may suggest, look into the keywords 'javascript scope' and 'javascript closure' to get a better understanding of it.
Below is just a (very) short summary of ONE WAY to define a 'class' in JavaScript:
var MyNameSpace = {};
MyNameSpace.SchoolDataSource = function() {
this.publicMember = 2;
this.publicFunction = function(x) {
var newValue = this.publicMember + _privateMemberOne + x;
return _privateFunctionMul2(newValue);
} ;
var _privateMemberOne = 1;
var _privateFunctionMul2 = function (y) { return 2*y; } ;
};
var mySchoolDataSource = new MyNameSpace.SchoolDataSource();
mySchoolDataSource.publicFunction(3); // ok. (returns (3+2+1) * 2 = 12)
var bar = mySchoolDataSource.publicMember; // ok. ( === 2)
// mySchoolDataSource._privateFunctionMul2(4); throws a TypeError (not a function), which is what we want.
var foo = mySchoolDataSource._privateMemberOne; // undefined: not accessible, which is what we want.
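For comparison, the same public/private split written with class syntax and private `#` fields and methods (ES2022). This is an alternative, not what the linked AIR answer uses, and the old WebKit engine embedded in Adobe AIR is unlikely to support it, so treat it as a sketch for modern browsers or Node.js.

```javascript
// Same public/private split as the closure-based "class", using private
// (#) class members; these are only reachable inside the class body.
class SchoolDataSource {
  #privateMemberOne = 1;
  publicMember = 2;

  #privateFunctionMul2(y) {
    return 2 * y;
  }

  publicFunction(x) {
    const newValue = this.publicMember + this.#privateMemberOne + x;
    return this.#privateFunctionMul2(newValue);
  }
}

const mySchoolDataSource = new SchoolDataSource();
mySchoolDataSource.publicFunction(3); // (2 + 1 + 3) * 2 = 12
mySchoolDataSource._privateMemberOne; // undefined: no such public property
// mySchoolDataSource.#privateMemberOne would be a SyntaxError outside the class.
```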

wikimedia api getting relevant data from json string

This follows up on the question I asked yesterday. I was able to get the required data; the final data looks like this (please follow this link).
I tried the following code to get all the infobox data:
content = content.split("}}\n");
for(k in content)
{
if(content[k].search("Infobox")==2)
{
var infobox = content[k];
alert(infobox);
infobox = infobox.replace("{{","");
alert(infobox);
infobox = infobox.split("\n|");
//alert(infobox[0]);
var infohtml="";
for(l in infobox)
{
if(infobox[l].search("=")>0)
{
var line = infobox[l].split("=");
infohtml = infohtml+"<tr><td>"+line[0]+"</td><td>"+line[1]+"</td></tr>";
}
}
infohtml="<table>"+infohtml+"</table>";
$('#con').html(infohtml);
break;
}
}
I initially assumed each element is enclosed in {{ }}, so I wrote this code. But I found that I cannot get the entire infobox data this way. There is this element
{{Sfn|National Informatics Centre|2005}}
occurring inside it, which ends my infobox data prematurely.
It seems it would be far simpler without JSON. Please help me.
Have you tried DBpedia? AFAIK they provide template usage information. There is also a Toolserver tool named Templatetiger, which extracts templates from the static dumps (not live data).
However, I once wrote a tiny snippet to extract templates from wikitext in javascript:
var title; // of the template
var wikitext; // of the page
var templateRegexp = new RegExp("{{\\s*"+(title.indexOf(":")>-1?"(?:Vorlage:|Template:)?"+title:title)+"([^[\\]{}]*(?:{{[^{}]*}}|\\[?\\[[^[\\]]*\\]?\\])?[^[\\]{}]*)+}}", "g");
var paramRegexp = /\s*\|[^{}|]*?((?:{{[^{}]*}}|\[?\[[^[\]]*\]?\])?[^[\]{}|]*)*/g;
wikitext.replace(templateRegexp, function(template){
    // logabout(template, "input ");
    var parameters = template.match(paramRegexp);
    if (!parameters) {
        console.log(title + " without parameters:\n" + template);
        parameters = [];
    }
    var unnamed = 1;
    var p = parameters.reduce(function(map, line) {
        line = line.replace(/^\s*\|/,"");
        var i = line.indexOf("=");
        map[line.substr(0,i).trim() || unnamed++] = line.substr(i+1).trim();
        return map;
    }, {});
    // you have an object "p" in here containing the template parameters
});
It handles one level of nested templates, but is still very error-prone. Parsing wikitext with regexps is as evil as trying to do it on HTML :-)
It may be easier to query the parse tree from the API: api.php?action=query&prop=revisions&rvprop=content&rvgeneratexml=1&titles=....
From that parsetree you will be able to extract the templates easily.
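A different approach to the original problem of `{{Sfn|...}}` ending the infobox early: instead of splitting on `}}\n`, walk the text and count `{{`/`}}` and `[[`/`]]` depth, splitting on `|` only at the infobox's own level. This is a minimal sketch, not a wikitext parser: `extractInfobox` is a hypothetical name, it only handles the first template whose name starts with `Infobox`, it skips unnamed parameters, and it returns `null` when there is no infobox or the braces never balance.

```javascript
// Brace-depth scan: split the first {{Infobox ...}} into top-level
// parameters so nested {{...}} and [[...|...]] stay inside their values.
function extractInfobox(wikitext) {
  const start = wikitext.search(/\{\{\s*Infobox/);
  if (start === -1) return null;

  const parts = []; // [template name, "key = value", ...]
  let depth = 0; // current {{ / [[ nesting level
  let current = "";
  let closed = false;

  for (let i = start; i < wikitext.length; i++) {
    const two = wikitext.slice(i, i + 2);
    if (two === "{{" || two === "[[") {
      if (depth > 0) current += two; // keep nested markup in the value
      depth++;
      i++; // consumed two characters
    } else if (two === "}}" || two === "]]") {
      depth--;
      i++;
      if (depth === 0) {
        parts.push(current); // the infobox itself has closed
        closed = true;
        break;
      }
      current += two;
    } else if (wikitext[i] === "|" && depth === 1) {
      parts.push(current); // top-level parameter separator
      current = "";
    } else {
      current += wikitext[i];
    }
  }
  if (!closed) return null; // unbalanced braces

  const [name, ...params] = parts;
  const fields = {};
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq === -1) continue; // unnamed parameter: skipped in this sketch
    fields[param.slice(0, eq).trim()] = param.slice(eq + 1).trim();
  }
  return { name: name.trim(), fields };
}
```

With this scan, a citation such as `{{Sfn|National Informatics Centre|2005}}` stays inside its parameter's value instead of ending the infobox.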
