How to extract part of a JavaScript snippet from a webpage using jsoup? - javascript

I want to extract some data from the script below:
$(document).ready(function(){
  $("#areaName").val(1);$("#state").val(29);$("#city").val(1);
  $("#subareaName").val(1);$("#lane").val(1);
});
For example: areaName value = 1, state value = 29, city value = 1, subareaName value = 1, lane value = 1.
How can I achieve this using jsoup?

jsoup is an HTML (and XML) parser; it does not parse JavaScript. You can use it to pull the script elements out of the page source, for example like this: Elements scripts = document.select("script");
You then have to parse the script text yourself (each element's data() returns the script source). You can use a regex to do so.
Here is an example:
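import java.util.regex.Matcher;
import java.util.regex.Pattern;
// ScriptValueExample is only an illustrative wrapper class so the snippet compiles on its own.
public class ScriptValueExample {
    public static void main(String[] args) {
        final String propertyName = "areaName";
        // Match "#<name>", then lazily skip ahead and capture the argument of the next val(...) call.
        final String regex = "#" + propertyName + ".*?val\\((.*?)\\)";
        final String script = "$(document).ready(function(){ \n"
                + " $(\"#areaName\").val(1);$(\"#state\").val(29);$(\"#city\").val(1);\n"
                + " $(\"#subareaName\").val(1);$(\"#lane\").val(1);\n"
                + "});";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(script);
        if (matcher.find()) {
            String areaName = matcher.group(1);
            System.out.println(propertyName + ": " + areaName);
        }
    }
}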

Related

regex replace jumbles string

This is my console.log:
str : +0-D : replace : Da href="Javascript:PostRating('','|P|622','+0')">+0</a>-D
I have the following function:
function replaceAll_withMatching(str, find, rep, prodId) {
    //console.log(str + " : " + find + " : " + rep + " : " + prodId);
    var returnString = "";
    if (find.toLowerCase() == str.toLowerCase()) {
        returnString = rep;
    } else {
        escfind = "\\" + find ;
        var regexp = new RegExp(escfind, "i");
        var match = regexp.test(str);
        if (match) {
            var regAHREF = new RegExp("\\<a", "i");
            var AHREFMatch = regAHREF.test(str);
            if (AHREFMatch == false) {
                str = str.replace(regexp, rep);
                str = replaceProductAll(str, PRODUCT_PLACEHOLD, prodId);
            } else {
                var aTagText = $($.parseHTML(str)).filter('a').text();
                if ((find !== aTagText) && (aTagText.indexOf(find) < 0)) {
                    console.log(regexp);
                    console.log("str : " + str + " : replace : " + str.replace(regexp, rep));
                    str = str.replace(regexp, rep);
                }
            }
        }
        //console.log(str);
        returnString = str;
    }
    //returnString = replaceProductAll(returnString, PRODUCT_PLACEHOLD, prodId);
    return returnString;
}
This function looks for an "<a>" tag. If it doesn't find one, it does the replace. If it does find one, it checks some extra conditions and, if they all hold, does another replace.
The string that I'm passing in has been already "parsed" on the +0:
+0-D
In the second pass I'm expecting it to find the "D" in the above string, and then do the following replacement:
D
But as you can see, after the 2nd replace it is jumbling the string and producing malformed HTML
Da href="Javascript:PostRating('','|P|622','+0')">+0</a>-D
More Context:
I have a string that needs to have a replace done on it. This is existing code so I'm not in a position to rework the function.
The original string is: +0-D
This string gets passed into the function below multiple times looking for matches and then if it finds a match it will replace it with the value (also passed in as a parameter).
When the +0-D gets passed in the first time the +0 is matched and a replace is done: +0
Then the next time the string is passed in: +0-D. The function finds the D as a match and it looks like it attempts to do a replace. But it is on this pass that the string gets jumbled.
Ultimately what I'm expecting is this:
+0-D
This is what I'm currently getting after the 2nd attempt:
Da href="Javascript:PostRating('','|P|622','+0')">+0</a>-D
Further Context:
The +0-D is one of many strings this function handles. Some are very simple (i.e. Just +0), others are much more complex.
Question:
Based on the above, what do I need to do to get the regex to not jumble the string?
The problem was in the escfind:
escfind = "\\" + find;
var regexp = new RegExp(escfind,"i");
var match = regexp.test(str);
The first thing I did was create a new RegExp in the second replace clause that does not use "\\" + find:
if((find !== aTagText) && (aTagText.indexOf(find) < 0)){
try{
var regexp2 = new RegExp(find,"i");
var match2 = regexp2.test(str);
console.log(str.replace(regexp2,rep));
}catch(err){
console.log(err);
}
}
After that my string came back as expected. However, when I opened it up to all the variations, I got an "Unexpected quantifier" error.
Then I found this question, which led me to escape my find string.
Once I replaced my code with this:
escfind = find.replace(/([*+.?|\\\[\]{}()])/g, '\\$1');
Then I got the output as expected:
<a href='+0'>+0</a>-<a href='D'>D</a>
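A likely reason the original pattern jumbled the string: with find = "D", "\\" + find builds the regex \D, which matches any non-digit character, so the replace hits the leading "<" instead of the letter D. Here is a minimal sketch of the difference. The helper name escapeRegExp, the sample string and the "[D]" replacement are illustrative assumptions, not part of the original code.

```javascript
// Hypothetical helper: escape every regex metacharacter in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var sample = "<a href='+0'>+0</a>-D"; // made-up input resembling the question's string
var find = "D";

// "\\" + "D" builds the pattern \D, which matches any non-digit character.
var brokenPattern = new RegExp("\\" + find, "i");
// Escaping leaves "D" unchanged, so the pattern matches the letter itself.
var fixedPattern = new RegExp(escapeRegExp(find), "i");

var brokenResult = sample.replace(brokenPattern, "[D]");
var fixedResult = sample.replace(fixedPattern, "[D]");

console.log(brokenResult); // [D]a href='+0'>+0</a>-D
console.log(fixedResult);  // <a href='+0'>+0</a>-[D]
```

Escaping also covers search strings such as "+0", where an unescaped "+" would otherwise be read as a quantifier.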

JS/JQUERY: How to match and replace occurrences inside specified strings?

I swear I tried figuring this out myself all day, but my regex-fu is just not that good.
I'm trying to create a small parser function that converts strings containing URLs into HTML link and image tags.
I know how complex a regex can get when it has to work out which URLs in a big string to convert to what. So I simply prefix each string to convert with a flag that tells the parser how to format it, and postfix it with the ";" character to tell the parser where that particular URL ends. This way the parser has less guesswork to do, which makes the regex match easier and the execution faster. I really don't need a generalized match-and-replace-all.
So my formatting is as follows, where "X" is the URL string:
For URLs it will be url=X;
For IMAGES it will be img=X;
So anything between my prefix and postfix must be converted accordingly.
So for example, for images in my document, the string could be:
click this image img=http://example.com/image1.jpg;
and i need that converted to
click this image <a href="http://example.com/image1.jpg" target="_blank">
<img class="img img-responsive" src="http://example.com/image1.jpg"/></a>
I am able to do this easily in PHP with the preg_match() function. Here's the code:
preg_match('/\img=(.+?)\;/i', $item_des, $matches)
I decided to move this routine to the browser instead of the backend (PHP), so I need a similar or better JS solution.
I hope someone can help here. Thanks!
Try the code below:
var str = "click this image img=http://example.com/image1.jpg;image2 img=http://example.com/image2.jpg;";
// Each phrase ends at ";", so split on it and convert the phrases one by one.
var phrases = str.split(';');
var totalRes = '';
phrases.forEach(function(phrase) {
    totalRes += processPhrase(phrase);
});
console.log(totalRes);
function processPhrase(phrase) {
    var img = phrase.split('img=');
    if (img.length > 1) { // img=X
        return img[0] + "<a href='" + img[1] + "' target='_blank'><img src='" + img[1] + "'/></a>";
    }
    var link = phrase.split('url=');
    if (link.length > 1) { // url=X
        return link[0] + "<a href='" + link[1] + "' target='_blank'>" + link[1] + "</a>";
    }
    return phrase; // plain text, nothing to convert
}
You can use the regexp /(img|url)=(.+?);/g:
(img|url) : the type, captured in a group so we know what to do with the value
= : literal "="
(.+?) : one or more characters (the non-greedy ? makes it match as few as possible)
; : literal ";"
Read more about non-greedy regexps here.
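To see what the non-greedy ? changes, here is a minimal sketch comparing the greedy and non-greedy forms on the same input. The sample string and variable names are made up for illustration.

```javascript
var sample = "img=a.jpg; and url=b.html;"; // made-up input with two ";" terminators

// Greedy: .+ runs to the LAST ";" in the string.
var greedyValue = sample.match(/img=(.+);/)[1];
// Non-greedy: .+? stops at the FIRST ";" after the prefix.
var lazyValue = sample.match(/img=(.+?);/)[1];

console.log(greedyValue); // a.jpg; and url=b.html
console.log(lazyValue);   // a.jpg
```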
Example:
var str = "click this image img=http://i.imgur.com/3wY30O4.jpg?a=123&b=456; and check this URL url=http://google.com/;. Bye!";
// executer is an object that has functions that apply the changes for each type (you can modify the functions for your need)
var executer = {
    "url": function(e) {
        return '<a target="_blank" href="' + e + '">' + e + '</a>';
    },
    "img": function(e) {
        return '<a target="_blank" href="' + e + '"><img src="' + e + '"/></a>';
    }
};
var res = str.replace(/(img|url)=(.+?);/g, function(m, type, value) {
    // executer[type] will be either executer.url or executer.img;
    // we pass the value to that function and return its result
    return executer[type](value);
});
console.log(res);

adding float in jquery

var sssee = "581.30";
var ssser = "1,178.70";
var ssee = sssee.trim().replace(/,/g, "");
var sser = ssser.trim().replace(/,/g, "");
console.log("ee " + ssee)
console.log("er " + sser)
console.log("total " + parseFloat(ssee + sser))
In the log I see:
ee 581.30
er 1178.70
total 581.301178
Why does adding replace to remove the "," mess up the computation?
Variables ssee and sser are both strings. When you perform ssee + sser, the result is the string 581.301178.70, which is then passed to parseFloat. parseFloat stops parsing at the second decimal point, which is why it returns 581.301178.
Check the snippet with the correct solution:
var sssee = 581.30;
var ssser = "1178.70";
var ssee = String(sssee).trim().replace(/,/g, "");
var sser = String(ssser).trim().replace(/,/g, "");
console.log("ee " + ssee)
console.log("er " + sser)
console.log("total " + (parseFloat(ssee) + parseFloat(sser)))
You should also wrap sssee and ssser with String() before using the trim and replace methods. Without that, your code won't work if those variables are provided as numbers instead of strings.
Your problem:
You concatenate two strings ("581.30" + "1178.70") into one string ("581.301178.70"). Then you parse it to a float (581.301178).
Solution:
You need to parse each one to a float first, and then do your addition: parseFloat(ssee) + parseFloat(sser).
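The two answers above can be folded into one small helper. This is only a sketch: parseAmount is a hypothetical name, and it assumes "," is always a thousands separator.

```javascript
// Hypothetical helper: turn a formatted amount such as "1,178.70" into a number.
function parseAmount(text) {
  return parseFloat(String(text).trim().replace(/,/g, ""));
}

// Concatenating first makes parseFloat read the string "581.301178.70".
var concatenated = parseFloat("581.30" + "1178.70");
// Parsing each value first gives a real numeric sum.
var total = parseAmount("581.30") + parseAmount("1,178.70");

console.log(concatenated);     // 581.301178
console.log(total.toFixed(2)); // 1760.00
```

toFixed(2) is used only for display, since floating-point sums are not guaranteed to print as exact decimals.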

Update uri hash javascript

I find it hard to believe this hasn't been asked but I can find no references anywhere. I need to add a URI hash fragment and update the value if it already is in the hash. I can currently get it to add the hash but my regex doesn't appear to catch if it exists so it adds another instead of updating.
setQueryString : function() {
var value = currentPage;
var uri = window.location.hash;
var key = "page";
var re = new RegExp("([#&])" + key + "=.*#(&|$)", "i");
var separator = uri.indexOf('#') !== -1 ? "&" : "#";
if (uri.match(re)) {
return uri.replace(re, '$1' + key + "=" + value + '$2');
}
else {
return uri + separator + key + "=" + value;
}
},
Also if this can be made any cleaner while preserving other url values/hashes that would be great.
Example input, as requested.
Starting uri value:
www.example.com#page=1 (or no #page at all)
Then, on click of "next page", setQueryString gets called, so the values would be:
var value = 2;
var uri = '#page=1'
var key = 'page'
So the desired output would be '#page=2'.
As to your regex question: testing whether the pattern #page=(number) or &page=(number) is present, and capturing the number, can be done with the regex /[#&]page\=(\d*)/ and the .match(regex) method. Note that = is not a special character in JavaScript regexes, so the backslash before it is optional.
If the pattern exists in the string, result will be an array with the number (as a string) at result[1]. If the pattern does not exist, result will be null.
//match #page=(integer) or &page=(integer)
var test = "#foo=bar&page=1";
var regex = /[#&]page\=(\d*)/;
var result = test.match(regex);
console.log(result);
If you want to set the key dynamically to something other than "page", you can build the regex with the RegExp constructor, as in the following (note that backslashes need escaping in string literals, which makes the code a bit more convoluted):
//dynamically created regex
var test = "#foo=bar&page=1";
var key = "page"
var regex = new RegExp("[#&]" + key + "\\=(\\d*)");
var result = test.match(regex);
console.log(result);
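The snippets above only detect the key. To actually update or append it, a sketch along the following lines may help. setHashParam and escapeKey are hypothetical names, and the function works on a hash string rather than touching window.location directly.

```javascript
// Hypothetical helper: escape regex metacharacters in the key.
function escapeKey(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the hash with key=value updated in place or appended.
function setHashParam(hash, key, value) {
  // Match "#key=..." or "&key=..." up to the next "&" or the end of the string.
  var re = new RegExp("([#&])" + escapeKey(key) + "=[^&]*");
  if (re.test(hash)) {
    // Keep the original separator and swap only the value.
    return hash.replace(re, function (match, separator) {
      return separator + key + "=" + value;
    });
  }
  var separator = hash.indexOf("#") !== -1 ? "&" : "#";
  return hash + separator + key + "=" + value;
}

console.log(setHashParam("#foo=bar&page=1", "page", 2)); // #foo=bar&page=2
console.log(setHashParam("#foo=bar", "page", 2));        // #foo=bar&page=2
console.log(setHashParam("", "page", 2));                // #page=2
```

In the question's setQueryString, the returned string could then be assigned to window.location.hash; other values in the hash are left untouched.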

How to alert a JavaScript object in its string representation format?

For example, if there is a variable like this:
var a = {1:"abc",2:"xyz"};
How can it be printed in the format below, using alert(a) or something similar?
1 : abc
2 : xyz
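JSON.stringify will convert your JavaScript object to a string. You can then replace each "," with "\n" to show every field on a new line. Note that passing the string "," to replace only changes the first occurrence, so use the regex /,/g instead. If you want to remove "{", you can add .replace("{","")
var a = {1:"abc",2:"xyz"};
alert(JSON.stringify(a).replace(/,/g,"\n"));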
You can use this:
var a = {1:"abc",2:"xyz"};
var s = "";
for(var i in a){
s = s + "\n" + i + ":" + " " + a[i];
}
alert(s);
Or
alert(JSON.stringify(a));
If you want to debug objects in JavaScript, take a look at console.log. Try it!
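For the exact "key : value" layout asked for in the question, a sketch using Object.keys avoids both the leading newline of the loop version and the quotes and braces that JSON.stringify leaves behind. formatObject is a hypothetical helper name.

```javascript
// Hypothetical helper: render each own property as "key : value" on its own line.
function formatObject(obj) {
  return Object.keys(obj).map(function (key) {
    return key + " : " + obj[key];
  }).join("\n");
}

var a = {1: "abc", 2: "xyz"};
var text = formatObject(a);
console.log(text); // in a browser, alert(text) shows the same two lines
```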
