Javascript -- call to string.search(/regex/) occasionally crashes program - javascript

I'm writing a native javascript app for android, and it involves a short regex call. The following function should select the inner string from a block of html, shorten it if it's too long, then add it back into the html block. (Most of the time anyway -- I couldn't write a perfect html parser.)
My problem is that on certain inputs, this code crashes on the command "str.search(regex)". (It prints out the alert statement right before the command, "Pre-regex string: ", but not the one afterwards, "Pos: ".) Since the app is running on the android, I can't see what error is being thrown.
Under what circumstances could javascript code possibly crash when calling "search()" on a string? There's nothing wrong with the regex itself, because this works most of the time. I can't duplicate the problem either: If I copy the string character by character and feed it into the function outside of the app, the function doesn't crash. Inside the app, the function crashes on the same string.
Here is the function. I tabbed the alert calls differently to make them easier to see.
trimHtmlString: function(str, len, append) {
append = (append || '');
if(str.charAt(0) !== '<') {
if(str.length > len) return str.substring(0, len) + append;
return str;
}
alert('Pre-regex string: '+str);
var regex = />.+(<|(^>)$)/;
var innerStringPos = str.search(regex);
if(innerStringPos == -1) return str;
alert('Pos: '+innerStringPos);
var innerStringArray = str.match(regex);
alert('Array: '+innerStringArray);
var innerString = innerStringArray[0];
alert('InnerString: '+innerString);
var innerStringLen = innerString.length;
innerString = innerString.substring(1, innerString.length-1);
alert(innerString.length);
if(innerString.length > len) innerString = innerString.substring(0, len) + append;
return str.substring(0, innerStringPos+1)
+ innerString
+ str.substring(innerStringPos+innerStringLen-1, str.length);
}

First, do not parse HTML with regular expressions. You have been warned. Next, make sure you are always passing an actual string. Calling .search() on null or undefined will cause problems. Maybe you can provide an example input that is crashing?
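Since the error is not visible on the device, one way to narrow this down is to wrap the call so that a bad input reports itself. This is only a sketch: safeSearch is a hypothetical helper name, not part of the question's code, and the returned object shape is an assumption made for illustration.

```javascript
// Hypothetical helper: wraps str.search() so that a non-string input or a
// thrown exception is reported instead of silently halting the script.
function safeSearch(str, regex) {
  if (typeof str !== 'string') {
    var kind = (str === null) ? 'null' : typeof str;
    return { pos: -1, error: 'expected a string, got ' + kind };
  }
  try {
    return { pos: str.search(regex), error: null };
  } catch (e) {
    // Surface whatever the engine threw so it can be shown in an alert.
    return { pos: -1, error: String(e) };
  }
}

var result = safeSearch(null, />.+</);
// result.error is 'expected a string, got null'
```

Inside the app, alerting result.error right after the call should show whether the input was a real string at that point.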

IMO, your regex fails because you use the start anchor ^ after the start of the string. For example:
<span>rabbit</span> does not generate an error
<span>rabbit generates an error
The reason is that the first input matches the first alternative, i.e. <
and the second has to fall back on the second alternative, (^>)$, which makes no sense because your pattern has already consumed characters with >.+
For example, if you want to obtain the word "rabbit" in both of the preceding cases, you can use /(?<=>)[^<]+/ instead (this needs an engine with lookbehind support, which JavaScript only gained in ES2018).
However, using the DOM will be safer.
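For engines without lookbehind, a capture group can do the same job. Below is a minimal sketch of the question's function rewritten that way. It assumes the input starts with a single opening tag and that only the text up to the next < (or the end of the string) should be shortened; that is narrower than what the original regex attempts.

```javascript
// Sketch only: shortens the text that directly follows the first tag.
function trimHtmlString(str, len, append) {
  append = append || '';
  if (typeof str !== 'string') return '';
  if (str.charAt(0) !== '<') {
    return str.length > len ? str.substring(0, len) + append : str;
  }
  // Group 1: the opening tag. Group 2: the text up to the next '<' or the end.
  var m = /^(<[^>]*>)([^<]+)/.exec(str);
  if (!m) return str; // no inner text found, leave the input alone
  var inner = m[2];
  if (inner.length > len) inner = inner.substring(0, len) + append;
  return m[1] + inner + str.substring(m[0].length);
}

trimHtmlString('<span>rabbit</span>', 3, '...'); // '<span>rab...</span>'
trimHtmlString('<span>rabbit', 3, '...');        // '<span>rab...'
```

Both of the inputs above are handled by the same pattern, so no alternation with ^ or $ is needed.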

Related

RegEx change function name and parameter of string

I'm awful with RegEx to begin with. Anyway I tried my best and think I got pretty far, but I'm not exactly there yet...
What I have:
A javascript source file that I need to process in Node.js. Can look like that:
var str = "require(test < 123)\n\nrequire(test2 !== test)\n\nfunction(dontReplaceThisParam) {\n console.log(dontReplaceThisParam)\n}";
What I came up with:
console.log(str.replace(/\(\s*([^)].+?)\s*\)/g, 'Debug$&, \'error_1\''))
There are a few problems:
I want the error string to go inside the parentheses so it acts as a second parameter.
All function calls, or I think even everything with parentheses, will be replaced. But only function calls to "require(xxx)" should be touched.
Also, the error codes should somehow increment if possible...
So a string like "require(test == 123)" should convert to "requireDebug(test == 123, 'error_N')" but only calls to "require"...
What currently gets outputted by my code:
requireDebug(test < 123), 'error_1'
requireDebug(test2 !== test), 'error_1'
functionDebug(dontReplaceThisParam), 'error_1' {
console.logDebug(dontReplaceThisParam), 'error_1'
}
What I need:
requireDebug(test < 123, 'error_1')
requireDebug(test2 !== test, 'error_2')
function(dontReplaceThisParam) {
console.log(dontReplaceThisParam)
}
I know I could just do things like that manually, but we're talking here about a few hundred source files. I also know that doing such things is not a very good way, but the debugger inside the require function is not working, so I need to make my own debug function with an error code to locate the error. It's pretty much all I can do at the moment...
Any help is greatly appreciated!
Start the regex with require, and since you need an incrementing counter, pass a function as the second arg to replace, so that you can increment and insert the counter for each match.
var str = "require(test < 123)\n\nrequire(test2 !== test)\n\nfunction(dontReplaceThisParam) {\n console.log(dontReplaceThisParam)\n}";
var counter = 0;
console.log(str.replace(/require\(\s*([^)].+?)\s*\)/g, (s, g2) =>
`requireDebug(${g2}, \'error_${++counter}\')`
));
Other than that, your code was unaltered.
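If this has to run over many files, it may help to wrap the same idea in a function so the counter starts fresh for each call. This is a sketch, not a drop-in tool: addDebugCodes is a made-up name, the \b is an addition so that names merely ending in "require" are skipped, and arguments containing nested parentheses are not handled.

```javascript
// Hypothetical helper: rewrites require(x) to requireDebug(x, 'error_N').
function addDebugCodes(source) {
  var counter = 0;
  // \b keeps identifiers such as myrequire(...) untouched.
  return source.replace(/\brequire\(\s*([^)]+?)\s*\)/g, function (match, args) {
    counter += 1;
    return "requireDebug(" + args + ", 'error_" + counter + "')";
  });
}

var src = "require(test < 123)\n\nrequire(test2 !== test)\n\nfunction(dontReplaceThisParam) {\n console.log(dontReplaceThisParam)\n}";
console.log(addDebugCodes(src));
```

To keep the numbering unique across files, the counter could be moved outside the function instead.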

Match a decimal number and replace non numeric characters in javascript

I am using the following function in javascript.
function chknumber(a) {
a.value = a.value.replace(/[^0-9.]/g, '', '');
}
This function replaces any non-numeric character entered in a textbox whose onkeyup calls the above function. The problem is that it allows this string as well:
1..1
I want the function to replace the second dot character as well. Any suggestions will be helpful.
I don't advocate simplistically modifying fields while people are trying to type in them, it's just too easy to interfere with what they're doing with simple handlers like this. (Validate afterward, or use a well-written, thoroughly-tested masking library.) When you change the value of a field when the user is typing in it, you mess up where the insertion point is, which is really frustrating to the user. But...
A second replace can correct .. and such:
function chknumber(a) {
a.value = a.value.replace(/[^0-9.]/g, '').replace(/\.{2,}/g, '.');
}
That replaces two or more . in a row with a single one. But, it would still allow 1.1.1, which you probably don't want. Sadly, JavaScript doesn't have lookbehinds (they only arrived in ES2018, and older engines lack them), so we get into more logic:
function chknumber(a) {
var str = a.value.replace(/[^0-9.]/g, '').replace(/\.{2,}/g, '.');
var first, last;
while ((first = str.indexOf(".")) !== (last = str.lastIndexOf("."))) {
str = str.substring(0, last) + str.substring(last+1);
}
if (str !== a.value) {
a.value = str;
}
}
Can't guarantee there aren't other edge cases and such, and again, every time you assign a replacement to a.value, you're going to mess up the user's insertion point, which is surprisingly frustrating.
So, yeah: Validate afterward, or use a well-written, thoroughly-tested masking library. (I've had good luck with this jQuery plugin, if you're using jQuery.)
Side note: The second '' in your original replace is unnecessary; replace only uses two arguments.
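The same "keep only the first dot" rule can also be written with a replacement callback instead of the loop. This is a sketch operating on a plain string rather than on the input element; sanitizeDecimal is a hypothetical name, and the caveat about the insertion point still applies if the result is assigned back to the field.

```javascript
// Sketch: keep digits and the first '.', drop everything else.
function sanitizeDecimal(input) {
  var seenDot = false;
  return String(input)
    .replace(/[^0-9.]/g, '')        // strip anything that is not a digit or a dot
    .replace(/\./g, function () {   // then keep only the first dot
      if (seenDot) return '';
      seenDot = true;
      return '.';
    });
}

sanitizeDecimal('1..1');  // '1.1'
sanitizeDecimal('1.1.1'); // '1.11'
```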
Try the match method. If your input is "sajan12paul34.22", match returns an array containing ["12", "34.22"].
Index [0] gives the first numeric value ("12"). Note that match returns null when the input contains no digits, so guard against that:
function chknumber(a) {
var m = a.value.match(/[0-9]*\.?[0-9]+/g);
a.value = m ? m[0] : '';
}

What does the following JavaScript function do and what can I use it for?

What does the following code do and what is the use of it?
JavaScript
function removeHtmlTag(strx,chop){
if(strx.indexOf("<")!=-1)
{
var s = strx.split("<");
for(var i=0;i<s.length;i++){
if(s[i].indexOf(">")!=-1){
s[i] = s[i].substring(s[i].indexOf(">")+1,s[i].length);
}
}
strx = s.join("");
}
chop = (chop < strx.length-1) ? chop : strx.length-2;
while(strx.charAt(chop-1)!=' ' && strx.indexOf(' ',chop)!=-1) chop++;
strx = strx.substring(0,chop-1);
return strx+'...';
}
It parses HTML and removes tags in a manner that is really pretty loose. It can fail in certain circumstances. For example if there's a > inside an attribute value, or if there's a < in the text without a tag name directly after it, it'll mess up the result.
It also optionally truncates the text returned. The while loop ensures that the truncated text happens at a space character.
So if you pass it a string of HTML, aside from the problems I noted above, it'll give you back the string without the tags. And if you pass it a number as the second argument, it'll limit the length to that number (again, except that it'll add to it to avoid chopping a word in half).
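To see both behaviours, the function from the question can be run on small inputs. The function body below is copied from the question; the two sample strings are made up for illustration.

```javascript
function removeHtmlTag(strx, chop) {
  if (strx.indexOf("<") != -1) {
    var s = strx.split("<");
    for (var i = 0; i < s.length; i++) {
      if (s[i].indexOf(">") != -1) {
        s[i] = s[i].substring(s[i].indexOf(">") + 1, s[i].length);
      }
    }
    strx = s.join("");
  }
  chop = (chop < strx.length - 1) ? chop : strx.length - 2;
  while (strx.charAt(chop - 1) != ' ' && strx.indexOf(' ', chop) != -1) chop++;
  strx = strx.substring(0, chop - 1);
  return strx + '...';
}

// Tags are stripped and the text is cut at a space.
var ok = removeHtmlTag('<p>Hello <b>big</b> world of code</p>', 10);
// ok is 'Hello big...'

// A '>' inside an attribute value ends the "tag" too early,
// so part of the attribute leaks into the result.
var leaky = removeHtmlTag('<a title="a>b">link text here</a>', 100);
// leaky starts with 'b">'
```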

How to search DOM to count the number of $ symbol found on a product page?

I am looking for the best possible way to find how many $ symbols are on a page. Is there a better method than reading document.body.innerHTML and counting how many $ signs it contains?
Your question can be split into two parts:
How can we get the webpage text content without HTML tags?
We can generalize the second question a bit.
How can we find the number of string occurrences in another string?
And the 'best possible way to do this':
Amaan got the idea right of finding the text, but let's take it further.
var text = document.body.innerText || document.body.textContent;
Adding textContent to the code helps us cover more browsers, since innerText is not supported by all of them.
The second part is a bit trickier. It all depends on the number of '$' symbol occurrences on the page.
For example, if we know for sure, that there is at least one occurrence of the symbol on the page we would use this code:
text.match(/\$/g).length;
Which performs a global regular expression match on the given string and counts the length of the returned array. It's pretty fast and concise.
On the other hand, if we're not sure if the symbol appears on the page at least once, we should modify the code to look like this:
if (match = text.match(/\$/g)) {
match.length;
}
This just checks the value returned by the match function and if it's null, does nothing.
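The null check can also be folded into a small function that always returns a number. A minimal sketch, with countDollarSigns as a hypothetical name; it takes the text as a parameter so it can be tried outside a browser.

```javascript
// Sketch: count '$' characters, returning 0 when there are none.
function countDollarSigns(text) {
  var matches = String(text).match(/\$/g); // null when nothing matches
  return matches ? matches.length : 0;
}

countDollarSigns('Was $20, now $15'); // 2
countDollarSigns('No prices here');   // 0

// In a page this would be called as:
// countDollarSigns(document.body.innerText || document.body.textContent);
```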
I would recommend using the third option only when there is a large occurrence of the symbols in the page or you're going to perform the search many many times. This is a custom function (taken from here) to count the occurrence of the specified string in another string. It performs better than the other two, but is longer and harder to understand.
var occurrences = function(string, subString, allowOverlapping) {
string += "";
subString += "";
if (subString.length <= 0) return string.length + 1;
var n = 0,
pos = 0;
var step = (allowOverlapping) ? (1) : (subString.length);
while (true) {
pos = string.indexOf(subString, pos);
if (pos >= 0) {
n++;
pos += step;
} else break;
}
return (n);
};
occurrences(text, '$');
I'm also including a little jsfiddle 'benchmark' so you can compare these three different approaches yourself.
Also: No, there isn't a better way of doing this than just getting the body text and counting how many '$' symbols there are.
You should probably use document.body.innerText or document.body.textContent to avoid getting your HTML give you false positives.
Something like this should work:
document.body.innerText.match(/\$/g).length;
An alternate way I can think of, would be to use window.find like this:
var len = 0;
while(window.find('$') === true){
len++;
}
(This may be unreliable because it depends on where the user clicked last. It will work fine if you do it onload, before any user interaction.)

Using javascript regex to translate a html

I would like to build my own translation function in javascript.
I already have a function language.lookup(key) which translates a word or expression:
var frenchHello = language.lookup('hello') //'bonjour'
Now I would like to write a function which takes a html string and translates it with my lookup function. In the html string I will have a special syntax for example #[translationkey] that will point out that this word should be translated.
This is the result I want:
var html = '<div><span>#[hello]</span><span>#[sir]</span>'
language.translate(html) //'<div><span>bonjour</span><span>monsieur</span>
How would I write language.translate?
My idea is to filter out my special syntax with regex and then run language.lookup on each key. Maybe with string replace or something.
I suck when it comes to regex and I've only come up with a very incomplete example, but I include it anyway so maybe someone gets the idea of what I am trying to do. If there is a better but completely different solution, that is more than welcome.
var value = "#[hello], nice to see you.";
lookup = function(word){
return "bonjour";
};
var res = new RegExp( "\\b(hello)\\b", "gi" ).exec(value)
for (var c1 = 0; c1 < res.length; c1++){
value = value.replace(res[c1], lookup(res[c1]))
}
alert(value) //#[bonjour], nice to see you.
The regex should of course not filter out the word hello but the syntax and then collect the key by grouping or similar.
Can anyone help?
Just use the String.replace method's ability to call a function, passed as the second argument, to generate the replacement text, and do a global replace with a regexp matching your syntax:
var value = "#[hello], #[sir], nice to see you.";
lookup = function(full_match, word){
if(word == 'hello')
return "bonjour";
if(word == 'sir')
return "monsieur"
};
console.log(value.replace(/#\[(.+?)\]/gi, lookup))
Result:
bonjour, monsieur, nice to see you.
Of course, when your replacement list gets bigger, you'd better use a lookup object instead of a series of ifs in the lookup function, but you can really do whatever you want there.
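A lookup-object version might look like the sketch below. The dictionary contents and the translate name are assumptions for illustration; in the question's setup the callback would call language.lookup(key) instead. Unknown keys are left as they are so that missing translations stay visible.

```javascript
// Hypothetical dictionary standing in for language.lookup.
var dictionary = { hello: 'bonjour', sir: 'monsieur' };

// Sketch: replace every #[key] token with its dictionary entry.
function translate(html, dict) {
  return html.replace(/#\[([^\]]+)\]/g, function (token, key) {
    // Leave the token untouched when the key has no translation.
    return Object.prototype.hasOwnProperty.call(dict, key) ? dict[key] : token;
  });
}

translate('<div><span>#[hello]</span><span>#[sir]</span>', dictionary);
// '<div><span>bonjour</span><span>monsieur</span>'
```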
You can try this to find all occurrences:
var re = new RegExp('#\\[([^\\]]+?)\\]', 'gi'),
str = '#[value1] plain text #[value2]',
match;
while (match = re.exec(str)) {
console.log(match);
}
You could use something like:
#\\[[^\\]]*\\]
Which matches the hash followed by an opening square bracket followed by zero or more characters NOT including the closing square bracket, followed by a closed square bracket.
Alternatively, perhaps it would be better to handle the translation at the server side (maybe even through your template engine) and send back to your client the translated response. Otherwise, (depending on the specific problem you are dealing with of course), you might end up sending a lot of data to the browser which might make your application respond slowly.
EDIT:
Here is a working piece of code:
var q="This #[ANIMAL1] was eaten by that #[ANIMAL2]";
var u = {"#[ANIMAL1]":"Lion","#[ANIMAL2]":"Frog"};
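function insertAnimal(aString, lookup){
// Replace each #[KEY] token with its entry in lookup; unknown tokens stay as they are.
// Using replace with a callback avoids mutating the string while exec() is iterating over it.
return aString.replace(/#\[[^\]]*\]/g, function(token){
return lookup.hasOwnProperty(token) ? lookup[token] : token;
});
}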
function main(){
alert(insertAnimal(q,u));
}
You can call the "main()" from an HTML document's body onload event
I can compare your requirement to 'resolving template texts within content'. If it is feasible to use jQuery, you should try Handlebars.js.
