Javascript Regex (replace) Issue

I am checking a collection and replacing all
<Localisation container="test">To translate</Localisation>
tags with text.
The following code does what I want:
var localisationRegex = new RegExp("(?:<|&lt;)(?:LocalisationKey|locale).+?(?:container|cont)=[\\\\]?(?:['\"]|(&quot;))(.+?)[\\\\]?(?:['\"]|(&quot;)).*?(?:>|&gt;)(.*?)(?:<|&lt;)/(?:LocalisationKey|locale)(?:>|&gt;)", "ig");
var match = localisationRegex.exec(parsedData);
while (match != null) {
    var localeLength = match[0].length;
    var value = match[4];
    parsedData = parsedData.substr(0, match.index) + this.GetLocaleValue(value) + parsedData.substr(match.index + localeLength);
    match = localisationRegex.exec(parsedData);
}
But when the string I replace with is longer than the original string, the index where the search for the next match starts is wrong (too far along). This sometimes leads to tags not being found.

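Setting aside the (important) question of whether the approach is a good one, I'd avoid the problem of indexing through the source text by passing a function as the replacement argument to replace(). The callback receives the whole match followed by each capture group, so the fourth group (the tag's inner text, match[4] in your loop) is the fifth parameter:
var localizer = this;
var result = parsedData.replace(localisationRegex, function(whole, openQuote, container, closeQuote, value) {
    return localizer.GetLocaleValue(value);
});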
That will replace the tags with the localized content.
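A minimal sketch of the same idea, using a deliberately simplified pattern rather than the regex from the question. The `translations` table, the `getLocaleValue` helper and the `localise` function are hypothetical stand-ins for `this.GetLocaleValue`; only the callback form of `replace()` is the point here.

```javascript
// Hypothetical lookup table standing in for the real localisation source.
var translations = { "To translate": "Te vertalen", "Hello": "Hallo" };

// Hypothetical helper: falls back to the key when no translation exists.
function getLocaleValue(key) {
  return Object.prototype.hasOwnProperty.call(translations, key)
    ? translations[key]
    : key;
}

// Simplified pattern (an assumption, not the question's regex):
// group 1 = container attribute value, group 2 = inner text of the tag.
var tagRegex = /<locale\s+container="([^"]*)">(.*?)<\/locale>/gi;

function localise(text) {
  // replace() scans the original string, so the length of each
  // replacement cannot shift where the next search starts.
  return text.replace(tagRegex, function (whole, container, value) {
    return getLocaleValue(value);
  });
}

var input = '<locale container="a">To translate</locale> and ' +
            '<locale container="b">Hello</locale>';
console.log(localise(input));
```

Because no index bookkeeping is done by hand, the replacement text can be shorter or longer than the tag it replaces without any match being skipped.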

Related

Javascript userscript - Find and replace: textnodes and HTML

I'm trying to create a userscript (Tampermonkey) to add some helper buttons into a site and originally I was using the script below based on the one posted here.
setInterval(function() {
//RegEx for finding any string of digits after "ID: " up to the next space
var myRegexp = /ID:\s(\d*?)\s/gi;
var txtWalker = document.createTreeWalker (
document.body,
NodeFilter.SHOW_TEXT,
{ acceptNode: function (node) {
//-- Skip whitespace-only nodes
if (node.nodeValue.trim() )
return NodeFilter.FILTER_ACCEPT;
return NodeFilter.FILTER_SKIP;
}
},
false
);
var txtNode = null;
while (txtNode = txtWalker.nextNode () ) {
var oldTxt = txtNode.nodeValue;
//Find all instances
match = myRegexp.exec(oldTxt);
while (match != null) {
//Get group from match
var idNum = match[1]
//Replace current match with added info
oldTxt = oldTxt.replace(idNum, idNum+"| <SomeHTMLHere> "+idNum+" | ");
//Update text
txtNode.nodeValue = oldTxt;
//Check for remaining matches
match = myRegexp.exec(oldTxt);
}
}
}, 5000);
Now I would like to add a bit more functionality to the text, probably something clickable to copy to the clipboard or insert elsewhere. I know I'm working with text nodes in the original script, but I wanted to know if there is any way of adapting the current script to insert HTML at these points without rewriting it from scratch.
The main problem with the site is that the ID:##### values I'm searching for all appear within the same element, as shown below, so I couldn't simply find them by element (or at least not with my limited JS knowledge).
<div>
ID: 1234567 | Text
ID: 45678 | Text
</div>
If someone could point me in the right direction that'd be great or at least tell me it isn't possible without a rewrite.

Okay, so rewriting it actually worked out pretty well. It works much more nicely and is more succinct. Hopefully this will help someone in the future. If anyone wants to suggest any improvements, please feel free!
setInterval(function() {
//Regex
var reg = /ID:\s(\d*?)\s/gi;
var result;
//Get all classes that this applies to
$('.<parentClass>').each(function(i, obj) {
var text = $(this).html();
//Do until regex can't be found anymore
while (result = reg.exec(text)) {
//Get first regex group
var str = result[1];
//Add in desired HTML
var newhtml = $(this).html().replace(str, '|<span class="marked">' + str + '</span>|');
//Replace
$(this).html(newhtml);
}
});
//Click function for added HTML
$(".marked").click(function(){
//Get text inside added HTML
var id = $(this).text();
//Construct desired string
var Str = "someText " + id;
//Insert into message box
var textarea = document.querySelector('.<inputArea>');
textarea.value = Str;
textarea.focus();
});
}, 5000);
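One possible simplification, sketched here on plain strings so it can run outside the page: a single global `replace()` with backreferences wraps every ID in one pass, with no `exec` loop. `markIds` is a hypothetical helper name, and the pattern uses `\d+` instead of `\d*?` so that an empty ID is never wrapped; the `marked` class comes from the code above.

```javascript
// Wrap every run of digits that follows "ID:" and one whitespace character.
// $1 keeps the original "ID:" prefix and whitespace, $2 is the number.
function markIds(html) {
  return html.replace(
    /(ID:\s)(\d+)/gi,
    '$1|<span class="marked">$2</span>|'
  );
}

var sample = "ID: 1234567 | Text\nID: 45678 | Text";
var once = markIds(sample);
console.log(once);

// Running it again changes nothing: after the first pass "ID: " is
// followed by "|" rather than a digit, so the pattern no longer matches.
console.log(markIds(once) === once);
```

In the page itself the result could be handed to `.html()`. Registering one delegated handler outside the `setInterval` callback, for example `$(document).on('click', '.marked', handler)`, would also avoid binding an extra click handler every five seconds.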

Get string from a successful regex?

I'm trying to manipulate a string that has tested as a positive match against my regex statement.
My regex statement is /\[table=\d](.*?)\[\/table]/gmi and an example of a positive match would be [table=1]Cell 1[c]Cell 2[/table]. I'm searching for matches within a certain div, which I'll call .foo in the code below.
However, once the search comes back saying it has found a match, I want to have the section that was identified as a match returned back to me so that I can start manipulating a specific section of it, namely count the number of times [c] appears and reference the number in [table=1].
(function(regexCheck) {
var regex = /\[table=\d](.*?)\[\/table]/gmi;
$('.foo').each(function() {
var html = $(this).html();
var change = false;
while (regex[0].test(html)) {
change = true;
//Somehow return string?
}
});
})(jQuery);
I'm quite new to javascript and especially new to RegEx, so I apologise if this code is crude.
Thanks for all of your help in advance.

Use exec instead of test and keep the resulting match object. Call it on the regex itself: a RegExp has no [0] property, so regex[0] is undefined.
var match;
while ((match = regex.exec(html)) != null) {
change = true;
// use `match[0]` for the full match, or `match[1]` and onward for capture groups
}
Simple example (since your snippet isn't runnable, I've just created a simple one instead):
var str = "test 1 test 2 test 3";
var regex = /test (\d)/g;
var match;
while ((match = regex.exec(str)) !== null) {
console.log("match = " + JSON.stringify(match));
}
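Applied to the table markup from the question, the same loop might look like the sketch below. It assumes an extra capture group around `\d` (not present in the question's regex) so the table number is available directly; `describeTables` and the sample string are made up for illustration.

```javascript
// Group 1 = the digit in [table=N], group 2 = everything between the tags.
var tableRegex = /\[table=(\d)\](.*?)\[\/table\]/gi;

function describeTables(text) {
  var results = [];
  var match;
  tableRegex.lastIndex = 0; // start from the beginning on every call
  while ((match = tableRegex.exec(text)) !== null) {
    results.push({
      full: match[0],                 // the whole matched section
      number: Number(match[1]),       // the N in [table=N]
      // splitting on "[c]" yields one more piece than there are separators
      separators: match[2].split("[c]").length - 1
    });
  }
  return results;
}

var sampleText = "intro [table=1]Cell 1[c]Cell 2[/table] middle " +
                 "[table=2]A[c]B[c]C[/table]";
console.log(JSON.stringify(describeTables(sampleText), null, 2));
```

Each entry carries the matched section itself, so further manipulation can work on `full` without searching the source string again.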

Keeping only part of a regex match

I need to search for a word in text. For this I used this regex:
var re = /duration='\d+'/ig;
var i = text.match(re);
This gives me an array of matches like "duration='300'", "duration='400'", ...
I need to get only the numbers, without the duration='' part.

You can use a capturing group:
var re = /duration='(\d+)'/ig;
var match = re.exec(text);
while (match != null) {
// matched text: match[1]
match = re.exec(text);
}

Tim's answer works well (and I'm not sure why the OP says it is not what he/she wants). That said, here is another way to do it using the String.replace() method with a callback function replacement value:
function getDurations(text) {
var re =/duration='(\d+)'/ig;
var i = [];
text.replace(re, function(m0, m1){i.push(m1); return '';});
return i;
}
Note that this technique requires no explicit loop and gets the job done in a single statement.
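In environments that support ES2020, `String.prototype.matchAll` offers another loop-free option. This is a sketch under that assumption, not something either answer above proposes; `getDurationsWithMatchAll` is a hypothetical name chosen to avoid clashing with `getDurations`.

```javascript
function getDurationsWithMatchAll(text) {
  // matchAll requires the g flag and yields one match object per hit.
  var re = /duration='(\d+)'/gi;
  // Array.from maps each match object to its first capture group.
  return Array.from(text.matchAll(re), function (m) {
    return m[1];
  });
}

var sampleMarkup = "<a duration='300'/><b duration='400'/>";
console.log(getDurationsWithMatchAll(sampleMarkup));
```

Unlike the `replace()` trick, this does not build and discard a modified copy of the input string.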

Removing a string in javascript

I have a URL, say (dummy URL):
http://www.google.com/?v=as12&&src=test&img=test
Now I want to remove only the &src=test& part. I know we can use indexOf, but I couldn't work out how to find the next ampersand (&) and remove just that part.
Any help would be appreciated. The new URL should look like:
http://www.google.com/?v=as12&img=test

What about using this?:
http://jsfiddle.net/RMaNd/8/
var mystring = "http://www.google.com/?v=as12&&src=test&img=test";
mystring = mystring.replace(/&src=.+&/, ""); // Didn't realize it isn't necessary to escape "&"
alert(mystring);
This assumes that "any" character will come after the "=" and before the next "&", but you can always change the . to a character set if you know what it could be - using [ ]
This also assumes that there will be at least 1 character after the "=" but before the "&" - because of the +. But you can change that to * if you think the string could be "src=&img=test"
UPDATE:
Using split might be the correct choice for this problem, but only if the position of src=whatever is still after "&&" but unknown...for example, if it were "&&img=test&src=test". So as long as the "&&" is always there to separate the static part from the part you want to update, you can use something like this:
http://jsfiddle.net/Y7LdG/
var mystring1 = "http://www.google.com/?v=as12&&src=test&img=test";
var mystring2 = "http://www.google.com/?v=as12&&img=test&src=test";
var final1 = removeSrcPair(mystring1);
alert(final1);
var final2 = removeSrcPair(mystring2);
alert(final2);
function replaceSrc(str) {
return str.replace(/src=.*/g, "");
}
function removeSrcPair(orig) {
var the_split = orig.split("&&");
var split_second = the_split[1].split("&");
for (var i = split_second.length-1; i >= 0; i--) {
split_second[i] = replaceSrc(split_second[i]);
if (split_second[i] === "") {
split_second.splice(i, 1);
}
}
var joined = split_second.join("&");
return the_split[0] + "&" + joined;
}
This still assumes a few things - the main split is "&&"...the key is "src", then comes "=", then 0 or more characters...and of course, the key/value pairs are separated by "&". If your problem isn't this broad, then my first answer seems fine. If "src=test" won't always come first after "&&", you'd need to use a more "complex" Regex or this split method.

Something like:
var url = "http://www.google.com/?v=as12&&src=test&img=test";
var firstPart = url.split('&&')[0];
var lastPart = url.split('&&')[1];
lastPart = lastPart.split('&')[1];
var newUrl = firstPart + '&' + lastPart;
document.write(newUrl);

Details: Use the split method.
Solution Edited: I changed the below to test that the last query string exists
var url = "http://www.google.com/?v=as12&&src=test&img=test";
var newUrl;
var splitString = url.split('&');
if (splitString.length > 3) {
    newUrl = splitString[0] + "&" + splitString[3];
} else {
    newUrl = splitString[0];
}
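Where the WHATWG `URL` API is available (modern browsers and Node.js), the query string can be edited by parameter name instead of by position. This is a sketch under that assumption and is not taken from any of the answers above; `removeParam` is a hypothetical helper.

```javascript
function removeParam(address, name) {
  var url = new URL(address);
  // Deletes every query parameter with this name, wherever it appears.
  url.searchParams.delete(name);
  return url.toString();
}

console.log(removeParam("http://www.google.com/?v=as12&&src=test&img=test", "src"));
console.log(removeParam("http://www.google.com/?v=as12&&img=test&src=test", "src"));
```

The empty pair produced by the doubled "&&" is dropped when the query is parsed, so the rewritten URL comes out with single separators regardless of where src=... sat in the original.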

How can I improve the performance of my JavaScript text formatter?

I am allowing my users to wrap words with "*", "/", "_", and "-" as a shorthand way to indicate they'd like to bold, italicize, underline, or strikethrough their text. Unfortunately, when the page is filled with text using this markup, I'm seeing a noticeable (borderline acceptable) slow down.
Here's the JavaScript I wrote to handle this task. Can you please provide feedback on how I could speed things up?
function handleContentFormatting(content) {
content = handleLineBreaks(content);
var bold_object = {'regex': /\*(.|\n)+?\*/i, 'open': '<b>', 'close': '</b>'};
var italic_object = {'regex': /\/(?!\D>|>)(.|\n)+?\//i, 'open': '<i>', 'close': '</i>'};
var underline_object = {'regex': /\_(.|\n)+?\_/i, 'open': '<u>', 'close': '</u>'};
var strikethrough_object = {'regex': /\-(.|\n)+?\-/i, 'open': '<del>', 'close': '</del>'};
var format_objects = [bold_object, italic_object, underline_object, strikethrough_object];
for( obj in format_objects ) {
content = handleTextFormatIndicators(content, format_objects[obj]);
}
return content;
}
//#param obj --- an object with 3 properties:
// 1.) the regex to search with
// 2.) the opening HTML tag that will replace the opening format indicator
// 3.) the closing HTML tag that will replace the closing format indicator
function handleTextFormatIndicators(content, obj) {
while(content.search(obj.regex) > -1) {
var matches = content.match(obj.regex);
if( matches && matches.length > 0) {
var new_segment = obj.open + matches[0].slice(1,matches[0].length-1) + obj.close;
content = content.replace(matches[0],new_segment);
}
}
return content;
}

Add the g flag to your regexes (/.../ig) and remove the while loop; a single global replace then handles every match.
Replace your for(obj in format_objects) loop with a normal for loop, because format_objects is an array.
Update
Okay, I took the time to write an even faster and simplified solution, based on your code:
function handleContentFormatting(content) {
content = handleLineBreaks(content);
var bold_object = {'regex': /\*([^*]+)\*/ig, 'replace': '<b>$1</b>'},
italic_object = {'regex': /\/(?!\D>|>)([^\/]+)\//ig, 'replace': '<i>$1</i>'},
underline_object = {'regex': /\_([^_]+)\_/ig, 'replace': '<u>$1</u>'},
strikethrough_object = {'regex': /\-([^-]+)\-/ig, 'replace': '<del>$1</del>'};
var format_objects = [bold_object, italic_object, underline_object, strikethrough_object],
i = 0, foObjSize = format_objects.length;
for( i; i < foObjSize; i++ ) {
content = handleTextFormatIndicators(content, format_objects[i]);
}
return content;
}
//#param obj --- an object with 2 properties:
// 1.) the regex to search with
// 2.) the replace string
function handleTextFormatIndicators(content, obj) {
return content.replace(obj.regex, obj.replace);
}
Here is a demo.
This will work with nested and/or not nested formatting boundaries. You can omit the function handleTextFormatIndicators altogether if you want to, and do the replacements inline inside handleContentFormatting.

Your code is forcing the browser to do a whole lot of repeated, wasted work. The approach you should be taking is this:
Concoct a regex that combines all of your "target" regexes with another that matches a leading string of characters that are not your special meta-characters.
Change the loop so that it does the following:
Grab the next match from the source string. That match, due to the way you changed your regex, will be a string of non-meta characters followed by your matched portion.
Append the non-meta characters and the replacement for the target portion onto a separate array of strings.
At the end of that process, the separate accumulator array can be joined and used to replace the content.
As to how to combine the regular expressions, well, it's not very pretty in JavaScript but it looks like this. First, you need a regex for a string of zero or more "uninteresting" characters. That should be the first capturing group in the regex. Next should be the alternates for the target strings you're looking for. Thus the general form is:
var tokenizer = /(uninteresting pattern)?(?:(target 1)|(target 2)|(target 3)| ... )?/;
When you match that against the source string, you'll get back a result array that will contain the following:
result[0] - entire chunk of string (not used)
result[1] - run of uninteresting characters
result[2] - either an instance of target type 1, or undefined
result[3] - either an instance of target type 2, or undefined
...
Thus you'll know which kind of replacement target you saw by checking which of the target groups is not undefined. (Note that in your case the targets can conceivably overlap; if you intend for that to work, then I suspect you'll have to approach this as a full-blown parsing problem.)
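The single-pass approach described above can be sketched roughly as follows. This is a variation, not the exact regex form given in the answer: instead of making every part optional (which could match the empty string and stall an `exec` loop), it uses plain alternation plus a fallback group for a lone marker character. The names `tokenizer`, `tags` and `formatWithTokenizer` are hypothetical, and nested markers are not handled.

```javascript
// Group 1: run of ordinary characters.
// Groups 2-5: text inside *...*, /.../, _..._ and -...- respectively.
// Group 6: a marker character with no partner, passed through unchanged.
var tokenizer =
  /([^*\/_-]+)|\*([^*]+)\*|\/([^\/]+)\/|_([^_]+)_|-([^-]+)-|([*\/_-])/g;

// Index = capture group number, value = HTML tag name.
var tags = [null, null, "b", "i", "u", "del"];

function formatWithTokenizer(content) {
  var out = [];   // accumulator array, joined once at the end
  var match;
  tokenizer.lastIndex = 0;
  while ((match = tokenizer.exec(content)) !== null) {
    if (match[1] !== undefined) { out.push(match[1]); continue; }
    if (match[6] !== undefined) { out.push(match[6]); continue; }
    for (var g = 2; g <= 5; g++) {
      if (match[g] !== undefined) {
        out.push("<" + tags[g] + ">" + match[g] + "</" + tags[g] + ">");
        break;
      }
    }
  }
  return out.join("");
}

console.log(formatWithTokenizer("a *b* /c/ _d_ -e- f"));
console.log(formatWithTokenizer("2 * 3"));
```

Every match consumes at least one character, so the loop always advances, and the source string is scanned once rather than once per replacement.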

You can do things like:
function formatText(text){
return text.replace(
/\*([^*]*)\*|\/([^\/]*)\/|_([^_]*)_|-([^-]*)-/gi,
function(m, tb, ti, tu, ts){
if(typeof(tb) != 'undefined')
return '<b>' + formatText(tb) + '</b>';
if(typeof(ti) != 'undefined')
return '<i>' + formatText(ti) + '</i>';
if(typeof(tu) != 'undefined')
return '<u>' + formatText(tu) + '</u>';
if(typeof(ts) != 'undefined')
return '<del>' + formatText(ts) + '</del>';
return 'ERR('+m+')';
}
);
}
This will work fine on nested tags, but will not with overlapping tags, which are invalid anyway.
Example at http://jsfiddle.net/m5Rju/
