JavaScript substrings multiline replace by RegExp

I'm having some trouble matching a regular expression against a multi-line string.
<script>
var str="Welcome to Google!\n";
str = str + "We are proud to announce that Microsoft has \n";
str = str + "one of the worst Web Developers sites in the world.";
document.write(str.replace(/.*(microsoft).*/gmi, "$1"));
</script>
http://jsbin.com/osoli3/3/edit
As you can see at the link above, the output of the code looks like this:
Welcome to Google! Microsoft one of the worst Web Developers sites in the world.
It seems the replace() method goes line by line: if a line has no match, the whole line is returned as-is, even with the "m" (multiline) modifier.

The multiline flag only changes how ^ and $ behave, not how . behaves.
Use a pattern that matches any character with a set like [\w\W] instead of ., because . only matches non-line-break characters.
document.write(str.replace(/[\w\W]*(microsoft)[\w\W]*/gmi, "$1"));
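A minimal sketch comparing the [\w\W] workaround with the s (dotAll) flag, which makes . match line breaks in engines that support ES2018. The variable names are illustrative; in practice you only need one of the two approaches.

```javascript
// Compare the [\w\W] workaround with the ES2018 dotAll ("s") flag.
// Both let the wildcard parts of the pattern cross line breaks.
const str =
  "Welcome to Google!\n" +
  "We are proud to announce that Microsoft has \n" +
  "one of the worst Web Developers sites in the world.";

// Character-class workaround: [\w\W] matches any character, including "\n".
const viaClass = str.replace(/[\w\W]*(microsoft)[\w\W]*/i, "$1");

// dotAll flag: "." also matches "\n" (requires an ES2018+ engine).
const viaDotAll = str.replace(/.*(microsoft).*/is, "$1");

console.log(viaClass);  // Microsoft
console.log(viaDotAll); // Microsoft
```

The g and m flags are dropped here because the pattern is meant to consume the whole string in a single match.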

Related

replace '&' character with reg exp in javascript

I'd like to replace the "&" character, along with other characters that may interfere with URL syntax.
So far I tried:
myText = myText.replace(/[^a-zA-Z0-9-. ]/g,'');
That probably works for other characters (I didn't test it), but it didn't handle the "&", which is what I care about most. So I added the following line as well, but it also didn't get rid of the &:
myText = myText.replace(/&/g,'');
Neither works. How can I replace this special character?
SOLUTION:
The code was receiving &amp; (the HTML-encoded form) rather than a bare &, so I had to do:
myText = myText.replace(/&amp;/g,'');
and it works.
SNIPPET:
var text = "god & damn it";
console.log(text.replace(/&amp;|&/g,''));
According to your comments, what you are trying to replace is &amp;, the HTML encoding of the & character.
With lodash you can _.unescape the string before replacing:
myText = _.unescape(myText).replace(/&/g, '');
This way you handle both the &amp; and & cases. If you then need to insert that text into the HTML, you should _.escape it back to prevent unwanted side effects: _.escape(myText);
Without lodash, you can simply match both forms in your regex:
myText = myText.replace(/&amp;|&/g, '');
But this method can have side effects when other HTML entities are present, because it also removes bare & characters. For example, the string "Three is &gt; than two &amp; one" would end up as "Three is gt; than two one" (notice the stray gt; in the middle).
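If lodash isn't available, one way to avoid the stray gt; is to decode entities first and only then strip &. The sketch below uses a hypothetical decodeBasicEntities helper that handles only five common entities (&amp;, &lt;, &gt;, &quot;, &#39;); it is an assumption-laden stand-in for _.unescape, not a full HTML decoder.

```javascript
// Hypothetical helper: decode only a handful of common HTML entities.
// Other entities (e.g. &nbsp; or numeric ones besides &#39;) are left alone.
function decodeBasicEntities(s) {
  const map = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'" };
  // One left-to-right pass, so "&amp;lt;" becomes "&lt;", not "<".
  return s.replace(/&(?:amp|lt|gt|quot|#39);/g, (m) => map[m]);
}

// Decode first, then remove the literal "&" characters that remain.
function removeAmpersands(s) {
  return decodeBasicEntities(s).replace(/&/g, "");
}

console.log(removeAmpersands("Three is &gt; than two &amp; one"));
// Three is > than two  one
```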
console.log("m&yText".replace(/\&/g,''))
I suggest adding a backslash before the & to escape it, so the regex finds and replaces any literal & character. (Strictly speaking, & has no special meaning in a JavaScript regex, so /&/g works just as well; note that /\&/u is a syntax error.)

regular expression for getting # and after that

I am trying to create a regular expression for filtering strings. I want to get the "#" symbol and anything written after it, up to the next space.
Can someone help me with this?
For example:
hi I am #vaibhav .
The expected result this regular expression should give is vaibhav.
I made this:
/#[a-z]*/
However, I am not sure whether this conforms to the criteria above.
To get a substring from the # up to the first space after it, use
#\S+
The \S means a non-whitespace character.
If you do not need #, use a capturing group:
#(\S+)
The value you need will be in Group 1.
If you are using JavaScript:
var re = /#(\S+)/g;
var str = 'hi I am #vaibhav . hi, and I am #strib .';
var m;
while ((m = re.exec(str)) !== null) {
  document.write("The value is: <b>" + m[1] + "</b><br/>");
}
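On engines with String.prototype.matchAll (ES2020), the same loop can be written as a single expression. This is a sketch; extractHashtags is a hypothetical name, and the /g flag is required because matchAll throws on a non-global regex.

```javascript
// Collect the Group 1 value of every "#word" match into an array.
function extractHashtags(str) {
  return Array.from(str.matchAll(/#(\S+)/g), (m) => m[1]);
}

console.log(JSON.stringify(extractHashtags("hi I am #vaibhav . hi, and I am #strib .")));
// ["vaibhav","strib"]
```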
The simplest solution is to use a negated set:
1. Match characters that are not '#'.
2. Read in the '#'.
3. Capture characters that are not ' '.
If you're trying to match and capture you can accomplish that like this:
[^#]*#([^ ]*).*
If you only want to search, you don't need to match the whole string; you can match just the part you want to extract:
#([^ ]*)
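The most complicated situation is when you need to deal with an escaped '#' (written as \#). Here's an example of a match that skips escaped ones (the leading ^ keeps the search from starting right after a backslash):
^(?:[^\\#]|\\.)*#([^ ]*).*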
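A sketch of the escaped-'#' pattern in JavaScript, assuming a backslash is the escape character. It uses a leading ^ anchor: without it, when no unescaped '#' exists the engine retries from every position and can start a match right after the backslash, treating \# as a real '#'. The function name firstUnescapedTag is hypothetical.

```javascript
// Capture the text after the first unescaped "#" (an escaped one is written "\#").
// (?:[^\\#]|\\.)* skips ordinary characters and any backslash-escaped pair.
function firstUnescapedTag(str) {
  const m = str.match(/^(?:[^\\#]|\\.)*#([^ ]*)/);
  return m ? m[1] : null;
}

console.log(firstUnescapedTag("skip \\#this but #take this")); // take
console.log(firstUnescapedTag("only \\#escaped"));              // null
```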
You can also do that with a lookbehind (supported in JavaScript since ES2018):
(?<=#)\w+

Regex catch string between two strings, multiple lines

I'm working on a *.po file and trying to capture all the text between msgid "" and msgstr "", without much luck; I never get more than one line:
msgid ""
"%s asdfgh asdsfgf asdfg %s even if you "
"asdfgdh sentences with no sense. We are not asking translate "
"Shakespeare's %s Hamlet %s !. %s testing regex %s "
"don't require specific industry knowledge. enjoying "
msgstr ""
What I´ve tried:
var myArray = fileContent.match(/msgid ([""'])(?:(?=(\\?))\2.)*?\1/g);
Thanks for your help, I'm not really good with regex :(
Here is one way to extract all of that text:
var match = text.replace(/msgid ""([\s\S]*?)msgstr ""/, "$1");
Example: http://jsfiddle.net/bqk79/
The [\s\S] is a character class that matches any character, including line breaks, so [\s\S]*? matches any number of any characters, as few as possible. In other languages you could use the s or DOTALL flag to make . match line breaks; older JavaScript engines did not support this, but ES2018 added the s (dotAll) flag.
Note that the regex above doesn't account for single quotes, but if you also need to match between msgid '' and msgstr '', you can use the following:
var match = text.replace(/msgid (['"]{2})([\s\S]*?)msgstr \1/, "$2");
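Since replace() leaves any text outside the match in place, a real .po file with several entries may be easier to handle with matchAll. A sketch, assuming every entry starts with msgid "" and ends at msgstr "", and that each continuation line holds exactly one "..." segment without escaped quotes; extractMsgids and joinPoLines are hypothetical helpers.

```javascript
// Collect the raw text of every msgid "" ... msgstr "" block.
function extractMsgids(fileContent) {
  const re = /msgid ""([\s\S]*?)msgstr ""/g; // lazy, so each block stops at its own msgstr
  return Array.from(fileContent.matchAll(re), (m) => m[1]);
}

// Join the quoted continuation lines of one block into a single string
// (strips only the outer quotes on each line; escaped \" is not handled).
function joinPoLines(block) {
  return block
    .trim()
    .split("\n")
    .map((line) => line.replace(/^"|"$/g, ""))
    .join("");
}

const po =
  'msgid ""\n' +
  '"first entry "\n' +
  '"continued"\n' +
  'msgstr ""\n' +
  "\n" +
  'msgid ""\n' +
  '"second entry"\n' +
  'msgstr ""\n';

const blocks = extractMsgids(po);
console.log(blocks.length);                              // 2
console.log(JSON.stringify(blocks.map(joinPoLines)));    // ["first entry continued","second entry"]
```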
Try with this pattern:
/msgid (["']{2})\n([\s\S]*?)\nmsgstr \1/
The result is in the second capturing group, but you can make it simpler with:
/msgid ["']{2}\n([\s\S]*?)\nmsgstr /
where the result is in the first capturing group.
I realize that the question specifically asks for a regular expression, but you should consider using string split instead if you can.
Here is a ready-made function:
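// Returns the text after the first `start`, up to the following `end`
// (it also stops early if `start` occurs again before `end`).
// If `start` is missing, split(start)[1] is undefined, so the call throws;
// the error is logged and the function returns undefined.
function extractTextBetween(subject, start, end) {
  try {
    return subject.split(start)[1].split(end)[0];
  } catch (e) {
    console.log("Exception when extracting text", e);
  }
}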
http://jsfiddle.net/b33hdh9b/3/
You could try this regex:
msgid ""((?:.|[\n\r])+)msgstr ""
((?:.|[\n\r])+) is your capturing group;
(?:.|[\n\r])+ matches either . or [\n\r] one or more times, where \n and \r match newlines and carriage returns.
Tested
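One caveat worth checking: the + in that pattern is greedy, so if the input holds several entries, the capture runs to the last msgstr "" in the string. A small sketch with a two-entry string shows the difference between the greedy form and a lazy +? variant (the lazy variant is a suggested change, not part of the answer above).

```javascript
// Two entries in one string.
const po = 'msgid ""\n"one"\nmsgstr ""\nmsgid ""\n"two"\nmsgstr ""\n';

// Greedy: the capture extends to the LAST msgstr "" and swallows entry two.
const greedy = po.match(/msgid ""((?:.|[\n\r])+)msgstr ""/)[1];
// Lazy: the capture stops at the first msgstr "".
const lazy = po.match(/msgid ""((?:.|[\n\r])+?)msgstr ""/)[1];

console.log(greedy.includes('"two"')); // true
console.log(JSON.stringify(lazy));     // "\n\"one\"\n"
```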

Javascript and regex: remove space after the last word in a string

I have a string like this:
var str = 'aaaaaa, bbbbbb, ccccc, ddddddd, eeeeee ';
My goal is to delete the last space in the string. I would use:
str.slice(0, -1);
But if there is no space after the last character in the string, this will delete the last character of the string instead.
I would like to use
str.replace("regex",'');
I am a beginner with RegEx; any help is appreciated.
Thank you very much.
Do a Google search for "javascript trim" and you will find many different solutions.
Here is a simple one:
trimmedstr = str.replace(/\s+$/, '');
When you need to remove all spaces at the end:
str.replace(/\s*$/,'');
When you need to remove one space at the end:
str.replace(/\s?$/,'');
\s matches not only the space character but other whitespace too, for example tabs.
If you use jQuery, you can use the trim function also:
str = $.trim(str);
But trim removes whitespace from the beginning of the string as well, not only from the end.
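It seems you need a trimRight function. It isn't available before JavaScript 1.8.1 (ES2019 later standardized it as trimEnd, keeping trimRight as an alias). Before that, you can add it to the prototype yourself:
if (!String.prototype.trimRight) {
  String.prototype.trimRight = function () { return this.replace(/\s+$/, ''); };
}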
// Now call it on any string.
var a = "a string ";
a = a.trimRight();
See more in the question "Trim string in JavaScript?" and the compatibility list.
You can use this code to remove a single trailing space:
.replace(/ $/, "");
To remove all trailing spaces:
.replace(/ +$/, "");
The $ matches the end of input in normal mode (it matches the end of a line in multiline mode).
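A quick sketch of that difference, using a made-up two-line string: without the m flag only the spaces at the very end are removed; with m (plus g) the trailing spaces on every line go.

```javascript
const text = "first line   \nsecond line  ";

// Without m, $ only matches at the very end of the input.
const endOnly = text.replace(/ +$/, "");
// With m (and g), $ also matches just before each line break.
const everyLine = text.replace(/ +$/gm, "");

console.log(JSON.stringify(endOnly));   // "first line   \nsecond line"
console.log(JSON.stringify(everyLine)); // "first line\nsecond line"
```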
Try the regex ( +)$, since $ matches the end of the string. This will strip all trailing spaces (use (\s+)$ to also strip tabs and other whitespace).
Some languages have a strip function that does the same; at the time of writing, I did not believe the standard JavaScript library had this functionality.
Working example:
var str = "Hello World ";
var ans = str.replace(/(^[\s]+|[\s]+$)/g, '');
alert(str.length+" "+ ans.length);
Fast forward to 2021,
The trimEnd() function is meant exactly for this!
It will remove all whitespace (including spaces, tabs, and newline characters) from the end of the string.
According to the official docs, it is supported in every major browser; only IE is unsupported. (And let's be honest, you shouldn't worry about IE, given that Microsoft's own Microsoft 365 apps dropped support for IE 11 in August 2021!)
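A short sketch applying trimEnd() to the string from the question, plus a hypothetical dropOneTrailingSpace helper for the case where exactly one trailing space should go, and only if it is actually there (unlike blindly dropping the last character).

```javascript
const str = 'aaaaaa, bbbbbb, ccccc, ddddddd, eeeeee ';

// trimEnd() strips every trailing whitespace character.
const trimmed = str.trimEnd();
console.log(JSON.stringify(trimmed)); // "aaaaaa, bbbbbb, ccccc, ddddddd, eeeeee"

// Same result as the regex approach from the answers above.
console.log(trimmed === str.replace(/\s+$/, '')); // true

// Hypothetical helper: remove one trailing space only when one is present,
// so a string without a trailing space comes back unchanged.
function dropOneTrailingSpace(s) {
  return s.endsWith(' ') ? s.slice(0, -1) : s;
}

console.log(dropOneTrailingSpace('abc ')); // abc
console.log(dropOneTrailingSpace('abc'));  // abc
```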

Create a permalink with JavaScript

I have a textbox where a user enters a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I use a translation array. Should I do the same here?
Try this:
function doDashes(str) {
  var re = /[^a-z0-9]+/gi; // global, case-insensitive match of runs of non-alphanumeric characters
  var re2 = /^-*|-*$/g;    // get rid of any leading/trailing dashes
  str = str.replace(re, '-');                // perform the 1st regexp
  return str.replace(re2, '').toLowerCase(); // ...and the second, then return the lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-i-think-that-i-am-awesome-yes-i-am
As for the Greek characters, I can't think of anything other than some sort of lookup table used by another regexp.
Here's the one-liner version:
function doDashes2(str) {
  return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple approach is to match all non-letter characters and replace them with a -. Before applying this regex, convert the string to lowercase. This alone is not foolproof, since it can leave a dash at the start or end of the result.
[^a-z]+
So, after the replacement, you can trim the dashes from the front and the back using this regex:
^-+|-+$
You'd have to create the Greek-to-Latin glyph translation yourself; regex can't help you there. Using a translation array is a good idea.
I can't really speak to Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/g
will do the trick when used as your pattern, replacing every match with a "-".
As for the Greek characters, I'd suggest using an array with all the character translations, and then adding its values to the regular expression.
To roughly build the URL, you would need something like this:
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/[^a-z\s]/g, '').replace(/\s+/g, ' ').replace(/\s/g, '-');
It removes every character that is neither a letter nor whitespace, collapses runs of whitespace into a single space, and then replaces each remaining space with a dash.
You could use another regular expression to replace the Greek characters with Latin characters.
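Combining the dash approach with the lookup-table idea several answers suggest, here is a sketch. The GREEK_TO_LATIN table is a hypothetical, simplified letter-by-letter scheme (real transliteration standards differ, e.g. β is often rendered as v), and slugify is an illustrative name. Accents are removed by normalizing to NFD and dropping the combining marks before the table lookup.

```javascript
// Hypothetical, simplified Greek-to-Latin table (one lowercase letter at a time).
const GREEK_TO_LATIN = {
  α: "a", β: "b", γ: "g", δ: "d", ε: "e", ζ: "z", η: "i", θ: "th",
  ι: "i", κ: "k", λ: "l", μ: "m", ν: "n", ξ: "x", ο: "o", π: "p",
  ρ: "r", ς: "s", σ: "s", τ: "t", υ: "u", φ: "f", χ: "ch", ψ: "ps", ω: "o",
};

function slugify(text) {
  return text
    .toLowerCase()
    .normalize("NFD")                                  // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")                   // drop the combining marks (e.g. ό -> ο)
    .replace(/[α-ω]/g, (ch) => GREEK_TO_LATIN[ch] ?? ch) // transliterate Greek letters via the table
    .replace(/[^a-z0-9]+/g, "-")                       // turn every other run of characters into one dash
    .replace(/^-+|-+$/g, "");                          // trim leading/trailing dashes
}

console.log(slugify("Γεια σου κόσμε")); // geia-sou-kosme
console.log(slugify("hello world! I think that __i__ am awesome (yes I am!)"));
// hello-world-i-think-that-i-am-awesome-yes-i-am
```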
