Explain regular expression for removing html code from String - javascript

I have a regular expression which removes html code from a String :
var html = "<p>Dear sms,</p><p>This is a test notification for push message from center II.</p>";
var text = html.replace(/(<([^>]+)>)/ig, "");
alert(text);
This is the expression working on jsfiddle : http://jsfiddle.net/VgHr3/53/
The regular expression itself is /(<([^>]+)>)/ig. I don't fully understand how this expression works. Can someone provide an explanation? I can find out how each character behaves by itself by reading a cheat sheet: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
But what is the significance of "/ig" ?

Those are global flags. Your cheat sheet actually lists them on the right side:
Regular Expressions Pattern Modifiers
g Global match
i Case-insensitive
m Multiple lines
s Treat string as single line
x Allow comments and white space in pattern
e Evaluate replacement
U Ungreedy pattern
Note that not all of these flags are supported by the JavaScript regular expression engine. For an authoritative list, see this MDN article.
So the "g" flag makes it global, so it replaces this pattern wherever it is found, instead of just the first instance (which is the default behavior of the replace method).
The "i" flag makes it case-insensitive, so a pattern like [a-z]+ will match "foo" and "FOO". However, because your pattern contains no letters (only the < and > characters and a negated character class), this flag has no effect.
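To make the effect of the two flags concrete, here is a minimal sketch. The sample strings are made up for illustration, and the pattern is the one from the question with the capture groups dropped, since the replacement does not use them.

```javascript
// Sample input (an assumption for illustration, not the asker's exact string).
var html = "<p>Dear sms,</p><P>This is a test.</P>";

// Without "g", replace() stops after the first match.
var firstTagOnly = html.replace(/<[^>]+>/, "");

// With "g", every tag is removed.
var allTags = html.replace(/<[^>]+>/g, "");

// "i" only matters when the pattern contains letters.
var withoutI = "foo FOO".replace(/[a-z]+/g, "X");
var withI = "foo FOO".replace(/[a-z]+/gi, "X");

console.log(firstTagOnly); // Dear sms,</p><P>This is a test.</P>
console.log(allTags);      // Dear sms,This is a test.
console.log(withoutI);     // X FOO
console.log(withI);        // X X
```

Note that stripping tags this way is fine for simple strings like the one above, but it is not a general HTML sanitizer.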

Related

What does the forward slash mean within a JavaScript regular expression?

I can't find any definitive information on what / means in a JavaScript regex.
The code replace(/\r/g, '');
What I'm able to figure out is this:
/ = I don't know
\r = carriage return
/g = I don't know but It may mean 'the match must occur at the point where the previous match ended.'
The slashes indicate the start and end of the regular expression.
The g at the end is a flag and indicates it is a global search.
From the docs:
Regular expressions have four optional flags that allow for global and
case insensitive searching. To indicate a global search, use the g
flag. To indicate a case-insensitive search, use the i flag. To
indicate a multi-line search, use the m flag. To perform a "sticky"
search, that matches starting at the current position in the target
string, use the y flag. These flags can be used separately or together
in any order, and are included as part of the regular expression.
To include a flag with the regular expression, use this syntax:
var re = /pattern/flags;
To add a little more detail, the / characters are part of the regular expression literal syntax in JavaScript/ECMAScript. The / characters are used during lexical analysis to determine that a regular expression pattern is present between them and anything immediately following them will be regular expression flags. The ECMAScript standard has defined this in EBNF, for your perusal:
RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
A good analogy for the / in regular expressions is the " or ' that surround string literals in JavaScript.
As others have pointed out, you should read the docs! That said:
Think of the forward slash as quotation marks for regular expressions. The slashes contain the expression but are not themselves part of the expression. (If you want to test for a forward slash, you have to escape it with a backwards slash.) The lowercase g specifies that this is a global search, i.e., find all matches rather than stopping at the first match.
As is indicated here, the forward slashes are not a part of the expression itself, but denote the beginning and ending of the expression.
To add to metadept's answer:
the g bit is the global indicator - see What does the regular expression /_/g mean? - i.e. replace all occurrences, not just the first one
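The "slashes are like quotation marks" analogy can be sketched by writing the same regex two ways. The input string here is an assumption chosen for illustration.

```javascript
// Literal syntax: the slashes delimit the pattern, and "g" is a flag.
var literal = /\r/g;

// Equivalent constructor form: the pattern is an ordinary string,
// so the backslash has to be doubled, and the flags are a second argument.
var constructed = new RegExp("\\r", "g");

var input = "line1\r\nline2\r\n";
var viaLiteral = input.replace(literal, "");
var viaConstructor = input.replace(constructed, "");

// To match a forward slash inside a literal, escape it with a backslash.
var slashesReplaced = "a/b/c".replace(/\//g, "-");

console.log(viaLiteral === viaConstructor); // true
console.log(slashesReplaced);               // a-b-c
```

The constructor form is mainly useful when the pattern has to be built from a variable at run time.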

javascript : Regular expression for a:b

I am looking for the occurrence of the pattern a:b. This could be of the form
a: b or
a :b or
a : b
(note the optional spaces).
I am new to regexes and tried something of the form a\s:\sb, but it didn't work.
Can somebody point me out the right one ?
Thanks..
Your current regex is 'a\s:\sb'. Since you didn't make the '\s' part of the pattern optional, this only matches 'a[SPACE]:[SPACE]b', where I am using [SPACE] as a standin for space, tab, or any other whitespace character. Instead, you can use 'a\s?:\s?b', which makes the whitespace optional.
For more regular expression information, I would recommend the Perl regular expression tutorial.
Try a\s*:\s*b. Also, this is handy to test: http://www.pagecolumn.com/tool/regtest.htm
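The difference between the three patterns mentioned above can be checked with a short sketch. The sample strings are assumptions; the last one has two spaces on each side of the colon to separate \s? from \s*.

```javascript
var strict = /a\s:\sb/;       // exactly one whitespace character on each side
var optional = /a\s?:\s?b/;   // zero or one whitespace character on each side
var anyAmount = /a\s*:\s*b/;  // zero or more whitespace characters on each side

var samples = ["a:b", "a: b", "a :b", "a : b", "a  :  b"];

samples.forEach(function (s) {
  console.log(JSON.stringify(s), strict.test(s), optional.test(s), anyAmount.test(s));
});
// "a:b"     false true  true
// "a: b"    false true  true
// "a :b"    false true  true
// "a : b"   true  true  true
// "a  :  b" false false true
```

Which of \s? and \s* is the right choice depends on whether runs of several spaces should count as a match.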

Can anyone explain me on how to form regular expression and explain this regular expression?

replace(/[^0-9]/g,''));
Replace is a method
What does / indicate?
What does ^ indicate along with 0-9
What does /g indicate?
Do we need to start a regular expression with / or can we start with anything?
The / introduces a regular expression literal (just like " and ' introduce string literals). A regular expression literal is in the form /expression/flags, where expression is the body of the expression, and flags are optional flags (i for case-insensitive, g for global, m for multi-line stuff).
The ^ as the first character within [] means any character not matching the following. So [^0-9] means "any character except 0 through 9".
The /g ends the regular expression literal and includes the "global" flag on it. Without the g, replace would only replace the first match, not all of them.
In all, what that does is replace any character that isn't 0 through 9 with a blank — e.g., removes non-digits. It could be written more simply as:
var result = str.replace(/\D/g, '');
...because \D (note that's an upper-case D) means "non-digit".
MDC has a decent page on regular expressions.
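As a quick check of the points above, here is a small sketch. The phone-number string is an assumption picked for illustration.

```javascript
var phone = "(555) 123-4567";

// Negated character class: anything that is not 0 through 9.
var digitsViaClass = phone.replace(/[^0-9]/g, "");

// Shorthand: \D means "non-digit".
var digitsViaShorthand = phone.replace(/\D/g, "");

// Without the g flag only the first non-digit, the "(", is removed.
var withoutG = phone.replace(/[^0-9]/, "");

console.log(digitsViaClass);     // 5551234567
console.log(digitsViaShorthand); // 5551234567
console.log(withoutG);           // 555) 123-4567
```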
The / and / are the start and end of the regex pattern, and the g means global (anything after the 2nd / is an optional modifier for the regex).
^ means not.
So in this case it'll remove any character that isn't a number.
See the manual for replace
See regular expression literals
See using special characters
See searching with flags
replace is method of string type
/ / indicates there's a regular expression inside of them
^ inside [] means "not"
"g" means to replace globally
Regular expressions in JavaScript should be put inside a pair of "/"
This W3 Schools tutorial should cover most of the basics. This other tutorial covers the flags, such as /g, which can be passed to the regex engine.
yes
start and end of regex
not; that basically means match any non-digit character
global replacement, the effect of not having that is replacement only done for the first encounter.
At least in javascript, yes you have to use /.
/ indicates the beginning and end of the regexp. Hence in your case [^0-9] is the regex.
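^ indicates the start of a line when it appears outside brackets; as the first character inside [], as it is here, it negates the character class
/g indicates that the substitution takes place for all matches (global), not only for the first match.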
/g enables "global" matching. When using the replace() method, specify this modifier to replace all matches, rather than only the first one.
/ start regex
^ match except the symbols 0-9
Well, as to creating one, this forum is not the best for that -- it is a rather large question, best left to one of the best resources on RegExp that I know of.
It looks like you're in JS, so:
replace is a method of String. It replaces the provided expression with the second string, in this case nothing.
In JavaScript, / must begin and end every regex literal, and any / in the middle must be escaped with a \ (so it looks like this: \/). In other languages (PHP, Perl being some of the most prominent), you can use other characters such as ~ and #.
^ inside of [] means negation, - means range, so [^0-9] means "not 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9". [0-9] does have a shorthand of \d, so /[^\d]/g is a valid, alternate way to say the same thing.
/g means "global", as in "match all incidents, not just the first".
Your expression means, "replace all non-digits with nothing".
The / encapsulate your pattern (you need to escape / with \ if you want to use it in pattern)
and the trailing character after the slashes are modifiers. 'g' in this case means global search (i.e. find all matches)
^ is negation.. [0-9] is range indicating all numbers from 0 to 9.
so [^0-9] means anything except numbers
So This regex basically replaces anything except numbers in the string with '' (i.e. remove them)
Regex has lots of other features, you should research them!
What it does: Removes all non-numeric (0-9) characters.
The forward slash (/) is used when you declare RegExp literals
The [^0-9] means any character OTHER THAN 0-9. The ^ means "other than". You can remove it and it'll look for only a character 0-9.
The /g represents global replacement.
So this will look for any non-number character and replace it with nothing.
As Shamim notes, regular-expressions.info is a great site. Best of luck!
You can try out javascript regex's on this site: http://regexpal.com/
Coupled with http://www.regular-expressions.info/tutorial, it's a great resource for learning.

javascript email regular expression

Can someone explain this regular expression to validate email.
var emailExp = /^[\w\-\.\+]+\@[a-zA-Z0-9\.\-]+\.[a-zA-z0-9]{2,4}$/;
I need to know what does this independent elements do
"/^" and "\" and "\.\-" and "$" //Please explain individually
Thanks in advance
Quick explanation
/
JavaScript regular expressions start with a / and end with another one. Everything in-between is a regular expression. After the second / there may be switches like g (global) and/or i (ignore case), e.g. var rx = /.+/gi;
^
Start of a text line (so nothing can be prepended before the email address). This also comes in handy in multi-line texts.
\
Used to escape special characters. A dot/full-stop . is a special character and represents any single character, but when presented as \. it means a dot/full-stop itself. Characters that need to be escaped are usually the ones used in regular expression syntax (braces, curly braces, square brackets etc.). You'll know when you learn the syntax.
\.\-
Two escaped characters. Dot/full-stop and a minus/hyphen. So it means .-
$
End of line.
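The elements described above can be tried out in isolation. This is a minimal sketch with made-up patterns and inputs, not the email expression from the question.

```javascript
// Escaping: an unescaped dot matches any character, "\." only a literal dot.
var anyChar = /^a.c$/;
var literalDot = /^a\.c$/;
console.log(anyChar.test("abc"));    // true
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

// Anchors: ^ and $ force the whole input to match, not just a part of it.
var anchored = /^\d+$/;
var unanchored = /\d+/;
console.log(anchored.test("abc123"));   // false
console.log(unanchored.test("abc123")); // true

// With the m flag, ^ and $ also match at line breaks inside the string.
var singleLine = /^b$/;
var multiLine = /^b$/m;
console.log(singleLine.test("a\nb")); // false
console.log(multiLine.test("a\nb"));  // true
```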
Learn regular expressions
They are one of the imperative things every developer should understand to some extent. At least some basic knowledge is mandatory.
Some resources
General regular expression syntax resource
http://www.regular-expressions.info/
JavaScript related regular expressions
https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions
/
The start of the expression
^
The start of the string (since it appears at the start of the expression)
\
Nothing outside the context of the character that follows it
\.\-
A full stop. A hyphen.
$
The end of the string
The other posters have done an excellent job at explaining this regex, but if your goal is to actually do e-mail validation in JavaScript, please check out this StackOverflow thread.

Javascript replace method, replace with "$1"

I'm reading Sitepoints 2007 book "Simply Javascript" and I encountered some code I just can't understand.
It's the following code:
Core.removeClass = function(target, theClass)
{
  var pattern = new RegExp("(^| )" + theClass + "( |$)");
  target.className = target.className.replace(pattern, "$1");
  target.className = target.className.replace(/ $/, "");
};
The first call to the replace method is what puzzles me, I don't understand where the "$1" value comes from or what it means. I would think that the call should replace the found pattern with "".
Each pair of parentheses (...) where the first character is not a ?* is a "capturing group", which places its result into $1,$2,$3,etc which can be used in the replacement pattern.
You might also see the same thing as \1,\2,\3 in other regex engines, (or indeed in the original expression sometimes, for repetition)
These are called "backreferences", because they generally refer back to (an earlier) part of the expression.
(*The ? indicates various forms of special behaviour, including a non-capturing group which is (?:...) and simply groups without capturing.)
In your specific example, the $1 will be the group (^| ) which is "position of the start of string (zero-width), or a single space character".
So by replacing the whole expression with that, you're basically removing the variable theClass and potentially a space after it. (The closing expression ( |$) is the inverse - a space or the string end position - and since its value isn't used, could have been non-capturing with (?: |$) instead.)
Hopefully this explains everything ok - let me know if you want any more info.
Also, here's some further reading from the site regular-expressions.info:
Groups and Backreferences
Atomic Grouping (doesn't work in JS, but interesting)
Lookaround groups (partial support in JS regex)
$1 is a backreference. It will be replaced by whatever the first matching group (set of parentheses) in your regex matches.
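To see why the book replaces with "$1" rather than "", here is a sketch that applies the same two replace calls to a plain string instead of a DOM element. The function names removeClassFromString and removeWithEmptyString and the class names are hypothetical, chosen only for this illustration.

```javascript
// Same logic as Core.removeClass, but operating on a string (hypothetical helper).
function removeClassFromString(className, theClass) {
  var pattern = new RegExp("(^| )" + theClass + "( |$)");
  // "$1" puts back whatever group 1 captured: a space, or nothing at string start.
  return className.replace(pattern, "$1").replace(/ $/, "");
}

// For comparison: replacing with "" also swallows the separating space.
function removeWithEmptyString(className, theClass) {
  var pattern = new RegExp("(^| )" + theClass + "( |$)");
  return className.replace(pattern, "");
}

console.log(removeClassFromString("nav active large", "active")); // nav large
console.log(removeClassFromString("active large", "active"));     // large
console.log(removeClassFromString("nav active", "active"));       // nav
console.log(removeWithEmptyString("nav active large", "active")); // navlarge

// Groups are numbered left to right, so they can also be reordered.
console.log("a b".replace(/(a) (b)/, "$2 $1")); // b a
```

In the middle-of-the-list case the match consumes the spaces on both sides of the class name, so putting one of them back via "$1" keeps the remaining class names separated.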
