What is the difference between PHP regex and JavaScript regex?

I am working with a regex, /\[([^]\s]+).([^]]+)\]/g, which works great in PHP for the input [http://sdgdssd.com fghdfhdhhd].
But when I use the same regex in JavaScript, it does not match this input string.

In JavaScript regex, you must always escape the ] inside a character class:
\[([^\]\s]+).([^\]]+)\]
In your regex, JS parsed [^] as any character (including a newline), and the ] that was meant to close each character class as a literal ].
In this regard, JS regex engine deviates from the POSIX standard where smart placement is used to match [ and ] symbols with bracketed expressions like [^][].
The ] character is treated as a literal character if it is the first character after ^: [^]abc].
JS and Ruby do not work that way:
You can include an unescaped closing bracket by placing it right after the opening bracket, or right after the negating caret. []x] matches a closing bracket or an x. [^]x] matches any character that is not a closing bracket or an x. This does not work in JavaScript, which treats [] as an empty character class that always fails to match, and [^] as a negated empty character class that matches any single character. Ruby treats empty character classes as an error. So both JavaScript and Ruby require closing brackets to be escaped with a backslash to include them as literals in a character class.
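The difference can be checked directly in JavaScript. This is a minimal sketch, assuming a modern engine and no u or v flag; the variable names are made up for illustration.

```javascript
// Input string from the question.
const input = "[http://sdgdssd.com fghdfhdhhd]";

// PCRE-style pattern: JavaScript parses [^] as "any character",
// so the pattern no longer means what it means in PHP.
const phpStyle = /\[([^]\s]+).([^]]+)\]/;

// JavaScript-safe pattern: every ] inside a character class is escaped.
const jsStyle = /\[([^\]\s]+).([^\]]+)\]/;

console.log(phpStyle.test(input)); // false

const m = input.match(jsStyle);
console.log(m[1]); // "http://sdgdssd.com"
console.log(m[2]); // "fghdfhdhhd"

// On its own, [^] matches any single character, including a newline.
const matchesNewline = /^[^]$/.test("\n");
console.log(matchesNewline); // true
```

The escaped form also works unchanged in PCRE, so it is the safer choice when one pattern has to run in both languages.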
Related:
(?1) regex subroutine used to shorten a PCRE pattern conversion - REGEX from PHP to JS

I would like to add this little fact about translating a PHP preg_replace regex into a JavaScript .replace regex:
<?php echo preg_replace("/([^0-9\,\.\-])/i", "", "-1 220 025.47 $"); ?>
Result : "-1220025.47"
In PHP, the regex is passed as a quoted string ("...") that includes its delimiters, and commas separate the regex, the replacement, and the subject. preg_replace replaces every match by default, so no global flag is needed. The parentheses only form a capturing group, which is optional here.
<script>"-1 220 025.47 $".replace(/[^0-9\,\.\-]/ig,"") </script>
Result : "-1220025.47"
In JavaScript, a regex literal takes no quotes, a comma separates the regex from the replacement, and you need the g flag to replace every match, in addition to the i flag (hence /ig).
I hope this will be useful to someone!
Note that the "\," can be dropped for numbers formatted like "1,000.00 $" (English style?):
<script>"-1,220,025.47 $".replace(/[^0-9\.\-]/ig,"")</script>
<?php echo preg_replace("/([^0-9\.\-])/i", "", "-1,220,025.47 $"); ?>
Result : "-1220025.47"
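To see why the g flag matters on the JavaScript side, here is a small sketch using the same sample amount; the variable names are illustrative only.

```javascript
const amount = "-1 220 025.47 $";

// Without g, only the first offending character (the first space) is removed.
const once = amount.replace(/[^0-9,.\-]/, "");

// With g, every offending character is removed, like preg_replace in PHP.
const all = amount.replace(/[^0-9,.\-]/g, "");

console.log(once); // "-1220 025.47 $"
console.log(all);  // "-1220025.47"
```

Inside a character class, the comma and the dot are literal already, so only the hyphen is escaped here.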

Related

I want to find the strings with the placeholders from a given string in JavaScript [duplicate]

Simple regex question. I have a string on the following format:
this is a [sample] string with [some] special words. [another one]
What is the regular expression to extract the words within the square brackets, ie.
sample
some
another one
Note: In my use case, brackets cannot be nested.

You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.
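In JavaScript, applying this regex "globally" could look like the following sketch, assuming an engine that supports String.prototype.matchAll (ES2020); the variable names are illustrative.

```javascript
const text = "this is a [sample] string with [some] special words. [another one]";

// matchAll requires the g flag; each match object carries group 1 at index 1.
const words = Array.from(text.matchAll(/\[(.*?)\]/g), (m) => m[1]);

console.log(words); // ["sample", "some", "another one"]
```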

(?<=\[).+?(?=\])
Will capture content without brackets
(?<=\[) - positive lookbehind for [
.+? - non-greedy match for the content (at least one character)
(?=\]) - positive lookahead for ]
EDIT: for nested brackets the below regex should work:
(\[(?:\[??[^\[]*?\]))

This should work out ok:
\[([^]]+)\]

Can brackets be nested?
If not: \[([^]]+)\] matches one item, including the square brackets. Backreference \1 will contain the item that was matched. If your regex flavor supports lookaround, use
(?<=\[)[^]]+(?=\])
This will only match the item inside brackets.

To match a substring between the first [ and last ], you may use
\[.*\] # Including open/close brackets
\[(.*)\] # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\]) # Excluding open/close brackets (using lookarounds)
Use the following expressions to match strings between the closest square brackets:
Including the brackets:
\[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)
\[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)
\[[^\]\[]*] - Java, ICU regex
\[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)
Excluding the brackets:
(?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), JGSoft Software
\[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below
\[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp
(?<=\[)[^\]\[]*(?=]) - Java regex, ICU (R stringr)
(?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)
NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
Whenever lookaround support is available, the above solutions rely on it to exclude the leading/trailing open/close bracket. Otherwise, rely on capturing groups.
If you need to match nested parentheses, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with open/close bracket excluded:
\[((?:[^][]++|(?R))*)] - PHP PCRE
\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)] - .NET demo
\[(?:[^\]\[]++|(\g<0>))*\] - Onigmo (Ruby) demo
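JavaScript's regex engine has no recursion construct such as (?R), so for nested brackets one option is a small hand-written scanner with a depth counter. The sketch below uses that swapped-in technique and is not a translation of the patterns above; extractBalanced is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the contents of each outermost balanced [...] pair.
function extractBalanced(text) {
  const results = [];
  let depth = 0;   // how many [ are currently open
  let start = -1;  // index just after the outermost [
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "[") {
      if (depth === 0) start = i + 1;
      depth++;
    } else if (ch === "]" && depth > 0) {
      depth--;
      // Closing the outermost bracket completes one item.
      if (depth === 0) results.push(text.slice(start, i));
    }
  }
  return results;
}

const nested = extractBalanced("a [b [c] d] e [f]");
console.log(nested); // ["b [c] d", "f"]
```

A stray ] with nothing open is ignored, and an outermost [ that is never closed yields no item.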

If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])
Let's break it down
The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,
Look ahead positive (?=)
Find expression A where expression B follows:
A(?=B)
Look behind positive (?<=)
Find expression A where expression B precedes:
(?<=B)A
The Alternative
If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.
How does this regex work?
The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.
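Both approaches can be compared side by side in JavaScript. This is a sketch assuming an engine with lookbehind support (added to the language in ES2018); the sample string and variable names are illustrative.

```javascript
const sample = "this is a [sample] string with [some] special words.";

// Lookarounds: the brackets are checked but are not part of the match.
const viaLookaround = sample.match(/(?<=\[).*?(?=\])/g);

// Capturing group: the brackets are matched, and group 1 holds the contents.
const viaGroup = Array.from(sample.matchAll(/\[(.*?)\]/g), (m) => m[1]);

console.log(viaLookaround); // ["sample", "some"]
console.log(viaGroup);      // ["sample", "some"]
```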

In case you have unbalanced or nested brackets, you can likely design an expression with recursion, similar to
\[(([^\]\[]+)|(?R))*+\]
which of course depends on the language or regex engine you are using, since not every engine supports recursion.
RegEx Demo 1
Other than that,
\[([^\]\[\r\n]*)\]
RegEx Demo 2
or,
(?<=\[)[^\]\[\r\n]*(?=\])
RegEx Demo 3
are good options to explore.
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Test
const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Source
Regular expression to match balanced parentheses

(?<=\[).*?(?=\]) works well, as per the explanation given above. Here's a Python example that applies the same pattern to parentheses instead of square brackets:
import re
text = "Pagination.go('formPagination_bottom',2,'Page',true,'1',null,'2013')"
re.search(r'(?<=\().*?(?=\))', text).group()
"'formPagination_bottom',2,'Page',true,'1',null,'2013'"

@Tim Pietzcker's answer here
(?<=\[)[^]]+(?=\])
is almost the one I've been looking for. But there is one issue: some legacy browsers fail on positive lookbehind.
So I had to work it out myself :). I managed to write this:
/([^[]+(?=]))/g
Maybe it will help someone.
console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));

If you want to match only lowercase letters (a-z) between square brackets:
(\[[a-z]*\])
If you want lowercase and uppercase letters (a-zA-Z):
(\[[a-zA-Z]*\])
If you want lowercase letters, uppercase letters, and digits (a-zA-Z0-9):
(\[[a-zA-Z0-9]*\])
If you want everything between square brackets (text, numbers, and symbols):
(\[.*\])

This code will extract the content between square brackets and parentheses
(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))
(?: non capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets

In R, try:
library(stringr)
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"

([[][a-z \s]+[]])
The above should work, given the following explanation:
Characters within square brackets [] define a character class, which means the pattern should match one of the characters listed within the square brackets.
\s specifies a whitespace character.
+ means at least one occurrence of the character (or class) preceding the +.

I needed including newlines and including the brackets
\[[\s\S]+\]

If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:
(?<=\[)(\w+\.\w+.*?)(?=\])
Regex Tester

What is this "/\,$/"?

I tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns the index of the coords string. What should I search for online so I can learn more?

/\,$/ finds the comma character (,) at the end of a string (denoted by the $) and replaces it with empty (""). You sometimes see this in regex code aiming to clean up excerpts of text.

It's a regular expression to remove a trailing comma.

That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available on most languages and in some tools. It is a very useful thing to learn.

That's a regular expression that finds a comma at the end of a string. That code removes the comma.

// defines a JavaScript regular expression, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but in this case, it's not necessary. An example where it would be necessary would be to remove trailing periods. If you tried to do that with /.$/ the period here has a different meaning; it is used as a wildcard to match [almost] any character (aside for some newlines). So in this case to match on "." (period character) you would have to escape the wildcard (/\.$/).
When $ is placed at the end of the pattern, it anchors the match to the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not after help in help, me,), only at the end (trailing). It can also speed up the regular expression search. If you wanted to match characters only at the beginning of the string, you would start the pattern with a caret (^); for instance, /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +; trailing commas,, becomes trailing commas,
With the +; no trailing comma,, becomes no trailing comma
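The behaviour described above can be checked with a few one-liners. This is a minimal sketch with made-up sample strings.

```javascript
// /,$/ removes exactly one trailing comma.
const oneComma = "1,2,3,".replace(/,$/, "");

// With two trailing commas, one of them survives.
const twoCommas = "1,2,3,,".replace(/,$/, "");

// /,+$/ removes the whole run of trailing commas.
const withPlus = "1,2,3,,".replace(/,+$/, "");

// Commas in the middle of the string are never touched.
const noTrailing = "1,2,3".replace(/,$/, "");

console.log(oneComma);   // "1,2,3"
console.log(twoCommas);  // "1,2,3,"
console.log(withPlus);   // "1,2,3"
console.log(noTrailing); // "1,2,3"
```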

To the last tag (already in a string) RegEx

I do not know what I am doing wrong. I have this string that I want to replace
<?xml version="1.0" encoding="utf-8" ?>
<Sections>
<Section>
I am using regex to replace everything including <Section>, and leave the rest untouched.
arrayValues[index].replace("/[([.,\n,\s])*<Section>]/", "---");
What is wrong with my regex? Doesn't this mean replace every character, including newlines and spaces, up to and including <Section> with ---?

First of all, you need to remove the quotes around your regex—if they're there, the argument won't be processed as a regex. JavaScript will see it as a string (because it is a string) and try to match it literally.
Now that that's taken care of, we can simplify your regex a bit:
arrayValues[index].replace(/[\s\S]*?<Section>/, "---");
[\s\S] works around the lack of an s flag in older JavaScript engines (a handy option supported by most languages that enables . to match newlines; JavaScript gained it in ES2018). \s does match newlines (even without an s flag specified), so the character class [\s\S] tells the regex engine to match:
\s - a whitespace character, which could be a newline
OR
\S - a non-whitespace character
So you can think of [\s\S] as matching . (any character except a newline) or the literal \n (a newline). See Javascript regex multiline flag doesn't work for more.
? is used to make the initial [\s\S]* match non-greedy, so the regex engine will stop once it hits the first occurrence of <Section>.

arrayValues[index].replace("/[([.,\n,\s])*<Section>]/", "---");
What is wrong with my regex?
It's not a regex, it's a string literal. replace() searches for a string argument as literal text, slashes and all, so nothing matches. Use a regex literal instead:
arrayValues[index].replace(/[\S\s]*<Section>/, "---");
Also, you have too many unnecessary characters in it. The [] around the whole thing build a character class, which is not what you want. The capturing group () just wraps a character class which can be repeated itself. And a dot . inside a character class does match a literal dot, instead of all characters.
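The difference between a quoted pattern and a regex literal is easy to verify. In this sketch the trailing <Item /> line is a made-up addition to the question's sample, included only so that something remains after the replacement.

```javascript
// Sample input; <Item /> is hypothetical and not part of the original question.
const xml = '<?xml version="1.0" encoding="utf-8" ?>\n<Sections>\n<Section>\n<Item />';

// A string pattern is searched for literally, slashes included, so nothing changes.
const asString = xml.replace("/[\\s\\S]*?<Section>/", "---");

// A regex literal matches everything up to and including the first <Section>.
const asRegex = xml.replace(/[\s\S]*?<Section>/, "---");

console.log(asString === xml); // true
console.log(asRegex);          // "---\n<Item />"
```

Note that <Sections> is skipped, because the pattern requires the > immediately after "Section".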

Javascript regex invalid range in character class

I'm using a regex pattern that I got from regexlib to validate relative urls. On their site you can test the pattern to make sure it fits your needs. Everything works great on their site, as soon as I use the pattern in mine I get the error message:
Invalid range in character class
I know that this error usually means that a hyphen is mistakenly being used to represent a range and is not properly escaped. But in this case since it works on their site I'm confused why it's not working on mine.
var urlRegex = new RegExp('^(?:(?:\.\./)|/)?(?:\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)?(?:/\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$', 'g');
NOTE:
If you're going to test the regex from their site (using the link above) be sure to change the Regex Engine dropdown to Client-side Engine and the Engine dropdown to Javascript.

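Either put the - at the beginning or end of the character class, or use two backslashes to write a regex escape within the string.
Since you are passing a string to the RegExp constructor, you need two backslashes for each escaped character. With a single backslash, the string literal consumes it: \- becomes a plain -, so ;\-\+ turns into the range ;-+, which is out of order and therefore invalid.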
NOTE
Check out this answer on SO which explains when to use single or double backslashes to escape special characters
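The effect of single versus double backslashes can be reproduced with a cut-down character class. This is a sketch only; compiles is a hypothetical helper, and the class is a small fragment of the pattern in the question.

```javascript
// Hypothetical helper: reports whether a pattern source compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

// Single backslashes are consumed by the string literal: the value is [=;-+]
const single = '[=;\-\+]';

// Doubled backslashes survive into the pattern: the value is [=;\-\+]
const doubled = '[=;\\-\\+]';

console.log(single);            // [=;-+]
console.log(compiles(single));  // false (the range ;-+ is out of order)
console.log(compiles(doubled)); // true
console.log(new RegExp(doubled).test("-")); // true
```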

There is no reason to use RegExp constructor here. Just use RegExp literal:
var urlRegex = /^(?:(?:\.\.\/)|\/)?(?:\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)?(?:\/\w(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$/g;
Inside RegExp literal, you just write the regex naturally, except for /, which now needs escaping, since / is used as delimiter in the RegExp literal.
In a character class, ^ has special meaning at the beginning of the class, - has special meaning between 2 characters, and \ has special meaning, which is to escape other characters (mainly ^, -, [, ] and \) and also to specify shorthand character classes (\d, \s, \w, ...). [ and ] are used as delimiters for the character class, so they also have special meaning. (Actually, in JavaScript, only ] has special meaning, and you can specify [ without escaping inside a character class.) Other than the 5 characters listed above, other characters (unless involved in an escape sequence with \) don't have any special meaning.
You can reduce the number of escaping \ with the information above. For ^, unless it is the only character in the character class, you can put it away from the beginning of the character class. For -, you can put it at the end of the character class.
var urlRegex = /^(?:(?:\.\.\/)|\/)?(?:\w(?:[\w`~!$=;+.^()|{}\[\]-]|(?:%\d\d))*\w?)?(?:\/\w(?:[\w`~!$=;+.^()|{}\[\]-]|(?:%\d\d))*\w?)*(?:\?[^#]+)?(?:#[a-z0-9]\w*)?$/g;
What was changed:
[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]
[\w`~!$=;+.^()|{}\[\]-]

Writing a Javascript regex that includes special reserved characters

I'm writing a function that takes a prospective filename and validates it in order to ensure that no system disallowed characters are in the filename. These are the disallowed characters: / \ | * ? " < >
I could obviously just use string.indexOf() to search for each special char one by one, but that's a lot longer than it would be to just use string.search() using a regular expression to find any of those characters in the filename.
The problem is that most of these characters are considered to be part of describing a regular expression, so I'm unsure how to include those characters as actually being part of the regex itself. For example, the / character in a Javascript regex tells Javascript that it is the beginning or end of the regex. How would one write a JS regex that functionally behaves like so: filename.search(\ OR / OR | OR * OR ? OR " OR < OR >)

Put your stuff in a character class like so:
[/\\|*?"<>]
You're gonna have to escape the backslash, but the other characters lose their special meaning. Also, RegExp's test() method is more appropriate than String.search in this case.
filenameIsInvalid = /[/\\|*?"<>]/.test(filename);

Include a backslash before the special characters [\^$.|?*+(){}, for instance, like \$
You can also search for a character by specified ASCII/ANSI value. Use \xFF where FF are 2 hexadecimal digits. Here is a hex table reference. http://www.asciitable.com/ Here is a regex reference http://www.regular-expressions.info/reference.html

The correct syntax of the regex is:
/^[^\/\\|\*\?"<>]+$/
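The negated class [^...] matches any character except the ones listed. Because the pattern is anchored with ^ and $, it matches only if every character in the filename is allowed; if the filename contains a disallowed character, the match comes back as null. So to validate, compare the result against null.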
Demo: jsFiddle.
Demo #2: Comparing against null.
The first string is valid; the second is invalid, hence null.
But obviously, you need to escape regex characters that are used in the matching. To escape a character that has special meaning in a regex, put a backslash before it, e.g. \*, \/, \$, \?.

You'll need to escape the special characters. In javascript this is done by using the \ (backslash) character.
I'd recommend however using something like xregexp which will handle the escaping for you if you wish to match a string literal (something that is lacking in javascript's native regex support).
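If pulling in a library is not an option, a hand-rolled escaping helper is a common workaround. The sketch below is one possible version; escapeRegExp is a hypothetical name, not a built-in, and it targets use outside character classes (it does not escape -).

```javascript
// Hypothetical helper: backslash-escapes regex metacharacters so the
// resulting pattern matches the input string literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "1+1=2";

// Unescaped, + is a quantifier, so the pattern does not match the text itself.
const rawMatch = new RegExp(needle).test("1+1=2");

// Escaped, the pattern matches the literal text.
const escaped = escapeRegExp(needle);
const escapedMatch = new RegExp(escaped).test("1+1=2");

console.log(escaped);      // 1\+1=2
console.log(rawMatch);     // false
console.log(escapedMatch); // true
```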

We Keep Coding
