javascript regular expression match - javascript

I need to split a string like the ones below, using a space as the delimiter, but any space within quotes should be preserved. There are two cases that need to work.
Case 1
research library "not available" author:"Bernard Shaw"
to
research
library
"not available"
author:"Bernard Shaw"
Case 2
research library "not available" author:Bernard
to
research
library
"not available"
author:Bernard
I am trying to do this with Javascript and regular expression.
var splitArray = query_string.match(/([^\s]*\"[^\"]+\")|\w+/g);
Case 1 works as required but Case 2 produces the result as below
research
library
"not available"
author
Bernard
I need both the cases to work with one Regex. Any ideas appreciated.

[^"\s]+(?:"[^"]+")?|"[^"]+"
Explanation:
[^"\s]+ # One or more non-space/non-quote characters
(?:"[^"]+")? # optionally followed by a quoted string
| # or
"[^"]+" # just a quoted string.
This assumes that there are no escaped quotes within the quoted strings.
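As a quick check, the pattern can be run against both inputs from the question with the g flag. This is only a minimal sketch: the helper name splitQuery is made up for illustration, and it shares the assumption that quoted strings contain no escaped quotes.

```javascript
// Hypothetical helper: splits a query on spaces but keeps quoted phrases intact.
// Assumes quoted strings contain no escaped quotes.
function splitQuery(queryString) {
  const tokenPattern = /[^"\s]+(?:"[^"]+")?|"[^"]+"/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return queryString.match(tokenPattern) || [];
}

const case1 = splitQuery('research library "not available" author:"Bernard Shaw"');
const case2 = splitQuery('research library "not available" author:Bernard');

console.log(case1); // [ 'research', 'library', '"not available"', 'author:"Bernard Shaw"' ]
console.log(case2); // [ 'research', 'library', '"not available"', 'author:Bernard' ]
```

Both cases go through the same expression, so no branching on the input shape is needed.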

([^\s]*\"[^\"]+\")|\w+:?
I've tested this regex here: rubular
update:
you may want to include some more punctuation marks like ; , . ? !
e.g. research library! "not available" author:"Bernard Shaw" test1, test2; test2!
([^\s]*\"[^\"]+\")|\w+[:;\.,\?!]?

This works, at least for your two cases:
((?:[^\s]*\"[^\"]+\")|[\w:]+)
see here

Related

RegEx match() in Javascript does not produce result as expected

I'm having trouble working out why a regex in Javascript is not working how I would expect it to.
The pattern is as follows:
\[(.+)\]\((.+)\)
trying to match text in the following format:
[Learn more](https://www.example.com)
const text = 'Lorem ipsum etc [Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com).'
const regex = /\[(.+)\]\((.+)\)/
const found = text.match(regex)
console.log(found)
I am expecting the value of found to be the following:
[
"[Learn more](https://www.google.com)",
"[test link](https://www.google.com)",
"[another test link](https://www.google.com)"
]
But the value seems to be as follows:
[
"[Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com)",
"Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link",
"https://www.google.com"
]
I've tried the /ig flags but this doesn't seem to work. I'm trying in a different application (RegExRX) and getting the expected result but in Javascript, I can't get it to produce the same result.
The + quantifier is greedy and will "eat" as much of the source string as possible. You can use .+? instead:
const regex = /\[(.+?)\]\((.+?)\)/
Better yet, instead of . match "not ]":
const regex = /\[([^\]]+)\]\(([^)]+)\)/
Explicitly excluding the boundary characters can perform better anyway.
TL;DR: The regex \[(.+?)\]\((.+?)\) should do.
The original pattern doesn't work because the + quantifier is "greedy" by default: it tries to match as many characters as possible. Therefore, .+ means "as much of anything except a newline character as possible". A closing bracket fits that definition just fine, so .+ runs past the first ] and stops only at the last one that still lets the rest of the pattern match.
To make it work properly, you have to say "as much of anything as needed, up to the first closing bracket". To do that, either replace .+ with [^\]]+ ([^\)]+ for the second group), or make the quantifier lazy by appending ? to it, which turns both capturing groups into (.+?).
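A sketch of how this plays out on the text from the question. One assumption is added here: the g flag, because without it match() stops at the first link and returns its capture groups rather than a list of all links. The variable names are illustrative.

```javascript
const text = 'Lorem ipsum etc [Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com).';

// Negated character classes stop at the first ] and the first ).
// The g flag (an addition to the original pattern) asks for every match.
const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/g;

// With g, match() returns only the full matched substrings.
const found = text.match(linkPattern);
console.log(found);
// [ '[Learn more](https://www.google.com)',
//   '[test link](https://www.google.com)',
//   '[another test link](https://www.google.com)' ]

// matchAll() keeps the capture groups for each match.
const links = Array.from(text.matchAll(linkPattern), (m) => ({ label: m[1], url: m[2] }));
console.log(links[0]); // { label: 'Learn more', url: 'https://www.google.com' }
```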

Returning exact matches by checking preceding succeeding character of needle in haystack

How can I return a match only when characters like [a-zA-ZäöåÄÖÅ] do not appear directly before or after the match in the search string?
Say for example I'm looking for on in a string.
The following examples should evaluate to:
False: "luonto - nature" because uont
True: "olla (on, ovat, ole )" because (on,
False: "kevät (season)" because son)
Thanks.
I think this needs more than just a comment, so I am making this a proper answer.
As per my comment, you can use [^a-zA-ZäöåÄÖÅ]on[^a-zA-ZäöåÄÖÅ] to check for the condition stated in the question. The class is written with A-Z here, because the range A-z would also cover the punctuation characters [ \ ] ^ _ ` that sit between Z and a.
If you also want to handle on being at the start or the end of the string, you can make it (^|[^a-zA-ZäöåÄÖÅ])on([^a-zA-ZäöåÄÖÅ]|$), which tests that:
^ [the start of the string]
or [|]
[^a-zA-ZäöåÄÖÅ] [none (^) of the list (a-zA-ZäöåÄÖÅ)]
comes directly before the string literal on
AND
directly after the string literal on comes
[^a-zA-ZäöåÄÖÅ] [none (^) of the list (a-zA-ZäöåÄÖÅ)]
or [|]
$ [the end of the string]
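The breakdown above can be sketched as a small function and run against the three examples from the question. The function name containsStandalone and the needle-escaping step are assumptions added for illustration, and the letter class uses A-Z for the uppercase range.

```javascript
// Hypothetical helper: true when `needle` occurs with no letter directly
// before or after it (start/end of string also count as "no letter").
function containsStandalone(haystack, needle) {
  const letters = 'a-zA-ZäöåÄÖÅ';
  // Escape regex metacharacters so the needle is matched literally.
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`(^|[^${letters}])${escaped}([^${letters}]|$)`);
  return pattern.test(haystack);
}

console.log(containsStandalone('luonto - nature', 'on'));       // false
console.log(containsStandalone('olla (on, ovat, ole )', 'on')); // true
console.log(containsStandalone('kevät (season)', 'on'));        // false
```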
As I mentioned in the comment as well, a great site for learning and testing regex is Regex 101
Hope this helps

Distinguish '$' and '\$' in a string using javascript indexOf method

Suppose the following data is entered by user in a text area
test\$ing
I need to extract and modify the data. The problem is that I cannot distinguish between '$' and '\$'.
I have made the following attempts.
indexOf('\\') gives -1
indexOf('\$') gives 4
indexOf('$') gives 4
charAt(4) gives $
I understand that JavaScript treats '\$' as a single character. But how can I tell whether the character is '$' or '\$'?
I have gone through this post, and the accepted solution suggests changing the original text by escaping the backslashes. Is this the only possible way? Even if it is, how do I escape the backslashes in the original text?
Please help
\ is an escape character, which means that to get a literal backslash you have to write two of them in a row. Thus, your string should be written as 'test\\$ing' in JavaScript source. (However, users don't need to escape this character when they are typing into a <textarea>.) To find a backslash followed by $ inside your string, you would write:
string.indexOf('\\$') //=> 4
Demo Snippet:
var string = 'test\\$ing'
console.log(string.indexOf('\\')) //=> 4
console.log(string.indexOf('\\$')) //=> 4
console.log(string.indexOf('$')) //=> 5
console.log(string.charAt(4)) //=> '\'
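To tell the two kinds of dollar sign apart in one pass, something along these lines may help. It is a sketch, not a complete parser: the function name findDollars is invented here, String.raw is used only to mimic what a textarea's value would hold, and an escaped backslash in front of a dollar sign (\\$) is not handled.

```javascript
// String.raw keeps the backslash, mimicking text typed into a textarea.
const userInput = String.raw`test\$ing $5`;

// Hypothetical helper: reports every $ and whether a backslash precedes it.
function findDollars(text) {
  const results = [];
  // \\? optionally consumes one backslash right before the dollar sign.
  for (const m of text.matchAll(/\\?\$/g)) {
    results.push({ index: m.index, escaped: m[0].length === 2 });
  }
  return results;
}

console.log(findDollars(userInput));
// [ { index: 4, escaped: true }, { index: 10, escaped: false } ]
```

For an escaped dollar sign, the reported index is the position of the backslash, not of the $ itself.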
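If you work with the string literal 'test\$ing', you can't detect the '\' because it is removed when the literal is parsed: \$ is just an escape sequence for $.
If the user types \$ into a textarea, the backslash is present in textarea.value and indexOf should work.
Please provide more code.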

javascript regexp to match path depth

Been struggling for the last hour to try and get this regexp to work but cannot seem to crack it.
It must be a regexp and I cannot use split etc as it is part of a bigger regexp that searches for numerous other strings using .test().
(public\/css.*[!\/]?)
public/css/somefile.css
public/css/somepath/somefile.css
public/css/somepath/anotherpath/somefile.css
Here I am trying to look for path starting with public/css followed by any character except for another forward slash.
so "public/css/somefile.css" should match but the other 2 should not.
A better solution may be to somehow specify the number of levels to match after the prefix using something like
(public\/css\/{1,2}.*)
but I can't seem to figure that out either, some help with this would be appreciated.
edit
No idea why this question has been marked down twice, I have clearly stated the requirement with sample code and test cases and also attempted to solve the issue, why is it being marked down ?
You can use this regex:
/^(public\/css\/[^\/]*?)$/gm
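^ : start of the line
[^\/] : any character except /
*? : zero or more of the preceding class, as few as possible (lazy)
$ : end of the line
g : global flag
m : multi-line flag, so ^ and $ match at every line break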
Something like this?
/public\/css\/[^\/]+$/
This will match
public/css/[Any characters except for /]$
$ is matching the end of the string in regex.
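A short check against the three paths from the question, under two assumptions that are not in the answers above. The pattern is anchored with ^ and used without the g flag, since a global regex makes .test() stateful through lastIndex. The second pattern is one possible reading of the "number of levels" idea from the question.

```javascript
const paths = [
  'public/css/somefile.css',
  'public/css/somepath/somefile.css',
  'public/css/somepath/anotherpath/somefile.css',
];

// Exactly one path segment after public/css/.
const oneLevel = /^public\/css\/[^\/]+$/;

// Assumed variant: one or two segments after public/css.
const upToTwoLevels = /^public\/css(?:\/[^\/]+){1,2}$/;

const oneLevelResults = paths.map((p) => oneLevel.test(p));
const twoLevelResults = paths.map((p) => upToTwoLevels.test(p));

console.log(oneLevelResults); // [ true, false, false ]
console.log(twoLevelResults); // [ true, true, false ]
```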

Regular Expression for Organisation name

How do I write a regular expression for validating an organisation name that allows only alphanumeric characters at the start and, after that, only special characters like ., -, # and &?
I tried but it's not working
/^[a-z]|\d?[a-zA-Z0-9]?[a-zA-Z0-9\s&#.]+$
Some Valid Names
Hercules.Cycle
Herbal & Product
Welcome # 123
Invalid Names
&Hercules
Colgate!()
.Youtube
#Incule
Is that what you want?
^[A-Z]([a-zA-Z0-9]|[- #\.#&!])*$
You can use the validation below.
As per our client's requirement, a "Company Name" may have only a single space between words, along with a few permitted special characters.
/^[a-zA-Z0-9-#.{}#&!()]+(\s[a-zA-Z0-9-#{}.#&!()]+)+(\s[a-zA-Z-#.#&!()]+)?$/
Output:
Correct/positive response.
The following inputs are allowed:
#arshaa Technologies (m91) HYD
#9Arshaa Technologies (HYD) IND
#AT&T {IND} HYD
#Apple.India {HYD} (INDIA)
Negative/incorrect response.
The following inputs are not allowed by the regex above, because they contain multiple consecutive spaces or special characters that are not in the regex:
#arshaa _Technologies (m91) HYD
#9Arshaa --Technologies (HYD) IND
#AT&T {IND} HYD.
#Apple. -^India {HYD} $(INDIA)
Note: Please feel free to suggest modifications to this regex in the comments.
I've written and tested the regular expression below against the cases you mentioned:
/^[.#&]?[a-zA-Z0-9 ]+[ !.#&()]?[ a-zA-Z0-9!()]+/
Below all checks off.
Some Valid Names
Hercules.Cycle
Herbal & Product
Welcome # 123
Invalid Names
&Hercules
Colgate!()
.Youtube
#Incule
Try this pattern:
^\w[\w.\-#&\s]*$
or
^[a-zA-Z0-9][a-zA-Z0-9\.\-#&\s]*$
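The second pattern can be checked against the names listed in the question. This is a minimal sketch with illustrative variable names. Note that \s accepts tabs and newlines as well as spaces, which may or may not be wanted.

```javascript
// First character must be alphanumeric; the rest may also contain . - # & and whitespace.
const orgNamePattern = /^[a-zA-Z0-9][a-zA-Z0-9.\-#&\s]*$/;

const validNames = ['Hercules.Cycle', 'Herbal & Product', 'Welcome # 123'];
const invalidNames = ['&Hercules', 'Colgate!()', '.Youtube', '#Incule'];

const validResults = validNames.map((name) => orgNamePattern.test(name));
const invalidResults = invalidNames.map((name) => orgNamePattern.test(name));

console.log(validResults);   // [ true, true, true ]
console.log(invalidResults); // [ false, false, false, false ]
```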
^[A-Z]([a-zA-Z0-9._,]|[- #.#&!])*$
This would allow certain special characters after the first word
Sample Names:
1. Hoyt, Gilman & Esp
2. Y 105 Kgfy
