Regex to select text without white spaces with restrictions

I need to select text between % signs, but only where there is no whitespace between the two % signs.
This should match:
%link%
This shouldn't:
%my link%
This easy regex would do the trick:
/%\S*%/g
But there is a catch: I can set a prefix (%) and a suffix (%), but the regex must contain this between them: (.+?)
(it's a third-party script).
So this is the regex I need to adjust:
/%(.+?)%/
Because of "(.+?)" I need a workaround, any idea?
UPDATE:
All of these are true for a perfect regex:
regex = /%(.+?)%/g // default regex which allows spaces so it's not good
regex.test('%link%')
regex.test('%my link%') === false
regex.toString().includes('(.+?)')

You can use
var some_hardcoded_value = ".+?";
var regex = new RegExp("%(?=[^\\s%]+%)(" + some_hardcoded_value + ")%", "g");
See the regex demo.
Details:
% - a % char
(?=[^\s%]+%) - a positive lookahead that requires one or more chars other than whitespace and %, followed by a % char, immediately to the right of the current location
(.+?) - Group 1: any one or more chars other than line break chars, as few as possible
% - a % char.
See a JavaScript demo:
const some_hardcoded_value = ".+?";
const regex = new RegExp("%(?=[^\\s%]+%)(" + some_hardcoded_value + ")%", "g");
const str = "%link% This shouldn't %my link% %%link,,,,,%%%%";
console.log(Array.from(str.matchAll(regex), x => x[1]));
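The three conditions from the question's update can be checked directly against this pattern. The sketch below assumes the third-party script simply concatenates prefix + "(.+?)" + suffix; buildPercentRegex is a hypothetical helper, not part of that script.

```javascript
// Hypothetical helper: assumes the script builds prefix + "(.+?)" + suffix.
function buildPercentRegex(flags = "") {
  const prefix = "%(?=[^\\s%]+%)"; // % plus the no-whitespace lookahead
  const fixedFragment = "(.+?)";   // the part that cannot be changed
  const suffix = "%";
  return new RegExp(prefix + fixedFragment + suffix, flags);
}

// A fresh regex per check avoids the lastIndex state of a reused /g regex.
const percentChecks = {
  matchesLink: buildPercentRegex().test("%link%"),
  rejectsSpaced: buildPercentRegex().test("%my link%") === false,
  keepsFragment: buildPercentRegex("g").toString().includes("(.+?)"),
};
console.log(percentChecks);
```

Building a new regex for each test call sidesteps a common surprise: a regex with the g flag remembers lastIndex between test calls, so reusing one instance can make consecutive checks disagree.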

Related

Regex to get substring between first and last occurrence

Assume there is the string
just/the/path/to/file.txt
I need to get the part between the first and the last slash: the/path/to
I came up with this regex: /^(.*?).([^\/]*)$/, but this gives me everything in front of the last slash.

Don't use [^/]*, since that won't match anything that contains a slash. Just use .* to match anything:
/(.*?)\/(.*)\/(.*)/
Group 1 = just, Group 2 = the/path/to and Group 3 = file.txt.
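As a quick check of the three groups, here is a minimal sketch run against the path from the question (the variable names are only for illustration):

```javascript
const pathParts = "just/the/path/to/file.txt".match(/(.*?)\/(.*)\/(.*)/);

// pathParts[0] is the whole match; the capture groups start at index 1.
const [, firstSegment, middlePart, lastSegment] = pathParts;
console.log(firstSegment); // just
console.log(middlePart);   // the/path/to
console.log(lastSegment);  // file.txt
```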

The regex should be \/(.*)\/. See the demo below:
const regex = /\/(.*)\//;
const str = `just/the/path/to/file.txt`;
let m;
if ((m = regex.exec(str)) !== null) {
console.log(m[1]);
}

This regular expression will do the trick:
[^\/]*\/(.*)\/[^\/]*
const str = "/the/path/to/the/peace";
console.log(str.replace(/[^\/]*\/(.*)\/[^\/]*/, "$1"));
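One thing to keep in mind with the replace approach: when the input has fewer than two slashes the pattern does not match, and replace returns the string unchanged. A small sketch (stripEnds is a hypothetical name used only here):

```javascript
// Hypothetical wrapper around the replace call shown above.
const stripEnds = (path) => path.replace(/[^\/]*\/(.*)\/[^\/]*/, "$1");

console.log(stripEnds("/the/path/to/the/peace"));    // the/path/to/the
console.log(stripEnds("just/the/path/to/file.txt")); // the/path/to
console.log(stripEnds("a/b"));                       // a/b (no match, unchanged)
```

If an unchanged result would be ambiguous for your data, matching with match or exec and checking for null may be the safer choice.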

If you only want to match consecutive parts separated by a single / (and no //):
^[^/]*\/((?:[^\/]+\/)*[^\/]+)\/[^\/]*$
^ Start of string
[^/]*\/ Negated character class, optionally match any char except / and then match the first /
( Capture group 1
(?:[^\/]+\/)* Optionally repeat matching 1+ times any char except / followed by matching the /
[^\/]+ Match 1+ times any char except /
) Close group 1
\/[^\/]* Match the last / followed by optionally matching any char except /
$ End of string
Regex demo
const regex = /^[^/]*\/((?:[^\/]+\/)*[^\/]+)\/[^\/]*$/;
[
  "just/the/path/to/file.txt",
  "just/the/path",
  "/just/",
  "just/the/path/to/",
  "just/the//path/test",
  "just//",
].forEach(str => {
  const m = str.match(regex);
  if (m) {
    console.log(m[1]);
  }
});

Regex to extract two numbers with spaces from string

I have a problem with a simple regex. I have example strings like:
Something1\sth2\n649 sth\n670 sth x
Sth1\n\something2\n42 036 sth\n42 896 sth y
I want to extract these numbers from the strings. So from the first example I need two groups: 649 and 670. From the second example: 42 036 and 42 896. Then I will remove the space.
Currently I have something like this:
\d+ ?\d+
But it is not a good solution.

You can use
\n\d+(?: \d+)?
\n - matches a newline
\d+ - matches one or more digits (0-9)
(?: \d+)? - matches a space followed by one or more digits (the ? makes the group optional)
let strs = ["Something1\sth2\n649 sth\n670 sth x","Sth1\n\something2\n42 036 sth\n42 896 sth y"]
let extractNumbers = str => {
  return str.match(/\n\d+(?: \d+)?/g).map(m => m.replace(/\s+/g,''))
}
strs.forEach(str=> console.log(extractNumbers(str)))
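Note that match returns null when nothing is found, so the snippet above throws on input without numbers. A minimal variant, assuming real newline characters in the input and using matchAll with a capture group instead (extractNumbersSafe is a hypothetical name):

```javascript
// Hypothetical variant: returns [] instead of throwing when nothing matches.
const extractNumbersSafe = (str) =>
  Array.from(
    str.matchAll(/\n(\d+(?: \d+)?)/g), // group 1 holds the number without the newline
    (m) => m[1].replace(/ /g, "")      // drop the space inside e.g. "42 036"
  );

console.log(extractNumbersSafe("Something1sth2\n649 sth\n670 sth x"));
console.log(extractNumbersSafe("Sth1\nsomething2\n42 036 sth\n42 896 sth y"));
console.log(extractNumbersSafe("no numbers here"));
```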

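If you need to remove the spaces, the easiest way is to remove them first and then pick out the numbers, using two regexes. This assumes the \n in your strings is a literal backslash followed by n; real newlines would be removed by the first step.
str.replace(/\s+/g, '').match(/\\n(\d+)/g)
First you remove the whitespace using the \s token with a + quantifier and the g flag (without g, only the first run of whitespace is replaced).
Then you find the numbers using \\n(\d+).
The first part of the regex makes sure we only pick up numbers that directly follow \n; the extra \ escapes the backslash of \n so that it is matched literally.
The second part, (\d+), is the capture group holding the number. Note that match with the g flag returns the whole matches, including the \n prefix, so use matchAll or exec if you want the group on its own.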
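Here is a sketch of that two-step idea, assuming the input really contains the two characters backslash and n (String.raw is used to build such a string) and with the g flag on both regexes so every space and every number is handled:

```javascript
// Assumption: "\n" in the input is a literal backslash + n, not a line break.
const rawInput = String.raw`Sth1\n\something2\n42 036 sth\n42 896 sth y`;

const withoutSpaces = rawInput.replace(/\s+/g, ""); // step 1: drop all whitespace
const rawNumbers = Array.from(
  withoutSpaces.matchAll(/\\n(\d+)/g),              // step 2: digits right after \n
  (m) => m[1]                                       // keep only the capture group
);
console.log(rawNumbers); // [ '42036', '42896' ]
```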

var str1 = "Something1\sth2\n649 sth\n670 sth x";
var str2 = "Sth1\n\something2\n42 036 sth\n42 896 sth y";
var reg = /(?<=\n)(\d+)(?: (\d+))?/g;
var d;
while(d = reg.exec(str1)){
  console.log(d[2] ? d[1]+d[2] : d[1]);
}
console.log("****************************");
while(d = reg.exec(str2)){
  console.log(d[2] ? d[1]+d[2] : d[1]);
}

Finding exact words in text, excluding quoted words

In the JavaScript code below I need to find exact words in a text, excluding the words that are between quotes. This is my attempt; what's wrong with the regex? It should find all the words except word22 and "word3". If I use only \b in the regex, it selects exact words but doesn't exclude the words between quotes.
var text = 'word1, word2, word22, "word3" and word4';
var words = [ 'word1', 'word2', 'word3' , 'word4' ];
words.forEach(function(word){
  var re = new RegExp('\\b^"' + word + '^"\\b', 'i');
  var pos = text.search(re);
  if (pos > -1)
    alert(word + " found in position " + pos);
});

First, we'll use a function to escape the characters of the word, in case some of them have a special meaning in a regexp.
// from https://stackoverflow.com/a/30851002/240443
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
Then, we construct a regular expression as an alternation between individual word regexps. For each word, we assert that it starts with a word boundary, ends with a word boundary, and has an even number of quote characters between its end, and the end of string. (Note that from the end of word3, there is only one quote till the end of string, which is odd.)
let text = 'word1, word2, word22, "word3" and word4';
let words = [ 'word1', 'word2', 'word3' , 'word4' ];
let regexp = new RegExp(words.map(word =>
  '\\b' + regExpEscape(word) + '\\b(?=(?:[^"]*"[^"]*")*[^"]*$)').join('|'), 'g')
text.match(regexp)
// => word1, word2, word4
let m; // declare the match variable instead of leaking a global
while ((m = regexp.exec(text))) {
  console.log(m[0], m.index);
}
// word1 0
// word2 7
// word4 34
EDIT: Actually, we can speed the regexp up a bit if we factor out the surrounding conditions:
let regexp = new RegExp(
'\\b(?:' +
words.map(regExpEscape).join('|') +
')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g')
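Putting the pieces together, a self-contained sketch of the factored-out version might look like this. The escape helper here is a shorter, commonly used variant rather than the one quoted above, and findUnquotedWords is a hypothetical name:

```javascript
// Escapes regex metacharacters so a word is matched literally.
function escapeForRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns each unquoted whole-word match with its index.
function findUnquotedWords(text, words) {
  const pattern = new RegExp(
    "\\b(?:" + words.map(escapeForRegExp).join("|") + ")\\b" +
      '(?=(?:[^"]*"[^"]*")*[^"]*$)', // an even number of quotes must follow
    "g"
  );
  return Array.from(text.matchAll(pattern), (m) => ({ word: m[0], index: m.index }));
}

const unquoted = findUnquotedWords(
  'word1, word2, word22, "word3" and word4',
  ["word1", "word2", "word3", "word4"]
);
console.log(unquoted);
```

The even-quote lookahead assumes quotes are balanced; a stray unmatched quote would flip which words count as quoted.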

Your way of excluding the quote character is wrong: outside a character class, ^ matches the beginning of the string, so ^" means the start of the string followed by a quote. Try this instead:
var re = new RegExp('\\b[^"]' + word + '[^"]\\b', 'i');
Also, this site is great for debugging regexes: https://regexpal.com
Edit: Because \b also matches next to a quotation mark, this needs to be tweaked further. JavaScript didn't support lookbehinds when this was written (they arrived in ES2018), so we have to get a little tricky.
var re = new RegExp('(?:^|[^"\\w])' + word + '(?:$|[^"\\w])','i')
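So what this is doing is saying:
(?: - don't capture this group
^|[^"\w]) - either match the start of the string, or any non-word character (word characters being alphanumerics and underscore) that isn't a quote
word - match your word here
(?: - don't capture this group either
$|[^"\w]) - either match the end of the string, or again any non-word character that isn't a quote
Note that the neighbouring characters are part of the match, so the position reported by search is that of the preceding character unless the word starts the string.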
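In engines that support lookbehind assertions (ES2018 and later), the same idea can be written without consuming the neighbouring characters, so search reports the position of the word itself. A minimal sketch under that assumption, with plain alphanumeric words that need no escaping:

```javascript
const sampleText = 'word1, word2, word22, "word3" and word4';
const sampleWords = ["word1", "word2", "word3", "word4"];

// (?<!["\w]) : not preceded by a quote or word character
// (?!["\w])  : not followed by a quote or word character
const wordPositions = sampleWords.map((w) =>
  sampleText.search(new RegExp('(?<!["\\w])' + w + '(?!["\\w])', "i"))
);
console.log(wordPositions); // [ 0, 7, -1, 34 ]
```

Like the non-capturing version above, this only inspects the characters directly next to the word, so it would not exclude a word in the middle of a longer quoted phrase.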

Replace -84 in string: "my-name-is-dude-84" with '' by regex?

How can I replace -84 in the string my-name-is-dude-84 with '' using a regex?
I mean the last '-' followed by a number.
I tried:
string = 'my-name-is-dude-84';
let regex = /[^\-*][1-9]/;
let specialChar = string.replace(regex, '');
but what I got was my-name-is-dude-
I expect my string to be: my-name-is-dude

You're close, but this is what you need to do (I guess)
string = 'my-name-is-dude-84';
let regex = /-\d+$/;
let specialChar = string.replace(regex, '');
document.write(specialChar);
Your [^\-*] matches any single character other than - and * (the \ only escapes the hyphen), so together with [1-9] the regex matches 84 and leaves the hyphen behind. Also, [1-9] only matches one digit (between 1 and 9). Use \d (all digits), and add a + to make it match one or more. Finally, adding an end-of-string anchor $ makes it match only the hyphen+number at the end of the string.
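To see the difference side by side, here is a small sketch that runs both patterns on the string from the question, plus two extra inputs chosen only for illustration:

```javascript
const slug = "my-name-is-dude-84";

const originalAttempt = slug.replace(/[^\-*][1-9]/, ""); // removes "84", keeps the hyphen
const fixedResult = slug.replace(/-\d+$/, "");           // removes the trailing "-84"

console.log(originalAttempt); // my-name-is-dude-
console.log(fixedResult);     // my-name-is-dude

// Only the trailing hyphen+number is removed; other input is left alone.
console.log("my-name-is-99-dude-84".replace(/-\d+$/, "")); // my-name-is-99-dude
console.log("no-number-here".replace(/-\d+$/, ""));        // no-number-here
```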

You can use this regex (.*?)-\d+$
regex demo
JavaScript demo
string = 'my-name-is-99-dude-84';
let regex = /(.*?)-\d+$/;
let specialChar = string.replace(regex, "$1");
document.write(specialChar);

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: it would be nice if the result came without the #.
Does anyone have a solution for this?

You can use
/#\w+(?:(?: +\w+)*#)?/g
See the demo here
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - one or zero occurrences of:
  (?: +\w+)* - zero or more occurrences of one or more spaces followed by one or more word characters, followed by
  # - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
See another demo
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
  // Get rid of the trailing # (using ES6 arrow functions)
  m = m.map(s => s.replace(/#$/g, ''));
  // ES5 equivalent
  /*m = m.map(function(s) {
    return s.replace(/#$/g, '');
  });*/
  document.body.innerHTML = "<pre>" + JSON.stringify(m, 0, 4) + "</pre>";
}
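The question also asked for results without any # at all. One way to get there, sketched here with the same regex and the string from the question, is to strip a leading and a trailing # from each match:

```javascript
const hashRegex = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
const hashInput = "#The test# rain in #SPAIN stays mainly in the #plain";

// match returns null when nothing is found, hence the fallback to [].
const hashTags = (hashInput.match(hashRegex) || []).map((s) =>
  s.replace(/^#|#$/g, "") // remove the # at either end of the match
);
console.log(hashTags); // [ 'The test', 'SPAIN', 'plain' ]
```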

You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
See demo at regex101 (use with g global flag)
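For reference, a short sketch of this pattern applied to the string from the question:

```javascript
const altRegex = /#(?:\b[\s\S]*?\b#|\w+)/g;
const altInput = "#The test# rain in #SPAIN stays mainly in the #plain";

const altMatches = altInput.match(altRegex);
console.log(altMatches); // [ '#The test#', '#SPAIN', '#plain' ]
```

The first alternative only succeeds when the closing # directly follows a word character, which is why "#SPAIN ... #plain" is not swallowed as one match: the # before plain is preceded by a space.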
