split string with regex lookaround in Js - javascript

I have a tricky problem. I need to split the following string at . followed by a word:
".use(z.string().min(2).max(4)).array(.length())"
But the thing is that I only need to split on .use(...) and .array(...), and the content between the parentheses should be left untouched.
Currently I use a positive lookahead to match on . followed by a word
/(?=\.[\w]+)/
but, obviously, this also splits the string inside the parentheses.
I thought about building a lookaround pattern that checks whether the matches are inside parentheses, but my regex knowledge is not that good, so I don't really know where or how to start.
I'd appreciate any hints in the right direction.

Track how deeply nested in parentheses you are, and only split when you are not inside any parentheses.
const s = ".use(z.string().min(2).max(4)).array(.length())";
let splitIndices = [0];
let depth = 0; // current parenthesis nesting level
[...s].forEach((c, i) => {
  // At the top level, a non-word character may end a ".word" token
  if (!depth && !c.match(/\w/)) {
    // Find where the ".word" that ends right before this character starts
    let p = s.substring(0, i).match(/\.\w+$/)?.index;
    if (p) splitIndices.push(p); // index 0 is already in the list
  }
  if (c === '(') depth++;
  if (c === ')') depth--;
});
console.log(splitIndices.map((e, i, a) => s.substring(e, a[i + 1])));
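The same depth-tracking idea can also be wrapped in a reusable function. This is only a minimal sketch, not the answer's exact code: the name splitTopLevel is made up for illustration, and it assumes that the parentheses in the input are balanced.

```javascript
// Hypothetical helper: split `s` before every ".word" that sits outside all parentheses.
// Assumes balanced parentheses in the input.
function splitTopLevel(s) {
  const parts = [];
  let depth = 0; // current parenthesis nesting level
  let start = 0; // index where the current segment begins
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === "(") depth++;
    else if (c === ")") depth--;
    else if (c === "." && depth === 0 && i > start && /\w/.test(s[i + 1] || "")) {
      // A top-level dot followed by a word character starts a new segment
      parts.push(s.slice(start, i));
      start = i;
    }
  }
  parts.push(s.slice(start));
  return parts;
}

console.log(splitTopLevel(".use(z.string().min(2).max(4)).array(.length())"));
// [ '.use(z.string().min(2).max(4))', '.array(.length())' ]
```

A plain loop with a depth counter sidesteps the problem that JavaScript regexes cannot count nested parentheses.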

Related

How to split a string by one delimiter but having a particular format as described below

I have a string as:
const str = 'My [Link format](https://google.com) demo'
I want the word array to be like:
['My', '[Link format](https://google.com)', 'demo']
What to do in javascript?
I was trying using split() and str.match(). Nothing worked yet.
This is a simple split on whitespace as the delimiter, but we use negative lookaheads to skip spaces that sit inside square brackets [] or round brackets ():
const str = 'My [Link format](https://google.com) demo'
console.log(str.split(/\s+(?![^\[]*\])(?![^\(]*\))/));
We also allow for spaces in the URL portion. A URL is unlikely to contain spaces, but it could still happen.
Try it here: https://jsfiddle.net/m4q6e9x7/
["My", "[Link format](https://google.com)", "demo"]
In the fiddle I've tried to show the two separate negative lookaheads, one for each type of bracket (I've put a space inside the round brackets to prove the concept):
const str = 'My [Link format](http s://google.com) demo'
// ignore spaces between []
console.log(str.split(/\s+(?![^\[]*\])/));
["My", "[Link format](http", "s://google.com)", "demo"]
// ignore spaces between ()
console.log(str.split(/\s+(?![^\(]*\))/));
["My", "[Link", "format](http s://google.com)", "demo"]
So we can easily combine the two criteria because we need both of them to not match.
Because [] and () need to be escaped, it might be easier to see the regex if we modify and test for spaces between braces {}
const str = 'My {Link format}(https://google.com) demo'
console.log(str.split(/\s+(?![^{]*})/));
["My", "{Link format}(https://google.com)", "demo"]
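An alternative to splitting is to match the tokens directly: either a whole [text](url) link or a run of non-space characters. This is a sketch that was not part of the answer above; it assumes there is no ] inside the link text and no ) inside the URL, and the names tokenPattern, input and tokens are illustrative.

```javascript
// Match either a complete [text](url) link or any run of non-whitespace characters.
// The link alternative comes first so that it wins over \S+ at the same position.
const tokenPattern = /\[[^\]]*\]\([^)]*\)|\S+/g;

const input = 'My [Link format](http s://google.com) demo';
const tokens = input.match(tokenPattern);
console.log(tokens);
// [ 'My', '[Link format](http s://google.com)', 'demo' ]
```

Matching tends to be easier to read than a split with lookaheads, because each alternative describes one kind of token.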
Both solutions assume that the string is well formed: basically no space between ']' and '(', no ']' characters inside [...], and similar expectations. You didn't really provide information about what the input string can be other than your concrete example, so the solutions work well for this and very similar cases. The second is very easily modified as needed; the first is easily extended to check whether the string is in fact malformed.
Solution using Regular Expressions
The code below finds everything before the first '[', everything in the '[...](...)' pattern (note: the first ... must not contain ']' and the second must not contain ')', but I assume that would make for incorrect input in the first place), and everything after that.
So
let regex = /(.*)(\[.*\]\(.*\))(.*)/
let res = str.match(regex).slice(1, 4)
gives res as
['My ', '[Link format](https://google.com)', ' demo']
From there, you can trim every entry in this array ('My ' => 'My') for example using a trim function like so:
res.map((val) => val.trim());
See the documentation of the .match() method for what the returned array represents. In short, apart from index 0 (the whole match), it contains the capture groups, meaning the parts of the string that correspond to the parts of the regex surrounded by parentheses.
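To make the layout of that array concrete, here is a small sketch using the same regex and example string. The names linkPattern, matchResult and groups are only illustrative.

```javascript
const linkPattern = /(.*)(\[.*\]\(.*\))(.*)/;
const matchResult = 'My [Link format](https://google.com) demo'.match(linkPattern);

console.log(matchResult[0]); // the whole matched text
console.log(matchResult[1]); // first capture group: 'My '
console.log(matchResult[2]); // second capture group: '[Link format](https://google.com)'
console.log(matchResult[3]); // third capture group: ' demo'

// slice(1) copies the capture groups without modifying the match array
const groups = matchResult.slice(1).map((part) => part.trim());
console.log(groups);
// [ 'My', '[Link format](https://google.com)', 'demo' ]
```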
If you are not familiar with regular expressions (regexes) in JS, or at all, you will easily find many online resources on the topic. After grasping the basics, regex101 is a nice tool to experiment with regexes and explore their capabilities. When using it, you should probably choose the ECMAScript (JavaScript) flavor from the menu on the left.
Equivalent solution without regex
An equivalent solution is to manually find where the first '[' is, as well as where the '[...](...)' pattern ends. Then slice the parts (before '[', the pattern, and after the pattern) out of the string, and probably trim them. So just loop over the characters of the string in search of '[' and then ']', '(', ')'. Note that in this case you can easily decide, at a fine-grained level, what to do if the string has an unexpected or incorrect form.
TODO: I will probably sketch some code when I have time for it
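One possible sketch of the manual approach described above follows. It is an illustration rather than the answerer's code: it uses indexOf instead of a hand-written character loop, the name splitAroundLink is made up, and it only handles the first link in the string.

```javascript
// Hypothetical helper: split around the first [text](url) link without regex.
// Returns null when the expected shape is not found, so the caller can decide what to do.
function splitAroundLink(str) {
  const open = str.indexOf("[");
  if (open === -1) return null;
  const close = str.indexOf("](", open); // end of link text, start of URL
  if (close === -1) return null;
  const end = str.indexOf(")", close); // end of URL
  if (end === -1) return null;
  return [
    str.slice(0, open).trim(),
    str.slice(open, end + 1),
    str.slice(end + 1).trim(),
  ];
}

console.log(splitAroundLink('My [Link format](https://google.com) demo'));
// [ 'My', '[Link format](https://google.com)', 'demo' ]
console.log(splitAroundLink('no link here'));
// null
```

Each early return marks a place where malformed input could instead be reported in more detail.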
Regex is your friend!
const regexMdLinks = /!?\[([^\]]*)\]\(([^\)]+)\)/gm
// Example md file contents
const str = `My [Link format](https://google.com) demo My [Link format2](https://google.com/2) demo2`
let regex_splitted = str.split(regexMdLinks);
let arr = [];
// split() with two capture groups yields items in groups of three:
// 1. Item will be the text (or empty text)
// 2. Item is the link text
// 3. Item is the url
for (let i = 0; i < regex_splitted.length; i++) {
  if (i % 3 == 0) { // Split normal text
    arr.push(...regex_splitted[i].split(" ").filter(word => word));
  } else if (i % 3 == 1) { // Add brackets around link text
    arr.push("[" + regex_splitted[i] + "]");
  } else { // Add parentheses around the url
    arr.push("(" + regex_splitted[i] + ")");
  }
}
console.log(arr)

Removing all special characters except "some" apostrophes

I'm trying to create a function that removes all special characters (including periods) except apostrophes when they are naturally part of a word. The regex pattern I've made is supposed to remove anything that doesn't fit the schema of word either followed by an apostrophe ' and/or another word:
function removeSpecialCharacters(str) {
return str.toLowerCase().replace(/[^a-z?'?a-z ]/g, ``)
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
As you can see from the snippet it works well except for removing the rogue apostrophes.
And if I add something like [\s'\s] or ['] to the pattern it breaks it completely. Why is it doing this and what am I missing here?
Alternate the pattern with '\B, which will match and remove apostrophes which are not followed by a word character, eg ab' or ab'#, while preserving strings like ab'c:
function removeSpecialCharacters(str) {
  return str.toLowerCase().replace(/'\B|[^a-z' ]/g, ``)
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
(you can also remove the duplicated characters from the character set)
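To see what '\B matches in isolation, here is a small sketch. The helper name stripDangling is only for illustration; the inputs are the cases mentioned in the answer plus the all-apostrophe string from the question.

```javascript
// '\B matches an apostrophe that is NOT followed by a word character
// (the next character is a non-word character or the end of the string).
const stripDangling = (s) => s.replace(/'\B/g, "");

for (const sample of ["ab'", "ab'#", "ab'c", "'''"]) {
  console.log(JSON.stringify(sample), "->", JSON.stringify(stripDangling(sample)));
}
// "ab'" -> "ab"
// "ab'#" -> "ab#"
// "ab'c" -> "ab'c"
// "'''" -> ""
```

Note that an apostrophe at the start of a word, as in 'til, is followed by a word character and is therefore kept.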
Not sure what went wrong with yours as I can't see what you attempted. However, I got this to work.
function removeSpecialCharacters(str) {
  str = str.toLowerCase();
  // reduce duplicate apostrophes to single
  str = str.replace(/'+/g,`'`);
  // get rid of wacky chars
  str = str.replace(/[^a-z'\s]/g,'');
  // replace dangling apostrophes
  str = str.replace(/(^|\s)'(\s|$)/g, ``);
  return str;
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
console.log(removeSpecialCharacters(`regex 'til i die`))
Here's one very easy solution. To remove certain characters from a string, you can run a bunch of if-statements inside a while loop. This lets you choose exactly which symbols to remove.
let string = "I'm a string!";
let result = "";
let increment = 0;
while (increment < string.length) {
  // copy every character except the ones to remove
  if (string[increment] != "!") {
    result += string[increment];
  }
  increment += 1;
}
console.log(result);
That's a simple rundown to give you a sense of what you're doing; add one condition per symbol you want to drop.

3 While Loops into a Single Loop?

I have to remove the commas, periods, and hyphens from an HTML text value. I do not want to write all 3 of these while loops, instead I only want one loop (any) to do all of this.
I already tried a while loop with multiple && conditions and an if/else nested inside, but I would always only get the commas removed.
while (beg.indexOf(',') > -1)
{
  beg = beg.replace(',', '');
  document.twocities.begins.value = beg;
}
while (beg.indexOf('-') > -1)
{
  beg = beg.replace('-', '');
  document.twocities.begins.value = beg;
}
while (beg.indexOf('.') > -1)
{
  beg = beg.replace('.', '');
  document.twocities.begins.value = beg;
}
You can do all this without loops by using regex.
Here is an example of removing all those characters using a single regex:
let str = "abc,d-e.fg,hij,1-2,34.56.7890"
str = str.replace(/[,.-]/g, "")
console.log(str)
No loops are necessary for this in the first place.
You can replace characters in a string with String.replace() and you can determine which characters and patterns to replace using regular expressions.
let sampleString = "This, is. a - test - - of, the, code. ";
console.log(sampleString.replace(/[,.-]/g, ""));
A single call to the replace function and using a regular expression suffice:
document.twocities.begins.value = beg = beg.replace(/[,.-]/g, "");
Regular expressions are a pattern matching language. The pattern employed here basically says "every occurrence of one of the characters ., , or -". Note that the slashes / delimit the pattern while the suffix consists of flags controlling the matching process; in this case it is g (global), telling the engine to replace each occurrence (as opposed to only the first one without the flag).
This site provides lots of info about regular expressions, their use in programming and implementations in different programming environments.
There are several online sites for testing regular expressions and seeing what they match (including explanations), e.g. Regex 101.
Even more details ... ;): You may use the .replace function with a string as the first argument (as you did in your code sample). However, only the first occurrence of the string searched for will be replaced - thus you would have to resort to loops. Specs of the .replace function (and of JS in general) can be found here.
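The difference between a string pattern and a regex with the g flag can be seen in a short sketch. The sample values and variable names are made up for illustration.

```javascript
const csv = "a,b,c,d";

// String pattern: only the first comma is removed
const firstOnly = csv.replace(",", "");
console.log(firstOnly); // "ab,c,d"

// Regex with the g flag: every comma is removed
const allCommas = csv.replace(/,/g, "");
console.log(allCommas); // "abcd"

// Character class: commas, periods and hyphens in a single pass
const cleaned = "1-2,34.56".replace(/[,.-]/g, "");
console.log(cleaned); // "123456"
```

Inside a character class the hyphen is placed last so that it is read as a literal character and not as a range.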
Use regex like below.
let example = "This- is a,,., string.,";
console.log(example.replace(/[-.,]+/g, ""));

What is the regular expression to be used for the sequence of strings occurred?

I have a string with a fixed structure. I need a regular expression that picks up only the numbers from it, and it should also report when the structure deviates from the expected format (for example, if I missed a comma, full stop, brace, etc.).
The structure is - {'telugu':['69492','69493','69494'],'kannada':['72224']}
The regular expression I've tried is /\[(.*?)\]/g;
The above expression works fine for picking only the numbers from the given input, but it does not report a missing comma, full stop, brace, etc.
var contentids = "{'telugu':['69492','69493','69494'],'kannada':['72224']}";
var pattern = /\[(.*?)\]/g;
var match;
while ((match = pattern.exec(contentids)) != null) {
  var arrayContentids2 = new Array();
  arrayContentids2 = match[1].split(",");
}
I am fetching only the numbers from the given input, but I also need to validate the input for missing commas, full stops, braces, etc.
To get all the numbers you can use a RegEx like this /\'(\d+)\'|\"(\d+)\"/g. The second part is only for numbers inside " instead of ', so you can remove this if you want.
To check the balance of braces I would use a simple counting loop and move through the input. I don't think a RegEx is the right tool for this job.
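Such a loop could look like the sketch below. It is a stack-based variant of the counting idea, so that it also catches brackets closed in the wrong order; the name isBalanced is made up, and it assumes that brackets never appear inside the quoted values.

```javascript
// Hypothetical helper: returns true when every (, [ and { is closed in the right order.
// Assumes brackets never appear inside the quoted values.
function isBalanced(input) {
  const closers = { ")": "(", "]": "[", "}": "{" };
  const stack = [];
  for (const ch of input) {
    if (ch === "(" || ch === "[" || ch === "{") {
      stack.push(ch); // remember the opener
    } else if (ch in closers) {
      // a closer must match the most recent unclosed opener
      if (stack.pop() !== closers[ch]) return false;
    }
  }
  return stack.length === 0; // nothing may be left open
}

console.log(isBalanced("{'telugu':['69492','69493','69494'],'kannada':['72224']}")); // true
console.log(isBalanced("{'telugu':['69492','69493','69494','kannada':['72224']}")); // false
```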
To search for missing commas you could use the RegExes /([\'\"]\s*[\'\"])/g and /([\[\(\{]\d+)/g, which find the two errors in
{'telugu':['69492','69493','69494'],'kannada':[72224''72224']}
Hope this will help you

search for double-braced text or double parentheses using regular expression

I need to select the values that are in a string between double parentheses and double braces, while still allowing single braces and parentheses inside.
I used the following expressions, but the one for double braces breaks if there is a single brace inside the string. It should only stop at a double brace, but I do not know how to write the regular expression:
/{{([^}]*)}}/g
and
/\({2}([^)]*)\){2}/g
I tried adding double-braced here, but it does not work:
/{{([^}}]*)}}/g
Because you want to permit single braces inside, you shouldn't use a negative character set - instead, start at the left delimiter and lazy-repeat any character until you get to the right delimiter. For example:
/{{(.*?)}}/
const pattern = /{{(.*?)}}/g;
const str = 'foo{{bar}} foo{{baz}} foo{{with}bracket}}';
console.log(str.match(pattern));
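With the g flag, match() returns the whole matches including the delimiters. If only the text between the braces is wanted, the capture group can be read with matchAll(). This is a small sketch; the names doubleBraces, sample and inner are illustrative.

```javascript
const doubleBraces = /{{(.*?)}}/g;
const sample = 'foo{{bar}} foo{{baz}} foo{{with}bracket}}';

// match() with the g flag returns whole matches, delimiters included
const whole = sample.match(doubleBraces);
console.log(whole);
// [ '{{bar}}', '{{baz}}', '{{with}bracket}}' ]

// matchAll() exposes the capture group, i.e. the text between the delimiters
const inner = [...sample.matchAll(doubleBraces)].map((m) => m[1]);
console.log(inner);
// [ 'bar', 'baz', 'with}bracket' ]
```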
I think you're missing the backslash in your parentheses.
Maybe something like /\{\{([^)]+)\}\}/ would work
Example:
console.log("{{TEST}}".match(/\{\{([^)]+)\}\}/)[1]);
console.log("{{TE{{}}ST}}".match(/\{\{([^)]+)\}\}/)[1]);
Hope that helps
What @certainperformance said is correct.
I just want to add an expression for both parentheses and braces, as you asked for in the question.
You can try this for both parentheses and braces:
((?:{{)(.*?}}))|((?:\(\()(.*?\)\)))
Demo
const regex = /((?:{{)(.*?}}))|((?:\(\()(.*?\)\)))/gm;
const str = `foo{{bar}} foo{{baz}} foo{{with}bracket}}
foo((hello))
test{{test))
{{test))
{{test}}
((test))
test))
`;
let m;
let op = str.match(regex);
console.log(op);
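If only the inner values are needed, a slightly simpler pattern with one capture group per delimiter pair may be enough. This is a sketch with made-up names (bracesOrParens, mixedSample, values), not a replacement for the expression above.

```javascript
// One alternative per delimiter pair; only one of the two groups is set for each match.
const bracesOrParens = /{{(.*?)}}|\(\((.*?)\)\)/g;
const mixedSample = 'foo{{bar}} foo((hello)) {{a}b}} ((c)d))';

const values = [...mixedSample.matchAll(bracesOrParens)].map((m) =>
  m[1] !== undefined ? m[1] : m[2]
);
console.log(values);
// [ 'bar', 'hello', 'a}b', 'c)d' ]
```

Note that a mismatched opener such as {{test)) is not rejected: the lazy group simply keeps going until the next }} on the same line, so input with mixed delimiters may need an extra check.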
