Split string into array between two characters?


I'm trying to split a string in a somewhat irregular way. I tried to do something with regular expressions, but I don't know regex well enough to come up with a solution.
Basically I have a string that looks like this:
var links = "<a>4</a><b><c><d><e><f>";
And I want to use the .split() JS method to receive the parts in an array like so:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
So obviously I need to split on the >< characters, but if I do this:
links.split("><");
the string is split where I want, but I lose the > and < signs. Example:
["<a>4</a", "b", "c", "d", "e", "f>"]
This is not a good solution for me.
So my question is: is it possible to come up with a solution, using regex or something else, that gives the array I described?

A quick and clean method would be to use match instead of split:
var matches = links.match(/<.+?>(?=<|$)/g)
For your string it gives
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Be careful: using regular expressions on a non-regular language like HTML is dangerous. It's often OK for simple things, but it's also limited and you may encounter surprises, for example if you apply it to nested elements.

A quick-and-dirty method would be to add a delimiter between the > and <, with a regex replace, and then to split on that delimiter:
var links = "<a>4</a><b><c><d><e><f>";
links.replace(/></g, '>|<').split('|');
Result:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Just make sure the delimiter you choose doesn't occur in the string itself.
(You can use longer strings as delimiters, like: "|-|".)
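If a temporary delimiter feels fragile, the same split can be sketched with zero-width assertions, so no characters are consumed at all. This assumes an engine with lookbehind support (ES2018 or later); the helper name splitTags is made up for illustration.

```javascript
// Hypothetical helper: split at every position where a ">" is
// immediately followed by a "<".
// (?<=>) is a lookbehind and (?=<) is a lookahead; both match a
// position between characters, so nothing is removed from the pieces.
function splitTags(str) {
  return str.split(/(?<=>)(?=<)/);
}

const links = "<a>4</a><b><c><d><e><f>";
const parts = splitTags(links);
console.log(parts);
```

Like the other approaches here, this only looks at adjacent >< pairs, so it carries the same caveats about nested or more complex HTML.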

Related

How do you sanitize a string to pass a regex?

I have a regex validation I need my string to pass.
/^[0-9a-zA-Z-]+$/
I want to create a function that sanitizes the string for it to pass the regex.
I thought of doing something like
string.replace(/^[0-9a-zA-Z-]+$/,"");
Except I need to invert the above regex.
I tried to look up how to invert a regex but nothing seems to show up.

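Try string.replace(/\W/g,""). Note that \W keeps underscores and strips hyphens, so the result can differ from what the validation regex allows. Also check this web site; I always use it to test regular expressions, and it has hints at the bottom right.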

Negate the character class by putting ^ inside the []:
const str = `abc*ç%ABC&(/())12345=?`
const newString = str.replace(/[^0-9a-zA-Z-]/g,"");
console.log(newString)
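Wrapped up as a function, the idea could look like the sketch below. The names VALID and sanitize are illustrative, not from the question. One caveat: if every character gets stripped, the result is an empty string, which still fails the validation because of the +.

```javascript
// The validation regex from the question.
const VALID = /^[0-9a-zA-Z-]+$/;

// Hypothetical helper: remove every character the validation
// regex would reject (anything outside 0-9, a-z, A-Z and "-").
function sanitize(input) {
  return String(input).replace(/[^0-9a-zA-Z-]/g, "");
}

const cleaned = sanitize("abc*ç%ABC&(/())12345=?");
console.log(cleaned);             // the characters that survived
console.log(VALID.test(cleaned)); // whether the result now validates
```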

How to search/match regex in HTML content using javascript/jQuery?

I have a regex and I need to extract data from document.body.outerHTML. I tried various methods like .search(), .match() and .exec(), but none of them worked. I need to extract data from the whole document.body.outerHTML. My class is as below.
class executionPrompt {
  constructor(){
    this.regex = /(B[0-9]{2})/
    this.data = [];
  }
  start(){
    var html = jQuery(document.body.outerHTML);
    var matches = jQuery(html).html();
    console.log(matches.match(this.regex));
  }
}
I have the whole document and the regex, but I am not able to get an array of my matching strings. Please help me.

From this answer
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. ...
As @Wiktor Stribiżew commented, I have changed my mind: for now I have asked the server to grab the id and then get its index from the tree.
FYI: to match all occurrences, a g modifier is required, so /B[0-9]{2}/g seems like an acceptable answer. Thumbs up for all.
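As a rough illustration of the g modifier, here is a sketch that collects every match from a string. The sample markup stands in for document.body.outerHTML and the function name findCodes is made up; in a browser you could pass document.body.outerHTML instead.

```javascript
// Hypothetical helper: return every "B" + two digits found in the text.
function findCodes(html) {
  // With the g flag, match() returns all matches as an array,
  // or null when nothing matches, hence the fallback to [].
  return html.match(/B[0-9]{2}/g) || [];
}

// Stand-in for document.body.outerHTML.
const sample = '<div id="B12">first</div><span class="B07 B9">second</span>';
const codes = findCodes(sample);
console.log(codes);
```

Without the g flag, match() stops at the first occurrence and returns that match together with its capture groups, which is easy to mistake for "not working".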

Partial Regexp Match in Javascript

I have a long regex that is generated to match URLs like
/^\/([^\/.?]+)(?:\/([^\/.?]+)(?:\/([^\/.?]+)(?:\.([^\/.?]+))?)?)?$/
Would match:
/foo/bar/1.html
as ['foo', 'bar', '1', 'html']
In Javascript I would like to get the parts that match as the user types the url (like a typeahead). For example if they typed:
/foo
It would tell me that /foo was matched, but the whole regexp hasn't been satisfied. Ruby can return an array with only the matching partial elements like : ['foo', nil, nil, nil] is this possible, or easy to do in Javascript?

@minitech basically gave half the answer: use ? after each group, and then the regex will match even if some groups are missing. Once you can do that, just check the groups of the regex result to see which bits have been matched and which haven't.
For example:
/^\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\/([^\/.?]+)?(?:\.([^\/.?]+))?)?)?$/.exec('/ab/c')
Would return:
["/ab/c", "ab", "c", undefined, undefined]
By checking and seeing that the fourth value returned is undefined, you could then figure out which chunks were/were not entered.
As a side note, if you're going to be working with lots of regexes like this, you can easily lose your sanity just trying to keep track of which group is which. For this reason I strongly recommend using "named group" regular expressions. These are otherwise normal regular expressions that you can create if you use the XRegExp library (http://xregexp.com/), like so:
var pattern = XRegExp('^/(?<fooPart>[^/.?]+)?(?:/(?<barPart>[^/.?]+)?(?:/([^/.?]+)?(?:\\.([^/.?]+))?)?)?$')
var result = XRegExp.exec('/ab/c', pattern)
var fooPart = result.fooPart // some newer XRegExp versions expose this as result.groups.fooPart
That library also has other handy features like comments that can similarly help keep regular expressions under control. If you're only using this one regex it's probably overkill, but if you're doing lots of JS regexp work I can't recommend that library enough.
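For what it's worth, engines that support ES2018 accept named groups natively, so the same idea can be sketched without a library. The group names below (fooPart, barPart, idPart, extPart) and the helper partialMatch are illustrative only.

```javascript
// The optional-group pattern from above, with every capture named.
const urlPattern =
  /^\/(?<fooPart>[^\/.?]+)?(?:\/(?<barPart>[^\/.?]+)?(?:\/(?<idPart>[^\/.?]+)?(?:\.(?<extPart>[^\/.?]+))?)?)?$/;

// Hypothetical helper: return the named parts typed so far,
// or null when the input cannot be a prefix-style match at all.
function partialMatch(path) {
  const result = urlPattern.exec(path);
  // Groups that did not participate in the match are undefined.
  return result ? { ...result.groups } : null;
}

const typed = partialMatch("/ab/c");
console.log(typed.fooPart, typed.barPart, typed.idPart, typed.extPart);
```

Checking which properties are still undefined tells you how far the user has typed, much like the nil entries in the Ruby result.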

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get back looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!

Remove the parentheses in the regex if you do not need to process the .org separately (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;

What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
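To connect this back to the question: the trailing ,.org appears when that two-element array is converted to a string, because array elements are joined with commas. A small sketch, using the URL from the question and a corrected pattern (the variable names are illustrative):

```javascript
const url =
  "http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8";

// Capturing group: the result holds the whole match plus the group.
const withGroup = /[\w\d]+(\.org)/i.exec(url);

// Non-capturing group (?:...): the result holds only the whole match.
const withoutGroup = /[\w\d]+(?:\.org)/i.exec(url);

console.log(String(withGroup)); // array joined with commas
console.log(withoutGroup[0]);   // just the matched text
```

Reading index 0 of the result, rather than printing the array itself, avoids the surprise either way.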

You should escape the dot (.) in the (.org) group; otherwise it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$

How to match between characters but not include them in the result

Say I have a string "&something=variable&something_else=var2"
I want to match between &something= and &, so I'll write a regular expression that looks like:
/(&something=).*?(&)/
And the result of .match() will be an array:
["&something=variable&", "&something=", "&"]
I've always solved this by just replacing the start and end elements manually but is there a way to not include them in the match results at all?

You're using the wrong capturing groups. You should be using this:
/&something=(.*?)&/
This means that instead of capturing the stuff you don't want (the delimiters), you capture what you do want (the data).

You can't avoid them showing up in your match results at all, but you can change how they show up and make it more useful for you.
If you change your match pattern to /&something=(.+?)&/ then using your test string of "&something=variable&something_else=var2" the match result array is ["&something=variable&", "variable"]
The first element is always the entire match, but the second one will be the portion captured by the parentheses, which is generally much more useful.
I hope this helps.

If you are trying to get variable out of the string, using replace with backreferences will get you what you want:
"&something=variable&something_else=var2".replace(/^.*&something=(.*?)&.*$/, '$1')
gives you
"variable"
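Putting the capture-group approach into a reusable form might look like the sketch below. The function name getParam is made up, and the (?:&|$) ending is an addition not found in the answers above: it lets the last parameter match even though no & follows it.

```javascript
// Hypothetical helper: return the value between "&name=" and the
// next "&" (or the end of the string), or null when name is absent.
function getParam(query, name) {
  // Escape regex metacharacters so the name is matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = query.match(new RegExp("&" + escaped + "=(.*?)(?:&|$)"));
  // m[0] is the whole match; m[1] is the captured value.
  return m ? m[1] : null;
}

const query = "&something=variable&something_else=var2";
console.log(getParam(query, "something"));
console.log(getParam(query, "something_else"));
```

The null check matters because match() returns null when nothing is found, and indexing into null would throw.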
