JS split advanced (i think)

Can anyone help me?
I want to split a hex string on "0000", but only where that "0000" is followed by something other than "00". I have been trying split(), but any solution will do.
EDIT: Explaining and correcting a few things :P
This is an example of hexstring I'm using.
http://pastebin.com/u68bG6PP (It is text encoded in Shift-JIS, with some peculiarities; example below.)
"82824f4f00000000828250500000000082825151000000008282525200000000828253530000000082825454000000008282555500000000"
Here "0000" marks the end of a text line, so the string should be split at the last "0000" before the next line (which never begins with "00").
Basically, the excerpt above needs to become:
82824f4f0000
828250500000
828251510000
828252520000
828253530000
828254540000
828255550000
That's it; I hope the explanation is clearer now.
As an extra question: I have never really worked with Shift_JIS, so any way to convert it to Unicode for display, or simply to display it as SJIS, is welcome.

You can use a regex with negative lookahead:
yourHex.split(/0000(?!00)/g)
This is an explicit translation of your problem description. However it might not necessarily be what you want, because it yields (maybe) surprising results:
"10000001".split(/0000(?!00)/g)
// => ["10", "01"]
If you want the four zeroes to not be preceded by another zero, you might have to use another technique, since JS regexes did not support lookbehind before ES2018.
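For the sample string in the question, each line ends in a run of zeroes and only the last four should be consumed. Here is a minimal sketch, assuming the separator is always the final "0000" of a zero run and that the next line never starts with a "0" digit (the question only guarantees that it never starts with "00"). splitHexLines is a hypothetical helper name, not something from the question:

```javascript
// Hypothetical helper: split on the last "0000" of each run of zeroes.
function splitHexLines(hex) {
  // (?!0) rejects any "0000" that still has a zero after it, so only the
  // final four zeroes of a run are consumed as the separator.
  return hex.split(/0000(?!0)/).filter(function (line) {
    return line.length > 0; // drop the empty piece after the last separator
  });
}

var sample = "82824f4f00000000828250500000000082825151000000008282525200000000";
console.log(splitHexLines(sample));
// => ["82824f4f0000", "828250500000", "828251510000", "828252520000"]
```

If a line could begin with a byte such as 0a, the match would no longer fall on a byte boundary, and walking the string two hex digits at a time would probably be safer.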

If I understand correctly, you want to split on "0000" but leave "000000" alone?
So, for example, "00001111000000222200003333" would result in
["", "11110000002222", "3333"]?
"00001111000000222200003333".replace(/000000/g, "token")
    .split("0000")
    .map(function (el) {
        return el.replace(/token/g, "000000");
    });
// ["", "11110000002222", "3333"]
A negative lookahead yields a different result:
"00001111000000222200003333".split(/0000(?!00)/)
// ["", "11110", "02222", "3333"]
I'm not sure exactly what you are looking for, though.

I think your question needs more explanation and maybe some code to back it up. I think this is what you are asking for:
yourHex.split(/0000/);

Related

Regex Delimiter in PapaParse

I wish to ask, is it possible to use regexes as delimiters in PapaParse? Something like:
Papa.parse(string,
{
delimiter:regex
}
);
I am trying to match a specific pattern of CSVs like so:
/([A-Za-z]{2}[0-9]+,?)/g
i.e. I want exactly 2 letters, any amount of numbers, and a comma (or not, in the case of the last element).
Since string.split has a wonderful habit of returning anything but null when nothing matches a regex pattern, I was hoping that my answer would lie in PapaParse. If this is not possible, then I will do something more long-winded, but hopefully I can be laz-... efficient this time. :)
Trying to do the following:
Papa.parse('ACB5,dsa',{delimiter:'[A-Za-z]{2}[0-9]+,?'});
Results in
["ACB5","dsa"]
Thank you for your time.
edit
Trying out the regex on regexr.com shows that it works with values like
AB544444444444,BC5,
aa5,
At this point, I realize that this was actually a dozy question, considering how a delimiter is the thing that separates what you want to break up.
I'm writing the longer winded version now, so I'll stick that up soon

As Matt (and common sense) rightly say, yes, The delimiter is just ye olde comma. I was looking for a way to separate the results based on a regex, which past me had thought would have some similarity to how string.split works. This is the snippet I was trying to shrink down.
var result = null;
var regex = /([A-Za-z]{2}[0-9]+,?)/g; // any two letters, one or more digits, and an optional comma
$(document).ready(function () {
    $('#inputGraphText').on('input', function (e) { // on text input...
        result = $(this).val().split(','); // split using the complex delimiter ','. Also adds a few "" elements to the array for some reason.
        var tidy = [];
        result.forEach(function (elem) {
            if (elem.search(regex) > -1) {
                tidy.push(elem.replace(/[, ]/g, '')); // tidy up the result (a regex literal, not a quoted string)
            }
        });
        $('#first').html(tidy); // place to print out the tidied results
    });
});
Obviously , this is not terribly schnazzy (and completely misses out on using PapaParse), but it is what I originally set out to do.
Any better alternatives will take pride of place, but for now, this is fine.
My apologies for the confusion.
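For what it's worth, the same filtering can be sketched without jQuery or PapaParse. This is only an illustration under the assumption that a field counts as valid when it is exactly two letters followed by one or more digits. extractCodes is a made-up name:

```javascript
// Hypothetical helper: keep only the comma-separated fields that look like
// two letters followed by one or more digits.
function extractCodes(text) {
  var codePattern = /^[A-Za-z]{2}[0-9]+$/; // anchored, so "ACB5" is rejected
  return text
    .split(',')
    .map(function (field) { return field.trim(); })
    .filter(function (field) { return codePattern.test(field); });
}

console.log(extractCodes('AB544444444444,BC5,aa5,'));
// => ["AB544444444444", "BC5", "aa5"]
console.log(extractCodes('ACB5,dsa'));
// => []
```

Anchoring the pattern with ^ and $ also gets rid of the empty "" elements that split(',') produces for trailing commas.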

How to split a string with the help of JavaScript's regex?

I need to split a string into one or more substrings, each of which contains exactly two dots. For example, if the string is "foo.boo.coo.too", what would be the regex to get the following array: ["foo.boo.coo", "boo.coo.too"]? I hope someone can answer this question; I would really admire you, as I've been programming for several years and am still not used to regular expressions well enough to solve this particular problem by myself. Thank you very much in advance. Let me know your identity so that I can credit you as a contributor to the program I am creating.

Regex is not the best solution for this problem; a similar problem was discussed here: split-a-sting-every-3-characters-from-back-javascript
A good JavaScript solution would be a function like this:
function splitter(text) {
    var parts = text.split(".");
    var times = parts.length - 2;
    var values = [];
    for (var index = 0; index < times; index++) {
        values.push(parts.slice(index, index + 3).join("."));
    }
    return values;
}
splitter("too.boo.coo.too")
//=> Result tested on Chrome 25+ ["too.boo.coo", "boo.coo.too"]
I hope this helps.
If you want to use regex, try lookahead; this could help: http://www.regular-expressions.info/lookaround.html

Regex matches are by nature non-overlapping, so a straightforward "all matches" call will not return results that overlap.
So basically you need to find the first match, then start from the next position to find the next match, and so on; something like the technique described in regex matches with intersection in C# (it's not JavaScript, but the idea is the same).
You can use the following regex, for example (note that it relies on lookbehind, which JavaScript engines only support from ES2018 onwards):
(?<=^|\.)((?:[^.]*\.){2}[^.]*?)(?=$|\.)
It ensures that the match starts and ends at a dot or at the beginning/end of the line, contains exactly two dots, and captures the result in the first group. You can replace * with + to require at least one character between dots.
But you need to understand that this approach performs badly for the task you are solving, so another way (like split + for) may be a better solution.
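One way to sketch the "find a match, then restart from the next position" idea in JavaScript without lookbehind is to consume at most one dot per match and capture the actual text inside a lookahead. The function name dottedTriples is hypothetical, and the sketch assumes every segment between dots is non-empty:

```javascript
// Hypothetical helper: collect every overlapping "a.b.c" window of a dotted string.
function dottedTriples(text) {
  // Consume only the start of input or a single dot; the three segments are
  // captured inside a lookahead, so the next search can begin inside them.
  // (?![^.]) requires the third segment to end at a dot or at the end of input.
  var pattern = /(?:^|\.)(?=([^.]+\.[^.]+\.[^.]+)(?![^.]))/g;
  var results = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    results.push(match[1]);
    // An empty match (at the start of input) does not advance lastIndex,
    // so move it forward by hand to avoid looping forever.
    if (match[0] === '') {
      pattern.lastIndex++;
    }
  }
  return results;
}

console.log(dottedTriples("foo.boo.coo.too"));
// => ["foo.boo.coo", "boo.coo.too"]
```

The split + for version is still easier to read; this is mainly useful if the windows have to be found inside a larger regex.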

Javascript String Parsing

I am trying to parse a string in this format
[something](something something) [something](something something)
and I want to break on every space that is not between a set of parentheses.
I tried using string.split with /[^\(].*\s+.*[^\)]/g as the regex, but it doesn't work. Any suggestions appreciated :-)
EDIT: I don't want to post this as an answer, because I want to leave it open to comments but I finally found a solution.
var a = "the>[the](the the) the>[the](the the) the"
var regex = /\s+(?!\w+[\)])/
var b = a.split(regex)
alert(b.join("+++"))

Is your input always this consistent? If it is, it could be as simple as splitting your string on ') ['
If it isn't, is it possible to just take what is between [ and )? Or is there some kind of nesting that is going on?

You are using the wrong tool for the job.
As was alluded to in this famous post, regular expressions cannot parse non-regular languages, and the "balanced parentheses" problem cannot be described by a regular language.
Have you tried writing a parser instead?
EDIT:
It seems that you've finally clarified that nesting is not a requirement. In that case, I'd suggest gnur's solution.

This regex will do exactly what you asked, and nothing more:
'[x](x x) [x](x x)'.split(/ +(?![^\(]*\))/);
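To see why this works: the negative lookahead refuses any run of spaces that can reach a ")" without first passing a "(", which is only true for spaces inside a parenthesised group. A small sketch, assuming parentheses are never nested (the variable name splitOutsideParens is made up for the example):

```javascript
// Split on spaces unless a ")" follows before any "(" does.
var splitOutsideParens = / +(?![^(]*\))/;

console.log('[x](x x) [x](x x)'.split(splitOutsideParens));
// => ["[x](x x)", "[x](x x)"]

console.log('the>[the](the the) the>[the](the the) the'.split(splitOutsideParens));
// => ["the>[the](the the)", "the>[the](the the)", "the"]
```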

Javascript Regular Expressions Lookbehind Failing

I am hoping that this will have a pretty quick and simple answer. I am using regular-expressions.info to help me get the right regular expression to turn URL-encoded, ISO-8859-1 pound sign ("%A3"), into a URL-encoded UTF-8 pound sign ("%C2%A3").
In other words I just want to swap %A3 with %C2%A3, when the %A3 is not already prefixed with %C2.
So I would have thought the following would work:
Regular Expression: (?!(\%C2))\%A3
Replace With: %C2%A3
But it doesn't and I can't figure out why!
I assume my syntax is just slightly wrong, but I can't figure it out! Any ideas?
FYI - I know that the following will work (and have used this as a workaround in the meantime), but really want to understand why the former doesn't work.
Regular Expression: ([^\%C2])\%A3
Replace With: $1%C2%A3
TIA!

Why not just replace ((%C2)?%A3) with %C2%A3, making the prefix an optional part of the match? It means that you're "replacing" text with itself even when it's already right, but I don't foresee a performance issue.

Unfortunately, the (?!) syntax is negative lookahead, not lookbehind. JavaScript did not support negative lookbehind when this was written; the (?<!...) syntax was only added in ES2018, so older engines will reject it.
What you could do is go forward with the replacement anyway, and end up with %C2%C2%A3 strings, but these could easily be converted in a second pass to the desired %C2%A3.
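In an engine that implements ES2018 lookbehind, the original intent can be written directly. This is only a sketch under that assumption. fixPoundSign is a hypothetical name, and lower-case escapes such as %a3 are not handled:

```javascript
// Hypothetical helper; requires an engine with ES2018 lookbehind support.
function fixPoundSign(encoded) {
  // (?<!%C2) only lets %A3 match when it is not already prefixed with %C2.
  return encoded.replace(/(?<!%C2)%A3/g, '%C2%A3');
}

console.log(fixPoundSign('price=%A310&old=%C2%A35&x=%A3%A3'));
// => "price=%C2%A310&old=%C2%A35&x=%C2%A3%C2%A3"
```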

You could replace
(^.?.?|(?!%C2)...)%A3
with
$1%C2%A3

I would suggest you use the functional form of Javascript String.replace (see the section "Specifying a function as a parameter"). This lets you put arbitrary logic, including state if necessary, into a regexp-matching session. For your case, I'd use a simpler regexp that matches a superset of what you want, then in the function call you can test whether it meets your exact criteria, and if it doesn't then just return the matched string as is.
The only problem with this approach is that if you have overlapping potential matches, you have the possibility of missing the second match, since there's no way to return a value to tell the replace() method that it isn't really a match after all.
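A rough sketch of the functional form for this particular case, assuming the superset pattern is "%A3 with an optional %C2 in front" (fixPoundSignWithCallback is a made-up name):

```javascript
// Hypothetical helper: match a superset, then decide per match in the callback.
function fixPoundSignWithCallback(encoded) {
  return encoded.replace(/(%C2)?%A3/g, function (match, prefix) {
    // prefix is undefined when the optional group did not take part in the match.
    return prefix ? match : '%C2%A3';
  });
}

console.log(fixPoundSignWithCallback('price=%A310&old=%C2%A35&x=%A3%A3'));
// => "price=%C2%A310&old=%C2%A35&x=%C2%A3%C2%A3"
```

Because the optional prefix is part of the match itself, the overlapping-match problem does not come up here; it would with a pattern that has to consume unrelated characters before %A3.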

Processing Javascript RegEx submatches

I am trying to write a JavaScript regex to replace user-entered tags with real HTML tags, so [b] will become <b> and so forth. The regex I am using looks like this:
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
with the following JavaScript
s.replace(exptags,"<$1>$2</$1>");
This works fine for tags that are not nested, for example:
[b]hello[/b] [u]world[/u]
but if the tags are nested inside each other it will only match the outer tags, for example
[b]foo [u]to the[/u] bar[/b]
This will only match the b tags. How can I fix this? Should I just loop until the string stops changing? I have a feeling that the ((.){1,}?) pattern is wrong as well.
Thanks

The easiest solution would be to replace all the tags, whether they are closed or not, and let .innerHTML work out whether they are matched; it will be much more resilient that way.
var tagreg = /\[(\/?)(b|u|i|s|center|code)]/ig
div.innerHTML="[b][i]helloworld[/b]".replace(tagreg, "<$1$2>") //no closing i
//div.innerHTML=="<b><i>helloworld</i></b>"

AFAIK you can't express recursion with regular expressions.
You can however do that with .NET's System.Text.RegularExpressions using balanced matching. See more here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
If you're using .NET you can probably implement what you need with a callback.
If not, you may have to roll your own little javascript parser.
Then again, if you can afford to hit the server you can use the full parser. :)
What do you need this for, anyway? If it is for anything other than a preview I highly recommend doing the processing server-side.

You could just repeatedly apply the regexp until it no longer matches. That would do odd things like "[b][b]foo[/b][/b]" => "<b>[b]foo</b>[/b]" => "<b><b>foo</b></b>", but as far as I can see the end result will still be a sensible string with matching (though not necessarily properly nested) tags.
Or if you want to do it 'right', just write a simple recursive descent parser. Though people might expect "[b]foo[u]bar[/b]baz[/u]" to work, which is tricky to recognise with a parser.
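As an illustration of the "do it right" route, here is a small sketch that uses a stack of open tags rather than a true recursive descent parser. The name bbcodeToHtml is made up, unmatched tags are left as literal text by assumption, and HTML escaping of the user's text is deliberately left out:

```javascript
// Hypothetical helper: convert properly nested BBCode-style tags to HTML.
function bbcodeToHtml(input) {
  var tagPattern = /\[(\/?)(b|u|i|s|center|code)\]/gi;
  // Each frame collects the output produced inside one open tag.
  var stack = [{ tag: null, open: '', parts: [] }]; // root frame
  var last = 0;
  var match;
  while ((match = tagPattern.exec(input)) !== null) {
    var top = stack[stack.length - 1];
    top.parts.push(input.slice(last, match.index)); // text before this tag
    last = tagPattern.lastIndex;
    var tag = match[2].toLowerCase();
    if (match[1] === '') {
      // Opening tag: start a new frame.
      stack.push({ tag: tag, open: match[0], parts: [] });
    } else if (top.tag === tag) {
      // Closing tag that matches the innermost open tag: emit HTML.
      stack.pop();
      stack[stack.length - 1].parts.push(
        '<' + tag + '>' + top.parts.join('') + '</' + tag + '>');
    } else {
      // Stray or mis-nested closing tag: keep it as literal text.
      top.parts.push(match[0]);
    }
  }
  stack[stack.length - 1].parts.push(input.slice(last));
  // Tags that were never closed are put back as literal text.
  while (stack.length > 1) {
    var frame = stack.pop();
    stack[stack.length - 1].parts.push(frame.open + frame.parts.join(''));
  }
  return stack[0].parts.join('');
}

console.log(bbcodeToHtml('[b]foo [u]to the[/u] bar[/b]'));
// => "<b>foo <u>to the</u> bar</b>"
```

Mis-nested input such as "[b]foo[u]bar[/b]baz[/u]" is not repaired by this sketch; only the pairs that nest correctly are converted.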

The reason the nested block doesn't get replaced is because the match, for [b], places the position after [/b]. Thus, everything that ((.){1,}?) matches is then ignored.
It is possible to write a recursive parser in server-side -- Perl uses qr// and Ruby probably has something similar.
Though, you don't necessarily need true recursion. You can use a relatively simple loop to handle the string equivalently:
var s = '[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]';
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;
while (s.match(exptags)) {
s = s.replace(exptags, "<$1>$2</$1>");
}
document.writeln('<div>' + s + '</div>'); // after
In this case, it'll make 2 passes:
0: [b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]
1: <b>hello</b> <u>world</u> <b>foo [u]to the[/u] bar</b>
2: <b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>
Also, a few suggestions for cleaning up the RegEx:
var exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/(\1)\]/ig;
{1} is assumed when no other count specifiers exist
{1,} can be shortened to +

Agree with Richard Szalay, but his regex didn't get quoted right:
var exptags = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;
is cleaner. Note that I also change .+? to .*. There are two problems with .+?:
you won't match [u][/u], since there isn't at least one character between them (+)
a non-greedy match won't deal as nicely with the same tag nested inside itself (?)
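The trade-off between the two quantifiers can be checked with a quick sketch. The inputs below are made-up examples, and each result shows a single replace pass only. Greedy matching handles the same tag nested inside itself, but it also swallows everything between two sibling tags of the same kind:

```javascript
var lazy = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/ig;
var greedy = /\[(b|u|i|s|center|code)\](.*)\[\/\1\]/ig;

var siblings = '[b]one[/b] and [b]two[/b]';
var nested = '[b][b]foo[/b][/b]';

console.log(siblings.replace(lazy, '<$1>$2</$1>'));   // => "<b>one</b> and <b>two</b>"
console.log(siblings.replace(greedy, '<$1>$2</$1>')); // => "<b>one[/b] and [b]two</b>"
console.log(nested.replace(lazy, '<$1>$2</$1>'));     // => "<b>[b]foo</b>[/b]"
console.log(nested.replace(greedy, '<$1>$2</$1>'));   // => "<b>[b]foo[/b]</b>"
```

So which version is "nicer" depends on whether sibling tags or self-nested tags are more likely in the input.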

Yes, you will have to loop. Alternatively since your tags looks so much like HTML ones you could replace [b] for <b> and [/b] for </b> separately. (.){1,}? is the same as (.*?) - that is, any symbols, least possible sequence length.
Updated: Thanks to MrP, (.){1,}? is (.)+?, my bad.

How about:
var tagreg = /\[(\/?)(b|u|i|s|center|code)\]/gi; // (\/?) only allows an optional slash, not any character
"[b][i]helloworld[/i][/b]".replace(tagreg, "<$1$2>");
"[b]helloworld[/b]".replace(tagreg, "<$1$2>");
For me the above produces:
<b><i>helloworld</i></b>
<b>helloworld</b>
This appears to do what you want, and has the advantage of needing only a single pass.
Disclaimer: I don't code often in JS, so if I made any mistakes please feel free to point them out :-)

You are right about the inner pattern being troublesome.
((.){1,}?)
That is doing a captured match at least once and then the whole thing is captured. Every character inside your tag will be captured as a group.
You are also capturing your closing element name when you don't need it, and you are using {1} when that is implied. Below is a cleaned-up version:
/\[(b|u|i|s|center|code)](.+?)\[\/\1]/ig
Not sure about the other problem.
