JavaScript regex to select every non-alphanumeric character AND whitespace?

I'm new to JS, tried using -
/[0-9a-z]+$/gi
/[^0-9a-z]+$/gi
neither worked. Can anyone tell me where I am going wrong?

Replace
var sentence_split = arr.split(/[0-9a-z]+$/gi);
with
var sentence_split = arr.split(/[^0-9a-z]+/gi);
... if you prefer to go this way.
Explanation: the original regex was anchored (with $) to the end of the string, and it split on the words themselves rather than on the symbols separating them.
Still, there's more than one way to do it. I'd probably just go with:
var words = sentence.match(/(\w+)/g);
... capturing sequences of word characters (note that \w also matches the underscore) instead of splitting the phrase by whatever separates them.
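To see the difference between the two approaches, here is a small sketch on a made-up sentence (the input string is my own assumption, not from the question). Splitting leaves empty strings wherever the text starts or ends with a separator, while match returns only the words:

```javascript
// Hypothetical input, chosen to start and end with punctuation.
var sentence = "...Hello, brave new world!";

// Splitting on runs of non-alphanumeric characters.
var bySplit = sentence.split(/[^0-9a-z]+/gi);

// Matching runs of word characters instead.
var byMatch = sentence.match(/\w+/g);

console.log(bySplit); // [ '', 'Hello', 'brave', 'new', 'world', '' ]
console.log(byMatch); // [ 'Hello', 'brave', 'new', 'world' ]
```

Keep in mind that match returns null rather than an empty array when nothing matches, so guard for that if the input may contain no words at all.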
UPDATE: And one last thing. I felt a bit... uneasy about wasting a sort essentially just to get the max. I don't know if you share these thoughts; still, here's how I would update the searching code:
var longest;
words.forEach(function(e) {
if (! longest || longest.length < e.length) {
longest = e;
}
});
It's forEach, because I'm a bit lazy and imagine having a luxury of NOT working with IE8-; still, it's quite easy to rewrite this into a regular for... array-walking routine.
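For completeness, a minimal sketch of the plain for-loop version, which should also run in old browsers without forEach. The sample words array is an assumption for illustration; in the code above it would come from sentence.match(/(\w+)/g):

```javascript
// Hypothetical sample data standing in for the matched words.
var words = ["The", "quick", "brown", "fox", "jumped"];

var longest = "";
for (var i = 0; i < words.length; i++) {
    // Strict comparison keeps the first word when several share the maximum length.
    if (words[i].length > longest.length) {
        longest = words[i];
    }
}

console.log(longest); // "jumped"
```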

Related

How would I go about splitting a string by two brackets with regex?

I have been working with Discord.js and Node to build a quick bot to look something up. I need a way to find all the occurrences that appear between two square brackets and store them in an array of strings. For now I'm using string.split() with some regex, but I am unsure of the regex to use.
I have tried using a few different ones, including /[^\[\[]+(?=\]\])/g and \[\[(.*?)\]\] - I don't mind having the actual brackets in the results, I can remove them manually with string.replace().
I am also working on a fallback with the normal string.split() and other string functions, not relying on regex, but I'm still curious about a possible regex version.
The result with the first regex is totally incorrect. For example, if I try "does [[this]] work [at all]?" the output is "[[]]" and "[at all]", when it really shouldn't take the "at all", but it should show the "[[this]]".
With the second regex I get somewhat closer, it gives back "this"(correct) and "[at all]" (again, it shouldn't take the "at all").
I don't mind having the brackets in the output, I can remove them manually myself, but I need to find all occurrences that are specifically between two brackets.
Try this regex:
\[\[([^[\]]|(?R))*\]\]
What you are trying to do is called Matching Balanced Constructs. More info at the link.
Upon further testing, unfortunately JS does not support (?R) so this becomes far more difficult. You could use the XRegExp.matchRecursive addon from the XRegExp package.
And your expression \[\[(.*?)\]\] should work. Working example below.
var str = 'does [[this]] work [at all] with another double [[here]]?';
var result = str.match(/\[\[(.*?)\]\]/g);
var newDiv = document.createElement("div");
newDiv.innerHTML = result;
document.body.appendChild(newDiv);
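If only the text inside the double brackets is wanted, the capture group can be read directly instead of stripping brackets afterwards. A minimal sketch, assuming a runtime that supports String.prototype.matchAll (ES2020); the variable name inner is mine:

```javascript
var str = 'does [[this]] work [at all] with another double [[here]]?';

// matchAll requires the g flag; each result holds the full match at [0]
// and the first capture group at [1].
var inner = Array.from(str.matchAll(/\[\[(.*?)\]\]/g), function (m) {
    return m[1];
});

console.log(inner); // [ 'this', 'here' ]
```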
Try my solution
var str = "does [[this]] work [at all]?";
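// Require two opening and two closing brackets so that "[at all]" is skipped.
var regexp = /\[\[([a-z0-9\s]+)\]\]/ig;
// match() returns null when nothing is found, so fall back to an empty array.
var resultArray = str.match(regexp) || [];
// Strip the brackets from each match.
resultArray = resultArray.map((item) => {
    return item.replace(/(\[|\])/g, "");
});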
console.log(resultArray);

How to make RegExp find the first match only starting from an arbitrary index?

I'm creating a transpiler from an obscure scripting language (thyme) to JavaScript, in JavaScript.
As a whole it's far too complex to interpret through regex alone, but being able to use regex would save me a lot of finger stamina.
Here's the situation:
I have a source code
I have an index which I know is the starting point of the thing I'm trying to capture
I want to capture the first occurrence that matches the regexp that I have, and the first only, as later matches would just waste performance without knowing their context at this time.
var src = "++this is my ++cool source ++code";
var idx = 13;
var regex = /(\+\+[^\{\[\(\)\]\}\;\,\?\:\.\=\+\-\*\/\<\>\%\&\|\^\!\~ \n\r\t]+)/g;
var my_capture = ???
How do I make it so that the above snippet would result in my_capture == "++cool"?
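One possible approach, sketched under the assumption that the regex keeps its g flag: set lastIndex to the starting index and call exec once. exec then begins searching at that index and returns only the first match it finds, so later occurrences are never scanned.

```javascript
var src = "++this is my ++cool source ++code";
var idx = 13;
var regex = /(\+\+[^\{\[\(\)\]\}\;\,\?\:\.\=\+\-\*\/\<\>\%\&\|\^\!\~ \n\r\t]+)/g;

// With the g flag, exec() starts searching at lastIndex instead of 0.
regex.lastIndex = idx;
var m = regex.exec(src);

// exec() returns null when nothing matches from idx onwards.
var my_capture = m ? m[1] : null;

console.log(my_capture); // "++cool"
```

If the match must begin exactly at idx rather than anywhere after it, the sticky flag (y) can be used in place of g in engines that support it; that variant is not shown here.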

Regex to replace certain characters on first line

I'm thinking that this is something very simple, but I can't find an answer anywhere online. I've found results on how to match the whole first line in a multiline string, but not how to find all occurrences of a certain character ONLY on the first line.
So for instance:
HelloX dXudXe
How areX yXou?
FxIXne?
Matching all capital Xs only on the first line, and replacing that with nothing would result in:
Hello dude
How areX yXou?
FxIXne?
This matches only the first X:
/X/m
This matches all Xs:
/X/g
So I'm guessing the answer is the regex version of one of these statements:
"Replace all X characters until you find a newline"
"Replace all X characters in the first line"
This sounds like such a simple task, is it? And if so, how can it be done? I've spent hours looking for a solution, but I'm thinking that maybe I don't get the regex logic at all.
Without knowing the exact language you are using, it's difficult to give an example, but the theory is simple:
If you have a complex task, break it down.
In this case, you want to do something to the first line only. So, proceed in two steps:
Identify the first line
Perform an operation on it.
Using JavaScript as an example here, your code might look like:
var input =
"HelloX dXudXe" + "\n" +
"How areX yXou?" + "\n" +
"FxIXne?";
var result = input.replace(/^.*/,function(m) {
return m.replace(/X/g,'');
});
See how first I grab the first line, then I operate on it? This breaking down of problems is a great skill to learn ;)
Split the string into multiple lines, do the replacement on the first line, then rejoin them.
var lines = input.split('\n');
lines[0] = lines[0].replace(/X/g, '');
input = lines.join('\n');

Regex lookbehind workaround for Javascript?

I am terrible at regex so I will communicate my question a bit unconventionally in the name of trying to better describe my problem.
var TheBadPattern = /(\d{2}:\d{2}:\d{2},\d{3})/;
var TheGoodPattern = /([a-zA-Z0-9\-,.;:'"])(?:\r\n?|\n)([a-zA-Z0-9\-])/gi;
// My goal is to then do this
inputString = inputString.replace(TheGoodPattern, '$1 $2');
Question: I want to match all the good patterns and do the subsequent find/replace UNLESS they are preceded by the bad pattern, any ideas on how? I was able to accomplish this in other languages that support lookbehind but I am at a loss without it. (ps: from what I understand, JS does not support lookahead/lookbehind or if you prefer, '?<!', '?<=')
JavaScript does support lookaheads. And since you only need a lookbehind (and not a lookahead, too), there is a workaround (which doesn't really aid the readability of your code, but it works!). So what you can do is reverse both the string and the pattern.
inputString = inputString.split("").reverse().join("");
var pattern = /([a-z0-9\-])(?:\n\r?|\r)([a-z0-9\-,.;:'"])(?!\d{3},\d{2}:\d{2}:\d{2})/gi;
inputString = inputString.replace(pattern, '$1 $2');
inputString = inputString.split("").reverse().join("");
Note that you had redundantly used the upper case letters (the i modifier already takes care of them).
I would actually test it for you if you supplied some example input.
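In engines that implement ES2018 lookbehind assertions (an assumption about your runtime; older browsers will throw a syntax error), the same condition can be written directly without reversing anything. A minimal sketch, with a made-up input string, that places the negative lookbehind where the reversed lookahead sits above:

```javascript
// Same check as the reversed pattern, expressed as a negative lookbehind:
// the character before the line break must not be preceded by a timestamp.
var pattern = /(?<!\d{2}:\d{2}:\d{2},\d{3})([a-z0-9\-,.;:'"])(?:\r\n?|\n)([a-z0-9\-])/gi;

// Hypothetical input: the first line break follows a character that is
// preceded by a timestamp, the second one does not.
var inputString = "12:34:56,789x\ny and foo\nbar";
var joined = inputString.replace(pattern, '$1 $2');

console.log(JSON.stringify(joined)); // "12:34:56,789x\ny and foo bar"
```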
I have also used the reverse methodology recommended by m.buettner, and it can get pretty tricky depending on your patterns. I find that workaround works well if you are matching simple patterns or strings.
With that said I thought I would go a bit outside the box just for fun. This solution is not without its own foibles, but it also works and it should be easy to adapt to existing code with medium to complicated regular expressions.
http://jsfiddle.net/52QBx/
js:
function negativeLookBehind(lookBehindRegExp, matchRegExp, modifiers)
{
    var content = $('#content');
    // Find every "lookbehind + match" occurrence; match() returns null when there are none.
    var badGoodRegex = regexMerge(lookBehindRegExp, matchRegExp, modifiers);
    var badGoodMatches = content.html().match(badGoodRegex) || [];
    var placeHolderMap = {};
    // Temporarily hide each unwanted occurrence behind a unique placeholder.
    for (var i = 0; i < badGoodMatches.length; i++)
    {
        var placeHolder = "${item" + i + "}";
        placeHolderMap[placeHolder] = badGoodMatches[i];
        content.html(content.html().replace(badGoodMatches[i], placeHolder));
    }
    // Only the wanted occurrences are still visible to the plain match.
    var goodMatches = content.html().match(matchRegExp);
    // Swap the hidden occurrences back in.
    for (var prop in placeHolderMap)
    {
        content.html(content.html().replace(prop, placeHolderMap[prop]));
    }
    return goodMatches;
}
function regexMerge(regex1, regex2, modifiers)
{
    /* Combining flags could be its own beast, so the modifiers for the combined expression are passed in rather than derived from the two regexes. */
    return new RegExp(regex1.source + regex2.source, modifiers);
}
var result = negativeLookBehind(/(bad )/gi, /(good\d)/gi, "gi");
alert(result);
html:
<div id="content">Some random text trying to find good1 text but only when that good2 text is not preceded by bad text so bad good3 should not be found bad good4 is a bad oxymoron anyway.</div>
The main idea is to find all the total patterns (both the lookbehind and the real match) and temporarily remove those from the text being searched. I utilized a map because the values being hidden could vary, and thus each replacement had to be reversible. Then we can run just the regex for the items you really wanted to find, without the ones that would have matched the lookbehind getting in the way. After the results are determined, we swap the original items back in and return the results. It is a quirky, yet functional, workaround.
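A related way to get the same result without touching the DOM is to make the unwanted prefix an optional capture group and keep only the matches where that group did not participate. This is a different technique from the placeholder swap described above, and the function name matchWithoutPrefix and its parameters are hypothetical, chosen just for this sketch:

```javascript
// Returns the target matches that are NOT directly preceded by the prefix.
function matchWithoutPrefix(text, prefixSource, targetSource, flags) {
    // exec() in a loop needs the g flag, so add it if the caller left it out.
    var allFlags = flags.indexOf('g') === -1 ? flags + 'g' : flags;
    var combined = new RegExp('(' + prefixSource + ')?(' + targetSource + ')', allFlags);
    var results = [];
    var m;
    while ((m = combined.exec(text)) !== null) {
        // Group 1 is undefined when the prefix was absent.
        if (m[1] === undefined) {
            results.push(m[2]);
        }
        // Guard against an endless loop on zero-length matches.
        if (m[0] === '') {
            combined.lastIndex++;
        }
    }
    return results;
}

var text = "Some random text trying to find good1 text but only when that good2 " +
    "text is not preceded by bad text so bad good3 should not be found " +
    "bad good4 is a bad oxymoron anyway.";

var found = matchWithoutPrefix(text, "bad ", "good\\d", "gi");
console.log(found); // [ 'good1', 'good2' ]
```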

Javascript RegEx match problem

I have a sentence structure along the lines of
[word1]{word2} is going to the [word3]{word4}
I'm trying to use a javascript regex to match the words for replacement later. To do this, I'm working towards getting the following multi-dimensional array:
[["word1", "word2"],["word3","word4"]]
I'm currently using this regex for the job:
\[(.*?)\]\{(.*?)\}
However, it comes up with results like:
["[word1]{word2}", "word1", "word2"]
or worse. I don't really understand why, because this regex seems to work in Ruby just fine, and I'm not enough of a regex expert to understand what's going on. I'm just curious if there are any JavaScript regex experts out there to whom the answer is very clear and who can guide me through what's going on here. I appreciate any help!
Edit:
This is the code I'm using just to test the matching:
function convertText(stringText) {
var regex = /\[(.*?)\]\{(.*?)\}/;
console.log(stringText.match(regex));
}
I assume you are using the exec method of the regular expression.
What you are doing is almost correct. exec returns an array where the first element is the entire match and the remaining elements are the groups. You want only the elements at indexes 1 and 2. Try something like this, but of course store the results into an array instead of using an alert:
var string = '[word1]{word2} is going to the [word3]{word4}';
var pattern = /\[(.*?)\]\{(.*?)\}/g;
var m;
while(m = pattern.exec(string)) {
alert(m[1] + ',' + m[2]);
}
This displays two alerts:
word1,word2
word3,word4
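To build the nested array from the question instead of showing alerts, the same loop can push each pair of groups into a result array. A minimal sketch (the variable name pairs is mine):

```javascript
var string = '[word1]{word2} is going to the [word3]{word4}';
var pattern = /\[(.*?)\]\{(.*?)\}/g;
var pairs = [];
var m;

// Each exec() call resumes after the previous match because of the g flag.
while ((m = pattern.exec(string)) !== null) {
    // m[0] is the whole match; m[1] and m[2] are the two captured words.
    pairs.push([m[1], m[2]]);
}

console.log(JSON.stringify(pairs)); // [["word1","word2"],["word3","word4"]]
```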
What you're seeing is Japanese hiragana. Make sure your input is in English maybe?
Edited to say: Upon further review, it looks like a dictionary entry in Japanese. The 私 is kanji and the わたし is hiragana, a phonetic pronunciation of the kanji. FWIW, the word is "Watashi" which is one of the words for "I" (oneself) in Japanese.
