Javascript RegEx match problem - javascript

I have a sentence structure along the lines of
[word1]{word2} is going to the [word3]{word4}
I'm trying to use a javascript regex to match the words for replacement later. To do this, I'm working towards getting the following multi-dimensional array:
[["word1", "word2"],["word3","word4"]]
I'm currently using this regex for the job:
\[(.*?)\]\{(.*?)\}
However, it comes up with results like:
["[word1]{word2}", "word1", "word2"]
or worse. I don't really understand why, because this regex seems to work just fine in Ruby, and I'm not enough of a regex expert to understand what's going on. I'm curious whether there are any JavaScript regex experts out there to whom the answer is clear and who can explain what's going on here. I appreciate any help!
Edit:
This is the code I'm using just to test the matching:
function convertText(stringText) {
  var regex = /\[(.*?)\]\{(.*?)\}/;
  console.log(stringText.match(regex));
}

I assume you are using the exec method of the regular expression.
What you are doing is almost correct. exec returns an array where the first element is the entire match and the remaining elements are the groups. You want only the elements at indexes 1 and 2. Try something like this, but of course store the results into an array instead of using an alert:
var string = '[word1]{word2} is going to the [word3]{word4}';
var pattern = /\[(.*?)\]\{(.*?)\}/g;
var m;
while (m = pattern.exec(string)) {
  alert(m[1] + ',' + m[2]);
}
This displays two alerts:
word1,word2
word3,word4
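To get the nested array the question asks for, the two groups of each match can be collected instead of alerted. This is only a minimal sketch: it assumes an environment that provides String.prototype.matchAll (ES2020), and the function name extractPairs is made up for illustration.

```javascript
// Hypothetical helper: returns [group1, group2] for every match in the text.
function extractPairs(text) {
  // matchAll requires the g flag. Each match m has the shape
  // [fullMatch, group1, group2], so only indexes 1 and 2 are kept.
  var pattern = /\[(.*?)\]\{(.*?)\}/g;
  return Array.from(text.matchAll(pattern), function (m) {
    return [m[1], m[2]];
  });
}

console.log(extractPairs('[word1]{word2} is going to the [word3]{word4}'));
// [["word1", "word2"], ["word3", "word4"]]
```

On older engines the same array can be built by pushing [m[1], m[2]] inside an exec loop.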

What you're seeing is Japanese hiragana. Make sure your input is in English maybe?
Edited to say: Upon further review, it looks like a dictionary entry in Japanese. The 私 is kanji and the わたし is hiragana, a phonetic pronunciation of the kanji. FWIW, the word is "Watashi" which is one of the words for "I" (oneself) in Japanese.

Related

How would I go about splitting a string by two brackets with regex?

I have been working with Discord.js and Node to build a quick bot to look something up. I need a way to find all the occurrences that appear between two square brackets and store them in an array of strings. For now I'm using string.split() with some regex, but I am unsure of the regex to use.
I have tried a few different ones, including /[^\[\[]+(?=\]\])/g and \[\[(.*?)\]\] - I don't mind having the actual brackets in the results, I can remove them manually with string.replace().
I am also working on a fallback with the normal string.split() and other string functions, not relying on regex, but I'm still curious about a possible regex version.
The result with the first regex is totally incorrect. For example, if I try "does [[this]] work [at all]?" the output is "[[]]" and "[at all]", when it really shouldn't take the "at all", but it should show the "[[this]]".
With the second regex I get somewhat closer: it gives back "this" (correct) and "[at all]" (again, it shouldn't take the "at all").
I don't mind having the brackets in the output, I can remove them manually myself, but I need to find all occurrences that are specifically between two brackets.
Try this regex:
\[\[([^[\]]|(?R))*\]\]
What you are trying to do is called Matching Balanced Constructs. More info at the link.
Upon further testing, unfortunately JS does not support (?R) so this becomes far more difficult. You could use the XRegExp.matchRecursive addon from the XRegExp package.
And your expression \[\[(.*?)\]\] should work. Working example below.
var str = 'does [[this]] work [at all] with another double [[here]]?';
var result = str.match(/\[\[(.*?)\]\]/g);
var newDiv = document.createElement("div");
newDiv.innerHTML = result;
document.body.appendChild(newDiv);
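If only the text inside the double brackets is wanted, the capture group can be read directly rather than stripping the brackets afterwards. A minimal sketch, assuming String.prototype.matchAll (ES2020) is available; the name extractDoubleBracketed is hypothetical.

```javascript
// Hypothetical helper: returns the text found between [[ and ]].
function extractDoubleBracketed(text) {
  // The lazy (.*?) stops at the first closing ]] after each opening [[.
  var pattern = /\[\[(.*?)\]\]/g;
  return Array.from(text.matchAll(pattern), function (m) {
    return m[1];
  });
}

console.log(extractDoubleBracketed('does [[this]] work [at all] with another double [[here]]?'));
// ["this", "here"]
```

The single-bracket "[at all]" is skipped because the pattern requires two opening brackets in a row.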
Try my solution
var str = "does [[this]] work [at all]?";
// Require two opening and two closing brackets so "[at all]" is skipped.
var regexp = /\[\[([a-z0-9\s]+)\]\]/ig;
var resultArray = str.match(regexp) || [];
resultArray = resultArray.map((item) => {
  return item.replace(/(\[|\])/g, "");
});
console.log(resultArray);

Look behind replace all occurrences

I want to replace all occurrences of .digit with 0.digit.
I'm new to regular expressions, but as far as I understand I could use lookbehind to do this. JS does not support that, though, so I'd like to know if someone knows a solution.
To show the problem I wrote the following code.
str = "0.11blabla.22bla0.33bla.33"
allow = "\\.\\d*"
str.match(new RegExp(allow,"g"))
[".11", ".22", ".33", ".33"]
deny = "0\\.\\d*"
str.match(new RegExp(deny,"g"))
["0.11", "0.33"]
diffreg= new RegExp("(?!"+deny+")"+allow,"g") // translates to: /(?!0\.\d*)\.\d*/g
str.match(diffreg)
[".11", ".22", ".33", ".33"]
Obviously allow matches all decimal values whereas deny matches all values with a preceding 0. The result should of course be the set difference between the two: [".33", ".33"].
Use a group match.
> str.replace(/([^0])(\.\d)/g, "$10$2");
"0.11blabla0.22bla0.33bla0.33"
I think you are looking for this regex instead
[0]?(\.\d*)
So in your code you will have:
intersectionreg = new RegExp("[0]?("+allow+")","g")
Thanks #richard, edited
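For comparison, here is a sketch of two ways to write the replacement. Lookbehind assertions were added to JavaScript in ES2018, so the first variant assumes a reasonably recent engine; the second avoids lookbehind by capturing the preceding character. Both function names are made up, and both assume the intent is "a dot that is not preceded by any digit", which is slightly broader than "not preceded by 0".

```javascript
// Variant 1: negative lookbehind (needs an ES2018-capable engine).
function addLeadingZeroLookbehind(text) {
  // Match a dot that is not preceded by a digit and is followed by a digit.
  return text.replace(/(?<!\d)\.(?=\d)/g, '0.');
}

// Variant 2: no lookbehind. Capture the preceding non-digit (or the start
// of the string) and put it back via a replacer function, which avoids
// any ambiguity between "$1" followed by "0" and "$10".
function addLeadingZeroCapture(text) {
  return text.replace(/(^|\D)\.(?=\d)/g, function (match, before) {
    return before + '0.';
  });
}

var sample = '0.11blabla.22bla0.33bla.33';
console.log(addLeadingZeroLookbehind(sample)); // "0.11blabla0.22bla0.33bla0.33"
console.log(addLeadingZeroCapture(sample));    // "0.11blabla0.22bla0.33bla0.33"
```

Unlike a pattern that requires a preceding character such as [^0], both variants also handle a value at the very start of the string, for example ".5".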

Regex to replace certain characters on first line

I'm thinking that this is something very simple, but I can't find an answer anywhere online. I've found results on how to match the whole first line in a multiline string, but not how to find all occurrences of a certain character ONLY on the first line.
So for instance:
HelloX dXudXe
How areX yXou?
FxIXne?
Matching all capital Xs only on the first line, and replacing that with nothing would result in:
Hello dude
How areX yXou?
FxIXne?
This matches only the first X:
/X/m
This matches all Xs:
/X/g
So I'm guessing the answer is the regex version of one of these statements:
"Replace all X characters until you find a newline"
"Replace all X characters in the first line"
This sounds like such a simple task, is it? And if so, how can it be done? I've spent hours looking for a solution, but I'm thinking that maybe I don't get the regex logic at all.
Without knowing the exact language you are using, it's difficult to give an example, but the theory is simple:
If you have a complex task, break it down.
In this case, you want to do something to the first line only. So, proceed in two steps:
Identify the first line
Perform an operation on it.
Using JavaScript as an example here, your code might look like:
var input =
"HelloX dXudXe" + "\n" +
"How areX yXou?" + "\n" +
"FxIXne?";
var result = input.replace(/^.*/,function(m) {
return m.replace(/X/g,'');
});
See how first I grab the first line, then I operate on it? This breaking down of problems is a great skill to learn ;)
Split the string into multiple lines, do the replacement on the first line, then rejoin them.
var lines = input.split('\n');
lines[0] = lines[0].replace(/X/g, '');
input = lines.join('\n');
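Either approach can be wrapped in a small reusable function. The sketch below follows the first answer's idea; the name replaceOnFirstLine and its parameters are made up for illustration.

```javascript
// Hypothetical helper: applies a replacement to the first line only.
function replaceOnFirstLine(text, pattern, replacement) {
  // Without the m flag, ^ matches only at the start of the string, and
  // . never matches a line break, so /^.*/ covers exactly the first line.
  return text.replace(/^.*/, function (firstLine) {
    return firstLine.replace(pattern, replacement);
  });
}

var sampleText = 'HelloX dXudXe\nHow areX yXou?\nFxIXne?';
console.log(replaceOnFirstLine(sampleText, /X/g, ''));
// Hello dude
// How areX yXou?
// FxIXne?
```

The inner pattern needs the g flag to replace every occurrence on the first line; without it only the first X would be removed.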

Javascript Regex to select every non-alphanumeric character AND whitespace?

I'm new to JS, tried using -
/[0-9a-z]+$/gi
/[^0-9a-z]+$/gi
neither worked. Can anyone tell me where I am going wrong?
Replace
var sentence_split = arr.split(/[0-9a-z]+$/gi);
with
var sentence_split = arr.split(/[^0-9a-z]+/gi);
... if you prefer to go this way.
Explanation: the original regex was anchored (with $) to the end of the string, and it split on the words themselves rather than on the symbols separating them.
Still, there's more than one way to do the things you do: I'd probably go just with:
var words = sentence.match(/(\w+)/g);
... capturing sequences of word characters instead of splitting the phrase on whatever separates them. Here's a Fiddle to play with.
UPDATE: And one last thing. I felt a bit... uneasy about wasting sort just to get max essentially. I don't know if you share these thoughts, still here's how I would update the searching code:
var longest;
words.forEach(function(e) {
  if (! longest || longest.length < e.length) {
    longest = e;
  }
});
It's forEach, because I'm a bit lazy and imagine having a luxury of NOT working with IE8-; still, it's quite easy to rewrite this into a regular for... array-walking routine.
Updated fiddle.
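The matching and the search for the longest word can also be combined into one function. This is only a sketch: the name longestWord is hypothetical, and it assumes "word" means a run of \w characters, as in the match call above.

```javascript
// Hypothetical helper: returns the longest run of word characters.
function longestWord(sentence) {
  // match returns null when nothing matches, so fall back to an empty array.
  var words = sentence.match(/\w+/g) || [];
  // Keep the longer of the two at each step; on ties the earlier word wins.
  return words.reduce(function (longest, word) {
    return word.length > longest.length ? word : longest;
  }, '');
}

console.log(longestWord('The quick brown fox jumped')); // "jumped"
```

For input with no word characters at all, this returns an empty string rather than undefined.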

Javascript regex expression to replace multiple strings?

I have a string like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove the leading domain, the underscores and the percent-encoded parts?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact output would be "My happy dog is cool!" so I would like to get rid of the initial address and remove the underscores and percentage and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | serves as an or operator, and (%\d{2,}) matches a % sign followed by two or more digits.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
OK, if you want to replace all of that, I think you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");
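Another option is to decode the percent-escapes instead of deleting them. The sketch below uses decodeURIComponent (the non-deprecated counterpart of unescape) and assumes the wanted text is everything after the last slash; the function name titleFromUrl is made up. Note that it keeps the decoded parentheses, so the result differs slightly from the output the question asks for.

```javascript
// Hypothetical helper: turns the last path segment into readable text.
function titleFromUrl(url) {
  // Everything after the final "/" is treated as the title.
  var lastSegment = url.substring(url.lastIndexOf('/') + 1);
  // decodeURIComponent turns %28 into "(" and %29 into ")".
  // It throws a URIError if the input contains a malformed escape.
  return decodeURIComponent(lastSegment).replace(/_/g, ' ');
}

console.log(titleFromUrl('http://something.org/dom/My_happy_dog_%28is%29cool!'));
// "My happy dog (is)cool!"
```

If the parentheses should be dropped as well, a further replace on the decoded string would be needed.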
