Javascript Regex Look behind - javascript

In my web app I need to remove all whitespace and line breaks before and after the content between a pair of ``. Example:
``\s\s\s\s\stest1234\s\s\s\s23432\s\s\s\s\s\s\s`` would become something like this: ``test1234\s\s\s\s23432``.
(\s is a whitespace)
The regex I wrote for this is: /(``(?<=[\s]*)[^`]*(?=[\s]*)``)/g, but I found out JS doesn't have lookbehind. How would I transform this regex into something that does the job?
My JavaScript would look something like this:
replace(/(``(?<=[\s]*)[^`]*(?=[\s]*)``)/g, function(match, p1) {
return p1;
})
Note, I only want to remove the outer whitespace; the whitespace that belongs to the content needs to be preserved.

Make it two steps.
var src = "`` test123423432 \n\n ``";
var results = src.replace(/``([\s\S]*?)``/g,function(_,m) {
// note [\s\S] above is to handle JS's lack of a DOTALL flag
return "``"+m.replace(/^\s+|\s+$/g,"")+"``"; // trim all whitespace
});
If a problem seems too hard, usually breaking it down into smaller problems is the answer.
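The same result can also be had in a single pass by keeping the outer whitespace outside a capture group. This is only a minimal sketch, assuming the content never contains a double backtick itself; the name trimBackticked is made up for illustration:

```javascript
// Hypothetical helper: trims whitespace just inside each ``...`` pair.
function trimBackticked(text) {
  // The \s* on both sides soaks up the outer whitespace; the lazy group
  // keeps everything in between, including inner whitespace.
  return text.replace(/``\s*([\s\S]*?)\s*``/g, "``$1``");
}

const trimmed = trimBackticked("``   test1234    23432 \n\n ``");
console.log(trimmed);
```

The two-step version above is easier to read; this one just avoids the callback.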

Related

Remove everything after constant using regex

I've got XML that has additional information, BLAH, in each tag. When creating the tags, I've separated the extra info from the tag name with a constant (XMLSPLIT as constant XML_SPLITTER)... I needed to do this because I'm generating my XML from a JSON object and I can't have multiple keys that are the same thing... but in the XML output, can't have that superfluous stuff.
For example:
....
<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>
...
So, after generating the XML, I go through and clean it. I'm trying to do it with a regex. I figure, I want to remove anything on a line after the splitter and replace it with just the >.
let reg = new RegExp("<Set"+XML_SPLITTER+"(.*)\/g");
cleanXML = dirtyXML.replace(reg, "<Set>")
This fails to work.
I will note that I tried reg = /<Set(.*)/g; and that worked just fine... but it also captures "SetType" and any other tag that starts with "<Set".
It's because ^ is a Regex special character that indicates "beginning of line". You'd need to escape it like \^ for this to work. Something like /<Set\^\^[^>]*>/g should do the trick.
Small note: The above regex assumes that the "BLAH" string in your example will never contain the > character... but if it does, then your XML is super malformed anyway.
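If the splitter lives in a variable, escaping it by hand gets error-prone. A rough sketch of escaping it programmatically before building the RegExp follows; the helper name escapeRegExp and the "^^" splitter value are assumptions for illustration, not something taken from the question's code:

```javascript
// Hypothetical helper: backslash-escapes characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const splitter = "^^"; // assumed splitter value
const pattern = new RegExp("(</?)Set" + escapeRegExp(splitter) + "[^>]*", "g");

const dirty = "<Set^^BLAH><Value>9</Value></Set^^BLAH>";
const clean = dirty.replace(pattern, "$1Set");
console.log(clean);
```

Note that the flags go in the second argument of new RegExp, not inside the pattern string.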
Using .* will match > and if - for some reason - your XML file is not broken up into multiple lines (i.e. minified), you'll match more than you should. To avoid this, you can use [^>]* to match everything up to the >.
Since you've gracefully included a splitter, it'll make matching much easier and much more predictable (as you mentioned, you match SetType without a splitter).
Without a splitter, you'd have to use a regex pattern that resembles <Set(?!Type>)[^>]* or <Set(?!(?:Type|SomethingElse)>)[^>]* if you had more than just one suffix to Set that should remain. These methods use a negative lookahead to assert what follows does not match.
var str = `<SetXMLSPLITBLAH>
<Value>9</Value>
<SetType>
<Name>Foo</Name>
</SetType>
</SetXMLSPLITBLAH>`
var XML_SPLITTER = 'XMLSPLIT'
var p = `(</?)Set${XML_SPLITTER}[^>]*`
var r = new RegExp(p,'g')
var x = str.replace(r,'$1Set')
console.log(x)
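For the case without a splitter, the negative lookahead pattern described above could be tried out roughly like this. It is only a sketch; the sample XML and the variable names are made up, and Type is assumed to be the only suffix that should survive:

```javascript
// Sample input without a splitter: BLAH is glued directly onto Set.
const dirtyXml =
  "<SetBLAH>\n<Value>9</Value>\n<SetType>\n<Name>Foo</Name>\n</SetType>\n</SetBLAH>";

// (?!Type>) refuses to match when the tag is exactly SetType,
// so only the other Set... tags get their suffix stripped.
const cleanXml = dirtyXml.replace(/(<\/?)Set(?!Type>)[^>]*/g, "$1Set");
console.log(cleanXml);
```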

regex encapsulation

I've got a question concerning regex.
I was wondering how one could replace an encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the entity will still remain encapsulated, but with something else.
I will do this in JS, but the regex is what is important, I have been searching for a while, probably searching for the wrong terms, I should probably learn more about regex, generally.
In any case, is there someone who knows how to perform this operation with simplicity?
Thanks!
It's important that you find {key:23} in your text first, and then replace it with your wanted syntax, this way you avoid replacing {key:'sometext'} with that syntax which is unwanted.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:\d+\}/gi, function (match) {
  // swap the braces of each numeric {key:N} for a span
  return match.replace(/\{key:/i, '<span>').replace(/\}/, '</span>');
});
This way only {key:AnyNumber} gets replaced, and {key:AnyThingOtherThanNumbers} is left untouched.
It seems you are new to regex. You need to learn more about character classes and capturing groups and backreferences.
The regex is somewhat basic in your case if you do not need any nested encapsulated text support.
Let's start:
The beginning is \{key: - it will match the substring {key: literally. Note that { can be a special character (denoting the start of a limiting quantifier), thus, it is a good idea to escape it: \{key:.
([^}]+) - This is a bit more interesting: the round brackets around are a capturing group that lets us later back-reference the matched text. The [^}]+ means 1 or more characters (due to +) other than } (as [^}] is a negated character class where ^ means not).
\} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
\{key:([^}]+)\}
See demo on regex101.com
Code snippet:
var re = /{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"/>
If you want to use a different behavior based on the value of key, then you'll need to adjust the regex to either match digits only (with \d+) or letters only (say, with [a-zA-Z] for English), or other shorthand classes, ranges (= character classes), or their combinations.
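One way to get that value-dependent behavior is to pass a function as the replacement and decide per match. A small sketch, where the helper name highlightNumericKeys and the sample string are made up for illustration:

```javascript
// Hypothetical helper: wraps numeric keys, leaves any other {key:...} alone.
function highlightNumericKeys(text) {
  return text.replace(/\{key:([^}]+)\}/g, function (whole, value) {
    // Decide per match, based on the captured value.
    return /^\d+$/.test(value)
      ? '<span class="highlightable">' + value + '</span>'
      : whole;
  });
}

const highlighted = highlightNumericKeys("some text {key:23} more text {key:name}");
console.log(highlighted);
```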
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");

How would I make a regular expression that would mean /some string/ either followed by a line break or not?

function(input){
return input.replace(/teststring/ig, "adifferentstring");
}
I want to replace "teststring" and "teststring\n" with "adifferentstring"
In regex, to match a specific character you can place it in brackets:
[\n]
To make the match "optional", you can follow it with a ?:
[\n]?
In your exact example, your full regex could be:
teststring[\n]?
So, your function would look like:
function replace(input) {
return input.replace(/teststring[\n]?/ig, "adifferentstring");
}
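I'd suggest going with matching characters in brackets as this makes for easy expansion; consider, for instance, that you also want to match a lone carriage return or a lone newline:
teststring[\r\n]?
Note that a bracketed class matches a single character only, so for a Windows newline (a carriage return followed by a newline) you'd want something like teststring(?:\r\n|[\r\n])? instead.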
Try
function(input){
return input.replace(/teststring\n?/ig, "adifferentstring");
}
Try .replace(/teststring[\n]?/ig,"adifferentstring");
It would be something like this:
var re = /teststring([\n]?)/ig;
So then your replace statement would look about like this:
return input.replace(re,"adifferentstring");
Here's a fiddle showing the regex works.
And then a fiddle showing the replace operation working.
Edit:
Actually, thinking about the problem a little further, if your regex does match a carriage return or newline character, that would need to get put back into the replacement string. The same regex I posted originally will work, but you will need this replace statement instead (with the $1 denoting the first group in parentheses).
return input.replace(re,"adifferentstring$1");
fiddle
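To see the difference between swallowing the line break and putting it back, here is a quick side-by-side sketch. The sample string is made up, and the second pattern also accepts an optional carriage return, which is an addition to the regex above:

```javascript
const sample = "teststring\nnext line teststring end";

// Variant 1: the optional newline is part of the match, so it disappears.
const dropped = sample.replace(/teststring\n?/gi, "adifferentstring");

// Variant 2: the line break is captured and written back via $1.
// When the group did not match, $1 expands to an empty string.
const kept = sample.replace(/teststring(\r?\n)?/gi, "adifferentstring$1");

console.log(JSON.stringify(dropped));
console.log(JSON.stringify(kept));
```

Which one is right depends on whether the text after the match should stay on its own line.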

Javascript regex expression to replace multiple strings?

I've a string done like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscore and the percentage stuff?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact output I want is "My happy dog is cool!", so I would like to get rid of the initial address, remove the underscores and percent codes, and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | just serves as an or operator, and the (%\d{2,}) matches a % sign followed by two or more digit characters.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
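Since unescape is a legacy function, the same idea can be sketched with decodeURIComponent instead. The helper name titleFromUrl is made up, and this assumes the part after the last slash is validly percent-encoded (decodeURIComponent throws a URIError otherwise):

```javascript
// Hypothetical helper: takes the last path segment, decodes it, and
// turns runs of underscores into single spaces.
function titleFromUrl(url) {
  const lastSegment = url.substring(url.lastIndexOf("/") + 1);
  return decodeURIComponent(lastSegment).replace(/_+/g, " ");
}

const title = titleFromUrl("http://something.org/dom/My_happy_dog_%28is%29cool!");
console.log(title);
```

Here %28 and %29 decode to parentheses rather than being dropped, so a further replace would be needed if they should become spaces.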
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");

JavaScript regex match characters inside quotes and not in character set

I have a string I would like to split using #, ., [], or {} characters, as in CSS. The desired functionality is:
- Input:
"div#foo[bar='value'].baz{text}"
- Output:
["div", "#foo", "[bar='value'", ".baz", "{text"]
This is easy enough, with this RegEx:
input.match(/([#.\[{]|^.*?)[^#.\[{\]}]*/g)
However, this doesn't ignore syntax characters inside quotes, as I would like it to (e.g. "div[bar='value.baz']" should ignore the .).
How can I make the second part of my RegEx (the [^#.\[{\]}]* portion) capture not only the negated character set, but also any character within quotes? In other words, how can I work the RegEx (\"|').+?\1 into my current one?
Edit:
I've figured out a regex that works decently, but it can't handle escaped quotes inside quotes (for example: "stuff here \\" quote "). If someone knows how to do that, it would be extremely helpful:
str.match(/([#.\[{]|^.*?)((['"]).*?\3|[^.#\[\]{\}])*/g);
var str = "div#foo[bar='value.baz'].baz{text}";
str.match(/(^|[\.#[\]{}])(([^'\.#[\]{}]+)('[^']*')?)+/g)
// [ 'div', '#foo', '[bar=\'value.baz\'', '.baz', '{text' ]
var tokens = myCssString.match(/\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g);
Given your string, it produces
div
#
foo
[
bar=
'value.foo'
]
.
baz
{
text
}
The RegExp above is loosely based on the CSS 2.1 lexical grammar
Firstly, and I can't stress this enough: you shouldn't use regexps to parse CSS. You should use a real parser, for instance http://glazman.org/JSCSSP/ or similar. Many have built them, so there's no need for you to reinvent the wheel.
That said, to solve your current problem do this:
var str = "div#foo[bar='value.foo'].baz{text}";
str.match(/([#.\[{]|^.*?)(?:[^#\[{\]}]*|\.*)/g);
//["div", "#foo", "[bar='value.foo'", ".baz", "{text"]
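For the escaped-quote case raised in the question's edit, one possible approach is to treat a whole quoted string (with backslash escapes) as a single unit inside the repeated part. This is a sketch under the assumption that quotes are always properly closed; the function name splitSelector is made up:

```javascript
// Hypothetical helper: splits at #, ., [ and {, but never inside quotes.
function splitSelector(selector) {
  // Each piece starts at a delimiter (or at the start of the string) and
  // then consumes quoted strings as a whole, or single plain characters.
  const piece =
    /(?:[#.\[{]|^)(?:'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|[^#.\[{\]}'"])*/g;
  return selector.match(piece) || [];
}

const parts = splitSelector("div#foo[bar='value.baz'].baz{text}");
const escapedParts = splitSelector("a[b='it\\'s.x'].c");
console.log(parts);
console.log(escapedParts);
```

Inside the quoted alternatives, \\. consumes a backslash together with the character after it, which is what keeps an escaped quote from ending the string early.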
