Getting substrings (strings with "") with Regex - javascript

I have a main string like this:
const mainString: string = `
let myStr: string = "Hello world";
function x() { }
let otherVar: number = 4;
let otherString: string = "Other string value";
// something else INSIDE the main string
`;
Now I need to format this main string, but substrings cause unwanted stuff, so I need to get them.
The regex I used until now was: /"([^"]*)"/g.
With it I would get e.g. ['"Hello world"', '"Other string value"'] (in the mainString context from above).
But a "\"" inside one of these substrings throws off my regex: it gives me the part from the opening quote up to the \", and then, if another substring appears anywhere later, it wrongly treats the real closing " as an opening quote and matches up to the start of the next substring...
One important thing: I have absolutely no control whatsoever over anything before or beyond the "substring value".
What would be the correct regex for my use case?

Try this one:
"((\\"|[^"])*?)"
\\" - matches an escaped quote, i.e. \"
| - or
[^"] - any character other than "
*? - non-greedy repetition
see: regex101
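One edge case the pattern above does not cover is a backslash that is itself escaped right before the closing quote (for example "a\\"). A common alternative, sketched here under the assumption that a backslash always escapes exactly the next character, consumes every backslash together with the character that follows it. The function name findDoubleQuoted and the sample text are made up for illustration.

```javascript
// Hypothetical helper: returns every double-quoted substring, quotes included.
function findDoubleQuoted(source) {
  // [^"\\]  any character that is neither a quote nor a backslash
  // \\.     a backslash plus the character it escapes, taken as one unit
  const pattern = /"(?:[^"\\]|\\.)*"/g;
  return source.match(pattern) || [];
}

// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`let a: string = "Hello \"world\""; let b: string = "Other";`;
console.log(findDoubleQuoted(sample));
```

Because neither alternative can match a bare quote, the match can only end at an unescaped ", so no lazy quantifier is needed.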

Related

Replacing characters at the start and end of a certain string

Suppose this string:
b*any string here*
In case this exists, I want to replace b* at the beginning to <b>, and the * at the end to </b> (Disregard the backslash I need it for escaping on SO site).
Moreover, there might be more than one match:
b*any string here* and after that b*string b*.
These cases should not be handled:
b*foo bar
foo bar*
bb*foo bar* (b is not after a whitespace or beginning of string).
I've gotten this far:
(?<=b\*)(.*?)(?=\*)
This gives me the string in between, but I'm having difficulties doing the swap.
Use String#replace, you only need to capture the text you want to preserve:
var result = theString.replace(/\bb\*(.*?)\*/g, "<b>$1</b>");
The \b at the beginning of the regex means word boundary, so that it only matches bs that are not part of a word. $1 means the first captured group (.*?).
Example:
var str1 = "b*any string here* and after that b*string b*.";
var str2 = `b*foo bar
foo bar*
bb*foo bar* (b is not after a whitespace or beginning of string).`;
console.log(str1.replace(/\bb\*(.*?)\*/g, "<b>$1</b>"));
console.log(str2.replace(/\bb\*(.*?)\*/g, "<b>$1</b>"));
You could use \b(?:b\*)(.+?)(?:\*) with the g flag, so that every occurrence is replaced:
const result = yourString.replace(/\b(?:b\*)(.+?)(?:\*)/g, "<b>$1</b>");
See the 'Replace' tab https://regexr.com/447cq
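Note that \b also matches after punctuation, so the patterns above would still fire on input such as x-b*foo*. If the rule is strictly "start of string or after whitespace", one possible sketch captures that leading position and puts it back in the replacement. The function name boldify is hypothetical.

```javascript
// Hypothetical helper: wraps b*...* spans in <b> tags, but only when "b*"
// starts the string or directly follows a whitespace character.
function boldify(text) {
  // (^|\s) captures the start of the string or one whitespace character,
  // which is restored via $1 so no spacing is lost.
  return text.replace(/(^|\s)b\*(.*?)\*/g, "$1<b>$2</b>");
}

console.log(boldify("b*any string here* and after that b*string b*."));
// → "<b>any string here</b> and after that <b>string b</b>."
console.log(boldify("bb*foo bar*"));
// → "bb*foo bar*" (unchanged)
```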

JavaScript: How to pull out a string from a URL using a regular expression

In java, I have this URL as a string:
window.location.href =
"http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f875/hired-3";
I want to create a javascript regular expression to pull out the following string:
c6c8262a-bfd0-4ea3-aa6e-d466a28f875
To find left hand marker for the text, I could use the regex:
window\.location\.href \= \"http\:\/\/localhost:8080\/bladdey\/shop\/
However, I don't know how to get to the text between that and /hired-3"
What is the best way to pull out that string from a URL using javascript?
You could split the string in tokens and look for a string that has 4 occurrences of -.
Or, if the base is always the same, you could use the following code:
String myString = window.location.href;
// Drop the fixed base, then keep everything up to the next '/'
myString = myString.substring("http://localhost:8080/bladdey/shop/".length());
myString = myString.substring(0, myString.indexOf('/'));
Use a lookahead and a lookbehind:
(?<=http://localhost:8080/bladdey/shop/).+?(?=/hired-3)
Check here for more information.
Also, there is no need to escape the : or / characters.
You need a regex, and some way to use it...
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String theLocation = "http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f8752/hired-3";
// Lookbehind for the path prefix, lookahead for the trailing segment
String pattern = "(?<=/bladdey/shop/).+?(?=/hired-3)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(theLocation);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH");
}
note - this will still work when you change the host (it only looks for bladdey/shop/)
You can use capturing groups to pull out some content of your string.
In your case :
Pattern pattern = Pattern.compile("(http://localhost:8080/bladdey/shop/)(.+)(/hired-3)");
Matcher matcher = pattern.matcher(string);
if(matcher.matches()){
String value = matcher.group(2);
}
String param = html.replaceFirst("(?s)^.*http://localhost:8080/bladdey/shop/([^/]+)/hired-3.*$", "$1");
if (param.equals(html)) {
    throw new IllegalStateException("Not found");
}
UUID uuid = UUID.fromString(param);
In regex:
(?s) let the . char wildcard also match newline characters.
^ begin of text
$ end of text
.* zero or more (*) any (.) characters
[^...]+ one or more (+) of characters not (^) being ...
$1 in the replacement stands for whatever was matched between the first pair of parentheses.
Well if you want to pull out GUID from anything:
var regex = /[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{11,12}/i
It should really be {12} but in your url it is malformed and has just 15.5 bytes of information.
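Since the question asks for JavaScript, here is a minimal sketch of the capturing-group approach in that language. It assumes the ID always sits between /shop/ and /hired-3; the function name extractShopId is made up for illustration.

```javascript
// Hypothetical helper: returns the path segment between "/shop/" and
// "/hired-3", or null when the URL does not have that shape.
function extractShopId(href) {
  // [^\/]+ matches one path segment (everything up to the next slash).
  const match = /\/shop\/([^\/]+)\/hired-3/.exec(href);
  return match ? match[1] : null;
}

const url = "http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f875/hired-3";
console.log(extractShopId(url)); // "c6c8262a-bfd0-4ea3-aa6e-d466a28f875"
```

In a browser the same function could be called with window.location.href. Unlike the lookbehind variant, this form also works in older JavaScript engines that lack lookbehind support.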

Javascript Regexp Duplicate Line Matching not working correctly

I am writing a Javascript code to parse some grammar files, it is quite some code but I will post relevant information here. I am using Javascript Regexp in order to match a duplicate line held within a string. The string contains, for example (assume the string name is lines):
if
else
;
print
{
}
test1
test1
=
+
-
*
/
(
)
num
string
comment
id
test2
test2
What should happen, is a match found on 'test1' and 'test2'. It should then delete the duplicate, leaving 1 instance of test1 and test2. What is happening is no match at all. I am confident in my regex but javascript may be doing something I am not expecting. Here is the code doing the work on the string given above:
var rex = new RegExp("(.*)(\r?\n\1)+","g");
var re = '/(.*)(\r?\n\1)+/g';
rex.lastIndex = 0;
var m = rex.exec(lines);
if (m) {
alert("Found Duplicate");
var linenum = lines.search(re); //Get line number of error
alert("Error: Symbol Defined twice\n");
alert("Error occured on line: " + linenum);
lines = lines.replace(rex,""); //Gets rid of the duplicate
}
It never gets into the if(m) statement. Therefore no match is found. I tested the regex here: http://regexpal.com/ using the regex in my code as well as the example text provided. It matches just fine, so I am at kind of a loss. If anyone can help, it would be great.
Thank you.
Edit:
Forgot to add, I am testing this in firefox, and it only has to work in firefox. Not sure if that matters.
First error: \ in a JS string is also an escape character.
var rex = new RegExp("(.*)(\r?\n\1)+","g");
should be written
var rex = new RegExp("(.*)(\\r?\\n\\1)+","g");
// or, shorter:
var rex = /(.*)(\r?\n\1)+/g;
if you want to make it work. In the case of the RegExp constructor, you’re passing the pattern as a string to the constructor function. This means you need to escape each \ backslash that occurs in the pattern. If you use a regexp literal, you don’t need to escape them, since they’re not in a string, but retain their ‘normal’ properties in the regexp pattern.
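The difference can be checked directly. In the sketch below the unescaped pattern is written out explicitly: in a non-strict string literal, "\r" and "\n" become real control characters and "\1" becomes the character U+0001, which is spelled "\u0001" here so the snippet also runs in strict mode. The variable names are chosen for illustration only.

```javascript
// What the string "(.*)(\r?\n\1)+" actually contains: CR, LF and U+0001,
// so the regex looks for a control character instead of a backreference.
const unescaped = new RegExp("(.*)(\r?\n\u0001)+");
// Doubling the backslashes passes \r, \n and \1 through to the regex engine.
const escaped = new RegExp("(.*)(\\r?\\n\\1)+");
// A regex literal needs no doubling.
const literal = /(.*)(\r?\n\1)+/;

const sample = "test1\ntest1";
console.log(unescaped.test(sample));            // false
console.log(escaped.test(sample));              // true
console.log(escaped.source === literal.source); // true
```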
Second error, your expression
var re = '/(.*)(\r?\n\1)+/g';
is wrong. What you’re doing here is assigning a string literal to a variable. I’m assuming you meant to assign a regular expression literal, which should be written like this:
var re = /(.*)(\r?\n\1)+/g;
Third error: the last line
lines = lines.replace(rex,""); //Gets rid of the duplicate
removes both instances of all duplicate lines! If you want to keep the first instance of each duplicate, you should use
lines = lines.replace(rex, "$1");
And finally, this method only detects two consecutive identical lines. Is that what you want, or do you need to detect any duplicates, wherever they may be?
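If duplicates can appear anywhere rather than only on consecutive lines, a backreference is awkward, and a line-by-line pass may be simpler. The following is only a sketch of that alternative, not the approach used above; the function name removeDuplicateLines and the shape of its return value are assumptions.

```javascript
// Hypothetical helper: keeps the first occurrence of every line and reports
// each later repeat together with its 1-based line number.
function removeDuplicateLines(text) {
  const seen = new Set();
  const duplicates = [];
  const kept = [];
  // Accept \r\n, \r and \n as line separators.
  text.split(/\r\n?|\n/).forEach((line, index) => {
    if (seen.has(line)) {
      duplicates.push({ symbol: line, line: index + 1 });
    } else {
      seen.add(line);
      kept.push(line);
    }
  });
  return { text: kept.join("\n"), duplicates };
}

const result = removeDuplicateLines("if\ntest1\nelse\ntest1\nid\nid");
console.log(result.text);       // "if\ntest1\nelse\nid"
console.log(result.duplicates); // test1 repeated on line 4, id on line 6
```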
var str = 'if\nelse\n;\nprint\n{\n}\ntest1\ntest1\n=\n+\n-\n*\n/\n(\n)\nnum\nstring\ncomment\nid\ntest2\ntest2\ntest2\ntest2\ntest2';
console.log(str);
str = str.replace(/\r\n?/g,'\n');
// I prefer replacing all the newline characters with \n's here
str = str.replace(/(^|\n)([^\n]*)(\n\2)+/g,function(m0,m1,m2,m3,ind) {
var line = str.substr(0,ind).split(/\n/).length + 1;
var msg = '[Found duplicate]';
msg += '\nFollowing symbol defined more than once';
msg += '\n\tsymbol: ' + m2;
msg += '\n\ton line ' + line;
console.log(msg);
return m1 + m2;
});
console.log(str);
Otherwise you can skip the first line and change the pattern into
/(^|\r\n?|\n)([^\r\n]*)((?:\r\n?|\n)\2)+/g
Note that [^\n]* will also catch multiple empty lines. If you want to make sure it matches (and replaces) non-empty lines then you might want to use [^\n]+.
[EDIT]
For the record, each m is one of the callback's arguments, so m0 is the whole match, m1 is the 1st subgroup ((^|\n)), m2 is the 2nd subgroup (([^\n]*)) and m3 is the last subgroup ((\n\2)). I could have used arguments[n] instead but these are shorter.
As for the return value, due to the lack of lookbehind in the regex flavor used by JavaScript, this pattern also catches a possible preceding newline (unless it is the first line), so it needs to return the match and that preceding newline, if any. That's why it shouldn't return m2 only.

How can I get a substring located between 2 quotes?

I have a string that looks like this: "the word you need is 'hello' ".
What's the best way to put 'hello' (but without the quotes) into a javascript variable? I imagine that the way to do this is with regex (which I know very little about) ?
Any help appreciated!
Use match():
> var s = "the word you need is 'hello' ";
> s.match(/'([^']+)'/)[1];
"hello"
This will match a starting ', followed by anything except ', and then the closing ', storing everything in between in the first captured group.
http://jsfiddle.net/Bbh6P/
var mystring = "the word you need is 'hello'"
var matches = mystring.match(/\'(.*?)\'/); //returns array
alert(matches[1]);
If you want to avoid regular expressions, you can use .split("'") to split the string at single quotes, then use jQuery.map() to return just the odd-indexed substrings, i.e. an array of all single-quoted substrings.
var str = "the word you need is 'hello'";
var singleQuoted = $.map(str.split("'"), function(substr, i) {
return (i % 2) ? substr : null;
});
CAUTION
This and other methods will get it wrong if one or more apostrophes (same as single quote) appear in the original string.
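When there may be several quoted words and jQuery is not available, String.prototype.matchAll (ES2020) can collect every captured group. This is a sketch with a made-up function name, singleQuoted, and it shares the caveat above: a stray apostrophe in the text will throw the pairing off.

```javascript
// Hypothetical helper: returns the contents of every '...' pair, without quotes.
function singleQuoted(text) {
  // matchAll requires the g flag; m[1] is the first captured group.
  return Array.from(text.matchAll(/'([^']+)'/g), (m) => m[1]);
}

console.log(singleQuoted("the word you need is 'hello', not 'goodbye'"));
// → ["hello", "goodbye"]
```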

Replace multiple whitespaces with single whitespace in JavaScript string

I have strings with extra whitespace characters. Each time there's more than one whitespace, I'd like it be only one. How can I do this using JavaScript?
Something like this:
var s = " a b c ";
console.log(
s.replace(/\s+/g, ' ')
)
You can augment String to implement these behaviors as methods, as in:
String.prototype.killWhiteSpace = function() {
return this.replace(/\s/g, '');
};
String.prototype.reduceWhiteSpace = function() {
return this.replace(/\s+/g, ' ');
};
This now enables you to use the following elegant forms to produce the strings you want:
"Get rid of my whitespaces.".killWhiteSpace();
"Get rid of my extra whitespaces".reduceWhiteSpace();
Here's a non-regex solution (just for fun):
var s = ' a b word word. word, wordword word ';
// with ES5:
s = s.split(' ').filter(function(n){ return n != '' }).join(' ');
console.log(s); // "a b word word. word, wordword word"
// or ES2015:
s = s.split(' ').filter(n => n).join(' ');
console.log(s); // "a b word word. word, wordword word"
Can even substitute filter(n => n) with .filter(String)
It splits the string on single spaces, removes all the empty items from the resulting array (the ones produced by runs of more than one space), and joins the words again into a string with a single space between them.
using a regular expression with the replace function does the trick:
string.replace(/\s/g, "")
I presume you're looking to strip spaces from the beginning and/or end of the string (rather than removing all spaces)?
If that's the case, you'll need a regex like this:
mystring = mystring.replace(/(^\s+|\s+$)/g,'');
This will remove all spaces from the beginning or end of the string. If you only want to trim spaces from the end, then the regex would look like this instead:
mystring = mystring.replace(/\s+$/g,'');
Hope that helps.
jQuery.trim() works well.
http://api.jquery.com/jQuery.trim/
I know I should not resurrect an old thread, but given the details of the question, I usually expand it to mean:
I want to replace multiple occurrences of whitespace inside the string with a single space
...and... I do not want whitespace at the beginning or end of the string (trim)
For this, I use code like this (the parentheses in the first regexp are there just to make the code a bit more readable ... regexps can be a pain unless you are familiar with them):
s = s.replace(/^(\s*)|(\s*)$/g, '').replace(/\s+/g, ' ');
The reason this works is that the methods on String-object return a string object on which you can invoke another method (just like jQuery & some other libraries). Much more compact way to code if you want to execute multiple methods on a single object in succession.
var x = " Test Test Test ".split(" ").join("");
alert(x);
Try this.
var string = " string 1";
string = string.trim().replace(/\s+/g, ' ');
the result will be
string 1
What happened here is that it will trim the outside spaces first using trim() then trim the inside spaces using .replace(/\s+/g, ' ').
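Wrapped in a function, the two steps can be checked against input that mixes spaces, tabs and newlines. The name normalizeWhitespace is hypothetical; the body is the same trim-then-replace chain.

```javascript
// Hypothetical helper: trims both ends, then collapses every run of
// whitespace (spaces, tabs, newlines) into a single space.
function normalizeWhitespace(text) {
  return text.trim().replace(/\s+/g, " ");
}

console.log(JSON.stringify(normalizeWhitespace("  a \t b\n\n c  "))); // "a b c"
```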
How about this one?
"my test string \t\t with crazy stuff is cool ".replace(/\s{2,9999}|\t/g, ' ')
outputs "my test string with crazy stuff is cool "
This one gets rid of any tabs as well
If you want to prevent the user from entering a blank space in the name, just check each typed character in a keypress handler, like I did:
$j('#fragment_key').bind({
    keypress: function(e){
        var key = e.keyCode;
        var character = String.fromCharCode(key);
        // Match a single space (the class [' '] would also block apostrophes)
        if(character.match(/ /)) {
            alert("Blank space is not allowed in the Name");
            return false;
        }
    }
});
Create a jQuery handler for the keypress event.
Read the key code into a variable and convert it to a character.
Test the character against the pattern.
Show an alert message and cancel the keypress when it matches.
