Regex catch string between two strings, multiple lines - javascript

I'm working on a *.po file and trying to catch all the text between msgid "" and msgstr "", without much luck: I never get more than one line:
msgid ""
"%s asdfgh asdsfgf asdfg %s even if you "
"asdfgdh sentences with no sense. We are not asking translate "
"Shakespeare's %s Hamlet %s !. %s testing regex %s "
"don't require specific industry knowledge. enjoying "
msgstr ""
What I've tried:
var myArray = fileContent.match(/msgid ([""'])(?:(?=(\\?))\2.)*?\1/g);
Thanks for your help, I'm not really good with regex :(

Here is one way to extract all of that text:
var match = text.replace(/msgid ""([\s\S]*?)msgstr ""/, "$1");
Example: http://jsfiddle.net/bqk79/
The [\s\S] is a character class that matches any character, including line breaks, so [\s\S]*? matches any number of any characters, as few as possible. In other languages you could use the s or DOTALL flag to make . match line breaks; older JavaScript engines did not support this, but ES2018 added the s (dotAll) flag.
Note that your example doesn't use single quotes, but if you need to be able to match between msgid '' and msgstr '' as well, you can use the following:
var match = text.replace(/msgid (['"]{2})([\s\S]*?)msgstr \1/, "$2");
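The replace() calls above handle a single entry. To collect the text of every msgid "" block in a file, a global regex with an exec() loop is one option. The sketch below assumes the layout shown in the question (msgid "" on its own line, quoted continuation lines, then msgstr ""); the helper name extractMsgids is hypothetical. It also notes that, in ES2018+ engines, the s flag gives the same captures as [\s\S].

```javascript
// Hypothetical helper (not from the answers above): returns the joined text of
// every `msgid ""` entry in a .po file laid out like the question's example.
function extractMsgids(po) {
  // [\s\S]*? crosses line breaks lazily, so each match stops at the nearest msgstr "".
  // In ES2018+ engines, /msgid ""(.*?)msgstr ""/gs is equivalent.
  const re = /msgid ""([\s\S]*?)msgstr ""/g;
  const results = [];
  let m;
  while ((m = re.exec(po)) !== null) {
    const text = m[1]
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      // Strip the opening and closing double quote of each continuation line.
      .map((line) => line.replace(/^"|"$/g, ""))
      .join("");
    results.push(text);
  }
  return results;
}

const po = [
  'msgid ""',
  '"%s asdfgh asdsfgf asdfg %s even if you "',
  '"asdfgdh sentences with no sense. "',
  'msgstr ""',
  "",
  'msgid ""',
  '"second entry "',
  '"over two lines"',
  'msgstr ""',
].join("\n");

console.log(extractMsgids(po));
```

Joining the fragments without a separator matches how .po continuation lines are concatenated, since each fragment already carries its own trailing space.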

Try with this pattern:
/msgid (["']{2})\n([\s\S]*?)\nmsgstr \1/
The result is in the second capturing group, but you can make it simpler with:
/msgid ["']{2}\n([\s\S]*?)\nmsgstr /
in which case the result is in the first capturing group.

I realize that the question specifically asks for a regular expression, but you should consider using string split instead if you can.
Here is a ready-made function:
function extractTextBetween(subject, start, end) {
  try {
    return subject.split(start)[1].split(end)[0];
  } catch (e) {
    console.log("Exception when extracting text", e);
  }
}
http://jsfiddle.net/b33hdh9b/3/
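The split version throws (and logs) when start is missing, and returns the whole remainder when end is missing. If you would rather get an explicit "not found" value, an indexOf-based variant is a possible alternative; this is a sketch, and the name extractBetween is hypothetical:

```javascript
// Hypothetical indexOf-based variant of extractTextBetween: returns null
// instead of throwing when either delimiter is missing.
function extractBetween(subject, start, end) {
  const from = subject.indexOf(start);
  if (from === -1) return null;
  const textStart = from + start.length;
  // Search for the end marker only after the start marker.
  const to = subject.indexOf(end, textStart);
  if (to === -1) return null;
  return subject.slice(textStart, to);
}

const entry = 'msgid ""\n"first line "\n"second line "\nmsgstr ""';
console.log(extractBetween(entry, 'msgid ""', 'msgstr ""'));
```

Returning null for a missing end marker is a design choice; the original function would instead return everything after start.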

You could perhaps try this regex?
msgid ""((?:.|[\n\r])+)msgstr ""
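((?:.|[\n\r])+) is your capturing group.
(?:.|[\n\r])+ matches either . or [\n\r] one or more times; \n and \r are newlines and carriage returns, which . does not match on its own. Note that + is greedy, so if the file contains several entries, the match runs from the first msgid "" to the last msgstr "".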
Tested
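To see the effect of the greedy + with more than one entry, here is a small comparison on made-up sample text; adding ? after + makes the quantifier lazy so each match stops at the nearest msgstr "".

```javascript
const text = 'msgid ""\n"one"\nmsgstr ""\nmsgid ""\n"two"\nmsgstr ""';

// Greedy: one match running from the first msgid "" to the last msgstr "".
const greedy = text.match(/msgid ""((?:.|[\n\r])+)msgstr ""/);

// Lazy (+?): stops at the first msgstr "" after each msgid "".
const lazy = [...text.matchAll(/msgid ""((?:.|[\n\r])+?)msgstr ""/g)].map((m) => m[1]);

console.log(JSON.stringify(greedy[1]));
console.log(lazy);
```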

Related

regular expression for getting # and after that

I am trying to create a regular expression for string filtering. I want to get the symbol "#" and anything written after it, up to the next space.
Can someone help me with this?
For example:
hi I am #vaibhav .
The expected result this regular expression should give is vaibhav.
I made this:
/#[a-z]*/
However, I am not sure whether this conforms to the criteria mentioned above.
To get a substring from the # up to the first space after it, use
#\S+
See demo
The \S means a non-whitespace character.
If you do not need #, use a capturing group:
#(\S+)
The value you need will be in Group 1. See another demo.
If you are using JavaScript:
var re = /#(\S+)/g;
var str = 'hi I am #vaibhav . hi, and I am #strib .';
var m;
while ((m = re.exec(str)) !== null) {
  document.write("The value is: <b>" + m[1] + "</b><br/>");
}
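In engines that support String.prototype.matchAll (ES2020), the same loop can be written without calling exec() by hand. Here is a sketch that logs to the console instead of using document.write:

```javascript
const str = "hi I am #vaibhav . hi, and I am #strib .";
// matchAll requires the g flag; each match's group 1 holds the text after '#'.
const tags = [...str.matchAll(/#(\S+)/g)].map((m) => m[1]);
console.log(tags); // [ 'vaibhav', 'strib' ]
```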
The simplest solution is to use a negated set:
1. Match characters that are not '#'.
2. Read in the '#'.
3. Capture characters that are not ' '.
If you're trying to match and capture you can accomplish that like this:
[^#]*#([^ ]*).*
[Live Example]
If you only want to search then you don't need to match the whole string and you can just extract the actual match section:
#([^ ]*)
[Live Example]
The most complicated situation is where you need to deal with an escaped '#'. Here's an example of a match using that:
(?:[^\\#]|\\.)*#([^ ]*).*
[Live Example]
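When used with match() in JavaScript, that escape-aware pattern is safest anchored with ^. Without the anchor, the engine can start a new match attempt at the escaped '#' itself and return it. Here is a small sketch; the helper name firstUnescapedTag and the sample strings are hypothetical:

```javascript
// Hypothetical helper: returns the text after the first '#' that is not
// preceded by a backslash escape, or null if there is none.
function firstUnescapedTag(str) {
  // ^ forces the scan to start at the beginning, so "\#" is always consumed by \\.
  const m = str.match(/^(?:[^\\#]|\\.)*#([^ ]*)/);
  return m ? m[1] : null;
}

console.log(firstUnescapedTag("price \\#5 for #deal today"));
console.log(firstUnescapedTag("only \\#escaped here"));
```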
You can do that with a lookbehind:
(?<=#)\w+
Demo on regex101
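Lookbehind has been part of JavaScript since ES2018, so this pattern works directly with match() and the g flag. Note that \w+ stops at characters such as '-', whereas the \S+ pattern from the first answer does not. Here is a quick comparison on a made-up string:

```javascript
const str = "hi I am #vaibhav . and #foo-bar";
// (?<=#) asserts that '#' comes right before, without including it in the match.
const withWord = str.match(/(?<=#)\w+/g);
const withNonSpace = str.match(/(?<=#)\S+/g);
console.log(withWord, withNonSpace);
```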

RegEx - Match Character only when it's not preceded or followed by same character

How would I match the quotation marks around "text" in the string below, but not those around "TEST TEXT", using RegEx? I only want quotation marks that stand alone. I tried a negative lookahead (for a second quote), but it still captured the second of the two quotes around TEST TEXT.
This is some "text". This is also some ""TEST TEXT""
Be aware that this needs to scale: sometimes the quote will be right in the middle of a string, so something like this:
/(\s|\w)(\")(?!")/g (using $2...)
Would work in this example but not if the string was:
This is some^"text".This is also some ""TEST TEXT""
I just need quotation marks by themselves.
EDIT
FYI, this needs to be a JavaScript RegEx, so lookbehind is not an option for me here.
Since you have not tagged any particular regex flavor, I am taking the liberty of using lookbehind as well. You can use:
(?<!")"(?!")[^"]*"
RegEx Demo
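Since ES2018, JavaScript also supports lookbehind, so in current engines this first pattern can be used as-is. Here is a quick check against both example strings from the question:

```javascript
// Matches a quote that is neither preceded nor followed by another quote,
// then everything up to the next quote.
const lone = /(?<!")"(?!")[^"]*"/g;

const a = 'This is some "text". This is also some ""TEST TEXT""';
const b = 'This is some^"text".This is also some ""TEST TEXT""';
console.log(a.match(lone), b.match(lone));
```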
Update: For JavaScript you can use this regex:
/""[^"]*""|(")([^"]*)(")/
and use captured group 2 for your text (group 1 is only set when a lone quote matched).
RegEx Demo
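With that alternation, a match where group 1 is undefined is a doubled-quote section to skip, and the lone-quoted text is in group 2. Here is a minimal sketch that collects it; the helper name loneQuoted is hypothetical:

```javascript
// Hypothetical helper: returns the text inside quotes that are not doubled.
function loneQuoted(str) {
  const re = /""[^"]*""|(")([^"]*)(")/g;
  const found = [];
  for (const m of str.matchAll(re)) {
    // Group 1 is only set when the right-hand alternative matched.
    if (m[1] !== undefined) found.push(m[2]);
  }
  return found;
}

console.log(loneQuoted('This is some^"text".This is also some ""TEST TEXT""'));
```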
I'm not sure I fully understood your needs. I'll post this answer to check whether it helps you, and I can delete it if it doesn't.
Is this what you want? Using this regex:
"\w+?"
Working demo
By the way, if you just want to get the content within "..." you can use this regex:
"(\w+?)"
Working demo
You can't do this with a pure JavaScript regexp. I am going to eat my words now, however, as you can use the following solution using a callback function:
var subject = 'This is some "text". This is also some ""TEST TEXT""';
var regex = /""+|(")/g;
var replaced = subject.replace(regex, function($0, $1) {
  if ($1 === "\"") return "-"; // What to replace to?
  else return $0;
});
"This is some -text-. This is also some ""TEST TEXT"""
If you need the regex to split the string, you can use the above to replace matches with something distinctive, then split on it:
var regex = /""+|(")/g;
var replaced = subject.replace(regex, function($0, $1) {
  if ($1 === "\"") return "☺";
  else return $0;
});
var splits = replaced.split("☺");
["This is some ", "text", ". This is also some ""TEST TEXT"""]
Reference: http://www.rexegg.com/regex-best-trick.html

Javascript Regular Expression for Removing all Spaces except for what between double quotes

I have a string from which I need to strip all the spaces except those between double quotes (""). Here is the regex that I am using to strip out spaces:
str.replace(/\s/g, "");
I can't seem to figure out how to get it to ignore spaces between quotes.
Example
str = 'Here is my example "leave spaces here", ok im done'
Output = 'Hereismyexample"leave spaces here",okimdone'
Another way to do it. This assumes that no escaping is allowed within the double-quoted part of the string (e.g. no "leave \" space \" here"), but it can easily be modified to allow it.
str.replace(/([^"]+)|("[^"]+")/g, function($0, $1, $2) {
  if ($1) {
    return $1.replace(/\s/g, '');
  } else {
    return $2;
  }
});
Modified regex to allow an escaped " within a quoted string:
/([^"]+)|("(?:[^"\\]|\\.)+")/
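Plugging that escape-aware pattern into the same replace callback gives the sketch below. The function name stripSpacesOutsideQuotes is hypothetical, and the second sample input is made up to include an escaped quote:

```javascript
// Hypothetical wrapper around the callback approach above, using the
// escape-aware alternative for the quoted part.
function stripSpacesOutsideQuotes(str) {
  return str.replace(/([^"]+)|("(?:[^"\\]|\\.)+")/g, function (m, unquoted, quoted) {
    // Unquoted runs lose their whitespace; quoted strings are returned unchanged.
    return unquoted ? unquoted.replace(/\s/g, "") : quoted;
  });
}

console.log(stripSpacesOutsideQuotes('Here is my example "leave spaces here", ok im done'));
console.log(stripSpacesOutsideQuotes('a b "c \\" d" e f'));
```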
var output = input.split('"').map(function(v,i){
  return i%2 ? v : v.replace(/\s/g, "");
}).join('"');
Note that I renamed the variables because I can't bring myself to write code with a variable whose name starts with an uppercase letter, especially when it's a standard constructor of the language. I'd suggest sticking to those guidelines when in doubt.
Rob, resurrecting this question because it had a simple solution that only required one replace call, not two. (Found your question while doing some research for a regex bounty quest.)
The regex is quite short:
"[^"]+"|( )
The left side of the alternation matches complete quoted strings. We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.
Here is working code (see demo):
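var subject = 'Here is my example "leave spaces here", ok im done';
var regex = /"[^"]+"|( )/g;
var replaced = subject.replace(regex, function(m, group1) {
  // group1 is undefined (or "" in some old engines) when a quoted string matched: keep it.
  if (!group1) return m;
  // Otherwise the match is a captured space outside quotes: drop it.
  else return "";
});
document.write(replaced);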
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...

Delete a string in javascript, without leaving an empty space?

I've seen multiple instances of this kind of question, but not the one I'm looking for specifically... (I just hope I'm not hopelessly blind! :P)
Let's consider this code:
var oneString = "This is a string";
document.write(oneString.replace("is", ""));
I would have assumed that the output would have been:
This a string.
But this is the output I'm getting:
This  a string (with two spaces between "This" and "a")
It's as if replace() thinks the second argument is " " and not "". What is the proper way to remove a given substring from a string without leaving extra spaces in the output?
You are actually getting "is" replaced with an empty string; it's the spaces before and after the "is" you replace that stay behind as the two spaces you see. Try:
oneString.replace("is ", "")
Are you sure you're not getting "This a string"?
I think you should replace "is " with "" to get your desired output. There is a space before as well as after the word.
Look at the original string - "This_is_a_string" (I replaced spaces with underscores). When you remove "is", you don't touch either of the surrounding spaces, so both end up in the output. What you need to do is oneString.replace("is","").replace(/ +/g," "): get rid of "is" and then collapse any runs of spaces into one. If you want to keep some double spaces, try oneString.replace(" is","") instead, though you will run into issues if the string starts with "is" (e.g. "is it safe?").
The best answer might be something like oneString.replace(/is ?/,"") to match "is" optionally followed by a space, or oneString.replace(/ ?is ?/," ") to match "is" optionally surrounded by spaces and replace the whole thing with one space.
You didn't include any spaces in your pattern. When I try your code in Chrome I get:
> "This is a string".replace("is","")
"Th is a string"
One way to accomplish what you're trying would be to use a regexp instead:
> "This is a string".replace(/is\s/,"")
"This a string"
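A word boundary avoids the "This" problem shown above, because \b only matches where a word character meets a non-word character (or the start or end of the string). Here is a small sketch, assuming only the standalone word should be removed:

```javascript
const s = "This is a string";
// \bis\b matches "is" only as a whole word, so the "is" inside "This" is skipped;
// \s* also removes the space that follows it.
const result = s.replace(/\bis\b\s*/, "");
console.log(result); // prints: This a string
```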
var aString = "This is a string";
var find = "is"; // or 'This' or 'string'
aString = aString.replace(new RegExp("(^|\\s+)" + find + "(\\s+|$)", "g"), "$1");
console.log(aString);
The only case where this isn't perfect is when you replace the last word in the sentence: it leaves one space at the end, but you could check for that.
The g flag makes replace() replace all instances, not just the first one.
Add the i flag to make it case-insensitive.
If you also want this to work on strings like:
"This has a comma, in it"
Change the regexp to:
var find = "comma";
new RegExp("(^|\\s+)" + find + "(\\s+|$|,)", "g")
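Because find is concatenated straight into the pattern, a word containing regex metacharacters (for example "c++") would either break the RegExp or match the wrong text. Below is a sketch that escapes the input first, using the g and i flags described above and the same (^|\s+)word(\s+|$) shape. The helper name removeWord is hypothetical.

```javascript
// Hypothetical helper: removes every whole-word occurrence of `word`,
// case-insensitively, using the (^|\s+)word(\s+|$) pattern from above.
function removeWord(text, word) {
  // Escape characters that have a special meaning in regular expressions.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("(^|\\s+)" + escaped + "(\\s+|$)", "gi");
  return text.replace(re, "$1");
}

console.log(removeWord("This is a string", "is"));
console.log(removeWord("I like c++ a lot", "c++"));
console.log(removeWord("Is this IS it", "is"));
```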

Javascript substrings multiline replace by RegExp

I'm having some trouble matching a regular expression against a multi-line string.
<script>
var str="Welcome to Google!\n";
str = str + "We are proud to announce that Microsoft has \n";
str = str + "one of the worst Web Developers sites in the world.";
document.write(str.replace(/.*(microsoft).*/gmi, "$1"));
</script>
http://jsbin.com/osoli3/3/edit
As you can see at the link above, the output of the code looks like this:
Welcome to Google! Microsoft one of the worst Web Developers sites in the world.
This means that the replace() method works line by line, and if there's no match in a line, it just returns the whole line... even with the "m" (multiline) modifier.
The multiline option only changes how the ^ and $ anchors work, not how . works.
Use a pattern that matches any character with a set like [\w\W] instead of ., which only matches non-line-break characters.
document.write(str.replace(/[\w\W]*(microsoft)[\w\W]*/gmi, "$1"));
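In ES2018+ engines, the s (dotAll) flag makes . match line breaks too, so the original pattern also works once s is added. Here is a quick check using the question's string, with console.log in place of document.write:

```javascript
const str =
  "Welcome to Google!\n" +
  "We are proud to announce that Microsoft has \n" +
  "one of the worst Web Developers sites in the world.";

// Without s, . stops at \n, so only the line containing "Microsoft" is replaced.
const perLine = str.replace(/.*(microsoft).*/gmi, "$1");
// [\w\W] and the s flag both let the match span the whole string.
const withClass = str.replace(/[\w\W]*(microsoft)[\w\W]*/i, "$1");
const withDotAll = str.replace(/.*(microsoft).*/is, "$1");
console.log(perLine, withClass, withDotAll);
```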
