I have something like:
/MyFile/14/file_1.txt
/MyFile/17/file_2.txt
/MyFile/10/file_3.txt
How can I use replace with a regular expression to turn them into
file 1
file 2
file 3
I've tried
.replace('/Myfile/\d+/', '').replace('_', '').replace('.txt', '')
and the output is
/MyFile/14/file 1
/MyFile/17/file 2
/MyFile/10/file 3
Thanks in advance.
You don't need several replacements; you only need capturing groups:
import re
p = re.compile(r'^.*/(.+)_(\d+)\.txt$')
repl = r'\1 \2'
result = re.sub(p, repl, yourstring)
Note that when you write a pattern you should use a raw string (r'...') to avoid having to double the backslashes.
The following code produces what you want, given that the input data is a multiline string. It uses a regular expression and the sub() method of Python's re module.
In the regular expression ^/MyFile/\d+/file_(\d+)\.txt$, the parentheses define a capturing group, which can later be used in the replacement text as \1 (where 1 refers to the first capturing group).
Also note the r prefix on the string r'^/MyFile/\d+/file_(\d+)\.txt$', which makes it a Python raw string and saves us from escaping the backslashes.
import re
data = """\
/MyFile/14/file_1.txt
/MyFile/17/file_2.txt
/MyFile/10/file_3.txt
"""
re_file_number = re.compile(r'^/MyFile/\d+/file_(\d+)\.txt$', re.MULTILINE)
print(re_file_number.sub(r'file \1', data))
produces:
file 1
file 2
file 3
The re module may help:
[x.replace("_", " ") for x in re.compile(r"(?<=/MyFile/[0-9][0-9]/).+(?=\.txt)").findall(aString)]
I have a string like the following:
SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)
Question
I am trying to write a regex that gets just the information between the round brackets; for example, the end string would look like:
BI1 BI17 BI1234
I have found this example on Stack Overflow, which gets the first value, BI1, but ignores the rest.
Get text between two rounded brackets
This is the regex I created from the above link: /\(([^)]+)\)/g, but the result includes the brackets, and I want to remove these.
I am using this website, which has a testing window, to check whether the regex works:
http://www.regexr.com
Additional Information
There can be any number of digits, which is why I have given 3 different examples.
This is one continuous string, not separate lines.
Thanks for any help on this matter.
While this isn't possible using just regexes, you can do it with string#split and the following regex:
\).*?\(|^.*?\(|\).*?$
Yielding code that looks a bit like this:
function getBracketed(str) {
  return str.split(/\).*?\(|^.*?\(|\).*?$/).filter(Boolean);
}
(You need to filter out the empty strings that appear at the beginning and end if you do it this way, hence the extra operation.)
Regex demo on Regex101
Code demo on Repl.it
If you need to keep all inside parentheses and remove everything else, you might use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var result = str.replace(/.*?\(([^()]*)\)/g, " $1").trim();
console.log(result);
If you need to get only the BI+digits pattern inside parentheses, use
/.*?\((BI\d+)\)/g
Details:
.*? - match any 0+ chars other than linebreak symbols
\( - match a (
(BI\d+) - Group 1 capturing BI + 1 or more digits (\d+) (or [^()]* - zero or more chars other than ( and ))
\) - a closing ).
To get all the values as array (say, for later joining), use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var re = /\((BI\d+)\)/g;
var res = str.match(re).map(function(s) { return s.substring(1, s.length - 1); });
console.log(res);
console.log(res.join(" "));
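As an alternative to matching and then trimming the brackets, here is a minimal sketch that reads capture group 1 directly in an exec loop. The helper name getBracketedValues is hypothetical, and the pattern assumes the brackets are never nested.

```javascript
// Hypothetical helper: returns the text inside every (...) pair, without the brackets.
// Assumes brackets are not nested.
function getBracketedValues(str) {
  var re = /\(([^()]*)\)/g; // group 1 = the text between ( and )
  var values = [];
  var m;
  // With the g flag, each exec call continues from where the previous match ended.
  while ((m = re.exec(str)) !== null) {
    values.push(m[1]);
  }
  return values;
}

var sample = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
console.log(getBracketedValues(sample).join(" ")); // BI1 BI17 BI1234
```

To restrict the result to the BI+digits form, the group could be changed to (BI\d+).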
I have a string like CORP\tmothy (the general format is CORP\<username>) and I want to extract the word tmothy from it.
I am using the split function, but it's trying to split on "\t" instead of "\". I have escaped the backslash using "\\", but still no luck.
This might happen with any username starting with n, r, b, etc., because \n, \r and \b are escape sequences too.
How do I overcome this in JavaScript?
If you have the string CORP\tmothy with a real backslash in it, then .split('\\') will definitely do the trick. Check this code:
var s = 'CORP\\tmothy'; // escaping the backslash here prevents \t from becoming a TAB in the string variable
s.split('\\'); // returns ["CORP", "tmothy"]
You must be doing something wrong.
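The likely cause is how the value was written in the source code, not split itself. The sketch below compares three ways of writing the literal; the helper name userNameOf is hypothetical, and String.raw needs an ES2015+ engine.

```javascript
// In a normal string literal, "\t" is a single TAB character, not a backslash plus "t".
var withTab = 'CORP\tmothy';              // C O R P <TAB> m o t h y
var withBackslash = 'CORP\\tmothy';       // C O R P \ t m o t h y
var rawLiteral = String.raw`CORP\tmothy`; // same as withBackslash (ES2015+)

// Hypothetical helper: returns the part after the first backslash, or null if there is none.
function userNameOf(login) {
  var parts = login.split('\\'); // '\\' in source is one backslash at runtime
  return parts.length > 1 ? parts[1] : null;
}

console.log(userNameOf(withBackslash)); // tmothy
console.log(userNameOf(rawLiteral));    // tmothy
console.log(userNameOf(withTab));       // null (the string contains no backslash at all)
```

A value that arrives at runtime (from a form field or a server response, for example) already contains a real backslash, so only literals typed into the source need the escaping.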
I have a textarea. I am trying to split a paragraph into strings at the usual punctuation delimiters, such as , . ! ? and possibly others.
I am trying to achieve this in JavaScript, using the regular expression given in this answer.
But in JavaScript it's not working for me. Here's my code snippet for more clarity:
$('#split').click(function(){
var textAreaContent = $('#textArea').val();
//split the string i.e.., textArea content
var splittedArray = textAreaContent.split("\\W+");
alert("Splitted Array is "+splittedArray);
var lengthOfsplittedArray = splittedArray.length;
alert('lengthOfText '+lengthOfsplittedArray);
});
Since it's unable to split, it always shows a length of 1. What would be a suitable regular expression here?
The regular expression shouldn't differ between Java and JavaScript, but the .split() method in Java accepts a regular expression string. If you want to use a regular expression in JavaScript, you need to create one, like so:
.split(/\W+/)
DEMO: http://jsfiddle.net/s3B5J/
Notice the / and / to create a regular expression literal. The Java version needed two "\" because it was enclosed in a string.
Reference:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions
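The difference between a string argument and a regex literal can be seen in a small sketch; the sample text is made up for illustration.

```javascript
var text = 'Hello, world! How are you? Fine.';

// A string argument is taken literally: this looks for the three characters \W+ in the text.
var byString = text.split('\\W+');

// A regex literal splits on every run of non-word characters.
var byRegex = text.split(/\W+/);

// The trailing "." leaves an empty string at the end, so filter it out.
var words = byRegex.filter(Boolean);

console.log(byString.length); // 1
console.log(byRegex);         // ["Hello", "world", "How", "are", "you", "Fine", ""]
console.log(words.length);    // 6
```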
You can try this
textAreaContent.split(/\W+/);
\W+ : Matches one or more characters that are not word characters (word characters are alphanumerics and underscore).
So it splits on everything except alphanumerics and underscores. If you don't want to split on " " (space) and only want to split on line breaks, you can use:
var splittedArray = textAreaContent.split(/\n+/);
I'm trying to get the first part of a hash from a URL (the part between the # and a /, a ? or the end of the string).
So far I have come up with this:
r = /#(.*)[\?|\/|$]/
// OK
r.exec('http://localhost/item.html#hash/sub')
["#hash/", "hash"]
// OK
r.exec('http://localhost/item.html#hash?sub')
["#hash?", "hash"]
// WAT?
r.exec('http://localhost/item.html#hash')
null
I was expecting to receive "hash"
I tracked down the problem to
r2 = /#(.*)[$]/
r2.exec('http://localhost/item.html#hash')
null
Any idea what could be wrong?
r = /#(.*)[\?|\/|$]/
When $ appears in [] (a character class), it's the literal "$" character, not the end of input/line. In fact, your [\?|\/|$] part is equivalent to just [?/$|], which matches the 4 specific characters (including the pipe).
Use this instead (JSFiddle)
r = /#(.+?)(\?|\/|$)/
You aren't supposed to write [$] (within a character class) unless you want to match the $ literally and not the end of line.
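A minimal sketch of the difference, using the question's pattern and the corrected one from this answer. The helper name firstHashPart is hypothetical.

```javascript
var broken = /#(.*)[\?|\/|$]/;   // the class matches one literal ?, |, / or $ character
var fixed = /#(.+?)(\?|\/|$)/;   // the group matches ?, / or the end of the string

// Hypothetical helper: returns the first part of the hash, or null if there is no match.
function firstHashPart(url) {
  var m = fixed.exec(url);
  return m ? m[1] : null;
}

console.log(/[$]/.test('price: $5'));                              // true, a literal dollar sign
console.log(broken.exec('http://localhost/item.html#hash'));       // null
console.log(firstHashPart('http://localhost/item.html#hash'));     // hash
console.log(firstHashPart('http://localhost/item.html#hash/sub')); // hash
console.log(firstHashPart('http://localhost/item.html#hash?sub')); // hash
```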
/#(.*)$/
Code:
var regex = /\#(.*)$/;
regex.exec('http://localhost/item.html#hash');
Output:
["#hash", "hash"]
Your regex: /#(.*)[\?|\/|$]/
//<problem>-----^ ^-----<problem>
| operator won't work within [], but within ()
$ will be treated literally within []
.* will match as much as possible. .*? will be non-greedy
On making the above changes,
you end up with /#(.*?)(\?|\/|$)/
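The third point matters once $ is a real alternative: a greedy .* can run to the end of the string and still satisfy $. A small sketch of the two variants on one of the question's URLs:

```javascript
var url = 'http://localhost/item.html#hash/sub';

var greedy = /#(.*)(\?|\/|$)/; // .* grabs as much as possible, then $ matches at the end
var lazy = /#(.*?)(\?|\/|$)/;  // .*? stops at the first ?, / or end of string

console.log(greedy.exec(url)[1]); // hash/sub
console.log(lazy.exec(url)[1]);   // hash
```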
I use http://regexpal.com/ to test my regular expressions.
Your problem here is that your regular expression requires a / after the hash. So it doesn't work with http://localhost/item.html#hash but it does work with http://localhost/item.html#hash/
Try this one :
r = /#([^\?|\/|$]*)/
You can't use the $ end-of-string marker in a character class. You're probably better off just matching characters that aren't / or ?, like this:
/#([^\?\/]*)/
Why Regex? Do it like this (nearly no regex):
var a = document.createElement('a');
a.href = 'http://localhost/item.html#hash/foo?bar';
console.log(a.hash.split(/[\/\?]/)[0]); // #hash
Just for the sake of completeness, if you are working with Node.js:
var hash = require('url').parse('http://localhost/item.html#hash').hash;
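The same idea can be sketched with the standard URL class, which is available as a global in current browsers and recent Node.js versions (an assumption about the runtime). The helper name firstHashSegment is hypothetical.

```javascript
// Hypothetical helper built on the standard URL class.
function firstHashSegment(href) {
  var hash = new URL(href).hash;          // e.g. "#hash/foo?bar", or "" when there is no fragment
  return hash.slice(1).split(/[\/?]/)[0]; // drop the leading "#", keep the text before / or ?
}

console.log(firstHashSegment('http://localhost/item.html#hash/foo?bar')); // hash
console.log(firstHashSegment('http://localhost/item.html#hash'));         // hash
```

For a URL without a fragment, this sketch returns an empty string.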
I found this regular expression that seems to work
r = /#([^\/\?]*)/
r.exec('http://localhost/item.html#hash/sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash?sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash')
["#hash", "hash"]
Anyway, I still don't get why the original one isn't working
I'm trying to write a regex for use in JavaScript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like match to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when I test it at gskinner.com/RegExr/, but it does not work in my JavaScript. The issue can be reproduced by testing it here: http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex gurus help?
Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
var result;
while ((result = rgx.exec(script)) !== null)
  console.log(result[0]); // use whatever output function your environment provides
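To check that the double escaping is right, one can compare the source of the two forms. This is a minimal sketch; the sample string is made up so that it contains an escaped quote.

```javascript
var literal = /'[^'\\]*(?:\\.[^'\\]*)*'/g;
var constructed = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");

// Both forms should describe the same pattern.
console.log(literal.source === constructed.source); // true

// Made-up sample whose runtime value is: say('it\'s', 'ok')
var sample = "say('it\\'s', 'ok')";

// With the g flag, match returns every match (or null when there is none).
var quoted = sample.match(constructed) || [];
console.log(quoted.length); // 2
console.log(quoted[0]);     // 'it\'s'
console.log(quoted[1]);     // 'ok'
```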
The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch here is that, as usual, match[0] is the entire matched text (this is very normal) so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time to match function calls with 2 or 3 parameters?
EDIT
In response to your request, here is the regex to match 1, 2, 3, ..., n parameters. Note that I used a non-capturing group (the (?: ) part) to find repeated instances of a comma followed by another parameter.
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/
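One caveat: in JavaScript, a capturing group inside a repeated group keeps only its last match, so this pattern does not return every parameter. A small sketch with a made-up three-parameter call shows this, alongside a global match that collects all of the quoted parameters:

```javascript
var call = "loadArea('a', 'b', 'c');"; // made-up call, for illustration only

var repeated = /.*?('[^']*')(?:\s*,\s*('[^']*'))*/.exec(call);
console.log(repeated[1]); // 'a'
console.log(repeated[2]); // 'c' (the middle parameter 'b' was overwritten)

// Collecting every quoted parameter with a global match instead:
var all = call.match(/'[^']*'/g) || [];
console.log(all); // ["'a'", "'b'", "'c'"]
```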
Maybe this:
'([^']*)'\s*,\s*'([^']*)'