regex lookbehind in javascript - javascript

I'm trying to match some words in text.
Working example (what I want), regex101:
regex = /(?<![a-z])word/g
text = word 1word !word aword
Only the first three words are matched, which is what I want to achieve.
But the lookbehind will not work in JavaScript :(
So now I'm trying this, regex101:
regex = /(\b|\B)word/g
text = word 1word !word aword
But now all four words match, and the word must not be preceded by another letter, only by a digit or a special character.
If I use only the lowercase "\b", 1word won't match, and if I use only "\B", !word won't match.
Edit
The output should be ["word","word","word"]
The 1 and ! must not be included in the match, nor in another group, because I want to use it with JavaScript's .replace(regex,function(match){}), which should not loop over the 1 and !
The code I use it for:
for(var i = 0; i < elements.length; i++){
text = elements[i].innerHTML;
textnew = text.replace(regexp,function(match){
matched = getCrosslink(match)[0];
return "<a href='"+matched.url+"'>"+match+"</a>";
});
elements[i].innerHTML = textnew;
}

Capturing the leading character
It's difficult to know exactly what you want without seeing more output examples, but how about matching the word when it either starts at a word boundary or is preceded by a non-letter? Like this, for example:
(\bword|[^a-zA-Z]word)
Output: ['word', '1word', '!word']
Here is a working example
Capturing only the "word"
If you only want the "word" part to be captured you can use the following and fetch the 2nd capture group:
(\b|[^a-zA-Z])(word)
Output: ['word', 'word', 'word']
Here is a working example
With replace()
You can use specific capture groups when defining the replace value, so this will work for you (where "new" is the word you want to use):
var regex = /(\b|[^a-zA-Z])(word)/g;
var text = "word 1word !word aword";
text = text.replace(regex, "$1" + "new");
output: "new 1new !new aword"
Here is a working example
If you are using a dedicated function in replace, try this (match1 is put back in front of the link so the leading character is not dropped):
textnew = text.replace(regexp,function (allMatch, match1, match2){
matched = getCrosslink(match2)[0];
return match1+"<a href='"+matched.url+"'>"+match2+"</a>";
});
Here is a working example
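For completeness: lookbehind assertions were added to JavaScript in ES2018, so on engines that implement them the pattern from the question can be used directly. A minimal sketch, assuming such an engine; getCrosslink below is a hypothetical stub standing in for the asker's function, and the URL it returns is made up:

```javascript
// Hypothetical stand-in for the asker's getCrosslink(); the URL format is an assumption.
function getCrosslink(word) {
  return [{ url: "/wiki/" + word }];
}

// Negative lookbehind: "word" must not be preceded by a lowercase letter.
// Requires an engine with ES2018 lookbehind support.
const lookbehindRegex = /(?<![a-z])word/g;

function linkWords(text) {
  // The lookbehind consumes nothing, so match is exactly "word"
  // and a preceding 1 or ! is left untouched.
  return text.replace(lookbehindRegex, function (match) {
    const matched = getCrosslink(match)[0];
    return "<a href='" + matched.url + "'>" + match + "</a>";
  });
}

const sample = "word 1word !word aword";
console.log(sample.match(lookbehindRegex)); // three matches, each "word"
console.log(linkWords(sample));
```

On engines without lookbehind the regex literal is a syntax error, so the capture-group approach above remains the portable choice.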

You can use the following regex
([^a-zA-Z]|\b)(word)
Then simply use replace like this:
var str = "word 1word !word aword";
str.replace(/([^a-zA-Z]|\b)(word)/g,"$1"+"<a>$2</a>");
Regex

Related

Extract text containing match between new line characters

I am trying to extract paragraphs from OCR'd contracts, using JS, if a paragraph contains key search terms. A user might search for something such as "ship ahead" to find clauses relating to whether a certain customer's orders can be shipped early.
I've been banging my head up against a regex wall for quite some time and am clearly just not grasping something.
If I have text like this and I'm searching for the word "match":
let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want."
I would want to extract all the text between the double \n characters and not return the second paragraph in that string.
I've been trying some form of:
let string = `[^\n\n]*match[^.]*\n\n`;
let re = new RegExp(string, "gi");
let body = text.match(re);
However that returns null. Oddly if I remove the periods from the string it works (sorta):
[
"This is an example of a paragraph that has the word I'm looking for The word is Match \n" +
'\n'
]
Any help would be awesome.
Extracting text between identical delimiters only when it contains some specific text is hard to do with a single regex without hacks related to context matching.
Thus, you may simply split the text into paragraphs and get those containing your match:
const results = text.split(/\n{2,}/).filter(x=>/\bmatch\b/i.test(x))
You may remove word boundaries if you do not need a whole word match.
See the JavaScript demo:
let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";
console.log(text.split(/\n{2,}/).filter(x=>/\bmatch\b/i.test(x)));
That's pretty easy if you use the fact that a . matches any character except line terminators by default. Use the regex /.*match.*/ with a greedy .* on both sides (add the i flag if "Match" should also be found):
const text = 'aaaa\n\nbbb match ccc\n\nddd';
const regex = /.*match.*/;
console.log(text.match(regex).toString());
Output:
bbb match ccc
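Two caveats with this approach: the pattern is case-sensitive and, without the g flag, returns only the first hit. A small sketch of a variant that handles both, assuming each paragraph sits on a single line (the function name and sample text are made up for illustration):

```javascript
// Returns every line of `text` that contains `term`, ignoring case.
// Assumes paragraphs contain no single line breaks, because . stops at line terminators.
function linesContaining(text, term) {
  // Escape regex metacharacters so the search term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(".*" + escaped + ".*", "gi");
  // match() with the g flag returns null when nothing matches.
  return (text.match(re) || []).map(line => line.trim());
}

const contract = "\n\nThe word is Match. \n\nThis paragraph does not have the word I want.";
console.log(linesContaining(contract, "match"));
```

If paragraphs can wrap over several lines, the split-and-filter approach is the safer option.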
Here are two ways to do it. I am not sure why you need to use a regular expression. Split seems much easier, doesn't it?
const text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want."
// regular expression one
function getTextBetweenLinesUsingRegex(text) {
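// [^\n]+ captures a run of non-newline characters; parentheses inside [...] would be literal
const regex = /\n\n([^\n]+)\n\n/;
const arr = regex.exec(text);
// exec returns null when nothing matches, so check before reading the group
if (arr && arr.length > 1) {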
return arr[1];
}
return null;
}
console.log(`getTextBetweenLinesUsingRegex: ${ getTextBetweenLinesUsingRegex(text)}`);
console.log(`simple: ${text.split('\n\n')[1]}`);

Find and replace all strings with a certain length JavaScript/Google Script

I am a JavaScript/GoogleScript Rookie, so please bear with me. I am trying to create a Script in Google Docs that will be able to locate all instances of words having exactly 10 characters and append an element to them which would in turn give me a url.
Example : Here is my link pineapples
I would like to find the 10-character string, here pineapples, and add google.com/ in front of each string that has a length of 10.
That would give me "Here is my link google.com/pineapples".
function myFunction() {
var str = document.getElementById(str.length=10);
var res = str.replace("str.length=10", "br"+"str.length=10");
This seems completely wrong, but all I can come up with for now.
You can make it work by using a Regex and then using a backreference to refer to the matching group.
Regex: (\S{10})
It has 3 parts:
\S matches any non-whitespace character (anything other than a space, tab or newline).
{10} matches the above character exactly 10 times.
() is the capturing group, which is referenced later in the replacement string as $1.
You can get more information here, which explains the above regex in detail.
You may change it to fit your need.
var stringVal = "Here is my link pineapples";
var stringReplaced = stringVal.replace(/(\S{10})/, "google.com/$1");
console.log(stringReplaced);
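Note that /(\S{10})/ also matches the first ten characters of a longer word and, without the g flag, only replaces the first occurrence. A sketch of a stricter variant that only touches whitespace-delimited tokens of exactly ten characters (the function name is an assumption for illustration; punctuation attached to a word counts toward its length):

```javascript
function prefixTenCharWords(input, prefix) {
  // (^|\s)     start of string or a whitespace character before the token
  // (\S{10})   exactly ten non-whitespace characters
  // (?=\s|$)   followed by whitespace or end of string (not consumed)
  return input.replace(/(^|\s)(\S{10})(?=\s|$)/g, function (whole, lead, word) {
    return lead + prefix + word;
  });
}

console.log(prefixTenCharWords("Here is my link pineapples", "google.com/"));
// An 11-character word such as "watermelons" is left alone.
console.log(prefixTenCharWords("watermelons pineapples", "google.com/"));
```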
Here is a possible solution:
Split your string using space as a separator (this will give you an array)
Test the length of each part in a loop
Prepend google.com/ if a part has 10 characters
Join your array and enjoy your transformed string
var str = "Here is my link pineapples",
arr = str.split(' ');
for (var i = 0; i < arr.length; i++) {
if (arr[i].length === 10) {
arr[i] = 'google.com/' + arr[i];
}
}
console.log(arr.join(' '));
Okay so bear with me, but my idea is as follows:
Are the pieces of text that you want to replace all within elements of the same class? If so, you could do something like this (jQuery, hope you don't mind):
function myFunction(){
// class selectors need a leading dot
$('.myClass').each(function(){
var innerText = $(this).text();
// keeps only the first nine characters of the element's text
var substring = innerText.substr(0,9);
$(this).text(substring);
});
}

Regex to get the text between two characters?

I want to replace the text after a forward slash and before a closing parenthesis, excluding those two characters.
My text:
<h3>notThisText/IWantToReplaceThis)</h3>
$('h3').text($('h3').text().replace(regEx, 'textReplaced'));
Wanted result after replace:
notThisText/textReplaced)
I have tried
regex = /([^\/]+$)+/ //replaces the parenthesis as well
regex = /\/([^\)]+)/ //replaces the slash as well
but as you can see in my comments, neither of these excludes both the slash and the closing parenthesis. Can someone help?
A pattern like /(?<=\/)[^)]+(?=\))/ only works in JavaScript engines that support lookbehind, which was added in ES2018; older engines reject it. To stay compatible, use one of the following solutions:
s.replace(/(\/)[^)]+(\))/, '$1textReplaced$2')
s.replace(/(\/)[^)]+(?=\))/, '$1textReplaced')
s.replace(/(\/)[^)]+/, '$1textReplaced')
s.replace(/\/[^)]+\)/, '/textReplaced)')
The (...) forms a capturing group that can be referenced to with $ + number, a backreference, from the replacement pattern. The first solution is consuming / and ), and puts them into capturing groups. If you need to match consecutive, overlapping matches, use the second solution (s.replace(/(\/)[^)]+(?=\))/, '$1textReplaced')). If the ) is not required at the end, the third solution (replace(/(\/)[^)]+/, '$1textReplaced')) will do. The last solution (s.replace(/\/[^)]+\)/, '/textReplaced)')) will work if the / and ) are static values known beforehand.
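As a quick check, the four variants can be run against the sample string from the question; on that input they should all produce the same result. The variable names below are only for illustration:

```javascript
const s = "notThisText/IWantToReplaceThis)";

const results = [
  s.replace(/(\/)[^)]+(\))/, "$1textReplaced$2"),  // both delimiters captured and restored
  s.replace(/(\/)[^)]+(?=\))/, "$1textReplaced"),  // ) only asserted, never consumed
  s.replace(/(\/)[^)]+/, "$1textReplaced"),        // stops before the first )
  s.replace(/\/[^)]+\)/, "/textReplaced)"),        // delimiters hard-coded in the replacement
];

results.forEach(r => console.log(r));
```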
You can use str.split('/')
var text = 'notThisText/IWantToReplaceThis';
var splited = text.split('/');
splited[1] = 'yourDesireText';
var output = splited.join('/');
console.log(output);
Try the following. In your case startChar = '/', endChar = ')', and origString = $('h3').text():
function customReplace(startChar, endChar, origString, replaceWith){
var strArray = origString.split(startChar);
return strArray[0] + startChar + replaceWith + endChar;
}
First of all, you didn't clearly define the format of the text you want to replace and of the part that should stay. For example:
Does notThisText contain any slash /?
Does IWantToReplaceThis contain any parenthesis )?
Since there are too many uncertainties, this answer only shows a pattern that matches your example exactly:
yourText.replace(/(\/).*?(\))/g, '$1textReplaced$2')
var text = "notThisText/IWantToReplaceThis";
text = text.replace(/\/.*/, "/whatever");
output: "notThisText/whatever"

JavaScript: Replacing characters on both sides of a string

What I want to do is to match characters enclosed by ^^ and replace those ^^ while maintaining the string. In other words, turning this:
^^This is a test^^ this is ^^another test^^
into this:
<sc>This is a test</sc> this is <sc>another test</sc>
I got the regex to match them:
\^{2}[^^]+\^{2}
But I'm stuck there. I'm not sure what to do with the other .replace parameter:
.replace(/\^{2}[^^]+\^{2}/g, WHAT_TO_ADD_HERE?)
Any ideas?
You can use replace with a regex and a capturing group, like this:
var text = '^^This is a test^^ this is ^^another test^^'.replace(/\^\^(.*?)\^\^/g, '<sc>$1</sc>')
Here is a piece of code you can use:
var re = /(\^{2})([^^]+)(\^{2})/g;
var str = '^^This is a test^^ this is ^^another test^^';
var subst = '<sc>$2</sc>';
var result = str.replace(re, subst);
This is just an enhancement of your regex pattern where I added capturing groups. To make sure you capture everything between the ^^ pairs, including single ^ characters, you can use only one capturing group and the . symbol with a lazy (non-greedy) quantifier:
var re = /\^{2}(.+?)\^{2}/g;
Have a look at the example.
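The difference between the two patterns shows up when the enclosed text itself contains a single ^. A small sketch with a made-up input (the variable names are only for illustration):

```javascript
const strict = /\^{2}([^^]+)\^{2}/g; // content may not contain ^ at all
const lazy = /\^{2}(.+?)\^{2}/g;     // content is anything up to the next ^^

const tricky = "^^a^b^^";
// The strict pattern finds no match here, so the string comes back unchanged.
console.log(tricky.replace(strict, "<sc>$1</sc>"));
// The lazy pattern accepts the single ^ inside the delimiters.
console.log(tricky.replace(lazy, "<sc>$1</sc>"));
console.log("^^This is a test^^ this is ^^another test^^".replace(lazy, "<sc>$1</sc>"));
```

Keep in mind that . does not match line breaks unless the s flag is used, so the lazy pattern will not span multiple lines as written.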
In this case you need to use the group index to wrap the content.
var content = "^^This is a test^^ this is ^^another test^^";
content.replace(/\^{2}(.*?)\^{2}/g, '<sc>$1</sc>');
The (.*?) groups the content between the markers; in the replacement string, use $1, where 1 is the index of the group.

Split multiple parts of a string in an HTML div with id

Hi, I need to extract part of a variable's value.
In my HTML file I get a dynamic value, something like this:
product/roe_anythin_anything-1.jpg
product/soe_anything_anything-2.jpg
I need to remove everything up to and including the / (slash) and everything from the first _ (underscore) onward, which should leave the roe or soe part.
I have used this function:
<script>
function splitSize(){
$('#splitSize').each(function(index) {
var mystr = $(this).html();
var mystr1 = /product\/(.*)-.*/.exec(mystr);
$(this).html(mystr1[1]);
//$(this).html(mystr1[0]);
});
}
splitSize();
</script>
With this I successfully get roe_anythin_anything; now I just need to remove everything from the _ onward.
Please suggest how I can do this.
This does what you asked using split. You could also use a regex to make it simpler.
var myStr = 'product/roe-1.jpg' ;
myStr = myStr.split('/')[1];
myStr = myStr.split('-')[0];
Working JS Fiddle
Use regex group capture
var myStr = 'product/roe-1.jpg';
var result = /product\/(.*)-.*/.exec(myStr)[1];
Breakdown:
/product\/
matches the initial product string and the / character (escaped so it's not interpreted as the end of the regex).
(.*)
matches your roe characters and keeps them in a capture group. Because .* is greedy, it takes everything up to, but not including, the last hyphen.
Then the hyphen is matched, then anything else.
This returns a 2-element array. Item 0 is the whole match, item 1 is the contents of the capture group.
See How do you access the matched groups in a JavaScript regular expression? for more details
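For the file names in the question, which contain underscores, the capture group can be narrowed so that it stops at the first underscore or hyphen. A sketch, with a function name chosen just for illustration:

```javascript
function extractSizeCode(path) {
  // [^_-]+ captures everything after "product/" up to the first _ or -
  const m = /product\/([^_-]+)/.exec(path);
  // exec returns null when the path does not match
  return m ? m[1] : null;
}

console.log(extractSizeCode("product/roe_anythin_anything-1.jpg"));
console.log(extractSizeCode("product/soe_anything_anything-2.jpg"));
console.log(extractSizeCode("product/roe-1.jpg"));
```

Returning null for non-matching input avoids the TypeError that indexing [1] on a failed exec would throw.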
