Google Docs API - Text.replaceText regex issues - javascript

I am trying to do something really basic.
It's just a search and replace using this function, which uses a proprietary regex dialect I have never used before.
https://developers.google.com/apps-script/reference/document/text#replaceText(String,String)
What I am trying to accomplish is simple: run through the whole document and replace placeholders with text.
The string to match is in this format:
#replace this please#
By using this pattern:
(\W|^)#replace this please#(\W|$)
copied from the Google examples found here (https://support.google.com/a/answer/1371417?hl=en).
It works absolutely fine, with one exception that bugs me.
If I have two or more placeholders on the same line, it won't match any of them.
So if I have something like this:
#replace me please# and some normal text here #replace me too#
Neither of them will be matched.
I am assuming my expression doesn't take this into account, but documentation for their implementation of regular expressions is very hard to find.
Can anybody help, please?

Assuming the document contains a line with the placeholder, you may try the following regex replacement function:
function googleDocsApi27827395() {
  var body = DocumentApp.getActiveDocument().getBody();
  body.replaceText("(\\W|^)#replace this please#(\\W|$)", "");
}
In the result, the \\W groups also match the symbol immediately before and immediately after the placeholder, so those characters are removed along with it. If you do not need that behavior, remove the (\\W|^) and (\\W|$).
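To see the difference outside Apps Script, here is a minimal sketch that uses plain JavaScript String.prototype.replace as a stand-in for replaceText. That substitution is an assumption (the two regex engines are not identical), and the sample sentence is invented for illustration.

```javascript
// Stand-in for body.replaceText: plain JavaScript replace with the g flag.
// Assumption: the sample sentence below is made up for this sketch.
const sample = "Dear #replace this please#, welcome";

// With the (\W|^) and (\W|$) groups, the neighbouring space and comma are consumed too.
const withGroups = sample.replace(/(\W|^)#replace this please#(\W|$)/g, "");

// Without them, only the placeholder itself is removed.
const withoutGroups = sample.replace(/#replace this please#/g, "");

console.log(JSON.stringify(withGroups));    // "Dear welcome"
console.log(JSON.stringify(withoutGroups)); // "Dear , welcome"
```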
If you have three different strings between #...# pairs, you can use alternation to build the regex:
body.replaceText("#(replace this please|replace me (please|too))#", "");
The line "#replace me please# and #replace this please# some normal text here #replace me too#" will turn into "and some normal text here" (plus the whitespace that surrounded the placeholders).
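The alternation can also be built from a list of placeholder names. The following is only a sketch in plain JavaScript, again using String.prototype.replace as a stand-in for replaceText; the helper names escapeRegExp and stripPlaceholders are hypothetical.

```javascript
// Placeholder names taken from the example line above.
const placeholders = ["replace this please", "replace me please", "replace me too"];

// Hypothetical helper: escape regex metacharacters so names are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove every #name# placeholder from one line of text.
function stripPlaceholders(line) {
  const pattern = new RegExp("#(" + placeholders.map(escapeRegExp).join("|") + ")#", "g");
  return line.replace(pattern, "");
}

const line = "#replace me please# and #replace this please# some normal text here #replace me too#";
const stripped = stripPlaceholders(line);
console.log(JSON.stringify(stripped)); // " and  some normal text here "
```

Because the pattern no longer consumes the characters around each placeholder, several placeholders on the same line do not interfere with one another.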

Related

Regex that matches everything except given full word

This is similar to javascript regular expression to not match a word, but the accepted answer shows a regex that doesn't match if the word is anywhere inside the line. I want to match everything except the full word, so the regex should still match "__lambda__".
This is also similar to my question Regex that match everything except the list of strings, but that one gives a solution with String::split and I want to use a normal match against the full string.
For example, it should match everything except /^lambda$/. I would prefer to have a regex for this, since it will fit my pattern matching function. It should match lambda_ or _lambda and everything else, except the full "lambda" token.
I've tried this:
/^(.(?!lambda))+$/
and
/^((?!lambda).)+$/
This works:
/^(.*(?<!lambda))$/
but I would prefer not to use a negative lookbehind if I can avoid it (because of browser support). Also, I have an interpreter written in JavaScript that I need this for, and I would like to use this in the guest language (I will not be able to compile the regex with Babel).
Is something like this possible with regex without lookbehind?
EDIT:
THIS IS NOT A DUPLICATE: I don't want to test whether something contains or doesn't contain the word, but whether something is not the exact word. There are no questions like that on Stack Overflow.
The question Regex: Match word not containing has an answer that is almost right for my case, but it's not the full answer to this question. (It would help in finding a solution, though.)
I was able to find the solution based on How to negate specific word in regex?
var re = /^(?!.*\blambda\b).*$/;
var words = ['lambda', '_lambda', 'lambda_', '_lambda_', 'anything'];
words.forEach(word => {
  console.log({word, match: word.match(re)});
});
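One thing to be aware of: the pattern above rejects any string that contains lambda as a whole word, for example "(lambda)". If the goal is strictly "anything except the exact string lambda", a negative lookahead anchored with $ is a possible alternative. This is a sketch under that assumption, not the solution the post settled on.

```javascript
// Assumption: the goal is "any string except exactly lambda".
const notExactlyLambda = /^(?!lambda$).*$/;
// The word-boundary pattern shown above, for comparison.
const noLambdaWord = /^(?!.*\blambda\b).*$/;

const samples = ["lambda", "_lambda", "lambda_", "_lambda_", "anything", "(lambda)"];
const table = samples.map(word => ({
  word,
  notExactlyLambda: notExactlyLambda.test(word),
  noLambdaWord: noLambdaWord.test(word),
}));

// Only the exact word "lambda" fails notExactlyLambda;
// noLambdaWord additionally rejects "(lambda)".
console.table(table);
```

Neither pattern needs lookbehind, so both avoid the browser-support concern raised in the question.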

javascript regex to match all custom commented out sections, but not other text

Say I have an HTML doc like so:
<!--FOO-->
some text
<!--BAR-->
some other text
<!--FOO-->
some more text
<!--BAR-->
How can I write a JavaScript regex that matches both cases of
<!--FOO-->anytext<!--BAR-->
but not the text in between ('some other text' in this case).
My regex that I thought would work is
/<!--FOO-->(.|\n)*<!--BAR-->/
but it catches the 'some other text' as well.
You need the non-greedy operator ?, like this:
/<!--FOO-->(.|\n)*?<!--BAR-->/
A slightly better version would be this, letting you actually capture the text between the comments:
/<!--FOO-->((?:\n|.)*?)<!--BAR-->/
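As a rough illustration, the capturing version can be combined with the g flag and String.prototype.matchAll to collect every FOO...BAR section from the sample document. The variable names are hypothetical.

```javascript
// The sample document from the question, joined with newlines.
const html = [
  "<!--FOO-->",
  "some text",
  "<!--BAR-->",
  "some other text",
  "<!--FOO-->",
  "some more text",
  "<!--BAR-->",
].join("\n");

// Non-greedy capture between the markers; the g flag lets matchAll find every section.
const sectionPattern = /<!--FOO-->((?:\n|.)*?)<!--BAR-->/g;

// Keep only the captured text, trimmed of the surrounding newlines.
const sections = Array.from(html.matchAll(sectionPattern), match => match[1].trim());

console.log(sections); // [ 'some text', 'some more text' ]
```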
That said, parsing HTML with regex rarely ends well... See here for the classic explanation of the problem. You are better off using a library, unless your parsing is limited to the very simple case in your question.

RegExp Expression any multiple characters with linebreaks and whitespaces

My regex is meant to find certain words in text, but not words inside the text of an element.
REGEXP
RegExp('\\b([^<(.*?)>(.?+)<\/(.*?)>])(' + wregex.join('|') + ')\\b(?=\\W)
EXAMPLE
This is some text that should be looked through
though this text <code>Should not be looked at </code> and this text is ok to
look at
So I'll explain the part of my regex that I am having trouble with:
([^<(.*?)>(.?+)<\/(.*?)>]) Do not match any text that starts with <element>; skip everything inside until </element>
That's the most important part. I've tried multiple methods and I'm not sure whether this regex is possible. I don't want to match anything from a basic HTML opening tag until the closing tag appears; after that, searching should start over.
EDIT
I know that regex shouldn't be used to parse HTML; this is looking through TEXT.
Assuming that the text you are searching over is correctly formed (as in, no tag mismatches) the following regex should work:
^([^<]*<([^>]*)>[^<]*</\2>)*[^<]*Your Text
This ensures that your text is outside of any open and closed set of tags, by matching all open and closed sets before getting to your text.
It won't work for nested tags. Regex is incapable of parsing arbitrarily nested tags.
However, please remember that you should not parse HTML with regex.
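A quick way to check the idea in JavaScript is sketched below. It uses [^<]* before the target so that any amount of plain text may precede it, escapes the slash for a regex literal, and uses the word "looked" as a hypothetical stand-in for "Your Text". The two test strings are invented.

```javascript
// Accept the target only when every tag before it has been opened and closed.
// "looked" is a hypothetical stand-in for "Your Text".
const outsideTags = /^([^<]*<([^>]*)>[^<]*<\/\2>)*[^<]*looked/;

// Invented samples: the target word is outside the element in one, inside in the other.
const outside = "this text <code>Should not be seen</code> can be looked at";
const inside = "this text <code>Should not be looked at</code> and nothing else";

console.log(outsideTags.test(outside)); // true
console.log(outsideTags.test(inside));  // false
```

Note that the backreference \2 expects the closing tag to repeat the full contents of the opening tag, so this sketch does not cope with opening tags that carry attributes.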
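Why cram everything into a single regex? It can be as simple as this: strip the elements first (the g flag removes all of them, not only the first), then search what is left. Notice that I'm using [^] instead of ., to also match newlines.
string.replace(/<[^]+?<\/[^]+?>/g, '').match(/what i really want to find/gi)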
And yes, this is prone to breakage, as any regex solution would be.
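Applied to the example text from the question, the two-step approach might look like the sketch below. The word list is hypothetical (the question builds it from wregex), and the g flag is used on the stripping regex so that every element is removed.

```javascript
// Example text from the question.
const text =
  "This is some text that should be looked through\n" +
  "though this text <code>Should not be looked at </code> and this text is ok to\n" +
  "look at";

// Hypothetical word list; the question joins its own list with "|".
const words = ["text", "looked"];

// Step 1: drop every element together with its contents ([^] also matches newlines).
const withoutElements = text.replace(/<[^]+?<\/[^]+?>/g, "");

// Step 2: search the remaining text for whole-word matches.
const wordPattern = new RegExp("\\b(" + words.join("|") + ")\\b", "gi");
const found = withoutElements.match(wordPattern) || [];

console.log(found); // [ 'text', 'looked', 'text', 'text' ]
```

The "looked" inside the code element is not reported because the element was removed before the search.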

Parsing Phrases with a Pipe Character Using JavaScript

I've been working on my Safari extension for saving content to Instapaper, and lately on enhancing its title parsing for bookmarks. For example, an article that I recently saved has a title that looks like this:
Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch
I want to use the JavaScript in my Safari extension to remove all of the text after the pipe character so that I can make the final bookmark look neater once it is saved to Instapaper.
I've attempted the title parsing successfully in a couple of similar cases using blocks of code that look like this:
if(safari.application.activeBrowserWindow.activeTab.title.search(' - ') != -1) {
  console.log(safari.application.activeBrowserWindow.activeTab.title);
  console.log(safari.application.activeBrowserWindow.activeTab.title.search(' - '));
  var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search(' - '));
  console.log(parsedTitle);
}
However, I got thrown for a loop once I tried doing the same thing with the pipe character, since JavaScript uses it as a special character. I've tried several bits of code to solve this problem. The most recent looks like this (attempting to use regular expressions and escape the pipe character):
if(safari.application.activeBrowserWindow.activeTab.title.search('/\|') != -1) {
  console.log(safari.application.activeBrowserWindow.activeTab.title);
  console.log(safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
  var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
  console.log(parsedTitle);
}
If anybody could give me a tip that works for this, your help would be greatly appreciated!
Your regex is malformed. It should be:
safari.application.activeBrowserWindow.activeTab.title.search(/\|/)
Note the lack of quotes; I'm using a regex literal here. Also, regex literals need to be delimited by /.
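The reason the quoted version misbehaves is that search converts a string argument into a regular expression. The sketch below compares the three forms on the title from the question; indexOf is a different technique from the regex literal suggested above, included because a plain substring search needs no escaping.

```javascript
const title = "Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch";

// A string argument is converted with new RegExp(). The string '/\|' is just "/|",
// which as a pattern means "a slash OR nothing", and the empty alternative
// matches at index 0.
const fromString = title.search('/\|');

// A regex literal with an escaped pipe finds the real position of "|".
const fromLiteral = title.search(/\|/);

// indexOf does a plain substring search, so no escaping is needed at all.
const fromIndexOf = title.indexOf("|");

console.log(fromString);                  // 0
console.log(fromLiteral === fromIndexOf); // true
```

With fromString equal to 0, substring(0, 0) returns an empty string, which matches the confusing result of the quoted attempt.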
Instead of searching and then replacing, you can simply do a replace with the following regex:
str = str.replace(/\|.*$/, "");
This will remove the | character and everything after it, if it exists.
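Wrapped in a small function, the replace approach could look like this. The function name stripSiteSuffix is hypothetical, and the leading \s* is an addition to the regex above so that the space before the pipe is dropped as well.

```javascript
// Hypothetical helper: drop " | Site Name" style suffixes from a page title.
// The \s* in front of the pipe is an addition that also removes the space before it.
function stripSiteSuffix(title) {
  return title.replace(/\s*\|.*$/, "");
}

const cleaned = stripSiteSuffix(
  "Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch"
);
console.log(cleaned);
// Report: Bing Users Disproportionately Affected By Malware Redirects

// Titles without a pipe are returned unchanged.
console.log(stripSiteSuffix("No separator here"));
```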

match text between two html custom tags but not other custom tags

I have something like the following:
<--customMarker>Test1<--/customMarker>
<--customMarker key='myKEY'>Test2<--/customMarker>
<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>
I need to be able to replace the text between the customMarker tags. I tried the following:
str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, 'item Replaced')
which works OK. I would also like to ignore custom inner tags and not match or replace them with text.
Also, I need a separate expression to extract the value of the attribute key='myKEY' from the tag with Test2.
Many thanks
EDIT
Actually, I am trying to find things between comment tags, but the comment tags were not displaying correctly, so I had to remove the '!'. There's a unique situation that requires comment tags. In any case, if anyone knows enough regex to help, that would be great. Thank you.
In the end, I did something like the following, in case anyone else needs it. Note that the word around town is that using regex with HTML tags is not ideal, so do your own research and make up your own mind. For me, it had to be done this way, mostly because I wanted to, but also because it simplified the job in this instance:
var retVal = str.replace(/<--customMarker>(.*?)<--\/customMarker>/g, function(token, match){
  // Question 2: extract attributes such as key='myKEY' from the tag.
  // This runs before the return below; placed after it, it would never execute.
  // Note: token only contains attributes if the outer pattern is widened to allow them.
  var attrPattern = /\w+\s*=\s*(["']).*?\1/g;
  var attrMatches = token.match(attrPattern); // array of name=value strings, or null

  // Question 1: ignore custom inner tags and do not match or replace them with text.
  var replacePattern = /<--customInnerMarker>(.*?)<--\/customInnerMarker>/g;
  // Remove inner tags from the match.
  match = $.trim(match.replace(replacePattern, ''));
  // Replace what is left with the required value (objParams is assumed to be defined elsewhere).
  return token.replace(match, objParams[match]);
})
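For comparison, here is a self-contained sketch of the same idea that runs without jQuery. It is not the code from the post: the outer pattern is widened to accept attributes, the whole marker is replaced rather than only its inner text, and the replacements object is a hypothetical stand-in for objParams.

```javascript
// Sample input from the question.
const input = [
  "<--customMarker>Test1<--/customMarker>",
  "<--customMarker key='myKEY'>Test2<--/customMarker>",
  "<--customMarker>Test3 <--customInnerMarker>Test4<--/customInnerMarker> <--/customMarker>",
].join("\n");

// Hypothetical lookup table standing in for objParams.
const replacements = { Test1: "one", Test2: "two", Test3: "three" };

const outerPattern = /<--customMarker([^>]*)>(.*?)<--\/customMarker>/g;
const innerPattern = /<--customInnerMarker>.*?<--\/customInnerMarker>/g;
const keyPattern = /\bkey\s*=\s*(["'])(.*?)\1/;

const keys = [];
const output = input.replace(outerPattern, (token, attrs, content) => {
  // Collect the value of the key attribute, if the opening tag has one.
  const keyMatch = attrs.match(keyPattern);
  if (keyMatch) keys.push(keyMatch[2]);

  // Ignore inner markers, then look up what is left.
  const label = content.replace(innerPattern, "").trim();
  return Object.prototype.hasOwnProperty.call(replacements, label)
    ? replacements[label]
    : token; // leave unknown markers untouched
});

console.log(output); // one, two, three on separate lines
console.log(keys);   // [ 'myKEY' ]
```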
Can't you use <customMarker> instead? Then you can just use getElementsByTagName('customMarker') and get the inner text and child elements from it.
A regex merely matches an item. Once you have that match, it is up to you what you do with it. This is part of the problem most people have with regular expressions: they try to combine the three different steps into one. The regex match is just the first step.
What you are asking for will not be possible with a single regex. You're going to need a mini state machine if you want to use regular expressions. That is, a logic wrapper around the matches such that it moves through each logical portion.
I would advise you to look in the standard API for a prebuilt engine to parse HTML, rather than rolling your own. If you do need to roll your own, read the flex manual to get a basic understanding of how regular expressions work and of the state machines you build with them. The best example would be the section on matching multiline C comments.
