I want to implement a sort of Glossary in a Velocity template (using Javascript). The following is the use case:
there are a large number of items whose description may contain references to predefined terms
there is a list of terms which are defined -> allGlossary
I want to automatically find and mark all items in the allGlossary list that appear in the description of all my items
Example:
allGlossary = ["GUI","RGB","fine","Color Range"]
Item Description: The interface (GUI) shall be generated using only a pre-defined RGB color range.
After running the script, I would expect the Description to look like this:
"The interface (GUI) shall be generated using only a pre-defined RGB Color Range."
NOTE: even though "fine" does appear in the Description (inside "pre-defined"), it shall not be marked, because it is not a whole word there.
I thought of splitting the description of each item into words but then I miss all the glossary items which have more than 1 word. My current idea is to look for each item in the list in each of the descriptions but I have the following limitations:
I need to find only matches for the exact items in the 2 lists (single and multiple words)
The search has to be case insensitive
Items found may be at the beginning or end of the text and separated by various symbols: space, comma, period, parentheses, brackets, etc.
I have the following code which works but is not case-insensitive:
#set($desc = $item.description)
#foreach($g in $allGlossary)
#set($desc = $desc.replaceAll("\b$g\b", "*$g*"))
#end##foreach
Can someone please help with making this case-insensitive? Or does anyone have a better way to do this?
Thanks!
UPDATE:
based on the answer below, I tried to do the following in my Velocity Template page:
#set($allGlossary = ["GUI","RGB","fine","Color Range"])
#set($itemDescription = "The interface (GUI) shall be generated using only a pre-defined RGB color range.")
<script type="text/javascript">
var allGlossary = new Array();
var itemDescription = "$itemDescription";
</script>
#foreach($a in $allGlossary)
<script type="text/javascript">
allGlossary.push("$a");
console.log(allGlossary);
</script>
#end##foreach
<script type="text/javascript">
console.log(allGlossary[0]);
</script>
The issue is that if I try to display the whole allGlossary Array, it contains the correct elements. As soon as I try to display just one of them (as in the example), I get the error Uncaught SyntaxError: missing ) after argument list.
You mentioned that you are using JavaScript for these calculations. One simple way would be to iterate over your allGlossary array, create a regular expression in each iteration, and use this expression to find and replace all occurrences in the text.
To find only values which are between word boundaries, you can use \b. This allows matches like (RGB) or Color Range?. To match case-insensitively, you can use the i flag, and to find every instance in a string (not just the first one), you can use the global flag g.
Dynamic creation of regular expressions (with variables in them) is only supported in JavaScript if you use the constructor notation of a regular expression (don't forget to escape the backslashes inside the string). For static regular expressions, you could also use: /\bRGB\b/ig. Here is the dynamic one:
new RegExp("\\b("+item+")\\b", 'gi');
Here is a fully functional example based on your sample string. It replaces every item of the allGlossary array with an HTML-wrapped version of it.
var allGlossary = ["GUI","RGB","fine","Color Range"];
var itemDescription = "The interface (GUI) shall be generated using only a pre-defined RGB color range.";
for(var i=0; i<allGlossary.length; i++) {
var item = allGlossary[i];
var regex = new RegExp("\\b("+item+")\\b", 'gi');
itemDescription = itemDescription.replace(regex, "<b>$1</b>");
}
console.log(itemDescription);
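One caveat with the loop above: a glossary term that contains regex metacharacters (a dot, parentheses, etc.) would be interpreted as a pattern, and a later term could match inside markup inserted by an earlier iteration. A minimal sketch of one way around both, assuming every term starts and ends with a word character so that \b still applies. The helper names escapeRegExp and markGlossary are made up for this illustration:

```javascript
// Escape characters that have a special meaning in a regular expression,
// so a glossary term such as "v1.0" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every whole-word, case-insensitive occurrence of a glossary term in <b>.
// markGlossary is a hypothetical helper name, not part of the answer above.
function markGlossary(description, glossary) {
  // Longer terms first, so "Color Range" wins over a shorter term such as "Color".
  var alternatives = glossary
    .slice()
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join("|");
  // One combined pattern: the text is scanned once, so inserted tags are never re-scanned.
  var regex = new RegExp("\\b(" + alternatives + ")\\b", "gi");
  return description.replace(regex, "<b>$1</b>");
}

var marked = markGlossary(
  "The interface (GUI) shall be generated using only a pre-defined RGB color range.",
  ["GUI", "RGB", "fine", "Color Range"]
);
console.log(marked);
```

The matched text keeps its original casing ("color range"), because $1 refers to what was found in the description, not to the glossary entry.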
If this is not your expected solution, you can leave a comment below.
I have used existing Javascript functions that I found online but I need more control and I think regular expressions will allow it using flags to control if case sensitive or not, multiline etc.
var words=['one','two','cat','oranges'];
var string='The cat ate two oranges and one mouse.';
pattern=???;
check=string.match(pattern);
if(check!==null){
//string matches all array elements
}else{
//string does not match all array words
}
What would the pattern be, and can it be constructed in JavaScript using the array as a source?
***I will need the same function on the backend in PHP, so it would be easier to create one regular expression and reuse it than to loop and then find an equivalent that also works in PHP.
***I would love to have as many options as possible, including change, replace and count, and regular expressions are meant for this. On the plus side, a regex should be faster than looping (searching the whole text for every element in the array), especially with a long array and a long string.
var words=['one','two','cat','oranges'];
let string='The cat ate two oranges and one mouse.';
words=words.map(function(value,index){return '(?=(.)*?\\b('+value+')\\b)'; }).join('');
let pattern=new RegExp(`${words}((.)+)`,'g');
if(string.match(pattern)!==null){
//string matches all elements
}else{
//string does not match all words
}
It will look for exact word matches, and you get the extra control you wanted through regex flags, for example for case-insensitive matching.
If you want to test or adjust it, you can do it here:
doregex.com
You can use the same regex to make changes within the text using a callback function.
You can create RegExp objects from your strings; this gives you further control over case sensitivity etc.
For example:
const patterns = [ 'ONE','two','cat','oranges'];
const string = 'The cat ate two oranges and one mouse.';
// Match in a case-insensitive way (the 'i' flag)
const result = patterns.every(pattern => new RegExp(pattern, 'i').test(string));
console.log("All patterns match:", result);
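Note that the snippet above matches substrings, so 'orange' would also be found inside 'oranges'. If whole words are required, a possible variation is to add word boundaries and escape each word before building the RegExp. This is only a sketch; containsAllWords is a hypothetical name:

```javascript
// Return true only if every word occurs in the text as a whole word.
// containsAllWords is a hypothetical helper name used for illustration.
function containsAllWords(text, words, caseInsensitive) {
  var flags = caseInsensitive ? "i" : "";
  return words.every(function (word) {
    // Escape regex metacharacters so the word is matched literally.
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp("\\b" + escaped + "\\b", flags).test(text);
  });
}

var text = "The cat ate two oranges and one mouse.";
console.log(containsAllWords(text, ["ONE", "two", "cat", "oranges"], true));
console.log(containsAllWords(text, ["ONE", "two", "cat", "oranges"], false));
console.log(containsAllWords(text, ["orange"], true));
```

The first call reports a match, the second fails on the case of "ONE", and the third fails because "orange" only occurs as part of "oranges".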
I am new to AngularJS. I have a regex which gets all the anchor tags. My regex is
/<a[^>]*>([^<]+)<\/a>/g
And I am using the match function on this string:
var str = 'abc.jagadale#gmail.com'
So Now I am using the code like
var value = str.match(/<a[^>]*>([^<]+)<\/a>/g);
Here I am expecting the output to be abc.jagadale#gmail.com, but I am getting exactly the same string as the input string. Can anyone please help me with this? Thanks in advance.
Why are you trying to reinvent the wheel?
You are trying to parse an HTML string with a regex, which quickly becomes a very complicated task. Just use the DOM or jQuery to get the link contents; they are made for this.
Put the HTML string as the HTML of a jQuery/DOM element.
Then search this newly created DOM element for all the a elements
inside it and return their contents in an array.
This is how your code should look:
var str = 'abc.jagadale#gmail.com';
var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
results.push($(this).text());
});
Demo:
var str = 'abc.jagadale#gmail.com';
var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
results.push($(this).text());
});
console.log(results);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
You need to capture the group inside the anchor tags. The regular expression already contains the inner group ([^<]+), but there are different ways to extract that inner text when matching.
The match function returns an array of matched elements: the first one is the text matched by the whole regular expression, and the following elements are the groups captured by the regular expression.
Try this:
var reg = /<a[^>]*>([^<]+)<\/a>/g
reg.exec(str)[1]
Also, the match function will include the captured groups in the returned array only if the g flag is not present; with the g flag it returns just the full matches.
Check https://javascript.info/regexp-groups for further documentation.
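If there are several links and you want the captured text of each one, matchAll (available in current browsers and Node.js) is one option. A small sketch; the markup in html is an assumed sample, since the original string is not shown in full:

```javascript
// Sample input: the markup is an assumption made for this illustration.
var html = '<a href="#one">first link</a> and <a href="#two">second link</a>';
var anchorPattern = /<a[^>]*>([^<]+)<\/a>/g;

// With the g flag, match() returns only the full matches, without groups.
var fullMatches = html.match(anchorPattern);

// matchAll() yields one result per match; index 1 is the first capture group.
var linkTexts = Array.from(html.matchAll(anchorPattern), function (m) {
  return m[1];
});

console.log(fullMatches);
console.log(linkTexts);
```

Here fullMatches holds the two complete anchor tags, while linkTexts holds only the text between the tags.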
Brief
Don't use regex for this. Regex is a great tool, don't get me wrong, but it's not what you're looking for. Regex cannot properly parse HTML and should only be used to do so if it's a limited, known set of HTML.
Try, for example, adding content:">" to your style attribute. You'll see your pattern now fails or gives you an incorrect result. I don't like to use this quote all the time, but I think it's necessary to use it in this case:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
Use builtin functions. jQuery makes this super easy to accomplish. See my Code section for a demonstration. It's way more legible than any regex variant.
Code
DOM from page
The following snippet gets all anchors on the actual page.
$("a").each(function() {
console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
abc.jagadale#gmail.com
abc2.jagadale#gmail.com
DOM in string
The following snippet gets all anchors in the string (converted to DOM element)
var s = `email3#domain.com
email4#domain.com`
$("<div></div>").html(s).find("a").each(function() {
console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
email1#domain.com
email2#domain.com
Given the use case of parsing a string, instead of having an actual DOM to work with, it does seem like regex is the way to go, unless you want to load the HTML into a document fragment and parse that.
One way to get all of your matches is to make use of split:
var htmlstr = "<p><a href='url'>asdf#bsdf.com</a></p>"
var matches = htmlstr.split(/<a.+?>([A-Za-z.#]+)<\/a>/).filter((t, i) => i % 2)
Using a regex with split returns all of the matches along with the text around them, then filtering by index % 2 will pare it down to just the regex matches.
I have a stringified JSON which looks like this:
...
"message":null,"elementId":["xyz1","l9ie","xyz1"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","xyz1","jkh9"]}}]}]
... it goes on.
The part I need to work on is the value of the elementId key. (The 2nd key in the first line, and the last key).
This key is present in multiple places in the JSON string. The value of this key is an array containing 4-character ids.
I need to replace one of these ids with a new one.
The kernel of the idea is something like:
var elemId = 'xyz1' // for instance
var regex = new RegExp(elemId, 'g');
var newString = jsonString.replace(regex, newRandomId);
jsonString = newString;
There are a couple of problems with this approach. The regex will match the id anywhere in the JSON. I need a regex which only matches it inside the elementId array; and nowhere else.
I'm trying to use a capturing group to match just the occurrences I need, but I can't quite crack it. I have:
/.*elementId":\[".*(xyz1).*"\]}}]/
But this doesn't match the 1st occurrence of 'xyz1' in the array.
So, firstly, I need a regex which can match all the 'xyz1's inside elementId; but nowhere else. The sequence of square and curly brackets after elementId ends doesn't change anywhere in the string, if that helps.
Secondly, even if I have a capturing group that works, string.replace doesn't act as expected. Instead of replacing just the match inside the capturing group, it replaces the whole match.
So, my second requirement is replacing only the captured groups, not the whole match.
What I need is a piece of JS code which will replace my 'xyz1's where needed and return the following string (assuming the newRandomId is 'abcd'):
"message":null,"elementId":["abcd","l9ie","abcd"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","abcd","jkh9"]}}]}]
Note that the value of 'sourceId' is unaffected.
EDIT: I have to work with the JSON. I can't parse it and work with the object since I don't know all the places the old id might be in the object and looping through it multiple times (for multiple elements) would be time-consuming
Assuming you can't just parse and change the JS object, you could use two regexes: one to extract the array and another to change the desired ids inside it:
var output = input.replace(/("elementId"\s*:\s*\[)((?:".{4}",?)*)(\])/g, function(_,start,content,end){
return start + content.replace(/"xyz1"/g, '"rand"') + end;
});
The arguments _, start, content, end are produced as result of the regex (documentation here):
_ is the whole matched string (from "elementId":[ to ]). I chose this name because it's an old convention for arguments you don't use
start is the first group ("elementId":[)
content is the second captured group, that is the internal part of the array
end is the third group, ]
Using the groups instead of hardcoding the start and end parts in the returned string serves two purposes
avoid duplication (DRY principle)
make it possible to have variable strings (for example in my regex I accept optional spaces after the :)
var input = document.getElementById("input").innerHTML.trim();
var output = input.replace(/("elementId":\s*\[)((?:".{4}",?)*)(\])/g, function(_,start,content,end){
return start + content.replace(/"xyz1"/g, '"rand"') + end;
});
document.getElementById("output").innerHTML = output;
Input:
<pre id=input>
"message":null,"elementId":["xyz1","l9ie","xyz1"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","xyz1","jkh9"]}}]}]
</pre>
Output:
<pre id=output>
</pre>
Notes:
it would be easy to do the whole operation in one regex if the searched id were not repeated within one array. But the present structure makes it easy to handle several ids to replace at once.
I use non-capturing groups (?:...) in order to unclutter the arguments passed to the outer replacement callback
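To illustrate the remark about handling several ids at once, here is a rough sketch in which the inner replace looks each id up in a map. The helper name replaceElementIds, the idMap parameter and the sample string are assumptions made for this example:

```javascript
// replaceElementIds is a hypothetical helper; idMap maps old ids to new ids.
function replaceElementIds(jsonString, idMap) {
  return jsonString.replace(
    /("elementId"\s*:\s*\[)((?:".{4}",?)*)(\])/g,
    function (_, start, content, end) {
      // Only the quoted 4-character ids inside the array are looked up.
      var replaced = content.replace(/"(.{4})"/g, function (whole, id) {
        return Object.prototype.hasOwnProperty.call(idMap, id)
          ? '"' + idMap[id] + '"'
          : whole; // ids without an entry in the map stay unchanged
      });
      return start + replaced + end;
    }
  );
}

// Shortened sample string, assumed for the demonstration.
var sampleJson = '{"elementId":["xyz1","l9ie","xyz1"],"sourceId":"xyz1",' +
  '"other":{"elementId":["dgff","xyz1","jkh9"]}}';
var updated = replaceElementIds(sampleJson, { xyz1: "abcd", dgff: "wxyz" });
console.log(updated);
```

As in the answer above, the value of sourceId is left alone because it is not inside an elementId array.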
I am trying to implement a wildcard search on a string array using JavaScript.
The wildcards I use are ? to represent a single character and * to represent multiple characters.
here is my string array
var sample = new Array();
sample[0] = 'abstract';
sample[1] = 'cabinet';
sample[2] = 'computer';
For example, I searched for the string 'ab*t' in the array, and the regular expression I used for this is '\ab.*t\', but the problem is that I get both 'abstract' and 'cabinet' as matching strings. I only want the strings that start with 'ab', not those where it appears in the middle.
So I modified my regexp like this: '\^ab.*t$\', but I still get the same result. Can somebody give me some tips on how this can be achieved?
You are using the wrong slashes: you should use forward slashes ('/') instead of backslashes ('\').
This will probably help: /^ab.*t$/
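If the search term is typed by a user, the wildcard string can be translated into an anchored regular expression programmatically. A minimal sketch, assuming ? means exactly one character and * means any run of characters (including none); wildcardToRegExp is a made-up helper name:

```javascript
// Convert a wildcard string into an anchored RegExp.
// wildcardToRegExp is a hypothetical helper name used for illustration.
function wildcardToRegExp(wildcard) {
  var source = wildcard
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except * and ?
    .replace(/\*/g, ".*")                 // * matches any run of characters
    .replace(/\?/g, ".");                 // ? matches exactly one character
  // ^ and $ force the pattern to cover the whole string.
  return new RegExp("^" + source + "$");
}

var sample = ["abstract", "cabinet", "computer"];
var matches = sample.filter(function (s) {
  return wildcardToRegExp("ab*t").test(s);
});
console.log(matches);
```

With the anchors in place, 'ab*t' selects only 'abstract', since 'cabinet' does not start with 'ab'.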
Ok, So I hit a little bit of a snag trying to make a regex.
Essentially, I want a string like:
error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z
to be parsed to read
error=some=new item
user=max
dateFrom=2013-01-15T05:00:00.000Z
dateTo=2013-01-16T05:00:00.000Z
So I want it to pull known keywords, and ignore other strings that have =.
My current regex looks like this:
(error|user|dateFrom|dateTo|timeFrom|timeTo|hang)\=[\w\s\f\-\:]+(?![(error|user|dateFrom|dateTo|timeFrom|timeTo|hang)\=])
So I'm using a list of known keywords, which is built dynamically, so that only they are recognized.
How could I write it to include this requirement?
You could use a replace like so:
var input = "error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z";
var result = input.replace(/\s*\b((?:error|user|dateFrom|dateTo|timeFrom|timeTo|hang)=)/g, "\n$1");
result = result.replace(/^\r?\n/, ""); // remove the leading newline
Result:
error=some=new item
user=max
dateFrom=2013-01-15T05:00:00.000Z
dateTo=2013-01-16T05:00:00.000Z
Another way to tokenize the string:
var tokens = inputString.split(/ (?=[^= ]+=)/);
The regex looks for a space that is followed by a sequence of characters other than space and equals sign ending with a =, and splits at those spaces.
Result:
["error=some=new item", "user=max", "dateFrom=2013-01-15T05:00:00.000Z", "dateTo=2013-01-16T05:00:00.000Z"]
Using the technique above with the keyword list from your question:
var tokens = inputString.split(/(?=\b(?:error|user|dateFrom|dateTo|timeFrom|timeTo|hang)=)/);
This will correctly split the input that Qtax pointed out in the comments: "error=user=max foo=bar"
["error=", "user=max foo=bar"]
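If the goal is to end up with key/value pairs, the tokens can be split once more at their first =. A rough sketch building on the keyword split above; the helper name parseFilters and the KEYWORDS constant are assumptions, and the \s* in front of the lookahead is an addition that drops the separating whitespace:

```javascript
var KEYWORDS = ["error", "user", "dateFrom", "dateTo", "timeFrom", "timeTo", "hang"];

// parseFilters is a hypothetical helper name used for illustration.
function parseFilters(inputString) {
  // Split before each known keyword followed by "=", swallowing preceding whitespace.
  var splitter = new RegExp("\\s*(?=\\b(?:" + KEYWORDS.join("|") + ")=)");
  var result = {};
  inputString.split(splitter).forEach(function (token) {
    var eq = token.indexOf("=");
    if (eq > 0) {
      // Everything after the first "=" is the value, including further "=" signs.
      result[token.slice(0, eq)] = token.slice(eq + 1).trim();
    }
  });
  return result;
}

var parsed = parseFilters(
  "error=some=new item user=max dateFrom=2013-01-15T05:00:00.000Z dateTo=2013-01-16T05:00:00.000Z"
);
console.log(parsed);
```

Unknown keys such as foo=bar stay part of the preceding value, which matches the behaviour of the split shown above.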