I am working on a feature that converts matching tags or keywords into links inside a particular DIV.
Background: I store each article's body and its related keywords in a database. When displaying an article on a web page, I pass the keywords as an array to a jQuery function, which then searches the text inside <div id="article-detail-desc"> ... </div> and converts each match into a link.
My code works, but it has flaws.
It doesn't search for whole words; it matches any occurrence, even when it is part of a longer word or of an HTML element, which breaks my HTML.
How can this function be modified so that it matches whole words only?
function HighlightKeywords(keywords)
{
var el = $("#article-detail-desc");
var language = "en-US";
var pid = 100;
var issueID = 10;
$(keywords).each(function()
{
var pattern = new RegExp("("+this+")", ["gi"]);
var rs = "<a class='ad-keyword-selected' href='en/search.aspx?Language="+language+"&PageId="+pid+"&issue="+issueID+"&search=$1' title='Search website for: $1'><span style='color:#990044; text-decoration:none;'>$1</span></a>";
el.html(el.html().replace(pattern, rs));
});
}
HighlightKeywords(["Amazon","Google","Starbucks","UK","US","tax havens","Singapore","Hong Kong","Dubai","New Jersey"]);
Link on fiddle http://jsfiddle.net/Dgysc/25/
I think the easiest way would be to use word boundaries. So you'd have:
var pattern = new RegExp("(\\b"+this+"\\b)", ["gi"]);
Edit:
Quick hack to make sure it doesn't match a keyword such as US inside the HTML tags themselves:
var pattern = new RegExp("(\\b"+this+"\\b)(?![^<]*?>)", ["gi"]);
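The word-boundary pattern with the tag-skipping lookahead can be sketched as a plain string function, which makes it easy to try outside the page. This is only a sketch under assumptions: `escapeRegExp` and `linkKeywords` are hypothetical helper names, the `href='#'` is a placeholder for the real search URL, and all keywords are combined into a single pattern so that links inserted for one keyword are not rescanned for the next.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex,
// so a keyword such as "C++" is matched literally.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap whole-word keyword matches that are not inside a tag.
function linkKeywords(html, keywords) {
  if (!keywords || keywords.length === 0) {
    return html;
  }
  var alternatives = keywords
    .slice()
    .sort(function (a, b) { return b.length - a.length; }) // longer phrases first
    .map(escapeRegExp)
    .join("|");
  // \b ... \b   : whole words only
  // (?![^<]*>)  : reject a match if a ">" follows before any "<",
  //               i.e. the match sits inside a tag
  var pattern = new RegExp("\\b(" + alternatives + ")\\b(?![^<]*>)", "gi");
  // href='#' is a placeholder, not the real search URL.
  return html.replace(pattern, "<a class='ad-keyword-selected' href='#'>$1</a>");
}

console.log(linkKeywords('<p class="US">US and USA, status</p>', ["US"]));
```

Only the standalone "US" in the text is wrapped; the attribute value, "USA" and "status" are left alone. The lookahead is still a heuristic and can be fooled by a literal ">" in the text.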
Since we are matching whole words directly, adding a space before and after each keyword may help:
var pattern = new RegExp("( "+this+" )", ["gi"]);
http://jsfiddle.net/Dgysc/28/
Related
I'm writing a program to scrape HTML text contained within a string variable and pick up all instances of text such as: Example for both h2 and h3 headers. I figured the best way to do this would be using RegExp, but I'm not exactly sure what the syntax for this should be. I'm implementing this within Google Apps Script and have the following code thus far for this function (I've omitted the url).
function scraper(){
var mainSheet =
SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
var url = "";
var xml = UrlFetchApp.fetch(url).getContentText();
var re = new RegExp();
}
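One possible shape for such a regular expression is sketched below. It assumes the headers look like `<h2>Example</h2>` or `<h3>Example</h3>` (the sample markup is not visible in the question, so that is a guess), and `extractHeaders` is a hypothetical helper name. In the script above it would be called with the fetched `xml` string.

```javascript
// Hypothetical helper: collect the inner text of every <h2> and <h3> element.
// Assumes simple, non-nested headers; a real HTML parser is more robust.
function extractHeaders(html) {
  // <(h[23])   : opening h2 or h3 tag, captured so the closing tag must match
  // [^>]*      : any attributes
  // ([\s\S]*?) : header content, as short as possible, newlines included
  var re = /<(h[23])\b[^>]*>([\s\S]*?)<\/\1>/gi;
  var headers = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    headers.push(match[2].trim());
  }
  return headers;
}

console.log(extractHeaders('<h2>First</h2><p>x</p><h3 class="a"> Second </h3><h4>No</h4>'));
```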
I'm making an online text editor for a website I'm building, and I use custom tags for the markup.
To make it easier to read, the markup is highlighted in blue, which I do by using the following code:
var imgOccurences = (informationText.match(/\[img/gi)).length;
for(var i = 0; i < imgOccurences; i++){
var imgLocation = informationText.indexOf('[img');
var endImgLocation = informationText.indexOf(']', imgLocation+1);
if(imgLocation != -1 && endImgLocation != -1){
var informationTextTemp1 = informationText.slice(0, imgLocation);
var informationTextTemp2 = informationText.slice(endImgLocation+1, -1);
var informationTextTemp3 = informationText.slice(imgLocation, endImgLocation+1);
informationTextTemp3 = "<span class='highlightWord'>"+informationTextTemp3+"</span>";
informationText = informationTextTemp1 + informationTextTemp3 + informationTextTemp2;
}
}
However, the problem I face is that, when converting the text to HTML, I cannot use the regular expressions I was already using for the other tags on the [img] tag, because I want to highlight the image tag and all of its contents, which include a URL.
So I decided to count all the occurrences of just the '[img' part of the [img] tag, look for the next occurrence of ']', slice that range out of the normal text, wrap it in a span to highlight it, and add it back to the normal text, all inside a for loop.
However, only the first occurrence of the [img] tag is highlighted, and I am unsure how to deal with this. Any help would be appreciated.
Basically, I need to take everything that looks like [img src='www.example.com/image.png'] and turn it into <span class='highlightWord'>[img src='www.example.com/image.png']</span>, and then put the result into the .innerHTML of the div called textHighlights.
Expected result:
The result I got:
You can do it much simpler since the .replace method accepts a regular expression as a parameter for the matching string.
informationText = informationText.replace(/(\[img.+?\])/gi, '<span class="highlightWord">$1</span>');
The above will replace all matches directly (by wrapping them in the span you want)
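A small runnable sketch of that replacement follows; `highlightImgTags` is a hypothetical wrapper name and the sample text is made up. As far as I can tell, the loop in the question only ever highlights the first tag because `indexOf('[img')` is called without a start position, so every pass finds the same, already wrapped, occurrence; the `slice(endImgLocation+1, -1)` call also drops the last character of the text.

```javascript
// Hypothetical wrapper around the one-line replace shown above.
function highlightImgTags(informationText) {
  // \[img : literal "[img"
  // .+?   : as few characters as possible ...
  // \]    : ... up to the first closing bracket
  // g, i  : every occurrence, any letter case
  return informationText.replace(/(\[img.+?\])/gi, '<span class="highlightWord">$1</span>');
}

var sample = "Intro [img src='www.example.com/a.png'] middle [IMG src='www.example.com/b.png'] end";
console.log(highlightImgTags(sample));
```

Both tags are wrapped, and the text before, between and after them is left unchanged.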
UPDATE: I am no longer specifically in need of the answer to this question - I was able to solve the (larger) problem I had in an entirely different way (see my comment). However, I'll check in occasionally, and if a viable answer arrives, I'll accept it. (It may take a week or three, though, as I'm only here sporadically.)
I have a string. It may or may not have HTML tags in it. So, it could be:
'This is my unspanned string'
or it could be:
'<span class="someclass">This is my spanned string</span>'
or:
'<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>'
or:
'<span class="no-text"><span class="silly-example"></span></span><span class="some-class">This is my spanned string</span>'
I want to find the index of a substring, but only in the portion of the string that, if the string were turned into a DOM element, would be (a) TEXT node(s). In the example, only in the part of the string that has the plain text This is my string.
However, I need the location of the substring in the whole string, not only in the plain text portion.
So, if I'm searching for "span" in each of the strings above:
searching the first one will return 13 (0-based),
searching the second will skip the opening span tag in the string and return 35 for the string span in the word spanned
searching the third will skip the empty span tag and the openings of the two nested span tags, and return 91
searching the fourth will skip the nested span tags and the opening of the second span tag, and return 100
I don't want to remove any of the HTML tags, I just don't want them included in the search.
I'm aware that attempting to use regex is almost certainly a bad idea, probably even for strings as simple as the ones my code will encounter, so please refrain from suggesting it.
I'm guessing I will need to use an HTML parser (something I've never done before). Is there one with which I can access the original parsed strings (or at least their lengths) for each node?
Might there be a simpler solution than that?
I did search around and wasn't able to find anyone asking this particular question before, so if someone knows of something I missed, I apologize for my faulty search skills.
The search could loop through the string character by character. While inside a tag, skip to the end of that tag; search only the text outside tags, and remember a partial match in case the text is interrupted by another tag, then continue the search after that tag.
Here is a little function I came up with:
function customSearch(haystack, needle){
var start = 0;
var a = haystack.indexOf(needle,start); // next candidate match
var b = haystack.indexOf('<',start); // next tag opening
// While a tag opens before the candidate match, skip past that tag and search again.
while(b < a && b != -1){
start = haystack.indexOf('>',b) + 1;
a = haystack.indexOf(needle,start);
b = haystack.indexOf('<',start);
}
return a;
}
It returns the results you expected based on your examples. Here is a JSFiddle where the results are logged in the console.
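A variant of the same scan, written as a walk over the text segments between tags, is sketched below. `indexOfInText` is a hypothetical name; the sketch assumes every "<" starts a tag, gives up on an unterminated tag instead of looping forever, and does not try to match text that is split by a tag.

```javascript
// Hypothetical helper: index of `needle` in `haystack`, ignoring anything inside <...>.
// The returned index refers to the whole string, tags included.
function indexOfInText(haystack, needle) {
  var pos = 0;
  while (pos < haystack.length) {
    var tagStart = haystack.indexOf("<", pos);
    var segmentEnd = tagStart === -1 ? haystack.length : tagStart;
    // Search only the text between the current position and the next tag.
    var found = haystack.slice(pos, segmentEnd).indexOf(needle);
    if (found !== -1) {
      return pos + found;
    }
    if (tagStart === -1) {
      break; // no more tags and no match in the trailing text
    }
    var tagEnd = haystack.indexOf(">", tagStart);
    if (tagEnd === -1) {
      break; // unterminated tag: stop rather than loop forever
    }
    pos = tagEnd + 1; // continue right after the tag
  }
  return -1;
}

console.log(indexOfInText('This is my unspanned string', 'span')); // 13
console.log(indexOfInText('<span class="someclass">This is my spanned string</span>', 'span')); // 35
```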
Let's start with your third example:
var desiredSubString = 'span';
var entireString = '<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
Strip all HTML tags from entireString, above, to establish textString:
var textString = entireString.replace(/(data-([^"]+"[^"]+"))/ig,"");
textString = textString.replace(/(<([^>]+)>)/ig,"");
You can then find the index of the start of the textString within the entireString:
var indexOfTextString = entireString.indexOf(textString);
Then you can find the index of the start of the substring you're looking for within the textString:
var indexOfSubStringWithinTextString = textString.indexOf(desiredSubString);
Finally you can add indexOfTextString and indexOfSubStringWithinTextString together:
var indexOfSubString = indexOfTextString + indexOfSubStringWithinTextString;
Putting it all together:
var entireString = '<span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
var desiredSubString = 'span';
var textString = entireString.replace(/(data-([^"]+"[^"]+"))/ig,"");
textString = textString.replace(/(<([^>]+)>)/ig,"");
var indexOfTextString = entireString.indexOf(textString);
var indexOfSubStringWithinTextString = textString.indexOf(desiredSubString);
var indexOfSubString = indexOfTextString + indexOfSubStringWithinTextString;
You could use the browser's own HTML parser and XPath engine to search only inside the text nodes and do whatever processing you need.
Here's a partial solution:
var haystack = ' <span class="no-text"></span><span class="some-class"><span class="other-class">This is my spanned string</span></span>';
var needle = 'span';
var elt = document.createElement('elt');
elt.innerHTML = haystack;
var iter = document.evaluate('.//text()[contains(., "' + needle + '")]', elt, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null).iterateNext();
if (iter) {
var position = iter.textContent.indexOf(needle);
var range = document.createRange();
range.setStart(iter, position);
range.setEnd(iter, position + needle.length);
// At this point, range points at the first occurrence of `needle`
// in `haystack`. You can now delete it, replace it with something
// else, and so on, and after that, set your original string to the
// innerHTML of the document fragment representing the range.
console.log(range);
}
JSFiddle.
I'm retrieving tweets from Twitter with the Twitter API and displaying them in my own client.
However, I'm having some difficulty highlighting the right search terms properly. I want an effect like the following:
The way I'm trying to do this in JS is with a function called highlightSearchTerms(), which takes the text of the tweet and an array of keywords to bold as arguments. It returns the text of the fixed tweet. I'm bolding keywords by wrapping them in a <span> that has the class .search-term.
I'm having a lot of problems, which include:
Running a simple replace doesn't preserve case
There is a lot of conflict with the keyword being in href tags
If I try to do a for loop with a replace, I don't know how to only modify search terms that aren't in an href, and that I haven't already wrapped with the span above
An example tweet I want to be able to handle:
Input:
This is a keyword. This is a <a href="http://search.twitter.com/q=%23keyword">
#keyword</a> with a hashtag. This is a link with kEyWoRd:
http://thiskeyword.com.
Expected Output:
This is a
<span class="search-term">keyword</span>
. This is a <a href="http://search.twitter.com/q=%23keyword"> #
<span class="search-term">keyword</span>
</a> with a hashtag. This is a link with
<span class="search-term">kEyWoRd</span>
:<a href="http://thiskeyword.com">http://this
<span class="search-term">keyword.com</span>
</a>.
I've tried many things, but unfortunately I can't quite find out the right way to tackle the problem. Any advice at all would be greatly appreciated.
Here is my code, which works for some cases but ultimately doesn't do what I want. It fails to handle the case where the keyword is in the latter half of the link (e.g. http://twitter.com/this_keyword). Sometimes it also, strangely, highlights the 2 characters before a keyword. I doubt the best solution would resemble my code too much.
function _highlightSearchTerms(text, keywords){
for (var i=0;i<keywords.length;i++) {
// create regex to find all instances of the keyword, catch the links that potentially come before so we can filter them out in the next step
var searchString = new RegExp("[http://twitter.com/||q=%23]*"+keywords[i], "ig");
// create an array of all the matched keyword terms in the tweet, we can't simply run a replace all as we need them to retain their initial case
var keywordOccurencesInitial = text.match(searchString);
// create an array of the keyword occurences we want to actually use, I'm sure there's a better way to create this array but rather than try to optimize, I just worked with code I know should work because my problem isn't centered around this block
var keywordOccurences = [];
if (keywordOccurencesInitial != null) {
for(var i3=0;i3<keywordOccurencesInitial.length;i3++){
if (keywordOccurencesInitial[i3].indexOf("http://twitter.com/") > -1 || keywordOccurencesInitial[i3].indexOf("q=%23") > -1)
continue;
else
keywordOccurences.push(keywordOccurencesInitial[i3]);
}
}
// replace our matches with search term
// the regex should ensure to NOT catch terms we've already wrapped in the span
// i took the negative lookbehind workaround from http://stackoverflow.com/a/642746/1610101
if (keywordOccurences != null) {
for(var i2=0;i2<keywordOccurences.length;i2++){
var searchString2 = new RegExp("(q=%23||http://twitter.com/||<span class='search-term'>)?"+keywordOccurences[i2].trim(), "g"); // don't replace what we've alrdy replaced
text = text.replace(searchString2,
function($0,$1){
return $1?$0:"<span class='search-term'>"+keywordOccurences[i2].trim()+"</span>";
});
}
}
}
return text;
}
Here's something you can probably work with:
var getv = document.getElementById('tekt').value;
var keywords = "keyword,big elephant"; // comma delimited keyword list
var rekeywords = "(" + keywords.replace(/\, ?/ig,"|") + ")"; // wraps keywords in ( and ), and changes , to a pipe (character for regex alternation)
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
alert(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
And here is a variation that attempts to deal with word forms. If a keyword ends with a common suffix (es, ing, ed, d, s, e), the suffix is chopped off, and the pattern then accepts any of those suffixes before the closing word boundary. It's not perfect: for instance, the past tense of ride is rode, and accounting for that with regex is nigh-impossible without opening yourself up to tons of false positives.
var getv = document.getElementById('tekt').value;
var keywords = "keywords,big elephant";
var rekeywords = "(" + keywords.replace(/(es|ing|ed|d|s|e)?\b(\s*,\s*|$)/ig,"(es|ing|ed|d|s|e)?$2").replace(/,/g,"|") + ")";
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
console.log(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
Edit
This is just about perfect. Do you know how to slightly modify it so the keyword in thiskeyword.com would also be highlighted?
Change this line
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
to (All I did was remove both \\b's):
var keyrex = new RegExp("(#?" + rekeywords + ")(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
But be warned: you'll have problems such as "smiles" ending up with the "mile" in the middle highlighted (if a user searches for mile), and there's nothing regex can do about that. Regex's definition of a word is a run of alphanumeric characters (plus underscore); it has no dictionary to check.
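A different way to keep matches out of tags, instead of a lookahead, is to split the text on tags and only touch the pieces in between. The sketch below uses the `highlightSearchTerms` name from the question; `escapeRegExp` is a hypothetical helper, and the code assumes every `<...>` run is a tag. Because the replacement reuses the matched text (`$1`), the original letter case is preserved.

```javascript
// Hypothetical helper: escape regex metacharacters in a keyword.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function highlightSearchTerms(text, keywords) {
  if (!keywords || keywords.length === 0) {
    return text;
  }
  var pattern = new RegExp("(" + keywords.map(escapeRegExp).join("|") + ")", "gi");
  return text
    // The capturing group keeps the tags in the result, so the array
    // alternates: text, tag, text, tag, ...
    .split(/(<[^>]*>)/)
    .map(function (part, index) {
      if (index % 2 === 1) {
        return part; // a tag (href values included): leave untouched
      }
      return part.replace(pattern, '<span class="search-term">$1</span>');
    })
    .join("");
}

console.log(highlightSearchTerms(
  'This is a keyword. This is a <a href="http://search.twitter.com/q=%23keyword">#keyword</a> with kEyWoRd.',
  ["keyword"]
));
```

The three occurrences in the visible text are wrapped, while the one inside the `href` attribute is not.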
I am a newbie in programming and I have a question that has been asked many times in the past. I need to implement highlighting in some fashion. I have seen a jQuery plugin (SearchHighlight). jQuery uses selectors, and to be honest I don't want to use them. What I need is to "feed" the plugin the string to be highlighted and the string that holds the search words via variables, e.g.:
var searchterms = 'lolo loli let';
var searchstring = ' Lolo loves loli and .... Blah, blah';
var highlightedstring = '';
// SearchHighlight plugin
highlightedstring return;
If the above is not possible is there a way in pure JavaScript to do substring highlighting?
With respect,
Tom
Greece
The concept behind highlighting is to grab the HTML from the container you're searching and wrap the word(s) found in tags that you can style. This is a simple example with just one word, but it could be extended to an array of words.
function highlight(word, content) {
var html = $(content).html();
var re = new RegExp(word, 'gi');
$(content).html(html.replace(re, '<code>$&</code>'));
}
highlight('pleasure', '#content');
Demo: http://jsfiddle.net/elclanrs/hdYmb/
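For the string-in, string-out case described in the question, the same idea can be sketched without jQuery or selectors. `highlightTerms` is a hypothetical name; the sketch assumes the input is plain text with no markup, that the search words are separated by whitespace, and it reuses the `<code>` wrapper from the example above.

```javascript
// Hypothetical helper: wrap every occurrence of any search word in <code> tags.
function highlightTerms(searchstring, searchterms) {
  var words = searchterms
    .split(/\s+/)
    .filter(Boolean) // drop empty entries caused by leading/trailing spaces
    .map(function (word) {
      // Escape regex metacharacters so each word is matched literally.
      return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    });
  if (words.length === 0) {
    return searchstring;
  }
  var re = new RegExp("(" + words.join("|") + ")", "gi");
  return searchstring.replace(re, "<code>$1</code>");
}

var searchterms = 'lolo loli let';
var searchstring = ' Lolo loves loli and .... Blah, blah';
var highlightedstring = highlightTerms(searchstring, searchterms);
console.log(highlightedstring);
```

Adding `\b` around the group would restrict matches to whole words, at the cost of missing terms embedded in longer words.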