Regex for visible text, not HTML - javascript

If I had a string:
hey user, what are you doing?
How could I use a regex to say: look for user, but not inside < or > characters? So the match would grab the user between the <a></a> but not the one inside the href.
I'd like this to work for any tag, so it won't matter which tags are used.
== Update ==
The reason I can't use .text() or innerText is that this is being used to highlight results, much like the native cmd/ctrl+f functionality in browsers, and I don't want to lose formatting. For example, if I search for strong here:
Some <strong>strong</strong> text.
If I use .text() it'll return "Some strong text", and then I'll wrap strong with a <span> that has a class for styling, but when I go back and try to insert this into the DOM it'll be missing the <strong> tags.

If you plan to replace the HTML using html() again, then you will lose all event handlers that might be bound to inner elements, along with their data (as I said in my comment).
Whenever you set the content of an element as HTML string, you are creating new elements.
It might be better to recursively apply this function to every text node only. Something like:
$.fn.highlight = function(word) {
    var pattern = new RegExp(word, 'g'),
        repl = '<span class="high">' + word + '</span>';
    this.each(function() {
        $(this).contents().each(function() {
            if (this.nodeType === 3 && pattern.test(this.nodeValue)) {
                $(this).replaceWith(this.nodeValue.replace(pattern, repl));
            }
            else if (!$(this).hasClass('high')) {
                $(this).highlight(word);
            }
        });
    });
    return this;
};
It could very well be that this is not very efficient though.
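One thing to watch with the plugin above: word goes into new RegExp as-is, and the text node's value is put back as HTML, so a search for something like "a.b" or text containing "<" can misbehave. Here is a minimal sketch of how the replacement string for a single text node could be built more defensively. The helper names (escapeRegExp, escapeHtml, highlightText) are my own and not part of jQuery; treat this as an illustration under those assumptions, not a drop-in fix.

```javascript
// Hypothetical helpers (not part of jQuery) for building the replacement
// markup for ONE text node's value.

// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape characters that would otherwise be parsed as markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}

// Return an HTML string in which every literal occurrence of `word`
// in the plain-text `text` is wrapped in <span class="high">.
function highlightText(text, word) {
  if (!word) return escapeHtml(text);
  var pattern = new RegExp(escapeRegExp(word), 'g');
  var out = '', last = 0, m;
  while ((m = pattern.exec(text)) !== null) {
    out += escapeHtml(text.slice(last, m.index)) +
           '<span class="high">' + escapeHtml(m[0]) + '</span>';
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlightText('a.b < a.b', 'a.b'));
// <span class="high">a.b</span> &lt; <span class="high">a.b</span>
```

Inside the plugin, the result of highlightText(this.nodeValue, word) could be passed to replaceWith in place of the raw replace() call.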

To emulate Ctrl-F (which I assume is what you're doing), you can use window.find for Firefox, Chrome, and Safari and TextRange.findText for IE.
You should use a feature detect to choose which method you use:
function highlightText(str) {
if (window.find)
window.find(str);
else if (window.TextRange && window.TextRange.prototype.findText) {
var bodyRange = document.body.createTextRange();
bodyRange.findText(str);
bodyRange.select();
}
}
Then, after the text is selected, you can style the selection with CSS using the ::selection selector.
Edit: To search within a certain DOM object, you could use a roundabout method: use window.find and see whether the selection is in a certain element. (Perhaps say s = window.getSelection().anchorNode and compare s.parentNode == obj, s.parentNode.parentNode == obj, etc.). If it's not in the correct element, repeat the process. IE is a lot easier: instead of document.body.createTextRange(), you can use obj.createTextRange().

$("body > *").each(function (index, element) {
var parts = $(element).text().split("needle");
if (parts.length > 1)
$(element).html(parts.join('<span class="highlight">needle</span>'));
});
at this point it's evolving to be more and more like Felix's, so I think he's got the winner
original:
If you're doing this in javascript, you already have a handy parsed version of the web page in the DOM.
// gives "user"
alert(document.getElementById('user').innerHTML);
or with jQuery you can do lots of nice shortcuts:
alert($('#user').html()); // same as above
$("a").each(function (index, element) {
alert(element.innerHTML); // shows label text of every link in page
});

I like regexes, but because tags can be nested, you will have to use a parser. I recommend http://simplehtmldom.sourceforge.net/; it is really powerful and easy to use. If you have well-formed XHTML you can also use SimpleXML from PHP.
edit: Didn't see the javascript tag.

Try this:
/[(<.+>)(^<)]*user[(^>)(<.*>)]/
It means:
Before the keyword, you can have any number of <...> groups or non-< characters.
Likewise after it.
EDIT:
The correct one would be:
/((<.+>)|(^<))*user((^>)|(<.*>))*/

Here is what works, I tried it on your JS Bin:
var s = 'hey user, what are you doing?';
s = s.replace(/(<[^>]*)user([^<>]*>)/g,'$1NEVER_WRITE_THAT_ANYWHERE_ELSE$2');
s = s.replace(/user/g,'Mr Smith');
s = s.replace(/NEVER_WRITE_THAT_ANYWHERE_ELSE/g,'user');
document.body.innerHTML = s;
It may be a tiny little bit complicated, but it works!
Explanation:
1. Replace the "user" that is inside the tag (which is easy to find) with a random string of your choice that you must never use again... ever. A good choice would be its hash (md5, sha-1, ...).
2. Replace every remaining occurrence of "user" with the text you want.
3. Replace your unique string with "user" again.
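As an alternative to the placeholder round trip, a single pass can match either a whole tag or the keyword, and only rewrite the keyword. This is a sketch rather than the method used above: replaceOutsideTags is a made-up name, the sample string is hypothetical, and it assumes attribute values never contain a literal ">".

```javascript
// A sketch: match either a complete tag or the keyword, and only rewrite
// the keyword. Assumes no literal ">" appears inside attribute values.
function replaceOutsideTags(html, word, replacement) {
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp('<[^>]*>|' + escaped, 'g');
  return html.replace(pattern, function (match) {
    // Tags are returned untouched; anything else is the keyword.
    return match.charAt(0) === '<' ? match : replacement;
  });
}

// Hypothetical sample input, in the spirit of the question.
var s = 'hey <a href="/user">user</a>, what are you doing?';
console.log(replaceOutsideTags(s, 'user', 'Mr Smith'));
// hey <a href="/user">Mr Smith</a>, what are you doing?
```

Because the tag alternative comes first and consumes the whole tag, the keyword inside the href is never seen on its own, so no placeholder string is needed.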

This code will strip all tags from the string:
var s = 'hey user, what are you doing?';
s = s.replace(/<[^<>]+>/g,'');

Related

Select tags that starts with "x-" in jQuery

How can I select nodes whose tag name begins with "x-"? Here is an example DOM tree hierarchy:
<div>
    <x-tab>
        <div></div>
        <div>
            <x-map></x-map>
        </div>
    </x-tab>
</div>
<x-footer></x-footer>
jQuery does not allow me to query $('x-*'), is there any way that I could achieve this?
The code below works fine, though I am not sure about performance since I am using a regex.
$('body *').filter(function(){
return /^x-/i.test(this.nodeName);
}).each(function(){
console.log(this.nodeName);
});
PS: In the above sample, I am treating the body tag as the parent element.
UPDATE:
After checking Mohamed Meligy's post, it seems the regex is faster than string manipulation in this case, and it could become even faster (or stay the same) if we use find. Something like this:
$('body').find('*').filter(function(){
return /^x-/i.test(this.nodeName);
}).each(function(){
console.log(this.nodeName);
});
UPDATE 2:
If you want to search the whole document, you can do the following, which is the fastest:
$(Array.prototype.slice.call(document.all)).filter(function () {
return /^x-/i.test(this.nodeName);
}).each(function(){
console.log(this.nodeName);
});
There is no native way to do this, and filtering by hand performs worse than a native selector would, so just do it yourself.
Example:
var results = $("div").find("*").filter(function(){
return /^x\-/i.test(this.nodeName);
});
Full example:
http://jsfiddle.net/6b8YY/3/
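For what it's worth, the two details that matter in the filter are the ^ anchor and the case-insensitive comparison: in an HTML document nodeName is reported in upper case (X-TAB, not x-tab). The same check can be sketched without a regex. filterByTagPrefix is a hypothetical helper name, and the sample objects are stand-ins for real DOM nodes.

```javascript
// Hypothetical helper: keep the nodes whose tag name starts with `prefix`.
// Works on any array-like of objects exposing `nodeName`, so in a browser
// it could be fed document.getElementsByTagName('*').
function filterByTagPrefix(nodes, prefix) {
  var wanted = prefix.toLowerCase();
  return Array.prototype.filter.call(nodes, function (node) {
    // indexOf(...) === 0 means "starts with", not merely "contains"
    return node.nodeName.toLowerCase().indexOf(wanted) === 0;
  });
}

// Stand-in objects; in an HTML document nodeName is upper case.
var sample = [{ nodeName: 'DIV' }, { nodeName: 'X-TAB' },
              { nodeName: 'X-MAP' }, { nodeName: 'BOX-X-' }];
console.log(filterByTagPrefix(sample, 'x-').map(function (n) {
  return n.nodeName;
}));
// [ 'X-TAB', 'X-MAP' ]
```

Note that 'BOX-X-' contains "x-" but does not start with it, which is exactly the case an unanchored test would get wrong.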
Notes: (Updated, see comments)
If you are wondering why I use this way for checking tag name, see:
JavaScript: case-insensitive search
and see comments as well.
Also, if you are wondering about the find method instead of adding to selector, since selectors are matched from right not from left, it may be better to separate the selector. I could also do this:
$("*", $("div")). Preferably though instead of just div add an ID or something to it so that parent match is quick.
In the comments you'll find a proof that it's not faster. This applies to very simple documents though I believe, where the cost of creating a jQuery object is higher than the cost of searching all DOM elements. In realistic page sizes though this will not be the case.
Update:
I also really like Teifi's answer. You can do it in one place and then reuse it everywhere. For example, let me mix my way with his:
// In some shared libraries location:
$.extend($.expr[':'], {
x : function(e) {
return /^x\-/i.test(e.nodeName); // e is the element being tested
}
});
// Then you can use it like:
$(function(){
// One way
var results = $("div").find(":x");
// But even nicer, you can mix with other selectors
// Say you want to get <a> tags directly inside x-* tags inside <section>
var anchors = $("section :x > a");
// Another example to show the power, say using a class name with it:
var highlightedResults = $(":x.highlight");
// Note I made the CSS class right most to be matched first for speed
});
It's the same performance hit, but more convenient API.
It might not be efficient, but consider it as a last option if you do not get any answer.
Try adding a custom attribute to these tags. What I mean is, when you add a tag, e.g. <x-tag>, add a custom attribute to it and assign it the same value as the tag name, so the HTML looks like <x-tag CustAttr="x-tag">.
Now to get tags starting with x-, you can use the following jQuery code:
$("[CustAttr^=x-]")
and you will get all the tags that start with x-
custom jquery selector
jQuery(function($) {
$.extend($.expr[':'], {
X : function(e) {
return /^x-/i.test(e.tagName);
}
});
});
Then use $(":X") or $("*:X") to select your nodes.
Although this does not answer the question directly it could provide a solution, by "defining" the tags in the selector you can get all of that type?
$('x-tab, x-map, x-footer')
Workaround: if you want this thing more than once, it might be a lot more efficient to add a class based on the tag - which you only do once at the beginning, and then you filter for the tag the trivial way.
What I mean is,
function addTagMarks() {
// call when the document is ready, or when you have new tags
var prefix = "tag--"; // choose a prefix that avoids collision
var newbies = $("*").not("[class^='"+prefix+"']"); // skip what's done already
newbies.each(function() {
var tagName = $(this).prop("tagName").toLowerCase();
$(this).addClass(prefix + tagName);
});
}
After this, you can do a $("[class^='tag--x-']") or the same thing with querySelectorAll and it will be reasonably fast.
See if this works!
function getXNodes() {
var regex = /^x-/i, i = 0, totalnodes = []; // anchored and case-insensitive: nodeName is upper case in HTML
while (i !== document.all.length) {
if (regex.test(document.all[i].nodeName)) {
totalnodes.push(document.all[i]);
}
i++;
}
return totalnodes;
}
var i=0;
for(i=0; i< document.all.length; i++){
if(document.all[i].nodeName.toLowerCase().indexOf('x-') === 0){ // starts with "x-"
$(document.all[i]).addClass('test');
}
}
Try this
var test = $('[x-]');
if(test)
alert('eureka!');
Basically jQuery selector works like CSS selector.
Read jQuery selector API here.

Select numbers on a page with jQuery or Javascript

I'm just wondering if there's a way to locate numbers on a page with jQuery or plain Javascript.
Here's what I want to do:
Say "June 23" is on the page. What I want to do is prepend and append some <span> tags to the number.
Using :contains() with jQuery selects the whole thing, not just the number.
These strings are being generated without any wrapping elements by a WordPress theme I'm working on, and I only want to select the number.
Any help would be appreciated! Thanks for even thinking about it.
-George
You can walk through all the elements, looking at text nodes, and replacing them with updated content that has the number wrapped.
var regex = /(\d+)/g, // global, so every number in a text node is wrapped
    replacement = '<span>$1</span>';
function replaceText(el) {
    if (el.nodeType === 3) {
        if (regex.test(el.data)) {
            var temp_div = document.createElement('div');
            temp_div.innerHTML = el.data.replace(regex, replacement);
            var nodes = temp_div.childNodes;
            // childNodes is live: each insertBefore moves nodes[0] out of temp_div
            while (nodes[0]) {
                el.parentNode.insertBefore(nodes[0], el);
            }
            el.parentNode.removeChild(el);
        }
    } else if (el.nodeType === 1) {
        // iterate backwards so the newly inserted <span>s are not visited again
        for (var i = el.childNodes.length - 1; i >= 0; i--) {
            replaceText(el.childNodes[i]);
        }
    }
}
replaceText(document.body);
Example: http://jsfiddle.net/JVsM4/
This doesn't do any damage to existing elements, and their associated jQuery data.
EDIT: You could shorten it a bit with a little jQuery:
var regex = /(\d+)/g,
replacement = '<span>$1</span>';
function replaceText(i,el) {
if (el.nodeType === 3) {
if (regex.test(el.data)) {
$(el).replaceWith(el.data.replace(regex, replacement));
}
} else {
$(el).contents().each( replaceText );
}
}
$('body').each( replaceText );
Example: http://jsfiddle.net/JVsM4/1/
Note that the regex requires the g global modifier.
Probably a little slower this way, so if the DOM is quite large, I'd use the non-jQuery version.
Just thinking out loud, but do you reckon this would work?
document.body.innerHTML = document.body.innerHTML.replace(/(\d+)/g, "<span class='number'>$1</span>")
It is fully dependent on what format your date is.
I found this website with a lot of different regular expressions (because you are just searching a normal piece of text for a date).
This seems a good option if this is your format for your date (dd MMM yyyy): http://regexlib.com/REDetails.aspx?regexp_id=405
I assume, because it is a template, that your dates will be the same for all pages. So the format will be the same as well. You can use the regular expression on every piece of text on your template if you define it well.
You can also select decimal numbers that contain commas as thousands separators:
let regex = /([,\d]*\.?\d+)/g;
This will match 1234 and 1,234 and 1234.5678 and 1,234.5678 and 0.5678 and .5678.
Refer to the above answer for full solution.
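A quick way to check what the pattern accepts is to run it over a sample string. The snippet below does that, and also shows that the loose character class lets stray commas through. The stricter pattern is my own suggestion, not part of the answer above, and assumes commas only ever appear as three-digit group separators.

```javascript
// The pattern from the answer: digits and commas, optional dot, digits.
var loose = /([,\d]*\.?\d+)/g;

var sample = 'Sizes: 1234 1,234 1234.5678 1,234.5678 0.5678 .5678';
console.log(sample.match(loose));
// [ '1234', '1,234', '1234.5678', '1,234.5678', '0.5678', '.5678' ]

// Caveat: any run of commas and digits is accepted.
console.log(',,5'.match(loose));
// [ ',,5' ]

// A stricter variant (an assumption on my part): either properly grouped
// thousands, or a plain run of digits with an optional fraction.
var strict = /\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d*\.?\d+/g;
console.log(sample.match(strict));
// [ '1234', '1,234', '1234.5678', '1,234.5678', '0.5678', '.5678' ]
console.log(',,5'.match(strict));
// [ '5' ]
```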

Regex to search html return, but not actual html jQuery

I'm making a highlighting plugin for a client to find things in a page, and I decided to test it with a help viewer I'm still building, but I'm having an issue that'll (probably) require some regex.
I do not want to parse HTML, and I'm totally open to doing this differently; this just seems like the best/right way.
http://oscargodson.com/labs/help-viewer
http://oscargodson.com/labs/help-viewer/js/jquery.jhighlight.js
Type something in the search... ok, refresh the page, now type, like, class or class=" or type <a you'll notice it'll search the actual HTML (as expected). How can I only search the text?
If I do .text() it'll vaporize all the HTML and what I get back will just be a big blob of text, but I still want the HTML so I don't lose formatting, links, images, etc. I want this to work like CMD/CTRL+F.
You'd use this plugin like:
$('article').jhighlight({find:'class'});
To remove them:
.jhighlight('remove')
==UPDATE==
While Mike Samuel's idea below does in fact work, it's a tad heavy for this plugin. It's mainly for a client looking to erase bad words and/or MS Word characters during a "publishing" process of a form. I'm looking for a more lightweight fix, any ideas?
You really don't want to use eval, mess with innerHTML or parse the markup "manually". The best way, in my opinion, is to deal with text nodes directly and keep a cache of the original html to erase the highlights. Quick rewrite, with comments:
(function($){
$.fn.jhighlight = function(opt) {
var options = $.extend($.fn.jhighlight.defaults, opt)
, txtProp = this[0].textContent ? 'textContent' : 'innerText';
if ($.trim(options.find).length < 1) return this;
return this.each(function(){
var self = $(this);
// use a cache to clear the highlights
if (!self.data('htmlCache'))
self.data('htmlCache', self.html());
if(opt === 'remove'){
return self.html( self.data('htmlCache') );
}
// create Tree Walker
// https://developer.mozilla.org/en/DOM/treeWalker
var walker = document.createTreeWalker(
this, // walk only on target element
NodeFilter.SHOW_TEXT,
null,
false
);
var node
, matches
, flags = 'g' + (!options.caseSensitive ? 'i' : '')
, exp = new RegExp('('+options.find+')', flags) // capturing
, expSplit = new RegExp(options.find, flags) // no capturing
, highlights = [];
// walk this wayy
// and save matched nodes for later
while(node = walker.nextNode()){
if (matches = node.nodeValue.match(exp)){
highlights.push([node, matches]);
}
}
// must replace stuff after the walker is finished
// otherwise replacing a node will halt the walker
for(var nn=0,hln=highlights.length; nn<hln; nn++){
var node = highlights[nn][0]
, matches = highlights[nn][1]
, parts = node.nodeValue.split(expSplit) // split on matches
, frag = document.createDocumentFragment(); // temporary holder
// add text + highlighted parts in between
// like a .join() but with elements :)
for(var i=0,ln=parts.length; i<ln; i++){
// non-highlighted text
if (parts[i].length)
frag.appendChild(document.createTextNode(parts[i]));
// highlighted text
// skip last iteration
if (i < ln-1){
var h = document.createElement('span');
h.className = options.className;
h[txtProp] = matches[i];
frag.appendChild(h);
}
}
// replace the original text node
node.parentNode.replaceChild(frag, node);
};
});
};
$.fn.jhighlight.defaults = {
find:'',
className:'jhighlight',
color:'#FFF77B',
caseSensitive:false,
wrappingTag:'span'
};
})(jQuery);
If you're doing any manipulation on the page, you might want to replace the caching with another clean-up mechanism, not trivial though.
You can see the code working here: http://jsbin.com/anace5/2/
You also need to add display:block to your new html elements, the layout is broken on a few browsers.
In the javascript code prettifier, I had this problem. I wanted to search the text but preserve tags.
What I did was start with HTML, and decompose that into two bits.
The text content
Pairs of (index into text content where a tag occurs, the tag content)
So given
Lorem <b>ipsum</b>
I end up with
text = 'Lorem ipsum'
tags = [6, '<b>', 11, '</b>']
which allows me to search on the text, and then based on the result start and end indices, produce HTML including only the tags (and only balanced tags) in that range.
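To make the decomposition concrete, here is a rough sketch of splitting HTML into the plain text plus (offset, tag) pairs, and putting it back together. It is not the prettifier's actual code: decompose and recompose are names I made up, tags are found with a simple regex that assumes no literal ">" inside attribute values, and character entities are left undecoded. Each offset counts the characters of text that precede the tag.

```javascript
// Hypothetical sketch of the decomposition; not the prettifier's real code.
function decompose(html) {
  var text = '', tags = [], last = 0, m;
  var tagPattern = /<[^>]*>/g;
  while ((m = tagPattern.exec(html)) !== null) {
    text += html.slice(last, m.index);
    tags.push(text.length, m[0]);   // offset into `text`, then the tag itself
    last = m.index + m[0].length;
  }
  text += html.slice(last);
  return { text: text, tags: tags };
}

// Interleave the tags back into the text at their recorded offsets.
function recompose(parts) {
  var out = '', pos = 0;
  for (var i = 0; i < parts.tags.length; i += 2) {
    out += parts.text.slice(pos, parts.tags[i]) + parts.tags[i + 1];
    pos = parts.tags[i];
  }
  return out + parts.text.slice(pos);
}

var parts = decompose('Lorem <b>ipsum</b>');
console.log(parts.text);                  // Lorem ipsum
console.log(parts.tags);                  // [ 6, '<b>', 11, '</b>' ]
console.log(parts.text.indexOf('ipsum')); // 6
console.log(recompose(parts));            // Lorem <b>ipsum</b>
```

A search then runs against parts.text only, and the start and end indices of a hit tell you which tags fall inside the matched range.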
Have a look here: getElementsByTagName() equivalent for textNodes.
You can probably adapt one of the proposed solutions to your needs (i.e. iterate over all text nodes, replacing the words as you go - this won't work in cases such as <tag>wo</tag>rd but it's better than nothing, I guess).
I believe you could just do:
$('#article :not(:has(*))').jhighlight({find : 'class'});
Since it grabs all leaf nodes in the article it would require valid xhtml, that is, it would only match link in the following example:
<p>This is some paragraph content with a link</p>
DOM traversal / selector application could slow things down a bit so it might be good to do:
article_nodes = article_nodes || $('#article :not(:has(*))');
article_nodes.jhighlight({find : 'class'});
Maybe something like this could be helpful:
>+[^<]*?(s(<[\s\S]*?>)?e(<[\s\S]*?>)?e)[^>]*?<+
The first part >+[^<]*? finds > of the last preceding tag
The third part [^>]*?<+ finds < of the first subsequent tag
In the middle we have (<[\s\S]*?>)? between characters of our search phrase (in this case - "see").
After regular expression searching you could use the result of the middle part to highlight search phrase for user.

Javascript: Changing color of every "r" in html document

EDIT [how can I] change the color of every R and r in my HTML document with javascript?
I'd use the highlight plugin for jQuery. Then do something like:
$('*').highlight('r'); // Not sure if it's case-insensitive or not
and in CSS:
.highlight { background-color: yellow; }
Doable, but not super easy. There's no CSS way to do it.
Basically, you'll need to use Javascript and iterate through the all nodes. If it's a text node, you can search it for "R" and then replace the R with a <span style="color:red">R</span>
I am obviously simplifying this a bit, it's probably better to just dynamically add a "highlight" class, rather than hard code a style, and have that defined in CSS. Similarly, I'm sure you'll wanna parameterize the search string. Also, this doesn't take into account what the text node is, for instance, I have special handling to skip comments, but you'll probably find there's other things (script nodes?) you also need to skip.
function updateNodes(node) {
if (node.nextSibling)
updateNodes(node.nextSibling);
if (node.nodeType ==8) return; //Don't update comments
if (node.firstChild)
updateNodes(node.firstChild);
if (node.nodeValue) { // update me
if (node.nodeValue.search(/[Rr]/) > -1){ // does the text node have an R
var span=document.createElement("span");
var remainingText = node.nodeValue;
var newValue='';
while (remainingText.search(/[Rr]/) > -1){ //Crawl through the node finding each R
var rPos = remainingText.search(/[Rr]/);
var bit = remainingText.substr(0,rPos);
var r = remainingText.substr(rPos,1);
remainingText=remainingText.substr(rPos+1);
newValue+=bit;
newValue+='<span style="color:red">';
newValue+=r;
newValue+='</span>';
}
newValue+=remainingText;
span.innerHTML=newValue;
node.parentNode.insertBefore(span,node);
node.parentNode.removeChild(node);
}
}
}
function replace() {
    updateNodes(document.body);
}
Yes this is possible with a little Javascript, a smattering of CSS and some regex.
First, you need to define a style which provides the colour you require (in my example below I refer to a CSS class called "new-colour"), and then run some regex over your HTML content which does a search and replace. You are looking to change all 'r' and 'R' characters into something like this (as an example):
<span class="new-colour">r</span>
If you don't know regex, there are oodles of resources out there to get you started. You will be pleased to know that your requirement is very simple, so no worries there. Here are a couple of links:
regexlib.com
8 regular expressions you should know
You would need to use the DOM (or jQuery) to iterate through every text node in the document. Whenever you find the letter R, apply a transformation that wraps the letter in an appropriate element.
e.g. Transform the text node "art" into "a<span class="colored">r</span>t". This adds two new text nodes, "r" and "t", and the new span element.
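The transformation above boils down to cutting a text node's value into alternating plain and matched pieces. A small sketch of that step on plain strings follows; segment is a hypothetical helper name, and in a browser each plain piece would become a document.createTextNode(...) and each matched piece a span with the chosen class.

```javascript
// Hypothetical helper: split a text node's value into alternating plain and
// matched segments. `pattern` must be a RegExp with the global flag.
function segment(text, pattern) {
  var parts = [], last = 0, m;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(text)) !== null) {
    if (m[0] === '') {          // guard against empty matches looping forever
      pattern.lastIndex++;
      continue;
    }
    if (m.index > last) {
      parts.push({ text: text.slice(last, m.index), match: false });
    }
    parts.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    parts.push({ text: text.slice(last), match: false });
  }
  return parts;
}

console.log(segment('art', /r/gi));
// [ { text: 'a', match: false },
//   { text: 'r', match: true },
//   { text: 't', match: false } ]
```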
The highlight plugin for jQuery is one option. Another option, especially since tomorrow you might want to extend your highlighting to keywords or other terms, is to use Google's Closure goog.dom.annotate class. The beauty of this class is that it will actually parse the DOM tree properly and ONLY annotate the relevant terms. It will also allow you to EXCLUDE elements or elements with certain classes.
A common problem with annotations is that you can mess your HTML, if you are not careful.
For example the 'simple solution posted above'
var body = document.getElementsByTagName("body")[0];
var html = body.innerHTML
.replace(/(^|>[^<rR]*)([rR])/g, "$1<em>$2</em>");
body.innerHTML = html;
may also capture terms inside attributes if the pattern does not skip tags. If you had this:
<p class="red">text......</p>
It will become
<p class="<span class="red">r</span>ed .....
that will break your html.
In general DOM parsing is 'slow', so try and avoid annotating the whole body of a webpage, ask yourself why you only need to highlight the R's? Actually I am curious why do you want to annotate the r's?:)
Plain JS solution without need of any 20kB JS library:
var body = document.getElementsByTagName("body")[0];
var html = body.innerHTML
.replace(/(^|>[^<rR]*)([rR])/g, "$1<em>$2</em>");
body.innerHTML = html; // note that you will lose all
// event handlers in this step...

How can I use jQuery to style /parts/ of all instances of a specific word?

Unusual situation. I have a client, let's call them "BuyNow." They would like for every instance of their name throughout the copy of their site to be stylized like "BuyNow," where the second half of their name is in bold.
I'd really hate to spend a day adding <strong> tags to all the copy. Is there a good way to do this using jQuery?
I've seen the highlight plugin for jQuery and it's very close, but I need to bold just the second half of that word.
To do it reliably you'd have to iterate over each element in the document looking for text nodes, and searching for text in those. (This is what the plugin noted in the question does.)
Here's a plain JavaScript/DOM one that allows a RegExp pattern to match. jQuery doesn't really give you much help here since selectors can only select elements, and the ‘:contains’ selector is recursive so not too useful to us.
// Find text in descendants of an element, in reverse document order
// pattern must be a regexp with global flag
//
function findText(element, pattern, callback) {
    for (var childi= element.childNodes.length; childi-->0;) {
        var child= element.childNodes[childi];
        if (child.nodeType==1) {
            findText(child, pattern, callback);
        } else if (child.nodeType==3) {
            var matches= [];
            var match;
            while (match= pattern.exec(child.data))
                matches.push(match);
            for (var i= matches.length; i-->0;)
                callback.call(window, child, matches[i]);
        }
    }
}
findText(document.body, /\bBuyNow\b/g, function(node, match) {
    var span= document.createElement('span');
    span.className= 'highlight';
    node.splitText(match.index+6);
    span.appendChild(node.splitText(match.index+3));
    node.parentNode.insertBefore(span, node.nextSibling);
});
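The reason the matches are handled in reverse order is that splitText only cuts off the tail of a node: working from the last match backwards means every cut lands to the right of the matches still waiting, so their indices stay valid. Here is a sketch of the same idea on plain strings. splitChunk and splitAroundMatches are hypothetical names, and the array of chunks merely stands in for the sibling text nodes that splitText would create.

```javascript
// Stand-in for Text.splitText on a list of string chunks: chunk `i` is cut
// at `offset`, and the tail becomes a new chunk right after it.
function splitChunk(chunks, i, offset) {
  chunks.splice(i, 1, chunks[i].slice(0, offset), chunks[i].slice(offset));
}

// `pattern` must have the global flag and must not match the empty string.
function splitAroundMatches(text, pattern) {
  var matches = [], m;
  pattern.lastIndex = 0;
  while ((m = pattern.exec(text)) !== null) {
    if (m[0] === '') { pattern.lastIndex++; continue; }
    matches.push(m);
  }
  var chunks = [text];
  // Walk the matches backwards: every cut happens to the right of the
  // matches still to be processed, so their indices into chunks[0] hold.
  for (var i = matches.length; i-- > 0;) {
    splitChunk(chunks, 0, matches[i].index + matches[i][0].length);
    splitChunk(chunks, 0, matches[i].index);
  }
  return chunks;
}

console.log(splitAroundMatches('Try BuyNow or BuyNow!', /\bBuyNow\b/g));
// [ 'Try ', 'BuyNow', ' or ', 'BuyNow', '!' ]
```

Going front to back instead would shorten the first chunk after the first cut, and the index of the second match would then point past its end.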
Regular Expressions and replace() spring to mind. Something like
var text = $([selector]).html();
text = text.replace(/Now/g,'<strong>Now</strong>');
$([selector]).html(text);
A word of caution about using html() to do this. Firstly, there is the potential to replace matched strings in the href attributes of <a> elements and in other attributes, which may cause the page to function incorrectly. It might be possible to write a better regular expression to overcome some of the potential problems, but performance may suffer (I'm no regular expression guru). Secondly, using html() to replace content will cause non-serializable data, such as event handlers bound to the replaced elements, form data, etc., to be lost. Writing a function to target only text nodes may be the better/safer option; it just depends on how complex the pages are.
If you have access to the HTML files and the content is static, it would probably be better to do a find and replace in the files on the words whose appearance they want to change. Notepad++'s Find in Files option performs well for this job in most cases.
Going with SingleShot's suggestion and using a <span> with a CSS class will afford more flexibility than using a <strong> element.
I wrote a little plugin to do just that. Take a look at my answer to a similar question.
Instead of downloading the plugin suggested in the accepted answer, I strongly recommend that you use the plugin I've written--it's a lot faster.
var Run=Run || {};
Run.makestrong= function(hoo, Rx){
if(hoo.data){
var X= document.createElement('strong');
X.style.color= 'red'; // testing only, easier to spot changes
var pa= hoo.parentNode;
var res, el, tem;
var str= hoo.data;
while(str && (res= Rx.exec(str))!= null){
var tem= res[1];
el= X.cloneNode(true);
el.appendChild(document.createTextNode(tem));
hoo.replaceData(res.index, tem.length,'');
hoo= hoo.splitText(res.index);
str= hoo.data;
if(str) pa.insertBefore(el, hoo);
else{
pa.appendChild(el);
return;
}
}
}
}
Run.godeep= function(hoo, fun, arg){
var A= [];
if(hoo){
hoo= hoo.firstChild;
while(hoo!= null){
if(hoo.nodeType== 3){
if(hoo.data) A[A.length]= fun(hoo, arg);
}
else A= A.concat(arguments.callee(hoo, fun, arg));
hoo= hoo.nextSibling;
}
}
return A;
}
//test
Run.godeep(document.body, Run.makestrong,/([Ee]+)/g);
This is not a jQuery script but pure JavaScript; I believe it can be altered a little.
