JavaScript regex not working as intended

I have the HTML from a page in a variable as just plain text. Now I need to remove some parts of the text. This is a part of the HTML that I need to change:
<div class="post"><a name="6188729"></a>
<div class="igmline small" style="height: 20px; padding-top: 1px;">
<span class="postheader_left">
RuneRifle
op 24.08.2012 om 21:41 uur
</span>
<span class="postheader_right">
Citaat Bewerken
</span>
<div style="clear:both;"></div>
</div>
<div class="text">Testforum</div>
<!-- Begin Thank -->
<!-- Thank End -->
</div>
These replaces work:
pageData = pageData.replace(/href=\".*?\"/g, "href=\"#\"");
pageData = pageData.replace(/target=\".*?\"/g, "");
But this replace does not work at all:
pageData = pageData.replace(
/<span class=\"postheader_right\">(.*?)<\/span>/g, "");
I need to remove every span with the class postheader_right and everything in it, but it just doesn't work. My knowledge of regex isn't that great, so I'd appreciate it if you could tell me how you arrived at your answer and give a short explanation of how it works.

The dot doesn't match newlines. Use [\s\S] instead of the dot as it will match all whitespace characters or non-whitespace characters (i.e., anything).
As Mike Samuel says, regular expressions are not really the best tool given the complexity HTML allows (e.g., a line break after <a), especially if you have to look for attributes that may occur in different orders. But that's how you can match the case in your example HTML.
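A minimal sketch of the fix, run against a shortened copy of the question's markup (the variable names are illustrative). It also shows the `s` (dotAll) flag, which ES2018 added so that `.` matches newlines; that flag is an alternative to `[\s\S]`, not what this answer originally proposed. Because `*?` stops at the first `</span>`, a postheader_right span containing a nested `<span>` would be cut short, which is one of the reasons a parser is preferable.

```javascript
// Shortened sample from the question; the span body spans several lines.
const pageData = [
  '<span class="postheader_left">',
  'RuneRifle',
  '</span>',
  '<span class="postheader_right">',
  'Citaat Bewerken',
  '</span>',
  '<div class="text">Testforum</div>'
].join('\n');

// "." stops at "\n", so this pattern never reaches the closing </span>.
const withDot = pageData.replace(/<span class="postheader_right">(.*?)<\/span>/g, '');

// [\s\S] matches any character, newlines included; *? keeps it non-greedy
// so the match ends at the first </span> after the opening tag.
const withClass = pageData.replace(/<span class="postheader_right">[\s\S]*?<\/span>/g, '');

// ES2018+ alternative: the "s" (dotAll) flag lets "." match newlines too.
const withDotAll = pageData.replace(/<span class="postheader_right">.*?<\/span>/gs, '');

console.log(withDot === pageData);     // true: nothing was removed
console.log(withClass === withDotAll); // true
console.log(withClass);
```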

I need to remove every span with the class postheader_right and everything in it, but it just doesn't work.
Don't use regular expressions to find the spans. See "Using regular expressions to parse HTML: why not?"
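// Collect every <span> in the document (a live HTMLCollection).
var allSpans = document.getElementsByTagName('span');
// Walk backwards so removing a span doesn't shift the indices still to visit.
for (var i = allSpans.length; --i >= 0;) {
  var span = allSpans[i];
  // \b stops postheader_right from matching inside a longer name such as postheader_rightside.
  if (/\bpostheader_right\b/.test(span.className)) {
    span.parentNode.removeChild(span);
  }
}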
should do it.
If you only need to work on newer browsers, then getElementsByClassName makes it even easier:
// Find all div elements that have a class of 'test'
var tests = Array.prototype.filter.call(document.getElementsByClassName('test'), function (elem) {
  return elem.nodeName == 'DIV';
});

Related

Contenteditable regex whitespace not working

I am trying to validate whether the contenteditable value contains only whitespace/blank space. In my example, if the value contains only whitespace/blank space, it should not match according to my regex, but it's not working as intended: it keeps matching when I enter nothing but blank spaces.
edit: the black space is where you can enter text.
https://jsfiddle.net/j1kcer26/5/
JS
var checkTitle = function() {
  var titleinput = document.getElementById("artwork-title").innerHTML;
  var titleRegexp = new RegExp("^(?!\s*$).+"); //no blank spaces allowed
  if (!titleRegexp.test(titleinput)) {
    $('.start').removeClass('active-upload-btn');
    console.log('no match')
  } else if (titleRegexp.test(titleinput)) {
    $('.start').addClass('active-upload-btn');
    console.log('match')
  }
};
$('#artwork-title').on('keyup change input', function() {
  checkTitle();
});
HTML
<div class="post-title-header">
<span class="user-title-input title-contenteditable maxlength-contenteditable" placeholder="enter text here" contenteditable="true" name="artwork-title" id="artwork-title" autocomplete="off" type="text" spellcheck="false">
</span>
</div>
<div class="start">
turn red if match
</div>
If you look at the actual inner HTML, you'll see things like <br> elements or entities. Your regex doesn't look equipped to handle these.
You may want to use textContent instead of innerHTML if you only care about the text, not the HTML. Alternatively, if you really want plain text, use a <textarea/> instead of a contenteditable element, which is meant for rich-text-style editing and produces HTML.
Edit:
Your regex is not quite right either. Because you're using the RegExp constructor with new RegExp("^(?!\s*$).+"), the \s in your string literal turns into a plain s; you have to write \\s if you want the regex to contain an actual \s. IMO, it's always better to use a regexp literal such as /^(?!\s*$).+/ unless you're building one dynamically. I also find /^\s+$/ a simpler way to tell whether a string is entirely whitespace.
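The escaping difference is easy to check in a console. This is a minimal sketch using the question's pattern; `isBlank` is a hypothetical helper. It uses `\s*` rather than `\s+` so that an emptied field (a textContent of "") also counts as blank, and it relies on `\s` matching U+00A0, which is what `&nbsp;` becomes in textContent.

```javascript
// "\s" is not a recognised string escape, so the backslash is silently dropped
// and the pattern text ends up as "^(?!s*$).+".
const fromString = new RegExp("^(?!\s*$).+");

// Doubling the backslash keeps it in the pattern text.
const fromEscapedString = new RegExp("^(?!\\s*$).+");

// A regex literal needs no extra escaping.
const fromLiteral = /^(?!\s*$).+/;

console.log(fromString.source);            // ^(?!s*$).+
console.log(fromString.test("   "));        // true: blank input still "matches"
console.log(fromEscapedString.test("   ")); // false
console.log(fromLiteral.test("   "));       // false
console.log(fromLiteral.test(" art "));     // true

// Hypothetical helper: true when the text is empty or only whitespace.
// \s also covers U+00A0, which is what &nbsp; becomes in textContent.
function isBlank(text) {
  return /^\s*$/.test(text);
}

console.log(isBlank("\u00a0 ")); // true
console.log(isBlank(" art "));   // false
```

In the question's handler this would be fed `document.getElementById("artwork-title").textContent` rather than innerHTML.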

Get text using jquery with text-transform

I have an HTML element; I toggle its class and show capital/small letters with text-transform.
Is it possible to get the text with its text-transform applied?
$('#toggle').click(function(){
$('#char').toggleClass('upper');
});
$('#getdata').click(function(){
var text = $('#char').text();
alert(text); /// here i need to get the actual word with capital/lower i selected
});
.upper{
text-transform:uppercase;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span contenteditable="true" id="char">a</span>
<br/>
<button id="toggle">Toggle case</button>
<button id="getdata">get data</button>
You can check for the class and use toUpperCase:
$('#toggle').click(function(){
$('#char').toggleClass('upper');
});
$('#getdata').click(function(){
var $char = $('#char');
var text = $char.hasClass('upper') ? $char.text().toUpperCase() : $char.text();
alert(text); /// here i need to get the actual word with capital/lower i selected
});
.upper{
text-transform:uppercase;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span contenteditable="true" id="char">a</span>
<br/>
<button id="toggle">Toggle case</button>
<button id="getdata">get data</button>
There is currently no way to get the rendered text with JavaScript.
When you are using English, toUpperCase and toLowerCase work well for the CSS values uppercase and lowercase.
But for non-English text, or when you use capitalize, full-width, etc., you have to reproduce the CSS logic (mostly Unicode logic) in JS.
Below are a few rules that Firefox applies. Chrome also implements some of them.
In German (de), the ß becomes SS in uppercase.
In Dutch (nl), the ij digraph becomes IJ, even with text-transform: capitalize, which only puts the first letter of a word in uppercase.
In Greek (el), vowels lose their accent when the whole word is in uppercase (ά/Α), except for the disjunctive eta (ή/Ή). Also, diphthongs with an accent on the first vowel lose the accent and gain a diaeresis on the second vowel (άι/ΑΪ).
And so on...
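A quick way to see the gap is to run the plain String methods on the examples above. This sketch only exercises the default String.prototype.toUpperCase mapping; `naiveCapitalize` is a hypothetical helper standing in for a hand-written capitalize.

```javascript
// German: the Unicode default mapping already turns ß into SS.
const german = "straße".toUpperCase();

// Greek: toUpperCase keeps the accent, so "άι" becomes "ΆΙ",
// not the "ΑΪ" that the rule above says Firefox renders.
const greek = "\u03AC\u03B9".toUpperCase(); // "άι"

// Hypothetical naive "capitalize": upper-case only the first letter.
// For Dutch "ijs" it gives "Ijs", whereas the rule above renders "IJs".
function naiveCapitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}
const dutch = naiveCapitalize("ijs");

console.log(german); // STRASSE
console.log(greek);  // ΆΙ
console.log(dutch);  // Ijs
```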
It's also fun when you need to apply other CSS values:
capitalize - What constitutes a "word"? How do browsers split iPhone-6s+? Behold, Unicode consortium to the rescue!
full-width - The MDN example looks easy, but it does not show every case, for example [] to ［］, and maybe someday they will convert ... to … instead of ．．．
full-size-kana - How's your Japanese? No worries, this CSS4 proposal was dropped in favour of (future) fully customisable character-mapping rules! Hope your CSS parser skills are up to par.
So, count yourself lucky if you only use English. You have my sympathy if you, like me, work with multilingual systems. Time zones are nothing by comparison.
Maybe like this?
// Map CSS text-transform values to the matching String methods
var textTypes = {
  "uppercase": "toUpperCase",
  "lowercase": "toLowerCase"
};
// Get the element
var div = document.getElementsByTagName('div')[0];
// Get the computed text-transform value
var type = window.getComputedStyle(div).getPropertyValue('text-transform');
// Print the transformed text; values with no mapping (e.g. "none") print the text unchanged
var method = textTypes[type];
console.log(method ? div.textContent[method]() : div.textContent);
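A DOM-free sketch of the same mapping that also copes with values the lookup object doesn't cover. `applyTextTransform` is a hypothetical name, and its capitalize branch is only an English-oriented approximation, for the reasons given in the answer above. In a browser you would pass `getComputedStyle(el).textTransform` as the second argument.

```javascript
// Hypothetical helper: reproduce a computed text-transform value in JS.
// Only handles the values that map cleanly for English text; anything
// else (e.g. "full-width") is returned unchanged.
function applyTextTransform(text, transform) {
  switch (transform) {
    case "uppercase":
      return text.toUpperCase();
    case "lowercase":
      return text.toLowerCase();
    case "capitalize":
      // Rough approximation: upper-case the first character of each
      // whitespace-separated word.
      return text.replace(/(^|\s)(\S)/g, (match, space, first) => space + first.toUpperCase());
    default:
      return text;
  }
}

console.log(applyTextTransform("a", "uppercase"));                // A
console.log(applyTextTransform("hello big world", "capitalize")); // Hello Big World
```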

Replace starting and ending multiple html tags using javascript

I want to remove the following tags:
1. <div>
2. </div>
3. <p>
4. </p>
5. <span>
6. </span>
var str = '<div><p><span>Hello World</span></p></div>';
I can do
str = str.replace('<div>', '');
str = str.replace('<p>', '');
and so on.
But can we accomplish the same in one step, using regular expressions or something else?
Do not use regexes for this: RegEx match open tags except XHTML self-contained tags
Parse the HTML and retrieve what you need. This is a basic example that retrieves the text from the nodes you supplied. You can extend it to suit your needs.
var container = document.createElement("div"); // create a detached div in memory
container.insertAdjacentHTML("afterbegin", str); // parse the string into nodes inside the container
var span = container.getElementsByTagName("span")[0];
str = span.textContent || span.innerText; // innerText as a fallback for old IE
You can also use container.textContent || container.innerText to get all of the text, with no markup, from the HTML in the string. (innerText is there to support older browsers such as IE.)
Try this pattern:
/<\/?([a-z])+\>/g
Here's an example on RegExr v2.0, a very handy tool for testing regular expressions. To see the result, click the "substitution" tab at the bottom of the page.
Hope this is what you were looking for.
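A quick check of the pattern against the question's string. `onlyTheseTags` is a hypothetical variant that removes just div, p and span, as the question asks, and leaves other tags alone. Neither pattern handles tags with attributes, which is part of why the other answer recommends a parser.

```javascript
const str = '<div><p><span>Hello World</span></p></div>';

// <\/?      an optional "/" so closing tags match too
// ([a-z])+  the tag name (the group only captures the last letter, unused here)
// \>        the end of the tag, so tags with attributes are not matched
const anyTag = /<\/?([a-z])+\>/g;

// Hypothetical variant restricted to the three tags from the question.
const onlyTheseTags = /<\/?(?:div|p|span)>/g;

console.log(str.replace(anyTag, ''));                        // Hello World
console.log(str.replace(onlyTheseTags, ''));                 // Hello World
console.log('<div><b>x</b></div>'.replace(onlyTheseTags, '')); // <b>x</b>

// Limitation: an opening tag with attributes survives.
console.log('<div class="x">Hi</div>'.replace(anyTag, ''));  // <div class="x">Hi
```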

Custom RegEx needed

Using JavaScript, I need to parse the HTML of a page and replace every occurrence of ABC within a content block with <span class="abc">ABC</span>. For example, <p>ABC Company lorem ipsum</p> would be changed to <p><span class="abc">ABC</span> Company lorem ipsum</p>, but mailto:joe#abccompany.com would stay the same.
So, pretty much: replace ABC anywhere it is preceded by a space or a quote, but I would like to make it a little more generic. Perhaps the expression could say: when it is not preceded or followed by [a-zA-Z].
What I have so far:
<script type="text/javascript">
$(document).ready(function() {
$('body').find('div').each(function(i, v) {
h = $(v).html();
if (h.indexOf('abc') > 0) {
h = h.replace('abc', '<span class="abc">abc</span>');
$(v).html(h);
}
});
});
</script>
I suggest going about it a different way that preserves data and events on the elements and doesn't interfere with attributes on said elements.
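$("body div").find("*").addBack().contents().filter(function () {
  return this.nodeType === 3; // keep only text nodes
}).each(function () {
  // Replace just this text node, so sibling elements (and their data/events) survive
  $(this).replaceWith(this.nodeValue.replace(/abc/g, '<span class="abc">abc</span>'));
});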
http://jsfiddle.net/hrEyC/1/
Note: this requires jQuery 1.8+ because it uses .addBack(); for older versions, replace it with .andSelf().
This is not a very efficient thing to do (loop through all div tags in the DOM and apply a regex to each one), but since I don't know what constraints you have or what situation you are using this code in, I'll just assume there's a good reason you're doing this client-side in this way.
Anyway, this regex seems to match your requirements (though they are not very well defined):
h = h.replace(/([^A-Z])(ABC)([^A-Z])/gi, '$1<span style="color: red">$2</span>$3');
Fiddle here: http://jsfiddle.net/czJFG/
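A minimal sketch comparing that pattern with a lookahead variant (`wrapAbc` is a hypothetical helper). Because the answer's pattern needs one character before ABC and its trailing ([^A-Z]) consumes the next one, it misses ABC at the very start of the text and the second of two adjacent occurrences. Capturing the preceding character (or the start of the string) and only looking ahead with (?![a-z]) avoids both. Like the answer, it works on an HTML string, so an abc inside an attribute value such as class="abc" would still be wrapped.

```javascript
const input = "ABC Company: ABC ABC. mailto:joe#abccompany.com";

// The answer's pattern: needs a non-letter before AND consumes one after.
const answerVersion = input.replace(/([^A-Z])(ABC)([^A-Z])/gi, '$1<span style="color: red">$2</span>$3');

// Hypothetical helper: (^|[^a-z]) keeps the preceding character (or string start)
// via $1; (?![a-z]) only checks the next character without consuming it.
function wrapAbc(text) {
  return text.replace(/(^|[^a-z])(abc)(?![a-z])/gi, '$1<span class="abc">$2</span>');
}

console.log(answerVersion); // only the first " ABC " is wrapped
console.log(wrapAbc(input)); // all three standalone ABCs are wrapped; the mailto is untouched
```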

Regular Expression matches similar strings

I am trying to alter class names for a module via jQuery, and right now I have this RegEx:
/module-\w+/gi
Used in this fashion:
//// Removes all module-xxxx classes
var classes = $target[0].className.replace(/module-\w+/gi, '');
This has worked fine until now; however, I have to change the structure of my module classes so that it resembles this:
<div class="module">
<div class="module-header">
<div class="module-header-content module-blue ..."></div>
</div>
<div class="module-content"></div>
</div>
The ... just means there could be other class names.
I need to change the RegEx so that it matches only module-blue (it could be module-default, module-green, module-whatever, but always in the format module-COLORNAME) and does not also match module-header-content.
The jQuery selection's className is: module-header-content module-blue
var classes = $target[0].className.replace(/\bmodule-\w+(?!-)\b/gi, '');
With the word boundaries, the expression has to match module- followed by a complete run of word characters (\w+) that is not followed by a dash, so module-header-content is skipped while module-blue is removed.
http://jsfiddle.net/ExplosionPIlls/WqbKJ/
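A quick comparison of the two patterns on the className from the question. Here `other` stands in for the "..." classes, and the whitespace tidy-up is an optional extra, not part of the answer.

```javascript
const className = "module-header-content module-blue other";

// Old pattern: also eats "module-header" out of "module-header-content".
const withOld = className.replace(/module-\w+/gi, "");

// New pattern: \b...\b plus (?!-) only accepts a module- class whose word run
// is not followed by "-", so module-header-content is left alone.
const withNew = className.replace(/\bmodule-\w+(?!-)\b/gi, "");

// Optional tidy-up of the double space left behind.
const tidy = withNew.replace(/\s+/g, " ").trim();

console.log(JSON.stringify(withOld)); // "-content  other"
console.log(JSON.stringify(withNew)); // "module-header-content  other"
console.log(tidy);                    // module-header-content other
```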
