JavaScript regex with quotes

I need a way to replace all appearances of <br class=""> with just <br>
I'm a complete novice with regex, but I tried:
str = str.replace(/<br\sclass=\"\"\s>/g, "<br>");
and it didn't work.
What's a proper regex to do this?

I would not use a regex to do this, but rather actually parse the html and remove the classes.
This is untested, but probably works.
// Dummy <div> to hold the HTML string contents
var d = document.createElement("div");
d.innerHTML = yourHTMLString;
// Find all the <br> tags inside the dummy <div>
var brs = d.getElementsByTagName("br");
// Loop over the <br> tags and remove the class
for (var i = 0; i < brs.length; i++) {
  if (brs[i].hasAttribute("class")) {
    brs[i].removeAttribute("class");
  }
}
// Return it to a string
var yourNewHTMLString = d.innerHTML;

One way is with the following
var s = '<br class="">';
var n = s.replace(/(.*)(\s.*)(>)/,"$1$3");
console.log(n)

\s matches exactly one whitespace character, so your pattern requires a space before the closing >, which <br class=""> does not have. You probably want \s*, which matches any number (including zero) of whitespace characters, or \s+, which matches at least one.
str = str.replace(/<br\s+class=""\s*>/g, "<br>");
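A minimal sketch of the \s+ and \s* advice, assuming the input is a plain string. The function name stripEmptyBrClass is made up for illustration, and the i flag (to cover <BR class="">) is an assumption that goes beyond the original question.

```javascript
// Hypothetical helper: turn <br class=""> (with optional extra whitespace) into <br>.
function stripEmptyBrClass(html) {
  // \s+ : at least one whitespace character between "br" and "class"
  // \s* : optional whitespace before the closing ">"
  return html.replace(/<br\s+class=""\s*>/gi, "<br>");
}

console.log(stripEmptyBrClass('a<br class="">b<br  class="" >c<br>'));
```

Tags with a non-empty class, such as <br class="x">, are left untouched.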

Related

Calculating word count after stripping HTML

I have encountered a simple yet peculiar problem while calculating the word count of a string that contains HTML. The simple method is to first strip the HTML and then to count the whitespace. The problem I've found is that once you strip away the HTML tags some words are incorrectly concatenated.
See the example below that illustrates the issue using Javascript "textContent" to strip the HTML.
<p>One</p><p>Two</p><p>Three</p> becomes OneTwoThree and is counted as a single word.
How would you go about counting words (simply)?
var text = document.getElementById("test").textContent;
var words = text.match(/\S+/g).length;
document.getElementById("words").textContent = words;
<div class="box" id="test">
<p>One</p><p>Two</p><p>Three</p>
</div>
<div><span id="words">???</span> word(s)</div>
Maybe this could work for you:
Replace all tags with spaces, so <p>One</p><p>Two</p> would become " One  Two ".
Collapse each run of whitespace into a single space, so the string has at most one extra space on the left and right.
Remove that extra space.
let html = "your html";
// Replace every tag with a space so adjacent words stay separated
let tmp = html.replace(/<[^>]+>/g, " ");
// Collapse runs of whitespace and trim both ends
tmp = tmp.replace(/\s+/g, " ").trim();
console.log(tmp);
// The word count is the number of space-separated pieces (0 for empty text)
let count = tmp === "" ? 0 : tmp.split(" ").length;
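The same idea can be sketched as a small function that works on the HTML string directly. The name countWords is hypothetical, and the sketch assumes that every tag separates words, so a tag in the middle of a word (for example un<b>bold</b>) would be counted as a word break.

```javascript
// Hypothetical helper: count words in an HTML string without touching the DOM.
function countWords(html) {
  // Replace each tag with a space so "<p>One</p><p>Two</p>" does not become "OneTwo"
  const text = html.replace(/<[^>]+>/g, " ");
  // match() returns null when there are no words, so fall back to an empty array
  return (text.match(/\S+/g) || []).length;
}

console.log(countWords("<p>One</p><p>Two</p><p>Three</p>")); // 3
```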
You need to use innerText instead of textContent: innerText follows the rendered layout, so the breaks between block elements such as <p> show up as whitespace.
var textWithoutWhiteSpaces = document.getElementById("test").textContent;
var wordsWithoutWhiteSpaces = textWithoutWhiteSpaces.match(/\S+/g).length;
var textWithWhiteSpaces = document.getElementById("test").innerText;
var wordsWithWhiteSpaces = textWithWhiteSpaces.match(/\S+/g).length;
console.log(wordsWithoutWhiteSpaces)
console.log(wordsWithWhiteSpaces)
document.getElementById("words").textContent = wordsWithWhiteSpaces;
<div class="box" id="test">
<p>One</p><p>Two</p><p>Three</p>
</div>
<div><span id="words">???</span> word(s)</div>
I will add another option that counts the non-empty runs of text between the tags (this equals the word count only when each run is a single word):
const str = '<p><p><p>One<br></p><p>Two</p><p>Three</p><p></p></p><h1>four</h1><b>five</b><H1>123</H1>';
const result = str
  .replace(/(<.*?>)/g, '|')
  .split('|')
  .filter((el) => el !== '').length;
console.log(result);

How to count every char except of word?

I need to count the length of a string, excluding spaces and tags.
My JS pattern doesn't work because it also fails to count the 'b' and 'r' characters.
My code is here:
content.match(/[^\s^<br />]/g).length
How to fix it?
Instead of match, just use .replace(). match returns an array (or null when nothing matches), and because strings in JavaScript are immutable primitives, replace() simply gives you a new string without those characters.
let newString = oldString.replace(/\s/g, '') // replace all whitespace with empty strings
newString = newString.replace(/<br\s*\/?>/g, '') // replace <br> and <br /> with empty strings
and then just do newString.length
In the future, try using https://regexr.com to test your regex matching
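The original pattern [^\s^<br />] is a character class, so it excludes the individual characters <, b, r, /, > and ^ rather than the tag as a whole, which is why 'b' and 'r' were never counted. A minimal sketch of the replace() approach follows; the function name countCharsWithoutBr and the sample string are invented for illustration.

```javascript
// Hypothetical helper: length of the string without whitespace and <br> tags.
function countCharsWithoutBr(content) {
  return content
    .replace(/<br\s*\/?>/gi, "") // drop <br>, <br/> and <br />
    .replace(/\s/g, "")          // drop all whitespace
    .length;
}

console.log(countCharsWithoutBr("bar <br /> rub<br>")); // 6
```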
If you wanted to remove all the HTML tags (not just <br/>), you could add your string as the HTML to a new element, grab the textContent, and then run a regex match on that.
let str = '<div>Hallo this is a string.</div><br/>';
let el = document.createElement('div');
el.innerHTML = str;
let txt = el.textContent;
let count = txt.match(/[^\s]/g).join('').length; // 19

Split in to Sentences and Wrap With Tags

I'm trying to build a text fixing page for normalising text written in all capital letters, all lower case or an ungrammatical mixture of both.
What I'm currently trying to do is write a regular expression to find all full stops, question marks and line breaks, then split the string in to various strings containing all of the words up to and including each full stop.
Then I'm going to wrap them with <span> tags and use CSS :first-letter and text-transform:capitalize; to capitalise the first letter of each sentence.
The last stage will be writing a dictionary function to find user-specified words for capitalisation.
This question only concerns the part about writing a regex and splitting in to strings.
I've tried too many methods to post here, with varying results, but here's my current attempt:
for(var i=0; i < DoIt.length; i++){
  DoIt[i].onclick = function(){
    var offendingtext = input.value.toString();
    var keeplinebreaks = offendingtext.replace(/\r?\n/g, '<br />');
    var smalltext = keeplinebreaks.toLowerCase();
    //split at each character I specify
    var breakitup = smalltext.split(/[/.?\r\n]/g);
    breakitup.forEach(function(i){
      var i;
      console.log(i);
      var packagedtogo = document.createElement('span');
      packagedtogo.className = 'sentence';
      packagedtogo.innerHTML = breakitup[i];
      output.appendChild(packagedtogo);
      i++;
    });
  }
}
It was splitting at the right places before, but it was printing undefined in the output area between the tags. I've been at this for days, please could someone give me a hand.
How can I split a string in to multiple string sentences, and then wrap each string with html tags?
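Your regex for the split is almost fine. Drop the stray / at the start of the character class, otherwise the string is also split at every slash (including the one in <br />). Escaping . and ? inside the class is optional: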
var str = "SDFDSFDSF?sdf dsf sdfdsf. sdfdsfsdfdsfdsfdsfdsfsdfdsf sdf."
str.split( (/[\.\?\r\n]/g))
//["SDFDSFDSF", "sdf dsf sdfdsf", " sdfdsfsdfdsfdsfdsfdsfsdfdsf sdf", ""]
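The undefined came from the loop: forEach passes each element to the callback, not an index, so breakitup[i] was undefined. Use the element directly:
breakitup.forEach(function (element) {
  var packagedtogo = document.createElement('span');
  packagedtogo.className = 'sentence';
  packagedtogo.innerHTML = element; // breakitup[i] was undefined because i was the element, not an index
  output.appendChild(packagedtogo);
  // No need to increment an index
});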
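The question asked for sentences "up to and including each full stop", which split() discards. A minimal sketch that keeps the terminator uses match() instead. The function name wrapSentences is hypothetical, it returns markup strings rather than DOM nodes, and it assumes the text contains nothing that needs HTML escaping.

```javascript
// Hypothetical helper: split text into sentences (terminator included)
// and wrap each one in a <span class="sentence"> string.
function wrapSentences(text) {
  // A sentence is a run of non-terminator characters plus an optional "." or "?"
  const parts = text.match(/[^.?\r\n]+[.?]?/g) || [];
  return parts
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return part !== ""; })
    .map(function (part) { return '<span class="sentence">' + part + "</span>"; });
}

console.log(wrapSentences("hello world. how are you?\nfine"));
```

Joining the returned array and assigning it to output.innerHTML would give one span per sentence.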

JS Regex to find href of several a tags

I need a regex to find the contents of the hrefs from these a tags :
<p class="bc_shirt_delete">
delete
</p>
Just the urls, not the href/ tags.
I'm parsing a plain text ajax request here, so I need a regex.
You can try this regex:
/href="([^\'\"]+)/g
Example at: http://regexr.com?333d1
Update: or easier via non greedy method:
/href="(.*?)"/g
This will do it nicely. http://jsfiddle.net/grantk/cvBae/216/
Regex example: https://regex101.com/r/nLXheV/1
var str = '<p href="missme" class="test">delete</p>'
var patt = /<a[^>]*href=["']([^"']*)["']/g;
var match;
// With the g flag, exec() returns the next match on each call, then null
while ((match = patt.exec(str))) {
  alert(match[1]);
}
Here is a robust solution:
let href_regex = /<a([^>]*?)href\s*=\s*(['"])([^\2]*?)\2\1*>/i,
link_text = 'another article link',
href = link_text.replace ( href_regex , '$3' );
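What it does:
detects a tags
lazily skips over other HTML attributes and captures them as group (1)
matches the href attribute
takes into consideration possible whitespace around =
captures the opening ' or " as group (2) so it is not repeated (DRY)
matches the value lazily and captures it as group (3); inside a character class \2 is not a backreference, so it is the lazy *? that stops at the closing quote
matches the closing quote via the backreference to group (2)
matches zero or more repeats of the group (1) text (\1*)
matches the closing > of the tag, so attributes that follow href are not skipped by this pattern
sets the i flag to ignore case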
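A minimal sketch of the same ideas (stay inside the a tag, allow whitespace around =, reuse the opening quote as the closing one) using String.prototype.matchAll. This is an illustration under stated assumptions, not the pattern above: the function name extractHrefs and the sample URLs in the usage line are made up, and a regex remains an approximation of real HTML parsing.

```javascript
// Hypothetical helper: collect the href values of <a> tags in an HTML string.
function extractHrefs(html) {
  // <a\b      : an "a" tag (not <abbr>, <area>, ...)
  // [^>]*?\s  : lazily skip other attributes, staying inside the tag
  // (["'])    : group 1, the opening quote
  // (.*?)\1   : group 2, the value, up to the same closing quote
  const pattern = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(pattern), function (m) { return m[2]; });
}

console.log(extractHrefs('<p><a href="/x?a=1">delete</a></p><a data-href="/no">q</a>'));
```

Requiring whitespace directly before href keeps data-href attributes out of the result.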
You may not need a regex to do that.
var o = document.getElementsByTagName('a');
var urls = [];
for (var i = 0; i < o.length; i++) {
  urls[i] = o[i].href;
}
If it is plain text, you can insert it into a DOM element that is not displayed (e.g. display: none) and then handle it in the way I described.
It might be easier to use jQuery
var html = '<li><h2 class="saved_shirt_name">new shirt 1</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936923&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 2</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936924&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 3</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936925&A=Delete">Delete Shirt</button></li>';
$(html).find('[data-href]');
And iterate each node
UPDATE (because post updated)
Let html be your raw response
var matches = $(html).find('[href]');
var hrefs = [];
$.each(matches, function(i, el){ hrefs.push($(el).attr('href'));});
//hrefs is an array of matches
I combined a few solutions around and came up with this (Tested in .NET):
(?<=href=[\'\"])([^\'\"]+)
Explanation:
(?<=) : look behind so it wont include these characters
[\'\"] : match both single and double quote
[^...] : match any character except those listed after '^'
+ : one or more occurrences of the preceding token
This works well and is not greedy with the quote, as it stops matching the moment it finds a quote.
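The pattern was tested in .NET, but JavaScript engines that implement ES2018 lookbehind assertions accept the same construct. A minimal sketch, assuming such an engine (for example a recent Node.js); the function name hrefsViaLookbehind is hypothetical.

```javascript
// Hypothetical helper using a lookbehind, so the match is the URL itself.
function hrefsViaLookbehind(html) {
  // (?<=href=["']) : must be preceded by href=" or href=' (not part of the match)
  // [^"']+         : everything up to the next quote
  return html.match(/(?<=href=["'])[^"']+/g) || [];
}

console.log(hrefsViaLookbehind('<a href="/a">x</a> <a href=\'/b\'>y</a>'));
```

Because the lookbehind only checks for href=, the value of a data-href attribute would be returned as well.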
var str = "";
str += "<p class=\"bc_shirt_delete\">";
str += "delete";
str += "</p>";
var matches = [];
str.replace(/href=("|')(.*?)("|')/g, function(a, b, match) {
matches.push(match);
});
console.log(matches);
or if you don't care about the href:
var matches = str.match(/href=("|')(.*?)("|')/);
console.log(matches);
What about spaces around =?
This code handles them:
var matches = str.match(/href( *)=( *)("|'*)(.*?)("|'*)( |>)/);
console.log(matches);
It's important to be non-greedy, and to cater for matching ' or " quotes.
var test = `<a href="#" class="foo bar"> banana
<a href='http://google.de/foo?yes=1&no=2' data-href='foobar'/>`;
test = test.replace(/href=(?:\'.*?\'|\".*?\")/gi, '');
Disclaimer: the one thing it does not handle is HTML5 data-href attributes; the href= part of data-href is matched too.
In this specific case, this is probably the fastest pattern:
/f="([^"]*)/
It captures every character (letters, digits, newlines and so on) from f=" up to, but not including, the next ". Flags such as /is are unnecessary, and the result is null when there is no match.
If the source contains many other links, you need to make sure you match exactly the one you are looking for. You can do that by including more of the surrounding source in the pattern, for example (this depends on the site's markup):
/bc_shirt_delete">\s*<a href="([^"]*)

jQuery Uppercase word locator

I need to locate words of more than 4 characters that are written in uppercase between <p> and </p> and add a style to them (e.g. italic).
I know about the function isUpperCase() but don't know how to also check that the string is longer than 4 characters.
function isUpperCase( string ) {
(?)
}
Thanks.
var ps = [].slice.call(document.getElementsByTagName("p"))
ps.forEach(function (p) {
  p.textContent.split(" ").forEach(function (word) {
    if (word.length > 4 && word.toUpperCase() === word) {
      // uppercase word longer than 4 characters
    }
  })
})
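The check inside the loop can be pulled out into the kind of helper the question asked about. This is a sketch with a hypothetical name, isLongUpperCase; the extra comparison with toLowerCase() is an added assumption so that strings without any letters, such as "12345", are not treated as uppercase words.

```javascript
// Hypothetical helper: true for words longer than minLength written in uppercase.
function isLongUpperCase(word, minLength) {
  return word.length > minLength &&
    word === word.toUpperCase() &&   // no lowercase letters
    word !== word.toLowerCase();     // at least one letter, so "12345" is rejected
}

console.log(isLongUpperCase("HELLO", 4)); // true
console.log(isLongUpperCase("Hello", 4)); // false
```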
You could use a regex to replace any uppercase text of four or more characters ({4,}; use {5,} for strictly more than four) in the innerHTML of every <p> element with that text surrounded by the markup you're trying to insert:
$('p').each(function(){
  var pattern = /([-A-Z0-9]{4,})/g;
  var before = '<span style="color: red;">';
  var after = '</span>';
  $(this).html($(this).html().replace(pattern, before+"$1"+after));
});
http://jsfiddle.net/eHPVg/
Yeah, like Rob said, I don't think Raynos's answer will work cross-browser and it also won't let you modify the matches within the paragraph.
Here's a slightly modified version:
var ps = document.getElementsByTagName("p");
for (var i = 0, len = ps.length; i < len; i++)
{
  var p = ps[i];
  p.innerHTML = p.innerHTML.replace(/\b([A-Z]{4,})\b/g, "<span style='font-style:italic'>$1</span>");
}
You can change the span code to be whatever style you want to add. Just make sure to leave the $1, which refers to the original uppercase word.
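The replacement itself can be tried on a plain string, without the DOM. A minimal sketch with a hypothetical function name, italicizeUppercase; it uses {5,} because the question asked for words of more than four characters, and $& (the whole match) in place of a capture group.

```javascript
// Hypothetical helper: wrap uppercase words of more than four letters in a span.
function italicizeUppercase(text) {
  // {5,} means "five or more", i.e. strictly more than four characters
  return text.replace(/\b[A-Z]{5,}\b/g, "<span style='font-style:italic'>$&</span>");
}

console.log(italicizeUppercase("This is REALLY IMPORTANT but ABCD is not"));
```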
