Replace ASCII characters in a string using JavaScript - javascript

I am removing HTML tags from data that was saved with the tags, using the following:
var Summary = HtmlString.replace(/<\/?[^>]+>/gi,'');
The statement above removes most of the tags, but leaves character references such as &#8226; (•), &#183; (·), &#8216; (‘), etc.
I am trying to work out a JavaScript replace call that finds everything that starts with &# and ends with a semicolon ;
Any help will be appreciated.
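One way to approach this is a second replace pass that matches numeric character references, that is, anything of the form &# followed by digits (or x plus hex digits) and a semicolon. The sketch below assumes the references should simply be dropped; the function name stripTagsAndNumericEntities is made up for illustration.

```javascript
// Hypothetical helper: removes tags, then numeric character references.
function stripTagsAndNumericEntities(htmlString) {
  return htmlString
    // Same tag-stripping pattern as in the question.
    .replace(/<\/?[^>]+>/gi, "")
    // Decimal (&#8226;) and hexadecimal (&#x2022;) references.
    .replace(/&#(?:\d+|x[0-9a-f]+);/gi, "");
}

const summary = stripTagsAndNumericEntities("<p>Item &#8226; one &#183; two</p>");
console.log(summary);
```

Named references such as &nbsp; are not covered by this pattern and would need their own handling.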

Related

HTML Title tooltip gets cut off after spaces

I'm trying to get some specific title text to display via JavaScript, but I'm having some issues getting the entire string to show up.
The text I'm trying to display:
mechanical : Failed to copy
And here's what shows up in HTML:
`<td title="mechanical" :="" failed="" to="" copy="">mechanical : Failed to copy</td>`
The actual title displayed afterwards is just mechanical.
In Javascript:
var copyResult = json_obj[i].CopyResult; //variable that contains the text
copyResult = copyResult.replace(/["{}]/g, " "); //regex that removes some characters and replaces them with spaces
The copyResult variable is then added to the element I want.
It looks like having spaces "ends" the title attribute, so the browser tries to make more attributes with the remaining text.
What's the best way to fix this?
I was able to create a workaround. Since any space would end the title attribute, I simply used a regex to properly escape all of the space characters for the copyResult variable.
var copyResult = copyResult.replace(/[ ]/g,"\u00a0")
\u00a0 is the Unicode character NO-BREAK SPACE.
It's not the spaces ending the attribute, it's the quotation marks... try escaping them with backslashes like \"
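The markup shown in the question (title="mechanical" :="" failed="" ...) is typical of an attribute value that was written without surrounding quotes, or whose quotes closed early. A minimal sketch of an alternative to the no-break-space workaround, assuming the cell is built by string concatenation: quote the attribute value and escape the characters that are special in HTML. The names escapeAttribute and buildCell are hypothetical.

```javascript
// Hypothetical helper: escapes characters that are special inside
// a double-quoted HTML attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: builds the table cell with a quoted title attribute.
function buildCell(text) {
  const safe = escapeAttribute(text);
  return '<td title="' + safe + '">' + safe + "</td>";
}

console.log(buildCell("mechanical : Failed to copy"));
```

If the element is created through the DOM instead, setting the value with element.setAttribute("title", text) avoids the quoting problem, because no markup is parsed.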

Replacing £ character from html textarea using javascript

I'm currently developing a simple web app using HTML with JavaScript, and I'm trying to do a simple string.replace call on a string received from an HTML textarea like so:
var contents = document.getElementById("contents").value;
var alteredText = contents.replace(/£/g, "poundsign");
The problem is that when a £ sign is included in the string, the replace call can't find it. I've looked at the code via the console, and it seems that any time there's a £ sign in the JavaScript file, a "Â" is added in front of it, so
string.replace(/£/g, "poundsign");
as it was written in the js file becomes the following while running:
string.replace(/Â£/g, "poundsign");
while £ in var contents remains simply £ (putting Â£ into the textarea causes the replace call to work correctly). Is there a way to stop the Â being added in the js file, or to add it to the HTML text before .replace is called?
The Â is added any time £ appears in the js file as far as I can see, and I haven't been able to get it to match the HTML without the user typing the Â into the textarea themselves.
// replace the pound sign with the string 'poundsign'
var mystr = '£';
mystr = mystr.replace(/£/g,'poundsign');
alert(mystr);
Thanks to @David Guan for the link, which put me on the right track, and to everyone else who commented.
The issue was resolved when I used the Unicode number in the .replace call rather than the character; it was then able to match the £ sign correctly without the Â also being inserted.
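The fix described above can be sketched as follows. The pound sign is U+00A3, so writing it as the escape \u00A3 keeps the script file pure ASCII and removes any dependence on the encoding the file is served with. The function name replacePoundSign is an assumption made for the example.

```javascript
// Hypothetical helper: replaces every pound sign (U+00A3) in the text.
function replacePoundSign(text) {
  // The escape is resolved by the JavaScript parser, so the source file
  // itself never contains the non-ASCII character.
  return text.replace(/\u00A3/g, "poundsign");
}

console.log(replacePoundSign("Total: \u00A35 and \u00A310"));
```

Serving the js file with the same character encoding that it was saved in (for example UTF-8 for both) should also stop the stray Â from appearing.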

Javascript/jQuery Fail to Append html to body

I'm working on an Android/iOS app. I need some sort of infinite scroll, so that when the user scrolls to the end of the page, new content loads.
In the native code I store the new content in a string and then append it to the page with JavaScript/jQuery.
For Android:
String js = "javascript:(function() { document.body.innerHTML += '" + newContent + "';}())";
loadUrl(js);
So far so good, but if newContent contains any of certain characters, this code fails and nothing is appended.
I have found these characters so far: ' \n « »
If I replace these characters in newContent, the code works and the new content is appended to the body.
The problem is that every time I think I have found all the illegal characters, a new one causes my code to fail.
I also tried parsing my string to HTML before adding it, but that fails too.
js = "javascript:(function(){var html = $.parseHTML( '"+newContent+"' ); $(\"body\").append(html);}())";
loadUrl(js);
So is there any general way to get rid of these illegal characters?
Any help would be appreciated.
Your text contains unescaped special characters, which cause the error.
Try escaping the text with encodeURI.
Edit:
It may be that the JavaScript already fails because of the special characters; in that case you would have to escape the string in Java first. Search for "java escape string for javascript" and you'll find plenty of information.
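The underlying problem is that the content is pasted between single quotes, so a quote or a raw line break inside it ends the string literal early and the whole script becomes a syntax error. The sketch below illustrates the principle in JavaScript: serialize the content as a string literal instead of concatenating it raw. The helper name buildAppendScript is hypothetical, and in the app the equivalent escaping would have to happen in the Java code before loadUrl is called (org.json.JSONObject.quote is one candidate for that step, though the thread does not confirm it).

```javascript
// Hypothetical helper: returns script text that appends newContent to the body.
function buildAppendScript(newContent) {
  // JSON.stringify produces a double-quoted literal in which quotes,
  // backslashes and line breaks are escaped, so the content cannot
  // terminate the string early.
  const literal = JSON.stringify(newContent);
  return "(function() { document.body.innerHTML += " + literal + "; }())";
}

// Content with a single quote, guillemets and a line break.
const tricky = "<p>it's \u00ABquoted\u00BB\nsecond line</p>";
console.log(buildAppendScript(tricky));
```

With this approach there is no list of illegal characters to maintain, because the serializer handles every character that is special inside a string literal.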

Regex to put quotes for html attributes

I have a scenario like this:
In HTML tags, if an attribute value is not surrounded by either single or double quotes, I want to put double quotes around it.
How do I write a regex for that?
If you repeat this regex as many times as there might be attributes in a tag, that should work as long as the text is fairly normal and does not contain lots of special characters that might give false positives.
"<a href=www.google.com title = link >".replace(/(<[^>]+?=)([^"'\s][^\s>]+)/g,"$1'$2'")
Regex says: open tag (<) followed by one or more not close tags ([^>]+) ungreedily (?) followed by equals (=) all captured as the first group ((...)) and followed by second group ((...)) capturing not single or double quote or space ([^"'\s]) followed by not space or close tag ([^\s>]) one or more times (+) and then replace that with first captured group ($1) followed by second captured group in single quotes ('$2')
For example with looping:
var html = "<a href=www.google.com another=something title = link >";
var newhtml = null;
// Quote one attribute per pass until a pass changes nothing.
while (html != newhtml) {
  if (newhtml)
    html = newhtml;
  newhtml = html.replace(/(<[^>]+?=)([^"'\s][^\s>]+)/, "$1'$2'");
}
alert(html);
But this is a bad way to go about your problem. It is better to use an HTML parser to parse the HTML, then re-format it as you want it. That would ensure well-formatted HTML, whereas regular expressions can only ensure well-formatted HTML if the input is exactly as expected.
Very helpful! I made a slight change to allow it to match attributes with a single character value:
/(<[^>]+?=)([^"'\s>][^\s>]*)/g (changed one or more + to zero or more * and added > to the first match in second group).
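Putting the loop and the refined pattern together gives something like the sketch below. It uses double quotes, as the question asked, and repeats the replacement until the string stops changing. The function name quoteAttributes is made up for the example, and the approach only suits simple input: a quoted value that itself contains an equals sign (a URL with a query string, say) would be mangled.

```javascript
// Hypothetical helper: wraps unquoted attribute values in double quotes.
function quoteAttributes(html) {
  // Group 1: tag start up to an equals sign. Group 2: an unquoted value.
  const pattern = /(<[^>]+?=)([^"'\s>][^\s>]*)/;
  let previous;
  do {
    previous = html;
    // Each pass quotes at most one attribute.
    html = html.replace(pattern, '$1"$2"');
  } while (html !== previous);
  return html;
}

console.log(quoteAttributes("<a href=www.google.com another=something x=1>"));
```

As in the original answer, an attribute written with spaces around the equals sign (title = link) is left untouched.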

html entity decode fail with the new lines in textareas

When I get text from a textarea in HTML like this
wase&
;#101;m
the correct decoded result is waseem
Notice the newline. When I decode it I get
wase&;#101;m
The newline causes errors here. Can I fix it? I use JavaScript for the decoding.
I use this function for decoding:
function html_entity_decode(str) {
  // Escape angle brackets first so the input is treated as text, not markup
  var ta = document.createElement("textarea");
  ta.innerHTML = str.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return ta.value;
}
You could pass it through the following regex - Replace
&[\s\r\n]+;(?=#\d+;)
with
&
globally. Your HTML entity format is simply broken. Apart from the fact that HTML entities cannot contain whitespace and newlines, they cannot contain semi-colons in the middle.
Your input text may not be right and it is working as intended. Garbage-In-Garbage-Out.
I suspect the &\n; should be something else. But if not:
str.replace(/&\s*;/g, "");
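The first suggestion above can be sketched without a DOM: repair the broken reference, then decode numeric references with String.fromCodePoint. Both function names are hypothetical, and the sketch assumes the only damage is whitespace plus a stray semicolon between & and #.

```javascript
// Hypothetical helper: turns "&<whitespace>;#101;" back into "&#101;".
function repairBrokenEntities(str) {
  return str.replace(/&\s+;(?=#\d+;)/g, "&");
}

// Hypothetical helper: decodes decimal character references only.
function decodeNumericEntities(str) {
  return str.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCodePoint(Number(code));
  });
}

const broken = "wase&\n;#101;m";
console.log(decodeNumericEntities(repairBrokenEntities(broken)));
```

Named references and out-of-range code points are not handled here; in a browser, passing the repaired string to the textarea-based html_entity_decode from the question would cover the named ones.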
