Is there an easy way to take a string of html in JavaScript and strip out the html?
If you're running in a browser, then the easiest way is just to let the browser do it for you...
function stripHtml(html)
{
let tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you - see Saba's answer on using the now widely-available DOMParser.
myString.replace(/<[^>]*>?/gm, '');
Simplest way:
jQuery(html).text();
That retrieves all the text from a string of html.
I would like to share an edited version of Shog9's approved answer.
As Mike Samuel pointed out in a comment, that function can execute inline JavaScript code.
But Shog9 is right when saying "let the browser do it for you..."
So here is my edited version, using DOMParser:
function strip(html){
let doc = new DOMParser().parseFromString(html, 'text/html');
return doc.body.textContent || "";
}
Here is the code to test the inline JavaScript:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Also, it does not request resources on parse (like images)
strip("Just text <img src='https://assets.rbl.ms/4155638/980x.jpg'>")
As an extension to the jQuery method, if your string might not contain HTML (eg if you are trying to remove HTML from a form field)
jQuery(html).text();
will return an empty string if there is no HTML
Use:
jQuery('<p>' + html + '</p>').text();
instead.
Update:
As has been pointed out in the comments, in some circumstances this solution will execute JavaScript contained within html. If the value of html could be influenced by an attacker, use a different solution.
Converting HTML for Plain Text emailing keeping hyperlinks (a href) intact
The above function posted by hypoxide works fine, but I was after something that would convert HTML created in a web rich-text editor (for example FCKEditor), clearing out all the HTML but leaving the links intact. I wanted both the HTML and the plain-text version so I could build the correct parts of an SMTP email (both HTML and plain text).
After a long time searching Google, my colleagues and I came up with this, using the regex engine in JavaScript:
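str='this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>';
// <br> becomes a newline
str=str.replace(/<br>/gi, "\n");
// opening <p> tags become a newline; [^>]* stops at the end of the tag instead of swallowing the paragraph text
str=str.replace(/<p\b[^>]*>/gi, "\n");
// keep the link text ($2) and the href ($1)
str=str.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
// strip every remaining tag
str=str.replace(/<(?:.|\s)*?>/g, "");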
the str variable starts out like this:
this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>
and then after the code has run it looks like this:-
this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk) Link Number 1
Now back to normal text and stuff
As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. I have also replaced the <p> and <br> tags with \n (newline char) so that some visual formatting is retained.
To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the " $2 (Link->$1) " replacement string, where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most mail clients convert them so the user can click on them.
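The $1/$2 substitution can be tried on its own with a small sketch like the one below. The function name htmlToTextWithLinks and the "text (url)" output format are illustrative assumptions, not part of the original script, and like any regex approach it only copes with simple, well-formed anchors.

```javascript
// Hypothetical helper: convert simple HTML to plain text, keeping link targets.
function htmlToTextWithLinks(html) {
  return html
    // <br>, <br/> and <br /> become newlines
    .replace(/<br\s*\/?>/gi, "\n")
    // opening <p> tags become newlines
    .replace(/<p\b[^>]*>/gi, "\n")
    // $1 is the href value, $2 is the link text
    .replace(/<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)<\/a>/gi, "$2 ($1)")
    // drop every remaining tag
    .replace(/<[^>]*>/g, "");
}

const sample = 'Read <a href="http://www.bbc.co.uk">BBC</a> news<br>today';
console.log(htmlToTextWithLinks(sample));
```

Swapping "$2 ($1)" for another template changes how links are rendered without touching the rest of the pipeline.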
Hope you find this useful.
An improvement to the accepted answer.
function strip(html)
{
var tmp = document.implementation.createHTMLDocument("New").body;
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
This way something running like this will do no harm:
strip("<img onerror='alert(\"could run arbitrary JS here\")' src=bogus>")
Firefox, Chromium and Explorer 9+ are safe.
Opera Presto is still vulnerable.
Also images mentioned in the strings are not downloaded in Chromium and Firefox saving http requests.
This should do the work on any Javascript environment (NodeJS included).
const text = `
<html lang="en">
<head>
<style type="text/css">*{color:red}</style>
<script>alert('hello')</script>
</head>
<body><b>This is some text</b><br/></body>
</html>`;
// Remove style tags and content
const stripped = text.replace(/<style[^>]*>.*<\/style>/gm, '')
// Remove script tags and content
.replace(/<script[^>]*>.*<\/script>/gm, '')
// Remove all opening, closing and orphan HTML tags
.replace(/<[^>]+>/gm, '')
// Remove leading spaces and repeated CR/LF
.replace(/([\r\n]+ +)+/gm, '');
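One thing to watch with the patterns above: `.` does not match newlines in JavaScript, so a style or script block that spans several lines is not removed as a unit and its contents leak into the output. A minimal sketch of a variant that handles multi-line blocks, assuming a hypothetical helper name stripHtmlWithRegex, might look like this:

```javascript
// Hypothetical helper: regex-only stripping that also handles multi-line
// <style> and <script> blocks. Still not a real HTML parser.
function stripHtmlWithRegex(html) {
  return html
    // [\s\S] matches any character including newlines; *? keeps the match short
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "")
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")
    // remove all remaining tags
    .replace(/<[^>]+>/g, "")
    // collapse runs of whitespace and trim the ends
    .replace(/\s+/g, " ")
    .trim();
}

const page = "<style>\n*{color:red}\n</style><script>\nalert('hello')\n</script><b>This is some text</b><br/>";
console.log(stripHtmlWithRegex(page));
```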
I altered Jibberboy2000's answer to include several <BR /> tag formats, remove everything inside <SCRIPT> and <STYLE> tags, format the resulting HTML by removing multiple line breaks and spaces and convert some HTML-encoded code into normal. After some testing it appears that you can convert most of full web pages into simple text where page title and content are retained.
In the simple example,
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!--comment-->
<head>
<title>This is my title</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style>
body {margin-top: 15px;}
a { color: #D80C1F; font-weight:bold; text-decoration:none; }
</style>
</head>
<body>
<center>
This string has <i>html</i> code i want to <b>remove</b><br>
In this line <a href="http://www.bbc.co.uk">BBC</a> with link is mentioned.<br/>Now back to "normal text" and stuff using <html encoding>
</center>
</body>
</html>
becomes
This is my title
This string has html code i want to remove
In this line BBC (http://www.bbc.co.uk) with link is mentioned.
Now back to "normal text" and stuff using
The JavaScript function and test page look like this:
function convertHtmlToText() {
var inputText = document.getElementById("input").value;
var returnText = "" + inputText;
//-- remove BR tags and replace them with line break
returnText=returnText.replace(/<br>/gi, "\n");
returnText=returnText.replace(/<br\s\/>/gi, "\n");
returnText=returnText.replace(/<br\/>/gi, "\n");
//-- remove P and A tags but preserve what's inside of them
returnText=returnText.replace(/<p.*>/gi, "\n");
returnText=returnText.replace(/<a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 ($1)");
//-- remove all inside SCRIPT and STYLE tags
returnText=returnText.replace(/<script.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/script>/gi, "");
returnText=returnText.replace(/<style.*>[\w\W]{1,}(.*?)[\w\W]{1,}<\/style>/gi, "");
//-- remove all else
returnText=returnText.replace(/<(?:.|\s)*?>/g, "");
//-- get rid of more than 2 multiple line breaks:
returnText=returnText.replace(/(?:(?:\r\n|\r|\n)\s*){2,}/gim, "\n\n");
//-- get rid of more than 2 spaces:
returnText = returnText.replace(/ +(?= )/g,'');
//-- get rid of html-encoded characters (decode &amp; last so that "&amp;lt;" is not decoded twice):
returnText=returnText.replace(/&nbsp;/gi," ");
returnText=returnText.replace(/&quot;/gi,'"');
returnText=returnText.replace(/&lt;/gi,'<');
returnText=returnText.replace(/&gt;/gi,'>');
returnText=returnText.replace(/&amp;/gi,"&");
//-- return
document.getElementById("output").value = returnText;
}
It was used with this HTML:
<textarea id="input" style="width: 400px; height: 300px;"></textarea><br />
<button onclick="convertHtmlToText()">CONVERT</button><br />
<textarea id="output" style="width: 400px; height: 300px;"></textarea><br />
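The entity-decoding step of the function can also be written as a single pass, which sidesteps any ordering questions between the individual replace calls. This is only a sketch: the names ENTITY_MAP and decodeBasicEntities are made up for illustration, and it covers just the five entities handled above.

```javascript
// Hypothetical lookup table for the five entities the function above handles.
const ENTITY_MAP = { nbsp: " ", amp: "&", quot: '"', lt: "<", gt: ">" };

// Decode each entity exactly once, in one pass over the string.
function decodeBasicEntities(text) {
  return text.replace(/&(nbsp|amp|quot|lt|gt);/gi, function (match, name) {
    return ENTITY_MAP[name.toLowerCase()];
  });
}

console.log(decodeBasicEntities("Tom &amp; Jerry &lt;3"));
```

Because every match is replaced once and never rescanned, an input such as "&amp;lt;" comes out as the literal text "&lt;" rather than "<".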
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
This is a regex version, which is more resilient to malformed HTML, like:
Unclosed tags
Some text <img
"<", ">" inside tag attributes
Some text <img alt="x > y">
Newlines
Some <a
href="http://google.com">
The code
var html = '<br>This <img alt="a>b" \r\n src="a_b.gif" />is > \nmy<>< > <a>"text"</a'
var text = html.replace(/<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g, "");
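To see how the pattern behaves on the malformed cases listed above, here is a small sketch that wraps it in a function. The names TAG_PATTERN and stripTagsResilient are illustrative only.

```javascript
// Same pattern as above, wrapped in a hypothetical helper.
const TAG_PATTERN = /<\/?("[^"]*"|'[^']*'|[^>])*(>|$)/g;

function stripTagsResilient(html) {
  return html.replace(TAG_PATTERN, "");
}

// Unclosed tag: everything from "<" to the end of the string is dropped.
console.log(JSON.stringify(stripTagsResilient("Some text <img")));
// ">" inside a quoted attribute does not end the tag early.
console.log(JSON.stringify(stripTagsResilient('Some text <img alt="x > y">')));
// A tag that spans a newline is still removed as one unit.
console.log(JSON.stringify(stripTagsResilient('Some <a\nhref="http://google.com">link</a>')));
```

One trade-off worth knowing: because an unclosed "<" is treated as the start of a tag, plain text such as "1 < 2" loses everything after the "<".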
from CSS tricks:
https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
const originalString = `
<div>
<p>Hey that's <span>somthing</span></p>
</div>
`;
const strippedString = originalString.replace(/(<([^>]+)>)/gi, "");
console.log(strippedString);
Another, admittedly less elegant solution than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.
var bodyContent = document.getElementsByTagName('body')[0];
var result = appendTextNodes(bodyContent);
function appendTextNodes(element) {
var text = '';
// Loop through the childNodes of the passed in element
for (var i = 0, len = element.childNodes.length; i < len; i++) {
// Get a reference to the current child
var node = element.childNodes[i];
// Append the node's value if it's a text node
if (node.nodeType == 3) {
text += node.nodeValue;
}
// Recurse through the node's children, if there are any, and append their text
if (node.childNodes.length > 0) {
text += appendTextNodes(node);
}
}
// Return the final result
return text;
}
If you want to keep the links and the structure of the content (h1, h2, etc.) then you should check out TextVersionJS. You can use it with any HTML, although it was created to convert an HTML email to plain text.
The usage is very simple. For example in node.js:
var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
Or in the browser with pure js:
<script src="textversion.js"></script>
<script>
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
</script>
It also works with require.js:
define(["textversionjs"], function(createTextVersion) {
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
});
const htmlParser= new DOMParser().parseFromString("<h6>User<p>name</p></h6>" , 'text/html');
const textString= htmlParser.body.textContent;
console.log(textString)
A lot of people have answered this already, but I thought it might be useful to share the function I wrote that strips HTML tags from a string but allows you to include an array of tags that you do not want stripped. It's pretty short and has been working nicely for me.
function removeTags(string, array){
return array ? string.split("<").filter(function(val){ return f(array, val); }).map(function(val){ return f(array, val); }).join("") : string.split("<").map(function(d){ return d.split(">").pop(); }).join("");
function f(array, value){
return array.map(function(d){ return value.includes(d + ">"); }).indexOf(true) != -1 ? "<" + value : value.split(">")[1];
}
}
var x = "<span><i>Hello</i> <b>world</b>!</span>";
console.log(removeTags(x)); // Hello world!
console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>
For easier solution, try this => https://css-tricks.com/snippets/javascript/strip-html-tags-in-javascript/
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
It is also possible to use the fantastic htmlparser2 pure JS HTML parser. Here is a working demo:
var htmlparser = require('htmlparser2');
var body = '<p><div>This is </div>a <span>simple </span> <img src="test"></img>example.</p>';
var result = [];
var parser = new htmlparser.Parser({
ontext: function(text){
result.push(text);
}
}, {decodeEntities: true});
parser.write(body);
parser.end();
result.join('');
The output will be This is a simple example.
See it in action here: https://tonicdev.com/jfahrenkrug/extract-text-from-html
This works in both node and the browser if you pack your web application using a tool like webpack.
I made some modifications to the original Jibberboy2000 script.
Hope it'll be useful to someone.
str = '**ANY HTML CONTENT HERE**';
str=str.replace(/<\s*br\/*>/gi, "\n");
str=str.replace(/<\s*a.*href="(.*?)".*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
str=str.replace(/<\s*\/*.+?>/ig, "\n");
str=str.replace(/ {2,}/gi, " ");
str=str.replace(/\n+\s*/gi, "\n\n");
After trying all of the answers mentioned, I found that most if not all of them had edge cases and couldn't completely support my needs.
I started exploring how PHP does it and came across the php.js lib, which replicates the strip_tags method here: http://phpjs.org/functions/strip_tags/
function stripHTML(my_string){
var charArr = my_string.split(''),
resultArr = [],
htmlZone = 0,
quoteZone = 0;
for( var x=0; x < charArr.length; x++ ){
switch( charArr[x] + htmlZone + quoteZone ){
case "<00" : htmlZone = 1;break;
case ">10" : htmlZone = 0;resultArr.push(' ');break;
case '"10' : quoteZone = 1;break;
case "'10" : quoteZone = 2;break;
case '"11' :
case "'12" : quoteZone = 0;break;
default : if(!htmlZone){ resultArr.push(charArr[x]); }
}
}
return resultArr.join('');
}
Accounts for > inside attributes and <img onerror="javascript"> in newly created dom elements.
usage:
clean_string = stripHTML("string with <html> in it")
demo:
https://jsfiddle.net/gaby_de_wilde/pqayphzd/
demo of top answer doing the terrible things:
https://jsfiddle.net/gaby_de_wilde/6f0jymL6/1/
Here's a version which sorta addresses @MikeSamuel's security concern:
function strip(html)
{
try {
var doc = document.implementation.createDocument('http://www.w3.org/1999/xhtml', 'html', null);
doc.documentElement.innerHTML = html;
return doc.documentElement.textContent||doc.documentElement.innerText;
} catch(e) {
return "";
}
}
Note, it will return an empty string if the HTML markup isn't valid XML (i.e., tags must be closed and attributes must be quoted). This isn't ideal, but it does avoid the security exploit potential.
If you need to handle markup that isn't valid XML, you could try using:
var doc = document.implementation.createHTMLDocument("");
but that isn't a perfect solution either for other reasons.
I think the easiest way is to just use Regular Expressions as someone mentioned above. Although there's no reason to use a bunch of them. Try:
stringWithHTML = stringWithHTML.replace(/<\/?[a-z][a-z0-9]*[^<>]*>/ig, "");
Below code allows you to retain some html tags while stripping all others
function strip_tags(input, allowed) {
allowed = (((allowed || '') + '')
.toLowerCase()
.match(/<[a-z][a-z0-9]*>/g) || [])
.join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
return input.replace(commentsAndPhpTags, '')
.replace(tags, function($0, $1) {
return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
});
}
I just needed to strip out the <a> tags and replace them with the text of the link.
This seems to work great.
htmlContent= htmlContent.replace(/<a.*href="(.*?)">/g, '');
htmlContent= htmlContent.replace(/<\/a>/g, '');
The accepted answer works fine mostly, however in IE if the html string is null you get the "null" (instead of ''). Fixed:
function strip(html)
{
if (html == null) return "";
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
A safer way to strip the html with jQuery is to first use jQuery.parseHTML to create a DOM, ignoring any scripts, before letting jQuery build an element and then retrieving only the text.
function stripHtml(unsafe) {
return $($.parseHTML(unsafe)).text();
}
Can safely strip html from:
<img src="unknown.gif" onerror="console.log('running injections');">
And other exploits.
nJoy!
const strip=(text) =>{
return (new DOMParser()?.parseFromString(text,"text/html"))
?.body?.textContent
}
const value=document.getElementById("idOfEl").value
const cleanText=strip(value)
With jQuery you can simply retrieve it by using
$('#elementID').text()
I have created a working regular expression myself:
str=str.replace(/(<\?[a-z]*(\s[^>]*)?\?(>|$)|<!\[[a-z]*\[|\]\]>|<!DOCTYPE[^>]*?(>|$)|<!--[\s\S]*?(-->|$)|<[a-z?!\/]([a-z0-9_:.])*(\s[^>]*)?(>|$))/gi, '');
Simple two-line jQuery to strip the HTML:
var content = "<p>checking the html source </p><p>\n</p><p>with </p><p>all</p><p>the html </p><p>content</p>";
var text = $(content).text();//It gets you the plain text
console.log(text);//check the data in your console
cj("#text_area_id").val(text);//set your content to text area using text_area_id
I am trying to remove all the html tags out of a string in Javascript.
Here's what I have... I can't figure out why it's not working. Does anyone know what I am doing wrong?
<script type="text/javascript">
var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);
</script>
Thanks a lot!
Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:
var regex = /(<([^>]+)>)/ig
, body = "<p>test</p>"
, result = body.replace(regex, "");
console.log(result);
If you're willing to use a library such as jQuery, you could simply do this:
console.log($('<p>test</p>').text());
This is an old question, but I stumbled across it and thought I'd share the method I used:
var body = '<div id="anid">some text</div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;
sanitized will now contain: "some text and some more text"
Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.
Warning
This can't safely deal with user content, because it's vulnerable to script injections. For example, running this:
var body = '<img src=fake onerror=alert("dangerous")> Hello';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;
Leads to an alert being emitted.
This worked for me.
var regex = /(&nbsp;|<([^>]+)>)/ig
, body = tt
, result = body.replace(regex, "");
alert(result);
This is a solution for HTML tags, &nbsp; and so on. You can remove and add conditions
to get the text without HTML, and you can replace the matches with anything you like.
function convertHtmlToText(passHtmlBlock)
{
var str = passHtmlBlock.toString();
return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, 'ReplaceIfYouWantOtherWiseKeepItEmpty');
}
Here is how TextAngular (WYSIWYG editor) does it. I also found this to be the most consistent answer, and it uses NO REGEX.
#license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
// turn html into pure text that shows visiblity
function stripHtmlToText(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
var res = tmp.textContent || tmp.innerText || '';
res.replace('\u200B', ''); // zero width space
res = res.trim();
return res;
}
You can use a powerful string-management library, underscore.string.js:
_('a <a href="#">link</a>').stripTags()
=> 'a link'
_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()
=> 'a linkalert("hello world!")'
Don't forget to import this lib as follows:
<script src="underscore.js" type="text/javascript"></script>
<script src="underscore.string.js" type="text/javascript"></script>
<script type="text/javascript"> _.mixin(_.str.exports())</script>
my simple JavaScript library called FuncJS has a function called "strip_tags()" which does the task for you — without requiring you to enter any regular expressions.
For example, say that you want to remove tags from a sentence - with this function, you can do it simply like this:
strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");
This will produce "This string contains a lot of tags!".
For a better understanding, please do read the documentation at
GitHub FuncJS.
Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!
For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
<html>
<head>
<script type="text/javascript">
function striptag(){
var html = /(<([^>]+)>)/gi;
for (i=0; i < arguments.length; i++)
arguments[i].value=arguments[i].value.replace(html, "")
}
</script>
</head>
<body>
<form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>
The selected answer doesn't always ensure that HTML is stripped, as it's still possible to construct an invalid HTML string through it by crafting a string like the following.
"<<h1>h1>foo<<//</h1>h1/>"
This input will ensure that the stripping assembles a set of tags for you and will result in:
"<h1>foo</h1>"
additionally jquery's text function will strip text not surrounded by tags.
Here's a function that uses jQuery but should be more robust against both of these cases:
var stripHTML = function(s) {
var lastString;
do {
s = $('<div>').html(lastString = s).text();
} while(lastString !== s)
return s;
};
The way I do it is practically a one-liner.
The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.
Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.
I realize this question is old, I just thought my solution was unique and wanted to share. :)
function getTextFromString(htmlString) {
return document
.createRange()
// Creates a fragment and turns the supplied string into HTML nodes
.createContextualFragment(htmlString)
// Gets the text from the fragment
.textContent
// Removes the Zero-Width Space, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
// Trims off any extra space on either end of the string
.trim();
}
var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');
alert(cleanString);
If you want to do this with a library and are not using JQuery, the best JS library specifically for this purpose is striptags.
It is heavier than a regex (17.9kb), but if you need greater security than a regex can provide/don't care about the extra 17.6kb, then it's the best solution.
Like others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse html with regex, which is what you're doing when you're attempting to strip html from your source string.
I’m making a random sentence generator for my English class. I’m close but because of my limited php and javascript knowledge I need to ask for help. I’m not bad at reading the code, I just get stuck writing it.
I want to use explode to break up a string of comma-separated values. The string is a mix of English and Spanish; in the .txt file they would be separated like:
The book, El libro
The man, El hombre
The woman, La mujer
etc.
I would like to break these two values into an array and display them in separate places on my web page.
I'm going to use a random text generator script that I found; it's working great with no problems. I just need to modify it using explode to read the file, separate the values into an array, and display the separate values of the array.
<?php
/* File, where the random text/quotes are stored one per line */
$settings['text_from_file'] = 'quotes.txt';
/*
How to display the text?
0 = raw mode: print the text as it is, when using RanTex as an include
1 = Javascript mode: when using Javascript to display the quote
*/
$settings['display_type'] = 1;
/* Allow on-the-fly settings override? 0 = NO, 1 = YES */
$settings['allow_otf'] = 1;
// Override type?
if ($settings['allow_otf'] && isset($_GET['type']))
{
$type = intval($_GET['type']);
}
else
{
$type = $settings['display_type'];
}
// Get a list of all text options
if ($settings['text_from_file'])
{
$settings['quotes'] = file($settings['text_from_file']);
}
// If we have any text choose a random one, otherwise show 'No text to choose from'
if (count($settings['quotes']))
{
$txt = $settings['quotes'][array_rand($settings['quotes'])];
}
else
{
$txt = 'No text to choose from';
}
// Output the image according to the selected type
if ($type)
{
// New lines will break Javascript, remove any and replace them with <br />
$txt = nl2br(trim($txt));
$txt = str_replace(array("\n","\r"),'',$txt);
// Set the correct MIME type
header("Content-type: text/javascript");
// Print the Javascript code
echo 'document.write(\''.addslashes($txt).'\')';
}
else
{
echo $txt;
}
?>
The script that displays the result:
<script type="text/javascript" src="rantex.php?type=1"></script>
Can someone please help me modify the rantex.php file so that I can use explode to separate the different comma separated values, and use a different script to call them in different places on my web page?
Thank you, and please excuse my noobness.
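The following seems unnecessary: file() keeps the trailing newline on each line (unless you pass FILE_IGNORE_NEW_LINES), but trim() already removes it, so there is nothing left for nl2br() or str_replace() to do: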
// New lines will break Javascript, remove any and replace them with <br />
$txt = nl2br(trim($txt));
$txt = str_replace(array("\n","\r"),'',$txt);
To break your line, you may instead use:
list($english, $spanish) = explode(', ', trim($txt));
It seems you are trying to use PHP to serve a static page with some random sentences, right? So why not use PHP to serve valid JSON, and handle the display logic on the client?
Here's a quick implementation.
// Get the data from the text file
$source = file_get_contents('./quotes.txt', true);
// Build an array (break on every line break)
$sentences = explode("\n", $source);
// Filter out empty values (if there is any)
$filtered = array_filter($sentences, function($item) {
return $item !== "";
});
// Build a list of records; array_values() reindexes the filtered array so json_encode() emits a JSON array
$pairs = array_map(function($item) {
return ['sentence' => $item];
}, array_values($filtered));
// Encode the hashmap to JSON, and return this to the client.
$json = json_encode($pairs);
Now you can let the client handle the rest, with some basic JavaScript.
// Return a random sentence from your list.
var random = sentences[Math.floor(Math.random() * sentences.length)];
// Finally display it
random.sentence
[edit]
You can get the JSON data to client in many ways, but if you don't want to use something like Ajax, you could simply just dump the contents on your webpage, then use JavaScript to update the random sentence, from the global window object.
// Inside your php page
<p>English: <span id="english"></span></p>
<p>Spanish: <span id="spanish"></span></p>
<script>
var sentences = <?= json_encode($pairs); ?>;
var random = sentences[Math.floor(Math.random() * sentences.length)];
var elspa = document.getElementById('spanish');
var eleng = document.getElementById('english');
elspa.innerText = random.sentence.split(',')[1];
eleng.innerText = random.sentence.split(',')[0];
</script>
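On the client side, the split step can be pulled out into a small helper that also trims the space after the comma. The sketch below is an illustration only: parsePair and parsePairs are hypothetical names, and it assumes each line holds exactly one "English, Spanish" pair.

```javascript
// Hypothetical helper: split one "English, Spanish" line at the first comma.
function parsePair(line) {
  const commaIndex = line.indexOf(",");
  if (commaIndex === -1) return null; // blank or malformed line
  return {
    english: line.slice(0, commaIndex).trim(),
    spanish: line.slice(commaIndex + 1).trim(),
  };
}

// Hypothetical helper: turn the whole file contents into a list of pairs.
function parsePairs(fileText) {
  return fileText
    .split(/\r?\n/)
    .map(function (line) { return parsePair(line); })
    .filter(function (pair) { return pair !== null; });
}

const pairs = parsePairs("The book, El libro\nThe man, El hombre\n\nThe woman, La mujer\n");
const random = pairs[Math.floor(Math.random() * pairs.length)];
console.log(random.english + " / " + random.spanish);
```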
OK, so I have this figured out. I take zero credit because I paid someone to do it. Special thanks to @stormpat for sending me in the right direction; if not for him I wouldn't have looked at this from a JSON point of view.
The .PHP file is like so:
<?php
$f_contents = file('quotes.txt');
$line = trim($f_contents[rand(0, count($f_contents) - 1)]);
$data = explode(',', $line);
$data['eng'] = $data[0];
$data['esp'] = $data[1];
echo json_encode($data);
?>
On the .HTML page in the header:
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
<script>
(function ($) {
$(function()
{
function load_random_data() {
$.get('random_line.php', function(data) {
var data = $.parseJSON(data);
$('#random_english').text(data.eng);
$('#random_spanish').text(data.esp);
});
}
load_random_data();
$('#get_random').click(function(e){
e.preventDefault();
load_random_data();
});
});
})(jQuery);
</script>
This puts the two values into separate elements, so to show them in my HTML page I refer to each element by its id. For instance, I wanted to drop each value into a table cell, so I gave the individual td cells an id:
<td id="random_spanish"></td>
<td id="random_english"></td>
Plus, as a bonus, the coder threw in a nifty button to refresh the values:
<input type="button" value="Get random" id="get_random" />
So now I don't have to have my students refresh the whole web page; they can just hit the button and refresh the random values.
Thanks again everyone!
I have a list of airport codes, names, and locations in an Excel Spreadsheet like the below:
+-------+----------------------------------------+-------------------+
| Code | Airport Name | Location |
+-------+----------------------------------------+-------------------+
| AUA | Queen Beatrix International Airport | Oranjestad, Aruba|
+-------+----------------------------------------+-------------------+
My JavaScript is passed a 3-character string that should be an airport code. When that happens I need to find the code in the spreadsheet and return the Airport Name and Location.
I'm thinking something like:
var code = "AUA";
console.log(getAirportInfo(code));
function getAirportInfo(code) {
// get information from spreadsheet
//format info (no help needed there)
return airportInfo;
}
Where the log would write out:
Oranjestad, Aruba (AUA): Queen Beatrix International Airport
What is the easiest method to get the data I need from the spreadsheet?
Extra Info:
The spreadsheet has over 17,000 entries
The function alluded to above may be called up to 8 times in row
I don't have to use an Excel spreadsheet; that's just what I have now
I will never need to edit the spreadsheet with my code
I did search around the web, but everything I could find was much more complicated than what I'm trying to do, which made it hard to understand what I'm looking for.
Thank you for any help pointing me in the right direction.
I ended up using a tool at shancarter.com/data_converter to convert my file to a JSON file and linked that to my page. Now I just loop through that JSON object to get what I need. This seemed like the simplest way for my particular needs.
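For anyone following the same route, a lookup over the converted data might be sketched as below. The property names code, name and location are assumptions about what the converter produces, and the single record stands in for the full list. Building a Map once avoids looping over 17,000 entries on every call.

```javascript
// Assumed shape of the converted JSON: one object per spreadsheet row.
const airports = [
  { code: "AUA", name: "Queen Beatrix International Airport", location: "Oranjestad, Aruba" },
];

// Index the rows by code once, so each lookup is a single Map access.
const airportsByCode = new Map(airports.map(function (a) { return [a.code, a]; }));

function getAirportInfo(code) {
  const airport = airportsByCode.get(code.toUpperCase());
  if (!airport) return null; // unknown code
  return airport.location + " (" + airport.code + "): " + airport.name;
}

console.log(getAirportInfo("AUA"));
// → Oranjestad, Aruba (AUA): Queen Beatrix International Airport
```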
I've used a plain text file (CSV or TSV, both of which can be exported directly from Excel).
I loaded that into a string variable via XMLHttpRequest. Usually the browser's cache will avoid having to download the file on each page load.
Then I have a regex parse out the values as needed.
All without using any third-party library. I can dig the code out if you wish.
Example:
you will need to have the data.txt file in the same web folder as this page, or update the paths...
<html>
<head>
<script>
var fileName = "data.txt";
var data = "";
req = new XMLHttpRequest();
req.open("GET", fileName, false);
req.addEventListener("readystatechange", function (e) {
data = req.responseText ;
});
req.send();
function getInfoByCode(c){
if( data == "" ){
return 'DataNotReady' ;
} else {
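var rx = new RegExp( "^(" + c + ")\\s+\\|\\s+(.+)\\s+\\|\\s+\\s+(.+)\\|", 'm' ) ;
var values = data.match(rx);
// match() returns null when the code is not in the file
if( values === null ){
return { airport:'unknown' , city:'unknown' };
}
return { airport:values[2] , city:values[3] };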
}
}
function clickButton(){
var e = document.getElementById("code");
var ret = getInfoByCode(e.value);
var res = document.getElementById("res");
res.innerText = "Airport:" + ret.airport + " in " + ret.city;
}
</script>
</head>
<body>
<input id="code" value="AUA">
<button onclick="clickButton();">Find</button>
<div id="res">
</div>
</body>
</html>
I have a web page that asks the user for a paragraph of text, then performs some operation on it. To demo it to lazy users, I'd like to add an "I feel lucky" button that will grab some random text from Wikipedia and populate the inputs.
How can I use Javascript to fetch a sequence of text from a random Wikipedia article?
I found some examples of fetching and parsing articles using the Wikipedia API, but they tend to be server side. I'm looking for a solution that runs entirely from the client and doesn't get scuppered by same origin policy.
Note random gibberish is not sufficient; I need human-readable sentences that make sense.
My answer builds on the technique suggested here.
The tricky part is formulating the correct query string:
http://en.wikipedia.org/w/api.php?action=query&generator=random&prop=extracts&exchars=500&format=json&callback=onWikipedia
generator=random selects a random page
prop=extracts and exchars=500 retrieves a 500 character extract
format=json returns JSON-formatted data
callback= causes that data to be wrapped in a function call so it can be treated like any other <script> and injected into your page (see JSONP), thus bypassing cross-domain barriers.
requestid can optionally be added, with a new value each time, to avoid stale results from the browser cache (required in IE9)
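The parameters above can be assembled with the standard URL API instead of string concatenation. This is just a sketch: buildRandomExtractUrl is a hypothetical helper name, and the callback name is whatever global function your page defines.

```javascript
// Hypothetical helper: build the JSONP query URL from the parameters listed above.
function buildRandomExtractUrl(maxChars, callbackName) {
  const url = new URL("http://en.wikipedia.org/w/api.php");
  url.searchParams.set("action", "query");
  url.searchParams.set("generator", "random");   // pick a random page
  url.searchParams.set("prop", "extracts");      // ask for a text extract
  url.searchParams.set("exchars", String(maxChars));
  url.searchParams.set("format", "json");
  url.searchParams.set("callback", callbackName); // JSONP wrapper function
  // A fresh requestid on every call avoids stale cached responses
  url.searchParams.set("requestid", String(Math.floor(Math.random() * 999999)));
  return url.toString();
}

console.log(buildRandomExtractUrl(500, "onWikipedia"));
```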
The page served by the query is something that looks like this (I've added whitespace for readability):
onWikipedia(
{"query":
{"pages":
{"12362520":
{"pageid":12362520,
"ns":0,
"title":"Power Building",
"extract":"<p>The <b>Power Building<\/b> is a historic commercial building in
the downtown of Cincinnati, Ohio, United States. Built in 1903, it
was designed by Harry Hake. It was listed on the National Register
of Historic Places on March 5, 1999. One week later, a group of
buildings in the northeastern section of downtown was named a
historic district, the Cincinnati East Manufacturing and Warehouse
District; the Power Building is one of the district's contributing
properties.<\/p>\n<h2> Notes<\/h2>"
} } } }
)
Of course you'll get a different article each time.
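Because the page id is different on every request, the callback cannot hard-code the key under "pages". One way to pull out the extract, sketched here with a hypothetical helper name getFirstExtract and a shortened copy of the response above, is to take the first value of the pages object:

```javascript
// Hypothetical helper: return the extract of the first page in the response,
// or an empty string if the response does not have the expected shape.
function getFirstExtract(response) {
  const pages = response && response.query && response.query.pages;
  if (!pages) return "";
  const firstPage = Object.values(pages)[0];
  return firstPage && firstPage.extract ? firstPage.extract : "";
}

// Shortened sample in the same shape as the response shown above.
const sampleResponse = {
  query: {
    pages: {
      "12362520": {
        pageid: 12362520,
        ns: 0,
        title: "Power Building",
        extract: "<p>The <b>Power Building</b> is a historic commercial building.</p>",
      },
    },
  },
};

console.log(getFirstExtract(sampleResponse));
```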
Here's a full, working example which you can try out on JSBin.
<HTML><BODY>
<p><textarea id="textbox" style="width:350px; height:150px"></textarea></p>
<p><button type="button" id="button" onclick="startFetch(100, 500)">
Fetch random Wikipedia extract</button></p>
<script type="text/javascript">
var textbox = document.getElementById("textbox");
var button = document.getElementById("button");
var tempscript = null, minchars, maxchars, attempts;
function startFetch(minimumCharacters, maximumCharacters, isRetry) {
if (tempscript) return; // a fetch is already in progress
if (!isRetry) {
attempts = 0;
minchars = minimumCharacters; // save params in case retry needed
maxchars = maximumCharacters;
button.disabled = true;
button.style.cursor = "wait";
}
tempscript = document.createElement("script");
tempscript.type = "text/javascript";
tempscript.id = "tempscript";
tempscript.src = "http://en.wikipedia.org/w/api.php"
+ "?action=query&generator=random&prop=extracts"
+ "&exchars="+maxchars+"&format=json&callback=onFetchComplete&requestid="
+ Math.floor(Math.random()*999999).toString();
document.body.appendChild(tempscript);
// onFetchComplete invoked when finished
}
function onFetchComplete(data) {
document.body.removeChild(tempscript);
tempscript = null
var s = getFirstProp(data.query.pages).extract;
s = htmlDecode(stripTags(s));
if (s.length > minchars || attempts++ > 5) {
textbox.value = s;
button.disabled = false;
button.style.cursor = "auto";
} else {
startFetch(0, 0, true); // retry
}
}
function getFirstProp(obj) {
for (var i in obj) return obj[i];
}
// This next bit borrowed from Prototype / hacked together
// You may want to replace with something more robust
function stripTags(s) {
return s.replace(/<\w+(\s+("[^"]*"|'[^']*'|[^>])+)?>|<\/\w+>/gi, "");
}
function htmlDecode(input){
var e = document.createElement("div");
e.innerHTML = input;
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
</script>
</BODY></HTML>
One downside of generator=random is you often get talk pages or generated content that are not actual articles. If anyone can improve the query string to limit it to quality articles, that would be great!