First off, don't link to the "Don't parse HTML with Regex" post :)
I've got the following HTML, which is used to display prices in various currencies, inc and ex tax:
<span id="price_break_12345" name="1">
<span class="price">
<span class="inc" >
<span class="GBP">£25.00</span>
<span class="USD" style="display:none;">$34.31</span>
<span class="EUR" style="display:none;">27.92 €</span>
</span>
<span class="ex" style="display:none;">
<span class="GBP">£20.83</span>
<span class="USD" style="display:none;">$34.31</span>
<span class="EUR" style="display:none;">23.27 €</span>
</span>
</span>
<span style="display:none" class="raw_price">25.000</span>
</span>
An AJAX call returns a single string of HTML, containing multiple copies of the above HTML, with the prices varying. What I'm trying to match with regex is:
Each block of the above HTML (as mentioned, it occurs multiple times in the return string)
The value of the name attribute on the outermost span
What I have so far is this:
var price_regex = new RegExp(/(<span([\s\S]*?)><span([\s\S]*?)>([\s\S]*?)<\/span><\/span\>)/gm);
console && console.log(price_regex.exec(product_price));
It matches the first price break once for each price break that occurs (so if there's name=1, name=5 and name=15, it matches name=1 3 times).
Whereabouts am I going wrong?
So, if you can count on the format of that first span in each block like this:
<span id="price_break_12345" name="1">
Then, how about you use code like this to cycle through all the matches. This code identifies the price_break_xxxx id value in that first span and then picks out the following name attribute:
var re = /id="price_break_\d+"\s+name="([^"]+)"/gm;
var match;
while (match = re.exec(str)) {
console.log(match[1]);
}
You can see it work here: http://jsfiddle.net/jfriend00/G39ne/.
I used a converter to make three of your blocks of HTML into a single javascript string (to simulate what you get back from your ajax call) so I could run the code on it.
A more robust way to do this is to just use the browser's HTML parser to do all the work for you. Assuming you have the HTML in a string variable named str, you can use the browser's parser like this:
function getElementChildren(parent) {
var elements = [];
var children = parent.childNodes;
for (var i = 0, len = children.length; i < len; i++) {
// collect element nodes only
if (children[i].nodeType == 1) {
elements.push(children[i]);
}
}
return(elements);
}
var div = document.createElement("div");
div.innerHTML = str;
var priceBlocks = getElementChildren(div);
for (var i = 0; i < priceBlocks.length; i++) {
console.log(priceBlocks[i].id + ", " + priceBlocks[i].getAttribute("name") + "<br>");
}
Demo here: http://jsfiddle.net/jfriend00/F6D8d/
This will leave you with all the DOM traversal functions for these elements rather than using (the somewhat brittle) regular expressions on HTML.
Thanks in large part to jfriend for making me realise why my regex was matching in a strange way (while (price_break = regex.exec(string)) instead of just exec'ing it once), I've got it working:
var price_regex = new RegExp(/<span[\s\S]*?name="([0-9]+)"[\s\S]*?><span[\s\S]*?>[\s\S]*?<\/span><\/span\>/gm);
var price_break;
while (price_break = price_regex.exec(strProductPrice))
{
console && console.log(price_break);
}
I had a ton of useless () which were just clogging up the result set, so stripping them out made things a lot simpler.
The other thing, as mentioned above was that originally I was just doing
price_break = price_regex.exec(strProductPrice)
which runs the regex once and returns only the first match (which I mistook for 3 copies of the first match, due to the ()s). Looping keeps evaluating the regex until all the matches have been exhausted, which I assumed it did by default, similar to PHP's preg_match_all.
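The difference between a single exec call and a loop can be seen in isolation. This is only a minimal sketch: the sample string and the names sample, re and names are made up for illustration and stand in for the real AJAX response.

```javascript
// Hypothetical stand-in for the HTML string returned by the AJAX call.
var sample = '<span id="price_break_1" name="1"></span>' +
             '<span id="price_break_2" name="5"></span>' +
             '<span id="price_break_3" name="15"></span>';

var re = /name="([0-9]+)"/g;

// A single exec call returns only the first match: index 0 holds the whole
// match and index 1 holds the first capture group.
var first = re.exec(sample);
console.log(first[0], first[1]);

// With the g flag, exec remembers where it stopped in re.lastIndex, so
// calling it in a loop walks through all the matches in turn.
re.lastIndex = 0; // start again from the beginning of the string
var names = [];
var m;
while ((m = re.exec(sample)) !== null) {
  names.push(m[1]);
}
console.log(names);
```

Every extra pair of parentheses in the pattern adds another entry to each result array, which is why a single result with several groups can look like several matches.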
Related
I have an HTML document which contains this text somewhere in it
function deleteFolder() {
var mailbox = "CN=John Urban,OU=Sect-1,DC=TestServer ,DC=acme,DC=com";
var path = "/Inbox/";
//string of interest: "CN=John Urban,OU=Sect-1,DC=TestServer ,DC=acme,DC=com"
I just want to extract this text and store it in a variable in C#. My problem is that the string of interest will change slightly each time the page is loaded, something like this:
"CN=John Urban,OU=Sect-1,DC=TestServer ,DC=acme,DC=com"
"CN=Jane Doe,OU=Sect-1,DC=TestServer ,DC=acme,DC=com"
etc....
How do I extract that ever changing string, without regular expression?
Is it always a function deleteFolder() which has its first line as var mailbox = "somestring"? And you are interested in somestring?
Based on the requirements you told us, you could just search the string containing the HTML for var mailbox = " and then for the next ", and take all the text between these two occurrences.
var htmlstring= "..."; //
var i1 = htmlstring.IndexOf("var mailbox = \"");
var i2 = i1 >= 0 ? htmlstring.IndexOf("\"", i1+15) : -1;
var result = i2 >= 0 ? htmlstring.Substring(i1+15, i2-(i1+15)): "not found";
VERY, VERY ugly, not maintainable, but without more information, I can't do any better. However Regex would be much nicer!
I'm trying to restrict what the user can enter in a text field: it should only allow letters and must contain at LEAST 2 words (for name and surname). This is the HTML:
<label name="CardHolderName">Card Holder Name</label>
<input type="Text" name="CardHolderName"required/><br>
I tried using RegExp in JavaScript, but since I'm a beginner and never learnt regular expressions I can't figure it out. Could anyone suggest something or help me, please? Thanks.
P.S: If possible the code/function must be only in html and javascript.
You need to listen for the keyup event in JavaScript
HTML
<label name="CardHolderName">Card Holder Name</label>
<input type="Text" id="cardHolderName" name="CardHolderName" required/><br>
Javascript
// listen for the keyup event on the input
// (plain DOM elements have no .keyup() method, so use addEventListener)
document.getElementById('cardHolderName').addEventListener('keyup', function (e) {
    var t = e.currentTarget; // get the element
    var name = t.value; // get the value of the element
    if (name.trim().split(/\s+/).length < 2) { // count the words separated by whitespace
        // fewer than two words: handle the invalid input here
    }
});
If you define a 'word' as at least two characters, you could do something simple like this:
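var is_valid = function(input) {
    var min_length = 2,
        space = ' ',
        index_of_space = input.indexOf(space);
    // The expression must start on the same line as return (hence the
    // parentheses); a bare "return" followed by a newline returns undefined.
    return (
        index_of_space !== -1
        && index_of_space >= min_length
        && index_of_space < (input.length - min_length)
    );
};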
The condition being returned reads as "the input contains a space, and the space is at least the 3rd character, and the space is not within the last two characters".
The string version of indexOf used here works in every browser. Only the array version of indexOf is missing from older browsers (IE 8 and below), and polyfills are available for it.
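If the check should also enforce the "letters only" part of the question, a single regular expression can cover both rules. This is a minimal sketch under the assumption that names consist of unaccented ASCII letters separated by single spaces; isValidCardHolderName is a hypothetical name, not something from the question.

```javascript
// Hypothetical helper: true when the value is two or more words made of
// ASCII letters only, separated by single spaces.
function isValidCardHolderName(value) {
  // ^[A-Za-z]+        first word, letters only
  // (?: [A-Za-z]+)+$  one or more further words, each preceded by a space
  return /^[A-Za-z]+(?: [A-Za-z]+)+$/.test(value.trim());
}

console.log(isValidCardHolderName("John Urban")); // true
console.log(isValidCardHolderName("John"));       // false
console.log(isValidCardHolderName("John 2nd"));   // false
```

Real names often contain hyphens, apostrophes or accented letters, so the character class would probably need widening before this is used on a live form.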
I have a challenging problem to solve. I'm working on a script which takes a regex as an input. This script then finds all matches for this regex in a document and wraps each match in its own <span> element. The hard part is that the text is a formatted html document, so my script needs to navigate through the DOM and apply the regex across multiple text nodes at once, while figuring out where it has to split text nodes if needed.
For example, with a regex that captures full sentences starting with a capital letter and ending with a period, this document:
<p>
<b>HTML</b> is a language used to make <b>websites.</b>
It was developed by <i>CERN</i> employees in the early 90s.
</p>
Would ideally be turned into this:
<p>
<span><b>HTML</b> is a language used to make <b>websites.</b></span>
<span>It was developed by <i>CERN</i> employees in the early 90s.</span>
</p>
The script should then return the list of all created spans.
I already have some code which finds all the text nodes and stores them in a list along with their position across the whole document and their depth. You don't really need to understand that code to help me and its recursive structure can be a bit confusing. The first part I'm not sure how to do is figure out which elements should be included within the span.
function findTextNodes(node, depth = -1, start = 0) {
let list = [];
if (node.nodeType === Node.TEXT_NODE) {
list.push({ node, depth, start });
} else {
for (let i = 0; i < node.childNodes.length; ++i) {
list = list.concat(findTextNodes(node.childNodes[i], depth+1, start));
if (list.length) {
start += list[list.length-1].node.nodeValue.length;
}
}
}
return list;
}
I figure I'll make a string out of the whole document, run the regex through it, use the list to find which nodes correspond to which regex matches, and then split the text nodes accordingly.
But an issue arises when I have a document like this:
<p>
This program is <a href="beta.html">not stable yet. Do not use this in production yet.</a>
</p>
There's a sentence which starts outside of the <a> tag but ends inside it. Now I don't want the script to split that link in two tags. In a more complex document, it could ruin the page if it did. The code could either wrap two sentences together:
<p>
<span>This program is <a href="beta.html">not stable yet. Do not use this in production yet.</a></span>
</p>
Or just wrap each part in its own element:
<p>
<span>This program is </span>
<a href="beta.html">
<span>not stable yet.</span>
<span>Do not use this in production yet.</span>
</a>
</p>
There could be a parameter to specify what it should do. I'm just not sure how to figure out when an impossible cut is about to happen, and how to recover from it.
Another issue comes when I have whitespace inside a child element like this:
<p>This is a <b>sentence. </b></p>
Technically, the regex match would end right after the period, before the end of the <b> tag. However, it would be much better to consider the space as part of the match and wrap it like this:
<p><span>This is a <b>sentence. </b></span></p>
Than this:
<p><span>This is a </span><b><span>sentence.</span> </b></p>
But that's a minor issue. After all, I could just allow extra white-space to be included within the regex.
I know this might sound like a "do it for me" question and it's not the kind of quick question we see on SO on a daily basis, but I've been stuck on this for a while and it's for an open-source library I'm working on. Solving this problem is the last obstacle. If you think another SE site is better suited for this question, please redirect me.
Here are two ways to deal with this.
I don't know if the following will exactly match your needs. It's a simple enough solution to the problem, but at least it doesn't use RegEx to manipulate HTML tags. It performs pattern matching against the raw text and then uses the DOM to manipulate the content.
First approach
This approach creates only one <span> tag per match, leveraging some less common browser APIs.
(See the main problem of this approach below the demo, and if not sure, use the second approach).
The Range class represents a text fragment. It has a surroundContents function that lets you wrap a range in an element. Except it has a caveat:
This method is nearly equivalent to newNode.appendChild(range.extractContents()); range.insertNode(newNode). After surrounding, the boundary points of the range include newNode.
An exception will be thrown, however, if the Range splits a non-Text node with only one of its boundary points. That is, unlike the alternative above, if there are partially selected nodes, they will not be cloned and instead the operation will fail.
Well, the workaround is provided in the MDN, so all's good.
So here's an algorithm:
Make a list of Text nodes and keep their start indices in the text
Concatenate these nodes' values to get the text
Find matches over the text, and for each match:
Find the start and end nodes of the match, comparing the nodes' start indices to the match position
Create a Range over the match
Let the browser do the dirty work using the trick above
Rebuild the node list since the last action changed the DOM
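The index bookkeeping in the first four steps can be tried without a DOM. The sketch below is only an illustration: the chunks array is a made-up stand-in for the values of the text nodes, and locate is a hypothetical helper that returns the positions of the first and last chunk a match touches.

```javascript
// Hypothetical stand-in for the text node values of
// <p><b>HTML</b> is a language used to make <b>websites.</b> It was developed.</p>
var chunks = ["HTML", " is a language used to make ", "websites.", " It was developed."];

// Steps 1 and 2: record each chunk's start index and build the full text.
var nodes = [];
var text = "";
chunks.forEach(function (value) {
  nodes.push({ value: value, start: text.length });
  text += value;
});

// Step 4: find the first and last chunk touched by the match [matchStart, matchEnd).
function locate(matchStart, matchEnd) {
  var first = -1, last = -1;
  for (var i = 0; i < nodes.length; i++) {
    var nodeEnd = nodes[i].start + nodes[i].value.length;
    if (nodeEnd <= matchStart) continue;        // chunk ends before the match starts
    if (first === -1) first = i;                // first chunk reaching into the match
    if (nodeEnd >= matchEnd) { last = i; break; } // chunk containing the match end
  }
  return { first: first, last: last };
}

// Step 3: run the regex over the concatenated text.
var regex = /[A-Z].*?\./g;
var match;
var located = [];
while ((match = regex.exec(text)) !== null) {
  located.push(locate(match.index, match.index + match[0].length));
}
console.log(located);
```

In the browser, the offsets inside the first and last chunk (match position minus the chunk's start) are what would be handed to the Range.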
Here's my implementation with a demo:
function highlight(element, regex) {
var document = element.ownerDocument;
var getNodes = function() {
var nodes = [],
offset = 0,
node,
nodeIterator = document.createNodeIterator(element, NodeFilter.SHOW_TEXT, null, false);
while (node = nodeIterator.nextNode()) {
nodes.push({
textNode: node,
start: offset,
length: node.nodeValue.length
});
offset += node.nodeValue.length;
}
return nodes;
};
var nodes = getNodes();
if (!nodes.length)
return;
var text = "";
for (var i = 0; i < nodes.length; ++i)
text += nodes[i].textNode.nodeValue;
var match;
while (match = regex.exec(text)) {
// Prevent empty matches causing infinite loops
if (!match[0].length)
{
regex.lastIndex++;
continue;
}
// Find the start and end text node
var startNode = null, endNode = null;
for (i = 0; i < nodes.length; ++i) {
var node = nodes[i];
if (node.start + node.length <= match.index)
continue;
if (!startNode)
startNode = node;
if (node.start + node.length >= match.index + match[0].length)
{
endNode = node;
break;
}
}
var range = document.createRange();
range.setStart(startNode.textNode, match.index - startNode.start);
range.setEnd(endNode.textNode, match.index + match[0].length - endNode.start);
var spanNode = document.createElement("span");
spanNode.className = "highlight";
spanNode.appendChild(range.extractContents());
range.insertNode(spanNode);
nodes = getNodes();
}
}
// Test code
var testDiv = document.getElementById("test-cases");
var originalHtml = testDiv.innerHTML;
function test() {
testDiv.innerHTML = originalHtml;
try {
var regex = new RegExp(document.getElementById("regex").value, "g");
highlight(testDiv, regex);
}
catch(e) {
testDiv.innerText = e;
}
}
document.getElementById("runBtn").onclick = test;
test();
.highlight {
background-color: yellow;
border: 1px solid orange;
border-radius: 5px;
}
.section {
border: 1px solid gray;
padding: 10px;
margin: 10px;
}
<form class="section">
RegEx: <input id="regex" type="text" value="[A-Z].*?\." /> <button id="runBtn">Highlight</button>
</form>
<div id="test-cases" class="section">
<div>foo bar baz</div>
<p>
<b>HTML</b> is a language used to make <b>websites.</b>
It was developed by <i>CERN</i> employees in the early 90s.
</p>
<p>
This program is <a href="beta.html">not stable yet. Do not use this in production yet.</a>
</p>
<div>foo bar baz</div>
</div>
OK, that was the lazy approach, which unfortunately doesn't work in some cases. It works well if you only highlight across inline elements, but breaks when there are block elements along the way because of the following property of the extractContents function:
Partially selected nodes are cloned to include the parent tags necessary to make the document fragment valid.
That's bad. It'll just duplicate block-level nodes. Try the previous demo with the baz\s+HTML regex if you want to see how it breaks.
Second approach
This approach iterates over the matching nodes, creating <span> tags along the way.
The overall algorithm is straightforward as it just wraps each matching node in its own <span>. But this means we have to deal with partially matching text nodes, which requires some more effort.
If a text node matches partially, it's split with the splitText function:
After the split, the current node contains all the content up to the specified offset point, and a newly created node of the same type contains the remaining text. The newly created node is returned to the caller.
function highlight(element, regex) {
var document = element.ownerDocument;
var nodes = [],
text = "",
node,
nodeIterator = document.createNodeIterator(element, NodeFilter.SHOW_TEXT, null, false);
while (node = nodeIterator.nextNode()) {
nodes.push({
textNode: node,
start: text.length
});
text += node.nodeValue;
}
if (!nodes.length)
return;
var match;
while (match = regex.exec(text)) {
var matchLength = match[0].length;
// Prevent empty matches causing infinite loops
if (!matchLength)
{
regex.lastIndex++;
continue;
}
for (var i = 0; i < nodes.length; ++i) {
node = nodes[i];
var nodeLength = node.textNode.nodeValue.length;
// Skip nodes before the match
if (node.start + nodeLength <= match.index)
continue;
// Break after the match
if (node.start >= match.index + matchLength)
break;
// Split the start node if required
if (node.start < match.index) {
nodes.splice(i + 1, 0, {
textNode: node.textNode.splitText(match.index - node.start),
start: match.index
});
continue;
}
// Split the end node if required
if (node.start + nodeLength > match.index + matchLength) {
nodes.splice(i + 1, 0, {
textNode: node.textNode.splitText(match.index + matchLength - node.start),
start: match.index + matchLength
});
}
// Highlight the current node
var spanNode = document.createElement("span");
spanNode.className = "highlight";
node.textNode.parentNode.replaceChild(spanNode, node.textNode);
spanNode.appendChild(node.textNode);
}
}
}
// Test code
var testDiv = document.getElementById("test-cases");
var originalHtml = testDiv.innerHTML;
function test() {
testDiv.innerHTML = originalHtml;
try {
var regex = new RegExp(document.getElementById("regex").value, "g");
highlight(testDiv, regex);
}
catch(e) {
testDiv.innerText = e;
}
}
document.getElementById("runBtn").onclick = test;
test();
.highlight {
background-color: yellow;
}
.section {
border: 1px solid gray;
padding: 10px;
margin: 10px;
}
<form class="section">
RegEx: <input id="regex" type="text" value="[A-Z].*?\." /> <button id="runBtn">Highlight</button>
</form>
<div id="test-cases" class="section">
<div>foo bar baz</div>
<p>
<b>HTML</b> is a language used to make <b>websites.</b>
It was developed by <i>CERN</i> employees in the early 90s.
</p>
<p>
This program is <a href="beta.html">not stable yet. Do not use this in production yet.</a>
</p>
<div>foo bar baz</div>
</div>
This should be good enough for most cases I hope. If you need to minimize the number of <span> tags it can be done by extending this function, but I wanted to keep it simple for now.
function parseText( element ){
var stack = [ element ];
var group = false;
var re = /(?!\s|$).*?(\.|$)/;
while ( stack.length > 0 ){
var node = stack.shift();
if ( node.nodeType === Node.TEXT_NODE )
{
if ( node.textContent.trim() != "" )
{
var match;
while( node && (match = re.exec( node.textContent )) )
{
var start = group ? 0 : match.index;
var length = match[0].length + match.index - start;
if ( start > 0 )
{
node = node.splitText( start );
}
var wrapper = document.createElement( 'span' );
var next = null;
if ( match[1].length > 0 ){
if ( node.textContent.length > length )
next = node.splitText( length );
group = false;
wrapper.className = "sentence sentence-end";
}
else
{
wrapper.className = "sentence";
group = true;
}
var parent = node.parentNode;
var sibling = node.nextSibling;
wrapper.appendChild( node );
if ( sibling )
parent.insertBefore( wrapper, sibling );
else
parent.appendChild( wrapper );
node = next;
}
}
}
else if ( node.nodeType === Node.ELEMENT_NODE || node.nodeType === Node.DOCUMENT_NODE )
{
stack.unshift.apply( stack, node.childNodes );
}
}
}
parseText( document.body );
.sentence {
text-decoration: underline wavy red;
}
.sentence-end {
border-right: 1px solid red;
}
<p>This is a sentence. This is another sentence.</p>
<p>This sentence has <strong>emphasis</strong> inside it.</p>
<p><span>This sentence spans</span><span> two elements.</span></p>
I would use "flat DOM" representation for such task.
In flat DOM this paragraph
<p>abc <a href="beta.html">def. ghij.</a></p>
will be represented by two vectors:
chars: "abc def. ghij.",
props: ....aaaaaaaaaa,
You will use normal regexp on chars to mark span areas on props vector:
chars: "abc def. ghij."
props: ssssaaaaaaaaaa
ssss sssss
I am using a schematic representation here; its real structure is an array of arrays:
props: [
[s],
[s],
[s],
[s],
[a,s],
[a,s],
...
]
Conversion between tree DOM and flat DOM can be done with a simple state automaton.
At the end you will convert the flat DOM back to a tree DOM that will look like:
<p><s>abc </s><a href="beta.html"><s>def.</s> <s>ghij.</s></a></p>
Just in case: I am using this approach in my HTML WYSIWYG editors.
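To make the two vectors concrete, here is a small sketch of the marking step only. It is an assumption about how such a structure could be filled, not the code from the editors mentioned above; chars, props and markMatches are illustrative names, and the conversion back to a tree is left out.

```javascript
// Hypothetical flat form of <p>abc <a href="beta.html">def. ghij.</a></p>:
// one string of characters plus, per character, the list of enclosing tags.
var chars = "abc def. ghij.";
var props = [];
for (var i = 0; i < chars.length; i++) {
  props.push(i >= 4 ? ["a"] : []); // characters from index 4 on sit inside <a>
}

// Mark every regex match by adding "s" to the props of each character it covers.
function markMatches(regex) {
  var match;
  while ((match = regex.exec(chars)) !== null) {
    if (match[0].length === 0) { regex.lastIndex++; continue; } // skip empty matches
    for (var j = match.index; j < match.index + match[0].length; j++) {
      props[j].push("s");
    }
  }
}

// A sentence here is a run starting at a non-space character and ending at a period.
markMatches(/\S[^.]*\./g);

// Schematic view: the innermost property of each character, "." for none.
console.log(chars);
console.log(props.map(function (p) {
  return p.length ? p[p.length - 1] : ".";
}).join(""));
```

Because the first sentence covers characters both outside and inside the link, a converter walking props from left to right would have to close and reopen the s element wherever the list of enclosing tags changes, which is how the link stays in one piece.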
As everyone has already said, this is more of an academic question since this shouldn't really be the way you do it. That being said, it seemed like fun so here's one approach.
EDIT: I think I got the gist of it now.
function myReplace(str) {
var myRegexp = /((^<[^>*]>)+|([^<>\.]*|(<[^\/>]*>[^<>\.]+<\/[^>]*>)+)*[^<>\.]*\.\s*|<[^>]*>|[^\.<>]+\.*\s*)/g;
var arr = str.match(myRegexp) || [];
var out = "";
for (var i = 0; i < arr.length; i++) {
var node = arr[i];
if (node.indexOf("<")===0) out += node;
else out += "<span>"+node+"</span>"; // Here is where you would run whichever
// regex you want to match by
}
document.write(out.replace(/</g, "&lt;").replace(/>/g, "&gt;")+"<br>");
console.log(out);
}
myReplace('<p>This program is <a href="beta.html">not stable yet. Do not use this in production yet.</a></p>');
myReplace('<p>This is a <b>sentence. </b></p>');
myReplace('<p>This is a <b>another</b> and <i>more complex</i> even <b>super complex</b> sentence.</p>');
myReplace('<p>This is a <b>a sentence</b>. Followed <i>by</i> another one.</p>');
myReplace('<p>This is a <b>an even</b> more <i>complex sentence. </i></p>');
/* Will output:
<p><span>This program is </span><a href="beta.html"><span>not stable yet. </span><span>Do not use this in production yet.</span></a></p>
<p><span>This is a </span><b><span>sentence. </span></b></p>
<p><span>This is a <b>another</b> and <i>more complex</i> even <b>super complex</b> sentence.</span></p>
<p><span>This is a <b>a sentence</b>. </span><span>Followed <i>by</i> another one.</span></p>
<p><span>This is a </span><b><span>an even</span></b><span> more </span><i><span>complex sentence. </span></i></p>
*/
I have spent a long time implementing all of the approaches given in this thread.
Node iterator
Html parsing
Flat Dom
For any of these approaches you have to come up with a technique to split the entire HTML into sentences and wrap them in spans (some might want each word in a span). As soon as we do this we run into performance issues (or at least a beginner like me will).
Performance Bottleneck
I couldn't scale any of these approaches to 70k - 200k words and still finish in milliseconds. Wrapping time keeps increasing as the number of words on the page increases.
With complex HTML pages that combine text nodes and different elements we soon run into trouble, and the technical debt keeps increasing.
Best approach : Mark.js (according to me)
Note: if you do this right you can process any number of words in milliseconds.
Just use ranges. I want to recommend Mark.js and the following example:
var instance = new Mark(document.body);
instance.markRanges([{
start: 15,
length: 5
}, {
start: 25,
length: 8
}]);
With this we can treat the entire body.textContent as a string and just keep highlighting substrings.
No DOM structure is modified here. You can easily handle complex use cases, and technical debt doesn't increase with more if and else branches.
Additionally, once text is highlighted with the HTML5 mark tag you can post-process these tags to find their bounding rectangles.
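The start and length values do not have to be written by hand. As a rough sketch, they can be computed by running a regex over the page text; findRanges is a hypothetical helper, the sample string stands in for document.body.textContent, and it is assumed that the offsets Mark.js expects line up with positions in that text.

```javascript
// Hypothetical helper: turn regex matches over a plain string into
// { start, length } objects, the shape passed to markRanges above.
function findRanges(text, regex) {
  if (!regex.global) {
    throw new Error("findRanges needs a regex with the g flag");
  }
  var ranges = [];
  var match;
  regex.lastIndex = 0; // always scan from the beginning
  while ((match = regex.exec(text)) !== null) {
    if (match[0].length === 0) { regex.lastIndex++; continue; } // avoid an infinite loop
    ranges.push({ start: match.index, length: match[0].length });
  }
  return ranges;
}

// Stand-in for document.body.textContent.
var text = "HTML is a language. It was developed at CERN.";
var ranges = findRanges(text, /[A-Z][^.]*\./g);
console.log(ranges);

// In a browser with Mark.js loaded, the array could then be passed on:
// new Mark(document.body).markRanges(ranges);
```

Keeping the matching on a plain string means the regex never has to know about element boundaries; only the library deals with the DOM.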
Also look into Splitting.js if you just want to split HTML documents into words, chars, lines and more. One drawback of this approach is that Splitting.js collapses additional spaces in the document, so we lose a little bit of information.
Thanks.
(http://www.learnwithjesse.com/white-hmong-to-green-hmong-converter/).
I have JavaScript that uses two HTML <textarea> elements, one for input values and one for output values. Each input word is converted to a different word and written to the output box. For example, if I type the word 'daj' in the input box and click the convert button, it should write the converted word 'dlaaj' to the output box. It works fine in Chrome, Firefox, and Internet Explorer, but not on my Galaxy S5 Active, where it outputs the same word I put in the input box: 'daj' still outputs 'daj'.
What I've tried so far: I've tried running it on a different phone, an iPhone 5, and it doesn't output correctly there either. I've also tried running the script on http://mobiletest.me, which makes my computer act like a Galaxy S5, and there it runs properly.
How can I get it to output properly on mobile phones?
<p class="welcome" id="greeting">White Hmong to Green Hmong Converter</p>
<p class="content">
</p>
<form>
<input type="button" value="Convert White Hmong to Green Hmong" onClick="clicked(0)" /><input type="button" value="Converter Green Hmong to White Hmong" onClick="clicked(1)" />
<br>
<br>
<textarea rows="7" cols="68" id="whiteHmongInput" >
Input</textarea>
<br>
<br>
<textarea rows="7" cols="68" id="greenHmongInput" >
Output</textarea>
</form>
<script>
function clicked(number) {
var n = -1;
var list = [];
var list2 = [];
var NUMBERWORDS = 90;
list[0] ="cab";
list[1] ="cia";
list[2] ="dab";
list[3] ="daj";
list[4] ="dej";
list[5] ="dev";
list[6] ="dib";
list[7] ="duab";
list[8] ="fiav";
list[9] ="hais";
list[10] ="hla";
list[11] ="hlab";
list[12] ="hlad";
list[13] ="hmaim";
list[14] ="hmaiv";
list[15] ="hmaj";
list[16] ="hmaiv";
list[17] ="hmob";
list[18] ="hmog";
list[19] ="hmoo";
list[20] ="hmoob";
list[21] ="hmood";
list[22] ="hmoog";
list[23] ="hmoov";
list[24] ="hmos";
list[25] ="hmov";
list[26] ="hnas";
list[27] ="hnais";
list[28] ="hneev";
list[29] ="hnyab";
list[30] ="hnov";
list[31] ="iav";
list[32] ="kos";
list[33] ="liab";
list[34] ="liaj";
list[35] ="liam";
list[36] ="loos";
list[37] ="los";
list[38] ="mloos";
list[39] ="mus";
list[40] ="npib";
list[41] ="nqhaiv";
list[42] ="nyaiv";
list[43] ="pa";
list[44] ="pab";
list[45] ="pad";
list[46] ="pag";
list[47] ="pam";
list[48] ="piam";
list[49] ="piav";
list[50] ="qaib";
list[51] ="rhiam";
list[52] ="siab";
list[53] ="siav";
list[54] ="thab";
list[55] ="tiab";
list[56] ="tias";
list[57] ="tiav";
list[58] ="tsam";
list[59] ="tshaj";
list[60] ="txiav";
list[61] ="txiab";
list[62] ="vaj";
list[63] ="xa";
list[64] ="xaj";
list[65] ="xya";
list[66] ="yiag";
list[67] ="zaj";
list[68] = "txoj";
list[69] = "nco";
list[70] = "dua";
list[71] = "tus";
list[72] = "txog";
list[73] = "cas";
list[74] = "tos";
list[75] = "qab";
list[76] = "yaj";
list[77] = "pov";
list[78] = "niaj";
list[79] = "hmo";
list[80] = "hnub";
list[81] = "iab";
list[82] = "pom";
list[83] = "niaj";
list[84] = "Ziag";
list[85] = "ya";
list[86]= "tas";
list[87]= "nws";
list[88] = "rau";
list[89] = "li";
list2[0] ="caab";
list2[1] ="ca";
list2[2] ="dlaab";
list2[3] ="dlaaj";
list2[4] ="dlej";
list2[5] ="dlev";
list2[6] ="dlib";
list2[7] ="dluab";
list2[8] ="fav";
list2[9] ="has";
list2[10] ="hlaa";
list2[11] ="hlaab";
list2[12] ="hlaad";
list2[13] ="maim";
list2[14] ="maiv";
list2[15] ="maaj";
list2[16] ="maiv";
list2[17] ="mob";
list2[18] ="mog";
list2[19] ="moo";
list2[20] ="moob";
list2[21] ="mood";
list2[22] ="moog";
list2[23] ="moov";
list2[24] ="mog";
list2[25] ="mov";
list2[26] ="naag";
list2[27] ="nais";
list2[28] ="neev";
list2[29] ="nyab";
list2[30] ="nov";
list2[31] ="av";
list2[32] ="kaus";
list2[33] ="lab";
list2[34] ="laj";
list2[35] ="lam";
list2[36] ="loog";
list2[37] ="lug";
list2[38] ="noog";
list2[39] ="moog";
list2[40] ="pib";
list2[41] ="qhav";
list2[42] ="yav";
list2[43] ="paa";
list2[44] ="paab";
list2[45] ="paad";
list2[46] ="paag";
list2[47] ="choj";
list2[48] ="puag";
list2[49] ="pav";
list2[50] ="qab";
list2[51] ="rag";
list2[52] ="sab";
list2[53] ="sav";
list2[54] ="hab";
list2[55] ="tab";
list2[56] ="tag";
list2[57] ="tav";
list2[58] ="tsaam";
list2[59] ="tshaaj";
list2[60] ="txav";
list2[61] ="txab";
list2[62] ="vaaj";
list2[63] ="xaa";
list2[64] ="xaaj";
list2[65] ="xyaa";
list2[66] ="yag";
list2[67] ="zaaj";
list2[68] = "txuj";
list2[69] = "ncu";
list2[70] = "dlua";
list2[71] = "tug";
list2[72] = "txug";
list2[73] = "caag";
list2[74] = "tog";
list2[75] = "qaab";
list2[76] = "yaaj";
list2[77] = "puv";
list2[78] = "naj";
list2[79] = "mo";
list2[80] = "nub";
list2[81] = "ab";
list2[82] = "pum";
list2[83] = "naj";
list2[84] = "Zag";
list2[85] = "yaa";
list2[86]= "tag";
list2[87]= "nwg";
list2[88] = "rua";
list2[89] = "le";
var s = document.getElementById("whiteHmongInput").value;
var choppedIntoLines = s.split(/\r\n|\r|\n/g);
var choppedIntoWords;
//Splits Lines Into Words
document.getElementById("greenHmongInput").value ="";
for(var i = 0; i < choppedIntoLines.length; i++) {
choppedIntoWords = choppedIntoLines[i].split(" ");
//Splits each Line to words, then match words to see if white hmong if so convert to green word
for(var o = 0; o < choppedIntoWords.length; o++) {
choppedIntoWords[o].toLowerCase();
if (number == 0){
n = list.indexOf(choppedIntoWords[o].valueOf()); //tries to find the index of a word if i exist, returns -1 if it doesn't
if ( n != -1){
choppedIntoWords[o] = list2[n];
}
n = -1; //Basically if n = -1 it means the white hmong word coulnd't be found.
}
if (number == 1 ){
n = list2.indexOf(choppedIntoWords[o].valueOf()); //tries to find the index of a word if i exist, returns -1 if it doesn't
if ( n != -1){
choppedIntoWords[o] = list[n];
}
n = -1; //Basically if n = -1 it means the white hmong word couldn't be found.
}
}
//Recombines words to line of Words.
choppedIntoLines[i] = "";
for (var p = 0; p < choppedIntoWords.length; p++) {
choppedIntoLines[i] += choppedIntoWords[p] + " ";
}
//Recombines lines and Output to Green Hmong Section
document.getElementById("greenHmongInput").value += choppedIntoLines[i]+"\n";
}
}
</script>
I suspect the reason your code is not working on mobile is not because of the mobile browser, but the keyboard on your mobile device. Most mobile keyboards will automatically capitalize the first letter you type, so when you type daj in it automatically comes out Daj.
There is a bug in the code that prevents it from working with words that have any capital letters.
Fixing the current code
The line choppedIntoWords[o].toLowerCase(); does nothing. In JavaScript, strings are immutable: toLowerCase does not alter the string, it returns a new one. Since the result of toLowerCase is never assigned to anything, it is immediately discarded.
The quick fix is to assign the result of toLowerCase back to the array element you are calling it on: choppedIntoWords[o] = choppedIntoWords[o].toLowerCase();.
Better yet, move the toLowerCase call into the comparison and leave the original value untouched. This preserves the capitalization of any words that are not being replaced.
list.indexOf(choppedIntoWords[o].toLowerCase()) // .valueOf() is not needed here
(Another subtle bug: word 84 in the arrays (Ziag/Zag) is capitalized in both lists, which will break the comparison for that word even after the toLowerCase bug is fixed.)
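Both points can be checked in a few lines. This is just a small sketch with a shortened word list; the three-word input is made up, and list and list2 reuse the names from the question.

```javascript
var words = ["Daj", "mus", "Hello"];

// Calling toLowerCase without using the result changes nothing.
words[0].toLowerCase();
console.log(words[0]); // still "Daj"

// Shortened versions of the two lists from the question.
var list = ["daj", "mus"];
var list2 = ["dlaaj", "moog"];

// Lowercasing only for the lookup converts "Daj" but keeps the original
// spelling of words that are not in the list.
var converted = words.map(function (word) {
  var n = list.indexOf(word.toLowerCase());
  return n !== -1 ? list2[n] : word;
});
console.log(converted.join(" ")); // dlaaj moog Hello
```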
That said, I would rewrite it to work a little differently to make it more maintainable.
A better way of doing it
Instead of trying to keep two lists of words in sync, it would be much easier to store the word pairs as two-element arrays inside of another array. This way when adding/removing/editing any of the words you can do it all in one place.
At run-time, you could then generate two objects to act as look-up tables for the white-to-green and green-to-white conversions. Using these objects to look up the conversions should also be much faster than doing .indexOf on an array. In this particular situation, efficiency probably does not matter that much but it is a bonus that the conversion will happen (ever so slightly) faster.
<p class="welcome" id="greeting">White Hmong to Green Hmong Converter</p>
<p class="content">
</p>
<form>
<button type="button" id="white-to-green">Convert White Hmong to Green Hmong</button>
<button type="button" id="green-to-white">Convert Green Hmong to White Hmong</button>
<br>
<br>
<textarea rows="7" cols="68" id="input" placeholder="Type in Hmong words here"></textarea>
<br>
<br>
<textarea rows="7" cols="68" id="output" placeholder="Converted words will appear here"></textarea>
</form>
<script>
// Storing the white and green words in a single array that is made up of
// two-element arrays containing each word pair will be much more
// maintainable than trying to keep two different lists in sync.
var hmongWords = [
// white first, green second
["cab", "caab"],
["cia", "ca"],
["dab", "dlaab"],
["daj", "dlaaj"],
// ...
["tas", "tag"],
["nws", "nwg"],
["rau", "rua"],
["le", "li"]
],
// these objects will act as look-up tables for the conversions
whiteToGreen = {},
greenToWhite = {},
elInput = document.getElementById("input"),
elOuput = document.getElementById("output"),
elWhiteToGreen = document.getElementById("white-to-green"),
elGreenToWhite = document.getElementById("green-to-white"),
convert = function (text, lookupTable) {
var lines = text.split(/\r\n|\r|\n/),
processWord = function (word) {
// Look for the word in the look up object.
// If the object has a property that is the word we are looking for,
// that value will be returned. If the word is not in the look-up
// object, undefined is returned. Since undefined is falsy, the
// second operand of the || expression (the original word) is returned.
return lookupTable[word.toLowerCase()] || word;
},
processLine = function (line) {
// Split the line up based on whitespace and process them,
// join the resulting array with spaces and return the converted line
return line.split(/\s/).map(processWord).join(' ');
};
// Map will return a new array of lines that have been processed by
// processLine. Join the new lines with a newline and return the string
return lines.map(processLine).join('\n');
},
convertWhiteToGreen = function (text) {
// just calls convert with the whiteToGreen look-up object
return convert(text, whiteToGreen);
},
convertGreenToWhite = function (text) {
// just calls convert with the greenToWhite look-up object
return convert(text, greenToWhite);
},
makeListener = function (converter) {
// this returns a function that will be used as an event listener
return function () {
// grab the text from the input box, run it through the
// converter function that was provided when makeListener was called
// and put the result into the output box
elOuput.value = converter(elInput.value);
};
};
// build the look-up tables
hmongWords.forEach(function (wordPair) {
whiteToGreen[wordPair[0]] = wordPair[1];
greenToWhite[wordPair[1]] = wordPair[0];
});
// Attach the event listeners to the buttons.
// makeListener returns a function that uses the function you pass to it
// to convert the text.
elWhiteToGreen.addEventListener('click', makeListener(convertWhiteToGreen), false);
elGreenToWhite.addEventListener('click', makeListener(convertGreenToWhite), false);
</script>
You can see my version in action here
My code uses the forEach and map methods to loop over the arrays that split creates instead of for loops. This avoids the need for a counter variable and instead allows us to provide each item in the array a meaningful name (word, line, etc) instead of referring to an item in the array by its index.
Something else you might notice is that the convert function uses the logical or (||) operator. A logical or expression short-circuits if the first operand is truthy, so if a value is found in the look-up object, it is returned. If the value is not found, the second operand (the original word) is returned. You have to be careful with this technique in some situations, for instance when a valid value might be falsy, such as 0 or an empty string. But in this situation lookupTable[word.toLowerCase()] will return either a non-empty string, which is always truthy, or undefined, which is always falsy.
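Here is a small sketch of the situation where || would bite. The table and the two function names are made up for this example; they are not part of the converter above.

```javascript
// Hypothetical look-up table in which an empty string is a valid value.
var table = { keep: "kept", drop: "" };

function lookupWithOr(key) {
    // Falls back whenever the stored value is falsy, even if the key exists.
    return table[key] || key;
}

function lookupWithOwnCheck(key) {
    // Falls back only when the key is not an own property of the table.
    return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : key;
}

console.log(lookupWithOr("drop"));       // "drop"
console.log(lookupWithOwnCheck("drop")); // ""
console.log(lookupWithOr("missing"));    // "missing"
```

The own-property check also ignores names that a plain object inherits, such as "constructor", which the || version would return as a function instead of the original word.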
You might have noticed that I used the words "truthy" and "falsy" instead of true and false. This has to do with how implicit type conversion is handled in JavaScript. If something is "truthy", it will be converted to true in a context where a Boolean value is needed. Likewise, "falsy" values are values that will be converted to false in a context where a Boolean value is needed.
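As a quick illustration (the sample values are my own picks, not an exhaustive list), you can check how a few values convert:

```javascript
// A few values that convert to false in a Boolean context...
var falsyValues = [false, 0, "", null, undefined, NaN];
// ...and some that convert to true, including a few surprising ones
// such as empty arrays, empty objects and the string "0".
var truthyValues = [true, 1, "caab", [], {}, "0"];

var allFalsy = falsyValues.every(function (value) { return !value; });
var allTruthy = truthyValues.every(function (value) { return Boolean(value); });

console.log(allFalsy);  // true
console.log(allTruthy); // true
```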
In the HTML instead of putting the placeholder text as a value in the textareas, I used the placeholder attribute.
Here are a few articles that might help you understand some of the techniques I've used, if they are new to you:
Truthy and Falsy: When All is Not Equal in JavaScript
JavaScript quirk 1: implicit conversion of values
Functions are first class objects in javascript
Tidying Up a JavaScript Application with Higher-Order Functions
Exploring JavaScript’s Logical OR Operator
Another thing I used in my code but didn't discuss is closures. They are a big and important topic in JavaScript. Here are some resources to help with them:
Closing The Book On Javascript Closures
(videos) Stuart Langridge: Secrets of JavaScript Closures part 1, part 2
Advanced JavaScript: Namespaces, Closures, Self-Invoking Functions, and much, much more…
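To give a rough idea of what a closure is before you dive into those, here is a small sketch in the same spirit as makeListener. The makeConverter function and the tiny tables are made up for this example.

```javascript
function makeConverter(lookupTable) {
    // The returned function "closes over" lookupTable: it can still read it
    // after makeConverter itself has returned.
    return function (word) {
        return lookupTable[word.toLowerCase()] || word;
    };
}

// Each call creates a separate closure with its own lookupTable.
var toGreen = makeConverter({ cab: "caab", dab: "dlaab" });
var toWhite = makeConverter({ caab: "cab", dlaab: "dab" });

console.log(toGreen("Cab"));   // "caab"
console.log(toWhite("dlaab")); // "dab"
console.log(toGreen("hello")); // "hello"
```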
I am writing a support chat application where I want text to be parsed for urls. I have found answers for similar questions but nothing for the following.
What I have:
function ReplaceUrlToAnchors(text) {
var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www.)[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp,"<a href='$1' target='_blank'>$1</a>");
}
That pattern is a modified version of one I found on the internet. It includes www. in the first token, because not all URLs start with protocol://. However, www.google.com is then replaced with
<a href='www.google.com' target='_blank'>www.google.com</a>
which pulls up MySite.com/webchat/www.google.com, and I get a 404.
That is my first problem. My second is this:
In my script for generating messages to the log, I am forced to do it in a hacky way:
var last = 0;
function UpdateChatWindow(msgArray) {
var chat = $get("MessageLog");
for (var i = 0; i < msgArray.length; i++) {
var element = document.createElement("div");
var linkified = ReplaceUrlToAnchors(msgArray[i]);
element.setAttribute("id", last.toString());
element.innerHTML = linkified;
chat.appendChild(element);
last = last + 1;
}
}
To get the "linkified" string to render as HTML, I have to use the non-standard .innerHTML property of the element. I would prefer a way where I could parse the string into tokens (text tokens and anchor tokens), call either createTextNode or createElement("a") for each one, and stitch them together with DOM methods.
So question 1 is: how should I go about parsing www.site.com, or even site.com?
And question 2 is: how could I do this using only DOM?
Another thing you could do is this:
function ReplaceUrlToAnchors(text) {
var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/|www.)[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig;
return text.replace(exp, function(_, url) {
// add a protocol to bare "www." matches so the link is not treated as relative
return '<a href="' +
(/^www\./i.test(url) ? "http://" + url : url) +
'" target="_blank">' +
url +
'</a>';
});
}
That is much like your solution, but it does the check for "www" URLs in the callback passed to ".replace()".
Note that you won't be picking up "stackoverflow.com" or "newegg.com" or anything like that, which I understand may be unavoidable (and even desirable, given the false positives you'd pick up).
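For the second question (building the output with DOM calls only), one possible approach is to split the message into text and link tokens first and then create a node for each. This is only a sketch: urlPattern is a deliberately simplified pattern of my own, not the one from the question, and tokenize and appendLinkified are made-up names. The doc parameter is expected to be the page's document object.

```javascript
// Simplified URL pattern, for illustration only (an assumption).
var urlPattern = /\b(?:https?:\/\/|ftp:\/\/|www\.)[^\s<>"']+/gi;

// Split a message into plain-text tokens and link tokens.
function tokenize(text) {
    var tokens = [], lastIndex = 0, match;
    urlPattern.lastIndex = 0; // the g flag keeps state between exec calls
    while ((match = urlPattern.exec(text)) !== null) {
        if (match.index > lastIndex) {
            tokens.push({ type: "text", value: text.slice(lastIndex, match.index) });
        }
        tokens.push({
            type: "link",
            value: match[0],
            // bare "www." matches need a protocol, or the link is relative
            href: /^www\./i.test(match[0]) ? "http://" + match[0] : match[0]
        });
        lastIndex = match.index + match[0].length;
    }
    if (lastIndex < text.length) {
        tokens.push({ type: "text", value: text.slice(lastIndex) });
    }
    return tokens;
}

// Append the tokens to a parent element using DOM methods only.
function appendLinkified(parent, text, doc) {
    tokenize(text).forEach(function (token) {
        if (token.type === "link") {
            var a = doc.createElement("a");
            a.setAttribute("href", token.href);
            a.setAttribute("target", "_blank");
            a.appendChild(doc.createTextNode(token.value));
            parent.appendChild(a);
        } else {
            parent.appendChild(doc.createTextNode(token.value));
        }
    });
}
```

In UpdateChatWindow you would then call appendLinkified(element, msgArray[i], document) instead of setting innerHTML. Because the text only ever goes through createTextNode, any markup typed into a chat message is displayed as text rather than interpreted. Note that the simplified pattern will swallow trailing punctuation such as a full stop right after a URL.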
Here is what I came up with, perhaps someone has something better?
function replaceUrlToAnchors(text) {
var naked = /(\b(www.)[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|](.com|.net|.org|.co.uk|.ca|.))/ig;
text = text.replace(naked, "http://$1");
var exp = /(\b(https?:\/\/|ftp:\/\/|file:\/\/)([-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|]))/ig;
return text.replace(exp,"<a href='$1' target='_blank'>$3</a>");
}
The first regex replaces www.google.com with http://www.google.com, which is good enough for what I am doing. However, I will hold off marking this as the answer, because I would also like to make (www.) optional. When I change it to (www.)?, every word gets replaced with http://word/.
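The reason every word matches seems to be that once the prefix is optional, nothing left in the pattern requires a dot or a domain suffix. The sketch below shows the effect with a cut-down pattern and one possible stricter alternative. Both patterns and the short suffix list are my own simplifications, not a complete solution.

```javascript
// With the prefix optional, any run of allowed characters is a match.
var tooLoose = /\b(?:www\.)?[-A-Z0-9.]+/ig;
var looseMatches = "hello world".match(tooLoose);

// Stricter idea: require at least one "label." followed by a suffix taken
// from a short, hand-picked list (an assumption for this sketch).
var nakedDomain = /\b(?:www\.)?(?:[A-Z0-9-]+\.)+(?:com|net|org|co\.uk|ca)\b/ig;
var strictMatches = "hello world, see google.com or www.bbc.co.uk".match(nakedDomain);

console.log(looseMatches);  // ["hello", "world"]
console.log(strictMatches); // ["google.com", "www.bbc.co.uk"]
```

This still has gaps: it would also match the domain part of an email address, and a URL that already starts with http:// would need to be skipped before adding the prefix.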