Using javascript regex to translate a html

I would like to build my own translation function in javascript.
I already have a function language.lookup(key) which translates a word or expression:
var frenchHello = language.lookup('hello') //'bonjour'
Now I would like to write a function which takes a html string and translates it with my lookup function. In the html string I will have a special syntax for example #[translationkey] that will point out that this word should be translated.
This is the result I want:
var html = '<div><span>#[hello]</span><span>#[sir]</span>'
language.translate(html) //'<div><span>bonjour</span><span>monsieur</span>'
How would I write language.translate?
My idea is to filter out my special syntax with regex and then run language.lookup on each key. Maybe with string replace or something.
I'm not good with regex and have only come up with a very incomplete example, but I include it anyway so that someone might get the idea of what I am trying to do. If there is a better but completely different solution, that is more than welcome.
var value = "#[hello], nice to see you.";
lookup = function(word){
return "bonjour";
};
var res = new RegExp( "\\b(hello)\\b", "gi" ).exec(value)
for (var c1 = 0; c1 < res.length; c1++){
value = value.replace(res[c1], lookup(res[c1]))
}
alert(value) //#[bonjour], nice to see you.
The regex should of course not filter out the word hello but the syntax and then collect the key by grouping or similar.
Can anyone help?

Just use the String.replace method's ability to call a function, passed as the second argument, to generate the replacement text, and do a global replace with a regexp that matches your syntax:
var value = "#[hello], #[sir], nice to see you.";
lookup = function(full_match, word){
if(word == 'hello')
return "bonjour";
if(word == 'sir')
return "monsieur"
};
console.log(value.replace(/#\[(.+?)\]/gi, lookup))
Result:
bonjour, monsieur, nice to see you.
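Of course, when your replacement list gets bigger, you'd better use a lookup object instead of a series of ifs in the lookup function, but you can really do whatever you want there. Note that, as written, an unknown key makes lookup return undefined, which replace inserts as the text "undefined".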
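As a minimal sketch of the lookup-object variant: the dictionary contents and the names dictionary, lookup and translate below are made up for illustration (in the question, language.lookup(key) plays the role of lookup). Unknown keys are left untouched rather than replaced.

```javascript
// Hypothetical dictionary; stands in for the asker's language.lookup(key).
const dictionary = { hello: 'bonjour', sir: 'monsieur' };

function lookup(key) {
  // Return undefined for unknown keys so the caller can decide what to do.
  return Object.prototype.hasOwnProperty.call(dictionary, key)
    ? dictionary[key]
    : undefined;
}

function translate(html) {
  // #\[([^\]]+)\] captures the key between "#[" and "]".
  return html.replace(/#\[([^\]]+)\]/g, function (fullMatch, key) {
    const translated = lookup(key);
    // Keep the original placeholder when no translation exists.
    return translated === undefined ? fullMatch : translated;
  });
}

console.log(translate('<div><span>#[hello]</span><span>#[sir]</span></div>'));
// <div><span>bonjour</span><span>monsieur</span></div>
```

Keeping the placeholder for missing keys makes untranslated entries easy to spot in the rendered page.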

You can try this to find all occurrences:
var re = new RegExp('#\\[([^\\]]+?)\\]', 'gi'),
str = '#[value1] plain text #[value2]',
match;
while (match = re.exec(str)) {
console.log(match);
}
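If only the keys are needed, the same loop can collect the first capture group instead of logging the whole match. This is a rough sketch; collectKeys is a made-up helper name.

```javascript
// Collects the text between "#[" and "]" for every placeholder in str.
function collectKeys(str) {
  const re = /#\[([^\]]+)\]/g; // the g flag lets exec() advance through the string
  const keys = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    keys.push(match[1]); // match[0] is the full "#[key]", match[1] is just "key"
  }
  return keys;
}

console.log(collectKeys('#[value1] plain text #[value2]'));
// [ 'value1', 'value2' ]
```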

You could use something like:
#\\[[^\\]]*\\]
Which matches the hash followed by an opening square bracket followed by zero or more characters NOT including the closing square bracket, followed by a closed square bracket.
Alternatively, perhaps it would be better to handle the translation at the server side (maybe even through your template engine) and send back to your client the translated response. Otherwise, (depending on the specific problem you are dealing with of course), you might end up sending a lot of data to the browser which might make your application respond slowly.
EDIT:
Here is a working piece of code:
var q="This #[ANIMAL1] was eaten by that #[ANIMAL2]";
var u = {"#[ANIMAL1]":"Lion","#[ANIMAL2]":"Frog"};
function insertAnimal(aString, lookup){
  // replace every #[...] placeholder in one pass; unknown placeholders are left as-is
  return aString.replace(/#\[[^\]]*\]/g, function(m){
    return lookup.hasOwnProperty(m) ? lookup[m] : m;
  });
}
function main(){
alert(insertAnimal(q,u));
}
You can call the "main()" from an HTML document's body onload event

I can compare your requirement to 'resolving template texts within content'. If it is feasible to use jQuery, you should try Handlebars.js.

Related

Dynamic string cutting

Okay, so I have a filepath with a variable prefix...
C:\Users\susan ivey\Documents\VKS Projects\secc-electron\src\views\main.jade
... now this path will be different for whatever computer I'm working on...
is there a way to traverse the string up to say 'secc-electron\', and drop it and everything before it while preserving the rest of it? I'm familiar with converting strings to arrays to manipulate elements contained within delimiters, but this is a problem that I have yet to come up with an answer to... would there be some sort of regex solution instead? I'm not that great with regex so I wouldn't know where to begin...

What you probably want is to do a split (with regex or not):
Here's an example:
var paragraph = 'C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade';
// returns an array of 2 elements: "C:\\Users\\susan ivey\\Documents\\VKS Projects\\" first and "\\src\\views\\main.jade" second
var splittedString = paragraph.split("secc-electron");
console.log(splittedString[1]);
You can have a look at this https://www.w3schools.com/jsref/jsref_split.asp to learn more about this function.

With Regex you can do:
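var myPath = 'C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade'; // backslashes must be escaped in a string literal
var relativePath = myPath.replace(/.*(?=secc-electron)/, '');
The Regex is:
.*(?=secc-electron)
It matches any characters up to, but not including, 'secc-electron' (the lookahead (?=...) does not consume it). Calling replace removes that prefix and returns the rest of the path, starting at 'secc-electron'.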
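Since the question asks to drop 'secc-electron\' itself as well, here is a small sketch comparing the lookahead version with one that consumes the marker and the backslash after it. The variable names are only for illustration.

```javascript
const fullPath = 'C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade';

// Lookahead: the marker is not part of the match, so it stays in the result.
const keepMarker = fullPath.replace(/.*(?=secc-electron)/, '');

// No lookahead: the marker and the following backslash are matched and removed.
const dropMarker = fullPath.replace(/.*secc-electron\\/, '');

console.log(keepMarker); // secc-electron\src\views\main.jade
console.log(dropMarker); // src\views\main.jade
```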

You can split the string at a certain point, then return the second part of the resulting array:
var string = "C:\\Users\\susan ivey\\Documents\\VKS Projects\\secc-electron\\src\\views\\main.jade" // backslashes escaped so they survive in the string
console.log('string is: ', string)
var newArray = string.split("secc-electron")
console.log('newArray is: ', newArray)
console.log('newArray[1] is: ', newArray[1])

Alternatively you could use path.parse(path); https://nodejs.org/api/path.html#path_path_parse_path and retrieve the parts that you are interested in from the object that gets returned.

Javascript: given an array of variables, function to remove characters and output another array

I am still learning JavaScript, so I know this is a basic question, but I'd really like to learn what I'm missing. I have an array of variables, and I need a function that removes special characters and returns the result as an array.
Here's my code:
var myArray = [what_hap, desc_injury];
function ds (string) {
string.replace(/[\\]/g, ' ')
string.replace(/[\"]/g, ' ')
string.replace(/[\/]/g, '-')
string.replace(/[\b]/g, ' ')
string.replace(/[\f]/g, ' ')
string.replace(/[\n]/g, ',')
string.replace(/[\r]/g, ' ')
string.replace(/[\t]/g, ' ');
return string;
}
ds (myArray);
I know that's not going to work, so I'm just trying to learn the simplest and cleanest way to output:
[whatHap: TEXTw/oSpecialCharacters, descInj: TEXTw/oSpecialCharacters]
Anyone willing to guide a noobie? Thanks! :)

The comments on the question are correct, you need to specify what you are asking a little better but I will try and give you some guidance from what I assume about your intended result.
One important thing to note, which would fix the function you already have, is that string.replace() does not change the string itself; it returns a new string with the replacements, as you can see in the documentation. To do many replacements you need to assign the result back each time: string = string.replace('a', '-')
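A tiny illustration of that point, with made-up variable names:

```javascript
const original = 'a/b/c';

// The return value is thrown away here, so nothing visible changes.
original.replace(/\//g, '-');
console.log(original); // a/b/c

// Keeping the returned string is what actually gives you the result.
const replaced = original.replace(/\//g, '-');
console.log(replaced); // a-b-c
```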
On to a solution for the whole array. There are a couple ways to process an array in javascript: for loop, Array.forEach(), or Array.map(). I urge you to read the documentation of each and look up examples on your own to understand each and where they are most useful.
Since you want to replace everything in your array I suggest using .map() or .forEach(), since these will loop through the whole array for you without you having to keep track of the index yourself. Below are examples of using each to implement what I think you are going for.
Map
function removeSpecial(str) {
// replace all these character with ' '
// \ " \b \f \r \t
str = str.replace(/[\\"\b\f\r\t]/g, ' ');
// replace / with -
str = str.replace(/\//g, '-');
// replace \n with ,
str = str.replace(/\n/g, ',');
return str;
}
let myArray = ["string\\other", "test/path"];
let withoutSpecial = myArray.map(removeSpecial); // ["string other", "test-path"]
forEach
function removeSpecial(myArray) {
let withoutSpecial = [];
myArray.forEach(function(str) {
str = str.replace(/[\\"\b\f\r\t]/g, ' ');
// replace / with -
str = str.replace(/\//g, '-');
// replace \n with ,
str = str.replace(/\n/g, ',');
withoutSpecial.push(str)
});
return withoutSpecial;
}
let myArray = ["string\\other", "test/path"];
let withoutSpecial = removeSpecial(myArray); // ["string other", "test-path"]
The internals of each function can be whatever replacements you need, or you could replace them with the function you already have. Map is the better fit in this situation because it maps the existing values to new corresponding values, one to one, and returns them as a new array (the original array is left unchanged). The forEach solution, on the other hand, requires you to create a new array and add the elements to it yourself; forEach is better suited to cases where you need to do something outside the array itself for every element.
PS. You should check out https://regex101.com/ for help building regular expressions if you want more complex replacements, but you don't really need them for this situation.

I realize that the way I wrote my goal isn't exactly clear. I think what I should have said was that given several text strings, I want to strip out some specific characters (quotes, for example), and then output each of those into an array that can be accessed. I have read about arrays, it's just been my experience in learning JS that reading code and actually doing code are two very different things.
So I appreciate the references to documentation, what I really needed to see was a real life example code.
I ended up finding a solution that works:
function escapeData(data) {
return data
.replace(/\r/g, "");
}
var result = {};
result.what_hap_escaped = escapeData($what_hap);
result.desc_injury_escaped = escapeData($desc_injury);
result;
I appreciate everyone's time, and hope I didn't annoy you guys too much with my poorly constructed question :)
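For what it's worth, here is a rough sketch of the same idea applied to several named strings at once, producing an object keyed by field name. The helper names cleanText and cleanFields and the sample field values are made up for illustration; the replacements are the ones from the answer above.

```javascript
// Applies the replacements discussed above to a single string.
function cleanText(str) {
  return str
    .replace(/[\\"\b\f\r\t]/g, ' ') // backslash, quote, backspace, form feed, CR, tab -> space
    .replace(/\//g, '-')            // slash -> dash
    .replace(/\n/g, ',');           // newline -> comma
}

// Cleans every string value of an object and returns a new object.
function cleanFields(fields) {
  const result = {};
  for (const [name, value] of Object.entries(fields)) {
    result[name] = cleanText(value);
  }
  return result;
}

const cleaned = cleanFields({ what_hap: 'slipped\nfell', desc_injury: 'left/right knee' });
console.log(cleaned.what_hap);    // slipped,fell
console.log(cleaned.desc_injury); // left-right knee
```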

How to properly bold search terms from Twitter, strange regex case in JS

I'm retrieving tweets from Twitter with the Twitter API and displaying them in my own client.
However, I'm having some difficulty properly highlighting the right search terms. I want an effect like the following:
The way I'm trying to do this in JS is with a function called highlightSearchTerms(), which takes the text of the tweet and an array of keywords to bold as arguments. It returns the text of the fixed tweet. I'm bolding keywords by wrapping them in a <span> that has the class .search-term.
I'm having a lot of problems, which include:
Running a simple replace doesn't preserve case
There is a lot of conflict with the keyword being in href tags
If I try to do a for loop with a replace, I don't know how to only modify search terms that aren't in an href, and that I haven't already wrapped with the span above
An example tweet I want to be able to handle for:
Input:
This is a keyword. This is a <a href="http://search.twitter.com/q=%23keyword">
#keyword</a> with a hashtag. This is a link with kEyWoRd:
http://thiskeyword.com.
Expected Output:
This is a
<span class="search-term">keyword</span>
. This is a <a href="http://search.twitter.com/q=%23keyword"> #
<span class="search-term">keyword</span>
</a> with a hashtag. This is a link with
<span class="search-term">kEyWoRd</span>
:<a href="http://thiskeyword.com">http://this
<span class="search-term">keyword.com</span>
</a>.
I've tried many things, but unfortunately I can't quite find out the right way to tackle the problem. Any advice at all would be greatly appreciated.
Here is my code that works for some cases but ultimately doesn't do what I want. It fails to handle for when the keyword is in the later half of the link (e.g. http://twitter.com/this_keyword). Sometimes it strangely also highlights 2 characters before a keyword as well. I doubt the best solution would resemble my code too much.
function _highlightSearchTerms(text, keywords){
for (var i=0;i<keywords.length;i++) {
// create regex to find all instances of the keyword, catch the links that potentially come before so we can filter them out in the next step
var searchString = new RegExp("[http://twitter.com/||q=%23]*"+keywords[i], "ig");
// create an array of all the matched keyword terms in the tweet, we can't simply run a replace all as we need them to retain their initial case
var keywordOccurencesInitial = text.match(searchString);
// create an array of the keyword occurences we want to actually use, I'm sure there's a better way to create this array but rather than try to optimize, I just worked with code I know should work because my problem isn't centered around this block
var keywordOccurences = [];
if (keywordOccurencesInitial != null) {
for(var i3=0;i3<keywordOccurencesInitial.length;i3++){
if (keywordOccurencesInitial[i3].indexOf("http://twitter.com/") > -1 || keywordOccurencesInitial[i3].indexOf("q=%23") > -1)
continue;
else
keywordOccurences.push(keywordOccurencesInitial[i3]);
}
}
// replace our matches with search term
// the regex should ensure to NOT catch terms we've already wrapped in the span
// i took the negative lookbehind workaround from http://stackoverflow.com/a/642746/1610101
if (keywordOccurences != null) {
for(var i2=0;i2<keywordOccurences.length;i2++){
var searchString2 = new RegExp("(q=%23||http://twitter.com/||<span class='search-term'>)?"+keywordOccurences[i2].trim(), "g"); // don't replace what we've alrdy replaced
text = text.replace(searchString2,
function($0,$1){
return $1?$0:"<span class='search-term'>"+keywordOccurences[i2].trim()+"</span>";
});
}
}
} // closes the for loop over keywords
return text;
}

Here's something you can probably work with:
var getv = document.getElementById('tekt').value;
var keywords = "keyword,big elephant"; // comma delimited keyword list
var rekeywords = "(" + keywords.replace(/\, ?/ig,"|") + ")"; // wraps keywords in ( and ), and changes , to a pipe (character for regex alternation)
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
alert(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
And here is a variation that attempts to deal with word forms. If the word ends with ed,es,s,ing,etc, it chops it off and also, while looking for word-boundaries at the end of the word, it also looks for words ending in common suffixes. It's not perfect, for instance the past tense of ride is rode. Accounting for that with Regex is nigh-impossible without opening yourself up to tons of false-positives.
var getv = document.getElementById('tekt').value;
var keywords = "keywords,big elephant";
var rekeywords = "(" + keywords.replace(/(es|ing|ed|d|s|e)?\b(\s*,\s*|$)/ig,"(es|ing|ed|d|s|e)?$2").replace(/,/g,"|") + ")";
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
console.log(keyrex);
document.getElementById('tekt').value = document.getElementById('tekt').value.replace(keyrex,"<span class=\"search-term\">$1</span>");
Edit
This is just about perfect. Do you know how to slightly modify it so the keyword in thiskeyword.com would also be highlighted?
Change this line
var keyrex = new RegExp("(#?\\b" + rekeywords + "\\b)(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
to (All I did was remove both \\b's):
var keyrex = new RegExp("(#?" + rekeywords + ")(?=[^>]*?<[^>]*>|(?![^>]*>))","igm")
But be warned, you'll have problems like smiles ending up with only the mile in the middle highlighted (if a user searches for mile), and there's nothing regex can do about that. Regex's definition of a word is a run of alphanumeric characters; it has no dictionary to check.
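A different way to keep keywords inside tags (such as href values) untouched is to split the markup into tags and text and only run the replacement on the text parts. This is a rough sketch of that alternative, not the regex above; escapeRegExp and highlight are made-up helper names, and it assumes the input has no stray "<" or ">" outside of tags.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function highlight(html, keywords) {
  if (keywords.length === 0) return html;
  const pattern = new RegExp('(' + keywords.map(escapeRegExp).join('|') + ')', 'gi');
  return html
    .split(/(<[^>]*>)/) // the capture group keeps the tags in the resulting array
    .map(function (part) {
      if (/^<[^>]*>$/.test(part)) return part; // a tag: leave it alone
      // $1 is the text as it was found, so the original case is preserved.
      return part.replace(pattern, '<span class="search-term">$1</span>');
    })
    .join('');
}

console.log(highlight('A Keyword <a href="keyword">x</a>', ['keyword']));
// A <span class="search-term">Keyword</span> <a href="keyword">x</a>
```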

JavaScript, Regex and null result

I have written this regexp: <(a*)\b[^>]*>.*?</\1>
and is tested on this regexp testing site: http://gskinner.com/RegExr/?2tntr
The point of the regexp is to go through a sites HTML and find all of the links. It should then return these in an Array for me to manipulate.
On the regexp testing site it works perfectly, but when put in action with JavaScript on my site it returns null.
JavaScript looks like this:
var data = $('#mainDivOnMiddleOfPage').html();
var pattern = "<(a*).*href=.*>.*</a>";
var modi = "g";
var patt = new RegExp(pattern, modi);
var result = patt.exec(data);
jQuery gets the content of the page. This is tested and verified.
Question is, why does this return null in JavaScript but what it is supposed to return in the regexp tester?

All <a> links:
<a[^>]*?\bhref=['\"](.*?)['\"]
Absolute links only (starting with http):
<a[^>]*?\bhref=['\"](http.*?)['\"]
JavaScript code:
var html = '<a href="test.html">';
var m = html.match(/<a[^>]*?\bhref=['"](.*?)['"]/);
console.log(m[1]); // test.html

I use the following code to do the same thing and it works for me, try it out
var data = document.getElementById('mainDivOnMiddleOfPage').innerHTML; // innerHTML keeps the tags; textContent would strip them, so nothing could match
var result = data.match(/<(a*).*href=.*>.*<\/a>/);

Going to go ahead and post this here, since I think it's what you want -- it is not a RegEx solution, however.
$(function(){
$.ajax({
url: "test.htm",
success: function(data){
var array_of_links = $.makeArray($("a",data));
// do your stuff here
}
});
});

I'm conscious an answer has been chosen. However it's worth mentioning that the current REGEX solutions match the tags but not the actual HREFs in isolation.
This is where JavaScript falls down: when the global g flag is specified, String.match returns only the full matches and not the captured sub-groups.
One way round this is to exploit the REGEX replacement callback. This will get just the link HREFs, not the tags.
var html = document.body.innerHTML,
links = [];
html.replace(/<a[^>]*?href=('|")(.*?)\1/gi, function($0, $1, $2) {
links.push($2);
});
//links is now an array of hrefs
It also uses a back-reference to close the href attribute, i.e. making sure both opening and closing quote are single or double, not mixed.
Sidenote: as others have mentioned, where possible, you'd want to DOM this rather than REGEX.
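In environments that support String.prototype.matchAll (ES2020), the capture groups of every global match are available directly. A minimal sketch, where extractHrefs is a made-up helper name:

```javascript
// Returns the href value of every <a ...> tag found in the html string.
function extractHrefs(html) {
  // Group 1 is the opening quote, group 2 the href value, \1 the matching closing quote.
  const pattern = /<a\b[^>]*?\bhref=(['"])(.*?)\1/gi;
  return Array.from(html.matchAll(pattern), function (m) {
    return m[2];
  });
}

console.log(extractHrefs('<a href="a.html">A</a> <a class="x" href=\'b.html\'>B</a>'));
// [ 'a.html', 'b.html' ]
```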

"The point of the regexp is to go through a sites HTML and find all of the links. It should then return these in an Array for me to manipulate."
I won't add another regex answer, but just want to point out that if you have hold of the document (not just the html) then it's easier to walk through the links collection. That contains all <a href="">'s but also all <area> elements:
for (var link, links = document.links, n = links.length, i=0; i<n; i++){
link = links[i];
switch (link.tagName){
case "A":
//do something with the link
break;
case "AREA":
//do something with the area.
break;
}
}

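One suggestion is to compile your regex before calling the exec() method:
patt.compile(pattern, modi);
Be aware, though, that new RegExp(pattern, modi) already returns a ready-to-use regex and that RegExp.prototype.compile is deprecated, so this should not be needed.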

JS Regex match string with $

I am trying to write something that would look at tweets and pull up info about stocks being mentioned in the tweet. People use $ to reference stock symbols on twitter but I cant escape the $.
I also dont want to match any price mention or anything like that so basically match $AAPL and not $1500
I was thinking it would be something like this
\b\$[a-zA-Z].*\b
if there are multiple matches id like to loop through them somehow so something like
while ((tweet = reg.exec(sym_pat)) !== null) {
//replace text with stock data.
}
This expression gives me an unexpected illegal token error
var symbol_pat = new RegExp(\b\$[a-z]*);
Thanks for the help if you want to see the next issue I ran into
Javascript AJAX scope inside of $.each Scope

Okay, you've stated that you want to replace the matches with their actual stock values. So, you need to get all of the matching elements (stock ticker names) and then, for each match, replace it with the stock value.
The answer will "read" very similarly to that sentence.
Assume there's a tweet variable that is the contents of a particular tweet you're going to work on:
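// no leading \b: "$" is not a word character, so \b\$ would miss "$AAPL" after a space
// match() returns null when nothing matches, hence the "|| []"
(tweet.match(/\$[A-Za-z]+\b/g) || []).forEach(function(match) {
  // match looks like '$AAPL'
  var tickerValue = lookUpTickerValue(match);
  // replace() returns a new string, so assign the result back
  tweet = tweet.replace(match, tickerValue);
});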
This is assuming you have some logic somewhere that will grab the ticker value for the given stock name and then replace it (it should probably return the original value if it can't find a match, so you don't mangle lovely tweets like "Barbara Streisand is $ATAN").

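var symbol_pat = new RegExp('\\$[a-z]+\\b','gi');
// or
var symbol_pat = /\$[a-z]+\b/gi;
Also, a leading \b does not help here: $ is not a word character, so \b\$ only matches when the $ directly follows a letter, digit or underscore (as in a$s), not at the start of the tweet or after a space. The \b at the end works because it sits between the last letter and whatever follows.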
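A quick sketch of how \b behaves around the $ sign, plus one way to require start-of-string or whitespace in front of the symbol. findSymbols and the sample tweet are made up for illustration.

```javascript
// \b before "$" needs a word character in front of the "$".
const afterSpace = /\b\$AAPL/.test('buy $AAPL');    // false: space and "$" are both non-word characters
const afterLetter = /\b\$AAPL/.test('buy$AAPL');    // true: boundary between "y" and "$"
const trailing = /\$AAPL\b/.test('buy $AAPL now');  // true: boundary between "L" and " "

// Collects symbols such as "$AAPL" while skipping prices such as "$1500".
function findSymbols(tweet) {
  const symbols = [];
  const pattern = /(?:^|\s)(\$[A-Za-z]+)\b/g;
  let match;
  while ((match = pattern.exec(tweet)) !== null) {
    symbols.push(match[1]);
  }
  return symbols;
}

console.log(findSymbols('$AAPL is up, $1500 target, also $msft'));
// [ '$AAPL', '$msft' ]
```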
EDIT: If you're replacing the stock symbols you can use the basic replace method by a function and replace that data with predefined values:
var symbol_pat = /(^|\s)(\$[a-z]+\b)/gi;
var stocks = {AAPL:1,ETC:2}
var str = '$aapl ssd $a a$s$etc $etc';
console.log(str);
str = str.replace(symbol_pat, function() {
var stk = arguments[2].substr(1).toUpperCase();
// assuming you want to replace $etc as well as $ETC by using
// the .toUpperCase() method
if (!stocks[stk]) return arguments[0];
return arguments[0].replace(arguments[2],stocks[stk]);
});
console.log(str);
