extract words in string with unicode characters - javascript

In JavaScript (Node.js) I need to index text strings containing Unicode characters, i.e. given a string like:
"Bonjour à tous le monde,
je voulais être le premier à vous dire:
-'comment ça va'
-<est-ce qu'il fait beau?>"
I want to get the following array of words :
["Bonjour", "à", "tous", "le", "monde", "je", "voulais", "être", ... "beau"]
How can I achieve that using a regex or any other means?
P.S.: I installed and tried the xregexp module, which provides Unicode support for JavaScript, but being utterly useless with regexes in general, I could not get very far...

You can use the version of XRegExp bundled with addons, which (among other things) adds support for Unicode categories in regexes. The category we are interested in is "not a Unicode letter", written \P{L}.
You can then split your string by the regex XRegExp("\\P{L}+").
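var XRegExp = require('xregexp'); // the npm package ships with the Unicode addons
var s="Bonjour à tous le monde,\nje voulais être le premier à vous dire:\n -'comment ça va'\n -<est-ce qu'il fait beau?>";
var notALetter = XRegExp("\\P{L}+");
// The string ends with "?>", so split() leaves a trailing empty string; filter it out
var words = XRegExp.split(s, notALetter).filter(function (w) { return w !== ''; });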
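On current engines you may not need a library at all: ES2018 added Unicode property escapes to native regular expressions (Node.js 10 and later), so the same letter-matching idea can be written with the u flag. A minimal sketch that matches runs of letters instead of splitting on non-letters; the normalize('NFC') call and the \p{M} (combining mark) class are my own additions, meant to keep decomposed accents (for example "e" followed by U+0301) attached to their word:

```javascript
// Requires ES2018 Unicode property escapes (Node.js 10+)
const s = "Bonjour à tous le monde,\nje voulais être le premier à vous dire:\n -'comment ça va'\n -<est-ce qu'il fait beau?>";

// \p{L} = any Unicode letter, \p{M} = combining marks left over after normalization;
// the "u" flag is required for \p{...} to be recognized at all
const words = s.normalize('NFC').match(/[\p{L}\p{M}]+/gu) || [];

console.log(words);
```

Matching rather than splitting also avoids the empty strings that split() produces at the edges of the input.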

You can also use the library "uwords" (https://github.com/AlexAtNet/uwords). It extracts words from text by grouping together consecutive characters from the Unicode L* (letter) categories.
It works similarly to XRegExp("\\p{L}+"), but is built to be much faster.
Example:
var uwords = require('uwords');
var words = uwords('Bonjour à tous le monde,\n' +
'je voulais être le premier à vous dire:\n' +
'-\'comment ça va\'\n' +
'-<est-ce qu\'il fait beau?>');
console.log(words);
[ 'Bonjour',
'à',
'tous',
'le',
'monde',
'je',
'voulais',
'être',
'le',
'premier',
'à',
'vous',
'dire',
'comment',
'ça',
'va',
'est',
'ce',
'qu',
'il',
'fait',
'beau' ]
P.S. Sorry for being late - I hope it is still useful.

One approach is to split the string on the characters that are NOT part of words and then filter out the empty strings:
var str = "Bonjour à tous le monde, je voulais être le premier à vous dire: -'comment ça va' -<est-ce qu'il fait beau?>";
var result = str.split(/[-,:'"?\s><]+/).filter(function(item) { return item !== '' });
/*
["Bonjour", "à", "tous", "le", "monde", "je", "voulais", "être", "le", "premier", "à", "vous", "dire", "comment", "ça", "va", "est", "ce", "qu", "il", "fait", "beau"]
*/
Similarly, you can match with the negated character class instead, and then you don't have to filter out empty strings:
var result = str.match(/[^-,:'"?\s><]+/g);
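Both variants depend on listing every separator, so any punctuation missing from the class (say "!" or ".") stays glued to a word. If your runtime provides Intl.Segmenter (Node.js 16+, current browsers), locale-aware word segmentation is another option. A sketch, assuming that API is available; following the Unicode word-boundary rules, an elision like qu'il may come back as one word instead of "qu" and "il", so check the result against what you need:

```javascript
// Assumes Node.js 16+ (or a modern browser) where Intl.Segmenter exists
const str = "Bonjour à tous le monde, je voulais être le premier à vous dire: -'comment ça va' -<est-ce qu'il fait beau?>";

const segmenter = new Intl.Segmenter('fr', { granularity: 'word' });

// segment() yields objects with a `segment` string and an `isWordLike` flag;
// keep only the word-like pieces, dropping spaces and punctuation
const words = Array.from(segmenter.segment(str))
  .filter((part) => part.isWordLike)
  .map((part) => part.segment);

console.log(words);
```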

Related

How to compare a text with accents in cypress?

I'm trying to compare text in Cypress. The text contains accented words, and the assertion fails with the following error:
AssertionError Timed out retrying: expected '' to have text '\n Su lote de distribuci�n se ha creado correctamente, en breve sus comprobantes se enviar�n a sus respectivos destinatarios.\n', but the text was '\n Su lote de distribución se ha creado correctamente, en breve sus comprobantes se enviarán a sus respectivos destinatarios.\n
I could make it work by replacing the accented characters with their Unicode escape sequences; for example, ó becomes \u00F3 (so "distribución" is written "distribuci\u00F3n").
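Writing the escapes by hand gets tedious for long strings. A small sketch of a hypothetical helper (not part of Cypress) that converts every non-ASCII character into its \uXXXX form. The � characters in the expected text suggest the spec file itself is not being read as UTF-8, so saving the file as UTF-8 may be the cleaner fix; the helper is only a workaround:

```javascript
// Hypothetical helper: rewrite every non-ASCII UTF-16 code unit as a \uXXXX escape
// so the result can be pasted into an ASCII-only spec file
function toUnicodeEscapes(text) {
  return text.replace(/[^\x00-\x7F]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(toUnicodeEscapes('Su lote de distribución se ha creado correctamente'));
// Su lote de distribuci\u00F3n se ha creado correctamente
console.log(toUnicodeEscapes('se enviarán a sus respectivos destinatarios'));
// se enviar\u00E1n a sus respectivos destinatarios
```

Because the regex has no u flag it works per UTF-16 code unit, so characters outside the BMP come out as two escapes (a surrogate pair), which is exactly what a JavaScript string literal needs.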

Translating a Paragraph does not work using translate.js

I'm using translate.js.
Short sentences translate fine, but long paragraphs do not, and neither do elements nested inside other elements, as you'll see below.
Here's ALL THE CODE:
STEP 1:
the HTML:
<div class="company mar-left10">
<h4 class="trn">Sustainable Global Solutions' Mission</h4>
<p class="trn">Sustainable Global Solutions mission is to market, build, implement, and operate sustainable business methods throughout the US and worldwide. The end result of our efforts is to provide proven, proprietary, and economically efficient solutions to our food, fuel, and population crises at hand. Additionally our waste-to-energy systems can utilize solid waste gasification for renewable energy generation.</p>
</div>
STEP 2:
The JAVASCRIPT:
$(function () {
var t = {
//TITLE
"Sustainable Global Solutions' Mission": {
en: "Sustainable Global Solutions' Mission",
sp: "Misión de Sustainable Global Solutions"
},
//**THIS SECTION WILL NOT TRANSLATE!**
"Sustainable Global Solutions mission is to market, build, implement, and operate sustainable business methods throughout the US and worldwide. The end result of our efforts is to provide proven, proprietary, and economically efficient solutions to our food, fuel, and population crises at hand.Additionally our waste-to-energy systems can utilize solid waste gasification for renewable energy generation.": {
en: "Sustainable Global Solutions mission is to market, build, implement, and operate sustainable business methods throughout the US and worldwide. The end result of our efforts is to provide proven, proprietary, and economically efficient solutions to our food, fuel, and population crises at hand.Additionally our waste-to-energy systems can utilize solid waste gasification for renewable energy generation.",
sp: "La misión de Sustainable Global Solutions es comercializar, construir, implementar y operar métodos comerciales sostenibles en todo Estados Unidos y en todo el mundo. El resultado final de nuestros esfuerzos es proporcionar soluciones probadas, exclusivas y económicas soluciones eficientes para nuestras crisis de alimentación, combustible y población a la mano.Además, nuestros sistemas de conversión de residuos en energía pueden utilizar la gasificación de residuos sólidos para la generación de energía renovable."
},
};
var _t = $('body').translate({lang: "en", t: t});
var str = _t.g("translate");
console.log(str);
$(".lang_selector").click(function (ev) {
var lang = $(this).attr("data-value");
_t.lang(lang);
console.log(lang);
ev.preventDefault();
});
}
);
Any help would be greatly appreciated.
Thank you everyone
Figured it out.
It was as SIMPLE as MAKING SURE the text is IDENTICAL EVERYWHERE it appears. (The paragraph in my HTML had a space after "hand." that the dictionary key did not.)
Sustainable Global Solutions mission is to market, build, implement, and operate sustainable business methods throughout the US and worldwide. The end result of our efforts is to provide proven, proprietary, and economically efficient solutions to our food, fuel, and population crises at hand.Additionally our waste-to-energy systems can utilize solid waste gasification for renewable energy generation.
I made sure it was, and voilà!
La misión de Sustainable Global Solutions es comercializar, construir, implementar y operar métodos comerciales sostenibles en todo Estados Unidos y en todo el mundo. El resultado final de nuestros esfuerzos es proporcionar soluciones probadas, exclusivas y económicas soluciones eficientes para nuestras crisis de alimentación, combustible y población a la mano.Además, nuestros sistemas de conversión de residuos en energía pueden utilizar la gasificación de residuos sólidos para la generación de energía renovable.
Thank you to I Wrestled a BEAR ONCE for helping me SEE THE LIGHT!
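Mismatches like this (one missing space in a long paragraph) are hard to spot by eye. A minimal sketch of a hypothetical debugging helper that reports where two strings first differ; in the browser you could feed it something like $('p.trn').text() and the dictionary key, keeping in mind that any whitespace from the HTML indentation counts too:

```javascript
// Hypothetical debugging helper: index of the first character where two strings
// differ, or -1 if they are identical
function firstDifference(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return i;
  }
  // No mismatch in the common part: they differ only if the lengths differ
  return a.length === b.length ? -1 : len;
}

// Excerpts of the paragraph text in the HTML and of the key in the JavaScript
const htmlText = 'crises at hand. Additionally our waste-to-energy systems';
const keyText = 'crises at hand.Additionally our waste-to-energy systems';

const pos = firstDifference(htmlText, keyText);
console.log(pos); // 15
console.log(JSON.stringify(htmlText.slice(pos - 5, pos + 5))); // "hand. Addi"
console.log(JSON.stringify(keyText.slice(pos - 5, pos + 5)));  // "hand.Addit"
```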

Line separator - JSON to HTML with Angular and pascalprecht.translate

Sorry in advance for my bad English.
I have a problem displaying a line separator in my HTML page.
I am using AngularJS and the translate module pascalprecht.translate.
I would like to have a line separator in a translation.
My Angular file is:
Betizy.config(function ($translateProvider) {
$translateProvider.translations('fr', {
'header.betizy': 'BetIzy',
'header.login': 'Se connecter',
'header.register': "S'inscrire",
'login.error' : "Le nom d'utilisateur ou le mot de passe est incorrect.",
'login.username' : "Nom d'utilisateur",
'login.password' : "Mot de passe",
'login.button': "Valider",
'register.error' : "Une erreur est survenue lors de l'inscription. \u2028 Veuillez contacter un administrateur.",
'register.username' : "Nom d'utilisateur",
'register.password' : "Mot de passe",
'register.email' : "Adresse e-mail",
'register.button': "Valider"
});
});
The problem is this line:
'register.error' : "Une erreur est survenue lors de l'inscription. \u2028 Veuillez contacter un administrateur.",
The result in my HTML page is strange: the line separator is present and recognized by the browser, but only partially.
Thank you in advance for your help ! :)
UPDATE:
Problem solved by replacing \u2028 with <br> in the translation table, and replacing {{'register.error' | translate}} with ng-bind-html="'register.error' | translate" in the HTML template (see comments).
It might be because you've used
$translateProvider.useSanitizeValueStrategy('sanitize');
I suggest either removing that line or replacing it with
$translateProvider.useSanitizeValueStrategy('sanitizeParameters');

How to extract values from Bing Speech API output

I am using the Bing Speech API (with Javascript - REST API) and as a result get something like this:
[{
"lexical":"gerson de laudos médicos por meio do reconhecimento automático",
"display":"gerson de laudos por meio do reconhecimento automático",
"inverseNormalization":null,
"maskedInverseNormalization":null,
"transcript":"gerson de laudos por meio do reconhecimento automático",
"confidence":0.7618318
}]
How do I get just the transcribed text? For example, I would like to output only the text "gerson de laudos médicos por meio do reconhecimento automático".
You can use JSON.parse to get the results. The response is an array, so take its first element before reading the field:
For example: JSON.parse(results.d)[0].transcript, where results.d is the JSON data you got back from the Bing API and transcript is the field you want.
Hope that helps; let me know if you need further clarification.
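A minimal Node.js sketch, using the sample response from the question as a string literal (the variable name body is hypothetical; how you obtain the response text depends on your HTTP code). Note that the text quoted in the question, with "médicos", is the lexical field; in this sample, transcript and display omit that word:

```javascript
// Response body copied from the question; in practice this is the text the REST call returns
const body = '[{"lexical":"gerson de laudos médicos por meio do reconhecimento automático",' +
  '"display":"gerson de laudos por meio do reconhecimento automático",' +
  '"inverseNormalization":null,"maskedInverseNormalization":null,' +
  '"transcript":"gerson de laudos por meio do reconhecimento automático",' +
  '"confidence":0.7618318}]';

// The top level is an array of results, so index into it before reading a field
const results = JSON.parse(body);
const best = results[0];

console.log(best.transcript); // gerson de laudos por meio do reconhecimento automático
console.log(best.lexical);    // gerson de laudos médicos por meio do reconhecimento automático
```

Parsing the JSON and reading a property is more robust than stripping the [{"transcript":" and "}] wrapper with string operations, which breaks as soon as the field order or content changes.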
I removed these lines:
lexical: r.lexical,
display: r.name,
inverseNormalization: null,
maskedInverseNormalization: null,
Now, to strip the surrounding [{"transcript":" and "}], I will use JavaScript.

Read JSon string with Javascript and PHP

In a JavaScript file I receive this JSON string in a hidden HTML input:
<input id="page_json_language_index" type="hidden" value="[{"id":"label_accept_terms","fr":"En cliquant sur le bouton ci-dessous, vous accepter de respecter les "},{"id":"label_and","fr":" et la "},{"id":"label_birthdate","fr":"Anniversaire"},{"id":"label_bottom_about","fr":"\u00c0 propos de GayUrban"},{"id":"label_bottom_contact","fr":"Contactez-nous"},{"id":"label_bottom_copyright","fr":"\u00a9 2010-2013 GayUrban.Com - Tous droits r\u00e9serv\u00e9s"},{"id":"label_bottom_privacypolicy","fr":"Vie priv\u00e9e"},{"id":"label_bottom_termsofuse","fr":"Conditions d`...mon courriel"},{"id":"label_signon_twitter","fr":"Avec Twitter"},{"id":"label_slogan","fr":"LE site des rencontres LGBT !"},{"id":"label_terms_of_use","fr":"Conditions d`utilisation"},{"id":"label_title","fr":"Bienvenue sur GayUrban | LE site des rencontres LGBT !"},{"id":"label_transgender","fr":"Transgendre"},{"id":"label_username","fr":"Nom d`utilisateur"},{"id":"label_wait_create_profile","fr":"Un moment SVP, Cr\u00e9ation de votre profil en cours..."},{"id":"label_your_gender","fr":"Votre \u00eate"}]">
from the MySQL database in the user's language (this example is in French, "fr"). In JavaScript I need to access each "id" and the value belonging to that "id".
Example: for the first "id",
I need to obtain separate variables for each ID and VALUE:
var label = "label_accept_terms";
and another variable:
var value = "En cliquant sur le bouton ci-dessous, vous accepter de respecter les "
So my problem is reading each entry and assigning each ID to the right label and value.
Thank you for your help!
I must point out that there is a mistake in your HTML: the double quotes inside the JSON close the value attribute early. To avoid breaking the attribute, wrap the value in single quotes (apostrophes) instead:
<input id="page_json_language_index" type="hidden" value='[{"id":"label_accept_terms","fr":"En cliquant sur le bouton ci-dessous, vous accepter de respecter les "},{"id":"label_and","fr":" et la "},{"id":"label_birthdate","fr":"Anniversaire"},{"id":"label_bottom_about","fr":"\u00c0 propos de GayUrban"},{"id":"label_bottom_contact","fr":"Contactez-nous"},{"id":"label_bottom_copyright","fr":"\u00a9 2010-2013 GayUrban.Com - Tous droits r\u00e9serv\u00e9s"},{"id":"label_bottom_privacypolicy","fr":"Vie priv\u00e9e"},{"id":"label_bottom_termsofuse","fr":"Conditions d`...mon courriel"},{"id":"label_signon_twitter","fr":"Avec Twitter"},{"id":"label_slogan","fr":"LE site des rencontres LGBT !"},{"id":"label_terms_of_use","fr":"Conditions d`utilisation"},{"id":"label_title","fr":"Bienvenue sur GayUrban | LE site des rencontres LGBT !"},{"id":"label_transgender","fr":"Transgendre"},{"id":"label_username","fr":"Nom d`utilisateur"},{"id":"label_wait_create_profile","fr":"Un moment SVP, Cr\u00e9ation de votre profil en cours..."},{"id":"label_your_gender","fr":"Votre \u00eate"}]'>
JSON.parse is the best way to convert a JSON string, but it's not supported by old versions of IE (e.g. IE6). You can use json2.js to make it compatible, or simply use eval().
Be aware that misusing eval() can lead to XSS (cross-site scripting) vulnerabilities. Make sure the JSON you're about to parse is safe (doesn't include malicious JavaScript).
Here's an example that reads all the ids:
<input id="page_json_language_index" type="hidden" value='[{"id":"label_accept_terms","fr":"En cliquant sur le bouton ci-dessous, vous accepter de respecter les "},{"id":"label_and","fr":" et la "},{"id":"label_birthdate","fr":"Anniversaire"},{"id":"label_bottom_about","fr":"\u00c0 propos de GayUrban"},{"id":"label_bottom_contact","fr":"Contactez-nous"},{"id":"label_bottom_copyright","fr":"\u00a9 2010-2013 GayUrban.Com - Tous droits r\u00e9serv\u00e9s"},{"id":"label_bottom_privacypolicy","fr":"Vie priv\u00e9e"},{"id":"label_bottom_termsofuse","fr":"Conditions d`...mon courriel"},{"id":"label_signon_twitter","fr":"Avec Twitter"},{"id":"label_slogan","fr":"LE site des rencontres LGBT !"},{"id":"label_terms_of_use","fr":"Conditions d`utilisation"},{"id":"label_title","fr":"Bienvenue sur GayUrban | LE site des rencontres LGBT !"},{"id":"label_transgender","fr":"Transgendre"},{"id":"label_username","fr":"Nom d`utilisateur"},{"id":"label_wait_create_profile","fr":"Un moment SVP, Cr\u00e9ation de votre profil en cours..."},{"id":"label_your_gender","fr":"Votre \u00eate"}]'>
<textarea id="debug-console" cols="50" rows="20"></textarea>
<script type="text/javascript">
var arr = eval(document.getElementById("page_json_language_index").value),
output = document.getElementById("debug-console");
//output all id
for(var i=0; i<arr.length; i++)
output.value += [i, ": ", arr[i].id, "\n"].join("");
//show the first id
alert(arr[0].id);
</script>
Actually, you can output the JSON directly into JavaScript.
For your needs, I think this is what you want:
<?php
...
$data = ...; //for example, from mysql query results
$language = "fr"; //you can replace it with it/en/zh...
...
?>
<input id="some_id">
<script>
(function() {
// json_encode() already produces a valid JavaScript literal, so print it unquoted;
// wrapping it in "..." would break on the double quotes inside the JSON
var data = <?php echo json_encode($data); ?>,
lang = <?php echo json_encode($language); ?>; //language, e.g. "fr"
for(var i=0; i<data.length; i++) {
var el = document.getElementById(data[i].id);
if (el) { //skip ids that have no matching element on the page
el.value = data[i][lang]; //a general way to read an object's property by name in JavaScript
}
}
})();
</script>
i need to obtain on separate variables for each ID and VALUE
This is not the way to go, you don't want to pollute your scope with a bunch of variables. What you have is a collection (array of objects). You can loop such collection and access the properties you need.
var input = document.getElementById('page_json_language_index');
var data = JSON.parse(input.value); // collection
the goal is to assign each (id, fr) pair to a jQuery label
var label = function(lab) {
return '<label id="'+ lab.id +'">'+ lab.fr +'</label>';
};
var labels = data.map(label);
You can also make it a jQuery collection:
var $labels = $(labels.join(''));
Then you can append it to any container:
$labels.appendTo('body');
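If you do not need jQuery, the same collection can be read with plain JavaScript. A minimal Node.js sketch using three entries from the question; in the browser, json would instead come from document.getElementById('page_json_language_index').value. The names json, data and labels are illustrative:

```javascript
// Value of the hidden input, shortened to three entries from the question.
// "\\u00c0" puts the JSON escape \u00c0 into the string; JSON.parse decodes it to "À".
const json = '[{"id":"label_accept_terms","fr":"En cliquant sur le bouton ci-dessous, vous accepter de respecter les "},' +
  '{"id":"label_and","fr":" et la "},' +
  '{"id":"label_bottom_about","fr":"\\u00c0 propos de GayUrban"}]';

const data = JSON.parse(json);

// Walk the collection, pulling id and the French text out of each entry
for (const { id, fr } of data) {
  console.log(id + ' -> ' + fr);
}

// Or build one lookup object keyed by id instead of many separate variables
const labels = Object.fromEntries(data.map(({ id, fr }) => [id, fr]));
console.log(labels.label_bottom_about); // À propos de GayUrban
```

A single lookup object keeps everything in one place and lets you fetch any label by its id, which is usually what "a variable per ID" is really after.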
