Javascript "".length returning 1 rather than 0 - javascript

Ok so I am rather stumped by this one.
I get a string value from a JavaScript library. I call myStringVar = myStringVar.trim(), but when I do myStringVar.substring(0,1) it gives me what looks like an empty string. When I call var arr = myStringVar.split('') the first element in the array is an empty string, and when I call arr[0].trim().length it returns 1 instead of zero.
Am I missing something?
EDIT
Following the comments and responses I have been able to isolate the problem down to the existence of a non-visual unicode character at the beginning of the string. I will now try to find a way to remove those characters from the string....or better yet extract the portions of the string that are of interest.
Thanks for the help.

The most likely answer for this is that you have some invisible Unicode character in your string (for instance, "⁣", U+2063 INVISIBLE SEPARATOR).
A string containing only such a character would look to a user (or programmer) like an empty string, but would in fact have length 1, since it does contain a character.
One simple way to test whether this is the case is to get the code of the first character with string.charCodeAt(0). You can then look this value up in a Unicode table, which should tell you whether you have an invisible character in your string.
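A minimal sketch of that check, extended to list every code point rather than only the first. The helper name describeCodePoints is hypothetical, and the sample string is written with an escape sequence so the invisible character can be seen in the source.

```javascript
// Hypothetical helper: list every code point in a string as "U+XXXX".
function describeCodePoints(str) {
  // Array.from iterates by code point, so characters outside the BMP stay whole.
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const suspicious = "\u2063"; // INVISIBLE SEPARATOR, renders as nothing

console.log(suspicious.length);              // 1
console.log(suspicious.trim().length);       // 1, because U+2063 is not whitespace
console.log(describeCodePoints(suspicious)); // [ 'U+2063' ]
```

This also shows why trim() did not help in the question: it only removes whitespace and line terminators, and U+2063 is a format character.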

Related

Is it possible for two equal strings to be unequal in JavaScript?

My team and I are working on a React/Redux project, and I want to filter out duplicate tags. However, I realized that someone has put some tricky strings into the tag data.
When I log those tags to the console, the first and second tags in the list, for example, both look like "HumanIty", but when I compare them, even with the strict equality operator, I get false.
When I select and copy the text of both tags and paste it back into the console, I get a surprising result: the string in the second tag somehow has extra characters between its letters (rendered as red dots).
Someone must have faced this problem before; please explain what is going on.
Thank you.
To answer your question directly:
Is it possible for two equal strings to be unequal in JavaScript?
No.
As mentioned in the comments you have some invisible characters in your strings, making them unequal when you compare them.
To fix the problem, remove the invisible characters with a method of your choice (my recommendation would be not to let users input invisible characters in the first place).
What is the .length property of each string?
If you iterate an index variable over each character position from 0 (inclusive) to length (exclusive), and print the .charCodeAt(index), what do you see?
In doing this, you might see differences between the strings.
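The comparison described above can be sketched as follows. The helper firstDifference and the sample strings are assumptions for illustration; the second string hides a U+FEFF between "n" and "I".

```javascript
// Hypothetical helper: index of the first differing UTF-16 code unit, or -1 if equal.
function firstDifference(a, b) {
  const limit = Math.max(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    // charCodeAt returns NaN past the end of a string, so a length mismatch
    // is reported at the first index where one string has run out.
    if (a.charCodeAt(i) !== b.charCodeAt(i)) return i;
  }
  return -1;
}

const visible = "HumanIty";
const tricky = "Human\uFEFFIty"; // looks identical when printed

console.log(visible === tricky);                 // false
console.log(visible.length, tricky.length);      // 8 9
console.log(firstDifference(visible, tricky));   // 5
console.log(tricky.charCodeAt(5).toString(16));  // feff
```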
I found out that one of those two look-alike strings contains a special invisible, zero-width character called the Byte Order Mark
(https://www.ionos.com/digitalguide/websites/web-development/byte-order-mark/)
and we can strip such characters with the regex /[^\x20-\x7E]/g, which removes every character outside printable ASCII, not just the invisible ones
(https://www.w3resource.com/javascript-exercises/javascript-string-exercise-32.php)
We can detect the invisible character with tools that display the Unicode characters in a string
(https://qaz.wtf/u/show.cgi?show=a%E2%80%8Bc&type=string)
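If the tags may legitimately contain non-ASCII text, a narrower pattern may be preferable to stripping everything outside printable ASCII. The sketch below is one possible approach, not the method used in the answer above; the character list in ZERO_WIDTH is an assumption and may need extending for your data.

```javascript
// Assumed set of invisible characters: zero-width space/joiners, direction marks,
// word joiner and invisible operators, and the byte order mark.
const ZERO_WIDTH = /[\u200B-\u200F\u2060-\u2064\uFEFF]/g;

// Hypothetical helper: remove only the characters listed above.
function stripZeroWidth(str) {
  return str.replace(ZERO_WIDTH, "");
}

console.log(stripZeroWidth("Human\uFEFFIty") === "HumanIty"); // true
console.log(stripZeroWidth("Caf\u00E9"));                     // Café (kept)
console.log("Caf\u00E9".replace(/[^\x20-\x7E]/g, ""));        // Caf (é is lost)
```

Note that U+200C and U+200D are meaningful in some scripts and in emoji sequences, so removing them is only safe if the tags are not expected to contain such text.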

UTF8 String not allowing charAt() or substring to pull out specific characters

In my code I'm trying to isolate out the first character of a variable, it is the UTF8 symbol: 🌈
The code and its output are as follows:
Code:
console.log(login_name);
console.log(login_name.charAt(0));
console.log(login_name.substring(0,1));
Output:
🌈 ✨✨✨UTF8MB4
�
�
Obviously, I want .charAt() to print 🌈 and not �. Any known oddities with utf8mb4 that I'm missing? My main problem is I don't know how to word this specific problem.
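Also, if I swap the rainbow for the ✨ (or target the ✨ instead), it works as expected and prints properly.
JavaScript strings are sequences of UTF-16 code units, and charAt() and substring() operate on code units rather than code points. 🌈 (U+1F308) lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair of two code units, and charAt(0) returns only the first half, which is not a valid character on its own. ✨ (U+2728) fits in a single code unit, which is why it works.
Luckily JavaScript has workarounds. To get the code points of a string instead of its UTF-16 code units, call Array.from(yourstring), which gives you an array with one element per code point. From there you can get the first element in the usual way.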
let characters = Array.from(login_name);
console.log(characters.shift());
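To make the difference between code units and code points concrete, here is a small sketch. The sample value of loginName and the helper firstCodePoint are assumptions for illustration; the helper avoids building a full array when only the first character is needed.

```javascript
// Assumed sample value, written with escapes: rainbow, space, three sparkles, text.
const loginName = "\u{1F308} \u2728\u2728\u2728UTF8MB4";

// Hypothetical helper: first code point of a string as a string ("" if empty).
function firstCodePoint(str) {
  return str.length === 0 ? "" : String.fromCodePoint(str.codePointAt(0));
}

console.log(loginName.charCodeAt(0).toString(16));  // d83c (high surrogate only)
console.log(loginName.codePointAt(0).toString(16)); // 1f308 (the whole code point)
console.log(firstCodePoint(loginName).length);      // 2 (two code units, one code point)

// String iteration and destructuring also go by code point.
const [first] = loginName;
console.log(first === firstCodePoint(loginName));   // true
```

Both approaches split by code point, not by user-perceived character, so emoji built from several code points (for example with joiners or skin-tone modifiers) would still be split.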

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice that in this string, after the phrase "[SM_g]word[SM_h]", there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Zero to all three of these components will always appear after "[SM_g]word[SM_h]". However, they can also appear in any order after "[SM_g]word[SM_h]". For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_l].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you consider that the period can also be a comma) that can appear after "[SM_g]word[SM_h]", where these three components can be in any order and sometimes one, two, or all three of the components will be missing.
Not only that, sometimes there will be up to one space between " and the previous component (or [SM_g]word[SM_h]).
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside of the core string (and preserving the space, in case there is a space between " and the previous component or [SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
You need to put each component in an alternation inside a group that may repeat at most three times:
(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or a specific keyword.
Then use this as the replacement string:
$1$3$2
var re = /(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
console.log(str.replace(re, `$1$3$2`));
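The question also allows a comma instead of the period and a word that is not constant. A hedged variation on the regex above that covers both is sketched here; the names TRAILING and moveTrailingInside are hypothetical, and the sketch assumes the word itself never contains square brackets.

```javascript
// Group 1: "[SM_g]" plus the word (any run of characters without brackets).
// Group 2: "[SM_h]".
// Group 3: up to three trailing components: "." or ",", "[SM_l]", or an
//          optional space followed by a double quote.
const TRAILING = /(\[SM_g\][^\[\]]*)(\[SM_h\])((?:[.,]|\[SM_l\]| ?"){0,3})/g;

// Hypothetical helper: move the trailing components in front of "[SM_h]".
function moveTrailingInside(text) {
  return text.replace(TRAILING, "$1$3$2");
}

console.log(moveTrailingInside('[SM_g]word[SM_h].[SM_l]"')); // [SM_g]word.[SM_l]"[SM_h]
console.log(moveTrailingInside('[SM_g]word[SM_h]"[SM_l].')); // [SM_g]word"[SM_l].[SM_h]
console.log(moveTrailingInside('[SM_g]word[SM_h] ".'));      // [SM_g]word ".[SM_h]
console.log(moveTrailingInside('[SM_g]word[SM_h]'));         // [SM_g]word[SM_h]
```

Like the original, this accepts the same component more than once (for example three periods), which may or may not matter for your data.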
This seems applicable to your task, in other words, changing the position of a substring.
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
Here each substring is captured in its own group for your post-processing.
[SM_g] is captured in group 1, word in group 2, [SM_h] in group 3, the entire trailing part in group 4, [,\.] in group 5, [SM_l] in group 6, and " ?&\\?quot;" in group 7.
Thus groups 1 to 3 are the core part, group 4 is the trailing part (useful for checking whether a trailing part exists), and groups 5 to 7 are sub-parts of group 4 for your post-processing.
You can therefore easily reorder the matched string by replacing it with the captured groups in whatever order you want, as follows.
\1\2\7\3 or $1$2$7$3 etc.
For replacing in JavaScript, please refer to this post: JS Regex, how to replace the captured groups only?
But the above regex is not sufficiently precise, because it allows arbitrary repetitions of the sub-parts of the trailing string, for example \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5.
To avoid this, the regex needs a condition that accepts only the possible combinations of the sub-parts of the trailing string. Please refer to this example, https://regex101.com/r/6aM4Pv/1/, for the possible combinations in order.
But a regex that allows only the possible combinations becomes more complicated, so I leave the simplified regex above to help you understand the idea. Thank you :-)

Regex javascript to only return a value and not full match

How do we do look-behind in JavaScript like we can in Java or PHP?
The regex works in the PHP parser using look-behind.
Here is the working regex using the PHP parser:
(?<=MakeName=)(.*?)([^\s]+)
This produces the value.
(MakeName=)(.*?)([^\s]+)
This produces the match + value.
XML response to extract the value from:
<ModelName="Tacoma" MakeName="Tundra" Year="2015">
I just need the value.
JavaScript had no look-behind until ES2018, and older engines still lack it.
If you cannot rely on it and are sure the attribute MakeName is present in the input, then you could use this regular expression:
/[^"]*(?!.*\sMakeName\s*=)(?="([^"]*"[^"]*")*[^"]*$)/
It grabs the first series of characters that do not contain a double quote and have a double quote immediately following it, with an even number of double quotes following after that until the end of the input (to make sure we are matching inside a quoted string), but MakeName= should not occur anywhere after the match.
This is of course still not bulletproof, as it will fail for some boundary cases, such as single-quoted values:
<ModelName="Tacoma" MakeName='Tundra' Year="2015">
You could resolve that, if needed, by repeating the same pattern, but then based on single quotes, and combining the two with an OR (|).
Demo:
var s = '<ModelName="Tacoma" MakeName="Tundra" Year="2015">';
var result = s.match(/[^"]*(?!.*\sMakeName\s*=)(?="([^"]*"[^"]*")*[^"]*$)/);
console.log(result[0]);
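Two simpler ways to get only the value, sketched under the assumption that the attribute is always double-quoted: a capturing group, which works in every engine, and a look-behind, which needs an ES2018-capable engine. The variable names here are illustrative.

```javascript
const xml = '<ModelName="Tacoma" MakeName="Tundra" Year="2015">';

// Option 1: match the attribute and read the value from capture group 1.
const withGroup = xml.match(/\bMakeName="([^"]*)"/);
console.log(withGroup[1]); // Tundra

// Option 2: look-behind, so the full match is the value itself (ES2018+).
const withLookbehind = xml.match(/(?<=\bMakeName=")[^"]*/);
console.log(withLookbehind[0]); // Tundra
```

Both calls return null when the attribute is absent, so real code should check the result before indexing into it.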

Javascript substring check using indexOf or search on a date string with forward slash /

I am surprised not to find any post regarding this; I must be missing something very trivial. I have a small JavaScript function to check if a string matches an object's properties. Simple stuff, right? It works easily with all strings except those which contain a forward slash.
"‎04‎/‎08‎/‎2015‎".indexOf('4') // returns 2 :good
"‎04‎/‎08‎/‎2015‎".indexOf('4/') // returns -1 :why?
The same issue appears with the .search() function as well. I encountered this issue while working on date strings.
Please note that I don't want to use a regex-based solution for performance reasons. Thanks for your help in advance!
Your string has invisible Unicode characters in it. The "left-to-right mark" (hex 200E) appears around the two slash characters as well as at the beginning and the end of the string.
If you type the code in on your browser console instead of cutting and pasting, you'll see that it works as expected.
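A small sketch that reproduces the problem with explicit escapes and removes the marks without a regex, since the question asks to avoid one. The variable names are illustrative, and the sample string is an assumed reconstruction of the one in the question.

```javascript
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK

// Assumed reconstruction of the date string, with the marks made explicit.
const raw = `${LRM}04${LRM}/${LRM}08${LRM}/${LRM}2015${LRM}`;

console.log(raw.indexOf("4"));  // 2, because an LRM occupies index 0
console.log(raw.indexOf("4/")); // -1, because an LRM sits between "4" and "/"

// split/join removes every occurrence without using a regular expression.
const clean = raw.split(LRM).join("");

console.log(clean);               // 04/08/2015
console.log(clean.indexOf("4/")); // 1
```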
