InputStream Encoding with Special Characters - javascript

Apologies, I'm not a JS developer and this is the first time I've worked with InputStream.
In the InputStream, I am processing one line of delimited text at a time that will always contain a character that is not UTF-8. My goal is to parse the InputStream to a string, split it by the delimiter, and read a certain value that is UTF-8 at an index.
The line will always be tab delimited, and will always contain the same number of delimiters. I might see something like this (two separate lines):
stuff morestuff 0.00 A ç F00012049333302129FF
stuff2 morestuff2 B è F00012205229521042CB
In my code, the value at the index position always seems to leave my variable undefined, and I'm assuming it's from the UTF-8 encoding in the toString method. My assumption is that the encoding is turning the non UTF-8 character into something that messes up the split function, but I'm not sure what or how. Here's some test code:
var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");
var flowFile = session.get();
var index = 5;
session.read(flowFile,
new InputStreamCallback(function(inputStream) {
// Convert the single line of the flowfile into a UTF_8 encoded string
var line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
// Split the delimited string into an array
var dataArray = line.split('\t');
// Capture the required value at the defined index position
var capturedValue = dataArray[index];
}));
if (typeof capturedValue === 'undefined') {
// log an error
}
else {
// do what it's supposed to do
}
I'm hoping someone could explain what exactly is happening, and help me find a solution that will allow me to look up the correct value at my predetermined index position.
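One thing worth checking that is independent of the encoding: `var capturedValue` is declared inside the callback function, so it is local to that function and `typeof capturedValue` outside the callback is always 'undefined'. Also, as far as I know, a byte that is not valid UTF-8 is typically decoded to the replacement character U+FFFD rather than to a tab, so it should not change how split('\t') behaves. Below is a minimal sketch in plain JavaScript. `readWithCallback` is a hypothetical stand-in for session.read(), and the sample line is an assumption based on the data shown above.

```javascript
// Hypothetical stand-in for session.read(): it just hands one line to the callback.
// \uFFFD is what an invalid byte typically becomes after UTF-8 decoding.
function readWithCallback(callback) {
  callback('stuff\tmorestuff\t0.00\tA\t\uFFFD\tF00012049333302129FF');
}

var index = 5;

// Version 1: `var` inside the callback is local to the callback.
readWithCallback(function (line) {
  var insideOnly = line.split('\t')[index];
});
var seenOutside = typeof insideOnly; // the name does not exist out here

// Version 2: declare the variable outside, assign it inside.
var capturedValue;
readWithCallback(function (line) {
  capturedValue = line.split('\t')[index];
});

console.log(seenOutside);   // undefined
console.log(capturedValue); // F00012049333302129FF
```

If the special character itself needs to survive, decoding with the charset the file was really written in (for example StandardCharsets.ISO_8859_1, assuming the data is Latin-1) may be a better fit than UTF_8.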

Related

Doubts in JavaScript RegExp and String.replace() method

I am trying to enter 'username' in a webpage using VBA. So in the source code of the webpage, there are some modifications done to the 'username' value.
I have attached the code,
function myFunction()
{
document.str.value = "Abc02023";
document.str.value = document.str.value.toUpperCase();
pattern = new RegExp("\\*", "g");
document.str.value = document.str.value.replace(pattern, "");
document.str.value = document.str.value.replace(/^\s+/, "");
document.str.value = document.str.value.replace(/\s+$/, "");
}
I read about these and from my understanding, after the modifications document.str.value is ABC02023.
Obviously I am wrong, as there would be no point in doing all these modifications then. Also, I am getting an 'incorrect username' error.
So can anybody please help me to understand these. What would be the value of document.str.value and how did you figure it out? I am new to JavaScript so please forgive me if I am being too slow...
Looks like you are using some very old code to learn from. ☹
Let's see if we can still learn something by bringing this code up to date, then you go find some newer learning materials. Here is a well-written book series with free online versions available: You Don't Know JS.
function myFunction() {
// Assuming your code runs in a browser, `document` is the page's DOM
// object (a property of the global object `window`), not the global
// object itself. `document.str` most likely refers to a page element
// with name="str", such as a form field, so this line sets the `value`
// of that element to 'Abc02023'. If there is no existing object
// `document` (for instance not running in a browser) or if document
// does not have a property called `str`, then this will throw a
// TypeError because you cannot set a property on `undefined`.
document.str.value = "Abc02023";
// You probably were just trying to create a new variable `str` so let's
// just start over
}
Second try
function myFunction() {
// create a variable `str` and set it to 'Abc02023'
var str = "Abc02023";
// Take value of str and convert it to all capital letters
// then overwrite current value of str with the result.
// So now `str === 'ABC02023'`
str = str.toUpperCase();
// Create a regular expression representing all occurrences of `*`
// and assign it to variable `pattern`.
var pattern = new RegExp("\\*", "g");
// Remove all instances of our pattern from the string (which does not
// affect this string, but prevents a user from inputting some types of
// bad strings to hack your website).
str = str.replace(pattern, "");
// Remove any leading whitespace from our string (which does not
// affect this string, but cleans up strings input by a user).
str = str.replace(/^\s+/, "");
// Remove any trailing whitespace from our string (which does not
// affect this string, but cleans up strings input by a user).
str = str.replace(/\s+$/, "");
// Let's at least see our result behind the scenes. Press F12
// to see the developer console in most browsers.
console.log("`str` is equal to: ", str );
}
Third try, let's clean this up a little:
// The reason to use functions is so we can contain the logic
// separate from the data. Let's extract our data (the string value)
// and then pass it in as a function parameter
var result = myFunction('Abc02023')
console.log('result = ', result)
function myFunction(str) {
str = str.toUpperCase();
// Nicer syntax for defining regular expression.
var pattern = /\*/g;
str = str.replace(pattern, '');
// Unnecessary use of regular expressions. Let's use trim instead
// to clean leading and trailing whitespace at once.
str = str.trim()
// let's return our result so the rest of the program can use it
return str
}
Last go round. We can make this much shorter and easier to read by chaining together all the modifications to str. And let's also give our function a useful name and try it out against a bad string.
var cleanString1 = toCleanedCaps('Abc02023')
var cleanString2 = toCleanedCaps(' ** test * ')
console.log('cleanString1 = ', cleanString1)
console.log('cleanString2 = ', cleanString2)
function toCleanedCaps(str) {
return str
.toUpperCase()
.replace(/\\*/g, '')
.trim()
}
@skylize's answer is close.
What is equivalent to your code is actually
function toCleanedCaps(str) {
return str
.toUpperCase()
.replace(/\*/g, '') // he got this wrong
.trim()
}
Let's go over the statements one by one
document.str.value = document.str.value.toUpperCase();
makes the string uppercase
pattern = new RegExp("\\*", "g");
document.str.value = document.str.value.replace(pattern, "");
removes every literal * character (the string "\\*" yields the regular expression \*, where the backslash escapes the asterisk), so no match in this case.
document.str.value = document.str.value.replace(/^\s+/, "");
replaces any whitespace character occurring between one and unlimited times at the beginning of the string, so no match.
document.str.value = document.str.value.replace(/\s+$/, "");
replaces any whitespace character occurring between one and unlimited times at the end of the string, so no match.
You are right. With "Abc02023" as input, the output is what you suggest.
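To see the difference between the two patterns discussed in this thread, here is a small sketch. The test string with a backslash in it is my own made-up input, chosen so that both patterns have something to match.

```javascript
// The string "\\*" contains two characters: a backslash and an asterisk.
var fromConstructor = new RegExp("\\*", "g");
console.log(fromConstructor.source); // \*

// Made-up input: spaces, asterisks and one literal backslash.
var input = ' ** te\\st * ';

// /\*/g matches each literal asterisk.
var starsRemoved = input.replace(/\*/g, '');

// /\\*/g matches zero or more backslashes, so asterisks are untouched.
var backslashesRemoved = input.replace(/\\*/g, '');

console.log(JSON.stringify(starsRemoved));       // "  te\\st  "
console.log(JSON.stringify(backslashesRemoved)); // " ** test * "
```

So the literal form equivalent to new RegExp("\\*", "g") is /\*/g, with a single backslash.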

javascript: get everything after certain characters from a string?

I'm trying to get everything after certain characters in a string.
But I have no idea why, with my code, there is a comma before the string when I alert() the result!
Here is a working FIDDLE
And this is my code:
var url = "mycool://?string=mysite.com/username_here80";
var urlsplit = url.split("mycool://?string=");
alert(urlsplit);
Any help would be appreciated.
split() separates the string into tokens at each occurrence of the delimiter. It always returns an array one longer than the number of delimiter occurrences in the string. If there is one delimiter, there are two tokens: one to the left and one to the right. In your case, the token to the left is the empty string, so split() returns the array ["", "mysite.com/username_here80"]. Try using
var urlsplit = url.split("mycool://?string=")[1]; // <= Note the [1]!
to retrieve the second string in the array (which is what you are interested in).
The reason you are getting a comma is that converting an array to a string (which is what alert() does) results in a comma-separated list of the array elements converted to strings.
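A quick way to check this for yourself, using console.log instead of alert() so it also runs outside a browser:

```javascript
var url = "mycool://?string=mysite.com/username_here80";
var parts = url.split("mycool://?string=");

// One delimiter occurrence, so two tokens; the left one is empty.
console.log(parts.length);    // 2
console.log(parts[0] === ""); // true

// Converting the array to a string joins the elements with commas,
// which is where the leading comma comes from.
var asString = String(parts);
console.log(asString);        // ,mysite.com/username_here80

// Picking the second element gives just the part after the prefix.
var afterPrefix = parts[1];
console.log(afterPrefix);     // mysite.com/username_here80
```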
The split function of the string object returns an Array of elements, based on the splitter. In a case like yours, it returns 2 elements:
var url = "http://DOMAIN.com/username_here801";
var urlsplit = url.split("//");
console.log(urlsplit);
The comma you see is only the representation of the Array as string.
If you want to get everything after a substring, you are better off using indexOf and slice:
var url = "http://DOMAIN.com/username_here801";
var splitter = '//'
var indexOf = url.indexOf(splitter);
console.log(url.slice(indexOf+splitter.length));
I'd use a simple replace..
var s = "mycool://?string=mysite.com/username_here80";
var ss = s.replace("mycool://?string=", "");
alert(ss);

Split String in Javascript with a given format

I have the string "club160", but I want to get the string "club-160". How can I do this? Do I need to use the split() function? split() needs a delimiter, but there is no comma or space.
Can somebody help me?
You can do this using a regular expression and a replace operation:
var s = "club160"
var result = s.replace(/([a-z])([0-9])/i, '$1-$2')
But this only turns something like aaa111 into aaa-111; 111aaa will stay 111aaa.
If the string always contains at least one digit or letter, consider:
var s = 'club160';
s.match(/(\d+)|([a-z]+)/ig).join('-'); // club-160
var t = '160club';
t.match(/(\d+)|([a-z]+)/ig).join('-'); // 160-club
It doesn't care about order or how many groups of letters and numbers are present. However, it requires at least one letter or number in the string; otherwise match() returns null and calling join() on it throws a TypeError.
As a function, dealing with errors:
function specialSplit(s) {
// Make sure string has at least one letter or digit
if (/\d|[a-z]/i.test(s)) {
return s.match(/(\d+)|([a-z]+)/ig).join('-');
}
// Otherwise return undefined
}
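For reference, here is a slightly different sketch of the same idea that checks the result of match() directly instead of testing the string first. It is only an illustration (the name `splitLettersAndDigits` and the extra sample inputs are mine), not a replacement for the function above.

```javascript
function splitLettersAndDigits(s) {
  // match() with the g flag returns an array of all runs of digits or
  // letters, or null when the string contains neither.
  var groups = s.match(/\d+|[a-z]+/ig);
  return groups ? groups.join('-') : undefined;
}

console.log(splitLettersAndDigits('club160')); // club-160
console.log(splitLettersAndDigits('160club')); // 160-club
console.log(splitLettersAndDigits('a1b2'));    // a-1-b-2
console.log(splitLettersAndDigits('---'));     // undefined
```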

javascript split() function - ghost character at the end of each string

I am reading a file I created in Notepad in windows. (The basic txt editor.)
When creating the file I wrote (where [newline] indicates a return)
app.exe[newline]background.jpg[newline]
and then saved it. I put this into a directory.
My Nodekit program read this file and then did the following:
var data = fs.readFileSync(filenameTemp, "utf8");
data.replace(/\r\n/g, "\n");
data.replace(/\r/g, "\n");
var strARR = data.split("\n");
strARR[0] is length 8 ????? when "app.exe" is length 7. When I look at strARR[0][7] in Chrome it says it is "", ie a string with nothing in.
Also strARR[1] is length 15 when "background.jpg" is length 14. Again Chrome reports the extra character as "".
strARR[2] is length 0 as expected.
Where is this ghost character coming from? It's responsible for another error I am getting.
The replace method returns a new string - it does NOT modify the existing string. Lines two and three of your code aren't changing the value held in data. You need to assign the returned value back into your data variable, like so:
var data = fs.readFileSync(filenameTemp, "utf8");
data = data.replace(/\r\n/g, "\n");
data = data.replace(/\r/g, "\n");
var strARR = data.split("\n");
The 'ghost' character you're seeing is in fact the \r character, which you think you've removed but haven't!
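Here is a small sketch that reproduces the effect without reading a file. The string literal is an assumption about what Notepad wrote (Windows line endings), and splitting on a regular expression is just one alternative to the two replace calls.

```javascript
// Assumed file contents: two lines with Windows (\r\n) line endings.
var data = "app.exe\r\nbackground.jpg\r\n";

// The return value is discarded, so `data` is unchanged.
data.replace(/\r\n/g, "\n");

var withGhosts = data.split("\n");
console.log(withGhosts[0].length);             // 8
console.log(JSON.stringify(withGhosts[0][7])); // "\r"

// Alternative: split on any of the common line endings in one step.
var lines = data.split(/\r\n|\r|\n/);
console.log(lines[0].length); // 7
console.log(lines[1].length); // 14
console.log(lines[2].length); // 0
```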

Convert string to whitespace

I'm looking for a way to convert a string into whitespace (spaces, newlines and tabs), and the other way around.
I found a Python script, but I have no idea how to do it using Javascript.
I need it for a white-hacking contest.
I can has banana? ;)
var ws={x:'0123',y:' \t\r\n',a:/[\w\W]/g,b:/[\w\W]{8}/g,c:function(z){return(
ws.y+ws.x)[(ws.x+ws.y).indexOf(z)]},e:function(s){return(65536+s.charCodeAt(0)
).toString(4).substr(1).replace(ws.a,ws.c)},d:function(s){return String.
fromCharCode(parseInt(s.replace(ws.a,ws.c),4))},encode:function(s){return s.
replace(ws.a,ws.e)},decode:function(s){return s.replace(ws.b,ws.d)}};
// test string
var s1 = 'test0123456789AZaz€åäöÅÄÖ';
// show test string
alert(s1);
// encode test string
var code = ws.encode(s1);
// show encoded string
alert('"'+code+'"');
// decode string
var s2 = ws.decode(code);
// show decoded string
alert(s2);
// verify that the strings are completely identical
alert(s1 === s2);
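Since the snippet above is quite compressed, here is my reading of what it does, written out as a longer sketch: each UTF-16 code unit is turned into 8 base-4 digits, and the digits 0, 1, 2, 3 are written as space, tab, carriage return and newline. The function names are my own, and the decoder assumes its input is well-formed (a multiple of 8 characters, whitespace only).

```javascript
// Digit n (0..3) is written as SYMBOLS[n].
var SYMBOLS = [' ', '\t', '\r', '\n'];

function encodeToWhitespace(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    // Adding 65536 (4^8) forces a 9-digit base-4 number with a leading 1;
    // dropping that 1 leaves exactly 8 zero-padded digits.
    var digits = (65536 + text.charCodeAt(i)).toString(4).slice(1);
    for (var j = 0; j < digits.length; j++) {
      out += SYMBOLS[Number(digits[j])];
    }
  }
  return out;
}

function decodeFromWhitespace(ws) {
  var out = '';
  // Read 8 whitespace characters at a time and turn them back into digits.
  for (var i = 0; i + 8 <= ws.length; i += 8) {
    var digits = '';
    for (var j = 0; j < 8; j++) {
      digits += SYMBOLS.indexOf(ws[i + j]);
    }
    out += String.fromCharCode(parseInt(digits, 4));
  }
  return out;
}

var original = 'test0123456789AZaz€åäöÅÄÖ';
var encoded = encodeToWhitespace(original);
console.log(encoded.length === original.length * 8);     // true
console.log(decodeFromWhitespace(encoded) === original); // true
```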
