JavaScript split() function - ghost character at the end of each string

I am reading a file I created in Notepad on Windows (the basic text editor).
When creating the file, I typed the following (where [newline] indicates pressing Return):
app.exe[newline]background.jpg[newline]
and then saved it. I put this into a directory.
My Nodekit program read this file and then did the following:
var data = fs.readFileSync(filenameTemp, "utf8");
data.replace(/\r\n/g, "\n");
data.replace(/\r/g, "\n");
var strARR = data.split("\n");
strARR[0] has length 8, even though "app.exe" has length 7. When I inspect strARR[0][7] in Chrome, it shows "", i.e. what looks like a string with nothing in it.
Likewise, strARR[1] has length 15, even though "background.jpg" has length 14. Again, Chrome reports the extra character as "".
strARR[2] is length 0 as expected.
Where is this ghost character coming from? It's responsible for another error I am getting.

The replace method returns a new string - it does NOT modify the existing string. Lines two and three of your code aren't changing the value held in data. You need to assign the returned value back into your data variable, like so:
var data = fs.readFileSync(filenameTemp, "utf8");
data = data.replace(/\r\n/g, "\n");
data = data.replace(/\r/g, "\n");
var strARR = data.split("\n");
The 'ghost' character you're seeing is in fact the \r character, which you think you've removed but haven't!
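As a minimal sketch of the same fix, the two replace calls can also be folded into a single split on any line-ending style. The splitLines helper name is hypothetical and not part of the original code, and the sample string only simulates what Notepad would have written:

```javascript
// Hypothetical helper: split on Windows (\r\n), classic Mac (\r), or Unix (\n) line endings.
function splitLines(text) {
  // \r\n is listed first so a Windows line ending is consumed as one separator.
  return text.split(/\r\n|\r|\n/);
}

// Simulates the file contents described in the question.
const data = "app.exe\r\nbackground.jpg\r\n";

// Without reassignment, replace() has no effect on data.
data.replace(/\r\n/g, "\n");
const broken = data.split("\n");
console.log(broken[0].length); // 8, the trailing \r is still there

const fixed = splitLines(data);
console.log(fixed[0].length); // 7
console.log(fixed); // [ 'app.exe', 'background.jpg', '' ]
```

The final empty entry is still present because the file ends with a line terminator, which matches the length-0 strARR[2] reported in the question.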

Related

InputStream Encoding with Special Characters

Apologies, I'm not a JS developer and this is the first time I've worked with InputStream.
In the InputStream, I am processing one line of delimited text at a time that will always contain a character that is not UTF-8. My goal is to parse the InputStream to a string, split it by the delimiter, and read a certain value that is UTF-8 at an index.
The line will always be tab delimited, and will always contain the same number of delimiters. I might see something like this (two separate lines):
stuff morestuff 0.00 A ç F00012049333302129FF
stuff2 morestuff2 B è F00012205229521042CB
In my code, the value at the index position always seems to leave my variable undefined, and I'm assuming it's from the UTF-8 encoding in the toString method. My assumption is that the encoding is turning the non UTF-8 character into something that messes up the split function, but I'm not sure what or how. Here's some test code:
var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");
var flowFile = session.get();
var index = 5;
session.read(flowFile,
new InputStreamCallback(function(inputStream) {
// Convert the single line of the flowfile into a UTF_8 encoded string
var line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
// Split the delimited string into an array
var dataArray = line.split('\t');
// Capture the required value at the defined index position
var capturedValue = dataArray[index];
}));
if (typeof capturedValue === 'undefined') {
// log an error
}
else {
// do what it's supposed to do
}
I'm hoping someone could explain what exactly is happening, and help me find a solution that will allow me to look up the correct value at my predetermined index position.
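One thing visible in the snippet itself, independent of the encoding question: capturedValue is declared with var inside the callback function, so it is local to that function, and a typeof check outside it reports 'undefined'. The sketch below reproduces this in plain JavaScript without NiFi. readLine is a hypothetical stand-in for session.read, and the sample line is made up to resemble the data shown. It does not settle whether the encoding also plays a part.

```javascript
// Hypothetical stand-in for session.read: invokes the callback with one line of text.
function readLine(callback) {
  callback("stuff\tmorestuff\t0.00\tA\t\u00e7\tF00012049333302129FF");
}

var index = 5;

// Version 1: declared inside the callback, as in the question.
readLine(function (line) {
  var innerValue = line.split("\t")[index];
});
console.log(typeof innerValue); // "undefined": innerValue is local to the callback

// Version 2: declared outside the callback, assigned inside it.
var capturedValue;
readLine(function (line) {
  capturedValue = line.split("\t")[index];
});
console.log(capturedValue); // "F00012049333302129FF"
```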

Why do I get a blank space returned as last item in array from split

I want to be able to convert file contents from both unix and windows systems into an array.
So I use the regex /\r\n|\n/ to split on "\r\n" and also "\n". Code:
const contents1 = 'line1\r\nline2\r\n';
const contents2 = 'line1\nline2\n';
const lines1 = contents1.split(/\r\n|\n/);
const lines2 = contents2.split(/\r\n|\n/);
console.log(lines1);
console.log(lines2);
I am reading the regex to mean split on either "\r\n" or "\n".
But when I run this code I get two arrays like this:
["line1", "line2", ""]
["line1", "line2", ""]
Why the blank line as last member of the array?
Can I change the regex to fix this issue?
The empty string entry is fully expected because your sample strings represent files that end with an empty line. And that empty line is that empty string at the end.
So your regexp is correct. If you trim or remove the last value from the result, you basically change the file representation: you'd remove the last empty line. A converter should not do that.
The string ends with a line terminator, so split returns an empty string after it. To avoid this, you could match the runs of non-line-break characters instead of splitting:
const contents1 = 'line1\r\nline2\r\n';
const contents2 = 'line1\nline2\n';
const lines1 = contents1.match(/[^\r\n]+/g);
const lines2 = contents2.match(/[^\r\n]+/g);
console.log(lines1);
console.log(lines2);
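Note that the match-based version above also skips blank lines in the middle of the file and returns null when nothing matches. If only the final empty entry should go, one possible sketch is to split as before and drop a single trailing empty string. The toLines name is hypothetical:

```javascript
// Hypothetical helper: split on \r\n or \n, then drop only the single empty
// entry produced by a final line terminator. Interior blank lines are kept.
function toLines(contents) {
  const lines = contents.split(/\r\n|\n/);
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}

console.log(toLines("line1\r\nline2\r\n")); // [ 'line1', 'line2' ]
console.log(toLines("line1\n\nline3\n"));   // [ 'line1', '', 'line3' ]

// For comparison, the match-based version skips the interior blank line
// and returns null (not an empty array) for an empty string.
console.log("line1\n\nline3\n".match(/[^\r\n]+/g)); // [ 'line1', 'line3' ]
console.log("".match(/[^\r\n]+/g));                 // null
```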

Detecting new lines in markdown string

I am processing markdown files in my nodeJS application. So I have my markdown held as a string. I am trying to determine the difference between markdown like:
```Javascript
var code_block = something;
and
```
var code_block = something;
so I am approaching the issue like:
var language = markdown_string.substr(0, markdown_string.search("\n"));
console.log("Language: " + language);
So I am searching the string for the text between the ``` and the newline. However, the \n isn't being found, so the result is the rest of the file. If I search for a blank space instead, then I get the var included, so my string doesn't seem to have anything detectable between the end of the backticks or the language and the next line.
Is this correct? Can you see any way I can pick up the rest of the top line after the triple backticks but before the var on the next line?
I have found a workaround: splitting the string into chars, i.e. var chars = markdown.split("");, then looping through the next 15 values until chars[i] equals \n, and using that index to substring the necessary part of the string. It works, but is a bit messy ...
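A less manual approach, offered as a sketch only, is a regular expression that captures everything after the opening backticks up to the first line break of any kind. The cause of the failed search is not known from the question, so this makes no assumption about whether the file uses \n, \r\n, or \r. The fenceLanguage name is hypothetical:

```javascript
// Build the fence from single backticks so the sample strings stay on one line.
const FENCE = "`".repeat(3);

// Hypothetical helper: return the text between the opening fence and the end
// of the first line ("" when no language is given, null when there is no fence).
function fenceLanguage(markdown) {
  // [^\r\n`]* stops at either kind of line break, so \r\n, \r, and \n all work.
  const match = /^`{3}([^\r\n`]*)/.exec(markdown);
  return match === null ? null : match[1].trim();
}

console.log(fenceLanguage(FENCE + "Javascript\r\nvar code_block = something;")); // "Javascript"
console.log(fenceLanguage(FENCE + "\nvar code_block = something;"));             // ""
console.log(fenceLanguage(FENCE + "Javascript\rvar code_block = something;"));   // "Javascript"
console.log(fenceLanguage("var code_block = something;"));                        // null
```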

Remove new line characters from data received from node event process.stdin.on("data")

I've been looking for an answer to this, but whatever method I use it just doesn't seem to cut off the new line character at the end of my string.
Here is my code, I've attempted to use str.replace() to get rid of the new line characters as it seems to be the standard answer for this problem:
process.stdin.on("data", function(data) {
var str;
str = data.toString();
str.replace(/\r?\n|\r/g, " ");
return console.log("user typed: " + str + str + str);
});
I've repeated the str object three times in console output to test it. Here is my result:
hi
user typed: hi
hi
hi
As you can see, there are still new line characters being read between each str. I've tried a few other parameters in str.replace() but nothing seems to work in getting rid of the new line characters.
You are calling string.replace without assigning the output anywhere. The function does not modify the original string - it creates a new one - but you are not storing the returned value.
Try this:
...
str = str.replace(/\r?\n|\r/g, " ");
...
However, if you actually want to remove all whitespace from around the input (not just newline characters at the end), you should use trim:
...
str = str.trim();
...
trim is a built-in string method, so it needs no regular expression and will likely be at least as efficient.
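The difference between the two suggestions can be seen in a short sketch. The chunk below only simulates what process.stdin would deliver after typing hi and pressing Enter on a system that sends \n. On Windows the chunk would likely end in \r\n, which both the pattern and trim also cover.

```javascript
// Simulated stdin chunk (an assumption, standing in for the real "data" event payload).
const chunk = Buffer.from("hi\n");
const str = chunk.toString();

// Strings are immutable: the return value is discarded here, so str is unchanged.
str.replace(/\r?\n|\r/g, " ");
console.log(JSON.stringify(str)); // "hi\n"

// replace() with " " swaps the newline for a space rather than removing it.
const replaced = str.replace(/\r?\n|\r/g, " ");
console.log(JSON.stringify(replaced)); // "hi "

// trim() strips leading and trailing whitespace, including \r and \n.
const trimmed = str.trim();
console.log(JSON.stringify(trimmed)); // "hi"
```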
You were trying to console output the value of str without updating it.
You should have done this
str = str.replace(/\r?\n|\r/g, " ");
before console output.
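Another option, if the input is itself JSON, is to parse it.
JSON.parse(data) ignores whitespace around the value, including a trailing newline, and returns the parsed data. For arbitrary text such as hi it throws a SyntaxError, so this is not a general fix.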

Most efficient way to grab XML tag from file with JavaScript and Regex

I'm doing some more advanced automation on iOS devices and simulators for an enterprise application. The automation is written in browserless Javascript. One of the methods works on the device but not on the simulator, so I need to code a workaround. For the curious, it's UIATarget.localTarget().frontMostApp().preferencesValueForKey(key).
What we need to do is read a path to a server (which varies) from a plist file on disk. As a workaround on the simulator, I've used the following lines to locate the plist file containing the preferences:
// Get the alias of the user who's logged in
var result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/whoami", [], 5).stdout;
// Remove the extra newline at the end of the alias we got
result = result.replace('\n',"");
// Find the location of the plist containing the server info
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/find", ["/Users/"+result+"/Library/Application Support/iPhone Simulator", "-name", "redacted.plist"], 100);
// For some reason we need a delay here
UIATarget.localTarget().delay(.5);
// Results are returned in a single string separated by newline characters, so we can split it into an array
// This array contains all of the folders which have the plist file under the Simulator directory
var plistLocations = result.stdout.split("\n");
...
// For this example, let's just assume we want slot 0 here to save time
var plistBinaryLocation = plistLocations[0];
var plistXMLLocation = plistLocations[0] + ".xml";
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/plutil", ["-convert","xml1", plistBinaryLocation,"-o", plistXMLLocation], 100);
From here, I think the best way to get the contents is to cat or grep the file, since we can't read the file directly from disk. However, I'm having trouble getting the syntax down. Here's an edited snippet of the plist file I'm reading:
<key>server_url</key>
<string>http://pathToServer</string>
There are a bunch of key/string pairs in the file, where the server_url key is unique. Ideally I'd do something like a lookbehind, but because JavaScript doesn't appear to support it, I figured I'd just get the pair from the file and whittle it down a bit later.
I can search for the key with this:
// This line works
var expression = new RegExp(escapeRegExp("<key>server_url</key>"));
if(result.stdout.match(expression))
{
UIALogger.logMessage("FOUND IT!!!");
}
else
{
UIALogger.logMessage("NOPE :(");
}
Where the escapeRegExp method looks like this:
function escapeRegExp(str)
{
var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
UIALogger.logMessage("NEW STRING: " + result);
return result;
}
Also, this line returns a value (but gets the wrong line):
var expression = new RegExp(escapeRegExp("<string>(.*?)</string>"));
However, when you put the two together, it (the Regex syntax) works on the terminal but doesn't work in code:
var expression = new RegExp(escapeRegExp("<key>server_url</key>[\s]*<string>(.*?)</string>"));
What am I missing? I also tried grep and egrep without any luck.
There are two problems affecting you here getting the regex to work in your JavaScript code.
First, you are escaping the whole regex expression string, which means that your capturing (.*?) and your whitespace ignoring [\s]* will also be escaped and won't be evaluated the way you're expecting. You need to escape the XML parts and add in the regex parts without escaping them.
Second, the whitespace-ignoring part, [\s]*, is falling prey to JavaScript's normal string escaping rules: the "\s" is turning into "s" in the output. You need to escape that backslash by writing "\\s" so that it stays as "\s" in the string that you pass to construct the regular expression.
I've built a working script that I've verified in the UI Automation engine itself. It should extract and print out the expected URL:
var testString = "" +
"<plistExample>\n" +
" <key>dont-find-me</key>\n" +
" <string>bad value</string>\n" +
" <key>server_url</key>\n" +
" <string>http://server_url</string>\n" +
"</plistExample>";
function escapeRegExp(str)
{
var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
UIALogger.logMessage("NEW STRING: " + result);
return result;
}
var strExp = escapeRegExp("<key>server_url</key>") + "[\\s]*" + escapeRegExp("<string>") + "(.*)" + escapeRegExp("</string>");
UIALogger.logMessage("Expression escaping only the xml parts:" + strExp);
var exp = new RegExp(strExp);
var match = testString.match(exp);
UIALogger.logMessage("Match: " + match[1]);
I should point out, though, that the XML parts contain none of the characters your escapeRegExp() function escapes, so you don't need it. The forward slashes in the closing tags only need escaping in a regex literal; in a string passed to new RegExp the \/ is harmless but optional. You can write the expression you want like this:
var exp = new RegExp("<key>server_url<\/key>[\\s]*<string>(.*)<\/string>");
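The string-escaping point can be checked in isolation with plain JavaScript. This is a sketch only: it was not run in the UI Automation engine, and the plist fragment is a made-up sample modeled on the one in the question.

```javascript
// In a plain string literal, "\s" is not a recognized escape, so the backslash is dropped.
console.log("[\s]*" === "[s]*"); // true
console.log("[\\s]*".length);    // 5: the characters [ \ s ] *

// Made-up sample data resembling the plist snippet in the question.
const plist =
  "<key>dont-find-me</key>\n" +
  "  <string>bad value</string>\n" +
  "<key>server_url</key>\n" +
  "  <string>http://pathToServer</string>\n";

// A regex literal avoids the string-escaping layer entirely; only "/" needs escaping.
const literal = /<key>server_url<\/key>\s*<string>(.*?)<\/string>/;

// The equivalent constructor form needs the doubled backslash for \s.
const constructed = new RegExp("<key>server_url</key>\\s*<string>(.*?)</string>");

console.log(plist.match(literal)[1]);     // http://pathToServer
console.log(plist.match(constructed)[1]); // http://pathToServer
```

Where the pattern is fixed and known in advance, the literal form is usually easier to read because there is only one level of escaping to reason about.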
