I'm looking for a way to convert a string into whitespace (spaces, newlines and tabs), and the other way around.
I found a Python script, but I have no idea how to do it using Javascript.
I need it for a white-hacking contest.
I can has banana? ;)
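var ws = {
  x: '0123',       // base-4 digits
  y: ' \t\r\n',    // whitespace character used for each digit
  a: /[\w\W]/g,    // matches any single character
  b: /[\w\W]{8}/g, // matches one encoded character (8 whitespace characters)
  // swap a digit for its whitespace character, or vice versa
  c: function (z) { return (ws.y + ws.x)[(ws.x + ws.y).indexOf(z)]; },
  // encode one character as 8 base-4 digits, then map the digits to whitespace
  e: function (s) { return (65536 + s.charCodeAt(0)).toString(4).substr(1).replace(ws.a, ws.c); },
  // map 8 whitespace characters back to digits and parse them as one character
  d: function (s) { return String.fromCharCode(parseInt(s.replace(ws.a, ws.c), 4)); },
  encode: function (s) { return s.replace(ws.a, ws.e); },
  decode: function (s) { return s.replace(ws.b, ws.d); }
};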
// test string
var s1 = 'test0123456789AZaz€åäöÅÄÖ';
// show test string
alert(s1);
// encode test string
var code = ws.encode(s1);
// show encoded string
alert('"'+code+'"');
// decode string
var s2 = ws.decode(code);
// show decoded string
alert(s2);
// verify that the strings are completely identical
alert(s1 === s2);
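To see what a single step of the encoder does, here is an un-minified sketch of the per-character logic. The names DIGIT_TO_WS, encodeChar and decodeChar are illustrative only and are not part of the ws object; the digit-to-whitespace mapping is the same one ws.x and ws.y define.

```javascript
// Same digit -> whitespace mapping as ws.x / ws.y: 0=" ", 1="\t", 2="\r", 3="\n"
const DIGIT_TO_WS = [" ", "\t", "\r", "\n"];

// Encode one UTF-16 code unit as exactly 8 whitespace characters.
function encodeChar(ch) {
  // 65536 is 4^8, so adding it forces a 9-digit base-4 string; slicing off
  // the leading "1" leaves the code unit zero-padded to 8 digits.
  const digits = (65536 + ch.charCodeAt(0)).toString(4).slice(1);
  return [...digits].map((d) => DIGIT_TO_WS[Number(d)]).join("");
}

// Decode 8 whitespace characters back into one code unit.
function decodeChar(chunk) {
  const digits = [...chunk].map((w) => DIGIT_TO_WS.indexOf(w)).join("");
  return String.fromCharCode(parseInt(digits, 4));
}

const a = encodeChar("A");      // "A" is 65, i.e. 00001001 in base 4
console.log(JSON.stringify(a)); // "    \t  \t"
console.log(decodeChar(a));     // A
```

Because every code unit always takes exactly 8 whitespace characters, decoding never needs a separator; that is why ws.b can simply cut the input into 8-character chunks.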
Apologies, I'm not a JS developer and this is the first time I've worked with InputStream.
In the InputStream, I am processing one line of delimited text at a time that will always contain a character that is not UTF-8. My goal is to parse the InputStream to a string, split it by the delimiter, and read a certain value that is UTF-8 at an index.
The line will always be tab delimited, and will always contain the same number of delimiters. I might see something like this (two separate lines):
stuff morestuff 0.00 A ç F00012049333302129FF
stuff2 morestuff2 B è F00012205229521042CB
In my code, the value at the index position always seems to leave my variable undefined, and I'm assuming it's from the UTF-8 encoding in the toString method. My assumption is that the encoding is turning the non UTF-8 character into something that messes up the split function, but I'm not sure what or how. Here's some test code:
var InputStreamCallback = Java.type("org.apache.nifi.processor.io.InputStreamCallback");
var IOUtils = Java.type("org.apache.commons.io.IOUtils");
var StandardCharsets = Java.type("java.nio.charset.StandardCharsets");
var flowFile = session.get();
var index = 5;
session.read(flowFile,
  new InputStreamCallback(function(inputStream) {
    // Convert the single line of the flowfile into a UTF_8 encoded string
    var line = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    // Split the delimited string into an array
    var dataArray = line.split('\t');
    // Capture the required value at the defined index position
    var capturedValue = dataArray[index];
  }));
if (typeof capturedValue === 'undefined') {
  // log an error
}
else {
  // do what it's supposed to do
}
I'm hoping someone could explain what exactly is happening, and help me find a solution that will allow me to look up the correct value at my predetermined index position.
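One thing worth ruling out before blaming the encoding: in the snippet above, `var capturedValue` is declared inside the callback function, so it is local to that function, and the `typeof capturedValue` check after `session.read` never sees it. The sketch below is plain JavaScript rather than NiFi code. `readWith` is a hypothetical stand-in for `session.read` that calls its callback synchronously (the snippet's own check already assumes the callback has run by then), and the sample line is the first example line joined with tabs.

```javascript
// Hypothetical stand-in for session.read(): it just invokes the callback
// synchronously with one tab-delimited line.
function readWith(callback) {
  callback("stuff\tmorestuff\t0.00\tA\t\u00e7\tF00012049333302129FF");
}

var index = 5;

// Problem: `var` inside the callback is local to that callback function.
function brokenLookup() {
  readWith(function (line) {
    var capturedValue = line.split("\t")[index];
  });
  return typeof capturedValue; // this scope never declared capturedValue
}

// Fix: declare the variable in the enclosing scope, assign it in the callback.
function fixedLookup() {
  var capturedValue;
  readWith(function (line) {
    capturedValue = line.split("\t")[index];
  });
  return capturedValue;
}

console.log(brokenLookup()); // undefined
console.log(fixedLookup());  // F00012049333302129FF
```

The same change (declaring `capturedValue` before calling `session.read` and only assigning it inside the callback) should carry over to the NiFi script, assuming nothing else in the flow is at fault.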
My issue is the following:
I have a field with a file path: "\\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3"
I want to create an array with this information
function myFunction() {
  var str = "\\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3";
  var array = str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "_");
  document.getElementById("demo").innerHTML = array;
}
The problem is that \51 is an octal escape sequence for a right parenthesis. So the result is
"_random_ad_test_stuff_comfolder_level 1__. level 2_level 3".
How can I escape the \51, and also insert a _ after .com?
You can't escape the string after the fact. In a string literal, as you said, \51 is ), exactly as though you'd typed ) in the string literal; there is no difference in the resulting string:
console.log("\51" === ")"); // true
You have to escape the characters in the literal:
var str = "\\\\random.ad.test.stuff.com\\folder\\level 1\\51. level 2\\level 3";
// --------^-^-------------------------^-------^--------^------------^
console.log(str);
Note that this is just because you're using a string literal. If you read that string from somewhere, there's no need to escape it at all. Escaping (in this sense) is a string literal thing, not a string thing.
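If the path has to live in source code, another option (assuming an ES2018+ engine, where tagged templates may contain sequences such as \51) is String.raw, which returns the template's raw text with every backslash kept as typed. A minimal sketch, which also splits the path into the array the question was after:

```javascript
// String.raw keeps the template's raw text, so "\51" stays a backslash
// followed by "51" instead of becoming ")", and "\f" stays "\f" instead of
// becoming a form feed. (A raw template cannot end in a single backslash.)
const path = String.raw`\\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3`;
console.log(path); // \\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3

// With the backslashes intact, split on runs of them and drop the empty
// string produced by the leading "\\".
const parts = path.split(/\\+/).filter(Boolean);
console.log(parts.join(" | ")); // random.ad.test.stuff.com | folder | level 1 | 51. level 2 | level 3
```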
You've said this comes from an XML file, and asked what you have to do to the file to avoid this problem. The answer is: nothing. Read in the XML file, and when you get those filenames from it, you'll get strings with the correct characters; escaping is for string literals, and XML isn't a string literal.
Example:
// "Read" the file
var xmlText = document.querySelector("#xml").textContent;
// Parse it
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(xmlText, "application/xml");
// Use its contents; the information you'll get will be valid strings,
// no escaping needed
var entries = oDOM.querySelectorAll("entry");
console.log(entries[0].getAttribute("attr"));
console.log(entries[1].firstChild.nodeValue);
<script id="xml" type="text/xml"><root>
<entry attr="\\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3" />
<entry>\\random.ad.test.stuff.com\folder\level 1\51. level 2\level 3</entry>
</root></script>
In that example, I've shown taking the string from an attribute, or from the body of an element, the two usual ways you put information in XML.
Encoding my URL works perfectly with base-64 encoding. So does decoding but not with the string literal variable.
This works:
document.write(atob("hi"));
This does not:
var tempvar = "hello";
document.write(atob(tempvar));
What am I doing wrong? Nothing is displayed. But if I quote "tempvar", then it of course works but is not the same thing since "tempvar" is a string, not a variable.
Your Question
What am I doing wrong?
The string being passed to atob() is a string literal of length 5 (and not technically a base-64 encoded string). The browser console should reveal an exception in the error log (see explanation in The cause below).
The cause
Per the MDN documentation of atob():
Throws
Throws a DOMException if the length of passed-in string is not a multiple of 4. 1
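The length of the string literal "hello" (i.e. 5) is not a multiple of 4, so the exception is thrown instead of the decoded string being returned. (More precisely, current browsers follow the "forgiving-base64" rules and only reject a length that leaves a remainder of 1 when divided by 4, which is why "hi", with length 2, does not throw.)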
A Solution
One solution is to use a string that has actually been encoded (e.g. with btoa()), or at least one with a valid length, such as its first four characters (e.g. using String.prototype.substring()). See the snippet below for an example.
var tempvar = "hello";
window.addEventListener("DOMContentLoaded", function(readyEvent) {
  var container = document.getElementById("container");
  // encode the string
  var encoded = btoa(tempvar);
  container.innerHTML = encoded;
  var container2 = document.getElementById("container2");
  // decode the encoded string
  container2.innerHTML = atob(encoded);
  var container3 = document.getElementById("container3");
  // decode the first 4 characters of the string
  container3.innerHTML = atob(tempvar.substring(0, 4));
});
<div> btoa(tempvar): <span id="container"></span></div>
<div> atob(encoded): <span id="container2"></span></div>
<div> atob(tempvar.substring(0, 4)): <span id="container3"></span></div>
1 https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/atob
It's because atob() can't decode the string "hello". Try an actual string that can be decoded from base64; here is an example:
var tempvar = "aHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tL3F1ZXN0aW9ucy80MzEyOTEzNi9kZWNvZGluZy1ub3Qtd29ya2luZy13aXRoLWJhc2U2NA==";
document.write(atob(tempvar));
If you want to encode, use the btoa function instead:
var tempvar = "hello";
document.write(btoa(tempvar));
You can use this website to test decoding and encoding base64, https://www.base64encode.org/
It's because you are trying to decode a string that is not base64 encoded.
That it works on "hi" seems to be just a coincidence.
atob = decode
btoa = encode
You're using the wrong function. You should use btoa() to encode.
When you do atob('hi'), you're actually decoding 'hi', which happens to be valid base-64.
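As the answers above point out, atob() only accepts text that could be base-64. A minimal sketch of a defensive wrapper follows. The name safeAtob is hypothetical, and its pre-checks are a hand-written mirror of the browser's "forgiving-base64" rules: strip ASCII whitespace, drop up to two trailing "=" when the length is a multiple of 4, then reject a remaining length with remainder 1 or any character outside A-Z, a-z, 0-9, + and /. atob and btoa are globals in browsers and in Node.js 16 and later.

```javascript
// Decode base-64 text, returning null instead of throwing on input that
// cannot be base-64.
function safeAtob(text) {
  // Strip ASCII whitespace (tab, LF, FF, CR, space)
  let s = text.replace(/[\t\n\f\r ]/g, "");
  // Padding is only meaningful when the length is a multiple of 4
  if (s.length % 4 === 0) s = s.replace(/={1,2}$/, "");
  // Remainder 1 can never be valid; neither can characters outside the alphabet
  if (s.length % 4 === 1 || /[^A-Za-z0-9+/]/.test(s)) return null;
  try {
    return atob(s);
  } catch (err) {
    return null; // defensive: any other error atob may raise
  }
}

const encoded = btoa("hello");
console.log(encoded);                      // aGVsbG8=
console.log(safeAtob(encoded));            // hello
console.log(safeAtob("hello"));            // null (length 5 leaves remainder 1)
console.log(safeAtob("hi").charCodeAt(0)); // 134 (0x86): valid base-64, but not readable text
```

The last line illustrates why atob("hi") "works": "hi" is a legal base-64 string, it just does not decode to anything meaningful.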
How do I convert the below string:
var string = "Bouchard+P%E8re+et+Fils"
using javascript into UTF-8, so that %E8 would become %C3%A8?
The reason is that this character seems to be tripping up decodeURIComponent.
You can test it out by dropping the string into http://meyerweb.com/eric/tools/dencoder/ and seeing the console error that says "Uncaught URIError: URI malformed".
Specifically, I'm looking for something that can convert an entire HTML document that claims to be windows-1252 encoded (which is where I assume this %E8 character is coming from) into UTF-8.
Thanks!
First create a map of Windows-1252. You can find references to the encoding using your search engine of choice.
For the sake of this example, I'm going to include only the character in your sample data.
Then find all the percentage signs followed by two hexadecimal characters, convert them to numbers, and convert them using the map (to get raw data), then convert them again using encodeURIComponent (to get the encoded data).
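var string = "Bouchard+P%E8re+et+Fils";
// A full Windows-1252 map would list every byte; for this example only the
// character from the sample data is included.
var w1252chars = [];
w1252chars[232] = "è";
var percent_encoded = /(%[a-fA-F0-9]{2})/g;
function filter(match, group) {
  var number = parseInt(group.substr(1), 16);
  // Bytes without a map entry fall back to the same code point, which is
  // correct for Windows-1252 everywhere except the range 0x80-0x9F
  var character = w1252chars[number] !== undefined ? w1252chars[number] : String.fromCharCode(number);
  return encodeURIComponent(character);
}
string = string.replace(percent_encoded, filter);
alert(string); // Bouchard+P%C3%A8re+et+Fils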
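Since the goal is ultimately to convert a whole windows-1252 document, the hand-built map can be replaced by the standard TextDecoder API, which supports the "windows-1252" label in browsers and in Node.js builds with full ICU (the default). The sketch below (the function name cp1252PercentToUtf8 is hypothetical) decodes each run of %XX escapes as Windows-1252 bytes and re-encodes the result as UTF-8 with encodeURIComponent. For an actual document, its raw bytes could be passed to the same decoder directly instead of going through percent escapes.

```javascript
// Convert %XX escapes that hold Windows-1252 bytes into UTF-8 %XX escapes.
function cp1252PercentToUtf8(input) {
  const decoder = new TextDecoder("windows-1252");
  // Each match is a run of consecutive %XX escapes, decoded together as bytes
  return input.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    const bytes = new Uint8Array(run.length / 3);
    for (let i = 0; i < bytes.length; i++) {
      // Skip the "%" and read the two hex digits of escape number i
      bytes[i] = parseInt(run.slice(i * 3 + 1, i * 3 + 3), 16);
    }
    // Windows-1252 bytes -> JavaScript string -> UTF-8 percent escapes
    return encodeURIComponent(decoder.decode(bytes));
  });
}

const fixed = cp1252PercentToUtf8("Bouchard+P%E8re+et+Fils");
console.log(fixed);                     // Bouchard+P%C3%A8re+et+Fils
console.log(decodeURIComponent(fixed)); // Bouchard+Père+et+Fils
```

Because encodeURIComponent re-escapes whatever it receives, an escape such as %2F comes back unchanged, while one for an unreserved character such as %41 comes back as plain text (A); both forms decode to the same string.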
I have a string in JS in this format:
http\x3a\x2f\x2fwww.url.com
How can I get the decoded string out of this? I tried unescape() and string.decode, but neither decodes this. If I display that encoded string in the browser it looks fine (http://www.url.com), but I want to manipulate this string before displaying it.
Thanks.
You could write your own replacement method:
String.prototype.decodeEscapeSequence = function() {
  // Replace each "\xNN" (literal backslash, x, two hex digits) with its character
  return this.replace(/\\x([0-9A-Fa-f]{2})/g, function(match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
};
"http\\x3a\\x2f\\x2fwww.example.com".decodeEscapeSequence()
There is nothing to decode here. \xNN is an escape sequence in JavaScript that denotes the character with hexadecimal code NN. An escape sequence is simply a way of writing a character in a string literal: when the literal is parsed, it is already "decoded", which is why it displays fine in the browser.
When you do:
var str = 'http\x3a\x2f\x2fwww.url.com';
it is internally stored as http://www.url.com. You can manipulate this directly.
If you already have:
var encodedString = "http\x3a\x2f\x2fwww.url.com";
Then decoding the string manually is unnecessary. The JavaScript parser has already decoded the escape sequences for you, and in fact double-unescaping can cause your script to not work properly with some strings. If, in contrast, you have:
var encodedString = "http\\x3a\\x2f\\x2fwww.url.com";
Those backslashes are themselves escaped, so the string contains literal backslashes and the hex escape sequences remain undecoded; keep reading.
The easiest way in that case is to use the eval function, which runs its argument as JavaScript code and returns the result:
var decodedString = eval('"' + encodedString + '"');
This works because \x3a is a valid JavaScript string escape code. However, don't do it this way if the string does not come from your server; if so, you would be creating a new security weakness because eval can be used to execute arbitrary JavaScript code.
A better (but less concise) approach would be to use JavaScript's string replace method to create valid JSON, then use the browser's JSON parser to decode the resulting string:
var decodedString = JSON.parse('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
// or using jQuery
var decodedString = $.parseJSON('"' + encodedString.replace(/([^\\]|^)\\x/g, '$1\\u00') + '"');
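The JSON.parse approach above can be wrapped in a small function and checked against both kinds of literal. This is a sketch: the name decodeHexEscapes is hypothetical, and it assumes the input contains no double quotes, control characters, or other backslash sequences that would make the quoted text invalid JSON.

```javascript
// A literal whose \x escapes are resolved by the parser: nothing to decode.
const alreadyDecoded = "http\x3a\x2f\x2fwww.url.com";

// A literal containing real backslashes, e.g. text received from a server.
const stillEncoded = "http\\x3a\\x2f\\x2fwww.url.com";

// Decode \xNN sequences without eval by rewriting them as JSON \u00NN escapes
// and letting JSON.parse interpret them.
function decodeHexEscapes(encoded) {
  return JSON.parse('"' + encoded.replace(/([^\\]|^)\\x/g, "$1\\u00") + '"');
}

console.log(alreadyDecoded);                 // http://www.url.com
console.log(stillEncoded);                   // http\x3a\x2f\x2fwww.url.com
console.log(decodeHexEscapes(stillEncoded)); // http://www.url.com
```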
You don't need to decode it. You can manipulate it safely as it is:
var str = "http\x3a\x2f\x2fwww.url.com";
alert(str.charAt(4)); // :
alert("\x3a" === ":"); // true
alert(str.slice(0,7)); // http://
Maybe this helps: http://cass-hacks.com/articles/code/js_url_encode_decode/
function URLDecode(encodedString) {
  // Replace every "%XX" escape (XX = two hex digits) with the character whose
  // code is XX. A single global pass means a decoded "%" is never re-read
  // as the start of another escape.
  return encodedString.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}
2019
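You can use decodeURI or decodeURIComponent rather than unescape for percent-encoded (%XX) text. Note that in the example below the \x escapes have already been resolved by the string literal itself, so decodeURI returns the string unchanged.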
console.log(
decodeURI('http\x3a\x2f\x2fwww.url.com')
)
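To see which input these functions actually apply to, compare a percent-encoded URL with text that still contains literal \x sequences. decodeURIComponent handles the former; decodeURI deliberately leaves escapes for reserved characters such as : and / in place; and neither touches \xNN text, which is not URI syntax. String.raw is used here only to build a string that keeps its backslashes.

```javascript
// decodeURIComponent decodes every %XX escape (interpreted as UTF-8)
const percentEncoded = "http%3A%2F%2Fwww.url.com";
console.log(decodeURIComponent(percentEncoded)); // http://www.url.com

// decodeURI keeps escapes for reserved characters such as ":" and "/"
console.log(decodeURI(percentEncoded)); // http%3A%2F%2Fwww.url.com

// Text that still contains literal "\x3a"-style sequences has no % escapes,
// so it passes through unchanged.
const hexEscaped = String.raw`http\x3a\x2f\x2fwww.url.com`;
console.log(decodeURIComponent(hexEscaped) === hexEscaped); // true
```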