Unexpected Regular Expression Output With Javascript's Replace Function - javascript

I am creating a JavaScript function that takes xxd (bash) output and splits it into 3 pieces: the left piece (with offsets) called offsets, a middle piece called rawVals, and the right piece called sidebar.
I have tested the regular expression on plenty of other xxd outputs and they all work. However, when the following output is given, the regular expression seems to add unexpected characters:
Output (some lines reordered for the purpose of explanation):
00054280: c003 000a b042 0000 8b95 0e00 0000 0000 .....B..........
00054290: 0000 0000 cf11 0000 8b95 0e00 0000 0000 ................
00054260: 8502 a861 2000 0000 3c40 8c49 fd5d 8646 ...a ...<#.I.].F
00054270: 3c40 8c49 ac00 8484 8400 0000 0000 0000 <#.I............
Normal behaviour results in:
Group 1
Group 2
Group 3
00054280:
c003 000a b042 0000 8b95 0e00 0000 0000
.....B..........
00054290:
0000 0000 cf11 0000 8b95 0e00 0000 0000
................
Unexpected behaviour results in:
Group 1
Group 2
Group 3
00054260:].F
8502 a861 2000 0000 3c40 8c49 fd5d 8646 ].F
...a ...<#.I.].F
00054270:...
03c40 8c49 ac00 8484 8400 0000 0000 0000 ...
<#.I............
For some reason ].F & ... are added after the offset and rawVals content.
Javascript code:
function getSplitXxdOutput(content) {
    var splitRegex = /([a-fA-F0-9]*:)(?: )((?:[a-fA-F0-9]{4} ){8})(?: )(.{16})/g;
    var offsets = content.replace(new RegExp(splitRegex), '$1');
    var rawVals = content.replace(new RegExp(splitRegex), '$2');
    var sidebar = content.replace(new RegExp(splitRegex), '$3');
    return [offsets.split("\n"), rawVals, sidebar.split("\n")];
}
Code as snippet:
var input = `00054260: 8502 a861 2000 0000 3c40 8c49 fd5d 8646 ...a ....#.I.].F
00054270: 3c40 8c49 ac00 8484 8400 0000 0000 0000 <#.I............
00054280: c003 000a b042 0000 8b95 0e00 0000 0000 .....B..>.......
00054290: 0000 0000 cf11 0000 8b95 0e00 0000 0000 ................
000542a0: c100 6300 6f00 6c00 6f00 7200 7300 2e00 ..c.o.l.o.r.s...
000542b0: 6a00 7000 6700 0000 0000 0000 0000 0000 j.p.g...........
000542c0: 0502 f4e0 2000 0000 6040 8c49 7967 3e49 .... ...#.Iyg>I
000542d0: 6040 8c49 4000 8484 8400 0000 0000 0000 #.I#...........`;
function getSplitXxdOutput(content) {
    var splitRegex = /([a-fA-F0-9]*:)(?: )((?:[a-fA-F0-9]{4} ){8})(?: )(.{16})/g;
    var offsets = content.replace(new RegExp(splitRegex), '$1');
    var rawVals = content.replace(new RegExp(splitRegex), '$2');
    var sidebar = content.replace(new RegExp(splitRegex), '$3');
    return [offsets.split("\n"), rawVals, sidebar.split("\n")];
}
var output = getSplitXxdOutput(input);
console.log(output);
The content variable is a string that contains the xxd output. The global regular expression is stored in splitRegex and splits each line into 3 capturing groups:
the offset: any number of hex digits (a-f, A-F, 0-9) followed by a ":"
the raw values: four hex digits followed by a space, repeated 8 times
the sidebar: the next 16 characters (can be any character)
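The three groups described above can also be collected directly from the match objects rather than through three separate replace() passes. The following is only a sketch under stated assumptions: the helper name splitXxdLines and the sample lines are hypothetical, and the sample uses the two spaces that xxd normally prints before the ASCII column.

```javascript
// Hypothetical helper (not from the question): read the capture groups
// line by line with matchAll instead of rewriting the string with replace().
function splitXxdLines(content) {
  // offset + ":" | eight 4-hex-digit words | 16-character ASCII column
  const lineRegex = /^([0-9a-fA-F]+:) ((?:[0-9a-fA-F]{4} ){8}) (.{16})$/gm;
  const offsets = [];
  const rawVals = [];
  const sidebar = [];
  for (const match of content.matchAll(lineRegex)) {
    offsets.push(match[1]);
    rawVals.push(match[2].trimEnd()); // drop the trailing separator space
    sidebar.push(match[3]);
  }
  return { offsets, rawVals, sidebar };
}

// Assumed sample input in plain xxd layout.
const sample = [
  "00054260: 8502 a861 2000 0000 3c40 8c49 fd5d 8646  ...a ...<@.I.].F",
  "00054280: c003 000a b042 0000 8b95 0e00 0000 0000  .....B..........",
].join("\n");

const parts = splitXxdLines(sample);
console.log(parts.offsets);
console.log(parts.rawVals);
console.log(parts.sidebar);
```

Because the anchored pattern must match a whole line, a line that does not fit the expected layout is skipped instead of leaving stray characters in the result.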
Some ideas and what I have tried to solve the problem:
I have checked the regular expression on regexr.com, where it behaves as expected, even with the aforementioned output. This makes me think the problem is not related to the regular expression itself.
I have the feeling that the unexpected behaviour is caused by the "<" character. When it is removed manually, the problem seems to be solved.
I have tried escaping some of the special characters, but that doesn't seem to solve the problem. Since I am not certain that the method I used is correct, I didn't include it in the question.
I could replace the character, but then the output is altered, which is not the solution I want.
Is there a way to write "<" without escaping it, so that it is not interpreted and does not cause the issue?
Output of console.log that is retrieved from the element:
```html
00054260: 8502 a861 2000 0000 3c40 8c49 fd5d 8646 ...a ....#.I.].F
00054270: 3c40 8c49 ac00 8484 8400 0000 0000 0000 <#.I............
00054280: c003 000a b042 0000 8b95 0e00 0000 0000 .....B..>.......
00054290: 0000 0000 cf11 0000 8b95 0e00 0000 0000 ................
000542a0: c100 6300 6f00 6c00 6f00 7200 7300 2e00 ..c.o.l.o.r.s...
000542b0: 6a00 7000 6700 0000 0000 0000 0000 0000 j.p.g...........
000542c0: 0502 f4e0 2000 0000 6040 8c49 7967 3e49 .... ...#.Iyg>I 000542d0: 6040 8c49 4000 8484 8400 0000 0000 0000 #.I#...........
```
It is retrieved with:
var selectedHtml = document.getElementById(id).querySelectorAll("code");
var content = selectedHtml[1].querySelectorAll(".xxd-content");
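One possible explanation, which the question does not confirm, is that the string is read from the element as markup (for example via innerHTML). In serialized HTML a literal "<" appears as the four-character entity "&lt;", so a fixed-width .{16} group would stop three characters early and leave a three-character tail behind. The sketch below reproduces that effect on plain strings; the sample line and the variable names are assumptions made for illustration.

```javascript
// Same regular expression as in the question.
const splitRegex = /([a-fA-F0-9]*:)(?: )((?:[a-fA-F0-9]{4} ){8})(?: )(.{16})/g;

// Assumed sample line in plain xxd layout (two spaces before the sidebar).
const asText = "00054270: 3c40 8c49 ac00 8484 8400 0000 0000 0000  <@.I............";

// What the same line would look like if "<" were serialized as an entity.
const asMarkup = asText.replace("<", "&lt;");

const offsetFromText = asText.replace(splitRegex, "$1");
const offsetFromMarkup = asMarkup.replace(splitRegex, "$1");

console.log(offsetFromText);   // 00054270:
console.log(offsetFromMarkup); // 00054270:...
```

If that is indeed what happens on the page, reading the element's textContent instead of its markup would hand the regular expression the decoded characters, since textContent returns text rather than serialized HTML.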

Related

How to generate UniqueID by combining characters and numbers, and apply padding in ECMAScript

I want to generate a UniqueID for objects by building the character part of the UniqueID from ASCII values, without declaring any arrays. The UniqueID should start at AA01 and continue through AA99, then AB01 through AB99, then AC01 through AC99, AD01 -> AD99, AE01 -> AE99, and so on. I also need to apply padding, so the UniqueID always has 4 characters, like "AC08" instead of "AC8".
Below is a snippet of what I have done.
function genUID(a, b) {
    var res = "";
    var res2 = "";
    var res3;
    // >= (not the arrow token =>) checks that the code is an uppercase letter
    if (a >= 65 && a <= 90) {
        res = String.fromCharCode(a);
        if (b >= 65 && b <= 90) {
            res2 = String.fromCharCode(b);
            b++;
            for (var c = 1; c < 150; c++) {
                if (c < 100) {
                    res3 = c;
                } else {
                    res3 = c - 99;
                }
                console.log(res + "" + res2 + "" + res3);
            }
            a++;
        }
    }
}
Are you not making this way more complicated than it needs to be? Just increase a normal number, format it to four digits length by padding zeroes on the left - and then just “translate” the first two numeric digits to their character “equivalent”, by adding the difference between the character codes for A and 0 ...
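for (var i = 1; i < 3000; ++i) {
    // left-pad with zeroes and keep the last four characters
    var padNum = ("000" + i).slice(-4),
        uniqID =
            // '0' is char code 48 and 'A' is 65, hence the offset of 17
            String.fromCharCode(padNum.charCodeAt(0) + 17) +
            String.fromCharCode(padNum.charCodeAt(1) + 17) +
            padNum[2] +
            padNum[3];
    console.log(padNum, uniqID);
}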
Result: (Snippet console here does not show the full result, but only the last few lines)
0001 AA01
0002 AA02
0003 AA03
0004 AA04
0005 AA05
0006 AA06
0007 AA07
0008 AA08
0009 AA09
0010 AA10
0011 AA11
...
0099 AA99
0100 AB00
0101 AB01
0102 AB02
...
0199 AB99
0200 AC00
0201 AC01
0202 AC02
...
0998 AJ98
0999 AJ99
1000 BA00
1001 BA01
1002 BA02
...
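The listing above includes IDs ending in 00 (AB00, AC00), while the question asks for each prefix to run from 01 to 99. A variant that skips the 00 values can be sketched as follows; uidFromIndex is a hypothetical helper, and it assumes a zero-based counter and only the prefixes AA through ZZ.

```javascript
// Hypothetical helper: maps a zero-based counter to AA01..AA99, AB01..AB99, ...
// Valid for 0 <= n < 26 * 26 * 99.
function uidFromIndex(n) {
  const block = Math.floor(n / 99); // index of the two-letter prefix
  const num = (n % 99) + 1;         // numeric part, always 1..99
  const first = String.fromCharCode(65 + Math.floor(block / 26));
  const second = String.fromCharCode(65 + (block % 26));
  // padStart keeps the numeric part two characters wide ("8" -> "08")
  return first + second + String(num).padStart(2, "0");
}

console.log(uidFromIndex(0));    // AA01
console.log(uidFromIndex(98));   // AA99
console.log(uidFromIndex(99));   // AB01
console.log(uidFromIndex(2574)); // BA01
```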

How setting endianness works for DataView

I'm reading this explanation of DataView and there's an example there:
var littleEndian = (function() {
var buffer = new ArrayBuffer(2);
new DataView(buffer).setInt16(0, 256, true /* littleEndian */);
// Int16Array uses the platform's endianness.
return new Int16Array(buffer)[0] === 256;
})();
I don't really understand what this line does:
new DataView(buffer).setInt16(0, 256, true /* littleEndian */);
Does it mean that the data stored in the range [0;256] bits should be stored in littleEndian?
Suppose we create an array buffer and a DataView over it like this:
var dv = new DataView(new ArrayBuffer(4));
It means that we've got 32 bits in memory:
0000 0000 0000 0000 0000 0000 0000 0000
Now, we want to store the number 0x0103, which has the pattern:
0000 0001 0000 0011
Now, let's store this number in the first two bytes using little endianness, and in the second two bytes using big endianness, and see how it's laid out in memory. So:
dv.setInt16(0, 0x0103, true);
dv.setInt16(2, 0x0103, false);
Now, the bits in the DataView have this pattern:
0000 0011 0000 0001 0000 0001 0000 0011
Here is the code to test that behavior:
var little = dv.getUint16(0);
little === 0x0103 // false
little === 0x0301 // true
var big = dv.getUint16(2);
big === 0x0103 // true
big === 0x0301 // false
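To make the byte layout visible, the same two writes can be inspected through a Uint8Array view over the buffer. This is a small sketch; the variable names are chosen for illustration. In setInt16 the first argument is the byte offset, the second is the value to store, and the third selects little-endian byte order for that single write.

```javascript
const buffer = new ArrayBuffer(4);
const dv = new DataView(buffer);

dv.setInt16(0, 0x0103, true);  // bytes 0-1, little-endian: low byte first
dv.setInt16(2, 0x0103, false); // bytes 2-3, big-endian: high byte first

// View the raw bytes in memory order.
const bytes = Array.from(new Uint8Array(buffer));
console.log(bytes); // [ 3, 1, 1, 3 ]

// Reading with the same endianness that was used for writing restores the value.
const littleReadBack = dv.getInt16(0, true);
// Reading without the flag defaults to big-endian and swaps the two bytes.
const defaultRead = dv.getUint16(0);
console.log(littleReadBack === 0x0103, defaultRead === 0x0301); // true true
```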

What exactly does ~ do? [duplicate]

This question already has answers here:
What does a tilde do when it precedes an expression?
(5 answers)
Closed 8 years ago.
I sometimes see the symbol ~ in code. I tried it with ~-1, and it shows 0.
And thus, I can see some code using this trick:
if ( !~text.indexOf('a') ){ }
To check for truthy value. Is it kind of bit shifting?
It's the bitwise NOT operator. It converts the operand to a 32-bit integer, then yields the one's complement (inverts every bit) of that integer.
Finally, ! returns true if and only if the result of that operation is 0.
Some examples might help:
x | x (bin) | ~x (bin) | ~x | !~x
-3 | 1111…1101 | 0000…0010 | 2 | false
-2 | 1111…1110 | 0000…0001 | 1 | false
-1 | 1111…1111 | 0000…0000 | 0 | true
0 | 0000…0000 | 1111…1111 | -1 | false
1 | 0000…0001 | 1111…1110 | -2 | false
In other words,
if ( !~text.indexOf('a') ) { }
is equivalent to:
if ( text.indexOf('a') == -1 ) { }
~ is the bitwise negation operator[MDN]. It converts its operand to a 32-bit integer and swaps all the 1s to 0s and all the 0s to 1s.
For example:
0000 0000 0000 0000 0000 0000 0000 0000 = 0
1111 1111 1111 1111 1111 1111 1111 1111 = ~0 = -1
Instead of doing text.indexOf(str) === -1 you can use the tricky !~text.indexOf(str), because ~-1 === 0 and !0 === true.
~ is the unary bitwise NOT operator. It converts the operand to a 32-bit integer and then flips every bit of the integer.
~12 =
~(00000000 00000000 00000000 00001100) =
(11111111 11111111 11111111 11110011) =
-13
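The examples above can be checked with a few lines of code. This is a short sketch with made-up variable names; it relies on the fact that for 32-bit integers ~x equals -(x + 1), so -1 is the only input that yields 0.

```javascript
const text = "banana";

// ~x is -(x + 1) for 32-bit integers.
const samples = [~-1, ~0, ~1, ~12];
console.log(samples); // [ 0, -1, -2, -13 ]

// Three ways to ask "is the substring missing?"
const missingViaTilde = !~text.indexOf("z");
const missingViaCompare = text.indexOf("z") === -1;
const missingViaIncludes = !text.includes("z");
console.log(missingViaTilde, missingViaCompare, missingViaIncludes); // true true true

// And for a substring that is present, the tilde test is false.
const presentCheck = !~text.indexOf("a");
console.log(presentCheck); // false
```

The explicit comparison or String.prototype.includes expresses the same test without requiring the reader to know the bit trick.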

javascript compression algorithm that supports BINARY data?

I'm looking for a lossless compression algorithm (like LZW or Huffman or anything) in javascript, that accepts and returns raw, binary data.
With 'binary data' I mean a sequence of bytes, which may come in any of the following forms:
a string containing characters with any value from 0 to 255
an array containing integers with any value from 0 to 255
a string containing a hexadecimal representation of the data (i.e. 2 hex digits per byte)
a string containing the base64 encoded representation of the data
or anything else that can be unambiguously converted from or to any of the above
Now obviously there are TONS of javascript implementations available everywhere, for a wide range of algorithms. However EVERYTHING I find seems to do crazy stuff like:
returning an array containing also values >255 (so what is the compression ratio now? how do I represent this in bytes, or how would I go about saving this to a file for example?)
messing with character encodings in strings, converting from/to unicode or url/html entities or whatnot (it's BINARY, character encoding does not apply here!)
return other representations that don't seem suitable for binary storage (i.e. cannot be converted to sequence of bytes)
Would anyone know of a good javascript compression (+decompression) implementation that suits my binary fetish?
I think I found what I was looking for after all: this deflate + inflate implementation in javascript seems to work with strings as byte sequences.
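For readers running under Node.js (an assumption, since the question does not name an environment), the built-in zlib module is one way to get a byte-in, byte-out round trip. The sketch below is not the implementation linked above; it only illustrates deflate and inflate operating on a Uint8Array, with variable names chosen for illustration.

```javascript
// Assumes Node.js; zlib ships with the runtime.
const zlib = require("node:zlib");

// 256 bytes of raw binary data with values 0..15 repeating.
const original = Uint8Array.from({ length: 256 }, (_, i) => i % 16);

// deflateSync accepts typed arrays and returns a Buffer of bytes.
const compressed = zlib.deflateSync(original);

// inflateSync returns a Buffer; copy it into a plain Uint8Array.
const restored = new Uint8Array(zlib.inflateSync(compressed));

const roundTripOk =
  restored.length === original.length &&
  restored.every((byte, i) => byte === original[i]);

console.log(original.length, compressed.length, roundTripOk);
```

Both the compressed and the restored values are plain byte sequences, so they can be written to a file or converted to hex or base64 without any character-encoding step.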
First of all, create a function object to hold the flags, which can be written in binary, hex or decimal:
function ASearch() { }
ASearch.Flag = {
Front_Wheel_Drive: 0xF, Rear_Wheel_Drive: 0xF0, Four_Wheel_Drive: 0xF00,
Auto: 0xFF, Manual: 0xFF00,
Gas: 0xF, Diesel: 0xF0, Hybrid: 0xF00, Electrical: 0xF000,
Two: 1, Three: 2, Four: 4, Five: 8, Six: 16, Eight: 32, Ten: 64, Twelve: 128
};
Then set a flag like this:
var SetFlag = function (e) {
    e = e.srcElement;
    // OR the flag in when the box is checked, XOR it out when unchecked
    $("#" + e.parentNode.name).val(e.checked ?
        $("#" + e.parentNode.name).val() | ASearch.Flag[e.id] :
        $("#" + e.parentNode.name).val() ^ ASearch.Flag[e.id]);
};
This is an example of data packed into a 32-bit integer.
There are four variables; I've used them for 18 flags. This is fast and very effective.
For example:
var i = 0; //binary = 0000 0000 0000 0000
i = i | 255; //binary = 0000 0000 1111 1111
i = i ^ 255; //binary = 0000 0000 0000 0000
i = i | 0xFF00; //binary = 1111 1111 0000 0000
i = i | 255; //binary = 1111 1111 1111 1111
i = i ^ 0xFF00; //binary = 0000 0000 1111 1111

JavaScript byte logic

What does this JavaScript code mean?
flag &= ~CONST
Is it append, prepend, intersection or something else?
Look at Bitwise operators.
& Operator
& puts 1 where both operands' bits are 1.
Example
10000001 & 00000001 = 00000001
~ Operator
~ inverts the bits.
Example
~10000000 = 01111111;
flag &= ~CONST is shorthand for flag = flag & ~CONST;.
You may have seen something similar, e.g. number *= 10.
This turns off whichever bits the constant represents.
For example, let's look at a hypothetical piece of code which represents the state of a window:
WS_HASBORDER = 0x01;
WS_HASCLOSEBUTTON = 0x02;
WS_HASMINIMIZEBUTTON = 0x04;
WS_HASMAXIMIZEBUTTON = 0x08;
WS_ISMAXIMIZED = 0x10;
We could represent the "state" of the window by using
windowState = WS_HASBORDER | WS_HASCLOSEBUTTON | ... etc
Now, let's say we want to "turn off" one of these states. That's what your example code does:
windowState &= ~WS_HASBORDER
What the above code does is take the complement [I guess you could call it the inverted bits] of whatever is to its right, WS_HASBORDER.
So WS_HASBORDER has one bit turned on, and everything else is turned off. Its complement has all bits turned on, except for the one bit that was turned on before.
Since I've represented the constants as bytes, I'll just show you an example [note that JavaScript doesn't represent numbers as bytes, nor can you do so]
WS_HASBORDER = 0x01; //0000 0001
WS_HASCLOSEBUTTON = 0x02; //0000 0010
WS_HASMINIMIZEBUTTON = 0x04; //0000 0100
WS_HASMAXIMIZEBUTTON = 0x08; //0000 1000
WS_ISMAXIMIZED = 0x10; //0001 0000
Now for an example:
windowState = WS_HASBORDER | WS_HASCLOSEBUTTON | WS_HASMINIMIZEBUTTON |
WS_HASMAXIMIZEBUTTON | WS_ISMAXIMIZED;
0000 0001
0000 0010
0000 0100
0000 1000
or) 0001 0000
--------------
0001 1111 = 0x1F
So... windowState gets the value 0x1F
windowState &= ~ WS_HASMAXIMIZEBUTTON
WS_HASMAXIMIZEBUTTON: 0000 1000
~WS_HASMAXIMIZEBUTTON: 1111 0111
..To finish our calculation
windowState
&) ~WS_HASMAXIMIZEBUTTON
becomes
0001 1111
&) 1111 0111
-------------
0001 0111 = 0x17
Here are your resulting flags:
On:
WS_HASBORDER
WS_HASCLOSEBUTTON
WS_HASMINIMIZEBUTTON
WS_ISMAXIMIZED
Off:
WS_HASMAXIMIZEBUTTON
Hope that helps. Back to procrastinating homework I go! haha.
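The walkthrough above can be run as a short script. This is a sketch using the hypothetical constants from the answer; the helper variables at the end are added only for illustration.

```javascript
const WS_HASBORDER = 0x01;         // 0000 0001
const WS_HASCLOSEBUTTON = 0x02;    // 0000 0010
const WS_HASMINIMIZEBUTTON = 0x04; // 0000 0100
const WS_HASMAXIMIZEBUTTON = 0x08; // 0000 1000
const WS_ISMAXIMIZED = 0x10;       // 0001 0000

// Turn every flag on by OR-ing the constants together.
let windowState = WS_HASBORDER | WS_HASCLOSEBUTTON | WS_HASMINIMIZEBUTTON |
                  WS_HASMAXIMIZEBUTTON | WS_ISMAXIMIZED;
const allOn = windowState; // 0x1f

// Turn one flag off: AND with the complement of its bit.
windowState &= ~WS_HASMAXIMIZEBUTTON;

// Test individual flags with AND.
const hasMaximizeButton = (windowState & WS_HASMAXIMIZEBUTTON) !== 0;
const isMaximized = (windowState & WS_ISMAXIMIZED) !== 0;

console.log(allOn.toString(16), windowState.toString(16)); // 1f 17
console.log(hasMaximizeButton, isMaximized);               // false true
```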
