I need to read a lot of data from a text file. When I use fs.createReadStream, copy the data into a variable and start processing it, \n and \r characters are present, and they break my splits and array checks. I tried writing a function that runs over the array and removes them by checking for '' (it doesn't work for some reason):
if (arr[i] === '\') { // this line throws the error
    // (removing it and stuff)
}
Do you have any idea how to remove them?
You can either use the String prototype replaceAll, or work with split and join:
let data = '\r \n sdfsdf. \r'
data = data.replaceAll(`\r`, '')
data = data.replaceAll(`\n`, '')
OR ------------------------
let data = '\r \n sdfsdf. \r'
data = data.split(`\r`).join('').split(`\n`).join('')
Note that replaceAll is a relatively new String method: it exists only in Node v15+ and in recent versions of modern browsers.
Check the MDN documentation to see whether it covers your needs.
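If replaceAll is not available, a regular expression with the global flag does the same job on older runtimes. The following is only a minimal sketch; the sample string and variable names are made up for illustration:

```javascript
// Sample input standing in for the file contents (hypothetical data).
const raw = 'first line\r\nsecond line\nthird line\r';

// Remove every carriage return and line feed in one pass.
const stripped = raw.replace(/[\r\n]/g, '');

// Or split into lines, accepting \r\n, \r and \n as line endings,
// and drop empty entries such as the one left by a trailing line break.
const lines = raw.split(/\r\n|\r|\n/).filter(line => line !== '');

console.log(stripped);
console.log(lines);
```

Splitting on the line endings first is often the more useful variant, since it yields an array with no stray \r left at the end of each element.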
Say my string is this:
var testexample = `<p nameIt="Title">Title_Test</p><figure class="t15"><table><thead><tr>
<th><span>Column1</span></th><th><span>Column2</span></th></tr></thead><tbody><tr><td><span>Entry1</span></td><td><span>Entry2</span></td><td><span>ready</span></td></tr></tbody></table></figure><p ex="ready">!aaa`;
It's quite a long string, but it's a table written out in string form. How would I get the words from in between <span> and </span>? For example, I would like it to return Column1, Column2, Entry1, Entry2 (maybe in an array?)
Here is what I tried so far:
storing = testexample.match(/<span>(.*)</span>/);
But it only returned "Column1". I also tried matchAll, exec, and /<span>(.*)</span>/g. These gave me the whole string, nothing, things like <th><span>Column1</span></th>, or just "Column1" again.
I'm quite new at javascript so I'm unsure what I'm doing wrong as I have read the documentation for this. Any help would be appreciated. Thank you.
Your regex should use the global and multiline flags. Beyond that, you need to match more than one instance, with something like this:
<\s*span[^>]*>(.*?)<\s*\/\s*span\s*>
You can see it at work here:
Regex 101
Also, because, as stated, you can't reliably parse HTML with regex, I did my best to make sure you could still use styles or attributes inside the <span> tag. For example, <span style="color:#FF0000;"> will still work with the regex I provided.
With another example here:
Regex 101
There is a very good answer by @bobince about why you should not even try to use regular expressions for parsing HTML.
To get a better answer, you should say which environment you want to use for this job.
Is it a browser or Node.js, and do you have the HTML as text or in a page?
I would propose another solution to your problem: create DOM elements and query them to extract the desired data.
/**
 * Helper function to transform an HTML string into a DOM element
 * @param {string} html
 * @param {string} elementType
 * @returns {HTMLElement}
 */
function htmlToElement(html, elementType = 'div') {
    const template = document.createElement(elementType);
    template.innerHTML = html.trim(); // Trim so the result never starts with a whitespace-only text node
    return template;
}
const htmlString = `<p nameIt="Title">Title_Test</p><figure class="t15"><table><thead><tr>
<th><span>Column1</span></th><th><span>Column2</span></th></tr></thead><tbody><tr><td><span>Entry1</span></td><td><span>Entry2</span></td><td><span>ready</span></td></tr></tbody></table></figure><p ex="ready">`;
const element = htmlToElement(htmlString);
// extract inner text from spans as array of strings
const arrayOfWords = [...element.querySelectorAll('span')].map(span => span.innerText);
// convert array of strings to space separated string
const wordsJoinedWithSpace = arrayOfWords.join(' ');
// log a result in a console
console.log({arrayOfWords, wordsJoinedWithSpace});
As pointed out, you can't reliably parse random HTML with Regex. HOWEVER, assuming you only want to parse an HTML table of the kind you have in the question, this is your regex:
<span>(.*?)<\/span>
I changed a couple of things:
You hadn't escaped the / in </span>, so your regex literal actually ended early.
I added a ? to the match-anything part. This makes the quantifier lazy, so the regex matches the shortest possible sequence and each span is matched separately.
Calling match with the g flag returns all occurrences of this regex. Each result still includes the <span> and </span> parts.
The last step is therefore to strip the opening <span> and closing </span> from each result.
Here's the complete example:
var testexample = `<p nameIt="Title">Title_Test</p><figure class="t15"><table><thead><tr>
<th><span>Column1</span></th><th><span>Column2</span></th></tr></thead><tbody><tr><td><span>Entry1</span></td><td><span>Entry2</span></td><td><span>ready</span></td></tr></tbody></table></figure><p ex="ready">!aaa`;
var regex = /<span>(.*?)<\/span>/g;
var match = testexample.match(regex);
var columnContent = match.map(m => m.replace("<span>", "").replace("</span>", ""));
console.log(columnContent[0]); // Column1
console.log(columnContent[1]); // Column2
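As an alternative to stripping the tags afterwards, matchAll exposes the capture group directly. This is a sketch under the same assumption as above (simple, attribute-free <span> tags); the shortened html string is invented for the example, and it also shows why the greedy (.*) in the question swallowed everything:

```javascript
// Shortened, hypothetical stand-in for the table markup in the question.
const html = '<tr><th><span>Column1</span></th><th><span>Column2</span></th></tr>' +
  '<tr><td><span>Entry1</span></td><td><span>Entry2</span></td></tr>';

// Greedy: (.*) runs on to the last </span>, so the single match is oversized.
const greedy = html.match(/<span>(.*)<\/span>/)[1];

// Lazy + global: one match per span; index 1 of each match is the capture group.
const words = Array.from(html.matchAll(/<span>(.*?)<\/span>/g), m => m[1]);

console.log(greedy);
console.log(words); // [ 'Column1', 'Column2', 'Entry1', 'Entry2' ]
```

matchAll requires a regex with the g flag and returns an iterator, which is why it is wrapped in Array.from here.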
I am making a Discord bot, and one of the commands lets the user send an embed to any channel with whatever text they want. I also want them to be able to start a new line in the body of the embed. Simply having them type "\n" in their message does not work: the bot outputs that \n literally in the embed instead of making a new line. Is there an easy way to do this?
Embed:
const sayEmbed = new Discord.MessageEmbed()
.setColor('#4d4d4d')
.setTitle(header.join(' '))
.setDescription(args.join(' '))
The problem occurs in the description field: when there is a "\n" in the args array, it does not make a new line and is simply sent as-is.
You don't actually need to use \n. You can just start a new line when typing the message, and discord.js will do all the parsing work for you. I tested this out with my bot.
So I made a kind of simple DIY sort of thing. (LOL, actually I made this as I saw the question, took like 5 mins.)
It allows you to dissect a message using the | operator (the key below Backspace).
I tried using this in an eval command, so I'm sure it works.
The code is:
// Create the arrays up front so `.concat()` has something to work on.
let sentences = [];
let temp = [];
// Loop over every arg
for (let l = 0; l < args.length; l++) {
    // Add the current arg to the `temp` array.
    temp = temp.concat(args[l]);
    // If we meet `|`, which is the sentence separator.
    if (args[l] === "|") {
        // Join the `temp` array into a sentence, removing the trailing ` |`.
        sentences = sentences.concat(temp.join(' ').slice(0, -2));
        // Reset `temp` to clear the saved sentence and start a new one.
        temp = [];
    }
}
Using .join(' ') on its own will not work, since it only turns the array into a single string, and a typed \n in that string is not treated as a line break.
The above method may be more efficient. Users run a command such as:
// Say prefix is `.` and the command is `embed`
.embed <header> | <content> | <title1> | <sentence1> | <title2> | <sentence2> |
and you will get sentences[0], sentences[1], sentences[2], sentences[3], sentences[4], sentences[5] respectively. You can then add this to your embed.
This will also allow multi string input, instead of a single args. Don't forget the | at the end since without it, it will ignore the whole last sentence.
const sayEmbed = new Discord.MessageEmbed()
.setColor('#4d4d4d')
.setTitle(sentences[0]) // <header>
.setDescription(sentences[1]) // <content>
.addField(sentences[2], sentences[3]) // <title1> <sentence1>
.addField(sentences[4], sentences[5]) // <title2> <sentence2>
// The more you add, the more it'll allow, you'll have to set it yourself.
TL;DR: A simpler answer:
sentences = args.join(" ").split(" | ");
Sorry, I tend to do things the hard way a lot.
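For the original question, a typed \n reaches the bot as two characters, a backslash followed by an n, not as a line break. One possible approach, sketched here with a made-up args array, is to replace that two-character sequence with a real newline before passing the text to setDescription:

```javascript
// Hypothetical args array, as produced by splitting the user's message on spaces.
const args = ['First', 'line\\nSecond', 'line'];

// The regex /\\n/g matches a literal backslash followed by "n"
// and swaps in an actual newline character.
const description = args.join(' ').replace(/\\n/g, '\n');

console.log(description);
```

The resulting string can then be used in place of args.join(' ') when building the embed.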
I'm trying to convert a string to JSON in javascript and send the string to a textarea HTML element. The string from the backend looks as follows:
"customBar_query": "select\\n\\neq.Name,\\nAVG(Payload) as [Average Payload]\\n\\nfrom tbl.Cycles as c\\nleft join dim.Equipment as eq on eq.Id = c.Truck_Id\\nwhere c.Timestamp_Loading >= DATEADD(day,-365,GETDATE())\\n\\nGROUP BY eq.Name\\n\\nORDER BY [Average Payload] DESC"
The "\\n" string is supposed to be a valid newline representation.
I'm parsing the string as follows:
var newChartData = JSON.parse(data);
The resulting Javascript string looks as follows:
customBar_query: "select\n\neq.Name,\nAVG(Payload) as [Average Payload]\n\nfrom tbl.Cycles as c\nleft join dim.Equipment as eq on eq.Id = c.Truck_Id\nwhere c.Timestamp_Loading >= DATEADD(day,-365,GETDATE())\n\nGROUP BY eq.Name\n\nORDER BY [Average Payload] DESC"
So far so good. Everything looks OK. However, when browsing the object in Developer Tools, I don't see the normal "enter" symbol indicating a new line. When assigning this value to a textarea using jquery .val(), the text is shown with the "\n" string and obviously without the real new lines as follows:
select\n\neq.Name,\nAVG(Payload) as [Average Payload]\n\nfrom
tbl.Cycles as c\nleft join dim.Equipment as eq on eq.Id =
c.Truck_Id\nwhere c.Timestamp_Loading >=
DATEADD(day,-365,GETDATE())\n\nGROUP BY eq.Name\n\nORDER BY [Average
Payload] DESC
I just cannot figure out what's going on here. It's not supposed to act in this manner and I really don't understand why it's doing this. Any advice will be appreciated!
UPDATE
See below for a snippet from Chrome Developer Tools showing that the string \n is not shown as an "enter" symbol
The values are strings. They are not yet newlines. Also, we expect to see newlines represented as \n in a string and as an actual line break in the log.
This shows the data with newlines in a textarea
const jsonstring = `{"customBar_query": "select\\n\\neq.Name,\\nAVG(Payload) as [Average Payload]\\n\\nfrom tbl.Cycles as c\\nleft join dim.Equipment as eq on eq.Id = c.Truck_Id\\nwhere c.Timestamp_Loading >= DATEADD(day,-365,GETDATE())\\n\\nGROUP BY eq.Name\\n\\nORDER BY [Average Payload] DESC"}`
const obj = JSON.parse(jsonstring);
const txtArea = document.getElementById("x");
const pre = document.getElementById("output");
Object.values(obj).forEach(val => {
    txtArea.value += val;
    pre.textContent += val;
});
console.log(pre.innerHTML);
textarea {
height: 400px;
width: 400px;
}
<textarea id="x"></textarea>
<pre id="output"></pre>
OK, I found the problem. It seems that '\n' is indeed the proper syntax for JSON that is parsed in JavaScript. What happened was that I had two different data sources pulling from the same function. The difference was that the first was injected into the HTML file via Django ({{dashboardJson|safe}}) and the second came through a JavaScript Ajax call.
The Django '|safe' filter interfered with the \n values, so I had to send \\n for the JSON to be valid. However, the JavaScript Ajax path only required \n, not \\n, so the doubled backslashes made it render the wrong strings.
I have no idea what Django changed, as the strings looked fine to me, but something definitely happened there.
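The difference between the two paths can be reproduced without Django. The sketch below uses String.raw so the backslashes stay exactly as written, the way they would arrive over the wire; the key name q and the sample text are made up for the example:

```javascript
// String.raw keeps backslashes untouched, like raw text received from a server.
const singleEscaped = String.raw`{"q": "select\nfrom t"}`;
const doubleEscaped = String.raw`{"q": "select\\nfrom t"}`;

const a = JSON.parse(singleEscaped).q; // contains a real line break
const b = JSON.parse(doubleEscaped).q; // contains a backslash followed by "n"

console.log(a.includes('\n'), a.length); // true 13
console.log(b.includes('\n'), b.length); // false 14

// If double-escaped text has already been parsed, the literal
// two-character sequence can still be converted afterwards.
const repaired = b.replace(/\\n/g, '\n');
console.log(repaired === a); // true
```

In other words, a textarea showing a visible \n suggests that the JSON text carried one more level of escaping than the parser needed.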
We're trying to compare two copies of the same file name: one is assigned directly with the = operator, and the other comes back from a Node.js server after the file has been uploaded, as in the following code block:
var name = "tên_đẹp.WAV";
// uploaded_file is the file (tên_đẹp) returned by calling an ajax function
// to get the uploaded file in uploaded_folder of a Nodejs server
ajaxUploadFile(name).done(function (e, uploaded_file) {
    if (name == uploaded_file.name) {
        return uploaded_file; // This line is never reached
    } else {
        console.log(escape(name)); // t%EAn_%u0111%u1EB9p.WAV
        console.log(escape(uploaded_file.name)); // te%u0302n_%u0111e%u0323p.WAV
    }
});
As you can see, the results of the two escape calls are different.
I don't know why they use different Unicode forms. How can I make them use the same Unicode representation? Any solution would be much appreciated.
Thanks.
The issue is that "e\u0302" and "\u00EA" are visually identical but are different sequences of code points. One is the specific character U+00EA (LATIN SMALL LETTER E WITH CIRCUMFLEX), and the other is e followed by the combining character U+0302 (COMBINING CIRCUMFLEX ACCENT). You must normalize each string to a standard form first to compare them.
require('unorm');
var name = "tên_đẹp.WAV";
// uploaded_file is the file (tên_đẹp) returned by calling an ajax function
// to get the uploaded file in uploaded_folder of a Nodejs server
ajaxUploadFile(name).done(function (e, uploaded_file) {
    if (name.normalize() == uploaded_file.name.normalize()) {
        return uploaded_file; // Reached once both names are normalized
    } else {
        console.log(escape(name)); // t%EAn_%u0111%u1EB9p.WAV
        console.log(escape(uploaded_file.name)); // te%u0302n_%u0111e%u0323p.WAV
    }
});
Note that I've loaded the unorm module, which polyfills the .normalize() method being called on the strings. This method is part of ECMAScript 6, and in future versions of Node you will not need to load unorm at all.
It's impossible to say what introduced the differences there, it could have been your text editor or your browser.
%EA == ê
e%u0302 == e + ^
These are two Unicode sequences that look the same but are encoded differently. If you need to compare them, you'll have to do Unicode normalization first.
The extra Unicode characters in uploaded_file.name are combining accents. %u0302 is the diacritical mark COMBINING CIRCUMFLEX ACCENT, and %u0323 is the diacritical mark COMBINING DOT BELOW.
On the other hand, %EA (ê) and %u1EB9 (ẹ) are the equivalent characters with the accents integrated.
This is handled by Unicode equivalence (see Wikipedia). The sequence e%u0302 is said to be canonically equivalent to %EA, and similarly for the other pair.
To handle the comparison properly in node.js, you have to normalize the strings into a canonical form (NFC or NFD). This can be achieved with unorm:
var unorm = require('unorm');
var s1 = 'êẹ';
var s2 = 'e\u0302e\u0323';
console.log(s1 == s2); // false
console.log(unorm.nfc(s1) == unorm.nfc(s2)); // true
console.log(unorm.nfd(s1) == unorm.nfd(s2)); // true
The choice between NFC (composed) and NFD (decomposed) should not matter in this case.
Important: Note that canonicalization can sometimes introduce nonobvious exploitable vulnerabilities, especially with filenames, as the OS would likely still see them as different. E.g. see this story of spotify: Creative usernames and Spotify account hijacking.
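On runtimes where String.prototype.normalize is built in (it is part of ECMAScript 2015), the same comparison can be sketched without unorm. The two sample strings below are assumptions chosen to mirror the composed and decomposed spellings from the question:

```javascript
const composed = 't\u00EAn';    // ê stored as a single code point
const decomposed = 'te\u0302n'; // e followed by COMBINING CIRCUMFLEX ACCENT

// The raw strings differ in both content and length.
console.log(composed === decomposed);            // false
console.log(composed.length, decomposed.length); // 3 4

// After normalizing both sides to the same form, they compare equal.
const sameNFC = composed.normalize('NFC') === decomposed.normalize('NFC');
const sameNFD = composed.normalize('NFD') === decomposed.normalize('NFD');
console.log(sameNFC, sameNFD); // true true
```

Calling normalize() without an argument uses NFC, which is what the name.normalize() comparison above relies on.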
I'm developing a multi-process application using Node.js. In this application, a parent process will spawn a child process and communicate with it using a JSON-based messaging protocol over a pipe. I've found that large JSON messages may get "cut off", such that a single "chunk" emitted to the data listener on the pipe does not contain the full JSON message. Furthermore, small JSON messages may be grouped in the same chunk. Each JSON message will be delimited by a newline character, and so I'm wondering if there is already a utility that will buffer the pipe read stream such that it emits one line at a time (and hence, for my application, one JSON document at a time). This seems like it would be a pretty common use case, so I'm wondering if it has already been done.
I'd appreciate any guidance anyone can offer. Thanks.
Maybe Pedro's carrier can help you?
Carrier helps you implement new-line terminated protocols over node.js. The client can send you chunks of lines and carrier will only notify you on each completed line.
My solution to this problem is to send JSON messages, each terminated with some special Unicode character, one that you would never normally get in the JSON string. Call it TERM.
So the sender just does "JSON.stringify(message) + TERM;" and writes it.
The receiver then splits incoming data on TERM and parses the parts with JSON.parse(), which is pretty quick.
The trick is that the last message may not parse, so we simply save that fragment and add it to the beginning of the next chunk when it comes. The receiving code goes like this:
s.on("data", function (data) {
    var info = data.toString().split(TERM);
    info[0] = fragment + info[0];
    fragment = '';
    for (var index = 0; index < info.length; index++) {
        if (info[index]) {
            try {
                var message = JSON.parse(info[index]);
                self.emit('message', message);
            } catch (error) {
                fragment = info[index];
                continue;
            }
        }
    }
});
Here "fragment" is defined somewhere where it will persist between data chunks.
But what is TERM? I have used the Unicode replacement character '\uFFFD'. One could also use the technique used by Twitter, where messages are separated by '\r\n' and tweets use '\n' for new lines and never contain '\r\n'.
I find this to be a lot simpler than messing with including lengths and the like.
The simplest solution is to send the length of the JSON data before each message as a fixed-length prefix (4 bytes?) and have a simple un-framing parser that buffers small chunks or splits bigger ones.
You can try node-binary to avoid writing the parser manually. Look at the scan(key, buffer) documentation example: it does exactly this kind of line-by-line reading.
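A hand-written version of the length-prefix idea might look like the sketch below. It does not use node-binary; the helper names encodeFrame and createDecoder and the 4-byte big-endian header are assumptions made for illustration, built only on Node's Buffer API:

```javascript
// Frame a message: 4-byte big-endian length, then the UTF-8 JSON bytes.
function encodeFrame(message) {
  const body = Buffer.from(JSON.stringify(message), 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Returns a function that accepts arbitrary chunks and calls
// onMessage once for every complete frame seen so far.
function createDecoder(onMessage) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 4) {
      const size = pending.readUInt32BE(0);
      if (pending.length < 4 + size) break; // body not complete yet
      onMessage(JSON.parse(pending.subarray(4, 4 + size).toString('utf8')));
      pending = pending.subarray(4 + size);
    }
  };
}

// Usage: two messages glued together, then delivered in awkward pieces.
const received = [];
const push = createDecoder(msg => received.push(msg));
const wire = Buffer.concat([encodeFrame({ id: 1 }), encodeFrame({ text: 'héllo' })]);
push(wire.subarray(0, 3));  // only part of the first header
push(wire.subarray(3, 20)); // rest of the first frame plus part of the second
push(wire.subarray(20));    // remainder of the second frame
console.log(received);
```

In a real parent/child setup, push would be called from the pipe's data listener. Measuring the length in bytes rather than characters is what keeps multi-byte characters such as é from being split incorrectly.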
As long as newlines (or whatever delimiter you use) will only delimit the JSON messages and not be embedded in them, you can use the following pattern:
let buf = ''
s.on('data', data => {
    buf += data.toString()
    const idx = buf.indexOf('\n')
    if (idx < 0) { return } // No '\n', no full message
    let lines = buf.split('\n')
    buf = lines.pop() // if ends in '\n' then buf will be empty
    for (let line of lines) {
        // Handle the line
    }
})