I'm trying to parse a text file and store to an array, but I can't seem to get rid of the unneeded characters.
For example, some of the text will be "fi nd" or "job;" or "writ,er"
Right now I'm using
lettersTemp = InputDataLine.match(/[a-zA-Z]['*]/);
to parse the text file, but that obviously isn't working because I'm pulling the entire string and not getting rid of the extra characters. Anyone got some advice or an easier way to do this?
Should the result be a string that consists only of letters (with no spaces separating words)? If so, you can use the following code to filter out the letters:
lettersTemp = InputDataLine.match(/[a-zA-Z]+/g);
And then you can append each line in an array.
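If you want to store each word of each line as a separate array element, that doesn't seem possible, because a stray space inside a word (as in "fi nd") can't be told apart from a real word separator.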
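As a minimal sketch of the letters-only filter above, assuming each raw item is already a separate string and only ASCII letters should be kept (the helper name lettersOnly is made up for illustration):

```javascript
// Keep only ASCII letters from one raw token; everything else is dropped.
// lettersOnly is a hypothetical helper name, not part of the original code.
function lettersOnly(raw) {
  // match() returns null when nothing matches, so fall back to an empty list.
  const runs = raw.match(/[a-zA-Z]+/g) || [];
  // Join the letter runs back together without any separator.
  return runs.join("");
}

const cleaned = ["fi nd", "job;", "writ,er"].map(lettersOnly);
console.log(cleaned); // [ 'find', 'job', 'writer' ]
```

The `|| []` fallback matters because a line with no letters at all would otherwise throw on `.join`.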
I have searched for way too long to solve my problem.
So, there is a piece of software that can only read CSV files with ISO-8859-1 (ISO-Latin-1) encoding. I've tried almost everything I can find on the web, but nothing worked. I don't want to change the text, I want to change the encoding.
I've tried working with this library: https://github.com/inexorabletash/text-encoding, the PapaParse library, and much more. But they just convert the text, so weird symbols replace ä, ö, ü and other characters.
The CSV file itself contains characters that are not part of the character set you're encoding into. You might find that the line endings contain a character that does not exist in the character set you want to use. If you open the CSV file in a basic text editor, or run type myFile.csv at the DOS prompt, you can see which characters it contains in the most basic form. Then strip them out and you should have a file that can be converted. Always work on a copy of the original. The easiest way is to search and replace in a text editor, replacing the unwanted characters with nothing, not even a space. Keep in mind that if the character is part of the CSV structure (e.g. a delimiter), you would want to replace it with the Latin equivalent character instead.
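Instead of inspecting the file by eye, the offending characters could also be listed programmatically. The sketch below assumes the CSV content is already available as a JavaScript string and relies on the fact that ISO-8859-1 covers exactly the code points 0 to 255; findNonLatin1 is a hypothetical helper name:

```javascript
// Return each distinct character that ISO-8859-1 cannot represent.
// findNonLatin1 is a made-up name for illustration.
function findNonLatin1(text) {
  const seen = new Set();
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) {
      seen.add(ch);
    }
  }
  return [...seen];
}

console.log(findNonLatin1("Name;Preis\nJörg;5 €\nKöln;7 €\n")); // [ '€' ]
```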
I found it myself, but thank you anyway.
In js :
const uint8array = new TextEncoder(
'windows-1252',
{NONSTANDARD_allowLegacyEncoding: true}
).encode(csv_data);
const blob = new Blob([uint8array])
(For those who found this by searching: csv_data is a String containing the values read from the CSV file. Make sure to add a ; after every value to move to the next column and a \n to move to the next line.)
In HTML :
<script>
window.TextEncoder = window.TextDecoder = null;
</script>
<script src="encoding-indexes.js"></script>
<script src="encoding.js"></script>
And you have to download encoding.js and encoding-indexes.js from https://github.com/inexorabletash/text-encoding
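For comparison, here is a rough sketch that does not use the polyfill at all. It encodes by hand to ISO-8859-1 (not windows-1252, which differs in the 0x80 to 0x9F range), using the fact that ISO-8859-1 bytes equal the code points 0 to 255. The helper names toCsv and encodeLatin1, the sample rows, and the choice to substitute "?" for unencodable characters are all assumptions; toCsv also puts ; between values rather than after every value:

```javascript
// Build the CSV text: ";" between columns, "\n" at the end of each row.
// toCsv and encodeLatin1 are hypothetical helpers, not library functions.
function toCsv(rows) {
  return rows.map((row) => row.join(";")).join("\n") + "\n";
}

// Encode a string as ISO-8859-1 bytes; anything above U+00FF becomes "?".
function encodeLatin1(text) {
  const bytes = [];
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    bytes.push(codePoint <= 0xff ? codePoint : 0x3f);
  }
  return Uint8Array.from(bytes);
}

const csvData = toCsv([["Name", "Ort"], ["Jörg", "Köln"]]);
const latin1Bytes = encodeLatin1(csvData);
console.log(latin1Bytes.length === csvData.length); // true: one byte per character here
```

In a browser, the resulting Uint8Array could be wrapped in a Blob the same way as in the answer above.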
I am building a graph drawer and currently working on the math expression parser. I'm done with most parts but I'm stuck at clearing the input text before parsing it. What I'm trying to achieve now is getting rid of unpermitted characters.
For example, in this text:
5ax+4asxxv+sdflog10aloga(132*43)sin(132)
I want to match everything that is not +,-,*,/,^,(,),ln,log,sin,cos,tan,cot,arcsin,arccos,...
and replace them with "".
so that the output is
5x+4xx+log10log(132*43)sin(132)
I need help with the regex.
Spaces don't matter since I clear them out beforehand.
A little bit tricky - at least I couldn't think of a simple way to do what you ask. The regex would get monstrous.
So I did it the other way around - match what you want to keep, and put it back together.
The regex:
[\d+*/^()x-]|ln|log|(?:arc)?(?:sin|cos)|tan|cot
The code:
var re = /[\d+*/^()x-]|ln|log|(?:arc)?(?:sin|cos)|tan|cot/g,
text = '5ax+4asxxv+sdflog10aloga(132*43)sin(132)arccos(1)';
console.log(text.match(re).join(''));
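A slightly more defensive variant of the same idea, as a sketch: match returns null when nothing at all matches, so an input made up entirely of unpermitted characters would throw on .join. The function name keepAllowed is made up for illustration, and the / inside the character class is escaped only for readability:

```javascript
// Same token list as the answer above, wrapped in a reusable function.
const ALLOWED = /[\d+*\/^()x-]|ln|log|(?:arc)?(?:sin|cos)|tan|cot/g;

// keepAllowed is a hypothetical name; it keeps permitted tokens in order.
function keepAllowed(expr) {
  // Fall back to an empty list when there is no match at all.
  return (expr.match(ALLOWED) || []).join("");
}

console.log(keepAllowed("5ax+4asxxv+sdflog10aloga(132*43)sin(132)"));
// 5x+4xx+log10log(132*43)sin(132)
```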
We have report pages that append a <pre> at the bottom of pages that lists information line by line.
Example:
<pre>
Site Report Info
This is where any error will appear as a query string of numbers: 938109283091238109281092
This is where the account ID will be.
This is where the account reference pin will be.
So on...
So on...
So on...
So on...
So on...
</pre>
Using javascript or jquery, perhaps regex, how can I place all of this into one array where each line is an array element? I assume regex, since the way to determine the lines is by identifying line breaks \n ?
var lines = $('pre').text().split('\n') should do the trick.
You don't need to use jQuery to get the text, of course, but if you're doing web programming, jQuery is pretty ubiquitous.
You may also want to trim the results to get rid of extra whitespace (or not, depending on your application):
var lines = $('pre').text().split('\n').map(function(l) { return l.trim(); });
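The same splitting works on any string, so it can be tried without a page or jQuery. This sketch assumes you also want to tolerate Windows line endings and drop empty lines, which the original answer does not do; splitLines is a hypothetical helper name:

```javascript
// Split a block of text into trimmed, non-empty lines.
// splitLines is a made-up helper; pass it e.g. the text of the <pre>.
function splitLines(text) {
  return text
    .split(/\r?\n/) // accept both \n and \r\n line breaks
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const sample = "\nSite Report Info\r\n  This is where the account ID will be.  \n\nSo on...\n";
console.log(splitLines(sample));
// [ 'Site Report Info', 'This is where the account ID will be.', 'So on...' ]
```

Dropping the filter step keeps blank lines as empty array elements, if their position matters.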
I have a little interesting issue here. I have a plaintext URL coming from Excel and I need to change it to an HTML URL with a unique body. Here is the regex code for javascript:
text = text.toString().replace(/=hyperlink\(([#\\\w\s\(\)-\.\/]+)\)/g, "<a href='file:///$1'>$1</a>");
This works perfectly fine for what it does. Example, text is:
=hyperlink("\\share\folder\log\2013\13-05-13\13-05-13.txt")
regex turns it into
\\share\folder\log\2013\13-05-13\13-05-13.txt
However, I need the inner HTML to be just the text file name:
13-05-13.txt
To further complicate the matter, the original text the regex is going through is not a single occurrence. It is an entire spreadsheet with 100's of rows that contain this. So the regex will be matching and replacing 100's of these strings in one operation.
Hopefully it is possible to get this all done in one regexp on the entire string, but I suppose I could loop through each line of the string first...
If there is no way to do this with one regex engine, what do you think the best approach is? (no PHP/Python/Server side. Just Javascript, HTML, Jquery, etc).
I guess you could use this regex:
=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)
And this new replace:
<a href="file:///$1">$2</a>
I'm not sure how your regex was working, but I added the quotes in the regex and replaced the single quotes by double quotes in the replace. Revert those if need be.
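Put together, the whole-spreadsheet replacement might look like the sketch below. The replacement string (full path in the href, file name as the link text) is an assumption based on the question's original replace call, and linkify is a hypothetical function name:

```javascript
// Group 1 is the full path, group 2 the part after the last backslash.
const HYPERLINK = /=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)/g;

// linkify is a made-up name; the g flag replaces every occurrence at once.
function linkify(sheetText) {
  return sheetText.replace(HYPERLINK, '<a href="file:///$1">$2</a>');
}

// Backslashes are doubled here only because this is a JavaScript string literal.
const row = '=hyperlink("\\\\share\\folder\\log\\2013\\13-05-13\\13-05-13.txt")';
console.log(linkify(row));
```

Because the character class excludes the double quote, each match stops at its own closing quote, so no looping over lines is needed.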
I have the following code. I want to extract the last text (hello64) from it.
<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*
I used the code below but it removes all the integers
questionText = questionText.replace(/<span\b.*?>/ig, "");
questionText=questionText.replace(/<\/span>/ig, "");
questionText = questionText.replace(/\d+/g,"");
questionText = questionText.replace("*","");
questionText = questionText.replace(". ","");
I want to remove the first integer, and need to keep the rest of the integers.
It's the third line .replace(/\d+/g,"") which is replacing the integers. If you want to keep the integers, then don't replace \d+, because that matches one or more digits.
You could achieve most of that all on one line, by the way - there's no need to have multiple replaces there:
var questionText = questionText.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");
That would do the same as the first three lines of your code. (Of course, you'd need to drop the |(\d+) as per the first part of the answer if you didn't want to get rid of the digits.)
[EDIT]
Re your comment that you want to replace the first integer but not the subsequent ones:
The regex string to do this would depend very heavily on what the possible input looks like. The problem is that you've given us a bit of random HTML code; we don't know from that whether you're expecting it to always be in this precise format (ie a couple of spans with contents, followed by a bit at the end to keep). I'll assume that this is the case.
In this case, a much simpler regex for the whole thing would be to replace everything within <span....</span> with blank:
var questionText = questionText.replace(/(<span\b.*?>.*?<\/span>)/ig, "");
This will eliminate the whole of the <span> tags plus their contents, but leave anything outside of them alone.
In the case of your example this would provide the desired effect, but as I say, it's hard to know if this will work for you in all cases without knowing more about your expected input.
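As a small sketch of both ideas, assuming the input always has the shape shown in the question: stripSpans removes each span together with its contents, and removeFirstInteger shows that a replace without the g flag only touches the first match. Both function names are made up for illustration:

```javascript
// Remove every <span ...>...</span> pair including its contents.
// Note that "." does not match line breaks, so this assumes single-line spans.
function stripSpans(html) {
  return html.replace(/<span\b.*?>.*?<\/span>/gi, "");
}

// Without the g flag, replace() only affects the first run of digits.
function removeFirstInteger(text) {
  return text.replace(/\d+/, "");
}

const input = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';
console.log(stripSpans(input)); // " hello64 ?*"
console.log(removeFirstInteger("4. hello64 ?*")); // ". hello64 ?*"
```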
In general it's considered difficult to parse arbitrary HTML code with regex. Regex is a contraction of "Regular Expressions", which is a way of saying that they are good at handling strings which have 'regular' syntax. Arbitrary HTML is not a 'regular' syntax due to its unlimited possible levels of nesting. What I'm trying to say here is that if you have anything more complex than the simple HTML snippets you've supplied, then you may be better off using an HTML parser to extract your data.
This will match the complete string and put the part after the last </span>, up to the next word boundary \b, into capturing group 1. You then just need to replace the match with group 1, i.e. $1.
searched_string = string.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");
The captured word can consist of [A-Za-z0-9]. If you want to have anything else there just add it into that group.
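One thing to watch with the replace form: if the pattern does not match, the input comes back unchanged. A hedged alternative is to use match and read group 1 directly, as in this sketch (wordAfterLastSpan is a hypothetical name, and returning null on no match is a design choice of the sketch):

```javascript
// Return the first word after the last </span>, or null if there is none.
function wordAfterLastSpan(html) {
  // The leading greedy .* pushes the match to the last </span> in the string.
  const m = html.match(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/);
  return m ? m[1] : null;
}

const snippet = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';
console.log(wordAfterLastSpan(snippet)); // "hello64"
console.log(wordAfterLastSpan("no spans here")); // null
```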