I have two very large txt files: file A contains some strings, and file B contains all the strings that I want to search for in file A.
Currently I turn file B into an array and load file A with fs. The problem is that file B contains some strange strings like R<f(9f, so when I match with a RegExp the program exits with the error message Invalid regular expression: /R<f(9f/: Unterminated group.
What I want is for the RegExp match to treat these characters as literal text and not as regex instructions.
console.time('program');
const fs = require('fs');
const filePath = "./processhackerfile.txt";
const hackStringsPath = "./hackstrings.txt";
var hackStrings = fs.readFileSync(hackStringsPath).toString().split("\n");
console.log(hackStrings.length);
var file = fs.readFileSync(filePath).toString();
for(i in hackStrings){
var regex = new RegExp(hackStrings[i].toString(), 'i');
var stringSearch = file.match(regex);
if(stringSearch != null){
console.log(`Cheat found, string name: ${stringSearch}`);
} else {
console.log('Cheat not found');
}
}
console.timeEnd('program');
You need to escape the string to use it verbatim in a regular expression. Unfortunately there seems to be no built-in method for that, but there are npm packages that do it, like this one: https://www.npmjs.com/package/escape-string-regexp
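If you would rather not add a dependency, the escaping can be done by hand. This is only a minimal sketch: the helper name escapeRegExp and the short sample strings (standing in for the two files) are assumptions made for illustration. The second half shows that a plain substring search avoids regular expressions entirely when all you need is a literal, case-insensitive lookup.

```javascript
// Hypothetical helper: put a backslash in front of every character that has
// a special meaning in a regular expression, so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sample data standing in for the contents of file A and file B (assumption).
const fileText = 'abc header\nxxR<f(9fyy\nfooter';
const needles = ['R<f(9f', 'a.c', 'FOOTER'];

// Escaped patterns: 'a.c' no longer matches 'abc', and '(' no longer throws.
const found = needles.filter((needle) =>
  new RegExp(escapeRegExp(needle), 'i').test(fileText)
);
console.log(found);

// Alternative without regular expressions: lower-case both sides once
// and use a plain substring search.
const haystack = fileText.toLowerCase();
const foundPlain = needles.filter((needle) =>
  haystack.includes(needle.toLowerCase())
);
console.log(foundPlain);
```

One thing to watch with split("\n") on a real file: a trailing newline produces an empty string at the end of the array, and an empty pattern matches everything, so you may want to skip empty entries.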
I want to get the directory from the file path without the file name in JavaScript. I want the inputs and outputs in the following behavior.
Input: '/some/path/to/file.txt'
Output: '/some/path/to'
Input: '/some/path/to/file'
Output: '/some/path/to/file'
Input: '/some/folder.with/dot/path/to/file.txt'
Output: '/some/folder.with/dot/path/to'
Input: '/some/file.txt/path/to/file.txt'
Output: '/some/file.txt/path/to'
I was thinking of doing this using a RegExp, but I'm not sure how the exact RegExp should be written.
Can someone help me with an EFFICIENT solution other than that or the RegExp?
Looking at your examples, it looks like you want to treat everything except the last filename as the directory name, where a filename always contains a dot.
To get that part, you can use this code in JavaScript:
str = str.replace(/\/\w+\.\w+$/, "");
Regex \/\w+\.\w+$ matches a / followed by one or more word characters, a dot, and one or more word characters before the end of the string. The replacement is just an empty string.
However, do keep in mind that some filenames may not contain any dot character, and this replacement won't work in those cases. \w also covers only letters, digits, and the underscore, so names containing hyphens or spaces won't match either.
You could use lastIndexOf to get the index and then use slice to get the desired result.
const strArr = [
"/some/path/to/file.txt",
"/some/path/to/file",
"/some/folder.with/dot/path/to/file.txt",
"/some/file.txt/path/to/file.txt",
];
const result = strArr.map((s) => {
if (s.match(/\.txt$/)) {
const index = s.lastIndexOf("/");
return s.slice(0, index !== -1 ? index : s.length);
} else return s;
});
console.log(result);
Using regex
const strArr = [
"/some/path/to/file.txt",
"/some/path/to/file",
"/some/folder.with/dot/path/to/file.txt",
"/some/file.txt/path/to/file.txt",
];
const result = strArr.map((s) => s.replace(/\/\w+\.\w+$/, ""));
console.log(result);
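Both snippets above depend on details of the sample data (the .txt suffix in one, names made only of \w characters in the other). A possible generalisation, assuming that "file name" simply means a last path segment that contains a dot, is to look only at the last segment. The function name stripFileName is made up for this sketch.

```javascript
// Hypothetical helper: drop the last path segment only if it contains a dot.
function stripFileName(p) {
  const slash = p.lastIndexOf('/');
  const lastSegment = p.slice(slash + 1);
  // No slash at all, or no dot in the last segment: leave the input untouched.
  if (slash === -1 || !lastSegment.includes('.')) return p;
  return p.slice(0, slash);
}

const samples = [
  '/some/path/to/file.txt',
  '/some/path/to/file',
  '/some/folder.with/dot/path/to/file.txt',
  '/some/file.txt/path/to/file.txt',
  '/some/path/my-file.tar.gz',
];
console.log(samples.map(stripFileName));
```

Because no regular expression is involved, dots and other special characters in directory names need no special treatment.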
So I'm making a simple function that separates the file name from the directory path. I believe there is an easier way with Node's path module, but I thought I'd do it myself for this project.
The problem is that when I write a backslash character in a string, I escape it, like "directory\AnothaDirectory". It runs, but the double "\" and the "\\" used for escaping still remain in the strings after they are parsed, e.g. "C:\\Documents\Newsletters".
I have tried single backslashes, which throws errors as one would expect, and I have also tried forward slashes. What could be the reason the backslashes are not being escaped?
function splitFileNameFromPath(path,slashType){
let pathArray = path.split(slashType),
fileName = pathArray[pathArray.length - 1],
elsIndexes = pathArray.length - 1,
pathSegs = pathArray.slice(0, elsIndexes);
let dirPath = pathSegs.join(slashType);
//adds an extra slash after drive name and colon e.g."C:\\"
dirPath = dirPath.replace( new RegExp("/\\/","ug"), "\\" )
//removes illegal last slash
let pathSeg = pathSegs.slice(0,-1)
return [dirPath, fileName]
}
let res = splitFileNameFromPath("C:\\\\Documents\\Newsletters\\Summer2018.pdf","\\");
console.log(res)
There are some points in this code I do not understand.
"C:\\\\Documents\\Newsletters\\Summer2018.pdf" (i.e. "C:\\Documents\Newsletters\Summer2018.pdf") does not look like a valid Windows path: a double backslash is not normally used after the drive letter (it is not like the '//' in a URL such as 'https://...').
new RegExp("/\\/","ug") is equal to /\/\//gu, which looks for two forward slashes and so does not match anything in this path.
The result of let pathSeg = pathSegs.slice(0,-1) is not used at all.
It seems to me this code is enough to achieve the task:
'use strict';
function splitFileNameFromPath(path, slashType) {
const pathArray = path.split(slashType),
fileName = pathArray.pop(),
dirPath = pathArray.join(slashType);
return [dirPath, fileName];
}
const path = "C:\\Documents\\Newsletters\\Summer2018.pdf";
const slash = "\\";
const res = splitFileNameFromPath(path, slash);
console.log(res);
console.log(path === res.join(slash));
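Since the question mentions Node's path module, here is a minimal sketch of that route, assuming a Node.js environment. path.win32 forces Windows rules regardless of the platform the script runs on. The first lines also show that "\\" in a string literal is a single backslash character; console.log on an array prints strings in their quoted, escaped form, which may be why the doubled backslashes seemed to remain.

```javascript
const path = require('path');

// "\\" in source code is ONE backslash in the actual string.
console.log('\\'.length);

const fullPath = 'C:\\Documents\\Newsletters\\Summer2018.pdf';
console.log(fullPath);   // printed as a plain string: single backslashes
console.log([fullPath]); // printed inside an array: shown in escaped form

// Split into directory and file name using Windows path rules.
const dirPath = path.win32.dirname(fullPath);
const fileName = path.win32.basename(fullPath);
console.log(dirPath);
console.log(fileName);

// Joining the two parts gives back the original path.
console.log(path.win32.join(dirPath, fileName) === fullPath);
```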
I am trying to get the extension from a filename. The filename could include special characters (#, ., _, (), etc.).
ex:
var file1 = "fake.der"
var file2 = "fake.1.der"
var file3 = "fake_test.3.der"
Now, in the above cases I want to extract only the extension "der" from every filename.
I tried:
file1.split(".")[1] //works fine
file2.split(".")[1] // gives me 1 -incorrect but file2.split(".")[2] gives correct result
file3.split(".")[1] //gives 3-incorrect.
Since the filename could vary, I don't want to hard-code .split(".")[1] and then change it to .split(".")[2] for other filenames, and so on.
How can I make sure that, regardless of how many dots are present in the filename, I always get only the extension as output? Is there a better approach?
Thanks!
Use a regular expression to match a dot, followed by non-dot characters, followed by the end of the string:
function getExt(str) {
const match = str.match(/\.([^.]+)$/);
if (match) {
return match[1];
} else {
return 'Not found';
}
}
var file1 = "fake.der";
var file2 = "fake.1.der";
var file3 = "fake_test.3.der";
var file4 = "foobar";
[file1, file2, file3, file4].forEach(str => console.log(getExt(str)));
Note that you can't always be sure that an input string contains a well-formatted file extension, so make sure to handle those unexpected cases, as done above.
With lastIndexOf:
function getExtension(file) {
const index = file.lastIndexOf('.');
return index === -1 ? '' : file.slice(index + 1);
}
This also handles the case where the string does not contain a dot (.): it returns an empty string.
You can use \w in a regular expression, which matches any "word" character: a letter, a digit, or the underscore. Anchor the pattern with $ so that it only matches at the end of the string. Note that if the name contains no dot at all, this returns the whole name.
function ext(path) {
let extension = path.match(/\w+$/)
return extension ? extension[0].replace(".","") : null;
}
Just use .split() and some length calculations:
var file1 = "fake.der";
var file2 = "fake.1.der";
var file3 = "fake_test.3.der";
function getExtension(name) {
var nameArr = name.split(".");
var fileExt = nameArr[nameArr.length - 1];
return fileExt;
}
console.log(getExtension(file1));
console.log(getExtension(file2));
console.log(getExtension(file3));
Use slice ;)
const fileName = "file.name.extension.der";
console.log(fileName.split('.').slice(-1)[0]); // slice returns an array, so take its only element
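If this runs in Node.js rather than in a browser (an assumption, the question does not say), the built-in path module already has extname, which returns everything from the last dot onward, including the dot. A small sketch:

```javascript
const path = require('path');

const names = ['fake.der', 'fake.1.der', 'fake_test.3.der', 'foobar'];

// extname returns '.der' (or '' when there is no extension);
// slice(1) drops the leading dot.
const extensions = names.map((name) => path.extname(name).slice(1));
console.log(extensions);
```

Unlike the split-based versions, a name without any dot yields an empty string instead of the whole name.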
I'm in the process of porting some PHP code I have to Node.js.
The issue I have concerns this PCRE regex:
/\/?_?[0-9]*_?([^\/\._]*)[_#*\-*\.?\p{L}\p{M}*]*$/u
(this regex matches first in _4_first_ääää, in _first_äääää, or in first_äääää)
I'm using XRegExp in this context, but with no luck:
// lib/parser.js
var XRegExp = require('xregexp').XRegExp;
module.exports = {
getName : function(string){
var name = XRegExp('\/?_?[0-9]*_?([^\/\._]*)[_#*\-*\.?\p{L}\p{M}*]*$');
var matches = XRegExp.exec(string, name);
if(matches && matches.length > 0){
return matches[1];
}
else{
return '';
}
}
};
And the test (mocha) that goes with it:
// test/test.js
var assert = require("assert");
var parser = require('../lib/parser.js');
describe('parser', function(){
describe('#getName()', function(){
it('should return the name contained in the string', function(){
assert.equal('test', parser.getName('3_test'));
assert.equal('test', parser.getName('test'));
assert.equal('test', parser.getName('_3_test'));
assert.equal('test', parser.getName('_3_test_ääää'));
assert.equal('test', parser.getName('_3_test_boom'));
})
})
})
And the tests results:
0 passing (5ms)
1 failing
1) parser #getName() should return the name contained in the string:
AssertionError: "test" == "ääää"
+ expected - actual
+ääää
-test
This code matches ääää.
The commented line catches first, so I guess I'm misusing the Unicode character classes.
My question is: how can I make my original PHP regex work in JavaScript?
Maybe there is a workaround?
Put an anchor at the beginning:
^\/?_?[0-9]*_?([^\/\._]*)[_#*\-*\.?\p{L}\p{M}*]*$
Also you could remove the unnecessary escaping:
^/?_?[0-9]*_?([^/._]*)[-_#*.?\p{L}\p{M}]*$
Your regex also matches an empty string; maybe you want:
^/?_?[0-9]*_?([^/._]+)[-_#*.?\p{L}\p{M}]+$
According to your sample, it could be:
^/?(?:(?:_\d+)?_)?([^/._]+)[-_#*.?\p{L}\p{M}]+$
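I finally managed to find the origin of the problem. The \p{L} and \p{M} need another backslash in the XRegExp syntax, because the pattern is passed as a string literal and the backslash itself has to be escaped there. That change made the original regex work again.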
var unicodeWord = XRegExp('^\\p{L}+$');
unicodeWord.test('Русский'); // -> true
unicodeWord.test('日本語'); // -> true
unicodeWord.test('العربية'); // -> true
from the usage examples:
https://github.com/slevithan/xregexp/blob/master/README.md#usage-examples
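To see why the extra backslash matters, it helps to look at what the string literal actually contains before any regex engine sees it. The sketch below uses only plain JavaScript; the last part assumes a runtime that supports Unicode property escapes with the u flag (ES2018 and later), which may make a library unnecessary for this particular case.

```javascript
// In an ordinary string literal "\p" is just "p": the backslash is lost.
const single = '\p{L}';
const doubled = '\\p{L}';
const raw = String.raw`\p{L}`;
console.log(single);  // no backslash left
console.log(doubled); // backslash preserved
console.log(raw);     // same as doubled

// Native Unicode property escapes (assumes ES2018+ support).
const unicodeWord = /^\p{L}+$/u;
console.log(unicodeWord.test('Русский'));
console.log(unicodeWord.test('日本語'));
console.log(unicodeWord.test('1234'));
```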
I want to add a (variable) tag to values with regex, the pattern works fine with PHP but I have troubles implementing it into JavaScript.
The pattern is (value is the variable):
/(?!(?:[^<]+>|[^>]+<\/a>))\b(value)\b/is
I escaped the backslashes:
var str = $("#div").html();
var regex = "/(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b/is";
$("#div").html(str.replace(regex, "" + value + ""));
But this does not seem to be right, even though I logged the pattern and it's exactly what it should be.
Any ideas?
To create the regex from a string, you have to use JavaScript's RegExp object.
If you also want to match/replace more than one time, then you must add the g (global match) flag. Here's an example:
var stringToGoIntoTheRegex = "abc";
var regex = new RegExp("#" + stringToGoIntoTheRegex + "#", "g");
// at this point, the line above is the same as: var regex = /#abc#/g;
var input = "Hello this is #abc# some #abc# stuff.";
var output = input.replace(regex, "!!");
alert(output); // Hello this is !! some !! stuff.
In the general case, escape the string before using as regex:
Not every string is a valid regex, though: there are some special characters, like ( or [. To work around this issue, simply escape the string before turning it into a regex. A utility function for that is included in the sample below:
function escapeRegExp(stringToGoIntoTheRegex) {
return stringToGoIntoTheRegex.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
var stringToGoIntoTheRegex = escapeRegExp("abc"); // this is the only change from above
var regex = new RegExp("#" + stringToGoIntoTheRegex + "#", "g");
// at this point, the line above is the same as: var regex = /#abc#/g;
var input = "Hello this is #abc# some #abc# stuff.";
var output = input.replace(regex, "!!");
alert(output); // Hello this is !! some !! stuff.
Note: the regex in the question uses the s modifier, which did not exist in JavaScript at the time of the question. Today JavaScript does have an s (dotAll) flag.
If you are trying to use a variable value in the expression, you must use the RegExp "constructor".
var regex = "(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b"; // backslashes doubled: "\b" in a string is a backspace
new RegExp(regex, "is")
I found I had to double slash the \b to get it working. For example to remove "1x" words from a string using a variable, I needed to use:
str = "1x";
var regex = new RegExp("\\b"+str+"\\b","g"); // same as inv.replace(/\b1x\b/g, "")
inv=inv.replace(regex, "");
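The reason for the doubled backslash is that the string literal is processed before the regex engine ever sees it, and in a string "\b" means the backspace character (code 8), not a word boundary. A short sketch with made-up sample text:

```javascript
const word = '1x';
const inv = 'buy 1x apple';

// "\b" inside a string literal is a backspace character (U+0008).
console.log('\b'.charCodeAt(0));

// Single backslash: the pattern looks for backspace + "1x" + backspace.
const wrong = new RegExp('\b' + word + '\b', 'g');
console.log(JSON.stringify(inv.replace(wrong, '')));

// Doubled backslash: the regex engine receives \b1x\b (word boundaries).
const right = new RegExp('\\b' + word + '\\b', 'g');
console.log(JSON.stringify(inv.replace(right, '')));
```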
You don't need the " to define a regular expression so just:
var regex = /(?!(?:[^<]+>|[^>]+<\/a>))\b(value)\b/is; // this is valid syntax
If value is a variable and you want a dynamic regular expression then you can't use this notation; use the alternative notation.
String.replace also accepts strings as input, so you can do "fox".replace("fox", "bear");
Alternative:
// Note: no / delimiters inside the string, and every backslash is doubled.
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(value)\\b", "is");
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b", "is");
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(.*?)\\b", "is");
Keep in mind that if value contains regular expression metacharacters like (, [ and ?, you will need to escape them.
I found this thread useful - so I thought I would add the answer to my own problem.
I wanted to edit a database configuration file (datastax cassandra) from a node application in javascript and for one of the settings in the file I needed to match on a string and then replace the line following it.
This was my solution.
const fs = require('fs');
const dse_cassandra_yaml = '/etc/dse/cassandra/cassandra.yaml';
// a) find the searchString and grab all text on the following line to it
// b) replace all next line text with a newString supplied to function
// note - leaves searchString text untouched
function replaceStringNextLine(file, searchString, newString) {
fs.readFile(file, 'utf-8', function(err, data){
if (err) throw err;
// need to use double escape '\\' when putting regex in strings !
var re = "\\s+(\\-\\s(.*)?)(?:\\s|$)";
var myRegExp = new RegExp(searchString + re, "g");
var match = myRegExp.exec(data);
var replaceThis = match[1];
var writeString = data.replace(replaceThis, newString);
fs.writeFile(file, writeString, 'utf-8', function (err) {
if (err) throw err;
console.log(file + ' updated');
});
});
}
const searchString = "data_file_directories:";
const newString = "- /mnt/cassandra/data";
replaceStringNextLine(dse_cassandra_yaml, searchString, newString );
After running, it will change the existing data directory setting to the new one:
config file before:
data_file_directories:
- /var/lib/cassandra/data
config file after:
data_file_directories:
- /mnt/cassandra/data
Much easier way: use template literals.
var variable = 'foo'
var expression = `.*${variable}.*`
var re = new RegExp(expression, 'g')
re.test('fdjklsffoodjkslfd') // true
re.test('fdjklsfdjkslfd') // false
Using string variable(s) content as part of a more complex composed regex expression (es6|ts)
This example will replace all urls using my-domain.com to my-other-domain (both are variables).
You can build dynamic regexes by combining string values and other regex fragments within a raw string template. String.raw keeps the backslashes in the template exactly as written instead of processing them as JavaScript escape sequences.
// Strings with some data
const domainStr = 'my-domain.com'
const newDomain = 'my-other-domain.com'
// Make sure your string is regex friendly
// This will replace each dot with '\.'.
const regexUrl = /\./gm;
const substr = `\\\.`;
const domain = domainStr.replace(regexUrl, substr);
// domain is a regex friendly string: 'my-domain\.com'
console.log('Regex expression for domain', domain)
// HERE!!! You can assemble a complex regex using string pieces.
const re = new RegExp( String.raw `([\'|\"]https:\/\/)(${domain})(\S+[\'|\"])`, 'gm');
// now I'll use the regex expression groups to replace the domain
const domainSubst = `$1${newDomain}$3`;
// const page contains all the html text
const result = page.replace(re, domainSubst);
note: Don't forget to use regex101.com to create, test and export REGEX code.
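For reference, here is a self-contained variant of the same idea that can be run as is. It is a sketch under a few assumptions: the page content is a made-up sample string, the helper escapeRegExp is hypothetical, and the quote character class is written as ["'] because a | inside a character class is a literal pipe rather than an alternation.

```javascript
// Hypothetical helper: escape all regex metacharacters, not only dots.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const oldDomain = 'my-domain.com';
const newDomain = 'my-other-domain.com';

// Made-up sample standing in for the HTML text of a page.
const page =
  '<a href="https://my-domain.com/news">News</a> ' +
  '<a href="https://my-domainXcom/x">Other</a>';

// String.raw keeps any backslashes in the template as written;
// ${...} is still interpolated.
const re = new RegExp(
  String.raw`(["']https://)${escapeRegExp(oldDomain)}([^"']*["'])`,
  'g'
);

// $1 and $2 refer to the two capture groups around the domain.
const result = page.replace(re, `$1${newDomain}$2`);
console.log(result);
```

Only the first link is rewritten; the second one is left alone because the escaped dot no longer matches an arbitrary character.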
var string = "Hi welcome to stack overflow"
var toSearch = "stack"
//case insensitive search
var result = string.search(new RegExp(toSearch, "i")) !== -1 ? 'Matched' : 'notMatched'
https://jsfiddle.net/9f0mb6Lz/
Hope this helps
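One detail worth checking when using search this way: it returns the index of the first match, which is 0 when the match sits at the very start of the string, and -1 only when there is no match. A quick sketch with a made-up string:

```javascript
const text = 'stack overflow';
const index = text.search(new RegExp('STACK', 'i'));

console.log(index);        // position of the match
console.log(index > 0);    // misses a match at position 0
console.log(index !== -1); // correct "found" check

// RegExp.prototype.test gives a boolean directly.
console.log(new RegExp('STACK', 'i').test(text));
```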