Regular expression when reading files javascript - javascript

I have a situation where I have 3 font files, and I read their contents in order to find matches with a font name. The problem is that the font names are Wingdings, Wingdings 2, and Wingdings 3. When the font name is Wingdings, it matches all 3 files, but I need only the file that is associated with exactly that font name, not all 3 of them. I tried to find it using the indexOf method, but it didn't help. The only rational way seems to be a regular expression, but I cannot think of the right one. One more thing that needs to be mentioned is that I have to pass a parameter into that regExp, something like
var regExp = new RegExp('\\^' + fontName + '$\\', 'g');
if (currentFileContent.search(regExp) !== -1) {...}
Any help will be greatly appreciated.

It seems you are trying to use regex delimiters in a RegExp constructor. You only need /.../ in the literal notation.
Note that you must not escape the start- and end-of-string anchors: escaped, they lose their special meaning in the regex. \\ matches a single \, but nothing can be matched after the end of the string ($).
Also, you can use the RegExp#test() function to check whether the string matches the pattern (do not use the g modifier with it: a global regex keeps its lastIndex between calls, so repeated test() calls can give inconsistent results):
var regExp = RegExp('^' + fontName + '$');
if (regExp.test(currentFileContent)) { ... }
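A short sketch of why the g modifier is best avoided with test(): a global regex remembers where its last match ended (lastIndex) and resumes from there on the next call. The variable names below are only illustrative.

```javascript
// A global regex keeps state between test() calls via lastIndex.
var globalRe = /Wingdings/g;
var first = globalRe.test("Wingdings");  // match found, lastIndex moves to 9
var second = globalRe.test("Wingdings"); // search resumes at index 9, so no match

// Without the g modifier every call starts from the beginning of the string.
var plainRe = /Wingdings/;
var third = plainRe.test("Wingdings");
var fourth = plainRe.test("Wingdings");

console.log(first, second, third, fourth); // true false true true
```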
If font names contain special characters, use escapeRegExp function from MDN:
function escapeRegExp(string){
return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
And then
var regExp = RegExp('^' + escapeRegExp(fontName) + '$');
And a final note: if the font names appear inside a larger string, and you need to match Wingdings but not Wingdings3, use
var regExp = RegExp('\\b' + escapeRegExp(fontName) + '\\b');
The \b is a word boundary.
UPDATE
To make sure you only match a font name that is not followed by a whitespace (if any) and a digit, use a (?!\\s*\\d) lookahead when declaring a RegExp:
var fontName = "Wingding";
var contents = "Font name: Wingding, the other file: Font name: Wingding 2. And so forth. ";
var rExp = RegExp(fontName + '(?!\\s*\\d)');
if (rExp.test(contents)) {
document.write(fontName + " was found in '<i>" + contents + "</i>'.");
}
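Putting the pieces above together, here is a minimal sketch that picks the one file belonging to a font name. The file names and contents in `files` are made up for illustration, and `makeFontNameRegExp` and `findFiles` are hypothetical helper names, not part of any library.

```javascript
// Escape regex metacharacters in a literal string (the MDN helper quoted above).
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-word match that is not followed by optional whitespace and a digit,
// so "Wingdings" does not match inside "Wingdings 2".
function makeFontNameRegExp(fontName) {
  return new RegExp("\\b" + escapeRegExp(fontName) + "\\b(?!\\s*\\d)");
}

// Hypothetical file contents, standing in for the real font files.
var files = {
  "wingdings.ttf": "Font name: Wingdings",
  "wingdings2.ttf": "Font name: Wingdings 2",
  "wingdings3.ttf": "Font name: Wingdings 3"
};

// Return the names of all files whose content mentions exactly this font.
function findFiles(fontName) {
  var re = makeFontNameRegExp(fontName);
  return Object.keys(files).filter(function (name) {
    return re.test(files[name]);
  });
}

console.log(findFiles("Wingdings"));   // [ 'wingdings.ttf' ]
console.log(findFiles("Wingdings 2")); // [ 'wingdings2.ttf' ]
```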

Related

how to match a string of words using regex javascript

I am trying to find a regular expression for this result:
string => should be matched (a single word or set of words at the beginning or the ending)
string => should be matched (a single word or set of words in the middle)
{{string}} -- should not be matched (a single word or set of words surrounded by two "{}" should not be matched)
I am using this regex in this function:
text = text.replace(RegExp("([^{]{2})[^(\d:)]" + aTags[index].textContent + "\w*([^}]{2})", 'i'), "{{" + index + ":" + aTags[index].textContent + "}}");
The function should find the textContent of an 'a' tag in the 'text' string and replace it by adding a digit and ':' to the beginning of the textContent, so that the result is something like this:
some text => will become => {{1:some text}}
We can apply the good old "match what to avoid, capture what you need" approach (the usual substitute for PCRE's (*SKIP)(*F)): put everything that does not need to be replaced in the full match and capture the desired text in group 1:
{{[^}]+}}|(string)
To make this work effectively in JavaScript we have to use a .replace callback function:
const regex = /{{[^}]+}}|(string)/gm;
const str = `string
string
{{string}}`;
var index = 1; //this is your index var and is somehow set from outside
const result = str.replace(regex, function(m, group1) {
if (group1) return `{{${index}:${group1}}}`;
else return m;
});
console.log('Substitution result: ', result);
I have pseudo-coded this a bit since I cannot know where index and aTags[index].textContent are coming from. Adjust as needed.
You cannot use PCRE verbs like (*SKIP)(*F) in a JavaScript regex, i.e. you cannot skip a matched portion of text with the regex means only. In JavaScript, you may match and capture a part of the string you want to later analyze in the replacement callback method (JS String#replace accepts a callback as the replacement argument).
So, in your case the solution will look like
text = text.replace(RegExp("{{.*?}}|(" + aTags[index].textContent + ")", "gi"),
function ($0, $1) {
return $1 ? "{{" + index + ":" + $1 + "}}" : $0;
}
);
I understand the aTags[index].textContent value is alphanumeric, else, consider escaping it for use in a regex pattern.
The pattern either matches a {{...}} substring (with {{.*?}}, which lazily matches up to the first }}) or (|) matches and captures the text content ((aTags[index].textContent)) into Group 1. On each match, the callback receives the whole match and the Group 1 value as its first two arguments. If Group 1 is not empty, you perform the string manipulation; otherwise, you just insert the match back.
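As a self-contained sketch of the same idea without the DOM: `tagLinkTexts` and its `linkTexts` array are hypothetical stand-ins for the aTags loop in the question, and the link text is escaped in case it contains regex metacharacters. The braces are escaped as well, which is an extra precaution rather than something the answers above require.

```javascript
// Escape regex metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every occurrence of each link text in {{index:text}},
// skipping occurrences that already sit inside {{...}}.
function tagLinkTexts(text, linkTexts) {
  linkTexts.forEach(function (linkText, index) {
    var re = new RegExp("\\{\\{.*?\\}\\}|(" + escapeRegExp(linkText) + ")", "gi");
    text = text.replace(re, function (match, group1) {
      // group1 is undefined when the {{...}} alternative matched
      return group1 ? "{{" + index + ":" + group1 + "}}" : match;
    });
  });
  return text;
}

var tagged = tagLinkTexts("some text, {{some text}}, more some text", ["some text"]);
console.log(tagged); // {{0:some text}}, {{some text}}, more {{0:some text}}
```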

Javascript nested square brackets in string

I am looking for an easier (and less hacky) way to get the substring of what is inside matching square brackets in a string. For example, let's say this is the string:
[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ
I want the substring:
ABC[D][E[FG]]HIJK[LMN]
Right now, I am looping through the string and counting the open and closed brackets, and when those numbers are the same, I take the substring between the first open bracket and the last closed bracket.
Is there an easier way to do this (i.e. with regex), so that I do not need to loop through every character?
Here's another approach, an ugly hack which turns the input into a JS array representation and then parses it using JSON.parse:
function parse(str) {
return JSON.parse('[' +
str.split('') . join(',') . // insert commas
replace(/\[,/g, '[') . // clean up leading commas
replace(/,]/g, ']') . // clean up trailing commas
replace(/\w/g, '"$&"') // quote strings
+ ']');
}
>> parse('A[B]C')
<< ["A", ["B"], "C"]
Now a stringifier to turn arrays back into the bracketed form:
function stringify(array) {
return Array.isArray(array) ? '[' + array.map(stringify).join('') + ']' : array;
}
Now your problem can be solved by:
stringify(parse("[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ")[0])
Not sure if I get the question right (sorry about that).
So you mean that if you were to have a string of characters X, you would like to check if the string combination Y is contained within X?
Where Y being ABC[D][E[FG]]HIJK[LMN]
If so then you could simply do:
var str = "[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ";
var res = str.match(/ABC\[D]\[E\[FG]]HIJK\[LMN]/);
The above would then return the string literal Y as it matches what is inside str.
It is important that you pay attention to the fact that the symbols [ are being escaped with a \. This is because in regex if you were to have the two square brackets with any letter in between (ie. [asd]) regex would then match the single characters included in the specified set.
You can test the regex here:
https://regex101.com/r/zK3vZ3/1
I think the problem is to get all characters from an opening square bracket up to the corresponding closing square bracket. Balancing groups are not implemented in JavaScript, but there is a workaround: we can use several optional groups between these square brackets.
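The following regex will match up to 3 nested [...] groups, and you can add more nested non-capturing groups to support deeper nesting (it is split across lines here for readability only; JavaScript has no free-spacing mode, so join the lines before use):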
\[[^\]\[]*(?:
\[[^\]\[]*(?:
\[[^\]\[]*(?:\[[^\]\[]*\])*\]
)*[^\]\[]*
\][^\]\[]*
)*[^\]\[]*
\]
However, performance may suffer with such heavy backtracking.
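For comparison, the counting approach described in the question can be sketched as a small helper. `firstBalanced` is a hypothetical name; it still loops over the characters, but it handles any nesting depth and does no backtracking.

```javascript
// Return the content of the first balanced [...] group, or null if there is none.
function firstBalanced(str) {
  var start = str.indexOf("[");
  if (start === -1) return null;
  var depth = 0;
  for (var i = start; i < str.length; i++) {
    if (str[i] === "[") {
      depth++;
    } else if (str[i] === "]") {
      depth--;
      // depth returns to 0 at the bracket that closes the first opening one
      if (depth === 0) return str.slice(start + 1, i);
    }
  }
  return null; // unbalanced input
}

console.log(firstBalanced("[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ"));
// ABC[D][E[FG]]HIJK[LMN]
```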
UPDATE
Use XRegExp:
var str = '[ABC[D][E[FG]]HIJK[LMN]]OPQR[STUVW]XYZ';
// First match:
var res = XRegExp.matchRecursive(str, '\\[', ']');
document.body.innerHTML = "Getting the first match:<br/><pre>" + JSON.stringify(res, 0, 4) + "</pre><br/>And now, multiple matches (add \"g\" modifier when defining the XRegExp)";
// Multiple matches:
res = XRegExp.matchRecursive(str, '\\[', ']', 'g');
document.body.innerHTML += "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>

Escape string characters without manually escaping them

I need to replace a set of characters in a string. I don't have control over the string, so I can't just escape the + symbol inside it.
This works if I change my value to 'breeding' (the string is replaced), so my question is: how can I escape the special characters in a string without escaping them manually? I have tried
var s = "http://example.co/kb/tags/anazolic~racing~all+articles~breeding";
var value = 'all+articles';
var find = new RegExp('\~?\\b' + value + '\\b', 'g');
var l = s.replace(find, '');
console.log(l);
DEMO: http://jsfiddle.net/AnBc6/1/
I have also tried adding: value = encodeURIComponent(value); but this didn't work either.
Any Help?
So, if I understand correctly, you want to escape special regex characters.
value = value.replace(/[-\\()\[\]{}^$*+.?|]/g, '\\$&');
You could extract this to a function of course:
function escapeRegex(value) {
return String(value).replace(/[-\\()\[\]{}^$*+.?|]/g, '\\$&');
}
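Applied to the code from the question, the helper above would be used roughly like this (a sketch; only the escapeRegex call is new compared with the original snippet):

```javascript
// Escape regex metacharacters so the value is matched literally.
function escapeRegex(value) {
  return String(value).replace(/[-\\()\[\]{}^$*+.?|]/g, "\\$&");
}

var s = "http://example.co/kb/tags/anazolic~racing~all+articles~breeding";
var value = "all+articles";

// The + in value is now matched as a literal plus sign.
var find = new RegExp("~?\\b" + escapeRegex(value) + "\\b", "g");
var l = s.replace(find, "");

console.log(l); // http://example.co/kb/tags/anazolic~racing~breeding
```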
Change the third line to this:
var find = new RegExp('\~?\\b' + value.replace(/\+/g,'\\+') + '\\b', 'g');
The plus sign is a special character in a Regular Expression, so it needs to be escaped with a backslash.
(Also, I'm not sure what you mean by "stored in a variable." Everything in JavaScript is "in a variable." Or maybe you really mean, "stored in a RegExp object.")

JavaScript: How to pull out a string from a URL using a regular expression

In java, I have this URL as a string:
window.location.href =
"http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f875/hired-3";
I want to create a javascript regular expression to pull out the following string:
c6c8262a-bfd0-4ea3-aa6e-d466a28f875
To find left hand marker for the text, I could use the regex:
window\.location\.href \= \"http\:\/\/localhost:8080\/bladdey\/shop\/
However, I don't know how to get to the text between that and /hired-3"
What is the best way to pull out that string from a URL using javascript?
You could split the string in tokens and look for a string that has 4 occurrences of -.
Or, if the base is always the same, you could use the following code:
String myString = window.location.href;
myString = myString.substring("http://localhost:8080/bladdey/shop/".length());
myString = myString.substring(0, myString.indexOf('/'));
Use a lookahead and a lookbehind,
(?<=http://localhost:8080/bladdey/shop/).+?(?=/hired-3)
Check here for more information.
Also, there is no need to escape the : or / characters.
You need a regex, and some way to use it...
String theLocation = "http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f8752/hired-3";
String pattern = "(?<=/bladdey/shop/).+?(?=/hired-3)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(theLocation);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
note - this will still work when you change the host (it only looks for bladdey/shop/)
You can use capturing groups to pull out some content of your string.
In your case :
Pattern pattern = Pattern.compile("(http://localhost:8080/bladdey/shop/)(.+)(/hired-3)");
Matcher matcher = pattern.matcher(string);
if(matcher.matches()){
String value = matcher.group(2);
}
String param = html.replaceFirst("(?s)^.*http://localhost:8080/bladdey/shop/([^/]+)/hired-3.*$", "$1");
if (param.equals(html)) {
throw new IllegalStateException("Not found");
}
UUID uuid = UUID.fromString(param);
In regex:
(?s) let the . char wildcard also match newline characters.
^ begin of text
$ end of text
.* zero or more (*) any (.) characters
[^...]+ one or more (+) of characters not (^) being ...
$1 in the replacement stands for the text matched between the first pair of parentheses.
Well if you want to pull out GUID from anything:
var regex = /[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{11,12}/i
It should really be {12} but in your url it is malformed and has just 15.5 bytes of information.
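Since the title asks for JavaScript, here is a minimal sketch of the capturing-group approach in that language. `extractShopId` is a hypothetical helper name, and the URL is the one from the question.

```javascript
// Capture the path segment between /bladdey/shop/ and /hired-3.
function extractShopId(url) {
  var m = /\/bladdey\/shop\/([^\/]+)\/hired-3/.exec(url);
  return m ? m[1] : null; // null when the URL has a different shape
}

var href = "http://localhost:8080/bladdey/shop/c6c8262a-bfd0-4ea3-aa6e-d466a28f875/hired-3";
var id = extractShopId(href);

console.log(id); // c6c8262a-bfd0-4ea3-aa6e-d466a28f875
```

In a browser, window.location.href could be passed in place of the hard-coded string.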

replace the word or some characters in a word in html page with jquery

I found this jQuery code (I forgot the original site) that replaces words in an HTML page with asterisks (*). The code runs well, but it only replaces whole words; it cannot replace part of a word, and it is case-sensitive.
JQuery code :
String.prototype.repeat = function(num){
return new Array(num + 1).join(this);
}
/* Word or Character to be replace */
var filter = ['itch','asshole', 'uck', 'sex'];
$('body').text(function(i, txt){
// iterate over all words
for(var i=0; i<filter.length; i++){
// Create a regular expression and make it global
var pattern = new RegExp('\\b' + filter[i] + '\\b', 'g');
// Create a new string filled with '*'
var replacement = '*'.repeat(filter[i].length);
txt = txt.replace(pattern, replacement);
}
// returning txt will set the new text value for the current element
return txt;
});
word filter:
['itch','asshole', 'uck', 'sex'];
and result :
sex -> *** // successfully replacing
SEX -> SEX // not replaced, i want this word also replaced to ***
bitch -> bitch // not replaced, i want this word replaced to b****
How can I modify this jQuery code so that it can also replace part of a word and is not case-sensitive?
the fiddle : http://jsfiddle.net/bGhq8/
Thank you.
Use the case-insensitive option; there is no need for the word boundaries.
String.prototype.repeat = function(num){
return new Array(num + 1).join(this);
}
/* Word or Character to be replace */
var filter = ['itch','asshole', 'uck', 'sex'];
$('body').text(function(i, txt){
// iterate over all words
for(var i=0; i<filter.length; i++){
// Create a regular expression and make it global
var pattern = new RegExp(filter[i] , 'gi'); // Add the "i" modifier for case insensitivity
// Create a new string filled with '*'
var replacement = '*'.repeat(filter[i].length);
txt = txt.replace(pattern, replacement);
}
// returning txt will set the new text value for the current element
return txt;
});
Updated fiddle:
http://jsfiddle.net/bGhq8/3/
The following line in the code you provided:
var pattern = new RegExp('\\b' + filter[i] + '\\b', 'g');
matches on word boundaries (e.g. whitespace). In other words, it's a whole-word match on each word in the filter array.
To match against any occurrence of the words in filter, whether or not they occur as partial words, you can remove '\\b' from the start, the end, or both ends of the regular expression.
This approach, however, is not really ideal. Plenty of legitimate, non-offensive words -- itch, sextet, and so on -- will be censored by your filter. This is not something that is simple to solve without either:
Keeping the word boundary constraint as in the original code
Writing a custom regular expression for every offensive word you wish to censor (perhaps too time-consuming)
You should note that no single approach is going to be without false positives.
The reason you are seeing the behavior you mention is because of the regular expression you have written (repeated below):
var pattern = new RegExp('\\b' + filter[i] + '\\b', 'g');
For starters, to get this to replace values in a case-insensitive manner, you need to add the 'i' flag for case-insensitivity
var pattern = new RegExp('\\b' + filter[i] + '\\b', 'gi');
In addition, the reason it is only replacing whole words is the word boundary anchors (\b) placed around the search term. If you don't want to limit yourself to replacing whole words, you need to consider what additional patterns are candidates for being replaced and how you want to replace them. One possible solution that gets you a bit closer, by allowing for zero or one letters before the pattern, would be:
var pattern = new RegExp('\\b([A-Z]?)' + filter[i] + '\\b', 'gi');
var replacement = '$1'+'*'.repeat(filter[i].length);
txt = txt.replace(pattern, replacement);
As a side note http://regexpal.com/ is a great place for testing (and therefore learning about) regular expressions.
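The last suggestion can be tried outside the page with a small standalone sketch. `censor` is a hypothetical helper that works on a plain string instead of $('body').text(...), and it relies on the built-in String.prototype.repeat (ES2015) rather than the custom one from the question.

```javascript
// Escape regex metacharacters in a filter word.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each filter word with asterisks, case-insensitively,
// allowing at most one extra letter in front of it (kept as-is).
function censor(text, filter) {
  filter.forEach(function (word) {
    var pattern = new RegExp("\\b([a-z]?)" + escapeRegExp(word) + "\\b", "gi");
    text = text.replace(pattern, function (match, prefix) {
      return prefix + "*".repeat(word.length);
    });
  });
  return text;
}

var filter = ["itch", "asshole", "uck", "sex"];
var censored = censor("sex SEX bitch itchy", filter);
console.log(censored); // *** *** b**** itchy
```

Because the trailing \b is kept, a longer word such as itchy is left alone, which reduces (but does not eliminate) false positives.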
