RegExp doesn't produce expected result but it does everywhere else [duplicate]


This question already has answers here:
RegEx to extract all matches from string using RegExp.exec
(19 answers)
Closed 5 years ago.
I'm getting started with Node.js and trying to learn it, coming from a PHP environment.
I have the following RegExp: /([A-Z]{2,})+/gim (two or more letters next to each other).
I have the following string: "That's my testing sample but it doesn't work."
So I throw this into Node.js (keep in mind I'm a newbie):
var fs = require("fs");
var request = require("request");
// COMMENTS
var regex = new RegExp(/([A-Z]{2,})+/gim);
//COMMENTS
var thisyear = regex.exec("That's my testing sample but it doesn't work.");
console.log(thisyear);
This is the file in its entirety.
The output that it returns:
[ 'That',
'That',
index: 0,
input: 'That\'s my testing sample but it doesn\'t work.' ]
The output according to pretty much every site I tested it on:
That
my
testing
sample
but
it
doesn
work
How do I get each separate result in an array of sorts?
P.S.: match() and test() are "not a function".

To get multiple results with the g flag, you call .exec() multiple times like this:
let regex = /([A-Z]{2,})+/gim;
let str = "That's my testing sample but it doesn't work.";
let results;
while ((results = regex.exec(str)) !== null) {
console.log(results[0]);
}
JavaScript sets the .lastIndex property on the regex object to keep track of where it is in the source string, so each call returns the next match. When no further match is found, exec() returns null and resets .lastIndex to 0.
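A minimal sketch of that behaviour, reusing the regex and string from the question. It records re.lastIndex after each successful exec() call (the seen array exists only for illustration) and then shows that lastIndex is back at 0 once exec() has returned null.

```javascript
// With the g flag, exec() is stateful: each call resumes searching at re.lastIndex
const re = /([A-Z]{2,})+/gim;
const str = "That's my testing sample but it doesn't work.";

const seen = []; // [matched text, lastIndex after that call]
let m;
while ((m = re.exec(str)) !== null) {
  seen.push([m[0], re.lastIndex]);
}

console.log(seen);
// The final, failed exec() call reset lastIndex to 0
console.log(re.lastIndex);
```

Because lastIndex lives on the regex object, reusing the same global regex for a different string without resetting lastIndex can make the first search start in the wrong place.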
Note: when you use the literal form of a regex, /something/, you don't need to wrap it in a new RegExp() constructor. The literal syntax already creates a regex object for you.
FYI, you can get all the matches without a while loop, like this:
let re = /([A-Z]{2,})+/gim;
let str = "That's my testing sample but it doesn't work.";
console.log(str.match(re));
This is generally the simpler way to do it unless you need the capture groups from your regex. If you need the groups rather than just the whole match, you have to use the .exec() loop (or String.prototype.matchAll() where it is available) to get multiple matches, each with its groups.
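A sketch of the .exec() form that keeps every match array, so each entry still carries its capture groups and .index. execAll is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: collect every match array (groups and .index included)
function execAll(re, str) {
  if (!re.global) {
    throw new TypeError("execAll needs a regex with the g flag");
  }
  re.lastIndex = 0; // start at the beginning even if re was used before
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push(m);
    if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
  }
  return results;
}

const matches = execAll(/([A-Z]{2,})+/gim, "That's my testing sample but it doesn't work.");
// Each entry is a full match array: m[1] is the first capture group
console.log(matches.map(m => [m[1], m.index]));
```

Resetting lastIndex at the start and guarding against empty matches are the two details that usually bite when this loop is written inline.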

Related

Javascript regex parse complex url string

I need to parse a complex URL string to fetch specific values.
From the following URL string:
/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss
I need to extract this result in array format:
['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']
I already tried /url=([^&]+)/, but it doesn't capture all of the query parameters correctly. I would also like to omit the url= part.
Thanks in advance.

This regex works for me: url=([a-z:/.?=-]+&[a-z=]+)
Also, you can try this: /http(s)?:\/\/([a-z-.?=&])+&/g
const string = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&url=http://any-feed-url.com?filter=latest&format=rss'
const string2 = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&next=parm&url=http://any-feed-url.com?filter=latest&format=rss'
const regex = /url=([a-z:/.?=-]+&[a-z=]+)/g;
const regex2 = /http(s)?:\/\/([a-z-.?=&])+&/g;
console.log(string.match(regex))
console.log(string2.match(regex2))

Have you tried using the split method instead of a regex?
const urlsArr = "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss".split("url=");
urlsArr.shift(); // removing first item from array -> "/api/rss/feeds?"
console.log(urlsArr)
split() returns ["/api/rss/feeds?", "http://any-feed-url-a.com?filter=hot&format=rss&", "http://any-feed-url-b.com?filter=rising&format=rss"], and shift() then drops the first item.
If possible, it's better to use something other than a regex (see Coding Horror: "Regular Expressions: Now You Have Two Problems").
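The split() result still has a trailing & on every URL except the last. A regex-free sketch that drops the prefix and trims that separator, assuming url= never appears inside the embedded feed URLs themselves:

```javascript
const input = "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss";

const urls = input
  .split("url=") // the first element is the "/api/rss/feeds?" prefix
  .slice(1) // drop that prefix, like shift() does above
  .map(s => (s.endsWith("&") ? s.slice(0, -1) : s)); // trim the separating "&"

console.log(urls);
```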

You can matchAll the URLs, then map capture group 1 to an array.
const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
const arr = [...str.matchAll(/url=(.*?)(?=&url=|$)/g)].map(x => x[1]);
console.log(arr)
But matchAll isn't supported by older browsers.
Looping over exec() to fill an array also works:
const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
const re = /url=(.*?)(?=&url=|$)/g;
const arr = [];
let m;
while ((m = re.exec(str)) !== null) {
  arr.push(m[1]);
}
console.log(arr)

If your input is better-formed in reality than shown in the question and you’re targeting a modern JavaScript environment, there’s URL/URLSearchParams:
const input = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot%26format=rss&url=http://any-feed-url-b.com?filter=rising%26format=rss';
const url = new URL(input, 'http://example.com/');
console.log(url.searchParams.getAll('url'));
Notice how & has to be escaped as %26 for it to make sense.
If the input isn’t in a standard form, it’s not clear which URL rules still apply.
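A small sketch of producing input in that standard form in the first place, assuming you control how the request URL is built. URLSearchParams percent-encodes each feed URL (including its & and ?), so getAll('url') reads them back intact. The feed URLs are the ones from the question; example.com is just a placeholder base.

```javascript
// Feed URLs from the question
const feeds = [
  "http://any-feed-url-a.com?filter=hot&format=rss",
  "http://any-feed-url-b.com?filter=rising&format=rss",
];

// Build the query with URLSearchParams so each value is percent-encoded
const params = new URLSearchParams();
for (const feed of feeds) params.append("url", feed);
const input = "/api/rss/feeds?" + params.toString();
console.log(input);

// Parse it back; a relative input needs a base URL
const url = new URL(input, "http://example.com/");
console.log(url.searchParams.getAll("url"));
```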

regex to match first occurrence and everything in between until last match

I may be thinking this about the wrong way.
The first three (...) groups are generated and could be any number. I only want to catch this first set of items, and still allow the user to use () inside their custom string.
Test String
(374003) (C6-96738) (WR183186) R1|SALOON|DEFECTIVE|WiFiInfotainment|Hardware detects WIFI but unable to log in on the (JAMIE HUTBER) internet.:
Regex
/\(([^)]+)\)/g
Current output
 ["(374003)", "(C6-96738)", "(WR183186)", "(JAMIE HUTBER)"]
Desired Output
 ["(374003)", "(C6-96738)", "(WR183186)"]

There are two ways to do that:
take only the first 3 items from the array
exclude spaces inside the parentheses in your regexp: \(([^ )]+)\) (https://regex101.com/r/ZPdq35/1/)
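A sketch of both options on the question's test string. The first assumes there are always exactly three generated groups; the second assumes the generated groups never contain spaces while the user's own parenthesized text does.

```javascript
const str = "(374003) (C6-96738) (WR183186) R1|SALOON|DEFECTIVE|WiFiInfotainment|Hardware detects WIFI but unable to log in on the (JAMIE HUTBER) internet.:";

// Option 1: keep the original regex and take only the first 3 matches
const firstThree = str.match(/\(([^)]+)\)/g).slice(0, 3);

// Option 2: disallow spaces inside the parentheses, so "(JAMIE HUTBER)" no longer matches
const noSpaces = str.match(/\(([^ )]+)\)/g);

console.log(firstThree);
console.log(noSpaces);
```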

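With the sticky flag /y, each exec() call must match exactly where the previous match ended, so you pick up the consecutive (...) groups at the start of the string and stop at the first text that isn't one.
Unlike excluding spaces, this doesn't depend on whether later text, such as (JAMIE HUTBER), contains a space.
For example: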
const re = /\s*\(([^)]+)\)/y;
const str = "(374003) (C6-96738) (WR183186) R1|SALOON|DEFECTIVE|WiFiInfotainment|Hardware detects WIFI but unable to log in on the (JAMIE HUTBER) internet.:";
let m = re.exec(str);
while (m) {
console.log(m[1]);
m = re.exec(str);
}
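The same loop can be wrapped in a function that returns an array. This is a sketch; leadingGroups is a hypothetical name, and it creates a fresh regex on each call so lastIndex always starts at 0.

```javascript
// Hypothetical helper: collect the leading "(...)" groups using the sticky flag
function leadingGroups(str) {
  const re = /\s*\(([^)]+)\)/y; // y: each match must start exactly at re.lastIndex
  const groups = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    groups.push(m[1]);
  }
  return groups; // stops at the first text that is not a parenthesized group
}

console.log(leadingGroups("(374003) (C6-96738) (WR183186) R1|SALOON|DEFECTIVE|WiFiInfotainment|Hardware detects WIFI but unable to log in on the (JAMIE HUTBER) internet.:"));
// The number of leading groups can vary, and they may contain spaces
console.log(leadingGroups("(A 1) (B) then (C)"));
```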

AutomationAnywhere + Javascript regular expression

For the last 2 days I've been getting an "Invalid character" error while using a JS script to extract email addresses from text, and I can't resolve it.
I get the text from a web application using Object Cloning and store it in a variable, which I later pass to the JS script.
And of course my JS script which I checked and it works:
var args = WScript.Arguments;
var pattern = \w+#\w+.\w;
var result = /pattern/.exec(args);
WScript.StdOut.WriteLine(result);

First and foremost, let's break this down into modules and debug them.
First Module: Object cloning
Object Cloning is good for building reliability, and that reliability comes from careful selection of properties. In your example you have selected Path, DOMXPath and HTML Tag.
It's good practice to identify the properties that are unique and therefore yield high accuracy; some of these properties depend on the context.
For example, on a login page, useful properties include:
Priority 1: Path, HTML ID, InnerText
Priority 2: DOMXPath, HTMLValue
You may choose to add properties that you think may be unique to your context.
Does strResult give you the expected value? If yes, let's proceed.
Second Module: Run Script
It accepts 2 parameters: $strResult$ and $mail$
And of course my JS script which I checked and it works:
and you have confirmed that the JS module also runs fine.
If you have verified the results of the first 2 modules, I think there could be an invalid character somewhere in the script or the parameters, so check the regular expressions used. Shouldn't the pattern be enclosed in a string (" ") or written as a /.../ regex literal? A bare \w+#\w+.\w is not valid JavaScript.
=====================
EDIT:
I wanted to recreate the issue and give you the desired result, but I do not know your intended input and output for the JavaScript. However, to the best of my understanding of your script, I have compiled and executed the following in Automation Anywhere, and it works perfectly.
JavaScript
var args = WScript.Arguments;
if (args.length > 0)
{
var str=args.item(0);
var ary = str.split(",");
//WScript.Echo(ary.length);
// for loop in case there are multiple parameters passed
for (var i=0; i < ary.length; i++)
{
//Takes the input passed as parameter
var input = (ary[i]);
// Uses the Match() Method to look for an email address in input string
var result = input.match(/\w+#\w+\.com/);
// result is overwritten on each pass, so only the last item's match is kept
}
//returns the email address found in the last item
WScript.StdOut.WriteLine(result);
}
OR
//Takes the input passed as parameter (this replaces the body of the for loop above)
var input = (ary[i]);
//Declares the pattern used
var pattern = /\w+#\w+\.com/;
// Uses the Exec() Method to look for a match
var result = pattern.exec(input);
//returns the email address
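Outside Automation Anywhere, the matching logic can be checked in plain JavaScript (Node or a browser, not WScript). This sketch keeps the answer's /\w+#\w+\.com/ pattern unchanged and adds the g flag, so every address in every item is collected instead of only the last item's first match. The sample input is made up.

```javascript
// Made-up comma-separated input standing in for the text passed to the script
const input = "contact jan#example.com today,no address here,ops#mail.com or dev#mail.com";

const found = [];
for (const item of input.split(",")) {
  // With the g flag, match() returns every match in the item (or null if none)
  const matches = item.match(/\w+#\w+\.com/g);
  if (matches) found.push(...matches);
}

console.log(found);
```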

Using javascript regex to translate a html

I would like to build my own translation function in javascript.
I already have a function language.lookup(key) which translates a word or expression:
var frenchHello = language.lookup('hello') //'bonjour'
Now I would like to write a function that takes an HTML string and translates it with my lookup function. In the HTML string I will use a special syntax, for example #[translationkey], to mark words that should be translated.
This is the result I want:
var html = '<div><span>#[hello]</span><span>#[sir]</span></div>'
language.translate(html) //'<div><span>bonjour</span><span>monsieur</span></div>'
How would I write language.translate?
My idea is to filter out my special syntax with regex and then run language.lookup on each key. Maybe with string replace or something.
I'm bad at regex, and I've only come up with a very incomplete example, but I'm including it anyway so that someone might get the idea of what I'm trying to do. If there is a better but completely different solution, that is more than welcome.
var value = "#[hello], nice to see you.";
lookup = function(word){
return "bonjour";
};
var res = new RegExp( "\\b(hello)\\b", "gi" ).exec(value)
for (var c1 = 0; c1 < res.length; c1++){
value = value.replace(res[c1], lookup(res[c1]))
}
alert(value) //#[bonjour], nice to see you.
The regex should of course match the #[...] syntax rather than the literal word hello, and then capture the key with a group or similar.
Can anyone help?

Just use the ability of String.prototype.replace to call a function passed as its second argument to generate the replacement text, and do a global replace with a regexp that matches your syntax:
var value = "#[hello], #[sir], nice to see you.";
var lookup = function(full_match, word){
if(word == 'hello')
return "bonjour";
if(word == 'sir')
return "monsieur";
// leave unknown keys untouched instead of inserting "undefined"
return full_match;
};
console.log(value.replace(/#\[(.+?)\]/gi, lookup))
Result:
bonjour, monsieur, nice to see you.
Of course, when your replacement list gets bigger, you'd be better off using a lookup object instead of a series of ifs in the lookup function, but you can really do whatever you want there.
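A sketch of that lookup-object version. The translations table and the translate function name are hypothetical stand-ins for the question's language.lookup. Keys that aren't in the table are left as they were rather than replaced with undefined.

```javascript
// Hypothetical translation table (the question uses language.lookup(key) for this)
const translations = { hello: "bonjour", sir: "monsieur" };

function translate(html) {
  // #\[(.+?)\] captures the key between "#[" and "]"; g replaces every occurrence
  return html.replace(/#\[(.+?)\]/g, (fullMatch, key) =>
    Object.prototype.hasOwnProperty.call(translations, key) ? translations[key] : fullMatch
  );
}

console.log(translate("<div><span>#[hello]</span><span>#[sir]</span></div>"));
console.log(translate("#[hello], #[unknown]"));
```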

You can try this to find all occurrences:
var re = new RegExp('#\\[([^\\]]+?)\\]', 'gi'),
str = '#[value1] plain text #[value2]',
match;
while (match = re.exec(str)) {
console.log(match);
}

You could use something like:
#\\[[^\\]]*\\]
It matches a hash, followed by an opening square bracket, followed by zero or more characters other than a closing square bracket, followed by a closing square bracket.
Alternatively, it may be better to handle the translation on the server side (maybe even through your template engine) and send the translated response back to the client. Otherwise (depending on the specific problem you are dealing with, of course), you might end up sending a lot of data to the browser, which could make your application respond slowly.
EDIT:
Here is a working piece of code:
var q="This #[ANIMAL1] was eaten by that #[ANIMAL2]";
var u = {"#[ANIMAL1]":"Lion","#[ANIMAL2]":"Frog"};
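function insertAnimal(aString, lookup){
var res = new RegExp("#\\[[^\\]]*\\]", "gi");
var result = aString;
var m;
// Scan the unmodified input so lastIndex stays in step with the string being searched
while ((m = res.exec(aString)) !== null){
result = result.replace(m[0], lookup[m[0]]);
}
return result;
}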
function main(){
alert(insertAnimal(q,u));
}
You can call main() from the HTML document's body onload event.

Your requirement is essentially resolving template text within content. If you can use a library, you could try Handlebars.js.

What is the best way to parse a URL with JavaScript? [duplicate]

If there is one thing I just can't get my head around, it's regex.
So after a lot of searching I finally found this one that suits my needs:
function get_domain_name()
{
aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
domain_name_parts = aaaa.match(/:\/\/(.[^/]+)/)[1].split('.');
if(domain_name_parts.length >= 3){
domain_name_parts[0] = '';
}
var domain = domain_name_parts.join('.');
if(domain.indexOf('.') == 0)
alert("1"+ domain.substr(1));
else
alert("2"+ domain);
}
It basically gives me back the domain name. Is there any way I can also get everything after the domain name? In this case that would be /blah/sdgsdgsdgs from the aaaa variable.

EDIT (2020): In modern browsers, you can use the built-in URL Web API.
https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
var url = new URL("http://www.somesite.se/blah/sdgsdgsdgs");
var pathname = url.pathname; // returns /blah/sdgsdgsdgs
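If you also want the question's original behaviour (dropping the first label, such as www, from the domain), a sketch on top of URL might look like this. domainAndPath is a hypothetical name, and the drop-the-first-label rule is copied from get_domain_name() rather than being a general way to find a registrable domain.

```javascript
// Hypothetical helper reproducing get_domain_name()'s rule with the URL API
function domainAndPath(href) {
  const url = new URL(href);
  const parts = url.hostname.split(".");
  // As in the original code: with 3 or more labels, drop the first one (e.g. "www")
  const domain = parts.length >= 3 ? parts.slice(1).join(".") : url.hostname;
  return { domain, path: url.pathname };
}

console.log(domainAndPath("http://www.somesite.se/blah/sdgsdgsdgs"));
console.log(domainAndPath("http://somesite.se/blah/sese"));
```

For a host like example.co.uk this rule returns co.uk, so it only suits hostnames shaped like the ones in the question.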
Instead of relying on a potentially unreliable* regex, you should instead use the built-in URL parser that the JavaScript DOM API provides:
var url = document.createElement('a');
url.href = "http://www.example.com/some/path?name=value#anchor";
That's all you need to do to parse the URL. Everything else is just accessing the parsed values:
url.protocol; //(http:)
url.hostname; //(www.example.com)
url.pathname; //(/some/path)
url.search; // (?name=value)
url.hash; //(#anchor)
In this case, if you're looking for /blah/sdgsdgsdgs, you'd access it with url.pathname
Basically, you're just creating a link (technically, anchor element) in JavaScript, and then you can make calls to the parsed pieces directly. (Since you're not adding it to the DOM, it doesn't add any invisible links anywhere.) It's accessed in the same way that values on the location object are.
(Inspired by this wonderful answer.)
EDIT: An important note: it appears that Internet Explorer has a bug where it omits the leading slash on the pathname attribute on objects like this. You could normalize it by doing something like:
url.pathname = url.pathname.replace(/(^\/?)/,"/");
Note:
*: I say "potentially unreliable" because it can be tempting to try to build or find an all-encompassing URL parser, but there are many, many conditions, edge cases and forgiving parsing techniques that might not be considered or properly supported. Browsers are probably best at implementing this logic (since parsing URLs is critical to their proper operation), so we should keep it simple and leave it to them.

RFC 3986 (see Appendix B) provides a regular expression to parse the URI parts:
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
Numbering the capture groups 1 to 9 by their opening parentheses, from left to right, gives:
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
Example:
function parse_url(url) {
var pattern = RegExp("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");
var matches = url.match(pattern);
return {
scheme: matches[2],
authority: matches[4],
path: matches[5],
query: matches[7],
fragment: matches[9]
};
}
console.log(parse_url("http://www.somesite.se/blah/sdgsdgsdgs"));
gives
Object
authority: "www.somesite.se"
fragment: undefined
path: "/blah/sdgsdgsdgs"
query: undefined
scheme: "http"

Please note that this solution is not the best. I made this just to match the requirements of the OP. I personally would suggest looking into the other answers.
The following regexp gives you back the domain and the rest: :\/\/(.[^\/]+)(.*)
For example, for http://www.google.com/goosomething the two groups are:
www.google.com
/goosomething
I suggest studying the RegExp documentation here: http://www.regular-expressions.info/reference.html
Using your function:
function get_domain_name()
{
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
//aaaa="http://somesite.se/blah/sese";
var matches = aaaa.match(/:\/\/(?:www\.)?(.[^/]+)(.*)/);
alert(matches[1]);
alert(matches[2]);
}

You just need to modify your regex a bit. For example:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.somesite.se", "/blah/sdgsdgsdgs"]
Here is the same example, modified to split out the "www." part. I think the regular expression should be written so that the match works whether or not you have the "www." part. So check this out:
var aaaa="http://www.somesite.se/blah/sdgsdgsdgs";
var m = aaaa.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
m will then contain the following parts:
["http://www.somesite.se/blah/sdgsdgsdgs", "www.", "somesite.se", "/blah/sdgsdgsdgs"]
Now check out the same regular expression but with a url that does not start with "www.":
var bbbb="http://somesite.se/blah/sdgsdgsdgs";
var m = bbbb.match(/^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/);
Now your match looks like this:
["http://somesite.se/blah/sdgsdgsdgs", undefined, "somesite.se", "/blah/sdgsdgsdgs"]
So as you can see it will do the right thing in both cases.
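One edge case worth checking: because (\/.*)$ requires a path, a URL with nothing after the host, such as http://somesite.se, doesn't match at all and match() returns null. A sketch that makes the path group optional, assuming you'd rather get undefined for the path in that case:

```javascript
// Original pattern: the path group (\/.*) is required
const strict = /^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)$/;
// Same pattern with an optional path group: (\/.*)?
const re = /^[^:]*:\/\/(www\.)?([^\/]+)(\/.*)?$/;

const strictNoPath = "http://somesite.se".match(strict); // null: no path to match
const noPath = "http://somesite.se".match(re);
const withPath = "http://www.somesite.se/blah/sdgsdgsdgs".match(re);

console.log(strictNoPath);
console.log(noPath);
console.log(withPath);
```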

There is a nice jQuery plugin for parsing URLs: Purl.
All the regex stuff is hidden inside, and you get something like:
> url = $.url("http://markdown.com/awesome/language/markdown.html?show=all#top");
> url.attr('source');
"http://markdown.com/awesome/language/markdown.html?show=all#top"
> url.attr('protocol');
"http"
> url.attr('host');
"markdown.com"
> url.attr('relative');
"/awesome/language/markdown.html?show=all#top"
> url.attr('path');
"/awesome/language/markdown.html"
> url.attr('directory');
"/awesome/language/"
> url.attr('file');
"markdown.html"
> url.attr('query');
"show=all"
> url.attr('fragment');
"top"

Browsers have come a long way since this question was first asked. You can now use the native URL interface to accomplish this:
const url = new URL('http://www.somesite.se/blah/sdgsdgsdgs')
console.log(url.host) // "www.somesite.se"
console.log(url.href) // "http://www.somesite.se/blah/sdgsdgsdgs"
console.log(url.origin) // "http://www.somesite.se"
console.log(url.pathname) // "/blah/sdgsdgsdgs"
console.log(url.protocol) // "http:"
// etc.
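The query string and fragment are exposed the same way. A short sketch, with a made-up query (name, tag) and fragment added to the question's URL:

```javascript
const url = new URL("http://www.somesite.se/blah/sdgsdgsdgs?name=value&tag=a&tag=b#anchor");

console.log(url.search); // the raw query string, including "?"
console.log(url.searchParams.get("name")); // first value for "name"
console.log(url.searchParams.getAll("tag")); // every value for "tag"
console.log(url.hash); // the fragment, including "#"
```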
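Be aware that IE does not support this API, but you can polyfill it, for example with polyfill.io as shown below. Note that in 2024 the polyfill.io domain changed hands and was found serving malicious code, so self-host the polyfill or use a trusted mirror instead: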
<script crossorigin="anonymous" src="https://polyfill.io/v3/polyfill.min.js?flags=gated&features=URL"></script>
