How to extract a URL from a tweet using regular expressions - javascript

OK, so I'm executing the following line of code in JavaScript:
RegExp('(http:\/\/t.co\/)[a-zA-Z0-9\-\.]{8}').exec(tcont);
where tcont is some string like 'Test tweet to http://t.co/GXmaUyNL' (the content of a tweet obtained with jQuery).
However, in the case above for example, it returns 'http://t.co/GXmaUyNL,http://t.co/'.
This is frustrating because I want the URL without the bit on the end (everything after and including the comma).
Any ideas why this is appearing? Thanks

First, get rid of the parens in the pattern - they're unnecessary:
RegExp('http:\/\/t.co\/[a-zA-Z0-9\-\.]{8}').exec(tcont);
Second, a regex match returns an array of matching groups - you want the first item in it (the entire match):
var match = RegExp('http:\/\/t.co\/[a-zA-Z0-9\-\.]{8}').exec(tcont);
if (match) {
    var result = match[0];
}
The reason you had "a part on the end" is that your result is actually an array: the parens in your expression created an extra capturing group (the portion they were around), which is match[1]. When the array is converted to a string, its items are joined with commas, which is why you saw 'http://t.co/GXmaUyNL,http://t.co/'.
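As a minimal sketch of the point above, the snippet below runs both patterns against the sample tweet from the question. It uses regex literals instead of the RegExp constructor (an assumption on my part, so that the dot in t.co can be escaped), and the variable names withGroup, noGroup and url are illustrative only.

```javascript
const tcont = "Test tweet to http://t.co/GXmaUyNL";

// With the parentheses: index 0 is the full match, index 1 is the group.
const withGroup = /(http:\/\/t\.co\/)[a-zA-Z0-9\-.]{8}/.exec(tcont);
console.log(withGroup[0]);       // "http://t.co/GXmaUyNL"
console.log(withGroup[1]);       // "http://t.co/"
// Converting the array to a string joins the items with commas.
console.log(String(withGroup));  // "http://t.co/GXmaUyNL,http://t.co/"

// Without the parentheses: exec still returns an array (or null),
// so take index 0 after checking for a match.
const noGroup = /http:\/\/t\.co\/[a-zA-Z0-9\-.]{8}/.exec(tcont);
const url = noGroup ? noGroup[0] : null;
console.log(url);                // "http://t.co/GXmaUyNL"
```

Checking for null before indexing matters because exec returns null when the tweet contains no matching link.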

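Try this (a regex literal, so that \. really escapes the dot in t.co; inside a quoted string the backslash is dropped): /http:\/\/t\.co\/[a-zA-Z0-9\-.]{8}/.exec(tcont);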

Related

Global flag returns one result while no flag returns multiple (JS Regex HELP) [duplicate]

I have the following code:
var str = "$123";
var re = /(\$[0-9]+(\.[0-9]{2})?)/;
var found = str.match(re);
alert(found[1]);
alert(found[0]);
I am trying to understand why found[0] and found[1] would contain $123. Why does it get it twice?
I would like to get all the "potential" prices just once, so for example if I have this string:
var str = "$123 $149 $150"; It would be:
found[0] = $123
found[1] = $149
found[2] = $150
And that would be it, the array found would not have more matches.
What is happening here? What am I missing?
That's because of the parentheses around the whole expression: they define a capturing group.
When you don't use the g flag, match returns an array containing:
the whole match (the part of the string that matches the pattern)
the captured group(s)
Here the first captured group is the same as the whole match.
What you seem to want is
"$123 $149 $150".match(/\$\d+(\.\d{0,2})?/g)
which returns
["$123", "$149", "$150"]
Reference: the MDN documentation on regular expressions and flags
The first is the full match.
The second represents the outer subgroup you defined, which is the same as the full match in your case.
That particular subgroup doesn't really seem necessary, so you should be able to remove it. The inner group doesn't have a match for your particular string.
FYI, if you want to use a group, but make it non-capturing, you can add ?: inside the start of it.
var re = /(?:\$[0-9]+(\.[0-9]{2})?)/;
Again, the group here isn't doing you much good, but it shows the ?: in use.
Add the g flag to the end of your regex. Otherwise only the first match will be returned. With g, match returns only the full matches, and subgroups are not included. You do not need them anyway; the outer parentheses in your regex do not actually do anything.
var re = /\$[0-9]+(\.[0-9]{2})?/g;
You can explicitly suppress subgroup capture with (?:, but it doesn't matter with the g flag.
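To make the difference described in these answers concrete, here is a small sketch that runs the question's pattern with and without the g flag on the three-price string. The variable names first and all are illustrative only.

```javascript
const str = "$123 $149 $150";

// No g flag: one match, returned as [full match, group 1, group 2].
const first = str.match(/(\$[0-9]+(\.[0-9]{2})?)/);
console.log(first[0]); // "$123" (full match)
console.log(first[1]); // "$123" (outer group, same text as the full match)
console.log(first[2]); // undefined (the optional decimal group did not take part)

// g flag: every full match, and no capture groups in the result.
const all = str.match(/\$[0-9]+(?:\.[0-9]{2})?/g);
console.log(all);      // ["$123", "$149", "$150"]
```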

regular expression find end of string

I'm having trouble with a regular expression.
I want to replace all occurrences of myData=xxxx&. The xxxx part can change, but it always ends with &, except in the last occurrence, where it is just myData=xxx.
var data = "the text myData=data1& and &myData=otherData& and end myData=endofstring"
data.replace(/myData=.*?&/g,'newData');
it returns :
the text newData and &newData and end myData=endofstring
which is correct, but how can I detect the last one?
Two things:
You need to assign the result of replace somewhere, which you're not doing in your question's code
You can use an alternation (|) to match either & or end of string
So:
var data = "the text myData=data1& and &myData=otherData& and end myData=endofstring"
data = data.replace(/myData=.*?(?:&|$)/g,'newData');
// 1: data = ... assigns the result of replace back to data
// 2: (?:&|$) matches either & or the end of the string
console.log(data);
Note the use of a non-capturing group ((?:...)), to limit the scope of the alternation.
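A short sketch comparing the question's pattern with the alternation version on the same input string may help. The names withoutEnd and withEnd are illustrative only.

```javascript
const data = "the text myData=data1& and &myData=otherData& and end myData=endofstring";

// Original pattern: requires a trailing &, so the last occurrence is skipped.
const withoutEnd = data.replace(/myData=.*?&/g, "newData");
console.log(withoutEnd);
// "the text newData and &newData and end myData=endofstring"

// Alternation: the lazy .*? stops at the first & or, failing that, at the end of the string.
const withEnd = data.replace(/myData=.*?(?:&|$)/g, "newData");
console.log(withEnd);
// "the text newData and &newData and end newData"
```

Because strings are immutable, replace returns a new string; the original data is unchanged, which is why the result has to be assigned somewhere.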
What about :
data="myData=abc& and linked with something else";
data.replace(/myData=.*?&/g,'newData');
https://jsfiddle.net/ob8c2j9v/

Find and Replace all occurrences of a phrase in a json string using capturing groups

I have a stringified JSON which looks like this:
...
"message":null,"elementId":["xyz1","l9ie","xyz1"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","xyz1","jkh9"]}}]}]
... it goes on.
The part I need to work on is the value of the elementId key. (The 2nd key in the first line, and the last key).
This key is present in multiple places in the JSON string. The value of this key is an array containing 4-character ids.
I need to replace one of these ids with a new one.
The kernel of the idea is something like:
var elemId = 'xyz1' // for instance
var regex = new RegExp(elemId, 'g');
var newString = jsonString.replace(regex, newRandomId);
jsonString = newString;
There are a couple of problems with this approach. The regex will match the id anywhere in the JSON. I need a regex which only matches it inside the elementId array; and nowhere else.
I'm trying to use a capturing group to match just the occurrences I need, but I can't quite crack it. I have:
/.*elementId":\[".*(xyz1).*"\]}}]/
But this doesn't match the first occurrence of 'xyz1' in the array.
So, firstly, I need a regex which can match all the 'xyz1's inside elementId; but nowhere else. The sequence of square and curly brackets after elementId ends doesn't change anywhere in the string, if that helps.
Secondly, even if I have a capturing group that works, string.replace doesn't act as expected. Instead of replacing just the match inside the capturing group, it replaces the whole match.
So, my second requirement is replacing only the captured groups, not the whole match.
What I need is a piece of JS code which will replace my 'xyz1's where needed and return the following string (assuming the newRandomId is 'abcd'):
"message":null,"elementId":["abcd","l9ie","abcd"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","abcd","jkh9"]}}]}]
Note that the value of 'sourceId' is unaffected.
EDIT: I have to work with the JSON. I can't parse it and work with the object since I don't know all the places the old id might be in the object and looping through it multiple times (for multiple elements) would be time-consuming
Assuming you can't just parse and change the JS object, you could use two regexes: one to extract the array and one to change the desired ids inside it:
var output = input.replace(/("elementId"\s*:\s*\[)((?:".{4}",?)*)(\])/g, function(_,start,content,end){
    return start + content.replace(/"xyz1"/g, '"rand"') + end;
});
The arguments _, start, content, end are produced as a result of the regex (documentation here):
_ is the whole matched string (from "elementId":[ to ]). I chose this name because it's an old convention for arguments you don't use
start is the first captured group ("elementId":[)
content is the second captured group, that is the internal part of the array
end is the third group, ]
Using the groups instead of hardcoding the start and end parts in the returned string serves two purposes
avoid duplication (DRY principle)
make it possible to have variable strings (for example in my regex I accept optional spaces after the :)
var input = document.getElementById("input").innerHTML.trim();
var output = input.replace(/("elementId":\s*\[)((?:".{4}",?)*)(\])/g, function(_,start,content,end){
    return start + content.replace(/"xyz1"/g, '"rand"') + end;
});
document.getElementById("output").innerHTML = output;
Input:
<pre id=input>
"message":null,"elementId":["xyz1","l9ie","xyz1"]}}]}], "startIndex":"1",
"transitionTime":"3","sourceId":"xyz1","isLocked":false,"autoplay":false
,"mutevideo":false,"loopvideo":false,"soundonhover":false,"videoCntrlVisibility":0,
...,"elementId":["dgff","xyz1","jkh9"]}}]}]
</pre>
Output:
<pre id=output>
</pre>
Notes:
it would be easy to do the whole operation in one regex if there weren't repetitions of the searched id in one array. But the present structure makes it easy to handle several ids to replace at once.
I use non-capturing groups (?:...) in order to unclutter the arguments passed to the external replacing callback
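The two-step replacement above can also be sketched without the DOM, as a standalone function. The helper name replaceElementId and its parameters are hypothetical, and the inner step uses split/join instead of a second regex (my substitution, so the old id needs no regex escaping). The 4-character id assumption comes from the question.

```javascript
// Hypothetical helper: replaces oldId with newId only inside "elementId" arrays.
function replaceElementId(json, oldId, newId) {
  // Group 1: the key and opening bracket; group 2: the quoted 4-char ids; group 3: the closing bracket.
  const arrayPattern = /("elementId"\s*:\s*\[)((?:".{4}",?)*)(\])/g;
  return json.replace(arrayPattern, (_, start, content, end) =>
    // split/join swaps every quoted occurrence of oldId within this one array.
    start + content.split(`"${oldId}"`).join(`"${newId}"`) + end);
}

const input =
  '"message":null,"elementId":["xyz1","l9ie","xyz1"]}}]}],"sourceId":"xyz1",' +
  '"elementId":["dgff","xyz1","jkh9"]}}]}]';

const output = replaceElementId(input, "xyz1", "abcd");
console.log(output);
// "sourceId" keeps "xyz1"; both elementId arrays now contain "abcd".
```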

`match` and `exec` with non-global regex appear to return the first match twice

I do not quite understand the behaviour of the JavaScript regex methods.
The problem is that I can’t get regexes of type /(something|something)/ to work with the match or exec methods without the global identifier, e.g. /(somereg1|somereg2)/g.
When the global identifier is there, the methods correctly return every instance it finds. But when it is not there, both methods correctly return only the first match they find. The problem is that they appear to return it twice. For instance:
const str = "Here is somereg1 and somereg2";
str.match(/(somereg1|somereg2)/)
I would expect this match call to return "somereg1". Instead it appears to return "somereg1,somereg1".
Check this JSFiddle. The code should be fairly self explanatory. The first example is taken from W3Schools.
The first element is the full match of the regex. If you tried this:
const str = "Here is somereg1 and somereg2";
str.match(/.*(somereg1|somereg2)/)
Your result would be [ "Here is somereg1 and somereg2", "somereg2" ].
This same behaviour occurs with an .exec(str) method call.
You might want to read about .match and .exec.
About the “sub parentheses matches”: in regexes, parentheses delimit capture groups. So, if you had this regex:
/.*(somereg1).*?(somereg2)/
Your .match result would be [ "Here is somereg1 and somereg2", "somereg1", "somereg2" ]. So, as you can see, the result array consists of the full match followed by all capture groups matches.
And to force a group not to be captured, just delimit with (?: and ):
"Here is somereg1 and somereg2".match(/.*(?:somereg1).*?(somereg2)/);
// Will result in [ "Here is somereg1 and somereg2", "somereg2" ].
Note that the g (global) flag changes the return semantics of match: it will return an array of full matches, and capture groups will be ignored. exec, on the other hand, always returns the full match and capture group matches of the next match found starting at the current lastIndex of the RegExp instance. For convenience, matchAll can be used instead, which returns an iterator of all matches, including all capture groups.
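The four behaviours described in this answer can be compared side by side. This is a sketch using the question's string; the variable names single, globalMatches, viaExec and viaMatchAll are illustrative only.

```javascript
const str = "Here is somereg1 and somereg2";

// match without g: [full match, capture group 1] for the first match only.
const single = str.match(/(somereg1|somereg2)/);
console.log(single[0], single[1]); // "somereg1" "somereg1"

// match with g: full matches only, capture groups are dropped.
const globalMatches = str.match(/(somereg1|somereg2)/g);
console.log(globalMatches);        // ["somereg1", "somereg2"]

// exec with g: one match per call, advancing lastIndex until it returns null.
const reG = /(somereg1|somereg2)/g;
const viaExec = [];
let m;
while ((m = reG.exec(str)) !== null) {
  viaExec.push({ full: m[0], group: m[1], index: m.index });
}
console.log(viaExec.length);       // 2

// matchAll (requires the g flag): an iterator over all matches with their groups.
const viaMatchAll = [...str.matchAll(/(somereg1|somereg2)/g)].map((match) => match[1]);
console.log(viaMatchAll);          // ["somereg1", "somereg2"]
```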
You can use the following to get the required result. The lookahead makes the full match an empty string, so the text you want is in capture group 1:
var str = "Here is somereg1 and somereg2";
str.match(/(?=(somereg1|somereg2))/)
As for match versus exec: I would go for match with a regex literal, since a literal saves you from the double escaping needed when the pattern is written as a string.
Modify your second line as below:
str.match(/somereg1|somereg2/)

Remove part of attribute value with jquery or javascript

There is a data parameter for a div that looks as follows:
<div data-params="[possibleText&]start=2011-11-01&end=2011-11-30[&possibleText]">
</div>
I want to remove everything from start through the end of the second date from that data-params attribute. There may or may not be text before start and after the second date.
How can I accomplish this using javascript or jQuery? I know how to get the value of the "data-params" attribute and how to set it, I'm just not sure how to remove just that part from the string.
Thank you!
Note: The dates will not always be the same.
I'd use a regular expression:
var text = $('div').attr('data-params');
var dates = text.match(/start=\d{4}-\d{2}-\d{2}&end=\d{4}-\d{2}-\d{2}/)[0]
// dates => "start=2011-11-01&end=2011-11-30"
The regular expression is not too complex. The notation \d means "match any digit" and \d{4} means "match exactly 4 digits". The rest is literal characters. Finally, the [0] at the end is there because JavaScript's match returns an array where the first element is the whole match and the rest are subgroups. We don't have any subgroups and we do want the whole match, so we just grab the first element, hence [0].
If you wanted to pull out the actual dates instead of the full query string, you can create subgroups to match by adding parentheses around the parts you want, like this:
var dates = text.match(/start=(\d{4}-\d{2}-\d{2})&end=(\d{4}-\d{2}-\d{2})/)
// dates[0] => "start=2011-11-01&end=2011-11-30"
// dates[1] => "2011-11-01"
// dates[2] => "2011-11-30"
Here, dates[1] is the start date (the first subgroup, based on the parentheses) and dates[2] is the end date (the second subgroup).
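As a sketch of the subgroup approach, here is a version that does not depend on jQuery and guards against a missing match (match returns null in that case, so indexing it directly would throw). The helper name extractDates and the sample string are hypothetical.

```javascript
// Hypothetical helper: returns { start, end } or null when the pattern is absent.
function extractDates(params) {
  const m = params.match(/start=(\d{4}-\d{2}-\d{2})&end=(\d{4}-\d{2}-\d{2})/);
  if (m === null) return null;
  // Skip index 0 (the whole match) and take the two subgroups.
  const [, start, end] = m;
  return { start, end };
}

const dates = extractDates("foo=1&start=2011-11-01&end=2011-11-30&bar=2");
console.log(dates);                       // { start: "2011-11-01", end: "2011-11-30" }
console.log(extractDates("foo=1&bar=2")); // null
```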
My regex skills aren't that good but this should do it
var txt = "[possibleText&]start=2011-11-01&end=2011-11-30[&possibleText]";
var requiredTxt = txt.replace(/^(.*)start=\d{4}-\d{2}-\d{2}&end=\d{4}-\d{2}-\d{2}(.*)$/, "$1$2");
I'm sure there are better ways to match your string with regex, but the $1 and $2 will put the first group and second group match into your requiredTxt stripping out the start/end stuff in the middle.
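Since replace only touches the matched part, the same removal can be sketched without the two surrounding groups. The helper name stripDateRange and the sample strings are hypothetical; the date pattern is the one used in the answers above.

```javascript
// Hypothetical helper: removes the start=...&end=... part and leaves the rest as-is.
function stripDateRange(params) {
  return params.replace(/start=\d{4}-\d{2}-\d{2}&end=\d{4}-\d{2}-\d{2}/, "");
}

console.log(stripDateRange("a=1&start=2011-11-01&end=2011-11-30&b=2")); // "a=1&&b=2"
console.log(stripDateRange("start=2011-11-01&end=2011-11-30"));         // ""
console.log(stripDateRange("a=1&b=2"));                                 // "a=1&b=2" (no match, unchanged)
```

Note that the separators around the removed part are left in place (hence the doubled & in the first result), so they may need separate cleanup depending on how the attribute is used.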
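Say you have your data-params in a variable foo. Call foo.match as follows (note that the pattern requires a ? or & immediately before start, so it returns null when the value begins with start=):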
foo.match("[\\?&]start=([^&#]*)"); //returns ["&start=2011-11-01", "2011-11-01"]
foo.match("[\\?&]end=([^&#]*)"); //returns ["&end=2011-11-30", "2011-11-30"]
