Remove everything after the second occurrence, even if a dash occurs in the domain - JavaScript

I have links like these, for example:
abc.pri-from-somestring
abc-de.pri-idx-somestring
abc.org.au-sop-somestring
The result I'm looking for from the example links is:
abc.pri
abc-de.pri
abc.org.au
What I'm trying to get is the domain out of those string combinations.
The main problem is that the links can contain dashes or two dots, so I'm stuck.
I tried many splits and joins, but there are too many combinations.
I think the best way is a regex, but I don't have enough experience with them.
Any other solution would be fine.
This works, but only if the first dash in the string is the one right after the domain:
^([^-\d]+)
Any help would be much appreciated.
Thanks

Something like this should work:
^.*[.][^-]*
Basically, just get everything up to and including the last dot, and then get everything after it up to but excluding the next dash (if one exists).
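As a minimal sketch of how that pattern could be applied in JavaScript (the helper name extractDomain is hypothetical, and it assumes the part after the domain never contains a dot):

```javascript
// Hypothetical helper: returns the domain part of a link, or null if no dot is found.
// Assumes the trailing "-xxx-somestring" part never contains a dot.
function extractDomain(link) {
  // Greedy ^.* runs to the last dot; [^-]* then stops at the next dash.
  const match = link.match(/^.*[.][^-]*/);
  return match ? match[0] : null;
}

console.log(extractDomain("abc.pri-from-somestring"));   // abc.pri
console.log(extractDomain("abc-de.pri-idx-somestring")); // abc-de.pri
console.log(extractDomain("abc.org.au-sop-somestring")); // abc.org.au
```

If the trailing string can itself contain a dot, the greedy match would run too far and a stricter pattern would be needed.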

Related

Regular Expression for an entire text

I have been working on a regular expression to divide a text into sentences, but I have been having problems with numbers like 13.4 and with emails, because of the '.'. Would someone know how to fix it?
/([^\n\r\.!\?:;]+[\.!\?:;]\s)|([^\.!\?]+$)/g
As #WiktorStribizew said, this task is too hard to be completed with regex alone. But you can still get an approximation by matching punctuation followed by spaces, new lines, or the end of the text:
/.*?[.!?:;](\s|[\r\n]|$)/gm
You can see an example here.
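A rough sketch of that approximation in use (the function name splitSentences and the sample text are made up for illustration). Because the punctuation must be followed by whitespace or the end of a line, the dot in 13.4 does not end a sentence:

```javascript
// Hypothetical helper: approximate sentence splitting with the pattern above.
function splitSentences(text) {
  const matches = text.match(/.*?[.!?:;](\s|[\r\n]|$)/gm);
  // Trim the trailing whitespace that the pattern consumes after each sentence.
  return matches ? matches.map((sentence) => sentence.trim()) : [];
}

console.log(splitSentences("It costs 13.4 dollars. Is it ok? Yes!"));
// [ 'It costs 13.4 dollars.', 'Is it ok?', 'Yes!' ]
```

This remains an approximation: abbreviations such as "e.g. this" would still be split.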

Improvement of JS Regex to restrict all letters of a word in a specific range

I'm solving the Ranges challenge in RegexGolf, but I'm somewhat stuck in trying to shorten the regex.
Here is a screenshot of the conditions -
My current solution is \b[a-f]+\b. This pattern wraps the required range [a-f] in word boundaries. While this works, the regex has 10 characters, and the result list shows submissions with 8 characters, and even 1 character.
I would appreciate any insights on improving this regex.
First, please note that shorter doesn't necessarily mean better, faster, or more readable. But as this is a golfing challenge:
This site seems to handle every input as a separate string. While the word boundaries you are using are fine, using start and end of string anchors (^ and $) will be 1 character shorter each. I don't see how it could be minimized further, so your regex could be
^[a-f]+$
Note: One of the 1-score solutions carries the comment "i dont know regex but i know javascript", so I'd guess that there was some cheating involved.
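A quick way to check that the two patterns agree when every input is a single word, as assumed above (the sample words here are invented, not the actual RegexGolf list):

```javascript
// Both patterns, for comparison. The sample words are hypothetical.
const withBoundaries = /\b[a-f]+\b/;
const withAnchors = /^[a-f]+$/;
const samples = ["abac", "faded", "beam", "mean"];

for (const word of samples) {
  console.log(word, withBoundaries.test(word), withAnchors.test(word));
}

// Pattern lengths in characters: 10 versus 8.
console.log(withBoundaries.source.length, withAnchors.source.length);
```

The two only diverge on inputs containing several words, where \b can match a single word inside the string but ^ and $ cannot.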

How to break up non-english text into constituent characters in javascript?

I am trying to draw text along a curve on an HTML5 canvas. To do this, I need to break up the input text into constituent characters which can individually be rotated, translated, etc. Breaking up text is easy for English: given an input string s, s[i] gives the ith character. But this does not work for non-English strings. I have a jsfiddle here illustrating the problem: http://jsfiddle.net/c6HV8/. Note that the fiddle appears differently in Chrome and IE at the time of this writing. To see what the problem is, consider that you have non-English text in a string s. Create a text node to which you pass s. Next, create a text node for each s[i] and display the text nodes adjacent to each other. Now compare the results. They are not the same. How can I break up non-English text into constituent characters in JavaScript, so that the two results are the same?
भाईसाब :) So as I'm sure you already know, the problem is that fillText and createTextNode both work on the entire string, and so they are able to evaluate the string along with all the diacritic marks (combining characters). However, when you call fillText and createTextNode per character, none of the diacritics appear along with the characters they are supposed to be attached to. Hence they are evaluated and drawn individually, which is why you see the diacritic along with the dotted circle (kind of a placeholder that says: put a character here).
There is no easy way to do this, really. Your algorithm would basically have to be like this:
1. Look up the current character from the string.
2. Find all successive characters that are diacritics and then combine all of them into a new string.
3. Render that string using fillText.
You can check out the results here on a forked version of your fiddle. I modified the sample text to add some more complex characters just to make sure that the algorithm works properly. The code could definitely be cleaned up; I just did it as a proof-of-concept.
The hard part is coming up with a list of code-points for diacritics for all languages if you want to internationalize this. This answer provides a list that should help you get started.
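The grouping steps of that algorithm could be sketched as follows. This is not the code from the fiddle: the function name splitIntoClusters is hypothetical, and instead of a hand-made list of code points it assumes an engine that supports Unicode property escapes (\p{M}, the "Mark" category), which is only an approximation of what counts as a diacritic:

```javascript
// Hypothetical helper: groups each base character with the combining marks that follow it.
// Assumes support for Unicode property escapes (ES2018) and treats \p{M} as "diacritic".
function splitIntoClusters(text) {
  const clusters = [];
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    if (clusters.length > 0 && /^\p{M}$/u.test(ch)) {
      // A combining mark: attach it to the previous cluster.
      clusters[clusters.length - 1] += ch;
    } else {
      // A base character: start a new cluster.
      clusters.push(ch);
    }
  }
  return clusters;
}

// Each cluster can then be passed to fillText on its own.
console.log(splitIntoClusters("\u092D\u093E\u0908")); // two clusters: "भा" and "ई"
```

Scripts with conjuncts or joiners need more than this simple rule, so the result should be checked against the target languages.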

Time / Date Recognition with jQuery

I was wondering if anyone has any ideas, or has stumbled upon a script that will recognize portions of dates and times in any given field of text.
For instance, this sentence right here was being typed at 4:24pm and I suspect I will finish it at about 4:25 or so.. perhaps even later on the 19th.
I would like to be able to make a live listener that will pick out those times above (or guess) and surround them in a link to say... /calendar/events/create/TIMESTAMP
I expect regex could be used to find certain indicators like : or th, or anything like that and take a guess at the rest pending the current time..
Macs do this in mail and icalendar.. pretty cool. Thoughts or ideas would be greatly appreciated!
Have a look at the date.js library, which supports a wide variety of formats.
Even with this library you will need to:
- Preprocess the text (for example, remove dots at the end of sentences and extra whitespace).
- Filter out false positives (it can parse "th" as Thursday, "say" as Saturday, etc.).
- Create an algorithm for scanning the text efficiently.
- Map the found times/dates back to the text, etc.
So there is still a lot of work to do.
To see a very ineffective demonstration of the parsing have a look HERE.
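As a starting point for the scanning step, here is a small sketch in plain JavaScript that only looks for the indicators mentioned in the question (a colon between digits, or an ordinal suffix such as "th"). It does not use date.js, the function name findTimeCandidates is hypothetical, and the matches are only candidates that would still need parsing and filtering:

```javascript
// Hypothetical helper: finds time-like ("4:24pm", "4:25") and day-like ("19th") candidates.
// Returns each candidate with its position so it can be mapped back to the text.
function findTimeCandidates(text) {
  const pattern = /\b\d{1,2}:\d{2}(?:\s*(?:am|pm))?\b|\b\d{1,2}(?:st|nd|rd|th)\b/gi;
  return Array.from(text.matchAll(pattern), (m) => ({ text: m[0], index: m.index }));
}

const sentence =
  "this sentence right here was being typed at 4:24pm and I suspect " +
  "I will finish it at about 4:25 or so.. perhaps even later on the 19th.";

console.log(findTimeCandidates(sentence).map((c) => c.text));
// [ '4:24pm', '4:25', '19th' ]
```

The stored index makes it possible to wrap each candidate in a link once it has been turned into a timestamp.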

Remove Jargon but keep real characters

I'm getting bombarded by spam with posts like the one below, so what would be the best and most efficient way of removing all the jargon from something like this:
<textarea id="comment">ȑ̉̽ͧ̔͆ͦ̊͛̿͗҉̷̢̧̫̗̗͎͈͕e̷̪͓̼̼̣̻̻͙͔̳̘̗͙̬̱͎ͭ̃͗ͩͯͥͬ̂ͧ͐͌̑̅͢͜ͅd̴̦̺̖̣͎̲̥͕̗̺̯̤͗ͬ͌ͧ̓͒ͭ́̋ͩͥ͊̇̓̌ͫ̃́́͠</textarea>
I'm assuming RegEx, but what exactly are those things called and how would they be referenced in a RegExp? The problem lies within a <textarea> tag, and upon retrieving the value, I'd like to be able to remove all that jargon from the value and have it only display the real characters, which in this case should be red.
Allowing other kinds of Unicode characters is essential, but not characters that stack on top of each other.
Zalgo waits behind the wall.
You want to filter out combining characters, such as the diacritical marks listed here.
You should be able to get away with a simple character class pattern match, i.e.:
fooString.replace(/[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g, "");
If you want to limit content to one combination per character (not that this really alleviates all negative side-effects), you could simply use
fooString.replace(/([\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f])[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]*/g, "$1");
EDIT: Added a number of other combining character ranges. This is most likely still not exhaustive.
Removing combining diacriticals will make input of some languages (such as Vietnamese) difficult or impossible, so you should reconsider.
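Both variants could be wrapped as small helpers, sketched below with hypothetical function names. The character ranges are the ones listed above and are probably not exhaustive. The patterns use the global flag so that every match is replaced, and since replace returns a new string, the result has to be assigned or returned:

```javascript
// Combining-mark ranges from the answer above (likely not exhaustive).
const MARKS = "\\u0300-\\u036f\\u0483-\\u0489\\u1dc0-\\u1dff\\u20d0-\\u20ff\\ufe20-\\ufe2f";

// Hypothetical helper: removes every combining mark in the listed ranges.
function stripCombiningMarks(input) {
  return input.replace(new RegExp("[" + MARKS + "]", "g"), "");
}

// Hypothetical helper: keeps only the first mark of each run of combining marks.
function limitCombiningMarks(input) {
  return input.replace(new RegExp("([" + MARKS + "])[" + MARKS + "]*", "g"), "$1");
}

console.log(stripCombiningMarks("r\u0309e\u0337d\u0334"));     // red
console.log(limitCombiningMarks("e\u0301\u0302\u0303").length); // 2
```

Precomposed characters such as ȑ (U+0211) are single code points, so they pass through these helpers unchanged.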
