Regex to get the text after first "space"? - javascript

Here is a string I need to parse using regex.
http://carto1.wallonie.be/documents/terrils/fiche_terril.idc?TERRIL_id=1 Crachet 7/12
In fact this is a URL followed by one space and some text.
I need to extract the URL and the text separately.
To extract the url \S+ is working just fine.
But to extract the text after first space, it gets really hard to understand.
I am using Yahoo Pipes.
EDIT:
Using (\S+) (.+) gives me something weird:

According to the Pipes documentation, it looks like it uses fairly standard regex syntax. Try this:
^(\S+)\s(.+)$
Then the URL will be $1 and the comment will be $2. The . operator matches any character, which you will need since it looks like the comments may have spaces.
EDIT: changed from literal space to \s since you might be looking at some odd whitespace character(s). You might as well throw a ^ and $ in there too, so the match fails instead of doing something weird.
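For anyone trying the same pattern outside Pipes, here is a minimal JavaScript sketch. The variable names are made up for illustration, and it assumes the input always has the shape shown in the question (URL, whitespace, text):

```javascript
// Sample input taken from the question: a URL, one space, then free text.
const input =
  "http://carto1.wallonie.be/documents/terrils/fiche_terril.idc?TERRIL_id=1 Crachet 7/12";

// ^(\S+)  captures the leading run of non-whitespace (the URL)
// \s      consumes the first whitespace character
// (.+)$   captures everything after it, spaces included
const match = input.match(/^(\S+)\s(.+)$/);

// match is null when the input does not have the expected shape.
const url = match ? match[1] : null;
const text = match ? match[2] : null;

console.log(url);
console.log(text);
```

Because of the anchors, an input without any whitespace yields no match at all rather than a partial one.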

Related

Regex get last 2 characters pipe not working

I'm creating a regex expression to get the variables passed to a JavaScript constructor.
The input is always going to follow along these lines:
app.use(express.static('public'));
And the regex I plan to use to strip out the unnecessary parts is:
(^app.use\()|(..$)
The first part of the regex gets everything up to the first parenthesis, and then it's supposed to pipe it to another expression which gets the last 2 characters of the string.
My issue is that it seems to be ignoring the second regex. I tried a few other expressions in the second part and they worked, but this one doesn't.
What am I doing wrong?
Regex example on Regex101: https://regex101.com/r/jV9eH6/3
UPDATE:
This is not a duplicate of How to replace all occurrences of a string in JavaScript?
My question is about a specific issue with a regex, not about replacing one string with another in JavaScript.
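You need to use the multiline modifier. By default the anchors ^ and $ match only at the start and end of the whole input; with the m modifier they also match at the start and end of each line. The g modifier makes the regex apply to every match instead of stopping at the first one.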
/(^app.use\()|(..$)/gm
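A small sketch of how the flags behave with String.prototype.replace. The sample strings are assumptions based on the question, and the dot in app.use is escaped here so it only matches a literal dot, which is a small deviation from the pattern as posted:

```javascript
const line = "app.use(express.static('public'));";

// Without g, replace() stops after the first match,
// so only the leading "app.use(" is removed.
const firstOnly = line.replace(/(^app\.use\()|(..$)/, "");

// With g, both alternatives get a chance to match:
// the prefix and the trailing two characters are removed.
const both = line.replace(/(^app\.use\()|(..$)/g, "");

// With several lines in one string, m lets ^ and $ match at each line boundary.
const lines = "app.use(a);\napp.use(b);";
const perLine = lines.replace(/(^app\.use\()|(..$)/gm, "");

console.log(firstOnly);
console.log(both);
console.log(perLine);
```

On a single-line input it is the g flag that makes the second alternative take effect; m only matters once the input contains line breaks.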

How to write a RegEx to check for a certain, specific number of characters?

I am trying to test a string for a state code, the regex I have is
^A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]$
The issue is, if I have something like "CTA12" as a test string, it will get a match of CT. How can I modify my regex to make it only match state codes that are not part of a larger string?
Your use of anchors with alternation is incorrect: ^AB|DC$ means "strings that start with AB or end with DC". To get the ^ and $ to both apply to each element of the alternation, you need to put the alternation in a group, for example ^(AB|DC)$.
Try changing your regex to the following:
^(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
The alternative to using a group is to put the ^ and $ as a part of each element in the alternation, for example ^AB$|^DC$, but that would make your regex significantly longer so a group is the way to go.
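To see the difference in JavaScript, here is a quick sketch that builds both variants from the same alternation string (the variable names are just for illustration):

```javascript
// The alternation from the question, without any anchors.
const alternatives =
  "A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|" +
  "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]";

// Ungrouped: ^ binds only to the first alternative and $ only to the last.
const ungrouped = new RegExp("^" + alternatives + "$");

// Grouped: ^ and $ apply to every alternative.
const grouped = new RegExp("^(" + alternatives + ")$");

console.log(ungrouped.test("CTA12")); // true, C[AOT] matches "CT" anywhere
console.log(grouped.test("CTA12"));   // false
console.log(grouped.test("CT"));      // true
```

If the captured value is not needed, a non-capturing group (?:...) works the same way.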

Alternation operator inside square brackets does not work

I'm creating a JavaScript regex to match queries in a search engine string. I am having a problem with alternation. I have the following regex:
.*baidu.com.*[/?].*wd{1}=
I want to be able to match strings that have the string 'word' or 'qw' in addition to 'wd', but everything I try is unsuccessful. I thought I would be able to do something like the following:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
but it does not seem to work.
Replace [wd|word|qw] with (wd|word|qw) or (?:wd|word|qw).
[] denotes character sets, () denotes logical groupings.
Your expression:
.*baidu.com.*[/?].*[wd|word|qw]{1}=
does need a few changes, including [wd|word|qw] to (wd|word|qw) and getting rid of the redundant {1}, like so:
.*baidu.com.*[/?].*(wd|word|qw)=
But you also need to understand that the first part of your expression (.*baidu.com.*[/?].*) will match baidu.com hello what spelling/handle????????? or hbaidu-com/ or even something like lkas----jhdf lkja$##!3hdsfbaidugcomlaksjhdf.[($?lakshf, because the dot (.) matches any character except newlines. To match a literal dot, you have to escape it with a backslash (like \.).
There are several approaches you could take to match things in a URL, but we could help you more if you tell us what you are trying to do or accomplish - perhaps regex is not the best solution or (EDIT) only part of the best solution?
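A rough sketch of the difference between the character class and the group. The sample URLs are invented for illustration, and the leading and trailing .* are dropped because test() already searches anywhere in the string:

```javascript
// Character class: [wd|word|qw] matches ONE character out of w, d, |, o, r, q.
const withClass = /baidu\.com.*[\/?].*[wd|word|qw]{1}=/;

// Non-capturing group: matches one of the three whole parameter names.
const withGroup = /baidu\.com.*[\/?].*(?:wd|word|qw)=/;

// Hypothetical URLs, made up for this example.
const wanted = "http://www.baidu.com/s?word=regex";
const unwanted = "http://www.baidu.com/s?q=regex";

console.log(withGroup.test(wanted));   // true
console.log(withGroup.test(unwanted)); // false
console.log(withClass.test(unwanted)); // true, the single "q" is in the class
```

The escaped dot also means a string like baidugcom no longer slips through.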

Struggling with regex to match only two of a character, not three

I need to match all occurrences of // in a string with a JavaScript regex.
It must not match /// or a single /.
So far I have (.*[^\/])\/{2}([^\/].*)
which is basically "something that isn't /, followed by // followed by something that isn't /"
The approach seems to work apart from when the string I want to match starts with //
This doesn't work:
//example
This does
stuff // example
How do I solve this problem?
Edit: A bit more context - I am trying to replace // with !, so I am then using:
result = result.replace(myRegex, "$1 ! $2");
Replace two slashes that either begin the string or do not follow a slash,
and are followed by anything not a slash or the end of the string.
s=s.replace(/(^|[^/])\/{2}([^/]|$)/g,'$1!$2');
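Here is that replacement wrapped in a function so it is easy to try on a few inputs. The function name is made up; the pattern is the one above:

```javascript
// Replace "//" with "!" unless it is part of a longer run of slashes.
function replaceDoubleSlash(s) {
  // (^|[^/])  start of string, or one non-slash character (kept via $1)
  // \/{2}     exactly two slashes
  // ([^/]|$)  one non-slash character (kept via $2), or end of string
  return s.replace(/(^|[^/])\/{2}([^/]|$)/g, "$1!$2");
}

console.log(replaceDoubleSlash("//example"));        // "!example"
console.log(replaceDoubleSlash("stuff // example")); // "stuff ! example"
console.log(replaceDoubleSlash("example//"));        // "example!"
console.log(replaceDoubleSlash("a///b"));            // "a///b" (unchanged)
console.log(replaceDoubleSlash("a//b//c"));          // "a!b//c"
```

One thing to be aware of: the neighbouring characters are consumed by the match, so when two // are separated by exactly one character (the last line above) the second one is skipped.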
It looks like it wouldn't work for example// either.
The problem is because you're matching // preceded and followed by at least one non-slash character. This can be solved by anchoring the regex, and then you can make the preceding/following text optional:
^(.*[^\/])?\/{2}([^\/].*)?$
Use negative lookahead/lookbehind assertions:
(.*)(?<!/)//(?!/)(.*)
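A sketch of the lookaround idea as a JavaScript replace. Note that lookbehind in JavaScript needs an ES2018-capable engine, so this is an assumption about your runtime. The surrounding (.*) groups are dropped here because the assertions do not consume any characters, and the function name is hypothetical:

```javascript
// Replace "//" with "!" only when it is not preceded or followed by another slash.
function replaceDoubleSlashLookaround(s) {
  // (?<!\/)  not preceded by a slash (lookbehind, ES2018+)
  // \/\/     the two slashes themselves
  // (?!\/)   not followed by a slash
  return s.replace(/(?<!\/)\/\/(?!\/)/g, "!");
}

console.log(replaceDoubleSlashLookaround("//example")); // "!example"
console.log(replaceDoubleSlashLookaround("example//")); // "example!"
console.log(replaceDoubleSlashLookaround("a//b//c"));   // "a!b!c"
console.log(replaceDoubleSlashLookaround("a///b"));     // "a///b" (unchanged)
```

Since nothing around the slashes is consumed, adjacent occurrences such as a//b//c are both replaced.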
Use this:
/([^/]*)(\/{2})([^/]*)/g
e.g.
alert("///exam//ple".replace(/([^/]*)(\/{2})([^/]*)/g, "$1$3"));
EDIT: Updated the expression as per the comment.
/[/]{2}/
e.g:
alert("//example".replace(/[/]{2}/, ""));
This does not answer the OP's question about using regex, but since some of the original comments suggested using .replaceAll, since not everyone who reads the question in the future wants to use regex, since people might mistakenly assume that regex is the only alternative, and since these details cannot be accommodated by submitting a comment, here's a poor man's non-regex approach:
1. Temporarily replace the three contiguous characters with something that would never naturally occur (really important when dealing with user-entered values).
2. Replace the remaining two contiguous characters using .replaceAll().
3. Restore the original three contiguous characters.
For instance, let's say you wanted to remove all instances of ".." without affecting occurrences of "...".
var cleansedText = $(this).text().toString()
.replaceAll("...", "☰☸☧")
.replaceAll("..", "")
.replaceAll("☰☸☧", "...")
;
$(this).text(cleansedText);
Perhaps not as fast as regex for longer strings, but works great for short ones.
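The same idea without jQuery, as a plain function. This is only a sketch: the function name and the placeholder parameter are made up, and it assumes a runtime where String.prototype.replaceAll is available (ES2021):

```javascript
// Remove every ".." while leaving "..." untouched, without a regex.
// The placeholder must be a string that never occurs in the input.
function removeDoubleDots(text, placeholder = "\u2630\u2638\u2627") {
  return text
    .replaceAll("...", placeholder) // 1. protect the triple dots
    .replaceAll("..", "")           // 2. drop the remaining double dots
    .replaceAll(placeholder, "..."); // 3. put the triple dots back
}

console.log(removeDoubleDots("Wait.. what... ok..")); // "Wait what... ok"
```

If the placeholder could appear in user input, the result will be wrong, so pick it with care.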

JavaScript regex replace - but only part of matched string?

I have the following replace function
myString.replace(/\s\w(?=\s)/,"$1\xA0");
The aim is to take single-letter words (e.g. prepositions) and add a non-breaking space after them, instead of standard space.
However the above $1 variable doesn't work for me. It inserts text "$1 " instead of a part of original matched string + nbsp.
What is the reason for the observed behaviour? Is there any other way to achieve it?
$1 doesn't work because you don't have any capturing subgroups.
The regular expression should be something like /\b(\w+)\s+/.
Seems you want to do something like this:
myString.replace(/\s(\w)\s/,"$1\xA0");
but that way you will lose the whitespace before your single-letter word. So you probably want to also include the first \s in the capturing group.
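Putting that together, a minimal sketch with the leading whitespace inside the capturing group. The function name is hypothetical and the g flag is an addition, assuming you want every occurrence replaced rather than just the first:

```javascript
// Put a non-breaking space after single-letter words.
function bindSingleLetterWords(s) {
  // (\s\w)  whitespace plus one word character, kept via $1
  // \s      the following whitespace, replaced by U+00A0
  return s.replace(/(\s\w)\s/g, "$1\xA0");
}

const result = bindSingleLetterWords("This is a test of a thing");
console.log(result === "This is a\u00A0test of a\u00A0thing"); // true

// Without a capturing group, "$1" is inserted literally, as in the question.
const broken = "x a y".replace(/\s\w(?=\s)/, "$1\xA0");
console.log(broken === "x$1\u00A0 y"); // true
```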
