Get all the WORDS except one specific word

Get all the WORDS except one specific word - javascript

I want to get all the words, except one, from a string using JS regex match function. For example, for a string testhello123worldtestWTF, excluding the word test, the result would be helloworldWTF.
I realize that I have to do it using look-ahead functions, but I can't figiure out how exactly. I came up with the following regex (?!test)[a-zA-Z]+(?=.*test), however, it work only partially.
http://refiddle.com/refiddles/59511c2075622d324c090000

IMHO, I would try to replace the incriminated word with an empty string, no?

Lookarounds seem to be an overkill for it, you can just replace the test with nothing:
var str = 'testhello123worldtestWTF';
var res = str.replace(/test/g, '');

Plugging this into your refiddle produces the results you're looking for:
/(test)/g
It matches all occurrences of the word "test" without picking up unwanted words/letters. You can set this to whatever variable you need to hold these.

WORDS OF CAUTION
Seeing that you have no set delimiters in your inputted string, I must say that you cannot reliably exclude a specific word - to a certain extent.
For example, if you want to exclude test, this might create a problem if the input was protester or rotatestreet. You don't have clear demarcations of what a word is, thus leading you to exclude test when you might not have meant to.
On the other hand, if you just want to ignore the string test regardless, just replace test with an empty string and you are good to go.

Related

Splitting a string at question mark, exclamation mark, or period in javascript and retain those marks?

I was a bit surprised, that actually no one had the exact same issue in javascript...
I tried several different solutions none of them parse the content correctly.
The closest one I tried : (I stole its regex query from a PHP solution)
const test = `abc?aaa.abcd?.aabbccc!`;
const sentencesList = test.split("/(\?|\.|!)/");
But result just going to be
["abc?aaa.abcd?.aabbccc!"]
What I want to get is
['abc?', 'aaa.', 'abcd?','.', 'aabbccc!']
I am so confused.. what exactly is wrong?

/[a-z]*[?!.]/g) will do what you want:
const test = `abc?aaa.abcd?.aabbccc!`;
console.log(test.match(/[a-z]*[?!.]/g))

To help you out, what you write is not a regex. test.split("/(\?|\.|!)/"); is simply an 11 character string. A regex would be, for example, test.split(/(\?|\.|!)/);. This still would not be the regex you're looking for.
The problem with this regex is that it's looking for a ?, ., or ! character only, and capturing that lone character. What you want to do is find any number of characters, followed by one of those three characters.
Next, String.split does not accept regexes as arguments. You'll want to use a function that does accept them (such as String.match).
Putting this all together, you'll want to start out your regex with something like this: /.*?/. The dot means any character matches, the asterisk means 0 or more, and the questionmark means "non-greedy", or try to match as few characters as possible, while keeping a valid match.
To search for your three characters, you would follow this up with /[?!.]/ to indicate you want one of these three characters (so far we have /.*?[?!.]/). Lastly, you want to add the g flag so it searches for every instance, rather than only the first. /.*?[?!.]/g. Now we can use it in match:
const rawText = `abc?aaa.abcd?.aabbccc!`;
const matchedArray = rawText.match(/.*?[?!.]/g);
console.log(matchedArray);

The following code works, I do not think we need pattern match. I take that back, I have been answering in Java.
final String S = "An sentence may end with period. Does it end any other way? Ofcourse!";
final String[] simpleSentences = S.split("[?!.]");
//now simpleSentences array has three elements in it.

Extracting both the full match, and the last token match in a regexp

I have a little interesting issue here. I have a plaintext URL coming from Excel and I need to change it to an HTML URL with a unique body. Here is the regex code for javascript:
text = text.toString().replace(/=hyperlink\(([#\\\w\s\(\)-\.\/]+)\)/g, "<a href='file:///$1'>$1</a>");
This works perfectly fine for what it does. Example, text is:
=hyperlink("\\share\folder\log\2013\13-05-13\13-05-13.txt")
regex turns it into
\\share\folder\log\2013\13-05-13\13-05-13.txt
However, I need the inner HTML to be just the text file name:
13-05-13.txt
To further complicate the matter, the original text the regex is going through is not a single occurrence. It is an entire spreadsheet with 100's of rows that contain this. So the regex will be matching and replacing 100's of these strings in one operation.
Hopefully it is possible to get this all done in one regexp on the entire string, but I suppose I could loop through each line of the string first...
If there is no way to do this with one regex engine, what do you think the best approach is? (no PHP/Python/Server side. Just Javascript, HTML, Jquery, etc).

I guess you could use this regex:
=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)
And this new replace:
$2
I'm not sure how your regex was working, but I added the quotes in the regex and replaced the single quotes by double quotes in the replace. Revert those if need be.
Demo

How to match between characters but not include them in the result

Say I have a string "&something=variable&something_else=var2"
I want to match between &something= and &, so I'll write a regular expression that looks like:
/(&something=).*?(&)/
And the result of .match() will be an array:
["&something=variable&", "&something=", "&"]
I've always solved this by just replacing the start and end elements manually but is there a way to not include them in the match results at all?

You're using the wrong capturing groups. You should be using this:
/&something=(.*?)&/
This means that instead of capturing the stuff you don't want (the delimiters), you capture what you do want (the data).

You can't avoid them showing up in your match results at all, but you can change how they show up and make it more useful for you.
If you change your match pattern to /&something=(.+?)&/ then using your test string of "&something=variable&something_else=var2" the match result array is ["&something=variable&", "variable"]
The first element is always the entire match, but the second one, will be the captured portion from the parentheses, which is much more useful, generally.
I hope this helps.

If you are trying to get variable out of the string, using replace with backreferences will get you what you want:
"&something=variable&something_else=var2".replace(/^.*&something=(.*?)&.*$/, '$1')
gives you
"variable"

Negating in /,?(([1-9]-[1-9])|([1-9]))/g

I am trying to match a string containing a mix of digits and hyphenated digits, like a crossword answer specification, for example 1,2-2 or 1-1,3,4,2-2
/,?(([1-9]-[1-9])|([1-9]))/g is what I've come up to match the string
value = value.replace(/,?(([1-9]-[1-9])|([1-9]))/g, '');
replaces ok, and I've checked it out in an online tester.
What I really need is to negate this, so I can use it on a keyup event, examine the contents of a textarea and remove characters that don't fit, so it only allows through characters as in the example.
I've tried ^ where expected, but this it's not doing what I expect, how should I negate the regex so I remove everything that doesn't match?
If there is a better way of doing this I'm open to suggestions too.

var value = 'hello,1,2,3,4-6,1-1,3,test,4,2-2';
var pattern = /,?(([1-9]-[1-9])|([1-9]))/g;
value.replace(pattern, ''); // "hello,test"
You can use String#match. With /g flag, it returns an array of all the matches, then you can use Array#join to join them.
The problem is that String#match returns null when there is no match, so you have to handle that case and use an empty array so that it can join:
(value.match(pattern) || []).join(''); // ",1,2,3,4-6,1-1,3,4,2-2"
Note: It may better to check them on onblur rather than onkeyup. Messing with the text that the user is currently typing will make it annoying. Better to wait for the user to finish typing.

Didn't test it in JS, but this should return the valid string beginning from the left and as long as valid values are encountered (note that I used \d - if you'd like 1-9 only, then use your brackets).
(?:\d(?:-\d)?,)*\d(?:-\d)?
E.g. matching this regular expression with the string "0-1,1,2,3,4-4,2,,1,3--4" will return "0-1,1,2,3,4-4,2" as the first match.

Trying to remove trailing text

I having the following code. I want to extract the last text (hello64) from it.
<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*
I used the code below but it removes all the integers
questionText = questionText.replace(/<span\b.*?>/ig, "");
questionText=questionText.replace(/<\/span>/ig, "");
questionText = questionText.replace(/\d+/g,"");
questionText = questionText.replace("*","");
questionText = questionText.replace(". ",""); i want to remove the first integer, and need to keep the rest of the integers

It's the third line .replace(/\d+/g,"") which is replacing the integers. If you want to keep the integers, then don't replace \d+, because that matches one or more digits.
You could achieve most of that all on one line, by the way - there's no need to have multiple replaces there:
var questionText = questionText.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");
That would do the same as the first three lines of your code. (of course, you'd need to drop the |(\d+) as per the first part of the answer if you didn't want to get rid of the digits.
[EDIT]
Re your comment that you want to replace the first integer but not the subsequent ones:
The regex string to do this would depend very heavily on what the possible input looks like. The problem is that you've given us a bit of random HTML code; we don't know from that whether you're expecting it to always be in this precise format (ie a couple of spans with contents, followed by a bit at the end to keep). I'll assume that this is the case.
In this case, a much simpler regex for the whole thing would be to replace eveything within <span....</span> with blank:
var questionText = questionText.replace(/(<span\b.*?>.*?<\/span>)/ig, "");
This will eliminate the whole of the <span> tags plus their contents, but leave anything outside of them alone.
In the case of your example this would provide the desired effect, but as I say, it's hard to know if this will work for you in all cases without knowing more about your expected input.
In general it's considered difficult to parse arbitrary HTML code with regex. Regex is a contraction of "Regular Expressions", which is a way of saying that they are good at handling strings which have 'regular' syntax. Abitrary HTML is not a 'regular' syntax due to it's unlimited possible levels of nesting. What I'm trying to say here is that if you have anything more complex than the simple HTML snippets you've supplied, then you may be better off using a HTML parser to extract your data.

This will match the complete string and put the part after the last </span> till the next word boundary \b into the capturing group 1. You just need to replace then with the group 1, i.e. $1.
searched_string = string.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");
The captured word can consist of [A-Za-z0-9]. If you want to have anything else there just add it into that group.

We Keep Coding

JavaScript is the programming language of the Web.

Get all the WORDS except one specific word - javascript

IMHO, I would try to replace the incriminated word with an empty string, no?

Lookarounds seem to be an overkill for it, you can just replace the test with nothing: var str = 'testhello123worldtestWTF'; var res = str.replace(/test/g, '');

Plugging this into your refiddle produces the results you're looking for: /(test)/g It matches all occurrences of the word "test" without picking up unwanted words/letters. You can set this to whatever variable you need to hold these.

Related

Splitting a string at question mark, exclamation mark, or period in javascript and retain those marks?

Extracting both the full match, and the last token match in a regexp

How to match between characters but not include them in the result

Negating in /,?(([1-9]-[1-9])|([1-9]))/g

Trying to remove trailing text

Categories

Resources