I have a textbox where a user puts a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?
Try this:
function doDashes(str) {
var re = /[^a-z0-9]+/gi; // global and case insensitive matching of non-char/non-numeric
var re2 = /^-*|-*$/g; // get rid of any leading/trailing dashes
str = str.replace(re, '-'); // perform the 1st regexp
return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-I-think-that-i-am-awesome-yes-I-am
As for the greek characters, yeah I can't think of anything else than some sort of lookup table used by another regexp.
Edit, here's the oneliner version:
Edit, added toLowerCase():
Edit, embarrassing fix to the trailing regexp:
function doDashes2(str) {
return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple regex for doing this job is matching all "non-word" characters, and replace them with a -. But before matching this regex, convert the string to lowercase. This alone is not fool proof, since a dash on the end may be possible.
[^a-z]+
Thus, after the replacement; you can trim the dashes (from the front and the back) using this regex:
^-+|-+$
You'd have to create greek-to-latin glyps translation yourself, regex can't help you there. Using a translation array is a good idea.
I can't really say for Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/
Will do the trick when using it as your pattern, and replacing the matches with a "-"
As per the Greek characters, I'd suggest using an array with all the "character translations", and then adding it's values to the regular expression.
To roughly build the url you would need something like this.
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/([^a-z])/, '').replace(/\s+/, " ").replace(/\s/, '-');
It simply removes all non-alpha characters, removes double spacing, and then replaces all space chars with a dash.
You could use another regular expression to replace the greek characters with english characters.
Related
I've got a question concerning regex.
I was wondering how one could replace an encapsulated text, something like {key:23} to something like <span class="highlightable">23</span, so that the entity will still remain encapsulated, but with something else.
I will do this in JS, but the regex is what is important, I have been searching for a while, probably searching for the wrong terms, I should probably learn more about regex, generally.
In any case, is there someone who knows how to perform this operation with simplicity?
Thanks!
It's important that you find {key:23} in your text first, and then replace it with your wanted syntax, this way you avoid replacing {key:'sometext'} with that syntax which is unwanted.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:[\d]+\}/gi, function myFunction(x){return x.replace(/\{key:/,'<span>').replace(/\}/, '</span>');});
this way only {key:AnyNumber} gets replaced, and {key:AnyThingOtherThanNumbers} don't get touched.
It seems you are new to regex. You need to learn more about character classes and capturing groups and backreferences.
The regex is somewhat basic in your case if you do not need any nested encapsulated text support.
Let's start:
The beginning is {key: - it will match the substring literally. Note that { can be a special character (denoting start of a limiting quantifier), thus, it is a good idea to escape it: {key:.
([^}]+) - This is a bit more interesting: the round brackets around are a capturing group that let us later back-reference the matched text. The [^}]+ means 1 or more characters (due to +) other than } (as [^}] is a negated character class where ^ means not)
} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
{key:([^}]+)}
See demo on regex101.com
Code snippet:
var re = /{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"/>
If you want to use a different behavior based on the value of key, then you'll need to adjust the regex to either match digits only (with \d+) or letters only (say, with [a-zA-Z] for English), or other shorthand classes, ranges (= character classes), or their combinations.
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");
How would I match the quotations around "text" in the string below and not around "TEST TEXT" using RegEx. I wanted just quotations only when they are by themselves. I tried a negative lookahead (for a second quote) but it still captured the second of the two quotes around TEST TEXT.
This is some "text". This is also some ""TEST TEXT""
Be aware that I need this to scale so sometimes it would be right in the middle of a string so something like this:
/(\s|\w)(\")(?!")/g (using $2...)
Would work in this example but not if the string was:
This is some^"text".This is also some ""TEST TEXT""
I just need quotation marks by themselves.
EDIT
FYI, this needs to be Javascript RegEx so lookbehind would not be an option for me for this one.
Since you have not tagged any particular flavor of regex I am takig liberty of using lookbehind also. You can use:
(?<!")"(?!")[^"]*"
RegEx Demo
Update: For working with Javascript you can use this regex:
/""[^"]*""|(")([^"]*)(")/
And use captured group # 1 for your text.
RegEx Demo
I'm not sure if I really understood well your needs. I'll post this answer to check if it helps you but I can delete it if it doesn't.
So, is this what you want using this regex:
"\w+?"
Working demo
By the way, if you just want to get the content within "..." you can use this regex:
"(\w+?)"
Working demo
You can't do this with a pure JavaScript regexp. I am going to eat my words now however, as you can use the following solution using callback parameters:
var regex = /""+|(")/g
replaced = subject.replace(regex, function($0, $1) {
if ($1 == "\"") return "-"; // What to replace to?
else return $0;
});
"This is some -text-. This is also some ""TEST TEXT"""
If you're needing the regex to split the string, then you can use the above to replace matches to something distinctive, then split by them:
var regex = /""+|(")/g
replaced = subject.replace(regex, function($0, $1) {
if ($1 == "\"") return "☺";
else return $0;
});
splits = replaced.split("☺");
["This is some ", "text", ". This is also some ""TEST TEXT"""]
Referenced by:http://www.rexegg.com/regex-best-trick.html
I've a string done like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscore and the percentage stuff?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact input would be "My happy dog is cool!" so I would like to get rid of the initial address and remove the underscores and percentage and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | just serves as an or operator and the (%\d{2,}) serves to match digit characters that occur twice in succession and follow a % sign.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");
I have a string like that:
var str = 'aaaaaa, bbbbbb, ccccc, ddddddd, eeeeee ';
My goal is to delete the last space in the string. I would use,
str.split(0,1);
But if there is no space after the last character in the string, this will delete the last character of the string instead.
I would like to use
str.replace("regex",'');
I am beginner in RegEx, any help is appreciated.
Thank you very much.
Do a google search for "javascript trim" and you will find many different solutions.
Here is a simple one:
trimmedstr = str.replace(/\s+$/, '');
When you need to remove all spaces at the end:
str.replace(/\s*$/,'');
When you need to remove one space at the end:
str.replace(/\s?$/,'');
\s means not only space but space-like characters; for example tab.
If you use jQuery, you can use the trim function also:
str = $.trim(str);
But trim removes spaces not only at the end of the string, at the beginning also.
Seems you need a trimRight function. its not available until Javascript 1.8.1. Before that you can use prototyping techniques.
String.prototype.trimRight=function(){return this.replace(/\s+$/,'');}
// Now call it on any string.
var a = "a string ";
a = a.trimRight();
See more on Trim string in JavaScript? And the compatibility list
You can use this code to remove a single trailing space:
.replace(/ $/, "");
To remove all trailing spaces:
.replace(/ +$/, "");
The $ matches the end of input in normal mode (it matches the end of a line in multiline mode).
Try the regex ( +)$ since $ in regex matches the end of the string. This will strip all whitespace from the end of the string.
Some programs have a strip function to do the same, I do not believe the stadard Javascript library has this functionality.
Regex Reference Sheet
Working example:
var str = "Hello World ";
var ans = str.replace(/(^[\s]+|[\s]+$)/g, '');
alert(str.length+" "+ ans.length);
Fast forward to 2021,
The trimEnd() function is meant exactly for this!
It will remove all whitespaces (including spaces, tabs, new line characters) from the end of the string.
According to the official docs, it is supported in every major browser. Only IE is unsupported. (And lets be honest, you shouldn't care about IE given that microsoft itself has dropped support for IE in Aug 2021!)
Looking for a regex/replace function to take a user inputted string say, "John Smith's Cool Page" and return a filename/url safe string like "john_smith_s_cool_page.html", or something to that extent.
Well, here's one that replaces anything that's not a letter or a number, and makes it all lower case, like your example.
var s = "John Smith's Cool Page";
var filename = s.replace(/[^a-z0-9]/gi, '_').toLowerCase();
Explanation:
The regular expression is /[^a-z0-9]/gi. Well, actually the gi at the end is just a set of options that are used when the expression is used.
i means "ignore upper/lower case differences"
g means "global", which really means that every match should be replaced, not just the first one.
So what we're looking as is really just [^a-z0-9]. Let's read it step-by-step:
The [ and ] define a "character class", which is a list of single-characters. If you'd write [one], then that would match either 'o' or 'n' or 'e'.
However, there's a ^ at the start of the list of characters. That means it should match only characters not in the list.
Finally, the list of characters is a-z0-9. Read this as "a through z and 0 through 9". It's a short way of writing abcdefghijklmnopqrstuvwxyz0123456789.
So basically, what the regular expression says is: "Find every letter that is not between 'a' and 'z' or between '0' and '9'".
I know the original poster asked for a simple Regular Expression, however, there is more involved in sanitizing filenames, including filename length, reserved filenames, and, of course reserved characters.
Take a look at the code in node-sanitize-filename for a more robust solution.
For more flexible and robust handling of unicode characters etc, you could use the slugify in conjunction with some regex to remove unsafe URL characters
const urlSafeFilename = slugify(filename, { remove: /"<>#%\{\}\|\\\^~\[\]`;\?:#=&/g });
This produces nice kebab-case filenemas in your url and allows for more characters outside the a-z0-9 range.
Here's what I did. It works to convert full sentences into a decently clean URL.
First it trims the string, then it converts spaces to dashes (-), then it gets rid of anything that's not a letter/number/dash
function slugify(title) {
return title
.trim()
.replace(/ +/g, '-')
.toLowerCase()
.replace(/[^a-z0-9-]/g, '')
}
slug.value = slugify(text.value);
text.oninput = () => { slug.value = slugify(text.value); };
<input id="text" value="Foo: the old #Foobîdoo!! " style="font-size:1.2em">
<input id="slug" readonly style="font-size:1.2em">
I think your requirement is to replaces white spaces and aphostophy `s with _ and append the .html at the end try to find such regex.
refer
http://www.regular-expressions.info/javascriptexample.html