Replacing a URL with another URL derived from a value taken from it - javascript

I have a markdown text file with links like that:
[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)
or
[Text2](https://docs.google.com/document/d/unique-doc-id-here")
I want to replace the whole href with another one by taking the unique-doc-id-here and passing it to a function that returns a new href, so that my URLs end up looking something like this:
[Text](https://new-url-here.com/fragment-unique-id)
or
[Text2](https://new-url-here.com/fragment-unique-id)
I think my problem is selecting the unique-doc-id-here; I believe I need a regex for that.
So the solution could look like this:
text.replace(/https:\/\/docs.google.com\/document\/d\/(.*?)*/gm, (x) =>
this.getNewHref(x)
);
However, the regex does not look quite right, because it does not match all the cases. Any ideas how to fix it?
Here is an input text example:
# Title
Text text text.
Text 1 text 1 text 1, abc.
More text
Bullet points
- [abc]
- [bla]
- [cba]
## Title 2
More text:
- A
- B
- C
- D
Text text text text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
**BOLD.**
## Title
Text2 text1 text3 text
[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.
More text here
[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)
I've tried this regex \w+:\/\/.*?(?=\s) but it also selects the last ) symbol
I've applied a solution proposed by @The fourth bird:
function getNewHref(id: string) {
const data = getText();
const element = data.find((x: any) => x.id === id);
if(element?.url) {
return element.url;
} else {
return 'unknown-url'
}
}
data = data.replace(
/\[[^\][]*]\(https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/gm,
(x, g1) => getNewHref(g1)
);
The problem is that replace substitutes the whole match, so what was [...](...) becomes ./new-url or unknown-url, but it needs to be [original text](new result)

You can make the pattern more specific, and then use the group 1 value.
(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)
The pattern in parts matches:
(\[[^\][]*]\() Capture group 1, match from [...]( using a negated character class
https?:\/\/docs\.google\.com\/document\/d\/ Match the leading part of the url
( Capture group 2
[^\s\\\/)]+ Match 1+ chars other than a whitespace char, \, / or )
) Close group 2
[^\s)]* Match optional chars other than a whitespace char or )
\) Match )
Regex demo
For example, a happy case scenario where all the keys to be replaced exist (note that you can omit the /m flag as there are no anchors in the pattern)
const text = "[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)";
const regex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;
function getNewHref(id) {
const replacements = {
"unique-doc-id-here": `https://docs.google.com/document/d/${id}`
}
return replacements[id];
}
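// Append the closing ) inside the callback so every replaced link gets one, not just the end of the string
const replacedText = text.replace(regex, (x, g1, g2) => g1 + getNewHref(g2) + ")");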
console.log(replacedText);
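As a rough sketch of how the same pattern might be applied to a text with several links, the version below keeps a link unchanged when its ID has no replacement. The newUrls table, its values, and the rewriteLinks name are assumptions made for illustration; they are not part of the original question.

```javascript
// Hypothetical lookup table: document ID -> new URL (illustrative values only).
const newUrls = new Map([
  ["doc-id-1", "https://new-url-here.com/fragment-1"],
]);

// Group 1: "[text](", group 2: the document ID.
const linkPattern = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;

function rewriteLinks(markdown) {
  return markdown.replace(linkPattern, (whole, prefix, id) => {
    const url = newUrls.get(id);
    // Leave the link untouched when the ID is not in the table.
    return url === undefined ? whole : prefix + url + ")";
  });
}

const sample = [
  "See [a](https://docs.google.com/document/d/doc-id-1/edit#bookmark=id.x) and",
  "[b](https://docs.google.com/document/d/doc-id-2) here.",
].join("\n");

console.log(rewriteLinks(sample));
```

Returning the whole match for unknown IDs is one possible design choice; returning a placeholder such as 'unknown-url', as in the question, would work the same way.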

You can achieve this by extracting the href link from the string with a regex and then splitting it on the forward slash.
Try this (descriptive comments have been added to the code snippet below):
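const text = '<a href="https://docs.google.com/document/d/unique-doc-id-here/edit">Text</a>'; // assumed input: an anchor tag with a double-quoted href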
// Get the href link using regex
const link = text.match(/"([^"]*)"/)[1];
// Split the string and get the array of link based on the forward slash.
const linkArr = link.split('/')
// get the unique ID from an array.
const uniqueID = linkArr[linkArr.indexOf('d') + 1]
console.log(uniqueID);
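A variation on the same split idea, sketched under the assumption that the link is already available as a plain URL string: the built-in URL class can isolate the path first, so a query string or fragment does not end up in the segments. The extractDocId name is hypothetical.

```javascript
// Hypothetical helper: returns the path segment that follows "/d/", or null.
// Note that new URL(...) throws a TypeError when the input is not a valid absolute URL.
function extractDocId(href) {
  const segments = new URL(href).pathname.split("/");
  const i = segments.indexOf("d");
  return i !== -1 && segments[i + 1] ? segments[i + 1] : null;
}

console.log(extractDocId("https://docs.google.com/document/d/unique-doc-id-here/edit#bookmark=id.abc"));
console.log(extractDocId("https://example.com/other/path"));
```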

Related

Replace text but not if contain specific characters?

In JavaScript, I am using the below code to replace text that matches a certain string. The replacement wraps the string like this: "A(hello)". It works great but if there are two strings that are the same, for example: "Hello hi Hello", only the first one will get marked and if I am trying twice, it will get marked double, like this: "A(A(Hello)) Hi Hello".
A solution to this could be to not replace a word if it contains "A(" or is between "A(" and ")"; both would work.
Any idea how it can be achieved?
Note: I can't use replaceAll because if there is already a word that is replaced and a new word is added, then the first one will be overwritten. Therefore I need a solution like the one above. For example, if I have a string saying "Hello hi", and I mark Hello, it will say "A(Hello) hi", but if I then add Hello again to the text and replace it, it will look like this: A(A(Hello)) hi A(Hello).
Here is what I got so far:
let text = "Hello hi Hello!"
let selection = "Hello"
let A = `A(${selection})`
let addWoman = text.replace(selection, A)
You can use a negative lookbehind assertion in your pattern that fails the match if there is an A( right before the full word Hello:
(?<!A\()\bHello\b
And replace it with A($&)
RegEx Demo
Code:
let text = "Hello hi Hello!";
let selection = "Hello";
let A = `A(${selection})`;
let re = new RegExp(`(?<!A\\()\\b${selection}\\b`, "g");
let addWoman = text.replace(re, A);
console.log(addWoman);
console.log(addWoman.replace(re, A));
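If the selection can contain regex metacharacters, or a $ that would be interpreted in the replacement string, a slightly more defensive variant might look like the sketch below. The escapeRegExp and markOnce names are hypothetical helpers, and the \b boundaries still assume the selection starts and ends with word characters.

```javascript
// Escape characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Wrap every occurrence of `selection` that is not directly preceded by "A(".
function markOnce(text, selection) {
  const re = new RegExp(`(?<!A\\()\\b${escapeRegExp(selection)}\\b`, "g");
  // A callback avoids special handling of "$" sequences in a replacement string.
  return text.replace(re, (match) => `A(${match})`);
}

const once = markOnce("Hello hi Hello!", "Hello");
const twice = markOnce(once, "Hello");
console.log(once);
console.log(twice);
```

Running the helper a second time leaves the already marked words alone, because each of them is preceded by A(.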
A solution to this could be to not replace a word if it contains "A(" or is between "A(" and ")"; both would work.
To avoid re-matching selection inside an A(...) string, you can match A(...) and capture it into a group. If the group matched, the text is kept as is; otherwise the word of your choice was matched and gets wrapped:
let text = "Hello hi Hello!"
let selection = "Hello"
let A = `A(${selection})`
const rx = new RegExp(String.raw`(A\([^()]*\))|${selection.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')}`, 'g')
let addWoman = text.replace(rx, (x,y) => y || A)
console.log(addWoman);
// Replacing the second time does not modify the string:
console.log(addWoman.replace(rx, (x,y) => y || A))
The regex will look like /(A\([^()]*\))|Hello/g, it matches
(A\([^()]*\)) - Group 1: A and then ( followed with zero or more chars other than ( and ) and then a ) char
| - or
Hello - a Hello string.
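The same "match what should be skipped, capture it, and return it unchanged" idea can be wrapped in a small function. This is only a sketch; the wrapOutsideMarks name is an assumption, and like the pattern above it does not handle nested parentheses inside A(...).

```javascript
function wrapOutsideMarks(text, word) {
  const escaped = word.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
  // Alternative 1 (captured): an existing A(...) block. Alternative 2: the bare word.
  const rx = new RegExp(String.raw`(A\([^()]*\))|${escaped}`, "g");
  // If group 1 matched, keep it; otherwise wrap the matched word.
  return text.replace(rx, (match, marked) => marked || `A(${match})`);
}

console.log(wrapOutsideMarks("A(Hello) hi Hello", "Hello"));
console.log(wrapOutsideMarks("A(Hello there) Hello", "Hello"));
```

Unlike the lookbehind version, this also protects a word that sits anywhere inside an A(...) block, not only directly after the opening parenthesis.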

How do I make my code concise and short using Regex Expressions

I'm trying to make the code a lot cleaner and more concise. The main goal is to change the string to meet my requirements.
Requirements
I want to remove any empty lines (like the one in the middle of the two sentences down below)
I want to remove the * in front of each sentence, if there is.
I want to make the first letter of each word capital and the rest lowercase (except words that have $ in front of it)
This is what I've done so far:
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`
const nostar = string.replace(/\*/g, ''); // gets rid of the * of each line
const noemptylines = nostar.replace(/^\s*[\r\n]/gm, ''); //gets rid of empty blank lines
const lowercasestring = noemptylines.toLowerCase(); //turns it to lower case
const tweets = lowercasestring.replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()); //makes first letter of each word capital
console.log(tweets)
I've done most of the code. However, I want to keep words that have $ in front of them in capitals, which I don't know how to do.
Furthermore, I was wondering if it's possible to combine the regex expressions, so the code is even shorter and more concise.
You could make use of capture groups and the callback function of replace.
^(\*|[\r\n]+)|\$\S*|(\S+)
^ Start of a line (the /m flag is set in the code below)
(\*|[\r\n]+) Capture group 1, match either * or 1 or more newlines
| Or
\$\S* Match $ followed by optional non whitespace chars (which will be returned unmodified in the code)
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
Regex demo
const regex = /^(\*|[\r\n]+)|\$\S*|(\S+)/gm;
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`;
const res = string.replace(regex, (m, g1, g2) => {
if (g1) return ""
if (g2) {
g2 = g2.toLowerCase();
return g2.toLowerCase().charAt(0).toUpperCase() + g2.slice(1);
}
return m;
});
console.log(res);
Making it readable is more important than making it short.
const tweets = string
.replace(/\*/g, '') // gets rid of the * of each line
.replace(/^\s*[\r\n]/gm, '') //gets rid of empty blank lines
.toLowerCase() //turns it to lower case
.replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()) //makes first letter of each word capital
.replace(/\B\$(\w+)\b/g, match => match.toUpperCase()); //keep words that have $ in front of it, capital
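In the spirit of favouring readability, a line-by-line version is another option. The sketch below is not taken from the answers above; formatTweets is a hypothetical name, and it treats any whitespace-separated token starting with $ as a word to leave untouched.

```javascript
function formatTweets(input) {
  return input
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/^\*/, "")) // drop a leading *
    .filter((line) => line !== "")                 // drop empty lines
    .map((line) =>
      line
        .split(/\s+/)
        .map((word) =>
          word.startsWith("$")
            ? word // keep $-prefixed words as they are
            : word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()
        )
        .join(" ")
    )
    .join("\n");
}

console.log(formatTweets("*SQUARE HAS NO PLANS\n\n$SQ\n*$SQ UPGRADED, PT $185"));
```

Note that splitting on whitespace collapses runs of spaces into one, which may or may not be desirable for the real input.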

Group multiline string by regex pattern javascript

I have a multiline string that I want to split and group by a certain regex pattern that appears several times throughout the string
Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore
Using the above, I'd like to group by the value following the term Checking against (so in this example foo and bar), and each group would hold the text following that line, up until the next occurrence
So the resulting output would be something like the below, allowing access to the values by the grouping name
{
foo: 'Some text here\nMore text\netc.',
bar: 'More text\nmoremoremore'
}
My initial approach was to split the string on the newlines into an array of elements, but I'm then struggling to:
Find occurrence of "Checking against" and set that as the key
Append every line up until the next occurrence as the value
Maybe you can try this:
const str = `Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore`;
let current = null;
const result = {};
str.split("\n").forEach(line => {
const match = line.match(/Checking against (.+?)\.\.\./);
if (match) {
current = match[1];
} else if (current && line !== "") {
if (result[current]) {
result[current] += "\n" + line
} else {
result[current] = line;
}
}
});
console.log(result)
There are many ways to do that. You could use split to split the whole text by "Checking against" and at the same time capture the word that follows it as part of the splitting separator.
Then ignore the intro with slice(1), and transform the array of keyword, text parts into an array of pairs, which in turn can be fed into Object.fromEntries. That will return the desired object:
let data = `Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore`;
let result = Object.fromEntries(
data.split(/^Checking against (\w+).*$/gm)
.slice(1)
.reduce((acc, s, i, arr) => i%2 ? acc.concat([[arr[i-1], s.trim()]]) : acc, [])
);
console.log(result);
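The same split-with-capture idea can also be written with a plain loop instead of reduce, which some readers may find easier to follow. This is a sketch under the same assumptions as the answer above (keys made of word characters); groupByHeading is a hypothetical name.

```javascript
function groupByHeading(text) {
  // Splitting with a capture group yields [intro, key1, body1, key2, body2, ...].
  const parts = text.split(/^Checking against (\w+).*$/m);
  const groups = {};
  // Start at 1 to skip the intro, then step over key/body pairs.
  for (let i = 1; i < parts.length; i += 2) {
    groups[parts[i]] = parts[i + 1].trim();
  }
  return groups;
}

const sampleText = [
  "Some filler",
  "Checking against foo...",
  "Some text here",
  "More text",
  "Checking against bar...",
  "moremoremore",
].join("\n");

console.log(groupByHeading(sampleText));
```

If the same key appears twice, the later section overwrites the earlier one in this sketch.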

Replace capture group of dynamic size

I want to replace the first part of regex for a URL with asterisks. Depending on the regex, for example:
Case 1
http://example.com/path1/path2?abcd => http://example.com/path1/**********
Regex 1: /^(https?:\/\/.+\/path1\/?)(.+)/ but I want each character in group 2 to be replaced individually with *
or
Case 2
person#example.com => ******#example.com
Regex 2
/^(.+)(#.+)$/, similarly I want all characters in the first capture group to be replaced individually with *
I have tried to use capture groups, but then, I'm left with *#example.com
let email = `person#example.com`;
let regex = /^(.+)(#.+)$/;
console.log(email.replace(regex, '*$2'));
let url = `http://example.com/path1/path2?abcd`;
let regex = /^(https?:\/\/.+\/path1\/?)(.+)/;
console.log(url.replace(regex, '$1*'));
You may use
let email = `person#example.com`;
let regex = /[^#]/gy;
console.log(email.replace(regex, '*'));
// OR
console.log(email.replace(/(.*)#/, function ($0,$1) {
return '*'.repeat($1.length) + "#";
}));
and
let url = `http://example.com/path1/path2?abcd`;
let regex = /^(https?:\/\/.+\/path1\/?)(.*)/gy;
console.log(url.replace(regex, (_,$1,$2) => `${$1}${'*'.repeat($2.length)}` ));
// OR
console.log(url.replace(regex, function (_,$1,$2) {
return $1 + ('*'.repeat($2.length));
}));
In case of .replace(/[^#]/gy, '*'), each char other than # from the start of the string is replaced with * (so, up to the first #).
In case of .replace(/(.*)#/, function ($0,$1) { return '*'.repeat($1.length) + "#"; }), all chars up to the last # are captured into Group 1 and then the match is replaced with the same amount of asterisks as the length of the Group 1 value + the # char (it should be added into the replacement pattern as it is used as part of the consuming regex part).
The .replace(regex, (_,$1,$2) => `${$1}${'*'.repeat($2.length)}` ) follows the same logic as the case described above: you capture the part you need to replace, pass it into the anonymous callback method and manipulate its value using a bit of code.
You can use the sticky flag y (but Internet Explorer doesn't support it):
s = s.replace(/(^https?:\/\/.*?\/path1\/?|(?!^))./gy, '$1*')
But the simplest (and that is supported everywhere), is to use a function as replacement parameter.
s = s.replace(/^(https?:\/\/.+\/path1\/?)(.*)/, function (_, m1, m2) {
return m1 + '*'.repeat(m2.length);
});
For the second case, you can simply check if there's an # after the current position:
s = s.replace(/.(?=.*#)/g, '*');
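Putting the callback approach for both cases side by side, a minimal sketch could look like this. The mask, maskLocalPart, and maskAfterPath1 names are assumptions for illustration; the regexes are the ones from the question.

```javascript
// One asterisk per character of the captured text.
const mask = (s) => "*".repeat(s.length);

// Case 2: mask everything before the last "#".
function maskLocalPart(address) {
  return address.replace(/^(.+)(#.+)$/, (_, local, rest) => mask(local) + rest);
}

// Case 1: mask everything after ".../path1/".
function maskAfterPath1(url) {
  return url.replace(/^(https?:\/\/.+\/path1\/?)(.+)/, (_, prefix, tail) => prefix + mask(tail));
}

console.log(maskLocalPart("person#example.com"));
console.log(maskAfterPath1("http://example.com/path1/path2?abcd"));
```

Strings that do not match the pattern are returned unchanged, since replace only touches matches.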

How to use a variable inside Regex?

I have this line in my loop:
var regex1 = new RegExp('' + myClass + '[:*].*');
var rule1 = string.match(regex1)
Where "string" is a string of class selectors, for example: .hb-border-top:before, .hb-border-left
and "myClass" is a class: .hb-border-top
As I cycle through strings, I need to match strings that have "myClass" in them, including :before and :hover but not including things like hb-border-top2.
My idea for this regex is to match hb-border-top and then :* to match none or more colons and then the rest of the string.
I need to match:
.hb-fill-top::before
.hb-fill-top:hover::before
.hb-fill-top
.hb-fill-top:hover
but the above returns only:
.hb-fill-top::before
.hb-fill-top:hover::before
.hb-fill-top:hover
and doesn't return .hb-fill-top itself.
So, it has to match .hb-fill-top itself and then anything that follows as long as it starts with :
EDIT:
Picture below: my strings are the contents of {selectorText}.
A string is either a single class, a class with a pseudo-element, or a rule with a few classes in it, separated by commas.
Each string that contains .hb-fill-top ONLY or .hb-fill-top: + something (hover, after, etc.) has to be selected. The class is going to be in the variable "myClass", hence my issue, as I can't be too precise.
I understand you want to get any CSS selector name that contains the value anywhere inside and has EITHER : and 0+ chars up to the end of string OR finish right there.
Then, to get matches for the .hb-fill-top value you need a solution like
/\.hb-fill-top(?::.*)?$/
and the following JS code to make it all work:
var key = ".hb-fill-top";
var rx = RegExp(key.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "(?::.*)?$");
var ss = ["something.hb-fill-top::before","something2.hb-fill-top:hover::before","something3.hb-fill-top",".hb-fill-top:hover",".hb-fill-top2:hover",".hb-fill-top-2:hover",".hb-fill-top-bg-br"];
var res = ss.filter(x => rx.test(x));
console.log(res);
Note that .replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') code is necessary to escape the . that is a special regex metacharacter that matches any char but a line break char. See Is there a RegExp.escape function in Javascript?.
The ^ matches the start of a string.
(?::.*)?$ will match:
(?::.*)? - an optional (due to the last ? quantifier that matches 1 or 0 occurrences of the quantified subpattern) sequence ((?:...)? is a non-capturing group) of a
: - a colon
.* - any 0+ chars other than line break chars
$ - end of the string.
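Since the question mentions that a string may hold several selectors separated by commas, one way to apply the pattern above is to split first and test each selector on its own. This is a sketch only; selectorsForClass and escapeRegExp are hypothetical helper names.

```javascript
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Return the selectors from a comma-separated list that end with the class,
// optionally followed by ":" and anything else.
function selectorsForClass(selectorText, className) {
  const rx = new RegExp(escapeRegExp(className) + "(?::.*)?$");
  return selectorText
    .split(",")
    .map((s) => s.trim())
    .filter((s) => rx.test(s));
}

console.log(
  selectorsForClass(".hb-border-top:before, .hb-border-left, .hb-border-top2", ".hb-border-top")
);
```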
var regex1 = new RegExp(`^\\${myClass}(:{1,2}\\w+)*$`)
var passes = [
'.hb-fill-top::before',
'.hb-fill-top:hover::before',
'.hb-fill-top',
'.hb-fill-top:hover',
'.hb-fill-top::before',
'.hb-fill-top:hover::before',
'.hb-fill-top:hover'
];
var fails = ['.hb-fill-top-bg-br'];
var myClass = '.hb-fill-top';
var regex = new RegExp(`^\\${myClass}(:{1,2}\\w+)*$`);
passes.forEach(p => console.log(regex.test(p)));
console.log('---');
fails.forEach(f => console.log(regex.test(f)));
var regex1 = new RegExp('\\' + myClass + '(?::[^\\s]*)?');
var rule1 = string.match(regex1)
This regex selects my class and everything after it if it starts with :, stopping when it meets a whitespace character.
See the regex in action.
Notice also that I added '\\' at the beginning. This is in order to escape the dot in your className. Otherwise it would have matched something else like
ahb-fill-top
.some-other-hb-fill-top
Also be careful about .* as it may match something else that follows (I don't know your set of strings). You might want to be more precise with :{1,2}[\w()-]+ in the last group. So:
var regex1 = new RegExp('\\' + myClass + '(?::{1,2}[\\w()-]+)?');
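One detail worth checking when a pattern is built from a string literal: a single backslash before a letter such as s is consumed by the string itself, so the regex never sees \s. The sketch below compares three spellings; the variable names and the test input are made up for illustration.

```javascript
const className = ".hb-fill-top"; // example value from the question

// In a plain string literal "\s" is just "s", so this class excludes the letter s.
const lossy = new RegExp("\\" + className + "(?::[^\s]*)?");

// Doubling the backslash keeps \s in the pattern.
const doubled = new RegExp("\\" + className + "(?::[^\\s]*)?");

// String.raw leaves backslashes untouched, so the tail reads like a regex literal.
const raw = new RegExp("\\" + className + String.raw`(?::[^\s]*)?`);

const input = ".hb-fill-top:focus .child";
console.log(lossy.source, "->", input.match(lossy)[0]);
console.log(doubled.source, "->", input.match(doubled)[0]);
console.log(raw.source, "->", input.match(raw)[0]);
```

Logging the source property of the constructed RegExp is a quick way to see which pattern the engine actually received.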
