Group multiline string by regex pattern javascript - javascript

I have a multiline string that I want to split and group by a certain regex pattern that appears several times throughout the string:
Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore
Using the above, I'd like to group by the value following the term "Checking against" (so in this example foo and bar), where each group holds the text following that line, up until the next occurrence.
So the resulting output would be something like the below, allowing access to the values by the group name:
{
foo: 'Some text here\nMore text\netc.',
bar: 'More text\nmoremoremore'
}
My initial approach was to split the string on newlines into an array of elements. I'm then struggling to:
find the occurrence of "Checking against" and set that as the key
append every line up until the next occurrence as the value

Maybe you can try this:
const str = `Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore`;
let current = null;
const result = {};
str.split("\n").forEach(line => {
  const match = line.match(/Checking against (.+?)\.\.\./);
  if (match) {
    // A "Checking against ..." line starts a new group
    current = match[1];
  } else if (current && line !== "") {
    // Append the line to the text of the current group
    if (result[current]) {
      result[current] += "\n" + line;
    } else {
      result[current] = line;
    }
  }
});
console.log(result)

There are many ways to do that. You could use split to split the whole text by "Checking against" and at the same time capture the word that follows it as part of the splitting separator.
Then ignore the intro with slice(1), and transform the alternating keyword and text parts into an array of [keyword, text] pairs, which in turn can be fed into Object.fromEntries. That returns the desired object:
let data = `Some filler
at the beginning
of the text
Checking against foo...
Some text here
More text
etc.
Checking against bar...
More text
moremoremore`;
let result = Object.fromEntries(
data.split(/^Checking against (\w+).*$/gm)
.slice(1)
.reduce((acc, s, i, arr) => i%2 ? acc.concat([[arr[i-1], s.trim()]]) : acc, [])
);
console.log(result);
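To see why slice(1) and the pairing step work, it may help to look at what split returns when the separator regex contains a capture group: the captured keyword is spliced into the result between the text pieces. A small sketch on a shortened, made-up input (the for loop is just an alternative to the reduce above):

```javascript
// Shortened sample input (made up for illustration)
const data = `intro
Checking against foo...
A
Checking against bar...
B`;

// The capture group (\w+) makes split keep the keyword in the output,
// so the array alternates: intro, keyword, text, keyword, text, ...
const parts = data.split(/^Checking against (\w+).*$/gm);
console.log(parts); // → ["intro\n", "foo", "\nA\n", "bar", "\nB"]

// Skip the intro (index 0) and pair each keyword with the text after it
const pairs = [];
for (let i = 1; i < parts.length; i += 2) {
  pairs.push([parts[i], parts[i + 1].trim()]);
}
console.log(Object.fromEntries(pairs)); // → { foo: "A", bar: "B" }
```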

Related

Replacing url by a value taken from the url with another url

I have a markdown text file with links like these:
[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)
or
[Text2](https://docs.google.com/document/d/unique-doc-id-here")
I want to replace the whole href with another one by taking the unique-doc-id-here and passing it to a function that returns a new href, so that my URLs end up looking something like this:
[Text](https://new-url-here.com/fragment-unique-id)
or
[Text2](https://new-url-here.com/fragment-unique-id)
I think my problem is selecting the unique-doc-id-here; I think I have to use a regex for that.
So the solution could look like this:
text.replace(/https:\/\/docs.google.com\/document\/d\/(.*?)*/gm, (x) =>
this.getNewHref(x)
);
However, it seems that the regex is not quite right, because it does not match all the cases. Any ideas how to fix it?
Here is an input text example:
# Title
Text text text.
Text 1 text 1 text 1, abc.
More text
Bullet points
- [abc]
- [bla]
- [cba]
## Title 2
More text:
- A
- B
- C
- D
Text text text text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
**BOLD.**
## Title
Text2 text1 text3 text
[url2](https://docs.google.com/document/d/4x2mrhsqfGSidOsdve4m9bb_wEOJ7RqsWP7tk7PMPqEb/edit#bookmark=id.mbnek2bdkj8c) text.
More text here
[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)
I've tried this regex \w+:\/\/.*?(?=\s), but it also selects the trailing ) symbol.
I've applied a solution proposed by The fourth bird:
function getNewHref(id: string) {
const data = getText();
const element = data.find((x: any) => x.id === id);
if(element?.url) {
return element.url;
} else {
return 'unknown-url'
}
}
data = data.replace(
/\[[^\][]*]\(https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/gm,
(x, g1) => getNewHref(g1)
);
The problem is that the replace function replaces the whole thing, so what was [...](...) becomes ./new-url or unknown-url, but it needs to be [original text](new result).
You can make the pattern more specific, and then use the group 1 value.
(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)
The pattern in parts matches:
(\[[^\][]*]\() Capture group 1, match from [...]( using a negated character class
https?:\/\/docs\.google\.com\/document\/d\/ Match the leading part of the url
( Capture group 2
[^\s\\\/)]+ Match 1+ chars other than a whitespace char, \, / or )
) Close group 2
[^\s)]* Match optional chars other than a whitespace char or )
\) Match )
For example, a happy-case scenario where all the keys to be replaced exist (note that you can omit the /m flag, as there are no anchors in the pattern):
const text = "[Text](https://docs.google.com/document/d/unique-doc-id-here/edit)";
const regex = /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;
function getNewHref(id) {
const replacements = {
"unique-doc-id-here": `https://docs.google.com/document/d/${id}`
}
return replacements[id];
}
// Re-add the closing ")" for every match, not once at the end of the text
const replacedText = text.replace(regex, (x, g1, g2) => g1 + getNewHref(g2) + ")");
console.log(replacedText);
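If some ids may have no replacement (the 'unknown-url' case from the question), one option is to leave those links untouched instead. A sketch using the regex from this answer, assuming a hypothetical newHrefs Map in place of the question's getNewHref():

```javascript
// Hypothetical lookup table standing in for getNewHref():
// maps a Google Docs id to its new URL.
const newHrefs = new Map([
  ["2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr", "https://new-url-here.com/fragment-1"],
]);

// Group 1: "[link text](", group 2: the document id
const docLink =
  /(\[[^\][]*]\()https?:\/\/docs\.google\.com\/document\/d\/([^\s\\\/)]+)[^\s)]*\)/g;

function rewriteLinks(markdown) {
  return markdown.replace(docLink, (whole, prefix, id) =>
    // Known id: keep "[text](", swap in the new URL and re-add ")".
    // Unknown id: return the whole original link unchanged.
    newHrefs.has(id) ? prefix + newHrefs.get(id) + ")" : whole
  );
}

const sample = `Text [url1](https://docs.google.com/document/d/2x2my-DRqfSidOsdve4m9bF_eEOJ7RqIWP7tk7PM4qEr) text.
[bla](https://docs.google.com/document/d/6an7_b4Mb0OdxNZdfD3KedfvFtdf2OeGzG40ztfDhi5o9uU/edit)`;
console.log(rewriteLinks(sample));
```

Because the replacer receives the capture groups as separate arguments, the link text never has to be rebuilt by hand.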
You can achieve this by getting the href link from a string by using RegEx and then by splitting that up using forward slash.
Try this (descriptive comments have been added in the code snippet below):
const text = '<a href="https://docs.google.com/document/d/unique-doc-id-here/edit">Text</a>';
// Get the href link using regex
const link = text.match(/"([^"]*)"/)[1];
// Split the string and get the array of link based on the forward slash.
const linkArr = link.split('/')
// get the unique ID from an array.
const uniqueID = linkArr[linkArr.indexOf('d') + 1]
console.log(uniqueID);

Remove a substring to make given word in javascript

I want to know the algorithm for the question below, in JavaScript.
Check whether the given word can become "programming" by removing a substring from between its letters. You can remove only one substring from the given word.
Answer 'yes' or 'no'.
Examples (word: answer, explanation):
"progabcramming": yes, remove substring 'abc'
"programmmeding": yes, remove substring 'med'
"proasdgrammiedg": no, you would have to remove 2 substrings ('asd' and 'ied'), which is not allowed
"pxrogramming": yes, remove substring 'x'
"pxrogramminyg": no, you would have to remove 2 substrings ('x' and 'y'), which is not allowed
Please tell me an algorithm to solve it.
{
// will create a regexp for fuzzy search
const factory = (str) => new RegExp(str.split('').join('(.*?)'), 'i')
const re = factory('test') // re = /t(.*?)e(.*?)s(.*?)t/i
const matches = re.exec('te-abc-st') ?? [] // an array of captured groups
const result = matches
.slice(1) // first element is a full match, we don't need it
.filter(group => group.length) // we're also not interested in empty matches
// now result contains a list of captured groups
// in this particular example a single '-abc-'
}
I'm not sure how efficient this code is, but the only thing I can come up with is using a regular expression.
const word = 'programming';
const test = ['progabcramming', 'programmmeding', 'proasdgrammiedg', 'pxrogramming', 'pxrogramminyg', 'programming'];
// create regular expression manually
// (the alternation is wrapped in (?:...) so that ^ and $ apply to every alternative)
// const regexp = /^(?:(p).+(rogramming)|(pr).+(ogramming)|(pro).+(gramming)|(prog).+(ramming)|(progr).+(amming)|(progra).+(mming)|(program).+(ming)|(programm).+(ing)|(programmi).+(ng)|(programmin).+(g))$/;
// create regular expression programmatically:
// one alternative per split point, e.g. (p).+(rogramming), (pr).+(ogramming), ...
const alternatives = [];
for (let i = 1; i < word.length; i++) {
  alternatives.push(`(${word.substring(0, i)}).+(${word.substring(i)})`);
}
const regexp = new RegExp(`^(?:${alternatives.join('|')})$`);
// content output
let content = '';
test.forEach(element => {
  content += `${element}: ${regexp.test(element)}\n`;
});
document.body.innerText = content;
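A regex isn't strictly required. One alternative (a sketch of my own, not taken from the answers above) is to compare the longest common prefix and the longest common suffix of the candidate and the target: a single removal is enough exactly when some split point of the target is covered by both. This assumes, like the regex answer, that the removed substring is non-empty and lies strictly between the first and last letters of the target:

```javascript
// Can `candidate` become `target` by deleting exactly one non-empty
// substring that lies strictly inside it (split points 1..n-1)?
function canRemoveOneSubstring(candidate, target) {
  const n = target.length;
  // Something must actually be removed
  if (candidate.length <= n) return false;

  // Length of the longest common prefix
  let p = 0;
  while (p < n && candidate[p] === target[p]) p++;

  // Length of the longest common suffix
  let q = 0;
  while (q < n && candidate[candidate.length - 1 - q] === target[n - 1 - q]) q++;

  // We need a split point i with 1 <= i <= n - 1 such that
  // target[0..i) is a prefix (i <= p) and target[i..n) is a suffix (n - i <= q)
  const lo = Math.max(1, n - q);
  const hi = Math.min(p, n - 1);
  return lo <= hi;
}

for (const w of ["progabcramming", "programmmeding", "proasdgrammiedg", "pxrogramming", "pxrogramminyg"]) {
  console.log(w, canRemoveOneSubstring(w, "programming") ? "yes" : "no");
}
```

This runs in linear time. To also allow removing a substring at the very start or end of the word, the range can be widened to Math.max(0, n - q) and Math.min(p, n).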

Vue: Make matching part of input bold, including special hyphens

I have made a simple select component in Vue with a search/filter system. Based on the user input I'm showing some Belgian city suggestions.
Working example: https://codesandbox.io/s/wandering-lake-lecok?file=/src/components/Select.vue (Sometimes there is an error message in Codesandbox. Refresh the build in browser and it should work)
I want to take the UX one step further and show the matching part of the user input bold and underlined. For that I have a working makeBold function: by splitting the suggestion string into multiple parts I can add a bold and underline tag and return the suggestion.
computed: {
results() {
return this.options.filter((option) =>
option.display_name
.replaceAll("-'", "")
.toLowerCase()
.includes(this.searchInput.replaceAll("-'", "").toLowerCase())
);
},
},
methods: {
makeBold(str, query) {
const n = str.toUpperCase();
const q = query.toUpperCase();
const x = n.indexOf(q);
if (!q || x === -1) {
return str;
}
const l = q.length;
return (
str.substr(0, x) + "<b><u>" + str.substr(x, l) + "</u></b>" + str.substr(x + l)
);
},
}
One problem: a lot of cities in Belgium use dashes and/or apostrophes. In the suggestions function I'm removing these characters so a user doesn't need to type them. But in the makeBold function I would like to make these characters bold and underlined as well.
For example:
When the input is 'sint j', 'sintj' or 'Sint-j' I want the suggestions to look like 'Sint-Jans-Molenbeek' and 'Sint-Job in't Goor'
Is there someone who can give me a breakdown on how to achieve this?
I would propose using a mask to save the structure of the city name. After you find the start and end index of the substring in the stripped city name, restore the original string from the mask, inserting the appropriate tags at the start and end index using a replacer function. This way you don't have to worry about other non-word characters or other unexpected user input.
Here is the makeBold function:
makeBold(str, query) {
// mask all word characters in city name
const city_mask = str.replace(/\w/g, "#");
// strip city and query string from any non-word character
let query_stripped = query.toLowerCase().replace(/\W/g, "");
let string_stripped = str.replace(/\W/g, "");
// find the index of querystring in city name
let index = string_stripped.toLowerCase().indexOf(query_stripped);
if (index > -1 && query_stripped.length) {
// find the end position of substring in stripped city name
let end_index = index + query_stripped.length - 1;
// replacer function for each masked character.
// it will add to the start and end character of substring the corresponding tags,
// replacing all masked characters with the original one.
function replacer(i) {
let repl = string_stripped[i];
if (i === index) {
repl = "<b><u>" + repl;
}
if (i === end_index) {
repl = repl + "</u></b>";
}
return repl;
}
let i = -1;
// restore masked string
return city_mask.replace(/#/g, (_) => {
i++;
return replacer(i);
});
}
return str;
}
And here is the working sandbox. I've changed your computed results a bit to strip all non-word characters.
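The same idea can also be written without the "#" mask, by recording for every word character its index in the original string and mapping the first and last matched characters back through that list. A standalone sketch (a plain function instead of a Vue method; makeBoldIndexMap is a made-up name), using the same \w / \W notion of a word character as the answer:

```javascript
// Bold the part of `str` that matches `query` when both are compared
// with all non-word characters removed (case-insensitive).
function makeBoldIndexMap(str, query) {
  const q = query.toLowerCase().replace(/\W/g, "");
  const positions = []; // positions[k] = index in str of the k-th word character
  let stripped = "";
  for (let i = 0; i < str.length; i++) {
    if (/\w/.test(str[i])) {
      positions.push(i);
      stripped += str[i].toLowerCase();
    }
  }
  const index = stripped.indexOf(q);
  if (!q || index === -1) return str;

  // Map the match back to the original string; any dashes, spaces or
  // apostrophes inside the match end up inside the tags.
  const start = positions[index];
  const end = positions[index + q.length - 1] + 1;
  return str.slice(0, start) + "<b><u>" + str.slice(start, end) + "</u></b>" + str.slice(end);
}

console.log(makeBoldIndexMap("Sint-Jans-Molenbeek", "sint j"));
console.log(makeBoldIndexMap("Sint-Job in't Goor", "job int"));
```

Note that \w only covers ASCII letters, digits and _, so in both versions an accented letter (as in "Liège") is treated as a separator; a Unicode-aware class such as /[\p{L}\p{N}]/u would be needed to keep it.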
One way is to transform your search string into a RegExp object and use the replace(regexp, replacerFunction) overload of String.prototype.replace to achieve this.
For example, if the search string is "sintg",
new RegExp(this.searchInput.split("").join("-?"), "i");
turns it into /s-?i-?n-?t-?g/i, where
-? indicates an optional - character and
"i" at the end is the RegExp case-insensitive flag.
Applied to codesandbox code you get this
computed: {
results() {
const regex = new RegExp(this.searchInput.split("").join("-?"), "i");
return this.options.filter((option) => option.display_name.match(regex));
},
},
methods: {
makeBold(str, query) {
const regex = new RegExp(query.split("").join("-?"), "i");
return str.replace(regex, (match) => "<b><u>" + match + "</u></b>");
},
},
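However, there is a caveat: if the user types a RegExp special character in the search box, an error will be thrown (for example for an unbalanced "(") or the pattern will match something unintended.
To avoid this, the search input needs to be RegExp-escaped first. Escape each character separately, so that an escape sequence such as \( is not split apart by the "-?" join:
new RegExp(this.searchInput.split("").map(escapeRegExp).join("-?"), "i");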
But there is no native escapeRegExp method.
You can find one in Escape string for use in Javascript regex
There is also an escapeRegExp function in the lodash library, if it's already in your list of dependencies (this saves you from adding another function).
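Putting the per-character escaping together with the "-?" join, a minimal sketch outside of Vue. The escape helper below is a commonly used one written by hand, not a built-in, and the empty-query guard is an addition (an empty pattern would otherwise match at position 0 and insert empty tags):

```javascript
// Hand-written escape helper: prefixes every RegExp special character with "\"
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape each character *before* joining with "-?", so an escape
// sequence like "\(" is never split apart.
function buildQueryRegex(query) {
  return new RegExp(query.split("").map(escapeRegExp).join("-?"), "i");
}

function makeBold(str, query) {
  if (!query) return str;
  return str.replace(buildQueryRegex(query), match => "<b><u>" + match + "</u></b>");
}

console.log(makeBold("Sint-Jans-Molenbeek", "sintj"));
console.log(makeBold("Foo (bar)", "(b")); // no SyntaxError for "("
```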
You could create a function that removes all spaces and dashes (-) from the query and city strings. If the city includes the query, take the last letter of the query and count its occurrences in the query. From that, calculate the length to slice and return the matching part of the original city string. Note that this assumes the match starts at the beginning of the city name.
const findMatch = (q, c) => {
const query = q.toLowerCase().replace(/[\s-]/g, "");
const city = c.toLowerCase().replace(/[\s-]/g, "");
if (city.includes(query)) {
const last = query.charAt(query.length - 1); // last letter
const occ = query.split(last).length - 1; // get occurences
// calculate slice length
const len = c.toLowerCase().split(last, occ).join(" ").length + 1;
return c.slice(0, len);
}
return "No matching city found."
}
const city = "Sint-Jan Test";
console.log(findMatch("sint j", city));
console.log(findMatch("sintj", city));
console.log(findMatch("Sint Jan t", city));
console.log(findMatch("sint-j", city));
console.log(findMatch("Sint-J", city));
console.log(findMatch("SintJan te", city));

Replace words in var string by elements of array in JS

My code automatically searches the string for color names, adds random number suffixes and stores them as elements in an array.
What I want is to create a new string with the new modified elements of my array.
Problem comes when string has multiple occurrences of the same color name.
What I need is to replace these occurrences with the different elements of my exported array one by one.
(I don't want to split the string into an array, replace the matching elements with the other array in a brand-new one and then join it back into a string. I need to modify the original string.)
Example:
The string changes through user input, so if I have:
str = ' little red fox is blue and red cat is blue';
then my code finds all color names and produces a new array like this:
array = [ 'red2354' , 'blue7856' , 'red324', 'blue23467'] ;
(my code adds RANDOM suffixes at the end of every color element, but the order of my array is the same as the order of the occurrences in the string)
Desired Output:
str = ' little red2354 fox is blue7856 and red324 cat is blue23467 ';
What I have tried so far:
var str = ' little red fox is blue and red cat is blue ';
//I have split string to Array:
ar1 = [ "little","red","fox","is","blue","and","red","cat","is","blue"];
//var dup = matchmine(ar1) find all color duplicates :
var dup = ["red","blue","red","blue"];
//I've sorted all duplicates to appear only once for option B:
var dup2 = ["red","blue"];
//var res = modify(str) takes all color names and adds random suffixes:
var res= ["redA" , "blueA" , "redB", "blueB" ] ;
//I have also create a new array with all strings in case I needed to match lengths:
var final = [ "little","redA","fox","is","blueA","and","redB","cat","is","blueB"];
var i = ar1.length-1;
for ( i ; i >= 0; i--) {
var finalAr = str.replace(ar1[i],final[i]);
str = finalAr;}
alert(finalAr);
The problem is that the loop replaces the elements one by one: so far so good, but for the second occurrence of a word it replaces the first occurrence again.
loops result:
str = 'little redAB fox is blueAB and red cat is blue '
Desired output:
str = 'little redA fox is blueA and redB cat is blueB '
Some of your logic remains hidden in your question, like on what grounds you determine which word should get a suffix, or how that suffix is determined.
So my answer cannot be complete. I will assume that all duplicated words (including "is") should get a suffix. If you already know how to isolate the words that should be taken into consideration, you can just inject your word-selection logic where I look for duplicates.
For the suffix determination, I provide a very simple function which produces a unique number at each call (sequentially). Again, if you have a more appropriate logic to produce those suffixes, you can inject your code there.
I suggest that you create a regular expression from the words that you have identified, and then call replace on the string with that regular expression and use the callback argument to add the dynamic suffix.
Code:
function markDupes(str, getUniqueNum) {
  // 1. Get all words that are duplicate (replace with your own logic)
  let dupes = (str.match(/\S+/g) || []).reduce(({words, dupes}, word) => ({
    dupes: words.has(word) ? dupes.concat(word) : dupes,
    words: words.add(word)
  }), {words: new Set, dupes: []}).dupes;
  // Nothing to mark: an empty pattern would match at every position
  if (!dupes.length) return str;
  // Turn those words into a regular expression. Escape special characters:
  dupes = dupes.map(word => word.replace(/[\/\\^$*+?.()|[\]{}]/g, '\\$&'))
               .join("|");
  let regex = new RegExp(dupes, "g");
  // Make the replacement of duplicate words and call the callback function:
  return str.replace(regex, word => word + getUniqueNum());
}
// Example has two inputs:
// 1. A function that determines the suffix:
// Every call should return a different value
var getUniqueNum = ((i = 1) => {
// ... here we choose to just return an incremental number
// You may implement some other (random?) logic here
return () => i++;
})();
// 2. The input string
let str = 'little red fox is blue and red cat is blue ';
// The call:
console.log(markDupes(str, getUniqueNum));
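Since the question already has the suffixed words in an array, in the same order as they occur in the string, the replace-with-callback idea above can also simply consume that array, one element per match. A sketch, assuming the color names to look for are known up front (["red", "blue"] is just the example's list; replaceInOrder is a made-up name):

```javascript
// Replace the n-th occurrence of any word in `words` with replacements[n].
function replaceInOrder(str, words, replacements) {
  if (!words.length) return str;
  // Escape special characters and match whole words only
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const regex = new RegExp(`\\b(?:${escaped.join("|")})\\b`, "g");
  let next = 0;
  // Each match takes the next replacement; extra matches stay unchanged
  return str.replace(regex, match =>
    next < replacements.length ? replacements[next++] : match
  );
}

const input = " little red fox is blue and red cat is blue ";
const suffixed = ["red2354", "blue7856", "red324", "blue23467"];
console.log(replaceInOrder(input, ["red", "blue"], suffixed));
```

Because a single replace call walks the string once from left to right, an already replaced occurrence is never visited again, which is what went wrong in the question's loop.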
Make an object that works as a map for your replacers:
const replacers = {
red: 'redA',
blue: 'blueB'
}
Then split your string into an array of words and map over it, replacing as you go:
const inputStr = 'this is my red string blue words'
const stringArr = inputStr.split(' ')
const result = stringArr.map(word => replacers[word] ? replacers[word] : word).join(' ')

Convert string that contains HTML to sentences and also keep separator using Javascript

This is my string. It contains some HTML:
First sentence. Here is a Google link in the second sentence! The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !? The last sentence looks like <b>this</b>??
I want to split the string into sentences (an array), keeping the HTML as well as the separator, like this:
[0] = First sentence.
[1] = Here is a Google link in the second sentence!
[2] = The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !?
[3] = The last sentence looks like <b>this</b>??
Can anybody suggest a way to do this, please? Maybe using a regex and match?
This is very close to what I’m after, but not really with the HTML bits:
JavaScript Split Regular Expression keep the delimiter
The easy part is the parsing; you can do this easily by assigning the string to the innerHTML of a wrapper element. Splitting the sentences is somewhat more intricate; this is my first stab at it:
var s = 'First sentence. Here is a Google. link in the second sentence! The third sentence might contain an image like this <img src="http://link.to.image.com/hello.png" /> and ends with !? The last sentence looks like <b>this</b>??';
var wrapper = document.createElement('div');
wrapper.innerHTML = s;
var sentences = [],
    buffer = [],
    match, // result of re.exec()
    re = /[^.!?]+[.!?]+/g;
[].forEach.call(wrapper.childNodes, function(node) {
if (node.nodeType == 1) {
buffer.push(node.outerHTML); // save html
} else if (node.nodeType == 3) {
var str = node.textContent; // shift sentences
while ((match = re.exec(str)) !== null) {
sentences.push(buffer.join('') + match);
buffer = [];
str = str.slice(re.lastIndex + 1); // drop the sentence and the character after it
re.lastIndex = 0; // reset regexp
}
buffer.push(str);
}
});
if (buffer.length) {
sentences.push(buffer.join(''));
}
console.log(sentences);
Every node that's either an element or an unfinished sentence gets added to a buffer until a full sentence is found; the buffer is then prepended to that sentence and the result is appended to the sentences array.
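The regex applied to each text node can be tried on its own. A sketch on plain text (no DOM needed); the second alternative, which keeps a trailing fragment without terminating punctuation, is an addition that is not in the answer:

```javascript
// Split plain text into sentences, keeping the terminating punctuation
function splitSentences(text) {
  // [^.!?]+ is the sentence body, [.!?]+ its terminator(s) such as "!?" or "??";
  // [^.!?]+$ picks up a final fragment that has no terminator
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  // Trim the leading space each sentence carries and drop empty pieces
  return matches.map(s => s.trim()).filter(s => s.length > 0);
}

console.log(splitSentences("First sentence. Second one! Ends with !? Last??"));
// → ["First sentence.", "Second one!", "Ends with !?", "Last??"]
```

On the raw HTML string this regex would also split inside attributes (for example at the dots in the image URL), which is why the answer runs it only on text nodes.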
