Javascript match and highlight advanced - javascript

I'm trying to build a search engine with JavaScript and PHP. So far it works, but I still have a problem with highlighting the search terms and limiting the amount of text shown. The backend returns JSON with a title and a description. The description can be long, so I want to truncate it and highlight the words the user is searching for.
So basically, if the backend response looks like this:
[{
"title": "Test topic",
"description": "<p>This topic contains some words that user is searching for</p>
<div><h1>It could contain anything</h1>
<p> It could contain anything. Important thing is that it should work.</p>
<img src="some_pic.jpg"/>"
}]
So if I search for something like "What important topic contains", the user should see a parsed version of the description (or of whatever string is given), with the text limited to a radius around each search term. For this search, the perfect response would be something like:
This topic contains some words that user... It could contain
anything. Important thing is that it would...
I didn't pay attention to the exact words or radius here, but I think you get the idea.
What I do so far is:
- map the response from the backend
- strip the HTML (to prevent cut-off tags such as "...ong>" in the output)
- split the search term into words, then run a function that finds each word, makes it bold, and builds a radius around it
- show the result to the user in a view
Where is the problem? I have no logic to match two words together: I split the search term and push a separately parsed version of the description into an array for each word.
It looks like this (the important part of the component):
{
searchResults.length > 0 ? (
<div>
<span className={`text-xs text-blue`}>Threads</span>
{searchResults?.map((searched: any, index: Key) => {
// Replace html code (just in case)
const searchTerm = searched.description.replace(/<\/?[^>]+(>|$)/g, " ");
// Create array to push search term
const searchArray: any[] = [];
// Split search words into array and match results separately
search.split(" ").map((text, i) => {
// This function will be in the next code
searchArray.push(searchText(searchTerm, text));
});
// Just a view here
return (
<Link key={index} href={"/forum/thread/" + searched?.slug}>
<article
className={`w-full bg-gray-dark mb-2 px-3 py-1.5 rounded-xl text-gray-light`}
>
<h2>{searched.title}</h2>
{searchArray.map((match, i) => {
return (
<>
{/*Just in case, show it as html*/}
<div
className={`quill post-content text-xs text-gray`}
dangerouslySetInnerHTML={{ __html: match }}
/>
</>
);
})}
</article>
</Link>
);
})}
</div>
) : null;
}
And finally, the logic that highlights the text and builds a radius around that word:
const searchText = (description: string, search: string) => {
const radius = 30;
const indexPosition = description.search(search)
const startPosition = indexPosition - radius < 0 ? indexPosition : indexPosition - radius
const endPosition = indexPosition + radius + search.length
return (`...${description.slice(startPosition, endPosition)}...`).replace(search, `<span class="font-bold text-yellow">${search}</span>`)
}
With all that, for the same search term ("What important topic contains") I end up seeing:
...anything. Important thing is that it would...
...This topic contains some words that...
...This topic contains some words that user...
... It could contain anything. Important thing...
So basically every word is searched separately and the same text shows up several times. Any ideas how to improve that logic?

Here's a two-pass solution. It works as follows:
Setup
- Initialize an array fencepostPairs of [start: number, end: number] pairs, to track where each span starts and ends
- Combine the search terms into a dynamically created regex, termsRe
- Sanitize the description
First pass
- Iterate over all matches of termsRe against the sanitized description, expanding each into a span of text depending on the configured "radius", truncating at the start and end of the sanitized description
- For each span, check if it overlaps an existing pair of fenceposts (only need to check in backwards direction, as the regex matches in order of placement within the string). If it overlaps, replace the overlapping pair with the expanded range; if not, append the new pair
Second pass
- Map the fencepost pairs to their respective text spans, adding HTML to highlight the matches within each span, using termsRe
- Join the spans with line breaks
const NON_WORD_CHARS = /[^\p{L}\p{N}\p{M}]+/u
const ELIPSIS = '…'
const RADIUS = 30
const formatDescription = (description, search) => {
  const fencepostPairs = []
  // not technically necessary given we split on all non-word chars, but it'd be necessary if
  // e.g. we wanted to include "$" as a word char
  const regexEscape = (x) => x.replaceAll(/[\x00-\x7f]/g, m => /\W/.test(m) ? '\\x' + m.codePointAt(0).toString(16).padStart(2, '0') : m)
  // drop empty strings (e.g. from leading or trailing separators in the search), then
  // sort by length descending to ensure longer matches are preferred over shorter ones
  const terms = search.split(NON_WORD_CHARS).filter(Boolean).sort((a, b) => b.length - a.length)
  // lookahead (?=...) and lookbehind (?<=...) groups to ensure only full words matched
  const termsRe = new RegExp(`(?<=${NON_WORD_CHARS.source}|^)(${terms.map(regexEscape).join('|')})(?=${NON_WORD_CHARS.source}|$)`, 'giu')
  const sanitized = description.replace(/<\/?[^>]+(>|$)/g, ' ').replace(/\s+/g, ' ').trim()
  for (const { 0: match, index } of sanitized.matchAll(termsRe)) {
    const start = Math.max(0, index - RADIUS)
    const end = Math.min(sanitized.length, index + match.length + RADIUS)
    // pairs are stored in order and never overlap each other, so only the last one can match here
    const overlapIdx = fencepostPairs.findIndex(([s, e]) => e >= start)
    if (overlapIdx === -1) {
      fencepostPairs.push([start, end])
    } else {
      const candidates = [start, end, ...fencepostPairs[overlapIdx]]
      fencepostPairs[overlapIdx] = [Math.min(...candidates), Math.max(...candidates)]
    }
  }
  const getFragment = ([start, end]) => {
    const fragment = sanitized.slice(start, end)
    return [
      start && ELIPSIS,
      fragment.replaceAll(termsRe, '<em class="highlight">$&</em>'),
      end !== sanitized.length && ELIPSIS,
    ].filter(Boolean).join('')
  }
  const fragments = fencepostPairs.map(getFragment)
  return fragments.join('<br>')
}
const search = 'What other important topic contains'
const description = `<p>Important: This topic contains some words that user is searching for</p>
<div><h1>It could contain anything</h1>
<p> It could contain anything. The important thing is that it should work.</p>
<img src="some_pic.jpg">
lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet lorem ipsum dolor sit amet
<p>Here is some more text about another topic, which contains some more words that user is searching for.</p>
<p>It is also important.</p>`
document.querySelector('#target').innerHTML = formatDescription(description, search)
#target {
font-family: sans-serif;
}
.highlight {
font-style: normal;
background: yellow;
}
<div id="target"></div>
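The overlap handling from the first pass can also be looked at in isolation. Below is a minimal sketch, assuming the ranges arrive sorted by start position (which is what iterating regex matches in order gives you). The name mergeRanges is hypothetical and not part of the answer's code.

```javascript
// Hypothetical helper: merge [start, end] ranges that are already sorted by start.
const mergeRanges = (ranges) => {
  const merged = []
  for (const [start, end] of ranges) {
    const last = merged[merged.length - 1]
    if (last && start <= last[1]) {
      // Overlaps (or touches) the previous range: extend that range instead of adding a new one.
      last[1] = Math.max(last[1], end)
    } else {
      merged.push([start, end])
    }
  }
  return merged
}

console.log(mergeRanges([[0, 40], [30, 75], [120, 160]]))
// [[0, 75], [120, 160]]
```

Because the input is sorted, only the most recently added range ever needs to be checked, which is the same reasoning the answer uses for checking "in backwards direction" only.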

Related

Programmatically get the Range of a text spread over multiple elements

I have a code element, and I know the text I'm looking for is inside it, for example:
<p>
Lorem ipsum <span class="bold">dolor</span> sit amet
</p>
Note the span that is used for styling specific words.
Now, assume I have a reference to the p element, and I want to programmatically mark the ipsum dolor sit part. How can I achieve that?
You can use the Selection API with a Range argument to programmatically select text in an element.
The Range start and end positions accept a Child Node number, or Character inside a Text node. In our case, we need to reach the Text nodes to direct to the text position inside them (in our example, it will start on the first Text node of p, in position 11, and will end on the last Text in position 4).
To find the right node and the text position inside it, use the following function:
const findPositionInsideTree = (node, position) => {
if (node.nodeType === Node.TEXT_NODE) {
return { node, position };
}
for (let child of node.childNodes) {
if (position <= child.textContent.length) return findPositionInsideTree(child, position);
position -= child.textContent.length;
}
};
This recursive code loops over the child nodes and counts the expected position inside each node.
And now you only need to call this function for your text, create a Range and add it to the Selection:
const textStart = element.textContent.indexOf('ipsum dolor sit');
const textEnd = textStart + 'ipsum dolor sit'.length;
const start = findPositionInsideTree(element, textStart);
const end = findPositionInsideTree(element, textEnd);
const range = new Range();
range.setStart(start.node, start.position);
range.setEnd(end.node, end.position);
window.getSelection().removeAllRanges()
window.getSelection().addRange(range)
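For what it's worth, here is a slightly more defensive sketch of the same offset-to-node lookup. It is only an illustration under a few assumptions: findTextPosition is a hypothetical name, it returns null when the offset lies beyond the text, it skips comment nodes (which do not contribute to an element's textContent), and it is exercised on plain-object stand-ins for DOM nodes so it can run outside a browser. In a page you would pass the real p element instead.

```javascript
const ELEMENT_NODE = 1 // same numeric values as Node.ELEMENT_NODE and Node.TEXT_NODE
const TEXT_NODE = 3

// Returns { node, position } for a character offset into node's text, or null if out of range.
const findTextPosition = (node, offset) => {
  if (node.nodeType === TEXT_NODE) {
    return offset <= node.textContent.length ? { node, position: offset } : null
  }
  for (const child of node.childNodes) {
    // Comments and other node types do not count towards the parent's textContent.
    if (child.nodeType !== ELEMENT_NODE && child.nodeType !== TEXT_NODE) continue
    const length = child.textContent.length
    if (offset <= length) {
      const found = findTextPosition(child, offset)
      if (found) return found // null only for elements without any text, so keep looking
    }
    offset -= length
  }
  return null
}

// Plain-object stand-ins for DOM nodes (hypothetical, only for running outside a browser).
const text = (value) => ({ nodeType: TEXT_NODE, textContent: value, childNodes: [] })
const element = (...children) => ({
  nodeType: ELEMENT_NODE,
  childNodes: children,
  textContent: children.map((child) => child.textContent).join(''),
})

const p = element(text('Lorem ipsum '), element(text('dolor')), text(' sit amet'))
const phrase = 'ipsum dolor sit'
const startOffset = p.textContent.indexOf(phrase)
const start = findTextPosition(p, startOffset)
const end = findTextPosition(p, startOffset + phrase.length)
console.log(start.position, end.position) // 6 4
```

The returned node and position pairs are what you would hand to range.setStart and range.setEnd.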
Maybe you can use this:
const text = pElement.textContent;
const words = text.split(" ");
const startIndex = words.indexOf("ipsum");
const spanElement = document.createElement("span");
spanElement.classList.add("bold");
spanElement.textContent = words.slice(startIndex, startIndex + 3).join(" ");
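// Re-add the spaces around the new span; note that this replaces any markup already inside pElement
pElement.innerHTML = words.slice(0, startIndex).join(" ") + " " + spanElement.outerHTML + " " + words.slice(startIndex + 3).join(" ");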

JavaScript - Truncate innerHTML string after N characters, skipping any tags or attributes inside it

I am trying to add three dots after 130 characters of a string that is taken from a DIV through the JavaScript innerHTML property. The innerHTML may contain tags and attributes, which need to be skipped while counting. I also need to keep the truncated HTML and assign it back to the same DIV after the operation is completed.
Here are some sample input strings and the expected outputs:
Input 1:
<p>There are many <i>variations</i> of passages of <b>Lorem Ipsum</b> available, but the majority have suffered alteration in some form, by injected humour, or randomised words that don't look even slightly believable.</p>
Output 1:
<p>There are many <i>variations</i> of passages of <b>Lorem Ipsum</b> available, but the majority have suffered alteration in some form, by injecte...</p>
Input 2:
<p><span class="header3">There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words that don't look even slightly believable.</span></p>
Output 2:
<p><span class="header3">There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injecte...</span></p>
Input 3:
<h4><span class="santral-pullquote-32"><span class="body-regular">There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words that don't look even slightly believable.</span></span></h4>
Output 3:
<h4><span class="santral-pullquote-32"><span class="body-regular">There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injecte...</span></span></h4>
With the following function, only input 1 works; the others do not:
const cutString = (str) => {
  let stringCount = 0;
  let keepCounting = true;
  let ans = '';
  let openTags = [];
  for (let i = 0; i < str.length; i++) {
    if (str[i] == "<") {
      keepCounting = false;
      if (str[i + 1] != `/`) openTags.push(str[i + 1])
      continue;
    }
    if (str[i] == ">") {
      keepCounting = true;
      if (str[i - 2] == `/`) openTags.splice(openTags.indexOf(str[i - 2]), 1)
      continue;
    }
    if (keepCounting) stringCount++
    if (stringCount == 131) {
      ans = str.slice(0, i);
      break;
    }
  }
  openTags.forEach(tag => ans = ans + `…</${tag}>`);
  return ans;
}
I'm using pure JavaScript (no jQuery).
Any help would be greatly appreciated!
Thanks in advance.
You can try this. Note that this code updates the HTML directly; if you wish to keep the original content, clone the node you want to work with and run the trim on the clone:
function map_node(f_objNode,f_currLimit){
    for(var i=0;i<f_objNode.childNodes.length;i++){
        var l_node = f_objNode.childNodes[i];
        if(f_currLimit == 0){ //max length exceeded, remove node
            l_node.remove();
            i--; //move index backwards to account for the deleted node
            continue;
        }else if(l_node.nodeType == 3){ //textnode
            var l_strText = l_node.nodeValue;
            if((f_currLimit - l_strText.length) < 0){ //the text length
                                                      //exceeds the limit
                                                      //trim and 0 the limit
                l_node.nodeValue = l_strText.substring(0,f_currLimit) + '...';
                f_currLimit = 0;
            }else{ //the text fits within the remaining limit, update the limit
                f_currLimit -= l_strText.length
            }
        }
        //process the children of the node,
        //you can add check here to skip the call if no children
        f_currLimit = map_node(l_node,f_currLimit);
    }
    return f_currLimit
}
//this is how you use it.
function test(){
    var l_textLimit = 100; //your limit
    var l_div = document.getElementById("test-div"); //your node
    var l_data = map_node(l_div,l_textLimit); //trim the node in place
                                              //not really used, but if you replace the
                                              //the limit integer with {} you can add postprocessing
    console.log(l_data)
}
As a side note, be wary of parsing the string yourself as a way to tokenize HTML. It certainly has its uses, but it can get pretty complicated if you want to maintain the structure of the node. In such cases, it is easier and more efficient to work with the DOM directly. Going in that direction also has issues: larger DOM (sub)trees are not an ideal target for recursive processing, so you need to pick your approach for the concrete context.
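If you do want to stay with the string-based approach from the question, the main thing missing there is tracking whole tag names rather than a single character. Here is a rough sketch of that idea; truncateHtml and VOID_TAGS are hypothetical names, and it assumes well-nested markup with no ">" inside attribute values and no HTML comments. Entities such as &amp;amp; are counted character by character.

```javascript
// Tags that never have a closing tag (a partial list, enough for the sketch).
const VOID_TAGS = new Set(['br', 'img', 'hr', 'input', 'meta', 'link'])

const truncateHtml = (html, limit, ellipsis = '...') => {
  const openTags = [] // names of the tags that are currently open
  let count = 0       // text characters seen so far (tags are not counted)
  let i = 0
  while (i < html.length) {
    if (html[i] === '<') {
      const close = html.indexOf('>', i)
      if (close === -1) break // malformed tail: give up and return the input unchanged
      const tag = html.slice(i + 1, close)
      const name = tag.replace(/^\//, '').split(/[\s/]/)[0].toLowerCase()
      if (tag.startsWith('/')) {
        openTags.pop() // closing tag: assumes it matches the innermost open tag
      } else if (!tag.endsWith('/') && !VOID_TAGS.has(name)) {
        openTags.push(name)
      }
      i = close + 1
      continue
    }
    if (count === limit) {
      // Limit reached and more text follows: cut here and close whatever is still open.
      const closers = openTags.reverse().map((open) => `</${open}>`).join('')
      return html.slice(0, i) + ellipsis + closers
    }
    count++
    i++
  }
  return html // the text is not longer than the limit
}

console.log(truncateHtml('<p>Hello <b>big</b> world</p>', 7))
// <p>Hello <b>b...</b></p>
```

For the question's case you would call it with a limit of 130 on the element's innerHTML and assign the result back. For anything beyond simple markup, the DOM-based version above is the safer route.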

How to find the html tag of any text in which the text is contained [duplicate]

I'm attempting to find a specific string in a document where the text can potentially be split by other tags (e.g. "<p>This is an <span>example</span></p>"). I want to find the string "This is an example" in a much larger document and return the first parent element it belongs to (in this case, a <p> tag).
I wrote some code to find a string's index in an arbitrary web document, in such a way that it hopefully accommodates the string being split. It returns the index where the string starts. I'm wondering how to do this more efficiently or, if this is a decent way, how to retrieve the parent element containing that index, given the index of a string in a $("body").html() string.
EDIT: Perhaps this was unclear. I am looking for the parent of a string in a document, and I cannot make any assumptions about where the string may be or what tag its parent may be. So I call $("body").html() and attempt to find the index of the substring in the HTML "string". This is almost certainly inefficient; I'm really desperate for help.
function get_string_parent(str) {
    var p = null;
    var split = str.split(' ');
    var body_html = $("body").html();
    var lower_ind = 0;
    var upper_ind = split.length;
    var STOPPING_LENGTH = 3; //give up after not finding string of length 3...
    var ind = -1;
    do { //shrink string until a snippet is found
        ind = body_html.indexOf(split.slice(lower_ind, upper_ind).join(' '));
        upper_ind--;
    }
    while (ind < 0 && upper_ind > STOPPING_LENGTH);
    console.log("FOUND AT INDEX: ", ind);
    //console.log("String around index: ", body_html.slice(ind - 10, ind + 10));
    //I"M UNSURE OF, GIVEN A VALID "IND", how to get the parent element at that index
    return p;
}
Thanks for your time, I'm not familiar with webdev and I'm almost certainly in over my head.
This will get you started. You can use :contains() selector for finding string in html.
function getStringParent(str) {
return $("p:contains('"+ str +"')");
}
var parent = getStringParent('This is an example');
console.log('found ' + parent.length + ' items' + '\n');
console.log(parent);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p> This is an <span> example </span> </p>
Use padolsey's findAndReplaceDOMText tool and pass it the replace option so that it returns the node you're looking for.
You need recursion for this:
function recursivelySearchString(str,from){
if(from.textContent.indexOf(str)==-1)
return null // doesn't contain the string, stop
var children = Array.from(from.children)
if(children.length>0){
// current element has children, look deeper
for(var i=0;i<children.length;i++){
var found = recursivelySearchString(str, children[i])
if(found)
return found
}
}
// none of the children matched, return the parent
return from
}
Calling recursivelySearchString('foobar',document.body) will return the closest element containing the phrase. Note that it returns a plain DOM element, not a jQuery object. If nothing is found it returns null.
Example:
var found = recursivelySearchString('dolores',document.body)
found.style.backgroundColor = 'yellow'
<div>
<p>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
</p>
<p>
At vero eos et accusam et <span>justo duo dolores et ea rebum.</span>
Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
</p>
</div>

How to use JavaScript for string word comparison

I am building an online typing test that uses two textareas. The first textarea contains the text to be typed into the second one. To calculate the net typing speed I need a JavaScript diff algorithm.
The Javascript Diff Algorithm fits all my requirements. It uses the jsdiff.js file to diff two strings, and JS Diff Demo is a demo that uses the same file; it is worth a look. But how can I count the correctly typed words? The trouble is that the JavaScript file has no comments and comes with no documentation.
I'm not sure if you need much more explanation than the comment I placed above. I like the diff-highlighting your link shows, but if all you're after is counting the diffs, why does something like this not work? http://jsfiddle.net/vSySu/
var arr1 = $('#text1').val().split(' ');
var arr2 = $('#text2').val().split(' '); // split on whatever string or regex you want.
var diffs = 0;
for (var i = 0; i < arr1.length; i++) {
if (arr1[i] !== arr2[i]) {
diffs++;
}
}
alert(diffs);
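Since the question asks for the number of correct words rather than the number of differences, the same position-by-position comparison can be turned around. A small sketch, assuming words are separated by whitespace and compared by position only; countCorrectWords is a hypothetical name.

```javascript
// Counts the typed words that match the expected word at the same position.
const countCorrectWords = (expected, typed) => {
  const expectedWords = expected.trim().split(/\s+/)
  const typedWords = typed.trim().split(/\s+/)
  // word !== '' guards against the single empty string produced by an empty textarea
  return typedWords.filter((word, i) => word !== '' && word === expectedWords[i]).length
}

console.log(countCorrectWords('the quick brown fox', 'the quikc brown')) // 2
```

Keep in mind that a purely positional comparison marks everything after a skipped or extra word as wrong, which is exactly the situation a real diff algorithm handles better.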
You could use a combination of a Levenshtein algorithm to find the accuracy, and some basic string manipulation to count the words that are different. The snippet assumes a levenshtein(a, b) function is defined elsewhere. This can be improved but you get the idea:
function wordAccuracy(str1, str2) {
var len = str1.length,
distance = levenshtein(str1, str2),
words1 = str1.split(' '),
words2 = str2.split(' ');
return {
accuracy: 100 - (0|(distance * 100) / len) +'%',
fail: words1.filter(function(word, idx){
return word != words2[idx];
}).length
}
}
// Example:
var str1 = 'Lorem ipsum dolor sit amet consectetur adipiscing elit';
var str2 = 'Lorme ipsmu dolor sit maet cnsectetur adipiscing elot';
console.log(wordAccuracy(str1, str2));
//^ {
// accuracy: '86%'
// fail: 5
// }
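The snippet above calls levenshtein without defining it. Any standard implementation should do; here is a minimal sketch of the usual dynamic-programming version (insertions, deletions and substitutions each cost 1), which may differ from whatever implementation the answer had in mind.

```javascript
// Classic Levenshtein edit distance, keeping only the previous row of the table.
const levenshtein = (a, b) => {
  // prev[j] = distance between the first i - 1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i] // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      )
    }
    prev = curr
  }
  return prev[b.length]
}

console.log(levenshtein('kitten', 'sitting')) // 3
```

Note that a swapped pair of letters (e.g. "Lorem" vs "Lorme") counts as two edits with this definition.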

Need help making this small javascript 'game'

I'm trying to use JavaScript to make a game/script in which:
- each number from 0 to 20 equals a specific word, e.g. 0=saw, 1=tie, 2=noah.
- a number pops up on the screen and the user needs to type in the word that matches that number.
- the program then says whether the answer was correct and adds to the points.
Demo: http://jsfiddle.net/elclanrs/gfQsP/
var points = 0;
function game() {
var words = 'lorem ipsum dolor sit amet consecteur adipisci elit'.split(' ');
var rand = -~(Math.random() * words.length);
var result = prompt('Word number '+ rand +'?');
var match = words[rand-1] === result.toLowerCase();
alert( match ? 'Good' : 'Bad');
match && points++;
}
Well, this can be easy. Just make an object containing key-value pairs, like this:
var obj = {
    0: 'saw',
    1: 'tie',
    2: 'noah'
};
Then ask the player for the value of a particular number. Loop through this object (or look the number up directly) to find the value for that number and compare it with the value entered by the player. If they match, increment the points counter.
It's pretty straightforward.
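A rough sketch of that idea, using the three example words from the question (the rest of the 0-20 list is not given, so it is left out). WORDS, checkAnswer and play are hypothetical names, and the sketch looks the number up directly instead of looping over the object.

```javascript
// Number-to-word pairs taken from the question's example; extend up to 20 as needed.
const WORDS = { 0: 'saw', 1: 'tie', 2: 'noah' }

// True when the answer matches the word stored for that number (ignoring case and outer spaces).
const checkAnswer = (number, answer) =>
  WORDS[number] === answer.trim().toLowerCase()

let points = 0

// Plays one round: checks the answer, updates the score and reports the result.
const play = (number, answer) => {
  const correct = checkAnswer(number, answer)
  if (correct) points++
  return correct
}

console.log(play(1, ' Tie '), points) // true 1
console.log(play(2, 'saw'), points)   // false 1
```

In the browser, the number would come from something like Math.floor(Math.random() * 21) and the answer from prompt() or an input field.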
