Split array elements with white space into constituent elements in same array - javascript

Context
I'm trying to build a simple page that will let me display and filter bookmarks. Bookmarks can have zero to many tags and tags can be repeated between different bookmarks. The tags for each bookmark are stored in a data-attribute in the wrapper element for each bookmark (see below). My end goal is to be able to dynamically generate a menu of all the tags in use on the page and hide/show based on selections there. I'm hoping to avoid using any external libraries for this as I want this to be my homepage and to keep it really simple. So pure JS solutions preferred, unless you think I'm being needlessly picky.
Problem
Right now I'm stuck generating a useful array of the tags. I can filter for undefined elements in the array (when a bookmark doesn't have a tag) and I can filter for duplicate elements in the array (when tags appear on multiple bookmarks). But I'm majorly stuck when a bookmark has multiple tags. You can only use a data-attribute once per element, which means all tags for a particular bookmark have to be stored in a list in the data attribute.
What I'd like to happen is that for every bookmark that contains two or more tags, the script splits them and makes each one a new element of the array.
Here's an example, note the "data-tags" attribute in <article>:
<article class="note"
data-date="2021-08-04T03:40"
data-tags="Tag-One Tag-Two">
<section class="item">Bookmark Description</section>
<section class="tags">
Tag-One
Tag-Two
</section>
</article>
This is the part of the script that's working:
const notes = document.querySelectorAll('.note');
let tags = [];
// get tags
notes.forEach(note => tags.push(note.dataset.tags));
//Will look like this: tags[] = ("Tag-One Tag-Two")
//remove dupes
tags = tags.filter((value,index,array) => array.indexOf(value) == index);
//remove undefined values
tags = tags.filter(tag => tag !== undefined);
Below is the part I have trouble with. I've tried looping through the array multiple ways but each time splice() doesn't work in the way I expect and often the array is trimmed before later tags can be split up.
for (let index = 0; index < tags.length; index++) {
var value = tags[index];
if (value.indexOf(' ') > 0 ){
var newTags = value.split(' ');
newTags.forEach(newTag => tags.push(newTag));
tags.splice(index,1);
}
}
Any help would be hugely appreciated. If the earlier code is bad too, let me know!

Get all the elements with data-tags using the attribute selector [data-tags]. Convert to an array using Array.from() and map it to an array of tags. Flatten the array of arrays by splitting the tags string, and filter empty tags. Use a Set to dedupe, and convert back to an array using Array.from() (or array spread).
const notes = document.querySelectorAll('[data-tags]');
const tags = Array.from(
new Set(
Array.from(notes, note => note.dataset.tags.split(/\s+/g))
.flat()
.filter(Boolean)
)
)
console.log(tags)
<article class="note"
data-date="2021-08-04T03:40"
data-tags="Tag-One Tag-Two">
</article>
<article class="note"
data-date="2021-08-04T03:40"
data-tags="Tag-Two">
</article>
<article class="note"
data-date="2021-08-04T03:40"
data-tags="">
</article>
<article class="note"
data-date="2021-08-04T03:40">
</article>
To fix your code, only push if there are tags, and then split the tags, and use array spread to add multiple items to the array, instead of a sub-array:
const notes = document.querySelectorAll('.note');
let tags = [];
// get tags
notes.forEach(note =>
note.dataset.tags && tags.push(...note.dataset.tags.split(/\s+/g))
);
//Will look like this: tags = ["Tag-One", "Tag-Two", "Tag-Two"]
//remove dupes
tags = tags.filter((value,index,array) => array.indexOf(value) == index);
//no need to filter undefined, notes without tags were never pushed
console.log(tags);
<article class="note"
data-date="2021-08-04T03:40"
data-tags="Tag-One Tag-Two">
</article>
<article class="note"
data-date="2021-08-04T03:40"
data-tags="Tag-Two">
</article>
<article class="note"
data-date="2021-08-04T03:40"
data-tags="">
</article>
<article class="note"
data-date="2021-08-04T03:40">
</article>
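As for why the loop in the question misbehaves: splice(index, 1) removes the current element, so the next element slides into the same index, and the following index++ steps right over it. Here is a minimal sketch of that effect, using made-up sample data rather than values from your page, together with one possible workaround (stepping the index back after the removal):

```javascript
// Sample data standing in for the values read from data-tags (illustrative only).
const original = ["Tag-One Tag-Two", "Tag-Three Tag-Four", "Tag-Five"];

// The loop from the question: forward iteration plus splice.
const skipped = [...original];
for (let index = 0; index < skipped.length; index++) {
  const value = skipped[index];
  if (value.indexOf(" ") > 0) {
    value.split(" ").forEach(newTag => skipped.push(newTag));
    skipped.splice(index, 1); // the next element now sits at `index` and is never visited
  }
}
console.log(skipped);
// "Tag-Three Tag-Four" is still in the array, unsplit

// Same loop, but stepping the index back after the removal.
const fixed = [...original];
for (let index = 0; index < fixed.length; index++) {
  const value = fixed[index];
  if (value.includes(" ")) {
    fixed.push(...value.split(/\s+/));
    fixed.splice(index, 1);
    index--; // revisit the element that moved into this slot
  }
}
console.log(fixed);
// → ["Tag-Five", "Tag-One", "Tag-Two", "Tag-Three", "Tag-Four"]
```

Building a new array instead of mutating the one you are iterating avoids the index bookkeeping altogether.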

You can extract the tags from every note, split them on whitespace, drop the empty values with filter(Boolean), and then use a Set to keep only the unique tags.
const notes = document.querySelectorAll('.note');
const tags = [...new Set(
[...notes]
.flatMap(note => (note.dataset.tags ?? '').split(/\s+/g)) // ?? '' guards notes without data-tags
.filter(Boolean)
)];
console.log(tags);
<article class="note" data-date="2021-08-04T03:40" data-tags="Tag-One Tag-Two">
<section class="item">Bookmark Description</section>
<section class="tags">
Tag-One
Tag-Two
</section>
</article>
<article class="note" data-date="2021-08-04T03:40" data-tags="Tag-three Tag-Two">
<section class="item">Bookmark Description</section>
<section class="tags">
Tag-One
Tag-Two
</section>
</article>
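If it helps to try the transformation outside the browser, the split / filter / Set steps can be sketched as a plain function over strings. The name collectTags and the sample input are only illustrative; the input stands in for what note.dataset.tags would yield, including a note with an empty attribute and one with no attribute at all:

```javascript
// Takes the raw data-tags values (strings, possibly undefined) and returns the unique tags.
function collectTags(rawValues) {
  return [...new Set(
    rawValues
      .flatMap(value => (value ?? "").split(/\s+/)) // undefined becomes "", which splits to [""]
      .filter(Boolean)                              // drop the empty strings
  )];
}

// Illustrative input: four bookmarks, one with an empty attribute, one without the attribute.
console.log(collectTags(["Tag-One Tag-Two", "Tag-Two", "", undefined]));
// → ["Tag-One", "Tag-Two"]
```

On the page itself this would presumably be called as collectTags(Array.from(document.querySelectorAll('.note'), note => note.dataset.tags)).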

Related

Excluding inner tags from string using Regex

I have the following text:
If there would be more <div>matches<div>in</div> string</div>, you will merge them to one
How do I make a JS regex that will produce the following text?
If there would be more <div>matches in string</div>, you will merge them to one
As you can see, the additional <div> tag has been removed.
I would use a DOMParser to parseFromString into the more fluent HTMLDocument interface to solve this problem. You are not going to solve it well with regex.
const htmlDocument = new DOMParser().parseFromString("this <div>has <div>nested</div> divs</div>", "text/html");
htmlDocument.body.childNodes; // NodeList(2): [ #text, div ]
From there, the algorithm depends on exactly what you want to do. Solving the problem exactly as you described to us isn't too tricky: recursively walk the DOM tree; remember whether you've seen a tag yet; if so, exclude the node and merge its children into the parent's children.
In code:
const simpleExampleHtml = `<div>Hello, this is <p>a paragraph</p> and <div>some <div><div><div>very deeply</div></div> nested</div> divs</div> that should be eliminated</div>`
// Parse into an HTML document
const doc = new DOMParser().parseFromString(simpleExampleHtml, "text/html").body;
// Process a node, removing any tags that have already been seen
const processNode = (node, seenTags = []) => {
// If this is a text node, return it
if (node.nodeName === "#text") {
return node.cloneNode()
}
// If this node has been seen, return its children
if (seenTags.includes(node.tagName)) {
// flatMap flattens, in case the same node is repeatedly nested
// note that this is a newer JS feature and lacks IE11 support: https://caniuse.com/?search=flatMap
return Array.from(node.childNodes).flatMap(child => processNode(child, seenTags))
}
// If this node has not been seen, process its children and return it
const newChildren = Array.from(node.childNodes).flatMap(child => processNode(child, [...seenTags, node.tagName]))
// Clone the node so we don't mutate the original
const newNode = node.cloneNode()
// We can't directly assign to node.childNodes - append every child instead
newChildren.forEach(child => newNode.appendChild(child))
return newNode
}
// resultBody is an HTML <body> Node with the desired result as its childNodes
const resultBody = processNode(doc);
const resultText = resultBody.innerHTML
// <div>Hello, this is <p>a paragraph</p> and some very deeply nested divs that should be eliminated</div>
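To experiment with the recursion without a browser, the same walk can be sketched over plain objects. This is only an illustration under stated assumptions: the { tag, children } shape, flattenRepeatedTags, and render are hypothetical stand-ins for DOM nodes and innerHTML, not part of any DOM API.

```javascript
// Hypothetical stand-in for DOM nodes: strings are text nodes,
// objects are elements with a tag name and a list of children.
const tree = {
  tag: "div",
  children: [
    "Hello ",
    { tag: "p", children: ["para"] },
    " and ",
    { tag: "div", children: ["some ", { tag: "div", children: ["nested"] }, " divs"] },
  ],
};

// Returns an array of nodes: the rebuilt element, or, when the tag was
// already seen on the way down, just its processed children.
function flattenRepeatedTags(node, seenTags = []) {
  if (typeof node === "string") return [node];
  if (seenTags.includes(node.tag)) {
    return node.children.flatMap(child => flattenRepeatedTags(child, seenTags));
  }
  const children = node.children.flatMap(child =>
    flattenRepeatedTags(child, [...seenTags, node.tag]));
  return [{ tag: node.tag, children }];
}

// Serialises the plain-object tree back to markup.
function render(node) {
  if (typeof node === "string") return node;
  return `<${node.tag}>${node.children.map(render).join("")}</${node.tag}>`;
}

const [flattened] = flattenRepeatedTags(tree);
console.log(render(flattened));
// → <div>Hello <p>para</p> and some nested divs</div>
```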
But make sure you know EXACTLY what you want to do!
There are lots of potential complications you could face with data that's more complex than your example. Here are some examples where the simple approach may not give you the desired result.
<!-- nodes where nested identical children are meaningful -->
<ul>
<li>Nested list below</li>
<li>
<ul>
<li>Nested list item</li>
</ul>
</li>
</ul>
<!-- nested nodes with classes or IDs -->
<span>A span with <span class="some-class">nested spans <span id="DeeplyNested" class="another-class">with classes and IDs</span></span></span>
<!-- places where divs are essential to the layout -->
<div class="form-container">
<form>
<div class="form-row">
<label for="username">Username</label>
<input type="text" name="username" />
</div>
<div class="form-row">
<label for="password">Password</label>
<input type="text" name="password" />
</div>
</form>
</div>
A simple approach without regex: put the string into a p element via innerHTML, replace the first div's content with its own innerText (which drops any nested tags), and then read the result back with innerHTML:
let text = 'If there would be more <div>matches <div>in</div> string</div>, you will merge them to one';
const p = document.createElement('p');
p.innerHTML = text;
p.querySelector('div').innerText = p.querySelector('div').innerText;
console.log(p.innerHTML);

Puppeteer delete node inside element

I want to scrape a page that contains some news items.
Here is a simplified version of the HTML I have:
<info id="random_number" class="news">
<div class="author">
Name of author
</div>
<div class="news-body">
<blockquote>...</blockquote>
Here it's the news text
</div>
</info>
<info id="random_number" class="news">
<div class="author">
Name of author
</div>
<div class="news-body">
Here it's the news text
</div>
</info>
I want to get the author and text body of each news item, without the blockquote part.
So I wrote this code:
let newsItems = await newsPage.$$("info.news");
for (var news of newsItems){ // Loop through each element
let author = await news.$eval('.author', s => s.textContent.trim());
let textBody = await news.$eval('.news-body', s => s.textContent.trim());
console.log('Author :'+ author);
console.log('TextBody :'+ textBody);
}
It works well, but I don't know how to remove the blockquote from the "news-body" element before getting the text. How can I do this?
EDIT: Sometimes a blockquote exists, sometimes not.
You can use optional chaining with ChildNode.remove(). Also you may consider innerText more readable.
let textBody = await news.$eval('.news-body', (element) => {
element.querySelector('blockquote')?.remove();
return element.innerText.trim();
});

Selecting a childnode by inner text from a NodeList

First part [solved]
Given this HTML
<div id="search_filters">
<section class="facet clearfix"><p>something</p><ul>...</ul></section>
<section class="facet clearfix"><p>something1</p><ul>...</ul></section>
<section class="facet clearfix"><p>something2</p><ul>...</ul></section>
<section class="facet clearfix"><p>something3</p><ul>...</ul></section>
<section class="facet clearfix"><p>something4</p><ul>...</ul></section>
</div>
I can select all the sections with
const select = document.querySelectorAll('section[class^="facet clearfix"]');
The result is a nodelist with 5 children.
What I'm trying to accomplish is to select only the section containing the "something3" string.
This is my first attempt:
const xpathResult = document.evaluate("//section[p[contains(.,'something3')]]",
select, null, XPathResult.ANY_TYPE, null);
How can I filter the selection to select only the node I need?
Second part:
Sorry for repeatedly updating the question, but this problem seems to be beyond my current skill.
Now that I have the node I need, what I have to do is set a custom order for the li elements in the section:
<ul id="" class="collapse">
<li>
<label>
<span>
<input>
<span>
<a> TEXT
</a>
</span>
</input>
</span>
</label>
</li>
<li>..</li>
Assuming there are n li elements with n different texts and I have a custom order to follow, I think the best way to go would be to look at the innerText and match it against an array in which I set the correct order.
var sortOrder = ['text2','text1','text4','text3']
I think this approach should lead you to the solution.
Given your HTML
<div id="search_filters">
<section class="facet clearfix"><p>something</p><ul>...</ul></section>
<section class="facet clearfix"><p>something1</p><ul>...</ul></section>
<section class="facet clearfix"><p>something2</p><ul>...</ul></section>
<section class="facet clearfix"><p>something3</p><ul>...</ul></section>
<section class="facet clearfix"><p>something4</p><ul>...</ul></section>
</div>
I would write this js
const needle = "something3";
const selection = document.querySelectorAll('section.facet.clearfix');
let i = -1;
console.info("SELECTION", selection);
let targetIndex;
while(++i < selection.length){
if(selection[i].innerHTML.indexOf(needle) > -1){
targetIndex = i;
}
}
console.info("targetIndex", targetIndex);
console.info("TARGET", selection[targetIndex]);
Then you can play and swap elements around without removing them from the DOM.
PS. Since you know the CSS classes of the elements, you don't need the ^= (starts with) attribute selector, so I switched to a plain class selector.
PART 2: ordering children li based on content
const ul = selection[targetIndex].querySelector("ul"); // Find the <ul>
const lis = ul.querySelectorAll("li"); // Find all the <li>
const sortOrder = ['text2', 'text1', 'text4', 'text3'];
i = -1; // We already declared this above
while(++i < sortOrder.length){
const li = [...lis].find((li) => li.innerHTML.indexOf(sortOrder[i]) > -1);
!!li && ul.appendChild(li);
}
This will move the elements you want (only the one listed in sortOrder) in the order you need, based on the content and the position in sortOrder.
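To see the resulting order without touching the DOM, the same "move each match to the end" idea can be sketched on a plain array. The labels array and the reorder helper are illustrative assumptions; each string stands in for the text of one li.

```javascript
// Illustrative labels standing in for the text of each <li>.
const labels = ["text1", "other", "text2", "text3", "text4"];
const sortOrder = ["text2", "text1", "text4", "text3"];

function reorder(items, order) {
  const result = [...items];
  for (const term of order) {
    const position = result.findIndex(item => item.includes(term));
    if (position > -1) {
      // Moving the item to the end mirrors what appendChild does to an existing child.
      result.push(...result.splice(position, 1));
    }
  }
  return result;
}

console.log(reorder(labels, sortOrder));
// → ["other", "text2", "text1", "text4", "text3"]
```

Items not listed in sortOrder keep their relative order and end up in front. Because the match is a substring test, a term like "text1" would also match an item labelled "text10", so an exact comparison may be safer if your labels overlap.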
In order to use a higher-order function like filter on your NodeList, you have to convert it to an array. You can do this with spread syntax:
var select = document.querySelectorAll('section[class^="facet clearfix"]');
const filtered = [...select].filter(section => section.children[0].innerText == "something3")
This answer explains the magic behind it better.
You can get that list filtered as:
DOM Elements:
Array.from(document.querySelectorAll('section[class^="facet clearfix"]')).filter(_ => _.querySelector('p').innerText === "something3")
HTML Strings:
Array.from(document.querySelectorAll('section[class^="facet clearfix"]')).filter(_ => _.querySelector('p').innerText === "something3").map(_ => _.outerHTML)
:)
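One difference between the answers above is worth spelling out: the while loop tests with indexOf (a substring match), whereas the filter versions compare with == or === (an exact match). A small sketch with a hypothetical array-like object in place of the NodeList shows how the two can diverge; the text property here is a stand-in for innerText.

```javascript
// Plain array-like object standing in for a NodeList:
// it has length and numeric indices, but no filter() method.
const fakeNodeList = {
  length: 3,
  0: { text: "something" },
  1: { text: "something3" },
  2: { text: "something30" },
};

// Array.from accepts anything with a length and indexed entries.
const items = Array.from(fakeNodeList);

const exact = items.filter(item => item.text === "something3");
const partial = items.filter(item => item.text.includes("something3"));

console.log(exact.length, partial.length);
// → 1 2
```

Spread syntax needs an iterable, which a real NodeList is in current browsers, so either conversion works on the page.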

Get child elements within casper.each

Using CasperJS 1.1 with the following code, I'm able to fetch useful DOM HTML from a web page.
casper.each(c.getElementsInfo(xpath), function(casper, element, j) {
var html = element["html"].trim();
if(html.indexOf('Phone') > -1) {
// what should I put here?
}
});
However, I want to access and obtain the child elements of the element. How can I achieve this? The element's HTML source (i.e. the value of html) is as follows:
Loop 1
<div class="fields">
Phone
</div>
<div class="values">
12345678 (Mr. Lee) </div>
Loop 2
<div class="fields">
Emergency Phone
</div>
<div class="values">
23456789 (Emergency)
</div>
Loop 3
<div class="fields">
Opening Hours
</div>
<div class="values">
9:00am-6:30pm(Weekday) /
Close on Sundays and Public Holidays(Can be booked)(Holiday)
</div>
Loop 4
<div class="fields">
Last Update
</div>
<div class="values">
11/06/14 </div>
The above HTML is badly formatted, and contains a lot of whitespaces.
The data I wanted to fetch is:
Phone: 12345678
Emergency Phone: 23456789 (Emergency)
Opening Hours: 9:00am-6:30pm(Weekday) / Close on Sundays and Public Holidays(Can be booked)(Holiday)
Last Update: 11/06/14
Tried RegEx, but the RegEx is too complicated.
I don't recommend doing this with regular expressions. It can be easily done with some selectors, but it has to be done in the page context (inside of the evaluate() callback), because DOM nodes cannot be passed to the outside.
CasperJS provides a helper function for matching DOM nodes by XPath with __utils__.getElementsByXPath() through the ClientUtils module that is always automatically inserted. The result of that function is an array, so the normal forEach() pattern applies. DOM nodes can be used as context nodes for selecting child elements with el.querySelector(".class").
var info = casper.evaluate(function(xpath){
var obj = {};
__utils__.getElementsByXPath(xpath).forEach(function(el){
obj[el.querySelector(".fields").textContent.trim()] =
el.querySelector(".values").textContent.trim();
});
return obj;
}, yourXPathString);
If you want to select elements based on a CSS selector use the following:
var info = casper.evaluate(function(cssSelector){
var obj = {};
__utils__.findAll(cssSelector).forEach(function(el){
obj[el.querySelector(".fields").textContent.trim()] =
el.querySelector(".values").textContent.trim();
});
return obj;
}, yourCssSelector);
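Note that trim() only strips whitespace at the ends, so a value that spans several lines (like "Opening Hours" in the question) would keep its inner line breaks. If single-line values are wanted, a small helper can collapse the runs of whitespace. This is a sketch, and collapseWhitespace is a made-up name, written in ES5 style so it could be defined inside the evaluate() callback:

```javascript
// ES5-style on purpose, so it could be pasted into a casper.evaluate() callback.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Raw text roughly as textContent might return it for "Opening Hours" in the question.
var raw = "\n 9:00am-6:30pm(Weekday) /\n Close on Sundays and Public Holidays(Can be booked)(Holiday)\n";

console.log(collapseWhitespace(raw));
// → 9:00am-6:30pm(Weekday) / Close on Sundays and Public Holidays(Can be booked)(Holiday)
```

Inside the callbacks above, it would replace the two .trim() calls, e.g. collapseWhitespace(el.querySelector(".values").textContent).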

jQuery search for paragraphs containing multiple words

I have an unordered list called test
<ul id='test'></ul>
it is dynamically populated with data via ajax. Each item 'li' is a div containing 'p' paragraphs. Each paragraph contains some information.
Ex:
<li> <div> <p> test </p> </div> </li>
<li> <div> <p> hi how is it going?</p> </div> </li>
<li> <div> <p> not a test</p> </div> </li>
<li> <div> <p> whoa</p> </div> </li>
I also have a search box from which I can get a search term. I use:
var searchTerm = $("#search").val().trim().split(' '); // an array of words searched for
What I am trying to do is find a way to select all 'li' elements which contain all or some of the search words, but I'm not sure how to approach it best.
Currently, I am doing this:
var results = $('p:contains("'+ searchTerm[0] +'")');
to get an exact match on the first term, but I want to search for multiple terms, not just one.
I would want to search for 'test hi' and get back three nodes, because it searches for 'test' and 'hi'.
I also thought of:
var results2 = $('p').filter(function( index ) {
return ( this +':contains("'+ searchTerm +'")' );
});
Can anyone point me in the right direction?
You could do some black magic with the selector, like this:
var results = $('p:contains("' + searchTerm.join('"), p:contains("') + '")');
This looks hard, but I'll explain it.
It joins the search terms with "), p:contains(". Then it just adds the missing p:contains(" and ") to the ends of the result string and searches for it.
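To make the string manipulation visible, here is the same concatenation wrapped in a function and printed. buildContainsSelector is just an illustrative name; the resulting string is what would be handed to $().

```javascript
// Builds a comma-separated jQuery selector, one :contains() per search term.
function buildContainsSelector(terms) {
  return 'p:contains("' + terms.join('"), p:contains("') + '")';
}

console.log(buildContainsSelector(["test", "hi"]));
// → p:contains("test"), p:contains("hi")
```

Since split(' ') produces empty strings when the input has consecutive spaces, it is probably worth dropping empty terms first (for example with filter(Boolean)). A term containing a double quote would also break the selector string.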
A combination of $.filter and $.each (or array.forEach, if you don't care about IE8) can also be of use here:
var searchTerms = ["how", "test"];
$('div').filter(function () {
var $text = $(this).text(); // declare locally instead of leaking a global
var found = 0;
$.each(searchTerms, function (index, term) {
found = $text.indexOf(term) > -1 ? found +1 : found;
})
return found;
}).addClass('match');
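The counter above treats an element as a match as soon as one term is found. If you sometimes want "all terms" instead of "some terms", the two checks can be sketched on plain strings. matchesAny and matchesAll are illustrative names, and the paragraphs array copies the sample text from the question.

```javascript
// True if at least one term occurs in the text.
const matchesAny = (text, terms) => terms.some(term => text.includes(term));
// True only if every term occurs in the text.
const matchesAll = (text, terms) => terms.every(term => text.includes(term));

// Paragraph texts from the question's example list.
const paragraphs = ["test", "hi how is it going?", "not a test", "whoa"];

console.log(paragraphs.filter(p => matchesAny(p, ["test", "hi"])));
// → ["test", "hi how is it going?", "not a test"]

console.log(paragraphs.filter(p => matchesAll(p, ["not", "test"])));
// → ["not a test"]
```

Inside the jQuery filter callback, the return value would then be matchesAny($(this).text(), searchTerms) or the matchesAll variant.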
