Extracting images from RSS/Atom feeds


I'm wondering how to extract images from RSS and Atom feeds so I can use them as thumbnails when displaying the feed in a container with its Title, Description and Link. So far my code (shown below) grabs images from only certain feed types. How can I grab every image my script comes across?
if (feed_image_type == "description") {
item_img = $($(this).find('description').text()).find("img").attr("src");
} else if (feed_image_type == "encoded") {
item_img = $($(this).find('encoded').text()).find("img").attr("src");
} else if (feed_image_type == "thumbnail") {
item_img = $(this).find('thumbnail').attr('url');
} else {
item_img = $(this).find('enclosure').attr('url');
}
For example, I cannot figure out how to grab the image link from the RSS feed snippet below:
<description>
<![CDATA[
<img src="https://i.kinja-img.com/gawker-media/image/upload/s--E93LuLOd--/c_fit,fl_progressive,q_80,w_636/hd6cujrvf1d72sbxsbnr.jpg" /><p>With a surprise showing of skill and, at one point, a miracle, the bottom-ranked team in the European <em>League </em>Championship Series will not end the summer winless.<br></p><p>Read more...</p>
]]>
</description>

Using these sources:
parse html inside cdata using jquery or javascript
jQuery.parseXML
https://www.w3schools.com/xml/dom_cdatasection.asp
It is essential that you get your content correctly as XML, by setting the dataType to 'xml'.
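This code is self-contained. Note that .find() only searches descendants, so the CDATA text is placed inside a wrapper element first; otherwise a top-level img would not be found:
var xmlString = '<Customer><![CDATA[ <img src="y1" /> ]]></Customer>';
var xmlObj = $.parseXML(xmlString);
var cdataText = xmlObj.firstChild.firstChild.textContent;
var jqueryObj = $('<div>').html(cdataText);
var imgUrl = jqueryObj.find('img').attr('src');
console.log(imgUrl);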
This is slightly imprecise because the question doesn't give quite enough information to reproduce your situation exactly. I will start from this part of your code:
if (feed_image_type == "description") {
    item_img = $($(this).find('description').text()).find("img").attr("src");
}
This ought to get close:
if (feed_image_type == "description") {
    // .text() returns the CDATA content as a plain string
    var cdataText = $(this).find('description').text();
    // wrap it so that .find() can also reach a top-level <img>
    var jqueryObj = $('<div>').html(cdataText);
    item_img = jqueryObj.find('img').attr('src');
}

You can also try this.
let str = `<description>
<![CDATA[
<img src="https://i.kinja-img.com/gawker-media/image/upload/s--E93LuLOd--/c_fit,fl_progressive,q_80,w_636/hd6cujrvf1d72sbxsbnr.jpg" /><p>With a surprise showing of skill and, at one point, a miracle, the bottom-ranked team in the European <em>League </em>Championship Series will not end the summer winless.<br></p><p>Read more...</p>
]]>
</description>`;
//Strip the CDATA markers so the parser sees the markup inside them.
str = str.replace("<![CDATA[", "").replace("]]>", "");
let parser = new DOMParser();
//The payload is HTML rather than well-formed XML (note the unclosed <br>), so parse it as text/html.
let xmlDoc = parser.parseFromString(str, "text/html");
let images = [...xmlDoc.querySelectorAll('img')].map(image => image.getAttribute('src'));
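To cover several feed layouts at once, one option is to try each known image location in turn and keep the first hit. The following is only a sketch: the names pickItemImage, extractImgSrcWithDom and the shape of the fields object are assumptions made for illustration, not part of the original code, and the DOM-based extractor only runs in a browser.

```javascript
// Hypothetical helper: return the first image URL found among the usual places.
// `fields` holds values already read from one feed item (assumed property names).
// `extractImgSrc` is any function that returns the first <img> src in an HTML string.
function pickItemImage(fields, extractImgSrc) {
  var candidates = [
    fields.thumbnailUrl,                          // e.g. url attribute of <thumbnail>
    fields.enclosureUrl,                          // e.g. url attribute of <enclosure>
    extractImgSrc(fields.encodedHtml || ''),      // first <img> inside <encoded>
    extractImgSrc(fields.descriptionHtml || '')   // first <img> inside <description>
  ];
  for (var i = 0; i < candidates.length; i++) {
    if (candidates[i]) {
      return candidates[i];
    }
  }
  return null; // no image found anywhere
}

// Browser-only extractor: parse the HTML payload and read the first <img>.
function extractImgSrcWithDom(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');
  var img = doc.querySelector('img');
  return img ? img.getAttribute('src') : null;
}
```

Inside the existing loop this could be called as pickItemImage({ thumbnailUrl: $(this).find('thumbnail').attr('url'), enclosureUrl: $(this).find('enclosure').attr('url'), encodedHtml: $(this).find('encoded').text(), descriptionHtml: $(this).find('description').text() }, extractImgSrcWithDom), which removes the need for feed_image_type.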

Related

Javascript-Using Parsed Data From a Query String as a Heading

I am wondering how to take the information from a parsed query string and display it at the top of my page. Ignore the window.alert part of the code; I was just using that to verify that the function worked.
For example: if the user had choices of Spring, Summer, Winter, and Fall, whichever they chose would display as a header on the next page. So if seasonArray[i] is Fall, I want to transfer that information into the form and display it as an element. I'm sure this is easily done, but I can't figure it out. Thanks in advance.
function seasonDisplay() {
var seasonVariable = location.search;
seasonVariable = seasonVariable.substring(1, seasonVariable.length);
while (seasonVariable.indexOf("+") != -1) {
seasonVariable = seasonVariable.replace("+", " ");
}
seasonVariable = unescape(seasonVariable);
var seasonArray = seasonVariable.split("&");
for (var i = 0; i < seasonArray.length; ++i) {
window.alert(seasonArray[i]);
}
if (window != top)
top.location.href = location.href
}

<h1 id="DynamicHeader"></h1>
Replace the alert line with:
document.getElementById("DynamicHeader").insertAdjacentHTML('beforeend',seasonArray[i]);
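As an alternative to splitting the string by hand, URLSearchParams can decode the query string, including the + signs. This is a minimal sketch that assumes the form field is named season; that name and the helper seasonFromQuery are hypothetical, since the question does not show the form.

```javascript
// Hypothetical helper: read one named value from a query string.
// The parameter name "season" is an assumption; use the name of your form field.
function seasonFromQuery(search) {
  var params = new URLSearchParams(search); // handles the leading "?", "+" and %XX escapes
  return params.get('season');              // null when the parameter is absent
}

// In the browser, the heading could then be filled in as plain text:
// document.getElementById('DynamicHeader').textContent = seasonFromQuery(location.search) || '';
```

Setting textContent rather than inserting HTML means the value from the URL is shown as text and is never interpreted as markup.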

text string output stops after first space, js/html

I apologize in advance, this is the first Stack Overflow question I've posted. I was tasked with creating a new ADA compliant website for my school district's technology helpdesk. I started with minimal knowledge of HTML and have been teaching myself through w3schools. So here's my ordeal:
I need to create a page for all of our pdf and html guides. I'm trying to create a simple, somewhat interactive menu that populates a list of links from an onclick event, but the title="" attribute drops everything after the first space. I've unsuccessfully tried using a replace() method, since the text comes from an array and is not static.
I know I'm probably supposed to use an example, but my work day is coming to a close soon and I wanted to get this posted so I just copied a bit of my actual code.
So here's what's happening, in example 1 of var gmaildocAlt the tooltip will drop everything after Google, but will show the entire string properly with example 2. I was hoping to create a form input for the other helpdesk personnel to add links without knowing how to code, but was unable to resolve the issue of example 1 with a
var fix = gmaildocAlt.replace(/ /g, "&nb sp;")
//minus the space
//this also happens to break the entire function if I set it below the rest of the other variables
I'm sure there are a vast number of things I'm doing wrong, but I would really appreciate the smallest tip to make my tooltip display properly without requiring a replace method.
// GMAIL----------------------------
function gmailArray() {
var gmaildocLink = ['link1', 'link2'];
var gmaildocTitle = ["title1", "title2"];
var gmaildocAlt = ["Google Cheat Sheet For Gmail", "Google 10-Minute Training For Gmail"];
var gmailvidLink = [];
var gmailvidTitle = [];
var gmailvidAlt = [];
if (document.getElementById("gmailList").innerHTML == "") {
for (i = 0; i < gmaildocTitle.length; i++) {
arrayGmail = "" + gmaildocTitle[i] + "" + "<br>";
document.getElementById("gmailList").innerHTML += arrayGmail;
}
for (i = 0; i < gmailvidTitle.length; i++) {
arrayGmail1 = "";
document.getElementById("").innerHTML += arrayGmail1;
}
} else {
document.getElementById("gmailList").innerHTML = "";
}
}
<div class="fixed1">
<p id="gmail" onclick="gmailArray()" class="gl">Gmail</p>
<ul id="gmailList"></ul>
<p id="calendar" onclick="calendarArray()" class="gl">Calendar</p>
<ul id="calendarList"></ul>
</div>

Building HTML manually with strings can cause issues like this. It's better to build them one step at a time, and let the framework handle quoting and special characters - if you're using jQuery, it could be:
var $link = jQuery("<a></a>")
.attr("href", gmaildocLink[i])
.attr("title", gmaildocAlt[i])
.html(gmaildocTitle[i]);
jQuery("#gmailList").append($link).append("<br>");
Without jQuery, something like:
var link = document.createElement("a");
link.setAttribute("href", gmaildocLink[i]);
link.setAttribute("title", gmaildocAlt[i]);
link.innerHTML = gmaildocTitle[i];
document.getElementById("gmailList").innerHTML += link.outerHTML + "<br>";
If it matters to your audience, setAttribute doesn't work in IE7, and you have to access the attributes as properties of the element: link.href = "something";.

If you add ' to either side of the variable strings, the whole value is read as a single attribute value. Initially, the first space was treated as the end of the title attribute.
Hope the below helps!
UPDATE: If you're worried about apostrophes in the title strings, you can use " instead by escaping it with a backslash (\"). This forces JS to read it as a character and not as part of the code structure. See the example below.
Thanks for pointing this one out guys! Sloppy code on my part.
// GMAIL----------------------------
function gmailArray() {
var gmaildocLink = ['link1', 'link2'];
var gmaildocTitle = ["title1", "title2"];
var gmaildocAlt = ["Google's Cheat Sheet For Gmail", "Google 10-Minute Training For Gmail"];
var gmailvidLink = [];
var gmailvidTitle = [];
var gmailvidAlt = [];
if (document.getElementById("gmailList").innerHTML == "") {
for (i = 0; i < gmaildocTitle.length; i++) {
var arrayGmail = "" + gmaildocTitle[i] + "" + "<br>";
document.getElementById("gmailList").innerHTML += arrayGmail;
}
for (var i = 0; i < gmailvidTitle.length; i++) {
var arrayGmail1 = "";
document.getElementById("").innerHTML += arrayGmail1;
}
} else {
document.getElementById("gmailList").innerHTML = "";
}
}
<div class="fixed1">
<p id="gmail" onclick="gmailArray()" class="gl">Gmail</p>
<ul id="gmailList"></ul>
<p id="calendar" onclick="calendarArray()" class="gl">Calendar</p>
<ul id="calendarList"></ul>
</div>
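If you do keep building the markup as a string, quoting the attribute values and escaping the text is what keeps the tooltip intact. The sketch below illustrates that idea; the function names escapeHtml and buildLink are made up for this example and are not from the question.

```javascript
// Hypothetical helper: escape characters that have a special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: build one link with quoted, escaped attribute values.
// Without the quotes, title=Google Cheat Sheet would end at the first space.
function buildLink(href, title, text) {
  return '<a href="' + escapeHtml(href) + '" title="' + escapeHtml(title) + '">' +
    escapeHtml(text) + '</a>';
}
```

In the loop this could replace the concatenation, for example document.getElementById("gmailList").innerHTML += buildLink(gmaildocLink[i], gmaildocAlt[i], gmaildocTitle[i]) + "<br>";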

Slideshow with images from a folder

So I want to make a slideshow that takes its pictures from a folder instead of hardcoding them.
Right now I have something like this:
<img class="images" src="slideshow/slide0.png">
<img class="images" src="slideshow/slide1.png">
<img class="images" src="slideshow/slide2.png">
<img class="images" src="slideshow/slide3.png">
<img class="images" src="slideshow/slide4.png">
<img class="images" src="slideshow/slide5.png">
<img class="images" src="slideshow/slide6.png">
<script language='javascript'>
//script for the slideshow
var myIndex = 0;
carousel();
function carousel() {
var i;
var x = document.getElementsByClassName("images");
for (i = 0; i < x.length; i++) {
x[i].style.display = "none";
}
myIndex++;
if (myIndex > x.length) {myIndex = 1}
x[myIndex-1].style.display = "block";
setTimeout(carousel, 5000); // Change image every 5 seconds
}
</script>
As you can see, the images are in the code, but I want them to be taken from a folder, so that if I add photos to that folder, the slideshow gets longer too. This is to make updating the slideshow easier. Since I'm not the one who's going to use it, they wanted it like this.
I'm not that experienced with JavaScript; I found this one online but I can understand what it does. A solution in another language, like PHP, would be useful too!

You could use AJAX to pull the information from an XML file.
$(function () {
    $.ajax({
        type: "GET", //call the xml file for reading
        url: "Images.xml", //source of the file
        dataType: "xml", //type of data in the file
        success: parseXml //function to execute when the file is open and ready for use
    });
    function parseXml(xml) {
        var xImages = xml.getElementsByTagName("Image"); //Get all nodes tagged "Image" in the xml document.
        var maxImages = xImages.length; //Find total number of nodes, for use in the iterations
        for (var i = 0; i < maxImages; i++) {
            //one at a time, append the images
            $(".ImageList").append('<li><img src="' + xImages[i].childNodes[0].nodeValue + '"></li>');
        }
    }
});
HTML
<ul class="ImageList">
</ul>
<!-- This will be populated through the xml - so no <li> elements need to be hardcoded into the page -->
Then you will need an XML file with some <Image> tags
<File>
<Image>file/path/to/image.png</Image>
<Image>file/path/to/image.png</Image>
</File>
This should be enough for you to at least research and implement some AJAX queries into your page.
More information can be found here and here
Update based on OP's comments.
Your XML file markup will look similar to that of your HTML, but the tags are self-governed, in that you create and name them yourself.
You can save your XML, JavaScript, and HTML files all in the same folder, as such
- Web Page (parent folder)
- index.html
- script.js
- Images.xml
This way, the URL for the XML file will simply be "Images.xml"
As for constructing XML documents, it really couldn't be easier.
Say you want a document that contains information about different people
<People>
<Person>
<Name>Greg</Name>
<Age>21</Age>
<Height>6'2"</Height>
</Person>
<Person>
<Name>Sarah</Name>
<Age>45</Age>
<Height>5'5"</Height>
</Person>
<Person>
<Name>Martin</Name>
<Age>80</Age>
<Height>4'11"</Height>
</Person>
</People>
That's all there is to it. You can use it to store any information you want and organise the structure any way you want, with whatever tag names you want to use
This is just somewhere to store the information. The jQuery then opens the file and says "Oh excellent, there are some tags here called <Image>, I'll go ahead and pop them into the page"
This introduction and this how-to guide are very helpful.
It's probably also worth looking here to get an understanding of the syntax rules you need to follow.
There's plenty of information on that website, and it's definitely worth having a browse around and reading up on it all.
Remember Stack Overflow isn't here to do all the work for you, and there should be plenty of information made available here for you to do the appropriate research and implement what it is you want.

Added a code snippet below (jQuery version); just point it at your XML file and it should work fine.
Both snippets work as long as the XML file is available somewhere, hosted or local.
<!-- XML -->
<whatever>
<image>path/to/file.jpg</image>
<image>path/to/file.jpg</image>
</whatever>
fetch('path/to/xml.xml') // File you want to grab
    .then(function(resp) {
        return resp.text(); // Read the response body as text
    }).then(function(data) { // data transform function
        let parser = new DOMParser(), // Create a parser for DOM
            xmlDoc = parser.parseFromString(data, 'text/xml'); // Parse the text into an XML document
        var xImages = xmlDoc.getElementsByTagName('image');
        var maxImages = xImages.length;
        console.log(xImages.item(0).textContent);
        for (var i = 0; i < maxImages; i++) {
            $(".ImageList").append('<li><img src="' + xImages.item(i).textContent + '"></li>'); // one at a time, append the images
        }
    });
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<ul id="imglist" class="ImageList">
</ul>
For vanilla JS I have something else entirely set up:
<!-- XML -->
<whatever>
<image>path/to/file.jpg</image>
<image>path/to/file.jpg</image>
</whatever>
fetch('path/to/xml.xml') // File you want to grab
    .then(function(resp) {
        return resp.text(); // Read the response body as text
    }).then(function(data) { // data transform function
        let parser = new DOMParser(), // Create a parser for DOM
            xmlDoc = parser.parseFromString(data, 'text/xml'); // Parse the text into an XML document
        var xImages = xmlDoc.getElementsByTagName('image');
        var maxImages = xImages.length;
        var liopen = '<li><img src="';
        var liclose = '"></li>';
        console.log(xImages.item(0).textContent);
        for (var i = 0; i < maxImages; i++) {
            // Vanilla JS approach to append elements through each iteration
            document.querySelector(".ImageList").innerHTML += liopen + xImages.item(i).textContent + liclose;
        }
    });
<ul id="imglist" class="ImageList">
</ul>
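Once the list of image paths has been read from the XML file, the slideshow itself only needs to step through that list and wrap around at the end. A minimal sketch of that part, assuming the paths are already in an array; createSlideCycle is a hypothetical name, not something from the answers above.

```javascript
// Hypothetical helper: cycle through a list of image paths, wrapping at the end.
function createSlideCycle(paths) {
  var index = -1; // nothing shown yet
  return {
    next: function () {
      if (paths.length === 0) {
        return null; // nothing to show
      }
      index = (index + 1) % paths.length; // wrap back to the first image
      return paths[index];
    }
  };
}

// In the browser, assuming a single <img id="slide"> element exists:
// var cycle = createSlideCycle(pathsFromXml);
// setInterval(function () { document.getElementById('slide').src = cycle.next(); }, 5000);
```

Because the length is read from the array on every step, adding entries to the XML file makes the slideshow longer without touching the script.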

Extract only javascript from a script tag

I would like to extract only javascript from script tags in a HTML document which I want to pass it to a JS parser like esprima. I am using nodejs to write this application and have the content extracted from the script tag as a string.
The problem is when there are HTML comments in the javascript extracted from html documents which I want to remove.
<!-- var a; --> should be converted to var a
A simple removal of <!-- and --> does not work since it fails in the case <!-- if(j-->0); --> where it removes the middle -->
I would also like to remove identifiers like [if !IE] and [endif] which are sometimes found inside script tags.
I would also like to extract the JS inside CDATA segments.
<![CDATA[ var a; ]]> should be converted to var a
Is all this possible using a regex or is something more required?
In short I would like to sanitize the JS from script tags so that I can safely pass it into a parser like esprima.
Thanks!
EDIT:
Based on @user568109's answer, this is the rough code that parses through HTML comments and CDATA segments inside script tags:
var htmlparser = require("htmlparser2");
var jstext = '';
var parser = new htmlparser.Parser({
onopentag: function(name, attribs){
if(name === "script" && attribs.type === "text/javascript"){
jstext = '';
//console.log("JS! Hooray!");
}
},
ontext: function(text) {
jstext += text;
},
onclosetag: function(tagname) {
if(tagname === "script") {
console.log(jstext);
jstext = '';
}
},
oncomment : function(data) {
if(jstext) {
jstext += data;
}
}
}, {
xmlMode:true
});
parser.write(input);
parser.end()

That is the job of the parser. See htmlparser2, or esprima itself. Please don't use regex to parse HTML; it is seductive, but you will waste your precious time and effort trying to match more tags.
An example from the page:
var htmlparser = require("htmlparser2");
var parser = new htmlparser.Parser({
onopentag: function(name, attribs){
if(name === "script" && attribs.type === "text/javascript"){
console.log("JS! Hooray!");
}
},
ontext: function(text){
console.log("-->", text);
},
onclosetag: function(tagname){
if(tagname === "script"){
console.log("That's it?!");
}
}
});
parser.write("Xyz <script type='text/javascript'>var foo = '<<bar>>';</script>");
parser.end();
Output (simplified):
--> Xyz
JS! Hooray!
--> var foo = '<<bar>>';
That's it?!
It will give you all the tags: divs, comments, scripts, etc. But you would have to validate the script inside the comments yourself. Also, CDATA is a valid tag in XML (XHTML), so htmlparser2 would detect it as a comment; you would have to check those too.
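For script text that the parser hands over with the markers still attached, one way to avoid breaking code such as if(j-->0) is to remove a marker pair only when it wraps the whole string. This is a sketch under that assumption, not a general sanitizer; unwrapScriptText is a hypothetical name and it does not handle the [if !IE] style identifiers.

```javascript
// Hypothetical helper: strip <!-- ... --> or <![CDATA[ ... ]]> only when the
// pair wraps the entire script text, so a "-->" in the middle is left alone.
function unwrapScriptText(text) {
  var s = text.trim();
  var wrappers = [['<![CDATA[', ']]>'], ['<!--', '-->']];
  wrappers.forEach(function (pair) {
    if (s.startsWith(pair[0]) && s.endsWith(pair[1])) {
      s = s.slice(pair[0].length, s.length - pair[1].length).trim();
    }
  });
  return s;
}
```

The result can then be passed to a JS parser such as esprima, which will report anything that is still not valid JavaScript.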

Get source of image

I have a string like this:
<img src="../uplolad/commission/ranks/avatar.jpg' . $row[$c_name] .'" width="50" height="50"/>
How can I get the image file name in JavaScript? I only know PHP regexes. The file extension can vary.
The result must be: avatar.jpg

Regex is not ideal for this. JavaScript can traverse the HTML as distinct objects more readily than as a long string. If you can identify the picture by anything, say by adding an ID to it, or an ID to a parent with that as the only image, you'll be able to access the image from script:
var myImage = document.getElementById('imgAvatar'); // or whatever means of access
var src = myImage.src; // will contain the full path
if(src.indexOf('/') >= 0) {
src = src.substring(src.lastIndexOf('/')+1);
}
alert(src);
And if you want to edit, you can do that just as well
myImage.src = src.replace('.jpg', '.gif');

The following code may help you get what you want.
<script type="text/javascript">
function getImageName(imagePath) {
var objImage = new RegExp(/([^\/\\]+)$/);
var getImgName = objImage.exec(imagePath);
if (getImgName == null) {
return null;
}
else {
return getImgName[0];
}
}
</script>
<script>
var mystring = getImageName("http://www.mypapge.mm/myimage.png")
alert(mystring)
</script>

Here's a shorter variation of David Hedlund's answer that does use regex:
var myImage = document.getElementById('imgAvatar'); // or whatever means of access
alert(myImage.src.replace( /^.+\// , '' ));
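If the path may carry a query string or fragment, those can be dropped before taking the last path segment. A small sketch without regex matching on the path itself; fileNameFromSrc is a hypothetical name used only for this example.

```javascript
// Hypothetical helper: return the file name part of an image path or URL.
function fileNameFromSrc(src) {
  var path = src.split(/[?#]/)[0];                  // drop ?query and #fragment
  return path.substring(path.lastIndexOf('/') + 1); // text after the last slash
}
```

For example, fileNameFromSrc(myImage.src) could be used in place of the substring logic above.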
