I am trying to parse an RSS feed into an array, but the feed wraps some elements in CDATA tags and combines several of them into one.
My code below parses the RSS feed and adds certain elements to an array. However, when I look at the feed itself, it combines multiple key elements inside a CDATA section.
How do I parse the contents of the CDATA section to get usable XML fields?
Code
var buildXMLDoc = function (listXML) {
    // listXML: the feed's XML as a string; parse it as XML so the CDATA
    // section inside <description> comes back as one plain string
    var list = [];
    $($.parseXML(listXML)).find('item').each(function () {
        var el = $(this);
        var item = {
            title: el.find("title").text(),
            description: el.find("description").text(), // raw HTML from the CDATA block
            modified: el.find("pubDate").text()
        };
        console.log(item.title, item.modified, item.description);
        list.push(item);
    });
    return list;
};
XML
<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Microsoft SharePoint Foundation RSS Generator on 8/29/2017 10:23:18 AM -->
<?xml-stylesheet type="text/xsl" href="/_layouts/RssXslt.aspx?List=43aaf08e-0153-4f1d-9b46-e66bba563fde" version="1.0"?>
<rss version="2.0">
<channel>
<title>Webdocs: Test</title>
<description>RSS feed for the Test list.</description>
<lastBuildDate>Tue, 29 Aug 2017 14:23:18 GMT</lastBuildDate>
<generator>Microsoft SharePoint Foundation RSS Generator</generator>
<ttl>60</ttl>
<language>en-US</language>
<item>
<title>Alternative Methods for Determining LCRs</title>
<description><![CDATA[<div><b>Short Title:</b> Determining LCRs</div>
<div><b>Description:</b> <div class="ExternalClass6280076BC79848078688B86006BA554F"><p><span style="font-size:11.5pt;font-family:"calibri", "sans-serif"">This project is a carryover from the 2017 effort to identify an alternative method for calculating the Locational Minimum Installed Capacity Requirements (LCRs). </span></p></div></div>
<div><b>Governance Process Status:</b> Progress</div>
<div><b>Topic State:</b> Open/Current</div>
<div><b>Updated Placeholder:</b> updated</div>
]]></description>
<pubDate>Wed, 12 Jul 2017 13:41:06 GMT</pubDate>
</item>
The highlighted items are supposed to be separate elements.
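If the feed is parsed as XML (for example with jQuery's `$.parseXML`), `.text()` on `<description>` returns the CDATA content as one raw HTML string. A minimal sketch of splitting that string into separate fields, assuming the SharePoint layout shown above (one `<div><b>Label:</b> value</div>` per field, with at most one nested `<div>` inside a value, as in the Description field). It is regex-based to stay dependency-free, leaves HTML entities undecoded, and is not a general HTML parser; `splitDescription` is a hypothetical helper name.

```javascript
// Sketch, not a full HTML parser: assumes one "<div><b>Label:</b> value</div>"
// per field inside the CDATA block, as in the SharePoint feed above.
function splitDescription(html) {
  const fields = {};
  // Capture the label text before ":</b>" and everything up to the next </div>.
  const fieldRe = /<div><b>([^<]+?):<\/b>([\s\S]*?)<\/div>/g;
  let match;
  while ((match = fieldRe.exec(html)) !== null) {
    const value = match[2]
      .replace(/<[^>]*>/g, "") // drop nested markup such as <div>, <p>, <span>
      .replace(/\s+/g, " ") // collapse line breaks and runs of spaces
      .trim();
    fields[match[1].trim()] = value;
  }
  return fields;
}

// Usage: pass the raw description HTML, e.g. the .text() of <description>
// after the feed has been parsed with $.parseXML.
const sample =
  "<div><b>Short Title:</b> Determining LCRs</div>\n" +
  "<div><b>Topic State:</b> Open/Current</div>";
console.log(splitDescription(sample));
```

Inside the question's `.each()` loop, the returned object could be merged into each pushed item, so `Short Title`, `Topic State` and the rest become separate properties.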
To get the details inside the CDATA section, I suggest using jQuery's .contents() and picking out the relevant sub-sections by position. This can give wrong results if the positions change, but it is one option.
var listXML = '<?xml version="1.0" encoding="UTF-8"?>\
<!--RSS generated by Microsoft SharePoint Foundation RSS Generator on 8/29/2017 10:23:18 AM -->\
<?xml-stylesheet type="text/xsl" href="/_layouts/RssXslt.aspx?List=43aaf08e-0153-4f1d-9b46-e66bba563fde" version="1.0"?>\
<rss version="2.0">\
<channel>\
<title>Webdocs: Test</title>\
<description>RSS feed for the Test list.</description>\
<lastBuildDate>Tue, 29 Aug 2017 14:23:18 GMT</lastBuildDate>\
<generator>Microsoft SharePoint Foundation RSS Generator</generator>\
<ttl>60</ttl>\
<language>en-US</language>\
<item>\
<title>Alternative Methods for Determining LCRs</title>\
<description><![CDATA[<div><b>Short Title:</b> Determining LCRs</div>\
<div><b>Description:</b> <div class="ExternalClass6280076BC79848078688B86006BA554F"><p><span style="font-size:11.5pt;font-family:"calibri", "sans-serif"">This project is a carryover from the 2017 effort to identify an alternative method for calculating the Locational Minimum Installed Capacity Requirements (LCRs). </span></p></div></div>\
<div><b>Governance Process Status:</b> Progress</div>\
<div><b>Topic State:</b> Open/Current</div>\
<div><b>Updated Placeholder:</b> updated</div>\
]]></description>\
<pubDate>Wed, 12 Jul 2017 13:41:06 GMT</pubDate>\
</item>';
var list =[];
$(listXML).find('item').each(function (){
var el = $(this);
var cdat = el.find('description').contents(); // only this item's <description>
console.log(cdat.eq(1).text() + cdat.eq(2).text());
console.log(cdat.eq(5).contents().eq(0).text() + cdat.eq(5).contents().eq(1).text());
console.log(cdat.eq(6).contents().eq(0).text() + cdat.eq(6).contents().eq(1).text());
list.push({title: cdat.eq(2).text(), description: cdat.eq(5).contents().eq(1).text(), modified: cdat.eq(6).contents().eq(1).text()});
});
console.log('list: ' + JSON.stringify(list));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
A different approach is to get the description element, strip the CDATA opener and convert the result to a jQuery object. On that object you can use .find() to select sub-elements.
var listXML = '<?xml version="1.0" encoding="UTF-8"?>\
<!--RSS generated by Microsoft SharePoint Foundation RSS Generator on 8/29/2017 10:23:18 AM -->\
<?xml-stylesheet type="text/xsl" href="/_layouts/RssXslt.aspx?List=43aaf08e-0153-4f1d-9b46-e66bba563fde" version="1.0"?>\
<rss version="2.0">\
<channel>\
<title>Webdocs: Test</title>\
<description>RSS feed for the Test list.</description>\
<lastBuildDate>Tue, 29 Aug 2017 14:23:18 GMT</lastBuildDate>\
<generator>Microsoft SharePoint Foundation RSS Generator</generator>\
<ttl>60</ttl>\
<language>en-US</language>\
<item>\
<title>Alternative Methods for Determining LCRs</title>\
<description><![CDATA[<div><b>Short Title:</b> Determining LCRs</div>\
<div><b>Description:</b> <div class="ExternalClass6280076BC79848078688B86006BA554F"><p><span style="font-size:11.5pt;font-family:"calibri", "sans-serif"">This project is a carryover from the 2017 effort to identify an alternative method for calculating the Locational Minimum Installed Capacity Requirements (LCRs). </span></p></div></div>\
<div><b>Governance Process Status:</b> Progress</div>\
<div><b>Topic State:</b> Open/Current</div>\
<div><b>Updated Placeholder:</b> updated</div>\
]]></description>\
<pubDate>Wed, 12 Jul 2017 13:41:06 GMT</pubDate>\
</item>';
var list =[];
$(listXML).find('item').each(function (){
var el = $(this);
// the HTML parser turns "<![CDATA[" into a comment starting "<!--[CDATA["
var html = $(el.find('description')[0].innerHTML.replace('<!--[CDATA[', '')).html();
console.log(html);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Related
I'm parsing a RSS feed which looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:npr="http://www.npr.org/rss/" xmlns:nprml="http://api.npr.org/nprml" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<title>News</title>
<link>http://www.npr.org/templates/story/story.php?storyId=1001&ft=1&f=1001</link>
<description>NPR news, audio, and podcasts. Coverage of breaking stories, national and world news, politics, business, science, technology, and extended coverage of major national and world events.</description>
<language>en</language>
<copyright>Copyright 2012 NPR - For Personal Use Only</copyright>
<generator>NPR API RSS Generator 0.94</generator>
<lastBuildDate>Tue, 28 Aug 2012 12:19:00 -0400</lastBuildDate>
<image>
<url>http://media.npr.org/images/npr_news_123x20.gif</url>
<title>News</title>
<link>http://www.npr.org/templates/story/story.php?storyId=1001&ft=1&f=1001</link>
</image>
<item>
<title>Reports: Obama Administration Will Unveil New Fuel-Efficiency Standards</title>
<description>The new rules will require U.S. cars to average 54.5 miles per gallon by 2025.</description>
<pubDate>Tue, 28 Aug 2012 12:19:00 -0400</pubDate>
<link>http://www.npr.org/blogs/thetwo-way/2012/08/28/160172356/reports-obama-administration-will-unveil-new-fuel-efficiency-standards?ft=1&f=1001</link>
<guid>http://www.npr.org/blogs/thetwo-way/2012/08/28/160172356/reports-obama-administration-will-unveil-new-fuel-efficiency-standards?ft=1&f=1001</guid>
<content:encoded><![CDATA[<p>The new rules will require U.S. cars to average 54.5 miles per gallon by 2025.</p><p>» E-Mail This » Add to Del.icio.us</p>]]></content:encoded>
</item>
</channel>
</rss>
I'm looping the items like this:
var items = xml.documentElement.getElementsByTagName("item");
var ul = document.getElementById("feed");
for (var i = 0; i < items.length; i++) {
    var item = items.item(i);
    var title = item.getElementsByTagName("title").item(0).textContent;
    var link = item.getElementsByTagName("link").item(0).textContent;
    var description = item.getElementsByTagName("description").item(0).textContent;
    //var content = item.getElementsByTagName('content\\:encoded').item(0).textContent;
    var li = document.createElement('li');
    li.innerHTML = '' + title + '';
    ul.appendChild(li);
}
But how can I get the contents of the node <content:encoded>?
I tried with: item.getElementsByTagName('content\\:encoded').item(0).textContent; but it's not working.
Using jQuery, this works inside .each(): $(this).find('content\\:encoded').text(); but I'd rather use native JavaScript.
It turns out I needed getElementsByTagNameNS, and the node's local name is "encoded", like this:
var content = item.getElementsByTagNameNS("*", "encoded").item(0).textContent;
My solution:
var result2 = JSON.parse(result1);
setData(result2.rss.channel.item[0]["content:encoded"]);
Just use bracket notation, ["content:encoded"], since the key contains a colon.
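A self-contained sketch of the bracket-notation point, assuming `result1` holds JSON in which the XML-to-JSON step kept the prefixed element name `content:encoded` as a property key. The data below is made up from the NPR item above; `setData` from the answer is left out.

```javascript
// Assumption: result1 is JSON from some XML-to-JSON step that keeps the
// prefixed element name "content:encoded" as a property key.
const result1 = JSON.stringify({
  rss: {
    channel: {
      item: [
        {
          title: "Reports: Obama Administration Will Unveil New Fuel-Efficiency Standards",
          "content:encoded":
            "<p>The new rules will require U.S. cars to average 54.5 miles per gallon by 2025.</p>",
        },
      ],
    },
  },
});

const result2 = JSON.parse(result1);
const firstItem = result2.rss.channel.item[0];

// firstItem.content:encoded and firstItem.["content:encoded"] are both
// syntax errors; a key containing ":" needs plain bracket notation.
const encoded = firstItem["content:encoded"];
console.log(encoded);
```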
My XML is shown below. I am able to retrieve the items and read nodes like <title> and <description>. How can I get the values from <media:title>, <media:credit> and <media:thumbnail>?
This is how I currently get the data:
var xmlparser = new DOMParser();
var xmlData = xmlparser.parseFromString(data.text(), "text/xml");
var items = xmlData.getElementsByTagName('item');
for(var i = 0; i < items.length; i++){
var title = items[i].getElementsByTagName("title")[0].childNodes[0].nodeValue;
var desc = items[i].getElementsByTagName("description")[0].childNodes[0].nodeValue;
}
<pre xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<item>
<title>List of records</title>
<description>reading xml</description>
<media:title xmlns:media="http://search.yahoo.com/mrss/">
SinkorSwim Trailer
</media:title>
<title>Sink or Swim - Trailer</title>
<description>Jon Bowermaster's documentary</description>
<media:description xmlns:media="http://search.yahoo.com/mrss/">
Jon Bowermaster's documentary on a learn-to-swim camp
</media:description>
<media:credit xmlns:media="http://search.yahoo.com/mrss/" role="Director"
scheme="urn:ebu">
Jon Bowermaster
</media:credit>
<media:status xmlns:media="http://search.yahoo.com/mrss/" state="active"/>
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/"
type="landscape" url="http://snagfilms-video.jpg"/>
<media:player xmlns:media="http://search.yahoo.com/mrss/" height="323"
url="http://embed.snagfilms.com/embed/player?filmId=00000158-b20c-d8f9-affd-b32ce8700000" width="500"/>
</item>
<item></item>
<item></item>
</channel>
</pre>
The media in media:title denotes an XML namespace prefix. The namespace prefix is only a shortcut for the namespace. The namespace has to be defined somewhere in the document with an xmlns:media attribute.
Then you can use the namespace-aware getElementsByTagNameNS() function to query for the title element:
console.log(xml.getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'title'));
For the first parameter you have to pass the namespace URI (here http://search.yahoo.com/mrss/), not the prefix and not the whole xmlns:media attribute.
I'm using jQuery to parse and output three RSS feeds into three different container divs. The first two feeds work fine, but I can't get the links to work in the third feed: the href isn't found at all. See the fiddle: http://jsfiddle.net/a68myvm2/1/
I think it has to do with the third feed containing multiple link tags per item. I've searched the web for a solution without success.
HTML
<div id="content_1"></div>
<div id="content_2"></div>
<div id="content_3"></div>
JS
$(function () {
function GetFeeds(){
var urls = ['http://www.gosugamers.net/counterstrike/news/rss', 'http://www.hltv.org/news.rss.php', 'http://feeds.thescoreesports.com/csgo.rss'];
urls.forEach(function(Query){
$.ajax({
type: "GET",
url: 'http://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=1000&callback=?&q='+encodeURIComponent(Query),
dataType: 'json',
error: function () {
alert('Unable to load feed, Incorrect path or invalid feed');
},
success: function(xml) {
var Content = urls.indexOf(Query) + 1;
$("#content_"+Content).html('');
$.each(xml.responseData.feed.entries, function(idx, value){
$("#content_"+Content).append('<a class="news-item" href="' + value.link + '" title="' + value.title +'" target="_blank"><p>' + value.publishedDate + '</p><h3>' + value.title + '</h3></a><hr>');
});
}
});
});
}
//Call GetFeeds every 5 seconds.
setInterval(GetFeeds,5000);
//Page is ready, get feeds.
GetFeeds();
});
Part of the problematic feed (http://feeds.thescoreesports.com/csgo.rss):
<item>
<guid isPermaLink="false">4117</guid>
<title>ScrunK joins Team Coast as Coach</title>
<link>http://www.thescoreesports.com/news/4117</link>
<pubDate>Wed, 30 Sep 2015 18:02:45 +0000</pubDate>
<dc:creator/>
<media:content url="https://dqrt72khb0whk.cloudfront.net/uploads/image/file/2729/w1080xh810_coast.jpg?ts=1432916713">
<media:credit>Team Coast</media:credit>
</media:content>
<content:encoded>
<![CDATA[<p>Team Coast have brought in German CS:GO professional, Robin "<strong>ScrunK</strong>" Röpke, to take the reigns as the team's coach, the organization announced Wednesday. </p><figure><blockquote class="twitter-tweet" lang="en"><p lang="en" dir="ltr">We would like to officially welcome #CSTScrunK as our new CS:GO coach! Please show your support and give him a follow!!</p>— Team Coast (#TeamCoastGaming) September 30, 2015</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></figure><p>"I would like to thank Team Coast and the players for giving me this opportunity to throw my EU knowledge and style into the NA scene," said ScrunK in his statement to HLTV.org. "I am excited to see where this route in my career leads me and where I can help this team go."</p><p>Team Coast are currently competing in the North American divisions of both CEVO-P Season 8 and ESL ESEA Pro League Season 2. They currently sit at sixth in the CEVO-P division with a 2-4-3 record and third in the ESL ESEA Pro League with a record of 3-4.</p><p><em>Paul Park is a writer for theScore eSports. <a target="_blank" href="https://twitter.com/phjpark">You can follow him on Twitter</a>.</em></p><p><small><em>Copyright © 2015 Score Media Ventures Inc. All rights reserved. Certain content reproduced under license.</em></small></p>]]>
</content:encoded>
<link rel="related" type="text/html" href="http://www.thescoreesports.com/news/4070" title="Dead Pixels CS:GO part ways with FARIS and YOUNS"/>
<link rel="related" type="text/html" href="http://www.thescoreesports.com/news/4115" title="G2.Kinguin add jkaem to roster"/>
<link rel="related" type="text/html" href="http://www.thescoreesports.com/news/4102" title="ESL ESEA Pro League Hot Match of week 3"/>
<link rel="related" type="text/html" href="http://www.thescoreesports.com/news/4061" title="HIGHLIGHT: azr shuts down Winterfox"/>
<link rel="related" type="text/html" href="http://www.thescoreesports.com/news/4063" title="DreamHack Stockholm Group B Roundup: Down and out"/>
</item>
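In the excerpt above, the item's own `<link>` carries its URL as text content, while the extra `<link rel="related" .../>` elements are empty and carry their URL in `href`. Whatever reads the feed has to tell them apart. A minimal sketch of that selection in plain JavaScript, assuming each `<link>` has already been read into a plain object (in a browser, for instance, from `getAttribute("rel")`, `getAttribute("href")` and `textContent`). `pickArticleLink` is a hypothetical helper, and this does not show how the feed service used in the question treats these elements.

```javascript
// Assumption: each <link> of an <item> has been read into a plain object,
// e.g. { rel: el.getAttribute("rel"), href: el.getAttribute("href"), text: el.textContent }.
function pickArticleLink(links) {
  // The item's own RSS 2.0 <link> has no rel and carries the URL as text;
  // <link rel="related" href="..."/> elements are empty and point elsewhere.
  const own = links.find(
    (l) => !l.rel && typeof l.text === "string" && l.text.trim() !== ""
  );
  return own ? own.text.trim() : null;
}

// Links from the theScore eSports item above (related links shortened to two).
const links = [
  { rel: null, href: null, text: "http://www.thescoreesports.com/news/4117" },
  { rel: "related", href: "http://www.thescoreesports.com/news/4070", text: "" },
  { rel: "related", href: "http://www.thescoreesports.com/news/4115", text: "" },
];
console.log(pickArticleLink(links));
```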
<data>
<manufacture date="2013-06-05 T 19:40:50. 88463 7 Z">
<title>java_package</title>
<author>tom</author>
<year>2013</year>
<price>29.99</price>
</manufacture>
<manufacture date="2015-06-05T19:40:50.884637Z">
<title>java_package_2</title>
<author>tom</author>
<year>2015</year>
<price>39.95</price>
</manufacture>
<manufacture date="2014-06-05T19:40:50.884637Z">
<title>java_package_3</title>
<author>tom</author>
<year>2003</year>
<price>39.95</price>
</manufacture>
</data>
data is the root element, and each manufacture element is one record.
I need to get the latest manufacture element based on its date, i.e.
2015-06-05T19:40:50.884637Z
which is an attribute of the manufacture element.
I am doing this in ASP.NET and C#.
Here is the code I tried:
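XmlDocument xdoc = new XmlDocument();
// Load reads a file (assumed here to sit next to the page); LoadXml expects XML text
xdoc.Load(Server.MapPath("bc.xml"));
XmlNode root = xdoc.DocumentElement;
XmlNodeList nodeList = root.SelectNodes("//manufacture");
foreach (XmlNode node in nodeList)
{
    // attribute names are case-sensitive: "date", not "DATE"
    Label1.Text += node.Attributes["date"].Value + "\n <br>";
}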
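Your datetime formatting in the file is wrong and inconsistent: the first date attribute contains stray spaces.
It should look like "2013-06-05T19:40:50.884637Z", the same form as the other two entries; otherwise any exact-format parse will fail.
Here is a solution using LINQ to XML:
// needs: using System; using System.Globalization; using System.Linq; using System.Xml.Linq;
var doc = XDocument.Load(Server.MapPath("bc.xml"));
var mf = doc.Root.Elements("manufacture")
    .OrderByDescending(m => DateTime.ParseExact(m.Attribute("date").Value, "yyyy-MM-dd'T'HH:mm:ss.ffffff'Z'", CultureInfo.InvariantCulture))
    .First();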
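For comparison, a JavaScript sketch of the same selection, JavaScript being the language of the other questions here. It relies on one property of ISO 8601 UTC timestamps written with a fixed number of digits: they sort as plain strings in chronological order, so no date parsing is needed. Values that do not match that exact form, like the first entry, are skipped rather than parsed. The record objects and `latestByDate` are hypothetical stand-ins for the XML elements and the LINQ query.

```javascript
// Hypothetical plain-object version of the <manufacture> records above.
const manufactures = [
  { title: "java_package", date: "2013-06-05 T 19:40:50. 88463 7 Z" },
  { title: "java_package_2", date: "2015-06-05T19:40:50.884637Z" },
  { title: "java_package_3", date: "2014-06-05T19:40:50.884637Z" },
];

// Accept only the fixed-width UTC form used by the well-formed entries.
const ISO_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z$/;

function latestByDate(records) {
  const valid = records.filter((r) => ISO_UTC.test(r.date));
  if (valid.length === 0) return undefined;
  // Same width and same UTC suffix: string order equals chronological order.
  return valid.reduce((best, r) => (r.date > best.date ? r : best));
}

console.log(latestByDate(manufactures).title);
```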
I can't manage to get this Ext.data.XmlReader's CDATA field mapping to work.
<script>
var store = new Ext.data.Store({
url: '../data/data.xml',
// specify a XmlReader
reader: new Ext.data.XmlReader({
record: 'entry',
fields:[
{ name: 'field1', type: 'date', mapping:'field1'},
{ name: 'field2', type: 'string', mapping:'field2'}
]
}),
listeners:{load:function(store,recs)
{ //alert row1.field1 and row1.field2
var s = 'field1 = '+recs[0].get('field1') + '\nfield2 = '+recs[0].get('field2');
alert(s);
}
}
});
store.load();
</script>
And here's the XML contents in data.xml :
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<field1>01/01/2006</field1>
<field2>
<![CDATA[
<Comment>
Test
</Comment>
]]>
</field2>
</entry>
</feed>
When the store finishes loading, the alert (from the listener) shows something like this:
field1 = Sun Jan 01 2006 00:00:00 GMT+0700 (ICT)
field2 =
But I expected to see this :
field1 = Sun Jan 01 2006 00:00:00 GMT+0700 (ICT)
field2 = <Comment>
Test
</Comment>
This issue only happens in Chrome and Safari; it works in IE6.
How do I get the field2 node value, preferably with a solution that works across major browsers?
Any suggestions?
Thanks in advance.
Owat
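Put the <![CDATA[ opener immediately after the opening XML tag, with no whitespace in between, and put the ]]> closer immediately before the closing tag. Chrome and Safari keep the whitespace before <![CDATA[ as its own text node, so the reader most likely picks up that blank node instead of the CDATA content. Write it like this: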
<field2><![CDATA[
<Comment>
Test
</Comment>
]]></field2>
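If the XML cannot be changed, the value can instead be read by skipping whitespace-only text nodes. A minimal sketch in plain JavaScript that works on DOM element nodes; `firstMeaningfulValue` is a hypothetical helper, and how to hook it into Ext.data.XmlReader (for example via a custom field conversion) depends on the Ext version and is not shown. The test data uses plain objects shaped like DOM nodes so the sketch runs outside a browser.

```javascript
// DOM node type constants (same values as Node.TEXT_NODE / Node.CDATA_SECTION_NODE).
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

// Return the first CDATA section or non-blank text child of an XML element,
// skipping whitespace-only text nodes that precede the CDATA section.
function firstMeaningfulValue(element) {
  for (const node of Array.from(element.childNodes)) {
    if (node.nodeType === CDATA_SECTION_NODE) return node.nodeValue;
    if (node.nodeType === TEXT_NODE && node.nodeValue.trim() !== "") {
      return node.nodeValue;
    }
  }
  return "";
}

// Stand-in for the <field2> element from data.xml as a parser that keeps
// whitespace text nodes would build it.
const field2 = {
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "\n  " },
    { nodeType: CDATA_SECTION_NODE, nodeValue: "\n<Comment>\nTest\n</Comment>\n" },
    { nodeType: TEXT_NODE, nodeValue: "\n" },
  ],
};
console.log(firstMeaningfulValue(field2).trim());
```

Where the surrounding whitespace does not matter, `element.textContent.trim()` is a shorter alternative on a real DOM element, since `textContent` includes the content of CDATA sections.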