I'm trying to get the HTML of www.soccerway.com, in particular the elements that have the label-wrapper class. I also tried with select.nav-select, but I can't get any content. What I did is:
1) Created a PHP file called grabber.php with this code:
<?php echo file_get_contents($_GET['url']); ?>
2) Created an index.html file with this content:
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<meta charset=utf-8 />
<title>test</title>
</head>
<body>
<div id="response"></div>
</body>
<script>
$(function(){
var contentURI= 'http://soccerway.com';
$('#response').load('grabber.php?url='+ encodeURIComponent(contentURI) + ' #label-wrapper');
});
var LI = document.querySelectorAll(".list li");
var result = {};
for(var i=0; i<LI.length; i++){
var el = LI[i];
var elData = el.dataset.value;
if(elData) result[el.innerHTML] = elData; // Only if element has data-value attr
}
console.log( result );
</script>
</html>
No content gets loaded into the div. I tested my JS code for getting all the links and it works, but only when I insert the HTML page manually.
I see a couple of issues here.
var contentURI= 'http:/soccerway.com #label-wrapper';
You're missing the second slash in http://, and you're passing a URL with a space and an ID to file_get_contents. You'll want this instead:
var contentURI = 'http://soccerway.com/';
and then you'll need to parse out the item you're interested in from the resulting HTML.
The selector needs to be in the jQuery load() call, not in the URL passed to file_get_contents, and the contentURI variable needs to be properly escaped with encodeURIComponent. Since label-wrapper is a class rather than an id, select it with .label-wrapper instead of #label-wrapper:
$('#response').load('grabber.php?url='+ encodeURIComponent(contentURI) + ' .label-wrapper');
Your grabber.php also contains a serious vulnerability: file_get_contents accepts local file paths as well as URLs, so anyone can request grabber.php with a url value that points to a file on your server (your configuration files, for example) and read its contents. This could expose your database password or other sensitive data on the server.
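A minimal sketch of the usual fix, assuming the proxy only ever needs to fetch pages from soccerway.com: validate the url parameter before fetching it, and accept only absolute http(s) URLs whose host is on an allowlist. It's written in JavaScript (as it might look in a Node.js version of the proxy); in grabber.php the same checks could be done with PHP's parse_url before calling file_get_contents. The isAllowedUrl name and the ALLOWED_HOSTS list are illustrative assumptions.

```javascript
// Hypothetical allowlist: the only hosts this proxy should ever fetch from.
const ALLOWED_HOSTS = new Set(["soccerway.com", "www.soccerway.com"]);

// Returns true only for absolute http(s) URLs whose host is on the allowlist.
// Local paths ("/etc/passwd", "config.php") and other schemes (file:, php:)
// are rejected, which closes the file-disclosure hole.
function isAllowedUrl(raw) {
  let url;
  try {
    url = new URL(raw); // throws on relative paths such as "/etc/passwd"
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return false;
  }
  // hostname excludes any "user@" part, so "http://soccerway.com@evil.example/" fails
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(isAllowedUrl("http://soccerway.com/")); // true
console.log(isAllowedUrl("/etc/passwd"));           // false
console.log(isAllowedUrl("file:///etc/passwd"));    // false
console.log(isAllowedUrl("http://example.org/"));   // false
```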
I'm doing a study using an RSS feed, but the website gives me an RSS feed with an unclosed tag, so I can't get the innerHTML of that tag.
I don't know how to solve this with jQuery, for example by closing the tag, or whether there's another possible solution.
Here is the code:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta charset="utf-8" content="xml">
<script type="text/javascript" src="api/jquery.js"></script>
</head>
<body>
<p id="someElement" visibility="hidden"></p>
<p id="anotherElement"></p>
<script type="text/javascript">
var x = new XMLHttpRequest();
x.open("GET", "http://www.lemonde.fr/rss/une.xml", true);
x.onreadystatechange = function () {
if (x.readyState == 4 && x.status == 200)
{
var doc = x.responseXML;
var string = (new XMLSerializer()).serializeToString(doc);
$("#someElement").append(string);
alert("test");
var tag = document.getElementsByTagName("item");
for(var i = 0, max = tag.length; i < max; i++){
var htmli = tag[i];
//alert(htmli.innerHTML);
//uncomment the alert to see the xml got from the rss
var title = htmli.getElementsByTagName("title")[0].innerHTML;
var link = htmli.getElementsByTagName("link")[0].innerHTML;
var description = htmli.getElementsByTagName("description")[0].innerHTML;
var toAdd = "<ul><li> title : " +title+"</li><li> link : "+ link +" </li><li> description :"+description+" </li></ul>";
$("#anotherElement").append(toAdd);
}
}
};
x.send(null);
</script>
</body>
</html>
Any solution to this?
I have jQuery in a folder named api.
Thanks a lot!
(I notice that while you include jQuery in a script tag, you only use it for append(); the AJAX request and the XML serialization are done by hand. If you're going to use jQuery at all, it's much better practice to let it manage AJAX requests and XML parsing, as it covers many more situations and browser versions. I'd also recommend retrieving jQuery from a CDN rather than hosting it yourself. jQuery has been able to parse XML natively since 1.5. The following was written using 1.12.)
I ran into the same issue with unclosed tags in an RSS feed and came up with a terrible solution to it. I have not tested this cross-browser and would not recommend incorporating it into production code, but it worked to solve a one-time problem for me.
The idea is to take the raw markup of the RSS item, cram it into jQuery's HTML parser, and then manually inspect its output until we reach a node that it thinks was an HTML <link> tag. Because <link> is a void element in HTML, the parser closes it immediately, so the URL that followed it in the feed should be parsed as a separate Text node, which we can extract as our permalink.
Here's how I would rewrite your script to take better advantage of jQuery and incorporate my hack. (I'm assuming you have set up CORS or something else so that you can actually retrieve the feed from lemonde.fr cross-domain.)
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta charset="utf-8" content="xml">
<script type="text/javascript" src="//code.jquery.com/jquery-1.12.4.min.js"></script>
</head>
<body>
<p id="someElement" visibility="hidden"></p>
<p id="anotherElement"></p>
<script type="text/javascript">
(function($, window, document) {
function fetchFeed(url) {
// use jQuery to handle AJAX
$.get(url, function(data) {
// parse XML result with jQuery
var $XML = $(data);
$XML.find("item").each(function() {
// ensure that we have a jQuery-wrapped _this_ object and
// create a new object with the properties we want
var $this = $(this),
item = {
title: $this.find("title").text(),
description: $this.find("description").text(),
link: ""
};
// since the XML parser will treat the unclosed <link> as valid,
// we instead send the raw output to the HTML parser and tell it to do its best
var $redigested = $($this.html());
// jQuery should produce an array of HTML DOM objects
for (var i = 0; i < $redigested.length; i++) {
// if we found an HTMLLinkElement--a <link> tag--followed by a Text element, that's our URL
if ($redigested[i] instanceof HTMLLinkElement && i + 1 < $redigested.length && $redigested[i + 1] instanceof Text) {
item.link = $redigested[i + 1].data;
break;
}
}
console.log("link: " + item.link);
var toAdd = "<ul><li> title: " + item.title + "</li><li> link: " + item.link + " </li><li> description: " + item.description + " </li></ul>";
$("#anotherElement").append(toAdd);
});
});
}
$(function() {
// call the fetch function on DOM ready
fetchFeed("http://www.lemonde.fr/rss/une.xml");
});
})(jQuery, window, document);
</script>
</body>
</html>
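If you'd rather not depend on how the browser's HTML parser handles <link>, the same observation (the permalink is the text that immediately follows the <link> tag) can be applied to the raw item markup with a regular expression. This is a different, swapped-in technique rather than the hack above; it assumes the URL contains no whitespace or "<", and extractLink is a hypothetical helper name.

```javascript
// Hypothetical helper: return the permalink from one raw RSS <item> string,
// or "" if there is none. It works whether the markup is "<link>URL</link>",
// an unclosed "<link>URL", or a serialized "<link/>URL".
function extractLink(itemMarkup) {
  // "<link", optional attributes, an optional "/", ">", optional whitespace,
  // then capture the URL up to the next "<" or whitespace character.
  const match = /<link\b[^>]*?\/?>\s*([^<\s]+)/i.exec(itemMarkup);
  return match ? match[1] : "";
}

const items = [
  "<item><title>A</title><link>http://example.com/a</link></item>",
  "<item><title>B</title><link>http://example.com/b<description>x</description></item>",
  "<item><title>C</title><link/>http://example.com/c<guid>1</guid></item>",
];
items.forEach((item) => console.log(extractLink(item)));
```

A regex is fine for pulling one value out of a feed you already know; for anything more, a real XML parser is still the safer choice.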
I have some JavaScript code mixed with servlet (JSP scriptlet) code, and I want to move all of it (everything between the <script> tags) to an external .js file, but it doesn't work. What can I do? How should I modify my code if only part of the JavaScript can be moved to an external file?
<script language="JavaScript" type="text/JavaScript">
var sel = document.getElementById('sel');
var selList = [];
<%
String key = "";
String text = "";
for(int i = 0; i < master.size(); i++) {
Map option = (Map) master.get(i);
key = (String) option.get("Code");
text = key + " " + (String) option.get("NAME");
%>
selList.push({
key: "<%=key%>",
text: "<%=text%>"
});
<%
}
%>
</script>
Here are two options:
1 - Without AJAX:
external.js
var images;
function renderImages(){
//do things for showing images here.
//the images variable holds the image data as JSON (I suggest this approach), so you can easily iterate over the list and render it.
}
jsp
<html>
<head>
<script type="text/javascript" src="external.js"></script>
<script>
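images = <%=request.getAttribute("imageDataAsJSON")%>; //here I assume you populate the request attribute with your image data as a JSON string. It's written without surrounding quotes so that it becomes a JavaScript literal; wrapping it in "..." would break on the " characters inside the JSON.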
</script>
</head>
<body>
<script>
renderImages();
</script>
</body>
</html>
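On escaping: the data is safest when the server writes it out as a JSON literal (no surrounding quotes) and escapes the characters that can end or break a <script> block: "<" (so a "</script>" inside the data can't close the tag) and the line separators U+2028/U+2029, which pre-ES2019 JavaScript engines reject inside string literals. On your server this would be done in Java; the JavaScript sketch below only illustrates the escaping rule, and toScriptSafeJson is a hypothetical name.

```javascript
// Hypothetical helper: serialize a value as JSON that can be pasted directly
// between <script> and </script> as a JavaScript literal.
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003c")       // stops "</script>" in the data from closing the block
    .replace(/\u2028/g, "\\u2028")  // line separator: invalid in older string literals
    .replace(/\u2029/g, "\\u2029"); // paragraph separator: same problem
}

const images = [{ src: "a.png", alt: `O'Brien "quoted" </script>` }];
console.log(toScriptSafeJson(images));
// [{"src":"a.png","alt":"O'Brien \"quoted\" \u003c/script>"}]
```

Quotes need no special handling here, because JSON.stringify already escapes " and a ' is harmless inside a double-quoted JSON string.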
2 - With AJAX: you can separate the client-side logic into external JS code and populate it with data by making AJAX calls to the server side.
external.js
function renderImages(){
//do ajax to your servlet which returns image data as JSON.
//iterate over image data and render your html elements accordingly.
}
jsp
<html>
<head>
<script type="text/javascript" src="external.js"></script>
</head>
<body>
</body>
</html>
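A sketch of what renderImages might look like for option 2, assuming a hypothetical servlet mapped at imageData that returns a JSON array of {src, alt} objects. The rendering step is kept as a pure function (imagesToHtml) so it can be tested without a browser; escapeAttr, imagesToHtml, the container parameter and the endpoint name are all illustrative.

```javascript
// Hypothetical servlet URL returning JSON such as [{"src": "a.png", "alt": "First"}]
const IMAGE_DATA_URL = "imageData";

// Escape the characters that matter inside a double-quoted HTML attribute.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Pure rendering step: JSON data in, HTML string out.
function imagesToHtml(images) {
  return images
    .map((img) => `<img src="${escapeAttr(img.src)}" alt="${escapeAttr(img.alt)}">`)
    .join("");
}

// Browser entry point: fetch the data from the servlet, then render it.
async function renderImages(container = document.body) {
  const response = await fetch(IMAGE_DATA_URL);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  container.innerHTML = imagesToHtml(await response.json());
}
```

In the page, you would then call renderImages() from a script at the end of the body, as in option 1.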
<html>
<head>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript">
if (window.self === window.top) { $.getScript("Wing.js"); }
</script>
</head>
</html>
Is there a way in C# to modify the above HTML file and convert it into this format:
<html>
<head>
</head>
</html>
Basically, my goal is to remove all the JavaScript from the HTML page. I don't know what the best way to modify the HTML files is. I want to do it programmatically, as there are hundreds of files that need to be modified.
It can be done using regex:
Regex rRemScript = new Regex(@"<script[^>]*>[\s\S]*?</script>");
output = rRemScript.Replace(input, "");
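To see what that pattern matches, here is the same regex in JavaScript, where it's easy to try in a browser console or Node. [\s\S]*? is the shortest run of any characters, newlines included, up to the first </script>; the g flag removes every block, and the i flag (an addition, not in the C# version) also catches <SCRIPT>. stripScripts is a hypothetical name.

```javascript
// Same pattern as the C# answer: an opening <script ...> tag, then the shortest
// run of any characters (including newlines) up to the first </script>.
const scriptBlock = /<script[^>]*>[\s\S]*?<\/script>/gi;

function stripScripts(html) {
  return html.replace(scriptBlock, "");
}

const sample =
  "<html><head>" +
  '<script type="text/javascript" src="jquery.js"></script>' +
  '<script type="text/javascript">\n' +
  'if (window.self === window.top) { $.getScript("Wing.js"); }\n' +
  "</script>" +
  "</head></html>";

console.log(stripScripts(sample)); // <html><head></head></html>
```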
May be worth a look: HTML Agility Pack
Edit: specific working code
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
string sampleHtml =
"<html>" +
"<head>" +
"<script type=\"text/javascript\" src=\"jquery.js\"></script>" +
"<script type=\"text/javascript\">" +
"if (window.self === window.top) { $.getScript(\"Wing.js\"); }" +
"</script>" +
"</head>" +
"</html>";
// needs: using System; using System.Linq; using HtmlAgilityPack;
doc.LoadHtml(sampleHtml);
// remove every <script> element, external and inline alike; ToList() copies
// the matches first so the tree isn't modified while it is being enumerated
foreach (HtmlNode script in doc.DocumentNode.Descendants("script").ToList())
script.Remove();
Console.WriteLine(doc.DocumentNode.OuterHtml);
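I think, as others have said, the HTML Agility Pack is the best route. I've used it to scrape pages and deal with loads of hard-to-corner cases. However, if a simple regex is your goal, then maybe you could try <script(.+?)*</script>. The intent is to remove nasty nested JavaScript as well as normal script blocks. Be aware that . does not match newlines unless you pass RegexOptions.Singleline, and that the nested (.+?)* quantifier can backtrack catastrophically on input where a <script has no matching </script>. Here is the kind of input meant, i.e. the type referred to in the linked question (Regular Expression for Extracting Script Tags):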
<html>
<head>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript">
if (window.self === window.top) { $.getScript("Wing.js"); }
</script>
<script> // nested horror
var s = "<script></script>";
</script>
</head>
</html>
usage:
Regex regxScriptRemoval = new Regex(@"<script(.+?)*</script>");
var newHtml = regxScriptRemoval.Replace(oldHtml, "");
return newHtml; // etc etc
This may seem like a strange solution.
If you don't want to use any third party library to do it and don't need to actually remove the script code, just kind of disable it, you could do this:
html = Regex.Replace(html, @"<script[^>]*>", "<!--");
html = Regex.Replace(html, @"<\/script>", "-->");
This creates an HTML comment out of script tags.
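Using a regex that strips the opening and closing tags of script and other active or embedded elements. Note that it removes only the tags themselves, so the code between an inline <script> and </script> is left behind as plain text: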
string result = Regex.Replace(
input,
@"</?(?i:script|embed|object|frameset|frame|iframe|meta|link|style)(.|\n|\s)*?>",
string.Empty,
RegexOptions.Singleline | RegexOptions.IgnoreCase
);
Hi all, I want to document.write a hyperlinked image inside getJSON. I tried the following, but it doesn't work. Could you tell me what's wrong with my document.write?
<script>
$.getJSON('http://anyorigin.com/get?url=http://www.somesite.com/handelit.ashx&callback=?', function(data){
var siteContents = data.contents;
//writes to textarea
document.myform.outputtext.value = siteContents ;
document.write("<a id="ok" href="http://www.mysite.com/master.m3u8?+siteContents+"><img src="./playicon.jpg"></a>");
});
</script>
Hi all, I want to document.write a hyperlinked image inside getJSON
You can't (not reasonably*). document.write only works during the initial parsing of the page. If you use it after the page finishes loading, it completely replaces the page.
Instead, interact with the DOM. There are several ways to do that, but the most obvious one based on your code is to have the anchor initially hidden and then show it after filling in the textarea, like this:
$("#ok").show();
Full example:
(I've changed playicon.jpg to your Gravatar, since otherwise it shows up as a broken image on JSBin.)
<!DOCTYPE html>
<html>
<head>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<meta charset=utf-8 />
<title>JS Bin</title>
</head>
<body>
<form name="myform">
<textarea name="outputtext"></textarea>
</form>
<a id="ok" style="display: none" href="http://www.mysite.com/master.m3u8?+siteContents+"><img src="http://www.gravatar.com/avatar/f69cfb4677f123381231f97ea1138f8a?s=32&d=identicon&r=PG"></a>
<script>
(function($) {
$.getJSON('http://anyorigin.com/get?url=http://www.somesite.com/handelit.ashx&callback=?', function(data){
var siteContents = data.contents;
//writes to textarea
document.myform.outputtext.value = siteContents;
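// point the link at the fetched data, then show it
$("#ok").attr("href", "http://www.mysite.com/master.m3u8?" + encodeURIComponent(siteContents)).show();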
});
})(jQuery);
</script>
</body>
</html>
* "not reasonably": IF your content were coming from the same origin as the document (it doesn't look like it is), you could do this with a synchronous ajax call. But that would be very bad design.
Use createElement instead of document.write:
$.getJSON('http://anyorigin.com/get?url=http://www.somesite.com/handelit.ashx&callback=?', function(data){
var siteContents = data.contents;
//writes to textarea
document.myform.outputtext.value = siteContents ;
//Create A-Element
var link = document.createElement('a');
link.setAttribute('href', 'http://www.mysite.com/master.m3u8?' + encodeURIComponent(siteContents) );
link.id = 'ok';
//Append A-Element to your FORM-Element
var myForm = document.getElementsByTagName('form')[0];
myForm.appendChild(link);
//Create IMG-Element
var img = document.createElement('img');
img.setAttribute('src', './playicon.jpg');
//Append IMG-Element to A-Element (id='ok')
link.appendChild(img);
});
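The encodeURIComponent call matters because siteContents is arbitrary text coming back from the request. A small sketch of building the href on its own; buildPlaylistHref is a hypothetical helper, and the base URL is the one from the question.

```javascript
// Base URL from the question; whatever getJSON returned is appended as the query.
const PLAYLIST_BASE = "http://www.mysite.com/master.m3u8";

// encodeURIComponent escapes spaces, &, #, ? and similar characters, so the
// fetched text can't break out of the query string or add extra parameters.
function buildPlaylistHref(siteContents) {
  return PLAYLIST_BASE + "?" + encodeURIComponent(siteContents);
}

console.log(buildPlaylistHref("a b&c#d")); // http://www.mysite.com/master.m3u8?a%20b%26c%23d
```

Inside the callback you would then write link.setAttribute('href', buildPlaylistHref(siteContents)).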
I am trying to create a JavaScript quiz that gets its questions from an XML file. At the moment I am just starting out, trying to parse my XML file, without any success. Can anyone point out what I am doing wrong?
<html>
<head>
<title>Test</title>
<script type="text/javascript" src="prototype.js"></script>
</head>
<body>
<div class="spmArr">
</div>
<script type="text/javascript">
var quizXML = '<quiz><Sporsmal tekst="bla bla bla"/><alternativer><tekst>bla</tekst><tekst>bli</tekst><tekst correct="yes">ble</tekst></alternativer><Sporsmal tekst="More blah"/><alternativer><tekst>bla bla</tekst><tekst correct="yes">bli bli</tekst><tekst>ble ble</tekst></alternativer></quiz>'
var quizDOM = $.xmlDOM( quizXML );
quizDOM.find('quiz > Sporsmal').each(function() {
var sporsmalTekst = $(this).attr('tekst');
var qDiv = $("<div />")
.addClass("item")
.addClass("sporsmal")
.appendTo($(".spmArr"));
var sTekst = $("<h2/>")
.html(sporsmalTekst)
.appendTo(qDiv);
});
</script>
</body>
</html>
When I try this in my browser, the div and the classes are not created, and the page is just blank. Am I doing something wrong when I initialize the XML?
Edit: added prototype.js and closed the function.
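It looks like you're forgetting to close your .each call: append ); after the statement for sTekst and your call will parse correctly. Also note that $.xmlDOM, .attr() and .addClass() are jQuery-style calls, not Prototype ones; with only prototype.js loaded, $.xmlDOM is undefined and the script stops with an error before creating anything. Load jQuery instead (plus whichever plugin provides $.xmlDOM), or parse the string with jQuery's built-in $.parseXML.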