Removing broken links in offline HTML - javascript

I have an html file with many <a> tags with href links.
I would like to have the page do nothing when these links point to an outside url (http://....) or an internal link that is broken.
The final goal is to have the html page used offline without having any broken links. Any thoughts?
I have tried using a Python script to change all links but it got very messy.
Currently I am trying to use JavaScript and calls such as $("a").click(function(event) {} to handle these clicks, but these have not been working offline.
Also, caching the pages will not be an option because they will never be opened online. In the long run, this may also need to be adapted to src attributes, and will be used in thousands of html files.
Lastly, it would be preferable to use only standard and built in libraries, as external libraries may not be accessible in the final solution.
UPDATE: This is what I have tried so far:
//Register link clicks
$("a").click(function(event) {
checkLink(this, event);
});
//Checks to see if the clicked link is available
function checkLink(link, event){
//Is this an outside link?
var outside = (link.href).indexOf("http") >= 0 || (link.href).indexOf("https") >= 0;
//Is this an internal link?
if (!outside) {
if (isInside(link.href)){
console.log("GOOD INSIDE LINK CLICKED: " + link.href);
return true;
}
else{
console.log("BROKEN INSIDE LINK CLICKED: " + link.href);
event.preventDefault();
return false;
}
}
else {
//This is outside, so stop the event
console.log("OUTSIDE LINK CLICKED: " + link.href);
event.preventDefault();
return false;
}
}
//DOESNT WORK
function isInside(link){
$.ajax({
url: link, //or your url
success: function(data){
return true;
},
error: function(data){
return false;
},
})
}
Also an example:
Outside Link : Do Nothing ('#')
Outside Link : Do Nothing ('#')
Existing Inside Link : Follow Link
Inexistent Inside Link : Do Nothing ('#')

Javascript based solution:
If you want to use javascript, you can fix your isInside() function by setting $.ajax() to be non-asynchronous (async: false). That will cause it to wait for a response before returning. See jQuery.ajax. Pay attention to the warning that synchronous requests may temporarily lock the browser, disabling any actions while the request is active (this may be acceptable in your case).
Also, instead of doing a 'GET', which is what $.ajax() does by default, your request should be a 'HEAD' (assuming your internal web server hasn't disabled this HTTP verb). 'HEAD' is like 'GET' except that it doesn't return the body of the response, so it's a good way to find out whether a resource exists on a web server without downloading the entire resource.
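// Formerly isInside. Renamed it to reflect its function.
function isWorking(link){
    // A return inside a callback only returns from that callback,
    // so record the outcome here and return it after the call.
    var working = false;
    $.ajax({
        url: link,
        type: 'HEAD',
        async: false,
        success: function(){ working = true; },
        error: function(){ working = false; }
    });
    // The request was synchronous, so the callbacks have already run.
    return working;
}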
Python based solution:
If you don't mind preprocessing the html page (and even caching the result), I would go with parsing the HTML in Python using a library like BeautifulSoup.
Essentially I would find all the links on the page, and replace the href attribute of those starting with http or https with #. You can then use a library like requests to check the internal urls and update the appropriate urls as suggested.
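The same preprocessing idea can also be sketched in plain JavaScript (run under Node) for anyone who wants to avoid extra libraries. This is only a rough sketch, not a replacement for a real HTML parser: it assumes every href value is quoted, it uses a regular expression instead of BeautifulSoup, and both neutralizeLinks and the fileExists callback are hypothetical names introduced here for illustration.

```javascript
// Sketch: rewrite href values in an HTML string before the file goes offline.
// `fileExists` is a hypothetical predicate supplied by the caller; it receives
// the link target with any query string or fragment removed.
function neutralizeLinks(html, fileExists) {
  // Matches href="..." or href='...'; assumes quoted attribute values.
  return html.replace(/\bhref\s*=\s*(["'])(.*?)\1/gi, function (match, quote, target) {
    var isExternal = /^(https?:)?\/\//i.test(target);
    var isFragment = target.charAt(0) === '#';
    // Strip any query string or fragment before checking the file itself.
    var path = target.split(/[?#]/)[0];
    if (isExternal || (!isFragment && !fileExists(path))) {
      // Outside link or broken inside link: make it do nothing.
      return 'href=' + quote + '#' + quote;
    }
    // Existing inside link or in-page anchor: leave untouched.
    return match;
  });
}
```

In a real run, fileExists would probably wrap fs.existsSync with the path resolved against the directory of the HTML file being processed. Links such as mailto: are treated as inside links by this sketch, so they would need their own rule.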

Here is some javascript that will prevent you from going to external site:
var anchors = document.getElementsByTagName('a');
for(var i=0, ii=anchors.length; i < ii; i++){
anchors[i].addEventListener('click',function(evt){
if(this.href.slice(0,4) === "http"){
evt.preventDefault();
}
});
}
EDIT:
As far as checking whether a local path is good on the client side, you would have to send an ajax call and then check the status code of the response (the infamous 404). However, you can't do ajax from a static html file (e.g. file://index.html). It would need to be running on some kind of local server.
Here is another stackoverflow that talks about that issue.

Related

Return Content with alert in Controller

I'm trying to somewhat replicate what I saw in this question, particularly in this answer, but not quite the same.
My intent is that if the zip has no files (which can happen because the folder could be empty), I want to return an alert so the user is warned that it is not possible to obtain the file at this time.
But I'm stuck on the redirection: I don't want the alert to send the user to a blank page for the action. I want the user to stay on the current page, partly because of some filters that are applied there.
Is this possible? I couldn't find anything that would stop the redirection from happening.
Here is my controller action code:
public ActionResult DownloadZip(List<int> things)
{
// Create zip with files
if (!zip.Any())
{
return Content(@"<script language='javascript' type='text/javascript'>
alert('Message');
</script>
");
}
// Return zip
}
Here is the call from the view:
$("#btnExportToZip").on("click", function (e) {
var grid = $("#gridThings").data("kendoGrid");
var items = grid.dataSource.data();
var lstIds = [];
$.each(items, function (index, elem) {
if (elem.Checked) {
lstIds.push(elem.Id);
}
});
if (lstIds.length > 0) {
var params = lstIds.join("&listAmostras=")
var url = '/Search/DownloadZip?listAmostras=' + params;
window.location.href = url;
}
});
If you do a redirect as you're doing here, it's too late to take it back once you've determined the zip file is empty. Your best bet here is probably to do an AJAX file download. Bear in mind, though, that this will require that the browser supports the HTML5 File API, so IE 9 and under are out.
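$.ajax({
    url: url,
    // responseType cannot be set on a synchronous request, so keep this async
    xhrFields: {
        responseType: 'blob'
    },
    success: function (data) {
        // Build a temporary link to the blob and click it to start the download
        var a = document.createElement('a');
        var blobUrl = window.URL.createObjectURL(data);
        a.href = blobUrl;
        a.download = 'myfile.zip';
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
        window.URL.revokeObjectURL(blobUrl);
    }
});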
Essentially, what this does is request the zip file via AJAX. Once the file data has been received, an anchor link is added to the DOM (not visible) and dynamically "clicked" to approximate the behavior of a user clicking a link to a static file. In other words, a download prompt will pop up as soon as the AJAX request completes successfully. However, this code only removes the need to redirect. You still need to pop the download only if the zip file has something in it. There are two ways you can accomplish that.
In the success callback of the AJAX, you would wrap the code there in a conditional that checks that data.size > 0. However, that might not actually work. I've never looked at an empty zip file, but it's entirely possible that there are file headers in the binary that would cause the blob to have a size greater than zero, even though it's "empty".
The better approach is to return an error response from your zip action when the zip file is empty. Off the top of my head, I'm not sure what the most appropriate error status code would be, but anything in the 400-500 range will trigger the error callback. Then you just need to add an error handler to this AJAX call. In that handler, you can notify the user however you like that there's no download because the zip would be empty.
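On the size check in the first approach: a zip archive with no entries is indeed not zero bytes, because it still ends with a 22-byte End of Central Directory record. If you did want to detect that case on the client, a rough sketch could look like the following. It assumes the archive carries no trailing comment (so the record sits at the very end), and isEmptyZip is a hypothetical helper name, not part of any library.

```javascript
// Sketch: detect an archive with zero entries from its raw bytes.
// Assumes the archive has no trailing comment, so the 22-byte
// End of Central Directory (EOCD) record sits at the very end.
function isEmptyZip(bytes) {
  var EOCD_SIZE = 22;
  if (bytes.length < EOCD_SIZE) {
    return false; // too short to be a zip archive at all
  }
  var start = bytes.length - EOCD_SIZE;
  // EOCD signature: "PK" followed by 0x05 0x06.
  var hasSignature = bytes[start] === 0x50 && bytes[start + 1] === 0x4b &&
    bytes[start + 2] === 0x05 && bytes[start + 3] === 0x06;
  if (!hasSignature) {
    return false;
  }
  // Total number of entries: 2-byte little-endian value at offset 10.
  var totalEntries = bytes[start + 10] | (bytes[start + 11] << 8);
  return totalEntries === 0;
}
```

The blob would first have to be read into a Uint8Array in the browser, which adds an asynchronous step, so returning an error status from the action remains the simpler route.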
As per my understanding, the alert sends the user to a blank page because the JavaScript has the line window.location.href = url;, which navigates to the same action that returns the alert.
So try assigning a different URL to window.location.href,
for example: window.location.href = '../somecontroller/someaction';
thanks
Karthik

jQuery rebind click event after ajax call

I'm trying to prevent defaults on a click, call a page with ajax and trigger the click on complete, using this answer.
<a id="mylink" href="file.csv" download >Dowload</a>
<script>
var flag = false;
$('#mylink').on('click',function(e) {
// Result is the same with :
//$(document).on("click","#mylink",function(e){
if (flag === true) {
flag = false;
return;
}
e.preventDefault();
$.ajax({
url: "index.php?controller=admin&action=refreshFile",
complete: function() {
console.log('control'); // This is called
flag = true;
$('#mylink').trigger('click'); // This is not called
}
});
});
</script>
The call works but the link is not triggered after. The result is the same when the ajax call is set inside a separate function.
Use window.location to navigate to the link's href:
$('#mylink').on('click',function(e) {
e.preventDefault();
$.ajax({
url: "index.php?controller=admin&action=refreshFile",
complete: function() {
console.log('control'); // This is called
window.location = $('#mylink').attr("href");
}
});
});
Or use a one-time event listener:
var eventListener = function(e) {
e.preventDefault();
$.ajax({
url: "index.php?controller=admin&action=refreshFile",
complete: function() {
console.log('control'); // This is called
$('#mylink')[0].click();
$('#mylink').one('click', eventListener);
}
});
};
$('#mylink').one('click', eventListener);
I'm not sure what your flag is supposed to do. In your example it would mean the link only works on every second click.
P.S. Using the complete callback means the link is followed even when the ajax call fails. You might want to change it to success.
Update
@Racil Hilan has a point: this solution is a little overkill when you could just call the link directly and return the correct file after the refreshFile action has been called.
Try
var flag = false;
$('#mylink').on('click',function(e) {
// Result is the same with :
//$(document).on("click","#mylink",function(e){
if (flag === true) {
flag = false;
window.location = "file.csv";
return;
}
e.preventDefault();
$.ajax({
url: "index.php?controller=admin&action=fileDownload",
complete: function() {
console.log('control'); // This is called
flag = true;
$('#mylink').trigger('click'); // This is not called
}
});
});
In my humble opinion, this is not the right design. Your Ajax is calling index.php on the server before triggering the download. If index.php is doing some security or critical work that MUST be done before allowing the user to download the file, then this design is absolutely insecure. You don't even need to be a hacker: simply copy the link to file.csv and paste it in your browser's address bar, and you'll get the file without the Ajax.
You need to place file.csv outside your website folder (or maybe it is generated on the fly by the server code, which is good too), and then the PHP page must run all the checks. If they all pass, it reads the file (or generates it) and returns the download to the browser (or an error message if the checks failed). This is how to secure file downloads on the server.
After doing all of that, it is a matter of preference whether you call the PHP directly from your link, or the link calls the Ajax function, which in turn calls the PHP page and handles the download (this is a bit more complex, but doable). The only difference between the two methods is whether you want the page refreshed when the download (or error message) comes back from the server.
If you want to take this advice, rephrase your question and select which way you want to go (i.e. direct link, or through Ajax), so we can help you.

polling for RSS feed with casperjs not working

I am trying to match a token (string token) in the RSS feed using casperjs waitFor() but it does not seem to work. There are other ways (not using polling) to get around but I need to poll for it. Here is the code snippet:
casper.then(function() {
this.waitFor(function matchToken() {
return this.evaluate(function() {
if(!this.resourceExists(token)) {
this.reload();
return false;
}
return true;
});
});
});
The updates to rss url are not dynamic and hence, a refresh would be needed to check for the token. But it seems (from the access log) that I am not getting any hits (reload not working) on the rss url. Ideally, I would want to refresh the page if it doesn't see the token and then check for the token again & it should keep doing that until the waitFor times out.
I also tried using assertTextExists() instead of resourceExists() but even that did not work.
I am using PhantomJS (1.9.7) & the url is: https://secure.hyper-reach.com:488/rss/323708
The token I am looking for is --> item/272935. If you look at the url I have mentioned above, you will find this in each guid tag. The reason why I am including "item/" as part of my token is so that it doesn't match any other numbers incorrectly.
evaluate() is the sandboxed page context. Anything inside of it doesn't have access to variables defined outside and this refers to window of the page and not casper. You don't need the evaluate() function here, since you don't access the page context.
The other thing is that casper.resourceExists() works on the resource meta data such as URL and request headers. It seems that you want to check the content of the resource. If you used casper.thenOpen() or casper.open() to open the RSS feed, then you can check with casper.getPageContent(), if the text exists.
The actual problem with your code is that you mix synchronous and asynchronous code in a way that won't work. waitFor() is the wrong tool for the job, because you need to reload in the middle of its execution, but the check function is called so fast that there probably won't be a complete page load to actually test it.
You need to recursively check whether the document is changed to your liking.
var tokenTrials = 0,
tokenFound = false;
function matchToken(){
if (this.getPageContent().indexOf(token) === -1) {
// token was not found
tokenTrials++;
if (tokenTrials < 50) {
this.reload().wait(1000).then(matchToken);
}
} else {
tokenFound = true;
}
}
casper.then(matchToken).then(function(){
test.assertTrue(tokenFound, "Token was found after " + tokenTrials + " trials");
});
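Stripped of CasperJS's step scheduling, the control flow above reduces to a bounded retry: fetch, check, and try again until the token shows up or the attempts run out. The sketch below only illustrates that logic as a plain synchronous loop; pollForToken and the fetchContent callback are hypothetical names, and unlike the CasperJS version it neither reloads a page nor waits between attempts.

```javascript
// Sketch: bounded polling, independent of CasperJS.
// `fetchContent` is a hypothetical function returning the current feed text.
function pollForToken(fetchContent, token, maxTrials) {
  var trials = 0;
  while (trials < maxTrials) {
    trials++;
    if (fetchContent().indexOf(token) !== -1) {
      // Token is present: stop polling and report how many fetches it took.
      return { found: true, trials: trials };
    }
  }
  // Gave up after maxTrials fetches without seeing the token.
  return { found: false, trials: trials };
}
```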

URL hashchange problems with ajax loading

I have a functional WordPress theme that loads content via ajax. One issue I'm having, though, is that when pages are loaded directly, the ajax script no longer works. The link structure works as follows: while on www.example.com, when the about page link is clicked, the URL becomes www.example.com/#/about. But when I directly load the standalone page www.example.com/about, other links clicked from this page turn into www.example.com/about/#/otherlinks. I modified the code a little bit from this tutorial: http://www.deluxeblogtips.com/2010/05/how-to-ajaxify-wordpress-theme.html. Here is my code. Thanks for the help.
jQuery(document).ready(function($) {
var $mainContent = $("#container"),
siteUrl = "http://" + top.location.host.toString(),
url = '';
$(document).delegate("a[href^='"+siteUrl+"']:not([href*=/wp-admin/]):not([href*=/wp-login.php]):not([href$=/feed/]))", "click", function() {
location.hash = this.pathname;
return false;
});
$(window).bind('hashchange', function(){
url = window.location.hash.substring(1);
if (!url) {
return;
}
url = url + " #ajaxContent";
$mainContent.fadeOut(function() {
$mainContent.load(url,function(){
$mainContent.fadeIn();
});
});
});
$(window).trigger('hashchange');
});
The problem you are describing is not easily solved. There are multiple factors at play, but it boils down to this:
Any change to a URL will trigger a page reload.
The only exception is when only the hash part of the URL changes.
As you can tell, there is no hash part in the URL www.example.com/about/. Consequently, the pathname part cannot be changed by your script, or else it will trigger a page reload.
Because of that, your script only changes the URL by adding a new hash part or modifying the existing one, while leaving the "pathname" part of the URL alone. And so you get URLs like www.example.com/about/#/otherlinks.
Now, from my point of view, there are two ways to solve your problem.
First, there is an API that can modify the whole URL pathname without a reload, but it's not available everywhere. Using this solution and falling back to a classic page reload for older browsers is the cleaner method.
Otherwise, you can force a page reload just once to reset the URL to www.example.com/ and start from a good base. Here is the code to do so:
$(document).delegate("a[href^='"+siteUrl+"']:not([href*=/wp-admin/]):not([href*=/wp-login.php]):not([href$=/feed/]))", "click", function() {
location.assign(siteUrl + '/#' + this.pathname);
return false;
});
It should be noted that this script won't work if your site is not at the root of the domain. For it to work for www.example.com/mysite/, you will need to adjust the selector and the URL being assigned.
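The API mentioned in the first option is presumably the HTML5 History API (history.pushState). A minimal sketch of "use it when available, otherwise reload" might look like this. The goTo function is a hypothetical helper, and the window object is passed in as win so the logic can be exercised outside a browser.

```javascript
// Sketch: change the visible path without a reload when the History API
// exists, otherwise fall back to a normal navigation.
// `win` is passed in (normally `window`) so the function is easy to test.
function goTo(win, path) {
  if (win.history && typeof win.history.pushState === 'function') {
    // Updates the address bar only; no page load and no hashchange event.
    win.history.pushState(null, '', path);
    return 'pushState';
  }
  // Older browser: let the browser load the page normally.
  win.location.assign(path);
  return 'reload';
}
```

Because pushState fires neither hashchange nor popstate, the click handler would have to load the new content itself after calling it, and a popstate listener would be needed for the back button.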
Please let me know how it went.

jquery load() equivalent for offline use

I am looking for an equivalent to jquery's load() method that will work offline. I know from jquery's documentation that it only works on a server. I have some files from which I need to call the html found inside a particular <div> in those files. I simply want to take the entire site and put it on a computer without an internet connection, and have that portion of the site (the load() portion) function just as if it was connected to the internet. Thanks.
Edit: BTW, it doesn't have to be js; it can be any language that will work.
Edit2:
My sample code (just in case there are syntax errors I am missing; this is for the files in the same directory):
function clickMe() {
var book = document.getElementById("book").value;
var chapter = document.getElementById("chapter").value;
var myFile = "'" + book + chapter + ".html'";
$('#text').load(myFile + '#source')
}
You can't use load() over the file protocol, and no other ajax request is going to work for HTML files either. I tried with the crossDomain and isLocal options turned on, and even with the protocol specified explicitly, without any success.
The problem is that even if jQuery tries, the browser will block the request for security reasons (most browsers, at least; the snippet below works in FF), because allowing a page to load local files would give it access to a lot of things.
The one thing you can load locally is javascript files, but that probably means changing a lot of the application/website architecture.
This only works in FF:
$.ajax({
url: 'test.html',
type: 'GET',
dataType: 'text',
isLocal: true,
success: function(data) {
document.body.innerHTML = data;
}
});
What FF does well is that it detects that the page requesting local files is itself on the file protocol, whereas other browsers don't. I am not sure whether it restricts the types of files you can request.
You can still use the jQuery load function in this context:
You could add an OfflineContent div to your page:
<div id="OfflineContent">
</div>
And then click a button which calls:
$('#OfflineContent').load('OfflinePage.html #contentToLoad');
Button code:
$("#btnLoadContent").click(function() {
$('#OfflineContent').load('OfflinePage.html #contentToLoad');
});
In OfflinePage.html you would need a section with the id contentToLoad, whose content would then be displayed on the initial page.
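Note the space between the file name and the selector in the load() calls above: jQuery splits the argument on the first space and treats the rest as a selector. The clickMe() function in the question wraps the file name in literal quote characters and leaves the space out, so it would not pick out #source even where load() itself is allowed. A small sketch of building the argument, with buildLoadTarget as a hypothetical helper name:

```javascript
// Sketch: build the "url selector" string that jQuery's load() expects.
// No extra quote characters around the file name, and a single space
// separating the URL from the selector.
function buildLoadTarget(book, chapter, selector) {
  return book + chapter + '.html ' + selector;
}
```

It would then be used as $('#text').load(buildLoadTarget(book, chapter, '#source')). This only fixes the argument format; it does not lift the browser restrictions on file:// requests described in the other answer.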
