I want to build crawlable Ajax navigation with jQuery. How can I do it? I previously had a website that used jQuery Ajax for its search, but none of the results were indexed.
This is the new approach I am using:
page 1
Then I show the result via Ajax and prevent the link from navigating:
javascript
$("body").on("click", "#linkA", function (e) {
    e.preventDefault();
    var href = $(this).attr("href");
    $.ajax({
        type: "POST",
        url: "ajax/return.php",
        data: { page: href },
        success: function (data) {
            $("body").html(data);
        }
    });
});
My questions:
1- Is this approach correct?
2- Is this approach crawlable?
I think your approach is correct, and it's a good one, but Google has an article about Making AJAX Applications Crawlable.
As long as the links you provide in the "href" attribute are also rendered correctly by the server when the browser accesses them directly, you're on the safe side. You should also use the HTML5 History API and pushState to reflect the URL of the page currently shown, so visitors can use their browser's history buttons, share links to pages and bookmark them.
Google and the other search engines normally won't execute your JavaScript; they try to access the links you provide directly.
If your site has heavy scripts to load, or static parts like a header, footer or menu, hijacking the links and loading only the needed content via JavaScript is a great way to improve your loading and rendering speed.
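The link hijacking plus pushState idea described above could be sketched roughly as follows. This is only an illustration under assumptions: `createLinkHijacker`, `fetchFragment` and `render` are hypothetical names that do not appear in the question, and the history object is passed in as a parameter so the logic does not depend on a browser being present.

```javascript
// Sketch only: `fetchFragment(url, callback)` and `render(html)` are
// hypothetical callbacks supplied by the page; `historyApi` is expected
// to behave like window.history.
function createLinkHijacker(historyApi, fetchFragment, render) {
  // Load the content for `url`, show it, and optionally record the
  // URL in the session history.
  function navigate(url, addToHistory) {
    fetchFragment(url, function (html) {
      render(html);
      if (addToHistory) {
        historyApi.pushState({ url: url }, "", url);
      }
    });
  }

  return {
    // Call from the click handler after e.preventDefault().
    onLinkClick: function (href) {
      navigate(href, true);
    },
    // Call from a "popstate" listener so Back/Forward re-render the
    // stored page without pushing a new history entry.
    onPopState: function (state) {
      if (state && state.url) {
        navigate(state.url, false);
      }
    }
  };
}
```

In a browser this would presumably be wired up with `window.history`, an Ajax call such as `$.get` as the fetcher, and a `popstate` listener that forwards `event.state` to `onPopState`. Because each pushed URL is a real link, the server still has to render it correctly when it is requested directly.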
Related
I've tried using jQuery's load function, but since the external site does not allow CORS requests, my GET request gets blocked.
<div id="test"></div>
<script>
$(document).ready(function () {
$("#test").load("https://mywebsite.com");
});
</script>
So it seems that my only option is to use iframes? Is there a way to show only a specific div with an iframe? I don't want to display the whole website.
EDIT: Since I am using Django, I was able to crawl the website with Python in a view and then push the crawled and cleaned-up code snippet into the HTML template. Nevertheless, to answer my own question: there is no proper way of doing it from the browser as long as the website you are trying to access is blocking the content.
Work with the owner of the site you want to take content from.
They can set you up with an API. That avoids having to use hacky methods or risking copyright-related legal trouble.
This site: http://www.bienvillecapital.com/themes
Has somehow managed to make it look as if new content (when clicking on a link) is loaded with ajax. They also managed to remove the browser loading indicator.
Any ideas how this is done?
This site is a single-page application. You can add similar functionality with jQuery: attach a click handler so that every time the user clicks on a link, the content is loaded via Ajax.
But single-page applications are more than just loading content via Ajax. If you want to do it right, you should use a modern JS framework like http://chaplinjs.org/ or http://angularjs.org/
There are many reasons to use one of those frameworks.
Take a look at .ajaxComplete(): http://api.jquery.com/ajaxComplete/
Example:
$(document).ready(function(){
// Your code
}).ajaxComplete(function(){
alert("Ajax completed.");
});
The handler runs every time an Ajax request completes.
I am a Javascript, HTML5 and CSS web application developer, but I don't know how to make my web apps embeddable, so people can embed them in their sites too. Please, any method except PHP will be much appreciated.
Provide a JavaScript file that people include in their web pages, along with a marker div such as:
<div id="divtofillwithyourcontent"></div>
The script should then fill this element with your content.
Or use an iframe if you need access to your own cookies (I suggest you avoid it in all other cases).
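A minimal sketch of what such an embed script might look like, assuming the placeholder id shown above. The function name `mountWidget` and the sample markup are made up for illustration, and a real widget would likely build its content dynamically.

```javascript
// Hypothetical embed script: fills the host page's placeholder
// element with the widget's markup.
function mountWidget(doc, containerId, html) {
  var target = doc.getElementById(containerId);
  if (!target) {
    return false; // the host page did not include the placeholder div
  }
  target.innerHTML = html;
  return true;
}

// When served to a browser, the script mounts itself immediately.
// The guard keeps the file loadable outside a browser as well.
if (typeof document !== "undefined") {
  mountWidget(document, "divtofillwithyourcontent", "<p>Hello from the widget</p>");
}
```

The host page would then only need the marker div plus a script tag pointing at this file, placed after the div so the element exists when the script runs.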
They could use an iFrame, or they can make an AJAX GET call, for example with jQuery. The response of the AJAX call will be your HTML5/JS which can then be put into a container(div) for the caller.
$.get('ajax/test.html', function(data) {
$('.result').html(data);
alert('App Loaded');
});
I am trying to integrate with the FireShot API so that, given a URL, I can load the HTML of another web page into a div and then take a screenshot of it.
Some things I will need to do after getting the HTML
grab <link> & <script> from <head>
grab <body> into <div>
But first, it seems that when I try to do a
$.get("http://google.com", function(data) { ... });
I get a 200 in firebug colored red. I think it has to do with sites not allowing you to grab their page with JS? Then is opening a window the best I can do? But how might I control the other page with jQuery or call fsapi on that page?
UPDATE
I tried to do something like below to do something when the new window is ready, but FireBug says "Permission denied to access property 'document'"
w = window.open($url.val());
setTimeout(function() { // if I dont do this, I always get about:blank, is there a better way around this?
$(w.document).ready(function() {
console.log(w.document.body);
});
}, 1000);
I believe the browser's cross-site security restrictions on JavaScript are blocking this. You'd likely have to proxy the content through your own domain.
I think there are a couple of other options for getting around the cross-site security constraints, but I'm not sure I'd recommend them.
If the other page is on the same domain as your hosting page, then yes, you can. Please refer to jQuery's $().load() API.
Otherwise, the browser's same-origin policy prevents you from doing so. In that case, you can use an iframe instead of a div.
Some jQuery plugins, e.g. thickbox, can load pages into an appropriate container automatically.
Unless I am mistaken, I do not believe you can make an Ajax request for a page across domains (e.g. from domain1.com to domain2.com). To get around this, you can have a PHP "proxy" script that fetches the page and then passes it to your JS.
For example, in JS you would get() http://mydomain.com/get/?domain=http://google.com and then do what you need to do!
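On the JavaScript side, the target address should be URL-encoded before it is appended to the proxy URL, otherwise any query string in the target would be mixed up with the proxy's own parameters. A small sketch, assuming the `domain` parameter name from the example above; `buildProxyUrl` is a hypothetical helper, not part of jQuery:

```javascript
// Build the same-origin URL that asks the proxy script to fetch
// `targetUrl` on our behalf. `proxyEndpoint` is assumed to have no
// query string of its own.
function buildProxyUrl(proxyEndpoint, targetUrl) {
  return proxyEndpoint + "?domain=" + encodeURIComponent(targetUrl);
}

var proxied = buildProxyUrl("http://mydomain.com/get/", "http://google.com");
console.log(proxied);
// → http://mydomain.com/get/?domain=http%3A%2F%2Fgoogle.com
```

The resulting string can be passed to `$.get()` like any other same-origin URL. The proxy script itself should probably restrict which hosts it is willing to fetch, so it cannot be abused as an open relay.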
I have a html page on my localhost - get_description.html.
The snippet below is part of the code:
<input type="text" id="url"/>
<button id="get_description_button">Get description</button>
<iframe id="description_container" src="#"/>
When the button is clicked, the src of the iframe is set to the URL entered in the textbox. The pages fetched this way are very big, with lots of linked files. The only part of the page I am interested in is a block of text contained in a <div id="description"> element.
Is there a way to mitigate downloading of resources linked in the page that loads into the iframe?
I don't want to use curl because the data is only available to logged-in users, and the steps needed to get the content with curl are too complicated. The iframe is simple because I use it on a machine that sends the right cookies to identify the request as coming from a logged-in user, but the problem is that it is very wasteful to fetch nearly 1 MB of data just to keep 1 KB of it and throw out the rest.
Edit
If the proposed method just works in Firefox it is fine, so I added Firefox tag. Also, it is possible that the answer actually is from the realm of Firefox add-on techniques, so I added that tag as well.
The problem is not that I cannot get at what I'm looking for, rather, the problem is the easy iframe method is wasteful.
I know that Firefox does allow loading only the text of a page. If you open a page and press Ctrl+U, you are taken to the 'view page source' window. There, links behave as normal and are clickable; if you click on a link in source view, the source of the new page is loaded into the view source window without the linked resources being downloaded, which is exactly what I'm trying to get. But I don't know how to access this behaviour.
Another example is the Adblock add-on. It somehow kills elements before they get loaded. With plain JavaScript this is not possible, because the script is triggered too late to intervene in time.
The Same Origin Policy forbids any web page to access contents of any other web page in a different domain so basically you cannot do that.
However it seems that with some browsers it is allowed to access web pages content if you are trying to access it from a local web page which seems to be your case.
Safari and IE 6/7/8 are browsers that allow a local web page to do so via XMLHttpRequest (source: Google Browser Security Handbook), so you may want to use one of those browsers to do what you need (note that future versions of those browsers may no longer allow it).
Apart from this solution, I see only two possibilities:
If the web pages you need to fetch content from are somehow controlled by you, you can create a simpler interface to let other web pages to get the content you need (for example allowing JSONP requests).
If the web pages you need to fetch content from are not controlled by you the only solution I see is to fetch content server side logging in from the server directly (I know that you don't want to do so, but I don't see any other possibility if the previous I mentioned are not practicable)
Hope it helps.
Actually I've seen Cross Domain jQuery .load request before, here: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The author claims that code like the following, found on that page,
$('#container').load('http://google.com'); // SERIOUSLY!
$.ajax({
url: 'http://news.bbc.co.uk',
type: 'GET',
success: function(res) {
var headline = $(res.responseText).find('a.tsh').text();
alert(headline);
}
});
// Works with $.get too!
would work. (The BBC code might not work because of the recent redesign, but you get the idea)
Apparently it uses YQL wrapped in a jQuery plugin to do the trick. I cannot say I fully understand what he is doing there, but it appears to work and fits the bill. Once you have loaded the data, I suppose it is a simple matter of filtering out the part that you need.
If you prefer something that works at the browser level, may I suggest Mozilla's Jetpack framework for lightweight extensions. I've not yet read the documentation in its entirety, but it should contain the APIs needed for this to work.
There are various ways to go about this in AJAX, I'm going to show the jQuery way for brevity as one option, though you could do this in vanilla JavaScript as well.
Instead of an <iframe> you can just use a container, let's say a <div> like this:
<div id="description_container"></div>
Then to load it:
$(function() {
$("#get_description_button").click(function() {
$("#description_container").load($("input").val() + " #description");
});
});
This uses the .load() method, which takes a string in the format .load("url selector"), finds the element matching that selector in the fetched page and places its content inside the container you're loading into, in this case #description_container.
This is just the jQuery route, mainly to illustrate that yes, you can do what you want. You don't have to do it exactly like this; the concept is to get what you want from an Ajax request rather than from an <iframe>.
Your description sounds like you are fetching pages from the same domain (you said that you need to be logged in and have session credentials), so have you tried an asynchronous request via XMLHttpRequest? It might complain if the HTML on a page is particularly messed up, but you should still be able to get the raw text via .responseText and extract what you need with a regex.
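The regex extraction step might look something like the sketch below. It assumes the block of interest is the <div id="description"> element from the question and that this div contains no nested <div> elements, since the pattern stops at the first closing tag. `extractDescription` is a hypothetical helper name.

```javascript
// Rough sketch: pull the inner HTML of <div id="description"> out of a
// raw HTML string. Regexes are fragile on real-world markup; this one
// stops at the first </div>, so nested divs would be truncated.
function extractDescription(html) {
  var pattern = /<div[^>]*\bid\s*=\s*["']description["'][^>]*>([\s\S]*?)<\/div>/i;
  var match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

var sample = '<html><body><div id="x">no</div>' +
  '<div class="a" id="description"> Some text </div></body></html>';
console.log(extractDescription(sample)); // → Some text
```

With XMLHttpRequest, the string passed in would be `xhr.responseText` once the request has completed. Note that this still downloads the full HTML document, but not the images, stylesheets and scripts it links to.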