Get all URLs from an external URL

I'm trying to get all URLs from a page using jQuery so that I can request them later with $.get(). If the links were on the same page that includes the script, it would be no problem to call something like:
var links = document.getElementsByTagName("a");
for(var i=0; i<links.length; i++) {
alert(links[i].href);
}
In this case I'd just use alert to check that the links were actually parsed.
But how can I do the same thing with a URL that is not the current page?
Any help would be appreciated. Maybe I'm missing something ridiculously simple, but I am really stumped when it comes to anything JavaScript/jQuery related.

Blatantly copying this answer by Nick Craver (go upvote it), but modifying it for your use case:
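$.get("page.html", function(data) {
  // Parse the response into a detached container so that top-level <a> elements are found too
  var page = $("<div>").append($.parseHTML(data));
  var links = page.find('a');
  // do stuff with links
});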
Note that this will only work if the page you're requesting is on the same origin as your script, or is set up to allow cross-origin requests (CORS). If it isn't, you'll need to do the same thing with a DOM parser on a backend server. Node.js has some great options there, including jsdom.
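Whether the cross-origin caveat applies comes down to comparing origins (scheme, host, and port). Here is a minimal sketch of that comparison using the standard URL constructor. The function name isSameOrigin and the example.com addresses are made up for illustration, and a cross-origin request can still succeed when the other server sends the right CORS headers.

```javascript
// Returns true when targetUrl has the same scheme, host, and port as pageUrl.
// Relative targets are resolved against pageUrl first.
// Not meant for opaque origins such as file: pages.
function isSameOrigin(targetUrl, pageUrl) {
  var page = new URL(pageUrl);
  var target = new URL(targetUrl, page);
  return target.origin === page.origin;
}

// Hypothetical addresses, for illustration only.
console.log(isSameOrigin("page.html", "http://example.com/index.html"));
// true
console.log(isSameOrigin("https://example.com/page.html", "http://example.com/index.html"));
// false (the scheme differs)
```

In a browser you would pass window.location.href as the second argument.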

You will have to get the other page via an HTTP request ($.get in jQuery does this), and then either convert that HTML into a DOM that jQuery can traverse to find the <a> tags for you, or use another method, such as a regular expression, to find all the links within the returned markup.
edit: Probably don't actually use a regex unless you have a guaranteed HTML format and can guarantee the format of all <a> tags on the page. By this point, it's probably just easier to parse the HTML for real.
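To show what the regex route looks like and why it is fragile, here is a naive sketch. The function name extractHrefs and the sample markup are invented for illustration, and the pattern assumes every href is quoted and sits inside a simple <a ...> tag.

```javascript
// Naive extraction of href values from <a> tags in an HTML string.
// Known gaps: unquoted attribute values, commented-out markup, and
// look-alike attributes such as data-href are not handled correctly.
function extractHrefs(html) {
  var pattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  var hrefs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    hrefs.push(match[2]); // group 2 is the text between the quotes
  }
  return hrefs;
}

// Made-up sample markup.
var sample = '<p><a href="/one.html">One</a> and <a class="x" href=\'two.html\'>Two</a></p>';
console.log(extractHrefs(sample));
// [ '/one.html', 'two.html' ]
```

The list of gaps in the comment is the reason a real parser is usually the easier choice.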

Collect the current page URL using window.location.href, then compare it with the href of each "a" tag in the loop:
var links = document.getElementsByTagName("a");
var thisHref = window.location.href;
for (var i = 0; i < links.length; i++) {
  var templink = links[i].href;
  if (templink != thisHref) { // the link does not point to the current page URL
    alert(templink);
  }
}

Related

How to not make window.location relative to file

I couldn't find out why this is happening or how to fix it. Here is my code:
window.location.href=("www.google.com");
I want this code to make the page go to google.com, but instead it adds the path of my javascript file to the URL:
file:///home/chronos/u-d39822a3dd3bcc85fb11b442cbd253ea0275a8af/Downloads/www.google.com
How do I make it so that it simply goes to google.com? And is there an entirely different way I should be doing this?

Without a protocol, the value is treated like a relative link and resolved against your current URL.
Very similar to this case:
var anchors = document.getElementsByTagName('a');
for(var i = 0; i < anchors.length; i++) {
document.getElementsByTagName('div')[0].innerHTML += anchors[i].href+'<br>';
}
<p>haha</p>
<p>hoho</p>
<p><div></div></p>

Specify the http:// protocol; otherwise the browser will treat the value as a path relative to your page URL. You can also just assign to window.location:
window.location = 'http://www.google.com';
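The same resolution rule can be reproduced outside of navigation with the standard URL constructor. This is a small sketch: the helper name resolveAgainst and the file:// path are made up, standing in for the page from the question.

```javascript
// Resolve a link target the way a browser resolves a relative reference.
function resolveAgainst(target, currentUrl) {
  return new URL(target, currentUrl).href;
}

// Hypothetical current page, standing in for the local file in the question.
var current = "file:///home/user/Downloads/index.html";

console.log(resolveAgainst("www.google.com", current));
// file:///home/user/Downloads/www.google.com
console.log(resolveAgainst("http://www.google.com", current));
// http://www.google.com/
```

Without a scheme, the string is just a file name next to the current page. With one, the base URL is ignored.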

hash navigation URL construction

Let's say that location.href is http://domain.com/en/ at the moment.
After a click I want it to be http://domain.com/en/#opened-File.html/1
This way I know which URL I need, so if a user copies and shares this URL, I do the following:
$(document).ready(function(){
var info = window.location.hash.match(/^#([^\/]*)\/([^-]*)-(.*)$/),
url="", nivel="", seccion="";
if (info) {
url = info[1];
nivel = info[3];
seccion = info[2];
location.href = url;
}
});
Which works fine, but my questions are:
is this a good approach?
is this SEO-friendly?
would you do it differently?
This works together with:
$('nav a').each(function(){
if(!$(this).hasClass('enlaceAnulado')){
/*Recopilamos*/
var href = $(this).attr('href');
var id = $(this).attr('id');
var parts = id.split("_");
var seccion = parts[0];
var nivel = parseInt(parts[1])+1;
/*Quitamos el enlace*/
$(this).attr('href','javascript:void(0)');
/*Guardamos la información.*/
$(this).data('hrefnot',href);
$(this).data('nivel',nivel);
$(this).data('seccion',seccion);
$(this).addClass('enlaceAnulado');
}
});
So the links were static, but I do this to improve the user experience and load content via Ajax.
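To make the capture groups concrete, here is the pattern from the ready handler above wrapped in a standalone function. The name parseHash and the first sample hash are invented for illustration.

```javascript
// Split a hash of the form "#<url>/<seccion>-<nivel>" into its parts,
// using the same regular expression as the ready handler.
function parseHash(hash) {
  var info = hash.match(/^#([^\/]*)\/([^-]*)-(.*)$/);
  if (!info) {
    return null;
  }
  return { url: info[1], seccion: info[2], nivel: info[3] };
}

// Hypothetical hash in the shape the pattern expects.
console.log(parseHash("#File.html/menu-2"));
// { url: 'File.html', seccion: 'menu', nivel: '2' }

// The hash from the example URL has no "-" after the "/", so it does not match.
console.log(parseHash("#opened-File.html/1"));
// null
```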

Search engines index your page content as if the URL had nothing after the hash. Hash navigation is only intended for the browser to maintain a navigation history. You should always make the content you want indexed static. Consider this an answer to all three of your questions.
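The reason is that the part after the hash (the fragment) is never sent to the server in the HTTP request. A short sketch with the standard URL constructor shows the split; the helper name splitAtHash is made up.

```javascript
// Separate the part of a URL that is requested from the server
// from the fragment that only the browser sees.
function splitAtHash(href) {
  var url = new URL(href);
  return {
    requested: url.origin + url.pathname + url.search,
    fragment: url.hash
  };
}

var parts = splitAtHash("http://domain.com/en/#opened-File.html/1");
console.log(parts.requested); // http://domain.com/en/
console.log(parts.fragment);  // #opened-File.html/1
```

A crawler that does not run JavaScript only ever receives whatever the server returns for the requested part.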

is this a good approach?
My first inclination is to think that this is a good job for the server side (PHP, Python, ASP.NET, Apache rewrite rules, etc.).
is this SEO-friendly?
I would worry about the hash, and use better URL practices instead.
would you do it differently?
I would rather have my server parse the URL (mod_rewrite, etc.) instead of JavaScript.

I'd like to add the following to Nikita Volkov's answer:
Search crawlers generally don't run JavaScript code (although Google is trying to change that). This means that redirecting the user to a static page using JavaScript, like what you're doing with this:
location.href = url;
...is not going to work.
If you want to make URLs with hash fragments more SEO-friendly, you'll have to do it server-side.

Get contents from <link> (not <a>) tag

Hi, I'm trying to get the contents of the file that a link tag points to. So with:
<link rel="stylesheet" href="some.css">
I want the contents of the file some.css in a string.
Tried:
document.getElementsByTagName('link')[0].firstChild.nodeValue; // fails
document.getElementsByTagName('link')[0].hasChildNodes(); // false
Any ideas? I don't want to use the styleSheet method (which only works in FF anyway) because it will strip out stuff like -moz-border-radius and such.
Thanks.

I think Daniel A. White is correct. Your best bet is to get the href of the stylesheet, then load the content via Ajax and parse it.
What are you trying to do exactly?

You can't read the contents of the file from the DOM alone. You'll need an Ajax request to the server, which returns the file's contents.

To do this, you need to access the file via an ajax request.
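So, with jQuery, something like this:
$.ajax({
  url: "some.css",
  dataType: "text", // return the raw file contents as a string
  success: function(css) {
    // css holds the text of some.css; do something with it
  }
});
More details here: http://api.jquery.com/jQuery.ajax/
Note: this only works if the requested file is on the same origin as the page making the request, or if its server allows cross-origin requests.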
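Without jQuery, the same request can be sketched with the standard fetch API. The function name loadText and its optional fetchImpl parameter are assumptions of this sketch (the parameter exists only so the function can be tried without a network), not part of any library.

```javascript
// Load a URL and resolve with its body as a string.
// fetchImpl is optional; when omitted, the global fetch is used.
function loadText(url, fetchImpl) {
  var request = fetchImpl ? fetchImpl(url) : fetch(url);
  return request.then(function (response) {
    if (!response.ok) {
      throw new Error("Request for " + url + " failed with status " + response.status);
    }
    return response.text(); // the raw text, vendor-prefixed properties included
  });
}

// Usage in a page that links some.css:
// var href = document.getElementsByTagName("link")[0].href;
// loadText(href).then(function (css) { console.log(css); });
```

The same-origin caveat from the note above applies here as well.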

CSS rules offer a special API, but nothing like innerHTML.
This is as close as it gets:
var result = '';
var st = document.styleSheets[0].cssRules;
for (var i = 0; i < st.length; i++) {
result += st[i].cssText;
}
console.log(result);
However, this will not respect whitespace, comments, erroneous rules, ...
And as usual, this is subject to Same Origin Policy.

Replace links on page based on location.host and a cookie

I'm using jQuery to rewrite a list of links on the page. If location.host is NOT the vendor host AND the cookie isn't set to a specific value, the script finds the links and rewrites them to the alternate values. The code I'm using works great in FF but not in IE7. Please help!
<script type="text/javascript">
// link hider
var hostadd = location.host;
var vendor = '172.29.132.34';
var localaccess = 'internal.na.internal.com';
var unlock = 'http://internal.na.internal.com/Learning/Customer_Care/navigation/newhire.html';
// link rewriter
$(document).ready (
function style_switcher(){
// if not a vendor and the unlock cookie is not set, rewrite the links
if (hostadd != vendor && $.cookie("unlockCookie") != unlock){
var linkData = {
"https://www.somesite.com": "https://internalsite.com/something",'../Compliance/something/index.html':'../somethingelse.html'
};
$("a").each(function() {
var link = this.getAttribute("href"); // use getAttribute to get what was actually in the page, perhaps not fully qualified
if (linkData[link]) {
this.href = linkData[link];
}
});
}
});
</script>

What you could do, if you insert the links dynamically, is store them in a data attribute such as data-orglink="yourlink", which the browser won't transform. Then check that value and, if it is a key in the object, change the href. Are you able to add the data attribute?
IE7 has problems with relative (internal) links, because it prepends the host info before JS can read the link.
http://jsfiddle.net/Cvj8C/9/
This will work in all browsers but IE7. So you need to use full paths if you want to use JS for this :(
You had some errors in your JS.
But it seems to work fine?
See: http://jsfiddle.net/s4XmP/
Or am I missing something? :)
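One way to read the "use full paths" advice is to turn both the lookup keys and each href into absolute URLs before comparing them, so the comparison no longer depends on how the browser reports the attribute. This is a sketch of that idea only: the function names and the page URL are made up, and the URL constructor it relies on does not exist in IE7, so it is not a drop-in fix for that browser.

```javascript
// Build a lookup whose keys are absolute URLs, resolved against the page URL.
function buildAbsoluteLookup(linkData, pageUrl) {
  var lookup = {};
  Object.keys(linkData).forEach(function (key) {
    lookup[new URL(key, pageUrl).href] = linkData[key];
  });
  return lookup;
}

// Return the replacement for href, or href unchanged when no rule matches.
function rewriteHref(href, lookup, pageUrl) {
  var absolute = new URL(href, pageUrl).href;
  return Object.prototype.hasOwnProperty.call(lookup, absolute) ? lookup[absolute] : href;
}

// Hypothetical page URL; linkData is taken from the question.
var pageUrl = "http://internal.na.internal.com/Learning/Customer_Care/index.html";
var linkData = {
  "https://www.somesite.com": "https://internalsite.com/something",
  "../Compliance/something/index.html": "../somethingelse.html"
};
var lookup = buildAbsoluteLookup(linkData, pageUrl);

// The relative and the fully qualified form of the same link both match.
console.log(rewriteHref("../Compliance/something/index.html", lookup, pageUrl));
console.log(rewriteHref("http://internal.na.internal.com/Learning/Compliance/something/index.html", lookup, pageUrl));
```

Both calls print ../somethingelse.html, whichever form the browser hands back.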

How to use javascript to get information from the content of another page (same domain)?

Let's say I have a web page (/index.html) that contains the following
<li>
<div>item1</div>
<a href="/details/item1.html">details</a>
</li>
and I would like to have some javascript on /index.html to load that
/details/item1.html page and extract some information from that page.
The page /details/item1.html might contain things like
<div id="some_id">
picture
map
</div>
My task is to write a greasemonkey script, so changing anything serverside is not an option.
To summarize, javascript is running on /index.html and I would
like to have the javascript code to add some information on /index.html
extracted from both /index.html and /details/item1.html.
My question is how to fetch information from /details/item1.html.
I currently have written code to extract the link (e.g. /details/item1.html)
and pass this on to a method that should extract the wanted information (at first
just .innerHTML from the some_id div is ok, I can process further later).
The following is my current attempt, but it does not work. Any suggestions?
function get_information(link)
{
var obj = document.createElement('object');
obj.data = link;
document.getElementsByTagName('body')[0].appendChild(obj)
var some_id = document.getElementById('some_id');
if (! some_id) {
alert("some_id == NULL");
return "";
}
return some_id.innerHTML;
}

First:
function get_information(link, callback) {
var xhr = new XMLHttpRequest();
xhr.open("GET", link, true);
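xhr.onreadystatechange = function() {
// readyState 4 means the request has finished; only pass on successful responses
if (xhr.readyState === 4 && xhr.status === 200) {
callback(xhr.responseText);
}
};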
xhr.send(null);
}
then
get_information("/details/item1.html", function(text) {
var div = document.createElement("div");
div.innerHTML = text;
// Do something with the div here, like inserting it into the page
});
I have not tested any of this - off the top of my head. YMMV

As only one page exists in the client (browser) at a time and all other (virtual/possible) pages are on the server, how will you get information from another page using JavaScript as you will have to interact with the server at some point to retrieve the second page?
If you can, integrate some AJAX-request to load the second page (and parse it), but if that's not an option, I'd say you'll have to load all pages that you want to extract information from at the same time, hide the bits you don't want to show (in hidden DIVs?) and then get your index (or whoever controls the view) to retrieve the needed information from there ... even though that sounds pretty creepy ;)

You can load the page in a hidden iframe and use normal DOM manipulation to extract the results, or get the text of the page via AJAX, grab the part between <body...> and </body>, and temporarily inject it into a div. (The second might fail for some exotic elements like ins.) I would expect Greasemonkey to have more powerful functions than normal JavaScript for stuff like that, though; it might be worth thumbing through the documentation.
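The "grab the part between the body tags" step could look roughly like this. The function name extractBodyHtml and the sample page are invented for illustration, and the pattern assumes a single, ordinary body element.

```javascript
// Return the markup between <body ...> and </body>, or null if there is none.
function extractBodyHtml(pageText) {
  var match = /<body\b[^>]*>([\s\S]*?)<\/body>/i.exec(pageText);
  return match ? match[1] : null;
}

// Made-up page text, shaped like the details page in the question.
var pageText = '<html><head><title>t</title></head>' +
  '<body class="x"><div id="some_id">picture</div></body></html>';
console.log(extractBodyHtml(pageText));
// <div id="some_id">picture</div>
```

In the browser, the returned string can be assigned to a detached div's innerHTML and then searched with div.querySelector("#some_id").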
