How can I inspect a webpage without opening it in a browser? - javascript

On a webpage I was inspecting, I saw some links that didn't redirect directly to their stated destination. For example: a button says "Go to Google" but opens "www.examplesite.com/redirect_google" instead of just linking to Google via its href.
I wasn't sure whether I trusted that link, so a question came up: how can I inspect that page to find out what kind of scripts it runs? But as you've already understood, I can't open it in my browser because I get redirected, so where can I enter the URL to inspect the page without visiting it?

If the redirect is implemented at the network layer, then there's no page to inspect; it's just an HTTP 301 response (or 302, etc.).
If the redirect is via a meta tag or JavaScript, then you can request the page via curl without rendering the HTML or having a browser act upon the meta redirect.
In the case of JavaScript, you could also disable JS in your browser (how to do that varies depending on the browser you're using).
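If you'd rather stay in JavaScript, here is a minimal sketch of the same idea, assuming Node 18+ (which ships a global fetch), run as an ES module for top-level await; the URL is the placeholder from the question:
// Fetch without following redirects, so the raw response stays visible.
const url = 'http://www.examplesite.com/redirect_google'; // placeholder
const res = await fetch(url, { redirect: 'manual' });
console.log(res.status);                   // e.g. 301 or 302
console.log(res.headers.get('location'));  // the redirect target, if any
// A 200 means any redirect lives in the markup instead: dump the body and
// look for a <meta http-equiv="refresh"> tag or a window.location assignment.
if (res.status === 200) {
  console.log(await res.text());
}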

Running curl on the command line against the given URL will get you the source code of the page.
Combined with another programming language, you can then parse the response to check whether it contains a redirect.
I'm also pretty sure a few tools exist on the Internet to check for this kind of behaviour on websites, but I don't know any offhand.

Linux/UNIX command line:
$ curl -i www.example.com/redirect_google
There are many variations of this; curl is a small utility that downloads content from URLs without caring what the content is, and shows you information about the responses (here, -i includes the HTTP response headers in the output).
But if your concern is that the page may not be trustworthy... well, why single out this Google redirect page? Any site could try to attack you with some "bad content"...

You can download the whole HTML file, or whatever is stored there, with tools like WinHTTrack or WSSniffer, for example.

Related

Deeplink to Facebook App (using fb: protocol) not working from Facebook in-app browser

I am writing a mobile web page which has both a redirect and two manual backup links (for when the redirect doesn't work) to a Facebook Page.
The link takes the form:
fb://page/[PAGE ID NUMBER]
The redirect and link work in Chrome Mobile and Firefox Mobile but (surprise) they don't work in Facebook Browser which, instead, gives me the error:
Page can't be loaded.
I am perplexed that a link to the Facebook App doesn't work from within the Facebook Browser.
How can I resolve this? Are there any creative solutions or workarounds... or have I missed something obvious?
Additional Info: It looks like the redirect is working in at least one version of the Facebook Browser on the Facebook iOS App. So the issue may be isolated to the Facebook Android App.
UPDATE 1
I've made some progress. I've discovered that Facebook's in-app browser doesn't always (or doesn't ever?) acknowledge / load / execute external script files.
Added: (To find out why not, see Update 8, below...)
In this case the href attributes in the links were being re-populated with fb:// protocol links by an external script after page load.
I have moved the relevant JavaScript functions from the external script to the bottom of the actual page. I have tested the functions and I can see they are now firing, although the links still don't work.
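In simplified form, the re-population step looked something like this (a sketch; the selector here is illustrative, not the production code, and [PAGE ID NUMBER] stays a placeholder):
// Simplified sketch of the step described above: after page load,
// swap the default http:// fallback hrefs for fb:// deep links.
document.addEventListener('DOMContentLoaded', function () {
  var links = document.querySelectorAll('a.fb-page-link'); // assumed selector
  for (var i = 0; i < links.length; i++) {
    links[i].setAttribute('href', 'fb://page/[PAGE ID NUMBER]');
  }
});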
UPDATE 2
It struck me that there might be some security mechanism going on behind the scenes which doesn't allow any JavaScript-driven re-population of href attributes. In that case, rather than the fb:// protocol links not working, perhaps the initial, default http://www.facebook.com/ links were never even being replaced, and it was those http:// protocol links that weren't working.
So I updated the PHP template so that the initial default links were the fb:// links instead of the http:// ones (so nothing in the page delivered to the Facebook in-app browser would need to be updated by any client-side script at any point).
Nope. Still not working.
UPDATE 3
I added a plain vanilla link to the bottom of the page, linking to the site's homepage. The link functioned entirely normally.
Later, I pointed the original links to an external domain. They didn't work.
So... I concluded that only http:// protocol links to the same domain would work and that's why the links wouldn't work if they pointed to an external domain or to an fb:// protocol address.
Wrong conclusion.
I pointed the original links at the site's homepage and they still didn't work.
UPDATE 4
In a moment of inspiration, I removed the reference to the external script which I'd set up to customise the links to the OS + browser environment (even though this script reference was being entirely ignored by Facebook, according to the FB debugging tool).
The links worked.
So the reason the plain vanilla link I had added earlier worked had nothing to do with where it was pointing and everything to do with the fact that at no point had a script tried to access or update it.
Added: (This isn't the reason. See Update 8, below...)
I pointed the original links at the external domain. They worked.
I pointed the original links at the fb:// protocol. They didn't work.
UPDATE 5
Now that I've got rid of the external script reference, I can point the original links at any http:// protocol address and they work.
Including the http://www.facebook.com web equivalent of the page I am trying to open in the Facebook App.
Let's review that:
The Facebook website is opening in the Facebook in-app browser.
I know, right?
UPDATE 6 [.HTACCESS REDIRECT]
I changed the link destination to /fb-custom-redirect/.
Then I added a line to the mod_rewrite section of my .htaccess file:
RewriteRule ^fb-custom-redirect fb://page/[PAGE ID NUMBER]
Naturally the server didn't understand what I was asking for.
UPDATE 7 [PHP REDIRECT]
I created an index.php for /fb-custom-redirect/ and added the following:
<?php
header('Location: fb://page/[PAGE ID NUMBER]');
?>
Guess what? This works in Firefox Mobile. It also works in Chrome Mobile.
But in the Facebook in-app browser, it returns the same error:
Page can't be loaded.
UPDATE 8
I've only just discovered - and this is not insignificant - that when the Facebook Debugger Tool (https://developers.facebook.com/tools/debug/sharing/) refreshes Facebook's cache of a given page, it only refreshes the .html.
Pressing Scrape Again does not refresh any external resources like .css and .js files.
Instead Facebook continues to refer to its own cached versions of those files, regardless that the .html file cache has just been updated.
The workaround (in PHP, at least) is to append a new, randomly generated query string to the file path every time the page is loaded:
<link rel="stylesheet" href="/styles/mystyles.css?<?php echo uniqid(); ?>" />
Now the Facebook in-app browser is fetching up-to-date versions of my .css and .js files.
This explains my initial observation in Update 1:
I've made some progress. I've discovered that Facebook's in-app browser doesn't always (or doesn't ever?) acknowledge / load / execute external script files.
I'm going to conclude that the Facebook in-app browser was parsing the external .js file reference every time, but it was repeatedly accessing an old, cached version of that file.
Nevertheless, even after all the hypotheses and experimenting above, I'm still no closer to discovering why fb: protocol deeplinks don't work in the Facebook App's in-app browser.
I give up.
Apple apps are sandboxed, which means they cannot access other apps and execute code. Facebook is running a browsing instance, and when you try to call the fb:// protocol, the iPhone blocks the call to prevent an infinite app-loading loop: you open a page in the FB browser, it opens itself in the FB browser, which opens itself in the FB browser...

Can I grab specific page HTML code from another webpage through Javascript?

I've read that there are multiple methods of grabbing source code from another webpage via jQuery or cross-domain requests. What I want to do is grab a div whose content is different each time the page is loaded, rather than the source code as a whole; that is, the greater detail you see when you use 'inspect element' or a tool like Firebug to dive deeper into the page code.
Would I be using one of the same methods?
Yes.
If you control BOTH domains, you can add the CORS Access-Control-Allow-Origin header to allow cross-domain requests, and use a headless browser like PhantomJS to grab a cached version of the rendered HTML page.
If you don't control both domains, you will have to write a server-side proxy to grab the page and all its resources (you will have to parse the page to get or rewrite links to images, scripts, stylesheets etc...), then run it through PhantomJS to create an HTML snapshot.
Sources:
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
http://phantomjs.org/
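As a rough sketch of the proxy approach, assuming Node.js 18+ with Express (the endpoint name and port are made up, and real code would also need to rewrite relative links in the fetched page):
// Minimal server-side proxy sketch: the browser asks our own server for
// the remote page, so the same-origin policy never applies client-side.
const express = require('express');
const app = express();

app.get('/proxy', async (req, res) => {
  const target = req.query.url; // e.g. /proxy?url=https://example.com/page
  try {
    const upstream = await fetch(target); // Node 18+ global fetch
    res.type('html').send(await upstream.text()); // serve from our origin
  } catch (err) {
    res.status(502).send('Upstream fetch failed');
  }
});

app.listen(3000);
Your page can then request /proxy?url=... from its own origin and pick out the div it wants from the returned HTML.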

Mixed Content Warning IE: What matters; css, images, everything?

I have just moved my site from HTTP to HTTPS, and IE 9 started showing a non-secure content warning on the home page. This warning is understandable, because I have one HTTP call to googleapis to fetch the jQuery script. But when I log in and enter the inner pages there is no warning from IE, despite the fact that most of the images are coming from other servers over HTTP.
So the question: is getting images over HTTP fine when accessing a site over HTTPS? Do only CSS and JS matter, or do I have to get all the data through HTTPS? If so, how is my scenario explainable (getting images over HTTP from another server on an HTTPS page, without a warning)?
If you load CSS and JS over HTTP then an attacker can inject executable code. Unfortunately, IE will execute JavaScript within CSS. The problem with loading images over HTTP from the same domain is that the browser will likely spill the session ID in plain text, which is a violation of OWASP A9.
You can use protocol-relative URLs everywhere to avoid this issue in IE.
Basically, instead of linking to a JS/image/CSS file using its full path with the protocol, you link to it by leaving out the protocol part and just using a double slash, //.
This has the effect of all the above links inheriting the protocol from the parent page.
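For example, for the jQuery script mentioned in the question (a sketch; the exact library path is illustrative):
<!-- resolves to https://... on an HTTPS page and http://... on an HTTP page -->
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>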
Of course, this depends on you having valid SSL certs on the domains you're serving the different files from.
One other thing to note is that images in your pages or CSS that use data URIs can also cause mixed content warnings in IE.
To find out which files are causing issues, I recommend using Fiddler.
There is also another tool, written by fellow SO user Eric Law:
Install it from http://www.bayden.com/dl/scriptfreesetup.exe and you will get a different mixed content prompt which shows the exact URL of the first insecure resource on the page. That tool is basically a prototype and you should uninstall it when you're done with it. It works on IE8 and you should install it as admin.

Advert javascript not being served correctly to the browser unless called directly?

I hope this is the right place to ask this question - I did have a look at the rest of the sites in the network but this looked like the most appropriate place.
We are having issues serving third-party adverts on our websites. For various reasons our ad setup is a bit complicated: we serve third-party JavaScript tags (AppNexus) through our own ad server (OpenX), inside iframes. Currently, the third-party JavaScript tags are not showing correctly, although they have worked just fine in the past.
Debugging this in Safari, I have discovered a few things which seem to me to be a bit unusual, and I'm struggling to work out what's going on. Using the Web Inspector to check the third party's JavaScript, the file appears blank. Additionally, if I check the Network tab, the headers are shown and look fine, but there is no Content tab with which to check the returned content. The Network tab shows the request for the file as complete, and with suitable status codes (200/302):
http://cl.ly/401C1D3Y3u2G2k2k3s0x
However, if I load the file directly in the web browser, it loads fine:
http://ib.adnxs.com/ttj?id=694021&cb=[CACHEBUSTER]&pubclick=[INSERT_CLICK_TAG]
FWIW, the JavaScript file uses document.write to spit out either an image or another iframe. It's also worth mentioning that there are no related errors in the console; there is one relating to Google Ads, but the problem persists if I load the ad server's iframe directly, without the rest of the site.
Has anyone seen this behaviour before, where a file loads just fine directly, but is (blank / not retrieved / not parsed / whatever's going on) when called as part of another page? If so, would you be able to help me fix this?
Thanks in advance for any help you can give me - I hope this makes some sense and will be happy to provide any further information that might help me get to the bottom of this!
Ollie
I'm guessing that the third-party site is filtering output from their servers based on the HTTP Referer header sent in the request (a technique employed by many web hosts to thwart hot-linking). Try putting a link to the JavaScript file on a web page on your server, click it, and see whether the file loads or you get a blank page. You could also install a browser extension which lets you forge the HTTP Referer (such as RefControl for Firefox), change the referer to your site instead of the third party's, then paste the URL into the browser and see if it loads.
If that turns out to be the actual problem, it isn't your fault; it's up to the third party to configure their web host to allow for this.
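One quick way to test the hypothesis, sketched with Node 18+ (whose built-in fetch lets you set a Referer header server-side; the referer value is a placeholder for your own site, and the URL is the one from the question with its macros removed):
// Request the ad tag with and without a Referer and compare the bodies.
// Run as an ES module for top-level await.
const url = 'http://ib.adnxs.com/ttj?id=694021';
for (const referer of [null, 'http://www.your-site.example/']) {
  const headers = referer ? { Referer: referer } : {};
  const res = await fetch(url, { headers });
  const body = await res.text();
  console.log(referer || '(no referer)', res.status, body.length + ' bytes');
}
If the body comes back empty only when the referer points at your site, the filtering theory holds.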

JavaScript reads previously opened tab HTML in the same Window

I have a task and I do not know where to start; I hope Stack Overflowers can give me some ideas.
I want to read the HTML source code of a previously opened, and still open, tab from my web page.
My approach was to grab the URL of the targeted page, send that URL to the server, do something with it, then use the result in my web page. But I am running into the same-origin policy on the server side. I know that JSONP can be used, but I must use POST in this case (for other reasons). So I think that if the tab (page) has been opened and is still open, there must be some way I can read its HTML when my web page is opened.
The flow would be: Page1 is open, the user opens mywebpage.html in the same window, mywebpage.html finds that Page1 is open, then grabs its HTML source and uses it.
Thanks!
Edit:
This is the full story.
What I am planning to do is a Firefox plugin, with a button (myPluginButton) on the toolbar.
If the user clicks myPluginButton, the HTML code of the current page is sent to the server, the server parses the HTML and generates a report, and a new tab is then opened to display this report.
My current approach is to read the HTML of the current page using newTabBrowser.contentDocument and send it to the server, then do the parsing on the server side. But this approach creates extra traffic. The efficient way would be to send only the URL of the current page to the server and read and parse the HTML there. However, the same-origin policy does not allow me to do this easily.
So, my question is whether it is possible, when the user clicks myPluginButton to open a new tab, for this new tab to loop over all the open tabs in the browser and read their HTML contents, then generate the report, since these tabs are still open and their HTML contents must be stored somewhere (or am I wrong?).
Thanks.
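For what it's worth, in the legacy XUL overlay model this question dates from, privileged extension code could loop over the open tabs roughly like this (a sketch from memory, not guaranteed against any particular Firefox version):
// Privileged extension code (browser.xul overlay): gBrowser exposes one
// <browser> element per open tab, each with a live contentDocument.
var browsers = gBrowser.browsers;
for (var i = 0; i < browsers.length; i++) {
  var doc = browsers[i].contentDocument;
  var html = doc.documentElement.outerHTML; // serialized page HTML
  // ...send html to the server, or parse it here to build the report...
}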
Browsers have a built-in protection called the same-origin policy that prevents a page from reading the content of another origin (domain, subdomain, port, ...).
If you want to gain access to the current page, you can use a bookmarklet.
You ask your users to add it to their bookmarks bar, and each time they want to use it, they don't open a tab but click on the bookmark.
This will load your script in the page, with full access to read the page content.
And oddly enough, you can POST from that page to your domain, by posting a FORM to an IFRAME hosted on your domain. But you won't be able to read the response of the POST. You can use setInterval with a JSONP call to your domain to find out whether the POST was successful.
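A minimal sketch of that bookmarklet flow, with the endpoint URL as a placeholder (strip the line breaks and comments into a single line before saving it as a bookmark):
// Grab the page's HTML and POST it cross-domain by submitting a hidden
// form into an iframe hosted on your own domain.
javascript:(function () {
  var html = document.documentElement.outerHTML;
  var iframe = document.createElement('iframe');
  iframe.name = 'report-sink';
  iframe.style.display = 'none';
  var form = document.createElement('form');
  form.method = 'POST';
  form.action = 'https://your-domain.example/report'; // placeholder endpoint
  form.target = 'report-sink';
  var field = document.createElement('input');
  field.type = 'hidden';
  field.name = 'html';
  field.value = html;
  form.appendChild(field);
  document.body.appendChild(iframe);
  document.body.appendChild(form);
  form.submit();
})();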
