I'm wondering how I could pentest a website built completely in JavaScript, for example one using the qooxdoo framework.
Such websites don't make any requests to the server that respond with HTML content. Only a single JavaScript file gets transmitted when loading the page (which is an almost empty HTML page containing just the link to that JavaScript file), and the page is then set up entirely by the loaded JS file, without a single line of HTML written by the developer.
Typically, most web app scanners (like Nexpose) do some spidering/crawling: they check a website for links and forms, crawl every link they find that points to the same domain, and test every parameter found in those links. I assume those scanners would have little effect on a pure JS page.
Then there's the other possibility: a proxy server (like Burp Suite) that captures all traffic being sent to a server and can test any parameters found in those requests. This would probably work for testing the API server behind the website (for example, to find SQL injections).
But: Is there any way to test the client, for example for XSS (self or stored)?
Or more generally: what types of attacks would you typically need to check for in such a pure JS web application? What tools could help with that?
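For illustration, this is the kind of client-side issue I mean; a hypothetical snippet, not qooxdoo code:

```js
// Hypothetical DOM-based XSS sink: attacker-controlled input (location.hash)
// flows into an HTML sink without encoding.
const label = decodeURIComponent(location.hash.slice(1)); // attacker-controlled source

// Vulnerable: http://app.example/#<img src=x onerror=alert(1)> would execute script.
document.getElementById('title').innerHTML = label;

// Safer: treat the input as text, not markup.
document.getElementById('title').textContent = label;
```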
I'm a newbie when it comes to JavaScript. I have a local intranet web app at work that I would like to programmatically retrieve a file from, via another website on the internet. Is it at all possible for client-side (browser) JavaScript to request another website (on the local intranet), click a button there, and retrieve a file? (The next step is to parse this Excel file to JSON in the browser, but that's a separate problem.)
I have looked at artoo.js on GitHub, but I'm unsure whether it is up to the task of clicking a jQuery-loaded button on the site in question and retrieving a file.
The local intranet app lacks an API.
You can't. JavaScript cannot access (private) local files without an explicit file upload made by the user. For public files (i.e. files on a web server), make sure to check whether Cross-Origin Resource Sharing (CORS) is set up correctly.
You can't, unless Cross-Origin Resource Sharing is enabled for those files.
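To illustrate with a hedged sketch (the intranet URL is hypothetical, and the exact error text varies by browser):

```js
// Browser JS on https://my-site.example trying to read a response from another
// origin. Unless the intranet server sends Access-Control-Allow-Origin headers,
// the browser blocks the read.
fetch('http://intranet.local/reports/latest.xlsx') // hypothetical intranet URL
  .then((res) => res.arrayBuffer())
  .then((data) => console.log('got', data.byteLength, 'bytes'))
  .catch((err) => {
    // Typically a TypeError ("Failed to fetch") caused by the CORS policy.
    console.error('Blocked by the browser:', err);
  });
```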
Update:
This can be done using Chrome + Puppeteer + Node.js.
This post gives a full tutorial on how to scrape data from a webpage automatically, and more: https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921
make client-side (browser) javascript request another website (on the local intranet) ...
This can be achieved without Cross-Origin Resource Sharing if both websites are served from the same origin (same scheme, host, and port), as in your case with localhost.
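A minimal sketch under that same-origin assumption (the file path is hypothetical):

```js
// Same-origin request: no CORS headers needed, so the response can be read directly.
fetch('/reports/latest.xlsx') // hypothetical path on the same origin
  .then((res) => res.arrayBuffer())
  .then((buffer) => {
    console.log('Fetched', buffer.byteLength, 'bytes from the same origin');
    // Parsing the Excel file to JSON would happen here (separate problem).
  });
```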
click a button and retrieve a file?
For this, you need a JS runtime that executes the other website's client side. Then you can run scripts on the page, manipulate the DOM, or fire an event like a button click.
But this is quite complex to set up (see the sketch below), so alternatively you could access the file directly or build a common API endpoint for both websites.
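Here is a rough sketch of the Puppeteer approach; the intranet URL and button selector are hypothetical, and the download-directory call is a Chrome DevTools Protocol command whose exact form varies between Puppeteer/Chrome versions:

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Tell the browser where to put downloaded files (CDP call, version-dependent).
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: './downloads',
  });

  // Hypothetical intranet page and button selector.
  await page.goto('http://intranet.local/reports', { waitUntil: 'networkidle0' });
  await page.waitForSelector('#export-xlsx');
  await page.click('#export-xlsx');

  // Give the download a moment to finish before closing (simplistic).
  await new Promise((resolve) => setTimeout(resolve, 5000));
  await browser.close();
})();
```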
Obviously Apostrophe CMS code is JavaScript-based, so I'm wondering to what extent Apostrophe pages are 'properly' indexable (i.e., "Ready for JavaScript Crawling & Indexing")?
The reason I ask is this article, in which Moz tested a number of pages with JavaScript-derived content and found them lacking from an SEO perspective.
So can Apostrophe / Node.js pages be properly indexed by search engines?
I'm the lead developer of Apostrophe at P'unk Avenue.
The article you're referring to is talking about crawling URLs that exist only due to JavaScript in the browser, such as React apps with a JavaScript-based router in them.
The issue here is that Google can't naively crawl them just by fetching pages from the server and parsing links in the returned HTML. It has to either run the JavaScript itself like a browser would or use other tricks where the server does extra work just for Google, etc.
Apostrophe is powered by Node.js, which is JavaScript running on the server. The pages it sends out are basically indistinguishable from pages generated by any other CMS or even a static folder full of webpages — the server sends perfectly valid HTML that doesn't require any browser-side JavaScript at all in order to read all of the text and find all of the links. The fact that the server happens to have a JavaScript engine inside it is irrelevant to Google because it is receiving normal HTML from that server. That server could be running PHP or Ruby or Python or CrazyMadeUpLanguage... doesn't matter to Google as long as it's sending HTML to the world.
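To make that concrete, here is a minimal Node.js/Express sketch (not Apostrophe's actual code) of what "JavaScript on the server" means for a crawler:

```js
// The markup is assembled server-side; a crawler fetching /about receives
// complete HTML with no client-side JavaScript required to read it.
const express = require('express');
const app = express();

app.get('/about', (req, res) => {
  res.send(`<!DOCTYPE html>
<html>
  <head><title>About us</title></head>
  <body>
    <h1>About us</h1>
    <p>All text and links are present in the response itself.</p>
    <a href="/contact">Contact</a>
  </body>
</html>`);
});

app.listen(3000);
```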
I want to show a website preview for a link, similar to what Facebook does when a user posts a link. My question has been asked before in the following link, but I'm going to ask for specific information about my solutions. I have two solutions for showing a webpage preview: 1. server-side HTML processing, 2. client-side HTML processing.
1. Server-side HTML processing
I used System.Net.WebClient().DownloadString(url) to retrieve the web page data on the server side, and I tried to extract the most important information on the page. But in most cases the main part of the page is loaded by JavaScript, so I don't have access to that information.
Another server-side option is to work with the WebBrowser and WebDocument objects. Because I haven't worked with these libraries and don't know how much they would affect web server performance, I only present this solution for discussion. So, is there any server-side HTML grabber that fetches the full HTML, including content loaded by JavaScript?
2. Client-side HTML processing
The simplest client-side approach is to use an iframe tag, but it has the following two problems:
a. I cannot access the innerHTML of the frame for links on other domains.
b. I cannot load HTTPS webpages such as Dropbox and Facebook in the iframe because of an "X-Frame-Options" error.
My question is: is there any other client-side solution to retrieve the dynamic HTML source (loaded by JavaScript) of third-party webpages (usually HTTPS)? Or can I solve the above problems with some tricks?
I guess the server-side approach would be the most viable option. On the client side you can use proxy services that work around the cross-domain limitation, for example crossorigin.
To generate a preview similar to the one Facebook provides, you need to get the Open Graph information for the target page. Libraries to process Open Graph data are available for multiple platforms; OpenGraph-Net could be used on the .NET platform.
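As a rough sketch of the idea (shown in Node.js here; OpenGraph-Net does the equivalent on .NET), the hypothetical helper below fetches the page and pulls out the og:* meta tags:

```js
const cheerio = require('cheerio'); // assumes cheerio is installed

async function getOpenGraphData(url) {
  const res = await fetch(url); // Node 18+ global fetch
  const html = await res.text();
  const $ = cheerio.load(html);
  const og = {};
  $('meta[property^="og:"]').each((_, el) => {
    const property = $(el).attr('property'); // e.g. "og:title"
    const content = $(el).attr('content');
    if (property && content) og[property.slice(3)] = content;
  });
  return og; // e.g. { title: "...", image: "...", description: "..." }
}

getOpenGraphData('https://example.com').then(console.log);
```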
I've got this setup:
A single-page app that generates HTML content using JavaScript. There is no visible HTML for non-JS users.
History.js (pushState) for handling URLs without hashbangs. So, the app on "domain.com" loads the dynamic content of "page-id" and updates the URL to "domain.com/page-id". Direct URLs also work nicely via JavaScript this way.
The problem is that Google cannot execute JavaScript this way. So essentially, as far as Google knows, there is no content whatsoever.
I was thinking of serving cached content to search bots only. So, when a search bot hits "domain.com/page-id", it gets the cached content, but when a user loads the same page, they see the normal (JavaScript-injected) content.
A proposed solution for this is using hashbangs, so Google can automatically convert those URLs to alternative URLs with an "escaped_fragment" string. On the server side, I could then redirect those alternative URLs to cached content. As I won't use hashbangs, this doesn't work.
Theoretically I have everything in place. I can generate a sitemap.xml and I can generate cached HTML content, but one piece of the puzzle is missing.
My question, I guess, is this: how can I filter out search bot access, so I can serve those bots the cached pages, while serving my users the normal JS enabled app?
One idea was parsing the "HTTP_USER_AGENT" string in .htaccess for any bots, but is this even possible and not considered cloaking? Are there other, smarter ways?
updates the URL to "domain.com/page-id". Direct URLs also work nicely via JavaScript this way.
That's your problem. The direct URLs aren't supposed to work via JavaScript. The server is supposed to generate the content.
Once whatever page the client has requested is loaded, JavaScript can take over. If JavaScript isn't available (e.g. because it is a search engine bot) then you should have regular links / forms that will continue to work (if JS is available, then you would bind to click/submit events and override the default behaviour).
A proposed solution for this is using hashbangs
Hashbangs are an awful solution. pushState is the fix for hashbangs, and you are already using it; you just need to use it properly.
how can I filter out search bot access
You don't need to. Use progressive enhancement / unobtrusive JavaScript instead.
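As a minimal, hypothetical sketch of that approach: the link works on its own because the server renders "/page-id" as full HTML, and the script below only enhances it when JavaScript is available (the selectors and attribute names are made up for illustration):

```js
// Progressive enhancement: intercept clicks on enhanced links, fetch the
// content, swap it in, and update the URL with pushState. Without JS, the
// plain <a href="/page-id"> still navigates to a server-rendered page.
document.addEventListener('click', async (event) => {
  const link = event.target.closest('a[data-enhance]'); // hypothetical marker attribute
  if (!link) return;

  event.preventDefault();
  const url = link.getAttribute('href');

  const res = await fetch(url, { headers: { 'X-Requested-With': 'fetch' } });
  document.querySelector('#content').innerHTML = await res.text(); // hypothetical container
  history.pushState({ url }, '', url);
});

// Handle back/forward buttons so pushState URLs keep working.
window.addEventListener('popstate', (event) => {
  if (event.state && event.state.url) {
    location.reload(); // simplest fallback: let the server render it again
  }
});
```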
I want to create a web crawler/spider to iteratively fetch all the links on a webpage, including JavaScript-based (AJAX) links, catalog all of the objects on the page, and build and maintain a site hierarchy. My questions are:
Which language/technology is better suited to fetching JavaScript-based links?
Are there any open-source tools for this?
You can automate the browser. For example, have a look at http://watir.com/
Fetching AJAX links is something that even the search giants haven't fully accomplished yet. That is because AJAX links are dynamic, and both the request and the response vary greatly depending on the user's actions. That's probably why SEF-AJAX (Search Engine Friendly AJAX) is being developed: it is a technique that makes a website completely indexable to search engines while acting as a web application when visited by a web browser. For reference, you may check this link: http://nixova.com
No offence, but I don't see any way of tracking AJAX links. That's where my knowledge ends. :)
You can do it with PHP, simple_html_dom, and Java. Let the PHP crawler copy the pages to your local machine or web server, open them with a Java application (a JPanel or something), select all the text, and grab it. Send it to your database or wherever you want to store it. Track all a tags, or tags with an onclick or mouseover attribute, and check what happens when you call them again: if the returned HTML document differs in size or MD5 hash, you know it is an effective link and you can grab the new content.
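A rough Node.js sketch of that hash-comparison idea (assuming Node 18+ for the global fetch; the URLs are hypothetical):

```js
// Fetch two documents (e.g. before and after triggering a link) and compare
// MD5 hashes to see whether the returned content actually changed.
const crypto = require('crypto');

const md5 = (text) => crypto.createHash('md5').update(text).digest('hex');

async function hasDifferentContent(url1, url2) {
  const [a, b] = await Promise.all(
    [url1, url2].map((u) => fetch(u).then((r) => r.text()))
  );
  return md5(a) !== md5(b); // true -> the link leads to genuinely new content
}

hasDifferentContent('http://example.com/', 'http://example.com/?page=2')
  .then((different) => console.log(different ? 'new content' : 'same content'));
```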