Scrape entire webpage + css + javascript [closed] - javascript

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I'm trying to create a version-control backup/log for a webpage: if the webpage (including its JS and CSS) is altered, the system saves a static copy to disk.
How do I get the CSS and JavaScript of a webpage? Getting the HTML is easy: I simply connect to the webpage, read the contents and return them. But how do I get the CSS and JavaScript of the page too?
The system doesn't have direct access to the web server(s), so I have to do everything remotely over the network.
My idea is to search the scraped HTML for '.css' and '.js', take everything up to the first quote (") and request each CSS/JavaScript file directly as if it were a webpage. But I suspect this might not be very reliable.
Not sure why this is marked as too broad. I'm asking how to get the CSS and JavaScript of a webpage. I've reworded my question; hopefully it's better now.

Instead of searching for .js and .css, I'd look for <script> and <link> tags and use their src and href attributes respectively to make another network request and retrieve those files for comparison.
This will be more reliable because you won't have to worry about the page's text content happening to contain ".js" or ".css", and you could also use a proper HTML parser (or an XML parser, if the markup is well-formed) so that details such as single vs. double quotes aren't an issue.
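A minimal Node.js sketch of the tag-based approach plus the "has it changed?" check, under stated assumptions: a small tag-level regex stands in for the real HTML parser the answer recommends (it handles single, double and unquoted attribute values, but not a ">" inside a value, commented-out tags, etc.), and the names findAssetUrls, fingerprint and changedUrls are hypothetical. Relative src/href values are resolved against the page URL with the standard URL class, and each file is reduced to a SHA-256 digest so only digests need to be stored between runs.

```javascript
const crypto = require('crypto');

// Return every opening tag with the given name, case-insensitively.
// Limitation (regex stand-in for a parser): a ">" inside an attribute value ends the match early.
function findTags(html, tagName) {
  return html.match(new RegExp(`<${tagName}\\b[^>]*>`, 'gi')) || [];
}

// Read one attribute from an opening tag; the value may be "double-quoted",
// 'single-quoted' or unquoted. Returns null when the attribute is absent.
function getAttr(tag, name) {
  const m = tag.match(new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`, 'i'));
  return m ? (m[1] ?? m[2] ?? m[3]) : null;
}

// Absolute URLs of external scripts (<script src>) and stylesheets (<link rel="stylesheet" href>).
function findAssetUrls(html, pageUrl) {
  const urls = [];
  for (const tag of findTags(html, 'script')) {
    const src = getAttr(tag, 'src');
    if (src) urls.push(new URL(src, pageUrl).href);
  }
  for (const tag of findTags(html, 'link')) {
    const rels = (getAttr(tag, 'rel') || '').toLowerCase().split(/\s+/);
    const href = getAttr(tag, 'href');
    if (href && rels.includes('stylesheet')) urls.push(new URL(href, pageUrl).href);
  }
  return urls;
}

// Map each fetched file (URL -> body text) to a SHA-256 hex digest.
function fingerprint(files) {
  const prints = {};
  for (const [url, body] of Object.entries(files)) {
    prints[url] = crypto.createHash('sha256').update(body, 'utf8').digest('hex');
  }
  return prints;
}

// URLs whose content changed, appeared or disappeared since the previous fingerprint.
function changedUrls(previous, current) {
  const all = new Set([...Object.keys(previous), ...Object.keys(current)]);
  return [...all].filter((url) => previous[url] !== current[url]);
}
```

The bodies themselves would come from a second round of requests (for example with fetch, which is built into Node 18 and later). Inline <script> and <style> content is already part of the HTML, so hashing the HTML page itself covers it; assets pulled in by CSS @import or injected at runtime by scripts are not found by this sketch.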


How to detect presence of Javascript in plain HTML using PHP? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have plain HTML code that should contain no JavaScript.
How would you detect whether any form of JavaScript has been injected into the HTML?
The application generates the HTML client side and needs to validate it once it arrives on the server.
The goal is NOT to remove Javascript, but simply detect the presence of it.
This is what tools like HTML Purifier are for. They break the input into tokens and run them against a whitelist.
This is safer than trying to find specific ways of inserting scripts into HTML, because there are tricks involving malformed tags or non-obvious attributes. See the XSS Evasion Cheat Sheet for examples.
Removing can be easier than detecting: just escape all the HTML you receive with htmlspecialchars($string).
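For illustration, a rough JavaScript counterpart of htmlspecialchars($string) with ENT_QUOTES (the default flags since PHP 8.1). This is a sketch, not PHP's implementation: the hypothetical escapeHtml replaces the five characters that can open or close markup with entities, so any <script> tag or on* attribute is displayed as text instead of being executed.

```javascript
// Replace &, <, >, " and ' with HTML entities, as htmlspecialchars does with ENT_QUOTES.
function escapeHtml(input) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#039;' };
  return String(input).replace(/[&<>"']/g, (ch) => entities[ch]);
}
```

Note that escaping neutralises scripts but does not tell you whether any were present, which is what the question asks for.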
Alright, so this is a very interesting challenge:
First, check for script tags in any combination of upper and lower case:
<SCRIPT> <script> <sCrIPt>
Then, check for event handlers (onclick, etc.).
For this, use the DOM:
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings for malformed HTML
$dom = new DOMDocument();
$dom->loadHTML($string);
You can do a lot with the DOM; I recommend reading its documentation. Then check for any attribute whose name starts with "on".
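The checks above can be sketched as a regex heuristic. This is only an illustration under loose assumptions: the hypothetical looksLikeScript flags script tags in any case, on* event-handler attributes inside a tag, and javascript: URLs in a few URL-bearing attributes. As the previous answer warns, regexes can be evaded by obfuscated markup, so a whitelist sanitizer or the DOM-based check remains the safer option.

```javascript
// Return true if the HTML matches any of a few common script-injection patterns.
function looksLikeScript(html) {
  const checks = [
    /<script\b/i,                          // <script>, <SCRIPT>, <sCrIPt>, ...
    /<[^>]*[\s"'\/]on[a-z]+\s*=/i,         // onclick=, onerror=, <svg/onload=...> inside a tag
    /\b(?:href|src|action|formaction)\s*=\s*["']?\s*javascript:/i, // javascript: URLs
  ];
  return checks.some((re) => re.test(html));
}
```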

Regarding how browser interprets HTML/CSS/Javascript [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am pretty green to web development.
In one of my courses I was told that the following things happen when a browser reads a web page.
At a very high level, I assume this is the basic flow:
1. The browser fetches the HTML page.
2. The browser works out the document structure from the HTML tags.
3. After step 2, the browser interprets the CSS selectors and properties.
4. The browser then builds the DOM.
5. After this, the JavaScript interpreter in the browser runs the .js scripts.
Questions:
1. Is the above flow correct?
2. I am aware that HTML elements can be manipulated by JavaScript. Are CSS selectors also part of the DOM, and can they be manipulated by JavaScript?
Not exactly correct. It's a complicated process.
JavaScript doesn't only run after the entire page has loaded, which is why many junior programmers make the mistake of trying to manipulate HTML without first checking whether the page has loaded.
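A minimal sketch of that check, assuming a browser `document` is passed in (the document is a parameter only so the logic can be exercised without a browser; onDomReady is a hypothetical name). If the script runs while the document is still loading, it waits for the DOMContentLoaded event; otherwise that event has already fired, so the callback runs immediately.

```javascript
// Run `callback` once the HTML has been parsed into the DOM.
function onDomReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for the DOM to be complete (fires only once).
    doc.addEventListener('DOMContentLoaded', callback, { once: true });
  } else {
    // 'interactive' or 'complete': DOMContentLoaded has already fired.
    callback();
  }
}

// In a browser: onDomReady(document, () => { /* safe to query elements here */ });
```

Adding the defer attribute to the <script> tag achieves a similar effect declaratively.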
When the browser reaches an element such as <script> or <link>, it attempts to fetch the resource and, if successful, executes the script or applies the stylesheet. This means that JavaScript in a <script> tag in the head (where it usually is) will, for instance, run before the rest of the DOM has been built, unless the tag has the defer attribute. CSS is handled in a similar way, but in most cases it doesn't matter much when CSS is applied, since a stylesheet can't crash the way a script can. You can create styles and change elements' inline styles with JavaScript, but a good rule of thumb is to keep any styles that can live in .css files there.
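For the second question (whether CSS can be manipulated from JavaScript), a minimal sketch of both approaches mentioned above. Style rules are not DOM elements, but browsers expose stylesheets through the CSSOM (document.styleSheets), where CSSStyleSheet.insertRule adds a rule; inline styles are changed through element.style. The helper names are hypothetical, and stylesheets loaded from another origin generally cannot be read or modified this way.

```javascript
// Append a rule such as ".highlight { color: red; }" to the end of a stylesheet
// (in a browser, e.g. document.styleSheets[0]). Appending last means it wins over
// earlier rules of equal specificity in the same sheet. Returns the new rule's index.
function addRule(sheet, ruleText) {
  return sheet.insertRule(ruleText, sheet.cssRules.length);
}

// Set an inline style property on each element (e.g. from document.querySelectorAll).
function setInlineStyle(elements, property, value) {
  for (const el of elements) el.style.setProperty(property, value);
}
```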

Use javascript to target element in php include [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
What's the best way to use JavaScript to target HTML elements included from an external PHP file?
I am adding/removing a class on my header whenever the user scrolls more than 20px from the top. The header is stored in header.php, which is included in every page. This only works on HTML elements in the main pages. How do I target elements in the included files when detecting scroll changes from the main file?
JavaScript runs on the client. It sees only the final output from PHP and knows nothing about what mechanism PHP used to create it. Treat your header exactly as if it were part of the main document.
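A minimal sketch of the scroll behaviour described in the question, assuming the included header renders as the page's first <header> element and the class is called "scrolled" (both hypothetical). Because the PHP include happens on the server, the script queries the header like any other element of the document.

```javascript
// Add the class when scrolled more than `threshold` pixels, remove it otherwise.
function updateHeaderClass(header, scrollY, threshold = 20) {
  header.classList.toggle('scrolled', scrollY > threshold);
}

// Browser wiring; skipped when there is no window (e.g. under Node).
if (typeof window !== 'undefined') {
  window.addEventListener('scroll', () => {
    // Look the header up on each event, so this works even if the script runs before the header is parsed.
    const header = document.querySelector('header');
    if (header) updateHeaderClass(header, window.scrollY);
  }, { passive: true });
}
```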

creating Page turner effect for PDF files with angular JS [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Is there a way to create a page-turner effect for PDF files with Angular? jQuery solutions are also fine. I have seen turn.js, which uses HTML. Can anyone help me find a way to do this for PDF files?
If you are talking about the pages within a PDF having a page-curl effect, then it is not something you can do with JS, HTML or anything else outside the PDF itself without converting the PDF to something else (e.g. Flash, JPG images, etc.).
Last time I checked, the only way to achieve this within a PDF was to use Acrobat Pro or InDesign and their 'Page Transitions' feature.
Please note that of the available page transitions, 'Page Turn' (the curl effect you want) will cause the document to be converted to a Flash file and then embedded in the PDF.
I'm sorry if this is not what you want to hear. Rather than creating fancy page curl/turn effects it is probably better to concentrate on producing a well designed, easy to navigate document with great content. This will provide much better value.

Can javascript read webpage? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Can my page's JavaScript read the same page it is loaded on? Other parts of the page are loaded dynamically by another provider. I have tried many things and searched Google as well, but now I doubt whether it is possible.
Thank You!
If the page has loaded and the JavaScript you are running is client-side (which it should be), you should be able to access everything on the page via the document object. I would advise reading about the DOM to familiarise yourself with it.
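A minimal sketch of reading page content through the document object, assuming `doc` is the browser's `document` and the selector is a hypothetical one for the dynamically loaded section. Content that the other provider inserts later only exists once it has been added, and content inside an iframe from another origin is not accessible because of the same-origin policy.

```javascript
// Return the trimmed text of every element matching a CSS selector.
function readTexts(doc, selector) {
  return Array.from(doc.querySelectorAll(selector), (el) => el.textContent.trim());
}

// In a browser, e.g.: readTexts(document, '#third-party-widget p')
```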
Server side code (whether written in JavaScript or otherwise) is not capable of determining the final rendering of the page in the user's browser.
You could build the entire page yourself (and you could use a headless browser, like PhantomJS, to do it), but that could give different results from what a visitor sees, since you would have a different set of cookies, a different source IP address, and so on.
