I'm writing a web crawler.
Getting a link from HTML is the easy part, but getting a link that is produced by JavaScript is not easy for me.
Can I get the result of the JavaScript so that I know where a link points?
For example:
How can I retrieve the link to google.com from the JavaScript code below in Python?
<!DOCTYPE html>
<html lang="en">
<head></head>
<body>
<a href="#" id="goog">to google</a>
</body>
<script>
document.getElementById('goog').onclick = function() {
window.location = "http://google.com";
};
</script>
</html>
You would need to install node.js and run a separate piece of code that executes the JavaScript in context to emit the HTML. This is possible using jsdom, but the key to it is extracting the JavaScript code from the HTML page and setting up the context correctly.
Python doesn't offer a built-in way to execute JavaScript. Adding one would be a large task, and it may not even be what you want, because you won't know how to trigger all of the appropriate JavaScript (here, the link only appears when the click handler runs).
For the code you showed, you could simply run a regex over the entire thing to pull URL-like strings out of it, but that is ad hoc and error-prone.
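The regex approach could look roughly like the sketch below. It is written in JavaScript for Node to match the rest of the examples here; the same two patterns should carry over to Python's re module with little change. The function name extractScriptUrls is made up for illustration, and the anchor element with id "goog" in the sample page is an assumption about what the original markup contained. It only finds URLs that appear as literal strings inside script blocks, so anything built at runtime will be missed.

```javascript
// Sample page, inlined so the sketch is self-contained.
// The <a id="goog"> element is an assumption about the original markup.
const html = `<!DOCTYPE html>
<html lang="en">
<head></head>
<body>
<a href="#" id="goog">to google</a>
</body>
<script>
document.getElementById('goog').onclick = function() {
  window.location = "http://google.com";
};
</script>
</html>`;

// Hypothetical helper: collect URL-like literals from inline <script> blocks.
function extractScriptUrls(page) {
  // Capture the body of every <script>...</script> element.
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  // Match http(s) URLs up to the first quote, whitespace, or bracket.
  const urlPattern = /https?:\/\/[^\s"'<>)]+/g;
  const found = [];
  for (const match of page.matchAll(scriptPattern)) {
    found.push(...(match[1].match(urlPattern) || []));
  }
  return found;
}

console.log(extractScriptUrls(html)); // [ 'http://google.com' ]
```

This never executes the script, so it is cheap, but it also cannot tell which element a URL belongs to or whether the code path is ever reached.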
I need to embed JavaScript directly into an HTML page generated by Thymeleaf.
Something like:
<script th:include="../static/assets/generated/scripts.js"></script>
But this simple usage leads to a SAXParseException.
Is there an easy way to switch off parsing of the th:include'd content? Or is there another way to embed the content of a resource in the resulting page?
I don't think that is possible out of the box. You could probably write an extension that can do it. Or maybe there is an existing one, but I couldn't find one right now.
Does it have to be a separate JavaScript file? Can't you put your JavaScript code into a fragment and include it like any other fragment?
NB: Inlining JavaScript into your HTML file like that is usually bad web design and may be a sign that you have bigger problems and haven't structured your code well. Why do you think you need to do that? Why can't you refer to an external script file?
That's not a Thymeleaf thing. It's classic HTML:
<script src="/assets/generated/scripts.js"></script>
In version 3.0, you can do it this way (link URLs use the @{...} syntax; #{...} is for message expressions):
<script th:src="@{/webjars/jquery/dist/jquery.min.js}"></script>
I've been searching and I can't find an answer to this.
I'm using a servlet, and after the servlet loads I print to the page using response.getWriter().print(String);. Once all the content has loaded in the web browser I want to execute a JavaScript script, but I can't make it run.
Any idea how I can make it run?
Using response.getWriter().print(String) is a very primitive way of generating HTML from a servlet. Your best bet is to use JSP or JSF for that purpose. But to answer your original question: you need to write either the raw JavaScript code or a script import, along with an onload attribute, right into the string that you pass to the print method. This HTML snippet shows what that string should look like:
<body onload="doSomething()">
<script>
function doSomething() {
}
</script>
I am working on an AngularJS module in which we are trying to keep a particular piece of logic out of the browser's view source. I have given just the skeleton code below; the logic written inside the script tags should not be shown in the page source in the browser.
<html ng-app="myApp">
<head>
<script>
if(something){
do something...
}
</script>
</head>
<body></body>
</html>
Is there any way in AngularJS or JavaScript to keep the logic written inside the script from being visible in the page source in the browser?
This is not possible as the code needs to be executed by the browser, thus transferred to the client. Anyone can read/copy your code.
The most you can do is use uglify or a similar tool to minify your code. This has two advantages for you. First, your code will be hard to read for others who might want to exploit your application. But keep in mind that it is still not impossible to understand minified code; it just takes more time. Second, your scripts will become smaller, making your page load slightly faster.
One more option is to make the code look a bit complex so that a user trying to read it does not understand it (in case you are not interested in producing a minified version).
Store all the required variables in a separate file and access them from that file.
Even your base URL should be stored as a global variable.
These are some options you can use, but minifying the code is the best practice.
You can't hide code from your client. It is executed in the client's browser.
The best thing you can do here is to minify your code. That makes it hard to read without reformatting it first. The code will also be smaller, so the browser will load it faster.
As a side note, it is a good thing that all the code is visible: imagine if malicious code could be executed without your knowing it.
I'm still new to JavaScript and node.js and was trying to accomplish something, but I am encountering the error ReferenceError: document is not defined
Given the following code in separate files:
Index.html:
<!DOCTYPE html>
<html>
<head>
<title>My Test Website</title>
<script type="text/javascript" src="script.js"></script>
<link rel="stylesheet" href="style.css">
</head>
<body>
<p id="para1"> Some text here</p>
</body>
</html>
And script.js:
function getParagraphContents() {
var paragraph = document.getElementById("para1");
var temp = paragraph.textContent;
console.log(temp);
}
My problem is that I want to be able to run the command node script.js from the command line, but I keep getting an error message. I keep reading other answers and documentation, but nothing seems to be the right answer. Any suggestions? P.S. I am using Terminal on Mac OS X and have node and npm installed correctly.
Thank you.
script.js is a script designed to run in a browser, not in node.js. It is designed to operate on the current web page and uses the document object to get access to that page.
A node.js execution environment is not the same as a browser. For example, there is no notion of the "current page" like there is in a browser and thus there is no global document or window object either like there is in a browser.
So, this all explains why trying to run node script.js doesn't work. There's no global document object so your script quickly generates an error and exits.
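To see the difference for yourself, here is a minimal sketch of the function from script.js with a guard added. The guard and the fallback message are additions for illustration and are not part of the original script. typeof is used because referring to an undeclared global any other way throws the ReferenceError from the question.

```javascript
// script.js, reworked so it can be loaded both in a browser and with `node script.js`.
function getParagraphContents() {
  // Node has no global `document`; typeof checks for it without throwing.
  if (typeof document === 'undefined') {
    return null;
  }
  // In a browser this reads the text of <p id="para1">.
  return document.getElementById('para1').textContent;
}

const contents = getParagraphContents();
console.log(contents === null ? 'No DOM available (running under node?)' : contents);
```

Under node this prints the fallback message instead of crashing. In a browser the script still has to run after the paragraph exists in the page, otherwise getElementById returns null.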
We can't really tell from your question if you just don't really understand the difference between running scripts in a browser vs. running scripts in a node.js environment and that's all you need to have explained or if you actually want to operate on an HTML document from within node.js?
It is possible to operate on HTML pages from within node.js. The usual way to do that is to get an HTML parsing library like jsdom. You can then load some HTML page using that library and it will create a simulated browser DOM and will make things like the document object available to you so you could execute a browser-like script on an HTML document.
I downloaded a JavaScript file from the following link
I stored this code into 'goldprice.js'
Then I somehow minimized the whole HTML code from here
to the following simplified code.
<html>
<head></head>
<body>
<h1>
Gold Price:
<xyz id="gpotickerLeft_price"></xyz>
</h1>
<script src="goldprice.js"></script>
<script type="text/javascript">
var leftticker
var ticker = new GPOTicker();
ticker.addTicker(leftticker, 'gpotickerLeft');
ticker.start();
</script>
</body>
</html>
Now my question is how to get this whole thing working in a Java program. I want the output of the above HTML code in my Java program so that I can show that automatically updated value in a JLabel.
I have seen some examples on
javax.script.ScriptEngine and javax.script.ScriptEngineManager but I don't really think my problem is that simple.
If you take a closer look at the JavaScript code, you'll find that it fetches the actual data from a web service. Look for the code that starts with
{api2:"http://api2.goldprice.org/Service.svc/GetRaw/",resourcesRootPath:"/goldprice/img/"
That seems to be their web service, which you could try to access directly from Java. Note, however, that you ought to ask for permission to use that service. If it is a service that is officially meant to be used by others, there should be documentation explaining how to use it. That would be the best way to go.