Render JavaScript to HTML in Python? - javascript

What
My web-app is made dynamic through Google's AngularJS.
I want static versions of my pages to be generated.
Why
Web-scrapers like Google's execute and render the JavaScript; but don't treat the content the same way as their static equivalents.
References:
Does heavy JavaScript use adversely impact Googleability? (Programmers StackExchange)
Making AJAX Applications Crawlable (Google Documentation for webmasters)
How
I'm not sure exactly how (which is why I'm asking), but I want to access the same source that the browser's 'inspect element' presents, rather than the source that Ctrl+U (View page source) shows.
Once I have a script that renders the page and 'spits out' the HTML+CSS, I will place those generated files on my web server. A cron job will then be scheduled to regenerate the files at regular intervals.
These static files will subsequently be served instead of the dynamic ones when JavaScript is disabled and/or when a scraper 'visits' the site.
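One way to decide when to serve a pre-rendered file is the `_escaped_fragment_` convention from the "Making AJAX Applications Crawlable" document referenced above: for a `#!/phones` URL the crawler requests `?_escaped_fragment_=/phones` instead. Google has since deprecated that scheme, so treat this only as an illustration of the idea. A minimal sketch, assuming the snapshots live as flat files in a hypothetical `snapshots/` directory named after the fragment (`snapshot_path` is an illustrative name, not part of any library):

```python
import re
from urllib.parse import urlsplit, parse_qs

SNAPSHOT_DIR = "snapshots"  # hypothetical directory filled by the cron job


def snapshot_path(request_url):
    """Return the snapshot file for a crawler request, or None for normal requests."""
    query = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    if "_escaped_fragment_" not in query:
        return None  # ordinary visitor: serve the dynamic AngularJS page
    fragment = query["_escaped_fragment_"][0]
    # Turn "/phones/nexus-s" into a safe flat file name such as "phones_nexus-s".
    name = re.sub(r"[^A-Za-z0-9-]+", "_", fragment).strip("_") or "index"
    return "{}/{}.html".format(SNAPSHOT_DIR, name)


print(snapshot_path("http://example.com/app/?_escaped_fragment_=/phones"))
print(snapshot_path("http://example.com/app/"))
```

The web server (or a small handler in front of it) would call something like this per request and fall back to the dynamic page whenever it returns None.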

Here is one solution, however I very much doubt I'll be able to find a public PaaS cloud which can run it:
import spynner

if __name__ == '__main__':
    url = "http://angular.github.com/angular-phonecat/step-10/app/#/phones"
    browser = spynner.Browser()
    browser.create_webview(True)
    browser.load(url, load_timeout=60)
    print(browser._get_html())
    # ^ Can pipe this to a file, POST it to my server or return it as a string
    browser.close()
Package: Spynner (on Github)

Related

Force browser to reload the Javascript files

I am trying to achieve the below in ASP.NET MVC3 web application which uses razor.
1) In my Index.cshtml file, I have the below reference.
<script src="/MySite/Scripts/Main.js"></script>
2) I load my home page for the first time and an HTTP request is made to fetch this file, which returns 200.
3) Then I make some changes to Main.js and save it.
4) Now I just reload the home page (please note that I am not refreshing the page) by going to the address bar, typing the home page URL and pressing Enter. At this point, I want the browser to fetch the updated Main.js file by making an HTTP request again.
How can I achieve this? I don't want to use System.Web.Optimization bundling. I know that we can achieve this by changing the URL (appending a version or some random number) every time the file changes.
But the challenge here is that the URL is hardcoded in my Index.cshtml file. Every time there is a change in the Main.js file, how can I change that hardcoded URL in the Index.cshtml file?
Thanks,
Sathya.
What I was trying to achieve is to invalidate the browser cache as soon as my application's JavaScript file (which is already cached in the browser) is modified on disk. I understand now that this is simply not achievable, as no browser currently provides that support. To get around this, these are the only two ways:
1) Use MVC bundling.
2) Every time the file is modified, change the URL by appending a version or a random number to it through the query string. This method is explained in the following URL - force browsers to get latest js and css files in asp.net application
The disadvantage of the 2nd method is that if any external application refers to your application's JavaScript file, the browser cache will still not be invalidated until the external application is refreshed in the browser.
Just add a timestamp as a query string parameter:
@{
    var timestamp = System.DateTime.Now.ToString("yyyyMMddHHmmssfff");
}
<script src="/MySite/Scripts/Main.js?TimeStamp=@timestamp"></script>
Note: DateTime.Now produces a new value on every request, which defeats caching altogether. To keep the file cacheable, update the TimeStamp parameter value only when the file is updated/modified.
It's not possible without either using bundling (which internally handles version) or manually appending version. You can create a single file bundle as well if you want.
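The "only change the value when the file changes" rule can be automated by deriving the query string value from the file itself instead of from the clock. The sketch below shows the idea in Python rather than Razor, using a hash of the file's contents; `versioned_url` and the parameter name `v` are made-up names for illustration and not part of ASP.NET MVC.

```python
import hashlib
import os
import tempfile


def versioned_url(url, file_path):
    """Append a short hash of the file's contents as a query-string version."""
    with open(file_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:8]
    return "{}?v={}".format(url, digest)


# Demonstration with a temporary file standing in for Main.js.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Main.js")
    with open(path, "w") as f:
        f.write("console.log('v1');")
    before = versioned_url("/MySite/Scripts/Main.js", path)
    with open(path, "w") as f:
        f.write("console.log('v2');")
    after = versioned_url("/MySite/Scripts/Main.js", path)
    print(before)
    print(after)
    print(before != after)  # the URL changes only because the content changed
```

As long as the file is untouched the URL stays identical, so the browser keeps using its cached copy; any edit produces a new URL and therefore a fresh request.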

Crawl some of Javascript codes in a web-page

The page I am trying to crawl includes JavaScript code (possibly using AJAX?). When I crawl the page based on its HTML, I can't get the part that is generated by JavaScript. How can I do that?
I think I need a Python library that can crawl the JavaScript-generated content as well as the HTML.
Please give me some advice.
Below is the page link:
view-source:http://www.bobaedream.co.kr/mycar/popup/mycarChart_4.php?zone=C&cno=652691&tbl=cyber
I recommend two approaches.
First, request the AJAX URL directly and parse the HTML it returns.
import requests
url = "http://www.bobaedream.co.kr/mycar/proc/mycar_regist_option.php"
data = {'param': 'ALL'}
response = requests.post(url, data=data)
# parse
...
Second, use a web driver such as geckodriver or PhantomJS through the selenium library.
That library drives a browser, runs the JavaScript and then gives you the DOM that the JavaScript produced.
See the public Selenium documentation.
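For the first approach, the `# parse` step could look roughly like the sketch below. It uses only the standard library's `html.parser`; the HTML fragment is made up, since the real response markup of the endpoint is not shown here, and `CellCollector` is just an illustrative helper name.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


# Made-up fragment standing in for response.text from the AJAX endpoint.
sample = "<table><tr><td>Sunroof</td><td> Navigation </td></tr></table>"
collector = CellCollector()
collector.feed(sample)
cells = [c.strip() for c in collector.cells]
print(cells)
```

In the real script you would feed `response.text` to the parser instead of `sample`, and adjust the tag names to whatever the endpoint actually returns.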

Python: Issue getting updating html created via JavaScript calls in browser

I am using Python to pull the HTML of a website to get satellite locations. Of course since I am not actually accessing the site via a browser I am not retrieving any html that would be populated by javascript calls.
import urllib.request
page = urllib.request.urlopen('http://n2yo.com/?s=20217')
file = open("textFile", "wb")
satelliteText = page.read()
file.write(satelliteText)
file.close()
I've explored libraries like Windmill that literally run a browser so that you can get the HTML created by JavaScript, but I am using a Raspberry Pi and I'd rather not install an additional browser.
Is there any way that I can make the AJAX GET calls that the website is making myself and retrieve just the data I need?
Looking at this source here: http://www.n2yo.com/js/passes.js it appears that it is calling http://www.n2yo.com/inc/all.php to get the data. By reading through passes.js carefully you should be able to figure out how to parse it.
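A rough sketch of calling that endpoint yourself with the same urllib module as in the question. The query parameter is an assumption: `s` simply mirrors the page URL (`?s=20217`), and whether all.php accepts it, or needs other parameters, has to be checked against passes.js. `build_ajax_request` is an illustrative helper name.

```python
import urllib.parse
import urllib.request

AJAX_URL = "http://www.n2yo.com/inc/all.php"


def build_ajax_request(params):
    """Build a GET request for the endpoint; no network traffic happens here."""
    query = urllib.parse.urlencode(params)
    request = urllib.request.Request(AJAX_URL + "?" + query)
    # Some endpoints check for the header that browser AJAX libraries send.
    request.add_header("X-Requested-With", "XMLHttpRequest")
    return request


# "s" mirrors the page URL (?s=20217); that all.php accepts it is an assumption.
request = build_ajax_request({"s": "20217"})
print(request.full_url)
# To actually fetch the data: data = urllib.request.urlopen(request).read()
```

This needs no browser at all, which should suit the Raspberry Pi.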

How to avoid downloading the entire PDF to display

On my web page you can read books in PDF format. The problem is that some books have around 1000 pages and the PDF is really big, so even if the user reads just 10 pages the full PDF is downloaded from the server. This is awful for my hosting account because I have a transfer limit.
What could I do to display the PDF without loading the full file?
I use pdf.js.
Greetings.
ORIGINAL POST:
PDF files are designed in a way that forces the client side to download the whole file just to get the first page.
The end of the PDF file tells the PDF reader where the cross-reference table and the root dictionary for the PDF file are located (the root dictionary tells the reader about the page catalog - order of pages - and other data used by the reader).
So, as you can see, the limitations of the PDF design require that you use a server side solution that will create a new PDF with only the page(s) you want to display.
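To make the point about the end of the file concrete: a PDF closes with the keyword `startxref`, the byte offset of the cross-reference table, and `%%EOF`, and the trailer just before it names the root dictionary. A reader therefore starts from the tail of the file. A minimal sketch in Python on made-up bytes (`find_startxref` is an illustrative name, and the fake file is not a valid PDF):

```python
import re


def find_startxref(pdf_bytes, tail_size=1024):
    """Return the byte offset stored after the final 'startxref' keyword, or None."""
    tail = pdf_bytes[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    return int(matches[-1]) if matches else None


# Tiny fake file: only the layout of the ending matters for this illustration.
fake_pdf = (
    b"%PDF-1.4\n"
    b"...objects and cross-reference table omitted...\n"
    b"trailer\n<< /Root 1 0 R /Size 4 >>\n"
    b"startxref\n1234\n%%EOF\n"
)
print(find_startxref(fake_pdf))
```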
The best solution (in my opinion) is to create a "reader" page (as opposed to a download page) that requests a specific page from the server and allows the user to advance page by page (using AJAX).
The server will need to create a new PDF (file or stream) that contains only the requested page and return it to the reader.
If you are running your server with Ruby (Ruby on Rails), you can use the combine_pdf gem to load the PDF and send just one page...
You can define a controller method that will look something like this:
def get_page
  # read the book
  book = CombinePDF.parse IO.read("book.pdf")
  # create empty PDF
  pdf_with_one_page = CombinePDF.new
  # add the page you want
  # notice that the pages array is indexed from 0,
  # so an adjustment to user input is needed...
  # (request parameters arrive as strings, hence the to_i)
  pdf_with_one_page << book.pages[params[:page_number].to_i - 1]
  # no need to create a file, just stream the data to the client.
  send_data pdf_with_one_page.to_pdf, type: 'application/pdf', disposition: 'inline'
end
If you are running PHP or node.js, you will need to find a different server-side solution.
Good luck!
EDIT:
I was looking over the PDF.js project (which looks very nice) and notice the limited support statement for Safari:
"Safari (desktop and mobile) lacks a number of features or has defects, e.g. in typed arrays or HTTP range requests"...
I understand from this statement that on some browsers you can manage a client-side solution based on the HTTP Byte Serving protocol.
This will NOT work with all browsers, but it will keep you from having to use a server-side solution.
I couldn't find the documentation for the PDF.js feature (maybe it defaults to ranges and you just need to set the range...?), but I would go with a server-side solution that I know to work on all browsers.
EDIT 2:
Ignore Edit 1. As iPDFdev pointed out (thank you iPDFdev), this requires a special layout of the PDF file and will not resolve the issue of the browser downloading the whole file.
You can take the following approach, depending on the functionality you need:
Add a configuration setting (i.e. a kind of flag) that says whether the entire PDF should be made available or not.
While rendering your response, read that setting: if the flag is set, generate a minimal PDF of 20 pages with a hyperlink to download the entire PDF; otherwise generate the minimal 20-page PDF only.
When you prepare the initial response for your web page, add only the PDF that contains, say, 20 pages (the minimal PDF) and process the response.

Generate some xml in javascript, prompt user to save it

I'd like to make an XML document in JavaScript then have a save dialog appear.
1. It's OK if they have to click before the save can occur.
2. It's *not* OK if I *have* to use IE to achieve this (I don't even need to support it at all). However, Windows is a required platform (so Firefox or Chrome are the preferred browsers if I can only do this in one browser).
3. It's *not* OK if I need a web server. But conversely, I don't want to require the JavaScript to be run on a local file only, i.e. with elevated privileges -- if possible. That is, I'd like it to run locally or on a *static* host. But just locally is OK.
4. It's OK to have to bend over backwards to do this. The file won't be very big, but internet access might either be there, be spotty or just not be a possibility at all -- see (3).
So far the only ideas I have seen are to save the XML to an iframe and save that document -- but it seems that you can only do this in IE? Also, that I could construct a data URI and place that in a link. My fear here is that it will just open the XML file in the window, rather than prompt the user to save it.
I know that if I require the JavaScript to be local, I can raise privileges and just directly save the file (or hopefully cause a save dialog box to appear). However, I'd much prefer a solution where I do not require raised privileges (even a Firefox 3.6 only solution).
I apologize if this offends anyone's sensibilities (for example, not supporting every browser). I basically want to write an offline application and Javascript/HTML/CSS seem to be the best candidate considering the complexity of the requirements and the time available. However, I have this single requirement of being able to save data that must be overcome before I can choose this line of development.
How about this Downloadify script?
It is based on Flash and jQuery, and it can show a dialog box to save a file on your computer.
Downloadify.create('downloadify', {
  filename: function () {
    return document.getElementById('filename').value;
  },
  data: function () {
    return document.getElementById('data').value;
  },
  onComplete: function () {
    alert('Your File Has Been Saved!');
  },
  onCancel: function () {
    alert('You have cancelled the saving of this file.');
  },
  onError: function () {
    alert('You must put something in the File Contents or there will be nothing to save!');
  },
  swf: 'media/downloadify.swf',
  downloadImage: 'images/download.png',
  width: 100,
  height: 30,
  transparent: true,
  append: false
});
Using a base64 encoded data URI, this is possible with only html & js. What you can do is encode the data that you want to save (in your case, a string of XML data) into base64, using a js library like jquery-base64 by carlo. Then put the encoded string into a link, and add your link to the DOM.
Example using the library I mentioned (as well as jquery):
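<html>
<head>
  <title>Example</title>
</head>
<body>
  <script>
    // include jquery and jquery-base64 here (or whatever library you want to use)
    // encodedXml is assumed to hold the base64-encoded XML string produced with that library
    document.write('<a href="data:application/octet-stream;base64,' + encodedXml + '">click to make save dialog</a>');
  </script>
</body>
</html>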
...and remember to make the content-type something like application/octet-stream so the browser doesn't try to open it.
Warning: some older IE versions don't support base64, but you said that didn't matter, so this should work fine for you.
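To show what such a link ends up containing, here is a small sketch that builds the same kind of data URI outside the browser, in Python with the standard `base64` module. In the page itself the encoding would be done in JavaScript; `xml_data_uri` and the sample XML are made up for illustration.

```python
import base64


def xml_data_uri(xml_text):
    """Encode XML text as a base64 data URI with a generic binary content type."""
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return "data:application/octet-stream;base64," + encoded


uri = xml_data_uri("<root><item>1</item></root>")
link = '<a href="{}">click to make save dialog</a>'.format(uri)
print(link)
```

The `application/octet-stream` type is what nudges the browser towards saving instead of displaying the XML.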
Without any more insight into your specific requirements, I would not recommend a pure Javascript/HTML solution. From a user perspective you would probably get the best results writing a native application. However if it will be faster to use Javascript/HTML, I recommend using a local application hosting a lightweight web server to serve up your content. That way you can cleanly handle the file saving server-side while focusing the bulk of your effort on the front-end application.
You can code up a web server in - for example - Python or Ruby using very few lines of code and without 3rd party libraries. For example, see:
Making a simple web server in python
WEBrick - Writing a custom servlet
python-trick-really-little-http-server - This one is really simple, and will easily let you serve up all of your HTML/CSS/JS files:
"""
Serves files out of its current directory.
Doesn't handle POST requests.
"""
# Python 3 module names; in Python 2 these were SimpleHTTPServer and SocketServer.
import http.server
import socketserver

PORT = 8080


def move():
    """Sample function to be called via a URL."""
    return 'hi'


class CustomHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        # Sample values in self for URL: http://localhost:8080/jsxmlrpc-0.3/
        # self.path             '/jsxmlrpc-0.3/'
        # self.raw_requestline  b'GET /jsxmlrpc-0.3/ HTTP/1.1\r\n'
        # self.client_address   ('127.0.0.1', 3727)
        if self.path == '/move':
            # This URL will trigger our sample function and send what it
            # returns back to the browser
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            self.wfile.write(move().encode('utf-8'))  # call sample function here
            return
        else:
            # serve files, and directory listings by following self.path from
            # current working directory
            http.server.SimpleHTTPRequestHandler.do_GET(self)


httpd = socketserver.ThreadingTCPServer(('localhost', PORT), CustomHandler)
print("serving at port", PORT)
httpd.serve_forever()
Finally - Depending on who will be using your application, you also have the option of compiling a Python program into a Frozen Binary so the end user does not have to have Python installed on their machine.
Javascript is not allowed to write to a local machine. Your question is similar to this one.
I suggest creating a simple desktop app.
Is a localhost PHP server OK? The web traditionally can't save to the hard drive because of security concerns. PHP can push files, though it requires a server.
Print-to-PDF plugins are available for all browsers. Install once, print to PDF forever. Then you can use JavaScript or Flash to call a Print function.
Also, if you are developing for an environment where internet access is spotty, consider using VB.NET or some other desktop language.
EDIT:
You can use the browser's Print function.
Are you looking for something like this?
If PHP is OK, it would be much easier.
With IE you could use document.execCommand, but I note that IE is not an option.
Here's something that looks like it might help, although it will not prompt with SaveAs dialog, https://developer.mozilla.org/en/Code_snippets/File_I%2F%2FOL.
One simple but odd way to do this that doesn't require any Flash is to create an <a/> with a data URI for its href. This even has pretty good cross-browser support, although for IE it must be at least version 8 and the URI must be < 32k. It looks like someone else on SO has more to say on the topic.
Why not use a hybrid: Flash on the client and some server solution server-side? Most people have Flash, so you can default to the client side to conserve resources on the server.
