How to get Dynamic HTML code by PHP or JS - javascript

I want to get the contents of a website, but when I use file_get_contents() I only get part of the HTML. Looking at the site's code, I can see that some parts are generated by Ajax, and I don't know how to get them. Does anyone have any suggestions?
Here is an example:
Site: http://www.drbattery.com/category/notebook+battery/acer/aspire+series.aspx?p=3
Request: I want to get the laptop models listed on this page, such as "Aspire 1690". I need all of those models.

Mhm.
In JS you can access the HTML content in a browser by
document.getElementsByTagName('body')[0].innerHTML
Doing this server-side, you would probably need a headless browser.
The tricky part is detecting when the content has finished loading and everything is in place. (You won't be able to track AJAX requests with "window.onload".)
Doing it manually, you could add a bookmarklet to your browser, like
javascript:alert(document.getElementsByTagName('body')[0].innerHTML)
You could then select the alert's content by keyboard shortcut (CTRL + A or Command + A), copy it, and hit return (as the dialog's close-button will probably be out of sight).
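One rough way to approach the "has it finished loading" problem is to sample the page's HTML at intervals and treat it as settled once it stops changing. The sketch below shows only that decision logic. `isSettled` and the threshold of three identical samples are illustrative assumptions, not part of any browser API, and a page that keeps updating itself (clocks, ads) would never settle under this rule.

```javascript
// Hypothetical helper: decide whether a page has "settled" by comparing
// the most recent HTML snapshots. The name and threshold are illustrative.
function isSettled(snapshots, required = 3) {
  if (snapshots.length < required) return false;
  const recent = snapshots.slice(-required);
  // Settled when the last `required` snapshots are all identical.
  return recent.every((snapshot) => snapshot === recent[0]);
}

// In a browser (or inside a headless browser) this could be driven by a timer:
//   const seen = [];
//   const timer = setInterval(() => {
//     seen.push(document.body.innerHTML);
//     if (isSettled(seen)) {
//       clearInterval(timer);
//       console.log(seen[seen.length - 1]);
//     }
//   }, 500);
```

Waiting for one specific element to appear is usually more reliable than this heuristic, if you know which element the Ajax call fills in.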

Related

Is it possible to run a PHP file with HTML on the server

Normally, when you have a .php file and the client requests it, the PHP code is run on the server and the HTML and JavaScript are sent to the client.
Question
Is it possible to have the server request a webpage (local) and run both the PHP code and the HTML with JavaScript on the server? I have created a single .html file that after 3 seconds of processing locally creates the image data for a thumbnail of the given video.
Why
I need to generate a thumbnail for a video. I use shared hosting and my hosting provider doesn't support ffmpeg. You can, however, generate thumbnails using a canvas and JavaScript. I have already put a lot of pressure on the client. If this is possible, upload and download times would be significantly shorter than using the client.
Attempts
I've tried using file_get_contents(), but it doesn't run the code (Makes sense). Is there a way I could have it open and run for x seconds and then grab the contents?
I've tried using curl to get the file using this function here. I believe it is similar to my previous attempt in that it gets the file contents, but never executes them.
My final attempt was to use new DOMDocument(). I couldn't even get to loading the page though. First, I can't parse it with a video tag. It gives this error:
Warning: DOMDocument::load(): Specification mandates value for attribute controls in
file:\path\to\html\document.html, line: 53 in C:\path\to\php\document.php on line 50
If I were to remove the video tag (which is required), I get errors while parsing my JavaScript. So that attempt also did not work.
Is there a way that I could have PHP process the code (for something on the server) for x seconds before getting the contents? It would allow for time to generate the thumbnail data. If there is another way to do this without using ffmpeg on the server, that would be great.
So, as I mentioned in the comments, what I'm going to explain is just one option (not the best one; it only answers your need to run HTML code).
Where to do this?
Personally, I would rather do this in the admin's browser while the video is being uploaded; the best part is that you can make it part of the posting procedure.
So in the page that you want this process to be done, put an invisible iframe like this.
<iframe id="myIframe" style="display: none;"></iframe>
How to begin the process?
I don't know how you upload the videos (and it really is not that important), but let's assume you want to use FormData. After the video is uploaded, you need something unique to address the video (let's say an id). So after the upload we can receive something like id:20, initiateThumbnail:true as the resulting JSON data. Then we can simply use that hidden iframe as the browser you've been asking for, like this:
$("#myIframe").attr("src","dothething.php?video=20");
Now do whatever you wanted to do in it, and change its content once it's done. Then you need to wait for the result:
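// .on("load", ...) replaces the .load(handler) shorthand, which was removed in jQuery 3
$("#myIframe").on("load", () => {
  let result = $("#myIframe").contents();
  // check the result here
});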
As you have already thought about, you can handle any errors by processing the result.
Notes
The event listener we attached to the iframe (load) also fires when you kick off the thumbnail generation, so be careful when checking the result (the content of that iframe).
If you don't use Ajax or FormData, then the action of your form is simply what I used as the iframe's src.
One question: what happens if the network connection goes down during this process? Simple answer: there are many ways to check whether the thumbnail exists. If it doesn't, you can create it the first time a user requests it in their browser, upload it to the server, and keep it from then on (just as you did in the admin panel).
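Because the load event can fire more than once, the check inside the handler has to tell a finished result apart from an intermediate page. Here is a minimal sketch, assuming dothething.php ends by writing a small JSON marker such as {"done":true,"thumbnail":"..."} into the page body. That response format and the `parseResult` helper are hypothetical, not something the answer specifies.

```javascript
// Hypothetical: interpret the text found in the hidden iframe's body.
// Returns the parsed result once the page reports completion, otherwise null.
function parseResult(bodyText) {
  const text = (bodyText || "").trim();
  if (text === "") return null; // iframe is still blank or loading
  try {
    const data = JSON.parse(text);
    // Only accept an object that explicitly says it is done.
    return data && data.done === true ? data : null;
  } catch (e) {
    return null; // not the final JSON page yet
  }
}

// Browser usage with jQuery (assumed markup: <iframe id="myIframe">):
//   $("#myIframe").on("load", function () {
//     const result = parseResult($(this).contents().find("body").text());
//     if (result) { /* the thumbnail is ready */ }
//   });
```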
I think there isn’t another way to generate a thumbnail on a PHP server than with ffmpeg.
The only thing you can do, I suppose, is to force canvas generation on page load if you aren’t already doing it.
Anyway, you are trying to do something wrong. PHP doesn’t evaluate the HTML code; it’s just a preprocessor, not an interpreter like the browser. You can wait all the time in the world, but you’ll never get the image content that only a browser can generate.

Can I create a new web page from the HTML in a div using javascript?

I have a web page that allows a user to choose some options for a widget and then dynamically generates example HTML based on those options. The HTML is put in a div on the page so that the user can see how it looks and copy/paste it to their own site, if they so desire.
I would like to add a "view this example page" link, which opens in a new window and has the example HTML from the div, so that the example can instantly be seen in action.
Is there a way to do this with javascript/jquery?
You can actually use the window.open method, saving a reference to the opened window, and then writing to it.
https://developer.mozilla.org/en-US/docs/Web/API/Window/open
var exampleWin = window.open("", "example");
var docMarkup = "<!doctype html><html><head><title>test</title></head>" +
"<body><p>Hello, world.</p></body></html>";
exampleWin.document.write(docMarkup);
// later you can also do exampleWin.close() if you wish
Try pasting the above code in your browser's developer tools console.
The usual way to accomplish the end goal works a bit differently. You have a web server listening for GET requests at /code (or similar) and it constructs and responds with the appropriate HTML based on the query string. So you can request /code?color=blue, for example.
Constructing documents is what web servers are designed to do. This approach allows you to leverage caching policies, integrate with a wider variety of user authentication and authorization systems, etc.
To display the source code to the user, simply fetch() the appropriate URL and put the contents in a <code> tag. To display the rendered widget, use an <iframe> whose src is the same URL.
If you really want it to be a new window, open() the URL instead of using an iframe. But beware of popup blockers.
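The query-string approach can be sketched as a small helper that turns the user's chosen options into the URL for the iframe or the new window. The /code path comes from the answer above; `buildExampleUrl` and the option names are assumptions for illustration.

```javascript
// Hypothetical helper: build "/code?color=blue" style URLs from widget options.
function buildExampleUrl(basePath, options) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(options)) {
    params.append(key, String(value)); // URLSearchParams handles the encoding
  }
  const query = params.toString();
  return query ? basePath + "?" + query : basePath;
}

// Browser usage (element id is an assumption):
//   const url = buildExampleUrl("/code", { color: "blue" });
//   document.getElementById("preview").src = url; // rendered widget in an <iframe>
//   window.open(url, "example");                  // or a new window instead
```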

Downloading dynamically loaded webpage with python

I have this website and I want to download the content of the page.
I tried selenium, and button clicking with it, but with no success.
#!/usr/bin/env python
from contextlib import closing
from selenium.webdriver import Firefox
import time
# use firefox to get page with javascript generated content
with closing(Firefox()) as browser:
    # setting the url
    browser.get("http://bonusbagging.co.uk/oddsmatching.php#")
    # finding and clicking the button
    button = browser.find_element_by_id('select_button')
    button.click()
    page = browser.page_source
    time.sleep(5)
    print(page.encode("utf8"))
This code only downloads the source code, in which the data is hidden.
Can someone show me the right way to do this, or tell me how the hidden data can be downloaded?
Thanks in advance!
I always try to avoid selenium like the plague when scraping; it's very slow and is almost never the best way to go about things. You should dig into the source more before scraping; it was clear on this page that the HTML was coming in and then a separate call was being made to get the table's data. Why not make the same call as the page? It's lightning fast and requires no HTML parsing; it just returns raw data, which seems to be what you're looking for. The Python requests library is perfect for this. Happy scraping!
import requests
table_data = requests.get('http://bonusbagging.co.uk/odds-server/getdata_slow.php').content
PS: The best way to look for these calls is to open the dev console and check out the Network tab, where you can see what calls are being made. Another way is to go to the Sources tab, look for some JavaScript, and search for Ajax calls (that's where I got the URL I'm calling above; the path was: top/odds-server.com/odds-server/js/table_slow.js). The latter option is sometimes easier and sometimes nearly impossible (if the file is minified/uglified). Do whatever works for you!
Check out the Network tab in Chrome Dev tools. Nab the URL out of there.
What you're looking at is a DataTable. You can use their API to fetch what you need.
Adjust the "start" and/or "length" parameters to fetch the data page-by-page.
It's JSON data, so it'll be super easy to parse.
But be nice and don't hammer this poor guy's server.
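The page-by-page idea can be sketched like this. Only the "start" and "length" parameter names come from the answer above; the total row count, the page size, and the `pageQueries` helper are assumptions, and the real endpoint may expect additional parameters.

```javascript
// Hypothetical helper: produce one query string per page of a DataTables-style
// endpoint that accepts "start" (row offset) and "length" (rows per page).
function pageQueries(totalRows, pageSize) {
  if (!(pageSize > 0)) throw new RangeError("pageSize must be positive");
  const queries = [];
  for (let start = 0; start < totalRows; start += pageSize) {
    // The last page may be shorter than pageSize.
    const length = Math.min(pageSize, totalRows - start);
    queries.push(
      new URLSearchParams({ start: String(start), length: String(length) }).toString()
    );
  }
  return queries;
}
```

Each query string would be appended to the URL found in the Network tab, ideally with a pause between requests so the server isn't hammered.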

How can I *locally* save an .html file generated by javascript (running on a *local* .html page)?

So I've been researching this for a couple days and haven't come up with anything conclusive. I'm trying to create a (very) rudimentary liveblogging setup because I don't want to pay for something like CoverItLive. My process is: Local HTML file > Cloud storage (Dropbox/Drive/etc) > iframe on content page. All that works, and with some CSS even looks pretty nice despite the less-than-awesome approach. But here's the thing: the liveblog itself is made up of an HTML table, and I have to manually copy/paste the code for a new row, fill in the timestamp, write the new message, and save the document (which then syncs with the cloud and shows up in the iframe). To simplify the process I've made another HTML file which I intend to run locally and use to add entries to the table automatically. At the moment it's just a bunch of input boxes and some javascript to automate the timestamp and write the table row from the input data.
Code, as it stands now: http://jsfiddle.net/LukeLC/999bH/
What I'm looking to do from here is find a way to somehow export the generated table data to another .html file on my hard drive. So far I've managed to get this code...
if(document.documentElement && document.documentElement.innerHTML){
  var a=document.getElementById("tblive").innerHTML;
  // escape "<" so the markup is shown as text instead of being rendered
  a=a.replace(/</g,'&lt;');
  var w=window.open();
  w.document.open();
  w.document.write('<pre>&lt;tblive>\n'+a+'\n&lt;/tblive></pre>');
  w.document.close();
}
...to open just the generated table code in a new window, and sure, I can save the source from there, but the whole point is to eliminate steps like that from the process.
How can I tell the page to save the generated code to a separate .html file when I click on the 'submit' button? Again, all of this happens locally, not on a server.
I'm not very good with javascript--and maybe a different language will be necessary--but any help is much appreciated.
I suppose you could do something like this:
var myHTMLDoc = "<html><head><title>mydoc</title></head><body>This is a test page</body></html>";
var uri = "data:application/octet-stream;base64,"+btoa(myHTMLDoc);
document.location = uri;
BTW, btoa might not be cross-browser. I think modern browsers all have it, but older versions of IE don't. AFAIK base64 isn't even needed; you might be able to get away with
var uri = "data:application/octet-stream,"+myHTMLDoc;
The drawback of this is that you can't set the filename when it gets saved.
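If the markup may contain characters such as "#" or "%", or non-Latin text, the unencoded variant can produce a broken URI and btoa can throw. Here is a minimal sketch that percent-encodes the document instead; `toDataUri` is a hypothetical helper name, and the default MIME type simply mirrors the one used above.

```javascript
// Hypothetical helper: build a data: URI from an HTML string without base64.
// encodeURIComponent escapes "#", "%", "," and non-ASCII characters.
function toDataUri(html, mimeType = "application/octet-stream") {
  return "data:" + mimeType + "," + encodeURIComponent(html);
}

// Browser usage (the element id is an assumption):
//   const link = document.getElementById("saveLink"); // an <a download="live.html">
//   link.href = toDataUri(document.getElementById("tblive").outerHTML);
```

Attaching the URI to a link with a download attribute, rather than assigning it to document.location, may also let you suggest a filename; browser support for both behaviours varies, so treat that as something to test.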
You can't do this with JavaScript alone, but you can use an HTML5 link to open the save dialog:
<a href="pageToDownload.html" download>Download</a>
You could add some smarts to automate it on the processed page after the POST.
fiddle : http://jsfiddle.net/ghQ9M/
Simple answer: you can't.
JavaScript is restricted from performing such operations for security reasons.
The best way to accomplish this would be to have a server page that writes the new file on the server, and then make a POST request from JavaScript to that page, passing the data you want to write to the new file.
If you want the user to save the page to their own file system, that is a different problem. The best approach there would be to ask the user to save the page; that page could be the new window you are already opening with window.open().
Let me do some demonstration for you:
// assuming you know jQuery or are willing to use it :)
var html = $("#tblive").html();
// send the markup to the server, which writes it to a new file
$.post('generate_page.php', { content: html })
  .done(function (data) {
    // data is the server's response: the file name of the new page
    var filename = data;
    // inject a link so the user can navigate to the new page (example)
    $('#tblive').parent().append(
      '<a href="' + filename + '">Check your Dynamic Page!</a>');
    // alternatively: window.location = filename;
  });
On the server side, something like this:
<?php
if($_REQUEST["content"]){
$pagename = uniqid("page_", true) . '.html';
file_put_contents($pagename, $_REQUEST["content"]);
echo $pagename;
}
?>
Some notes, I haven't tested the example, but it works in theory.
I assume that with this the effort to implement it should be minimal, assuming this solves your problem.
A server based solution:
You'll need to set up a server (or your PC) to serve your HTML page with headers that tell your browser to download the page instead of processing the HTML markup. If you want to do this on your local machine, you can use software such as WAMP (or MAMP for Mac or LAMP for Linux) that is basically a web server in a .exe. It's a lot of hassle but it'll work.

URL Hash modification after document.write()

I download via jQuery AJAX a whole html webpage. I want to replace the content of the current page with the one downloaded via ajax. I do it with document.write(). It doesn't work correctly because whenever I try to modify the hash, the webpage is reloaded.
I know that IE needs an iframe for this, but that is not the problem, because I use the jQuery History plugin. The problem is caused by the use of document.write(), but I don't know why.
Update:
index.php -> main entry point, which downloads JS code to parse URL after hash and invoke request.php.
request.php -> request entry point. It returns the webpage.
It works OK when I simulate a direct request to request.php and the downloaded webpage updates the hash.
It doesn't work (in Firefox only) when I simulate an original request to index.php, which downloads the webpage via request.php, and the downloaded page modifies the hash.
I use document.write() to write the content of the webpage to the current window. So the problem is about the modification of the hash in a document "being written".
don't use document.write().
instead use $('your selector').html(your_html_fetched_via_ajax);
I think that you can't modify the whole html object because it means erasing the reference to the JavaScript script tag. I would say your best bet is to either just link to the request.php page or just change the body tag:
$('body').html(response_html);
And I agree with harshath.jr, don't use document.write().
The individuals pointing you towards an iframe are correct. Add the iframe, and simply set the src attribute to the page you're fetching...you won't even need request.php.
If you really want to try to load the HTML without an iframe, you'd have to parse out the elements in the <head> and add them to your document's <head>, and also parse the contents of the <body> and add them to the current page's body. It's not guaranteed to display correctly, though. I think an iframe is really what you're looking for.
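A rough sketch of the "parse out the head and body" idea follows. It uses a naive regular expression purely for illustration; `splitDocument` is a hypothetical helper, and in a browser a real parser such as DOMParser would be more robust, since a regex can be fooled by markup inside scripts or comments.

```javascript
// Naive sketch: pull the inner markup of <head> and <body> out of an HTML string.
function splitDocument(html) {
  const inner = (tag) => {
    // Match <tag ...> ... </tag>, case-insensitively, across line breaks.
    const pattern = new RegExp("<" + tag + "\\b[^>]*>([\\s\\S]*?)</" + tag + ">", "i");
    const match = pattern.exec(html);
    return match ? match[1] : "";
  };
  return { head: inner("head"), body: inner("body") };
}

// Browser usage with jQuery (response_html is the page fetched via Ajax):
//   const parts = splitDocument(response_html);
//   $("head").append(parts.head);
//   $("body").html(parts.body);
```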
