Navigating inside website and taking screenshot through code - javascript

I need to take screenshots of multiple separate shopping websites for the final checkout page.
All selections of items in cart and other navigation to pages must be through code.
The output screenshots should be image files (JPG, PNG) or, if possible, inserted into a .docx file.
What tool and technology can I use for this task?
I have a little idea about screen capture through php and phantomjs but for a static webpage only. I am a newbie and would be happy if someone guides me here.
For example:
Open google.com, search for "stackoverflow", then open stackoverflow.com and take a screenshot of the homepage. These steps must be done via code (i.e. automated). Thanks in advance!

The Selenium website has an example of how to do something similar to this from Java (using Firefox as the browser) at http://www.seleniumhq.org/docs/03_webdriver.jsp#introducing-the-selenium-webdriver-api-by-example
Here's a quick TL;DR version. It doesn't click through to Stack Overflow; instead it takes a screenshot of the Google results for that term. Going via Google when you already know the URL of the site may be a redundant step anyway. I am sure you can modify this example to make it do what you need.
WebDriver driver = new FirefoxDriver();
driver.get("http://www.google.com");
// Find the text input element by its name
WebElement element = driver.findElement(By.name("q"));
element.sendKeys("Stack Overflow");
element.submit();
// Google's search is rendered dynamically with JavaScript.
// Wait for the page to load, timeout after 10 seconds
(new WebDriverWait(driver, 10)).until(new ExpectedCondition<Boolean>() {
    public Boolean apply(WebDriver d) {
        // The title is lower-cased first, so the prefix must be lower case too
        return d.getTitle().toLowerCase().startsWith("stack overflow");
    }
});
// Screenshot of search results (screen not whole page)
File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(scrFile, new File("c:\\screenshot.png"));
Screenshot code is from Sergii Pozharov's answer at Take a screenshot with Selenium WebDriver - see that for other considerations such as choice of driver.

Related

How to scrape the javascript portion of a webpage?

I'm trying to scrape a site in Node.js. I followed a great tutorial, but realized it might not be what I am looking for, i.e. I may need to scrape the JavaScript portion of the page instead of the HTML.
Is that possible?
The reason is that I want to load the content of the portion of code below, which I found by inspecting a kayak.com page (see URL below) in Safari (it does not show in Chrome) and which seems to be in a script section.
reducer: {"reducerPath":"flights\/results\/react\/reducers\/
https://www.kayak.com/flights/TYO-PAR/2019-07-05-flexible/2019-07-14-flexible/1adults/children-11?fs=cfc=1;legdur=-960;stops=~0;bfc=1&sort=bestflight_a&attempt=2&lastms=1550392662619
UPDATE: Unfortunately, this site uses bot/scrape protection: tools like curl get a page with bot warning, headless browser tools like puppeteer get a page with captcha.
===============
As this line is present in the HTML source code and is not added dynamically by JavaScript execution, you can use something like this with the appropriate library API:
const extractedString = [...document.querySelectorAll('script')]
.map(({ textContent }) => textContent)
.find(txt => txt.includes('string'))
.match(/regexp/);
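Outside a browser, where `document` is not available, the same idea can be sketched over the raw HTML text. This is only a minimal sketch under assumptions: `extractFromScripts` and the sample markup are hypothetical, and a regular expression is a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: find the first inline <script> whose text contains
// `marker`, then return the first match of `pattern` inside it (or null).
function extractFromScripts(html, marker, pattern) {
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  for (const [, body] of html.matchAll(scriptRe)) {
    if (body.includes(marker)) {
      const match = body.match(pattern);
      return match ? match[0] : null;
    }
  }
  return null;
}

// Made-up page source, used only for illustration.
const sampleHtml = `
<html><body>
<script>var unrelated = 1;</script>
<script>window.state = {"reducerPath":"flights/results"};</script>
</body></html>`;

console.log(extractFromScripts(sampleHtml, 'reducerPath', /\{.*\}/));
```

Unlike the one-liner above, this returns `null` instead of throwing when no script contains the marker.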

Multiple screenshot with Firefox Developer Tools

I am trying to take screenshots of a page that loads a series of content (slideshow) via JavaScript. I can take screenshots of individual items with Firefox DevTools just fine. However, it's tedious to do so by hand.
I can think of a few options-
Run the 'screenshot' command in a loop and call a JS function in each loop to load the next content. However I can't find any documentation to script the developer tools or call JS functions from within it.
Run a JS script on the page to load the contents at an interval and call the devtools to take a screenshot each time. But I can't find any documentation on calling devtools from JS in webpage.
Have Devtools take screenshots in response to a page event. But I can't find any documentation on this either.
How do I do this?
Your first question is how to take screenshots with JavaScript programmatically:
Use Selenium WebDriver to drive the browser instead of trying to script the developer tools of a specific browser.
With WebdriverJS as the framework, you can script anything you need around WebDriver itself.
Your second question is how to script the Firefox dev tools:
- no answer from my side -
I will second Ralf R's recommendation to use webdriver instead of trying to wrangle the firefox devtools.
Here's a webdriverjs script that goes to a webpage with a slow loading carousel, and takes a screenshot as soon as the image I request is fully loaded (with this carousel, I tell it to wait until the css opacity is 1). You can just loop this through however many slide images you have.
var webdriver = require('selenium-webdriver');
var By = webdriver.By;
var until = webdriver.until;
var fs = require("fs");
var driver = new webdriver.Builder().forBrowser("chrome").build();
//Go to website
driver.get("http://output.jsbin.com/cerutusihe");
//Tell webdriver to wait until the opacity is 1
driver.wait(function(){
//first store the element you want to find in a variable.
var theEl = driver.findElement(By.css(".mySlides:nth-child(1)"));
//return the css value (it can be any value you like), then return a boolean (that the 'result' of the getCssValue request equals 1)
return theEl.getCssValue('opacity').then(function(result){
return result == 1;
})
}, 60000) //specify a wait of 60 seconds.
//call webdriver's takeScreenshot method.
driver.takeScreenshot().then(function(data) {
//use the node file system module 'fs' to write the file to your disk. In this case, it writes it to the root directory of my webdriver project.
fs.writeFileSync("pic2.png", data, 'base64');
});
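The loop over several slides mentioned above could look roughly like the sketch below. It rests on assumptions that are not in the original answer: the slides are consecutive children matching `.mySlides:nth-child(n)`, `showSlide` is a hypothetical callback you supply to bring slide n into view, and the file naming scheme is made up. `driver` and `By` are passed in rather than required, so the file loads without selenium-webdriver.

```javascript
const fs = require('fs');

// CSS selector for the n-th slide (1-based); `.mySlides` is the class used above.
function slideSelector(n) {
  return `.mySlides:nth-child(${n})`;
}

// Hypothetical naming scheme for the output files.
function slideFileName(n) {
  return `slide-${String(n).padStart(2, '0')}.png`;
}

// `showSlide(driver, n)` is a hypothetical callback that makes slide n visible,
// for example by clicking the carousel's "next" button.
async function captureSlides(driver, By, slideCount, showSlide) {
  const written = [];
  for (let n = 1; n <= slideCount; n++) {
    await showSlide(driver, n);
    // Wait (up to 60 s) until the slide is fully opaque, as in the script above.
    await driver.wait(async () => {
      const el = await driver.findElement(By.css(slideSelector(n)));
      return Number(await el.getCssValue('opacity')) === 1;
    }, 60000);
    const png = await driver.takeScreenshot(); // base64-encoded PNG
    fs.writeFileSync(slideFileName(n), png, 'base64');
    written.push(slideFileName(n));
  }
  return written;
}
```

With the `driver` and `By` objects from the script above, a call would be along the lines of `captureSlides(driver, By, 3, myShowSlide)`, where `myShowSlide` is whatever navigation your carousel needs.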

How do I view the source of a page generated mostly in javascript?

On this Yelp page:
http://www.yelp.com/search?find_desc=auto+repair&find_loc=70163&ns=1#l=g:-90.1266860962,29.9067341681,-90.0243759155,29.9959757119
The first result is GR Automotive. But when I do View Page Source and Ctrl+F for GR Automotive I get no results.
I believe this is because the text I want is generated by javascript.
How can I view the new page source which is generated by javascript?
I need to be able to manipulate the data on the page, but it's not in the HTML source, and I don't want to use the API since the main portion of my code is in AutoHotkey. The URL version of the Yelp API also doesn't seem to work with the sample code.
Answer based on your question title:
This question does not appear to be about programming, but you need to view the information a different way in order to see the DOM. Instead of "view page source", use "inspect element".
Answer based on your edited question:
In order to manipulate Yelp listings, you will need the Yelp API.
General documentation
Business API
After your post on http://ahkscript.org
Remember that View Page Source does not give you the live Source
I still took a look at it... and it can be done just fine with a normal IE browser COM object from ahk
Example:
url := "http://www.yelp.com/search?find_desc=auto+repair&find_loc=70163&ns=1#l=g:-90.1266860962,29.9067341681,-90.0243759155,29.9959757119"
wb := ComObjCreate("InternetExplorer.Application")
wb.visible := true
wb.Navigate(url)
while wb.readyState!=4 || wb.document.readyState != "complete" || wb.busy
continue
sleep 100
while (wb.document.getElementsByClassName("throbber-overlay")[0].style.display != "none")
continue
msgbox % wb.document.getElementsByClassName("natural-search-result")[0].innertext
return
I don't really know what you tried before, but with an IE COM object you can access the dynamic HTML without much hassle.
But you always need to wait long enough for the elements you need to fully load when trying to access them this way.

Injecting HTML into existing web pages

I'm interested in the concept of injecting a bit of HTML into existing web pages to perform a service. The idea is to create an improved bookmarking system - but I digress, the specific implementation is unimportant. I'm quite new to web development and so I have no definite idea as to how to accomplish this, though I have noticed a couple of possibilities.
I found out I can right click > 'inspect element' and proceed to edit my browser's version of the HTML corresponding with the webpage I'm viewing. I assume that this means I can edit what I see and interact with. Could I possibly create a script that ran from a button on bookmarks bar that injected an Iframe which linked to a web service of my making? (And deleted itself after being used).
Could I possibly use a chrome extension to accomplish this? I have no experience with creating extensions and so I have no clue what they're capable of - though I wouldn't be against learning.
Which of these would be best? If they are even valid ideas. Or is there another way that I've yet to know of?
EDIT: The goal is to have a user click a button in the browser if they would like to save this page. They are then presented an interface visually independent of the rest of the page that allows them to categorize this webpage according to their interests. It would take the current link, add some information such as a comment, rating, etc. and add it to the user's data. This is meant as a sort of side-service to a website whose purpose would be to better organize and display the browsing information of the user.
Yes, you can absolutely do this. You're asking about Bookmarklets.
A bookmarklet is just a bookmark where the URL is a piece of JavaScript instead of a URL. They are very simple, yet can be capable of doing anything to a web page. Full JavaScript access.
A bookmarklet can be engaged on any web page -- the user simply has to click the bookmark(let) to launch it on the current page.
Bookmark = "http://chasemoskal.com/"
Bookmarklet = "javascript:(function(){ alert('I can do anything!') })();"
That's all it is. You can create a bookmarklet link which can be clicked-and-dragged onto a bookmark bar like this:
Bookmarklet
Bookmarklets can be limited in size, however, you can load an entire external script from the bookmarklet.
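As a rough sketch of that last point, the bookmarklet itself can stay tiny and only inject a `<script>` tag that loads the real code. The helper name `makeLoaderBookmarklet` and the script URL are hypothetical; the snippet just builds the `javascript:` string you would put in the link's `href`.

```javascript
// Build a `javascript:` URL that injects an external script into the current page.
function makeLoaderBookmarklet(scriptUrl) {
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.body.appendChild(s);" +
    "})();";
  // Percent-encode the code so it survives being stored as a URL.
  return "javascript:" + encodeURIComponent(body);
}

// Hypothetical script location; replace it with your own service.
const href = makeLoaderBookmarklet("https://example.com/bookmark-tool.js");
console.log(href);
```

Keep in mind that pages with a strict Content Security Policy may refuse to load the external script, so this will not work on every site.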
You can do something like the <iframe> you describe. Simply put, here are some steps that may help:
Create an XMLHttpRequest object and request a page through it.
Set the innerHTML of an element to the response text of that request, i.e. the HTML structure.
Let's assume you have an element with id="Result" in your HTML. The request goes like this:
var req = new XMLHttpRequest();
req.open('GET', 'http://example.com/mydocument.html', true);
req.onreadystatechange = function (aEvt) {
if (req.readyState == 4 && req.status == 200) {
Result.innerHTML = req.responseText;
}
};
req.send(null);
Here's an improved version in the form of a fiddle.
When you're done, you can delete that injected HTML by simply:
Result.innerHTML = '';
And then anything inside it will be gone.
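However, by default you can't make requests to other servers because of the browser's same-origin policy: the target has to be on the same origin (scheme, host and port) unless the other server explicitly allows cross-origin requests via CORS headers. Take a look at Using XMLHttpRequest on the MDN reference pages for more information.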
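To make the restriction concrete: two URLs count as the same origin only when scheme, host and port all match. The small check below is a sketch using the standard `URL` class; the helper name `isSameOrigin` and the example URLs are made up for illustration.

```javascript
// Two URLs share an origin when scheme, host and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Same scheme, host and (default) port, different paths.
console.log(isSameOrigin('http://example.com/a.html', 'http://example.com/b/c.html'));
// Different scheme.
console.log(isSameOrigin('http://example.com/', 'https://example.com/'));
// Different port.
console.log(isSameOrigin('http://example.com/', 'http://example.com:8080/'));
```

Only the first pair is same-origin, so only a request like that one would go through without the server opting in to cross-origin access.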

Take a screenshot of a webpage with JavaScript?

Is it possible to take a screenshot of a webpage with JavaScript and then submit that back to the server?
I'm not so concerned with browser security issues. etc. as the implementation would be for HTA. But is it possible?
Google is doing this in Google+ and a talented developer reverse engineered it and produced http://html2canvas.hertzen.com/ . To work in IE you'll need a canvas support library such as http://excanvas.sourceforge.net/
I have done this for an HTA by using an ActiveX control. It was pretty easy to build the control in VB6 to take the screenshot. I had to use the keybd_event API call because SendKeys can't do PrintScreen. Here's the code for that:
Declare Sub keybd_event Lib "user32" _
(ByVal bVk As Byte, ByVal bScan As Byte, ByVal dwFlags As Long, ByVal dwExtraInfo As Long)
Public Const CaptWindow = 2
Public Sub ScreenGrab()
keybd_event &H12, 0, 0, 0
keybd_event &H2C, CaptWindow, 0, 0
keybd_event &H2C, CaptWindow, &H2, 0
keybd_event &H12, 0, &H2, 0
End Sub
That only gets you as far as getting the window to the clipboard.
Another option, if the window you want a screenshot of is an HTA, would be to just use an XMLHttpRequest to send the DOM nodes to the server, then create the screenshots server-side.
Another possible solution that I've discovered is http://www.phantomjs.org/ which allows one to very easily take screenshots of pages and a whole lot more. Whilst my original requirements for this question aren't valid any more (different job), I will likely integrate PhantomJS into future projects.
I wonder if this is possible by rendering the whole body element into a canvas and then using canvas2image?
http://www.nihilogic.dk/labs/canvas2image/
One possible way, if you are running on Windows and have .NET installed:
public Bitmap GenerateScreenshot(string url)
{
// This method gets a screenshot of the webpage
// rendered at its full size (height and width)
return GenerateScreenshot(url, -1, -1);
}
public Bitmap GenerateScreenshot(string url, int width, int height)
{
// Load the webpage into a WebBrowser control
WebBrowser wb = new WebBrowser();
wb.ScrollBarsEnabled = false;
wb.ScriptErrorsSuppressed = true;
wb.Navigate(url);
while (wb.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }
// Set the size of the WebBrowser control
wb.Width = width;
wb.Height = height;
if (width == -1)
{
// Take Screenshot of the web pages full width
wb.Width = wb.Document.Body.ScrollRectangle.Width;
}
if (height == -1)
{
// Take Screenshot of the web pages full height
wb.Height = wb.Document.Body.ScrollRectangle.Height;
}
// Get a Bitmap representation of the webpage as it's rendered in the WebBrowser control
Bitmap bitmap = new Bitmap(wb.Width, wb.Height);
wb.DrawToBitmap(bitmap, new Rectangle(0, 0, wb.Width, wb.Height));
wb.Dispose();
return bitmap;
}
And then via PHP you can do:
exec("CreateScreenShot.exe -url http://.... -save C:/shots domain_page.png");
Then you have the screenshot in the server side.
This might not be the ideal solution for you, but it might still be worth mentioning.
Snapsie is an open-source ActiveX object that enables Internet Explorer screenshots to be captured and saved. Once the DLL file is registered on the client, you should be able to capture the screenshot and upload the file to the server from within JavaScript. Drawbacks: the DLL file must be registered on the client, and it works only with Internet Explorer.
We had a similar requirement for reporting bugs. Since it was for an intranet scenario, we were able to use browser addons (like Fireshot for Firefox and IE Screenshot for Internet Explorer).
This question is old but maybe there's still someone interested in a state-of-the-art answer:
You can use getDisplayMedia:
https://github.com/ondras/browsershot
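For orientation, a minimal browser-only sketch of the `getDisplayMedia` approach might look like this. It is not taken from the linked project: the function name `captureDisplayFrame` is hypothetical, the code has to run in a page served over HTTPS (or localhost), and the browser will prompt the user to choose what to share.

```javascript
// Browser-only sketch: asks the user to pick a screen, window or tab,
// grabs a single frame, and resolves to a PNG data URL.
async function captureDisplayFrame() {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  try {
    const video = document.createElement('video');
    video.srcObject = stream;
    video.muted = true;
    await video.play(); // resolves once playback has started
    const canvas = document.createElement('canvas');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d').drawImage(video, 0, 0);
    return canvas.toDataURL('image/png');
  } finally {
    // Stop sharing so the browser's capture indicator goes away.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

Because the user must grant permission each time, this suits "report a bug" style features rather than unattended capture.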
The SnapEngage uses a Java applet (1.5+) to make a browser screenshot. AFAIK, java.awt.Robot should do the job - the user has just to permit the applet to do it (once).
And I have just found a post about it:
Stack Overflow question JavaScript code to take a screenshot of a website without using ActiveX
Blog post How SnapABug works – and what they should do
I found that dom-to-image did a good job (much better than html2canvas). See the following question & answer: https://stackoverflow.com/a/32776834/207981
This question asks about submitting this back to the server, which should be possible, but if you're looking to download the image(s) you'll want to combine it with FileSaver.js, and if you want to download a zip with multiple image files all generated client-side take a look at jszip.
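For the "submit back to the server" part, client-side renderers typically hand you the image as a `data:` URL, which the page can post as text. On a Node.js server it then has to be decoded back into bytes. The following is a sketch of that decoding step only; `decodeDataUrl` is a hypothetical helper, and how the string reaches the server (form field, JSON body, etc.) is left to your application.

```javascript
// Decode a `data:image/png;base64,...` string into its MIME type and raw bytes.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  return { mimeType: match[1], bytes: Buffer.from(match[2], 'base64') };
}

// "iVBORw0KGgo=" is the base64 form of the 8-byte PNG file signature.
const { mimeType, bytes } = decodeDataUrl('data:image/png;base64,iVBORw0KGgo=');
console.log(mimeType, bytes.length);
```

The resulting `bytes` buffer can be written to disk with `fs.writeFileSync` or stored wherever the server keeps uploads.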
You can achieve that using HTA and VBScript. Just call an external tool to do the screenshotting. I forgot what the name is, but on Windows Vista there is a tool to do screenshots. You don't even need an extra install for it.
As far as automation goes, it depends entirely on the tool you use. If it has an API, I am sure you can trigger the screenshot and the saving process through a couple of Visual Basic calls without the user noticing.
Since you mentioned HTA, I am assuming you are on Windows and (probably) know your environment (e.g. OS and version) very well.
If you are willing to do it on the server side, there are options like PhantomJS, which is now deprecated. The best way to go would be Headless Chrome with something like Puppeteer on Node.JS. Capturing a web page using Puppeteer would be as simple as follows:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({path: 'example.png'});
await browser.close();
})();
However it requires headless chrome to be able to run on your servers, which has some dependencies and might not be suitable on restricted environments. (Also, if you are not using Node.JS, you might need to handle installation / launching of browsers yourself.)
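If you need screenshots of several pages, the same Puppeteer calls can be wrapped in a loop. The sketch below makes a few assumptions of its own: `fileNameForUrl` and `captureAll` are hypothetical names, the output directory must already exist, and `puppeteer` is passed in as an argument so the file still loads where Puppeteer is not installed.

```javascript
const path = require('path');

// Derive a file name such as "example.com_cart.png" from a URL.
function fileNameForUrl(url) {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname)
    .replace(/[^a-zA-Z0-9.-]+/g, '_')
    .replace(/^_+|_+$/g, '');
  return `${slug}.png`;
}

// `puppeteer` is the module itself, e.g. require('puppeteer').
async function captureAll(puppeteer, urls, outDir) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const saved = [];
    for (const url of urls) {
      // Wait until network activity has mostly settled before capturing.
      await page.goto(url, { waitUntil: 'networkidle2' });
      const file = path.join(outDir, fileNameForUrl(url));
      await page.screenshot({ path: file, fullPage: true });
      saved.push(file);
    }
    return saved;
  } finally {
    await browser.close();
  }
}
```

A call might look like `captureAll(require('puppeteer'), ['https://example.com'], './shots')`. Pages that need clicks or form input before the screenshot would need extra steps between `goto` and `screenshot`.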
If you are willing to use a SaaS service, there are many options such as
Restpack
UrlBox
Screenshot Layer
A great solution for screenshot taking in Javascript is the one by https://grabz.it.
They have a flexible and simple-to-use screenshot API which can be used by any type of JS application.
If you want to try it, at first you should get the authorization app key + secret and the free SDK
Then, in your app, the implementation steps would be:
// include the grabzit.min.js library in the web page you want the capture to appear
<script src="grabzit.min.js"></script>
//use the key and the secret to login, capture the url
<script>
GrabzIt("KEY", "SECRET").ConvertURL("http://www.google.com").Create();
</script>
Screenshot could be customized with different parameters. For example:
<script>
GrabzIt("KEY", "SECRET").ConvertURL("http://www.google.com",
{"width": 400, "height": 400, "format": "png", "delay": 10000}).Create();
</script>
That's all.
Then simply wait a short while and the image will automatically appear at the bottom of the page, without you needing to reload the page.
There are other functionalities to the screenshot mechanism which you can explore here.
It's also possible to save the screenshot locally. For that you will need to utilize GrabzIt server side API. For more info check the detailed guide here.
As of April 2020, the html2canvas library on GitHub is an option:
https://github.com/niklasvh/html2canvas
GitHub: 20K stars | Azure Pipelines: Succeeded | Downloads: 1.3M/mo
Quote: "JavaScript HTML renderer. The script allows you to take "screenshots" of webpages or parts of it, directly on the users browser. The screenshot is based on the DOM and as such may not be 100% accurate to the real representation as it does not make an actual screenshot, but builds the screenshot based on the information available on the page."
I made a simple function that uses rasterizeHTML to build a svg and/or an image with page contents.
Check it out :
https://github.com/orisha/tdg-screen-shooter-pure-js
