I am currently load testing my company's new webpage and have been using JMeter for this task. We have an assessment that pulls JavaScript down to the user's machine, which allows them to take a test; once completed, tests are uploaded back to the database.
The issue I'm having is that, as is well documented, JMeter is not a browser and does not execute JavaScript. We need a way to measure how long it takes to browse to the assessment page and how long the JavaScript takes to pull down. We also need to ramp up the number of requests over a specific period of time so we can determine at what point the server falls over.
I have also tried using Gatling, but I am running into the same issue. Has anyone else run into these problems, and how did they get around them?
Thanks in advance!
Very few tools run downloaded JS code in the client/VU threads when executing a load test, mainly for performance reasons. You can try Selenium Grid or some online service based on it, like https://www.loadbooster.com/ or Blazemeter, but if you want to run tests in your own environment, Selenium Grid may be your only choice.
The alternative is to emulate the client-side JS code when you script the load test scenario. Many tools can do that, at some level, but to make translation of existing JS code easier, I would choose a tool that offers a real scripting language, such as Grinder, Locust, Wrk or k6 (or possibly Gatling). k6 may be the simplest, since you script it in JavaScript, so translating the client-side JS code should be somewhat less work there. A rough Locust sketch of the emulation approach follows the links below.
https://k6.io
https://gatling.io/
https://locust.io
https://github.com/wg/wrk
http://grinder.sourceforge.net/
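As a rough illustration of that emulation approach, here is a minimal Locust sketch. The endpoint paths and the payload are assumptions based on your description (an assessment page that pulls down a JS bundle, and an upload of the results); adjust them to your actual application.

from locust import HttpUser, task, between

class AssessmentUser(HttpUser):
    wait_time = between(1, 5)  # think time between a simulated user's actions

    @task
    def take_assessment(self):
        # Browse to the assessment page and pull down the JavaScript bundle
        self.client.get("/assessment")              # placeholder path
        self.client.get("/static/assessment.js")    # placeholder path

        # Emulate what the client-side JS would do once the test is completed
        self.client.post("/api/results",            # placeholder path
                         json={"assessmentId": 1, "answers": [1, 2, 3]})

Start it with locust -f assessment_load.py --host=https://your-server and ramp the number of users from the web UI (or with --headless --users N --spawn-rate R) until you find the point where the server falls over; Locust reports response times for each call.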
You need to split your test into two major areas:
JavaScript processing happens solely on the client side, and you need to test it separately. The majority of modern web browsers come with developer tools that allow you to test the performance of JavaScript execution:
Firefox Developer Tools - Performance
Chrome DevTools - Analyze Runtime Performance
Profiling JavaScript performance
The server-side impact is downloading the JavaScript and uploading the results; you should be able to mimic the corresponding calls using JMeter's HTTP Request sampler. Check out the Performance Testing: Upload and Download Scenarios with Apache JMeter article for more details.
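If you want to prototype the timing outside JMeter first, the same two server-side interactions can be sketched in plain Python with requests. The host, paths and payload below are placeholders, not your real endpoints.

import time
import requests

BASE = "https://your-server.example.com"       # placeholder host

session = requests.Session()

start = time.perf_counter()
session.get(BASE + "/assessment")              # browse to the assessment page
session.get(BASE + "/static/assessment.js")    # pull down the JavaScript
download_time = time.perf_counter() - start

start = time.perf_counter()
session.post(BASE + "/api/results",            # upload the completed test
             json={"assessmentId": 1, "answers": [1, 2, 3]})
upload_time = time.perf_counter() - start

print(f"download: {download_time:.3f}s, upload: {upload_time:.3f}s")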
See the built-in JavaScript profiler in every browser's developer tools. This is a question about a single, defined browser/OS/machine/running apps/browser extensions combination, not one of multi-user performance.
Good afternoon!
We're looking to get a JavaScript variable from a webpage, one that we can usually retrieve by typing app in the Chrome DevTools console.
However, we'd like to do this headlessly, as it has to be performed on numerous apps.
Our ideas:
Using a Puppeteer instance to go to the page, type the command and return the variable, which works, but it's very resource-consuming.
Using a GET/POST request to the page and trying to inject the JS command, but we didn't succeed.
We're wondering whether there is an easier solution, such as a special API that could extract the variable?
The goal would be to automate this process with no human interaction.
Thanks for your help!
Your question is not so much about a JS API (since the webpage is not yours to edit, you can only request it) as it is about webcrawling / browser automation.
You have to add details to get a definitive answer, but I see two scenarios:
the website actively checks for evidence of human browsing (for example, it sits behind Cloudflare and has requested this option), or the scripts depend heavily on there being a browser execution environment available. In this case, the simplest option is to automate a browser, because a headless option has to get many things right to fool the server or the scripts. I would use Karate, which is easier than, say, Selenium, and can execute in-browser scripts. It is written in Java, but you can execute it externally and just read its reports.
the website does not check for such evidence and the scripts do not really require a browser execution environment. Then you can simply download everything required locally and attempt to jury-rig the JS into executing in any JS environment. According to your post, this fails, but it is impossible to help unless you can describe how it fails. This option can be headless.
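To make the second scenario concrete, here is a rough Python sketch of the download-and-run idea. It assumes Node.js is installed and that the downloaded script defines a global variable named app; both are assumptions, and a script that leans on the browser environment will fail here (which may be exactly how your attempt failed).

import subprocess
import requests

js_source = requests.get("https://example.com/static/app.js").text   # placeholder URL

# Append a line that prints the variable we are after, then run it under Node
wrapper = js_source + "\nconsole.log(JSON.stringify(app));\n"         # 'app' is assumed
result = subprocess.run(["node", "-e", wrapper], capture_output=True, text=True)

# If the script needs a browser environment, stderr will show what is missing
print(result.stdout or result.stderr)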
You can embed Chrome into your application and instrument it. It will be headless.
We've used this approach in the past to copy content from PowerPoint Online.
We were using .NET to do this and therefore used CEFSharp.
I'm just getting into Python and have been working mostly with BeautifulSoup to scrape sports data from the web. I have run into an issue with a table on the PGA website that is generated by JavaScript, and I was hoping someone could walk me through the process in the context of the specific website I am working with. Here is a sample link: "http://www.pgatour.com/content/pgatour/players/player.29745.tyler-aldridge.html/statistics"; the tables in question are all of the player statistics tables. Thanks!
When a web page uses JavaScript to build or get its content, you are out of luck with tools that just download HTML from the web. You need something that mimics a web browser more thoroughly and interprets JavaScript; in other words, a so-called headless browser. There are some out there, even some with good Python integration. You may want to start your journey by searching for PhantomJS or Selenium. Once you've chosen the tool of your choice, you can let the browser do its retrieving and rendering work and then browse the DOM in much the same way as you did with BeautifulSoup on static pages.
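As a rough sketch of that approach with Selenium and headless Chrome (the table selector and the wait are guesses; inspect the rendered page to refine them):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("http://www.pgatour.com/content/pgatour/players/player.29745.tyler-aldridge.html/statistics")

# Wait until the JavaScript has populated at least one table, then read the rendered DOM
WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table")))
for table in driver.find_elements(By.CSS_SELECTOR, "table"):
    print(table.text)

driver.quit()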
I would, however, also take a look at the Network tab of your browser's debugger first. Sometimes you can identify the GET request that actually fetches the table data from the server. In this case it might be easier to GET the data yourself (e.g. via requests) than to employ complex technology to do it for you. It is also very likely that you get the information you want as plain JSON, which will make it even simpler to use. The PGA site GETs hundreds of resources to build the page, but it will still be a good trade to browse through them.
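If you do find such a request in the Network tab, fetching it directly takes only a few lines of requests. The URL below is made up; the real one has to be copied from the debugger.

import requests

url = "http://www.pgatour.com/some/stats/endpoint/29745.json"   # placeholder URL
response = requests.get(url)
response.raise_for_status()

data = response.json()   # plain JSON is much easier to work with than scraped HTML
print(data)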
You need a JavaScript engine to parse and run the JavaScript code inside the page. There are a bunch of headless browsers that can help you:
http://code.google.com/p/spynner/
http://phantomjs.org/
http://zombie.labnotes.org/
http://github.com/ryanpetrello/python-zombie
http://jeanphix.me/Ghost.py/
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
Also, consider using this:
http://www.seleniumhq.org/docs/03_webdriver.jsp
Selenium-WebDriver makes direct calls to the browser using each browser’s native support for automation. How these direct calls are made, and the features they support depends on the browser you are using. Information on each ‘browser driver’ is provided later in this chapter.
For those familiar with Selenium-RC, this is quite different from what you are used to. Selenium-RC worked the same way for each supported browser. It ‘injected’ javascript functions into the browser when the browser was loaded and then used its javascript to drive the AUT within the browser. WebDriver does not use this technique. Again, it drives the browser directly using the browser’s built in support for automation.
I'm scraping content from a website using Python. First I used BeautifulSoup and Mechanize in Python, but I saw that the website had a button that created content via JavaScript, so I decided to use Selenium.
Given that I can find elements and get their content using Selenium with methods like driver.find_element_by_xpath, what reason is there to use BeautifulSoup when I could just use Selenium for everything?
And in this particular case, I need to use Selenium to click on the JavaScript button so is it better to use Selenium to parse as well or should I use both Selenium and Beautiful Soup?
Before answering your question directly, it's worth saying as a starting point: if all you need to do is pull content from static HTML pages, you should probably use an HTTP library (like Requests or the built-in urllib.request) with lxml or BeautifulSoup, not Selenium (although Selenium will probably be adequate too). The advantages of not using Selenium needlessly:
Bandwidth, and time to run your script. Using Selenium means fetching all the resources that would normally be fetched when you visit a page in a browser - stylesheets, scripts, images, and so on. This is probably unnecessary.
Stability and ease of error recovery. Selenium can be a little fragile, in my experience - even with PhantomJS - and creating the architecture to kill a hung Selenium instance and create a new one is a little more irritating than setting up simple retry-on-exception logic when using requests.
Potentially, CPU and memory usage - depending upon the site you're crawling, and how many spider threads you're trying to run in parallel, it's conceivable that either DOM layout logic or JavaScript execution could get pretty expensive.
Note that a site requiring cookies to function isn't a reason to break out Selenium - you can easily create a URL-opening function that magically sets and sends cookies with HTTP requests using cookielib/cookiejar.
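For completeness, here is a minimal static-scraping sketch along those lines; requests.Session keeps a cookie jar for you, so cookies set by the server are sent back automatically. The URL and selector are placeholders.

import requests
from bs4 import BeautifulSoup

session = requests.Session()   # carries cookies across requests automatically

response = session.get("https://example.com/some-page")   # placeholder URL
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a"):                              # adjust the selector to what you need
    print(link.get_text(strip=True), link.get("href"))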
Okay, so why might you consider using Selenium? Pretty much entirely to handle the case where the content you want to crawl is being added to the page via JavaScript, rather than baked into the HTML. Even then, you might be able to get the data you want without breaking out the heavy machinery. Usually one of these scenarios applies:
JavaScript served with the page has the content already baked into it. The JavaScript is just there to do the templating or other DOM manipulation that puts the content into the page. In this case, you might want to see if there's an easy way to pull the content you're interested in straight out of the JavaScript using regex; a sketch of that follows the second scenario below.
The JavaScript is hitting a web API to load content. In this case, consider if you can identify the relevant API URLs and just hit them yourself; this may be much simpler and more direct than actually running the JavaScript and scraping content off the web page.
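To illustrate the first scenario: if the data is baked into the page as a JSON literal assigned to a global (the variable name below is hypothetical), something as crude as this may be enough.

import json
import re
import requests

html = requests.get("https://example.com/page-with-embedded-data").text   # placeholder URL

# Assumes the page contains something like: window.__INITIAL_STATE__ = {...};
match = re.search(r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\});", html, re.DOTALL)
if match:
    state = json.loads(match.group(1))   # only works if the literal is valid JSON
    print(list(state.keys()))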
If you do decide your situation merits using Selenium, use it in headless mode, which is supported by (at least) the Firefox and Chrome drivers. Web spidering doesn't ordinarily require actually graphically rendering the page, or using any browser-specific quirks or features, so a headless browser - with its lower CPU and memory cost and fewer moving parts to crash or hang - is ideal.
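For reference, enabling headless mode looks roughly the same for both drivers (the URL is a placeholder):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.firefox.options import Options as FirefoxOptions

chrome_options = ChromeOptions()
chrome_options.add_argument("--headless")
chrome = webdriver.Chrome(options=chrome_options)

firefox_options = FirefoxOptions()
firefox_options.add_argument("--headless")
firefox = webdriver.Firefox(options=firefox_options)

for driver in (chrome, firefox):
    driver.get("https://example.com")   # placeholder URL
    print(driver.title)
    driver.quit()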
I would recommend using Selenium for things such as interacting with web pages, whether in a full-blown browser or a browser in headless mode, such as headless Chrome. I would also say that Beautiful Soup is better for observing and writing statements that depend on whether an element is found, or on what is found, and then using Selenium to execute interactive tasks with the page if the user desires.
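A common way to split the work, with the URL, element id and selector below as placeholders:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/page-with-button")      # placeholder URL

# Use Selenium for the interaction the page requires...
driver.find_element(By.ID, "load-more").click()          # placeholder element id
time.sleep(2)   # crude wait for the new content; prefer WebDriverWait in real code

# ...then hand the rendered HTML to Beautiful Soup for the parsing logic
soup = BeautifulSoup(driver.page_source, "html.parser")
for row in soup.select("table tr"):                       # placeholder selector
    print(row.get_text(strip=True))

driver.quit()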
I used Selenium for web scraping, but it was not a happy solution. In my last project I used https://github.com/chromedp/chromedp . It is a simpler solution than Selenium.
I am trying to figure out how to load test my JavaScript widget. It will be running on around 1,000 websites, each with about 2k visitors a day, so I really need to find a way to test this scenario before letting my users install the widget.
Any ideas what approach I should take?
It sounds like the widget itself (an instance of your JavaScript code running in a browser) will never be used by more than one user (one browser) at a time. Your server-side APIs, on the other hand, will. So you need to test your server by simulating the level of HTTP traffic that you expect the use of your widget to generate. For this, you want web load testing tools.
Depending on how simple the APIs are, you may be fine with something like JMeter. If the API usage is more complex, you might want a more sophisticated tool.
I'm attempting to write a script that logs into a website. This specific website contains a JavaScript form, so I had little to no luck making use of "mechanize".
I'm curious whether there exist other solutions that I may be unaware of that would help me in my situation. If this particular question or some related variant has been asked here before, please excuse me; I would appreciate a link to that question. Otherwise, what are some common techniques/approaches for dealing with this issue?
Thanks.
I've recently been using PhantomJS for this kind of work - it's a command-line tool that allows you to run JavaScript in a browser environment (based on WebKit). This allows you to do scraping and online interactions that require JavaScript-enabled interfaces. There's a Python-based implementation here that's fully compatible with the API of the C++ version, or you could run either version in Python via subprocess.
Depending on what you're trying to do, another good option might be to use Selenium, which has a client driver implementation in Python - it's meant for integration testing, but it can do a lot of automation, as long as you're okay with running the Java-based Selenium Server and having the automation happen in an open browser rather than as a background process.
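With the Selenium Python bindings, a login sketch looks roughly like this; the URL and field names are placeholders for the real form:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()                        # or webdriver.Chrome()
driver.get("https://example.com/login")             # placeholder URL

# Placeholder element names; inspect the real form to find them
driver.find_element(By.NAME, "username").send_keys("my_user")
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

# After the JavaScript-driven login completes, the logged-in session lives in the driver
print(driver.current_url)
driver.quit()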