Trouble crawling/scraping webpages that use javascript with Perl - javascript

I've been teaching myself how to crawl and scrape different websites. I'm comfortable with crawling/scraping, but only on websites that are mainly plain HTML. Now I'm working with this link: https://intel.taleo.net/careersection/10000/jobsearch.ftl
I'm using Perl (with WWW::Mechanize) for the following task: I want to write a crawler/scraper that clicks the "United States" checkbox on the left (filtering the results) and then collects the titles of all jobs. However, I couldn't find a way to reach this checkbox using Perl. Can someone get me started on this? (Example code would be helpful.)

You need to analyze the page and see how this checkbox is implemented, so that you can use WWW::Mechanize to emulate what the JavaScript code does (if there is JavaScript behind it).
Perl also has easier options for handling JavaScript. Below are some crawling modules that handle JavaScript out of the box:
1. WWW::Mechanize::Firefox, which automates Firefox
2. WWW::Mechanize::PhantomJS, which is based on the PhantomJS browser and can handle JavaScript
3. WWW::Selenium, which uses Selenium
4. WWW::HtmlUnit, which is based on Java's HtmlUnit and can handle JavaScript

Related

Dart wrapper over js library and use it in flutter app

Put simply, I want to make a small currency exchange app (a pet project, so I want a free API; 1000 requests per month covering more currencies would be perfect). I don't like the free APIs I have found so far, but I found this website, https://bg.coinmill.com/, and I want to use it for my purpose. Reading an answer to a similar question:
The only way to make use of JS in Flutter is using WebView.
Dart compiles to JS only for browser applications; for Flutter it compiles to native machine code.
convert js code directly to dart, using package js
package JS doesn't convert JS, it just creates proxies for JS functions to be able to call them from Dart, but that is also only supported in Dart web applications.
Put simply, it isn't possible without hitting some compilation errors and resorting to workarounds. However, https://github.com/pichillilorenzo/flutter_inappbrowser looks promising. Embedding the webpage would look ugly, and I wouldn't have any control over the UI/settings. My options now are looking for another free currency API or trying to find a workaround. I'm inclined toward another API, but I'm not sure which one. Any suggestions?
So basically what you actually want to do is use that website to do the currency conversion in the background (enter value, press "Convert"), then display the result in your Flutter app? You don't need javascript for that.
After you enter a value and press the submit button, the site simply redirects you to a different page (a GET request) with a URL like this:
https://bg.coinmill.com/CAD_USD.html?CAD=22
Use Dart's http library to perform the same request with the right currency/value parameters. The response body contains the source code of the web page.
Instead of displaying the web page, you just need to read the value you need from the source code of the web page:
<div id="currencyBox1">
<input class="currencyField" ... value="16.46" ...>
САЩ долар (USD)
</div>
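The two steps above (build the URL, then read the value out of the returned HTML) can be sketched as follows. This is written in JavaScript only to illustrate the logic; the same string handling ports directly to Dart. The helper names buildUrl and extractValue are made up for this sketch, the sample markup is a simplified stand-in for the real page, and the box id is assumed to be a plain identifier with no regex metacharacters.

```javascript
// Build the conversion URL following the pattern shown above (e.g. CAD_USD.html?CAD=22).
function buildUrl(from, to, amount) {
  return `https://bg.coinmill.com/${from}_${to}.html?${from}=${encodeURIComponent(amount)}`;
}

// Read the value attribute of the first <input> inside the <div> with the given id.
// Returns a number, or null when the box or its input is not found.
function extractValue(html, boxId) {
  const box = new RegExp(`<div[^>]*id="${boxId}"[^>]*>([\\s\\S]*?)</div>`).exec(html);
  if (!box) return null;
  const value = /<input[^>]*\bvalue="([^"]*)"/.exec(box[1]);
  return value ? Number(value[1]) : null;
}

// Simplified sample of the markup quoted above (attributes trimmed for the sketch).
const sample = `<div id="currencyBox1">
<input class="currencyField" value="16.46">
САЩ долар (USD)
</div>`;

console.log(buildUrl("CAD", "USD", 22));
console.log(extractValue(sample, "currencyBox1"));
```

A regex is enough for a fixed, simple snippet like this one; if the page layout is more involved, an HTML parser would likely be less fragile.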
So, as I understand your question, you have some JS library and you want to use it from Dart?
If so, yes, you can do it using Dart JS interop. More information is in the link.
Edit
Yes, you are right: from Flutter you can call JS only through the evalJavascript function of flutter_webview_plugin.
You can use Firebase Cloud Functions and wrap your function in a callable function. You'll have the whole Node.js environment, and the Dart code will only call a function.

iPhone SDK: Getting text from javascript class

I'm trying to retrieve some information from Gmail but have been unsuccessful after many attempts. This is the line of code that I'm trying to extract using javascript.
Inbox (182)
I'm trying to get the text "Inbox (182)". To do that, I'm using this piece of code:
NSString *js_result = [webview1 stringByEvaluatingJavaScriptFromString:@"document.getElementsByClassName('J-Ke n0').innerText"];
This however does not work, my result being nothing at all, and I've tried many alternatives but none have worked. All I need to do here is extract the "Inbox (182)" text in any way possible. Thanks.
I think your JavaScript is incorrect: getElementsByClassName returns a collection of elements (there can be several with that class), so you have to index into it. If I log in to Gmail, this works:
document.getElementsByClassName('J-Ke n0')[0].innerText
I would be wary of using this in a production environment, though. It seems very brittle; that class or the order of elements could be changed by Google at any time.
You also need to make sure that the page has loaded before trying to execute javascript. Typically this is implemented in a webViewDidFinishLoad: callback. If you're not getting a result and your JS is valid, this is probably the issue.
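As a rough illustration of the indexing point, here is a guarded version of the lookup that returns an empty string instead of failing when nothing matches (for example because the page has not finished loading). The function name firstInnerText and the fakeDocument object are made up for this sketch; fakeDocument only stands in for the browser's document so the code can run outside a web view.

```javascript
// Returns the text of the first element with the given class name(s), or "" if there is none.
function firstInnerText(doc, classNames) {
  const matches = doc.getElementsByClassName(classNames); // a collection, not a single element
  return matches.length > 0 ? matches[0].innerText : "";
}

// Stand-in for the browser's `document`, used only to make the sketch runnable.
const fakeDocument = {
  getElementsByClassName: (names) =>
    names === "J-Ke n0" ? [{ innerText: "Inbox (182)" }] : [],
};

console.log(firstInnerText(fakeDocument, "J-Ke n0"));
console.log(JSON.stringify(firstInnerText(fakeDocument, "missing")));
```

In the web view itself you would pass the real document, and an empty result would be a hint that the class name changed or the page was not ready yet.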

Can't use perl WWW::Mechanize to tick checkboxes

I am making a web scraper using Perl's WWW::Mechanize. My problem is that the site I am scraping relies on JavaScript a bit too much. I log in with credentials, then navigate to the custom search using $mech->follow_link(url).
The problem starts here. I land on a page where I have to select one checkbox and one radio button from a JavaScript-enabled drop-down list. I am stuck at this point.
The relevant part of the HTML is below. When I use $mech->tick('cs-MajorIndustryGroup'), I get this error:
Can't call method "find_input" on an undefined value
WWW::Mechanize doesn't support JavaScript. You could try some of these modules:
Gtk2::WebKit::Mechanize
Win32::IE::Mechanize
WWW::Mechanize::Firefox
WWW::Scripter
WWW::Selenium
For more information see WWW::Mechanize::FAQ.

I need to extract somehow (probably using JavaScript) some information in my clients' websites. What's the best way to do it?

I want to plug my clients' websites to a system that I have. I need to be able to use some information that is in the website in order to improve the user experience in my system (automatically pre-filled forms, show their address, etc...).
The problem I face is that my client's website provider will not code that feature (add a link passing the information I need). So my idea is to have a JavaScript file that will be included in all the pages (they are willing to do this, because it's only copy & paste)... and then this JavaScript code will somehow extract the data I need and create the link the way I need.
One thing that will help is that all my clients' websites are provided by the same companies, and they are all template-based. So all the websites from the same provider have the same HTML structure.
Do you know any other way of doing this? If JavaScript is the way to go, what's the best way to scrape the information?
Thanks!
I'm not sure if your 'system' is a web tool or a desktop program, but if it is a web tool, Dynamic Drive has a nice piece of JavaScript that can achieve the results you want without needing to modify the client's site:
Dynamic Ajax Content
Now, I'm guessing you may want to rearrange the content yourself rather than display it exactly as it is on your client's site. So here's a quick modification of their script's loadpage() function so that you can capture the HTML in a variable (loadedContent):
var loadedContent;
function loadpage(page_request, containerid) {
    // Keep the response once the request has completed (status 200, or a page opened without HTTP).
    if (page_request.readyState == 4 && (page_request.status == 200 || window.location.href.indexOf("http") == -1)) {
        loadedContent = page_request.responseText;
    }
}
Now, if you follow the instructions on their page to set up and call the script, after it runs you will have the HTML of the page stored in loadedContent for you to play with.
If you want to test it before you implement it, go to the link above, open your developer console, paste in the modified code, and hit Enter. This should replace their function on the fly. Now go to their demo at the top and click on one of the different pages. Nothing visible should happen. Go to your console and type loadedContent. You should see the HTML they were trying to load stored there.
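Once the page HTML is held in loadedContent, pulling out the fields and building the link might look like the sketch below. The element ids (name, city), the sample markup, the example.com URL, and the helper names textById and buildPrefillLink are all made up for illustration; the real ids depend on the provider's template. If the script runs inside the client's own page, document.getElementById would likely be more robust than a regex.

```javascript
// Extract the text content of the element with the given id from an HTML string.
// Only handles elements whose content is plain text (no nested tags).
function textById(html, id) {
  const m = new RegExp(`<[^>]+\\bid="${id}"[^>]*>([^<]*)<`).exec(html);
  return m ? m[1].trim() : "";
}

// Build a link that carries the extracted fields as query parameters.
function buildPrefillLink(baseUrl, html, fieldIds) {
  const params = new URLSearchParams();
  for (const id of fieldIds) params.set(id, textById(html, id));
  return `${baseUrl}?${params}`;
}

// Hypothetical template markup standing in for the HTML captured by loadpage().
const loadedContent = '<span id="name">Ann Lee</span><div id="city"> Oslo </div>';

console.log(buildPrefillLink("https://example.com/form", loadedContent, ["name", "city"]));
```

Because all sites from one provider share the same HTML structure, a single list of field ids per provider should be enough.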
Hope this helps

Use bing translator in my website

I recently came across a need for some kind of translation feature that could translate specific text fields or areas into other languages.
When the user types text into the <input type="text" id="texttotranslate"/> HTML control, I want the text to be converted to a local language (e.g. Hindi, Arabic, Finnish) after each space.
I am not sure if something like this even is out there - but I thought this might be a good place to ask.
Link 1
I came across these links as well, but I want a JavaScript/AJAX solution to get it done.
Link 2
I went through this and created my APPID.
I have link 1 working in my C# console application, but I want a JavaScript solution for the same, i.e. when I write a word in the text box, it should be converted to the local language I set.
If you are using Bing Translator in your website, there is no need to write any code in C#. You can use the Bing URL directly to translate the words.
Please refer to the following URL: http://basharkokash.com/post/Bing-Translator-for-developers.aspx
One option would be to put the Microsoft Translator widget on your site (http://www.microsofttranslator.com/widget). Mark the fields that you don't want translated with the class="notranslate" attribute.
Alternatively, if you want to use the API, I recommend following the tutorials here:
http://blogs.msdn.com/b/translation/p/gettingstarted1.aspx
and
http://blogs.msdn.com/b/translation/p/gettingstarted2.aspx
While the second link does it in ASP.NET, instead of JavaScript, it should give you a rough idea for how to do it. At the very least I recommend getting your access token server side, using ASP.NET, PHP or something similar, so your Client ID and Client Secret are not in-the-clear on your site.
Finally, take a look here: http://msdn.microsoft.com/en-us/library/ff512385.aspx, for the MSDN documentation on the AJAX API, including how to access it using JavaScript.
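For the "translate after each space" behaviour asked about in the question, the page-side logic could be sketched roughly as below. The translate function is a hypothetical placeholder (word, targetLang) that in a real page would call your own server endpoint, which holds the credentials and talks to the translation API; the stub here only tags the word so the sketch can run on its own. makeSpaceHandler is likewise a made-up name.

```javascript
// Returns a handler that translates the last completed word once the text ends in whitespace.
// `translate` is a hypothetical async function: (word, targetLang) -> translated word.
function makeSpaceHandler(translate, targetLang) {
  return async function onInput(text) {
    if (!/\s$/.test(text)) return text;           // the current word is not finished yet
    const words = text.trimEnd().split(/\s+/);
    if (words[0] === "") return text;             // only whitespace so far
    const last = words.pop();
    words.push(await translate(last, targetLang));
    return words.join(" ") + " ";
  };
}

// Stub translator, used only to make the sketch runnable without a real API.
const stubTranslate = async (word, lang) => `[${lang}:${word}]`;
const handler = makeSpaceHandler(stubTranslate, "hi");

handler("good morning ").then((out) => console.log(JSON.stringify(out)));
```

In the page, the handler would be called from an input event listener on the texttotranslate field, with the returned string written back to the field's value.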
