Scraping Google Keyword Tools with CasperJS and PhantomJS

I'm currently trying to scrape Google Keyword Tools with CasperJS and PhantomJS (both excellent tools, thanks n1k0 and Ariya), but I can't get it to work.
Here is my current process:
1. Log in with my Google Account (to avoid captchas in the Keyword Tools).
2. Navigate to the Keyword Tools page.
3. Fill in the search form and press Search.
I'm stuck at step 3: the search form is not a regular HTML form, so I can't use Casper#fill() and am accessing the fields directly instead. Here are some of the approaches I tried for changing the value of the "Word or phrase" field:
this.evaluate(function() {
// Trying to change the value...
document.querySelector('textarea.sP3.sBFB').value = 'MY SUPER KEYWORDS';
document.querySelector('textarea.sP3.sBFB').setAttribute('value', 'MY SUPER KEYWORDS');
document.querySelector('textarea').value = 'MY SUPER KEYWORDS'; // there's only one <textarea> on the page
// Trying to change other attributes...
document.querySelector('textarea.sP3.sBFB').textContent = 'MY SUPER KEYWORDS';
document.querySelector('textarea').style.backgroundColor = 'yellow';
});
Nothing works. I call Casper#capture() right afterwards to see what the field contains. The capture confirms that I am on the right page and that I am logged in, but the <textarea> is empty.
Strangely, I can access other parts of the DOM: I could change the text of a link that said Advanced Options and Filters to ___VINCE SAYS HELLO___ (see capture), by doing the following:
this.evaluate(function() {
document.querySelector('a.sLAB').textContent = '___VINCE SAYS HELLO___';
});
PS. I know scraping Google Keyword Tools is against the TOS, but I'm thinking this question might be of interest to anyone trying to scrape a JavaScript/Ajax-heavy site.

document.querySelector('textarea.sP3.sBFB').value = 'MY SUPER KEYWORDS';
You can't use elt.value on a textarea. Did you try with elt.textContent?

Why are you trying to scrape the results? Google already creates a CSV file for us.
Try downloading that instead. The selector for that link should be something like $('.gux-combo gux-dropdown-c .sJK')
Are you planning to use this to automate things?

I'm not sure exactly what's happening here, but the classes that you're using for targeting are different for me. The OneBoxKeywordsInputPanel-input textarea that I assume you're attempting to target has a second class, sPFB, and no other classes. It's possible that these cryptic classes are dynamic in some way. I'd recommend using the more descriptive class names instead. The following works just fine for me:
document.querySelector('textarea.OneBoxKeywordsInputPanel-input')
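For anyone attempting the same thing on another script-heavy page, here is a minimal sketch of how the selector above might be combined with a value change. The helper name `setFieldValue` is hypothetical, and the idea that the page only notices the new text after `input`/`change` events are fired is an assumption; nothing in this thread confirms that the Keyword Tool behaves that way. It uses `document.createEvent` rather than `new Event(...)` because older WebKit builds such as the one in PhantomJS may not support the constructor form.

```javascript
// Hypothetical helper: sets a field's value, then fires the events that
// script-driven widgets commonly listen for. Whether the Keyword Tool reacts
// to these particular events is an assumption, not something confirmed here.
function setFieldValue(doc, selector, text) {
  var field = doc.querySelector(selector);
  if (!field) {
    return false; // selector matched nothing; the caller can log or retry
  }
  field.value = text;
  ['input', 'change'].forEach(function (type) {
    var evt = doc.createEvent('HTMLEvents');
    evt.initEvent(type, true, false); // bubbles, not cancelable
    field.dispatchEvent(evt);
  });
  return true;
}
```

Because Casper's evaluate() runs its callback inside the page, the helper would have to be defined inside that callback (or injected into the page) and called with `document` as `doc`, for example with the selector 'textarea.OneBoxKeywordsInputPanel-input'. The boolean return value makes it easy to tell a selector mismatch apart from a field that was found but did not visibly change.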

Related

How to one-way process html data for the view in CKEditor 5?

In the CKEditor view for authors I need to change links to files so that the author's session ID gets attached. In the actual content shown to normal users, however, the specific user's ID is added automatically. The author's ID must therefore not be saved in the content the author edits with CKEditor; it only has to be present in the view while they edit, so that they can see an image, for example. On save, the 'clean' link without any IDs needs to be saved.
In CKEditor 5 there seem to be several ways to achieve this kind of one-way data filtering, for example with:
Conversions
the Editing Engine generally
the HtmlDataProcessor specifically
However, I couldn't find a good example or an easy, clean approach to achieve this. (My attempts became quite complicated and didn't work properly...) I'd guess this is quite a common use case, so maybe I'm overlooking something. Is there a good solution to this?
Update 1: Example links would be:
"clean link" as it has to be saved, but which will never work on its own: https://example.com/some-image.png
modified link for specific users in content (and how it has to be modified in ckeditor view for authors as well): https://example.com/some-image.png?sessionId=currentUsersSessionId
Update 2:
While working further with CKEditor I came across more things like this, which are simply very unpleasant from a developer's point of view. And it seems this is by design, to quote a contributor, 'fredck':
[...] we want to bring the editor out of the "HTML Editor" thing, making it the perfect solution for "quality content writing".
Implicitly this means that if you are a developer and you have advanced users with advanced use cases (which is likely if you are on Stack Overflow), you are not the target audience and shouldn't use CKEditor in the first place.
You can read more about this, for example, in the discussion here (although it is about a different feature): https://github.com/ckeditor/ckeditor5/issues/592

To modify links in the output data, you can write a custom downcast converter that modifies the href as the data is obtained.
Here is a working sample which adds the current timestamp to URLs:
https://codepen.io/msamsel/pen/zVMvZN?editors=1010
editor.conversion.for( 'dataDowncast' ).add( dispatcher => {
    dispatcher.on(
        'attribute:linkHref',
        ( evt, data, conversionApi ) => {
            if ( !conversionApi.consumable.test( data.item, 'attribute:linkHref' ) ) {
                return;
            }
            if ( data.attributeNewValue ) {
                data.attributeNewValue += `#time=${ ( new Date() ).getTime() }`;
            }
        },
        { priority: 'high' }
    );
} );
A few words on how it works.
A listener is created that reacts to attribute:linkHref changes (because it is registered for dataDowncast, it only fires when data is obtained). The listener runs with 'high' priority so that it changes the URL before the Link plugin itself creates the output. It first checks that the given model item has not been consumed, but does not consume it, because we want to preserve the native behavior, which will process the same item again. The attribute value is then extended with a timestamp, which is all this listener does. After that, the native converter runs at 'normal' priority.
A similar approach was used to implement custom link attributes. More about the dispatcher and the conversion process can be found here:
https://ckeditor.com/docs/ckeditor5/latest/framework/guides/architecture/editing-engine.html#conversion
https://ckeditor.com/docs/ckeditor5/latest/api/module_engine_conversion_downcastdispatcher-DowncastDispatcher.html
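For the session-ID case from the question, the string handling that such a converter would perform might look like the sketch below. The function names `addSessionId` and `stripSessionId` are hypothetical, the parameter name `sessionId` is taken from the example links in the question, and the code assumes absolute URLs (the `URL` constructor throws on relative ones). It only covers the URL transformation itself; which conversion pipeline to call it from is not settled here.

```javascript
// Hypothetical helpers for the example links in the question.
// Assumption: hrefs are absolute URLs, as in https://example.com/some-image.png

// Adds (or overwrites) the sessionId query parameter.
function addSessionId(href, sessionId) {
  var url = new URL(href);
  url.searchParams.set('sessionId', sessionId);
  return url.toString();
}

// Removes the sessionId query parameter, leaving other parameters untouched.
function stripSessionId(href) {
  var url = new URL(href);
  url.searchParams.delete('sessionId');
  return url.toString();
}
```

Using `searchParams` instead of string concatenation means a link that already carries other query parameters keeps them, and applying `stripSessionId` to an already clean link leaves it unchanged.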

Display message Google Sheets from Apps Script

I want to display a message in a Google Sheet, but I can't get it to work, and after searching here and in the documentation I still haven't found the answer.
I think the problem is in "activating" the spreadsheet where I need to display the message.
var SEGUIMIENTO = SpreadsheetApp.openById("MyTestediD");
var INF = SEGUIMIENTO.getSheetByName("NameOfSheet");
function TestMessage() {
INF.activate();
Browser.msgBox("Hello")
}
When I run it, nothing happens.
I need the spreadsheet defined outside the function because I'm working with two spreadsheets by ID in more than one function.
I only need a correction to my code so that it displays a simple message in the spreadsheet.
P.S. I really can't find a simple example of this.
Update
This code is part of a recorded macro in a spreadsheet, the same one opened with SpreadsheetApp.openById("MyTestediD");

I don't know why you are trying to 'activate' a sheet. If you want to display a message, I assume you want to do it in the user's current sheet, so:
SpreadsheetApp.getUi().alert('Confirmation received.');

From https://developers.google.com/apps-script/reference/base/browser
The methods in this class are only available for use in the context of a Google Spreadsheet. Please use G Suite dialogs instead.
As you can see, Google is nicely asking you to use G Suite dialogs instead of Class Browser, so be nice too and follow their request.

When you say you want a message in a spreadsheet, do you mean an alert message? If so, the answer is to call SpreadsheetApp.getUi().alert('Hello.'); when the TestMessage function is executed:
var SEGUIMIENTO = SpreadsheetApp.openById("My TestediD");
var INF = SEGUIMIENTO.getSheetByName("NameOfSheet");
function TestMessage() {
INF.activate();
SpreadsheetApp.getUi().alert('Hello.');
}

Changing the value of a variable through user input and re-using it on a different page

First I would like to say that I searched and found plenty of answers and even tried a couple (more than...) but to no avail! The error is probably mine but it is time to turn to SO and ask.
Problem description: I have a variable whose value I want to change through user input (a click on a button). As soon as the user clicks the button, the browser navigates to a different page that uses the value of the variable to perform certain actions. My issue is that if I alert on the first page I get the value set by the button, but on the second page I only get "undefined".
I think it has to do with variable scope and the fact that (I think it works that way anyway) even a window.var will be deleted/purged in a different window.
Anyway, the code is something like this (on the 1st page/file):
var somAlvo;
$('#omissL').click(function(){
somAlvo = 'l';
window.location.href='index_ProofOfConcept_nivel1.html';
});
And on the "receiving end" I have the following code
<head>
...
<script type="text/javascript" src="testForm_javascript.js"></script>
to "import" the js file with the variable and:
var processo = somAlvo;
alert(processo);
I tried declaring it on window, not using var inside the function, and so on...
This is a proof of Concept for a project in my local University, where I'm working as a research assistant (so, this is not homework ;) )
Thanks for any help/hints...

You are right in that when you navigate to another page, the entire JavaScript runtime is reset and all variables lost.
To preserve a value across page loads you have two options:
Include it as part of a query string when navigating to the new page.
Set a cookie.
You may also want to look into loading the new content through an AJAX call and replacing what is displayed. That way the page is never fully reloaded, so the JavaScript runtime is not reset.
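The query-string option could be sketched roughly as follows. The helper names `buildTargetUrl` and `readSomAlvo` are made up for illustration; the variable name `somAlvo` and the page name come from the question. This is one possible shape, not the only way to do it.

```javascript
// Page 1: build the target URL with the chosen value in the query string.
// (Hypothetical helper name; the parameter is named after the question's variable.)
function buildTargetUrl(page, somAlvo) {
  return page + '?somAlvo=' + encodeURIComponent(somAlvo);
}

// Page 2: read the value back from the query string.
// `search` is expected to be window.location.search, e.g. "?somAlvo=l".
// Returns null when the parameter is absent.
function readSomAlvo(search) {
  return new URLSearchParams(search).get('somAlvo');
}
```

In the click handler on the first page this would be used as `window.location.href = buildTargetUrl('index_ProofOfConcept_nivel1.html', 'l');`, and on the second page as `var processo = readSomAlvo(window.location.search);`. The second page then no longer depends on a variable defined by the first page's script.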

How to identify a hidden file element in selenium webdriver

Team,
I am trying to automate a file upload functionality but webdriver doesn't recognize the file object. Here is the thing:
The file object is in a modal box (the XPath of the modal box is //*[@id='modalBoxBody']/div[1]). The type and name of the file object are file and url respectively.
When I look at the HTML content, there are two objects with the same attributes. One of them is visible and the other is invisible, but they belong to different hierarchies, so I am using the hierarchy in which the element is visible.
My code follows. I have tried every solution I could find on Stack Overflow, but nothing worked. The commented-out sections were also tried and failed.
wbdv.findElement(By.xpath("//*[@id='left-container']/div[4]/ul/li/ul/li[2]/a")).click();
wbdv.switchTo().activeElement();
System.out.println(wbdv.findElement(By.xpath("//*[@id='modalBoxBody']/div[1]")).isDisplayed()); // This returns true
List<WebElement> we = wbdv.findElement(By.xpath("//*[@id='modalBoxBody']/div[1]")).findElement(By.className("modalBoxBodyContent")).findElements(By.name("url")); // There is only one element named url in this hierarchy
System.out.println(we.isEmpty()); // This returns false, meaning it got the element named url
//((JavascriptExecutor) wbdv).executeScript("document.getElementsByName('url')[0].style.display='block';"); // This didn't work
for (WebElement ele : we) {
    String js = "arguments[0].style.height='auto'; arguments[0].style.visibility='visible';";
    ((JavascriptExecutor) wbdv).executeScript(js, ele);
    System.out.println(ele.isDisplayed()); // This returns FALSE
    System.out.println(ele.isEnabled()); // This returns TRUE
    System.out.println(ele.isSelected()); // This returns FALSE
    ele.click(); // This throws org.openqa.selenium.ElementNotVisibleException
}
Now, if you look at the three method calls above, the element is NOT displayed and NOT selected, but it IS enabled. Because it is not displayed, Selenium cannot interact with it. The JavaScript that was meant to make it visible did not help either.
Could anyone please help me solve this? It has eaten my entire day.

In your last example, it looks to me like you have the right idea in setting the 'style.visibility' property. Another thing I would recommend trying is the "ExpectedConditions.visibilityOfElementLocated" method. Usually I use "presenceOfElementLocated", but if you are talking about the CSS visibility property, I think "visibilityOfElementLocated" is the way to go. What might be happening is that you need a wait condition on the element you are trying to get hold of, and the "ExpectedConditions" method should give you what you need. I see that you have tried a few things, but you haven't mentioned using a wait condition. No guarantees, but you should try it:
WebDriverWait wait = new WebDriverWait(driver, 60);
wait.until(ExpectedConditions.visibilityOfElementLocated(
    By.xpath(".//whatever")));

What is the meaning of <script>window["_GOOG_TRANS_EXT_VER"] = "1";</script>

I have this code on the header of my page
<script>window["_GOOG_TRANS_EXT_VER"] = "1";</script>
But I don't understand what it means or where it is generated from. Does anyone know anything about it?
I would like to delete it because it seems to be causing a problem in the page generation...
Thank you for your help.

This is dynamically inserted by the Google Translate extension (or by other extensions that are based on the Google Translate extension).
The source code of the Google Translate extension specifically refers to it:
/* Copyright 2010 Google */
...
function v(a) {
var b = {
noEvents: c,
content: u('window["_GOOG_TRANS_EXT_VER"] = "1";')
};
i.tabs.executeScript(a, {
code: q(s, b)
})
}
...
and disabling the extension removes it from the page.

Check the onClick handlers on your anchor tags, or any other JavaScript action that is being fired. It is most likely an escaped quote where there should not be an escape.

All that script does is create a variable, _GOOG_TRANS_EXT_VER, in the global scope with the value "1". If nothing uses this variable it shouldn't cause any issues, but it seems to be inserted by some sort of Google widget.

Do you have the Google Translate plugin? I saw the same thing, then tried turning my Translate plugin off, and the line disappeared.
Assuming you are running Chrome, you can check/disable at chrome://extensions/
