Clear Chrome browser logs in Selenium/Python

I have a large application and I am using Headless Chrome, Selenium and Python to test each module. I want to go through each module and get all the JS console errors produced while inside that specific module.
However, since each module is inside a different test case and each case executes in a separate session, the script first has to login on every test. The login process itself produces a number of errors that show up in the console. When testing each module I don't want the unrelated login errors to appear in the log.
Basically, clear anything that is in the logs right now -> go to the module and do something -> get logs that have been added to the console.
Is this not possible? I tried doing driver.execute_script("console.clear()") but the messages in the console were not removed and the login-related messages were still showing after doing something and printing the logs.

State in 2017 and late 2018
The logging API is not part of the official WebDriver specification yet.
In fact, it has been requested for the level 2 specification. As of mid-2017, only Chromedriver ships an undocumented, non-standard implementation of that command.
In the sources there's no trace of a method for clearing logs:
The public API Webdriver.get_log()
which references internal Command names
which translate to actual requests in RemoteConnection
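One practical consequence: Chromedriver only records browser logs if they are requested when the session is created. A minimal configuration sketch, assuming Selenium's ChromeOptions API (the vendor-prefixed goog:loggingPrefs capability is Chromedriver-specific and not part of the W3C spec; older Chromedriver versions used plain loggingPrefs):

```python
from selenium import webdriver

# Ask Chromedriver to capture all console output for this session.
# "goog:loggingPrefs" is a vendor capability, not part of the W3C spec.
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.set_capability("goog:loggingPrefs", {"browser": "ALL"})

driver = webdriver.Chrome(options=options)
```

With this in place, driver.get_log("browser") returns the console entries described below.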
Possible Workaround
get_log() returns a list of raw entries; each entry is a dictionary that looks like this:
{
    u'source': u'console-api',
    u'message': u'http://localhost:7071/console.html 8:9 "error"',
    u'timestamp': 1499611688822,
    u'level': u'SEVERE'
}
It contains a timestamp that can be remembered so that subsequent calls to get_log() may filter for newer timestamps.
Facade
class WebdriverLogFacade(object):

    last_timestamp = 0

    def __init__(self, webdriver):
        self._webdriver = webdriver

    def get_log(self):
        last_timestamp = self.last_timestamp
        entries = self._webdriver.get_log("browser")
        filtered = []

        for entry in entries:
            # check the logged timestamp against the stored timestamp
            if entry["timestamp"] > self.last_timestamp:
                filtered.append(entry)

            # save the last timestamp only if newer in this set of logs
            if entry["timestamp"] > last_timestamp:
                last_timestamp = entry["timestamp"]

        # store the very last timestamp
        self.last_timestamp = last_timestamp

        return filtered
Usage
log_facade = WebdriverLogFacade(driver)
logs = log_facade.get_log()
# more logs will be generated
logs = log_facade.get_log()
# newest log returned only
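To see the filtering behaviour without a browser, the facade can be exercised against a stub driver (the stub below is purely illustrative; a real Chromedriver also clears its buffer on each get_log call):

```python
class WebdriverLogFacade(object):
    # same class as above, repeated so this snippet runs standalone
    last_timestamp = 0

    def __init__(self, webdriver):
        self._webdriver = webdriver

    def get_log(self):
        last_timestamp = self.last_timestamp
        entries = self._webdriver.get_log("browser")
        filtered = []
        for entry in entries:
            if entry["timestamp"] > self.last_timestamp:
                filtered.append(entry)
            if entry["timestamp"] > last_timestamp:
                last_timestamp = entry["timestamp"]
        self.last_timestamp = last_timestamp
        return filtered


class StubDriver(object):
    """Illustrative stand-in: returns whatever is in self.entries."""
    def __init__(self):
        self.entries = []

    def get_log(self, log_type):
        return list(self.entries)


driver = StubDriver()
facade = WebdriverLogFacade(driver)

driver.entries.append({"timestamp": 1, "level": "SEVERE",
                       "source": "console-api", "message": "login error"})
first = facade.get_log()    # contains the login entry

driver.entries.append({"timestamp": 2, "level": "SEVERE",
                       "source": "console-api", "message": "module error"})
second = facade.get_log()   # the older login entry is filtered out

print([e["message"] for e in first])   # ['login error']
print([e["message"] for e in second])  # ['module error']
```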

This thread is a few years old, but in case anyone else finds themselves here trying to solve a similar problem:
I also tried using driver.execute_script('console.clear()') to clear the console log between my login process and the page I wanted to check to no avail.
It turns out that calling driver.get_log('browser') returns the browser log and also clears it.
After navigating through pages for which you want to ignore the console logs, you can clear them with something like
_ = driver.get_log('browser')
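A sketch of the resulting clear -> act -> collect flow (the helper name and module URL are hypothetical; it relies on the Chromedriver behaviour above, where get_log returns the buffer and clears it):

```python
def collect_module_errors(driver, module_url):
    # Flush everything logged so far (e.g. login noise):
    # Chromedriver returns *and clears* the buffer on each call.
    driver.get_log('browser')

    # Exercise the module.
    driver.get(module_url)

    # Only entries produced since the flush come back now.
    return [e for e in driver.get_log('browser') if e['level'] == 'SEVERE']
```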

Related

Plaid - can't run (python) quickstart

I'm working with the Plaid API found here but can't seem to get the quickstart to run properly. My latest attempt is below:
import base64
import os
...
...
app = Flask(__name__)

# Fill in your Plaid API keys - https://dashboard.plaid.com/account/keys
PLAID_CLIENT_ID = 'xxxxxxxx'  # os.getenv('xxxxx')
PLAID_SECRET = 'xxxxx'  # os.getenv('xxxx')
...
PLAID_ENV = 'sandbox'  # os.getenv('PLAID_ENV', 'sandbox')
...
PLAID_PRODUCTS = 'transactions'  # os.getenv('PLAID_PRODUCTS', 'transactions').split(',')
...
PLAID_COUNTRY_CODES = 'US'  # os.getenv('PLAID_COUNTRY_CODES', 'US').split(',')

def empty_to_none(field):
    value = os.getenv(field)
    if value is None or len(value) == 0:
        return None
    return field

...
PLAID_REDIRECT_URI = empty_to_none('http://localhost:8000/oauth-response.html')

client = plaid.Client(client_id=PLAID_CLIENT_ID,
                      secret=PLAID_SECRET,
                      environment=PLAID_ENV,
                      api_version='2019-05-29')

@app.route('/')
def index():
    return render_template('index.html',)
When I run server.py and open the browser, the button can't be selected, and the list of banks just loads continuously. Checking Chrome dev tools, I find the error link-initialize.js:1 Uncaught Error: Missing Link parameter. Link requires a key or token to be provided. Is this because I didn't pass something in render_template? I can't tell from the index.html file found here; currently that's the only front-end document referenced in the (Python) repository. I looked at the question found here, but it's several years old and I believe the integration has changed...
The problem here is that you're specifying a REDIRECT_URI but haven't configured the developer dashboard to accept that as your URI.
Unfortunately, the error messaging is currently swallowed and only visible in the network tab. We're going to fix it so that these errors are propagated into a place where they're more visible.
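As a side note, unrelated to the dashboard fix: the quickstart's empty_to_none helper takes the name of an environment variable, but the snippet above passes the URL itself, and the helper as transcribed returns field (the variable name) rather than value. A corrected sketch, assuming the environment variable is named PLAID_REDIRECT_URI as in the quickstart:

```python
import os

def empty_to_none(field):
    value = os.getenv(field)
    if value is None or len(value) == 0:
        return None
    return value  # return the value, not the variable name

# Pass the environment variable's name, not the URL itself:
PLAID_REDIRECT_URI = empty_to_none('PLAID_REDIRECT_URI')
```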

Scraping a rendered javascript webpage

I'm trying to build a short Python program that extracts Pewdiepie's subscriber count, which is updated every second on socialblade, and shows it in the terminal. I want this data roughly every 30 seconds.
I've tried using PyQt, but it's slow. I turned to dryscrape, which is slightly faster but doesn't work as I want either. I've just found Invader and written some short code that still has the same problem: the number returned is the one from before the JavaScript on the page is executed:
from invader import Invader
url = 'https://socialblade.com/youtube/user/pewdiepie/realtime'
invader = Invader(url, js=True)
subscribers = invader.take(['#rawCount', 'text'])
print(subscribers.text)
I know that this data is accessible via the site's API, but it's not always working; sometimes it just redirects to this.
Is there a way to get this number after the JavaScript on the page has modified the counter, and not before? And which method seems best to you? Extract it:
from the original page, which keeps returning the same number for hours?
from the API's page, which bugs out when not using cookies in the code and after a certain amount of time?
Thanks for your advice!
If you want to scrape a web page that has parts of it loaded in by JavaScript, you pretty much need to use a real browser.
In python this can be achieved with pyppeteer:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    await page.goto('https://socialblade.com/youtube/user/pewdiepie/realtime', {
        'waitUntil': 'networkidle0'
    })
    count = int(await page.Jeval('#rawCount', 'e => e.innerText'))
    print(count)

asyncio.get_event_loop().run_until_complete(main())
Note: it does not seem that the website mentioned above still updates the subscriber count frequently (even with JavaScript). See: https://socialblade.com/blog/abbreviated-subscriber-counts-on-youtube/
For best success and reliability you will probably need to set the user agent (page.setUserAgent in pyppeteer), keep it up to date, and use proxies (so your IP does not get banned). This can be a lot of work.
It might be easier and cheaper (in time, and compared to buying a large pool of proxies) to use a service that handles this for you, like Scraper's Proxy. It uses a real browser, returns the resulting HTML after the JavaScript has run, and routes all requests through a large network of proxies, so you can send a lot of requests without getting your IP banned.
Here is an example using the Scraper's Proxy API getting the count directly from YouTube:
import requests
from pyquery import PyQuery

# Send request to API
url = "https://scrapers-proxy2.p.rapidapi.com/javascript"
params = {
    "click_selector": '#subscriber-count',  # (wait-for-selector work-around)
    "wait_ajax": 'true',
    "url": "https://www.youtube.com/user/PewDiePie"
}
headers = {
    'x-rapidapi-host': "scrapers-proxy2.p.rapidapi.com",
    'x-rapidapi-key': "<INSERT YOUR KEY HERE>"  # TODO
}
response = requests.request("GET", url, headers=headers, params=params)

# Query html
pq = PyQuery(response.text)
count_text = pq('#subscriber-count').text()

# Extract count from text
# (note: this naive expansion assumes an integer prefix such as "111M";
# a value like "1.5M" would need the decimal point handled as well)
clean_count_text = count_text.split(' ')[0]
clean_count_text = clean_count_text.replace('K', '000')
clean_count_text = clean_count_text.replace('M', '000000')
count = int(clean_count_text)
print(count)
I know this is a bit late, but I hope this helps

Branch.io: javascript detect whether mobile app is installed

I have a web page where I want a particular link on a page to open our native mobile app if the app is installed, and if not, do what it is currently doing (which submits a form).
Note: I think this is different than a smart banner - I don't want the banner on this page. I just want the normal app flow if there is no mobile app.
I have integrated branch-sdk in the web page and in my iOS app. I have successfully set up a deep link from the web page to the iOS app (code not shown), but I am not getting the results I expect when sniffing for whether the app is installed.
Here's the code in the <head> of my webpage:
branch.init('MY_KEY', null, function(err, data) {
    console.log('init...');
    console.dir(data);
});

branch.setIdentity('test-user', function(err, data) {
    console.log('identity...');
    console.dir(data);
});

branch.data(function(err, data) {
    console.log('data...');
    console.dir(data);
});
Here's my code in application(_:didFinishLaunchingWithOptions:) in the iOS side:
// Initialize Branch
if let branch = Branch.getInstance() {
    branch.setDebug()
    branch.initSession(launchOptions: launchOptions, andRegisterDeepLinkHandler: { params, error in
        if let params = params, error == nil {
            // params are the deep linked params associated with the link
            // that the user clicked -> was re-directed to this app
            // params will be empty if no data found
            // ... insert custom logic here ...
            print("params: %#", params.description)
        }
    })
    branch.setIdentity("test-user")
} else {
    readerLog.error("failed to initialize branch")
}
Every time I load the webpage (even after loading it and following the deep link), I get the following output in the console. Note that +is_first_session is true, and the has_app property is null:
[Log] init... (localhost, line 18)
[Log] Object (localhost, line 20)
    data: "{\"+is_first_session\":true,\"+clicked_branch_link\":false}"
    data_parsed: {+is_first_session: true, +clicked_branch_link: false}
    has_app: null
    identity: null
    referring_identity: null
    referring_link: null
[Log] identity... (localhost, line 23)
[Log] Object (localhost, line 25)
    identity_id: "352945822373397525"
    link: "https://nlfd.app.link/?%24identity_id=352945822373397525"
    link_click_id: "352947632809063724"
    referring_data: "{\"$one_time_use\":false,\"+click_timestamp\":1485387510,\"_branch_match_id\":\"352945819276765390\",\"_t\":\"352945819276765390\",\"referrer\":\"link_clic…"
    referring_data_parsed: Object
    session_id: "352945822337503746"
[Log] data... (localhost, line 28)
[Log] Object (localhost, line 30)
    data: "{\"+is_first_session\":true,\"+clicked_branch_link\":false}"
    data_parsed: null
    has_app: null
    identity: null
    referring_identity: null
    referring_link: null
What am I doing wrong? I was hoping I could just look at the has_app property after init or after setting the identity. Is it incorrect to assume that has_app would return true in this case? Is there a more appropriate way to do what I want to do?
Alex from Branch.io here:
Unfortunately the has_app parameter is not especially reliable. It's good enough for switching a button between displaying an Open or an Install label, but ideally you don't want to use it for functional things. This is a limitation caused by iOS: Apple doesn't allow web pages to query for which apps are installed on a device (for obvious privacy reasons), so Branch has to rely on cookie matching. This means if we can't match the cookie, or haven't seen the user recently, or the user clears their device cache, or the user has uninstalled the app since the last time Branch saw them, the value of has_app will be incorrect.
HOWEVER, even though Apple doesn't allow web pages to query this data, iOS itself can still act on it. Universal Links do exactly this: when the user opens a Universal Link (which includes Branch links, assuming you got all the configuration done), the app will open if it is installed. If it is not installed, the user is sent to the URL of the link. You just need to put a Branch link behind that button.
Note, however, that this doesn't work for form submission buttons, so you would need to come up with some UX workaround. Or you might be able to find a way to submit the form after a delay if the app doesn't open, using a JavaScript timer.

Meteor SmartCollection giving inconsistent results

On the browser JS console, News.insert({name: 'Test'}) caused {{count}} to increase from 0 to 1.
In the mongo console (mrt mongo), db.news.find().count() returns 1. However, after adding a record via the mongo console with db.news.insert({name: 'TestAgain'}), {{count}} remains at 1 while mongo now shows 2 records.
Question: What is causing minimongo and the mongodb console to give inconsistent results?
If I replace Meteor.SmartCollection with Meteor.Collection and reload the page, {{count}} is now 2. But if I change it back to Meteor.SmartCollection, {{count}} goes back to 1!!
collections/news.js
News = new Meteor.SmartCollection('news');
client/views/main.html
<template name="news">
    {{ count }}
</template>
client/views/main.js
Template.news.count = function() {
    return News.find().count();
}
Using Meteor v6.6.3 with SmartCollection v0.3.2.2
Update
Following Cuberto's suggestion, I have enabled oplog on my MongoDB server.
export MONGO_URL=mongodb://192.168.1.111:27017/myDb
export OPLOG_URL=mongodb://192.168.1.111:27017/local
mrt
mongod runs with --replSet meteor, and MongoDB was configured with
var config = {_id: "meteor", members: [{_id: 0, host: "127.0.0.1:27017"}]}
rs.initiate(config)
The prompt in mongo also becomes meteor:PRIMARY>, and the local db does contain the collection oplog.rs.
Starting Meteor, we see SmartCollection charged with MongoDB Oplog in the console.
Problem: However, nothing is retrieved when we run News.find() in the browser JS console. The same query in the mongo client returns the correct result. Switching from Meteor.SmartCollection back to Meteor.Collection makes the site work again.
How can we troubleshoot the problem with SmartCollection?
Make sure you configure your MongoDB to use oplog and set the environment variables, as explained here:
http://meteorhacks.com/lets-scale-meteor.html
Since smart collections remove the periodic database poll, you need an oplog-enabled MongoDB instance for changes made outside Meteor to be recognized.

Getting selenium to send keys to google's sign in box (coding in python)

I have a problem sending keys to the username and password fields in Google's sign-in box. Selenium finds the web elements with the ids "Email" and "Passwd", but I cannot send any keys to them.
Here's the code that isn't yielding the expected results:
from selenium import webdriver
#from selenium.webdriver.common.keys import Keys
import time
username = "test"
password = "ninja"
driver = webdriver.Firefox()
driver.get(u'http://www.google.com')
driver.implicitly_wait(10)
elem = driver.find_element_by_id("gb_70")
elem.click()
username = driver.find_element_by_id('Email')
username.send_keys(username)
This code generates an error:
Traceback (most recent call last):
  File "SERPkopi.py", line 18, in <module>
    username.send_keys(username)
  File "/Users/Sverdrup/virtualenv-1.6.1/Alert/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 142, in send_keys
    local_file = LocalFileDetector.is_local_file(*value)
  File "/xxxxx/xxxxxx/virtualenv-1.6.1/Alert/lib/python2.7/site-packages/selenium/webdriver/remote/webelement.py", line 253, in is_local_file
    for i in range(len(val)):
TypeError: object of type 'WebElement' has no len()
Which is strange, because the same code can write to Google's 'q' (query) field.
I've tried identifying the web element by id, name, and xpath, to no avail.
Background as to why:
I discovered Google Alerts today and want to set up alerts on my company's customers' names (this is a business-to-business setup, so the customers are companies themselves). The customers are relatively small, and I don't imagine I'll get many alerts on their names, but it would be great to be able to keep track of them.
Seeing as there isn't an API for Google Alerts, I thought I'd use Selenium to programmatically enter the couple of hundred customer names. I first have to be able to log in to my account, though...
I would really appreciate any and all help.
Sincerely
So this may be a bittersweet answer, but your script is essentially fine. Here is the problem:
username = "test"
#....
username = driver.find_element_by_id('Email')
username.send_keys(username)
You set username just fine, but then you redefine it as the Email element. The final line then fails because you are passing the username element itself to send_keys, which expects a string, causing an Inception-like event of chaos. The len error arises because Selenium tries to take the length of the argument to send_keys, which it expects to be a string but which is in this case an element. To fix it, simply change one of the variable names. For instance:
user_field = driver.find_element_by_id('Email')
user_field.send_keys(username)
