I am trying to automate some SAP job monitoring with Python. I want to create a script that does the following:
Connect and log in to the SAP environment -> open the SM37 transaction -> send the job parameters (name, user, from, to) -> read the output and store it in a database.
I don't know of any module or library that would let me do that directly, so I checked whether the WEBGUI is enabled. It is: I can open the environment through a browser, so a browsing module should let me do everything I need.
I tried Mechanize and RoboBrowser. They work up to a point, but the WEBGUI runs a lot of JavaScript to render the page, and those modules don't handle JavaScript.
There is one more shot: Selenium.
I was able to connect and log in to the environment, but when trying to select an element from the next page (the main menu), Selenium cannot locate it.
Printing the page source, I realized that the main menu is rendered with JavaScript: the source doesn't contain the element at all, only the title ("Welcome"), which at least means the login was successful.
I have read a lot of posts about this, and everybody recommends using WebDriverWait with an explicit condition.
I tried this, but it didn't work:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://mysapserver.domain:8000/sap/bc/gui/sap/its/webgui?sap-client=300&sap-language=ES")
wait = WebDriverWait(driver, 30)
element = wait.until(EC.presence_of_element_located((By.ID, 'ToolbarOkCode')))
EDIT:
There are two source codes: SC-1 is the one that Selenium reads; SC-2 is the one that appears once JavaScript renders the site (the one from "Inspect Element").
The full SC-1 is this:
https://pastebin.com/5xURA0Dc
The SC-2 for the element itself is the following:
<input id="ToolbarOkCode" ct="I" lsdata="{0:'ToolbarOkCode',1:'Comando',4:200,13:'150px',23:true}" lsevents="{Change:[{ClientAction:'none'},{type:'TOOLBARINPUTFIELD'}],Enter:[{ClientAction:'submit',PrepareScript:'return\x20its.XControlSubmit\x28\x29\x3b',ResponseData:'delta',TransportMethod:'partial'},{Submit:'X',type:'TOOLBARINPUTFIELD'}]}" type="text" maxlength="200" tabindex="0" ti="0" title="Comando" class="urEdf2TxtRadius urEdf2TxtEnbl urEdfVAlign" value="" autocomplete="on" autocorrect="off" name="ToolbarOkCode" style="width:150px;">
Still can't locate the element. How can I solve it?
Thanks in advance.
The solution was to switch into the iframe that contains the rendered HTML (with the control):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver2 = webdriver.Chrome()
driver2.get("http://mysapserver.domain:8000/sap/bc/gui/sap/its/webgui?sap-client=300&sap-language=ES")
driver2.switch_to.default_content()
iframe = driver2.find_elements(By.TAG_NAME, 'iframe')[0]
driver2.switch_to.frame(iframe)
driver2.find_element(By.ID, "ToolbarOkCode").send_keys("SM37")
driver2.find_element(By.ID, "ToolbarOkCode").send_keys(Keys.ENTER)
Related
I am trying to create a script to download an ebook as a PDF. When I use BeautifulSoup to print the contents of a single page, I get a message in the console stating: "Oh no! It looks like JavaScript is disabled in your browser. Please re-enable to access the reader."
I have already enabled JavaScript in Chrome, and this same piece of code works for a page like a Stack Overflow answer. What could be blocking JavaScript on this page, and how can I bypass it?
My code for reference:
import bs4
import requests

url = requests.get("https://platform.virdocs.com/r/s/0/doc/350551/sp/14552484/mi/47443495/?cfi=%2F4%2F2%5BP7001013978000000000000000003FF2%5D%2F2%2F2%5BP7001013978000000000000000010019%5D%2F2%2C%2F1%3A0%2C%2F1%3A0")
url.raise_for_status()
soup = bs4.BeautifulSoup(url.text, "html.parser")
elems = soup.select("p")
print(elems[0].getText())
The problem is that the page actually contains no content. To load the content, it needs to run some JS code. The requests.get method does not run JS; it just loads the basic HTML.
What you need to do is emulate a browser, i.e. 'open' the page, run the JS, and then scrape the content. One way to do it is to use a browser driver as described here - https://stackoverflow.com/a/57912823/9805867
I'm trying to scrape a site in Node.js. I followed a great tutorial, but I've realized it might not be what I'm looking for: I may need to scrape the JavaScript portion of the page instead of the HTML one.
Is that possible?
The reason is that I want to load the content of the portion of code below, which I found by inspecting a kayak.com page in Safari (it doesn't show in Chrome; see the URL below), and it seems to live in a script section.
reducer: {"reducerPath":"flights\/results\/react\/reducers\/
https://www.kayak.com/flights/TYO-PAR/2019-07-05-flexible/2019-07-14-flexible/1adults/children-11?fs=cfc=1;legdur=-960;stops=~0;bfc=1&sort=bestflight_a&attempt=2&lastms=1550392662619
UPDATE: Unfortunately, this site uses bot/scrape protection: tools like curl get a page with a bot warning, and headless-browser tools like puppeteer get a page with a captcha.
===============
As this line is present in the HTML source code and is not added dynamically by JavaScript execution, you can use something like this with the appropriate library API:
const extractedString = [...document.querySelectorAll('script')]
    .map(({ textContent }) => textContent)
    .find(txt => txt.includes('string'))
    ?.match(/regexp/); // optional chaining: avoids a TypeError when no script matches
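If you end up doing this from Python rather than Node.js, the same script-mining idea needs nothing beyond the standard library. A sketch, demonstrated on a made-up inline sample standing in for the real page source:

```python
import re

def extract_script_payload(html, marker, pattern):
    """Return the first regexp match found inside a <script> block
    that contains `marker`, mirroring the querySelectorAll approach."""
    for text in re.findall(r"<script[^>]*>(.*?)</script>", html, re.S):
        if marker in text:
            match = re.search(pattern, text)
            return match.group(0) if match else None
    return None

# Made-up sample standing in for the Kayak page source:
sample = ('<script>var x = 1;</script>'
          '<script>reducer: {"reducerPath":"flights"}</script>')
print(extract_script_payload(sample, "reducer", r"\{.*\}"))
# → {"reducerPath":"flights"}
```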
I have been trying to:
Go to:
mdoe.state.mi.us/moecs/PublicCredentialSearch.aspx
Enter a certificate number (for the sake of illustration, you can just search for "Davidson" as the last name).
Click on a link corresponding to "Professional Teaching Certificate".
Copy and paste the resulting table.
The rub seems to be with the JavaScript doPostBack() part, as it requires rendering, I believe, to get the data.
When viewing the source code, see how the href part identifies an individual link like this? (for the 6th link down):
href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gViewCredentialSearchList$ctl07$link1','')
From this:
<td class="MOECSNormal" style="border-color:Black;border-width:1px;border-style:Solid;">Professional Teaching Certificate Renewal</td><td class="MOECSNormal" style="border-color:Black;border-width:1px;border-style:Solid;">
<a id="ContentPlaceHolder1_gViewCredentialSearchList_link1_5" ItemStyle-BorderColor="Black" ItemStyle-BorderStyle="Solid" ItemStyle-BorderWidth="1px" href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$gViewCredentialSearchList$ctl07$link1','')">CC-XWT990004102</a>
</td>
I'm looking for a way (via Python) to get the data I need into a table, given a certificate number and certificate name (i.e. "Professional Teaching Certificate").
I have tried following a tutorial using PyQt4, but installing it alone was traumatic.
Thanks in advance!
You can open the page in a browser, e.g. Chrome, and study how the interaction between the page and the server is done; normally this information can be seen in the Network tab of the developer tools. That way you can write a Python script to replicate the steps, maybe using the requests library,
or
You can use selenium for Python to simulate your browser interaction (including the JavaScript calls) until you get to the page where the data you're interested in lives.
I'm writing a webscraper/automation tool. This tool needs to use POST requests to submit form data. The final action uses this link:
<a id="linkSaveDestination" href='javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("linkSaveDestination", "", true, "", "", false, true))'>Save URL on All Search Engines</a>
to submit data from this form:
<input name="sem_ad_group__destination_url" type="text" maxlength="1024" id="sem_ad_group__destination_url" class="TextValueStyle" style="width:800px;">
I've been using requests and BeautifulSoup. I understand that these libraries can't interact with JavaScript, and people recommend Selenium. But as I understand it, Selenium can't do POSTs. How can I handle this? Is it possible without opening an actual browser the way Selenium does?
Yes. You can absolutely duplicate what the link is doing by just submitting a POST to the proper URL (this is, in reality, the same thing that the JavaScript that fires when the link is clicked eventually does).
You'll find the relevant section in the requests docs here: http://docs.python-requests.org/en/latest/user/quickstart/#more-complicated-post-requests
So, that'll look something like this for your particular case:
payload = {'sem_ad_group__destination_url': 'yourTextValueHere'}
r = requests.post("theActionUrlForTheFormHere", data=payload)
If you're having trouble figuring out what URL it is actually being posted to, just monitor the Network tab (in Chrome dev tools) while you click the link manually; you should be able to find the right request and pull any information you need off of it.
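One wrinkle with ASP.NET WebForms pages like this: the server usually expects the hidden `__VIEWSTATE`/`__EVENTVALIDATION` fields to be echoed back in the POST, so collect them from the form first and merge your own values in. A standard-library sketch of that collection step, demonstrated on a made-up inline sample:

```python
from html.parser import HTMLParser

class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")

# Made-up sample standing in for the real form page:
sample = ('<input type="hidden" name="__VIEWSTATE" value="abc123">'
          '<input name="sem_ad_group__destination_url" type="text">')
collector = HiddenFieldCollector()
collector.feed(sample)

payload = dict(collector.fields)  # hidden fields first
payload["sem_ad_group__destination_url"] = "yourTextValueHere"
print(payload)
# → {'__VIEWSTATE': 'abc123', 'sem_ad_group__destination_url': 'yourTextValueHere'}
```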
Good Luck!
With Selenium you mimic real-user interactions in a real browser: you tell it to locate an input, write text inside it, click a button, and so on. It's a high-level approach; you don't even need to know what is there under the hood, you see what a real user sees. The downside is that a real browser is involved, which, at the least, slows things down. You can, though, automate a headless browser (PhantomJS), or use an Xvfb virtual framebuffer if you don't have the conditions to open up a browser with a UI. Example:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('url here')
button = driver.find_element_by_id('linkSaveDestination')
button.click()
With requests+BeautifulSoup, you go down to the bare metal: using the browser developer tools, you research and analyze what requests are made to the server and mimic them in your code. Sometimes the way a page is constructed and the requests it makes are too complicated to automate, or there are anti-web-scraping techniques in use.
There are pros & cons about both approaches - which option to choose depends on many things.
I saw some guy who had a file (I guess a batch file); on clicking the batch file he was able to log in to multiple sites (perhaps it was done using VB).
I looked for such a script on Google but didn't find anything useful.
I know a bit of C++ and UNIX (also some HTML and JavaScript). I don't know if it can be done on a Windows machine using these languages, but even if it could be done, I think it would be difficult compared to VB, C#, or some other high-level language.
I learned how to open multiple sites using basic windows batch commands enclosed in a batch file like:
start http://www.gmail.com
start http://stackoverflow.com
But I still can't figure out how clicking on the batch file would actually log me in to the sites without even typing the username and password.
Do I need to start learning Visual Basic, .NET, or windows batch programming to do this?
One more thing: can I also use it to log in to remote desktops?
From the term "automatic login" I suppose security (password protection) is not of key importance here.
The guideline for a solution could be to use a JavaScript bookmark (an idea borrowed from a nice game published on M&M's DK site).
The idea is to create a JavaScript file and store it locally. It should enter the login data depending on the current site address. Just an example using jQuery:
// don't forget to include the jQuery code,
// preferably with .noConflict() in order not to break the site's scripts
if (window.location.href.indexOf("mail.google.com") > -1) {
    // Let's log in to Gmail
    jQuery("#Email").val("youremail@gmail.com");
    jQuery("#Passwd").val("superSecretPassword");
    jQuery("#gaia_loginform").submit();
}
Now save this as, say, login.js
Then create a bookmark (in any browser) with this as the url:
javascript:document.write("<script type='text/javascript' src='file:///path/to/login.js'></script>");
Now when you go to Gmail and click this bookmark you will get automatically logged in by your script.
Multiply the code blocks in your script, to add more sites in the similar manner. You could even combine it with window.open(...) functionality to open more sites, but that may get the script inclusion more complicated.
Note: This only illustrates an idea and needs lots of further work, it's not a complete solution.
The code below does just that; it is a working example that logs into a game. I made similar files to log in to Yahoo and a kurzweilai.net forum.
Just copy the login form from the web page's source code. Add value="your user name" and value="your password". Normally the <input> elements in the source code don't have the value attribute; sometimes you will see something like value="".
Save the file as HTML on a local machine and double-click it, or make a bat/cmd file to launch and close it as required.
<!doctype html>
<!-- saved from url=(0014)about:internet -->
<html>
<head>
<title>Ikariam Autologin</title>
</head>
<body>
<form id="loginForm" name="loginForm" method="post" action="http://s666.en.ikariam.com/index.php?action=loginAvatar&function=login">
<select name="uni_url" id="logServer" class="validate[required]">
<option class="" value="s666.en.ikariam.com" fbUrl="" cookieName="" >
Test_en
</option>
</select>
<input id="loginName" name="name" type="text" value="PlayersName" class="" />
<input id="loginPassword" name="password" type="password" value="examplepassword" class="" />
<input type="hidden" id="loginKid" name="kid" value=""/>
</form>
<script>document.loginForm.submit();</script>
</body></html>
Note that <script> is just <script>: I found there is no need to specify that it is JavaScript; it works anyway. I also found that a bare-bones version containing just two input fields (userName and password) also works, but I left a hidden input field etc. just in case. Yahoo mail has a lot of hidden fields; some have to do with password encryption, and it counts login attempts.
Security warnings and other stuff, like the Mark of the Web needed to make it work smoothly in IE, are explained here:
http://happy-snail.webs.com/autologinintogames.htm
I used #qwertyjones's answer to automate logging into Oracle Agile with a public password.
I saved the login page as index.html, edited all the href= and action= fields to have the full URL to the Agile server.
The key <form> line needed to change from
<form autocomplete="off" name="MainForm" method="POST"
action="j_security_check"
onsubmit="return false;" target="_top">
to
<form autocomplete="off" name="MainForm" method="POST"
action="http://my.company.com:7001/Agile/default/j_security_check"
onsubmit="return false;" target="_top">
I also added this snippet to the end of the <body>
<script>
function checkCookiesEnabled(){ return true; }
document.MainForm.j_username.value = "joeuser";
document.MainForm.j_password.value = "abcdef";
submitLoginForm();
</script>
I had to disable the cookie check by redefining the function that did the check, because I was hosting this from XAMPP and I didn't want to deal with it. The submitLoginForm() call was inspired by inspecting the keyPressEvent() function.
You can use Autohotkey, download it from: http://ahkscript.org/download/
After the installation, if you want to open the Gmail website when you press Alt+G, you can do something like this:
!g::
Run www.gmail.com
return
Further reference: Hotkeys (Mouse, Joystick and Keyboard Shortcuts)
Well, it's true that we can use VBScript for what you intend to do.
We can open an application through code, such as Internet Explorer, and navigate to the site you want. Then we can check the element names of the text boxes that take the username and password, set them, and log in. It all works fine from code.
No manual interaction with the website; eventually you end up signing in just by double-clicking the file.
To get you started :
Set objIE = CreateObject("InternetExplorer.Application")
Call objIE.Navigate("https://gmail.com")
This will open an instance of Internet Explorer and navigate to Gmail.
The rest you can learn and apply.