I looked up various questions and answers but unfortunately none of the problems I found dealt with a case that is similar to mine. In a typical question, the JavaScript table builds up directly when the website is loaded. In my case, however, I first have to navigate through the JavaScript module and select several criteria before I get the sought-after result.
This is my case: I have to scrape the exchange rates for various currencies from this website www.globocambio.co. To do that, I have (1) to navigate to “I WANT COLOMBIAN PESO”, (2) select the currency (e.g., “Chilean Peso”), (3) and the collection destination (e.g., “El Dorado International Airport”). Only then the respective exchange rate is being loaded. See this screenshot for illustration. I marked the three selection steps red. Green is the data point that I want to scrape for different currencies.
I am not very familiar with JavaScript but I tried to understand what is going on. Here is what I found out:
Using Chrome DevTools, I investigated the Network activity when loading an exchange rate. There is an XHR called “GetPrice” that requests the price using this URL: https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice and using the following Form Data
ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0¢erId=27&operationType=OperationTypesBuying
I understand that the Form Data contains the information that I initially selected manually:
operationType=OperationTypesBuying: this is the “I WANT COLOMBIAN PESO” option
ISOAOrigen=CLP: this is the “Chilean Peso”
centerId=27: this is the “El Dorado International Airport”
The server responds to my request with the following information:
{“MonedaOrigen":{"ISOA":"CLP","Nombre":null,"Margen":0.1630000000,"Tramo":0.0,"Fixing":2.9000000000},"CantidadOrigen":9000.00,"MonedaDestino":{"ISOA":"COP","Nombre":null,"Margen":0.0,"Tramo":0.0,"Fixing":0.0},"CantidadDestino":21845.70,"TipoCambio":2.42730000000000000000,"MargenOrigen":0.0,"TramoOrigen":0.0,"FixingOrigen":0.0,"MargenDestino":0.0,"TramoDestino":0.0,"FixingDestino":0.0,"IdCentro":"27","Comision":null,"ComisionTramoSuperior":null,"ComisionAplicada":{"CodigoMoneda":null,"CodigoTipoMoneda":0,"ComisionFija":0.0,"ComisionVariable":0.0,"TramoInicio":0.0,"TramoFin":null,"Orden”:0}}
From this response, "TipoCambio":2.42730000000000000000 is then being written on the website using this line of HTML code: <span id="spTipoCambioCompra">2.427300</span>
This means that "TipoCambio" is the value that I am looking for.
So, I have to communicate somehow via R with the server using the Form Data as input variables. Can anyone tell me how to do this?
I mean, understand that I have to combine the URL https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice with the Form Data “ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0¢erId=27&operationType=OperationTypesBuying” somehow but I do not know how it works..
Any help will be appreciated!
Update:
I still have no idea how to solve the above issue, yet. However, I try to approach it with small steps.
Using RSelenium, I am currently trying to find out how to click on the option “I WANT COLOMBIAN PESO”. My idea was to use the following code:
library(RSelenium)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome")
remDr$open()
remDr$navigate("https://www.globocambio.co/en/home")
webElem <- remDr$findElement("id", "tabCompra") #What is wrong here?
webElem$clickElement() # Click on "I WANT COLOMBIAN PESO"
But I get an error message after executing webElem <- remDr$findElement("id", "tabCompra"):
Selenium message:no such element: Unable to locate element: {"method":"css selector","selector":"#tabCompra"}
(Session info: chrome=81.0.4044.113)
For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html
...
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
What am I doing wrong here?
I solved my problem using selenium in Python:
from selenium import webdriver
driver = webdriver.Firefox(executable_path = '/your_path/geckodriver')
driver.get("https://www.globocambio.co/en/")
driver.switch_to.frame("iframeWidget");
elem = driver.find_element_by_id('tabCompra')
elem.click()
elem = driver.find_element_by_id('inputddlMonedaOrigenCompra')
elem.click()
elem.send_keys(Keys.CLEAR)
elem.send_keys("Chilean Peso")
elem.send_keys(Keys.ENTER)
elem.send_keys(Keys.ARROW_DOWN)
elem.send_keys(Keys.RETURN)
elem = driver.find_element_by_id('info-change-compra')
print(elem.text)
I am trying to access the X-men API on wikia, to try and extract the name and image of each character, to then be used on a SPA using javascript.
This is the link too the page on the wiki:
http://x-men.wikia.com/wiki/Category:Characters
I cannot for the life of me figure out how to access the API. It doesn't seem to be RESFTful, and that's all I have any experience in.
Has anyone used the Wikia API successfully before? I can get some articles and such, but nothing useful.
(The documentation is shocking, been searching around for hours.)
Probably you have already found a solution, but I think you should write something like this:
import requests
xmen_url = "http://x-men.wikia.com/api/v1/Articles/List?expand=1&category=Characters&limit=10000"
r = requests.get(xmen_url)
response = r.json()
# print response
a = 0
for item in response['items']:
a += 1
print("{}\t{}\t({})".format(str(a),item['title'].encode(encoding='utf-8'),item['id']))
This will print a list of all the articles of the category Characters (I think there also some subcategories, you should check). If you want to take a deeper look at the json file you can uncomment the commented code.
Hope it helps.
I'm trying to sign in to my Wells Fargo account and scrape my transaction history so that I can use them to track my finances. I am able to do the scraping part if I can get to the HTML of the page. The problem I'm having is getting there and the below code is returning a whole lot of gibberish to me.
####Bring in BeautifulSoup and urllib.
import bs4
import urllib.request
import requests
####Navigate to the website.
url = 'https://connect.secure.wellsfargo.com/auth/login/do'
payload = {"j_username":"USERNAME", "j_password":"PASSWORD"}
r = requests.post(url, payload)
print(r.text)
This code is outputting the following:
<html><head><meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="-1"/>
<meta http-equiv="CacheControl" content="no-cache"/>
<script>
(function(){
var securemsg;
var dosl7_common;
window["bobcmn"] = "1011200000002200000001300000021application/x-www-form-urlencoded3000000088adfa450300000008TSPD_101300000014%2fauth%2flogin%2fdo300000000300000006/TSPD/300000008TSPD_101300000005https3000000b6#sCmnToken#0BC26lnGAWSD9m6NkEoMZy0dIjA7Os6O4oLerWkImSHetiQqPjvoid03xpkXMNwHZ4wUmjd9+FeNk7M7zEe5ESlixC/1O8E7X61l10gL4ddUAhMNR4LaIYlGkq+hckjmRwTXudNvohk90GvOs8Ea9fFIoAAAAAE=#eCmnToken#200000000";
try{(function(){try{var jS,JS,LS=1,oS=1,OS=1,zS=1,S_=1,__=1,i_=1,I_=1,j_=1;for(var L_=0;L_<JS;++L_)LS+=2,oS+=2,OS+=2,zS+=2,S_+=2,__+=2,i_+=2,I_+=2,j_+=3;jS=LS+oS+OS+zS+S_+__+i_+I_+j_;window.i===jS&&(window.i=++jS)}catch(o_){window.i=jS}var O_=window.sdkljshr489=!0;function z_(S){window.sdkljshr489&&S&&(O_=!1);return O_}function Z_(){}z_(window[Z_.name]===Z_);z_("undefined"===window.vodsS0);window.vodsS0=null;z_(/\x3c/.test(function(){return"\x3c"})&/x3d/.test(function(){return"0";"x3d"}));
var s_=/mobi/i.test(navigator.userAgent),Si=+new Date,_i=s_?3E4:3E3;function ii(){return z_(Si+_i<(Si=+new Date))}
(function Ii(){var J=!1;function l(J){for(var l=0;J--;)l+=L(document.documentElement,null);return l}function L(J,l){var Z="vi";l=l||new z;return _S(J,function(J){J.setAttribute("data-"+Z,l.SS());return L(J,l)},null)}function z(){this.O=1;this.L=0;this._=this.O;this.j=null;this.SS=function(){this.j=this.L+this._;if(!isFinite(this.j))return this.reset(),this.SS();this.L=this._;this._=this.j;this.j=null;return this._};this.reset=function(){this.O++;this.L=0;this._=this.O}}var Z=!1;function s(J,l){var L=
document.createElement(J);l=l||document.body;l.appendChild(L);L&&L.style&&(L.style.display="none")}function iS(l,L){L=L||l;var z="|";function s(J){J=J.split(z);var l=[];for(var L=0;L<J.length;++L){var Z="",lS=J[L].split(",");for(var SS=0;SS<lS.length;++SS)Z+=lS[SS][SS];l.push(Z)}return l}var _S=0,IS="datalist,details,embed,figure,hrimg,strong,article,formaddress|audio,blockquote,area,source,input|canvas,form,link,tbase,option,details,article";IS.split(z);IS=s(IS);IS=new RegExp(IS.join(z),"g");while(IS.exec(l))IS=
new RegExp((""+new Date)[8],"g"),J&&(Z=O_),++_S;return L(_S&&1)}function _S(J,l,L){(L=L||Z)&&s("div",J);J=J.children;var z=0;for(var _S in J){L=J[_S];try{L instanceof HTMLElement&&(l(L),++z)}catch(IS){}}return z}iS(Ii,l)})();window.oi={iI:"08c787b5a40180002943d30328de8438de8cc553d459dcd4fc6c4cb17feaa34f085900356d674a1888119e0ea122f11994fc63fbabf471ce1f60053949777f087711d376633d1c30cd2e2f14295017cd8afeedacf0c4783d8b9ec0abec9808a830fa17d4cc351f649688f2b9c98cc0961ddcaf13fb0e7020486252f76f751366cdb10741f04ad6fd"};function _(S){return 753>S}function I(){var S=arguments.length;for(var J=0;J<S;++J)arguments[J]-=38;return String.fromCharCode.apply(String,arguments)}function O(S){return S.toString(36)}(function ji(J){return J?0:ji(J)*ji(J)})(ii());var v;})();}finally{sdkljshr489=false;ie9rgb4=void(0);};
eval((ie9rgb4=function (){var m='function () {/*fQb f_TcC}-di`U_V YU)bWR$+dbikuVe^SdY_^uvkdbikfQb ZCy:Cy<C-!y_C-!y?C-!+V_bufQb <O-}+<O,:C+xx<Ov<Cx-"y_Cx-"y?Cx-#+ZC-<Cx_Cx?C+gY^T_g{Y---ZCssugY^T_g{Y-xxZCvmSQdSXu_OvkgY^T_g{Y-ZCmfQb ?O-gY^T_g{cT[\\ZcXb$()-n}+Ve^SdY_^ jOuCvkgY^T_g{cT[\\ZcXb$()ssCssu?O-n!v+bUdeb^ ?OmVe^SdY_^ JOuvkmjOugY^T_gKJO{^Q]UM---JOv+jOuoe^TUVY^UTo---gY^T_g{f_TcC}v+gY^T_g{f_TcC}-^e\\\\+jOu|Lh#S|{dUcduVe^SdY_^uvkbUdeb^oLh#Somvs|h#T|{dUcduVe^SdY_^uvkbUdeb^o}o+oh#Tomvv+\r\nfQb cO-|]_RY|Y{dUcdu^QfYWQd_b{ecUb1WU^dvyCY-x^Ug 4QdUyOY-cO/#5$*#5#+Ve^SdY_^ YYuvkbUdeb^ jOuCYxOY,uCY-x^Ug 4QdUvvmuVe^SdY_^uvkfQb C-kTUSbi`d*Ve^SdY_^uCvkdbikbUdeb^ :C?>{`QbcUuVe^SdY_^uCvkC-C{c`\\Yduo\\ov+fQb :-oo+V_bufQb \\-}+\\,C{\\U^WdX+xx\\v:x-CdbY^W{Vb_]3XQb3_TUuCK\\Mv+bUdeb^ :muCvvmSQdSXu\\vkmmm+bUdeb^ C-kS_^VYWebQdY_^*C{TUSbi`duo!"#\\#$\\)\'\\))\\!!&\\!}%\\!!(\\!}!\\#$\\%(\\#$\\!!}\\!!!\\#$\\$$\\#$\\!}}\\!}!\\)(\\!!\'\\!}#\\!}#\\!}%\\!!}\\!}#\\#$\\%(\\#$\\!!}\\!!!\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\$)\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%}\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%!\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%"\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\!"%ovmmvuv+\r\ncUSebU]cW-kcZC*Ve^SdY_^uCvkbUdeb^ cUSebU]cWK?u"(()\'vMucUSebU]cW{jYuuOu!&}vy}vyCyOu)""v/}*!vyVe^SdY_^uvkbUdeb^ CdbY^WK9u!$}y!%"y!$)y!$\'y!}%y!$"y!#%y!%"y!}%y!$)y!#(y!#)vMu=QdXK?u"&"}&}!!vMu=QdXK?u!&%}$\'#\'#$vMuvwuOu))#v/#$"*"%&vxuOu""$v/!*}vvruOu")\'v/"%&*#""vvmvK?u)!("#)vMuoovmyjC*Ve^SdY_^uCvkbUdeb^uuCsuOu"%#v/"%%*""}vv,,uOu)$\'v/"!*"$vluCsuOu$(&v/&%"(}*&&%%%vv,,uOu)}!v/%*(vlC..uOu(()v/)*(vsuOu&&"v/&%"(}*\'##\'\'vlC..uOu%%)v/"$*#"vsuOu!&$v/"%%*#"\'vv...uOu!)"vy}vmy9}*Ve^SdY_^uCy:vkV_bufQb \\-ooy<-uOu%&)vy\r\n}v+<,CK?u!")$#))"}%vM+<xxv\\x-CdbY^WKoLe}}&&b_]3Lh&(Qb3_TUoMuCK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMuu<xCK?u!")$#))"}%vMz:vrCK?u!")$#))"}%vMvv+bUdeb^ \\myYZC*Ve^SdY_^uCy:vkbUdeb^ cUSebU]cW{9}uCyCK?u!")$#))"}%vMz:vmy<O*Ve^SdY_^uCy:vkYVuCK?u!")$#))"}%vMn-:K?u!")$#))"}%vMvdXb_g cUSebU]cW{:CuCvycUSebU]cW{:Cu:vyoo+V_bufQb \\-ooy<-uOu\'&#vy}v+<,CK?u!")$#))"}%vM+<xxv\\x-CdbY^WKoLe}}&&bLh&V]Le}}$#XLh&!bLe}}$#_Lh&$UoMuCK9u!#\'y\r\n!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMu<vN:KoLe}}&#XQbLh$#_TU1doMu<vv+bUdeb^ \\my<C*Ve^SdY_^uCy:vkbUdeb^uuC...uOu""!vy}vvxu:...uOu"")vy}vvsuOu)$&v/"!$\'$(#&$\'*$")$)&\'")%vv...uOu&$#vy}vmyO:*Ve^SdY_^uCy:vkbUdeb^uuC...uOu()(vy}vvz:suOu!}&v/$")$)&\'")%*"!$\'$(#&$\'vv...uOu#\'}vy}vmy_%*Ve^SdY_^uCy:y\\vkdbikYVuCK?u!")$#))"}%vMn-uOu("#v/""*!&vvdXb_goo+YVu:K?u!")$#))"}%vMn-uOu&}&v/(*&vvdXb_goo+fQb <-cUSebU]cW{c_uCv+<KOu()vy}M-cUSebU]cW{jCu<KOu\'(&vy}Mv+<KOu\'(#v/}*!M-cUSebU]cW{jCu<KOu)!(v/\r\n}*!Mv+<KOu)"(v/!*"M-cUSebU]cW{jCu<KOu&\'}vy"Mv+<KOu\'(%v/"*#M-cUSebU]cW{jCu<KOu!\')vy#Mv+fQb j-cUSebU]cW{c_u:vyJ-cUSebU]cW{jCujKOu\'$}vy}Mvyc-cUSebU]cW{jCujKOu(\'}v/}*!MvyYC-u\\/Ou)"&v/"!$\'$(#&$\'*$"$\'})\'"#}$*uOu)&#vy}vv...uOu$\'#vy}v+YVu\\vV_bufQb OC-Ou""!v/!%*!"+OC.-uOu"%#vy}v+OCzzvfQb CC-cUSebU]cW{<CuJ,,uOu#\'%v/$*#vNJ...uOu$#%v/%*#vyJvy\\C-cUSebU]cW{<CuYCy<KYC...uOu""#v/!!*!}vsuOu"}\'v/#*"vMvyc-cUSebU]cW{O:ucyCCN\\CvyYC-cUSebU]cW{O:uYCyOu"}%v/"&%$$#%\'&)*"!$!&$}&("vycC-cUSebU]cW{<Cuc,,uOu\'\'\'vy$vNc...uOu!\')v/\r\n%*"vycvyJC-
*************************************************************
""}"!&2) %%}%"&"6 3%21#225 2"24}2"( "22$%1)" %32#&1}$ 3"4\'661\' 2%4}36#! "34))5(2 %24515!4 )2&$3"2} 53""& \'%&11#)3 }"&4)#}1 )3})}&1) 52}5# \'"}\'&\'(% }%}}%\'!# )%26$1(" 5"2(\'1!$ \'22!"215 }32&!2#( )"4"(5)2 5%4%25}4 \'343562\' }24246"! (&4#4"4$ 6!4$5"$" &(442#6( !641(#&5 (!25!&34 6&2)"&%2 &62}\'\'5! !(2\'$\'\'\' ((}(%15& 66}6&1\'} &&}ç !!}!}2%3 (6&%)566 6(&"15&) &!&2664# !&&336$% 1}}15"\'( 4\'}44"55 $5}$(#%$ #)}#2#3" 1\'&\'"&&! 4}&}!&6\' $)&)$\'$4 #5&5\'\'42 154!&1$1 4)4&%143 $}46}2&& #\'4(#26} 1)2315%# 4522)53% $\'2"36\'6 #}2%665) 24246"!3 31213"(1 %#2#)##} "$2$1#1& 214}#&}% 344\'}&)# %$45%\'") "#4)&\'26 2#&&\'1"5 3$&!$12( %4&(!2}" "1&6"2)$ 2$}225#\' 3#}3(51! %1}%46!2 "4}"56(4oK?u!\'$#))!)(#vMuuu\\ON\r\n9CK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMu\\vvsuOu$&&v/"%%*#\'!vvwuOu\'!!v/)*&vyOu"}!v/(*!!v+\\ON-uOu$!$vyz!v+\\O-=QdXK?u!##($vMu\\Ov+\\On-`QbcU9^duJv/uJCxxycUdDY]U_eduCyuOu!#}vy}vvv*:uvmU\\cU :uvmVe^SdY_^ :uvkfQb C-cUSebU]cWK?u!#"$()#vMu:Ox9u)&vxcxoLe}}#QoxYCx9u)&vx\\OyoLh#}Le}}##ov+T_c\\\'OS_]]_^{9Ju<yCyooyOu"($v/%5#*$}\'#yYYuvvmV_bufQb \\-cUSebU]cW{?CugY^T_g{_Y{Y9yoLe}}#}Lh#!ovy\\-T_c\\\'OS_]]_^{::u\\yn!vy<-cUSebU]cW{:Cu\\KOu&#"vy}Mvy\r\nj-\\KOu!!#v/!*}MyJ-\\KOu$&v/"*!Myc-\\KOu!)%v/#*"MyYC-\\KOu&&#vy$MyOC-\\KOu(%$vy%MyCC-\\KOu$}\'v/&*#MK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMuuOu$$&vy}vvy\\C-1bbQiujvycC-=QdXK?u##")&vMuCCzOCKoLe}}&#Lh&(Le}}&!Lh\'"Le}}$#Lh&VLe}}&$Lh&%Le}}$!Lh\'$oMuuOu&%$vy}vvxuOu!)"v/!*}vyjvyJC-uOu\'"%vy}v+JC,j+JCxxv\\CKJCM-OC+fQb JC-uOu)"}vy}vy:Oy9Cy\\O+cUdDY]U_eduCyuOu"\'\'vy}vvmv+\r\nVe^SdY_^ OuCvkbUdeb^ \'%#.CmVe^SdY_^ 9uvkfQb C-QbWe]U^dc{\\U^WdX+V_bufQb :-}+:,C+xx:vQbWe]U^dcK:Mz-#(+bUdeb^ CdbY^W{Vb_]3XQb3_TU{Q``\\iuCdbY^WyQbWe]U^dcvmVe^SdY_^ ?uCvkbUdeb^ C{d_CdbY^Wu#&vmuVe^SdY_^ ZYu:vkbUdeb^ :/}*ZYu:vwZYu:vmvuYYuvv+fQb f+mvuv+mVY^Q\\\\ikcT[\\ZcXb$()-VQ\\cU+YU)bWR$-f_YTu}v+m+*/;}'.slice(15,-4);for(var i=0,c=8,j=0,l,r='';l=m.charCodeAt(i);++i)c=String.fromCharCode(l<33||l>=126?l:(93+l-((-76E-3+''+({}).a).slice(7).charCodeAt(j%'1')))%93+33),r+=c,j-=c.indexOf('\x0d');return r;})());
})();
</script>
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>
I apologize for the hideous formatting but I didn't know what to do with it. Also, I removed a large, arbitrary portion in the middle that I replaced with the asterisks for the sake of length.
To me, the key thing I'm seeing is "Please enable JavaScript to view the page content." Is this output actually JS and how do I handle whatever it is with Python? I simply have no clue what this is telling me and I greatly appreciate any help you can provide.
Thanks.
I know that a great deal of time has passed on this, but I can give some closure here. What you're seeing is bot-defeat code sold by the good fellows at F5 Networks, Inc., designed to prevent naive webcrawlers and scrapers from being able to access sites that use it.
Briefly, this is obfuscated Javascript which calculates a value through a series of iterative steps which exercise various browser-specific Javascript capabilities, and makes use of some rather rude Javascript language behavior. That value is sent back to Wells Fargo as cookies and part of the webforms required for navigation. Just using a headless browser is not going to cut it - there are a few tricks in the calculation designed specifically to counter headless browsers and the Javascript engines that work with them. Missing any of the tricks will not cause any sort of failure; instead, it will just throw off the end result in a way which makes it difficult for you to tell what you missed.
It is, in theory, possible to decipher the code and emulate all the calculations in the language of your choice; I know of a successful countermeasure written by a data aggregation company, but the code is not open for public perusal. Alternately, you could figure out what you need to correctly execute it as-is in a JS interpreter. I don't remember all the details, but it's easier than it looks. You don't need to reverse engineer the whole thing, you just need to run it in the right environment. You need a dummy window object and more dummies for whatever else the code is looking for like navigator.userAgent in your environment, plus maybe other things.
For practical purposes, it's probably not worth it to write a countermeasure. Ask to be whitelisted if you're an organization.
If you are interested in the challenge, here is a (perhaps obvious) starting point - the long string of gibberish in the eval((ie9rgb4=function (){var m='function () ... .slice ... portion is ciphered code. The immediately following for loop contains character transformations. You can replicate the operation being done in that loop to decipher the first level of obfuscation. Log on to the site through your normal browser with a debugger active, observe the requests and cookies sent for an idea of the final goal you're looking for, and try to correlate that with the JS code you see.
You may also find the following mapping of values useful at some point:
{"$$$", "7"},
{"$$$$", "f"},
{"$$$_", "e"},
{"$$_", "6"},
{"$$_$", "d"},
{"$$__", "c"},
{"$_", "constructor"},
{"$_$", "5"},
{"$_$$", "b"},
{"$_$_", "a"},
{"$__", "4"},
{"$__$", "9"},
{"$___", "8"},
{"_", "u"},
{"_$", "o"},
{"_$$", "3"},
{"_$_", "2"},
{"__", "t"},
{"__$", "1"},
{"___", "0"}
It can be used by using Splash (another JS renderer besides Selenium). Since I use Scrapy, I use Scrapy-Splash. In my Scrapy spider, I use Splash but not just that. The Splash request should be helped with a lua script to get extra command to get cookies from the web page or else it will still get blocked by the F5 security mechanism. After getting the cookies, re-request the page using the generated cookies, and done!
The code in Scrapy will be like this:
def start_requests(self):
lua_script = '''
function main(splash)
local url = splash.args.url
assert(splash:go(url))
assert(splash:wait(2))
return {
html = splash:html(),
cookies = splash:get_cookies(),
}
end
'''
yield SplashRequest(self.start_urls[0], self.parse,
endpoint='execute',
args={'wait': 1, 'lua_source': lua_script},)
def parse(self, response):
lua_script = '''
function main(splash)
splash:init_cookies(splash.args.cookies)
local url = splash.args.url
assert(splash:go(url))
assert(splash:wait(2))
return {
html = splash:html(),
}
end
'''
yield SplashRequest(self.start_urls[0], self.parse_result,
endpoint='execute',
args={'wait': 1, 'lua_source': lua_script},dont_filter=True)
def parse_result(self, response):
# Do your scrapy parsing thing here
Some websites that make use of javascript can't be scraped just by downloading the html and passing it to an html parser because the content is simply not there. Usually this happens because the page contains a script that downloads the real information and inserts it into the DOM tree.
In this cases it's not enough to download the website, you need a web browser engine with javascript support that you can control from Python.
Here there is a list of projects you could use for this: https://github.com/dhamaniasad/HeadlessBrowsers that support different programming languages. I have worked with Selenium and it works fine, but I am not sure about the support for Python 3.5.
I am writing a gadget for Jira with some configuration options. One of these configuration options is a "project or filter picker".
My problem lies in the part, when I want to reconfigure the gadget's preferences. I have read the code of the timesince-gadget as an example and I think the relevant part is the following:
if (/^jql-/.test(gadget.getPref("projectOrFilterId"))){
projectAndFilterPicker =
{
userpref: "projectOrFilterId",
type: "hidden",
value: gadgets.util.unescapeString(this.getPref("projectOrFilterId"))
};
} else {
projectAndFilterPicker = AJS.gadget.fields.projectOrFilterPicker(gadget, "projectOrFilterId", args.options);
}
Basicly I've copied the code from the timesince-gadget. Unfortunately even if already configured, the javascript always enters the else part.
A problem is, that I ve no experience with jql and don't totally understand the if clause.
But usually (e.g. when calling the rest api and processing the config infos)
gadget.getPref("projectOrFilterId")
returns a string containing the id of the picked project or filter.
Question is now: How can I make my gadget remember the last configuration like it's done with some many other Jira gadgets?
I really hope anyone can help me with that.
It turnes out, the answer is even simplier then I thought.
First: In the descriptor you can totally forget the if part from above. Just
var projectAndFilterPicker = AJS.gadget.fields.projectOrFilterPicker(gadget, "projectOrFilterId", args.options);
is needed.
Second: Retrieve the project's or filter's name in your rest resource, which shouldn't be a problem, since you already want to use the processed id. Then return this name back to the view part of your javascript and type in something like
this.projectOrFilterName = args.myrestclasskey.projectOrFilterName;
And tada: reconfiguration will display the old configured name!
I had this problem once when I forgot to specify the option in the Gadget XML file. I solved it by adding this to the XML:
<UserPref name="projectOrFilterId" datatype="hidden"/>