Generate static SEO pages for an AngularJS + Spring MVC website

I have a project using Spring MVC + AngularJS. All the data is dynamic.
The app has a big database of locations.
For SEO purposes, I need to generate a static page for each location and serve it at an SEO-friendly URL (e.g. /localhost/path1/path2/here-is-very-friendly-name).
What is the best way to do this?
Should I just generate the pages separately and put them in a folder separate from the main app (and if so, what's the best way to do that), or can I use Spring/Angular for it?
For additional info:
each location object contains id, name, latitude, longitude, address, district, city, country.
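A minimal sketch of turning one location record into an SEO-friendly path. The /country/city/name layout, the appended id, and the slug rules are assumptions, not something the question prescribes; non-Latin names would need real transliteration, which this sketch does not attempt.

```javascript
// Hypothetical helper: lower-case, strip accents, and join words with hyphens.
function slugify(text) {
  return String(text)
    .normalize("NFKD")                // split accented letters into base letter + diacritic
    .replace(/[\u0300-\u036f]/g, "")  // drop the diacritics
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading/trailing hyphens
}

// Assumed layout: /<country>/<city>/<name>-<id>. The id keeps paths unique
// when two locations share a name in the same city.
function locationPath(location) {
  const parts = [location.country, location.city, `${location.name} ${location.id}`];
  return "/" + parts.map(slugify).join("/");
}

const example = {
  id: 42,
  name: "Café de Flore",
  latitude: 48.854,
  longitude: 2.3325,
  address: "172 Boulevard Saint-Germain",
  district: "6e arrondissement",
  city: "Paris",
  country: "France",
};

console.log(locationPath(example)); // prints /france/paris/cafe-de-flore-42
```

The same function can be used both when generating the static files and when building links inside the Angular app, so the two never disagree about a location's URL.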

This is from my own Angular/SEO experience.
You will have to make lots of changes!
1) Remove # from the URL
app.config(['$locationProvider', function ($locationProvider) {
  $locationProvider.html5Mode({
    enabled: true,
    requireBase: false
  });
}]);
2) Review your MVC routing
Until now you probably had a single HomeController returning index.cshtml to boot your Angular app.
After removing # from Angular routing, you have to register a server-side route (MapRoute) for every one of your client-side routes.
That's because the first time you visit a route like www.site.com/any_route, the Angular app isn't loaded yet, so the request is handled by the server's MVC routing. Only after that does $routeProvider do its job.
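The rule in step 2 boils down to one decision the server makes for every incoming request: should it answer with the page that boots the Angular app? A sketch of that decision as a plain function; the "/api/" prefix for JSON endpoints is an assumption about how the backend is laid out. In ASP.NET MVC this logic lives in the route table; in Spring MVC it is typically a controller method that forwards these paths to the index page.

```javascript
// Decide whether a request should get the app shell (the page that boots Angular).
function shouldServeAppShell(method, path) {
  if (method !== "GET") return false;              // form posts, API writes, etc.
  if (path.startsWith("/api/")) return false;      // assumed prefix for JSON endpoints
  const lastSegment = path.split("/").pop();
  if (lastSegment.includes(".")) return false;     // static files such as /js/app.js
  return true;                                     // client-side routes such as /paris/cafe-de-flore
}

console.log(shouldServeAppShell("GET", "/paris/cafe-de-flore")); // true
console.log(shouldServeAppShell("GET", "/api/locations/42"));    // false
console.log(shouldServeAppShell("GET", "/js/app.js"));           // false
```

Note the "contains a dot" check treats any dotted last segment as a file, so friendly URLs should not contain dots.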
3) Use MVC variables for meta tags
For better indexing, and to play nicely with crawlers and bots, initialize the website's meta tags from server-side (MVC) variables.
If you set your page title with an Angular binding like <title>{{title}}</title>, then whenever you share the page on social networks you will see {{title}}, because social networks don't render JavaScript on the pages they fetch.
<title>@ViewBag.title</title>
<meta name="Description" content="@ViewBag.description">
<meta name="Keywords" content="@ViewBag.keywords">
<meta property="og:title" content="@ViewBag.title" />
<meta property="og:description" content="@ViewBag.description" />
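What step 3 does on the server is render real values into the head before any JavaScript runs. A language-neutral sketch of that rendering step as a plain function; the `meta` field names mirror the ViewBag values above and are otherwise assumptions. The escaping matters because titles and descriptions come from the database.

```javascript
// Escape the characters that are significant inside HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")   // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render the head tags from step 3 with concrete values instead of {{bindings}}.
function renderMetaTags(meta) {
  const title = escapeHtml(meta.title);
  const description = escapeHtml(meta.description);
  const keywords = escapeHtml(meta.keywords);
  return [
    `<title>${title}</title>`,
    `<meta name="Description" content="${description}">`,
    `<meta name="Keywords" content="${keywords}">`,
    `<meta property="og:title" content="${title}" />`,
    `<meta property="og:description" content="${description}" />`,
  ].join("\n");
}

console.log(renderMetaTags({
  title: "Café de Flore, Paris",
  description: "Opening hours & directions",
  keywords: "cafe, paris",
}));
```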
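4) Replace the MVC-rendered meta tags with Angular bindings
Our app is an SPA, so once Angular has loaded we are outside the MVC playground: the server-rendered values stay fixed while the user navigates between routes.
So we remove the MVC-rendered tags and re-create them bound to Angular variables.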
// Run once when the app starts. angular.element('selector') lookups require
// full jQuery loaded before angular.js; jqLite alone does not support selectors.
app.run(['$rootScope', '$compile', function ($rootScope, $compile) {
  // Remove the tags rendered on the server in step 3.
  angular.element('title').remove();
  angular.element('meta[name="Description"]').remove();
  angular.element('meta[name="Keywords"]').remove();
  angular.element('meta[property="og:title"]').remove();
  angular.element('meta[property="og:description"]').remove();

  // Re-create them bound to $rootScope.meta (set it, e.g., in each route's controller).
  var description = angular.element('<meta name="Description" content="{{meta.description}}">');
  angular.element('head').prepend(description);
  var keyword = angular.element('<meta name="Keywords" content="{{meta.keywords}}">');
  angular.element('head').prepend(keyword);
  var titleOg = angular.element('<meta property="og:title" content="{{meta.title}}" />');
  angular.element('head').prepend(titleOg);
  var descriptionOg = angular.element('<meta property="og:description" content="{{meta.description}}" />');
  angular.element('head').prepend(descriptionOg);
  var title = angular.element('<title ng-bind="meta.title"></title>');
  angular.element('head').prepend(title);

  // Compile the new elements against $rootScope so the bindings become live.
  $rootScope.$applyAsync(function () {
    $compile(title)($rootScope);
    $compile(description)($rootScope);
    $compile(keyword)($rootScope);
    $compile(titleOg)($rootScope);
    $compile(descriptionOg)($rootScope);
  });
}]);
5) Use JSON-LD for dynamic content
If you are familiar with Schema.org, you'd better use JSON-LD rather than the other formats, because search engine bots can pick up and parse <script type="application/ld+json"></script> blocks that are inserted dynamically after the page has loaded.
Check the Schema.org vocabulary to find the type that most closely matches your data structure.
For example, here is my company's JSON-LD:
<script type="application/ld+json">
{
  "@context" : "http://schema.org",
  "@type" : "Organization",
  "name" : "داده کاوان امیرکبیر",
  "alternateName" : "ADM | Amirkabir Data Miners",
  "description": "Amirkabir Data Miners | Developer of web-based software such as the online accounting system 'Kaj System', the project management system 'Task Man', and ...",
  "url" : "https://adm-co.net",
  "email": "info@adm-co.net",
  "logo": {
    "@type": "ImageObject",
    "url": "http://khoonamon.com/images/ADM_Logo.png",
    "caption": "Amirkabir Data Miners logo",
    "width": "2480px",
    "height": "1459px"
  },
  "telephone": "+98-21-44002963",
  "address": "Tehran, Ayatollah Kashani Street, corner of Aghil Street, No. 380, 2nd floor",
  "contactPoint" : [{
    "@type" : "ContactPoint",
    "telephone" : "+98-21-44002963",
    "contactType" : "customer service",
    "contactOption" : "TollFree",
    "areaServed" : "IR",
    "availableLanguage" : "Persian"
  }],
  "sameAs" : [
    "https://google.com/+ADMcoNet-GPlus",
    "https://www.linkedin.com/company/adm-amirkabir-data-miners-?trk=biz-companies-cym",
    "https://instagram.com/AmirkabirDataMiners/",
    "https://www.facebook.com/AmirkabirDataMiners",
    "http://www.pinterest.com/AmirkabirDM/",
    "https://twitter.com/AmirkabirDM",
    "https://www.youtube.com/channel/UCQxP0vZA05Pl9GlyXXQt14A/about"
  ]
}
</script>
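For the location pages in the question, schema.org's Place type with GeoCoordinates and PostalAddress is a close fit. A sketch of that mapping from the question's fields; putting `district` into `addressRegion` is an assumption, so pick whichever PostalAddress property matches what "district" means in your data.

```javascript
// Hypothetical mapping from a location record to a schema.org Place.
function locationJsonLd(location, pageUrl) {
  return {
    "@context": "https://schema.org",
    "@type": "Place",
    "@id": pageUrl,
    name: location.name,
    url: pageUrl,
    geo: {
      "@type": "GeoCoordinates",
      latitude: location.latitude,
      longitude: location.longitude,
    },
    address: {
      "@type": "PostalAddress",
      streetAddress: location.address,
      addressRegion: location.district,   // assumption, see above
      addressLocality: location.city,
      addressCountry: location.country,
    },
  };
}

// Serialize for a <script type="application/ld+json"> tag. Escaping "<" as \u003c
// keeps a value such as "</script>" from closing the tag early; JSON parsers
// read \u003c back as "<".
function jsonLdScriptTag(data) {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const place = {
  id: 42, name: "Café de Flore", latitude: 48.854, longitude: 2.3325,
  address: "172 Boulevard Saint-Germain", district: "6e arrondissement",
  city: "Paris", country: "France",
};
console.log(jsonLdScriptTag(locationJsonLd(place, "https://example.com/france/paris/cafe-de-flore-42")));
```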

Have you tried tools like SEO.js (http://getseojs.com/) and prerender.io (https://prerender.io/)?

I haven't tried it myself, but PhantomJS would likely be the best option for this.
You'll need a dictionary of the endpoints you want to render and their corresponding static file paths. You'd then iterate over each endpoint, render the given path with PhantomJS, and save the output into the static file.
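A minimal sketch of that loop as a Node script. The renderer and file writer are passed in as functions (`renderPage` and `writeFile` are hypothetical names); in practice `renderPage` would drive PhantomJS or another headless browser and resolve to the rendered HTML. The `<outDir>/<path>/index.html` layout is an assumption chosen so a plain static file server can serve the friendly URLs.

```javascript
// Build the endpoint -> static file dictionary.
function buildSnapshotPlan(baseUrl, paths, outDir) {
  const plan = {};
  for (const path of paths) {
    // "/paris/cafe-de-flore" -> "<outDir>/paris/cafe-de-flore/index.html", "/" -> "<outDir>/index.html"
    plan[baseUrl + path] = `${outDir}${path.replace(/\/$/, "")}/index.html`;
  }
  return plan;
}

// Render each endpoint in turn and save the output to its static file.
async function snapshotAll(plan, renderPage, writeFile) {
  for (const [url, file] of Object.entries(plan)) {
    const html = await renderPage(url);  // fully rendered DOM serialized as HTML
    await writeFile(file, html);
  }
}

// Usage with stub functions standing in for the headless browser and the file system.
const plan = buildSnapshotPlan("http://localhost:8080", ["/paris/cafe-de-flore", "/"], "static");
const written = {};
snapshotAll(
  plan,
  async (url) => `<html><body>rendered ${url}</body></html>`,
  async (file, html) => { written[file] = html; }
).then(() => console.log(Object.keys(written)));
```

Rendering sequentially keeps only one headless page open at a time, which is gentler on memory for a large set of locations.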
From what I gather from your question, you haven't actually used these paths on the front end of your Angular app yet? If that's the case, the other option is to render them server side with just Spring.
The issue here is that Angular (1.x) was not built with isomorphism (client- and server-side rendering) in mind. For any server-side rendering of pages that haven't been built yet, the best option is to render them with Spring.
Another option is upgrading to Angular 2, which is isomorphic with the help of Angular Universal. This works well if Spring is not used for rendering and only serves as an API for your app.

I didn't do this in Java but in C#; please let me know if you get it working in Java.
I found a piece of code about PhantomJS, and:
As our friend said, we enabled HTML5 mode and rewrote all URLs in C# using the URL Rewrite engine, which is new to IIS. I kept one specific rule for Google's requests, which come with a specific query parameter (I couldn't find it on the net again, and there isn't much time before work). I redirect those requests to one specific page, which reads the redirected URL, passes it to PhantomJS, and waits for the result to come back (you need to know how to run a process and read back its console output). Then we removed the ng-app attribute from the rendered page and passed the raw page to the Google crawler. (There are two kinds of redirect codes, one permanent and the other temporary; only one of them worked, at least at that time.) The page itself is ugly to look at, but Google only looks at your schema and structure, so everything is fine with it.
It's been a long time since I worked with Java, so I can't implement it there; I've only regained a little knowledge of Spring, so I'd appreciate it if you let me know about any updates.
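The answer does not name the query parameter. It was most likely `_escaped_fragment_` from Google's AJAX crawling scheme (since deprecated by Google), which HTML5-mode pages opted into with `<meta name="fragment" content="!">`; treat that as an assumption. A sketch of the two pieces of logic around the PhantomJS call: mapping the crawler's URL back to the real page URL, and stripping `ng-app` from the rendered snapshot (presumably so Angular does not bootstrap again over already-rendered content).

```javascript
const ESCAPED_FRAGMENT = "_escaped_fragment_";

// Returns the URL the crawler actually wants rendered, or null for a normal request.
//   ?_escaped_fragment_=           -> same URL without the parameter (HTML5-mode pages)
//   ?_escaped_fragment_=key=value  -> URL + "#!key=value" (hashbang pages)
function crawlerTargetUrl(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.searchParams.has(ESCAPED_FRAGMENT)) return null;
  const fragment = url.searchParams.get(ESCAPED_FRAGMENT);
  url.searchParams.delete(ESCAPED_FRAGMENT);  // drops the "?" too if nothing is left
  return url.toString() + (fragment ? "#!" + fragment : "");
}

// Naive removal of ng-app / data-ng-app (with or without a value) from rendered HTML.
function stripNgApp(html) {
  return html.replace(/\s(?:data-)?ng-app(?:="[^"]*"|='[^']*')?(?=[\s\/>])/g, "");
}

console.log(crawlerTargetUrl("https://site.com/paris/cafe?_escaped_fragment_="));
console.log(stripNgApp('<html ng-app="myApp"><body>...</body></html>'));
```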

Related

how to filter and map json file into html?

I use the code below to filter the JSON file.
<script>
fetch("workers.json")
  .then((response) => response.json())
  .then((json) => console.log(json.filter(function (item) {
    return item.name == "sam";
  })));
</script>
How do I map this filtered output into the HTML file?
My JSON file:
[{
  "name": "sam",
  "age": "26",
  "salary": "20000",
  "portfolio": "https://www.example.com/1"
}, {
  "name": "tony",
  "age": "30",
  "salary": "30000",
  "portfolio": "https://www.example.com/2"
}, {
  "name": "sam",
  "age": "24",
  "salary": "15000",
  "portfolio": "https://www.example.com/3"
}]
Required output:
name: sam age: 24 portfolio: https://www.example.com/3 salary: 15000
name: sam age: 26 portfolio: https://www.example.com/1 salary: 20000
I've done this two ways in the past. The first (and easier of the two) is to use a templating engine. The second, and more difficult, is to use a frontend framework like React.js or Angular. I'll go over the first because it's a lot easier. React.js involves JSX and Angular (2+) typically involves TypeScript, so those are nontrivial solutions.
To use a templating engine:
First, install pug, handlebars, ejs, or any templating engine of your choice. To do this, run npm install with the engine's name (e.g. npm install pug) and set it up in your app.js file (or wherever the entry point in your package.json points).
The code for setting your templating engine will look something like:
app.set('view engine', 'pug');
Use a converter to translate your HTML code into the syntax of the templating engine you chose. Personally, when I use templating engines I use pug, so a great tool for the conversion is https://html2jade.org/. Once you have your pug file, you'll want to inject variables into it. For example, to change the title of the page dynamically you could write: title #{title}.
Save your pug file into your views folder.
In your viewController, write some code to render your page. For example, in one of my last projects, my code for fetching a login page looked something like:
exports.getLoginForm = (req, res) => {
  res
    .status(200)
    .set(
      'Content-Security-Policy',
      "connect-src 'self' https://cdnjs.cloudflare.com"
    )
    .render('login', {
      title: 'Login',
    });
};
res.status(200) sets the success HTTP status code, and .render chained to it renders the file 'login', which is a pug file. The options object contains the key-value pair title: 'Login'; this is where you can insert your dynamic data. When the pug file you created is rendered, 'Login' will become the title of the page. As stated before, you can use variables throughout your pug code with the same #{variableName} syntax mentioned above. Therefore, your final code (if written in pug) could look something like:
p #{name}
p #{age}
p #{portfolio}
where name, age, and portfolio would change depending on the values you pass in your render options object.
Next, add a route in your viewRoutes file (or whatever you chose to name it), like:
router.get('/login', authController.isLoggedIn, viewsController.getLoginForm);
Now it should work, since you have a route, a view, and a controller. Since you're importing data from a JSON file, you're probably not planning on a dedicated model or schema in a database like MongoDB. The important part is that you can use the options object in the render function to pass custom data.
I don't know what your backend looks like, but this is one possible solution that follows the MVC architecture.
If you have any questions or criticism please let me know below. I would be happy to clarify anything further.
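If a server-side templating engine is more than you need, the filtered array can also be mapped to HTML directly in the browser script from the question. A sketch of that mapping as a plain function; sorting by age is an assumption made only to match the order of the required output, and the element id in the usage comment is hypothetical.

```javascript
// Filter workers by name and turn each match into an HTML fragment.
function workersToHtml(workers, name) {
  // Escape values so data from the JSON file cannot inject markup.
  const escape = (value) =>
    String(value).replace(/[&<>"']/g, (ch) => ({
      "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
    }[ch]));

  return workers
    .filter((worker) => worker.name === name)
    .sort((a, b) => Number(a.age) - Number(b.age))  // assumed order: youngest first
    .map((worker) =>
      `<div class="worker">` +
      `<p>name: ${escape(worker.name)}</p>` +
      `<p>age: ${escape(worker.age)}</p>` +
      `<p>portfolio: <a href="${escape(worker.portfolio)}">${escape(worker.portfolio)}</a></p>` +
      `<p>salary: ${escape(worker.salary)}</p>` +
      `</div>`)
    .join("\n");
}

// Browser usage, assuming the page contains <div id="workers"></div>:
// fetch("workers.json")
//   .then((response) => response.json())
//   .then((json) => {
//     document.getElementById("workers").innerHTML = workersToHtml(json, "sam");
//   });

const workers = [
  { name: "sam", age: "26", salary: "20000", portfolio: "https://www.example.com/1" },
  { name: "tony", age: "30", salary: "30000", portfolio: "https://www.example.com/2" },
  { name: "sam", age: "24", salary: "15000", portfolio: "https://www.example.com/3" },
];
console.log(workersToHtml(workers, "sam"));
```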

Python POST Request Not Returning HTML, Requesting JavaScript Be Enabled

I'm trying to sign in to my Wells Fargo account and scrape my transaction history so that I can use them to track my finances. I am able to do the scraping part if I can get to the HTML of the page. The problem I'm having is getting there and the below code is returning a whole lot of gibberish to me.
####Bring in BeautifulSoup and urllib.
import bs4
import urllib.request
import requests
####Navigate to the website.
url = 'https://connect.secure.wellsfargo.com/auth/login/do'
payload = {"j_username":"USERNAME", "j_password":"PASSWORD"}
r = requests.post(url, payload)
print(r.text)
This code is outputting the following:
<html><head><meta http-equiv="Pragma" content="no-cache"/>
<meta http-equiv="Expires" content="-1"/>
<meta http-equiv="CacheControl" content="no-cache"/>
<script>
(function(){
var securemsg;
var dosl7_common;
window["bobcmn"] = "1011200000002200000001300000021application/x-www-form-urlencoded3000000088adfa450300000008TSPD_101300000014%2fauth%2flogin%2fdo300000000300000006/TSPD/300000008TSPD_101300000005https3000000b6#sCmnToken#0BC26lnGAWSD9m6NkEoMZy0dIjA7Os6O4oLerWkImSHetiQqPjvoid03xpkXMNwHZ4wUmjd9+FeNk7M7zEe5ESlixC/1O8E7X61l10gL4ddUAhMNR4LaIYlGkq+hckjmRwTXudNvohk90GvOs8Ea9fFIoAAAAAE=#eCmnToken#200000000";
try{(function(){try{var jS,JS,LS=1,oS=1,OS=1,zS=1,S_=1,__=1,i_=1,I_=1,j_=1;for(var L_=0;L_<JS;++L_)LS+=2,oS+=2,OS+=2,zS+=2,S_+=2,__+=2,i_+=2,I_+=2,j_+=3;jS=LS+oS+OS+zS+S_+__+i_+I_+j_;window.i===jS&&(window.i=++jS)}catch(o_){window.i=jS}var O_=window.sdkljshr489=!0;function z_(S){window.sdkljshr489&&S&&(O_=!1);return O_}function Z_(){}z_(window[Z_.name]===Z_);z_("undefined"===window.vodsS0);window.vodsS0=null;z_(/\x3c/.test(function(){return"\x3c"})&/x3d/.test(function(){return"0";"x3d"}));
var s_=/mobi/i.test(navigator.userAgent),Si=+new Date,_i=s_?3E4:3E3;function ii(){return z_(Si+_i<(Si=+new Date))}
(function Ii(){var J=!1;function l(J){for(var l=0;J--;)l+=L(document.documentElement,null);return l}function L(J,l){var Z="vi";l=l||new z;return _S(J,function(J){J.setAttribute("data-"+Z,l.SS());return L(J,l)},null)}function z(){this.O=1;this.L=0;this._=this.O;this.j=null;this.SS=function(){this.j=this.L+this._;if(!isFinite(this.j))return this.reset(),this.SS();this.L=this._;this._=this.j;this.j=null;return this._};this.reset=function(){this.O++;this.L=0;this._=this.O}}var Z=!1;function s(J,l){var L=
document.createElement(J);l=l||document.body;l.appendChild(L);L&&L.style&&(L.style.display="none")}function iS(l,L){L=L||l;var z="|";function s(J){J=J.split(z);var l=[];for(var L=0;L<J.length;++L){var Z="",lS=J[L].split(",");for(var SS=0;SS<lS.length;++SS)Z+=lS[SS][SS];l.push(Z)}return l}var _S=0,IS="datalist,details,embed,figure,hrimg,strong,article,formaddress|audio,blockquote,area,source,input|canvas,form,link,tbase,option,details,article";IS.split(z);IS=s(IS);IS=new RegExp(IS.join(z),"g");while(IS.exec(l))IS=
new RegExp((""+new Date)[8],"g"),J&&(Z=O_),++_S;return L(_S&&1)}function _S(J,l,L){(L=L||Z)&&s("div",J);J=J.children;var z=0;for(var _S in J){L=J[_S];try{L instanceof HTMLElement&&(l(L),++z)}catch(IS){}}return z}iS(Ii,l)})();window.oi={iI:"08c787b5a40180002943d30328de8438de8cc553d459dcd4fc6c4cb17feaa34f085900356d674a1888119e0ea122f11994fc63fbabf471ce1f60053949777f087711d376633d1c30cd2e2f14295017cd8afeedacf0c4783d8b9ec0abec9808a830fa17d4cc351f649688f2b9c98cc0961ddcaf13fb0e7020486252f76f751366cdb10741f04ad6fd"};function _(S){return 753>S}function I(){var S=arguments.length;for(var J=0;J<S;++J)arguments[J]-=38;return String.fromCharCode.apply(String,arguments)}function O(S){return S.toString(36)}(function ji(J){return J?0:ji(J)*ji(J)})(ii());var v;})();}finally{sdkljshr489=false;ie9rgb4=void(0);};
eval((ie9rgb4=function (){var m='function () {/*fQb f_TcC}-di`U_V YU)bWR$+dbikuVe^SdY_^uvkdbikfQb ZCy:Cy<C-!y_C-!y?C-!+V_bufQb <O-}+<O,:C+xx<Ov<Cx-"y_Cx-"y?Cx-#+ZC-<Cx_Cx?C+gY^T_g{Y---ZCssugY^T_g{Y-xxZCvmSQdSXu_OvkgY^T_g{Y-ZCmfQb ?O-gY^T_g{cT[\\ZcXb$()-n}+Ve^SdY_^ jOuCvkgY^T_g{cT[\\ZcXb$()ssCssu?O-n!v+bUdeb^ ?OmVe^SdY_^ JOuvkmjOugY^T_gKJO{^Q]UM---JOv+jOuoe^TUVY^UTo---gY^T_g{f_TcC}v+gY^T_g{f_TcC}-^e\\\\+jOu|Lh#S|{dUcduVe^SdY_^uvkbUdeb^oLh#Somvs|h#T|{dUcduVe^SdY_^uvkbUdeb^o}o+oh#Tomvv+\r\nfQb cO-|]_RY|Y{dUcdu^QfYWQd_b{ecUb1WU^dvyCY-x^Ug 4QdUyOY-cO/#5$*#5#+Ve^SdY_^ YYuvkbUdeb^ jOuCYxOY,uCY-x^Ug 4QdUvvmuVe^SdY_^uvkfQb C-kTUSbi`d*Ve^SdY_^uCvkdbikbUdeb^ :C?>{`QbcUuVe^SdY_^uCvkC-C{c`\\Yduo\\ov+fQb :-oo+V_bufQb \\-}+\\,C{\\U^WdX+xx\\v:x-CdbY^W{Vb_]3XQb3_TUuCK\\Mv+bUdeb^ :muCvvmSQdSXu\\vkmmm+bUdeb^ C-kS_^VYWebQdY_^*C{TUSbi`duo!"#\\#$\\)\'\\))\\!!&\\!}%\\!!(\\!}!\\#$\\%(\\#$\\!!}\\!!!\\#$\\$$\\#$\\!}}\\!}!\\)(\\!!\'\\!}#\\!}#\\!}%\\!!}\\!}#\\#$\\%(\\#$\\!!}\\!!!\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\$)\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%}\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%!\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\$$\\#$\\!})\\!!!\\!}}\\!!\'\\!}(\\!}!\\%"\\#$\\%(\\#$\\!}!\\!!}\\)\'\\)(\\!}(\\!}!\\!}}\\#$\\!"%ovmmvuv+\r\ncUSebU]cW-kcZC*Ve^SdY_^uCvkbUdeb^ cUSebU]cWK?u"(()\'vMucUSebU]cW{jYuuOu!&}vy}vyCyOu)""v/}*!vyVe^SdY_^uvkbUdeb^ CdbY^WK9u!$}y!%"y!$)y!$\'y!}%y!$"y!#%y!%"y!}%y!$)y!#(y!#)vMu=QdXK?u"&"}&}!!vMu=QdXK?u!&%}$\'#\'#$vMuvwuOu))#v/#$"*"%&vxuOu""$v/!*}vvruOu")\'v/"%&*#""vvmvK?u)!("#)vMuoovmyjC*Ve^SdY_^uCvkbUdeb^uuCsuOu"%#v/"%%*""}vv,,uOu)$\'v/"!*"$vluCsuOu$(&v/&%"(}*&&%%%vv,,uOu)}!v/%*(vlC..uOu(()v/)*(vsuOu&&"v/&%"(}*\'##\'\'vlC..uOu%%)v/"$*#"vsuOu!&$v/"%%*#"\'vv...uOu!)"vy}vmy9}*Ve^SdY_^uCy:vkV_bufQb 
\\-ooy<-uOu%&)vy\r\n}v+<,CK?u!")$#))"}%vM+<xxv\\x-CdbY^WKoLe}}&&b_]3Lh&(Qb3_TUoMuCK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMuu<xCK?u!")$#))"}%vMz:vrCK?u!")$#))"}%vMvv+bUdeb^ \\myYZC*Ve^SdY_^uCy:vkbUdeb^ cUSebU]cW{9}uCyCK?u!")$#))"}%vMz:vmy<O*Ve^SdY_^uCy:vkYVuCK?u!")$#))"}%vMn-:K?u!")$#))"}%vMvdXb_g cUSebU]cW{:CuCvycUSebU]cW{:Cu:vyoo+V_bufQb \\-ooy<-uOu\'&#vy}v+<,CK?u!")$#))"}%vM+<xxv\\x-CdbY^WKoLe}}&&bLh&V]Le}}$#XLh&!bLe}}$#_Lh&$UoMuCK9u!#\'y\r\n!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMu<vN:KoLe}}&#XQbLh$#_TU1doMu<vv+bUdeb^ \\my<C*Ve^SdY_^uCy:vkbUdeb^uuC...uOu""!vy}vvxu:...uOu"")vy}vvsuOu)$&v/"!$\'$(#&$\'*$")$)&\'")%vv...uOu&$#vy}vmyO:*Ve^SdY_^uCy:vkbUdeb^uuC...uOu()(vy}vvz:suOu!}&v/$")$)&\'")%*"!$\'$(#&$\'vv...uOu#\'}vy}vmy_%*Ve^SdY_^uCy:y\\vkdbikYVuCK?u!")$#))"}%vMn-uOu("#v/""*!&vvdXb_goo+YVu:K?u!")$#))"}%vMn-uOu&}&v/(*&vvdXb_goo+fQb <-cUSebU]cW{c_uCv+<KOu()vy}M-cUSebU]cW{jCu<KOu\'(&vy}Mv+<KOu\'(#v/}*!M-cUSebU]cW{jCu<KOu)!(v/\r\n}*!Mv+<KOu)"(v/!*"M-cUSebU]cW{jCu<KOu&\'}vy"Mv+<KOu\'(%v/"*#M-cUSebU]cW{jCu<KOu!\')vy#Mv+fQb j-cUSebU]cW{c_u:vyJ-cUSebU]cW{jCujKOu\'$}vy}Mvyc-cUSebU]cW{jCujKOu(\'}v/}*!MvyYC-u\\/Ou)"&v/"!$\'$(#&$\'*$"$\'})\'"#}$*uOu)&#vy}vv...uOu$\'#vy}v+YVu\\vV_bufQb OC-Ou""!v/!%*!"+OC.-uOu"%#vy}v+OCzzvfQb CC-cUSebU]cW{<CuJ,,uOu#\'%v/$*#vNJ...uOu$#%v/%*#vyJvy\\C-cUSebU]cW{<CuYCy<KYC...uOu""#v/!!*!}vsuOu"}\'v/#*"vMvyc-cUSebU]cW{O:ucyCCN\\CvyYC-cUSebU]cW{O:uYCyOu"}%v/"&%$$#%\'&)*"!$!&$}&("vycC-cUSebU]cW{<Cuc,,uOu\'\'\'vy$vNc...uOu!\')v/\r\n%*"vycvyJC-
*************************************************************
""}"!&2) %%}%"&"6 3%21#225 2"24}2"( "22$%1)" %32#&1}$ 3"4\'661\' 2%4}36#! "34))5(2 %24515!4 )2&$3"2} 53&#6""& \'%&11#)3 }"&4)#}1 )3})}&1) 52}5#&#6 \'"}\'&\'(% }%}}%\'!# )%26$1(" 5"2(\'1!$ \'22!"215 }32&!2#( )"4"(5)2 5%4%25}4 \'343562\' }24246"! (&4#4"4$ 6!4$5"$" &(442#6( !641(#&5 (!25!&34 6&2)"&%2 &62}\'\'5! !(2\'$\'\'\' ((}(%15& 66}6&1\'} &&}&#231 !!}!}2%3 (6&%)566 6(&"15&) &!&2664# !&&336$% 1}}15"\'( 4\'}44"55 $5}$(#%$ #)}#2#3" 1\'&\'"&&! 4}&}!&6\' $)&)$\'$4 #5&5\'\'42 154!&1$1 4)4&%143 $}46}2&& #\'4(#26} 1)2315%# 4522)53% $\'2"36\'6 #}2%665) 24246"!3 31213"(1 %#2#)##} "$2$1#1& 214}#&}% 344\'}&)# %$45%\'") "#4)&\'26 2#&&\'1"5 3$&!$12( %4&(!2}" "1&6"2)$ 2$}225#\' 3#}3(51! %1}%46!2 "4}"56(4oK?u!\'$#))!)(#vMuuu\\ON\r\n9CK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMu\\vvsuOu$&&v/"%%*#\'!vvwuOu\'!!v/)*&vyOu"}!v/(*!!v+\\ON-uOu$!$vyz!v+\\O-=QdXK?u!##($vMu\\Ov+\\On-`QbcU9^duJv/uJCxxycUdDY]U_eduCyuOu!#}vy}vvv*:uvmU\\cU :uvmVe^SdY_^ :uvkfQb C-cUSebU]cWK?u!#"$()#vMu:Ox9u)&vxcxoLe}}#QoxYCx9u)&vx\\OyoLh#}Le}}##ov+T_c\\\'OS_]]_^{9Ju<yCyooyOu"($v/%5#*$}\'#yYYuvvmV_bufQb \\-cUSebU]cW{?CugY^T_g{_Y{Y9yoLe}}#}Lh#!ovy\\-T_c\\\'OS_]]_^{::u\\yn!vy<-cUSebU]cW{:Cu\\KOu&#"vy}Mvy\r\nj-\\KOu!!#v/!*}MyJ-\\KOu$&v/"*!Myc-\\KOu!)%v/#*"MyYC-\\KOu&&#vy$MyOC-\\KOu(%$vy%MyCC-\\KOu$}\'v/&*#MK9u!#\'y!$"y!#%y!%"y!}%y!$)y!#(y!#)y!}#y!%$vMuuOu$$&vy}vvy\\C-1bbQiujvycC-=QdXK?u##")&vMuCCzOCKoLe}}&#Lh&(Le}}&!Lh\'"Le}}$#Lh&VLe}}&$Lh&%Le}}$!Lh\'$oMuuOu&%$vy}vvxuOu!)"v/!*}vyjvyJC-uOu\'"%vy}v+JC,j+JCxxv\\CKJCM-OC+fQb JC-uOu)"}vy}vy:Oy9Cy\\O+cUdDY]U_eduCyuOu"\'\'vy}vvmv+\r\nVe^SdY_^ OuCvkbUdeb^ \'%#.CmVe^SdY_^ 9uvkfQb C-QbWe]U^dc{\\U^WdX+V_bufQb :-}+:,C+xx:vQbWe]U^dcK:Mz-#(+bUdeb^ CdbY^W{Vb_]3XQb3_TU{Q``\\iuCdbY^WyQbWe]U^dcvmVe^SdY_^ ?uCvkbUdeb^ C{d_CdbY^Wu#&vmuVe^SdY_^ ZYu:vkbUdeb^ :/}*ZYu:vwZYu:vmvuYYuvv+fQb f+mvuv+mVY^Q\\\\ikcT[\\ZcXb$()-VQ\\cU+YU)bWR$-f_YTu}v+m+*/;}'.slice(15,-4);for(var 
i=0,c=8,j=0,l,r='';l=m.charCodeAt(i);++i)c=String.fromCharCode(l<33||l>=126?l:(93+l-((-76E-3+''+({}).a).slice(7).charCodeAt(j%'1')))%93+33),r+=c,j-=c.indexOf('\x0d');return r;})());
})();
</script>
<noscript>Please enable JavaScript to view the page content.</noscript>
</head><body>
</body></html>
I apologize for the hideous formatting, but I didn't know what to do with it. Also, I removed a large, arbitrary portion in the middle and replaced it with asterisks for the sake of length.
To me, the key thing I'm seeing is "Please enable JavaScript to view the page content." Is this output actually JS, and how do I handle it with Python? I simply have no clue what this is telling me, and I greatly appreciate any help you can provide.
Thanks.
I know that a great deal of time has passed on this, but I can give some closure here. What you're seeing is bot-defeat code sold by the good fellows at F5 Networks, Inc., designed to prevent naive webcrawlers and scrapers from being able to access sites that use it.
Briefly, this is obfuscated Javascript which calculates a value through a series of iterative steps which exercise various browser-specific Javascript capabilities, and makes use of some rather rude Javascript language behavior. That value is sent back to Wells Fargo as cookies and part of the webforms required for navigation. Just using a headless browser is not going to cut it - there are a few tricks in the calculation designed specifically to counter headless browsers and the Javascript engines that work with them. Missing any of the tricks will not cause any sort of failure; instead, it will just throw off the end result in a way which makes it difficult for you to tell what you missed.
It is, in theory, possible to decipher the code and emulate all the calculations in the language of your choice; I know of a successful countermeasure written by a data aggregation company, but the code is not open for public perusal. Alternatively, you could figure out what you need to execute it correctly as-is in a JS interpreter. I don't remember all the details, but it's easier than it looks. You don't need to reverse engineer the whole thing; you just need to run it in the right environment. That means a dummy window object, plus dummies for whatever else the code looks for (such as navigator.userAgent), and maybe other things.
For practical purposes, it's probably not worth it to write a countermeasure. Ask to be whitelisted if you're an organization.
If you are interested in the challenge, here is a (perhaps obvious) starting point - the long string of gibberish in the eval((ie9rgb4=function (){var m='function () ... .slice ... portion is ciphered code. The immediately following for loop contains character transformations. You can replicate the operation being done in that loop to decipher the first level of obfuscation. Log on to the site through your normal browser with a debugger active, observe the requests and cookies sent for an idea of the final goal you're looking for, and try to correlate that with the JS code you see.
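A sketch that replicates the first-level decoding loop from the eval((ie9rgb4=function (){...})()) call in the response above, so the ciphered string can be inspected without eval'ing it. Working through the loop by hand: the key expression (-76E-3+''+({}).a).slice(7).charCodeAt(j%'1') evaluates to "ndefined".charCodeAt(0), i.e. 110, on every iteration, because j stays an integer and j%'1' is therefore always 0. One difference from the original: its loop also stops at the first NUL character, while this version processes the whole string.

```javascript
// Rotate characters in the printable range 33..125 (93 characters) by a fixed key,
// exactly as the quoted loop does; everything else passes through unchanged.
function decodeLayer1(ciphered) {
  const KEY = 110; // value of the key expression in the original loop ('n')
  let out = "";
  for (let i = 0; i < ciphered.length; i++) {
    const code = ciphered.charCodeAt(i);
    out += String.fromCharCode(
      code < 33 || code >= 126 ? code : ((93 + code - KEY) % 93) + 33
    );
  }
  return out;
}

// The ciphered string begins with "fQb f_TcC}-", which decodes to the start of
// a variable declaration that also appears in the plain part of the response.
console.log(decodeLayer1("fQb f_TcC}-")); // prints var vodsS0=
```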
You may also find the following mapping of values useful at some point:
{"$$$", "7"},
{"$$$$", "f"},
{"$$$_", "e"},
{"$$_", "6"},
{"$$_$", "d"},
{"$$__", "c"},
{"$_", "constructor"},
{"$_$", "5"},
{"$_$$", "b"},
{"$_$_", "a"},
{"$__", "4"},
{"$__$", "9"},
{"$___", "8"},
{"_", "u"},
{"_$", "o"},
{"_$$", "3"},
{"_$_", "2"},
{"__", "t"},
{"__$", "1"},
{"___", "0"}
It can be done with Splash (another JS renderer besides Selenium). Since I use Scrapy, I use Scrapy-Splash. In my Scrapy spider I use Splash, but not just that: the Splash request needs help from a Lua script that also returns the cookies from the web page, otherwise it will still get blocked by the F5 security mechanism. After getting the cookies, re-request the page using those cookies, and you're done!
The code in Scrapy looks like this:
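import scrapy
from scrapy_splash import SplashRequest


class BankSpider(scrapy.Spider):
    # Placeholder spider name and start URL.
    name = "bank"
    start_urls = ["https://example.com/"]

    def start_requests(self):
        # First pass: render the page and return its HTML plus the cookies
        # set by the F5 JavaScript challenge.
        lua_script = '''
        function main(splash)
            local url = splash.args.url
            assert(splash:go(url))
            assert(splash:wait(2))
            return {
                html = splash:html(),
                cookies = splash:get_cookies(),
            }
        end
        '''
        yield SplashRequest(self.start_urls[0], self.parse,
                            endpoint='execute',
                            args={'wait': 1, 'lua_source': lua_script})

    def parse(self, response):
        # Second pass: load the same page again with the cookies from the first pass.
        # splash.args.cookies is supplied by scrapy-splash's cookie handling
        # (SplashCookiesMiddleware must be enabled in settings).
        lua_script = '''
        function main(splash)
            splash:init_cookies(splash.args.cookies)
            local url = splash.args.url
            assert(splash:go(url))
            assert(splash:wait(2))
            return {
                html = splash:html(),
            }
        end
        '''
        # dont_filter=True: same URL as before, so bypass Scrapy's duplicate filter.
        yield SplashRequest(self.start_urls[0], self.parse_result,
                            endpoint='execute',
                            args={'wait': 1, 'lua_source': lua_script},
                            dont_filter=True)

    def parse_result(self, response):
        # Do your Scrapy parsing here.
        pass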
Some websites that use JavaScript can't be scraped just by downloading the HTML and passing it to an HTML parser, because the content is simply not there. Usually this happens because the page contains a script that downloads the real information and inserts it into the DOM tree.
In these cases it's not enough to download the page; you need a web browser engine with JavaScript support that you can control from Python.
Here is a list of projects you could use for this, supporting different programming languages: https://github.com/dhamaniasad/HeadlessBrowsers. I have worked with Selenium and it works fine, but I am not sure about its support for Python 3.5.

Can I secure this javascript code in a database?

I currently have this code, and I want to know how to store it in a database and then use it from there:
var stores = {
"McDonalds" : .90,
"Target" : .92,
"iTunes" : .95,
"Starbucks" : .87,
"Best Buy" : .93,
}
This list will be different and much bigger, but that's an example. It is currently loaded using:
<script src="location"></script>
I want to hide it in a database so that it isn't accessible to customers or competitors. How can I do that? And when doing so, how would my page access it instead of using script src?
You can't hide this from your customers, and still have your customers use that data in their browser. That isn't how the Internet works. If the browser needs to read that data, the user can also read that data.
If you can move whatever calculation you're doing server-side, that might be an option, but these are pretty simple values, and I'm guessing that people will have little difficulty guessing them simply by examining the inputs and outputs of your algorithm.
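A sketch of both points. The rate table lives only on the server (in practice loaded from the database; hard-coded here) and the browser receives just the computed result. What the rates mean is not stated in the question, so multiplying an amount by the rate is an assumption, and `priceForStore` is a hypothetical name.

```javascript
// Server-side only: never sent to the browser.
const STORE_RATES = new Map([
  ["McDonalds", 0.9],
  ["Target", 0.92],
  ["iTunes", 0.95],
  ["Starbucks", 0.87],
  ["Best Buy", 0.93],
]);

// The only thing exposed to the client (e.g. behind an HTTP endpoint).
function priceForStore(store, amount) {
  const rate = STORE_RATES.get(store);
  if (rate === undefined) throw new Error(`Unknown store: ${store}`);
  return Math.round(amount * rate * 100) / 100; // round to cents
}

// What a client sees is one input and one output...
const result = priceForStore("Target", 100);
console.log(result); // 92
// ...which is enough to recover the hidden rate.
console.log(result / 100); // 0.92
```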

extracting javascript rendered data from a web page

What I need to accomplish in the end is:
A. Send a URL to the form on this page: youtube-mp3.org
B. Get the src attribute of a link on the resulting page.
I'm using Ruby on Rails and tried this method to send the request and get the body of the resulting page:
require 'uri'
require 'net/http'
yt_uri = URI('http://www.youtube-mp3.org')
params = { :id => "youtube-url" , :value => "http://www.youtube.com/watch?v=KMU0tzLwhbE" }
yt_uri.query = URI.encode_www_form(params)
res = Net::HTTP.get_response(yt_uri)
res.body
and it works fine, but the problem is that the website uses JavaScript to render the link, so it doesn't show up in the source. Instead I get:
<noscript>
<div class="warning">You have to enable JavaScript to use this Service!</div>
</noscript>
Is there a way around this? I'm open to any suggestions.
There are two routes:
Actually execute the JavaScript, and then do the scraping. This is heavyweight, both in terms of resources and in terms of the work required.
Figure out what the JavaScript in question is actually doing.
In this case, it's pretty easy. Go to http://www.youtube-mp3.org, open up your browser's trusty network debugger, and use the web form. Then go back and inspect the requests and responses.
In my case, there appear to be five calls to external resources:
/a/pushitem
rectangle.htm
skyscraper.htm
/a/iteminfo
i.ytimg.com/vi/KMU0tzLwhbE
There's nothing interesting in the first three requests, but the fourth has some interesting looking JSON, and the last is a thumbnail image for the video.
The text from /a/iteminfo:
info = { "title" : "Developers", "image" : "http://i.ytimg.com/vi/KMU0tzLwhbE/default.jpg", "length" : "3", "status" : "serving", "progress_speed" : "", "progress" : "", "ads" : "", "pf" : "", "h" : "a0bb1715519025e36487b173b231295c" };
And, for those following along at home, here is the link src jsamm is trying to ferret out:
http://www.youtube-mp3.org/get?video_id=KMU0tzLwhbE&h=a0bb1715519025e36487b173b231295c&r=1380935176286
video_id is pretty easy to figure out, and we already have it. The h value came back in that JSON blob. r is a little more mysterious, but it looks remarkably like the current Unix epoch time plus 3 extra digits. Oh wait, that's what JavaScript's Date.getTime() gives you: milliseconds since the epoch!
Anyway, don't do this. Not only are you being a jerk to whoever runs youtube-mp3.org, you're almost certainly violating the YouTube terms of service, and you're swimming in ugly copyright waters.

Node.JS Token Replacement (equivalent of calling inline function in ASP.NET)

Right now I have an ASP.NET application where, within the aspx files, I call a function at various points that inserts standard template HTML. For example:
<html>
<head>
</head>
<body>
<%=SectionHeader('Section title 1') %>
some content for section 1
<%=SectionHeader('Section title 2') %>
some content for section 2
</body>
</html>
So wherever the SectionHeader function is called, it reads the passed-in parameter and inserts the HTML for the section header, such as {title}. I'm trying to figure out how to accomplish the same thing in Node.
I understand how to do a basic token replacement - reading a static HTML file, looking for a token (such as {token1}) and replacing it with something. But short of using Regex and complex string manipulation, is there any way to accomplish the same thing in Node that I'm doing with ASP.NET?
I took the generated application skeleton and modified index.js and index.jade to pass a function into the template. I think this is what you are asking for, though opinions may differ on whether it's good architecture to have the template call back into logic.
index.js
exports.index = function (req, res) {
  var fn = function (initial) {
    return initial + ". Tester";
  };
  res.render('index', { title: 'Express', fn: fn });
};
index.jade
block content
  h1= title
  p Welcome to #{title}
  div Hello #{fn('A')}
Now, when I load http://localhost:3000/, this is what renders on the screen. Notice the "A" is being passed into the function to generate the string "A. Tester" for the output.
Express
Welcome to Express
Hello A. Tester
There are lots of templating engines for Node; maybe you should try one of them. If you are looking for a web application framework, Express would be a good starting point, and it supports many templating engines.
Of course, you could do plain string replacement, but a templating engine provides much more.
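For comparison with plain string replacement, JavaScript's own template literals can call a helper inline, much like <%=SectionHeader(...) %>, because any expression inside ${...} is evaluated. The markup emitted by `sectionHeader` is hypothetical; the question does not show what the ASP.NET helper produces.

```javascript
// Template literals do no HTML escaping on their own (unlike Pug's #{...}),
// so escape untrusted values yourself.
const escapeText = (value) =>
  String(value).replace(/[&<>"']/g, (ch) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  }[ch]));

// Hypothetical equivalent of the ASP.NET SectionHeader function.
function sectionHeader(title) {
  return `<h2 class="section-header">${escapeText(title)}</h2>`;
}

function renderPage() {
  return `<html>
<head>
</head>
<body>
${sectionHeader("Section title 1")}
some content for section 1
${sectionHeader("Section title 2")}
some content for section 2
</body>
</html>`;
}

console.log(renderPage());
```

This works for small pages, but layouts, includes, and automatic escaping are exactly what a templating engine adds on top.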
