Pattern matching never terminates - javascript

I have this javascript for pattern matching. When I open the HTML file and the scripts run, it never ends. The page loads forever. The logs inside the if and else never print out. I am unable to find the problem.
var link="https://www.google.co.uk/search?source=hp&ei=EUtVWuX5JpGRkwWW_py4Cg&q=testing+for+schools&oq=testing&gs_l=psy-ab.1.1.0i131k1j0l9.7269.8065.0.9955.7.7.0.0.0.0.175.755.4j3.7.0....0...1.1.64.psy-ab..0.7.754...0i3k1.0.TglIEkPkeIU";
var pattern = "(https:\\/\\/)(.*\\.)*(google.co.uk)(\\/.*)*(\\/)*";
if(link.search(pattern) == 0)
{
console.log("inside if");
console.log("Match");
}
else
{
console.log("inside else");
console.log("Not Match");
}
EDIT:
I need a RegEx that matches almost any URL starting with https. The only variable part is the domain name, e.g. google.co.uk. I thought my RegEx was perfect, but it could not handle this case.
EDIT2:
The logic for the pattern I need is: (any-sub-domain.)*(domain-name)(/something)* (/)*
EDIT3:
Sorry, the previous edit is corrected now. It was wrong because I had not formatted it as code.

Instead of using a regex to parse the whole URL, I suggest first using JavaScript's URL object to extract the relevant parts of the URL. Then you can check properties of the URL such as hostname and protocol with if:
var link = "https://www.google.co.uk/search?source=hp&ei=EUtVWuX5JpGRkwWW_py4Cg&q=testing+for+schools&oq=testing&gs_l=psy-ab.1.1.0i131k1j0l9.7269.8065.0.9955.7.7.0.0.0.0.175.755.4j3.7.0....0...1.1.64.psy-ab..0.7.754...0i3k1.0.TglIEkPkeIU";
var urlObject = new URL(link);
console.log(urlObject.hostname); // "www.google.co.uk"
console.log(urlObject.protocol); // "https:"
if (urlObject.protocol === "https:") {
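if (urlObject.hostname === 'google.co.uk' || urlObject.hostname.endsWith('.google.co.uk')) { // exact domain or a subdomain, so "notgoogle.co.uk" does not match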
console.log("this page is on Google UK");
} else {
console.log("this page is on some other HTTPS web site");
}
} else {
console.log("this page is not secured by HTTPS");
}
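The same check can be wrapped in a reusable function. The sketch below uses a hypothetical helper name, `isHttpsOnDomain`, and treats a host as matching when it equals the domain or ends with `.` plus the domain, which follows the `(any-sub-domain.)*(domain-name)` logic from the question without a regex. It also avoids nested quantifiers like `(.*\.)*` from the original pattern, which can backtrack exponentially on a long URL containing many dots and are a likely reason the search never finishes.

```javascript
// Sketch: true when `link` is an https: URL whose hostname is `domain`
// itself or any subdomain of it (e.g. www.google.co.uk for google.co.uk).
function isHttpsOnDomain(link, domain) {
  var url;
  try {
    url = new URL(link); // throws a TypeError for strings that are not absolute URLs
  } catch (e) {
    return false;
  }
  var host = url.hostname;
  return url.protocol === "https:" &&
    (host === domain || host.endsWith("." + domain));
}

// A shortened version of the link from the question
var link = "https://www.google.co.uk/search?source=hp&q=testing+for+schools";
console.log(isHttpsOnDomain(link, "google.co.uk")); // true
```

Because the hostname is compared as a whole (or with a leading dot), look-alike hosts such as `notgoogle.co.uk` or `google.co.uk.evil.com` are rejected.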

Related

Fortify Scan issue: Cross-Site Scripting issue when assigning a new URL

In my JavaScript code I am creating a URL to redirect my page to. It works fine, but when I run it through the Fortify Scan, it gives me the following error:
The method reloadParentTab() sends unvalidated data to a web browser on line (line number), which can result in the browser executing malicious code.
I've added a URL validator and run the newly created URL through it; however, the error is still present.
This is what the code looks like now:
//This is reloadParentTab function, mentioned above
function reloadParentTab() {
if(window.opener && !window.opener.closed) { //checks if parent tab is present
//here we check a variable, irrelevant here, except that it decides whether we
//run window.opener.location.reload() or specify the URL, which is what the scan
//complains about
if (someTriggerValid()) {
var href = window.opener.location.href; //This is the source
if (validURL(href)) { // running the validator
//building the new URL
var newHref = href.substring(0, href.indexOf("tracking.")) +
"tracking.base.open.request.do?dataObjectKey=object.dsaidCase&trackingId=" + caseId;
//running the new URL through the validator, just to show that I tried it both ways
if (validURL(newHref)) {
//this is "Sink", that's where unvalidated data is, supposedly, sent,
//although I am validating it
window.opener.location.href = newHref;
}
}
} else {
window.opener.location.reload();
}
}
}
//And this is the validator
function validURL(str) {
var pattern = new RegExp('^(https?:\\/\\/)?'+ // protocol
'((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|'+ // domain name
'((\\d{1,3}\\.){3}\\d{1,3}))'+ // OR ip (v4) address
'(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*'+ // port and path
'(\\?[;&a-z\\d%_.~+=-]*)?'+ // query string
'(\\#[-a-z\\d_]*)?$','i'); // fragment locator
return !!pattern.test(str);
}
So my question is: what am I missing and what else do I need to do to get this resolved? Any ideas are welcome. Thank you.

polling for RSS feed with casperjs not working

I am trying to match a token (a string token) in the RSS feed using casperjs waitFor(), but it does not seem to work. There are other ways (not using polling) to get around this, but I need to poll for it. Here is the code snippet:
casper.then(function() {
this.waitFor(function matchToken() {
return this.evaluate(function() {
if(!this.resourceExists(token)) {
this.reload();
return false;
}
return true;
});
});
});
The RSS feed at that URL is not updated dynamically, so a refresh is needed to check for the token. But the access log shows that I am not getting any hits on the RSS URL, so the reload is not working. Ideally, I want to refresh the page if the token isn't there, then check for it again, and keep doing that until waitFor() times out.
I also tried using assertTextExists() instead of resourceExists() but even that did not work.
I am using PhantomJS (1.9.7) and the URL is: https://secure.hyper-reach.com:488/rss/323708
The token I am looking for is item/272935. If you look at the URL above, you will find it in each guid tag. I include "item/" as part of the token so that it doesn't incorrectly match other numbers.
evaluate() runs in the sandboxed page context. Code inside it has no access to variables defined outside, and this refers to the page's window, not to casper. You don't need the evaluate() function here, since you don't access the page context.
The other thing is that casper.resourceExists() works on resource metadata such as the URL and request headers, whereas you seem to want to check the content of the resource. If you used casper.thenOpen() or casper.open() to open the RSS feed, you can check whether the text exists with casper.getPageContent().
The actual problem with your code is that it mixes synchronous and asynchronous code in a way that won't work. waitFor() is the wrong tool for the job, because you need to reload in the middle of its execution, but the check function is called so often that there probably won't be a complete page load to actually test.
You need to recursively check whether the document is changed to your liking.
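var token = "item/272935"; // the token from the question
var tokenTrials = 0,
    tokenFound = false;
// Used as a casper step, so `this` is the casper instance
function matchToken(){
    if (this.getPageContent().indexOf(token) === -1) {
        // token was not found: reload, wait a second, then check again (up to 50 times)
        tokenTrials++;
        if (tokenTrials < 50) {
            this.reload().wait(1000).then(matchToken);
        }
    } else {
        tokenFound = true;
    }
}
// Assumes this runs inside casper.test.begin(), which provides `test`
casper.then(matchToken).then(function(){
    test.assertTrue(tokenFound, "Token was found after " + tokenTrials + " trials");
});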
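The recursive "check, and if not found, schedule another check" idea can be sketched without CasperJS, which makes the retry logic easier to see and test. `pollUntil` and its `schedule` parameter are hypothetical names, not part of the CasperJS API; `schedule` defaults to `setTimeout` and is only a parameter so the logic can run without real delays.

```javascript
// Hypothetical helper (not CasperJS): call `check` until it returns true or
// `maxTrials` attempts have been made, waiting `delayMs` between attempts,
// then report the outcome through `onDone(found, trials)`.
function pollUntil(check, maxTrials, delayMs, onDone, schedule) {
  schedule = schedule || setTimeout;
  var trials = 0;
  function attempt() {
    trials++;
    if (check()) {
      onDone(true, trials);
    } else if (trials < maxTrials) {
      // plays the role of reload().wait(1000).then(matchToken)
      schedule(attempt, delayMs);
    } else {
      onDone(false, trials);
    }
  }
  attempt();
}
```

Each attempt only schedules the next one after it has finished, which is the property waitFor() lacks when a reload has to complete between checks.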

JavaScript issue with matching URL

How can I use JavaScript to check the URL a visitor is on and, if it matches, redirect them to a certain page on the site? For example...
The string we want to check for is mydirectory, so if someone goes to example.com/mydirectory/anyfile.php or even example.com/mydirectory/index.php, JavaScript should redirect them to example.com/index.php because the URL contains mydirectory; if no match is found, it shouldn't redirect. I'm using the code below:
var search2 = 'mydirectory';
var redirect2 = 'http://example.com/index.php'
if (document.URL.substr(search2) !== -1)
document.location = redirect2
The problem is that it always redirects, even when no match is found. Does anyone know what's going wrong, and is there a faster or better way to do this?
Use String.indexOf() instead:
if (window.location.pathname.indexOf('searchTerm') !== -1) {
// a match was found, redirect to your new url
window.location.href = newUrl;
}
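substr is not what you need here; it extracts a substring from a string. Given the non-numeric argument 'mydirectory', it starts at index 0 and returns the whole URL, and a string is never equal to -1, so the condition is always true. Use indexOf instead: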
if(window.location.pathname.indexOf(search2) !== -1) {
window.location = redirect2;
}
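To see the difference concretely, the sketch below runs both conditions on a sample URL that does not contain mydirectory. The URL and the `shouldRedirect` helper are hypothetical illustrations; in a page you would pass `window.location.pathname` and `search2`.

```javascript
// substr() converts the non-numeric argument "mydirectory" to 0 and returns
// the whole string; a string is never strictly equal to the number -1.
var sampleUrl = "http://example.com/other/page.php"; // hypothetical URL without "mydirectory"
console.log(sampleUrl.substr("mydirectory") === sampleUrl); // true
console.log(sampleUrl.substr("mydirectory") !== -1);        // true, so it always redirects
console.log(sampleUrl.indexOf("mydirectory") !== -1);       // false, so no redirect

// Hypothetical helper so the check can be exercised outside a browser
function shouldRedirect(pathname, search) {
  return pathname.indexOf(search) !== -1;
}
```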
If possible, it's better to do this redirect on the server side: it will always work, and it is more search-engine friendly and faster. If your users have JavaScript disabled, they won't get redirected.

If excludes url containing www

I have the JavaScript below. When the URL (window.location) does not contain www., the JavaScript IS executed:
var windowloc = window.location; // http://mywebsite.com/
var homeurl = "http://mywebsite.com/";
if(windowloc==homeurl){
//JavaScript IS EXECUTED
}
But when it does contain www., the JavaScript is NOT executed:
var windowloc = window.location; // http://www.mywebsite.com/
var homeurl = "http://mywebsite.com/";
if(windowloc==homeurl){
//JavaScript is NOT executed.
}
How can I make the JavaScript accept URLs (window.location) both with and without www.?
Use code like this to see whether the URL contains www.mywebsite.com:
if (window.location.href.indexOf("//www.mywebsite.com/") != -1) {
// code to execute if it is www.mywebsite.com
} else {
// code to execute if it is not www.mywebsite.com
}
Or you could check just the hostname part of window.location for "www.", like this:
if (window.location.hostname.indexOf("www.") != -1) {
// code to execute if it is www. something
} else {
// code to execute if it is not www. something
}
Or, if you want to match your entire domain exactly, you could do it like this:
if (window.location.hostname === "www.mywebsite.com") {
// code to execute if it is www.mywebsite.com
} else {
// code to execute if it is not www.mywebsite.com
}
You can solve this with a regex, as I'm sure other answers will show. However, for search engine optimization (SEO) it's best practice to make http://mywebsite.com/ issue a permanent redirect to http://www.mywebsite.com/, because search engines like Google consider the www. and non-www versions to be two separate websites.
Then you will not need two separate conditions because your url will always be the www. version.
if (window.location.href.indexOf("://www") === -1) {
// "www" excluded
} else {
// other stuff
}
edited the code sample to be more specific
if(window.location.href.indexOf('mywebsite.com')!= -1){
//do stuff
}
Use the hostname property of the location object to determine what address you're being served under:
if (location.hostname==='mywebsite.com')
// do something
location and other address-owning objects such as links have properties like hostname, pathname, search and hash that give you the already-parsed pieces of the URL, so you don't have to pick apart URL strings yourself. Don't just look for www. anywhere in the location string, as it might appear somewhere other than the hostname.
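As a sketch of that advice applied to the original question, the function below compares the parsed hostname (with an optional leading "www." removed) and the path, instead of comparing whole URL strings. The name `isHomePage` is hypothetical, "mywebsite.com" is the example domain from the question, and the protocol is deliberately not checked, so https:// would also count.

```javascript
// Sketch: true for the home page of mywebsite.com with or without "www.".
// In a page you would call isHomePage(window.location.href).
function isHomePage(href) {
  var url = new URL(href);
  var host = url.hostname.replace(/^www\./, ""); // drop an optional leading "www."
  return host === "mywebsite.com" && url.pathname === "/";
}

console.log(isHomePage("http://mywebsite.com/"));          // true
console.log(isHomePage("http://www.mywebsite.com/"));      // true
console.log(isHomePage("http://www.mywebsite.com/about")); // false
```

Unlike a plain indexOf on the full string, a URL such as http://evil.com/www.mywebsite.com/ is rejected because only the hostname is inspected.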
But +1 to Justin's answer: if you are trying to redirect alternative addresses such as a non-www address to a canonical address, the right way to do that is with an HTTP 301 redirect, not with JavaScript. This would normally be configured at the server level; e.g. for Apache you might use a Redirect in your .htaccess.

Need help with curl post and javascript

I'm trying to log in to some web site.
I did it before with some other site but this site is more complicated.
I used LIVE HTTP Headers to capture the post request.
I noticed that the POST request was sent correctly, but for some reason I'm not being redirected to the correct URL.
I went over the page source, and I think this form is being submitted using JS.
This is what is appended to the POST arguments after the __VIEWSTATE variable:
&ctl00_Menu_MainMenu_ContextData=&ctl00%24middleContent%24TextBoxName=0526579737&ctl00%24middleContent%24TextBoxPass=LIRAN&ctl00%24middleContent%24TextBoxPriv=liran&ctl00%24middleContent%24CheckLicense=on
And this is the JavaScript function that validates this info:
function Continue_Click()
{
var LabelError = document.getElementById('ctl00_middleContent_LabelError');
var lnkButton1 = document.getElementById(middleContent + 'lnkButton1');
var msg = validateLoginPeleNumRecognizeUser(document.getElementById('ctl00_middleContent_TextBoxName').value);
if (msg == '')
{
if (validateLoginPeleNumEmail(document.getElementById('ctl00_middleContent_TextBoxName').value)){
musixMail = document.getElementById('ctl00_middleContent_TextBoxName').value;
var obj = document.getElementById('ctl00_middleContent_TextBoxPriv');
if (obj != null && obj.value != '')
msg = validateLoginUserLogin(obj.value);
if (msg == '')
{
if(document.getElementById('ctl00_middleContent_CheckLicense').checked)
{
if(log.login('recognize'))
{
__doPostBack('ctl00$middleContent$lnkButton1','');
}
}
else
LabelError.innerHTML = 'עליך להסכים לתנאי השימוש על מנת להמשיך לגלוש באתר'; // "You must agree to the terms of use to continue browsing the site"
}
else
LabelError.innerHTML = msg;
}
else{
msg = validateLoginPasswordLogin(document.getElementById('ctl00_middleContent_TextBoxPass').value);
if (msg == '')
{
var obj = document.getElementById('ctl00_middleContent_TextBoxPriv');
if (obj != null && obj.value != '')
msg = validateLoginUserLogin(obj.value);
if (msg == '')
{
if(document.getElementById('ctl00_middleContent_CheckLicense').checked)
{
if(log.login('recognize'))
{
__doPostBack('ctl00$middleContent$lnkButton1','');
}
}
else
LabelError.innerHTML = 'עליך להסכים לתנאי השימוש על מנת להמשיך לגלוש באתר'; // "You must agree to the terms of use to continue browsing the site"
}
else
LabelError.innerHTML = msg;
}
else
LabelError.innerHTML = msg;
}
}
else
LabelError.innerHTML = msg;
}
$(function(){
$('#ctl00_middleContent_TextBoxName,#ctl00_middleContent_TextBoxPass,#ctl00_middleContent_CellName').keypress(function(e){
if(e.keyCode==13)
Continue_Click();
});
});
Does anyone know how I can trigger this function when using curl?
Thanks
Curl is just a tool to make HTTP requests to a server, and (optionally) record the response. When you make the request you're thinking about, the server sends a few KB of text in response to curl's request.
In this case, the text is HTML with some Javascript embedded (or referenced). But curl doesn't know how to parse the HTML, because it's just a data-transfer tool. It received the data - job done.
So I think you're going to hit a dead end if you want curl to execute the JavaScript automatically. You'd need a JS engine to do this, as well as an HTML engine to parse the HTML and work out which JS commands should actually run. Rhino or SpiderMonkey could do the former, but since you have an HTML file rather than a JS file, this won't work well. Fundamentally, if you want this to work generally, comprehensively and autonomously, you'll need a tool that behaves identically to a browser, which, by definition, is a browser.
In most cases, though, if you're looking at a single site, you can work out the request that curl needs to send by sniffing the requests made by a browser. In the worst case, you might need to run a regex on the returned text to extract, e.g., the session ID; for a given site this isn't so bad. If you're not prepared to accept this level of brittleness, curl is simply not an appropriate tool for what you're doing.
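As an illustration of "work out the request and extract values with a regex", here is a sketch of building the POST body a browser would send. It assumes the page uses the standard ASP.NET `__doPostBack(target, argument)`, which fills the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the form, so the target `ctl00$middleContent$lnkButton1` is taken from the question's code. `extractHiddenField` and `buildPostBody` are hypothetical helpers, the regex assumes `name="..."` appears before `value="..."` in the `<input>` tag, and whatever `log.login('recognize')` does (possibly extra requests) is not reproduced here.

```javascript
// Hypothetical helper: read a hidden form field (e.g. __VIEWSTATE) from HTML
// returned by a previous GET. Assumes name="..." precedes value="...".
function extractHiddenField(html, name) {
  var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  var re = new RegExp('<input[^>]*name="' + escaped + '"[^>]*value="([^"]*)"', "i");
  var match = html.match(re);
  return match ? match[1] : null;
}

// Hypothetical helper: build the urlencoded body that the form submission
// triggered by __doPostBack('ctl00$middleContent$lnkButton1', '') would send.
function buildPostBody(html, fields) {
  var params = new URLSearchParams();
  params.set("__EVENTTARGET", "ctl00$middleContent$lnkButton1");
  params.set("__EVENTARGUMENT", "");
  var viewState = extractHiddenField(html, "__VIEWSTATE");
  if (viewState !== null) params.set("__VIEWSTATE", viewState);
  Object.keys(fields).forEach(function (key) {
    params.set(key, fields[key]);
  });
  return params.toString(); // "$" is encoded as %24, as in the captured request
}
```

The resulting string could then be sent with curl's `-d`/`--data` option, with `-c`/`-b` keeping the session cookie between the GET and the POST.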
