I am trying to match a token (a string) in an RSS feed using CasperJS waitFor(), but it does not seem to work. There are other ways to get around this (not using polling), but I need to poll for it. Here is the code snippet:
casper.then(function() {
    this.waitFor(function matchToken() {
        return this.evaluate(function() {
            if (!this.resourceExists(token)) {
                this.reload();
                return false;
            }
            return true;
        });
    });
});
The updates to the RSS URL are not dynamic, so a refresh is needed to check for the token. But it seems (from the access log) that I am not getting any hits on the RSS URL, i.e. the reload is not working. Ideally, I want to refresh the page if it doesn't see the token, check for the token again, and keep doing that until the waitFor() times out.
I also tried using assertTextExists() instead of resourceExists() but even that did not work.
I am using PhantomJS (1.9.7) and the URL is: https://secure.hyper-reach.com:488/rss/323708
The token I am looking for is --> item/272935. If you look at the URL mentioned above, you will find it in each guid tag. The reason I am including "item/" as part of the token is so that it doesn't incorrectly match any other numbers.
evaluate() runs in the sandboxed page context. Anything inside of it doesn't have access to variables defined outside, and this refers to the window of the page, not to casper. You don't need the evaluate() function here, since you don't access the page context.
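For illustration only, if you did need the page context, the variable would have to be passed in explicitly; a minimal sketch (not part of the actual fix below):

casper.then(function() {
    var found = this.evaluate(function(tok) {
        // runs in the page; "tok" is the token passed in from the casper context
        return document.body.textContent.indexOf(tok) !== -1;
    }, token);
    this.echo("token found in page: " + found);
});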
The other thing is that casper.resourceExists() works on the resource metadata such as URL and request headers. It seems that you want to check the content of the resource. If you used casper.thenOpen() or casper.open() to open the RSS feed, then you can check with casper.getPageContent() whether the text exists.
The actual problem with your code is that you mix synchronous and asynchronous code in a way that won't work. waitFor() is the wrong tool for the job, because you need to reload in the middle of its execution, but the check function is called so fast that there probably won't be a complete page load to actually test it.
You need to recursively check whether the document is changed to your liking.
var tokenTrials = 0,
    tokenFound = false;

function matchToken() {
    if (this.getPageContent().indexOf(token) === -1) {
        // token was not found
        tokenTrials++;
        if (tokenTrials < 50) {
            this.reload().wait(1000).then(matchToken);
        }
    } else {
        tokenFound = true;
    }
}

casper.then(matchToken).then(function() {
    test.assertTrue(tokenFound, "Token was found after " + tokenTrials + " trials");
});
I have an html file with many <a> tags with href links.
I would like to have the page do nothing when these links point to an outside url (http://....) or an internal link that is broken.
The final goal is to have the html page used offline without having any broken links. Any thoughts?
I have tried using a Python script to change all links but it got very messy.
Currently I am trying to use JavaScript and calls such as $("a").click(function(event) { ... }) to handle these clicks, but these have not been working offline.
Also, caching the pages will not be an option because they will never be opened online. In the long run, this may also need to be adapted to src attributes, and will be used in thousands of html files.
Lastly, it would be preferable to use only standard and built in libraries, as external libraries may not be accessible in the final solution.
UPDATE: This is what I have tried so far:
//Register link clicks
$("a").click(function(event) {
    checkLink(this, event);
});
//Checks to see if the clicked link is available
function checkLink(link, event) {
    //Is this an outside link?
    var outside = (link.href).indexOf("http") >= 0 || (link.href).indexOf("https") >= 0;
    //Is this an internal link?
    if (!outside) {
        if (isInside(link.href)) {
            console.log("GOOD INSIDE LINK CLICKED: " + link.href);
            return true;
        }
        else {
            console.log("BROKEN INSIDE LINK CLICKED: " + link.href);
            event.preventDefault();
            return false;
        }
    }
    else {
        //This is outside, so stop the event
        console.log("OUTSIDE LINK CLICKED: " + link.href);
        event.preventDefault();
        return false;
    }
}
//DOESN'T WORK
function isInside(link) {
    $.ajax({
        url: link, //or your url
        success: function(data) {
            return true;
        },
        error: function(data) {
            return false;
        },
    });
}
Also an example:
Outside Link : Do Nothing ('#')
Outside Link : Do Nothing ('#')
Existing Inside Link : Follow Link
Inexistent Inside Link : Do Nothing ('#')
Javascript based solution:
If you want to use JavaScript, you can fix your isInside() function by setting the $.ajax() call to be non-asynchronous. That will cause it to wait for a response before returning. See jQuery.ajax. Pay attention to the warning that synchronous requests may temporarily lock the browser, disabling any actions while the request is active (this may be good in your case).
Also, instead of doing a 'GET', which is what $.ajax() does by default, your request should be a 'HEAD' (assuming your internal web server hasn't disabled responding to this HTTP verb). 'HEAD' is like 'GET' except that it doesn't return the body of the response, so it's a good way to find out whether a resource exists on a web server without having to download the entire resource.
// Formerly isInside. Renamed it to reflect its function.
function isWorking(link) {
    // Track the result in a local variable; returning from the ajax
    // callbacks does not return from isWorking itself.
    var working = false;
    $.ajax({
        url: link,
        type: 'HEAD',
        async: false,
        success: function() { working = true; },
        error: function() { working = false; }
    });
    return working;
}
Python based solution:
If you don't mind preprocessing the html page (and even caching the result), I would go with parsing the HTML in Python using a library like BeautifulSoup.
Essentially I would find all the links on the page, and replace the href attribute of those starting with http or https with #. You can then use a library like requests to check the internal urls and update the appropriate urls as suggested.
Here is some JavaScript that will prevent you from going to an external site:
var anchors = document.getElementsByTagName('a');
for (var i = 0, ii = anchors.length; i < ii; i++) {
    anchors[i].addEventListener('click', function(evt) {
        if (this.href.slice(0, 4) === "http") {
            evt.preventDefault();
        }
    });
}
EDIT:
As far as checking whether a local path is good on the client side, you would have to send an ajax call and then check the status code of the call (the infamous 404). However, you can't do ajax from a static html file (e.g. file://index.html); it would need to be running on some kind of local server.
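If the pages are served by a local web server, a minimal plain-JavaScript sketch of that status-code check could look like this (checkLocalPath is a made-up name for illustration, not code from the answer above):

function checkLocalPath(link, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('HEAD', link, true);
    xhr.onreadystatechange = function() {
        if (xhr.readyState === 4) {
            // 404 (or any other error status) means the link is broken
            callback(xhr.status >= 200 && xhr.status < 400);
        }
    };
    xhr.send();
}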
Here is another Stack Overflow question that talks about that issue.
I want to prevent users from navigating to URLs that are not accessed through an html element. Example:
Currently navigating on: myweb.com/news
Now the user goes to myweb.com/news?article_id=10 by writing it in the browser navigation bar, avoiding pressing any element (like <a>).
When the user writes myweb.com/news?article_id=10 in the browser URL bar, at the moment he presses enter, the browser should not allow him to navigate to that URL.
I have tried:
//This won't work since jQuery does not support it
$(window.location.href).on('change', function() {
    //Here check if href contains '?'
    alert("Not allowed");
});

//Neither works, doesn't do anything
$(window).on('change', function() {
    alert("Not allowed");
});
References:
There is something similar asked here: On - window.location.hash - Change?, but I'm interested in the 'parameter' version of that question.
There are some known solutions:
1) Each time a user clicks a link, you save the page value in a cookie. Later, on the server, you check that interval (value-1 ... value+1).
2) You can also save the value to a hidden field and check it on the server.
So let's say a user is on page 3 (the server served that page, so a cookie/hidden value of 3 exists).
Now he tries to go to page 10: on the server side, you read the cookie plus the requested page number, and if the interval is bigger than 1, you deny that request.
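The answer doesn't name a server technology, but as an illustration of option 1, a rough Node/Express sketch (Express, cookie-parser, and the route and cookie names are my assumptions) could look like this:

var express = require('express');
var cookieParser = require('cookie-parser');
var app = express();
app.use(cookieParser());

app.get('/news', function(req, res) {
    var requested = parseInt(req.query.article_id, 10) || 1;
    var lastServed = parseInt(req.cookies.lastArticle, 10) || 1;

    // If the jump from the last served page is bigger than 1, the user
    // most likely typed the URL instead of following a link, so deny it.
    if (Math.abs(requested - lastServed) > 1) {
        return res.status(403).send('Not allowed');
    }

    // Remember the page being served now, for the next check.
    res.cookie('lastArticle', String(requested));
    res.send('article ' + requested);
});

app.listen(3000);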
Try adding an event listener:
window.addEventListener('popstate', function(event)
{
    var location = document.location;
    var state = JSON.stringify(event.state);
    // inspect location and state here to decide what to do
});
To check the URL, the best thing would be to match it against a regex like:
if (url.match(/\?./)) {
    // do not allow access
}
You might need to extend this, depending on other URL's that you need to forbid access to.
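Putting the two pieces together, a rough sketch of my own (note that popstate only fires for history navigation within the same document, not when the user types a new URL and presses enter, so this only covers part of the scenario):

window.addEventListener('popstate', function(event) {
    var url = document.location.href;
    if (url.match(/\?./)) {
        // do not allow access: send the user back to the previous entry
        history.back();
    }
});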
Disclaimer
Firstly, a disclaimer: I am working within specific boundaries, so whilst it may seem I'm going about something the long way round, I am limited as to what I can do. I know I should be doing this entirely differently, but I cannot. If it's not possible to do what I'm trying to do here, then that's fine, I just need to know.
Background
Basically, this boils down to a cross-domain javascript call. However, I need to wait for the response before returning the method.
Say I have a page - example1.com/host.html. This contains a javascript method of 'ProvideValue()' which returns an int. Edit: This method must be executed where it is found, since it may need to access other resources within that domain, and access global variables set for the current session.
https://example1.com/host.html
function ProvideValue() {
    return 8; // In reality, this will be a process that returns a value
}
This host.html page contains an iframe pointing to example2.com/content.html (note the different domain). This content.html page contains a method that needs to display the value from host.html in an alert.
https://example2.com/content.html
function DisplayValue() {
    var hostValue = // [get value from ProvideValue() in host.html]
    alert(hostValue);
}
That's it.
Limitations
I can run any javascript I like on the host.html, but nothing server-side. On content.html I can run javascript and anything server-side. I have no control over the example1.com domain, but full control over example2.com.
Question
How can I retrieve the value from ProvideValue() on example1.com/host.html within the DisplayValue() method on example2.com/content.html?
Previous Attempts
Now, I've tried many of the cross-domain techniques, but all of them (that I've found) use an asynchronous callback. That won't work in this case, because I need to make the request to the host.html, and receive the value back, all within the scope of a single method on the content.html.
The only solution I got working involved relying on asynchronous cross-domain scripting (using easyXDM) and a server-side list of requests/responses on example2.com. The DisplayValue() method made the request to host.html, then immediately made a synchronous post to the server. The server would then wait until it got notified of the response from the cross-domain callback. Whilst waiting, the callback would make another call to the server to store the response. It worked fine in Firefox and IE, but Chrome wouldn't execute the callback until DisplayValue() completed. If there is no way to address my initial question, and this option has promise, then I will pose this as a new question, but I don't want to clutter this question with multiple topics.
Use XMLHttpRequest with CORS to make synchronous cross-domain requests.
If the server doesn't support CORS, use a proxy which adds the appropriate CORS headers, e.g. https://cors-anywhere.herokuapp.com/ (source code at https://github.com/Rob--W/cors-anywhere).
Example 1: Using synchronous XHR with CORS
function getProvidedValue() {
    var url = 'http://example.com/';
    var xhr = new XMLHttpRequest();
    // third param = false = synchronous request
    xhr.open('GET', 'https://cors-anywhere.herokuapp.com/' + url, false);
    xhr.send();
    var result = xhr.responseText;
    // do something with response (text manipulation, *whatever*)
    return result;
}
Example 2: Use postMessage
If it's important to calculate the values on the fly with session data, use postMessage to continuously update the state:
Top-level document (host.html):
<script src="host.js"></script>
<iframe name="content" src="https://other.example.com/content.html"></iframe>
host.js
(function() {
    var cache = {
        providedValue: null,
        otherValue: ''
    };
    function sendUpdate() {
        if (frames.content) { // "content" is the name of the iframe
            frames.content.postMessage(cache, 'https://other.example.com');
        }
    }
    function recalc() {
        // Update values
        cache.providedValue = provideValue();
        cache.otherValue = getOtherValue();
        // Send (updated) values to frame
        sendUpdate();
    }
    // Listen for changes using events, pollers, WHATEVER
    yourAPI.on('change', recalc);

    window.addEventListener('message', function(event) {
        if (event.origin !== 'https://other.example.com') return;
        if (event.data === 'requestUpdate') sendUpdate();
    });
})();
A script in content.html: content.js
var data = {}; // Global
var parentOrigin = 'https://host.example.com';
window.addEventListener('message', function(event) {
    if (event.origin !== parentOrigin) return;
    data = event.data;
});
parent.postMessage('requestUpdate', parentOrigin);

// To get the value:
function displayValue() {
    var hostName = data.providedValue;
}
This snippet is merely a demonstration of the concept. If you want to apply the method, you probably want to split the logic in the recalc function so that a value is only recalculated when that particular value is updated (instead of recalculating everything on every update).
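If you do split it, a rough sketch building on the host.js snippet above (the event names here are made up) would give each cached value its own updater:

function updateProvidedValue() {
    cache.providedValue = provideValue();
    sendUpdate();
}

function updateOtherValue() {
    cache.otherValue = getOtherValue();
    sendUpdate();
}

// Hook each updater to the event that actually affects its value
// ("yourAPI" is the same placeholder as in the snippet above).
yourAPI.on('providedValueChange', updateProvidedValue);
yourAPI.on('otherValueChange', updateOtherValue);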
I keep getting the OAuthException (#210) "Subject must be a page." error even though I am using the Page Access Token and not the App Access Token.
I am using the following:
Latest JavaScript SDK from facebook (//connect.facebook.net/en_US/all.js)
Calling the /{PAGE_ID}/tabs?app_id={APP_ID}&method=POST&access_token={PAGE_ACCESS_TOKEN} using the FB.api method once the user is logged in.
My application is not FBML but a Canvas / iFrame app. What am I doing wrong?
I have researched the web, including Stack Overflow and other Facebook forums, but still have no answer on this. OAuth is enabled for my application.
Also, if I copy and paste the link into a browser it works fine. It does not if I do it using the API.
I finally got it working.
However, instead of using FB.api to call the link above, I used jQuery.
I used jQuery "$.getJSON(url)" and it worked.
It works as below.
Construct the link as below.
"https://graph.facebook.com/{PAGE_ID}/tabs?app_id={APP_ID}&method=POST&access_token={PAGE_ACCESS_TOKEN}&callback=?"
Call the jQuery method as below.
"$.getJSON(pageUrl, OnCallBack);" where "OnCallBack" is the call back method. You can do anything that you would need in the call back. In my case it was something like below.
function OnCallBack(r, s) {
    var html = "";
    if (s == "success" && !r.error) {
        for (var p in r) {
            html += p + ": " + r[p] + "<br />";
        }
    } else {
        html = r.error.message;
    }
    $("#dv").html(html);
}
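Put together, the call looks roughly like this (placeholders kept exactly as above, so this is only a sketch of the shape of the call):

var pageUrl = "https://graph.facebook.com/{PAGE_ID}/tabs?app_id={APP_ID}"
            + "&method=POST&access_token={PAGE_ACCESS_TOKEN}&callback=?";
// OnCallBack is the callback defined above
$.getJSON(pageUrl, OnCallBack);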
To anybody who gets this error again:
I got the same error message while using WinINet to post an https request to https://graph.facebook.com/......
I just changed the verb from "POST" to "GET", and then it worked:
//string strVerb = "POST";
string strVerb = "GET";
PS: variable "strVerb" is used as the 2nd parameter of windows function HttpOpenRequest.
Let's say I have a web page (/index.html) that contains the following
<li>
    <div>item1</div>
    details
</li>
and I would like to have some javascript on /index.html to load that
/details/item1.html page and extract some information from that page.
The page /details/item1.html might contain things like
<div id="some_id">
picture
map
</div>
My task is to write a Greasemonkey script, so changing anything server-side is not an option.
To summarize, javascript is running on /index.html and I would
like to have the javascript code to add some information on /index.html
extracted from both /index.html and /details/item1.html.
My question is how to fetch information from /details/item1.html.
I currently have written code to extract the link (e.g. /details/item1.html)
and pass this on to a method that should extract the wanted information (at first
just .innerHTML from the some_id div is ok, I can process further later).
The following is my current attempt, but it does not work. Any suggestions?
function get_information(link)
{
    var obj = document.createElement('object');
    obj.data = link;
    document.getElementsByTagName('body')[0].appendChild(obj);
    var some_id = document.getElementById('some_id');
    if (!some_id) {
        alert("some_id == NULL");
        return "";
    }
    return some_id.innerHTML;
}
First:
function get_information(link, callback) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", link, true);
    xhr.onreadystatechange = function() {
        if (xhr.readyState === 4) {
            callback(xhr.responseText);
        }
    };
    xhr.send(null);
}
then
get_information("/details/item1.html", function(text) {
var div = document.createElement("div");
div.innerHTML = text;
// Do something with the div here, like inserting it into the page
});
I have not tested any of this - off the top of my head. YMMV
Since only one page exists in the client (browser) at a time and all other (virtual/possible) pages are on the server, how will you get information from another page using JavaScript? You will have to interact with the server at some point to retrieve the second page.
If you can, integrate some AJAX-request to load the second page (and parse it), but if that's not an option, I'd say you'll have to load all pages that you want to extract information from at the same time, hide the bits you don't want to show (in hidden DIVs?) and then get your index (or whoever controls the view) to retrieve the needed information from there ... even though that sounds pretty creepy ;)
You can load the page in a hidden iframe and use normal DOM manipulation to extract the results, or get the text of the page via AJAX, grab the part between <body...> and </body>, and temporarily inject it into a div. (The second might fail for some exotic elements like ins.) I would expect Greasemonkey to have more powerful functions than normal JavaScript for stuff like that, though - it might be worth thumbing through the documentation.
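A rough sketch of the hidden-iframe variant (my own illustration, assuming both pages are on the same origin, as in the question):

function get_information_via_iframe(link, callback) {
    var iframe = document.createElement('iframe');
    iframe.style.display = 'none'; // keep it invisible
    iframe.src = link;
    iframe.addEventListener('load', function() {
        var doc = iframe.contentDocument || iframe.contentWindow.document;
        var el = doc.getElementById('some_id');
        callback(el ? el.innerHTML : '');
        document.body.removeChild(iframe); // clean up
    });
    document.body.appendChild(iframe);
}

// Usage, mirroring the question's markup:
get_information_via_iframe('/details/item1.html', function(html) {
    console.log(html);
});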