Getting the text out of a different webpage through javascript


Hello everybody,
I'm new to this site. I searched for 10 minutes and couldn't find an answer to my question, so I'll ask it here.
I'm trying to create a sidebar gadget for Windows Vista/7.
In that gadget I need to get some text from another page on the web (not on my domain).
I know I can't really do this directly, but I've found a workaround with an iframe.
Right now the page I want loads in the iframe, and I can see it.
The question is: how do I get the entire text of the page in the iframe into a label, a text area, or whatever?
Since Windows sidebar gadgets don't work with ASP, I need this done with pure HTML and JavaScript.
If anyone can help, please do.
Thanks,
Sagi.

The sidebar may not allow ASP, but it will certainly allow AJAX calls. Why not have your ASP page reside on the server and do all the hard work, while your gadget just calls that page?
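
A minimal sketch of the gadget side of this idea. It assumes a hypothetical server page at http://example.com/fetch.asp that takes the target address in a `url` query parameter and returns the page text; neither the address nor the parameter name comes from the question, they are placeholders.

```javascript
// Build the request URL for the (hypothetical) server-side page.
function buildProxyUrl(proxyBase, targetUrl) {
  return proxyBase + "?url=" + encodeURIComponent(targetUrl);
}

// Ask the server page for the remote text and pass it to onText.
// XMLHttpRequest is only available in a browser-like host such as the gadget.
function loadRemoteText(proxyBase, targetUrl, onText) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", buildProxyUrl(proxyBase, targetUrl), true);
  xhr.onreadystatechange = function () {
    // readyState 4 means the request has finished.
    if (xhr.readyState === 4 && xhr.status === 200) {
      onText(xhr.responseText);
    }
  };
  xhr.send(null);
}
```

In the gadget, the callback passed as `onText` could then copy the text into a text area, for example by assigning it to the element's `value`.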

You'll probably want to make an AJAX call to the server where the other web page is hosted.
jQuery might help with this (using .ajax()); otherwise, just search for how to do it in plain JavaScript.
[EDIT]
An intermediate server/proxy that makes the calls for you might help. You can use access control headers to allow cross-origin requests. See my here for more info :)

If the issue is that the page escapes your iframe, you can set up an intermediate server that gets the text from the server you want (as you mentioned, www.bbc.com) and serves it as clean text. Then have your widget include an iframe that loads the text from your intermediate server. This would be the cleanest approach, really.

Related

How do you keep content from your previous web page after clicking a link?

I'm sorry if this is a newbie question, but I don't really know what to search for either. How do you keep content from a previous page when navigating through a website? For example, the Activity/Chat bar on the right side of Facebook. It doesn't appear to refresh when going to different profiles; it's not an iframe and doesn't appear to be AJAX (I could be wrong).
Thanks,

I believe what you're seeing in Facebook is not actual "page loads", but clever use of AJAX or AHAH.
So ... imagine you've got a web page. It contains links. Each of those links has a "hook" -- a chunk of JavaScript that gets executed when the link gets clicked.
If your browser doesn't support JavaScript, the link works as it normally would on an old-fashioned page, and loads another page.
But if JavaScript is turned on, then instead of navigating to an HREF, the code run by the hook causes a request to be placed to a different URL that spits out just the HTML that should be used to replace a DIV that's already showing somewhere on the page.
There's still a real link in the HTML just in case JS doesn't work, so the HTML you're seeing looks as it should. Try disabling JavaScript in your browser and see how Facebook works.
Live updates like this are all over the place in Web 2.0 applications, from Facebook to Google Docs to Workflowy to Basecamp, etc. The "better" tools provide the underlying HTML links where possible so that users without JavaScript can still get full use of the applications. (This is called Progressive Enhancement or Graceful degradation, depending on your perspective.) Of course, nobody would expect Google Docs to work without JavaScript.
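
The hook idea described above can be sketched roughly as follows. This is an illustration only, not how Facebook actually does it; the function names and the fragment URL are made up for the example.

```javascript
// Attach a click "hook" to a link. With JavaScript enabled, the hook requests
// an HTML fragment from fragmentUrl and swaps it into an element that is
// already on the page. Without JavaScript, the link's plain href still works.
function enhanceLink(link, fragmentUrl, target, loadFragment) {
  link.onclick = function () {
    loadFragment(fragmentUrl, function (html) {
      target.innerHTML = html;
    });
    return false; // cancel the normal page load
  };
}

// One possible loader, using XMLHttpRequest (browser only).
function xhrLoadFragment(url, onHtml) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      onHtml(xhr.responseText);
    }
  };
  xhr.send(null);
}
```

Because only the target element is replaced, anything else on the page (such as a chat bar) stays as it was.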

In the case of a chat like Facebook, you must save the entire conversation on the server side (for example in a database). Then, when the user changes the page, you can restore the state of the conversation on the server side (with PHP) or by querying your server like you do for the chat (Javascript + AJAX).

This isn't done in Javascript. It needs to be done using your back-end scripting language.
In PHP, for example, you use Sessions. The variables set by server-side scripts can be maintained on the server and tied together (between multiple requests/hits) using a cookie.
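
To show the idea of server-side variables tied together by a cookie, here is a rough sketch written in JavaScript rather than PHP. It is not how PHP implements sessions; the cookie name `sid`, the in-memory store, and the function names are all assumptions made for the example.

```javascript
// In-memory session store: session id -> data object.
var sessions = {};
var nextId = 1;

// Extract the value of one cookie from a Cookie request header, or null.
function readCookie(cookieHeader, name) {
  var parts = (cookieHeader || "").split(";");
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].replace(/^\s+/, "");
    var eq = pair.indexOf("=");
    if (eq !== -1 && pair.substring(0, eq) === name) {
      return pair.substring(eq + 1);
    }
  }
  return null;
}

// Return the session for this request, creating one if the cookie is missing
// or unknown. The caller would send `id` back in a Set-Cookie header.
function getSession(cookieHeader) {
  var id = readCookie(cookieHeader, "sid");
  if (id === null || !sessions.hasOwnProperty(id)) {
    id = "s" + (nextId++); // NOT secure: real ids must be unguessable
    sessions[id] = {};
  }
  return { id: id, data: sessions[id] };
}
```

Each request that carries the same `sid` cookie gets the same data object back, which is what lets state survive from one page to the next.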

One really helpful trick is to run HTTPFox in Firefox so you can actually monitor what's happening as you browse from one page to the next. You can check out the POST/Cookies/Response tabs and watch for which web methods are being called by the AJAX-like behaviors on the page. In doing this you can generally deduce how data is flowing to and from the pages, even though you don't have access to the server side code per se.
As for the answer to your specific question, there are too many approaches to list (cookies, server-side persistence such as session or database writes, a simple form POST, VIEWSTATE in .NET, etc.).

You can reopen your last closed web page by pressing Ctrl+Shift+T, and then save its content as you like. For example, if I closed a page about document sharing and am now on a travel page, pressing Ctrl+Shift+T automatically reopens the closed page. This shortcut works in Mozilla, Internet Explorer, Opera, and more. I hope this answer is helpful to you.

ideas/hacks for using javascript inside an iframe with remote content

I'm trying to throw together a proof of concept in Rails where I want to put a remote site's content inside an iframe and then use jQuery to modify the content in the iframe. I know that I can't really do this because of cross-site scripting protections, and I also know it's not a great design; it's just a hack as a proof of concept. But is there any way to scrape the HTML from the remote site and pipe it into the iframe?
Thanks!

The short answer is that you can't do this.
However, you can try a crazy solution such as downloading the remote page with wget and then pointing the iframe at the downloaded file. But then the page is no longer really on a different domain.
BTW, you may want to have a look at https://developer.mozilla.org/en/DOM/window.postMessage; it may be helpful in some cases.
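
For reference, a small sketch of the receiving side of window.postMessage. Note that it only helps when the framed page cooperates by sending messages itself. The origins and the helper name are placeholders, not anything from the question.

```javascript
// Build a "message" event handler that only accepts messages from one origin.
function makeMessageHandler(allowedOrigin, onData) {
  return function (event) {
    if (event.origin !== allowedOrigin) {
      return; // ignore messages from unexpected origins
    }
    onData(event.data);
  };
}

// In the embedding page (browser only), with handleData being your own callback:
//   window.addEventListener("message",
//     makeMessageHandler("http://remote.example", handleData), false);
//
// In the framed page, which must cooperate by sending the message:
//   window.parent.postMessage("some text", "http://host.example");
```

Checking `event.origin` matters because any window can post a message to yours.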

How to offer a webapp to other sites. (div with javascript, iframe or..?)

I am quite new to web application development, and I need to know how to let other sites use my web app.
My web app basically takes a username and returns some data from my DB. This should be visible from other websites.
My options are:
iframe. The website owners embed an iframe and pass the user ID in the query string. I render a web page with the data, and it is shown inside the iframe.
pros: easy to do, working already.
cons: the websites won't know the data returned, and they may like to know it.
javascript & div. They paste a div and some JavaScript code into their websites, and the div content is updated with the data retrieved by the small script.
pros: the website would be able to get the data.
cons: I could mess up their website, and I don't know how I would run the JavaScript code apart from triggering it on document ready, and I wouldn't like to add jQuery libraries to their sites.
There must be better ways to integrate web applications than what I'm thinking of. Could someone give me some advice?
Thanks

Iframes cannot communicate with pages that are on a different domain. If you want to inject content into someone else's page and still be able to interact with that page you need to include (or append) a JavaScript tag (that points to your code) to the hosting page, then use JavaScript to write your content into the hosting page.

Context Framework contains embedded mode support, where page components can be injected to other pages via Javascript. It does depend on jQuery but it can always be used in noConflict-mode. At current release the embedded pages must be on same domain so that same-origin-policy is not violated.
In the next release, embedded mode can be extended to use JSONP which enables embedding pages everywhere.

If what you really want is to expose the data, but not the visual content, then I'd consider exposing your data via JSONP. There are caveats to this approach, but it could work for you. There was an answer here a couple of days ago about using a Web Service, but this won't work directly from the client because of the browser's Same Origin policy. It's a shame that the poster of that answer deleted it rather than leave it here as he inadvertently highlighted some of the misconceptions about how browsers access remote content.
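
A rough sketch of what a JSONP request from the host site could look like. The endpoint, the `username` and `callback` parameter names, and the function names are assumptions for illustration; the server would have to wrap its JSON in a call to the named callback.

```javascript
var jsonpCounter = 0;

// Build the script URL, e.g. http://api.example/user?username=bob&callback=cb0
function buildJsonpUrl(baseUrl, username, callbackName) {
  return baseUrl + "?username=" + encodeURIComponent(username) +
    "&callback=" + encodeURIComponent(callbackName);
}

// Register a one-shot global callback, then load the remote script that
// calls it. `doc` and `globalObj` are normally `document` and `window`.
function requestJsonp(doc, globalObj, baseUrl, username, onData) {
  var callbackName = "jsonpCallback" + (jsonpCounter++);
  globalObj[callbackName] = function (data) {
    delete globalObj[callbackName]; // clean up after the single use
    onData(data);
  };
  var script = doc.createElement("script");
  script.src = buildJsonpUrl(baseUrl, username, callbackName);
  doc.getElementsByTagName("head")[0].appendChild(script);
}
```

One of the caveats is that the host page runs whatever script the server returns, so the host site has to trust the data provider.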

web crawler/spider to fetch ajax based link

I want to create a web crawler/spider that iteratively fetches all the links in a web page, including JavaScript-based (AJAX) links, catalogs all of the objects on the page, and builds and maintains a site hierarchy. My questions are:
Which language/technology would be better for fetching JavaScript-based links?
Are there any open source tools for this?
Thanks
Brajesh

You can automate the browser. For example, have a look at http://watir.com/

Fetching AJAX links is something that even the search giants haven't accomplished yet. This is because AJAX links are dynamic, and both the command and the response vary greatly with the user's actions. That's probably why SEF-AJAX (Search Engine Friendly AJAX) is now being developed. It is a technique that makes a website completely indexable by search engines while still acting as a web application when visited in a web browser. For reference, you may check this link: http://nixova.com
No offence, but I don't see any way of tracking AJAX links. That's where my knowledge ends. :)

You can do it with PHP, simple_html_dom, and Java. Let the PHP crawler copy the pages to your local machine or web server, open them with a Java application (jpane or something), select all the text, and grab it. Send it to your database or wherever you want to store it. Track all a tags and any tags with an onclick or mouseover attribute, and check what happens when you call it again. If the size or MD5 hash of the source HTML (the document returned from the server) is different, you know it's an effective link and can grab it. I hope you can understand my bad English :D
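
The comparison step at the end of this answer can be sketched as below. The answer itself uses PHP and Java; this version uses JavaScript on Node.js with its built-in `crypto` module, purely to illustrate the size/MD5 check. The function names are made up for the example.

```javascript
var crypto = require("crypto");

// Hex MD5 digest of a string.
function md5(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// A link counts as "effective" if triggering it changed the returned HTML,
// judged by a different length or a different MD5 hash.
function isEffectiveLink(htmlBefore, htmlAfter) {
  return htmlBefore.length !== htmlAfter.length ||
    md5(htmlBefore) !== md5(htmlAfter);
}
```

Pages that embed timestamps or random tokens will look changed on every fetch, so the check may need to ignore such parts in practice.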

Can I prevent user pasting Javascript into Design Mode IFrame?

I'm building a web app that contains an IFrame in design mode so my users can "tart" their content up and paste in content to be displayed on their page, like the WYSIWYG editor on most blog engines or forums.
I'm trying to think of all potential security holes I need to plug, one of which is a user pasting in Javascript:
<script type="text/javascript">
// Do some nasty stuff
</script>
Now I know I can strip this out at the server end, before saving it and/or serving it back, but I'm worried about the possibility of someone being able to paste some script in and run it there and then, without even sending it back to the server for processing.
Am I worrying over nothing?
Any advice would be great, couldn't find much searching Google.
Anthony

...I'm worried about the possibility of someone being able to paste some script in and run it there and then, without even sending it back to the server for processing.
Am I worrying over nothing?
Firefox has a plug-in called Greasemonkey that allows users to arbitrarily run JavaScript against any page that loads into their browser, and there is nothing you can do about it. Firebug allows you to modify web pages as well as run arbitrary JavaScript.
AFAIK, you really only need to worry once it gets to your server, and then potentially hits other users.
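
As a very small illustration of handling this on the server before the content reaches other users, here is a sketch of HTML-escaping in JavaScript. It assumes the content should be shown as plain text; a WYSIWYG editor that must keep some markup would instead need an allowlist-based sanitizer, which this sketch does not attempt.

```javascript
// Escape the characters that are significant in HTML so that pasted markup,
// including <script> tags, is displayed as text instead of being executed.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```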

As Jason said, I would focus more on cleaning the data on the server side. You don't really have any real control on the client side unless you're using Silverlight / Flex and even then you'd need to check the server.
That said, here are some tips from "A List Apart" that you may find helpful regarding server-side data cleaning.
http://www.alistapart.com/articles/secureyourcode
