I'm interested in writing a script, preferably one that's easy to add to browsers with tools such as Greasemonkey, that sends a page's HTML source code to an external server, where it will later be parsed and the useful data sent to a database.
However, I haven't seen anything like that and I'm not sure how to approach this task. I would imagine some sort of HTTP POST would be the best approach, but I'm completely new to these ideas, and I'm not even sure exactly where to send the data to parse it (it doesn't make sense to send an entire HTML document to a database, for instance).
So basically, my overall goal is something that works like this (note that I only need help with steps 1 and 2. I am familiar with data parsing techniques, I've just never applied them to the web):
User views a particular page
Source code is sent via greasemonkey or some other tool to a server
The code is parsed into meaningful data that is stored in a MySQL database.
Any tips or help is greatly appreciated, thank you!
Edit: Code
ihtml = document.body.innerHTML;
GM_xmlhttpRequest({
    method: 'POST',
    url: 'http://www.myURL.com/getData.php',
    data: "SomeData=" + escape(ihtml)
});
Edit: Current JS Log:
Namespace/GMScriptName: Server Response: 200
OK
4
Date: Sun, 19 Dec 2010 02:41:55 GMT
Server: Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 PHP-CGI/0.9
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
Array
(
)
http://www.url.com/getData.php
As mentioned in the comment on your Q, I'm not convinced this is a good idea and personally, I'd avoid any extension that did this like the plague but...
You can use the innerHTML property available on all HTML elements to get the HTML inside that node - e.g. the body element. You could then use an AJAX HTTP(S!) request to POST the data.
You might also want to consider some form of compression as some pages can be very large and most users have better download speeds than upload speeds.
NB: innerHTML gets a representation of the source code that would display the page in its current state, NOT the actual source that was sent from the web server - e.g. if you used JS to add an element, the source for that element would be included in innerHTML even though it was never sent across the web.
An alternative would be to use an AJAX request to GET the current URL and send yourself the response. This would be exactly what was sent to the client, but the server in question will be aware the page was served twice (and in some web applications that may cause problems - e.g. by "pressing" a delete button twice).
One final suggestion would be to simply send the current URL to yourself and do the download on your own servers. This would also mitigate some of the security risks, as you wouldn't be able to retrieve the content of pages which aren't public.
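A minimal sketch of that last suggestion, assuming a hypothetical PageURL field name and getData.php endpoint (neither is mandated by the original setup):

```javascript
// Hypothetical helper: build the urlencoded POST body for sending only
// the current page's URL, rather than its whole source.
function buildUrlPayload(pageUrl) {
    return "PageURL=" + encodeURIComponent(pageUrl);
}

// In the userscript itself you would then post it (Greasemonkey-only API,
// shown as a comment since it only exists in the userscript sandbox):
// GM_xmlhttpRequest({
//     method: 'POST',
//     url: 'http://example.com/getData.php',
//     data: buildUrlPayload(location.href),
//     headers: {'Content-type': 'application/x-www-form-urlencoded'}
// });
```

Your server-side script would then fetch and parse the page itself, so nothing non-public ever reaches it.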
EDIT:
NB: I've deleted much spurious information which was used in tracking down the problem, check the edit logs if you want full details
PHP Code:
<?php
$PageContents = $_POST['PageContents'];
?>
GreaseMonkey script:
var ihtml = document.body.innerHTML;
GM_xmlhttpRequest({
    method: 'POST',
    url: 'http://example.com/getData.php',
    // encodeURIComponent (rather than the deprecated escape) handles
    // UTF-8 and the '+' character correctly in form-urlencoded data
    data: "PageContents=" + encodeURIComponent(ihtml),
    headers: {'Content-type': 'application/x-www-form-urlencoded'}
});
Related
I have a XUL extension which uses a native Windows DLL and js-ctypes to encrypt files inside the local file system. I have already tested a menu driven version of it and it seems to work fine.
Now I would like to do the following: When creating a new email with attachments, be able to "catch" the attachment file and process it (meaning: encrypt it) before uploading to the composed email message. I would like to do it in a transparent fashion so the user does not have to go through the menu driven process except for providing the password for encryption.
I want to do this inside the outlook.com web based email (not Office version).
I know it's a long shot, but does anybody have an idea of where to start looking? Has anybody done something like this in the past?
Thanks in advance!
A good place to start is an addon that already does what you want (in a generic way):
https://addons.mozilla.org/en-US/firefox/addon/tamper-data/
On the download page it says "Use tamperdata to view and modify HTTP/HTTPS headers and post parameters." You're interested in changing post parameters, so that's a good place to start.
But if you just want to implement this yourself....
I've answered this out-of-order, in order to progress in the way you might build up the solution in development.
In the final extension, you'll need to:
Intercept the request
Target the correct request(s)
Get access to the POST request body
Parse the POST request body's form data (to get the real binary file data)
Do your encryption step
Re-encode the binary file data, re-assemble the form-data, and modify the POST request headers
Replace the existing content in POST request.
Intercepting the request & replacing the existing POST content
The basics are that you need to implement an nsIObserver passing an nsIHTTPChannel as the "subject" for observation. The "notification" you wish to observe is called http-on-modify-request.
There are simple examples (1, 2) for intercepting GET requests in the documentation for http-on-modify-request, however intercepting POST requests is more complicated.
Getting at the POST request body:
There's a mozillazine forum thread that deals with this exact topic.
Kamelot9's 2nd post in that thread details how to (1) get at the post body:
var httpChannel = aSubject.QueryInterface(Ci.nsIHttpChannel);
var uploadChannel = httpChannel.QueryInterface(Ci.nsIUploadChannel);
var uploadChannelStream = uploadChannel.uploadStream;
uploadChannelStream
.QueryInterface(Ci.nsISeekableStream)
.seek(Ci.nsISeekableStream.NS_SEEK_SET, 0);
var stream = Cc["@mozilla.org/binaryinputstream;1"]
.createInstance(Ci.nsIBinaryInputStream);
stream.setInputStream(uploadChannelStream);
var postBytes = stream.readByteArray(stream.available());
var poststr = String.fromCharCode.apply(null, postBytes);
where aSubject here comes as a parameter to your http-on-modify-request notification. You can then just modify poststr. Depending on the server, you might also need to modify the Content-length header (or your post may be truncated).
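As a sketch of that last point, here is one way to modify poststr and keep Content-length in sync. The replaceField helper and the SomeData field name are hypothetical; poststr.length works as a byte count because the string was built byte-for-byte with String.fromCharCode:

```javascript
// Sketch only: naive replacement of one field in a urlencoded POST body.
// Assumes the field exists and the body is application/x-www-form-urlencoded.
function replaceField(poststr, fieldName, newValue) {
    var re = new RegExp("(^|&)" + fieldName + "=[^&]*");
    return poststr.replace(re, "$1" + fieldName + "=" + encodeURIComponent(newValue));
}

var modified = replaceField("a=1&SomeData=old&b=2", "SomeData", "new value");

// In the observer you would then update the header (Firefox-only API,
// shown as a comment for context):
// httpChannel.setRequestHeader("Content-Length", String(modified.length), false);
```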
Replacing POST request content:
Once you've got your modified POST body, you need to (2) replace the existing content of the inputStream in the uploadChannel with your own:
var inputStream = Cc["@mozilla.org/io/string-input-stream;1"]
.createInstance(Ci.nsIStringInputStream);
inputStream.setData(poststr, poststr.length);
uploadChannel.setUploadStream(
inputStream,
"application/x-www-form-urlencoded",
-1);
// do this last - setUploadStream resets requestMethod to PUT
httpChannel.requestMethod = "POST";
Cc and Ci above are just shorthand for Components.classes and Components.interfaces respectively. These shorthand variables may already be set up, or you can define them yourself.
Parsing the form data:
I think that normally, for a file upload, the Content-Type will be multipart/form-data.
To get down to the particular 'attachment' you're interested in, you'll need to:
Parse the mime envelope to get out the file attachment
Find your file attachment
Remove whatever text encoding has been used (e.g: BASE64)
In the POST headers, you'll get something like:
Content-Type: multipart/form-data; boundary=JGUOAeGT3Fjgjcdk6s35F2mPVVyTdzgR
where 'JGUOAeGT3Fjgjcdk6s35F2mPVVyTdzgR' is the MIME boundary. In the body of the POST, the content will be formatted like this (note the final boundary is followed by an extra "--"):
--[boundary]
CONTENT-PART #1
--[boundary]
CONTENT-PART #2
--[boundary]--
Each CONTENT-PART above will have some HTTP headers, a blank line, then the body of that particular CONTENT-PART.
An example from another stackoverflow question:
Content-Disposition: form-data; name="updates"; filename="update1353963418000.json"
Content-Type: application/json; charset=UTF-8
Content-Transfer-Encoding: binary
{"collectionType":"activities","date":"2012-11-26","ownerId":"qw12er23","ownerType":"user","subscriptionId":"112233-activities"}
In this case, the Content-Transfer-Encoding is binary (i.e. raw, not additionally encoded) UTF-8, so you wouldn't need to do any more work to read the JSON in the body of the CONTENT-PART.
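A minimal sketch of splitting such a body on the boundary. This is plain string handling, not a robust MIME parser; it assumes CRLF line endings and a boundary already read from the Content-Type header:

```javascript
// Minimal multipart/form-data splitter -- a sketch, not a full MIME parser.
function splitMultipart(body, boundary) {
    var delim = "--" + boundary;
    return body.split(delim)
        .map(function (part) { return part.replace(/^\r\n|\r\n$/g, ""); })
        // drop the empty preamble and the trailing "--" terminator
        .filter(function (part) { return part !== "" && part !== "--"; })
        .map(function (part) {
            var sep = part.indexOf("\r\n\r\n"); // blank line divides headers from body
            return {
                headers: part.slice(0, sep).split("\r\n"),
                body: part.slice(sep + 4)
            };
        });
}
```

Each returned object then has the part's headers (where you'd look for Content-Disposition and Content-Transfer-Encoding) and its raw body.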
In your case, the browser will be sending a binary file, so it'll likely have set the Content-Transfer-Encoding to base64, which means you'll need to Base64-decode the body of the CONTENT-PART to get to your real binary file. If base64data contains the encoded content, then this will give you the raw binary data:
var rawData = atob(base64data);
At that point you can do whatever encryption you want on rawData.
Remember, you'll have to re-encode the binary data after your encryption (using btoa), then re-assemble the multipart envelope before reconstructing the POST request body. (Don't forget to take the .length of the final request body so that you can substitute it into the Content-length request header.)
Targeting the request(s):
That's the basic mechanism for modifying a POST request. But you've still got to single out your particular POST requests (examine the POST request URL in the observer notification) so that you allow other POST requests to proceed as normal without invoking your modification code.
So I have a canvas on which the user signs. Now, instead of converting it to a Base64 string, I simply want to save it as an image itself. What's the easiest way to do that in HTML5?
You can easily do that this way (specifying the format as png in this case):
var img = canvas.toDataURL("image/png");
You can specify different image formats.
Take a look at this answer.
I've answered a similar question here:
Simulating user event
Assuming you are saving locally
You can go the route of creating an image from a Data URL, but then saving it is the trickier part that currently isn't very nice using HTML5. It's hopefully going to get better soon, if browsers incorporate the download attribute of the a tag.
Obviously if you have higher permissions than a standard webpage... i.e. you are designing a browser plugin - then there are other options...
If I were to implement something like this myself, at the moment, I would concede to using a Flash plugin to handle saving to the local computer.
Assuming you are saving remotely
By the sounds of it you aren't saving to a server, but if you were, this would be quite easy: just POST the Base64 information to a script written in a server-side language (e.g. PHP) and have it write the data directly as binary to a file. Obviously you have to make certain you do this securely, however; you don't want just any binary data written to your server's filesystem.
Best of both worlds
If you've got the development time, the best method to get a canvas image saved locally - without Flash - is to create a server-side script that instead of saving the data to your server actually writes the Base64 information you send it directly back as a realised binary image file. That way you can create a form that posts your Base64 data to a new tab, this request is evaluated by the server-side, and the binary image is returned... at which point the browser asks the user where they wish to save their image.
You'll need to define the correct headers to force an image to download (rather than display in-browser). A simple way to force this is to set the server-side script's Content-type header to 'image/octet-stream'; there are other header options worth researching (e.g. headers that control the filename and so forth).
reflect.php
<?php
/// a simple example..
if ( array_key_exists('data', $_POST) && ($data = $_POST['data']) ) {
header('Content-type: image/octet-stream');
echo base64_decode( $data );
exit;
}
and the form...
<form action="reflect.php" method="post" target="_blank">
<input name="data" type="hidden" value=" ... Base64 put here with JS ... ">
</form>
(The whole form should be created dynamically and submitted automatically with JavaScript)
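A sketch of that dynamic form step. The "data" field name matches the reflect.php example; the helper and function names are illustrative:

```javascript
// Strip the "data:image/png;base64," prefix, leaving only the Base64
// payload that the server-side script expects in its "data" field.
function dataUrlToBase64(dataUrl) {
    return dataUrl.replace(/^data:image\/\w+;base64,/, "");
}

// Browser-only sketch: build the hidden form and submit it to a new tab.
function postImage(canvas) {
    var base64 = dataUrlToBase64(canvas.toDataURL("image/png"));

    var form = document.createElement("form");
    form.action = "reflect.php";
    form.method = "post";
    form.target = "_blank";

    var input = document.createElement("input");
    input.type = "hidden";
    input.name = "data";
    input.value = base64;

    form.appendChild(input);
    document.body.appendChild(form); // some browsers require the form to be in the DOM
    form.submit();
    document.body.removeChild(form);
}
```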
Improving the user experience
There are ways to avoid a new tab being created, but you'd have to research to make sure these other methods don't cause cross-browser problems... for example you could post your form data as part of an iframe (which would keep the process hidden), or just post the data directly on the current window (..and hope that all the browsers receive the correct request and open a download rather than replace your page content - most modern browsers should handle this).
Improving security
With regards to a PHP script that automatically returns binary data, you should keep access to this script secured by a one-time-use key / authentication token or something similar, and keep a limit on how much Base64 data you are willing to accept. It might not seem like it poses a security risk - as you are not modifying your server in any way with what the user sends - but the dodgy people of this world could take your script and use it to send download requests to other users... which, if downloaded (and turned out to be unwanted trojans or viruses), would make your server complicit in providing the dodgy file.
All in all
Due to the effort required to get a simple thing like an image saved to the desktop, I wouldn't blame you for doing the following:
Embed the image in the page (after taking your snapshot from canvas) and ask the user to right click and Save as...
Hopefully future things will make this situation better...
I need to do as much as possible on the client side. In more detail, I would like to use JavaScript to code an interface (which displays information to the user and accepts and processes responses from the user). I would like to use the web server just to fetch a data file and then send a modified data file back. In this respect, I would like to know if the following is possible in JavaScript:
Can JavaScript read the content of an external web page? In other words, on my local machine I run JavaScript which reads the content of a given web page.
Can JavaScript process values filled in an HTML form? In other words, I use HTML and JavaScript to generate an HTML form. The user is supposed to fill in the form and press a "Submit" button. Then the data should be sent to the original HTML file (not to a web server), and processed by JavaScript.
In the very end, JavaScript will generate a local data file, and I want to send this file to a PHP web server. Can I do that with JavaScript?
Can I initiate execution of a local program from JavaScript? To be more specific, the local program is written in Python.
I will appreciate any comments and answers.
It could technically, but can't in reality due to the same origin policy. This applies to both reading and writing external content. The best you can do is load an iframe with a different domain's page in it - but you can't access it programmatically. You can work around this in IE, see Andy E's answer.
Yes for the first part; mmmm, not really for the second part - you can submit a form to an HTML page and read GET arguments using JavaScript, but it's very limited (recommended maximum data size around 1024 bytes). You should probably have all the intelligence on one page.
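A sketch of reading those GET arguments; parseQuery is a hypothetical helper that, in the page, you'd call with location.search:

```javascript
// Parse a query string ("?a=1&b=two") into a plain object -- a minimal sketch,
// with no handling of repeated keys or array syntax.
function parseQuery(search) {
    var params = {};
    search.replace(/^\?/, "").split("&").forEach(function (pair) {
        if (!pair) return;
        var kv = pair.split("=");
        params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || "");
    });
    return params;
}

// In the receiving page: var args = parseQuery(location.search);
```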
You can generate a file locally for the user to download using Downloadify. Generating a file and uploading it to a server won't be possible without user interaction. Generating data and sending it to a server as POST data should be possible, though.
This is very, very difficult. Due to security restrictions, in most browsers it's mostly not possible without installing an extension or similar. Your best bet might be Internet Explorer's proprietary scripting languages (WScript, VBScript) in conjunction with the "security zones" model, but I doubt whether the execution of local files is possible even there nowadays.
Using Internet Explorer with a local file, you can do some of what you're trying to do:
It's true that pages are limited by the same origin policy (see Pekka's link). But this can be worked around in IE using the WinHttpRequest COM interface.
As Pekka mentioned, the best you can manage is GET requests (using window.location.search). POST request variables are completely unobtainable.
You can use the COM interface for FileSystemObject to read & write local text files.
You can use the WScript.Shell interface's Exec method to execute a local program.
So just about everything you asked is attainable, if you're willing to use Internet Explorer. The COM interfaces will require explicit permission to run (a la the yellow alert bar that appears). You could also look at creating a Windows Desktop Gadget (Vista or Win 7) or a HTML Application (HTA) to achieve your goal.
Failing all that, turn your computer into a real server using XAMPP and write your pages in PHP.
I see what you want to do. The best approach is the following:
Choose a JavaScript library (e.g. jQuery, Dojo, YUI, etc.); I use jQuery. This will reduce some of your workload.
Instead of saving the form's data in a local file, store it in local variables, process it, and send it to the server (for further processing like adding/updating the database) using XMLHttpRequest. When the web service returns data, process it and update the DOM.
Here is a sample:
--this is dom
Name:<input type='text' id='name' />
<a href='javascript:void(0)' onClick='submit()'>Submit Form</a>
<br>
<div id='target'></div>
--this is js
function submit()
{
    var _name = $('#name').val(); // collect the text box's data
    // now validate it or do anything else you want
    callWebservice(_name, _suc, _err);
    // the callWebservice function above has to be created by you; it sends this
    // data to the server and performs the XMLHttpRequest for you
}
// called when data is successfully returned from the server
function _suc(data)
{
    // data = data from the server; in this case it might be "Hello user Name"
    // (name = value from the input box)
    // update the target div with it (manipulate the DOM with the new data)
    $('#target').html(data);
}
// called when an error occurs on the server
function _err()
{
    // handle the error here
}
// In reality, most of this work is done using JSON. This shows the basic idea of how to use JS to manipulate the DOM, call services, and handle the rest; this way we avoid page reloads and the new data is visible to the viewer.
I would answer saying there's a lot you can do, but then in the comment to the OP, you say "I would like to program a group game."
And so, my answer becomes only do on the client side what you are able and willing to double check on the server side. Never Trust the Client!
And I do not want to do my job twice.
If you are going to do things on the client side, you will have to do it twice, or else be subject to rampant cheating.
We had the same question when we started our project. In the end, we moved everything we could to the JS side. Here's our stack:
The backend receives and sends JSON data exclusively. We use Erlang, but Python would be the same. It handles authentication/security and storage.
The frontend is HTML+CSS for visual elements and JS for the logic. A JS template engine converts the JSON into HTML. We've built PURE, but there are plenty of others available. MVC can be overkill on the browser side, but IMO using a template engine is the least separation you can do.
The response time is amazing. Once the page and the JS/CSS are loaded (fresh or from the cache), only the data crosses the network for each request.
I need to find out if something has changed in a website using an RSS Feed. My solution was to constantly download the entire rss file, get the entries.length and compare it with the last known entries.length. I find it to be a very inelegant solution. Can anyone suggest a different approach?
Details:
• My application is an HTML file which uses JavaScript. It should be small enough to function as a desktop gadget or a browser extension.
• Currently, it downloads the RSS file every thirty seconds just to get the length.
• It can download from any website with an RSS feed.
Comments and suggestions are appreciated, thanks in advance~ ^^
Many RSS feeds use the <lastBuildDate> element, which is a child of <channel>, to indicate when they were last updated. There's also a <pubDate> element, child of <item>, that serves the same purpose. If you plan on reading ATOM feeds, they have the <updated> element.
There are HTTP headers that can be used to determine if a resource has changed. Learn how to use the following headers to make your application more efficient.
HTTP Request Headers
If-Modified-Since
If-None-Match
HTTP Response Headers
Last-Modified
ETag
The basic strategy is to store the above-mentioned response headers that are returned on the first request and then send the values you stored in the HTTP request headers in future requests. If the HTTP resource has not been changed, you'll get back an HTTP 304 - Not Modified response and the resource will not even be downloaded. So this results in a very lightweight check for updates. If the resource has changed, you'll get back an HTTP 200 OK response and the resource will be downloaded in the usual way.
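A sketch of that strategy; the stored object and its field names are just this example's convention:

```javascript
// Build the conditional request headers from validators saved after the
// previous response. If neither validator was stored, the request is
// an ordinary unconditional GET.
function conditionalHeaders(stored) {
    var headers = {};
    if (stored.lastModified) headers["If-Modified-Since"] = stored.lastModified;
    if (stored.etag) headers["If-None-Match"] = stored.etag;
    return headers;
}

// After each 200 response you would save the new validators, e.g.:
// stored.lastModified = xhr.getResponseHeader("Last-Modified");
// stored.etag = xhr.getResponseHeader("ETag");
// A 304 response then means the feed is unchanged and was not re-downloaded.
```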
You should be keeping track of the GUIDs/ArticleIds to see if you've seen an article before.
You should also see if your source supports conditional gets. It will allow you to check if anything has changed without needing to download the whole file. You can quickly check with this tool to see if your source supports conditional gets. (I wish everyone did.)
Is it possible to use JavaScript to dynamically change the HTTP Headers received when loading an image from an external source? I'm trying to control the caching of the image (Expires, Max-Age, etc...) client-side since I do not have access to the server.
As the others have said, no, it is not possibly to manipulate http headers and caching directives from the server in client code.
What is possible
What you do have the ability to do is ensure you get a new file. This can be done by appending a unique string to the URL of the request as a query string parameter.
e.g. if you wanted to ensure you got a new file each hour
<script type="text/javascript">
    // note: getYear() is deprecated and getDay() returns the weekday (0-6),
    // so use getFullYear()/getDate() instead
    var d = new Date();
    url += ("?" + d.getFullYear() + "_" + d.getMonth() + "_" + d.getDate() + "_" + d.getHours());
</script>
What this does is append a value containing the year, month, day and hour to the URL, so it will be unique for each hour, hence ensuring a new file request. (Not tested!)
Obviously this can be made much more generic and fine tuned, but hopefully you'll get the idea.
What is impossible
What you can't do is ensure you will not retrieve a new version from the server.
Caching directives are in the server's responsibility. You can't manipulate them on the client side.
Maybe it's an option for you to install a proxy server, e.g. if you are aiming at company employees?
I do not think JavaScript can actually do that: the images are requested by the browser, and it's up to the browser to decide which HTTP headers to issue.
One way to use some custom headers would be with some kind of Ajax request, not going through any <img> tag; but you'd have to know what to do with the returned data... I don't think it would help much.
If you want your images to be kept in cache by the browser, your server has to send the right headers in the responses (like ETag and/or Expires -- see mod_expires for Apache, for instance).
If you want to be absolutely sure the browser will download a new image, and not use the version it has in cache, you should use a different URL each time.
This is often done using the timestamp as a parameter in the URL, like example.com/image.jpg?123456789 (123456789 being, more or less, the current timestamp): each second, the browser will see that the URL has changed.
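A sketch of that technique (bustCache is a hypothetical helper name):

```javascript
// Append the current timestamp as a cache-busting query parameter,
// preserving any query string the URL already has.
function bustCache(url) {
    return url + (url.indexOf("?") === -1 ? "?" : "&") + new Date().getTime();
}

// e.g. document.getElementById("myImg").src = bustCache("http://example.com/image.jpg");
```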
EDIT after the edit of the question :
The Expires header is generated by the server, and is one of the headers that comes in the response (it's not a header the client sends in the request; see the list of HTTP headers).
So you have absolutely no control over it from the client side: it's the server that must be configured to do the work here...
If you want more answers: what exactly are you trying to do, and why?