Reducing Query string size of PHP form [duplicate] - javascript

This question already has answers here:
How to compress/decompress a long query string in PHP?
(9 answers)
Closed 8 years ago.
I currently have a URL like this:
http://blahblah.com/process.php?q=[HUGEEEEEEEEEEEEEEEEEEEEEEEE STRING of 5000 chars]
My goal is to convert this to something like
http://blahblah.com/process.php?q=[fewer characters]
The first question:
How do I apply a function (an encryption function, for instance) to my GET variables before they are sent to the action page?
I've seen many questions asked with a similar topic.
The second question:
Assuming I can do the above by some means (maybe jQuery/JavaScript or something), how do I compress on the index.php page and decompress on the process.php page?
My attempt:
Searching for functions with fixed lengths:
I've looked at some encryption functions that produce a fixed-length result. For example, md5() gives a short, tidy, fixed-length string even for an extremely large input. Unfortunately, md5 cannot be decoded easily. Is there any other such function that I can decode and that has a fixed length? If so, I could use it, assuming I know a way to do the first step.
EDIT: Please do not mark this as a duplicate of that question; a question that hasn't been answered has specifically been asked again. Please read @Jeremy's comments, he was following this post.

I personally think it is best to use POST to send the data to the page. I am pretty sure you cannot use anything like MD5 to 'compress' the data, because MD5 hashes the data: it runs an algorithm over your input to create a fixed-length hash.
A fixed-length hash cannot represent every possible input, so two different data sets can produce the same hash (a collision), even if the chance for any given pair is extremely small. It is therefore impossible to reliably 'decrypt' MD5 or other similar hashes. Check out this page for more on hash collisions.
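To make the fixed-length point concrete, here is a minimal sketch in JavaScript. It assumes a Node.js runtime and its built-in crypto module; md5Hex is a name made up for this illustration. Both a one-character input and a 5000-character input produce a 32-character hex digest, so the digest cannot carry enough information to rebuild the input.

```javascript
const crypto = require('node:crypto');

// Hex MD5 digest of a string (hashed as UTF-8 bytes).
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

const shortInput = 'q';
const hugeInput = 'x'.repeat(5000); // stand-in for the 5000-character query string

console.log(md5Hex(shortInput).length); // 32
console.log(md5Hex(hugeInput).length);  // 32
```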

Your problem is that you are using the internet the wrong way. The length of a URL is limited (and the limit depends on the browser), so don't even try to use long URLs - even when you want to shorten them.
Please keep in mind that we have been using the World Wide Web for a long time, and if you hit a dead end you just have to rethink your problem. Maybe you are using your current technology the wrong way.
So, use POST instead to transfer your data (as others mentioned before).
If you want to "compress" your data, you should use something like zip and then make the result URL-safe afterwards, e.g. with Base64. This is not suitable in any way and completely hideous. (And of course it cannot guarantee the length of your URL.)
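For illustration only, a compress-then-encode round trip might look like the sketch below. It is written in JavaScript and assumes a Node.js version whose Buffer supports the 'base64url' encoding; packForUrl and unpackFromUrl are hypothetical names. In PHP, gzdeflate() and base64_encode() would play the corresponding roles. Note how a short input comes out longer than it went in, which is why this cannot guarantee a URL length.

```javascript
const zlib = require('node:zlib');

// Compress with raw DEFLATE, then encode the bytes with the URL-safe Base64 alphabet.
function packForUrl(text) {
  return zlib.deflateRawSync(Buffer.from(text, 'utf8')).toString('base64url');
}

// Reverse the two steps: Base64url-decode, then inflate back to the original text.
function unpackFromUrl(token) {
  return zlib.inflateRawSync(Buffer.from(token, 'base64url')).toString('utf8');
}

const repetitive = 'a=1&'.repeat(1000); // 4000 characters, compresses very well
const tiny = 'abc';                     // too short to benefit from compression

console.log(repetitive.length, packForUrl(repetitive).length);
console.log(tiny.length, packForUrl(tiny).length);
console.log(unpackFromUrl(packForUrl(repetitive)) === repetitive); // true
```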
MD5 is a hash, not a compression method. MD5 is not reversible: once you hash something, you cannot go back. It is not a magical way to compress tons of megabytes into a single short number. Its purpose is to give you a short value that can tell you whether the original data was modified (hash the data twice and compare the results).
See http://en.wikipedia.org/wiki/Hash_function
See http://en.wikipedia.org/wiki/MD5
BTW: It is the same as How to compress/decompress a long query string in PHP?

Related

PHP - Filtering user query to prevent all attacks [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
A user submits a search query to my site.
I then take this query and use it in other places, as well as echo'ing it back out to the page.
Right now I'm using htmlspecialchars(); to filter it.
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
<?php
$query = $_GET["query"];
$query = htmlspecialchars($query);
?>
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
To cover all your bases, this depends a lot. The most straightforward (but unsatisfying) answer is probably: do not accept user input.
Even though this may sound easy, it often is not, and it is often forgotten that any input from a different context has to be considered user input: for example a file opened from the file system, records read from a database, or data from some other system or service - not only a parameter from the HTTP request or a file upload.
Thinking this through, in the context of PHP this normally also includes the PHP code itself, which is usually read from disk. That is not SQL injection, just PHP code injection.
So if you really think about the question in such a broad way ("etc"), the first thing you need to ensure is that you have a defined process to deploy the application, with checks in place so that the deployed files can't be tampered with (e.g. a read-only file system). And from the operational side: you can create and restore the known state of the program within seconds, with little or no side effects.
Only after that should you start to worry about other kinds of user input. For which - to complete the answer - you should only accept what is acceptable.
A user submits a search query to my site.
Accepting a search query is the higher art of user input. It involves free-form text, which tends to become more complex over time and may also include logical operators and instructions. These may require parsing, which involves even more components that can break and can be exploited by various kinds of attacks (SQL injection is only one of them, albeit still a pretty popular one). So plan ahead for it.
As a first-level mitigation, you can question whether search is really a feature that is needed. Once you have decided that it is, you should outline which problems it generally creates and check whether those problems are common. That question is important because common problems may already have answers, even common answers. So if a problem is common, it is likely that it has already been solved. Leaning towards an existing solution then only leaves the problem of integrating that solution (that is, understanding the problem - you always need to do that, and you learn soon enough, one or two decades is normally fine - and then understanding the specific solution you need to integrate).
For example:
$query = $_GET["query"];
$query = htmlspecialchars($query);
makes use of variable reuse. This is commonly known to be error-prone. Using different variable names that mark the context of their values can help:
$getQuery = $_GET["query"];
$htmlQuery = htmlspecialchars($getQuery);
It is then more visible that $htmlQuery can be used in HTML output to show the search query (or at least was intended for that). Similarly to $_GET["query"], the name $getQuery makes it clearly visible that the value is not appropriate for HTML output and its string concatenation operations.
In the original example, this would not be equally visible for $query.
It would then perhaps also be made visible that in contexts other than HTML output, $htmlQuery is not appropriate either. As your question suggests, you already imagine that neither $getQuery nor $htmlQuery is appropriate for dealing with the risk of an SQL injection, for example.
The example is intentionally verbose in its naming. Real-life naming schemes are normally different and wouldn't emphasize the type in the variable name that much, but would use a concrete type:
try {
...
$query = new Query($_GET["query"]);
...
<?= htmlspecialchars($query) ?>
If you have read up to this point, it may have become clearer that there can hardly be any one-size-fits-all function that magically prevents all attacks (apart from muting any kind of user input, which sometimes is equal to deleting the overall software in the first place - which is known to be safe, perhaps most of all for your software's users). If you allow me the joke, maybe this is it:
unset($_GET["query"]);
$safeQuery = $_GET["query"] ?? null; // null
which technically works in PHP, but I hope you get the idea: it's not really meant as an answer to your question.
So now that it is hopefully clear that each input needs to be treated in the context of both input and output, this should give some pointers on how and where to look for the data handling that is needed.
Context is a big word here. One guideline is to check whether you're dealing with user data (user input) in the input phase of a system or in the output phase.
In the input phase, what you normally want to do is validate the data. E.g. is it correctly encoded? Can the actual value or values the data represents (or is intended to represent) be safely decoded? Can any actual value be obtained from that data? If the encoding is already broken, ensure no further processing of that data is done. This is basically error handling and commonly means refusing the input. In the context of a web application, this can mean closing the connection on the TCP transport layer (or not sending anything back on UDP); responding with an HTTP status code that denotes an error (with or without further, sparse details in the response body); responding with a more user-friendly hypertext message in the response body; for an HTML form, showing dedicated error messages for the part of the input that was not accepted; or, for an API, reporting the errors in the request input data in the format that the client can consume for that API protocol (the deeper you go, the more complicated it gets).
In the output phase it is a bit different. If, for example, you identified the user input as a search query, passed the query (as a value) to a search service or system, and then got back the results (the reflected user input, which still is user input), all this data needs to be correctly encoded to transport the result values back to the user. So if you output the search query along with the search results, all this data needs to be passed in the expected format. In the context of a web application, the user normally tells you with each request what the preferred encoding of the response should be. Let's say this is normally hypertext encoded as HTML. Then all values need to be output in such a way that they are properly represented in HTML and are not mistakenly interpreted as HTML: a search for <marquee> must not cause the whole output to move all over the page - you get the idea.
htmlspecialchars() may do the job here, and so might htmlentities(), but which function to use with which parameters depends highly on the underlying encodings (HTTP, HTML, character encoding) and on which part of the response a value belongs to (e.g. using htmlspecialchars() on a value that is sent back in a cookie response header would certainly not lead to the intended result).
In the input phase you assert that the input matches your expectations, so that you can either safely pass it along into the application or refuse further processing. Only you can know in detail what these requirements are.
In the output phase your job is to ensure that all data is properly encoded and formatted, so that the overall output works and the user can safely consume it.
In the input phase you should not try to "fix" issues with the incoming data yourself. Instead, assume the best and communicate back either that there will be no communication or what the problem was (note: do not fool yourself - if this involves output of user input, mind what is important for the output phase; there is less risk in just dropping user input and not processing it further, e.g. by not reflecting it back).
This is a bit different for the non-error-handling output phase (given that the input was acceptable): here you err on the safe side and encode the data properly. You may even be fine with filtering the user data so that it is safe in the output (not as the output that belongs to your overall process, and mind that filtering is harder than it looks at first sight).
In short: don't filter input, only let it pass along if it is acceptable (validate it). Filter it only in or for output, and only if you do not have any other option (it is a fallback, and one that often goes wrong). Mind that filtering is often much harder and much more error-prone, including opening you up to attacks, than just refusing the data altogether (so there is some truth in the initial joke).
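The "accept or refuse, never repair" idea for the input phase can be sketched as follows, in JavaScript rather than PHP. The rules shown (a length limit and a ban on control characters) and the names acceptSearchQuery and MAX_QUERY_LENGTH are assumptions made for this illustration; as the answer says, only the application owner knows the real requirements.

```javascript
// Hypothetical acceptance rules for a search query; real rules depend on the application.
const MAX_QUERY_LENGTH = 200;

// Either accept the input unchanged or refuse it with a reason. Nothing is "fixed".
function acceptSearchQuery(raw) {
  if (typeof raw !== 'string') {
    return { ok: false, reason: 'not a string' };
  }
  if (raw.length === 0 || raw.length > MAX_QUERY_LENGTH) {
    return { ok: false, reason: 'bad length' };
  }
  // Refuse control characters instead of stripping them.
  if (/[\u0000-\u001f\u007f]/.test(raw)) {
    return { ok: false, reason: 'control characters' };
  }
  return { ok: true, value: raw };
}

console.log(acceptSearchQuery('php <marquee> injection')); // accepted as-is; encoding happens at output
console.log(acceptSearchQuery('bad\u0000query'));          // refused
```

Note that the accepted value still contains <marquee>: making it safe for HTML is the job of the output phase, not of this check.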
Next to the input or output context for the data, there is also the context in which the values are used - in your example, the search query. How could anyone here on Stack Overflow or on any other internet site answer that, as it remains completely undefined in your question: a search query. A search query for what? Isn't your question itself a search for an answer? Taking it as an example, Stack Overflow can take it:
Verify the input is in the form of a question title and its text message that can safely enter their database - it passed that check, which can be verified as your question was published.
With your attempt to enter that query on Stack Overflow, some input validation steps were done prior to sending it to the database - while already querying it: similar questions, is your user valid, etc.
As this short example shows, many of the questions for a concrete application (your application, your code) need not only the basic foundation to work (and therefore error handling on the protocol level, standard input and output so to speak), but also something built on top of it to work technically correctly (a database search for existing questions must not be prone to SQL injection, neither on the title nor on the question text, nor must the display of error messages or hints introduce other forms of injection).
To come back to your own example, $htmlQuery is not appropriate if you need to encode it as a JavaScript string in a response. To encode a value as a string within JavaScript you would certainly use a different function, maybe json_encode($string) instead of htmlspecialchars($string).
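The difference between the two output contexts can be sketched in JavaScript (not PHP). escapeHtml and toScriptString are hypothetical helpers written for this illustration: the first stands in for the role htmlspecialchars() plays, the second for json_encode() when the value ends up inside a script block. The extra replacement of < is an assumption added here so that a value cannot close the surrounding script element.

```javascript
// Hypothetical helper: encoding for the HTML context.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical helper: encoding as a JavaScript string literal for use inside a script block.
function toScriptString(value) {
  return JSON.stringify(String(value)).replace(/</g, '\\u003c');
}

const getQuery = '<marquee>"hi"</marquee>';

console.log(escapeHtml(getQuery));     // &lt;marquee&gt;&quot;hi&quot;&lt;/marquee&gt;
console.log(toScriptString(getQuery)); // "\u003cmarquee>\"hi\"\u003c/marquee>"
```

The same input yields two different encodings, and neither is correct in the other's context.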
And for passing the search query to a search service, it may well be encoded differently again, e.g. as XML, JSON or SQL (for which most database drivers offer a nice feature called parameterized queries or, more formally, prepared statements, which are of great help in handling input and output context more easily - common problems, common solutions).
prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
You may by now spot the "error" in this "search query". It's not the part about there being things that you or anyone else can't even think of. Regardless of how much knowledge you have, there will always be known and unknown unknowns, next to the sheer number of mistakes we encode into software every other day. The one "wrong" is perhaps in thinking that there would be a one-size-fits-all solution (even with good intent, as things must have been solved already - and truly most have been, but one still needs to learn about them first, so it is good that you ask). The other, perhaps more important, is to assume that others are solving your problems: your technical problems perhaps, but your own problems you can only solve yourself. And if that sentence sounds hard, take the good side of it: you can solve them. And I write this even though I can only give a lengthy answer to your question.
So take any security advice - including the wall of text I just placed here - on Stack Overflow or elsewhere with a grain of salt. Only your own sharp eyes can decide whether it is appropriate to cover your bases.
Older PHP Security Poster (via my blog)

How to check all possible decodings for data

Very worried this question may be abused, but here goes!
My server is receiving some data through a GET request in the URL, which has been encoded in some manner. I thought it would be base64, but that is not the case.
I would like to write a js function to accept this string and try all possible encodings/decodings.
Please help!
The data is in this sort of format: ZHpuxRiviOUGeOKTKdw
There is the potential it is encrypted, if that is the case I am not asking for help in decrypting it!
EDIT: my goal is not to write a perfect algorithm which can do this automatically - I simply want to try x top most popular decodings of this string and print them out.
You can't really do that, how would you know if ZHpuxRiviOUGeOKTKdw wasn't actually the string? If this is your server and this is data you're receiving then you should know how the string is encoded based on what your application does.
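A small sketch shows why trying decodings does not settle the question. It assumes a Node.js runtime and its Buffer class; tryDecodings is a hypothetical helper. Every decoder returns some bytes for the sample string, and none of them reports whether its interpretation is the intended one.

```javascript
// Decode the same text under a few common encodings and show the raw bytes as hex.
function tryDecodings(text) {
  const encodings = ['base64', 'base64url', 'hex', 'latin1', 'utf8'];
  const results = {};
  for (const encoding of encodings) {
    // Buffer.from is lenient: it decodes what it can instead of rejecting the input.
    results[encoding] = Buffer.from(text, encoding).toString('hex');
  }
  return results;
}

console.log(tryDecodings('ZHpuxRiviOUGeOKTKdw'));
```

Picking the "right" row still requires knowing what the sending application actually did.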

MD5 with salt in PHP and JavaScript

I need a common hashing method for both PHP and JavaScript, something like MD5, or if not MD5 then something that uses a salt, but it has to generate the same result in PHP and JavaScript.
What I want to do is this: I have a series of questions that I will ask the user, and the user has to answer them. To make it fast and avoid the delay of checking the user's answers on the server, I also want to load the answers along with the questions and match them in JavaScript as the user answers. So I need to bring the answers hashed from the PHP server, and when I match them against the user's answers, I would hash the user's answer and compare it with the hashed answer from the server.
Is it possible?
What you are doing is little more than obfuscation. Presumably you want to prevent the users from cheating. Relying on client-side code for that won't work securely.
When the client browser receives the answer hashes and the corresponding salts, the user can simply brute-force the correct answers. The number of possible answers is so small that the user can try every possible answer with the received salt and find the matching answer by comparing the hashes.
Since this is nothing more than obfuscation, why make it complicated? Simply encode your correct answers with Base64 or something like that. This will prevent most users from cheating. If you actually want to make sure that no user can cheat, you need to send the selected answers to the server.
If you actually want to go ahead with your plan, the fastest MD5 hasher for JavaScript right now is SparkMD5, not that you need the speed for your use case. CryptoJS also implements MD5 and PHP has the md5() function. All those implementations are compatible. You just have to make sure that you use the same encoding (Character encoding and Hex/Base64).
A construction with a salt may be md5(answer || salt), where || denotes concatenation. This is not really safe, but it doesn't matter in your case anyway.
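As a rough sketch of that construction, and of the brute-force weakness described above, the following uses Node.js's built-in crypto module instead of SparkMD5 or CryptoJS (an assumption made to keep the example self-contained). The names saltedMd5, isCorrect and bruteForce, the salt and the sample answers are all made up. On the PHP side, md5($answer . $salt) should yield the same hex string as long as both sides hash the same UTF-8 bytes.

```javascript
const crypto = require('node:crypto');

// md5(answer || salt): hex MD5 of the answer concatenated with the salt, hashed as UTF-8.
function saltedMd5(answer, salt) {
  return crypto.createHash('md5').update(answer + salt, 'utf8').digest('hex');
}

// What the page would do: compare the user's hashed answer with the hash sent by the server.
function isCorrect(userAnswer, salt, expectedHash) {
  return saltedMd5(userAnswer, salt) === expectedHash;
}

// What a curious user can do: try every plausible answer until one matches.
function bruteForce(candidates, salt, expectedHash) {
  return candidates.find((candidate) => isCorrect(candidate, salt, expectedHash)) ?? null;
}

const salt = 's3';                           // made-up salt
const fromServer = saltedMd5('Paris', salt); // stands in for the hash the server would send

console.log(isCorrect('Paris', salt, fromServer));                        // true
console.log(bruteForce(['London', 'Berlin', 'Paris'], salt, fromServer)); // Paris
```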

Basic Knowledge about HTML Query Strings needed

I am new to the world of webdesign and already assigned myself with a very (at least for me) difficult task: I want to build a webpage, that sends a query string to the website of the German Railway (bahn.de) with the parameters I entered on my webpage.
My question now is whether there is a way to decipher the answer that the other webpage (bahn.de) sends back in response to my query string.
In my case there will be departure and arrival times, fares, line numbers, .... Is it possible to extract this information from the answer the bahn.de- page is sending?
First and foremost you need to determine what type of data-encoding the website is returning. Is it XML or perhaps JSON? Both formats can be dealt with using a server-side language such as PHP, but the extraction process may differ slightly.
In order to continue along your learning path (which is great, by the way!) you'll need to find out a bit more about what kind of data object the website is sending back from your query. There are plenty of great resources at the other end of a Google search that can teach you how to handle that incoming data once you know which format it is in.
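A first, rough check of what came back could look like the sketch below (JavaScript; detectFormat is a hypothetical helper). The sample bodies are invented for illustration and do not reflect what bahn.de actually returns.

```javascript
// Hypothetical helper: guess whether a response body is JSON, XML/HTML markup, or something else.
function detectFormat(body) {
  const text = body.trim();
  try {
    JSON.parse(text);
    return 'json';
  } catch (err) {
    // Not valid JSON; fall through to the markup check.
  }
  return text.startsWith('<') ? 'markup' : 'unknown';
}

// Made-up sample bodies, not real bahn.de responses.
console.log(detectFormat('{"departure":"08:15","arrival":"10:42"}'));  // json
console.log(detectFormat('<connection><departure>08:15</departure></connection>')); // markup
```

In practice the Content-Type header of the response is usually the more reliable hint; inspecting the body like this is only a fallback.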

Strange javascript codes... Encrypted, Encoded or Packed? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How can I obfuscate JavaScript?
I was browsing some sites and found a really interesting thing. I am just a beginner at coding and have never seen such a thing, so I was wondering: is it encrypted, encoded or packed, or is it something else?
Script sample:
V10861992380165541086199238016554108619923801655410861992380165541086199238016554108619923801655410861992380165541086199238016554='13047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395130473894741439513047389474143951304738947414395'
Or here is a screenshot of one really long thing; I couldn't even fit it all on my screen.
http://snpr.cm/8KznHp.png
http://snpr.cm/xOLfRE.png
Can anyone tell me what are these, and how can I do the same?
Do I need to pay for a program or something? Thank you for understanding.
All the line of code does is create a variable whose name starts with V and put the number in it. Without seeing the rest of the code I can't tell if it is just encoded or encrypted as well, but notice that the string is just the number 1304738947414395 repeated.
You can definitely do a simple encoding by yourself. A simple encoding is to put all the JavaScript code in a string like aaa="document.write('blah')" and then say aaa=btoa(aaa), which converts the original string to Base64. Save the Base64 string and then place it in an eval statement like eval(atob(aaa)), which converts it back to text that eval then executes. When you are finished, you have some encoded, mildly obfuscated code.
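The round trip can be tried out with a harmless expression in place of document.write, so that it also runs outside a browser. This is a sketch that assumes a runtime with global btoa and atob (browsers and recent Node.js versions); btoa only accepts characters in the Latin-1 range.

```javascript
// The "secret" code is just a string; a harmless expression is used here.
const source = "'bl' + 'ah'";

// btoa encodes text to Base64; this is the string you would ship.
const encoded = btoa(source);

// atob decodes it back to the original text, and eval runs that text.
const value = eval(atob(encoded));

console.log(encoded);
console.log(value); // blah
```

Anyone can undo this by calling atob on the shipped string, which is why it counts as mild obfuscation and not as encryption.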
