I need to store a hash of a single password in a .Net WinForms application.
What's the most secure way to do this?
In particular:
Salt, HMAC, or both?
How much salt?
How many iterations?
What encoding? (The password is plain ASCII)
I assume that the algorithm should be either SHA512 or HMACSHA512.
Salt your hash with a secure random salt of at least 128 bits to prevent rainbow-table attacks, and use BCrypt, PBKDF2 or scrypt. PBKDF2 comes with NIST approval.
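Here is a minimal Python sketch of the salted PBKDF2 approach, using only the standard library (`hashlib.pbkdf2_hmac`, `secrets`). The 16-byte (128-bit) salt follows the advice above. The SHA-256 PRF and the 600,000-iteration count are assumptions; tune the count to your hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one derivation takes a noticeable fraction of a second
SALT_BYTES = 16       # 128-bit random salt

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both."""
    salt = secrets.token_bytes(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)
```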
To quote (archived copy on Archive.org): http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html
The problem is that MD5 is fast. So are its modern competitors, like SHA1 and SHA256. Speed is a design goal of a modern secure hash, because hashes are a building block of almost every cryptosystem, and usually get demand-executed on a per-packet or per-message basis.
Speed is exactly what you don’t want in a password hash function.
Fast password validation functions are a problem, because they can be attacked using brute force. With all the algorithms above (BCrypt, PBKDF2 and scrypt) you can control the "slowness".
I can recommend BCrypt.net. Very easy to use and you can tune how long it will take to do the hashing, which is awesome!
// Pass a logRounds parameter to GenerateSalt to explicitly specify the
// amount of resources required to check the password. The work factor
// increases exponentially, so each increment is twice as much work. If
// omitted, a default of 10 is used.
string hashed = BCrypt.HashPassword(password, BCrypt.GenerateSalt(12));
// Check the password.
bool matches = BCrypt.CheckPassword(candidate, hashed);
For a server-side implementation with a large number of passwords, you should definitely use a tunable iterated approach like bcrypt. This well-known article on the topic is still (mostly) relevant:
http://www.securityfocus.com/blogs/262
For a single password in a stand-alone application, where the storage location is probably already secured by the system's own authentication system, I think it's much less important. A single strong hash is likely good enough, and adding salt is easy enough that there's no reason not to do so.
Use RNGCryptoServiceProvider to generate a random salt, then hash the password together with the salt using SHA512, and finally store both the password hash and the corresponding salt so you can later verify whether some text matches the stored password.
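Sketched in Python rather than C#, the salted SHA-512 scheme above could look like the following. `secrets.token_bytes` stands in for RNGCryptoServiceProvider. The 16-byte salt, the salt-then-password ordering and the dict record layout are assumptions. A single fast hash like this is weaker than an iterated scheme such as PBKDF2 or bcrypt.

```python
import hashlib
import hmac
import secrets

def make_record(password: str) -> dict:
    """Generate a random salt and store it next to SHA512(salt + password)."""
    salt = secrets.token_bytes(16)  # stands in for RNGCryptoServiceProvider
    digest = hashlib.sha512(salt + password.encode("utf-8")).digest()
    return {"salt": salt, "hash": digest}

def matches(candidate: str, record: dict) -> bool:
    """Hash the candidate with the stored salt and compare in constant time."""
    digest = hashlib.sha512(record["salt"] + candidate.encode("utf-8")).digest()
    return hmac.compare_digest(digest, record["hash"])
```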
Hash and salt. If you only hash, you could be attacked with a rainbow table (a precomputed reverse hash lookup); a salt makes this much more difficult (a random salt is best). For your encoding, you will probably want to either Base64 or hex encode the resulting byte array. If you try to store the byte array as Unicode text, you run the risk of losing data, because not all byte sequences are valid characters. This also makes comparing hashes easier: when you want to validate, just compare the Base64 or hex strings instead of the byte arrays.
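To illustrate the encoding point, here is a small Python sketch. Arbitrary hash bytes may not be valid text (a byte such as 0xFF never occurs in valid UTF-8), whereas Base64 and hex round-trip exactly. The salt and password values are placeholders.

```python
import base64
import hashlib

def to_text(digest: bytes) -> dict:
    """Text-safe representations of a binary hash; both decode back to the same bytes."""
    return {
        "base64": base64.b64encode(digest).decode("ascii"),
        "hex": digest.hex(),
    }

def survives_as_utf8(data: bytes) -> bool:
    """Return False if the raw bytes cannot be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

digest = hashlib.sha512(b"example-salt" + b"example-password").digest()
encoded = to_text(digest)  # 88 Base64 characters or 128 hex characters for 64 bytes
```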
An increased number of rounds doesn't do much beyond slowing down would-be attackers. But it also makes it much more difficult to reuse the hashes in the future if you lose or need to recreate your hash algorithm. You might check out a standard password hash format such as crypt on Unix systems. It allows you to swap out the hash algorithm and can even support versioning.
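One way to get the algorithm-swapping and versioning benefit of a crypt-style format is to store the scheme name and parameters alongside the hash. The `$pbkdf2-sha256$iterations$salt$hash` layout below is a hypothetical format loosely modelled on Unix crypt strings, not a standard. A real application would more likely use an existing library's format.

```python
import base64
import hashlib
import hmac
import secrets

SCHEME = "pbkdf2-sha256"  # hypothetical scheme identifier

def _b64(data: bytes) -> str:
    # Standard Base64 never contains "$", so "$" is safe as a field separator.
    return base64.b64encode(data).decode("ascii")

def encode_hash(password: str, iterations: int = 200_000) -> str:
    """Produce a self-describing string: $scheme$iterations$salt$hash."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"${SCHEME}${iterations}${_b64(salt)}${_b64(key)}"

def check_hash(password: str, stored: str) -> bool:
    """Read the parameters back out of the stored string and verify."""
    _, scheme, iterations, salt_b64, key_b64 = stored.split("$")
    if scheme != SCHEME:
        # A newer version could dispatch to other algorithms here.
        raise ValueError(f"unsupported scheme: {scheme}")
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              base64.b64decode(salt_b64), int(iterations))
    return hmac.compare_digest(key, base64.b64decode(key_b64))
```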
But again, a simple hash + salt is good enough for most applications.
Strictly looking at more secure:
Salt, HMAC, or both?
Both would be more secure. Since the key to the HMAC could be considered a salt, doing both would be a little redundant, but still more secure because it would take more work to crack.
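A Python sketch of "both": HMAC-SHA512 keyed with a random key, with a per-password salt prefixed to the message. The function name, the key size and the choice of prefixing the salt are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

def hmac_password(password: str, key: bytes, salt: bytes = b"") -> bytes:
    """HMAC-SHA512 over salt + password; the key itself acts like a second salt."""
    return hmac.new(key, salt + password.encode("utf-8"), hashlib.sha512).digest()
```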
How much salt?
Every bit of salt doubles the number of combinations a rainbow table would need to hold to crack the password easily. But since there is only one password, and only one salt, more may not be needed. HMAC pads its key to the block size of the underlying hash, 1024 bits for SHA512, so the block size should be good enough for the salt. A longer salt would make rainbow-table cracking harder still, but only if it goes into the hashed message: HMAC replaces any key longer than the block size with the hash of that key, so a longer HMAC key adds nothing.
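The HMAC key handling can be checked directly. Per RFC 2104, a key longer than the hash's block size (128 bytes for SHA-512) is first replaced by its hash, so a 256-byte key produces exactly the same HMAC as its SHA-512 digest. The key bytes below are arbitrary test data.

```python
import hashlib
import hmac

BLOCK = hashlib.sha512().block_size  # 128 bytes = 1024 bits

def tag(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha512).digest()

long_key = bytes(range(256))                   # twice the block size
short_key = hashlib.sha512(long_key).digest()  # what HMAC uses in its place
```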
How many iterations?
The more the better. Sure, more iterations means it will take longer to determine if the correct password was entered, but computers are fast and users will not mind waiting for a few seconds while verifying the password. Doing more iterations would mean that someone cracking the password would have to do more iterations too.
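Rather than picking a number of iterations by feel, you can measure it. This hypothetical helper keeps doubling the PBKDF2-HMAC-SHA512 iteration count until one derivation takes at least a target time. The starting count, the upper limit and the target duration are assumptions to adjust for your users' patience.

```python
import hashlib
import time

def calibrate_iterations(target_seconds: float = 0.5, start: int = 10_000,
                         limit: int = 50_000_000) -> int:
    """Double the iteration count until one derivation takes at least target_seconds."""
    iterations = start
    while iterations < limit:
        began = time.perf_counter()
        hashlib.pbkdf2_hmac("sha512", b"benchmark-password", b"benchmark-salt!!", iterations)
        if time.perf_counter() - began >= target_seconds:
            break
        iterations *= 2
    return iterations
```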
What encoding? (The password is plain ASCII)
Might as well encrypt (with AES) the over-iterated, over-salted, HMAC'ed, super-secure password hash along with its salt, just to make it harder. Make the key for encrypting the password hash some combination of strings that would appear in the executable anyway, such as "RNGCryptoServiceProvider" or "System.Security.Cryptography". And while encoding, we might as well convert it to hex, or Base64, or better yet Base36 or some other less expected conversion.
Note: This was mostly written in jest, but should still contain some truth.
I think you should stick with open standards. Among the current hash schemes, the "{SSHA}" scheme used by OpenLDAP is very secure and widely used. You can find the description here:
http://www.openldap.org/faq/data/cache/347.html
Most LDAP libraries implement this scheme.
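For reference, the {SSHA} format described in the OpenLDAP FAQ is the Base64 encoding of SHA1(password + salt) followed by the salt, prefixed with "{SSHA}". A Python sketch follows; the 8-byte default salt length is an assumption.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional

PREFIX = "{SSHA}"

def ssha(password: str, salt: Optional[bytes] = None) -> str:
    """{SSHA} + base64(SHA1(password + salt) + salt)."""
    salt = secrets.token_bytes(8) if salt is None else salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return PREFIX + base64.b64encode(digest + salt).decode("ascii")

def check_ssha(password: str, stored: str) -> bool:
    """Split off the 20-byte SHA1 digest; everything after it is the salt."""
    raw = base64.b64decode(stored[len(PREFIX):])
    digest, salt = raw[:20], raw[20:]
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)
```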
You could follow a published standard, like PKCS #5. See http://en.wikipedia.org/wiki/PKCS for a short description, or https://www.rfc-editor.org/rfc/rfc2898 for the RFC.
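PBKDF2 (defined in PKCS #5 / RFC 2898) is available in Python's standard library as `hashlib.pbkdf2_hmac`. One way to confirm that an implementation follows the standard is to check it against the published PBKDF2-HMAC-SHA1 test vectors from RFC 6070, two of which are reproduced below.

```python
import hashlib

# RFC 6070 vectors: P = "password", S = "salt", dkLen = 20, PRF = HMAC-SHA1
RFC6070_VECTORS = {
    1: "0c60c80f961f0e71f3a9b524af6012062fe037a6",
    4096: "4b007901b765489abead49d926f721d065a429c1",
}

def derive(iterations: int) -> str:
    return hashlib.pbkdf2_hmac("sha1", b"password", b"salt", iterations, dklen=20).hex()
```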
Here is an API which will do everything you need/want :)
https://sourceforge.net/projects/pwdtknet
User equals untrustworthy. Never trust untrustworthy user's input. I get that. However, I am wondering when the best time to sanitize input is. For example, do you blindly store user input and then sanitize it whenever it is accessed/used, or do you sanitize the input immediately and then store this "cleaned" version? Maybe there are also some other approaches I haven't thought of in addition to these. I am leaning more towards the first method, because any data that came from user input must still be approached cautiously, where the "cleaned" data might still unknowingly or accidentally be dangerous. Either way, what method do people think is best, and for what reasons?
Unfortunately, almost none of the participants clearly understand what they are talking about. Literally. Only Kibbee managed to get it straight.
This topic is all about sanitization. But the truth is, the broad "general-purpose sanitization" everyone is so eager to talk about simply doesn't exist.
There are a zillion different media, each requiring its own distinct data formatting. Moreover, even a single medium requires different formatting for its different parts. Say, HTML formatting is useless for JavaScript embedded in an HTML page. Or, string formatting is useless for numbers in an SQL query.
As a matter of fact, the "sanitization as early as possible" suggested in the most upvoted answers is simply impossible, because one cannot tell in advance in which medium, or which part of a medium, the data will be used. Say we are preparing to defend against SQL injection by escaping everything that moves. But whoops! Some required fields weren't filled in, and we have to put the data back into the form instead of the database... with all the slashes added.
On the other hand, we diligently escaped all the "user input"... but in the SQL query there are no quotes around it, because it is a number or an identifier. And no "sanitization" ever helped us.
On the third hand: okay, we did our best at sanitizing the terrible, untrustworthy and disdained "user input"... but in some inner process we used this very data without any formatting (as we had already done our best!), and whoops! We've got second-order injection in all its glory.
So, from the real-life usage point of view, the only proper way is
formatting, not whatever "sanitization"
right before use
according to the certain medium rules
and even following sub-rules required for this medium's different parts.
It depends on what kind of sanitizing you are doing.
For protecting against SQL injection, don't do anything to the data itself. Just use prepared statements; that way, you don't have to worry about altering the data the user entered and having that negatively affect your logic. You do have to sanitize a little, to ensure that numbers are numbers and dates are dates, since everything arrives from the request as a string, but don't try to do things like blocking keywords.
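A Python sketch of both points using the standard-library `sqlite3` module: the numeric field is converted with `int()`, which rejects non-numbers, and both values are passed as `?` parameters instead of being pasted into the SQL. The `orders` table and its columns are hypothetical.

```python
import sqlite3

def find_orders(conn: sqlite3.Connection, customer_id_text: str, status: str) -> list:
    # "Numbers are numbers": int() raises ValueError for anything else.
    customer_id = int(customer_id_text)
    # The values travel as parameters; the SQL text itself never changes.
    cur = conn.execute(
        "SELECT id, status FROM orders WHERE customer_id = ? AND status = ?",
        (customer_id, status),
    )
    return cur.fetchall()
```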
For protecting against XSS attacks, it would probably be easier to fix the data before it's stored. However, as others mentioned, sometimes it's nice to have a pristine copy of exactly what the user entered, because once you change it, it's lost forever. It's almost too bad there's no foolproof way to ensure your application only puts out sanitized HTML, the way you can ensure you don't get caught by SQL injection by using prepared queries.
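The "keep a pristine copy, escape on output" approach might look like this in Python, using `html.escape` at render time. The `render_comment` helper and the `<p>` wrapper are illustrative assumptions.

```python
import html

# Stored exactly as the user entered it.
stored_comment = '<script>alert("hi")</script> & more'

def render_comment(raw: str) -> str:
    # Escape only at the moment the value is placed into HTML.
    return "<p>" + html.escape(raw, quote=True) + "</p>"
```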
I sanitize my user data much like Radu...
First, client side, I use both regexes and JavaScript or jQuery tied to events such as onChange or onBlur to control which characters can be input into given form fields, removing any disallowed input before it can even be submitted. Realize, however, that this really only has the effect of letting users in the know see that the data is going to be checked server side as well. It's more a warning than any actual protection.
Second, and I rarely see this done anymore, the first server-side check is where the form is being submitted from. By only allowing form submission from a page that you have designated as a valid location, you can kill the script BEFORE you have even read in any data. Granted, that in itself is insufficient, as a good hacker with their own server can 'spoof' both the domain and the IP address to make it appear to your script that the request is coming from a valid form location.
Next, and I shouldn't even have to say this, but always, and I mean ALWAYS, run your scripts in taint mode. This forces you not to get lazy, and to be diligent about step number 4.
Sanitize the user data as soon as possible, using well-formed regexes appropriate to the data that is expected from any given field on the form. Don't take shortcuts like the infamous 'magic horn of the unicorn' to blow through your taint checks... or you may as well just turn off taint checking in the first place, for all the good it will do for your security. That's like giving a psychopath a sharp knife, baring your throat, and saying 'You really won't hurt me with that, will you?'
And here is where I differ from most others in this fourth step, as I only sanitize the user data that I am actually going to USE in a way that may present a security risk, such as any system calls, assignments to other variables, or any writing to stored data. If I am only using the data input by a user to compare it against data I have stored on the system myself (and therefore know to be safe), then I don't bother to sanitize the user data, as I am never going to use it in a way that presents a security problem. Take a username as an example. I use the username input by the user only to check it for a match in my database, and if there is one, I use the data from the database to perform all other functions I might call for in the script, knowing it is safe, and never use the user's data again after that.
Last, filter out all the attempted auto-submits by robots these days with a 'human authentication' system, such as CAPTCHA. This is important enough these days that I took the time to write my own 'human authentication' scheme that uses photos and an input for the 'human' to enter what they see in the picture. I did this because I've found that CAPTCHA-type systems really annoy users (you can tell by their squinted-up eyes from trying to decipher the distorted letters... usually over and over again). This is especially important for scripts that use either SendMail or SMTP for email, as these are favorites for your hungry spam-bots.
To wrap it up in a nutshell, I'll explain it as I do to my wife... your server is like a popular nightclub, and the more bouncers you have, the less trouble you are likely to have in the nightclub. I have two bouncers outside the door (client-side validation and human authentication), one bouncer right inside the door (checking for a valid form submission location... 'Is that really you on this ID?'), and several more bouncers in close proximity to the door (running taint mode and using good regexes to check the user data).
I know this is an older post, but I felt it was important enough for anyone who reads it after my visit to realize that there is no 'magic bullet' when it comes to security, and that it takes all of these working in conjunction with one another to make your user-provided data secure. Using just one or two of these methods alone is practically worthless, as their power only exists when they all work together.
Or in summary, as my Mum would often say... 'Better safe than sorry.'
UPDATE:
One more thing I am doing these days, is Base64 encoding all my data, and then encrypting the Base64 data that will reside on my SQL Databases. It takes about a third more total bytes to store it this way, but the security benefits outweigh the extra size of the data in my opinion.
I like to sanitize it as early as possible, which means the sanitizing happens when the user tries to enter invalid data. If there's a TextBox for their age, and they type in anything other than a number, I don't let the keypress for the letter go through.
Then, whatever is reading the data (often a server) I do a sanity check when I read in the data, just to make sure that nothing slips in due to a more determined user (such as hand-editing files, or even modifying packets!)
Edit: Overall, sanitize early and sanitize any time you've lost sight of the data for even a second (e.g. File Save -> File Open)
The most important thing is to always be consistent in when you escape. Accidental double sanitizing is lame and not sanitizing is dangerous.
For SQL, just make sure your database access library supports bind variables, which handle values safely without any manual escaping. Anyone who manually concatenates user input onto SQL strings should know better.
For HTML, I prefer to escape at the last possible moment. If you keep the user's original input, they can edit and fix any mistake later; if you destroy it, it's gone forever.
Early is good, definitely before you try to parse it. Anything you're going to output later, or especially pass to other components (e.g., shell, SQL, etc.), must be sanitized.
But don't go overboard - for instance, passwords are hashed before you store them (right?). Hash functions can accept arbitrary binary data. And you'll never print out a password (right?). So don't parse passwords - and don't sanitize them.
Also, make sure that you're doing the sanitizing from a trusted process - JavaScript/anything client-side is worse than useless security/integrity-wise. (It might provide a better user experience to fail early, though - just do it both places.)
My opinion is to sanitize user input as soon as possible, both client side and server side. I'm doing it like this:
(client side) Allow the user to enter only specific keys in the field.
(client side) When the user moves to the next field (onblur), test the input against a regexp and notify the user if something is not right.
(server side) Test the input again: if the field should be an INTEGER, check for that (in PHP you can use is_numeric()); if the field has a well-known format, check it against a regexp; all others (like text comments) just escape. If anything is suspicious, stop script execution and return a notice to the user that the data they entered is invalid.
If something really looks like a possible attack, the script sends an email and an SMS to me, so I can check and maybe prevent it as soon as possible. I just need to check the log where I'm logging all user input and the steps the script took before accepting or rejecting it.
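A rough Python equivalent of the server-side step described above: reject non-integers, check well-known formats with a regular expression, and escape free text. The field names, the deliberately loose e-mail pattern and raising `ValueError` to "stop execution" are assumptions; the answer itself uses PHP's `is_numeric()`.

```python
import html
import re

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # hypothetical, deliberately loose format rule

def check_form(form: dict) -> dict:
    """Validate typed fields, escape free text, and refuse the whole form on any error."""
    errors = []
    if not re.fullmatch(r"[0-9]+", form.get("age", "")):
        errors.append("age must be a whole number")
    if not EMAIL.fullmatch(form.get("email", "")):
        errors.append("email is not in a recognised format")
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "age": int(form["age"]),
        "email": form["email"],
        "comment": html.escape(form.get("comment", "")),  # free text: escape only
    }
```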
Perl has a taint option which considers all user input "tainted" until it's been checked with a regular expression. Tainted data can be used and passed around, but it taints any data that it comes in contact with until untainted. For instance, if user input is appended to another string, the new string is also tainted. Basically, any expression that contains tainted values will output a tainted result.
Tainted data can be thrown around at will (tainting data as it goes), but as soon as it is used by a command that has effect on the outside world, the perl script fails. So if I use tainted data to create a file, construct a shell command, change working directory, etc, Perl will fail with a security error.
I'm not aware of another language that has something like "taint", but using it has been very eye-opening. It's amazing how quickly tainted data gets spread around if you don't untaint it right away. Things that are natural and normal for a programmer, like setting a variable based on user data or opening a file, seem dangerous and risky with tainting turned on. So the best strategy for getting things done is to untaint as soon as you get some data from the outside.
And I suspect that's the best way in other languages as well: validate user data right away so that bugs and security holes can't propagate too far. Also, it ought to be easier to audit code for security holes if the potential holes are in one place. And you can never predict which data will be used for what purpose later.
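Python has no built-in taint mode, but a toy wrapper can show the behaviour described above: combining tainted data with anything yields tainted data, a sensitive "sink" refuses it, and only a successful regex match releases the plain value. The `Tainted` class and the `make_report_path` sink (with its `/var/reports/` path) are hypothetical.

```python
import re

class Tainted:
    """Toy stand-in for Perl's taint flag: wraps data that came from outside."""

    def __init__(self, value: str):
        self._value = value

    def __add__(self, other):
        # Tainted + anything -> tainted.
        other_value = other._value if isinstance(other, Tainted) else other
        return Tainted(self._value + other_value)

    def __radd__(self, other):
        # Anything + tainted -> tainted.
        return Tainted(other + self._value)

    def untaint(self, pattern: str) -> str:
        # As in Perl, only a successful regex match releases the plain value.
        match = re.fullmatch(pattern, self._value)
        if match is None:
            raise ValueError("data does not match the expected pattern")
        return match.group(0)

def make_report_path(name) -> str:
    """Hypothetical 'outside world' sink that refuses tainted input."""
    if isinstance(name, Tainted):
        raise PermissionError("insecure dependency: tainted data used in a path")
    return "/var/reports/" + name
```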
Clean the data before you store it. Generally you shouldn't be performing ANY SQL actions without first cleaning up input. You don't want to subject yourself to an SQL injection attack.
I sort of follow these basic rules.
Only do modifying SQL actions, such as INSERT, UPDATE and DELETE, through POST. Never GET.
Escape everything.
If you are expecting user input to be something, make sure you check that it is that something. For example, if you are requesting a number, make sure it is a number. Use validations.
Use filters. Clean up unwanted characters.
Users are evil!
Well, perhaps not always, but my approach is to always sanitize immediately to ensure nothing risky goes anywhere near my backend.
The added benefit is that you can provide feedback to the user if you sanitize at the point of input.
Assume all users are malicious.
Sanitize all input as soon as possible.
Full stop.
I sanitize my data right before I do any processing on it. I may need to take the First and Last name fields and concatenate them into a third field that gets inserted to the database. I'm going to sanitize the input before I even do the concatenation so I don't get any kind of processing or insertion errors. The sooner the better. Even using Javascript on the front end (in a web setup) is ideal because that will occur without any data going to the server to begin with.
The scary part is that you might even want to start sanitizing data coming out of your database as well. The recent surge of Asprox SQL injection attacks is doubly lethal because they infect all tables in a given database. If your database is hosted somewhere with multiple accounts sharing the same database, your data can be corrupted because of somebody else's mistake, and now you've joined the ranks of those serving malware to their visitors through no initial fault of your own.
Sure this makes for a whole lot of work up front, but if the data is critical, then it is a worthy investment.
User input should always be treated as malicious before it makes its way down into the lower layers of your application. Always sanitize input as soon as possible; it should not, for any reason, be stored in your database before being checked for malicious intent.
I find that cleaning it immediately has two advantages. One, you can validate against it and provide feedback to the user. Two, you do not have to worry about consuming the data in other places.
A user submits a search query to my site.
I then take this query and use it in other places, as well as echo'ing it back out to the page.
Right now I'm using htmlspecialchars(); to filter it.
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
<?php
$query = $_GET["query"];
$query = htmlspecialchars($query);
?>
Right now I'm using htmlspecialchars(); to filter it.
What other steps should I take to prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
To cover all your bases, this depends a lot. The most straightforward (but unsatisfying) answer then probably is: do not accept user input.
Even if this sounds easy, it often isn't, and it is easily forgotten that any input from a different context has to be considered user input too: for example, opening a file from the file system, reading records from a database, or receiving data from some other system or service, not only some parameter from the HTTP request or a file upload.
Thinking this through, in the context of PHP this normally also includes the PHP code itself, which is often read from disk. Not SQL injection, but PHP code injection.
So if you really think about the question in such a broad way ("etc"), the first thing you need to ensure is that you have a defined process to deploy the application, with checks in place so that the deployed files can't be tampered with (e.g. a read-only file system). And on the operational side, you need to be able to create and restore the known state of the program within seconds, with little or no side effects.
Only after that you should start to worry about other kind of user-input. For which - to complete the answer - you should only accept what is acceptable.
A user submits a search query to my site.
Accepting a search query is the higher art of user input. It involves (free form) text which tends to become more and more complex after every other day and may also include logical operators and instructions which may require parsing which involves even more components that can break and can be exploited by various kind of attacks (and SQL Injection is only one of those, albeit still pretty popular). So plan ahead for it.
As a first-level mitigation, you can question whether search is really a feature that is needed. Once you have decided that it is, outline which problems it generally creates and check whether those problems are common. That question is important because common problems may already have answers, even common answers. So if a problem is common, it is likely already solved. Leaning on an existing solution then leaves only the problem of integrating it: understanding the problem (you always need to do that, and you learn it soon enough; one or two decades is normally fine) and then understanding the specific solution well enough to integrate it.
For example:
$query = $_GET["query"];
$query = htmlspecialchars($query);
reuses the same variable for two different values. This is commonly known to be error-prone. Using different variable names that mark the context of their values can help:
$getQuery = $_GET["query"];
$htmlQuery = htmlspecialchars($getQuery);
It is then more visible that $htmlQuery can be used in HTML output to show the search query (or at least was intended for it). Just like $_GET["query"], the name $getQuery makes it obvious that the value is not appropriate for HTML output and its string concatenation operations.
In the original example, this would not be equally visible for $query.
It may then also become visible that $htmlQuery is not appropriate in contexts other than HTML output either. As your question suggests, you already imagine that neither $getQuery nor $htmlQuery is appropriate to deal with the risk of SQL injection, for example.
The example is intentionally verbose in its naming. Real-life naming schemes are normally different and wouldn't emphasize the type in the variable name that much, but would use a concrete type:
try {
...
$query = new Query($_GET["query"]);
...
<?= htmlspecialchars($query) ?>
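The same idea in Python, as a hypothetical value type: the query is validated once when it enters the program, and encoding happens only for a specific output context. The class name, the 200-character limit and the `to_html` method are assumptions, not an existing API.

```python
import html

class SearchQuery:
    """Hypothetical value type: validated on input, encoded per output context."""

    MAX_LENGTH = 200  # assumption

    def __init__(self, raw: str):
        if not raw or len(raw) > self.MAX_LENGTH:
            raise ValueError(f"search query must be 1-{self.MAX_LENGTH} characters")
        self._text = raw

    @property
    def text(self) -> str:
        # The decoded value, e.g. for passing to a search backend as a parameter.
        return self._text

    def to_html(self) -> str:
        # Only for HTML output; other contexts need their own encoding.
        return html.escape(self._text)
```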
If you have read up to this point, it may have become clearer that there can hardly be any one-size-fits-all function that magically prevents all attacks (apart from muting any kind of user input, which sometimes amounts to deleting the software altogether, which is known to be safe, perhaps most of all for your software's users). If you allow me the joke, maybe this is it:
$safeQuery = (unset) $_GET["query"]; // null
which technically works in PHP (the (unset) cast was deprecated in PHP 7.2 and removed in PHP 8.0), but I hope you get the idea: it's not really meant as an answer to your question.
So now as it is hopefully clear that each input needs to be treated in context of input and output to work, it should give some pointers how and where to look for the data-handling that is of need.
Context is a big word here. One guidance is to take a look if you're dealing with user data (user input) in the input phase of a system or in the output phase.
In the input phase, what you normally want to do is to sanitize, i.e. to verify the data. E.g. is it correctly encoded? Can the actual value or values the data represents (or is intended to represent) be safely decoded? Can any actual value be obtained from that data? If the encoding is already broken, ensure no further processing of that data is done. This is basically error handling and commonly means refusing input. In the context of a web application this can mean closing the connection on the TCP transport layer (or not sending anything back on UDP), responding with an HTTP status code that denotes an error (with or without further, sparse details in the response body) or with a more user-friendly hypertext message in the response body, or, for an HTML form, dedicated error messages for the part of the input that was not accepted, and, for some API, errors in whatever format the API protocol uses to report problems with the request input data (the deeper you go, the more complicated).
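A small Python sketch of input-phase checking for a percent-encoded query parameter: decode strictly and refuse the input rather than trying to repair it. The `RejectedInput` exception and the rule rejecting control characters are assumptions; a web framework would map the exception to an error response.

```python
from urllib.parse import unquote_to_bytes

class RejectedInput(Exception):
    """Hypothetical signal that the request should be refused (e.g. HTTP 400)."""

def accept_query(percent_encoded: str) -> str:
    raw = unquote_to_bytes(percent_encoded)
    try:
        text = raw.decode("utf-8")  # strict: invalid sequences raise, nothing is "fixed"
    except UnicodeDecodeError as exc:
        raise RejectedInput("query is not valid UTF-8") from exc
    if any(ord(ch) < 0x20 for ch in text):
        raise RejectedInput("control characters are not accepted")
    return text
```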
In the output phase it is a bit different. If, for example, you identified the user input as a search query, passed the query (as a value) to a search service or system and then got back the results (the reflected user input, which is still user input), all this data needs to be correctly encoded to transport the result value(s) back to the user. So if you output the search query along with the search results, all this data needs to be passed in the expected format. In the context of a web application, the user normally states with each request what the preferred encoding of the response should be. Let's say this is hypertext encoded as HTML. Then all values need to be output so that they are properly represented in HTML and not interpreted as HTML markup (e.g. a search for <marquee> must not cause the whole output to move all over the page; you get the idea).
htmlspecialchars() may do the job here, so might by chance htmlentities(), but which function to use with which parameters highly depends on underlying encoding like HTTP, HTML or character encoding and to which part something belongs in the response (e.g. using htmlspecialchars() on a value that is communicated back with a cookie response header would certainly not lead to intended results).
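To illustrate that the output encoding depends on where the value lands, this Python sketch encodes the same query string differently for HTML content and for a cookie value. Percent-encoding the cookie value is one common choice, assumed here; `html.escape` and `urllib.parse.quote` are standard-library functions.

```python
import html
from urllib.parse import quote, unquote

query = 'tom & "jerry"; <b>'

# HTML element content or a quoted attribute value.
html_value = html.escape(query)

# Cookie value: percent-encoding keeps ";", spaces and quotes out of the header.
cookie_value = quote(query, safe="")
```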
In the input phase you assert that the input matches your expectations, so that you can safely pass it along into the application or refuse further processing. Only you can know in detail what these requirements are.
In the output phase your job is to ensure that all data is properly encoded and formatted for the overall output to work and the user can safely consume it.
In the input phase you should not try to "fix" issues with the incoming data yourself. Instead, assume the best and communicate back either that there will be no communication or what the problem was (note: do not fool yourself; if this involves output of user input, mind what is important for its output phase. There is less risk in just dropping user input and not processing it further, e.g. not reflecting it by communicating it back).
This is a bit different for the non-error-handling output phase (given the input was acceptable): here you err on the safe side and encode it properly. You may even be fine with filtering the user data so that it is safe in the output (not as the output, which belongs to your overall process; and mind that filtering is harder than it looks at first sight).
In short, don't filter input; only let it pass along if it is acceptable (sanitize). Filter input only in/for output if you have no other option (it is a fallback, and often goes wrong). Mind that filtering is often much harder and much more error-prone, including opening you up to attacks, than just refusing the data overall (so there is some truth in the initial joke).
Next to input or output context for the data, there is also the context in use of the values. In your example the search query. How could anyone here on Stackoverflow or any other internet site answer that as it remains completely undefined in your question: A search query. A search query for what? Isn't your question itself in a search for an answer? Taking it as an example, Stackoverflow can take it:
Verify the input is in the form of a question title and its text message that can safely enter their database - it passed that check, which can be verified as your question was published.
With your attempt to enter that query on Stackoverflow, some input validation steps were done prior to sending it to the database, while already querying it: similar questions, whether your user is valid, etc.
As this short example shows, many of the questions for a concrete application (your application, your code) need not only the basic foundation to work (error handling on the protocol level, standard input and output so to say), but also something built on top of it that works technically correctly (a database search for existing questions must not be prone to SQL injection, neither on the title nor on the question text, nor must the display of error messages or hints introduce other forms of injection).
To come back to your own example, $htmlQuery is not appropriate if you need to encode it as a Javascript string in a response. To encode a value within Javascript as a string you would certainly use a different function, maybe json_encode($string) instead of htmlspecialchars($string).
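A Python counterpart to `json_encode` for a JavaScript string context: `json.dumps` produces a valid string literal, and the extra replacement of `</` (an assumption, mirroring PHP's default escaping of `/`) keeps the value from closing an enclosing `<script>` element.

```python
import json

def js_string_literal(value: str) -> str:
    # json.dumps quotes and escapes the value (non-ASCII becomes \uXXXX by default);
    # "<\/" is still a valid escape, so the decoded string is unchanged.
    return json.dumps(value).replace("</", "<\\/")
```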
And for passing the search query to a search service, it may be as well encoded differently, e.g. as XML, JSON or SQL (for which most database drivers offers a nice feature called parameterized queries or more formalized prepared statements which are of great help to handle input and output context more easily - common problems, common solutions).
prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
You may already spot the "error" in this "search query". It's not that there are things you or anyone else can't even think of. Regardless of how much knowledge you have, there will always be known and unknown unknowns, on top of the sheer number of mistakes we encode into software every other day. The one "wrong" is perhaps in thinking that there is a one-size-fits-all solution (even with the good intent that things must have been solved already; and truly most have been, but one still needs to learn about them first, so it is good that you ask), and, perhaps more importantly, in assuming that others are solving your problems: your technical problems perhaps, but your own problems only you can solve. And if that sentence sounds hard, take the good side of it: you can solve them. And I write this even though I can only give a lengthy answer to your question.
So take any security advice - including the text-wall I just placed here - on Stackoverflow or elsewhere with a grain of salt. Only your own sharp eyes can decide if they are appropriate to cover your bases.
Older PHP Security Poster (via my blog)
No. That suggestion does not answer this at all. See the correct answer below.
I am building an application whereby I want a user to enter a password into a browser, which is sent via my server to another device running Python. The password then needs to be validated by the device running Python.
The problem is that I don't want my server handling passwords in any way. So I figured I could hash the password in the browser before it is sent, have the server pass the hash on to the device, and then check on the Python side that the hash matches.
Python has a built-in library for this purpose, but JavaScript does not seem to. I thought I could use a public JavaScript library, but when I compare the output of the JavaScript SHA256 algorithm here with the output of Python's SHA256 function, the two strings differ.
Is there a cross-language hash function (or any other solution) I can use?
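As a sketch of the device-side check described above, assume the browser sends the lowercase hex SHA-256 of the UTF-8 encoded password, and assume the device stores the expected hex digest in advance (EXPECTED_HEX and verify_from_browser are hypothetical names). Any correct SHA-256 implementation, in JavaScript or Python, yields the same digest as long as both sides hash exactly the same bytes. This only shows how to match digests across languages. It is not a complete password-handling scheme.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    # Hash the exact UTF-8 bytes of the password, with no trailing newline,
    # and return lowercase hex (which is what hexdigest() produces).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical: the expected digest is provisioned on the device ahead of time.
EXPECTED_HEX = sha256_hex("MyPassword")


def verify_from_browser(received_hex: str) -> bool:
    # Normalize surrounding whitespace and case, since some JavaScript libraries
    # emit uppercase hex. compare_digest compares without early exit on mismatch.
    candidate = received_hex.strip().lower().encode("utf-8")
    return hmac.compare_digest(candidate, EXPECTED_HEX.encode("ascii"))


print(verify_from_browser(sha256_hex("MyPassword")))  # True
```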
An Update
In response to the "gee whiz, this question is the same as all these ones" comments, let me clarify. This question is not about a strategy for storing passwords or about finding a 'trustworthy' library (as the suggested post assumed). There is NOT any discussion on this site about the cross-language compatibility of SHA2. I could not even find a discussion pointing out that different SHA2 implementations SHOULD produce the same result. I did plenty of research. In fact, it was the various discussions about different JavaScript "implementations" of SHA2 that confused me. I also tested a scenario myself, which confused me further because the website picked up a line break and produced a different hash (see below).
This is about having a function in TWO languages that produces the same output on different devices. I think this is actually an unusual application of hashing, as the same code layer is generally used to hash, store and compare hashed values.
In the rush to down-vote the question and establish mental superiority, it seems to me that the question was not read properly and incorrect assumptions were made. Hopefully contributors to this site will in future take a more considered and helpful approach, like the successful answer.
The JavaScript library I linked produced the following hash for the text 'MyPassword':
5e618e009fe35ea092150ad1f2c24e3181b4cf6693dc7bbd9a09ea9c8144720d
If I use the sha256 function from Python, I get the result below. This seems to indicate to me that not all SHA256 functions are equal and produce the same result:
dc1e7c03e162397b355b6f1c895dfdf3790d98c10b920c55e91272b8eecada2a
All correct implementations of SHA256 (or of any hash or encryption algorithm) produce the same result when given the same data. You solve your problem by correctly processing the data you pass to the JavaScript library. The "5e61..." hash is the result of an additional newline appended to the end of the "MyPassword" string. Look:
In [1]: import hashlib
In [2]: hashlib.sha256(b'MyPassword').hexdigest()
Out[2]: 'dc1e7c03e162397b355b6f1c895dfdf3790d98c10b920c55e91272b8eecada2a'
In [3]: hashlib.sha256(b'MyPassword\n').hexdigest()
Out[3]: '5e618e009fe35ea092150ad1f2c24e3181b4cf6693dc7bbd9a09ea9c8144720d'
For the future: popular implementations of hashes and cryptographic algorithms are thoroughly tested, so if the answer seems wrong, it's probably because your data is wrong.
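When two digests disagree, a quick way to find the cause is to print the exact bytes each side hashed. The following sketch shows that invisible line endings change the digest, and that stripping them makes every variant agree. Stripping is safe only under the assumption that a trailing line ending is never part of a real password, and both sides must then apply the same normalization.

```python
import hashlib


def show_digest(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    # repr() makes invisible characters such as '\n' and '\r' visible.
    print(f"{data!r:20} {digest}")
    return digest


variants = [b"MyPassword", b"MyPassword\n", b"MyPassword\r\n"]
raw = {show_digest(v) for v in variants}
print(len(raw))  # 3: the inputs differ, so the digests differ

# Assumption: trailing line endings are never part of a password, so strip them before hashing.
normalized = {hashlib.sha256(v.rstrip(b"\r\n")).hexdigest() for v in variants}
print(len(normalized))  # 1: all variants now hash identically
```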
This question already has answers here:
How to compress/decompress a long query string in PHP?
Closed 8 years ago.
I currently have a URL like this:
http://blahblah.com/process.php?q=[HUGEEEEEEEEEEEEEEEEEEEEEEEE STRING of 5000 chars]
My goal is to convert it into something like:
http://blahblah.com/process.php?q=[fewer characters]
The first question:
How do I apply a function (an encryption function, for instance) to my GET variables before they are sent to the action page?
I've seen many questions asked on a similar topic.
The second question:
Assuming I can do the above by some means (maybe with jQuery/JavaScript or something), how do I compress the data in the index.php page and decompress it in the process.php page?
My attempt:
Searching for functions with fixed lengths:
I've looked at some encryption functions that produce a fixed-size output. For example, md5() gives a short and tidy standard-length result even for an extremely large string. Unfortunately, md5 cannot easily be decoded. Is there any other such function that I can decode and that has a fixed length? If so, I could use it, assuming I know a way to do step 1.
EDIT: Please do not mark this as a duplicate of that question. That question was never answered, so I am specifically asking it again. Please read @Jeremy's comments; he was following this post.
I personally think it is best to use POST to send the data to the page. I am pretty sure you cannot use anything like MD5 to 'compress' the data, because MD5 hashes the data: it runs an algorithm over your data to produce a fixed-length hash.
The chance that two particular data sets produce the same hash is extremely small. But a fixed-length hash has far fewer possible values than there are possible inputs, so many different inputs must map to the same hash value. It therefore seems to me impossible to reliably decrypt MD5 or other similar hashes. Check out this page for more on hash collisions.
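The counting argument behind "you cannot decode a fixed-length hash" can be checked directly with a short script. It assumes inputs of exactly 5000 characters drawn from the 95 printable ASCII characters, mirroring the question's 5000-character string. Even that restricted input space is vastly larger than the 2^128 possible MD5 digests, so by the pigeonhole principle many inputs must share a digest.

```python
import hashlib

# MD5 always yields 16 bytes (32 hex characters), however long the input is.
short_digest = hashlib.md5(b"a").hexdigest()
long_digest = hashlib.md5(b"a" * 5000).hexdigest()
print(len(short_digest), len(long_digest))  # 32 32

# Assumption: inputs are exactly 5000 printable ASCII characters (95 choices each).
possible_digests = 2 ** 128
possible_inputs = 95 ** 5000

# Far more inputs than digests, so a digest alone cannot identify which input produced it.
print(possible_inputs > possible_digests)  # True
print(len(str(possible_inputs)), "decimal digits in the number of possible inputs")
```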
Your problem is that you are using the web the wrong way. URL length is limited (the limit depends on the browser), so don't even try to use long URLs, even if you want to shorten them.
Please keep in mind that the World Wide Web has been in use for a long time. If you run into a dead end, you just have to rethink your problem. Maybe you are using your current technology the wrong way.
So, use POST instead to transfer your data (as others mentioned before).
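To illustrate the POST suggestion, here is a sketch that uses Python's urllib to build, without sending, a form-encoded POST request carrying a hypothetical 5000-character value. The value travels in the request body, so the URL stays short no matter how large the data is. On the PHP side, such a field would be read from $_POST rather than $_GET. The URL is the placeholder from the question.

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = "x" * 5000  # hypothetical 5000-character value

# Form-encode the data and put it in the request body instead of the query string.
body = urlencode({"q": payload}).encode("ascii")
req = Request(
    "http://blahblah.com/process.php",
    data=body,
    method="POST",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

# Sending would be urllib.request.urlopen(req); that is not executed in this sketch.
print(req.get_method(), req.full_url, len(body), "bytes in body")
```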
If you want to "compress" your data, you would have to apply zip-like compression and then make the result URL-safe, for example with Base64. This is not suitable in any way and completely hideous. (And of course it cannot guarantee the length of your URL.)
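For completeness, here is a sketch of the compress-then-Base64 round trip this answer advises against, written with Python's zlib and URL-safe Base64. It assumes both pages agree on the same format (zlib stream, URL-safe alphabet), and pack and unpack are hypothetical helper names. Repetitive text shrinks dramatically, but random-looking data barely compresses and Base64 then grows it by about a third, so the URL length is not guaranteed.

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlparse


def pack(data: str) -> str:
    # Compress the UTF-8 bytes, then Base64-encode with the URL-safe alphabet ('-' and '_').
    return base64.urlsafe_b64encode(zlib.compress(data.encode("utf-8"), 9)).decode("ascii")


def unpack(token: str) -> str:
    # Reverse the steps: Base64-decode, decompress, decode the UTF-8 text.
    return zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii"))).decode("utf-8")


repetitive = "HUGE" * 1250  # hypothetical 5000-character payload that compresses well
token = pack(repetitive)
url = "http://blahblah.com/process.php?" + urlencode({"q": token})

# The receiving side extracts the parameter and recovers the original string.
received = parse_qs(urlparse(url).query)["q"][0]
assert unpack(received) == repetitive
print(len(repetitive), "characters packed into", len(token))
```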
MD5 is a hash, not a compression method. MD5 is not reversible: once you hash something, you cannot go back. It is not a magical way to compress tons of megabytes into a single short number. Its purpose is to produce a short value that tells you whether the original data was modified (by hashing the data at two different times and comparing the results).
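The "hash it twice and compare" use looks like the following minimal sketch, where the data and the fingerprint helper are hypothetical. MD5 appears here only because the answer discusses it. It is no longer collision-resistant, so this approach detects accidental changes, not deliberate tampering.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


original = b"some query string data"  # hypothetical data
stored = fingerprint(original)        # computed once and kept alongside the data

# Later: hash again and compare with the stored value to detect modification.
print(fingerprint(original) == stored)         # True: unchanged
print(fingerprint(original + b"!") == stored)  # False: modified
```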
See http://en.wikipedia.org/wiki/Hash_function
See http://en.wikipedia.org/wiki/MD5
BTW: It is the same as How to compress/decompress a long query string in PHP?