5/18/2014

Hash Functions and its Importance | Pypix

Hash Functions and its Importance
From time to time, servers and databases are stolen or compromised. With this in mind, it is important to ensure that
some crucial user data, such as passwords, cannot be recovered. Today, we are going to learn the basics behind hashing
and what it takes to protect passwords in your web applications.

Disclaimer
Cryptology is a complicated subject, and I am by no means an expert. There is constant research happening in
this area, in many universities and security agencies.
In this article, I will try to keep things as simple as possible, while presenting a reasonably secure method of storing
passwords in a web application.

What Does Hashing Do?


Hashing converts a piece of data (either small or large), into a relatively short piece of data such as a string or an integer.
This is accomplished by using a one-way hash function. One-way means that it is very difficult (or practically impossible)
to reverse it.
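
As a small illustration of the "short, fixed-size output" idea, the sketch below hashes inputs of very different sizes. It uses hashlib.sha256 purely as an example function (an assumption made for this sketch; the article's own examples use MD5 and SHA-1), and the helper name short_fingerprint is made up here.

```python
import hashlib

def short_fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

small = short_fingerprint(b"a")
large = short_fingerprint(b"a" * 1_000_000)

# Both digests have the same length, regardless of input size.
print(len(small), len(large))

# A one-character change in the input gives a different digest.
print(short_fingerprint(b"hello") == short_fingerprint(b"hellp"))
```

The digest length depends only on the function chosen, never on how much data went in.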
http://pypix.com/python/hash-functions/

Encryption secures messages so that they can be verified as accurate and protected from interception. Python's
cryptography support includes hashlib for generating signatures of message content using standard algorithms, such as
MD5 and SHA, and hmac for verifying that a message has not been altered in transmission.
A common example of a hash function is md5(), which is quite popular in many different languages and systems.
import hashlib

# hashlib works on bytes, so use a bytes literal (or call .encode() on a str)
data = b"Hello World"

h = hashlib.md5()
h.update(data)
print(h.hexdigest())
# b10a8db164e0754105b7a99be72e3fe5

To calculate the MD5 hash, or digest, for a block of data (here an ASCII string), first create the hash object, and then add the
data and call digest() or hexdigest(). This example uses the hexdigest() method instead of digest() because the output
is formatted so it can be printed clearly. If a binary digest value is acceptable, use digest().
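
To make the difference between the two methods concrete, here is a minimal sketch that calls both on the same hash object. The variable names raw and text are only for illustration.

```python
import hashlib

h = hashlib.md5(b"Hello World")

raw = h.digest()       # the digest as 16 raw bytes
text = h.hexdigest()   # the same digest as 32 hexadecimal characters

print(len(raw), len(text))

# hexdigest() is simply the hexadecimal rendering of digest().
print(raw.hex() == text)
```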

Using a Hash Function for Storing Passwords


The usual process during a user registration:
1. User fills out registration form, including the password field.
2. The web script stores all of the information into a database.
3. However, the password is run through a hash function, before being stored.
4. The original version of the password has not been stored anywhere, so it is technically discarded.

And the login process:


1. User enters username (or e-mail) and password.
2. The script runs the password through the same hashing function.
3. The script finds the user record from the database, and reads the stored hashed password.
4. Both of these values are compared, and the access is granted if they match.
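
The two flows can be sketched in a few lines. Everything named here is hypothetical: users is an in-memory stand-in for the database table, and hash_password uses a bare SHA-256 only as a placeholder to show where the hash function sits in the flow. It is not a recommendation for how to hash passwords.

```python
import hashlib

users = {}  # hypothetical in-memory stand-in for the database table

def hash_password(password: str) -> str:
    # Placeholder only: a bare, fast hash used just to show the flow.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Store the hash; the plain-text password is never saved.
    users[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # Hash the submitted password the same way and compare with the stored value.
    return stored == hash_password(password)

register("alice", "supersecretpassword")
print(login("alice", "supersecretpassword"))
print(login("alice", "wrong guess"))
```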

Once we decide on a decent method for hashing the password, we are going to implement this process later in this article.
Note that the original password has never been stored anywhere. If the database is stolen, the user logins cannot be
compromised, right? Well, the answer is: it depends. Let's look at some potential problems.

Problem #1: Hash Collision


A hash collision occurs when two different data inputs generate the same resulting hash. The likelihood of this
happening depends on which function you use.

How can this be exploited?


As an example, I have seen some older scripts which used crc32() to hash passwords. This function generates a 32-bit
integer as the result. This means there are only 2^32 (i.e. 4,294,967,296) possible outcomes.
Let's hash a password:
import binascii

# crc32() expects bytes in Python 3
result = binascii.crc32(b'supersecretpassword')
print(result)  # 323322056

Now, let's assume the role of a person who has stolen a database, and has the hash value. We may not be able to convert
323322056 into supersecretpassword; however, we can figure out another password that will convert to the same hash
value, with a simple script:

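import base64, binascii

target = 323322056  # the stolen hash value

i = 0
while True:
    # try the base64 encoding of each successive integer as a candidate password
    candidate = base64.b64encode(str(i).encode())
    if binascii.crc32(candidate) == target:
        print(candidate.decode())
        break
    i += 1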

This may run for a while, though, eventually, it should return a string. We can use this returned string instead of
supersecretpassword and it will allow us to successfully log in to that person's account.

For example, after running this exact script for a few moments on my computer, I was given MTIxMjY5MTAwNg==. Let's test
it out:

import binascii

print(binascii.crc32(b"supersecretpassword"))
# 323322056

print(binascii.crc32(b"MTIxMjY5MTAwNg=="))
# 323322056

How can this be prevented?


Nowadays, a powerful home PC can be used to run a hash function almost a billion times per second. So we need a hash
function that has a very big range.
For example, md5() might be suitable, as it generates 128-bit hashes. This translates into
340,282,366,920,938,463,463,374,607,431,768,211,456 possible outcomes. It is impossible to run through so many
iterations to find collisions. However, some people have still found ways to do this.
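
To put the two output sizes side by side, here is a rough back-of-the-envelope sketch. It assumes the rate of one billion guesses per second quoted above, and the helper name seconds_to_exhaust is made up for this example.

```python
RATE = 10**9  # assumed guesses per second, the figure quoted above
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_to_exhaust(bits: int, rate: int = RATE) -> float:
    """Time to try every possible output of a hash with the given bit width."""
    return 2**bits / rate

# 32-bit output (crc32): the whole range is covered in a few seconds.
print(2**32)
print(seconds_to_exhaust(32))

# 128-bit output (md5): the same exhaustive search is out of reach.
print(2**128)
print(seconds_to_exhaust(128) / SECONDS_PER_YEAR)
```

This only measures exhaustive search; it says nothing about weaknesses in a specific function that let attackers find collisions faster.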

Sha1
sha1() is a better alternative: it generates an even longer 160-bit hash value.

Problem #2: Rainbow Tables


Even if we fix the collision issue, we're still not safe yet.
A rainbow table is built by calculating the hash values of commonly used words and their combinations.
These tables can have millions or even billions of rows.
For example, you can go through a dictionary, and generate hash values for every word. You can also start combining
words together, and generate hashes for those too. That is not all; you can even start adding digits before/after/between
words, and store them in the table as well.
Considering how cheap storage is nowadays, gigantic Rainbow Tables can be produced and used.

How can this be exploited?


Let's imagine that a large database is stolen, along with 10 million password hashes. It is fairly easy to search the rainbow
table for each of them. Not all of them will be found, certainly, but, nonetheless, some of them will!
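
The attack can be sketched with a tiny word list. Strictly speaking, this is a plain precomputed lookup table; real rainbow tables use hash chains to save space, but the effect on unsalted hashes is the same. The word list and the names sha1_hex, table and stolen_hashes are made up for illustration.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical word list; a real attacker would use millions of entries.
words = ["password", "letmein", "dragon"]
candidates = words + [w + str(d) for w in words for d in range(10)]

# Precompute hash -> plain text once, then reuse it for every stolen hash.
table = {sha1_hex(c): c for c in candidates}

stolen_hashes = [sha1_hex("dragon7"), sha1_hex("Tr0ub4dor&3")]
for h in stolen_hashes:
    print(h[:12], "->", table.get(h, "not found"))
```

The weak password is recovered with a single dictionary lookup, while the one that is not in the table stays hidden.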

How can this be prevented?


We can try adding a salt. Here is an example:
import hashlib

password = "EasyPassword"

print(hashlib.sha1(password.encode()).hexdigest())
# ff166c2477f864d609ca8111680bfa387eb4e509

salt = "f#@V)Hu^%Hgfds"

print(hashlib.sha1((salt + password).encode()).hexdigest())
# 3e7edaceb96becaf69ae7e73073812ea136188e2

What we basically do is concatenate the salt string with the password before hashing it. The resulting string
obviously will not be in any pre-built rainbow table. But we're still not safe just yet!

Problem #3: Rainbow Tables (again)


Remember that a Rainbow Table may be created from scratch, after the database has been stolen.

How can this be exploited?


Even if a salt was used, this may have been stolen along with the database. All they have to do is generate a new Rainbow
Table from scratch, but this time they concatenate the salt to every word that they are putting in the table.
For example, in a generic Rainbow Table, easypassword may exist. But in this new Rainbow Table, they have
f#@V)Hu^%Hgfdseasypassword as well. When they run all of the 10 million stolen salted hashes against this table, they will
again be able to find some matches.
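
A minimal sketch of that second attack, assuming the single shared salt from the earlier example was stolen with the database. The word list and the helper salted_sha1 are hypothetical.

```python
import hashlib

def salted_sha1(salt: str, password: str) -> str:
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()

shared_salt = "f#@V)Hu^%Hgfds"        # stolen together with the database
words = ["easypassword", "letmein"]   # hypothetical attacker word list

# One table, built once with the shared salt, covers every user.
salted_table = {salted_sha1(shared_salt, w): w for w in words}

stolen = salted_sha1(shared_salt, "easypassword")
print(salted_table.get(stolen))
```

Because every user shares the same salt, the cost of building the table is paid only once.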

How can this be prevented?


We can use a unique salt instead, which changes for each user.
A candidate for this kind of salt is the user's id value from the database:
hashlib.sha1((str(userid) + password).encode()).hexdigest()

This assumes that a user's id number never changes, which is typically the case.
We may also generate a random string for each user and use that as the unique salt. In that case we need to
store the salt in the user record, so that it is available when the password is checked.

import hashlib, os

def unique_salt():
    # 22 hexadecimal characters derived from 10 random bytes
    return hashlib.sha1(os.urandom(10)).hexdigest()[:22]

salt = unique_salt()
password = ""  # str or int
hashed = hashlib.sha1((salt + str(password)).encode()).hexdigest()
print(hashed)
# e.g. 37dec03d2761122819f8708e6d5c8392ee02b40d (differs on every run, because the salt is random)

This method protects us against Rainbow Tables, because now every single password has been salted with a different
value. The attacker would have to generate 10 million separate Rainbow Tables, which would be completely impractical.
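
Here is a small sketch of per-user salts in practice. It assumes each user record stores its salt next to the hash; the records dictionary and the helper names new_salt and salted_hash are invented for this example.

```python
import hashlib
import os

def new_salt() -> str:
    # 16 random bytes rendered as 32 hex characters (length chosen for this sketch).
    return os.urandom(16).hex()

def salted_hash(salt: str, password: str) -> str:
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()

# Two users who happen to pick the same password.
records = {}
for user in ("alice", "bob"):
    salt = new_salt()
    records[user] = {"salt": salt, "hash": salted_hash(salt, "EasyPassword")}

# The stored hashes differ, so one precomputed table cannot cover both users.
print(records["alice"]["hash"] != records["bob"]["hash"])

# Verification reuses the salt stored in the user's own record.
rec = records["alice"]
print(salted_hash(rec["salt"], "EasyPassword") == rec["hash"])
```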

Problem #4: Hash Speed


Most hashing functions have been designed with speed in mind, because they are often used to calculate checksum values
for large data sets and files, to check for data integrity.

How can this be exploited?


As I mentioned before, a modern PC with powerful GPUs (yes, video cards) can be programmed to calculate roughly a
billion hashes per second. This way, an attacker can use a brute force attack to try every single possible password.
You may think that requiring a minimum 8 character long password might keep it safe from a brute force attack, but let's
determine if that is, indeed, the case:
If the password can contain lowercase letters, uppercase letters and digits, that is 62 (26+26+10) possible characters.
An 8 character long string has 62^8 possible versions. That is a little over 218 trillion.
At a rate of 1 billion hashes per second, that can be solved in about 60 hours.

And for 6 character long passwords, which are also quite common, it would take under 1 minute.
Feel free to require 9 or 10 character long passwords; however, you might start annoying some of your users.
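
The arithmetic above can be checked with a few lines of Python. This is only a worked calculation under the article's assumption of 1 billion hashes per second; the names RATE, ALPHABET and worst_case_seconds are made up for the sketch.

```python
RATE = 10**9             # hashes per second, as assumed in the text
ALPHABET = 26 + 26 + 10  # lowercase letters + uppercase letters + digits

def worst_case_seconds(length: int) -> float:
    """Seconds needed to try every password of exactly this length."""
    return ALPHABET**length / RATE

print(ALPHABET**8)                    # number of 8 character passwords
print(worst_case_seconds(8) / 3600)   # hours needed for 8 characters
print(worst_case_seconds(6))          # seconds needed for 6 characters
```

Each extra character multiplies the work by 62, which is why length matters more than any other password rule.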

How can this be prevented?


Use a slower hash function.
Imagine that you use a hash function that can only run 1 million times per second on the same hardware, instead of 1
billion times per second. It would then take the attacker 1000 times longer to brute force a hash. 60 hours would turn into
nearly 7 years!
One way to do that would be to implement it yourself:
import hashlib

def my_hash(password, salt):
    digest = hashlib.sha1((salt + password).encode()).hexdigest()

    # re-hash the result 1000 times to slow the computation down
    for i in range(1000):
        digest = hashlib.sha1(digest.encode()).hexdigest()

    return digest

print(my_hash("12345", "f#@V)Hu^%Hgfds"))
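
As an alternative to a hand-rolled loop, Python's standard library offers hashlib.pbkdf2_hmac, which applies the same "repeat the hash many times" idea with a configurable iteration count. This is a different technique from the author's function, shown only as a sketch; the name stretched_hash and the iteration count of 100,000 are assumptions chosen for illustration.

```python
import hashlib

def stretched_hash(password: str, salt: bytes, iterations: int = 100_000) -> str:
    # PBKDF2-HMAC-SHA256 from the standard library; iterations is the cost knob.
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return derived.hex()

salt = b"f#@V)Hu^%Hgfds"  # fixed here only so the output is repeatable
print(stretched_hash("12345", salt))
```

Raising the iteration count makes every guess proportionally more expensive for an attacker, at the price of a slower login check.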

Or you may use an algorithm that supports a cost parameter, such as bcrypt, which is based on the Blowfish cipher. In Python,
this can be done using the py-bcrypt library, which is imported as bcrypt.

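import bcrypt

def my_hash(password):
    # hashpw() works on bytes; gensalt(10) sets the cost parameter to 10
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(10))

print(my_hash("atdk"))
# e.g. $2a$10$WNhGOdVhoZrrKgwxGa2VIuzfAvm9oFWZF9PIVtLIoU5LQOVGLuLrq
# (the salt is random, so the output differs on every run)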

Notice the output:

1. The first value is $2a, which indicates that we are using the Blowfish-based bcrypt algorithm.
2. The second value, $10 in this case, is the cost parameter. This is the base-2 logarithm of the number of iterations it will
run (10 => 2^10 = 1024 iterations). This number can range between 04 and 31.
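
The fields can also be pulled apart with plain string handling, which makes the layout easier to see. The sketch below assumes the usual bcrypt string layout, in which the last section is a 22 character salt followed by the hash itself; parse_bcrypt is a hypothetical helper, not part of any library.

```python
def parse_bcrypt(hash_string: str) -> dict:
    """Split a bcrypt hash string into its fields."""
    # The string starts with "$", so the first piece of the split is empty.
    _, version, cost, rest = hash_string.split("$")
    return {
        "version": version,            # algorithm identifier, e.g. "2a"
        "cost": int(cost),             # base-2 logarithm of the iteration count
        "iterations": 2 ** int(cost),
        "salt": rest[:22],             # 22 character encoded salt
        "digest": rest[22:],           # the remaining characters are the hash
    }

parts = parse_bcrypt("$2a$10$WNhGOdVhoZrrKgwxGa2VIuzfAvm9oFWZF9PIVtLIoU5LQOVGLuLrq")
print(parts["version"], parts["cost"], parts["iterations"])
print(parts["salt"])
```

Since the cost is a logarithm, raising it by one doubles the amount of work per hash.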
Let's run an example:


import bcrypt

def my_hash(password, unique_salt):
    return bcrypt.hashpw(password.encode(), unique_salt)

def unique_salt():
    # gensalt() already returns the algorithm and cost prefix followed by
    # a fresh random 22 character salt, so nothing needs to be appended to it
    return bcrypt.gensalt(10)

password = "verysecret"

print(my_hash(password, unique_salt()))
# e.g. $2a$10$aHx0q.FE/tGvGWzlm6yePemYx9SAsBP2iSiy/uFx7pyjpy980Hita

The resulting hash contains the algorithm ($2a), the cost parameter ($10), and the 22 character salt that was used. The rest
of it is the calculated hash. Let's run a test:
import bcrypt, hmac

# assume this was pulled from the database
stored_hash = b"$2a$10$6XDaX/3kNby0jI9Ih/Re7.478DOMZK9OnA2mTxKUP0My.39N.jdky"

# assume this is the password the user entered to log back in
password = "verysecret"

def check_password(stored_hash, password):
    # the first 29 characters hold the algorithm, the cost and the 22 character salt
    salt = stored_hash[:29]
    new_hash = bcrypt.hashpw(password.encode(), salt)
    # compare in constant time, so the comparison does not leak timing information
    return hmac.compare_digest(stored_hash, new_hash)

if check_password(stored_hash, password):
    print("Access Granted")
else:
    print("Access Denied")

When we run this, we see Access Granted!

Putting it Together
With all of the above in mind, let's write a utility class based on what we learned so far:
import bcrypt, hmac

class PassHash:

    def unique_salt(self):
        # gensalt() returns the algorithm and cost prefix plus a random 22 character salt
        return bcrypt.gensalt(10)

    def hash(self, password):
        return bcrypt.hashpw(password.encode(), self.unique_salt())

    def check_password(self, hash, password):
        full_salt = hash[:29]
        new_hash = bcrypt.hashpw(password.encode(), full_salt)
        # constant-time comparison instead of ==
        return hmac.compare_digest(hash, new_hash)

obj = PassHash()

a = obj.hash("12345")
print(a)  # e.g. $2a$10$gBSbmXKanQJOTSabtX4wfOE2RT2mKDFbCY6r7cqCJSk2YPGjIDrou

b = obj.check_password(a, "12345")
print(b)  # True

Now, we can use this utility in our forms to hash the passwords securely.

Conclusion
This method of hashing passwords should be solid enough for most web applications. That said, don't forget: you can also
require that your members use stronger passwords, by enforcing minimum lengths, mixed characters, digits & special
characters.
A question to you, reader: how do you hash your passwords? Can you recommend any improvements over this
implementation?

About the author


Written by Ajay Kumar N

3 Comments

Levi Gross

4 months ago

There is a side-channel timing attack in your password comparison function.


`return hash == new_hash` should not be used when comparing hashes because it is a
linear time comparison. Make use of a constant time comparison instead e.g
https://github.com/django/djan...
aRkadeFR

2 months ago

Very thorough ! Thanks


laike9m

2 months ago

I think you made a mistake in the hash collision part.


i=0
while True:
    if binascii.crc32(base64.encodestring(bytes(i,))) == 323322056:
        print(base64.encodestring(i))
        i += 1
i += 1 should be placed outside the "if", otherwise i will never change and program fails to
find the password.