Searching is a pretty big deal I mean it's something that we
need to do really often and it kind of needs happen really quickly last time we
looked at an Olaf and solution to searching of corset is subject to having a
list that it's already sorted but question is can we do better you're watching
episode 3 of search algorithms hashing hello and welcome back to search
algorithms so our challenge today is can we perform searching in oh one yep
we're actually gonna attempt to search in constant sign disclaimer I'm actually
talking about the average case time complexity here but it is indeed possible
to do that we're gonna have to perform what is known as hashing so what is a
hash and how does it work well this is actually pretty innovative so let's
actually jump straight into seeing how we actually do things with the hash
table the idea is this what we have here is a hash table essentially it is just
a blank array like you normally would have except we know its size and we
initialize it to a pretty significant size so let's say I want to insert a new
item into this hash table I don't simply insert the new item at the top instead
what I'm gonna have to do is I'm gonna have to take my new piece of data and
actually put it through what is called a hash function simply put what the hash
function does is it takes the incoming data and does some mathematical
calculations to that the hash function produces a number this number is in fact
the position in which were supposed to insert the new item so what we're going
to do to make things clearer is instead of working with numbers like we did in
the past two episodes what I'm going to do is I'm gonna actually attempt to
search through letters instead so single letters from A to Z the hash function
I'm going to use in this example is a very simple one a being the first letter is
going to be 1 and B is going to be 2 and so on until Z which is 26 recall that
a hash function works by returning a position that is where you should insert
the new item so you see let's say I wanted insert C into the list what's going
to happen is the hash function is going to tell me treat seeing us at C is the
third alphabet it's going to get me free and what I'm going to do is I'm going
to put it into the foot position in the hash table so this is exactly what the
hash table looks like say I want to insert the letter e right now well I just
give that to hash function and that goes into position five now in the series
what we are primarily concerned with is searching so that's now attempt to
search this hash table let's say now I wanted to find out whereabouts e is in
debt array instead of actually searching through the array or actually
attempting some sort and binary search
what I can do is I can simply tell the
hash function hey where is the position e the hash function will tell me five
I'm going to go to the fifth position in hash table and then go that's a right
then now how about a situation whereby we actually search for something that
doesn't exist I'm gonna tell the hash function hey where's the position of B
and it's gonna tell me oh it's that position - I'm gonna go to position two of
the hash table and realize it's empty as a result I can be sure that B isn't
actually in the list and as a result our returned are not found that's all well
and good that's really simple right well in reality hash tables are a little
bit more complicated than this now since you're working with simplified examples what I'm doing here is just using you know a very simplistic kind of
hash function it's gonna be extremely impractical everyday things like this in
you know practical situations but just bear with me I'm doing this for the
purposes of demonstration so right now my input can be strings of any length
however my hash function stays the same it's just gonna look at the first
letter of whatever that's coming in and it's gonna you know just give you the
position based on that first letter now watch what happens let's say now I
insert the word cat it looks at the first letter that is C and tells me hey
it's going to go to position tree now let's say I want to insert the word
camera where does that go the hash function also tells you to bring that
deposition tree my position tree is already occupied this is called a hash
collision broadly speaking a hash collision happens when hash function points a
new item to a slot that is already occupied so how do we deal with this one
hash collision happens what we're going to have to do is we're going to have to
invoke a completely different set of logic to find a new position for this item
now handling hash collisions is an art in and of itself so what I'm going to do
is I'm only going to cover one very basic method of attempting to resolve a
hash collision
this is a best method but just for the purposes of demonstration
in this episode we're just going to stick with that so what I'm going to do in
this particular case is every time there is a hash collision I'm just going to
keep going down in hash table until I find an empty slot so if we have cat in
this position and I want to insert camera which is mapped to the same position
what I'm going to do is I'm going to move down position fall is empty so I'm
going to insert the word camera then alright that's a problem solved but wait
that just creates another problem if I'm inside the wet dock right now it's
going to try to go to position four and it's going to say hey that's a camera
in that place so it's going to have to move down and it's going to occupy slot
number five which is identity is the slot for any word starting with the letter
e but once again I don't go too deep into the specifics behind actual hashing
I'm more interested in this searching aspect so now what happens if I want to
search for camera I'm going to go to the hash function hash function it's going
to tell me position three I'm going to go to position three and say well that
doesn't look like a camera but this is not enough to tell me that what camera
is not phone what we're gonna have to do is we're going to have to apply the
same logic as set of hash collision we're gonna have to keep going down to see
if perhaps during insertion collisions have a cut to make it you know difficult
to find I move down by one slot and I realize hey that's where it is what
happens when you search for an item that is missing let's try to search for
conflicts the hash function points you in a position free and you say nope that
is not what I'm looking for you try to go down the position for that's not what
I'm looking for either you go around position five nope that's on it you go
down to position six and you say hey that's an empty slot only then can you
conclude that the item it's not found all right so now that we've looked at an
example let us now come back and try to consolidate what we've seen so far
firstly yes searching with a hash table can be very first depending on a plethora
of different factors like how well the hash function is written what size
is
the hash table and what kind of input data do you normally get in the average
case when you try to search for something it should be found very quickly
however if conditions aren't favorable what's going to happen is things can
start getting slow when there are a lot of collisions and you have to keep
doing collision detection logic what's going to happen is every time you
perform as such you're gonna have to go through multiple elements in the table
before we can come to your conclusion in fact the worst case comes about when
the entire table is filled when you perform a search in the worst case what's
gonna happen is well it's just gonna go through everything in the entire table
what this means is unfortunately the worst case time complexity for hash
algorithm is all N how the fingers crossed this doesn't happen very often and
as a result I want to emphasize that hashing has the average case time
complexity of or one that is constant time now do bear in mind that the or one
time complexity doesn't mean that I can find things in one step it just means
that I can find something in a constant number of steps that is number of steps
not bound or determined by the number of items in the list so despite its
shortcomings hashing is actually used very often many practical implementations
of hashing actually allow for say a hash table to expand when needed hash
functions can also be dynamically generated - actually you know what best for the
input data that's there and what happens is you get optimal performance hashing
is also a great way of mapping one data type to another but that is not a
feature that we are really going to discuss in depth today and that is it that
is pretty much all I have for this particular episode as well as this entire
series I know I've only covered three different types of search algorithms and
most of them have pretty serious shortcomings but hey that's how it is now if
we want to perform even more different types of complicated searches we might
have to look at more complicated structures like you know graphs things like
that which I'm not very comfortable with covering yet so as a result if that
happens there'll be a thing for the future hopefully what I've covered so far
is at least a primer for you to understand some of the more basic ways of
performing searches...
No comments:
Post a Comment