How does a Set in Java know about duplicate entries?
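A set is an unordered collection of objects (data) in which duplicate values cannot be stored.
If the set contains values of a built-in type such as Integer or String (a Java set cannot hold primitives like int directly, so they are autoboxed into wrapper objects such as Integer), the set is able to identify duplicate values and does not store them a second time. This is not the compiler's doing: the check happens at run time, each time an element is added.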
As in the example below, when I try to add the number 2 twice to my numberSet (on line 10 and again on line 12), the set stores it only once. Amazing!
In my second example, I am storing employee details in my set, so the element type of the set is now Employee (a user-defined class). By mistake, I added the details of the employee named “Rahul” twice. Since I am using a set, I assumed it would remove the duplicate record for me, so there was nothing to worry about. But as the screenshot below shows, the output shocked me.
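Here is a minimal sketch of that experiment. Only the name numberSet comes from the text above; the class name, the other values and the choice of HashSet are my assumptions, so the line numbers mentioned above will not match this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class IntegerSetDemo {
    // Builds a set of Integers in which the value 2 is added twice.
    static Set<Integer> buildNumberSet() {
        Set<Integer> numberSet = new HashSet<>();
        numberSet.add(1);
        numberSet.add(2);   // first time: stored
        numberSet.add(3);
        numberSet.add(2);   // second time: rejected, add() returns false
        return numberSet;
    }

    public static void main(String[] args) {
        Set<Integer> numberSet = buildNumberSet();
        System.out.println(numberSet.size());       // 3
        System.out.println(numberSet.contains(2));  // true
    }
}
```

Note that add() itself reports what happened: it returns false when the element was already present.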
Output:
My set was able to recognize that emp4 is just another reference to the emp1 object. So when I add both emp1 and emp4, the set knows I am adding the same object twice (emp1 == emp4 is true), and it does not store emp4 again.
But my set has not removed the duplicate record “Rahul”: it contains both emp1 and emp2.
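The behaviour described above can be reproduced with a sketch like the following. The Employee fields (id and name), the extra employee emp3 and the class name are assumptions of mine; only emp1, emp2, emp4 and the name “Rahul” come from the text.

```java
import java.util.HashSet;
import java.util.Set;

class DefaultEmployeeDemo {
    // Hypothetical Employee that does NOT override hashCode or equals.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static Set<Employee> buildEmployeeSet() {
        Employee emp1 = new Employee(1, "Rahul");
        Employee emp2 = new Employee(1, "Rahul"); // same details, but a different object
        Employee emp3 = new Employee(2, "Amit");  // assumed extra employee
        Employee emp4 = emp1;                     // second reference to the same object

        Set<Employee> employees = new HashSet<>();
        employees.add(emp1);
        employees.add(emp2);
        employees.add(emp3);
        employees.add(emp4); // rejected: it is the very same object as emp1
        return employees;
    }

    public static void main(String[] args) {
        // emp1, emp2 and emp3 are kept; emp4 is dropped.
        System.out.println(buildEmployeeSet().size()); // 3
    }
}
```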
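When the element type was Integer, the set knew how to identify duplicate records. Now that the element type is the user-defined class Employee, the set cannot tell whether the two objects emp1 and emp2 are the same.
Why? Let’s look at the Integer and Employee classes side by side… Integer overrides both hashCode and equals, so two Integer objects holding the same value count as equal. Our Employee class overrides neither, so it inherits the versions from Object, under which an object is equal only to itself.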
Let’s override only hashCode in our Employee class and check whether the set can now identify duplicate records.
The set still contains both emp1 and emp2. They now have the same hashCode, but the inherited equals method still compares references, so the set has no way to conclude that they are the same.
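A sketch of this hashCode-only experiment, again with an assumed Employee (id and name) and an assumed hashCode built with Objects.hash:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashCodeOnlyDemo {
    // Hypothetical Employee that overrides hashCode but NOT equals.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name);
        }
    }

    // Adds both employees to a fresh HashSet and reports how many were kept.
    static int sizeAfterAdding(Employee a, Employee b) {
        Set<Employee> employees = new HashSet<>();
        employees.add(a);
        employees.add(b);
        return employees.size();
    }

    public static void main(String[] args) {
        Employee emp1 = new Employee(1, "Rahul");
        Employee emp2 = new Employee(1, "Rahul");

        System.out.println(emp1.hashCode() == emp2.hashCode()); // true
        System.out.println(emp1.equals(emp2));                  // false: inherited equals compares references
        System.out.println(sizeAfterAdding(emp1, emp2));        // 2
    }
}
```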
Now let’s try overriding only equals in the Employee class.
The set still contains both emp1 and emp2. Because hashCode is not overridden, each object gets the default hashCode inherited from Object, which is almost always different for two distinct objects. Since the hash codes of emp1 and emp2 do not match, the set never invokes equals on them: equals is called only when an element being added has the same hashCode as an element already in the set.
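The equals-only case can be sketched as follows, with the same assumed Employee fields. I have deliberately not put a fixed number next to the set size: the default hash codes of two distinct objects are not guaranteed to differ, they just almost always do.

```java
import java.util.HashSet;
import java.util.Set;

class EqualsOnlyDemo {
    // Hypothetical Employee that overrides equals but NOT hashCode.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            Employee other = (Employee) o;
            return id == other.id && name.equals(other.name);
        }
    }

    // Adds both employees to a fresh HashSet and reports how many were kept.
    static int sizeAfterAdding(Employee a, Employee b) {
        Set<Employee> employees = new HashSet<>();
        employees.add(a);
        employees.add(b);
        return employees.size();
    }

    public static void main(String[] args) {
        Employee emp1 = new Employee(1, "Rahul");
        Employee emp2 = new Employee(1, "Rahul");

        System.out.println(emp1.equals(emp2)); // true
        // Typically prints 2: the default hash codes almost always differ,
        // so the set never gets as far as calling equals.
        System.out.println(sizeAfterAdding(emp1, emp2));
    }
}
```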
Now let’s implement both hashCode and equals in the Employee class. With both in place, the set finally treats emp1 and emp2 as duplicates and keeps only one of them.
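A minimal sketch of an Employee with both methods overridden. The fields and the use of Objects.hash are my assumptions; the important part is that equals and hashCode are based on the same fields.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class BothOverriddenDemo {
    // Hypothetical Employee that overrides BOTH hashCode and equals.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            Employee other = (Employee) o;
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            // Uses the same fields as equals, so equal employees share a hash code.
            return Objects.hash(id, name);
        }
    }

    // Adds both employees to a fresh HashSet and reports how many were kept.
    static int sizeAfterAdding(Employee a, Employee b) {
        Set<Employee> employees = new HashSet<>();
        employees.add(a);
        employees.add(b);
        return employees.size();
    }

    public static void main(String[] args) {
        Employee emp1 = new Employee(1, "Rahul");
        Employee emp2 = new Employee(1, "Rahul");
        Employee emp3 = new Employee(2, "Amit"); // assumed extra employee

        System.out.println(sizeAfterAdding(emp1, emp2)); // 1: emp2 is a duplicate of emp1
        System.out.println(sizeAfterAdding(emp1, emp3)); // 2: different employees
    }
}
```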
When a new element is added to a hash-based set such as HashSet, the set always calls the element’s hashCode method first, to check whether any element already stored has the same hashCode.
If two elements have different hash codes, they are not duplicates, and equals is never called for that pair. They will usually also be stored in different buckets.
But if two elements have the same hashCode, they go into the same bucket. The equals method is then called to decide whether they are duplicates. If equals returns true, the duplicate entry is not added to the set again.
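To watch this order of calls, here is a small sketch that counts how often hashCode and equals run during each add. The counters, the helper callsDuringAdd and the Employee fields are all hypothetical, and the exact counts are what I expect from OpenJDK’s HashSet, not something the Set interface promises.

```java
import java.util.HashSet;
import java.util.Set;

class CallOrderDemo {
    static int hashCodeCalls = 0;
    static int equalsCalls = 0;

    // Hypothetical Employee that counts every call to hashCode and equals.
    static class Employee {
        final int id;
        final String name;

        Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return Integer.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            if (!(o instanceof Employee)) return false;
            Employee other = (Employee) o;
            return id == other.id && name.equals(other.name);
        }
    }

    // Returns {hashCode calls, equals calls} made while adding e to the set.
    static int[] callsDuringAdd(Set<Employee> set, Employee e) {
        int hashBefore = hashCodeCalls;
        int equalsBefore = equalsCalls;
        set.add(e);
        return new int[] { hashCodeCalls - hashBefore, equalsCalls - equalsBefore };
    }

    public static void main(String[] args) {
        Set<Employee> set = new HashSet<>();

        int[] first = callsDuringAdd(set, new Employee(1, "Rahul"));
        int[] other = callsDuringAdd(set, new Employee(2, "Amit"));
        int[] duplicate = callsDuringAdd(set, new Employee(1, "Rahul"));

        System.out.println(first[0] + " " + first[1]);         // 1 0: empty set, nothing to compare
        System.out.println(other[0] + " " + other[1]);         // 1 0: different hash, equals skipped
        System.out.println(duplicate[0] + " " + duplicate[1]); // 1 1: same hash, equals decides
        System.out.println(set.size());                        // 2
    }
}
```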
We have learned that when an element is added to a set, hashCode is used to decide which bucket the element is stored in. If, during this add operation, the hashCode of the new element matches that of an existing element, the equals method is used to decide whether it is a duplicate entry.
This is how a set identifies duplicates. It is also why user-defined classes stored in a set must override both hashCode and equals.
Thanks for reading. Please share your feedback.
HAPPY CODING