Tuesday, 6 December 2016

Why ConcurrentHashMap does not allow null keys and null values ?

We all know that ConcurrentHashMap (optimized version of  Hashtable) doesn't allow null key or null values. To know the reason behind this implementation, we need to know the internal implementation and its uses.

I am separating the above explanation in two steps.

  1. not allowing null key
  2. not allowing null values
1) Not allowing null key
The reason behind this implementation is due to ambiguity nature of the ConcurrentMap (ConcurrentHashMap & ConcurrentSkipListMaps) when it comes to threading environment. The main reason is that if map.get(key) returns null, we can't detect whether the key explicitly maps to null vs the key isn't mapped. In ConcurrentHashMap, It might be possible that key 'K' might be deleted in between the get(k) and containsKey(k) calls.

For example in below code of ConcurrentHashMap
Line 1: if (concurrentHashMap.containsKey(k)) {     
Line 2:      return concurrentHashMap.get(k);  
Line 3: } else {
Line 4:      throw new KeyNotPresentException();
Line 5: }

The above example i am  explaining with two thread(1 & 2). In Line 2, after checking thread 1 with the key "K" in concurrentHashMap(Line 1 - contains some value for the key "K"),  it will returns null (if thread 2 removed the key 'K' from concurrentHashMap). Instead of throwing KeyNotPresentException , it actually returns null.

The above scenario will not happen in case of non-concurrent map (HashMap), because there is no Concurrent access. So null key are allowed in HashMap by eliminating the ambiguity.

2) Not allowing null values
And also null values are not allowed in the ConcurrentMap (ConcurrentHashMap & ConcurrentSkipListMaps). If you look at the below code in ConcurrentHashMap.

Line 1. public V put(K key, V value) {
Line 2.   if (value == null)
Line 3.     throw new NullPointerException();
Line 4.   int hash = hash(key.hashCode());
Line 5.   return segmentFor(hash).put(key, hash, value, false);
Line 6. }

In line 2 & 3, throws NullPointerException by checking the value, this means, passed value could be used through the ConcurrentHashMap without expecting it to be null. I checked the usage of value in ConcurrentHashMap, there is no evidence or piece of code throwing NullPointerException.  Then i started investigating the internal implementation.

So wrote a simple program with modified ConcurrentHashMap by removing those Line 2 and Line 3 in the above code and created two thread (1 and 2), one take cares of inserting null values and other with reading it. Surprisingly i did not get any error. So i decided to check performance of get and put operation by measuring the time of both modified and unmodified version of ConcurrentHashMap. The get method of modified ConcurrentHashMap takes 5-8 times more time than unmodified one and put method of modified ConcurrentHashMap takes 3-5 times more than unmodified one.

The Reason is segment of ConcurrentHashMap takes a lock if the value of HashEntry becomes null.

GET METHOD
1 get(Object key, int hash) {
2.    if (count != 0) { 
3.      HashEntry e = getFirst(hash);
4.      while (e != null) {
5.        if (e.hash == hash && key.equals(e.key)) {
6.          V v = e.value;
7.            if (v != null)
8.              return v;
9.          return readValueUnderLock(e); //  takes lock if the value is null
10.       }
11.       e = e.next;
12.     }

PUT METHOD
1.  V put(K key, int hash, V value, boolean onlyIfAbsent) {
2.    lock();
3.    try {
4.      int c = count;
5.      if (c++ > threshold) 
6.        rehash();
7.      HashEntry[] tab = table;
8.      int index = hash & (tab.length - 1);
9.      HashEntry first = tab[index];
10.     HashEntry e = first;
11.     while (e != null && (e.hash != hash || !key.equals(e.key)))
12.       e = e.next;
13.     V oldValue;
14.     if (e != null) {
15.       oldValue = e.value;
16.       if (!onlyIfAbsent)
17.         e.value = value;
18.       } else {
19.         oldValue = null;
20.         ++modCount;
21.         tab[index] = new HashEntry(key, hash, first, value);  // New value for HashEntry
22.         count = c;
23.       }
24.     return oldValue;
25.   } finally {
26.   unlock();
27. }
28.  }

Now the questions is, if ConcurrentHashMap  is not allowing null value, then why the null check is implemented for HashEntry value in Line 7 of get method.  Reason is, consider a situation that writer thread is trying to create a new HashEntry in Line 21 of put method but not yet initialized, so actual value is not reflecting in value attribute of HashEntry, in the same time reader thread reads it value as null, so this recheck is done in Line 7 in the get method. 

And extra check in the Line 9 in get method is very costly in terms of performance and it is avoided 
for not null value. so modified ConcurrentHashMap takes long time to read and write null values than unmodified ConcurrentHashMap with non-null values.




If you like the above solution . Please share this blog

No comments:

Post a Comment

s