Tuesday, August 7, 2012

Comparison of Java Collections - Performance and other parameters

 I assume readers have basic knowledge about data general purpose data structures and Java Collections framework to under stand this article. 

Java Collections framework was designed and implemented using general purpose data structures to cater to different kinds of users. We do not need to write our own hash table or linked list to handle huge data items.

Basically, Java collections classes (Set/Map) falls into three core data structures such as Hash Table, Red black Tree and Linked List.

Why Hash Table ?.  
- It is based on arrays
- Uses hashing technique which offers very fast insertion and   searching.
      - If we don’t need to visit items in order, and we can predict in advance the size of our data, 
hash tables are unparalleled in speed and convenience.

Why Red Black Tree ?         

        - It is a balanced tree,  guarantees O(logN) performance whether the input data is ordered or not.
        - Advantage in a red-black tree is that sorted data doesn’t lead to slow O(N) performance.

Why LinkedList ?        

        - Used whenever the amount of data to be stored cannot be predicted in advance or when data will frequently be inserted and deleted.

1) Comparison of HashMap / HashSet / TreeSet / LinkedHashMap:

HashMap/HashSet : Underlying data structure is HashTable.  As the you see, name Hash* - indicates Hash Table data structure.
TreeMap/TreeSet : Underlying data structure is Red black Tree.
LinkedHashMap/LinkedHashSet : Underlying data structure is doubly Linked List and Hash Table.

HashMap/HashSet/LinkedHashMap/LinkedHashSet - all of them have constant time performance for retrieval,insertion and deletion operations. But LinkedHashMap/LinkedHashSet performance is bit lesser than HashTable since it needs to maintain additional linked list.

Except red-black tree (which is non-linear), all the data structures used here are linear type.

Red black Tree
Linked List + HashTable
Retrieval (get)
O(1) - HashMap/HashSetO(log N) -TreeMap/TreeSetO(1) - LinkedHashMap.
Insertion (put)
O(1) - HashMap/HashSetO(log N) - TreeMap/TreeSetO(1) - LinkedHashMap
O(1) - HashMap/HashSetO(log N) - TreeMap/TreeSetO(1) - LinkedHashMap
Parameters affecting performance
Initial Capacity and Load Factor
Initial Capacity and Load Factor
Iteration Order
Does not guarantee the iteration orderSorted according to the natural order
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Insertion order is not affected if a key is re-inserted into the map.

O(1) - constant time : Better performance.
O(n) - linear time : Worst performance for the large data.
O(log N) - logarithmic time : Average Performance.

2) Comparison ArrayList and LinkedList 

Array list allows random access , so we can grab any element in constant time whereas linked list allows sequential access of elements. You can walk front/back but grabbing element in the middle is proportional to the size of the list.

ArrayList is useful when
- we need random access of elements.
- we don’t need to remove elements inbetween.
- adding elements at amortized time

LinkedList is useful when
- we prefer iteration over the the random access
 - we need efficient removal in between the elements or at start.
Base Data Structure
Dynamically resizing array
Doubly linked list
Retrieval (get)
Insertion (add)
O(1) - on average
O(n) - on worst case
O(1) - Iterator.remove
Mostly useful for
Efficient Retrieval - (by random access)
Efficient Removal
Is NULL permitted
Iterator and ListIterator which are fail-fast*
Iterator and ListIterator which are fail fast*

* Fail-fast :  If the list is structurally modified at any time after the iterator is created, in any way except through the Iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException.