What do you know about the Big-O notation, and can you give some examples with respect to different data structures?

The Big-O notation describes how well an algorithm scales or performs in the worst-case scenario as the number of elements in a data structure increases. Big-O notation can also be used to describe other behavior, such as memory consumption; at times you may need to choose a slower algorithm because it consumes less memory. Big-O notation gives a good indication of performance for large amounts of data, but the only way to know for sure is to run a performance benchmark with large data sets, which takes into account factors that Big-O ignores, such as paging as virtual memory usage grows. Although benchmarks are better, they are not feasible during the design process, so Big-O complexity analysis is the tool of choice at that stage.

The algorithms used by various data structures for operations like search, insert, and delete fall into performance groups such as constant time O(1), logarithmic O(log n), linear O(n), quadratic O(n^2), polynomial O(n^c), exponential O(c^n), and factorial O(n!), where n is the number of elements in the data structure. There is generally a trade-off between performance and memory usage. Here are some examples.

Example 1: Finding an element in a HashMap is usually constant time, which is O(1). This is constant time because a hashing function is used to locate the element, and computing the hash value does not depend on the number of elements in the HashMap.
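Here is a minimal sketch (the map contents are made up for illustration) showing a constant-time HashMap lookup:

import java.util.HashMap;
import java.util.Map;

public class HashMapLookupExample {
    public static void main(String[] args) {
        // Hypothetical map of country codes to country names
        Map<String, String> countries = new HashMap<>();
        countries.put("AU", "Australia");
        countries.put("IN", "India");
        countries.put("US", "United States");

        // get(...) hashes the key and jumps straight to the right bucket,
        // so the cost does not grow with the number of entries: O(1) on average
        String name = countries.get("AU");
        System.out.println(name); // Australia
    }
}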

Example 2: Linear search of an array, a list, or a LinkedList is linear, which is O(n). This is linear because, in the worst case, you have to search the entire list. This means that if a list is twice as big, searching it will take roughly twice as long.
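A minimal sketch (with made-up names) of a linear search over an array:

public class LinearSearchExample {
    // Scans the array from the front until a match is found.
    // In the worst case every element is examined, so the cost grows linearly with n: O(n).
    static int linearSearch(String[] items, String target) {
        for (int i = 0; i < items.length; i++) {
            if (items[i].equals(target)) {
                return i;
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        String[] names = {"Peter", "John", "Sam"};
        System.out.println(linearSearch(names, "Sam"));  // 2
        System.out.println(linearSearch(names, "Jane")); // -1
    }
}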

Example 3: An algorithm that needs to compare every element in an array against the others to sort the array has quadratic (polynomial) complexity, which is O(n^2). A nested for loop is typically O(n^2). An example is shown under sorting algorithms.
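Below is a minimal nested-loop sketch (a simple bubble sort with made-up numbers) to illustrate the O(n^2) shape; fuller examples appear under sorting algorithms:

public class BubbleSortExample {
    // Classic bubble sort: the outer and inner loops each run up to about n times,
    // so roughly n * n comparisons are made in the worst case: O(n^2).
    static void bubbleSort(int[] numbers) {
        for (int i = 0; i < numbers.length - 1; i++) {
            for (int j = 0; j < numbers.length - 1 - i; j++) {
                if (numbers[j] > numbers[j + 1]) {
                    int temp = numbers[j];       // swap adjacent elements
                    numbers[j] = numbers[j + 1];
                    numbers[j + 1] = temp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] numbers = {5, 1, 4, 2, 3};
        bubbleSort(numbers);
        System.out.println(java.util.Arrays.toString(numbers)); // [1, 2, 3, 4, 5]
    }
}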

Example 4: Binary search of a sorted array or ArrayList is logarithmic, which is O(log n). Searching for an element in a LinkedList normally requires O(n). This is one of the disadvantages of a LinkedList compared to data structures like a sorted ArrayList or array, where binary search offers O(log n) performance, which is better than O(n) as the number of elements increases. A logarithmic running time means that if 10 items take at most x amount of time, 100 items will take at most about 2x, and 10,000 items at most about 4x. If you plot this on a graph, the time still grows as n (i.e. the number of items) increases, but the curve flattens out, because each doubling of n adds only a constant amount of extra work.
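A minimal sketch using the standard Arrays.binarySearch method on a made-up sorted array:

import java.util.Arrays;

public class BinarySearchExample {
    public static void main(String[] args) {
        // The array must already be sorted for binary search to work
        int[] sorted = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};

        // Arrays.binarySearch halves the remaining range on every comparison,
        // so at most about log2(n) comparisons are needed: O(log n)
        System.out.println(Arrays.binarySearch(sorted, 23));     // 5

        // A negative return value means the key was not found
        System.out.println(Arrays.binarySearch(sorted, 40) < 0); // true
    }
}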

What can you tell about the performance of a HashMap compared to a TreeMap? Which one would you prefer?

A balanced tree has O(log n) performance for its basic operations. The TreeMap class in Java maintains key/value pairs in sorted order by using a red-black tree, which is a balanced binary search tree. Keeping the tree balanced ensures fast insertion, removal, and look-up times of O(log n). This is not as fast as a HashMap, which is O(1), but a TreeMap has the advantage that the keys are kept in sorted order, which opens up a lot of other capabilities.
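A minimal sketch (the employee numbers and names are made up for illustration) contrasting the two maps:

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class TreeMapVsHashMapExample {
    public static void main(String[] args) {
        Map<Integer, String> hashMap = new HashMap<>();
        TreeMap<Integer, String> treeMap = new TreeMap<>();

        hashMap.put(102, "Sam");  hashMap.put(100, "Peter");  hashMap.put(101, "John");
        treeMap.put(102, "Sam");  treeMap.put(100, "Peter");  treeMap.put(101, "John");

        // HashMap: O(1) average look-up, but the iteration order is not guaranteed
        System.out.println(hashMap.keySet());

        // TreeMap: O(log n) look-up, but the keys are kept in sorted order,
        // which also enables firstKey/lastKey and range views such as headMap/tailMap
        System.out.println(treeMap.keySet());              // [100, 101, 102]
        System.out.println(treeMap.firstKey());            // 100
        System.out.println(treeMap.headMap(102).keySet()); // [100, 101]
    }
}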

 

Which one to choose?

The decision between an unordered collection like a HashSet or HashMap and a sorted data structure like a TreeSet or TreeMap depends mainly on the usage pattern, and to some extent on the data size and the environment you run it on. The practical reason for keeping the elements in sorted order is frequent and fast retrieval of sorted data, even when inserts and updates are frequent. If the need for a sorted result is infrequent, for example just prior to producing a report or running a batch process, then maintaining an unordered collection and sorting it only when really required with Collections.sort(…) can be more efficient than constantly maintaining the ordered elements.

This is only an opinion, and no one can offer you a single correct answer. Even complexity measures like Big-O notation assume possibly large values of n. In practice, an O(n) algorithm can be much faster than an O(log n) algorithm, provided the data set being handled is sufficiently small. One algorithm might perform better on an AMD processor than on an Intel one. If your system is set up to swap, disk performance also needs to be considered. The only way to confirm efficient usage is to test and measure both performance and memory usage with realistic data sizes. Measure both approaches on your chosen hardware to determine which is more appropriate.
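A minimal sketch (with made-up names) contrasting sorting on demand with Collections.sort against keeping the elements permanently sorted in a TreeSet; which approach wins depends on the usage pattern and data size, so measure both:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class SortOnDemandExample {
    public static void main(String[] args) {
        // Approach 1: keep an unordered collection and sort only when a sorted
        // view is actually needed, e.g. just before producing a report
        List<String> names = new ArrayList<>();
        names.add("Sam");
        names.add("Peter");
        names.add("John");
        Collections.sort(names);         // one-off O(n log n) sort
        System.out.println(names);       // [John, Peter, Sam]

        // Approach 2: keep the elements sorted at all times in a TreeSet,
        // paying O(log n) on every insert so reads always see sorted order
        TreeSet<String> sortedNames = new TreeSet<>();
        sortedNames.add("Sam");
        sortedNames.add("Peter");
        sortedNames.add("John");
        System.out.println(sortedNames); // [John, Peter, Sam]
    }
}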