Sub-Linear Time Algorithms
- OnoSureiya
- Apr 13, 2020
- 2 min read
Updated: Apr 14, 2020
As the amount and sheer size of data collected and available for analysis increases, simple algorithms become much more complicated. Managing and analyzing such large datasets forces us to reconsider the conventional efficient algorithms. Some of these datasets are too extensive to compute and process in linear time, therefore it is essential that we at least develop algorithms whose running times are not only polynomial but in fact, are sublinear in n.
One of the main features of SubLinear Algorithms is that they only process a fraction of the given input. In this article, we will take the case of the sublinear search algorithm. However, creating a sublinear algorithm is pretty difficult and hence "pseudo-sublinear-time" algorithms are used instead. In these algorithms, some pre-processing is required before processing from the model/algorithm. It is therefore referred to as a "pseudo-sublinear-time" algorithm.
Here is a link to the GitHub Repository holding implementation of below code: https://github.com/AmanPriyanshu/Sublinear-Algorithms/blob/master/sub_linear_search/sub_linear_search.py
Sub-Linear Search Algorithm:
Let us take the example of a simple search algorithm, we have to make sure the list is preprocessed, however. In this case the preprocessing consist of sorting the list in increasing order. Now let us begin with the understanding that, a normal linear search to check whether an element x exists in a input_list and return a boolean output.
Let us take the example of a simple linear search which has (Big O Notation) O(n), where n is the length of the input_list array. This can be implemented using the code:

After having developed the classical linear search, let us work on the sublinear search algorithm. So to make the time complexity O(√ n), we first take a sample S uniformly at random from the input_list. Next, we iterate through all the elements in S find two elements p and q, such that (i)p ≤ x < q, and (ii)there is no element in S that is between p and q. Therefore among S, they are the closest to x. Now that we have determined the values of p and q, we will once again iterate through the original list from the index of p to the index of q. During iteration, we check whether x is realised, if yes then we return boolean value True or else False.
Following is the implementation of the above-described code:

Now it is obvious that linear search will never give wrong output, whereas on the other hand based on selection S the sublinear_search may not give an accurate result. Following are some of the outputs we get when on different number of iterations, every value of x is to be present in the input_list in these examples, however, this will be used to check the accuracy and develop an understanding of the above code.
Iterations: 100

We can clearly see it has an accuracy of 98%
Iterations: 1000

Here, it has an accuracy of 92.8%
Iterations: 10000

It has an accuracy of 93.56%
Iterations: 100000

Here, we have an accuracy of 93.126%
Conclusion
From the above outputs, we can clearly see accuracy of 90%+ with a much lower Big-O Time Complexity. Linear search which has O(n) whereas sublinear search which has O(√n). As data size and complexity increase, the need and understanding of sublinear algorithms will continue to grow. I believe it will pave a different path in Data Analytics.
Comments