Setting up presentation, aids, preparation: 1 minute.
Audience: age 18-20, beginner to intermediate knowledge of algorithms
Operational goals:
•get people interested in the issue of working with slow internet connections
•explain my approach
•find out their opinions
- introductory talk
slide 1 – 15 seconds
Hello, my name is Dan Cojocaru and this presentation is about sorting methods in the context of lower-end hardware and software.
Slide 2 – 2 minutes 30 seconds
I am talking more generally about sorting methods, not sorting algorithms, because the algorithms themselves are more or less well known to everybody. However, the way we apply them to solve problems can differ.
As more and more of the world is getting connected to the internet, we as developers, programmers and computer scientists have a duty to provide it with quality services. We must remain aware of constraints that we ourselves do not necessarily face and work within them.
I have therefore decided to research what sorting methods could be used to optimise for the delay people might encounter while data is being transmitted to their devices.
slide 3 – 5 seconds
on slide 4: definition of online sorting algorithm – 20 seconds
An online sorting algorithm is one that can process its input piece by piece.
That is, such an algorithm does not have to wait for all the input data to arrive before it can start processing.
One such sorting algorithm is Insertion Sort!
on slide 5: explanation of insertion sort being an online algorithm – 50 seconds
In most common implementations of insertion sort, the single array is treated as two: an unsorted input part and a sorted result part. One item at a time is taken from the input part and inserted at the right place in the result part. At the end, only the sorted part remains.
As such, this algorithm processes one item at a time. That is an online algorithm!
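A minimal Python sketch of this idea (illustrative only, not the actual benchmark code; the function name is just for these notes): insertion sort can consume a stream item by item and keep the result sorted after every arrival.

def online_insertion_sort(stream):
    result = []
    for item in stream:  # each item is handled as soon as it arrives
        i = len(result)
        result.append(item)
        # shift larger elements one position to the right until 'item' fits
        while i > 0 and result[i - 1] > item:
            result[i] = result[i - 1]
            i -= 1
        result[i] = item
        # 'result' is fully sorted after every received item
    return result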
So, considering insertion sort is an online algorithm and it starts processing as soon as it starts receiving items, it should perform faster over slow internet connections, right?
Wrong.
slide 6 – 40 seconds
The results of my testing show that even at slower internet speeds the advantage is nearly non-existent for lists with few items, and insertion sort still performs slower than other, generally faster algorithms for large inputs.
When I first found this result, I was rather disappointed. But then I thought: there surely must be another way, right?
And there is!
slide 7 – 5 seconds
Chunking! Splitting the input into small chunks and handling them individually.
slide 8 – 45 seconds
I have chosen to implement chunking by sorting each individual chunk as it becomes available and then merging the chunks together using the same merge step used in merge sort. This allows any sorting algorithm to be used on the individual chunks, enabling optimisations based on the kind of input being sorted.
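A minimal Python sketch of this chunk-then-merge idea (illustrative only; the function names and the use of sorted() as the per-chunk sorter are assumptions, not the actual implementation):

def merge(a, b):
    # the standard merge step from merge sort: combine two already sorted lists
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def chunked_sort(chunks, sort_chunk=sorted):
    # 'chunks' yields lists of items as they become available over the connection;
    # 'sort_chunk' can be any sorting algorithm suited to the kind of input
    result = []
    for chunk in chunks:
        result = merge(result, sort_chunk(chunk))  # sort the new chunk, then merge it in
    return result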
There are probably different ways to chunk the data in order to handle it faster, and those are things that can be further explored.
What is certain is that chunking combined with a fast sorting algorithm yielded significant improvements in sorting performance.
Slide 9 – 1 minute 10 seconds
I came up with three different approaches to chunking.
The first approach is what I call „constant chunking”. We choose a constant number, and each time the container of unsorted items reaches that many elements, we take them as a chunk and process it.
The second approach I tried is „exponential chunking”. We choose a constant number and keep a target size, initially 1. Each time the container of unsorted items reaches the target size, we process that chunk and multiply the target size by the chosen constant.
The third approach is „equal chunking”. When the number of unsorted items equals the number of already sorted items, the unsorted items are processed as a chunk.
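A minimal Python sketch of the three policies (illustrative only; the names and signatures are mine, not the actual implementation). Each predicate answers the question: should the unsorted items received so far be processed as a chunk now?

def constant_chunking(unsorted_count, sorted_count, size=64):
    # constant chunking: process whenever 'size' unsorted items have accumulated
    return unsorted_count >= size

def exponential_chunking(unsorted_count, sorted_count, target):
    # exponential chunking: process whenever the current target is reached;
    # the caller then multiplies 'target' by the chosen constant (target starts at 1)
    return unsorted_count >= target

def equal_chunking(unsorted_count, sorted_count):
    # equal chunking: process when there are as many unsorted items as already sorted ones
    return unsorted_count >= max(1, sorted_count)

(The max(1, ...) simply makes the first chunk possible before anything has been sorted.)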
slide 10 – 1 minute
How about performance? I’m not sure.
Based on my tests, I am certain that the method of chunking the data into multiple smaller parts can significantly improve the performance of sorting.
However, I am uncertain about several things: which method of chunking to choose; whether there are methods better than the three I thought of; whether certain constants for some chunking methods yield better performance in some contexts than others; whether suboptimal chunking can hurt performance enough to negate the supposed benefit over a regular approach; and whether performance is severely affected when transfer speeds are high.
These questions are left for further research.
slide 11 – 5 seconds
Conclusions. Also known as: „Phew, this presentation is finally over...”
slide 12 – 35 seconds
Firstly, coming back to the start of the presentation, think about constraints that are not obvious at first. The statement that best summarises this idea is „Not everybody has fast internet”.
Secondly, in the traditional „divide et impera” (or divide and conquer) approach, dividing a problem into smaller chunks seems to give significant performance improvements.
And finally, if you’re not bored, if this presentation caught your attention and if you want to experiment further, you’re welcome to!
slide 13 – 30 seconds
You can contact me on my email address!
I will also post these slides, the notes I took to prepare for this presentation and the source code on my website. The source code will appear at a later time (probably after the exams), since it is not yet ready for a wide audience. I will do some cleanup and documentation, then most likely release it under an MIT license.
Soooooo… Questions?