In practice, I find that if the sorting process is too slow, you should start thinking about different ways to attack the problem. Maintaining a total global order tends to only get more expensive over time as the scope of your idea/product/business expands. The computational complexity of the sort algorithm becomes irrelevant once memory utilization is the bottleneck.
This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is far more scalable than scanning and ordering a global list each iteration. You can maintain something like an Elo rating with very narrow views into memory. Nothing needs a global view, yet you get global effects.
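A minimal Rust sketch of that idea (population size, K-factor, and the rand 0.8-style API are just illustrative assumptions, not from the article):

    use rand::Rng;

    const K: f64 = 32.0; // conventional Elo K-factor, picked arbitrarily here

    fn expected(ra: f64, rb: f64) -> f64 {
        1.0 / (1.0 + 10f64.powf((rb - ra) / 400.0))
    }

    fn update(ratings: &mut [f64], a: usize, b: usize, a_won: bool) {
        let ea = expected(ratings[a], ratings[b]);
        let eb = expected(ratings[b], ratings[a]);
        let (sa, sb) = if a_won { (1.0, 0.0) } else { (0.0, 1.0) };
        ratings[a] += K * (sa - ea);
        ratings[b] += K * (sb - eb);
    }

    fn main() {
        let mut rng = rand::thread_rng();
        // Hidden "fitness" stands in for whatever actually decides a match in your domain.
        let fitness: Vec<f64> = (0..10_000).map(|_| rng.gen::<f64>()).collect();
        let mut ratings = vec![1000.0_f64; fitness.len()];

        // Each iteration touches only two entries: no global scan, no global sort.
        for _ in 0..1_000_000 {
            let a = rng.gen_range(0..fitness.len());
            let b = rng.gen_range(0..fitness.len());
            if a != b {
                update(&mut ratings, a, b, fitness[a] > fitness[b]);
            }
        }
        // High-fitness entries end up with high ratings even though nothing
        // ever looked at more than two elements at a time.
    }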
spiffytech 2 hours ago [-]
Could you give an example of reframing a problem from totally ordered complete data to a sampled tournament? I can imagine cases amenable to sampling, but since sampled data is smaller I'm not sure why I wouldn't just sort it.
Epa095 5 hours ago [-]
Double jaw-drop. First when the (dynamic) match was slower than the hash map, second when sort_unstable was faster than the hash map!
Cool article. It's clear that all my theoretical algorithm knowledge comes up short when faced with real CPUs.
conradludgate 5 hours ago [-]
The effort put into developing better sorting algorithms like driftsort/ipnsort and better hash functions like foldhash makes my life as a developer so much simpler. No matter how clever I try to be, most often just using a foldhash HashMap or sort_unstable is the fastest option.
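For example (a rough sketch, not the article's benchmark; I'm assuming foldhash's HashMap/HashMapExt re-exports, i.e. std's HashMap with foldhash's hasher), counting distinct values both ways:

    use foldhash::{HashMap, HashMapExt};

    // Count distinct values with a hash map.
    fn distinct_hashmap(data: &[u32]) -> usize {
        let mut counts: HashMap<u32, u32> = HashMap::new();
        for &x in data {
            *counts.entry(x).or_insert(0) += 1;
        }
        counts.len()
    }

    // Count distinct values by sorting; sort_unstable is ipnsort in recent std.
    fn distinct_sorted(data: &mut [u32]) -> usize {
        if data.is_empty() {
            return 0;
        }
        data.sort_unstable();
        1 + data.windows(2).filter(|w| w[0] != w[1]).count()
    }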
codegladiator 2 hours ago [-]
Good read. Reminds me of the 1 billion row aggregation challenge (https://github.com/gunnarmorling/1brc), especially the perfect hashing part; all the top entries used it.
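A toy illustration of the perfect-hashing trick (made-up keys, nothing like the actual 1brc entries): when the key set is fixed and known up front, you can pick a hash with zero collisions for exactly those keys and skip probing entirely.

    // First byte mod 7 happens to be collision-free for this particular key set.
    const KEYS: [&str; 4] = ["apple", "banana", "cherry", "kiwi"];

    fn perfect_slot(key: &str) -> usize {
        (key.as_bytes()[0] as usize) % 7
    }

    fn main() {
        let mut table = [0u64; 7];
        for (i, k) in KEYS.iter().enumerate() {
            table[perfect_slot(k)] = i as u64 + 1; // store something per key; 0 means empty
        }
        // Every key landed in its own slot, so lookups never collide.
        assert_eq!(table.iter().filter(|&&v| v != 0).count(), KEYS.len());
    }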
Efficiency, not effectiveness. They are all effective in the sense that they produce correctly sorted results, even the non-modern sort algorithms. This should be about the efficiency with which they do it, right?
JSR_FDED 1 hour ago [-]
From TFA:
The title is an homage to Eugene Wigner's 1960 paper "The Unreasonable Effectiveness of Mathematics in the Natural Sciences".
aabhay 4 hours ago [-]
Agreed. Effectiveness would imply that some algorithms are more likely to sort the list correctly than others, or that they sort a higher percentage of elements. Efficiency is about factors external to correctness.
creata 2 hours ago [-]
"The Unreasonable Effectiveness of Mathematics in the Natural Sciences" is one of those titles that gets imitated a lot for some reason. Maybe even more than "Goto Considered Harmful".
kwertyoowiyop 39 minutes ago [-]
Coming next: “What we talk about when we talk about modern sort algorithms”
dvh 3 hours ago [-]
Isn't this just another case of premature optimization? Shouldn't you be adjusting sorting algorithms only when a customer complains?
grues-dinner 1 hour ago [-]
I think the article basically had this conclusion. Think twice before optimising here, because you may be able to squeeze something out for a very limited scenario, but it can have ugly failure modes and end up being slower in some cases. Plus it takes time and effort. And modern standard sorts are "unreasonably" fast anyway for many practical purposes.
Then again, only fixing things when a customer complains is a way to end up with a leaning tower of hacks that eventually ossifies. And the customer (or rather the users, who may not be the customer, especially in business software) may be putting up with dozens of niggles and annoyances before they bother to report even one bug they can't work around.
codegladiator 2 hours ago [-]
This is pushing the limits to identify the boundaries.
dvh 2 hours ago [-]
Also known as premature optimization. You literally had to invent a new dataset just to show there is a difference. You are inventing problems, stop doing that!
dspillett 1 hours ago [-]
> You are inventing problems
Sometimes that is how useful jumps are made. Maybe someone will come along with a problem whose data just happens to have similar properties.
Rather than premature optimisation, this sort of thing is pre-emptive research: better to do it now than when you hit a performance problem and need the solution PDQ. Many useful things have come out of what started as “I wonder what if …?” playing.