par-bin-sort

Parallel Binary Sorter [RESULTS]

  

The tables below show the results. Sorting 128 million keys on 2 processors took 15 min 0.42 sec of wall-clock time, and for that data set the time keeps dropping as processors are added. For the 64 million and 32 million key runs the wall-clock time falls up to 32 processors and then rises again at 64. I also report the user time (time spent by the processors scanning the keys and putting them into buckets) in the USER column and the communication cost in the COMM column. The column after those two is their sum. All three are reported in ticks (on SGI, 1 tick = 1 ms). For example, at N=2 (N is the number of processors) COMM = 5560, which is 5.560 sec.
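The unit conversion above can be sketched in a few lines of Python. This is only an illustration: the helper names (ticks_to_seconds, wall_to_seconds) are made up here and are not part of the sorter, and the 1 tick = 1 ms factor is taken from the text rather than measured.

```python
TICK_MS = 1  # assumption taken from the text: on SGI, 1 tick = 1 ms


def ticks_to_seconds(ticks, tick_ms=TICK_MS):
    """Convert a tick count from the USER/COMM columns to seconds."""
    return ticks * tick_ms / 1000.0


def wall_to_seconds(wall):
    """Convert a wall-clock string of the form 'M:SS.ss' to seconds."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


# Row N=2 of the 128 million key table.
user, comm, wall = 770922, 5560, "15:00.42"

print("COMM seconds:       ", ticks_to_seconds(comm))
print("USER + COMM seconds:", ticks_to_seconds(user + comm))
print("wall-clock seconds: ", wall_to_seconds(wall))
```

For this row the tick total comes to about 776 sec against roughly 900 sec of wall-clock time, so the two measures are related but not identical.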

=======================================
KEYS=134217728 (128 million), each 16-bytes
---------------------------------------
   N   USER COMM USER.COMM WALL.CLOCK
1  2 770922 5560    776482   15:00.42
2  4 471196 4272    475468    8:29.62
3  8 314281 3456    317737    5:17.65
4 16 135562 2148    137710    2:21.83
5 32  82414  950     83364    1:28.87
6 64  75160  559     75719    1:21.54
---------------------------------------
========================================
KEYS=67108864 (64 million), each 16-bytes
----------------------------------------
   N   USER COMM USER.COMM WALL.CLOCK
1  2 340653 2348    343001    6:18.77
2  4 240861 1931    242792    4:14.57
3  8 153782 1781    155563    2:39.18
4 16  64408  804     65212    1:10.91
5 32  46659  486     47145    0:53.32
6 64  61618  369     61987    1:08.19
----------------------------------------
====================================
KEYS=33554432 (32 million), each 16-bytes
------------------------------------
   N   USER COMM USER.COMM WALL.CLOCK
1  2 148070 1219    149289    2:42.66
2  4  99067  677     99744    1:48.60
3  8  47319  322     47641    0:55.41
4 16  17936  135     18071    0:25.64
5 32   9973  191     10164    0:17.55
6 64  10330   58     10388    0:28.04
-----------------------------------
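
One way to read the wall-clock columns is as speedup relative to the smallest run. The sketch below does this for the 128 million key table. Since no single-processor run is reported, it uses the 2-processor run as the baseline, which is an assumption of this example. The function names are hypothetical and the efficiency figure is simply speedup divided by the increase in processor count.

```python
def wall_to_seconds(wall):
    """Convert a wall-clock string of the form 'M:SS.ss' to seconds."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


# (N, wall-clock) pairs copied from the 128 million key table.
runs_128m = [
    (2, "15:00.42"), (4, "8:29.62"), (8, "5:17.65"),
    (16, "2:21.83"), (32, "1:28.87"), (64, "1:21.54"),
]


def relative_speedup(runs):
    """Return (N, speedup, efficiency) relative to the first run in the list."""
    base_n, base_wall = runs[0]
    base_seconds = wall_to_seconds(base_wall)
    rows = []
    for n, wall in runs:
        speedup = base_seconds / wall_to_seconds(wall)
        efficiency = speedup / (n / base_n)  # 1.0 would be perfect scaling
        rows.append((n, speedup, efficiency))
    return rows


for n, speedup, efficiency in relative_speedup(runs_128m):
    print(f"N={n:2d}  speedup={speedup:5.2f}  efficiency={efficiency:4.2f}")
```

The same function can be applied to the other two tables by swapping in their N and wall-clock values.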