I think you may have misunderstood how the algorithm works. N is simply the number of items remaining in the working set, so over the course of the algorithm N goes from n to 0.
You might be thinking of when √n - 1 = 16, (i.e. the maximum size of the initial length the pivot is picked from is 16). In that case, n = 17*17, or 289.
Upon first glance it seems that it might perform really well with N = cache_line_size / sizeof(T) so perhaps N = 16 for sorting ints?
What size array would be optimal for N = 16?