I see the same effect on my machine. My guess is that, since the L3 cache is shared between cores, space used by processes running on other cores 'blurs' the performance at the boundary, because the amount of your data that actually fits varies from run to run.
I think it's this too. The L3 cache seems to be consistently harder to measure than the L1 or L2 caches. I suspect it's just noise from other cores, but I have no strong evidence to support that.
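One way to get some evidence either way is to time the same working set several times and compare the fastest run with the median. If other cores are the cause, the minimum should show a cleaner L3 boundary than the median. Below is a minimal sketch of that idea, not a proper benchmark: the function names, sizes, and repeat counts are all made up for illustration, and Python's interpreter overhead hides much of the cache effect, so a serious version would be written in C.

```python
import random
import statistics
import time
from array import array


def random_cycle(n, seed=0):
    """Build a next-index table that forms one cycle over all n slots.

    Uses Sattolo's algorithm, so a pointer chase touches every slot
    before repeating and the hardware prefetcher cannot guess the order.
    """
    rng = random.Random(seed)
    nxt = array("q", range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain for `steps` hops; return (final index, elapsed ns)."""
    i = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        i = nxt[i]
    return i, time.perf_counter_ns() - start


def measure(n_bytes, steps=200_000, repeats=7):
    """Return (min, median) nanoseconds per hop for one working-set size."""
    n = max(2, n_bytes // 8)  # array("q") stores 8-byte integers
    nxt = random_cycle(n)
    per_hop = []
    for _ in range(repeats):
        _, elapsed = chase(nxt, steps)
        per_hop.append(elapsed / steps)
    return min(per_hop), statistics.median(per_hop)


if __name__ == "__main__":
    # Hypothetical sizes; pick ones that straddle your own L2/L3 limits.
    for kib in (256, 1024, 4096, 16384):
        lo, med = measure(kib * 1024)
        print(f"{kib:6d} KiB  min {lo:7.1f} ns/hop  median {med:7.1f} ns/hop")
```

A large gap between the minimum and the median near the L3 size, but not near L1 or L2, would be consistent with interference from other cores. It would not prove it, since other sources of jitter can produce the same pattern.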