Would you be willing to answer what sort of linear algebra is required in the function field sieve that the talk claims is difficult to parallelize? I write open source parallel linear algebra software [1] and would be happy to contribute.
The matrices involved are generally modulo a prime (2 in factorization, a fairly large prime for the discrete logarithm), and very sparse. The latest factorization record, RSA-768, involved finding elements of the nullspace of a 192e6 x 192e6 matrix, with an average of 144 nonzeroes per row.
Interesting. That isn't excessively large for distributed-memory Krylov-subspace eigensolvers (e.g., SLEPc). Do you have a good reference for the current algorithms used? And how large is the nullspace typically?
EDIT: I assume this [1] is what you're referring to, which draws from the following algorithms of Coppersmith [2,3].
Yes, that is correct. Generally the block Lanczos variant used in practice is Montgomery's [1], though, not Coppersmith's.
Lanczos was popular until a few years ago, as it requires slightly less actual matrix-vector multiplications, but since Wiedemann can be partially distributed it is becoming more desirable.
[1] http://libelemental.org