You will never get it in a camera the size of a cell-phone camera. This technology relies on the fact that each patch of the lens sees a different image of the subject; it's like stereo, but with a single lens, and many views rather than two. It fundamentally has to have a large-diameter lens.
"You will never get it this small/big/fast/powerful/affordable" is what has been said about a lot of industries and technologies in the past.
Question is, will Lytro be able to do it? Will they have the money and perseverance? With their financials they are not likely to move fast enough on this; a 5 MP equivalent in 2014 is not very impressive. I shelved my Lytro v1 after a couple of days because of the poor image quality, not because the 'gimmick' disappointed me.
Pelican Imaging will be bringing out similar technology in cell phones this year or next. They're using a camera array rather than a single sensor + MLA.
I am not sure what you mean by each lens, since there is only one lens with this technology. The diameter of the lens has to be comparable to the separation between the two lenses in a stereo system. For meaningful stereo effect at distances we would find interesting (say, a few feet between you and your toddler), this has to be maybe an inch; I don't see a smaller lens being very interesting.
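To put rough numbers on the stereo comparison, here is a minimal sketch of the parallax angle a subject sees across the full lens aperture. The function name and the phone lens figures (4 mm focal length, f/2.4) are my own illustrative assumptions, not measurements of any particular camera.

```python
import math

def parallax_deg(baseline_mm, distance_mm):
    """Angle (degrees) subtended at the subject by two viewpoints
    separated by baseline_mm, with the subject distance_mm away."""
    return math.degrees(2 * math.atan(baseline_mm / (2 * distance_mm)))

subject_mm = 3 * 304.8            # a subject 3 feet away

# One-inch aperture, as suggested above for a useful stereo effect.
large = parallax_deg(25.4, subject_mm)

# Assumed phone lens: 4 mm focal length at f/2.4, so the
# aperture diameter is focal length / f-number.
phone_aperture_mm = 4.0 / 2.4
small = parallax_deg(phone_aperture_mm, subject_mm)

print(f"1-inch lens: {large:.2f} deg, phone lens: {small:.2f} deg")
```

Under those assumptions the one-inch lens gives roughly 1.6 degrees of parallax and the phone lens roughly 0.1 degrees, about fifteen times less.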
For a typical cell phone, the hyperfocal distance - beyond which everything is in focus with the lens focused at infinity - is maybe 6 feet; you can't get any light field information for anything farther than that. And there will be only a tiny bit of information for closer subjects; you can't take shallow depth-of-field photos with a cell phone, and you can't apply this technology for the same reason. BTW, motion blur is likely a bigger problem for cell phone photos than focus.
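The "maybe 6 feet" figure can be sanity-checked with the standard hyperfocal formula H = f^2 / (N * c) + f. This is only a sketch: the focal length, f-number, and circle of confusion below are assumed, illustrative values for a small phone sensor, not the specs of any real phone.

```python
def hyperfocal_mm(focal_mm, f_number, coc_mm):
    """Hyperfocal distance H = f^2 / (N * c) + f, all lengths in mm."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

# Assumed phone-camera values (hypothetical, for illustration only):
# 4 mm focal length, f/2.4, 0.004 mm circle of confusion.
phone_h_mm = hyperfocal_mm(focal_mm=4.0, f_number=2.4, coc_mm=0.004)

print(f"hyperfocal distance: {phone_h_mm / 304.8:.1f} ft")
```

With these assumed numbers the result is about 5.5 feet, in line with the estimate above; a larger assumed circle of confusion would bring the distance in closer still.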