Text will be an important input channel for texture, sound type, voice type and so on. You can't just use input audio, that defeats the point of generating something new. You can't also only use MIDI, it still needs to know what sits behind those notes, what performance, what instrument. So we need multiple channels.