Large Audio LMs for Generation .pdf
Task 2 Dec:
Model | emergent properties | free-form instruction | controllability | architecture | additional conditioning |
---|---|---|---|---|---|
Fugatto | + | + | Compositional Classifier-Free Guidance | Transformer + Flow Matching | Text only |
AudioBox | + | + | Classifier-Free Guidance | Transformer infilling + Flow Matching | Text and audio |
SpiritLM | - | - | - | Transformer + LM | Audio |
UniAudio | | | | Transformer + LM | |
VoiceLDM | | | | | |
The poster: