Large Audio LMs for Generation.pdf
Task 2 Dec:
| Model | Emergent properties | Free-form instructions | Controllability | Architecture | Additional conditioning |
|---|---|---|---|---|---|
| Fugatto | + | + | Compositional Classifier-Free Guidance | Transformer + Flow Matching | Text only |
| AudioBox | + | + | Classifier-Free Guidance | Transformer infilling + Flow Matching | Text and audio |
| SpiritLM | - | - | - | Transformer + LM | Audio |
| UniAudio | | | | Transformer + LM | |
| VoiceLDM | | | | | |
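The guidance entries in the table can be illustrated with a minimal sketch. It assumes the classifier-free form of guidance commonly paired with flow-matching models: the guided velocity is the unconditional prediction plus a weighted difference toward the conditional one. The compositional variant simply adds one weighted term per condition. `toy_velocity`, `TARGETS`, and the weights are hypothetical stand-ins for a trained network; this is not Fugatto's or AudioBox's actual code.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]
# Hypothetical model signature: v(x, t, cond) -> velocity; cond=None means unconditional.
VelocityFn = Callable[[Vector, float, Optional[str]], Vector]

# Hypothetical per-condition targets for the toy velocity field below.
TARGETS: Dict[Optional[str], Vector] = {None: [0.0, 0.0], "a": [1.0, 0.0], "b": [0.0, 1.0]}


def toy_velocity(x: Vector, t: float, cond: Optional[str]) -> Vector:
    # Toy stand-in for a trained network: pulls x toward a per-condition target (ignores t).
    return [g - xi for g, xi in zip(TARGETS[cond], x)]


def cfg_velocity(v: VelocityFn, x: Vector, t: float, cond: str, w: float) -> Vector:
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    vu = v(x, t, None)
    vc = v(x, t, cond)
    return [u + w * (c - u) for u, c in zip(vu, vc)]


def composed_velocity(v: VelocityFn, x: Vector, t: float,
                      conds: Sequence[str], weights: Sequence[float]) -> Vector:
    """Compositional guidance: v_uncond + sum_i w_i * (v_cond_i - v_uncond)."""
    vu = v(x, t, None)
    out = list(vu)
    for cond, w in zip(conds, weights):
        vc = v(x, t, cond)
        out = [o + w * (c - u) for o, c, u in zip(out, vc, vu)]
    return out


def euler_sample(v: VelocityFn, x0: Vector, conds: Sequence[str],
                 weights: Sequence[float], steps: int = 10) -> Vector:
    """Integrate dx/dt = guided velocity from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        vel = composed_velocity(v, x, i * dt, conds, weights)
        x = [xi + dt * vi for xi, vi in zip(x, vel)]
    return x


# Usage: combine conditions "a" and "b" with equal weight.
x = euler_sample(toy_velocity, [0.0, 0.0], ["a", "b"], [1.0, 1.0])
print(x)  # both coordinates ~ 1 - 0.9**10 ~ 0.651, moving toward the combined target [1, 1]
```

Composing K conditions this way reuses a single unconditional pass, so each sampling step costs K + 1 model evaluations; with one condition it reduces to plain classifier-free guidance.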
The poster: