Framework of the poster:

Large Audio LMs for Generation.pdf


Schedule:

Task 2 Dec:

Papers:

Papers to review:

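Model    | Emergent properties | Free-form instructions | Controllability                        | Architecture                            | Conditioning inputs
---------|---------------------|------------------------|----------------------------------------|-----------------------------------------|--------------------
Fugatto  | +                   | +                      | Compositional classifier-free guidance | Transformer + Flow Matching             | Text only
AudioBox | +                   | +                      | Classifier-free guidance               | Transformer + Flow Matching (infilling) | Text and audio
SpiritLM | -                   | -                      | -                                      | Transformer + LM                        | Audio
UniAudio |                     |                        |                                        | Transformer + LM                        |
VoiceLDM |                     |                        |                                        |                                         |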
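Fugatto and AudioBox both pair a Transformer with flow matching and steer generation with guidance. The sketch below shows where that guidance enters sampling, assuming the usual flow-matching convention (noise at t=0, sample at t=1, Euler integration of a learned velocity field) and one common way of composing guidance: the unconditional velocity plus a weighted sum of per-condition differences. With a single condition this reduces to ordinary classifier-free guidance. The composition rule is an assumption, not necessarily Fugatto's exact formulation, and `toy_model` is a hypothetical stand-in for a trained network.

```python
from typing import Callable, Optional, Sequence

Vector = list[float]
# Hypothetical interface of a trained velocity network: v(x, t, condition).
# condition=None requests the unconditional prediction (condition dropped).
VelocityModel = Callable[[Vector, float, Optional[Vector]], Vector]


def guided_velocity(model: VelocityModel, x: Vector, t: float,
                    conditions: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """v = v_uncond + sum_i w_i * (v_cond_i - v_uncond) (assumed composition rule).

    With one condition this is standard classifier-free guidance.
    """
    v_uncond = model(x, t, None)
    v = list(v_uncond)
    for cond, w in zip(conditions, weights):
        v_cond = model(x, t, cond)
        for k in range(len(v)):
            v[k] += w * (v_cond[k] - v_uncond[k])
    return v


def sample_euler(model: VelocityModel, noise: Vector, conditions: Sequence[Vector],
                 weights: Sequence[float], steps: int = 50) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (sample) with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = guided_velocity(model, x, i * dt, conditions, weights)
        x = [xk + dt * vk for xk, vk in zip(x, v)]
    return x


def toy_model(x: Vector, t: float, condition: Optional[Vector]) -> Vector:
    """Hypothetical stand-in for a trained network.

    Returns the straight-line velocity (target - x) / (1 - t) toward the
    condition vector, or toward the origin when unconditional.
    """
    target = condition if condition is not None else [0.0] * len(x)
    return [(tk - xk) / (1.0 - t) for tk, xk in zip(target, x)]


if __name__ == "__main__":
    noise = [0.3, -1.2]
    speech, reverb = [1.0, 0.0], [0.0, 2.0]  # hypothetical condition embeddings
    print(sample_euler(toy_model, noise, [speech], [1.5]))               # CFG, w = 1.5
    print(sample_euler(toy_model, noise, [speech, reverb], [1.0, 0.5]))  # composition
```

Because the toy velocities point in straight lines, the result can be checked by hand: the sample lands, up to floating-point error, on the weighted sum of the condition targets, so the two demo calls end near [1.5, 0.0] and [1.0, 1.0]. With a real network, every guidance term costs one extra forward pass per sampling step, which is the main inference overhead of guidance.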

The poster:

  1. Immersive understanding
  2. Architectural strategies: how to add audio, speech, and text (LLM)
  3. Data preparation strategies
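For item 2, the "Transformer + LM" rows (SpiritLM, UniAudio) point to one common strategy for adding audio or speech to a text LLM: discretize audio into tokens, give those tokens their own id range appended after the text vocabulary, and mark modality switches with special tokens so text and speech can be interleaved in a single sequence. The sketch below assumes that layout; `SharedVocab`, `interleave`, the vocabulary sizes, and the two marker tokens are hypothetical, and this is not the exact tokenization of either model.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SharedVocab:
    """Hypothetical id layout: [text ids | audio unit ids | <text> | <speech>]."""
    text_size: int   # size of the original text tokenizer (assumed)
    audio_size: int  # number of discrete audio units (assumed)

    @property
    def text_marker(self) -> int:
        return self.text_size + self.audio_size

    @property
    def speech_marker(self) -> int:
        return self.text_size + self.audio_size + 1

    @property
    def size(self) -> int:
        # Total rows needed in the LM's embedding and output layers.
        return self.text_size + self.audio_size + 2

    def text_id(self, token: int) -> int:
        if not 0 <= token < self.text_size:
            raise ValueError(f"text token {token} out of range")
        return token  # text ids keep their original positions

    def audio_id(self, unit: int) -> int:
        if not 0 <= unit < self.audio_size:
            raise ValueError(f"audio unit {unit} out of range")
        return self.text_size + unit  # audio ids are shifted past the text ids


def interleave(vocab: SharedVocab,
               segments: Iterable[Tuple[str, Sequence[int]]]) -> list[int]:
    """Flatten ("text", ids) / ("speech", units) segments into one LM token sequence.

    A modality marker is emitted whenever the modality changes.
    """
    seq: list[int] = []
    current = None
    for modality, ids in segments:
        if modality == "text":
            if current != "text":
                seq.append(vocab.text_marker)
            seq.extend(vocab.text_id(t) for t in ids)
        elif modality == "speech":
            if current != "speech":
                seq.append(vocab.speech_marker)
            seq.extend(vocab.audio_id(u) for u in ids)
        else:
            raise ValueError(f"unknown modality {modality!r}")
        current = modality
    return seq


if __name__ == "__main__":
    vocab = SharedVocab(text_size=32000, audio_size=500)
    print(interleave(vocab, [("text", [17, 42]), ("speech", [3, 499])]))
    # [32500, 17, 42, 32501, 32003, 32499]
```

In this layout the pretrained text ids are untouched, so only the embedding and output layers need to grow to `vocab.size`, with the new rows learned during training. That is what lets a pretrained text LLM be reused rather than trained from scratch.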