Large Audio Language Models (topic)

Task 2 Dec:

For the first meeting, each person needs to find only two papers. Because we'll meet on Monday instead of Wednesday, you'll have less time than the groups meeting on Wednesday, so the first two papers are enough for now.
December 6
- 4 paper descriptions

Model	Emergent properties	Free-form instruction	Controllability	Architecture	Additional conditioning
Fugatto	+	+	Compositional Classifier Guidance	Transformer + Flow Matching	Text only
AudioBox	+	+	Classifier Guidance	Transformer infilling + Flow Matching	Text and audio
SpiritLM	-	-	-	Transformer + LM	Audio
UniAudio				Transformer + LM
VoiceLDM

The poster: