Unified speech and gesture synthesis using flow matching