Tuesday November 5, 2024 4:00pm - 4:50pm PST
Zain Hasan, Together AI, Senior ML DevRel Engineer

Many real-world problems are inherently multimodal, from the communicative modalities humans use, such as spoken language and gestures, to the force, tactile, and visual sensors used in robotics. For machine learning models to address these problems, interact more naturally and holistically with the world around them, and ultimately become more general and powerful reasoning engines, they need to understand data across all of its representations: images, video, text, audio, and touch.

In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models, in conjunction with large generative multimodal models that can see, hear, read, and feel data, to perform cross-modal search (searching audio with images, videos with text, etc.) and multimodal retrieval-augmented generation (MM-RAG) at the billion-object scale with the help of open-source vector databases. He will also demonstrate, with live code demos, how performing this cross-modal retrieval in real time enables users to apply LLMs that reason over their enterprise multimodal data. The talk will focus on how to scale the use of multimodal embedding and generative models in production.
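As a rough illustration of the cross-modal search described above, the sketch below embeds a few images and a natural-language query into a shared vector space and ranks the images by similarity. It assumes the open-source clip-ViT-B-32 checkpoint loaded through sentence-transformers and uses a small in-memory index in place of a vector database; the file names and query are hypothetical.

# Minimal text-to-image retrieval sketch (assumptions: clip-ViT-B-32 via
# sentence-transformers, hypothetical local image files, in-memory index
# standing in for a vector database).
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-style models embed images and text into the same vector space,
# which is what makes cross-modal queries (text over images, etc.) possible.
model = SentenceTransformer("clip-ViT-B-32")

# Index a handful of images; a vector database would hold millions or billions.
image_paths = ["warehouse_cam_01.jpg", "assembly_line_02.jpg", "loading_dock_03.jpg"]
image_embeddings = model.encode(
    [Image.open(p) for p in image_paths], convert_to_tensor=True
)

# Embed a text query into the same space and rank images by cosine similarity.
query_embedding = model.encode("a forklift moving pallets", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, image_embeddings)[0]

for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")

# For MM-RAG, the top-ranked images (or their captions/metadata) would then be
# passed as context to a multimodal generative model that answers the user's question.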
Speakers

Zain Hasan

Senior ML DevRel Engineer, Together AI
Zain Hasan is a Senior ML DevRel Engineer at Together AI. He is an engineer and data scientist by training who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies. He then...
API World -- Workshop Stage C
