Vaibhav Srivastav’s Post

GPU poor @ Hugging Face

Fuck yeah! MaskGCT - New open SoTA Text to Speech model! 🔥

> Zero-shot voice cloning
> Emotional TTS
> Trained on 100K hours of data
> Long form synthesis
> Variable speed synthesis
> Bilingual - Chinese & English
> Available on Hugging Face

Fully non-autoregressive architecture:

> Stage 1: Predicts semantic tokens from text, using tokens extracted from a speech self-supervised learning (SSL) model
> Stage 2: Predicts acoustic tokens conditioned on the semantic tokens.

Synthesised: "Would you guys personally like to have a fake fireplace, an electric one, in your house? Or would you rather have a real fireplace? Let me know down below. Okay everybody, that's all for today's video and I hope you guys learned a bunch of furniture vocabulary!"

TTS scene keeps getting lit! 🐐
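For readers who want a feel for what "fully non-autoregressive" and the two stages could look like in code, here is a rough toy sketch. It is not the MaskGCT implementation. The iterative mask-and-predict decoding loop is an assumption suggested by the model's name, and `propose`, `parallel_decode`, `text_to_semantic`, `semantic_to_acoustic`, the vocabulary sizes, and the number of codebooks are all hypothetical. The "model" here just returns random tokens so the example runs on its own.

```python
import random

MASK = -1  # placeholder id for a position that has not been decoded yet

# Assumed sizes for illustration only; none of these come from the post.
SEMANTIC_VOCAB = 1024
ACOUSTIC_VOCAB = 1024
N_CODEBOOKS = 4


def propose(tokens, condition, vocab_size, rng):
    """Hypothetical stand-in for a trained network.

    Returns one (token_id, confidence) pair per position, all at once.
    A real model would compute these from `tokens` and `condition`;
    here they are random so the sketch stays self-contained.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(length, condition, vocab_size, steps=4, seed=0):
    """Fill `length` masked positions over a fixed number of parallel steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        proposals = propose(tokens, condition, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit enough positions so that step/steps of the sequence is filled.
        already_filled = length - len(masked)
        n_commit = length * step // steps - already_filled
        # Keep the most confident proposals, leave the rest masked for later.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = proposals[i][0]
    return tokens


def text_to_semantic(text, n_tokens):
    """Stage 1 (toy): text -> semantic tokens; the caller picks the length."""
    return parallel_decode(n_tokens, condition=text, vocab_size=SEMANTIC_VOCAB)


def semantic_to_acoustic(semantic, n_codebooks=N_CODEBOOKS):
    """Stage 2 (toy): semantic tokens -> one acoustic token row per codebook."""
    return [
        parallel_decode(
            len(semantic),
            condition=(semantic, layer),
            vocab_size=ACOUSTIC_VOCAB,
            seed=layer + 1,
        )
        for layer in range(n_codebooks)
    ]


semantic = text_to_semantic("Would you rather have a real fireplace?", n_tokens=20)
acoustic = semantic_to_acoustic(semantic)
print(len(semantic), len(acoustic), len(acoustic[0]))
```

The point of the sketch is the shape of the computation: every step scores all positions at once instead of generating one token after another, and the number of steps is fixed rather than tied to the output length.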

Refat Ametov

Driving Business Automation & AI Integration | Co-founder of Devstark and SpreadSimple | Stoic Mindset

5mo

This is next level! The combination of zero-shot voice cloning and emotional TTS opens up so many possibilities. Great to see such innovation in the open-source space.

Ming-Tsung Wu

Director / Application Dept.

5mo

The TTS model is like the last mile of the telephone line in the early days of telecommunication.

Raj Panchal

Full Stack Engineer | MERN, Next JS, AWS, GCP, Three.js, OpenAI API | Empowering Businesses with advanced solutions and Interactive Web Technologies

5mo

It takes 45 seconds. Can there be something faster with a GPU? The quality doesn't need to be SOTA; the speed does.

Onkar Susladkar

MLE@yellow.ai | Research Scholar Northwestern University | Teaching and Research Assistant Indian Institute of Technology Roorkee | RA @IIT jodhpur | RA@IISc Banglore

5mo

Can someone share the paper or code?

This sounds amazing! Excited to see how it transforms TTS 👀

Shreya Kar

Data Scientist | Senior Consultant at Machine Learning Reply

5mo

Sourabh Zanwar this we can use

CC BY-NC license, for those who care.

Manash Mishra

ML Engineer @ Paisabazaar | Ex RA NLP @ IIT BHU

5mo

Jalem Raj Rohit “Emotional TTS”
