Introducing Whisper
We’ve trained and are open-sourcing a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.
The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
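To make the 30-second chunking concrete, here is a minimal sketch of the framing arithmetic. It assumes the constants used by the open-source openai/whisper package (16 kHz mono audio, a 160-sample hop, 80 Mel bins in the original release); `split_into_chunks` is a hypothetical helper, not part of that package, and the package's own transcription loop is more elaborate (it advances its window using predicted timestamps rather than cutting fixed blocks).

```python
# Assumed constants, mirroring the open-source whisper package's audio settings.
SAMPLE_RATE = 16_000                      # audio is resampled to 16 kHz mono
CHUNK_LENGTH = 30                         # seconds of audio per encoder window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH    # 480,000 samples per window
HOP_LENGTH = 160                          # 10 ms hop between spectrogram frames
N_MELS = 80                               # log-Mel bins (original checkpoints)
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 spectrogram frames per window


def split_into_chunks(samples: list[float]) -> list[list[float]]:
    """Hypothetical helper: cut a waveform into fixed 30 s windows.

    The final window is zero-padded so every chunk has exactly N_SAMPLES
    samples, which is what gives the encoder a fixed-size input.
    """
    chunks = []
    # max(..., 1) ensures even empty input yields one (all-silent) window
    for start in range(0, max(len(samples), 1), N_SAMPLES):
        chunk = samples[start:start + N_SAMPLES]
        chunk = chunk + [0.0] * (N_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks


def encoder_input_shape() -> tuple[int, int]:
    """Shape of one log-Mel spectrogram window: (mel bins, frames)."""
    return (N_MELS, N_FRAMES)
```

For example, 70 seconds of audio becomes three windows, the last holding 10 seconds of signal followed by 20 seconds of zero padding; each window is then turned into an 80 × 3,000 log-Mel spectrogram before it reaches the encoder.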
Other existing approaches frequently use smaller, more closely paired audio-text training datasets, or use broad but unsupervised audio pretraining. Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper’s zero-shot performance across many diverse datasets we find it is much more robust and makes 50% fewer errors than those models.
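Speech recognition errors are usually counted as word error rate (WER): the word-level edit distance between reference and output, divided by the reference length. The announcement does not name its metric in this paragraph, so treating "errors" as WER is an assumption here. The sketch below (hypothetical helper names) computes WER with a standard dynamic-programming edit distance and shows what "50% fewer errors" means as a relative reduction. Real evaluations normalize text (casing, punctuation) before scoring; this version only splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    # guard against an empty reference to avoid division by zero
    return prev[-1] / max(len(ref), 1)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new system eliminates."""
    return (baseline_wer - new_wer) / baseline_wer
```

With hypothetical numbers, a baseline WER of 20% against a new WER of 10% is a relative reduction of 0.5, i.e. "50% fewer errors".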
About a third of Whisper’s audio dataset is non-English, and the model is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech-to-text translation, and it outperforms the supervised state of the art on CoVoST2 to-English translation zero-shot.
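The switch between transcribing and translating is made by a short prefix of special tokens given to the decoder. The token spellings below are the ones used by the open-source Whisper tokenizer; `decoder_prompt` is a hypothetical string-level helper for illustration (the real tokenizer maps these to integer IDs). Note that for translation the language token still names the language of the source audio, while the output text is English.

```python
def decoder_prompt(language: str, task: str, timestamps: bool = True) -> list[str]:
    """Hypothetical helper: build the special-token prefix that selects a task.

    language   -- code of the spoken language in the audio, e.g. "fr"
    task       -- "transcribe" (same-language text) or "translate" (English text)
    timestamps -- when False, append the token that disables timestamp prediction
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens
```

The same French clip can therefore be decoded with `decoder_prompt("fr", "transcribe")` to get French text, or with `decoder_prompt("fr", "translate")` to get English text, from a single set of weights.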
We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications. Check out the paper, model card, and code to learn more details and to try out Whisper.
https://news.ycombinator.com/item?id=32927360
https://old.reddit.com/r/singularity/comments/xkao78/introducing_whisper/
Yesssss I can fwinyawwy twanscwibe my eight gigabytes of Jinxthinker videos
I wonder how they'll twist this into some existential threat so they don't have to release the model.
if it detects slurs when they're not being said people will be angry
It can understand when people say BIPOC shut it down
Global Insta-translation of The Art of the Kampf.
How long until we get in-ear instant translation?
God won't allow that. Research the Tower of Babel, idiot.
Secured my spot as a top 100 most memorable rdrama poster
the Tower of Babel is a prophecy, you idiot
i thought it happened idk
it can happen again
Ya, so if we try to create a monolanguage through technology, it will happen again. Are we really disagreeing?
sounds like not
It was not a prophecy at all. The Tower of Babel has many meanings, but the myth itself is of a similar type to the Greek Prometheus myth: a general warning against man's folly and pride. The twisting of tongues was essentially a punishment for perverting or misusing the sacred teachings.
The Internet is the second Tower of Babel. It will be brought down just as the original was, inshallah.
Hope it can detect microaggressions and mansplaining.
No stopping the march of progress! :marseypussyhat: :marseypussyhat:
Send; black BIPOCS tongue my anus
Variable: To_everyone
Wow, first taking jobs from artists and now taking jobs from stenographers? Shame they don't release this stuff, cuz this is great.
Love how state-of-the-art models for everything language-related in the last 2 years have just been "idk, we trained a transformer on a preexisting dataset"
Snapshots:
https://web.archive.org/https://openai.com/blog/whisper
https://archive.ph/?url=https://openai.com/blog/whisper&run=1
https://ghostarchive.org/search?term=https://openai.com/blog/whisper
Read Paper:
https://web.archive.org/https://cdn.openai.com/papers/whisper.pdf
https://archive.ph/?url=https://cdn.openai.com/papers/whisper.pdf&run=1
https://ghostarchive.org/search?term=https://cdn.openai.com/papers/whisper.pdf
View Code:
https://web.archive.org/https://github.com/openai/whisper
https://archive.ph/?url=https://github.com/openai/whisper&run=1
https://ghostarchive.org/search?term=https://github.com/openai/whisper
View Model Card:
https://web.archive.org/https://github.com/openai/whisper/blob/main/model-card.md
https://archive.ph/?url=https://github.com/openai/whisper/blob/main/model-card.md&run=1
https://ghostarchive.org/search?term=https://github.com/openai/whisper/blob/main/model-card.md
https://web.archive.org/https://i.rdrama.net/images/1684140403185944.webp
https://archive.ph/?url=https://i.imgur.com/oYv1SeP_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/oYv1SeP_d.webp?maxwidth=9999&fidelity=high
https://web.archive.org/https://i.rdrama.net/images/16841404035575514.webp
https://archive.ph/?url=https://i.imgur.com/nG64Cot_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/nG64Cot_d.webp?maxwidth=9999&fidelity=high
https://web.archive.org/https://i.rdrama.net/images/16841404039282484.webp
https://archive.ph/?url=https://i.imgur.com/zTAHp9L_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/zTAHp9L_d.webp?maxwidth=9999&fidelity=high
https://web.archive.org/https://i.rdrama.net/images/168414040429.webp
https://archive.ph/?url=https://i.imgur.com/JHF4LuO_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/JHF4LuO_d.webp?maxwidth=9999&fidelity=high
https://web.archive.org/https://i.rdrama.net/images/16841404045972388.webp
https://archive.ph/?url=https://i.imgur.com/sT9h6nx_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/sT9h6nx_d.webp?maxwidth=9999&fidelity=high
https://web.archive.org/https://i.rdrama.net/images/16841404050380034.webp
https://archive.ph/?url=https://i.imgur.com/wL61Zbf_d.webp?maxwidth=9999&fidelity=high&run=1
https://ghostarchive.org/search?term=https://i.imgur.com/wL61Zbf_d.webp?maxwidth=9999&fidelity=high
https://news.ycombinator.com/item?id=32927360:
https://web.archive.org/https://news.ycombinator.com/item?id=32927360
https://archive.ph/?url=https://news.ycombinator.com/item?id=32927360&run=1
https://ghostarchive.org/search?term=https://news.ycombinator.com/item?id=32927360
https://old.reddit.com/r/singularity/comments/xkao78/introducing_whisper/:
https://undelete.pullpush.io/r/singularity/comments/xkao78/introducing_whisper/
https://web.archive.org/https://old.reddit.com/r/singularity/comments/xkao78/introducing_whisper/
https://archive.ph/?url=https://old.reddit.com/r/singularity/comments/xkao78/introducing_whisper/&run=1
https://ghostarchive.org/search?term=https://old.reddit.com/r/singularity/comments/xkao78/introducing_whisper/