Updates

HyperTTS

2.6

June 27, 2025

OpenAI gpt-4o-mini-tts Model For TTS

The OpenAI gpt-4o-mini-tts model is now supported in HyperTTS. This is a feature contributed by Claus (thank you!). You have to select it in the voice options (the default for OpenAI is still tts-1-hd). This voice model accepts an optional instructions field, which you can use to instruct the model to speak in a certain way, or to indicate that the source text is in a particular language. For more details, you can consult the OpenAI reference. As with neural and LLM models, the actual output will vary with the situation, so you'll have to experiment. Claus' feedback: "GPT-4o mini TTS is the first OpenAI TTS model that provides usable output for my Greek flashcards." The common feedback with OpenAI (and also ElevenLabs) is that non-English output was not that good and suffered from an American accent. Hopefully this new model improves that.

OpenAI gpt-4o-mini-tts model with instructions
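
For the curious, here is roughly what a call to this model looks like with the official OpenAI Python SDK. The voice, text and instruction below are just examples, and this isn't necessarily how HyperTTS invokes the API internally:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # gpt-4o-mini-tts accepts an extra "instructions" parameter that older
    # models like tts-1-hd do not.
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice="alloy",  # example voice; any of the standard OpenAI voices works
        input="Καλημέρα, τι κάνεις;",
        instructions="The text is Greek; speak it like a native Greek speaker.",
    ) as response:
        response.stream_to_file("greek_card.mp3")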

So what's next? People have been asking for Google Gemini TTS. I've been working on this for HyperTTS, but there's a serious limitation: Google limits requests to 10 per minute, even on Tier 1 paid accounts. This means mass-generating Gemini audio will be a tedious process, and HyperTTS will need to implement some retry logic (sketched below), which will be welcome anyway to handle the occasional timeout. Besides that, there are a few more issues I will tackle in the coming weeks in HyperTTS.
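
To give an idea, the retry logic would be something along these lines. This is only a rough sketch: the synthesize callable and the backoff parameters are made up for illustration, and this code does not exist in HyperTTS yet.

    import random
    import time

    def synthesize_with_retry(synthesize, text, max_attempts=5):
        # Call synthesize(text), retrying with exponential backoff when the
        # service rejects the request (e.g. Gemini's 10 requests/minute quota)
        # or times out.
        for attempt in range(max_attempts):
            try:
                return synthesize(text)
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                # Wait longer after each failure, with some jitter so that
                # queued requests don't all retry at the same moment.
                time.sleep(2 ** attempt + random.random())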

Aside from that, in the coming months, I'd like to make progress on an idea I started before Christmas: generating long-play audio files from Anki flashcard sounds, so that you can review your deck while walking, kind of like a podcast. I have a working prototype but I need to finish it. I'm also actively thinking about how Language Tools can use LLMs. This is honestly an overdue feature given how good AI chatbots have become at translation, but also at transliteration (for Chinese, you can confidently ask GPT-4 to convert text to Pinyin).
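
As an illustration of the transliteration idea, this is the kind of request an add-on could send through the OpenAI Python SDK; the model name and prompt are examples, not necessarily what Language Tools will end up using:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Convert the Chinese text to Hanyu Pinyin with tone marks. "
                        "Return only the Pinyin."},
            {"role": "user", "content": "我想学习中文"},
        ],
    )
    print(response.choices[0].message.content)  # e.g. "wǒ xiǎng xuéxí zhōngwén"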

HyperTTS

2.3

April 6, 2025

Google Chirp voices

HyperTTS 2.3.0 is out with an updated voice list. It now supports Google Chirp voices, which are supposed to compete in quality with ElevenLabs voices.

HyperTTS

2.1

February 23, 2025

Alibaba Cloud

Alibaba Cloud is now supported. This is a community contribution from zzzerk. Alibaba can be interesting if you're looking for more diversity in Mandarin voices (for example they have child voices). Overall I still think Azure voices are better, but it's always good to have more choice.

Mandarin voices from Alibaba Cloud

Also, keyboard shortcuts are now supported in Easy mode. You have to configure those shortcuts in the HyperTTS Preferences screen:

Configure keyboard shortcuts in the Preferences screen
The keyboard shortcuts can be used for Preview and Add Audio in the Easy Add dialog.

HyperTTS

2

January 25, 2025

Easy Mode

The Easy mode provides a simplified interface to add audio one note at a time.

HyperTTS has a new Easy mode which provides a simplified interface for adding audio one note at a time, either during editing or note creation. A lot of users favor this mode, which is closer to how AwesomeTTS functions. In this mode, audio addition is not automated: you have to do at least two clicks to add audio to a note, but it gives you an opportunity to customize the input text before generating the audio, which is important for Japanese and other languages where the default TTS output may not be perfect. Once you add audio, HyperTTS will remember the settings and present the same settings next time for the same Note Type and Deck combination.

When you bring up the Add Audio (Easy) dialog, it will try to get input text from three places:

  1. The text inside your field of choice. The first time, it will pick the field where you've placed the cursor; if you then choose a different field, it will remember your choice.
  2. The text you've selected, if you wish to generate audio for a subset of the field text only (for example a single word).
  3. The clipboard, if you have copied any text.

You can switch between these sources using the radio buttons at the top of the dialog.

The Speaker, Play and Gear icons are used to add audio from the Anki editor toolbar.

As a reminder, the Speaker, Play and Gear icons in the Anki editor toolbar are used to add audio while editing or adding a note. These buttons can also be activated with a keyboard shortcut. With the addition of Easy Mode, the meaning of those buttons changes:

Easy Mode

  • Speaker Button: brings up the Add Audio (Easy) dialog. You can then listen to a preview and add the audio if you are satisfied.
  • Play Button: also brings up the Add Audio (Easy) dialog, exactly like the Speaker Button. There is no difference between the two buttons in easy mode.
  • Gear Button: shows you the default preset for the current Note Type and Deck (Preset Rules Dialog). From that screen, you can access the more advanced settings.

Advanced Mode

  • Speaker Button: will apply audio according to your preset rules with a single click.
  • Play Button: will play the audio according to your preset rules, to confirm that the audio output is correct.
  • Gear Button: allows you to configure your preset rules for the current Note Type / Deck combination.

Advanced mode is more complicated: it requires you to set up a preset upfront. But once you've done so, you can add audio with a single click, or with a keyboard shortcut (configurable in Preferences).

The first time you use the HyperTTS buttons in the Anki editor toolbar, you'll be asked to choose Easy or Advanced. You can change this later.

The first time you click either the Speaker or Play buttons, you will be given a choice between Easy and Advanced mode. HyperTTS will remember this setting, but you can change it at a later time by going to the Preset Rules dialog as shown below:

To change the setting, go to the HyperTTS Preset Rules dialog (gear icon in the Anki editor toolbar), and click the Easy Mode checkbox.

Separately, in this new version 2.0.1, HyperTTS collects anonymous usage stats and error reports. This helps us fix issues very quickly to provide the best experience for everyone. You can disable this functionality in the Preferences dialog (Anki main screen, Tools menu, HyperTTS: Preferences).

HyperTTS

1.13

September 22, 2024

Preset Rules Status

Status of Presets when applying/previewing rules

HyperTTS will now display the status of all preset rules when previewing or generating audio. Errors no longer prevent generation of audio for other rules. You can now scroll in the preset rules window. These changes should make life easier for those who have a lot of preset rules.

HyperTTS

1.12

August 11, 2024

Better Multilingual support

HyperTTS 1.12 is out, with better Multilingual voice support.

ElevenLabs, OpenAI, Azure all offer multilingual voices.

More and more TTS services offer multilingual voices these days and this led to a confusing situation in HyperTTS where the same voice appeared under multiple languages. Starting with this version, multilingual voices only appear once, but they can be filtered in the Voice Selection screen according to the languages they support.

Azure Standard voices have been deprecated, and you will have to re-create presets which used those. The standard voices were removed from Azure around six months ago.

A macOS service has been introduced, which lets you use the free macOS voices if you are a Mac user.

Finally, there's a scrollbar in the voice list, and the dropdown is bigger, to allow you to see more voices.