Q: Speaker diarization quality
Hello. I bought the 1st level of the AppSumo offer. So far I have used the tool with 2 meeting recordings. The transcription quality is very good, but the speaker diarization is not up to the task. The first issue is the misidentification of speakers. The second is that it is impossible to change the speaker for part of a block, only for the entire block (the blocks are created by the tool during diarization).
I looked for a way to contact you, but could not find one. Could you please share whether this issue is on your radar and whether you plan to work on it (it is not on the public roadmap)?

Terry_SubEasy.ai
Nov 8, 2024
A: Hi there,
For speaker diarization, performance varies from audio to audio and is greatly affected by recording quality. We may not have room for further optimization since we have already made improvements.
Have you tried any other apps that perform speaker diarization on uploaded audio better than we do? Name them and we'll take a look to see if they have a better solution. (We tried MeetGeek, but it's not that good.)
Perhaps we could build a recording plugin that identifies each speaker in the first place (online meetings only).
However, you can now assign an individual sentence to a speaker by right-clicking a subtitle and clicking the change speaker menu.
It's not very intuitive and we'll change that later.
You can contact me directly via terry@subeasy.ai, and please leave a review (it doesn't have to be positive; any honest review is appreciated).
Hey Terry, thanks a lot for answering my question. I’ve tried many apps, including Transcribe.lol, Otter.ai and TurboScribe. Otter.ai is the best one for diarization, followed closely by TurboScribe. The issue with Otter.ai is that it’s English only, and I’m Brazilian; most of my meetings are in Portuguese.
Good to know, we might check that out later as we really have a lot on our roadmap, and I'm going to add this one onto it too.
Could you please compare us with Otter.ai and TurboScribe on the same file? How do we perform? Currently we really don't have any clue how to optimize it.
Otter.ai is the best one, with TurboScribe following closely. Transcribe.lol has issues with the transcription itself.
Would it be possible to implement a human-in-the-loop process or workflow? You'd take a subset of the recording, run the diarization, and then provide it to the user for feedback. Depending on the changes the user makes, you would then run the entire diarization.
That probably won't work this way. Your feedback on the beginning part of the audio can’t affect the later part.
We'll keep an eye out for better solutions.