I'm going through my older blog posts at the moment, and I recently came across two about using Voice Recognition Software. The first was Here be Dragons (from 2012). The second I called – with startling originality – Here be more Dragons (2013). Finding them has inspired me to write something more on the same subject – and to come up with a better title.
Dragon Naturally Speaking
The software in question, from Nuance Communications, is Dragon Naturally Speaking (DNS). Hence the draconian theme.
In both previous articles I was enthusiastic about the versions of Dragon I was using (11 and 12 respectively). At the same time I was irritated with them for falling short of my vision of what Voice Recognition Software should be.
One thing I complained about was the way Dragon required me to dictate. Dictation, say the instructions, should be “in full sentences” and “in clear, natural speech”. When you think you are speaking clearly and naturally, it’s immensely frustrating to watch the software mangling your words on the screen. It’s also very distracting.
Because I couldn’t trust the software to transcribe what I was saying accurately, I had to dictate while looking at the screen. But seeing the mistakes appear and having to correct them constantly made it very challenging to use the software for creative writing. Too challenging, really.
But here I am, three years further down the line and still using DNS. Version 14 now. Has anything changed?
What’s changed
Well, Dragon still falls short. But – and it’s a big but – working with it is still a good deal faster than working without it.
I use it regularly in two ways. The first is in my translation work, which is where I’ve always found it most useful.
When I translate I have the Swedish text I’m working on in front of me on screen. The microphone is on and Dragon is fired up. I read a Swedish sentence to myself and make the translation in my head. Then I dictate the full sentence into the microphone and watch DNS put it up on the screen. I won’t say it doesn’t make mistakes. I won’t say it doesn’t sometimes drive me to distraction. However, as I’ve recently had to work a couple of days without it, I know that using it speeds up my production considerably. (And – as I’ve observed before – it’s generally better at spelling than I am.)
Writing creatively
My second use for Dragon is where it gets exciting. For the last six months or so I’ve also been using Dragon to write creatively.
If you read the earlier articles, you’ll see that I was complaining that my creative process isn’t geared to dictating full sentences in clear natural speech. It still isn’t. I fumble after words, I try out different approaches. I pick my way from one word to the next as if I’m crossing a bog in the wilderness, testing one step at a time and hoping I won’t suddenly sink. This is pretty much what I am doing at the moment.
This is not a dictation style that is conducive to getting Dragon to understand me. And as I’ve said, having to watch and correct the words as they appear on screen sets up an extra barrier to creativity.
If only there was a way to separate the dictating from the transcribing!
Of course there is a way. Buy a Dictaphone. But the price of digital Dictaphones that play nice with DNS is prohibitive.
I’ve tried using my regular recording device, but then I have to process the recording in several stages. DNS will only transcribe a mono recording, so first I have to convert my recording from stereo to mono. Then I have to boost the volume to make it loud enough for Dragon to hear. Finally, the device picks up not only my words but also my silences, so it makes sense to edit those out. As you can imagine, all that takes time and trouble. Too much time and trouble.
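For the technically inclined, the three manual steps above could in principle be scripted. What follows is a minimal Python sketch, not anything supplied by Nuance or by the recording device, and it assumes the recording has already been exported as a 16-bit stereo WAV file. The function names and the thresholds (a "quiet" level of 1000 and a minimum silence of half a second) are hypothetical choices for illustration. Detecting silence sample by sample like this is crude; proper audio editors use smarter methods.

```python
import array
import sys
import wave


def downmix(interleaved):
    """Step 1: average each left/right pair of interleaved stereo samples into one mono sample."""
    return [(interleaved[i] + interleaved[i + 1]) // 2 for i in range(0, len(interleaved) - 1, 2)]


def normalize(samples, target_peak=30000):
    """Step 2: scale the samples so the loudest one reaches target_peak (16-bit max is 32767)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to boost
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]


def drop_silence(samples, threshold=1000, min_run=8000):
    """Step 3: remove stretches of at least min_run consecutive quiet samples (|s| < threshold).

    Shorter quiet stretches, such as the gaps inside and between words, are kept.
    """
    out, run = [], []
    for s in samples:
        if abs(s) < threshold:
            run.append(s)
        else:
            if len(run) < min_run:
                out.extend(run)  # short pause: keep it
            run = []
            out.append(s)
    if len(run) < min_run:
        out.extend(run)  # trailing pause, kept only if short
    return out


def prepare_for_dragon(in_path, out_path):
    """Hypothetical helper: 16-bit stereo WAV in; mono, louder, silence-trimmed WAV out."""
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expects a 16-bit stereo WAV file")
        rate = src.getframerate()
        data = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV samples are stored little-endian
    # Half a second of quiet (rate // 2 samples) counts as a silence to cut.
    mono = drop_silence(normalize(downmix(data)), min_run=rate // 2)
    out = array.array("h", mono)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(out.tobytes())
```

A call such as `prepare_for_dragon("memo.wav", "memo_mono.wav")` (file names hypothetical) would produce a single mono file in one go. The steps run in the same order as above, so the silence threshold is applied to audio that has already been boosted.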
Dictate+Connect
But I’m clearly not the only person facing these problems, and now I’ve found a solution. It was recommended on a DNS forum I follow: a smartphone app called Dictate+Connect. It’s a paid-for app, but $10 is a great deal less than €2000+ for a bottom-of-the-range Dictaphone.
The app lets me dictate as slowly as I like, but cuts out all the silence. The recording is saved on the phone in mono and at an acceptable volume. I can upload it to my computer via wifi and then pass it through Dragon’s transcription function without any further processing.
The transcription seems to work much better with the phone’s recording, perhaps because all my words are stitched together into a more continuous whole. I can also dictate some basic instructions like punctuation marks, line breaks and paragraph breaks which Dragon interprets correctly. (Mostly.)
The only extra thing I have to keep track of while I’m dictating is the length of the recorded file. This isn’t because of any limitation in the phone or software that I’ve yet discovered. It’s just for my own sanity. By now I know that about 10 minutes of recorded dictation works out at roughly 1300 words of text. That’s a good couple of hours of post-transcription work.
Rough draft
I treat the transcription as a rough draft. The second stage of writing from a transcribed recording is taking the rough draft and turning it into a polished draft – or perhaps a polished blog article. That’s what takes the time. But it’s time I feel is well spent.
I know it’s not for everyone, but it suits me. I’ve been trying to find a way to capture ideas for writing for years. Until recently my most effective solution was a pen and a notebook, and I’ll keep using those. (Transcribing recordings made with a lot of background noise is much less effective.) But as my blog posts about Voice Recognition Software show, I’ve been fumbling for years after a more direct solution. I think I may have found it.
This text was recorded as an initial draft on my phone on Monday morning. The recording took about 45 minutes to make, but was about 11 minutes long when I uploaded it. Transcribed, the text ran to about 1200 words. I polished the transcription on Monday afternoon, which took about 1½ hours. I rewrote the draft on Wednesday morning, which took another hour. The final text is just under 1000 words (not counting this paragraph).
The illustrations show a little dragon, sculpted in red leather by Cili Ivanovski, which I photographed recently in Angered Library, Gothenburg.
I wrote this article for the #Blogg52 challenge.
Wow! I’ve tried recording text while driving long distances, but I don’t speak the way I write, so it doesn’t work for me. Interesting to read how persistent you are and that you don’t give up, because the route you’ve taken to get the text into the computer takes so much longer. But it certainly is nice to just let your voice do the work, even if there’s work to do afterwards.
Hugs, Kim 🙂
It does take time, that’s true, Kim, but I hope it will save time eventually.
Dictating while driving sounds especially difficult to me, not least because both demand your attention. I suspect it’s only possible to dictate while doing something else when you don’t need to think about the other thing. It’s the same problem I have when I dictate straight onto the computer screen. I try to create and edit at the same time, and I can’t manage it. Listening to an audiobook while you drive, as you’ve described elsewhere, sounds like a better use of your time.
Hugs.
🙂