In part because I’m deciding whether it’s worth it for my kids (7 & 10) to learn to type, I ended up spending some time yesterday and today learning about the state of speech recognition software. I was also motivated by noticing (almost simultaneously) how much faster Google voice search (on an iPhone 4s) returns text than does Siri (on the same phone). As we were discussing the value of typing (e.g. doctors typing notes while talking with patients, or scientists composing papers without disturbing their lab mates), my wife mentioned that Dragon is sometimes used by dictating doctors.
This led me to realize that if I updated my MacBook Pro from Lion to Mavericks (for free), I’d gain access to a built-in speech recognition engine, one that is apparently available from within any OSX application. Side-by-side tests suggested that Dragon for OSX, though more expensive ($160 in late 2013), was better for long and technical narrations than the Mavericks dictation. However, the idea that intrigued me was being able to dictate portions of my daily science writing (like blog posts or paper paragraphs) on my laptop in the same way that I use Siri to generate (at least drafts of) texts and short emails on my phone.
After upgrading OSX (to 10.9), I found the Dictation & Speech pane in System Preferences. I initiated the 785 MB download that enables offline dictation (in near real time, not in laggy 30-second chunks).
OSX 10.9 (Mavericks) dictation setup
Fifteen minutes later I was able to run some tests in the software that currently underlies my scientific workflows: writing emails in Thunderbird; searching in Google Scholar; editing metadata in Zotero; writing blog posts in WordPress; composing papers in LibreOffice; posting updates to Facebook, Google+, LinkedIn, and/or Twitter. [The rest of this paragraph was dictated, and then edited.] Dictating works well in Thunderbird even for addresses in subject lines. Dictating a name into Google Scholar is much faster than typing it, but until my contacts are imported there are a lot of spelling mistakes. Dictating in Zotero isn’t very useful because most fields are short phrases or single-word tags. Speech recognition really shines in blog posts and social media updates because the tone can be as informal as a conversation. Dictating the text of an academic paper doesn’t work so well because it’s hard to talk with the requisite clarity and formal tone.
This is what the OSX Mavericks dictation system delivered before I edited the end of the preceding paragraph:
Dictating works well in Thunderbird you can do for addresses in subject lines. Dictating a name into Google scholar is much faster than typing it but until my contacts are imported there are a lot of spelling mistakes. Dictating in zero Taro isn’t very useful because most fields are short phrases or single word tags. Speech recognition really shines in blog posts and social media updates because the tone can be is in for mall as a conversation. Dictating the text of an academic paper doesn’t work so well because it’s hard to talk with the requisite clarity and for multi.
Along the way, I learned a little about the sad story of smart scientists losing out to avaricious and incompetent business people, and having their intellectual property end up definitely within Dragon and (most probably) behind this amazing Apple software as well. As an open source advocate, I’d strongly recommend boycotting Dragon and going with any other solution that isn’t such a shameful rip-off of the inventors.