TechByter Worldwide

4 Feb 2022 - Podcast #778 - (21:18)

It's Like NPR on the Web

4 Feb 2022

Speech Recognition — Who Does It Best?

There's no question that speech recognition has improved since the 1980s when Dragon Systems introduced Dragon Dictate for DOS. It's still not ready to be used indiscriminately without review, but it's becoming more reliable and much easier to use.

In those early days, speech recognition required • users • to • train • the • system • with • a • complex • exercise • and • then • to • leave • spaces • between • words, • like • this.

James and Janet Baker started working on speech recognition in 1975. Dragon Systems was founded in 1982 and released Dragon Dictate. The application used a probabilistic method for pattern recognition. One serious problem was the hardware Dragon Dictate ran on. There simply wasn't enough processing power for the speech recognition to work well. That's the primary reason for requiring users to • enunciate • clearly • and • leave • pauses • between • words. That requirement was relaxed in 1997 with the introduction of Dragon NaturallySpeaking. Users still had to train the system by reading known text.

Today even an entry-level smartphone is far more powerful than the beefiest 1982 desktop system, and speech recognition has arrived. There's no longer a need to train the system. Just start talking to it and it starts transcribing your words. The systems improve over time as they learn each person's speech patterns.

I've been reading Neverwhere, a book by Neil Gaiman, based on the BBC miniseries he wrote in 1996. Yes, it was a TV program first and then a novel. It was an ebook that I was reading on an iPad. To keep track of the plot twists, I wrote some notes in Google Docs, sometimes typing and sometimes dictating. But I also use speech recognition frequently for short messages on an Android phone and I've been tinkering with it on the tablet that's running Windows 11.

It seemed to me that Android is the best of the three systems, that Microsoft has done a remarkable job with Windows 11, and that iPadOS is lagging a bit. But more definitive testing was called for because voice recognition systems still get things wrong. Probabilistic analysis all but assures that when the speaker says "I believe that I'll take my book and sit on the ...", the system will know that "potato" is not the next word, even if the speaker mumbles "patio". But errors occur, and all three systems made errors in my brief test.
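The "patio" versus "potato" idea can be sketched in a few lines of Python. This is only a toy illustration, not how Android, iPadOS, or Windows actually work internally: the two score tables and the `pick_word` helper are hypothetical, and every number is invented for the example. The point is simply that a word that sounds slightly less like what was said can still win when the surrounding words make it far more likely.

```python
# Toy illustration only: every number below is invented for the example.
# Acoustic scores: how well each candidate matches a mumbled sound.
acoustic_score = {"patio": 0.45, "potato": 0.55}

# Context scores: how likely each word is to follow "... sit on the".
context_score = {"patio": 0.20, "potato": 0.001}


def pick_word(acoustic, context):
    """Return the candidate with the highest acoustic-times-context score."""
    combined = {word: acoustic[word] * context[word] for word in acoustic}
    return max(combined, key=combined.get)


best = pick_word(acoustic_score, context_score)
print(best)  # prints "patio"
```

On sound alone, "potato" has the higher score in this made-up table. Once the context is factored in, "patio" wins by a wide margin.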

Although some speech recognition systems can intuitively insert punctuation, it's still better to say the punctuation marks. If the Android system realizes that the user is asking a question, it will place a question mark properly. Sometimes it recognizes the end of a sentence and places a period. Users still need to tell the system when to start a new line or a new paragraph, though.

I tried a test involving the text shown in the box. Attempting to avoid variances between devices, I recorded the text and played it back to each device: one recording without stated punctuation and one with, both made at normal speaking speed.

That was a disaster. Generally the results of trying to capture audio by placing a microphone near a speaker are not good. So I spoke the text to each device, once with stated punctuation and once without, aiming for consistency from Android to Apple to Windows.

Spoken Text For Testing

I've been wondering which speech recognition system does the best job, the one used by Android devices, the one installed on iPadOS systems, or the one Microsoft uses with Windows 11. So which system is better? Maybe this little two-sentence test will provide some insight.

The Marquis de Carabas is an important character in Puss in Boots, the children's story, but he is also a character in Neil Gaiman's Neverwhere.

Their books are over there and they're going to pick them up later.


You may wonder about my pronunciation of "marquis". I've always pronounced it "mär-ˈkē", but the Merriam-Webster dictionary shows "ˈmär-kwəs" as the preferred pronunciation. Both pronunciations were used in the BBC miniseries. The book addressed the issue, too. "He was never sure, not then and not later, how you pronounced Marquis de Carabas," Gaiman wrote about the Marquis. "Some days he said it one way, some days the other." I found that the iPad's speech recognition system invariably wrote "marquee" (a theater sign) when I said "mär-ˈkē" and "marquis" when I said "ˈmär-kwəs". So I'm now trying to forget decades of saying the word one way and to pronounce it the other way.

All three systems produced substandard text if I neglected to call out punctuation and to specify new paragraphs, so each of the three examples is based on the text being spoken carefully and with clear punctuation and paragraph breaks explicitly called out.

Take a look at the results. I've included some notes about what I saw.

Speech recognition can be used to create a good first draft, and the systems become better over time as they learn about your speech patterns and the words you use. It's faster than using the keyboard on your phone. There are mistakes you'll need to correct. I guess we could call them "voiceos" instead of typos.

Then again, maybe not.

This report discusses the performance of built-in speech recognition systems and does not address purpose-built speech recognition applications. If your needs are beyond the capabilities provided by these systems, see TechRadar's report, listings on Capterra, or reviews by CRM on full-featured voice processing systems.

Short Circuits

How The Pandemic Affects Creatives

Without television programs, podcasts, and other materials that entertain and inform us, the past two years would have been even more difficult. Adobe is in a good position to observe work by creatives and says that 2022 is starting with optimism and defiance.

Certain companies are well placed to observe specific trends: ADP's monthly employment reports provide insights into business activities because ADP is one of the two largest payroll processing and human resource management companies in the United States. Adobe is in a similar position with creatives, not because the company handles payroll or HR activities, but because it makes the applications used to create newspapers, magazines, books, advertisements, radio programs and podcasts, motion pictures and television programs, and various audio programs. Adobe also operates a stock image service and Behance, where creatives can share their work.

Writing on Adobe's blog, Brenda Mills, the Principal of Creative Services and Visual Trends for Adobe Stock, recently described visual trends, design trends, and motion trends that are emerging at the beginning of the third year of the Covid era.

We have entered a new normal period and Mills says that, instead of subdued messaging that was common in 2020 and 2021, key themes have evolved to stress comfort, connection, and self-care. She also notes a growing confidence and acceptance of changes "with companies in nearly every sector embracing remote or hybrid work and the digital transformation that only accelerates with each passing season." Perhaps this is art imitating life or life imitating art. But might it be life imitating art imitating life?

We live so much of our lives in our homes now, with interactions online, so people are hungry for optimism, fun, whimsy, and play. And Mills says that we are all even hungrier for authentic, meaningful connections and we are determined to protect the wellbeing of ourselves and loved ones. An analysis of search requests in Adobe Stock reflects these desires.

Credit: Left: Adobe Stock / MiriamDraws, Right: Adobe Stock / Gerardo.

In visual trends, Mills notes an increasing power of playfulness that reflects comfort, energy, and joy. She cites commercial projects that engage consumers and encourage them to stay positive, find the small joys in life, and keep keeping on despite the state of flux that has remained through the pandemic. You may have noticed this in some television commercials that made you smile. There are still lots of messages that are easy to tune out, but there are also messages that are entertaining even after we've seen them several times. Mills also says the pandemic has caused people to struggle with increased stress, depression, and burnout. Visual trends that bring attention to the importance of mental health stand out.

Concerns about the environment and climate change are at an all-time high, Mills says. Unless you've been off the planet or sleeping for the past year, you've noticed that floods and fires are becoming commonplace, even in places where they're usually rare. Mills says "The visual trend we’ve dubbed Prioritize Our Planet is more than just a continuation of the move toward greater sustainability." The new trend shows that there is a greater awareness and sophistication around environmental topics. This has pushed major brands to emphasize their alignment with values that are consistent with efforts to reverse climate change.

Credit: Left: Adobe Stock / MiriamDraws, Right: Adobe Stock / Gerardo.

Design trends center around what Mills calls Soft Pop and New Naturalism. Soft Pop is a trend defined by fun, pliable forms and examining our relationship to objects in 3D cartooning, character narrative, and what she refers to as squishy appearances. New Naturalism aesthetics are defined by their clean modernism and emphasis on the organic.

Another trend involves otherworldly visions. Maybe this is why television programs such as Doctor Who are so popular. "Otherworldly Visions draws from a slightly cynical yet imaginative and progressive alternative reality," according to Mills. The trend inherits high-tech elements from cyberpunk and sci-fi aesthetics, she says, yet takes them to an entirely new place with lush, surreal gradients, 3D surfaces, and textures.

Credit: Adobe Stock / Maskot

Trends in motion graphics are affected by what Mills calls a Metaverse Mix, which has nothing to do with the name of Facebook's new parent company. Still, the term metaverse is trending as creatives attempt to navigate and describe a digital world that exists beyond the partly digital and partly analog world in which we live now. Mills says there's an emphasis on movement to represent physical and emotional connections between people. Brands have increased the use of rhythm, dancing, and movement in their advertising to demonstrate those connections.

Mills says text, captions, or subtitles are an absolute must for social content in 2022. Why? Because most social video is viewed without sound, text overlays have become an essential part of creating successful video campaigns for channels like Instagram, TikTok, and Facebook. Mills notes that accessibility for deaf people has become a higher mainstream priority.

To read the full report by Brenda Mills, visit the Adobe blog.

Liker Tries To Beat Inertia As Tribel

The former Liker social media site is back from the dead with new security and a new name.

Founded in 2018, Liker went silent in March 2021 after being hacked, according to the founders, by fans of former president Trump. A message on the site said it would return in a few days. That was later changed to "four to eight weeks". Although the new Liker, now called Tribel, has been in beta testing for a while, it didn't actually return until the first week of January (approximately 35 weeks).

The new platform attempts to be like Facebook, like Twitter, and like Reddit. To use it like Facebook, sign up, invite friends from Facebook, add them as friends on Tribel, and you'll see their posts in your Friends feed. To use Tribel like Twitter, filter the feed by topic categories, which are similar to Twitter hashtags. If you want Tribel to work like Reddit, think of audience categories as subreddits, then choose an audience category for posts.

Tribel is considerably different from Facebook, Twitter, and Reddit, though. It's so different that there's a special page that describes how Tribel works.

The company selected 15,000 Liker members to participate in a soft launch of the new service. When Liker went down, there were about 465,000 members. That compares to Facebook's nearly 3,000,000,000 members. Then, around the first week of January, the service was opened to everyone. It's unclear how many former Liker users are now Tribel subscribers.

It's worth noting that Tribel was founded by Omar and Rafael Rivero, who also operate the left-leaning Occupy Democrats news site. The Media Bias/Fact Check website rates Occupy Democrats as extreme left leaning and says the content is not particularly factual, which is approximately the same rating Breitbart gets in the right-leaning part of the spectrum.

The terms of service are explicit in stating that users are responsible for everything they post, that what they post must be "appropriate in tone and not intended to harm others", and that their posts must comply with all applicable laws. Violating the terms can have serious consequences. Specifically:

Tribel is entitled, at its sole discretion, to suspend, restrict, or terminate any account "without notice of any kind." Users are solely responsible for the veracity and accuracy of all posts, and "there will be no tolerance for abusive content nor abusive users." The developers also specify no tolerance for misinformation related to the Covid-19 pandemic, whether in posts or comments. Such content will be deleted and the poster will be banned temporarily or permanently.

We'll have to wait to see how this plays out, but don't expect it to eliminate Facebook anytime soon.

Twenty Years Ago

1 GB CompactFlash About To Ship

In 2002, the thought of a huge 1GB CompactFlash Type II was exciting. Lexar Media said the card had a sustained write speed of 2.4MB per second. It was offered in Lexar's Professional Series, intended primarily for professional photographers. The price: around $1200.

Back then amateur photographers often used medium-resolution settings and JPEG format to pack more images per card, but pros needed all the resolution they could get. Professionals often used TIFF files and that, I said, means each image may consume 15MB or more.

CompactFlash cards are still available, but most cameras now use smaller SD cards. Lexar's $110 Professional CompactFlash card holds 128GB of data (more than 100 times what the card I was so excited about 20 years ago held) and it sells for about one tenth the price.
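For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above: roughly $1200 for 1GB in 2002 and $110 for 128GB today. The variable names are just for illustration, and both prices are approximate, so treat the results as ballpark numbers.

```python
# Figures quoted in the text; both prices are approximate.
old_capacity_gb, old_price_usd = 1, 1200    # 2002 Lexar 1GB card
new_capacity_gb, new_price_usd = 128, 110   # current Lexar 128GB card

capacity_ratio = new_capacity_gb / old_capacity_gb   # how many times larger
price_ratio = new_price_usd / old_price_usd          # fraction of the old price
old_cost_per_gb = old_price_usd / old_capacity_gb
new_cost_per_gb = new_price_usd / new_capacity_gb

print(f"Capacity: {capacity_ratio:.0f} times larger")
print(f"Price: {price_ratio:.1%} of the 2002 price")
print(f"Cost per GB: ${old_cost_per_gb:.2f} then, ${new_cost_per_gb:.2f} now")
```

The capacity ratio comes out to 128 and the price ratio to a little over 9 percent, which matches "more than 100 times" the storage at "about one tenth the price".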