
Overlapping Speech Transcription Could Help Contend with ATE Complexity

By Keith Schaub, Vice President of Business Development for US Applied Research & Technology, Advantest America Inc.

Introduction

Increasingly complex chipsets are driving corresponding increases in semiconductor test system hardware and software. Artificial intelligence offers innovative, ingenious opportunities to mitigate the challenges that test engineers and test-system operators face and to improve security and traceability. Advantest, which fields thousands of test systems worldwide that test billions of devices per year, is studying several ways in which AI can help.

Initial work has involved facial recognition and overlapping speech transcription (the latter being the focus of this article), both of which can reduce the need for a mouse and keyboard interface. With a mouse and keyboard, operators can leave themselves logged in when other operators take over, creating security vulnerabilities and making it difficult, for example, to trace which operator was on duty during a subsequently detected yield-limiting event. A voice-recognition system could facilitate identifying which operators gave which commands.

Industrial cocktail-party problem

Implementing a voice-recognition system in a test lab or production floor presents its own challenges, with air-cooled systems’ fans whirring and multiple teams of engineers and operators conversing—creating an industrial version of the cocktail-party problem.

To address this problem, Advantest has developed a fast, multi-speaker transcription system that accurately transcribes speech and labels the speakers.

The transcription process comprises three main steps: speaker separation, speaker labeling, and transcription. For the first step, a real-time, GPU-based TensorFlow implementation of the deep-clustering model recently developed by Mitsubishi1 separates the mixed-source audio into discrete individual-speaker audio streams. A matrix of audio frequency-domain vectors obtained via the short-time Fourier transform (STFT) serves as the input to this model. The model learns feature transformations called embeddings using an unsupervised, auto-associative deep network, followed by a traditional k-means clustering step (recent implementations have shown significant improvements over traditional spectral methods); the resulting clusters are used to generate the single-speaker audio.
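
For illustration only, a minimal Python sketch of this separation step follows. It is not Advantest's implementation: the embedding network is an untrained stand-in for the trained deep-clustering model, and the sample rate, window length, embedding size, and two-speaker count are placeholder assumptions.

# Sketch of the separation step: STFT -> embedding network -> k-means -> binary masks.
# The network below is an untrained stand-in for the trained deep-clustering model.
import numpy as np
import tensorflow as tf
from scipy.signal import stft, istft
from sklearn.cluster import KMeans

FS, NPERSEG, EMBED_DIM, N_SPEAKERS = 8000, 256, 20, 2  # illustrative constants

def separate(mixture):
    # Short-time Fourier transform of the mixed-source audio.
    _, _, Z = stft(mixture, fs=FS, nperseg=NPERSEG)           # (freq, time)
    log_mag = np.log1p(np.abs(Z)).T[np.newaxis]               # (1, time, freq)
    freq_bins = log_mag.shape[-1]
    # Stand-in for the deep-clustering embedding network (trained BLSTM layers in the paper).
    net = tf.keras.Sequential([
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(300, return_sequences=True)),
        tf.keras.layers.Dense(freq_bins * EMBED_DIM, activation="tanh"),
    ])
    emb = net(log_mag).numpy().reshape(-1, EMBED_DIM)         # one embedding per T-F bin
    # k-means over the embeddings; each cluster defines a binary mask for one speaker.
    labels = KMeans(n_clusters=N_SPEAKERS, n_init=10).fit_predict(emb)
    masks = labels.reshape(log_mag.shape[1], freq_bins).T     # back to (freq, time)
    sources = []
    for k in range(N_SPEAKERS):
        _, audio = istft(Z * (masks == k), fs=FS, nperseg=NPERSEG)
        sources.append(audio)                                 # estimated single-speaker audio
    return sources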

The second step involves an implementation of Fisher Linear Semi-Discriminant Analysis (FLD)2 for an accurate diarization process that labels the speakers for each audio stream generated in the separation step. The third and final step uses the Google Cloud Speech-to-Text API to transcribe the audio streams, assigning a speaker to each based on the diarization step.
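
A hedged sketch of this final transcription step is shown below, assuming the google-cloud-speech 2.x Python client with credentials configured in the environment. The labeled_streams input (speaker label mapped to 16 kHz LINEAR16 WAV bytes) is a hypothetical hand-off from the diarization step, not a documented interface.

# Sketch only: transcribe each separated, diarization-labeled stream with the
# Google Cloud Speech-to-Text API and keep the speaker label with the text.
from google.cloud import speech

def transcribe(labeled_streams, sample_rate_hz=16000):
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hz,
        language_code="en-US",
    )
    transcript = []
    for speaker, wav_bytes in labeled_streams.items():        # hypothetical input format
        response = client.recognize(
            config=config, audio=speech.RecognitionAudio(content=wav_bytes))
        text = " ".join(r.alternatives[0].transcript for r in response.results)
        transcript.append((speaker, text))
    return transcript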

Figure 1: This system-flow diagram illustrates the steps in the overlapping speech-transcription process, from the audio input to the labeling of the speakers.

Figure 1 illustrates the system flow of the entire process. During the first step, the clustering separates the audio. The spectrogram of the mixed and separated audio (Figure 2) makes it easy to visualize the separation taking place.

Figure 2: A view of the spectrogram of the mixed and separated audio helps illustrate how the separation takes place.

Testing the model

We tested the model on the TED-LIUM Corpus Release 3,3 a collection of TED Talk audio and time-aligned transcriptions. To measure system accuracy, we compared our system-generated transcriptions against the ground-truth transcriptions using word error rate (WER), defined as the proportion of word substitutions, insertions, and deletions incurred by the system. Our system demonstrated a WER of 26%, versus a ground-truth WER of approximately 14%. Overall, the generated transcripts were largely intelligible, as shown by the following example:

  • Actual Audio

“Most recent work, what I and my colleagues did, was put 32 people who were madly in love into a function MRI brain scanner, 17 who were. . .”

  • System Transcription

“Most recent work but I am my colleagues did was put 32 people who are madly in love into a functional MRI brain scanner 17 Hoover.”

As shown, the results are largely readable, even with the current word error rate.
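
For reference, the WER figures above follow from a standard edit-distance computation over words; a minimal sketch:

# Word error rate: (substitutions + insertions + deletions) / reference word count.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + sub)      # substitution (or match)
    return d[len(ref)][len(hyp)] / len(ref)           # assumes a non-empty reference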

Often, the audio output from the separation step contains artifacts; the resulting audio remains readily understandable to humans but is more difficult for current speech-to-text converters to handle. Thus, we get an output like this:

  • Actual Audio

“Brain just like with you and me. But, anyway, not only does this person take on special meaning, you focus your attention on them…”

  • System Transcription

“Brain, it’s like with your and name. But anyway, I don’t leave something special meeting. I’m still get your attention from you a Grande, AZ them…”

Thus, when the clustering algorithm becomes unstable, the transcription is also erroneous. However, many of these errors can likely be fixed in future work.

Overall, overlapping speech has presented a daunting problem for many applications, including automated transcription and diarization. Recent innovations in learned embeddings for speaker segmentation, however, make it possible to produce accurate, real-time transcription of overlapping speech. The clustering model is the most computationally expensive step, but because it is implemented in TensorFlow and is GPU-optimized, the system can run in real time.

Nevertheless, implementations of such systems are currently quite limited owing to relatively low accuracy, which we believe is largely a result of the clustering model using binary (discrete) masks1 to output each speaker's audio. We will investigate continuous masking to improve the audio quality enough for live transcription of live events.
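
Continuous masking could take several forms; one illustrative possibility (not necessarily the approach we will adopt) is to replace the hard k-means assignment with a soft assignment derived from each time-frequency embedding's distance to the cluster centroids, sketched here using the same embedding and STFT conventions as the separation sketch above.

# Illustrative continuous (soft) masks from embedding-to-centroid distances.
import numpy as np
from scipy.special import softmax
from sklearn.cluster import KMeans

def soft_masks(embeddings, stft_shape, n_speakers=2, temperature=1.0):
    # embeddings: (time*freq, D) array from the embedding network.
    # stft_shape: (freq, time) shape of the mixture STFT.
    km = KMeans(n_clusters=n_speakers, n_init=10).fit(embeddings)
    # Distance of every time-frequency embedding to every cluster centroid.
    dists = np.linalg.norm(
        embeddings[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
    # Closer centroids get larger weights; temperature controls mask softness.
    weights = softmax(-dists / temperature, axis=1)   # (time*freq, n_speakers), rows sum to 1
    freq, time = stft_shape
    return [weights[:, k].reshape(time, freq).T for k in range(n_speakers)]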

Virtual engineering assistant for ATE

Ultimately, we envision AI techniques such as overlapping speech transcription being useful in developing an AI-based engineering assistant for ATE, as outlined in a presentation at the 2018 International Test Conference. In the high-decibel environment of the test floor, overlapping speech transcription could help solve the cocktail-party problem, allowing the virtual assistant (a test-engineering equivalent of Iron Man's J.A.R.V.I.S.) to respond to one particular engineer or operator.

Overlapping speech transcription is just one way of interacting with such an assistant. At Advantest, we have also experimented with facial recognition, using software that can create what is essentially a “face fingerprint” from a single image, eliminating the thousands of training images that traditional networks require. We have found that the technology performs well across a variety of angles (photographing the subject from 30 degrees left or right, for example) and a variety of distances (image sizes). Eventually, such technology might enable the virtual assistant to intervene proactively when it recognizes a look of frustration on an engineer's face, intuiting what information may help solve the problem at hand.

Beyond speech-transcription and facial-recognition capabilities, a virtual engineering assistant would embody a wealth of highly specialized domain knowledge, with many cognitive agents offering expertise extending from RF device test to load-board design. Such an assistant would be well versed in test-system features that might be needed only occasionally over the long lifetime of expensive equipment with a steep learning curve. Ultimately, such an assistant could exhibit intuition, just as game-playing AI machines do: they have mastered “perfect information” games like checkers and chess and have become competitive at games like poker, which involves imperfect information and the ability to bluff. Although computers haven't traditionally been thought of as intuitive, it might turn out that intuition evolves from deep and highly specialized knowledge of a specific domain.

References

1. Hershey, John R., et al., “Deep Clustering: Discriminative Embeddings for Segmentation and Separation,” 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. https://ieeexplore.ieee.org/document/7471631

2. Giannakopoulos, Theodoros, and Sergios Petridis, “Fisher Linear Semi-Discriminant Analysis for Speaker Diarization,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 7, 2012, pp. 1913-1922. https://ieeexplore.ieee.org/document/6171836

3. Hernandez, François, et al., “TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation,” Speech and Computer Lecture Notes in Computer Science, 2018, pp. 198-208. https://arxiv.org/abs/1805.04699



Flexible Automation Infrastructure Supports Continuous Test-program Integration and Delivery

By Stefan Zügner, V93000 Product Manager; Jan van Eyck, Product Owner, SW R&D; Kheng How, Senior Staff Software Engineer; and Daniel Blank, Senior Application Consultant, Center of Expertise

Test engineers today are facing many challenges working within a collaborative test-program-development environment. Fortunately, a concept called continuous integration, or CI, can be implemented within the test-program development process to help meet these challenges.

Today, each engineer is likely to be part of a team of many engineers working on different parts of the same test program concurrently. Furthermore, test engineers developing IP blocks may be spread across widely scattered geographical locations.

The result of the complexity is that developers issue multiple program changes (commits) every day. Each commit changes the test program, and any commit may break the test program. Consequently, at any given time, the overall quality of the test program may be unknown, and problems can require significant time and resources to discover, debug, and fix. In addition, the longer it takes to discover a bug, the more time and expense it takes to fix it.

Continuous integration addresses today’s test challenges

Collaborative development typically relies on an existing source-code management system (for example, git or SVN) to track changes, but by the time an integrator discovers issues, it is usually too late to fix them in an efficient and timely manner. With continuous-integration tooling, validation tests can be triggered automatically whenever changes are committed to the source-code-management repository, allowing for frequent integration and timely checks without additional overhead for the individual developer (as illustrated in Figure 1). Continuous delivery, in addition, automates the release-to-production process and therefore allows new test-program releases at essentially any point in time.

Figure 1: The continuous integration workflow embraces automated validation tests for each change to a test program.

The difference between traditional and continuous integration processes is illustrated in detail in Figure 2. With the traditional process, different engineers (Alice, Bob, and Charlie in Figure 2) independently develop sections of a test program, and yet another engineer (David) performs integration and test just before the program’s release. If David’s test finds a bug, deadlines could be at significant risk because of the time it may take Alice, Bob, or Charlie to debug and rework their code and resubmit it to David for further integration and test.

Figure 2: A traditional development process (top) can put release deadlines at risk. In contrast, a continuous integration process (bottom) reduces time-to-market and time-to-quality.

The continuous-integration process, in contrast, delivers continuous and systematic validation throughout the entire development cycle, providing immediate feedback to engineers such as Alice, Bob, and Charlie on the quality of their commits. The immediate feedback made possible through continuous integration reduces both time-to-market and time-to-quality. In addition, the automated test-program validation process includes programmatic checks, which allow engineering teams to establish quality processes in a repeatable manner.

Tools for continuous integration

Several software tools can serve in a continuous-integration system.1 One of them is Jenkins (https://jenkins.io/), an open-source and widely used automation tool with support from an active community that makes information widely available on the web. 

Jenkins is extensible and contains comprehensive plugins for functions such as source-code management (for example, git and SVN) and email notification.

In an implementation in which Jenkins is employed in a continuous-integration system (Figure 3), every commit can trigger the automatic running of validation jobs, offline or online, according to the job setup. The system stores and manages execution logs and test results while sending out notifications and reports for each execution. With the continuous-integration system automating test-program validation, test-program developers can focus on development.

Figure 3: In a Jenkins-based continuous-integration implementation, each commit can trigger validation jobs automatically, with execution logs, results, notifications, and reports managed by the system.

Adding Smart CI to SmarTest 8

For semiconductor test-program development, Advantest offers its SmarTest 8 software for the V93000 platform.2 SmarTest 8 builds on previous versions to offer fast test-program development, efficient debug and characterization, high throughput due to automated optimization, faster time to market, ease of test-block reuse, and efficient collaboration.

To support continuous integration and delivery for test-program development in the SmarTest 8 environment, Advantest offers the Smart CI solution. The Smart CI solution includes the Smart CI custom Jenkins server plugin, which is tailored for SmarTest 8. The plugin offers simple validation-job setup through “fill-in-the-blanks” forms, and it supports both freestyle setups (GUI-based, one client) and pipeline setups (script-based, with a single validation job distributed across multiple clients).

Tightly integrated with the plugin is the Smart CI Client for SmarTest 8, which provides a command line interface (CLI) to enable continuous integration and delivery for SmarTest 8 test programs. Smart CI Client can also be used for other CI solutions not incorporating Jenkins.

Also included in the Advantest Smart CI solution are Docker images for each individual SmarTest 8 release, simplifying Smart CI deployment. The Docker images offer a preconfigured setup and enable virtual-machine (VM) and cloud installations. Multiple SmarTest 8 versions and offline jobs can also run concurrently on the same workstation.

Smart CI works out-of-the-box

Smart CI works out of the box. Just enter a test-program name, and the program compiles, loads, and executes, comparing results against datalog and throughput references.
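
Conceptually, such an out-of-the-box job reduces to a short script that Jenkins runs on every commit. The sketch below is only an illustration of that flow; the smartci-client command name and its options are placeholders, not the real Smart CI Client interface.

# Illustrative validation-job flow only; the CLI command and options are placeholders.
import subprocess
import sys

def run(step, *args):
    # Any non-zero exit code fails the whole Jenkins job.
    if subprocess.run(["smartci-client", step, *args]).returncode != 0:
        sys.exit(f"Validation failed at step '{step}'")

def validate(test_program):
    run("compile", test_program)
    run("load", test_program)
    run("execute", test_program, "--offline")
    run("compare", test_program,
        "--datalog-reference", f"refs/{test_program}.datalog",
        "--throughput-reference", f"refs/{test_program}.throughput")

if __name__ == "__main__":
    validate(sys.argv[1])   # Jenkins passes the test-program name as the only input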

In addition to working out of the box, Smart CI offers Advantest templates that can be adapted with low to medium effort by a lead test engineer to validate a test program with customer-specific checker scripts. Customized results are available via offline execution.
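
As an illustration of what a customer-specific checker script might look like, the sketch below assumes a simple hypothetical datalog format (one test name and measured value per line) and a dictionary of limits; the real checker interfaces and datalog formats are defined by the test program and the templates.

# Hypothetical checker: verify that each measured value in a datalog is within limits.
def check_limits(datalog_path, limits):
    # limits: {"test_name": (lower_bound, upper_bound)}
    passed = True
    with open(datalog_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2 or parts[0] not in limits:
                continue
            name, value = parts[0], float(parts[1])
            lo, hi = limits[name]
            if not lo <= value <= hi:
                print(f"FAIL: {name}={value} outside [{lo}, {hi}]")
                passed = False
    return passed

# Example with hypothetical test names and limits:
# check_limits("run.datalog", {"vdd_idle_current_mA": (0.0, 1.5)})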

Beyond the continuous integration enabled by Smart CI today, Advantest's roadmap calls for the future implementation of continuous delivery, in which a test program (optionally encrypted) can be exported for production and test-program validation can take place in a production environment, including a test cell. As part of this, Smart CI will also offer integration with the built-in or custom release checkers of TP360.3 TP360 is a software package that helps V93000 customers increase test-program development efficiency, optimize test-program quality and throughput, reduce cost of test, and increase test-program release and correlation efficiency. TP360 is based on an open framework that enables users to add new applications easily and flexibly.

As does continuous integration, continuous delivery will work out of the box—an engineer need only enter a test-program name.

Conclusion

In summary, Smart CI enables automated continuous integration and delivery for SmarTest 8, saving test-program development time and effort and boosting engineering capacity by 10% to 15%. Smart CI ensures test-program quality through fully automated and systematic test-program quality checks throughout the entire development cycle, and it enables the release of runtime-ready test programs at any time. Furthermore, it fosters discipline in engineering teams, enabling team members to consistently deliver high quality, and it provides clear project status reports anytime, thereby increasing manageability and predictability. Smart CI Docker images simplify installation and maintenance, the Advantest Jenkins server plugin supports easy validation job setup, and Advantest provides comprehensive support and continuous enhancements.

References

1. “Comparison of continuous integration software,” Wikipedia. https://en.wikipedia.org/wiki/Comparison_of_continuous_integration_software

2. Donners, Rainer, “A Smarter SmarTest: ATE Software for the Next Generation of Electronics,” GO SEMI & BEYOND, August 3, 2017. http://www.gosemiandbeyond.com/a-smarter-smartest-ate-software-for-the-next-generation-of-electronics/

3. Zhang, Zu-Liang, “TP360—Test Program 360,” Video, VOICE 2013. https://vimeo.com/80319228



Blockchain May Soon Be Everywhere

By Judy Davies, Vice President of Global Marketing Communications, Advantest

Just as disruptive integrated circuit (IC) technologies have been invented, evolved, tested, and adopted for widespread use, the same is true for innovative applications such as blockchain. Conceived in 2008, blockchain is gaining momentum as a means of enabling the secure distribution of data for digital transactions. The technology, which has its roots in cryptocurrency, appeals to businesses for two key attributes: it serves as a decentralized ledger, and it protects any entered data from being modified. Because it can be used to record transactions between parties efficiently, permanently, and verifiably, the technology may become a key asset in combating identity theft and online fraud.

Blockchain has the potential to streamline and safeguard digital business operations for companies of all sizes, from large chains to small online startups. In one of its more unusual applications, blockchain is being used to certify the authenticity and history of natural diamonds, enabling buyers to distinguish the real thing from synthetic gems and fake stones. This could prove hugely valuable in helping to eliminate support for blood diamonds and in confirming that an engagement ring's gems are genuine.

Another application is a blockchain ledger, which can extend “smart” connected technology beyond phones, appliances, and cars to include stock certificates, property deeds, insurance policies, and other important documents. By maintaining current ownership records for these papers, blockchain can become, in essence, a “smart key” that allows access to the permitted person(s) alone. Government, health care, finance, and other fields that rely on unbreachable documentation could be transformed by this capability.

Verifying the path from farm to table to help ensure food safety is another potential use for blockchain. For example, by keeping a registry of the specific field or section from which a head of lettuce was harvested, blockchain may help to quickly pinpoint the sources of dangerously tainted foodstuffs. This will keep consumers safer from illness, as well as prevent unnecessary disposal of uncontaminated food. And if you sometimes wonder whether your produce really is organic or your turkey free-range, this technology can assure you of your food’s integrity.

While hacking is a fear with any networked technology, blockchain may actually prove to be the most impervious to being hacked. Instead of utilizing a central data storehouse, all information on a blockchain is decentralized, encrypted, and cross-checked by the entire network. With this distributed design, there is no third-party data center for transactions. Each user's computer, or node, has a complete copy of the ledger, so even if one or two nodes are lost, system-wide data loss is not a risk. Moreover, using encryption means that file signatures can be verified across all ledgers, on all networked nodes, to ensure they haven't been altered. If any unauthorized change is made, the traced signature is invalidated.
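
As a toy illustration of this tamper-evidence property (real blockchains add digital signatures, consensus, and peer-to-peer replication on top of this idea), each block can embed the hash of its predecessor, so altering any one record breaks verification of every later block:

# Toy hash chain: changing one record invalidates the chain from that point on.
import hashlib

def block_hash(prev_hash, data):
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "0" * 64                     # placeholder "genesis" hash
    for data in records:
        h = block_hash(prev, data)
        chain.append({"data": data, "prev": prev, "hash": h})
        prev = h
    return chain

def verify(chain):
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False                           # detectable by every node holding the ledger
        prev = block["hash"]
    return True

chain = build_chain(["deed: lot 12 -> Alice", "deed: lot 12 -> Bob"])
assert verify(chain)                               # untouched chain checks out
chain[0]["data"] = "deed: lot 12 -> Mallory"       # unauthorized change...
assert not verify(chain)                           # ...is immediately detected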

Blockchain’s design also allows data tracking with validity that can be easily confirmed. Its transparency offers a welcome alternative to the way that much of our personal, online information has been dissected and manipulated for financial gain by some well-known technology behemoths. With its nearly unlimited breadth of applications, blockchain technology looks well-positioned to make the leap from managing digital currencies to becoming the next-generation solution for our online personal and work lives.


VOICE 2019 Developer Conference Gears Up for an Exciting Technical Program

Advantest is previewing the technical program and complete list of keynote speakers for both locations of its VOICE 2019 Developer Conference. For the first time, the conference will be held in Scottsdale, Arizona, on May 14-15 and in Singapore on May 23 under the unifying theme “Measure the Connected World and Everything in It℠.”

“The semiconductor industry is using technology to build a smarter world,” said Adam Styblinski, technical chairman of the VOICE 2019 Developer Conference and AMD product development engineer.  “With presentations on hot topics including 5G, MIMO and mmWave advancements, VOICE 2019 keeps attendees up to date on cutting-edge technologies and the testing challenges they present.”

VOICE 2019 Program

The heart of VOICE continues to be its comprehensive learning and networking opportunities comprised of a technical program featuring more than 90 presentations across both locations with submissions from authors representing 28 companies and 10 countries; Partners’ Expos; social gatherings; Technology Kiosks; and stimulating keynote speakers.  This year’s technical tracks will focus on device/system level test, the internet of things (IoT), test methodologies, hardware and software design integration, the latest hot topics and – for the first time in 2019 – test solutions enabled by Advantest’s T2000 platform.  Each location will host a technology kiosk showcase offering attendees the opportunity to interact directly with Advantest product experts.

The general session on May 14 in Scottsdale will include an Advantest technology discussion panel moderated by Hans-Juergen Wagner, senior vice president of the SoC business group and managing executive officer at Advantest Corporation.  Four of the company’s leading test experts – Rich Lathrop, Hagen Goller, Masayuki Suzuki and Koichi Tsukui – will sit on the panel and field questions from VOICE attendees.

VOICE 2019 Keynotes

On the second day of VOICE in Scottsdale, the program will feature two keynote speeches by dynamic technology leaders.  The first speaker, Dr. Walden “Wally” Rhines, CEO emeritus of Mentor, a Siemens Business, is a recognized spokesperson for the semiconductor and EDA industries.  The second keynote speech, sponsored by EAG Eurofins Engineering Science, will be given by Dr. Hugh Herr, renowned engineer, biophysicist and leader of MIT Media Lab’s Biomechatronics Group.  Dr. Herr is building the next generation of robotic prosthetics, sophisticated devices that aid human movement by mimicking nature.

For VOICE Singapore, the featured keynote speech on “Industry 4.0: Preparing for the Future of Work” will be delivered by Mark Stuart, co-founder of Anagram Group, a global corporate-training company based in Singapore that won the British Chamber of Commerce’s 2018 “Future of Work” award for contributions in developing future-ready leaders and transforming organizations through innovation. Stuart is a speaker, trainer and executive coach specializing in leadership and innovation. He works with more than 170 government and corporate clients in Singapore, Asia and the UK across a wide range of industries.
Read more about all the VOICE 2019 keynote speakers at https://voice.advantest.com/keynotes/.

VOICE 2019 Quick Links

Registration

Agenda

Technical Program

Hotel Reservations

Sponsors

Spread the Word About VOICE

Questions: mktgcomms@advantest.com


New T2000 Module Has Industry’s Highest-Performance Analog Digitizer for Cost-Efficient Testing of High-Res Audio ICs

Advantest’s new GPWGD high-resolution module features the industry’s highest-performance analog digitizer, which supports testing of high-resolution audio digital-to-analog converters (DACs) embedded in power-management ICs (PMICs) as well as stand-alone high-resolution audio devices. The module’s innovative measurement technique operates over an ultra-high dynamic range, achieving unprecedented accuracy in analog testing from device characterization to mass production without requiring complex performance boards or additional test and measurement instruments on the T2000 test platform.

High-resolution audio features both a wider dynamic range and an improved sound source compared with CDs. The proliferation of electronic devices capable of supporting high-resolution audio – including smartphones, wireless audio components for wearable electronics and home theaters, automotive navigation systems, gaming consoles, 4K and 8K televisions, and other next-generation products – has led to an increase in the number of PMICs with embedded DACs, which require high-dynamic-range testing with 24-bit or 32-bit resolution.

When used on the T2000 platform, the GPWGD high-resolution module provides the versatility to test both PMICs and high-resolution audio DACs using the same system configuration.  This helps users to save on their capital investments while also reducing test cycle times.

The module’s upward compatibility and the high-resolution functionality of its digitizer enable industry-leading measurements with both a signal-to-noise ratio (SNR) and a dynamic range (DR) of 130 dB, surpassing the analog performance of other testers typically used by developers of audio ICs. In addition, the unit’s massive parallel site testing capability leverages twice the number of sites compared to other systems on the market, resulting in higher throughput and a lower cost of test.

The new GPWGD high-resolution module’s extendible design allows it to be seamlessly integrated into either laboratory or production environments for existing device types as well as new high-resolution audio ICs.
