Advantest Corporation has extended its V93000 system
to cost-efficiently test the next generation of 5G-NR radio frequency devices
and modules on a single scalable platform.
The new V93000 Wave Scale Millimeter solution has the high multi-site
parallelism and versatility needed for multi-band millimeter-wave (mmWave)
frequencies. Its operational range of
24 GHz to 44 GHz and 57 GHz to 70 GHz enables customers to reduce their time to
market for new designs running at mmWave frequencies.
The highly integrated system is architecturally
distinct from other solutions, providing as many as 64 bi-directional
mmWave ports based on a modular implementation.
This allows not only the use of different 5G and WiGig frequency
modules, but also the addition of new modules as new frequency bands are rolled
out worldwide. Based on an innovative
mmWave card cage with up to eight
mmWave instruments, this highly versatile and cost-effective ATE solution
performs on the level of high-end bench instruments. The scalable system’s wideband testing
functionality enables it to handle full-rate modulation and
demodulation for ultra-wideband (UWB) devices, 5G-NR mmWave signals with modulation bandwidths up to 1 GHz, WiGig
(802.11ad/ay) signals with bandwidths up to 2 GHz and antenna-in-package (AiP) devices, in addition to
beamforming and over-the-air testing.
In delivering the industry’s first integrated,
multi-site mmWave ATE test solution, Advantest is providing a pathway for
customers to lower the cost of test for their current and upcoming 5G-NR devices
while leveraging their existing investments in our well-established Wave Scale
RF testers. In particular, OSAT
companies can benefit greatly from this flexible, scalable mmWave ATE solution.
Early installations at customers testing both 5G
and WiGig multi-band devices have been completed. Advantest is now accepting orders for the new
mmWave solution.
The use of artificial
intelligence (AI) techniques such as machine learning is growing as the semiconductor
industry discovers new ways to use these approaches to do things that humans cannot.
In this issue, we talk with Keith Schaub, Vice President of Business
Development for Advantest America’s Applied Research Technology and Ventures,
about unique research Advantest is conducting with the University of Texas at Dallas
to integrate machine learning into a challenging area of chip development: RF
transceiver design, test and manufacturing.
Q. What led Advantest
to begin investigating the use of machine learning for this application?
A.
Machine learning has been around for a long time. It’s actually a subset of AI,
by which machines learn how to complete tasks without being explicitly
programmed to do so. There have been many startups over the years that looked
to leverage machine learning, but it has never really been implemented
within the semiconductor industry. As we have begun to do more work looking at
the potential advantages of using AI, we’ve come to realize there are some
practical applications by which the industry could greatly benefit.
Q. What is the approach
you’re developing for implementing machine learning?
A.
The approach we’ve been working on with UT Dallas is a proof of concept for how
to take a machine learning method and apply it to semiconductor manufacturing
and test – specifically, RF transceivers. Machine learning is much better
suited to analog than to digital devices. Digital is a series of 1s and 0s, so
the system can either recognize something or not, but there’s no ability to drill
down in terms of granularity in order to leverage the more powerful aspects of
machine learning. Analog systems require far more data because they’re more
complex, making them a better environment for machine learning.
In
RF applications, the numerous transmission protocols, large amounts of data,
and large bandwidths with high data rates create challenges that call for the development
of new algorithms for which modern machine learning is well suited. RF
transceivers are affected by a variety of impairments, such as compression,
interference and offset errors, as well as IQ imbalance. IQ signals form the
basis of complex RF signal modulation and demodulation, both in hardware and in
software, as well as in complex signal analysis.
Figure
1 shows a typical RF transceiver circuit, with a number of potential noise
errors highlighted in red. A graphical representation of the signal quality can
be generated to correspond with each error (Figure 2). The challenge for the
operator is knowing which error generated which plot, and which errors are the
most problematic.
The
approach we’ve developed is a machine learning-based solution for noise
classification and decomposition in RF transceivers. The machine learning system can be trained to learn and then
identify and match up each impairment to each noise plot; this is something
that would be virtually impossible for a human to do.
Figure 1. RF circuit
with potential noise errors in red.
Figure 2.
Constellation plot showing signal quality impairments caused by various noise
errors.
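To make the impairments in Figures 1 and 2 more concrete, the following minimal Python sketch (an illustration only, not the UT Dallas/Advantest implementation; all parameter values are assumptions) synthesizes a 16-QAM constellation, applies a few of the impairments named above – IQ gain/phase imbalance, DC offset, compression and additive noise – and reports the resulting error vector magnitude (EVM):

```python
# Illustrative sketch: synthesize 16-QAM symbols and apply common transceiver
# impairments so the resulting "noisy constellation" resembles Figure 2.
import numpy as np

rng = np.random.default_rng(0)

def qam16(n):
    """Random 16-QAM symbols, normalized to unit average power."""
    levels = np.array([-3, -1, 1, 3])
    return (rng.choice(levels, n) + 1j * rng.choice(levels, n)) / np.sqrt(10.0)

def apply_impairments(x, gain_imb_db=0.5, phase_imb_deg=3.0,
                      dc_offset=0.02 + 0.03j, snr_db=30.0, comp_knee=1.2):
    """Simplified models of IQ imbalance, DC offset, compression and noise."""
    g = 10 ** (gain_imb_db / 20.0)
    phi = np.deg2rad(phase_imb_deg)
    i = np.real(x) * g                          # gain imbalance on the I branch
    q = np.imag(x) + np.real(x) * np.sin(phi)   # phase skew leaks I into Q
    y = i + 1j * q + dc_offset
    y = comp_knee * np.tanh(np.abs(y) / comp_knee) * np.exp(1j * np.angle(y))  # soft compression
    noise = rng.standard_normal(len(y)) + 1j * rng.standard_normal(len(y))
    y += noise * np.sqrt(10 ** (-snr_db / 10.0) / 2.0)
    return y

symbols = qam16(4096)
impaired = apply_impairments(symbols)
evm = np.sqrt(np.mean(np.abs(impaired - symbols) ** 2)) * 100  # EVM in percent
print(f"EVM with combined impairments: {evm:.1f}%")
```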
Q. How would this be
put to use in a manufacturing environment?
A.
Figure 3 illustrates how the machine
learning solution works. During the training process – this is literally how
the system learns to recognize and classify data – a set of constellation
points from early versions of the ICs being developed are fed into a machine
learning system. Extracted features are separated by category as either
noise-type classification or noise-level regression, with the system learning
what each type is and how to separate and recognize them by individual error.
This is indicated by the different colors assigned to each specific noise type.
This is particularly valuable because, while RF transceiver designs, like those
of most analog circuits, involve a high degree of customization, certain types
of noise errors can potentially occur regardless of the specific circuit.
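As a rough illustration of this training flow – using generic scikit-learn models and simple hand-crafted features as stand-ins for the proprietary approach described in the interview – a sketch might look like this:

```python
# Minimal, hypothetical sketch of the flow in Figure 3: constellations with a
# single dominant impairment are generated, simple features are extracted, then
# a classifier learns the noise type and a regressor learns its level.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
IMPAIRMENTS = ["iq_gain_imbalance", "dc_offset", "awgn"]

def make_constellation(kind, level, n=1024):
    levels = np.array([-3, -1, 1, 3])
    x = (rng.choice(levels, n) + 1j * rng.choice(levels, n)) / np.sqrt(10)
    if kind == "iq_gain_imbalance":
        x = np.real(x) * 10 ** (level / 20) + 1j * np.imag(x)
    elif kind == "dc_offset":
        x = x + level * (1 + 1j) / np.sqrt(2)
    elif kind == "awgn":
        x = x + level * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return x

def features(x):
    # Hand-crafted stand-ins for the "extracted features" of Figure 3.
    return [np.mean(np.real(x)), np.mean(np.imag(x)),
            np.std(np.real(x)) / np.std(np.imag(x)),
            np.mean(np.abs(x) ** 2)]

X, y_kind, y_level = [], [], []
for _ in range(600):
    kind = rng.choice(IMPAIRMENTS)
    level = rng.uniform(0.01, 0.5)
    X.append(features(make_constellation(kind, level)))
    y_kind.append(str(kind))
    y_level.append(level)

X = np.array(X)
Xtr, Xte, ktr, kte, ltr, lte = train_test_split(X, y_kind, y_level, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ktr)   # noise-type classification
reg = RandomForestRegressor(random_state=0).fit(Xtr, ltr)    # noise-level regression
print("type accuracy:", clf.score(Xte, kte))
print("level R^2    :", reg.score(Xte, lte))
```

In production mode, the same feature extraction and trained models would be applied to constellations captured from actual DUTs to produce the impairment report described above.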
Once
the training process is complete, the system can be put into use in production
mode with actual DUTs [devices under test], and use what it has learned through
the training process to apply models, identify the various types of errors and
provide an impairment report. The system doesn’t have to go through lengthy
downtime because the assessment can be completed quickly, and the resulting
report allows the user to determine which errors are most critical and need to
be addressed so that no damage or yield loss occurs.
Figure 3. Machine learning
process for RF transceiver noise classification and decomposition.
This approach can be used throughout
the test process – not only for device and system-level test, but also during design-for-test,
so that analog/RF designers can better simulate and understand whether their
designs will work. This is important due to the amount of hand/custom work and the number
of variables associated with analog device design.
Q. At what point do
you see this technique being broadly adopted in the industry? What challenges
would prevent this from occurring?
A. While the technology is mature enough
that it could be implemented right away, there are several reasons machine
learning has not yet been broadly adopted in the semiconductor industry. For
one, there haven’t been sufficient resources/datasets to support its widespread
use. For another, the industry is highly risk averse and concerned about
security, so companies don’t want to make their data – which is their valuable IP
– available for the machine learning process. They have it in the cloud, but in
their own individual clouds, which don’t talk to each other. My belief is that
use of machine learning will become widespread when the big IDMs [integrated
device manufacturers] take the initiative, and the rest of the industry will
follow suit.
NOTE:
Advantest’s Applied Research Technology and Ventures group would like to acknowledge the recent publication at the 2019 IEEE 37th VLSI Test Symposium (VTS) of a paper titled “Machine Learning-based Noise Classification and Decomposition in RF Transceivers,” which details the work described in this interview. The paper was jointly developed by Deepika Neethirajan, Constantinos Xanthopoulos, Kiruba Subramani, Yiorgos Makris (UT Dallas), Keith Schaub and Ira Leventhal (Advantest America).
The following article is
an adapted excerpt of a DesignCon 2019 Best Paper
Award winner. The full paper is included in the conference
proceedings, which can be purchased here.
By Giovani Bianchi, Senior Staff Engineer, and
José Moreira, R&D Engineer, Advantest Corp.; and Alexander Quint, Student, Karlsruhe Institute
of Technology, Germany
The opening of the millimeter-wave (mmWave) spectrum to the next
generation of mobile communications introduces mmWave-based communications to
the consumer arena. This new generation includes 5G and WiGig [Wi-Fi-certified
60 GHz communications]. From a test engineering point of view, mmWave communications
require a significant jump in testing frequencies from the maximum of 6 gigahertz
(GHz) used for LTE applications
to frequencies as high as 44 GHz for 5G and 72 GHz for WiGig.
In addition, these new applications use
phased array antennas, which means there are many
more radio-frequency (RF) ports that need to be tested compared to LTE
applications. At the same time, the same cost-of-test pressure for consumer
applications applies to testing these new mmWave integrated circuits (ICs).
While mmWave applications
pose a variety of new challenges for the automated test equipment (ATE) test
engineering community, this article concentrates on a specific topic: the
design of printed circuit board (PCB) combiners/dividers that can aggregate
multiple RF ports into a single measurement/stimulus port. This can be vital
for reducing cost of test. Deciding to use a combiner/divider will be highly
dependent on the target application, testing phase (e.g., initial
characterization or high-volume production), available ATE resources and test
strategy.
What is a combiner/divider?
In general, power combiners/dividers are passive N-port networks (N ≥ 3). They can be used as power dividers to split the power of an input signal into two or more output signals, or they can be used to combine multiple input signals to one output signal of higher power [1]. In this case, the power divider is called a power combiner. The input of a power divider is the output of a power combiner and vice versa.
The easiest way to build a three-port power divider is shown in Figure 1. It consists of a T-junction and a susceptance, which represents discontinuities in the junction. A reciprocal three-port network (which a power divider is) can never be lossless and matched at all ports [1].
Therefore, to be matched at all ports, resistive elements must be added to the power divider.
Figure 1: General schematic of a power divider.
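For reference, the standard argument behind this statement [1] is short. A matched, reciprocal three-port has a scattering matrix of the form

$$ S = \begin{pmatrix} 0 & S_{12} & S_{13} \\ S_{12} & 0 & S_{23} \\ S_{13} & S_{23} & 0 \end{pmatrix}. $$

If the network were also lossless, $S$ would have to be unitary, which requires

$$ |S_{12}|^2 + |S_{13}|^2 = |S_{12}|^2 + |S_{23}|^2 = |S_{13}|^2 + |S_{23}|^2 = 1, \qquad S_{13}^{*}S_{23} = S_{23}^{*}S_{12} = S_{12}^{*}S_{13} = 0. $$

The second set of conditions forces at least two of the three transmission coefficients to vanish, contradicting the first set; hence resistive loss must be accepted in order to match all three ports.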
Wilkinson combiner/divider
Another disadvantage of the simple power divider shown in Figure 1 is that the two output ports are not isolated against each other. To obtain isolation between the output ports, Wilkinson power dividers [1,2] are used. A two-way Wilkinson power divider schematic is shown in Figure 2.
Figure 2: Schematic of Wilkinson power divider.
The two quarter-wave transformers provide good input matching,
whereas the resistor between the two outputs provides good isolation between
the output ports. If the output ports are both matched, the Wilkinson power
divider even appears lossless because there is no current flowing through the resistor.
Wilkinson power dividers can be designed for multiple outputs; can have unequal
power ratios; and can be extended using multiple sections to achieve higher bandwidth.
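As a quick reference, the textbook design values for an equal-split Wilkinson divider [1,2] can be computed as in the following sketch; the 30 GHz center frequency and effective dielectric constant are assumptions chosen only to illustrate the 20-40 GHz target band discussed in this article:

```python
# Textbook equal-split Wilkinson design values (frequency and eps_eff assumed).
import math

Z0 = 50.0          # system impedance, ohms
f0 = 30e9          # assumed center frequency, Hz
eps_eff = 2.9      # assumed effective dielectric constant of the microstrip
c = 299792458.0

z_branch = math.sqrt(2) * Z0          # quarter-wave transformer impedance (~70.7 ohm)
r_iso = 2 * Z0                        # isolation resistor between the outputs (100 ohm)
guided_wavelength = c / (f0 * math.sqrt(eps_eff))
quarter_wave_len = guided_wavelength / 4

print(f"branch impedance  : {z_branch:.1f} ohm")
print(f"isolation resistor: {r_iso:.0f} ohm")
print(f"quarter-wave length at {f0/1e9:.0f} GHz: {quarter_wave_len*1e3:.2f} mm")
```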
Wilkinson power dividers are well suited for frequencies in the
range of 20-40 GHz. However, power dividers with resistors are difficult to use
in frequency ranges above 50 GHz. For these scenarios, the better choice is a
hybrid ring.
Hybrid ring or rat-race combiner/divider
While it is not possible to build lossless three-port networks
that are matched at all ports, it is possible to do so with four-port networks.
One easy way to realize a four-port power divider is with a hybrid 180° coupler,
such as the hybrid ring or rat-race example shown in Figure 3 [1].
If port 1 or 3 is used as an input port, the output ports (2,3
or 1,4, respectively) are in phase and port 4 or 2, respectively, is isolated.
If port 2 or 4 is used as an input port, the output ports (1,4 or 2,3,
respectively) are shifted by 180° and port 3 or 1, respectively, is isolated. A hybrid
ring can also be used as a power combiner with two inputs and one sum and one
difference output. For example, if ports 2 and 3 are used as input ports, port
1 is the sum output and port 4 is the difference output.
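For orientation, the nominal rat-race dimensions follow directly from the textbook relations (a ring line impedance of sqrt(2)·Z0 and a mean circumference of 1.5 guided wavelengths [1]); the sketch below uses an assumed center frequency and effective dielectric constant for the WiGig band, not values from the paper:

```python
# Rough rat-race sizing from the textbook relations (f0 and eps_eff assumed).
import math

Z0 = 50.0
f0 = 64e9                # assumed center frequency in the 57-71 GHz WiGig band
eps_eff = 2.8            # assumed effective dielectric constant
c = 299792458.0

ring_impedance = math.sqrt(2) * Z0              # ring line impedance (~70.7 ohm)
lam_g = c / (f0 * math.sqrt(eps_eff))           # guided wavelength
circumference = 1.5 * lam_g                     # total electrical length of the ring
mean_radius = circumference / (2 * math.pi)

print(f"ring line impedance: {ring_impedance:.1f} ohm")
print(f"ring circumference : {circumference*1e3:.2f} mm")
print(f"mean ring radius   : {mean_radius*1e3:.3f} mm")
```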
Figure 3: Hybrid ring structure
ATE test fixture challenges
ATE PCB test fixtures used in mmWave applications face some key challenges, as described below.
PCB size and thickness. Figure 4 shows a typical multi-site ATE PCB test fixture for high-volume production testing of an LTE-related RF device (below 6GHz). These PCBs are very large (e.g., 516.8mm x 600mm) and thick—a minimum thickness of 3.5mm is required for some ATE platforms, with stack-up thickness reaching 5mm or higher. In addition, while multi-site setups are necessary for parallel testing (essential for reducing cost of test), handler requirements can cause the pitch between the devices under test (DUTs) to be very tight.
Figure 4: Example of ATE test fixture with eight
sites for high-volume testing
of an RF integrated circuit (<6GHz). Courtesy of Spreadtrum.
Small manufacturing
volumes. Compared with higher-volume
PCB applications, the manufacturing volume for an ATE PCB test fixture can be
as small as only one or two boards at the start of a project. In addition to
their size and complexity, this means these boards will be relatively
expensive.
DUT pitch. Currently, ball grid
array (BGA) pitches for mmWave applications can be less than 0.4mm. This small
pitch, coupled with the large, thick PCB test fixture, further complicates
manufacturing. It will also create mechanical restrictions on any
combiner/divider designs one needs to implement to connect to the DUT BGA pads.
Dielectric material. High-performance dielectric materials have been the default choice in mmWave applications, but cannot be used for the large, high-layer-count PCBs typical of ATE applications. Ideally, traditional high-performance materials that are already in use for ATE test fixture applications can be deployed. Hybrid stack-ups can also be used, with a high-performance RF material in the outer layers and standard FR4 in the inner layers, but this should be discussed in detail with the PCB test fixture fab house.
The type of fiber weave used is also a critical consideration when selecting a dielectric material. Typical high-performance RF materials for ATE PCB test fixture manufacturing use a glass weave. This glass weave will have an impact on dielectric loss, dielectric constant and, ultimately, signal delay. This matters because mmWave applications usually use phased-array antennas, so the phase of each element is critical. On the PCB test fixture, it is important that the phase delay of all interconnects to the antenna ports be the same. To minimize differences in dielectric constant, either a spread-glass fiber weave can be used, or the PCB test fixture can be rotated 10 degrees on the manufactured panel.
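As a rough, purely illustrative calculation (the frequency, trace length and dielectric-constant values below are assumptions, not values from the paper), a short script shows how quickly weave-induced dielectric-constant variation turns into phase error at mmWave frequencies:

```python
# Back-of-the-envelope phase-skew estimate from local Dk variation (all values assumed).
import math

c = 299792458.0
f = 60e9                      # WiGig-range frequency used for illustration
length_mm = 20.0              # assumed interconnect length to an antenna port
dk_resin_rich = 3.0           # assumed effective Dk over a resin-rich path
dk_glass_rich = 3.3           # assumed effective Dk over a glass-rich path

def delay_ps_per_mm(dk):
    """Propagation delay in picoseconds per millimeter for a given effective Dk."""
    return math.sqrt(dk) / c * 1e-3 * 1e12

skew_ps = (delay_ps_per_mm(dk_glass_rich) - delay_ps_per_mm(dk_resin_rich)) * length_mm
phase_err_deg = skew_ps * 1e-12 * f * 360.0

print(f"delay skew over {length_mm:.0f} mm : {skew_ps:.2f} ps")
print(f"phase error at {f/1e9:.0f} GHz   : {phase_err_deg:.0f} degrees")
```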
Microstrip copper profile and plating. For traditional RF applications (<
6GHz), only the skin effect and dielectric loss
were important, but for mmWave applications the surface roughness loss becomes
important [4,5]. This means that when selecting the dielectric material, one
also needs to consider the type of copper profile to be used, taking into
account the manufacturing and reliability requirements of the PCB test fixture.
Choosing a very low-profile copper, for example, may make sense for loss
mitigation, but if the PCB fab house cannot
guarantee its reliability for the specific requirements of the ATE PCB test
fixture, it may generate other problems and should not be used.
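As a rough illustration of why roughness becomes significant at mmWave frequencies, the widely used Hammerstad correction [4] can be evaluated across frequency; the copper roughness value below is an assumed, typical number for a relatively smooth foil, not one taken from the paper:

```python
# Skin depth and Hammerstad surface-roughness loss multiplier versus frequency.
import math

MU0 = 4 * math.pi * 1e-7       # permeability of free space, H/m
SIGMA_CU = 5.8e7               # copper conductivity, S/m
RQ = 0.5e-6                    # assumed RMS copper roughness, m

def skin_depth(f_hz):
    return 1.0 / math.sqrt(math.pi * f_hz * MU0 * SIGMA_CU)

def hammerstad_factor(f_hz, rq=RQ):
    """Multiplier on smooth-conductor loss due to surface roughness."""
    return 1.0 + (2.0 / math.pi) * math.atan(1.4 * (rq / skin_depth(f_hz)) ** 2)

for f in (2e9, 6e9, 28e9, 60e9):
    print(f"{f/1e9:5.0f} GHz: skin depth {skin_depth(f)*1e6:5.2f} um, "
          f"roughness loss factor {hammerstad_factor(f):.2f}")
```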
Implementing a Wilkinson combiner [for complete analysis and all figures, please see full paper]
As mentioned, ATE PCB
test fixtures present significant challenges due to their size and
requirements. Although there are off-the-shelf combiners/dividers with
excellent performance, especially for the 5G frequency range, they use
materials and implementation techniques that are not viable for an ATE PCB test
fixture. PCB size and the need for a multilayer implementation limit the types
of possible approaches. Also, the large number of ports required for mmWave
applications, coupled with the need to test multiple DUTs in parallel, requires
that the combiner/divider structures be small and omit processes incompatible
with standard ATE PCB test fixture assembly.
Figure 5 shows three examples of a two-way Wilkinson
combiner/divider element targeted for 5G applications (target design was 20-40
GHz), chosen for their implementation simplicity and small size. Example 1 is
the simplest layout for a Wilkinson power divider. The quarter-wave transformers
are curved to reduce coupling between them. Example 2 adds only a small
modification: a short line with a different width in front of the
quarter-wave transformers further improves the input matching.
Example 3 is the most complex since it consists of two
power divider stages. In general, this type has a higher bandwidth than a single-stage
Wilkinson divider.
Figure 5: Implementation examples of single two-way Wilkinson
combiners/dividers.
The key metrics when
evaluating a combiner/divider are its phase matching (which, with a Wilkinson
simulation model, is always perfect), the return loss at each port and the loss
matching across the frequency of interest. Tables 1 and 2 show the insertion
loss and the return loss of the common port for the three examples at five
different frequencies.
Table 1: Comparison of the simulated insertion loss for each Wilkinson example.
Table 2: Comparison of the simulated return
loss for each Wilkinson example.
The results show that Example
2 has an overall improved return loss
on the common port compared with Example 1. Example 3 shows slightly less
variation in the insertion loss compared with the other examples. These differences may seem small when comparing single
elements, but they are amplified once one begins to aggregate the elements in a
more complex Wilkinson combiner (e.g., a 1-to-8 Wilkinson combiner/divider).
Implementing a hybrid ring [for complete analysis and all figures, please see full paper]
As mentioned in the previous section, hybrid ring combiners are able to work at higher frequencies than traditional Wilkinson combiners, but with a smaller bandwidth. This type of design should be targeted for WiGig applications in a frequency band of 56 to 72GHz. Unfortunately, in this frequency range, off-the-shelf combiners in coaxial packages are not easy to come by, although some vendors can create them by special request.
Figure 6 provides two examples of implementing a single two-way hybrid ring combiner/divider element targeted for the WiGig frequency range. The shape of the ring in both examples is not exactly circular due to the rectilinear T-junctions. The layout of Example 2 looks more regular due to the smaller T-junctions, to which trapezoidal tapers have been added, to make the T-junction shorter and to geometrically match the width of the 50-ohm lines.
Figure 6: Implementation examples of single 2-way Hybrid
ring combiners/dividers.
Figure 7 shows simulated and measured results for this
structure. The connectors were not de-embedded from the measured data, so a
full 3D EM simulation was done, including the connector model from Signal
Microwave. The PCB parameters used in the simulation (dielectric constant and
loss tangent) were based on values tuned from a previous test coupon using the
procedure described in [5]. This tuning is critical for obtaining more accurate simulation
results.
Figure
7: Results of the hybrid ring test coupon simulation and measurement.
The results show a
reasonable correlation between measurement and simulation, even though the simulation assumed the structure etching
was perfect. The target bandwidth of the hybrid ring (56 to 72 GHz) was achieved
with less than 1 dB measured amplitude imbalance and less than 25 degrees
measured phase imbalance in that frequency range.
Conclusion
In looking at combiner/divider design approaches for mmWave applications on ATE systems, special attention is given to the Wilkinson and hybrid ring (rat-race) combiner approaches because they’re more easily implemented on ATE PCB test fixtures. These fixtures present specific challenges that need to be considered in advance when designing a combiner/divider for 5G/WiGig applications. In this context, some of the challenges are new to the ATE test fixture design community. The importance of the 5G/WiGig applications will certainly generate design improvements and new ideas, both for combiner/divider topologies and for PCB manufacturing.
References
[1] D.
Pozar, Microwave Engineering, 4th
Edition, Wiley 2011.
[2] E.
J. Wilkinson, “An N-Way Hybrid Power Divider,” IRE Transactions on Microwave Theory and Techniques, Vols. MTT-8,
pp. 116-118, 1960.
[3] Jose Moreira and Hubert Werkmann, An Engineer’s Guide to Automated Testing of High-Speed Interfaces, 2nd
Edition, Artech House 2016.
[4] Rogers Corporation, “Copper Foil Surface Roughness and
its Effect on High Frequency Performance,” PCB West, 2016.
[5] Heidi Barnes, Jose Moreira and Manuel Walz,
“Non-Destructive Analysis and EM Model Tuning of PCB Signal Traces Using the
Beatty Standard,” DesignCon, 2017.
By Ira Leventhal, Vice President, Applied Research Technology and Ventures, Advantest America, and Jochen Rivoir, Fellow, Advantest Europe
Interest in implementing artificial intelligence (AI) for a wide range
of industries has been growing steadily due to its potential for streamlining functions and delivering time and cost savings. However, in order for electrical and electronic systems utilizing AI to be truly dependable, the AI itself must be trustable.
As Figure 1 shows, dependable systems have a number of shared
characteristics: availability, reliability, safety, integrity, and maintainability. This is particularly essential for mission-critical environments such as those illustrated. Users need to be confident that the system will perform the appropriate actions in a given situation, and that it can’t be hacked into, which means the AI needs to be trustable from the ground up. As a test company, we’re looking at what we can do down at the semiconductor level to apply trustable, explainable AI in the process.
Figure 1. Dependable systems are essential for electrical and electronic applications, particularly those with life-or-death implications.
What is trustable AI?
Currently, much of AI is a black box; we don’t always know why the AI
is telling us to do something. Let’s say you’re using AI to determine pass or fail on a test. You need to understand what conditions will cause the test to fail – how much can you trust the results? And how do you deal with errors? You need to understand what’s going on inside the AI, particularly with deep learning models: which errors are critical,
which aren’t, and why a decision is made.
A recent, tragic example is the Boeing 737 MAX8 jet. At the end of the day, the crashes that occurred were due to failure of an AI system. The autopilot system was designed, based on the sensor data it was continually monitoring, to engage and prevent stalling at a high angle of attack – all behind the scenes without the pilot knowing it had taken place. The problem was that the system engaged corrective action at the wrong time because it was getting bad data from the sensor. This makes it an explainable failure – technically, the AI algorithm worked the way it was supposed to, but the sensors malfunctioned. Boeing could potentially rebuild confidence in the airplane by explaining what happened and what they’re doing to prevent future disasters – e.g., adding more redundancy, taking data from more sensors, improving pilot training, etc.
But what if a deep learning model was the autopilot rather than the
simpler model that acts based on sensor data? Due to the black box nature of deep learning models, it would be difficult to assure the public that the manufacturer knew exactly what caused the problem – the best they could do would be to take what seemed like logical measures to correct the issue, but the system would NOT be trustable.
What does this mean for AI going forward? What are the implications of not having trustable AI? To understand this, we need to look briefly at the
evolution of AI.
The “next big thing”… for 70
years
As Figure 2 shows, for seven decades now, AI has been touted as the next
big thing. Early on, AI pioneer Alan Turing recognized that a computer
equivalent to a child’s brain could be trained to learn and evolve into an
adult-like brain, but bringing this to fruition has taken longer than he likely anticipated. During the first 25 years of the AI boom, many demos and prototypes were created to show the power of neural networks, but they couldn’t be used for real-world applications because the hardware was too limited – the early computers were very slow with a minuscule amount of memory. What followed in the 1970s was the first AI winter. The second boom arose in the 1980s and ‘90s around expert systems, and their ability to answer complex questions. The industry created very customized expert-system hardware that was expensive and tough to maintain, and the applications were mediocre, at best. The second AI winter ensued.
Figure 2. The evolution of AI has been marked by hype cycles, followed by AI winters.
For the past 20 years, AI has enjoyed a fairly steady upward climb due
to the confluence of parallel processing, higher memory capacity, and more
massive data collection, with data being put into lakes rather than silos to enable better flow of data in and out. Having all these pieces in place has enabled much better algorithms to be created, and Internet of Things (IoT) devices have created a massive, continuous flow of data, aiding in this steady progression.
What this means, however, is that we are currently in the next hype
cycle. The main focus of the current hype is autonomous cars, medical applications such as smart pacemakers, and aerospace/defense – all areas with life-and-death implications. We need trustable AI; otherwise, we will not have dependable systems, which will lead to disappointment and the next AI winter. Clearly, we need to avoid this.
AI in the semiconductor industry
With this backdrop, what are some challenges of applying AI within the semiconductor
industry?
Fast rate of technological advancement. AI is
great for object recognition because it’s learning to recognize things that don’t
change that much, e.g., human beings, trees, buildings. But in the semiconductor
industry, we see a steady parade of process shrinks and more complex designs
that bring new and different failure modes.
Difficulty applying supervised learning. There is a lack of labeled
training data for these new areas, making supervised learning hard to apply.
High cost of test escapes. If a faulty device is
passed and sent out for use in an app – an autonomous driving system, for
example – and subsequently fails, the cost could be life and death. Therefore,
both risk aversion and the need for certainty are very high.
To meet these challenges requires a different type of AI. A major
research focus in the AI community is on developing explainable AI techniques
designed to provide greater transparency and interpretability, but these
techniques are currently far from fully opening AI model black boxes. Today, our
focus is on development of explaining
AI. With this approach, we look for opportunities to use machine learning
models and algorithms – deep learning, clustering, etc. – to provide insight into the data so that we can
make better decisions based on the information. By looking for ways to use AI
that have more upside potential for insight, and staying away from those that
increase risk, we can create more trustable AI. This will allow us to make semiconductors
that operate more accurately, reliably and safely – that is, devices that
exhibit all the characteristics associated with dependable systems.
Reduced test time or higher test
quality?
If we use deep learning to analyze test results, we may find that we
don’t need to do as many tests – for example, 10 tests could replace a previous
test flow that required 30 tests, greatly reducing the required test time.
But if the models are imperfect and result in more test escapes, you end up
losing dependability for the devices and the systems they go into.
Machine learning exploits correlations between measurements, but every machine learning algorithm makes mistakes. As shown in the table, there are two kinds of risks you can take: a) to remove outliers, risk failing good devices at the expense of yield loss, and lose money; or b) to reduce test time, risk passing bad devices, and lose dependability. Multivariate outlier detection can be used to find additional failures, while deep learning can be employed to detect complex, but well-characterized, failures. Either way, you need explainable decisions.
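As a simple illustration of the multivariate outlier idea (synthetic data and a generic Mahalanobis-distance screen, not a production algorithm), a device can sit inside every individual spec limit yet still stand out once the correlation between measurements is taken into account:

```python
# Multivariate outlier screening sketch on synthetic, correlated test data.
import numpy as np

rng = np.random.default_rng(2)

# Two strongly correlated measurements (e.g., two gain readings) for 1000 good devices.
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
good = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

# A suspicious device: both readings within +/-3 sigma, but in the "wrong" combination.
suspect = np.array([2.0, -2.0])

mean = good.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(good, rowvar=False))

def mahalanobis(x):
    d = x - mean
    return float(np.sqrt(d @ inv_cov @ d))

print("typical good-device distance:", np.median([mahalanobis(g) for g in good]).round(2))
print("suspect-device distance     :", round(mahalanobis(suspect), 2))
```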
Explaining AI for engineering
applications
Applying AI algorithms to your process requires powerful visualization
tools to help you gain further insights into your data. Understanding what the
machine learning is telling you will enable you to make decisions based on the
data. Let’s take, as an example, machine learning-based debug during
post-silicon validation. After your design is complete and you have your first chips,
you now want to perform a variety of exploratory measurements on the device to
determine that it’s doing what you want it to do.
We are currently researching an innovative approach for applying
machine learning in post-silicon validation, as shown in Figure 3:
Generate. Proprietary machine learning
algorithms are used to smartly generate a set of constrained random tests that
are designed to efficiently find complex relationships and hidden effects.
Execute. The constrained random tests are then
executed on the test system. When the results show relationships under certain
conditions, we want to zero in on these and find out more about what’s going on
in these specific areas. The data collected creates a model of the system.
Analyze. Now that we have our model, we can
perform offline analysis, running through a wide range of different I/O combinations
and using proprietary machine learning algorithms to analyze the data and determine
where there may be effects or issues we need to be aware of.
Figure 3. Machine-learning based post-silicon validation comprises the three steps shown above.
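The following is a deliberately simplified, hypothetical sketch of the generate/execute/analyze loop; the device model, input names and the use of a random-forest surrogate are illustrative assumptions, not Advantest's proprietary algorithms:

```python
# Generate constrained random stimuli, "execute" them on a stand-in device
# model, fit a surrogate, and inspect feature importances for hidden effects.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 5000

# Generate: constrained random test conditions (ranges are made-up constraints).
vdd   = rng.uniform(0.9, 1.1, n)       # supply voltage, V
temp  = rng.uniform(-10, 90, n)        # temperature, deg C
bias  = rng.uniform(0.0, 1.0, n)       # programmable bias code (normalized)
spare = rng.uniform(0.0, 1.0, n)       # input believed to be irrelevant

# Execute: stand-in for measurements on the real device; the 'spare' input
# secretly couples into the output, mimicking a hidden calibration effect.
output = (2.0 * vdd + 0.01 * temp + 0.5 * bias
          + 0.3 * spare * bias                      # hidden interaction
          + rng.normal(0, 0.02, n))

# Analyze: fit a surrogate model and inspect which inputs actually matter.
X = np.column_stack([vdd, temp, bias, spare])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, output)
for name, imp in zip(["vdd", "temp", "bias", "spare"], model.feature_importances_):
    print(f"{name:6s} importance: {imp:.2f}")
```

In this toy version, the "spare" input shows a non-negligible importance even though the nominal design equation ignores it, which is the kind of hidden effect the analysis step is meant to surface.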
In one example, we implemented this machine learning-based process to
debug the calibration for a driver chip from one of our pin electronics cards.
A total of 500,000 test cases were generated, with all inputs varied, and the results were
analyzed to find hidden relationships. Using the resulting model, virtual calibrations
of the device were run with varying input conditions and the resulting
root-mean-square (RMS) error for each calibration was predicted. The machine
learning algorithm uncovered a hidden and unexpected effect of an additional
input on the calibrated output. With this input included in the calibration,
the RMS error was reduced from approximately 600 microvolts (µV)
to under 200 µV.
When we took the results, including visualizations and plots, back to the IP
provider for the chip, they were initially surprised by this unexpected effect,
but were able to review the design and find the problem within just one day of obtaining
the data. Two key benefits resulted: the calibration was improved, and the IP designer
was able to tune the design for future generations of parts.
Another application for explaining AI is fab machine maintenance, where
sensor and measurement data are being collected continuously while the machines
are running. The question is what we do with the data. With a reactive
approach, we’re flying blind, so to speak – we don’t know there’s a problem
until we’re alerted to a machine operating out of spec. This creates unexpected
downtime and problems with trustability and reliability. Far better is
to take a predictive approach – ideally, one based not on setting simple conditional
triggers alone, but that employs machine learning to look at the data and spot
hidden outliers or other complex issues so that a potential problem with a machine
is identified long before production problems result. By catching hidden issues
– as well as avoiding false alarms – we obtain more trustable results.
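As a toy illustration of the predictive idea (synthetic sensor data and assumed thresholds), even a simple exponentially weighted moving average can flag a slow drift long before a hard limit is violated; in practice, machine learning models would look for far more complex, multivariate patterns:

```python
# Early-warning sketch on a synthetic, slowly drifting fab-tool sensor signal.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
healthy = rng.normal(loc=1.00, scale=0.01, size=n)   # well-behaved sensor noise
drift = np.linspace(0.0, 0.12, n)                    # slow degradation over time
signal = healthy + drift                             # chamber-pressure-like reading

baseline_mean, baseline_std = 1.00, 0.01             # learned from known-good history
spec_limit = 1.08                                    # assumed hard spec limit
alpha = 0.01                                         # EWMA smoothing factor

ewma, alert_at = signal[0], None
for i, x in enumerate(signal):
    ewma = alpha * x + (1 - alpha) * ewma
    if alert_at is None and abs(ewma - baseline_mean) > 3 * baseline_std:
        alert_at = i                                 # predictive early warning

print("EWMA early-warning alert at sample:", alert_at)
print("hard spec limit first exceeded at :", int(np.argmax(signal > spec_limit)))
```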
The bottom line
Dependable systems
require trustability and explainability. Machine learning
algorithms hold great promise, but they must be carefully and intelligently
applied in order to increase trust and understanding. Explaining AI approaches, which can provide greater insight and
help us make better decisions, have many powerful applications in the
semiconductor industry.
By Shang Yang, Ph.D., Senior R&D and Application Engineer, Advantest Corp.
As the range and volume of chips developed for a host of Internet of Things (IoT) applications continue to escalate, conventional failure analysis (FA) techniques are increasingly challenged by the higher input/output (I/O) density and data throughput associated with complex 2.5D and 3D IC packages. These devices are not flat, two-dimensional structures; they more closely resemble skyscrapers, with many “floors” or layers, as Figure 1 illustrates. In this example, these layers are sitting on a complex foundation of microbumps, interposers and through-silicon vias (TSVs), on top of a laminate material that is attached to the printed circuit board (PCB) using ball grid array (BGA) bumps. This type of complexity makes it increasingly difficult, when conducting FA on the chip structure, to pinpoint the location of a failure from the package level to the die level.
Figure 1: Multidimensional chips, such as the 3D IC package shown here, face significant challenges with respect to performing failure analysis.
Techniques such as x-ray scanning can perform FA on these devices, but these processes are lengthy, which is problematic given the fast time-to-market windows that IoT devices and applications require. For example, if a 5-micron solder bump is determined to be the source of a failure, it is highly challenging to determine whether the crack is on the top or the bottom surface of the bump. Conducting FA by performing x-ray scanning through the entire chip can take up to a few days depending on the chip complexity.
Time-domain reflectometry (TDR) is increasingly being deployed in order to determine the location of the problem more quickly. However, applying TDR analysis for defect characterization inside the die creates its own set of challenges, as this method becomes less accurate if the failure point is between the package-die interface and the transistors. This combination of challenges points to the need for a new approach to TDR.
Effective defect searching
To further aid in understanding why a revised TDR technique is necessary, let’s take a look at a general chip FA process (Figure 2), which leverages two kinds of inspection – structural and functional – both of which are needed to debug the defect down to the device level. The first step is conducting a visual inspection using the human eye or a microscope. Obvious cracks in the chip may be detected and the failure location narrowed down to the package level with approximately 1000-micron resolution.
Figure 2: Structural and functional inspection techniques are both necessary for failure analysis, but a gap exists on the functional side that conventional TDR cannot fill.
Step 2, electrical evaluation, uses an oscilloscope or curve tracer to verify the functionality of each pin. At this point, the failure location may be further narrowed down to the pin level with resolution of about 300 microns. Next, using TDR, x-ray or ultrasonic imaging, the failure point is further investigated at the interconnect level, down to a resolution of around 100 microns.
While there are a number of powerful tools that can conduct further structural inspection and analysis at the die level, a large gap exists between functional inspection steps 3 and 4, as the figure illustrates. If the density of devices inside the 100-micron scale is very high, conducting step 4 efficiently and getting down to the submicron device level for FA becomes highly difficult. Further complicating the matter is that functional solutions are faster with lower accuracy whereas structural methods are more accurate, but take much longer. A high-resolution TDR system that can deliver accurate results quickly is needed to fill this gap.
TS9000 TDR enables high-res die-level accuracy
Advantest has addressed these challenges by developing a TDR option for its TS9000 terahertz analysis system to achieve real-time analysis with ultra-high measurement resolution. The TS9000 TDR Option relies on Advantest’s TDR measurement technology to pinpoint and map circuit defects utilizing short-pulse signal processing. Figure 3 shows the difference between conventional TDR and the Advantest approach.
Figure 3. Conventional TDR is intrinsically a high-noise, high-jitter process. High-res TDR with the TS9000 option replaces the sampler and step source with photoconductive receptors, enabling low noise and very low jitter.
Using laser-based pulse generation and detection, the Advantest solution delivers impulse-based TDR analysis with ultra-low jitter, high spatial precision of less than 5 microns, and a maximum measurement range of 300mm, including internal circuitry used in TSVs and interposers.
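To put those numbers in perspective, a short back-of-the-envelope calculation (the effective dielectric constant is an assumed value for an organic package substrate) relates the quoted spatial figures to the timing precision the instrument must resolve:

```python
# Relate TDR spatial figures to round-trip timing (eps_eff is an assumed value).
import math

c = 299792458.0
eps_eff = 3.0                            # assumed effective dielectric constant
v = c / math.sqrt(eps_eff)               # propagation velocity in the interconnect

def round_trip_time(distance_m):
    """Two-way travel time of the reflected pulse for a given defect distance."""
    return 2.0 * distance_m / v

print(f"round trip for 300 mm range: {round_trip_time(0.300)*1e9:.2f} ns")
print(f"round trip for a 5 um step : {round_trip_time(5e-6)*1e15:.0f} fs")
```

A 5-micron step in defect position corresponds to a round-trip timing difference of only a few tens of femtoseconds, which is why ultra-low jitter is essential.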
Having a high-resolution TDR solution alone does not guarantee the ability to detect the defect all the way down to the design level. Another problem is signal loss – if it is very high, it will have two effects on the front-end-of-line reflected pulse: the pulse will have reduced amplitude and large spread. This makes it difficult to pinpoint the specific defect location.
Recursive modeling (see Figure 4) simulates “removing” all the layers to enable virtual probing at the desired level without destroying the device or being hampered by the hurdles that conventional FA techniques present. This overcomes the challenge of the probe point not always being available due to probes’ minimum pad size requirement and limited accessibility to points far inside the die. The probe can move down layer by layer, de-embedding each trace and recursively measuring the signal pulse, until the defect point can be clearly observed and characterized in the TDR waveform, down to the interface just before the front end of line (FEOL).
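Conceptually, each de-embedding step resembles a standard one-port error-model inversion; the sketch below (a generic textbook-style formulation with made-up numbers, not Advantest's recursive algorithm) strips the known two-port response of one "layer" from the measured reflection to move the virtual probe one level deeper:

```python
# One layer of de-embedding: invert the one-port error model
# gamma_meas = s11 + s12*s21*gamma_dut / (1 - s22*gamma_dut).
import numpy as np

def deembed_one_port(gamma_meas, s11, s21, s12, s22):
    """Recover the reflection coefficient below a known two-port 'layer'."""
    num = gamma_meas - s11
    return num / (s12 * s21 + s22 * num)

# Toy example: a lossy, mismatched interconnect layer in front of an open-like defect.
gamma_defect_true = 0.9 * np.exp(1j * np.deg2rad(-20))
s11, s22 = 0.1 + 0.05j, 0.08 - 0.02j
s21 = s12 = 0.8 * np.exp(1j * np.deg2rad(-60))

gamma_meas = s11 + (s12 * s21 * gamma_defect_true) / (1 - s22 * gamma_defect_true)
gamma_recovered = deembed_one_port(gamma_meas, s11, s21, s12, s22)

print("true defect reflection      :", np.round(gamma_defect_true, 4))
print("recovered after de-embedding:", np.round(gamma_recovered, 4))
```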
This impulse-based TDR approach has proven to be a highly effective method for quickly localizing failure points in 2.5D/3D chip packages, with ultra-high resolution. The recursive modeling technique described, when implemented with the Advantest TS9000 TDR, can greatly increase the strength of the reflected signal and reduce the spread effect to ensure high-accuracy defect detection.
Figure 4. In recursive modeling, the layers of the device can be virtually peeled away like an onion and probing conducted far inside the die to determine a defect’s nature and location.