# Research Repository UCD

| Title | 1-b Observation for Direct-Learning-Based Digital Predistortion of RF Power Amplifiers |
|---|---|
| Author(s) | Wang, Haoyu; Li, Gang; Zhou, Chongbin; Zhu, Anding; et al. |
| Publication date | 2017-01-23 |
| Publication information | Wang, Haoyu, Gang Li, Chongbin Zhou, Anding Zhu, et al. "1-B Observation for Direct-Learning-Based Digital Predistortion of RF Power Amplifiers" PP, no. 99 (January 23, 2017). |
| Publisher | IEEE |
| Item record/more information | http://hdl.handle.net/10197/8381 |
| Publisher's statement | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works |
| Publisher's version (DOI) | 10.1109/TMTT.2016.2642945 |
# 1-Bit Observation for Direct Learning Based Digital Predistortion of RF Power Amplifiers

Haoyu Wang, Gang Li, Chongbin Zhou, Wei Tao, Falin Liu, and Anding Zhu, Senior Member, IEEE

*Abstract*: In this paper, we propose a low-cost data acquisition approach for model extraction of digital predistortion (DPD) of RF power amplifiers. The proposed approach utilizes only 1-bit resolution analog-to-digital converters (ADCs) in the observation path to digitize the error signal between the input and output signals. The DPD coefficients are then estimated based on the direct learning architecture using the measured signs of the error signal. The proposed solution is shown to be feasible in theory, and the experimental results show that the proposed algorithm achieves performance equivalent to that of the conventional method. Replacing high resolution ADCs with 1-bit comparators in the feedback path can dramatically reduce the power consumption and cost of the DPD system. The 1-bit solution also makes DPD practically implementable in future broadband systems, since it is relatively straightforward to achieve an ultra-high sampling speed in data conversion by using only simple comparators.

*Index Terms*: Analog-to-digital converter (ADC), digital predistortion (DPD), error signal, linearization, low resolution, power amplifier (PA), wideband.

## I. INTRODUCTION

In the past twenty years or so, digital predistortion (DPD) has become one of the most popular linearization techniques for radio frequency (RF) power amplifiers (PAs) in wireless communication systems, especially in cellular base stations [1], [2]. Although it already seems to be a well-established technique at the current stage, DPD is still facing new challenges since the development of the next generation communication system never stops [2], [3].
For instance, most current DPD solutions are employed in middle to high power base stations where the power consumption and cost of DPD units are negligible [2]. In future networks, small-cell base stations will be deployed, where the output power of the PA becomes much lower and thus the power consumption and cost of the digital components become an issue. Many efforts have been made to address this issue. One idea is to employ new algorithms to simplify the DPD model. For example, compressed sensing (CS) has recently been introduced to DPD to reduce the model complexity [4]-[6]. It has also been shown that some of the distortion compensation that is usually done at the transmitter side can be moved to the receiver side, to reduce the complexity and power consumption of the transmitters in small cells [7], [8].

This work was supported in part by the National Natural Science Foundation of China under Grant Number 61471333 and by the Science Foundation Ireland under Grant Numbers 13/RC/2077 and 12/IA/1267. H. Wang, G. Li, C. Zhou, W. Tao and F. Liu are with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, and also with the Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei, Anhui, China (e-mail: {wamhoyle; lgml; zhouzcb; jxtaowei}@mail.ustc.edu.cn; liufl@ustc.edu.cn). A. Zhu is with the School of Electrical and Electronic Engineering, University College Dublin, Dublin 4, Ireland (e-mail: anding.zhu@ucd.ie).

One of the main concerns in DPD implementation is the bandwidth requirement of the feedback path that is used to capture the output signal from the PA for the purpose of model extraction. With carrier aggregation (CA), the signal bandwidth in long-term evolution-advanced (LTE-A) is already up to 100 MHz and it will be increased to 160 MHz or wider soon [3]. For the coming 5th generation (5G) systems, the signal bandwidth will be even wider.
In DPD, the bandwidth of the feedback path usually needs to be five times the signal bandwidth, which means that multi-giga samples per second (GSPS) analog-to-digital converters (ADCs) are required. The existing and forthcoming data converter technologies, however, could hardly meet this requirement. Some solutions have been proposed to reduce the signal bandwidth requirement. The band-limited method was proposed in [9], but it requires an extra bandpass filter in the RF transmit chain that is difficult and costly to design. The analog aliased sampling method in [10] can reduce the sampling rate, but it needs an additional analog aliasing operation. A spectral extrapolation based algorithm was reported in [11], and in [12] a forward model is extracted first and the DPD coefficients are then estimated. In [13] a two-stage DPD, i.e., a static nonlinear box cascaded with a dynamic weak nonlinear box, was proposed to decrease the feedback bandwidth. The method proposed in [14] was designed just for concurrent dual-band signals. All of the methods mentioned above require an acquisition bandwidth not narrower than the signal bandwidth. In contrast, [15] proposed an algorithm based on random demodulation, with which an ultra-narrow feedback bandwidth is enough for wideband DPD, but it requires an extra random sequence generator in the analog domain, which is hard to implement due to the cost and time-alignment issues.

Besides the sampling rate, the other issue relating to the ADC is the resolution. Before training the DPD model, the output signal of the PA is digitized. The number of quantization bits depends on the actual system requirement. Usually in a real system, a 14-bit ADC is needed to give a minimum noise floor of -70 dBc [16]. Designing a 14-bit ADC with a multi-GSPS sampling rate is very challenging and costly [17].
It is therefore desirable that the required resolution can be reduced; however, this is not a straightforward task, since reducing the resolution of the ADC is equivalent to increasing the noise floor of the feedback signal, which is critical to the accuracy of DPD modeling. Y. Liu *et al.* proposed a method in [18] to reduce the ADC dynamic range, but a minimum 8-bit ADC is required for achieving linearization performance comparable to that of the conventional DPD. A 1-bit estimator was proposed to quantize the phase of the original input signal in [19] to reduce the complexity of model identification, while the resolution requirement for ADCs remains the same.

In this paper, a novel direct learning architecture (DLA) based 1-bit quantization method is proposed. The proposed method utilizes only 1-bit resolution comparators to measure the error signal that is then used for DPD coefficients training. The proposed approach dramatically reduces the cost of the feedback chain. Moreover, both theoretical derivation and experimental tests show that the proposed method can be extended to systems transmitting very wideband signals.

This paper is organized as follows. Section II introduces the proposed 1-bit observation method after reviewing the conventional direct learning architecture. In Section III, the time alignment, power alignment, optimization of convergence speed and the overall system complexity are discussed. The experimental results are given in Section IV, followed by a conclusion in Section V.

## II. THEORETICAL DERIVATION

The principle of DPD is that a digital block, called the predistorter, is inserted into the transmitter chain to preprocess the input signal before it enters the RF PA. If the two nonlinear systems, i.e., the predistorter and the PA, exactly invert each other, a highly linear system can be achieved. In order to extract the coefficients of the predistorter, a small fraction of the transmit signal is transferred back to baseband via a feedback loop.
Two architectures are generally employed for model extraction: the direct learning architecture (DLA) and the indirect learning architecture (IDLA). The difference between DLA and IDLA has been investigated in [20]. The IDLA estimates the post-inverse of the PA first and then copies the coefficients of the post-inverse estimator to the pre-inverse one. The IDLA can be run in an open-loop fashion, whereas the DLA is usually used in closed-loop systems and compares the PA output with the original input directly. In low resolution systems, the performance of IDLA is limited, while DLA is able to identify the changes between input and output signals effectively, especially in the 1-bit method we will propose in Section II.B. As a result, the DLA is used for DPD modeling in this paper.

#### A. Conventional Direct Learning Architecture

The simplified conventional DLA block diagram is shown in Fig. 1 [21], [22], where the bold lower-case vectors **x** and **y** represent the input and output sequences, respectively. More specifically, **x** and **y** are expressed as

$$\begin{aligned}
\mathbf{x} &= [x(n - K + 1), x(n - K + 2), \dots, x(n)]^{T} \in \mathbb{C}^{K \times 1}, \\
\mathbf{y} &= [y(n - K + 1), y(n - K + 2), \dots, y(n)]^{T} \in \mathbb{C}^{K \times 1},
\end{aligned} \tag{1}$$

where K is the length of the sequences used for training, x(n) and y(n), $n \in \mathbb{Z}$, are the baseband input and output signals, respectively, and $(\cdot)^T$ denotes the matrix transpose.

Fig. 1. Simplified DPD block diagram based on direct learning architecture.

The output of the digital predistorter is denoted by z(n), and its corresponding vector form is $\mathbf{z}$. Various behavioral models can be used to describe the input-output relationship of the DPD [1]–[3].
For instance, the baseband equivalent expression of the Volterra model is given by

$$\begin{aligned}
z(n) = {} & \sum_{\substack{p=1 \\ p\ \mathrm{odd}}}^{P} \sum_{m_1=0}^{M} \cdots \sum_{m_p=0}^{M} h_p(m_1, \dots, m_p) \\
& \times \prod_{l=1}^{(p+1)/2} x(n-m_l) \prod_{l=1+(p+1)/2}^{p} x^*(n-m_l),
\end{aligned} \tag{2}$$

where $h_p$ is the p-th order Volterra kernel, and P and M are the nonlinear order and memory depth, respectively. Equation (2) can be rewritten in matrix form as

$$\mathbf{z} = \mathbf{X}\mathbf{h}. \tag{3}$$

In (3), each row of $\mathbf{X} \in \mathbb{C}^{K \times L}$ consists of all of the product terms appearing in (2), and $\mathbf{h} \in \mathbb{C}^{L \times 1}$ is the coefficient vector of length L. Let $g(\cdot)$ be the transfer function of the PA; then the output of the PA can be expressed as

$$\mathbf{y} = g(\mathbf{z}) = g(\mathbf{X}\mathbf{h}). \tag{4}$$

The cost function of the DLA-based DPD system is the squared $l_2$ norm of the difference between the output and input of the system, i.e., $\|\mathbf{y} - \mathbf{x}\|_2^2$. Newton's method is one of the most popular choices for solving this kind of nonlinear problem. To do so, the Jacobian and Hessian matrices, i.e., the first-order and second-order derivatives of the cost function, are calculated first. Then the DPD coefficients can be updated in an iterative procedure [11], [21], [22]:

$$\mathbf{h}_{k+1} = \mathbf{h}_k - \mu (\mathbf{X}^H \mathbf{X})^{-1} \mathbf{X}^H (\mathbf{y} - \mathbf{x}), \tag{5}$$

where $(\cdot)^H$ represents the Hermitian transpose, and the damping factor $\mu \leq 1$. To achieve a relatively good performance using (5), one needs a high resolution feedback signal, e.g., a 14-bit ADC to digitize the output of the PA, which is one of the main bottlenecks for DPD applications in the next generation communication systems. In the next subsection, we will discuss the details of the proposed 1-bit observation algorithm, which exhibits performance comparable to that of the conventional method.

#### B. Proposed 1-Bit Observation for Direct Learning Based Digital Predistortion

In a DLA-based DPD system, the difference between the output and input signals, y(n) - x(n), should be properly measured sample by sample, as demonstrated in (5). Both x(n) and y(n) are baseband complex values, consisting of the in-phase and quadrature (I/Q) signals. They have the form of

$$\begin{aligned}
x(n) &= x_I(n) + j \cdot x_Q(n), \\
y(n) &= y_I(n) + j \cdot y_Q(n),
\end{aligned} \tag{6}$$

where $x_I(n)$, $x_Q(n)$, $y_I(n)$ and $y_Q(n)$ are all real values. An arbitrary real number can be written as the product of its sign and its magnitude, i.e., $a = sign(a) \cdot |a|, a \in \mathbb{R}$. If the magnitude information |a| is already known or can be estimated in an easy way, sign(a) is the only thing that needs to be measured to calculate the number a. By defining $\Delta_I(n)=y_I(n)-x_I(n)$ and $\Delta_Q(n)=y_Q(n)-x_Q(n)$ as the error samples for the real and imaginary parts, respectively, the difference between the output and input can be expressed as

$$\begin{aligned}
y(n) - x(n) &= (y_I(n) - x_I(n)) + j \cdot (y_Q(n) - x_Q(n)) \\
&= sign(\Delta_I(n)) |\Delta_I(n)| + j \cdot sign(\Delta_Q(n)) |\Delta_Q(n)|.
\end{aligned} \tag{7}$$

Because the PA is a nonlinear device, without linearization, significant distortion can be introduced into the transmit signal, especially if the PA is driven into deep compression. In a real application, however, e.g., LTE, the signal has a non-constant envelope and the amplitude of the signal follows a Gaussian-like distribution. Only a small percentage of the signal with high amplitudes is affected severely by the deep compression. The magnitudes of most of the error samples are relatively small compared to the original input. Furthermore, although $|\Delta_I(n)|$ and $|\Delta_Q(n)|$ could hardly be strictly equal, they have the same statistical properties, and during DPD training the errors decrease with the number of iterations and both approach zero when the training converges.
In this work, during the model training process, we assume that the magnitudes of the I/Q error samples can be approximately made equal to an updating constant, namely, $|\Delta_I(n)| \approx |\Delta_Q(n)| \approx \hat{c}(n)$. Equation (7) then becomes

$$\begin{aligned}
y(n) - x(n) &\approx \hat{c}(n) \left( sign \left( \Delta_I(n) \right) + j \cdot sign \left( \Delta_Q(n) \right) \right) \\
&= \hat{c}(n) sign \left( \Delta(n) \right),
\end{aligned} \tag{8}$$

where $\Delta(n) = \Delta_I(n) + j \cdot \Delta_Q(n)$ and $sign(\Delta(n))$ calculates the signs of the real and imaginary parts of $\Delta(n)$ separately. The vector form of (8) is given by

$$\begin{aligned}
\mathbf{y} - \mathbf{x} &= \left[ \Delta(n - K + 1), \Delta(n - K + 2), \dots, \Delta(n) \right]^{T} \\
&\approx \begin{bmatrix} \hat{c}(n - K + 1)sign\left(\Delta(n - K + 1)\right) \\ \vdots \\ \hat{c}(n)sign\left(\Delta(n)\right) \end{bmatrix} \\
&\approx \hat{c}[sign\left(\Delta(n - K + 1)\right), \dots, sign\left(\Delta(n)\right)]^{T} \\
&\triangleq \hat{c} \cdot \mathbf{\Delta}_{s},
\end{aligned} \tag{9}$$

where $\mathbf{\Delta}_s$ is defined as a column vector that consists of the signs of each I/Q sample. Substituting (9) into (5) yields

$$\mathbf{h}_{k+1} = \mathbf{h}_k - \hat{c}_k (\mathbf{X}^H \mathbf{X})^{-1} \mathbf{X}^H \mathbf{\Delta}_s. \tag{10}$$

Fig. 2. Demonstration of the relationship between conventional DPD and the proposed 1-bit DPD.

As can be seen, the data matrix $\mathbf{X}$ is already known, and $\hat{c}_k$ is treated as the step size for the k-th iteration. Note that the damping factor $\mu$ in (5) is combined into $\hat{c}_k$ to simplify the expression, and this has no impact on the final result. Only the sign information of the error signal is thus needed for conducting the calculation in (10). This enables using 1-bit ADCs to digitize the error signal. The difference between the proposed algorithm in (10) and the conventional one in (5) is demonstrated in Fig. 2.
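As a numerical companion to Fig. 2, the sign-only update in (10) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the memoryless polynomial basis in `basis`, the toy amplifier `pa`, the random test signal and the decay exponent `gamma = 0.85` are all hypothetical choices, and the initial step is taken from the true error, which is only possible in an emulation where the full-resolution error is available.

```python
import numpy as np

def basis(x):
    """Data matrix X of (3) for a hypothetical memoryless odd-order model."""
    return np.column_stack([x, x * np.abs(x) ** 2, x * np.abs(x) ** 4])

def pa(z):
    """Toy PA with unit small-signal gain and mild compression (assumption)."""
    return z * (1.0 - 0.1 * np.abs(z) ** 2)

def csign(e):
    """Signs of the I and Q parts taken separately, as in (8)."""
    return np.sign(e.real) + 1j * np.sign(e.imag)

def rms(v):
    return np.sqrt(np.mean(np.abs(v) ** 2))

def train_one_bit(x, c0, gamma, iterations):
    """Iterate (10) with a step size that decays as c0 / k**gamma."""
    X = basis(x)
    h = np.zeros(X.shape[1], dtype=complex)
    h[0] = 1.0                                   # start from a pass-through predistorter
    for k in range(1, iterations + 1):
        y = pa(X @ h)                            # PA output for the current coefficients
        delta_s = csign(y - x)                   # all that a pair of comparators provides
        c_k = c0 / k ** gamma                    # decaying step size (hypothetical gamma)
        # (X^H X)^{-1} X^H delta_s, computed as a least-squares solve
        direction = np.linalg.lstsq(X, delta_s, rcond=None)[0]
        h = h - c_k * direction                  # sign-only update of (10)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal(2000) + 1j * rng.standard_normal(2000)
x /= np.max(np.abs(x))                           # normalise so that max|x| = 1

err_before = rms(pa(x) - x)                      # error without predistortion
h = train_one_bit(x, c0=err_before, gamma=0.85, iterations=200)
err_after = rms(pa(basis(x) @ h) - x)            # error with the trained predistorter
print(f"RMS error before: {err_before:.4f}, after: {err_after:.4f}")
```

The loop never uses the magnitude of `y - x`, only its signs; the full-resolution error is computed outside the loop purely to report how well the emulated training worked.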
The grey dots are the error samples, and the black circle denotes the objective of the conventional method, with a radius equal to the root mean square (RMS) of the magnitudes of the error samples, while the two squares represent the targets of the proposed method with different step sizes. In the proposed algorithm, the error samples are approximately averaged to the vertices of the square, e.g., the error samples in the first quadrant are moved to the upper-right vertex of the square. Equation (10) is similar to that used in the simultaneous perturbation method [23], [24], where a Bernoulli process is carried out to estimate the gradient. How to choose an appropriate step size $\hat{c}_k$ is critical. If it is properly chosen, (10) achieves performance comparable to that of (5). This issue will be discussed in detail in Section III.

## III. SYSTEM IMPLEMENTATION

#### A. System Description

The block diagram of the proposed 1-bit observation DPD system is illustrated in Fig. 3. The main difference from the conventional DPD is that, in the feedback path, after demodulation, the analog I and Q signals are each sent to a comparator and compared with the original input to obtain the sign of the error signal, instead of being fully digitized. In this configuration, an additional digital-to-analog conversion path, *path 2* as highlighted in Fig. 3, is added to convert the original digital I/Q to the analog domain to make the comparison. The comparators here are equivalent to conventional ADCs working with only 1 bit. The signs of the error signal are then sent to the DPD training block for model extraction.

Fig. 3. Proposed 1-bit observation DPD system.

Before model extraction, the time delay between the input and output samples must be properly calibrated. In the conventional system, time alignment is conducted in the digital domain by comparing the input and output data samples.
In the proposed system, because only 1-bit comparators are used, the high resolution output samples are not available. A special time alignment methodology must be developed, which will be discussed in the following subsection. To facilitate time alignment, the sign of the output signal can be obtained by using the existing comparators with the reference level switched to ground, as shown in Fig. 3.

Another issue is power alignment. In the conventional system, power alignment is also done in the digital domain in both DLA-based and IDLA-based DPDs [25], [26]. In the proposed system, power alignment must be carried out in the analog domain, because the sign of the error signal can be obtained correctly only when the input and output signal levels are properly aligned. The attenuation level of the attenuator thus must be properly chosen to ensure that the powers of the input and output signals are aligned before they enter the comparators. In real systems, some power control modules, e.g., variable gain amplifiers (VGAs) [27], can be applied to facilitate the implementation.

#### B. Time-Alignment Algorithm

Calculating the cross-correlation between the input and output signals in the time domain [28] is a common approach to time alignment in conventional DPD training algorithms. This is, however, not practical in the proposed system, since only the signs of the output signal can be obtained. Directly calculating the cross-correlation between the signs of the input and output in the time domain will cause large errors. In this paper, instead, we suggest using a frequency domain based algorithm to estimate the time delay [29], [30]. The shift property of the Fourier transform (FT) states that a delay in the time domain is equivalent to a phase rotation in the frequency domain. The time delay can thus be calculated from the measured phase rotation in the frequency domain.
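For reference, the DFT form of this shift property can be written out explicitly. The symbols $X(k)$, $Y(k)$, $d$ and $\varphi(k)$ are notation introduced here for illustration, and a circular delay of $d$ sample intervals over an $N$-point DFT is assumed:

$$y(n) = x(n - d) \;\Longrightarrow\; Y(k) = X(k)\, e^{-j 2\pi k d / N}, \qquad \varphi(k) = \arg\!\left(Y(k) X^{*}(k)\right) = -\frac{2\pi d}{N}\, k .$$

Under this assumption the phase is a straight line in the bin index $k$ with slope $-2\pi d / N$, so the delay follows from the slope alone, and a constant phase offset between the two signals only shifts the intercept.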
For a given set of time domain data samples, after the discrete Fourier transform (DFT), the phase-frequency relation is a simple linear function expressed as

$$\varphi = s \cdot f + b,\tag{11}$$

where $\varphi$ and f are the phase rotation and frequency, respectively, s is the slope, which is directly proportional to the time delay, and b is a constant related to the phase shift in the time domain. s and b can be estimated by using the least squares (LS) algorithm with the frequency domain data samples. Once the slope s is obtained, the time delay is calculated as

$$t_{delay} = -\frac{N\hat{s}}{2\pi},\tag{12}$$

where N is the total number of samples used for the DFT calculation, and $\hat{s}$ is the estimated slope for s in (11).

The reason why the time domain cross-correlation does not work in this case is that the signal amplitudes take only two levels. If we transform the signal into the frequency domain, however, the in-band signal power is still much higher than the noise floor, despite the high quantization noise. This is illustrated in Fig. 4, where the spectra of an LTE signal with different time domain resolutions are given. To simplify the illustration, only quantization noise is considered here. From the figure, we can see that the noise floor increases as the number of bits reduces. Despite the high noise floor with 1-bit sampling, the in-band signal power is about 6 dB above the noise. If we use these in-band values to form the equation in (11), we should be able to find the slope s and thus calculate the time delay between the input and output signals employing (12). To avoid large errors at the edge of the spectrum, only the center part of the DFT components, e.g., samples within 80% of the bandwidth, may be used.

Fig. 4. Power spectral density comparison of 20 MHz LTE signal with different resolutions.

Some example test results are given in Fig. 5 to demonstrate the effectiveness and feasibility of the frequency domain algorithm under the scenario of low resolution. A 20 MHz long-term evolution (LTE) signal was sent to the PA. The output signal was captured with a 14-bit resolution ADC, and the delay between the output and the original input signal was estimated to be 4.15 sample intervals (SI) with the cross-correlation method in the time domain. For comparison, we obtain the sign of the input and output signals from the high resolution version and then estimate the time delay in the frequency domain. In Fig. 5 (a) and (b), the estimated time delay is 4.141 SI by utilizing the frequency domain based algorithm with the high resolution signal, while 3.670 SI is obtained using 1-bit resolution. Although the estimated delay in (b) is less accurate than that in (a), the results become better when the time delay is relatively small, as shown in Fig. 5 (c) and (d), in which the original input signal was first delayed by 3.670 SI in the digital domain. In these cases, the actual delay is 0.475 SI. The algorithm using the high resolution signal predicts the delay almost exactly, i.e., 0.473 SI, while the estimated delay using 1-bit resolution is 0.462 SI, with only 0.013 SI difference from the actual delay. It is recommended that the time delay should be less than $1/64 \approx 0.016$ SI to achieve reasonable linearization performance [31]. As a result, repeating the calculation three or four times is enough for the time delay estimation.

In the conventional system, the time delay only needs to be calculated for aligning the captured input and output samples in the digital domain for model extraction purposes. In the proposed system, as described in Section II.B and shown in Fig. 3, the PA output is compared directly with the original input in the analog domain to find the sign of the error. This means that the time delay between *path 1* and *path 2* must be fully calibrated before data acquisition.
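The frequency domain estimate of (11) and (12) can be sketched as follows. This is a minimal illustration under assumptions rather than the procedure used for Fig. 5: the function name `estimate_delay`, the parameter `k_max` (highest occupied DFT bin) and the synthetic multicarrier test signal are hypothetical, the delay is applied circularly, and the phase is assumed to stay within $(-\pi, \pi)$ over the fitted band, which holds only for small residual delays.

```python
import numpy as np

def estimate_delay(x, y, k_max, band_fraction=0.8):
    """Estimate the delay of y relative to x, in sample intervals.

    k_max is the highest occupied DFT bin (hypothetical parameter); only the
    central band_fraction of the occupied band enters the fit. No phase
    unwrapping is done, so the residual delay is assumed to be small.
    """
    n = len(x)
    k = np.fft.fftfreq(n, d=1.0 / n)                  # signed integer bin indices
    cross = np.fft.fft(y) * np.conj(np.fft.fft(x))    # phase of Y relative to X
    use = np.abs(k) <= band_fraction * k_max          # keep the centre of the band
    slope, _ = np.polyfit(k[use], np.angle(cross[use]), 1)  # LS fit of (11)
    return -n * slope / (2.0 * np.pi)                 # delay from the slope, (12)

def csign(v):
    """1-bit observation: signs of the I and Q parts taken separately."""
    return np.sign(v.real) + 1j * np.sign(v.imag)

# Synthetic band-limited signal: unit-magnitude carriers with random phases
rng = np.random.default_rng(1)
n, k_max, true_delay = 4096, 512, 0.475
k = np.fft.fftfreq(n, d=1.0 / n)
spectrum = np.where(np.abs(k) <= k_max, np.exp(2j * np.pi * rng.random(n)), 0.0)
x = np.fft.ifft(spectrum)
y = np.fft.ifft(spectrum * np.exp(-2j * np.pi * k * true_delay / n))  # circular delay

d_full = estimate_delay(x, y, k_max)                  # full-resolution output
d_1bit = estimate_delay(x, csign(y), k_max)           # only the signs of the output
print(f"full resolution: {d_full:.3f} SI, 1-bit output: {d_1bit:.3f} SI")
```

In this sketch the full-resolution estimate recovers the applied delay essentially exactly, while the 1-bit estimate is noisier, which is in line with the motivation for repeating the estimate on a re-delayed input.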
To do so, in this work, we propose to first find the time delay from input to output for *path 1* and *path 2* separately, and then tune the time delay block in the input of *path 2* to align the two paths.

Fig. 5. Time delay estimation using the frequency domain based algorithm. (a) High resolution signal with 4.15 SI delay. (b) Low resolution signal with 4.15 SI delay. (c) High resolution signal with 0.475 SI delay. (d) Low resolution signal with 0.475 SI delay.

To proceed, as illustrated in Fig. 3, we first set the two switches in *path 1* to SW-II (ADC) and the other two in *path 2* to SW-I-1 (ground), so that the comparators directly measure the signs of the output signal from the PA to estimate the delay in *path 1*, $t_1$, by using the frequency domain estimation approach proposed earlier. We then switch *path 1* to SW-I-2 (ground) and connect *path 2* to SW-II (ADC) to estimate the time delay in *path 2*, $t_2$. Once both $t_1$ and $t_2$ are estimated, the system is ready to work in Mode II, and the two delay blocks on the left part of Fig. 3 are switched on to delay the input signal by $t_1-t_2$ (assuming $t_1>t_2$). The detailed procedure of the proposed time alignment algorithm is given in Table I.

TABLE I
PROCEDURE OF TIME ALIGNMENT

| Step | Operation |
|------|-----------|
| 1 | Estimate time delay $t_1$ for *path 1* |
| 1.1 | Connect *path 1* to SW-II (ADC), *path 2* to SW-I-1 (ground). |
| 1.2 | Measure the signs of PA output signal $sign(\mathbf{y})$. |
| 1.3 | Initialize $t_1 = 0$, $\Delta t = 0$, $i = 0$, $\mathbf{x}_i = \text{original input } \mathbf{x}$. |
| 1.4 | repeat |
| 1.5 | Calculate time delay $\Delta t$ between $sign(\mathbf{y})$ and $\mathbf{x}_i$ by utilizing (11) and (12). |
| 1.6 | Delay $\mathbf{x}_i$ by $\Delta t$ in the digital domain, denoted by $\mathbf{x}_{i+1}$. |
| 1.7 | Update $t_1$ as $t_1 = t_1 + \Delta t$. |
| 1.8 | Update $i$ as $i = i + 1$. |
| 1.9 | until $\Delta t < 1/64$ SI. |
| 2 | Estimate time delay $t_2$ for *path 2* |
| 2.1 | Connect *path 1* to SW-I-2 (ground), *path 2* to SW-II (ADC). |
| 2.2 | Measure the signs of signal $sign(\mathbf{x})$ in *path 2*. |
| 2.3 | Repeat 1.3-1.9. |
| 3 | Connect both *path 1* and *path 2* to SW-II (DPD training mode). |
| 4 | Delay the original input signal by $t_1 - t_2$ in *path 2*. |

In the conventional system, calibrating a fractional time delay in the discrete time domain may involve heavy computation. In the proposed approach, the time delay can be adjusted by using analog components in *path 2* and thus complex digital interpolation can be avoided, though the analog design requirement is higher.

#### C. Estimation of the Step Size

Another important issue in the proposed model extraction, i.e., (10), is the choice of the step size $\hat{c}_k$, which is critical to the linearization performance as well as the convergence speed. If the step size is too small, the algorithm may converge very slowly; conversely, if it is too large, the performance can be very poor. As mentioned in Section II.B, the proposed method is somewhat similar to the simultaneous perturbation method, for which $\hat{c}_k$ typically has the form of

$$\hat{c}_k = \frac{c_0}{k^{\gamma}},\tag{13}$$

where $c_0$ and $\gamma$ are constants which should be properly chosen, and k is the iteration number starting from 1. It is effective to set $c_0$ at a level approximately equal to the standard deviation of the measurement [32], i.e., the standard deviation of $\mathbf{y} - \mathbf{x}$ in the first iteration. As the error samples show zero mean, the radius of the black circle in Fig. 2 can be evaluated by the standard deviation of $\mathbf{y} - \mathbf{x}$.
Unfortunately, the difference between the input and output is not directly available in the proposed method. As a result, a new formula should be developed to estimate the standard deviation properly and effectively. For a given PA, the AM-AM characteristic can be obtained from the datasheet, or measured from a simple single-tone test at a specific frequency. Let us define $P_{in}^{peak}$ as the peak input power under a given average input power level and peak-to-average power ratio (PAPR) of the original signal. The corresponding actual peak output power is denoted by $P_{out}^{peak}$, and the expected peak output power with the small signal gain is $P_{out}^{peak,sg}$. Here we propose a novel algorithm that uses the characteristic of the PA, the RMS of the input sequence $\mathbf{x}$ defined in (1), and the signal bandwidth to predict $c_0$. It can be expressed as

$$c_{0} = \lambda \times \left(0.5 + \frac{B}{100\ \mathrm{MHz}}\right) \times rms\left(\frac{\mathbf{x}}{\max|\mathbf{x}|}\right) \times \left(\sqrt{\frac{P_{out}^{peak,sg}}{P_{out}^{peak}}} - 1\right), \tag{14}$$

where $\lambda$ is a damping factor defined in the region (0,1], and B denotes the input signal bandwidth. If the peak input power is equal to the power where 1 dB compression just appears, the signal bandwidth is 20 MHz, the RMS of the normalized input sequence is assumed to be 0.46, and $\lambda = 1$, then (14) gives $c_0 = (0.5+0.2)\times 0.46\times (10^{1/20}-1)\approx 0.04$ as the approximate standard deviation. The damping factor $\lambda$ is used to fine tune the step size. Usually $\lambda$ is set to 1 for simplicity. In practice, if the linearization performance is more important than the convergence speed, the value of $\lambda$ can be reduced, e.g., $\lambda = 0.8$. Another factor in (13), $\gamma$, is also important for the convergence speed and accuracy.
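The arithmetic of the worked example for (14), together with the decaying schedule in (13), can be checked with a short sketch. The function names and the `compression_db` parameter (the peak compression expressed in dB) are hypothetical, and the exponent `gamma = 0.85` is an arbitrary placeholder rather than a recommended value.

```python
def initial_step(bandwidth_mhz, rms_norm, compression_db, lam=1.0):
    """c0 from (14); compression_db is 10*log10(P_out^{peak,sg} / P_out^{peak})."""
    amplitude_ratio = 10.0 ** (compression_db / 20.0)   # square root of the power ratio
    return lam * (0.5 + bandwidth_mhz / 100.0) * rms_norm * (amplitude_ratio - 1.0)

def step_size(c0, k, gamma):
    """Decaying step size of (13) for iteration k = 1, 2, ..."""
    return c0 / k ** gamma

# Worked example from the text: 1 dB peak compression, 20 MHz, rms = 0.46, lambda = 1
c0 = initial_step(bandwidth_mhz=20.0, rms_norm=0.46, compression_db=1.0)
print(round(c0, 4))   # 0.0393, i.e. approximately 0.04

# Step sizes for the first few iterations with a placeholder exponent
steps = [step_size(c0, k, gamma=0.85) for k in (1, 2, 4)]
```

Reducing `lam` below 1 scales every step in the schedule by the same factor, which is how the damping factor trades convergence speed for accuracy.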
A general criterion for choosing a reasonable $\gamma$ is that the ratio between two adjacent step sizes satisfies

$$\frac{\hat{c}_{k-1}}{\hat{c}_k} \approx \frac{std(\mathbf{y} - \mathbf{x})_{k-1}}{std(\mathbf{y} - \mathbf{x})_k},\tag{15}$$

where $std(\cdot)$ denotes the standard deviation of a sequence. It is often difficult to evaluate the standard deviations in (15). One should try many possible values and then determine which one is the best for a particular system. Based on our test cases, the factor $\gamma$ can be obtained by

$$\gamma = 5c_0 + \sqrt{2} \cdot rms\left(\frac{\mathbf{x}}{\max|\mathbf{x}|}\right). \tag{16}$$

During model extraction, multiple iterations are normally conducted. After each iteration, the step size $\hat{c}_k$ obtained from (13) decreases, and ideally it should approach zero when the system converges. However, in a real system, due to measurement errors and noise, a predefined threshold $\delta$ should be set to keep the system stable: when $\hat{c}_k$ becomes smaller than $\delta$, it is held at $\delta$, i.e., $\hat{c}_k = \max(c_0/k^\gamma, \delta)$.

#### D. Overall Complexity Comparison

The proposed 1-bit observation method uses only two simple comparators to quantize the error signal, as shown in Fig. 3. Removing high resolution ADCs from the system can drastically reduce the power consumption as well as the cost of the feedback loop, since the ADC is one of the most expensive and power consuming components in the RF front-end [12]. Although an extra DAC is inserted, the overall complexity of the proposed method is still much less than that of the conventional one. Roughly estimated power consumptions are listed in Table II, based on the commercial devices available from the official website of Analog Devices Incorporated. Assuming the DPD correction bandwidth is 500 MHz, the total power consumption of the proposed method is 1.26 W, which is much less than that of the conventional one.
Since only 1-bit comparators are required in the proposed feedback loop, the sampling speed can be pushed much higher to meet the wider bandwidth requirement of future systems. For instance, a system with a 2 GSPS sampling rate may be used in the proposed DPD. Such a system can handle signals 4 times wider than the conventional system while consuming a comparable level of power (3.1 W versus 2.8 W in Table II). As a result, the proposed DPD is a very promising solution for systems transmitting very wideband signals.

TABLE II
COMPLEXITY COMPARISON BETWEEN PROPOSED AND CONVENTIONAL DPD

| | | Conventional | Proposed (500 MSPS) | Proposed (2000 MSPS) |
|---|---|---|---|---|
| DAC | Model type | AD9779 | AD9779 | AD9136 |
| | Number | 1 | 2 | 2 |
| | Sampling rate (MSPS) | 500 | 500 | 2000 |
| | Resolution (bits) | 16 | 16 | 16 |
| | Approx. power consumption | 0.6 W | 1.2 W | 2.9 W |
| ADC | Model type | AD9684 | ADCMP553 | ADCMP573 |
| | Number | 1 | 2 | 2 |
| | Sampling rate (MSPS) | 500 | 500 | 2000 |
| | Resolution (bits) | 14 | 1 | 1 |
| | Approx. power consumption | 2.2 W | 0.06 W | 0.2 W |
| Total power consumption | | 2.8 W | 1.26 W | 3.1 W |

In terms of computational complexity, the proposed algorithm in (10) also outperforms the conventional method in (5). First, (5) requires $K$ (the number of samples) more complex additions than (10), namely those of $\mathbf{y} - \mathbf{x}$, since the error signs in vector $\mathbf{\Delta}_s$ are obtained directly from the two analog comparators. Second, the complex multiplications of $\mathbf{X}^H \mathbf{\Delta}_s$ in (10) require fewer hardware resources than those of $\mathbf{X}^H(\mathbf{y} - \mathbf{x})$ in (5). This is because the low resolution values require less storage and allow faster read and write operations than the high resolution samples.
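The difference between the two correlation terms can be illustrated with a small sketch. It is not the authors' implementation: equations (5) and (10) are not reproduced here, the comparator pair is assumed to return the signs of the real and imaginary parts of $\mathbf{y} - \mathbf{x}$ (with zero mapped to $+1$), and the two-column basis matrix is a hypothetical stand-in for the DPD model matrix $\mathbf{X}$.

```python
def sign_error(y, x):
    """Emulate two 1-bit comparators: signs of Re and Im of (y - x)."""
    def sgn(v):
        return 1.0 if v >= 0.0 else -1.0  # zero mapped to +1 (assumption)
    return [complex(sgn((yi - xi).real), sgn((yi - xi).imag))
            for yi, xi in zip(y, x)]


def correlate(basis, d):
    """Compute basis^H d for a K x N basis stored as a list of rows."""
    n_cols = len(basis[0])
    return [sum(row[n].conjugate() * dk for row, dk in zip(basis, d))
            for n in range(n_cols)]


x = [1 + 1j, -1 + 0.5j, 0.5 - 1j]            # toy input samples
y = [1.2 + 0.8j, -0.9 + 0.7j, 0.4 - 1.1j]    # toy normalized PA output
basis = [[xi, xi * abs(xi) ** 2] for xi in x]  # hypothetical basis functions

delta_s = sign_error(y, x)                    # entries are +/-1 +/- 1j only
g_sign = correlate(basis, delta_s)            # term used with 1-bit data
g_full = correlate(basis, [yi - xi for yi, xi in zip(y, x)])  # conventional term

print(delta_s)
print(g_sign)
print(g_full)
```

Because every entry of `delta_s` is one of the four values $\pm 1 \pm j$, each product in `g_sign` reduces to additions and subtractions of the real and imaginary parts of the basis entry, which is the intuition behind the lower hardware cost.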
In summary, the overall complexity of the proposed 1-bit DPD method is much lower than that of the conventional DLA-based DPD.

## IV. EXPERIMENTAL RESULTS

Various experimental tests were conducted to evaluate the proposed method. Fully implementing the proposed 1-bit observation based DPD in hardware, as shown in Fig. 3, is difficult because the two data acquisition paths must be realized in an analog circuit chip, which would take considerable time and effort. For proof of concept, in this work we use a conventional DPD test bench to capture high resolution signals at the PA output and then convert them to low resolution samples in MATLAB, emulating the scenario in which the real 1-bit observation approach is employed. The test bench setup, shown in Fig. 6, is the same as that used in [9] except for the PA. It consists of a baseband board, an RF board, a spectrum analyzer and a broadband Doherty PA [33] operated at 2.14 GHz. The baseband board was designed to configure the RF board, generate the input signal and digitize the output signal. The quadrature modulation and demodulation were performed in the RF board, and DPD signal generation was conducted in MATLAB.

Fig. 6. Experimental test bench setup.

#### A. Proposed Method versus Conventional Method

To validate the feasibility of the proposed method, we first assume that the input and output signals are perfectly time aligned, and that the output signal is normalized so that its average power is the same as that of the input signal.

1) Evaluation with 20 MHz LTE signal: A single-carrier LTE signal with 20 MHz bandwidth and 6.8 dB PAPR is used to excite the PA in the first test. A decomposed vector rotation (DVR) model [34] is used as the DPD model, with a partition number of 8 and a memory depth of 3. The measured average output power of the PA is 33.55 dBm, and the average input power is 20.79 dBm, which is obtained from the input-output power curve [33].
The peak input power is 20.79 + 6.8 = 27.59 dBm. Again from the input-output power curve, the peak output power is $P_{out}^{peak}=39.36~\mathrm{dBm}$. The PA has a small-signal gain of approximately 12.84 dB, so the expected peak output power is $P_{out}^{peak,sg} \approx 27.59 + 12.84 = 40.43$ dBm. The RMS value of the input sequence is 0.464. By using (14) and assuming $\lambda = 1$, the initial step size $c_0$ is computed as $c_0 = (0.5 + 0.2) \times 0.464 \times (10^{(40.43 - 39.36)/20} - 1) \approx 0.04$. Then we obtain $\gamma = 5 \times 0.04 + 0.464 \times 1.414 \approx 0.9$ from (16). The step size $\mu$ of the conventional method (5) is set to 1. There are 32768 input samples in total, and the first 15000 samples are used for DPD coefficient extraction. The conventional DPD converges after about 10 iterations, while the 1-bit DPD takes 5~7 more iterations to achieve similar performance. The power spectral density (PSD) comparison is shown in Fig. 7(a), from which we can see that the proposed 1-bit DPD achieves linearization performance comparable to that of the conventional method. The adjacent channel power ratio (ACPR) is better than -56 dBc for the 1-bit DPD, which is only 2~3 dB worse than that of the conventional DPD. The AM-AM and AM-PM characteristics are shown in Fig. 7(b); both the nonlinearity and the memory effect of the PA are well compensated by the proposed 1-bit DPD.

Fig. 7. Measured results for 20 MHz LTE signal. (a) Power spectral density comparison with and without DPD. (b) AM-AM and AM-PM characteristics without DPD and with proposed 1-bit DPD.

2) Evaluation with 60 MHz UMTS signal: A 12-carrier universal mobile telecommunications system (UMTS) signal with 60 MHz bandwidth and 6.8 dB PAPR is used in the second test. The DPD model is still the DVR model with a partition number of 8 and a memory depth of 3. The measured average output power of the PA is 33.70 dBm, and the average input power is 20.95 dBm, which is obtained from the input-output power curve. The peak input power is 20.95 + 6.8 = 27.75 dBm. Again from the input-output power curve, the peak output power is $P_{out}^{peak} = 39.48$ dBm. The expected peak output power is $P_{out}^{peak,sg} \approx 27.75 + 12.84 = 40.59$ dBm. The RMS value of the input sequence is 0.468. Assuming $\lambda = 1$, the parameter $c_0$ is computed as $c_0 = (0.5 + 0.6) \times 0.468 \times (10^{(40.59 - 39.48)/20} - 1) \approx 0.07$. Then $\gamma$ is obtained from (16) as $\gamma = 5 \times 0.07 + 0.468 \times 1.414 \approx 1$. The step size $\mu$ of the conventional method (5) is set to 0.707. The PSD comparison and the AM-AM and AM-PM characteristics are shown in Fig. 8(a) and Fig. 8(b), respectively. The ACPR values for both the proposed 1-bit DPD and the conventional DPD are nearly -55 dBc, which again indicates that the proposed 1-bit DPD has almost the same capability as the conventional DPD in dealing with the nonlinearity and memory effect of the PA.

Fig. 8. Measured results for 60 MHz UMTS signal. (a) Power spectral density comparison with and without DPD. (b) AM-AM and AM-PM characteristics without DPD and with proposed 1-bit DPD.

Note that the parameters $c_0=0.07$ and $\gamma=1$ are just the recommended values in this test. Other step sizes can also be used. More test results, in terms of convergence speed, with different parameter setups are illustrated in Fig. 9, where NMSE on the vertical axes stands for the normalized mean square error. The results show that, although different parameters within a reasonable range give similar performance, properly designed parameters can indeed improve the convergence speed and the linearization performance.

Fig. 9. Convergence speed of conventional DLA-based DPD and 1-bit DPD with different parameter setups for 60 MHz UMTS signal. (a) $\gamma=1$ with different $c_0$. (b) $c_0=0.07$ with different $\gamma$.

#### B. Performance Evaluation with Proposed Time Alignment Algorithm

To validate the proposed time alignment algorithm, the PA output signal is first quantized with 1-bit resolution in MATLAB, and then the original signal is synchronized with the low resolution output signal by using the frequency-based time alignment procedure proposed in Section III.B. Note that in our test bench the feedback loop has a non-ideal phase response; more specifically, the feedback signal has a non-constant group delay, which may have a strong impact on the proposed frequency-domain algorithm. To counteract this effect, the in-band phase distortion is compensated in the frequency domain before the time alignment algorithm is carried out. The feedback signal is first time-aligned with the original input using the conventional algorithm. We then delayed the aligned output signal digitally by 5 SI to emulate the possible real delay between path 1 and path 2, and finally quantized it to 1 bit. The proposed time alignment algorithm was then run to calculate the delay between the delayed 1-bit output and the original input signal. In our test, 3~5 iterations are enough for the algorithm to converge. The PSD comparison is shown in Fig. 10, where all the DPD methods are iterated 20 times, although more iterations can still improve the performance of the 1-bit method. The ACPR result with the proposed time alignment algorithm is about 5 dB worse than that of both the 1-bit DPD and the conventional DPD with perfectly aligned data. The performance degradation with the frequency-based time alignment algorithm is mainly caused by the non-constant group delay in the feedback loop, which cannot be fully compensated. Actually, in the system demonstrated in Fig.
3, one can tune the time mismatch between the input and output signals manually to achieve better performance, since this is done only once in the initial setup.

Fig. 10. Power spectral density comparison with proposed time alignment algorithm and conventional time alignment algorithm (perfect).

#### C. Impact of Power Mismatch

The power alignment is implemented in the analog domain in the proposed 1-bit DPD system, which differs from the conventional normalization in the digital domain. Due to the temperature or voltage variation of the analog devices, the power of the input and output signals sometimes cannot be perfectly aligned. The power mismatch issue is evaluated in this subsection. To emulate the real power variation, the maximum amplitudes of both the input and output signals are normalized to 1, and the maximum amplitude of the output signal is then scaled to different levels $\rho$, i.e., $\rho = \{0.95, 0.98, 0.99, 1.02, 1.05, 1.1\}$, as shown in Fig. 11.

Fig. 11. Power spectral density comparison with different amplitude (power) mismatches.

The PSD comparison clearly shows that when $\rho = 1.02$, the linearization performance remains almost unchanged compared to the conventional DPD. However, when $\rho$ increases further, the performance degrades. On the other hand, when $\rho < 1$, the linearization performance degrades very quickly as $\rho$ decreases. This is because when $\rho < 1$, the DPD model is only designed for the signal in the magnitude region $[0, \rho]$; the samples whose magnitudes are larger than $\rho$ encounter the extrapolation problem, and large errors may occur. In contrast, when $\rho > 1$, although the power is not perfectly matched, the DPD is capable of dealing with all the samples falling in the region [0,1], and thus less error appears in this case.
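The mismatch emulation described above can be sketched in a few lines. This is a hedged illustration rather than the authors' MATLAB procedure: the function names are hypothetical, the sample values are toy data, and the count of input samples above $\rho$ is only an indicator of how many samples would fall in the extrapolation region when $\rho < 1$.

```python
def normalize_peak(seq):
    """Scale a complex sequence so that its maximum magnitude equals 1."""
    peak = max(abs(v) for v in seq)
    return [v / peak for v in seq]


def emulate_mismatch(x, y, rho):
    """Peak-normalize input and output, then scale the output peak to rho."""
    x_n = normalize_peak(x)
    y_s = [rho * v for v in normalize_peak(y)]
    return x_n, y_s


def fraction_above(seq, level):
    """Fraction of samples whose magnitude exceeds the given level."""
    return sum(1 for v in seq if abs(v) > level) / len(seq)


x = [0.5 + 0j, 1 + 0j, -0.25j]     # toy input samples
y = [2.0 + 0j, 4j, -1.0 + 0j]      # toy output samples
for rho in (0.95, 0.98, 0.99, 1.02, 1.05, 1.1):
    x_n, y_s = emulate_mismatch(x, y, rho)
    peak_out = max(abs(v) for v in y_s)
    print(f"rho={rho:<4}  output peak={peak_out:.2f}  "
          f"input fraction above rho={fraction_above(x_n, rho):.2f}")
```

For $\rho > 1$ no normalized input sample exceeds $\rho$, which matches the observation that the model then still covers the whole region [0,1].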
The tuning step of existing commercially available digitally controlled VGAs can be as small as 0.125 dB/step, which limits the maximum scaling factor to approximately 1.029. With integrated circuit solutions, the tuning precision can be further improved. The power misalignment between the two paths is thus practically controllable.

## V. CONCLUSION

This paper proposes a low-complexity 1-bit observation method for estimation of DPD coefficients. The feasibility of the proposed algorithm is proved in theory and validated in experimental tests. With existing ADC technology, it is possible to achieve either high sampling speed with low resolution or high resolution with low sampling speed, but hardly both high sampling speed and high resolution at the same time. The 1-bit observation solution eases the ADC requirement in the DPD system, and thus reduces both the power consumption and the cost of the feedback path compared to the conventional algorithms with high resolution data. Looking ahead, the proposed method has two main potential applications: 1) applying DPD in small cells becomes realistic due to the ultra-low complexity; 2) it becomes possible to employ DPD in future broadband systems, such as 5G, since an ultra-high sampling speed can be achieved by using only simple comparators. Finally, the authors expect that this technique will attract significant interest and attention among researchers in the field of DPD.

## ACKNOWLEDGMENT

The authors would like to thank Y. Guo, N. Kelly, W. Cao, M. Yang and Y. Wang, all with the School of Electrical and Electronic Engineering, University College Dublin, Dublin 4, Ireland, for providing valuable discussion and helpful test support during this work. H. Wang would like to thank Mr. and Ms. O'Dwyer for providing excellent accommodation for him during his stay in Ireland.

## REFERENCES

- [1] F. M. Ghannouchi and O. Hammi, "Behavioral modeling and predistortion," *IEEE Microw. Mag.*, vol. 10, no. 7, pp. 52–64, Dec.
2009.
- [2] L. Guan and A. Zhu, "Green communications: digital predistortion for wideband RF power amplifiers," *IEEE Microw. Mag.*, vol. 15, no. 7, pp. 84–99, Nov. 2014.
- [3] J. Wood, "Digital pre-distortion of RF power amplifiers: progress to date and future challenges," in *IEEE MTT-S Int. Microw. Symp. Dig.*, May 2015, pp. 1–3.
- [4] J. Reina-Tosina, M. Allegue-Martínez, M. J. Madero-Ayora, C. Crespo-Cadenas, and S. Cruces, "Digital predistortion based on a compressed-sensing approach," in *Proc. Eur. Microw. Conf.*, Nuremberg, Oct. 2013, pp. 408–411.
- [5] A. Abdelhafiz, A. Kwan, O. Hammi, and F. M. Ghannouchi, "Digital predistortion of LTE-A power amplifiers using compressed-sampling-based unstructured pruning of Volterra series," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 11, pp. 2583–2593, Nov. 2014.
- [6] J. Reina-Tosina, M. Allegue-Martínez, C. Crespo-Cadenas, C. Yu, and S. Cruces, "Behavioral modeling and predistortion of power amplifiers under sparsity hypothesis," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 2, pp. 745–753, Feb. 2015.
- [7] M. B. Mabrouk, G. Ferre, E. Grivel, and N. Deltimple, "Interacting multiple model based detector to compensate power amplifier distortions in cognitive radio," *IEEE Trans. Commun.*, vol. 63, no. 5, pp. 1580–1593, May 2015.
- [8] M. V. Amiri, S. A. Bassam, M. Helaoui, and F. M. Ghannouchi, "Partitioned distortion mitigation in LTE radio uplink to enhance transmitter efficiency," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 8, pp. 2661–2671, Aug. 2015.
- [9] C. Yu, L. Guan, E. Zhu, and A. Zhu, "Band-limited Volterra series-based digital predistortion for wideband RF power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 12, pp. 4198–4208, Dec. 2012.
- [10] L. Ding, F. Mujica, and Z. Yang, "Digital predistortion using direct learning with reduced bandwidth feedback," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Seattle, WA, June 2013, pp. 1–3.
- [11] Y. Ma, Y. Yamao, Y. Akaiwa, and K.
Ishibashi, "Wideband digital predistortion using spectral extrapolation of band-limited feedback signal," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 7, pp. 2088–2097, July 2014.
- [12] Y. Liu, J. J. Yan, H. T. Dabag, and P. M. Asbeck, "Novel technique for wideband digital predistortion of power amplifiers with an undersampling ADC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 11, pp. 2604–2617, Nov. 2014.
- [13] O. Hammi, A. Kwan, S. Bensmida, K. A. Morris, and F. M. Ghannouchi, "A digital predistortion system with extended correction bandwidth with application to LTE-A nonlinear power amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 12, pp. 3487–3495, Dec. 2014.
- [14] Y. Liu, J. J. Yan, and P. M. Asbeck, "Concurrent dual-band digital predistortion with a single feedback loop," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 5, pp. 1556–1568, May 2015.
- [15] W. Tao, H. Wang, C. Zhou, G. Li, and F. Liu, "A random demodulation based reduced sampling rate method for wideband digital predistortion," in *Proc. Asia-Pacific Microw. Conf.*, vol. 1, Nanjing, China, Dec. 2015, pp. 1–3.
- [16] L. Guan, R. Kearney, C. Yu, and A. Zhu, "High-performance digital predistortion test platform development for wideband RF power amplifiers," *Int. J. Microw. Wirel. T.*, vol. 5, no. 2, pp. 149–162, 2013.
- [17] R. H. Walden, "Analog-to-digital converter survey and analysis," *IEEE J. Sel. Areas Commun.*, vol. 17, no. 4, pp. 539–550, Apr. 1999.
- [18] Y. Liu, X. Quan, S. Shao, and Y. Tang, "Digital predistortion architecture with reduced ADC dynamic range," *Electron. Lett.*, vol. 52, no. 6, pp. 435–437, Mar. 2016.
- [19] Y. Ma, Y. Yamao, Y. Akaiwa, and C. Yu, "FPGA implementation of adaptive digital predistorter with fast convergence rate and low complexity for multi-channel transmitters," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 11, pp. 3961–3973, Nov. 2013.
- [20] R. N.
Braithwaite, "A comparison of indirect learning and closed loop estimators used in digital predistortion of power amplifiers," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Phoenix, AZ, May 2015, pp. 1–4.
- [21] H. W. Kang, Y. S. Cho, and D. H. Youn, "On compensating nonlinear distortions of an OFDM system using an efficient adaptive predistorter," *IEEE Trans. Commun.*, vol. 47, no. 4, pp. 522–526, Apr. 1999.
- [22] D. Zhou and V. E. DeBrunner, "Novel adaptive nonlinear predistorters based on the direct learning algorithm," *IEEE Trans. Signal Process.*, vol. 55, no. 1, pp. 120–133, Jan. 2007.
- [23] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," *Johns Hopkins APL Tech. Dig.*, vol. 19, no. 4, pp. 482–492, 1998.
- [24] N. Kelly and A. Zhu, "A modified simultaneous perturbation stochastic optimization algorithm for digital predistortion model extraction," in *Proc. IEEE Int. Workshop Integr. Nonlinear Microw. Millimetre-Wave Circuits*, Oct. 2015, pp. 1–3.
- [25] O. Hammi and F. M. Ghannouchi, "Power alignment of digital predistorters for power amplifiers linearity optimization," *IEEE Trans. Broadcast.*, vol. 55, no. 1, pp. 109–114, Mar. 2009.
- [26] H. Wang, F. Liu, W. Tao, and G. Li, "One-step extraction of optimal normalisation gain for digital predistortion linearisation," *Electron. Lett.*, vol. 51, no. 6, pp. 514–516, Mar. 2015.
- [27] H. Qian, H. Huang, and S. Yao, "A general adaptive digital predistortion architecture for stand-alone RF power amplifiers," *IEEE Trans. Broadcast.*, vol. 59, no. 3, pp. 528–538, Sep. 2013.
- [28] C. D. Presti, D. F. Kimball, and P. M. Asbeck, "Closed-loop digital predistortion system with fast real-time adaptation applied to a handset WCDMA PA module," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 3, pp. 604–618, Mar. 2012.
- [29] G. Jacovitti and G. Scarano, "Discrete time techniques for time delay estimation," *IEEE Trans. Signal Process.*, vol. 41, no. 2, pp. 525–533, Feb.
1993.
- [30] Y. Ma, Y. Akaiwa, Y. Yamao, and S. He, "Test bed for characterization and predistortion of power amplifiers," *Int. J. RF Microw. Comput. Eng.*, vol. 23, no. 1, pp. 74–82, 2013.
- [31] A. S. Wright and W. G. Durtler, "Experimental performance of an adaptive digital linearized power amplifier [for cellular telephony]," *IEEE Trans. Veh. Technol.*, vol. 41, no. 4, pp. 395–400, Nov. 1992.
- [32] J. C. Spall, "Implementation of the simultaneous perturbation algorithm for stochastic optimization," *IEEE Trans. Aerosp. Electron. Syst.*, vol. 34, no. 3, pp. 817–823, July 1998.
- [33] M. Yang, J. Xia, and A. Zhu, "A 1.8–2.3 GHz broadband Doherty power amplifier with a minimized impedance transformation ratio," in *Proc. Asia-Pacific Microw. Conf.*, vol. 1, Nanjing, China, Dec. 2015, pp. 1–3.
- [34] A. Zhu, "Decomposed vector rotation-based behavioral modeling for digital predistortion of RF power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 2, pp. 737–744, Feb. 2015.

**Haoyu Wang** was born in Deyang, Sichuan Province, China, in 1989. He received the B.E. degree in electronic engineering from University of Science and Technology of China (USTC), Hefei, China, in 2012. He is currently pursuing the Ph.D. degree in electromagnetic field and microwave technology at USTC. From Mar. to May 2016, he was a visiting Ph.D. student with the IoE<sup>2</sup> Lab, School of Electrical and Electronic Engineering, University College Dublin, Ireland. His research interests include behavioral modeling of RF power amplifiers, digital predistortion linearization and digital signal processing.

**Gang Li** received the B.E. degree in electronic engineering from University of Science and Technology of China (USTC), Hefei, China, in 2013. He is currently working toward the Ph.D. degree in electromagnetic field and microwave technology at USTC. His research interests include digital predistortion and nonlinear modeling of transmitters.
**Chongbin Zhou** was born in Luoyang, Henan Province, China, in 1988. He received the B.E. degree from University of Science and Technology of China (USTC), Hefei, China, in 2011. He is currently working towards the Ph.D. degree at USTC. His research interests include compressive sensing, signal processing and radar imaging.

**Wei Tao** received the B.E. degree in electronic information science and technology from Dalian Maritime University, Dalian, China, in 2013. He is currently working toward the M.S. degree in electromagnetic field and microwave technology at University of Science and Technology of China, Hefei, China. His research interests focus on wireless communication technology, mainly including power amplifier linearization, digital predistortion, and nonlinear modeling of transmitters.

**Falin Liu** was born in Xingtai, Hebei Province, China, in July 1963. He received the B.E. degree from Tsinghua University (THU), Beijing, China, in 1985, and the M.E. and Ph.D. degrees from University of Science and Technology of China (USTC), Hefei, China, in 1988 and 2004, respectively, all in electronic engineering. Since 1988, he has been with the Department of Electronic Engineering and Information Science, USTC, where he is currently a full Professor. From Oct. 1997 to Oct. 1998, he was a Visiting Scholar at Tohoku University, Japan. He has published over 90 papers in refereed journals and international conferences. His research interests include mm-wave transceivers, passive devices, computational electromagnetics, microwave communications and radar imaging. Dr. Liu is a senior member of the Chinese Institute of Electronics. He is the associate editor-in-chief of Journal of Microwaves (in Chinese) and a member of the editorial board of Journal of Radars (in Chinese). He is the recipient of the second prize of the National Science and Technology Progress Award and the first prize of the CAS (Chinese Academy of Sciences) Science and Technology Progress Award.
**Anding Zhu** (S'00-M'04-SM'12) received the B.E. degree in telecommunication engineering from North China Electric Power University, Baoding, China, in 1997, the M.E. degree in computer applications from Beijing University of Posts and Telecommunications, Beijing, China, in 2000, and the Ph.D. degree in electronic engineering from University College Dublin (UCD), Dublin, Ireland, in 2004. He is currently an Associate Professor with the School of Electrical and Electronic Engineering, UCD. His research interests include high-frequency nonlinear system modeling and device characterization techniques with a particular emphasis on behavioral modeling and linearization of RF power amplifiers for wireless communications. He also has interests in high efficiency power amplifier design, wireless transmitter architectures, digital signal processing and nonlinear system identification algorithms.