A High IIP2 SAW-Less Superheterodyne Receiver With Multistage Harmonic Rejection

Iman Madadi, Member, IEEE, Massoud Tohidian, Member, IEEE, Koen Cornelissens, Patrick Vandennameele, and Robert Bogdan Staszewski, Fellow, IEEE

Abstract—In this paper, we propose and demonstrate the first fully integrated surface acoustic wave (SAW)-less superheterodyne receiver (RX) for 4G cellular applications. The RX operates in discrete-time domain and introduces various innovations to simultaneously improve noise and linearity performance while reducing power consumption: a highly linear wideband noise-canceling low-noise transconductance amplifier (LNTA), a blocker-resilient octal charge-sharing bandpass filter, and a cascaded harmonic rejection circuitry. The RX is implemented in 28-nm CMOS and it does not require any calibration. It features NF of 2.1–2.6 dB, an immeasurably high input second intercept point for closely-spaced or modulated interferers, and input third intercept point of 8–14 dBm, while drawing only 22–40 mW in various operating modes.

Index Terms—Bandpass filter (BPF), charge-sharing, discrete-time, IIP2, process-scalable, receiver, surface acoustic wave (SAW)-less, superheterodyne.

I. INTRODUCTION

CONVENTIONAL multiband, multistandard cellular receivers (RXs) require many external duplexers, surface acoustic wave (SAW) filters and switches, typically one per band, to attenuate out-of-band (OB) blockers before they reach the sensitive low-noise amplifier (LNA) input. In time-division duplexing (TDD) systems, external SAW filters can be eliminated if the RX chain can handle large interferers (e.g., 0 dBm at 20 MHz away from a GSM channel of interest [1]). On the other hand, for frequency-division duplexing (FDD) systems, the external SAW filters are responsible not only for the filtering of OB blockers but also for duplexing, i.e., separation of concurrent transmit (TX) and RX operations. To reduce cost and size of the total system solution, in which the external antenna interface network is nowadays the largest contributor, the recent trend is to eliminate SAW filters and switches by using a highly linear wideband RX [2]–[7]. As a consequence, the TX-to-RX isolation and the suppression of TX interferers are worsening, which further increases the RX linearity requirements in FDD systems.

The resulting reductions in OB filtering imply tough IIP2 requirements (e.g., 90 dBm [7], [8]) for zero-IF (ZIF) and low-IF (LIF) receivers. The IIP2 performance of such receivers depends mainly on the second-order nonlinearity of the LNA and RF mixer in the receiver chain, as shown in Fig. 1(a). Since the typical IIP2 of an RF mixer is between 50 and 70 dBm [9], ZIF/LIF receivers require highly sophisticated calibration algorithms [7], [10]–[15] to be frequently executed to account for variations in power supply [4], [16]–[20], process corner [20], temperature [21], mixer transistor’s gate bias [16], RF blocker frequency [14], [17], [19], [20], LO frequency [17], [19], [20], LO power [20], and channel frequency [21]. Also, the IIP2 calibration is slow, and it needs to be run repeatedly due to environmental and operational changes [16].

Superheterodyne or high-IF (HIF) architectures, on the other hand, can have a theoretically infinite IIP2. As shown in Fig. 1(b), the desired signal and modulated blocker at the RF input will be down-converted to a higher IF and dc, respectively; thus, the modulated blocker can be completely filtered out by a bandpass filter (BPF) [22], [23]. For this reason, there is an increasing interest in uncalibrated high-IIP2 SAW-less superheterodyne RXs with integrated blocker-tolerant BPFs that are amenable to CMOS scaling.

This paper is organized as follows. An overview of wireless receivers is presented in Section II. In Section III, the general idea of the proposed RX with $M/N$-phase discrete-time (DT) operation is discussed. Section IV provides detailed analysis of the $M/N$-phase DT charge-sharing (CS)-BPF. Section V gives a description of a cascaded three-stage harmonic rejection (HR) circuitry. Design and implementation of the receiver chain are described in Section VI, with measurement results given in Section VII. Finally, the conclusion is drawn in Section VIII.

II. OVERVIEW OF STATE-OF-THE-ART WIRELESS RECEIVERS

The pioneers of RFIC integration quickly realized the superiority of operating receivers at ZIF/LIF rather than at HIF [24]: a simpler architecture and a much higher level of monolithic integration as a result of using low-frequency low-pass filters (LPFs) for channel selection [see Fig. 1(a)]. This was despite the many issues associated with ZIF/LIF receivers: time-variant dc offsets, sensitivity to $1/f$ (flicker) noise, large in-band LO leakage, and the second-order nonlinearity [2]–[7]. Those issues were viewed rather as an inconvenience and handled through various calibrations. However, high-performance cellular ZIF/LIF receivers now require extensive calibration efforts. For example, an intensive IIP2 calibration needs to be concurrently run in the background with dc offset and HR calibration [8], [18].

A superheterodyne architecture, shown in Fig. 1(b), pushes the IF frequency much higher such that the aforementioned problems are not a major concern anymore. Despite the obvious advantages, the superheterodyne radios have been abandoned for decades because it was extremely difficult to integrate a high quality (Q)-factor BPF for image rejection in CMOS using continuous-time (CT) circuitry [24].

The integration problem of HIF BPF was addressed in [25] [see Fig. 2(a)] utilizing an $N$-path filtering technique [26]–[31]; and in [32] [see Fig. 2(b)], [33] using a discrete-time (DT) quadrature CS-BPF [34]–[36]. The $N$-path filter cannot reject images defined as blockers/interferers at harmonics of the IF frequency because it inherently features replicas there [25]. On the contrary, a transfer function (TF) of the DT CS-BPF has only one peak in the entire sampling frequency domain of $-f_s/2$ to $f_s/2$, which makes it a proper candidate as an integrated BPF for superheterodyne receivers [34]. The center frequency and bandwidth of the full-rate DT CS-BPF in [32] and [33] are precisely controlled via $f_s$ and capacitor ratios. Additionally, that filter comprises only transistors as switches and capacitors, which occupy a small area and follow the process scaling very well. Unfortunately, the CS-BPF in [32] and [33] has insufficient blocker rejection to support the SAW-less operation.

In this work, we propose the superheterodyne architecture shown in Fig. 3 that utilizes a novel charge-sharing BPF based on an $M/N$-phase signaling and an extra pole to improve filtering. Combined with a proposed highly linear wideband low-noise transconductance amplifier (LNTA) and cascaded HR stages, the first-ever SAW-less HIF (superheterodyne) RX is thus demonstrated. By exploiting two stages of the $M/N$-phase CS-BPF, the desired signal is amplified, while the images and in-band/OB blockers are progressively filtered out throughout the receiver chain.

As stated above, the proposed architecture has several key advantages compared to state-of-the-art LIF RXs. First, since its IF is high, the issues associated with LIF RXs are eliminated, especially IIP2 and the need for dc offset calibration. Also, $1/f$ noise is not a concern anymore, so the active IF amplifiers use minimum length transistors. Second, two stages of DT CS-BPF consist of only capacitors as information charge storage devices, and transistors as switches. All of this makes the structure more compatible with the technology scaling. Moreover, the proposed RX offers the same level of monolithic integration as LIF RXs without using any calibration. Furthermore, the proposed RX exhibits clear advantages over the traditional superheterodyne RXs, which are summarized below. First, it includes two stages of integrated blocker-tolerant complex image-reject CS-BPFs and three stages of HR circuitry. Second, since the center frequency (i.e., coinciding with the chosen IF) of the \(M/N\)-phase DT CS-BPF is well controlled by clock frequency and ratio of capacitors, the IF frequency could be changed, thus avoiding RX desensitization in face of extremely large blockers. Finally, the second mixer and baseband filters have been moved to the digital domain after the ADC (external in this work); hence, they are ideal.

Fig. 2. State-of-the-art superheterodyne receivers.

III. PROPOSED SAW-LESS SUPER-HETERODYNE RECEIVER

Digital circuits benefit from process scaling in both speed and power consumption due to, respectively, the increase in transistor transit frequency \(f_T\) and lowering of its dimensions with every finer process technology node. However, analog/RF circuitry is getting worse, except for LNAs, because the threshold voltage \(V_{th}\) remains almost constant, while the supply voltage \(V_{DD}\) decreases. Also, the intrinsic gain and signal swing are reduced. All of those make analog/RF circuitry not amenable to CMOS scaling [37]–[42].

On the other hand, the DT approach is based on building blocks that scale very well: transistors acting as switches, switched capacitors, inverter-based \(g_m\)-cells, and digital clock generation circuitry. Hence, the RF performance improves with newer CMOS technology [32], [43]. These reasons motivate us to exploit the DT approach in the proposed SAW-less superheterodyne RX shown in Fig. 3.

The input voltage at the antenna is converted to current by the LNTA and down-converted to HIF by the DT sampling RF mixer, as shown in Fig. 3. The octal (i.e., eight-phase) mixer can be reconfigured to operate in the quadrature (i.e., four-phase) mode if the detected reception conditions are not demanding. After the mixer, the sampled down-converted signal is fed to the DT CS-BPF to attenuate images and OB blockers. To reduce the power consumption of the first CS-BPF even further, decimation by 2 can be performed by integrating two samples, thus giving rise to an antialiasing sinc-type TF. In addition to all advantages of the two-stage CS-BPF, each stage provides intrinsic 3\(^{rd}\)/5\(^{th}\) HR that can be further improved by turning on the additional HR block. The second CS-BPF is cascaded via inverter-based \(g_m\)-cells providing flicker-noise-free gain. The sufficient front-end filtering provided by the two-stage CS-BPF (unlike in [32]) allows the IF signal to be digitized directly using a low-power ADC, and the second mixer and baseband filtering to be moved into the digital domain. As calculated, a 10-bit 400 MS/s ADC should be sufficient after the two stages of CS-BPF filtering, while consuming less than 2 mW with a state-of-the-art successive approximation register ADC [44]. Also, it should be mentioned that the IIP2 of the ADC is not a concern because the ADC’s IM2 component is at dc and the desired signal is at IF. The only possible limitation on the IIP2 in the proposed receiver is the quantization noise of the second digital mixer, but it can be arbitrarily reduced by increasing its word length.
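
The decimation-by-2 step can be illustrated with a short numerical sketch. Integrating two consecutive samples corresponds to the FIR response \(1 + z^{-1}\), whose magnitude has a sinc-like shape with a null at half the input sampling rate, i.e., at the frequency that aliases onto dc after decimation. The function name and the 1 GS/s rate below are illustrative assumptions, not values taken from this design.

```python
import cmath
import math

def two_sample_integrator_gain(f, fs):
    """Magnitude of 1 + z^-1 at frequency f for an input sampling rate fs."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return abs(1 + z_inv)

fs = 1.0e9  # hypothetical input sampling rate in Hz (assumption)
for f in (0.0, 0.1 * fs, 0.25 * fs, 0.4 * fs, 0.5 * fs):
    # Normalize to the dc gain of 2; the floor avoids log10(0) at the null.
    gain = max(two_sample_integrator_gain(f, fs), 1e-12) / 2
    print(f"f/fs = {f / fs:4.2f}: {20 * math.log10(gain):8.2f} dB relative to dc")
```

Components close to, but not exactly at, half the input rate are attenuated rather than nulled, so this response complements the CS-BPF filtering instead of replacing it.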

IV. DT \(M/N\)-PHASE CS-BPF

The DT CS-BPF exhibits clear advantages over the traditional types of filters, such as active-RC, \(N\)-path, \(g_m\)-C, and biquad. The active-RC and \(g_m\)-C filters are substantially noisier due to the noise contributions from opamp and \(g_m\) components. Those components also generate flicker noise; thus to suppress it, their area needs to be very large. Furthermore, typical IF and BB filters need to be reconfigurable, in which the required bandwidth scales over a decade. Since the bandwidth in active filters is determined by the RC or \(C/g_m\) time constant, the capacitors should be up to 50% larger to compensate for RC and \(g_m\)-C mismatches. This contributes to their area disadvantage. As far as the \(N\)-path filters are concerned, they suffer from replicas at harmonics of their mixer switching frequency, while CS-BPF has only one peak in the entire sampling frequency. Also, in the traditional \(N\)-path filter, the stop-band rejection is severely limited by the switch ON-resistance.

\(^1\)LNA noise figure improves when \(f_T\) increases.

A. Conventional Quadrature CS-BPF

Fig. 4(a) shows the well-known DT IIR LPF [45]. The input current \(i\), generated by a \(g_m\)-cell, is integrated on the history capacitances \(C_H\) and \(C_R\) as the input charge packet \(q_0 = \int_{-\frac{T}{2}}^{\frac{T}{2}} i(t) dt\) during \(\varphi_1\) over a time window \(T_s\). When \(\varphi_1\) goes inactive, \(C_R\) samples a portion of the total “history” charge. As a result, the DT circuit illustrated in Fig. 4(a) has a 1st-order IIR characteristic, with \(C_R\) acting as a lossy component (termed “switched-capacitor resistor”). The order of the Fig. 4(a) DT IIR filter can be further increased to 2nd or 4th, as shown in Fig. 4(b) and (c), respectively, or indefinitely beyond, as demonstrated in [46]. The conventional quadrature CS-BPF with a single real-valued output can be synthesized from the 4th-order DT IIR filter by applying input charge packets \(q_0\), \(q_{90}\), \(q_{180}\), and \(q_{270}\) with phase shifts that are multiples of 90°, as shown in Fig. 4(d) [34]. By defining the complex-valued input constructed from two differential signals having the quadrature relationship, \(q_I = q_0 - q_{180}\) and \(q_Q = q_{90} - q_{270}\), the complex TF of a conventional quadrature CS-BPF is derived as

\[
H(z) = \frac{V_{oI}(z) + jV_{oQ}(z)}{q_I(z) + jq_Q(z)} = \frac{k}{1 - [a + j \cdot (1 - a)]z^{-1}} \tag{1}
\]

where

\[
k = 1/(C_H + C_R) \tag{2}
\]

\[
a = C_H/(C_H + C_R). \tag{3}
\]

This TF has a 1st-order complex BPF characteristic with its peak located at

\[
f_{BF} = \frac{f_s}{2\pi} \arctan \left( \frac{1 - a}{a} \right). \tag{4}
\]

The filter comprises only capacitors and switching transistors. Its center frequency \(f_{BF}\) only depends on the sampling frequency \(f_s\) and capacitor ratios. Hence, it is fully amenable to process scaling.
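
As a numerical cross-check of (1)–(4), the sketch below evaluates the TF on a frequency grid and compares the location of its maximum with the closed-form center frequency. The sampling rate and capacitor values are hypothetical, chosen only so that \(a = 0.9\); the function names are likewise illustrative.

```python
import cmath
import math

def quad_csbpf_tf(f, fs, c_h, c_r):
    """Evaluate eq. (1) at frequency f (Hz) for a sampling rate fs (Hz)."""
    k = 1.0 / (c_h + c_r)          # eq. (2)
    a = c_h / (c_h + c_r)          # eq. (3)
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return k / (1 - (a + 1j * (1 - a)) * z_inv)

def quad_center_frequency(fs, c_h, c_r):
    """Center frequency from eq. (4)."""
    a = c_h / (c_h + c_r)
    return fs / (2 * math.pi) * math.atan((1 - a) / a)

# Hypothetical values (assumptions): 1 GS/s, C_H = 9 pF, C_R = 1 pF.
fs, c_h, c_r = 1.0e9, 9.0e-12, 1.0e-12
f_c = quad_center_frequency(fs, c_h, c_r)

# Brute-force search over -fs/2 .. fs/2 in 50 kHz steps.
grid = [i * fs / 20000 - fs / 2 for i in range(20001)]
f_peak = max(grid, key=lambda f: abs(quad_csbpf_tf(f, fs, c_h, c_r)))
print(f"eq. (4): {f_c / 1e6:.3f} MHz, grid search: {f_peak / 1e6:.3f} MHz")
```

The grid search finds a single maximum over the whole \(-f_s/2\) to \(f_s/2\) range, in line with the single-peak property of the CS-BPF noted earlier.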

B. 8/8-Phase CS-BPF

The filtering characteristic and tolerance to OB blockers of the conventional quadrature CS-BPF can be significantly enhanced by increasing the number of inputs, corresponding history capacitors, and digital clock phases to 8 (i.e., octal) or more. As an example of such a filter, the schematic of an 8/8-phase CS-BPF is proposed in Fig. 4(e), where it features eight inputs/outputs, eight history capacitors, and eight digital clock phases. The inputs, which are generated by the DT mixer for the first filter, are differential integrated charge packets \(q_1, q_2, q_3, q_4\) that are phase shifted by 0°, 45°, 90°, and 135°. As in the traditional CS-BPF, \( C_R \) shares the charge between various \( C_H \)s. By defining the complex output voltage as

\[
V_{oC} = V_{o,1} + e^{j\pi/4}V_{o,2} + e^{j\pi/2}V_{o,3} + e^{j3\pi/4}V_{o,4} \tag{5}
\]

and complex input charge as

\[
q_{iC} = q_1 + e^{j\pi/4}q_2 + e^{j\pi/2}q_3 + e^{j3\pi/4}q_4 \tag{6}
\]

and following the same approach as presented in [34], we find the complex TF of the 8/8-phase CS-BPF, driven by ideal input charge packets, as

\[
H_{8/8}(z) = \frac{V_{oC}(z)}{q_{iC}(z)} = \frac{k}{(1-a z^{-1}) - e^{j\pi/4} (1-a) z^{-1}} \tag{7}
\]

where \( k \) and \( a \) are the same as in (2) and (3). The peak of the TF lies at

\[
f_{IF} = \frac{f_s}{2\pi} \arctan \left[ \frac{(1-a)\sin(\pi/4)}{a + (1-a)\cos(\pi/4)} \right]. \tag{8}
\]

The 8/8-phase CS-BPF has a 1st-order BPF characteristic centered at \( f_{IF} \). In addition to the filtering improvement over its conventional counterpart, this filter is capable of filtering images and OB blockers at \( 3^{rd}/5^{th} \) LO harmonics. It should be noted that this filter still maintains the full compatibility with the technology scaling due to its DT passive nature.
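
Because (7) is a complex TF, its response at \(+f_{IF}\) and at the image frequency \(-f_{IF}\) differs. The following sketch evaluates (7) at both points to illustrate this asymmetry; \(a = 0.9\) is a hypothetical capacitor ratio, \(k\) is normalized to 1, and the function names are illustrative.

```python
import cmath
import math

def csbpf_8_8_tf(theta, a, k=1.0):
    """Eq. (7) evaluated at z = exp(j*theta); k is normalized to 1."""
    z_inv = cmath.exp(-1j * theta)
    return k / ((1 - a * z_inv) - cmath.exp(1j * math.pi / 4) * (1 - a) * z_inv)

def center_angle_8_8(a):
    """Normalized center frequency 2*pi*f_IF/f_s from eq. (8)."""
    s, c = math.sin(math.pi / 4), math.cos(math.pi / 4)
    return math.atan((1 - a) * s / (a + (1 - a) * c))

a = 0.9  # hypothetical C_H / (C_H + C_R) (assumption)
theta_if = center_angle_8_8(a)
gain_if = abs(csbpf_8_8_tf(theta_if, a))
gain_image = abs(csbpf_8_8_tf(-theta_if, a))
rejection_db = 20 * math.log10(gain_if / gain_image)
print(f"f_IF/f_s = {theta_if / (2 * math.pi):.4f}, "
      f"image rejection = {rejection_db:.1f} dB")
```

The printed figure applies to one 1st-order stage with the assumed ratio only; it is not a measured or simulated value from this work.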

C. 8/16-Phase CS-BPF

To further improve the filtering order and characteristics of the 8/8-phase CS-BPF, we propose to add an IIR LPF (of single or multiple poles) during the charge-sharing process in between every two adjacent inputs. As an example of such a filter, one LPF pole is added between each pair of adjacent input history capacitors \( C_H \) in Fig. 4(f) to give rise to an 8/16-phase CS-BPF. This filter has 8 inputs, 8 outputs, 16 \( C_H \)'s (8 of them are input \( C_H \)'s), and 16 nonoverlapped clock phases with a duty cycle of 1/16. The input is interpreted as four differential charge packets \( (q_1, q_2, q_3, \text{ and } q_4) \) with phase shifts that are multiples of 45°, provided by the DT mixer. The eight individual single-ended input charge packets are accumulated into their respective input \( C_H \)'s. At the end of each odd-numbered phase \( \varphi_1, \varphi_3, \ldots, \varphi_{15} \), the rotating capacitor \( C_R \) samples a charge from the active \( C_H \). In the following even-numbered phase of \( \varphi_2, \varphi_4, \ldots, \varphi_{16} \), \( C_R \) containing the previous packet is charge-shared with a newly introduced history capacitor, termed “output \( C_H \),” which contains the intermediate (i.e., additionally LPF filtered) version of the “history” charge. Therefore, in each phase, \( C_R \) removes a charge proportional to \( C_H / (C_H + C_R) \) from each \( C_H \) (whether input or output) and then delivers it to the next \( C_H \). The newly introduced output history capacitors add significant extra filtering, thus improving blocker resiliency. They also provide convenient pick-up nodes for the dedicated output port that is now physically separate from the input.

In the above case, the 8/16-phase CS-BPF does not operate at the full rate and so all eight outputs can be read out at the maximum sampling rate of \( f_s = 1/T_s = f_{LO} \). By defining the \( V_{oC} \) and \( q_{iC} \), the same as (5) and (6), the filtering TF of the filter driven by ideal charge packets, as shown in Fig. 4(f), can be proven to be

\[
H_{8/16}(z) = \frac{V_{oC}(z)}{q_{iC}(z)} = \frac{k \cdot (1-a) z^{-1}}{(1-a z^{-1})^2 - e^{j\pi/4} [(1-a) z^{-1}]^2} \tag{9}
\]

where \( k \) and \( a \) are the same as (2) and (3), respectively. We find the center frequency of the filter to be

\[
f_{IF} = \frac{f_s}{2\pi} \arctan \left[ \frac{(1-a)\sin(\pi/8)}{a + (1-a)\cos(\pi/8)} \right]. \tag{10}
\]

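
The step from (9) to (10) can be made explicit. The following is a derivation sketch that uses only the symbols already defined; \(p_{\pm}\) is shorthand introduced here for the two poles obtained by setting the denominator of (9) to zero.

\[
\begin{aligned}
(1 - a z^{-1})^2 &= e^{j\pi/4}\,[(1-a) z^{-1}]^2 \\
1 - a z^{-1} &= \pm\, e^{j\pi/8}\,(1-a)\, z^{-1} \\
p_{\pm} &= a \pm (1-a)\, e^{j\pi/8} \\
\angle p_{+} &= \arctan \left[ \frac{(1-a)\sin(\pi/8)}{a + (1-a)\cos(\pi/8)} \right]
\end{aligned}
\]

For \(0 < a < 1\), \(|p_{+}| > |p_{-}|\), so \(p_{+}\) is the pole closer to the unit circle and dominates the passband. Mapping its angle to frequency through \(f = f_s \angle p_{+} / 2\pi\) gives the expression in (10).
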
D. Proposed General M/N-Phase CS-BPF

Fig. 5 proposes various configurations of the single-stage full-rate CS-BPF: 1) without the additional LPF poles; 2) with one LPF pole; and 3) with \( X = (N/M - 1) \) LPF poles between the adjacent history capacitors. For extending the CS-BPF to a general form, we use the notation of “M/N-phase CS-BPF,” where it has M inputs, M outputs, N history capacitors, N nonoverlapped clock phases with a duty-cycle of \( D = 1/N \), and X LPF poles in the charge-sharing loop. Inputs of the filter are interpreted as differential charge packets, \( q_1, q_2, \ldots, q_{M/2} \) that are phase shifted by \( 0, 2\pi/M, 4\pi/M, \ldots, (M - 2)\pi/M \) radians, and for the first CS-BPF, provided by the M-phase DT mixer.

To summarize, the blocker-resilient 8/16-phase CS-BPF features a sharp and highly linear TF to filter images and OB blockers even at \( 3^{\text{rd}}/5^{\text{th}} \) harmonics of LO. The OB filtering of blockers is improved significantly compared to [32] and [34] by increasing the number of input phases of CS-BPF and adding the LPF pole between each pair of adjacent input history capacitors. The center frequency of the filter is fully controllable by the capacitance ratios and sampling frequency, thus making it insensitive to PVT. The only possible concern in the future CMOS nodes would be a degradation of metal-oxide-metal (MOM) capacitor matching for constant capacitance units, which become more dense and thus more mismatched due to the metal stack becoming more compressed. This would normally prevent aggressive area scaling of MOM capacitors. However, this architecture employs charge-sharing rotation that acts akin to dynamic weighted averaging (DWA), thus making it robust to capacitor mismatches. Simulations reveal that folded images are below -120 dB (normalized to the TF peak) even in face of a 50% capacitor mismatch.\(^2\) Therefore, the capacitor mismatch degradation in the advanced CMOS technologies would be insignificant in the CS-BPF.

To support the full-rate operation, parallelism/interleaving techniques are used to increase the sampling frequency to \( f_s = M f_{LO} \) [34]. As in any sampling system, frequency components at \( f_s \pm f_{IF} \) are folded to the desired frequency at IF. Therefore, larger M increases \( f_s \), thus pushing away the closest folding frequencies. Similarly, increasing \( M \) improves the CS-BPF tolerance to blockers but at the same time introduces more complexity and power consumption in the full-rate mode, i.e., without any decimation.
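
The effect of \(M\) on the folding frequencies is simple arithmetic, sketched below for the quadrature and octal cases. The LO and IF values are hypothetical and serve only to show how the nearest folding frequencies \(f_s \pm f_{IF}\) move away as \(M\) grows.

```python
def nearest_folding_frequencies(m_phases, f_lo, f_if):
    """Return (f_s, f_s - f_IF, f_s + f_IF) for full-rate f_s = M * f_LO."""
    f_s = m_phases * f_lo
    return f_s, f_s - f_if, f_s + f_if

# Hypothetical values (assumptions): 2 GHz LO and 50 MHz IF.
f_lo, f_if = 2.0e9, 50.0e6
for m in (4, 8):
    f_s, lower, upper = nearest_folding_frequencies(m, f_lo, f_if)
    print(f"M = {m}: f_s = {f_s / 1e9:.1f} GHz, "
          f"nearest folding at {lower / 1e9:.2f} and {upper / 1e9:.2f} GHz")
```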

To investigate the TF of full-rate M/N-phase CS-BPF, the time-domain output voltage expressions at \( t = nT_s \), where \( T_s = 1/f_s \), can be derived as

\[
V_{i,1}[n] = \frac{C_H V_{i,1}[n - 1] + C_R V_{o,X,M/2}[n - 1] + 2q_i[n]}{C_H + C_R} \tag{11}
\]

\(^2\)Note that the MOM capacitor mismatch in this design is merely 0.03%–0.1%.

\[
H_{M/N}(z) = \frac{\sum_{l=1}^{M/2} V_{o,X,l}(z)\, e^{j(2l-2)\pi/M}}{\sum_{l=1}^{M/2} q_l(z)\, e^{j(2l-2)\pi/M}} \approx \frac{k \cdot ((1 - a) z^{-1})^{N/M - 1}}{(1 - a z^{-1})^{N/M} - e^{j2\pi/M} ((1 - a) z^{-1})^{N/M}} \tag{15}
\]
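
A quick consistency check of the general TF against the two special cases can be scripted. The sketch assumes the general form whose numerator is \(k\,((1-a)z^{-1})^{N/M-1}\) and whose denominator is \((1-az^{-1})^{N/M} - e^{j2\pi/M}((1-a)z^{-1})^{N/M}\), since that is the form that reduces to (7) and (9); the function names, the value \(a = 0.9\), and the test angles are illustrative choices.

```python
import cmath
import math

def csbpf_tf(theta, a, m, n, k=1.0):
    """General M/N-phase TF at z = exp(j*theta); assumes N is a multiple of M."""
    ratio = n // m                       # N/M, i.e. X + 1 with X extra LPF poles
    z_inv = cmath.exp(-1j * theta)
    share = (1 - a) * z_inv
    num = k * share ** (ratio - 1)
    den = (1 - a * z_inv) ** ratio - cmath.exp(2j * math.pi / m) * share ** ratio
    return num / den

def tf_8_8(theta, a, k=1.0):
    """Eq. (7)."""
    z_inv = cmath.exp(-1j * theta)
    return k / ((1 - a * z_inv) - cmath.exp(1j * math.pi / 4) * (1 - a) * z_inv)

def tf_8_16(theta, a, k=1.0):
    """Eq. (9)."""
    z_inv = cmath.exp(-1j * theta)
    den = (1 - a * z_inv) ** 2 - cmath.exp(1j * math.pi / 4) * ((1 - a) * z_inv) ** 2
    return k * (1 - a) * z_inv / den

a = 0.9  # hypothetical capacitor ratio (assumption)
for theta in (0.02, 0.05, 0.3):
    assert cmath.isclose(csbpf_tf(theta, a, 8, 8), tf_8_8(theta, a))
    assert cmath.isclose(csbpf_tf(theta, a, 8, 16), tf_8_16(theta, a))
print("general form matches (7) and (9) for M = 8")
```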

V. HARMONIC REJECTION

The differential mixer driven by a square-wave clock is a linear time-variant circuit that down-converts the desired signal together with undesired interferers at higher LO harmonics. In narrow-band receivers, those interferers are not a major concern because of a customary RF band filtering right after the antenna. In wideband RF receivers, such RF band select filtering would be very difficult, so it is the LO harmonics instead that need to get rejected. The required level of LO HR is 60–100 dB, which is almost impossible with only one HR stage due to practical amplitude and phase mismatches. A two-stage HR was introduced in [47], but it prevents further HR improvements because of the nonredundant (i.e., quadrature) signal representation. In this section, we propose a mismatch insensitive HR concept that can be arbitrarily cascaded without any bound on the HR capability.

Fig. 7. Concept of (a) multistage phase-frequency controlled system; (b) multistage PCF.

Fig. 8. (a) Proposed HR stages in the superheterodyne receiver. (b) Harmonic rotation vectors. (c) Harmonics cancellation summation.

Fig. 7(a) starts with a high-level model of a multistage phase-frequency control system. Its key feature is that the harmonic TF depends on both the input frequency \( f \) and phases \( \varphi_i \), \( i = 0, 1, 2, \ldots \). Multiple phases \( \varphi_i \) can be generated with an \( M \)-phase mixer, shown in Fig. 7(b), which not only down-converts the desired signal at the fundamental but also down-converts the interferers at the higher 3rd, 5th, ..., nth LO harmonics to the same IF frequency with multiple phases of \( \psi_i = (i-1) \times 2\pi/M \), where \( i = 1, 2, ..., M \). Therefore, instead of storing the harmonic information in the frequency domain, as is the case before the mixer (\( f_1, f_3, f_5, ..., f_n \)), it is now stored as phases in the M mixer output lines, with M > 4 to ensure redundancy, where it will be preserved as long as the number of lines is maintained. The multiple phases in M lines can be processed by the phase-controlled filter (PCF) leading to a different TF for every harmonic.

A. CS-BPF Harmonic Rejection Concept

In our implementation, the PCF HR circuitry consists of three stages in total, as shown in Fig. 8. It includes two stages of CS-BPFs. Although the 1st and 3rd/5th input harmonics are down-converted to the same IF frequency by the octal mixer, the phase differences between two adjacent lines for the 1st and 3rd/5th harmonics are π/4 and (−3π/4)/(5π/4), respectively. The charge-sharing phases of the signal for the 1st (blue), 3rd (red), and 5th (purple) harmonics are shown in Fig. 8(a). Assuming that the even harmonics are removed due to the differential configuration, the phase difference of odd harmonics is sensed by CS-BPF, so the general harmonic TF of the M/N-phase CS-BPF and \( \varphi_i \) can be found as

\[
H(z,\varphi_i) = \frac{1/(C_R + C_H) \cdot ((1-a) z^{-1})^{N/M-1}}{(1-a z^{-1})^{N/M} - e^{j\varphi_i} ((1-a) z^{-1})^{N/M}} \tag{17}
\]

and

\[
\varphi_i = (-1)^{(i-1)/2} \times i \times 2\pi/M \tag{18}
\]

respectively, where \( i \in [1, 2, ..., n] \) and \( a \) is given by (3). Fig. 8(b) shows the corresponding arrangement of phase rotation vectors. The HR for 3rd/5th harmonics is ~22 dB for each CS-BPF, which can be improved without bound by cascading CS-BPFs since the octal format fully preserves the harmonic information.

HR is further improved by the proposed “stage-2” HR block. It consists of four X_1 blocks, each comprising three identical

VI. DESIGN AND IMPLEMENTATION OF THE RECEIVER CHAIN

We have described so far the evolution of the M/N-phase CS-BPF toward its full exploitation as an image-reject filter in the fully integrated SAW-less discrete-time superheterodyne receiver. In this section, we describe the detailed design implementation of the receiver, starting with various operational modes of the fully reconfigurable $M/N$-phase CS-BPF.

A. 4/16-Phase and 8/16-Phase CS-BPFs

The two implemented CS-BPF filters are each programmed as either quadrature (4/16-phase) or octal (8/16-phase). In either mode, the filter is clocked by 16 nonoverlapped signals with $D = 1/16$ and the filter’s center frequency is located at IF with no replicas present. The 16 history $C_H$ and 16 rotating $C_R$ capacitors in the full-rate CS-BPFs shown in Fig. 5(b) and (c) are actually eight differential capacitors each, in order to simplify the schematic by a factor of 4. Also, due to the differential implementation, common-mode voltage and even-order nonlinearity of the prior stages are canceled out. $C_H$ and $C_R$ are digitally tunable with 8-bit binary-weighted codes to support variable IF of $-10$ MHz up to $-90$ MHz for GSM.
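
To illustrate how the capacitor ratio sets the IF, the center-frequency relation of (10) can be inverted for \(a = C_H/(C_H + C_R)\). The sketch below does this for a pole at \(a + (1-a)e^{j\theta_p}\) with \(\theta_p = \pi/8\), as in the 8/16-phase case. The 2 GS/s sampling rate is a hypothetical value, only the magnitude of the IF is considered, and the function names are illustrative; the actual tuning codes of the implemented filter are not modeled.

```python
import math

def center_angle(a, phase_step):
    """2*pi*f_IF/f_s for a pole at a + (1 - a) * exp(j * phase_step)."""
    return math.atan2((1 - a) * math.sin(phase_step),
                      a + (1 - a) * math.cos(phase_step))

def capacitor_ratio_for_if(f_if, f_s, phase_step):
    """Invert the center-frequency relation for a = C_H / (C_H + C_R).

    Valid for 0 < 2*pi*f_if/f_s < phase_step.
    """
    t = math.tan(2 * math.pi * f_if / f_s)
    s, c = math.sin(phase_step), math.cos(phase_step)
    return (s - c * t) / (s + t * (1 - c))

f_s = 2.0e9  # hypothetical sampling rate in Hz (assumption)
for f_if in (10e6, 50e6, 90e6):
    a = capacitor_ratio_for_if(f_if, f_s, math.pi / 8)
    print(f"|f_IF| = {f_if / 1e6:.0f} MHz -> a = {a:.4f}, "
          f"C_H/C_R = {a / (1 - a):.2f}")
```

A higher IF calls for a smaller \(C_H/C_R\) ratio at a fixed sampling rate, which is the kind of range that a binary-weighted tuning code would need to span.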

B. Clock Generation Circuitry

The block diagram of the clock generation circuitry is shown in Fig. 11. An external sinusoidal input is converted to a 50% duty-cycle clock after passing through the input buffer. It drives three clock generation circuits. The first circuit provides all the clock phases required for the RF mixer, while the remaining two provide all the clock phases for the CS-BPFs. All three circuits are independently programmable to operate in either the octal or quadrature mode. In these modes, the mixer clock generation has a respective output duty-cycle of 12.5% and 25%, while the clock for both CS-BPFs is always at $D = 6.25\%$, as shown in Fig. 11. To save further power, dividers are used to enable decimation by 1, 2, or 4 for both CS-BPF stages.

The functional block diagram of the clock generation circuitry is the same for the mixer and the two CS-BPF stages. Fig. 12 shows an example of the mixer LO generation. The CK and $\overline{\mathrm{CK}}$ input clocks with $D = 50\%$ drive eight and four dynamic latches connected back-to-back in a loop for the octal and quadrature modes, respectively. The latch outputs are followed by digital gates, which produce 12.5% (octal) and 25% (quadrature) duty-cycle clocks. The final output is selected between the octal or quadrature outputs by eight multiplexers. Therefore, in the quadrature mode, half of the mixer switches are OFF.
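
The duty-cycle figures quoted for the three clock generators follow directly from the number of nonoverlapping phases. The sketch below is a purely behavioral abstraction of the output waveforms (one time slot per phase); it does not model the latch loop, gates, or multiplexers of Fig. 12, and the function names are illustrative.

```python
def nonoverlapping_phases(n_phases, n_periods=1):
    """Return n_phases waveforms of 0/1 samples; phase i is high only in slot i."""
    length = n_phases * n_periods
    return [[1 if t % n_phases == i else 0 for t in range(length)]
            for i in range(n_phases)]

def duty_cycle(waveform):
    """Fraction of samples in which the waveform is high."""
    return sum(waveform) / len(waveform)

for name, n in (("quadrature mixer", 4), ("octal mixer", 8), ("CS-BPF", 16)):
    phases = nonoverlapping_phases(n, n_periods=2)
    # Exactly one phase is high in every time slot, so the phases never overlap.
    one_hot = all(sum(column) == 1 for column in zip(*phases))
    print(f"{name}: {n} phases, duty = {100 * duty_cycle(phases[0]):.2f}%, "
          f"one-hot = {one_hot}")
```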

C. Low-Noise Transconductance Amplifier

Fig. 13(a) shows a fully differential schematic of the proposed LNTA, which simultaneously features low NF and high IIP3 (only single-ended signal waveforms are shown). The noise-canceling common-gate transistors ($M_{n1}/M_{n2}$) provide the RX input matching. The noise-canceling operation is as follows. The input signal gets amplified by transistors $M_{n1}$/$M_{n3}$ and $M_{p1}$ in a differential feed-forward manner, whereas the thermal noise of the $M_{n1}$ channel experiences subtraction at the output nodes because of the out-of-phase correlated noise voltages at $V_x$ and $V_{outn}$. The 3rd-order nonlinearity of $M_{n1}$ and $M_{n3}$ can be simultaneously canceled at the differential output because $M_{n1}$ and $M_{n3}$ operate in the weak-inversion and saturation regions, respectively, resulting in $g_{m3}$ (3rd-order transconductance) terms that are out of phase with each other. Therefore, partial cancellation of the IM3 component happens at the differential output. The cancellation happens at the desired frequency because at other frequencies, an additional IM3 is generated due to the 2nd-order nonlinearity of $M_{n3}$. Simulated (with extracted parasitics) NF and gain of the LNTA with a resistive load are shown in Fig. 13(b) across 0.1–4 GHz.

Fig. 13. (a) LNTA schematic. (b) Its post-layout simulated noise figure and gain.

Fig. 14. (a) Simplified block diagram of the RX front-end. (b) Simulated LNTA output impedance when the CS-BPF is ON/OFF.

Fig. 15. IF $g_m$-cell schematic with common-mode rejection load.

Fig. 14 shows the simplified block diagram of the RX front-end and the simulated impedances at the LNTA output. The composite load impedance $Z_{out}$ seen by the LNTA consists of its own intrinsic output impedance, $Z_o$, in parallel with the load provided by the mixer and CS-BPF, $Z_{CS}$. Since $Z_o$ (~350 Ω) is several times (>2.5x) higher than $Z_{CS}$ (140 Ω peak), the mixer is considered to be operating in a current-commutating mode [55] rather than in a voltage mode. As a result, the effect of the on-resistance of the mixer switches is also minimized.
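The impedance argument can be made concrete with a short calculation. The following sketch treats both impedances as purely resistive at the peak of $Z_{CS}$, which is a simplifying assumption; the helper name `parallel` is hypothetical.

```python
def parallel(z1, z2):
    """Parallel combination of two real impedances, in ohms."""
    return z1 * z2 / (z1 + z2)


Z_O = 350.0   # approximate intrinsic LNTA output impedance from the text, ohms
Z_CS = 140.0  # peak mixer + CS-BPF input impedance from the text, ohms

z_out = parallel(Z_O, Z_CS)      # composite load seen by the LNTA
ratio = Z_O / Z_CS               # how many times Z_o exceeds Z_CS
i_fraction = Z_O / (Z_O + Z_CS)  # current-divider share delivered to Z_CS

print(f"Z_out = {z_out:.0f} ohm, Z_o/Z_CS = {ratio:.1f}, "
      f"current into mixer = {i_fraction:.0%}")
```

Under these assumptions the composite load is 100 Ω and about 71% of the LNTA signal current flows into the mixer even at the impedance peak. Away from the peak, $Z_{CS}$ is lower and the share is larger.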

**D. IF Stage Transconductance Amplifier ($g_m$-cell)**

Fig. 15 shows a schematic of the pseudo-differential inverter-based IF transconductance amplifier with a common-mode (CM) rejection load. The $g_m$-cell operates at 0.9 V supply and a pair of complementary thick-oxide PMOS/NMOS transistors is utilized to increase the transconductance linearity to $+11$ dBm (simulated) for all corner cases within a temperature range of −30°C to 100°C [56]. The common-mode feedback circuitry provides a proper bias of $V_{DD}/2$ to the outputs.

To suppress any possible CM oscillation in the RX chain, the CM gain of the $g_m$-cell is drastically reduced by placing a CM load at its output. This load presents different impedances to the CM and differential-mode (DM) signals. The impedance for DM signals is very high; it is proportional to the small-signal drain resistance of the CM load transistors $M_n$ and $M_p$, while the impedance for CM signals is very low, equal to $1/((g_{mn} + g_{mp})A)$, where $g_{mn}$ and $g_{mp}$ are the small-signal transconductances of $M_n$ and $M_p$, respectively.

VII. MEASUREMENT RESULTS

Fig. 16 shows the chip micrograph of the proposed superheterodyne RX for 4G cellular mobiles realized in TSMC 28 nm CMOS [57]. The active area is 0.52 mm$^2$, which is mostly occupied by the $C_H$ and $C_R$ capacitors of the two CS-BPFs. Both the RX and clock inputs are differential, so “hybrids” are used to interface with 50-Ω single-ended instrumentation. The chip is wire-bonded to a PCB and the characteristics of the PCB lines and cables are de-embedded from the measurement results. All the measurements, including those concerning the linearity, are performed at high RX gain without any calibration.

Fig. 16. Chip micrograph of the proposed discrete-time superheterodyne receiver.

Fig. 17. Measured RX TF for different bands.

The RX is fully characterized in “2G band-5” and “3G band-1,” as representatives of GSM and PCS bands, respectively, but it is fully functional over the entire 0.5–2.5 GHz RF input frequency range. The measured normalized TFs are shown in Fig. 17 for GSM, PCS, and LTE bands with 0.85, 2.1, and 2.5 GHz RF input frequencies. The RX bandwidth is 6.5 MHz for 2G/3G and 20 MHz for LTE. The absolute value of IF in the proposed RX can also be varied in the face of a large blocker, within the range of 10–90 MHz, 25–220 MHz, and 29–262 MHz for 2G, 3G, and LTE bands, respectively.

Fig. 18 shows the RX gain at 0.85 and 2.1 GHz carriers for the I channel only. By recombining the I/Q channels, an extra 6 dB of gain can be obtained. The overall pass-band gain of the LNTA and 1st CS-BPF in the GSM and PCS bands is around 18 and 17.5 dB, respectively. The gain of the IF $g_m$-cell and 2nd CS-BPF is obtained by subtracting the gain provided by the LNTA and 1st CS-BPF from the total RX gain. Its peak value is 17 and 16.5 dB for 2G and 3G, respectively. The total RX gain is between 29 and 35 dB for 0.85–2.5 GHz carriers. Although the 1st and 2nd CS-BPFs are identical, the former shows a sharper filtering characteristic because the output resistance of the LNTA is larger than that of the IF $g_m$-cell.
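The gain figures above form a simple cascade budget, sketched below using only the numbers quoted in the text. The 6-dB I/Q recombination term is modeled as a doubling of the signal voltage, which is an assumption of this sketch; the dictionary and function names are hypothetical.

```python
import math

# Pass-band gains quoted in the text (dB), I channel only.
front_end_gain_db = {"GSM": 18.0, "PCS": 17.5}  # LNTA + 1st CS-BPF
if_stage_gain_db = {"GSM": 17.0, "PCS": 16.5}   # IF gm-cell + 2nd CS-BPF


def total_gain_db(band):
    """Cascaded gain in dB is the sum of the stage gains in dB."""
    return front_end_gain_db[band] + if_stage_gain_db[band]


# Assumption: coherent I/Q recombination doubles the signal voltage.
iq_recombination_db = 20 * math.log10(2)

for band in ("GSM", "PCS"):
    print(band, total_gain_db(band), "dB")
print(f"I/Q recombination adds {iq_recombination_db:.2f} dB")
```

The resulting totals of 35 dB (GSM) and 34 dB (PCS) fall within the 29–35 dB range reported for the RX.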

The measured TFs of the LNTA and 1st CS-BPF are compared with calculations per (17) in Fig. 19(a) and (b) for the 3rd and 5th harmonics, respectively. The difference between the measured and calculated 1st harmonic at IF is due to the effect of the LNTA output impedance. A rejection of 19 dB of the 3rd and 5th harmonics per CS-BPF stage is measured at IF.

The measured wideband TFs in the normal and HR modes for three ICs are shown in Fig. 20. All the images, including the IF image, are rejected by more than 65 dB in all three measured ICs without any calibration. The worst-case HR of 58 dB is achieved when the HR-block is ON: 40 dB from the two-stage CS-BPFs, 15 dB from the HR-block, and the rest from the LNTA’s finite bandwidth. The highlighted images are multiples of the smallest LO frequency in the clock generation circuitry with an offset of $\pm f_{IF}$.
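The harmonic-rejection budget can be tallied directly from the quoted contributions. The short sketch below only restates the arithmetic; the constant names are hypothetical, and the remainder is attributed to the LNTA bandwidth as stated in the text.

```python
WORST_CASE_HR_DB = 58.0  # measured worst-case harmonic rejection, HR-block ON
CS_BPF_HR_DB = 40.0      # contribution of the two cascaded CS-BPF stages
HR_BLOCK_DB = 15.0       # contribution of the HR-block

# Whatever is left over is attributed to the LNTA's finite bandwidth.
lnta_rolloff_db = WORST_CASE_HR_DB - CS_BPF_HR_DB - HR_BLOCK_DB

print(f"Rejection attributed to the LNTA roll-off: {lnta_rolloff_db:.0f} dB")
```

This leaves about 3 dB for the LNTA roll-off at the harmonic frequency.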

Fig. 21 plots the measured receiver NF of 2.1–2.6 dB with an LO frequency of 865, 2115, and 2535 MHz for 2G, 3G, and LTE, respectively. The minimum noise figure in each standard occurs at the center frequency of the CS-BPFs, which coincides with the IF location. Also, the NF contribution of each building block is summarized in Table I for the GSM and PCS bands.
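As a consistency check, the LO frequencies above can be compared with the RF carriers of 0.85, 2.1, and 2.5 GHz quoted for the TF measurements. Pairing these two sets of numbers is an assumption of this sketch, as is the sign convention $f_{IF} = f_{RF} - f_{LO}$; the names are hypothetical.

```python
# RF carriers (MHz) quoted for the TF measurements and LO frequencies (MHz)
# quoted for the NF measurements; pairing them is an assumption of this sketch.
rf_mhz = {"2G": 850, "3G": 2100, "LTE": 2500}
lo_mhz = {"2G": 865, "3G": 2115, "LTE": 2535}


def if_offset_mhz(band):
    """Signed IF, taken here as f_RF - f_LO (negative for a high-side LO)."""
    return rf_mhz[band] - lo_mhz[band]


for band in rf_mhz:
    print(band, if_offset_mhz(band), "MHz")
```

With this pairing the IF evaluates to −15 MHz for 2G and 3G and −35 MHz for LTE, which agrees with the $f_{IF}$ range of −15 to −35 MHz used in the IIP2 discussion.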

The simulated (post-layout extracted) OB-IIP3 of the CS-BPF is more than +30 dBm. Furthermore, because of its strong blocker filtering, the OB-IIP3 is mainly determined by the linearity of the LNTA. Fig. 22 shows the measured OB-IIP3 of the RX versus offset frequency for 2G and 3G. It should be mentioned that the linearity was measured at the maximum gain (i.e., the lowest noise figure) and without any calibration. The variation in OB-IIP3 over offset frequency is due to the dependency of the LNTA linearity on the offset frequency. The peak IIP3 of +14 dBm is achieved for the offset frequencies specified by the 2G/3G standards at the duplex ($f_{TX}$) and half-duplex ($(f_{TX} + f_{RX})/2$) frequencies.
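The duplex and half-duplex offsets matter because a 3rd-order product of two blockers at exactly these frequencies lands on the RX channel. This is a standard observation rather than a result of the measurements. The sketch below checks it with illustrative frequencies that are not taken from the paper; the function name is hypothetical.

```python
def im3_products_mhz(f1, f2):
    """Return the two close-in 3rd-order products, 2*f1 - f2 and 2*f2 - f1."""
    return (2 * f1 - f2, 2 * f2 - f1)


# Illustrative values only (not from the paper): 45-MHz duplex spacing.
f_rx = 880.0
f_tx = 835.0
f_half = (f_tx + f_rx) / 2  # half-duplex blocker frequency

products = im3_products_mhz(f_tx, f_half)
print(products, "MHz; RX channel at", f_rx, "MHz")
```

The product $2 f_{half} - f_{TX}$ equals $f_{RX}$ for any duplex spacing, since $2 \cdot (f_{TX} + f_{RX})/2 - f_{TX} = f_{RX}$.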

For IIP2 measurements, we consider two separate test cases:
1) Closely spaced tones or a modulated single-tone IIP2 test case (limited by the mixer’s IIP2).
2) Far-away two-tone test case (limited by the LNA’s IIP2).

The first test case is a strong impediment to the removal of the SAW filter at the front of the RF chain; the required IIP2 would be more than +90 dBm. The second test case is additionally applicable to wideband RXs, but it is less stringent. To calculate the needed IIP2, let us assume a blocker level of −32.5 dBm applied to the RX, a required sensitivity of −99 dBm, and an SNR of 9 dB to maintain signal purity. The IM2 component should then be below −108 dBm. Therefore, the needed IIP2 is +43 dBm. To clarify the situation, both IIP2 test cases are measured.
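The +43 dBm figure follows from the usual two-tone relation $\text{IIP2} = 2P_{blocker} - P_{IM2}$ with all quantities input-referred in dBm. The minimal sketch below reproduces the numbers in the paragraph above; the function names are hypothetical.

```python
def max_im2_dbm(sensitivity_dbm, snr_db):
    """Largest tolerable input-referred IM2 level: sensitivity minus SNR."""
    return sensitivity_dbm - snr_db


def required_iip2_dbm(blocker_dbm, im2_dbm):
    """Input-referred IIP2 from the relation IIP2 = 2*P_blocker - P_IM2."""
    return 2 * blocker_dbm - im2_dbm


im2 = max_im2_dbm(sensitivity_dbm=-99.0, snr_db=9.0)
iip2 = required_iip2_dbm(blocker_dbm=-32.5, im2_dbm=im2)
print(f"IM2 limit = {im2:.0f} dBm, required IIP2 = {iip2:+.0f} dBm")
```

The IM2 limit evaluates to −108 dBm and the required IIP2 to +43 dBm, matching the values stated in the text.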

For the first test case, since the RX architecture is superheterodyne with an $f_{IF}$ of −15 to −35 MHz, the applied closely spaced two-tone or single modulated tone with 15 MHz bandwidth will be down-converted to around dc and thus completely filtered out. This case has been measured with the RX in high-gain mode and, unsurprisingly, only the instrument’s noise floor was observed.

For the second test case, the two tones are far away from each other but the IM2 generated by the LNTA is in-band. In our tests, the two tones are located at $f_{RF}+\text{spacing}$ and $2f_{RF}+\text{spacing}$, while $f_{RF}$ is 860 MHz in GSM. As shown in Fig. 23, an IIP2 of better than +50 dBm is achieved when the LNTA is set to mid-gain (the standard allows for a gain relaxation there).
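The tone placement in the second test case guarantees that the difference-frequency IM2 product falls on $f_{RF}$ regardless of the spacing. The sketch below verifies this for a few spacings; the spacing values are illustrative and the function name is hypothetical.

```python
F_RF_MHZ = 860.0  # GSM carrier used in the test, from the text


def im2_difference_mhz(spacing_mhz, f_rf_mhz=F_RF_MHZ):
    """Difference-frequency IM2 product of the two far-apart test tones."""
    tone1 = f_rf_mhz + spacing_mhz
    tone2 = 2 * f_rf_mhz + spacing_mhz
    return tone2 - tone1


# Illustrative spacings (MHz); the actual sweep values are not given here.
for spacing in (20.0, 50.0, 100.0):
    print(spacing, im2_difference_mhz(spacing))
```

Each spacing returns 860 MHz, i.e., the IM2 product is in-band by construction.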

The RX blocker tolerance is demonstrated by means of “NF under blocker” tests. Special attention to the purity of the large blocker signal is paid in these measurements: an external BPF is added to the RF blocker source to eliminate its phase noise components falling within the RF signal band, thus preventing reciprocal mixing from inadvertently increasing the measured NF. Fig. 24 shows the RX NF measurement versus the blocker power at 20 and 80 MHz offsets for PCS and GSM bands when the LNTA is in mid- and high-gain modes, respectively. For the PCS band, $-12$ dBm blocker at 20 MHz and 0 dBm blocker at 80 MHz offsets increase the measured NF to 7.2 and
Fig. 21. Measured noise figure for GSM, PCS, and LTE bands.

TABLE I
Noise Figure Contribution of Each Building Block in the RX Chain

<table>
<tbody>
<tr>
<td><strong>Input Port</strong></td>
</tr>
<tr>
<td>GSM band [%]</td>
</tr>
<tr>
<td>PCS band [%]</td>
</tr>
</tbody>
</table>

Fig. 22. Measured IIP3 for (a) GSM and (b) PCS bands versus frequency offset.

The RX passes all GSM and PCS band requirements except for those with a 0 dBm blocker. The excessive rise of NF at a 0 dBm blocker is mainly due to the LNTA’s cascode structure, which operates at a very low supply of $V_{DD} = 0.9$ V because of I/O constraints in our test chip. Using I/O transistors at a 1.8 or 2.5 V supply should add enough headroom to eliminate this linearity issue.

The measured power consumption of the RX chip versus input frequency is shown in Fig. 25. The overall RX power consumption varies from 22 to 40 mW depending on the input RF band and the related clock frequency. For the GSM band, the main contributor to the overall RX power is the analog part. As the clock frequency increases for the PCS band, the main contributor becomes the DT part, which includes the RF mixer, CS-BPF1, CS-BPF2, and the clock buffers and dividers.

Table II compares the proposed DT RX with state-of-the-art RXs. The proposed RX is best-in-class in meeting the key performance parameters without any calibration, while its power consumption and area are generally the lowest. It does not suffer from any issues related to dc offsets, flicker noise, or IM2 products since its IIP2 is immeasurably high for closely spaced or single modulated interferers.

VIII. CONCLUSION

We have proposed and demonstrated a new architecture of a discrete-time superheterodyne receiver targeting SAW-less operation of the 4G cellular standard. Reduced filtering at the antenna interface network demands much better linearity and filtering from the on-chip RF front-end. Consequently, the LNA is made wideband with a new noise cancellation scheme. The RF mixer and two stages of bandpass filtering are octal, which provides strong filtering and allows input harmonics to be rejected naturally. The architecture is realized in 28 nm CMOS and is amenable to further scaling.

ACKNOWLEDGMENT

The authors would like to thank Atef Akhnoukh, and especially Wil Straver, from TU Delft and P. Vanbekbergen and P. Stynen from M4S/Hisilicon for their support.

REFERENCES
Massoud Tohidian (S’08–M’15) received the B.Sc. and M.Sc. degrees in electrical engineering (with Honors) from Ferdowsi University of Mashhad, Mashhad, Iran, and the University of Tehran, Tehran, Iran, in 2007 and 2010, respectively, and the Ph.D. degree (cum laude) from Delft University of Technology (TU Delft), Delft, The Netherlands, in 2015.

He was a Researcher at IMEP-LAHC Laboratory, Grenoble, France, from 2009 to 2010. He was a Consultant at M4S/Hisilicon, Leuven, Belgium, from 2013 to 2014, designing a 28-nm SAW-less receiver chip for mobile phones. Since February 2015, he has been a Co-founder and CEO of Qualinx B.V., Delft, The Netherlands, developing low-power CMOS wireless chips. He holds seven patents and patent applications in the field of RF-CMOS design. His research interests include RF transceivers, discrete-time/digital signal processing, PLL, and oscillators.

Iman Madadi (S’08–M’15) received the B.S. degree from K. N. Toosi University of Technology, Tehran, Iran, in 2007, the M.S. degree from the University of Tehran, Tehran, Iran, in 2010, and the Ph.D. degree from Delft University of Technology, Delft, The Netherlands, in 2015, all in electrical engineering.

From 2013 to 2014, he was a Consultant at M4S/Hisilicon, Leuven, Belgium, where he designed a 28-nm SAW-less receiver chip for mobile phones. Since February 2015, he has been the Co-founder and CTO of Qualinx B.V., Delft, The Netherlands. He holds six patents and patent applications in the field of RF-CMOS design. His research interests include analog and RF IC design for wireless communications.

Koen Cornelissens received the M.Sc. degree in electrical engineering from KU Leuven, Leuven, Belgium, in 2004, and the Ph.D. degree from KU Leuven, in 2010, for his work entitled “Delta-Sigma A/D converter design in nanoscale CMOS.”

To conduct the research, he obtained a Ph.D. Fellowship at the Research Foundation–Flanders (FWO). He joined M4S-Huawei, where he is currently working as a Senior Analog Design Engineer on integrated circuits for cellular transceivers.

Robert Bogdan Staszewski (M’97–SM’05–F’09) was born in Bialystok, Poland. He received the B.Sc. degree (summa cum laude) in electrical engineering, the M.S. degree in electrical engineering, and the Ph.D. degree in electrical engineering from the University of Texas at Dallas, Dallas, TX, USA, in 1991, 1992, and 2002, respectively.

From 1991 to 1995, he was with Alcatel Network Systems, Richardson, TX, USA, working on SONET cross-connect systems for fiber optics communications. He joined Texas Instruments, Dallas, TX, USA, in 1995, where he was elected Distinguished Member of Technical Staff (limited to 2% of technical staff). Between 1995 and 1999, he was engaged in advanced CMOS read channel development for hard disk drives. In 1999, he co-founded a Digital RF Processor (DRP) Group within Texas Instruments with a mission to invent new digitally intensive approaches to traditional RF functions for integrated radios in deeply-scaled CMOS processes. He was appointed a CTO of the DRP Group between 2007 and 2009. In July 2009, he joined Delft University of Technology, Delft, The Netherlands, where he is currently a part-time Full Professor. Since September 2014, he has been a Professor with the University College Dublin (UCD), Dublin, Ireland. He has authored and co-authored three books, five book chapters, 100 journal and conference publications, and holds 140 issued U.S. patents. His research interests include nanoscale CMOS architectures and circuits for frequency synthesizers, transmitters, and receivers.

Prof. Staszewski has been a TPC Member of ISSCC, RFIC, ESSCIRC, ISCAS, and RFID. He is a recipient of the IEEE Circuits and Systems Industrial Pioneer Award.