<table>
<thead>
<tr>
<th><strong>Title</strong></th>
<th>A DSP Coprocessor for ADSL Lite</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Author(s)</strong></td>
<td>Berg, Vincent; Rodriguez, Jose; Bleakley, Chris J.; Murray, Brian</td>
</tr>
<tr>
<td><strong>Publication date</strong></td>
<td>1999-01</td>
</tr>
<tr>
<td><strong>Conference details</strong></td>
<td>Irish Signals and Systems Conference (ISSC), Galway, Ireland, January, 1999</td>
</tr>
<tr>
<td><strong>Item record/more information</strong></td>
<td><a href="http://hdl.handle.net/10197/7117">http://hdl.handle.net/10197/7117</a></td>
</tr>
</tbody>
</table>
A DSP Co-processor Core for ADSL Lite

Vincent Berg, Jose Rodrigues, Dr Chris Bleakley, Dr Brian Murray

Massana Ltd.
5 Westland Square, Dublin 2, Ireland.
www.massana.com

Abstract

This paper presents Massana's DSP co-processor solution – FILU-DMT [1] for enabling soft G.Lite (or ADSL Lite) on Pentium and RISC processors. The user interacts with the co-processor via a C API which accesses a shared RAM interface. All of the G.Lite DSP functions are pre-programmed and held in ROM. The FILU-DMT is implemented in fully synthesizable Verilog RTL with a single synchronous clock for high scan coverage. It is based on a dual MAC architecture which can perform a radix-4 FFT butterfly in 8 cycles yielding a 256 point FFT in 21 $\mu$s. This is the industry's fastest FFT for this class of processor. The FILU-DMT supports block floating-point arithmetic which achieves near floating-point performance at a fraction of the area cost of conventional DSPs.

1. Introduction

The explosive growth of the Internet has fuelled consumer demand for high bandwidth connections to the home at low cost. One of the emerging solutions to this problem is G.Lite or ADSL Lite (Asymmetric Digital Subscriber Line). Recently standardised as G.992.2 [2], G.Lite provides up to 1.5 Mbit/s downstream and 512 kbit/s upstream over existing copper wires with "always-on" functionality [3]. Unlike ADSL, G.Lite is "splitter-less" and therefore no installation cost is incurred in connection to the home. It is also less computationally demanding than full ADSL.

Mass-market consumer devices such as the G.Lite modem are extremely price sensitive. However, due to the quantity of DSP processing required, a purely software approach is not feasible. This paper describes Massana's co-processor solution, FILU-DMT. It is an ultra-small DSP VLSI core which enables standard processors, such as the Pentium or RISCs, to provide soft G.Lite.

The remainder of this paper provides an overview of the G.Lite algorithm, details of the FILU-DMT architecture and an explanation of the implementation of G.Lite on the FILU-DMT DSP co-processor.

2. xDSL Overview

xDSL (Digital Subscriber Line) services are dedicated, point-to-point, public network access technologies that allow multiple forms of data, voice, and video to be carried over twisted-pair copper wire on the local loop (“last mile”) between a Network Service Provider’s (NSP’s) central office and the customer site, or on local loops created either intra-building or intra-campus. The “x” in xDSL stands for the various kinds of digital subscriber line technologies, including ADSL (Asymmetric DSL), R-ADSL (Rate Adaptive DSL), HDSL (High data rate DSL), SDSL (Single line DSL), and VDSL (Very high data rate DSL).

ADSL allows more bandwidth downstream, from an NSP’s central office to the customer site, than upstream, from the subscriber to the central office. This asymmetry, combined with “always on” access (which eliminates call setup), makes ADSL appropriate for Internet, video-on-demand, and remote local area network (LAN) access.

G.Lite is proposed as a lower-speed version of ADSL [4] that will eliminate the need for the telco to install and maintain a residential premises-based POTS splitter. Elimination of the POTS splitter is intended to simplify installation and reduce the costs for Network Service Providers (NSPs). G.Lite is also intended to work over longer distances than full-rate ADSL, making it more widely available to mass market consumers.

3. G.Lite Modulation Scheme (DMT)

Discrete MultiTone (DMT) modulation is the technology used in ANSI standard T1.413 for full-rate ADSL. It is proposed for G.Lite [4]. DMT is essentially OFDM (Orthogonal Frequency Division Multiplexing) modulation applied to wired systems. The idea of DMT is to divide the available spectrum into several sub-channels (sub-carriers or tones). Each sub-carrier is modulated by an individual QAM symbol. The sum of all the sub-carriers forms a DMT symbol (Figure 1).

![DMT Spectrum](image1)

The copper wire pair does not change its physical behaviour significantly with time and is therefore considered a stationary channel. A stationary channel makes it possible to measure the SNR on each sub-channel, and hence to use a technique called bit loading, which makes good use of the spectrally shaped channel. With bit loading, the sub-channels are dynamically assigned different numbers of bits according to their respective SNRs. In other words, bit loading consists of assigning a higher order QAM constellation to those sub-channels with higher SNR. More details are given in [6] and [7].
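
As an illustration of bit loading, the sketch below assigns bits to each tone from its measured SNR. The rule used, b = floor(log2(1 + SNR/gap)), is the common textbook SNR-gap approximation and is an assumption here, not a method taken from this paper. The 9.8 dB gap, the 15-bit cap and the SNR values are likewise illustrative only.

```python
import math

def bit_loading(snr_db, gap_db=9.8, max_bits=15):
    """Assign bits per tone from per-tone SNR (dB) using the SNR-gap rule.

    gap_db and max_bits are illustrative assumptions, not values from the paper.
    """
    bits = []
    for snr in snr_db:
        snr_lin = 10 ** ((snr - gap_db) / 10)   # SNR reduced by the gap, linear scale
        b = int(math.floor(math.log2(1 + snr_lin)))
        bits.append(max(0, min(b, max_bits)))   # clamp to a valid constellation size
    return bits

# Hypothetical measured SNRs for five tones, best to worst.
example_snr_db = [45.0, 36.0, 27.0, 15.0, 3.0]
example_bits = bit_loading(example_snr_db)      # [11, 8, 5, 2, 0]
```

Tones with high SNR carry large constellations, while the last tone is too noisy to carry any bits and is left unused.
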

3.1 Cyclic Prefix

The cyclic prefix, also called guard time, is a copy of the last samples of the DMT symbol which are prepended to the transmitted symbol. Thus, for the input to the FFT, it appears as if the transmitted sequence was periodic. Orthogonality of the sub-carriers can be completely maintained, even though the signal passes through a time-dispersive channel (i.e. digital subscriber lines), by introducing a cyclic prefix. Loss of orthogonality means Inter Symbol Interference (ISI) and Inter-carrier Interference (ICI).
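
The effect of the prefix can be checked numerically. The sketch below uses an arbitrary 8-sample symbol and an arbitrary 3-tap channel (both made up for illustration). It shows that, once the receiver discards the prefix, the channel's linear convolution is identical to a circular convolution with the symbol, which is what keeps the sub-carriers orthogonal.

```python
def add_cyclic_prefix(symbol, prefix_len):
    """Prepend the last prefix_len samples of the symbol to itself."""
    return symbol[len(symbol) - prefix_len:] + symbol

def linear_convolve(x, h):
    """Plain linear convolution, modelling a time-dispersive channel."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circular_convolve(x, h):
    """Circular convolution of x with h over len(x) samples."""
    n = len(x)
    return [sum(h[j] * x[(i - j) % n] for j in range(len(h))) for i in range(n)]

symbol = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 0.0, 4.0]   # arbitrary 8-sample symbol
channel = [1.0, 0.5, -0.25]                            # arbitrary 3-tap channel
prefix_len = 2                                         # at least len(channel) - 1

received = linear_convolve(add_cyclic_prefix(symbol, prefix_len), channel)
# The receiver discards the prefix and keeps one symbol length of samples.
kept = received[prefix_len:prefix_len + len(symbol)]
expected = circular_convolve(symbol, channel)          # matches kept sample by sample
```

If `prefix_len` were shorter than `len(channel) - 1`, the first kept samples would contain energy from outside the symbol and the equality would no longer hold.
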

To perform the FFT without equalising the ISI, the cyclic prefix must be longer than the channel impulse response (approx. 100 µs [8]). Alternatively, the equaliser must shorten the effective impulse response to less than the length of the cyclic prefix.

There is a trade-off between rate efficiency, SNR and cyclic prefix length: for a cyclic prefix length \( \nu \) and an \( N \) point FFT, the data rate is decreased by a factor of \( N/(N+\nu) \). A figure from Alcatel [8] is 95% efficiency. This means that the guard time is \( \sim 12.5 \mu s \).
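
The guard time follows directly from the efficiency figure, assuming that the 250 µs DMT symbol duration quoted in Section 5.5 includes the prefix:

\[
\eta = \frac{N}{N+\nu} = 0.95, \qquad T_{\text{guard}} = (1-\eta)\,T_{\text{symbol}} = 0.05 \times 250\ \mu\text{s} = 12.5\ \mu\text{s}.
\]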

3.2 Spectrum Characteristics

3.2.1 Transmitter

Interference into the POTS band [9] should be less than +15 dBrn (dB above reference noise) over the 0-4 kHz region with flat weighting, i.e. a noise power level lower than -75 dBm in 4 kHz, or -111 dBm/Hz. The maximum upstream transmit power is recommended to be lower than 7 dBm (or -40 dBm/Hz). The emitted signal should therefore be attenuated by about 70 dB in the POTS band. No specific requirements have been provided for the upper band, but the power should be low enough not to interfere with the downstream band. Furthermore, a fraction of the power emitted at higher frequencies is converted into noise in the POTS band, so the emitted power should be as low as possible. A candidate spectrum mask is shown in Figure 2.
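
The decibel arithmetic behind these figures can be reproduced with a few lines. This is a minimal sketch that assumes the power is spread evenly over the stated bandwidth; the function name is illustrative.

```python
import math

def psd_dbm_per_hz(power_dbm, bandwidth_hz):
    """Average power spectral density of a signal spread evenly over a band."""
    return power_dbm - 10 * math.log10(bandwidth_hz)

# -75 dBm of noise in the 4 kHz POTS band.
pots_noise_psd = psd_dbm_per_hz(-75.0, 4000.0)    # about -111 dBm/Hz

# Gap between the -40 dBm/Hz upstream PSD and the POTS noise limit.
required_rejection_db = -40.0 - pots_noise_psd    # about 71 dB
```

The 71 dB gap is the origin of the roughly 70 dB of attenuation required in the POTS band.
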

![Spectrum Mask](image)

**Figure 2** Emitted spectrum requirements at transmitter.

3.2.2 Receiver

There can be a large difference at the receiver between the reflected Tx energy and the wanted Rx energy. The reflected Tx signal is supposed to be eliminated by the analogue front-end (before A/D conversion). If the front-end filters are not tight enough, the reflected Tx signal could easily saturate the A/D converters and generate non-linearities, or dramatically reduce the A/D converters' dynamic range.

4. G.Lite Implementation

Typically, transmission is carried out by converting the bit-stream to a complex number representing the phase and amplitude of each tone. A Rotor is applied to compensate for any phase difference in Tx and Rx clocks. Gain Adjust is applied followed by an IFFT to convert the complex tones into a time domain signal which is interpolated and transmitted. A single block of data is referred to as a DMT symbol.
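
The core of the transmit path (bits to complex tones, then an inverse transform to the time domain) can be sketched as follows. This is a simplified illustration, not the FILU-DMT implementation. It assumes a fixed 4-QAM constellation on every tone, uses a direct inverse DFT instead of a fast IFFT, and omits the Rotor, Gain Adjust, cyclic prefix and interpolation stages. All function names are hypothetical.

```python
import cmath

def qpsk_map(bit_pairs):
    """Map (b0, b1) pairs to 4-QAM points; one point per tone."""
    return [complex(1 - 2 * b0, 1 - 2 * b1) for b0, b1 in bit_pairs]

def dmt_modulate(tones):
    """Build one time-domain DMT symbol from complex tone values.

    tones[k - 1] is placed on sub-carrier k. The DC and Nyquist bins are left
    empty, and the upper half of the spectrum is the complex-conjugate mirror
    of the lower half so that the inverse DFT is real-valued.
    """
    n = 2 * (len(tones) + 1)
    spectrum = [0j] * n
    for k, value in enumerate(tones, start=1):
        spectrum[k] = value
        spectrum[n - k] = value.conjugate()
    samples = []
    for t in range(n):
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
        samples.append(acc / n)
    return samples

tones = qpsk_map([(0, 0), (0, 1), (1, 0), (1, 1), (0, 1), (1, 1), (0, 0)])
complex_samples = dmt_modulate(tones)                     # 16 samples for 7 tones
max_imag = max(abs(s.imag) for s in complex_samples)      # rounding error only
time_samples = [s.real for s in complex_samples]          # real signal for the D/A path
```

The conjugate symmetry is why a block of 2(K + 1) real samples carries only K data tones.
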

After Decimation, the receiver uses a Time Equaliser (TEQ) to equalise for the copper twisted-pair line. Due to the length of line, the TEQ requires a long FIR filter. The computational burden of this is eased somewhat by the transmitter which places a cyclic prefix consisting of a copy of the end of the current block at the start of the symbol. This allows most of the equalisation to be done in the Frequency Domain by the Frequency Equaliser (FEQ). An FFT is used to convert the Time Domain signal to a set of complex tones. After the FEQ, the Rotor function is performed. Figure 4 shows a G.Lite implementation.
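
The reason the FEQ is cheap can be shown with a small numerical sketch. It assumes that, after the TEQ and prefix removal, the channel acts as a circular convolution with a short response. Each tone is then scaled by a single complex gain, and the FEQ needs only one complex multiply per tone. The tone values and channel taps below are arbitrary, and for brevity the spectrum is not made conjugate-symmetric, so the time signal here is complex.

```python
import cmath

def dft(x):
    """Direct forward DFT (stand-in for the FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Direct inverse DFT (stand-in for the IFFT)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_channel(x, h):
    """Channel as seen after prefix removal: circular convolution with h."""
    n = len(x)
    return [sum(h[j] * x[(t - j) % n] for j in range(len(h))) for t in range(n)]

sent_tones = [0j, 1 + 1j, -1 + 1j, 1 - 1j, 0j, 1 + 1j, -1 - 1j, 1 - 1j]  # arbitrary values
short_channel = [1.0, 0.4, -0.2]                 # assumed post-TEQ impulse response

rx_tones = dft(circular_channel(idft(sent_tones), short_channel))

# FEQ: one complex coefficient per tone, the inverse of the channel's frequency response.
padding = [0.0] * (len(sent_tones) - len(short_channel))
channel_response = dft(short_channel + padding)
feq_coeffs = [1 / hk for hk in channel_response]
equalised = [c * y for c, y in zip(feq_coeffs, rx_tones)]   # recovers sent_tones
```

In a real modem the coefficients would be estimated during training rather than computed from a known channel.
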

5. FILU-DMT

The FILU-DMT is composed of two DSP co-processor cores, the FILU-200 and the FILU-50. The first is a high speed dual MAC FFT engine which performs the main G.Lite DSP block functions. The second is a single MAC filter engine that performs the interpolation and decimation on the A/D and D/A data. Decimation and interpolation "on-the-fly" allows a highly area efficient solution since it cuts down on the RAM requirements. The next two sub-sections examine the FILU-200 and the FILU-50 co-processor cores in more detail.

5.1 FILU-200 DSP Co-processor (FFT Engine)

The architecture of the FILU-200 is shown in Figure 5. It is optimised for fast complex number arithmetic and can implement very high speed FFT operations. It is a dual MAC architecture with supporting ALU and adder plus two barrel shifters. Specific FFT addressing modes are implemented in hardware and twiddle factors are stored in compressed format in a dedicated ROM table. Each cycle up to nine parallel operations can be performed. It can perform a radix-4 FFT butterfly in 8 cycles which results in a 1024 point complex FFT in 128 $\mu$s, at an 80 MHz clock rate. This is much faster than most of today's high-end DSPs.
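
The 1024-point figure is consistent with a simple butterfly count. The calculation assumes the standard radix-4 decomposition, with \( N/4 \) butterflies in each of \( \log_4 N \) passes, and no additional overhead:

\[
\frac{N}{4}\log_4 N = \frac{1024}{4} \times 5 = 1280 \text{ butterflies}, \qquad 1280 \times 8 = 10240 \text{ cycles}, \qquad \frac{10240 \text{ cycles}}{80\ \text{MHz}} = 128\ \mu\text{s}.
\]

This count is only a check on the 1024-point complex case and is not intended to model the 256-point benchmark in Table 1.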

The core DSP functions are micro-coded in ROM. In addition to real and complex FFTs and IFFTs, FIR and IIR filters, correlation, Taylor series and matrix/vector operations are provided. Additional functions can be programmed in RAM.

To ensure adequate SNR for G.Lite applications, the FFT Engine utilises a 22 x 16 bit MAC, 20-bit internal data path with 44-bit accumulation and block floating-point arithmetic. During an FFT the barrel shifter monitors the accumulator and determines the minimum scaling needed for the next pass. The barrel shifter can scale each accumulator result by a power of 2 prior to saving to a register.
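
Block floating-point arithmetic can be illustrated with a short sketch. A whole block of values shares a single exponent, chosen so that the largest value fits the word length. The code below is a software analogy only. The 20-bit word length is taken from the data path width quoted above, while the function name, the input values and the restriction to right shifts are assumptions made for illustration.

```python
def block_normalise(block, word_bits=20):
    """Scale a block of integers by a common power of two so that it fits word_bits.

    Returns (scaled_block, shift), where original is approximately scaled * 2**shift.
    Only right shifts (scaling down) are modelled.
    """
    limit = (1 << (word_bits - 1)) - 1          # largest positive two's-complement value
    peak = max(abs(v) for v in block)
    shift = 0
    while (peak >> shift) > limit:              # smallest shift that avoids overflow
        shift += 1
    return [v >> shift for v in block], shift

accumulators = [3_000_000, -1_200_000, 45_000, -7]   # hypothetical pass outputs
scaled, block_exponent = block_normalise(accumulators)
# scaled == [375000, -150000, 5625, -1], block_exponent == 3
```

Only one exponent is stored per block, which is why the approach costs far less area than per-value floating point. The price is that small values in a block with a large peak lose precision, as the last element shows.
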

![FFT Engine Architecture](image)

5.2 FILU-50 DSP Co-processor (Filter Engine)

The FILU-50 is ultra-small (less than 3,500 gates). It consists of a single two-cycle MAC with a minimal register set to support decimation and interpolation. The A/D and D/A for G.Lite typically operate at 4.4 MHz or 8.8 MHz. The data is sample-rate converted to 1.1 MHz for block processing by the FILU-200. By using a separate filter engine to do sample-based interpolation and decimation on the A/D and D/A data, the Host and FILU-200 RAM requirements are kept to a minimum.

The desired Tx and Rx spectrum masks can be achieved by a combination of filtering at the 1.1 MHz rate using the FILU-200 and interpolation and decimation using the FILU-50. Up to 40 MIPS are available on the FILU-50 for this purpose. The FILU-50 RAM is used to store coefficients and filter memory. Fast interrupts give very low overhead for A/D and D/A input/output handling.
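
The decimation step can be sketched as an FIR filter that computes only the outputs that are kept. The example reduces a 4.4 MHz stream to 1.1 MHz (a factor of 4). The 4-tap moving-average filter and the input ramp are placeholders, and the function is an illustration rather than the FILU-50 microcode.

```python
def fir_decimate(samples, coeffs, factor):
    """Low-pass FIR filter and keep every factor-th output.

    Only the retained outputs are computed, so the multiply-accumulate
    count is len(coeffs) per output sample rather than per input sample.
    """
    out = []
    for n in range(0, len(samples), factor):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                      # samples before the start are taken as zero
                acc += c * samples[n - k]       # one MAC per tap
        out.append(acc)
    return out

moving_average = [0.25, 0.25, 0.25, 0.25]       # placeholder anti-alias filter
adc_samples = [float(i) for i in range(16)]     # hypothetical 4.4 MHz input ramp
decimated = fir_decimate(adc_samples, moving_average, 4)   # [0.0, 2.5, 6.5, 10.5]
```

Because each output is produced as the input samples arrive, only the filter state needs to be stored, not a full block at the higher sample rate.
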

**Figure 6**  
FILU Filter Engine

5.3 G.Lite using the FILU-DMT

Figure 7 shows an implementation of G.Lite using the FILU-DMT. The DSP intensive G.Lite functions are off-loaded from the Host processor to the FILU-DMT co-processor. In this way, the FILU-DMT co-processor enhances the throughput of the Host by operating in parallel and by virtue of its DSP optimised architecture.

**Figure 7**  
FILU-DMT Reference Architecture.

5.4 Host Interface

The Host processor communicates with the FILU-DMT via a shared RAM interface. The RAM is memory mapped into the Host address space. A bus-request/bus-grant or a master/slave protocol is used to share the RAM. Data, parameters and coefficients are fully programmable and are passed to and from the Host via the shared RAM. The Host initiates functions on the FILU-DMT by placing macrofunction calls (e.g. FFT) and parameters in the RAM. The microcoded DSP macrofunctions are retrieved from the FILU-DMT ROM and are executed independently of the Host and in parallel. The Host can view the FILU-DMT co-processor as a memory mapped peripheral.
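
The shared-RAM handshake can be modelled in a few lines. Everything in this sketch is hypothetical: the word offsets, the status codes, the `OP_SCALE2` stand-in macrofunction and the function names are invented for illustration and do not describe the actual FILU-DMT memory map or API.

```python
# Hypothetical shared-RAM layout: none of these offsets or opcodes come from the paper.
CMD, STATUS, ARG_ADDR, ARG_LEN = 0, 1, 2, 3
IDLE, BUSY, DONE = 0, 1, 2
OP_SCALE2 = 1                        # stand-in macrofunction: double each data word

class SharedRam:
    """Word-addressed RAM visible to both the Host and the co-processor."""
    def __init__(self, size):
        self.words = [0] * size

def host_call(ram, opcode, addr, length):
    """Host side: place the macrofunction call and its parameters in shared RAM."""
    ram.words[STATUS] = BUSY
    ram.words[ARG_ADDR] = addr
    ram.words[ARG_LEN] = length
    ram.words[CMD] = opcode          # written last so the parameters are already valid

def coprocessor_step(ram):
    """Co-processor side: fetch a pending call, run it, and flag completion."""
    opcode = ram.words[CMD]
    if opcode == 0:
        return                       # nothing pending
    addr, length = ram.words[ARG_ADDR], ram.words[ARG_LEN]
    if opcode == OP_SCALE2:
        for i in range(addr, addr + length):
            ram.words[i] *= 2
    ram.words[CMD] = 0
    ram.words[STATUS] = DONE

ram = SharedRam(16)
ram.words[8:12] = [1, 2, 3, 4]       # data buffer placed by the Host
host_call(ram, OP_SCALE2, 8, 4)
coprocessor_step(ram)                # in hardware this runs in parallel with the Host
```

In this model the Host would poll the status word, or wait for an interrupt, before reading the results back from the same buffer.
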

The user controls the FILU-DMT co-processor by means of a simple Application Programming Interface (API). This allows the user to harness the power of the FILU-DMT via C function calls. The details of the hardware interface and the DSP macrofunction implementation are "hidden" from the user behind this API. The DSP macrofunctions, such as the FFT, are written in highly optimised microcode and are supplied with the FILU-DMT package. An instruction set simulator (ISS) allows the user to develop and test their application entirely in C before "going to Silicon". This shortens development time since all of the low-level DSP code is provided.

5.5 Benchmarks

Since the G.Lite DMT symbol is 250 µs in duration, all processing must be achieved within this time budget. As can be seen from Table 1, the FILU-DMT engine typically processes the received data in roughly 72 µs. This assumes 20 coefficients are used for the TE and 256 points for the FFT. This leaves sufficient MIPS for the transmit functions plus an overhead for fast retraining in the case of on-/off-hook transitions and for line probing algorithms [11].

<table>
<thead>
<tr>
<th>Task</th>
<th>Number of points</th>
<th>Cycle count formula</th>
<th>Number of cycles</th>
<th>Execution time</th>
</tr>
</thead>
<tbody>
<tr>
<td>TE</td>
<td>256</td>
<td>N/2</td>
<td>2816</td>
<td>35.2 µs</td>
</tr>
<tr>
<td>FFT</td>
<td>256</td>
<td>N(log₂(N)+2)</td>
<td>1609</td>
<td>21.2 µs</td>
</tr>
<tr>
<td>FE</td>
<td>128</td>
<td>3N</td>
<td>422</td>
<td>5.3 µs</td>
</tr>
<tr>
<td>ROT</td>
<td>128</td>
<td>4N</td>
<td>564</td>
<td>7.0 µs</td>
</tr>
<tr>
<td>R/W</td>
<td>512</td>
<td>0.5N</td>
<td>282</td>
<td>3.5 µs</td>
</tr>
<tr>
<td>Total</td>
<td></td>
<td></td>
<td></td>
<td>72.1 µs</td>
</tr>
</tbody>
</table>

Table 1 Sample FILU-DMT G.Lite Receive Benchmarks.
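
The execution times follow from the cycle counts at the 80 MHz clock rate, and the total can be compared with the symbol budget. Taking the TE row and the table total as examples:

\[
t_{\text{TE}} = \frac{2816 \text{ cycles}}{80\ \text{MHz}} = 35.2\ \mu\text{s}, \qquad \frac{72.1\ \mu\text{s}}{250\ \mu\text{s}} \approx 0.29, \qquad 250\ \mu\text{s} - 72.1\ \mu\text{s} = 177.9\ \mu\text{s}.
\]

On these figures the receive path uses under a third of each symbol period, leaving about 178 µs for transmit processing and retraining.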

6. FILU-DMT Hardware Details

The FILU-DMT hardware is written in synthesizable Verilog Register Transfer Level (RTL) code. As a result, the FILU-DMT is technology independent and can be targeted to any process or voltage. Automatic synthesis tools are used in-house to ensure that the FILU-DMT can be quickly and easily migrated to a new technology and can be integrated with any Host or RAM. The design employs a single synchronous clock, which facilitates high scan test coverage. Initially, the FILU-DMT is intended for implementation at 80 MHz in a 0.35 µm TLM (triple-layer metal) process. The total area for the FILU-DMT solution is 35,000 gates or 0.85 mm². By industry standards, this is an ultra-small DSP co-processor. For consumer applications, such as G.Lite, small area is critical for low cost.

7. Conclusions

This paper has presented the FILU-DMT DSP co-processor core for G.Lite applications. The paper examined the generic DSP requirements of G.Lite and described the specific VLSI solution developed by Massana. The solution incorporates a number of innovative features and achieves performance equal to the best DSP processors at a fraction of the area cost.

8. References