frequency,  $f_{CL}$ , which are placed at the dominant pole and at the gain-bandwidth product of the transconductance amplifier, respectively. At frequencies higher than the gain-bandwidth product, the loop no longer affects the transfer gain and the amplifier response is simply that of the differential stage M1-M4. The high cutoff frequency,  $f_{CH}$ , is set by the high-frequency poles at the amplifier output. The values of the low and high cutoff frequencies are

$$f_{CL} = \frac{g_{m1,2}}{2\pi C_1} \tag{1}$$

$$f_{CH}^* = \frac{g_{m3,4}}{2\pi C_O} \tag{2}$$

where  $C_O$  is the total capacitance at one of the output nodes.

Consider now the effect of the auxiliary stage, with a frequency response as shown by curve (ii) in Fig. 2. It is characterised by high-pass behaviour with a zero at frequency  $f_{Z2}$  given by

$$f_{Z2} = \frac{1}{2\pi r_{B2} C_2} \tag{3}$$

where  $r_{B2}$  is the output resistance of current generator IB2.



Fig. 3 Frequency response of different circuits

- (i) circuit in Fig. 1 without auxiliary inverting stage
- (ii) circuit in Fig. 1 with auxiliary inverting stage
- (iii) auxiliary inverting stage only

At low frequencies, the auxiliary stage does not contribute to the main frequency response since capacitors C1 and C2 are open circuits. At high frequencies, C1 and C2 are shorted out and an amplified replica of the input signal with opposite sign is supplied to the inverting input of the main amplifier.

We can considerably increase the bandwidth of the whole amplifier if the unity-gain frequency of the auxiliary stage equals the high cutoff frequency of the main amplifier. For this condition, the frequency response of the whole amplifier (close to the high cutoff frequency) is represented by the bold line in Fig. 3, which shows an increased bandwidth. To achieve this performance we have to impose the condition that

$$\frac{f_{Z2}}{A_1} = f_{CH}^* \tag{4}$$

where A1 is the DC gain (less than unity) of the auxiliary stage and is given by

$$A_1 = \frac{1}{g_{m9} r_{B2}} \tag{5}$$

By combining eqns. 2-5 we obtain the expression for the required bypass capacitor C2:

$$C_2 = \frac{g_{m9}}{g_{m3,4}} C_O \tag{6}$$
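The intermediate steps behind eqn. 6 can be written out explicitly. This is only a sketch: it takes eqn. 5 to read $A_1 = 1/(g_{m9} r_{B2})$, substitutes eqns. 3 and 5 into the left-hand side of eqn. 4, and then equates the result to eqn. 2.

$$\begin{aligned}
\frac{f_{Z2}}{A_1} &= \frac{1}{2\pi r_{B2} C_2} \cdot g_{m9} r_{B2} = \frac{g_{m9}}{2\pi C_2} \\
\frac{g_{m9}}{2\pi C_2} &= \frac{g_{m3,4}}{2\pi C_O} \quad\Rightarrow\quad C_2 = \frac{g_{m9}}{g_{m3,4}} C_O
\end{aligned}$$

The output resistance $r_{B2}$ cancels, so under these expressions the required value of C2 depends only on the transconductance ratio and the output capacitance.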

Circuit evaluation: The circuit was simulated with a 0.8µm p-well CMOS technology. The supply voltage was set to 3V and bias currents IB1 and IB2 were both set to 100µA. Transistor dimensions and other electrical parameters are reported in Table 1.

The frequency responses of the amplifier without and with the auxiliary stage, and that of the auxiliary stage alone, are shown in Fig. 3 by curves (i), (ii) and (iii), respectively. The high cutoff frequencies in curves (i) and (ii) are 290 and 650MHz, so the auxiliary stage improves the bandwidth by more than 300MHz. The gain in both cases is ~15dB.

Table 1: Main electrical parameters

| Component      | Value (W/L for transistors) |
|----------------|-----------------------------|
| VDD-VSS        | 3V                          |
| IB1, IB2       | 100µA                       |
| M1, M2         | 30/0.8 µm/µm                |
| M3, M4, M5, M6 | 3/0.8 µm/µm                 |
| M7, M8         | 3/0.8 µm/µm                 |
| M9             | 15/0.8 µm/µm                |
| M10            | 3/0.8 µm/µm                 |
| C1             | 1pF                         |
| C2             | 0.2pF                       |

Finally, worst-case conditions due to process tolerances were also considered. They caused a variation in gain and high cutoff frequency of < 1dB and 50MHz, respectively.

© IEE 1999 4 May 1999 Electronics Letters Online No: 19990745

DOI: 10.1049/el:19990745

G. Palmisano and S. Pennisi (Dipartimento Elettrico Elettronico e Sistemistico, Universita di Catania, Viale Andrea Doria, 6 I-95125 Catania, Italy)

E-mail: spennisi@dees.unict.it

#### References

- 1 ROFOUGARAN, A., CHANG, J.Y.C., ROFOUGARAN, M., and ABIDI, A.A.: 'A 1GHz CMOS RF front-end IC for a direct-conversion wireless receiver', IEEE J. Solid State Circuits, 1996, 31, pp. 880-889
- 2 SHAHANI, A.R., SHAEFFER, D.K., and LEE, T.H.: 'A 12 mW wide dynamic range CMOS front-end for a portable GPS receiver', IEEE J. Solid State Circuits, 1997, 32, (12), pp. 2061-2070
- 3 WU, S., and RAZAVI, B.: 'A 900-MHz/1.8GHz CMOS receiver for dual-band applications'. IEEE ISSCC Dig. Tech. Paper, New York, NY, USA, 1998, pp. 124-125
- 4 PALMISANO, G., and SALERNO, R.: 'A replica biasing for constant-gain CMOS open-loop amplifiers'. IEEE ISCAS Dig. Tech. Paper, Monterey, California, 1998
- 5 PALMISANO, G., and PENNISI, S.: 'A 20 dB CMOS IF amplifier with embedded single-to-differential input converter'. IEEE ISCAS'99, 1999
- 6 LAKER, K., and SANSEN, W.: 'Design of analog integrated circuits and systems' (McGraw-Hill, 1994)

## Scheduling input-queued switches by shadow departure time algorithm

### J. Li and N. Ansari

A new scheduling algorithm for input-queued cell switches, referred to as a shadow departure time algorithm (SDTA), is introduced. Simulations demonstrate that the cell delay distribution of the SDTA is more desirable than the distributions of the GS-OCF and GS-LQF algorithms in terms of cell overdue probability.

Introduction: The input-queued (IQ) switching architecture is attractive for high-speed switch implementation owing to its scalability. An IQ switch using the longest queue first (LQF) [1] or oldest cell first (OCF) [2] algorithm can achieve 100% throughput under all admissible independent traffic conditions. As approximations of the LQF and OCF algorithms, the GS-LQF and GS-OCF algorithms have been introduced [3]; they use stable matching [4] and thus have lower complexity than the LQF and OCF algorithms, which employ maximum weight matching. The disadvantage of the LQF and GS-LQF algorithms is that they can lead to starvation under certain conditions [2].

In this Letter, a shadow departure time algorithm (SDTA), which is starvation-free, is proposed to achieve better performance than the GS-LQF and GS-OCF algorithms while retaining the same level of complexity.

Shadow departure time algorithm (SDTA): Consider an  $N \times N$  IQ cell switch, consisting of N inputs, N outputs, and a non-blocking switch fabric, with virtual output queueing (VOQ) [2], in which multiple VOQs directed to different outputs are maintained at each input. Assume that there exists a shadow  $N \times N$  FIFO output-queued (OQ) switch, and that exactly the same traffic to the IQ switch is concurrently fed into the shadow switch. SDT(c), the shadow departure time of a cell c, is defined as the point in time at which the cell departs the shadow switch. Since FIFOs are used both in the VOQs of the IQ switch and in the output queues of the shadow switch, a cell that arrives later in a given VOQ has a larger SDT, and the head-of-line (HOL) cell has the smallest SDT among all the cells belonging to the same VOQ.

Let  $Q_{i,j}$  denote the VOQ directed to output j at input i. To apply SDTA,  $w_{i,j}(n)$ , the weight of  $Q_{i,j}$  at time slot n is defined as

$$w_{i,j}(n) = \begin{cases} SDT(\mathbf{c}_{i,j}^0(n)) - n & \text{if } Q_{i,j} \text{ is not empty} \\ \infty & \text{otherwise} \end{cases}$$

in which  $\mathbf{c}_{i,j}^{0}(n)$  is the HOL cell of  $Q_{i,j}$  at time slot n. According to the above definition, the cell that has a smaller weight is quicker to leave the switch.
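As an illustration of the weight definition above, the following minimal Python sketch computes shadow departure times for a FIFO output-queued shadow switch and the resulting VOQ weight. The function names, the arrival-tuple format, and the assumptions that each output serves one cell per time slot, that a cell may depart in its arrival slot, and that same-slot arrivals are served in list order are illustrative choices, not details taken from the Letter.

```python
def shadow_departure_times(arrivals, num_outputs):
    """Compute SDTs for cells fed to a shadow FIFO output-queued switch.

    arrivals: list of (slot, input_port, output_port), sorted by slot.
    Assumption: each output transmits one cell per slot, and a cell may
    depart in the same slot in which it arrives if the output is idle.
    Returns the SDT of each cell, in the same order as `arrivals`.
    """
    next_free = [0] * num_outputs  # earliest slot each output is idle
    sdts = []
    for slot, _input_port, out in arrivals:
        depart = max(slot, next_free[out])  # wait behind earlier cells
        next_free[out] = depart + 1
        sdts.append(depart)
    return sdts


def voq_weight(hol_sdt, n):
    """Weight w_{i,j}(n): SDT of the HOL cell minus n, or infinity when
    the VOQ is empty (hol_sdt is None)."""
    return float("inf") if hol_sdt is None else hol_sdt - n


arrivals = [(0, 0, 0), (0, 1, 0), (1, 2, 0), (5, 0, 1)]
sdts = shadow_departure_times(arrivals, num_outputs=2)
```

In this example, three cells contend for output 0 and are given consecutive departure slots, while the single cell for output 1 departs in its arrival slot.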

SDTA searches for a stable match [4] between inputs and outputs by setting the preference list for every input and output following the rule: input i prefers output j with a smaller value for  $w_{i,j}(n)$ , and ties are broken randomly; conversely, output j prefers input i with a smaller value for  $w_{i,j}(n)$ , and ties are also broken randomly. Stable matching [4] seeks to match N inputs with N outputs so that there is no pair consisting of an input and an output which prefer each other to the 'partners' with which they are currently matched.
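The matching step described above can be sketched with a Gale-Shapley style procedure [4] in which inputs propose to outputs in order of increasing weight. This is a hedged illustration rather than the authors' implementation: the function name `stable_match` is hypothetical, empty VOQs (infinite weight) are simply skipped, and ties are broken by port index instead of randomly so that the result is reproducible.

```python
import math


def stable_match(w):
    """Return a stable matching as a dict {output: input}.

    w[i][j] is the weight of VOQ (i, j); math.inf marks an empty VOQ.
    Assumption: ties are broken by lower port index, not randomly.
    """
    n = len(w)
    # Each input ranks the outputs of its non-empty VOQs by weight.
    prefs = [
        sorted((j for j in range(n) if w[i][j] != math.inf),
               key=lambda j, i=i: (w[i][j], j))
        for i in range(n)
    ]
    next_choice = [0] * n   # index of next output each input will try
    matched_input = {}      # output -> currently accepted input
    free = list(range(n))
    while free:
        i = free.pop()
        if next_choice[i] >= len(prefs[i]):
            continue        # list exhausted: input i stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        cur = matched_input.get(j)
        if cur is None:
            matched_input[j] = i
        elif (w[i][j], i) < (w[cur][j], cur):
            matched_input[j] = i    # output j prefers the new proposer
            free.append(cur)
        else:
            free.append(i)          # rejected: try the next output later
    return matched_input


inf = math.inf
weights = [[2, 5, inf],
           [1, 3, 4],
           [inf, 3, 0]]
match = stable_match(weights)
```

Here the VOQ with the globally smallest weight, from input 2 to output 2, is part of the resulting match, and no unmatched input-output pair prefers each other to their assigned partners.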

It can be shown that the VOQ which has the smallest weight will always be chosen to transmit the HOL cell. If an HOL cell of a VOQ is not served in a timeslot, its weight will decrease by one, thus eventually becoming small enough to be served. Hence, SDTA is a starvation-free algorithm.



Fig. 1 Cumulative distributions of cells under i.i.d. on-off traffic with load of 80%

(i) GS-LQF (ii) SDTA (iii) GS-OCF

Performance: To simulate the bursty nature of real traffic, an on-off traffic model was used in the simulations. The on-off traffic model assumes that the source has two states: OFF and ON. In the OFF state, the source does not send any cells. In the ON state, the source sends data cells at the peak cell rate (P). At each timeslot, the source in the OFF state changes to the ON state with a probability  $\alpha$ . Similarly, the source in the ON state changes to the OFF state with a probability  $\beta$ . There is no correlation between the two probabilities.
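The two-state source described above can be sketched as a simple Markov chain. In this illustrative Python version, the function name `on_off_source`, the seeded generator, the choice to apply the state transition before emitting in each slot, and a peak cell rate of one cell per time slot are all assumptions. For such a chain the long-run fraction of slots spent in the ON state is $\alpha/(\alpha+\beta)$, which gives the offered load of a single source sending at one cell per slot.

```python
import random


def on_off_source(alpha, beta, num_slots, seed=0):
    """Simulate an on-off source; returns a list with 1 (cell sent) or
    0 (no cell) for each time slot.

    alpha: probability of switching OFF -> ON in a slot.
    beta:  probability of switching ON -> OFF in a slot.
    Assumption: the state is updated first, then the source emits one
    cell if it is ON (peak rate of one cell per slot).
    """
    rng = random.Random(seed)
    on = False
    cells = []
    for _ in range(num_slots):
        if on:
            if rng.random() < beta:
                on = False
        elif rng.random() < alpha:
            on = True
        cells.append(1 if on else 0)
    return cells


trace = on_off_source(alpha=0.4, beta=0.1, num_slots=200_000)
measured_load = sum(trace) / len(trace)  # close to 0.4 / (0.4 + 0.1)
```

The values of `alpha` and `num_slots` used here are only for illustration and are not the settings reported in the Letter.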

The performance of the SDTA, GS-OCF and GS-LQF algorithms was simulated in a  $16 \times 16$  switch. 256 i.i.d. flows, each belonging to a different input-output pair, were created in the simulations. Bursty traffic was generated based on the on-off traffic model.  $\beta$  was chosen to be 0.1 and the peak cell rate was set to be the link capacity. Each simulation lasted for  $10^6$  timeslots.

Fig. 1 shows the cumulative distributions of cells against cell delay using the GS-OCF, GS-LQF and SDTA algorithms under i.i.d. on-off traffic with a traffic load of 80%. The curves approximate the cumulative distribution functions of cell delay. The Figure indicates that the cumulative distribution of SDTA is always larger than that of GS-OCF, implying that for a given delay bound a greater percentage of cells can be transmitted within the delay bound using SDTA than using GS-OCF. In other words, cells using SDTA have a lower probability of being overdue than those using GS-OCF. When the cell delay is large, the cumulative distributions for SDTA and GS-OCF are almost identical, but that for GS-LQF is smaller, implying that more cells experience much longer latency under GS-LQF than under GS-OCF and SDTA.
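The link between the cumulative distribution and the overdue probability can be made concrete with a small Python sketch. The helper names and the sample delays below are hypothetical and are not data from the simulations.

```python
def delay_cdf(delays, bound):
    """Empirical cumulative distribution: fraction of cells whose delay
    does not exceed `bound` (in time slots)."""
    return sum(1 for d in delays if d <= bound) / len(delays)


def overdue_probability(delays, bound):
    """Fraction of cells that miss the delay bound."""
    return 1.0 - delay_cdf(delays, bound)


sample_delays = [1, 2, 2, 5, 9]  # hypothetical per-cell delays
within = delay_cdf(sample_delays, bound=2)
overdue = overdue_probability(sample_delays, bound=2)
```

A scheduler whose cumulative distribution is larger at a given delay bound therefore has a smaller overdue probability at that bound.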

Conclusions: A new algorithm, SDTA, which employs stable matching, thus having a lower complexity than algorithms which use maximum weight matching, has been proposed to improve on existing stable matching algorithms in terms of QoS features. It has been proven that SDTA is starvation-free. Simulations also show that SDTA has a larger cumulative distribution of cell delay than GS-OCF, which implies that switches using SDTA have a lower probability of cells being overdue than those using GS-OCF.

Acknowledgments: This work was partially supported by Lucent Technologies, and was carried out in part while N. Ansari was on leave at the Department of Information Engineering, Chinese University of Hong Kong. The authors would like to acknowledge the fruitful discussion with S. Li on input queuing.

© IEE 1999 23 April 1999 Electronics Letters Online No: 19990789

DOI: 10.1049/el:19990789

J. Li and N. Ansari (New Jersey Center for Wireless Telecommunications, New Jersey Institute of Technology, Newark NJ 07102, USA)

N. Ansari: Corresponding author

E-mail: ang@njit.edu

#### References

- 1 McKEOWN, N., ANANTHARAM, V., and WALRAND, J.: 'Achieving 100% throughput in an input-queued switch'. Proc. INFOCOM'96, March 1996, San Francisco, CA, pp. 296-302
- 2 MEKKITTIKUL, A., and McKEOWN, N.: 'A starvation-free algorithm for achieving 100% throughput in an input-queued switch'. Proc. ICCCN'96, October 1996, pp. 226-231
- 3 McKEOWN, N.: 'Scheduling algorithms for input-queued cell switches'. Ph.D. Dissertation, University of California, Berkeley, 1995.
- 4 GALE, D., and SHAPLEY, L.S.: 'College admissions and the stability of marriage', American Mathematical Monthly, 1962, 69, pp. 9-15

## DSP implementation of real-time MPEG-2 audio decoder using novel synthesis filter bank

### Won-Kyu Paik and Sun-Young Hwang

An efficient synthesis filter is presented which can carry out real-time MPEG-2 audio decoding. The proposed algorithm reduces the number of MAC operations by adopting novel IDCT and windowing schemes, exploiting a multichannel structure, and implementing CGD techniques. The DSP implementation is MPEG-2 compliant and achieves real-time processing with a 60% reduction in runtime compared with a fast ISO decoder.