DYNAMIC RECONFIGURABLE LIFTING-BASED WAVELET PACKET PIPELINE PROCESSOR FOR REAL-TIME AUDIO APPLICATION

Alexey Petrovsky, Maxim Rodionov, Alexander Petrovsky

Department of Computer Engineering, Belarusian State University of Informatics and Radioelectronics

6, P. Brovky str., 220013, Minsk, Belarus

phone: + (375-17) 293-23-40, fax: + (375-17) 331-09-14, email: palex@bsuir.by

web: www.bsuir.by

ABSTRACT

In this paper, dynamic algorithm transforms (DAT) for reconfigurable real-time audio applications based on the adaptive wavelet packet (WP) decomposition are presented. The aim of the DAT techniques is to construct a minimum-cost subband decomposition of the wavelet transform by maximizing the minimum masking threshold (which is limited by the perceptual entropy) in every subband for the given embedded processor architecture and temporal resolution. The processor architecture is based on implementing the wavelet transform by factoring it into lifting steps. Practical reconfiguration strategies for the given processor are presented.

1. INTRODUCTION

The wavelet packet (WP) transform, as a generalization of the standard wavelet transform, provides more flexible choices of time-frequency (time-scale) representation of signals [1]. This flexibility is useful in many applications, such as the design of cost-effective real-time multimedia systems and high-quality audio transmission and storage. In parallel with the definition of the ISO/MPEG standards, several audio coding algorithms have been proposed that use the wavelet transform, in particular the adaptive wavelet packet transform, as the tool to decompose the signal [2,3]. In practice, WPs are often implemented using a tree-structured filter bank [2-4]. The WP is a set of transformations that admits any type of tree-structured filter bank, each of which provides a different time-frequency tiling map. Applying the lifting scheme [5-7] to the construction of the wavelet filter bank significantly reduces the number of arithmetic operations needed to compute the transform.

Algorithm transformation techniques [8,9] have been employed in high-speed DSP system design. All of these techniques are applied during the VLSI design phase, and their implementation is time-invariant. Therefore, this class of signal processing techniques is referred to as static techniques.

Recently, dynamic techniques at both the circuit and the algorithmic level have been proposed. These techniques are based on the principle that the input signal is usually nonstationary and, hence, it is better (from a coding perspective) to adapt the algorithm and architecture to the input signal. Such systems are referred to as reconfigurable signal processing systems [10,11]. The key goal of these techniques is to improve algorithm performance by exploiting the variability in the data and the channel.

Our approach is to use dynamic algorithm transforms (DAT) to design an application-specific reconfigurable lifting-based WP pipeline processor, in particular for real-time audio processing [8,12]. The principle behind the DAT techniques is to define the parameters of the input audio signal (subband entropy) and of the output encoded sequences (subband rate) for the given embedded processor architecture. Adaptive wavelet analysis for audio signal processing is particularly interesting if psychoacoustic information is taken into account in the WP decomposition scale. Because of the limited frequency selectivity of wavelet filter banks, the psychoacoustic information is computed in the wavelet domain.

2. DYNAMIC TRANSFORMATION OF WP DECOMPOSITION

We present an adaptive WP tree derived via DATs. The principle behind DAT is to define the parameters of the input signal (subband entropy) and of the output sequences (subband rate) for the given embedded processor architecture. In other words, the aim of DAT is to construct a minimum-cost subband decomposition of the WP by maximizing the minimum masking threshold (which is limited by the perceptual entropy (PE)) in every subband for the given computational complexity $C$ (i.e., for the given embedded processor architecture) and temporal resolution. To achieve this, we assume that the structure of the WP decomposition tree is adapted as closely as possible to the critical bands (CB–WPD: $(l,n) \in E_{CB}$), as shown in [8]. For a WP tree structure $E_i$, the information density $H_{E_i}$ is estimated as

$$H_{E_i} = \sum_{v(l,n) \in E_i} \sum_{k} w_{E_i}(l,n,k) \ln w_{E_i}(l,n,k), \quad (1)$$

where

$$w_{E_i}(l,n,k) = \frac{|x_{l,n,k}|}{\sum_{v(l',n') \in E_i} \sum_{k'} |x_{l',n',k'}|}. \quad (2)$$

Here $x_{l,n,k}$ are the wavelet coefficients, $l$ is the decomposition level, $n$ is the node number within that level, and $k$ is the index of the wavelet coefficient within node $(l,n)$.
The decision to grow the WP tree further, based on $H$, can be expressed as

$$H_{E_l} > H_{E_{l-1}}. \quad (3)$$

If (3) holds, the subband splitting process in the WP tree continues; otherwise, a suboptimal decomposition for the given signal frame has been found.
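As an illustration of (1)-(3), the following minimal Python sketch computes $H_{E_i}$ for a tree given as a mapping from nodes $(l,n)$ to coefficient lists and applies the growth test. It assumes, as in (2), that the weights are normalized over all coefficients of the tree; the function names are hypothetical. Because (1) carries no minus sign, $H \le 0$, and a value closer to zero means that the coefficient magnitudes are concentrated in fewer coefficients.

```python
import math


def information_density(nodes):
    """Eqs. (1)-(2): H = sum of w * ln(w) over every coefficient of the tree.

    `nodes` maps a node (l, n) to its list of wavelet coefficients x_{l,n,k};
    w is |x| divided by the sum of |x| over all coefficients of the tree.
    """
    total = sum(abs(x) for coeffs in nodes.values() for x in coeffs)
    if total == 0.0:
        return 0.0
    h = 0.0
    for coeffs in nodes.values():
        for x in coeffs:
            w = abs(x) / total
            if w > 0.0:  # the term 0 * ln(0) is taken as 0
                h += w * math.log(w)
    return h


def keep_growing(h_new, h_old):
    """Condition (3): keep splitting while H_{E_l} > H_{E_{l-1}}."""
    return h_new > h_old
```

For a flat node `[1, 1, 1, 1]` the sketch gives $H = \ln 0.25$, while a tree whose two children hold `[2, 2]` and `[0, 0]` gives $H = \ln 0.5$, so condition (3) allows the split.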

The subband splitting process is controlled by the estimated values of $PE$ in the parent and child nodes of the current WP tree structure. $PE$ estimation is described in [13,14] and is expressed as

$$PE_{E_{l,n}} = \sum_{k=0}^{K_{l,n}-1} \log_2\left(2\left\lfloor SMR_{E_{l,n,k}} \right\rfloor + 1\right), \quad (4)$$

where $SMR_{E_{l,n,k}}$ is the ratio between the absolute value of the wavelet coefficient $x_{l,n,k}$ in a subband of tree $E_i$ (node $(l,n)$) and the corresponding masking threshold $T_{E_{l,n}}$, which is linearly spread among the $K_{l,n}$ coefficients $x_{l,n,k}$, $k = 0,\dots,K_{l,n}-1$, of node $(l,n)$. A large value of $SMR_{E_{l,n,k}}$ indicates that node $(l,n)$ contributes significantly to the formation of $PE$.

Each allowed parent node $(l,n)$ is split into two child nodes $(l+1,2n)$ and $(l+1,2n+1)$ if and only if the sum of $PE_{E_{l+1,2n}}$ and $PE_{E_{l+1,2n+1}}$ in the child nodes is less than $PE_{E_{l,n}}$ in the current node, which can be written as

$$PE_{E_{l,n}} > PE_{E_{l+1,2n}} + PE_{E_{l+1,2n+1}}. \quad (5)$$
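The PE estimate (4) and the split rule (5) can be sketched in Python as follows. This is a minimal sketch that assumes "linearly spread" means the node threshold $T_{E_{l,n}}$ is divided evenly over the $K_{l,n}$ coefficients, so each $|x_{l,n,k}|$ is compared with $T_{E_{l,n}}/K_{l,n}$; how the masking threshold itself is obtained is outside the scope of the sketch, and the function names are hypothetical.

```python
import math


def perceptual_entropy(coeffs, threshold):
    """Eq. (4) for one node (l, n).

    Assumption: the node masking threshold T is spread evenly over the K
    coefficients of the node, so SMR_k = |x_k| / (T / K).
    """
    k_count = len(coeffs)
    if k_count == 0 or threshold <= 0.0:
        raise ValueError("need at least one coefficient and a positive threshold")
    per_coeff = threshold / k_count
    pe = 0.0
    for x in coeffs:
        smr = abs(x) / per_coeff
        pe += math.log2(2 * math.floor(smr) + 1)
    return pe


def should_split(pe_parent, pe_low, pe_high):
    """Condition (5): split (l, n) only if the children need fewer bits."""
    return pe_parent > pe_low + pe_high
```

With coefficients `[4, 0, 0, 0]` and a threshold of 4, only the first coefficient has a nonzero SMR (equal to 4), so the node contributes $\log_2 9$ bits.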

The computational resource $c_l$ required at the current WP tree level is estimated, and if it exceeds the maximum allowed computational resource $C$, the WP tree decomposition is terminated.

Figure 1 shows how the dynamic WP tree structure grows level by level based on $H$, together with the corresponding time-frequency tiling map.

![WP tree structure creation and corresponding time-frequency tiling map](image)

Figure 1 – WP tree structure creation and corresponding time-frequency tiling map.

Using the information density $H$, the perceptual entropy $PE$, the limiting WP tree structure CB–WPD and the maximum allowed computational resource $C$ together in the WP growing procedure allows us to find a suboptimal solution for the analysis of the input signal on the given hardware architecture.

3. WP IMPLEMENTATION BASED ON LIFTING SCHEME

In the tree-based scheme of the WP, each node of the tree consists of a two-channel filter bank. Each node can be broken down into a finite sequence of simple filtering steps, called lifting steps or ladder structures. In [15], a method was proposed for converting a two-channel filter bank from an implementation based on FIR filters to an architecture based on the lifting scheme. The decomposition is essentially a factorization of the polyphase matrix of the wavelet filters into elementary matrices. As discussed in [6,15], the lifting scheme consists of three phases: the first step splits the data into two subsets, even and odd; the second (predict) step computes the wavelet (high-pass) coefficients as the error in predicting the odd set from the even set; finally, the third (update) step updates the even set using the wavelet coefficients to compute the scaling function (low-pass) coefficients. This method reduces the number of multiplications and additions by about half. The scaling coefficients $X_{l+1,2n}(z)$ and wavelet coefficients $X_{l+1,2n+1}(z)$ computed from the input signal $X_{l,n}(z)$, whose even and odd polyphase components are $X_{l,n,e}(z)$ and $X_{l,n,o}(z)$, can be written in the $z$ domain as follows:

$$\begin{bmatrix} X_{l+1,2n}(z) & X_{l+1,2n+1}(z) \end{bmatrix} = \begin{bmatrix} X_{l,n,e}(z) & X_{l,n,o}(z) \end{bmatrix} \prod_{i=1}^{I/2} \begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix} \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix}. \quad (6)$$

In the general case, the polynomials $s_i(z)$ and $t_i(z)$ can be represented as $(b_0 + b_1 z^{-1})^n$, where $b_0$ and $b_1$ are constants and $n$ is an integer exponent. For example, the $b_0$, $b_1$ and $u$ parameters of the lifting scheme for the db4 mother wavelet are given in table 1. $K_1$ and $K_2$ are equal to $-0.1202$ and $-0.3192$, respectively.

<table>
<thead>
<tr>
<th>$b_0$</th>
<th>$b_1$</th>
<th>$u$</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.1029</td>
<td>1.6625</td>
<td>0.3141</td>
</tr>
<tr>
<td>-5.1995</td>
<td>0.0763</td>
<td>-0.3192</td>
</tr>
<tr>
<td>0.3141</td>
<td>-0.9220</td>
<td>0.0763</td>
</tr>
<tr>
<td>0.0763</td>
<td>-0.9220</td>
<td>-0.3192</td>
</tr>
</tbody>
</table>

Table 1 – Parameters of the lifting scheme for db4 (8 taps)
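A floating-point sketch of one node of (6) may help make the row-vector convention concrete. Each factor pair is modelled with two-tap polynomials $b_0 + b_1 z^{-1}$: the $s_i$ step adds a filtered copy of the even channel to the odd channel (predict), the $t_i$ step adds a filtered copy of the new odd channel to the even channel (update), and the outputs are finally scaled by $K_1$ and $K_2$. This is a minimal sketch under stated assumptions: the signal length is even, the delay $z^{-1}$ starts from a zero state (no boundary extension), and the Haar parameters used in the example are chosen only because the result is easy to check; they are not the db4 values of Table 1. All function names are hypothetical.

```python
import math


def _filter2(x, b0, b1):
    """Apply (b0 + b1 z^-1) to x, assuming the delay register starts at 0."""
    return [b0 * x[k] + (b1 * x[k - 1] if k > 0 else 0.0) for k in range(len(x))]


def lifting_forward(signal, steps, k1, k2):
    """One two-channel WP node following (6); len(signal) must be even.

    `steps` holds (s_b0, s_b1, t_b0, t_b1) for each factor pair:
    the s-step updates the odd channel from the even one (predict),
    the t-step updates the even channel from the odd one (update).
    """
    even, odd = list(signal[0::2]), list(signal[1::2])
    for s_b0, s_b1, t_b0, t_b1 in steps:
        odd = [o + p for o, p in zip(odd, _filter2(even, s_b0, s_b1))]
        even = [e + u for e, u in zip(even, _filter2(odd, t_b0, t_b1))]
    return [k1 * e for e in even], [k2 * o for o in odd]


def lifting_inverse(low, high, steps, k1, k2):
    """Undo lifting_forward: unscale, then undo the steps in reverse order."""
    even = [e / k1 for e in low]
    odd = [o / k2 for o in high]
    for s_b0, s_b1, t_b0, t_b1 in reversed(steps):
        even = [e - u for e, u in zip(even, _filter2(odd, t_b0, t_b1))]
        odd = [o - p for o, p in zip(odd, _filter2(even, s_b0, s_b1))]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


# Haar as a lifting scheme: predict d = o - e, update a = e + d / 2.
HAAR_STEPS = [(-1.0, 0.0, 0.5, 0.0)]
HAAR_K1, HAAR_K2 = math.sqrt(2.0), 1.0 / math.sqrt(2.0)
```

For the input `[1, 3, 2, 2]` the Haar parameters give the low-pass output $[2\sqrt{2}, 2\sqrt{2}]$ and the high-pass output $[\sqrt{2}, 0]$, and `lifting_inverse` restores the input. Because every lifting step is inverted simply by subtracting what was added, perfect reconstruction holds for any choice of $b_0$, $b_1$.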

In this work, the fixed-point WP implementation uses an arithmetic with an arbitrary number of integer and fractional bits, as proposed in [16,17]. The advantage of this number representation is that it can be realized with conventional integer arithmetic resources.
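The idea that a format with an arbitrary number of integer and fractional bits needs only integer hardware can be illustrated with a short Python sketch. It assumes a signed two's-complement word of `total_bits` bits with `frac_bits` fractional bits, round-to-nearest quantization with saturation, and truncation (floor) after multiplication; these choices are assumptions, not necessarily those of [16,17], and the function names are hypothetical.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize `value` to an integer code with `frac_bits` fractional bits,
    saturating to the range of a signed `total_bits`-bit word."""
    q = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def from_fixed(q, frac_bits):
    """Interpret an integer code as a real number."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two codes of the same format with integer operations only.

    The full product has 2 * frac_bits fractional bits; the arithmetic right
    shift (floor division by 2**frac_bits) brings it back to frac_bits.
    The result is not saturated here.
    """
    return (a * b) >> frac_bits
```

For example, with a 16-bit word and 12 fractional bits, 0.5 is coded as 2048 and -1.25 as -5120; their product code is -2560, which is read back as -0.625.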

The block diagrams of the lifting step for the case \( b_1 = 0 \) and for the case \( b_1 \neq 0 \) (see ) are shown in figure 2 and figure 3, respectively. The solid and dashed lines at the beginning and end of the block diagrams correspond to the instances \( s_i(z) \) and \( t_i(z) \), respectively. The arithmetic right shift (block shift) is realized in accordance with the implemented fixed-point algorithm and amounts to shifting the signal lines before they are fed to the adder inputs. Consequently, the shifters consume no system hardware resources.
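The datapath of one fixed-point lifting step (figures 2 and 3) can be sketched in Python as follows. Each product of a sample with a coefficient code is shifted right by the number of fractional bits before it reaches the adder, mirroring the block shift described above; when \( b_1 = 0 \) the delayed branch is skipped, as in figure 2. This is a hypothetical sketch: the exact position of the shifts in the authors' hardware, the word lengths and the zero initial state of the \( z^{-1} \) register are assumptions.

```python
def lifting_step_fixed(target, source, b0_code, b1_code, frac_bits):
    """Integer version of target[k] += b0*source[k] + b1*source[k-1].

    b0_code and b1_code are the coefficients b0, b1 quantized with
    `frac_bits` fractional bits. Each product is arithmetically shifted
    right by frac_bits before the addition (the "block shift", which in
    hardware is only a re-wiring of the adder input).
    """
    out = []
    prev = 0  # contents of the z^-1 register, assumed to start at zero
    for t, s in zip(target, source):
        acc = t + ((b0_code * s) >> frac_bits)
        if b1_code != 0:  # figure 3 case; skipped when b1 == 0 (figure 2)
            acc += (b1_code * prev) >> frac_bits
        out.append(acc)
        prev = s
    return out
```

With 12 fractional bits, the code -4096 represents \( b_0 = -1 \), so a predict step with \( b_1 = 0 \) turns odd samples `[3, 2]` and even samples `[1, 2]` into `[2, 0]`.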

![Figure 2 - Block diagram of the lifting step when \( b_1 = 0 \)](image)

![Figure 3 - Block diagram of the lifting step when \( b_1 \neq 0 \)](image)

4. PIPELINE PROCESSOR WITH DYNAMIC RECONFIGURABLE ARCHITECTURE

4.1 Run time reconfigurable architecture

The reconfigurable DSP system for signal analysis based on the DAT approach consists of a microprocessor specialized for signal processing (DSP microprocessor) and the WP processor itself, which has a reconfigurable architecture. The DSP microprocessor performs several tasks: processing the wavelet coefficients \( X_{m,n,k} \) in the subbands \((l,n)\) that correspond to the current WP tree structure \( E_l \); estimating \( H_{E_l} \) and \( PE_{E_{l,n}} \); and obtaining the reconfiguration vector \( r_{l,n} \), \((l,n) \in E_l \), for the WP processor. The WP processor is realized as a pipeline architecture with dynamic reconfiguration for implementing the adaptive WP. The length of the pipeline is determined by the limiting WP tree structure \((CB–WPD)\).

The strong dependence of the processing on the WP structure makes it necessary to introduce an easily reconfigurable parallel-pipeline structure with computational resource \( C \). The resulting DAT-based DSP system for audio processing is shown in figure 4.

4.2 WP lifting based pipeline processor

The pipeline architecture is well suited to an efficient implementation of the WP algorithm, as confirmed in \([7,18]\).

We propose a pipeline architecture for the lifting-based WP processor. The architecture is a sequential connection of homogeneous blocks, each implementing a two-channel filter bank, which makes it possible to compute a WP with an arbitrary tree structure. The maximum number of decomposition levels that can be realized is 8, which corresponds to the depth of the \( CB–WPD \).
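How a chain of identical two-channel stages yields an arbitrary WP tree can be sketched as follows. Each stage receives the set of node indices it should split at its level, a hypothetical stand-in for the reconfiguration vector; every node that is not split simply becomes a leaf. An orthonormal Haar lifting node stands in for the PU, and the data structures and names are illustrative assumptions rather than the processor's actual interface.

```python
import math


def haar_node(x):
    """Stand-in for one PU: orthonormal Haar filter bank via lifting."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]          # predict
    a = [e + 0.5 * v for e, v in zip(even, d)]      # update
    r2 = math.sqrt(2.0)
    return [r2 * v for v in a], [v / r2 for v in d]


def pipeline_wp(frame, split_sets):
    """Pass `frame` through len(split_sets) identical stages.

    split_sets[l] is the set of node indices n at level l that stage l
    splits. Returns {(l, n): coefficients} for the leaves of the tree.
    """
    active = {(0, 0): list(frame)}
    leaves = {}
    for to_split in split_sets:
        nxt = {}
        for (l, n), x in active.items():
            if n in to_split and len(x) >= 2:
                low, high = haar_node(x)
                nxt[(l + 1, 2 * n)] = low
                nxt[(l + 1, 2 * n + 1)] = high
            else:
                leaves[(l, n)] = x   # not split: node leaves the pipeline
        active = nxt
    leaves.update(active)
    return leaves
```

For instance, `split_sets = [{0}, {0}]` applied to an 8-sample frame splits the root and then only the low-pass node, producing the leaves (1,1), (2,0) and (2,1); since the Haar node is orthonormal, the total energy of the leaves equals that of the frame.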

Thus, the basic WP decomposition operation is implemented as a processing unit (PU) that acts as a two-channel filter bank based on the lifting scheme. The block diagram of the PU is shown in figure 5. Before processing according to the lifting scheme starts, the PU splits the input sequence \( X_{m,n,k} \) into even samples \( X_e \) and odd samples \( X_o \). Figure 5 uses the following notation: \( wi \) is the word length (bit capacity); \( I \) is the number of elementary steps of the lifting scheme; \( V_{PF} \) is a vector whose elements are the parameter sets of the individual elementary lifting steps; \( V_{BUF} \) is a vector whose elements specify the number of delays in the upper and lower channels after each elementary step. These delays are implemented as FIFO registers, which on the one hand realize the delay elements \( z^{-1} \) of the algorithm and, on the other hand, enable a pipelined realization that increases throughput. The coefficients \( K_1 \) and \( K_2 \) are applied to the result of the lifting scheme as described in (6). The estimated hardware resources required for the PU implementation are shown in table 2.

Table 2 – Estimation of hardware resources for PU implementation

<table>
<thead>
<tr>
<th>Resource type</th>
<th>Utilized</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multipliers (\( wi \times wi \))</td>
<td>\( N+2 \)</td>
</tr>
<tr>
<td>Adders (\( wi \)-bit)</td>
<td>\( N \)</td>
</tr>
<tr>
<td>Registers (\( wi \)-bit)</td>
<td>\( N+1 \)</td>
</tr>
<tr>
<td>2-to-1 multiplexers (\( wi \)-bit)</td>
<td>\( 1 \)</td>
</tr>
</tbody>
</table>

\( N \) is the number of filter taps.

![Figure 4 – DAT-based reconfigurable signal processing system.](image)

Next, the buffer/switch unit (BSU) implements a double-buffering ("ping-pong") scheme that provides parallel access to the data, so that results can be stored and source data fetched for the PUs at the same time. An additional channel is used to output the result data. The two output streams of samples \( x_{l+1,2n,k} \) and \( x_{l+1,2n+1,k} \) from the \( l \)-th PU are stored in the BSU while, simultaneously, the \( (l+1) \)-th PU reads the samples for the next processing stage. A unified block diagram of the BSU is shown in figure 6. Each BSU in the parallel-pipeline architecture has a different addressable memory size, which depends on the WP decomposition level.
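The "ping-pong" principle of the BSU can be illustrated with a small Python class: the producing stage writes into one memory bank while the consuming stage reads the other, and the roles are exchanged once both have finished a frame. This is a behavioural sketch only; the class and method names are hypothetical, and the separate output channel and address generation of the real BSU are not modelled.

```python
class PingPongBuffer:
    """Double ("ping-pong") buffer between two pipeline stages.

    Stage l writes into one bank while stage l + 1 reads the other bank;
    swap() exchanges the roles at the end of each frame.
    """

    def __init__(self, size):
        self.banks = [[0.0] * size, [0.0] * size]
        self.write_bank = 0

    def write(self, index, value):
        """Producer side: store a result sample of stage l."""
        self.banks[self.write_bank][index] = value

    def read(self, index):
        """Consumer side: fetch a source sample for stage l + 1."""
        return self.banks[1 - self.write_bank][index]

    def swap(self):
        """Exchange the banks once both stages have finished the frame."""
        self.write_bank = 1 - self.write_bank
```

The consumer never sees a partially written frame: samples written during one frame become readable only after `swap()`, while new writes go to the other bank.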

Decoding of the reconfiguration vector \( r_{i,n} \), memory address generation, PU enabling, data exchange control and pipeline synchronization are performed by the control units (CU). All these functions are carried out in parallel in the CU of each WP processor stage. The pipeline stages of the WP processor are synchronized according to the DAT technique.

4.3 Dynamic WP tree decomposition algorithm

Suppose that the computational resource, which depends on \( (m, r_{i,n}) \), is limited by \( C \). The limiting WP tree structure is \( CB–WPD: (l,n) \in E_{CB} \). The computational resource required by the WP tree \( (l,n) \in E_i \) is denoted \( c_i \). The decision to divide the parent node \( (l,n) \) into the two children \( (l + 1,2n) \) and \( (l + 1,2n + 1) \) is denoted \( split(l,n) \), where \( l \) is the transform scale level and \( n \) is the \( n \)-th node of level \( l \); \( m \) is the stage of the pipeline processor, and \( j \) is the number of the input frame of the audio signal.

**STEP 1.** Let \( j = 1, m = 1, l = 0 \), \( split(l,n) = YES \) and \( r_{i,n} = YES \). The WP tree root node is \( (0,0) \) for the first frame of the input audio signal, with perceptual entropy \( PE_{0,0} \) and information density \( H_{E_{0,0}} \), and the reconfiguration process is allowed.

**STEP 2.** \( i = j \); the first frame of the input audio signal defines the growing process of the WP tree structure. The signal decomposition is performed by the \( i \)-th PU.

**STEP 3.** Estimate the perceptual entropy in each node \( PE_{i,m} \) and information density \( H_{E_{i,m}} \) of the WP tree structure.

**STEP 4.** Compare the information density \( H_{E_{i,m}} \) of the WP tree structure with that of the WP tree \( E_{i-1,m} \):

IF \( H_{E_{i,m}} < H_{E_{i-1,m}} \), THEN the frame is not an audio signal, its coefficients are not processed, and GOTO STEP 1.

**STEP 5.** \( l = l + 1 \).

IF \( l - 1 \) exceeds the maximum scale level of the limiting WP tree structure \( CB – WPD: E_{CB} \), THEN STOP – the growing process for the WP tree structure \( E_{i,m} \) is finished.

**STEP 6.** Check whether the nodes of the WP tree structure \( E_{i,m} \) belong to \( CB – WPD: E_{CB} \).

IF \( (l,n) \in E_{m-1} \), THEN \( r_{i,n} = NO \).

**STEP 7.** \( m = m + 1 \). The WP tree structure \( E_{i,m} \) is grown as follows.

For each node \( n \) of level \( l \):
- estimate the WP tree computational resource \( c_{i,m} \) and check whether it is sufficient;
- IF \( c_{i,m} > C \), THEN \( r_{i,n} = NO \) and STOP – the growing process for the WP tree structure \( E_{i,m} \) is finished;
- perform the decomposition of the parent node \( (l,n) \);
- calculate the perceptual entropy in the child nodes, \( PE_{l+1,2n} \) and \( PE_{l+1,2n+1} \):
  IF \( PE_{l,n} > PE_{l+1,2n} + PE_{l+1,2n+1} \) (condition (5)), THEN \( split(l,n) = YES, r_{i,n} = YES \)
  ELSE \( split(l,n) = NO, r_{i,n} = NO \).

**STEP 8.** \( j = j + 1 \). Read the next frame of the input audio signal. It will be processed according to the WP tree structure \( E_{i,m} \).

**STEP 9.** Estimate the information density \( H_{E_{i,m}} \) of the WP tree structure \( E_{i,m} \):

IF \( H_{E_{i,m}} > H_{E_{i,m-1}} \), THEN \( r_{i,n} = NO \) and STOP – the growing process for the WP tree structure \( E_{i,m} \) is finished.

**STEP 10.** GOTO STEP 5.

**STOP.** The optimal WP tree structure for the \( i \)-th input frame of the audio signal is \( E_{i,m-1} \); set \( m = m - 2, l = l - 2, i = i + 1 \) and GOTO STEP 5.
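A single-frame Python sketch of the growth loop in STEPs 5 to 7 may clarify how the depth limit, the resource limit and condition (5) interact. It is deliberately simplified: the information density checks of STEPs 4 and 9 and the multi-frame pipelining are omitted, `threshold_of(l, n)` is a hypothetical function returning a positive masking threshold for node \( (l,n) \), and the cost model (number of samples processed by the PUs, compared with `budget` in place of \( C \)) is an illustrative assumption. As in STEP 7, a node is decomposed before the split decision, so the cost is charged even when the split is rejected. An orthonormal Haar node stands in for the PU.

```python
import math


def haar_node(x):
    """Stand-in for one PU: orthonormal Haar filter bank via lifting."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]
    a = [e + 0.5 * v for e, v in zip(even, d)]
    r2 = math.sqrt(2.0)
    return [r2 * v for v in a], [v / r2 for v in d]


def pe(coeffs, threshold):
    """Eq. (4), with the threshold spread evenly over the node (assumption)."""
    t = threshold / len(coeffs)
    return sum(math.log2(2 * math.floor(abs(x) / t) + 1) for x in coeffs)


def grow_tree(frame, threshold_of, max_level, budget):
    """Greedy level-by-level WP tree growth (simplified STEPs 5-7).

    Returns {(l, n): coefficients} for the leaves of the grown tree.
    """
    leaves = {(0, 0): list(frame)}
    frontier = [(0, 0)]
    cost = 0
    for _ in range(max_level):                        # depth limit (STEP 5)
        next_frontier = []
        for node in frontier:
            l, n = node
            x = leaves[node]
            if len(x) < 2 or cost + len(x) > budget:  # resource check c > C
                continue
            low, high = haar_node(x)                  # decompose the parent
            cost += len(x)
            pe_parent = pe(x, threshold_of(l, n))
            pe_children = (pe(low, threshold_of(l + 1, 2 * n))
                           + pe(high, threshold_of(l + 1, 2 * n + 1)))
            if pe_parent > pe_children:               # condition (5)
                del leaves[node]
                leaves[(l + 1, 2 * n)] = low
                leaves[(l + 1, 2 * n + 1)] = high
                next_frontier += [(l + 1, 2 * n), (l + 1, 2 * n + 1)]
        if not next_frontier:
            break
        frontier = next_frontier
    return leaves
```

For a constant 8-sample frame with a unit threshold in every node, the sketch keeps splitting the low-pass branch (the high-pass branch is all zeros, so splitting it never lowers PE) and returns the leaves (1,1), (2,1), (3,0) and (3,1); with `budget = 8` only the root can be decomposed.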

The processing of the first six frames by the pipeline WP processor is shown in figure 7; \( j \) is the number of the frame loaded into the WP processor for processing according to the WP tree structure \( E_{m,i} \) of the current frame \( i \). Darkened areas show where the optimal WP tree for each frame was found, and the arrows indicate the new WP tree structure for the next current frame, on which the information density \( WTE_{m,i} \) of the WP tree structure is estimated.

![Diagram of dynamic reconfiguration](image)

**Figure 7** – Diagram of the dynamic reconfiguration of WP tree structures and of multi-frame processing in the parallel-pipeline WP processor

### 5. CONCLUSION

In this paper, a dynamically reconfigurable lifting-based adaptive WP processor was presented. The lifting scheme reduces the number of multiplications and additions by about half and increases the processing speed. Applying the DAT-based approach as the design technique for time-varying WP decomposition allows us to construct a WP analysis that dynamically adapts to the input signal.

Reconfigurable systems offer several advantages over competing alternatives: they are faster and smaller than general-purpose hardware solutions; they have a lower development cost than dedicated hardware solutions; dynamic reconfiguration supports multiple algorithms within a single application; and a multi-purpose architecture generates volume demand for a single hardware design.

The proposed techniques optimize system performance and, in addition, provide a convenient framework within which ongoing research on non-uniform filter banks for speech/audio coding and on reconfigurable architectures can be synergistically combined to enable the design of reconfigurable high-performance DSP systems.

### 6. ACKNOWLEDGMENT

This work was supported by Belarusian republican fund for fundamental research under the grant T08MC-040.

### REFERENCES