International Journal of VLSI design & Communication Systems (VLSICS) Vol.3, No.4, August 2012
DOI: 10.5121/vlsic.2012.3404
FPGA IMPLEMENTATION OF EFFICIENT VLSI
ARCHITECTURE FOR FIXED POINT 1-D DWT USING
LIFTING SCHEME
Durga Sowjanya1, K N H Srinivas2 and P Venkata Ganapathi3
1 Research fellow, Sri Vasavi Engineering College, Tadepalligudem
k.durgasowjanya@gmail.com
2 Head of the department, Sri Vasavi Engineering College, Tadepalligudem
Knh.tridents@gmail.com
3 Venkata Ganapathi Puppala, Quartics Technologies Pvt Ltd, Pune
ganapathi.pv@gmail.com
ABSTRACT
This paper proposes a scheme for designing an area-efficient, high-speed pipelined VLSI architecture that computes the fixed-point 1-D discrete wavelet transform (DWT) using the lifting scheme. The scheme aims to reduce both the number and the period of clock cycles while keeping the area small, with little or no overhead on hardware resources. A fixed-point representation requires fewer hardware resources than a floating-point representation. Pipelining raises the clock rate of the DWT, and reduced bit precision lowers the area required for implementation. The architecture has been coded in Verilog HDL on the Xilinx platform, targeting a Virtex-II Pro XC2VP7-7 FPGA board. Compared with other architectures, the proposed scheme requires the least computing time for the fixed-point 1-D DWT and occupies less area. The architecture is therefore suitable for real-time DWT applications.
KEYWORDS
Discrete wavelet transform (DWT), Lifting based scheme, field-programmable gate-array (FPGA), pipeline
architecture, reduced bit precision, fixed point, VLSI architecture.
1. INTRODUCTION
The advantages of the wavelet transform over conventional transforms, such as the Fourier transform, are now well recognized. Because of its excellent locality in the time-frequency domain, the wavelet transform is widely used for signal analysis, compression and de-noising. Since Mallat [1] developed the theory for computing the discrete wavelet transform (DWT) in 1989, the DWT has been increasingly used in many areas of science and engineering, mainly because of the multiresolution decomposition property of the transformed signals. Mallat's definition of the DWT [1] also made its implementation in hardware and software possible. The DWT performs a multiresolution signal analysis with adjustable locality in both the space (time) and frequency domains [1]. The DWT is computationally intensive because its computation involves multiple levels of decomposition. It is therefore challenging to design an efficient VLSI architecture that implements the DWT for real-time applications, particularly those that process high-frequency or broadband signals [2]–[4]. Filtering with finite impulse response (FIR) filters followed by subsampling is the classical method for
implementing the DWT. Because of the large amount of computation required, there have been many research efforts to develop new algorithms [15]. Many architectures have been proposed to provide high-speed and area-efficient implementations of the DWT computation [5]–[8]. In [9]–[11], the polyphase matrix of a wavelet filter is decomposed into a sequence of alternating upper and lower triangular matrices and a diagonal matrix to obtain the so-called lifting-based architectures with low hardware complexity.
Pipeline architectures require little memory and a short computing time, which makes them suitable for real-time computation. However, these architectures have some inherent characteristics that have not yet been fully exploited in the schemes for their design. Their computational performance could be further improved if the pipelined design makes use of the lifting steps to the maximum extent possible, synchronizes the operations of the stages optimally, and utilizes the available hardware resources optimally.
In this paper, a scheme for designing a pipeline architecture for fast computation of the DWT is developed. Fast computation is achieved by minimizing the number and the period of clock cycles. The main idea for minimizing these two parameters is to distribute the task of the DWT computation optimally among the stages of the pipeline and the lifting steps of the 9/7 filter. In this study, we focus on the critical path and the internal memory size with 9/7 filters. To ease the trade-off between the pipeline stages of the 1-D architecture, a modified algorithm is proposed for the design of the 1-D pipeline architecture. Based on the modified data path of the lifting-based DWT, the proposed architecture meets the one-multiplier delay constraint but uses less internal memory than the related architectures. Moreover, the proposed architecture implements the 9/7 filters by cascading the three main components.
Due to recent advances in the technology, implementation of the DWT on field programmable
gate array (FPGA) and digital signal processing (DSP) chips has been widely developed. Based
on [4], the main challenges in the hardware architectures for 1-D DWT are the processing speed
and the number of multipliers. The number of multipliers in each pipeline stage determines the
clock speed of the structure.
This paper is organized as follows. Section 2 presents the discrete wavelet transform. Section 3 discusses the choice of pipeline for the 1-D DWT. Section 4 presents the lifting-based scheme. Section 5 briefly introduces the underlying concepts of the 1-D DWT architecture. Section 6 presents the performance evaluation and FPGA implementation and compares the proposed architecture with other related studies. Finally, a brief conclusion is given in Section 7.
2. DISCRETE WAVELET TRANSFORM
This section discusses the theoretical background and the development of the algorithm. The first recorded mention of what is now called a "wavelet" seems to be in 1909, in a thesis by Alfred Haar. An image is represented as a two-dimensional (2-D) array of coefficients, each coefficient representing the brightness level at that point. At first glance, it is not possible to tell which coefficients are more important and which are less important. Intuitively, however, it is possible. Most natural images have smooth color variations, with the fine details represented as sharp edges between the smooth variations.
Technically, the smooth variations in color can be termed as low frequency components and the
sharp variations as high frequency components. The low frequency components (smooth
variations) constitute the base of an image, and the high frequency components (the edges which
give the detail) add to them to refine the image, thereby giving a detailed image. Hence, the averages (smooth variations) are more important than the details [4]. In wavelet analysis, a signal can be separated into approximation and detail coefficients. The averages, or approximations, are the high-scale, low-frequency components of the signal. The details are the low-scale, high-frequency components. These coefficients measure the signal energy distribution in each frequency channel corresponding to the scaling parameter j at time k.
The discrete wavelet transform (DWT) has found many applications in digital signal processing because it is efficient to compute and well suited to the analysis of non-stationary signals.
For the wavelet analysis, the structure is given in figure 1(a). As a result, the DWT decomposes a
digital signal into different sub bands so that the lower frequency sub bands have finer frequency
resolution and coarser time resolution compared to the higher frequency sub bands. The DWT is
being increasingly used for image compression due to the fact that the DWT supports features
like progressive image transmission (by quality, by resolution), ease of compressed image
manipulation, region of interest coding, etc.
2.1. One dimensional DWT:
Any signal is first applied to a pair of low-pass and high-pass filters. Down-sampling (i.e., discarding alternate coefficients) is then applied to the filtered coefficients. The filter pair (h, g) used for decomposition is called the analysis filter bank, and the filter pair (g`, h`) used for reconstruction of the signal is called the synthesis filter bank. The output of the low-pass filter after down-sampling contains the low-frequency components of the signal, which form the approximate part of the original signal. The output of the high-pass filter after down-sampling contains the high-frequency components, which are called the details (i.e., highly textured parts such as edges) of the original signal.
The output of the low-pass filter G(z) represents the approximation coefficients, denoted by $S_j(n)$:
$$S_j(n) = \sum_{k} S_{j-1}(k)\, G(2n-k)$$
The output of the high-pass filter H(z) represents the detail coefficients, denoted by $W_j(n)$:
$$W_j(n) = \sum_{k} S_{j-1}(k)\, H(2n-k)$$
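The filter-then-down-sample operation described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `analysis_step`, the zero-padding at the signal boundary, and the two-tap averaging/differencing filters `G_DEMO` and `H_DEMO` are hypothetical choices made for readability, not the 9/7 filters used by the proposed architecture.

```python
def analysis_step(s_prev, g, h):
    """One decomposition level: filter with g (low-pass) and h (high-pass),
    evaluating only every second output sample (down-sampling by two)."""

    def filtered(coeffs, n):
        # Sum over k of s_prev[k] * coeffs[2n - k]; taps outside the
        # filter support contribute nothing (zero-padding assumption).
        total = 0.0
        for k, sample in enumerate(s_prev):
            tap = 2 * n - k
            if 0 <= tap < len(coeffs):
                total += sample * coeffs[tap]
        return total

    half = len(s_prev) // 2
    approx = [filtered(g, n) for n in range(half)]
    detail = [filtered(h, n) for n in range(half)]
    return approx, detail


# Illustrative two-tap filters (assumption, chosen only to keep numbers simple).
G_DEMO = [0.5, 0.5]
H_DEMO = [0.5, -0.5]

approx, detail = analysis_step([2.0, 4.0, 6.0, 8.0], G_DEMO, H_DEMO)
```

Each level halves the number of samples, so feeding `approx` back into `analysis_step` would give the next decomposition level.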
3. PIPELINE FOR THE 1-D DWT COMPUTATION
In a pipeline structure for the DWT computation, multiple stages are used to carry out the
computations of the various decomposition levels of the transform. Thus, the computation
corresponding to each decomposition level needs to be mapped to a stage or stages of the
pipeline. In order to maximize the hardware utilization of a pipeline, the hardware resource of a
stage should be proportional to the amount of the computation assigned to the stage. Since the
amount of computations in successive decomposition levels of the transform gets reduced by a
factor of two, two scenarios can be used for the distribution of the computations to the stages of a
pipeline. In the first scenario, the decomposition levels are assigned to the stages so as to equalize
the computations carried out by each stage, i.e., the hardware requirements of all the stages are
kept the same. In the second scenario, the computations of the successive decomposition levels
are assigned to the successive stages of a pipeline on a one-level-to-one-stage basis. Thus, in this
case, the hardware requirement of the stages gets reduced by a factor of two as they perform the
computations corresponding to higher level decompositions.
A stage-equalized pipeline structure is the one in which the computations of all the levels are
distributed equally among the stages. The process of stage equalization can be accomplished by
dividing equally the task of a given level of decomposition into smaller subtasks and assigning
each such sub task to a single stage and/or by combining the tasks of more than one consecutive
level of decomposition into a single task and assigning it to a single stage. Note that, generally, a
division of the task would be required for low levels of decomposition and a combination of the
tasks for high levels of decomposition.
In a one-to-one mapped structure, the computations of the decomposition levels are assigned to the stages one level at a time. The computations of the first levels are carried out by the first stage, those of the following levels by the following stages in order, and those of the last levels are performed recursively by the second stage. Thus, for either pipeline structure, i.e., one-to-one mapped or stage-equalized, a two-stage pipeline would be the best choice in terms of hardware efficiency and of design and implementation simplicity. Note that the five-stage version of either pipeline structure is the same; because it gives more flexibility in designing the architecture, the stage-equalized pipeline structure is preferred.
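The halving of the workload from one decomposition level to the next can be made concrete with a small sketch. It assumes, purely for illustration, that the work of level j is proportional to the number of output samples it produces, N/2^j; the function names are hypothetical.

```python
def level_workloads(n_samples, levels):
    """Output samples produced per sub-band at each level: N/2, N/4, ..."""
    return [n_samples >> j for j in range(1, levels + 1)]


def two_stage_split(n_samples, levels):
    """Stage 1 handles level 1; stage 2 handles all remaining levels."""
    work = level_workloads(n_samples, levels)
    return work[0], sum(work[1:])


stage1, stage2 = two_stage_split(1024, 5)
```

For 1024 samples and five levels the split is 512 against 480, which shows why grouping all levels after the first into one stage gives two nearly equal workloads.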
4. LIFTING BASED SCHEME
The lifting scheme was developed as a flexible tool for constructing second-generation wavelets. It is composed of three basic operation stages: splitting, predicting, and updating.
Fig. 2 shows the lifting scheme of the wavelet filter operating on a one-dimensional signal:
• Split step: the signal is split into even and odd points, so that the strong correlation between adjacent pixels can be exploited in the predict step.
• Predict step: the even samples are multiplied by the predict factor, and the results are added to the odd samples to generate the detail coefficients.
• Update step: the detail coefficients computed by the predict step are multiplied by the update factors, and the results are added to the even samples to obtain the coarse coefficients.
Note that the detail and approximation coefficients (d, s) in the lifting scheme are the same as the high-pass and low-pass outputs, respectively. Daubechies and Sweldens first derived the lifting-based discrete wavelet transform [16].
The lifting scheme decomposes the DWT filter bank into several lifting steps. With $\tilde{h}(z)$ and $\tilde{g}(z)$ denoting the low-pass and high-pass analysis filters, the polyphase matrix $\tilde{P}(z)$ is formed from their even and odd polyphase components. $\tilde{P}(z)$ can be factorized into a sequence of alternating upper and lower triangular matrices multiplied by a constant diagonal matrix.
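For reference, the general form of such a factorization can be written as below for a filter pair (h, g) with polyphase matrix P(z), in the style of the result in [16]. The arrangement of the matrix entries and the symbols $s_i(z)$, $t_i(z)$, $m$ and $K$ are assumptions of this sketch rather than notation taken from the proposed architecture; the analysis matrix $\tilde{P}(z)$ has the same alternating triangular structure.

```latex
h(z) = h_e(z^2) + z^{-1}\,h_o(z^2), \qquad
g(z) = g_e(z^2) + z^{-1}\,g_o(z^2)

P(z) =
\begin{bmatrix}
h_e(z) & g_e(z) \\
h_o(z) & g_o(z)
\end{bmatrix}
=
\prod_{i=1}^{m}
\begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix}
\begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}
```

Each triangular factor corresponds to one predict or update operation and the diagonal factor to the final scaling, so a filter with two lifting steps and one scaling step corresponds to m = 2.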
The 9/7 filter has two lifting steps and one scaling step. The detailed algorithm of the 9/7 filter is described in (2) to (7). First, the input sequence $x_i$ is split into even and odd parts, $s_i^0$ and $d_i^0$. Second, two lifting steps are applied to the two split sequences. The outputs are denoted $s_i^n$ and $d_i^n$, where n is the index of the lifting step. Finally, the low-pass and high-pass wavelet coefficients $s_i$ and $d_i$ are obtained through the normalization factors $k_1$ and $k_2$.
1. Splitting Step:
$d_i^0 = x_{2i+1}$ (2)
$s_i^0 = x_{2i}$ (3)
2. Lifting Steps:
2.1. First lifting step
$d_i^1 = d_i^0 + \alpha \times (s_i^0 + s_{i+1}^0)$ (predictor) (4)
$s_i^1 = s_i^0 + \beta \times (d_{i-1}^1 + d_i^1)$ (updater) (5)
2.2. Second lifting step
$d_i^2 = d_i^1 + \gamma \times (s_i^1 + s_{i+1}^1)$ (predictor) (6)
$s_i^2 = s_i^1 + \delta \times (d_{i-1}^2 + d_i^2)$ (updater) (7)
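Steps (2) to (7) can be checked with a short floating-point sketch in Python. Several things here are assumptions and not taken from the paper: the numerical values of α, β, γ, δ and K are the commonly quoted 9/7 lifting constants from the literature, the scaling uses K for the low-pass output and 1/K for the high-pass output as one possible choice of $k_1$ and $k_2$, and the boundary samples are handled by repeating the nearest neighbour (symmetric extension). The function names are hypothetical.

```python
# Commonly quoted 9/7 lifting constants (assumed, not given in the paper).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _predict(d, s, c):
    # d_i + c * (s_i + s_{i+1}); the last s sample is repeated at the edge.
    n = len(d)
    return [d[i] + c * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _update(s, d, c):
    # s_i + c * (d_{i-1} + d_i); the first d sample is repeated at the edge.
    return [s[i] + c * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]


def dwt97_forward(x):
    if len(x) % 2:
        raise ValueError("input length must be even")
    s, d = list(x[0::2]), list(x[1::2])   # split, eqs. (2)-(3)
    d = _predict(d, s, ALPHA)             # eq. (4)
    s = _update(s, d, BETA)               # eq. (5)
    d = _predict(d, s, GAMMA)             # eq. (6)
    s = _update(s, d, DELTA)              # eq. (7)
    return [v * K for v in s], [v / K for v in d]   # scaling step


def dwt97_inverse(low, high):
    # Undo the scaling, then undo each lifting step in reverse order.
    s, d = [v / K for v in low], [v * K for v in high]
    s = _update(s, d, -DELTA)
    d = _predict(d, s, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(d, s, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x


low, high = dwt97_forward([3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0])
```

Because every lifting step only adds a function of one sequence to the other, the inverse is obtained by subtracting the same quantity, which is why `dwt97_inverse` reuses `_predict` and `_update` with negated constants.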
Several architectures [14], [15] have been proposed to directly implement the lifting structures of
the 9/7 filters. The five pipeline stages are used to improve the processing time, but the critical
path is still restricted by the computation of predictor or updater (i.e., two adders and one
multiplier propagation delay).
5. 1-D DWT ARCHITECTURE
Combining the functional units described in the previous section yields a one-dimensional DWT. The architecture can be applied to implement the lifting-based 1-D DWT. The structure processes input samples that arrive in pairs at consecutive clock pulses, and the results for each pair are ready after five cycles. Because of the pipelined structure, however, the clock frequency is higher than that of parallel architectures. There is a trade-off between the clock speed and the number of pipeline stages. Figure 3 shows the proposed architecture for the 9/7 fixed-point DWT computation.
Figure 3: Architecture for lifting scheme for fixed point 1-D DWT
In this architecture we adopt a five-stage pipeline structure for the computation of the 1-D DWT. The proposed structure is constrained by the nature of the DWT computation and is capable of optimizing the use of hardware resources.
In this five-stage pipeline structure, all stages share the computation, so all the stages need to be synchronized with one another. The pipeline registers in the proposed architecture are arranged to optimize the use of hardware resources. Every stage operates on the data produced by the previous stage. This section presents the design of the proposed five-stage pipeline architecture using the 9/7 filter coefficients, pipeline registers and delay elements. The design focuses mainly on fixed-point DWT computation.
To show the efficiency of our architecture, several architectures are chosen for comparison. The proposed architecture requires fewer clock pulses to compute the outputs than the previous architectures. This is due to the sequential states required to complete the computation of each output.
The architecture uses a fixed-point representation for arithmetic. The data inputs of the first stage of the 1-D DWT are 11 bits wide: 1 sign bit, 8 integer bits and 2 fractional bits. Ideally the pixel inputs are 8 bits, but an 11-bit signed input makes the hardware more generic. The coefficient inputs have 1 sign bit, 1 integer bit and 7 fractional bits. The bit precision grows after the multiplication and addition performed in each stage of the DWT. To reduce the propagation delays in the digital circuit, 7 fractional bits of the multiplier output are truncated before the addition is performed. In this way the data outputs of the first stage of the DWT have 19 bits (1 sign, 16 integer and 2 fractional bits). Looking at the 9/7 DWT coefficients, the output never exceeds 4 times the input, so it is safe to truncate 7 integer bits from the first-stage output before feeding it to the second-stage input. A similar approach is applied to the second stage. After the second stage the outputs have 20 bits (1 sign, 17 integer and 2 fractional bits). Saturation is applied to clip the data outputs to the range 0 to 255 (the 8-bit pixel range).
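The fixed-point handling described above (2 fractional bits for data, 7 for coefficients, truncation of 7 fractional bits after multiplication, and saturation to the pixel range) can be illustrated with integer arithmetic in Python. This is a behavioural sketch only: the helper names are hypothetical, the coefficient value used is the commonly quoted 9/7 constant α (an assumption, since the paper does not list coefficient values), and the right shift models two's-complement truncation.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def mul_truncate(data_q2, coeff_q7, drop_bits=7):
    # The product carries 2 + 7 = 9 fractional bits. Shifting right by 7
    # drops the lowest 7, leaving 2 fractional bits (rounds toward -inf,
    # as two's-complement truncation does).
    return (data_q2 * coeff_q7) >> drop_bits


def saturate(value, lo=0, hi=255):
    """Clip a result to the 8-bit pixel range."""
    return max(lo, min(hi, value))


ALPHA_Q7 = to_fixed(-1.586134342, 7)          # coefficient, 7 fractional bits
pixel_q2 = to_fixed(200, 2)                   # pixel value 200, 2 fractional bits
product_q2 = mul_truncate(pixel_q2, ALPHA_Q7)
product_real = product_q2 / 4.0               # back to a real number for comparison
```

Here the fixed-point product evaluates to -317.25, against about -317.23 in floating point, which gives a feel for the error introduced by the reduced bit precision.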
6. PERFORMANCE EVALUATION AND FPGA IMPLEMENTATION
To evaluate the performance of the architecture resulting from the proposed scheme, we use metrics that characterize the architecture in terms of the hardware resources used and the computation time. A five-stage pipelined architecture is implemented, and its simulation result is shown in Figure 4.
For simulation of this architecture, a YUV image is applied as the input. The image is given as an array indexed [0:ROWS-1][0:COLUMNS-1]. The image is split into two parts, [0:ROWS/2-1][0:COLUMNS/2-1] and [0:ROWS/2-1][0:COLUMNS/2-1]. The image is read in the Verilog code using the $fopen system task. The output image is produced on two ports: the low-pass output image [0:ROWS-1][0:COLUMNS/2-1] and the high-pass output image [0:ROWS-1][0:COLUMNS/2-1]. The simulation shows that the output is obtained five clock cycles after the input is applied.
The hardware resources used for the filtering operation are measured by the number of multipliers and the number of adders, and those used for the memory space and pipeline latches are measured by the number of registers. The hardware resource utilization is shown in Figure 5. The computation time is, in general, technology dependent. However, a metric that is independent of the technology used, yet can be used to determine the computation time, is the number of clock cycles elapsed from the instant the first sample is input to the instant the last sample is output, assuming a given clock-cycle period, for example unity, as the latency of a MAC cell.
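The clock-cycle metric can be turned into a rough time estimate with a few lines of Python. The sketch assumes a fully pipelined data path that accepts one pair of samples per clock and produces the corresponding result five cycles later, so that the total is simply the number of input cycles plus the latency; the function names and this counting convention are assumptions, and the default clock period is the 8.280 ns reported for the FPGA implementation.

```python
def total_cycles(n_samples, latency=5, samples_per_cycle=2):
    """Clock cycles from first sample in to last result out (assumed model)."""
    input_cycles = -(-n_samples // samples_per_cycle)  # ceiling division
    return input_cycles + latency


def computation_time_ns(n_samples, clock_period_ns=8.280):
    """Estimated time for one decomposition level of n_samples inputs."""
    return total_cycles(n_samples) * clock_period_ns


cycles_512 = total_cycles(512)
time_512 = computation_time_ns(512)
```

With these assumptions, a 512-sample row takes 261 cycles, or about 2.16 µs at the reported clock period.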
Figure 4: Simulation result for pipelining architecture
Figure 5: Hardware resource utilization
Figure 6: Timing constraints for DWT computation
For the DWT computation, the comparison of the metrics mentioned above for the various architectures is summarized in Table I. It is seen from the table that, compared to the architecture of [17], all the other architectures, including the proposed one, require approximately twice the number of clock cycles, except the architecture of [14], which requires four times as many clock cycles.
Table I: Comparison of various architectures
Architecture                      Tc (ns)
Parallel [13]                     17.8
Systolic [14]                     11.8
Pipelined [17]                    11.8
DRU [18]                          10.2
IP core [19]                      11.8
Pipeline with parallelism [27]    8.7
Proposed                          8.280
Table II: Resources used in the FPGA device
Resource            Used    Available    Percentage used
CLB Slices          158     4928         3%
Flip Flop Slices    230     9856         2%
4-input LUTs        133     9856         1%
Bonded IOBs         70      248          28%
This performance of [17] is achieved by using four times the adder and multiplier resources required by the architecture of [14] and twice those required by any of the other architectures. To verify the estimated results, the circuit is implemented on an FPGA. Verilog is used for the hardware description and Xilinx ISE 8.2i for the synthesis of the circuit on a Virtex-II Pro XC2VP7-7 board. The implementation is evaluated with respect to the clock period (throughput), measured as the delay of the critical path of the MAC-cell network, and the resource utilization (area), measured as the number of configurable logic block (CLB) slices, D flip-flops (DFFs), lookup tables, and input/output blocks. The resources used by the implementation are listed in Table II. The circuit is found to perform well with a clock period as short as 8.280 ns. The clock period and timing constraints are shown in Figure 6.
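As a worked conversion (not a figure reported in the synthesis results), the 8.280 ns clock period corresponds to a maximum clock frequency of roughly:

```latex
f_{\max} = \frac{1}{T_c} = \frac{1}{8.280\ \text{ns}} \approx 120.8\ \text{MHz}
```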
7. CONCLUSION
In this paper, a scheme for the design of a pipeline architecture for real-time computation of the fixed-point 1-D DWT has been presented. The objective has been to achieve a low computation time by maximizing the operating frequency and minimizing the number of clock cycles required for the DWT computation. This has been realized by developing a scheme with two lifting steps for 9/7 filtering and five pipeline stages.
A study has been undertaken which suggests that, in view of the nature of the DWT computation, it is most efficient to map the overall task of the DWT computation to only five pipeline stages. Two main ideas have been employed in the internal design of each stage to enhance pipelining for the DWT computation. The first idea is to decompose the filtering operation into two subtasks that operate independently on the even- and odd-numbered input samples, respectively. This idea stems from the fact that the DWT computation is a two-subband filtering operation at each consecutive decomposition level. Each subtask of the filtering operation is performed by a MAC-cell network with coefficients taken from the 9/7 filters. The second idea is to minimize the delay of the critical path.
To assess the effectiveness of the proposed scheme, a pipeline architecture has been designed and simulated. The simulation results show that the architecture designed with the proposed scheme requires the smallest number of clock cycles to compute output samples and achieves the shortest clock period among the architectures compared in Table I (8.280 ns against 8.7 ns to 17.8 ns, a reduction of up to about 53%) with a comparable hardware requirement. An FPGA-based implementation of the designed architecture has been carried out, demonstrating the effectiveness of the proposed scheme for designing efficient and realizable architectures for the DWT computation.
Finally, the principle of pipelining architecture using lifting scheme presented in this paper for the
design of architecture for the 1-D DWT computation is extendable to that for the 2-D DWT
computation.
REFERENCES
[1] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[2] J. Chilo and T. Lindblad, “Hardware implementation of 1D wavelet transform on an FPGA for infrasound signal classification,” IEEE Trans. Nucl. Sci., vol. 55, no. 1, pp. 9–13, Feb. 2008.
[3] S. Cheng, C. Tseng, and M. Cole, “Efficient and effective VLSI architecture for a wavelet-based broadband sonar signal detection system,” in Proc. IEEE 14th ICECS, Marrakech, Morocco, Dec. 2007, pp. 593–596.
[4] K. G. Oweiss, A. Mason, Y. Suhail, A. M. Kamboh, and K. E. Thomson, “A scalable wavelet transform VLSI architecture for real-time signal processing in high-density intra-cortical implants,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1266–1278, Jun. 2007.
[5] K. Andra, C. Chakrabarti, and T. Acharya, “A VLSI architecture for lifting-based forward and inverse wavelet transform,” IEEE Trans. Signal Process., vol. 50, no. 4, pp. 966–977, Apr. 2002.
[6] C. Huang, P. Tseng, and L. Chen, “Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform,” IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1575–1586, Apr. 2005.
[7] M. Martina and G. Masera, “Multiplierless, folded 9/7–5/3 wavelet VLSI architecture,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 770–774, Sep. 2007.
[8] A. Acharyya, K. Maharatna, B. M. Al-Hashimi, and S. R. Gunn, “Memory reduction methodology for distributed-arithmetic-based DWT/IDWT exploiting data symmetry,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 285–289, Apr. 2009.
[9] K. A. Kotteri, S. Barua, A. E. Bell, and J. E. Carletta, “A comparison of hardware implementations of the biorthogonal 9/7 DWT: Convolution versus lifting,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 5, pp. 256–260, May 2006.
[10] C. Wang and W. S. Gan, “Efficient VLSI architecture for lifting-based discrete wavelet packet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 5, pp. 422–426, May 2007.
[11] G. Shi, W. Liu, L. Zhang, and F. Li, “An efficient folded architecture for lifting-based discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 290–294, Apr. 2009.
[12] C. T. Huang, P. C. Tseng, and L. G. Chen, “Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method,” in Proc. IEEE ISCAS, vol. 5, May 2002, pp. 565–568.
[13] C. Chakrabarti, M. Vishwanath, and R. M. Owens, “Architectures for wavelet transforms: A survey,” J. VLSI Signal Process., vol. 14, no. 2, pp. 171–192, Nov. 1996.
[14] A. Grzeszczak, M. K. Mandal, and S. Panchanathan, “VLSI implementation of discrete wavelet transform,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 4, no. 4, pp. 421–433, Dec. 1996.
[15] T. Acharya and C. Chakrabarti, “A survey on lifting-based discrete wavelet transform architectures,” J. VLSI Signal Process., vol. 42, pp. 321–339, 2006.
[16] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, pp. 247–269, 1998.
[17] C. Zhang, C. Wang, and M. O. Ahmad, “A VLSI architecture for a high-speed computation of the 1D discrete wavelet transform,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1461–1464.
[18] C. Zhang, C. Wang, and M. O. Ahmad, “A pipeline VLSI architecture for high-speed computation of the 1-D discrete wavelet transform,” IEEE Trans. Circuits Syst. I, pp. 1529-8328, Feb. 2010.
[19] P. Yu, S. Yao, and J. Xu, “An efficient architecture for 2-D lifting-based discrete wavelet transform,” School of Electronic and Information Engineering, Tianjin University, Tianjin, China, IEEE, 2009, 978-1-4244-2800-7/09.
[20] S. A. Salehi and R. Amirfattahi, “VLSI architectures of lifting-based discrete wavelet transform,” Digital Signal Processing Research Lab., Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.
[21] B.-F. Wu and C.-F. Lin, “A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 12, Dec. 2005.
[22] B.-F. Wu and C.-F. Lin, “A rescheduling and fast pipeline VLSI architecture for lifting-based discrete wavelet transform,” Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050, R.O.C.
[23] C.-J. Lian, K.-F. Chen, H.-H. Chen, and L.-G. Chen, “Lifting based discrete wavelet transform architecture for JPEG2000,” DSP/IC Design Lab., Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C.
[24] F. Marino, D. Guevorkian, and J. Astola, “Highly efficient high-speed/low-power architectures for 1-D discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 47, no. 12, pp. 1492–1502, Dec. 2000.
[25] T. Park, “Efficient VLSI architecture for one-dimensional discrete wavelet transform using a scalable data recorder unit,” in Proc. ITC-CSCC, Phuket, Thailand, Jul. 2002, pp. 353–356.
[26] S. Masud and J. V. McCanny, “Reusable silicon IP cores for discrete wavelet transform applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 6, pp. 1114–1124, Jun. 2004.
[27] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, “Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform,” IEEE Trans. Signal Process., vol. 52, pp. 1080–1089, 2004.
[28] K. K. Parhi and T. Nishitani, “VLSI architectures for discrete wavelet transforms,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 6, pp. 191–202, Jun. 1993.
[29] M. Vishwanath, R. M. Owens, and M. J. Irwin, “VLSI architectures for the discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 5, pp. 305–316, May 1995.
[30] C. Cheng and K. K. Parhi, “High-speed VLSI implementation of 2-D discrete wavelet transform,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 393–403, Jan. 2008.
[31] A. Al Muhit, M. S. Islam, and M. Othman, “VLSI implementation of discrete wavelet transform (DWT) for image compression,” Faculty of Engineering, Multimedia University (MMU), Jalan Multimedia, Cyberjaya, Selangor 63100, Malaysia.
Authors
K. Durga Sowjanya was born in Koyyalagudem (Andhra Pradesh). She received the B.Tech degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University, Kakinada. She is currently pursuing the M.Tech degree at Sri Vasavi Engineering College, J.N.T. University, Kakinada, India. Her research interests include image and signal processing algorithms and VLSI architecture development. Her Master's thesis is on VLSI architecture for computation of the 1-D DWT.
Mr. K. N. H. Srinivas received the B.Tech degree in electronics and communication engineering from S.V.H. College of Engineering, Machilipatnam, Nagarjuna University, and the M.Tech degree (Electronic Instrumentation) from the National Institute of Technology, Warangal, India. He is currently Head of the Department of Electronics and Communication Engineering, Sri Vasavi Engineering College. He is a fellow of IETE and ISTE and has guided eight postgraduate students so far. He has published eight research papers in reputed international journals and international conferences.
Venkata Ganapathi Puppala received the B.Tech degree in electronics and communication engineering from JNTU University, Hyderabad, India, in 2005, and the M.Tech degree in electronics instrumentation engineering from the National Institute of Technology, Warangal, India, in 2007. In 2007, he joined iChip Technologies, Hyderabad, where he worked as an ASIC design engineer on the development of application-specific instruction set processors for video encoders and decoders for the H.264/MPEG-4 AVC and VC-1 standards. He later joined Quartics Technologies, Pune, where he currently works as an ASIC design engineer on ASIP architectures for computer vision, video post-processing and 2D-to-3D video conversion algorithms.
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT USING LIFTING SCHEME

  • 1. International Journal of VLSI design & Communication Systems (VLSICS) Vol.3, No.4, August 2012 DOI : 10.5121/vlsic.2012.3404 37 FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT USING LIFTING SCHEME Durga Sowjanya1 , K N H Srinivas2 and P Venkata Ganapathi3 1 Research fellow, Sri Vasavi Engineering College, Tadepalligudem k.durgasowjanya@gmail.com 2 Head of the department in Sri Vasavi Engineering College, Tadepalligudem Knh.tridents@gmail.com 3 Venkata Ganapathi Puppala, Quartics Technologies Pvt Ltd, Pune ganapathi.pv@gmail.com ABSTRACT In this paper, a scheme for the design of area efficient and high speed pipeline VLSI architecture for the computation of fixed point 1-d discrete wavelet transform using lifting scheme is proposed. The main focus of the scheme is to reduce the number and period of clock cycles and efficient area with little or no overhead on hardware resources. The fixed point representation requires less hardware resources compared with floating point representation. The pipelining architecture speeds up the clock rate of DWT and reduced bit precision reduces the area required for implementation. The architecture has been coded in verilog HDL on Xilinx platform and the target FPGA device used is Virtex-II Pro family, XC2VP7- 7board. The proposed scheme requires the least computing time for fixed point 1-D DWT and achieves the less area for implementation, compared with other architectures. So this architecture is realizable for real time processing of DWT computation applications. KEYWORDS Discrete wavelet transform (DWT), Lifting based scheme, field-programmable gate-array (FPGA), pipeline architecture, reduced bit precision, fixed point, VLSI architecture. 1. INTRODUCTION The advantages of the wavelet transform over conventional transforms, such as the Fourier transform, are now well recognized. 
Because of its excellent locality in time-frequency domain, wavelet transform is remarkable and extensively used for signal analysis, compressing and de- noising. Since the development of the theory for the computation of the discrete wavelet transform (DWT) by Mallat [1] in1989, the DWT has been increasingly used in many different areas of science and engineering mainly because of the multi resolution decomposition property of the transformed signals. Definition given for DWT by Mallat [1] provided possibility of its implementation in hardware and software. The discrete wavelet transform (DWT) performs a multi resolution signal analysis which has adjustable locality in both the space (time) and frequency domains [1].The DWT is computationally intensive because of multiple levels of decomposition involved in the computation of the DWT. It is therefore challenging to design an efficient VLSI architecture to implement the DWT computation for real-time applications, particularly those requiring processing of high-frequency or broadband signals [2]–[4]. Using finite impulse response (FIR) filters and then sub sampling is the classical method for
  • 2. International Journal of VLSI design & Communication Systems (VLSICS) Vol.3, No.4, August 2012 38 implementing the DWT. Due to the large amount of computations required, there have been many research efforts to develop new algorithms [15] .Many architectures have been proposed in order to provide high-speed and area-efficient implementations for the DWT computation [5]–[8]. In [9]–[11], the poly phase matrix of a wavelet filter is decomposed into a sequence of alternating upper and lower triangular matrices and a diagonal matrix to obtain the so-called lifting-based architectures with low hardware complexity. The pipeline architectures have the advantages of requiring a small memory space and a short computing time and are suitable for real-time computations. However, these architectures have some inherent characteristics that have not yet been fully exploited in the schemes for their design. The computational performance of such architectures could be further improved, provided that the design with pipeline make sure of lifting steps to the maximum extent possible, synchronizes the operations of the stages optimally, and utilizes the available hardware resources optimally. In this paper, a scheme for the design of pipeline architecture for a fast computation of the DWT is developed. The goal of fast computation is achieved by minimizing the number and period of clock cycles. The main idea used for minimizing these two parameters is to optimally distribute the task of the DWT computation among the stages of the pipeline and lifting scheme of 9/7 filter. In the study, we focus on the issues of theoretical path and internal memory size with 9/7 filters. To ease the tradeoff between the pipeline stages of 1-D architecture, a modified algorithm is proposed for the design of 1-D pipeline architecture. 
Based on the modified data path of lifting- based DWT, the proposed architecture achieves the one-multiplier delay constraint but uses less internal memory compared to the related architectures. Moreover, the proposed architecture implements the 9/7 filters by cascading the three main components. Due to recent advances in the technology, implementation of the DWT on field programmable gate array (FPGA) and digital signal processing (DSP) chips has been widely developed. Based on [4], the main challenges in the hardware architectures for 1-D DWT are the processing speed and the number of multipliers. The number of multipliers in each pipeline stage determines the clock speed of the structure. This paper is organized as follows. In Section II, Discrete Wavelet Transform is presented. In Section III, choice of pipeline for the 1-d dwt is presented, Section IV, the Lifting based scheme is presented, Section V, briefly introduces the underlying concepts of the architecture of 1-d DWT. Section VI, Presents Performance evaluation and FPGA implementation and compares the proposed architecture with other related studies. Finally, a brief conclusion is given in Section VII. 2. DISCRETE WAVELET TRANSFORM In this section the theoretical background and algorithm development is discussed. The first recorded mention of what is now called a "wavelet" seems to be in 1909, in a thesis by Alfred Haar. An image is represented as a two dimensional (2D) array of coefficients, each coefficient representing the brightness level in that point. When looking from a higher perspective, it is not possible to differentiate between coefficients as more important ones, a lesser important ones. But thinking more intuitively, it is possible. Most natural images have smooth color variations, with the fine details being represented as sharp edges in between the smooth variations. 
Technically, the smooth variations in color can be termed low frequency components and the sharp variations high frequency components. The low frequency components (smooth variations) constitute the base of an image, and the high frequency components (the edges which give the detail) add upon them to refine the image, thereby giving a detailed image. Hence, the averages (smooth variations) demand more importance than the details [4].

In wavelet analysis, a signal can be separated into approximation and detail coefficients. The approximations (averages) are the high-scale, low frequency components of the signal. The details are the low-scale, high frequency components. These coefficients measure the signal energy distribution in each frequency channel corresponding to the scaling parameter j at the time k. The discrete-time wavelet transform (DWT) has found many applications in digital signal processing, due to its efficient computation and its suitable properties for non-stationary signal analysis. For the wavelet analysis, the structure is given in figure 1(a). As a result, the DWT decomposes a digital signal into different sub bands so that the lower frequency sub bands have finer frequency resolution and coarser time resolution compared to the higher frequency sub bands. The DWT is being increasingly used for image compression because it supports features like progressive image transmission (by quality, by resolution), ease of compressed image manipulation, region of interest coding, etc.

2.1. One dimensional DWT:

Any signal is first applied to a pair of low-pass and high-pass filters. Then down sampling (i.e., neglecting the alternate coefficients) is applied to these filtered coefficients.
The filter pair (h, g) used for decomposition is called the analysis filter bank, and the filter pair (g`, h`) used for reconstruction of the signal is called the synthesis filter bank. The output of the low pass filter after down sampling contains the low frequency components of the signal, which form the approximate part of the original signal, and the output of the high pass filter after down sampling contains the high frequency components, which are called the details (i.e., highly textured parts like edges) of the original signal.

The output from the low pass filter G(z) represents the approximate coefficient, denoted by S_j(n):

$S_j(n) = \sum_{k} S_{j-1}(k)\, G(2n-k)$

The output from the high pass filter H(z) represents the detailed coefficient, denoted by W_j(n):

$W_j(n) = \sum_{k} S_{j-1}(k)\, H(2n-k)$

Here S_{j-1} is the approximation at the previous level (the input signal itself for the first level).
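As an illustration of the filter-and-down-sample step, the following minimal Python sketch evaluates one decomposition level, assuming the usual form in which each output is a sum over k of the previous-level samples weighted by the taps G(2n-k) or H(2n-k). The function name `analysis_level`, the zero padding outside the signal, and the two-tap averaging and differencing filters in the usage lines are assumptions chosen only to keep the example short; they are not the 9/7 filters used in this paper.

```python
def analysis_level(signal, lowpass, highpass):
    """One DWT analysis level: filter, then keep every second output.

    Hypothetical helper. Samples outside the signal are treated as zero,
    which is an assumption made only for this sketch.
    """
    half = len(signal) // 2

    def decimated_filter(taps):
        out = []
        for n in range(half):
            acc = 0.0
            for k, sample in enumerate(signal):
                tap_index = 2 * n - k  # index into the filter taps
                if 0 <= tap_index < len(taps):
                    acc += sample * taps[tap_index]
            out.append(acc)
        return out

    return decimated_filter(lowpass), decimated_filter(highpass)


# Usage with illustrative two-tap filters (not the 9/7 filters).
approx, detail = analysis_level([2.0, 4.0, 6.0, 8.0], [0.5, 0.5], [0.5, -0.5])
```

Each output list has half as many entries as the input, which is the property the pipeline discussion in the next section relies on.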
3. PIPELINE FOR THE 1-D DWT COMPUTATION

In a pipeline structure for the DWT computation, multiple stages are used to carry out the computations of the various decomposition levels of the transform. Thus, the computation corresponding to each decomposition level needs to be mapped to a stage or stages of the pipeline. In order to maximize the hardware utilization of a pipeline, the hardware resource of a stage should be proportional to the amount of the computation assigned to the stage. Since the amount of computation in successive decomposition levels of the transform is reduced by a factor of two, two scenarios can be used for the distribution of the computations to the stages of a pipeline. In the first scenario, the decomposition levels are assigned to the stages so as to equalize the computations carried out by each stage, i.e., the hardware requirements of all the stages are kept the same. In the second scenario, the computations of the successive decomposition levels are assigned to the successive stages of a pipeline on a one-level-to-one-stage basis. Thus, in this case, the hardware requirement of the stages is reduced by a factor of two as they perform the computations corresponding to higher level decompositions.

A stage-equalized pipeline structure is one in which the computations of all the levels are distributed equally among the stages. The process of stage equalization can be accomplished by dividing the task of a given level of decomposition equally into smaller subtasks and assigning each such subtask to a single stage, and/or by combining the tasks of more than one consecutive level of decomposition into a single task and assigning it to a single stage. Note that, generally, a division of the task would be required for low levels of decomposition and a combination of the tasks for high levels of decomposition.
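The two distribution scenarios can be compared numerically with a small sketch. It assumes that the workload of a level is proportional to the number of input samples that level receives, so that it halves from one level to the next; the function names and the 1024-sample, five-level example are hypothetical.

```python
def level_workloads(num_samples, num_levels):
    """Input samples handled by each decomposition level, level 1 first."""
    return [num_samples // (2 ** level) for level in range(num_levels)]


def one_to_one_loads(workloads):
    """One level per stage: stage loads shrink by a factor of two."""
    return list(workloads)


def stage_equalized_loads(workloads, num_stages):
    """Total work spread evenly over the stages (idealized, ignores
    the granularity of real subtasks)."""
    total = sum(workloads)
    return [total / num_stages] * num_stages


workloads = level_workloads(1024, 5)
print(one_to_one_loads(workloads))
print(stage_equalized_loads(workloads, 5))
```

In the one-to-one case the first stage carries more work than all later stages combined, which is why the stage hardware must shrink accordingly to keep utilization high.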
In a one-to-one mapped structure, the computations of the decomposition levels are distributed exactly among all stages. In this structure, the computations of the first level are carried out by the first stage, those of the following levels by the following stages respectively, and those of the last levels are performed recursively by the last stage. Thus, for either pipeline structure, i.e., one-to-one mapped or stage-equalized, a five-stage pipeline would be the best choice in terms of hardware efficiency and from the standpoint of design and implementation simplicity. Note that the five-stage version of either pipeline structure is the same, but because of the flexibility it offers in designing the architecture, the stage-equalized pipeline structure is preferred.

4. LIFTING BASED SCHEME

The lifting scheme has been developed as a flexible tool suitable for constructing second generation wavelets. It is composed of three basic operation stages: splitting, predicting, and updating. Fig. 2 shows the lifting scheme of the wavelet filter computing a one dimensional signal:
• Split step: the signal is split into even and odd points, so that the strong correlation between adjacent pixels can be exploited in the following predict step.
• Predict step: the even samples are multiplied by the predict factor and the results are added to the odd samples to generate the detailed coefficients.
• Update step: the detailed coefficients computed by the predict step are multiplied by the update factors and the results are added to the even samples to get the coarse coefficients.

Note that the detail and approximation coefficients (d, s) in the lifting scheme are the same as the high pass and low pass outputs, respectively. Daubechies and Sweldens first derived the lifting-based discrete wavelet transform [11], [12].
The lifting scheme can decompose the DWT filter bank into several lifting steps. With h~(z) and g~(z) denoting the low-pass and high-pass analysis filters, the poly phase matrix p~(z) is defined from their even and odd poly phase components (1). The poly phase matrix p~(z) can be factorized into a sequence of alternating upper and lower triangular matrices multiplied by a constant diagonal matrix. The 9/7 filter has two lifting steps and one scaling step. The detailed algorithm of the 9/7 filter is described in (2) to (7). First, the input sequence $x_i$ is split into even and odd parts, $s_i^0$ and $d_i^0$. Second, the two split sequences are processed by two lifting steps. The outputs are denoted as $s_i^n$ and $d_i^n$, where n represents the stage of the lifting step. Finally, through the normalization factors k1 and k2, the low-pass and high-pass wavelet coefficients $s_i$ and $d_i$ can be obtained.

1. Splitting step:

$d_i^0 = x_{2i+1}$ (2)
$s_i^0 = x_{2i}$ (3)

2. Lifting steps:

2.1. First lifting step

$d_i^1 = d_i^0 + \alpha \times (s_i^0 + s_{i+1}^0)$ (predictor) (4)
$s_i^1 = s_i^0 + \beta \times (d_{i-1}^1 + d_i^1)$ (updater) (5)

2.2. Second lifting step

$d_i^2 = d_i^1 + \gamma \times (s_i^1 + s_{i+1}^1)$ (predictor) (6)
$s_i^2 = s_i^1 + \delta \times (d_{i-1}^2 + d_i^2)$ (updater) (7)

Several architectures [14], [15] have been proposed to directly implement the lifting structures of the 9/7 filters. The five pipeline stages are used to improve the processing time, but the critical path is still restricted by the computation of the predictor or updater (i.e., two adders and one multiplier propagation delay).
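The splitting and lifting steps (2) to (7) can be sketched in floating-point Python as a reference model. Several points are assumptions not stated in the paper: the numerical values of the lifting constants are the commonly quoted 9/7 values, out-of-range neighbours at the signal ends are replaced by the nearest valid sample (a symmetric-extension choice), the final scaling by k1 and k2 is omitted, and all function names are hypothetical. An inverse is included only to check that the steps undo each other.

```python
# Commonly quoted 9/7 lifting constants (assumed; not listed in the paper).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522


def _predict(d, s, coeff):
    """d_i + coeff * (s_i + s_{i+1}); the last s is reused at the right edge."""
    n = len(d)
    return [d[i] + coeff * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _update(s, d, coeff):
    """s_i + coeff * (d_{i-1} + d_i); the first d is reused at the left edge."""
    n = len(s)
    return [s[i] + coeff * (d[max(i - 1, 0)] + d[i]) for i in range(n)]


def lifting_97_forward(x):
    """Apply steps (2)-(7) to an even-length sequence; returns (s2, d2)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length input")
    s0, d0 = list(x[0::2]), list(x[1::2])   # (3), (2)
    d1 = _predict(d0, s0, ALPHA)            # (4)
    s1 = _update(s0, d1, BETA)              # (5)
    d2 = _predict(d1, s1, GAMMA)            # (6)
    s2 = _update(s1, d2, DELTA)             # (7)
    return s2, d2


def lifting_97_inverse(s2, d2):
    """Undo the lifting steps in reverse order with negated constants."""
    s1 = _update(s2, d2, -DELTA)
    d1 = _predict(d2, s1, -GAMMA)
    s0 = _update(s1, d1, -BETA)
    d0 = _predict(d1, s0, -ALPHA)
    x = [0.0] * (2 * len(s0))
    x[0::2], x[1::2] = s0, d0
    return x


low, high = lifting_97_forward([3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0])
restored = lifting_97_inverse(low, high)
```

Because every step only adds a function of the other branch, each one is undone by subtracting the same quantity, which is what makes the structure attractive for hardware: the predictor and updater share one adder-multiplier-adder pattern.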
5. 1-D DWT ARCHITECTURE

By combining the functional units described in the previous section, we can construct the one dimensional DWT. The architecture can be applied to implement the lifting-based 1-D DWT. The structure processes all input samples, which arrive in pairs at consecutive clock pulses, and the results for each pair are ready after five cycles. However, due to the pipelined structure, the clock frequency is higher than that of parallel architectures. There is a trade-off between the clock speed and the number of pipeline stages. Figure 3 shows the proposed architecture for the 9/7 fixed point DWT computation.

Figure 3: Architecture for lifting scheme for fixed point 1-D DWT

In this architecture we advocate a five stage pipeline structure for the computation of the 1-D DWT. The proposed structure is constrained by the nature of the DWT computation and is capable of optimizing the use of hardware resources. In this five stage pipeline structure, all stages need to share the computation. Hence, all the stages need to be synchronized with one another. The pipeline registers in the proposed architecture are arranged to optimize the use of hardware resources. Every stage operates on the data produced by the previous stage. In this section we present the design of the proposed five stage pipeline architecture using 9/7 filter coefficients, pipeline registers and delay elements. The design of this architecture is mainly focused on fixed point DWT computation. In order to show the efficiency of our architecture, several architectures are chosen for comparison. In the proposed architecture, fewer clock pulses are required to compute the outputs than in the previous architectures, owing to the number of sequential states required to complete the computation of each output. The architecture uses fixed point representation for arithmetic.
The bit width of the data inputs of the first stage of the 1-D DWT is 11 bits: 1 sign bit, 8 integral bits and 2 fractional bits. Ideally the pixel inputs are 8 bits, but an 11-bit signed input makes the hardware more generic. The coefficient inputs have 1 sign bit, 1 integral bit and 7 fractional bits. The bit precision grows after the multiplication and addition performed in each stage of the DWT. To reduce the propagation delays in the digital circuit, the fractional part of the multiplier output is truncated by 7 bits before the addition is performed. In this way the data outputs of the first stage of the DWT have 19 bits (1 sign, 16 integral and 2 fractional bits). Looking at the 9/7 DWT coefficients, we can say that the output never exceeds 4 times the input, so it is safe to truncate 7 integral bits from the first stage output before feeding it to the second stage input. A similar approach is applied to the second stage. After the second stage the outputs have 20 bits (1 sign, 17 integral and 2 fractional bits). Saturation is applied to clip the data outputs between 0 and 255 (the 8-bit pixel range).
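The multiply-then-truncate step described above can be modelled with integer arithmetic. This is only a behavioural sketch under stated assumptions: data values carry 2 fractional bits, coefficients carry 7 fractional bits, the 7 extra fractional bits of the product are dropped with an arithmetic right shift (which rounds toward negative infinity), and the coefficient value used in the example is the commonly quoted first 9/7 lifting constant. The helper names are hypothetical and no saturation or bit-width checking is modelled.

```python
DATA_FRAC_BITS = 2    # fractional bits of the data path (from the text)
COEFF_FRAC_BITS = 7   # fractional bits of the filter coefficients


def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return int(round(value * (1 << frac_bits)))


def multiply_and_truncate(data_fixed, coeff_fixed):
    """Multiply, then drop the 7 fractional bits added by the coefficient.

    The product has 2 + 7 fractional bits; after the shift it is back in
    the 2-fractional-bit data format.
    """
    return (data_fixed * coeff_fixed) >> COEFF_FRAC_BITS


def to_real(data_fixed):
    """Convert a data-path integer back to a real value for inspection."""
    return data_fixed / (1 << DATA_FRAC_BITS)


alpha_fixed = to_fixed(-1.586134342, COEFF_FRAC_BITS)  # assumed constant
sample_fixed = to_fixed(100.25, DATA_FRAC_BITS)
product_fixed = multiply_and_truncate(sample_fixed, alpha_fixed)
print(to_real(product_fixed), 100.25 * -1.586134342)
```

Comparing the two printed values gives a feel for the error introduced by the 7-bit coefficient quantization and the truncation before the addition.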
6. PERFORMANCE EVALUATION AND FPGA IMPLEMENTATION

In order to evaluate the performance of the architecture resulting from the proposed scheme, we need to make use of certain metrics that characterize the architecture in terms of the hardware resources used and the computation time. A five stage pipelined architecture is implemented and its simulation result is shown in figure 4. For the simulation of this architecture, a YUV image is applied as the input. The image is given in the form of an array represented by [0:ROWS-1][0:COLUMNS-1]. The image is split into two parts, [0:ROWS/2-1][0:COLUMNS/2-1] and [0:ROWS/2-1][0:COLUMNS/2-1]. The image can be read in the verilog code by using the fopen command. The output image is obtained on two ports: the output low pass image [0:ROWS-1][0:COLUMNS/2-1] and the output high pass image [0:ROWS-1][0:COL/2-1]. In the simulation it is observed that, when the input is applied, the output is obtained after five clock cycles.

The hardware resources used for the filtering operation are measured by the number of multipliers and the number of adders, and those used for the memory space and pipeline latches are measured by the number of registers. The hardware resource utilization is shown in figure 5. The computation time, in general, is technology dependent. However, a metric that is independent of the technology used, but can be utilized to determine the computation time, is the number of clock cycles consumed from the instant the first sample is input to the instant the last sample is output, assuming a given clock-cycle period, for example, unity, as the latency of a MAC cell.

Figure 4: Simulation result for pipelining architecture
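The clock-cycle metric can be turned into a rough computation-time estimate. The sketch below assumes that one pair of samples enters the pipeline per clock cycle and that the last result appears five cycles after the last pair enters, so the cycle count is the number of pairs plus the pipeline latency; this counting convention, the function names, and the 512-sample example are assumptions. The default clock period is the 8.280 ns reported for the FPGA implementation in this paper.

```python
def total_clock_cycles(num_samples, pipeline_latency=5):
    """Cycles from the first input pair to the last output (assumed model)."""
    pairs = (num_samples + 1) // 2  # samples arrive two per clock
    return pairs + pipeline_latency


def computation_time_ns(num_samples, clock_period_ns=8.280, pipeline_latency=5):
    """Cycle count multiplied by the clock period, in nanoseconds."""
    return total_clock_cycles(num_samples, pipeline_latency) * clock_period_ns


cycles = total_clock_cycles(512)
time_ns = computation_time_ns(512)
print(cycles, time_ns)
```

For long input sequences the latency term becomes negligible, so the computation time is governed almost entirely by the clock period, which is the quantity compared across architectures.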
Figure 5: Hardware resources utilization

Figure 6: Timing constraints for DWT computation

For the DWT computation, the comparison of the metrics mentioned before for various architectures is summarized in Table I. It is seen from the table that, compared to the architecture of [17], all the other architectures, including the proposed one, require approximately twice the number of clock cycles, except the architecture of [14], which requires four times as many clock cycles. This performance of [17] is achieved by utilizing hardware resources of adders and multipliers that are four times those required by the architecture of [14] and twice those required by any of the other architectures.

Table I: Comparison of various architectures

Architecture                      Tc (ns)
Parallel (13)                     17.8
Systolic (14)                     11.8
Pipelined (17)                    11.8
DRU (18)                          10.2
IP core (19)                      11.8
Pipeline with parallelism (27)    8.7
Proposed                          8.280

In order to verify the estimated results, an implementation of the circuit is carried out in FPGA. Verilog is used for the hardware description and Xilinx ISE 8.2i for the synthesis of the circuit on a Virtex-II Pro XC2VP7-7 board. The implementation is evaluated with respect to the clock period (throughput), measured as the delay of the critical path of the MAC-cell network, and the resource utilization (area), measured as the number of configuration logic block slices, DFFs, lookup tables, and input/output blocks. The resources used by the implementation are listed in Table II. The circuit is found to perform well with a clock period as short as 8.280 ns. The clock period and timing constraints are shown in figure 6.

Table II: Resources used in the FPGA device

Resource            Used    Available in total    Percentage used
CLB Slices          158     4928                  3%
Flip Flop Slices    230     9856                  2%
4-input LUTs        133     9856                  1%
Bonded IOBs         70      248                   28%

7. CONCLUSION

In this paper, a scheme for the design of a pipeline architecture for real-time computation of the fixed point 1-D DWT has been presented. The objective has been to achieve a low computation time by maximizing the operational frequency and minimizing the number of clock cycles required for the DWT computation, which, in turn, have been realized by developing a scheme with two lifting steps, 9/7 filtering and five pipeline stages for the pipeline architecture. A study has been undertaken, which suggests that, in view of the nature of the DWT computation, it is most efficient to map the overall task of the DWT computation to only five pipeline stages.
Two main ideas have been employed for the internal design of each stage in order to enhance pipelining for the DWT computation. The first idea is to decompose the filtering operation into two subtasks that operate independently on the even- and odd-numbered input samples, respectively. This idea stems from the fact that the DWT computation is a two-sub band filtering operation and that, for each consecutive decomposition level, the filtered outputs are down sampled by a factor of two. Each subtask of the filtering operation is performed by a MAC-cell network with coefficients taken from the 9/7 filters. The second idea employed for enhancing the pipeline is to minimize the delay of the critical path. In order to assess the effectiveness of the proposed scheme, a pipeline architecture has been designed and simulated. The simulation results have shown that the architecture designed based on the proposed scheme requires the smallest number of clock cycles to compute output samples and achieves a clock period shorter than that of every other architecture listed in Table I (8.280 ns against 8.7 ns to 17.8 ns) with a comparable hardware requirement. An FPGA based implementation of the designed architecture has been carried out, demonstrating the effectiveness of the proposed scheme for designing efficient and realizable architectures for the DWT computation.
Finally, the principle of the pipelining architecture using the lifting scheme presented in this paper for the design of an architecture for the 1-D DWT computation is extendable to the 2-D DWT computation.

REFERENCES

[1] S. Mallat, “A theory for multi resolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[2] J. Chilo and T. Lindblad, “Hardware implementation of 1D wavelet transform on an FPGA for infra sound signal classification,” IEEE Trans. Nucl. Sci., vol. 55, no. 1, pp. 9–13, Feb. 2008.
[3] S. Cheng, C. Tseng, and M. Cole, “Efficient and effective VLSI architecture for a wavelet-based broadband sonar signal detection system,” in Proc. IEEE 14th ICECS, Marrakech, Morocco, Dec. 2007, pp. 593–596.
[4] K. G. Oweiss, A. Mason, Y. Suhail, A. M. Kamboh, and K. E. Thomson, “A scalable wavelet transform VLSI architecture for real-time signal processing in high-density intra-cortical implants,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1266–1278, Jun. 2007.
[5] K. Andra, C. Chakrabarti, and T. Acharya, “A VLSI architecture for lifting-based forward and inverse wavelet transform,” IEEE Trans. Signal Process., vol. 50, no. 4, pp. 966–977, Apr. 2002.
[6] C. Huang, P. Tseng, and L. Chen, “Analysis and VLSI architecture for 1-D and 2-D discrete wavelet transform,” IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1575–1586, Apr. 2005.
[7] M. Martina and G. Masera, “Multiplierless, folded 9/7–5/3 wavelet VLSI architecture,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 770–774, Sep. 2007.
[8] A. Acharyya, K. Maharatna, B. M. Al-Hashimi, and S. R. Gunn, “Memory reduction methodology for distributed-arithmetic-based DWT/IDWT exploiting data symmetry,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 285–289, Apr. 2009.
[9] K. A. Kotteri, S. Barua, A. E. Bell, and J. E. Carletta, “A comparison of hardware implementations of the biorthogonal 9/7 DWT: Convolution versus lifting,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 5, pp. 256–260, May 2005.
[10] C. Wang and W. S. Gan, “Efficient VLSI architecture for lifting-based discrete wavelet packet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 5, pp. 422–426, May 2007.
[11] G. Shi, W. Liu, L. Zhang, and F. Li, “An efficient folded architecture for lifting-based discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 290–294, Apr. 2009.
[12] C. T. Huang, P. C. Tseng, and L. G. Chen, “Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method,” in Proc. IEEE ISCAS, vol. 5, May 2002, pp. 565–568.
[13] C. Chakrabarti, M. Vishwanath, and R. M. Owens, “Architectures for wavelet transforms: A survey,” J. VLSI Signal Process., vol. 14, no. 2, pp. 171–192, Nov. 1996.
[14] A. Grzesczak, M. K. Mandal, and S. Panchanathan, “VLSI implementation of discrete wavelet transform,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 4, no. 4, pp. 421–433, Dec. 1996.
[15] T. Acharya and C. Chakrabarti, “A survey on lifting-based discrete wavelet transform architectures,” J. VLSI Signal Process., vol. 42, pp. 321–339, 2006.
[16] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, pp. 247–269, 1998.
[17] C. Zhang, C. Wang, and M. O. Ahmad, “A VLSI architecture for a high-speed computation of the 1D discrete wavelet transform,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1461–1464.
[18] C. Zhang, C. Wang, and M. O. Ahmad, “A pipeline VLSI architecture for high-speed computation of the 1-D discrete wavelet transform,” IEEE Trans. Circuits Syst. I, pp. 1529-8328, Feb. 2010.
[19] Pingping Yu, Suying Yao, and Jiangtao Xu, “An efficient architecture for 2-D lifting-based discrete wavelet transform,” School of Electronic and Information Engineering, Tianjin University, Tianjin, China, 978-1-4244-2800-7/09, IEEE, 2009.
[20] Sayed Ahmad Salehi and Rasoul Amirfattahi, “VLSI architectures of lifting-based discrete wavelet transform,” Isfahan University of Technology, Department of Electrical and Computer Engineering, Digital Signal Processing Research Lab., Isfahan, Iran.
[21] Bing-Fei Wu and Chung-Fu Lin, “A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, Dec. 2005.
[22] Bing-Fei Wu and Chung-Fu Lin, “A rescheduling and fast pipeline VLSI architecture for lifting-based discrete wavelet transform,” Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan, 30050, R.O.C.
[23] Chung-Jr Lian, Kuan-Fu Chen, Hong-Hui Chen, and Liang-Gee Chen, “Lifting based discrete wavelet transform architecture for JPEG2000,” DSP/IC Design Lab., Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C.
[24] F. Marino, D. Guevorkian, and J. Astola, “Highly efficient high-speed/low-power architectures for 1-D discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 47, no. 12, pp. 1492–1502, Dec. 2000.
[25] T. Park, “Efficient VLSI architecture for one-dimensional discrete wavelet transform using a scalable data recorder unit,” in Proc. ITC-CSCC, Phuket, Thailand, Jul. 2002, pp. 353–356.
[26] S. Masud and J. V. McCanny, “Reusable silicon IP cores for discrete wavelet transform applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 6, pp. 1114–1124, Jun. 2004.
[27] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, “Flipping structure: An efficient VLSI architecture for lifting based discrete wavelet transform,” IEEE Trans. Signal Process., vol. 52, pp. 1080–1089, 2004.
[28] K. K. Parhi and T. Nishitani, “VLSI architectures for discrete wavelet transforms,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 6, pp. 191–202, Jun. 1993.
[29] M. Vishwanath, R. M. Owens, and M. J. Irwin, “VLSI architectures for the discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 42, no. 5, pp. 305–316, May 1995.
[30] C. Cheng and K. K. Parhi, “High-speed VLSI implementation of 2-D discrete wavelet transform,” IEEE Trans. Signal Process., vol. 56, no. 1, pp. 393–403, Jan. 2008.
[31] Abdullah Al Muhit, Md. Shabiul Islam, and Masuri Othman, “VLSI implementation of discrete wavelet transform (DWT) for image compression,” Faculty of Engineering, Multimedia University (MMU), Jalan Multimedia, Cyberjaya, Selangor 63100, Malaysia.

Authors

K. Durga Sowjanya was born in Koyyalagudem (Andhra Pradesh). She received the B.Tech degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University, Kakinada. She is currently pursuing the M.Tech degree at Sri Vasavi Engineering College, J.N.T. University, Kakinada, India. Her research interests include image and signal processing algorithms and VLSI architecture development. Her Master's thesis is on a VLSI architecture for the computation of the 1-D DWT.

Mr. K. N. H. Srinivas received the B.Tech degree in electronics and communication engineering from S.V.H. College of Engineering, Machilipatnam, Nagarjuna University, and completed the M.Tech (Electronic Instrumentation) at the National Institute of Technology, Warangal, India. He is currently working as Head of the Department of Electronics and Communication Engineering, Sri Vasavi Engineering College. He is a fellow of IETE and ISTE and has guided eight post graduate students so far. He has published eight research papers in reputed international journals and international conferences.

Venkata Ganapathi Puppala received the B.Tech degree in electronics and communication engineering from JNTU University, Hyderabad, India, in 2005, and the M.Tech degree in electronics instrumentation engineering from the National Institute of Technology, Warangal, India, in 2007. In 2007, he joined iChip Technologies, Hyderabad, where he worked as an ASIC design engineer and was involved in the development of Application Specific Instruction Set Processors for video encoders and decoders for the H.264/MPEG-4 AVC and VC1 standards. Later he joined Quartics Technologies, Pune, where he is currently working as an ASIC design engineer and is involved in ASIP architectures for computer vision, video post processing and 2D video to 3D video conversion algorithms.