A Journal Established in early 2000 as National journal and upgraded to International journal in

# Design of floating-point multiplier using modified Wallace \& Dadda algorithms 

Appaji.H.Birasal ${ }^{1}$, Chetana $\mathbf{H}^{2}$<br>${ }^{1,2,}$ Assistant professor, Department of Electronics and communication, Rural Engineering College, Hulkoti, Karnataka, India appaji1906@gmail.com

### 1.1 Introduction

In computing, floating point describes a method of representing an approximation of a real number in a way that can support a wide range of values. Low power consumption and smaller area are some of the most important criteria for the fabrication of DSP systems and high performance systems. Optimizing the speed and area of the multiplier is a major design issue. This can be achieved using Wallace and Dadda algorithm of an IEEE 754 single precision floating point multiplier. Improvement in speed multiplication of Dadda and Wallace multiplier is done using carry look ahead adder. Multiplier based on Wallace and dada algorithms provides an area efficient and high speed multiplication. The focus of this project is delay comparison of floating point multiplier using Wallace tree and Dadda tree algorithms. The Dadda tree multiplier is faster than Wallace tree multiplier. Both uses XOR operation for sign bit calculation and bias is used for exponent calculation. But mantissa multiplication is calculating separately by using two different techniques, those are Wallace and Dadda tree.

Wallace and Dadda tree involves three steps:[1]Generating partial product using booth algorithm.[2]Partial products are added using full adder and half adder until it is reduced to two rows.[3]Final two rows are added using carry look ahead adder. Dadda tree multiplier uses less number of half adders and full adders compare to Wallace tree multiplier. Floating point multiplier is one of the key hardware blocks in digital and high performance systems like digital signal processors and microprocessors. There are two types of floating point multipliers: single precision floating point multiplier (32-bit) and double precision floating point multiplier (64-bit).

Now a day's speech, video and other such real time applications are required for mobile systems. For example cell phone and laptop. Improving multipliers design directly benefits the high performance embedded processors used in consumer and industrial electronic products. The floating point multiplier should be implemented to present both fast multiplication and less hardware. Higher processor has been broadly used in computer. In DSP applications, multiplication is one of the most utilized arithmetic operations as part of filters, convolves and transforms processors. Both single precision floating point multiplier with Wallace tree and Dadda tree algorithms are designed using Verilog HDL code.

### 1.2 Problem Statement

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is ron by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
It is required to implement single precision floating point multiplier using Wallace and Dadda tree algorithms. Delay comparison between both techniques.

### 1.3 Objective

The main objective of this study is to achieve high speed single precision multiplication using booth algorithm in Wallace and Dadda tree. This is achieved using verilog HDL code. The generated partial products are added using full adders and half adders. For final two rows addition carry look ahead adder is used to calculate product of two floating point numbers.

### 1.4 Scope of the project

The aim here is to design and implement single precision floating point multiplier using Wallace and Dadda tree algorithm on Virtex 5. To generate partial product booth algorithm is used because it generates 9 partial products for 23 -bit input data. Without booth algorithm 23 partial products are generated. Dadda tree uses partition method to reduce calculations. It is basic mathematical operation block in digital signal processor.

### 1.5 Motivation

Array multipliers, parallel multipliers and conventional Wallace multipliers are simple but very slow and require more area. These techniques require more number of full adders and half adders. The Wallace and Dadda tree multipliers using booth algorithm generates less partial products. So it requires less designing steps. It uses less number of full adders and half adders compare to other multipliers. All the available design uses carry save adders or ripple carry adders for design of floating point multiplier. Carry look ahead adder is one of the fastest adder and having more advantages among all the available adders. So our aim is to design and implement floating point multiplier using Wallace and Dadda algorithm with carry look ahead adder.

### 1.6 Tools used for simulation

Xilinx ISE 14.2 design suite is used to implement floating point multiplier using Wallace and Dadda algorithm in verilog HDL

## 2. LITERATURE SURVEY

Multiplication involves two basic operations: the generation of partial products and their accumulation. There are two ways to speed up the multiplication:
[i] Reduce the partial product. [ii] Speed up their accumulation. The complexity is reduced by using less number of partial products. It speeds up to accumulate the partial products. Multipliers are key component of FIR filters, digital signal processors, microprocessors etc. A digital system's performance is determined by performance of multiplier because multiplier is slowest and area consuming element in the system. Hence optimizing the speed and area of the multiplier is a major design issue. Area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. Radix $2^{\wedge} \mathrm{n}$ multipliers which operate on digits in parallel design instead of bits bring the pipelining to the digit level and avoid most of the above problems. It was introduced by M.K.Ibrahim in 1993. These structures are iterative and modular. The pipelining done at the digit level brings the benefit of speed of operation irrespective of multiplier size.

Kodali. R.K, Gundabathula S.K.\& Bappana. L investigated the floating point arithmetic, i.e. multiplication. In general the IEEE-754 single precision multiplier requires

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal iti 2013 and is in existence for the last 10 years. It is ron by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jrrset.com

ISSN $\Rightarrow: 2347-6729$
ISSN oum : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
a 23 *23 mantissa multiplication and double precision floating point multiplier requires a $52 * 52$ mantissa multiplication to obtain final result. Ramesh, A.P.Tilak and Prasad.A.M researched on a high speed floating point double precision multiplier is implemented on a virtex-6 FPGA. In addition the proposed design is complaint with IEEE-754 format and handles overflow, underflow, rounding and various exception conditions. Authors discussed design trade-offs of various multiplier implementations. M.Anb uselvi in paper entitled 'Design and analysis of floating point and Galois field multipliers using wavepipelining" presents a design and analysis of floating point and Galois field multipliers using a pipelined technique called "wave pipelining". Pipelining technique improves the throughput of a logic circuit while avoiding some of the overheads of traditional pipelining. Galois field theory deals with binary numbers, has the properties of mathematical field and is finite in scope. Galois field multipliers used in coding theory and cryptography. Anna Jain in paper entitled "FPGA design of a fast 32 -bit floating point multiplier unit with single precision IEEE754-2008 standard. This design makes multiplier faster by reducing delay caused by the propagation of carry by implementing adders having least power delay. The modules are written in Verilog HDL and then synthesized and simulated using the XILINX ISE 14.2 targeted on the SPARTAN 3E FPGA.

### 2.1 Different types of multipliers

Two types of parallel multipliers were defined in the 1960's. The first type use a rectangular array of identical cells which contains AND gate and addition logic to generate and sum the partial product bits. Which was defined by J.C.Majithia and R.Kitai in 1964.This type of multiplier is called array multiplier. Its delay is proportional to multiplier input word size. It is easier to implement. The second type of parallel multipliers is column compression multipliers, uses counters or compressors to reduce the matrix of partial product to two words. The delay of this is faster than array multiplier, i.e. proportional to logarithm of the multiplier word length.

Array multiplier: It is an efficient layout of a combinational multiplier. With its good structure, this multiplier is based on the standard add and shift operations. Each partial product is generated by taking into account the multiplicand and one bit of multiplier each time. The addition is carried out by high speed carry-save algorithm and final product is obtained by any fast adder.

Conclusion: Array multiplier gives more power consumption as well as optimum number of components required, but delay for this multiplier is larger. It also requires larger number of gates because of which area is also increased; due to this array multiplier is less economical. Thus it is a fast multiplier but hardware complexity is high.

Baugh Wooley Multiplier: Baugh wooley Two's complement signed multiplier is the best algorithm for signed multiplication because it improves the regularity of the multiplier and allow all the partial products to have positive sign bits. When multiplying two's complements numbers directly, each of the partial products to be added is a signed numbers. Thus each partial product has to be sign extended to the width of the final product in order to form a correct sum by the carry save adder.

Conclusion: According to this multiplier an efficient method of adding extra entries to the matrix is suggested to avoid negatively bits in the partial product matrix which results in extra circuitry and increase power consumption.

# International Journal on Recent Researches in <br> Science, Engineering \& Technology (IJRRSET) <br> A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF. <br> Available online at: www.jirset.com 

ISSN $=2347-6729$ ISSN iam : 2348-3105

JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

Braun Multiplier: It is parallel multiplier that is known as carry save array multiplier. It performs multiplication of two unsigned numbers. It consists of array of AND gates and adders.

Conclusion: One of the major disadvantages of Braun's Multiplier is that the number of components increases quadratically with number of bits. So multiplier becomes inefficient. It cannot stop the switching activity even if bit coefficient is zero that results in more power dissipation.

Booth multiplier: In the 1950's booth algorithm used in array multipliers to perform two's complement multiplication. Which is defined by Andrew D,Booth in 1951.It groups 2-bits,3-bits,4-bits at a time to reduce partial products. The triplets are converted by booth logic into a set of five control signals used by the adder cells in the array to control the operations performed by the adder cells. It reduces number of adders.

Conclusion: The disadvantages of booth multiplier are number of add operations and shift operation becomes variable. It becomes inconvenient in designing parallel multipliers. The algorithm becomes inefficient when binary value 1 varied.

Conventional Wallace Method: It consists of three steps. In first step partial products of two numbers are generated. In second step using column compression 3-bits and 2 -bits are grouped. Perform full adder and half adder operations until two rows remains. In third step remaining two rows are added using fast adder like carry look ahead adder.

Conclusion: In this method more partial products are generated. So it requires more number of adders, hence delay is more.

### 2.2 Overview of Wallace multiplier

It computes a partial product by examining two multiplicand bits at a time. Later higher radix booth algorithm was introduced to improve the latency performance of the regular booth array multiplier. In 1964 C.S.Wallace introduced a fast multiplication of Wallace based on booth algorithm using full adder and half adder. It also consists of three steps. In first step partial products of two numbers are generated using radix-8 booth algorithm. It generates less number of partial products compare to conventional Wallace method. In second step using column compression 3-bits and 2-bits are grouped. Perform full adder and half adder operations until two rows remains. In third step remaining two rows are added using fast adder like carry look ahead adder. Compare to conventional Wallace method booth algorithm with Wallace multiplier uses less number of full adders and half adders and Wallace multiplier using booth algorithm is faster. It is $67 \%$ faster than the existing Wallace multiplier, 22\%faster than radix-8 booth multiplier. Modified Wallace multiplier has the advantage of reduced complexity because of less number of half adders. The gate count in modified Wallace multiplier is less than that of the conventional Wallace multiplier.

### 2.3 Overview of Dadda multiplier

Dadda algorithm was invented by Luigi Dadda in 1965 using counter placement strategy in the partial product array. It is similar to the Wallace multiplier, but it is slightly faster for all operand sizes. It requires less number of logic gates. It consist basic three steps. In first step partial products of two numbers are generated using radix-8 booth algorithm. It generates less number of partial products compare to conventional Wallace method. In second step the partial products are divided in to two parts. In part-0 right

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Trichy, Journal Indexed in JIR, DIIF and SJIF.

ISSN $\Rightarrow: 2347-6729$
ISSN iom : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

most 30 bits are taken for calculations, using full adders and half adders the column addition is performed until two rows get. At last these two rows are added using carry look ahead adder. In part-1 left most 19-bits are taken for calculations, using column compression 3-bits and 2-bits are grouped. Perform full adder and half adder operations until two rows remains. At last these two rows are added using carry look ahead adder. In third step part-0 and part-1 results are added using half adders and full adders. Finally we get product of two 23-bit numbers.
Dadda has three steps:

1. Logical AND each bit from one bit of multiplicand, that gives $n^{2}$ result. Depending on position of the multiplied bits, the wires carry different weights. Example wire of bit carrying result of $\mathrm{a}_{2} \mathrm{~b}_{3}$ is 32 .
2. Reduce the number of partial products to two layers of full adders and half adders. Take three wires with same weights and perform full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each tree input wires. If two wires of same weight left, perform half adder. Otherwise pass them to next layer. If only one wire left, connect it to next layer.
3. Group the wires in two numbers and add them with carry look ahead adder.

## 3. FLOATING POINT MULTIPLICATION

The IEEE (Institute of Electrical and Electronics Engineers) has created a Standard to define floating point representation. The IEEE 754 Standard is more used for floating point multiplication and is adapted to many hardware and software implementations.
The standard defines five basic formats depends on their base and the number of bits used.

### 3.1 IEEE 754 STANDARDS FOR BINARY FLOATING POINT MULTIPLICATION:

There are three binary floating point formats, which can be encoded using 32,64 or 128 bits. The first two binary formats are the 'single precision' and 'double precision' formats of IEEE 754-1985 and third is called 'quad'.

| SIGN | EXPONENT | MANTISSA |
| :--- | :--- | :--- |

IEEE (Institute of Electrical \& Electronics Engineering.) numbers are stored using scientific notation.
$\pm$ Mantissa* ${ }^{2}$ exponent
We can represent single precision floating point numbers with three binary terms:
1] Sign bit s: 1 bit.
2] Exponent field E': 8 bits.
3] Fraction field f: 23 bits.
$\mathrm{E}^{\prime}=\mathrm{E}+127 . \quad 0 \leq \mathrm{E}^{\prime} \leq 255$.

- The actual exponent E IS IN THE RANGE OF $-126 \leq \mathrm{E} \leq 127$
- The basic aspects of working with floating point numbers are two:

1. If number is not normalized, it can normalized by shifting the fraction and adjusting the exponent.
(a) Un-normalized value:

| 0 | $10001000 \quad 00101100000000000000000$ |
| :--- | :--- | :--- |

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Jourual Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is ron by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $\Rightarrow: 2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

There is no implicit 1 to the left of the binary point.
Value represented $=+0.0010110000 \ldots *^{*}{ }^{9}$
(b) Normalized value:

\section*{| 0 | $10000101 \quad 01100000000000000000000$ |
| :--- | :---: | :---: |}

Value represented $=+1.011000000 \ldots . \ldots 2^{6}$
The scale factor is in the form of $2^{i}$. Shifting the mantissa right by one bit position is rewarded by an increase of 1 in exponent. Shifting the mantissa left by one bit position is rewarded by a decrease of 1 in exponent.
2. When computations precede, a number that does not fall in the required range. In single precision floating point numbers normalized representation requires an exponent less than -126 or greater than +127 . In first case underflow has occurred. In second case overflow has occurred. Both are arithmetic exceptions.

### 3.1.1 Exceptions:

The IEEE standard defines 5 types of exceptions that occurred when flag bit sets.

### 3.1.1.1 Invalid Operation

All exponent bits values are ' 1 ' and all the mantissa bits are equal to ' 0 ', then it represents infinity. If all exponent bits values are ' 1 ' and all the mantissa bits are not equal to ' 0 ', and then it represents Not a Number ( NaN ). The result of invalid operation is NaN (Not a number).

### 3.1.1.2 Division by zero

If divisor is zero in ordinary arithmetic there is no meaning for this expression. In computer language integer division by zero may cause a program to terminate and if floating point numbers may cause NaN (Not a number) value. Division by zero results infinity and the multiplication of two numbers also results infinity. Therefore to differentiate between the two cases, a divide by zero exception was implemented.

### 3.1.1.3 Underflow and overflow

In two cases underflow exception occurs: tininess and loss of accuracy. Tininess is detected after or before rounding when a result lies between $\pm 2$ Emin. Loss of accuracy is detected when the result is when a renormalizations loss occurs. The underflow exception occurs whenever tininess is detected after rounding and at the same time result is inexact. The overflow exception occurs whenever the result exceeds the maximum value. It is not occurred when one operand is infinity, because infinity is always exact.

Overflow/Underflow means that the result's exponent is too large/small. If exponent is 8 -bit then range is from 1 to 254 otherwise the value is not a normalized value. An overflow occurs when adding two exponents or during normalization. An underflow occurs when subtracting the bias to form the intermediate exponent. If intermediate exponent $<0$ then it is an underflow that cannot be compensated, if the intermediate exponent $=0$ then it is an underflow but that may be compensated during normalization by adding 1 to it. When an overflow occurs an overflow flag is high and the result is $\pm$ Infinity. The sign is determined according to the sign of floating point multiplier inputs. When an underflow occurs an underflow flag is high and the result is $\pm$ Zero. If E1 and E2 are exponents of two numbers, then total exponent is $\mathrm{E}=\mathrm{E} 1+\mathrm{E} 2-$ 127.E1 and E 2 range is from 1 to 254 . Therefore resulting in E values from 1+1-127=125 to $254+254-127=381$. But for normalized numbers $E$ range is from 1 to 254 .

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is rum by Retired Professors from NTT. Trichy. Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

Special values: The end values 0 and 255 of the excess- 127 exponent $E$ ' are used to represent special values. When $E^{\prime}=0$ and mantissa $M=0$,the value exact 0 is represented. When $E^{\prime}=255$ and $M=0$, the value is infinite. When $E^{\prime}=0$ and $M \neq 0$, de-normal numbers are represented. Their value is $\pm 0 . \mathrm{M}^{*} 2^{-126}$. Therefore they are smaller than the smallest normal number. When $\mathrm{E}^{\prime}=255$ and $\mathrm{M} \neq 0$, the value is called not a number. It is the result of performing an invalid operation such as $0 / 0$ or $\sqrt{ }-1$.

The sign bit is 0 for positive numbers and 1 for negative numbers. The field f contains a binary fraction. The actual mantissa of floating point value is ( $1+\mathrm{f}$ ). For example if $f$ is $01110111 \ldots$, the mantissa become $1.01110111 \ldots$ There are many ways to write a number in scientific notation, but there is always a unique normalized representation, with exactly one non-zero digit to the left of the point.
$0.456 * 10^{3}=4.56 * 10^{2}=45.6^{*} 10^{1}$
A side effect is that we get a little more precision for given number. There are 24-bits in mantissa, but we need to store only 23 of them. The exponent field represents the exponent as a biased number. It consist actual component plus 127 for single precision floating point numbers. This converts all single precision exponents from -127 to 127 into unsigned numbers from 0 to 254.
Two examples shown below for single precision:
If exponent is 3 , the e-field is $3+127=130=10000010_{2}$
If the e-field contains 8 'b10000011 ( $131_{10}$ ) the actual exponent is $131-127=4$
Storing a biased exponent means we can compare IEEE values as if they were signed integers.

### 3.2 The binary representation of IEEE format for single precision floating point number:

The decimal number is -12.375 that is first convert to binary form. So the value is $1100.011_{(2)}$. Normalize the number by shifting the binary point until there is a single 1 to the left. Shift binary point to left after 3-bits. i.e. $1100.011 * 2^{0}=1.100011 * 2^{3}$
The exponent is 3 . Therefore in biased form it is $130=10000010$.
The fraction is 100011 .

- $\mathbf{1 2 . 3 7 5}$

| 1 | 10000010 | 10001100000000000000000 |
| :--- | :--- | :--- |

### 3.3 Floating point conversion to IEEE 754 format:

Ex1: The decimal number is 147.625
Step1: Convert decimal number to its equivalent binary fractional form.
147.625=10010011.101

Step2: Normalize the binary fractional number.
10010011.101=1.0010011101*2 ${ }^{7}$

Step3: Convert the exponent to 8-bit excess-127 notation. Add 127 to exponent and convert it to 8 -bit binary number.
$7+127=134=10000110$
Step4: Convert mantissa to buried bit format.
$1.0010011101 \longrightarrow 0010011101$
Step5: Write down $1+8+23=32$ bit binary number.
$147.625=01000011000100111010000000000000$

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
Ex2: The decimal number is -20.5
Step1: Convert number to its equivalent binary fractional form. $20.5=10100.1$
Step2: Normalize the binary fractional number.
$10100.1=1.01001 * 2^{4}$
Step3: Convert the exponent to 8-bit excess-127 notation. Add 127 to exponent and convert it to 8 -bit binary number.
$4+127=131=10000011$
Step4: Convert mantissa to buried bit format.
$1.01001 \longrightarrow 01001$
Step5: Write down $1+8+23=32$ bit binary number. Sign bit is 1 for negative number.
$-20.5=11000001101001000000000000000000$

### 3.4 Floating point multiplier block diagram



The above figure shows block diagram of floating point multiplier. It consist mainly five steps:
Step 1: The sign of floating point number n 1 and n 2 are logically XOR together.
Sign=Sign1 XOR sign2. If both inputs are 0 , then output is 0 . If $\operatorname{sign} 1$ is 0 and sign 2 is 1 , then output is 1 . If sign 1 is 1 and $\operatorname{sign} 2$ is 0 , then output is 1 . If both inputs are 1 , then sign output is 0 .
Step2: IEEE exponents are stored as 8 -bit unsigned integers with a bias of 127. Take example $1.10101 * 2^{3}$ the exponent is 3 added to 127 and sum is $130\left(10000010_{2}\right)$.If binary exponent is unsigned; it cannot be negative. The largest possible exponent is 128. It is added with 127 and sum is 255 . This is largest unsigned value represented by 8 -bits.The range is from $1.0 * 2^{-127}$ to $1.0^{*} 2^{+128}$ The exponent is calculated by adding both exponent of floating point numbers and the result is subtracted from bias (127). $\mathrm{E}=\mathrm{E} 1+\mathrm{E} 2-127$
Step3: The mantissa is calculated by multiplying both mantissa of floating point numbers. $\mathrm{M}=\mathrm{M} 1 * \mathrm{M} 2$. Multiplication is done using any algorithm. Those are array multiplier, booth multiplier, parallel multiplier, conventional Wallace multiplier, Wallace with booth multiplier, dadda multiplier etc. Due to large delay of multipliers, different methods have been designed to increase speed. The partial products are generated using booth algorithm. The partial product bits are added using half adders and full adders until two rows get, at finally these rows are added using fast carry look ahead adder. Dadda multiplier algorithm is faster than remaining all types of multipliers. If without booth algorithm multiplication is performed then it generates more number of multiplications. It

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Trichy, Journal<br>Indexed in JIR, DIIF and SJIF,<br>Available online at: www.jirset.com

ISSN $\Rightarrow: 2347-6729$
ISSN oum : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

takes more delay to execute. Multiplication is a basic and important building block in all arithmetic logic units.
Step4: Normalize the result value if value is un-normalized, so that there is a 1 just before the decimal point. Shifting decimal point one place to the left increments the exponent by 1, Moving one place to right decrement the exponent by 1 . For example, decimal number is 4566.23 is normalized as $4.56623 * 10^{3}$. Same way the floating point binary value 1100.100 is normalized as $1.100100 * 2^{3}$ by moving the decimal point 3 positions to the left and multiplying by $2^{3}$. In a normalized mantissa, the digit 1 always appears to the left of the decimal point. The leading 1 is lost from the mantissa in the IEEE storage format because it is redundant. Sign, exponent and normalized mantissa are grouped into the binary IEEE representation.
The advantages of normalizing floating point numbers are:

1) The version is exceptional; there is one option to write a real number in such a format.
2) It is easy to compare two normalized numbers, check the sign, exponent and mantissa.
3) In a normalized format, a fixed size mantissa is using all the digit cells to store required digits.
4) The IEEE format normalization representation always begins with a 1-bit. This bit can be omitted and it is replaced by data. The omitted bit is also known as hidden bit.
Step5: If mantissa bits are more than 5-bits rounding is required. If we applied the truncation rounding method then the mantissa is 5-bits.At finally product of two floating point numbers is getting using IEEE standard.
3.5 Floating point multiplication algorithm:


Fig3.5a: Flowchart of floating point multiplier
The following algorithm is used to multiply two floating point numbers:

1. Multiplication (1.M1*1.M2): Its response is multiplying the unsigned significant and putting the decimal point in the multiplication product. Multiplication is performed on 23-bits. Operands $x$ and $y$ are used for multiplication. The floating point number $x$ consist of sign bit $s_{-} x$, exponent bits $e_{-} x$ and mantissa bits $m_{-} x$. The floating point number $y$ consists of sign bit $s_{-} y, e_{-} y$ and mantissa bits $m_{-} y$.
2. Putting the decimal point in the product.
3. Adding the exponents ( $e_{-}$out $=e_{-} x+e_{-} y-127$ ): Its response is to add two floating point number exponents and sum is subtracted from bias 127. An 8-bit

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTT. Tricloy. Journal Indexed in JIR, DIIF and SJIF.

## Volume 9, Issue 1 - January 2021 - Pages 7-38

carry look ahead adder is used to add two input exponents. This adder uses generate and propagate functions. Gi is referred as the carry generate signal. So carry $\mathrm{C}_{\mathrm{i}+1}$ is generated whenever $\mathrm{Gi}=1$. Pi is referred as the carry propagate signal. When $\mathrm{Pi}=1$, the input carry is propagated to the output carry. $\mathrm{C}{ }_{\mathrm{i}+1}=\mathrm{Ci}$. Computing the values of P and G depends on input bits.


Fig3.5b: Block diagram of carry look ahead adder
Full adders are used to calculate sum, propagate and generate bits. The ai, bi and ci are input bits. Si and $\mathrm{Ci}+1$ are output bits.
$\mathrm{Pi}=\mathrm{ai}+\mathrm{bi} \quad \mathrm{Gi}=$ ai . bi
$\mathrm{Si}=$ ai xor bi xor ci
$\mathrm{Ci}+1=\mathrm{Gi}+\mathrm{Pi} . \mathrm{ci}$
Carry look ahead adder is faster because it generates carry bits parallel by an additional logic circuit when inputs change. It uses carry bypass logic ti speed up the carry propagation.
4. Obtaining sign by performing the operation s 1 XOR s2.
i.e. s_ out=s_ x XOR s_y. Multiplying one negative number and one positive number results negative number product. If both numbers are positive or negative then product is positive number. According to logical XOR truth table multiplication is performed. When both inputs are 0 or 1 , the output is 0 . When any one of the input is 0 or 1 , the output is 1 .
5. Normalizing the result: The result of the significant multiplication is normalized to have a leading 1 to the left of the decimal point. If product is $1010.0000100_{(2)}$ then its normalized value is $1.0100000100_{(2)}$.
6 . Rounding the result to fit in the 32 -bits.

### 3.5.1 Floating point numbers multiplication examples:

Ex1: 12.52 * $15.25=190.93$

$$
\begin{aligned}
& 12.52 \longrightarrow 1100.10000101 \\
& 15.25 \longrightarrow 1111.01000000
\end{aligned}
$$

Normalized value of first number is $1.10010000101 * 10^{3} \rightarrow 10010000101$
Normalized value of second number is $1.11101000000 * 10^{3} \rightarrow 11101000000$
Exp1=3+127=130=10000010
IEEE format of first number is:
0-10000010-10010000101000000000000
Exp2 $=3+127=130=10000010$
IEEE format of second number is:
0-10000010-11101000000000000000000
Exp=Exp1+Exp2-127

## Volume 9, Issue 1 - January 2021 - Pages 7-38

$$
=130+130-127
$$

Exp=133
Mantissa multiplication:


### 10.1111101110110001000000

The normalized value of product is $1.01111101110110001000000 \times 10^{1}$
Total $\exp =$ product_ $\exp +\exp -127$
$=1+133-127$
Total $\exp =7$
The product of mantissa of two numbers is 1.01111101110110001000000 .
Shift decimal point to right after 7 bits. So the product is
$10111110.1110110001000000_{(2)}=190.93_{(10)}$
Normalized form of product is
$1.01111101110110001000000^{*} 10^{7} \longrightarrow 01111101110110001000000$
Exp $=7+127=134=10000110$
Sign of product is 0 .
IEEE form of product is: $0-10000110-01111101110110001000000$
Ex2: $16.25 *(-23.75)=(-385.9375)$
$16.25 \longrightarrow 10000.01$
$23.75 \longrightarrow 10111.11$
Normalized value of first number is $1.000001 * 10^{4} \longrightarrow 000001$
Normalized value of second number is $1.011111 * 10^{4} \longrightarrow 011111$
Exp1 $=4+127=131=10000011$
IEEE format of first number is:
0-10000011-00000100000000000000000
Exp2 $=4+127=131=10000011$
IEEE format of second number is:
0-100000011-011111000000000000000000
$\operatorname{Exp}=E x p 1+E x p 2-127$
$=131+131-127$
$\operatorname{Exp}=135$

Volume 9, Issue 1 - January 2021 - Pages 7-38

Mantissa multiplication:


The product of mantissa of two numbers is 1.100000011111 .
Shift decimal point to right after 9 bits. So the product is
$1100000011.111_{(2)}=-385.9375_{(10)}$.
Normalized form of product is $1.100000011111 * 10^{9} \longrightarrow 100000011111$
Exp $=9+127=136=10001000$
Sign bit is 1 .
IEEE format of product is:
1-10001000-10000001111100000000000
Еx3: $6.5 * 2.5=16.25$
$6.5 \longrightarrow 0110.10$
$2.5 \longrightarrow 0010.10$
Normalized value of first number is $1.1010 * 10^{2} \longrightarrow 1010$
Normalized value of second number is $1.010 * 10^{1} \longrightarrow 0101$
Exp $1=2+127=129=10000001$
Exp2 $=1+127=128=10000000$
IEEE format of first number is:
0-10000001-10100000000000000000000
IEEE format of second number is:
0-10000000-01000000000000000000000
Exp=Exp1+Exp2-127
$=129+128-127=130$
Mantissa multiplication:
1.1010
1.0100
00000
00000
11010

International Journal on Recent Researches in Science, Engineering \& Technology (IJRRSET)
A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is rum by Retired Professors from NIT. Trichy, Journal Indexed in JIR. DIIF and SJIF.
Available online at: www.jurset.com

ISSN $\Rightarrow 2347-6729$
ISSN ioume : 2348-3105
JIR IF : 2.54
SJIF IF : $\mathbf{4 . 3 3 4}$
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
00000
11010

### 10.00001000

The normalized value of product is $1.000001000 * 10^{1}$
Total exp =product_ exp $+\exp -127$

$$
=1+130-127
$$

Total $\exp =4$
The product of mantissa of two numbers is 1.000001000 .
Shift decimal point to right after 4 bits. So the product is
$10000.01000_{(2)}=16.25_{(10)}$.
Normalized form of product is $1.000001 * 10^{4} \longrightarrow 000001$
Exp $=4+127=131=10000011$
Sign bit is 0 .
IEEE format of product is: $0-10000011-00000100000000000000000$

## 4. BOOTH3 ALGORITHM

The 16-bit booth 3 multiplication concept is also used for 23-bit booth 3 multiplication. Multiplier and multiplicand both are 23 -bits. The multiplier is divided into 8 groups. Each group contains 4-bits binary value as shown in figure 4.1a.

### 4.1 Multiplication of two binary numbers using booth 3 algorithm



Fig4.1a: 16-bit booth 3 multiplication
In each group multiplier $4^{\text {th }}$ bit is checked and if it is 0 then sign bit is $S$. If it is 1 then sign bit is $\sim \mathrm{S}$. The partial products reduced from 23 to 9 by using boot 3 algorithms. The partial product selection table is shown in figure. Each partial product is chosen from the

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Triclyy. Journal Indexed in JIR, DIIF and SJIF.

ISSN $\Rightarrow 2347-6729$
ISSN ioume : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

set $0, \pm \mathrm{M}, \pm 2 \mathrm{M}, \pm 3 \mathrm{M}, \pm 4 \mathrm{M}$. Except 3 M all multiples are obtained from shifting and complementing of the multiplicand.

The following steps are used to perform booth3 algorithm:

1) The multiplication of two 23 -bits binary numbers using Booth algorithm implies reduction in number of digits to 8 as shown below figure 4.1b..


Fig4.1b: multiplier recoding
2) The partial products multiplexer selects one operation out of nine possible operations depending on value of the corresponding signed bit as shown in figure4.1c.


Fig4.1c: Partial product multiplexer
3) The partial product multiplexer selects $M$ if multiplier 4-bits binary value is 0001 and $\sim \mathrm{M}$ is selected when binary value of multiplier is 1101 . The multiplexer selects 2 M when binary value of multiplier is 0011 . The multiplexer selects $\sim 2 \mathrm{M}$ when binary value of multiplier is 1011. The partial product multiplexer selects 3 M when binary value of multiplier is 0101.
For ex. The operation of 3 M is:


Volume 9, Issue 1 - January 2021 - Pages 7-38

### 4.2 Logic diagram of booth3 partial product generator



Fig4.2: 16-bit booth 3 partial product generator logic circuit
The above logic diagram shows booth 3 algorithm. This modified booth algorithm is most used method to generate partial product. This algorithm generates less partial products compare to other techniques by using reduction method. Therefore compression speed is enhanced. 2-bit, 3-bit, 4-bit recoding is used for this algorithm. The 4-bit recoding means that the multiplier $B$ is divided into groups of four bits and the algorithm is applied to this group. The booth algorithm is implemented into two steps:

1) Booth decoding
2) Booth selecting

The booth encoding is used to produce one of the four values in the multiplier group. The booth selecting circuit is used to produce a partial product bit k. This algorithm reduces partial products by a factor of 2, without adding before to produce the partial products.fig shows the dot diagram for a $23 * 23$ multiplication. The multiplier is divided into overlapping groups of 4 bits and each group is decoded to select a single partial product as per the selection table. Each partial product is shifted 3 bit positions with respect to its neighbors. The numbers of partial products are reduced from 23 to 9 . In

International Journal on Recent Researches in
Science, Engineering \& Technology (IJRRSET)
A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTT, Trichy, Journal Indexed in JIR, DIIF and SJIF.
Available online at: www.jurset.com

ISSN $\Rightarrow 2347-6729$
ISSN (avian : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
general there is $(\mathrm{n}+2) / 2$ partial products, where n is the operand length. Many required multiples are obtained by a simple shift of the multiplicand. Negative multiples taken in two's complement form, which is obtained using a bit by bit complement of the corresponding positive multiples, with a 1 added at the least significant bit of partial product. Booth algorithm also reduces dots in dot diagram. In this partial product groups are assigned to a set $0, \mathrm{M}, 2 \mathrm{M}, 3 \mathrm{M}, 4 \mathrm{M},-0,-\mathrm{M},-2 \mathrm{M},-3 \mathrm{M},-4 \mathrm{M}$.

M is multiplicand value. -M is complement of multiplicand value. 2 M is circular left shift by 1 -bit position. -2 M is circular left shift of complement of multiplicand 1-bit position. 3 M is ( $\mathrm{a}+2 \mathrm{a}$ ), that means 'a 'refers multiplicand and 2 a is circular shift of a. -3 M is complement of $(a+2 \mathrm{a})$ value. 4 M is circular shift of multiplicand by 2 -bit position. -4 M is complement of 4 M value. The number of dots, constants and sign are added is 126 for 23*23 multiplier and height of partial product is now 9 .

Generation of the multiple 3 M requires adder circuit. It cannot be obtained by simple shifting or complementing of multiplicand. This increases the complexity of the partial product generation. The amount of hardware and delay depends upon number of partial products to be added. Booth algorithm generates less partial products, so hardware cost is less and it improves performance of multiplier. Booth is used in multiplier with long operands i.e. greater than 16 -bits. Booth 2 is fastest algorithm, booth 3 is power efficient and booth 4 requires less area. In booth 3 algorithm starting 27 bits are dots and $28^{\text {th }}, 29^{\text {th }}$ and $30^{\text {th }}$ bits are sign bits, which are S . 31 th bit is complement of sign bit, that is $\sim$ S. If MP [3] is 0 , then sign is 0 . So it represents $S$. If MP [3] is 1 , then sign is 1 . So it represents $\sim S$.

## 5. PROPOSED ALGORITHMS

### 5.1 Floating point multiplication using Wallace algorithm

In 1964 C.S.Wallace introduced a Wallace tree multiplication algorithm. It includes three steps to multiply two numbers.

Step 1: The partial products are generated using booth 3 algorithm. Nine partial products are generated. Two 23 -bit numbers are used as inputs, those are multiplicand and multiplier. The multiplier input is divided into 8 groups. Each group consists of 4-bit binary value. If 0001 in the group then multiplicand value should write as it is. If 0011 in the group then $2^{*}$ multiplicand value should write. Similarly $\pm 3$ multiplicand and $\pm 4$ multiplicand are represented for other binary numbers shown in multiplication using booth3 algorithm table.
Step 2: In first stage the nine partial products are divided into 3 levels. In level-1 the full adder ( $3: 2$ counter) and half adders (2:2 counter) are used for 3-bits and 2-bits respectively. The full adder and half adder results sum and carry bits are stored in $2^{\text {nd }}$ stage, level-1. In level-1 also same full adder and half adders are used, these outputs sum and carry are stored in further level. This continues in same way until two rows get.
Step3: These two rows are added using carry look ahead adder. It is faster adder so the delay of multiplication is less. Overall the multiplication consist 5 stages.Stage- 1 consist of 3 levels. Stage- 2 consists of 2 levels. Stage- 3 consists of 1 level. Stage- 4 consist of 1 level and finally stage- 5 also consist of 1 level i.e. using CLA the addition is performed. Two CLA's are used to perform addition and to get product output. At finally we get product of two numbers. Single precision 32-bit floating point multiplication of two

A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTH, Trichy. Journal Indexed in JIR, DIIF and SJIF.

## Volume 9, Issue 1 - January 2021 - Pages 7-38

numbers consists of 1 -bit sign, 8 -bits exponent and 23 -bits mantissa. The 23 -bit two floating point numbers mantissa are multiplied using above Wallace technique. The sign bit of first number and sign bit of second numbers are XOR to get sign bit multiplication. When both are 0 or 1 , the output is 0 . When any one output is 0 or 1 , the output is 1 . Exponent is calculated using propagate and generate function. Using carry look ahead adder the 8 -bits of exponent are added. The difference between Wallace tree multiplier and column compression multiplier is that, in Wallace tree each possible bit in each column is covered by $3: 2$ counter and $2: 2$ counter, until finally the partial product has two rows. This algorithm consists of 5 stages.


Fig5.1: Dot diagram of 23-bit Wallace multiplier.

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal iti 2013 and is in existence for the last 10 years. It is ron by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.

ISSN $\Rightarrow 2347-6729$
ISSN ioume : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

Stage-1: It consists of 3 levels as shown in figure5.1. The level-1 of stage-1 performs 36 full adders' functions. MP [0] is $0^{\text {th }}$ bit of multiplier. It is always 0 , because when grouping the multiplier which contains 4 bits in each group we should add 1 zero to multiplier. If MP [3] is 0 , the sign
bit is 0 and it is represented as S. If MP [3] is 1 , the sign bit is 1 and it is represented as $\sim$ S. Nine partial products are produced using booth algorithm. These partial products are divided in to 3 levels. The level- 2 consists of 39 full adders and level- 3 consists of 35 full adders. The sum and carry outputs of level- 1 of stage- 1 are stored in level- 1 of stage- 2 .

Stage-2: Level- 1 consists of 6 half adders and 30 full adders. Level- 2 of stage- 1 sum and carry output bits are stored in this stage-2.This continues until two rows get. The level-2 consists of 5 half adders and 30 full adders.
Stage3: It consists of 12 half adders and 31 full adders. Two dots are used for half adder operation and three dots are used for full adder operation.
Stage4: It consists of 18 half adders and 30 full adders.
Stage5: Two CLA'S are used to add last two rows of Wallace algorithm.CLA1 add sum bits from S226 to S248 with carry bits from C225 to C247. CLA2 add sum bits from S249 to S271 with carry bits from C248 to C270. Two half adders are used at beginning and end of stage-5. At finally we get product of two binary numbers. Product is assigned from S0, S110, S181, S224, and S272 to S321.
Totally Wallace multiplier uses:

1) Full adders $=231$.
2) Half adders $=43$.
3) $\mathrm{CLA}=2$.

Wallace multiplier require more number of full adders, half adders compare to Dadda multiplier. So Wallace is more complex to design but Dadda multiplier is easy to design the single precision floating point multiplier. Wallace multiplier requires more wires compare to Dadda multiplier. Carry look ahead adders are used to improve the speed of the design. These are faster adders compare to all other adders because they uses carry generate and propagate functions. The dot diagram of Wallace multiplier is explained above clearly. Stage levels are reduced as stage number increase. Final stage is carry look ahead adder, from that result product of two numbers will get.

### 5.2 Floating point multiplication using Dadda algorithm

Dadda multiplier developed Wallace's multiplier by defining a few counters in partial product reduction stage using carry look ahead adder. Dadda uses many ways to compress the partial product bits using 3:2 and 2:2 counters. Fig shows the process of 23*23 bits dot diagram for dadda multiplier. Each dot represents a bit. In first step columns having more than six dots are reduced to 6 dots, next reduced to 4 dots, next reduced to 3 dots and at final dots are reduced to 2 dots in a column. These two rows are added using carry look ahead adder. Each half adder uses two dots, outputs one in the same column and one in the next more significant column and each full adder uses three dots, outputs one in same column and one in the next more significant column so that no column in step 1 will have more than 6 dots.

In each case the rightmost dot of the pair that is connected by a line is in the column from which the inputs were taken from the adder. In next step reduction is no more than 4 dots per column, further no more than three dots per column, at last no more than two dots per column is performed. The height of the matrices is obtained by

## International Journal on Recent Researches in Science, Engineering \& Technology (IJRRSET)

A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.
Available online at: www.jirset.com

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

functioning back from the final two row matrix and restricting the height of the each matrix to the largest integer that is no more than 1.5 times the height of its successor. Each matrix is produced from its predecessor in one adder delay. Since the number of bits in the words to be multiplied, the delay of the matrix reduction process that reduces is proportional to $\log \mathrm{n}$, where n is word size. Final two row matrix can be implemented as a carry look ahead adder and total delay for this multiplier is proportional to the logarithm of the word size n .

### 5.2.1 Partitioning the partial products:

Partial products are divided into two parts: part-o and part-1. In which part-0 and part- 1 consists of n columns. The two parts are separately performed and finally added both result together. The partial products of each part are reduced to two rows by the using 3:2 counter and 2:2 counters by referring dadda algorithm. The grouping of 3 dots and 2 dots in same column refers to 3:2 and 2:2 counters respectively. $S$ and $C$ denote partial sum and partial carry bits.
Part-0: (Stage-1):In stage-1 nine partial products are divided into 3 levels as shown in figure5.2.1a. The partial products are generated using booth 3 algorithm. MP [0] is always 0 because when grouping the multiplier bit 0 is replaced with zero. Ex: Multiplier is 10010110110100010


Fig5.2.1a: Dot diagram of 23-bit Dadda multiplier.

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is rum by Retired Professors from NTT. Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jurset.com

ISSN $=2347-6729$
ISSN iomal : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
By using logic diagram the 9 partial product generation equation is wrote:


Using FOR loop partial products are generated. In each group if MP [3] is 0 , then sign bit is 0 and it is noted to S . If MP [3] is 1 , then sign bit is 1 and it bits noted as $\sim \mathrm{S}$. That means complement of $S$ is calculated. In level-1 one half adder functions is performed and 15 full adders are performed. In level- 2 one half adder and 12 full adders are used. In level- 3 one half adder and 10 full adders are used.

Stage-2: S0, S1.......S15 bits are stored in level-1 of stage-2. The generated carry output bits are written in next column by one bit shift. The c 0 is carried to next column where it is to be added up with sum s1 of a 3:2 counter. The carry c 1 of 3:2 counter is added to next column. Stage-2 full adders and half adders outputs sums and carries are stored in next column in level-1 and level-2. The output sums S16 to S28 are added to previous sums and carries in level-1. Totally 21 full adders and 1 half adder used in this level. In level-2 of stage-2 18 full adders and 1 half adder and another 1 half adder are used to perform addition.

Stage-3: It consists 1 half adder and 25 full adders. The addition is used same process.
Stage-4: It consists of 2 half adder and 28 full adders. The bits C62 and C107 are added using half adders. The process is continues until two rows to get.

Stage-5: The two CLA'S CLA1 and CLA2 and 5 half adders are used to add the input bits.CLA is faster than other adders. It uses carry propagate and generate functions. The starting bits addition is performed using half adder. Four half adders are used at beginning. Next S109 to S137 and C108 to C136 bits are added using two CLA'S. At last using one half adder for C137 and C172 the last sum bit S173 is getting. Also it generates carry bit C173.Finally S138 to S173 are result sum bits of part-0 in dadda algorithm. C139 to C174 are result carry bits of part-0 in dadda algorithm.

International Journal on Recent Researches in
Science, Engineering \& Technology (IJRRSET)
A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTT. Trichy. Journal Indexed in JIR, DIIF and SJIF.
Available online at: www.jurset.com

JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
Dadda Algorithm (Part 1)



Fig5.2.1b: Dot diagram of 23-bit Dadda multiplier.
Part-1: (Stage-1):

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is ron by Retired Professors from NIT, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $=2347-6729$
ISSN iomal : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

The partial products are shifted upward to make a not more than 6 bits in first step as shown in figure5.2.1b. The satge- 1 consists of two levels. In level- 1 half adders are 2 and full adders are 6. P4 [38] and P5 [38] bits are added using half adder. The starting bit is $31^{\text {st }}$ bit. The partial product $31^{\text {st }}$ to $50^{\text {th }}$ bits are considered for calculation. P2 [31], P3 [31] and P4 [31] bits are added using full adders. In this full adder ' $a$ ' is treated as P2 [31], ' b ' is treated as P3 [31] and 'Cin' is treated as P4 [31]. In level-2 4 half adders are used.
Stage-2: It consists of two levels.Level-1 consists of 2 half adders and 12 full adders. The sums of level-1 of stage-1 are stored in level-1 of stage-2. In that level next column consists of carry bits of level-1 of stage-1. In level- 2,3 half adders and 7 full adders are used. The sum and carry output bits of previous stage bits are added using half adders and full adders in next stage.
Stage-3: It consists of 3 half adders and 14 full adders. The partial products P [7] and P[8] bits are added together.
Stage-4: It consists of 2 half adders and 17 full adders. The partial products P [8] and P[9] bits are added together in stage-4.
Stage-5: One CLA is used to perform addition of sum bits from S228 to S245 and carry bits from C227 to C244. Another 2 half adders are used at beginning bits and end bits. At finally we get part-1 output of dadda multiplier. The result sum bits are from S246 to S266 and carry bits are from C247 to C267. At last the product of two binary numbers will get by adding part-0 output and part-1 output. From S138 to S169 are directly assigned to output and next 1 half adder and 2 full adders are used. At last 18 half adders are used to get final product.
Totally dadda multiplier uses:

1) Full adders $=187$.
2) Half adders $=55$.
3) $\mathrm{CLA}=3$.

## 6. VERTEX 5 FAMILY OVERVIEW

Vertex-5 devices are user-programmable gate arrays with various configurable elements and embedded cores optimized for high-density and high-performance system designs.

### 6.1 General Description

The Vertex- 5 family provides the newest most powerful features in the FPGA market. Using the second generation Advanced Silicon Modular Block (ASMBL) column-based architecture, the Vertex-5 family contains five distinct platforms, the most choice offered by any FPGA family. Each platform contains a different ratio of features to adders the needs of a wide variety of advanced logic designs. In addition to the most advanced, high-performance logic fabric, Vertex- 5 FPGAs contain many hard-ip system level blocks, including powerful 36kbit block RAM/FIFOs, second generation $25 \times 18$ DSP slices, select IO technology with built-in digitally controlled impedance, chip sync source-synchronous interface blocks, system monitor functionality, enhanced clock management tiles with integrated Digital Clock Managers(DCM) and Phase Locked Loop(PLL) clock generators and advanced configuration options.

Additional platform dependent features include power-optimized high-speed serial transceiver blocks for enhanced serial connectivity, PCI Express compliant integrated

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
Endpoint blocks, tri-mode Ethernet Media Access Controllers(MACs) and highperformance PowerPC440 microprocessor embedded blocks. These features allow advanced logic designers to build the highest levels of performance and functionality into their FPGA-based systems. Built on a 65 nm state-of-the-art copper process technology, Virtex-5 FPGAs are a programmable alternative to custom ASIC technology.

Most advanced system designs require the programmable strength of FPGAs. Virtex-5 FPGAs offer the best solution for addressing the needs of high performance logic designers, high-performance DSP designers and high performance embedded systems designers with unprecedented logic, DSP, hard/soft microprocessors and connectivity capabilities. The Virtex-5 LXT, SXT, TXT AND FXT platforms include advanced high-speed serial connectivity and link/transaction layer capability.

### 6.2 Features

1. Virtex-5 FPGA slices are organized differently from previous generations. Each Virtex-5 FPGA slice contains four LUTs and four flip-flops.
2. Each DSP48E slice contains a $25 \times 18$ multiplier, an adder and an accumulator.
3. Block RAMs are fundamentally 36kbits in size. Each block can also be used as two independent 18kbit blocks.
4. Each Clock Management Tile (CMT) contains two DCMs and one PLL.

### 6.3 Virtex-5 FPGA ordering information



## 7. SIMULATIONS AND RESULTS

The single precision floating point multiplier using Wallace algorithm and dadda algorithms are designed using Xilinx ISE 14.2 design suit and have been synthesized with XC5VLX110T of Virtex-5 as the target device. Proposed algorithm achieves from writing Verilog code. The delay of single precision floating point multiplier using Wallace algorithm is compared with delay of single precision floating point multiplier using dadda algorithm. This chapter mainly discusses the simulation results of floating point multiplier using Wallace and dadda algorithm and analysis of performance goals.

### 7.1 Exponent multiplication of two floating point numbers:

## Volume 9, Issue 1 - January 2021 - Pages 7-38

The two numbers are: $16.25 \times-23.75$

1) $16.25=10000.01=1.000001 \times 10^{4}$

$$
\exp 1=4+127=131=10000011
$$

2) $23.75=10111.11=1.011111 \times 10^{4}$
$\exp 2=4+127=131=10000011$
Inputs:
$\begin{array}{lll}16.15 \longrightarrow 0 & 10000011 & 00000100000000000000000 \\ 23.75 \longrightarrow 1 & 10000011 & 01111100000000000000000\end{array}$
Sign=sign1 XOR sign2
$=0$ XOR $1=1$

## Sign=1

exp =exp1+exp2-127
$=131+131-127$

$$
\exp =135=10000111
$$

$$
11111=1.100000011111
$$

$\operatorname{expf}=135+1=136$
Final exp is $136-127=9$
$\mathrm{M}=1100000011.111_{2}=385.9375_{10}$


Fig7.1: Two floating point number exponent multiplication output
The product exponent is calculated by adding two floating point number exponents and addition result is subtracted from bias 127 .

A Journal Establisbed in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NIT, Trichy. Journal Indexed in JIR, DIIF and SJIF.
Available online at: www.jirset.com

ISSN $\Rightarrow 2347-6729$
ISSN iovial : 2348-3105
JIR IF: 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

$\exp 1=10000011_{2}=131_{10}$
$\exp 2=10000011_{2}=131_{10}$
$\exp =\exp 1+\exp 2-127$

$$
=131+131-127=135_{10}=10000111_{2}
$$

The single precision floating point representation consist 8 -bit exponent. The exponent field represents the exponent as a biased number. It contains the actual exponent plus 127 for single precision. This converts all single precision exponents from -127 to 127 into unsigned numbers from 0 to 254 . The resultant exponent is calculated using generate and propagate functions in verilog code.
7.2 Generation of partial products using booth3 algorithm:

| Name | Yalue | - | $\mid 1,984,741 \mathrm{ps}$ | 11,984,742ps | $\mid 1,984,743 \mathrm{ps}$ | $\begin{gathered} 1,984,744 \mathrm{ps} \\ \hline 1 ـ 山) ~ \end{gathered}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ${ }^{-1} \mathrm{H}_{\text {I }}$ amp[22:0] | 0000010000 |  |  | 000001000 | 100000000000000 |  |
| - - $_{\text {H }}$ ame[22:0] | 0111110000 |  |  | 011111000 | 100000000000000 |  |
| 推 ck | 1 |  |  |  |  |  |
|  | 0000000000 |  | 000000 | 00000111110000000 | 100000000000000000 | 100000000000 |
| , $68 \times[27: 0]$ | 0000000001 |  |  | 000000000100 | 10000000000000000 |  |
|  | 000001111, |  |  | 00000111110 | 10000000000000000 |  |
| - $\mathrm{H}_{6} \mathrm{mc} 3$ 3[26:0] | 0001011101 |  |  | 00010111010 | 10000000000000000 |  |
| - $\mathrm{C}_{6} \mathrm{mc}[26: 0]$ | 000001111, |  |  | 00000111110 | 10000000000000000 |  |
| - H ¢ mc_2[26:0] | 0000111110 |  |  | 00001111100 | 10000000000000000 |  |
| - $\mathrm{H}_{8}^{\text {mc_-4[26:0] }}$ | 0001111100 |  |  | 0001111000 | 70000000000000000 |  |
|  | 0000000000 |  |  | 000000000000 | 10000000000000000 |  |
| > 50 [ $31: 0]$ | 0000000000 |  |  | 000000000000000 | 10000000000000110 | 1 |
| > 50] pp[26:0] | 0000000000 |  |  | 00000000000 | 10000000000000000 |  |
| - Lob ppi[49:0] | 0000000000 |  | 000000 | 200000000000001000 | 100000000000000000 | 100000000000 |
| - 5 Pp2[49:0] | 0000000000 |  | 000000 | D00000000000111000 | 1000000000000000000 | 100000000000 |
| - 5 pp ${ }^{\text {ch [49:0] }}$ | 0000000000 |  | 000000 | 200000000111000000 | 100000000000000000 | 100000000000 |
| - [0] pp4[49:0] | 0000000000 |  | 000000 | D0000111000000000 | 100000000000000000 | W0000000000 |
| - ${ }_{\text {d }}$ pp5[49:0] | 000000001. |  | 000000 | 00111000000000000 | 1000000000000000004 | \$00000000000 |

Fig7.2: Partial products output
The variables amp and amc are multiplier and multiplicand binary values. Those are 23-bits wide. As explained in chapter 4 each partial product is chosen from the set 0 , $\pm \mathrm{M}, \pm 2 \mathrm{M}, \pm 3 \mathrm{M}, \pm 4 \mathrm{M}$. Except 3 M all multiples are obtained from shifting and complementing of the multiplicand. Using partial product generation table for 23-bit mantissa nine partial products are generated.

The partial products are assigned as pp1, pp2, pp3, pp4, pp5, pp6, pp7, pp8, pp9. Many intermediate wires and registers are used to calculate partial products. FOR loop is used to generate each partial product. To calculate 3 M the 2 M is added with M . The partial product generation equation is written using XOR, AND, OR and NOT basic logic gate expressions. Without booth algorithm 23 partial products are generating but using booth algorithm only 9 partial products are generating.

International Journal on Recent Researches in Science, Engineering \& Technology (IJRRSET)
A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Trichy. Journal Indexed in JIR, DIIF and SJIF.

## Volume 9, Issue 1 - January 2021 - Pages 7-38

### 7.3 Using Wallace algorithm floating point multiplier output:

| Name | Yalue |  |  |  | 19898989 |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | 010000012 |  | ditam | momiomomil | manomom |  |
|  | 110000012 |  |  | aniliomamom | mamamoio |  |
| Tha ${ }^{\text {d }}$ | 1 |  |  |  |  |  |
|  | omolitome |  | momilimomili | momomomomilin | momomomomom |  |
| 1) prio] | 10000111 |  |  | 10000il1 |  |  |
| - Emaiksicas | olmonom |  |  | miomomiomilim |  |  |
| 1) [fmatiselici | 010141120 |  |  | 1100300300 |  |  |
|  | 110000112 |  | 11.100 | momomililicu | mamomoio |  |
| 1) [19xific] | 1000011 |  |  | 1000011 |  |  |
| 17 inc | 1 |  |  |  |  |  |
| 17 mpl | 0 |  |  |  |  |  |
| 17 min | 1 |  |  |  |  |  |
|  |  |  |  |  |  |  |

Fig7.3: The output of floating point multiplier using Wallace algorithm
The variables ' $a$ ' and ' $b$ ' are 32-bit inputs. These two floating point numbers are represented in IEEE 754 format. The verilog code is written in structural mode.
$\mathrm{a}=01000001100000100000000000000000$
$\mathrm{b}=11000001101111100000000000000000$
These two floating point numbers are multiplied and output product is generated.
Pro $=110000111100000011111000000000$
Exponent calculation is above explained and sign bit calculation is performed using logical XOR function. Floating point multiplication using Wallace algorithm uses more number of full adders and half adders compare to dadda algorithm floating point multiplication.

### 7.4 Using Dadda algorithm floating point multiplier output:

## Volume 9, Issue 1 - January 2021 - Pages 7-38



Fig7.4: The output of floating point multiplier using Dadda algorithm
Dadda algorithm floating point multiplication uses same steps like Wallace algorithm but mantissa multiplication is different.Booth3 algorithm generates 9 partial products. The same partial product generation code is used for both Wallace and dadda algorithms. In dadda algorithm the partial products are divided in to 2 parts.

Part-0 and part-1 operations are separately performed and finally these results are added together to get final product. Dadda algorithm uses less full adders and half adders compare to Wallace algorithm. Therefore it is faster than floating point multiplier using Wallace algorithm. Mantissa1 and mantissa2 are 23 -bit wide. The product is also IEEE754format. We can convert that to decimal point number.

### 7.5 Device Utilization Summary

Common components such as flip-flops, LUTs, block RAM and multiplexers make up the basic logic structures on a Virtex-5. A collection of these basic structures is called as slice or Configurable Logic Block (CLB). The numbers of slice registers used are 253 and number of slice LUTs are 958 in floating point multiplier using Wallace algorithm. Information about map report and device utilization will give whether design fits into the device or not.

International Journal on Recent Researches in
Science, Engineering \& Technology (IJRRSET)
A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is rum by Retired Professors from NIT. Trichy, Journal Indexed in JR, DIIF and SJIF.
Available online at: www.jirset.com

JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38

| Device Utilization Summary |  |  |  |
| :---: | :---: | :---: | :---: |
| Slice Logic Utilization | Used | Available | Utilization |
| Number of Slice Registers | 253 | 12,480 | 2\% |
| Number used as Flip Flops | 253 |  |  |
| Number of Slice LUTS | 913 | 12,480 | 7\% |
| Number used as logic | 913 | 12,480 | 7\% |
| Number using 06 output only | 912 |  |  |
| Number using 05 output only | 1 |  |  |
| Number of route-thrus | 1 |  |  |
| Number using 06 output only | 1 |  |  |
| Number of occupied Slices | 364 | 3,120 | 11\% |
| Number of LUT Flip Flop pairs used | 918 |  |  |
| Number with an unused Flip Flop | 665 | 918 | 72\% |
| Number with an unused LUT | 5 | 918 | 1\% |
| Number of fully used LUT-FF pairs | 248 | 918 | 27\% |
| Number of unique control sets | 9 |  |  |
| Number of slice register sites lost to control set restrictions | 23 | 12,480 | 1\% |
| Number of bonded IOBS | 95 | 172 | 55\% |
| IOB Flip Flops | 1 |  |  |
| Number of BUFG/BUFGCTRLS | 1 | 32 | 3\% |
| Number used as BLJFGs | 1 |  |  |
| Average Fanout of Non-Clock Nets | 4.87 |  |  |

Table7.5a: Design summary of floating point multiplier using Wallace algorithm.

Table7.5a and 7.5b shows the slice utilization for floating point multiplication using Wallace and Dadda algorithm. As shown in the table, both Wallace and Dadda algorithm multiplication use $2 \%$ of the slice registers, LUTs, logics and slices. Information about map report and device utilization will give whether design fits into the device or not. As proposed design uses $2 \%$ of the available resources, one can tell that design fits into the Virtex-5.

International Journal on Recent Researches in<br>Science, Engineering \& Technology (IJRRSET)<br>A Journal Establisbed in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is rum by Retired Professors from NIT. Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

Volume 9, Issue 1 - January 2021 - Pages 7-38

| Device Utilization Summary |  |  |  |
| :---: | :---: | :---: | :---: |
| Slice Logic Utilization | Used | Available | Utilization |
| Number of Slice Registers | 253 | 12,480 | 2\% |
| Number used as Flip Flops | 253 |  |  |
| Number of Slice LUTs | 798 | 12,480 | 6\% |
| Number used as logic | 798 | 12,480 | 6\% |
| Number using 06 output only | 797 |  |  |
| Number using 05 output only | 1 |  |  |
| Number of route-thrus | 1 |  |  |
| Number using O6 output only | 1 |  |  |
| Number of occupied Slices | 301 | 3,120 | 9\% |
| Number of LUT Flip Flop pairs used | 805 |  |  |
| Number with an unused Flip Flop | 552 | 805 | 68\% |
| Number with an unused LUT | 7 | 805 | 1\% |
| Number of fully used LUT-FF pairs | 246 | 805 | 30\% |
| Number of unique control sets | 9 |  |  |
| Number of slice register sites lost to control set restrictions | 23 | 12,480 | 1\% |
| Number of bonded IOBs | 95 | 172 | 55\% |
| IOB Flip Flops | 1 |  |  |
| Number of BUFG/BUFGCTRLs | 1 | 32 | 3\% |
| Number used as BuFGs | 1 |  |  |
| Average Fanout of Non-Clock Nets | 4.87 |  |  |

Table7.5b: Design summary of floating point multiplier using dadda algorithm.

### 7.4.2 Timing Summary

The proposed solution processes data at a rate of 8 bytes per cycle at 47.083 MHz . Clock frequency is used to calculate throughput. Timing summary provides statistics on average routing delays and performance versus constraints.
> Timing summary for floating point multiplier using Wallace algorithm:
Speed grade:-2
Minimum period: 21.239 ns
Minimum input arrival time before clock: 4.20ns
Maximum output required time after clock: 2.826 ns
Maximum combinational path delay: No path found
$>$ Timing summary for floating point multiplier using Dadda algorithm:
Speed grade:-2
Minimum period: 20.797ns
Minimum input arrival time before clock: 4.200 ns
Maximum output required time after clock: 2.826 ns
Maximum combinational path delay: No path found

## 8. ADVANTAGES DISADVANTAGES AND APPLICATIONS:

### 8.1 Advantages:

1. Floating point multiplier using Wallace and dadda algorithm designs presented here are very lean and require less resource when implemented on Virtex-5.
2. Wallace and dadda multiplier algorithms have less delay.

International Journal on Recent Researches in Science, Engineering \& Technology (IJRRSET)<br>A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTH, Trichy, Journal Indexed in JIR, DIIF and SJIF.<br>Available online at: www.jirset.com

ISSN $\Rightarrow: 2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

3. The number of logic levels required to perform the summation is reduced in Wallace and dadda algorithm compare to other multiplier algorithm techniques.
4. Wallace and dadda multipliers algorithms are faster because to generate less partial products these are adopt booth 3 algorithm. It uses smaller area and low power dissipation.
5. In both Wallace and dadda algorithm carry look ahead adders are used instead of carry select adders or ripple carry adders, so carry look ahead adder is one of the fastest adder and having more advantages among all the available adders.

### 8.2 Disadvantages:

1. Wallace and dadda algorithms are complex to layout in VLSI design and have irregular wires.

### 8.3 Applications:

1. High Speed Signal Processing that includes DSP based applications.
2. DWT and DCT transforms used for image and wide signal processing.
3. FIR and IIR Filters for high speed, low power filtering applications.
4. Multi-rate signal processing applications such as digital down converts and up converters

## CONLUSION

In the proposed work design of floating point multiplier using Wallace and Dadda algorithm with carry look ahead adder on FPGA is presented that is used for DSP applications. Modified booth3 algorithm is used to design fast multiplier. So floating point multiplier using Dadda algorithm with carry look ahead adder is faster than floating point multiplier using Wallace algorithm with carry look ahead adder. Inherently parallel design of algorithm allows an efficient hardware implementation. Dadda multiplier has smaller delay. The simulations and synthesis results of modules are provided.

IEEE 754 standard based floating point representation has been used. The unit has been coded in Verilog and has been synthesized. Carry look ahead adder is used in the design of final stage adder of Wallace and dadda tree used for mantissa multiplication and in the exponent addition. The dadda multiplier has less number of reduction stages and levels compared to other multiplier techniques.

Algorithms are designed using Xilinx ISE 14.2 design tool and implemented on Virtex-5. Synthesis report shows that proposed design achieves area and performance goals.
Comparison of synthesis report of floating point Wallace \& Dadda multipliers using carry look ahead adder:

|  | Floating point multiplication <br> using Wallace algorithm. | Floating point multiplication <br> using Dadda algorithm |
| :--- | :---: | :---: |
| No. of slices | $253 / 12480=2 \%$ | $253 / 12480=2 \%$ |
| No. of LUTs | $913 / 12480=7 \%$ | $798 / 12480=6 \%$ |
| Delay | 10.619 ns | 10.398 ns |

Table8.1 Delay comparison of Wallace and Dadda floating point multiplier

## FUTURE SCOPE

The designed floating point unit operates on 32-it operands. It can also design for 64 -bit operands to enhance precision. It can be extended to have more mathematical operations like addition, subtraction, division, square root, trigonometric, logarithmic and exponential functions. In future implementing higher compressors for Wallace tree and

# International Journal on Recent Researches in <br> Science, Engineering \& Technology (IJRRSET) <br> A Journal Established in early 2000 as National journal and upgraded to International journal in 2013 and is in existence for the last 10 years. It is run by Retired Professors from NTI, Triclyy, Journal Indexed in JIR, DIIF and SJIF. <br> Available online at: www.jriset.com 

ISSN $=2347-6729$
ISSN iam : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

Volume 9, Issue 1 - January 2021 - Pages 7-38
Dadda tree used for mantissa multiplication can further increase the efficiency of the floating point multiplier in terms of speed.

A few researchers have shown that there is a considerable improvement in the delay by using higher order 6:2, 7:2, 9:2 compressors for Wallace tree but no paper for Dadda tree. Exceptions like overflow, underflow, inexact, division by zero, infinity, NAN etc are incorporated in the floating point multiplier.

## Bibliography

1. Rizwan Mudassir, et al., "Switching Activity Reduction in Low Power Booth Multiplier'", IEEE, 2008.
2. M. 0. Lakshmanan, Alauddin Mohd Ali, "High Performance Parallel Multiplier Using Wallace-Booth
3. Algorithm," IEEE International Conference on Semiconductor Electronics, pp. 433-436, 2002.
4. Masataka Matsui and James B. Burr, "A Low-Voltage $32 \times 32$-Bit Multiplier in Dynamic Differential Logic,IEEE", 1995.
5. Nishant Bano et. al / VSRD International Journal of Electrical, Electronics \& Comm. Engg."Analysis of various 16-bit Booth Multipliers", Vol. 2 (1), 2012
6. Robert F. Shaw, "Arithmetic Operations in a Binary Computer," Review of Scientific Instruments, vol. 21, pp. 687-693, 1950.
7. J. C. Majithia and R. Kitai, "An Iterative Array for Multiplication of Signed Binary Numbers," IEEE Transactions on Electronic Computers, vol. EC-13, pp. 14-17, 1964.
8. R. De Mori, "Suggestions for an I.C. Fast Parallel Multiplier," Electronics Letters, vol. 5, pp. $50-51,1965$.
9. H. H. Guild, "Fully Iterative Fast Array for Binary Multiplication," Electronics Letters, vol. 38, pp. 843-852, 1968.
10. A.D.Pezaris, "A 40ns 17-bit by 17-bit Array Multiplier," IEEE Transactions on Computers, vol. C-20, pp. 442-447, 1971.
11. Edwin de Angel and Earl E. Swartzlander, Jr., "An Ultra Low Power Multiplier," International Conference on Signal Processing Applications and Technology, pp. 2118-2122, 1995.
12. S. Shafiulla Basha1, Syed. Jahangir Badashah, "Design and Implementation of radix-4 based high speed multiplier for alu's using minimal partial products" July 2012.314 Vol. 4, Issue 1, pp. 314-325
13. P. R. Cappello and K Steiglitz, "A VLSI layout for a pipe-lined Dadda multiplier," ACM Transactions on Computer Systems 1,2(May 1983) ,pp. 157-17
14. Baugh, Charles R.; Wooley, B.A., "A Two's Complement Parallel Array Multiplication Algorithm," Computers, IEEE Transactions on , vol.C-22, no.12, pp.1045,1047, Dec. 1973
15. Wallace, C. S., "A Suggestion for a Fast Multiplier," Electronic Computers, IEEE Transactions on, vol.EC-13, no.1, pp.14,17, Feb. 1964
16. L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, pp. 349-356, 1965

International Journal on Recent Researches in Science, Engineering \& Technology (IJRRSET)

ISSN $\Rightarrow: 2347-6729$
ISSN oum : 2348-3105
JIR IF : 2.54
SJIF IF : 4.334
Cosmos: 5.395

## Volume 9, Issue 1 - January 2021 - Pages 7-38

17. Townsend, W. Swartzlander, E. Abraham, J., "A Comparison of Dadda and Wallace Multiplier Delays". SPIE Advanced Signal Processing Algorithms, Architectures, and Implementations XIII.
18. Kogge, Peter M.; Stone, Harold S., "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," Computers, IEEE Transactions on, vol.C-22, no.8, pp.786,793, Aug. 1973
19. Brent, Richard P.; Kung, H. T., "A Regular Layout for Parallel Adders," Computers, IEEE Transactions on , vol.C-31, no.3, pp.260,264, March 1982
20. Han Tackdon, Carlson, D.A., "Fast area-efficient VLSI adders," Computer Arithmetic (ARITH), 1987 IEEE 8th Symposium on, vol., no.,pp.49,56, 18-21 May 1987
