

# Efficient Compressor and Encoder Strategies for Cost-Effective Radix-4 Approximate Booth Multipliers

**Gogula Sivajyothi**, *M. Tech Student, CMR Engineering College, Hyderabad*; **Suman Mishra**, *Dept. of ECE, CMR Engineering College, Hyderabad*

#### Abstract:

Multiplication is essential in digital systems, and the design of efficient, cost-effective multipliers is important across many applications. This paper presents a methodology for reducing the hardware cost of approximate radix-4 Booth multipliers through simplified compressor and encoder designs. The primary objective is to balance computational accuracy against hardware cost, making the multiplier suitable for low-cost embedded systems and for applications that tolerate a degree of approximation. The proposed approach uses simplified compressors optimized for resource efficiency and for seamless integration into the approximate radix-4 Booth multiplier architecture. In addition, a new encoder design further reduces the hardware complexity of encoding partial products. These encoder designs are crafted to maintain acceptable accuracy while minimizing resource utilization.

Keywords: Approximate Booth multipliers, Booth encoders, Compressors, Xilinx ISE 14.7 tool

#### I. INTRODUCTION

In recent years, approximate multiplication has gained prominence as a promising way to improve energy efficiency [1]. By tolerating imprecise results produced by simplified hardware under tight power and energy budgets, it becomes feasible to run power-demanding but error-tolerant signal processing algorithms that rely on numerous multiplications. Examples include digital filtering, multimedia processing, and machine learning.

In pursuit of energy-efficient approximate computing, the key focus lies on simplifying the two critical stages of exact multiplication: partial product generation (PPG) and partial product accumulation (PPA) [2]. This paper explores the integration of approximate multiplication techniques and their role in enabling efficient DSP operations in resource-constrained environments. Through strategies targeting PPG and PPA simplification, this research aims to harness approximate multiplication for energy-efficient DSP systems.

To advance approximate partial product generation (PPG), conventional Booth encoding is modified to minimize the number of logic gates. Much previous research has focused on simplifying the exact radix-4 Booth encoder [2], specifically by selecting a predetermined number of approximate bits starting from the least significant bit and simplifying only the logic within that approximate region.

Additionally, hybrid radix encoding has been explored to reduce the number of partial products, thereby improving overall multiplier efficiency.

A baseline radix-4 Booth multiplier architecture is introduced that uses radix-4 modified Booth encoders for partial product generation (PPG) and 4-2 compressors [1] for partial product accumulation (PPA). For simplicity, the multiplicand $A$ and the multiplier $B$ are both treated as $n$-bit two's complement integers. Specifically, $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$ and $B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$, where $a_i$ and $b_i$ denote the $i$-th bits of the multiplicand and the multiplier, respectively.
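As a quick check of the two's complement definitions above, the following Python sketch evaluates the weighted sum for a bit vector stored least significant bit first. The function names are our own illustrative choices and are not part of the described design.

```python
def twos_complement_value(bits: list[int]) -> int:
    """Evaluate -x_{n-1}*2^(n-1) + sum_{i=0}^{n-2} x_i*2^i (bits[i] is bit i, LSB first)."""
    n = len(bits)
    return -bits[n - 1] * 2 ** (n - 1) + sum(bits[i] * 2 ** i for i in range(n - 1))


def to_bits(value: int, n: int) -> list[int]:
    """Return the n-bit two's complement bits of value, LSB first."""
    # Python's >> on negative ints is an arithmetic shift, so this works for value < 0.
    return [(value >> i) & 1 for i in range(n)]


print(to_bits(-3, 4))                      # [1, 0, 1, 1]
print(twos_complement_value([1, 0, 1, 1]))  # -3
```

Here $-3$ in 4 bits is $1101_2$, so the MSB contributes $-8$ and the remaining bits contribute $4 + 1$.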

The conventional radix-4 Booth encoder [2] examines three successive multiplier bits at a time, halving the number of partial products. For the $i$-th partial product ($0 \le i < n/2$), the $j$-th element $p_{i,j}$ is a Boolean function of the multiplier bits $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$ and the multiplicand bits $a_j$, $a_{j-1}$. This paper elaborates on the methodology and equations governing the encoding process, laying the groundwork for further exploration and optimization.
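The sketch below illustrates the radix-4 recoding $d_i = -2b_{2i+1} + b_{2i} + b_{2i-1}$ (with $b_{-1} = 0$) and a commonly used gate-level form of the exact encoder, $p_{i,j} = (b_{2i+1} \oplus a_j)(b_{2i} \oplus b_{2i-1}) + (b_{2i+1} \oplus a_{j-1})\overline{(b_{2i} \oplus b_{2i-1})}(b_{2i+1} \oplus b_{2i})$. We assume this is the Boolean equation intended above; it treats the digit $-0$ (bits $111$) as all zeros, which agrees with the exact entries of the K-map of the proposed encoder. Helper names are hypothetical.

```python
def bit(x: int, k: int) -> int:
    """Bit k of two's complement x; positions below 0 (e.g. b_{-1}, a_{-1}) are 0."""
    return 0 if k < 0 else (x >> k) & 1


def booth_digits(b: int, n: int) -> list[int]:
    """Radix-4 digits d_i = -2*b_{2i+1} + b_{2i} + b_{2i-1} for i = 0 .. n/2 - 1."""
    return [-2 * bit(b, 2 * i + 1) + bit(b, 2 * i) + bit(b, 2 * i - 1)
            for i in range(n // 2)]


def exact_pp_bit(aj: int, ajm1: int, b2: int, b1: int, b0: int) -> int:
    """Exact p_{i,j}; b2, b1, b0 = b_{2i+1}, b_{2i}, b_{2i-1}; aj, ajm1 = a_j, a_{j-1}."""
    single = b1 ^ b0                    # digit magnitude is 1: select a_j
    double = (1 - single) & (b2 ^ b1)   # digit magnitude is 2: select a_{j-1}
    # XOR with b2 inverts the selected bit for negative digits.
    return ((b2 ^ aj) & single) | ((b2 ^ ajm1) & double)


print(booth_digits(-3, 4))  # [1, -1], since 1 + (-1) * 4 = -3
```

For negative digits the encoder outputs the bitwise inverse of $|d_i| A$; the missing $+1$ is supplied by a separate negation bit, so each row's bits equal those of $d_i A - \mathrm{neg}_i$ in $n+1$ bits.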

In the partial product accumulation (PPA) stage of radix-4 Booth multipliers [2], the $n/2$ partial products are compressed into the final two rows by a series of compressors. Each compressor [6] accepts four elements from the same bit position, such as $p_{4i,j}$, $p_{4i+1,j}$, $p_{4i+2,j}$, and $p_{4i+3,j}$, and generates two elements, $q_{i,j}$ and $q_{i+1,j+1}$, which together encode the number of ones among the four inputs.

Because four inputs can contain up to four ones, which exceeds the range of a 2-bit result, a carry signal representing the surplus propagates horizontally to the neighboring compressor [1] so that the ones are counted exactly. This paper presents the formulation and design principles of the exact 4-2 compressor, which is essential for accurately summing partial products and enabling efficient computation in radix-4 Booth multipliers [5].
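To make the counting behavior concrete, the following sketch builds an exact 4-2 compressor from two full adders, a common construction; the paper's Eq. (2) may use a different but equivalent gate-level form. Mapping `sum` to $q_{i,j}$, `carry` to $q_{i+1,j+1}$, and `cin`/`cout` to $c_{i,j}$/$c_{i,j+1}$ is our assumption.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def exact_compressor_4_2(x1: int, x2: int, x3: int, x4: int,
                         cin: int) -> tuple[int, int, int]:
    """Exact 4-2 compressor: returns (sum, carry, cout).

    sum has weight 2^j; carry and cout both have weight 2^(j+1).
    cout depends only on x1..x3, so the horizontal carry does not ripple.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


# Four ones plus a carry-in: 1 + 2 * (1 + 1) = 5.
print(exact_compressor_4_2(1, 1, 1, 1, 1))  # (1, 1, 1)
```

The invariant $x_1 + x_2 + x_3 + x_4 + c_{in} = \mathit{sum} + 2(\mathit{carry} + c_{out})$ holds for all 32 input combinations, which is exactly the precise one-counting that the horizontal carry makes possible.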

The carry-in ($c_{i,j}$) and carry-out ($c_{i,j+1}$) signals of each compressor are essential for the exact computation of radix-4 Booth multipliers [2]. However, because the Boolean expressions in Equations (1) and (2) are complex and have long critical paths, a straightforward implementation of exact radix-4 Booth multiplication [2] requires substantial hardware resources. This is a significant challenge, since the multiplier is typically one of the most power-hungry components of contemporary digital VLSI circuits. This paper discusses the implications of these complex expressions for hardware resource utilization and explores strategies to reduce resource consumption while maintaining computational accuracy in radix-4 Booth multiplier designs.

#### II. EXISTING METHODOLOGY

Prior efforts to improve the energy efficiency of digital signal processing hardware have predominantly concentrated on individual components, such as simplified Booth encoders and compressors. However, overlooking their interdependence has limited the achievable energy savings within a given error margin. This paper introduces a co-design approach for approximate partial product generation (PPG) and partial product accumulation (PPA) units, aiming to realize a more cost-efficient multiplier architecture.

Our approach involves a careful investigation of the error distributions of approximate techniques and the development of simplified PPG and PPA units designed to produce errors of opposite sign. The proposed scheme significantly lowers hardware complexity while maintaining similar accuracy, thus improving processing efficiency.

Moreover, we provide a baseline radix-4 Booth multiplier system that consists of 4-2 compressors [6] for PPA and modified Booth encoders for PPG. We detail the intricate Boolean expressions involved in exact radix-4 Booth multiplier implementation and highlight the resource-intensive nature of this approach.

Furthermore, we compare the power, speed, and area efficiency of the proposed design with those of traditional multiplier architectures. Our results show substantial improvements in power dissipation, switching activity, speed, and area utilization, particularly for 16 × 16-bit multiplication. The square multiplier architecture emerges as the smallest and fastest of the reviewed architectures, offering significant advantages in speed and resource utilization.



Finally, we discuss optimization strategies for both compressor and encoder designs, including modified Booth encoding, priority encoding, reduced bit encoding, and efficient multiplexer implementation. These strategies aim to further enhance energy efficiency and performance while minimizing hardware resource requirements.

Overall, our comprehensive analysis underscores the importance of holistic co-design approaches in achieving energy-efficient and high-performance multiplier architectures for diverse signal processing applications.

#### III. PROPOSED METHOD

Our work presents a new method for designing an approximate radix-4 Booth multiplier that is both cost-effective and efficient. First, we present a modified Booth encoder that, compared with previous designs, deliberately produces aggressive unidirectional errors at the lowest hardware cost. Next, to offset the errors from the encoder, we design an approximate 4-2 compressor with the opposite error direction. As a result, the error distribution is balanced with a near-zero mean, and no complex error recovery circuits are needed. By carefully assessing error directions, we achieve a single-gate realization for the first time, which reduces hardware cost.

Fig. 1 shows the dot diagram of the approximate $16 \times 16$ multiplier, where colored dots represent the generated partial product elements and rectangular boxes represent the associated compressors. Red dots denote approximate partial product elements, illustrating how the proposed encoder and compressor are integrated. Specifically, approximate encoders are allocated to the $w_G$ least significant bit positions, excluding the sign-related partial product elements, so $w_G$ is the number of approximate bits in the PPG section. Similarly, $w_A$ is the number of approximate bits in the PPA section, which uses the approximate compressors shown as blue boxes in Fig. 1.

This article offers comprehensive insights into our methodology and presents experimental findings demonstrating the effectiveness of our approach in realizing approximate radix-4 Booth multipliers at low cost.



Fig. 1. Approximate Booth multiplication for 16 × 16 bits.

We construct the proposed approximate radix-4 Booth encoder by selectively changing output bits of the exact truth table, improving on earlier designs. Most changes flip zero outputs of the exact encoder to one, producing a strong bias toward positive errors. To minimize hardware cost, exactly one output (at $a_j a_{j-1} = 01$ and $b_{2i+1}b_{2i}b_{2i-1} = 011$) is flipped from one to zero, which introduces a negative error only under that condition. Therefore, the Boolean expression of the proposed encoder simplifies to $\tilde{p}_{i,j} = a_j + b_{2i+1}$, where $+$ denotes logical OR, so each element requires a single OR gate.
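The K-map can be checked mechanically. The sketch below compares the exact encoder (in the commonly used gate-level form, which we assume matches the exact truth table) against $\tilde{p}_{i,j} = a_j \lor b_{2i+1}$ over all 32 input combinations. Because partial-product bits below the sign position carry positive weight, a $0 \rightarrow 1$ flip adds error and a $1 \rightarrow 0$ flip subtracts it. Function names are hypothetical.

```python
from itertools import product


def exact_pp_bit(aj: int, ajm1: int, b2: int, b1: int, b0: int) -> int:
    """Exact radix-4 Booth bit; b2, b1, b0 = b_{2i+1}, b_{2i}, b_{2i-1}."""
    single = b1 ^ b0                    # digit is +/-1
    double = (1 - single) & (b2 ^ b1)   # digit is +/-2
    return ((b2 ^ aj) & single) | ((b2 ^ ajm1) & double)


def approx_pp_bit(aj: int, b2: int) -> int:
    """Proposed approximate encoder: a_j OR b_{2i+1}."""
    return aj | b2


def flip_cases() -> tuple[list[tuple[int, ...]], list[tuple[int, ...]]]:
    """(a_j, a_{j-1}, b_{2i+1}, b_{2i}, b_{2i-1}) cases flipped 0->1 and 1->0."""
    up, down = [], []
    for case in product((0, 1), repeat=5):
        aj, ajm1, b2, b1, b0 = case
        exact = exact_pp_bit(aj, ajm1, b2, b1, b0)
        approx = approx_pp_bit(aj, b2)
        if (exact, approx) == (0, 1):
            up.append(case)
        elif (exact, approx) == (1, 0):
            down.append(case)
    return up, down


up, down = flip_cases()
print(len(up), down)  # 13 [(0, 1, 0, 1, 1)]
```

The result reproduces the K-map: 13 outputs change from 0 to 1, and the single 1-to-0 change is the highlighted cell.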

Unlike earlier encoder-focused work, which often sacrifices cost reduction by considering only the error probability of Booth encoding, our approach prioritizes a low-complexity encoder by deliberately creating errors in a predefined direction, even at the expense of a higher error rate. Setting all encoder outputs to one would minimize the hardware cost of the PPG module, but the resulting errors would be unacceptable in practical applications, especially for small-valued operands.

This paper provides detailed insights into the design rationale and effectiveness of our approach, demonstrating its potential for cost-efficient implementation in practical systems.



K-MAP OF THE PROPOSED APPROXIMATE RADIX-4 BOOTH ENCODER

| $a_j a_{j-1}$ \ $b_{2i+1}b_{2i}b_{2i-1}$ | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
|---|---|---|---|---|---|---|---|---|
| 00 | 0 | 0 | 0 | 0 | 1 | $0 \rightarrow 1$ | 1 | 1 |
| 01 | 0 | 0 | **$1 \rightarrow 0$** | 0 | 1 | $0 \rightarrow 1$ | 1 | $0 \rightarrow 1$ |
| 11 | $0 \rightarrow 1$ | 1 | 1 | 1 | $0 \rightarrow 1$ | $0 \rightarrow 1$ | $0 \rightarrow 1$ | $0 \rightarrow 1$ |
| 10 | $0 \rightarrow 1$ | 1 | $0 \rightarrow 1$ | 1 | $0 \rightarrow 1$ | $0 \rightarrow 1$ | $0 \rightarrow 1$ | 1 |

Entries $x \rightarrow y$ mark outputs changed from the exact value $x$ to the approximate value $y$.

*Fig. 2 panels: Approximate Radix-4 Booth Encoder (output $\tilde{p}_{i,2i+j}$); Approximate 4-2 Compressor (output $\tilde{q}_{i,j}$).*

Fig. 2. PPA and PPG units at the gate level.

We propose a simplified approximate 4-2 compressor tailored to counteract the positive errors generated by the proposed encoder, as detailed in Table I. Because the encoder tends to induce positive errors, the compressor should produce negative errors so that the errors are recovered during partial product accumulation (PPA). Applying design principles analogous to the carry-free approximation described in [1], we examine the truth table of the 4-2 compressor operating on the outputs of the approximate radix-4 Booth encoder.

Note that the proposed compressor yields negative output changes relative to the baseline, as shown in Table II. Consequently, we derive simplified equations that facilitate error compensation during PPA. This paper provides a comprehensive overview of the design rationale and methodology behind the approximate 4-2 compressor, highlighting its role in error recovery within radix-4 Booth multipliers.

We introduce a new approximate 4-2 compressor design that markedly reduces hardware cost compared with the exact design in Eq. (2), as illustrated in Fig. 2. Owing to its simplified construction, the compressor requires only one 2-input AND gate. Because the proposed compressor is biased mainly toward negative errors, it offsets the positive errors introduced by the approximate radix-4 Booth encoder, eliminating the need for supplementary error recovery circuits.

Moreover, we investigate other ways of assigning the contrasting error directions to the encoder and compressor. While this approach holds promise for cost-effective designs, the underlying carry-free approximation of the 4-2 compressor introduces substantial negative errors, which makes a simple yet robust positive approximation difficult to achieve. This paper provides detailed insights into the design rationale and effectiveness of our approach, demonstrating its potential for practical implementation in digital systems.

We next examine the impact of error distribution balance on computing accuracy and on overall computational cost reduction in approximate radix-4 Booth multipliers [5]. Independently adjusting the number of approximate bits in the PPG and PPA modules (denoted $w_G$ and $w_A$, respectively) makes it possible to achieve a balanced error distribution with a near-zero mean.

Results indicate that approximating the radix-4 Booth encoder introduces errors in the positive direction, as anticipated, with both the magnitude of the mean error and its variance increasing with $w_G$ in the PPG module. Conversely, approximating the 4-2 compressor shows a comparable trend in the opposite direction. To center the error distribution around zero, we set a target $w_G$ value and evaluate the positive mean error of the $(w_G, 0)$ configuration.
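The positive trend for the $(w_G, 0)$ configuration can be reproduced with a small behavioral model. This sketch assumes the standard exact encoder, exact accumulation by integer addition (no compressor model), exact negation bits, and $w_G \le n$ so that sign bits are never approximated; it exhaustively enumerates small operands rather than the paper's 16-bit case. All names are hypothetical.

```python
from itertools import product


def bit(x: int, k: int) -> int:
    """Bit k of two's complement x; positions below 0 (b_{-1}, a_{-1}) are 0."""
    return 0 if k < 0 else (x >> k) & 1


def exact_pp_bit(aj: int, ajm1: int, b2: int, b1: int, b0: int) -> int:
    single = b1 ^ b0
    double = (1 - single) & (b2 ^ b1)
    return ((b2 ^ aj) & single) | ((b2 ^ ajm1) & double)


def booth_multiply(a: int, b: int, n: int, wg: int = 0) -> int:
    """Radix-4 Booth product of n-bit signed a and b (n even), summed exactly.

    Bits in columns 2i + j < wg use the approximate encoder a_j OR b_{2i+1};
    all other bits and the negation bits stay exact (modeling assumption).
    """
    assert n % 2 == 0 and 0 <= wg <= n  # wg <= n keeps sign bits exact
    total = 0
    for i in range(n // 2):
        b2, b1, b0 = bit(b, 2 * i + 1), bit(b, 2 * i), bit(b, 2 * i - 1)
        pp = 0
        for j in range(n + 1):                   # n + 1 bits per partial product
            aj, ajm1 = bit(a, j), bit(a, j - 1)  # bit(a, n) is the sign extension
            if 2 * i + j < wg:
                p = aj | b2
            else:
                p = exact_pp_bit(aj, ajm1, b2, b1, b0)
            pp += -(p << n) if j == n else p << j  # MSB carries weight -2^n
        neg = b2 & (1 - (b1 & b0))               # +1 that completes the negation
        total += (pp + neg) << (2 * i)
    return total


def mean_error(n: int, wg: int) -> float:
    """Mean signed error (approximate - exact) over all n-bit operand pairs."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    errors = [booth_multiply(a, b, n, wg) - a * b
              for a, b in product(range(lo, hi), repeat=2)]
    return sum(errors) / len(errors)


for wg in (0, 3, 5):
    print(wg, mean_error(6, wg))
```

With `wg = 0` the model is exact, and the mean error grows as `wg` increases. Modeling $w_A > 0$ would require the approximate compressor's truth table (Table II), which this sketch does not attempt to reproduce.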

This paper presents detailed insights into the design methodology and experimental results, highlighting the effectiveness of error distribution balancing in improving computing accuracy and reducing computational costs in approximate radix-4 Booth multipliers.

In the development of $n \times n$ multipliers, a sequence of $(w_G, w_A)$ combinations is devised to attain error distributions centered around zero. The objective is to identify the most energy-efficient configuration that still delivers satisfactory algorithmic performance for the signal processing task at hand. Once a particular pair is verified to meet the performance criteria at the algorithm level, subsequent pairs yielding more approximate results are examined to progressively reduce computational cost. In particular, emphasizing the choice of $w_G$ makes the best use of the approximate encoders to reduce energy consumption.





Fig. 3. 16-bit radix-4 Booth multiplier.

The proposed designs were described in Verilog and evaluated. The key metrics considered were delay, area, and power consumption.



Fig. 4. Simulation results of the 16-bit radix-4 Booth multiplier.

These approximate multipliers demonstrated better accuracy than the moderately power-hungry Booth multipliers currently in use. Despite being approximate, they achieved improved accuracy while maintaining reasonable power efficiency. In summary, these approximate radix-4 Booth multipliers strike a balance between precision and power consumption, making them suitable for various applications in digital signal processing and beyond.

#### V. CONCLUSION

Based on the analysis of the 16-bit radix-4 Booth multiplier presented in this study, radix-4 offers a balanced trade-off between computational efficiency and hardware complexity. Compared with radix-8 and radix-16, radix-4 achieves a moderate delay and area profile suitable for DSP applications. Although radix-16 offers reduced delay, it increases hardware area because, even though it produces fewer partial products, generating them requires precomputed odd multiples of the multiplicand (3A, 5A, and 7A). Conversely, radix-8, while more area-efficient than radix-4, may not meet the stringent delay requirements of high-end processors such as 32-bit or 64-bit designs.
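The radix trade-off can be made concrete with a simplified counting model. Radix-$2^k$ Booth recoding reduces an $n$-bit signed multiplier to $\lceil n/k \rceil$ digits in $[-2^{k-1}, 2^{k-1}]$; even multiples are shifts of smaller ones, so only the odd multiples above 1 need dedicated precomputation adders. This sketch ignores sign-extension and correction bits and says nothing about delay; the function name is hypothetical.

```python
import math


def booth_profile(n: int, k: int) -> tuple[int, list[int]]:
    """Partial-product count and 'hard' multiples for radix-2**k Booth, n-bit signed operands."""
    num_partial_products = math.ceil(n / k)
    max_digit = 2 ** (k - 1)
    # Odd multiples 3A, 5A, ... cannot be formed by shifting A, so they need adders.
    hard_multiples = list(range(3, max_digit + 1, 2))
    return num_partial_products, hard_multiples


for radix_bits in (2, 3, 4):
    print(2 ** radix_bits, booth_profile(16, radix_bits))
# 4 (8, [])
# 8 (6, [3])
# 16 (4, [3, 5, 7])
```

For 16-bit operands, moving from radix-4 to radix-16 halves the partial-product count but adds three precomputed multiples, which is where the extra area comes from.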

The choice of Booth algorithm, whether radix-4, radix-8, or radix-16, should be carefully aligned with the specific performance demands and resource constraints of the target application. Minor modifications in methodology, techniques, or hardware logic can significantly influence results, underscoring the need for detailed analysis tailored to the architecture and the operand bit length. Ongoing research into diverse radix multipliers and the impact of different adder configurations further highlights the evolving landscape of multiplication techniques in digital signal processing and beyond.

## REFERENCES

[1] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, Design and analysis of approximate compressors for multiplication, IEEE Trans. Comput., vol. 64, pp. 984–994, Apr. 2015.

[2] H. Jiang, J. Han, F. Qiao, and F. Lombardi, Approximate radix-8 booth multipliers for low-power and high-performance operation, IEEE Trans. Comput., vol. 65, pp. 2638–2644, Aug. 2016.

[3] V. Leon, K. Pekmestzi, and D. Soudris, Exploiting the potential of approximate arithmetic in DSP & AI hardware accelerators, in Proc. Int. Conf. Field-Programmable Logic Appl. (FPL), 2021, pp. 263–264.

[4] M. Shafique, R. Hafiz, S. Rehman, W. El-Harouni, and J. Henkel, Cross-layer approximate computing: From logic to architectures, in Proc. ACM/IEEE Design Autom. Conf. (DAC), 2016, pp. 1–6.

[5] V. Leon, G. Zervakis, S. Xydis, D. Soudris, and K. Pekmestzi, Walking through the energy-error pareto frontier of approximate multipliers, IEEE Micro, vol. 38, pp. 40–49, Jul./Aug. 2018.

[6] H. Pei, X. Yi, H. Zhou, and Y. He, Design of ultralow power consumption approximate 4–2 compressors based on the compensation characteristic, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, pp. 461–465, Jan. 2021.



[7] Z. Aizaz and K. Khare, Area and power efficient truncated booth multipliers using approximate carry based error compensation, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 2, pp. 579–583, Feb. 2022.

[8] O. Spantidi, G. Zervakis, I. Anagnostopoulos, H. Amrouch, and J. Henkel, Positive/negative approximate multipliers for DNN accelerators, in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD), 2021, pp. 1–9.

[9] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, Design of approximate radix-4 booth multipliers for error-tolerant computing, IEEE Trans. Comput., vol. 66, no. 8, pp. 1435–1441, Aug. 2017.

[10] R. Pilipović and P. Bulić, On the design of logarithmic multiplier using radix-4 booth encoding, IEEE Access, vol. 8, pp. 64578–64590, 2020.

[11] H. Waris, C. Wang, and W. Liu, Hybrid low radix encoding-based approximate booth multipliers, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 12, pp. 3367–3371, Dec. 2020.

[12] "Design and Analysis of High-Performance Radix-4 Booth Multipliers" (2018, IEEE Transactions on Circuits and Systems)

[13] "Approximate Multipliers for Low-Power Digital Signal Processing" (2019, ACM Journal on Emerging Technologies in Computing Systems)

[14] "Energy-Efficient Multiplier Architectures Using Approximate Computing" (2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems)

[15] "Compressor Design Techniques for Low-Power Radix-4 Booth Multipliers" (2021, IEEE Transactions on Computers)

[16] "Error-Tolerant Strategies in Approximate Booth Multipliers" (2017, IEEE Transactions on Circuits and Systems for Video Technology)

[17] "Optimization of Radix-4 Booth Multipliers with Approximate Compressors" (2022, IEEE Access)

[18] "Power and Area-Efficient Multiplier Designs Using Approximate Computing" (2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

[19] "Architectural Techniques for Improving Performance of Booth Multipliers" (2019, ACM Transactions on Design Automation of Electronic Systems)

[20] "Approximate Arithmetic Circuits for Energy-Efficient Digital Systems" (2020, IEEE Transactions on Circuits and Systems II: Express Briefs)

[21] "Design Space Exploration of Approximate Multipliers for DSP Applications" (2021, IEEE Transactions on Signal Processing)