Achieving Performance Speed-up in FPGA Based Bit-Parallel Multipliers using Embedded Primitive and Macro support

Burhan Khurshid, Roohie Naaz Mir


Modern Field Programmable Gate Arrays (FPGAs) are moving rapidly into the consumer market, and their domain has expanded from prototyping to low- and medium-volume production. FPGAs are proving to be an attractive replacement for Application Specific Integrated Circuits (ASICs), primarily because of the low Non-Recurring Engineering (NRE) costs associated with FPGA platforms. This has prompted FPGA vendors to improve the capacity and flexibility of the underlying primitive fabric and to include specialized macro support and intellectual property (IP) cores in their offerings. However, most work on FPGA implementations does not take full advantage of these offerings, primarily because designers rely mainly on technology-independent optimization to enhance system performance and neglect the speed-up that is achievable using these embedded primitives and macro support. In this paper, we consider the technology-dependent optimization of fixed-point bit-parallel multipliers by implementing them using the embedded primitives and macro support inherent in modern FPGAs. Our implementation targets three different FPGA families, viz. Spartan-6, Virtex-4 and Virtex-5. The implementation results indicate that a considerable speed-up in performance is achievable using these embedded FPGA resources.



