This document describes a proposed multiply-accumulate (MAC) unit design that aims to improve performance and reduce resource usage relative to conventional pipelined MAC units. The design uses Booth encoding to reduce the number of partial products, groups the partial products into blocks that are summed by multi-operand adders, and implements circular convolution by rearranging the partial products. Simulation results show that the proposed design achieves higher performance and lower resource usage than conventional pipelined and redundant carry-save MAC units. The design is synthesized on an Altera Stratix III FPGA to take advantage of its fast carry chains.
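The first two ideas in the summary, Booth encoding and block-wise summation of partial products, can be illustrated with a small behavioral sketch. This is not the authors' hardware design. It assumes radix-4 Booth encoding (the summary does not state the radix), and the names `booth_radix4_digits`, `partial_products`, `mac`, and the `block_size` parameter are hypothetical. Python's built-in `sum` stands in for a multi-operand adder, and the circular-convolution rearrangement is not modeled.

```python
from typing import List


def booth_radix4_digits(y: int, bits: int) -> List[int]:
    """Radix-4 Booth digits (each in -2..2) of a signed `bits`-bit integer.

    Assumption: radix-4 recoding, which halves the number of partial products.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given signed width")
    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def partial_products(x: int, y: int, bits: int) -> List[int]:
    """One partial product per Booth digit: digit * x, weighted by 4**i."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]


def mac(acc: int, x: int, y: int, bits: int = 8, block_size: int = 2) -> int:
    """Behavioral model of acc + x*y with block-wise partial-product summation.

    `block_size` (hypothetical) is how many partial products each
    multi-operand adder sums; `sum` models that adder.
    """
    pps = partial_products(x, y, bits)
    blocks = [pps[i:i + block_size] for i in range(0, len(pps), block_size)]
    block_sums = [sum(block) for block in blocks]  # one multi-operand add per block
    return acc + sum(block_sums)  # final accumulation


print(booth_radix4_digits(6, 4))  # [-2, 2], since -2 + 2*4 = 6
print(mac(10, 3, -5))             # -5, i.e. 10 + 3*(-5)
```

With an 8-bit multiplier the model produces four partial products instead of eight, which is the reduction Booth encoding is meant to provide. The block size only changes how the additions are grouped, not the result.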