Design a micro-architecture for a multiplier

Design a full precision multiplier that inputs two 401-bit unsigned operands and outputs their 802-bit product.
The design must be mathematically correct and should consider the following optimization parameters:
The design is for a Xilinx UltraScale+ device and should be focused around sensible usage of the DSP48E2 slice.
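
As a reference for checking mathematical correctness, the following is a minimal software sketch of one possible operand decomposition, not a prescribed architecture. It assumes the operands are split into unsigned limbs of 26 and 17 bits so that each partial product fits the DSP48E2's 27x18 signed multiplier; the limb widths and the names `split_limbs` and `multiply_by_limbs` are illustrative assumptions, not part of the task statement.

```python
# Functional model only: it checks the arithmetic of a limb decomposition,
# it does not model timing, pipelining, or the DSP48E2 cascade paths.

OPERAND_BITS = 401
A_LIMB_BITS = 26  # assumed: unsigned value that fits the 27-bit signed port
B_LIMB_BITS = 17  # assumed: unsigned value that fits the 18-bit signed port


def split_limbs(value, limb_bits, total_bits):
    """Split an unsigned integer into little-endian limbs of limb_bits each."""
    count = -(-total_bits // limb_bits)  # ceiling division
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(count)]


def multiply_by_limbs(a, b):
    """Sum every limb-by-limb partial product at its bit offset."""
    a_limbs = split_limbs(a, A_LIMB_BITS, OPERAND_BITS)
    b_limbs = split_limbs(b, B_LIMB_BITS, OPERAND_BITS)
    product = 0
    for i, a_limb in enumerate(a_limbs):
        for j, b_limb in enumerate(b_limbs):
            # Each term here corresponds to one 26x17-bit multiplication.
            shift = i * A_LIMB_BITS + j * B_LIMB_BITS
            product += (a_limb * b_limb) << shift
    return product


if __name__ == "__main__":
    largest = (1 << OPERAND_BITS) - 1
    assert multiply_by_limbs(largest, largest) == largest * largest
    print("limb model matches the full-precision product")
```

Under these assumed limb widths the operands split into 16 and 24 limbs, giving 384 partial products. Other splits, or Karatsuba-style schemes, change that count and are part of the tradeoff space.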

Key considerations should include:
- DSP48E2 capabilities and utilization
- Input and output interfaces
- Interconnecting and cascading
- Peripherals such as BRAM and CLB
- Pipelining and folding

It is strongly desired for a solution to consider multiple architectures and discuss the tradeoffs between them.
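
To make the pipelining-versus-folding tradeoff concrete, here is a rough back-of-the-envelope sketch. It assumes 384 partial products (a hypothetical 26-bit by 17-bit limb split of the 401-bit operands), one partial product per DSP48E2 per clock cycle, and it ignores pipeline latency and the adders needed to combine results. The function name `folding_options` and the chosen fold factors are illustrative only.

```python
import math

# Assumed: 16 limbs x 24 limbs from a 26-bit / 17-bit split of 401-bit operands.
PARTIAL_PRODUCTS = 16 * 24


def folding_options(partial_products, fold_factors):
    """Return (fold, dsp_slices, cycles_per_product) for each fold factor.

    A fold factor f means each DSP slice is time-shared over f partial
    products, so a new full product is accepted every f cycles.
    """
    return [(f, math.ceil(partial_products / f), f) for f in fold_factors]


if __name__ == "__main__":
    for fold, dsps, cycles in folding_options(PARTIAL_PRODUCTS, [1, 4, 16, 24]):
        print(f"fold={fold:2d}  DSP48E2 slices={dsps:3d}  cycles per product={cycles}")
```

A fully parallel design sits at one end of this range and a heavily folded one at the other. A real comparison would also need to account for BRAM or register storage for operands, routing, and the achievable clock rate.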

The solution should be presented in the form of detailed block diagrams. Online diagramming tools can be used.

To apply, please send your answer to