Commit 9a38ae89 authored by Masood Raeisi

updated document

parent e19cc105
......@@ -125,7 +125,7 @@ make
This automatically downloads and compiles VPR. Do not pass the \texttt{-j} option, since it leads to an error during compilation.
In the directory \textbf{\textit{example}}, you will find an example ZUMA configuration file, \textbf{\textit{zuma\_config.py}}.
In the \textbf{\textit{example}} directory of the downloaded ZUMA source code, you will find an example ZUMA configuration file, \textbf{\textit{zuma\_config.py}}.
The most important options in \textbf{\textit{zuma\_config.py}} are the following:
\begin{lstlisting}
......@@ -160,8 +160,6 @@ build/packedOverlay.v (structured overlay)
build/packedOverlayBlackBox.v (a blackboxed overlay)
\end{lstlisting}
% ZUMA\_custom\_generated file
For further information on the ZUMA overlay and its configuration file, refer to the ZUMA documentation.
\subsection{RapidWright}
......@@ -224,7 +222,7 @@ params.sdfFileName = "../timing.sdf"
\section{Design Flows}\label{sec:design_flows}
In this section we explain the methods we used to restructure the ZUMA overlay.
The first step for all methods is to choose a hierarchical model of the virtual architecture for the Verilog design generated by ZUMA. We studied three hierarchical models: cluster, intercluster, and hierarchical. In cluster mode, ZUMA adds only one level of hierarchy and puts all nodes inside each cluster into a respective Verilog module. This is depicted in Figure~\ref{fig:cluster}. In intercluster mode, ZUMA creates separate modules for each interconnect and BLE (see Figure~\ref{fig:intercluster}). In hierarchical mode, ZUMA generates a hierarchical design (see Figure~\ref{fig:hierarchy}).
The first step for all methods is to choose a hierarchical model of the virtual architecture for the Verilog design generated by ZUMA. We studied three hierarchical models: cluster, intercluster, and hierarchical. In cluster mode, ZUMA adds only one level of hierarchy and places all nodes of each cluster, including its BLEs and interconnects, into a respective Verilog module. This is depicted in Figure~\ref{fig:cluster}. In intercluster mode, ZUMA creates separate modules for each intra-cluster interconnect and each BLE (see Figure~\ref{fig:intercluster}). In hierarchical mode, ZUMA generates a hierarchical design (see Figure~\ref{fig:hierarchy}): the ZUMA overlay forms a separate module per cluster, and each cluster module instantiates the BLE and interconnect modules that belong to that cluster.
\begin{figure*}[ht]
\centering
......@@ -265,7 +263,7 @@ The code first implements the ZUMA overlay in order to extract the utilization r
Offset variables will add some additional rows or columns to the module's pblock.
Then, the code generates the pblocks for each cluster (or for the interconnects and BLEs) and assigns the modules to these pblocks. Finally, it implements the design using the assigned pblock constraints.
% {The Verilog files inside Verilog folder are from ZUMA}
To keep the Tcl scripts simple, the Verilog source files in the verilog folder are taken from the ZUMA project.
In all three automated scripts, we first implement the ZUMA overlay without any floorplanning or out-of-context (OOC) technique to generate the utilization report. In the next step, we parse the utilization report to estimate the size of the pblock needed to implement each cluster, intercluster, or BLE.
......@@ -343,6 +341,7 @@ javac ZUMAPlacer.java
\section{Experimental Setup}\label{sec:experimental_setup}
We tested the three methods with all three hierarchical models in different configurations. The configurations for the tests are shown in Table~\ref{tab:tests}.
For all tests, \textbf{\textit{interconn\_distance}} and \textbf{\textit{ble\_distance}} are set to $1$.
We also evaluated one special case of a $2\times 2$ virtual FPGA overlay with facing BLEs (\textbf{\textit{2-2-hierarchy-facing}}). In this case, the BLEs from the same cluster row face each other.
\begin{table}
\centering
......@@ -376,61 +375,63 @@ For all tests, \textbf{\textit{interconn\_distance}} and \textbf{\textit{ble\_di
\hline
& method & comb Min. [MHz] & comb Avg. [MHz] & comb Max. [MHz] & muladd Min. [MHz] & muladd Avg. [MHz] & muladd Max. [MHz] \\
\hline \hline
ZUMA & - & 42.5854477008117 & 42.5854477008117 & 50.5198492487698 & 18.2575371677813 & 18.2575371677813 & 21.6272797856304 \\
ZUMA-Cluster & - & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
ZUMA-Intercluster & - & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
ZUMA-Hierarchical & - & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
ZUMA & - & 42.585 & 42.585 & 50.519 & 18.257 & 18.257 & 21.627 \\
ZUMA-Cluster & - & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
ZUMA-Intercluster & - & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
ZUMA-Hierarchical & - & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
\hline \hline
2-2-cluster-1 & Floorplaning & 95.3288846520496 & 95.3288846520496 & 105.709362678252 &
41.5392795427356 & 41.5392795427356 & 45.4353386977323 \\
2-2-cluster-1 & OutOfContext & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-1 & RapidWright & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-1 & Floorplaning & 95.328 & 95.328 & 105.709 & 41.539 & 41.539 & 45.435 \\
2-2-cluster-1 & OutOfContext & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
2-2-cluster-1 & RapidWright & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
\hline
2-2-cluster-5 & Floorplaning & 93.51999925184 & 93.51999925184 & 104.310093044603 & 40.8341602250779 & 40.8341602250779 & 44.7267197423741 \\
2-2-cluster-5 & OutOfContext & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-5 & RapidWright & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-5 & Floorplaning & 93.519 & 93.519 & 104.310 & 40.834 & 40.834 & 44.726 \\
2-2-cluster-5 & OutOfContext & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
2-2-cluster-5 & RapidWright & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
\hline
2-2-cluster-10 & Floorplaning & 97.7803852547179 & 97.7803852547179 & 108.898060525542 & 41.0524198348872 & 41.0524198348872 & 44.9745444078652 \\
2-2-cluster-10 & OutOfContext & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-10 & RapidWright & 109.547023059648 & 109.547023059648 & 116.819700474288 & 43.1676415359047 & 43.1676415359047 & 46.0816751610554 \\
2-2-cluster-10 & Floorplaning & 97.780 & 97.780 & 108.898 & 41.052 & 41.052 & 44.974 \\
2-2-cluster-10 & OutOfContext & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
2-2-cluster-10 & RapidWright & 109.547 & 109.547 & 116.819 & 43.167 & 43.167 & 46.081 \\
\hline
2-2-intercluster-1 & Floorplaning & 96.6286271970934 & 96.6286271970934 & 104.601416303177 & 40.9285878000066 & 40.9285878000066 & 44.1739038245766 \\
2-2-intercluster-1 & OutOfContext & 102.171136653895 & 102.171136653895 & 108.800905223532 & 42.3701883354872 & 42.3701883354872 & 45.2083653559255 \\
2-2-intercluster-1 & RapidWright & 102.171136653895 & 102.171136653895 & 108.800905223532 & 42.3701883354872 & 42.3701883354872 & 45.2083653559255 \\
2-2-intercluster-1 & Floorplaning & 96.628 & 96.628 & 104.601 & 40.928 & 40.928 & 44.173 \\
2-2-intercluster-1 & OutOfContext & 102.171 & 102.171 & 108.800 & 42.370 & 42.370 & 45.208 \\
2-2-intercluster-1 & RapidWright & 102.171 & 102.171 & 108.800 & 42.370 & 42.370 & 45.208 \\
\hline
2-2-intercluster-5 & Floorplaning & 97.369087262176 & 97.369087262176 & 105.555379629923 & 40.270618556701 & 40.270618556701 & 43.5183734572737 \\
2-2-intercluster-5 & OutOfContext & 107.892323461186 & 107.892323461186 & 114.966314869743 & 42.8494911622925 & 42.8494911622925 & 45.7370758457929 \\
2-2-intercluster-5 & RapidWright & 107.892323461186 & 107.892323461186 & 114.966314869743 & 42.8494911622925 & 42.8494911622925 & 45.7370758457929 \\
2-2-intercluster-5 & Floorplaning & 97.369 & 97.369 & 105.555 & 40.270 & 40.270 & 43.518 \\
2-2-intercluster-5 & OutOfContext & 107.892 & 107.892 & 114.966 & 42.849 & 42.849 & 45.737 \\
2-2-intercluster-5 & RapidWright & 107.892 & 107.892 & 114.966 & 42.849 & 42.849 & 45.737 \\
\hline
2-2-intercluster-10 & Floorplaning & 97.5238689669297 & 97.5238689669297 & 105.704893079501 & 40.2382102044101 & 40.2382102044101 & 43.4510567296997 \\
2-2-intercluster-10 & OutOfContext & 103.632312555055 & 103.632312555055 & 110.374057681483 & 42.6958136754691 & 42.6958136754691 & 45.5605773436361 \\
2-2-intercluster-10 & RapidWright & 103.632312555055 & 103.632312555055 & 110.374057681483 & 42.6958136754691 & 42.6958136754691 & 45.5605773436361 \\
2-2-intercluster-10 & Floorplaning & 97.523 & 97.523 & 105.704 & 40.238 & 40.238 & 43.451 \\
2-2-intercluster-10 & OutOfContext & 103.632 & 103.632 & 110.374 & 42.695 & 42.695 & 45.560 \\
2-2-intercluster-10 & RapidWright & 103.632 & 103.632 & 110.374 & 42.695 & 42.695 & 45.560 \\
\hline
2-2-hierarchy-1 & Floorplaning & 96.6286271970934 & 96.6286271970934 & 104.601416303177 & 40.9285878000066 & 40.9285878000066 & 44.1739038245766 \\
2-2-hierarchy-1 & OutOfContext & 102.171136653895 & 102.171136653895 & 108.800905223532 & 42.3701883354872 & 42.3701883354872 & 45.2083653559255 \\
2-2-hierarchy-1 & RapidWright & 102.171136653895 & 102.171136653895 & 108.800905223532 & 42.3701883354872 & 42.3701883354872 & 45.2083653559255 \\
2-2-hierarchy-1 & Floorplaning & 96.628 & 96.628 & 104.601 & 40.928 & 40.928 & 44.173 \\
2-2-hierarchy-1 & OutOfContext & 102.171 & 102.171 & 108.800 & 42.370 & 42.370 & 45.208 \\
2-2-hierarchy-1 & RapidWright & 102.171 & 102.171 & 108.800 & 42.370 & 42.370 & 45.208 \\
\hline
2-2-hierarchy-5 & Floorplaning & 97.369087262176 & 97.369087262176 & 105.555379629923 & 40.270618556701 & 40.270618556701 & 43.5183734572737
\\
2-2-hierarchy-5 & OutOfContext & 107.892323461186 & 107.892323461186 & 114.966314869743 & 42.8494911622925 & 42.8494911622925 & 45.7370758457929 \\
2-2-hierarchy-5 & RapidWright & 107.892323461186 & 107.892323461186 & 114.966314869743 & 42.8494911622925 & 42.8494911622925 & 45.7370758457929 \\
2-2-hierarchy-5 & Floorplaning & 97.369 & 97.369 & 105.555 & 40.270 & 40.270 & 43.518 \\
2-2-hierarchy-5 & OutOfContext & 107.892 & 107.892 & 114.966 & 42.849 & 42.849 & 45.737 \\
2-2-hierarchy-5 & RapidWright & 107.892 & 107.892 & 114.966 & 42.849 & 42.849 & 45.737 \\
\hline
2-2-hierarchy-10 & Floorplaning & 97.5238689669297 & 97.5238689669297 & 105.704893079501 & 40.2382102044101 & 40.2382102044101 & 43.4510567296997 \\
2-2-hierarchy-10 & OutOfContext & 103.632312555055 & 103.632312555055 & 110.374057681483 & 42.6958136754691 & 42.6958136754691 & 45.5605773436361 \\
2-2-hierarchy-10 & RapidWright & 103.632312555055 & 103.632312555055 & 110.374057681483 & 42.6958136754691 & 42.6958136754691 & 45.5605773436361 \\
2-2-hierarchy-10 & Floorplaning & 97.523 & 97.523 & 105.704 & 40.238 & 40.238 & 43.451 \\
2-2-hierarchy-10 & OutOfContext & 103.632 & 103.632 & 110.374 & 42.695 & 42.695 & 45.560 \\
2-2-hierarchy-10 & RapidWright & 103.632 & 103.632 & 110.374 & 42.695 & 42.695 & 45.560 \\
\hline \hline
2-2-hierarchy-facing & Floorplaning & 100.327066235929 & 100.327066235929 & 108.53874290428 & 40.6408245210479 & 40.6408245210479 & 43.8904494382023 \\
2-2-hierarchy-facing & OutOfContext & 103.890706976261 & 103.890706976261 & 110.712545945707 & 42.4295139699175 & 42.4295139699175 & 45.2726317886311 \\
2-2-hierarchy-facing & RapidWright & 103.890706976261 & 103.890706976261 & 110.712545945707 & 42.4295139699175 & 42.4295139699175 & 45.2726317886311 \\
2-2-hierarchy-facing & Floorplaning & 100.327 & 100.327 & 108.538 & 40.640 & 40.640 & 43.890 \\
2-2-hierarchy-facing & OutOfContext & 103.890 & 103.890 & 110.712 & 42.429 & 42.429 & 45.272 \\
2-2-hierarchy-facing & RapidWright & 103.890 & 103.890 & 110.712 & 42.429 & 42.429 & 45.272 \\
\hline
\end{tabular}}
\caption{Test results for xc7z100ffg900-2 FPGA. The table reports on the calculated operating frequencies.}
\caption{Test results for the xc7z100ffg900-2 FPGA. The table reports the calculated operating frequencies in MHz.}
\label{tab:results}
\end{table}
As we can see from the results in Table~\ref{tab:results}, by adding one level of hierarchy (cluster) to the ZUMA-generated Verilog design, Vivado is able to find a better placement, which leads to a considerable timing improvement. Breaking the clusters further into interconnects and BLEs, whether hierarchically or non-hierarchically, does not guide Vivado any further. \\
In Table~\ref{tab:results}, we present the calculated minimum, average, and maximum clock frequencies for the configurations listed in Table~\ref{tab:tests}. Since the calculated clock frequency depends on the virtual circuit, we examined the timing of each configuration with two circuits. Both sample virtual circuits are provided in the src folder of the project directory. The first circuit, \textit{comb}, uses nearly half of the virtual FPGA's logic elements (BLEs), whereas the \textit{muladd} circuit uses almost all of them. Since the timing report from Vivado (SDF file) gives the same value for the minimum and average delay of each node, the calculated minimum and average clock frequencies are always identical. For comparison, we also provide the calculated clock frequencies of these two circuits for the ZUMA overlay without any hierarchical design (ZUMA) and for the ZUMA hierarchical designs without any placement method (ZUMA-Cluster, ZUMA-Intercluster, and ZUMA-Hierarchical).
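As a rough reading aid, and assuming that each reported frequency is simply the reciprocal of the critical-path delay $T_{\mathrm{crit}}$ of the virtual circuit (both the symbol and this conversion are an illustration, not taken from the scripts), a frequency in MHz relates to a delay in ns as
\[
  f\,[\mathrm{MHz}] = \frac{1000}{T_{\mathrm{crit}}\,[\mathrm{ns}]},
  \qquad
  f = 109.547~\mathrm{MHz} \;\Longleftrightarrow\; T_{\mathrm{crit}} \approx 9.13~\mathrm{ns}.
\]
Under this assumption, identical minimum and average delays necessarily yield identical minimum and average frequencies.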
As we can see from the results, by adding one level of hierarchy (cluster) to the ZUMA-generated Verilog design, Vivado is able to find a better placement, which leads to a considerable timing improvement. Breaking the clusters further into interconnects and BLEs, whether hierarchically or non-hierarchically, does not guide Vivado any further. \\
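To put this improvement into numbers, the minimum-frequency entries of the ZUMA and ZUMA-Cluster rows in Table~\ref{tab:results} can be compared directly. The ratios below are only a worked illustration of those table entries:
\[
  \frac{109.547}{42.585} \approx 2.57 \quad (\mathit{comb}),
  \qquad
  \frac{43.167}{18.257} \approx 2.36 \quad (\mathit{muladd}).
\]
The ZUMA-Intercluster and ZUMA-Hierarchical rows carry the same values as ZUMA-Cluster, so the ratios are unchanged for them.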
Using the floorplanning flow, we can achieve better timing than with the unstructured ZUMA overlay. However, the out-of-context (OOC) flow produces better results than floorplanning, since Vivado is able to spend more time on place-and-route per module. Using the RapidWright method, we did not observe any difference in timing compared to the OOC flow. Our first goal with RapidWright was to fully relocate each module in both the x and y directions. However, due to the irregular column pattern of the FPGA, moving modules in the x direction is not possible. We tried to generate multiple implementations per module (for different x locations); however, due to the relatively large size of the clusters and interclusters, each module covered a unique column pattern, making it specific to the position in which it was implemented.
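As an illustration of the gap between the two flows, the minimum-frequency entries of the \textbf{\textit{2-2-cluster-1}} rows in Table~\ref{tab:results} (OutOfContext over Floorplaning) give
\[
  \frac{109.547}{95.328} \approx 1.15 \quad (\mathit{comb}),
  \qquad
  \frac{43.167}{41.539} \approx 1.04 \quad (\mathit{muladd}).
\]
These ratios are read off a single configuration and are meant as an example only; the other configurations in the table show gaps of a similar order.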
\section{Conclusion}\label{sec:Conclusion}
Our study shows that structuring the ZUMA overlay Verilog code into cluster modules helps Vivado to place the nodes of the same cluster near each other, which leads to significant timing improvements. The out-of-context (OOC) flow also has a slight advantage over floorplanning. However, the OOC flow is time-consuming, and the time required for a virtual overlay depends on the array dimensions of the virtual FPGA. As future work, one could exploit the regular structure of the ZUMA overlay even further: instead of generating all BLEs and interconnects for every position, generate them for a single cluster and use RapidWright to replicate those modules in the other clusters. This would decrease the time required for the OOC flow while retaining its placement and timing benefits.
\bibliography{report}
\bibliographystyle{plain}