Convergent Scheduling
Puppin, D.
2004
Abstract
Convergent scheduling is a general instruction scheduling framework that simplifies the application of the many arbitrary constraints and scheduling heuristics required to schedule instructions for modern complex processors. A convergent scheduler is composed of independent passes, each implementing a heuristic that addresses a particular problem or constraint. The passes share a simple, common interface that allows the spatial and temporal preferences associated with each instruction to be queried and modified. With each heuristic independently applying its scheduling constraint in succession, the final result is a well-formed instruction schedule that satisfies most of the constraints. We have implemented a set of passes that address scheduling constraints such as partitioning, load balancing, communication bandwidth, and register pressure. With a hand-selected, fixed ordering of the passes, we obtain an average speedup improvement of 28% over Desoli's PCC algorithm and 14% over UAS on a reference 4-cluster VLIW architecture, and a speedup of 21% over the existing space-time scheduler of the Raw processor. We then applied machine-learning techniques to search automatically for good pass orderings when moving to different VLIW architectures. The architecture-specific pass orderings yield speedups ranging from 12% to 95% over the baseline order. The cross-validation studies we ran show that our automatically generated orderings perform well beyond the benchmarks on which they were "trained": benchmarks that were not in the training set are within 6% of the performance they would obtain had they been in the training set.
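The pass structure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the preference map is modeled as one normalized weight per (cluster, time slot) for each instruction, and the names `uniform_prefs`, `preferred`, `depth_pass`, `pin_pass`, `affinity_pass`, and `converge` are hypothetical. The three heuristics are simple stand-ins for the kinds of passes the abstract mentions; each one only queries and rescales preferences through the shared representation, and the passes run in a fixed, hand-chosen order.

```python
from typing import Callable, Dict, List, Tuple

Slot = Tuple[int, int]                 # (cluster, time slot)
Prefs = Dict[str, Dict[Slot, float]]   # instruction -> weight per slot (assumed layout)
Pass = Callable[[Prefs], None]         # a pass edits the preferences in place


def uniform_prefs(instrs: List[str], clusters: int, slots: int) -> Prefs:
    """Start with no preference: every (cluster, slot) is equally weighted."""
    w = 1.0 / (clusters * slots)
    return {i: {(c, t): w for c in range(clusters) for t in range(slots)}
            for i in instrs}


def normalize(prefs: Prefs) -> None:
    """Rescale each instruction's weights so they sum to 1."""
    for weights in prefs.values():
        total = sum(weights.values())
        for key in weights:
            weights[key] /= total


def preferred(prefs: Prefs, instr: str) -> Slot:
    """Query: the highest-weight (cluster, slot); ties go to the lowest slot."""
    weights = prefs[instr]
    return max(sorted(weights), key=weights.get)


def depth_pass(deps: List[Tuple[str, str]], boost: float = 4.0) -> Pass:
    """Temporal stand-in: favor the slot equal to the dependence depth."""
    def run(prefs: Prefs) -> None:
        depth = {i: 0 for i in prefs}
        for _ in prefs:  # repeated edge relaxation; sufficient for a DAG
            for producer, consumer in deps:
                depth[consumer] = max(depth[consumer], depth[producer] + 1)
        for instr, weights in prefs.items():
            for (c, t) in weights:
                if t == depth[instr]:
                    weights[(c, t)] *= boost
    return run


def pin_pass(pins: Dict[str, int], boost: float = 4.0) -> Pass:
    """Spatial stand-in: favor a given cluster for selected instructions."""
    def run(prefs: Prefs) -> None:
        for instr, cluster in pins.items():
            for (c, t) in prefs[instr]:
                if c == cluster:
                    prefs[instr][(c, t)] *= boost
    return run


def affinity_pass(deps: List[Tuple[str, str]], boost: float = 2.0) -> Pass:
    """Communication stand-in: pull a consumer toward its producer's cluster."""
    def run(prefs: Prefs) -> None:
        for producer, consumer in deps:
            cluster, _ = preferred(prefs, producer)
            for (c, t) in prefs[consumer]:
                if c == cluster:
                    prefs[consumer][(c, t)] *= boost
    return run


def converge(prefs: Prefs, passes: List[Pass]) -> Dict[str, Slot]:
    """Apply the passes in order, then read off each instruction's best slot."""
    for apply_pass in passes:
        apply_pass(prefs)
        normalize(prefs)
    return {i: preferred(prefs, i) for i in prefs}


# Hypothetical four-instruction chain on a 2-cluster machine with 4 time slots.
deps = [("load", "mul"), ("mul", "add"), ("add", "store")]
prefs = uniform_prefs(["load", "mul", "add", "store"], clusters=2, slots=4)
schedule = converge(prefs, [depth_pass(deps), pin_pass({"load": 1}), affinity_pass(deps)])
print(schedule)
```

Because every pass only rescales weights, no single heuristic makes an irrevocable decision, and changing the order of the list passed to `converge` changes which preferences later passes observe. That ordering is the quantity the abstract reports choosing by hand and then searching for automatically.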