Parallelization of Control Recurrences for ILP Processors

Schlansker, Michael; Kathail, Vinod; Anik, Sadun

doi:10.1007/BF03356743

Michael Schlansker¹,
Vinod Kathail¹ &
Sadun Anik¹

23 Accesses
2 Citations
9 Altmetric
Explore all metrics

Abstract

The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. The utility of these techniques can be demonstrated in an increasingly broad range of important situations. This paper focuses on the height reduction of control recurrences within loops with data dependent exits. Loops with exits are transformed so as to alleviate performance bottlenecks resulting from control dependences. A compilation approach to effect these transformations is described. The techniques presented in this paper used in combination with prior work on reducing the height of data dependences provide a comprehensive approach to accelerating loops with conditional exits. In many cases, loops with conditional exits provide a degree of parallelism traditionally associated with vectorization. Multiple iterations of a loop can be retired in a single cycle on a processor with adequate instruction level parallelism with no cost in code redundancy. In more difficult cases, height reduction requires redundant computation or may not be feasible.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+

from $39.99 /Month

Starting from 10 chapters or articles per month
Access and download chapters and articles from more than 300k books and 2,500 journals
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

J. Ferrante, K. Ottenstein, and J. Warren, The program dependence graph and its use in optimization, ACM Trans. on Programming Language Systems 9(3):319–349 (1987).
Article MATH Google Scholar
A. Nicolau and J. A. Fisher, Measuring the parallelism available for very long instruction word architectures, IEEE Trans. on Computers C-33(11):968–976 (1984).
Article Google Scholar
N. P. Jouppi and D. Wall, Available instruction level parallelism for superscalar and superpiplined machines, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 272–282 (1989).
Chapter Google Scholar
D. W. Wall, Limits of instruction-level paralleiism, Proc. of the Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, California, pp. 176–188 (1991).
Chapter Google Scholar
M. S. Lam and R. P. Wilson, Limits of control flow on parallelism, Proc. of the 19th Int. Symp. on Computer Architecture, Gold Coast, Australia, pp. 46–57 (1992).
Google Scholar
D. J. Kuck, The Structure of Computers and Computations, Wiley, New York, 1978.
Google Scholar
M. Schlansker and V. Kathail, Acceleration of first and higher order recurrences on processors with instruction level parallelism, In Sixth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 406–429 (1993).
Google Scholar
J. A. Fisher, Trace scheduling: A technique for global microcode compaction, IEEE Trans. on Computers C-30, 7:478–490 (1981).
Article Google Scholar
P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. of the Super Computing ’90, New York, pp. 200–212 (1990).
Chapter Google Scholar
W.-M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellete, R. E. Hank, T. Kiyohara, G. E. Habb, J. G. Holm, and D. M. Lavery, The Superblock: An effective technique for VLIW and superscalar compilation, The Journal of Super Computing 7(1/2):229–248 (1993).
Article Google Scholar
S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W. W. Hwu, B. R. Rau, and M. S. Schlansker, Sentinel scheduling: A model for compiler-controlled speculative execution. ACM Trans. on Computer Systems 11(4):376–408 (1993).
Article Google Scholar
J. A. Fisher, Global code generation for instruction-level parallelism: trace scheduling-2, Technical Report HPL-93-43, Hewlett-Packard Laboratories, Palo Alto, California, 1993.
Google Scholar
A. Nicolau, Parallelism, memory-anti-aliasing and correctness for trace scheduling compilers, Ph. D. Thesis. Yale University, 1984.
Google Scholar
J. R. Ellis, Bulldog: A Compiler for VLIW Architectures MIT Press, Cambridge, Massachussetts (1985).
Google Scholar
S.-M. Moon and K. Ebcioglu, An efficient resource-constrained global scheduling technique for superscalar and VLIW processors, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon (1992).
Google Scholar
G. Lowney, S. Freudenberger, T. Karzes, W. D. Lichtenstein, R. Nix, J. O’Donnell, and J. Ruttenberg, The Multiflow Trace Scheduling Compiler, The Journal of Super Computing 7(1/2):51–142 (1993).
Article Google Scholar
P. Y. T. Hsu and E. S. Davidson, Highly concurrent scalar processing, Proc. of the Thirteenth Ann. Int. Symp. on Computer Architecture, Tokyo, Japan, pp. 386–395 (1986).
Google Scholar
B. R. Rau, Cydra 5 directed dataflow architecture, Proc. of the COMPCON ’88, San Francisco, California, pp. 106–113 (1988).
Google Scholar
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: Design philosophies, decisions and trade-offs, Computer 22(1): 12–35 (1989).
Article Google Scholar
J. C. H. Park and M. S. Schlansker, On predicated execution, Technical Report HPL-91-58, Hewlett-Packard Laboratories, Palo Alto California, 1991.
Google Scholar
V. Kathail, M. S. Schlansker, and B. R. Rau, HPL PlayDoh architecture specification: Version 1.0, Technical Report HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, California (1993).
Google Scholar
B. R. Rau, Data flow and dependence analysis for instruction level paralleiism, In Fourth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 236–250 (1992).
Chapter Google Scholar
B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific Computing, Proc. of the Fourteenth Ann. Workshop on Microprogramming, pp. 183–198 (1981).
J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped Loop Support on the Cydra 5, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 26–38 (1989).
Chapter Google Scholar
J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, The Journal of Supercomputing 7(1/2):181–228 (1993).
Article Google Scholar
N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau, Reverse if-conversion, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque, New Mexico, pp. 290–299 (1993).
Google Scholar
M. Schlansker, V. Kathail, and S. Anik, Height reduction of control recurrences for ILP processors, Proc. of the 27th Ann. Int. Symp. on Microarchitecture, San Jose, California, pp. 40–51 (1994).
Google Scholar
B. R. Rau, M. S. Schlansker, and P. P. Tirumalai, Code generation Schemas for modulo scheduled DO-loops and WHILE-loops, Technical Report HPL-92-47, Hewlett-Packard Laboratories, Palo Alto California (1992).
Google Scholar
F. H. McMahon, The Livermore Fortran Kernels: A computer test of the numerical performance range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory (1986).
Google Scholar
H.-P. Company, PA-RISC 1.1 Architecture and Instruction Set Reference Manual, Second Edition Hewlett-Packard (1992).
Google Scholar
S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon, pp. 45–54 (1992).
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Computer Research Center, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, California, 94304, USA
Michael Schlansker, Vinod Kathail & Sadun Anik

Authors

Michael Schlansker
View author publications
Search author on:PubMed Google Scholar
Vinod Kathail
View author publications
Search author on:PubMed Google Scholar
Sadun Anik
View author publications
Search author on:PubMed Google Scholar

Corresponding author

Correspondence to Michael Schlansker.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Schlansker, M., Kathail, V. & Anik, S. Parallelization of Control Recurrences for ILP Processors. Int J Parallel Prog 24, 65–102 (1996). https://doi.org/10.1007/BF03356743

Download citation

Published: 26 May 2016
Issue date: February 1996
DOI: https://doi.org/10.1007/BF03356743

Key Words

Access this article

Log in via an institution

Subscribe and save

Springer+

from $39.99 /Month

Starting from 10 chapters or articles per month
Access and download chapters and articles from more than 300k books and 2,500 journals
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Parallelization of Control Recurrences for ILP Processors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Enhancing the Effectiveness of Inlining in Automatic Parallelization

A new thread-level speculative automatic parallelization model and library based on duplicate code execution

Multi-grain Parallel Processing of Inductive Logic Programming

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Key Words

Subscribe and save

Buy Now

Parallelization of Control Recurrences for ILP Processors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Enhancing the Effectiveness of Inlining in Automatic Parallelization

A new thread-level speculative automatic parallelization model and library based on duplicate code execution

Multi-grain Parallel Processing of Inductive Logic Programming

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Key Words

Subscribe and save

Buy Now