close
Skip to main content
Log in

Parallelization of Control Recurrences for ILP Processors

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. The utility of these techniques can be demonstrated in an increasingly broad range of important situations. This paper focuses on the height reduction of control recurrences within loops with data dependent exits. Loops with exits are transformed so as to alleviate performance bottlenecks resulting from control dependences. A compilation approach to effect these transformations is described. The techniques presented in this paper used in combination with prior work on reducing the height of data dependences provide a comprehensive approach to accelerating loops with conditional exits. In many cases, loops with conditional exits provide a degree of parallelism traditionally associated with vectorization. Multiple iterations of a loop can be retired in a single cycle on a processor with adequate instruction level parallelism with no cost in code redundancy. In more difficult cases, height reduction requires redundant computation or may not be feasible.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+
from $39.99 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. J. Ferrante, K. Ottenstein, and J. Warren, The program dependence graph and its use in optimization, ACM Trans. on Programming Language Systems 9(3):319–349 (1987).

    Article  MATH  Google Scholar 

  2. A. Nicolau and J. A. Fisher, Measuring the parallelism available for very long instruction word architectures, IEEE Trans. on Computers C-33(11):968–976 (1984).

    Article  Google Scholar 

  3. N. P. Jouppi and D. Wall, Available instruction level parallelism for superscalar and superpiplined machines, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 272–282 (1989).

    Chapter  Google Scholar 

  4. D. W. Wall, Limits of instruction-level paralleiism, Proc. of the Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, California, pp. 176–188 (1991).

    Chapter  Google Scholar 

  5. M. S. Lam and R. P. Wilson, Limits of control flow on parallelism, Proc. of the 19th Int. Symp. on Computer Architecture, Gold Coast, Australia, pp. 46–57 (1992).

    Google Scholar 

  6. D. J. Kuck, The Structure of Computers and Computations, Wiley, New York, 1978.

    Google Scholar 

  7. M. Schlansker and V. Kathail, Acceleration of first and higher order recurrences on processors with instruction level parallelism, In Sixth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 406–429 (1993).

    Google Scholar 

  8. J. A. Fisher, Trace scheduling: A technique for global microcode compaction, IEEE Trans. on Computers C-30, 7:478–490 (1981).

    Article  Google Scholar 

  9. P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. of the Super Computing ’90, New York, pp. 200–212 (1990).

    Chapter  Google Scholar 

  10. W.-M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellete, R. E. Hank, T. Kiyohara, G. E. Habb, J. G. Holm, and D. M. Lavery, The Superblock: An effective technique for VLIW and superscalar compilation, The Journal of Super Computing 7(1/2):229–248 (1993).

    Article  Google Scholar 

  11. S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W. W. Hwu, B. R. Rau, and M. S. Schlansker, Sentinel scheduling: A model for compiler-controlled speculative execution. ACM Trans. on Computer Systems 11(4):376–408 (1993).

    Article  Google Scholar 

  12. J. A. Fisher, Global code generation for instruction-level parallelism: trace scheduling-2, Technical Report HPL-93-43, Hewlett-Packard Laboratories, Palo Alto, California, 1993.

    Google Scholar 

  13. A. Nicolau, Parallelism, memory-anti-aliasing and correctness for trace scheduling compilers, Ph. D. Thesis. Yale University, 1984.

    Google Scholar 

  14. J. R. Ellis, Bulldog: A Compiler for VLIW Architectures MIT Press, Cambridge, Massachussetts (1985).

    Google Scholar 

  15. S.-M. Moon and K. Ebcioglu, An efficient resource-constrained global scheduling technique for superscalar and VLIW processors, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon (1992).

    Google Scholar 

  16. G. Lowney, S. Freudenberger, T. Karzes, W. D. Lichtenstein, R. Nix, J. O’Donnell, and J. Ruttenberg, The Multiflow Trace Scheduling Compiler, The Journal of Super Computing 7(1/2):51–142 (1993).

    Article  Google Scholar 

  17. P. Y. T. Hsu and E. S. Davidson, Highly concurrent scalar processing, Proc. of the Thirteenth Ann. Int. Symp. on Computer Architecture, Tokyo, Japan, pp. 386–395 (1986).

    Google Scholar 

  18. B. R. Rau, Cydra 5 directed dataflow architecture, Proc. of the COMPCON ’88, San Francisco, California, pp. 106–113 (1988).

    Google Scholar 

  19. B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: Design philosophies, decisions and trade-offs, Computer 22(1): 12–35 (1989).

    Article  Google Scholar 

  20. J. C. H. Park and M. S. Schlansker, On predicated execution, Technical Report HPL-91-58, Hewlett-Packard Laboratories, Palo Alto California, 1991.

    Google Scholar 

  21. V. Kathail, M. S. Schlansker, and B. R. Rau, HPL PlayDoh architecture specification: Version 1.0, Technical Report HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, California (1993).

    Google Scholar 

  22. B. R. Rau, Data flow and dependence analysis for instruction level paralleiism, In Fourth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 236–250 (1992).

    Chapter  Google Scholar 

  23. B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific Computing, Proc. of the Fourteenth Ann. Workshop on Microprogramming, pp. 183–198 (1981).

  24. J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped Loop Support on the Cydra 5, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 26–38 (1989).

    Chapter  Google Scholar 

  25. J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, The Journal of Supercomputing 7(1/2):181–228 (1993).

    Article  Google Scholar 

  26. N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau, Reverse if-conversion, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque, New Mexico, pp. 290–299 (1993).

    Google Scholar 

  27. M. Schlansker, V. Kathail, and S. Anik, Height reduction of control recurrences for ILP processors, Proc. of the 27th Ann. Int. Symp. on Microarchitecture, San Jose, California, pp. 40–51 (1994).

    Google Scholar 

  28. B. R. Rau, M. S. Schlansker, and P. P. Tirumalai, Code generation Schemas for modulo scheduled DO-loops and WHILE-loops, Technical Report HPL-92-47, Hewlett-Packard Laboratories, Palo Alto California (1992).

    Google Scholar 

  29. F. H. McMahon, The Livermore Fortran Kernels: A computer test of the numerical performance range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory (1986).

    Google Scholar 

  30. H.-P. Company, PA-RISC 1.1 Architecture and Instruction Set Reference Manual, Second Edition Hewlett-Packard (1992).

    Google Scholar 

  31. S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon, pp. 45–54 (1992).

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Michael Schlansker.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Schlansker, M., Kathail, V. & Anik, S. Parallelization of Control Recurrences for ILP Processors. Int J Parallel Prog 24, 65–102 (1996). https://doi.org/10.1007/BF03356743

Download citation

  • Published:

  • Issue date:

  • DOI: https://doi.org/10.1007/BF03356743

Key Words