Sample SSE Routines

Nowadays almost all PC processors come equipped with extra registers (the SSE registers) that the user can access directly. Until recently these could be accessed only through assembly-level programming. However, gcc now allows inline assembly instructions that can access these registers. This feature can be put to good use, as it can speed up programs considerably. Unfortunately, when I looked for references, I found it hard to find examples using gcc. (The Intel syntax and the syntax followed by gcc are not the same.) The only reference I found was the example by Martin Lüscher. However, that was a reasonably complicated example for a beginner, so I had to struggle a bit. Here I am putting a few really simple routines which I wrote while trying to familiarize myself with SSE. Hopefully someone will find them useful.
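To give a flavour of what gcc inline assembly for SSE looks like, here is a minimal sketch (not one of the routines linked below). It adds two vectors of four floats using the packed instructions movups and addps. The function name add4 is made up for this illustration. Note that gcc's default (AT&T) syntax writes operands as "source, destination" and prefixes registers with %, which is the reverse of the operand order in Intel's manuals. The plain C branch is only a fallback for compilers that do not define __SSE__.

```c
/* Hypothetical helper: r[i] = a[i] + b[i] for i = 0..3. */
static void add4(const float *a, const float *b, float *r)
{
#if defined(__SSE__)
    __asm__ __volatile__(
        "movups (%0), %%xmm0\n\t"    /* load 4 floats from a into xmm0   */
        "movups (%1), %%xmm1\n\t"    /* load 4 floats from b into xmm1   */
        "addps  %%xmm1, %%xmm0\n\t"  /* xmm0 = xmm0 + xmm1 (4 adds)      */
        "movups %%xmm0, (%2)\n\t"    /* store the 4 results to r         */
        :                            /* no register outputs              */
        : "r"(a), "r"(b), "r"(r)     /* the three pointers, in registers */
        : "xmm0", "xmm1", "memory"); /* registers and memory we modify   */
#else
    /* Plain C fallback for targets without SSE. */
    int i;
    for (i = 0; i < 4; i++)
        r[i] = a[i] + b[i];
#endif
}
```

The unaligned move movups is used so that the arrays need no special alignment; the aligned form movaps would require 16-byte aligned data.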

Examples
Ex. 1) Unpacked move
Ex. 2) Packed move
Ex. 3) Permutation
Ex. 4) Quaternionic multiplications (Requires dat)
Ex. 4a) C program for quaternionic multiplications. (Requires dat)
Each example assumes knowledge of the previous ones.
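For orientation, the quaternionic multiplication of Ex. 4 and 4a can be sketched in plain C as follows. This is only an illustration and not the linked program: the function name qmul and the storage order (a0, a1, a2, a3), standing for a0 + a1 i + a2 j + a3 k, are assumptions, and the linked examples may order or sign the components differently.

```c
/* Hypothetical helper: r = a * b (Hamilton product of two quaternions).
   A quaternion is stored as q[0] + q[1] i + q[2] j + q[3] k. */
static void qmul(const float a[4], const float b[4], float r[4])
{
    float t[4];  /* temporary, so that r may alias a or b */
    int n;

    t[0] = a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3];
    t[1] = a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2];
    t[2] = a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1];
    t[3] = a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0];

    for (n = 0; n < 4; n++)
        r[n] = t[n];
}
```

Each output component is a sum of four products, which is why the operation maps naturally onto packed SSE multiplies combined with permutations of the four components.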

The following programs are mainly of interest for lattice gauge theorists.

Staple-0 : This program multiplies three su(2) vectors.
A trivial change will make it suitable for calculating staples. (Requires dat).

Staple-1 : Same as Staple-0, but the routine can be called from a Fortran program, e.g. cover. (Requires cover)

Staple-2 : Faster version of Staple-1. (The routine can be called from a Fortran program.) (Requires cover)

Main Fortran program which calls Staple-2 (can also call Staple-1). (Requires fort.10)

Script for compiling and linking the programs cover and Staple-2.
Here I use only the minimal options needed for SSE.
For other options, see the notes by Martin Lüscher above.

Fortran program computing staples. (Requires fort.10)

Data files : dat, fort.10
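As a rough illustration of what "callable from Fortran" involves, here is a plain C sketch of a routine that multiplies three SU(2) elements, each stored as four reals and multiplied with the quaternion rule. It is not the Staple-1 or Staple-2 source. The name staple3_ is hypothetical, the component ordering and sign convention are assumptions, and the trailing underscore with all arguments passed by reference follows the default convention of g77 and gfortran; other Fortran compilers may name symbols differently.

```c
/* r = a * b for two elements stored as q[0] + q[1] i + q[2] j + q[3] k
   (assumed layout). r may alias a or b. */
static void qmul3(const float *a, const float *b, float *r)
{
    float t[4];
    int n;

    t[0] = a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3];
    t[1] = a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2];
    t[2] = a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1];
    t[3] = a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0];

    for (n = 0; n < 4; n++)
        r[n] = t[n];
}

/* Hypothetical Fortran-callable entry point: out = u1 * u2 * u3.
   Fortran passes every argument by reference, so each REAL array
   of length 4 arrives here as a float pointer. */
void staple3_(const float *u1, const float *u2, const float *u3, float *out)
{
    float t[4];

    qmul3(u1, u2, t);   /* t   = u1 * u2 */
    qmul3(t, u3, out);  /* out = t  * u3 */
}
```

Under these assumptions a Fortran program would declare real u1(4), u2(4), u3(4), out(4) and write call staple3(u1, u2, u3, out), with the underscore added by the Fortran compiler at link time.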
