/*************************************************************************/
/*                                                                       */
/* Licensed Materials - Property of IBM                                  */
/*                                                                       */
/*                                                                       */
/* (C) Copyright IBM Corp. 2010                                          */
/* All Rights Reserved                                                   */
/*                                                                       */
/* US Government Users Restricted Rights - Use, duplication or           */
/* disclosure restricted by GSA ADP Schedule Contract with IBM Corp.     */
/*                                                                       */
/*************************************************************************/

================================================================================
OVERVIEW
================================================================================

   This workload consists of an OpenCL implementation of a solution to the
   Sparse Matrix Vector (SpMV) multiplication problem.  The user may select
   whether the computation will be done in single or double precision.
   
   The computation is accomplished in two steps:
      (1) read in the matrix and convert it into a tiled format
      (2) perform multiplications with this matrix on a random vector of input
   
   There are four subdirectories in this sample: src, data, ppc, and ppc64.
   
   The code in this workload consists of the following two files,
   found in the "src" directory:
      spmv.cpp       OpenCL host code, creating the tiled matrix format
      spmv.cl        OpenCL kernel code, implementing the multiplication
   
   The "data" directory contains a sample file in Matrix Market format.
   You are encouraged to load matrices of your choosing, from sources
   available online, into this directory.
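
   To make the input format concrete, the following is a minimal sketch of a
   reader for the simplest Matrix Market variant ("coordinate real general").
   It is not taken from spmv.cpp: the names Entry, CooMatrix, and
   read_matrix_market are hypothetical, and the sketch ignores the symmetric,
   pattern, and complex variants that the format also allows.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One nonzero entry, stored with 0-based indices.
struct Entry {
    std::size_t row;
    std::size_t col;
    double value;
};

// Coordinate-list matrix. Hypothetical type, not the sample's tiled format.
struct CooMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<Entry> entries;
};

// Hypothetical helper: parses a "coordinate real general" stream.
CooMatrix read_matrix_market(std::istream& in) {
    std::string line;
    // The first line must be the banner, which starts with "%%MatrixMarket".
    if (!std::getline(in, line) || line.compare(0, 14, "%%MatrixMarket") != 0) {
        throw std::runtime_error("missing %%MatrixMarket banner");
    }
    // Skip blank lines and comment lines (which start with '%').
    bool have_size_line = false;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '%') continue;
        have_size_line = true;
        break;
    }
    if (!have_size_line) throw std::runtime_error("missing size line");

    // The size line holds: rows, columns, number of stored entries.
    CooMatrix m;
    std::size_t nnz = 0;
    std::istringstream size_line(line);
    if (!(size_line >> m.rows >> m.cols >> nnz)) {
        throw std::runtime_error("malformed size line");
    }

    // Each remaining line holds: row, column, value (indices are 1-based).
    m.entries.reserve(nnz);
    for (std::size_t k = 0; k < nnz; ++k) {
        std::size_t i = 0, j = 0;
        double v = 0.0;
        if (!(in >> i >> j >> v)) throw std::runtime_error("truncated entries");
        if (i < 1 || i > m.rows || j < 1 || j > m.cols) {
            throw std::runtime_error("index out of range");
        }
        m.entries.push_back({i - 1, j - 1, v});
    }
    assert(m.entries.size() == nnz);
    return m;
}
```

   A caller would open the file with std::ifstream and pass the stream to
   read_matrix_market; the conversion to the tiled format is a separate step
   that this sketch does not attempt.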
   
   The "ppc" and "ppc64" directories contain the Makefiles to build this sample.
   If you build in the "ppc" directory, you will create a 32-bit binary.
   If you build in the "ppc64" directory, you will create a 64-bit binary.
   
   For additional information about this implementation and its data 
   format, please read the white paper titled "Tiled and Packetized SpMV using 
   OpenCL" published in the "OpenCL Lounge" developerWorks group.
   https://www.ibm.com/developerworks/mydeveloperworks/groups

================================================================================
MOTIVATION
================================================================================
   
   This sample demonstrates a new way of representing sparse matrices which 
   works well for the various hardware platforms supported by OpenCL.  It is 
   intended for those applications which have one matrix and LOTS of input 
   vectors to process through that matrix.  Performance is optimized for the 
   multiplication itself, at the expense of the time required to create the 
   internal format of the sparse matrix.

   This sample also demonstrates the use of fission and migration to support
   NUMA operations on the data buffers, as well as the use of the SubBuffer
   function to allow one contiguous user buffer to be accessed by multiple
   devices.
   
================================================================================
PREREQUISITES
================================================================================
   
   IBM OpenCL Dev Kit version 0.3 is required to run this sample.
   
================================================================================
HOW TO BUILD
================================================================================
   
   To build a 32-bit binary, 
   cd to the ppc directory in the sample and type "make".
   
   To build a 64-bit binary, 
   cd to the ppc64 directory in the sample and type "make".
   
================================================================================
HOW TO RUN
================================================================================
   
   The binary is "spmv" and it will be in the directory where you typed "make".
   
   Type "spmv --help" to see useful information, detailed below.
   
   Two examples:
   
      spmv -f ../data/sample.mtx
      spmv -f ../data/sample.mtx --cpu --ls --verbose --verify --timing
   
================================================================================
COMMAND LINE OPTIONS
================================================================================
   
   You must provide the filename of a matrix file in the "data" directory,
   using the -f option.  For the one file shipped with this sample, specify:
   
      -f ../data/sample.mtx
   
   You may explicitly select one of the following devices:
   
      -c, --cpu 
      -g, --gpu
      -a, --accel
   
   If the device selected is not available on your hardware,
   the results are undefined and hardware specific.
   If no device is selected, the code will use the default 
   device "CL_DEVICE_TYPE_DEFAULT".
   If there are multiple devices of the selected type, the
   code will select the first one.
   
   There are two kernels to select from:
   
      -L, --ls     (the "load/store" kernel)
      -A, --awgc   (the "async workgroup copy" kernel)
   
   If you don't make a selection, the program will select the kernel that
   works best for the selected (or default) device.
   The default kernel is "ls" if you are using a CPU or GPU device, 
   and "awgc" if you are using an ACCELERATOR device.
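
   The default rule above can be sketched as a small function.  This is an
   illustration only, not the code in spmv.cpp: the enum and function names
   are hypothetical, and the device type is assumed to be the one already
   resolved at run time (after any CL_DEVICE_TYPE_DEFAULT lookup).

```cpp
#include <cassert>

// Hypothetical names; spmv.cpp may represent these choices differently.
enum class DeviceType { Cpu, Gpu, Accelerator };
enum class Kernel { Unspecified, LoadStore, AsyncWorkGroupCopy };

// Returns the kernel to run: an explicit --ls / --awgc request wins,
// otherwise the default described in this README is applied.
Kernel choose_kernel(DeviceType device, Kernel requested) {
    if (requested != Kernel::Unspecified) {
        return requested;
    }
    switch (device) {
        case DeviceType::Cpu:
        case DeviceType::Gpu:
            return Kernel::LoadStore;           // "ls"
        case DeviceType::Accelerator:
            return Kernel::AsyncWorkGroupCopy;  // "awgc"
    }
    assert(false && "unhandled device type");
    return Kernel::LoadStore;
}
```

   For example, choose_kernel(DeviceType::Accelerator, Kernel::Unspecified)
   yields the "awgc" kernel, while passing Kernel::LoadStore as the request
   keeps the "ls" kernel on any device.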

   You may specify the local work group size to use.  This setting affects
   operation on a GPU device when using the load/store kernel:

      -l, --lwgsize [n]   (note that n will be coerced to a power of 2)
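
   This README does not say in which direction n is coerced.  The sketch
   below shows one plausible reading, assumed here for illustration only:
   rounding down to the largest power of 2 that does not exceed n.  The
   function name is hypothetical and the sample may behave differently.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: round DOWN to a power of 2. The sample may round differently.
std::size_t coerce_to_power_of_two(std::size_t n) {
    assert(n > 0);
    std::size_t p = 1;
    // Double p while the doubled value would still be <= n.
    // Comparing against n / 2 avoids overflow for very large n.
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}
```

   Under this assumption a request of 100 would become 64, and a request
   that is already a power of 2, such as 64, would be left unchanged.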
   
   You may select any of the following generic flags:
   
      -d, --double
      -v, --verify
      -V, --verbose
      -t, --timing
      -n, --numa
      -h, --help
   
   The "--double" flag causes SpMV to run in double precision.
   The default is to run SpMV in single precision.
   
   The "--verify" flag causes the code to compare the computed results against 
   a trivial implementation of SpMV in the host code, to ensure correctness.
   The default is to not take this action.
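
   A check of this kind can be sketched as follows.  This is not the
   verification code from spmv.cpp: the Triplet type, both function names,
   and the tolerance-based comparison are assumptions made for illustration.
   A tolerance is used because the device may sum products in a different
   order than the host, so exact equality is not expected.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One nonzero entry with 0-based indices (hypothetical type).
struct Triplet {
    std::size_t row;
    std::size_t col;
    double value;
};

// Trivial host-side SpMV: y = A * x, with A given as a list of nonzeros.
std::vector<double> reference_spmv(std::size_t rows,
                                   const std::vector<Triplet>& entries,
                                   const std::vector<double>& x) {
    std::vector<double> y(rows, 0.0);
    for (const Triplet& e : entries) {
        assert(e.row < rows && e.col < x.size());
        y[e.row] += e.value * x[e.col];
    }
    return y;
}

// Compares device output with the reference, element by element.
// The error bound is relative for large values and absolute near zero.
bool results_match(const std::vector<double>& computed,
                   const std::vector<double>& expected,
                   double rel_tol) {
    if (computed.size() != expected.size()) return false;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const double scale = std::max(1.0, std::fabs(expected[i]));
        const double diff = std::fabs(computed[i] - expected[i]);
        // Written as !(diff <= bound) so that a NaN result fails the check.
        if (!(diff <= rel_tol * scale)) return false;
    }
    return true;
}
```

   In use, the output vector read back from the OpenCL device would be passed
   as "computed" and the result of reference_spmv as "expected"; a suitable
   tolerance depends on whether single or double precision was selected.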
   
   The "--verbose" option causes more information to be printed out when 
   running the code.  The default is to not take this action.
   
   The "--timing" option causes the code to execute a performance run and 
   display the resulting performance data.  The default is to not take this 
   action.
   
   If you select --numa, and your version of OpenCL supports the required
   extensions, then this sample will automatically optimize memory and thread
   allocation to minimize memory throughput delays.  
   The default is to not take this action.
   
   You can enter "--help" on the command line, 
   which will cause a message similar to this text to appear.
   
================================================================================
END OF TEXT
================================================================================
