GridSpace Tutorials

Below you will find a list of tutorials showing the main functionalities of the GridSpace platform. To be able to follow the instructions you need to have an account on at least one of the supported execution hosts.

  1. Not so simple Hello World
  2. Simple Grid Job
  3. Multi-site execution
  4. Publishing an experiment
  5. Subsnippets
  6. WebGUI
  7. RESTful invocation

Not so simple Hello World

The example experiment you will build by following the instructions below uses a simple sorting method to sort a random sequence of numbers and visualizes the process with a simple animation. You can read about the sorting algorithm here. The experiment uses three languages: Ruby, Python and Bash.

  1. Log in to the Experiment Workbench by clicking the Login button on the GridSpace home page. You need to have an account on one of the supported execution hosts.

  2. The first snippet of our new experiment will take care of generating a random sequence of numbers. Let's use Ruby to accomplish this. Paste the following code into your Ruby snippet (be sure to pick Ruby as the snippet's interpreter):

    #!/usr/bin/ruby
    
    size = 10
    array = (1..size).map { rand(size) }
    
    File.open("input.txt", "w") do |f|
    	array.each { |n| f.puts(n) }
    end
  3. Now you need to inform the Workbench about the output file produced by the Ruby snippet by adding an asset. To do that, click the outputs menu at the bottom of the snippet and choose the Add new simple output... option. In the popup window define the asset's name and path (be sure to use exactly the same path as in the snippet).

  4. Execute the Ruby snippet by clicking the Play icon button in the upper right corner of the snippet panel. When it finishes, the file manager should show a newly created input.txt file (look at the Output panel).

  5. In the next snippet let's implement the sorting algorithm and, while doing so, record each step so that it can be visualized later. Click the Add new snippet icon in the upper right part of the snippet panel. Change the interpreter of the newly added snippet to Python and paste the following code into the snippet:

    #!/usr/bin/python
    
    
    step_count = 1
    
    # dump single sorting state to a file
    # indexes is a list of indexes affected in given state
    # array is array content at this step
    def dump_state(indexes, array):
        global step_count
        steps_file.write(",".join([str(el) for el in indexes]) +"\n")
        for number in array:
            steps_file.write(str(number) +"\n")
        steps_file.write("---\n")
        step_count += 1
    
    def insertion_sort(array):
        l = len(array)
        for i in range(1, l):
            val = array[i]
            j = i
            dump_state([j], array)
            while j > 0 and array[j-1] > val:
                array[j] = array[j-1]
                dump_state([j], array)
                j -= 1
            array[j] = val
        return array
    
    tested_sorting = insertion_sort
    
    #load array from file
    f = open("input.txt","r")
    steps_file = open("steps.txt", "w")
    input_array = []
    for line in f:
        line = line.rstrip()
        if line:
            input_array.append(int(line))
    
    f.close()
    
    
    print(input_array)
    print(tested_sorting(input_array))
    
    steps_file.close()

    The code reads the sequence of random numbers from the input.txt file and writes each intermediate sorting state to the steps.txt file, ending every state with a --- line. The original and the sorted sequence are printed to the Output panel.

  6. Define input and output assets for the sorting snippet using the input and output asset menus. For the input asset you can use the Add existing input/output option and pick input.txt from the submenu. The output asset (steps.txt) needs to be defined by creating a new output asset and giving it a name and path.

  7. Create a third snippet and set Bash as its interpreter. Paste the following code into the snippet:

    #!/bin/bash
    
    filecnt=1
    
    STEP_COUNT=`cat steps.txt | grep "^---" | wc -l`
    
    echo "number of steps: ${STEP_COUNT}"
    mkdir --parents frames
    for line in `cat steps.txt` ; do
        `echo $line | grep -q '^---'`    
        if [ $? -ne "0" ] ; then
            echo "$line" >> /tmp/step_"$filecnt"
        else
            # generate plot
            echo "set terminal png" > tmp.plt
            echo "unset border" >> tmp.plt
            echo "unset xtics; unset ytics" >> tmp.plt
            echo "set output \"frames/plot${filecnt}.png\"" >> tmp.plt
            echo "plot \"/tmp/step_${filecnt}\" every ::1 with boxes"  >> tmp.plt
            gnuplot tmp.plt
            filecnt=`expr $filecnt + 1`
        fi
    done
    rm -rf /tmp/step_*

    The code takes each step of the sorting procedure and produces a corresponding gnuplot graph in a file stored in the frames directory.

  8. Define assets for the Bash snippet by creating one input asset (steps.txt) and one output asset (frames) which is a directory.

  9. Add the final snippet with the Bash interpreter and paste the following code into it (the code uses the ffmpeg encoder to produce the final movie):

    FRAMERATE=10
    OUT=plot.flv
    rm -rf $OUT
    ffmpeg -r $FRAMERATE -b 16777216 -i frames/plot%d.png $OUT
  10. Define one input asset (frames) and one output asset (plot.flv) for the final snippet.

  11. Run all the snippets in order, from the first to the last, by clicking the Play icon on each of them. Watch the final movie by clicking it in the Files panel.

You can change the last snippet to improve the movie's parameters. For example, you could change the FRAMERATE value to 20 to make the movie faster or change the bitrate or the encoder (in this case it is MPEG4).
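The steps.txt layout written by the sorting snippet (one line of comma-separated indexes, then one number per line, then a --- separator) can be read back with a few lines of Python. The following is only a sketch: the function name parse_steps and the sample string are made up for illustration and are not part of GridSpace.

```python
def parse_steps(text):
    """Split text in the steps.txt layout into (indexes, values) pairs."""
    states = []
    block = []
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # A separator closes the current state.
            if block:
                indexes = [int(i) for i in block[0].split(",")]
                values = [int(v) for v in block[1:]]
                states.append((indexes, values))
            block = []
        elif line:
            block.append(line)
    return states


# Hypothetical sample: the first two states recorded while sorting [3, 1, 2].
sample = "1\n3\n1\n2\n---\n1\n3\n3\n2\n---\n"
for indexes, values in parse_steps(sample):
    print(indexes, values)
```

Each pair holds the indexes touched in that step and the array content at that moment, which is the same information the Bash snippet feeds to gnuplot.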

Simple Grid Job

In this tutorial we will see how to run simple Grid jobs from within Experiment Workbench. To follow this tutorial you need to be registered in the vo.plgrid.pl virtual organization.

  1. Log in to the Workbench with the grid executor by using your certificate and private key in the login panel (note that the user proxy is generated on the client side and that the private key is not transferred to the server side). You will need Java enabled in your browser to generate the user proxy.

  2. Make sure that the interpreter of the snippet in your new experiment is Grid Bash and paste the following code into the snippet:

    hostname
  3. Observe the Output panel to see how the job status changes and to see the results when the job finishes.

Multi-site execution

If an experiment runs on several computation sites, the Workbench can copy its assets between the sites. In this tutorial we will execute simple Bash scripts that collect the host names of the sites they were executed on. You need accounts on at least two execution hosts to run this tutorial.

  1. Log in to one of the available execution hosts, create a new experiment (if not already created), pick Bash (any version will do) as the interpreter of your new snippet and paste the following code into the snippet:

    hostname >> hosts.txt
  2. Next, we need to define the output asset by clicking the Add new simple output... option from the outputs menu (located at the bottom of the snippet widget). In the popup window you need to define the output's name (e.g. hosts) and the path (e.g. hosts.txt, remember that the path needs to be exactly the same as the one used in your code).

  3. You can now execute the snippet by clicking the Play icon and check that the output file is created in the Files panel (you can also check its contents by clicking it).

  4. Now, let's create another snippet by clicking the Add new snippet icon in your current snippet. Paste the code from the first snippet into the new one (you can switch between snippets by clicking their name labels).

  5. Log in to another execution host by clicking the Login to another executor option inside the hosts menu in the upper right part of the Workbench panel and make sure that the Bash interpreter of the second snippet runs on the new execution host.

  6. It is always a good idea to save your work. Do this now by clicking the Save experiment icon in the experiment panel.

  7. We now need to define the input and output assets of the second snippet. Click the inputs menu and pick the Add existing input/output option (only the hosts.txt option should be available in the submenu). In the input definition popup use the Pencil edit icon to point to the file produced by the first snippet and save the changes. In the outputs menu create a new output asset and save the experiment (use a different name than the one used for the output asset in the first snippet).

  8. Run the second snippet and check in the Files panel that the produced output file has two host names inside.

You can continue to pass the hosts.txt file asset to subsequent execution hosts remembering that currently it is necessary to point the input asset definition to its location on the previous execution host (see point number 7).
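What each Bash snippet in this tutorial does (hostname >> hosts.txt) can be pictured as follows in Python. This is only an illustration, not something the Workbench requires: the helper names and the temporary directory are assumptions, and both appends run on the same machine here, whereas in the tutorial each one runs on a different execution host after the Workbench has copied the file.

```python
import os
import socket
import tempfile


def append_hostname(path):
    """Append this machine's host name to the file at path, one name per line."""
    with open(path, "a") as hosts:
        hosts.write(socket.gethostname() + "\n")


def read_hosts(path):
    """Return the collected host names in the order they were appended."""
    with open(path) as hosts:
        return [line.rstrip("\n") for line in hosts]


# Hypothetical working directory standing in for the snippet's execution directory.
workdir = tempfile.mkdtemp()
hosts_file = os.path.join(workdir, "hosts.txt")
append_hostname(hosts_file)  # first snippet
append_hostname(hosts_file)  # second snippet, normally on another host
print(read_hosts(hosts_file))
```

Because the file is opened in append mode, every additional execution host adds one more line, which is why the second snippet's output contains two host names.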

Publishing an experiment

Experiment publication is a Workbench feature that lets you publish experiment code and data so that other users can view them. In addition, users can substitute your data with their own to see how the experiment behaves. To publish an experiment follow the instructions below.

  1. Log in to one of the execution hosts, create a new experiment and pick Bash as the snippet's interpreter. Paste the following code into your snippet:

    cat in.txt > out.txt

    The code simply copies the contents of the in.txt file into the out.txt file.

  2. Define one input asset (in.txt) and one output asset (out.txt) by using the inputs and outputs menus at the bottom of the snippet widget (be sure to enter their names and paths; the paths need to be the same as in the code).

  3. To successfully run the experiment the input file in.txt needs to exist. Let's create it by clicking the Menu -> New -> File menu option on the Files panel and giving it a name of in.txt.

  4. Edit the newly created file by using the Open with -> Plain Text Editor option in the drop-down menu next to the file item. Write whatever text you like and save the changes.

  5. Execute the snippet by clicking the Play icon and check that the out.txt file was created and that it has the same contents as the input file.

  6. Publish the experiment by clicking one of the Save and release to ... options in the drop-down menu next to the Save experiment icon and check that a new release item has appeared on the Releases panel.

  7. You can check what the published experiment looks like by clicking the release item. The experiment will be displayed in a new window in the publishing layout. You can substitute the original input by clicking the Upload icon and then execute the snippet to see the corresponding contents of the output asset.

Notice that snippets and assets in the publishing layout are displayed inside HTML frames. It is possible to embed these in any web page of your choice. The only requirement is to also embed the master widget (the first widget on the publishing template). All HTML code necessary to embed the frames can be obtained by clicking the View embed code links under each of the widgets on the publishing page.

Subsnippets

In this exercise we will create a simple experiment. First, we will generate a data set: a few directories, each containing a set of files with randomly generated textual data. Then we will go folder by folder and find the file containing the most words. Finally, we will build a summary of the winning files from each folder and visualize it in gnuplot.

  1. Our first snippet is written in Ruby and looks as follows:

    Dir.mkdir("data")
    
    num_tests = rand(3) + 2
    num_tests.times do |i|
    
      Dir.mkdir("data/dir#{i}")
    
      num_files = rand(3) + 2
      num_files.times do |j|
      
        File.open("data/dir#{i}/file#{j}.txt", 'w') do |f| 
          num_chars = rand(1000)
          f.write((0...num_chars).map{ ('a'..'z').to_a.concat(["\n", " "])[rand(28)] }.join) 
        end
      end
    end

    In this snippet our data is prepared in subdirectories of the newly created "data" directory, so we define a new output asset with the path "data/". You may also choose the execution directory by specifying a context. Now we can pick Ruby as the interpreter and run the snippet by clicking the Play icon button. Our data is now generated.

  2. Add another snippet and choose Ruby as an interpreter. Add existing asset "data/" as an input. Please note that the following snippet is missing some code as we left a few lines for some valuable processing to be applied in the middle of the first each loop.

    Dir.new("data").each do |dir|
      if (dir =~ /\Adir\d+\z/)
      
        # insert subsnippet here
             
      end
    end
    
    File.open("summary.txt", "w") do |summary|
      Dir.glob(File.join("data", "dir*", "winner.txt")).each do |f|
        File.open(f, "r") do |infile|
          while (line = infile.gets)
            summary.write(line)
          end 
        end
      end
    end
    

    Let's look at the commented line inside the if statement. We need to iterate over the files in each directory and count the words in these files. In Bash, counting words is trivial and boils down to using the wc -w command, as in the code presented below:

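    # start from a clean per-directory summary so that re-runs do not accumulate
    rm -f summary.txt
    for file in file*.txt
    do
      echo $(wc -w < "$file") "$file" >> summary.txt
    done
    # the file with the highest word count wins
    sort -nr summary.txt | head -1 > winner.txt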

    Fortunately it is very easy to mix two programming languages using GridSpace subsnippets.

  3. Let's remove the "# insert subsnippet here" placeholder and insert a subsnippet in its place. Place the cursor inside the if statement and choose New subsnippet from the Actions menu or simply click the New subsnippet button located in the top-right corner of the snippet panel.

  4. Now we need to specify the context for the subsnippet execution. The context plays two roles here. First, it is the directory in which the subsnippet will be executed. Second, it is the reference point for the location of files passed from the master snippet to the subsnippet and back. As the subsnippet context can be any string that can be rendered in the language of the master snippet, we may tell our subsnippet to operate on the files located in the directory <master_snippet_context>/<subsnippet_context>. So, to make our subsnippet operate on the data in the subdirectories of <master_snippet_context>/data, we iterate over them (the name is held in the dir variable), pass the path data/#{dir} as the context and set ./ as the subsnippet input path. Our experiment should now look as follows:

    [Screenshot: Subsnippet in a loop]

    It is essential to understand what we have just done. We told our subsnippet that it will be executed in the data/#{dir} folder in the user home directory. At the same time we told the subsnippet to copy all the declared inputs from paths relative to the <master_snippet_context>/data/#{dir} directory. This guarantees that our inputs will be in the intended place even if we change the execution site of the master snippet or the subsnippet. The subsnippet uses its context folder as an input (path: "./"), so the entire directory will be copied before the subsnippet is executed.

  5. What we have to do now is specify our winner.txt file as an output asset of the subsnippet in order to make it accessible to the master snippet. Analogously to the input specification, the output will be copied after the subsnippet execution from <subsnippet_context>/<asset_path> on the subsnippet executor to <master_snippet_context>/<subsnippet_context>/<asset_path> on the master snippet executor.

  6. At the end of our last snippet all of the winner.txt files are merged into a single summary.txt. This is the product of this snippet and needs to be defined as an output asset. Now you can finally click the Play icon button on the master snippet panel. The summary.txt file is created.

  7. Let's create our last snippet, which will process summary.txt and produce a Gnuplot chart, result.png, visualizing our experiment.

    reset
    set term pngcairo
    set output 'result.png'
    set xtics mirror rotate by -90 font ",8"
    plot "summary.txt" using 1:xtic(2) with boxes lt 4 notitle 

    Click the Play icon button on the last snippet panel. Now you can open result.png in the file manager and see the experiment result.
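For readers who want to check the result independently of the Workbench, the per-directory winner search and the final summary can be sketched in a single Python script. This is not how the tutorial splits the work (there the word count runs in a Bash subsnippet); the function names and the tiny sample data set below are made up for illustration.

```python
import os
import tempfile


def count_words(path):
    """Count whitespace-separated words, like wc -w."""
    with open(path) as handle:
        return len(handle.read().split())


def find_winner(directory):
    """Return (word_count, file_name) for the file with the most words.

    Assumes the directory holds at least one file*.txt file.
    """
    counts = [
        (count_words(os.path.join(directory, name)), name)
        for name in sorted(os.listdir(directory))
        if name.startswith("file") and name.endswith(".txt")
    ]
    return max(counts)


def build_summary(data_dir):
    """Collect the winner of every subdirectory of data_dir."""
    summary = []
    for name in sorted(os.listdir(data_dir)):
        directory = os.path.join(data_dir, name)
        if os.path.isdir(directory):
            summary.append(find_winner(directory))
    return summary


# Hypothetical sample data standing in for the randomly generated files.
data_dir = tempfile.mkdtemp()
samples = {
    "dir0": {"file0.txt": "a b c", "file1.txt": "a b"},
    "dir1": {"file0.txt": "x", "file1.txt": "x y\nz w"},
}
for dir_name, files in samples.items():
    os.mkdir(os.path.join(data_dir, dir_name))
    for file_name, content in files.items():
        with open(os.path.join(data_dir, dir_name, file_name), "w") as handle:
            handle.write(content)

for count, name in build_summary(data_dir):
    print(count, name)
```

Each printed line has the same shape as a line of summary.txt: the word count followed by the name of the winning file.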

WebGUI

The WebGUI mechanism allows the experiment execution to be paused with a user query by displaying a dynamic web page inside the Experiment Workbench (EW). EW comes with a sample implementation of such a web application, giving you the possibility to easily build simple web forms for users executing your experiments. In this tutorial you will ask the user to provide their gender, age and a few words about themselves from a Ruby script.

  1. Log in to one of the available execution hosts where a Ruby interpreter is available.

  2. From the list of available interpreters pick Ruby (any Ruby version will do).

  3. Because the WebGUI mechanism is based on REST-like calls we need to import a Ruby library which supports such calls. Paste the following code into the Ruby snippet:

    baseUri = "#{ENV['GS2_WEBGUI_ENDPOINT']}"
    if(baseUri.start_with?('https'))
    	require 'net/https'
    else
    	require 'net/http'
    end

    As you can see, depending on the scheme used for the connection (http or https) we import a different Ruby library. You can also see that the base of the URL used for WebGUI communication can be obtained from the environment (the appropriate environment variables are set by the Workbench).

  4. In the next piece of code a POST request needs to be built and sent to the Workbench. It needs to carry information about what the web form should look like, the identifier of the current experiment execution and the URLs for the subsequent communication steps. Append the following code to your Ruby snippet:

    startUri = URI.parse("#{baseUri}/start")
    http = Net::HTTP.new(startUri.host, startUri.port)
    if(baseUri.start_with?('https'))
    	http.use_ssl = true
    end
    startRequest = Net::HTTP::Post.new(startUri.path)
    startRequest.set_form_data(
      'address' => "#{ENV['GS2_WEBGUI_ENDPOINT']}/render",
      'json' => "{webguiDocType: 'request', label: 'User data', data: [\
        {name: 'gender', label: 'Gender', pref: 'radio', options:[\
          {label: 'Male', value: 'male'}, {label: 'Female', value: 'female'}], value: 'female'},\
        {name: 'age', label: 'Age', pref: 'text'},\
        {name: 'about', label: 'About me', pref: 'richTextArea'}]}",
      'respondTo' => "#{ENV['GS2_WEBGUI_ENDPOINT']}/submitData",
      'gs2ExperimentSessionId' => ENV['GS2_EXPERIMENT_SESSION_ID']
    )
    http.start {|http|
    	http.request(startRequest)
    }

    As you can see, there are four parameters of the POST request:

    • address - address of the application which will render the user request,
    • json - this is the message passed to the web application (JSON format is used),
    • respondTo - this is the address of the Workbench service for returning the user data,
    • gs2ExperimentSessionId - this is the identifier of the currently executed experiment, its value can be obtained from the environment.

    At the end the request is sent to the Workbench and it is possible to monitor its status.

  5. To check the status of the WebGUI request we will now periodically test for user data availability and eventually output the data. To do that paste the following code into your Ruby snippet:

    checkUri = URI.parse("#{baseUri}/checkStatus?gs2ExperimentSessionId=#{ENV['GS2_EXPERIMENT_SESSION_ID']}")
    http = Net::HTTP.new(checkUri.host, checkUri.port)
    if(baseUri.start_with?('https'))
    	http.use_ssl = true
    end
    checkData = 'IN_PROGRESS'
    checkRequest = Net::HTTP::Get.new(checkUri.request_uri)
    until checkData != 'IN_PROGRESS'
    	checkData = http.start {|http|
    		response = http.request(checkRequest)
    		response.body
    	}
    	sleep 1
    end
    p checkData

    To check the WebGUI request status a GET request is used. If the data is not yet available the request will return with the IN_PROGRESS value. Otherwise the response is returned. In case the user clicks the Cancel button in the WebGUI popup window the CANCEL value is returned.

  6. Execute the snippet by clicking the Play icon button inside the Ruby snippet. A popup window should open asking you to input the requested data. After submitting the form a corresponding JSON document should be displayed in the Output panel.

To find out more about the WebGUI mechanism (e.g. about how to write your own web application which can be integrated into Experiment Workbench) visit the Experiment Workbench User Guide here.
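The polling pattern from step 5 (repeat the status check until the answer is no longer IN_PROGRESS, then handle either the user data or CANCEL) can be sketched in Python without any network access. Everything below is an assumption made for illustration: wait_for_result, the max_checks safety limit and the fake answers are not part of the Workbench API, and a real script would perform the GET request to checkStatus inside the callable.

```python
import time


def wait_for_result(check_status, interval=1.0, max_checks=None):
    """Call check_status() until it returns something other than IN_PROGRESS."""
    checks = 0
    while True:
        result = check_status()
        if result != "IN_PROGRESS":
            return result
        checks += 1
        if max_checks is not None and checks >= max_checks:
            raise RuntimeError("no answer after %d checks" % checks)
        time.sleep(interval)


# Stand-in for the status request: two "still waiting" answers followed by
# a made-up JSON document.
answers = iter(["IN_PROGRESS", "IN_PROGRESS", '{"age": "30"}'])
result = wait_for_result(lambda: next(answers), interval=0.01)

if result == "CANCEL":
    print("user cancelled the form")
else:
    print(result)
```

Unlike the Ruby loop in step 5, this sketch sleeps only between checks and can give up after a fixed number of attempts, which may be useful when the user never answers.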

RESTful invocation

GridSpace Experiment Workbench provides an API to invoke executors via REST services. The invocation mechanism is described in the User Guide here. This tutorial is an example of using the Ruby language to execute code with a REST service. We will use a simple HTTP client to submit a task, get its execution status and retrieve the contents of its standard output and error.

  1. Log in to the Workbench using any executor with a Ruby interpreter configured.

  2. Select Ruby interpreter in an empty snippet panel.

  3. Copy and paste the code listed below into the snippet. It uses a Ruby HTTP client to submit a gLite job using the Grid executor accessed through the Workbench REST API. The job produces output containing the host name of the worker node. Note that the snippet uses two environment variables provided by Experiment Workbench to construct the invocation address and obtain the authentication token (ENV['GS2_REST_ENDPOINT'] and ENV['GS2_EXPERIMENT_SESSION_ID']). To submit a task we call the "run" service with the POST HTTP method and "application/json" as the content type of the request message.

    require "net/http"
    require 'json'
    
    session_id = ENV['GS2_EXPERIMENT_SESSION_ID']
    
    run = {"code" => "echo \"Sample execution on `hostname` machine\""}
    
    uri = URI("#{ENV['GS2_REST_ENDPOINT']}/grid-executor-0.1.1/run/#{session_id}")
    http = Net::HTTP.new(uri.host, uri.port)
    
    request = Net::HTTP::Post.new(uri.request_uri)
    request["Content-Type"] = "application/json"
    request.body = JSON.dump(run)
    
    response = http.request(request)
    resp = JSON.parse(response.body)
    
    if (resp["responseStatus"] == "ERROR") 
       puts "Execution failed: #{resp["errorMessage"]}"
       exit
    end 
    
    # Save the controller id; the block form closes (and flushes) the file.
    File.open("/tmp/controller_id.txt", "w") do |idFile|
      idFile.write(resp["controllerId"])
    end
    
    puts "JOB STARTED!"
    puts "CONTROLLER ID: #{resp["controllerId"]}"
  4. Run the snippet code using the Play icon button. You should see a message similar to the one visible below.

    Execution failed: Executor grid-executor-0.1.1 is not connected
  5. The message indicates that you need to be logged in to a grid executor. To do that, click the name of the executor you are currently logged in to (in the top-right corner of the Workbench) and select Login to another executor. In the popup window select the grid executor. To authenticate you need a generated proxy with the vo.plgrid.pl VOMS extension. The proxy may also be generated using an applet provided by the Workbench. Choose the appropriate authentication method and log in to the grid executor.

  6. Now, when you run the snippet again, the message should indicate that the job was successfully submitted.

    JOB STARTED!
    CONTROLLER ID: a96dfb9b-46ec-4bd2-8f98-49c79fa13bfb

    The controller identifier was saved in the file "/tmp/controller_id.txt" and will be used later to check status and output of the job.

  7. The next snippet reads the controller id from the saved file and gets the status of the job for us. This time we are using a service called "status" and the GET HTTP method. You may run the snippet listed below several times and wait until the job is done.

    require "net/http"
    require 'json'
    
    id = File.read('/tmp/controller_id.txt')
    session_id = ENV['GS2_EXPERIMENT_SESSION_ID']
    
    uri = URI("#{ENV['GS2_REST_ENDPOINT']}/grid-executor-0.1.1/status/#{session_id}/#{id}")
    http = Net::HTTP.new(uri.host, uri.port)
    
    request = Net::HTTP::Get.new(uri.request_uri)
    request["Content-Type"] = "application/json"
    
    resp = JSON.parse(http.request(request).body)
    
    if (resp["responseStatus"] == "ERROR") 
       puts "Get status failed: #{resp["errorMessage"]}"
       exit
    end 
    
    puts "JOB STATUS: #{resp["status"]}"
  8. When the submitted job is finished, the message produced by the snippet should be identical to the one listed below:

    JOB STATUS: WMS:DONE
  9. Now, let's get the job's standard output and error.

    require "net/http"
    require 'json'
    
    id = File.read('/tmp/controller_id.txt')
    session_id = ENV['GS2_EXPERIMENT_SESSION_ID']
    
    uri = URI("#{ENV['GS2_REST_ENDPOINT']}/grid-executor-0.1.1/output/#{session_id}/#{id}")
    http = Net::HTTP.new(uri.host, uri.port)
    
    request = Net::HTTP::Get.new(uri.request_uri)
    request["Content-Type"] = "application/json"
    
    resp = JSON.parse(http.request(request).body)
    
    if (resp["responseStatus"] == "ERROR") 
       puts "Get output failed: #{resp["errorMessage"]}"
       exit
    end 
    
    puts "JOB OUTPUT: #{resp["output"]}"
    
    uri = URI("#{ENV['GS2_REST_ENDPOINT']}/grid-executor-0.1.1/error/#{session_id}/#{id}")
    
    request = Net::HTTP::Get.new(uri.request_uri)
    request["Content-Type"] = "application/json"
    
    resp = JSON.parse(http.request(request).body)
    
    if (resp["responseStatus"] == "ERROR") 
       puts "Get error failed: #{resp["errorMessage"]}"
       exit
    end 
    
    puts "JOB ERROR: #{resp["error"]}"
  10. When the output is ready we should be able to see the message produced by the job.

    JOB OUTPUT: Sample execution on wn654 machine
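All four calls used in this tutorial (run, status, output and error) follow the same address pattern: endpoint, executor name, service name, session identifier and, for everything except run, the controller identifier. The Python sketch below only builds these addresses; it sends nothing. The function name service_url, the example endpoint and the session value are hypothetical, while the executor name and the controller identifier are copied from the listings above.

```python
def service_url(endpoint, executor, service, session_id, controller_id=None):
    """Build the address of a REST service following the pattern used above."""
    parts = [endpoint.rstrip("/"), executor, service, session_id]
    if controller_id is not None:
        # Strip whitespace in case the id was read from a file.
        parts.append(controller_id.strip())
    return "/".join(parts)


# Hypothetical values; in a snippet they would come from GS2_REST_ENDPOINT,
# GS2_EXPERIMENT_SESSION_ID and /tmp/controller_id.txt.
endpoint = "http://workbench.example/rest"
session_id = "SESSION"
controller_id = "a96dfb9b-46ec-4bd2-8f98-49c79fa13bfb"
executor = "grid-executor-0.1.1"

print(service_url(endpoint, executor, "run", session_id))
for service in ("status", "output", "error"):
    print(service_url(endpoint, executor, service, session_id, controller_id))
```

Keeping the address construction in one place makes it harder for the status, output and error snippets to drift apart when the executor name or version changes.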