RedLib - Quick Start
Here you will find some background information about the reduction library (RedLib) of PARCutils, as well
as some examples that illustrate the features it provides.
1. RedLib Framework Design and Implementation
1.1 Why Reductions?
In many parallel computations, each of the parallel components produces a partial result, which needs
to be integrated with those of the other components to compute the final outcome. The process of integrating
the results from different components is called reduction.
Moreover, shared-memory parallel computing has become predominant as multi-core devices
have become ubiquitous, and for the same reason reductions have long been exploited in parallel systems.
1.2 Design Principles of RedLib
The framework design emphasizes modifiability and code reuse, by basing the mechanism on a class
called Reducible, and a base interface called Reduction. You can find comprehensive explanations of Reducible
and Reduction in the
introductory tutorial provided for the Parallel Iterator.
It is noteworthy that every reduction class, whether provided by RedLib or developed by an individual programmer for a customized
operation, must implement the
Reduction
interface, which is presented below:
public interface Reduction<E> {
    public E reduce(E first, E second);
}
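As a hypothetical illustration of a customized operation (StringConcat is not a RedLib class, and the interface is re-declared locally so the sketch compiles on its own), a custom reduction that concatenates two partial strings could look like this:

```java
// The base interface, re-declared here so the sketch is self-contained.
interface Reduction<E> {
    E reduce(E first, E second);
}

// Hypothetical custom reduction: merges two partial results by concatenation.
class StringConcat implements Reduction<String> {
    public String reduce(String first, String second) {
        return first + second;
    }
}
```

Any such class can then be passed wherever a Reduction is expected.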
Moreover, the features provided by RedLib are designed to deliver performance comparable to manual user
implementations (i.e., no overhead penalties). These features can be listed as follows.
- Providing reusable implementations that promote modifiability and code reuse
- Including relational algebraic operations (e.g., union and intersection of sets) to allow more complex reductions
- Avoiding inefficiency factors, such as state variables and shared objects, so that implementations can safely be used in parallel
- Supporting nesting of reduction objects for dictionary types, by performing reductions in separate stages for this category
- Using generic <Key, Value> data pairs to allow a wider range of data processing
1.3 Categories of Implemented Reductions
RedLib offers a set of ready-to-use reductions at three different levels of complexity. The following paragraphs explain each of these
complexity levels in detail, and specify which reductions are implemented for each category.
1.3.1 Scalar/Simple Reductions
This category includes reduction operations on primitive types at the lowest complexity level. These operations can be listed as follows.
- Sum, Multiplication, Average, Minimum, Maximum operations for Integer, Long, Short, Float, Double, BigInteger and BigDecimal types
- BitWiseAND, BitWiseOR and BitWiseXOR operations for Boolean, Byte, Integer and Short types
- AND, OR, XOR operations for the Boolean type
The following code snippet demonstrates the implementation for
FloatAverage as one of the operations available at this level.
public class FloatAverage implements Reduction<Float> {
    public Float reduce(Float first, Float second) {
        return (first + second) / 2.0f;
    }
}
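To show such a scalar reduction at work, the sketch below folds the partial results of several hypothetical parallel tasks pairwise; the interface and an IntegerSum-style class are declared locally so the snippet stands alone, and the demo class name is illustrative:

```java
// Re-declared locally; mirrors RedLib's Reduction interface.
interface Reduction<E> {
    E reduce(E first, E second);
}

// Mirrors RedLib's IntegerSum: adds two partial results.
class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class ScalarReductionDemo {
    // Folds partial results pairwise, exactly as a reduction stage would.
    static int sumAll(int... partials) {
        Reduction<Integer> sum = new IntegerSum();
        int result = partials[0];
        for (int i = 1; i < partials.length; i++) {
            result = sum.reduce(result, partials[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Partial sums returned by four hypothetical parallel tasks.
        System.out.println(sumAll(10, 20, 30, 40)); // prints 100
    }
}
```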
1.3.2 Collections/Aggregate Type Reductions
At this higher level, reduction operations are performed on aggregate types, and are slightly more complex than reductions
on primitive types. Reduction operations at this level use generic types, and include union and intersection of Collections and Sets
(unlike Sets, Collections allow duplicates). As an example, the following pseudo-code represents the implementation of SetUnion in RedLib.
public class SetUnion<T> implements Reduction<Set<T>> {
    public Set<T> reduce(Set<T> first, Set<T> second) {
        for (T t : second) {
            //the following method disregards duplicates
            first.add(t);
        }
        return first;
    }
}
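The duplicate-discarding behaviour can be seen in a small self-contained sketch (interface and class re-declared locally; the demo class name is illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

interface Reduction<E> {
    E reduce(E first, E second);
}

class SetUnion<T> implements Reduction<Set<T>> {
    public Set<T> reduce(Set<T> first, Set<T> second) {
        // Set.addAll silently discards duplicates.
        first.addAll(second);
        return first;
    }
}

class SetUnionDemo {
    static Set<String> demo() {
        Set<String> first = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> second = new HashSet<>(Arrays.asList("y", "z"));
        // "y" appears in both sets but is kept only once.
        return new SetUnion<String>().reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo().size()); // prints 3
    }
}
```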
1.3.3 Map/Dictionary Reductions
Finally, at the highest level, reductions involve operations on dictionary types (e.g., union and intersection on maps), using generic types for
<Key, Value> pairs. Reductions in this category consist of two stages. First, elements are grouped based on their keys. Second,
elements associated with the same key are reduced into one element.
At this level of complexity, reductions can nest other reduction objects to perform the second stage, so a comprehensive range of combinations
can be formed that promotes code reuse. As an example, the following pseudo-code represents the implementation of MapUnion in RedLib.
public class MapUnion<K, V> implements Reduction<Map<K, V>> {
    private Reduction<V> reducer;
    public MapUnion(Reduction<V> r) {
        reducer = r;
    }
    public Map<K, V> reduce(Map<K, V> first, Map<K, V> second) {
        for (K k2 : second.keySet()) {
            if (first.containsKey(k2)) {
                V result = reducer.reduce(first.get(k2), second.get(k2));
                first.put(k2, result);
            } else {
                first.put(k2, second.get(k2));
            }
        }
        return first;
    }
}
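The two stages can be observed in a small self-contained sketch (all classes declared locally to mirror the pseudo-code above; the demo class name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

interface Reduction<E> {
    E reduce(E first, E second);
}

class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class MapUnion<K, V> implements Reduction<Map<K, V>> {
    private Reduction<V> reducer;
    MapUnion(Reduction<V> r) {
        reducer = r;
    }
    public Map<K, V> reduce(Map<K, V> first, Map<K, V> second) {
        for (K key : second.keySet()) {
            if (first.containsKey(key)) {
                // Stage two: values sharing a key are reduced into one element.
                first.put(key, reducer.reduce(first.get(key), second.get(key)));
            } else {
                first.put(key, second.get(key));
            }
        }
        return first;
    }
}

class MapUnionDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> first = new HashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> second = new HashMap<>();
        second.put("b", 3);
        second.put("c", 4);
        // The values under the shared key "b" are summed: 2 + 3 = 5.
        return new MapUnion<String, Integer>(new IntegerSum()).reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // e.g. {a=1, b=5, c=4} (iteration order may vary)
    }
}
```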
1.4 Nesting Reductions
It should now be clear how confusing reductions can become when they operate on complex, high-level data structures. Implementing these types of complex
reductions can be cumbersome, error-prone and hard to modify for individual programmers. Moreover, imagine how the readability of one's code
may suffer when implementing reductions on convoluted dictionary types, such as maps of maps.
RedLib allows programmers to nest reduction operations within the third category to implement complex reductions on such convoluted dictionary
types. Considering the variety of reductions that can be nested, and considering that RedLib reductions are based on generic types, users
have considerable flexibility at runtime for specifying their reduction operations. Moreover, the code required to implement such
operations is reduced to one or two lines. The next section presents some examples that clarify the flexibility and ease of use of
RedLib reductions.
2. RedLib Examples
In this section we present some of the reduction operations that are exploited in well-known distributed-system applications. The applications
discussed in this section were originally implemented in HiBench (a benchmark suite for Hadoop released by Intel).
However, we ported the implementations to their shared-memory equivalents, because RedLib is a framework focused on shared-memory
applications (even though it can be used for distributed memory as well), whereas the original benchmarks were based on Hadoop for distributed memory.
2.1 WordCount
This application counts the frequency of user-specified patterns among a number of documents, and is commonly used as a functional module within
distributed-memory applications. In this application a number of documents are inspected in parallel by a number of threads. Each parallel task
processes a document separately for the specified patterns and returns a map of <Pattern, Frequency> tuples, which represents the frequency
for each pattern in that document. Finally, when the results from all of the threads are returned, the values corresponding to each pattern are
summed to form the final map.
This reduction is performed in two layers. The outer layer produces the union of the maps that are returned from different threads.
The inner layer is performed while the union of each pair of maps is calculated, and sums every two values (i.e., frequencies) that are
associated with the same key (i.e., pattern). The following pseudo-code presents one possible implementation of this algorithm by an individual programmer.
Set<String> patterns = secondMap.keySet();
for (String pattern : patterns) {
    if (finalResult.containsKey(pattern)) {
        int tempValue = finalResult.get(pattern) + secondMap.get(pattern);
        finalResult.put(pattern, tempValue);
    } else {
        finalResult.put(pattern, secondMap.get(pattern));
    }
}
This implementation can be reduced to two lines of code using RedLib reductions. The following code presents the equivalent RedLib
implementation of the same operation.
import pu.RedLib.*;
...
MapUnion<String, Integer> reducer = new MapUnion<>(new IntegerSum());
reducer.reduce(finalResult, secondMap);
As demonstrated in the code above, an instance of
IntegerSum has been nested in an instance of
MapUnion. This implies that the inner
reduction operation of
MapUnion is the sum of the values associated with the same key. Thus, if one requires the maximum of the two
values instead of the sum, the only change required to the code is nesting an instance of
IntegerMaximum rather than
IntegerSum.
2.2 RankedInvertedIndex
This application lists the documents in which user-specified words can be found, as well as the frequencies of each word in those documents. The application is commonly
utilized in web-based search tasks, where documents are ranked based on their relevance to a specific topic. In this application, parallel tasks
search through documents separately, and each task returns its own map of <Word, <Document, Frequency>> tuples. At the final
stage, the maps returned from the parallel tasks are reduced to form the final map of <Word, <Document, Frequency>> tuples.
The reduction operations involved in this process can be structured in three layers. The outer-most layer produces the union of the
<Word, <Document, Frequency>> maps returned from different threads. If a specific word is found in different documents by
separate threads, the union of the <Document, Frequency> tuples associated with that word is produced in the second layer. Furthermore, if separate
threads investigate the same document for the same word, the maximum value returned from those threads is associated with that document in the
final result. The following code presents one possible approach for implementing this reduction.
Set<String> words = secondMap.keySet();
for (String word : words) {
    if (firstMap.containsKey(word)) {
        Set<String> documents = secondMap.get(word).keySet();
        for (String document : documents) {
            if (firstMap.get(word).containsKey(document)) {
                if (firstMap.get(word).get(document) < secondMap.get(word).get(document))
                    firstMap.get(word).put(document, secondMap.get(word).get(document));
            } else {
                firstMap.get(word).put(document, secondMap.get(word).get(document));
            }
        }
    } else {
        firstMap.put(word, secondMap.get(word));
    }
}
The reduction algorithm for this application is a considerably complex operation on a
map of maps. Implementing these types of operations can be confusing, error-prone, hard
to modify and, more importantly, easy to implement inefficiently for novice developers.
RedLib enables nesting different layers of reduction operations and implementing such
algorithms with only two lines of code, as follows.
import pu.RedLib.*;
...
Reduction<Map<String, Map<String, Integer>>> reducer =
new MapUnion<String, Map<String, Integer>>(new MapUnion<String, Integer>(new IntegerMaximum()));
reducer.reduce(finalResult, secondMap);
As demonstrated in the code above, reduction layers are nested into each other, such that the
inner-most reduction layer is performed by the inner-most reduction object (i.e.,
IntegerMaximum).
Note that every RedLib reduction object can be declared as an instance of its base interface (i.e., Reduction).
This approach provides the advantages of polymorphism and, when combined with the concept of nestable reductions,
it offers programmers runtime flexibility in deciding the types of reduction operations.
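A small sketch of this runtime flexibility, with locally declared stand-ins for RedLib's IntegerSum and IntegerMaximum (the demo class and the combine method are illustrative, not part of RedLib):

```java
interface Reduction<E> {
    E reduce(E first, E second);
}

class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class IntegerMaximum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return Math.max(first, second);
    }
}

class PolymorphismDemo {
    // The concrete reduction is chosen by the caller at runtime,
    // behind the shared Reduction interface.
    static int combine(Reduction<Integer> reducer, int first, int second) {
        return reducer.reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(combine(new IntegerSum(), 3, 5));     // prints 8
        System.out.println(combine(new IntegerMaximum(), 3, 5)); // prints 5
    }
}
```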