RedLib - Quick Start
Here you will find some background information about the reduction library (RedLib) of PARCutils, as well
as some examples that illustrate the features it provides.
1. RedLib Framework Design and Implementation
1.1 Why Reductions?
In many parallel computations, each of the parallel components produces a partial result, which needs
to be integrated with those of the other components to compute the final outcome. The process of integrating
the results from different components is called reduction.
Moreover, shared-memory parallel computing has become predominant as multi-core devices
have become ubiquitous, and for the same reason reductions have long been exploited in parallel systems.
1.2 Design Principles of RedLib
The framework design emphasizes modifiability and code reuse, by basing the mechanism on a class
called Reducible, and a base interface called Reduction. You can find comprehensive explanations of Reducible
and Reduction in the
introductory tutorial provided for the Parallel Iterator.
It is noteworthy that every reduction class, whether provided by RedLib or developed by an individual programmer for a customized
operation, must implement the
Reduction
interface, which is presented below:
public interface Reduction<E> {
    public E reduce(E first, E second);
}
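As a hypothetical illustration of a customized operation (StringConcat is not a RedLib class, and the interface is re-declared locally so the sketch compiles on its own), a custom reduction that concatenates two partial strings could look like this:

```java
// The base interface, re-declared here so the sketch is self-contained.
interface Reduction<E> {
    E reduce(E first, E second);
}

// Hypothetical custom reduction: merges two partial results by concatenation.
class StringConcat implements Reduction<String> {
    public String reduce(String first, String second) {
        return first + second;
    }
}
```

Any such class can then be passed wherever a Reduction is expected.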
Moreover, the features provided by RedLib are designed to deliver performance comparable to manual user
implementations (i.e., no overhead penalties). These features can be listed as follows.
- Providing reusable implementations that promote modifiability and code reuse
- Including relational algebraic operations (e.g., union and intersection of sets) to allow more complex reductions
- Avoiding inefficiency factors, such as state variables and shared objects, so that implementations can safely be used in parallel
- Supporting nesting of reduction objects for dictionary types, by performing reductions in separate stages for this category
- Using generic <Key, Value> data pairs to allow a wider range of data processing
1.3 Categories of Implemented Reductions
RedLib offers a set of ready-to-use reductions at three different levels of complexity. The following paragraphs explain each of these
complexity levels in detail, and specify which reductions are implemented for each category.
1.3.1 Scalar/Simple Reductions
This category includes reduction operations on primitive types at the lowest complexity level. These operations can be listed as follows.
- Sum, Multiplication, Average, Minimum, Maximum operations for Integer, Long, Short, Float, Double, BigInteger and BigDecimal types
- BitWiseAND, BitWiseOR and BitWiseXOR operations for Boolean, Byte, Integer and Short types
- AND, OR, XOR operations for the Boolean type
The following code snippet demonstrates the implementation for
FloatAverage as one of the operations available at this level.
public class FloatAverage implements Reduction<Float> {
    public Float reduce(Float first, Float second) {
        return (first + second) / 2.0f;
    }
}
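To show such a scalar reduction at work, the sketch below folds the partial results of several hypothetical parallel tasks pairwise; the interface and an IntegerSum-style class are declared locally so the snippet stands alone, and the demo class name is illustrative:

```java
// Re-declared locally; mirrors RedLib's Reduction interface.
interface Reduction<E> {
    E reduce(E first, E second);
}

// Mirrors RedLib's IntegerSum: adds two partial results.
class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class ScalarReductionDemo {
    // Folds partial results pairwise, exactly as a reduction stage would.
    static int sumAll(int... partials) {
        Reduction<Integer> sum = new IntegerSum();
        int result = partials[0];
        for (int i = 1; i < partials.length; i++) {
            result = sum.reduce(result, partials[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Partial sums returned by four hypothetical parallel tasks.
        System.out.println(sumAll(10, 20, 30, 40)); // prints 100
    }
}
```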
1.3.2 Collections/Aggregate Type Reductions
At this higher level, reduction operations are performed on aggregate types, and are slightly more complex than reductions
on primitive types. Reduction operations at this level use generic types, and include union and intersection of Collections and Sets
(unlike Sets, Collections allow duplicates). As an example, the following pseudo-code represents the implementation of SetUnion in RedLib.
public class SetUnion<T> implements Reduction<Set<T>> {
    public Set<T> reduce(Set<T> first, Set<T> second) {
        for (T t : second) {
            //the following method disregards duplicates
            first.add(t);
        }
        return first;
    }
}
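The duplicate-discarding behaviour can be seen in a small self-contained sketch (interface and class re-declared locally; the demo class name is illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

interface Reduction<E> {
    E reduce(E first, E second);
}

class SetUnion<T> implements Reduction<Set<T>> {
    public Set<T> reduce(Set<T> first, Set<T> second) {
        // Set.addAll silently discards duplicates.
        first.addAll(second);
        return first;
    }
}

class SetUnionDemo {
    static Set<String> demo() {
        Set<String> first = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> second = new HashSet<>(Arrays.asList("y", "z"));
        // "y" appears in both sets but is kept only once.
        return new SetUnion<String>().reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo().size()); // prints 3
    }
}
```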
1.3.3 Map/Dictionary Reductions
Finally, at the highest level, reductions involve operations on dictionary types (e.g., union and intersection on maps), using generic types for
<Key, Value> pairs. Reductions in this category consist of two stages. First, elements are grouped based on their keys. Second,
elements associated with the same key are reduced into one element.
At this level of complexity, reductions can nest other reduction objects to perform the second stage, so a comprehensive range of combinations
can be formed that promotes code reuse. As an example, the following pseudo-code represents the implementation of MapUnion in RedLib.
public class MapUnion<K, V> implements Reduction<Map<K, V>> {
    private Reduction<V> reducer;
    public MapUnion(Reduction<V> r) {
        reducer = r;
    }
    public Map<K, V> reduce(Map<K, V> first, Map<K, V> second) {
        for (K k2 : second.keySet()) {
            if (first.containsKey(k2)) {
                V result = reducer.reduce(first.get(k2), second.get(k2));
                first.put(k2, result);
            } else {
                first.put(k2, second.get(k2));
            }
        }
        return first;
    }
}
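The two stages can be observed in a small self-contained sketch (all classes declared locally to mirror the pseudo-code above; the demo class name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

interface Reduction<E> {
    E reduce(E first, E second);
}

class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class MapUnion<K, V> implements Reduction<Map<K, V>> {
    private Reduction<V> reducer;
    MapUnion(Reduction<V> r) {
        reducer = r;
    }
    public Map<K, V> reduce(Map<K, V> first, Map<K, V> second) {
        for (K key : second.keySet()) {
            if (first.containsKey(key)) {
                // Stage two: values sharing a key are reduced into one element.
                first.put(key, reducer.reduce(first.get(key), second.get(key)));
            } else {
                first.put(key, second.get(key));
            }
        }
        return first;
    }
}

class MapUnionDemo {
    static Map<String, Integer> demo() {
        Map<String, Integer> first = new HashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> second = new HashMap<>();
        second.put("b", 3);
        second.put("c", 4);
        // The values under the shared key "b" are summed: 2 + 3 = 5.
        return new MapUnion<String, Integer>(new IntegerSum()).reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // e.g. {a=1, b=5, c=4} (iteration order may vary)
    }
}
```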
1.4 Nesting Reductions
It should now be clear how confusing reductions can become when they operate on complex, high-level data structures. Implementing these types of complex
reductions can be cumbersome, error-prone and hard to modify for individual programmers. Moreover, imagine how the readability of one's code
may suffer when implementing reductions on convoluted dictionary types, such as maps of maps.
RedLib allows programmers to nest reduction operations within the third category to implement complex reductions on such convoluted dictionary
types. Considering the variety of reductions that can be nested, and considering that RedLib reductions are based on generic types, users
have considerable flexibility at runtime for specifying their reduction operations. Moreover, the code required to implement such
operations is reduced to one or two lines. The next section presents some examples that clarify the flexibility and ease of use of
RedLib reductions.
2. RedLib Examples
In this section we present some of the reduction operations that are exploited in well-known distributed-system applications. The applications
discussed in this section were originally implemented in HiBench (a benchmark suite for Hadoop released by Intel).
However, we ported the implementations to their shared-memory equivalents, because RedLib is a framework focused on shared-memory
applications (even though it can be used for distributed memory as well), whereas the original benchmarks were based on Hadoop for distributed memory.
2.1 WordCount
This application counts the frequency of user-specified patterns among a number of documents, and is commonly used as a functional module within
distributed-memory applications. In this application a number of documents are inspected in parallel by a number of threads. Each parallel task
processes a document separately for the specified patterns and returns a map of <Pattern, Frequency> tuples, which represents the frequency
for each pattern in that document. Finally, when the results from all of the threads are returned, the values corresponding to each pattern are
summed to form the final map.
This reduction is performed in two layers. The outer layer produces the union of the maps that are returned from different threads.
The inner layer is performed while the union of each pair of maps is calculated, and sums every two values (i.e., frequencies) that are
associated with the same key (i.e., pattern). The following pseudo-code presents one possible implementation of this algorithm by an individual programmer.
Set<String> patterns = secondMap.keySet();
for (String pattern : patterns) {
    if (finalResult.containsKey(pattern)) {
        int tempValue = finalResult.get(pattern) + secondMap.get(pattern);
        finalResult.put(pattern, tempValue);
    } else {
        finalResult.put(pattern, secondMap.get(pattern));
    }
}
This implementation can be reduced to two lines of code using RedLib reductions. The following code presents the equivalent RedLib
implementation of the same operation.
import pu.RedLib.*;
...
MapUnion<String, Integer> reducer = new MapUnion<>(new IntegerSum());
reducer.reduce(finalResult, secondMap);
As demonstrated in the code above, an instance of
IntegerSum has been nested in an instance of
MapUnion. This implies that the inner
reduction operation of
MapUnion is the sum of the values associated with the same key. Thus, if one requires the maximum of the two
values instead of the sum, the only change required to the code is nesting an instance of
IntegerMaximum rather than
IntegerSum.
2.2 RankedInvertedIndex
This application lists the documents in which user-specified words can be found, as well as the frequencies of each word in those documents. The application is commonly
utilized in web-based search tasks, where documents are ranked based on their relevance to a specific topic. In this application, parallel tasks
search through documents separately, and each task returns its own map of <Word, <Document, Frequency>> tuples. At the final
stage, the maps returned from the parallel tasks are reduced to form the final map of <Word, <Document, Frequency>> tuples.
The reduction operations involved in this process can be structured in three layers. The outer-most layer produces the union of the
<Word, <Document, Frequency>> maps returned from different threads. If a specific word is found in different documents by
separate threads, the union of the <Document, Frequency> tuples associated with that word is produced in the second layer. Furthermore, if separate
threads investigate the same document for the same word, the maximum value returned from those threads is associated with that document in the
final result. The following code presents one possible approach for implementing this reduction.
Set<String> words = secondMap.keySet();
for (String word : words) {
    if (firstMap.containsKey(word)) {
        Set<String> documents = secondMap.get(word).keySet();
        for (String document : documents) {
            if (firstMap.get(word).containsKey(document)) {
                if (firstMap.get(word).get(document) < secondMap.get(word).get(document))
                    firstMap.get(word).put(document, secondMap.get(word).get(document));
            } else {
                firstMap.get(word).put(document, secondMap.get(word).get(document));
            }
        }
    } else {
        firstMap.put(word, secondMap.get(word));
    }
}
The reduction algorithm for this application is a considerably complex operation on a
map of maps. Implementing these types of operations can be confusing, error-prone, hard
to modify and, more importantly, easy to implement inefficiently for novice developers.
RedLib enables nesting different layers of reduction operations and implementing such
algorithms with only two lines of code, as follows.
import pu.RedLib.*;
...
Reduction<Map<String, Map<String, Integer>>> reducer =
new MapUnion<String, Map<String, Integer>>(new MapUnion<String, Integer>(new IntegerMaximum()));
reducer.reduce(finalResult, secondMap);
As demonstrated in the code above, reduction layers are nested into each other, such that the
inner-most reduction layer is performed by the inner-most reduction object (i.e.,
IntegerMaximum).
Note that every RedLib reduction object can be declared as an instance of its base interface (i.e., Reduction).
This approach provides the advantages of polymorphism and, when combined with the concept of nestable reductions,
it offers programmers runtime flexibility in deciding the types of reduction operations.
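A small sketch of this runtime flexibility, with locally declared stand-ins for RedLib's IntegerSum and IntegerMaximum (the demo class and the combine method are illustrative, not part of RedLib):

```java
interface Reduction<E> {
    E reduce(E first, E second);
}

class IntegerSum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return first + second;
    }
}

class IntegerMaximum implements Reduction<Integer> {
    public Integer reduce(Integer first, Integer second) {
        return Math.max(first, second);
    }
}

class PolymorphismDemo {
    // The concrete reduction is chosen by the caller at runtime,
    // behind the shared Reduction interface.
    static int combine(Reduction<Integer> reducer, int first, int second) {
        return reducer.reduce(first, second);
    }

    public static void main(String[] args) {
        System.out.println(combine(new IntegerSum(), 3, 5));     // prints 8
        System.out.println(combine(new IntegerMaximum(), 3, 5)); // prints 5
    }
}
```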