Explaining the Qubic Computation Model: Part 4

In the first part of this series, we have looked at a conceptual overview of the Abra specification, which describes the how data flow is achieved within the Qubic Computation Model. In the second part we used the Abra specification to start implementing Qupla, a higher level programming language for Qubic. We introduced Qupla’s most basic entities (trit vectors, look-up tables, and constants) and how they map to Abra. The third part explored the way Qupla uses functions and expressions. This fourth part will show what basic operations Qupla provides.

A quick recap: Qupla expressions are made up of a very limited number of basic operations. These operations can be combined in different ways to express the intent for the expression. There are only five basic operations in Qupla from the programmer’s viewpoint:

Call, which calls a predefined function with a given number of input parameters and evaluates to the return value associated with those input values.
Look-up, which looks up a given combination of input trits in a specific LUT and evaluates to the corresponding LUT output value.
Slice, which evaluates to a sub-vector slice taken from a given input trit vector at a given offset and size within that trit vector.
Concatenate, which evaluates to a single trit vector made up of the concatenation of multiple given input trit vectors.
Merge, which evaluates to the single non-null trit vector from several given equal-sized input trit vectors.

We will now discuss each of these operations separately.

Now that we know from the previous article how a function is declared we will look deeper into the details of how a function is actually called. The earlier examples of function calls that were used in the function bodies may have seemed very obvious for someone who has programmed in other languages, but Qupla function calls come with a few twists.

It is expected that most IoT devices will utilize FPGAs or ASICs in the future. The Abra specification supports the way in which function calls necessarily are implemented on such devices. Remember that FPGAs use hard-wired circuitry to implement a function and data flows through that circuitry. A function is hard-wired to the code that calls it. That means that in FPGA programs any time a function is called it must be replicated in circuitry by instantiating a duplicate of that function and hard-wiring the duplicate to the call point.

On FPGA there is no real function call overhead because it’s just direct connection of circuitry, hard-wired together. Essentially it is as if every function is inlined at the place of the call. For example, the results of parameter expressions are directly wired to where those parameters are used within the function call. We will see that this inline property of functions has really nice consequences for the few basic operations that Qupla has.

However, function instantiation has a very important consequence for state variables, because each function instantiated on FPGA will have its own copy of the state variable circuitry. It means that you will only be able to get to the exact same state variable when the exact same call path is followed to the exact same function instance. This means that a specific state variable can be accessed only once during a single processing wave because every function call within a single wave necessarily has a different call path.

The implication for programming with state variables is therefore that you need to encapsulate state variables in micro-services that are accessed in separate waves through interaction with the Qubic Supervisor. We will go deeper into this paradigm shift when we discuss the Supervisor.

Note that even though on CPU-based devices it is not necessary to instantiate each function separately, we still need to ensure identical behavior on all possible devices, which means we need to emulate the behavior of FPGA-based implementations. State variables will therefore still have to be associated with their call path and will have to be stored separately from the function. This can easily be achieved by using a global HashMap where the state variable value is stored under a key that consists of the call path plus the state variable id.

Another twist with function calls is that when all input parameters are null, the resulting output is automatically null as well. The reason for that is that null signifies the absence of value. And on FPGA circuitry this could mean that the function does not get triggered at all so that no energy is spent and of course there is no output value in that case. It means that for any function to start executing it needs at least a single non-null trit as input data.

This feature can be used to prevent further alternative path execution in decision logic as we will see when we explore decision logic in depth. On CPU devices a quick test of the input parameters can be used to short-circuit the function when all inputs are null by not calling the function at all and returning null immediately. Note that by using proper function analysis we can even short-circuit functions on CPU that we know to return null when any of the input parameters is null. An example of such a function isadd(a, b), which will return null wheneitherais null or bis null

We will go deeper into the impact on Qupla programming style of these exotic function call mechanisms when we explore some code examples in depth in a future part of this series. But for now we will first explore the remaining four basic operations in Qupla, because at some point a function call needs to actually do something more than just call other functions.

The trit vector concatenation operation is used to concatenate multiple trit vectors into a single trit vector of cumulative size. We will often want to use a divide-and-conquer technique when creating functionality by composing larger vectors from several smaller vectors. Therefore we need the ability to split up a vector in smaller sub-vectors (slice operation), so that we can perform operations on them, and then we need to recombine the small result vectors into larger combined result vectors (concatenation operation). Consider this example:

func Trit all<Trit> (Trit val) {return val}func Tryte all<Tryte>(Trit val) {return val & val & val}func Tiny all<Tiny>(Trit val) {return all<Tryte>(val) & all<Tryte>(val) & all<Tryte>(val)}func Int all<Int>(Trit val) {return all<Tiny>(val) & all<Tiny>(val) & all<Tiny>(val)}

The above functions can be used to create a trit vector that consists of all zeroes, all ones, or all minuses, depending on the input trit value. You simply pass in the trit value you want repeated. Except for the very smallest type most of the larger types can be expressed in terms of three instances of the next smaller type. This feature has been put to good use in the standard Qupla library to create functionality that applies to a wide range of trit vector sizes.

As you can see in the first function of the example the most trivial case is for the Trit type. We simply return the single trit parameter value as a single trit.

The Tryte type in the second function requires a bit more work. Here we use the trit vector concatenation operation&to combine 3 instances of the single Trit parameter value into a single 3-trit Try tevector. As you can see, it is okay to use the same input multiple times as input to a concatenation operation.

The Tiny type in the third function really starts demonstrating how to build larger types from smaller types. Here we use&with the previous Tryte type function to concatenate three 3-trit Tryt evectors into a single 9-tritTinyvector.

Finally, the fourth function does the same for the Int type by using&to concatenate three 9-trit Tiny vectors into a single 27-tryte Intvector.

In fact, since we defined each of our larger types to be 3 times larger than the previous smaller type, we can use a template that can instantiate all functions for such types as follows:

func Trit all<Trit> (Trit val) {return val}func Tryte all<Tryte>(Trit val) {return val & val & val}template all<T> {type P [T / 3]func T all<T>(Trit val) {return all<P>(val) & all<P>(val) & all<P>(val)}}

In fact, even the functionall < Tryte > () could be defined using this template as well if you think about it. Sinceall < Trit > () returns the input parameter valwe could replace the 3 concatenations of val with 3 calls toall<Trit>(). “But doesn’t that mean inefficiency due to extra function call overhead?”
I’m glad you asked.

To programmers used to programming on a CPU it may seem that all these function calls and concatenations that basically manipulate single trits at a time may result in a lot of inefficient overhead, and sure, on a CPU this would probably be the case. But remember that Abra is geared towards IoT and therefore FPGA programming. Let’s examine what that means for these Qupla functions when they get translated to Abra.

// func Trit all<Trit>(Trit val)branch all_1[ 1] in val[ 1] out ret = merge{val}//func Tryte all<Tryte>(Trit val)branch all_3[ 1] in val[ 1] out ret0 = merge{val}[ 1] out ret1 = merge{val}[ 1] out ret2 = merge{val}// func Tiny all<Tiny>(Trit val)branch all_9[ 1] in val[ 3] out ret0 = all_3(val)[ 3] out ret1 = all_3(val)[ 3] out ret2 = all_3(val)

Parameter passing, return value passing, and concatenation of trit vectors can be done by directly wiring up circuitry. The first function returns its input parameter, which means its resulting output trit can be wired directly to the input parameter trit, which in turn means the entire function ‘disappears’ and becomes a single wire. In the second function something similar happens. The 3 output trits can be wired directly to the input parameter trit to create the resulting 3 output trits. Which again means the function ‘disappears’ into simple wiring.

The third function concatenates the results of 3 calls to the second function, which we already saw can each be replaced by simple wiring to their input parameter trit. Since that input parameter trit is the same as the input parameter trit for the third function the parameter passing is also just a wire connection. And the same goes again for the concatenation of those 3 results. So we end up with 9 output trits that are wired directly to the single input trit and everything in between has ‘disappeared’.

We leave the fourth function, and any further template instantiations, as an exercise to the reader. You will see that all these concatenations will reduce to simple wiring on FPGA. This means that we can express complex functionality in terms of simpler functionality easily without having to pay for the execution overhead that is normally associated with such complexity.

Of course on CPU this call overhead could become a problem, but there are solutions. Remember that function calls essentially instantiate functions in FPGA by inlining them? Well, we could do the same thing here. By inlining each function call we end up with a set of functions that are reduced to the correct number of concatenations of the input value.

In addition we will provide a mechanism that will allow Abra functions to be substituted with a dll function that uses optimal architecture-specific instructions. For example, the multiplication and division functions in our standard Qupla library will probably be very inefficient on CPU hardware, but they can be substituted with architecture-specific dll functions that implement the equivalent through hardware multiplication and division instructions.

The opposite operation of trit vector concatenation is trit vector slicing. Trit vector slicing allows us to extract a smaller sized sub-vector from an input trit vector. One of its intended uses is to split up a larger vector into smaller ones that we can then operate on before recombining the small results into a larger result through concatenation. But there are other uses as well. Consider this example:

template lshift<T> {func T lshift<T> (T val) {return val[1 : T - 1] & 0}}template rshift<T> {func T rshift<T> (T val) {return 0 & val[0 : T - 1]}}

The example implements two generic shift functions. The lshift()function will shift the input trit vector one trit to the left by shifting out the leftmost trit and shifting in a zero trit on the right. The rshift()function does the opposite by shifting out the rightmost trit and shifting in a zero trit on the left.

The slicing operation can take several forms. In this example we use the index form[offset : size]that specifies the zero-based index offset of the trit to start at and the size of the sub-vector we want to extract from that offset. The range thus specified needs to stay within the range of the source trit vector. Offset and size have to be constant expressions. In this case we use type sizes in these constant expressions.

As you can see the lshift()function will use the index form of the slice operation to take all input parameter trits except the leftmost one, and then use the concatenate operation to add the ‘missing’ zero trit to the right. The rshift()function does the opposite by taking all input parameter trits except the rightmost one and concatenates the ‘missing’ zero trit to the left. In Abra it looks like this when instantiating the shift functions for Tryte and Tiny:

// func Tryte lshift<Tryte>(Tryte val)branch lshift_3[ 1] in val_1_0[ 2] in val_2_1[ 2] out ret0 = val_2_1[ 1] out ret1 = constZero_[val_1_0]// func Tiny lshift<Tiny>(Tiny val)branch lshift_9[ 1] in val_1_0[ 8] in val_8_1[ 8] out ret0 = val_8_1[ 1] out ret1 = constZero_[val_1_0]// func Tryte rshift<Tryte>(Tryte val)branch rshift_3[ 2] in val_2_0[ 1] in val_1_2[ 1] out ret0 = constZero_1(val_2_0)[ 2] out ret1 = val_2_0// func Tiny rshift<Tiny>(Tiny val)branch rshift_9[ 8] in val_8_0[ 1] in val_1_8[ 1] out ret0 = constZero_1(val_8_0)[ 8] out ret1 = val_8_0

We take advantage of the fact that we can split up the input trit vector of a branch however we want. For rshift it even means that because we ignore the last trit completely we actually would not even need to specify it. We simply take the part of the trit vector we need and concatenate a zero trit at the correct end by taking advantage of the fact that multiple output sites will be concatenated.

Note that slicing, just like concatenation, on FPGA reduces to simple wiring of the desired input trits to the output result. The output of the above functions can be directly wired to the desired trits of the input parameter trit vector, and the ‘missing’ zero trit is a simple constant LUT look-up.

Here are the possible forms of the slice operation:

value[offset] // returns the single trit found at <offset>value[offset : size] // returns <size> trits starting at <offset>value.field // returns all the trits of the <field> that is part// of the structured trit vector <value>

Notice how accessing a field in a structured trit vector is essentially a slice operation as well. In fact, even using a variable can be viewed as taking the entire vector as a slice, because in the end it will all end up as wiring inside FPGA circuitry. The same goes for function call input parameters and results. That means there is no actual temporary variable storage necessary for variables in FPGA circuitry, instead it’s all just data flowing directly from circuit to circuit.

In the previous sections we showed very simple functions that could directly slice their input or directly concatenate their output. That is not always the case, however. When a slice operation takes place in the middle of a sequence of operations it is not possible to take only a slice from a result directly. In that case Qupla will generate a slice function:

branch slice_3_0 // slice 3 trits at offset 0[ 3] in target[ 3] out ret = merge{target}branch slice_9_0 // slice 9 trits at offset 0[ 9] in target[ 9] out ret = merge{target}branch slice_3_3 // slice 3 trits at offset 3[ 3] in offset[ 3] in target[ 3] out ret = merge{target}branch slice_3_6 // slice 3 trits at offset 6[ 6] in offset[ 3] in target[ 3] out ret = merge{target}

These examples show how Qupla tries to generate optimal slicing functions. Remember that a slice operation always has an offset and a size. These functions don’t care about the size of the input trit vector. They only take the necessary amount of target trits starting at the offset and return those immediately. We encode the size and offset as part of the function name for easy use.

Note how these slice functions, once instantiated, reduce 100% to wiring. They only connect inputs to outputs. Remember, Abra describes data flow. That means that instructions don’t necessarily have to perform an action other than directing (parts of) the data flow. All slice functions will essentially be optimized away on FPGA, and only the desired data flow remains.

One interesting observation can be made about all slice functions that start at offset 0. They always return the first <size> trits. That means that we can (re-)use them as concatenation function! Remember that a branch knot site concatenates its inputs. That means a concatenation operation that needs to be generated in the middle of a sequence of operations can just a branch knot site and call the slice operation for the resulting concatenated size at offset 0, which will return the entire combined input trit vector. Again, it’s just 100% wiring directing data flow.

Now that we know how function calls, slicing, and concatenation are implemented, let’s see what the remaining Abra operations look like when implemented by Qupla. It turns out that whereas the previous instructions reduce completely to wiring, Abra’s remaining 2 instructions will actually result in circuitry. So let’s examine the LUT look-up operation and the merge operation next.

The LUT look-up operation is essentially the processing heart of the Qupla programming language. Most function call chains will eventually end up at a function that executes one or more LUT look-ups to achieve its intended functionality. In addition, the only way to realize the null values necessary for Qupla’s decision logic (merge) operation is through a LUT look-up. Let’s look at an important example:

// LUT logic: return (trit1 == true) ? trit2 : null;lut nullifyTrue {true,- = -true,0 = 0true,1 = 1}func Trit nullifyTrue<Trit> (Bool t, Trit val) {return nullifyTrue[t, val] // LUT look-up operation}template nullifyTrue<T> {type P [T / 3]func T nullifyTrue<T> (Bool t, T val) {ret0 = nullifyTrue<P>(t, val[0 * P : P]]ret1 = nullifyTrue<P>(t, val[1 * P : P]]ret2 = nullifyTrue<P>(t, val[2 * P : P]]return ret0 & ret1 & ret2}}

We’re going to use the nullifyTrueLUT that we talked about earlier to implement a number of very important functions that we will be needing to implement decision logic in Qupla in combination with the merge operation. The LUT look-up operation is indicated with the bold square brackets in the example. It looks like an array that is indexed by the separate input trits.

In the above example we will be building a set of functions that have two input parameters. The first is aBool, and the only important value for that one is true(1). The second is a parameter val that has the same type as the function type specifier. The logic of the function is such that it will either return the val parameter when the Bool parametert is true, or null when the Bool parametert has any other value than true (i.e. 0, -, or null). We call this action nullifying the value.

The first function nullifies a single Trit by doing a direct look-up using the nullifyTrueLUT. This LUT will return the value of the valtrit whenever the Booltritt equals 1. As you can see any other possible value fortis missing from the LUT declaration, which means that in those cases it returns null. Note how we specify two parameters to the look-up operation, because the nullifyTrueLUT expects two input trits: first the Boolflagt and then the valueval.

The templated function nullifies a Tvalue by slicing it in 3Psized values and nullifying each of those before concatenating the results together again to form the return value. Becauset is the same value for each look-up operation that means that we either recreate the input valtrit by trit, or construct a same-sized trit vector of null trits. Here’s what the Abra looks like:

 branch nullifyTrue_1[ 1] in t[ 1] in val[ 1] out ret = nullifyTrue_[t, val]branch nullifyTrue_3[ 1] in t[ 1] in val0[ 1] in val1[ 1] in val2[ 1] out ret0 = nullifyTrue_[t, val0][ 1] out ret1 = nullifyTrue_[t, val1][ 1] out ret2 = nullifyTrue_[t, val2]branch nullifyTrue_9[ 1] in t[ 3] in val0[ 3] in val1[ 3] in val2[ 3] out ret0 = nullifyTrue_3(t, val0)[ 3] out ret1 = nullifyTrue_3(t, val1)[ 3] out ret2 = nullifyTrue_3(t, val2)

Again, most of what is happening here is function calls, concatenation, and slice operations, which on FPGA will resolve mostly into simple wiring and parallel look-ups for each trit. There is one exception, however. Every separate trit of the input parameter val will be run through a nullifyTrue_look-up operation together with the Bool valuet. If any of the input trits to the look-up operation is null then the look-up operation will return null immediately. Otherwise it will use the input trits as an index into a table of return values. Any missing values in this table will cause null to be returned as well.

On CPU-based devices look-ups can be implemented as a simple and fast array indexing operation. On FPGAs a LUT of any output size can efficiently be implemented by using one or more trees of multiplexers that are each hooked up to some configuration ram that holds the truth table for each multiplexer tree. This means that again data will simply flow trough the circuitry and results flow out on the other side.

The merge operation is mainly used to implement decision logic in Qupla. It will take the results of several parallel execution paths as input and returns either the single non-null result, or null when there is no non-null result. If there are multiple non-null results then this is regarded as a programming error and the merge operation will cause a merge exception. Here is an example that uses the merge operation:

// LUT logic: return (Bool) (trit1 > 0)lut isGreater {- = false0 = false1 = true}// LUT logic: return (Bool) (trit1 < 0)lut isLess {- = true0 = false1 = false}// return lhs > rhs ? lhs : rhstemplate max<T> {func T max<T> (T lhs, T rhs) {greater = isGreater[cmp<T>(lhs, rhs)]retLhs = nullifyTrue<T>(greater, lhs)retRhs = nullifyFalse<T>(greater, rhs)return retLhs | retRhs // merge}}// return lhs < rhs ? lhs : rhstemplate min<T> {func T min<T> (T lhs, T rhs) {less = isLess[cmp<T>(lhs, rhs)]retLhs = nullifyTrue<T>(less, lhs)retRhs = nullifyFalse<T>(less, rhs)return retLhs | retRhs // merge}}

We define amax(lhs, rhs)and amin(lhs, rhs)function template that determines the maximum value and minimum value, respectively, of lhs and rhs. Both functions are very similar, so we will only discuss the max()function here and leave the min()function as an exercise to the reader.

Note how we use the cmp()function to determine which one of lhsandrhs is greater than the other. The cmp()function will return-when the first parameter is less than the second,1when the first parameter is greater than the second, and0when both parameters are equal. In this case we want to know if lhs is greater than rhs. We find this out by using the is GreaterLUT to check if the return value of cmp()is equal to1. If it is then greater will be set to true and otherwise to false.

Now we get to the interesting part. See how we call the nullifyTrue() and nullifyFalse()functions we defined earlier and use the merge operation indicated by|(vertical bar) to merge the results? Both execution paths to the merge operation will be evaluated. Whether that happens in parallel or in sequence does not matter for the outcome. The nullifyTrue()function call will return lhs when greater equals true, and null otherwise. The nullifyFalse()function call will return rhs when greater equals false, and null otherwise.

Note that we are assured in this way that we either return lhs as the first input and null as the second input to the merge, or we return null as the first input and rhs as the second input to the merge. This satisfies the condition that only one of the merge inputs can be non-null. The merge will therefore return either lhs or rhs, depending on which one is greater than the other. Here is the resulting Abra pseudo-code for max<Tryte>():

branch max_3[3] in lhs[3] in rhs[1] cmp = cmp_3(lhs, rhs)[1] greater = isGreater_0[cmp][3] retLhs = nullifyTrue_3(greater, lhs)[3] retRhs = nullifyFalse_3(greater, rhs)[3] out ret = merge{retLhs, retRhs}

A merge operation can be implemented on FPGA by using a multiplexer for each trit position in the input vectors that merges the input trits into a single trit at the same position in the output vector. Again, just data flowing through circuitry. Implementation on CPUs can use standard if-then-else constructs to decide which input value to return.

To be able to understand Qupla code more easily it helps to think in terms of wiring circuitry blocks. Remember that Pipe Dream computer game? Programming Qupla is a bit like that. You have black boxes (functions, look-up tables, mergers) and connect them with pipes in such a way that data can flow through in only one way. Data flows through those pipes unless it hits a look-up table that has no output for that combination of inputs. That will stop any data from flowing through the connected pipes downstream from that look-up table. Since everything is pretty much defined in terms of look-ups, higher order constructs like functions will have similar behavior. For certain combinations of input data no output data will be flowing out from them because of a programmed ‘blockage’ inside the function.

We can have several parallel data flows, and our merge operation expects at most one of those to actually flow data. So a merge connects several parallel data flows and merges it into a single one, selecting the result flow of the only path that contains data flow. This is how we implement decision logic. You can view the parallel data flows as all the cases of a switch statement being executed in parallel, but only one ends up not being blocked (selected) and then the selected data continues flowing on after the merge. The merge operation is like a place where several pipes meet up and combine into a single output pipe. But each input pipe contains a different chemical, and they don’t mix. So you don’t want more than one chemical to arrive at that point at any time if you want to avoid explosions (merge exceptions).

Qupla functions are made up of one or more expressions. There are 5 basic operations in Qupla: function call, vector concatenation, vector slicing, LUT look-up, and merge execution branches. The first 3 reduce to wiring-only on FPGA, while the other 2 are simply implemented as multiplexers. Functions are instantiated as they are called. On CPU there are reasonably efficient implementations for the 5 operations and the function instantiation can be mimicked. We have also seen that decision logic can be implemented using a combination of LUT look-up and merge operations.

In the next part of this series we will put all the parts together in a few examples and discuss them in detail. We will also introduce a few more language features.