Levels of Abstraction in ASIC Modeling

We often hear people throwing around names like "cycle-accurate" or "transaction-accurate" when discussing modeling. People don't always mean the same thing by these terms (especially for the newer levels, such as transaction-accurate). This page gives my take on how these various levels are defined.

Axes of Classification

In functional modeling, we can identify two primary axes on which to classify abstraction: data and timing. These two axes can be seen most clearly at the lowest levels of modeling -- things become a bit murky at the higher levels. So let's start at the bottom, and work our way up...

The Digital Abstraction

The fundamental abstraction of functional verification is that we consider all signals to be digital, not analog. This means that, although signals have values that are measured in voltage (which can be somewhat murky), we pretend that values are nice and clean: digital one or zero.

But, as with all abstraction, we lose something when we pretend that signals are either one or zero. In reality, signals take time to change value. While a signal is changing, its value is not well defined. Rather than deal with these issues, we extend the abstraction of "one and zero" to include values of "rising" and "falling" (and sometimes "changing").

Most of the time, we ignore these periods of unpredictability. Designers will worry about setup and hold times, or the metastability of sampled signals, but functional verification people can ignore them. Tools are used to perform static timing analysis, which will catch most problems; we can also run gate-level simulations with SDF-backannotated timings. These tools are needed precisely because the abstraction is imperfect.

The Cycle Abstraction

The digital abstraction converted an analog voltage to one of two discrete values. The cycle abstraction does the same for time. We throw away analog time (usually measured in nanoseconds), and instead count time in terms of cycles of a clock.

This is possible because almost all digital designs are built using synchronous (clocked) logic. Processing is broken into chunks of combinatorial logic, whose propagation delay is less than the period of the clock. A cycle-precise model will describe the design in terms of these combinatorial blocks. This abstraction is also known as register-transfer level (RTL) because it is rooted in the transfer (and modification) of data between registers.
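
To picture what this means in practice, here is a minimal sketch of a cycle-precise, RTL-style model in C++ (the Accumulator example is invented for illustration): state lives in registers, and each call to clock() represents one cycle of combinatorial logic between them.

  #include <cstdint>

  // Hypothetical cycle-precise model of a trivial accumulator. One call to
  // clock() corresponds to one clock cycle: combinatorial logic computes the
  // next register value, then the register is updated.
  class Accumulator
  {
  public:
    Accumulator() : acc_(0) {}

    void clock(uint32_t in, bool clear)
    {
      uint32_t next = clear ? 0 : acc_ + in;   // combinatorial block
      acc_ = next;                             // register transfer
    }

    uint32_t value() const { return acc_; }

  private:
    uint32_t acc_;   // the register
  };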

Until recently (with the advent of behavioral synthesis), designs were described at or below the level of RTL. But verification engineers have not been constrained to RTL. They have been able to move away from cycle-precise models to cycle-approximate models. (Both of these levels are covered by the umbrella term "cycle-accurate".)

A cycle-approximate model promises that operations will generally take the correct number of cycles, but not that all of those cycles will be modeled. If a 32-bit multiply takes 3 clock cycles, then the model might say: "In 3 cycles' time, the value is ...". It would say nothing about the values between "now" and when the data is available, three cycles later. This minor additional abstraction allows the modeler to ignore the precise details of how the calculation is performed, and will probably allow the model to run faster (in terms of wallclock time).
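
As a sketch of the idea, a cycle-approximate model of that multiplier might look something like this (the class and its interface are invented for illustration): the result is computed immediately, tagged with the cycle at which it becomes visible, and nothing at all is modeled for the intervening cycles.

  #include <cstdint>
  #include <queue>

  // Hypothetical cycle-approximate multiplier: "in 3 cycles' time, the value
  // is a*b". The intermediate cycles are not modeled at all.
  class ApproxMultiplier
  {
  public:
    static const uint64_t LATENCY = 3;

    void issue(uint64_t cycle, uint32_t a, uint32_t b)
    {
      Pending p = { cycle + LATENCY, static_cast<uint64_t>(a) * b };
      pending_.push(p);
    }

    bool result_ready(uint64_t cycle) const
    {
      return !pending_.empty() && pending_.front().ready_cycle <= cycle;
    }

    uint64_t take_result()
    {
      uint64_t v = pending_.front().value;
      pending_.pop();
      return v;
    }

  private:
    struct Pending { uint64_t ready_cycle; uint64_t value; };
    std::queue<Pending> pending_;
  };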

Problems with the cycle abstraction emerge when there are multiple clock domains within the device. If the clocks have no fixed phase relationship, then communication between the domains becomes indeterminate. It is impossible (in principle) to always know when a signal sent from one domain will arrive at its destination. It is impossible to simulate this correctly, because you will ultimately hit quantum uncertainty in the underlying physics. A correct design will be constructed to work in this environment, but a model cannot be cycle accurate without some help -- the only reliable approach is to use an external timing input that tells the model when the signal arrives.

Even if we don't need to worry about domain crossings, there is still the possibility that we may need to be half-clock (or phase) accurate. Take, for example, a DDR DRAM. These double-data-rate devices transfer data on both the rising and falling edges of a clock. We can often ignore this, and pretend that the negative-edge data arrives at the same time as the preceding (or following) rising edge. But occasionally this breaks down. In such cases, it may be necessary to build a phase-accurate model -- one where both phases of the clock are modeled. Fortunately, this is rare. And even in those cases, there is no significant difference between a phase-accurate model and a cycle-accurate model whose clock frequency is doubled. So this is a detail, not a fundamentally different abstraction.

Basic Data Abstraction

A model that describes its data in terms of ones and zeros (or vectors of ones and zeros) is known as a "bit accurate" model. Most cycle-precise models are bit-accurate, because the level of detail needed to model cycles that precisely leaves little room for abstracting the data. But as we move beyond cycle-precise models, we find it useful to abstract the data.

The most obvious abstraction is to replace the concept of "a vector of bits" with the concept of "an integer". I.e. "4'b1010 == 10". This might not seem like much (indeed, it isn't much), but it is a vital first step. Even bit-accurate models may sometimes do this much. But note what happens to indeterminate values when we perform this abstraction: the value 4'b1x10 might be 10, or might be 14. We don't know, because bit 2 of the vector is unknown. When we abstract this 4-bit value into an integer, we don't say "it's either 10 or 14": we say "it's unknown". So we lose a lot of resolution. In fact, we will often just pick one of the two values, and say "its value is 10". This loss of information can sometimes cause problems!
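
A small sketch of this lossy conversion (the Vec4 type and its "known" mask are invented for illustration):

  #include <cstdint>
  #include <iostream>

  // Hypothetical 4-bit value with a per-bit "known" mask: a bit whose known
  // flag is 0 is an X. Converting to a plain integer forces us to pick a value.
  struct Vec4
  {
    uint8_t value;   // bit values (only meaningful where known)
    uint8_t known;   // 1 = bit is a definite 0/1, 0 = bit is X

    bool has_unknowns() const { return (known & 0xF) != 0xF; }

    // Lossy abstraction: X bits are simply treated as 0.
    int to_int() const { return value & known & 0xF; }
  };

  int main()
  {
    Vec4 v = { 0xA, 0xB };            // 4'b1x10: bits 3, 1, 0 are known; bit 2 is X
    std::cout << v.has_unknowns()     // 1: the true value is either 10 or 14
              << " " << v.to_int()    // 10: we just picked one, losing information
              << "\n";
    return 0;
  }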

Enumerations

Rather than abstracting to integers, we can instead abstract to enumerated types. For example, instead of saying "write_en == 1", we can say "access_type == WRITE". In doing this, we take our first step towards the abstraction of transactions. But before I go there, let's examine how enumerations can lose information.

Just as the digital abstraction loses the concepts of "rising" and "falling" edges, so enumerations can lose information when there is a many-to-one mapping of values. Consider this set of values: READ, WRITE and IDLE.

This may describe what is happening, but it requires two bits to represent the values at the bit-accurate level. Two bits give 4 values. The mapping might be:
  ce  we  type
  0   0   READ
  0   1   WRITE
  1   0   IDLE
  1   1   IDLE
So if we just say "IDLE", we lose the value of "we". This is fine, as long as we don't wish to reconstruct the bit-accurate values from our transaction-accurate values. This has implications for verification, when we wish to compare RTL-simulation results against the model -- leading to a type of script known as "fuzzy-compare".
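
The forward mapping is easy to write down; it is the reverse direction that is ambiguous. A sketch (the names are just the ones from the table above):

  // Hypothetical decode of the (ce, we) pair into an enumerated access type.
  // Both (1,0) and (1,1) map to IDLE, so the value of "we" cannot be
  // recovered from the enumeration alone.
  enum AccessType { READ, WRITE, IDLE };

  AccessType decode(bool ce, bool we)
  {
    if (ce)
      return IDLE;              // many-to-one: "we" is lost here
    return we ? WRITE : READ;
  }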

The Transaction Abstraction

The traditional concept of transaction abstraction is that we take a group of wires (that form an interface), and abstract the communication over that interface into its conceptual atoms. Thus the interface to a memory might be "read(addr) -> data, write(addr, data), idle()"; or it may be more complex, involving split transactions, burst initiation, interruption, etc.
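
In C++ terms, such an interface might be sketched as an abstract class (only the operations named above, nothing more):

  #include <cstdint>

  // Hypothetical transaction-level view of a memory interface: conceptual
  // atoms of communication, rather than wires.
  class MemoryInterface
  {
  public:
    virtual ~MemoryInterface() {}

    virtual uint32_t read(uint32_t addr) = 0;
    virtual void     write(uint32_t addr, uint32_t data) = 0;
    virtual void     idle() = 0;
  };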

The problem with this abstraction is that it is not immediately obvious how time is abstracted. If you don't abstract time, then you have a perfectly usable cycle-accurate model with enumerated values. But when we attempt to abstract time, we must do so in such a way that we can construct a mapping from the cycle view onto our abstracted view.

The naive approach is to look at a transaction such as "write(addr, data)", and model it as a function call. If the client and server of this function lie in different threads of execution (so that the function is actually implemented as a message from one thread to the other), then this would seem to give us what we want. The server thread can model the timing structure of the resource (e.g. contention for banks within a DRAM); and the client can just say "write this data" -- a fire-and-forget operation.

This naive approach is the one taken by the SystemC master-slave library. And it works -- for fire-and-forget style transactions. A UTF (untimed-functional) model within SystemC can be mapped to a cycle-accurate view, as long as all its messages are asynchronous.
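
A fire-and-forget write really is this simple to model. Something like the following sketch (the class and its message queue are invented for illustration) is enough: the client posts a message and moves on, while the server thread models the timing on its own schedule.

  #include <cstdint>
  #include <queue>

  struct WriteMsg { uint32_t addr; uint32_t data; };

  // Hypothetical fire-and-forget DRAM model. write() is the client side and
  // returns immediately; tick() is called from the server's own thread or
  // process, where bank contention and other timing detail would be modeled.
  class DramServer
  {
  public:
    void write(uint32_t addr, uint32_t data)
    {
      WriteMsg m = { addr, data };
      inbox_.push(m);              // nothing to wait for
    }

    void tick()
    {
      if (!inbox_.empty()) {
        // ... model the access, contention, etc. ...
        inbox_.pop();
      }
    }

  private:
    std::queue<WriteMsg> inbox_;
  };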

But consider a read transaction. As a client, you may want to call "read(addr)", and expect to get the value as the return value. Unfortunately, this can only work in the most trivial of cases: where transactions are not pipelined, or where the resource is not shared.

Consider two back-to-back transactions, say a read followed by a write. Consider also that the round-trip time is, say, 10 cycles. But the bus is pipelined. So a correct model would issue the read at cycle N, and the write on the following cycle (N+1). But in the transaction world, if the read must wait for its reply, then the write cannot be issued until cycle N+10 (because the read will block until it has a result). A different client may issue a read at time N+5 -- the result of this read could not be correct from the perspective of the cycle-level abstraction.
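
In code, the problem looks something like this (read_blocking() and write() are hypothetical client calls):

  #include <cstdint>

  uint32_t read_blocking(uint32_t addr);          // hypothetical: returns the data directly
  void     write(uint32_t addr, uint32_t data);   // hypothetical: posted, fire-and-forget

  void read_then_write(uint32_t addr, uint32_t wdata)
  {
    uint32_t d = read_blocking(addr);   // issued at cycle N; does not return until N+10
    write(addr + 1, wdata);             // so the write goes out at N+10, not N+1 as in the RTL
    (void)d;                            // (the read data is used later; elided here)
  }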

These problems are not insurmountable, though C++ can make life difficult (and SystemC provides no help whatsoever). One approach is to use "lazy" values.

A lazy value is a value that knows how to get a value, but does not do so until the value is actually needed. Using a lazy value, one might write:

  // Both reads are issued back-to-back; neither blocks at this point.
  int add_values(int addr)
  {
    lazy<int> d1 = read(addr);     // non-blocking: request sent, reply handle remembered
    lazy<int> d2 = read(addr+1);   // issued one cycle later, also non-blocking
    return d1 + d2;                // converting to int blocks until both replies arrive
  }
Here, we use the "lazy" template to create a synchronous facade onto an asynchronous interface. The constructor of the lazy object "d1" would issue a non-blocking read request, and would remember a "handle" for the expected reply. One cycle later, the CTOR of d2 would do the same. So the requests are issued back-to-back. However, the addition operation requires the actual values. The casting operator from the lazy object to the actual integer would block until it receives the expected reply. (Of course, we could always define a lazy addition, but that can get somewhat hairy.) So with a bit of magic (and probably a performance hit), we can construct our client in such a way that we preserve a mapping from the cycle world onto our synchronous transaction view.
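
For concreteness, here is a minimal sketch of what such a lazy template might look like. The underlying messaging primitives (reply_handle, issue_read(), wait_for_reply()) are hypothetical stand-ins for whatever non-blocking API the model actually provides, and are only declared here.

  typedef int reply_handle;                // identifies an outstanding request (stand-in)

  reply_handle issue_read(int addr);       // non-blocking: send the request, return a handle
  template <typename T>
  T wait_for_reply(reply_handle h);        // blocking: wait for, and return, the reply

  template <typename T>
  class lazy
  {
  public:
    // The constructor issues the request and remembers the handle; it does not block.
    explicit lazy(int addr) : handle_(issue_read(addr)), have_value_(false), value_() {}

    // The conversion operator blocks until the reply arrives, then caches it.
    operator T() const
    {
      if (!have_value_) {
        value_ = wait_for_reply<T>(handle_);
        have_value_ = true;
      }
      return value_;
    }

  private:
    reply_handle handle_;
    mutable bool have_value_;
    mutable T    value_;
  };

  // read() simply constructs the lazy value; it returns immediately.
  inline lazy<int> read(int addr) { return lazy<int>(addr); }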

To avoid these problems, most people either ignore the issue or use only asynchronous transactions. A typical SystemC UTF model will ignore the issue. It is common to hear people saying that to use a UTF model in a cycle-accurate simulation, you must first re-write the model to be cycle accurate. What they are saying, in effect, is that the UTF abstraction they use is not a valid abstraction over the real hardware. It's OK in a hand-waving scenario (and possibly for a SW group to use when writing drivers), but unusable for verification. The asynchronous approach is better from the perspective of verification (it works), but the lack of synchronous transactions makes it a less powerful abstraction.

Unified Transaction Abstraction

We can describe the transaction abstraction in two ways: the grouping of values (not bits!) into interfaces; and the construction of conceptual atoms of communication that use these groups. This is a somewhat obscure way of saying that we want function calls and encapsulation. However, this view of transactions does, in my opinion, fail to grasp the opportunity to make a real abstraction.

The above definition is really just a syntactic wrapper over the structure of the design. There is an implicit assumption that we are describing the interface between two blocks. The obvious point here is that we still have the blocks. So there is little, or no, abstraction of the structure of the design. We're just finding a nice way to describe the internal interfaces.

My vision of transaction models goes beyond this. The focus of the transaction model should not be on the communication between blocks. Transactions should focus on the flow of information through the chip. Information may get transformed, stored or thrown away. But these operations need not be associated with specific blocks of hardware. If we can model the causal flow with a sufficient lack of detail (!), then the association with hardware blocks will be relegated to the mapping onto a cycle-level model. For the most part, a transaction model based on this information-flow approach will be very similar to the traditional transaction model. This is because most hardware tends to have a fairly clean mapping between functional blocks and the functions they perform. Unfortunately, this similarity is somewhat superficial. A transaction model built as a syntactic abstraction of a specific microarchitecture will have a large number of minor mismatches when compared to a different microarchitecture.

It is difficult to abstract from a single data point. We do not often have the luxury of contemplating multiple designs when constructing the transaction model. In this context, it is very probable that specific implementation decisions will pollute the model. We should not attempt to prevent this. It is too easy to get into a mode of thought where you attempt to build a model that is perfectly abstract. Such models become bloated, full of hooks that are never used, and largely untested. So I instead recommend an approach based on the principles of "Pragmatic Programming": follow the D.R.Y. principle ("Don't Repeat Yourself") and remember YAGNI ("You Aren't Going to Need It"). If you have no duplication, and no code that you don't need, then you'll most likely end up with a clean, object-oriented transaction model.

