SICO Online Editor

The Single Instruction Computer

Download the interpreter here: sico.c

SICO is a Single Instruction COmputer that mimics the functionality of a normal computer while using only one computing instruction. This is like going into a forest with no tools and trying to build a house. Since we only have one instruction, most modern conveniences are gone. Things like multiplying numbers or memory allocation need to be built from scratch using SICO's instruction.

The instruction is fairly simple: Given A, B, and C, compute mem[A]-mem[B] and store the result in mem[A]. Then, if mem[A] was less than or equal to mem[B], jump to C. Otherwise, jump by 3. We use the instruction pointer (IP) to keep track of our place in memory. The pseudocode below shows a SICO instruction:

A = mem[IP+0] B = mem[IP+1] C = mem[IP+2] IP += 3 if mem[A] <= mem[B]: IP = C mem[A] -= mem[B]

The instruction pointer and memory values are all 64 bit unsigned integers. Overflow and underflow are handled by wrapping values around to be between 0 and 2^64-1 inclusive.

Interaction with the host environment is done by reading and writing from special memory addresses. For example, writing anything to -1 will end execution of the SICO program.

SICO is based off of the subleq architecture.

SICO Assembly Language

We can write a SICO program by setting the raw memory values directly, but it will be easier to both read and write a program by using an assembly language. Because there's only one instruction, we don't need to define what's used for data, execution, or structure like in other languages. We only need to define memory values, and the flow of the program will decide what gets executed.

The entire assembly language is simple enough to fit on a single piece of paper:

ComponentDescription
Single Line CommentDenoted by #.

Ex:
# Hello, # World!
Multi Line CommentDenoted by #| and terminated with |#.

Ex:
#| Hello, World! |#
Current AddressDenoted by a question mark. Inserts the current memory address.

Ex:
? ?+1 # Next address
Label DeclarationDenoted by a name followed by a colon. Declarations mark the current memory address for later recall.

Labels are case sensitive and support UTF-8. They can consist of letters, numbers, underscores, periods, and characters with a high bit. However, the first character can't be a number.

Ex:
loop: Another_Label3:
Label RecallDenoted by a label name. Inserts the memory address declared by Label:.

Ex:
loop: # declaration loop # recall
SublabelDenoted by a period before a label. Places a label under another label's scope. Avoids name collisions.

Ex:
A: .B: # Shorthand for A.B:
NumberInserts the number's value. A number must be in decimal or hexadecimal form.

Ex:
'A 'B 'C # Evaluates to: 65 66 67
ASCII LiteralDenoted by an apostrophe. Inserts an ASCII value.

Ex:
123 0xff
OperatorDenoted by a plus or minus. Adds or subtracts the number or label from the previous value. Parentheses are not supported. To express a negative number, use the form 0-x.

Ex:
len-txt+1 ?+1
Input / OutputAddresses above 2^63-1 are considered special and reading or writing to them will interact with the host. For an instruction A, B, C:

A = -1: End execution.
A = -2: Write mem[B] to stdout.
B = -3: mem[B] = stdin.
B = -4: mem[B] = environment timing frequency.
B = -5: mem[B] = system time.
A = -6: Sleep for mem[B]/freq seconds.

Ex:
0-2 txt ?+1 # A = -2. Print a letter.

Hello, World!

We can show how the SICO assembly language and architecture work with a simple program that prints "Hello, World!" to the screen:

loop: 0-2 txt ?+1 # Print a letter. len one exit # Decrement [len]. If [len]<=1, exit. ?-5 neg loop # Increment letter pointer. exit: 0-1 0 0 txt: 'H 'e 'l 'l 'o ', ' 'W 'o 'r 'l 'd '! 10 len: len-txt neg: 0-1 one: 1

The first 3 lines of the program define a loop, and at each iteration of the loop we print a new character. Below the loop is a command to exit the program, followed by all of the data we need to make the program work.

For brevity, we will use [x] as shorthand for mem[x], the value at address x. Also, even though we're working with unsigned integers, we'll frequently refer to -1, -2, etc as shorthand for 2^64-1, 2^64-2, etc.

loop: 0-2 txt ?+1

The first line declares the label loop: to mark the memory address of where our loop starts. It also contains the instruction 0-2 txt ?+1. This instruction won't behave like a typical instruction since -2 is the first operand, which is a special output address. When a program writes to -2, the interpreter will print the value of [txt] to the screen.

len one exit

The second line, len one exit, is our first normal instruction. When this instruction is executed, it will take the values at memory addresses len and one, subtract them, and store the result back at len. That is, it will perform [len]=[len]-[one] which evaluates to [len]=[len]-1. Also, if [len] is less than or equal to [one] (before subtracting), then we will jump to exit and end the program.

Thus, when the program is running, we expect it to count from len-txt down to 1 and then exit.

?-5 neg loop

The third instruction is used to advance to the next character to print and restart the loop. Recall from the assembly specification that ? refers to the current address, so ?-5 refers to five memory addresses back. In this case, it points to the first instruction (where txt is). When this instruction executes, -1 will be subtracted from text, which will have the effect of advancing to the next character that we want to print. Since we are using unsigned arithmetic, [neg]=-1=2^64-1 will be greater than or equal to any other value. Thus this instruction will always jump to loop and restart the loop.

exit: 0-1 0 0

The last instruction of our program uses another special address. In this case, writing anything to -1 will tell the interpreter to end the program. Note, we do not specifically need to use 0-1 as the first operand. Any expression that evaluates to -1, such as 2-3, will also work.

txt: 'H 'e 'l 'l 'o ', ' 'W 'o 'r 'l 'd '! 10

Here we define the ASCII character codes for the text "Hello, World!", plus an end-of-line character.

len: len-txt+1

This line is a quick and dirty way of calculating the length of the text we want to print. We need to add 1 to the length since we abort at 1 instead of 0.

one: 1 neg: 0-1

And the last 2 lines define the constants 1 and -1.

This section appears to be rather long, given that we are going over a simple program. However, it is only long because it is meant as an introduction, and we are trying to be thorough with explanations. The next section will be more terse.

Synthesized Instructions

Although printing text to the screen is easy, we will need to synthesize more complicated instructions to serve as building blocks when we make more complicated programs. In future articles, we will also show how to turn these synthesized instructions into easy to call functions from within SICO. For now, we are only focusing on basic instructions in order to show how the SICO architecture works.

For these code blocks, tmp will denote an address for a temporary variable. It can have any value at the start of the block, so we'll usually need to zero it out. We will also continue the use of [x] as shorthand for mem[x], or the value at address x.

Up first is one of the most common instructions, an unconditional jump to jmp. It also sets [tmp]=0.

tmp tmp jmp

We can abort a SICO program by writing any value to -1. Note that we need to calculate -1 as "0-1" due to the syntax our assembly language.

0-1 0 0

Set [a]=[b]. The series of ?+1 expressions points to the next memory address after the instruction. They simply serve to force the instruction pointer to go to the next instruction regardless of whether or not the instruction would jump.

tmp tmp ?+1 # [tmp]=0 tmp b ?+1 # [tmp]=-[b] a a ?+1 # [a]=0 a tmp ?+1 # [a]=-(-[tmp])=[b]

Jump to jmp if [a]=[b].

tmp1 tmp1 ?+1 # [tmp1]=0 tmp1 b ?+1 # [tmp1]=-[b] tmp2 tmp2 ?+1 # [tmp2]=0 tmp2 tmp1 ?+1 # [tmp2]=-[tmp1]=[b] tmp2 a ?+1 # [tmp2]=[b]-[a] tmp1 tmp1 ?+1 # [tmp1]=0 tmp2 tmp1 jmp # if [tmp2]<=0, then [a]=[b], so jump

We can print the character "A" to the screen by writing it to the special address -2. We define char to be the ASCII code for "A".

0-2 char ?+1 char: 65

Increment [a]. This will always jump to the jump address.

a neg ?+1 # [a]=[a]-[neg]=[a]-(-1)=[a]+1 neg: 0-1

Decrement [a].

a one ?+1 # [a]=[a]-[one]=[a]-1 one: 1

Set [C]=[[A]+[B]]. This is the same as getting the value at an array index, as in C=arr[i] in other languages. This will form the backbone of functions in SICO.

tmp tmp ?+1 tmp A ?+1 tmp B ?+1 # [tmp]=-[A]-[B] ptr ptr ?+1 ptr tmp ?+1 # [ptr]=[A]+[B] tmp tmp ?+1 tmp ptr:0 ?+1 # [tmp]=-[[A]+[B]] C C ?+1 C tmp ?+1 # [C]=[[A]+[B]]

Set [[A]+[B]]=[C]. This is the same as assigning a value to an array, as in arr[i]=C. Assume [p0]=[p1]=[p2].

p0 A ?+1 p0 B ?+1 # [p0]=[p0]-[A]-[B] p1 p0 ?+1 # [p1]=[A]+[B] p2 p0 ?+1 # [p2]=[A]+[B] tmp tmp ?+1 tmp p1 ?+1 p0 p0 ?+1 p0 tmp ?+1 # [p0]=[A]+[B] tmp tmp ?+1 tmp C ?+1 p0:0 p1:0 ?+1 # [[A]+[B]]=0 p2:0 tmp ?+1 # [[A]+[B]]=[C]

If we allow a SICO instruction to be atomic, we can actually create a spinlock. When the lock is first acquired, the value of [lock+1] is overwritten from z-1 to z-1-[z-1]=z and we jump to the critical section. When a second thread tries to acquire the lock, it will subtract [z]=0 from z, which will fail to jump, and the thread will be caught by the next instruction, z z lock. When the owning thread is done with the lock, it just needs to subtract [z+1]=1 from [lock] to allow the lock to be acquired by a new thread.

lock: lock+1 z-1 crit # acquire lock z z lock # failed to acquire, try again crit: # # critical section # lock+1 z+1 ?+1 # reopen spinlock z z jmp # jump 0-1 z:0 1

Architecture Properties

SICO belongs to the family of one instruction architectures, like the subleq architecture it's based off of. This section will go over the theoretical and practical properties that SICO has. First, however, it is necessary to outline the differences between SICO and subleq.

Whereas subleq uses 2's complement signed arithmetic, SICO uses unsigned arithmetic. There are several reasons for this:

We may use the entire address range, instead of only the non-negative addresses subleq allows.
SICO offers 3 ways to guarantee a jump

1. Zeroing an address:A A jmp
2. Subtracting anything from 0:Z A jmp
3. Subtracting -1 from anything:A Z-1 jmp

Subleq can only guarantee a jump by zeroing an address. This gives SICO more options when deciding program flow, since these are all common operations.
In unsigned arithmetic, if A>B, then we know that A-B>0. This does not hold in signed arithmetic.
Using the instruction A Z jmp, we can test A for equality with 0 using only one instruction and without modifying any variables. This a common value to test for. Subleq, would require 1 to 2 instructions and modifying A for this test.

SICO also uses a different order for operands. With SICO, for operands A, B, and C, we take [A]=[A]-[B], where subleq takes [B]=[B]-[A]. The order of operands in subleq is perfectly valid, of course, but goes against the standard ordering used in programming. For example, we usually write a=b+c instead of b+c=a. In an early version of SICO, I used the subleq order of operands and found myself constantly having to reorder the operands in my head. Thus I decided swap the roles of A and B.

So, with comparisons done, we can now go over the properties of the SICO architecture.

Self Modifying

Compared to modern architectures which prevent the modification of code for security reasons, SICO programs require some amount of self modification to do anything useful at all. This can most easily be seen in the "Hello, World!" program in the section above. Specifically, the lines

0-2 text ?+1 # Print a letter. ?-2 neg loop # Increment letter pointer.

Here we must modify part of an instruction where text is in order to print successive characters in a string.

We can use this self modifying property to generate some simple programs at run time, or make a self interpreter that counts the number of instructions used by another SICO function. However, symbolic self modification, like that seen in Lisp, would be difficult to perform in SICO.

Special Addresses

As part of the SICO specification, reading and writing from special addresses can be used to interact with the host environment. For instance, 0-1 0 ?+1 will end the program, and 0-2 char ?+1 will print [char] to the screen. Without these addresses, a SICO program would have no way to display its results or interact with its host computer. It would effectively be a brain in a vat. There's no real way around this, given our one instruction limit.

Some consideration was taken as to how to interact with the host. One choice was to confine all host interaction to one address, such as 0-1 B C. What this address would do would depend on the values of B and C. After using this method for a while, I determined that it doesn't work well with SICO as a whole. The ability to read and write from specific addresses allows them to used as if we're reading and writing from any other memory position. My opinion of this might change as I use SICO.

One major difference from subleq implementations is ending execution. Whereas SICO requires writing to a specific address, most subleq implementations end execution by jump to any negative address. Since SICO allows any address to be used, we needed an explicit address to end execution.

Turing Complete

We set out at the beginning of the article to "build up to what a normal computer can do", which begs the question of whether or not this goal is possible. We will prove that it's possible by showing that SICO can replicate the operations in Minsky's Computation: Finite and Infinite Machines. By doing so, we will show that SICO is Turing complete and thus can replicate any more complicated computer.

Let regn denote some register and let the following assembly code initialize our program.

0 0 start neg1: 0-1 pos1: 1 reg0: 0 reg1: 0 reg2: 0 # ... start:

We have the operation regn', which increments register regn and jumps to the next instruction. In SICO, this can be performed by

regn neg1 ?+1

And we have the second operation, regn-(n), which jumps to some memory address n if regn=0, otherwise it decrements regn and jumps to the next instruction. In SICO, this can be performed by

regn 0 n regn pos1 ?+1

Thus, SICO is Turing complete.

One Instruction

This is also a good place to mention a common criticism I have come across in forum postings while doing my research. That is: how can you say that a SICO (or subleq) instruction is "one" instruction when it does so many things? After all, it performs a subtraction and then jumps, so it should at least count as two instructions.

The most important thing to consider is the intent of the architecture. Most architectures have a variety of instructions to perform a variety of actions by design. A SICO program is meant to be composed of one simple instruction, and the goal is to build up to what a normal architecture can do. If we get tied up by the definition of the architecture, then we'll lose the intent of it.

That being said, we can create a formal definition of what "one instruction" actually means. The definition uses a simplified version of Laplante and Gilreath's complexity calculation: Given a set of memory addresses, is it possible to execute more than 1 instruction on those memory addresses. For instance, consider addresses A and B. In a complex architecture we could execute

add A, B sub A, B mul A, B ...

etc, all on the same memory addresses. Whereas with SICO, given A, B, and C, we always execute the same instruction. This definition offers a better argument in my opinion, although it can be circumvented by mapping instructions to memory addresses as in transport triggered architectures.

Notes

Since writing this, the interpreter has been improved in a few ways:

The parser underlines the section of text that yields an error.
The parser checks for non-terminated block quotes.
The parser uses a hash map instead of an AVL tree for matching labels.
Memory is held in a flat array instead of an AVL tree. When profiling a SICO math library, the flat array was almost 100 times faster.

The main downside with using a flat array to hold memory is that a lot of it can be dead space. Although the array will grow if we try to access a memory address outside of its current allocation, for speed reasons the array will not check if it can be shrunk. Using a paged memory system may offer a decent balance between speed and wasted space. It would be used like so:

val=mem[addr>>48][(addr>>32)&0xffff][(addr>>16)&0xffff][addr&0xffff]

However, testing on large applications will need to be done.

Creating an animated heat map showing memory activity would be useful in showing how the architecture works to casual observers.

To support multithreading, make each instruction atomic. The interpreter can spawn threads by writing to a special address. The new thread's instruction pointer would start at [B].

Within the language, make a bitwise and integer arithmetic library. Also see if there is a self synchronizing sequence of instructions. That is, if the IP were to randomly land anywhere in this block of memory, could we always direct it to some safe memory address?

SICO's name was originally unileq, following in the tradition of subleq. However, unileq doesn't quite roll off the tongue.

sicofiles.zip contains my unfinished work, as well as test cases for the sico.c interpreter.

See also the esolangs entry: https://esolangs.org/wiki/SICO