Automat: Objects As Syntax Not Data

Abstract

Automat is a skeuomorphic desktop automation tool (similar in function to AutoHotKey) with the end-goal of evolving into a fast & intuitive general purpose computing environment. This article aims to give an overview of Automat, focusing on its distinguishing approach to object-oriented programming.

Automat's notion of objects (called Syntax Objects) is a materialization of programming abstractions. Syntax Objects should be applicable to other programming environments and should become a valuable tool for programming language researchers and software engineers.

Introduction to Automat

Computers have the potential to greatly improve people's lives: by automating repeatable work, by providing access to humanity's combined knowledge & culture, and by amplifying human creativity with innovative digital tools.

Yet the existing software ecosystem largely fails to realize this potential. Forces that surround software engineering seem to eventually separate users & programmers, locking out a large portion of the population from commanding computers. One source of these forces lies in poor, inconsistent or obsolete decisions made when the foundational technologies were designed. Another source, maybe the more important of the two, lies in social barriers that limit access to knowledge, often in subtle and hard-to-see ways.

Luckily we're seeing continued progress on both fronts. The growth of the internet, smartphones and large language models in the last few decades inspires great optimism about more egalitarian access to technology in the future.

This work aims to further this goal by lowering the barriers to controlling computers. It introduces an environment called Automat - an interface offering a skeuomorphic representation of objects in computer memory.

A stack of three objects - Window Capture, Tesseract OCR and a Timer. The objects are connected in sequence. First Window Capture captures a window. Then Tesseract OCR analyzes its contents. Finally a timer waits for one second and loops back to the beginning.

A metaphorical interface allows a wider audience of users to compose complex behaviors. People who see themselves as "non-technical" are not alienated by a formal appearance. People who are not familiar with computer terminology can guess the function and composition rules of on-screen elements. It invites playful experimentation.

The initial goal of Automat is game automation. Games provide entertainment, enrich social life and serve as a tool for learning and sharpening skills. Unfortunately many modern computer games, particularly the ones developed with return on investment in mind, adopt a range of techniques that produce addiction and incentivise recurring payments. Mechanisms such as virtual resource gathering, randomized rewards, classical conditioning through repetition, stimulus & reward are a common element of many games.

The opportunity that Automat sees in this environment is that game automation may offer a way to break free from addictive gameplay loops, while simultaneously allowing players to engage with the game on a deeper, more strategic level.

From the perspective of Automat's development, the variety of games creates a smooth complexity gradient - which makes it usable very early in its development as a macro playback utility with incremental additions expanding its use to more use-cases. Eventually, the same techniques that can be used for game automation may potentially be used to solve real-world problems, transitioning Automat into a general purpose computing environment.

Learning from programming languages which are in broad use today, Automat's development is not being done with an expectation of profit. Monetization efforts tend to compromise the design of computing environments by introducing competing objectives. Commercial programming systems have had a history of difficulties in attracting user trust. Instead, Automat's development is driven by a small community following hacker culture & hacker ethics. The goal of this article is to encourage readers to join this community and push the frontier of visual programming together.

Many tools in the physical world are designed with two "ends" in mind: one meant to be comfortable in the human hand and one efficient at doing the work. Automat's design has three "ends":

One meant for interfacing with the person using the system,
one meant for interoperability with other tools and
one meant for effectively utilizing the computing hardware.

This article focuses on the first of the three.

Objects As Syntax Not Data

Automat's design is based on the idea that programmers working with a program build a mental representation of the program, composed of interacting abstract objects.

Not all programmers subscribe to this view. Notably, Programming as Theory Building by Peter Naur distances itself from it: "(...) a theory held by a person has no inherent division into parts and no inherent ordering." Instead, it adopts a more scientific, behavioral definition of theory: as the ability to answer questions and make changes to the program.

Note that this usage of "abstract objects" has nothing to do with classical OOP terminology. Instead, the term comes from philosophical conceptualism, which argues that a thinking mind generalizes knowledge through universals, which exist within the mind as abstract objects. While most universals are closely connected to linguistic terms and are easy to communicate, domain experts over time can develop their own framework of concepts. Think of a mathematician examining a mathematical problem or a chess player planning a strategy.

Within the field of software engineering the abstract objects that programmers use to mentally represent programs are commonly called abstractions and are fundamental to high-level programming languages and software architecture.

Abstractions in a computer program typically map to subgraphs of the abstract syntax tree. They may also exist within the code implicitly as invariants, or imagined results of code execution at its different stages.

Programmers use the term "readability" for programs where abstractions are evident from the source code. Conversely, the practice of obfuscation makes programs harder to understand by breaking abstractions while preserving program behavior.
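As a small illustration of this contrast, here is a sketch in Python. The function names are hypothetical and are not taken from Automat. Both functions compute the same value: in the first the "total" and "count" abstractions are evident from the source, while the second preserves the behavior but tangles them together.

```python
def mean_readable(values):
    """Average of a non-empty sequence; 'total' and 'count' are visible."""
    total = sum(values)
    count = len(values)
    return total / count


def mean_obfuscated(v):
    """Same behavior, but the total and the count are interleaved."""
    a = [0, 0]
    for i in range(len(v) * 2):
        # Even steps bump the count in a[0], odd steps add a value to a[1].
        a[i & 1] += v[i >> 1] if i & 1 else 1
    return a[1] / a[0]


sample = [1, 2, 3, 4]
readable_result = mean_readable(sample)
obfuscated_result = mean_obfuscated(sample)
```

A reader of the second function has to reconstruct the two abstractions mentally before the intent becomes clear.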

We can attempt to translate objects from this abstract mental domain back into a concrete computing environment where each abstract object is materialized as a concrete and composable object on the screen of a computer. We will call the objects produced by this translation Syntax Objects.

Novelty

Syntax Objects is a new term for the concrete counterpart of a concept known within software engineering as abstractions.

Within the context of software engineering, the contribution of this article is that it:

defines the category of Syntax Objects as a materialization of abstractions,
identifies some of the properties of Syntax Objects that are helpful in their design,
proposes a set of programmer-centric design objectives for Syntax Objects.

Example

Although the term Syntax Objects is new, it describes an idea that has been around since the dawn of high-level programming languages - an attempt to materialize abstractions as composable bricks for programming.

Syntax Objects in Bash

Syntax Objects can be found in every high-level programming system. Many of them are provided by the programming language through its syntax and standard library. Programmers often roll their own through user-defined functions, classes and macros.

Properties

Since the size of the source code for any problem is constant, regardless of the size of the processed data, a round-trip translation of any program from its formal definition into abstractions and then back into Syntax Objects should produce a constant number of the latter. This means that the number of Syntax Objects for any given program is constant.
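The constancy argument can be sketched in Python, under a loud assumption: AST nodes are used here only as a rough stand-in for Syntax Objects, since abstractions map to subgraphs of the tree rather than to single nodes. The `total_length` function is a made-up example program.

```python
import ast

# A tiny example program, kept as text so it can be both parsed and run.
SOURCE = """
def total_length(words):
    total = 0
    for word in words:
        total += len(word)
    return total
"""


def count_nodes(source: str) -> int:
    """Count AST nodes (a crude proxy for the number of Syntax Objects)."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


namespace = {}
exec(SOURCE, namespace)
total_length = namespace["total_length"]

nodes_before = count_nodes(SOURCE)
small_result = total_length(["a", "bc"])            # 2 items of data
large_result = total_length(["word"] * 100_000)     # 100,000 items of data
nodes_after = count_nodes(SOURCE)
```

The amount of processed data grows by several orders of magnitude, while the structure that the programmer reasons about stays the same size.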

Just like variables and unlike traditional OOP objects, Syntax Objects are not data themselves. Instead they form an interface to data. Syntax Objects are typically not automatically constructed & destroyed during program execution.

One exception to this may be programs that employ macro expansion, rule-based substitutions or polymorphic code during their execution.

Just like user-defined functions with their call sites and a definition, Syntax Objects are mental primitives (atoms) from the perspective of the programmer solving one specific problem but within the system their behavior may be arbitrarily complex and open for change. This means that the atomicity of Syntax Objects is relative.

A concrete bound for the number of simultaneously interacting Syntax Objects may be estimated from psychological studies of working memory. The Magical Number Seven, Plus or Minus Two by George Miller suggests that working memory can only hold about 7 distinct chunks of information at once.

This limit may prove helpful in managing visual complexity, a known weak point of visual programming environments that is summarized in a statement called the Deutsch limit: "The problem with visual programming is that you can't have more than 50 visual primitives on the screen at the same time."

Similarly to program source code, Syntax Objects may either perform their work through interpretation or be translated into a more efficient form through compilation.

Design Criteria

The starting point for designing Syntax Objects is their mapping to mental devices that programmers use when thinking about programs. By providing an interface that resembles and supports those mental constructs, both experienced and novice programmers can work faster and more intuitively. Syntax Objects are the area within programming where careful design is most needed.

A brief introduction to the role of design can be found in Beauty Is Objective by Andrew Coyle. Although written with website design in mind, this introduction is very applicable to programming environments. In fact it is recommended to pause reading here and read that article first.

We can outline some objectives for a well designed Syntax Object:

Faithful. Elementary operations offered by a Syntax Object should correspond to the transforms that programmers perform when thinking about a program.

Discoverable. When a programmer needs a specific Syntax Object but is not aware of its existence, it should be easy to find it.

Memorable. A programmer who used a specific Syntax Object in the past should have an easy time recalling it when needed.

Skimmable. Programmers (including programmers without formal education) should be able to intuitively guess the function of a Syntax Object in the first seconds of seeing it for the first time.

Easy to master. Fully understanding the behavior of a Syntax Object should require as little effort as possible.

Fast to use. Time needed to perform a task using a given Syntax Object should be minimized.

Many other design objectives could be listed here but only the ones directly related to the programmer's efficiency have been deemed relevant.

The design of Syntax Objects, just like the choice of right abstractions, can promote interoperability and efficiency, but this article will not discuss these aspects.

Exercise: It's valuable to go back to the Bash example above. Please take a look at that example now and try to answer the following questions:

How would you rate the Bash programmer interface in each of the design criteria listed in this section?
Try to imagine an alternative interface that would aim to maximize each of those design criteria. Focus on just one criterion at a time. What could this interface look like?

Design criteria listed above should provide a valuable compass for language designers iterating on a language design.

Software engineers, who often introduce custom abstractions through OOP techniques, could use these design criteria as an alternative to guidelines such as SOLID or GRASP.

Objects In Automat

Having defined Syntax Objects in the textual realm, let's now take a look at how they're used in Automat.

Demonstration of imperative-style macro used for leveling up combat skills in Skyrim

Let's discuss what can be observed in this demonstration:

Automat's interface is heavily metaphorical. The sleep function is represented by a timer. The function to press a mouse button is represented by an image of a mouse. The usage of visual metaphors that tie functions to familiar physical objects helps with skimmability.
Clicking a mouse is performed using two objects (down & up) as opposed to a single toggleable object. This maps more closely to the OS-level input events and improves the faithfulness of the interface. For example this allows the player to craft physically impossible sequences of events (like three "mouse down" events in a row), which can be exploited in some games.
Objects that are available to the player are visible in the bottom bar, offering good discoverability.
Objects follow unstructured programming style, where control flow is passed explicitly (like in a goto statement) through blinking cables. The unstructured programming style is simple to understand, even more so because of blinking feedback, making it easy to master. Contrast this for example to other programming styles (such as structured, reactive, dataflow, functional or message passing), all of which require a learning phase.
Most actions, even including setting the timer durations, are done with a mouse, minimizing the need to move the hand between the mouse and the keyboard. Combined with the locality of actions within objects (Fitts's law), this makes the interface fast to use.
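The unstructured, cable-driven control flow described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` and `Timer` classes, the `next` field standing in for a cable and the `follow_cables` helper are all hypothetical and do not reflect Automat's actual implementation. No real input events are produced and the timer does not actually wait; every step is only written to a log.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A hypothetical on-screen object; 'next' plays the role of a cable."""
    name: str
    next: Optional["Node"] = field(default=None, repr=False, compare=False)

    def run(self, log: list) -> None:
        log.append(self.name)


@dataclass
class Timer(Node):
    """A stand-in for the timer object; it logs instead of sleeping."""
    seconds: float = 1.0

    def run(self, log: list) -> None:
        log.append(f"{self.name} {self.seconds}s")


def follow_cables(start: Optional[Node], max_steps: int) -> list:
    """Pass control from object to object, goto-style, for at most max_steps."""
    log = []
    node = start
    for _ in range(max_steps):
        if node is None:  # the cable chain ended
            break
        node.run(log)
        node = node.next
    return log


# A click loop: mouse down -> timer -> mouse up -> back to mouse down.
down, wait, up = Node("mouse down"), Timer("timer", seconds=0.5), Node("mouse up")
down.next, wait.next, up.next = wait, up, down
click_log = follow_cables(down, 6)

# Separate down/up objects allow a "physically impossible" sequence.
a, b, c = Node("mouse down"), Node("mouse down"), Node("mouse down")
a.next, b.next = b, c
triple_log = follow_cables(a, 10)
```

With a single toggleable click object the second sequence could not be expressed, which is the faithfulness argument made above.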

Side Note: Game Automation & Fun

Let's make a brief digression and consider the impact of game automation on the fun of playing a game. At first sight it might seem like game automation would take it away.

We will consider this problem through the lens of Gamism, Narrativism and Simulationism - a game design theory which postulates that players of role-playing games are motivated by three different goals: (1) to win, (2) to create an interesting story and (3) to engage with a world governed by rules.

Let's consider the Skyrim example from the video above.

Regardless of the player's playstyle, the usage of Automat saved about 12 hours of repetitive actions in exchange for about one hour of mentally stimulating design.
From the perspective of a gamist, a leveled-up and stronger character is more fun to play. The player can now engage more powerful enemies and explore more of the game world.
Fully levelling a skill brings a sense of accomplishment. Some players level up skills just for the sake of it. For simulationists, who seek to exploit rules of the world to solve self-imposed challenges, achieving this through an intellectually challenging path (rather than simple repetition) makes the sense of accomplishment even greater.
From a narrativist's role-playing perspective, the character is more interesting to play since his resilience skills now have an interesting origin story: they have been developed over ten days of being punched by a mudcrab while being stuck waist-deep in a swamp. The character may have developed a crab-related PTSD or made mudcrabs his sworn enemy. The specific mudcrab that punched him may reappear later in his story.

Even though intuitively game automation should take away the enjoyment of playing a game, in reality this first impression couldn't be further from the truth.

Side Note: Game Automation & Craftsmanship

The two videos show subsequent iterations of a macro for leveling up a character in Black Myth: Wukong.

Second iteration of a macro for leveling up character in Black Myth: Wukong, at a rate of 1 exp/s.
Playback clamped to 0:15-2:00.

Third iteration of the macro for Black Myth: Wukong, at a rate of 3.4 exp/s.
Playback clamped to 1:50-5:35.

Optimizing a macro to achieve a higher time efficiency is another, intrinsic source of reward in game automation.

It's present in games like Factorio, Satisfactory or Kerbal Space Program. It's the defining feature of the whole category of Zachlike games (to which Automat belongs).

Programmers may be familiar with a similar sense of accomplishment stemming from code golfing or performance tuning a piece of code.

Programming By Demonstration

Some macros require fine timing and coordination over multiple inputs that is hard to achieve with manually placed timers and low-level input events. This problem can be solved with another kind of object - a Timeline.

A Timeline is an abstraction known from multimedia authoring environments (for example Macromedia Flash or Blender) that contains multiple ordered (usually keyed by frame) sequences of values.

Within Automat a Timeline is keyed by a 64-bit integer (a number of nanoseconds) and can contain different types of tracks. One kind of track can switch connected objects between on and off states (it's used for controlling keys). Another kind of track represents 2D traces as a series of vectors (it's used for controlling the mouse). The Timeline is optimized for fast playback of tracks that contain thousands of values each second, to match the polling rate of gaming hardware.
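To make the two kinds of tracks more concrete, here is a minimal Python sketch. It is not Automat's implementation: the class names `OnOffTrack` and `VectorTrack` are hypothetical, the vectors are assumed to be relative mouse displacements, and plain Python integers stand in for the 64-bit nanosecond keys. Lookups use binary search over sorted timestamps, which is one simple way to keep playback cheap when a track holds thousands of values per second.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class OnOffTrack:
    """Hypothetical track that switches a connected object (e.g. a key)."""
    times_ns: list = field(default_factory=list)  # sorted timestamps
    states: list = field(default_factory=list)    # True = on, False = off

    def add(self, t_ns: int, on: bool) -> None:
        i = bisect.bisect_right(self.times_ns, t_ns)
        self.times_ns.insert(i, t_ns)
        self.states.insert(i, on)

    def state_at(self, t_ns: int) -> bool:
        """State set by the latest entry at or before t_ns (off by default)."""
        i = bisect.bisect_right(self.times_ns, t_ns)
        return self.states[i - 1] if i > 0 else False


@dataclass
class VectorTrack:
    """Hypothetical track of 2D vectors (assumed: relative mouse moves)."""
    times_ns: list = field(default_factory=list)
    deltas: list = field(default_factory=list)    # (dx, dy) pairs

    def add(self, t_ns: int, dx: int, dy: int) -> None:
        i = bisect.bisect_right(self.times_ns, t_ns)
        self.times_ns.insert(i, t_ns)
        self.deltas.insert(i, (dx, dy))

    def displacement_between(self, start_ns: int, end_ns: int) -> tuple:
        """Sum of all vectors with start_ns < t <= end_ns."""
        lo = bisect.bisect_right(self.times_ns, start_ns)
        hi = bisect.bisect_right(self.times_ns, end_ns)
        window = self.deltas[lo:hi]
        return sum(d[0] for d in window), sum(d[1] for d in window)


key = OnOffTrack()
key.add(0, True)             # key pressed at t = 0
key.add(5_000_000, False)    # key released after 5 ms

mouse = VectorTrack()
mouse.add(1_000_000, 3, -1)
mouse.add(2_000_000, 2, 4)
moved = mouse.displacement_between(0, 2_000_000)
```

During playback, a driver would repeatedly ask each track what happened since the previous tick and forward the result to the connected objects.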

A Timeline is not typically created directly by the player but rather by a Macro Recorder - another object that provides programming-by-demonstration abilities. The example below shows how it's typically used.

A typical setup for recording macros in Automat

The next video shows how this setup can be used in practice:

Recording & playback of a potion brewing macro in Kingdom Come: Deliverance II.
Playback clamped to 3:03:50-3:08:10.

The example above shows a lengthy potion brewing mini-game being recorded into a macro & then played back. Once the hotkeys are connected to Macro Recorder & a Timeline the whole process can be executed without leaving the game. Also, once the playback is started, the game can be left unattended. Automat will continue playing the game indefinitely.

Note that some of the video examples used in this article come from development live streams and show in-progress builds of Automat. In those examples graphical glitches (invisible objects, flickering) may be present.

The visual design of the Macro Recorder imitates a blend of a recording device and a parrot. This is meant as a distant metaphor for the ability of some parrots to repeat human voice. The parrot signals its interest by following the mouse cursor with its eyes, which also rotate while the parrot is recording. To avoid distracting the player with its eyes, the parrot goes to sleep when it's not being used (due to a bug the parrot slept during the recording).

One subtle observation from the setup video is that Macro Recorder automatically connects to the nearest Timeline. Some objects auto-connect to the closest neighbor that satisfies some interface. The auto-connect feature makes the environment faster to use. Simply placing objects in the same area is enough to make them function together.

The auto-connect feature is accompanied by a subtle radar scanning animation (not visible on the video) which makes it more discoverable.

Another mechanism (again, not visible on the video) is that in the case that Macro Recorder lacks a Timeline to record to, it will create a new Timeline and connect to it automatically. This is another mechanism that improves its speed of use.
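The auto-connect behavior, including the fallback of creating a missing Timeline, can be sketched as a nearest-neighbor search. The following Python is a sketch under assumptions, not Automat's code: the `Obj` class, the `kind` string standing in for "satisfies some interface", and the placement of the newly created Timeline are all made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    """A hypothetical object on the canvas."""
    kind: str
    x: float
    y: float


def nearest(source: Obj, objects: list, kind: str) -> Optional[Obj]:
    """Closest object of the requested kind, or None if there is none."""
    candidates = [o for o in objects if o.kind == kind and o is not source]
    if not candidates:
        return None
    return min(candidates,
               key=lambda o: math.hypot(o.x - source.x, o.y - source.y))


def auto_connect(recorder: Obj, objects: list) -> Obj:
    """Connect to the nearest Timeline, creating one if none exists."""
    target = nearest(recorder, objects, "Timeline")
    if target is None:
        # Assumed placement: right below the recorder.
        target = Obj("Timeline", recorder.x, recorder.y + 1.0)
        objects.append(target)
    return target


recorder = Obj("Macro Recorder", 0.0, 0.0)
near = Obj("Timeline", 1.0, 1.0)
far = Obj("Timeline", 10.0, 0.0)
scene = [recorder, far, near, Obj("Timer", 0.5, 0.0)]
connected = auto_connect(recorder, scene)

empty_scene = [recorder]
created = auto_connect(recorder, empty_scene)
```

Note that the closer Timer is ignored because it is not the kind of object the recorder is looking for.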

The last observation in this example is that Macro Recorder produces a network of objects that can then be manually modified by the player, allowing for further optimization and teaching the player how the objects can be used together.

Machine Code Scripting

Another example of Syntax Objects is the scripting mechanism used by Automat. Automat's scripts use a metaphor of playing cards with instructions written on them, similar to many collectible card games.

Instruction cards in Automat

Each card corresponds roughly to one machine instruction and can be composed with other cards to form a program. Programs built in this way are translated into machine code and executed directly on the CPU, making this scripting approach as fast as hand-written assembly.
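The translation from cards to machine code can be illustrated with a deliberately tiny Python sketch. The `Card` class and the helper functions are hypothetical and not part of Automat. Only three x86-64 encodings are used: `B8 id` (MOV EAX, imm32), `05 id` (ADD EAX, imm32) and `C3` (RET). The sketch stops at producing bytes; actually running them would additionally require placing the bytes in executable memory, which is not shown here.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """A hypothetical instruction card: a mnemonic plus its encoding."""
    mnemonic: str
    machine_code: bytes


def mov_eax_imm32(value: int) -> Card:
    # Opcode B8 followed by a little-endian 32-bit immediate.
    return Card(f"mov eax, {value}", b"\xB8" + struct.pack("<I", value))


def add_eax_imm32(value: int) -> Card:
    # Opcode 05 followed by a little-endian 32-bit immediate.
    return Card(f"add eax, {value}", b"\x05" + struct.pack("<I", value))


def ret() -> Card:
    # Opcode C3: near return.
    return Card("ret", b"\xC3")


def assemble(cards: list) -> bytes:
    """Concatenate the encodings of the cards in program order."""
    return b"".join(card.machine_code for card in cards)


program = [mov_eax_imm32(40), add_eax_imm32(2), ret()]
code = assemble(program)
listing = [card.mnemonic for card in program]
```

Because each card already carries its final encoding, composing a program in this sketch is little more than concatenation.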

Below is a demonstration of machine code scripting:

Machine code instructions in Automat

The video introduces many new elements that should be discussed in more detail.

First - the video shows an Instruction Library - another object that can be used to select one of thousands of instructions available on x86-64 CPUs. Following the playing card metaphor, Instruction Library is represented as a deck of cards. It groups instructions into categories (Logic, Math, Control Flow, etc.) and allows for filtering instructions based on the registers that they read & write.
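Filtering a deck of instructions by category and by the registers they read & write could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `Instruction` record, the four sample entries and the `filter_library` function are hypothetical, and side effects on flags and on the instruction pointer are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    """A hypothetical library entry."""
    mnemonic: str
    category: str
    reads: frozenset
    writes: frozenset


LIBRARY = [
    Instruction("add rax, rbx", "Math",
                frozenset({"rax", "rbx"}), frozenset({"rax"})),
    Instruction("and rax, rcx", "Logic",
                frozenset({"rax", "rcx"}), frozenset({"rax"})),
    Instruction("inc rcx", "Math",
                frozenset({"rcx"}), frozenset({"rcx"})),
    Instruction("jmp rax", "Control Flow",
                frozenset({"rax"}), frozenset()),
]


def filter_library(library: list,
                   category: Optional[str] = None,
                   reads: Optional[str] = None,
                   writes: Optional[str] = None) -> list:
    """Keep instructions matching every filter that was provided."""
    result = []
    for ins in library:
        if category is not None and ins.category != category:
            continue
        if reads is not None and reads not in ins.reads:
            continue
        if writes is not None and writes not in ins.writes:
            continue
        result.append(ins)
    return result


math_cards = [i.mnemonic for i in filter_library(LIBRARY, category="Math")]
rcx_readers = [i.mnemonic for i in filter_library(LIBRARY, reads="rcx")]
logic_rax_writers = [i.mnemonic for i in
                     filter_library(LIBRARY, category="Logic", writes="rax")]
```

Narrowing by registers is useful when a script already keeps its state in specific registers and the player only wants cards that can touch them.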

Another new object represents the state of the CPU - a set of registers, also known within OS terminology as a task context. It's shown as a rectangle of Broadway-style lights with a starfield inside. It provides a real-time view of the state of registers and allows multiple scripts to coexist.

Finally, individual registers are shown as checkerboards governed by an icon. Registers can be used by other objects within Automat to communicate with the machine code.

The general style used throughout machine code objects is loosely based on the graphic design of Monty Python's Flying Circus, making it more playful and memorable.

At the same time the representation tries to stay faithful to the underlying x86-64 CPU model - the instructions still show the assembly mnemonic & machine code in the corners of the card - but they do so in a way that passes as visual ornament rather than the main interface.

Below is a video showing how machine code scripting can be used to create non-trivial macros:

Complex Skyrim macro that scans the shopkeeper's inventory, buying every "Giant's Toe" & "Creep Cluster". When done, performs a Timeline-driven inventory reset glitch & starts over.

Step-by-step breakdown of the Skyrim macro.

What's striking about this example is the sudden increase in its complexity. The number of objects on screen violates the 7±2 rule. The purpose of many objects is not immediately obvious. The large number of cables makes the overall control flow hard to follow. Even though any programmer would be able to decipher the code after a short while, it's far from perfect in the "skimmability" area.

This example shows the current limitations of Automat. Many of its systems are incomplete, still undergoing design iterations, or haven't been designed at all.

The machine code subsystem, which is the focus of this section, still lacks x86 instruction set extensions like AVX, SHA or AES; ModR/M instructions; EFLAGS & RIP registers.

The area of visual complexity management in particular could benefit from some improvements. Some plans in this area include:

an ability to group & minify objects,
visual & textual annotations and
a better support for DRAKON-style control flow.

The game automation area is missing a way to interact with gamepads & possibly other kinds of HID devices.

This also connects to the question of what's next beyond input automation.

Future of Automat

The eventual goal of Automat is to become a general purpose computing tool. This can only happen if its capabilities are expanded beyond game automation, into neighboring areas.

One such area, still somewhat related to game automation, might be program manipulation. Most programs are distributed to users as black boxes, with only limited ability to customize their behavior (usually through predefined settings or extension APIs). Automat's objects could enable direct manipulation of entities within the memory of other processes, allowing debugger-like tools for hooking and altering the behavior of other programs. Program modification is an area where very few tools exist and even the best ones are typically reserved for experts (one notable exception being Cheat Engine). Encapsulating the tricks & techniques of those expert users into well designed objects could open this area to a much wider audience.

Another area might be multimedia processing. The world of software is full of excellent graph-based processing tools, many of which could benefit from better interfaces. Examples of such libraries might include GStreamer, FFmpeg, GEGL, OpenCV, YOLO. Vision is often vital to fine control so it ties well with game automation. Recent versions of Automat already include Tesseract OCR (also visible in the first video in this article), showing that some work in this direction is already happening.

Yet another area might be to interface with other programming systems. The ability to embed snippets of code within Automat could certainly help with more complex logic. A solid programming environment (even for small code snippets) embedded within Automat could also address the difficult setup required by many language ecosystems and make them more accessible.

Automat follows a "no installer, single statically-linked binary" distribution model so theoretically it might serve as a simple programming environment to set up.

Some examples of more ambitious directions might be:

to use Automat as a Wayland compositor (turning it into a window manager capable of isolating and concurrently driving multiple apps according to player-defined rules),
to split out a core capable of running on an STM32 microcontroller, controllable in real time from a desktop interface (enabling hardware-level input replay, required for automation in more challenging environments, like consoles or mobile devices),
to provide more social features, like an ability to "visit" other players by connecting to other Automat instances,
last but not least - integration of LLM-based agents with their tool-calling & multimodal capabilities might provide a new, rapid way of automating arbitrary tasks (not necessarily related to gaming), while still allowing the player to visually inspect, structure and coordinate the work of multiple agents.

In addition to expanding its feature set, the core UX of Automat could also be improved. Established computer graphics techniques such as physically based rendering (PBR), skeletal animation or particle systems could be introduced to make objects more expressive. Sound design, a powerful and underutilized tool, could be used as another feedback channel to convey the state of the program. The learning curve of Automat could be further flattened with better built-in tutorials and more structured documentation.

The list of potential directions is long and full of synergies, which leads us to the point of this article.

Invitation to Collaboration

It is my hope that this article convinces you that, with some design & engineering effort, a more humane, more accessible and more efficient future of computing is possible.

If you agree with this vision, if you subscribe to hacker culture and hacker ethics, and finally if you are willing to put in some effort to make it happen (even just a couple of hours every now and then), then let me invite you to Automat's community, where we'll be able to bring this vision to life, together.

As Alan Kay once said:

The best way to predict the future is to invent it.