Comparing TeaVM and Kotlin/JS performance with a software 3D renderer

I've developed a software 3D renderer in Kotlin, and here's why.

I have a project called TeaVM, a bytecode-to-JavaScript compiler. Unfortunately, my project does not have many benchmarks. This is not only a problem for project publicity but also a problem for me as the project's author: I don't have sufficient insights into TeaVM's performance in real-world scenarios. While there are large applications compiled with TeaVM, the issue lies in their size. Firstly, it's difficult to measure specific performance aspects. Secondly, there's no way to compare performance since these projects cannot be ported into something else.

My software 3D renderer addresses these problems. It's a relatively small, CPU-bound application that can be compiled to JavaScript both with TeaVM and Kotlin/JS, as well as run as a JVM application.

There's also a third goal: TeaVM lacks good examples. One aspect of using TeaVM is creating Web Workers. In this article, I'll attempt to demonstrate how such an application, with the use of Web Workers, can be written. I'll also compare the performance of Kotlin/JS and TeaVM and draw some conclusions. Spoiler: I was a bit disappointed, but I also learned some lessons and have some ideas on how TeaVM can be improved.

It's important to note that I won't delve into the details of 3D rendering in this article. Firstly, it's not directly relevant to the primary focus. Secondly, my expertise in this area is limited. Lastly, given the prevalence of 3D accelerators in today's devices, software 3D rendering has limited practical use. For those interested in the 3D rendering aspect, I encourage exploring the source code and referring to relevant articles on the internet.

Let's focus instead on the Kotlin-to-JavaScript aspect: project setup and the use of Web Workers.

Setting up a Gradle project

To set up a Kotlin project with TeaVM, you would need something like this in your build.gradle.kts:

plugins {
    kotlin("jvm") version "1.9.20"
    war
    id("org.teavm") version "0.9.0"
}

Although in our project we rely on the most recent TeaVM version built from the master branch, so in our case the version is something like 0.10.0-SNAPSHOT. In a multimodule project, you can declare the plugin in the root module without applying it (which is our case) and omit the version in submodules. Note that the war plugin is optional here.
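
For reference, the multimodule setup looks roughly like this (standard Gradle mechanics, nothing TeaVM-specific; the version is just an example):

// root build.gradle.kts: declare the plugin and its version once, without applying it
plugins {
    id("org.teavm") version "0.9.0" apply false
}

// submodule build.gradle.kts: apply the plugin without repeating the version
plugins {
    id("org.teavm")
}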

Now let's define TeaVM settings:

teavm.js {
    addedToWebApp = true
    mainClass = "org.teavm.samples.software3d.teavm.MainKt"
}

teavm.wasm {
    addedToWebApp = true
    mainClass = "org.teavm.samples.software3d.teavm.WasmWorkerKt"
    optimization = OptimizationLevel.AGGRESSIVE
    minHeapSize = 4
    maxHeapSize = 32
}

The mainClass and addedToWebApp parameters are crucial here. The former informs TeaVM about the entry point for each platform, while the latter instructs TeaVM to add generated JS and Wasm files into the .war file.

These recommendations are valid for any project. However, we have a bit of a special case: we compile our project both with Kotlin/JS and TeaVM. I believe this is quite a rare use case in the real world, so the Gradle settings below are not meant for a general-case project.

In order to use Kotlin/JS, we replace the Kotlin JVM plugin with Kotlin Multiplatform:

plugins {
    kotlin("multiplatform") version "1.9.20"
    // ...
}

And configure Kotlin as follows:

kotlin {
    js {
        browser {

        }
        binaries.executable()
    }
    jvm()
    sourceSets.jvmMain.dependencies {
        implementation(teavm.libs.jsoApis)
    }
}

This is quite usual for a Kotlin Multiplatform project.

Further configuration is a bit awkward, since TeaVM relies on the source sets provided by the standard Gradle Java plugin. However, Kotlin Multiplatform provides its own "source sets", which aren't compatible with the ones provided by Gradle. Moreover, standard source sets simply don't work with Kotlin Multiplatform, so we have to keep our TeaVM-specific code in the jvmMain source set instead of teavm. We also need the following lines in our configuration:

tasks.withType<TeaVMTask> {
    classpath.from(kotlin.jvm().compilations["main"].output.classesDirs)
    classpath.from(kotlin.jvm().compilations["main"].runtimeDependencyFiles)
}

By default, the TeaVM Gradle plugin adds the main Java source set to the compilation classpath, but TeaVM knows nothing about Kotlin Multiplatform source sets, so we have to do this manually.

The final step is the inclusion of the JavaScript files produced by Kotlin/JS into the *.war file:

tasks.war {
    dependsOn(tasks.named("jsBrowserDistribution"))
    with(copySpec {
        from(kotlin.js().binaries.executable().map { it.distribution.outputDirectory })
        into("kjs")
    })
}

Unfortunately, I did not find much information regarding this, so I had to come up with a solution myself; it is not necessarily right or optimal.

Using Web Workers

Creating Web Workers with TeaVM

A Web Worker is useless without some foreground JavaScript that works in the UI thread. So, we need not only the worker itself but also a script that loads the worker, communicates with it, and displays the rendering result.

There are several ways to achieve this:

  1. Declare separate modules for the main script and the worker script.

  2. Declare separate Gradle tasks for the main script and the worker script. However, this is not directly supported by the TeaVM Gradle plugin: it creates only one JavaScript task, and in order to create another one manually, the user has to write some extra configuration. I believe there's room for improvement here.

  3. Use one script for both purposes and pass a parameter so that the script knows which role to operate in.

Let's follow the third approach.

In our main.kt, we write the following:

fun main(args: Array<out String>) {
    if (args.size == 1 && args[0] == "worker") {
        worker()
    } else {
        runController()
    }
}

In index.html:

<head>
  <!-- ... -->
  <script type="text/javascript" 
          charset="utf-8" 
          src="js/software3d.js">
  </script>
</head>
<body onload="main()">
  <!-- ... -->
</body>

We see that the generated JavaScript defines a function called main. In the HTML file, we call it without any arguments, so it will follow the runController path.

Unfortunately, there's no way to create a worker that immediately calls a given function with certain arguments. Thus, we manually write a small JavaScript file to do it:

importScripts("js/software3d.js");
main(["worker"])

Finally, to create a worker from Kotlin code, we write the following code:

val worker = Worker.create("js-worker.js")

Now we can assign a task to the worker:

worker.postMessage(JSObjects.createWithoutProto<JSMapLike<JSObject>>().apply {
    set("type", JSString.valueOf("frame"))
    set("time", JSNumber.valueOf(frameTime))
})

And receive the rendering result:

worker.onMessage { event ->
    val data = event.data as JSMapLike<*>
    val buffer = data["data"] as ArrayBuffer
    displayBuffer(buffer)
}

In our case, we're spawning several workers to run in parallel, so the final code is slightly more complex.
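
For illustration, here's a rough sketch of spawning several workers and telling each one which lines it owns (the "init" message type and its offset/step fields are made up for this sketch; the actual project code is organized differently):

val workerCount = 8
val workers = (0 until workerCount).map { index ->
    Worker.create("js-worker.js").apply {
        // hypothetical "init" message: give each worker its own line offset and the step
        postMessage(JSObjects.createWithoutProto<JSMapLike<JSObject>>().apply {
            set("type", JSString.valueOf("init"))
            set("offset", JSNumber.valueOf(index))
            set("step", JSNumber.valueOf(workerCount))
        })
    }
}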

The worker listens for incoming messages from the main script using the following code:

Window.worker().listenMessage {
    val dataJson = it.data as JSMapLike<*>
    when ((dataJson["type"] as JSString).stringValue()) {
        "frame" -> worker.renderFrame(dataJson)
        // handle other message types
    }
}

And responds with the following code:

val message = JSObjects.createWithoutProto<JSMapLike<JSObject>>().apply {
   set("data", buffer)
}
Window.worker().postMessage(message, JSArray.of(buffer))

The second argument passed to postMessage is an array of transferable objects. We use it to improve the performance of communication between the main script and the worker script: the buffer is passed by reference, not copied. The worker script, however, loses access to the buffer, but that's not a problem, since the next frame will be rendered into another buffer.

Writing Web Workers with Kotlin/JS

In Kotlin/JS, the worker code is more or less similar. First, to access the worker global object, we define the following property:

private val self = js("self") as DedicatedWorkerGlobalScope

Now, we can use it to listen for messages:

self.onmessage = { event ->
    val data = event.data.asDynamic()
    when (data.type as String) {
        "frame" -> renderFrame(data)
       // handle other message types
    }
    Unit
}

And respond:

val message = Any().asDynamic()
message.data = typedArray.buffer
message.time = ((perfEnd - perfStart) * 1000000).toInt()
self.postMessage(message, arrayOf(typedArray.buffer))
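
The snippets above cover only the worker side; for completeness, here's a rough sketch of what the main-thread side could look like in Kotlin/JS using the standard org.w3c.dom.Worker API (the worker bundle path, the function name, and the message fields are assumptions for this sketch, not the actual project code):

import org.w3c.dom.MessageEvent
import org.w3c.dom.Worker

fun startWorker(frameTime: Double): Worker {
    // path to the bundle produced by Kotlin/JS (illustrative)
    val worker = Worker("kjs/worker.js")
    worker.onmessage = { event: MessageEvent ->
        val data = event.data.asDynamic()
        // replace with real display code; `time` mirrors the field set by the worker above
        console.log("frame rendered in", data.time, "ns")
    }
    val message = Any().asDynamic()
    message.type = "frame"
    message.time = frameTime
    worker.postMessage(message)
    return worker
}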

Workers and WebAssembly

WebAssembly can't use Workers directly, at least not the classic version without GC support. The reason is that WebAssembly can't pass JavaScript objects; it's only capable of passing and receiving numbers. So, all we can do is write all the actual Worker code in JavaScript and communicate with the WebAssembly module to call the actual renderer.

Another difficulty is that WebAssembly requires some time to load, and the main script may send several messages in the meantime. This problem can be solved in two ways:

  1. Don't send messages from the main script immediately; wait for "I'm ready" messages from the worker first.

  2. Store all incoming messages in a buffer; then, when the WebAssembly module is loaded, forward them to the module instance.

Our renderer uses the second approach.

First, we load the WebAssembly adapter generated by TeaVM, define the instance, and message buffer:

importScripts("wasm/software3d.wasm-runtime.js");
let instance = null;
let pendingInstanceFunctions = [];

Second, define the message event listener:

addEventListener("message", e => {
    let data = e.data
    switch (data.type) {
        case "frame":
            pendingInstanceFunctions.push(() => {
                instance.exports.renderFrame(data.time);
            });
            runPendingFunctions();
            break;
        // handle other event types
    }
});

Third, load the WebAssembly module:

TeaVM.wasm.load("wasm/software3d.wasm", {
    installImports(o, controller) {
        o.renderer = {
            result(data, size, time) {
                let buffer = controller.instance.exports.memory.buffer.slice(data, data + size);
                self.postMessage({ data: buffer, time: time });
            }
        }
    },
}).then(teavm => {
    teavm.main([]);
    instance = teavm.instance;
    runPendingFunctions();
});

The load function takes an options object as a second argument, where the installImports option is a function that provides implementations of modules imported by the WebAssembly module. Our WebAssembly module imports the renderer module that contains a single result function. Note that this notation:

controller.instance.exports.memory.buffer

provides direct access to the WebAssembly memory. The WebAssembly module passes the address of an int array, containing the rendered frame, to the data parameter, so this notation simply gives direct access to the representation of the Java array data.

Finally, runPendingFunctions is declared as follows:

function runPendingFunctions() {
    if (instance === null) {
        return;
    }
    for (let f of pendingInstanceFunctions) {
        f();
    }
    pendingInstanceFunctions = [];
}

On the WebAssembly side, this looks like:

@Export(name = "renderFrame")
fun renderFrame(time: Double) {
    val perfStart = System.nanoTime()
    updater(time)
    renderer.render()
    val perfEnd = System.nanoTime()
    val buffer = raster.color
    sendRenderResult(Address.ofData(buffer), buffer.size * 4, (perfEnd - perfStart).toInt())
}

@Import(module = "renderer", name = "result")
external fun sendRenderResult(data: Address, dataSize: Int, time: Int)

Here, Address.ofData is a TeaVM intrinsic (available only in the WebAssembly and C backends) that takes the physical address of an array. Note that in the general case, the GC can move objects, and their addresses change; in our case, however, we pass the address directly to the external function, so we have a guarantee that no other Java or Kotlin code is called and no objects are allocated in the heap in between, which means that GC is not a problem. In more complex cases, the developer should take this into account and use special techniques to prevent objects from moving.

A bit more about our worker architecture

In our renderer, we utilize multiple workers to enhance performance. Each worker is responsible for rendering the entire scene, but it processes only a portion of the image that is N times smaller vertically, where N is the number of workers, and each worker renders its lines with a distinct vertical pixel offset.

Specifically, the rendering breakdown is as follows: the first worker handles lines 0, N, 2N, 3N, and so forth; the second worker tackles lines 1, N + 1, 2N + 1, 3N + 1, and continues in this pattern.

This approach ensures an even distribution of workload among the workers, preventing discrepancies based on the amount of geometry in different parts of the screen. The decision to split by lines, rather than columns, was intentional. It takes advantage of the fact that images are typically stored line-by-line, making the process of copying entire lines more efficient than copying columns.
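
As a rough sketch of the interleaving (the names here are illustrative, not the actual renderer API):

// Worker `index` out of `workerCount` renders lines index, index + workerCount,
// index + 2 * workerCount, ... into its own compact buffer.
fun renderSlice(
    index: Int, workerCount: Int,
    width: Int, height: Int,
    buffer: IntArray,
    shade: (x: Int, y: Int) -> Int
) {
    var row = 0
    for (y in index until height step workerCount) {
        for (x in 0 until width) {
            buffer[row * width + x] = shade(x, y)
        }
        row++
    }
}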

Benchmark

First, I should warn you that I don't have much expertise in benchmarking, so my methodology might not be perfect.

To perform measurements, I used the following hardware configuration:

  • AMD Ryzen 7 3700X 8-Core CPU

  • B550 AORUS ELITE V2 Motherboard

  • 32GiB RAM

and the following software configuration:

  • Fedora Linux 39, kernel 6.6.4

  • Google Chrome 120.0.6099.71

I ran each test for 80 seconds and took the average time, per worker, spent rendering a frame. I repeated each test 3 times and present here the average score (lower is better). Additionally, I measured the average time spent rendering a frame overall, which includes communication between the worker and the main script.

In the following table, time is in microseconds.

            TeaVM    Kotlin/JS    TeaVM (WebAssembly)
Worker      3027     2662         4491
Relative    1.137    1            1.687
Total       4814     4173         8380

I was a little bit disappointed to see these results, because here TeaVM performs about 14% slower than Kotlin/JS. AFAIK, Kotlin/JS does its work rather straightforwardly (at least it did in the old times when I worked on the Kotlin team at JetBrains), whereas TeaVM does a lot of work to optimize the generated code. Also, I examined the code generated by both compilers and could not see any difference in the hot method, so these results are a mystery to me.

Anyway, it's not a big problem. First, now that I have these measurements, I have a chance to polish TeaVM to achieve better results. Second, it's only one particular task, and this does not mean that TeaVM is slower than Kotlin/JS in all tasks. Third, TeaVM's performance is at least comparable to Kotlin/JS. And since TeaVM has a killer feature, namely support for other JVM languages, it still remains a good alternative to Kotlin/JS.

Another surprise is how slow WebAssembly is. I thought it was something about TeaVM: TeaVM adds some overhead to maintain a shadow stack for GC and exception handling, as well as null pointer checks. I tried simply turning them off, but this did not increase performance dramatically. So I simply don't have an explanation. Perhaps the JS JIT can generate SIMD instructions, while the WebAssembly engine does not perform such optimizations, relying on the WebAssembly-targeting compiler to produce SIMD. Anyway, I can only speculate here; I don't have enough information about the internals of JS and WebAssembly runtimes.

I was also a bit curious and measured performance with different numbers of workers (only for TeaVM). Here are the results:

Workers    Score    Performance ratio
1          13920    1
2          10202    1.364
4          5952     2.339
8          4072     3.418
12         3748     3.714
15         3239     4.298
16         3027     4.599

This also looks a bit strange. I expected that increasing the number of cores by N would not necessarily increase performance by N, but I did not expect actual performance to grow so slowly with the number of cores. I did my best to ensure that workers operate in their own memory areas, so that cores don't waste time synchronizing their caches. What else could be the reason? Perhaps, when only one core works, the CPU boosts its clock frequency. Can the JS engine add its own overhead? If you have some idea, please let me know.

Lessons learned

The first lesson I already mentioned: TeaVM can be slower in tasks where I thought it would be faster. This means I have to look through the generated code and profiling results to find out what makes the difference and fine-tune the TeaVM optimizer.

Another lesson is that I need to work on the TeaVM optimizer a bit more. Frankly speaking, my first measurements were even more disappointing, but I quickly found the reason and added some optimizations to the Kotlin code. Here are my observations.

In Kotlin, there are properties, but Kotlin/JVM transforms them into getter/setter pairs (there are no properties in the JVM). TeaVM does not know anything about this and produces the same getter/setter pairs in the output JS. Kotlin/JS, by contrast, maps Kotlin properties to JS properties, since there is no such intermediate step. Inlining could solve this, but it turned out that the hot method (Rasterizer.drawTrianglePart) is too big, and TeaVM's inliner does not optimize it. I need to fine-tune the inliner heuristics to fix this, but first, I need to refactor the inliner to be less greedy and rely more on various heuristics to estimate whether inlining a particular method would be profitable.

To fix this issue, I just added the @JvmField annotation to some properties.
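
For example (a simplified illustration, not the actual class from the project):

class Raster(val width: Int, val height: Int) {
    // Without @JvmField, the property compiles to a getColor() getter, which TeaVM then
    // reproduces in the generated JS; with @JvmField, accesses become plain field reads.
    @JvmField
    val color = IntArray(width * height)
}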

The next optimization is related to the for .. in loop. Kotlin/JVM does not optimize the following code:

for (i in start until end) {
  // body
}

Kotlin/JS, on the other hand, recognizes this special case and produces a plain for loop. Kotlin/JVM instead produces code that instantiates an IntRange, asks it to create an Iterator, and then iterates over that Iterator. So, it's not equivalent to Java's

for (var i = start; i < end; ++i) {
  // body
}

but closer to the following:

for (var i : new IntRange(start, end - 1)) {
    // body
}

For the JVM, it's not always a problem, since it can optimize this, and TeaVM's optimizer also has everything necessary to produce efficient code here. However, its inliner can't be too aggressive, since the size of the generated JS matters, and that's why it fails to optimize loops in large methods like Rasterizer.drawTrianglePart. I worked around this by rewriting the for .. in loop as a while loop. However, this also means that I have to put more effort into improving the inliner heuristics.
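
The rewrite looks roughly like this:

// Before: as compiled for the JVM, this allocates an IntRange and iterates over it,
// which TeaVM fails to inline away inside a large method.
for (i in start until end) {
  // body
}

// After: a plain counter loop with no intermediate objects.
var i = start
while (i < end) {
  // body
  i++
}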

And finally, the overall development experience with TeaVM was comparable to the one with Kotlin/JS: interoperability with JS is not 100% smooth in both cases, but still very good and much more convenient than JNI.

Conclusion

One might think that TeaVM is a toy project that can only compile these tiny examples. Actually, TeaVM has been around for 10 years and is production-ready. There are examples of relatively large code bases compiled from Java to JavaScript with TeaVM, such as the Retro Commander RTS and many others.

Why would a developer prefer TeaVM over Kotlin/JS? The answer is simple: support for multiple languages. Java, Kotlin, and Scala have their own solutions for compiling to JavaScript, but each one is limited to one particular language. If you have a library written in Java and want to use it from Kotlin or Scala, the only solution is TeaVM. One example of such a library is libGDX, which is written in Java and, therefore, can only be compiled into JavaScript with TeaVM.

Note that I wrote this article with the help of ChatGPT 3.5. You may have noticed that I am neither a good technical writer (but rather an engineer) nor a native English speaker. I did not ask ChatGPT for much creativity; instead, I simply requested it to perform some spellchecking, punctuation correction, and a slight improvement in the style.