Custom Scripting Languages are almost always the wrong choice

Introduction

It is very common for people already engaged in a large and demanding programming project to wish for, design, and ship a custom scripting language. Let’s talk about why, and explore some options for a less stressful way to accomplish the same purpose.

“Guest Languages” and why people seek them out

A guest language is a language that is hosted inside a program written in a different language. So if a C++ application executes Python code as part of its runtime, then C++ is the host language and Python is the guest language. Most often guest languages are used to let people author code after the primary application has shipped. Plugins, mods, custom content, user-generated content: any of these might inspire you to bring in a guest language. Popular options for guest languages include Lua, JavaScript, Python, C#, and Java. Generally these languages are chosen to fulfill two end goals:

Ease of use. It should be easy to learn even for a programming novice.
Fast deployment. If the code is compiled at all, compilation should be very quick.

Many developers don’t realize that you can host these languages inside of your application. So they reinvent the wheel of guest languages, and unintentionally slip the development and maintenance of a new programming language into the scope of their project.
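To make the host/guest boundary concrete, here is a minimal sketch. It rests on a few assumptions: Python plays both host and guest purely to keep the example self-contained (a real project would more likely embed something like Lua or Python inside a C++ host), and the names run_guest_script, spawn, and MOD_SCRIPT are hypothetical. Emptying __builtins__ only keeps the guest namespace tidy; it is not a security sandbox.

```python
def run_guest_script(source: str, host_api: dict) -> dict:
    """Run guest code that can see only the names in host_api (hypothetical helper)."""
    # Assumption: an empty __builtins__ is for tidiness only, NOT a security boundary.
    namespace = {"__builtins__": {}, **host_api}
    exec(compile(source, "<guest script>", "exec"), namespace)
    return namespace


# Host side: state owned by the application, plus the one function it exposes.
spawned = []


def spawn(kind: str, x: int, y: int) -> None:
    """Host API call that guest scripts are allowed to use."""
    spawned.append((kind, x, y))


# Guest side: text that could be written by a modder after the application ships.
MOD_SCRIPT = """
for i in (0, 1, 2):
    spawn("goblin", i * 10, 5)
"""

run_guest_script(MOD_SCRIPT, {"spawn": spawn})
print(spawned)
```

The point of the sketch is the shape of the arrangement: the host decides which functions exist, and the guest script is just data until the host chooses to run it.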

What it takes to make a programming language

Creating a new language is a serious undertaking. Very few people are prepared for the overwhelming amount of labor it takes. I’ll give a brief and simplified overview. First you’ll need to define a model for your language. This is the mental model shared by the compiler and the programmer. What programming paradigms is your language going to employ? Most languages employ several, but have a few preferred paradigms. What jargon do you intend to use for describing these concepts? Can you define each of these concepts comprehensively?

Once you have that, you’ll need a syntax to describe this model. Then you’ll need an interpreter for your language. Its parser will read text files and turn them into data structures representing your syntax tree. Then you can convert the syntax tree into the data structures that represent your language model, and finally you have everything you need to start running your scripting language.
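For a sense of scale, here is a rough sketch of that pipeline for about the smallest "language" imaginable: arithmetic with parentheses. Everything in it (the Num and BinOp node types, tokenize, parse, evaluate) is a hypothetical name chosen for illustration, and it evaluates the syntax tree directly rather than converting it into a separate language model first.

```python
import operator
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]
TOKEN = re.compile(r"\d+(?:\.\d+)?|[-+*/()]")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(text: str) -> list[str]:
    """Step 1: split source text into number, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse(tokens: list[str]) -> Node:
    """Step 2: recursive descent; builds a syntax tree with the usual precedence."""
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> Optional[str]:
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr() -> Node:  # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = BinOp(op, node, term())
        return node

    def term() -> Node:  # term := atom (('*' | '/') atom)*
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = BinOp(op, node, atom())
        return node

    def atom() -> Node:  # atom := NUMBER | '(' expr ')'
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"expected a number, got {token!r}")
        return Num(float(token))

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node: Node) -> float:
    """Step 3: walk the tree and compute a value."""
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))


print(evaluate(parse(tokenize("2 * (3 + 4) - 1"))))  # prints 13.0
```

That is roughly eighty lines for a language with no variables, no functions, no types, and error messages that barely say where the problem is.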

When everything goes correctly.

The problem is that even the best developers do not write perfect code every time. Something will go wrong eventually. Now you need an error model. You’ll need to be able to detect errors in the syntax, as well as invalid uses of the language model. Are your error descriptions helpful? Will the developer know how to fix the code once they read the errors? Most developers will also want warnings for suspicious code that can still technically run. What qualifies as suspicious code? What sorts of mistakes are people likely to make when developing with your language?
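As an illustration of how much work hides behind "helpful" errors, here is a sketch of a checker for just one class of mistake, mismatched brackets. The function name check_brackets, the message wording, and the sample source are all assumptions made for the example. It also ignores brackets inside string literals and does only crude error recovery, both of which a real language would have to handle.

```python
def check_brackets(source: str) -> list[str]:
    """Return readable diagnostics for mismatched brackets (hypothetical helper)."""
    opener_for = {")": "(", "]": "[", "}": "{"}
    closer_for = {opener: closer for closer, opener in opener_for.items()}
    lines = source.splitlines()
    open_stack = []   # (bracket, line_no, col_no) for every opener not yet closed
    diagnostics = []

    def report(line_no: int, col_no: int, message: str, hint: str) -> None:
        # Show the offending line with a caret under the exact column.
        caret = " " * (col_no - 1) + "^"
        diagnostics.append(
            f"line {line_no}, column {col_no}: {message}\n"
            f"  {lines[line_no - 1]}\n"
            f"  {caret}\n"
            f"  hint: {hint}"
        )

    for line_no, line in enumerate(lines, start=1):
        for col_no, char in enumerate(line, start=1):
            if char in closer_for:
                open_stack.append((char, line_no, col_no))
            elif char in opener_for:
                if not open_stack:
                    report(line_no, col_no, f"unexpected {char!r}",
                           "remove it, or add the matching opener earlier")
                    continue
                opener, open_line, open_col = open_stack.pop()
                if opener != opener_for[char]:
                    # Crude recovery: treat the closer as a typo for the right one.
                    report(line_no, col_no, f"unexpected {char!r}",
                           f"{opener!r} opened at line {open_line}, column "
                           f"{open_col} expects {closer_for[opener]!r}")

    for opener, open_line, open_col in open_stack:
        report(open_line, open_col, f"{opener!r} is never closed",
               f"add {closer_for[opener]!r} where the expression ends")
    return diagnostics


SOURCE = "items = [1, 2, 3)\nprint(items"
for diagnostic in check_brackets(SOURCE):
    print(diagnostic)
```

Even this narrow check needs position tracking, a recovery strategy, and hand-written hints. Every other kind of mistake your users can make needs the same treatment.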

What’s your plan for debugging tooling? How is your interpreter going to pause execution and show the current state of the program? You’ll want a way to step through the code. What should the front end experience of debugging look like? Where exactly should debugging occur? What about auto-formatting? Developers typically want a way to clean up their whitespace, if nothing else. You’ll also need to optimize your interpreter eventually. The interpreters everyone has gotten used to have had many years of optimization work put into them.
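One possible way to make an interpreter pausable is to write its main loop so that it hands control back after every instruction. The sketch below assumes a made-up three-instruction language (set, add, mul) and a hypothetical run_stepwise function; it covers only stepping and state inspection, not breakpoints, call stacks, or an editor front end.

```python
from typing import Iterator

# Assumption: an instruction is (operation, variable name, integer operand).
Instruction = tuple[str, str, int]


def run_stepwise(
    program: list[Instruction],
) -> Iterator[tuple[int, Instruction, dict[str, int]]]:
    """Execute one instruction at a time, pausing after each so a caller can inspect state."""
    variables: dict[str, int] = {}
    for index, (op, name, operand) in enumerate(program):
        if op == "set":
            variables[name] = operand
        elif op == "add":
            variables[name] += operand
        elif op == "mul":
            variables[name] *= operand
        else:
            raise ValueError(f"unknown instruction {op!r} at index {index}")
        # Pause point: yield a copy of the state to whoever is driving execution.
        yield index, (op, name, operand), dict(variables)


PROGRAM = [("set", "hp", 10), ("add", "hp", 5), ("mul", "hp", 2)]

# A stand-in for a debugger front end: step through and record each state.
history = []
for index, instruction, state in run_stepwise(PROGRAM):
    history.append(state)
    print(f"step {index}: {instruction} -> {state}")
```

A caller could just as well call next() on the generator in response to a "step" button, which is the easy part. Mapping each step back to a line of source text and presenting it in a usable interface is where the real effort goes.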

No one gets all of this right on their first try either. Great languages are made over the course of years, decades even. All of this work has to be completed, evaluated, redesigned, and completed again many times over a very long time. Then after a few years of having at least a couple dozen people doing this you might have a decent language. Maybe.

I’m not saying you should never make a scripting language. I’m saying that if you do, make sure you have the time and people to spare.

The “reward” you get for successfully shipping your custom language

In order to streamline and speed up development, the team agrees to design and implement a custom scripting language. The team just wants a small scripting language. Nothing too complicated. Then the reality of designing a programming language slowly dawns on the team over the course of many months, maybe even years. The new language is leeching valuable time from the project. You’re spending time developing tooling, educating people on how to use the language, fixing bugs in the language implementation, and optimizing its performance, and before you know it everyone is crunching to get stuff done on time. The stress of crunch further compromises your work: you’re tackling complicated problems with far too little time to address them properly. “Hacks” and “quick fixes” abound, and no one has time to “do it right” anymore. You successfully ship your project in spite of the problems. Now you, your team, and your users get to endure the consequences of this crunching for however long your code remains in production.

So what should we do instead?

There are many great options for guest languages that are ready to be integrated into your next project. I listed a few of them earlier in this article. Consider your needs and your use case, and evaluate these languages against them. If you have strict security needs, consider the utilities that exist to restrict and control each of these guest languages. Only if, after searching exhaustively, you can’t find any language that meets your needs should you consider implementing a custom language. Hopefully by now you understand the significance of that decision, and your team can decide for itself if it’s ready to take on this workload.

Written on December 29, 2023