For some years there has been a trend towards using multiple general-purpose languages in a company, a department or even a single project. A typical scenario: the main parts are written in Java. Then there is a little embedded Groovy, the web app is JRuby on Rails and there are some Python scripts for configuration. If databases and an O/R mapper are involved, there will be some JPA-QL and probably some SQL. And DDL scripts per target database, of course.
Is there a difference between actual programs (in Java, for example) and configuration? Except for expressiveness, there is none. A Spring XML application context defines its own language using the basic syntax of XML. The same is true for Maven POMs, properties files or the JAR MANIFEST.MF. Don’t be surprised if you find Groovy scripts defined in a properties file. This can get really, really ugly.
Each language should fulfill strict requirements regarding correctness and clean semantics. Anyone who is a little into language design knows that only a few languages (like ML) have a precise language specification. Java doesn’t, but for all practical purposes its semantics are sufficiently well documented in the JLS. The pressure to innovate in dynamic languages and their inherent complexity make me doubt the reliability of these languages. The evolution of PHP and Perl is a prime example: poor language design and semantics broke quite a few applications and probably caused expensive repair projects.
All the hidden languages, such as build tool definitions, annotation-based libraries (JUnit, Hibernate, EJB, WebServices, JAXB…) or configuration files, must be treated with great care. They are equally important when it comes to the maintainability and extensibility of a software system. The Hibernate O/R mapper is known for its lack of checking of annotation-based mappings, causing problems that range from mappings simply being ignored to obscure side effects. And every version promises new defects, as input validation is largely done in a procedural, pseudo-structured way.
You’ve probably noticed by now that I am not a fan of polyglot programming. Most developers are not even aware of how many languages they are really using and which risks that imposes. Writing a language translator or interpreter is a well-understood but still complex task, even when you know what you are doing. The hidden languages are mostly declarative, and declarative programming was tagged with the “good” attribute a few decades ago. But creating a correct declarative language that can be used creatively is hard. Languages must be robust and resilient or they will be of no benefit in the long run.
Luckily there is a way out of Babylon. Java is powerful yet simple enough to be used as a tool for a lot of the tasks for which you would intuitively create a special-purpose language (or representation). Google’s Guice has shown how effective an embedded domain-specific language in Java is. Objects configuring objects. For some time now, when I think of data, I think of passivated objects of a domain model first, because only the most trivial data has any meaning without some notion of behavior. An example: a poor intern was once asked to prepare a letter with a special offer to all customers. What he didn’t know was that the “customers” table was used for prospects, too. It needed to be inner-joined with the “orders” table to see if a customer was actually a customer (this was an undocumented concept). Instead of a few thousand customers, the audience had now grown to about 40000 recipients. If the print service provider hadn’t called back because he got suspicious, this would have been quite an expensive advertising campaign.
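The intern's problem can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class names `Contact` and `Mailing` and the `orderCount` field are hypothetical stand-ins for whatever the real schema looked like. The point is that the rule "a customer is someone who has ordered" lives in the model instead of in somebody's head.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical domain object for one row of the shared "customers" table. */
final class Contact {
    private final String name;
    private final int orderCount;

    Contact(String name, int orderCount) {
        if (orderCount < 0) {
            throw new IllegalArgumentException("orderCount must not be negative");
        }
        this.name = name;
        this.orderCount = orderCount;
    }

    String name() {
        return name;
    }

    /** The formerly undocumented rule: a contact without orders is only a prospect. */
    boolean isCustomer() {
        return orderCount > 0;
    }
}

/** Hypothetical mailing helper that asks the model instead of reading the raw table. */
final class Mailing {
    static List<Contact> recipientsForCustomerOffer(List<Contact> contacts) {
        List<Contact> recipients = new ArrayList<>();
        for (Contact contact : contacts) {
            if (contact.isCustomer()) {
                recipients.add(contact);
            }
        }
        return recipients;
    }
}
```

Anyone who builds the mailing list through `recipientsForCustomerOffer` cannot accidentally include the prospects, because the distinction is part of the type's behavior.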
A domain model captures the knowledge about the domain. It directly specifies concepts like “There can be any number of data sources, but they all have to have unique names” or “A VirtualHost of a WebServer must have at least one domain name assigned if it is not the default host.” A lot of difficulties arise from tools having their own concrete language that is much harder to learn than the abstraction the tool provides.
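Both rules quoted above can be written down directly as constructor and method checks. The following is a minimal sketch, not taken from any real product: `DataSource`, `DataSourceRegistry` and `VirtualHost` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data source that is identified by its name. */
final class DataSource {
    final String name;

    DataSource(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("A data source needs a name");
        }
        this.name = name;
    }
}

/** Any number of data sources, but every name may be used only once. */
final class DataSourceRegistry {
    private final Map<String, DataSource> byName = new LinkedHashMap<>();

    DataSourceRegistry add(DataSource dataSource) {
        if (byName.containsKey(dataSource.name)) {
            throw new IllegalArgumentException("Duplicate data source name: " + dataSource.name);
        }
        byName.put(dataSource.name, dataSource);
        return this;
    }

    int size() {
        return byName.size();
    }
}

/** A virtual host needs at least one domain name unless it is the default host. */
final class VirtualHost {
    final boolean defaultHost;
    final List<String> domainNames;

    VirtualHost(boolean defaultHost, List<String> domainNames) {
        if (!defaultHost && domainNames.isEmpty()) {
            throw new IllegalArgumentException("A non-default host needs at least one domain name");
        }
        this.defaultHost = defaultHost;
        // Defensive copy so the invariant cannot be broken after construction.
        this.domainNames = Collections.unmodifiableList(new ArrayList<>(domainNames));
    }
}
```

An invalid configuration cannot even be constructed here, so the error shows up where the mistake is made and not somewhere deep inside the tool at startup.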
The Apache web server is my favorite example of how a simple concept is made really complicated. The interpreter for the httpd language does the same tasks the Java compiler does, except that it is less powerful and a pain to use for all but a few users (and even for developers: Apple’s Lion Server product still likes to scramble the configuration from time to time…). The format looks a bit like XML, but it is a homebrew-something. How much easier would a Java-based embedded DSL be, a fluent language where the user can learn about the configuration options by using standard code completion and get feedback on syntax errors by background compilation? A new configuration could be unit- and acceptance-tested against a mock implementation and easily versioned, released and deployed. The same idea applies to build files (be it Maven, Ant or whatever), database scripts and almost everything else. Additionally, using a Turing-complete language for configuration makes automation a breeze. Java code is also refactoring-safe, which is a big pro compared to dumb configuration formats. I don’t know how many tools for creating configuration files I have seen or been asked to create in my life! But in the end it’s always the same story: create something like that, but with this and these changed.
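To make the idea concrete, here is a rough sketch of what such a fluent configuration language could look like. Nothing in it is an existing httpd or Java API: `ServerConfig`, `HostEntry` and the methods `webServer()`, `listenOn()`, `defaultHost()` and `virtualHost()` are names I made up for illustration, and the default port of 80 is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable description of one host. */
final class HostEntry {
    final String documentRoot;
    final List<String> domainNames; // empty only for the default host

    HostEntry(String documentRoot, List<String> domainNames) {
        this.documentRoot = documentRoot;
        this.domainNames = Collections.unmodifiableList(new ArrayList<>(domainNames));
    }
}

/** Hypothetical immutable server configuration created through a fluent builder. */
final class ServerConfig {
    final int port;
    final List<HostEntry> hosts;

    private ServerConfig(int port, List<HostEntry> hosts) {
        this.port = port;
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    /** Entry point of the embedded DSL. */
    static Builder webServer() {
        return new Builder();
    }

    static final class Builder {
        private int port = 80; // assumed default
        private final List<HostEntry> hosts = new ArrayList<>();

        Builder listenOn(int port) {
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Invalid port: " + port);
            }
            this.port = port;
            return this;
        }

        Builder defaultHost(String documentRoot) {
            hosts.add(new HostEntry(documentRoot, Collections.<String>emptyList()));
            return this;
        }

        Builder virtualHost(String documentRoot, String... domainNames) {
            if (domainNames.length == 0) {
                throw new IllegalArgumentException("A virtual host needs at least one domain name");
            }
            hosts.add(new HostEntry(documentRoot, Arrays.asList(domainNames)));
            return this;
        }

        ServerConfig build() {
            return new ServerConfig(port, hosts);
        }
    }
}

/** Usage: the configuration itself is plain, compilable, refactorable Java. */
final class ServerConfigExample {
    static ServerConfig example() {
        return ServerConfig.webServer()
                .listenOn(8080)
                .defaultHost("/srv/www/default")
                .virtualHost("/srv/www/example", "example.org", "www.example.org")
                .build();
    }
}
```

Code completion lists the available options after every dot, a misspelled option is a compile error, and a unit test can assert on the resulting `ServerConfig` before anything is deployed.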
The missing link for using Java as a kind of lingua franca in areas other than application code is code generation. Each domain model needs multiple representations of itself. There are mutable and immutable representations, anemic representations (to bind to UI components, for example), builders for creating complex instances in code using an embedded DSL, data-only representations as an internal model for data storage and wire format, and so on.
While bytecode enhancement has been in use for about ten years now, my current understanding is that the real power of code generation comes from evaluating and manipulating the abstract syntax tree of a general-purpose language itself. Experience has led me to the principle that code generation, weaving or whatever you’d like to call it should be a separate and explicit step in the build process, and that the result should be code that could have been written by hand. That makes bytecode weaving a non-option in nearly all cases because it is hard, if not impossible, to test and analyze and does not lend itself to recursive application. Add an aspect to an aspect? The concept is not in balance. What one really wants is composition and a separation of sub-typing and sub-classing.
So if you find yourself inventing a new XML-based concrete language representation or tinkering with the idea of adding a dynamic language to the project, ask yourself whether the same could be achieved in your primary general-purpose language using the builder pattern and an embedded DSL. I bet you will benefit even before going into production, just from the gain in simplicity when refactoring and adding new features.