July 2007 Archives
My current project involves a lot of transforming and migrating database data, so some kind of ETL framework is called for. I spent some time researching the various Java ETL offerings and surprisingly found there is not a whole lot out there.
Fortunately, I did find one which is meeting my needs in a very elegant way: Scriptella
Their website says
"Our primary focus is simplicity. You don't have to study yet another complex XML-based language"
I second that emotion; the last thing I need to do is deal with another misguided attempt to stuff procedural logic into XML.
At first glance, though, Scriptella looks like nothing if not "yet another complex XML-based language:"
<etl>
<connection driver="$driver" url="$url" user="$user"
password="$password"/>
<script>
<include href="PATH_TO_YOUR_SCRIPT.sql"/>
-- And/or directly insert SQL statements here
</script>
</etl>
Do not be fooled. That is just a shell, and it's about all of the XML you will have to write.
The Rules
For the impatient, here is a sketch of a spec for how you write a Scriptella script:
- You use XML to build a simple skeleton for nesting chunks of declarative languages (typically SQL but not always)
- Outer chunks are responsible for generating a result set - (a list of hashmaps, essentially)
- Inner chunks are processed iteratively for each row in the result set generated by its enclosing chunk
- Inner chunks do something useful with each row (such as perform an INSERT). They can refer to members of the current result set row by name.
- Chunks can be nested to any depth (i.e., inner chunks can also be outer chunks)
A Simple Example
That may sound a little confusing, but it's actually quite elegant and powerful. A simple example goes a long way.
Say you have an Employee table in Database A that you want to copy into Database B, row by row. In scriptella, it would look something like this:
...
<query connection-id='DatabaseA'>
SELECT ID, FIRST, LAST FROM EMP
<script connection-id='DatabaseB'>
INSERT INTO EMPCOPY VALUES (?ID, ?FIRST, ?LAST)
</script>
</query>
When processed, this Scriptella script would select each row out of EMP in Database A and insert them into EMPCOPY in Database B. The nested 'script' element gets processed once for each row, and the '?' parameters get filled in with values from each row.
It's as if you wrote Java code that opened two JDBC connections, executed a query on one, iterated through the result set, and dumped each row into a prepared INSERT statement opened on the other connection.
At this point, you can knock yourself out - create tables from scratch, write arbitrarily complex SELECT statements, create and drop tables as needed, whatever. You can also use this as a way to write queries that generate queries - a more dialect-neutral alternative to the various flavors of dynamic SQL (i.e., queries that generate queries). You have the full power of SQL at your disposal and Scriptella won't get in the way unless you ask it to.
Beyond SQL
Things start to get really interesting when you consider that Scriptella is designed to accommodate any language, not just SQL. For example, something like this can be used to dump data from a table to System.out:
...
<query connection-id='DatabaseA'>
SELECT ID, FIRST, LAST FROM EMP
<script connection-id='janino-java-connection'>
String employee = get("FIRST");
System.out.print("Found an employee named "+employee);
</script>
</query>
Any language that you have a JSR-223 driver for can be used. You can have scripts that generate data for processing by enclosed XML chunks as well as processing data from enclosing chunks
Wrapping Up
There are a lot more (and better) examples on the Scriptella website. The short of it, though is that Scriptella gives you a simple, elegant, declarative mechanism for:
- Moving data between databases
- Writing dynamic queries
- Sprinkling in procedural logic when you need it
Most importantly, it does all of this with as little intrusion as possible. Most of the time, I want to do the heavy lifting with SQL, and Scriptella doesn't get in the way of that. But when I need to add some procedural logic or do something trickier, Scriptella is there.
