Monty Williams on MagLev
From RubyConf 2008 in Orlando.
Interviewed by Geoffrey Grosenbach
Geoffrey Grosenbach: Geoffrey Grosenbach at RubyConf 2008, Orlando, Florida, talking to Monty Williams, one of the founders of GemStone. I thought one of the most exciting things to come out of RailsConf a couple of months ago was the story of MagLev, where GemStone is building a Ruby interpreter on top of their existing Smalltalk technology.
You were telling me last night about how that all started and you actually didn’t get the approval to work on the project until later than you hoped. How did the whole thing start?
Monty Williams: So the way it started is that a couple of years ago I was talking with Avi Bryant, who gave a talk at RailsConf the year before on a heresy: that Ruby would be better running on top of a Smalltalk virtual machine. He came in and gave a talk about “I’m from 25 years in the future, and you guys in Ruby are doing this, but a lot of that was already invented with Smalltalk.”
I had worked with Avi previously on GemStone’s implementation of Seaside, which is a web framework developed in Smalltalk. So, I got together with Avi and we hatched this plan to do what he proposed, which was to run Ruby on top of our Smalltalk virtual machine.
We started talking about that after RailsConf 2007, basically, in October of 2007. I thought this was a great idea: let’s try and get something done by RailsConf in six months and see what we can do. Well, it didn’t quite happen that way. It wasn’t until Valentine’s Day in 2008 that we actually started on the project, which gave us 100 days to get what we wanted implemented in MagLev. We decided the test would be: could we get WEBrick running on MagLev in 100 days?
We were also going, “Well, gee, we have to sign up for RailsConf and we have to make this commitment. What if we can’t get this done in 100 days? We’ll look pretty stupid.” To work around that, we said: let’s first off get the YARV benchmarks that were shown on Antonio Cangiano’s website, a fairly popular place to hit when you’re looking at Ruby benchmarks. We said, if we can get, let’s say, 80 percent of these 40 or so benchmarks running, then when we get to RailsConf it’s at least something we can show, if nothing else.
So, Avi is a pretty sharp Ruby programmer and Smalltalk programmer. It took him, maybe, a couple of weeks to get enough of the benchmarks running that we thought we could have something to show if nothing else occurred. Then, we started on WEBrick which was a fairly tough task, but you don’t have to have as much of Ruby implemented to run WEBrick as one would think.
With a lot of hard work and nights and weekends, between Avi working on part of it and the team from GemStone working on the other part, we got WEBrick running with relatively minor changes, probably about two weeks before RailsConf. Then, we had two weeks left over, and we took Tim Bray’s XML parser written in Ruby, called RX. He never followed up on it because it’s so much slower than REXML, slower than Hpricot, that nobody would ever want to use it.
We thought, let’s see if we can get that running and get some performance measurements, because that’s more like a real app than WEBrick is: WEBrick spends most of its time sitting, listening, waiting on a socket. So, we imported Tim Bray’s RX application, which turned out to be roughly five times faster than it was in MRI.
There we were at RubyConf. A hundred days in from the very first start until we got to... I mean, not RubyConf. We were at RailsConf, end of May. That was the story of the first 100 days.
Geoffrey: It’s gone on from there. The only graphic they showed at the lecture yesterday was that originally after those hundred days, they were passing, I think, a total of 15,000 specs; no 1500.
Monty: Let me tell you. I actually have that. I can give you real numbers. Let me just open up a file here real quickly. So, the story was that after RailsConf it turned out we had a lot of infrastructure work to do in terms of getting AST nodes that weren’t implemented, implemented, plus parser work and compiler work. A bunch of stuff that was actually not much involved with coding Ruby but was involved in getting stuff to run the way we want it.
We didn’t actually start working on trying to run the Ruby specs right away. That is the project that Brian Ford from the Rubinius team has put together: a large number of specs where, if you pass them all, you run Ruby. We knew that MRI, for example, passed a huge number of those tests, fairly close to 16,000 tests in the core and language parts of the Ruby specs. There is also a third part of the Ruby specs, for the standard library.
We decided that the best way to get us running and say that we’re really running Ruby is to pass those specs. We started in September, and we passed about 1800 of those specs out of the nearly 16,000. Then, it turned out to be a little bit of vacation time. We didn’t do any new releases until the end of September.
By the end of September we were running about 4100. Every week we run a few hundred more, so as of the third of November we’re running about 7500 tests. When I say running, I mean ones that actually pass. We run about 12,500, and about 3400 of those don’t have the proper result, which could be an error. It could be that we just don’t produce the right string and stuff. We’re close on those, and about 1700 we actually fail on…
Our goal is to get, as rapidly as possible, to running as many of the roughly 15,000 to 16,000 of those specs as we can. And when we get a higher number done in the core and language parts, we will start running the standard library tests. That brings in mainly more Ruby code: already existing Ruby code in the standard library that we should run unchanged.
So, we think when we get through Core and Language, we should run most of the standard library tests and that brings the total up to about 33,000.
Geoffrey: Impressive. It’s great to see a project like that, like the Ruby specs, where it wasn’t really endorsed by Matz, but he said “Hey, I don’t want to do it. But, if you want to do it, go for it.” Now, we have that. Now the different implementations can use that. And it’s a really nice benchmark to gauge your progress, and then actually be able to see if you have implemented what Ruby is.
Monty: Yeah, and it is. I mean you look at MRI, Rubinius and JRuby. We have seen them all run these specs. I think the IronRuby guys too; I haven’t looked at their results. But, they are running roughly the same percentage. You don’t have to run 100%, you only have to run like 99% plus. There are some weird edge cases that I think nobody runs correctly, probably. But, we think that the Ruby specs project was just a great boon to us, because when you think of somebody coming from completely outside Ruby, the way we started implementing it was we sat with the Pickaxe book, and we said “OK. Hmm… chapter X. what’s… let’s do this.” [laughs]
The thing was we had an existing VM, right. So, we have a VM that runs a lot of stuff. And, for example, what Avi was able to do to get WEBrick running, where you needed Time, was just map the Ruby class to the Smalltalk one. And you get something that basically runs, and then you account for the differences.
And there are some subtle differences. But, you know, you can think of it as basically we had the engine and we are building the rest of the car around it. And the car had to meet the specs. We’ve got a good engine, but we still needed to add a transmission and wheels. That’s probably a poor analogy. But, at any rate, we have the virtual machine, we have the persistence and we have the distributed object share already there. And we have to make it run Ruby.
Now, there’s a prior story that kind of adds to this, from when we started running Seaside, which is the Smalltalk web framework. What we did is take our dialect of Smalltalk and Seaside, which was originally written in Squeak, which has some different syntactical pieces than we have. And we actually added the capability to run Squeak Smalltalk to our virtual machine, which already ran GemStone’s Smalltalk.
So, we already had experience taking our virtual machine and making it run a slightly different language. And Avi’s point was that the object models in Ruby and Smalltalk are so similar that the basic engine part should work just fine. All you have to do is figure out how to compile it into the bytecodes that you already run.
And I think we wound up adding maybe a couple of dozen primitives to account for some of the differences, like zero-based versus one-based indexing. You don’t want to be dealing with that different offset at a high level. So, there were some primitives that we had to change. But, otherwise it seems to be a pretty good thing to bolt this on to an existing VM that’s already used in production in a lot of financial trading systems (which are probably a bad word these days), shipping, and semiconductor manufacturing.
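To illustrate the offset problem Monty describes, here is a minimal Ruby sketch of a zero-based view over a one-based store. It is not MagLev code: the class names `OneBasedStore` and `ZeroBasedView` are hypothetical, and MagLev handled this in VM primitives rather than in a wrapper like this one.

```ruby
# Hypothetical illustration only; not taken from MagLev.
# A store indexed from 1, the way Smalltalk collections are.
class OneBasedStore
  def initialize(items)
    @items = items.dup
  end

  # Valid indexes are 1..size; anything else raises.
  def at(index)
    unless index.between?(1, @items.size)
      raise IndexError, "index #{index} out of bounds"
    end
    @items[index - 1]
  end

  def size
    @items.size
  end
end

# A Ruby-style view: 0-based, negative indexes count from the end,
# and out-of-range access returns nil instead of raising.
class ZeroBasedView
  def initialize(store)
    @store = store
  end

  def [](index)
    index += @store.size if index < 0
    return nil unless index.between?(0, @store.size - 1)
    @store.at(index + 1) # the single place where the offset is applied
  end
end

view = ZeroBasedView.new(OneBasedStore.new(%w[a b c]))
puts view[0].inspect
puts view[-1].inspect
puts view[3].inspect
```

Keeping the `+ 1` in one low-level method is the point of the sketch: callers above it never see the offset.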
So, we’ve got a good, solid persistence mechanism, distributed cache and virtual machine. And the trick is to make it run Ruby. And Ruby is a wonderful language for all the developers who use Ruby. And boy, there are a lot of edge cases. So, for implementers of Ruby, I can see what Charles Nutter and John Lam and all of the other guys have gone through to get this to run. It’s an extremely complex language, because Ruby takes care of the hard parts for you to make it easy for the developer.
And I think that’s the reason that so many people like Ruby. It’s wonderful for the developer; you get concise code. And yeah, it’s just hard for the implementers because we have to take care of all the things to make your life easy.
Geoffrey: Two other brief questions. You mentioned the kinds of projects that GemStone is being used on right now. A lot of people think of Smalltalk as this old thing from the ’80s that has passed its time. But, no, it’s at the core of financial institutions, stock markets and high technology companies. You had the vision to take that technology and make it available to Ruby developers, open source developers building websites, things on a much smaller scale than many of your current customers. Why did you feel like that was important?
Monty: Well, GemStone has pretty solidly been an enterprise company for the last… as you mentioned, I’m one of the founders; I’ve been with GemStone since ’82. And we’ve always been in the enterprise business. And the thing that motivated me to take this approach was this: I think that applications that run on your computer, that have to be installed in an enterprise on thousands of computers, are just a huge waste of IT resources managing those. If you can run an app over the web, then I can deploy it in my data center and everybody in my corporation can have access to it.
You just look at it and you go: in the future, as we get better and better at developing apps through the web, everything is going to run that way. I think it’s just inevitable. And it’s hard, you know, but it’s getting much better. Ajax working through the web is better. And you have seen Google Apps and other apps that run that way, and you can do drag and drop, and whatever.
I think it’s the way that things are going to go, even inside the corporations I deal with. For example, there’s a trading company that we deal with that has 8000 users. And even in the very early days, at Florida Power & Light, their trouble call system had thousands of computers that they had to install it on. So, every time they upgraded their app, they had a huge problem to push the app out and make sure it got installed on all the workstations that it was being used on.
And I saw that pain, and how many staff they had to have to do that back in the late ’80s or early ’90s. And I think… why would you do that? Let’s make it easy on them.
Geoffrey: Final question. Even after you finish this fast, wonderful Ruby…
Monty: Virtual machine.
Geoffrey: ...virtual machine. Not quite an interpreter, a little different but…
Monty: It’s actually compiled to native code, so, yeah, it has a just-in-time compiler. And, in fact, one of the things that Alan mentioned in his talk is the difference… we are building it on the GemStone virtual machine version 3. And the main difference is a full native code compile mode. And that gives us about a 2x to 3x speedup over the interpreted version of GemStone 64-bit, which was version 2.
When we get MagLev finished, all the customers who run all of our other products are going to get a 2x to 3x speedup too, because of having full native code. And methods are compiled basically the first time you execute them: the first time you execute a method, it gets compiled to native code. So, the first time there’s a little bit of a hit. But, if you are going to run a method even 20 times compiled to native code, 19 of those times run faster.
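The trade-off described here, a one-time compile hit repaid by faster later calls, can be put into a small cost model. This is only a sketch with assumed numbers: the function names and the cost units are hypothetical, the compile cost is treated as a single up-front charge, and nothing here was measured on MagLev or GemStone.

```ruby
# Hypothetical cost model; all figures are illustrative, not measurements.
# Costs are in arbitrary time units per call.
def total_costs(calls:, interpreted_cost:, native_cost:, compile_cost:)
  {
    interpreted: calls * interpreted_cost,
    compiled: compile_cost + calls * native_cost # compile once, then run native
  }
end

# Smallest number of calls at which compiling is no slower than interpreting.
# Returns nil if native code is not actually faster per call.
def break_even_calls(interpreted_cost:, native_cost:, compile_cost:)
  saving_per_call = interpreted_cost - native_cost
  return nil unless saving_per_call > 0
  (compile_cost.to_f / saving_per_call).ceil
end

# Assume native code is 3x faster per call and compiling costs 10 units.
model = { interpreted_cost: 3, native_cost: 1, compile_cost: 10 }
puts break_even_calls(**model)
puts total_costs(calls: 20, **model).inspect
```

Under these assumed figures the compile cost is recovered after five calls, and at 20 calls the compiled path costs 30 units against 60 interpreted.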
Geoffrey: So, even when you do complete this, you still have the problem of how do we come up with a great way of working with a persistent object data store in Ruby? Some Rubyists, definitely Rails developers, may think of ActiveRecord as the “be-all and end-all”. But, that’s just kind of a kludge to work with a relational database.
Take the current way that your Smalltalk applications query your persistent data store. Do you think that’s going to give you any hints as to what a good way would be to do it in Ruby, or is this going to take brand new thinking about how to work with objects and store those and query them?
Monty: I don’t think that GemStone wants to be the… for people who want to use Ruby, we want to develop this in a way that’s natural to using Ruby. When you go use Rails, you have to learn Ruby and then you have to learn SQL, and maybe everybody already knows SQL. But, you think “God, why shouldn’t I just use… you know, just a regular dotted expression… why do I need to write a query in SQL?” So there are two things. One is, why do I need to force my data into a tabular format instead of having a nice graph of objects the way I want?
So, I would like to build the model the way I want the object model to be and have the schema of my database be the model itself, not a bunch of rows and tables that I convert my objects into. Now, Esther Dyson has a wonderful quote where she says… you know, “It’s like if you drive your car home at night, and when you put it in the garage you had to disassemble your car and put all the pistons in one bin and wheels in another. And then in the morning when you get ready to drive it you’ve got to take it out and reassemble it all together.”
And she says “People have been doing this for years. And eventually they are going to figure out that’s just not the best way to store your car.” You know, relational databases are here to stay. You have to deal with a lot of legacy information. You have to retrieve data from it and send data to it. So, you cannot avoid it. And we, in that sense, will have to do ActiveRecord to deal with that type of data.
But, the people we are talking to, kind of on the leading edge, say “I would really like to… I am tired of ActiveRecord.” There have been proposals like ActiveModel. And we are just really at the start of thinking, how would Rubyists best like to do this? And so, when we get this kind of early alpha out to a few selected people, to make sure we don’t get overwhelmed and we get the kinks worked out, we are going to pretty soon start getting feedback from people who say “This is the way we think it should work.” Then we are going to implement from there.
But, we don’t want to force people to use our idea of the way object persistence works, based on our Smalltalk model, if that’s not the way Rubyists want to do it. So, how that turns out… when I come back to RubyConf next year, maybe that will all be mapped perfectly and we will all understand it.
There’s the experience that we had with one of the large banks, who had an object-oriented system built in Smalltalk talking to Oracle. They finally switched to using the persistent store that we had, which was object persistence. They aren’t having to do joins. They don’t have to do all the O/R mapping. They got about a 15x speedup in their application just by pulling the data out of Oracle.
And so, there is always an expense in doing this, you know, disassembly. Putting in the parts and getting them back out is expensive. And whether it becomes the really preferred method, or whether people just say “You know, I know relational stuff. I want to stick with it. Got to do that,” that may happen. But, there will be at least some percentage of the Ruby market that goes “I can get my application done better. I can use the model that I have in Ruby as the schema for my persistence. And that’s going to enable me to get my job done faster and more reliably, because I don’t have to think in two different ways.”
So, that’s what we think, and hope, will happen as a result of MagLev.
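The contrast Monty draws, navigating an object graph with dotted expressions versus disassembling it into rows and reassembling it, can be sketched in plain Ruby. This is an illustration only, not MagLev’s persistence API: the `Customer`, `Order` and `Line` structs and the `disassemble` and `reassemble` helpers are hypothetical names, and the “tables” are just in-memory arrays of hashes.

```ruby
# Hypothetical domain model for illustration; not MagLev's persistence API.
Customer = Struct.new(:name, :orders)
Order    = Struct.new(:number, :lines)
Line     = Struct.new(:product, :quantity)

# Object-graph form: the model is the schema, queried with dotted expressions.
def total_quantity(customer)
  customer.orders.sum { |order| order.lines.sum(&:quantity) }
end

# Tabular form: "put the pistons in one bin and the wheels in another".
def disassemble(customers)
  rows = { customers: [], orders: [], lines: [] }
  customers.each_with_index do |customer, customer_id|
    rows[:customers] << { id: customer_id, name: customer.name }
    customer.orders.each do |order|
      order_id = rows[:orders].size
      rows[:orders] << { id: order_id, customer_id: customer_id, number: order.number }
      order.lines.each do |line|
        rows[:lines] << { order_id: order_id, product: line.product, quantity: line.quantity }
      end
    end
  end
  rows
end

# Rebuild the graph by matching foreign keys, i.e. doing the joins by hand.
def reassemble(rows)
  rows[:customers].map do |c|
    orders = rows[:orders].select { |o| o[:customer_id] == c[:id] }.map do |o|
      lines = rows[:lines].select { |l| l[:order_id] == o[:id] }
                          .map { |l| Line.new(l[:product], l[:quantity]) }
      Order.new(o[:number], lines)
    end
    Customer.new(c[:name], orders)
  end
end

ada = Customer.new("Ada", [Order.new(1, [Line.new("piston", 4), Line.new("wheel", 4)])])
puts total_quantity(ada)
puts reassemble(disassemble([ada])) == [ada]
```

The round trip returns an equal graph, but only after the extra mapping code in `disassemble` and `reassemble`, which is the overhead the interview is talking about.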
Geoffrey: Well, very ambitious and exciting project, but great to hear about it. Thanks for the conversation.
Monty: Oh, you are more than welcome. Well, let’s do it again next year when we’ve got stuff that we can show and talk more about how persistence really works out.
Geoffrey: Sounds good.