What is making us slow?
- We are waiting for the CPU
- We are waiting for network I/O
- We are waiting for the user
- We are waiting for disk I/O
What do we do? We have other cores, let's make use of them!
Threading The mainstream answer these days is spawning threads. However, if you try this in Ruby you will be in for a surprise. Though Ruby 1.9 uses OS threads, the interpretor has what is called a "global interpreter lock" or "GIL". This means that any time one thread is running Ruby code, no other thread can be running Ruby code. This effectively means that, in Ruby, even if you have multiple threads, you can only use at most 1 core of your CPU at a time.
Threading also has all the common problems of race conditions, locks, synchronizing and general "thread safety". This can be a little intimidating to manage, but usually a clean efficient solution can be arrived at. The best idea is to share as little global data as possible. If you are writing a server, assign at most one thread per request and keep all the data for that request local and you are already 99% there.
Notes:
- Rubinius is said to be working on a version of their runtime that doesn't have a GIL and could therefor use multiple cores.
- jRuby works today without a GIL and allows the full use of multiple cores.
Processes don't share memory. Once you fork a new processes you can still read the state of your program before the fork, but any changes you make in process B won't update process A and visa versa. Once you fork you must coordinate your processes through some other means, usually that means Pipes. This is a highly error prone and tedious "API".
On Linux or other Unix-based systems (including MacOS), forking a new processes is usually very efficient. In abstract, forking means cloning the entire processes. In practice, though, the operating system has a nifty feature called "copy on write" (CoW). Which says the two processes can share most all of the memory that hasn't been written to since the fork. As soon as one or the other processes writes to the memory (usually a CPU "page" = 4k), it is copied over into the new processes. This means that when you fork a new processes nothing needs to be copied at first and only a few things need to be copied over time as you start altering things. This is not only CPU efficient, it is also memory efficient. Great, right? Well, in Ruby it's not so great. Ruby's garbage collector (1.9.x) writes to every object every time it runs. This means that the first time either processes triggers garbage collection, the entire processes must be copied. In other words, Ruby loses all benefits of CoW. If you are writing a large application that uses a lot of memory, forking only makes sense for very long-running processes because the overhead of forking is very high.
Notes:
- Ruby Enterprise Edition patched Ruby 1.8.x to be CoW friendly. There is currently no Enterprise Edition for Ruby 1.9.x and forward support of REE seems to be in question.
- Rubinius is supposed to be CoW friendly. I have not tested this.
- jRuby, well, just use Threads. Forking may work, too, if Java allows it and is CoW friendly. Both seem likely, but I don't know.
Conclusion If your app is CPU-bound your best bet is to optimize your code. It is very hard to impossible to use multiple cores in Ruby unless you are willing to switch fully to jRuby.
Network I/O Bound
EventMachine If you are writing some sort of server that is going to respond to network requests or if your app sends out network requests it is highly likely that you are network I/O bound. In the server case, you'd like to be able to be processing another request while you wait for the first one's input to arrive. In the client case, you'd like to be doing other things while waiting for the server to get back to you.
If your app is network I/O bound, there is a good solution in Ruby: EventMachine. All network I/O is "evented". When you fire off a network request, you pass in a block of what you want to do after the network request completes. The call returns immediately. You can then do other work while the network request processes and when it is done, the EventMachine "Reactor" (global event loop) will queue your block and execute it. Similarly if you are waiting for input, you can pass in a block of what to do once the input is ready, the call returns immediately and you can do other work while the input processes.
Eventing has the nifty advantage of being concurrent without having to be "thread-safe". It is known as a "cooperative multitasking". Threading is "preemptive multitasking". The down side of eventing is you have to pass around a lot of blocks that are "defferred" to run later at some point. Code execution order can get confusing and your code-size goes up significantly.
Another thing to be aware of is eventing only runs on one thread. Even if you are using jRuby, you will still only be using one core. EventMachine has the idea of "offline" processes which run on separate threads from a thread pool. This gets you halfway there, but keep in mind that your Reactor, the global event loop, is still single-threaded. At some point that one thread running the event loop is going to be a bottleneck no matter how many offline threads you have running. Obviously if you are running offline threads you have to think about thread-safety again, too.
Fibers Fibers are a cool new technology in Ruby 1.9. They are basically threads, except they actually share just one OS thread and they only multitask when you tell them to. This makes thread-saftey trivial, much like eventing, PLUS you can still write your code in a linear fashion. A mix of eventing and fibers basically gets you all the simplicity of ignoring thread-safety with little of the conceptual or code overhead of pure eventing. Sounding pretty good, right?
Conclusion Eventing, optionally with Fibers, can solve your network IO bound situation. It is a clean framework that isn't as complicated to write for as a pure threading solution. The key benefit over threading is simpler concurrency and allegedly higher performance on a single core. The down side is you are limited to one core.
Note Threading works well for network-bound processes, too. Although Ruby has the GIL, the GIL is released whenever Ruby makes calls to the OS - including networking calls. This means that when a thread is waiting for network IO, another ruby thread can be running on the core. Threads solve the network-bound problem if you can mange the thread safety.
Waiting for the User
There isn't a lot of user interfaces written in Ruby, but concurrency is a common problem in GUI design. Generally pure Eventing works great for this. Inputs, in the form of user mouse and keyboard interactions can trigger events, and outputs, in the form of screen updates, are usually quick enough that they don't need their own threads. I'd be curious to hear about anyone who has practical experience with this in Ruby, but I suspect this isn't a major problem. You are mostly likely to end up being CPU bound if your interface is properly evented rather than User-bound.
Disk I/O Bound
What if your app is disk IO bound? I'm writing a store server in Ruby called Monotable. Ideally, under maximum load, it should keep N disks reading and writing at their max capacity at all times. In principle, Eventing and Fibers should work here. Whenever I read or write to disk, I could suspend execution of the current Fiber and let other events fire. When the read/write completes, the system could queue my fiber as an event and it would continue on the main thread when its turn comes up.
Unfortunately neither EventMachine nor any of the Fiber-aware gems available support evented disk IO. Looking at the Ruby IO library it appears there is no way to asynchronously read or write and be notified or test later if the read/write has completed. I've looked at IO#read_nonblock and IO#write_nonblock and it is not clear those are answers to this problem.
Conclusion Only threading can solve the disk-IO-bound problem. Since the GIL is released with any File I/O, other threads can be doing work while the first is blocked. It seams the best solution will be to use EventMachine for network IO and defer any disk IO work to threads.
Back to Being CPU Bound...
If you are network IO bound, you can use EventMachine and Fibers to speed you up, but under enough load you will be CPU bound. For disk IO bottlenecks, threads are your best bet, and they will speed you up, but you will ultimately be CPU bound here as well. Unfortunately, Ruby doesn't help us much with this. Rubinius is looking promising, and jRuby is here today with multi-core support, but if you want your code to work with the standard, mainstream Ruby 1.9.x you are mostly out of luck.
This is an incomplete exploration of the problem. I'd love to hear if people discover more on this topic.
This is written 2011-09-21. My current version of Ruby is ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux].
Other Interpretors
Gems dealing with non-blocking IO
GIL - Global Interpretor Lock / Global VM Lock
- groups.google.com/group/comp.lang.ruby/msg/e809a7a7205eaec9
- en.wikipedia.org/wiki/Global_Interpreter_Lock
I hope this is solved in Ruby 2.0. Either ways, this was a good read, thanks for sharing this.
ReplyDeleteNo MacRuby love? It's pretty good at this, being tightly tied to OS X concurrency capabilities...
ReplyDeleteIf your app is specifically charged with handling disk IO, then as you said eventmachine is of limited help.
ReplyDeleteHowever, if your goal is instead "store information and retrieve it later", you can finesse the issue by using a fast datastore -- Redis, Mongo, etc -- over a local socket.
JRuby does not currently support fork because no mainstream JVMs are forkable.
ReplyDeleteRubinius supports fork, but is not and will never be CoW friendly.
Cyberfox: MacRuby was not included because it is not available on all platforms. Specifically, not linux.
ReplyDeleteFlip: The application is actually datastore - a direct replacement for Redis, Mongo, etc...
Charles: I was basing my assumption that Rubinius was CoW friendly on this rather dated page with one, unsupported comment that said it was: http://www.modrails.com/documentation/Architectural%20overview.html . I have not found any other sources with information either way.
Fibers allows you to write linear code at the expense of hiding concurrency. It also means that your code is no longer race free. There is no magical bullet.
ReplyDeleteTake a look at: http://methodmissing.github.com/eio/
ReplyDeletezimbatm> I have been working with Goliath ( https://github.com/postrank-labs/goliath ) - the Fiber version of EventMachine. It is a good compromise to get to first base. There may be no current magic bullet, but I do believe it is possible to do MUCH better.
ReplyDeleteKyle - thanks for the link. I'll have to try that out.
ReplyDeleteRuby concurrency is the ability for two jobs to begin, run, and finish concurrently. However, it doesn't necessarily follow that they'll ever be moving forward at the same time.
ReplyDeleteHire our professionals to get Iron Gates Installation Services in Northridge CA within your range.