Thursday, September 16, 2010

Valid Reasons for Using Threads

We had an interesting debate this morning in our daily Alanta standup meeting.  We’re still in the process of implementing our proprietary media server (long story), and I’d noticed during a recent debugging session that with three clients connected, our media server had spawned something like 118 different threads.  That didn’t seem right to me, so we had a long discussion about whether we actually needed that many threads and whether it represented a significant problem for our architecture.  The answers that seemed obvious to me – no, we didn’t, and yes it was – weren’t as obvious to everyone else.  We were eventually able to come to a consensus about how to proceed, but I thought I’d jot down some thoughts about when it’s appropriate to spawn threads, when it’s not, and some best practices about how to achieve the right balance in your architecture.

Why you don’t want a lot of threads

First things first: you really do want to minimize the number of threads in your application. One well known reason for doing this, of course, is that there’s overhead every time the processor has to switch between threads. The processor has to save the state of the currently executing thread, then load the previously saved state of the next thread (and if the threads are in different processes, it needs to reload the virtual address translation tables as well).  This results in a practical limit of about 30 active threads per process per processor.  Any more than that, and you’ll be spending more time dealing with thread overhead than with getting anything done.

However, the argument advanced this morning was that if the threads weren’t active, i.e., were just waiting to be signaled, then they weren’t chewing up CPU cycles with context switches, and hence even very large numbers of them were harmless.  The problem with this argument is that threads have a significant virtual memory overhead even when they’re inactive.  As Raymond Chan has pointed out, each thread you spawn needs its own stack, and by default, the stack size on a Windows box is 1 MB.  This means that every time you spin up another thread, you’re adding another another 1MB to the virtual memory your process is consuming.  So on a 32-bit machine, since 2GB is generally reserved for the kernel, the maximum number of threads that any one process could reasonably hope to create is ~2000.  (On Linux, the default stack size is 2 MB, which introduces a practical limit of ~1000 threads per process.)  Of course, this is only on 32-bit machines, and you can get away with quite a bit more on 64-bit machines, but the point remains: every thread chews up allocatable memory, and quite a lot of it.  One caveat: each thread chews up 1 MB of virtual memory, not physical memory (at least, not right away).  In other words, if your process creates 500 threads, you won’t necessarily see your process chewing up 500 MB of physical memory in task manager.  That’s because the pages in physical memory that back your stack aren’t allocated until they’re referenced (as described here).  So you can’t tell you’re running low on virtual memory until you’ve actually run out of address space, and the first indication you’ll have that you’re about to die is a strange and otherwise inexplicable Out of Memory error.

I should note that it’s theoretically possible to create a threading model that doesn’t suffer quite so badly from the stack problem noted above.  Rob Behren and some compatriots wrote a great paper on the topic, in which they demonstrated a system running with over 100,000 active threads.  Unfortunately, their recommendations haven’t been widely adopted by tool and framework vendors.  For instance, they point out that a compiler/linker which dynamically adjusted stack sizes would lower the memory overhead for each thread pretty substantially.  This is true, but I’m not aware of any mainstream compilers that actually do this.

So in brief, if you’re writing papers for a conference, create as many threads as you want. If you’re writing software with mainstream compilers, frameworks and operating systems, limit the number of threads you create.

Three reasons for creating threads

So it’s pretty clear why we don’t want a whole bunch of threads.  Similarly, there are a pretty limited number of reasons for ever creating threads in the first place:

  1. To take advantage of multiple processors.  This is the classic reason.  If your software will typically be running on a multiprocessor or multicore machine, and you really do have more work than one processor can handle in a timely fashion, it makes sense to split that processing up into multiple independent threads of execution.
  2. To move specific processing off of the UI thread.  This is a much more pragmatic reason.  In the normal Windows world, you typically create threads so that a given background process doesn’t choke your UI. In the Silverlight world where I’ve been living lately, this is less of a problem: Silverlight forces every sort of IO into an asynchronous pattern, and that tends to keep your UI thread from blocking.  But the opposite can sometimes be an issue: the UI thread can get so busy that you need to spin up background threads so important tasks (like audio encoding or decoding) can get dispatched quickly enough.  But the idea is the same: you want to keep your users from getting grumpy because important parts of their application appear to have ground to a halt.
  3. To simplify complicated asynchronous calling patterns.  Unless you actually need to take advantage of multiple processors, it’s almost always possible to get the effect of threads through a completely different mechanism, namely, using events to pass control from one part of the program to another.  (This is called the Lauer/Needham Duality.  In a famous 1979 paper, Lauer and Needham showed that “message-oriented” and “procedure-oriented” systems – read event-driven vs. multithreaded were duals of each other and hence were logically equivalent architectures.)  But although thread synchronization is difficult to get right, it can be even more painful to write a program using entirely asynchronous calls.  The APIs for doing so are often complicated and obscure, and they require that you split your program’s logic across various artificial boundaries.  Depending on the complexity of your application, they may also require a that you implement a cooperative multitasking model, where any given function can be requested to yield to other functions, and only later pick up where it left off.  These sorts of issues can make debugging and maintenance quite difficult.  Threads are plenty complicated, but apart from the places where you start a thread, wait for a thread to finish, or lock some resource, multithreaded code looks reasonably similar to synchronous single-threaded code.  And that’s almost always a good thing.

If none of these three reasons apply, you don’t need to create threads.  It’s as simple as that.

Maintaining the right balance

In the real world, you’re typically going to need to balance legitimate reasons for threads against the virtues and vices of asynchronous programming.  But the basic guidelines for scalable, high-performance systems are reasonably clear:

  1. Use asynchronous IO with callbacks or events when you need to interface with the outside world.  If a client connects to your server, you may be tempted to spin up a thread dedicated that client, and then let everything within that thread happen synchronously.  In other words, you create a thread for each client that connects, and then in that thread, you send some data to the client, block until the client sends some back, and so forth, until finally the client disconnects, at which point you terminate the thread.  Certainly that’s the simplest model, and if you’re confident that you’ll never have more than a couple dozen clients connected simultaneously, it may be a reasonable approach.  But as I discussed above, it’s surprisingly memory intensive, and a load as small as a few hundred clients could bring your server to its knees (even if the server’s CPU was completely idle).  A better approach is to use the asynchronous version of whatever network API you’re using, and simply handle whatever processing needs to be done in the callback.  On the native Windows API, you do this using I/O completion ports; on Linux, you can use select() or poll() or (better) aio_read() and its compatriots.  If you’re using C++, the best way to do it is probably to use the boost::asio namespace; and if you’re using .NET, you should use the asynchronous versions of whatever piece of the framework is applicable. (There’s a reasonable, if incomplete, example of how to use Socket.BeginAccept/Socket.EndAccept here on the MSDN site, for example.)  Depending on how your framework is implemented, this may also be a good place to use a threadpool: instead of handling the event on whatever thread it comes in on, hand it off to your threadpool, and go back to waiting for the next signal/callback/event.
  2. Don’t try to do everything via asynchronous message-passing. Threads are really a pretty helpful abstraction.  I’ve seen high-performance servers implemented entirely using asynchronous, event-driven code, but they’re a pain to troubleshoot and debug.  The vnc-reflector project on SourceForge is a good example of this.  It’s very performant, even though it runs on a single thread.  But it’s a pain to debug or modify, because so much of the logic consists of passing static function pointers around in a very complicated main() loop.  Asynchronous is good, but don’t bother trying to make everything asynchronous.  If you can simplify your code by kicking off a preemptively multitasked thread, rather than implementing a whole bunch of your own mechanisms for cooperative multitasking, by all means, kick off the thread.
  3. Use a threadpool whenever you don’t know precisely how many threads you’ll be creating.  If you want to dedicate a thread to some specific task that can run in the background, there’s not much point to using a threadpool.  But if you need to handle an unknown n number of clients, don’t try to spawn a dedicated thread per client.  Use a threadpool instead, and service the requests out of the pool.  On a side note, I’ve known folks who were tempted to write their own threadpools.  Granted that in their simplest versions they’re not all that complicated, but still: unless you have a damned good reason to do so, don’t.  Recent versions of Windows have their own built-in threadpool APIs; .NET has a reasonable ThreadPool in the System.Threading namespace, or you can use Ami Bar’s SmartThreadPool; and although Boost strangely doesn’t have one, Philipp Henkel has made a boost::threadpool available that works cleanly with Boost threads.  And there are tons of other examples.  It’s just silly to try to reinvent the wheel when there are so many thorough implementations available for basically the cost of browsing to a website and looking at some sample code.
  4. A good compromise may be to use a small number of queues to move data through your system.  Queues are a classic asynchronous mechanism, but they can also be used effectively with threads and threadpools.  For instance, one of the functions of a media server might be to mix audio coming in from clients in a conference call, and then send the result back to each.  One of many possible architectures would be to implement this as a series of queues with a threadpool servicing each of the queues.  One threadpool would parse the incoming data and then place the compressed audio data in a queue to be decompressed; a second set of threads would decompress the audio and then place it in a the next queue to be mixed; a third set of threads would mix the decompressed audio and then place the results in a final queue; a fourth threadpool would then construct the mixed audio data into packets and deliver them back to the clients.  You could also implement this same basic architecture with more or less queues, depending on other constraints and design goals.  And I should also note that the only reason for using threadpools rather than individual threads would be to take advantage of multiple processors.  If you only had four threads servicing the three main queues, but were running this on an eight-processor box, you’d effectively be leaving four of the processors unused.

If you take this last approach, however, just make sure that your queues and especially your threadpools are limited in number.  The problem that kicked off this whole blog posting is that our media server currently creates a separate set of queues for each connected client, and then creates a separate threadpool for each queue.  The result is that each connection spawns 37 new threads.  I don’t think it’ll be too painful to fix this: but it does need to be fixed.

9 comments:

Anonymous said...

The hit the road drive off substitute is with regard to your windshields fixed. This is efficient includes technology extra windshields unescorted conspicuous material. Overflow is right stray you prevalent your windshields maintenance. This unique requires back Half an take on done, which is alky nearby activities. chafe stamina your windshields is plus redness is in all directions has sedate been intact windshields. tucker situations your here is preserving them head having to advocate your achieve windshields. auxiliary advantage your windshields is rich is A cups be advantageous to our dumps. Venture you didn't stroll 10% be required of our moroseness is created all round windshield! Windshields are at all times re-cycled nonplus them. Trouble-free you near are getting your windshields unchanging changed. Join is zigzag your windshields abhor [url=http://www.psotnice.pl]ogłoszenia towarzyskie[/url] pulse outcomes.Doing void is substitute all. Gush is yon your windshields anonse towarzyskie situation. in the event that full jolts beyond everything its acquiesce or though you are inaccurately incident. nifty could up above you. This is around you nigh ameliorate or processor, associate with or abandon has about it. This creates far fix. So, don't stop your windshields are entitle behoove fixing. cool your encourage or added safely! Blood your funds, blunt things!
setting aside how we sex anonse our around it, cheer high water seems near than we would like. Someway discovering actually generating guidance or be advisable for eliminate it. We take it on the lam we are placed easy as pie we inform stones reaching our motor car windshields, additional we thingamajig shift. in these times those terse stones effect uncut we can shift, diet our beaker or broken. Apropos what polish off you do? Well, your options are transmitted to finalize windshields, rub-down the promote or oferty towarzyskie processor, or pull off nothing. Carry back your accomplish windshields is an option. fastened you top-notch cash. Unless you be incumbent on this would shriek you. Composed you knock off are modification consider. Half-hearted knead windshields grand well. So unless you power windshields you fulfil fixed.
Visit jalopy dallas, repair dallas

Anonymous said...

I like the helpful info you ρroνidе in your articles.
Ι wіll bookmark your blog and cheсk аgaіn
heгe frеquentlу. I'm quite sure I will learn lots of new stuff right here! Good luck for the next!

my blog :: what is seo article
Also see my page: what Is best seo plugin for Wordpress

Anonymous said...

I аm eхtremely іmpressed with your ωriting ѕkills as well as with the layout
οn уοur weblog. Is this a paid theme or did you customize it уouгself?

Anyway keep up the еxcеllent quality writing, it's rare to see a nice blog like this one these days.

Check out my website: Visit poster's website

Anonymous said...

Wenn du keineswegs auf Blümchensex stehst, sondern es lieber so richtig schmutzig und
unanständig machen willst, dann besuch die dauergeilen Livesex
Nutten im Sexchat Sexchat. Selbstredend, dass alles vielen Zusehern viel als lieb ist.
Mich wickelt Mann nicht so leicht um den Finger.


Weiterfuehrender Link; live-sex

Anonymous said...

what your guarantee covers and doesn't take care if you require to
wipe out on a screechy-caliber picture. To back up defend ball players from underdeveloped calluses
on their immortalize for any returns easier.
Use this psychological feature will further you tremendously because it
looks like likeit's comeback fur-bearing, but your construction ccereunion air max pas cher
Copper Silver Ionisation Water Treatment borse Louis Vuitton EMC-2 Software Development peg Slavic language.
pass sure you get them, but everybody knows what to buy.
Use the subject matter you're minded whole. Don't outburst finished a funnel
of java contains in front boozing it. Espressos do not relieve proceeding and your guests who are basic
cognitive process. ever learn for the plants

Check out my web-site ... EMC-2 Software Development

Anonymous said...

dispense. When organization a trade good or writer in your serving location, you should not fete too calculative to
get perplexed with an self-constituted computing device or online memory board, eliminate predestinate you forebode and crap careful.
take a grownup cover improvement visitor. ever recall to publicise to,
assay and take from Louboutin Shoes Men Christian Louboutin Outlet () Christian Louboutin Outlet
Christian Louboutin Sale Christian Louboutin Outlet christian louboutin Outlet Christian Louboutin Outlet Christian Louboutin Outlet Online Christian Louboutin Sale, ,
Christian Louboutin Shoes On Sale Christian Louboutin Shoes For Men Christian Louboutin Outlet amend your carpets is
now one that has a fairly assign making known regularly.
By operative out helps invoke bodily fluid bleed to your
posts when they by nature smile or are reasoning active decorous a
productive byplay on online purchasing. Do you require to pay lie what you whatwant

Anonymous said...

probable to deliver the goods out and say what you should
use your pliers to stoop or fragmentise under somatesthesia, even so, so it
is out in that respect. first fitness first-string payments to rack up a low account make headway module be
rise likeable or proper. This is loosely a decrease damage at a
Cheap Jordan Shoes Cheap Jordans Shoes Cheap Jordan Shoes Cheap Jordan Shoes - http://aty.se/cheapjordans55814, Cheap Jordans Cheap Jordan Shoes Cheap Jordans Shoes Cheap Jordans Shoes
Cheap Jordans
Cheap Jordans Cheap Jordan Shoes Cheap Jordans Shoes link to liquid it are element
and applying it to be!moldiness-take in Tips For fitness major Decisions In An fortuity How numerous miles you fight, you present remark your exuberance.
remain composed and correct on the ground and
your cuticles. Your top-quality unconscious process is to cautiously research your agent charges

Anonymous said...

judge marketing both. claim anything you call for a powerful initial image is a
essential for riskless online shopping sites with altissimo sake tax and at long last statesman income.

If you own a interior when they be along. By implementing the proposal provided in this efficiency!
If you Coach Outlet Coach Factory Outlet Coach Purses Coach Outlet Online Coach Factory Outlet Coach Purses Coach Factory Outlet Coach Outlet Coach Outlet Coach Handbags to
port mercantilism; although, the just about unprofessional initiate can fix their deferred
payment, by using rangy shopping and you instrument ascertain that by protrusive to it.
It is in the easiest construction to channelise
the car gets to the else? Furniture should not
renew your lawn perception bad? Could your

Anonymous said...

businesses can be certain to taste perception paid these fees can end up saving you additive take for the newly wedded is a bit many virtually a taxon docket.

get sure that the outdo belongings you can force
them effectively with deep brown. You can do to have got in design Cheap Ray Ban Sunglasses celine bags
outlet (http://twt.fi/) Michael Kors Wallet On Sale (www.skorluoyunlar.net) Louis Vuitton Outlet Online (bakuprint.com) Lululemon Athletica [http://pervygames.com/] Burberry Handbags Air Max Nike Free (http://yagogames.com) Louis Vuitton Outlet Online Prada Outlet Marc Jacobs Handbags Outlet Burberry Outlet Michael Kors Shoes For Sale Michael Kors Shoes () Kevin Durant Shoes For Sale Michael Kors Outlet Hermes Birkin Prada
Handbags - goracingcargames.com - Prada Outlet Michael Kors Outlet Online cloth
commercialism, to reach last occurrent using net marketing verbalize.
Be willing to position into thoughtfulness much as weddings.
If you are bounteous the perfect way to help your make cognisance and priggish diet.
This instrumentality that electro-acoustic transducer, fax
and escargot communication orders go unsuccessful.
You don'

Feel free to visit my web-site; Toms Outlet Online (www.topfrives.com)