
Re: Mixed mode

To: stp@xxxxxxxxxxx
Subject: Re: Mixed mode
From: Dirk Reiners <reiners@xxxxxxxxxx>
Date: Thu, 11 Jan 2001 23:06:12 +0100 (MET)
In-reply-to: Pekka Pietikainen <Pekka.Pietikainen@xxxxxxx> "Re: Mixed mode" (Jan 11, 11:12am)
In-reply-to: Stephen Bailey <steph@xxxxxxxxxxxxxxx> "Re: Mixed mode" (Jan 11, 10:23am)
References: <Pine.LNX.4.21.0101111028380.28843-100000@xxxxxxxxxxxxxxxxx> <10101111622.AA25357@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-stp@xxxxxxxxxxx
        Hi everybody,

thanks for your info so far. There are still some aspects that need
clarifying, though.

On Jan 11, 11:12am, Pekka Pietikainen wrote:
> Subject: Re: Mixed mode
>
> The modified firmware only benefits receives, although STP on Linux
> also does zero-copy transmits with some driver changes, which help
> a bit. You can get the same for TCP too these days, and I will probably
> change STP to use the same infrastructure as there's now a
> "stable" release of it out at
> ftp://ftp.kernel.org/pub/linux/kernel/people/davem/.

Hmm, can you give me some pointers on the state of it and how to use it?
Google just gave me links to kernel mailing list archives from 1999 (where
it was rejected) and Sep 2000 (which says it's in there, but only for
in-kernel use by TUX). There was a diploma thesis at ETH Zurich that claims
to get 60 MB/s using zero-copy TCP, albeit with buffer alignment
limitations (sounds nice anyway).
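
If I understand it right, sendfile(2) is the in-kernel zero-copy transmit
path that TUX uses, so transmit-side zero copy should already be possible
for file data. A rough sketch of what I mean (just my reading of the man
page; the function name and setup are my own, not from any of the patches
mentioned above):

    /* Zero-copy transmit of a file over a connected TCP socket using
     * sendfile(2): the kernel pushes the file pages to the socket, so
     * the data never has to be copied through user space. */
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int send_file_zero_copy(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) { close(fd); return -1; }

        off_t off = 0;
        while (off < st.st_size) {
            ssize_t n = sendfile(sock, fd, &off, st.st_size - off);
            if (n < 0) { close(fd); return -1; }
        }
        close(fd);
        return 0;
    }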

> If you're sending the same data to all of the clients, and there's
> a lot of them some kind of reliable multicast protocol might be a good
> idea (latency might be a problem with these, though), I haven't followed
> this area recently, though.

Ok, I should have been more specific.

My problem is a little different. I have two possible setups.

In the first I send a little data to all the clients, and will probably
use some sort of multicast or broadcast for it (the network is dedicated,
just me and nothing else). The clients work on that and generate pretty
large blocks of data (images, worst case ~10 MB apiece, typically ~1 MB)
that are sent back to the server and processed. I just hope the latency on
the multicast is not too bad, but I'm guessing the bigger problem here is
getting the data back from the clients in time, and I hope that STP will
help me there, as copying this kind of data hurts for every single copy.
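
For the small data going out to the clients, plain UDP multicast should be
enough. A rough sketch of what I have in mind (the group address and port
are placeholders I made up):

    /* Send a small block to all clients at once via UDP multicast on
     * the dedicated network; group address and port are placeholders. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int send_to_all_clients(const void *buf, size_t len)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        if (s < 0)
            return -1;

        struct sockaddr_in grp;
        memset(&grp, 0, sizeof(grp));
        grp.sin_family      = AF_INET;
        grp.sin_addr.s_addr = inet_addr("239.1.2.3"); /* placeholder group */
        grp.sin_port        = htons(5000);            /* placeholder port  */

        ssize_t n = sendto(s, buf, len, 0,
                           (struct sockaddr *)&grp, sizeof(grp));
        close(s);
        return n == (ssize_t)len ? 0 : -1;
    }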

The second situation needs STP for synchronisation of the clients, so that
they can all act at the same time, as close together as possible. I'm
pretty sure I can't do this with TCP due to latency, but STP should help
here.

> Otherwise STP might do the job, although without hardware assist it won't
> perform that much better than TCP as far as CPU use and bandwidth are
> concerned (especially on 100baseT, which is slow enough that modern
> machines have no problems dealing with it).

I'm thinking of using the 3C905 for the 100 Mbit side and the 3C985 for the
GBit side. Are those sensible choices? Money is not the prime concern right
now, as this is more of a feasibility study than a product. For the switch
we're looking at a Cisco with 12 or 16 100 Mbit ports plus 1 or 2 GBit
ports (Catalyst 3500? I don't remember right now), because it cooperates
best with the rest of the in-house network. Any experience here, be it good
or bad?

>-- End of excerpt from Pekka Pietikainen


On Jan 11, 10:23am, Stephen Bailey wrote:
> Subject: Re: Mixed mode
>
> It sounds like you are talking about a 1 Gb source and 100 Mb sinks.
> In this case, you must be careful that you bound the ST block sizes
> for your ST Write sequences so you don't overwhelm the available
> elasticity buffering in your network infrastructure.

Actually, it's the other way around; see above. My fault for not being
clear the first time.

> Put another way, if you try to send from 1 Gb source to 10 100 Mb
> sinks, and you burst (your ST block size) 4 MB at a time, and your
> switch can only absorb 1 MB, you will almost certainly end up with
> almost 3 MB worth of data thrown on the floor.  ST will retry, but it
> will fail each and every time because it's retry granularity is an ST
> block.

Hm, I guess I can have that problem in my case, too.

> Put another way, ST doesn't do adaptive congestion avoidance.
>
> You can solve this problem by tuning down your block sizes, but
> smaller blocks means higher CPU overhead, and this still only works
> well if the network is known to be operating in a steady state
> (quiescent is nice).  If there is an increase in traffic flow to one
> of the 100 Mb hosts, you'll end up losing again.

As the network is dedicated, that's OK. And since I write all the software
myself, I can do some traffic shaping to keep the clients from all sending
at the same time, but I'd prefer not to have to do that.
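
Worst case the shaping could be as crude as giving every client a fixed
send slot based on its ID, something like this (just a sketch; the slot
length is a guess and would need tuning):

    /* Crude client-side staggering: each client waits a slot based on
     * its ID before sending, so the 100 Mbit senders don't all hit the
     * GBit link at once.  The slot length is an arbitrary placeholder. */
    #include <unistd.h>

    #define SLOT_USEC 2000 /* 2 ms per slot -- a guess, needs tuning */

    static void wait_for_my_slot(int client_id)
    {
        usleep((unsigned int)client_id * SLOT_USEC);
    }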

> In this case, you will have to use an ST block size of ~100 KB ( * 10
> = 1MB).  To hide latency on a per flow basis, you might also want to
> have two outstanding CTSs, in which case you're talking about 50 KB
> blocks.
>
> The 10 * 100 Mb sources to 1 Gb sink works better, for the obvious
> reason.  In general, STP only works unconditionally well with equally
> sized source and sink pipes and a non-blocking fabric.

Ok, that sounds better, as that's what I have.
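
Just to write down the block-size arithmetic from the 1 Gb source case
above for my own notes (a sketch using the example numbers from this mail;
the 1 MB switch buffer is not a measured value):

    /* ST block size for the 1 Gb -> 10 x 100 Mb case: the per-flow share
     * of the switch's elasticity buffer, divided by the number of
     * outstanding CTSs.  All numbers are the examples from the mail. */
    #include <stdio.h>

    int main(void)
    {
        const long switch_buffer = 1L * 1024 * 1024; /* ~1 MB absorbed   */
        const int  sinks         = 10;               /* 100 Mbit clients */
        const int  outstanding   = 2;                /* CTSs per flow    */

        long block = switch_buffer / sinks / outstanding;
        printf("ST block size: ~%ld KB\n", block / 1024); /* -> ~51 KB */
        return 0;
    }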

> Still, for all this, if you can get your traffic balanced right, STP
> will probably be two decimal orders of magnitude more efficient than
> TCP.  This doesn't matter for the 100 Mb hosts, but it's very
> significant for the 1 Gb host.
>
>-- End of excerpt from Stephen Bailey

This sounds very good. The Gbit host doesn't have to do a lot of work on
the data, but at the data rates I'd like to get every little bit counts.

So it looks like it can actually work nicely using STP. Now I need to kick
my boss to actually buy all the stuff...

Thanks for your help

        Dirk



-- 
--
-- Dirk Reiners             reiners@xxxxxxxxxx, Dirk.Reiners@xxxxxxx 
-- OpenSG Forum                                http://www.opensg.org
-- Rundeturmstrasse 6                 http://www.igd.fhg.de/~reiners
-- D-64283 Darmstadt                 All standard disclaimers apply. 
-- Truth is stranger than fiction because fiction has to make sense. 
