Author Topic: UDT congestion control algorithm implemented  (Read 35162 times)

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
UDT congestion control algorithm implemented
« on: August 07, 2009, 10:41:53 PM »
The next version of RakNet will be using the advanced congestion control algorithm from UDT http://udt.sourceforge.net/ . From a practical standpoint, internet bandwidth is improved by about 1.5X, while LAN bandwidth is improved about 13X. This should get even better as the algorithm is tuned.

If you want to try it out now, you can get it out of Sourceforge
http://raknetjenkinsso.svn.sourceforge.net/viewvc/raknetjenkinsso/trunk/

Click download GNU tarball at the bottom.

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #1 on: August 08, 2009, 03:33:19 AM »
Hi Rak'kar,

Sounds fantastic.  Do you know if there any effect on the latency?  Is this the default congestion control from now on?  If not do I just remove the #define: -

#define _USE_RAKNET_FLOW_CONTROL

Is the stats structure fixed on the trunk (as I'd need it to compare benchmarks)?

All the best,
Ash
« Last Edit: August 08, 2009, 03:37:44 AM by AshMcConnell »

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #2 on: August 08, 2009, 09:56:59 AM »
It has no effect on the latency. It's the default from now on. The defines are no longer used. The stats should be working too.

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #3 on: August 09, 2009, 09:54:01 AM »
Hi Rak'kar,

There still seems to be a problem with the stats.  The total amounts (bitsReceived and  totalBitsSent), seem to work correctly, but bitsPerSecondReceived and bitsPerSecondSent are 0 (or there abouts i.e. 1e-11 to 1e-13).  I am expecting around 10-15Kbits/sec

Thanks for your help
All the best,
Ash

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #4 on: August 09, 2009, 11:20:38 AM »
Fixed now.

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #5 on: August 09, 2009, 11:27:18 AM »
Perfect, that did the trick!

Thanks again
Ash

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #6 on: August 11, 2009, 03:08:13 AM »
After quite a bit of testing it seems that the old version uses a lot less bandwidth than the UDT version.  I am sending very small bits of info very regularly.

For example I'm getting a peak of around 300Kbits/sec with the old version and 600KBits/sec with UDT and on normal use (when the cars spread out a little) I am getting around 100-150Kbits/sec with the old version and 200-300Kbits/sec with UDT.

Perhaps I am doing something in a strange way.  My clients are sending position updates to the server at about 30Hz, the server is processing these as quick as possible (around 300Hz) and passing the updates onto the clients right away.  Do you think I could send more than one car update at a time from the Server to the Client?  I'm worried that this may increase the latency a little.

Thanks for your help!
All the best,
Ash

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #7 on: August 11, 2009, 10:12:17 AM »
* EDIT *

I just checked in a change to the trunk that should help

"Reduce bandwidth usage by 2 bytes per message, and 4 bytes per datagram"
« Last Edit: August 11, 2009, 10:28:35 AM by Rak'kar »

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #8 on: August 11, 2009, 10:33:16 AM »
Yeah, those 8 bytes really must make the difference. 

My packet sizes are quite small: -

High Detail - 356 bits
Med Detail - 212 bits
Low Detail - 72 bits

The high detail is only when pretty close up, so fairly rarely used.  It think the low detail is mostly used (i'm going to add some stats to see which are used the most), so 64bits extra for those is a big chunk.

1. I have already commented out the 64bit time - although I think i'll probably want to use 64 time before i release.
2. I'll check how many messages are being sent out with 10 players, although i hope to support around 20 players

BTW I am sending a single driver to each other single driver at 30Hz or so.  Looks like combining them into a single message for all other drivers may be the way to go to reduce datagram overhead? I tried this before and got poor results, but i couldnt quite figure out why.

Are you going to get rid of the old way completely?  I was just starting to grow quite fond of it :)

Thanks again for the help and a great lib!
All the best,
Ash

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #9 on: August 11, 2009, 02:30:24 PM »
I reduced bandwidth further and checked it in

1 less byte for all datagrams
1 less byte for all messages
1 less byte for UNRELIABLE_SEQUENCED,RELIABLE_SEQUENCED,RELIABLE_ORDERED
2 less bytes for split messages (Over the MTU size in bytes)

Also, some further performance optimizations that will affect large data transfers.

You don't need to combine messages, RakNet does it automatically anyway. I'm getting rid of the old way once the new way has been tested for a while. My goal is not to have a new way that is worse in some aspects but better in others, but to be better in all aspects.

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #10 on: August 11, 2009, 02:36:48 PM »
This is the overhead used by the new system:

Per datagram:
1 byte for bitflags
4 bytes for timestamp, used to calculate RTT for congestion control
3 bytes for sequence number, used to lookup datagrams for ACKs

Per message:
2 bytes for message length
3 bytes for sequence number, used to prevent returning to the user duplicated messages
1 byte for bitflags
if (UNRELIABLE_SEQUENCED,RELIABLE_SEQUENCED,RELIABLE_ORDERED)
   3 bytes for sequence number, used to reorganize messages in order
   1 byte for ordering channel
if (message over MTU)
   4 bytes for number of splits, not compressed to improve performance
   2 bytes for identifier for which split this is
   4 bytes for index into number of splits, not compressed to improve performance

A message is the data you send from the game. All messages you send to RakNet between RakNet update ticks are grouped together in one datagram. So if you send 1 message only, then the overhead is 1 datagram + 1 message. If you send 5 messages, then it's 1 datagram + 5 messages. If you send 1 message, but it's 10 times bigger than the MTU, then it sends 10 datagrams, each containing 1 message (the message gets split up)

This is the overhead used by the old system:

Per datagram:
1 bit for bitflags
8 bytes for timestamp, used to calculate RTT for congestion control

Per message:
4 bytes for message length
4 bytes for sequence number, used to prevent returning to the user duplicated messages
4 bits for bitflags
if (UNRELIABLE_SEQUENCED,RELIABLE_SEQUENCED,RELIABLE_ORDERED)
   4 bytes for sequence number, used to reorganize messages in order
   5 bits for ordering channel
if (message over MTU)
   4 bytes for number of splits, but compressed, so used 1-2 bytes on average
   4 bytes for identifier for which split this is
   4 bytes for index into number of splits, but compressed, so used 1-2 bytes on average
« Last Edit: August 11, 2009, 02:51:56 PM by Rak'kar »

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #11 on: August 11, 2009, 03:31:51 PM »
Hi Rak'kar,

Thanks for the detailed explanation, it really helps to understand what is going on.   You've obviously put a lot of effort into shaving off some extra bytes :)

I have just tested it with the code from the trunk and there is definitely a marked improvement

Typically 400-430KBits/sec at the start and 150-200 during a race.  Its not quite at the level of the old method, but its getting there.  Would it be possible that the new method is sending more times as it doesn't have limited bandwidth on my LAN (and detects this better than the old method).  If that was the case, then possibly it would be ok with a "proper" test on the internet?

Perhaps I'm clutching at straws :)

Thanks again for all the effort, I will continue to feed back my experience if it is helpful for you.  If there is anything else I can help with, let me know.

All the best,
Ash

Rak'kar

  • Administrator
  • Hero Member
  • *****
  • Posts: 6895
  • Karma: 291
    • View Profile
    • RakNet
Re: UDT congestion control algorithm implemented
« Reply #12 on: August 11, 2009, 10:00:03 PM »
You could use unreliable, instead of unreliable sequenced, and write your own sequence number (one byte). If the incoming sequence number is less than the greatest one you've gotten so far, just ignore the message. A one byte sequenced number would limit you to 256 sends before the variable overflowed, but that should be plenty if you are only sending 30 times per second. RakNet has to use 3 bytes because it doesn't know what the user will do, so has to work with the worst-case scenario. So this would save you 4 bytes off the RakNet overhead, while introducing 1 byte of your own.

Here's some code to determine greater than or less than, with a variable that intentionally overflows:

Code: [Select]
bool CCRakNetUDT::GreaterThan(DatagramSequenceNumberType a, DatagramSequenceNumberType b)
{
// a > b?
const DatagramSequenceNumberType halfSpan =(DatagramSequenceNumberType) (((DatagramSequenceNumberType)-1)/(DatagramSequenceNumberType)2);
return b-a>halfSpan;
}
// ----------------------------------------------------------------------------------------------------------------------------
bool CCRakNetUDT::LessThan(DatagramSequenceNumberType a, DatagramSequenceNumberType b)
{
// a < b?
const DatagramSequenceNumberType halfSpan = ((DatagramSequenceNumberType)-1)/(DatagramSequenceNumberType)2;
return b-a<halfSpan;
}

In your case, replace DatagramSequenceNumberType with unsigned char.

The current version has less overhead than the old version. So if you're getting greater overhead, it's probably that your sends are going out in multiple datagrams rather than the same one.  Using a thread sleep timer of 10 is the best value with the current congestion control. This will ensure responsiveness while still getting multiple messages to go out in the same datagram.

Also, which value in RakNetStatistics are you using to determine how much bandwidth you are sending? I will check to see if the variable uses the same calculation as the old version. The calculation can be fuzzy because it depends on whether you are counting the UDP header or not (which other systems tend not to do, but RakNet tends to do).

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #13 on: August 12, 2009, 02:55:49 AM »
Hi Rak'kar,

Thanks, I'll give that a go, sounds like it would be a useful saving.

I am using bitsPerSecondReceived / bitsPerSecondSent for each connection and totalling them up: -

Code: [Select]
for (int i=0; i < connectedCount; ++i){
stats = raknetServer->getStatistics(i);
instTotalBitsSent += stats->bitsPerSecondSent;
instTotalBitsRec += stats->bitsPerSecondReceived;
cout << "(" << i << ") KBits/s: R(" << stats->bitsPerSecondReceived / 1024.0 << ") S("  << stats->bitsPerSecondSent / 1024.0 << ")" << endl;
totalBitsSent += stats->totalBitsSent;
totalBitsRec += stats->bitsReceived;
}

double bpsReceived= (double) totalBitsRec / elapsedTime;
double bpsSent = (double) totalBitsSent / elapsedTime;
cout << "Avg Total KBits/s: R(" << bpsReceived/ 1024.0 << ") S("  << bpsSent/ 1024.0 << ")" << endl;
cout << "Inst Total KBits/s: R(" << instTotalBitsRec/ 1024.0 << ") S("  << instTotalBitsSent/ 1024.0 << ")" << endl;l

I am using the "Instant" values as it varies greatly between the start of the race and when the cars start to spread.

I was sleeping for 6ms I upped it to 10ms and it seemed to get slightly worse for some reason.  I'll do some more testing and see what I can come up with.

Thanks!
Ash

AshMcConnell

  • Full Member
  • ***
  • Posts: 169
  • Karma: 3
    • View Profile
    • Sirocco Racing Simulator
Re: UDT congestion control algorithm implemented
« Reply #14 on: August 12, 2009, 06:25:18 AM »
Hi Rak'kar,

I've had an idea - since all my packets are timestamped, is sequencing needed at all, the timestamp can be used as a sequence?

Seems that might work, do you see any reason why not?
All the best,
Ash