Changed 5 years ago by ReturningNovice
Attachment:bwgraph20-6d00.png added
Graph from OP
comment:1 Changed 5 years ago by foible
Looking into this further, it seems very feasible to improve. There are some challenges, though, such as the fact that new tunnels take some time to pass traffic, so the router must "predict" a couple of minutes into the future based on its current bandwidth usage. But this can be handled: not by literally predicting, but by improving the approximations it's already using.
The relevant code is in RouterThrottleImpl.java, primarily on lines 389-391:
// limit at 90% - 4KBps (see above)
float maxBps = (maxKBps * 1024f * 0.9f) - MIN_AVAILABLE_BPS;
float pctFull = (maxBps - availBps) / (maxBps);
double probReject = Math.pow(pctFull, 16); // steep curve
There is also a cap on new tunnels if bandwidth usage is > 90% on line 360, which I refer to as a "soft" limit, below.
I ran some crude, approximate numbers on this function to get a rough "feel" for router rejection behavior, not counting the MIN_AVAILABLE_BPS (which is 4kBps) that's cut off of the top:
- 80% of the 90% "soft limit" (72% of user-set b/w limit used) yields 3% rejection
- 85% of the 90% "soft limit" (76.5% of user-set b/w limit used) yields 7% rejection
- 90% of the 90% "soft limit" (81% of user-set b/w limit used) yields 18.5% rejection
- 95% of the 90% "soft limit" (85.5% of user-set b/w limit used) yields 44% rejection
- 98% of the 90% "soft limit" (88.2% of user-set b/w limit used) yields 72% rejection
- 99% of the 90% "soft limit" (89.1% of user-set b/w limit used) yields 85% rejection
- 99.5% of the 90% "soft limit" (89.55% of user-set b/w limit used) yields 92% rejection
Anything above that goes right to 100% rejection. So basically, the rejection decision as it is now looks a bit like this when graphed (NOT TO SCALE!!):
r |                                  ________ 100%
e |                                  |
j |                                  |
e |                                  ;
c |                                  |
t |                                  ;
  |                                  |
% |__________________________________|_______
   0%          bandwidth usage          100%
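For anyone who wants to check or extend the numbers above, here is a throwaway snippet of mine (only the pctFull^16 formula comes from the code quoted earlier; everything else is just presentation) that prints the same table:

```java
// Standalone check of the rejection curve quoted above: probReject = pctFull^16.
// My own scratch code; only the formula comes from RouterThrottleImpl.java.
public class RejectTable {
    public static void main(String[] args) {
        // usage expressed as a fraction of the 90% "soft limit",
        // ignoring MIN_AVAILABLE_BPS as in the crude numbers above
        double[] softLimitFractions = {0.80, 0.85, 0.90, 0.95, 0.98, 0.99, 0.995};
        for (double f : softLimitFractions) {
            double probReject = Math.pow(f, 16); // steep curve, exponent 16
            // f * 0.9 converts to a fraction of the user-set bandwidth limit
            System.out.printf("%5.1f%% of soft limit (%5.2f%% of user limit) -> %5.1f%% rejection%n",
                    f * 100, f * 90, probReject * 100);
        }
    }
}
```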
I'm pretty sure that what I noticed in my router's bandwidth graph (the observation that started this whole process, visible in the attached image) is the result of two things: the big, nearly vertical wall you see here, and the wide region of 100% rejection that persists even while the router still has some capacity and is constantly expiring tunnels anyway.
So, I see three goals for tuning here:
- Make the existing transition from "few tunnels rejected" to "nearly all tunnels rejected" (the "vertical wall") occur over a wider span, to help prevent very sudden drop-offs in participating traffic.
- Make rejecting 100% of tunnels a less common state; the router graphs (like the one attached earlier) clearly show, to me, that this is over-compensating.
- Start rejecting a small number of tunnels at lower bandwidth-usage percentages, to avoid "biting off more than we can chew" and ending up with too many tunnels. The current setup already achieves this fairly well, though mostly by being very conservative. This goal may be optional, but could become necessary depending on how aggressive the other tweaks are.
Fortunately, the way this code is written now, it's very tunable, and very smart. I have done some preliminary experiments with reducing the exponent (from its current 16) and believe that this is a promising avenue to pursue.
However, I believe an exponential function may not actually be the best way to determine this throttling at all, pretty as it is. For our purposes, I think a simple, straight slope, still with a 100% rejection region (starting above 95% instead of above 90%, say), would work much better, but of course this must be tested. In that case, the graph would look like this:
r |                                      ___ 100%
e |                                     /
j |                                    /
e |                                   /
c |                                  /
t |                                 /
  |                                /
% |_______________________________/_________
   0%          bandwidth usage          100%
Except the slope would be perfectly straight, of course.
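To make the straight-slope idea concrete, here is a minimal sketch of mine (untested in a real router; both thresholds are illustrative guesses, not values from RouterThrottleImpl.java) of a piecewise-linear replacement for the Math.pow call:

```java
// Hedged sketch of a piecewise-linear rejection curve. The thresholds are
// invented for illustration, not taken from the I2P codebase.
public class LinearReject {
    static double probReject(double pctFull, double rejectStart, double rejectAll) {
        if (pctFull <= rejectStart) return 0.0;  // plenty of headroom: accept everything
        if (pctFull >= rejectAll)   return 1.0;  // e.g. above 95% usage: reject everything
        // straight slope between the two thresholds
        return (pctFull - rejectStart) / (rejectAll - rejectStart);
    }

    public static void main(String[] args) {
        // sample the curve with rejection starting at 20% usage, total at 95%
        for (int u = 0; u <= 100; u += 10) {
            System.out.printf("usage %3d%% -> reject %5.1f%%%n",
                    u, probReject(u / 100.0, 0.20, 0.95) * 100);
        }
    }
}
```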
If it is to remain exponential, though, I think an exponent of six or less would work better, potentially even as low as two.
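For a feel of what lowering the exponent does, this quick comparison (again just my own scratch code) evaluates the curve at a few usage levels for exponents 16, 6, and 2:

```java
// Compare the steepness of pctFull^16 vs. pctFull^6 vs. pctFull^2.
public class ExponentCompare {
    public static void main(String[] args) {
        double[] usage = {0.50, 0.70, 0.80, 0.90, 0.95};
        System.out.println("usage    ^16      ^6      ^2");
        for (double u : usage) {
            System.out.printf("%4.0f%%  %6.1f%%  %5.1f%%  %5.1f%%%n",
                    u * 100,
                    Math.pow(u, 16) * 100,
                    Math.pow(u, 6) * 100,
                    Math.pow(u, 2) * 100);
        }
    }
}
```

An exponent of 2 is already rejecting 25% of requests at 50% usage, where the exponent-16 curve is still near zero, so a lower exponent trades the current late, steep wall for earlier, gentler pushback.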
Naturally, all of this conjecture is meaningless without testing. An i2p router is a very chaotic thing, and there are feedback considerations involved: bandwidth-shaping parameters affect future bandwidth, which affects future bandwidth-shaping parameters, and so on. We need real data to see what works. Thankfully, that data should be relatively easy to get, and it doesn't require anyone other than the person doing the testing to do anything. What I hope to find is a configuration where total bandwidth usage stays mostly around 95-99% of the user-specified maximum, with minimal variance. I think this is achievable, though I'd settle for, say, 88%-104%, which would still be an improvement.
Everyone and anyone else is of course welcome to tweak the code and do some testing and/or maths of their own; let's see if we can find something that works. It would also be good to re-work these numbers without disregarding MIN_AVAILABLE_BPS, especially at various total bandwidth limits, since it is a fixed value. Finally, this kind of problem is probably well-suited to a plain Monte Carlo simulation, or at least a three- or four-dimensional graph of the possibilities, to determine good potential configurations, but I do not have the skills to create either of those.
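Since I mentioned Monte Carlo: below is a deliberately crude toy simulation of the feedback loop, just to show the shape such a test could take. Every parameter (per-tunnel bandwidth, arrival rate, tunnel lifetime) is an invented placeholder, not an I2P measurement; it only demonstrates that usage drives rejection, which drives future usage.

```java
import java.util.Random;

// Toy Monte Carlo sketch of the usage/rejection feedback loop. All numbers
// here are invented placeholders, not measurements from a real I2P router.
public class ThrottleSim {
    static double simulateMeanUsage(int exponent, int ticks, long seed) {
        Random rnd = new Random(seed);
        final double limit = 100.0;       // bandwidth budget per tick, arbitrary units
        final double perTunnel = 2.0;     // assumed bandwidth per participating tunnel
        final int lifetime = 10;          // ticks before an accepted tunnel expires
        final int requestsPerTick = 8;    // assumed tunnel-request arrival rate

        int[] expiring = new int[lifetime]; // ring buffer of tunnels by expiry tick
        int active = 0;
        double sum = 0;
        for (int t = 0; t < ticks; t++) {
            active -= expiring[t % lifetime];  // expire tunnels built `lifetime` ticks ago
            expiring[t % lifetime] = 0;
            double usage = Math.min(1.0, active * perTunnel / limit);
            double pReject = Math.pow(usage, exponent); // the curve under discussion
            for (int r = 0; r < requestsPerTick; r++) {
                if (rnd.nextDouble() >= pReject) {  // accept the request
                    active++;
                    expiring[t % lifetime]++;
                }
            }
            sum += usage;
        }
        return sum / ticks;
    }

    public static void main(String[] args) {
        for (int exp : new int[] {16, 6, 2}) {
            System.out.printf("exponent %2d -> mean usage %.1f%%%n",
                    exp, simulateMeanUsage(exp, 2000, 42) * 100);
        }
    }
}
```

With these made-up parameters a lower exponent settles at a lower mean usage; the interesting question for real testing is the variance around that mean, which is what the attached graph shows going wrong.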
Please comment with any thoughts you might have below.
comment:2 Changed 5 years ago by foible
Another thought I've just had: the 90% usage soft limit might be a better fit for routers sharing less bandwidth than mine, and conversely, a 95% cutoff might still be too conservative for very high-throughput routers. So another thing to consider would be either making those cutoffs absolute in kBps (for instance by lumping them in with MIN_AVAILABLE_BPS, and having the slope run all the way to "100%" of bandwidth usage instead of 95%), or scaling the "soft limit" percentage with the total bandwidth limit. Just another small piece of this to think about.
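To make the absolute-headroom idea concrete, here is a tiny hedged sketch (my own naming and parameter; RouterThrottleImpl.java currently hard-codes a 0.9 factor plus MIN_AVAILABLE_BPS instead): reserve a fixed number of kBps rather than a percentage, so small and large routers keep the same absolute safety margin.

```java
// Hedged sketch of an absolute (rather than percentage-based) soft limit.
// reservedKBps is an invented parameter for illustration only.
public class SoftLimit {
    static float softLimitBps(float maxKBps, float reservedKBps) {
        // Reserve fixed headroom; the rejection slope can then run all the
        // way to 100% of this value instead of stopping at 90% or 95%.
        return (maxKBps - reservedKBps) * 1024f;
    }

    public static void main(String[] args) {
        // A 100 KBps router and a 4000 KBps router both reserve 16 KBps:
        System.out.println(softLimitBps(100f, 16f));   // 86016.0
        System.out.println(softLimitBps(4000f, 16f));  // 4079616.0
    }
}
```

The small router gives up 16% of its limit as headroom while the large one gives up 0.4%, which is the opposite of what a flat 10% does today.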
It would be great to hear from people with routers that are different than my own, to get a sense of how their bandwidth graph looks (and how much potential bandwidth is "wasted"), and how it might look with some tuning done.
Last edited 5 years ago by foible
comment:3 Changed 5 years ago by zzz
Status:new → open
The attached graph makes it pretty obvious that we could be more efficient. I'm not worried about where we set the cap - whether it's at 95, or 98, or 100, or 105% of the setting - but it would be nice to stay roughly at a cap rather than swing wildly. So I'm in agreement with a goal of, e.g., 88-104 as you state above.
There are a couple dozen places in the code where we can reject a tunnel. I've tried to word each one slightly differently, so you can keep an eye on the console and trace a rejection back to the code. You can also set some logging levels judiciously; that may help. There are indeed places in the code that are only hit by very high-bandwidth routers, which only a few people can test changes for.
I encourage you to experiment with code changes and see what improvements you can come up with. I'm certainly open to accepting well-reasoned and well-tested patches. Sounds like you're on the right track. I don't have an opinion on the details, esp. the exponential vs. straight-line hypothesis, you may only get to an answer by testing. Also, recall that how the requesting routers respond - do they back off from more requests, and how long does it take - is a factor to consider.