No blocks on Wednesday: explanation from d.net

  • bkor
  • Registered: November 2000
Message from Jeff Lawson on the rc5 list:

There was indeed a period of disconnectivity on Wednesday morning (U.S. time) during which much of the network was unable to queue any additional workunits to or from the keymaster for a handful of hours.

During this time, all of the uploaded results were held backlogged on the fullservers, and they continued to issue new workunits from their queues, as they are designed to do in such conditions.
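The store-and-forward behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class name, its fields, and the `delivered` list standing in for the keymaster are all hypothetical and are not taken from the actual proxy codebase.

```python
from collections import deque


class FullserverSketch:
    """Toy model of a proxy that buffers in both directions (hypothetical)."""

    def __init__(self, workunits):
        self.work_queue = deque(workunits)  # blocks fetched earlier from upstream
        self.result_backlog = deque()       # results waiting to go upstream
        self.upstream_reachable = True
        self.delivered = []                 # stands in for the upstream server

    def issue_workunit(self):
        """Hand out work from the local queue; None once it is exhausted."""
        return self.work_queue.popleft() if self.work_queue else None

    def accept_result(self, result):
        """Always accept a result locally, then try to forward the backlog."""
        self.result_backlog.append(result)
        self.flush()

    def flush(self):
        """Forward backlogged results in arrival order while upstream is up."""
        while self.upstream_reachable and self.result_backlog:
            self.delivered.append(self.result_backlog.popleft())
```

With `upstream_reachable` set to `False`, clients can still fetch from `work_queue` and hand in results; nothing is lost, and a later `flush()` drains the backlog once the link returns.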

Due to the way the statistical information presented on the proxyinfo pages is gathered, all of the uptime, checkin periods, and buffer levels will only reflect their last received values if the central proxyinfo database cannot be updated or contacted (by either the website or the fullservers). This may give the impression that all of the servers are down, when in actuality they merely cannot report their status to the proxyinfo database (hosted at the keymaster, and mirrored periodically to the machines hosting the website).
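One way to avoid that misreading is to show the age of each report next to its value. The following is a minimal sketch, not how the proxyinfo pages actually work: the record layout, the field names, and the 15-minute freshness threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StatusRecord:
    server: str
    buffer_level: int
    last_report: float  # time of the last successful report, in seconds


def describe(record, now, max_age=900.0):
    """Label a record as fresh or stale; a stale record says nothing about the server itself."""
    age = now - record.last_report
    if age <= max_age:
        return f"{record.server}: {record.buffer_level} blocks buffered"
    return (
        f"{record.server}: no report for {age / 3600:.1f} h "
        f"(last known: {record.buffer_level} blocks)"
    )
```

A stale record is then reported as "no report", which is a statement about the reporting path rather than about the server.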

Once connectivity was reestablished, all of the backlog was successfully dispatched within a few hours. All fullservers and the keymaster performed admirably during this recovery period. In particular, it gave us the opportunity to see the performance benefits of recent software architecture changes to the keymaster codebase that give it a higher degree of parallelization. The outage also permitted us to fine-tune some of the performance-affecting controls used by the keymaster.

We generally try to ensure that all fullservers have enough workunits at all times to withstand an extended period of full disconnectivity of five or more hours, at each of their average consumption rates. However, due to the duration of the outage, at least a couple of the fullservers prematurely exhausted their supplies of available blocks, so clients would eventually reinitiate their connections to one of the alternate fullservers that had not yet been exhausted. (This is why it is far preferable to configure your clients to use one of the large geographic DNS round-robins, such as us.v27.distributed.net, instead of only a specific fullserver or IP address.) Since each fullserver operates independently while it is disconnected, an exhausted fullserver may give the impression that the entire network does not have available workunits, when in actuality only that specific fullserver has been exhausted.
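To illustrate why a round-robin name helps, here is a rough sketch of how a client could walk through every address behind such a name until one accepts a connection. It is not the distributed.net client's actual logic: the function names and the injectable `connect` parameter are hypothetical, the port must be supplied by the caller, and the sketch only fails over on connection errors, not on a server that answers but has no blocks left.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct address the round-robin name currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    seen = []
    for family, _type, _proto, _canon, sockaddr in infos:
        if (family, sockaddr) not in seen:
            seen.append((family, sockaddr))
    return seen


def _tcp_connect(family, sockaddr, timeout):
    """Open a plain TCP connection, closing the socket again on failure."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_first_available(candidates, timeout=5.0, connect=None):
    """Try each candidate in turn; return (sockaddr, connection) for the first that answers."""
    connect = connect or _tcp_connect
    last_error = None
    for family, sockaddr in candidates:
        try:
            return sockaddr, connect(family, sockaddr, timeout)
        except OSError as exc:
            last_error = exc  # unreachable; move on to the next address
    raise ConnectionError("no server reachable") from last_error
```

A client pinned to a single IP address has exactly one candidate in that list, so one unreachable fullserver stops it entirely. Python's standard `socket.create_connection` already tries all resolved addresses in turn; the explicit loop is shown only to make the idea visible.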

We are currently performing final validation of a new release build of the proxy codebase and will be formally propagating this new build to all of the fullservers within the next week or so. A couple of our fullservers are already running these new builds and have not encountered any issues with them yet. Although the main features of this new build focus primarily on performance improvements at the keymaster, there are also a number of networking performance improvements that should be realized by the fullservers. None of these changes should be noticeable to clients or affect them directly. The one exception is that personal proxy builds prior to build 309 (released Jul 1999) will now become officially unusable on the network. This planned drop of support was first announced several months ago: http://lists.distributed.net/hypermail/proxyper.Dec2000/0016.html

One of the plans that I currently have in the preliminary development stage is to allow the fullserver network to support a topology in which each fullserver maintains connections to multiple keymasters simultaneously and can divert requests to one of the others. This will serve not only to assist in load balancing, but also to reduce the impact of large losses of connectivity by allowing the multi-homed nature of many of our fullservers to be utilized better.
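Since this plan is still at a preliminary stage, the following is purely a guess at the general idea and not a description of the eventual design: a pool that rotates requests across several upstreams for load balancing and skips any that are currently marked unreachable. The class name and the keymaster labels in the usage example are hypothetical.

```python
from itertools import cycle


class UpstreamPool:
    """Rotate requests across several upstreams, skipping unreachable ones (hypothetical)."""

    def __init__(self, names):
        self.names = list(names)
        self.reachable = {name: True for name in self.names}
        self._rotation = cycle(self.names)

    def mark(self, name, is_reachable):
        """Record whether an upstream is currently reachable."""
        self.reachable[name] = is_reachable

    def pick(self):
        """Return the next reachable upstream in rotation, or None if all are down."""
        for _ in range(len(self.names)):
            name = next(self._rotation)
            if self.reachable[name]:
                return name
        return None


pool = UpstreamPool(["km-a", "km-b"])
first, second = pool.pick(), pool.pick()  # alternates while both are up
pool.mark("km-a", False)                  # simulate losing one link
after_loss = pool.pick()                  # only km-b is offered now
```

As long as at least one upstream stays reachable, `pick()` keeps returning a usable target, which is the property that would reduce the impact of a large loss of connectivity.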