Update from a WCG tech about today's server outages.
Interim update -
We have regained access to our project at Nibicloud. We have ssh access to our servers again, and I am in the process of damage control now before restarting the feeder.
Most of our servers/VMs remained online during the outage, but some appear to have been soft rebooted, losing in-memory caches that I need to repopulate from Kafka/Redpanda. Should have everything back up and running "soon"; my current best estimate is somewhere between a few hours from now and tomorrow morning.
Validation should improve when I am done, as I have the opportunity to push some changes: separate the validation streams for old result pair upload events vs. new result pair upload events, launch additional validators with code changes to stripe them on workunit ID within the node-local partition, and add a second tier of batching so that load on the BOINC db does not spike when multiple validator_assimilator daemons try to batch update state and credit at once.
I will update here in the forums once I get through everything, and hopefully can address some of the concerns raised in the forums. If all goes well, the plan is to start MAM1 beta for Windows as soon as MCM1 is flowing and validating again, along with a new build for Linux. Both will run through rounds of beta30 before we run some smoke tests in the MAM1_9999903+ range.
[Jan 15, 2026 9:27:50 PM]
The validation servers have not been able to keep up for well over 10 days now. Hopefully this will improve, now that the opportunity is being taken to push updates during this downtime.
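For anyone wondering what "stripe on workunit ID" and "a second tier of batching" could look like in practice, here is a rough Python sketch. It is only my own illustration, not WCG's actual code: the names `stripe_for`, `route` and `BatchingWriter` are made up, plain modulo striping is an assumption, and the "database" is just a callable that receives a list of updates. The idea is that all results for one workunit end up at the same validator, and that the validators hand their updates to a shared batcher that writes them in one go instead of each daemon hitting the db separately.

```python
from collections import defaultdict


def stripe_for(workunit_id: int, num_validators: int) -> int:
    """Pick the validator that owns a workunit (assumed: simple modulo striping)."""
    return workunit_id % num_validators


def route(events, num_validators):
    """Group (workunit_id, payload) events by owning validator, keeping arrival order."""
    stripes = defaultdict(list)
    for workunit_id, payload in events:
        stripes[stripe_for(workunit_id, num_validators)].append((workunit_id, payload))
    return dict(stripes)


class BatchingWriter:
    """Hypothetical second-tier batcher: collects updates from several validators
    and passes them to the sink (e.g. one db write) once enough have piled up."""

    def __init__(self, flush_size, sink):
        self.flush_size = flush_size
        self.sink = sink      # callable taking a list of updates
        self.pending = []

    def add(self, updates):
        self.pending.extend(updates)
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink(self.pending)
            self.pending = []  # rebind so the flushed list stays untouched


if __name__ == "__main__":
    events = [(10, "a"), (11, "b"), (12, "c"), (13, "d"), (10, "e")]
    stripes = route(events, num_validators=2)
    writer = BatchingWriter(flush_size=3, sink=lambda batch: print("write", batch))
    for validator_id in sorted(stripes):
        writer.add(stripes[validator_id])
    writer.flush()  # push out whatever is left
```

Because both results of a pair share a workunit ID, they always land in the same stripe, so adding validators should not split a pair across daemons. How the real validators partition and batch is of course up to the WCG techs.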