Today’s bot outages (the techy post)

Some bots went offline today, which led to web account outages. The cause was one of our bot servers, which died because of a hard drive failure. The affected bots were quickly moved to a nearby server… which then crashed under the extra load, too. This was unexpected.

The good news is that while our server team was trying to revive these machines, the development team built a system which allows SmartBots to quickly detach (or, say, temporarily forget) any set of broken bots. This will also help during SL rolling restarts, when large random sets of bots go offline.

We are sorry if your bots were offline today. Everything is running smoothly now!

P.S. We are going to completely replace the second, unreliable server tomorrow. All bots are expected to stay online during the migration.

The major outage has been fixed, the minor one is to be fixed soon

SmartBots is experiencing network connection issues today. Our network provider is changing its network structure, and some interconnections have suddenly broken.

Honestly, this is mostly our fault: we were warned about the possible issues beforehand, and even prepared a reserve gateway. But routing all the traffic through a single gateway was quite unwise!

We have just solved the major issue, so bot traffic is flowing again. However, some minor services are still experiencing problems. For example, third-party connections (like Qubic or the BankBot addon) may still be down. We hope to resolve this within an hour or so.

[SOLVED] Network issues and numerous bot restarts

Bots are up and running

Today we were fighting a network issue which caused some bots (actually, about 10% of our hosted bots) to log in and log out periodically.

The problem started with some bots freezing: they were not responding to commands from the watchdog service (this is an internal SmartBots service which pokes bots periodically and restarts those which have lost their connection to reality). Thus, bots began to restart.
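For the curious, the watchdog idea described above can be sketched roughly as follows. This is only an illustration under our own assumptions, not the actual SmartBots code: the `Bot` class, its `ping` and `restart` methods, and the `watchdog_pass` function are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bot:
    """Hypothetical stand-in for a hosted bot."""
    name: str
    responsive: bool = True
    restarts: int = 0

    def ping(self) -> bool:
        # A real watchdog would send a command and wait for a reply
        # with a timeout; here we just report a stored flag.
        return self.responsive

    def restart(self) -> None:
        # In this sketch a restart is only counted. It does not make
        # the bot responsive again, much like the frozen bots today.
        self.restarts += 1


def watchdog_pass(bots):
    """Poke every bot once and restart those that do not answer.

    Returns the names of the bots restarted during this pass.
    """
    restarted = []
    for bot in bots:
        if not bot.ping():
            bot.restart()
            restarted.append(bot.name)
    return restarted
```

Calling `watchdog_pass` on a list with one healthy and one frozen bot restarts only the frozen one. If the underlying problem is not in the bot itself, every following pass restarts the same bots again, which is the loop we ran into.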

However, the freshly restarted bots were still unavailable: they connected but then froze again. Our emergency team got a signal that something was wrong: the number of crazy bots had increased to several percent. Even during SL rolling restarts, this figure never exceeds 2-3% of all bots.
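The kind of signal mentioned above can be pictured as a simple share check. The sketch below is an assumption for illustration only: the function names and the 3% default threshold (taken from the upper end of the rolling-restart figure) are ours, not the real alerting rule.

```python
def unhealthy_share(unhealthy: int, total: int) -> float:
    """Return the fraction of bots that are currently misbehaving."""
    if total <= 0:
        raise ValueError("total must be positive")
    return unhealthy / total


def should_alert(unhealthy: int, total: int, threshold: float = 0.03) -> bool:
    """Raise the alarm when the unhealthy share is above the threshold.

    The 0.03 default is a hypothetical value based on the 2-3% seen
    during SL rolling restarts.
    """
    return unhealthy_share(unhealthy, total) > threshold
```

With 1000 hosted bots, 20 misbehaving bots would stay below this threshold, while 100 (about 10%) would trigger the alarm.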
