A minor maintenance turning into an epic crash… and some data loss

If you saw the SmartBots website offline today, we are really sorry about that! This was the result of an epic crash we faced today. The techy details follow for those who are interested or curious.

Once is a chance

We decided to run a minor maintenance this morning. It was supposed to be a regular replacement of an SSD drive (these drives wear out every 1.5 – 2 years). The drives (say, A and B) are joined into a RAID array, so it was enough to pull out one drive (drive B) and insert a fresh one.

Maintenance: we have good news and, well, more good news

We are working on integrating bots and Second Life with Discord. Thus, the bots just got an experimental brain upgrade which lets each of them act as a Discord bot in addition to a Second Life bot.

We rolled out the experimental release yesterday, and, ehm, it was too experimental. A few hours later, the bots in question started crashing everything around them, eating memory without washing their hands first… you know, very inappropriate behavior for good bots!

The reason has been found: Discord’s unstoppable fun made our decent bots go crazy! Our team made and applied a special Discord-resistance patch to all of the bots. Everything has been calm since then!

No kidding: we did a release with new experimental integrations, and it messed some things up. A few updated bots managed to overload and crash two of our servers, causing account and system lags. The problem is fixed now.

If you are using Discord for your Second Life activity, please contact us (either by replying to this blog post, or by contacting our support). We will be glad to know the ways you use Discord!

Today’s bot outages (the techy post)

Some bots went offline today, which led to web account outages. The cause was one of our bot servers, which died because of a hard drive failure. The affected bots were quickly moved to a nearby server… which crashed under the extra load, too. This was unexpected.

The good news is that while our server team was trying to revive these machines, the development team managed to build a system which allows SmartBots to quickly detach (or, say, temporarily forget) any set of broken bots. This will also help during SL rolling restarts, when large random sets of bots go offline.
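
For the curious, the "detach" idea could look roughly like the sketch below. It is only an illustration and not the actual SmartBots code: the `BotRegistry` class, its method names, and the bot IDs are all hypothetical, made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BotRegistry:
    """Hypothetical registry of the bots the system actively manages."""
    active: set = field(default_factory=set)
    detached: set = field(default_factory=set)

    def detach(self, bot_ids):
        """Temporarily forget the given bots; return the ones actually detached."""
        moved = self.active & set(bot_ids)
        self.active -= moved
        self.detached |= moved
        return moved

    def reattach(self, bot_ids):
        """Bring previously detached bots back; return the ones actually restored."""
        moved = self.detached & set(bot_ids)
        self.detached -= moved
        self.active |= moved
        return moved


# Example: a server dies and takes two bots with it (IDs are made up).
registry = BotRegistry(active={"bot-a", "bot-b", "bot-c"})
registry.detach(["bot-b", "bot-c"])   # the rest of the system keeps running
registry.reattach(["bot-b"])          # bring one back once it is healthy again
```

The point of keeping a separate `detached` set in this sketch is that forgotten bots are not lost: they can be restored later without touching the bots that stayed online.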

We are sorry if your bots were offline today. Everything is running smoothly now!

P.S. We are going to completely replace the second, unreliable server tomorrow. All bots are expected to stay online during the migration.

The major outage has been fixed, the minor one is to be fixed soon

SmartBots is experiencing network connection issues today. Our network provider is changing the network structure, and some interconnections have suddenly been broken.

Honestly, this is mostly our fault: we were warned about the possible issues beforehand, and even prepared a reserve gateway. But routing all the traffic through a single gateway was quite unwise!

We have just solved the major issue, letting bot traffic flow. However, some minor services are still experiencing problems. For example, third-party connections (like Qubic or the BankBot add-on) may still be down. We hope to resolve this within an hour or so.
