Archive for the ‘Central Database Cluster’ Category

[RESOLVED] Database Issues

Monday, May 5th, 2008

[RESOLVED 2:13 PM PDT] - The database has been recovered and all services should now perform normally.

We are currently experiencing issues with our database. As such, please refrain from transactions. You may experience slow or problematic logins and other database-related issues during this time.

We should be back to normal operation within the hour.

Please see the following Wiki page for more information:

http://wiki.secondlife.com/wiki/Service_Disruptions#Central_Database_Cluster

[RESOLVED] Database disruptions

Saturday, May 3rd, 2008

[RESOLVED] 5:13 PM PDT - All services should have resumed full functionality and you may return to normal activities.

We are currently experiencing issues with our database, although some services may be returning as you read this. In the meantime, please refrain from transactions and expect slow or problematic logins along with other database-related issues.

Please see the following Wiki page for more information:

http://wiki.secondlife.com/wiki/Service_Disruptions#Central_Database_Cluster

[CLEARED] Logins and other db-related services slow

Friday, May 2nd, 2008

[FINAL UPDATE 4:27 p.m. Pacific] In-world services and support portal access are fully restored. The Concierge and Grid teams are working through a slightly higher than normal number of regions that have entered restart, but that backlog should be cleared within the next few minutes.

[UPDATE 3:01 pm Pacific] Operations has spent the past hour swapping in some database resources. Their efforts have made a difference, but throughput could still stand some improvement. Work continues.

[1:53 pm Pacific] A sudden heavy load on the database is causing slow logins and inhibiting access to the support portal, and may be affecting in-world transactions for some residents as well. We’ll keep you posted as we work to assess and correct the problem.

Rolling restart Wed/Thu April 16/17 for 1.21 Server Deploy

Wednesday, April 16th, 2008

[Update 2008-04-16 21:10] Several of the regions that received version 1.21 are showing problems, so we are going to revert them to 1.20. Many of the regions remain down; they will be back up within half an hour.

[Update 2008-04-16 20:30] The deploy to 490 regions will begin momentarily.

[Update 2008-04-16 17:00] We are in the middle of updating the central servers. Note that if you watch the concurrency plots, you will see dips as we restart the servers that report concurrency numbers. This doesn’t actually mean that people are getting kicked offline; it’s just a reset of the data collection. The deploy to 500 regions will begin later tonight.

We will be doing a rolling restart this Wednesday and Thursday to deploy the server patches that were scheduled for last week’s cancelled rolling restart. Changes include security patches, performance improvements for Havok4 (including a fix for the issue that “openspace” or “void” sims have with Havok4), and code designed to reduce the load on the central database systems.

We will do this with our usual 3-stage deploy:

  • Wednesday, April 16, 8:00 PM: ~500 regions will receive the 1.21 server deploy.
  • Thursday, April 17, 5:30 AM: ~1600 regions will receive the 1.21 server deploy.
  • Thursday, April 17, 6:00 PM: All of Second Life will receive the 1.21 server deploy; this will take 5-6 hours to complete.

There will be no viewer updates required as a result of this deploy. All regions will receive warnings beginning five minutes before they are shut down. During the rolling restart, regions should be back 5-10 minutes after they are stopped. If your region stays down more than 20 minutes, please contact support.

[COMPLETED] Services and Logins will be down for ~30 minutes

Tuesday, April 15th, 2008

[Completed 12:07 p.m. PDT] Our system engineers have completed work on the Central database cluster and services have resumed.

We are about to undergo a 30-minute downtime in order to work on our Central database cluster. All services, including transactions, logins, and in-world services, will be affected.