Archive for the ‘Operations’ Category

[RESOLVED] Momentary Disruption to XML-RPC

Thursday, November 13th, 2008

[UPDATE 8:37 pm PST] XML-RPC services are restored.

*****

XML-RPC services in world are temporarily disrupted. Engineers are already repairing the process, and service should return very shortly.

[Resolved] Logins & Land Store Unavailable

Sunday, August 24th, 2008

[RESOLVED 9:00pm PDT] The Land Store is back online and teleports should be behaving normally.

[UPDATE 6:49 pm PDT] Unfortunately the Land Store is still offline, and some residents are still reporting misrouted home teleports. Other in-world services shouldn’t be affected by this.

[UPDATE 5:48 pm PDT] The Land Store is still disabled, and teleport home requests are still intermittently failing. Our ops team is working on resolving these issues as soon as possible.

[UPDATE 4:48 pm PDT] Logins have been re-enabled and most services should be returning to normal; however, the Land Store is still down. Teleporting home may also be affected at the moment, so if your teleport fails, please use a landmark or the World Map to teleport until this issue has been resolved.

[UPDATE 3:58 pm PDT] Ops is still working on logins. No change in status to report yet.

[UPDATE 2:56 pm PDT] Logins are closed again.

[UPDATE 2:47 pm PDT] We’re re-opening logins.  The Land Store is still disabled. Please ride each login attempt all the way through, rather than quitting and restarting it.  We’ll keep you posted on the Land Store and in-world transactions.

[UPDATE 2:27pm PDT]  Please refrain from transacting or working with valuable (no-copy) assets until we sound an ALL CLEAR.

[UPDATE 2:22pm PDT]  Residents are reporting trouble opening tickets to Support.

[UPDATE 1:46pm PDT] Land Store and logins are again temporarily disabled while Operations addresses load.

[UPDATE 12:32pm PDT] Land Store is back online.

[UPDATE 12:20pm PDT] Logins are re-enabled.  We’ll monitor for a few minutes before we turn Land Store back on.

Logins are likely to be slow for the next few minutes.  Please try to ride one login effort all the way through, rather than canceling and restarting the login.  That’ll keep your place in the queue.

[UPDATE 12:12 PM PDT]  The Land Store has also been temporarily disabled.

We’ve momentarily blocked logins to correct a load issue in the login process.  We’ll have them open again ASAP.

[RESOLVED] Logins, Inventory access unavailable for some residents

Sunday, June 22nd, 2008

[Resolved at 10:45PM Pacific] Work on the inventory server cluster is complete!  Some residents may need to clear Second Life’s cache (Edit > Preferences > Network > Clear Cache) and relog.  - Frontier

[9:24 pm Pacific]  Ops is addressing a disruption within the inventory server cluster.  Until the repair is completed, some residents will find themselves unable to retrieve inventory or to begin a new viewer session. We’ll keep you posted.

[Resolved] Approximately Five Hundred Regions Inaccessible

Friday, June 13th, 2008

[Resolved 9:55pm PDT] - The routing failure has been resolved and the ~500 regions should either be available now, or will be within a few minutes.

[Update 9:30am PDT] - Inventory database work has completed and logins for that database have been restored.

The ~500 regions affected by the routing failure remain inaccessible at this point, but work continues and we expect it to be resolved soon. If you are unable to log in, please try logging in to a different region.

[Update 8:59 a.m. PDT] We are working on one of our inventory databases. This will cause the database to be down for approximately 30 minutes. Residents on this database will not be able to log in. Residents in-world on this database will have restricted services.

A routing failure has rendered about 500 regions inaccessible. We expect resolution to take 30-60 minutes.

[All-Clear] Some Residents Reporting Service Slowdowns

Thursday, June 12th, 2008

[All-Clear 9:56 p.m. PDT] We have identified and corrected the routing issue.  It is all-clear to conduct transactions, and issues stemming from this routing problem should be resolved.

Some residents are reporting problems with appearance and inventory. Operations is tracing potential routing issues. Please do not attempt valuable transactions until an all-clear is given.

[RESOLVED] 500 Regions unavailable

Friday, May 30th, 2008

[RESOLVED 22:32] The issue has been corrected and the affected regions are now accessible.

500 regions are currently unavailable due to a networking issue. Ops is aware of the problem and is working to correct it. Please monitor this post for updates as they become available.

[RESOLVED] Database Issues Affecting Multiple Services

Thursday, May 22nd, 2008

[RESOLVED 13:10 PDT]  The database issues are corrected and the all-clear has been sounded.  Please continue with your regularly scheduled Thursday.

[UPDATE 12:30 PDT] The Support Portal is currently down, and the Land Store has been taken off-line for the duration. Please continue to monitor this post for updates.

Issues with the database are currently affecting inworld services, logins, and the website. Please refrain from attempting inworld transactions, rezzing valuable or no-copy items, and logging in until the all-clear is announced.

Brief Power Outage Planned - 2700 regions will be affected

Wednesday, April 30th, 2008

We have received information from our network providers of planned maintenance which will make approximately 2700 regions unreachable for approximately 3 minutes this evening, Wednesday 30th April at 11pm PDT.

There will be two such events; the next outage will occur on Monday 5th May at 11pm PDT for the same length of time and will affect a similar number of regions.

We cannot list all of the regions that may be affected, but if yours is one of them, please accept our apologies for what we anticipate to be a few minutes’ lack of connectivity.

Rolling Restart for 1.21 Server Deploy Wed/Thu/Fri

Saturday, April 26th, 2008

[Updated Saturday @ 09:10am] The rolling restart of the rest of the grid is now complete.

[Updated Saturday @ 8:40am] The rolling restart of the rest of the grid is now in progress. It began at 5:10am, and is now 93% complete. As usual, each region will be down for ~5 minutes. If your region is down for more than 20 minutes, please contact support.

[Updated Saturday @ 7:06am] The rolling restart of the rest of the grid is now in progress. It began at 5:10am, and is now 46% complete. As usual, each region will be down for ~5 minutes. If your region is down for more than 20 minutes, please contact support.

[Updated Saturday @ 6:05am] The rolling restart of the rest of the grid is now in progress. It began at 5:10am, and is now 16% complete. As usual, each region will be down for ~5 minutes. If your region is down for more than 20 minutes, please contact support.

[Updated Saturday @ 5:10am] The rolling restart of the rest of the grid is now in progress. It began at 5:10am; we will post hourly updates with a percentage completed. As usual, each region will be down for ~5 minutes. If your region is down for more than 20 minutes, please contact support.

[Updated Friday @ 8:39am] The rolling restart to half of the grid is now complete but for 7 hosts that needed to be manually updated; those will be completed within a few minutes. The rest of the grid will be updated tomorrow morning.

[Updated Thursday @ 7:10pm] We have completed the deploy of 1.21 to 3 racks (632 regions). Here is a list of regions that as of now are on version 1.21.0.85745.

[Updated Thursday at 12:47pm] We have deployed 1.21 to 1 rack (about 170 regions) again. If all goes well, we will continue with the tentative timeline listed in the Wednesday at 8:10pm update below.

[Update Wednesday @ 9:15pm] A slight and subtle wrinkle during the deploy left some object-to-object emails non-functional. The responsible systems have gotten a stern talking to, and this service should be operational again.

[Update Wednesday @ 8:10pm] Another bug was found after we rolled out to one rack. That bug has been found and fixed. We will evaluate exactly what we’re going to do with this deploy after testing tomorrow, but it will likely shift the timeline forward by one day. Meanwhile, we are rolling back the 170 regions that had previously received a 1.21 deploy so that all simulators are once again running on version 1.20.1 of the server code.

The central updates to 1.21 are complete and things seem “nominal” at the moment, but of course we’ll be watching closely.

  • Wednesday 4/23 @ 11am - deploy to 1 rack [DONE] [REVERTED]
  • Wednesday 4/23 - update central systems throughout the day [COMPLETE]
  • Thursday 4/24 @ 6pm - deploy to 3 racks [COMPLETE]
  • Friday 4/25 @ 5am-11am - deploy to half of remaining servers
  • Saturday 4/26 @ 5am-11am - deploy to remaining servers

[Update Wednesday @ 10:25am]

The bug in the 1.21 Server code identified last night during an initial rollout to 1 rack has been found, fixed, and verified. We’re planning to proceed with the rollout to avoid delaying the code update another week. On the table for today are the central services updates and limited rolling restarts.

What’s Changed in 1.21 Server

The most notable fixes will be physics-related, and have been in testing in the Beta Preview for several days. No new viewer is required.

Read on for more information…

Rolling restart Wed/Thu April 16/17 for 1.21 Server Deploy

Wednesday, April 16th, 2008

[Update 2008-04-16 21:10] Several of the regions that received version 1.21 are showing problems, so we are going to revert them to 1.20. Many of the regions remain down; they will be back up within 1/2 hour.

[Update 2008-04-16 20:30] The deploy to 490 regions will begin momentarily.

[Update 2008-04-16 17:00] We are in the middle of updating the central servers. Note that if you watch the concurrency plots, you will see dips as we restart the servers that report concurrency numbers. This doesn’t actually mean that people are getting kicked offline; it’s just a reset of the data collection. The deploy to 500 regions will begin later tonight.

We will be doing a rolling restart this Wednesday and Thursday to roll out the patches to the server that were to be rolled out with last week’s cancelled rolling restart. Changes include security patches, performance improvements for Havok4 (including the issue that “openspace” or “void” sims have with Havok4), and code designed to mitigate the load on the central database systems.

We will do this with our usual 3-stage deploy:

  • Wednesday, April 16, 8:00PM : ~500 regions will receive the 1.21 server deploy.
  • Thursday, April 17, 5:30AM : ~1600 regions will receive the 1.21 server deploy.
  • Thursday, April 17, 6:00PM : All of Second Life will receive the 1.21 server deploy; this will take 5-6 hours to complete.

There will be no viewer updates required as a result of this deploy. All regions will receive warnings beginning five minutes before they are shut down. During the rolling restart, regions should be back 5-10 minutes after they are stopped. If your region stays down more than 20 minutes, please contact support.
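
For readers curious how a staged deploy like the one above might be planned, here is a rough sketch in Python. It is only an illustration, not our actual deploy tooling: the names `Stage`, `plan_stages`, and `needs_support_ticket` are hypothetical, and the grid size of 20,000 regions is a made-up figure. The ~500 and ~1600 stage sizes and the 20-minute threshold come from the schedule above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    label: str
    region_count: int


def plan_stages(total_regions, early_stage_sizes):
    """Split the grid into small early stages plus one final stage for the rest.

    Hypothetical helper: early stages limit exposure if the new server
    code misbehaves; the last stage covers every remaining region.
    """
    stages = []
    remaining = total_regions
    for number, size in enumerate(early_stage_sizes, start=1):
        size = min(size, remaining)  # never plan more regions than are left
        stages.append(Stage(f"stage {number}", size))
        remaining -= size
    stages.append(Stage(f"stage {len(early_stage_sizes) + 1} (rest of grid)", remaining))
    return stages


def needs_support_ticket(minutes_down, threshold_minutes=20):
    """A region down longer than the threshold should be reported to support."""
    return minutes_down > threshold_minutes


if __name__ == "__main__":
    # 20000 is an assumed grid size for illustration only.
    for stage in plan_stages(20000, [500, 1600]):
        print(stage.label, stage.region_count)
```

The point of the small early stages is that a problem found after the first ~500 regions can be reverted before it reaches the rest of the grid.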