Archive for April, 2008

[RESOLVED] Inworld Issues and Logins

Friday, April 25th, 2008

[RESOLVED 22:58] The network situation appears stable, so we’re sounding the All-Clear. Enjoy the rest of your weekend.

[UPDATE 22:18] Ops says the network problem has been fixed, we’ve regained contact with the rest of the world, and things should quickly be returning to normal. We’ll be monitoring the situation, and if no further complications arise, we’ll be giving the all-clear shortly.

A major issue is causing problems with a number of inworld services, including profiles, teleporting, and maps, and is also interfering with logins. Ops is investigating, and early indications are that we are being affected by a major networking issue. Please monitor this post for updates.

[CLOSED] Updating Your Credit Card Issue

Friday, April 25th, 2008

[UPDATE 25 April 08, 3:40 pm Pacific]

Our payment processor has fixed and tested the process.  All card operations (including updates) are now running within normal time frames.

*****

Due to an issue with our payment processor, we are unable to update your credit card information at this time. Attempts to delete and update your credit card will be delayed indefinitely. You may enter card data, but it will not be active in our system for potentially several more hours.

If you have valid credit card or payment information, you can continue to USE it with no problems; this only affects updating card information or initial card data entry.

Updating Paypal payment information is completely unaffected.

We are working with our payment processor, and credit card updating will work as normal as soon as their issues are resolved.

[RESOLVED] Logins Re-Opened

Friday, April 25th, 2008

[UPDATE 1:35pm Pacific] Logins are open, and the asset system has returned to full functionality. Please be patient when logging in, as the login queue will be processing a high volume of requests for the next few minutes.

[1:02 pm Pacific] Logins have temporarily been restricted to staff-only as Operations addresses a slowdown in the asset system. We’ve also broadcast a request in world for residents who are currently logged in to refrain from manipulating or transferring assets. We’ll have more info ASAP.

[CLOSED] Some Second Life Regions Offline

Friday, April 25th, 2008

[CLOSED 11:07 a.m. --teeple]

All regions affected have been returned to service. Thanks for your patience!

[UPDATE 10:08 a.m. --teeple]

The vast majority of regions are back online. Operations is still working on a small number of them. We’ll let you know as soon as they’re done.

***

As a result of a recent Rolling Restart, many regions in Second Life are currently down. Our Server Team is working to get them back online as soon as possible. We will post more updates here as information is received. Thank you for your continued patience.

Rolling Restart for 1.21 Server Deploy Wed/Thu/Fri

Thursday, April 24th, 2008

[Updated Friday @ 8:39am] The rolling restart to half of the grid is now complete but for 7 hosts that needed to be manually updated; those will be completed within a few minutes. The rest of the grid will be updated tomorrow morning.

[Updated Thursday @ 7:10pm] We have completed the deploy of 1.21 to 3 racks (632 regions). Here is a list of regions that as of now are on version 1.21.0.85745.

[Updated Thursday at 12:47pm] We have deployed 1.21 to 1 rack (about 170 regions) again. If all goes well, we will continue with the tentative timeline listed in the Wednesday at 8:10pm update below.

[Update Wednesday @ 9:15pm] A slight and subtle wrinkle during the deploy left some object-to-object emails non-functional. The responsible systems have gotten a stern talking to, and this service should be operational again.

[Update Wednesday @ 8:10pm] Another bug was found after we rolled out to one rack. That bug has been found and fixed. We will evaluate exactly what we’re going to do with this deploy after testing tomorrow, but it will likely shift the timeline forward by one day. Meanwhile, we are rolling back the 170 regions that had previously received a 1.21 deploy so that all simulators are once again running on version 1.20.1 of the server code.

The central updates to 1.21 are complete and things seem “nominal” at the moment, but of course we’ll be watching closely.

  • Wednesday 4/23 @ 11am - deploy to 1 rack [DONE] [REVERTED]
  • Wednesday 4/23 - update central systems throughout the day [COMPLETE]
  • Thursday 4/24 @ 6pm - deploy to 3 racks [COMPLETE]
  • Friday 4/25 @ 5am-11am - deploy to half of remaining servers
  • Saturday 4/26 @ 5am-11am - deploy to remaining servers

[Update Wednesday @ 10:25am]

The bug in the 1.21 Server code identified last night during an initial rollout to 1 rack has been found, fixed, and verified. We’re planning to proceed with the rollout to avoid delaying the code update another week. On the table for today are the central services updates and limited rolling restarts.

What’s Changed in 1.21 Server

The most notable fixes will be physics-related, and have been in testing in the Beta Preview for several days. No new viewer is required.

Read on for more information…

More Details

A “rack” is a physical set of about 40 sim hosts, so about 160 regions, give or take spares. This is also a handy-sized unit for initial rollouts. We’ve started doing restarts spread across several days to catch any configuration or scaling issues before they affect the whole service, and also because the service is now so large (10 times as many hosts as when I started, I believe) that we need to do it in pieces.
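To make the staging concrete, here is a minimal Python sketch of a phase plan shaped like the schedule in this post: 1 rack, then 3 racks, then half of what remains, then the rest. It is an illustration only, not the deploy tooling itself. The `Phase` class, the `plan_phases` function, and the 100-rack grid size in the usage note are all hypothetical; only the per-rack figures come from the text above.

```python
from dataclasses import dataclass
from typing import List

# Approximate figures quoted in the post; real racks vary ("give or take spares").
HOSTS_PER_RACK = 40
REGIONS_PER_RACK = 160


@dataclass(frozen=True)
class Phase:
    """One step of a staged rollout (hypothetical structure, for illustration)."""
    name: str
    racks: int

    @property
    def approx_regions(self) -> int:
        # Rough estimate only: racks times the quoted regions-per-rack figure.
        return self.racks * REGIONS_PER_RACK


def plan_phases(total_racks: int) -> List[Phase]:
    """Split a grid into: 1 rack, 3 racks, half of the remainder, the rest."""
    if total_racks < 0:
        raise ValueError("total_racks must be non-negative")
    phases: List[Phase] = []
    remaining = total_racks
    # Small early batches catch configuration or scaling issues cheaply.
    for name, wanted in (("first rack", 1), ("small batch", 3)):
        take = min(wanted, remaining)
        if take:
            phases.append(Phase(name, take))
            remaining -= take
    first_half = remaining // 2
    if first_half:
        phases.append(Phase("half of remaining", first_half))
        remaining -= first_half
    if remaining:
        phases.append(Phase("remaining", remaining))
    return phases
```

For a hypothetical grid of 100 racks, `plan_phases(100)` yields phases of 1, 3, 48, and 48 racks, so a problem in the first phase touches roughly 160 regions rather than the whole service.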

During the central system updates we expect brief disruptions of some services (less than 2 minutes). For example, it may be necessary to re-join group chats, the reported “residents online” numbers may drop, and logins may not function briefly as the processes responsible for those services are stopped/started. These activities are partitioned by agent and group - for example, a particular resident may not be able to log in while one of the hosts is restarted, while other residents are able to log in. This usually takes less than a minute for each of 16 hosts.
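The effect of partitioning can be sketched in a few lines of Python. This is an assumption-laden illustration: the post does not say how agents are assigned to hosts, so the CRC32-modulo mapping below is a stand-in, and the `host_for`, `can_log_in`, and `rolling_restart` names and the agent names in the usage note are hypothetical. Only the count of 16 hosts comes from the text above.

```python
import zlib
from typing import Iterable, Iterator, List, Tuple

NUM_HOSTS = 16  # the post mentions 16 hosts; the mapping below is an assumption


def host_for(key: str, num_hosts: int = NUM_HOSTS) -> int:
    """Map an agent (or group) key to a host index.

    A stable hash modulo the host count is assumed here purely for
    illustration; the real assignment scheme is not described in the post.
    """
    return zlib.crc32(key.encode("utf-8")) % num_hosts


def can_log_in(agent: str, restarting_host: int) -> bool:
    """An agent is blocked only while the host for its partition restarts."""
    return host_for(agent) != restarting_host


def rolling_restart(agents: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (host, blocked_agents) for each host restarted in turn."""
    agents = list(agents)
    for host in range(NUM_HOSTS):
        yield host, [a for a in agents if not can_log_in(a, host)]
```

Iterating `rolling_restart(["Alice Example", "Bob Example"])` shows each agent appearing in the blocked list for exactly one of the 16 restarts, which matches the behaviour described: one resident may be briefly unable to log in while others are unaffected.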

1.21 Rollout History

The 1.21 Server deploy was initially scheduled for April 16th/17th. During the rollout, some problems were encountered which caused us to roll back, review the code, make some fixes, and proceed cautiously. In detail:

  • a component swap intended to reduce the disruption caused by central updates exhibited poor performance in production; this was reverted in favor of a different architectural change already planned, but waiting for a future update
  • a data migration intended to alleviate database load during login and other actions turned out to have a subtle bug that required reversion; after further review, the updated code is going out in a multi-phase approach starting with 1.21.
  • during an initial “3-rack roll”, several services on the targeted hosts were not started correctly; further investigation could not determine if this was due to a momentary network glitch, database hiccup, problem with the deploy tools, or a problem with the code that was not caught during testing. Subsequent testing was unable to reproduce the problem. Since this is easily detected and recovered from, we’re proceeding cautiously. So far, no issues have been seen.

[Initial Post Details from April 22nd]

We’re ready to initiate the update of the Second Life servers to the 1.21 version of the code.

The deploy is going to happen in several phases:

  • Tuesday 4/22 @ 1pm - deploy to 1 rack - [DONE]
  • Tuesday 4/22 @ 7pm (originally 5pm) - deploy to 3 racks
    (delayed a bit due to a wrinkle discovered during the previous step)
  • Wednesday 4/23 @ 6am - deploy to 10 racks
  • Wednesday 4/23 - update central systems throughout the day
  • Thursday 4/24 @ 5am-11am - deploy to half of remaining servers
  • Friday 4/25 @ 5am-11am - deploy to remaining servers

Should problems be encountered with the 1.21 rollout, we will likely proceed with deploying a subset of the changes focused on physics-related fixes, as we did with last week’s 1.20 patch rollout.

[Update Tuesday April 22nd @ 7:55pm]

After an initial re-rollout to 1 rack, reports came in of attachment failures. The rack currently is being reverted to the previous (1.20.1) simulator version. After some quick tests, we believe we’ve narrowed down the changes responsible (tests for rez permission appear to be checking a remote parcel incorrectly), but a fix is unlikely until tomorrow at the soonest. The issue also affects the “backup plan” for a smaller patch deploy mentioned at the end of the post (which hints at the source of the bug).

[Sorry for making updates at both the top and bottom of this post; I want it to remain understandable for residents who are reading it for the first time, yet retain the history of the post for later comments to remain sensible. -- Joshua Linden]

Support Portal Maintenance - Sat 26th April, 6pm-Midnight PST

Thursday, April 24th, 2008

Our support portal will be offline for six hours of system maintenance this Saturday, 26th April from 6:00pm-Midnight PST. During that time, the support portal will be unavailable for chat or ticketing services.

[DONE] Small Deploy of Server 1.21 Beginning Momentarily

Tuesday, April 22nd, 2008

[Update 13:36] This is complete.

We will be deploying server version 1.21 to 184 regions, beginning momentarily. Regions will receive a warning and will restart thereafter.

A more extensive blog post detailing the rollout plan for 1.21 will be posted after the information we gather with this first rack’s deploy.

Reminder: Auction Site Down for Service Tonight

Tuesday, April 22nd, 2008

As originally reported here, our auction pages will be offline for scheduled maintenance this evening from 9pm until 11pm PST.

We apologise for any inconvenience this essential event may cause.

[RESOLVED] - Transaction history not showing on the website

Tuesday, April 22nd, 2008

[2:43 AM - RESOLVED] The database is back and transaction histories are now viewable again. Thanks for your patience whilst we sorted it out.

Residents will find that transaction histories are not showing up when running a query on our website. This is due to an error on one of our slave databases, which is now in the process of being resolved. It’s hoped that this will be back in about 20 to 30 minutes.

Sorry for the inconvenience in the meantime.

Auction Site Maintenance Scheduled, Tue 22nd April, 9pm-11pm PST

Monday, April 21st, 2008

Our auction facility will be offline for two hours of scheduled maintenance tomorrow, Tuesday 22nd April, between 9pm and 11pm PST.

Our apologies for any inconvenience that this may cause.

Matthew