Thread: Devices Losing Registration

  1. #1
    Join Date
    Feb 2007
    Location
    Irvine CA
    Posts
    1,542,128,043

    Devices Losing Registration

    This should be resolved now. We'd like to apologize for any inconvenience it caused and provide some insight into what happened.

    Problem

    As you may know, we rolled out new Grandstream firmware recently. In preparation for the firmware rollout, we made several systemwide changes to make our system compatible with the new firmware.

    After successfully testing all of those changes and the new firmware with positive results, we proceeded to roll out the firmware to all customers.

    In the days following the firmware rollout, we started seeing isolated reports of customers' devices "unregistering", which sent incoming calls to voicemail or a failover number and prevented outgoing calls from being made. When customers rebooted the device, it would reconnect and all would be well again.

    Finding the Cause

    We hadn't seen this happen before. Our system had been working amazingly well for months, and the only prior reports of anything like it were very isolated cases caused by misconfigured home routers or similar situations.

    We immediately assumed these issues were related to the new firmware and worked with affected customers to come to what we thought was a resolution. Over the next few days, this pattern of registrations dropping continued for some customers, but it was still somewhat isolated affecting what we estimate to be about 10% of our customer base.

    When we started seeing reports of the same issue from customers that for various reasons did not have firmware updated, we then began going through all system changes that had been made in preparation for the firmware update and reverting the ones we could in case they caused the issue.

    In our logs, we saw the registrations dropping, but nothing really to explain why. At this point, we also were looking at our "clone line" implementation possibly canceling the other registration out given some of the patterns we'd seen of one line remaining registered with the other not. Still, logs did not conclusively indicate this. At that point, we re-structured the way the cloned line is handled. This seemed to significantly reduce the issue, but we still had some reports of it happening.

    Resolution

    Overall, none of those fixes seemed to work. Then a user finally pointed out that vPanel was running slow at the same time his registration dropped. This prompted me to work with our systems admins more and less with developers to see what kind of other issues could be there.

    As it turns out, one of our systems administrators had restructured our backups and was doing hourly backups on all database servers so we would have those in addition to the replication in case something ever became corrupted. This was done to resolve a previous issue with backups and load.

    While these new backups were running, they introduced database latency, which slowed responses to registration requests and made vPanel load slowly in general.

    Apparently the Grandstream adapters that re-registered (normally done every hour) during this period of a few minutes were affected. The ATA would attempt to register and establish a successful connection, but the latency in the database access would then prevent it from fully completing the registration request. To further complicate things, the ATA would send an unregister request with its registration request. The symptoms varied: in some cases the logs showed nothing, in some the registration was just delayed, in some only 1 line registered, and in some the ATA gave out a 500 Error message. We've now learned that GS ATAs are not as tolerant of abnormal system conditions as some other devices, and sometimes they didn't even attempt to re-register because they were so confused.

    The fact that some users didn't register during that period would explain why they did not experience this issue.

    In the midst of all the troubleshooting, we also shortened the re-registration interval so that everyone was re-registering every 5 minutes, in case the issue was keep-alives being sent too infrequently. This put even more users into the backup window of a few minutes, since EVERYONE was now reconnecting every 5 minutes. That's why the problem gradually picked up and why even some of our long-time users who had never seen this in 2 years suddenly saw it.
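
    To illustrate why the shorter interval widened the impact, here is a rough sketch of the arithmetic. It assumes each device re-registers on a fixed interval with a random starting offset, and it uses a 3-minute backup window purely as an example (the post only says "a few minutes"). The function name and all of the numbers are hypothetical, not taken from our systems.

```python
def exposed_fraction(reregister_every_min: float, backup_window_min: float) -> float:
    """Share of devices with at least one re-registration inside a backup window.

    Assumption: every device re-registers on a fixed interval, and the
    offsets of different devices are spread uniformly over that interval.
    Under that assumption a device lands in the window with probability
    window / interval, capped at 1.
    """
    if reregister_every_min <= 0:
        raise ValueError("re-registration interval must be positive")
    if backup_window_min < 0:
        raise ValueError("backup window cannot be negative")
    return min(1.0, backup_window_min / reregister_every_min)


if __name__ == "__main__":
    window = 3.0  # assumed backup duration in minutes, for illustration only
    for interval in (60.0, 5.0):
        share = exposed_fraction(interval, window)
        print(f"re-register every {interval:g} min: {share:.0%} of devices hit each backup")
```

    Under these assumed numbers, hourly re-registration exposes a small share of devices to each backup, while a 5-minute interval exposes most of them, and all of them once the backup runs for 5 minutes or longer.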

    Service Credit

    While this did not affect all customers, it should not have happened and was a big issue for those of you that it did affect.

    It's very hard to determine which specific customers were impacted, but we know that more were than we would have liked in any event.

    We're going to issue a service credit in the form of a renewal extension of 14 days for all current residential accounts that are active as of today. This will be applied at some point within the next week.

    Current Issues

    We're pretty confident that this particular issue is resolved. Immediately our support volume dropped by 90% when we changed the backups in question.

    If you're experiencing any issues along these lines now, it is likely another issue entirely. If that's the case, please contact support so we can work with you to address them.

    Future

    This new backup process wasn't communicated well internally, so it was pretty much overlooked by our developers who were more focused on the recent slew of changes they'd made when looking for the cause.

    We're working to bridge the gap and keep the two teams more in sync with each other. We've also started including all changes by sys admins in our internal development changelog. With that more comprehensive picture, it's easier to identify issues like this one and correct them more quickly.

    We've also restructured this so the hourly backups are only being done on the secondary replicated database servers and not every server in every node to avoid this in the future.
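
    As a minimal sketch of that rule, assuming a simple inventory where each database server is tagged with a role: the class, field names, and server names below are hypothetical and only show the idea of limiting hourly backups to replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbServer:
    name: str   # hypothetical hostname
    node: str   # hypothetical node / site label
    role: str   # "primary" or "replica" (assumed labels)


def hourly_backup_targets(servers: list[DbServer]) -> list[DbServer]:
    """Return only the replicated secondaries, so that primaries serving
    live registration traffic are never loaded by the hourly backup."""
    return [s for s in servers if s.role == "replica"]


if __name__ == "__main__":
    inventory = [
        DbServer("db1", "east", "primary"),
        DbServer("db2", "east", "replica"),
        DbServer("db3", "west", "primary"),
        DbServer("db4", "west", "replica"),
    ]
    for server in hourly_backup_targets(inventory):
        print(f"back up {server.name} ({server.node})")
```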

    No service is perfect and this demonstrates that we can have issues from time to time as well. With that being said, it's always our goal to resolve those issues quickly.

    Once again, we apologize for any inconvenience this caused and want to thank you for your continued confidence in VOIPo.
    Timothy Dick
    Founder/CEO
    VOIPo.com

    Interact with VOIPo: Twitter, Facebook

  2. #2
    Join Date
    Apr 2008
    Location
    Aventura Fl
    Posts
    860

    Re: Resolved - Devices Losing Registration

    Tim..

    I take my hat off to you..

    You and your company are a class act!!

  3. #3
    Join Date
    Feb 2007
    Posts
    20

    Re: Resolved - Devices Losing Registration

    Thank you for that great explanation of what was happening with our service. It's funny how something so minute like that can cause so many issues for so many people. It's good that you're going to increase communications between your internal people to avoid issues like this again. Thanks and keep up the great work!

  4. #4
    Join Date
    Sep 2008
    Location
    Southwest MO
    Posts
    219

    Re: Resolved - Devices Losing Registration

    Fantastic work! We all had faith that you and your team would figure it out quickly. That is what separates VOIPo from the rest of the pack. You show that you care about your customers and put together a great combination of technical infrastructure and brains behind the operation. Thank all of your team for all of us!

  5. #5
    Join Date
    Feb 2007
    Location
    Maryland
    Posts
    314

    Re: Resolved - Devices Losing Registration

    Not only is the problem resolution incredible, but the fact that you would type up such a detailed post freely admitting specifics about the company's internal workings is unheard of today - thanks for keeping things so down-to-earth Tim!

  6. #6
    Join Date
    Feb 2007
    Posts
    801

    Re: Resolved - Devices Losing Registration

    This detailed account of the issue, and the service credit, are two reasons I am a VOIPo supporter. Note that supporter does not equal 'fanboy'... I won't recommend VOIPo to everyone (as you might notice on BBR), but it's a solid service for anyone who's considering VoIP at all.

    In all honesty, I would not be surprised to see issues come up like this again over the next few months as your systems and processes (not just hardware/software, but policies, etc) are put into practice and you learn what works and what doesn't. If you continue to tackle problems as you did this one, you have a bright future ahead.

  7. #7

    Re: Resolved - Devices Losing Registration

    fisamo, how are you? You were so helpful last year when I signed up for ATT CV. Now I am over here with a virtual number, getting ready to port ATT CV over. I am glad to see you here. You were a wealth of information at the other site. While I am very computer literate in general, VoIP is not a strong suit at this point. I'm working on it.
    I must say that while I have had my share of issues setting this up, Tim and the team have been more than top notch as far as response and suggestions, taking care of issues as quickly as possible. They are always quick to respond and very pleasant to deal with. I have not sent the porting documents to them yet, but I have the virtual number forwarded to the number I will be porting and that seems to be working out well.
    Some issues with simul ring and call forwarding that Tim and I have been back and forth on today. I am sure it will be resolved.
    Anyway fisamo. Good to see you here. Look forward to talking to you on this forum.

    Kevin

  8. #8
    Join Date
    Dec 2008
    Posts
    1

    Re: Resolved - Devices Losing Registration

    Thanks for the in-depth explanation! I'm in IT, so I understand when changes cause issues and the need to keep good change control records. It sounds like a good learning experience and I look forward to continued solid service in the future. Thanks for the excellent support and keep up the good work.

  9. #9
    Join Date
    Dec 2008
    Location
    Beaverton, OR
    Posts
    114

    Re: Resolved - Devices Losing Registration

    Yes, thank you Tim for the explanation, and caring so much about your customers. With mass growth, problems are bound to come up, but you and your team are right on top of things. Thanks again Tim for all your hard work and dedication to your customers.

  10. #10

    Re: Resolved - Devices Losing Registration

    Quote Originally Posted by Vumes View Post
    Yes, thank you Tim for the explanation, and caring so much about your customers. With mass growth, problems are bound to come up, but you and your team are right on top of things. Thanks again Tim for all your hard work and dedication to your customers.
    I second this response..

    Kev
