Editor’s Note: Because Nginx is third-party software, ServInt does not support it beyond installation. Also, to follow these instructions, users must have a working knowledge of the command line. Click here if you would like more information about logging in to your server on the command line.
Nginx is an alternative to the Apache web server that is popular with some server admins. Advanced users may choose to run Nginx instead of Apache, as it may offer performance benefits in certain configurations. Read on if you are interested in installing Nginx on a server running cPanel. Read more
Earlier this week, we launched two new tools designed to immediately inform our customers whenever issues emerge that may affect the performance or accessibility of their server. In this brief blog post, we’ll explain how these new services work and how you can put them to use.
First, an important detail: both of these notification services are hosted completely independently of the ServInt network, so no network or datacenter event that impacts ServInt’s core infrastructure will prevent us from providing you with up-to-date information. If you want to be certain you always have visibility into the ServInt nerve center in the event of performance issues that may affect your server, I strongly urge you to sign up for both of these services.
These systems were designed to alert you about two kinds of incidents: those that affect specific infrastructure while the larger ServInt network is functioning normally; and events that are having an impact on entire segments of the larger ServInt network, affecting multiple customers at once. We are very proud of our high levels of network redundancy and our enterprise-grade hardware, so incidents that involve our infrastructure are rare. Even so, problems can and do occur despite our best efforts. That’s why these systems exist. We hope you never have to use them, but in the interest of full transparency, they are here for you.
For reporting on smaller pieces of infrastructure, we have launched an opt-in Twitter account called ‘@ServIntStatus’ (twitter.com/servintstatus). Use of this system is simple: items will pop up on this feed when we are investigating problems specific to a hardware or system component. Of course, you also have the option of having Twitter send you push notifications from the account whenever an issue is discovered or a maintenance requirement is being fulfilled, but this may result in information overload due to the size of our network. Please note that this feed will only display brief status updates that reflect initial investigation or discovery on our end. For resolution and ongoing updates, please log in to the customer portal.
The second of our two notification systems is called the “ServInt Status Report,” and it can be found at https://servint.statuspage.io/. This site was set up to display information about service issues that are affecting ServInt’s core network and/or datacenter, impacting multiple customers at once. Sometimes, these issues can affect ServInt’s customer-facing systems at the same time they impact our datacenter, so during the crucial minutes after disruptions are discovered, but before they’re fixed, this is the place to turn for immediate updates on what is happening.
Of course, this report is also capable of sending you push updates to let you know when our network is dealing with a service-impacting incident. We strongly recommend you subscribe to these updates, since the number of alerts you are likely to receive will be small, and each of them could be of critical importance to your business.
In addition to providing a real-time network status monitor for ServInt’s Reston, Los Angeles, and Amsterdam datacenters, the ServInt Status Report portal shows you the operational status of the ServInt customer portal and phone system. It also provides users with detailed updates on what is being done at the moment to address identified issues. Lastly, it displays service incident reports for all events in the recent past, allowing you to better understand what happened to the ServInt network in retrospect, which is helpful if you only discovered a performance shortcoming after the root cause had been fixed.
Operating instructions for the ServInt Status Report can be found at the site, and they are very straightforward. I strongly urge all ServInt customers to sign up — both for the Status Report and the ‘@ServIntStatus’ Twitter feed — as soon as possible.
It was a terrible, horrible, no good, very bad week at ServInt: the worst we’ve had in the 10 years since a fiber cut in just the wrong place took us offline completely for seven hours. That day was one of the most professionally terrifying of my life, but we learned from it and we grew. In the wake of that event we added redundancies far beyond the “industry standard,” we fixed a ton of processes and we quickly regained the faith of our customers. To this day, that date in 2004 remains the very last time ServInt’s entire network went down.
Every time we experience problems I am determined to make sure we learn from them. This week is a big learning week, because it’s been fraught with some of the biggest problems we’ve seen in nearly a decade. Let me take a few moments to tell you a bit about what challenges this week brought — to show you what went wrong, and what we did right as we resolved them.
This week started with an announcement that the largest kernel-level exploit in the history of our VPS and virtual dedicated offerings had been discovered. This exploit could have allowed hackers to access not only our customers’ VPSs but also the machines that they were hosted upon. A fix would require rebooting literally thousands of servers while minimizing the impact on our clients’ businesses, which is always our top priority. Within 48 hours we performed emergency maintenance for nearly every single customer in our datacenter. This meant forcing every single customer to accept at least a little downtime in the pursuit of vital security protections. Some customers did not like this, but if I had to do it all over again I would do it the same way. I am proud of the way ServInt rose to the challenge and protected our customer base from this dangerous exploit in such a swift manner.
I was really hoping that the week would get easier from there — but it didn’t. Last night, one of ServInt’s datacenters experienced one of the strangest, most difficult to explain, and most difficult to solve networking problems we have ever seen.
We build our networks to withstand most anything. We have stayed up through hurricanes, ice storms, and more equipment failures than I can count. We’ve made it through power disruption for extended periods, and other horrendous events that would have taken down providers that aren’t as thorough, many times over. But this one got us good for a while.
On Saturday evening, our network was running smoothly, as it generally has for more than a decade. Suddenly our monitoring system started showing red/green/red/green/etc. The phrase “this is not a drill” had to be used as senior engineers were plucked from their lives and rushed into the datacenter. Our COO was on a plane, I was at dinner, but the engineering fix-it team that really needed to be there was there, immediately. What made this situation unique, and what made it impossible to fix in the normal few minutes, was the fact that the critical equipment that was in the process of failing seemed incapable of making up its mind whether it was healthy or not. Making matters more challenging: high levels of equipment redundancy (normally a very good thing) made it nearly impossible to determine where the problem lay. Our top engineers, without access to reliable diagnostic data, literally had to pull the network apart and put it back together to find the exact piece of hardware that went haywire (in this case a router) that caused everything else to behave erratically. In the meantime, there was simply no information to share with increasingly frustrated customers, and our Tweets and Facebook posts began to sound unnecessarily vague.
In a typical router-failure situation, as soon as the router shows “red/down” on our monitoring system, we post “we had a failed router interrupt traffic and impact the network. This is being fixed and we’re routing around it. We’re sorry for the inconvenience.” Those are facts and details, things people can get confidence from. However, with no reliable detail to pass on, our team was left to pass on rather vague updates for quite some time. It was frustrating and made us seem much worse about communication than we actually are.
In the end, last night’s events pointed out some of ServInt’s greatest historical strengths — and some newly discovered weaknesses. We’re still the best in the business at running a reliable, robust network and data center — and, when necessary, finding and fixing complex technical problems. When it comes to customer support and communication through a crisis, however, we need to do better. Having no support/communication failover systems, and forcing ServInt and its customers to rely on Twitter and Facebook to communicate, was totally unacceptable. We will build greater redundancy into our ticketing and communication systems to make sure that never happens again.
Having said that, we can’t promise that technical glitches will never happen again. They are a fact of internet life. What matters most is that we must always — always — learn from these thankfully rare events, and become a better service provider as a result. I promise you we will do so in this case as well. You’ll see the results of this growth as the weeks and months unfold. I am confident you’ll like what you see. Thank you, as always, for your continued faith and trust in us.
There’s an interesting parallel between the way people buy web hosting and the way they buy sports cars. Frequently, the sports car purchaser who doesn’t actually compete in races will buy their vehicle based on theoretical maximum performance capability, examining numbers like top speed, maximum horsepower and so forth to see how fast their dream car might theoretically go.
Of course, people who actually race for a living understand a critically important maxim: top speeds don’t win races, high average speeds do. That means it’s just as important to be able to speed around accidents and slow traffic as it is to power down the straightaways as fast as possible.
It’s the same with hosting. The size of a CPU, the amount of RAM, the network uplink speed: these are all important metrics, but everybody’s working with similar engines these days. You can get the specs you want and still never see reliable performance at another host because your server still can’t swerve around the accidents and slower traffic without getting bogged down. Why? Because of something called IOPS. Read more
If you’ve managed online applications or websites for any length of time, you’ve almost certainly dealt with hardware failures. VPS technology mitigates some of the more common types of failures, and Cloud has mitigated others. But the fact remains, hardware failures — failures of the machines housing and crunching your data — can still happen at any time.
There are many hardware and software solutions to limit the damage from hardware failures: RAID arrays, hot-swappable drives, dual power supplies, multi-core computers, and multi-stick RAM all work to introduce redundancy into the hardware; while backup solutions, load balancing and CDNs introduce redundancy into the data.
Most hosted content, however — whether it’s hosted on a dedicated server, VPS server or “in the Cloud” — still exists on one single physical computer. So if there is a catastrophic failure of that computer, your site goes away until the data can be recovered and rewritten to the drives on a new computer. Read more
There was a time in hosting’s distant past when virtualization and Cloud were foreign words. Back then, the idea that you could put multiple customers on a single host machine and give them all fully partitioned and secure “virtual environments” — environments that looked and acted exactly like a small dedicated server — was novel, if not literally unbelievable. Most people who wanted to host a website simply assumed they had to build or rent a physical server in a room somewhere.
Oh, how things have changed. Now, actual physical infrastructure has become conceptually divorced from the idea of a “web server.” Want to host a web site? These days, you buy amorphous cloudy things like “instances” and “environments,” which you scale up or down as your site requires, nearly instantaneously. Costs are down, speed-to-deployment is way up, and it’s all pretty miraculous. But our eagerness to forget what a pain in the neck it is to actually own and manage a real, live server has also made us forget what we sacrificed to get scalability, redundancy, flexibility, and all the other benefits of virtualization.
The big tradeoff — the “con” against which all the “pros” of cloud must be weighed — is the fact that, no matter how you slice it up and partition it, shared infrastructure is just that: shared, usually by many. Read more
Last week we talked about the dangers of generalizing about website and app requirements when picking a cloud service provider. Here’s the big question we’re going to try to answer this week:
Is it even possible to compare prices between cloud hosting options?
An increasing number of large cloud service providers have been trying to address the problem of explaining just what their services cost by producing cost calculators like Amazon’s. There are a few problems with these calculators. Read more
Last week, a good friend who works at Google sent me a link to a Wall Street Journal story on the price wars that seem to be heating up in the cloud computing and storage sectors. (Editor’s note: WSJ hyperlinks only work once. To read this article, run a Google search for “A Price War Erupts in Cloud Services.”)
I found the article fascinating, but I thought it did a surprisingly poor job helping the reader understand how the Cloud might affect real-world hosting decisions.
At the center of the problem was the effort the author made to demystify the cost of cloud hosting. In order to provide a common storage and processing task against which all the major cloud service providers’ fees would be measured, the author chose the following:
“(Hosting) a medium-sized website with about 50 million page views a month…” Read more
In January, ServInt launched our cutting-edge SolidFire SSD VPS cloud storage platform. It is simply the fastest, most highly scalable, and most reliable turn-key hosting solution on the market today.
Almost since the day we launched the SolidFire SSD VPS, our customers have been asking when they’d be able to buy a dedicated server with SolidFire SSD cloud storage.
That day has arrived!
You can now order a Flex Dedicated server with either onboard SSD or SolidFire SSD cloud storage. Both options offer the speed of an all-SSD storage array, but our SolidFire SSD cloud storage gives you additional advantages, summarized below: Read more
“This weakness allows stealing the information protected, under normal conditions, by the SSL/TLS encryption used to secure the Internet. SSL/TLS provides communication security and privacy over the Internet for applications such as web, email, instant messaging (IM) and some virtual private networks (VPNs).”
This vulnerability impacts OpenSSL versions 1.0.1 through 1.0.1f and 1.0.2-beta. ServInt customers may have this vulnerability if they are running CentOS 6. CentOS 4 and 5 do not ship OpenSSL versions affected by the Heartbleed vulnerability. Read more
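As a rough illustration of the version ranges mentioned above, here is a minimal Python sketch that flags upstream OpenSSL version strings falling in the Heartbleed-affected range (1.0.1 through 1.0.1f, plus the 1.0.2 beta). The function name and the parsing approach are our own assumptions for illustration, not a ServInt tool. It only looks at the upstream version string, so treat a match as a prompt to check your package's changelog rather than as proof: distributions often backport fixes without changing that string.

```python
import re

# Matches an upstream-style OpenSSL version such as "1.0.1e" or "1.0.2-beta1",
# either on its own or inside a banner like "OpenSSL 1.0.1e-fips 11 Feb 2013".
_VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)([a-z]?)(-beta\d*)?")

# Upstream releases affected by Heartbleed (CVE-2014-0160).
_AFFECTED_101_LETTERS = {"", "a", "b", "c", "d", "e", "f"}   # 1.0.1 .. 1.0.1f
_AFFECTED_102_BETAS = {"-beta", "-beta1"}                    # fixed in 1.0.2-beta2


def is_upstream_vulnerable(banner: str) -> bool:
    """Return True if the version string names an affected upstream release.

    Hypothetical helper: it does not detect distribution backports and does
    not cover 1.0.1 pre-release betas.
    """
    match = _VERSION_RE.search(banner.lower())
    if match is None:
        return False
    base, letter, beta = match.groups()
    if base == "1.0.1":
        return beta is None and letter in _AFFECTED_101_LETTERS
    if base == "1.0.2":
        return letter == "" and beta in _AFFECTED_102_BETAS
    return False


if __name__ == "__main__":
    import ssl

    # ssl.OPENSSL_VERSION is the banner of the library Python was linked against.
    print(ssl.OPENSSL_VERSION, "->", is_upstream_vulnerable(ssl.OPENSSL_VERSION))
```

Run on a server, the script reports only on the OpenSSL build that Python itself is linked against; other services on the same machine may use a different copy of the library.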