Last week I wrote about how ServInt was beating the NSA. Here’s a talk I gave on the same subject at TechWeek in Chicago last month. ServInt cares about protecting our users’ rights. The talk explains why we care, and what we’re trying to do to fix the “NSA problem.”
Yesterday I was interviewed by Bloomberg News about the effects of NSA surveillance on the Cloud. They wanted to know if we had lost any customers specifically because of the Edward Snowden leaks. This, of course, is a hot topic: how is mass surveillance affecting the cloud, and can we quantify the damage that is being done? Is it costing us jobs and economic growth in the cloud? The answer, of course, is “yes” — and ServInt isn’t scared of saying so.
I said that we had lost customers and even more potential customers — which is true. ServInt has been one of the few players willing to speak up and say this, and as a result we have been quoted in places like The Hill and the New York Times. The cloud hosting field is a tough, competitive business, and it is hard to talk about losses. But ServInt isn’t afraid of calling out the problem, because we have been leaders in directly addressing the issue since it arose a little over a year ago.
The cloud in the United States has been badly hurt by the actions of the NSA. These days anybody can relocate their digital business with just two or three clicks of a mouse. You don’t need to sign a long contract or tell anybody why you are making your choice; you just move. I’ve talked to a lot of people who have decided they want to move their business outside of the United States because they feel like the US doesn’t care about privacy. I’m quoted in the Bloomberg article describing this as a “death by a thousand papercuts.” I was talking about the effect on the overall economy, not our business, which for the record has seen a 30 percent decline in foreign signups since the NSA leaks began, not a 30 percent decline in total foreign customers.
In fact, ServInt is actually weathering the Snowden storm very well, compared to many of our competitors. Why? Because our clients trust us. They understand the cardinal rule of security and data safety:
It’s not where you’re hosted, it’s how you’re hosted.
Your business needs to stay up, online and fast. It needs to stay stable and secure. And your data needs to be protected. You need experts at the helm to accomplish all of those things — experts you trust. And earning the trust of small to medium businesses is what ServInt has been doing for 19 years.
The NSA revelations are just another hurdle to overcome in ServInt’s ongoing pursuit of being the most trusted name in the Cloud. We’re doing so by requiring warrants for content, and by responsible handling of data. We’re doing so by being thought leaders in the fight against NSA surveillance in Washington, through our leadership within the i2Coalition. And we’re trying to curb the misinformation about NSA surveillance. Everybody tempted to move their content out of US datacenters needs to remember that the vast majority of all spying is done on foreign networks. “Move your site out of the U.S. to avoid spying” may be good marketing, but it doesn’t take into account the reality of how surveillance works.
We do all this because we want to win the day, and win it honorably, by doing the right thing. We win the day when we make customer trust our number one goal. We win the day when our customers know we have their backs when it comes to protecting their data, and we win the day when we fight for privacy and NSA accountability.
Editor’s Note: Because Nginx is third-party software, ServInt does not support it beyond installation. Also, to follow these instructions, users must have a working knowledge of the command line. Click here if you would like more information about logging into your server on the command line.
Nginx is an alternative to the Apache web server that is popular with some server admins. Advanced users may choose to run Nginx instead of Apache, as it can offer performance benefits in certain configurations. Read on if you are interested in installing Nginx on a server running cPanel. Read more
Earlier this week, we launched two new tools designed to immediately inform our customers whenever issues emerge that may affect the performance or accessibility of their server. In this brief blog post, we’ll explain how these new services work and how you can put them to use.
First, an important detail: both of these notification services are hosted completely independently of the ServInt network — so no network or datacenter event that impacts ServInt’s core infrastructure will prevent us from providing you with up-to-date information. If you want to be certain you always have visibility into the ServInt nerve center in the event of performance issues that may affect your server, I strongly urge you to sign up for both of these services.
These systems were designed to alert you about two kinds of incidents: those that affect specific infrastructure while the larger ServInt network is functioning without incident; and events that impact entire segments of the larger ServInt network, affecting multiple customers at once. We are very proud of our high levels of network redundancy and our enterprise-grade hardware, so incidents that involve our infrastructure are rare. Even so, problems can and do occur despite our best efforts. That’s why these systems exist. We hope you never have to use them, but in the interest of full transparency, they are here for you.
For reporting on smaller pieces of infrastructure, we have launched an opt-in Twitter account called ‘@ServIntStatus’ (twitter.com/servintstatus). Use of this system is simple: items will pop up on this feed when we are investigating problems specific to a hardware or system component. You also have the option of having Twitter send you push notifications from the account whenever an issue is discovered or a maintenance requirement is being fulfilled — but given the size of our network, this may result in information overload. Please note that this feed will only display brief status updates that reflect initial investigation or discovery on our end. For resolution and ongoing updates, please log in to the customer portal.
The second of our two notification systems is called the “ServInt Status Report,” and it can be found at https://servint.statuspage.io/. This site was set up to display information about service issues that are affecting ServInt’s core network and/or datacenter, impacting multiple customers at once. Sometimes these issues can affect ServInt’s customer-facing systems at the same time they impact our datacenter — so, during the crucial minutes after disruptions are discovered, but before they’re fixed, this is the place to turn for immediate updates on what is happening.
Of course, this report is also capable of sending you push updates to let you know when our network is dealing with a service-impacting incident. We strongly recommend you subscribe to these updates, since the number of alerts you are likely to receive will be small, and each of them could be of critical importance to your business.
In addition to providing a real-time network status monitor for ServInt’s Reston, Los Angeles, and Amsterdam datacenters, the ServInt Status Report portal also shows you the operational status of the ServInt customer portal and phone system. It also provides users with detailed updates on what is being done at the moment to address identified issues. Lastly, it displays service incident reports for all events in the recent past, allowing you to better understand what happened to the ServInt network in retrospect — helpful if you only discovered a performance shortcoming after the root cause had been fixed.
Operating instructions for the ServInt Status Report can be found at the site, and they are very straightforward. I strongly urge all ServInt customers to sign up — both for the Status Report and the ‘@ServIntStatus’ Twitter feed — as soon as possible.
It was a terrible, horrible, no good, very bad week at ServInt — the worst we’ve had in 10 years, since a fiber cut in just the wrong place took us offline completely for seven hours. That day was one of the most professionally terrifying of my life, but we learned from it and we grew. In the wake of that event we added redundancies far beyond the “industry standard,” we fixed a ton of processes, and we quickly regained the faith of our customers. To this day, that date in 2004 remains the last time ServInt’s entire network has gone down.
Every time we experience problems I am determined to make sure we learn from them. This week is a big learning week, because it’s been fraught with some of the biggest problems we’ve seen in nearly a decade. Let me take a few moments to tell you a bit about what challenges this week brought — to show you what went wrong, and what we did right as we resolved them.
This week started with an announcement that the largest kernel-level exploit in the history of our VPS and virtual dedicated offerings had been discovered. This exploit could have allowed hackers to access not only our customers’ VPSs but also the machines they were hosted on. A fix would require the reboot of literally thousands of servers, while minimizing the impact on our clients’ businesses — always our top priority. Within 48 hours we performed emergency maintenance on nearly every single customer in our datacenter. This meant forcing every single customer to accept at least a little downtime in the pursuit of vital security protections. Some customers did not like this, but if I had to do it all over again I would do it the same way. I am proud of the way ServInt rose to the challenge and protected our customer base from this dangerous exploit in such a swift manner.
I was really hoping that the week would get easier from there — but it didn’t. Last night, one of ServInt’s datacenters experienced one of the strangest, most difficult to explain, and most difficult to solve networking problems we have ever seen.
We build our networks to withstand most anything. We have stayed up through hurricanes, ice storms, and more equipment failures than I can count. We’ve made it through extended power disruptions, and through other horrendous events that would have taken down less thorough providers many times over. But this one got us good for a while.
On Saturday evening, our network was running smoothly, as it generally has for more than a decade. Suddenly our monitoring system started showing red/green/red/green/etc. The phrase “this is not a drill” had to be used as senior engineers were plucked from their lives and rushed into the datacenter. Our COO was on a plane and I was at dinner, but the engineering fix-it team that really needed to be there was there, immediately. What made this situation unique, and impossible to fix in the normal few minutes, was that the critical equipment in the process of failing seemed incapable of making up its mind whether it was healthy or not. Making matters more challenging: high levels of equipment redundancy (normally a very good thing) made it nearly impossible to determine where the problem lay. Our top engineers, without access to reliable diagnostic data, literally had to pull the network apart and put it back together to find the exact piece of hardware that had gone haywire (in this case, a router) and caused everything else to behave erratically. In the meantime, there was simply no information to share with increasingly frustrated customers, and our Tweets and Facebook posts began to sound unnecessarily vague.
In a typical router-failure situation, as soon as the router shows “red/down” on our monitoring system, we post something like: “A failed router has interrupted traffic on part of the network. This is being fixed and we’re routing around it — we’re sorry for the inconvenience.” Those are facts and details, things people can draw confidence from. With no reliable detail to pass on, however, our team was left to post rather vague updates for quite some time. It was frustrating, and it made us seem much worse at communication than we actually are.
In the end, last night’s events pointed out some of ServInt’s greatest historical strengths — and some newly discovered weaknesses. We’re still the best in the business at running a reliable, robust network and data center — and, when necessary, finding and fixing complex technical problems. When it comes to customer support and communication through a crisis, however, we need to do better. Having no support/communication failover systems, and forcing ServInt and its customers to rely on Twitter and Facebook to communicate, was totally unacceptable. We will build greater redundancy into our ticketing and communication systems to make sure that never happens again.
Having said that, we can’t promise that technical glitches will never happen again. They are a fact of internet life. What matters most is that we must always — always — learn from these thankfully rare events, and become a better service provider as a result. I promise you we will do so in this case as well. You’ll see the results of this growth as the weeks and months unfold. I am confident you’ll like what you see. Thank you, as always, for your continued faith and trust in us.
There’s an interesting parallel between the way people buy web hosting and the way they buy sports cars. Frequently, the sports car purchaser who doesn’t actually compete in races will buy their vehicle based on theoretical maximum performance capability, examining numbers like top speed, maximum horsepower and so forth to see how fast their dream car might theoretically go.
Of course, people who actually race for a living understand a critically important maxim: top speeds don’t win races, high average speeds do. That means it’s just as important to be able to speed around accidents and slow traffic as it is to power down the straightaways as fast as possible.
It’s the same with hosting. The size of a CPU, the amount of RAM, the network uplink speed — these are all important metrics, but everybody’s working with similar engines these days. You can buy the same specs at another host and still never see reliable performance, because your server can’t swerve around the accidents and slower traffic without getting bogged down. Why? Because of something called IOPS. Read more
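If you want an intuitive feel for what IOPS measures, here is a minimal Python sketch that times random 4 KB reads against a scratch file. It is illustrative only: operating-system page caching makes the numbers far rosier than a raw disk would produce, and the function name and parameters are my own, not from any benchmark suite.

```python
import os
import random
import tempfile
import time

def estimate_read_iops(file_size=16 * 1024 * 1024, block=4096, reads=500):
    """Time random block-sized reads against a scratch file and return reads/sec.

    A toy estimate only -- the OS cache serves most of these reads,
    so real spinning-disk numbers would be dramatically lower.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(file_size))
        path = f.name
    try:
        with open(path, "rb") as f:
            start = time.perf_counter()
            for _ in range(reads):
                # Seek to a random block-aligned-ish offset, then read one block
                f.seek(random.randrange(0, file_size - block))
                f.read(block)
            elapsed = time.perf_counter() - start
        return reads / elapsed
    finally:
        os.remove(path)

print(f"~{estimate_read_iops():.0f} random reads/sec (cached; raw disks are slower)")
```

The point of the exercise: two servers with identical CPUs and RAM can differ by orders of magnitude on this one number, which is why spec sheets alone don’t predict real-world performance.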
If you’ve managed online applications or websites for any length of time, you’ve almost certainly dealt with hardware failures. VPS technology mitigates some of the more common types of failures, and Cloud has mitigated others. But the fact remains, hardware failures — failures of the machines housing and crunching your data — can still happen at any time.
There are many hardware and software solutions to limit the damage from hardware failures: RAID arrays, hot-swappable drives, dual power supplies, multi-core computers, and multi-stick RAM all work to introduce redundancy into the hardware; while backup solutions, load balancing and CDNs introduce redundancy into the data.
Most hosted content, however — whether it’s hosted on a dedicated server, VPS server or “in the Cloud” — still exists on one single physical computer. So if there is a catastrophic failure of that computer, your site goes away until the data can be recovered and rewritten to the drives on a new computer. Read more
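The RAID redundancy mentioned above rests on a simple idea worth seeing concretely: single-parity schemes (the RAID-4/RAID-5 family) store the XOR of the data blocks, so any one lost block can be rebuilt from the survivors. Here is a minimal Python sketch of that idea; the function names are mine and purely illustrative:

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length data blocks together to form a parity block."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single lost block by XORing the survivors with the parity."""
    return parity(surviving_blocks + [parity_block])

data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)                           # stored alongside the data blocks
rebuilt = reconstruct([data[0], data[2]], p)  # pretend the second "drive" died
assert rebuilt == data[1]
print("reconstructed:", rebuilt)           # b'BBBB'
```

Because XOR is its own inverse, XORing the parity with all surviving blocks cancels everything except the missing block — which is exactly why a RAID-5 array survives one drive failure but not two.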
There was a time in hosting’s distant past when virtualization and Cloud were foreign words. Back then, the idea that you could put multiple customers on a single host machine and give them all fully partitioned and secure “virtual environments” — environments that looked and acted exactly like a small dedicated server — was novel, if not literally unbelievable. Most people who wanted to host a website simply assumed they had to build or rent a physical server in a room somewhere.
Oh, how things have changed. Now, actual physical infrastructure has become conceptually divorced from the idea of a “web server.” Want to host a web site? These days, you buy amorphous cloudy things like “instances” and “environments,” which you scale up or down as your site requires, nearly instantaneously. Costs are down, speed-to-deployment is way up, and it’s all pretty miraculous. But our eagerness to forget what a pain in the neck it is to actually own and manage a real, live server has also made us forget what we sacrificed to get scalability, redundancy, flexibility, and all the other benefits of virtualization.
The big tradeoff — the “con” against which all the “pros” of cloud must be weighed — is the fact that, no matter how you slice it up and partition it, shared infrastructure is just that: shared, usually by many. Read more
Last week we talked about the dangers of generalizing about website and app requirements when picking a cloud service provider. Here’s the big question we’re going to try to answer this week:
Is it even possible to compare prices between cloud hosting options?
An increasing number of large cloud service providers have been trying to address the problem of explaining just what their services cost by producing cost calculators like Amazon’s. There are a few problems with these calculators. Read more
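Part of what makes these calculators hard to compare is that a cloud bill is assembled from several metered dimensions at once. The sketch below uses entirely made-up rates (not any real provider’s pricing) to show how the same workload can price out differently under two rate cards:

```python
def monthly_cost(hours, hourly_rate, storage_gb, gb_month_rate, egress_gb, egress_rate):
    """Toy cloud bill: compute-hours + storage + outbound bandwidth.

    All rates passed in below are hypothetical placeholders for illustration,
    not any provider's actual pricing.
    """
    return hours * hourly_rate + storage_gb * gb_month_rate + egress_gb * egress_rate

# One workload, two hypothetical rate cards that weight the dimensions differently
workload = dict(hours=730, storage_gb=200, egress_gb=500)
provider_a = monthly_cost(**workload, hourly_rate=0.10, gb_month_rate=0.05, egress_rate=0.09)
provider_b = monthly_cost(**workload, hourly_rate=0.08, gb_month_rate=0.10, egress_rate=0.12)
print(f"A: ${provider_a:.2f}  B: ${provider_b:.2f}")
```

Provider B looks cheaper on the headline compute rate, but the storage and bandwidth rates flip the comparison for this particular mix — which is precisely the trap a single-number calculator can set.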
Last week, a good friend who works at Google sent me a link to a Wall Street Journal story on the price wars that seem to be heating up in the cloud computing and storage sectors. (Editor’s note: WSJ hyperlinks only work once. To read this article, run a Google search for “A Price War Erupts in Cloud Services.”)
I found the article fascinating, but I thought it did a surprisingly poor job helping the reader understand how the Cloud might affect real-world hosting decisions.
At the center of the problem was the effort the author made to demystify the cost of cloud hosting. In order to provide a common storage and processing task against which all the major cloud service providers’ fees would be measured, the author chose the following:
“(Hosting) a medium-sized website with about 50 million page views a month…” Read more