Data Center Move
Over the last couple of weeks, we’ve been working on moving our data center at work. The project took a lot of planning (with moving deadlines and contingencies), but in the end the planning paid off: the move went very smoothly.
Production Software Support
At work, I’ve been involved in a new effort to overhaul how we handle development support for our production software. As the Ops lead, I’m very passionate about this, so I’m excited to be part of it. I have to admit that I’m somewhat of an idealist when it comes to projects at work, but I have this vision that production software live in production…
Interfacing with Remote Developers
One of the interesting things about my job is that I get to work with a lot of developers. I recently spent a week in Riverdale, MD, with a group of developers whom we usually work with only remotely: via phone or e-mail, or when we enter a trouble ticket for a system issue or software bug. Working with this team face-to-face was a really rewarding experience.
One thing I have learned is that it’s very easy for our team to reach out to developers when something isn’t working. From a professional standpoint, we have everything needed to start a conversation: something in common (the software), a catalyst (the bug), and a goal (making it work again). With our local development team, whom we see every day, it’s easy to say, “hey – everything is working swell again, thanks.” With our remote developers, this is a lot harder; it doesn’t really make sense to call them up and say, “Hey – I just wanted to call and say everything is a-ok!” because there is no catalyst.
Good Operational Software – Part 2
This is the second part of my thoughts on good operational software. For part one, see the previous post.
The second is separating configuration management from software development. Like logging, this one is easy on the surface; it’s fairly trivial to move the main configuration parameters into a configuration file rather than hard-coding everything. The most interesting thing about this, however, is that it makes dependencies explicit. For example, if an application depends on a database connection, then there must be configuration for that database: the host, port, database name, user, and (hopefully encrypted or obscured) password.
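To illustrate how a configuration file makes a dependency explicit, here is a minimal sketch using Python’s standard-library configparser. The section name, keys, host, and the password_file indirection are all hypothetical; the point is only that the application refuses to start unless its database dependency is fully described.

```python
import configparser

# Hypothetical settings file: each external dependency gets its own section,
# so reading the file tells you what the application needs in order to run.
EXAMPLE_CONFIG = """
[database]
host = db.example.org
port = 5432
name = inventory
user = ops_reader
password_file = /etc/myapp/db_password
"""

REQUIRED_DB_KEYS = ("host", "port", "name", "user", "password_file")


def load_database_settings(text: str) -> dict:
    """Parse the [database] section and fail loudly if anything is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("database"):
        raise ValueError("missing [database] section")
    section = parser["database"]
    missing = [key for key in REQUIRED_DB_KEYS if key not in section]
    if missing:
        raise ValueError(f"database config missing keys: {', '.join(missing)}")
    return {
        "host": section["host"],
        "port": section.getint("port"),
        "name": section["name"],
        "user": section["user"],
        # Assumption: the password lives in a protected file, not inline here.
        "password_file": section["password_file"],
    }


if __name__ == "__main__":
    settings = load_database_settings(EXAMPLE_CONFIG)
    print(settings["host"], settings["port"])
```

Checking the required keys at startup means a missing dependency fails immediately with a clear message, rather than surfacing later as a confusing connection error.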
Good Operational Software – Part 1
As the Operations lead, I find myself pondering the difference between good software and good operational software. The software development team here at NSIDC is a sharp group; they know good software when they see it. Further, many of them recognize software that isn’t operationally sound when they see it. But I don’t think we, as a technical group, have nailed down a set of features we can use as a benchmark for “this makes the software operational.” As far as I can tell, these features vary by project, and, like science fiction, “good operational software is best described by pointing at it.”
That said, to get some ideas out there, there are three things I’ve noticed that I consistently point at and say, “that makes this software operational.”
New Operations White Board
This year, during the holiday season, I hatched an idea for a new white board for NSIDC Operations. Our old white board was serviceable, but it was getting long in the tooth and wasn’t as functional as we’ve needed lately. So, I ordered a new magnetic white board and started putting together something more flexible that could change as our duties evolve.
Below are pictures of the old and new white boards:
Aside: I love my job
I really enjoy my job, though precisely why is quite difficult for me to convey in writing: it’s tied up in what I do, who I work with, and just how much fun the daily problem solving is for me. If this sounds strange to you, this post is likely going to be difficult to grok. For everyone else, here I go…
What I do
I am the Data Operations Supervisor at the National Snow and Ice Data Center. I’ve been working for NSIDC off and on since 2005 (I started as a student while working on my graduate degree), and continuously since 2010 (when I graduated and came back to work full time). My job is currently split between data operations tasks, high-level architecture/engineering, and administrative/supervisory work.
The data operations tasks take the bulk of my time, but because I have one foot in architecture, I help surface the things that make the job easier or harder, while also integrating new technologies and techniques into our operational stack. Going back and forth between detailed troubleshooting and high-level work is what makes this especially rewarding: I get to tackle problems from both ends, how to fix them now and how to keep them from breaking tomorrow.
A Day in the Life: Part 2
This is a continuation of my series on a typical day in my life (see the first part). This part describes my day job in probably more detail than anyone wants to know. I’ll talk about why I love my job in another aside.
Another Morning
Once I get to work, my day isn’t quite as time-stamped and regular as the earlier part of the morning, but there are some key things I’ve come to count on. I make breakfast in the work kitchen and start catching up on my morning e-mail. Ops gets a lot of automated e-mail (counting normal e-mail, it’s not uncommon for us to get 200–400 e-mails a day); however, much of it conveys valuable status on the state of the system, so it’s useful to know first thing in the morning. Additionally, some daily processing kicks off at ~6:00am; since it is in the critical path for many of our near-real-time products, it’s good to have a pulse on it earlier rather than later. Typically, getting a clear picture of the state of the system takes until 9:00 or longer, depending on what’s going on.
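A question like “has the ~6:00am processing started yet?” is easy to automate so it can be answered before the e-mail is sorted. This is a hypothetical sketch, not our actual tooling: it assumes the time the job last started can be looked up somewhere, and the 30-minute grace period and the function name are arbitrary choices for illustration.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Hypothetical settings: the post only says the job kicks off at ~6:00am.
EXPECTED_START = time(6, 0)
GRACE_PERIOD = timedelta(minutes=30)


def daily_run_status(last_started: Optional[datetime], now: datetime) -> str:
    """Classify today's run of a once-a-day job.

    last_started: when the most recent run began, or None if it never ran.
    now: the current time, passed in so the check is easy to test.
    """
    if last_started is not None and last_started.date() == now.date():
        return "started"
    # No run yet today: only flag it once the grace period has passed.
    deadline = datetime.combine(now.date(), EXPECTED_START) + GRACE_PERIOD
    if now < deadline:
        return "not yet due"
    return "late"


if __name__ == "__main__":
    now = datetime(2024, 1, 8, 7, 15)
    print(daily_run_status(datetime(2024, 1, 8, 6, 2), now))  # started
    print(daily_run_status(datetime(2024, 1, 7, 6, 1), now))  # late
    print(daily_run_status(None, datetime(2024, 1, 8, 6, 10)))  # not yet due
```

Passing the current time in as an argument, rather than calling datetime.now() inside the function, keeps the check deterministic and easy to test.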
The Operations stand-up is at 9:00am every morning. This is (in theory) a quick, 15-minute standing meeting with my team to organize the morning and keep everyone apprised of what’s going on with the system. Each day, one person is scheduled as the ECS lead (responsible for the care and feeding of the EOSDIS Core System, our primary data management system for NASA data), while a second person is scheduled as the V0 lead, responsible for our non-ECS operations, who (for historical reasons) also serves as the backup. During the stand-up, the previous day’s lead and backup report anything notable from that day, as well as any outstanding items. Then everyone else gives a status on their tasks, along with any impacts from their work that the rest of the team needs to know about.
The Most Interesting Operator
I don’t always deploy code…
…But when I do, I deploy it to production