Unix sysadmin blog: Unixplaza

Unixplaza Blog Site

My friend and former colleague Mohamed recently launched a Unix blog: "Unixplaza". Looks like he is writing about his own Unix sysadmin adventures. He really knows his stuff, so that should be cool to watch.

When you touch that server you touch me

They turned off the HPSIM and general management and alerting server this morning, or at least unplugged it, because it was causing a huge network spike at a remote site.

I know for a fact that no one besides myself knows exactly what that machine does, as it's only useful to me and what I do.

That doesn't mean it isn't explained in the server list in SharePoint that I made and painstakingly try to keep up to date, and that no one ever bothers to look at.

And of course no one bothered to ask during the day what the impact of unplugging the server would be.

I mean, who cares about hardware and remote monitoring of servers anyway. It is, after all, only the most basic part of my job.

That made me feel really appreciated.

HPSIM was reinstalled a few weeks ago by one of my colleagues. When I explained it took me 2 days to set it up last time I installed it, he was surprised.

I will admit, it doesn't need to take that long. But it was new software to me at the time, and I was careful, and ran into some awkward service account issues.

It's a very messy collection of software, basically, so you need to be careful and precise.

I read the manuals first.

I ended up needing 3 different service accounts. With different levels of rights and access. 

He reinstalled HPSIM in about 1 hour. It's his way, he loves to impress with how fast he can do things.

I haven't logged on to it in the meantime, because my time was needed elsewhere for the last few weeks. Build activities go first. Project. Bids. Money.

I warned them in a long email 2 weeks ago that no one was now doing any active systems administration. No one was keeping an eye on things. No one was cutting the grass.

Fast forward to this morning...

So, I can't dispute that HPSIM or something on that server killed that site's 2 Mbit WAN line for an hour, daily, between 10 and 11.

I went in over the iLO to have a look, after I asked them to at least plug -that- back in.

The HPSIM service wouldn't start, as it couldn't authenticate its domain service account, because it had no network. This was expected.

What wasn't expected was the fact that it was using this colleague's domain admin account to start.

And so was the OpenSSH service.

And so was the Software update repository service.
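
A check along these lines would flag that kind of thing early. This is only a sketch, not what was running on that server: the service names, the EXAMPLE domain and both account lists are made up for illustration, and it assumes you already have a list of (service, logon account) pairs exported from the machine some other way.

```python
# Built-in Windows service identities, lowercased for comparison.
BUILTIN_ACCOUNTS = {
    "localsystem",
    r"nt authority\system",
    r"nt authority\localservice",
    r"nt authority\networkservice",
}


def find_unexpected_logons(services, approved_accounts):
    """Return (service, account) pairs whose logon account is neither
    a built-in identity nor on the approved service-account list."""
    allowed = BUILTIN_ACCOUNTS | {a.lower() for a in approved_accounts}
    return [
        (name, account)
        for name, account in services
        if account.lower() not in allowed
    ]


# Hypothetical inventory: names and accounts are illustrative only.
services = [
    ("HP Systems Insight Manager", r"EXAMPLE\jdoe-admin"),
    ("OpenSSH Server", r"EXAMPLE\jdoe-admin"),
    ("Print Spooler", "LocalSystem"),
    ("Backup Agent", r"EXAMPLE\svc-backup"),
]
approved = [r"EXAMPLE\svc-hpsim", r"EXAMPLE\svc-backup"]

for name, account in find_unexpected_logons(services, approved):
    print(f"{name}: runs as {account}")
```

Anything it prints is a service started under a personal or otherwise unapproved account, which is exactly what you don't want to discover over the iLO after the fact.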

I curse myself for not having reinstalled it myself, for one. And I curse myself for not having managed that server myself the past few weeks.

They ask me now, wtf was that server doing? I honestly don't know. I haven't managed it for the past few weeks, due to me being allocated to build activities, as they well know.

I hate it. I hate the fact that I don't know.

Even though I have no need to feel responsible, I so very much do. This server was mine, it did this on my watch, at least that is how it feels.

I can't be sure what caused the network spike, and I will never know because they won't let me plug the server back into the network.

This weekend I will reinstall HPSIM on a different server. A server that I had racked as a spare, for this exact kind of scenario.

It will be reinstalled slowly, carefully, with the appropriate documentation at hand, as I did last time.

It will be stable. It will be secure. It will be managed.

It will be beautiful.

And I am not gonna let anyone else on that server. If it ever misbehaves again, they can hold me personally accountable, I want them to, god knows I want them to.

There is only one person in my department with a sense of responsibility for our environment.

There is only one person in my department who actually cares things are done correctly.

Every time I place my trust in another technical person, I am disappointed.

No one else is touching that server from now on.

Happy Sysadmin day.

Big Bang Servermove successful

This Saturday we switched the subnet over and moved all the remaining critical systems to a new server room across the country.

I didn't get to take as many pics as I would have liked, and no video, mainly because I was so busy of course. :D

Half the pics below are courtesy of Arnold.

IMG_3656
Back of the Digital Alpha box, reference shot for recabling.

IMG_3657
Packing up the servers in special locked crates. You could see these movers were the right stuff: big burly guys, but they handled the servers like feathers.

 

IMG_3658
Justin felt at home at the new location.

Big Bang Photo 005
x3800 waiting to be converted.

Big Bang Photo 001
We laid out the servers in the hall in the order we were building them into the racks.

IMG_3660
Our project operation center, round the corner of the serverroom. From here the managers of the project coordinated the downtime, and the business on-site testing.

IMG_3659
Tom, our WAN guy, on the phone with Mohamed, our Unix guy, who was supporting us remotely.

IMG_3661

Big Bang Photo 004

Big Bang Photo 003
Very good food and snacks were provided by Arnold and Jan. Thank you guys! You know the best way to an engineer's heart is through his stomach!

IMG_3658

Big Bang Photo 006
Justin loves Legos, so this picture seems to make sense.

Big Bang Photo 007
Justin working on the rack conversion kit for the IBM systemx 3800

Big Bang Photo 008
Tom and me discussing the Proxy server and internet line out. Old proxy, new Internet line, and some firewall rules were needed.

Big Bang Photo 010
Arnold: “2 pizzas please!” ; Mustafa is thinking: “I don't like pizza”…

Big Bang Photo 011
Paul being technical. We hold our collective breath.

Big Bang Photo 009
Sliding a server into its new home.

IMG_3662
Bottom of racks SR3 and SR4

IMG_3666
Top Left rack, SR4

IMG_3665
A surprise box. This whitebox FTP server turned out to be running 4 essential customer FTP/EDI flows. We had space for it thankfully, resting on the IBM x3800. Hopefully it will be gone in 2 weeks, but nothing is temporary. Took this picture of it for the Visio rack diagram.

IMG_3668


IMG_3669
One of the 2 redundant FTP servers failed in transport. Justin and Paul spent 2 hours putting a new one together out of spare parts from old ones. HP NetServers here, P2 machines, a decade old.

IMG_3670
Behind a locked door and locked racks, all servers hum quietly, content in their new home.

A day later, back in the first location:

IMG_3674
Empty racks moved out of the server rooms

IMG_3675
Mail Cluster about to be sent back to the UK

IMG_3677

IMG_3678
Gertjan visibly enjoying dismantling the place. 6 years of mental burden of supporting this stuff being dealt with here ;)


IMG_3680
The NAS, all the data for the Netherlands, with all volumes deleted and now unplugged, ready for storage.

Videos of “demolition”:

 


Mustafa and Gertjan have taken apart the entire second server room in 1 day. (click here for the link if you can't see the embed above)

 


Oooh.. Knobs! Lovely knobs!!  (click here for the link if you can't see the embed above)

Meanwhile, back at the old location

 

 
A tour of the current state of the office, and one of the server rooms, where most racks are now empty or near-empty.


An Epic moment. Gert-Jan uninstalls Citrix on the last 2 servers in the farm, effectively ending 6 years of the 2000-user, 60-server Metaframe XP Citrix farm that served our Netherlands users. I would have had Marcel do this, but he has gone on holiday.


Various IBM servers and almost all the blades lie ready to be moved to the new location. We don't have a use for any of these currently, so they will go into storage.




IMG_0695 IMG_0684
Remember when we were young and the world (servers) was new?


Both blade centers are going into storage, we have no use for them.


IMG_0909
Blade Centers in their heyday

The racks are becoming quite bare now.

Before:
IMG_0144

After:

 

Before:
IMG_0143

After:


Before:
IMG_0987

After:

 

Before:
IMG_0149

After:

 


The pile for the garbage container grows and grows.

 


Marcel posing with all of the DL360 G3 servers (going into storage). He built the farm all those years ago, now he bears witness to its demise. It's a little sad for all of us who spent so many years maintaining it all.

 


I managed to get my hands on one of my favorite servers, the IBM systemx 3650 with the 8 SAS disks in it. We reinstalled it as a new SQL2005 system that will host, amongst other things, the HPSIM, IBM Director, Websense and SharePoint databases. All for internal IT use.

 

T-Minus 1 day to Big Bang serverroom move

Spent today going over some cabling details mostly. It's strange how stressed we all became over these little details, while we will have larger issues to worry about on the day.


(high res)

During the move, we will be moving the VoIP servers, RF-Controllers (used by various sites to do handheld-scanning), an Authentication server, the Wyse-Terminal (thin client) management server, and 1 PDC.

We are also moving 2 very critical FTP/EDI (Electronic Data Interchange) servers that handle lots of customer-related FTP flows and internal EDI between warehouse management systems.

All the above systems (in the rack diagram in RED) rely on keeping their old IP address for now, and thus we are moving the entire subnet over to the new site. This is why we call it the big bang. After that, the old site will no longer be routable to the rest of the network, and all that remains is stripping it and storing the hardware that is left.

Also the former Domain Controllers / DNS servers are an issue. Their IP addresses have long since been used in clients all over the place, the config of many of which we can't control centrally (for example through DHCP). Therefore I am moving 1 of the old PDCs, and it's keeping 2 of the 3 legacy IP addresses we need to keep alive.

We spent yesterday doing the last cabling work and other such things in preparation for Saturday (see pics below). I still have some things to do: there are some management/reporting scripts still running on the old site that I need to migrate tomorrow, and I have yet to get round to re-installing HPSIM again. I might start on it after this post, actually.

Here are yesterday's pics:

 

Starting to look more like the planned diagram now. As you can see, we replaced the Sun StorageTek with the HP StorageWorks MSL9000 that we salvaged from one of the Warehouse Management Systems that was migrated to Prague last week. We don't know if we can use it though, it's kinda overkill for the amount of data we now need to back up anyway.



Mustafa is set to become my new right-hand man after his project, he certainly has got the right attitude ;)

Oh that's Justin, he has been added to this project to take some of the load off me. He really knows his stuff and is a whirlwind of highly-opinionated energy ;)

Ninja-Consultant. Implements ESX-virtualisation solutions when you are not looking!

Ser is our resident LAN guy. Here we created a patching/switchport diagram for the racks (on the laptop screen), and he is tightening up the VLAN configs on the stacked core switches. Justin and I then quickly repatched everything before anyone noticed there was a service interruption ;) (don't worry, the really critical stuff goes in during the Big Bang)

Version 7.1 of the TCR rack diagram

This is more or less how it's gonna be. All the bits are now in place, half the rack stuff has been put in place already, the last stuff goes in this coming Friday.

One of the biggest oversights we made was with power requirements. If you want redundant power for a lot of servers, you need to make sure you have a LOT of plugs, enough PDUs, and plenty of PSU to back it all up. We didn't, partly due to time constraints and budget, but also short-sightedness, and my reliance on people that I, to put it bluntly, shouldn't have relied on to do the math.

The above pic is also available in very high res this time, so you can see all the exact server types. These Visio stencils are very detailed, I love em!
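
The math itself is not hard, which makes it worse. Here is a rough sketch of it. Everything in it is an assumption for illustration: the server names, PSU counts, wattages, the outlets per PDU and the PDU capacity are invented, not our real numbers. It assumes every power supply needs its own outlet, that a server's PSUs are split over an A and a B feed, and that each feed must be able to carry the whole rack alone if the other one fails.

```python
import math


def power_plan(servers, outlets_per_pdu, pdu_capacity_watts):
    """servers: list of (name, psu_count, watts) tuples.

    Returns the outlets needed in total, the PDUs needed per feed,
    and the load a single feed must carry if the other feed is lost.
    """
    # Split each server's PSUs over two feeds; odd ones land on feed A.
    feed_a_outlets = sum(math.ceil(psus / 2) for _, psus, _ in servers)
    feed_b_outlets = sum(psus // 2 for _, psus, _ in servers)
    total_watts = sum(watts for _, _, watts in servers)

    # A feed needs enough PDUs for its outlets AND for the full load.
    pdus_for_outlets = math.ceil(
        max(feed_a_outlets, feed_b_outlets) / outlets_per_pdu
    )
    pdus_for_load = math.ceil(total_watts / pdu_capacity_watts)

    return {
        "outlets": feed_a_outlets + feed_b_outlets,
        "pdus_per_feed": max(pdus_for_outlets, pdus_for_load),
        "watts_per_feed_worst_case": total_watts,
    }


# Hypothetical rack contents, for illustration only.
rack = [
    ("ftp-1", 2, 300),
    ("ftp-2", 2, 300),
    ("dc-1", 2, 400),
    ("voip-1", 2, 450),
    ("whitebox-ftp", 1, 250),
]

plan = power_plan(rack, outlets_per_pdu=4, pdu_capacity_watts=3000)
print(plan)
```

With these made-up numbers the load is nowhere near the limit, but the outlet count alone already forces a second PDU on each feed. That is the kind of thing you want to know before the servers are in the rack.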

 

Computer Room Cleanup and re-cabling Pics and Movies

So a few weeks ago, we spent our Saturday doing major recabling work on the new Technical Computer Room (TCR) in the new location, in preparation for our Big Bang services move coming up on the 5th of July (it was delayed from the 22nd).

So these pics and videos are a few weeks old now, but who cares.

PICTURES (scroll down for movies)

IMG_3522 Not enough room to neatly get all the fat KVM cables sorted, so later I actually extended the room for the cables to 2U, and used the cable guide rail only for ethernet cables and power.

IMG_3521 The 2U above the KVM is not just being used for those fat cables, but also for the Avocent Switchview box and its adapter, which is attached to the HP KVM box. Avocent KVM/IP hardware rocks by the way. If you have a non-IP KVM switch, consider the Switchview IP 1020, it makes life so much easier and they are super simple to set up!

IMG_3520 Too many cables, not enough room. Once we are done with the move, I am gonna seriously tie this shit down. Meanwhile I curse those HP 10K racks for not having ANY room at the sides for large cable bundles. Oh.. not to mention ANYTHING to attach a cable tie to!

IMG_3519 The lineup of servers. The missing one was at the old location being synced up with another FTP box.

IMG_3514 The master at work ;)

IMG_3513 Not easy to work with all the cables all over the place.

IMG_3512 The patch panel guys were not done yet, as you can see.

IMG_3511 Hello!

IMG_3508 Mustafa putting some of the servers into cold storage. You wouldn’t believe the kind of hardware we will end up putting into storage because we have no immediate use for it!

IMG_3507 This is what happens when you don't move your servers professionally. We don't have the budget for it, so that is our excuse. The servers were fine, though I would not recommend this.

IMG_3506 Carting around servers.

IMG_3548 So very little room to work with. Look at the space at the sides, notice the PDU, then imagine it being full of servers.  See my point?

IMG_3547 So here we decided to move some servers over to the middle rack. This is partly to do with the fact that I discovered we don't have nearly enough power connectors in these racks to fill them.

IMG_3545 I haven't been back there since, I really hope they have cleared these bundles up by now.

IMG_3544 I hate loose cables like that.

IMG_3538 HP-UX on a 9000 box. Very old, but still used. Hosts a warehouse management system, all terminal based.

IMG_3537 The proud admin of the HP box.

IMG_3534 Richard wasn't very useful during our work, so he caught up on project admin ;)

MOVIES

 

 


A lot of engineers in a small space. This movie gives you an idea of the space :)

 
Me working on the cables. The cables all over the floor were half the reason for tackling this place, what a mess.


Me working on getting the KVM set up. The little box is the Avocent IP View 1020.

 


The most important element of our productivity right there!

 

 


All the “temporary” plugs and cables finally being cleared out. I wonder how many admins out there wish they had the time or the downtime to do the same and really clean house.


Overview of our LAN and WAN racks.

 


Making slow but steady progress on the cables. 3 hours later it would look very different!

 


Labeling ALL of your cables on both ends means you can accurately document what hooks up to what from A-Z. Doing the initial labeling though is a pain, so this is where our “certified labeler” came in ;)
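
Once every cable has a label, the documentation can be as simple as one record per cable. A minimal sketch of that idea follows. The label scheme and the rack/port names are invented for illustration, this is not our actual patch list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cable:
    label: str   # what is printed on BOTH ends of the cable
    end_a: str   # e.g. rack/device/port on the server side
    end_b: str   # e.g. rack/switch/port on the network side


def connection_map(cables):
    """Build a lookup from any endpoint to (cable label, far end).

    Raises ValueError if one port shows up on two different cables,
    which usually means the documentation is wrong.
    """
    lookup = {}
    for cable in cables:
        for near, far in ((cable.end_a, cable.end_b),
                          (cable.end_b, cable.end_a)):
            if near in lookup:
                raise ValueError(f"{near} is listed on more than one cable")
            lookup[near] = (cable.label, far)
    return lookup


# Hypothetical entries, for illustration only.
cables = [
    Cable("C-0001", "SR4/ftp-1/eth0", "LAN1/sw1/port12"),
    Cable("C-0002", "SR4/ftp-2/eth0", "LAN1/sw1/port13"),
]

lookup = connection_map(cables)
print(lookup["LAN1/sw1/port13"])
```

Because both ends are recorded, you can start at the switch port or at the server and get to the same cable, which is the whole point of labeling both ends.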


The amount of old hardware our IT department has in stock is just crazy. They don't know what to do with it all! We should have a CRT yard sale or something. Anyone need a Lexmark multifunctional? The IBM xseries 336 being stored here are only a fraction of the hardware we will be storing.

 


It's 2 am and the end of our scheduled downtime. Clearing up the last stuff and wiping the servers off, and turning em all back on. Hooked up to the new UPS, we could finally have them ALL on without burning out the old grid. The best bit of all was that we managed to clear almost all cables from the floor.

 

 

 

Datacenter Move post 4

YouTube videos are worth a thousand words, so I will let them do the talking.

Progress TCR Move May27 part 1 (the old location)

Progress TCR Move May27 part 2 (the new location)

 

What Mustafa thinks of IBM rackmount kits: 

  And some pics from the last 3 days:
(Click for larger versions)



IBM xseries 336, about 3 years old now

IMG_3452
ServeRAID 6M controller (in the PCI slot bay). It's IBM-branded but it's basically an Adaptec.
This comes out of one of the 2 xseries 336 servers that, together with the EXP400 shelf, served as a Windows 2003 cluster. The ServeRAID controllers are needed to provide failover control of the shared disk shelf.

IMG_3453
Mainboard of an xseries 336, with the PCI card bay/thing removed. The blue bracket at the top is where an RSA-II management card would usually be sitting, but this one doesn't have one :(

IMG_3454
ServeRAID 6M removed from PCI bay of the 336 server

IMG_3456
The rack is slowly emptying. I remember when I was building it all up, 3 years ago! Check it out

IMG_3459
Installing Windows on an IBM xseries 336. It's been a while. I noticed the IBM ServerGuide CD has a few more options now.

IMG_3460
Picture I needed to have to illustrate where to connect everything.

IMG_3461
Ready to move to new location

IMG_3464

IMG_3465
Not the most ideal way of moving servers, but it's better than nothing. At least it is softer here than in the back of the car.

IMG_3467
Richard, our project manager, trying to get more work in.

IMG_3469
Temporary cabling ;)

IMG_3472
It's slowly growing

IMG_3473
They are not done with the rack interconnects, damnit. I can't finish my patching like this.

IMG_3474
Our new firewall cluster

IMG_3475
I love the blue glow of the console. Kinda weird to have that out-of-place IBM in there, squashed between the HPs. The cable rail for the server is different from HP's as well, so that will be fun to cable.

IMG_3476
Moved some servers around, getting to our final config now.

IMG_3477
WAN comms rack in the new location

IMG_3478
LAN comms rack in the new location

IMG_3480
Khalid on servicedesk duty

IMG_3481
Gertjan is helping us decommission

IMG_3482

IMG_3483
Mustafa hard at work decommissioning servers


IMG_3486
A lot of servers were decommissioned today, these are all basically being scrapped.