Planet Sysadmin               

          blogs for sysadmins, chosen by sysadmins...

May 18, 2013

/sys/admin/blog

Running IT like a business

Some older but really good articles on running IT like a business:

There are a few additions IT Managers should think hard about:

  • Businesses have customers, not users.  We have to be customer focused and look at everything from the customer’s view at service levels, not at each of our component levels.
  • We should treat the allocation of our human resources, where our staff time goes, just like we treat our financial budgets.
  • Our major goals should include improving business-IT communication and creating value for the business.  The more we integrate with the business the better the value we can add.
  • Business models are moving to cloud strategies.  We’re only going to get busier and need to respond quicker to business needs as our product and IT strategies evolve.   Every little bit we do to improve and standardize processes now will pay us back with dividends as our new world evolves.

We’re embarking on a major culture change in IT if we are going to keep pace with the changing business strategy.

by Joe at May 18, 2013 03:08 PM

Chris Siebenmann

A little habit of our documentation: how we write logins

Over the years, we've developed a number of local conventions for our local documentation. One of them is that we always write Unix logins with < and > around them, as if they were local email addresses, so that we'll talk about how <cks>'s processes had to be terminated or whatever. When I started here this struck me as vaguely goofy; over time it has rather grown on me and I now think it's a quite clever idea.

Writing logins this way does two things. The first is that they become completely unambiguous. This is not much of an issue with a login like 'cks', but we have any number of logins that are (or could be) people's first or last names, and vice versa. Consistently writing the login with <> around it removes that ambiguity and uncertainty. The second thing it does is that it makes it much easier to search for a particular login in old messages and documentation. Searching for 'chris' may get all sorts of hits that are not actually talking about the login chris; searching for '<chris>' narrows that down a lot.

(Well, sort of. The reality is that we sometimes wind up quoting various sorts of system messages and system logs in our messages and of course these messages generally don't use the '<login>' form. However, often excluding these messages from a later search is good enough because we're mostly interested in the record of active things we did to an account.)
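
A minimal sketch of that kind of search, assuming the documentation and old messages are kept as plain-text files under one directory. The `find_login_mentions` name and the directory argument are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list lines that mention a login written in the
# <login> convention.
#   -F  treat "<chris>" as a literal string, not a regular expression
#   -r  search the directory recursively
#   -n  print the line number of each hit
find_login_mentions() {
    login=$1
    dir=$2
    grep -rnF -- "<${login}>" "$dir"
}

# Usage: find_login_mentions chris /path/to/notes
```

Because the match is on the literal string '<chris>', lines that merely mention someone named Chris are not reported.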

There's a corollary to the convenience of <login>: right now we have no similar notation convention for Unix groups. We write less about Unix groups than about Unix logins (and groups generally have more distinct names), but it would still be nice to have some convention so we could do unambiguous searches and so on.

by cks at May 18, 2013 05:13 AM

May 17, 2013

Aaron Johnson

Chris Siebenmann

Why I'm not considering btrfs for our future fileservers just yet

In a comment on yesterday's entry I was asked:

Could you elaborate on the "btrfs does not qualify" part?

What's missing? How likely do you think this to change in the near future?

I will give a simple looking answer that conceals big depths: what's missing is a btrfs webpage that doesn't say 'run the latest kernel.org kernel' and a Fedora release that doesn't say 'btrfs is still experimental and is included as a technology preview' (which is what Fedora 18 says). It's possible that btrfs is more mature and ready than I think it is, but if so the btrfs people are doing a terrible job of publicizing this. Fundamentally I want to be using something that the developers consider 'mature' or at least 'ready' and I don't want us to be among the first pioneers with a production deployment of decent size in a challenging environment.

Pragmatically there is nothing that btrfs can do to make us consider it in the near future, for reasons I wrote about two years ago in an entry on the timing of production btrfs deployments. If btrfs magically became perfect tomorrow, it would only appear in an Ubuntu LTS release in 2014 and a Red Hat Enterprise release in, well, who knows but probably not this year.

(The current Ubuntu 12.04 LTS has btrfs v3.2, whereas btrfs is up to v3.9 already. The btrfs changelog shows the scope of a year's evolution.)

As far as what in specific is missing, well, I have to confess that I haven't looked at the current state of btrfs in much detail and so I don't have specific answers. I poke at btrfs vaguely every so often; generally I discover something that strikes me as alarming and then I go away again. Since btrfs is never going to be exactly like ZFS, I can't just directly translate our ZFS fileserver design to btrfs and then complain about what's missing or different. To have a really informed opinion on what btrfs needed and what was wrong with it, I'd have to do a btrfs-based fileserver design from scratch, trying to harmonize what we think we want (which has been shaped by what ZFS gives us) with what btrfs gives us. So far there seems to be no real point to doing that before btrfs stabilizes.

(I'm starting to think that btrfs and ZFS have fundamentally different visions about some things, but that needs some more reading and another entry.)

Sidebar: ZFS on Linux maturity versus btrfs maturity

You might ask why I'm willing to consider ZFS on Linux even though it's a relatively young project, just like btrfs. The answer is that the two are fundamentally different. The ZFS part of ZoL is generally a mature and well proven codebase; most of the uncertain new bits are just for fitting it into Linux.

by cks at May 17, 2013 05:30 AM

The Nubby Admin

Cascading Failure, Technical Debt, and Punching a House with my Face

At 11:32PM, Saturday May 11th, I got an email from MX Toolbox notifying me that an SBS 2008 machine that I support had gone unresponsive. It’s 600 miles away from me in another state. This was not a strange occurrence with this server.

A Cluster of Prior Failures

Five years ago a small office with a minimal budget needed an SBS implementation. I recommended an HP ML 115 G5 with four hard drives and onboard RAID provided by an NVIDIA chipset. I have regretted that decision for all five years. Here’s a post of mine concerning that chipset and the troubles I’ve had with it.

In short, I have poor insight into and control over the entire server’s health. Some examples include:

  • I couldn’t update the hard drives’ firmware, which was a big deal because the serial numbers of those hard drives fell into a set of drives that have a known problem with suddenly going offline. The firmware update has to be applied through HP’s support tools, which are not supported on the ML 110/115. After much research and seeking help from HP, I was told that, in essence, I was left out to dry.
  • The ML 110/115 does not support the ProLiant Support Pack nor does that model support the Insight Control Manager. Keeping drivers updated and staying abreast of the various components’ health was virtually impossible.
  • There was also no HP ILO CLI interface available which made doing things like firmware updates especially difficult remotely.
  • The on-board storage controller had poor support from Nvidia, and offered very slim storage management features or reporting on hard drive health.

For years I hit the management ceiling with that box, which probably cost my client more of my time and theirs than a more robust server at twice the hardware cost would have. And then what I had been dreading for years finally happened…

Two Months Ago

“Did you reboot the server?” That’s never a question you want to hear, especially when you did not reboot a server. I VPN’d into that office’s network and checked for the presence of the server on the network. Yes, the server was down. One power cycle later, the OS loaded just fine.

I checked the event logs and it turns out there was a massive flurry of parity errors that came out of nowhere. The server froze as a result. The controller was apparently dying. After a reboot, the data appeared fine, and there were no more parity errors coming from the Nvidia storage driver. I knew something had to be done, but being remote and working with an office that has a shoestring budget (and can often only afford used shoestrings) made the options few and unattractive.

What’s worse, as I started investigating things further, I noticed that the ILO Advanced card that was in the server was no longer showing on the network. Aaaaand the BIOS clock would reset to July 2009 after being shut down (BIOS battery dying) causing strange problems with Active Directory and other applications running on the network that relied on accurate time (read: everything). AAAaaaaaand the two mirror sets (one for the system volume and one for the email server’s databases) had split apart and could not be re-synced because the Nvidia storage management software no longer recognized that any hard drives were connected.

The options, as I saw them, were for the business to either buy a new RAID controller, BIOS battery, and perhaps ILO card (and then scramble to perform the complex surgery remotely on their own, or pay a local consultant to coordinate with me, or pay to ship me on site) or get a new server altogether (and pay a local consultant to coordinate with me, or… you get the idea). Either way, it started to look more and more like a total forklift migration was necessary.

Two Months Later

Yes, it’s been about two months and the server is still riding in the same perilous state. Split mirrors, bi-monthly freezes that require a power cycle to recover from, and a lot of hoping and praying that data is not corrupted. Welcome to the world of supporting small business IT where people re-use tea bags and don’t run heat or AC in order to save money and keep the business open.

That Saturday night, it was getting late and I was thinking about bed. I checked my email one last time for anything pressing when I saw an MX Toolbox alert. This is never good. I scanned the email, saw what host was causing the alert, and knew that I was dead in the water. I could get into that client’s network via both a SonicWall VPN and unattended TeamViewer installations that existed on most of the workstation PCs. However, it was all futile because I didn’t have hardware level access to the server as a result of the ILO’s failure. The office has a Lantronix Spider KVMoIP device that was being used to work on a workstation migration for one employee, and was therefore not hooked up to the main office server. That was two layers of out of band management that were not doing any good for the most important technology asset in the building.

All of this meant that someone would have to show up at the office to power cycle the PC. The technical debt and compound interest of failure had already mounted fairly high by that point, considering the state of the server. However, things were about to get comical.

I’ll Gladly Pay You Tomorrow for Out of Band Management Today

What happened in the next 24 hours was a morbid comedy of oversights and compounded problems that ended in a whiplash inducing facepalm.

First, I needed to email three people who would most likely be in the vicinity of that office so I could coordinate with one of them to drop by on their Sunday morning and power cycle the server. Except the server is what does email for the organization so I can’t send to their organization email addresses (this is a Microsoft SBS machine). I only know of one employee’s non-work address, and I also happen to know the gmail address of another employee’s son.

I email those two people and tell them of the situation. As it turns out, two key workers are out, traveling to a convention in Texas. That makes access to email even more vital than normal. Everyone knows the situation and there’s not much more I can do so I get to bed. It’s not until about 2PM on Sunday, Mother’s Day here in the USA, that I hear back from one worker who has just enough time to skip by the office and power cycle the server.

Myself, I’m in the midst of a Mother’s Day dinner with my own family so I had ditched my phone… just moments before the employee called me from the remote office. I missed the call and the employee left a voicemail expressing a state of confusion over which server to power cycle. The organization is small and only has two servers. One is the SBS machine and the other is a HP MicroServer that is used as a network monitoring station and catchall for various extraneous services. I had assumed that over the years everyone had each server’s role understood by sight so I simply asked him to power cycle the SBS server, expecting that it would be known which piece of hardware that was. The fellow power cycled both servers since he couldn’t get in touch with me directly.

Okay, no big deal. The MicroServer is just running CentOS and OpenNMS. They’re resilient and can handle a sudden shutdown. As I listened to that voicemail, I checked to see if I could remotely connect to the server that had been down all night. I couldn’t. Great. Time to call the office and talk to the person who was on site and see what else could be done. Except the voicemail had been left over an hour ago and the employee had naturally left shortly after power cycling the server. I called his cell phone back, but he didn’t pick up. I left a voicemail.

A little later that Sunday I get in touch with another employee in the area who lives closer. He’s on his way out to pick up Mother’s Day dinner for his wife and can swing by to check out the server. First, I have him power cycle it again. Maybe the first guy just clicked the power button and didn’t hold it in? I held out hope for such a simple explanation. However, after I instructed this second person on how to make sure the server had shut down and then powered up, I waited for the duration of the standard bootup but nothing was showing up. It became apparent that the server was not coming back online.

“Do you know where the Spider is?” I asked hopefully. “No, I dunno where the other guy put it.” Gah! The Spider is a well known piece of equipment in that office, and it’s very rare that it can’t be found. I was about to concede defeat for that Sunday when, after some searching, the employee found the Spider. A few minutes of scrambling around and he had the thing hooked up to the server. Except… now I couldn’t get to the Spider. The fellow had to leave to pick up dinner and I wasn’t about to ruin his family’s Mother’s Day so I told him I’d see what I could do remotely, expecting nothing to be successful.

In the process of hooking up the Lantronix Spider, the employee had pulled the network cable out of the server and put it into the Spider. Then from the spider’s cascade port (it’s essentially a one port switch) he had connected a patch cable to the server’s LAN port. That made me wonder… perhaps it was a port on the ProCurve switch that was bad? That would explain both the server and now the Lantronix Spider being inaccessible. Or maybe the port spontaneously shut down as a result of some bug. Crazier things have happened.

I browsed to the switch’s management interface. “Please enter your username and password!” Okay, no problem! “Wait… I can’t remember what the password is… NOOOOOO!” The organization uses KeePass to store important passwords and software keys. The KeePass file is on the server. The server that is down.

But wait! I have a copy of the keepass databases on my own storage. Once a month or so I copy the files to my local storage so that I have an in-sync copy just in case. Whew! I find the switch’s login credentials and begin inspecting things. I looked, hoping for some bad news concerning the switch’s health (at least that would mean the server was okay), but the switch looked perfect. Nothing was amiss.

I’ve always been told to troubleshoot network problems from the lowest layer first. I had pretty much ruled out the physical layer. Layer 2 seemed healthy. Not much that can go wrong on a small, single subnet LAN. Layer three, IP… IP addresses… I gritted my teeth. I knew what the problem was. The Lantronix Spider is set to pick up an address via DHCP. Specifically it’s a DHCP reservation on the network’s DHCP server. The server that’s down. I wanted the network layer benefits of a static IP address, however I also wanted it to be easily portable between networks. My original idea was that the Spider could be used to support PCs on other LANs, like perhaps workers that were based in their home office that didn’t come into the organization’s building very often. With the Spider getting an IP address via DHCP, I could just tell someone to take it home with them and I’d only be left with walking them through configuring port forwarding, or getting TeamViewer set up on a PC on their LAN so I could get in and access the Spider via a local web browser. Except now the Spider was barking out forlorn DHCP discover packets and not getting any response back.

I fired up Network Monitor on an office PC to be sure. Yep, there it was. A DHCP discover request broadcasting every sixty seconds or so. Okay, I can handle this. The small office has a SonicWall firewall that has DHCP services on it. I only need to enable them, check its list of leases to find what IP address it was given, and I’ll be good! I mosey my web browser on over to the firewall’s administrative page. I stare at it. It wants the password for the admin user. “Password… password… I had to change it a few weeks ago. What did I choose…”

Oh well, I’ll look in the organization’s copied password file that I keep on my local storage! Yay foresight!! I found the firewall admin password and entered it. “Password Failure. Please Retry.” What?! Then I remembered that I had changed the firewall password due to security policy about two weeks ago. However, I hadn’t copied the organization’s password file to my local storage in a month. I had the old password in my copy of the password file, but not the new one. The new one was on the server that was currently down. Backups are taken every few hours, but a restoration needs to be done on functioning hardware. Super.

So that means I did it again. I couldn’t log in to the interface because I didn’t have the long password committed to memory. For super important passwords like that, I do keep a disaster recovery hard copy around. It’s essentially a few pages spelling out the most important usernames and passwords for the organization. However, only two people have that physical copy of information. While I could call them up and have them read off the password to me, I wasn’t ready to do that.

Instead, I turned to the HP MicroServer running CentOS 6. I have OpenNMS installed on it and have plans to install some ticketing software and maybe smokeping or M/Monit. Now, however, it’s going to be an impromptu DHCP server. Fortunately I can remember the password for the MicroServer! A quick ‘yum install dhcpd’ later and… “Couldn’t resolve host ‘centos-distro.cavecreek.net’” WHAT DEVILRY IS THIS?! But of course; DNS for the network is performed by the SBS server… which is down. After facepalming, I changed resolv.conf to point to OpenDNS and continued my march towards a functioning DHCP server on the network. After a few minutes I have dhcpd running and it quickly hands out a lease to the Spider.
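
For anyone who has not had to do this in a hurry, a stand-in DHCP scope might look roughly like the sketch below. Everything in it is an assumption for illustration: the `print_dhcpd_conf` helper, the lease times, and the example addresses are made up and are not the values used on that network. The only fixed values are the two OpenDNS public resolver addresses.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal ISC dhcpd.conf for one subnet.
# All addresses passed in are placeholders chosen by the caller.
print_dhcpd_conf() {
    subnet=$1
    netmask=$2
    range_start=$3
    range_end=$4
    router=$5
    cat <<EOF
# Minimal emergency DHCP scope (illustrative values)
default-lease-time 3600;
max-lease-time 7200;
subnet ${subnet} netmask ${netmask} {
    range ${range_start} ${range_end};
    option routers ${router};
    # OpenDNS public resolvers, since the usual DNS server is down
    option domain-name-servers 208.67.222.222, 208.67.220.220;
}
EOF
}

# Example, run as root on the stand-in server (values are made up).
# On CentOS 6 the package that provides dhcpd is normally named "dhcp".
#   print_dhcpd_conf 192.168.1.0 255.255.255.0 \
#       192.168.1.200 192.168.1.220 192.168.1.1 > /etc/dhcp/dhcpd.conf
#   service dhcpd start
```

Keeping the range small and the lease times short limits how long clients hold on to addresses from the temporary server once the real one is back.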

And it was then that I saw it. After logging into the Spider, I viewed the remote console and saw a Windows installation screen on the server. Suddenly, I remembered what happened. In the process of preparing for a migration away from the failing hardware, I needed to experiment with making an unattended installation file. I had a remote worker put the SBS 2008 install CD in the main server’s tray. Of course, rebooting caused the server to boot into the high boot priority CD drive. I sat in horror, thinking about my cascade of failures. Nevertheless, that wasn’t the time to flail in self loathing. I simply needed to hit “cancel” and get out of the installation welcome screen to boot from the hard drive.

Except the Spider was unable to interact with the server as a remote keyboard or mouse. I’ve used the Spider on that very server in the past, and it worked great at all stages of the boot process. In the years that I’ve worked with that office I’ve had to check BIOS settings, ILO firmware settings, and storage controller settings, all using either the Spider or the ILO itself. But now, for some unexplained reason, the Spider was not able to input anything. I couldn’t move the mouse, I couldn’t press keys. So I sat and stared at the remote video in complete disbelief.

It was a simple matter of leaving a voicemail for someone and telling them to remove the disc from the DVD drive the next time they were in the office. The next morning the worker that I left a message for did just that, power cycled the server, and it booted up as normal. Life continued.

I was abashed.

More about my conclusions concerning the situation later. In the mean time, got a similar story to share? Let me know in the comments below or contact me and you can write a guest blog post about it.

by Wesley David at May 17, 2013 03:38 AM

May 16, 2013

Ubuntu Geek

How to Install Cinnamon 1.8 on ubuntu 13.04

Cinnamon is a user interface. It is a fork of GNOME Shell, initially developed by (and for) Linux Mint. It attempts to provide a more traditional user environment based on the desktop metaphor, like GNOME 2. Cinnamon uses Muffin, a fork of the GNOME 3 window manager Mutter, as its window manager from Cinnamon 1.2 onwards.
(...)
Read the rest of How to Install Cinnamon 1.8 on ubuntu 13.04 (249 words)


by ruchi at May 16, 2013 11:54 PM

Standalone Sysadmin

Busy, Busy, Busy

I might not notice it at the time, but I can always tell how busy I am by how many blog posts I manage to get live. By my count, I've been doing about one every eight days so far this month (if you count this one). So I'm behind :-) So what's been going on?

LOPSA-East

But I've been doing good, fun things. For instance, on May 3rd and 4th, I went to LOPSA-East, which was yet another really great conference. There were somewhere around 150 attendees this year, and it was really nice to see everyone again from previous years.

Way back in October of 2011 (were some of you even born then?), I asked about a class on SSDs, to see if there was any interest. Well, in October of 2011, the earliest I could have done it was spring of 2012, and didn't get around to finishing the course before then, so spring of 2013 it was, and I taught the SSD class on Saturday afternoon. Only three years in the making. That's cool, right? :-D

If you were in my class, you probably have the slides from the USB key. If you weren't in my class, then you'll be happy to know that since I don't really intend to teach the class again (although if my feedback is overwhelmingly positive, I'll consider it), I opted to have it recorded, and whenever that goes live, I'll be linking to it from here and including my full slide deck, too.

Storage Field Day

At the end of April, I went to Denver to do Storage Field Day. I haven't had a chance to write about the things I saw yet, but I'm very excited to talk about what we saw with Pernix Data. If you want to see some cool ideas, watch the videos there. I'll write more as soon as I get time.

LOPSA stuff

We're still in the swing of the election season. You might have seen when I updated my earlier post that the LOPSA Live transcript had been posted. That was the first of two candidate sessions. The other is tonight at 9pm, so follow the instructions by Aaron Sachs for connecting to #LOPSA-Live on Freenode and come ask the candidates good, hard questions.

The election is coming up next month. I've posted my series of discussions on internal concerns (including membership numbers, member communications, and operational transparency). Starting tomorrow, I'm going to start posting discussions related to external concerns - we have a lot of problems with marketing and how we're seen externally...when we're seen at all. Make sure to watch for those blog entries, too.

LISA Training

I haven't posted anything about it here, but I'm working with Dan Klein to help get training ideas for LISA'13. For the past several years, I've been involved as a blogger at the LISA conference (along with Ben Cotton, Marius Ducea, Greg Riedesel, and many others). I'm planning on continuing that for as long as they'll have me, but it's also nice to be able to contribute to the program in some small way, too. This means that if there's training that you think LISA should have, but doesn't, let me know and I'll do my best to figure out how we can have it.

Actual, "I get paid to do this" work stuff

At work, we've been doing all kinds of things. I've now got a production vSphere cluster, a new Nimble storage box, I'm trying desperately to get new gear for my core switch (I'm going with a pair of Nexus 5548s and six FEX to go along), and I need to order five or six more server racks to replace some of the ones we have now.

I continue to be mystified by the way that academia works. Specifically, budgeting and deadlines. For reasons that I'm unable to fathom, in order to get things on this year's budget, I have to order hardware and have it delivered and in my space by the end of June. Not, "ordered and paid for". Ordered, delivered, and in my space. I've thought about it, and I can't come up with any kind of compelling reason for this rule. Anyone with more experience in academia than I have want to weigh in? I'm at a loss.

Personal Stuff

I've finally bit the bullet and decided to get LASIK.

I'm in a large-ish metro area now, and the technology has been continually developing for a couple of decades, and I think it's matured to the point where I'm cool with people cutting my eye open and burning part of it away using lasers. I can't be 100% about technology enhancing our lives unless I walk the walk and take advantage of it, so I'm doing it.

I went in last week for my "free consultation", which determined that I was an excellent fit for normal "LASIK" surgery. If my cornea had been too thin, I guess I could have gotten either LASEK or PRK, both of which work well but have a longer healing and recovery time. Turns out my cornea is just fine.

Also, can I just say - they have the coolest eye equipment I've ever seen there. I've worn glasses or contacts since elementary school, and I've lived in a dozen cities or so since then, so I've seen my share of optometry equipment, but man, the toys the LASIK guys have are nuts. I'm practically blind, so when they said, "take off your glasses and look in this machine, and you'll see a hot-air balloon", I thought, "please, I'll be lucky to see a blurry light". Sure enough, looking into the machine, it was blurry...for a second. Then, like a camera, it "autofocused" and just like that, they had nearly my exact prescription. Awesome!

So the whole "lasering my eyeballs" thing is happening tomorrow afternoon. I honestly can't wait. I've been thinking about it for years, and having it this close is really exciting. I'll make sure to update early next week with the results.

So there you go. That's what I've been up to. I'll try to get back to posting more regularly, and maybe even on topics that you care about! Wouldn't that be exciting? ;-)

We'll see. Thanks!

by Matt Simmons at May 16, 2013 10:22 AM

Aaron Johnson

Chris Siebenmann

Why ZFS's CDDL license matters for ZFS on Linux

In a G+ conversation about ZFS I read the following:

[...] so, why use BTRFS at all? :-) Just the fact that it's GPL (and so able to be embedded into the kernel source tree) doesn't seem enough, specially considering that CDDL (the ZFS license) is a bona fide open source license, [...]

On the whole I like ZFS on Linux, but let's not mince words here: this licensing issue is a big issue. Were btrfs and ZFS close to general parity, it would be a very strong push towards btrfs.

That ZFS is CDDL licensed means that it can never be included in the Linux kernel source. It may mean that it can't be prepackaged in binary form by distributions, or at least by distributions that care strongly about licensing issues. The CDDL is part of what makes it extremely unlikely that Red Hat Enterprise or Ubuntu LTS will ever officially support ZoL, making it always be a 'batteries not included, you get to integrate it' portion of the system.

That ZFS will not be included in the Linux kernel source (because of the CDDL among other reasons) means that you are more at risk of developers ceasing to update ZFS for newer kernels (among other less important effects).

(Being in the Linux kernel source is no guarantee that code will be maintained, but it increases the chances a fair bit.)

These are risks that we'd be willing and able to take on, so they aren't real obstacles for us using ZoL if that turns out to be the best option for new fileservers. But they still weigh on my mind and there are any number of places where they are going to be real issues, sometimes killer ones.

(I've written about this before.)

(Given the current situation with 4k disks, we're already looking at recreating pools when we move them to a new fileserver infrastructure. At that point we could just as easily migrate from ZFS to something else, if the something else was good enough. Btrfs currently does not qualify.)

by cks at May 16, 2013 05:17 AM

May 15, 2013

UnixDaemon

Facter 1.7+ and External facts

While Puppet may get all the glory, Facter, the hard working information gathering library that can, seldom gets much exciting new functionality. However with the release of Facter 1.7 Puppetlabs have standardised and included a couple of useful facter enhancements that make it easier than ever to add custom facts to your puppet runs.

These two improvements come under the banner of 'External Facts'. The first allows you to surface your own facts from a static file, either plain text key value pairs or a specific YAML / JSON format. These static files should be placed under /etc/facter/facts.d


$ sudo mkdir -p /etc/facter/facts.d

# note - the .txt file extension
$ echo 'external_fact=worked' | sudo tee /etc/facter/facts.d/external_test.txt
external_fact=worked

$ facter external_fact
worked

At its simplest this is a way to surface basic, static details from system provisioning and other similar large events, but it's also an easy way to include details from other daemons and cronjobs. One of my first use cases for this was to create 'last_backup_time' and 'last_backup_status' facts that are written at the conclusion of my backup cronjob. Having the values inserted from out of band is a much nicer prospect than writing a custom fact that parses the cron logs.
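
As a rough sketch of that backup use case (not the actual cronjob; the `write_backup_facts` helper, the `backup.txt` file name, the timestamp format and the status strings are all assumptions made for illustration), the tail end of a backup script could write both facts like this:

```shell
#!/bin/sh
# Hypothetical helper for the end of a backup cronjob: record when the
# backup finished and how it went as static external facts.
#   $1  status string to record, e.g. "success" or "failed"
#   $2  facts directory (defaults to /etc/facter/facts.d)
write_backup_facts() {
    status=$1
    facts_dir=${2:-/etc/facter/facts.d}
    tmp=$(mktemp "${facts_dir}/.backup_facts.XXXXXX") || return 1
    {
        echo "last_backup_time=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        echo "last_backup_status=${status}"
    } > "$tmp"
    # mktemp creates the file readable by its owner only; open it up so
    # a non-root facter run can read it too.
    chmod 644 "$tmp"
    # Rename into place so a concurrent facter run does not see a
    # half-written backup.txt.
    mv "$tmp" "${facts_dir}/backup.txt"
}

# Usage at the end of the cronjob (run_backup is a placeholder):
#   if run_backup; then write_backup_facts success
#   else write_backup_facts failed; fi
```

The file uses the plain text key value format with a .txt extension, so 'facter last_backup_status' should then report whatever status was written last.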

If that's a little too static for you then the second usage might be what you're looking for. Any executable scripts dropped in the same directory that produce the same output formats as allowed above will be executed by facter when it's invoked.


# scripts must be executable!
$ sudo chmod a+rx /etc/facter/facts.d/process_count

$ cat /etc/facter/facts.d/process_count
#!/bin/bash

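# tr -d strips the padding some wc implementations add; note that the
# count includes the ps header line
count=$(ps -efwww | wc -l | tr -d ' ')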
echo "process_count=$count"

$ facter process_count
209

The ability to run scripts that provide facts and values makes customisation easier in situations where ruby isn't the best language for the job. It's also a nice way to reuse existing tools or to include information from further afield - such as the current binary log in use by MySQL or Postgres or the host's current state in the load balancer.
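
A script is not limited to a single fact. The sketch below is a hypothetical example (the `emit_facts` function and the `login_session_count` fact name are invented for illustration) that prints two key value pairs, one per line, using only standard tools:

```shell
#!/bin/sh
# Hypothetical external fact script that emits more than one fact.
# Each line of output is one key=value pair.
emit_facts() {
    # Count processes, skipping the ps header line.
    echo "process_count=$(ps -e | tail -n +2 | wc -l | tr -d ' ')"
    # Number of login sessions currently reported by who(1).
    echo "login_session_count=$(who | wc -l | tr -d ' ')"
}

emit_facts
```

Dropped into the facts directory and made executable, a script like this should surface both values on the next facter run.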

While there have been third party extensions that provided this functionality for a while it's great to see these enhancements get included in core facter.


May 15, 2013 11:29 PM

SysAdmin1138

Yes, that happens

We all know it can happen, a BIOS update of some kind bricks whatever just got flashed, but it's one of those things you hope happens to other people first so you know not to go there. It happened to me recently, which got me thinking about continuous deployment from a hardware POV. Hardware being what it is, hard, you can't iterate and roll-back the way you can do software. There is no such thing as Vagrant for Embedded Systems that I've found!

The problem of, "when do I update the firmware for my server," is one that faces anyone with a physical infrastructure. There isn't really a globally accepted best-practice for this one, though the closest I can find is:

If the vendor lists the update as critical, apply it.
If you're experiencing one of the problems listed in the fixes, apply it.
If vendor tech-support tells you to apply it, apply it.
Otherwise, don't apply it.

But only apply it to a test device first to verify it actually fixes the problem. Then roll it out.

Doing so pro-actively is kind of risky, and only really useful in repurposing scenarios. Also, this 'best practice' assumes you have identical hardware to actually test with. Which a lot of us don't, and often can't due to slight differences between servers of the same model.

So. For those of us who are working on infrastructures either small enough to not be able to afford test hardware, or diverse enough that there is no such thing as a common class of machine, what are we to do?

Hope, mostly, and trust in your vendor support contracts to ship you new hardware in case you get a brick.

Or, trust in your redundancies and treat new-firmware-updates like a lost-server outage. If you get a brick, you're still within your failure tolerance and know not to go there for the rest of 'em. This is the approach we ended up taking, and it worked. We were running without our scale-test environment for a few days but production was unaffected until we could unbrick the affected machines.

In our case I suspect we had a v1.0 hardware revision, and the newest firmware was only backwards compatible for v1.0a and newer or something. I don't have proof of this, but that's what it feels like. Of course, this eventuality was not mentioned in the release-notes anywhere. Thus, testing.

by SysAdmin1138 at May 15, 2013 07:36 PM

Simplehelp

How to “Split” the iPad Keyboard


This very brief tutorial will show you how to ‘split’ the keyboard on your iPad so that you can type more comfortably using just your thumbs.

Follow the simple steps below to enable this lesser known feature (hat tip to Eric Hogg).

  1. Open any App that uses the keyboard. Put your thumbs on the middle of the keyboard and ‘swipe’ outwards.

  2. Now your keyboard is split into two halves, making it much easier to type using only your thumbs, which also makes it easier to hold your iPad while typing.

  3. To revert to the normal/default keyboard, simply swipe the keyboards back together.

by Ross McKillop at May 15, 2013 06:11 PM

Google Webmasters

Using schema.org markup for organization logos

Webmaster level: all

Today, we’re launching support for the schema.org markup for organization logos, a way to connect your site with an iconic image. We want you to be able to specify which image we use as your logo in Google search results.

Using schema.org Organization markup, you can indicate to our algorithms the location of your preferred logo. For example, a business whose homepage is www.example.com can add the following markup using visible on-page elements on their homepage:

<div itemscope itemtype="http://schema.org/Organization">
  <a itemprop="url" href="http://www.example.com/">Home</a>
  <img itemprop="logo" src="http://www.example.com/logo.png" />
</div>

This example indicates to Google that this image is designated as the organization’s logo image for the homepage also included in the markup, and, where possible, may be used in Google search results. Markup like this is a strong signal to our algorithms to show this image in preference over others, for example when we show Knowledge Graph on the right hand side based on users’ queries.

As always, please ask us in the Webmaster Help Forum if you have any questions.

by Google Webmaster Central (noreply@blogger.com) at May 15, 2013 02:52 PM

Google Blog

Live from Google I/O: Mo’ screens, mo’ goodness

This morning, we kicked off the 6th annual Google I/O developer conference with over 6,000 developers at Moscone Center in San Francisco, 460 I/O Extended sites in 90 countries, and millions of you around the world who tuned in via our livestream. Over the next three days, we’ll be hosting technical sessions, hands-on code labs, and demonstrations of Google's products and partners' technology.

We believe computing is going through one of the most exciting moments in its history: people are increasingly adopting phones, tablets and newer types of devices. And this spread of technology has the potential to make a positive impact in the lives of people around the world, whether it's simply helping you in your daily commute, or connecting you to information that was previously inaccessible.

This is why we focus so much on our two open platforms: Android and Chrome. They enable developers to innovate and reach as many people as possible with their apps and services across multiple devices. Android started as a simple idea to advance open standards on mobile; today it is the world’s leading mobile platform and growing rapidly. Similarly, Chrome launched less than five years ago from an open source project; today it’s the world’s most popular browser.

In line with that vision, we made several announcements today designed to give developers even more tools to build great apps on Android and Chrome. We also shared new innovations from across Google meant to help make life just a little easier for you, including improvements in search, communications, photos, and maps.

Here’s a quick look at some of the announcements we made at I/O:

  • Android & Google Play: In addition to new developer tools, we unveiled Google Play Music All Access, a monthly music subscription service with access to millions of songs that joins our music store and locker; and the Google Play game services with real-time multiplayer and leaderboards. Also, coming next month to Google Play is a special Samsung Galaxy S4, which brings together cutting edge hardware from Samsung with Google’s latest software and services—including the user experience that ships with our popular Nexus devices.
  • Chrome: With over 750 million active users on Chrome, we’re now focused on bringing to mobile the speed, simplicity and security improvements that we’ve seen on the desktop. To that end, today we previewed next-generation video codec VP9 for faster video-streaming performance; the requestAutocomplete API for faster payments; and Chrome Experiments such as "A Journey Through Middle Earth" and Racer to demonstrate the ability to create immersive mobile experiences not possible in years past.
  • Google+: We unveiled the newly designed Google+, which helps you easily explore content as well as dramatically improve your online photo experience to give you crisp, beautiful photos without the work! We also upgraded Google+ Hangouts, our popular group video application, to help bring all of your real-life conversations online, across any device or platform, and with groups of up to 10 friends.
  • Search: Search has evolved considerably in recent years: it can now have a real conversation with you, and even make your day a bit smoother by predicting information you might need. Today we added the ability to set reminders by voice and we previewed “spoken answers” on laptops and desktops in Chrome—meaning you can ask Google a question and it will speak the answer back to you.
  • Maps: Today we previewed the next generation of Google Maps, which gets rid of any clutter in order to put your individual experience and exploration front and center. Each time you click or search, our technology draws you a tailored map that highlights the information you need. From design to directions, the new Google Maps is smarter and more useful.

Technology can have a profound, positive impact on the daily lives of billions of people. But we can’t do this alone—developers play a crucial role. I/O is our chance to come together and thank you for everything you do.

by Emily Wood (noreply@blogger.com) at May 15, 2013 12:48 PM

Aaron Johnson

Chris Siebenmann

Why I've so far been neglecting functional programming languages

Functional programming languages are in many ways the latest hotness and so for years I've been making off and on runs at things like yet another explanation of monads (which I think I sort of understand by now) and similar topics. Despite this, so far I've been almost completely uninterested in actually trying to write a functional program or exploring a FP language.

The big problem for me is that as far as I can tell, the kind of programs I usually work with are exactly the kind of programs that functional programming is stereotypically a bad fit with. The stereotype I've absorbed is that functional programming is quite a good fit for computation but not a good fit for IO, because IO intrinsically has side effects. Unfortunately most of what I write is all about IO and has little or no computation. Bashing a squarish peg into a roundish hole is unlikely to tell me anything particularly meaningful about how nice the language is to work in; what I really need is a roundish peg, a computational problem, and those are relatively scarce around here.

(It's possible that I'm not looking hard enough. For example, I do periodically want to do things like log analysis or event reassembly, where the original data could just as well be a predefined data structure in the program instead of processed from logfiles on disk. I suspect that a functional language would handle these fine, maybe better than ad-hoc hackery in awk, Python, or whatever. If I was really crazy I would try rewriting the logic in our ZFS spares handling system in an FP language to see if it got clearer; it's fundamentally a series of transformations of a tree and then some analysis of the result. The result might even be more testable.)

by cks at May 15, 2013 04:57 AM

May 14, 2013

Ubuntu Geek

my other pc is a cloud

My Entry for the Advanced Event #3 of the 2013 Scripting Games

Halfway done.  Here's my third entry for this year's Powershell games.  I used a workflow this time, mostly in an attempt to garner favor from the voters for using new features exclusive to PS3.  Even though the multithreading with jobs that I did in the last event is a neat idea, it really doesn't perform very well.  The workflow will likely perform better, though I don't know if it's going to handle the throttling of thread creation if I handed it a list of 500 computers.

#Requires -Version 3
Function New-DiskSpaceReport
{
	<#
		.SYNOPSIS
			Gets hard drive information from one or more computers and saves it as HTML reports.
		.DESCRIPTION
			Gets hard drive information from one or more computers and saves it as HTML reports.
			The reports are saved to the specified directory with the name of the computer in
			the filename. The list of computers is processed in parallel for increased speed.
			Use the -Verbose switch if you want to see console output, which is very useful if you
			are having problems generating all the desired reports.
		.PARAMETER ComputerName
			One or more computer names from which to get information. This can be a
			comma-separated list, or a file of computer names one per line. The alias
			of this parameter is -Computer. The default value is the local computer.
		.PARAMETER Directory
			The directory to write the HTML files to. E.g., C:\Reports. The directory
			must exist. The default is the current working directory.
		.INPUTS
			[String[]]$ComputerName
			This is an array of strings representing the hostnames of the computers
			for which you want to retrieve information. This can also be supplied by
			(Get-Content file.txt). This can be piped into the cmdlet.
		.INPUTS
			[String]$Directory
			The directory to save the HTML reports to. The directory must exist.
		.OUTPUTS
			HTML files representing the information obtained from all
			the computers supplied to the cmdlet.
		.EXAMPLE
			New-DiskSpaceReport
			
			This will generate a report for the local computer and output the HTML file to
			the current working directory.			
		.EXAMPLE
			New-DiskSpaceReport -ComputerName server01,server02,server03 -Directory C:\Reports
			
			This will generate three HTML reports for the servers and save them in the C:\Reports
			directory.
		.EXAMPLE
			New-DiskSpaceReport -Computer (Get-Content .\computers.txt)
			
			This will generate HTML reports for all the computers in the computers.txt file and
			save the reports in the current working directory.
		.EXAMPLE
			,(Get-Content .\computers.txt) | New-DiskSpaceReport -Directory C:\Reports
			
			This will generate HTML reports for all the computers in the computers.txt file and
			save the reports in C:\Reports. Please note the leading comma in this example.
		.NOTES
			Scripting Games 2013 Advanced Event 3
	#>
	[CmdletBinding()]
	Param([Parameter(ValueFromPipeline=$True)]
			[Alias('Computer')]
			[String[]]$ComputerName = $Env:Computername,
		  [Parameter()]
			[ValidateScript({Test-Path $_ -PathType Container})]
			[String]$Directory = (Get-Location).Path)
	
	Write-Verbose -Message "Writing reports to $Directory..."
	
	WorkFlow BuildReports
	{
		Param([String[]]$Computers, [String]$Directory)
		ForEach -Parallel ($Computer In $Computers)
		{			
			InlineScript
			{				
				Write-Verbose -Message "Generating report for $Using:Computer..."
				$Header = @'
				<title>Disk Free Space Report</title>
				<style type="text/css">
					<!--
						TABLE { border-width: 1px; border-style: solid;  border-color: black; }
						TD    { border-width: 1px; border-style: dotted; border-color: black; }
					-->
				</style>
'@
				$Pre  = "<p><h2>Local Fixed Disk Report for $Using:Computer</h2></p>"
				$Post = "<hr><p style=`"font-size: 10px; font-style: italic;`">This report was generated on $(Get-Date)</p>"
				Try
				{					
					$LogicalDisks = Get-WMIObject -Query "SELECT * FROM Win32_LogicalDisk WHERE DriveType = 3" -ComputerName $Using:Computer -ErrorAction Stop | Select-Object -Property DeviceID,@{Label='SizeGB';Expression={"{0:N2}" -F ($_.Size/1GB)}},@{Label='FreeMB';Expression={"{0:N2}" -F ($_.FreeSpace/1MB)}},@{Label='PercentFree';Expression={"{0:N2}" -F (($_.Freespace/$_.Size)*100)}};
					$LogicalDisks | ConvertTo-HTML -Property DeviceID, SizeGB, FreeMB, PercentFree -Head $Header -PreContent $Pre -PostContent $Post | Out-File -FilePath $(Join-Path -Path $Using:Directory -ChildPath $Using:Computer`.html)
					Write-Verbose -Message "Report generated for $Using:Computer."
				}
				Catch
				{
					Write-Verbose -Message "Cannot build report for $Using:Computer. $($_.Exception.Message)"
				}
			}
		}
	}
	
	If($PSBoundParameters['Verbose'])
	{
		BuildReports -Computers $ComputerName -Directory $Directory -Verbose
	}
	Else
	{
		BuildReports -Computers $ComputerName -Directory $Directory
	}
}

by ryan@myotherpcisacloud.com at May 14, 2013 02:09 PM

Chris Siebenmann

My language irritations with Go (so far) and why I'm wrong about them

The great thing about an evolving language is that if you're slow enough about writing up your irritations with it, some of them can wind up fixed (or part fixed). So this list is somewhat shorter than it was when I originally wrote my first Go program, and none of the irritations are major. Also, I will reluctantly concede that Go has good engineering reasons for all of them.

My largest single irritation is that break acts on switch and select; I expected it to act only on any enclosing control structure, so that you could write something like:

for {
   select {
   case <-mchan:
      // message silently swallowed
   case <-schan:
      break // meant to leave the for loop; in Go it only leaves the select
   }
}

Instead you have to invent a boolean loop condition (or label the for loop and use a labelled break). I understand why Go does this; it enables you to exit early out of a switch or select case instead of having to wrap everything in ever increasing levels of nesting. This is likely especially important because Go uses explicit error checking (which would otherwise force those nested if blocks).

The issue that got partially fixed is Go's return requirements. When I wrote the original version of my program the natural form of one function was a big switch with a number of specific cases and then a default: to catch the rest; however, the original rules required a surplus return at the end of the function, which irritated me by forcing me to move the default case to the end of the function, obscuring the logic. The Go 1.1 changes make my particular case okay but I believe there remain cases where you need an unreachable ending return (or panic) to make the compiler happy.

You can make an argument that the original and current state of affairs are good software engineering. If the compiler did true reachability analysis it'd increase the number of cases where an innocent looking change to some part of the code would suddenly make the return coverage not be complete and thus produce potentially odd messages about missing returns. The current brute force rules protect against this and lead Go programmers to write in a certain sort of consistent style.

My final issue is my perennial one of being unable to cleanly cancel IO being done by goroutines, breaking them out of things so that they can see a death signal from outside. You can argue that this is a bug in the runtime, but the problem with this is that everything that calls an IO operation then needs to be aware of this particular error case (and catch it, and propagate it up the call stack in whatever way is appropriate). A good start to making it a bug in the runtime would be for the runtime to define a specific error for 'IO attempted on closed connection' and for absolutely everything to use it.

(As it stands, the net package doesn't even define a publicly visible error instance for this case, although it does define one internally. It's my personal view that this beautifully illustrates why this is a general language problem; while you can 'solve' it in code, it requires absolutely everyone to get it right and, well, they clearly don't.)

Again this is a software engineering tradeoff. Both the semantics and the runtime implementation of goroutines are undoubtedly vastly simplified because you don't have to worry about being able to signal or cancel a goroutine from outside itself. Outside of the program exiting, all of the interactions that a goroutine has with the outside world are initiated by itself, on its own terms. This makes it much easier to reason about the effects of a goroutine, especially if it's careful not to use global state.

by cks at May 14, 2013 03:39 AM

May 13, 2013

Linux Poison

eBook - Understanding the Linux Virtual Memory Manager


"Understanding the Linux Virtual Memory Manager"

Finally, a comprehensive guide to the Linux VM! This book describes VM in unprecedented detail, presenting both theoretical foundations and a line-by-line source code commentary.

VM's behavior affects every Linux kernel subsystem and dramatically impacts overall performance. But until now, there was only one way to understand VM: study the poorly documented source one line at a time. Now there's an easier, faster alternative. It systematically covers everything from physical memory description to out-of-memory management. Coverage includes:

 * Linux VM 2.4 architecture in depth, with diagrams and call graphs
 * Physical memory description, page tables, address spaces, and memory allocation
 * High memory, swapping, shared memory, and much more
 * Expert guidance for analyzing the code of any open source project
 * New Linux 2.6 kernel features in every chapter

Well organized and superbly written, Understanding the Linux Virtual Memory Manager will be indispensable to every kernel programmer and researcher.

Download your free copy of "Understanding the Linux Virtual Memory Manager" - here



by noreply@blogger.com (Nikesh Jauhari) at May 13, 2013 04:06 PM

Managing Product Development

Individuals and Interactions With Gil Broza

My friend and colleague, Gil Broza, is interviewing me for his Individuals and Interactions virtual training event. My topic? “Focus Keeps You Going.”

If you read my personal kanban series a couple of weeks ago, you saw how my focus kept me going. Even with a big interruption last week, due to a death in the family, I was able to maintain my focus, because I knew exactly what I had to do, to finish my work, to get ready for my trip today.

Gil has other great people in his event: Doc List, Ellen Gottesdiener, Mary Gorman, David Spann, Christopher Avery and Bob Schatz might be names you recognize. How about Rick Ross? David Spann? Caren DesBrisay? You might not recognize these names, and you should listen to what they have to say, too.

Check out Gil’s Individuals and Interactions training. Sign up. It’s a steal.

by Johanna Rothman at May 13, 2013 01:07 PM

Chris Siebenmann

The Unix philosophy is not an end to itself

Today I feel like opening a can of worms that I've alluded to before.

Here is something very important about the Unix philosophy (regardless of what exactly that is): the Unix philosophy was not conceived as an empty philosophy that was an end to itself. Instead it is above all a theory about how to make computers easy, powerful, and useful. This philosophy (or at least the things built by people following it at Bell Labs and elsewhere) has been extraordinarily successful, and I'm not just talking about Unix; concepts first pioneered in Unix and C now form core pieces of pretty much every computer system in the world.

But it's possible to take this too far. To put it one way, it's my strong view that the core goal of Unix is to be useful, not to be philosophically pure. The underlying purpose comes first and fitting how to be useful into 'the Unix way of doing things' comes second. If Unix has to be non-Unixy for a while (or even permanently) in order to be useful, then, well, I pick usefulness. Excessive minimalism and 'Unixness' for the sake of minimalism and Unixness is a kind of masochism.

(Of course the devil is in the details, as it always is. It's certainly possible to ruin Unix without getting anything worth it in exchange.)

What this biases me towards is an environment where one solves the problem first and then tries to make it fit into the traditional 'Unix way' second. Which is why part of me thinks that GNU sort's -h option is perfectly fine because it solves a real problem (and solves it now).

(The counterargument is that Unix cannot be all things to all people. As with all systems, at some point you have to draw a line and say 'this doesn't fit, you need to go elsewhere'. I don't know how to balance this. I do know that a certain amount of griping about 'the one true Unix way' and how (some) modern Unixes are ruining it reminds me an awful lot of the griping of Lisp adherents at the rise of Unix, and for that matter the griping of Unix people (myself sometimes included) at the rise of Windows and Macs.)

by cks at May 13, 2013 04:30 AM

bc-log

Removing Files and Directories with rm and rmdir

Normally on this blog I tend to write about more complicated tasks or fancy Linux tricks and completely overlook some of the most basic tasks that a SysAdmin needs to know. Today I have decided that I will make my blog a little more comprehensive and add some posts with some of the basics.

Along with this I will be starting a new category, called Sysadmin Basics and I will try to post an additional article each week that covers some of the more basic concepts and commands used by Linux and Unix Sysadmins.

Remove Directories with the rmdir command

The rmdir command is used to delete and remove empty directories. I bolded empty as it is important to note that rmdir will only remove a directory if there are no files within that directory. If you want to remove a directory and all files within that directory, skip down to the rm section of this article.

Remove a single empty directory

# rmdir somedir/

Remove multiple empty directories (in a single tree)

# rmdir -p somedir/a/b/c/d/e/f/whoa

While rmdir will not remove directories that contain files, with the -p flag it will remove a whole chain of otherwise empty directories: it removes the last directory in the path and then each parent in turn. In the example somedir only has directory a within it, and the a directory only has b which only has c and so on.
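The same pruning behaviour can be tried out safely in a scratch directory. The following is a minimal Python sketch, not part of the original article; it assumes Python 3 and uses os.removedirs, which removes the leaf directory and then each empty parent, much like rmdir -p. The scratch directory and the keep.txt marker file are made up for the illustration.

```python
import os
import tempfile

# Build somedir/a/b/c inside a throwaway scratch directory (hypothetical layout).
scratch = tempfile.mkdtemp()
leaf = os.path.join(scratch, "somedir", "a", "b", "c")
os.makedirs(leaf)

# A marker file keeps the scratch directory itself non-empty, so pruning stops there.
with open(os.path.join(scratch, "keep.txt"), "w") as handle:
    handle.write("stop here\n")

# Remove c, then b, a and somedir in turn, stopping at the first non-empty parent.
os.removedirs(leaf)

somedir_left = os.path.exists(os.path.join(scratch, "somedir"))
scratch_left = os.path.exists(scratch)
print(somedir_left, scratch_left)
```

After the call somedir is gone, while the scratch directory survives because it still holds keep.txt, which is the same reason rmdir -p stops at the first parent that is not empty.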

Remove multiple empty directories

The above command will also fail if there are multiple directories in one single directory; to handle that scenario you can list the directories individually and include the --ignore-fail-on-non-empty flag.

# rmdir --ignore-fail-on-non-empty -p somedir/a/b/c/ somedir/a2/b2/

Without the --ignore-fail-on-non-empty flag the command will still print that somedir is not empty even though it removes somedir. This is due to the fact that both command line arguments ask rmdir to remove somedir and rmdir cannot remove that directory until the last step.

Removing Files and Directories with the rm Command

While the rmdir command is solely for directories the rm command can remove both files and directories. With the right combination of flags rm will also remove entire directories, files and all.

Remove a file

# rm a-file
 rm: remove regular empty file `a-file'? y

On its own rm will not prompt a user before removing a file; to keep systems safe from accidental file removals some distributions of Linux ship an alias for rm in the default .bashrc file. This alias adds the interactive (-i) flag, which tells rm to prompt the user before removing files and directories.

# alias
alias rm='rm -i'

Remove a file without being prompted

While you can simply unalias the rm alias, a simpler and commonly used method to remove files without being prompted is to add the force (-f) flag to the rm command. It is advisable that you only add the force (-f) flag if you really know what you are removing.

# rm -f b-file

Remove a file without being prompted and with verbosity

If you don’t want to be prompted for each file removal but also want to keep an eye on rm in case the command starts removing unexpected files, you can simply add the verbose (-v) flag.

# rm -fv c-file
 removed `c-file'

Remove multiple files

There are many ways to remove multiple files, one method is to simply list each file you want to remove.

# rm -f a-file b-file

Removing multiple files with a wildcard

The bash command line supports wildcards and bracket expressions (shell glob patterns, which are not regular expressions). A simpler way to remove all files that end in the word file is to simply state *file. I suggest being cautious with wildcards as it is entirely possible to remove a file without meaning to.

# rm -f *file

Remove files using a bracket expression

Another common method of deleting files is to use a bracket expression (often loosely called a regex, although it is really a shell glob). The below would remove anything that looks like files-0 through files-9 but would not remove files-a or files-list.

# rm -f files-[0-9]
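Before handing a pattern like this to rm -f, it can help to preview what it would select. Here is a small Python sketch using the standard fnmatch module, which implements shell-style patterns; the preview helper and the list of file names are invented for the example, and fnmatch is only an approximation of bash's own expansion (it does not treat leading dots specially, for instance).

```python
import fnmatch

# Hypothetical directory listing to test patterns against.
names = ["files-0", "files-9", "files-a", "files-list", "files-10"]


def preview(pattern, candidates):
    """Return the names a shell-style pattern would select, deleting nothing."""
    return sorted(fnmatch.filter(candidates, pattern))


print(preview("files-[0-9]", names))
```

Here the pattern selects only files-0 and files-9. Note that files-10 is left out as well: [0-9] stands for exactly one character, which is one of the ways a glob differs from what people expect of a regex.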

Remove a directory and all of its contents with rm

If you want to simply remove an entire directory and all of the contents within, including both files and directories, the easiest method is to add the recursive (-R) flag to rm. If you are in any way unsure of what you are doing then drop the force (-f) and replace it with verbose (-v) or interactive (-i).

# rm -Rf somedir/

by Benjamin Cane at May 13, 2013 04:16 AM

Debian Admin

Debian 7.0 (Wheezy) Desktop Installation Screenshots

After many months of constant development, the Debian project is proud to present its new stable version 7.0 (code name "Wheezy").
This new version of Debian includes various interesting features such as multiarch support, several specific tools to deploy private clouds, an improved installer, and a complete set of multimedia codecs and front-ends which remove the need for third-party repositories.

This tutorial walks through the Debian 7.0 (Wheezy) desktop installation process with screenshots.

by ruchi at May 13, 2013 12:51 AM

May 12, 2013

Ubuntu Geek

witalis

clear ip nat selectively

On a Cisco router you can clear all IP NAT translations at once with:

Router#clear ip nat translation *

but when you try to remove only one translation you have to write a long command, i.e.

Router#clear ip nat translation udp inside <ip> <port> <ip> <port> outside <ip> <port> <ip> <port>

which also cannot easily be cut and pasted from the show command output. Fortunately, Cisco devices have a Tcl shell, activated by:

Router#tclsh

Sample script to clear ip nat selectively:

proc clearnat {x} {
set result [exec {sh ip nat translation}]
set data [split $result "\n"]

foreach item $data {

    if {[string match *$x* $item]} {
        set wordList [regexp -all -inline {\S+} $item]
        set proto [ lindex $wordList 0 ]
        set insglob [ lindex $wordList 1 ]
        regsub -all ":" $insglob " " insglob
        set inslocal [ lindex $wordList 2 ]
        regsub -all ":" $inslocal " " inslocal
        set outlocal [ lindex $wordList 3 ]
        regsub -all ":" $outlocal " " outlocal
        set outglob [ lindex $wordList 4 ]
        regsub -all ":" $outglob " " outglob
        clear ip nat translation $proto inside $insglob $inslocal outside $outlocal $outglob
    }

}

}

Paste it into tclsh and run it with:

Router(tcl)#clearnat <ip>

This was my first encounter with Tcl scripting, so it isn’t written optimally. Tcl looks pretty odd compared to my previous scripting experience ;)



by admin at May 12, 2013 02:01 PM

Server Density

Everything Sysadmin

I feel pain when articles get inaccurate titles

You may have read the Popular Science article:

Thieves Stole $45 Million From ATMs Because The U.S. Uses Absurd 40-Year-Old Technology

Let me quote:

So why is the US so far behind? Infrastructure is a major factor; countries like Japan and the UK are much smaller, so replacing all the old point-of-sale machines and ATMs is easier.

Bullshit.

Bullshit. Bullshit. Bullshit.

The reason is that bank executives had the choice between paying a lot of money to do the right thing or a little money to consultants who would tell them what they wanted to hear. It's a big win for consultants.

WHY IS POPULAR SCIENCE BEING SO ANTI-CONSULTANT???

Everyone got what they asked for. What's so bad about that?

And besides, I'm sure the banks are insured for this kind of thing.

The real headline should be, "Insurance companies lose $45 million from signing contracts with banks that couldn't care less because they've signed contracts with insurance companies that remove the need for them to give a shit."

Amirite? (No, really, can someone from the banking industry confirm?)

May 12, 2013 12:11 PM

Chris Siebenmann

The consequences of importing a module twice

Back when I wrote about Python's relative import problem, I mentioned that only actually importing a module once can be important due to Python's semantics. Today I feel like discussing what these are and how much they can matter.

The straightforward thing that goes wrong if you manage to import a module twice (under two different names) is that any code in the module gets run twice, not once. Modules that run active code on import assume that this code is only going to be run once; running it again may result in various sorts of malfunctions.

At one level, modules that run code on import are relatively rare because people understand it's bad form for a simple import to have big side effects. At another level, various frameworks like Django effectively run code on module import in order to handle things like setting up models and view forms and so on; it's just that this code isn't directly visible in your module because it's hiding in framework metaclasses. But this issue is a signpost to the really big thing: function and class definitions are executable statements that are run at import time. The net effect is that when you import a module a second time the new import has a completely distinct set of functions, classes, exceptions, sentinel objects, and so on. They look identical to the versions from the first import but as far as Python is concerned they are completely distinct; fred.MyCls is not the same thing as mymod.fred.MyCls.

(This is the same effect that you get when you use reload() on a module.)
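A small sketch of the identity problem, using the fred and mymod names from the text. The temporary directory layout and the empty MyCls class are assumptions made purely for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: mymod/__init__.py and mymod/fred.py.
# The names mymod, fred and MyCls follow the text; the file contents are
# invented for this demonstration.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mymod")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "fred.py"), "w") as f:
    f.write("class MyCls(object):\n    pass\n")

# Putting both the parent directory and the package directory on sys.path
# makes the same file importable as 'mymod.fred' and as plain 'fred'.
sys.path[:0] = [tmp, pkg]
importlib.invalidate_caches()

import mymod.fred
import fred

# One file on disk, but two module objects and two class objects.
same_file = os.path.samefile(fred.__file__, mymod.fred.__file__)
same_class = fred.MyCls is mymod.fred.MyCls
cross_isinstance = isinstance(fred.MyCls(), mymod.fred.MyCls)
print(same_file, same_class, cross_isinstance)
```

Both names come from the same source file, yet the classes are different objects, so an isinstance() check (or an except clause naming an exception class) written against one import will not match objects created through the other.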

However, my guess is that this generally won't matter. Most Python code uses duck typing and the two distinct classes are identical as far as that goes. Use of things like specific exceptions, sentinel values, and imported classes is probably going to be confined to the modules that directly imported the dual-imported module and thus mostly hidden from the outside world (for example, it's usually considered bad manners to leak exceptions from a module that you imported into the outside world). In many cases even the objects from the imported module are going to be significantly confined to the importing module.

(One potentially bad thing is that if the module has an internal cache of some sort, you will get two copies of the cache and thus perhaps twice the memory use.)

by cks at May 12, 2013 02:16 AM

bc-log

Adding and Troubleshooting Static Routes on Red Hat based Linux Distributions

Adding static routes in Linux can be troublesome, but also absolutely necessary depending on your network configuration. I call static routes troublesome because they can often be the cause of long troubleshooting sessions wondering why one server can’t connect to another.

This is especially true when dealing with teams that may not fully understand or know the remote server’s IP configuration.

The Default Route

Linux, like any other OS, has a routing table that determines the next hop for every packet.

Print the routing table contents

There are numerous commands that show the routing table, but today we will use the ip command, as it will be replacing the route command in future releases.

# ip route show
 10.1.6.0/26 dev eth0 proto kernel scope link src 10.1.6.21
 10.1.7.0/24 dev eth1 proto kernel scope link src 10.1.7.41
 default via 10.1.6.1 dev eth0

As you can see in the example routing table, there are several routes, but only one of them is the default route. This routing table tells the system that if the destination IP does not fall into any of the other routes, then the packets are sent to the default route, defined here as 10.1.6.1. The default route basically acts as a catchall for any packet that isn’t matched by one of the other routes.
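The catchall behaviour can be modelled in a few lines of Python. This is only a simplified sketch of the lookup, assuming the most specific (longest prefix) match wins and ignoring things like metrics and policy routing; the ROUTES list and the lookup() helper are illustrative names, not part of any real tool.

```python
import ipaddress

# Routing table from the example above: (destination, gateway or None, device).
ROUTES = [
    ("10.1.6.0/26", None, "eth0"),
    ("10.1.7.0/24", None, "eth1"),
    ("0.0.0.0/0", "10.1.6.1", "eth0"),  # "default" is shorthand for 0.0.0.0/0
]

def lookup(dest, routes=ROUTES):
    """Return the (network, gateway, device) of the most specific match."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(net), gw, dev)
        for net, gw, dev in routes
        if addr in ipaddress.ip_network(net)
    ]
    # The longest prefix wins, so the /0 default is only used as a last resort.
    return max(matches, key=lambda m: m[0].prefixlen)

for dest in ("10.1.6.40", "10.1.7.9", "10.1.5.202"):
    net, gw, dev = lookup(dest)
    print(dest, "->", net, "via", gw or "direct", "dev", dev)
```

An address such as 10.1.5.202 matches neither connected subnet, so the only route left for it is the default one via 10.1.6.1 on eth0.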

Our Example System

In today’s article I will be referencing an example network configuration in order to show how static routes are added, why to add them and some basic troubleshooting.

Example Interface Configuration

eth0:

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.1.6.21
NETMASK=255.255.255.192
ONBOOT=yes

eth1:

# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.1.7.41
NETMASK=255.255.255.0
ONBOOT=yes

Example Default Route Configuration

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=testing.example.com
GATEWAY=10.1.6.1

The GATEWAY configuration in /etc/sysconfig/network tells the system that 10.1.6.1 is the default route. This configuration could also be added to the /etc/sysconfig/network-scripts/ifcfg-eth0 file; however, if multiple ifcfg-<interface> files have a GATEWAY, this may produce unexpected results, as there can only be one default route.

Example Why we need a static route

For our example network configuration we have two interfaces: eth0 (10.1.6.21) for the internet, and eth1 (10.1.7.41) for the internal network. If we were to connect to a backup server such as 10.1.5.202, we would want the traffic to go through eth1, the internal network, rather than eth0, the internet-facing network.

Since 10.1.5.202 is not in the same subnet as eth1 (10.1.7.0/24), the routing table does not automatically route the packet through eth1, and the packet would instead hit the “catchall” default route out eth0. To force all of our packets destined for 10.1.5.202 out eth1, we will need to set up a static route.

Adding a Static Route

Adding the route to the current routing table

Adding the static route is a fairly simple task; however, before we start we must know the gateway for the internal network. For our example the gateway is 10.1.7.1.

Adding a single IP
# ip route add 10.1.5.202/32 via 10.1.7.1 dev eth1

The above command adds a route that tells the system to send all packets for 10.1.5.202, and only that IP, to 10.1.7.1 through device eth1.

Adding a subnet of IPs

To add a whole subnet, you will need to change the CIDR suffix on the end of the IP. In this case I want to add anything in the 10.1.5.0 – 10.1.5.255 IP range. To do that I can specify the netmask of 255.255.255.0 in CIDR format (/24) at the end of the IP itself.

If a CIDR (or netmask) is not specified, the route will default to a /32 (single IP) route.

# ip route add 10.1.5.0/24 via 10.1.7.1 dev eth1

The difference between these two routes is that the second will route anything between 10.1.5.0 and 10.1.5.255 out eth1 with a single route command. This is useful if you need to communicate with multiple servers in a network and don’t want to manage lengthy routing tables.
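If the netmask and CIDR notations are easy to mix up, Python's standard ipaddress module is a handy way to double-check what a route covers before adding it. This is just a sanity-check sketch; the address 10.1.5.17 is an arbitrary second host picked for illustration, and the /32 default shown here is Python's behaviour, which happens to mirror the single-IP default described above.

```python
import ipaddress

# A dotted netmask and a CIDR prefix length describe the same thing.
subnet = ipaddress.ip_network("10.1.5.0/255.255.255.0")
print(subnet, subnet.prefixlen)              # the same network in CIDR form
print(subnet[0], "-", subnet[-1])            # first and last address covered

single = ipaddress.ip_network("10.1.5.202")  # no prefix given: treated as /32
backup = ipaddress.ip_address("10.1.5.202")
other = ipaddress.ip_address("10.1.5.17")    # arbitrary second host
print(backup in single, other in single)     # only the one host matches
print(backup in subnet, other in subnet)     # the /24 covers both
```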

Adding the route even after a network restart

While the commands above added the static route, it only stays in the routing table until either the server or the network service is restarted. To add the route permanently, it can be added to the route-<interface> file.

# vi /etc/sysconfig/network-scripts/route-eth1

Append:

10.1.5.0/24 via 10.1.7.1 dev eth1

If the above configuration file does not already exist, then simply create it and put only the route itself in the file (# comments are ok). The next time the interface is restarted, the system will add any valid route in the route-eth1 file to the routing table.

I highly suggest that, whenever possible, you restart the interface itself after adding a route to a route-<interface> file, to validate that the route is actually put in place correctly. I have been on many late night calls where a static route was not added correctly to the configuration files and was lost on the next reboot, which is usually long after everyone has forgotten that a static route was required.
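Since a typo in this file tends to go unnoticed until the next restart, a quick pre-flight check can help. The sketch below is a hypothetical helper, not something shipped with the network scripts, and it only understands the "NETWORK via GATEWAY dev DEVICE" form used in this article (the real file may accept other route arguments).

```python
import ipaddress

def parse_route_file(text):
    """Parse 'NETWORK via GATEWAY dev DEVICE' lines; raise ValueError if malformed."""
    routes = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 5 or fields[1] != "via" or fields[3] != "dev":
            raise ValueError("line %d: unexpected format: %r" % (lineno, raw))
        network = ipaddress.ip_network(fields[0])   # rejects host bits, bad masks
        gateway = ipaddress.ip_address(fields[2])
        routes.append((network, gateway, fields[4]))
    return routes

sample = """# route to the backup network
10.1.5.0/24 via 10.1.7.1 dev eth1
"""
routes = parse_route_file(sample)
print(routes)
```

A line such as 10.1.5.202/24 via 10.1.7.1 dev eth1 would be rejected here, because the address has host bits set for a /24. This check does not replace restarting the interface to confirm the route really loads.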

Troubleshooting a Static Route

Check if the route is in the routing table

Before any in-depth troubleshooting, the first and easiest step is to check whether the routing table actually has the route you expect it to have.

# ip route show
 10.1.5.0/24 via 10.1.7.1 dev eth1
 10.1.6.0/26 dev eth0 proto kernel scope link src 10.1.6.21
 10.1.7.0/24 dev eth1 proto kernel scope link src 10.1.7.41
 default via 10.1.6.1 dev eth0
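If you want to script this check (for example in a monitoring job), a minimal sketch could look like the following. The has_route() helper is a made-up name, and it works on captured text, here the output shown above, rather than querying the kernel itself.

```python
def has_route(table, prefix, gateway, device):
    """True if `ip route show` text contains 'prefix via gateway dev device'."""
    wanted = [prefix, "via", gateway, "dev", device]
    for line in table.splitlines():
        # Compare only the leading fields; trailing options are ignored.
        if line.split()[:5] == wanted:
            return True
    return False

table = """\
 10.1.5.0/24 via 10.1.7.1 dev eth1
 10.1.6.0/26 dev eth0 proto kernel scope link src 10.1.6.21
 10.1.7.0/24 dev eth1 proto kernel scope link src 10.1.7.41
 default via 10.1.6.1 dev eth0
"""
found = has_route(table, "10.1.5.0/24", "10.1.7.1", "eth1")
print(found)
```

On a live system the table text could come from running ip route show through the subprocess module instead of a pasted string.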

Use tcpdump to see tcp/ip communication

The easiest way I have found to check whether a static route is working correctly is to use tcpdump to look at the network communication. In our example above we were attempting to communicate with 10.1.5.202 through device eth1.

# tcpdump -qnnvvv -i eth1 host 10.1.5.202
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
16:50:35.880941 IP (tos 0x10, ttl 64, id 59563, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.7.41.41403 > 10.1.5.202.22: tcp 0
16:50:35.881266 IP (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.5.202.22 > 10.1.7.41.41403: tcp 0

The above tcpdump command will only listen on eth1 and will only output packets that are going to or coming from 10.1.5.202.

TCP connections require communication from both the source and the destination. To validate a static route you can simply initiate a TCP connection (telnet to port 22 in this case) from the server with the static route to the destination server. In the output above you can see communication from 10.1.7.41 to 10.1.5.202 on the eth1 interface; this line alone shows that the static route is working correctly.

If the static route was incorrect or missing the tcpdump output would look similar to the following.

# tcpdump -qnnvvv -i eth1 host 10.1.5.202
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
16:50:35.881266 IP (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.5.202.22500 > 10.1.7.41.22: tcp 0

In the above, only the target server (10.1.5.202) is communicating over eth1; no packets from 10.1.7.41 are leaving through that interface.
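The same "do I see both directions?" check can be sketched in Python for longer captures. The directions() helper and its regular expression are assumptions written against the tcpdump line format shown in this article; other tcpdump options or versions may print lines differently.

```python
import re

# Matches the 'SRC.PORT > DST.PORT:' part of the tcpdump lines shown above.
FLOW = re.compile(r"(\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+):")

def directions(capture, local, remote):
    """Count packets seen local->remote and remote->local in tcpdump text."""
    sent = received = 0
    for line in capture.splitlines():
        m = FLOW.search(line)
        if not m:
            continue  # header lines such as 'listening on eth1' are skipped
        src, dst = m.group(1), m.group(3)
        if (src, dst) == (local, remote):
            sent += 1
        elif (src, dst) == (remote, local):
            received += 1
    return sent, received

good = """\
16:50:35.880941 IP (tos 0x10, ttl 64, id 59563, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.7.41.41403 > 10.1.5.202.22: tcp 0
16:50:35.881266 IP (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.5.202.22 > 10.1.7.41.41403: tcp 0
"""
bad = """\
16:50:35.881266 IP (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto: TCP (6), length: 60) 10.1.5.202.22500 > 10.1.7.41.22: tcp 0
"""
good_counts = directions(good, "10.1.7.41", "10.1.5.202")
bad_counts = directions(bad, "10.1.7.41", "10.1.5.202")
print(good_counts, bad_counts)
```

A zero in the first position (packets sent from the local address) while the second is non-zero is the pattern to look for when the static route is missing or wrong.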


by Benjamin Cane at May 12, 2013 12:44 AM


Administered by Joe. Content copyright by their respective authors.