Planet Sysadmin               

          blogs for sysadmins, chosen by sysadmins...

April 19, 2014

Yellow Bricks

Alert: vSphere 5.5 U1 and NFS issue!


Some had already reported on this on Twitter and in various blog posts, but I had to wait until I received the green light from our KB/GSS team. An issue has been discovered with vSphere 5.5 Update 1 that is related to loss of connectivity to NFS-based datastores. (NFS volumes include VSA datastores.)

This is a serious issue, as it results in an APD (All Paths Down) of the datastore, meaning that the virtual machines will not be able to do any I/O to the datastore for the duration of the APD. This by itself can result in BSODs for Windows guests and filesystems becoming read-only for Linux guests.

Witnessed log entries can include:

2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.

If you are hitting these issues then VMware recommends reverting to vSphere 5.5. Please monitor the following KB closely for more details and hopefully a fix in the near future: http://kb.vmware.com/kb/2076392
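
To quickly check whether a given host has logged these events, you can grep the usual ESXi log files for the messages shown above; the paths below are the typical ESXi 5.x defaults, so adjust them if your logs go elsewhere:

grep -i "vob.storage.apd" /var/log/vobd.log
grep -i "nfs.server.disconnect" /var/log/vobd.log /var/log/vmkernel.log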

 

"Alert: vSphere 5.5 U1 and NFS issue!" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.

by Duncan Epping at April 19, 2014 08:29 AM

Chris Siebenmann

Cross-system NFS locking and unlocking is not necessarily fast

If you're faced with a problem of coordinating reads and writes on an NFS filesystem between several machines, you may be tempted to use NFS locking to communicate between process A (on machine 1) and process B (on machine 2). The attraction of this is that all they have to do is contend for a write lock on a particular file; you don't have to write network communication code and then configure A and B to find each other.
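
For concreteness, here is a minimal sketch of that 'contend for a write lock' pattern using util-linux's flock(1). The paths and the generate-data/process-data commands are hypothetical, and whether flock(1)'s lock is actually propagated over NFS (rather than staying local) depends on your kernel and NFS client, so verify it in your environment before relying on it:

flock -x /nfs/shared/coord.lock -c 'generate-data > /nfs/shared/output.new'   # process A on machine 1
flock -x /nfs/shared/coord.lock -c 'process-data < /nfs/shared/output.new'    # process B on machine 2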

The good news is that this works, in that cross-system NFS locking and unlocking actually works right (at least most of the time). The bad news is that it doesn't necessarily work fast. In practice, it can take a fairly significant amount of time for process B on machine 2 to find out that process A on machine 1 has unlocked the coordination file, time that can be measured in tens of seconds. In short, NFS locking works, but it can require patience, and that makes it not necessarily the best option in cases like this.

(The corollary of this is that when you're testing this part of NFS locking to see if it actually works you need to wait for quite a while before declaring things a failure. Based on my experiences I'd wait at least a minute before declaring an NFS lock to be 'stuck'. Implications for impatient programs with lock timeouts are left as an exercise for the reader.)

I don't know if acquiring an NFS lock on a file after a delay normally causes your machine's kernel to flush cached information about the file. In an ideal world it would, but NFS implementations are often not ideal worlds and the NFS locking protocol is a sidecar thing that's not necessarily closely integrated with the NFS client. Certainly I wouldn't count on NFS locking to flush cached information on, say, the directory that the locked file is in.

In short: you want to test this stuff if you need it.

PS: Possibly this is obvious but when I started testing NFS locking to make sure it worked in our environment I was a little bit surprised by how slow it could be in cross-client cases.

by cks at April 19, 2014 04:38 AM

April 18, 2014

Google Blog

Through the Google lens: this week’s search trends

What did you search for this week? What about everyone else? Starting today, we’ll be sharing a regular look back at some of the top trending items on Google Search. Let’s dive in.

From afikomen to 1040EZ
People were looking for information on Palm Sunday and Good Friday ahead of Easter; searches for both days were even higher than searches for the Pope himself. Turning to another religious tradition, with Passover beginning on Monday we saw searches rise over 100 percent for Seder staples like [charoset recipe], [brisket passover] and of course [matzo balls]. Alongside these celebrations, U.S. citizens observed another annual rite of spring: taxes were due on April 15, leading to a rise in searches for [turbotax free], [irs] and (whoops) [turbotax extension].
But what made this year different from all other years? A rare lunar eclipse known as the “blood moon,” when the Earth’s shadow covers the moon, making it look red, and which occurred on Tuesday. There were more than 5 million searches on the topic, as people were eager to learn more. (Hint: if you missed seeing the blood moon this time around, keep your eyes on the sky in October. This is the first lunar eclipse in a “lunar tetrad,” a series of four total lunar eclipses each taking place six lunar months apart.)
Say goodbye and say hello
This week marked the first anniversary of last year’s Boston Marathon bombing, and commemorations led searches for the term [boston strong] to rise once again. And just yesterday, we were saddened by the passing of Gabriel Garcia Marquez, the Colombian writer best known for his masterpiece “100 Years of Solitude”—not to mention responsible for high schoolers across the U.S. knowing the term “magical realism.” On a happier note, former First Daughter Chelsea Clinton announced she’s expecting.

Entertainment that makes you go ZOMG
“Game of Thrones” fans—at least those who hadn’t read the books—were treated to a bombshell in this past Sunday’s episode when (spoiler alert) yet another wedding turned murderous. Searches for [who killed joffrey] skyrocketed as people struggled to process the loss of the boy king we love to hate. On the more sedate end of the Sunday TV spectrum, we welcomed back AMC’s “Mad Men,” which continues to provide viewers with plenty of innuendo, allusion and fashion to chew on—and search for—in between episodes.

The trailer for the highly anticipated film version of “Gone Girl” dropped this week—vaulting searches for [gone girl trailer] nearly 1,000 percent—as did a clip from another book-to-movie remake, “The Fault in Our Stars.” Between these two films we expect no dry eyes in June and no intact fingernails come October. At least we’ve got something funny to look forward to: as news broke this week that Fox 2000 is developing a sequel to the 1993 comedy classic "Mrs. Doubtfire," searches on the subject have since spiked.
And that’s it for this week in search. If you’re interested in exploring trending topics on your own, check out Google Trends. And starting today, you can also sign up to receive emails on your favorite terms, topics, or Top Charts for any of 47 countries.

by Emily Wood (noreply@blogger.com) at April 18, 2014 04:23 PM

bc-log

Using sysdig to Troubleshoot like a boss

If you haven't seen it yet, there is a new troubleshooting tool out called sysdig. It's been touted as strace meets tcpdump and, well, it seems to be living up to the hype. I would actually rather compare sysdig to SystemTap meets tcpdump, as it has the command line syntax of tcpdump but the power of SystemTap.

In this article I am going to cover some basic and cool examples for sysdig; for a more complete list you can look over the sysdig wiki. However, it seems that even the official sysdig documentation is only scratching the surface of what can be done with sysdig.

Installation

In this article we will be installing sysdig on Ubuntu using apt-get. If you are running an RPM-based distribution you can find details on installing via yum on sysdig's wiki.

Setting up the apt repository

To install sysdig via apt we will need to set up the apt repository maintained by Draios, the company behind sysdig. We can do this by running the following curl commands.

# curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public | apt-key add -  
# curl -s -o /etc/apt/sources.list.d/draios.list http://download.draios.com/stable/deb/draios.list

The first command above will download the Draios GPG key and add it to apt's key repository. The second will download an apt sources file from Draios and place it in the /etc/apt/sources.list.d/ directory.

Update apt's indexes

Once the sources list and GPG key are installed we will need to re-sync apt's package indexes; this can be done by running apt-get update.

# apt-get update

Kernel headers package

The sysdig utility requires the kernel headers package, so before installing sysdig we will need to validate that the headers are installed.

Check if kernel headers is installed

The system that I am using for this example already had the kernel headers package installed; to check whether they are installed on your system you can use the dpkg command.

    # dpkg --list | grep header
    ii  linux-generic                       3.11.0.12.13                     amd64        Complete Generic Linux kernel and headers
    ii  linux-headers-3.11.0-12             3.11.0-12.19                     all          Header files related to Linux kernel version 3.11.0
    ii  linux-headers-3.11.0-12-generic     3.11.0-12.19                     amd64        Linux kernel headers for version 3.11.0 on 64 bit x86 SMP
    ii  linux-headers-generic               3.11.0.12.13                     amd64        Generic Linux kernel headers

It is important to note that the kernel headers package must match the specific kernel version your system is running. In the output above you can see the linux-generic package is version 3.11.0.12 and the headers packages are for that same 3.11.0-12 kernel. If you have multiple kernels installed you can validate which version your system is running with the uname command.

# uname -r
3.11.0-12-generic

Installing the kernel headers package

To install the headers package for this specific kernel you can use apt-get. Keep in mind, you must specify the kernel version listed from uname -r.

# apt-get install linux-headers-<kernel version>

Example:

# apt-get install linux-headers-3.11.0-12-generic

Installing sysdig

Now that the apt repository is setup and we have the required dependencies, we can install the sysdig command.

# apt-get install sysdig

Using sysdig

Basic Usage

The syntax for sysdig is similar to tcpdump, in particular the saving and reading of trace files. All of sysdig's output can be saved to a file and read later, just like tcpdump. This is useful if you are running a process or experiencing an issue and want to dig through the information later.

Writing trace files

To write a file we can use the -w flag with sysdig and specify the file name.

Syntax:

# sysdig -w <output file>

Example:

# sysdig -w tracefile.dump

Like tcpdump, the sysdig command can be stopped with CTRL+C.
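
You can also keep trace files small by combining -w with a filter (filters are covered further down) or, if your sysdig build supports it, by capping the number of captured events with -n:

# sysdig -w tracefile.dump proc.name=sshd
# sysdig -n 5000 -w tracefile.dump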

Reading trace files

Once you have written the trace file you will need to use sysdig to read it; this can be accomplished with the -r flag.

Syntax:

# sysdig -r <output file>

Example:

    # sysdig -r tracefile.dump
    1 23:44:57.964150879 0 <NA> (7) > switch next=6200(sysdig) 
    2 23:44:57.966700100 0 rsyslogd (358) < read res=414 data=<6>[ 3785.473354] sysdig_probe: starting capture.<6>[ 3785.473523] sysdig_probe: 
    3 23:44:57.966707800 0 rsyslogd (358) > gettimeofday 
    4 23:44:57.966708216 0 rsyslogd (358) < gettimeofday 
    5 23:44:57.966717424 0 rsyslogd (358) > futex addr=13892708 op=133(FUTEX_PRIVATE_FLAG|FUTEX_WAKE_OP) val=1 
    6 23:44:57.966721656 0 rsyslogd (358) < futex res=1 
    7 23:44:57.966724081 0 rsyslogd (358) > gettimeofday 
    8 23:44:57.966724305 0 rsyslogd (358) < gettimeofday 
    9 23:44:57.966726254 0 rsyslogd (358) > gettimeofday 
    10 23:44:57.966726456 0 rsyslogd (358) < gettimeofday

Output in ASCII

By default sysdig saves trace files in binary; however, you can use the -A flag to have sysdig output in ASCII.

Syntax:

# sysdig -A

Example:

# sysdig -A > /var/tmp/out.txt
# cat /var/tmp/out.txt
1 22:26:15.076829633 0 <NA> (7) > switch next=11920(sysdig)

The above example will redirect the output to a file in plain text; this can be helpful if you want to save and review the data on a system that doesn't have sysdig installed.

sysdig filters

Much like tcpdump, the sysdig command has filters that allow you to narrow the output to specific information. You can find a list of available filters by running sysdig with the -l flag.

Example:

    # sysdig -l

    ----------------------
    Field Class: fd

    fd.num            the unique number identifying the file descriptor.
    fd.type           type of FD. Can be 'file', 'ipv4', 'ipv6', 'unix', 'pipe', 'event', 'signalfd', 'eventpoll', 'inotify' or 'signalfd'.
    fd.typechar       type of FD as a single character. Can be 'f' for file, 4 for IPv4 socket, 6 for IPv6 socket, 'u' for unix socket, 'p' for pipe, 'e' for eventfd, 's' for signalfd, 'l' for eventpoll, 'i' for inotify, 'o' for unknown.
    fd.name           FD full name. If the fd is a file, this field contains the full path. If the FD is a socket, this field contains the connection tuple.
<truncated output>

Filter examples

Capturing a specific process

You can use the "proc.name" filter to capture all of the sysdig events for a specific process. In the example below I am filtering on any process named sshd.

Example:

    # sysdig -r tracefile.dump proc.name=sshd
    530 23:45:02.804469114 0 sshd (917) < select res=1 
    531 23:45:02.804476093 0 sshd (917) > rt_sigprocmask 
    532 23:45:02.804478942 0 sshd (917) < rt_sigprocmask 
    533 23:45:02.804479542 0 sshd (917) > rt_sigprocmask 
    534 23:45:02.804479767 0 sshd (917) < rt_sigprocmask 
    535 23:45:02.804487255 0 sshd (917) > read fd=3(<4t>10.0.0.12:55993->162.0.0.80:22) size=16384

Capturing all processes that open a specific file

The fd.name filter is used to filter events for a specific file name. This can be useful to see what processes are reading or writing a specific file or socket.

Example:

# sysdig fd.name=/dev/log
14 11:13:30.982445884 0 rsyslogd (357) < read res=414 data=<6>[  582.136312] sysdig_probe: starting capture.<6>[  582.136472] sysdig_probe:

Capturing all processes that open files under a specific path

You can also use comparison operators with filters such as contains, =, !=, <=, >=, < and >.

Example:

    # sysdig fd.name contains /etc
    8675 11:16:18.424407754 0 apache2 (1287) < open fd=13(<f>/etc/apache2/.htpasswd) name=/etc/apache2/.htpasswd flags=1(O_RDONLY) mode=0 
    8678 11:16:18.424422599 0 apache2 (1287) > fstat fd=13(<f>/etc/apache2/.htpasswd) 
    8679 11:16:18.424423601 0 apache2 (1287) < fstat res=0 
    8680 11:16:18.424427497 0 apache2 (1287) > read fd=13(<f>/etc/apache2/.htpasswd) size=4096 
    8683 11:16:18.424606422 0 apache2 (1287) < read res=44 data=admin:$apr1$OXXed8Rc$rbXNhN/VqLCP.ojKu1aUN1. 
    8684 11:16:18.424623679 0 apache2 (1287) > close fd=13(<f>/etc/apache2/.htpasswd) 
    8685 11:16:18.424625424 0 apache2 (1287) < close res=0 
    9702 11:16:21.285934861 0 apache2 (1287) < open fd=13(<f>/etc/apache2/.htpasswd) name=/etc/apache2/.htpasswd flags=1(O_RDONLY) mode=0 
    9703 11:16:21.285936317 0 apache2 (1287) > fstat fd=13(<f>/etc/apache2/.htpasswd) 
    9704 11:16:21.285937024 0 apache2 (1287) < fstat res=0

As you can see from the above examples, filters can be used both when reading from a trace file and against the live event stream.

Chisels

Earlier I compared sysdig to SystemTap; chisels are why I made that reference. Tools like SystemTap have their own scripting language that allows you to extend their functionality. In sysdig these extensions are called chisels, and they are written in Lua, a common scripting language. I personally think the choice of Lua was a good one, as it makes extending sysdig easy for newcomers.

List available chisels

To list the available chisels you can use the -cl flag with sysdig.

Example:

    # sysdig -cl

    Category: CPU Usage
    -------------------
    topprocs_cpu    Top processes by CPU usage

    Category: I/O
    -------------
    echo_fds        Print the data read and written by processes.
    fdbytes_by      I/O bytes, aggregated by an arbitrary filter field
    fdcount_by      FD count, aggregated by an arbitrary filter field
    iobytes         Sum of I/O bytes on any type of FD
    iobytes_file    Sum of file I/O bytes
    stderr          Print stderr of processes
    stdin           Print stdin of processes
    stdout          Print stdout of processes
    <truncated output>

The list is fairly long even though sysdig is still pretty new, and since sysdig is on GitHub you can easily contribute and extend sysdig with your own chisels.

Display chisel information

While the -cl flag gives only a short description of each chisel, you can display more information using the -i flag with the chisel name.

Example:

    # sysdig -i bottlenecks

    Category: Performance
    ---------------------
    bottlenecks     Slowest system calls

    Use the -i flag to get detailed information about a specific chisel

    Lists the 10 system calls that took the longest to return during the capture interval.

    Args:
    (None)

Running a chisel

To run a chisel you can run sysdig with the -c flag and specify the chisel name.

Example:

    # sysdig -c topprocs_net
    Bytes     Process
    ------------------------------
    296B      sshd

Running a chisel with filters

Even with chisels you can still use filters, so a chisel can be run against specific events only.

Capturing all network traffic from a specific process

The example below runs the echo_fds chisel against processes named apache2.

# sysdig -A -c echo_fds proc.name=apache2
------ Read 444B from 127.0.0.1:57793->162.243.109.80:80

GET /wp-admin/install.php HTTP/1.1
Host: 162.243.109.80
Connection: keep-alive
Cache-Control: max-age=0
Authorization: Basic YWRtaW46ZUNCM3lyZmRRcg==
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8

Capturing network traffic exchanged with a specific IP

We can also use the echo_fds chisel to show all network traffic for a single IP address using the fd.cip filter.

# sysdig -A -c echo_fds fd.cip=127.0.0.1
------ Write 1.92KB to 127.0.0.1:58896->162.243.109.80:80

HTTP/1.1 200 OK
Date: Thu, 17 Apr 2014 03:11:33 GMT
Server: Apache
X-Powered-By: PHP/5.5.3-1ubuntu2.3
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1698
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8

Originally Posted on BenCane.com.

by Benjamin Cane at April 18, 2014 01:30 PM

Chris Siebenmann

What modern filesystems need from volume management

One of the things said about modern filesystems like btrfs and ZFS is that their volume management functionality is a layering violation; this view holds that filesystems should stick to filesystem work and volume managers should stick to volume management. For the moment let's not open that can of worms and just talk about what (theoretical) modern filesystems need from an underlying volume management layer.

Arguably the crucial defining aspect of modern filesystems like ZFS and btrfs is a focus on resilience against disk problems. A modern filesystem no longer trusts disks not to have silent errors; instead it checksums everything so that it can at least detect data faults and it often tries to create some internal resilience by duplicating metadata or at least spreading it around (copy on write is also common, partly because it gives resilience a boost).

In order to make checksums useful for healing data instead of just simply detecting when it's been corrupted, a modern filesystem needs an additional operation from any underlying volume management layer. Since the filesystem can actually identify the correct block from a number of copies, it needs to be able to get all copies or variations of a set of data blocks from the underlying volume manager (and then be able to tell the volume manager which is the correct copy). In mirroring this is straightforward; in RAID 5 and RAID 6 it gets a little more complex. This 'all variants' operation will be used both during regular reads if a corrupt block is detected and during a full verification check where the filesystem will deliberately read every copy to check that they're all intact.

(I'm not sure what the right primitive operation here should be for RAID 5 and RAID 6. On RAID 5 you basically need the ability to try all possible reconstructions of a stripe in order to see which one generates the correct block checksum. Things get even more convoluted if the filesystem level block that you're checksumming spans multiple stripes.)

Modern filesystems generally also want some way of saying 'put A and B on different devices or redundancy clusters' in situations where they're dealing with stripes of things. This enables them to create multiple copies of (important) metadata on different devices for even more protection against read errors. This is not as crucial if the volume manager is already providing redundancy.

This level of volume manager support is a minimum level, as it still leaves a modern filesystem with the RAID-5+ rewrite hole and a potentially inefficient resynchronization process. But it gets you the really important stuff, namely redundancy that will actually help you against disk corruption.

by cks at April 18, 2014 06:18 AM

April 17, 2014

Rich Bowen

ApacheCon NA 2014 Keynotes

This year at ApacheCon, I had the unenviable task of selecting the keynotes. This is always difficult, because you want to pick people who are inspirational, exciting speakers, but people who haven't already been heard by everyone at the event. You also need to give some of your sponsors the stage for a bit, and hope that they don't take the opportunity to bore the audience with a sales pitch.

I got lucky.

(By the way, videos of all of these talks will be on the Apache YouTube channel very soon - https://www.youtube.com/user/TheApacheFoundation)

We had a great lineup, covering a wide range of topics.

Day One:

We started with Hillary Mason, talking about Big Data. Unlike a lot of droney Big Data talks, she defined Big Data in terms of using huge quantities of data to solve actual human problems, and gave a historical view of Big Data going back to the first US Census. Good stuff.

Next, Samisa Abeysinghe talked about Apache Stratos, and the services and products that WSO2 is building on top of them. Although he had the opportunity to do nothing more than promote his (admittedly awesome) company, Samisa talked more about the Stratos project and the great things that it's doing in the Platform As A Service space. We love WSO2.

And to round out the first day of keynotes, James Watters from Pivotal talked about the CloudFoundry foundation that he's set up, and why he chose to do that rather than going with an existing foundation, among other things. I had talked some with James prior to the conference about his talk, and he came through with a really great one.

Day Two:

Day Two started with something a little different. Upayavira talked about the tool that geeks seldom mention - their minds - and how to take care of them. He talked about mindfulness - the art of being where you are when you are, and noticing what is going on around you. He then led us through several minutes of quiet contemplation and focusing of our minds. While some people thought this was a little weird, most people I talked with appreciated this calm, centering way to start the morning.

Mark Hinkle, from Citrix, talked about community and code, and made a specific call to the foundation to revise its sponsorship rules to permit companies like Citrix to give us more money in a per-project targeted fashion.

And Jim Zemlin rounded out the day two keynotes by talking about what he does at the Linux Foundation, and how different foundations fill different niches in the Open Source software ecosystem. This is a talk I personally asked him to do, so I was very pleased with how it turned out. Different foundations do things differently, and I wanted him to talk some about why, and why some projects may fit better in one or another.

At the end of day three, we had two closing keynotes. We've done closing keynotes before with mixed results - a lot of people leave before the end. But we figured that with more content on the days after that, people would stay around. So it was disappointing to see how empty the rooms were. But the talks were great.

Allison Randal, a self-proclaimed Unix Graybeard (no, really!) talked about the cloud, and how it's just the latest incarnation of a steady series of small innovations over the last 50 years or so, and what we can look for in the coming decade. She spoke glowingly about Apache and its leadership role in that space.

Then Jason Hibbets finished up by talking about his work in Open Source Cities, and how Open Source methodologies can work in real-world collaboration to make your home town so much better. I'd heard this presentation before, but it was still great to hear the things that he's been doing in his town, and how they can be done in other places using the same model.

So, check the Apache YouTube channel in a week or so - https://www.youtube.com/user/TheApacheFoundation - and make some time to watch these presentations. I was especially pleased with Hillary and Upayavira's talks, and recommend you watch those if you are short on time and want to pick just a few.

by rbowen at April 17, 2014 04:05 PM

Google Blog

Providing more CS professional development for K-12 teachers with an expanded CS4HS

For more than five years, we’ve provided free and inexpensive teacher professional development trainings in computer science education through Computer Science for High School (CS4HS). In this program, Google provides funding and support for experts to create hands-on professional development training in CS education for K-12 educators. The goal is to arm teachers with the knowledge they need to help their students succeed in the field. The program has already trained more than 12,000 teachers, and reached more than 600,000 students—and we’ve gotten great feedback over the years (a 95% satisfaction rate!).

It’s been a great success, but there is still much more to do. So this year, we’re taking the first steps toward extending CS4HS across the globe. We’re piloting CS4HS projects in Latin America for the first time—an area where computer science education is often mistaken for computer literacy (think word processing, typing, or changing settings on your operating system rather than robotics or coding a game). We’re also introducing eight new online workshops, so teachers no longer need to be located near a CS4HS event to get quality training.

It’s not just the “where” we’re expanding, but the “when,” as well. We’re now providing new resources for teachers to get ongoing, year-round help. Our Google+ Community page hosts Hangouts on Air with CS industry leaders, Googlers, and top educators on a regular basis. And we’ve added a new Resources page with online workshops, tutorials and information on computational thinking, robotics and more. Finally, if you happen to be in the neighborhood at the right time, sign up for one of our in-person workshops available around the world in these locations:

by Emily Wood (noreply@blogger.com) at April 17, 2014 12:00 PM

Yellow Bricks

Disk Controller features and Queue Depth?


I have been working on various VSAN configurations and a question that always comes up is: what are the disk controller features and queue depth for controller X? (Local disks, not FC based…) Note that this is not only useful to know when using VSAN, but also when you are planning on doing host-local caching with solutions like PernixData FVP or SanDisk FlashSoft, for instance. The controller used can impact performance, and a really low queue depth will result in lower performance; it is as simple as that.

I kept finding myself digging through documentation and doing searches on the internet, until I stumbled across the following website. I figured I would share the link with you, as it will help you (especially consultants) when you need to go through this exercise multiple times:

http://forums.servethehome.com/index.php?threads/lsi-raid-controller-and-hba-complete-listing-plus-oem-models.599/

Just as an example, the Dell H200 Integrated disk controller is on the VSAN HCL. According to the website above it is based on the LSI 2008 and provides the following feature set: 2×4 port internal SAS, no cache, no BBU, RAID 0, 1 and 10. According to the VSAN HCL it also provides “Virtual SAN Pass-Through”. I guess the only info missing is the queue depth of the controller. I have not been able to find a good source for this, so I figured I would make this post a source for that info.

Before we dive in to that, I want to show something else that is important to realize. Some controllers accept SAS / NL-SAS and SATA drives. Although the price difference between SATA and NL-SAS is typically negligible, the queue depth difference is not. Erik Bussink was kind enough to provide me with these details for one of the controllers he is using as an example; first in the list is the “RAID” device, second is SATA and third is SAS… As you can see SAS is the clear winner here, and that includes NL-SAS drives.

mpt2sas_raid_queue_depth: int
     Max RAID Device Queue Depth (default=128)
  mpt2sas_sata_queue_depth: int
     Max SATA Device Queue Depth (default=32)
  mpt2sas_sas_queue_depth: int
     Max SAS Device Queue Depth (default=254)

If you want to contribute, please take the following steps and report the vendor, controller type and AQLEN value in a comment.

  1. Run the esxtop command on the ESXi shell / SSH session
  2. Press d
  3. Press f and select Queue Stats (d)
  4. The value listed under AQLEN is the queue depth of the storage adapter

The following table shows the vendor, controller and queue depth. Note that this is based on what we (my readers and I) have witnessed in our labs and results may vary depending on the firmware and driver used. Make sure to check the VSAN HCL for the supported driver / firmware version; note that not all controllers below are on the VSAN HCL, as this is a “generic” list that I want to serve multiple use cases.

Generally speaking it is recommended to use a disk controller with a queue depth > 256 when used for VSAN or “host local caching” solutions.

Vendor Disk Controller Queue Depth
Adaptec RAID 2405 504
Dell H310 25
Dell (R610) SAS 6/iR 127
Dell PERC 6/i 925
Dell (M710HD) PERC H200 Embedded 499
Dell (M910) PERC H700 Modular 975
Dell PERC H700 Integrated 975
Dell (M620) PERC H710 Mini 975
Dell (T620) PERC H710 Adapter 975
Dell (T620) PERC H710p 975
Dell PERC H810 975
HP Smart Array P220i 1020
HP Smart Array P400i 128
HP Smart Array P410i 1020
HP Smart Array P420i 1020
HP Smart Array P700m 1200
IBM ServeRAID-M5015 965
Intel C602 AHCI (Patsburg) 31 (per port)
Intel C602 SCU (Patsburg) 256
Intel RMS25KB040 600
LSI 2004 25
LSI 2008 25
LSI 2108 600
LSI 2208 600
LSI 2308 600
LSI 3008 600
LSI 9300-8i 600

"Disk Controller features and Queue Depth?" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.

by Duncan Epping at April 17, 2014 09:30 AM

Startup News Flash part 17


Number 17 already… A short one; I expect more news next week when we have “Storage Field Day”, hence I figured I would release this one now. Make sure to watch the live feed if you are interested in getting the details on new releases from companies like Diablo, SanDisk, PernixData etc.

Last week Tintri announced support for the Red Hat Enterprise Virtualization platform. Kind of surprising to see them selecting a specific Linux vendor to be honest, but then again it probably is also the more popular option for people who want full support etc. What is nice in my opinion is that Tintri offers the exact same “VM Aware” experience for both platforms. Although I don’t see too many customers using both VMware and RHEV in production, it is nice to have the option.

CloudVolumes, no not a storage company, announced support for View 6.0. CloudVolumes developed a solution which helps you manage applications. They provide a central management solution, and the option to distribute applications while eliminating the need for streaming / packaging. I have looked at it briefly and it is an interesting approach they take. I like how they solved the “layering” problem by isolating the app in its own disk container. It does make me wonder how this scales when you have dozens of apps per desktop; nevertheless it is an interesting approach worth looking into.

"Startup News Flash part 17" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.

by Duncan Epping at April 17, 2014 07:10 AM

Chris Siebenmann

Partly getting around NFS's concurrent write problem

In a comment on my entry about NFS's problem with concurrent writes, a commentator asked this very good question:

So if A writes a file to an NFS directory and B needs to read it "immediately" as the file appears, is the only workaround to use low values of actimeo? Or should A and B be communicating directly with some simple mechanism instead of setting, say, actimeo=1?

(Let's assume that we've got 'close to open' consistency to start with, where A fully writes the file before B processes it.)

If I was faced with this problem and I had a free hand with A and B, I would make A create the file with some non-repeating name and then send an explicit message to B with 'look at file <X>' (using eg a TCP connection between the two). A should probably fsync() the file before it sends this message to make sure that the file's on the server. The goal of this approach is to avoid B's kernel having any cached information about whether or not file <X> might exist (or what the contents of the directory are). With no cached information, B's kernel must go ask the NFS fileserver and thus get accurate information back. I'd want to test this with my actual NFS server and client just to be sure (actual NFS implementations can be endlessly crazy) but I'd expect it to work reliably.

Note that it's important to not reuse filenames. If A ever reuses a filename, B's kernel may have stale information about the old version of the file cached; at the best this will get B a stale filehandle error and at the worst B will read old information from the old version of the file.
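
As a concrete illustration, here is a rough shell sketch of the 'unique name, fsync, then tell B' sequence. The host names, paths and the process-incoming handler are hypothetical; dd's conv=fsync is used so the data is forced out to the NFS server before B is told about it:

fname=/nfs/shared/incoming/result.$(hostname).$$
dd if=result.dat of="$fname" conv=fsync 2>/dev/null
ssh machine2 process-incoming "$fname"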

If you can't communicate between A and B directly and B operates by scanning the directory to look for new files, you have a moderate caching problem. B's kernel will normally cache information about the contents of the directory for a while and this caching can delay B noticing that there is a new file in the directory. Your only option is to force B's kernel to cache as little as possible. Note that if B is scanning it will presumably only be scanning, say, once a second and so there's always going to be at least a little processing lag (and this processing lag would happen even if A and B were on the same machine); if you really want immediately, you need A to explicitly poke B in some way no matter what.

(I don't think it matters what A's kernel caches about the directory, unless there's communication that runs the other way such as B removing files when it's done with them and A needing to know about this.)

Disclaimer: this is partly theoretical because I've never been trapped in this situation myself. The closest I've come is safely updating files that are read over NFS. See also.

by cks at April 17, 2014 04:11 AM

Raymii.org

FreeBSD 10, Converting from RELEASE to STABLE

Because of a bug in mpd which is fixed in 10-STABLE, I wanted to move one of my FreeBSD machines from 10.0-RELEASE to 10.0-STABLE. The process to do so is fairly simple. Basically, you check out the new source code, build the world, build the kernel, install the kernel, install the world, merge some configuration files and reboot. Read on to see the entire process.
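
For reference, the sequence looks roughly like this; it's a sketch based on the FreeBSD handbook rather than the article's exact commands (always check /usr/src/UPDATING first), and it assumes you fetch the stable/10 branch with svn or the svnlite that ships with FreeBSD 10. Pick whichever svn mirror is closest to you:

svn checkout https://svn0.us-west.freebsd.org/base/stable/10 /usr/src
cd /usr/src
make buildworld
make buildkernel
make installkernel
shutdown -r now            # come back up on the new kernel, ideally in single user mode
cd /usr/src
make installworld
mergemaster -Ui
shutdown -r now            # final reboot, now running 10-STABLE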

April 17, 2014 12:00 AM

April 16, 2014

Yellow Bricks

Win a Jackery Giant backup battery, by just leaving a comment


I was one of the lucky guys who won a prize during the Top Bloggers award “ceremony”. Veeam was kind enough to provide two of the exact same items so that every blogger who won a prize could also give away a prize to their readers. I am not going to make it more difficult than it needs to be. Leave a comment before Friday the 18th of April, make sure to use your real email address in the form, and I will let my daughter pick a random winner on Saturday morning. I will update this blog post and inform the winner.

What can you win? (Funny, I was at the point of buying one of these myself as I always run out of battery on my phone and iPad during all-day events!)

Jackery Giant

- Large power capacity with 2.1A output
- The world’s most powerful external rechargeable battery
- 2.1A fast charging
- Size, style and speed make this the most powerful external rechargeable battery to date

This large capacity portable external battery has dual output ports and 10,400mAh for lengthening mobile device battery life up to 500% for smart phones. Its compact size and stylish design has three LED charge status indicators with a two LED flashlight for up to 700 hours of illumination.

"Win a Jackery Giant backup battery, by just leaving a comment" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.

by Duncan Epping at April 16, 2014 07:25 AM

Chris Siebenmann

Where I feel that btrfs went wrong

I recently finished reading this LWN series on btrfs, which was the most in-depth exposure to the details of using btrfs that I've had so far. While I'm sure that LWN intended the series to make people enthused about btrfs, I came away with a rather different reaction; I've wound up feeling that btrfs has made a significant misstep along the way that's resulted in a number of design mistakes. To explain why I feel this way I need to contrast it with ZFS.

Btrfs and ZFS are each a volume manager and a filesystem merged together. One of the fundamental interface differences between them is that ZFS has decided that it is a volume manager first and a filesystem second, while btrfs has decided that it is a filesystem first and a volume manager second. This is what I see as btrfs's core mistake.

(Overall I've been left with the strong impression that btrfs basically considers volume management to be icky and tries to have as little to do with it as possible. If correct, this is a terrible mistake.)

Since it's a volume manager first, ZFS places volume management front and center in operation. Before you do anything ZFS-related, you need to create a ZFS volume (which ZFS calls a pool); only once this is done do you really start dealing with ZFS filesystems. ZFS even puts the two jobs in two different commands (zpool for pool management, zfs for filesystem management). Because it's firmly made this split, ZFS is free to have filesystem level things such as df present a logical, filesystem based view of things like free space and device usage. If you want the actual physical details you go to the volume management commands.

Because btrfs puts the filesystem first it wedges volume creation in as a side effect of filesystem creation, not a separate activity, and then it carries a series of lies and uselessly physical details through to filesystem level operations like df. Consider the discussion of what df shows for a RAID1 btrfs filesystem here, which has both a lie (that the filesystem uses only a single physical device) and a needlessly physical view (of the physical block usage and space free on a RAID 1 mirror pair). That btrfs refuses to expose itself as a first class volume manager and pretends that you're dealing with real devices forces it into utterly awkward things like mounting a multi-device btrfs filesystem with 'mount /dev/adevice /mnt'.
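
To make the contrast concrete, compare the two workflows side by side (device names here are placeholders):

# ZFS: volume management first, then filesystems inside the pool
zpool create tank mirror /dev/sda /dev/sdb
zfs create tank/data

# btrfs: the volume exists only as a side effect of making a filesystem,
# and you mount it via one (arbitrary) member device
mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb
mount /dev/sda /mnt/data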

I think that this also leads to the asinine design decision that subvolumes have magic flat numeric IDs instead of useful names. Something that's willing to admit it's a volume manager, such as LVM or ZFS, has a name for the volume and can then hang sub-names off that name in a sensible way, even if where those sub-objects appear in the filesystem hierarchy (and under what names) gets shuffled around. But btrfs has no name for the volume to start with and there you go (the filesystem-volume has a mount point, but that's a different thing).

All of this really matters for how easily you can manage and keep track of things. df on ZFS filesystems does not lie to me; it tells me where the filesystem comes from (what pool and what object path within the pool), how much logical space the filesystem is using (more or less), and roughly how much more I can write to it. Since they have full names, ZFS objects such as snapshots can be more or less self documenting if you name them well. With an object hierarchy, ZFS has a natural way to inherit various things from parent object to sub-objects. And so on.

Btrfs's 'I am not a volume manager' approach also leads it to drastically limit the physical shape of a btrfs RAID array in a way that is actually painfully limiting. In ZFS, a pool stripes its data over a number of vdevs and each vdev can be any RAID type with any number of devices. Because ZFS allows multi-way mirrors this creates a straightforward way to create a three-way or four-way RAID 10 array; you just make all of the vdevs be three or four way mirrors. You can also change the mirror count on the fly, which is handy for all sorts of operations. In btrfs, the shape 'raid10' is a top level property of the overall btrfs 'filesystem' and, well, that's all you get. There is no easy place to put in multi-way mirroring; because of btrfs's model of not being a volume manager it would require changes in any number of places.
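
For example, a three-way mirrored pool striped across two vdevs is just (again with placeholder device names):

zpool create tank mirror disk0 disk1 disk2 mirror disk3 disk4 disk5
zpool attach tank disk0 disk6    # later: grow that vdev's 3-way mirror into a 4-way one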

(And while I'm here, that btrfs requires you to specify both your data and your metadata RAID levels is crazy and gives people a great way to accidentally blow their own foot off.)

As a side note, I believe that btrfs's lack of allocation guarantees in a raid10 setup makes it impossible to create a btrfs filesystem split evenly across two controllers that is guaranteed to survive the loss of one entire controller. In ZFS this is trivial because of the explicit structure of vdevs in the pool.

PS: ZFS is too permissive in how you can assemble vdevs, because there is almost no point of a pool with, say, a mirror vdev plus a RAID-6 vdev. That configuration is all but guaranteed to be a mistake in some way.

by cks at April 16, 2014 05:28 AM

April 15, 2014

Ubuntu Geek

Yarock – Modern Music Player

Yarock is a modern Qt4 music player designed to provide an easy and pretty music collection browser based on cover art. Yarock is written in C++ using Qt and the Phonon multimedia framework, and is available only for Linux.

Feel free to download, test it and tell me what you think about it.
(...)
Read the rest of Yarock – Modern Music Player (128 words)



by ruchi at April 15, 2014 11:15 PM

Trouble with tribbles

Partial root zones

In Tribblix, I support sparse-root and whole-root zones, which work largely the same way as in Solaris 10.

The implementation of zone creation is rather different. The original Solaris implementation extended packaging - so the packaging system, and every package, had to be zone-aware. This is clearly unsustainable. (Unfortunately, the same mistake was made when IPS was introduced.)

Apart from creating work, this approach limits flexibility - in order to innovate with zones, for example by adding new types, you have to extend the packaging system, and then modify every package in existence.

The approach taken by Tribblix is rather different. Instead of baking zone architecture into packaging, packaging is kept dumb and the zone creation scripts understand how packages are put together.

In particular, the decision as to whether a given file is present in a zone (and how it ends up there) is not based on package attributes, but is a simple pathname filter. For example, files under /kernel never end up in a zone. Files under /usr might be copied (for a whole-root zone) or loopback mounted (for a sparse-root zone). If it's under /var or /etc, you get a fresh copy. And so on. But the decision is based on pathname.

It's not just the files within packages that get copied. The package metadata is also copied; the contents file is simply filtered by pathname - and that's how the list of files to copy is generated. This filtering takes place during zone creation, and is all done by the zone scripts - the packaging tools aren't invoked (one reason why it's so quick). The scripts, if you want to look, are at /usr/lib/brand/*/pkgcreatezone.
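
As a simplified illustration of the idea (this is not the actual Tribblix script, just the shape of the filter), producing the list of files that belong in a zone from the SVR4 contents file could look something like this:

grep -v '^#' /var/sadm/install/contents | awk '{print $1}' | grep -v '^/kernel/' > /tmp/zone-filelist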

In the traditional model, the list of installed packages in the zone is (initially) identical to that in the global zone. For a sparse-root zone, you're pretty much stuck with that. For a whole-root zone, you can add and remove packages later.

I've been working on some alternative models for zones in Tribblix that add more flexibility to zone creation. These will appear in upcoming releases, but I wanted to talk about the technology.

The first of these is what you might call a partial-root zone. This is similar to a whole-root zone in the sense that you get an independent copy, rather than being loopback mounted. And, it's using the same TRIBwhole brand. The difference is that you can specify a subset of the overlays present in the global zone to be installed in the zone. For example, you would use the following install invocation:

zoneadm -z myzone install -o developer

and only the developer overlay (and the overlays it depends on) will be installed in the zone.

This is still a copy - the installed files in the global zone are the source of the files that end up in the zone, so there's still no package installation, no need for repository access, and it's pretty quick.

This is still a filter, but you're now filtering both on pathname and package name.

As for package metadata, for partial-root zones, references to the packages that don't end up being used are removed.

That's the subset variant. The next obvious extension is to be able to specify additional packages (or, preferably, overlays) to be installed at zone creation time. That does require an additional source of packages - either a repository or a local cache - which is why I treat it as a logically distinct operation.

Time to get coding.

by Peter Tribble (noreply@blogger.com) at April 15, 2014 09:27 PM

Tech Teapot

Stack Overflow Driven Development

The rise of Stack Overflow has certainly changed how many programmers go about their trade.

I have recently been learning some new client side web skills because I need them for a new project. I have noticed that the way I go about learning is quite different from the way I used to learn pre-web.

I used to have a standard technique. I’d go through back issues of magazines I’d bought (I used to have hundreds of back issues) and read any articles related to the new technology. Then I’d purchase a book about the topic, read it and start a simple starter project. Whilst doing the starter project, I’d likely pick up a couple of extra books and skim them to find techniques I needed for the project. This method worked pretty well, I’d be working idiomatically, without a manual in anywhere from a month to three months.

Using the old method, if I got stuck on something, I’d have to figure it out on my own. I remember it took three days to get a simple window to display when I was learning Windows programming in 1991. Without the internet, there was nobody you could ask when you got stuck. If you didn’t own the reference materials you needed, then you were stuck.

Fast forward twenty years and things are rather different. For starters, I don’t have a bunch of magazines sitting around. I don’t even read tech magazines any more, either in print or digitally. None of my favourite magazines survived the transition to digital.

Now when I want to learn a new tech, I head to Wikipedia first to get a basic idea. Then I start trawling google for simple tutorials. I then read one of the new generation of short introductory books on my Kindle.

I then start my project safe in the knowledge that google will always be there. And, of course, google returns an awful lot of Stack Overflow pages. Whilst I would have felt very uncomfortable starting a project without a full grasp of a technology twenty years ago, now I think it would be odd not to. The main purpose of the initial reading is to get a basic understanding of the technology and, most importantly, the vocabulary. You can’t search properly if you don’t know what to search for.

Using my new approach, I’ve cut my learning time from one to three months down to one to three weeks.

The main downside to my approach is that, at the beginning at least, I may not write idiomatic code. But, whilst that is a problem, software is very malleable and you can always re-write parts later on if the project is a success. The biggest challenge now seems to be getting to the point where you know a project has legs as quickly as possible. Fully understanding a tech before starting a project just delays the start, and I doubt you’ll get that time back later in increased productivity.

Of course, by far the quickest approach is to use a tech stack you already know. Unfortunately, in my case that wasn’t possible because I don’t know a suitable client side tech. It is a testament to the designers of Angular.js, SignalR and NancyFX that I have found it pretty easy to get started. I wish everything was so well designed.

The post Stack Overflow Driven Development appeared first on Openxtra Tech Teapot.

by Jack Hughes at April 15, 2014 12:53 PM

Chris Siebenmann

Chasing SSL certificate chains to build a chain file

Suppose that you have some shiny new SSL certificates for some reason. These new certificates need a chain of intermediate certificates in order to work with everything, but for some reason you don't have the right set. In ideal circumstances you'll be able to easily find the right intermediate certificates on your SSL CA's website and won't need the rest of this entry.

Okay, let's assume that your SSL CA's website is an unhelpful swamp pit. Fortunately all is not lost, because these days at least some SSL certificates come with the information needed to find the intermediate certificates. First we need to dump out our certificate, following my OpenSSL basics:

openssl x509 -text -noout -in WHAT.crt

This will print out a bunch of information. If you're in luck (or possibly always), down at the bottom there will be an 'Authority Information Access' section with a 'CA Issuers - URI' bit. That is the URL of the next certificate up the chain, so we fetch it:

wget <SOME-URL>.crt

(In case it's not obvious: for this purpose you don't have to worry if this URL is being fetched over HTTP instead of HTTPS. Either your certificate is signed by this public key or it isn't.)

Generally or perhaps always this will not be a plain text file like your certificate is, but instead a binary blob. The plain text format is called PEM; your fetched binary blob of a certificate is probably in the binary DER encoding. To convert from DER to PEM we do:

openssl x509 -inform DER -in <WGOT-FILE>.crt -outform PEM -out intermediate-01.crt

Now you can inspect intermediate-01.crt in the same way to see if it needs a further intermediate certificate; if it does, iterate this process. When you have a suitable collection of PEM format intermediate certificates, simply concatenate them together in order (from the first you fetched to the last, per here) to create your chain file.
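
If you have to do this more than once, the chase is easy to script. Here's a rough sketch; it assumes each certificate in the chain embeds a 'CA Issuers' URI and that the fetched certificates are DER encoded, so treat it as a starting point rather than something bulletproof:

cert=WHAT.crt; n=1
while url=$(openssl x509 -noout -text -in "$cert" | sed -n 's/.*CA Issuers - URI://p' | head -1); [ -n "$url" ]; do
    wget -q -O /tmp/next.der "$url"
    openssl x509 -inform DER -in /tmp/next.der -outform PEM -out "intermediate-0$n.crt"
    cert="intermediate-0$n.crt"; n=$((n+1))
done
cat intermediate-0*.crt > chain.crt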

PS: The Qualys SSL Server Test is a good way to see how correct your certificate chain is. If it reports that it had to download any certificates, your chain of intermediate certificates is not complete. Similarly it may report that some entries in your chain are not necessary, although in practice this rarely hurts.

Sidebar: Browsers and certificate chains

As you might guess, some but not all browsers appear to use this embedded intermediate certificate URL to automatically fetch any necessary intermediate certificates during certificate validation (as mentioned eg here). Relatedly, browsers will probably not tell you about unnecessary intermediate certificates they received from your website. The upshot of this can be a HTTPS website that works in some browsers but fails in others, and in the failing browser it may appear that you sent no additional certificates as part of a certificate chain. Always test with a tool that will tell you the low-level details.

(Doing otherwise can cause a great deal of head scratching and frustration. Don't ask how I came to know this.)

by cks at April 15, 2014 02:03 AM

April 14, 2014

/sys/net Adventures

Zabbix : Create a production network interface trigger

Following my two previous posts on how to add an interface's description to Zabbix graphs [1] and triggers [2], I will finish this series of Zabbix posts with the creation of a production interface trigger.

By default Zabbix includes the "Operational status was changed ..." trigger, which is (in my opinion) a big joke:
  • The trigger disappears (status "OK") after the next ifOperStatus check (60 seconds by default).
  • The trigger is raised when equipment is plugged in. This is good-to-know information, but I can't raise a high severity trigger each time something is plugged in!
  • I can't tell if the interface was up and went down OR if the interface was down and went up.
  • If I want a "Something was plugged in on GEX/X/X" trigger, I would rather make a special trigger for that purpose.
  • The trigger doesn't include the interface's description (which is extremely irritating and makes me want to kill little kittens). Check my previous post [2] if you care about kittens' survival.
This new trigger will have the following properties:
  • It raises ONLY if the interface was up (something was plugged in) and went down (equipment stopped, interface shut or somebody removed the cable).
  • It disappears if the interface comes back up.
  • It has a "high" severity and includes the interface's description.

Go to "Configuration -> Templates -> Template SNMP Interfaces -> Discovery -> Trigger prototypes" and click on "Create trigger prototype".

Use the following line as the trigger's name:

 Production Interface status on {HOST.HOST}: {#SNMPVALUE}, {ITEM.VALUE2} : {ITEM.VALUE3}  

Use this as the trigger's expression:

 {Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].avg(3600)}<2&{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].last(0)}=2&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

This expression means: raise if the interface was up ("avg(3600)}<2") AND went down ("last(0)}=2"). The 3600 value specifies how long the trigger will stay up; after 3600s "avg(3600)" will equal 2 and the trigger will disappear.
The .str(this_does_not_exist)}=0 expression is used to show the interface's description and is explained in my previous post [2].

Use this as the trigger's description:
 Interface status went up to down !!!  
Interface : {#SNMPVALUE}, {ITEM.VALUE1} = {ITEM.VALUE3}

Set the severity to "high" (or whatever fits your environment); you can override the severity for each of your interfaces/equipment.
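
If you prefer scripting to clicking through the frontend, the same trigger prototype can also be created via the Zabbix API. Here is a rough sketch using the pyzabbix client; the URL and credentials are placeholders, and you should check the triggerprototype.create parameters against your Zabbix version's API documentation:

  #!/usr/bin/env python
  # Rough sketch: create the trigger prototype through the Zabbix API.
  # URL and credentials are placeholders for your own installation.
  from pyzabbix import ZabbixAPI

  zapi = ZabbixAPI("http://zabbix.example.com")
  zapi.login("api-user", "api-password")

  expression = (
      "{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].avg(3600)}<2"
      "&{Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].last(0)}=2"
      "&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0"
  )

  zapi.triggerprototype.create(
      description="Production Interface status on {HOST.HOST}: "
                  "{#SNMPVALUE}, {ITEM.VALUE2} : {ITEM.VALUE3}",
      expression=expression,
      comments="Interface status went up to down !!!\n"
               "Interface : {#SNMPVALUE}, {ITEM.VALUE1} = {ITEM.VALUE3}",
      priority=4,  # 4 = "High"
  )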

Wait until the discovery rule is refreshed (default is 3600s) or temporarily set it to 60s. We can now disable an interface to check the results; let's do this on bccsw02 ge/0/0/3:

The trigger is raised as expected with the hostname, interface name and description. If you configured Zabbix actions, the alert message will look like:
"Production Interface status ev-bccsw02: ge-0/0/3, down (2) : EV-ORADB01 - BACK_PROD"

Let's re-enable the interface:

The trigger goes green as the interface comes back up, and you should receive a message saying:
"Production Interface status ev-bccsw02: ge-0/0/3, up (1) : EV-ORADB01 - BACK_PROD"
 
Be aware that you can also use SNMP traps for that purpose.

Hope that helps !

[1] : http://sysnet-adventures.blogspot.fr/2014/02/zabbix-display-network-interface.html
[2] : http://sysnet-adventures.blogspot.fr/2014/04/zabbix-display-network-interface.html

by Greg (noreply@blogger.com) at April 14, 2014 05:28 PM

Zabbix : Display network interface description in triggers


In a previous post [1], I explained how to solve a very frustrating thing about Zabbix: "How to add the network interface's description to your graph names".

In this post, I'll explain how to fix another very frustrating thing about Zabbix: "How to add the network interface's description to your trigger names".

Zabbix has a default interface trigger which is raised when an interface status changes.
It would have been nice if we didn't have the same issue we had with the graphs: the interface description appears neither in the trigger's name nor in the comment. This is very annoying, especially if you receive alerts during the night.

Below is an example of the default Zabbix trigger alert:



Seems like ge1's operational status changed. Good to know, but again, what the hell is "ge1"???
Message to the Zabbix team: do you really think I learnt all my switches' port allocations by heart???

The good news here is that you can solve this stupidity with a "crafty" trick!

Trigger names/descriptions don't interpret items so using the "Zabbix Graph" trick [1] won't work...
To get your interface's description, you'll need to insert an "interface alias" item (ifAlias) in your trigger expression and reference it in the trigger name with the standard Zabbix macro "{ITEM.VALUEX}".

Go to "Configuration -> Templates -> Template SNMP Interfaces -> Discovery -> Trigger prototypes"

You should have a trigger named "Operational status was changed on {HOST.NAME} interface {#SNMPVALUE}" which matches the screenshot above.
To get the interface description, we first add a trigger expression that checks whether the interface alias (i.e. the description) equals (via the str() function) a string that will NEVER match, for example "this_does_not_exist":

 {Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

This line means that the network interface description is NOT "this_does_not_exist", which is always true. Finally, we add an AND operator (&) between the original expression and the string comparison, which gives us the final trigger expression:

 {Template SNMP Interfaces:ifOperStatus[{#SNMPVALUE}].diff(0)}=1&{Template SNMP Interfaces:ifAlias[{#SNMPVALUE}].str(this_does_not_exist)}=0  

This line means that there was an interface operational status change AND the interface's alias is NOT "this_does_not_exist".
This alias comparison is just a trick so that we can reference the interface's alias (i.e. the description) with the standard "{ITEM.VALUEX}" macro.

Now change the trigger name to the following string:

  Operational status was changed on {HOST.NAME} interface {#SNMPVALUE} : {ITEM.VALUE2}  

As you can see, I added the macro {ITEM.VALUE2}, which returns the value of the second item in the trigger's expression: you guessed it, the interface alias!

Wait until the discovery rule is refreshed (default is 3600s) or temporarily set it to 60s, and enjoy the result:


You can also use the {ITEM.VALUE2} macro in the trigger's description, very handy if you want to include additional information for the on-call guy.

In the next post [2], I'll show how to create a real interface trigger; from my point of view, this default trigger is completely useless:
  • The trigger disappears after the next ifOperStatus check (60 seconds by default)
  • The trigger is raised when equipment is plugged in. This is good-to-know information, but I can't raise a high severity trigger each time something is plugged in!
  • I can't tell if the interface was up and went down OR if the interface was down and went up.
  • If I wanted a "Something was plugged in on GEX/X/X" trigger, I would make a special trigger for that purpose.

[1] http://sysnet-adventures.blogspot.fr/2014/02/zabbix-display-network-interface.html
[2] http://sysnet-adventures.blogspot.fr/2014/04/zabbix-create-production-network.html

by Greg (noreply@blogger.com) at April 14, 2014 05:28 PM

Rich Bowen

ApacheCon North America 2014

Last week I had the honor of chairing ApacheCon North America 2014 in Denver, Colorado. I could hardly be any prouder of what we were able to do on such an incredibly short timeline. Most of the credit goes to Angela Brown and her amazing team at the Linux Foundation, who handled the logistics of the event.

My report to the Apache Software Foundation board follows:

ApacheCon North America 2014 was held April 7-9 in Denver, Colorado, USA. Despite the very late start, we had higher attendance than last year, and almost everyone that I have spoken with has declared it an enormous success. Attendees, speakers and sponsors have all expressed approval of the job that Angela and the Linux Foundation did in the production of the event. Speaking personally, it was the most stress-free ApacheCon I have ever had.

Several projects had dedicated hackathon spaces, while the main hackathon room was unfortunately well off the beaten path and went unnoticed by many attendees. In Budapest we plan to have the main hackathon space much more prominently located in a main traffic area, where it cannot be missed, as I feel that the hackathon should remain a central part of the event for its community-building opportunities.

Speaking of Budapest, on the first day of the event, we announced ApacheCon Europe, which will be held November 17-21 2014 in Budapest. The website for that is up at http://apachecon.eu/ and the CFP is open, and will close June 25, 2014. We plan to announce the schedule on July 28, 2014, giving us nearly 4 months lead time before the conference. We have already received talk submissions, and a few conference registrations. I will try to provide statistics each month between now and the conference.

As with ApacheCon NA, there will be a CloudStack Collaboration Conference co-located with ApacheCon. We are also discussing the possibility of a co-located Apache OpenOffice user-focused event on the 20th and 21st, or possibly just one day.

We eagerly welcome proposals from other projects which wish to have similar co-located events, or other more developer- or PMC-focused events like the Traffic Server Summit, which was held in Denver.

Discussion has begun regarding a venue for ApacheCon North America 2015, with Austin and Las Vegas as early favorites, but several other cities are also being considered.

I'll be posting several more things about it, because they deserve individual attention. Also, we'll be posting video and audio from the event on the ApacheCon website in the very near future.

by rbowen at April 14, 2014 04:03 PM

Everything Sysadmin

Time Management training at SpiceWorld Austin, 2014

I'll be doing a time management class at SpiceWorld.

Read about my talk and the conference at their website.

If you register, use code "LIMONCELLI20" to save 20%.

See you there!

April 14, 2014 02:28 PM

Interview with LOPSA-East Keynote: Vish Ishaya

Vish Ishaya will be giving the opening keynote at LOPSA-East this year. I caught up with him to talk about his keynote, OpenStack, and how he got his start in tech. The conference is May 2-3, 2014 in New Brunswick, NJ. If you haven't registered, do it now!

Tom Limoncelli: Tell us about your keynote. What should people expect / expect to learn?

Vish Ishaya: The keynote will be about OpenStack as well as the unique challenges of running a cloud in the datacenter. Cloud development methodologies mean different approaches to problems. These approaches bring with them a new set of concerns. By the end of the session people should understand where OpenStack came from, know why businesses are clamoring for it, and have strategies for bringing it into the datacenter effectively.

TL: How did you get started in tech?

VI: I started coding in 7th Grade, when I saw someone "doing machine language" on a computer at school (He was programming in QBasic). I started copying programs from books and I was hooked.

TL: If an attendee wanted to learn OpenStack, what's the smallest installation they can build to be able to experiment? How quickly could they go from bare metal to a working demo?

VI: The easiest way to get started experimenting with OpenStack is to run DevStack (http://devstack.org) on a base Ubuntu or Fedora OS. It works on a single node and is generally running in just a few minutes.

TL: What are the early-adopters using OpenStack for? What do you see the next tier of customers using it for?

VI: OpenStack is a cloud toolkit, so the early-adopters are building clouds. These tend to be service providers and large enterprises. The next tier of customers are smaller businesses that just want access to a private cloud. These are the ones that are already solving interesting business problems using public clouds and want that same flexibility on their own infrastructure.

TL: Suppose a company had a big investment in AWS and wanted to bring it in-house and on-premise. What is the compatibility overlap between OpenStack and AWS?

VI: We've spent quite a bit of time analyzing this at Nebula, because it is a big use-case for our customers. It really depends on what features in AWS one is using. If just the basics are being used, the transition is very easy. If you're using a bunch of the more esoteric services, finding an open source analog can be tricky.

TL: OpenStack was founded by Rackspace Hosting and NASA. Does OpenStack run well in zero-G environments? Would you go into space if NASA needed an OpenStack deployment on the moon?

VI: When I was working on the Nebula project at NASA (where the OpenStack compute project gestated), everyone always asked if I had been to space. I haven't yet, but I would surely volunteer.

Thanks to Vish for taking the time to do this interview! See you at LOPSA-East!

April 14, 2014 02:28 PM

Yellow Bricks

FUD it!


In the last couple of weeks, something stood out to me in the world of storage and virtualisation, and that is animosity. What struck me personally is how aggressively some storage vendors have responded to Virtual SAN, and to Server Side Storage in general. I can understand it in a way, as Virtual SAN plays in the same field and they probably feel threatened and it makes them anxious. In some cases I even see vendors responding to VSAN who do not even play in the same space; I guess they are in need of attention. Not sure this is the way to go about it, to be honest; if I were considering a hyper(visor)-converged solution I wouldn't like being called lazy because of it. Then again, I was always taught that lazy administrators are the best administrators in the world, as they plan accordingly and pro-actively take action. This allows them to lean back while everyone else is running around chasing problems, so maybe it was a compliment.

Personally I am perfectly fine with competition, and I don't mind being challenged. Whether that includes FUD or just cold hard facts is even beside the point, although I prefer to play it fair. It is a free world, and if you feel you need to say something about someone else's product you are free to do so. However, you may want to think about the impression you leave behind. In a way it is insulting to our customers. And our customers include your customers.

For the majority of my professional career I have been a customer, and personally I can't think of anything more insulting than a vendor spoon-feeding you reasons why their competitor is not what you are looking for. It is insulting as it insinuates that you are not smart enough to do your own research and tear it down as you desire, not smart enough to know what you really need, not smart enough to make the decision by yourself.

Personally, when this happened in the past, I would simply ask them to skip the mudslinging and go to the part where they explain their value add. And in many cases, I would end up just ignoring the whole pitch… because if you feel it is more important to "educate" me on what someone else does than on what you do, then they probably do something very well and I should be looking at them instead.

So let's respect our customers… let them be the lazy admin when they want, let them decide what is best for them… and not what is best for you.

PS: I love the products that our competitors are working on, and I have a lot of respect for how they have paved the way for the future.

"FUD it!" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.

by Duncan Epping at April 14, 2014 11:31 AM

Chris Siebenmann

My reactions to Python's warnings module

A commentator on my entry on the warnings problem pointed out the existence of the warnings module as a possible solution to my issue. I've now played around with it and I don't think it fits my needs here, for two somewhat related reasons.

The first reason is that it simply makes me nervous to use or even take over the same infrastructure that Python itself uses for things like deprecation warnings. Warnings produced about Python code and warnings that my code produces are completely separate things and I don't like mingling them together, partly because they have significantly different needs.

The second reason is that the default formatting that the warnings module uses is completely wrong for the 'warnings produced from my program' case. I want my program warnings to produce standard Unix format (warning) messages and to, for example, not include the Python code snippet that generated them. Based on playing around with the warnings module briefly it's fairly clear that I would have to significantly reformat standard warnings to do what I want. At that point I'm not getting much out of the warnings module itself.
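
To illustrate the kind of reformatting involved, here is a minimal sketch that swaps in a Unix-style message format by replacing the module-level formatwarning function (the program name is made up):

  import warnings

  def unix_formatwarning(message, category, filename, lineno, line=None):
      # Plain Unix style: 'prog: warning: message', with no file, line
      # number, warning class, or source code snippet.
      return "myprog: warning: %s\n" % message

  warnings.formatwarning = unix_formatwarning

  warnings.warn("config file /etc/myprog.conf not found, using defaults")

Run as-is, this writes 'myprog: warning: config file /etc/myprog.conf not found, using defaults' to standard error instead of the usual 'file.py:NN: UserWarning: ...' line.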

All of this is a sign of a fundamental decision in the warnings module: the warnings module is only designed to produce warnings about Python code. This core design purpose is reflected in many ways throughout the module, such as in the various sorts of filtering it offers and how you can't actually change the output format as far as I can see. I think that this makes it a bad fit for anything except that core purpose.

In short, if I want to log warnings I'm better off using general logging and general log filtering to control what warnings get printed. What features I want there are another entry.

by cks at April 14, 2014 05:20 AM

Raymii.org

Linux software raid, rebuilding broken raid 1

Last week Nagios alerted me about a broken disk in one of my clients' testing servers. There is a best-effort SLA on the thing, and there were spare drives of the same type and size in the datacenter. Lucky me. This particular data center is within biking distance, so I enjoyed a sunny ride there. Simply put, I needed to replace the disk and rebuild the RAID 1 array. This server is a simple Ubuntu 12.04 LTS server with two disks running in RAID 1, no spare. The client has a tight budget, and with a best-effort SLA and nothing in production, that's fine with me. Consultant tip: make sure you have those things signed.
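
The actual commands aren't included in this excerpt, but a typical mdadm RAID 1 rebuild boils down to roughly the following. This is only a sketch, not the article's own procedure, and the device names are assumptions (/dev/md0 mirrored over /dev/sda1, with /dev/sdb as the freshly swapped-in disk):

  #!/usr/bin/env python
  # Rough sketch of a typical mdadm RAID 1 rebuild; device names are
  # assumptions, adjust them for your own array layout.
  import subprocess

  def run(cmd):
      print("+ " + cmd)
      subprocess.check_call(cmd, shell=True)

  # Copy the partition table from the surviving disk to the new disk.
  run("sfdisk -d /dev/sda | sfdisk /dev/sdb")
  # Add the new partition back into the degraded array; the kernel
  # starts resyncing it immediately.
  run("mdadm --manage /dev/md0 --add /dev/sdb1")
  # Check resync progress.
  run("cat /proc/mdstat")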

April 14, 2014 12:00 AM

April 13, 2014

Ubuntu Geek

i7z-gui – A reporting tool for i7, i5, i3 CPUs

i7z reports Intel Core i7, i5 and i3 CPU information about Turbo Boost, frequencies, multipliers and more, and comes with a top-like display showing, per core, the current frequency, temperature and time spent in the C0/C1/C3/C6/C7 states.

There was no standard way (as of June 2009) to report CPU information for the i7 within Linux, so I coded a small program that can report on both stock and overclocked i7s. This tool will work only on Linux (I tested 64-bit, but 32-bit should work too) and on an i7 (tested on a 920). Readme and code are provided in the attachment.
(...)
Read the rest of i7z-gui – A reporting tool for i7, i5, i3 CPUs (305 words)


© ruchi for Ubuntu Geek, 2014.

by ruchi at April 13, 2014 11:47 PM

Trouble with tribbles

Cloud analogies: Food As A Service

There's a recurring analogy of Cloud as utility, such as electrical power. I'm not convinced by this, and regard a comparison of the Cloud with the restaurant trade as more interesting. Read on...

Few IT departments build their own hardware, in the same way that few people grow their own food or keep their own livestock. Most buy from a supplier, in the same way that most buy food from a supermarket.

You could avoid cooking by eating out for every meal. Food as a Service, in current IT parlance.

The Cloud shares other properties with a restaurant. It operates on demand. It's self service, in the sense that anyone can walk in and order - you don't have to be a chef. There's a fixed menu of dishes, and portion sizes are fixed. It deals with wide fluctuations of usage throughout the day. For basic dishes, it can be more expensive than cooking at home. It's elastic, and scales, whereas most people would struggle if 100 visitors suddenly dropped by for dinner.

There's a wide choice of restaurants. And a wide variety of pricing models to match - Prix Fixe, a la carte, all you can eat.

Based on this analogy, the current infatuation with moving everything to the cloud would be the same as telling everybody that they shouldn't cook at home, but should always order in or eat out. You no longer need a kitchen, white goods, or utensils, nor do you need to retain any culinary skills.

Sure, some people do eat primarily at a basic burger bar. Some eat out all the time. Some have abandoned the kitchen. Is it appropriate for everyone?

Many people go out to eat not necessarily to avoid preparing their own food, but to eat dishes they cannot prepare at home, to try something new, or for special occasions.

In other words, while you can eat out for every meal, Food as a Service really comes into its own when it delivers capabilities beyond that of your own kitchen. Whether that be in the expertise of its staff, the tools in its kitchens, or the special ingredients that it can source, a restaurant can take your tastebuds places that your own kitchen can't.

As for the lunacy that is Private Cloud, that's really like setting up your own industrial kitchen and hiring your own chefs to run it.

by Peter Tribble (noreply@blogger.com) at April 13, 2014 08:27 PM

Rands in Repose

Protecting Yourself from Heartbleed

Earlier this morning, I tweeted:

This is not actually good advice. You shouldn’t be changing your password on a server until the server administrator has confirmed whether their servers were affected and, if so, whether the server has been patched.

Mashable appears to have an up-to-date breakdown of the most popular services out there and their disposition relative to Heartbleed.

#

by rands at April 13, 2014 07:35 PM

Chris Siebenmann

A problem: handling warnings generated at low levels in your code

Python has a well honed approach for handling errors that happen at a low level in your code; you raise a specific exception and let it bubble up through your program. There's even a pattern for adding more context as you go up through the call stack, where you catch the exception, add more context to it (through one of various ways), and then propagate the exception onwards.

(You can also use things like phase tracking to make error messages more specific. And you may want to catch and re-raise exceptions for other reasons, such as wrapping foreign exceptions.)
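
For illustration, the catch, annotate, and re-raise version of that pattern looks something like this minimal sketch (the exception class and function names are invented):

  class ConfigError(Exception):
      pass

  def parse_line(line):
      if "=" not in line:
          raise ConfigError("expected 'name = value'")
      # ... actual parsing goes here ...

  def parse_file(fname):
      with open(fname) as fp:
          for num, line in enumerate(fp, start=1):
              try:
                  parse_line(line.strip())
              except ConfigError as e:
                  # Add file and line context, then propagate onwards.
                  raise ConfigError("%s line %d: %s" % (fname, num, e))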

All of this is great when it's an error. But what about warnings? I recently ran into a case where I wanted to 'raise' (in the abstract) a warning at a very low level in my code, and that left me completely stymied about what the best way to do it was. The disconnect between errors and warnings is that in most cases errors immediately stop further processing while warnings don't, so you can't deal with warnings by raising an exception; you need to somehow both 'raise' the warning and continue further processing.

I can think of several ways of handling this, all of which I've sort of used in code in the past:

  • Explicitly return warnings as part of the function's output. This is the most straightforward but also sprays warnings through your APIs, which can be a problem if you realize that you've found a need to add warnings to existing code.

  • Have functions accumulate warnings on some global or relatively global object (perhaps hidden through 'record a warning' function calls). Then at the end of processing, high-level code will go through the accumulated warnings and do whatever is desired with them.

  • Log the warnings immediately through a general logging system that you're using for all program messages (ranging from simple to very complex). This has the benefit that both warnings and errors will be produced in the correct order.

The second and third approaches have the problem that it's hard for intermediate layers to add context to warning messages; they'll wind up wanting or needing to pass the context down to the low level routines that generate the warnings. The third approach can have the general options problem when it comes to controlling what warnings are and aren't produced, or you can try to control this by having the high level code configure the logging system to discard some messages.
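
To make the second approach concrete, here is a minimal sketch (all of the names are invented); as noted above, any context has to be baked into the message at the point where the warning is recorded:

  # Accumulate warnings on a (relatively) global object, hidden behind
  # a 'record a warning' function call.
  _warnings = []

  def record_warning(msg, *args):
      _warnings.append(msg % args if args else msg)

  def low_level_parse(value):
      # Low-level code records a warning but keeps on processing.
      if value < 0:
          record_warning("negative value %d clamped to 0", value)
          value = 0
      return value

  def report_warnings(out):
      # High-level code decides at the end what to do with them.
      for w in _warnings:
          out.write("warning: %s\n" % w)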

I don't have any answers here, but I can't help thinking that I'm missing a way of doing this that would make it all easy. Probably logging is the best general approach for this and I should just give in, learn a Python logging system, and use it for everything in the future.

(In the incident that sparked this entry, I wound up punting and just printing out a message with sys.stderr.write() because I wasn't in a mood to significantly restructure the code just because I now wanted to emit a warning.)

by cks at April 13, 2014 06:15 AM


Administered by Joe. Content copyright by their respective authors.