Apr 14

I love the idea of Netomata! I haven’t used it yet, but have often lamented the lack of structure around networking configs. This is not just a great idea for the implementation level, but also for management. If you run your shop with this, a director/manager can learn the tool and get visibility into the entire networking infrastructure rather than having to trace through the decentralized networking equipment configs.

It also makes the networking piece of Disaster Recovery significantly easier.

The benefits and why pages are great summaries of why to use centrally generated configs for all machine management. One of the points is “Providing a limited kind of process documentation.” This massively sells the process short. It would be better to say “Provides unequivocal and 100% repeatable process documentation.”

If you’ve got experience with it, please post a trip report.

-Tony

written by admin

Mar 26

A necessary piece of operations is riding herd on home grown applications and projects from the corporate wilds. These things come to you late in their lifecycle, leaving you with little say in their technology or composition. Often the expectation is that you’ll just take them over and “make them work.” Sometimes that’s doable, but most of the time there are support limitations.

Here’s the interview and explanation process I use to work with groups outside of Ops to set realistic expectations about what we can and can’t do for them. It is step 0 of any project plan. I like to avoid surprises by setting clear expectations up front.

How to have Ops take ownership for systems or processes or programs:

  1. What is the business justification for this process?
  2. Who sponsors the process (outside of operations)?
  3. When will the process be turned over to operations?
  4. How will your group know the process is in place and being monitored?
  5. What are Operations’ obligations and responsibilities?
  6. What are the sponsoring group’s obligations and responsibilities?

System category:

  1. Requires full/half/quarter time staff member.
  2. Existing process needs monitoring and response plan.
  3. Trivial process that doesn’t require monitoring.
  4. Trivial process that needs monitoring.
  5. Ops can monitor but not troubleshoot.
  6. Ops can troubleshoot at level 1/2/3 but cannot fix.

Why would ops decline to accept your system, process or program:

  1. There may be no way to support the process (for instance it involves on-going manual work – in this case the process likely needs to start at Engineering).
  2. It will incur resource costs beyond reasonable levels (e.g. network usage beyond our current capacity).
  3. The sponsoring group does not provide ongoing budgetary support.

What you should expect from us.

  1. Integrity and discipline in all our work.
  2. A consulting approach to putting your process into production. This means being an organization that is committed to your success and wants to put your work into production.
  3. A “closed loop” system that has clear responsibility, reporting, troubleshooting and escalation procedures.


Jan 20

I set up dhcpd and tftp just infrequently enough to forget the details. I’m putting my gotchas here so I don’t forget them.

syslinux package ‘pxelinux’:
pxelinux loads and gets the right IP, then fails while trying to
fetch its config file, giving the error “tftp server does not support tsize option”

Fix:

in file /etc/dhcpd.conf:

# absolutely critical to have the next-server line for tftp booting.
# when you get the "tftp server does not support tsize option" error,
# it's because you're missing this config line. Double check with:
#          grep next-server     /etc/dhcpd.conf
#    - Tony 10/17/08
next-server 192.168.0.50;

Troubleshooting:

1] when setting up tftpd you have to make sure there are no entries like
this in the /etc/hosts file

127.0.1.1      joust.famemobile.com joust

if so you have to change them to this.

192.168.1.155   joust.famemobile.com joust

2] Using tcpdump for tftp troubleshooting

The fact that loading pxelinux.0 succeeds made me think everything else should work.

The pxelinux.0 loads fine, but the config file ‘pxelinux.cfg/01-00-0c-29-c4-b0-5a’ does not.

05:27:20.882329 IP (tos 0x0, ttl 20, id 2, offset 0, flags [none], proto: UDP (17), length: 55) 192.168.0.51.ah-esp-encap > 192.168.0.50.tftp: [udp sum ok] 27 RRQ "pxelinux.0" octet tsize 0
05:27:20.893400 IP (tos 0x0, ttl 20, id 4, offset 0, flags [none], proto: UDP (17), length: 60) 192.168.0.51.acp-port > 192.168.0.50.tftp: [udp sum ok] 32 RRQ "pxelinux.0" octet blksize 1456
05:27:20.953322 IP (tos 0x0, ttl 20, id 29, offset 0, flags [none], proto: UDP (17), length: 91) 192.168.0.51.57089 > 0.0.0.0.tftp: 63 RRQ "pxelinux.cfg/01-00-0c-29-c4-b0-5a" octet tsize 0 blks
… stuff cut out…
05:27:20.972168 IP (tos 0x0, ttl 18, id 44911, offset 0, flags [none], proto: UDP (17), length: 54) 0.0.0.0.tftp > 192.168.0.51.57089: [udp sum ok] 26 ERROR tftp-err-#8 "tsize option required"

The “0.0.0.0.tftp” is the indicator there is something wrong.
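
A quick way to catch the missing line before booting a client is to script the grep from the config comment. This is only a minimal sketch: the function name check_next_server is my own invention, and the default path simply mirrors the /etc/dhcpd.conf used above.

```shell
# check_next_server: hypothetical helper, not part of dhcpd.
# Succeeds only if the config has an uncommented next-server line.
check_next_server() {
    local conf="${1:-/etc/dhcpd.conf}"
    if grep -Eq '^[[:space:]]*next-server[[:space:]]+[^;[:space:]]+;' "$conf"; then
        echo "ok: next-server is set in $conf"
    else
        echo "missing: no next-server line in $conf" >&2
        return 1
    fi
}
```

Run it as `check_next_server` (or pass another path) after editing the config; a commented-out line counts as missing.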


Jan 19

I use Blogger and host the files on my own server. After I edit a post, Blogger has to sftp the files to the server so they appear here. This is the process for letting its servers in.

Adding the Blogger sftp servers to iptables.

Blogger.com lists their outbound IPs here. (The list was current as of Jan 19, 2009.)

# always check that the addresses are still correct against the link above.
for i in 66.102.15.83 216.34.7.186 64.233.178.192/28  64.233.178/28
  do
             echo iptables -A INPUT -i eth0 -s $i -p tcp --dport ssh -j ACCEPT
  done
### Output
iptables -A INPUT -i eth0 -s 66.102.15.83 -p tcp --dport ssh -j ACCEPT
iptables -A INPUT -i eth0 -s 216.34.7.186 -p tcp --dport ssh -j ACCEPT
iptables -A INPUT -i eth0 -s 64.233.178.192/28 -p tcp --dport ssh -j ACCEPT
iptables -A INPUT -i eth0 -s 64.233.178/28 -p tcp --dport ssh -j ACCEPT
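
Since a typo in the address list turns straight into a firewall rule, it can help to sanity-check each entry before echoing it. This is a minimal sketch, assuming plain IPv4 dotted-quad addresses with an optional /prefix. The names is_ipv4_cidr and print_ssh_rules are made up for illustration, and like the loop above it only prints the rules.

```shell
# is_ipv4_cidr: hypothetical helper. True for a.b.c.d or a.b.c.d/nn,
# with each octet 0-255 and the prefix 0-32.
is_ipv4_cidr() {
    local addr prefix octet count
    addr="${1%%/*}"
    count=0
    case "$1" in
        */*)
            prefix="${1#*/}"
            case "$prefix" in ''|*[!0-9]*|???*) return 1 ;; esac
            [ "$prefix" -le 32 ] || return 1
            ;;
    esac
    # only digits and dots, no leading or trailing dot
    case "$addr" in ''|*[!0-9.]*|.*|*.) return 1 ;; esac
    local IFS=.
    for octet in $addr; do
        case "$octet" in ''|????*) return 1 ;; esac
        [ "$octet" -le 255 ] || return 1
        count=$((count + 1))
    done
    [ "$count" -eq 4 ]
}

# print_ssh_rules: echo (do not run) one ACCEPT rule per well-formed source.
print_ssh_rules() {
    local src
    for src in "$@"; do
        if is_ipv4_cidr "$src"; then
            echo "iptables -A INPUT -i eth0 -s $src -p tcp --dport ssh -j ACCEPT"
        else
            echo "skipping address that is not a.b.c.d[/nn]: $src" >&2
        fi
    done
}
```

With this check, `print_ssh_rules 66.102.15.83 64.233.178.192/28 64.233.178/28` prints rules for the first two entries and warns about the third, which has only three octets.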

######## Other notes
I cheated and used ipcalc to get the subnet calculations:

  ipcalc 64.233.178.192 - 64.233.178.207
  64.233.178.192/28
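
If ipcalc isn't installed, the same range can be checked with shell arithmetic. A rough sketch, assuming IPv4 only and a shell with 64-bit arithmetic (bash qualifies); ip_to_int, int_to_ip and cidr_range are names I made up for illustration.

```shell
# ip_to_int: dotted quad -> 32-bit integer (octets must not have leading zeros).
ip_to_int() {
    local IFS=.
    set -- $1
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# int_to_ip: 32-bit integer -> dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# cidr_range: print the first and last address covered by a.b.c.d/prefix.
cidr_range() {
    local ip prefix addr mask first last
    ip="${1%/*}"
    prefix="${1#*/}"
    addr=$(ip_to_int "$ip")
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    first=$(( $addr & $mask ))
    last=$(( $first | (~$mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip "$first") - $(int_to_ip "$last")"
}
```

Running `cidr_range 64.233.178.192/28` prints `64.233.178.192 - 64.233.178.207`, matching the range fed to ipcalc above.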


Jan 19

How to test if your rules are being activated:

# logging just the first packet - this shows an external host is reaching you,
# but does not flood messages with notices for every packet.

# Insert at the top of the INPUT chain a request to log only NEW connections
iptables -I INPUT -m state --state NEW -j LOG

Turning off logging on iptables:

# find the logging entry, use --line-numbers so you know which rule to delete.
iptables -L INPUT --line-numbers | egrep 'Chain|LOG'
Chain INPUT (policy DROP)
1 LOG all -- anywhere anywhere LOG level warning

# delete it
iptables --delete INPUT 1

## here’s a quick Perl script to get the same info and generate (but not execute) the delete line.

#!/usr/bin/perl
use strict;
use warnings;

my $CHAIN_NAME;

# grab the iptables output (needs root)
# (while testing I pointed this at a stub in ~/tmp/iptables instead)
my @iptables_output = qx{iptables -L -n --line-numbers};

# cut off the newlines
chomp @iptables_output;

for my $iptables_output_line (@iptables_output) {

    # remember which chain we are in
    if ( $iptables_output_line =~ m/
                      \A         # at the beginning of the line
                      Chain      # match the literal 'Chain'
                      \s+
                      ([\w-]+)   # the chain name, dashes allowed
                      /xms ) {
        $CHAIN_NAME = $1;
        next;
    }

    if ( $iptables_output_line =~ m/
                      \A         # at the beginning of the line
                      (\d+)      # capture the whole rule number
                      \s+        # some space
                      LOG        # the literal 'LOG'
                      /xms ) {
        print "found a log line for $CHAIN_NAME, delete it with:\n",
              "\tiptables --delete $CHAIN_NAME $1\n";
    }

}

######### END perl script #######

#A couple of bash helper functions:
function iptshow () {
    iptables -L $1 --line-numbers
}

iptedit () {
vi /etc/sysconfig/iptables
}
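
The same parsing also fits in a few lines of awk if Perl isn't handy. This is a sketch under the same assumption as the Perl script, namely that the input looks like `iptables -L -n --line-numbers` output. log_rule_deletes is a hypothetical name, and it only prints the delete commands. Unlike the Perl version, it lists the highest rule number first, because deleting a rule renumbers the ones after it.

```shell
# log_rule_deletes: read `iptables -L -n --line-numbers` output on stdin and
# print (not run) a delete command for every LOG rule.
log_rule_deletes() {
    awk '
        $1 == "Chain"                   { chain = $2; next }
        $1 ~ /^[0-9]+$/ && $2 == "LOG"  { rules[++n] = chain " " $1 }
        END {
            # reverse order: within a chain the highest rule number comes first
            for (i = n; i >= 1; i--) print "iptables --delete " rules[i]
        }
    '
}
```

Typical use would be `iptables -L -n --line-numbers | log_rule_deletes`, reviewing the output before pasting it back into a root shell.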


Jan 07

I always seem to need a tmp file. I used to do ‘vi /tmp/foo’, but it usually had something in it from last time. This function opens a new file and stores the file name in $f.

I use it like:

vt
<paste some stuff, clean it up>
perl -pe 's/foo/bar/' $f

####
function vt () {
    for i in `seq 0 255`;
    do
        FILE=/tmp/$USER-foo-$i;
        if [ -f "$FILE" ]; then
            echo -n '.';
        else
            f=$FILE;
            vi $FILE;
            echo $FILE;
            return;
        fi;
    done
}

###### Cleanup
function cleanvt () {
    for i in `seq 0 255`
    do
        FILE=/tmp/$USER-foo-$i

        if [ -f "$FILE" ]
        then
            echo -n '.'
            rm "$FILE"
        else
            echo
            return
        fi
    done
    echo
}
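
An alternative that avoids scanning for a free name is to let mktemp pick one. This is a minimal sketch, assuming a mktemp that accepts a template path (GNU coreutils and the BSDs do). vtm is a made-up name, and it honors $EDITOR where vt hardcodes vi.

```shell
# vtm: hypothetical mktemp-based variant of vt.
# Creates a fresh, unique file, opens it in the editor, leaves its name in $f.
vtm() {
    f=$(mktemp "/tmp/${USER:-tmp}-foo-XXXXXX") || return 1
    ${EDITOR:-vi} "$f"
    echo "$f"
}
```

The trade-off is that the names are random rather than numbered, so cleanup becomes a glob such as `rm /tmp/$USER-foo-*` instead of the counting loop in cleanvt.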


Dec 16

If you love perl and are tired of hearing the “executable line noise” and “write only” comments, this book is the antidote.

There is a free preview at Google: Perl Best Practices

The chapter on choosing a brace style and using a beautifier (perltidy) is worth the price of the book. Same goes for the regular expression best practices.

My first copy was ‘mislaid’ :^) Highly recommended.


Dec 12

Useful little shell productivity tricks:

I work remotely a lot. To avoid the lag associated with VPNs and
to make sure I have all the tools I love, I set up an environment in
Cygwin that gets me all I need to do development. The one thing
that’s irritating is when it comes time to test in the customer’s
environment (or just on a far machine). This is where this little while
loop comes in. I open a new xterm and run:

while :
do
rsync -zav /home/tony/work/SomeCustomerProject/ Far.Box.com:/home/tony/work/SomeCustomerProject/
read foo # wait here until any key is hit
fortune # gives me unique phrase/thought to remind me if I’ve pushed or not
done

After running the loop I resize the window down to one or two lines. Whenever I want to rsync, I focus on it and hit return.

Other ways to simulate a Linux/UNIX environment:

Netcat is native to Cygwin; however, you have to choose it in the ‘setup’ app.
# simulate a network server - cats whatever is written, great for text protocols/SOAP, etc. Binary protocols will work, but you’ll have to go out of your way to translate them.
while :; do nc -n -t -vv -l -p 4913 127.0.0.1 ; echo "-----" ; done

Cygwin also implements named pipes:
mknod /usr/local/groundwork/nagios/var/spool/nagios.cmd p
# keep the pipe read so it doesn't block
while :; do cat /usr/local/groundwork/nagios/var/spool/nagios.cmd; done

MySQL compiles under Cygwin. I use mysql-5.0.33.tar.gz because I have it lying around. It installs all the libs you need to compile DBD::mysql. Performance is pretty bad, but it is great for development and light loads.

# add a user to cygwin’s /etc/passwd:
mysql:unused_by_nt/2000/xp:1003:513:mysql user:/usr/local/var:/bin/bash

### config line
./configure --enable-assembler --with-mysqld-ldflags=-all-static
make install # installs to /usr/local/*

After that it’s just a regular MySQL setup.

-Tony


Nov 21

#!/usr/bin/perl
use strict;
use warnings;

########################################
# Fri Nov 21 15:36:39 2008

# A util I've always wanted! When you have a perms problem in unix you
# need to know the perms on each level above the file. AFAIK there's
# no way to get them other than typing ls for each element of the
# path. This cuts that process down.

while ( defined( my $file = shift @ARGV ) ) {
    my %ls_line;

    chomp $file;
    # get the path elements so we can see the perms at each level.
    my @broken_down_path = split q{/}, $file;

    # the first element from the split is empty for an absolute path,
    # so start the list with a 'slash' to get the entire tree
    my @path_list = ( q{/} );

    for ( 0 .. $#broken_down_path ) {
        # as we go thru the array we build up a list that makes up the path
        my $rebuild_path_element = join q{/}, @broken_down_path[ 0 .. $_ ];
        # skip the empty leading element
        next if $rebuild_path_element eq q{};
        push @path_list, $rebuild_path_element;
    }

    for my $path (@path_list) {
        # get the output of ls -ld on every level of the path, store in a hash of arrays
        push @{ $ls_line{$path} }, split q{ }, qx{/bin/ls -ld '$path'};
    }

    # rev sort the keys so we get the file at the top and the dirs underneath.
    for my $path ( reverse sort keys %ls_line ) {
        # just the perms, owner and group
        print join qq{\t}, @{ $ls_line{$path} }[ 0, 2, 3 ];
        # put in the path
        print "\t$path\n";
    }

    # if there are more to go, print a blank line.
    print "\n" if @ARGV;

}    # end while (@ARGV)

=pod

Sample usage and output. Look at permissions at all tree levels to
figure out why a user can’t read a file or dir.

tony-ws:bin> tree_perms.pl /home/tony/work/getactive/kickstart_configs/post_install_scripts/_base/service_control.bash /home/tony/bin/At.pm
-rw------- tony None /home/tony/work/getactive/kickstart_configs/post_install_scripts/_base/service_control.bash
drwx------+ tony None /home/tony/work/getactive/kickstart_configs/post_install_scripts/_base
drwx------+ tony None /home/tony/work/getactive/kickstart_configs/post_install_scripts
drwx------+ tony None /home/tony/work/getactive/kickstart_configs
drwx------+ tony None /home/tony/work/getactive
drwx------+ tony None /home/tony/work
drwxrwxrwx+ tony None /home/tony
drwxrwxrwx+ tony None /home
drwxrwx---+ tony Users /

-rw------- tony None /home/tony/bin/At.pm
drwxr-xr-x+ tony None /home/tony/bin
drwxrwxrwx+ tony None /home/tony
drwxrwxrwx+ tony None /home
drwxrwx---+ tony Users /

=cut
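
For comparison, the same walk up the tree can be sketched in plain shell. This is an illustrative rewrite rather than the script above, and path_levels and tree_perms_sh are names I picked. It prints the full `ls -ld` line instead of just perms, owner and group. On Linux, `namei -l` from util-linux prints similar per-component information.

```shell
# path_levels: print the path, then each parent directory, up to / (or .).
path_levels() {
    local p="$1"
    while :; do
        echo "$p"
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}

# tree_perms_sh: run ls -ld on every level, file first and root last.
tree_perms_sh() {
    local target level
    for target in "$@"; do
        path_levels "$target" | while IFS= read -r level; do
            ls -ld -- "$level"
        done
        echo
    done
}
```

Usage mirrors the Perl version: `tree_perms_sh /home/tony/bin/At.pm` lists the file, then each directory above it.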


Oct 12
  1. Anything that is not automated has no future. It will die or kill you.
  2. People can only do 3 things at once, really. Beyond that energy is spread too thin and everything starts to wind down.
  3. Fire heroes. They have a perverse incentive to create problems.
  4. There is a straight-line correlation between the number of steps in a process and its error rate.
  5. Anything without an owner will be blamed. If you’re the boss - you’ll be blamed.
  6. Computers can only do what they’re told - if they’re not doing the right thing, it’s because of a person.
  7. The last change broke it.
  8. The answer is always in the log file (or the output of trace, strace, ktrace, etc.)
  9. No one reads docs. It’s better to automate procedures and only hire people who are able to read code.
  10. Success is not writing software, it’s running it.
  11. Use the scientific method as your published troubleshooting method.
  12. Pay on-call differentials.
  13. Have a rigorous hiring procedure.
  14. A Sr. Admin spends his day automating away work or training someone else how to do it. Anything else is mid-level at best.
  15. Put all your admins on a development plan that has them automating work.
