Graphing GC Logs

I needed to go from a set of GC logs provided to me by a customer to an easy way to present the data back to them. Normally I would use a tool such as GCViewer, but the format of this log set did not allow for that.

I ended up using the following command to generate a CSV file which I could then run through Gnuplot (use brew to install this on OSX).

grep "2014-03-31" gc.log-2014-0* | cut -d 'T' -f 2 | cut -d ':' -f 1 | uniq -c | sed 's/^ *//g' | sed 's/ /,/' > ~/gnuplot/datafile

Breakdown of this command is:

grep "2014-03-31" gc.log-2014-0*

Search all the files for the day I am interested in, rather than trusting the file names to hold the correct dates.

cut -d ‘T’ -f 2

Remove everything before the hour value of the result string.

cut -d ‘:’ -f 1

Remove everything after the hour value of the result string.

uniq -c

Give me a list of hours, and how many times a GC occurred during each one.

sed 's/^ *//g'

Strip out the leading white spaces.

sed 's/ /,/'

And to finish it off, replace the remaining spaces with commas to make it into a CSV file.
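
For what it is worth, the two seds at the end can be replaced with a single awk; an equivalent variant (with a sort added before uniq -c as a safety net, since uniq only collapses adjacent duplicates) would be:

grep "2014-03-31" gc.log-2014-0* | cut -d 'T' -f 2 | cut -d ':' -f 1 | sort | uniq -c | awk '{print $1 "," $2}' > ~/gnuplot/datafile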

The resulting file looked like:

18,00
4,01
3,02
8,03
5,04
5,05
4,06
6,07
11,08
21,09
35,10
29,11
22,12
23,13
27,14
28,15
24,16
10,17
7,18
8,19
8,20
9,21
26,22
24,23

And when I used it with the following Gnuplot config:

set term png
set title 'Histogram'
set autoscale
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set datafile separator ","
set timefmt '%H'
set xlabel "hour"
set ylabel "GC operations"
set style data histogram
set xtics format "%H"
set xrange ["00":"23"]
plot 'datafile' using 1:xticlabels(2) with histogram
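
Assuming the config above is saved to a file, say histogram.gp (the name is just an example), the chart can be rendered with the PNG written to stdout:

gnuplot histogram.gp > histogram.png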

I was able to produce a plot like so:

This is by no means perfect, but I think it is pretty good for my first time playing with this tool.

Transforming SQL slow query logs into CSV

I had a set of slow query logs I needed to analyze in the following format:

# Time: 140313 17:48:57
# User@Host: stuff[stuff] @ server.local [10.10.10.10]
# Query_time: 7.071040  Lock_time: 0.000122 Rows_sent: 10  Rows_examined: 4271447
SET timestamp=1394758137;
SELECT jiraissue.ID FROM jiraissue jiraissue INNER JOIN changegroup cg ON jiraissue.ID = cg.issueid WHERE (jiraissue.PROJECT IN (10806, 10570, 13111, 10700, 10350, 13611, 13300, 10270, 13502, 13104, 13000, 13103, 13102, 13602, 11500, 10510, 13400, 11402, 10180, 10030, 12701, 10431, 10140, 10360, 12100, 10802, 13900, 12900, 13604, 11701, 11901, 10573, 13108, 10551, 11301, 10100, 13607, 13401, 12500, 10111, 11601, 11403, 13002, 13503, 10191, 10430, 12800, 11202, 10450, 11503, 10552, 10490, 11400, 10150, 13605, 10260, 10000, 11200, 11203, 11902, 10170, 10803, 10590, 11502, 10512, 13110, 13112, 11702, 10451, 11300, 13501, 11600, 10470, 10230, 10120, 10071, 12300, 10571, 10131, 10161, 11100, 10083, 10560, 11404, 13100, 11102, 13601, 13402, 10580, 11003, 10060, 10020, 10380, 10031, 10084, 12801, 10561, 10220, 12700, 13500, 12600, 13603, 10480, 10602, 10010, 10370, 13701, 12400, 12301, 13107, 10901, 10703, 10440, 10550, 11004, 13612, 10323, 11103, 10324, 13001, 13106, 11501, 13702, 10500, 10574, 11204, 10511, 10341, 11800, 12903, 10460, 11201, 10340, 10702, 13700, 13404, 13504, 12601, 13608, 10801, 12101) ) AND (cg.AUTHOR IN (‘abcdefg’) ) ORDER BY cg.CREATED DESC LIMIT 10;

But what I really wanted was to correlate the query time with the query itself; the rest of the information was irrelevant to me. I ended up using the following command to get what I needed:

pcregrep -M 'Query.*\n.*\n.*;' slow_query.log | grep -v 'SET timestamp=' | awk '/Query/ {print $3};/^.*;/{print}' | awk '/^[0-9]+\.[0-9]+/ {x=$0; getline; print x","$0}' | sort -n

The breakdown here is:

pcregrep -M 'Query.*\n.*\n.*;' slow_query.log

This gives me just the query run time line, the SET line, and the query contents.

grep -v 'SET timestamp='

This removes the lines containing the SET timestamp statements.

awk '/Query/ {print $3};/^.*;/{print}'

This prints the query time on its own line, with the query itself on the line that follows.

awk '/^[0-9]+\.[0-9]+/ {x=$0; getline; print x","$0}'

This joins each query-time line with the query line that follows it, producing one CSV row per query.

sort -n

Sorting numerically on the query time lets me see the queries in order of how long they took to run.
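
As an aside, the same result can probably be squeezed into a single awk pass; a minimal sketch, which makes the same assumption as the pipeline above that each query sits on one line ending in a semicolon:

awk '/^# Query_time:/ {qt = $3} /;$/ && !/^SET timestamp=/ && qt != "" {print qt "," $0; qt = ""}' slow_query.log | sort -n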

Notes:

pcregrep needs to be installed using Brew on OSX:

brew install pcre

Tips when theming Octopress

I have recently started to play with Octopress, and have run into a couple of issues with getting things to update the way I want on Heroku.

  • If you use a git submodule to install your theme, your Heroku dyno will re-download that submodule every single time you push a new version of Octopress. If you want to customize your theme, you will need to clone the theme and then manage it yourself (see the sketch after this list). 
  • The theme resides in .themes/themename. Installing the theme copies its files into the source folder. Generating then renders the completed Octopress blog into the public directory.
  • Themes should be updated in the .themes/themename folder, then installed using this command sequence:
    rake install['themename']
    rake generate
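
As a concrete sketch of the clone-it-yourself approach from the first bullet (the repository URL and theme name are purely illustrative):

git clone https://github.com/example/some-octopress-theme.git .themes/mytheme
rake install['mytheme']
rake generate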

Adding timestamp to recurring commands

A common scenario: you want timestamps on the output of a long-running ping, but ping does not print them itself.

The workaround is to pipe the output through the ts command from the moreutils package.

ping google.com | ts '[%Y-%m-%d %H:%M:%S]' > pingtest.out

[2013-03-15 14:29:14] Request timeout for icmp_seq 7113
[2013-03-15 14:30:19] Request timeout for icmp_seq 7177
[2013-03-15 14:30:28] Request timeout for icmp_seq 7186
[2013-03-15 14:30:49] Request timeout for icmp_seq 7207
[2013-03-15 14:31:00] Request timeout for icmp_seq 7218
[2013-03-15 14:32:13] Request timeout for icmp_seq 7291
[2013-03-15 14:32:31] Request timeout for icmp_seq 7309

Quick, simple, and helps to prove to end users that their network is unstable.
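
One note: ts does not ship with OSX. Assuming Homebrew is available (as with the other tools in these posts), it comes from the moreutils formula:

brew install moreutils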

Exporting sqlite to csv easily on OSX

I was using an application called Dash Expander. I had some issues with it and decided to move away from it, but I wanted to retain my data. Dash Expander stores its data in an SQLite database, and I wanted it in CSV.

The solution was quite simple in terminal:

sqlite3 snippets.dash
sqlite> .mode csv
sqlite> .output snippets.csv
sqlite> select * from snippets;

And voila, you are done.
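
The same export can also be done non-interactively as a one-liner; a quick sketch, assuming a sqlite3 build that understands the -csv flag:

sqlite3 -csv snippets.dash 'select * from snippets;' > snippets.csv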

Automatic VJ for Soundcloud

I almost exclusively listen to music via some form of streaming. Whether it is Pandora, DI.FM, or YouTube, I always run into the same problem: streaming music very rarely comes with well-done, programmatically generated visuals to match it. What I am talking about is commonly called a visualizer. iTunes has one, VLC has one built in, as do many other free audio players. When I am listening to something (such as DI.FM) that can be accessed with one of those apps, I can generally get some basic visualizations going. But what happens if you are listening to music on a site like SoundCloud?

Finnish web developer Simo Santavirta has created a cool solution to this problem called APEXvj. The best feature is that it connects directly to SoundCloud; APEXvj also supports uploading your own MP3s and reacting to input from the computer's microphone, but neither of those is really anything special. Hopefully it will be adapted to support more sources of streaming audio in the future. Before we proceed I will say that APEXvj is built with Adobe Flash; I know many people do not like that, but that is just how it is.

I connected to it, and after a short "I'm initializing" loading message I was prompted to close any other windows where a video might be running. From what I can tell, this means any window with an active Flash object. Once I got past that, I got into the meat of the application and saw the following interface:

APEXvj – Configuration Interface

The application seems to generate everything based on the waveform, but I decided to test it by authenticating against my SoundCloud account (which worked flawlessly) and then having the AutoVJ feature shuffle my songs and automatically generate and transition the visuals as needed. The result? A fairly basic but cool waveform. The transitions between songs are not very smooth, but the software is smart enough to detect different songs within a single long mix.  Check out this screenshot of the visualizer in action:

The “Sweet Ripples” visualization

You will notice in the topmost image that there is a small "iDevice" icon in the upper right hand corner. This is what I feel to be the big "money maker" of this application. You enter a remote word, which acts as both the username and password for your APEXvj session. I noticed a small bug here: while entering your remote word you cannot use any of the keys the application has already bound: f for fullscreen, space to pause, and the right and left arrows to change the visuals (which you would normally use to move around within the remote word field). Edit: I notified the developer, and he has confirmed the issue and will fix it. So I recommend you avoid those keys, or just copy and paste your remote word in. I then navigated to http://apexvj.com/remote on my phone and entered my remote word, and voila, I was controlling my laptop from my phone. I noticed a delay of 2-3 seconds when issuing commands, but that is not bad for a non-commercial application. The controls are not too extensive: a visualizer selector, plus next, previous, and pause/play buttons. Here are some screenshots (iPhone 4 using the Atomic Web browser):

APEXvj – iPhone 4 remote word entry page
APEXvj – iPhone 4 control page

I am not able to take screenshots on my Nexus One, but I confirmed that the remote works there as well; while the icon used is that of an iDevice, the site actually seems to support other mobile browsers too. I can also tell you that if a device connects to the remote interface using the same remote word as an existing session, the existing session will be disconnected.

The app comes with some basic sharing features. You can either share a link like I do below, or you can post your current configuration to StumbleUpon. Those are currently your only two options.

Want to jump right into all of this? Check out this mix by San Francisco’s DJ Tall Sasha.

Converting blockquote to hold code in Blogger

Because I sometimes post code on this blog, I decided I should make it easier to read and easier to work with for myself. Up until now I have been using <code></code> tags with some carriage returns to separate the code from the text. However, this was a hassle, as it required me to keep flipping back and forth between the Compose view and the Edit HTML view while writing a post. I decided to take stock of what the new WYSIWYG post editor allowed me to do easily. I noticed that it had a quote feature, and that it was tied to the <blockquote> tag. So I went to Design -> Blogger Template Designer, scrolled all the way down to the bottom, and found the Add CSS option. There I added the following code:

blockquote{
   background: #E9E9E9; /* This is the color for my blog, sub out your own. */
   font-family: "Courier New", Courier !important;
}

And voila. You have your basic template done. Now let us add some more fancy stuff to it.

Most terminal applications are designed around an 80-character terminal width, so I wanted to make sure my code sections show 80 characters of width. Because I am using a monospaced font, I can use the ex size unit to control the width. I had to adjust the font size as well to ensure that everything fit in my blog column nicely. See the adjusted code below:

blockquote{
   background: #E9E9E9;
   font-family: "Courier New", Courier !important;
   font-size: 1.8ex;
   width: 80ex;
   margin: 1ex;
}
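
One further tweak worth considering (not something from my template above, just a sketch): blockquote normally collapses whitespace, so preserving it keeps code indentation and line breaks intact:

blockquote{
   white-space: pre-wrap; /* preserve line breaks and indentation inside the quote */
}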

Monitoring Progress of DD on OSX

I needed to copy an ISO to a flash drive to install Windows 7 on my friend's computer, and decided that dd was the right tool to perform this transfer. However, I know from past experience that dd will not show any kind of progress or progress bar. In the past, on Linux, I have been able to send a command along the lines of:

killall -USR1 dd

It appears that dd on OSX responds to an INFO signal from a kill / killall command rather than the USR1 signal that is the standard in most Linux distributions I have played with (pressing Ctrl-T in the terminal running dd sends the same SIGINFO). So the OSX equivalent of the command above is:
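
killall -INFO dd

This brought me the following info: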

1952+0 records in
1951+0 records out
2045771776 bytes transferred in 1486.640041 secs (1376104 bytes/sec)

This is nice, but what if I want more information? In that case I can use iostat. The command I used was:

iostat -Iw 3 disk1

This showed me a transfer summary every 3 seconds for disk1. The output looked like this:

    disk1       cpu     load average
KB/t xfrs   MB  us sy id   1m   5m   15m
4.00 683747 2670.89   2  4 94  0.18 0.29 0.70
4.00 684771 2674.89   2  3 95  0.17 0.29 0.70
4.00 685795 2678.89   6  6 88  0.17 0.29 0.70
4.00 686819 2682.89   4  5 91  0.15 0.28 0.69
4.00 687843 2686.89   6  5 89  0.14 0.28 0.69
4.00 688867 2690.89   3  5 92  0.14 0.28 0.69
4.00 689868 2694.80   3  5 92  0.13 0.27 0.69

Between the two, this was enough for my needs. However, if you want a "graphical" way to monitor the transfer, check out Pipe Viewer.
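
For reference, a rough sketch of how Pipe Viewer could slot into the same transfer (pv can be installed via brew; the ISO path and disk device below are just examples, so double check the device before writing to it):

pv ~/Downloads/win7.iso | sudo dd of=/dev/rdisk1 bs=1m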

Set up phpmyadmin for quick mysql db inspection

I need to view a lot of MySQL dumps, which come in as part of my company's customers' logs. This is always a huge pain in the @#%. Today I decided to set up phpMyAdmin on my CentOS 5 server to make this easier. This involved a little bit of work but was surprisingly easy. Here we go:

1. To use an up-to-date version of phpMyAdmin you will need an up-to-date version of PHP, in this case 5.2+. PHP 5.2 is not yet available in the stable CentOS 5 repos, so you must create the following file first:

/etc/yum.repos.d/CentOS-Testing.repo

Then you must add the following contents:

[c5-testing]
name=CentOS-5 Testing
baseurl=http://dev.centos.org/centos/$releasever/testing/$basearch/
enabled=1
gpgcheck=1
gpgkey=http://dev.centos.org/centos/RPM-GPG-KEY-CentOS-testing
includepkgs=php*

2. To actually install all the packages you will need, run the following command:

yum install php-mbstring pcre-devel php-pear php-devel httpd-devel php php-mysql httpd mysql mysql-server

3. Then run the following commands to install the apc and uploadprogress php modules:

pecl install apc
pecl install uploadprogress

4. Then add the following two lines to your php.ini file (typically located at /etc/php.ini)

extension=apc.so
extension=uploadprogress.so

5. Download phpMyAdmin from the project downloads page: http://www.phpmyadmin.net/home_page/downloads.php (I just use the english.zip package for ease of use).

6. Unzip and place the extracted files in the appropriate directory where apache will display them. By default, this is /var/www/html/

7. Copy the config.sample.inc.php file and rename it to config.inc.php. At this point you are basically done; everything from here on out is extra stuff I did for ease of use in my environment*. You can access your phpMyAdmin by going to http://<ip of your server, or its domain name>/<folder where you placed phpMyAdmin>. For me, this means: http://10.5.51.60/tools/phpmyadmin/
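
As a concrete sketch of steps 6 and 7 (the version number is just an example; the target path matches the /tools/phpmyadmin location above):

unzip phpMyAdmin-3.3.10-english.zip
mv phpMyAdmin-3.3.10-english /var/www/html/tools/phpmyadmin
cp /var/www/html/tools/phpmyadmin/config.sample.inc.php /var/www/html/tools/phpmyadmin/config.inc.php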

*I will add more info on what I did at a later date. See below:

8. For my use, I found that the upload limits set by PHP were too low: 2MB for upload_max_filesize and 8MB for post_max_size. Since my use is 100% on a corporate local network, I increased both of these values, first to 200MB and eventually to 900MB, so that they would pretty much never be a limiting factor.

9. In addition to the upload limits, I had to increase max_execution_time and max_input_time to ensure that PHP had enough time to parse the large DBs. Since the server this runs on is dedicated to me, and it is not a huge deal if PHP goes haywire, I set them to 300 seconds and 600 seconds respectively.

10. And just for fun I boosted the memory_limit setting because the phpMyAdmin documentation says that can also cause issues.
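
Pulled together, the php.ini adjustments from steps 8 through 10 end up looking roughly like this (the memory_limit value below is only an example figure):

upload_max_filesize = 900M
post_max_size = 900M
max_execution_time = 300
max_input_time = 600
memory_limit = 512M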

Edit: I have come across a utility called BigDump which can also help.

Future of content distribution

The researchers at Delft University of Technology and the Vrije Universiteit Amsterdam have been working on a torrent client called Tribler. It has many of the basic features you see in every torrent client, but they have added an innovative distributed bandwidth-currency system. Since the basic torrent model does not encourage users to upload at all unless their tracker forces them to, many people tend to drop off the swarm as soon as they are done downloading. Tribler, on the other hand, watches how much you upload and communicates with the other people you connect to whose clients also support this, negotiating speeds based on how much you share. They call this Full Incentivisation, and you can read about the ideas and supporting theories on their site. The important thing to note is that users who share more (bandwidth, disk space, an up-to-date client version, etc.) are rewarded with higher download speeds.

But how does this affect the future of content distribution? One important thing to note is that the researchers building Tribler are funded by none other than the European Union's P2P-Next project, whose push is to use P2P technology in a commercial setting. When you realize that P2P-Next also funds software such as SwarmPlayer (and that SwarmPlayer is integrated into Tribler), you can see where this is going: peer-to-peer video streaming, and as a result a massive drop in the barrier to entry for content providers. It would take a system where distributing content to 1 person uses X resources and distributing content to 100 people uses 100X resources, and convert it into one where you use roughly X resources no matter the number of clients. Extra seeds can be added, but they are not required. The EU hopes that P2P-Next will help develop software that European content providers can use to legally distribute various forms of media.