You are reading Morning Coffee, a blog about software engineering written by Igor Šarčević.
Have a nice day!

Sorting your production.log

Igor Sarcevic Igor wrote this in March 2015

Many developers fail to realize that by using the basic Unix tools, you can find on any server, you can find and collect valuable data from your logs. Often the data you need can be found without using any external services, including but not limited to service measurements.

This is the continuation of the series where I show how to collect valuable data from your production.log. Please read the previous posts if you haven’t already

Basic Unix sorting

In this article I want to show you how to sort your data using the sort Unix command. With that you can order the routes by response time, or to count them and show a list of most frequently visited. But first let’s see how the sort command works.

Let’s start with a file animals.txt that has the following content:

dog
cat
fish

If you call sort on this file the output will be the following:

$ sort animals.txt
cat
dog
fish

As you can see in the above output, the sort command orders the lines alphabetically. To show the reverse order:

$ sort -r animals.txt
fish
dog
cat

You can also sort numerically. For example if you have a file called numbered-animals.txt with the following content:

117 cat
1 fish
10 dog

You can sort it with the following command:

$ sort -n numbered-animals.txt
1 fish
10 dog
117 cat

Please note that if you leave out -n your output will be the same in some cases, but not always. For example:

$ sort -n numbered-animals.txt
10 dog
117 cat
1 fish

We can of course order by other words other than the first one in the line. Using the -k option we can tell sort which field it to use. Let’s look at the following example.txt file:

a=abc b=999
a=abd b=462
a=abf b=517

To sort using the b= part we can invoke sort with the -k 2 option to select the second word in the lines:

$ sort -k 2 example.txt
a=abd b=462
a=abf b=517
a=abc b=999

But, the -k option doesn’t stop there. We can also select the character from which we want to start sorting. For example to cut of the first 3 characters in the b=462 and get 62 we would write -k 2.4 option, and thus sort only using the last two digits:

$ sort -k 2.4 example.txt
a=abf b=517
a=abd b=462
a=abc b=999

Sorting logs by response time

Now let’s use the above knowledge to sort the routes from fastest to slowest. Let’s use the following example:

method=GET path=/user format=*/* controller=users action=create status=200 duration=4.2 view=0.00 db=3.91 time=2016-02-16 16:50:37 +0000
method=POST path=/user format=*/* controller=users action=create status=200 duration=2.4 view=0.00 db=3.91 time=2016-02-16 16:48:37 +0000
method=POST path=/user format=*/* controller=users action=create status=200 duration=56.4 view=0.00 db=3.91 time=2016-02-16 16:52:12 +0000

The only thing now we have to do is to count the location of the things we want to sort. In this example we are interested in the duration=<number> part of the lines. The duration is the 7th word on the line and the numbers start at the 10th place in the word. Now it is easy to conclude that we need the -k 7.10 option in our sort command.

The other option we want to use is the -n option to sort the lines numerically.

$ cat production.log | sort -n -k 7.10
method=POST path=/user format=*/* controller=users action=create status=200 duration=2.4 view=0.00 db=3.91 time=2016-02-16 16:48:37 +0000
method=GET path=/user format=*/* controller=users action=create status=200 duration=4.2 view=0.00 db=3.91 time=2016-02-16 16:50:37 +0000
method=POST path=/user format=*/* controller=users action=create status=200 duration=56.4 view=0.00 db=3.91 time=2016-02-16 16:52:12 +0000

Final words

I hope you enjoyed this tutorial, I will give my best to finish the next one soon, where I will be writing about unique lines.

Happy hacking!