When I began studying Ruby, one of the things that most caught my attention was the length of time between having an idea and being able to validate the first step; if this were an article about TDD, I would refer to this as the speed with which we receive feedback. The feedback loop is very short.
Why? My first years as an IT professional I worked as a programmer using mainly Microsoft Windows and, more specifically, .NET technologies.
In that environment, the distance (of time) between an idea and obtaining the
first feedback requires many steps. Open Visual Studio, create a console
project (or Windows Forms or web project), then open the file that contains
the main
method declaration (the entry point to all Windows programs) in
order to enter the code that I want to test.
This is followed by compiling, fixing the errors at compile time, and recompiling until finally seeing a window that opens and closes automatically (in less than one second) in order to go to the console within Visual Studio and finally see the much-awaited feedback.
When I began using Ruby, the first thing I did was create a text file, write
the first “Hello World” and run ruby hello_world.rb
, and voilà, feedback! In
that moment I fell in love with Ruby as a language and a platform.
For software developers, it is essential to receive immediate feedback from the tests we are running, which in turn leads us to the solution for a given problem. Ruby is wonderful at this given that the length of time between the idea and the feedback is minimal which creates a great environment for experimentation.
Not everything is that simple…
While working on a project, we encountered a problem. In reality, it was not so
much a problem as a desire. In the console, we wanted to generate a list of all
the defined routes in a specific web application. The project was fairly large
and had been active for many years, and therefore the existing solutions didn’t
work for us either because the version of the sinatra
web framework was very
old or because the project was not very standardized.
One day when I had a bit of free time I thought about testing some solutions in order to generate a list of routes. After giving some thought to the solution, I said to myself…
The only thing I have to do is redefine the methods
get
,post
,put
anddelete
, generate a new proxy class, save the route definition and call the original implementation, then…
There I realized I had again fallen into a trap. The cognitive distance between the problem I wanted to solve and the means of reaching a solution via Ruby was quite long, especially as there was no immediate feedback.
I took a step back and returned to thinking about the problem.
I have a list of files that can be found in the
app/routes/
folder. From these files I want all the lines that begin withget "…" do
orpost "…" do
, etc.
When I began working with Ruby, I also began working quite a bit with UNIX. Some of the UNIX features I fell in love with include that all the system’s resources are exposed as files; that the commands or utilities perform a task and perform it well; and lastly, that all the utilities can be combined to resolve more complex problems.
This permits us to obtain immediate feedback when working with UNIX utilities.
On to UNIX
The following is a step-by-step example of how to solve a problem utilizing a few UNIX tools. Additionally, we will see how we can achieve good results while obtaining almost instantaneous feedback by taking small steps and building on previous tests.
The problem presented here is as follows: We want to generate a list in the
console of all the defined routes in a sinatra
application. The code for the
routes has the following format:
$ cat app/routes/schedules.rb
get "/schedules" do
# … route code …
end
post "/schedules/new" do
# … route code …
end
put "/schedules/:id" do
# … route code …
end
delete '/schedules/:id' do
# … route code …
end
The first thing to know is that for the problem at hand, we’re not interested in 100% effectiveness; we’ll settle for a sufficiently well-functioning solution. Hence, we can take some shortcuts while testing the code.
The first test is to obtain a list of all the lines that begin with get
,
post
, put
or delete
. For this we can use the grep
utility.
$ grep --line-number --extended-regexp '^\s*(get|post|put|delete) ' app/routes/*.rb
Note: Moving forward the short versions of the utility options will be used in the examples to make them more concise.
The first utility we use is grep
(global regular expression print)
which looks for patterns within text files and, by default, prints the results.
It receives a regular expression as a parameter and a list of files to search.
In this case, it searches within all route files for lines that begin with the
text get
, post
, put
or delete
(ignoring the spaces at the beginning of
each line).
The search result is:
app/routes/schedules.rb:5 get "/schedules" do
app/routes/schedules.rb:25 post "/schedules/new" do
app/routes/schedules.rb:55 put "/schedules/:id" do
app/routes/schedules.rb:60 delete '/schedules/:id' do
app/routes/jobs.rb:2 get '/jobs' do
app/routes/dummy:25 post %r{/post\d+} do
...
Success! With a simple command we have already come a long way; we have a list of routes and for each one the file and line number where the route is defined.
The strategy we will utilize to achieve what we want is very similar to that which Business Intelligence analysts use, in the sense that what we want is to create a three-step process referred to as ETL:
- Extract data
- Transform data
- Load data
The first thing we want to do is extract the information (what we just did), then transform this information into a more manageable format and finally load the information somewhere (in our case we simply want to display it on the screen).
The second step, then, is to adjust the format of the output to make it
uniform. For this we use a command called sed
(stream editor). The
idea behind this command is to, given a text file, transform one line at a time
and return the result of the transformation. It’s ideal for “cleaning” and
filtering the output of other utilities.
In our case we would like to:
- Standardize the use of quotation marks
- Eliminate multiple spaces
- Eliminate the text that appears after the route path (the string ”… do”)
- Eliminate the use of the character
:
separating the name of the file from the line number
To achieve the first bullet point we simply have to redirect the output of the
previous command to sed
and apply a transformation.
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g"
Two side notes about the command described above: In the first line we use a
backslash to indicate that the command continues on the next line, and later,
in the second line, we use the “pipe” character |
to indicate that the output
of the first command is the input of the second command. This allows us to
combine commands in a very simple way.
Here we are telling sed
to replace (command s/.../.../
) that which is
caught by the regular expression /'/
(single quotation marks), with double
quotation marks (/"/
). We use the g
modifier to substitute for all
occurrences, not just the first.
The output of the combination of these commands is as follows:
app/routes/schedules.rb:5 get "/schedules" do
app/routes/schedules.rb:25 post "/schedules/new" do
app/routes/schedules.rb:55 put "/schedules/:id" do
app/routes/schedules.rb:60 delete "/schedules/:id" do
app/routes/jobs.rb:2 get "/jobs" do
app/routes/dummy:25 post %r{/post\d+} do
...
Now there are no single quotation marks! sed
allows us to specify more than
one transformation at a time, separated by semicolons. Continuing with the
example, the transformations we want to apply are:
s/'/"/g
Standardize the use of quotation markss/ */ /g
Eliminate multiple spacess/ do//
Eliminate the text that appears after the route path (the string… do
)s/:/ /
Eliminate the use of the character:
separating the name of the file from the line number
All together it would be:
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/:/ /"
Executing the command we obtain:
app/routes/schedules.rb 5 get "/schedules"
app/routes/schedules.rb 25 post "/schedules/new"
app/routes/schedules.rb 55 put "/schedules/:id"
app/routes/schedules.rb 60 delete "/schedules/:id"
app/routes/jobs.rb 2 get "/jobs"
app/routes/dummy 25 post %r{/post\d+}
With this we conclude the second stage of our process to standardize the results. Now we start preparing the print format we want. The aim is to achieve a list like the following:
post %r{/post\d+} app/routes/dummy:25
get "/jobs" app/routes/jobs.rb:2
put "/schedules/:id" app/routes/schedules.rb:55
delete "/schedules/:id" app/routes/schedules.rb:60
post "/schedules/new" app/routes/schedules.rb:25
get "/schedules" app/routes/schedules.rb:5
That is to say, we want to print the HTTP verb in the first column with right-alignment, followed by the route path and lastly the route file with the line number. We must also note that we want the list ordered by the route path (second column).
To achieve this we first rearrange the fields of each line in the desired
order. For this we will utilize another UNIX utility called awk
.
awk
is a scanning and text-processing language. It is extremely useful when
we want to treat each line as a sequence of fields separated by a space (or
another separator).
To reorder the fields in each line we simply have to:
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/:/ /" \
| awk '{ print $3, $4, $1, $2 }'
Each field is identified by its position starting from 1, hence the sequence we are left with is 3,4,1 and lastly 2. The combination of these commands results in the following:
get "/schedules" app/routes/schedules.rb 5
post "/schedules/new" app/routes/schedules.rb 25
put "/schedules/:id" app/routes/schedules.rb 55
delete "/schedules/:id" app/routes/schedules.rb 60
get "/jobs" app/routes/jobs.rb 2
post %r{/post\d+} app/routes/dummy 25
It’s almost what we want, but first a few touch ups. To begin with, we have to
right-align the HTTP verbs. If we take a closer look, the verb with the most
characters is delete
with 6. awk
has another command similar to print
called printf
, which allows for the addition of some formatting information.
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/://" \
| awk '{ printf("%6s %s %s %s\n", $3, $4, $1, $2) }'
get "/schedules" app/routes/schedules.rb 5
post "/schedules/new" app/routes/schedules.rb 25
put "/schedules/:id" app/routes/schedules.rb 55
delete "/schedules/:id" app/routes/schedules.rb 60
get "/jobs" app/routes/jobs.rb 2
post %r{/post\d+} app/routes/dummy 25
Next, we want to sort the list by the route path. To do this we will use
another command called sort
. This utility sorts the lines in alphabetical
order by default, but it has a parameter that allows us to indicate which field
from each line should be used for the comparison. Thus, indicating that we want
to sort by the second field, the command looks like:
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/://" \
| awk '{ printf("%6s %s %s %s\n", $3, $4, $1, $2) }' \
| sort --key=2
get "/jobs" app/routes/jobs.rb 2
get "/schedules" app/routes/schedules.rb 5
put "/schedules/:id" app/routes/schedules.rb 55
delete "/schedules/:id" app/routes/schedules.rb 60
post "/schedules/new" app/routes/schedules.rb 25
post %r{/post\d+} app/routes/dummy 25
Finally and to conclude, we will use the column
utility, which allows us to
separate another command’s output into columns. For example, if we utilize the
command with the -t
option, it prints the following:
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/://" \
| awk '{ printf("%6s %s %s %s\n", $3, $4, $1, $2) }' \
| sort --key=2 \
| column -t
get "/jobs" app/routes/jobs.rb 2
get "/schedules" app/routes/schedules.rb 5
put "/schedules/:id" app/routes/schedules.rb 55
delete "/schedules/:id" app/routes/schedules.rb 60
post "/schedules/new" app/routes/schedules.rb 25
post %r{/post\d+} app/routes/dummy 25
This is very similar to what we want, but in reality we want to split the
output into two columns only, the verb plus the route path on one side and the
file name together with the line number on the other. Making a few small
changes to the awk
line and the column
line, we can add an arbitrary and
temporary marker ~
to mark the place where we want to separate the columns.
In addition, we adjust the format to re-incorporate the colon before the line
number (we repeat this in the awk
line).
$ grep -n -E '^\s*(get|post|put|delete) ' app/routes/*.rb \
| sed "s/'/\"/g; s/ *//g; s/ do//; s/://" \
| awk '{ printf("%6s %s~%s:%s\n", $3, $4, $1, $2) }' \
| sort --key=2 \
| column -t -s~
get "/jobs" app/routes/jobs.rb:2
get "/schedules" app/routes/schedules.rb:5
put "/schedules/:id" app/routes/schedules.rb:55
delete "/schedules/:id" app/routes/schedules.rb:60
post "/schedules/new" app/routes/schedules.rb:25
post %r{/post\d+} app/routes/dummy:25
In this way we’ve built the desired format through a series of small steps.
Conclusions
Unix includes a bunch of tools that, when combined, lead to great solutions. The key to learning to use these commands is using them habitually to resolve small problems. Over time we will improve and deepen our understanding of each command’s options and how to combine the commands themselves. Their ease of use and swift feedback make these options perfect for testing quick solutions to complex problems.
Resources
All of the commands we’ve seen come with manuals that can be accessed using the
man
command. For example:
$ man sed
In addition, here is a list of resources to further deepen your understanding of a few of them:
grep
http://www.grymoire.com/Unix/Grep.htmlsed
http://www.grymoire.com/unix/Sed.htmlawk
http://www.grymoire.com/unix/Awk.html
Appendix
Many of the commands we reviewed have the same roots. In fact, learning one of them leads to learning many concepts applicable to the rest.
+-----+
+-----+ ed +-----+
| +--+--+ |
| | |
| | |
| | |
+-v--+ +-v--+ +--v--+
|sed | |grep| | awk |
+----+ +-+--+ +-----+
|
|
+-v---+
|egrep|
+-----+
sed
and other tools originated from another tool called ed
, a command-based
text editor. Here is a sample of ed
in action: