Perl references explained

Coming to Perl from PHP can be confusing because of the apparently similar and yet quite different ways the two languages use the $ identifier as a variable prefix. If you’re accustomed to PHP, you’ll be used to declaring variables of various types, using the $ prefix each time:

<?php
$string = "string";
$integer = 6;
$float = 1.337;
$object = new Object();

So when you begin working in Perl, you’re lulled into a false sense of security because it looks like you can do the same thing with any kind of data:

#!/usr/bin/perl
my $string = "string";
my $integer = 6;
my $float = 1.337;
my $object = Object->new();

But then you start dealing with arrays and hashes, and suddenly everything stops working the way you expect. Consider this snippet:

#!/usr/bin/perl
my $array = (1, 2, 3);
print $array. "\n";

That’s perfectly valid syntax. However, when you run it, expecting to see all your elements or at least an Array like in PHP, you get the output of “3”. Not being able to assign a list to a scalar, Perl just gives you the last item of the list instead. You sit there confused, wondering how on earth Perl could think that’s what you meant.

References in PHP

In PHP, every identifier is a reference, or pointer, towards some underlying data. PHP handles the memory management for you. When you declare an object, PHP writes the data into memory, and puts into the variable you define a reference to that data. The variable is not the data itself, it’s just a pointer to it. To oversimplify things a bit, what’s actually stored in your $variable is an address in memory, and not a sequence of data.

When PHP manages all this for you and you’re writing basic programs, you don’t really notice, because any time you actually use the value it gets dereferenced implicitly, meaning that PHP will use the data that the variable points to. So when you write something like:

$string = "string";
print $string;

The output you get is “string”, and not “0x00abf0a9”, which might be the “real” value of $string as an address in memory. In this way, PHP is kind of coddling you a bit. In fact, if you actually want two identifiers to point to the same piece of data rather than making a copy in memory, you have to use a special reference syntax:

$string2 = &$string1;

Perl and C programmers aren’t quite as timid about hiding references, because being able to manipulate them a bit more directly turns out to be very useful for writing quick, clean code, in particular for conserving memory and dealing with state intelligently.

References in Perl

In Perl, you have three basic data types; scalars, arrays, and hashes. These all use different identifiers; Scalars use $, arrays use @, and hashes use %.

my $string = "string";
my $integer = 0;
my $float = 0.0;
my @array = (1,2,3);
my %hash = (name => "Tom Ryder",
            blog => "Arabesque");

So scalars can refer directly to data in the way you’re accustomed to in PHP, but they can also be references to any other kind of data. For example, you could write:

my $string = "string";
my $copy = $string;
my $reference = \$string;

The value of both $string and $copy, when printed, will be “string”, as you might expect. However, the $reference scalar becomes a reference to the data stored in $string, and when printed out would give something like SCALAR(0x2160718). Similarly, you can define a scalar as a reference to an array or hash:

my @array = (1,2,3);
my $arrayref = \@array;
my %hash = (name => "Tom Ryder",
            blog => "Arabesque");
my $hashref = \%hash;

There are even shorthands for doing this, if you want to declare a reference and the data it references inline. For array references, you use square brackets, and for hash references, curly brackets:

$arrayref = [1,2,3];
$hashref = {name => "Tom Ryder",
            blog => "Arabesque"};

And if you really do want to operate with the data of the reference, rather than the reference itself, you can explicitly dereference it:

$string = ${$reference};
@array = @{$arrayref};
%hash = %{$hashref};

For a much more in-depth discussion of how references work in Perl and their general usefulness, check out perlref in the Perl documentation.

Useless use of cat

If you’re reasonably confident with using pipes and redirection to build command-line strings in Bash and similar shells, at some point somebody is going to tell you off for a “useless use of cat“. This means that somewhere in your script or pipeline, you’ve used the cat command unnecessarily.

Here’s a simple example of such a use of cat; here, I’m running an Apache log through a pipe to awk to get a list of all the IP addresses that have accessed the server.

# cat /var/log/apache2/access.log | awk '{print $1}'

That works just fine, but I don’t actually need the cat instance there. Instead, I can supply a file argument directly to awk, and the result will be the same, with one less process spawned as a result:

# awk '{print $1}' /var/log/apache2/access.log

Most of the standard UNIX command-line tools designed for filtering text actually work like this; like me, you probably already knew that, but fell into the habit of using cat anyway.

For tools that don’t take a filename instead of standard input, such as mail, you can explicitly specify that a file’s contents should be used as the standard input stream. Consider this line, where I’m mailing a copy of my Apache configuration to myself:

# cat /etc/apache2/apache2.conf | mail tom@sanctum.geek.nz

That works fine, but we can compact it using < as a redirection symbol for the same result, and a command line a whole five characters shorter:

# mail tom@sanctum.geek.nz </etc/apache2/apache2.conf

If you overuse cat, then these two methods will probably enable you to fix that with a little effort. Incidentally, the second method can also be extended to provide in-place arguments for tools that read from files, by adding parentheses. Here, for example, I’m comparing the difference in output when I run grep with two different flags on the same file, using diff:

# diff -u <(grep -E '(a|b)' file.txt) <(grep -F '(a|b)' file.txt)