Brandon's Notepad

June 30, 2009

Using Perl Hashes

Filed under: Perl — Brandon @ 10:00 pm



A hash, or associative array, is essentially an array indexed by textual keys instead of numeric ones. Any Perl reference will explain basic usage, but the following is a compilation of items that I almost always have to look up if I have not coded in Perl for a while. A few “tricks” that I’ve learnt over time have been included as well.


Things to Remember

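Creating Hashes. Hashes can be created in several ways, and the syntax for each is shown below. Apart from the first line, which simply creates an empty hash, each line produces the same result: a hash in which the numeric values ‘1’ and ‘2’ are referenced by their textual equivalents. The lines are alternatives rather than one program, and the bareword forms will not compile under use strict.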

%numbers = ();                          # Create an empty hash.
$numbers{one} = 1; $numbers{two} = 2;   # Set one at a time.
%numbers = (one,1,two,2);               # Barewords as key/value pairs (fails under use strict).
%numbers = ('one','1','two','2');       # Quote if a key or value has spaces.
%numbers = qw(one 1 two 2);             # Use a quoted-word list.
%numbers = (one => 1, two => 2);        # The fat comma quotes a bareword on its left.
@numbers{'one','two'} = (1,2);          # Hash slice: note the @ sigil with braces.
@numbers{@keys} = @values;              # Logical extension of the above,
                                        # given @keys = qw(one two) and @values = (1,2).
map { $_ = ++$i } @numbers{'one','two'};  # Something to ponder ($i starts at 0 or undef).
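Because the lines above are alternatives and some rely on barewords, it can be reassuring to check the claim that they all build the same hash. The following is a minimal sketch, assuming use strict and use warnings, that builds the hash several of those ways with quoted keys and compares each result. The helper same_hash and the variable names are hypothetical, written only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two hash references hold the same key/value pairs.
sub same_hash {
    my ($x, $y) = @_;
    return 0 unless scalar(keys %$x) == scalar(keys %$y);
    foreach my $k (keys %$x) {
        return 0 unless exists($y->{$k}) && $y->{$k} eq $x->{$k};
    }
    return 1;
}

my %expected = (one => 1, two => 2);

my %one_at_a_time;
$one_at_a_time{one} = 1;
$one_at_a_time{two} = 2;

my %quoted      = ('one', '1', 'two', '2');   # quoted key/value list
my %quoted_word = qw(one 1 two 2);            # quoted-word list

my %slice;
@slice{'one', 'two'} = (1, 2);                # hash slice

my @keys   = qw(one two);
my @values = (1, 2);
my %from_arrays;
@from_arrays{@keys} = @values;                # slice filled from two arrays

foreach my $h (\%one_at_a_time, \%quoted, \%quoted_word, \%slice, \%from_arrays) {
    my $result = same_hash($h, \%expected) ? 'same' : 'different';
    print "$result\n";
}
```

The comparison uses eq rather than ==, so the string '1' from the quoted forms and the number 1 from the others count as equal, which is how Perl treats them as hash values anyway.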

Elements Existing & Defined.
The ‘exists’ function returns true if the specified key exists in the specified hash, whereas the ‘defined’ function returns true only if the value stored under that key is defined.

The following two commands report the statuses of the key and the value of $hash{$key}:

print exists($hash{$key})  ? 'Exists, and '   : 'Does not exist, and ';
print defined($hash{$key}) ? "is defined.\n" : "is not defined.\n";

Running only these two lines, with nothing yet stored under $key, should result in:

Does not exist, and is not defined.

Prepending the following line, which sets the element to an undefined value:

$hash{$key} = undef;

should result in:

Exists, and is not defined.

Prepending the following line instead, which sets the element to an empty string:

$hash{$key} = '';

should result in:

Exists, and is defined.
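Putting the three cases side by side makes the difference easier to see. This sketch wraps the same two tests in a hypothetical helper named status and runs it against an empty hash, an element set to undef, and an element set to the empty string; the key name 'answer' is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: describes whether $key exists in %$h and whether its value is defined.
sub status {
    my ($h, $key) = @_;
    my $msg = exists($h->{$key}) ? 'Exists, and ' : 'Does not exist, and ';
    $msg .= defined($h->{$key}) ? 'is defined.' : 'is not defined.';
    return $msg;
}

my $key = 'answer';                   # arbitrary key name
my %missing;                          # key never assigned
my %undef_value = ($key => undef);    # key present, value undefined
my %empty_value = ($key => '');       # key present, value defined but empty

foreach my $h (\%missing, \%undef_value, \%empty_value) {
    my $line = status($h, $key);
    print "$line\n";
}
# Does not exist, and is not defined.
# Exists, and is not defined.
# Exists, and is defined.
```

Note that neither test adds the key to %missing: checking a single-level element with exists or defined does not create it.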

General Note About Data Structures. Hashes are often used to represent data entities, and the way a hash is structured determines what it can represent. Consider the following:

  • A simple hash is analogous to a simple data table, with the hash’s key playing the role of the table’s primary key. For example, %phone_numbers represents the same thing as a table called “Phone Numbers”, and $phone_numbers{'Bob'} = '555-1234' represents a record in that table, which we now know contains at least two fields, “Name” (pk) & “Number”. (The number must be quoted; unquoted, 555-1234 is subtraction.) Additional fields in the table that (of course) relate to the same key must be stored in a different hash.
  • A hash of (anonymous) hashes eliminates the need to use multiple named hashes to represent fields. For example, %contacts can represent the “Contacts” table in the database, $contacts{'Bob'} represents the row uniquely identified by the primary key 'Bob', and $contacts{'Bob'}{'address'}, $contacts{'Bob'}{'home phone'}, $contacts{'Bob'}{'work phone'}, $contacts{'Bob'}{'mobile phone'}, and $contacts{'Bob'}{'birthday'} represent the various fields in that table.
  • A hash of arrays can represent a set of named lists. For example, the %lists hash may include $lists{'To Do'}, $lists{'Shopping'} & $lists{'Calls To Make'}. If each of these were a simple string element, this hash wouldn’t be very useful; however, if each contains an array reference, then elements can be pushed, popped, shifted, unshifted, and even grepped, sliced & spliced as needed.
  • Larger structures are simply combinations of these smaller structures. For example, returning to the %contacts example above, $contacts{'Bob'}{'call log'} could hold a reference to an array of strings, each noting the date and a brief description of a call to Bob. The first would be referenced as $contacts{'Bob'}{'call log'}[0], the second as $contacts{'Bob'}{'call log'}[1], and so forth.
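To make the second and fourth bullets concrete, here is a rough sketch of a %contacts record with a nested call log. The field values and call notes are invented purely for illustration.

```perl
use strict;
use warnings;

# Illustrative data only: one row of a "Contacts" table, keyed by name.
my %contacts = (
    'Bob' => {
        'address'    => '123 Main St',
        'home phone' => '555-1234',    # quoted: unquoted 555-1234 would be subtraction
        'birthday'   => '1970-01-01',
        'call log'   => [],            # empty anonymous array for call notes
    },
);

# Append entries to Bob's call log (an array reference inside the inner hash).
push @{ $contacts{'Bob'}{'call log'} }, '2009-06-01: discussed project';
push @{ $contacts{'Bob'}{'call log'} }, '2009-06-15: follow-up call';

print "Home phone: $contacts{'Bob'}{'home phone'}\n";    # Home phone: 555-1234
print "First call: $contacts{'Bob'}{'call log'}[0]\n";   # First call: 2009-06-01: discussed project
print "Calls logged: ", scalar @{ $contacts{'Bob'}{'call log'} }, "\n";   # Calls logged: 2
```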

Creation of these types of structures is covered next.

Hash of Hashes. Three different methods are used here to create such a structure, though all of the methods for creating hashes as explained above can apply.

# Using anonymous hash constructor.  Structure engineered from the start.
# Values are quoted: => only quotes the bareword on its left.
%numbers = (
     german => {
          one => 'eins',
          two => 'zwei'
     }
);

# Using the constructor inline.  Structure segments built only when needed.
$numbers{spanish} = {
     one => 'uno',
     two => 'dos'
};

# Using named hashes.  Least efficient and least intuitive.
%french = (
     one => 'un',
     two => 'deux'
);
$numbers{french} = \%french;

The following code snippet will do two things: illustrate how to reference elements within the structure and prove that the %numbers hash was built according to expectations.

print "Languages: " . join(', ', sort keys %numbers) . "\n";
foreach my $lang (sort keys %numbers) {
     print "Numbers in $lang:\n";
     foreach my $num (sort keys %{$numbers{$lang}}) {
          print "\t$num == $numbers{$lang}{$num}\n";
     }
}

Let’s interpret this clause from line 5 of the snippet: $numbers{$lang}{$num}. The first segment, $numbers{$lang}, evaluates to a hash reference (a pointer, for C programmers), which is then used to resolve the second segment, as if it read HASH(0x123456){$num}. Perl implies an arrow between adjacent subscripts, so the clause is shorthand for $numbers{$lang}->{$num}. In the loop on the preceding line, the $numbers{$lang} hash reference is dereferenced (or “cast”) into a hash using the %{} construct, because the keys function expects an actual hash rather than a reference to one.
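To see the reference at work, the following standalone sketch uses its own small %numbers, checks what ref reports for the inner value, and confirms that three spellings of the lookup reach the same element.

```perl
use strict;
use warnings;

# A small standalone %numbers for illustration.
my %numbers = (german => { one => 'eins', two => 'zwei' });
my ($lang, $num) = ('german', 'two');

my $inner = $numbers{$lang};    # a hash reference; prints like HASH(0x...)
print ref($inner), "\n";        # HASH

# Three equivalent ways to reach the same element.
my @ways = (
    $numbers{$lang}{$num},        # arrow implied between subscripts
    $numbers{$lang}->{$num},      # explicit arrow
    ${ $numbers{$lang} }{$num},   # block dereference
);
print "@ways\n";                  # zwei zwei zwei

# keys needs an actual hash, so the reference is dereferenced with %{ }.
print join(', ', sort keys %{ $numbers{$lang} }), "\n";   # one, two
```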

Hashes of Arrays. Here is one way to store arrays as hash elements and then push values onto one of them.

# Using anonymous array constructor.
%numbers = (
     English => ["zero", "one", "two", "three"],
     German => ["null", "eins", "zwei", "drei"],
     Spanish => ["cero", "uno", "dos", "tres"]
);

# Using push and quoted words.
push(@{$numbers{French}}, qw/ zero un deux trois/);

As with hashes of hashes above, other methods are available, such as using references to named arrays; these have been omitted for brevity. Note the dereferencing (or “casting”) of the element into an array: @{$numbers{French}}. Because $numbers{French} does not exist before the push, Perl autovivifies it as a reference to a new anonymous array. Here is a snippet of code to validate the contents:

foreach my $k (sort keys(%numbers)) {
     print "$k numbers are " . join(', ',@{$numbers{$k}}) . ", etc.\n";
}
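The third bullet of the general note on data structures mentions pushing, popping, shifting, unshifting, grepping, slicing and splicing the arrays in a hash of arrays. A small sketch of those operations on a %lists hash follows; the list names come from that bullet, while the items are invented.

```perl
use strict;
use warnings;

my %lists;

# push autovivifies each array reference the first time a key is used.
push @{ $lists{'To Do'} },    'write notes', 'review code', 'file report';
push @{ $lists{'Shopping'} }, 'milk', 'bread', 'eggs', 'butter';

my $last  = pop   @{ $lists{'To Do'} };     # removes 'file report'
my $first = shift @{ $lists{'To Do'} };     # removes 'write notes'
unshift @{ $lists{'To Do'} }, 'plan week';  # adds to the front

# grep returns matching items without changing the array.
my @b_items = grep { /^b/ } @{ $lists{'Shopping'} };   # bread, butter

# An array slice taken through the reference.
my @first_two = @{ $lists{'Shopping'} }[0, 1];         # milk, bread

# splice removes two items starting at offset 1 ('bread' and 'eggs').
my @removed = splice @{ $lists{'Shopping'} }, 1, 2;

print "To Do:    @{ $lists{'To Do'} }\n";      # To Do:    plan week review code
print "Shopping: @{ $lists{'Shopping'} }\n";   # Shopping: milk butter
```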



Hash Tricks

Sort Unique. On multiple occasions, I’ve needed to remove duplicate values from an array. In the Unix shell, sort -u works nicely, but Perl’s sort has no equivalent option. Instead of coding this in place each time, I’ve often written a subroutine similar to this one:

sub sort_unique {
     my %h; my $k;
     foreach $k (@_) {
          $h{$k}++;
     }
     return(sort keys(%h));
}

The subroutine accepts a simple list as input. In a loop, elements of the list are used as keys for an internal hash, and with each pass the value of the corresponding hash element is incremented. The hash keys are then sorted and returned as a list. Strictly speaking, the sorting isn’t mandatory, but without it the keys are returned in an arbitrary order. A side benefit is that, once the input list has been traversed, %h holds the number of occurrences of each list item, although this version discards those counts when it returns.
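To keep those counts, the routine can hand back the whole hash instead of only its keys. The following is a minimal sketch; the name count_occurrences is hypothetical and not part of the original routine.

```perl
use strict;
use warnings;

# Hypothetical variant of sort_unique that returns the counts as well.
sub count_occurrences {
    my %h;
    $h{$_}++ foreach @_;    # same counting loop as sort_unique
    return %h;              # key => number of times it appeared
}

my %count = count_occurrences(qw(pear apple pear fig apple pear));
foreach my $k (sort keys %count) {
    print "$k: $count{$k}\n";
}
# apple: 2
# fig: 1
# pear: 3
```

The unique, sorted list is then simply sort keys %count, so one pass over the input gives both results.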

Please see the entry “How can I remove duplicate elements from a list or array?” in the Perl FAQ (perlfaq4) for some shorter and more efficient alternatives. Here are modularized variants of option (b):

sub sort_unique {
     undef my %h;
     return sort grep(!$h{$_}++, @_);
}

and (d):

sub sort_unique {
     undef my %h;
     @h{@_} = ();
     return sort keys %h;
}

Again, the sort is optional but nice.
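Dropping the sort affects the two variants differently. In the grep form (b), elements come back in the order of their first appearance, because grep walks @_ in order and keeps an element only while its count is still zero. In the slice form (d), the keys come back in hash order, which is arbitrary. A sketch of the order-preserving version, under the hypothetical name unique_in_order:

```perl
use strict;
use warnings;

# Hypothetical name: option (b) without the sort, so first-seen order is kept.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @items = qw(pear apple pear fig apple);
print join(', ', unique_in_order(@items)), "\n";   # pear, apple, fig
```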



