Thursday 6 February 2014

@arr = ("my", "first", "array");

Now, I know that arrays are a huge topic so I can't possibly fit everything into one post or you'll all get really bored. So here are some of what I think are the key points to understanding arrays, enjoy...

Arrays are just data structures where we can store a list of singular things (scalars) in a specific order. They are like lists, but each element is associated with an incremental number and you can access each of these elements using its associated number. In fact you can assign a list to an array variable name and it will turn into an array.

Arrays are really important, in my clearly expert opinion, because sometimes we need to store things in a specific order and sometimes it makes sense to store a lot of scalars together in one place. You can also do a lot of things with them like iterate through each entry or do the same thing to each entry and put it into a new array. That kind of thing. You can store numbers, strings, other scalars and even references to other arrays, or a mixture of all of those things, in the elements of the array. Here's how:

You can tell you are looking at an array if the variable name is prefixed by an @ sign so for example:
@myarray
I've recently found out that punctuation marks in front of variables in Perl are actually called sigils, so I may or may not start calling them that from now on.


Declaring and Assigning

As with many things (nearly everything) in Perl, arrays can be declared and assigned in different ways:

These two ways show the whole array being created in one go - a list is created and then this list is assigned to the array name:

my @arr;
@arr = ("this", "is", "array", 1);

Note that strings are in quotes and numbers are not.

or

my @arr = ("this", "is", "array", 2);

Arrays can also be created one element at a time:

my @arr;
$arr[0] = "this";
$arr[1] = "is";
$arr[2] = "array";
$arr[3] = 3;

Note that the array index starts at 0 and also that when you assign something to the individual array positions, you use a $ sign (sigil!). This is because each array entry is a singular piece of data, and from my post about scalars, we know that singular pieces of data are stored in scalars and scalars are identified by a $ sign.

If you skip element positions when you are creating an array like this:

my @arr;
$arr[0] = "this array";
$arr[2] = "has";
$arr[5] = "some";
$arr[6] = "gaps";

then the positions you skipped are filled with undef, so the array contains:

this array, undef, has, undef, undef, some, gaps
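To see the gaps for yourself, something like the sketch below should work. The map with defined is my own addition (it is not part of the example above); it swaps each undefined element for the text "undef" so the holes show up when printed.

```perl
use strict;
use warnings;

my @arr;
$arr[0] = "this array";
$arr[2] = "has";
$arr[5] = "some";
$arr[6] = "gaps";

# Replace each undefined element with the text "undef" so the gaps are visible.
my @shown = map { defined $_ ? $_ : "undef" } @arr;
my $line  = join(", ", @shown);

print "$line\n";                         # this array, undef, has, undef, undef, some, gaps
print "elements: ", scalar(@arr), "\n";  # elements: 7
```

Note that the skipped positions still count towards the length of the array.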


Accessing the elements

Once you've declared and assigned your array, you probably want to do things with it. You can access each element by referring to its index number, remembering that it starts from 0. So for the array

my @myarray = ("this", "array", "is", 6, "elements", "long");

To print the word "is", you need to use the $ sigil as it's a single piece of data, then the name of the array, then a 2 in square brackets because "is" sits at index 2 in the array:

print $myarray[2];

Tip

If you want to print out your whole array with a space in between each element, you need to write something like:

print "@array\n";

This interpolates all of the array elements into a string and by default, each element is separated by a space.
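Here is a small sketch of a few ways to control the separator. The array contents are made up for illustration. join() takes the separator as its first argument, and the special variable $" holds the separator used when an array is interpolated into a string (it is a space unless you change it).

```perl
use strict;
use warnings;

my @array = ("one", "two", "three");

my $spaced = "@array";             # elements joined by a space
my $joined = join(", ", @array);   # join() lets you pick the separator explicitly

my $dashed;
{
    local $" = "-";                # change the interpolation separator in this block only
    $dashed = "@array";
}

print "$spaced\n";   # one two three
print "$joined\n";   # one, two, three
print "$dashed\n";   # one-two-three
```

Using local inside a block means the separator goes back to a space afterwards, which seems safer than changing it for the whole program.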

Length of Arrays

You can either use

scalar(@arr)

or

$#arr+1

The first one makes more sense and I think it's easier to understand. It just gives you the array as a scalar value which turns out to be the number of elements it contains. The second one gives you the index value of the last element in the array and because the index starts from 0, you have to add 1 to get the number of elements.
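A quick sketch comparing the two, plus what happens with an empty array. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @arr = ("this", "is", "array", 1);

my $count      = scalar(@arr);  # array in scalar context gives the element count
my $last_index = $#arr;         # index of the final element
my $also_count = @arr;          # assigning to a scalar also forces scalar context

print "$count $last_index $also_count\n";   # 4 3 4

my @empty;
print scalar(@empty), " ", $#empty, "\n";   # 0 -1
```

For an empty array the last index is -1, so adding 1 still gives the right length of 0.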

There's something you need to be careful about within this topic and I think it's a bit of a red herring. If you write:

print scalar(100, 300, 200);

The answer you will get is:

200

This is because there is no array here at all. The commas act as the comma operator, which throws away everything except the last value, so you get 200 rather than a count of 3. So watch out!
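The difference shows up if you put the same numbers into a real array first. This is only a sketch: @numbers is a name I made up, and the "no warnings" line is there because Perl would otherwise warn that 100 and 300 are being thrown away.

```perl
use strict;
use warnings;

my @numbers = (100, 300, 200);

my $count = scalar(@numbers);      # a real array: number of elements

my $last;
{
    no warnings 'void';            # silence "useless use of a constant" for this demo
    $last = scalar(100, 300, 200); # no array here: comma operator, last value wins
}

print "$count\n";   # 3
print "$last\n";    # 200
```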

Iterating through arrays - this topic is a bit more complex and needs scary things like loops, which I'll cover later so I'm going to run away from it now...


Array of arrays

I think my brain is exploding at trying to visualise this. You can put as many arrays in arrays in arrays in arrays as you like and end up with a multi-dimensional mess. But as long as you can understand it and everyone reading your code can understand it, then everything is fine.

There are, again, many ways to implement this. You can declare some arrays and then put all of these into one big array:
1. my @name1 = ("Emma", "Jane", "Howson", 23);
2. my @name2 = ("Willard", "Carroll", "Smith", 45);
3. my @name3 = ("Tyra", "Lynne", "Banks", 40);
4. my @name4 = ("Alecia", "Beth", "Moore", 34);
5. my @name5 = ("David", "Boreanaz", 44);
6.
7. my @perfectdinnerparty = (\@name1, \@name2, \@name3, \@name4, \@name5);
On line 7 you may have noticed the backslashes before the @ sigils. Without them, all five arrays would be flattened into one giant array: 23 would be the 4th element with an index of 3, "Willard" would become the 5th element with an index of 4, "Carroll" would be the 6th element with an index of 5, and so on. The backslashes actually turn the arrays into array refs, which I don't understand at the moment but I'll get to in a later post. All I know is they enable the structure of an array of arrays.
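To make the flattening concrete, here is a cut-down sketch with just two of the guests. @flat and @nested are names I've invented for the comparison, and ref() is a built-in that reports what kind of reference a value is.

```perl
use strict;
use warnings;

my @name1 = ("Emma", "Jane", "Howson", 23);
my @name2 = ("Willard", "Carroll", "Smith", 45);

my @flat   = (@name1, @name2);     # no backslashes: one flat 8-element array
my @nested = (\@name1, \@name2);   # backslashes: 2 elements, each an array ref

print scalar(@flat), "\n";     # 8
print $flat[4], "\n";          # Willard
print scalar(@nested), "\n";   # 2
print ref($nested[0]), "\n";   # ARRAY
print $nested[1][0], "\n";     # Willard
```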

Or you can declare them all at once:
my @perfectdinnerparty = (
    ["Emma", "Jane", "Howson", 23],
    ["Willard", "Carroll", "Smith", 45],
    ["Tyra", "Lynne", "Banks", 40],
    ["Alecia", "Beth", "Moore", 34],
    ["David", "Boreanaz", 44],
);

The outer brackets need to be parentheses () and the inner ones surrounding each array element need to be square brackets []. The square brackets indicate that the contents are also an array ref.

Another note is that you need to put commas between each outer array element (so between each set of square brackets), but you don't need to put a comma after the last array entry (the one after the "David" line above). I think it's good practice to do this anyway because if you need to add another element (another square bracket set) to the array, you only need to alter one line of code.

Accessing elements in an array of arrays

The process is very similar to an array although there are two numbers in brackets to find the location of the element you need. So to get the string "Smith" from my perfectdinnerparty array, I would use:

$perfectdinnerparty[1][2];

The first number is the index of the inner array within the super array and the second number is the index of the element within that inner array.
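Here is a sketch with a shortened guest list. The arrow and @{ ... } forms are reference syntax that I haven't covered properly, so treat those lines as a preview rather than an explanation.

```perl
use strict;
use warnings;

my @perfectdinnerparty = (
    ["Emma", "Jane", "Howson", 23],
    ["Willard", "Carroll", "Smith", 45],
    ["David", "Boreanaz", 44],
);

my $surname = $perfectdinnerparty[1][2];           # second guest, third field
my $same    = $perfectdinnerparty[1]->[2];         # same thing with the arrow spelled out
my @guest   = @{ $perfectdinnerparty[0] };         # copy one whole inner array
my $fields  = scalar @{ $perfectdinnerparty[2] };  # inner arrays can differ in length

print "$surname $same\n";   # Smith Smith
print "@guest\n";           # Emma Jane Howson 23
print "$fields\n";          # 3
```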

Another note

$var and @var are two completely different variables and have nothing to do with each other. You can't access one by using the other one so it's probably best not to use the same variable name to avoid confusion. Another case of just because you can do it, doesn't mean you should.
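A tiny sketch of why sharing a name gets confusing (the values are made up). The bit to watch is $var[2], which looks like it belongs to $var but is really an element of @var.

```perl
use strict;
use warnings;

my $var = "I am a scalar";
my @var = ("I", "am", "an", "array");

my $element = $var[2];   # an element of @var, nothing to do with $var

print "$var\n";       # I am a scalar
print "$element\n";   # an
print "@var\n";       # I am an array
```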


As always, please correct me if I'm wrong with anything and any suggestions of how to find out more are welcome.

10 comments:

  1. Sorry to say, but your explanation of the behaviour of this code:

    print scalar(100,300,200);

    is wrong (though I agree that the problem itself *is* quite unclear).

    The trick is, there is *no* array in this example. Were there an array involved, scalar() would return the number of elements in that array, as you correctly stated a bit earlier. However, in this case there is no array. This list of comma-separated numbers is an expression, and the comma acts here not as a separator, but as an operator. And this operator always returns its last argument. This (and some other quirks of Perl related to the arrays and the comma) is very well described here:

    http://www.modernperlbooks.com/mt/2013/11/context-and-the-comma-operator.html

    much better than I can do anyway, so I'll stop here.

  2. Really enjoying your blog!

    On the tip you say that "each element is separated by a space." But your code has the newline character, not a space character:

    print "@array\n";

    Replies
    1. Emma wrote it correctly. An array in double quotes is expanded into its elements joined by a single space. The newline is only printed after the last element of the array. Try it yourself.

      perl -E 'my @arr = ( 1, 2, 3 ); print "@arr\n4";'

      gives:

      1 2 3
      4

    2. To be completely precise, the space is used by default, but this can be changed by setting the $" variable to a different value. Check this:

      my @x = (1, 2, 3);
      print "@x\n";
      $" = ',';
      print "@x\n";

    3. Ah, OK, I understand now. (I'm a Perl newbie myself.) Thanks for the clarification, Barney.

  3. One more tiny nitpick - I think when you wrote you can find the length of an array with:

    $#arr-1

    You probably meant:

    $#arr+1

    Good stuff - keep it up.

  4. There is another way to assign to/read from arrays: slices. Slices let you assign to/read from multiple individual elements of the array at the same time. To make a slice you use an @ sigil instead of a $ sigil:

    my @arr;

    @arr[3, 1, 2, 0] = ("4", "is", "array", "this");

    print "@arr\n@arr[reverse 0 .. $#arr]\n";

  5. For references I highly recommend reading the perlreftut doc page. It makes things really easy. I also recommend never ever looking at the perldsc page. It's a crutch; better to really learn from reftut than to just mimic dsc.
