Friday, 30 May 2014

Loops and conditionals: Control your loop

I've used loop controls once and I can't remember why, but I know it was useful...

Loop controls, as the name suggests, give you more control over the flow of a loop. Without them, you just have to accept the way that for, foreach, while etc. loops function and work around them to serve your purpose. With them, you can modify the way a loop works. For example, you might want to jump out of a loop completely for reasons other than the loop's own condition no longer being satisfied.

There are three loop control operators:

last
next
redo

These operators can be applied to the five kinds of loops there are in Perl:
  • for
  • foreach
  • while
  • until
  • naked block
NB You can't apply them to an if statement because it's not a loop: the code block in an if statement only ever runs zero times or once, so there is nothing to repeat or skip.

Wait, what is this naked block thing? It sounds kind of weird...

This block is naked! It has nothing in front of its curly braces :-o

{
    code goes here
}

A naked block is just a block of code surrounded by curly braces, with no keyword or condition. It runs only once, as if the curly braces weren't there, so in my opinion it's not really a loop. You might wonder what the point is of putting curly braces around code. They can be useful if you want a variable to be in scope in only part of the code, because the guideline is to declare a variable in the smallest scope possible.
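
To make the scoping point concrete, here is a minimal sketch. The variable names $total and $bonus are made up for illustration:

```perl
use strict;
use warnings;

my $total = 0;

{
    # $bonus only exists between these curly braces
    my $bonus = 5;
    $total += $bonus;
}

# $bonus is out of scope here; mentioning it would be a compile error under strict
print "total is $total\n";
```

The outer variable keeps the result, while the helper variable disappears as soon as the block ends.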

Here's how the loop control operators work:

Last
This operator immediately ends execution and breaks out of the loop, no more iterations happen, even if you haven't finished going through all the elements in the array or the condition still evaluates to true etc.

If you're used to how Perl works, last may look a bit strange in the code below because it's a bareword (i.e. it has no prefix, suffix or sigil; it's just a word on its own). Barewords usually aren't allowed (under use strict) but there are a few keywords, including the loop control words, that are. If you try to type in a bareword that isn't one of these keywords, you will be complained at.
foreach (@myarray) {
    if ($_ eq "end") {
        last;
    }
    else {
        print "$_\n";
    }
}
This code goes through the array called @myarray and prints out each element until it gets to an element that is the string "end". The "end" element itself is not printed and the loop stops, so no further elements are printed.


Next
This operator is used when you want to abandon the current iteration but carry on with the loop. When next is hit, execution jumps to the end of the loop body (the closing curly bracket) and then the next iteration starts.
I've nicked this example straight from "Learning Perl" by Randal Schwartz so thanks people that wrote it :-)
1. while (my $line = <STDIN>) {
2.     foreach my $word (split(' ', $line)) {
3.         $total++;
4.         next if $word =~ /\W/;
5.         $valid++;
6.     }
7. }
The code above basically looks at words in the input and counts how many words there are in total and, of these, how many are real words. Here is a more detailed explanation if you'd like to know more:

Lines of input are read one at a time from <STDIN>. The split function splits each line on whitespace so that individual words can be looked at. Then, for each of the words in the line, the count of words ($total) is upped by one and, on line 4, the word is checked to see if it contains any non-word characters (anything but letters, numbers and underscores). This is done by /\W/, where \W matches any non-word character. If the word contains any non-word characters, next is invoked, the iteration ends and the next word is looked at. If the word doesn't contain any non-word characters, the valid count is incremented.
Hope that makes sense!

You may have noticed I've sneakily glossed over a few things like how the input is being taken from <STDIN> but just trust me (and the people that wrote the example) for the moment, that it works and does what it's supposed to.

Redo
This operator means go back to the beginning of the iteration you are currently in and do it again, without testing the condition. Why would we need this? Here's an example, again courtesy of "Learning Perl", but slightly modified - thanks guys!
1. my @words = qw( accommodate believe colleague disappear embarrass );
2. my $errors = 0;
3.
4. foreach my $word (@words) {
5.     print "Type the word '$word':  ";
6.     chomp(my $try = <STDIN>);
7.     if ($try ne $word) {
8.         print "Sorry - that's not right.\n";
9.         $errors++;
10.         redo;
11.     }
12. }
13. print "You completed the test with $errors errors.\n";
Ok so what does this mean? It's basically a spelling test and you have to keep writing the words in the @words array until you spell them correctly. It's not the best spelling test in the world because you are shown the word you need to type, but you get the idea.

So, for each of the words in the words array, the command line will display a prompt for the user to type in the word. Then line 6 uses the chomp operator, which cuts off any newline characters from the end of the word that the user has typed in (this is from pressing enter to submit the word they have spelt). The user input is assigned to $try. Then $try is compared to the correct version of the word from @words on line 7. If the word is spelt correctly, the next iteration begins, giving the user a new word to spell. If they have spelt it wrong however, the terminal tells them it's not right, one is added to the error count and on line 10, redo tells the program to do the iteration again, so the same word is asked for again.


All of the above examples put the loop control inside a conditional of some kind, usually a nested if statement. I'm trying to think of examples where you would use a loop control operator without any condition at all and I'm not sure they would be useful without one. Please let me know if you have any examples.
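
For what it's worth, the condition doesn't have to be a full if block. Here's a minimal sketch using statement modifiers instead; the @lines data and the "STOP" marker are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: some words, an empty entry and a stop marker
my @lines = ("alpha", "", "beta", "STOP", "gamma");
my @kept;

foreach my $line (@lines) {
    next unless length $line;    # skip empty entries
    last if $line eq "STOP";     # leave the loop completely
    push @kept, $line;
}

print "@kept\n";   # alpha beta
```

There is still a condition deciding when to jump, it's just written after the loop control rather than wrapped around it.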


Labelling Blocks

Labels are very helpful, they can actually control where you will be taken next. They also allow anyone else reading your code - or even yourself if you've forgotten - to see clearly what is going on.

Sometimes you will be nesting loops and blocks of code inside of each other and it can be hard to figure out where the loop controls will take you next. You can use labels to help with this, especially when you want to work with a loop block that's not the innermost one.

I've also been told that it's best practice to use labels even if you think it's easy to follow where the loop will go. This may be true for you, but bear in mind that other people may look at your code and not understand it, or the loop could eventually be expanded, and adding a label now will make things clearer in the future.

Labels are another example of a bareword and they are named like any other identifier in perl; they can include letters, digits and underscores, but can't start with a digit. Larry Wall has actually recommended that labels are all upper case and this is what I've come across in real-world code so far. Perl is case sensitive and having an uppercase label will avoid having a label with the same name as a built-in function or even one of your own subroutines.

To add a label, first you define the block of code that you want to label by writing the label name, followed by a colon, followed by the loop:

ELEMENT: foreach (condition) {
    code block
}

This has labelled the whole foreach loop as "ELEMENT", (not just the line) and it includes everything inside the curly braces, as well as the foreach statement. After labelling your loop, then you decide which of the operators you are going to use and where in the code block you are going to put it.
This example is courtesy of Andrew Solomon, who has taught me a lot about Perl and taught me about loop labels.
1. sub pick_random_colour { 'purple' };
2. my %betters = (
3.     amy => [qw/green/],
4.     ben => [qw/ blue purple yellow/],
5.     chloe => [qw/red orange pink/],
6. );
7.
8. my $win = pick_random_colour();
9.
10. HOORAY: foreach my $better (sort(keys(%betters))) {
11.     foreach my $ticket (@{$betters{$better}}) {
12.         if ($ticket eq $win) {
13.             print "Hooray! $better wins with $ticket\n";
14.             last HOORAY;
15.         }
16.         print "Nope, $better has no luck with $ticket\n";
17.     }
18. }
Ok, this is my largest example so far, but I think it really shows how labels work. Try it for yourself, run it, then take the label out, and then run it again and see the difference. Or you can look below and see the results:

With labels:
Nope, amy has no luck with green
Nope, ben has no luck with blue
Hooray! ben wins with purple

Without labels:
Nope, amy has no luck with green
Nope, ben has no luck with blue
Hooray! ben wins with purple
Nope, chloe has no luck with red
Nope, chloe has no luck with orange
Nope, chloe has no luck with pink

Notice that the "no luck with yellow" line is never printed, with or without the label.

The basic gist of this example is kind of like a lottery where each person can buy a ticket with a unique colour on it. Amy has bought one ticket (green) and Ben and Chloe have bought three tickets each (blue, purple and yellow, and red, orange and pink respectively). A random colour is picked and the winner is the one with the matching ticket.
I've hacked it a little bit and written the pick_random_colour subroutine so that purple is always picked instead of a random colour. This is just so I can be certain of the outcome each time and can really see the difference when I take the labels out.

So here is the full breakdown, again, skip this if you feel you understand the example already.
Line 1 is where my pick_random_colour subroutine is written, which picks purple every time. I'll explain subroutines in more detail in a later post but for now, they are just pieces of code written in one place and called in another. I've then declared a hash, %betters, with names as keys and a reference to an array of lottery tickets as each value, each ticket with a different, unique colour. Then the pick_random_colour subroutine is called on line 8 and of course, it's going to pick "purple". Now comes the exciting bit - the loop. For each better in the hash, and then for each ticket the better has, the ticket is compared to $win, which is purple. I've used sort on line 10 just so that the betters will be looked at in the same order each time. If the ticket is equal to purple, the string on line 13 is printed and then we get to line 14 with the loop control and label. This indicates that we should cut out of the inner loop as well as the loop that is labelled "HOORAY". As there is no code after the outer loop, this means that the program has finished. If the ticket is not equal to purple, the string on line 16 is printed and the search continues for whoever has the winning ticket.
If we take out the labels, but leave in the loop control, only the inner loop will be exited when the winner is found and the next better will be looked at. You can see from the results above that once the winning ticket from Ben has been found, Ben's remaining tickets are not looked at, but then Chloe's tickets are looked at.
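
Labels work with next as well as last. As a minimal sketch (the %tickets data and the PERSON label are made up, loosely based on the lottery example), this gives up on a person as soon as a blue ticket is seen and moves straight on to the next person:

```perl
use strict;
use warnings;

# Hypothetical data, loosely based on the lottery example
my %tickets = (
    amy   => [qw/green/],
    ben   => [qw/blue purple yellow/],
    chloe => [qw/red orange pink/],
);

my @checked;
PERSON: foreach my $person (sort keys %tickets) {
    foreach my $ticket (@{ $tickets{$person} }) {
        push @checked, "$person:$ticket";
        # Abandon this person's remaining tickets and start on the next person
        next PERSON if $ticket eq 'blue';
    }
}
print "$_\n" for @checked;
```

A plain next here would only move on to ben's next ticket; next PERSON skips the rest of ben's tickets and carries on with chloe.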


Wednesday, 23 April 2014

Loops and conditionals: While/until

This is going to be a simple little post about the while statement. Its general format looks like this:

while (condition) {
    code block
}

As you can probably figure out, the code block is executed while the condition evaluates to true. As soon as the condition evaluates to false, the code block is passed over and the rest of the file is run. The conditions function exactly the same way as I wrote about in my if post, so if you need to brush up on what a condition is and how it works, go to http://www.perladventures.blogspot.co.uk/2014/04/ifelsifelse.html

Here's an example of how you can use while in your code:
my $i = 5;

while ($i > 0) {
    print "$i\n";
    $i--;
}
This will give output:
5
4
3
2
1

The condition is evaluated before the first iteration so if the condition is initially false, the code block will never be run.

Until

There's also a reverse version of while. I've never actually seen it in real code before and it looks just as difficult as unless.

until (condition) {
    code block
}

This time, the code block runs while the condition evaluates to false:
my $i = 1;
my $j = 12;

until ($i > $j) {
    print "$i\n";
    $i *= 2;
}
This will give output:
1
2
4
8

Again, the condition is evaluated before the code block is run. So if your condition initially evaluates to true, the code block will never be run.
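
If until messes with your head, it can help to see that it behaves like a while with the condition negated. This is a minimal sketch; the array names are made up so that the two results can be compared:

```perl
use strict;
use warnings;

my @from_until;
my $i = 1;
until ($i > 12) {
    push @from_until, $i;
    $i *= 2;
}

my @from_while;
$i = 1;
while (!($i > 12)) {     # the same test, negated
    push @from_while, $i;
    $i *= 2;
}

print "until: @from_until\n";   # until: 1 2 4 8
print "while: @from_while\n";   # while: 1 2 4 8
```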

The examples I've done are all to do with numbers but you can use pretty much anything you want.

Next in the loops and conditionals mini-series - loop controls

Thursday, 10 April 2014

Loops and conditionals: For each

Arrays and lists and things like that are all very well, but we often need a way of going through each element sequentially to make use out of them - maybe to search for something or to do some function on each element. This is where for and foreach loops come in.

A for(each) loop goes through each item in an array or a list or anything else you want to iterate through, for example the numbers 1-10. In fact, if the thing you put inside the brackets returns a list, it can be iterated over.

Its general layout looks something like this:

for(each) (array/list/whatever) {
    code block
}
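
As a small sketch of the "anything that returns a list" idea, here is one loop over a range and one over the result of sort. The variable names and the fruit are made up for illustration:

```perl
use strict;
use warnings;

# A range is just a list of numbers
my $sum = 0;
for my $n (1 .. 10) {
    $sum += $n;
}
print "sum of 1 to 10 is $sum\n";   # sum of 1 to 10 is 55

# sort returns a list, so its result can be looped over directly
my @lengths;
foreach my $word (sort qw(pear fig banana)) {
    push @lengths, length $word;
}
print "@lengths\n";   # 6 3 4 (banana, fig, pear)
```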

For or foreach?
I've just discovered that for and foreach are completely identical and do exactly the same thing. You can use them interchangeably; the underlying code for both is exactly the same. This makes me wonder why there are two different words for it, but unfortunately I don't think there is a real answer.
I wanted to find out which one programmers prefer and I asked around at work and it seems like the only reason they pick one over the other is convention. Most of the people I asked use for and foreach in the following way:

For is generally used if you don't have something specific to iterate through. You can initialise a variable, condition check and increment the variable.
for (my $i = 1; $i < 9; $i++) {
    print "$i ";
}
This just prints out numbers 1 to 8 with a space between each.

For those who are unfamiliar with the above -
my $i = 1 - initialising the variable to use.
$i < 9 - the condition that is checked before each pass; the loop keeps going while it is true.
$i++ - showing how much to increment or decrement the variable with each pass of the loop - in this case, it means plus one.

Basically all of this put together means, there is a variable called $i with a value of one. Start with $i = 1 in the first pass of the loop and add one to it each time you go through the loop until $i is no longer less than 9, then exit the loop.


Foreach is often used if you are going through an array or list or hash and you want to go through the elements of each one. For example:
foreach (@myarray) {
    print "$_\n";
}
This just goes through each item in the array and prints them out on individual lines.

You can however, swap the for and foreach around or you can use the same word for both usages, it's totally up to you but I think I'm going to stick with how I've described it as above.

The good thing about foreach loops in perl, which I haven't come across before, is that you don't have to explicitly say how big the array or whatever you're iterating through is and you don't have to tell it that you want to go to the next item once it's finished with the item it's on.

What is this $_?
In this case, it is a quick and anonymous way of referring to each individual item of the array, it's called the "default" variable. It represents the scalar that's being focussed on so in the case of a for loop, it's the list item or array item that's being currently looked at. It's kind of like using the word "it" in the English language - you know what you are referring to, but you're using a general word.

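It is generally considered bad practice to alter the original array while looping over it. Note that the loop variable, whether it's $_ or a named variable, is an alias for the real array element and not a copy, so assigning to it changes the array. Giving the variable a name, as I've done below, mainly makes the code easier to read - $item refers to each individual array item: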
foreach my $item (@myarray) {
    print "$item\n";
}
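
One thing worth knowing here: in a foreach loop the loop variable is an alias for the actual element, not a copy of it, so naming the variable doesn't protect the array by itself. A minimal sketch (the @prices and @original arrays are made up for illustration):

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);
foreach my $price (@prices) {
    $price *= 2;          # $price is an alias, so this changes @prices itself
}
print "@prices\n";        # 20 40 60

my @original = (10, 20, 30);
foreach my $value (@original) {
    my $copy = $value;    # a real copy; changing it leaves @original alone
    $copy *= 2;
}
print "@original\n";      # 10 20 30
```

So if you need a modified value inside the loop, copy it to a new variable first.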
You can also use foreach with hashes, it's very similar to arrays:
foreach my $key (keys %myhash) {
    print "this is the key: $key\n";
    print "this is the value: ".$myhash{$key}."\n";
}
This will print out the statements followed by the key on one line and the value on the next for each of the items in the hash.
Note that you should keep the "my". If you leave it out, the loop uses a package (global) variable called $key instead, which "use strict" will complain about unless you have declared it elsewhere.

You need to write the word "keys" inside the brackets before writing the name of the hash because what you are doing is getting a list of keys in the hash and then iterating over them. Within the code block, you can then use the key to get the values.

I then wondered about not just getting a list of the keys and iterating over them, but getting the keys and the values and iterating over them and I was told about "each". You can't really use it with a for loop and it looks something like this:
while (my ($key, $value) = each %hash) {
    print "key is $key, value is $value\n";
}
As you can see you use a while loop, which I haven't covered yet but basically it just goes through all of the keys and values and prints them out.
A warning does come with using each - you need to make sure that nothing else in your program is changing the hash you're iterating over, because if changes happen during the while loop you may end up skipping or duplicating entries.

Map
I think I have mentioned map before but this is definitely a place to write a reminder. It's just a slightly cleaner way of writing code that takes each member of an array/list and modifies or uses it in the same way to create a new list.
my @new_array = map { "this is the item: $_" } @myarray;
print "$_\n" for @new_array;

Thursday, 3 April 2014

Loops and conditionals: Ifelsifelse

The next things on my list are conditionals and loops. I think that these concepts are fundamental to actually making any programming language work and without them, there wouldn't be much to do, especially when combined with conditional logic.

The first one is the if statement. Its basic structure is something like this:

if (condition) {
    code block
}

This evaluates the condition in the parentheses, and if it's true, the code block will be run and if it's false, the code block will be passed over and the rest of the file will be run.

Boolean Logic

How do I make the condition equal to true or false?
Perl doesn't actually have any specific true or false objects or identifiers so we have to make the condition evaluate to values that themselves are either true or false.

What evaluates to true or false?
Basically everything is true, apart from undef, "0", an empty string ("") and anything that evaluates to any of these.

The table below shows how you can get true or false values:

True              False
"0.0"             ""
" "               "0"
1                 undef
+ve integers      0                     # converts to "0"
-ve integers      0.0                   # computes to 0 and then converts to "0"
other strings     unassigned variable   # evaluates to undef
                  ()                    # empty list
                  ("")                  # empty string in a list

You can put any of these as the condition, for example:

if (7) {
    code block
}

my $value = 0.0;
if ($value) {
    code block
}

The code block will run in the first example because 7 is a positive integer and positive integers evaluate to true. The code block will not run in the second example because 0.0 evaluates to false.
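
If you want to check values like these for yourself, here is a minimal sketch that tests a handful of them and reports what Perl makes of each. The @candidates list and variable names are my own choice:

```perl
use strict;
use warnings;

# A mix of values that look true or false
my @candidates = ("0.0", " ", 1, -3, "hello", "", "0", 0, 0.0, undef);

my @verdicts;
for my $value (@candidates) {
    my $shown   = defined $value ? "'$value'" : "undef";
    my $verdict = $value ? "true" : "false";
    push @verdicts, $verdict;
    print "$shown is $verdict\n";
}
```

The first five values come out as true and the last five as false. Note that the string "0.0" is true even though the number 0.0 is false.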

You don't have to put single values into the condition - you can also use boolean operators to make things more interesting and with these, you can make expressions that evaluate to true or false.


Boolean Operator    Meaning
>                   Greater than (numerical)
<                   Less than (numerical)
==                  Equal to (numerical)
gt                  Greater than (string)
lt                  Less than (string)
eq                  Equal to (string)
!                   Not
&&                  And
||                  Or

my $temperature = 28;
if ($temperature > 25) {
    print "it's hot!";
}

my $value = 6;
if ($value < 8 && $value > 5) {
    print "this number is between 5 and 8";
}

my $string = "hello";
if (($string eq "hello") || ($string eq "hi")) {
    print "hello there";
}

Of course you can put more interesting things inside the code blocks than just a print statement, but you get the idea.

You can also experiment with making things more complicated and incorporate as many ands and ors as you want, in any combination of boolean operators, or do crazy things like XOR and NAND.

To use not, you just put the exclamation mark in front of the expression you want to evaluate:
(!($string eq "hello"))
I think the inside brackets are optional but it makes things clearer and shows that you want to negate the whole of the expression rather than just the variable.

Note
If you try to compare two numbers using the string equal to operator (eq) the numbers will be stringified and then compared. If your two numbers are the same, the condition will evaluate to true as the strings will be the same. If the numbers are different, you will get false.

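If you try to compare two strings using the numerical equal to operator (==) you will get warnings (if warnings are switched on) and, for ordinary words, it will evaluate to true even if the strings are not the same. This is because a string that doesn't start with a number is treated as 0 in a numeric comparison, so you end up comparing two 0s. Strings that do start with a number are compared using that number, so "3 apples" == "4 pears" is false.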
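
Here is a minimal sketch of both mix-ups. The variable names are made up, and the "numeric" warning is switched off only around the one comparison that would trigger it:

```perl
use strict;
use warnings;

# eq compares the text, == compares the numeric value
my $same_as_strings = ("10" eq "10.0") ? "yes" : "no";   # no: the strings differ
my $same_as_numbers = ("10" == "10.0") ? "yes" : "no";   # yes: both are the number 10

my $words_equal;
{
    no warnings 'numeric';   # silence "isn't numeric" for this demonstration only
    # Neither word starts with a number, so both count as 0
    $words_equal = ("apple" == "banana") ? "yes" : "no";
}

print "eq: $same_as_strings, ==: $same_as_numbers, words: $words_equal\n";
```

This prints "eq: no, ==: yes, words: yes", which is why it matters to pick the operator that matches the kind of data you have.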

Else

Sometimes you want something to happen if the if condition isn't satisfied. You can then add extra code to say what you want to do if the if condition evaluates to false by using else. The else immediately follows the if as follows:

if (condition) {
    code block
}
else {
    code block
}

This way some code will always be run no matter what the condition evaluates to. If it evaluates to true, only the first code block is run; if it evaluates to false, only the second code block is run. Then the rest of the file is run as normal.

This example could work in a shop selling alcohol (can you tell I used to work in a supermarket?):

if ($customer_age > 17) {
    print "sale authorised";
}
else {
    print "sale denied";
}

Elsif

What happens if you want to put multiple ifs together? Say that you want to test one condition, and if it's not satisfied, you want to test another condition. This is where elsif comes in, and for reasons unknown to me, it is spelt with a missing 'e'. Excellent.

if (condition) {
    code block
}
elsif (condition) {
    code block
}

If neither of the conditions is satisfied, neither of the code blocks will be run. If both the first and the second condition are satisfied, only the first code block will be run because the second condition is only evaluated if the first condition is not true.

For example:
my $x = 50;
my @small_numbers;
my @medium_numbers;

if ($x < 101) {
    push(@small_numbers, $x);
} elsif ($x < 201) {
    push(@medium_numbers, $x);
}

This code is just looking at the number and putting it into an array of small or medium numbers. So here I've defined small as 100 or less and medium as 200 or less. If the number is larger than 200, nothing will happen. I know this makes no sense in the real world but I think it works as an example.

You can write it like this or you can add an "else" onto the end if neither of the two conditions are satisfied.

You can chain as many elsifs as you want together but only one else can be put onto the end.

I couldn't think of a real life example so, again, here are a bunch of numbers:

my $x = 5;
my @small_numbers;
my @medium_numbers;
my @large_numbers;
my @very_large_numbers;

if ($x < 4) {
    push(@small_numbers, $x);
} elsif ($x < 8) {
    push(@medium_numbers, $x);
} elsif ($x < 20) {
    push(@large_numbers, $x);
} else {
    push(@very_large_numbers, $x);
}

This code is just looking at the number and putting it into an array of small, medium, large or very large numbers. So here I've defined small as below 4, medium between 4 and 7, large between 8 and 19 and very large as 20 and above. I know this makes no sense in the real world but I think it works as an example.

Again, make sure you spell elsif properly!!!


Unless

This one is the exact opposite of if and when I see it, it always messes with my head and I have to think about it for a moment - every single time! I think that it's used when it looks cleaner to use rather than having double negatives everywhere or having lots of ands and ors.

unless (condition) {
    code block
}

Here the code block only runs if the condition is not satisfied:
unless ($age < 18) {
    print "Sale approved";
}
This just means that sale approved will only be printed if the customer is 18 or over.

Tuesday, 11 March 2014

Hashes

Hashes are another kind of variable and in my opinion they are pretty similar to arrays. They are indexed, the same as arrays, but instead of being indexed with a number, they are indexed with a user defined string and this index is called a key. This gives us a key/value pair. A good example of this is a dictionary (an actual book dictionary) where the word is the key and the definition is the value.
The value of a hash can be any type of variable - it can be a number, a string, another hash, an array or even a variable name...

You can tell that you are looking at a hash if it's prefixed by a % sigil. I think it makes more sense to have a hash (#) as the sigil if you're going to call it a hash but never mind, a hash is already reserved for comments.
%hash
A hash can actually be modelled by using arrays like this:
my @french_words = ("bonjour", "au revoir", "merci");
my $hello = 0;
my $goodbye = 1;
my $thank_you = 2;
print $french_words[$goodbye];
But it looks really strange and clumsy and who wants to bother with all that when a hash works perfectly well?


Declaring and assigning

Again there are lots and lots and lots of ways of declaring and assigning a hash.
You can declare the hash and then add one element at a time:
1.   my %hash;
2. $hash{"up"} = "down";
3. $hash{"top"} = "bottom";
4. $hash{"charm"} = "strange";
What does this all mean? - First, I've declared the hash in line 1 using the % sigil and in lines 2 to 4 I've added individual elements to it. The individual elements of the hash are prefixed with a $ because they each hold an individual item - a scalar. Then comes the hash name followed by the key you want in curly brackets; the key can be any string. You then assign this to what you want as a value, more on what you can assign to it later.

or you can declare everything all together:
my %hash = ("up", "down", "top", "bottom", "charm", "strange");
Here I've assigned a list to the hash. The list needs to have an even number of elements because consecutive elements are paired up into key/value pairs. If you have an odd number of elements, the last value is going to be undef. In this hash the key/value pairs are up-down, top-bottom and charm-strange.

This version is more readable:
%hash = ("up" => "down", "top" => "bottom", "charm" => "strange");
so you can see which are the keys and which are the values more easily.

Or you can do the even more readable and, I think, preferred by the perl community:
1.   my %hash = (
2.     up => "down",
3.     top => "bottom",
4.     charm => "strange",
5. );
This is exactly the same as the method above it, only spaced out to be clearer.
And of course if I was coding properly all of the "fat commas" (these things "=>") as they're called would all line up, but this blog thing won't let me do that!

Also the thing about the comma at the end of line 4 applies here as well - you don't need it but it's a good idea to put it there to prevent unnecessary line changes if extra elements are added to the end of the hash. See my array post for more details.

You may have noticed that in my last code example, I didn't put quotes around the keys. You don't actually need them because the key is always going to be a string and perl knows this. You only need to put quotes if you're including whitespace or other special characters such as "-", so in these cases you need to explicitly stringify (make into a string) the key by using the quotes.


Printing
As far as I can see, when you print out a hash, the pairs don't necessarily come out in the order you declared them in, but within a single run of the program they come out in the same order every time you print them, as long as the hash isn't changed. This is because they are printed in the hash's internal order, which can't be relied on: it will change if you add or delete key/value pairs, and it also changes with the version of perl you're using because the way the keys are ordered has changed several times. From perl 5.18 onwards the order can even differ from one run of the program to the next.

This is one easy way to print a hash that involves a for loop, which I haven't covered yet. But for the moment, just trust me that it works. This will print out all of the key/value pairs with nice spacing:
print "$_ $hash{$_}\n" for (keys %hash);
Writing print %hash does the same thing but squashes everything together.
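
If you need the pairs to come out in a predictable order, one common approach is to sort the keys before looping over them. This is a minimal sketch; @lines is just a made-up helper array so the result can be inspected:

```perl
use strict;
use warnings;

my %hash = (up => "down", top => "bottom", charm => "strange");

# sort puts the keys in alphabetical order, whatever the internal order is
my @lines;
for my $key (sort keys %hash) {
    push @lines, "$key $hash{$key}";
}
print "$_\n" for @lines;
```

This prints charm, top and up in that order every time, no matter how the hash is stored internally.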

Note The trick I showed to print arrays in my last post doesn't work at all here. If you type print "%hash\n" you will get "%hash" printed to the screen.


Accessing and using elements of the hash

Deleting Elements
Hashes are not fixed sizes so you can add to them and delete from them as you like. Unlike with arrays, when you delete an element from a hash there won't be any undefs left behind, because the key and its value are removed together (an undef only appears if you assign undef as the value yourself rather than deleting the pair). This means you don't need to worry about having any gaps in your hash.

To delete a key/value pair, you can do this:
delete $hash{"up"};
Note that you only have to specify the key and the value will be automatically found and deleted as well. I guess this would be useful in cases where you might not know the value. Maybe.

Adding Elements
This is exactly the same as when you're first creating the hash and adding one element at a time. If you want to add more elements later on, you just do exactly the same thing:
$hash{key} = "value";
There is no need to use "my" because the hash has already been declared, you're just adding to an existing variable.

What can I put in my hash?
You don't always have to have the value as a string, it could be a number or a variable containing a string or a number:
$hash{key} = $value;
Or an array or a hash:
$hash{key} = \@values;
Be very careful when doing this, you have to make sure that you give it a reference to an array or a reference to a hash rather than the thing itself. I'll go more into these later on but this is how to do it for now. You either put a backslash in front if you're using a variable as above or, if you're putting the hash or array straight in, you need to use [] for arrays and {} for hashes.

The reason for doing this is because, if you don't, the array or hash will just become part of the original hash you've created, new keys and values will be created. Hopefully this example will explain what I mean:

1. my %address = ('Line One' => "5 The Street", 'Town/City' => "London", 'Post Code' => "W15 9QT");
2.
3. my %person = (
4.     Name => "Emma",
5.     Age => 23,
6.     Height => "164cm",
7.     Address => \%address,
8. );

The hash drawn out will now look something like this (as you would expect):

Name => Emma
Age => 23
Height => 164 cm
Address => (
    Line One => 5 The Street
    Town/City => London
    Post Code => W15 9QT
)

I created a script that would run the code above but I took out the backslash on line 7. Here is what came out:

Name => Emma
Age => 23
Height => 164cm
Address => Post Code
W15 9QT => Line One
5 The Street => Town/City
London => undef

This is clearly not what we wanted: instead of a hash within a hash, there is only one big hash.

And also, if you were reading carefully before, you'll see that I said each element of the hash contains a scalar. Arrays and hashes aren't scalars but their references are, so this is another reason why you must make any hashes or arrays into references if you don't want them to be flattened into the outer hash.

Moral of the story, make sure you use a reference if you're going to do a hash within a hash or an array within a hash!!!
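
To make the [] and {} option a bit more concrete, here's a small sketch that builds a nested hash without naming the inner array or hash first, and then reads the values back out with an arrow. The Pets key and its contents are made up for illustration.

```perl
use strict;
use warnings;

my %person = (
    Name    => "Emma",
    Pets    => [ "cat", "dog" ],        # [] gives a reference to an anonymous array
    Address => {                        # {} gives a reference to an anonymous hash
        'Town/City' => "London",
        'Post Code' => "W15 9QT",
    },
);

# Use -> to follow the reference and get at the inner values
my $town      = $person{Address}->{'Town/City'};
my $first_pet = $person{Pets}->[0];
my $pet_count = scalar @{ $person{Pets} };   # dereference the whole array to count it

print "$town $first_pet $pet_count\n";       # London cat 2
```

Each value in %person is still a single scalar, it just happens to be a reference in two of the three cases.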


Editing Elements
If you want to change a value, you just assign it to the key and the old value will be overwritten:
$hash{key} = "new value";
If the key doesn't already exist, a new key/value pair is created. So you need to be careful with the spelling when you want to edit a key that's already there, or you could end up with the original key and a misspelled version of the key.

Changing the key of a key/value pair is a lot more tricky. On looking up ways to change it, I think the best way to do it is to delete the key/value pair and start again. You can change it, but it's a lot more code than just deleting and starting again.
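
For what it's worth, the delete-and-start-again approach can be squeezed into one line, because delete hands back the value it removed. A minimal sketch, with a deliberately misspelled key (Wieght) that I've invented for the example:

```perl
use strict;
use warnings;

my %hash = (Wieght => "70kg", Height => "190cm");

# delete returns the value it removed, so it can go straight under the new key
$hash{Weight} = delete $hash{Wieght};

print join(", ", sort keys %hash), "\n";   # Height, Weight
print "$hash{Weight}\n";                   # 70kg
```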

Duplication
Duplicate keys are not allowed although duplicate values are. If there are duplicate keys declared, only the last one will be acknowledged and the rest will be disregarded. So if we did something like this:
my %hash = (
    "Name" => "Fred",
    "Weight" => "70kg",
    "Height" => "190cm",
    "Weight" => "75kg",
);
print $hash{Weight};
The answer printed will be 75kg.

Can you get the element key from the value?
Kind of. With some coding. There's unfortunately no easy trick here. Also values don't have to be unique so you could end up with the wrong key.
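
Here are two ways the "some coding" could look. This is only a sketch and the hash contents are invented. The first collects every key that has a given value, the second flips the hash round with reverse, which only behaves itself when the values are unique.

```perl
use strict;
use warnings;

my %colour_of = (apple => "red", banana => "yellow", cherry => "red");

# Option 1: collect every key whose value matches
my @red_things = grep { $colour_of{$_} eq "red" } sort keys %colour_of;

# Option 2: flip the hash so the values become keys.
# "red" appears twice, so only one of apple/cherry survives the flip
# and you can't rely on which one it will be.
my %thing_of = reverse %colour_of;

print "@red_things\n";          # apple cherry
print "$thing_of{yellow}\n";    # banana
```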

Adding hashes together

This is really easy: assuming you already have two hashes with things in them (%hash_one and %hash_two), you can just do this:
my %hash_three = (%hash_one, %hash_two);
Easy!
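
One thing to watch out for is what happens when both hashes share a key. A small sketch with made-up contents:

```perl
use strict;
use warnings;

my %hash_one = (Name => "Fred", Weight => "70kg");
my %hash_two = (Weight => "75kg", Height => "190cm");

# Both hashes are flattened into one list. When a key appears twice,
# the later pair (the one from %hash_two) wins.
my %hash_three = (%hash_one, %hash_two);

print "$hash_three{Weight}\n";            # 75kg
print scalar(keys %hash_three), "\n";     # 3
```

So the order you put the hashes in matters if they overlap.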

Exists - is the key already in the hash?
This is useful because duplicate keys aren't really allowed so you can check first if the key already exists before you add a new key/value pair.
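
A quick sketch of how that check might look (the keys here are just examples):

```perl
use strict;
use warnings;

my %hash = (Name => "Fred", Nickname => undef);

# exists checks for the key itself, even when the value stored is undef
my $has_nickname = exists $hash{Nickname} ? "yes" : "no";
my $has_age      = exists $hash{Age}      ? "yes" : "no";

# Only add the pair if the key isn't already there
$hash{Name} = "Bob" unless exists $hash{Name};

print "$has_nickname $has_age $hash{Name}\n";   # yes no Fred
```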

And finally...
To get a list of all the keys in the hash:
print keys(%hash);
And to get a list of all the values in the hash - you guessed it:
print values(%hash);
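
Printed like that, the items come out squashed together and in no particular order, so in practice I'd probably sort and join them. A small sketch, using made-up hash contents:

```perl
use strict;
use warnings;

my %hash = (Name => "Fred", Weight => "75kg", Height => "190cm");

# Hash order isn't predictable, so sort the keys for repeatable output
my @keys   = sort keys %hash;
my @values = map { $hash{$_} } @keys;   # values in the same order as the sorted keys

print join(", ", @keys), "\n";      # Height, Name, Weight
print join(", ", @values), "\n";    # 190cm, Fred, 75kg
```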

Wednesday, 19 February 2014

How does sort actually work?

I talked about sort in my last post, but I started to read more about it and got wondering, how does it actually work?

You can use sort to order lists numerically or lexically. To sort lexically you use the cmp operator within the curly braces:
sort {$a cmp $b} ("please", "sort", "this", "list");
and for sorting numerically, you use the <=> operator:
sort {$a <=> $b} (13,2,8,1,3,1,5,21);
What this means is that $a and $b refer to the two strings or numbers that are being compared each time the algorithm needs to know which one goes first. Both cmp and <=> give back -1 if the value on the left should come first, 0 if the two are equal and 1 if the value on the right should come first. Because $a is written in front of $b, the "smaller" value ends up in front, so words beginning with a letter that comes earlier in the alphabet are sorted in front of the other words.
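
You can try the two operators on their own to see what sort has to work with. A minimal sketch:

```perl
use strict;
use warnings;

# <=> compares numbers, cmp compares strings; both give -1, 0 or 1
my @numeric = (2 <=> 10, 10 <=> 10, 10 <=> 2);
my @lexical = ("apple" cmp "banana", "2" cmp "10");   # as strings, "2" comes after "10"

# sort sets $a and $b to two elements at a time and uses the result
# of the block to decide which one goes first
my @sorted = sort { $a <=> $b } (13, 2, 8, 1, 3, 1, 5, 21);

print "@numeric\n";   # -1 0 1
print "@lexical\n";   # -1 1
print "@sorted\n";    # 1 1 2 3 5 8 13 21
```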

Perl 5.6 and earlier used the quicksort algorithm; perl 5.7 and onwards use the merge sort algorithm. On wikipedia (http://en.wikipedia.org/wiki/Merge_sort) there's a pretty good animation at the top of the page that shows how merge sort works.

Quicksort - perl 5.6 and earlier

I remember learning this at school and uni and thinking it's pretty cool but also wondering what it was used for. We had to write a program in Java that carried out a quicksort. I should have just submitted one line of perl code...

Quicksort is a divide and conquer algorithm and there are different variations of it based on which position you choose for the pivot (explanation very soon), but I couldn't find out which version was used for sort in perl. So for simplicity, and because it's how I was taught, I'm going to choose the pivot as the value in the middle of the list.

So we have a list of numbers (assuming we're using numbers and sorting them in ascending numerical order) and the value in the centre of the list is the pivot. All numbers lower than the pivot are put on the left of this pivot and all numbers higher are put on the right. This creates three different lists - numbers lower than the pivot, numbers higher than the pivot and the pivot itself. The pivot list is one element long and is considered sorted so now we need to sort the other two lists in the same way. For each list, a pivot is chosen and again, lower numbers go on the left and higher numbers go on the right. This now creates seven lists. This continues until you have lists containing only one element, putting these lists all together will give one sorted list. Here's an example (please feel free to skip the example if you already know how it works or if it doesn't interest you!):

(4,2,9,6,5,1,8,7,3)

The number in the centre of the list, 5, is the pivot.

Starting from the left, numbers lower than 5 are put into one list to the left and numbers higher than 5 are put into another list on the right. 4 is taken out first giving:

(4)  (2,9,6,5,1,8,7,3)

Then 2 is taken out and put in the list on the left:

(4,2) (9,6,5,1,8,7,3)

Then 9:

(4,2) (6,5,1,8,7,3) (9)

This is repeated, excluding the pivot, until we have three lists with the pivot on its own:

(4,2,1,3)  (5)  (9,6,8,7)

Pivots are now chosen from any lists with more than one element. This time, as there is no real middle element, I will use the (n+1)/2th element, rounded up, which is the third element of each four-element list: 1 and 8.

(4,2,1,3)  (5)  (9,6,8,7)

The same is applied starting with the first list - all numbers lower than 1 go on the left and all numbers higher than 1 go on the right:

(1) (4,2,3) (5) (9,6,8,7)

The same is applied to the other list:

(1) (4,2,3) (5) (6,7) (8) (9)

Again, pivots are picked for the lists with more than one element (2 and 7 this time):

(1) (4,2,3) (5) (6,7) (8) (9)

And the same sorting is applied giving:

(1) (2) (4,3) (5) (6) (7) (8) (9)

Then the final list that contains more than one element is sorted:

(1) (2) (3) (4) (5) (6) (7) (8) (9)

Put this all together and you have a sorted list:

(1,2,3,4,5,6,7,8,9)
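
If you'd rather read code than lists of brackets, here's roughly how the walkthrough above could be written in perl. This is just my sketch of the version I was taught (pivot from the middle, using the element just past the middle for even-length lists), not the code perl itself used, and the sub name quicksort is my own.

```perl
use strict;
use warnings;

# Quicksort with the pivot taken from the middle of the list
sub quicksort {
    my @list = @_;
    return @list if @list <= 1;               # nothing to sort

    my $middle = int(@list / 2);              # 9 elements -> index 4, 4 elements -> index 2
    my $pivot  = splice(@list, $middle, 1);   # take the pivot out of the list

    my @lower  = grep { $_ <  $pivot } @list;
    my @higher = grep { $_ >= $pivot } @list; # ties go to the right (an arbitrary choice)

    return (quicksort(@lower), $pivot, quicksort(@higher));
}

my @sorted = quicksort(4, 2, 9, 6, 5, 1, 8, 7, 3);
print "@sorted\n";      # 1 2 3 4 5 6 7 8 9
```

Using grep keeps the numbers on each side in their original order, which matches the way the lists are built up in the example.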

Merge Sort - perl 5.7 onwards

So all of that wasn't entirely relevant for those of you that have upgraded your perl version since 2002 because from perl 5.7, the sort function has used the merge sort algorithm.

Merge sort is also a divide and conquer algorithm. It works by splitting up the list into individual elements. Each element is compared to its neighbour and (again assuming we're using numbers and sorting them in ascending numerical order) the smaller number is put on the left and the larger on the right. Then the next pair is compared and so on until the end of the list. Then these sorted pairs are compared to each other and merged to make sorted groups of four. Then these sorted groups are merged with the other groups and this is repeated until the whole list is in one big, sorted group. Hopefully this example will make all this clearer (again, feel free to skip this bit if you don't really care!):

Say we have a list of:

(4,2,6,5,1,8,7,3)

We then split it into sublists which contain one individual element. This means we have eight sublists of single elements:

(4)  (2)  (6)  (5)  (1)  (8)  (7)  (3)

Then we compare the elements to their next door neighbour. First will be (4) and (2). 2 is smaller so goes in front of 4 and we have our first sorted pair of (2,4)

(2,4)  (6)  (5)  (1)  (8)  (7)  (3)

The smallest sublists are always compared first so we compare the next two single element sublists - (6) and (5) giving:

(2,4)  (5,6)  (1)  (8)  (7)  (3)

This is continued until all the sublists contain two sorted elements:

(2,4)  (5,6)  (1,8)  (7)  (3)

(2,4)  (5,6)  (1,8)  (3,7)

Then the pairs are compared to each other starting with (2,4) and (5,6) and always comparing the first element of each list. First 2 and 5 are compared, 2 is smaller so is taken out to make a new list:

(2,4)  (5,6)  (1,8)  (3,7)

(2

Then the first elements of the target sublists are compared so 4 and 5, 4 is smaller so is taken out making:

(4)  (5,6)  (1,8)  (3,7)

(2,4

Then there's only one target sublist left so elements are taken out one by one from the front:

(5,6)  (1,8)  (3,7)

(2,4,5

(6)  (1,8)  (3,7)

(2,4,5,6)

And finally:

(2,4,5,6) (1,8) (3,7)

Then the pairs (1,8) and (3,7) are merged in the same way, with these intermediate steps:

(2,4,5,6) (1,8) (3,7)

(1

(2,4,5,6) (8) (3,7)

(1,3

(2,4,5,6) (8) (7)

(1,3,7

(2,4,5,6) (8)

(1,3,7,8)

Giving us two sorted lists of:

(2,4,5,6) (1,3,7,8)

Finally these two sublists are compared in the same way to give one sorted list of

(1,2,3,4,5,6,7,8)

If that doesn't make sense, again, I highly recommend looking at the animation on the wikipedia page.
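
Here's a sketch of the same idea in perl, working from the bottom up like the example: start with one-element sublists and keep merging neighbours until only one list is left. It's my own illustration (the sub names merge and merge_sort are invented) and not the code that perl actually runs.

```perl
use strict;
use warnings;

# Merge two already sorted lists by repeatedly taking the smaller front element
sub merge {
    my ($left, $right) = @_;    # two array references
    my @merged;
    while (@$left && @$right) {
        # <= takes from the left list on a tie, which keeps equal items in order
        push @merged, ($left->[0] <= $right->[0]) ? shift @$left : shift @$right;
    }
    return (@merged, @$left, @$right);   # one list is empty; add what's left of the other
}

sub merge_sort {
    my @sublists = map { [$_] } @_;      # start with one-element sublists
    while (@sublists > 1) {
        my @next;
        while (@sublists) {
            my $first  = shift @sublists;
            my $second = shift @sublists;    # undef if there's an odd one out
            push @next, $second ? [ merge($first, $second) ] : $first;
        }
        @sublists = @next;
    }
    return @sublists ? @{ $sublists[0] } : ();
}

my @sorted = merge_sort(4, 2, 6, 5, 1, 8, 7, 3);
print "@sorted\n";      # 1 2 3 4 5 6 7 8
```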


Why was quicksort ditched in favour of merge sort?

According to the perl docs, the quicksort algorithm used was not stable and could go quadratic. What????

Those turn out to be two separate problems. A stable sort preserves the original order of list elements that compare equal. The quicksort that perl used didn't promise this, so two equal items could come out in either order, whereas the merge sort keeps them in the order they went in.

"Quadratic" is about run time. The run time of both of these algorithms comes out as O(n log n) when averaged over all arrays of length n. However, if quicksort keeps picking bad pivots, so that nearly everything ends up on one side of the pivot each time, its run time becomes O(n^2) - which is quadratic because the n is squared. Merge sort always splits the list evenly so it stays at O(n log n) even in the worst case, and this is why quicksort was replaced.
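
To see the O(n^2) behaviour in action, here's a sketch that counts comparisons. It uses a deliberately naive quicksort that always takes the first element as the pivot. That pivot choice is my assumption to make the problem easy to trigger, it is not how perl's old sort picked pivots. The second input is the same 50 numbers in a jumbled order.

```perl
use strict;
use warnings;

my $comparisons = 0;

# Naive quicksort: the pivot is always the FIRST element (demo assumption)
sub naive_quicksort {
    my @list = @_;
    return @list if @list <= 1;
    my $pivot = shift @list;
    my (@lower, @higher);
    for my $item (@list) {
        $comparisons++;     # one comparison against the pivot
        if ($item < $pivot) { push @lower, $item } else { push @higher, $item }
    }
    return (naive_quicksort(@lower), $pivot, naive_quicksort(@higher));
}

# Already sorted input is the worst case here: every split is 0 items vs all the rest
naive_quicksort(1 .. 50);
my $sorted_input_count = $comparisons;      # 49 + 48 + ... + 1 = 1225

# The same numbers 1..50 in a jumbled order (37 and 51 share no factors)
$comparisons = 0;
naive_quicksort(map { ($_ * 37) % 51 } 1 .. 50);
my $mixed_input_count = $comparisons;

print "sorted input: $sorted_input_count comparisons\n";
print "mixed input:  $mixed_input_count comparisons\n";
```

The sorted input needs n(n-1)/2 comparisons, which grows with n squared, while the jumbled input needs noticeably fewer.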

The perl docs do say that for "some inputs" on "some platforms" the original quicksort was faster, but no other information is given - I'm not sure which inputs or which platforms, but I'm sure that these are a minority. Also as an extra note, I think that 99.99% of the time, any time or performance differences will not be large enough to be noticeable anyway.



Thursday, 13 February 2014

Lists, Lists, Lists

List or Array?

I've written a post about arrays and kind of glossed over the array/list distinction. Some people got back to me about the fact that I hadn't really talked about lists at all so here I'm going to try to explore what a list is and how it differs from an array.

A list is just what it sounds like - a number of items separated by commas.
An array is assigned a list: it contains the list but it isn't the list itself. You can pick an element out of a list by index using a list slice, e.g. ("apples", "bananas")[1], but the functions that change the contents, like pop and push below, need an array.

So this is a list:

("apples", "bananas", "cherries", 45, 360)

We can assign it to an array:
my @array = ("apples", "bananas", "cherries", 45, 360);
And this is an array:

@array

In my post about arrays, I showed a way to get the number of elements in an array using scalar(@arr) and I also talked about using scalar on a list but I didn't explain it very well.

So if we have:
my @arr = ("this", "is", "a", "list");
The difference between
print scalar(@arr);
and
print scalar("this", "is", "a", "list");

is that the first one is giving the scalar value of an array, which is the number of elements in the array (4), and the second one is giving the scalar value of a list, which is the value of the last element ("list"). Strictly speaking, in the second case the commas are acting as the comma operator, which throws away everything except the last item. Two very similar statements, but giving you very different results - I think this is very strange behaviour.
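
If what you actually want is the number of items or one particular item, this is the sort of thing I'd write instead. It's only a sketch; the list slice on the last line is the way to index into a list without assigning it to an array first.

```perl
use strict;
use warnings;

my @arr = ("this", "is", "a", "list");

my $count = scalar(@arr);   # an array in scalar context gives its length
my $last  = $arr[-1];       # index -1 is the last element of the array

# A list slice: the index goes in square brackets after the parentheses
my $second = ("this", "is", "a", "list")[1];

print "$count $last $second\n";   # 4 list is
```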


Note: To save you from having to type loads of quotes in your list, which I know for me definitely slows down my typing and results in a lot of pressing the wrong keys, you can replace this:

my @arr = ("this", "is", "a", "list");

with this:

my @arr = qw(this is a list);

Much quicker. qw stands for "quote words". The only catch is that whitespace separates the list items, so if you want to include a space in one of your list items, you're going to have to use quotes.

You can do lots of things with lists and here are some of them.

The following only work on an array so you have to assign your list to an array first.

Pop

This removes the last element from the array and returns it, so you can assign it to a variable.

my @arr = (1,2,3,4,5);
my $val = pop(@arr);

$val has the value 5 and @arr is now (1,2,3,4)

Push

This one is kind of the opposite of pop. Instead of taking away from the end of the array, you add an element on to the end of it.

my @arr = (1,2,3,4,5);
push (@arr, "x");

@arr is now (1,2,3,4,5,"x")

Shift

This is a different opposite of pop. Instead of taking the last element of the array and assigning it to a variable, you take the first element.

my @arr = (1,2,3,4,5);
my $val = shift(@arr);

$val has the value 1 and @arr is now (2,3,4,5)

Unshift

This is the opposite of shift, and it does at the front of the array what push does at the end: an element is added to the front of the array.

my @arr = (1,2,3,4,5);
unshift (@arr, "x");

@arr is now ("x",1,2,3,4,5)


I think this little table sums up everything above:

Function    Beginning or end of array    Add or remove element
Pop         End                          Remove
Push        End                          Add
Shift       Beginning                    Remove
Unshift     Beginning                    Add
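
And here are all four working on the same array, one after the other, as a quick sketch:

```perl
use strict;
use warnings;

my @arr = (1, 2, 3, 4, 5);

my $last = pop(@arr);       # remove from the end:    $last is 5,  @arr is (1,2,3,4)
push(@arr, "x");            # add to the end:                      @arr is (1,2,3,4,"x")
my $first = shift(@arr);    # remove from the front:  $first is 1, @arr is (2,3,4,"x")
unshift(@arr, "y");         # add to the front:                    @arr is ("y",2,3,4,"x")

print "$last $first\n";     # 5 1
print "@arr\n";             # y 2 3 4 x
```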


Sort

You can also sort your lists using the sort function:

Lexically: 

If sort is used by itself with no parameters, it will sort the list in standard string comparison order, which basically means alphabetical order. You can either use sort directly on a list as I have below or you can assign the list to an array and use sort on the array. Just type "sort" before the list or before the array.
my @arr = sort("hello", "my", "name", "is", "emma");
print "@arr\n";
When the above code is run, it will come out with:
emma hello is my name
This can also be written as
sort {$a cmp $b} ("hello", "my", "name", "is", "emma");
Note
If you have words beginning with capital letters, these will always be sorted in front of lower cased words, even if they come after the lower cased word alphabetically.

To sort the words into a backwards order just reverse the a and b:
sort {$b cmp $a} ("hello", "my", "name", "is", "emma");
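
If the capital letters thing gets in your way, one common workaround is to compare lowercased copies of the words. A small sketch showing the difference:

```perl
use strict;
use warnings;

my @words = ("hello", "my", "Name", "is", "Emma");

my @default     = sort @words;                         # capitals come first
my @ignore_case = sort { lc($a) cmp lc($b) } @words;   # compare lowercased copies

print "@default\n";         # Emma Name hello is my
print "@ignore_case\n";     # Emma hello is my Name
```

The words themselves aren't changed, lc is only used while comparing.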

Numerically:
my @arr = sort {$a <=> $b} (7,9,4,2,8);
print "@arr\n";
To sort numerically, you use the <=> operator rather than cmp and you still use $a and $b to represent the two numbers being compared in the algorithm. When the above code is run, you will get:
2 4 7 8 9
Again, to sort backwards, you reverse the a and b:
my @arr = sort {$b <=> $a} (7,9,4,2,8);
print "@arr\n";
This will print:
9 8 7 4 2
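
It's worth seeing why you need <=> for numbers at all. With plain sort the numbers are compared as strings, which goes wrong as soon as they have different numbers of digits. A quick sketch with made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 2, 33, 100);

my @as_strings = sort @numbers;                  # compared character by character
my @as_numbers = sort { $a <=> $b } @numbers;    # compared as numbers

print "@as_strings\n";    # 10 100 2 33 9
print "@as_numbers\n";    # 2 9 10 33 100
```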
List mapping
 
The map function is used to transform lists element-wise. You can go through each element of a list and perform a function on it and a new list of the new values will be created.

To use the map function, you say that you want to put the result into a new array (@new_numbers in the code below), then you type an equals sign, then the word "map" and then what you want to do to each element in curly brackets. $_ refers to each individual element, kind of like x in a mathematical equation. In the code below what I've said is to take each element and multiply it by two. Then you type the list that you want to perform the operation on - you can either write the list out or give an array like I've done.
my @numbers = (1,2,3,4,5);
my @new_numbers = map {$_*2} @numbers;
print "@new_numbers\n";
Which will print:
2 4 6 8 10
You can also apply map to text:

my @text = ("this", "is", "a", "list");
my @new_text = map {$_.":"} @text;
print "@new_text\n";
Which will print:
this: is: a: list:

Grep

Grep is similar to map, although instead of applying a change to each element, it evaluates the result of the operation and if the result is true, the original value will be put into a new list; if the result is false, the original value will be filtered out.

Again you start with the array you want your new list to be put into (@multiples_of_two), then an equals sign and then the evaluation you want is put in curly braces. The evaluation must create an answer that is either true or false. Then you give the list, either written out as a list or as an array. The code below goes through the list of 1-10 and puts the multiples of two into a new list.

my @numbers = (1,2,3,4,5,6,7,8,9,10);
my @multiples_of_two = grep {$_%2==0} @numbers;
print "@multiples_of_two\n";
This will print:
2 4 6 8 10
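
Because sort, map and grep all take a list and give a list back, you can chain them together. Here's a small sketch with made-up numbers; it reads from right to left.

```perl
use strict;
use warnings;

my @numbers = (7, 2, 10, 5, 8, 1);

# Right to left: keep the even numbers, double them, then sort the result
my @result = sort { $a <=> $b } map { $_ * 2 } grep { $_ % 2 == 0 } @numbers;

print "@result\n";      # 4 16 20
```

The original @numbers array is left untouched by all three steps.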