Perl Adventures: January 2014

Tuesday, 28 January 2014

"Strings"

Strings

A string is a sequence of characters of any length, including zero characters, which is known as the empty string. The maximum length of a string is limited only by your computer's memory, which is probably far larger than any real string you're going to work with.

These are all examples of strings:

"Hello"
""
"This is a string"
"This 1 is a 2 string 3 including 4 some 5 numbers 8907"
"27195"
"☆☘☺"
"مرحبا"

Perl has full support for Unicode (basically any letter or symbol or number in any language you can think of), but if you're going to use characters outside the ASCII range in your program's source code (in variable names or in string literals), you'll need to use the pragma

use utf8;

I think Perl doesn't assume Unicode source code by default for historical reasons: when Perl was written, there was only a need to use ASCII characters.
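
Here is a minimal sketch of what the pragma changes, assuming the script file itself is saved as UTF-8. The variable names are just ones I made up for illustration. With use utf8 turned on, length counts characters rather than bytes:

```perl
use strict;
use warnings;
use utf8;   # tells Perl that this source file is written in UTF-8

my $stars  = "☆☘☺";
my $arabic = "مرحبا";

# With "use utf8", length counts characters, not bytes.
my $star_count   = length $stars;
my $arabic_count = length $arabic;

# Encode characters as UTF-8 on the way out, otherwise Perl
# warns about a "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');
print "$stars has $star_count characters\n";
print "$arabic has $arabic_count characters\n";
```

Note that use utf8 only affects how the source code is read. Output still needs its own encoding layer, which is what the binmode line is for.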

Quotes

Strings are surrounded by either single or double quotes or delimiters that imply quotes - more on these later. These quotes and delimiters are not part of the string itself, they just indicate that whatever is inside should be treated as a string.

What is the difference between having single and double quotes?

Single quotes mean that whatever is in the quotes is represented literally so newline characters or any other control characters are not interpreted as control characters, they are just printed out literally:

print 'Hello everyone\n';

will print:
Hello everyone\n
which is probably not what you want to see in most cases.

The only character that is interpreted is the backslash. This is to enable you to use a single quote in a string. For example:

'I\'m a Perl programmer'

The backslash tells Perl that the single quote is an apostrophe and not the end of the string. You can use a single backslash in your string as long as it isn't at the end of the string. If you do want a backslash at the end of a string, you need to put two in a row. I would say it is advisable to always write two backslashes in a row in case this kind of thing happens, as two backslashes will always print just one.

'This string contains one backslash \\'

This will print:
This string contains one backslash \
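
To check my understanding of single quotes, here is a small sketch that compares the same text in single and double quotes by measuring the length of each string. The variable names are my own and not from any book:

```perl
use strict;
use warnings;

my $single = 'Hello everyone\n';   # backslash and n stay as two separate characters
my $double = "Hello everyone\n";   # \n becomes one newline character

my $single_length = length $single;   # 16
my $double_length = length $double;   # 15

# Inside single quotes, \' gives a quote and \\ gives one backslash.
my $apostrophe    = 'I\'m a Perl programmer';
my $one_backslash = 'This string contains one backslash \\';

# A lone backslash in the middle of a single quoted string is kept as it is,
# so these two spellings produce the same string.
my $same = ('C:\temp' eq 'C:\\temp') ? 1 : 0;   # 1

print "$single_length $double_length $same\n";
```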

Double quotes interpret special symbols so you can include all sorts of control characters as well as displaying characters through octal and hex representations. You can also embed other variables into the string.

my $name = "Emma\n"; # the \n is a real newline, so printing $name shows 'Emma' and then a newline
my $string = "Hello $name"; # $name is interpolated, so printing $string shows 'Hello Emma' and then a newline

Delimiters are useful if you want to use quotes inside your string. You use them by first typing 'q' for a single quoted string or 'qq' for a double quoted string. Then you type the character that you want to be the delimiter, write out the string with no quotes and add the delimiter character at the end.

I thought a delimiter could be any character, but I tried to use letters and numbers and they didn't seem to work. It turns out they only work if you put a space between the q and the delimiter, and anyway using these wouldn't be very advisable because it doesn't look very clear. Any type of punctuation works, and it's usually best to pick a character that doesn't appear in your string, because otherwise you have to use a backslash to escape your delimiter character and all this can start to look a bit confusing.
For example:

'Hello World!'
is equal to all of these:

q(Hello World!)
q{Hello World!}
q.Hello World!.

"Hello World!"
is equal to all of these:

qq/Hello World!/
qq[Hello World!]
qq?Hello World!?

I told a lie - you don't always end the string with exactly the same character you started it with. If you look at the bracket examples, they always start with a left bracket and end with a right one. Interestingly, you can use two close brackets but you can't use two open brackets. Maybe because if you open a bracket it expects a close bracket and gets very confused and upset if you don't give it one.

You can use punctuation marks like $, @ and % as delimiters. This is also very inadvisable because these are commonly used to mark scalars, arrays and hashes and you don't want to confuse yourself or other poor people that have to look at your code.
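
A short sketch of q and qq in action, with made-up variable names. It shows that q behaves like single quotes, qq behaves like double quotes, and that bracket delimiters can contain matching brackets of the same kind:

```perl
use strict;
use warnings;

my $name = "Emma";

# q() behaves like single quotes: $name is left alone.
my $plain = q(It's "quoted" and $name stays literal);

# qq{} behaves like double quotes: $name is interpolated.
my $filled = qq{It's "quoted" and $name is filled in};

# Bracket delimiters may contain balanced brackets of the same kind.
my $nested = q(brackets (like these) can nest);

# A letter works as a delimiter if a space separates it from the q,
# but it is hard to read, so this is only here as a curiosity.
my $lettered = q xHello World!x;

print "$plain\n$filled\n$nested\n$lettered\n";
```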

String Manipulations

There are many many many things you can do with a string in Perl, far too many to list in this post. Some of the main ones and ones that I think would be useful are as follows:

Concatenation

To concatenate (join together) two or more strings you use the . operator.

my $string1 = "apple";
my $string2 = "pear";
my $string3 = "peach";
my $all = $string1.$string2.$string3;
print $all;

This will print:
applepearpeach

You don't have to just concatenate variable names, you can also add in new strings:

print $string1.", ".$string2." and ".$string3;

will print:
apple, pear and peach

Repetition

To repeat a string, you put the string in quotes, then x, then the number of times to repeat the string.

print 'Hello' x 5;

This will print:
HelloHelloHelloHelloHello

Interpolation

Interpolation means that you can include a variable in a string and if you put that string in double quotes, the value of the variable will be printed. You must use double quotes for interpolation, because if you use single quotes, the string will be printed literally and you will just see the variable name.

my $name = "Emma";
print "The variable \$name has the value $name";

This will print:
The variable $name has the value Emma
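
A few more interpolation cases that seem worth sketching out. The names and the email address are invented for the example. Curly braces tell Perl where a variable name ends, and an @ sign needs escaping in double quotes because Perl would otherwise try to interpolate an array:

```perl
use strict;
use warnings;

my $name  = "Emma";
my $fruit = "apple";

my $double = "Hello $name";     # interpolated: Hello Emma
my $single = 'Hello $name';     # literal:      Hello $name

# Braces mark where the variable name ends, so Perl looks for $fruit
# and not for a variable called $fruits.
my $plural = "Two ${fruit}s";   # Two apples

# @ starts an array in double quotes, so it has to be escaped.
my $email = "emma\@example.com";

print "$double\n$single\n$plural\n$email\n";
```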

And that's the end of my very long post, congratulations if you've made it this far!

Wednesday, 22 January 2014

Numbers

Many different types of numbers can be represented in Perl. The good thing is that you don't have to specify what kind of number you are using as the interpreter can work it out for itself.

Integers

Integers consist of any whole number either with a sign or without:

1
0
2014
-365
7875493754977009

Perl allows you to add underscores to integers to make them easier for a human to read so for the last number in the list you can write:

7_875_493_754_977_009

This makes absolutely no difference to what the number is or how it is handled, it's just easier for our inferior mortal brains to comprehend.

Perl stores integers in your machine's native format, which on most modern systems is 64 bits, so they are exact up to roughly 19 digits. Go beyond that and Perl quietly switches to a floating point representation, which means you start sacrificing some precision, for the same reason that floating points are not entirely precise. I don't think in practice anyone wants numbers that large very often, but if you do, the core Math::BigInt module handles integers that are limited only by your computer's memory.
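
Here is a minimal sketch of both points about integers: underscores make no difference to the value, and Math::BigInt (a module that ships with Perl) can hold integers far bigger than the native ones. The variable names are my own:

```perl
use strict;
use warnings;
use Math::BigInt;   # core module, no installation needed

# Underscores are only there for human readers.
my $same = (7_875_493_754_977_009 == 7875493754977009) ? 1 : 0;   # 1

# 2 to the power of 100 is far too big for a native integer,
# but Math::BigInt keeps every digit.
my $big      = Math::BigInt->new(2)->bpow(100);
my $big_text = $big->bstr;   # 1267650600228229401496703205376

print "$same\n$big_text\n";
```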

So, on to floating points...

Floating Points

Floating points are numbers with a decimal point and also with or without a sign.
As with most programming languages, integers are (mostly) represented exactly but floating points are an approximation of what the decimal places are. It seems to me that this is because numbers are stored in binary and some fractions, for example a tenth (0.1), can only be expressed with an infinite binary representation. Seeing as computers do not have infinite memory, we have to chop off the end of the number and store the closest thing we have.
I think this level of precision is sufficient for most applications but I guess for research or anything scientific that requires precision, something else is going to be needed.

1.5
-23.45
550.0
-1.0
5.43e24
-5.2e28
-5e-12
6.34e-13

The 'e' in the middle of some of the numbers above means that they are using an exponent. So 5.43e24 means 5.43 x 10^24, or 543 followed by 22 zeroes.
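
The classic way to see the approximation is to add 0.1 and 0.2. This is a sketch that assumes a typical Perl build using double precision floats. The tolerance of 1e-9 is an arbitrary value I picked for illustration:

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;

# Comparing floating points with == is risky because of the tiny errors.
my $exactly_equal = ($sum == 0.3) ? 1 : 0;

# Checking that the difference is very small is the usual workaround.
my $close_enough = (abs($sum - 0.3) < 1e-9) ? 1 : 0;

# print rounds to about 15 significant digits, which hides the error,
# so ask sprintf for more digits to see what is really stored.
print "print shows:   ", $sum, "\n";
print "sprintf shows: ", sprintf("%.17f", $sum), "\n";
print "equal: $exactly_equal, close enough: $close_enough\n";
```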

Non-decimal Integers

I always forget about these types of numbers, it makes me stop and remember that the only reason we count the way we do is because we have ten fingers and in a parallel world, eight-fingered martians are learning about this weird decimal way of counting.

I'm not really sure why anyone would use these in a real world application - maybe I'll see later on.

With non-decimal numbers, we have to indicate which type of numerical system we are using, otherwise they will just be treated as decimal numbers. The way to indicate these is really simple and not at all cumbersome:

Binary
0b1101110 - all binary numbers start with '0b'
Octal
0156 - all octal numbers start with a '0'
Hexadecimal
0x6e - all hex numbers start with '0x'

All of the above amount to 110.

These numbers can also be signed or unsigned.
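
A quick sketch to confirm the three literals really are the same number, and to show the built-in hex, oct and sprintf functions converting in both directions. One real-world place these turn up is file permissions, which are normally written in octal (for example 0755):

```perl
use strict;
use warnings;

# The three literals from above all have the value 110.
my $all_equal = (0b1101110 == 110 && 0156 == 110 && 0x6e == 110) ? 1 : 0;

# Converting strings to numbers: hex() reads hexadecimal,
# oct() reads octal and also understands a "0b" or "0x" prefix.
my $from_hex = hex("6e");
my $from_oct = oct("156");
my $from_bin = oct("0b1101110");

# Converting a number back to a string in another base.
my $as_binary = sprintf("%b", 110);   # 1101110
my $as_octal  = sprintf("%o", 110);   # 156
my $as_hex    = sprintf("%x", 110);   # 6e

print "$all_equal $from_hex $from_oct $from_bin $as_binary $as_octal $as_hex\n";
```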

Numeric Operators

Perl allows you to use operators on numbers so you can do calculations, these are the main ones:

+ addition
- subtraction
* multiplication
/ division - this will always give you the floating point value if the result of the division is not an integer
% modulus - the values are always reduced to integers first, for example 10.5 % 3.2 will be calculated using 10 % 3 where the answer will be 1
** exponentiation - for example 2**3 means 2^3 or two to the power of three where the answer will be 8

So now we can do things like:

my $sum = 24 + 12;
my $subtraction = 10.5 - 5.3;
my $exponentiation = $sum**$subtraction;
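
To see each operator behaving as described, here is a small sketch with the results in the comments. The variable names are just for illustration:

```perl
use strict;
use warnings;

my $division   = 7 / 2;        # 3.5 - not rounded down to 3
my $whole      = int(7 / 2);   # 3   - int() throws away the fraction
my $modulus    = 10.5 % 3.2;   # 1   - calculated as 10 % 3
my $power      = 2 ** 3;       # 8
my $precedence = 2 + 3 * 4;    # 14  - multiplication happens before addition

print "$division $whole $modulus $power $precedence\n";
```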

As always, please add comments to this or any other of my posts. I'm a beginner so I need feedback on anything I've written that's wrong and also any extra information or helpful links or anything else you can think of that will help me learn is appreciated.

Tuesday, 21 January 2014

my $scalar;

In most of the books I'm using, the first thing after all the set up and hello world stuff is scalars. This seems to be sensible to me because they're quite simple for a beginner to understand and seemingly fundamental to the Perl language.

From what I understand, scalars make up any singular piece of data - it's when Perl has one of something and often this will be a number or a string but not an array or hash or other complex data type. You can identify a scalar variable because it's prefixed by a $ sign so if you see a variable named:

"$something"

you know that it contains a singular piece of data.

Unlike some other programming languages (Java), Perl doesn't make you declare a type for each variable. It is intelligent enough to know that when you have typed a string in quotes, you want it to behave like a string and when you type a number, you want it to behave like a number.

This means there is no faffing around telling the interpreter what kind of variable you want when it can work it out for itself. All you need to do is tell it what you want the variable to be called and what you want in that variable. Simple.

There are ways of giving the variable a number but wanting it to behave like a string and vice versa but these have to explicitly be done. The default is to treat whatever it is as what it looks like - which sounds very sensible to me but I think there could be some potential problems with this so I guess it needs to be kept in mind.
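
As far as I can tell, it is the operator that decides whether a scalar gets treated as a number or a string. Here is a minimal sketch with made-up variable names:

```perl
use strict;
use warnings;

my $text   = "10";   # assigned as a string
my $number = 5;      # assigned as a number

my $added  = $text + $number;   # + wants numbers, so the result is 15
my $joined = $text . $number;   # . wants strings, so the result is "105"

# == compares as numbers, eq compares as strings.
my $numeric_equal = ("1.0" == 1)   ? 1 : 0;   # 1 - the same number
my $string_equal  = ("1.0" eq "1") ? 1 : 0;   # 0 - different characters

print "$added $joined $numeric_equal $string_equal\n";
```

This is probably one of the potential problems to keep in mind: using == on strings or eq on numbers can give surprising answers.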

Declaring and assigning a scalar

There are two ways to declare and assign a scalar variable; you can either do it all in one go:

my $name = "Emma";

or declare and assign separately:

my $name; #This is declaring the scalar
$name = "Emma"; #This is assigning it

This has basically said that there is a scalar variable called "name" and the value of this scalar is the string "Emma". The quotes just denote the beginning and the end of the string and won't actually be printed out. If you want to assign a number to a variable, you don't need to put the quotes but more on that later.

If we leave 'my' out and we have use strict turned on (which you should have!!), the program will fall to bits when you try to run it and complain at you with something that looks like this:

Global symbol "$name" requires explicit package name at declaration.pl line 5.
Execution of declaration.pl aborted due to compilation errors.

Not the friendliest of error messages, but I have seen worse. This error message could also mean that you've made a typo when writing a variable name. This will mean having different spellings between the declaration and any usage of it, which will make the interpreter cry.

So, now we know how to declare a scalar, the next question is what exactly can we store in these scalars? Posts to follow on numbers and strings...

Thursday, 16 January 2014

Hello World! - writing and running my first print program

Hmmm what kind of program shall I write first? I wonder if there are any easy start up programs. You know what's coming up...

#!/usr/local/bin/perl
use strict;
use warnings;

print "Hello World!\n";

and that's it. Really, nothing more. I can now go and run it.

How do I run it?

Easy, there are two different ways you can run a perl script.

1. type "perl filename" into the command line, replacing "filename" with the name of your script including the extension.

2. make the file executable by typing "chmod +x filename" into the command line, again replacing filename with the name of your script including the extension. This command has changed the access permissions to the file and the "+x" means that we have added execution rights for all users.

We can now run the file in the command line by typing "./filename".

This now appears on the terminal:

Hello World!

What have I actually achieved?
This is of course the infamous Hello World script, which is a rite of passage for every programmer learning a new language. This very small 4/5 line script (have you seen this, Java?) has allowed me to print the words in quotes to the console. Print takes a list so you can either give it one thing to print (some text in quotes) or a list of things to print (more than one set of text in quotes separated by a comma). It then stringifies (as it sounds, turns into a string)* the items passed to it, in this case "Hello World!\n". Then this string (or list of strings) is printed to the standard output channel, which I will explain below.

*not entirely sure how this works but I don't think this post is the place to delve into it. I will do a separate stringy post later on.
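
To illustrate the point about print taking a list, here is a small sketch. The @words array is something I made up for the example. Each item is turned into a string and printed one after the other with nothing in between:

```perl
use strict;
use warnings;

# One item in the list.
print "Hello World!\n";

# Several items separated by commas.
print "Hello", " ", "World!", "\n";

# Numbers are stringified on the way out.
print 6 * 7, "\n";

# An array is a list too, so print accepts it directly.
my @words = ("Hello", " ", "World!", "\n");
print @words;

# join builds the same text as a single string.
my $as_one_string = join("", @words);
```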

Output Channels
When we use the word "print" followed by whatever it is we want to print, it is actually as if we have written "print STDOUT "some text"". STDOUT means that the standard output channel is being used and since this is the default, we don't have to explicitly write it. The standard output channel is directed by default to the terminal screen but this can be changed to a file. There is also another output channel called STDERR, usually reserved for error information, which also is defaulted to go to the terminal screen. This time however, you have to use the whole term "print STDERR" if you want to print to this channel. You will not be able to tell the difference by looking at the output on the screen which text is from the standard output and which is from the standard error channel. To separate these, you can direct one or both of them to separate files.
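
A minimal sketch of printing to both channels. The script name channels.pl and the file names out.txt and err.txt mentioned afterwards are made up for the example:

```perl
use strict;
use warnings;

# Writing STDOUT explicitly is the same as a plain print.
my $out_ok = print STDOUT "This line goes to standard output\n";

# STDERR has to be named, there is no shorthand for it.
my $err_ok = print STDERR "This line goes to standard error\n";
```

If this is saved as channels.pl, then in a Unix-style shell running "perl channels.pl > out.txt 2> err.txt" should send the first line to out.txt and the second to err.txt, instead of both appearing on the screen.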

This is a link to the print section of the perldocs, if you want to read about it in more detail. I'm personally not advanced enough to understand it all yet.
http://perldoc.perl.org/functions/print.html

\n
The \n is an indicator to start a new line so if I wrote another print statement in the script, it would appear underneath "Hello World!" in the console.
Note that double quotes must be used for the \n to be interpreted as a newline character. Single quotes would print the whole string literally and "Hello World!\n" would be printed to the console.

Monday, 13 January 2014

Use Statements and the Whole Shebang

See what I did with the title??.....
I'm nearly 100% sure I'm not the first to come up with that.
The shebang and use statements are things that we all put at the top of our perl scripts but do we really know why? Maybe a lot of you do, but I've just been blindly copying and pasting them because I know that my scripts won't work properly without them and it's the right thing to do.
Now I'm actually going to explore what they are for.

Shebang
As far as I can tell shebang is actually a slang word, because why would any normal person come up with that as an official term?
It consists of the characters "#!" and then the path to the interpreter program so the complete shebang line will look similar to the line below:

#!/usr/local/bin/perl

In fact, in all of the perl code I've seen, which admittedly isn't very much, the shebang has had the exact format as above.

I've done a little digging to find out what all this actually means.
When a script with a shebang is run as a program, the program loader (a part of the operating system that is one of the essential stages in the process of starting a program) sees the "#!" characters and then parses the rest of the line as an interpreter directive. What does this mean? The interpreter is the thing that executes the script and is specific for each programming language, but the script could be written in any language so we need to know which interpreter is needed. The path points to the location of the interpreter, in this case, the perl interpreter. This means that the Shebang isn't only used in perl, it's used in python, ruby, PHP and other scripting languages.

Use Statements - also called "Pragmas"

The most common use statements I've seen are:

use strict;
use warnings;
use Modern::Perl 2011;

The last one actually implies use strict and use warnings so you can use it by itself and not have to go through all the bother of writing them separately. But what do they actually mean?

use strict;
This pragma activates three different pragmas:

use strict 'vars'; - this complains when you try to use a variable that you have not previously declared,
use strict 'refs'; - prevents you from using symbolic references*,
use strict 'subs'; - this stops you from using barewords inappropriately, these are words that appear on their own without quotes or other punctuation. For example, with use strict 'subs' activated, my $x = hello; will not be allowed because hello needs to be in quotes.

*I've been looking for ages and I have absolutely no idea what this means! I cannot figure it out. All the explanations I can find just confuse me and talk about concepts that I have no idea about. I guess when I get that far I can come back and fill in this bit.

You can deactivate any of these individual strict pragmas by using "no" rather than "use".
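
Here is a hedged sketch of what a symbolic reference seems to be, as far as I understand it: using a string containing a variable's name as if it were the variable itself. The variable names are invented. With strict 'refs' on, Perl refuses to do this, and with it switched off inside a small block, it works:

```perl
use strict;
use warnings;

our $colour = "red";   # a package variable, so it can be found by name

# A plain string that happens to contain the variable's full name.
my $variable_name = __PACKAGE__ . "::colour";

# With strict 'refs' on, using the string as a reference dies.
# eval catches the error so that the script can carry on.
my $value_strict = eval { ${$variable_name} };
my $error        = $@;

# With strict 'refs' turned off for just this block, it is allowed.
my $value_loose = do {
    no strict 'refs';
    ${$variable_name};
};

print "strict said: $error";
print "without strict refs: $value_loose\n";
```

The "no strict" only lasts until the end of the block it is in, so the rest of the script stays protected.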

use warnings;

This gives all kinds of debug information about things that are legal Perl but are probably mistakes, for example using a variable that has no value yet or a variable name that only appears once (which is often a typo). Unlike strict, it doesn't stop the program from running: the warning is printed to STDERR and the script carries on, so you may see several warnings from one run.

You can also activate warnings by typing "-w" in the command line when you run the script.
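
To see a warning in action, here is a small sketch that uses a variable with no value in it. The variable names are my own. The warning handler is only there to collect the message so it can be looked at afterwards; normally the warning just goes to STDERR:

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undefined;                        # declared but never given a value
my $text = "Value: " . $undefined;    # triggers an "uninitialized value" warning

# The script is still running at this point.
my $still_running = 1;

print "Caught: $_" for @caught;
print "Still running: $still_running\n";
```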

Both the strict and warning pragmas are very useful to have on your perl scripts so it's advisable to include them at the beginning of the script. Alternatively, you can just write "use Modern::Perl 2011;".

Friday, 10 January 2014

Background of Perl

According to Wikipedia, Perl is a "family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6"

high-level: has strong abstraction from the details of the computer
general-purpose: to be used for writing software for a variety of applications
interpreted: uses an interpreter to execute the code
dynamic: many things, such as what type of value a variable holds, are decided at run time rather than compile time

Exciting stuff... Here's some more interesting (I hope) information:

In the short time I've been using perl, I've always wondered where the name actually comes from. I found out that Larry Wall, the creator of Perl just wanted an easy to remember, short word with a nice, positive connotation. It's that simple. Apparently he went through every short word in the dictionary before settling on "Pearl". He then realised before the official release that there was in fact another programming language with the same name already in existence and promptly changed it to the spelling we all know and love - "Perl".

Perl also has a backronym associated with it (making up a phrase to go with the letters of the word):

Practical
Extraction and
Reporting
Language

I've also discovered that Perl was invented in 1987, before I was born, which definitely excuses me for being way behind.

Perl is notorious for the ever lengthening time between major versions, with the last one, Perl 5, being released in 1994. In 2000, Larry Wall took suggestions from the Perl community for the development of Perl 6 and created documents called "apocalypses" which showed the changes and proposed design based on these suggestions. From what I see, Perl 6 is still just theoretical although there are implementations of it based on the apocalypses. So I guess we're still waiting. Meanwhile Perl 5 is still being updated and at the time of writing, the latest stable release is 5.18.1.

Thursday, 9 January 2014

Beginning

I've decided to start this blog because I'm just starting out on my perl journey and I think that I learn the best by writing everything down and explaining it in my own words as if I were explaining it to someone else. I think this is a good way to know that you truly understand something, even Einstein said:

"If you can't explain it simply, you don't understand it well enough."

so this is what I'm hoping to achieve.

I have done a bit of perl before on my grad scheme. I got the chance to try perl out and this was about six months ago. At the time I was trying to learn quickly so I could get up to speed with everyone else and actually contribute to the team. This meant I started off learning things by heart rather than really understanding what I was writing. Now it's the end of my grad scheme and I have a permanent job in Perl so I want to start from the beginning again and learn things properly because this is now the start of my development career.

I want to do this properly, which means starting at the very beginning, looking at everything I come into contact with and asking why it's there and what its purpose is, rather than just accepting it and learning it off by heart without really knowing what it means. I think in this way, I may be learning more slowly but I will be learning completely and understanding everything and hopefully in the future I will be able to pick up Perl concepts more quickly because of this.

I do have a background in computer science, I have a degree from UCL so I've been programming for a while with my main language being Java. Now, Java and I haven't really been the best of friends and unfortunately it was what was mostly taught. I did some Prolog, Groovy and a hell of a lot of theoretical stuff - logic, maths etc but mostly Java. I'm not really sure why I didn't get on with it, maybe it's because there's so much faff to get anything working, so much code to write to do one simple thing. Maybe it's because I started off behind everyone else and never really caught up - I don't really know. What I do know is that when I started learning Perl, it seemed to me so much easier to understand and so much easier to get started and make things work.

I've called this blog "Perl Adventures" because it feels like I'm at the beginning of a very long and slightly overwhelming journey. It's like I've just stepped off the plane and I am in this new world that I can go and explore. I'm not really sure where to start and I'm not really sure what's out there, but I am a little bit excited to find out. - That's enough of the cheesy metaphor.

It's not my main intention, but hopefully I will be able to help other people in the same situation as me.
I'm hoping in this way to shed light on some of the concepts of perl for other people to learn about because I know that finding information about perl isn't the easiest thing in the world. I've often googled a perl term that I don't understand and then been greeted with many more terms I don't understand and then have to google all of those! So maybe if I use the approach of never explaining things with words I don't understand, I can get concepts across that others can really understand.

Here goes, wish me luck!
