About Perl
1987
Larry Wall develops Perl. Perl was conceived as a scripting language rather than a conventional compiled programming language: Wall's original intent was to develop a scripting language more powerful than Unix shell scripting, but not as tedious as C. Perl is an interpreted language, which means that there is no explicitly separate compilation step. Rather, the interpreter reads the whole file, converts it to an internal form and executes it immediately. The name is commonly expanded as Practical Extraction and Report Language.
1989
October 18 Perl 3.0 is released under the GNU General Public License (GPL)
1991
March 21 Perl 4.0 is released under the GPL and the new Perl Artistic License
Now
Perl 5.14
Variables
A variable is the name of a place where some information is stored. For example:
$yearOfBirth = 1976;
$currentYear = 2011;
$age = $currentYear - $yearOfBirth;
print $age;
The variables in the example program can be identified as such because their names start with a dollar sign ($). Perl uses different prefix characters for structure names in programs. Here is an overview:
$: variable containing a scalar value such as a number or a string
@: variable containing a list with numeric keys (array)
%: variable containing a list with strings as keys (hash)
&: subroutine
*: matches all structures with the associated name
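As a minimal sketch of the three most common prefixes in use (the variable names and values below are illustrative and not taken from the slides):

```perl
use strict;
use warnings;

my $name    = "Perl";                                  # scalar: one number or string
my @years   = (1987, 1989, 1991);                      # array: list indexed by 0, 1, 2, ...
my %license = (3 => "GPL", 4 => "GPL and Artistic");   # hash: list indexed by strings

# A single element is a scalar, so it is written with $ even for arrays and hashes.
print "$name first appeared in $years[0]\n";
print "Perl 4 license: $license{4}\n";
print "Number of releases listed: ", scalar(@years), "\n";
```

Note that the prefix describes what the expression yields, which is why one element of @years is written $years[0].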
Operations on numbers
Perl contains the following arithmetic operators:
+: sum
-: subtraction
*: product
/: division
%: modulo division
**: exponent
Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:
abs($x): absolute value
int($x): integer part
rand(): random number between 0 and 1
sqrt($x): square root
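The operators and functions above can be tried out with a short sketch like the following; the operand values 17 and 5 are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my ($x, $y) = (17, 5);

my $sum       = $x + $y;        # 22
my $remainder = $x % $y;        # 2
my $power     = $y ** 2;        # 25
my $quotient  = int($x / $y);   # 3 (17/5 is 3.4, int drops the fraction)
my $root      = sqrt(16);       # 4
my $distance  = abs($y - $x);   # 12

print "$sum $remainder $power $quotient $root $distance\n";
```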
# count the number of lines in a file
open(INPUTFILE, "<$myfile") || die "Could not open the file $myfile\n";
$count = 0;
while ($line = <INPUTFILE>) { $count++; }
close(INPUTFILE);
print "$count lines in file $myfile\n";
Conditional structures
# determine whether number is odd or even
print "Enter number: ";
$number = <>;
chomp($number);
if ($number - 2*int($number/2) == 0) {
    print "$number is even\n";
} elsif (abs($number - 2*int($number/2)) == 1) {
    print "$number is odd\n";
} else {
    print "Something strange has happened!\n";
}
The numeric comparison operators ==, !=, <, >, <= and >= can all be used for comparing two numeric values in an if condition.
Truth expressions
Perl has three logical operators:
and: and (alternative: &&)
or: or (alternative: ||)
not: not (alternative: !)
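A small sketch of both spellings in use, assuming an arbitrary test value of 12. The word forms and the symbol forms do the same job; the word forms simply have very low precedence, so parentheses are used here to keep the grouping explicit.

```perl
use strict;
use warnings;

my $number = 12;

# Symbol forms (&&, ||, !) bind tightly; word forms (and, or, not) bind loosely.
my $in_range   = ($number >= 1 && $number <= 100);
my $small_even = ($number < 20 and $number % 2 == 0);
my $not_zero   = !($number == 0);

print "in range\n"   if $in_range;
print "small even\n" if $small_even;
print "not zero\n"   if $not_zero;
```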
Iterative structures
# print numbers 1-10 in three different ways
$i = 1;
while ($i <= 10) { print "$i\n"; $i++; }
for ($i = 1; $i <= 10; $i++) { print "$i\n"; }
foreach $i (1,2,3,4,5,6,7,8,9,10) { print "$i\n"; }
Stop a loop, or force continuation:
last; # like break in C
next; # like continue in C
Exercise: Read ten numbers and print the largest, the smallest and a count of how many of them are divisible by three. Hints:
if (not(defined($largest)) or $number > $largest) { $largest = $number; }
if ($number - 3*int($number/3) == 0) { $count3++; }
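One possible way to put the exercise hints together is sketched below. To keep it self-contained, the ten numbers come from a fixed list instead of being read from the input, and the subroutine name summarize is a hypothetical choice, not something the slides prescribe.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (largest, smallest, count of multiples of three).
sub summarize {
    my @numbers = @_;
    my ($largest, $smallest);
    my $count3 = 0;
    foreach my $number (@numbers) {
        if (not(defined($largest)) or $number > $largest) {
            $largest = $number;
        }
        if (not(defined($smallest)) or $number < $smallest) {
            $smallest = $number;
        }
        # same divisibility test as in the hint
        if ($number - 3 * int($number / 3) == 0) {
            $count3++;
        }
    }
    return ($largest, $smallest, $count3);
}

my ($largest, $smallest, $count3) = summarize(4, 9, -2, 15, 7, 0, 3, 8, 22, 10);
print "largest $largest, smallest $smallest, divisible by three: $count3\n";
```

To follow the exercise literally, the fixed list could be replaced by a loop that reads ten lines with <> and chomps each one.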
Examples
# replace first occurrence of "bug"
$text =~ s/bug/feature/;
# replace all occurrences of "bug"
$text =~ s/bug/feature/g;
# convert to lower case
$text =~ tr/A-Z/a-z/;
# delete vowels
$text =~ tr/AEIOUaeiou//d;
# replace each sequence of non-digit characters with a single x
$text =~ tr/0-9/x/cs;
# replace every capital letter by CAPS
$text =~ s/[A-Z]/CAPS/g;
Simple example: Print all lines from a file that include a given sequence of characters (emulate grep behavior).
Regular expressions
\b: word boundary
\d: digit
\n: newline
\r: carriage return
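The escapes listed above can be seen at work in a short sketch; the sample sentence is made up for illustration.

```perl
use strict;
use warnings;

my $text = "The bug was fixed in 2011, debugging took 3 days.\r\n";

# \b keeps "bug" from matching inside "debugging".
my @whole_words = ($text =~ /\bbug\b/g);

# \d+ matches each run of digits.
my @numbers = ($text =~ /\d+/g);

# Strip a trailing carriage return and/or newline.
(my $clean = $text) =~ s/\r?\n$//;

print scalar(@whole_words), " whole-word match(es)\n";
print "numbers: @numbers\n";
print "$clean\n";
```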
Examples:
1. Clean an HTML-formatted text
2. Grab URLs from a Web page
3. Transform all lines from a file into lower case
Practical construction operators
Range operator ($x..$y)
@x = (1..6); # same as (1, 2, 3, 4, 5, 6)
@y = (1.2..4.2); # same as (1, 2, 3, 4): the bounds are truncated to integers
@z = (2..5,8,11..13); # same as (2,3,4,5,8,11,12,13)
qw() ("quote word") function
qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").
Split function
$string = "Jan Piet\nMarie \tDirk";
@list = split /\s+/, $string; # yields ( "Jan","Piet","Marie","Dirk" )
$string = " Jan Piet\nMarie \tDirk\n"; # watch out, empty string at the beginning!
@list = split /\s+/, $string; # yields ( "", "Jan","Piet","Marie","Dirk" ); trailing empty fields are dropped
$string = "Jan:Piet;Marie---Dirk"; # use any regular expression...
@list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" )
$string = "Jan Piet"; # use an empty regular expression to split into characters
@letters = split //, $string; # yields ( "J","a","n"," ","P","i","e","t" )
Examples:
1. Tokenize a text: separate simple punctuation (, . ; ! ? ( ) )
2. Add all the digits in a number
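Both examples can be approached with split. The following is only a sketch under simple assumptions: punctuation is limited to the characters listed above, and the sample sentence and number are made up.

```perl
use strict;
use warnings;

# 1. Tokenize: put spaces around punctuation, then split on whitespace.
my $sentence = "Hello, world (again)!";
(my $spaced = $sentence) =~ s/([,.;!?()])/ $1 /g;
# split ' ' is the special form that also skips leading whitespace
my @tokens = split ' ', $spaced;
print join("|", @tokens), "\n";

# 2. Add all the digits: split the number into single characters and sum them.
my $number = 1976;
my $sum = 0;
$sum += $_ foreach split //, $number;
print "digit sum of $number is $sum\n";
```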
($a, $b) = ("one","two");
($onething, @manythings) = (1,2,3,4,5,6);
# now $onething equals 1
# and @manythings = (2,3,4,5,6)
($array[0],$array[1]) = ($array[1],$array[0]); # swap the first two elements
Note that assignment to a variable first evaluates the right-hand side of the expression, and then makes a copy of the result.
@array = ("an","bert","cindy","dirk");
@copyarray = @array; # makes a copy
$copyarray[2] = "XXXXX"; # @array is unchanged
shift ARRAY works on the left end of the list, but is otherwise the same as pop.
unshift ARRAY LIST puts stuff on the left side of the list, just as push does for the right side.
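The four functions that add or remove elements at either end of a list can be compared side by side in a small sketch (the names in the list are illustrative):

```perl
use strict;
use warnings;

my @queue = ("bert", "cindy");

push @queue, "dirk";        # add on the right:  (bert, cindy, dirk)
unshift @queue, "an";       # add on the left:   (an, bert, cindy, dirk)
my $last  = pop @queue;     # remove from right: returns "dirk"
my $first = shift @queue;   # remove from left:  returns "an"

print "removed $first and $last, left with: @queue\n";
```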
grep CONDITION LIST returns the elements of the list for which the condition is true. For example:
@large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25)
@i_names = grep /i/, @array; # returns ("cindy","dirk")
Example: Print all lines from a file that include a given sequence of characters [emulate grep behavior]
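A minimal sketch of this example, assuming the sequence should be matched literally. The helper name lines_containing is hypothetical, and an in-memory file stands in for a real one so that the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines read from $fh that contain $sequence literally.
sub lines_containing {
    my ($sequence, $fh) = @_;
    return grep { index($_, $sequence) >= 0 } <$fh>;
}

# An in-memory file is used here instead of a file on disk.
my $data = "first bug report\nall fine here\ndebugging notes\n";
open(my $fh, "<", \$data) or die "cannot open in-memory file: $!";
my @hits = lines_containing("bug", $fh);
close($fh);

print @hits;
```

Using index avoids treating the sequence as a regular expression; a pattern match with grep /.../ would be the choice if regular expression behavior is wanted.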
map OPERATION LIST is an extension of grep, and performs an arbitrary operation on each element of a list. For example:
@more = map $_ + 3, (1,2,4,8,16,25); # returns (4,5,7,11,19,28)
@initials = map substr($_,0,1), @array; # returns ("a","b","c","d")
Hashes (contd)
Examples
$wordfrequency{"the"} = 12731; # creates key "the", value 12731
$phonenumber{"An De Wilde"} = "+31-20-6777871";
$index{$word} = $nwords;
$occurrences{$a}++; # if this is the first reference, the value associated with $a will be increased from 0 to 1
# fill the hash
%birthdays = ("An","25-02-1975","Bert","12-10-1953","Cindy","23-05-1969","Dirk","01-04-1961");
# fill the hash; the same as above, but more explicit
%birthdays = (An => "25-02-1975", Bert => "12-10-1953", Cindy => "23-05-1969", Dirk => "01-04-1961");
Operations on Hashes
- keys HASH returns a list with only the keys in the hash. As with any list, using it in a scalar context returns the number of keys in that list.
- values HASH returns a list with only the values in the hash, in the same order as the keys returned by keys.
foreach $key (sort keys %hash) {
    push @sortedlist, ($key, $hash{$key});
    print "Key $key has value $hash{$key}\n";
}
- reverse HASH reverses the direction of the mapping, i.e. constructs a hash with keys and values swapped:
%backwards = reverse %forward;
(if %forward has two identical values associated with different keys, those will end up as only a single element in %backwards)
- hash slice
@birthdays{"An","Bert","Cindy","Dirk"} = ("25-02-1975","12-10-1953","23-05-1969","01-04-1961");
- each( HASH ) traverses a hash
while (($name,$date) = each(%birthdays)) {
    print "${name}'s birthday is $date\n"; # braces needed: "$name's" would be read as the variable $name::s
}
# alternative: foreach $key (keys %birthdays)
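The hash operations above can be tried on a small hash. This is a sketch with made-up data in which two keys deliberately share a value, to show what reverse does in that case.

```perl
use strict;
use warnings;

my %forward = (An => "1975", Bert => "1953", Cindy => "1975");

# An and Cindy share the value "1975", so only one of them survives as a value.
my %backwards = reverse %forward;
print scalar(keys %forward), " keys forward, ", scalar(keys %backwards), " keys backwards\n";

# A hash slice reads (or assigns) several values at once.
my @years = @forward{"An", "Bert"};
print "@years\n";

# each returns one (key, value) pair per call; the order is not sorted.
while (my ($name, $year) = each %forward) {
    print "${name}'s year is $year\n";
}
```

Which of the two names ends up under "1975" in %backwards depends on the internal hash order, so it should not be relied upon.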
Multidimensional structures
Hash of arrays
%lexicon1 = (
    the => [ "Det", 12731 ],
    man => [ "Noun", 658 ],
    with => [ "Prep", 3482 ]
);
Hash of hashes
# each word maps its tags to numbers
%lexicon2 = (
    the => { Det => 12731 },
    man => { Noun => 658 , Verb => 12 },
    with => { Prep => 3482 }
);
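Elements of such nested structures are reached by chaining subscripts. The following sketch rebuilds the two lexicons with my so that it runs under strict, and assumes the numbers are word frequencies, which the slides do not state explicitly.

```perl
use strict;
use warnings;

my %lexicon1 = (
    the  => [ "Det", 12731 ],
    man  => [ "Noun", 658 ],
    with => [ "Prep", 3482 ],
);
my %lexicon2 = (
    the  => { Det => 12731 },
    man  => { Noun => 658, Verb => 12 },
    with => { Prep => 3482 },
);

# Hash of arrays: hash key first, then array index.
my $tag = $lexicon1{man}[0];
print "man is tagged $tag\n";

# Hash of hashes: two hash keys in a row.
my $verb_count = $lexicon2{man}{Verb};
print "man as Verb: $verb_count\n";

# Loop over the inner hash by dereferencing it with %{ ... }.
my @man_tags = sort keys %{ $lexicon2{man} };
print "tags for man: @man_tags\n";
```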
Programming Example
A program that reads lines of text, gives a unique index number to each word and counts the word frequencies
#!/usr/local/bin/perl
# read all lines in the input
$nwords = 0;
while (defined($line = <>)) {
    # cut off leading and trailing whitespace
    $line =~ s/^\s*//;
    $line =~ s/\s*$//;
    # and put the words in an array
    @words = split /\s+/, $line;
    if (!@words) { # there are no words?
        next;
    }
    # process each word (defined keeps the loop going for a word such as "0")
    while (defined($word = pop @words)) {
        # if it's unknown assign a new index
        if (!exists($index{$word})) {
            $index{$word} = $nwords++;
        }
        # always update the frequency
        $frequency{$word}++;
    }
}
# now we print the words sorted
foreach $word (sort keys %index) {
    print "$word has frequency $frequency{$word} and index $index{$word}\n";
}
A note on sorting
If we would like to have the words sorted by their frequency instead of alphabetically, we need a construct that imposes a different sort order. The sort function can use any sort order that is provided as an expression.
- the usual alphabetical sort order:
sort { $a cmp $b } @list;
Note: $a and $b are placeholders for the two items from the list that are to be compared. Do not attempt to replace them with other variable names; using $x and $y instead will not have the same effect.
- a numerical sort order:
sort { $a <=> $b } @list;
- for a reverse sort, change the order of the arguments:
sort { $b <=> $a } @list;
- to sort the keys of a hash by their value instead of by their own identity, substitute the values for the arguments of sort:
sort { $hash{$b} <=> $hash{$a} } ( keys %hash )
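Applied to word frequencies, the last form could look like the sketch below. The frequency values are sample data, and the alphabetical tie-break is an addition for illustration, not something the slides require.

```perl
use strict;
use warnings;

my %frequency = (the => 12731, man => 658, with => 3482, dog => 658);

# Highest frequency first; words with equal frequency in alphabetical order.
my @by_frequency = sort {
    $frequency{$b} <=> $frequency{$a}
        or
    $a cmp $b
} keys %frequency;

foreach my $word (@by_frequency) {
    print "$word has frequency $frequency{$word}\n";
}
```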
sub askForInput {
    print "Please enter something: ";
}
# function call
&askForInput();
Tip: put related subroutines in a file (usually with the extension .pm = perl module) and include the file with the command require:
# files with subroutines are stored here (single quotes keep the backslashes literal)
use lib 'C:\PERL\MYLIBS';
# we will use this file
require "nlp.pm";
Variables Scope
A variable $a is used both in the subroutine and in the main part of the program.
$a = 0;
print "$a\n";
sub changeA { $a = 1; }
print "$a\n";
&changeA();
print "$a\n";
The value of $a is printed three times. Can you guess what values are printed?
- $a is a global variable.
Hide variables from the rest of the program using my.
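The difference between a global variable and one hidden with my can be sketched as follows. The names $counter, change_global and change_private are hypothetical; $a is avoided here because it is also used by sort.

```perl
use strict;
use warnings;

our $counter = 0;            # package (global) variable, visible everywhere

sub change_global {
    $counter = 1;            # changes the global
}

sub change_private {
    my $counter = 99;        # a new variable that exists only inside this subroutine
    return $counter;
}

my $private_value = change_private();
my $after_private = $counter;    # the global is untouched
change_global();
my $after_global = $counter;     # the global has been changed

print "$private_value $after_private $after_global\n";
```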
Access the argument values inside the procedure with the special list @_. E.g.
my($number, $letter, $string) = @_; # reads the parameters from @_
- A tricky problem is passing two or more lists as arguments of a subroutine. With &subr(@a,@b) the subroutine receives the two lists as one big list and is unable to determine where the first ends and where the second starts.
- Pass the lists as reference arguments instead: &subr(\@a,\@b).
- Subroutines also return a list as output.
# the return statement from a subroutine
return (1,2); # or simply (1,2) as the last expression
# read the return values from the subroutine
($a,$b) = &subr();
- Read the main program arguments from @ARGV. Unlike in C there is no argument-count variable: the number of arguments is scalar(@ARGV), and $ARGV[0] is the first argument, not the program name (which is in $0).
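The list-flattening problem and the reference-based way around it can be sketched like this; the subroutine names count_args and longer_list and the sample lists are hypothetical.

```perl
use strict;
use warnings;

# Receives everything as one flat list in @_.
sub count_args {
    return scalar(@_);
}

# Receives two array references and returns a copy of the longer list.
sub longer_list {
    my ($first_ref, $second_ref) = @_;
    return @$first_ref >= @$second_ref ? @$first_ref : @$second_ref;
}

my @left  = (1, 2, 3);
my @right = (4, 5);

my $flat_count = count_args(@left, @right);     # the two lists arrive merged
my @longest    = longer_list(\@left, \@right);  # the two lists stay separate

print "flattened argument count: $flat_count\n";
print "longer list: @longest\n";
```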
Example
open(INFILE,"myfile") or die("cannot open myfile!");
Other
About $_
$_ is the default variable: it holds the current item when no explicit variable is given.
Examples:
while(<INFILE>) # $_ contains the current line read
foreach (@array) # $_ contains the current element in @array
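A short sketch of $_ being used implicitly and explicitly, with an illustrative list of names:

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# $_ is set to each element in turn; here it is named explicitly.
my @lengths;
foreach (@array) {
    push @lengths, length($_);
}

# A pattern match without =~ is applied to $_.
my $with_i = 0;
foreach (@array) {
    $with_i++ if /i/;
}

print "lengths: @lengths\n";
print "names containing i: $with_i\n";
```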