Perl
A language for Systems and Network Administration and Management

Nick Urbanik
nicku@vtc.edu.hk
Department of Information and Communications Technology

Copyright Conditions: Open Publication License (see http://www.opencontent.org/openpub/)

Contents

What is Perl?
    What is Perl?
    What is Perl? — 2
    Compiled and run each time
    Perl is Evolving
    Eclectic
    Regular Expressions
Example Problem
    Why should I learn it?
    The available data
    Sample data for new courses:
    Problems
    Solution in Perl — 1
    Solution in Perl — 2
    Solution in Perl — 3
    But I can use any other language!
    Other Solutions may take Longer to Write
    The hello world program
Variables
    Variables
    Scalars:
    @Array
    %Hashes
    Conclusion
Perl Community
    An Overview of Perl
    Where do I get Perl?
    Where do I get Info about Perl?—1
    Where do I get Info about Perl?—2
    CPAN, PPM: Many Modules
    PPM: Perl Package Manager
    Mailing Lists: help from experts
    How to ask Questions on a List
The Shabang
    Where is Perl on my system?
    How OS knows it's a Perl program—1
    How OS knows it's a Perl program—2
Language Overview
    Language Overview
    Language Overview — 2
Data Types
    Funny Characters $, @, %
    Arrays
    Array Examples
    More About Arrays
    List Assignment
    Even More About Arrays
    Scalar, List Context
    Hashes
    Initialising a Hash
    Hash Examples — 1
    Hash Examples — 2
    Hash slices
    Another Hash Example
    Hashes are Not Ordered
Good Practice
    Discipline—use warnings
    use strict and Declaring Variables
    Examples of use strict and Variables
Operators, Quoting
    Operators and Quoting
    Quoting
Input, Output
    Input and Output
    What is Truth?
Statements
    Statements for Looping and Conditions
    if Statements
    unless Statement
    while loop
    Input with while
    The Special $_ variable
    while and the <> operator
    while and the <> operator — 2
    for loop
    foreach loop
Iteration
    Iterating over a Hash
    Iterating over a Hash in Sorted Order
    Iterating over a Hash in Sorted Order
Other Statements
    Exit a Loop Early
    "Backwards" Statements
    "Backwards" Statements—Examples
List Operations
    Array Operations—push and pop
    Array Ops—shift and unshift
    split and join
Subroutines
    Subroutines
    Parameters — 1
    Parameters — 2
Error Handling
    Checking for Errors: die and warn
File and Process I/O
    Files and Filehandles
    Open for Writing
    Executing External Programs
    system
    Was system Call Successful?
    Was system Call Successful? — 2
    Backticks: `...` or qx{...}
    See the perl summary
Regular Expressions
    Regular Expressions
    What is a Regular Expression?
    Regular Expressions as a language
    How to use a Regular Expression
    What do they look like?
    Example: searching for "Course:"
    The "match operator" =~
    The "match operator" =~ — 2
    /i — Matching without case sensitivity
    Using !~ instead of =~
    Embedding variables in regexps
    The Metacharacters
    Character Classes [...]
    Examples of use of [...]
    Negated character class: [^...]
    Example using [^...]
    Shorthand: Common Character Classes
    Matching any character
    Matching the beginning or end
    Matching Repetitions: * + ? {n,m}
    Example using .*
    Capturing the Match with (...)
    Capturing the match: greediness
    Being Stingy (not Greedy): ?
    Being Less Greedy: Example
    Sifting through large amounts of data
    Capturing the Match: (...)
    The Substitution Operator s///
    Avoiding leaning toothpicks: /\/\/
    Substitution and the /g modifier
    Readable regex: /x Modifier
Other Topics
    Special Vars: Input Record Separator
    Paragraph, Whole-file Modes
    localising Global Variables
    One Line Perl Programs
    References

SNM — ver. 1.7

What is Perl?

• Perl is a programming language
• The best language for processing text
• Cross platform, free, open
• Microsoft have invested heavily in ActiveState to improve support for Windows in Perl
• Has excellent connection to the operating system
• Has an enormous range of modules for thousands of application types

What is Perl? — 2

• Robust and reliable (has very few bugs)
• Supports object oriented programming
• Good for big projects as well as small
• Java 1.4 has borrowed one of Perl's best features: regular expressions
• Perl has garbage collection
• The "duct tape of the Internet"
• Easy to use, since it usually "does the right thing"
• Based on freedom of choice: "There is more than one way to do it!" (TIMTOWTDI)

Compiled and run each time

• Perl is interpreted, but runs about as fast as a Java program
• Software development is very fast
• The Apache web server provides mod_perl, which allows Perl applications to run very fast
• Used on some very large Internet sites:
  – The Internet Movie Database
  – Macromedia, Adobe, http://slashdot.org/

Perl is Evolving

• Perl 6 will introduce many great features to make Perl
  – easier to use
  – even more widely usable for more purposes
  – even better for bigger projects

Eclectic

• Borrows ideas from many languages, including:
  – C, C++
  – Shell
  – Lisp
  – BASIC
  – . . . even Fortran
  – Many others. . .

Regular Expressions

• One of the best features of Perl
• A new concept for most of you . . .
• But very useful!
• Used to:
  – extract information from text
  – transform information
• You will spend much time in this topic learning about regular expressions (see the Regular Expressions slides)

Why should I learn it?

• It will be in the final exam!
  – Okay, that's to get your attention, but. . .
• Consider a real-life sys-admin problem:
  – You must make student accounts for 1500 students
  – TEACHING BEGINS TOMORROW!!!
  – The Computing Division has a multi-million dollar application to give you student enrollment data
  – . . . but it can only give you PDF files with a strange and irregular format for now (But Oh, it will be infinitely better in the future! Just wait a year or two. . . )

The available data

• Has a variable number of lines before the student data begins
• Has a variable number of columns between different files
• Has many rows per enrolled student
• Goes on for dozens of pages, only 7 students per page!!!!!!!
• There are two formats, both equally peculiar!!!!

Sample data for new courses:

[Figure: sample enrollment listing; its content is not recoverable from the source]

Problems

• There is a different number of lines above the student records
• There is a different number of characters within each column from file to file
• There are many files
• The format can change any time the computing division determines necessary

Solution in Perl — 1

    #! /usr/bin/perl -w
    use strict;
    my $course;
    my $year;
    while ( <> )
    {
        chomp;
        if ( /^\s*Course :\s(\d+)\s/ )
        {
            $course = $1;
            undef $year;
            next;
        }

Solution in Perl — 2

        elsif ( m!^\s*Course :\s(\d+)/(\d)\s! )
        {
            $course = $1;
            $year = $2;
            next;
        }
        if ( my ( $name, $gender, $student_id, $hk_id ) =
             m{
                  \s\s+                       # at least 2 spaces
                  (                           # this matches $name
                      [A-Z]+                  # family name is upper case
                      (?:\s[A-Z][a-z]*)+      # one or more given names
                  )
                  \s\s+                       # at least 2 spaces
                  ([MF])                      # gender
                  \s+                         # at least one space
                  (\d{9})                     # student id is 9 digits
                  \s\s+                       # at least 2 spaces
                  ([a-zA-Z]\d{6}\([\dA-Z]\))  # HK ID
              }x )

Solution in Perl — 3

        {
            print "sex=$gender, student ID = $student_id, ",
                "hkID = $hk_id, course = $course, name=$name, ",
                defined $year ? "year = $year\n" : "\n";
            next;
        }
        warn "POSSIBLE UNMATCHED STUDENT: $_\n" if m!^\s*\d+\s+!;
    }

But I can use any other language!

• I will give you HK$200 if you are the first person to write a solution in another language in fewer keystrokes
• Note: the Perl solution given has:
  – comments
  – plenty of space to show structure
  – . . . and handles exceptional situations (i.e., it is robust)
• To claim your $200 from Nick, your solution must
  – have similar space for comments
  – have similar readability and robustness
  – be written in a general purpose language using ordinary libraries

Variables

• There are three basic types of variable:
  – Scalar (can be a number or string or. . . )
  – Array (an ordered array of scalars)
  – Hash (an unordered array of scalars indexed by strings instead of numbers)
• Each type is distinguished with a "funny character"
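
The three variable types listed on the Variables slide can be sketched in a few lines. This is only an illustration, not part of the original slides; the names $count, @days and %capital and their contents are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names and data, chosen only for this illustration.
my $count   = 3;                                        # scalar: a single value
my @days    = ( "Mon", "Tue", "Wed" );                  # array: ordered, indexed from 0
my %capital = ( NL => "Amsterdam", BE => "Brussels" );  # hash: indexed by strings

print "count is $count\n";
print "first day is $days[0]\n";            # a single element is a scalar, hence the $
print "capital of NL is $capital{NL}\n";    # the same holds for a hash element
print "there are ", scalar( @days ), " days\n";   # an array in scalar context gives its length
```

Note that the funny character describes the value being accessed, which is why one element of @days is written $days[0].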

Other Solutions may take Longer to Write

• This program took a very short time to write
• It is very robust
• For problems like this, Perl is second to no other programming language.

The hello world program

    print "hello world\n"

Scalars:

• Start with a dollar sign
• Hold a single value, not a collection
• A string is a scalar, so is a number
• Since Perl is a loosely typed language, a scalar can be an integer, a floating point number, a character or a string.
  – Note that later you will see that a scalar can also hold a reference to another piece of data, which may also be an array or hash.
• Examples:
    $apple = 2;
    $banana = "curly yellow fruit";

@Array

• Starts with a @
• Indexes start at 0, like in C or Java
• Each entry in an array is a scalar.
  – Multidimensional arrays are made by each entry of an array being a reference to another array.
• See the Arrays slides

%Hashes

• Unfamiliar concept to many of you
• Like an array, but indexed by a string
• A data structure like a database
• See the Hashes slides

Conclusion

• Perl is optimised for text and systems administration programming
• Has great portability
• Is strongly supported by Microsoft
• Has three main built-in data types:
  – Scalar: starts with $
  – Array: starts with @
  – Hash: starts with %

An Overview of Perl

A language for Systems and Network Administration and Management: An overview of the language

Where do I get Perl?

• For Windows, go to http://www.activestate.com, download the installer
• For Linux: it will be already installed
• For other platforms: go to http://www.perl.com
• This is a good source of other information about Perl
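
The @Array slide notes that a multidimensional array is built from entries that are references to other arrays. A minimal sketch of that idea follows; the variable names and the numbers are assumptions made for the example, not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical data: each entry of @grid is a reference to another array.
my @row_0 = ( 1, 2, 3 );
my @row_1 = ( 4, 5, 6 );
my @grid  = ( \@row_0, \@row_1 );

# $grid[1] is a scalar holding a reference; ->[2] follows it to the element.
print "row 1, column 2 holds $grid[1]->[2]\n";

# Between adjacent subscripts the arrow may be left out.
print "row 0, column 0 holds $grid[0][0]\n";
```

Because each entry is still a single scalar (a reference), the rule that an array holds only scalars is not broken.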

Where do I get Info about Perl?—1

• On your hard disk:
  – $ perldoc -f function
    will look up the documentation for the built-in function (from the documentation perlfunc)
  – $ perldoc -q word
    will look up word in the headings of the FAQ
  – $ perldoc perl
    shows a list of much of your locally installed documentation, divided into topics
  – ActiveState Perl provides a Programs menu item that links to online HTML documentation

Where do I get Info about Perl?—2

• Web sites:
  – http://www.perl.com
  – http://www.activestate.com
  – http://use.perl.org
• See the References slide for a list of books.

CPAN, PPM: Many Modules

• A very strong feature of Perl is the community that supports it
• There are tens of thousands of third party modules for many, many purposes:
  – E.g., the Net::LDAP module supports all LDAP operations, LWP provides a comprehensive web client
• Installation is easy:
    $ sudo perl -MCPAN -e shell
    cpan> install Net::LDAP
• Will check if a newer version is available on the Internet from CPAN, and if so, download it, compile it, test it, and if it passes tests, install it.

PPM: Perl Package Manager

• For Windows
• Avoids need for a C compiler, other development tools
• Download precompiled modules from ActiveState and other sites, and install them:
    C:\> ppm install Net::LDAP
• See documentation with ActiveState Perl

Mailing Lists: help from experts

• There are many mailing lists and newsgroups for Perl
• When you subscribe to a mailing list, you receive all mail sent to the list
• When you send mail to the list, all subscribers receive it
• For Windows, many lists at http://www.activestate.com

How to ask Questions on a List

• I receive many email questions from students about many topics
• Most questions are not clear enough to be able to answer in any way except, "please tell me more about your problem"
• Such questions sent to mailing lists are often unanswered
• Need to be concise, accurate, and clear
• See also Eric Raymond's How to Ask Questions the Smart Way at http://catb.org/~esr/faqs/smart-questions.html
• Search the FAQs first (see "Where do I get Info about Perl?—1")

Where is Perl on my system?

• ActiveState Perl installs perl.exe in C:\Perl\perl.exe
• Linux systems have a standard location for perl at /usr/bin/perl
• On some Unix systems, it may be installed at /usr/local/bin/perl

How OS knows it's a Perl program—1

• To run your Perl program, the OS needs to call perl
• How does the OS know when to call Perl?
• Linux, Unix:
  – programs have execute permission:
      $ chmod +x program
  – the OS reads the first 2 bytes of the program: if they are "#!" then it reads to the end of the line, then uses that as the interpreter
  – the OS doesn't care what your program file is called
  – If the program file is not in a directory on your PATH, call it like this:
      $ ./program

How OS knows it's a Perl program—2

• Windows:
  – the OS uses the extension of the file to decide what to do (e.g., .bat, .exe)
  – Your program names end with .pl
• For cross platform support:
  – Put this at the top of all your programs:
      #! /usr/bin/perl -w
  – Name your programs with an extension .pl

Language Overview

• variables: scalars, arrays and hashes
• compiler warnings, use strict;
• operators, quoting
• input and output
• statements:
  – if. . . elsif. . . else and unless statements
  – while, for and foreach loops
  – iterating over arrays and hashes
  – Exit early from a loop with last, and next
  – "backwards" statements

Language Overview — 2

• We also will examine:
  – subroutines, parameters and return statement
  – array operations
  – Error reporting: die and warn
  – Opening files
  – executing external programs
  – regular expressions
  – Special input modes
  – One line Perl programs

Funny Characters $, @, %

• Variables in Perl start with a funny character
• Why?
• No problem with reserved words: can have a variable called $while, and another variable called @while, and a third called %while.
• Can interpolate value into a double-quoted string (but not a single quoted string):
    my $string = "long";
    my $number = 42.42;
    print "my string is $string ",
        "and my number is $number\n";

Arrays

• Define an array like this:
    my @array = ( 1, 5, "fifteen" );
• This is an array containing three elements
• The first can be accessed as $array[0], second as $array[1], the last as $array[2]
• Note that since each element is a scalar, it has the $ funny character for a scalar variable value
• In Perl, we seldom use an array with an index; instead use list processing array operations (push, pop, shift, unshift, split, grep, map) and iterate over arrays with the foreach statement (see the foreach loop slide)
  – higher level.

Array Examples

• Use the qw// "quote words" operator to help initialise arrays (see the Quoting slide)
• See the foreach loop slide for how the foreach loop works.
    my @fruit = qw( apple banana mandarin peach pear plum );
    foreach my $fruit ( @fruit )
    {
        print "$fruit\n";
    }
• Note that these two are equivalent:
    my @fruit = qw( apple banana mandarin peach pear plum );
    my @fruit = ( "apple", "banana", "mandarin", "peach", "pear", "plum" );

More About Arrays

• Instead of initialising the array as in the Array Examples slide, we can initialise the elements one by one:
    my @fruit;
    $fruit[ 0 ] = "apple";
    $fruit[ 1 ] = "banana";
    # ...
    $fruit[ 5 ] = "plum";
• We can get a slice of an array:
    my @favourite_fruit = @fruit[ 0, 3 ];
    print "@favourite_fruit\n";
  – execute the program:
    $ ./slice.pl
    apple peach

List Assignment

• We can use a list of scalars whenever it makes some sense, e.g.,
  – We can assign a list of values to a list of scalars
• Examples:
    my ( $a, $b, $c ) = ( 1, 2, 3 );
    my @array = ( $a, $b, $c );
    my ( $d, $e, $f ) = @array;

Even More About Arrays

• How many elements are in the array? (See the Scalar, List Context slide)
    print scalar @fruit, "\n";
• Does the array contain any data? (See the unless Statement slide)
    print "empty\n" unless @fruit;
• Is there any data at the index $index?
    if ( defined $fruit[ $index ]
         and $fruit[ $index ] eq "apple" )
    {
        print "found an apple.\n";
    }
  – See perldoc -f defined. Also see perldoc -f exists.

Scalar, List Context

• Each part of a program expects a value to be either scalar or list
• Example: print is a list operator, so if you print something, it is in list context
• If you look in the Perl Reference, you will see LIST shown as a parameter to many functions.
  – Any value there will be in a list context
• Many built-in functions, and your own functions (see perldoc -f wantarray), can give a different result in a scalar or list context
• Force scalar context with scalar, e.g.,
    print "the time is now ", scalar localtime, "\n";

Hashes

• Hashes are probably new to you
• Like an array, but indexed by a string
• Similar idea was implemented in java.util.Hashtable
• Perl hashes are easier to use

Initialising a Hash

    my %hash = ( NL => 'Netherlands', BE => 'Belgium' );

• This creates a hash with two elements
• One is $hash{NL}, which has value "Netherlands"; the other is $hash{BE}, with value "Belgium"
• The "=>" is a "quoting comma".
  – It is the same as a comma, but it also quotes the string on its left.
  – So you can write the above like this:
    my %hash = ( 'NL', 'Netherlands', 'BE', 'Belgium' );
    but the "=>" operator makes it more clear which is the key and which is the value.

Hash Examples — 1

• As with arrays, you make a new element just by assigning to it:
    my %fruit;
    $fruit{apple} = "crunchy";
    $fruit{peach} = "soft";
• Here, we made two hash elements.
  – The keys were "apple" and "peach".
  – The corresponding values were "crunchy" and "soft".
• You could print the values like this:
    print "$fruit{apple}, $fruit{peach}\n";
  prints:
    crunchy, soft

Hash Examples — 2

• How to see if a hash is empty? (See the unless Statement slide)
    print "empty\n" unless %fruit;
• How to delete a hash element?
    delete $fruit{coconut};
• Hashes are often useful for storing counts (see the while loop slides for more about while loops):
    my %wordcounts;
    while ( <> )
    {
        chomp;
        ++$wordcounts{$_};
    }

Hash slices

• We can assign some values to part of a hash:
    $score{fred} = 150;
    $score{barney} = 100;
    $score{dino} = 10;
• We could use a list assignment (see the List Assignment slide):
    ( $score{fred}, $score{barney}, $score{dino} ) = ( 150, 100, 10 );
• . . . too long. A hash slice makes this easier:
    @score{ "fred", "barney", "dino" } = ( 150, 100, 10 );
• We can interpolate this too (see the Funny Characters and Quoting slides):
    my @players = qw( fred barney dino );
    print "scores are @score{@players}\n";

Another Hash Example

• Often used to keep a count of the number of occurrences of data read in:
    #! /usr/bin/perl -w
    use strict;
    our %words;
    while ( <> )
    {
        next unless /\S/;    # Skip blank lines
        my @line = split;
        foreach my $word ( @line )
        {
            ++$words{$word};
        }
    }
    print "Words unsorted, in the order they come from the hash:\n\n";
    foreach my $word ( keys %words )
    {
        printf "%4d %s\n", $words{$word}, $word;
    }
• See the while loop slide for the while loop, the "while and the <> operator" slides for while ( <> ), the foreach loop slide for the foreach statement, and the unless Statement and "Backwards" Statements slides for the unless statement

Hashes are Not Ordered

• A big difference from arrays is that hashes have no order.
• The data in a hash will be available in only an unpredictable order.
• See the Iterating over a Hash slide for how to iterate over hash elements

Discipline—use warnings

• Better to let the compiler detect problems, not your customer
• Develop your program with all warnings enabled
• Either:
  – put -w as an option to perl when you execute the program, i.e., make the first line of your program:
      #! /usr/bin/perl -w
  – Or better: put a line:
      use warnings;
    near the top of your program.

use strict and Declaring Variables

• All programs that are more than a few lines long should have the pragma use strict;
• This turns on additional checking that all variables are declared, all subroutines are okay, and that references to variables are "hard references" (see perldoc strict).
• All variables that you use in your program need to be declared before they are used with either my or our.
• my defines a local variable that exists only in the scope of the current block, or outside of a block, in the file.
  – See perldoc -f my.
• our defines a global variable.
  – See perldoc -f our.

Examples of use strict and Variables

• Without use strict, a variable just springs into life whenever you use it.
• Problem: a typing mistake in a variable creates a new variable and a hard-to-find bug!
• . . . so always start your programs like this:
    #! /usr/bin/perl
    use warnings;
    use strict;
• use warnings; enables compile time warnings which help find bugs earlier (see perldoc warnings)
• After use strict, it will be an error to use a variable without declaring it with my or our.
  – Most code examples in these notes define variables with my or our

Operators and Quoting

• Perl has all the operators from C (and so Java), in same precedence
• Has more operators for strings:
  – Join strings with a dot, e.g.
      print "The sum of 3 and 4 is " . ( 3 + 4 ) . "\n";
• Quote special characters with backslash, as in C or Java
    print "\$value = $value\n";
• Can quote all characters using single quotes:
    print 'output of $perl = "rapid";print $perl; is "rapid"';
• Note that double quotes are okay in single quotes, single quotes okay in double quotes.
• Documentation in perldoc perlop.

Quoting

• Perl has lots of ways of quoting, too many to list here

    Customary  Generic  Meaning          Interpolates
    ''         q//      Literal          No
    ""         qq//     Literal          Yes
    ``         qx//     Command          Yes
    ()         qw//     quote word list  No
    //         m//      Pattern match    Yes
    s///       s///     Substitution     Yes
    y///       tr///    Translation      No

  – See the Funny Characters slide for the meaning of "interpolate"
• y/// or tr/// works just like the POSIX tr (translate) program in Linux.

What is Truth?

• Anything that has the string value "" or "0" is false
• Any other value is true. This means:
  – No number is false except 0
  – any undefined value is false
  – any reference is true (see perldoc perlref)
• Examples:
    0        # becomes the string "0", so false
    1        # becomes the string "1", so true
    0.00     # becomes 0, would convert to the string "0", so false
    ""       # The null string, so false
    "0.00"   # the string "0.00", neither empty nor "0", so true
    undef()  # a function returning the undefined value, so false
1.7 ¢ ¢ ¢ Perl — slide #56 Input and Output Read from standard input like this: my $value = ; Note that there will be a newline character read at the end – To remove trailing newline, use chomp: chomp $value; – The word STDIN is a predefined filehandle. You can define your own filehandles with the open builtin function. C ¢ Statements for Looping and Conditions We look at the following statements in the language: – if. . . elsif. . . else statements — 31 The unless statement is similar to the if statement — 32 processing input using while The <> operator e e e C e e ¢ ¢ – while loops — 32 C C C – for loops — 35 – foreach loops — 36 iterating over arrays and hashes with foreach, while — 36– 37 e e e write to standard output with the list operator print – print takes a list of strings: print "The product of $a and $b is ", $a * $b, "\n"; ¢ ¢ – Exit early from a loop with last, and next — 38 We will also look at “backwards statements” — 38– 39 e SNM — ver. 1.7 Perl — slide #55 SNM — ver. 1.7 Perl — slide #57 e Statements if statements work as in C or Java, except: – braces are required, not optional – Use elsif instead of else if Example: if ( $age print } elsif ( print } else { print } > $max ) { "Too old\n"; $age < $min ) { "Too young\n"; "Just right\n"; Perl — slide #58 ¢ loop – . . . but braces are required: yt  p t € ¢ i Just as in C or Java while ( $tickets_sold < 1000 ) { $available = 1000 - $tickets_sold; print "$available tickets are available. "How many do you want: "; $purchase = ; chomp $purchase; $tickets_sold += $purchase; } ¢ ¢ ", SNM — ver. 1.7 Perl — slide #60 SNM — ver. 1.7 Statement – except that the block is executed if the condition is false: gs € ih ¢ h Same as if statement, unless ( $destination eq $home { print "I’m not going home.\n"; } Input with ¢  p t € i corresponds to: unless ( condition ) { statements. . . 
• An unless block corresponds to an if with the condition negated:
    if ( ! ( condition ) ) {
        statements...;
    }
• else works with unless, but I suggest you don't use it.
  – Use if...else instead.

Input with while
• Input is often done using while:
    while ( $line = <STDIN> ) {
        process this $line
    }
• This loop iterates once for each line of input and terminates at end of file.

The Special $_ Variable
• Nearly every built-in input function, many input operators, most statements with input, and regular expressions use the special variable $_.
• If you don't specify a variable, Perl uses $_.
• For example, this while loop reads one line from standard input at a time, and prints that line:
    while ( <STDIN> ) {
        print;
    }
  – The while loop reads one line into $_ at each iteration.
  – The print statement prints the value of $_ if you do not tell it to print anything else.
• See the Perl Reference on page 2 under Conventions.

$_ and the <> operator
• Most input is done using the <> operator with a while loop.
• The <> operator processes files named on the command line.
  – These are called command line parameters or command line arguments.
  – If you execute it like this:
      angle-brackets.pl
    then you have no command line arguments passed to the program.
  – But if you execute it like this:
      angle-brackets.pl file_1 file_2 file_3
    then the command line has three arguments, which here happen to be the names of files.

$_ and the <> operator — 2
• We most often use the <> operator like this:
    while ( <> ) {
        statements...
    }
• This loop does a lot. The pseudocode here shows what it does:
    if there are no command line arguments,
        while there are lines to read from standard input
            read next line into $_
            execute statements...
    else
        for each command line argument
            open the file
            while there are lines to read
                read next line from the file into $_
                execute statements...
            close the file
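The pseudocode above can be written out in Perl as a rough sketch. The subroutine name for_each_line is hypothetical, and the code uses two features not covered in these notes (a subroutine reference passed as a parameter, and a filehandle held in a my variable). The real <> operator also differs in details: for example, it warns and moves on to the next file when a file cannot be opened, whereas this sketch dies.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: call $action once per input line, with the line
# in $_, roughly as "while ( <> ) { ... }" would.
sub for_each_line {
    my ( $action, @files ) = @_;
    if ( @files == 0 ) {
        # No file names given: read standard input.
        while ( defined( my $line = <STDIN> ) ) {
            local $_ = $line;
            $action->();
        }
    }
    else {
        foreach my $file (@files) {
            open my $fh, '<', $file or die "cannot open $file: $!";
            while ( defined( my $line = <$fh> ) ) {
                local $_ = $line;
                $action->();
            }
            close $fh;
        }
    }
}

# With file names on the command line, print every line of every file,
# much as "while ( <> ) { print; }" would.
for_each_line( sub { print }, @ARGV ) if @ARGV;
```

In everyday code you would simply write while ( <> ); the expansion is only meant to show how much that one line does.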
for loop
• The for loop works as in C or Java, except that braces are required, not optional.
• Example:
    for ( $i = 0; $i < $max; ++$i ) {
        $sum += $array[ $i ];
    }
• Note that we rarely use this type of loop in Perl. Instead, use the higher-level foreach loop...

foreach loop
• The foreach loop iterates over an array or list.
• It is the most useful looping construct in Perl.
• It is so good that Java 1.5 has borrowed this type of loop to simplify iterators.
• An example that adds 1 to each element of an array:
    foreach my $a ( @array ) {
        ++$a;
    }
• $a here is an alias for each element of the array in turn, so changing $a actually changes the array element.
• You can write "for" or "foreach"; Perl won't mind.

Iterating over a Hash
• Referring to our example hash in slide 22, we can process each element like this:
    foreach my $key ( keys %hash ) {
        process $hash{$key}
    }
  – keys creates a temporary list of all the keys of the hash.
  – We then loop through that list with foreach.
• More efficient is to use the each built-in function, which truly iterates through the hash:
    while ( my ( $key, $value ) = each %hash ) {
        process $key and $value
    }

Iterating over a Hash in Sorted Order
• Did we process the contents of %hash in alphabetical order in slide 36?
  – No.
  – So what do we do if we want to print the elements in order?
      In order of key by alphabet? Numerically?
      In order of element by alphabet? Numerically?
• Use the built-in sort function.
  – See perldoc -f sort.
• You cannot sort a hash...
• ...but you can read all the keys, sort them, then process each element in that order:
    foreach my $key ( sort keys %hash ) {
        process $hash{$key}
    }
• A reverse sort:
    foreach my $key ( reverse sort keys %hash ) {
        process $hash{$key}
    }
  – See perldoc -f reverse.
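The questions above about numeric order and ordering by element can be answered by giving sort a comparison block. The sketch below uses a made-up hash %uid, not the example hash from slide 22; the block form of sort and the <=> operator are described in perldoc -f sort.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical data: user name => numeric user ID.
my %uid = ( nicku => 500, root => 0, daemon => 2, albert => 1000 );

# Keys in alphabetical order (the default comparison is by string).
my @by_name = sort keys %uid;

# Keys ordered by their values, compared numerically with <=>.
my @by_uid = sort { $uid{$a} <=> $uid{$b} } keys %uid;

# Values sorted as strings: "1000" sorts before "2" and "500".
my @uids_as_strings = sort values %uid;

print "by name:         @by_name\n";
print "by uid:          @by_uid\n";
print "uids as strings: @uids_as_strings\n";
```

The last line shows why numbers usually need <=>: the default string comparison puts 1000 before 2.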
"Backwards" Statements
• Put an if, while or foreach modifier after a simple statement.
• You can write a simple statement (i.e., with no braces), and put one of these afterwards:
    if EXPR
    unless EXPR
    while EXPR
    until EXPR
    foreach EXPR

"Backwards" Statements—Examples
• Examples:
  – print $1 if /(\d{9})/;
    is equivalent to:
    if ( /(\d{9})/ ) { print $1; }
  – # print unless this is a blank line:
    print unless /^\s*$/;
    is equivalent to:
    if ( ! /^\s*$/ ) { print; }

Exit a Loop Early
• Java and C provide break and continue.
• Perl provides last and next:
    my @super_people = qw( Superman Robin Wonder Woman Batman Superboy );
    foreach my $person ( @super_people ) {
        next if $person eq "Robin";
        print "$person\n";
        last if $person eq "Batman";
    }
• What do you think this program will print?

Array Operations—push and pop
• The documentation for these is in the very loo–oong document perlfunc, and is best read with perldoc -f Function.
• push: add a value at the end of an array, e.g.,
    my @array = ( 1, 2, 3 );
    push @array, 4;
    # now @array contains ( 1, 2, 3, 4 )
  – Do perldoc -f push
• pop: remove and return the value at the end of an array:
    my @array = ( 1, 2, 3 );
    my $element = pop @array;
    # now @array contains ( 1, 2 )
    # and $element contains 3
  – Do perldoc -f pop

split and join
• Do perldoc -f split and perldoc -f join.
• split splits a string into an array:
    my $pwline = "nicku:x:500:500:Nick Urbanik:/home/nicku:/bin/bash";
    my ( $userid, $pw, $userid_number, $group_id_number,
         $name, $home_dir, $shell ) = split /:/, $pwline;
• Another application is reading two or more values on the same input line:
    my ( $a, $b ) = split ' ', <STDIN>;
• join is the opposite of split and joins an array into a string:
    my $pwline = join ':', @pwfields;
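As a small sketch of how split and join fit together, the script below takes a password line apart, changes one field, and puts it back together. The replacement shell /bin/zsh is made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $pwline = "nicku:x:500:500:Nick Urbanik:/home/nicku:/bin/bash";

# Split the record into its seven fields.
my @pwfields = split /:/, $pwline;
print "There are ", scalar @pwfields, " fields; the shell is $pwfields[6]\n";

# Change one field (a made-up new shell), then rebuild the line.
$pwfields[6] = "/bin/zsh";
my $new_pwline = join ':', @pwfields;
print "$new_pwline\n";
```

Because join uses the same separator that split removed, every field other than the changed one survives the round trip unchanged.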
Array Ops—shift and unshift
• shift: remove and return the value at the beginning of an array, e.g.,
    my @array = ( 1, 2, 3 );
    my $element = shift @array;
    # now @array contains ( 2, 3 )
    # and $element contains 1
  – Do perldoc -f shift
• unshift: add a value to the beginning of an array, e.g.,
    my @array = ( 1, 2, 3 );
    unshift @array, 4;
    # now @array contains ( 4, 1, 2, 3 )
  – Do perldoc -f unshift

Subroutines
• See perldoc perlsub
• Syntax:
    sub subroutine_name {
        statements...
    }

Parameters — 1
• Subroutine calls pass their parameters to the subroutine in a list named @_.
• It is best to show with an example:
    #! /usr/bin/perl -w
    use strict;
    sub product {
        my ( $a, $b ) = @_;
        return $a * $b;
    }
    print "enter two numbers on one line: a b ";
    my ( $x, $y ) = split ' ', <STDIN>;
    print "The product of $x and $y is ", product( $x, $y ), "\n";

Parameters — 2
• Parameters are passed in one list, @_.
• If you are passing one parameter, then the built-in function shift will conveniently remove the first item from this list, e.g.,
    sub square {
        my $number = shift;
        return $number * $number;
    }

Checking for Errors: die and warn
• System calls can fail; examples:
  – Attempt to read a file that doesn't exist
  – Attempt to execute an external program that you do not have permission to execute
• In Perl, use the built-in function die with the or operator to terminate (or raise an exception) on error:
    chdir '/tmp' or die "can't cd to tmp: $!";
• die and warn both print a message to STDERR, but die raises a fatal exception, whereas warn lets the program continue.
• If there is no newline at the end of the string, die and warn print the program name and line number where they were called.
• $! holds the value of the last system error message.
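The sketch below combines the last few slides: a subroutine that takes any number of parameters from @_, and die and warn for reporting problems. The subroutine mean is made up for illustration, and eval, used here to trap the exception that die raises, is not covered in these notes (see perldoc -f eval).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical example: the average of any number of values passed in @_.
sub mean {
    my @numbers = @_;
    # No trailing newline, so die appends the file name and line number.
    die "mean() needs at least one number" if @numbers == 0;
    my $sum = 0;
    foreach my $n (@numbers) {
        $sum += $n;
    }
    return $sum / @numbers;
}

print "mean of 2, 4 and 9 is ", mean( 2, 4, 9 ), "\n";

# warn reports on STDERR and the program carries on.
warn "about to call mean() with no arguments\n";

# eval traps the exception raised by die; the message is left in $@.
my $result = eval { mean() };
print "mean() failed: $@" unless defined $result;
```

Without the eval block, the second call would end the program with the same message on STDERR.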
Files and Filehandles
• STDIN, STDOUT and STDERR are predefined filehandles.
• You can define your own using the open built-in function.
• By convention, filehandle names are generally written in all upper-case letters.
• Example: open for input:
    use strict;
    open PASSWD, '<', "/etc/passwd"
        or die "unable to open passwd file: $!";
    while ( <PASSWD> ) {
        my ( $user ) = split /:/;
        print "$user\n";
    }
    close PASSWD;

Open for Writing
• To create a new file for output, use ">" instead of "<" with the file name.
    use strict;
    open OUT, '>', "data.txt"
        or die "unable to open data.txt: $!";
    for ( my $i = 0; $i < 10; ++$i ) {
        print OUT "Time is now ", scalar localtime, "\n";
    }
    close OUT;
• Note there is no comma after the filehandle in print.
• To append to a file if it exists, or otherwise create a new file for output, use ">>" instead of ">" with the file name.

Executing External Programs
• Many ways of doing this:
  – system built-in function
  – backticks
  – many other ways not covered here.

system
• Example:
    my @cmd = ( 'useradd', '-c', $name, '-p', $hashed_passwd, $id );
    print "@cmd\n";
    system @cmd;
• This also works:
    system "useradd -c \"$name\" -p \"$hashed_passwd\" $id";
• Difference: the second form is usually passed to a command shell (such as /bin/sh or CMD.EXE) to execute, whereas the first form is executed directly.
  – Because no shell is involved in the first form, $name needs no extra quotes there; quote characters would be passed to useradd as part of the comment.

Was system Call Successful?
• Check that the return value of system was zero:
    if ( system( "useradd -c \"$name\" -p \"$hashed_passwd\" $id" ) != 0 ) {
        print "useradd failed";
        exit;
    }
• This is usually written in Perl more simply using the built-in function die, and the or operator:
    system( "useradd -c \"$name\" -p \"$hashed_passwd\" $id" ) == 0
        or die "useradd failed";

Was system Call Successful? — 2
• I usually prefer to call system like this:
    my @cmd = ( 'useradd', '-c', $name, '-p', $hashed_passwd, $id );
    print "@cmd\n";
    system( @cmd ) == 0
        or die "Can't execute @cmd";
  – The parentheses matter: without them, system @cmd == 0 passes the result of the comparison @cmd == 0 to system.

Backticks: `` or qx{}
• Perl provides command substitution.
• Just like in shell programming, where the output of the program replaces the code that calls it:
    print `ls -l`;
• Note that you can write qx{...} instead:
    print qx{df -h /};
  – qx// is mentioned in slide 29

See the perl summary
• The Perl summary on the subject web site provides... well, a good summary!
• Called perl.pdf
• Stored in the same directory as these notes

Regular Expressions
• Regular expressions are available as part of the programming languages Java, JScript, Visual Basic and VBScript, JavaScript, C, C++, C#, elisp, Perl, Python, Ruby, PHP, sed, awk, and in many applications, such as editors, grep, egrep.
• "Regular Expressions help you master your data." — Sales Department.

What is a Regular Expression?
• Powerful.
• Low level description:
  – Describes some text
  – Can use to:
      Verify a user's input
      Sift through large amounts of data
• High level description:
  – Allows you to master your data

How to use a Regular Expression
• How to make a regular expression as part of your program
Regular Expressions as a language
• Can consider regular expressions as a language.
• Made of two types of characters:
  – Literal characters
      Normal text characters
      Like the words of the program
  – Metacharacters
      The special characters + ? . * ^ $ ( ) [ { | \
      Act as the grammar that combines with the words according to a set of rules to create an expression that communicates an idea

What do they look like?
• In Perl, a regular expression begins and ends with '/', like this: /abc/
• /abc/ matches the string "abc"
  – Are these literal characters or metacharacters?
• Returns true if it matches, so it is often used as the condition in an if statement.

The "match operator" =~ — 2
    # sets the string to be searched:
    $_ = "perl for Win32";
    # is 'perl' inside $_?
    if ( $_ =~ /perl/ ) { print "Found perl\n" };
    # Same as the regex above.
    # Don't need the =~ as we are testing $_:
    if ( /perl/ ) { print "Found perl\n" };

Example: searching for "Course:"
• Problem: want to print all lines in all input files that contain the string "Course:"
    while ( <> ) {
        my $line = $_;
        if ( $line =~ /Course:/ ) {
            print $line;
        }
    }
• Or more concisely:
    while ( <> ) {
        print if $_ =~ /Course:/;
    }
• or even (a statement can take only one modifier, so the test is joined to print with and):
    /Course:/ and print while <>;
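To try the "Course:" search without preparing input files, the lines can be held in an array. This is only a sketch: the sample lines are made up, and the built-in grep function used at the end is not covered in these notes (see perldoc -f grep).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up sample lines standing in for the contents of the input files.
my @lines = (
    "Course: Perl for administrators\n",
    "Year 1 student list\n",
    "  Course: Network management\n",
);

foreach my $line (@lines) {
    # =~ applies the pattern to $line instead of the default $_.
    print $line if $line =~ /Course:/;
}

# grep collects the matching lines; inside the block each line is in $_,
# so the bare pattern is enough.
my @course_lines = grep { /Course:/ } @lines;
print scalar(@course_lines), " of ", scalar(@lines), " lines matched\n";
```

The pattern is not anchored, so the indented third line matches as well as the first.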
The "match operator" =~
• If you just use /Course:/, this returns true if $_ contains the string "Course:".
• If you want to test another string variable $var to see if it contains the regular expression, use:
    $var =~ /regular expression/
• Under what condition is this true?

Matching without case sensitivity
    $_ = "perl for Win32";
    # this will fail because the case doesn't match:
    if ( /PeRl/ ) { print "Found PeRl\n" };
    # this will match, because there is an 'er' in 'perl':
    if ( /er/ ) { print "Found er\n" };
    # this will match, because there is an 'n3' in 'Win32':
    if ( /n3/ ) { print "Found n3\n" };
    # this will fail because the case doesn't match:
    if ( /win32/ ) { print "Found win32\n" };
    # This matches because the /i at the end means
    # "match without case sensitivity":
    if ( /win32/i ) { print "Found win32 (i)\n" };

Using !~ instead of =~
    # Looking for a space:
    print "Found!\n" if $_ =~ / /;
    # both these are the same, but reversing the logic with
    # unless and !~
    print "Found!!\n" unless $_ !~ / /;
    print "Found!!\n" unless ! / /;

Embedding variables in regexps
    # Create two variables containing
    # regular expressions to search for:
    my $find = 32;
    my $find2 = " for ";
    if ( /$find/ ) { print "Found '$find'\n" };
    if ( /$find2/ ) { print "Found '$find2'\n" };
    # different way to do the above:
    print "Found $find2\n" if /$find2/;
• This is the meaning of the "Yes" under "Interpolates" in the table on slide 29, on the row for m//.

Character Classes
    my @names = ( "Nick", "Albert", "Alex", "Pick" );
    foreach my $name ( @names ) {
        if ( $name =~ /[NP]ick/ ) {
            print "$name: Out for a Pick Nick\n";
        } else {
            print "$name is not Pick or Nick\n";
        }
    }
• A character class in square brackets matches exactly one character from the set.
The Metacharacters
• The funny characters
• What they do
• How to use them

Examples of use of [ ]
• Match a capital letter: [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
• Same thing: [A-Z]
• Match a vowel: [aeiou]
• Match a letter or digit: [A-Za-z0-9]

Negated character class: [^ ]
• Match any single character that is not a letter: [^A-Za-z]
• Match any character that is not a space or a tab: [^ \t]

Example using [^ ]
• This simple program prints only lines that contain characters that are not a space:
    while ( <> ) {
        print $_ if /[^ ]/;
    }
  – Note that the newline at the end of a line also counts as a character that is not a space.
• This prints lines that start with a character that is not a space:
    while ( <> ) {
        print if /^[^ ]/;
    }
• Notice that ^ has two meanings: one inside [...], the other outside.

Shorthand: Common Character Classes
• Since matching a digit is very common, Perl provides \d as a short way of writing [0-9]
• \D matches a non-digit: [^0-9]
• \s matches any whitespace character; shorthand for [ \t\n\r\f]
• \S non-whitespace, [^ \t\n\r\f]
• \w word character, [a-zA-Z0-9_]
• \W non-word character, [^a-zA-Z0-9_]

Matching any character
• The dot matches any character except a newline.
• This matches any line with at least 5 characters before the newline:
    print if /...../;

Matching the beginning or end
• To match a line that contains exactly five characters before the newline:
    print if /^.....$/;
• The ^ matches the beginning of the line.
• The $ matches at the end of the line.

Matching Repetitions: * + ? {}
• To match zero or more:
  – /a*/ will match zero or more letter 'a', so matches "", "a", "aaaa", "qwereqwqwer", or the nothing in front of anything!
• To match at least one:
  – /a+/ matches at least one "a"
• To match an optional item, or a counted range:
  – /a?/ matches zero or one "a"
  – /a{3,5}/ matches between 3 and 5 "a"s.
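The quantifiers above are easy to try out. In this sketch the helper names verdict and report are made up; the patterns are passed as single-quoted strings, so the backslashes and dollar signs reach the regular expression unchanged.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: say whether $string matches the pattern in $pattern.
sub verdict {
    my ( $string, $pattern ) = @_;
    return $string =~ /$pattern/ ? "matches" : "does not match";
}

# Hypothetical helper: print one line of the report.
sub report {
    my ( $string, $pattern ) = @_;
    print "'$string' ", verdict( $string, $pattern ), " /$pattern/\n";
}

report( 'qwerty', 'a*' );          # zero 'a's is enough
report( 'qwerty', 'a+' );          # needs at least one 'a'
report( 'banana', 'a+' );
report( 'aa',     '^a{3,5}$' );    # too few
report( 'aaaa',   '^a{3,5}$' );
report( 'aaaaaa', '^a{3,5}$' );    # too many once anchored at both ends
report( 'aaaaaa', 'a{3,5}' );      # unanchored: some run of 3 to 5 suffices
report( 'x7',     '^\w\d$' );      # one word character, then one digit
```

The last two pairs show why ^ and $ matter: without them a quantifier only has to be satisfied somewhere inside the string.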
Example using .*
    $_ = 'Nick Urbanik <nicku@vtc.edu.hk>';
    print "found something in <>\n" if /<.*>/;
    # Find everything between quotes:
    $_ = 'He said, "Hi there!", and then "What\'s up?"';
    print "quoted!\n" if /"[^"]*"/;
    print "too much!\n" if /".*"/;

Capturing the Match with ( )
• Often want to scan large amounts of data, extracting important items.
• Use parentheses and regular expressions.
• Silly example of capturing an email address:
    $_ = 'Nick Urbanik <nicku@vtc.edu.hk>';
    print "found $1 in <>\n" if /<(.*)>/;

Capturing the match: greediness
• Look at this example:
    $_ = 'He said, "Hi there!", and then "What\'s up?"';
    print "$1\n" if /"([^"]*)"/;
    print "$1\n" if /"(.*)"/;
• What will each print?
• The first one works; the second one prints:
    Hi there!", and then "What's up?
• Why?
• Because *, ?, +, {m,n} are greedy!
• They match as much as they possibly can!

Being Stingy (not Greedy): ?
• Usually greedy matching is what we want, but not always.
• How can we match as little as possible?
• Put a ? after the quantifier:
    *?      Match 0 or more times
    +?      Match 1 or more times
    ??      Match 0 or 1 time
    {n,}?   Match at least n times
    {n,m}?  Match at least n, but no more than m times

Being Less Greedy: Example
• We can solve the problem we saw earlier using non-greedy matching:
    $_ = 'He said, "Hi there!", and then "What\'s up?"';
    print "$1\n" if /"([^"]*)"/;
    print "$1\n" if /"(.*?)"/;
• These both work, and match only:
    Hi there!

Capturing the Match: (...)
    # useradd() is a function defined elsewhere
    # that creates a computer account with
    # username as first parameter, password as
    # the second parameter
    while ( <> ) {
        if ( /^(\d{9})\t([A-Z]\d{6}\([\dA]\))/ ) {
            my $student_id = $1;
            my $hk_id = $2;
            useradd( $student_id, $hk_id );
        }
    }
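The capturing loop above can be tried without a useradd() function by collecting the captured fields in a hash instead. This is a sketch under assumptions: the input lines are made up, and the hash name %hk_id_of is only for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up input: a heading, a blank line, and two tab-separated records.
my @lines = (
    "Student ID\tHK ID\n",
    "\n",
    "123456789\tH123456(1)\n",
    "987654321\tA123456(A)\n",
);

my %hk_id_of;    # student ID => HK ID
foreach (@lines) {
    # $1 and $2 are only meaningful after a successful match,
    # so use them inside the if block.
    if ( /^(\d{9})\t([A-Z]\d{6}\([\dA]\))/ ) {
        $hk_id_of{$1} = $2;
    }
}

foreach my $student_id ( sort keys %hk_id_of ) {
    print "$student_id => $hk_id_of{$student_id}\n";
}
```

The heading and the blank line are skipped simply because they do not match the pattern; no separate test for them is needed.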
Sifting through large amounts of data
• Imagine you need to create computing accounts for thousands of students.
• As input, you have data of the form:
  – Some heading on the top of each page
  – More headings with other content, including blank lines
  – A tab character separates the columns
    123456789    H123456(1)
    234567890    I234567(2)
    345678901    J345678(3)
    ...          ...
    987654321    A123456(1)

The Substitution Operator s///
• Sometimes want to replace one string with another (editing).
• Example: want to replace Nicholas with Nick on input files:
    while ( <> ) {
        $_ =~ s/Nicholas/Nick/;
        print $_;
    }

Substitution and the /g modifier
• If an input line contains:
    Nicholas Urbanik read "Nicholas Nickleby"
  then the output is:
    Nick Urbanik read "Nicholas Nickleby"
• How do we change all the Nicholas in one line?
• Use the /g (global) modifier:
    while ( <> ) {
        $_ =~ s/Nicholas/Nick/g;
        print $_;
    }

Avoiding leaning toothpicks
• Want to change a filename, edit the directory in the path from, say, /usr/local/bin/filename to /usr/bin/filename
• Could do like this:
  – s/\/usr\/local\/bin\//\/usr\/bin\//;
  – but this makes me dizzy!
• We can do this instead:
  – s!/usr/local/bin/!/usr/bin/!;
• Can use any character instead of / in s///
  – For matches, can put m//, and use any char instead of /
  – Can also use parentheses or braces:
  – s{...}{...} or m{...}

Readable regex: /x Modifier
• Sometimes regular expressions can get long, and need comments inside so others (or you later!) understand.
• Use /x at the end of s///x or m//x.
• Allows white space, newlines, comments.
• See example on slide 9.

Special Vars: Input Record Separator
• When I described the <> operator, I lied a little.
• As while ( <> ) { ...} executes, it iterates once per record, not just once per line.
• The definition of what a record is is given by the special built-in variable $/, the Input Record Separator.
  – The default value is a newline, so by default we read one line at a time.
• But useful alternatives are paragraph mode and whole-file mode.

Paragraph, Whole-file Modes
• To input in paragraph mode, put this line before you read input:
    $/ = "";
• Then when you read input, it will be split at two or more newlines.
  – You could split the fields at the newlines.
• To slurp a whole file into one string, you can do:
    undef $/;
    $_ = <FH>;        # slurp whole file into $_
    s/\n[ \t]+/ /g;   # fold indented lines
• See perldoc perlvar and perldoc -f local for important information on how to localise the change to $/.

Localising Global Variables
• It is not a good idea to globally change $/ (or even $_).
  – Your program may use other modules, and they may behave differently if $/ is changed.
  – Best to localise the change to $/ (or $_, ...).
• Example localising whole-file mode:
    my $content;
    open FH, "foo.txt" or die $!;
    {
        local $/;
        $content = <FH>;
    }
    close FH;
• For paragraph mode, put:
    local $/ = "";

One Line Perl Programs
• Called "one liners".
• Just execute on the command line.
• See perldoc perlrun.
• Example:
    $ perl -pi.backup -e 's/Silly/Sensible/g' fileA fileB
  – edits the files fileA and fileB
  – makes backups of the original files in fileA.backup and fileB.backup
  – replaces all instances of "Silly" with "Sensible".
• Useful for editing configuration files in shell scripts, automating tasks.

References
• Learning Perl, 3rd Edition, Randal L. Schwartz and Tom Phoenix, ISBN 0-596-00132-0, O'Reilly, July 2001.
  – The second edition is fine, too. Don't bother with the first edition; it is too old.
• Perl Reference Guide, Johan Vromans; handed out to each one of you, and will be handed out in the final examination. Become familiar with it.
• Perl for System Administration: Managing multi-platform environments with Perl, David N. Blank-Edelman, ISBN 1-56592-609-9, O'Reilly, July 2000.
• Perl Cookbook, 2nd Edition, Tom Christiansen and Nathan Torkington, ISBN 0-596-00313-7, O'Reilly, August 2003.
  – The first edition is fine, too.
• Object Oriented Perl, Damian Conway, ISBN 1-884777-79-1, Manning, 2000. A more advanced book for those wanting to build bigger projects in Perl.
• Don't forget perldoc and all the other documentation on your hard disk.