Software Carpentry

License

Copyright © 2005-06 Python Software Foundation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Introduction


Introduction


Self Assessment


The State of Play


Meeting Standards


The Grass Isn't That Much Greener


Hidden in Plain Sight


The Times They Are A-Changin'


This Course


Setting Up


A Note on Tool Choice


Contributing


Recommended Reading


Typographic Conventions


Summary


Exercises

Exercise 2.1:

What is the largest software project you have ever worked on? How well did it meet its original objectives? What is the most important thing you learned from it?

Exercise 2.2:

Write a point-form list of the programming tools you use on a regular basis. When and how did you learn each one? How proficient do you think you are with each? Compared to whom?

Exercise 2.3:

Suppose you have been given one week to write a program to translate old-style configuration files to a new syntax. Write a point-form description of how you would go about it.

Exercise 2.4:

Rewrite the following fragment of code to make it more readable. Don't worry about the fact that you don't know the language it's written in; feel free to use any functions or language features you're familiar with from other languages.

i = open('oldconfig.cnf', 'r');
ll = i.readlines();
for j in 0..len(ll) {
    if len(j) > 0 {
        if not defined(r) r = new list;
        r.append(j);
    }
}
sort(r);
print 'longest line is', r[0];

Exercise 2.5:

What are the errors in the function shown below? Don't worry about the lack of variable declarations: this language doesn't need them. Note that, like C and Java, this language uses 0 as the first index for lists.

# Calculate a running sum of a list of numbers.
# If the input values are [1, 2, 3], the final values are [1, 3, 6].

def running_sum(values) {
    i = 1;
    while (i < len(values)) {
        values[i] = values[i] + values[i-1];
    }
}

Exercise 2.6:

A sub-contractor in Euphoristan has just written a function that takes two lists of phone numbers (represented as strings), and returns all those in the first list that are not in the second. You only have a few minutes to test it before she goes off-line for the weekend; what are the first half-dozen test cases you would try?


Shell Basics


Introduction


You Can Skip This Lecture If...


The Shell


The Shell is Not the Operating System


The File System


Paths


Navigating the File System


Execution Cycle


Providing Options


Creating Files and Directories


Looking at Files


Basic Tools


Summary


Exercises

Exercise 3.1:

Suppose ls shows you this:

Makefile    biography.txt   data    enrolment.txt   programs    thesis

What argument(s) will make it print the names in reverse, like this:

thesis  programs    enrolment.txt   data    biography.txt   Makefile

Exercise 3.2:

What does the command cd ~ do? What about cd ~hpotter?

Exercise 3.3:

What command will show you the first 10 lines of a file? The first 25? The last 12?

Exercise 3.4:

What do the commands pushd, popd, and dirs do? Where do their names come from?

Exercise 3.5:

How would you send the file earth.txt to the default printer? How would you check it made it (other than wandering over to the printer and standing there)?

Exercise 3.6:

The instructor wants you to use a hitherto unknown command for manipulating files. How would you get help on this command?

Exercise 3.7:

diff finds and displays the differences between two text files. For example, if you modify earth.txt to create a new file earth2.txt that contains:

Name: Earth
Period: 365.26 days
Inclination: 0.00 degrees
Eccentricity: 0.02
Satellites: 1

you can then compare the two files like this:

$ diff earth.txt earth2.txt
3c3
< Inclination: 0.00
---
> Inclination: 0.00 degrees
4a5
> Satellites: 1

(The rather cryptic header "3c3" means that line 3 of the first file must be changed to get line 3 of the second; "4a5" means that a line is being added after line 4 of the original file.)
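The change and add operations described above can also be seen from Python. The sketch below is only an illustration: it uses the standard difflib module, whose matching algorithm is not the one diff uses, and the names original, modified, and changes are made up for this example. The file contents are typed in as lists so that the example does not depend on earth.txt being present.

```python
import difflib

# Contents of earth.txt and earth2.txt, one list element per line.
original = [
    "Name: Earth",
    "Period: 365.26 days",
    "Inclination: 0.00",
    "Eccentricity: 0.02",
]
modified = [
    "Name: Earth",
    "Period: 365.26 days",
    "Inclination: 0.00 degrees",
    "Eccentricity: 0.02",
    "Satellites: 1",
]

# get_opcodes() yields (tag, i1, i2, j1, j2): lines original[i1:i2]
# become modified[j1:j2]. Indices are 0-based and half-open.
matcher = difflib.SequenceMatcher(None, original, modified)
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]

for tag, i1, i2, j1, j2 in changes:
    print("%s: original[%d:%d] -> modified[%d:%d]" % (tag, i1, i2, j1, j2))
```

The "replace" entry covers the third line of each file, which is what diff reports as "3c3". The "insert" entry adds the fifth line of the new file after the fourth line of the old one, which is what diff reports as "4a5".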

What flag(s) should you give diff to tell it to ignore changes that just insert or delete blank lines? What if you want to ignore changes in case (i.e., treat lowercase and uppercase letters as the same)?


More Shell


Introduction


You Can Skip This Lecture If...


Wildcards


Redirecting Input and Output


Redirection Examples


Pipes


Environment Variables


Setting Environment Variables


Configuration


How the Shell Finds Programs


Common Search Path Entries


Cygwin on Windows


File Ownership and Permissions


Directory Permissions


Changing Permissions


Ownership and Permission: Windows


More Advanced Tools


Summary


Exercises

Exercise 4.1:

-rwxr-xr-x   1 aturing   cambridge  69 Jul 12 09:17 mars.txt
-rwxr-xr-x   1 ghopper   usnavy     71 Jul 12 09:15 venus.txt

According to the listing of the data directory above, who can read the file earth.txt? Who can write it (i.e., change its contents or delete it)? When was earth.txt last changed? What command would you run to allow everyone to edit or delete the file?

Exercise 4.2:

Suppose you want to remove all files whose names (not including their extensions) are of length 3, start with the letter a, and have .txt as extension. What command would you use? For example, if the directory contains three files a.txt, abc.txt, and abcd.txt, the command should remove abc.txt, but not the other two files.

Exercise 4.3:

You're worried your data files can be read by your nemesis, Dr. Evil. How would you check whether or not he can, and if necessary change permissions so only you can read or write the files?

Exercise 4.4:

What's the difference between the commands cd HOME and cd $HOME?

Exercise 4.5:

Suppose you want to list the names of all the text files in the data directory that contain the word "carpentry". What command or commands could you use?

Exercise 4.6:

Suppose you have written a program called analyze. What command or commands could you use to display the first ten lines of its output? What would you use to display lines 50-100? To send lines 50-100 to a file called tmp.txt?

Exercise 4.7:

The command ls data > tmp.txt writes a listing of the data directory's contents into tmp.txt. Anything that was in the file before the command was run is overwritten. What command could you use to append the listing to tmp.txt instead?

Exercise 4.8:

What command(s) would you use to find out how many subdirectories there are in the lectures directory?

Exercise 4.9:

What does rm *.ch do? What about rm *.[ch]?

Exercise 4.10:

What command(s) could you use to find out how many instances of a program are running on your computer at once? For example, if you are on Windows, what would you do to find out how many instances of svchost.exe are running? On Unix, what would you do to find out how many instances of bash are running?

Exercise 4.11:

A colleague asks for your data files. How would you archive them to send as one file? How could you compress them?

Exercise 4.12:

You have changed a text file on your home PC, and mailed it to the university terminal. What steps can you take to see what changes you may have made, compared with a master copy in your home directory?

Exercise 4.13:

How would you change your password?

Exercise 4.14:

grep is one of the more useful tools in the toolbox. It finds lines in files that match a pattern and prints them out. For example, assume the files earth.txt and venus.txt contain lines like this:

Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02

grep can extract lines containing the text "Period" from all the files:

$ grep Period *.txt
earth.txt:Period: 365.26 days
venus.txt:Period: 224.70 days

Search strings can use regular expressions, which will be discussed in the Regular Expressions lecture. grep takes many options as well; for example, grep -c /bin/bash /etc/passwd reports how many lines in /etc/passwd (the Unix password file) contain the string /bin/bash, which in turn tells me how many users are using bash as their shell.
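To make the counting behavior of grep -c concrete, here is a rough Python equivalent. It is a sketch under two assumptions: the search string is treated as plain text rather than as a regular expression, and the function name count_matching_lines and the sample password-file lines are invented for the example.

```python
def count_matching_lines(lines, text):
    """Count the lines that contain text (lines, not occurrences)."""
    return sum(1 for line in lines if text in line)

# Made-up lines in the style of /etc/passwd.
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "aturing:x:1000:1000::/home/aturing:/bin/bash",
]

bash_users = count_matching_lines(sample, "/bin/bash")
print(bash_users)
```

Like grep -c, this counts a line once even if the text appears in it several times.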

Suppose all you wanted was a list of the files that contained lines matching a pattern, rather than the matches themselves—what flag or flags would you give to grep? What if you wanted the line numbers of matching lines?

Exercise 4.15:

Suppose you wanted ls to sort its output by filename extension, i.e., to list all .cmd files before all .exe files, and all .exe's before all .txt files. What command or commands would you use?

Exercise 4.16:

What does the alias command do? When would you use it?


Version Control


Introduction


You Can Skip This Lecture If...


Problem #1: Collaboration


Solution: Version Control


Problem #2: Undoing Changes


Solution: Version Control (Again)


Which Version Control System?


Basic Use


How To Do It


Resolving Conflicts


Example of Resolving


Example of Resolving (continued)


Example of Resolving (continued)


Starvation


Binary Files


Reverting


Rolling Back


Creating and Checking Out


Subversion Command Reference


Reading Subversion Output


Summary


Exercises

Exercise 5.1:

Follow the instructions given to you by your instructor to check out a copy of the Subversion repository you'll be using in this course. Unless otherwise noted, the exercises below assume that you have done this, and that your working copy is in a directory called course. You will submit all of your exercises in this course by checking files into your repository.

Exercise 5.2:

Create a file course/ex01/bio.txt (where course is the root of your working copy of your Subversion repository), and write a short biography of yourself (100 words or so) of the kind used in academic journals, conference proceedings, etc. Commit this file to your repository. Remember to provide a meaningful comment when committing the file!

Exercise 5.3:

What's the difference between mv and svn mv? Put the answer in a file called course/ex01/mv.txt and commit your changes.

Once you have committed your changes, type svn log in your course directory. If you didn't know what you'd just done, would you be able to figure it out from the log messages? If not, why not?

Exercise 5.4:

In this exercise, you'll simulate the actions of two people editing a single file. To do that, you'll need to check out a second copy of your repository. One way to do this is to use a separate computer (e.g., your laptop, your home computer, or a machine in the lab). Another is to make a temporary directory, and check out a second copy of your repository there. Please make sure that the second copy isn't inside the first, or vice versa—Subversion will become very confused.

Let's call the two working copies Blue and Green. Do the following:

a) Create Blue/ex01/planets.txt, and add the following lines:

Mercury
Venus
Earth
Mars
Jupiter
Saturn

Commit the file.

b) Update the Green working copy. (You should get a copy of planets.txt.)

c) Change Blue/ex01/planets.txt so that it reads:

1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn

Commit the changes.

d) Edit Green/ex01/planets.txt so that its contents are as shown below. Do not do svn update before editing this file, as that will spoil the exercise.

Mercury 0
Venus 0
Earth 1
Mars 2
Jupiter 16 (and counting)
Saturn 14 (and counting)

e) Now, in Green, do svn update. Subversion should tell you that there are conflicts in planets.txt. Resolve the conflicts so that the file contains:

1. Mercury 0
2. Venus 0
3. Earth 1
4. Mars 2
5. Jupiter 16
6. Saturn 14

Commit the changes.

f) Update the Blue working copy, and check that planets.txt now has the same content as it has in the Green working copy.

Exercise 5.5:

Add another line or two to course/ex01/bio.txt and commit those changes. Then, use svn merge to restore the original contents of your biography (course/ex01/bio.txt), and commit the result. (When you are done, bio.txt should look the way it did at the end of the first part of the previous exercise.) Note: the purpose of this exercise is to teach you how to go back in time to get old versions of files. While it would be simpler in this case just to edit bio.txt, you can't (reliably) do that when you've made larger changes, to multiple files, over a longer period of time.

Exercise 5.6:

Subversion allows users to set properties on files and directories using svn propset, and to inspect their values using svn propget. Describe three properties you might want to change on a file or directory, and how you might use them in your current project.


Automated Builds


Introduction


You Can Skip This Lecture If...


Automate, Automate, Automate


Make


Our Example


Hello, Make


Terminology


Multiple Targets


Phony Targets


Dependencies


Updating Dependencies


Conventions


Automatic Variables


Automatic Variables Example


Pattern Rules


Adding More Dependencies


Tidying Up


Defining Macros


Passing Values to Make


Functions


Commonly-Used Functions


Pros and Cons


Alternatives


Summary


Exercises

Exercise 6.1:

Make gets definitions from environment variables, command-line parameters, and explicit definitions in Makefiles. What order does it check these in?


Basic Scripting


Introduction


You Can Skim This Lecture If...


Python's Strengths


Python's Weaknesses


Why Another Language?


Execution Cycle


Running Python Programs


Execution Shortcuts


Variables


Possible Mistakes


Printing


Quoting


Converting Values to Strings


Escape Sequences


Numbers


Arithmetic


Booleans


Short-Circuit Evaluation


Comparisons


String Comparisons


Conditionals


Why Indentation?


While Loops


Break and Continue


String Formatting


Format Specifiers


Supported Formats


Summary

Strings, Lists, and Files


Introduction


You Can Skip This Lecture If...


Strings


Immutability


Slicing


Bounds Checking


Negative Indices


Consequences


Methods


String Methods


Notes on String Methods


Chaining Method Calls


Testing for Membership


Lists


Modifying Lists


Concatenation


Deleting List Elements


List Methods


Notes on List Methods


For Loops


Ranges


Ranged Loops


Membership


Nesting Lists


Aliasing


Indexing vs. Slicing


Tuples


Multi-Valued Assignment


Unpacking Structures in Loops


Files


Copying a File


Looping Over Files


Other Ways To Copy Files


Summary


Exercises

Exercise 8.1:

What does "aaaaa".count("aaa") return? Why?

Exercise 8.2:

What do each of the following five code fragments do? Why?

x = ['a', 'b', 'c', 'd']
x[0:2] = []
x = ['a', 'b', 'c', 'd']
x[0:2] = ['q']
x = ['a', 'b', 'c', 'd']
x[0:2] = 'q'
x = ['a', 'b', 'c', 'd']
x[0:2] = 99
x = ['a', 'b', 'c', 'd']
x[0:2] = [99]

Exercise 8.3:

What does 'a'.join(['b', 'c', 'd']) return? If you have a list of strings, how can you concatenate them in a single statement? Why do you think join is written this way, rather than as ['b', 'c', 'd'].join('a')?


Functions and Libraries


Introduction


You Can Skip This Lecture If...


Defining Functions


Returning Values


Everything Returns Something


Scope


Parameter Passing Rules


Making Copies


Default Parameter Values


Functions Are Objects


Function Object Examples


Function Attributes


Creating Modules


Module Scope


Other Ways to Import


Import Executes Statements


Knowing Who You Are


The System Library


Command-Line Arguments


Standard I/O


The Python Search Path


Exiting


The Math Library


Working with the File System


File and Directory Status


Manipulating Pathnames


Summary


Exercises

Exercise 9.1:

Write a function that takes two strings called text and fragment as arguments, and returns the number of times fragment appears in the second half of text. Your function must not create a copy of the second half of text. (Hint: read the documentation for string.count.)

Exercise 9.2:

What does the Python keyword global do? What are some reasons not to write code that uses it?

Exercise 9.3:

Python allows you to import all the functions and variables in a module at once, making them local names. For example, if the module is called values, and contains a variable called Threshold and a function called limit, then after the statement from values import *, you can refer directly to Threshold and limit, rather than having to use values.Threshold or values.limit. Explain why this is generally considered a bad thing to do, even though it reduces the amount programmers have to type.

Exercise 9.4:

sys.stdin, sys.stdout, and sys.stderr are variables, which means that you can assign to them. For example, if you want to change where print sends its output, you can do this:

import sys

print 'this goes to stdout'
temp = sys.stdout
sys.stdout = open('temporary.txt', 'w')
print 'this goes to temporary.txt'
sys.stdout = temp

Do you think this is a good programming practice? When and why do you think its use might be justified?

Exercise 9.5:

os.stat(path) returns an object whose members describe various properties of the file or directory identified by path. Using this, write a function that will determine whether or not a file is more than one year old.

Exercise 9.6:

Write a Python program that takes as its arguments two years (such as 1997 and 2007), and prints out the number of days between the 15th of each month from January of the first year until December of the last year.

Exercise 9.7:

Write a simple version of which in Python. Your program should check each directory on the caller's path (in order) to find an executable program that has the name given to it on the command line.

Exercise 9.8:

In the default parameter value example, why does total use a default value of None for end, rather than an integer such as 0 or -1?

Exercise 9.9:

What does the * in front of the parameter extras mean in the following code example?

def total(*extras):
    result = 0
    for e in extras:
        result += e
    return result

Hint: look at the following three examples:

print total()
print total(19)
print total(2, 3, 5)

Exercise 9.10:

Use the os.path, stat, and time modules to write a program that finds all files in a directory whose names end with a specific suffix, and which are more than a certain number of days old. For example, if your program is run as oldfiles /tmp .backup 10, it will print a list of all files in the /tmp directory whose names end in .backup that are more than 10 days old.

Exercise 9.11:

The Strings, Lists, and Files lecture ended by showing several different ways to copy files using Python. Read the documentation for the shutil module, and see if there's a simpler way.

Exercise 9.12:

Consider the short program shown below:

def add_and_max(new_value, collection=[]):
    collection.append(new_value)
    return max(collection)

print 'first call:', add_and_max(22)
print 'second call:', add_and_max(9)
print 'third call:', add_and_max(15)

What do you expect its output to be? What is its actual output? Why?


Style


Introduction


You Can Skip This Lecture If...


Reading is Learning


Seven Plus or Minus


The Mind's Eye


What Does This Have to Do With Programming?


Python Style Guide


Naming


Scope and Size


The Difference It Makes


Function Length


What Does This Function Do?


Ways to Answer the Question


Other Sources of Information


Idioms


Style Tools


Python Style Tools


Documentation


More On Documentation


Traceability


Tracing Data


Embedding Documentation


Docstrings


Summary

Quality Assurance


Introduction


You Can Skip This Lecture If...


Limits to Testing


Terminology


Test Results and Specifications


Structuring Tests


A Simple Example


Catching Errors


Simple Exception Example


Exception Objects


Exception Hierarchy


Functions and Exceptions


Raising Exceptions


Exceptional Style


Handling Errors in Tests


Test-Driven Design


TDD Example


Design by Contract


Assertions


Defensive Programming


Summary

Sets, Dictionaries, and Complexity


Introduction


You Can Skip This Lecture If...


Sets


Set Operations


Set Example


How Set Values Are Stored


Immutability


Frozen Sets


A Note on Language Design


Efficiency


Complexity Curves


Algorithmic Complexity


Motivating Dictionaries


Creating and Indexing


Updating Dictionaries


Membership and Loops


Dictionary Methods


Counting Frequency


A Slight Simplification


Imposing Order


Inverting a Dictionary


Another Way to Do It


Formatting Strings with Dictionaries


Extra Keyword Arguments


Extra Positional Arguments


Summary

Debugging


Introduction


You Can Skip This Lecture If...


What's Wrong with Print Statements


Symbolic Debuggers


Debugger Features


Kinds of Debuggers


Integrated Development Environments


Command-Line Debuggers


Inspecting Values


Controlling Execution


Under the Hood


Implementing Breakpoints


Inspecting More Values


Conditional Breakpoints and Watchpoints


Logging


Logging Levels


Logging Example


Agans' Rules


Rule 0: Get It Right the First Time


Rule 1: What Is It Supposed to Do?


Rule 2: Is It Plugged In?


Rule 3: Make It Fail


Alternatives


Rule 4: Divide and Conquer


Rule 5: Change One Thing at a Time, For a Reason


Rule 6: Write It Down


Rule 7: Be Humble


Summary

Object-Oriented Programming


Introduction


Objects to the Rescue


You Can Skip This Lecture If...


Abstract Data Types


Classes and Instances


Defining a Class


Creating an Instance


Methods


Creating Members


Encapsulation


Constructors


Constructor Style


Special Methods


New Classes from Old


Inheritance Example


Overriding Methods


Polymorphism


Duck Typing


The Liskov Substitution Principle


Tidal Pools Revisited


Class, Responsibility, Collaborator


Summary

More on Objects


Introduction


You Can Skip This Lecture If...


Length


Overloading Operators


Commutativity


Other Special Methods


Example: Sparse Vector


How Long is a Sparse Vector?


Vector Behavior


Dot Product


Addition


Testing


Static Data Members


Static Methods


Design Patterns


The Singleton Pattern


Singleton Implementation


Demonstration


The Visitor Pattern


Visitor Implementation


Demonstration


The Abstract Factory Pattern


Abstract Factory Builder


Abstract Factory Manager


Demonstration


The Command Pattern


Base Command Class


A Particular Command


Demonstration


A Few Others


Summary

Unit Testing


Introduction


JUnit and Its Children


You Can Skip This Lecture If...


The Big Idea


Checking


Example: Checking Addition


Running Sums


Flawed Implementation


Check and Re-check


Is This Cost-Effective?


Eliminating Redundancy


Testing Exceptions


Manual Exception Testing Example


Testing I/O


I/O Testing Example


Stubs and Mock Objects


Test Performance


Choosing Test Cases


Example: Rectangle Overlap


Solution


What Tests To Write First


Summary


Exercises

Exercise 16.1:

Python has another unit testing module called doctest. It searches files for sections of text that look like interactive Python sessions, then re-executes those sections and checks the results. A typical use is shown below.

def ave(values):
    '''Calculate an average value, or 0.0 if 'values' is empty.
    >>> ave([])
    0.0
    >>> ave([3])
    3.0
    >>> ave([15, -1.0])
    7.0
    '''

    sum = 0.0
    for v in values:
        sum += v
    return sum / float(max(1, len(values)))

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Convert a handful of the tests you have written for other questions in this lecture to use doctest. Do you prefer it to unittest? Why or why not? Do you think doctest makes it easier to test small problems? Large ones? Would it be possible to write something similar for C, Java, Fortran, or Mathematica?


Regular Expressions


Introduction


You Can Skip This Lecture If...


A Simple Example


This or That


Precedence


Escaping Special Characters


Raw Strings


Sequences


Making Something Optional


Character Sets


Abbreviations


Special Cases


Anchoring


Extracting Matches


Match Objects


Match Groups


Reversing Columns


Compiling


Finding Title Case Words


Finding All Matches


Reference Material


But Wait, There's More


Summary


Exercises

Exercise 17.1:

By default, regular expression matches are greedy: the first term in the RE matches as much as it can, then the second part, and so on. As a result, if you apply the RE «X(.*)X(.*)» to the string "XaX and XbX", the first group will contain "aX and Xb", and the second group will be empty.
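The greedy behavior described above can be checked directly with Python's re module. This is a minimal sketch; the variable names match, first, and second are chosen for the example, and it shows only the default greedy match, not the reluctant one the exercise asks for.

```python
import re

# The first (.*) grabs as much as it can, leaving only the final X
# for the literal X and nothing at all for the second group.
match = re.match(r"X(.*)X(.*)", "XaX and XbX")
first, second = match.groups()

print(repr(first))
print(repr(second))
```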

It's also possible to make REs match reluctantly, i.e., to have the parts match as little as possible, rather than as much. Find out how to do this, and then modify the RE in the previous paragraph so that the first group winds up containing "a", and the second group " and XbX".

Exercise 17.2:

What is the easiest way to write a case-insensitive regular expression? (Hint: read the documentation on compilation options.)

Exercise 17.3:

What does the VERBOSE option do when compiling a regular expression? Use it to rewrite some of the REs in this lecture in a more readable way.

Exercise 17.4:

What does the DOTALL option do when compiling a regular expression? Use it to get rid of the call to string.split in the example that finds words ending in vowels.


Binary Data


Introduction


You Can Skip This Lecture If...


Why Binary?


How Numbers Are Stored


Two's Complement


Bitwise Operators


Shifting


Cautions


Setting and Clearing Bits


Bit Flags


Floating Point


Floating Point Spacing


Floating Point Roundoff


Binary I/O


Binary I/O Mode


Packing and Unpacking


Packing Data


Unpacking Data


The struct Module


Hexadecimal Characters


Format Specifiers


Calculating Sizes


Endianness


Packing Variable-Length Data


Unpacking Variable-Length Data


Dynamic Formats


Unpacking Dynamic Formats