Software Carpentry

License

Copyright © 2005-06 Python Software Foundation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Introduction


Introduction


Self Assessment


The State of Play


Meeting Standards


The Grass Isn't That Much Greener


Hidden in Plain Sight


The Times They Are A-Changin'


This Course


Setting Up


A Note on Tool Choice


Contributing


Recommended Reading


Typographic Conventions


Summary


Exercises

Exercise 2.1:

What is the largest software project you have ever worked on? How well did it meet its original objectives? What is the most important thing you learned from it?

Exercise 2.2:

Write a point-form list of the programming tools you use on a regular basis. When and how did you learn each one? How proficient do you think you are with each? Compared to whom?

Exercise 2.3:

Suppose you have been given one week to write a program to translate old-style configuration files to a new syntax. Write a point-form description of how you would go about it.

Exercise 2.4:

Rewrite the following fragment of code to make it more readable. Don't worry about the fact that you don't know the language it's written in; feel free to use any functions or language features you're familiar with from other languages.

i = open('oldconfig.cnf', 'r');
ll = i.readlines();
for j in 0..len(ll) {
    if len(j) > 0 {
        if not defined(r) r = new list;
        r.append(j);
    }
}
sort(r);
print 'longest line is', r[0];

Exercise 2.5:

What are the errors in the function shown below? Don't worry about the lack of variable declarations: this language doesn't need them. Note that, like C and Java, this language uses 0 as the first index for lists.

# Calculate a running sum of a list of numbers.
# If the input values are [1, 2, 3], the final values are [1, 3, 6].

def running_sum(values) {
    i = 1;
    while (i < len(values)) {
        values[i] = values[i] + values[i-1];
    }
}

Exercise 2.6:

A sub-contractor in Euphoristan has just written a function that takes two lists of phone numbers (represented as strings), and returns all those in the first list that are not in the second. You only have a few minutes to test it before she goes off-line for the weekend; what are the first half-dozen test cases you would try?


Shell Basics


Introduction


You Can Skip This Lecture If...


The Shell


The Shell is Not the Operating System


The File System


Paths


Navigating the File System


Execution Cycle


Providing Options


Creating Files and Directories


Looking at Files


Basic Tools


Summary


Exercises

Exercise 3.1:

Suppose ls shows you this:

Makefile    biography.txt   data    enrolment.txt   programs    thesis

What argument(s) will make it print the names in reverse, like this:

thesis  programs    enrolment.txt   data    biography.txt   Makefile

Exercise 3.2:

What does the command cd ~ do? What about cd ~hpotter?

Exercise 3.3:

What command will show you the first 10 lines of a file? The first 25? The last 12?

Exercise 3.4:

What do the commands pushd, popd, and dirs do? Where do their names come from?

Exercise 3.5:

How would you send the file earth.txt to the default printer? How would you check it made it (other than wandering over to the printer and standing there)?

Exercise 3.6:

The instructor wants you to use a hitherto unknown command for manipulating files. How would you get help on this command?

Exercise 3.7:

diff finds and displays the differences between two text files. For example, if you modify earth.txt to create a new file earth2.txt that contains:

Name: Earth
Period: 365.26 days
Inclination: 0.00 degrees
Eccentricity: 0.02
Satellites: 1

you can then compare the two files like this:

$ diff earth.txt earth2.txt
3c3
< Inclination: 0.00
---
> Inclination: 0.00 degrees
4a5
> Satellites: 1

(The rather cryptic header "3c3" means that line 3 of the first file must be changed to get line 3 of the second; "4a5" means that a line is being added after line 4 of the original file.)
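The header notation can be reproduced with a short Python sketch. It is only an illustration: the contents of earth.txt are an assumption reconstructed from the diff output above, the helper names line_range and diff_header are hypothetical, and difflib's matching is not guaranteed to agree with diff on every input.

```python
import difflib

# Assumed contents of earth.txt, reconstructed from the diff output above.
old = ["Name: Earth", "Period: 365.26 days",
       "Inclination: 0.00", "Eccentricity: 0.02"]
# Contents of earth2.txt as listed in the exercise.
new = ["Name: Earth", "Period: 365.26 days",
       "Inclination: 0.00 degrees", "Eccentricity: 0.02", "Satellites: 1"]

def line_range(start, end):
    """Turn a 0-based, half-open range into diff's 1-based notation."""
    if end - start == 1:
        return str(start + 1)
    return "%d,%d" % (start + 1, end)

def diff_header(tag, i1, i2, j1, j2):
    """Build a header such as '3c3' from one difflib opcode (hypothetical helper)."""
    if tag == "replace":
        return line_range(i1, i2) + "c" + line_range(j1, j2)
    if tag == "insert":
        # An insertion is reported against the line it follows in the first file.
        return str(i1) + "a" + line_range(j1, j2)
    return line_range(i1, i2) + "d" + str(j1)  # tag == "delete"

opcodes = difflib.SequenceMatcher(None, old, new).get_opcodes()
headers = [diff_header(*op) for op in opcodes if op[0] != "equal"]
print(headers)
```

For these two inputs the list holds the same two headers that diff printed: one change at line 3 and one addition after line 4.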

What flag(s) should you give diff to tell it to ignore changes that just insert or delete blank lines? What if you want to ignore changes in case (i.e., treat lowercase and uppercase letters as the same)?


More Shell


Introduction


You Can Skip This Lecture If...


Wildcards


Redirecting Input and Output


Redirection Examples


Pipes


Environment Variables


Setting Environment Variables


Configuration


How the Shell Finds Programs


Common Search Path Entries


Cygwin on Windows


File Ownership and Permissions


Directory Permissions


Changing Permissions


Ownership and Permission: Windows


More Advanced Tools


Summary


Exercises

Exercise 4.1:

-rwxr-xr-x   1 aturing   cambridge  69 Jul 12 09:17 mars.txt
-rwxr-xr-x   1 ghopper   usnavy     71 Jul 12 09:15 venus.txt

According to the listing of the data directory above, who can read the file earth.txt? Who can write it (i.e., change its contents or delete it)? When was earth.txt last changed? What command would you run to allow everyone to edit or delete the file?

Exercise 4.2:

Suppose you want to remove all files whose names (not including their extensions) are of length 3, start with the letter a, and have .txt as extension. What command would you use? For example, if the directory contains three files a.txt, abc.txt, and abcd.txt, the command should remove abc.txt, but not the other two files.

Exercise 4.3:

You're worried your data files can be read by your nemesis, Dr. Evil. How would you check whether or not he can, and if necessary change permissions so only you can read or write the files?

Exercise 4.4:

What's the difference between the commands cd HOME and cd $HOME?

Exercise 4.5:

Suppose you want to list the names of all the text files in the data directory that contain the word "carpentry". What command or commands could you use?

Exercise 4.6:

Suppose you have written a program called analyze. What command or commands could you use to display the first ten lines of its output? What would you use to display lines 50-100? To send lines 50-100 to a file called tmp.txt?

Exercise 4.7:

The command ls data > tmp.txt writes a listing of the data directory's contents into tmp.txt. Anything that was in the file before the command was run is overwritten. What command could you use to append the listing to tmp.txt instead?

Exercise 4.8:

What command(s) would you use to find out how many subdirectories there are in the lectures directory?

Exercise 4.9:

What does rm *.ch do? What about rm *.[ch]?

Exercise 4.10:

What command(s) could you use to find out how many instances of a program are running on your computer at once? For example, if you are on Windows, what would you do to find out how many instances of svchost.exe are running? On Unix, what would you do to find out how many instances of bash are running?

Exercise 4.11:

A colleague asks for your data files. How would you archive them to send as one file? How could you compress them?

Exercise 4.12:

You have changed a text file on your home PC, and mailed it to the university terminal. What steps can you take to see what changes you may have made, compared with a master copy in your home directory?

Exercise 4.13:

How would you change your password?

Exercise 4.14:

grep is one of the more useful tools in the toolbox. It finds lines in files that match a pattern and prints them out. For example, assume the files earth.txt and venus.txt contain lines like this:

Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02

grep can extract lines containing the text "Period" from all the files:

$ grep Period *.txt
earth.txt:Period: 365.26 days
venus.txt:Period: 224.70 days

Search strings can use regular expressions, which will be discussed in the Regular Expressions lecture. grep takes many options as well; for example, grep -c /bin/bash /etc/passwd reports how many lines in /etc/passwd (the Unix password file) contain the string /bin/bash, which in turn tells me how many users are using bash as their shell.
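To make the counting behaviour of grep -c concrete, here is a rough Python equivalent for a fixed search string. It is a sketch, not a replacement for grep: the function name count_matching_lines is hypothetical, the sample lines are invented in the style of a password file, and plain substring matching stands in for grep's regular expressions (which is equivalent here only because /bin/bash contains no special characters).

```python
def count_matching_lines(lines, text):
    """Count the lines that contain `text`, as `grep -c` does for a fixed string."""
    return sum(1 for line in lines if text in line)

# Invented sample in the style of /etc/passwd (not real accounts).
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "hpotter:x:1000:1000::/home/hpotter:/bin/bash",
]

count = count_matching_lines(sample, "/bin/bash")
print(count)
```

Two of the three sample lines contain /bin/bash, so the count is 2. A line is counted once even if the string occurs in it several times.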

Suppose all you wanted was a list of the files that contained lines matching a pattern, rather than the matches themselves—what flag or flags would you give to grep? What if you wanted the line numbers of matching lines?

Exercise 4.15:

Suppose you wanted ls to sort its output by filename extension, i.e., to list all .cmd files before all .exe files, and all .exe's before all .txt files. What command or commands would you use?

Exercise 4.16:

What does the alias command do? When would you use it?


Version Control


Introduction


You Can Skip This Lecture If...


Problem #1: Collaboration


Solution: Version Control


Problem #2: Undoing Changes


Solution: Version Control (Again)


Which Version Control System?


Basic Use


How To Do It


Resolving Conflicts


Example of Resolving


Example of Resolving (continued)


Example of Resolving (continued)


Starvation


Binary Files


Reverting


Rolling Back


Creating and Checking Out


Subversion Command Reference


Reading Subversion Output


Summary


Exercises

Exercise 5.1:

Follow the instructions given to you by your instructor to check out a copy of the Subversion repository you'll be using in this course. Unless otherwise noted, the exercises below assume that you have done this, and that your working copy is in a directory called course. You will submit all of your exercises in this course by checking files into your repository.

Exercise 5.2:

Create a file course/ex01/bio.txt (where course is the root of your working copy of your Subversion repository), and write a short biography of yourself (100 words or so) of the kind used in academic journals, conference proceedings, etc. Commit this file to your repository. Remember to provide a meaningful comment when committing the file!

Exercise 5.3:

What's the difference between mv and svn mv? Put the answer in a file called course/ex01/mv.txt and commit your changes.

Once you have committed your changes, type svn log in your course directory. If you didn't know what you'd just done, would you be able to figure it out from the log messages? If not, why not?

Exercise 5.4:

In this exercise, you'll simulate the actions of two people editing a single file. To do that, you'll need to check out a second copy of your repository. One way to do this is to use a separate computer (e.g., your laptop, your home computer, or a machine in the lab). Another is to make a temporary directory, and check out a second copy of your repository there. Please make sure that the second copy isn't inside the first, or vice versa—Subversion will become very confused.

Let's call the two working copies Blue and Green. Do the following:

a) Create Blue/ex01/planets.txt, and add the following lines:

Mercury
Venus
Earth
Mars
Jupiter
Saturn

Commit the file.

b) Update the Green working copy. (You should get a copy of planets.txt.)

c) Change Blue/ex01/planets.txt so that it reads:

1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn

Commit the changes.

d) Edit Green/ex01/planets.txt so that its contents are as shown below. Do not do svn update before editing this file, as that will spoil the exercise.

Mercury 0
Venus 0
Earth 1
Mars 2
Jupiter 16 (and counting)
Saturn 14 (and counting)

e) Now, in Green, do svn update. Subversion should tell you that there are conflicts in planets.txt. Resolve the conflicts so that the file contains:

1. Mercury 0
2. Venus 0
3. Earth 1
4. Mars 2
5. Jupiter 16
6. Saturn 14

Commit the changes.

f) Update the Blue working copy, and check that planets.txt now has the same content as it has in the Green working copy.

Exercise 5.5:

Add another line or two to course/ex01/bio.txt and commit those changes. Then, use svn merge to restore the original contents of your biography (course/ex01/bio.txt), and commit the result. (When you are done, bio.txt should look the way it did at the end of the first part of the previous exercise.) Note: the purpose of this exercise is to teach you how to go back in time to get old versions of files. While it would be simpler in this case just to edit bio.txt, you can't (reliably) do that when you've made larger changes, to multiple files, over a longer period of time.

Exercise 5.6:

Subversion allows users to set properties on files and directories using svn propset, and to inspect their values using svn propget. Describe three properties you might want to change on a file or directory, and how you might use them in your current project.


Automated Builds


Introduction


You Can Skip This Lecture If...


Automate, Automate, Automate


Make


Our Example


Hello, Make


Terminology


Multiple Targets


Phony Targets


Dependencies


Updating Dependencies


Conventions


Automatic Variables


Automatic Variables Example


Pattern Rules


Adding More Dependencies


Tidying Up


Defining Macros


Passing Values to Make


Functions


Commonly-Used Functions


Pros and Cons


Alternatives


Summary


Exercises

Exercise 6.1:

Make gets definitions from environment variables, command-line parameters, and explicit definitions in Makefiles. What order does it check these in?


Basic Scripting


Introduction


You Can Skim This Lecture If...


Python's Strengths


Python's Weaknesses


Why Another Language?


Execution Cycle


Running Python Programs


Execution Shortcuts


Variables


Possible Mistakes


Printing


Quoting


Converting Values to Strings


Escape Sequences


Numbers


Arithmetic


Booleans


Short-Circuit Evaluation


Comparisons


String Comparisons


Conditionals


Why Indentation?


While Loops


Break and Continue


String Formatting


Format Specifiers


Supported Formats


Summary

Strings, Lists, and Files


Introduction


You Can Skip This Lecture If...


Strings


Immutability


Slicing


Bounds Checking


Negative Indices


Consequences


Methods


String Methods


Notes on String Methods


Chaining Method Calls


Testing for Membership


Lists


Modifying Lists


Concatenation


Deleting List Elements


List Methods


Notes on List Methods


For Loops


Ranges


Ranged Loops


Membership


Nesting Lists


Aliasing


Indexing vs. Slicing


Tuples


Multi-Valued Assignment


Unpacking Structures in Loops


Files


Copying a File


Looping Over Files


Other Ways To Copy Files


Summary


Exercises

Exercise 8.1:

What does "aaaaa".count("aaa") return? Why?

Exercise 8.2:

What do each of the following five code fragments do? Why?

x = ['a', 'b', 'c', 'd']
x[0:2] = []

x = ['a', 'b', 'c', 'd']
x[0:2] = ['q']

x = ['a', 'b', 'c', 'd']
x[0:2] = 'q'

x = ['a', 'b', 'c', 'd']
x[0:2] = 99

x = ['a', 'b', 'c', 'd']
x[0:2] = [99]

Exercise 8.3:

What does 'a'.join(['b', 'c', 'd']) return? If you have a list of strings, how can you concatenate them in a single statement? Why do you think join is written this way, rather than as ['b', 'c', 'd'].join('a')?


Functions and Libraries


Introduction


You Can Skip This Lecture If...


Defining Functions


Returning Values


Everything Returns Something


Scope


Parameter Passing Rules


Making Copies


Default Parameter Values


Functions Are Objects


Function Object Examples


Function Attributes


Creating Modules


Module Scope


Other Ways to Import


Import Executes Statements


Knowing Who You Are


The System Library


Command-Line Arguments


Standard I/O


The Python Search Path


Exiting


The Math Library


Working with the File System


File and Directory Status


Manipulating Pathnames


Summary


Exercises

Exercise 9.1:

Write a function that takes two strings called text and fragment as arguments, and returns the number of times fragment appears in the second half of text. Your function must not create a copy of the second half of text. (Hint: read the documentation for string.count.)

Exercise 9.2:

What does the Python keyword global do? What are some reasons not to write code that uses it?

Exercise 9.3:

Python allows you to import all the functions and variables in a module at once, making them local names. For example, if the module is called values, and contains a variable called Threshold and a function called limit, then after the statement from values import *, you can refer directly to Threshold and limit, rather than having to use values.Threshold or values.limit. Explain why this is generally considered a bad thing to do, even though it reduces the amount programmers have to type.

Exercise 9.4:

sys.stdin, sys.stdout, and sys.stderr are variables, which means that you can assign to them. For example, if you want to change where print sends its output, you can do this:

import sys

print 'this goes to stdout'
temp = sys.stdout
sys.stdout = open('temporary.txt', 'w')
print 'this goes to temporary.txt'
sys.stdout = temp

Do you think this is a good programming practice? When and why do you think its use might be justified?

Exercise 9.5:

os.stat(path) returns an object whose members describe various properties of the file or directory identified by path. Using this, write a function that will determine whether or not a file is more than one year old.
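As a starting point, the sketch below shows what the object returned by os.stat looks like in practice. It assumes nothing beyond the standard library; the temporary file exists only so that there is something to inspect, and the variable names are illustrative. It stops short of the age comparison the exercise asks for.

```python
import os
import tempfile
import time

# Create a throwaway file so the example has something to inspect.
handle, path = tempfile.mkstemp()
os.write(handle, b"hello")
os.close(handle)

info = os.stat(path)
size_in_bytes = info.st_size      # number of bytes in the file
modified_at = info.st_mtime       # last modification, in seconds since the epoch
print(size_in_bytes)
print(time.ctime(modified_at))    # the same timestamp in readable form

os.remove(path)
```

The members st_size and st_mtime are two of several; the library documentation for os.stat lists the rest.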

Exercise 9.6:

Write a Python program that takes as its arguments two years (such as 1997 and 2007), and prints out the number of days between the 15th of each month from January of the first year until December of the last year.

Exercise 9.7:

Write a simple version of which in Python. Your program should check each directory on the caller's path (in order) to find an executable program that has the name given to it on the command line.

Exercise 9.8:

In the default parameter value example, why does total use a default value of None for end, rather than an integer such as 0 or -1?

Exercise 9.9:

What does the * in front of the parameter extras mean in the following code example?

def total(*extras):
    result = 0
    for e in extras:
        result += e
    return result

Hint: look at the following three examples:

print total()
print total(19)
print total(2, 3, 5)

Exercise 9.10:

Use the os.path, stat, and time modules to write a program that finds all files in a directory whose names end with a specific suffix, and which are more than a certain number of days old. For example, if your program is run as oldfiles /tmp .backup 10, it will print a list of all files in the /tmp directory whose names end in .backup that are more than 10 days old.

Exercise 9.11:

The Strings, Lists, and Files lecture ended by showing several different ways to copy files using Python. Read the documentation for the shutil module, and see if there's a simpler way.

Exercise 9.12:

Consider the short program shown below:

def add_and_max(new_value, collection=[]):
    collection.append(new_value)
    return max(collection)

print 'first call:', add_and_max(22)
print 'second call:', add_and_max(9)
print 'third call:', add_and_max(15)

What do you expect its output to be? What is its actual output? Why?


Style


Introduction


You Can Skip This Lecture If...


Reading is Learning


Seven Plus or Minus


The Mind's Eye


What Does This Have to Do With Programming?


Python Style Guide


Naming


Scope and Size


The Difference It Makes


Function Length


What Does This Function Do?


Ways to Answer the Question


Other Sources of Information


Idioms


Style Tools


Python Style Tools


Documentation


More On Documentation


Traceability


Tracing Data


Embedding Documentation


Docstrings


Summary

Quality Assurance


Introduction


You Can Skip This Lecture If...


Limits to Testing


Terminology


Test Results and Specifications


Structuring Tests


A Simple Example


Catching Errors


Simple Exception Example


Exception Objects


Exception Hierarchy


Functions and Exceptions


Raising Exceptions


Exceptional Style


Handling Errors in Tests


Test-Driven Design


TDD Example


Design by Contract


Assertions


Defensive Programming


Summary

Sets, Dictionaries, and Complexity


Introduction


You Can Skip This Lecture If...


Sets


Set Operations


Set Example


How Set Values Are Stored


Immutability


Frozen Sets


A Note on Language Design


Efficiency


Complexity Curves


Algorithmic Complexity


Motivating Dictionaries


Creating and Indexing


Updating Dictionaries


Membership and Loops


Dictionary Methods


Counting Frequency


A Slight Simplification


Imposing Order


Inverting a Dictionary


Another Way to Do It


Formatting Strings with Dictionaries


Extra Keyword Arguments


Extra Positional Arguments


Summary

Debugging


Introduction


You Can Skip This Lecture If...


What's Wrong with Print Statements


Symbolic Debuggers


Debugger Features


Kinds of Debuggers


Integrated Development Environments


Command-Line Debuggers


Inspecting Values


Controlling Execution


Under the Hood


Implementing Breakpoints


Inspecting More Values


Conditional Breakpoints and Watchpoints


Logging


Logging Levels


Logging Example


Agans' Rules


Rule 0: Get It Right the First Time


Rule 1: What Is It Supposed to Do?


Rule 2: Is It Plugged In?


Rule 3: Make It Fail


Alternatives


Rule 4: Divide and Conquer


Rule 5: Change One Thing at a Time, For a Reason


Rule 6: Write It Down


Rule 7: Be Humble


Summary

Object-Oriented Programming


Introduction


Objects to the Rescue


You Can Skip This Lecture If...


Abstract Data Types


Classes and Instances


Defining a Class


Creating an Instance


Methods


Creating Members


Encapsulation


Constructors


Constructor Style


Special Methods


New Classes from Old


Inheritance Example


Overriding Methods


Polymorphism


Duck Typing


The Liskov Substitution Principle


Tidal Pools Revisited


Class, Responsibility, Collaborator


Summary

More on Objects


Introduction


You Can Skip This Lecture If...


Length


Overloading Operators


Commutativity


Other Special Methods


Example: Sparse Vector


How Long is a Sparse Vector?


Vector Behavior


Dot Product


Addition


Testing


Static Data Members


Static Methods


Design Patterns


The Singleton Pattern


Singleton Implementation


Demonstration


The Visitor Pattern


Visitor Implementation


Demonstration


The Abstract Factory Pattern


Abstract Factory Builder


Abstract Factory Manager


Demonstration


The Command Pattern


Base Command Class


A Particular Command


Demonstration


A Few Others


Summary

Unit Testing


Introduction


JUnit and Its Children


You Can Skip This Lecture If...


The Big Idea


Checking


Example: Checking Addition


Running Sums


Flawed Implementation


Check and Re-check


Is This Cost-Effective?


Eliminating Redundancy


Testing Exceptions


Manual Exception Testing Example


Testing I/O


I/O Testing Example


Stubs and Mock Objects


Test Performance


Choosing Test Cases


Example: Rectangle Overlap


Solution


What Tests To Write First


Summary


Exercises

Exercise 16.1:

Python has another unit testing module called doctest. It searches files for sections of text that look like interactive Python sessions, then re-executes those sections and checks the results. A typical use is shown below.

def ave(values):
    '''Calculate an average value, or 0.0 if 'values' is empty.
    >>> ave([])
    0.0
    >>> ave([3])
    3.0
    >>> ave([15, -1.0])
    7.0
    '''

    sum = 0.0
    for v in values:
        sum += v
    return sum / float(max(1, len(values)))

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Convert a handful of the tests you have written for other questions in this lecture to use doctest. Do you prefer it to unittest? Why or why not? Do you think doctest makes it easier to test small problems? Large ones? Would it be possible to write something similar for C, Java, Fortran, or Mathematica?


Regular Expressions


Introduction


You Can Skip This Lecture If...


A Simple Example


This or That


Precedence


Escaping Special Characters


Raw Strings


Sequences


Making Something Optional


Character Sets


Abbreviations


Special Cases


Anchoring


Extracting Matches


Match Objects


Match Groups


Reversing Columns


Compiling


Finding Title Case Words


Finding All Matches


Reference Material


But Wait, There's More


Summary


Exercises

Exercise 17.1:

By default, regular expression matches are greedy: the first term in the RE matches as much as it can, then the second part, and so on. As a result, if you apply the RE «X(.*)X(.*)» to the string "XaX and XbX", the first group will contain "aX and Xb", and the second group will be empty.

It's also possible to make REs match reluctantly, i.e., to have the parts match as little as possible, rather than as much. Find out how to do this, and then modify the RE in the previous paragraph so that the first group winds up containing "a", and the second group " and XbX".
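The greedy behaviour described at the start of this exercise can be confirmed with a few lines of Python before attempting the reluctant version. This is a minimal check using the standard re module; the variable names are illustrative.

```python
import re

# Greedy matching: the first group grabs as much as it can while still
# leaving one X for the rest of the pattern to match.
match = re.match(r'X(.*)X(.*)', 'XaX and XbX')
groups = match.groups()
print(groups)
```

The first group comes back as 'aX and Xb' and the second as the empty string, matching the description above.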

Exercise 17.2:

What is the easiest way to write a case-insensitive regular expression? (Hint: read the documentation on compilation options.)

Exercise 17.3:

What does the VERBOSE option do when compiling a regular expression? Use it to rewrite some of the REs in this lecture in a more readable way.

Exercise 17.4:

What does the DOTALL option do when compiling a regular expression? Use it to get rid of the call to string.split in the example that finds words ending in vowels.


Binary Data


Introduction


You Can Skip This Lecture If...


Why Binary?


How Numbers Are Stored


Two's Complement


Bitwise Operators


Shifting


Cautions


Setting and Clearing Bits


Bit Flags


Floating Point


Floating Point Spacing


Floating Point Roundoff


Binary I/O


Binary I/O Mode


Packing and Unpacking


Packing Data


Unpacking Data


The struct Module


Hexadecimal Characters


Format Specifiers


Calculating Sizes


Endianness


Packing Variable-Length Data


Unpacking Variable-Length Data


Dynamic Formats


Unpacking Dynamic Formats


Metadata


Metadata File Structure


Packing with Metadata


Unpacking with Metadata


Testing


Summary

XML


Introduction


You Can Skip This Lecture If...


In the Beginning


The Modern Era


Formatting Rules


Document Structure


Text


XHTML


Sample XHTML Page


Critique of HTML/XHTML


Attributes


Attributes Vs. Elements


More XHTML Tags


Lists and Tables


Example


Images


Links


The Document Object Model


The Basics


DOM Tree Example


More On Tree Structure


Creating a Tree


Converting to Text


Other Ways To Create Documents


The Details


Finding Nodes


Walking a Tree


Recursive Tree Walker


Modifying the Tree


Complications


Solution


Not Finished Yet


Summary

Relational Databases


Introduction


You Can Skip This Lecture If...


History


When To Use A Database


Getting Started


Example: Experimental Data


Using SQL


Creating Tables


Inserting Data


Simple Queries


Sorting


Selection


Joins


Example: Translating IDs


Keys and Constraints


Eliminating Duplicates


Aggregation


Grouping


Self Joins


Using Self Joins


Who Has Worked Together?


Null


Operations on Nulls


Managing Nulls


Database Design


Normal Forms


Nested Queries


Nested Query Example


More Uses for Nested Queries


Using Other Languages


Example: Database Access from Python


Concurrency


Transactions


Example: Changing User ID


Using Transactions


Testing


Advanced Topics


Summary

Spreadsheets


Introduction


You Can Skip This Lecture If...


First Steps


Entering Data


Formatting Data


Formulas


Replicating Formulas


Built-In Functions


Commonly-Used Functions


Dependencies


Conditionals


Multi-Valued Conditionals


Lookup Tables


Lookup Table Example


Absolute References


Adjusting The Formula


A Larger Data Set


Creating Charts


Customizing The Display


Creating A Log-Log Chart


Fixing The Error


Analysis


Programming


Summary


Exercises

Exercise 21.1:

Spreadsheets use conditional expressions, rather than conditional statements. C/C++, Java, and Python also support conditional expressions. How are they written? When should you use them? When shouldn't you?

Exercise 21.2:

$B$9 is an absolute reference to the cell B9. What does the expression $B9 refer to? What about B$9? When would you use expressions like these?


Integration


Introduction


You Can Skip This Lecture If...


Running External Programs


The subprocess Module


Running In Place


Running With Arguments


Capturing Output


Providing Input


Deadlock


Pros and Cons


Plan B: Integrating with C


How Python Represents Objects


Calling Conventions


Boilerplate


Loading and Calling


What About C++?


SWIG


Integrating the Other Way


Loading Modules


Plugin Frameworks


Manual Loading


Using Manual Loading


Manipulating Namespaces


Summary

Web Client Programming


Introduction


You Can Skip This Lecture If...


Small Pieces, Loosely Joined


Distributed Is Different


Partial Failure


Under the Hood


Sockets


Client/Server vs. Peer-to-Peer


Socket Client


Socket Server


The Hypertext Transfer Protocol


HTTP Request Line


Headers


Body


HTTP Response


HTTP Response Codes


HTTP Example


Fetching Pages


urllib Example


Building A Spider


Passing Parameters


Special Characters


Encoding Example


Screen Scraping (And Why Not)


Web Services


Example: Amazon


Summary

Web Server Programming


Introduction


You Can Skip This Lecture If...


The Pluggable Web


The CGI Protocol


From Server To CGI


From CGI To Server


MIME Types


Hello, CGI


Invoking a CGI


Generating Dynamic Content


Forms


Creating Forms


A Simple Form


Parameter Names


Handling Forms


Form Handling Example


Development Tips


Maintaining State


Maintaining State in Files


HTML Generation


HTML Templating


What About Concurrency?


File Locking


Implementing Locking


Who Are You?


Cookies


Creating Cookies


Cookie Example


Cookie Tips


Summary


Exercises

Exercise 24.1:

One way to test a CGI application is to send it HTTP requests, and examine the responses. Write a program that takes a hostname, port, and partial URL as command-line parameters, and sends the URL to the server identified by the hostname and port. The program should display the status code, reason, headers, and response page (if any) that are returned by the web server.

For example, if your program is run as httptest localhost 80 /greeting.html, it should send a request for /greeting.html to a web server running on port 80 on the local machine, and display something like:

STATUS: 200
REASON: OK
HEADERS:
        content-length [49]
        server ['Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5']
        last-modified ['Wed, 19 Apr 2006 13:59:19 GMT']
        date ['Sun, 30 Apr 2006 14:12:13 GMT']
        content-type ['text/html']
PAGE:
        <html>
        <body>
        <h1>Hello, CGI!</h1>
        </body>
        </html>

What are the pros and cons of testing a CGI application this way?

Exercise 24.2:

Another way to test a CGI application is to construct a mock container to take the place of the web server. As described in the lecture, CGI applications read data from environment variables and standard input; by using the subprocess module described in the Integration lecture, you can run the CGI yourself, passing it whatever test data you want. Write a program that does this. (For bonus marks, explain how you would test the mock container…)

Exercise 24.3:

The third way to test a CGI application is to construct a mock container that calls the CGI directly, rather than creating a new process and passing it data through environment variables and standard input. In order for this to work, the CGI program must import specially-crafted versions of the sys and os libraries that provide the CGI with data from the testing program, rather than reading it from the real sources:

if testing:
    import test_sys as sys
    import test_os as os
else:
    import sys, os

What other changes must be made to the CGI application to allow it to be tested this way? What are the pros and cons of making such changes?

Security


Evil Exists


You Can Skip This Lecture If...


What Are We Trying to Do?


Technology Alone Is Not A Solution


More Ways Security Can Fail


How to Think About Security


Risk Assessment


Thinking Like A Villain


Example: Don't Trust Your Input


Attacking URLs


Leaking Information


SQL Injection


Attacking Defaults and Denial of Service


Phishing


Attacking Data Entry


Timed Attacks


Securing HTTP


Cryptography 101


Public-Key Cryptography


Sending and Receiving


Digital Signatures


Securing Login


Red Queen Race


It Isn't Just The Web


Summary

The Development Process


Introduction


You Can Skip This Lecture If...


Design vs. Agility


Project Lifecycle


Step 0: Vision


Step 1: Gathering Requirements


What Requirements Are and Aren't


Step 2: From Requirements to Features


Waterfalls And Why Not


The Spiral Model


Enter the Extremists


Pitfalls


Step 3: Analysis & Estimation


Where Estimates Come From


What Goes Into An A&E


Reviews


What Can Go Wrong with A&Es


Step 4: Prioritization


Step 5: Scheduling


Science Fiction Scheduling


Step 6: Development


Tracking Progress


Burn Rate


Step 7: Finishing


After the Party's Over


Summary


Exercises

Exercise 26.1:

Does your manager know when you expect to complete your current task? How inaccurate the schedule currently is?

Exercise 26.2:

Can you find out when your manager expects you to complete your current task (without asking her directly)? When team members expect to complete their current tasks (without asking them directly)? Who would be affected if you slipped a week?

Teamware


Introduction


You Can Skip This Lecture If...


Motivation


Architecture


Getting Started


Blogging


Repository Browser


Viewing Revision History


Viewing Changesets


Mailing Lists


Less Is More


Managing Mail Addresses


Issue Tracker


Creating and Viewing Tickets


When To Create, How To Use


How to Write Tickets


Other Fields


Updating Tickets


Roadmap and Milestones


Priorities And Triage


Workflow


Wiki


Wiki Syntax


Saving Changes


Tying It All Together


Rules of the Road


More Rules


Summary


Exercises

Exercise 27.1:

Can you find out what bugs are currently being worked on? What feature requests have been deferred? Which files were changed to fix a problem? What fixes are currently being tested? How long it took to fix/implement something?

Exercise 27.2:

What is the status of the overnight build? The overnight regression tests? The issue database? The team's discussions?

Backward, Forward, and Sideways


Introduction


Classic Mistakes


Branching, Merging, and Tagging


Managing Branches


Patching


A Better Way to Build


SCons Example


Persistence


Pickling Example


Object-Relational Mapping


Web Development Frameworks


Refactoring


Refactoring Examples


More Refactoring Examples


Refactoring Tools


Code Reviews


Reading Code


Code Review Checklist


User Interface Design


Paper Prototyping


Where To Go Next


The Rules


Conclusion

Acknowledgments


Support


Major Contributors


Comments and Corrections


Prior Art


Dedication

Bibliography

[Agans 2002]: David J. Agans: Debugging. American Management Association, 2002, 0814471684.
Its first sentence says, “This book tells you how to find out what's wrong with stuff, quick,” and that's exactly what it does. In fifteen (very) short chapters, the author presents nine simple rules to help you track down and fix problems in software, hardware, or anything else. His war stories are entertaining (although I think one or two are urban myths), and his advice is eminently practical.
[Andrews & Whittaker 2006]: Mike Andrews and James A. Whittaker: How to Break Web Software. Addison-Wesley, 2006, 0321369440.
This practical companion to [Whittaker 2003] catalogs things you can do to break web-based applications.
[Beck & Cunningham 1989]: Kent Beck and Ward Cunningham: "A Laboratory for Teaching Object-Oriented Thinking", SIGPLAN Notices, vol. 24, no. 10, 1989.
The first description of CRC cards.
[Boehm 1988]: Barry Boehm: "A Spiral Model of Software Development and Enhancement", IEEE Computer, 1988.
Boehm's landmark description of spiral software development.
[Brand 1995]: Stewart Brand: How Buildings Learn. Penguin USA, 1995, 0140139966.
This beautiful, thought-provoking book starts with the observation that most architects spend their time re-working or extending existing buildings, rather than creating new ones from scratch. Of course, if Brand had written “program” instead of “building”, and “programmer” where he'd written “architect”, everything he said would have been true of computing as well. A lot of software engineering books try to convey the same message about allowing for change, but few do it so successfully. By presenting examples ranging from the MIT Media Lab to a one-room extension to a house, Brand encourages us to see patterns in the way buildings change (or, to adopt Brand's metaphor, the way buildings learn from their environment and from use). Concurrently, he uses those insights to argue that since buildings are always going to be modified, they should be designed to accommodate unanticipated change.
[Brooks 1995]: Frederick P. Brooks: The Mythical Man Month: Essays on Software Engineering. Addison-Wesley, 1995, 0201835959.
The classic text in software engineering, most famous for its discussion of how adding people to a project that's late will only make it later.
[Castro 2002]: Elizabeth Castro: HTML for the World Wide Web. Peachpit Press, 2002, 0321130073.
A clean, clear, comprehensive guide to creating HTML for the web, with good coverage of Cascading Style Sheets (CSS).
[Castro 2000]: Elizabeth Castro: XML for the World Wide Web. Peachpit Press, 2000, 0201710986.
Like other books in Peachpit's Visual Quickstart series, this one is beautifully designed, and easy to read without ever being condescending. Its 16 chapters and 4 appendices are organized into 1- and 2-page explanations of particular topics, from writing non-empty elements to namespaces, schemas, and XML transformation. Throughout, Castro strikes a perfect balance between “what”, “why”, and “how”, and provides a surprising amount of detail without ever overwhelming the reader.
[Chase & Simon 1973]: W.G. Chase and H.A. Simon: "Perception in chess", Cognitive Psychology, vol. 4, pp. 55-81, 1973.
The original paper comparing the performance of novice and master chess players when confronted with actual and random positions.
[Collins-Sussman et al 2004]: Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato: Version Control with Subversion. O'Reilly, 2004, 0596004486.
A good tutorial and reference guide for Subversion, which is also available on-line.
[Doar 2005]: Matt Doar: Practical Development Environments. O'Reilly, 2005, 0596007965.
Matt Doar has produced a practical guide to what should be in every team's toolbox, how competing entries stack up, and how they ought to be used. This book covers everything from configuration management tools like CVS and Subversion, to build tools (make, GNU's Autotools, Ant, Jam, and SCons), various testing aids, bug tracking systems, documentation generators, and we're still only at the halfway mark. He names names, provides links, and treats free and commercial offerings on equal terms. My copy currently has 28 folded-down corners, which is 28 more than most books get.
[Eick et al 2001]: Stephen G. Eick, Todd L. Graves, Alan F. Karr, J.S. Marron, and Audris Mockus: "Does Code Decay? Assessing the Evidence from Change Management Data", IEEE Transactions on Software Engineering, vol. 27, no. 1, 2001.
Analyzes the evolution of several million lines of telephone switching software over fifteen years to show that code quality, comprehensibility, and maintainability decline over time.
[Fagan 1986]: Michael E. Fagan: "Advances in Software Inspections", IEEE Transactions on Software Engineering, vol. 12, no. 7, 1986.
Empirical data showing that code reviews are the most effective way known to find bugs.
[Fehily 2006]: Chris Fehily: Python. Peachpit Press, 2006, 0321423135.
A gentle introduction to Python, beautifully typeset, with lots of helpful examples.
[Fehily 2003]: Chris Fehily: SQL. Peachpit Press, 2003, 0321118030.
This very readable book describes the subset of SQL that covers most real-world needs. While the book moves a little slowly in some places, the examples are exceptionally clear.
[Feldman 1979]: Stuart I. Feldman: "Make—A Program for Maintaining Computer Programs", Software: Practice and Experience, vol. 9, no. 4, pp. 255-265, 1979.
The original description of Make. Last time I checked, Stu Feldman was a vice president at IBM, which shows you just how far a good tool can take you…
[Feathers 2005]: Michael C. Feathers: Working Effectively with Legacy Code. Prentice-Hall PTR, 2005, 0131177052.
Most programmers spend most of their time fixing bugs, porting to new platforms, adding new features—in short, changing existing code. If that code is exercised by unit tests, then changes can be made quickly and safely; if it isn't, they can't, so your first job when you inherit legacy code should be to write some. That's where this book comes in. Want to know three different ways to inject a test into a C++ class without changing the code? They're here. Want to know which classes or methods to focus testing on? Read his discussion of pinch points. Need to break inter-class dependencies in Java so that you can test one module without having to configure the entire application? That's in here too, along with dozens of other useful bits of information. Everything is illustrated with small examples, all of them clearly explained and to the point. There are lots of simple diagrams, and a short glossary; all that's missing is hype.
[Fogel 2005]: Karl Fogel: Producing Open Source Software. O'Reilly, 2005, 0596007590.
A community is more than just a bunch of people. It's a shared set of values, and rules for how to behave. By this standard, the open source community isn't just what some programmers choose to do with their time, and why; it's also how they do it. This book is an excellent guide to that “how”. Every page offers practical advice; every point is made clearly and concisely, and draws upon the author's extensive personal experience. Want to know how to earn commit privileges on a project? It's here. Do you and other project members have irreconcilable differences? Fogel explains when and how to fork, and what the pros and cons are. Want to get your project more attention? Want to take something closed, and open it up? It's all here, and much more.
[Fowler 1999]: Martin Fowler: Refactoring. Addison-Wesley Professional, 1999, 0201485672.
Like architects, most programmers spend most of their time renovating, rather than creating something completely new on a blank sheet of paper. This book presents and analyzes patterns that come up again and again when programs are being reorganized. Some of these are well-known, such as placing common code in a utility method. Others, such as replacing temporary objects with queries, or replacing constructors with factory methods, are subtler, but no less important. Each entry includes a section on motivation, the mechanics of actually carrying out the transformation, and an example in Java.
[Friedl 2002]: Jeffrey E. F. Friedl: Mastering Regular Expressions. O'Reilly, 2002, 0596002890.
The definitive programmer's guide to regular expressions.
[Gamma et al 1995]: Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides: Design Patterns. Addison-Wesley, 1995, 0201633612.
The book that started the software design patterns movement. Much of the discussion has been superseded by more recent books, and the use of C++ and Smalltalk for examples feels a little dated, but it is still a landmark in programming.
[Glass 2002]: Robert L. Glass: Facts and Fallacies of Software Engineering. Addison-Wesley Professional, 2002, 0321117425.
I really wish someone had given me something like this book when I took my first programming job. If nothing else, it would have been a better way to start thinking about the profession I had stumbled into than the “everybody knows” factoids that I soaked up at coffee time. Some of what he says is well-known: good programmers are up to N times better than bad ones (his value for N is 28), reusable components are three times harder to build than non-reusable ones, and so on. Other facts aren't part of the zeitgeist, though they should be. For example, most of us know that maintenance consumes 40-80% of software costs, but did you know that roughly 60% of that is enhancements, rather than bug fixes? Or that if more than 20-25% of a component has to be modified, it is more efficient to re-write it from scratch? Best of all, Glass backs up every statement he makes with copious references to the primary literature; if you still disagree with him, you'd better be sure you have as much evidence for your point of view as he has for his.
[Goerzen 2004]: John Goerzen: Foundations of Python Network Programming. APress, 2004, 1590593715.
This book looks at how to handle several common protocols, including HTTP, SMTP, and FTP. Goerzen doesn't delve deeply into their internals, but instead focuses on how to build clients that use them. His approach is to build solutions to complex problems one step at a time, explaining each addition or modification along the way. He occasionally assumes more background knowledge than most readers of this book are likely to have, but only occasionally, and makes up for it by providing both clear code, and clear explanations of why this particular function has to do things in a particular order, or why that one really ought to be multithreaded.
[Good 2005]: Nathan A. Good: Regular Expression Recipes. APress, 2005, 159059441X.
A great how-to for regular expressions, with examples in many different languages.
[Gunderloy 2004]: Mike Gunderloy: Coder to Developer. Sybex, 2004, 078214327X.
This practical, readable book is subtitled “Tools and Strategies for Delivering Your Software”, and that's exactly what it's about. Project planning, source code control, unit testing, logging, and build management are all there. Importantly, so are newer topics, like building plugins for your IDE, code generation, and things you can do to protect your intellectual property. Everything is clearly explained, and illustrated with well-chosen examples. While the focus is definitely on .NET, Gunderloy covers a wide range of other technologies, both proprietary and open source. I'm already using two new tools based on references from this book, and plan to make the chapter on “Working with Small Teams” required reading for my students.
[Hammond 1994]: Nick Hammond: "Software Carpentry: A Tool-Based Approach to Monte Carlo Radiation Transport", Proc. 8th Int'l Conference on Radiation Shielding, 1994.
A prior use of the phrase “software carpentry”.
[Harold 2004]: Elliotte Rusty Harold: Effective XML. Addison-Wesley, 2004, 0321150406.
This book explains which of XML's many features should be used when: Item 12 tells you to store metadata in attributes, and then spends six pages explaining why, while Item 24 analyzes the strengths and weaknesses of various schema languages, and Item 38 covers character set encodings. It's more than most developers will ever want to know, but when you need it, you really need it.
[Hock 2004]: Roger R. Hock: Forty Studies that Changed Psychology. Prentice Hall, 2004, 0131147293.
In forty short chapters, Hock describes the turning points in our understanding of how our minds work. The book isn't just about psychology; you'll also learn a lot about how science gets done, and about the scientists who do it.
[Humphrey 1996]: Watts S. Humphrey: Introduction to the Personal Software Process. Addison-Wesley, 1996, 0201548097.
A methodology for improving programmers' productivity by having them record and track just about everything they do. The idea has a lot of merit, but in practice, the cost of record keeping can outweigh the benefits.
[Hunt & Thomas 1999]: Andrew Hunt and David Thomas: The Pragmatic Programmer. Addison-Wesley, 1999, 020161622X.
This book is about those things that make up the difference between typing in code that compiles, and writing software that reliably does what it's supposed to. Topics range from gathering requirements through design, to the mechanics of coding, testing, and delivering a finished product. The second section, for example, covers “The Evils of Duplication”, “Orthogonality”, “Reversibility”, “Tracer Bullets”, “Prototypes and Post-It Notes”, and “Domain Languages”, and illuminates each with plenty of examples and short exercises.
[Johnson 2000]: Jeff Johnson: GUI Bloopers. Morgan Kaufmann, 2000, 1558605827.
Most books on GUI design are long on well-meaning aesthetic principles, but short on examples of what it means to put those principles into practice. In contrast, GUI Bloopers presents case study after case study: what's wrong with this dialog? What should its creators have done instead? And, most importantly, why? The net effect is to teach all of the same principles that other books try to, but in a grounded, understandable way.
[Kernighan & Pike 1984]: Brian W. Kernighan and Rob Pike: The Unix Programming Environment. Prentice Hall, 1984, 013937681X.
I have long believed that this book is the real secret to Unix's success. It doesn't just show readers how to use Unix—it explains why the operating system is built that way, and how its “lots of little tools” philosophy keeps simple tasks simple, while making hard ones doable.
[Kernighan & Ritchie 1998]: Brian W. Kernighan and Dennis Ritchie: The C Programming Language. Prentice Hall PTR, 1998, 0131103628.
The classic description of the one programming language every serious programmer absolutely, positively has to learn.
[Knuth 1998]: Donald E. Knuth: The Art of Computer Programming. Addison-Wesley, 1998, 0201485419.
The lifework of the man who invented many of the basic concepts of algorithm analysis, these massive tomes are like Everest: awe-inspiring, but not for the weak of heart. Most readers will find [Sedgewick 2001] much more approachable.
[Langtangen 2004]: Hans P. Langtangen: Python Scripting for Computational Science. Springer-Verlag, 2004, 3540435085.
The book's aim is to show scientists and engineers with little formal training in programming how Python can make their lives better. Regular expressions, numerical arrays, persistence, the basics of GUI and web programming, interfacing to C, C++, and Fortran: it's all here, along with hundreds of short example programs. Some readers may be intimidated by the book's weight, and the dense page layout, but what really made me blink was that I didn't find a single typo or error. It's a great achievement, and a great resource for anyone doing scientific programming.
[Lutz & Ascher 2003]: Mark Lutz and David Ascher: Learning Python. O'Reilly, 2003, 0596002815.
This is not only the best introduction to Python on the market, it is one of the best introductions to any programming language that I have ever read. Lutz and Ascher cover the entire core of the language, and enough of its advanced features and libraries to give readers a feeling for just how powerful Python is. In keeping with the spirit of the language itself, their writing is clear, their explanations lucid, and their examples well chosen.
[Margolis & Fisher 2002]: Jane Margolis and Allan Fisher: Unlocking the Clubhouse. MIT Press, 2002, 0262133989.
This book describes a project at Carnegie-Mellon University that tried to figure out why so few women become programmers, and what can be done to correct the imbalance. Its first six chapters describe the many small ways in which we are all, male and female, conditioned to believe that computers are “boy's things”. Sometimes it's as simple as putting the computer in the boy's room, because “he's the one who uses it most”. Later on, the “who needs a social life?” atmosphere of undergraduate computer labs drives many women away (and many men, too). The last two chapters describe what the authors have done to remedy the situation at high schools and university. This work proves that by being conscious of the many things that turn women off computing, and by viewing computer science from different angles, we can attract a broader cross-section of society, which can only make our discipline a better place to be. The results are impressive: female undergraduate enrolment at CMU rose by more than a factor of four during their work, while the proportion of women dropping out decreased significantly.
[Martelli 2005]: Alex Martelli, Anna Ravenscroft, and David Ascher: Python Cookbook. O'Reilly, 2005, 0596007973.
A useful reference for every serious Python programmer, this book is a collection of tips and tricks, some very simple, others so complex that they require careful line-by-line reading. The book's companion web site is updated regularly.
[Mason 2005]: Mike Mason: Pragmatic Version Control Using Subversion. Pragmatic Bookshelf, 2005, 0974514063.
Yet another book from the folks at Pragmatic, this one is everything you'll ever need to know about Subversion, which is on its way to becoming the version control system of choice for open source development.
[McConnell 2004]: Steve McConnell: Code Complete. Microsoft Press, 2004, 0735619670.
This classic is a handbook of do's and don'ts for working programmers. It covers everything from how to avoid common mistakes in C to how to set up a testing framework, how to organize multi-platform builds, and how to coordinate the members of a team. In short, it is everything I wished someone had told me before I started my first full-time programming job.
[McConnell 1996]: Steve McConnell: Rapid Development. Microsoft Press, 1996, 1556159005.
This book describes what it takes to develop robust code quickly, what mistakes are often made in the name of rapid development, and how to identify and analyze potential risks. It includes a list of 25 best practices, and discusses things that most other books leave out (like recovering from disasters and dealing with impossible demands). Unlike most “how to do it better” books, it doesn't try to sell any particular practice or style, which adds even more weight to McConnell's carefully balanced opinions.
[Pilgrim 2004]: Mark Pilgrim: Dive Into Python. APress, 2004, 1590593561.
A good introduction to Python, which is also available on-line at Dive Into Python.
[Prechelt 2000]: Lutz Prechelt: "An Empirical Comparison of Seven Programming Languages", IEEE Computer, vol. 33, no. 10, pp. 23-29, 2000.
Some hard data on the relative effectiveness of C, C++, Java, Perl, Python, Rexx, and Tcl.
[Ray & Ray 2003]: Deborah S. Ray and Eric J. Ray: Unix. Peachpit Press, 2003, 0321170105.
A gentle introduction to Unix, with many examples.
[Robinson 2005]: Evan Robinson: "Why Crunch Mode Doesn't Work: 6 Lessons", http://www.igda.org/articles/erobinson_crunch.php (viewed 2006-02-26).
An incisive summary of the effect of fatigue on human productivity, the conclusion of which is that crunch mode winds up making projects later.
[Rosen 2005]: Lawrence Rosen: Open Source Licensing: Software Freedom and Intellectual Property Law. Prentice Hall PTR, 2005, 0131487876.
If you're involved in open source software in any way, shape, or form, then this book is a useful read. Its author is intimately familiar with the field; here, he lays out a general background for discussion of intellectual property, and the history of free/open source software, then discusses what various popular licenses actually mean. The book closes with chapters on topics such as how to choose a license, litigation, and standards. The writing is clear—exceptionally so by legal standards—and he takes time to explain terms and assumptions that most software developers won't have encountered before. What's more, he doesn't seem to have any particular axes to grind: the book is US-centric, but his treatment of the various options open to today's developers is very even-handed.
[Royce 1970]: W. W. Royce: "Managing the Development of Large Software Systems", Proceedings of IEEE WESCON, 1970.
The original description of the waterfall model of software development.
[Schneier 2003]: Bruce Schneier: Beyond Fear. Springer, 2003, 0387026207.
A thought-provoking look at how we are encouraged to think about security, and how much security is actually desirable. For example, he explains why security systems must not just work well, but fail well, and why secrecy often undermines security instead of enhancing it.
[Schneier 2005]: Bruce Schneier: Secrets and Lies. Wiley, 2005, 0471453803.
Having written the standard book on cryptography, Schneier now argues that technology alone can't solve most real security problems. The book covers systems and threats, the technologies used to protect and intercept data, and strategies for proper implementation of security systems. Rather than blind faith in prevention, Schneier advocates swift detection and response to an attack, while maintaining firewalls and other gateways to keep out the amateurs.
[Sedgewick 2001]: Robert Sedgewick: Algorithms in C, Parts 1-5. Addison-Wesley Professional, 2001, 0201756080.
Far too many programmers still think and code as if resizeable vectors and string-to-pointer hash tables were the only data structures ever invented. These books are a guide to all the other conceptual tools that working programmers ought to have at their fingertips, from sorting and searching algorithms to different kinds of trees and graphs. The analysis isn't as deep as that in Knuth's monumental The Art of Computer Programming, but that makes the book far more accessible. And while the author's use of C may seem old-fashioned in an age of Java and C#, it does ensure that nothing magical is hidden inside an overloaded operator or virtual method call.
[Skoudis 2004]: Ed Skoudis: Malware. Prentice-Hall, 2004, 0131014056.
This 647-page tome is a survey of harmful software, from viruses and worms through Trojan horses, root kits, and even malicious microcode. Each threat is described and analyzed in detail, and the author gives plenty of examples to show exactly how the attack works, and how to block (or at least detect) it. The writing is straightforward, and the case studies in Chapter 10 are funny without being too cute.
[Spinellis 2006]: Diomidis Spinellis: Code Quality. Addison-Wesley, 2006, 0321166078.
A companion to the same author's earlier [Spinellis 2003], this book concentrates on what distinguishes good code from bad. The first one was great; this one is even better.
[Spinellis 2003]: Diomidis Spinellis: Code Reading. Addison-Wesley, 2003, 0201799405.
The book's preface says it best: “The reading of code is likely to be one of the most common activities of a computing professional, yet it is seldom taught as a subject or formally used as a method for learning how to design and program.” Spinellis isn't the first person to make this point, but he is the first person I know of to do something about it. In this book, he walks through hundreds of examples of C, C++, Java, and Perl, drawn from dozens of Open Source projects such as Apache, NetBSD, and Cocoon. Each example illustrates a point about how programs are actually built. How do people represent multi-dimensional tables in C? How do people avoid nonreentrant code in signal handlers? How do they create packages in Java? How can you recognize that a data structure is a graph? A hashtable? That it might contain a race condition? And on, and on, real-world issue after real-world issue, each one analyzed and cross-referenced. There's also a section on additional documentation sources, and a chapter on tools that can help you make sense of whatever you've just inherited.
[Steele 1999]: Guy L. Steele Jr.: "Growing a Language", Journal of Higher-Order and Symbolic Computation, vol. 12, no. 3, pp. 221-236, 1999.
The best (and wittiest) discussion ever published of how programming languages ought to evolve.
[Spolsky 2004]: Joel Spolsky: Joel on Software. APress, 2004, 1590593898.
Joel on Software collects some of the witty, insightful articles Spolsky has blogged over the past few years. His observations on hiring programmers, measuring how well a development team is doing its job, the API wars, and other topics are always entertaining and informative. Over the course of forty-five short chapters, he ranges from the specific to the general and back again, tossing out pithy observations on the commoditization of the operating system, why you need to hire more testers, and why NIH (the not-invented-here syndrome) isn't necessarily a bad thing.
[Thompson & Chase 2005]: Herbert H. Thompson and Scott G. Chase: The Software Vulnerability Guide. Charles River Media, 2005, 1584503580.
My current favorite guide to computer security for programmers, this book walks through each major family of security holes in turn: faulty permission models, bad passwords, macros, dynamic linking and loading, buffer overflow, format strings and various injection attacks, temporary files, spoofing, and more.
[Ullman & Liyanage 2004]: Larry Ullman and Marc Liyanage: C Programming. Peachpit Press, 2004, 0321287630.
A gentle introduction to C, with many examples.
[Whittaker 2003]: James A. Whittaker: How to Break Software. Addison-Wesley, 2003, 0201796198.
A slim catalog of things testers can do to break software.
[Whittaker & Thompson 2004]: James A. Whittaker and Herbert H. Thompson: How to Break Software Security. Addison-Wesley, 2004, 0321194330.
This practical companion to [Whittaker 2003] catalogs things you can do to test (and break) security measures in programs.
[Williams & Kessler 2003]: Laurie Williams and Robert Kessler: Pair Programming Illuminated. Addison-Wesley, 2003, 0201745763.
A combination of an instruction manual, a summary of the authors' empirical studies of pair programming's effectiveness, and advocacy, this book is the reference guide for anyone who wants to introduce pair programming into their development team.
[Wilson 2005]: Greg Wilson: Data Crunching. Pragmatic Bookshelf, 2005, 0974514071.
Every day, all around the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check that configuration files are internally consistent, and search through web logs to see how many people have downloaded the latest release of their product. It may not be glamorous, but knowing how to do it efficiently is essential to being a good programmer. This book describes the most useful data crunching techniques, explains when you should use them, and shows how they will make your life easier.
[Zeller 2006]: Andreas Zeller: Why Programs Fail: A Guide to Systematic Debugging. Morgan Kaufmann, 2006, 1558608664.
This well-written, copiously-illustrated book from the creator of DDD (a graphical front end for the GNU debugger) is a survey of current and next-generation debugging tools. Some are old friends, like bug trackers and symbolic debuggers. Others are new: there's a detailed look at the pros and cons of replay debugging, an automatic divide-and-conquer tool that can strip test cases down to their essentials, and a whole chapter on how dependency analysis and program slicing can be used to isolate faults. If, ten years from now, debuggers have taken a much-needed leap forward, much of the credit will go to this book.

Glossary

Online Resources

List of Figures

List of Tables

Syllabus

Introduction: introduction, self assessment, scientific programming today, comparison with experimental science, comparison with industry, solutions, changes on the horizon, course content, what you will need, open source vs. commercial tools, contributing, recommended reading, typographic conventions, summary.

Shell Basics: introduction, the shell, shell vs. operating system, file system, absolute and relative paths, basic navigation commands, command execution cycle, command flags, creating files and directories, basic tools, summary.

More Shell: introduction, wildcards, input, output, and redirection, pipes, environment variables, configuration, the PATH variable, file ownership and permissions, directory permissions, changing permissions, Windows ownership and permissions, some more advanced tools, summary.

Version Control: introduction, collaboration, version control systems, choosing a version control system, basic operations, command line and GUI clients, resolving conflicts, starvation, binary files, reverting, rolling back, creating repositories and checking out working copies, Subversion command reference, reading Subversion output, summary.

Automated Builds: introduction, build tool requirements, introducing Make, basic features, structure of a Makefile, handling multiple targets, defining phony targets, dependencies, updating dependencies, conventions, automatic variables, pattern rules, dependencies once again, macros, getting information from the outside world, functions, pros and cons, alternatives, summary.

Basic Scripting: motivation, Python's strengths, Python's weaknesses, why Python?, sturdy vs. nimble execution cycle, running Python, shortcuts, variables, printing, quoting, converting values to strings, escape sequences, numbers, arithmetic, Booleans, short-circuit evaluation, comparisons, conditionals, why indentation?, while loops, break and continue, string formatting, format specifiers, supported formats, summary.

Strings, Lists, and Files: introduction, strings, slicing, bounds checking rules, negative indices, methods, string methods, chaining method calls, membership, lists, modifying lists, concatenation, deletion, list methods, for loops, ranges, list membership, nesting lists, aliasing, tuples, multi-valued assignment, unpacking structures in loops, file I/O, file I/O example, looping over files, summary.

Functions and Libraries: introduction, defining functions, returning values, variable scope, aliasing, default parameter values, functions are objects, function attributes, creating modules, module scope, other ways to import, import executes statements, the __name__ variable, system library, command-line arguments, standard I/O, search path, exiting the program, math library, file system programming, file and directory status, the os.path module, summary.

Style: why read code, cognition, Python style guide, naming, scope and size, example, function length, determining functionality, reading techniques, idioms, style-checking tools, documentation, traceability, embedding documentation, docstrings.

Quality Assurance: introduction, limits to testing, terminology, test results and specifications, structuring tests, simple example, try and except, simple exception example, exceptions, exception hierarchy, exception handler stack, raising exceptions, when and how to use exceptions, handling errors in tests, test-driven design, design by contract, assertions, defensive programming, summary.

Sets, Dictionaries, and Complexity: introduction, sets, set operations, example, implementation and implications, why set elements must be immutable, frozen sets, language design, quantifying efficiency, algorithmic complexity, motivating dictionaries, working with dictionaries, dictionary methods, counting frequency, ordering, inverting, dictionary string formatting, variable-length argument lists, summary.

Debugging: introduction, what's wrong with print statements, symbolic debuggers, debugger features, kinds of debuggers, integrated development environments, command-line debuggers, inspecting values, controlling execution, how debuggers work, implementing breakpoints, advanced operations, logging, logging levels, logging example, Agans' Rules, get it right the first time, what is it supposed to do?, is it plugged in?, make it fail, divide and conquer, change one thing at a time, write it down, be humble, summary.

Object-Oriented Programming: introduction, abstract data types, terminology, a simple class, methods, members, encapsulation, constructors, constructor style, special methods, inheritance, inheritance example, overriding methods, polymorphism, duck typing, Liskov substitution principle, ecosystem example, CRC cards, summary.

More on Objects: introduction, overriding built-in functions, operator overloading, right-hand and left-hand operators, other special methods, sparse vectors, semantics of vector length, vector behavior, dot product, addition, testing, static data members, static methods, design patterns, singleton pattern, visitor pattern, abstract factory pattern, command pattern, other patterns, summary.

Unit Testing: introduction, unit testing frameworks, big picture, implementing checks, simple example, running sum example, cost effectiveness, setup and teardown, testing exceptions, testing I/O, stubs and mock objects, test performance, choosing tests, rectangle overlap example, what to test first, summary.

Regular Expressions: matching constant strings, matching alternatives, precedence, escaping special characters, raw strings, sequences, optional elements, character sets, common abbreviations, special cases, anchors, extracting matches, match objects, match groups, compiled REs, finding all matches, other patterns, summary.

Binary Data: introduction, why use binary, representing numbers, two's complement, bitwise operators, shifting, setting and clearing bits, bit flags, floating point numbers, floating point spacing, floating point roundoff, binary I/O, binary I/O mode, packing data structures, packing, unpacking data, struct module, hexadecimal characters, format specifiers, calculating sizes, endianness, variable-length data, dynamic formats, metadata, metadata file structure, summary.

XML: introduction, history, formatting rules, text, XHTML, critique, attributes, when to use attributes, more XHTML tags, lists and tables, images, links, DOM, basic features of DOM, DOM tree example, creating a DOM tree, converting to text, other ways to create documents, details, finding nodes, walking a DOM tree, modifying a DOM tree.

Relational Databases: history, when to use a database, experimental data example, using SQL, creating tables, inserting data, simple selection, sorting, selection, joins, ID translation example, keys and constraints, eliminating duplicates, aggregation, grouping, self joins, null, database design, nested queries, nested query example, further examples, application programming, Python database example, concurrency, transactions, transaction example, using transactions, testing, advanced topics, summary.

Spreadsheets: introduction, getting started, entering data, formatting data, formulas, replicating formulas, built-in functions, commonly-used functions, dependencies, conditionals, lookup tables, lookup table example, absolute references, larger data set, creating charts, customizing charts, creating a log-log chart, analysis, programming spreadsheets, summary.

Integration: introduction, running external programs, subprocess module, running in place, running with arguments, capturing output, providing input, deadlock, pros and cons, integrating C into Python, Python object structure, calling conventions, boilerplate, loading and calling, wrapping C++, SWIG, embedding Python in C, loading modules, plugin frameworks, manual loading, application, namespaces, summary.

Web Client Programming: introduction, component object models, concurrency, partial failure, underlying protocols, sockets, client/server vs. peer-to-peer, socket client example, socket server example, HTTP, HTTP request line, HTTP headers, HTTP body, HTTP response, HTTP example, urllib, urllib example, building a spider, parameterizing requests, URL encoding, encoding example, screen scraping, web services, Amazon example.

Web Server Programming: introduction, motivation for CGI, CGI, passing information to the CGI, passing information back, MIME types, basic CGI, invoking the basic CGI, generating dynamic content, forms, creating forms, example form, parameter names, handling form data, form handling example, development tips, server-side state, maintaining state in files, HTML templating, concurrency, file locking, the problem of state, cookies, creating cookies, cookie example, cookie tips.

Security: introduction, goals, limitations of technical solutions, terminology, risk assessment, cataloguing attacks, example attack, leaking information, SQL injection, default settings and denial of service, phishing, attacking data entry, timed attacks, secure HTTP, basic cryptography, public-key cryptography, public-key cryptography in action, digital signatures, securing login, other areas of insecurity.

The Development Process: introduction, design vs. agility, project lifecycle, vision statement, gathering requirements, from requirements to features, waterfall model, spiral model, Extreme Programming, analysis & estimation, time estimates, A&E format, reviews, prioritization, scheduling, science fiction scheduling, development, tracking progress, burn rate, winding up, post mortem, summary.

Teamware: motivation, DrProject architecture, event log, weblogs, repository browser, mailing lists, whitelisting mail addresses, issue tracker, creating and viewing tickets, ticket guidelines, writing useful tickets, updating tickets, roadmap and milestones, priorities and triage, more complex workflows, wiki, wiki syntax, editing wiki pages, wiki links, rules of the road.

Backward, Forward, and Sideways: introduction, classic mistakes, branching, merging, and tagging, creating patches, SCons, SCons example, persistence, pickling example, object-relational mapping, web development frameworks, refactoring, refactoring examples, refactoring tools, code reviews, reading code, code review checklist, UI design, paper prototyping, more reading, rules of programming.