
5.2 Strings in Action

Once you've written a string, you will almost certainly want to do things with it. This section and the next two demonstrate string basics, formatting, and methods.

5.2.1 Basic Operations

Let's begin by interacting with the Python interpreter to illustrate the basic string operations listed in Table 5-1. Strings can be concatenated using the + operator, and repeated using the * operator:

% python
>>> len('abc')         # Length: number of items
3
>>> 'abc' + 'def'      # Concatenation: a new string
'abcdef'
>>> 'Ni!' * 4          # Repetition: like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'

Formally, adding two string objects creates a new string object, with the contents of its operands joined; repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily sized strings; there's no need to predeclare anything in Python, including the sizes of data structures.[3] The len built-in function returns the length of strings (and other objects with a length).

[3] Unlike C character arrays, you don't need to allocate or manage storage arrays when using Python strings. Simply create string objects as needed, and let Python manage the underlying memory space. Python reclaims unused objects' memory space automatically, using a reference-count garbage collection strategy. Each object keeps track of the number of names, data structures, etc. that reference it; when the count reaches zero, Python frees the object's space. This scheme means Python doesn't have to stop and scan all of memory to find unused space to free (an additional garbage collector component also reclaims cyclic objects).

Repetition may seem a bit obscure at first, but it comes in handy in a surprising number of contexts. For example, to print a line of 80 dashes, you can either count up to 80 or let Python count for you:

>>> print '------- ...more... ---'      # 80 dashes, the hard way
>>> print '-'*80                        # 80 dashes, the easy way
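Repetition was described above as adding a string to itself a number of times. As a minimal sketch of that equivalence, the hypothetical helper repeat below spells it out with a loop of concatenations; Python's built-in * operator does the same job internally and much faster.

```python
# Hypothetical helper: repetition written out as repeated concatenation.
def repeat(s, n):
    result = ''                  # start from the empty string
    for _ in range(n):
        result = result + s      # each + builds a brand-new string object
    return result

dashes = repeat('-', 80)         # same value as '-' * 80
```

In practice you would always write '-' * 80; the loop only makes visible what the operator means.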

Notice that operator overloading is at work here already: we're using the same + and * operators that are called addition and multiplication when using numbers. Python does the correct operation, because it knows the types of objects being added and multiplied. But be careful: this isn't quite as liberal as you might expect. For instance, Python doesn't allow you to mix numbers and strings in + expressions: 'abc'+9 raises an error, instead of automatically converting 9 to a string.
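To see the overloading and its limits side by side, here is a short sketch: the same + and * operators dispatch on the types of their operands, and mixing a string with a number is rejected with a TypeError instead of being converted. The variable names are illustrative only.

```python
# The same operators, chosen by the types of the operands.
number_sum = 2 + 3               # addition
string_sum = '2' + '3'           # concatenation
number_product = 2 * 3           # multiplication
string_product = '2' * 3         # repetition

# Mixing a string and a number in + is an error, not an implicit conversion.
try:
    'abc' + 9
    mixed_allowed = True
except TypeError:
    mixed_allowed = False
```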

As shown in the last line in Table 5-1, you can also iterate over strings in loops using for statements and test membership with the in expression operator, which is essentially a search:

>>> myjob = "hacker"
>>> for c in myjob: print c,       # Step through items.
...
h a c k e r
>>> "k" in myjob                   # 1 means true (found).
1
>>> "z" in myjob                   # 0 means false (not found).
0

The for loop assigns a variable to successive items in a sequence (here, a string) and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string. We'll return to these statements in more detail later.
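Since the in operator was described as essentially a search, here is a sketch of that search for a single character, built from the same for loop. The function name contains is hypothetical; note that the real in operator on strings also finds multicharacter substrings, which this one-character loop does not attempt.

```python
# Hypothetical helper: a single-character membership test written by hand.
def contains(text, ch):
    for c in text:               # c steps across the string like a cursor
        if c == ch:
            return True          # found: stop searching early
    return False                 # loop finished without a match
```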

5.2.2 Indexing and Slicing

Because strings are defined as ordered collections of characters, we can access their components by position. In Python, characters in a string are fetched by indexing: you provide the numeric offset of the desired component in square brackets after the string. The result is a one-character string.

As in the C language, Python offsets start at zero and end at one less than the length of the string. Unlike C, Python also lets you fetch items from sequences such as strings using negative offsets. Technically, negative offsets are added to the length of a string to derive a positive offset. You can also think of negative offsets as counting backwards from the end.
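The rule that negative offsets are added to the length can be checked directly. This is a minimal sketch; positive_offset is a hypothetical helper name, and it assumes the offset is already in range (from -len(s) up to len(s)-1).

```python
# Hypothetical helper: translate a negative offset into the equivalent positive one.
def positive_offset(s, i):
    if i < 0:
        return len(s) + i        # e.g. -2 on a 4-character string becomes 2
    return i                     # non-negative offsets are left alone

S = 'spam'
same_item = S[positive_offset(S, -2)] == S[-2]
```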

>>> S = 'spam'
>>> S[0], S[-2]               # Indexing from front or end
('s', 'a')
>>> S[1:3], S[1:], S[:-1]     # Slicing: extract section
('pa', 'pam', 'spa')

The first line defines a four-character string and assigns it the name S. The next line indexes it two ways: S[0] fetches the item at offset 0 from the left (the one-character string 's'), and S[-2] gets the item at offset 2 from the end (or equivalently, at offset (4 + -2) from the front). Offsets and slices map to cells as shown in Figure 5-1.[4]

[4] More mathematically minded readers (and students in my classes) sometimes detect a small asymmetry here: the leftmost item is at offset 0, but the rightmost is at offset -1. Alas, there is no such thing as a distinct -0 value in Python.

Figure 5-1. Using offsets and slices

The last line in the example above is our first look at slicing. Probably the best way to think of slicing is that it is a form of parsing (analyzing structure), especially when applied to strings—it allows us to extract an entire section (substring) in a single step. Slices can extract columns of data, chop off leading and trailing text, and more.

Here's how slicing works. When you index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section identified by the offset pair. The left offset is taken to be the lower bound (inclusive), and the right is the upper bound (noninclusive). Python fetches all items from the lower bound up to but not including the upper bound, and returns a new object containing the fetched items. If omitted, the left and right bounds default to zero and the length of the object you are slicing, respectively.

For instance, in the example above, S[1:3] extracts items at offsets 1 and 2. It grabs the second and third items, and stops before the fourth item at offset 3. Next S[1:] gets all items past the first—the upper bound defaults to the length of the string. Finally, S[:-1] fetches all but the last item—the lower bound defaults to zero, and -1 refers to the last item, non-inclusive.
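The rules above (inclusive lower bound, noninclusive upper bound, default bounds, negative offsets) can be written out as a loop. This is a sketch for strings with no step value; slice_by_hand is a hypothetical name, and the clamping of out-of-range bounds mirrors how Python slices behave but is an addition beyond what the text describes.

```python
# Hypothetical helper: mimic s[lower:upper] for a string, without a step.
def slice_by_hand(s, lower=None, upper=None):
    n = len(s)
    # Omitted bounds default to 0 and the length of the string.
    if lower is None:
        lower = 0
    if upper is None:
        upper = n
    # Negative offsets are added to the length, then clamped into range.
    if lower < 0:
        lower = max(0, lower + n)
    if upper < 0:
        upper = max(0, upper + n)
    lower = min(lower, n)
    upper = min(upper, n)
    # Fetch from the lower bound up to, but not including, the upper bound.
    result = ''
    for i in range(lower, upper):
        result = result + s[i]
    return result
```

For example, slice_by_hand('spam', 1, 3) produces the same 'pa' as 'spam'[1:3].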

This may seem confusing at first glance, but indexing and slicing are simple and powerful tools once you get the knack. Remember, if you're unsure about what a slice means, try it out interactively. In the next chapter, you'll see that it's also possible to change an entire section of a certain object in one step, by assigning to a slice. Here's a summary of the details for reference:

5.2.2.1 Indexing (S[i]) fetches components at offsets
  • The first item is at offset 0.

  • Negative indexes mean to count backwards from the end or right.

  • S[0] fetches the first item.

  • S[-2] fetches the second from the end (like S[len(S)-2]).

5.2.2.2 Slicing (S[i:j]) extracts contiguous sections of a sequence
  • The upper bound is noninclusive.

  • Slice boundaries default to 0 and the sequence length, if omitted.

  • S[1:3] fetches from offsets 1 up to, but not including, 3.

  • S[1:] fetches from offset 1 through the end (length).

  • S[:3] fetches from offset 0 up to, but not including, 3.

  • S[:-1] fetches from offset 0 up to, but not including, the last item.

  • S[:] fetches from offsets 0 through the end—a top-level copy of S.

We'll see another slicing-as-parsing example later in this section. The last item listed here turns out to be a very common trick: it makes a full top-level copy of a sequence object—an object with the same value, but a distinct piece of memory. This isn't very useful for immutable objects like strings, but comes in handy for objects that may be changed, such as lists (more on copies in Chapter 7). Later, we'll also see that the syntax used to index by offset (the square brackets) is used to index dictionaries by key as well; the operations look the same, but have different interpretations.
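The copying trick matters for changeable objects such as lists. As a short sketch (the variable names are illustrative), the example below shows that changing a slice copy leaves the original alone, and also what "top-level" means: objects nested inside the list are shared, not copied.

```python
original = [1, 2, 3]
duplicate = original[:]          # top-level copy: same value, distinct object
duplicate[0] = 99                # changing the copy in place...
                                 # ...leaves original untouched

nested = [[1], [2]]
shallow = nested[:]              # copies only the outer list
shallow[0].append(5)             # the inner lists are still shared
```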

In Python 2.3, slice expressions support an optional third index, used as a step (sometimes called a stride). The step is added to the index of each item extracted. For instance, X[1:10:2] will fetch every other item in X from offsets 1-9; it will collect items from offsets 1, 3, 5, and so on. Similarly, the slicing expression "hello"[::-1] returns the new string "olleh". For more details, see Python's standard documentation, or run a few experiments interactively.
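Here is the stride example spelled out, along with the same selection written as an explicit loop over offsets 1, 3, 5, 7, and 9. The sample string X is an assumption chosen so each offset maps to an easy-to-read letter.

```python
X = 'abcdefghijk'                # offsets 0..10 hold 'a'..'k'
every_other = X[1:10:2]          # offsets 1, 3, 5, 7, 9
reversed_hello = 'hello'[::-1]   # a negative step walks right to left

# The same stride as an explicit loop over offsets.
by_hand = ''
for i in range(1, 10, 2):
    by_hand = by_hand + X[i]
```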

Why You Will Care: Slices

Throughout the core language parts of this book, we include sidebars such as this to give you a peek at how some of the language features being introduced are typically used in real programs. Since we can't show much of real use until you've seen most of the Python picture, these sidebars necessarily contain many references to topics not introduced yet; at most, you should consider them previews of ways you may find these abstract language concepts useful for common programming tasks.

For instance, you'll see later that the argument words listed on a command line used to launch a Python program are made available in the argv attribute of the built-in sys module:

# File echo.py
import sys
print sys.argv

% python echo.py -a -b -c
['echo.py', '-a', '-b', '-c']

Usually, you're only interested in inspecting the arguments that follow the program name. This leads to a very typical application of slices: a single slice expression can return all but the first item of the list. Here, sys.argv[1:] returns the desired list, ['-a', '-b', '-c']. You can then process the arguments without having to account for the program name at the front.
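Because the contents of sys.argv depend on how a script is launched, this sketch passes a sample list to a hypothetical function rather than reading sys.argv directly; inside echo.py you would call split_args(sys.argv) instead.

```python
# Hypothetical helper: separate the program name from its arguments.
def split_args(argv):
    program = argv[0]            # the script name always comes first
    options = argv[1:]           # everything after it: the real arguments
    return program, options

program, options = split_args(['echo.py', '-a', '-b', '-c'])
```

If the script is run with no arguments, the slice simply returns an empty list, so no special case is needed.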

Slices are also often used to clean up lines read from input files; if you know that a line will have an end-of-line character at the end (a '\n' newline marker), you can get rid of it with a single expression such as line[:-1], which extracts all but the last character in the line (the lower limit defaults to 0). In both cases, slices do the job of logic that must be explicit in a lower-level language.
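One caveat with line[:-1]: the last line of a file may not end with a newline, in which case the slice would chop off a real character. A hedged sketch that checks first (chomp is a hypothetical name, and it uses the endswith string method, covered later in this chapter):

```python
# Hypothetical helper: drop a trailing newline only if one is present.
def chomp(line):
    if line.endswith('\n'):
        return line[:-1]         # all but the last character
    return line                  # nothing to remove
```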


5.2.3 String Conversion Tools

You cannot add a number and a string together in Python, even if the string looks like a number (i.e., is all digits):

>>> "42" + 1
TypeError: cannot concatenate 'str' and 'int' objects

This is by design: because + can mean both addition and concatenation, the choice of conversion would be ambiguous, so Python treats this as an error. In general, Python omits magic that would make your life more complex.

What to do, then, if your script obtains a number as a text string from a file or user interface? The trick is that you need to employ conversion tools before you can treat a string like a number, or vice versa. For instance:

>>> int("42"), str(42)         # Convert from/to string.
(42, '42')
>>> import string              # Needed for string.atoi
>>> string.atoi("42"), `42`    # Same, but older techniques
(42, '42')

The int and string.atoi functions both convert a string to a number, and the str function and backquotes around any object convert that object to its string representation (e.g., `42` converts a number to a string). Of these, int and str are the newer, and generally prescribed conversion techniques, and do not require importing the string module.

Although you can't mix strings and number types around operators such as +, you can manually convert before that operation if needed:

>>> int("42") + 1            # Force addition.
43
>>> "spam" + str(42)         # Force concatenation.
'spam42'
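A typical case is a line of numbers read from a file. The sketch below is hypothetical (add_fields and the sample text are illustrative, and text.split() is a string method covered later in this chapter): each field stays a string until int converts it, and str converts the total back for concatenation.

```python
# Hypothetical helper: sum whitespace-separated integers in a line of text.
def add_fields(text):
    total = 0
    for field in text.split():   # each field is still a string here
        total = total + int(field)   # force addition
    return total

report = 'Total: ' + str(add_fields('10 20 12'))   # force concatenation
```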

Similar built-in functions handle floating-point number conversions:

>>> str(3.1415), float("1.5")
('3.1415', 1.5)

>>> text = "1.234E-10"
>>> float(text)
1.2340000000000001e-010

Later, we'll further study the built-in eval function; it runs a string containing Python expression code, and so can convert a string to any kind of object. The functions int, string.atoi, and their relatives convert only to numbers, but this restriction means they are usually faster. As seen in Chapter 4, the string formatting expression provides another way to convert numbers to strings.
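To contrast the two approaches, here is a small sketch: eval accepts any Python expression, while int accepts only an integer literal and raises ValueError otherwise. Because eval runs arbitrary code, it should not be applied to text from untrusted sources.

```python
as_number = int('42')            # int accepts only integer literals
as_list = eval('[1, 2, 3]')      # eval runs any Python expression...
as_sum = eval('1 + 2')           # ...including arithmetic

try:
    int('1 + 2')                 # not an integer literal
    int_accepts_expressions = True
except ValueError:
    int_accepts_expressions = False
```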

5.2.4 Changing Strings

Remember the term—immutable sequence? The immutable part means that you can't change a string in-place (e.g., by assigning to an index):

>>> S = 'spam'
>>> S[0] = "x"
Raises an error!

So how do you modify text information in Python? To change a string, you just need to build and assign a new string using tools such as concatenation and slicing, and possibly assigning the result back to the string's original name.
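As a sketch of that approach, the hypothetical helper replace_at below "changes" the item at one offset by slicing out the pieces around it and concatenating a new string; the try statement shows that the direct index assignment really is rejected with a TypeError.

```python
# Hypothetical helper: return a copy of s with the item at offset i replaced.
def replace_at(s, i, ch):
    if i < 0:
        i = i + len(s)           # allow negative offsets, as indexing does
    return s[:i] + ch + s[i + 1:]

S = 'spam'
try:
    S[0] = 'x'                   # in-place change is not allowed
    changed_in_place = True
except TypeError:
    changed_in_place = False

S = replace_at(S, 0, 'x')        # build a new string and rebind the name
```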

>>> S = S + 'SPAM!'       # To change a string, make a new one.
>>> S
'spamSPAM!'
>>> S = S[:4] + 'Burger' + S[-1]
>>> S
'spamBurger!'

The first example adds a substring at the end of S, by concatenation; really, it makes a new string, and assigns it back to S, but you can usually think of this as changing a string. The second example replaces four characters with six by slicing, indexing, and concatenating. Later in this section, you'll see how to achieve a similar effect with string method calls. Finally, it's also possible to build up new text values with string formatting expressions:

>>> 'That is %d %s bird!' % (1, 'dead')    # like C sprintf
'That is 1 dead bird!'

The next section shows how.
