5.2 Strings in Action

Once you've written a string, you will almost certainly want to do things with it. This section and the next two demonstrate string basics, formatting, and methods.

5.2.1 Basic Operations

Let's begin by interacting with the Python interpreter to illustrate the basic string operations listed in Table 5-1. Strings can be concatenated using the + operator, and repeated using the * operator:

    % python
    >>> len('abc')           # Length: number of items
    3
    >>> 'abc' + 'def'        # Concatenation: a new string
    'abcdef'
    >>> 'Ni!' * 4            # Repetition: like "Ni!" + "Ni!" + ...
    'Ni!Ni!Ni!Ni!'

Formally, adding two string objects creates a new string object, with the contents of its operands joined; repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily sized strings; there's no need to predeclare anything in Python, including the sizes of data structures.[3] The len built-in function returns the length of strings (and other objects with a length).
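The point that + and * build new strings and leave their operands alone can be checked directly. The following is a minimal sketch, not part of the original session: it is written as a script for a Python 3 interpreter (an assumption; the sessions in this section use the older print statement), and the variable names are chosen only for illustration.

```python
# Concatenation and repetition return new strings; the operands are unchanged.
a = 'abc'
b = 'def'

joined = a + b            # a new six-character string
repeated = 'Ni!' * 4      # like 'Ni!' + 'Ni!' + 'Ni!' + 'Ni!'

print(joined, len(joined))
print(repeated, len(repeated))
print(a, b)               # the original operands are still intact
```

Because nothing is predeclared, the same expressions work for strings of any size; len simply reports however many characters the result holds.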
Repetition may seem a bit obscure at first, but it comes in handy in a surprising number of contexts. For example, to print a line of 80 dashes, you can either count up to 80 or let Python count for you:

    >>> print '------- ...more... ---'      # 80 dashes, the hard way
    >>> print '-'*80                        # 80 dashes, the easy way

Notice that operator overloading is at work here already: we're using the same + and * operators that are called addition and multiplication when using numbers. Python does the correct operation, because it knows the types of objects being added and multiplied. But be careful: this isn't quite as liberal as you might expect. For instance, Python doesn't allow you to mix numbers and strings in + expressions: 'abc'+9 raises a TypeError, instead of automatically converting 9 to a string.

As shown in the last line in Table 5-1, you can also iterate over strings in loops using for statements and test membership with the in expression operator, which is essentially a search:

    >>> myjob = "hacker"
    >>> for c in myjob: print c,       # Step through items.
    ...
    h a c k e r
    >>> "k" in myjob                   # 1 means true (found).
    1
    >>> "z" in myjob                   # 0 means false (not found).
    0

The for loop assigns a variable to successive items in a sequence (here, a string), and executes one or more statements for each item. In effect, the variable c becomes a cursor stepping across the string here. These examples are covered in more detail later.

5.2.2 Indexing and Slicing

Because strings are defined as an ordered collection of characters, we can access their components by position. In Python, characters in a string are fetched by indexing: providing the numeric offset of the desired component in square brackets after the string. You get back the one-character string at that offset. As in the C language, Python offsets start at zero and end at one less than the length of the string. Unlike C, Python also lets you fetch items from sequences such as strings using negative offsets.
Technically, negative offsets are added to the length of a string to derive a positive offset. You can also think of negative offsets as counting backwards from the end.

    >>> S = 'spam'
    >>> S[0], S[-2]               # Indexing from front or end
    ('s', 'a')
    >>> S[1:3], S[1:], S[:-1]     # Slicing: extract section
    ('pa', 'pam', 'spa')

The first line defines a four-character string and assigns it the name S. The next line indexes it two ways: S[0] fetches the item at offset 0 from the left (the one-character string 's'), and S[-2] gets the item at offset 2 from the end (or equivalently, at offset (4 + -2) from the front). Offsets and slices map to cells as shown in Figure 5-1.[4]
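The rule that a negative offset is added to the string's length can be spelled out as a short script. This is a sketch for illustration only, assuming a Python 3 interpreter; the names n, from_end, and same_item are not from the original text.

```python
S = 'spam'
n = len(S)                 # 4

first = S[0]               # offset 0 from the left
from_end = S[-2]           # offset 2 from the end
same_item = S[n + -2]      # the equivalent positive offset: 4 + -2 = 2
last = S[-1]               # same as S[n - 1]

# Valid offsets run from 0 to n - 1; anything past the end is rejected.
try:
    S[n]
    past_end_ok = True
except IndexError:
    past_end_ok = False

print(first, from_end, same_item, last, past_end_ok)
```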
Figure 5-1. Using offsets and slices

The last line in the example above is our first look at slicing. Probably the best way to think of slicing is that it is a form of parsing (analyzing structure), especially when applied to strings: it allows us to extract an entire section (substring) in a single step. Slices can extract columns of data, chop off leading and trailing text, and more.

Here's how slicing works. When you index a sequence object such as a string on a pair of offsets separated by a colon, Python returns a new object containing the contiguous section identified by the offsets pair. The left offset is taken to be the lower bound (inclusive), and the right is the upper bound (noninclusive). Python fetches all items from the lower bound, up to but not including the upper bound, and returns a new object containing the fetched items. If omitted, the left and right bounds default to zero and the length of the object you are slicing, respectively.

For instance, in the example above, S[1:3] extracts items at offsets 1 and 2. It grabs the second and third items, and stops before the fourth item at offset 3. Next, S[1:] gets all items past the first; the upper bound defaults to the length of the string. Finally, S[:-1] fetches all but the last item; the lower bound defaults to zero, and -1 refers to the last item, noninclusive.

This may seem confusing at first glance, but indexing and slicing are simple and powerful to use, once you get the knack. Remember, if you're unsure about what a slice means, try it out interactively. In the next chapter, you'll see that it's also possible to change an entire section of a certain object in one step, by assigning to a slice. Here's a summary of the details for reference:

5.2.2.1 Indexing (S[i]) fetches components at offsets

- The first item is at offset 0.
- Negative offsets count backwards from the end.
- S[0] fetches the first item.
- S[-2] fetches the second item from the end (like S[len(S)-2]).
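5.2.2.2 Slicing (S[i:j]) extracts contiguous sections of a sequence

- The upper bound is noninclusive.
- Slice bounds default to 0 and the sequence length, if omitted.
- S[1:3] fetches from offset 1 up to, but not including, offset 3.
- S[1:] fetches from offset 1 through the end (the length).
- S[:-1] fetches from offset 0 up to, but not including, the last item.
- S[:] fetches from offset 0 through the end: a top-level copy of S.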
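The slicing rules summarized here can be exercised in a few lines. The sketch below assumes a Python 3 interpreter and uses illustrative variable names; it also checks one consequence of the noninclusive upper bound, namely that splitting a string at any offset and rejoining the halves gives back the original.

```python
S = 'spam'

middle = S[1:3]            # offsets 1 and 2; stops before offset 3
tail = S[1:]               # upper bound defaults to len(S)
head = S[:-1]              # lower bound defaults to 0; stops before the last item
whole = S[:]               # both bounds defaulted: the entire string
explicit = S[0:len(S)]     # the defaults written out by hand

# Because the upper bound is noninclusive, S[:i] and S[i:] never overlap.
rejoined = [S[:i] + S[i:] for i in range(len(S) + 1)]

print(middle, tail, head, whole, explicit)
print(rejoined)
```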
We'll see another slicing-as-parsing example later in this section. The last item listed here, S[:], turns out to be a very common trick: it makes a full top-level copy of a sequence object, that is, an object with the same value but a distinct piece of memory. This isn't very useful for immutable objects like strings, but comes in handy for objects that may be changed, such as lists (more on copies in Chapter 7). Later, we'll also see that the syntax used to index by offset (the square brackets) is used to index dictionaries by key as well; the operations look the same, but have different interpretations.

As of Python 2.3, slice expressions support an optional third index, used as a step (sometimes called a stride). The step is added to the index of each item extracted. For instance, X[1:10:2] will fetch every other item in X from offsets 1 through 9; it will collect items from offsets 1, 3, 5, and so on. Similarly, the slicing expression "hello"[::-1] returns the new string "olleh". For more details, see Python's standard documentation, or run a few experiments interactively.
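Both ideas in this passage, the top-level copy and the optional step, are easy to try out. What follows is a small sketch under the assumption of a Python 3 interpreter; the sample string X and the list names are made up for the example.

```python
# Stepped slices: the third index is added to the offset of each item fetched.
X = 'abcdefghijklmnop'
every_other = X[1:10:2]        # offsets 1, 3, 5, 7, 9
backwards = 'hello'[::-1]      # a step of -1 walks from the end to the front

# A full slice copies a mutable sequence: same value, distinct object.
original = [1, 2, 3]
copy = original[:]
copy[0] = 99                   # changing the copy leaves the original alone

print(every_other, backwards)
print(original, copy, copy is original)
```

The copy is top-level only: the new list is a separate object, but any nested objects inside it would still be shared with the original.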
5.2.3 String Conversion Tools

You cannot add a number and a string together in Python, even if the string looks like a number (i.e., is all digits):

    >>> "42" + 1
    TypeError: cannot concatenate 'str' and 'int' objects

This is by design: because + can mean both addition and concatenation, the choice of conversion would be ambiguous. So, Python treats this as an error. In Python, magic is generally omitted if it would make your life more complex.

What to do, then, if your script obtains a number as a text string from a file or user interface? The trick is that you need to employ conversion tools before you can treat a string like a number, or vice versa. For instance:

    >>> import string
    >>> int("42"), str(42)              # Convert from/to string.
    (42, '42')
    >>> string.atoi("42"), `42`         # Same, but older techniques
    (42, '42')

The int and string.atoi functions both convert a string to a number, and the str function and backquotes around any object convert that object to its string representation (e.g., `42` converts a number to a string). Of these, int and str are the newer and generally recommended conversion techniques, and do not require importing the string module.

Although you can't mix strings and number types around operators such as +, you can manually convert before that operation if needed:

    >>> int("42") + 1                   # Force addition.
    43
    >>> "spam" + str(42)                # Force concatenation.
    'spam42'

Similar built-in functions handle floating-point number conversions:

    >>> str(3.1415), float("1.5")
    ('3.1415', 1.5)
    >>> text = "1.234E-10"
    >>> float(text)
    1.2340000000000001e-010

Later, we'll further study the built-in eval function; it runs a string containing Python expression code, and so can convert a string to any kind of object. The functions int, string.atoi, and their relatives convert only to numbers, but this restriction means they are usually faster. As seen in Chapter 4, the string formatting expression provides another way to convert numbers to strings.
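The convert-before-you-combine pattern described in this section can be gathered into one short script. This is a sketch rather than the book's own code: it assumes a Python 3 interpreter, where string.atoi and the backquote syntax are no longer available, so int and repr stand in for them. The variable names are illustrative.

```python
count = int("42") + 1            # text to number, then addition
label = "spam" + str(42)         # number to text, then concatenation
ratio = float("1.5")             # floating-point conversion
tiny = float("1.234E-10")        # exponent notation is accepted too
shown = repr(42)                 # function form of the older backquote syntax

# Mixing the two types without converting is an error, not a guess.
try:
    "42" + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(count, label, ratio, tiny, shown, mixed_ok)
```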
5.2.4 Changing Strings

Remember the term immutable sequence? The immutable part means that you can't change a string in-place (e.g., by assigning to an index):

    >>> S = 'spam'
    >>> S[0] = "x"
    Raises an error!

So how do you modify text information in Python? To change a string, you just need to build and assign a new string using tools such as concatenation and slicing, and possibly assign the result back to the string's original name.

    >>> S = S + 'SPAM!'       # To change a string, make a new one.
    >>> S
    'spamSPAM!'
    >>> S = S[:4] + 'Burger' + S[-1]
    >>> S
    'spamBurger!'

The first example adds a substring at the end of S, by concatenation; really, it makes a new string, and assigns it back to S, but you can usually think of this as changing a string. The second example replaces four characters with six by slicing, indexing, and concatenating. Later in this section, you'll see how to achieve a similar effect with string method calls. Finally, it's also possible to build up new text values with string formatting expressions:

    >>> 'That is %d %s bird!' % (1, 'dead')     # like C sprintf
    'That is 1 dead bird!'

The next section shows how.
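The steps in this section can be replayed as a script to confirm that each "change" really produces a new string. This is a minimal sketch assuming a Python 3 interpreter; the names assigned, step1, step2, and line are introduced only for the example.

```python
S = 'spam'

# Assigning to an index fails, because strings are immutable.
try:
    S[0] = 'x'
    assigned = True
except TypeError:
    assigned = False

S = S + 'SPAM!'                    # build a new string, rebind the name S
step1 = S

S = S[:4] + 'Burger' + S[-1]       # keep 'spam', swap the middle, keep '!'
step2 = S

# Formatting expressions also build a new string from their operands.
line = 'That is %d %s bird!' % (1, 'dead')

print(assigned, step1, step2)
print(line)
```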