
5. More with Lists and Strings

We have learned the basics of list manipulation, and practiced them. In this chapter we explore lists further, including their connection to strings. We pick up a few more string methods along the way. Finally we try three advanced list manipulation techniques.

Splitting and joining

We can split a string into a list of its letters, each as a string, using the built-in list function:

Python
>>> l = list('tumultuous')
>>> l
['t', 'u', 'm', 'u', 'l', 't', 'u', 'o', 'u', 's']

We might think the reverse can be achieved using the familiar built-in str function, but that just builds a string showing how the list would be printed by Python:

Python
>>> str(l)
"['t', 'u', 'm', 'u', 'l', 't', 'u', 'o', 'u', 's']"

We could write a function to do it ourselves:

[Image: listing of the join function]
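The listing itself is not reproduced here. A minimal sketch of such a function, assuming it simply starts with the empty string and appends each letter in turn, might look like this (only the name join comes from the text; the body is our assumption):

```python
def join(l):
    # Start with the empty string and add each letter to the end of it.
    s = ''
    for letter in l:
        s = s + letter
    return s

print(join(list('tumultuous')))  # tumultuous
```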

Here is the result:

Python
>>> l = list('tumultuous')
>>> l
['t', 'u', 'm', 'u', 'l', 't', 'u', 'o', 'u', 's']
>>> join(l)
'tumultuous'

As you might suspect, there is a built-in join function: it is, somewhat counterintuitively, a method on strings. We specify the empty string and we see this:

Python
>>> l = list('tumultuous')
>>> ''.join(l)
'tumultuous'

If we specify a different string, it will be used to glue the letters together instead:

Python
>>> ' '.join(l)
't u m u l t u o u s'

Another method on strings is split, which splits a given string into a list of strings, one for each word in the original:

Python
>>> s = '   Once   upon a    time   '
>>> s.split()
['Once', 'upon', 'a', 'time']

As you can see, multiple spaces are considered the same as a single space, and spaces at the beginning and end are ignored.

Finding strings in other strings

The find method gives the index of the first position a string appears in another:

Python
>>> s = 'Once upon a time'
>>> s.find('upon')
5
>>> s.find('not there')
-1

In one of the questions, you will be asked to write a similar function yourself, from scratch. Of course, we can use indices and slices on strings too:

Python
>>> s = 'Once upon a time'
>>> s[0]
'O'
>>> s[:4]
'Once'
>>> s[:-4]
'Once upon a '
>>> s[-4:]
'time'

And so there is no need to convert a string to a list to take advantage of the useful slicing constructs. We can combine these two new techniques to isolate the first sentence in a string by removing anything which follows:

Python
>>> s = 'The first sentence. And the second...'
>>> pos = s.find('.')
>>> pos
18
>>> s[:pos + 1]
'The first sentence.'

Of course, in practice we would need to check that find does not return -1. What would happen if it did?
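One way to build in that check is sketched below. The function name first_sentence is hypothetical, and returning the whole string when no full stop is found is our assumption about the desired behaviour:

```python
def first_sentence(s):
    # find returns -1 when there is no full stop in s.
    pos = s.find('.')
    if pos == -1:
        # Assumption: with no full stop, treat the whole string as one sentence.
        return s
    return s[:pos + 1]

print(first_sentence('The first sentence. And the second...'))  # The first sentence.
print(first_sentence('No full stop here'))                      # No full stop here
```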

Sorting

Now, we leave strings and return to lists. We often need to sort a list into increasing order prior to further processing. This can be achieved with the sort method:

Python
>>> l = [1, 2, 3, 2, 1, 3, 2]
>>> l.sort()
>>> l
[1, 1, 2, 2, 2, 3, 3]

The list is sorted in-place. The sorted function, on the other hand, returns a new, sorted version of the list, leaving the original list alone.

Python
>>> l = [1, 2, 3, 2, 1, 3, 2]
>>> sorted(l)
[1, 1, 2, 2, 2, 3, 3]
>>> l
[1, 2, 3, 2, 1, 3, 2]

This is useful when we want to, for example, iterate over a list in sorted order but leave the original data intact for later use.
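As a small illustration of that pattern, with a hypothetical list named scores standing in for our data:

```python
scores = [70, 45, 90, 60]  # hypothetical data, in the order it was recorded

# sorted builds a new list, so the loop sees the values in increasing order...
for s in sorted(scores):
    print(s)

# ...while the original list keeps its recorded order.
print(scores)  # [70, 45, 90, 60]
```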

Two useful functions: map and filter

There are two built-in functions for producing lists by modifying other lists. The first is map, which applies a function to each element of a list:

Python
>>> l = [1, 2, 3, 4, 5]
>>> def square(x): return x * x
... 
>>> list(map(square, l))
[1, 4, 9, 16, 25]

We must use list to retrieve the result. We shall discuss why in a moment. The second useful function is filter, which selects only those elements of a list for which a given function returns True:

Python
>>> l = [1, 2, 3, 4, 5]
>>> def even(x): return x % 2 == 0
... 
>>> list(filter(even, l))
[2, 4]

You can imagine how these functions can be used instead of for loops, leading to shorter and easier to understand programs. As programmers, we spend a lot of our time reading programs we have already written (or reading programs written by others), compared with the time we spend writing new ones, so such ease of understanding is very important.
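To make the comparison concrete, here is the same computation written both ways. The function names even_squares_loop and even_squares are hypothetical, chosen for this sketch:

```python
def square(x): return x * x
def even(x): return x % 2 == 0

def even_squares_loop(l):
    # Explicit loop: test each element, then build up the result with append.
    result = []
    for x in l:
        if even(x):
            result.append(square(x))
    return result

def even_squares(l):
    # The same computation: filter selects the evens, map squares them.
    return list(map(square, filter(even, l)))

print(even_squares_loop([1, 2, 3, 4, 5]))  # [4, 16]
print(even_squares([1, 2, 3, 4, 5]))       # [4, 16]
```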

Iterators

We have just written this fragment, making use of map:

Python
>>> l = [1, 2, 3, 4, 5]
>>> def square(x): return x * x
... 
>>> list(map(square, l))
[1, 4, 9, 16, 25]

Why did we need to use list to convert the result of map into a list? It is because map returns an iterator, not a list. An iterator ranges over a data structure and hands back its items one by one, rather than returning them all at once as a list. This means that the individual items are not created until they are needed. We can use a for loop over an iterator, without needing to make a list of it:

Python
>>> l = [1, 2, 3, 4, 5]
>>> def square(x): return x * x
... 
>>> for x in map(square, l):
...     print(x)
...
1
4
9
16
25
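We can observe directly that the items are not created until they are needed. The sketch below assumes a hypothetical function noisy_square, which prints a message whenever it is called, and uses the built-in next function (not otherwise covered in this chapter) to ask the iterator for one item at a time:

```python
def noisy_square(x):
    # Hypothetical variant of square which reports when it is called.
    print('squaring', x)
    return x * x

it = map(noisy_square, [1, 2, 3])  # nothing is printed yet
first = next(it)                   # prints: squaring 1
second = next(it)                  # prints: squaring 2
print(first, second)               # prints: 1 4
```

The third element is never squared unless we ask for it.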

Another example of a function returning an iterator is Python’s reversed:

Python
>>> reversed([1, 4, 3, 2])
<list_reverseiterator object at 0x7fd45aa03dc0>
>>> list(reversed([1, 4, 3, 2]))
[2, 3, 4, 1]

If we use reversed in a for loop, we do not notice that it returned an iterator rather than a list. Many built-in functions in Python operate over any iterable structure, not just lists: for example, sum calculates the sum of any such structure containing numbers.
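For instance, sum can consume the results of map and reversed directly, with no call to list in between:

```python
def square(x): return x * x

l = [1, 2, 3, 4, 5]

# sum accepts the iterator from map directly; no intermediate list is built.
total = sum(map(square, l))
print(total)  # 55

# reversed also returns an iterator, and sum consumes it in the same way.
print(sum(reversed(l)))  # 15
```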

List comprehensions

Instead of producing one list from another, or producing it manually by repeated use of append or insert, we can also build a list from scratch using a list comprehension. For example:

Python
>>> [x * x for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [str(x) for x in range(10)]
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> [x % 2 == 0 for x in range(10)]
[True, False, True, False, True, False, True, False, True, False]

We can also provide a filter inside the list comprehension by adding an if at the end. Here are some cubes which are also even:

Python
>>> [x * x * x for x in range(20) if (x * x * x) % 2 == 0]
[0, 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832]

Such comprehensions provide a concise and readable way to produce lists of items meeting certain criteria, without having to iterate over them with a for loop.
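For comparison, here is the list of even cubes built with an explicit loop and append, next to the comprehension. The names cubes and cubes2 are ours:

```python
# The even cubes, built with an explicit loop and append.
cubes = []
for x in range(20):
    if (x * x * x) % 2 == 0:
        cubes.append(x * x * x)

# The same list as a comprehension.
cubes2 = [x * x * x for x in range(20) if (x * x * x) % 2 == 0]

print(cubes == cubes2)  # True
```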

Common problems

The strange formulation of the join mechanism, as a method on the string which is being used to glue the others together, can lead to confusion. Consider, for example, the following two strings:

Python
>>> x = 'marginal'
>>> y = ' '

We intend to write the following:

Python
>>> y.join(x)
'm a r g i n a l'

If we get the strings in the wrong order, the operation still succeeds, but the result is not what we wanted:

Python
>>> x.join(y)
' '
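The reason is that join places the string it is called on between the items of its argument, and a one-character argument has nothing to be glued together. The following sketch makes the roles visible; the two-character string 'ab' is just an illustrative choice:

```python
x = 'marginal'

# join places the string it is called on between the items of its argument.
print(' '.join(x))        # m a r g i n a l

# With the order swapped, x is the glue. A one-character argument has
# nothing to glue together, so it comes back unchanged.
print(repr(x.join(' ')))  # ' '

# A two-character argument makes the role of x visible.
print(x.join('ab'))       # amarginalb
```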

The find method on strings has a way of signalling failure which we have not seen before: instead of raising an error, it returns normally but with an answer of -1. We must check for this, otherwise the -1 may be used unwittingly by the rest of the program without errors, for example as an index, where it silently picks out the last character:

Python
>>> 'Once'[-1]
'e'

Summary

We have learned about some of the connections between strings and lists, two kinds of ordered data structure. We have manipulated strings by splitting and joining them, and found strings within one another. We have introduced the important topic of sorting. We have seen maps and filters, two powerful mechanisms for processing lists. We have shown how iterators can simplify list-heavy programs. Finally, we have looked at list comprehensions, a way of combining one or more of these mechanisms together.

Questions

  1. Use the sort method to build a function which returns an alphabetically sorted list of all the words in a given sentence.

  2. Use sorted to write a similar function.

  3. Use a sorting method to make our histogram function from question 7 of the previous chapter produce the histogram sorted in alphabetical order.

  4. Write a function to remove spaces from the beginning and end of a string representing a sentence by converting the string to a list of its letters, processing it, and converting it back to a single string. You might find the built-in reverse method on lists useful, or another list-reversal mechanism.

  5. Can you find a simpler way to perform this task, using a built-in method described in this chapter?

  6. Write a function clip which, given an integer, clips it to the range 1…10 so that integers bigger than 10 round down to 10, and those smaller than 1 round up to 1. Write another function clip_list which uses this first function together with map to apply this clipping to a whole list of integers.

  7. Write a function to detect if a given string is palindromic (i.e. equals its own reverse). Now use filter to write a function which takes a list of strings and returns only those which are palindromic. Then write a function to return a list of the numbers in a given range which are palindromic, for example 1331.

  8. Rewrite your clip_list example from question 6 in the form of a list comprehension.

  9. Similarly, rewrite your palindromic number detector from question 7 in the form of a list comprehension.