Todays lecture:

  • Understanding "import"
  • Creating our own matrix class
  • Introduction to numpy

1. Import statements

Python by itself does not have a lot of functionality included (unlike e.g. Matlab, Mathematica). Instead, we rely on extra "modules", which we import into our script.

There are several ways to import a module: * import math * import math as m * from math import * * from math import sin,cos * from math import sin as cos

There is a lot of external libraries that can do pretty much everything for you. How do you find it? WSE* is your friend!

* your favourite Web Search Engine

First simple example, find a trigonometric function...

In [1]:
sin(3.1416)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-856f566c7e7b> in <module>()
----> 1 sin(3.1416)

NameError: name 'sin' is not defined
WSE, "trigonometric functions in python"?

standard imports, recommended for official code:

In [2]:
import math
math.sin(3.1416)
Out[2]:
-7.346410206643587e-06

You will often see

import long_name as short_name
In [4]:
import math as m
m.sin(3.1416)
Out[4]:
-7.346410206643587e-06

This is only for convenience, as 'm' is quicker to write than 'math'

Generally not problematic, but when we read a long script we then have to see where "m" was imported to understand where the function comes from (and more importantly, what it does).`

Sometimes you just need one part of a large library.

Then it can be convenient, and faster, to import specific functions:

In [5]:
from math import sin
sin(3.1416)
Out[5]:
-7.346410206643587e-06

You can also import "everything" this way:

In [6]:
from math import *
sin(3.1416)
Out[6]:
-7.346410206643587e-06

Quite popular for "quick scripting", but potentially very problematic for published code.

Imagine something like:

class array:
    def __init__(self,data):
        self.data=data
...
from numpy import *
...
array([[1,2],[3,4]])

Ooops, array is now suddenly not our own implementation anymore, it is a numpy.array() instead.

More info: Idioms and Anti-Idioms: https://python.readthedocs.org/en/v2.7.2/howto/doanddont.html

Sometimes you have to use the syntax

from .. import ..
In [7]:
import matplotlib
matplotlib.pylab.plot([1,2,3])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-25a509c20117> in <module>()
      1 import matplotlib
----> 2 matplotlib.pylab.plot([1,2,3])

AttributeError: 'module' object has no attribute 'pylab'
In [8]:
from matplotlib import pylab
%matplotlib inline
pylab.plot([1,2,3])
# question to the confused: does matplotlib.pylab.plot now exist?
Out[8]:
[<matplotlib.lines.Line2D at 0x10df9bbd0>]

The reason for this is a bit complicated, but basically, when you do "import numpy" you excecute a file where the developer has specified what functionality is imported "by default".

And yes, you are correct,

import matplotlib.pylab
matplotlib.pylab.plot([1,2,3])

also works.

In [9]:
#popular way
from matplotlib import pylab as plt
plt.plot([1,2,3])
Out[9]:
[<matplotlib.lines.Line2D at 0x10e0c6e50>]

Comment: name referencing in Python

import math as m

effectively is the same as

import math
m=math

(though in the latter, math.sin is also available)

2. Creating our own matrix class

Python is a scripting language. A scripting language (as compared to a compiled language like C/C++, Fortran) is generally * flexible * slow

We will learn this by following Emanuele's philosophy. First do it ourselves, then find a finished good implementation

Let's first create two small matrices for testing:

In [10]:
A=[
 [1.0, 2.0],
 [3.0, 4.0]
]
B=[
 [5.0, 6.0, 7.0],
 [8.0, 9.0, 10.0]
]
In [11]:
# Our own, silly implementation of matrix multiplication
def matrix_mult(m1,m2):
    '''
    m1 and m2 are expected to be list objects
    len(m1)==len(m2[0])
    '''
    matrix=[]
    for i in range(len(m1[0])):
        new_column=[]
        for j in range(len(m2)):
            new_data=0.0
            for k in range(len(m1)):
                new_data+=m1[k][i]*m2[j][k]
            new_column.append(new_data)
        matrix.append(new_column)
    return matrix



matrix_mult(A,A)
Out[11]:
[[7.0, 15.0], [10.0, 22.0]]

This is simple enough (need to keep indices right..)

However, not very flexible. We do remember Emanuele mentioned that essentially everything is a class. So perhaps we could rather create a class and define our multiplication there?

In [12]:
class mymatrix:
    def __init__(self,data):
        '''
        data should be a list of lists
        all lists in list data should have equal length,
        and only contain numbers
        '''
        self.i=len(data)
        self.j=len(data[0])
        self.data=data

The way we have written comments in the function here is proper python documentation.

This documentation is available with the Python built-in help() function

This is very useful, I use the built-in help() a lot

In [13]:
help(mymatrix)
Help on class mymatrix in module __main__:

class mymatrix
 |  Methods defined here:
 |  
 |  __init__(self, data)
 |      data should be a list of lists
 |      all lists in list data should have equal length,
 |      and only contain numbers


In [14]:
help(pylab.plot)
Help on function plot in module matplotlib.pyplot:

plot(*args, **kwargs)
    Plot lines and/or markers to the
    :class:`~matplotlib.axes.Axes`.  *args* is a variable length
    argument, allowing for multiple *x*, *y* pairs with an
    optional format string.  For example, each of the following is
    legal::
    
        plot(x, y)        # plot x and y using default line style and color
        plot(x, y, 'bo')  # plot x and y using blue circle markers
        plot(y)           # plot y using x as index array 0..N-1
        plot(y, 'r+')     # ditto, but with red plusses
    
    If *x* and/or *y* is 2-dimensional, then the corresponding columns
    will be plotted.
    
    An arbitrary number of *x*, *y*, *fmt* groups can be
    specified, as in::
    
        a.plot(x1, y1, 'g^', x2, y2, 'g-')
    
    Return value is a list of lines that were added.
    
    By default, each line is assigned a different color specified by a
    'color cycle'.  To change this behavior, you can edit the
    axes.color_cycle rcParam.
    
    The following format string characters are accepted to control
    the line style or marker:
    
    ================    ===============================
    character           description
    ================    ===============================
    ``'-'``             solid line style
    ``'--'``            dashed line style
    ``'-.'``            dash-dot line style
    ``':'``             dotted line style
    ``'.'``             point marker
    ``','``             pixel marker
    ``'o'``             circle marker
    ``'v'``             triangle_down marker
    ``'^'``             triangle_up marker
    ``'<'``             triangle_left marker
    ``'>'``             triangle_right marker
    ``'1'``             tri_down marker
    ``'2'``             tri_up marker
    ``'3'``             tri_left marker
    ``'4'``             tri_right marker
    ``'s'``             square marker
    ``'p'``             pentagon marker
    ``'*'``             star marker
    ``'h'``             hexagon1 marker
    ``'H'``             hexagon2 marker
    ``'+'``             plus marker
    ``'x'``             x marker
    ``'D'``             diamond marker
    ``'d'``             thin_diamond marker
    ``'|'``             vline marker
    ``'_'``             hline marker
    ================    ===============================
    
    
    The following color abbreviations are supported:
    
    ==========  ========
    character   color
    ==========  ========
    'b'         blue
    'g'         green
    'r'         red
    'c'         cyan
    'm'         magenta
    'y'         yellow
    'k'         black
    'w'         white
    ==========  ========
    
    In addition, you can specify colors in many weird and
    wonderful ways, including full names (``'green'``), hex
    strings (``'#008000'``), RGB or RGBA tuples (``(0,1,0,1)``) or
    grayscale intensities as a string (``'0.8'``).  Of these, the
    string specifications can be used in place of a ``fmt`` group,
    but the tuple forms can be used only as ``kwargs``.
    
    Line styles and colors are combined in a single format string, as in
    ``'bo'`` for blue circles.
    
    The *kwargs* can be used to set line properties (any property that has
    a ``set_*`` method).  You can use this to set a line label (for auto
    legends), linewidth, anitialising, marker face color, etc.  Here is an
    example::
    
        plot([1,2,3], [1,2,3], 'go-', label='line 1', linewidth=2)
        plot([1,2,3], [1,4,9], 'rs',  label='line 2')
        axis([0, 4, 0, 10])
        legend()
    
    If you make multiple lines with one plot command, the kwargs
    apply to all those lines, e.g.::
    
        plot(x1, y1, x2, y2, antialised=False)
    
    Neither line will be antialiased.
    
    You do not need to use format strings, which are just
    abbreviations.  All of the line properties can be controlled
    by keyword arguments.  For example, you can set the color,
    marker, linestyle, and markercolor with::
    
        plot(x, y, color='green', linestyle='dashed', marker='o',
             markerfacecolor='blue', markersize=12).
    
    See :class:`~matplotlib.lines.Line2D` for details.
    
    The kwargs are :class:`~matplotlib.lines.Line2D` properties:
    
      agg_filter: unknown
      alpha: float (0.0 transparent through 1.0 opaque)         
      animated: [True | False]         
      antialiased or aa: [True | False]         
      axes: an :class:`~matplotlib.axes.Axes` instance         
      clip_box: a :class:`matplotlib.transforms.Bbox` instance         
      clip_on: [True | False]         
      clip_path: [ (:class:`~matplotlib.path.Path`,         :class:`~matplotlib.transforms.Transform`) |         :class:`~matplotlib.patches.Patch` | None ]         
      color or c: any matplotlib color         
      contains: a callable function         
      dash_capstyle: ['butt' | 'round' | 'projecting']         
      dash_joinstyle: ['miter' | 'round' | 'bevel']         
      dashes: sequence of on/off ink in points         
      drawstyle: ['default' | 'steps' | 'steps-pre' | 'steps-mid' |                   'steps-post']         
      figure: a :class:`matplotlib.figure.Figure` instance         
      fillstyle: ['full' | 'left' | 'right' | 'bottom' | 'top' | 'none']         
      gid: an id string         
      label: string or anything printable with '%s' conversion.         
      linestyle or ls: [``'-'`` | ``'--'`` | ``'-.'`` | ``':'`` | ``'None'`` |                   ``' '`` | ``''``]         and any drawstyle in combination with a linestyle, e.g., ``'steps--'``.         
      linewidth or lw: float value in points         
      lod: [True | False]         
      marker: unknown
      markeredgecolor or mec: any matplotlib color         
      markeredgewidth or mew: float value in points         
      markerfacecolor or mfc: any matplotlib color         
      markerfacecoloralt or mfcalt: any matplotlib color         
      markersize or ms: float         
      markevery: unknown
      path_effects: unknown
      picker: float distance in points or callable pick function         ``fn(artist, event)``         
      pickradius: float distance in points         
      rasterized: [True | False | None]         
      sketch_params: unknown
      snap: unknown
      solid_capstyle: ['butt' | 'round' |  'projecting']         
      solid_joinstyle: ['miter' | 'round' | 'bevel']         
      transform: a :class:`matplotlib.transforms.Transform` instance         
      url: a url string         
      visible: [True | False]         
      xdata: 1D array         
      ydata: 1D array         
      zorder: any number         
    
    kwargs *scalex* and *scaley*, if defined, are passed on to
    :meth:`~matplotlib.axes.Axes.autoscale_view` to determine
    whether the *x* and *y* axes are autoscaled; the default is
    *True*.
    
    
    
    Additional kwargs: hold = [True|False] overrides default hold state


In [15]:
# we now create an instance of our matrix class and try to multiply:
myA=mymatrix(A)
myB=mymatrix(B)

print myA*myB
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-545de5c562f8> in <module>()
      3 myB=mymatrix(B)
      4 
----> 5 print myA*myB

TypeError: unsupported operand type(s) for *: 'instance' and 'instance'

As expected, we do not yet support multiplication. WSE tells us we first need to define

mymatrix.__mul__(self, other)
In [16]:
class mymatrix2(mymatrix):
    def __mul__(self, other):
        '''
        defines multiplication of two matrices
        '''
        if self.j!=other.i:
            raise ValueError("Cannot multiply matrices with these dimensions!")
        matrix=[]
        for i in range(self.i):
            column=[]
            for j in range(other.j):
                data=0.0
                for k in range(self.j):
                    data+=self.data[i][k]*other.data[k][j]
                column.append(data)
            matrix.append(column)
        return mymatrix2(matrix)
In [17]:
myA=mymatrix2(A)
myB=mymatrix2(B)
In [18]:
print myA*myB
<__main__.mymatrix2 instance at 0x10e19cab8>

In [19]:
print myB*myA
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-3a103739eb83> in <module>()
----> 1 print myB*myA

<ipython-input-16-7228b1fac1ba> in __mul__(self, other)
      5         '''
      6         if self.j!=other.i:
----> 7             raise ValueError("Cannot multiply matrices with these dimensions!")
      8         matrix=[]
      9         for i in range(self.i):

ValueError: Cannot multiply matrices with these dimensions!

Great, the multiplication logic seems to work.. But the print is not very understandable. Let's see if we can make a pretty printing. This is done by

mymatrix.__str__(self)
In [24]:
class mymatrix3(mymatrix2):
    def __mul__(self, other):
        m2=mymatrix2.__mul__(self,other)
        return mymatrix3(m2.data)
    def __str__(self):
        '''
        Define a pretty print of mymatrix
        '''
        data_str=[str(d) for d in self.data]
        string='['
        string = string +'\n '.join(data_str)
        string = string +'] '
        return string

myA=mymatrix3(A)
print myA*myB
[[21.0, 24.0, 27.0]
 [47.0, 54.0, 61.0]] 

In [22]:
c=['a','b','c']
''.join(c)
Out[22]:
'abc'

So, to recap quickly, when we write a*b in python, that is actually equivalent of

a.__mul__(b)

Some more useful functions for classes: * , multiplication, __mul__(self, other) -, subtraction, __sub__(self, other) * +, addition, __add__(self, other) * /, division, __div__(self, other) * str(), string representation, __str__(self) * and many more, see e.g. http://www.rafekettler.com/magicmethods.html

With this, we can now write e.g.

a+b*c-d

Instead of the much less readable

a.__add__(b.__mul__(c)).__sub__(d)

The more complete matrix class can be found at the bottom of this lecture. Exercise for the eager: Try to write it yourself before you look at the solution..

I will not go through it... Because there is a better way!

3. Introducing numpy

As we said before, a weakness of scripting languages is that it is slow (compared to compiled languages, not compared to Excel!)

Python has a great way to get around that problem, by introducing external modules that are written in faster languages. These modules are typically written in C or Fortran.

A very popular such module in the scientific community, is the numpy module.

Homepage: http://www.numpy.org/

At it's core, it provides an array object implemented in C. It also provides tools for linear algebra, random number generation, and more.

Other libraries that are covered in future lectures are built on numpy (perhaps with the exception of pandas?)

In [25]:
import numpy as np
# np has become a very common acronym for numpy in scripts,
# so this can be used even in public code without any worry
npA=np.array(A)
print npA
[[ 1.  2.]
 [ 3.  4.]]

In [26]:
print npA*npA
[[  1.   4.]
 [  9.  16.]]

In [27]:
print 2*npA
[[ 2.  4.]
 [ 6.  8.]]

We see that this array class already has our pretty print implemented. (try print A for comparison)

However, some of you may notice that the multiplication here is a simple elementwise product, differing from the matrix product we defined ourselves. This makes sense because numpy calls it an array.

numpy.matrix() is another class provided by numpy, in which a*b returns the matrix product. This class is less used in numpy, but is perhaps more familiar for those who come from a Matlab environment. Read more at http://wiki.scipy.org/NumPy_for_Matlab_Users

In [29]:
npA.dot(npA)
Out[29]:
array([[  7.,  10.],
       [ 15.,  22.]])
In [30]:
mA=np.matrix(A)
print mA
[[ 1.  2.]
 [ 3.  4.]]

In [31]:
print mA*mA
[[  7.  10.]
 [ 15.  22.]]

We also have matrix multiplication available for arrays, but need then to specify by calling the function directly. The function is very flexible, it can take any list-like objects:

In [32]:
print npA.dot(npA)
print npA.dot(mA)
print npA.dot(A)
[[  7.  10.]
 [ 15.  22.]]
[[  7.  10.]
 [ 15.  22.]]
[[  7.  10.]
 [ 15.  22.]]

Random numbers...

Random number operations are found in the numpy.random module. help(module.random) is useful, or you can find documentation on the web: http://docs.scipy.org/doc/numpy/reference/routines.random.html

Noticeably: * numpy.random.rand() : returns uniform random numbers between 0 and 1 * numpy.random.randn() : returns random numbers from normal distribution with \(\mu\)=0, \(\sigma\)=1 * numpy.random.randint(low,high) : returns random integers between low and high

All of these can take the optional argument "size", which returns an array of many numbers.

In [33]:
np.random.randn()
Out[33]:
0.1924755307217171
In [34]:
np.random.randn(100,100)
Out[34]:
array([[ -1.12181229e+00,   1.47301110e+00,   8.39405936e-01, ...,
         -1.83289464e+00,  -6.02832418e-01,   7.11041154e-01],
       [ -5.33694492e-01,   9.28033014e-01,  -2.36805050e+00, ...,
          5.19090125e-01,   4.13829561e-01,  -1.79141773e+00],
       [  1.41491212e+00,  -7.06821844e-01,  -6.90058363e-01, ...,
          5.58559846e-01,  -6.62295266e-01,  -5.36632217e-01],
       ..., 
       [  2.16839512e+00,  -5.46978617e-01,   9.51576889e-01, ...,
          1.27726500e+00,   5.70112489e-02,  -1.27747774e+00],
       [  8.38842905e-01,   5.61059050e-01,  -2.76070120e-01, ...,
          8.01692715e-01,  -9.57985100e-02,   1.66103507e-03],
       [  3.34122107e-01,  -2.89028172e-01,   1.55221953e+00, ...,
         -1.11459515e+00,  -1.71959840e+00,  -1.89215002e+00]])
In [35]:
# Now, I claimed that scripting languages are slow, so let us compare...
from timeit import timeit

npArr=np.random.randn(1e2,1e2)
myArr=mymatrix3(npArr.tolist()) #tolist() returns a normal python list object..

def test_mymat():
    myArr*myArr

def test_numpy():
    npArr.dot(npArr)
In [39]:
t_np=timeit(test_numpy, number=1)
print t_np
0.000582933425903

In [40]:
t_my=timeit(test_mymat, number=1)
print t_my, t_my/t_np
0.163679838181 280.786503067

This is terrible, our code is in this quick test 400 times slower than numpy.

Conclusion: Listen to Emanuele, WSE will always give you a better solution than implementing yourself!

As a final test, we can plot a function that shows how much slower our code is depending on array size

In [41]:
# Let us plot a timing evolution...
def test_array(size):
    '''
    assuming array is a numpy.array
    '''
    from timeit import timeit
    npArr=np.random.randn(size,size)
    myArr=mymatrix3(npArr.tolist())
    def test_mymat():
        myArr*myArr

    def test_numpy():
        npArr.dot(npArr)
        
    t_np=timeit(test_numpy, number=10)
    t_my=timeit(test_mymat, number=10)
    # we return the timing ratio:
    return t_my/t_np

sizes=[1,5,10,30,50,70,100]
ratios=[]
for size in sizes:
    result=test_array(size)
    ratios.append(result)
from matplotlib import pylab as plt
fig=plt.plot(sizes,ratios)
Other useful functions and modules in numpy

Size of an array..?

In [49]:
myarray=np.random.randn(100,50,10)
In [50]:
myarray.size
Out[50]:
50000
In [51]:
len(myarray)
Out[51]:
100
In [55]:
myarray.shape
Out[55]:
tuple
In [56]:
# useful built-in
type(myarray)
Out[56]:
numpy.ndarray
In [54]:
sum(sum(myarray))
Out[54]:
array([  -3.97778285,  -74.83839126,   75.03738766,  -89.52875254,
        -28.76858491,   61.41025783,  126.33178858,  -45.86607728,
        -33.5180614 ,  -76.5404473 ])
In [57]:
print myarray.min(), myarray.max()
-4.67804759863 4.09261715935

In [59]:
myarray=np.random.randn(100,50)
fig=pylab.plot(np.cos(myarray),np.sin(myarray),'r.')

4. Backup

In []:
# The kind-of-complete example of a matrix class
# This also does some basic testing of the input


class mymatrix:
    def __init__(self,data):
        '''
        data should be a list of lists
        all lists in list data should have equal length,
        and only contain numbers
        '''
        import numbers
        
        self.i=len(data)
        self.j=len(data[0])
        # We check that the content is OK:
        for l in data:
            if len(l)!=self.j:
                raise ValueError("Wrong array dimensions")
            for k in l:
                if not isinstance(k,numbers.Real):
                    raise TypeError("Wrong content type in data")
        self.data=data
        
    # Addition:
    def __add__(self,other):
        '''
        Defines addition of two matrices
        '''
        if self.i!=other.i or self.j!=other.j:
            raise ValueError("Cannot add matrices of different dimensions")

        matrix=[]
        for i in range(self.i):
            column=[]
            for j in range(self.j):
                column.append(self.data[i][j]+other.data[i][j])
            matrix.append(column)
        return mymatrix(matrix)
    
    def __mul__(self,other):
        '''
        defines multiplication of two matrices
        '''
        if self.j!=other.i:
            raise ValueError("Cannot multiply these matrices")
        matrix=[]
        for j in range(self.j):
            column=[]
            for i in range(other.i):
                data=0.0
                for k in range(self.i):
                    data+=self.data[k][j]*other.data[i][k]
                column.append(data)
            matrix.append(column)
        return mymatrix(matrix)
    
    def __neg__(self):
        '''
        Defines -self
        '''
        matrix=[]
        for i in range(self.i):
            column=[]
            for j in range(self.j):
                column.append(-self.data[i][j])
            matrix.append(column)
        return mymatrix(matrix)
    
    def __sub__(self,other):
        '''
        Defines self-other
        '''
        myother=-other
        return self+myother
    
    def __pow__(self,power):
        if not isinstance(power,int):
            raise ValueError("Only integer power defined")
        if power<1:
            raise ValueError("Only positive powers defined")

        # Expert question: Why is this potentially dangerous?
        matrix=self
        for i in range(1,power):
            matrix*=self
        return matrix
    
    def __str__(self):
        '''
        Define a pretty print
        '''
        data_str=[str(d) for d in self.data]
        string='['
        string+='\n '.join(data_str)
        string+='] '
        return string
In []:
a=mymatrix([[1,2],[3,4]])
print a
print a+a
print a*a
print a-a+a*a
# New task for you, figure out how to have 2*a working...