Python by itself does not have a lot of functionality included (unlike e.g. Matlab, Mathematica). Instead, we rely on extra "modules", which we import into our script.
There are several ways to import a module:
* import math
* import math as m
* from math import *
* from math import sin,cos
* from math import sin as cos
There are a lot of external libraries that can do pretty much everything for you. How do you find them? WSE* is your friend!
sin(3.1416)
standard imports, recommended for official code:
import math
math.sin(3.1416)
You will often see
import long_name as short_name
import math as m
m.sin(3.1416)
This is only for convenience, as 'm' is quicker to write than 'math'.
This is generally not problematic, but when we read a long script we have to look up where "m" was imported to understand where the function comes from (and, more importantly, what it does).
Sometimes you just need one part of a large library.
Then it can be convenient to import specific functions (the whole module is still loaded, but you type less and skip the attribute lookup on every call):
from math import sin
sin(3.1416)
You can also import "everything" this way:
from math import *
sin(3.1416)
Quite popular for "quick scripting", but potentially very problematic for published code.
Imagine something like:
class array:
    def __init__(self,data):
        self.data=data
...
from numpy import *
...
array([[1,2],[3,4]])
Ooops, array is now suddenly not our own implementation anymore, it is a numpy.array() instead.
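To see in advance which of our own names a star import would silently overwrite, we can compare them against the public names of the module. The following is only a sketch: the set own_names is a hypothetical stand-in for whatever a script defines, and the check assumes a module without __all__ (such as math), where the star import brings in every name that does not start with an underscore.

```python
import math

# Hypothetical names that our own script has already defined
own_names = {"sin", "array", "helper"}

# Public names that `from math import *` would bring in
# (math defines no __all__, so every non-underscore name is imported)
star_names = {name for name in dir(math) if not name.startswith("_")}

# Names of ours that the star import would silently replace
clashes = sorted(own_names & star_names)
print(clashes)  # ['sin']
```

With explicit imports (import math, or from math import sin) such clashes are visible directly in the import line.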
More info: Idioms and Anti-Idioms: https://python.readthedocs.org/en/v2.7.2/howto/doanddont.html
Sometimes you have to use the syntax
from .. import ..
import matplotlib
matplotlib.pylab.plot([1,2,3])
from matplotlib import pylab
%matplotlib inline
pylab.plot([1,2,3])
# question to the confused: does matplotlib.pylab.plot now exist?
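The reason for this is a bit complicated, but basically, when you do "import numpy" (or "import matplotlib") you execute a file, the package's __init__.py, where the developer has specified what functionality is imported "by default". Submodules that are not imported there have to be imported explicitly.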
And yes, you are correct,
import matplotlib.pylab
matplotlib.pylab.plot([1,2,3])
also works.
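The same behaviour can be reproduced with the standard library alone. The sketch below uses the xml package purely as a stand-in for matplotlib (an assumption made so the example runs without extra installs): importing the submodule explicitly also binds it as an attribute of the parent package, which answers the question to the confused above.

```python
import sys
import xml            # executes only xml/__init__.py

# Whether the submodule is already loaded depends on what ran before
print("xml.dom" in sys.modules)

import xml.dom        # explicitly loads the submodule ...
print(hasattr(xml, "dom"))   # ... and binds it on the package: True
```

So once any part of the program has imported a submodule, it is reachable through the package name everywhere in that Python session.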
#popular way
from matplotlib import pylab as plt
plt.plot([1,2,3])
import math as m
effectively is the same as
import math
m=math
(though in the latter, math.sin is also available)
Python is a scripting language. A scripting language (as compared to a compiled language like C/C++, Fortran) is generally
* flexible
* slow
We will learn this by following Emanuele's philosophy: first do it ourselves, then find a good finished implementation.
A=[
[1.0, 2.0],
[3.0, 4.0]
]
B=[
[5.0, 6.0, 7.0],
[8.0, 9.0, 10.0]
]
# Our own, silly implementation of matrix multiplication
def matrix_mult(m1,m2):
    '''
    m1 and m2 are expected to be list objects (lists of rows)
    len(m1[0])==len(m2)
    '''
    matrix=[]
    for i in range(len(m1)):
        new_row=[]
        for j in range(len(m2[0])):
            new_data=0.0
            for k in range(len(m2)):
                # row i of m1 times column j of m2
                new_data+=m1[i][k]*m2[k][j]
            new_row.append(new_data)
        matrix.append(new_row)
    return matrix
matrix_mult(A,A)
This is simple enough (need to keep indices right..)
However, not very flexible. We do remember Emanuele mentioned that essentially everything is a class. So perhaps we could rather create a class and define our multiplication there?
class mymatrix:
    def __init__(self,data):
        '''
        data should be a list of lists
        all lists in list data should have equal length,
        and only contain numbers
        '''
        self.i=len(data)
        self.j=len(data[0])
        self.data=data
The triple-quoted string we have written at the top of the function here is a docstring, the proper way to document Python code.
This documentation is available through the Python built-in help() function.
This is very useful; I use the built-in help() a lot.
help(mymatrix)
help(pylab.plot)
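As a minimal sketch of where help() gets its text from: the function scale below is hypothetical and exists only for illustration. Its docstring is stored in the __doc__ attribute, and help() prints it together with the function signature.

```python
def scale(values, factor):
    '''
    Return a new list with every element of values multiplied by factor.
    '''
    return [v * factor for v in values]

# The docstring is an ordinary attribute of the function object
print(scale.__doc__)

# help() shows the signature followed by the same text
help(scale)
```

A comment written with # would not show up here; only the string placed directly under the def (or class) line is picked up.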
# we now create an instance of our matrix class and try to multiply:
myA=mymatrix(A)
myB=mymatrix(B)
print myA*myB
As expected, we do not yet support multiplication. WSE tells us we first need to define
mymatrix.__mul__(self, other)
class mymatrix2(mymatrix):
    def __mul__(self, other):
        '''
        defines multiplication of two matrices
        '''
        if self.j!=other.i:
            raise ValueError("Cannot multiply matrices with these dimensions!")
        matrix=[]
        for i in range(self.i):
            column=[]
            for j in range(other.j):
                data=0.0
                for k in range(self.j):
                    data+=self.data[i][k]*other.data[k][j]
                column.append(data)
            matrix.append(column)
        return mymatrix2(matrix)
myA=mymatrix2(A)
myB=mymatrix2(B)
print myA*myB
print myB*myA
Great, the multiplication logic seems to work. But the printed output is not very understandable. Let's see if we can get pretty printing. This is done by defining
mymatrix.__str__(self)
class mymatrix3(mymatrix2):
    def __mul__(self, other):
        m2=mymatrix2.__mul__(self,other)
        return mymatrix3(m2.data)
    def __str__(self):
        '''
        Define a pretty print of mymatrix
        '''
        data_str=[str(d) for d in self.data]
        string='['
        string = string +'\n '.join(data_str)
        string = string +'] '
        return string
myA=mymatrix3(A)
print myA*myB
c=['a','b','c']
''.join(c)
So, to recap quickly, when we write a*b in python, that is actually equivalent to
a.__mul__(b)
Some more useful functions for classes:
* *, multiplication, __mul__(self, other)
* -, subtraction, __sub__(self, other)
* +, addition, __add__(self, other)
* /, division, __div__(self, other) (called __truediv__ in Python 3)
* str(), string representation, __str__(self)
* and many more, see e.g. http://www.rafekettler.com/magicmethods.html
With this, we can now write e.g.
a+b*c-d
Instead of the much less readable
a.__add__(b.__mul__(c)).__sub__(d)
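To check that the two spellings really give the same result, here is a small sketch. The class Vec is hypothetical and only serves as an illustration; its multiplication is elementwise to keep the example short.

```python
class Vec:
    '''Hypothetical 2D vector, used only to illustrate operator methods.'''
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)
    def __sub__(self, other):
        return Vec(self.x - other.x, self.y - other.y)
    def __mul__(self, other):
        # elementwise product, for simplicity
        return Vec(self.x * other.x, self.y * other.y)
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __str__(self):
        return 'Vec(%s, %s)' % (self.x, self.y)

a, b, c, d = Vec(1, 2), Vec(3, 4), Vec(5, 6), Vec(7, 8)

short = a + b*c - d
explicit = a.__add__(b.__mul__(c)).__sub__(d)

print(short)              # Vec(9, 18)
print(short == explicit)  # True
```

Note that the usual operator precedence still applies: b*c is evaluated before the addition and subtraction, exactly as in the explicit chain of method calls.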
The more complete matrix class can be found at the bottom of this lecture. Exercise for the eager: Try to write it yourself before you look at the solution..
I will not go through it... Because there is a better way!
As we said before, a weakness of scripting languages is that they are slow (compared to compiled languages, not compared to Excel!)
Python has a great way to get around that problem, by introducing external modules that are written in faster languages. These modules are typically written in C or Fortran.
A very popular such module in the scientific community is the numpy module.
Homepage: http://www.numpy.org/
At its core, it provides an array object implemented in C. It also provides tools for linear algebra, random number generation, and more.
Other libraries that are covered in future lectures are built on numpy (this includes pandas).
import numpy as np
# np has become a very common alias for numpy in scripts,
# so this can be used even in public code without any worry
npA=np.array(A)
print npA
print npA*npA
print 2*npA
We see that this array class already has our pretty print implemented. (try print A for comparison)
However, some of you may notice that the multiplication here is a simple elementwise product, differing from the matrix product we defined ourselves. This makes sense because numpy calls it an array.
numpy.matrix() is another class provided by numpy, in which a*b returns the matrix product. This class is used less often (the NumPy documentation no longer recommends it for new code), but it is perhaps more familiar to those who come from a Matlab environment. Read more at http://wiki.scipy.org/NumPy_for_Matlab_Users
npA.dot(npA)
mA=np.matrix(A)
print mA
print mA*mA
Matrix multiplication is also available for arrays, but we then have to ask for it explicitly by calling the dot() function. The function is very flexible; it can take any list-like object:
print npA.dot(npA)
print npA.dot(mA)
print npA.dot(A)
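To make the difference between the two products concrete, here is a short sketch using the same 2x2 matrix A as above. The values in the comments were worked out by hand for this small case.

```python
import numpy as np

A = [[1.0, 2.0], [3.0, 4.0]]
npA = np.array(A)

# Elementwise product: every entry is multiplied by itself
elementwise = npA * npA      # [[1, 4], [9, 16]]

# Matrix product: rows of the first times columns of the second
matrix_prod = npA.dot(npA)   # [[7, 10], [15, 22]]

print(elementwise)
print(matrix_prod)

# np.dot is the function form and also accepts plain lists
print(np.array_equal(matrix_prod, np.dot(A, A)))  # True
```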
Random number operations are found in the numpy.random module. help(np.random) is useful, or you can find documentation on the web: http://docs.scipy.org/doc/numpy/reference/routines.random.html
Notably:
* numpy.random.rand() : returns uniform random numbers between 0 and 1
* numpy.random.randn() : returns random numbers from a normal distribution with \(\mu\)=0, \(\sigma\)=1
* numpy.random.randint(low,high) : returns random integers from low (inclusive) up to high (exclusive)
All of these can return an array of many numbers: rand() and randn() take the array dimensions as positional arguments, while randint() takes the optional argument "size".
np.random.randn()
np.random.randn(100,100)
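The sketch below shows the three functions side by side. It uses a seeded np.random.RandomState, which is an addition not used elsewhere in this lecture, so that repeated runs give the same numbers; the module-level functions np.random.rand() etc. take the same arguments.

```python
import numpy as np

# Seeded generator (an addition for repeatability, not required)
rng = np.random.RandomState(0)

u = rng.rand(2, 3)              # uniform in [0, 1), shape (2, 3)
n = rng.randn(4)                # standard normal, shape (4,)
k = rng.randint(1, 7, size=10)  # integers 1..6, like ten dice rolls

print(u.shape)  # (2, 3)
print(n.shape)  # (4,)
print(k.shape)  # (10,)
```

Note that the upper limit of randint is exclusive, so randint(1, 7) never returns 7.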
# Now, I claimed that scripting languages are slow, so let us compare...
from timeit import timeit
npArr=np.random.randn(100,100) # dimensions must be integers, not floats like 1e2
myArr=mymatrix3(npArr.tolist()) #tolist() returns a normal python list object..
def test_mymat():
    myArr*myArr
def test_numpy():
    npArr.dot(npArr)
t_np=timeit(test_numpy, number=1)
print t_np
t_my=timeit(test_mymat, number=1)
print t_my, t_my/t_np
This is terrible: in this quick test our code is about 400 times slower than numpy (the exact factor will vary between machines).
Conclusion: Listen to Emanuele, WSE will always give you a better solution than implementing yourself!
As a final test, we can plot a function that shows how much slower our code is depending on array size
# Let us plot a timing evolution...
def test_array(size):
    '''
    size is the number of rows and columns of the
    square test matrix; returns the timing ratio
    '''
    from timeit import timeit
    npArr=np.random.randn(size,size)
    myArr=mymatrix3(npArr.tolist())
    def test_mymat():
        myArr*myArr
    def test_numpy():
        npArr.dot(npArr)
    t_np=timeit(test_numpy, number=10)
    t_my=timeit(test_mymat, number=10)
    # we return the timing ratio:
    return t_my/t_np
sizes=[1,5,10,30,50,70,100]
ratios=[]
for size in sizes:
    result=test_array(size)
    ratios.append(result)
from matplotlib import pylab as plt
fig=plt.plot(sizes,ratios)
Size of an array..?
myarray=np.random.randn(100,50,10)
myarray.size
len(myarray)
myarray.shape
# useful built-in
type(myarray)
sum(sum(myarray))
print myarray.min(), myarray.max()
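How size, len() and shape relate is easiest to see on an array with known content. The sketch below uses np.ones instead of random numbers (a substitution made only so that the sums are predictable).

```python
import numpy as np

arr = np.ones((100, 50, 10))

print(arr.shape)   # (100, 50, 10): length along each axis
print(arr.size)    # 50000: total number of elements, 100*50*10
print(len(arr))    # 100: length along the first axis only
print(arr.ndim)    # 3: number of axes

# The built-in sum() only removes the first axis each time it is applied,
# so two nested calls still leave an array of shape (10,)
partial = sum(sum(arr))
print(partial.shape)  # (10,)

# The array method sums over all elements at once
print(arr.sum())   # 50000.0
```

For a sum over all elements, the array method arr.sum() is therefore both simpler and faster than nesting the built-in sum().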
myarray=np.random.randn(100,50)
fig=pylab.plot(np.cos(myarray),np.sin(myarray),'r.')
# The kind-of-complete example of a matrix class
# This also does some basic testing of the input
class mymatrix:
    def __init__(self,data):
        '''
        data should be a list of lists
        all lists in list data should have equal length,
        and only contain numbers
        '''
        import numbers
        self.i=len(data)
        self.j=len(data[0])
        # We check that the content is OK:
        for l in data:
            if len(l)!=self.j:
                raise ValueError("Wrong array dimensions")
            for k in l:
                if not isinstance(k,numbers.Real):
                    raise TypeError("Wrong content type in data")
        self.data=data
    # Addition:
    def __add__(self,other):
        '''
        Defines addition of two matrices
        '''
        if self.i!=other.i or self.j!=other.j:
            raise ValueError("Cannot add matrices of different dimensions")
        matrix=[]
        for i in range(self.i):
            row=[]
            for j in range(self.j):
                row.append(self.data[i][j]+other.data[i][j])
            matrix.append(row)
        return mymatrix(matrix)
    def __mul__(self,other):
        '''
        defines multiplication of two matrices
        '''
        if self.j!=other.i:
            raise ValueError("Cannot multiply these matrices")
        matrix=[]
        for i in range(self.i):
            row=[]
            for j in range(other.j):
                data=0.0
                for k in range(self.j):
                    # row i of self times column j of other
                    data+=self.data[i][k]*other.data[k][j]
                row.append(data)
            matrix.append(row)
        return mymatrix(matrix)
    def __neg__(self):
        '''
        Defines -self
        '''
        matrix=[]
        for i in range(self.i):
            row=[]
            for j in range(self.j):
                row.append(-self.data[i][j])
            matrix.append(row)
        return mymatrix(matrix)
    def __sub__(self,other):
        '''
        Defines self-other
        '''
        myother=-other
        return self+myother
    def __pow__(self,power):
        if not isinstance(power,int):
            raise ValueError("Only integer power defined")
        if power<1:
            raise ValueError("Only positive powers defined")
        # Expert question: Why is this potentially dangerous?
        matrix=self
        for i in range(1,power):
            matrix*=self
        return matrix
    def __str__(self):
        '''
        Define a pretty print
        '''
        data_str=[str(d) for d in self.data]
        string='['
        string+='\n '.join(data_str)
        string+='] '
        return string
a=mymatrix([[1,2],[3,4]])
print a
print a+a
print a*a
print a-a+a*a
# New task for you, figure out how to have 2*a working...