Saturday, June 23, 2018

Quick guide to Python classes

This week, I will give some examples of object-oriented programming in Python. Object-oriented programming has many advantages, and I really like it for the reusability of the code and the feeling of modeling the world :-). The examples below are patterns that I usually use in my own code, so they can serve as a quick guide for yours. If you want to learn more, you should go to the documentation. 

Define a class

Let's start with the following example, where we create a People class. 
class People(object):
    
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def greet(self):
        print("Greetings, " + self.name)
A typical Python class has a constructor, the __init__() method, which initializes each new instance. It runs automatically, exactly once, every time you create an instance of the class. 'self' refers to the instance itself; you can find a very good explanation here. self.name and self.age are the attributes that every instance will have, as you can see in the following example. greet is a method that we define in the class. Let's see how we can actually use it. 
person1 = People(name = 'Iron Man', age = 35)
person1.greet()
print(person1.name)
Greetings, Iron Man
Iron Man
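To make the role of __init__ and self a bit more concrete, here is a minimal sketch. The class-level counter `created` and the second person are hypothetical additions for illustration only; they are not part of the People class above.

```python
class People(object):
    # Hypothetical class-level counter, added only for this illustration
    created = 0

    def __init__(self, name, age):
        # Runs once for every new instance that is created
        People.created += 1
        self.name = name
        self.age = age

    def greet(self):
        print("Greetings, " + self.name)


person1 = People(name='Iron Man', age=35)
person2 = People(name='Thor', age=1500)

# Each instance keeps its own copy of the attributes
person1.greet()
person2.greet()

# __init__ has run once per instance
print(People.created)
```

Because self is the instance being created, person1 and person2 end up with independent name and age values even though they share the same class.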

Inheritance

One of the most important features of object-oriented programming is that we can easily reuse the code above to create new classes. For example, suppose we want another class, Teacher, that has all the attributes and methods of the People class but also adds some new methods. 
class Teacher(People):
    
    def teach_students(self, x, y):
        print('x + y = %d'%(x+y))
teacher1 = Teacher(name = 'Susan', age = 24)
teacher1.greet()
teacher1.teach_students(x = 3, y = 5)
Greetings, Susan
x + y = 8
We can see from the above code that we don't need to re-define the attributes or the greet function; the Teacher class gets all of these from the People class. This is due to the line class Teacher(People), which says that Teacher should get everything from People. This is called inheritance: class Teacher inherits from class People, so People is the parent class and Teacher is a child class. We can then extend Teacher just by defining new methods. If we want to replace one of the old methods from People, all we need to do is re-define it. For example, in the following lines we replace the greet method from People with a new one that greets the teacher. 
class Teacher(People):
    
    def greet(self):
        print("Greetings, teacher: " + self.name)
    
    def teach_students(self, x, y):
        print('x + y = %d'%(x+y))
teacher1 = Teacher(name = 'Susan', age = 24)
teacher1.greet()
Greetings, teacher: Susan
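If you want to check the parent/child relationship yourself, the built-in isinstance and issubclass functions can help. The following is only a small sketch, assuming Python 3 and trimmed-down copies of the two classes above:

```python
class People(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        print("Greetings, " + self.name)


class Teacher(People):
    def greet(self):
        print("Greetings, teacher: " + self.name)


teacher1 = Teacher(name='Susan', age=24)

# A Teacher instance also counts as a People instance
print(isinstance(teacher1, People))   # True
print(issubclass(Teacher, People))    # True

# greet was re-defined in Teacher, so it is a different function ...
overridden = Teacher.greet is not People.greet
# ... while __init__ is still the one inherited from People (Python 3)
inherited = Teacher.__init__ is People.__init__
print(overridden, inherited)
```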

The super function

Often, we also want to expand the constructor with more attributes without re-typing all the code from before. For that we can use the built-in super() function, which lets us call the parent's constructor without referring to the parent class explicitly (the argument-free form super() used here requires Python 3). Below, we define a Student class that adds a studentId field. 
class Student(People):
    def __init__(self, name, age, studentId):
        super().__init__(name, age)
        self.studentId = studentId
student1 = Student(name = 'Kevin', age = 20, studentId = '12345')
print('Student %s has id as %s'%(student1.name, student1.studentId))
Student Kevin has id as 12345
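super() is not limited to the constructor. As a sketch, assuming Python 3, the same idea can extend an ordinary method instead of replacing it. Note that in this illustration greet returns the string rather than printing it, which differs from the People class above:

```python
class People(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        # Returns the greeting (an assumption made for this sketch)
        return "Greetings, " + self.name


class Student(People):
    def __init__(self, name, age, studentId):
        super().__init__(name, age)
        self.studentId = studentId

    def greet(self):
        # Reuse the parent's greeting and append to it
        return super().greet() + " (student id " + self.studentId + ")"


student1 = Student(name='Kevin', age=20, studentId='12345')
print(student1.greet())
# prints: Greetings, Kevin (student id 12345)
```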

Multiple inheritance

What if we want a Student_Teacher class that inherits from both the Teacher and Student classes? Easy, you can just do the following:
class Student_Teacher(Teacher, Student):
    pass
st1 = Student_Teacher(name = 'Kate', age = 23, studentId = '54321')
print('Teacher %s has studentId as %s'%(st1.name, st1.studentId))
st1.teach_students(3,6)
Teacher Kate has studentId as 54321
x + y = 9
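Why does this work even though Teacher does not define a constructor that accepts studentId? Python searches the classes in a fixed order, the method resolution order (MRO), which you can inspect through the __mro__ attribute. The sketch below assumes Python 3 and uses trimmed-down copies of the classes above, with teach_students returning the sum instead of printing it:

```python
class People(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


class Teacher(People):
    def teach_students(self, x, y):
        # Returns the sum (an assumption made for this sketch)
        return x + y


class Student(People):
    def __init__(self, name, age, studentId):
        super().__init__(name, age)
        self.studentId = studentId


class Student_Teacher(Teacher, Student):
    pass


# The order in which Python looks through the classes for a method
mro_names = [cls.__name__ for cls in Student_Teacher.__mro__]
print(mro_names)
# prints: ['Student_Teacher', 'Teacher', 'Student', 'People', 'object']

st1 = Student_Teacher(name='Kate', age=23, studentId='54321')
print(st1.name, st1.studentId, st1.teach_students(3, 6))
```

Since Teacher has no __init__ of its own, the lookup moves on to Student.__init__, whose super() call then reaches People.__init__.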

Thursday, June 21, 2018

Independent component analysis example

Explanation of ICA

This week, let's talk about Independent Component Analysis (ICA), a method that can separate a mixture of signals back into its sources. Let's first look at an example:
The most famous example is the cocktail party problem. Imagine a very simple case: you are at a cocktail party with 4 people, you and A, B, and C. You are talking with person A, while persons B and C are in another conversation, so two conversations are going on independently at the same time. Call one conversation signal 1 (s1) and the other signal 2 (s2). If we place two recorders at different places nearby, each will record the two conversations mixed together. Say r1 and r2 are the recordings; each is a different mixture of the two signals s1 and s2. Because sounds sum linearly, r1 = a1 * s1 + b1 * s2 and r2 = a2 * s1 + b2 * s2. Now, from the two recordings r1 and r2 alone, is there a way to find the sources s1 and s2? 
This is where ICA comes in: it is a method that helps us find the two signal sources from the mixed recordings. It is really useful, and it belongs to a larger area called blind signal separation. You can find all the code on Qingkai's Github. 
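To see why the word "blind" matters, consider what happens if the mixing coefficients were known: the two equations could simply be solved for s1 and s2. The sketch below uses only the standard library and assumed coefficient values (2, 3, 0.5, 2); in the real problem these are unknown, which is exactly the gap ICA fills.

```python
import math

# Assumed mixing coefficients, chosen only for illustration
a1, b1 = 2.0, 3.0
a2, b2 = 0.5, 2.0

# Two short source signals: a sine wave and a square wave
t = [0.1 * i for i in range(50)]
s1 = [math.sin(2 * x + 3) for x in t]
s2 = [math.copysign(1.0, math.sin(3 * x)) for x in t]

# Mix them: r1 = a1*s1 + b1*s2 and r2 = a2*s1 + b2*s2
r1 = [a1 * u + b1 * v for u, v in zip(s1, s2)]
r2 = [a2 * u + b2 * v for u, v in zip(s1, s2)]

# With known coefficients, invert the 2x2 system by hand
det = a1 * b2 - b1 * a2
s1_rec = [(b2 * p - b1 * q) / det for p, q in zip(r1, r2)]
s2_rec = [(a1 * q - a2 * p) / det for p, q in zip(r1, r2)]

# Largest difference between the true and the recovered sources
max_err = max(abs(u - v) for u, v in zip(s1 + s2, s1_rec + s2_rec))
print(max_err)
```

ICA has to do the same job without knowing a1, b1, a2, or b2, relying only on the assumption that the sources are statistically independent.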

Example below

Let's generate two signals and see if we can separate them. We use two different sources: one is a sine wave, and the other is a square wave. 
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-poster')
%matplotlib inline
t = np.arange(0, 10, 0.1)

s1 = np.sin(2*t + 3)

s2 = np.sign(np.sin(3 * t)) 
Let's plot the two sources:
plt.figure(figsize = (10, 8))
plt.plot(t, s1, label = 'Source 1')
plt.plot(t, s2, label = 'Source 2')
plt.legend(loc = 2)
plt.xlabel('Time')
plt.ylabel('Amplitude')
Text(0,0.5,'Amplitude')
[Figure: Source 1 (sine wave) and Source 2 (square wave), amplitude against time]
Now let's assume that we have two recorders that recorded the mixture of the two sources, but they emphasize the two sources differently (plus, we add in some white noise as well). 
r1 = 2*s1 + 3*s2 + 0.05* np.random.normal(size=len(t))
r2 = 0.5*s1 + 2*s2 + 0.05 * np.random.normal(size=len(t))
plt.figure(figsize = (10, 8))
plt.subplot(211)
plt.plot(t, r1, label = 'Recording 1')
plt.legend(loc = 2)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.subplot(212)
plt.plot(t, r2, label = 'Recording 2')
plt.legend(loc = 2)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.tight_layout()
[Figure: Recording 1 and Recording 2 in two stacked panels, amplitude against time]
Now let's try to use ICA to separate the two signals. 
from sklearn.decomposition import FastICA
S = np.c_[r1, r2]
S /= S.std(axis=0) 
# Compute ICA
ica = FastICA(n_components = 2, random_state=2)
signal_ica = ica.fit_transform(S)  # Reconstruct signals
A_ = ica.mixing_  # Get estimated mixing matrix
print(A_)
[[4.41784919 8.97120998]
 [1.98294353 9.80142515]]
plt.figure(figsize = (10, 8))
l = plt.plot(signal_ica)
plt.legend(iter(l), ('Reconstructed source 1', 'Reconstructed source 2'), loc = 2)
plt.xlabel('Time')
plt.ylabel('Amplitude')
Text(0,0.5,'Amplitude')
[Figure: Reconstructed source 1 and Reconstructed source 2, amplitude against time]
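One thing that can look surprising is that the printed mixing matrix does not match the coefficients 2, 3, 0.5, and 2 used to build r1 and r2. A likely reason is that ICA can only recover the sources up to scale, sign, and order (and here the recordings were also divided by their standard deviation before fitting). The pure-Python sketch below illustrates the scale part only; the helper mix, the sample values in s, and the factor k are hypothetical and chosen just for illustration.

```python
import math


def mix(A, s):
    # Apply a 2x2 mixing matrix A to a pair of source values s
    return [A[0][0] * s[0] + A[0][1] * s[1],
            A[1][0] * s[0] + A[1][1] * s[1]]


# Mixing coefficients from the recordings above, and one sample of each source
A = [[2.0, 3.0], [0.5, 2.0]]
s = [0.8, -1.0]

# Make source 1 k times larger and shrink its column in A by the same factor
k = 4.0
A_scaled = [[A[0][0] / k, A[0][1]], [A[1][0] / k, A[1][1]]]
s_scaled = [s[0] * k, s[1]]

r = mix(A, s)
r_scaled = mix(A_scaled, s_scaled)

# Both combinations produce the same recordings
same = all(math.isclose(p, q) for p, q in zip(r, r_scaled))
print(r, r_scaled, same)
```

Since both pairs explain the recordings equally well, the amplitude of a reconstructed source is arbitrary; it is the shape of the signal that carries the information.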

Wednesday, June 13, 2018

Profile your code in Jupyter notebook/lab

In the previous blog, we discussed using a profiler to find out where your code is slow, but that approach has to be run from the command line. Today, we will have a look at how to profile code in a Jupyter notebook. Note that if you haven’t installed ‘line_profiler’, install it first:
pip install line_profiler
Let’s first define some functions to calculate arbitrary things. There are three functions that call one another in a chain.
def square_the_value(x, y):
    
    a = add_1000_times(x, y)
    
    return a**2

def add_1000_times(x, y):
    z = 0
    for i in range(1000):
        z += x
        for j in range(1000):
            z += y
        
    return z

def calculate_my_value(x, y):
    
    a = x + y
    b = x - y
    
    print(square_the_value(a, b))
calculate_my_value(1, 2)
994009000000
Now we want to get an idea of which parts of the code run fast and which parts run slowly. We can use the line_profiler to do the job. First, we need to load the extension:
%load_ext line_profiler
Let’s profile the top-level function that we run. We use ‘%lprun’, which runs the line_profiler; the ‘-f’ flag tells it which function or method we want to profile, and calculate_my_value(1, 2) is the actual statement that we want to run:
%lprun -f calculate_my_value calculate_my_value(1, 2)
994009000000



Timer unit: 1e-06 s

Total time: 0.295409 s
File: <ipython-input-1-0c3fada21717>
Function: calculate_my_value at line 16

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                           def calculate_my_value(x, y):
    17                                               
    18         1          3.0      3.0      0.0      a = x + y
    19         1          1.0      1.0      0.0      b = x - y
    20                                               
    21         1     295405.0 295405.0    100.0      print(square_the_value(a, b))
Now we can see that the line_profiler gives us the time spent on each line and the percentage of the total time that line takes. The last line used nearly all the time. We can continue by profiling that last line, stepping into the square_the_value function:
%lprun -f square_the_value calculate_my_value(1, 2)
994009000000



Timer unit: 1e-06 s

Total time: 0.39605 s
File: <ipython-input-1-0c3fada21717>
Function: square_the_value at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def square_the_value(x, y):
     2                                               
     3         1     396048.0 396048.0    100.0      a = add_1000_times(x, y)
     4                                               
     5         1          2.0      2.0      0.0      return a**2
Similarly, we could profile the add_1000_times function to figure out which line really takes all the time:
%lprun -f add_1000_times calculate_my_value(1, 2)
994009000000



Timer unit: 1e-06 s

Total time: 0.829793 s
File: <ipython-input-1-0c3fada21717>
Function: add_1000_times at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def add_1000_times(x, y):
     8         1          1.0      1.0      0.0      z = 0
     9      1001        382.0      0.4      0.0      for i in range(1000):
    10      1000        423.0      0.4      0.1          z += x
    11   1001000     388197.0      0.4     46.8          for j in range(1000):
    12   1000000     440788.0      0.4     53.1              z += y
    13                                                   
    14         1          2.0      2.0      0.0      return z
The profiler is really useful. I use it all the time to find and remove inefficient parts of my code. I hope you will find it useful as well.
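As a follow-up, here is one possible way to act on the profile above. Almost all the time goes to the inner loop, and since x is added 1000 times and y is added 1000 * 1000 times, the loops can be replaced by a single expression. The function add_1000_times_fast is a hypothetical replacement, and the timing uses the standard-library timeit module rather than line_profiler; the actual numbers will vary by machine.

```python
import timeit


def add_1000_times(x, y):
    # Original version from the post
    z = 0
    for i in range(1000):
        z += x
        for j in range(1000):
            z += y
    return z


def add_1000_times_fast(x, y):
    # Hypothetical replacement: x is added 1000 times, y is added 1000 * 1000 times
    return 1000 * x + 1000 * 1000 * y


# Same arguments that calculate_my_value(1, 2) passes down: a = 3, b = -1
slow_result = add_1000_times(3, -1)
fast_result = add_1000_times_fast(3, -1)
print(slow_result == fast_result)

# Rough timing comparison over a few runs
slow = timeit.timeit(lambda: add_1000_times(3, -1), number=3)
fast = timeit.timeit(lambda: add_1000_times_fast(3, -1), number=3)
print("loop: %.4f s, closed form: %.6f s" % (slow, fast))

# The squared value matches the output printed earlier
print(fast_result ** 2)
```

For integer inputs the two versions agree exactly; with floating-point inputs the results could differ slightly because of rounding in the repeated additions.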