Saturday, June 24, 2017

Machine learning 12: Machine learning using scikit-learn (Workshop materials)

I am teaching a workshop on machine learning using scikit-learn at the 2017 CDIPS Data Science Workshop, UC Berkeley. It covers the general idea of machine learning and gives a brief tour of the different types of learning (regression, classification, clustering, dimensionality reduction) with examples written in scikit-learn. It is a 2-hour workshop, and you can find all the materials on Qingkai's Github.


Monday, June 19, 2017

My son is here for Father's Day

Last weekend, on June 17th, my son was born, just in time for Father's Day. He weighs 8 pounds 9 ounces (much heavier than my daughter was when she was born).

My son looks like a future thinker.


class newBaby:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight
    def sayHello(self):
        print('Hello world, I am %s'%self.name)
    def showWeight(self):
        print('I am %.1f pounds'%self.weight)
baby2 = newBaby('Fanqi', 9)
baby2.sayHello()
baby2.showWeight()
Hello world, I am Fanqi
I am 9.0 pounds

Saturday, June 10, 2017

Python: Using virtual environments

Often we need Python packages, or particular versions of them, that we don't want to install system-wide. For example, pandas has been updated to version 0.20.2, but you find that some of your old code depends on version 0.19.2. Or you find some cool Python packages online, but they require Python 3 instead of 2. In these situations, a virtual environment is really handy. This week, I will write down what I usually do in these situations.

Using Virtualenv

I usually use virtualenv to create an environment that is isolated from my main Python environment. As in the situation mentioned above, I have pandas version 0.20.2 installed in my main Python environment, but I want to use pandas version 0.19.2 in some of my old scripts.
# if you don't have virtualenv, you need to install it first
$ pip install virtualenv
# enter into your project folder
$ cd path_to_old_project/
# create the virtual environment
$ virtualenv venv
# activate the virtual environment
$ source venv/bin/activate

# after the activation, we should see a (venv) at the beginning of 
# the terminal prompt indicating that we are working inside the 
# virtual environment now. 

# install the old package I need, i.e. pandas 0.19.2
$ pip install pandas==0.19.2
When I work in a virtual environment, I usually add venv to my project's .gitignore file so that I don't accidentally commit the whole virtual environment.
After working in the virtual environment, leave it with:
$ deactivate
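
To confirm from inside Python whether an environment is currently active, one option is to compare sys.prefix with the base prefix. This is only a minimal sketch and not part of the workflow above; the helper name in_virtual_env is made up for illustration. It assumes that older virtualenv releases set sys.real_prefix, while the venv module and newer virtualenv releases set sys.base_prefix.

```python
import sys

def in_virtual_env():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # older virtualenv releases set sys.real_prefix
    if hasattr(sys, 'real_prefix'):
        return True
    # venv (and newer virtualenv) keep the original location in sys.base_prefix
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix

print(in_virtual_env())
```

Running this before and after `source venv/bin/activate` should give different answers.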

Using Python 3 to create a virtual environment

If you are using Python 3, things are easier, since you can create the virtual environment directly with the built-in venv module (python3 -m venv /path/to/new/virtual/environment), for example:
$ python3 -m venv venv
$ source venv/bin/activate
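
The same venv module can also be called from a Python 3 script, which may be handy when the environment is created as part of a setup script. The following is a rough sketch under that assumption; the temporary folder is a hypothetical location chosen only so that the example does not touch a real project.

```python
import os
import tempfile
import venv

# create the environment in a throwaway folder (hypothetical location)
env_dir = os.path.join(tempfile.mkdtemp(), 'venv')

# with_pip=False keeps the sketch fast; pass with_pip=True to also bootstrap pip
venv.create(env_dir, with_pip=False)

# environments made by venv contain a pyvenv.cfg file at the top level
print(os.path.exists(os.path.join(env_dir, 'pyvenv.cfg')))
```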

Managing Python 2 and 3 on Mac using conda

Sometimes we want to have both Python 2 and 3 on our machine. Since I use conda as my package manager, I also use it to manage different environments. By default I use Python 2.7, and I usually create and activate a Python 3 environment this way:
# create an environment that has Python 3 installed
$ conda create -n py3 python=3
# start the Python 3 environment
$ source activate py3
# You can use the following to list the available environments
$ conda info -e

Create a Python 3 environment using Virtualenv

The other way is to use Virtualenv to create a Python 3 environment. 
$ virtualenv -p python3 env
$ source ./env/bin/activate

Saturday, June 3, 2017

Guitar: Happy birthday to you

My daughter and I sang together for her birthday. It was fun, and now she really likes to sing with me whenever I play guitar. I hope that one day soon she will play and I will sing along ^)^

Sunday, May 28, 2017

Python tricks I really like to use in my daily work

This week, I'd like to write a post about the Python tricks that I really like to use in my everyday work. These tricks save me time or space. I hope they are useful to you as well. If you have good ones, let me know. You can find the notebook on Qingkai's Github.

Print path of the imported module

import threading 
import socket
print(threading)
print(socket)
<module 'threading' from '/Users/qingkaikong/miniconda2/lib/python2.7/threading.pyc'>
<module 'socket' from '/Users/qingkaikong/miniconda2/lib/python2.7/socket.pyc'>
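
If only the file location is needed, modules loaded from a file usually expose it through the __file__ attribute. A small sketch; modules compiled into the interpreter may not have this attribute, so it is read with getattr and a default here:

```python
import threading
import socket

for module in (threading, socket):
    # getattr with a default avoids an AttributeError for modules without __file__
    path = getattr(module, '__file__', None)
    print('%s -> %s' % (module.__name__, path))
```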

Inspect an object

a = [1, 2, 3, 4]
print(dir(a))
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
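
Most of that list is double-underscore methods. When I only care about the regular methods, filtering on the leading underscore is one way to shorten it. A minimal sketch; the variable name public_names is just for illustration:

```python
a = [1, 2, 3, 4]

# keep only the names that do not start with an underscore
public_names = [name for name in dir(a) if not name.startswith('_')]
print(public_names)
```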

Reversing an iterable using negative step

This is really handy if you want to reverse a string or a list. 
a = 'Hello world'
b = [1,2,3,4,5]
print(a[::-1])
print(b[::-1])
dlrow olleH
[5, 4, 3, 2, 1]

Using zip

I like to use zip in loops, especially when I plot several things with different colors. Very handy. 
a = ['H', 'O', 'H']
b = ['i', 'k', 'a']
for x, y in zip(a, b):
    print(x + y)

Swap two numbers

Swap in a one-liner. 
a = 4
b = 2 
b, a = a, b
print(a, b)
(2, 4)

List/dictionary comprehension

One of my favorites, and it can save a lot of space. 
print([x * x for x in range(0, 10)])
print({i: i**2 for i in range(5)})
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}


Use enumerate when you need the index as well in the loop. 
a = ['Hello', 'world', '!']
for i, x in enumerate(a):
    print('{}, {}'.format(i, x))
0, Hello
1, world
2, !

Conditional assignment

A quick conditional assignment that saves a lot of space. 
y = 3
x = 3 if (y == 1) else 2

Transpose an array

A quick way to transpose an array. 
a = [(1,2), (3,4), (5,6)]
print(list(zip(*a)))
[(1, 3, 5), (2, 4, 6)]

lambda function

I like lambda functions, especially for defining simple functions. 
f = lambda x = 1, y = 1: x + y 
print(f(1, 2))

Map function

A quick way to apply the same operation on all the items in a container. 
f = lambda x: x**2

a = [1, 2, 3, 4, 5]
print(list(map(f, a)))
[1, 4, 9, 16, 25]


A quick way to sort a list of tuples.
# sort based on the 1st item
a = [(2, "b"), (1, "a"), (4, "d"), (3, "c")]
print(sorted(a))

# sort based on the 2nd item
b = [("b", 2), ("a", 1), ("d", 4), ("c", 3)]
print(sorted(b, key=lambda x: x[1]))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
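
The lambda used as the key can also be replaced with operator.itemgetter from the standard library, which some people find easier to read. A small sketch on the same data, with a descending sort added as an extra illustration:

```python
from operator import itemgetter

b = [("b", 2), ("a", 1), ("d", 4), ("c", 3)]

# itemgetter(1) builds a function that returns element 1 of its argument
by_second = sorted(b, key=itemgetter(1))
print(by_second)

# reverse=True sorts in descending order
by_second_desc = sorted(b, key=itemgetter(1), reverse=True)
print(by_second_desc)
```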

Flattening a list with sum

a = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]
print(sum(a, []))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
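
One caveat: sum(a, []) builds a new list for every sublist it adds, so it gets slow for long inputs. For larger lists, itertools.chain.from_iterable from the standard library is a common alternative. A small sketch with the same data:

```python
from itertools import chain

a = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]

# chain.from_iterable walks through the sublists one after another
flat = list(chain.from_iterable(a))
print(flat)
```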

Inverting a dictionary

a = {"one":1, "two":2, "three":3, "four":4, "five":5}
print(a)
print(dict(zip(a.values(), a.keys())))
{'four': 4, 'three': 3, 'five': 5, 'two': 2, 'one': 1}
{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}
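
A dictionary comprehension does the same inversion. Either way, keep in mind that the values must be unique (and hashable); if two keys share a value, only one of them survives. A minimal sketch with made-up data:

```python
a = {"one": 1, "two": 2, "three": 3}

# swap key and value for every item
inverted = {value: key for key, value in a.items()}
print(inverted[2])

# with duplicate values, only one of the keys survives the inversion
c = {"x": 1, "y": 1}
inverted_c = {value: key for key, value in c.items()}
print(len(inverted_c))
```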

Partial function

from functools import partial
bound_func = partial(range, 0, 11)
print(list(bound_func()))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
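
partial can also fix keyword arguments, which is where I find it most readable. A small sketch; the name parse_binary is chosen only for illustration:

```python
from functools import partial

# fix the keyword argument base=2, so only the string is left to supply
parse_binary = partial(int, base=2)
print(parse_binary('1010'))
```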

Start a simple server

Sometimes I need a simple server for testing. One-liner!
# in the command line run this (Python 2; on Python 3 use: python -m http.server)
python -m SimpleHTTPServer

Chaining comparison operators

a = 5
if 0 < a < 10:
    print('a is between 0 and 10')

Function argument unpack

def print_number(x, y):
    print("x: %d"%x)
    print("y: %d"%y)

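point_foo = (3, 4)
point_bar = {'y': 3, 'x': 2}

# unpack a tuple into positional arguments and a dictionary into keyword arguments
print_number(*point_foo)
print_number(**point_bar)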

x: 3
y: 4
x: 2
y: 3

Get a unique random ID

import uuid
print(uuid.uuid4())

import this

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!