New sidebar widgets
June 24th, 2008
I added Twitter and del.icio.us sidebar widgets to this site, in case you’re interested.
18549 - Embedded Systems Design (Capstone)
For our ECE capstone design project, my team developed a wearable sensor network geared towards casual fitness enthusiasts. The system consists of two “sensor nodes”, which collect EKG, pulse oximetry, and pedometer data, and one “sensor server”, which combines and stores this data and tags it with location and time using a GPS module. After an exercise session, the data can be uploaded to a PC via Bluetooth and then to a website that displays the collected data in space and time.
Here is one exercise dataset that we collected.
For more information about the project, see this page.
10601 - Machine Learning final project
For Machine Learning, I did a project related to my embedded systems project. The idea was to classify body sensor data by user properties such as gender, age, current activity, and handedness, using the BodyMedia dataset from the ICML 2004 Physiological Data Modeling Contest. I analyzed how well common learning algorithms such as (Gaussian) Naive Bayes, logistic regression, and k-nearest neighbors classify the data (using varying degrees of smoothing). I then used the parameters generated by these algorithms to build an “online” classifier that could run on an embedded sensor apparatus and classify properties of the user in real time. In practice this meant pretty much just the LR weights, because the other algorithms need much more time and space at classification time. I didn’t get very far with the online part, but there were some interesting offline results.
For more information, check out my final report and source code (mentioned two posts ago).
Research
I spent more than 60 hours last week working on my research group’s submission to OSDI 2008, which was due on Thursday. I’m not sure how much I can reveal about it at this point, but it’s basically a system for detecting failures in Hadoop in real time using both application-level and system-level information (and it actually works).
So yeah, the past few weeks were pretty busy :)
Something like this. My focus next year will be on research, but I’m also going to be a TA for 15411 - Compiler Design, and I’m considering taking Acting for Non-Majors if it doesn’t require much time outside of class.
I’ve been working on my final project for my machine learning class (10-601), which involves throwing a few standard machine learning techniques at the BodyMedia dataset. I decided to write the code in ML, specifically Caml, which I’ve been using for my code so far in the class. I started using Caml instead of SML because its built-in libraries are easier to use.
The elegance of ML is obvious when you work on a project with several components and interfaces between them. In this case, I was impressed by the clarity and simplicity with which I was able to define a framework for writing machine learning code. To be more concrete, the common types that I’ve defined are:
type property = RealProperty of float | IntProperty of int
type class' = int
type example = property list
type labeled_example = example * class'
type classifier = example -> class'
(class is a reserved word in OCaml, so I use class'.) Admittedly this is not perfect: class' is an int, whereas I really want it to be a small set that depends on the dataset, but using an int turns out to make building classifiers a bit easier. To build a classifier, for example a logistic regression classifier, I use a function
val build_lr_classifier : float -> float -> labeled_example list -> classifier
where the first two arguments are parameters (gradient step and regularization factor in this case). The logistic regression classifier builder returns a classifier by partially applying a classify function to the computed logistic regression weights, giving a function which is simply an “example -> class'”.
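As a rough illustration of that pattern, here is a minimal sketch of what such a builder could look like. It is not the project's actual code: the helper names (to_float, sigmoid, score, classify), the fixed epoch count, the use of per-example gradient updates, and the restriction to two classes labeled 0 and 1 are all assumptions made for the example.

```ocaml
(* Types from the post. *)
type property = RealProperty of float | IntProperty of int
type class' = int
type example = property list
type labeled_example = example * class'
type classifier = example -> class'

(* Assumption: every property is treated as a real-valued feature. *)
let to_float = function
  | RealProperty x -> x
  | IntProperty n -> float_of_int n

let sigmoid z = 1.0 /. (1.0 +. exp (-. z))

(* Linear score; the head of the weight list is the bias term. *)
let score weights (ex : example) =
  match weights with
  | [] -> 0.0
  | bias :: ws ->
    List.fold_left2 (fun acc w p -> acc +. w *. to_float p) bias ws ex

(* Applying classify to weights alone yields an example -> class'. *)
let classify weights : classifier =
  fun ex -> if sigmoid (score weights ex) >= 0.5 then 1 else 0

let build_lr_classifier (step : float) (reg : float)
    (data : labeled_example list) : classifier =
  let dim = match data with [] -> 0 | (ex, _) :: _ -> List.length ex in
  let epochs = 200 in (* hypothetical fixed number of passes over the data *)
  (* One gradient step on a single labeled example (labels are 0 or 1).
     The bias is not regularized. *)
  let update weights (ex, label) =
    let err = float_of_int label -. sigmoid (score weights ex) in
    match weights with
    | [] -> []
    | bias :: ws ->
      (bias +. step *. err)
      :: List.map2
           (fun w p -> w +. step *. (err *. to_float p -. reg *. w))
           ws ex
  in
  let rec train n weights =
    if n = 0 then weights
    else train (n - 1) (List.fold_left update weights data)
  in
  let weights = train epochs (List.init (dim + 1) (fun _ -> 0.0)) in
  (* Partial application: the learned weights are baked into the result. *)
  classify weights
```

Calling build_lr_classifier 0.1 0.01 on a list of labeled examples then gives back a plain classifier, and the weights never have to be passed around separately.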
To validate a classifier builder, I use a function such as
val kfold_validation_accuracy: int -> labeled_example list -> (labeled_example list -> classifier) -> float
where “labeled_example list -> classifier” is a function that I create by partially applying the classifier builder to its parameters, e.g. “build_lr_classifier 0.1 0.1” or “build_naivebayes_classifier 2”.
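To make the validation signature concrete, here is a hedged sketch of a k-fold routine with that type. The fold assignment (example i goes to fold i mod k, with no shuffling), the accuracy helper, and the small build_knn_classifier stand-in (binary labels 0 and 1, squared Euclidean distance) are assumptions for illustration only, not the code from the project. It assumes 1 <= k <= number of examples.

```ocaml
type property = RealProperty of float | IntProperty of int
type class' = int
type example = property list
type labeled_example = example * class'
type classifier = example -> class'

(* Fraction of examples the classifier labels correctly. *)
let accuracy (c : classifier) (data : labeled_example list) : float =
  match data with
  | [] -> 0.0
  | _ ->
    let correct = List.filter (fun (ex, label) -> c ex = label) data in
    float_of_int (List.length correct) /. float_of_int (List.length data)

(* Assumption: example i is assigned to fold (i mod k), without shuffling. *)
let kfold_validation_accuracy (k : int) (data : labeled_example list)
    (build : labeled_example list -> classifier) : float =
  let indexed = List.mapi (fun i d -> (i mod k, d)) data in
  let fold_accuracy f =
    (* Hold out fold f for testing and train on everything else. *)
    let test, train = List.partition (fun (j, _) -> j = f) indexed in
    let classifier = build (List.map snd train) in
    accuracy classifier (List.map snd test)
  in
  let folds = List.init k (fun f -> f) in
  let total =
    List.fold_left (fun acc f -> acc +. fold_accuracy f) 0.0 folds in
  total /. float_of_int k

(* Hypothetical stand-in builder so the sketch runs on its own. *)
let to_float = function
  | RealProperty x -> x
  | IntProperty n -> float_of_int n

let distance (a : example) (b : example) =
  List.fold_left2
    (fun acc p q -> let d = to_float p -. to_float q in acc +. d *. d)
    0.0 a b

let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Majority vote among the n nearest training examples (classes 0 and 1). *)
let build_knn_classifier (n : int) (train : labeled_example list) : classifier =
  fun ex ->
    let by_distance =
      List.sort
        (fun (a, _) (b, _) -> compare (distance ex a) (distance ex b))
        train in
    let nearest = take n by_distance in
    let ones = List.length (List.filter (fun (_, l) -> l = 1) nearest) in
    if 2 * ones > List.length nearest then 1 else 0
```

With these definitions, kfold_validation_accuracy 5 data (build_knn_classifier 3) type-checks because build_knn_classifier 3 already has type labeled_example list -> classifier. Any other builder can be swapped in the same way once its parameters are supplied.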
In general, one of the great things about ML and other languages with similar type systems is that implementation falls straight out of the type specifications in the interfaces. In this case we see that the essence of machine learning is expressible in a few type definitions, and some cool features emerge from these definitions, such as the ability to use partial application of curried functions to provide classifiers and generic classifier builders. Writing the functions is just a matter of fulfilling the type specifications by implementing the non-trivial rules specific to the algorithm at hand.
Now that midterms are over, I’m headed to Cancun, Mexico, with some of my friends for a stereotypical spring break experience. It will probably be pretty awkward. I’ll be back in a week.
During my winter break I worked with Mike on the Multisnake project. Multisnake is a multiplayer online snake game. We wrote the client interface in ActionScript and compiled it using MTASC and swfmill. The server is written in Java. In order to scale to an arbitrary number of players, every server reports to a central database, and clients are routed to the optimal server by a frontend script. There are still several major bugs in the system, so we haven’t promoted it much. In particular, the game is pretty laggy because Flash only implements TCP sockets, which have a relatively large packet overhead. This is a problem when you have to send a message to every player at every game step.
This is part of Supple Labs, which is basically a pseudo-startup that pools skills and resources to implement fun, innovative, and potentially profitable projects.
One of the most interesting concepts that I’m learning about this semester is the equivalence of logical proofs and programs (the Curry-Howard correspondence). This was touched upon in some of my earlier CS courses, but I never really took the time to understand it. Now that I’ve gained some practical experience, I’m starting to enjoy these abstract topics more in general.
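A small taste of the idea, as a sketch in OCaml: read each type as a proposition (-> as implication, a pair as "and", a variant as "or"), and each function of that type as a proof of it. The names below are chosen just for this example, and the analogy is loose in OCaml, since exceptions and non-termination let you write a "proof" of anything.

```ocaml
(* A implies (B implies A): keep the first piece of evidence. *)
let const : 'a -> 'b -> 'a = fun a _ -> a

(* Modus ponens: from (A implies B) and A, conclude B. *)
let modus_ponens : ('a -> 'b) -> 'a -> 'b = fun f a -> f a

(* Implication is transitive: the proof is function composition. *)
let compose : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c = fun f g a -> g (f a)

(* "A and B" implies "B and A": conjunction is a pair. *)
let and_comm : 'a * 'b -> 'b * 'a = fun (a, b) -> (b, a)

(* "A or B" implies "B or A": disjunction is a two-case variant. *)
type ('a, 'b) either = Left of 'a | Right of 'b

let or_comm : ('a, 'b) either -> ('b, 'a) either = function
  | Left a -> Right a
  | Right b -> Left b
```

In each case the type checker accepting the definition plays the role of checking the proof.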
For the past 3-4 years I’ve felt deficient in math (proving, not so much applying) compared to programming. I think it’s because programming is obviously fun, whereas the appeal of math is more subtle, and I get better at things that I enjoy doing. It’s easy to say this just to sound tough, but I’m going to work at bridging the gap between my elite hacking skills (well, not quite) and my general mathematical reasoning abilities by putting a lot of effort into my current “theory” courses and my research now and trying some advanced courses related to these topics (programming languages, machine learning, logic, algorithms) as a master’s student next year.
Here is some consolation for those of us who have been less than successful in the realm of dating.