Saturday, December 24, 2016

LED shirt with pocket

Over the past few years my brother has given me some pretty sweet t-shirts. He is a professional screen printer, so he has some pretty awesome facilities at his disposal for making shirts and posters. This year I decided to give him a shirt in return. Instead of trying to print something, I figured I'd use my Arduino skillz to make a cool wearable. Originally I wanted to arrange some LED sequins in the shape of two boobs, but after seeing the LEDs I decided to arrange them randomly on the shirt. The LEDs are really bright, so I thought it would be best to let them speak for themselves.

This was the first time I've made something outside of NYUAD. I had to order everything online, which ended up being kind of pricey. It's sort of crazy how spoiled we get at school! For now, I've just got the LEDs on all the time. I'm using a Lilypad USB for this project. I chose it because it has a lot of pins, and it's relatively small. I ended up buying a big battery (2000 mAh) so the shirt can stay on for several consecutive days. The size of the battery meant that I had to come up with some way of keeping it from swinging around while the user (my brother) is wearing the shirt. I ended up just sewing on a pocket, with the help of my mom.

Below you can see the progression of how the shirt was made.

Friday, September 23, 2016

t-SNE in Python (part 2)

As promised, in this post I'll talk about my implementation of the t-SNE dimensionality reduction algorithm in Python. Sorry in advance for the lack of syntax highlighting -- I haven't figured that out yet. I've made a version that explicitly calculates the gradient with respect to each vector in the reduced dataset $Y$, and another version that employs Theano's grad function. The Numpy version was a bit tricky to implement. The Theano version, for once, was actually easier than the Numpy version, because all you do is just slam in T.grad. I'll start with the Numpy version, and then move on to the Theano version. Before we jump into any code, let's state the gradient of the cost function:
$$
\frac{\partial Cost}{\partial y_{i}} = 4\sum_{j} (p_{ij} - q_{ij})(y_i - y_j)(1 + ||y_i - y_j||^2)^{-1}
$$
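As a rough sketch of what the two versions can look like (the names tsne_grad_numpy, inv_dist, and so on are placeholders, and the Theano graph below is only one way of wiring up the KL cost before handing it to T.grad, not necessarily how I structured mine):

```python
import numpy as np
import theano
import theano.tensor as T

def tsne_grad_numpy(Y, P, Q, inv_dist):
    """Explicit gradient of the t-SNE cost with respect to each row of Y.

    Y        : (n, d) low-dimensional embedding
    P, Q     : (n, n) joint probabilities in the high/low dimensional spaces
    inv_dist : (n, n) matrix of (1 + ||y_i - y_j||^2)^{-1}
    """
    PQ = P - Q
    grad = np.zeros_like(Y)
    for i in range(Y.shape[0]):
        # weights_j = (p_ij - q_ij)(1 + ||y_i - y_j||^2)^{-1}
        weights = PQ[i, :] * inv_dist[i, :]
        grad[i] = 4.0 * np.dot(weights, Y[i] - Y)
    return grad

# Theano version: build the cost symbolically and let T.grad do the calculus.
Ysym, Psym = T.matrix('Y'), T.matrix('P')
sq_dist = T.sum((Ysym.dimshuffle(0, 'x', 1) - Ysym.dimshuffle('x', 0, 1)) ** 2, axis=2)
num = 1.0 / (1.0 + sq_dist)
num = num * (1.0 - T.identity_like(num))       # q_ii is defined to be zero
Qsym = num / T.sum(num)
cost = T.sum(Psym * T.log(T.maximum(Psym, 1e-12) / T.maximum(Qsym, 1e-12)))
grad_fn = theano.function([Ysym, Psym], T.grad(cost, Ysym))
```

The "two lines" are really just the cost expression and the T.grad call at the end; everything above them is setup you'd need in either version anyway.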
Whaaaaaaa??? Take 11 lines of code and turn it into 2? Theano, you've stolen my heart! The crazy part is that it even runs a little faster than the Numpy version. As always, there is some overhead (around 3 seconds on my machine) to compile the function that computes the gradient. When 'training' the t-SNE algorithm, however, the Theano version is about 1.5x faster, so you quickly make back this time.

Tuesday, September 20, 2016

t-SNE in Python (part 1)

In reading some papers about Tomas Mikolov's word2vec algorithm for creating word embeddings, I came across a cool method for visualizing high dimensional data. I have some experience with methods that are used for dimensionality reduction like PCA and autoencoders (check out an old but presumably working version here), but I'd never encountered something that was purpose built for visualizing high dimensional data in two or three dimensions. t-SNE (t-distributed stochastic neighbor embedding) attempts to minimize a 'difference' (I'll clarify this in a hot sec) function between what amounts to distances in high and low dimensional space. Here, we are calculating 'distances' (really probabilities) between vectors in the dataset. t-SNE has a homepage here, where you can find a bunch of resources. The original paper (also on the homepage) is pretty straightforward in its explanation of the algorithm. I'll quickly describe what's going on with the algorithm before discussing one aspect of it that I've been working on.

Say you have a set of vectors ${x_0, x_1, x_2, ... ,x_N} \in X$, each with dimension $k$. Since we're dealing with dimensionality reduction, assume that $k$ is something large. For example, we could be dealing with the MNIST dataset, where $k$ would be 784. The goal is to generate some good representation of $X$ in a lower dimensional space, say of dimension $l$. Ideally $l$ would be something like 2 or 3, so we could see our data on a 2- or 3-D scatter plot. Let's call this low dimensional representation $Y$. The goal here is to preserve the relative position of each vector in $X$ in our new representation $Y$. In other words, for each vector $x_i$ we want to be able to create an embedding $y_i$ that preserves some distance metric to every other vector in both $X$ and $Y$. t-SNE uses two different distance metrics -- one for the vectors in $X$ and another for the vectors in $Y$. For a detailed look at why they do this, check out their paper. For each $x_i$ we can calculate an associated probability $p_{j|i}$ that looks as follows:
$$
p_{j|i} = \frac{exp(-{||x_i - x_j||}^{2} / 2\sigma_i^2)}{\sum_{k\ne i}exp(-{||x_i - x_k||}^{2} / 2\sigma_i^2)}
$$

Here $x_j$ is another vector in $X$. As such we are calculating the probability of vector $j$ given vector $i$. I don't think that this distribution is entirely intuitive, but you can think of it as the probability that $x_i$ would 'choose' vector $x_j$. As van der Maaten points out in the paper, vectors that are close to each other (as in their Euclidean distance is near zero) will have a relatively high probability, while those with larger distances will have a much lower probability. In practice, one doesn't actually use this probability distribution. Instead, we use a joint probability distribution over indices $i$ and $j$. I was puzzled by the normalization factor, and it only became clear to me once I took a peek at some code.
$$
p_{ij} = \frac{exp(-{||x_i - x_j||}^{2} / 2\sigma^2)}{\sum_{k\ne l}exp(-{||x_k - x_l||}^{2} / 2\sigma^2)} \\
p_{ij} = \frac{p_{i|j} + p_{j|i}}{2n} $$

The second line above has to do with a symmetry consideration. Again, if you're interested, check out van der Maaten's paper. The way I think of this is as follows: we calculate the $p_{j|i}$ as we normally would. What we get from this is an $n \times n$ matrix, where $n$ is the number of vectors in our dataset. Note that as a joint probability matrix this sucker is not yet normalized -- each row sums to one, but the whole matrix sums to $n$! We then add the matrix to its transpose and normalize the result, which is where the factor of $2n$ comes from. Van der Maaten and crew also define a joint probability distribution for the vectors in $Y$. Instead of using the same distribution as for the vectors in $X$, t-SNE employs a Student t-distribution with a single degree of freedom. This choice is well motivated, as it avoids crowding in the lower dimensional space: the Student t-distribution has fatter tails than a Gaussian. Note that we could use a Gaussian kernel here too (with no sigma factor), and we would simply be doing symmetric SNE. Anyways, the joint probability distribution looks as follows:
$$
q_{ij} = \frac{{(1 + {||y_i - y_j||}^{2})}^{-1}}{\sum_{k\ne l}{(1 + {||y_l - y_k||}^{2})}^{-1}}
$$

As before, that $l$ index confused me quite a bit before I saw how this algorithm got implemented in some code. Basically the idea is that we calculate a matrix full of unnormalized $q_{ij}$s, and then normalize over the whole matrix. The cost function that we minimize in order to generate the best embedding $y_i$ for each vector $x_i$ looks as follows.
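To make the indexing concrete, here's a rough Numpy sketch of how the two joint distributions might be assembled. The names joint_p and joint_q are placeholders, and the sigmas are assumed to already be known -- finding them is the perplexity business discussed below.

```python
import numpy as np

def joint_p(X, sigmas):
    """Symmetrized joint probabilities p_ij from the conditionals p_{j|i}.

    X      : (n, k) data matrix
    sigmas : (n,) array of per-point bandwidths sigma_i
    """
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    cond = np.exp(-sq_dist / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(cond, 0.0)                    # p_{i|i} is defined to be zero
    cond /= cond.sum(axis=1, keepdims=True)        # each row now sums to one
    return (cond + cond.T) / (2.0 * n)             # symmetrize, normalize over the matrix

def joint_q(Y):
    """Student-t joint probabilities q_ij, normalized over the whole matrix."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    num = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()
```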

$$
Cost = \sum_{i=0} \sum_{j=0} p_{ji}log(\frac{p_{ji}}{q_{ji}})
$$
We can minimize this function using conventional neural net techniques, like gradient descent. Van der Maaten and co spend some time talking about the gradient of the cost function, but I thought I would leave it up to Theano to do that... The problem I've been concerned with is calculating the free parameter $\sigma_i$ in $p_{j|i}$. In their paper, van der Maaten and co. say that they use binary search to find an optimal value of $\sigma_i$. I thought it would be cool if I could leverage the machinery of Scipy/Numpy (and even Theano!) to do this. First, however, a description of the problem. t-SNE has a hyperparameter, called the perplexity, that determines the value of $\sigma_i$ for each $x_i$. The perplexity is the same for every vector in $X$. One calculates the perplexity as follows:
$$
Perplexity(x_i) = 2^{H_i} \\
H_i = - \sum_{j} p_{j|i} log_{2}(p_{j|i}) $$
$H_i$ is called the entropy (which looks the same as entropy in Physics, minus the Boltzmann constant). So let's say we set the perplexity to be 10; the goal is to find a value of $\sigma_i$ for each $x_i$ such that $Perplexity(x_i)$ is 10. In principle this is just a matter of creating some Python function that has $\sigma_i$ as an argument, and using scipy's optimize module to find the minimum of the squared difference between the chosen perplexity and the calculated value. I wondered, however, if I could use Newton's method, in conjunction with Theano's symbolic gradient, to do this. It turns out that I couldn't, so I just stuck with Scipy's optimize module. In the next post, I'll talk about my implementation of the t-SNE algorithm, using conventional Numpy and Theano.
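Here's a minimal sketch of the Scipy route, assuming one scalar minimization per data point. The function names (entropy_for_sigma, find_sigma) are made up for illustration, and working in log-sigma is just a convenient way to keep $\sigma_i$ positive.

```python
import numpy as np
from scipy import optimize

def entropy_for_sigma(sq_dist_i, sigma):
    """Shannon entropy H_i (in bits) of the conditional p_{j|i} for one point.

    sq_dist_i : squared distances ||x_i - x_j||^2 to every other point (j != i)
    """
    p = np.exp(-sq_dist_i / (2.0 * sigma ** 2))
    p /= p.sum()
    p = np.maximum(p, 1e-12)                 # avoid log2(0)
    return -np.sum(p * np.log2(p))

def find_sigma(sq_dist_i, target_perplexity):
    """Pick sigma_i so that 2**H_i matches the chosen perplexity."""
    def objective(log_sigma):
        sigma = np.exp(log_sigma)            # keeps sigma strictly positive
        return (2.0 ** entropy_for_sigma(sq_dist_i, sigma) - target_perplexity) ** 2
    result = optimize.minimize_scalar(objective)
    return np.exp(result.x)
```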

Saturday, September 3, 2016

Solving differential equations in Theano

I've spent some time messing around with Theano, building and training neural nets. For my senior thesis, I used my own implementation of a Theano neural net to classify waveforms in the XENON100 dark matter detector. I've even spent some time working on LSTMs for text generation (the latter project is essentially my attempt to implement Karpathy's char-rnn in Theano). My understanding is that Theano was built with training neural networks in mind, hence all the emphasis on automatic differentiation and GPU support. In reality, Theano is a general purpose symbolic math library with lots of convenient neural net functionality, like built in squashing functions, downsampling functions, and even 2D convolution functions. Theano is fast because it compiles the computational graph into C code on the fly. We make Theano functions by creating computational graphs out of symbolic variables. This means that there is some overhead when running Theano code -- my LSTM might take as much as a minute to compile before it starts actually training.
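To give a flavor of how the symbolic graph comes together, here's a minimal sketch that integrates a toy decay equation $dy/dt = -ky$ with forward Euler steps using theano.scan; the equation, the step size, and the variable names are just placeholders for illustration.

```python
import theano
import theano.tensor as T

y0 = T.scalar('y0')              # initial condition
k = T.scalar('k')                # decay constant
dt = T.scalar('dt')              # time step
n_steps = T.iscalar('n_steps')   # number of Euler steps

def euler_step(y_prev, k, dt):
    # one forward Euler update: y_{t+1} = y_t + dt * f(y_t), with f(y) = -k*y
    return y_prev + dt * (-k * y_prev)

trajectory, _ = theano.scan(fn=euler_step,
                            outputs_info=y0,
                            non_sequences=[k, dt],
                            n_steps=n_steps)

integrate = theano.function([y0, k, dt, n_steps], trajectory)

# should approach exp(-k*t); with k=0.5 and t=10 that's about 0.0067
print(integrate(1.0, 0.5, 0.01, 1000)[-1])
```

The theano.function call is exactly where the compilation overhead mentioned above lives -- building the function is slow, calling it afterwards is cheap.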

Tuesday, August 9, 2016

NBody in LuaJIT

If you're tired of reading about me doing NBody simulations, then you might want to skip this post. The interesting thing about this one is not the NBody part, but the LuaJIT part. I've been toying with the idea of learning Lua for a while now. It doesn't have a lot of the convenient functionality of Python, but the Just in Time (JIT) compiler promises to be really darn fast. As an added bonus, Lua is used in Torch, a neural net framework that I've played around with in the past. This time however, I wanted to build something in pure Lua, with no help from an external library. I thought it would be instructive to see if I could code up an array object that sort of simulates the behavior of a Numpy array. With this object in hand, it would (presumably) be a piece of cake to code up an NBody simulation. Additionally, I thought it would be fun to do a little speed comparison between my Lua implementation and a Python (Numpy) implementation.
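For reference, the Numpy side of that comparison boils down to something like the sketch below: a direct-summation gravity step with a softening length. The constants, the softening value, and the semi-implicit Euler integrator are all placeholder choices, not necessarily what I used.

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Pairwise gravitational accelerations with G = 1.

    pos  : (n, 3) positions
    mass : (n,) masses
    eps  : softening length to avoid the singularity at zero separation
    """
    diff = pos[None, :, :] - pos[:, None, :]               # diff[i, j] = r_j - r_i
    dist3 = (np.sum(diff ** 2, axis=2) + eps ** 2) ** 1.5  # softened |r_j - r_i|^3
    return np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)

def step(pos, vel, mass, dt=0.01):
    """One semi-implicit Euler step: kick the velocities, then drift the positions."""
    vel = vel + dt * accelerations(pos, mass)
    return pos + dt * vel, vel
```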

Wednesday, July 27, 2016

SRAM Rival 1x Hydro first 40 miles review

After moving to NYC for the summer, I found myself in need of a new bike. The bike I rode across the country last summer was always a little too small for me, and after growing about half an inch in the last year, it really doesn't fit me anymore. I was going to bring a bike to NYC for commuting, but it got stolen out of my car while I was visiting San Francisco. Luckily the one that got stolen was kind of a piece of junk: a fixed gear strapped to a cheap Origin8 frame. The things I liked (and now miss) were the larger volume tires and the wide mountain bike handlebars. There's something about rolling around on a fixed gear with wide-ass bars that just puts a smile on my face.

Tuesday, July 26, 2016

Multithreading in PyQt

Monday, March 14, 2016

Simple MLP with Torch 7

I've been busy working on my capstone project lately. The goal of my project is to be able to distinguish between good and bad data generated by the XENON100 dark matter experiment. Researchers at the collaboration have been using some simple cut based methods to get rid of noisy data, but as the experiment has aged these methods have become really inefficient. In other words, they've been throwing away a lot of data that could include the signal from a dark matter interaction event. Long story short, I'm using some machine learning techniques to categorize data into "good" and "bad" categories. A method called boosted decision trees (BDT) has proven to be pretty effective at this task in the past, and I'm using an implementation bundled in a piece of software called TMVA. TMVA has been used for data analysis in a variety of high energy physics experiments. It's pretty effing cool because it includes many classification algorithms, including, but not limited to, SVM (support vector machine), MLP (multilayer perceptron), BDT, other decision tree methods, and a bunch of stuff I've never heard of.

Skip the next bit if you don't want to read my rant on ROOT. The awful part is that TMVA is all built on ROOT. I don't want to learn ROOT. I'm a Python programmer, with some experience with Java and JavaScript. I see all those asterisks, arrows, and character arrays and my eyes glaze over. I've tried to write some basic ROOT "macros" but they throw these cryptic errors that don't make a damn bit of sense. You don't realize how spoiled you are with Python until you use something like ROOT. The worst shit is the interface between Python and ROOT. There is a pretty decent library called rootpy, but the documentation and source code are all but impenetrable (500-line pieces of code just to bind the ROOT histogram class to Python???). When rootpy works, it is pretty Pythonic, but it doesn't allow for a ton of flexibility. Say you run some stuff through TMVA, and you want to grab and manipulate some histograms that are stored in ROOT files. Forget about it. You just can't.

I should note that ROOT has some very powerful tools for data analysis, and it's pretty darn fast, as it's built right on top of C++. If part of my capstone were learning ROOT, I think I'd be a little less critical, but right now I see it as this obnoxious obstacle to me doing my analysis.

Anyways, I thought it would be cool to see if I could dump some of the data I've been creating into another machine learning framework and see if I could categorize my signals. In the past, I've used Theano, and found it to be really difficult. Even the tutorial on the most basic neural net architecture (MLP) stymied me. Even though I implemented my own version of an MLP, I still felt very uncomfortable with Theano. Lately, I've moved to Torch. Torch is awesome. Below is the code for setting up and training my MLP.


Okay, okay, I'll admit the line for setting up the training seems pretty black box. Implementing something similar to the built in stochastic gradient descent is not that tricky though (I'll show some code that does it soon). Basically you chop up your training dataset into chunks (minibatches), feed each chunk through the net, calculate gradients, and update the weight matrices. Torch is built on Lua. Lua is pretty straightforward. Lua isn't as widely used as Python, so there isn't as much help online, but it's not terrible. Torch's documentation is on github, and like matplotlib's, is contained in one big page. This is a bit obnoxious, but not the end of the world. The biggest issue with Torch at the end of the day is loading and creating datasets. The code that loads in my data (stored in csv files) is about twice as long as the code to build the MLP. Once you have it though, the neural net is almost trivial (at least compared to Theano). As I delve deeper into this stuff, I'm sure I'll find that Torch is every bit as complicated as Theano, but I'm pretty happy for the moment.
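To make the minibatch idea concrete, here's a framework-free sketch in plain Numpy. The logistic model is just a toy stand-in for the MLP, and names like logistic_grad and minibatch_sgd are made up for illustration.

```python
import numpy as np

def logistic_grad(params, Xb, yb):
    """Gradient of the mean cross-entropy loss for a one-layer logistic model."""
    W, b = params
    preds = 1.0 / (1.0 + np.exp(-(Xb @ W + b)))
    err = preds - yb
    return [Xb.T @ err / len(yb), err.mean()]

def minibatch_sgd(X, y, params, grad_fn, lr=0.1, batch_size=32, epochs=10):
    """Shuffle, slice into minibatches, and update the weights after each batch."""
    n = X.shape[0]
    for _ in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(params, X[idx], y[idx])
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

# toy usage: separate two Gaussian blobs
X = np.vstack([np.random.randn(100, 2) + 2, np.random.randn(100, 2) - 2])
y = np.concatenate([np.ones(100), np.zeros(100)])
W, b = minibatch_sgd(X, y, [np.zeros(2), 0.0], logistic_grad)
```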

Tuesday, February 23, 2016

Annoytron 3000 update

Ben and I have been hard at work getting the annoytron up and running. We've solved a number of problems with our circuit such that the only challenge we now face is one of miniaturization. With the help of one of our Idea Lab colleagues, Ben found something called a thyristor that allows us to turn our circuit on, but not turn it off. This means that users will not be able to turn the device off using the main power switch. This solves perhaps the biggest problem associated with our project. Now we don't have to worry about hiding the power switch or introducing some sort of software solution -- we have a single component that lets us control the way users interact with the on switch. We've got a very basic working prototype contained in a cardboard tube. Check out some pictures below.

Annoytron with guts shoved inside. 

Annoytron with guts hanging out 

On/off switch. We'll be changing this out later. Using this switch feels great. 

In addition to the switch problem, we were having trouble figuring out how we would charge the battery. Looking around online, I found some nice looking breakout boards that allow one to charge the battery while plugged in, but they looked too large for our purposes. We ultimately settled on a simple circuit that will charge the battery as long as the battery isn't being used by the Arduino. We think this is a reasonable decision, as Ben and I assume that users won't be using the dildo while it's plugged in.

The power regulator we're using has a "shutdown" pin, which cuts power when pulled to ground. This allows us to automatically turn off the device after the user brings it to completion. I hooked up a transistor to the shutdown pin so the Arduino can pull it low itself. I think my colleagues in the Idea Lab will appreciate this, as the piezo is really obnoxious.

Moving forward, Ben and I need to print another, larger dildo (so we can fit all the electronics inside) that doesn't have any internal structure. I'm anxious to get this device put together so we can start annoying Allen.

Tuesday, January 26, 2016

Annoy-tron 3000

I'm currently working on a little project with my buddy Ben. The goal of the project is to make something that will annoy Allen. I believe the electronics we build for this project could definitely be used for other, potentially more profitable purposes. The idea of the project is to embed a temperature sensor, Arduino, and buzzer in a 3D printed dildo. By themselves, Arduinos can only generate square waves, which are super obnoxious when very high pitched. I envision a typical interaction going as follows. A user will be presented with a screeching dildo. They will be told that they can only turn it off by heating the dildo past a certain temperature threshold. After the user furiously heats it up, the dildo will eventually turn off. The interaction here is quite simple, but this project entails an interesting technical challenge. We have to embed an Arduino, on/off switch, buzzer, temperature sensor, battery, and charging implement in a prosthetic phallus. The on/off switch has to be sufficiently difficult to reach that the user is incentivized to rub the dildo instead of trying to turn it off. We also need to be able to charge the battery while it's still plugged into the Arduino.

In future posts I'll put up some pictures so Ben and I can track our progress. 

Jterm 2016 - UAE and Oman

Over J-term, I had the opportunity to leave coding, electronics and Physics aside for a few weeks. I took a class in which we travelled all over the UAE and part of Oman. I was able to make some new friends and spend some quality time with my boy Eder. The class itself was interesting as well. In principle, the purpose of the class was to study the interaction between three different landscapes in the region (Oasis, Coast and Mountain, also the name of the course). In my mind the overriding themes of the class were water management, increased dependence on modern economies, and the interaction between place, state, and identity.


Monday, January 25, 2016

Final Project for Interactive Media

While I've been concentrating a lot on plotting cool systems of differential equations, I want to shift my attention a little toward the realm of making physically interactive systems. My final project for interactive media is a good example of the kind of stuff that I want to investigate in the coming weeks. My inspiration for the project was the thought that I might be able to feel a virtual electric field. By virtual, I mean that I wouldn't actually be creating an electric field and using a physical sensor to detect it. Instead, I place a virtual point charge at some point in space, and by moving their hand around the neighboring space, a user can feel how the magnitude of the field changes.
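At its core this is just the inverse-square law evaluated at the tracked hand position and mapped onto some actuator. Here's a minimal sketch of that idea; the charge position, the scaling constant, and the assumption of a PWM-driven haptic output are all made up for illustration.

```python
import numpy as np

CHARGE_POS = np.array([0.0, 0.0, 0.0])   # hypothetical location of the virtual point charge
K = 50.0                                 # arbitrary scaling constant (not Coulomb's constant)

def field_magnitude(hand_pos):
    """Inverse-square magnitude of the virtual field at the hand's position."""
    r = np.linalg.norm(np.asarray(hand_pos, dtype=float) - CHARGE_POS)
    return K / max(r ** 2, 1e-6)         # clamp so it doesn't blow up right at the charge

def vibration_level(hand_pos, max_pwm=255):
    """Map field strength onto a 0-255 PWM value for some haptic actuator."""
    return int(min(field_magnitude(hand_pos), max_pwm))

print(vibration_level([0.5, 0.0, 0.0]))  # closer to the charge -> stronger feedback
```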