Filling the Gap

Over the last few years I’ve been able to learn a good breadth of skills ranging from product management to large-scale data processing and analysis.  But a gap in my toolset still remains: beautiful design.  I can’t for the life of me make something beautiful.  I know JavaScript and HTML pretty well, but my CSS sucks.  So, designer/front-end friends, can you please recommend some resources I can start exploring that’ll enable me to make beautiful websites and mobile apps?  This is my current plan:

Design: Defensive Design for the Web, Don’t Make Me Think

CSS: CSS Mastery

Leave me a comment or shoot me an email if you have thoughts.  Thanks!

How Math Can Help You Make Friends

At OneUpMe Eric and I have created several formulas that correlate one user to another based on the votes each player has made.  We’ve used this “friend score” to bring people closer together, both as they play the game and also as they communicate with other players.  Similarly, we’ve created formulas that rank players’ popularity and taste scores, ultimately showing how good or bad a player is at playing the game.  And with these formulas we’ve been able to create a significantly richer OneUpMe experience, connecting similar players together to forge new and deep friendships.  In this post I’ll talk specifically about the friend score, and how math has helped OneUpMe players make friends.

The Math Behind The Friend
Let’s talk about our “friend score” formula first.  On OneUpMe players compete by supplying creative responses to seed topics.  Every day sees a new topic, and players vote on other players’ responses, with the top-voted response winning for the day.  A player’s response is initially anonymous to all players until it is voted on, at which point the voter is able to see the post’s author.  So, if Eric posts a response, I won’t know it’s his until I vote on it.  The formula we chose to determine friend score is listed below.  For the sake of example, let’s say this formula is calculating the friend score from person A to person B.

N * T / P

In this formula N represents the number of times person A voted for person B, T represents the total number of posts made by person B, and P represents the total number of posts made by person B since person A joined the game.  Basically, we’re scaling the number of votes person A has made for person B by the ratio of person B’s total posts to the posts person A has had the opportunity to vote on.  So friend score will be maximized if person A has voted for each of person B’s posts, assuming person A joined the site earlier than person B (in that case P equals T, and the score is simply N).  The math is simple, but in this case simplicity is all we need to represent meaningful relationships.
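
As a minimal sketch of the formula above (not the production OneUpMe code), the calculation could look like this in Python.  The function name, the parameter names, and the choice to return 0 when person A has had nothing to vote on are all assumptions made for illustration.

```python
def friend_score(votes_a_for_b: int, total_posts_b: int, posts_b_since_a_joined: int) -> float:
    """Compute N * T / P for the friendship from person A to person B.

    votes_a_for_b          -- N: times A voted for B
    total_posts_b          -- T: all posts B has made
    posts_b_since_a_joined -- P: posts B has made since A joined
    """
    if posts_b_since_a_joined == 0:
        # Assumption: with nothing to vote on yet, treat the score as zero
        # rather than dividing by zero.
        return 0.0
    return votes_a_for_b * total_posts_b / posts_b_since_a_joined
```

For example, if A voted for B 3 times, B has 10 posts in total, and 5 of them were made after A joined, the score is 3 * 10 / 5 = 6.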

After we’ve calculated the friend score between each player we normalize the scores, effectively assigning percentiles to friendships.
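
The exact normalization isn’t spelled out here, so the following is only one plausible way to turn raw scores into percentiles: rank each friendship by the share of all scores that are less than or equal to it.  The function name and the dictionary layout (keyed by a pair of player names) are hypothetical.

```python
from bisect import bisect_right


def to_percentiles(scores: dict) -> dict:
    """Map each raw friend score to a percentile rank between 0 and 100.

    Assumption: the percentile of a score is the percentage of all scores
    that are less than or equal to it.
    """
    ordered = sorted(scores.values())
    total = len(ordered)
    return {
        pair: 100.0 * bisect_right(ordered, score) / total
        for pair, score in scores.items()
    }
```

With four friendships scored 6, 1, 3, and 3, this assigns percentiles of 100, 25, 75, and 75 respectively.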

Applying the friend score
With the normalized friend score (meaning we can say that person A and person B have a friendship strength in the Nth percentile) we’re able to do a lot of things in the game to really drive engagement between friends.  For example, the “hottest” list of posts utilizes the normalized friend score to increase the likelihood of me voting on a post made by someone I’ve previously voted for.  Take a look: in the example below I voted for Eric’s and Carla’s posts, both players I often vote for:

Similarly, we’ve built a news feed feature which takes relevant forum and wall discussions and displays them on my news feed.  The news feed scores discussions based on the friend score between me and the people in the discussion, only displaying posts from people I’ve voted for previously; in other words, people I’m interested in chatting with.

Ultimately we’ve changed the dynamic of the game to be more social, to be played more between friends, friends that develop and change as you play the game.  Our users have said they’re less overwhelmed with the thousands of posts the site sees per day, because the “hottest” post list and the news feed let them connect with the people they already know.  And the OneUpMe champion players, known on the site as “Elders,” have made powerful friendships, and are visiting one another all across the world.  Seriously, last week a pair met up in London!  Math makes friends!

Next steps
We base friendship on a player’s voting pattern, and we use friendships to encourage discussion and further playing among friends.  Currently a friend score is purely based on gameplay: people you vote for become your friends.  However, such a measurement isn’t quite complete.  We need to start using forum commenting, wall posting, and liking to measure friend strength as well.  So instead of just using gameplay, we also need to use discussion.

Map Your Facebook Friends

I created a little site that maps Facebook friends.  It’s called Shibby (shibby.us), and it uses the Facebook Python SDK, a Python geocoding library, the Google Maps API, and Django.  Take a look at my graph:

I have a few ideas to make this thing more engaging and more interesting, but for now I plan to leave Shibby as is.  Let me know if you’d like to see other visualizations/features.  Otherwise, enjoy :).

SQLite3: Beware of Concurrency

SQLite3 is a very lightweight implementation of a SQL database.  I’ve been using it in conjunction with Python on a single-threaded tool.  This morning I started refactoring my tool to have multi-process support, but I was interrupted by the following error:

sqlite3.OperationalError: database is locked

After reading through the SQLite3 documentation, I found that SQLite3 does database-level locking when performing write operations.  This means that all hope of parallelizing SQLite3 write operations is lost, because SQLite rejects a concurrent write attempt to the same database.
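
Here is a small sketch that reproduces the error with two connections to the same database file, using only Python’s standard sqlite3 module.  The table name, the file name, and the function name are made up for the example; it is not the tool I was refactoring.

```python
import os
import sqlite3
import tempfile


def demo_locked_write():
    """Return (error message from the blocked write, final row count)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None means we issue BEGIN/COMMIT ourselves.
    first = sqlite3.connect(path, isolation_level=None)
    first.execute("CREATE TABLE votes (player TEXT)")

    # timeout=0 makes the second connection fail immediately instead of waiting.
    second = sqlite3.connect(path, timeout=0, isolation_level=None)

    first.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
    first.execute("INSERT INTO votes VALUES ('eric')")

    try:
        second.execute("INSERT INTO votes VALUES ('carla')")
        message = "no error"
    except sqlite3.OperationalError as exc:
        message = str(exc)  # the write is refused while the lock is held

    first.execute("COMMIT")  # release the lock
    second.execute("INSERT INTO votes VALUES ('carla')")  # now succeeds

    rows = second.execute("SELECT COUNT(*) FROM votes").fetchone()[0]
    first.close()
    second.close()
    return message, rows
```

Passing a larger timeout to sqlite3.connect makes a blocked writer wait for the lock instead of failing right away, which helps when writes are short, but the writes still happen one at a time.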

And I thought MySQL’s table-level locking was bad (unless you’re using InnoDB) …

Update: for the record, PostgreSQL does row-level locking.

Python First Impression

I’ve been using Python now for just about two weeks; I’m falling in love.

Let’s see, where do I begin.  Python makes lots of things really, really easy: things like date formatting, date comparisons, db interaction, list manipulation, etc.  The list goes on.  Its built-in support for dictionaries and tuples makes it super easy to never, ever define a Java Bean-style class, yet they’re in many ways more powerful than C-style structs.

Python module (egg) support is unreal.  A module exists for just about any task you’d ever want to fulfil — modules for XHTML parsing, modules for URL fetching, etc.

In summary, Python has the speed and flexibility of Perl, with much more powerful built-in support.

Complaints: all member functions need to have the “self” parameter as the first parameter.  In order to have a Python file execute something when it is run directly, one must add the line “if __name__ == '__main__':”.  This is just weird.
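
To make both complaints concrete, here is a tiny made-up example (the Greeter class is purely illustrative).  Every method takes “self” explicitly, and the guard at the bottom is what makes the file do something when run as a script but stay quiet when imported.

```python
class Greeter:
    def __init__(self, name):
        # "self" is the instance; it must be listed explicitly.
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"


def main():
    print(Greeter("world").greet())


# Runs main() only when this file is executed directly, not when imported.
if __name__ == "__main__":
    main()
```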

Mmmmm.  Python.

Update: I forgot about my biggest complaint of all: how Python deals with default parameters.  Read more here, or take a look at the quote below:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call.
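
A short illustration of what that quote means in practice, using two made-up functions: the first shares one list across calls, and the second uses the common workaround of defaulting to None and creating the list inside the function.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call without an explicit bucket mutates the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Common workaround: use None as the default and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_shared(1) and then append_shared(2) returns [1, 2] the second time, while append_fresh(2) returns just [2].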

My Google, Shanghai: Explained

I’ve talked a little about being in China, but I haven’t said much about why. Up until only recently my duties at Google were unclear, but now I understand my purpose: Christophe dragged me over here to contribute to Hadoop, an open-source MapReduce implementation.

Hadoop is essentially a tool used by software engineers to write programs that use large numbers of computers to process vast amounts of data. Cloud computing is the new buzzword, but Google revolutionized large-scale computing, or distributed computing, many years ago. Historically, lots of data (like that of the internet) was analyzed by large, expensive computers. In fact, historically, lots of data just flat out wasn’t analyzed. Now, in the wake of MapReduce, Hadoop puts hundreds or even thousands of commodity computers to work to analyze data. Cloud computing is one of the reasons why Google is the best search engine, and industries all over are benefiting from the cloud. Cancer researchers are able to more efficiently understand their data. Astronomers can crunch their images much faster. Hadoop allows any company to effectively understand large amounts of data.
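
For anyone who hasn’t seen MapReduce before, here is a toy, single-machine sketch of the idea in Python, counting words.  This is not Hadoop’s API (Hadoop jobs are written against its own Java interfaces and run across a cluster); the function names are invented just to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run on many machines at once, and the framework handles the shuffle, the scheduling, and the failures.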

It’s not yet clear exactly how I’ll be contributing to Hadoop; those details should surface soon. I admit that Hadoop is my first open-source project, and I’m very, very excited to be contributing to a field that is growing so rapidly. More updates to come!

Bonus story: after slaving away for four days, I finally have Hadoop’s trunk build running on a multi-node cluster. Boom shakalaka!

I’m Burned Out on Web Programming; Give me Mayhem

I’ve been working feverishly on my developing world capstone project, which is a Ruby on Rails e-commerce+content management website.  I’m starting to realize that I’m slowly becoming burnt out on web programming.  I’m fairly certain that this is just a phase; this is what I’m thinking:

Making websites is great because your products can be used and seen by huge numbers of people with very little upfront time and cost commitments — this is the main reason why I fell in love with the web in the first place.  I remember when I made my first website and got my first user contribution from a stranger; I was so happy I jumped out of my chair and ran around the house for a while.  It’s an awesome feeling having regular people use your product; I’d even go so far as to say that I live for it, partly at least.

The downside to programming websites is that the majority of your tasks are repetitive and, at least in my opinion, annoying.  The tasks I’m referring to are getting CSS to work in IE6, copying and pasting DB code so that one model can function the same way as another, figuring out how to vertically align something in CSS, getting an XML traversal to work in all browsers, etc.  Rapid development frameworks and JavaScript libraries such as Ruby on Rails and Prototype, respectively, abstract a lot of the nitty-gritty that I just complained about, but they don’t let you totally avoid annoying web development details.  Moreover, most of the websites that I’ve made, with the exception of Timedex, did not have large performance constraints that create interesting engineering challenges.  Perhaps part of that is my own fault for not making popular websites ;).  Regardless, my involvement in website creation has been for the most part not as engineering-esque as I would like it to be.

What interests me the most about the web is scalability.  I would love to be Twitter’s lead software engineer right now, facing tons of downtime and pissed off customers, figuring out clever ways to deal with insanely computationally-intensive problems.  I get excited just thinking about it.  It seems to me that the only way a 22-year-old kid could be involved with scalability whatsoever is if the company were a very, very small startup.  Otherwise the chances are high that an older, more experienced developer will own scalability issues.

Lately I’ve been recalling all the hours I’ve spent in the CSE labs hacking Linux kernels, extending a poor implementation of the EXT2 filesystem, creating peer-to-peer networking applications, taking a single-threaded web server and making it multi-threaded, creating my own preemptive thread library in C, using lexical analysis to create a timeline of events, creating Netflix movie recommendations, computing PageRank for Wikipedia, etc.  I badly, badly want more of this.  I want meaty, huge, disgusting engineering problems that make people scowl and cry at the mere thought of them.  Now I’m not arguing that I’m capable, qualified, or what have you; I’m merely stating my interests: to be engulfed and overwhelmed with vomitous engineering problems dealing with scalability.

Partly what has motivated this post is my frustration with web programming.  The recent Mars mission, Phoenix, has also got me thinking some.  I’m hoping that my desires will be fulfilled while at Google this summer.  In fact I’m confident they will be.  A wise man once said, “Be careful what you wish for.”  Hopefully I won’t regret this post in the future :).

Twitter: A Case Study for Bad Software Development

I read (yet) another post about Twitter’s performance problems, but this one, unlike others, shed light on the technical difficulties that Twitter is facing. From the post it appears as though Twitter was created too rapidly with an underemphasis on performance. It’s very easy for software developers to be so interested in cranking out features that they disregard performance altogether. I think Twitter is one of these cases. Read the post for more details, but basically they were naive when developing their application and didn’t dig deep into performance bottlenecks and limitations.

Let this be a lesson that performance must be a factor when developing an application. Consider what will happen to your code if you experience an insane amount of usage, and understand the performance bottlenecks that you’ll have. Google is a good counterexample to Twitter. Larry and Sergey knew how difficult it would be to create a fast index of the internet, so they developed tools to deal with large data. I’ll bet the developers at Twitter are running around like maniacs, profiling, testing, and screaming profanities. Had they taken performance into account earlier they could have avoided their recent downtime altogether or at least been more prepared to fix the problem when it occurred.

Ruby on Rails: Building a Reverse Index for Search

Wow.  I’ve never been so impressed with a framework.  Take a look at this guide to create a reverse index for search in Ruby on Rails.  Here’s the basic idea:

  1. Install a gem
  2. Install a plugin
  3. Specify the fields for each model that should be indexed
  4. Call the find_by_contents method

Insane!  I used Lucene with Timedex, and I can’t even begin to explain how much more work that was.
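
If you’ve never built one of these by hand, here is a rough sketch in Python of the kind of index such a plugin maintains for you.  It is not how the gem or Lucene actually work internally (they add tokenizing rules, ranking, on-disk storage, and much more); the function names and the record layout are invented for the example.

```python
from collections import defaultdict


def build_index(records, fields):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in fields:  # only the fields chosen for indexing
            for token in str(record.get(field, "")).lower().split():
                index[token].add(record_id)
    return index


def search(index, query):
    """Return ids of records containing every word in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Indexing only the chosen fields mirrors step 3 above, and search plays the role that find_by_contents plays in the Rails version.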

So rad!

Ruby on Rails: Second Impression

My second impression of Ruby on Rails is again good. I’ve begun to dig into it quite a bit with my developing world project. I admit that I’m still not an expert, but I do have a few things to note:

Pros
As everyone will tell you, Rails does a lot for you. The amount of code you have to write is extremely small, and the code that you typically write is more interesting relative to most web code. What I mean is that Rails code is minimized in areas such as form processing or database access, so most of your time is spent coding business logic and views.

Rails is insanely well supported. There is a plugin or gem to do almost anything you want, and the support is wonderful. Googling for Rails information always turns up good results, and the IDEs available, for example Aptana, are super powerful.

Cons
You’ll try to do everything the “Rails way.” Rails allows you to do things by hand, for example manually include a style sheet or manually write an AJAX widget. However, Rails also offers tools and techniques to make things such as including style sheets and writing AJAX widgets easier and faster. This is somewhat of a catch-22, though, because I find myself spending too much time learning the “Rails way.” I suppose this is the same with any framework – in order to use it to its full potential, there is usually a large learning curve.

Another con that I have not noticed first hand is performance. I just read an article about Ruby on Rails performance problems, which describes how Twitter is moving away from Ruby on Rails for performance reasons. While reading through that article I noticed a comment about a particular PHP rapid application framework called CakePHP. I’m going to look into Cake, because it might be a pretty cool alternative to Rails.

More Rails updates coming soon!

Update: Open has a good post about Ruby on Rails scalability enhancements. Another good read.

Update2: ReadWriteWeb has an awesome article on website scalability, in particular about Twitter scalability.