Surf Roots, Software Thoughts A blog by Alex Loddengaard

14Oct/085

SQLite3: Beware of Concurrency

SQLite3 is a very lightweight implementation of a SQL database.  I've been using it in conjunction with Python on a single-threaded tool.  This morning I started refactoring my tool to have multi-process support, but I was interrupted by the following error:

sqlite3.OperationalError: database is locked

After reading through the SQLite3 documentation, I found that SQLite3 does database-level locking when performing write operations.  This means that all hope of parallelizing SQLite3 write operations is lost: only one connection can write to a given database at a time, and a second writer that can't get the lock before its timeout expires fails with the error above.
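Here's a minimal sketch that reproduces the error inside one process, using two connections to the same file.  The function name, table name, and temp-file path are made up for illustration, and I set timeout=0 on the second connection so it fails right away instead of waiting (my assumption for the demo, not what my tool did):

```python
import os
import sqlite3
import tempfile


def demonstrate_lock():
    """Show that a second writer fails while the first holds the write lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None means autocommit, so transactions are explicit here.
    writer_a = sqlite3.connect(path, isolation_level=None)
    # timeout=0: give up immediately instead of waiting for the lock.
    writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)

    writer_a.execute("CREATE TABLE log (msg TEXT)")
    writer_a.execute("BEGIN IMMEDIATE")  # writer_a takes the write lock
    writer_a.execute("INSERT INTO log VALUES ('from a')")

    error = None
    try:
        writer_b.execute("INSERT INTO log VALUES ('from b')")
    except sqlite3.OperationalError as exc:
        error = str(exc)

    writer_a.execute("COMMIT")  # lock released; writer_b can write now
    writer_b.execute("INSERT INTO log VALUES ('from b')")

    rows = writer_a.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    writer_a.close()
    writer_b.close()
    return error, rows
```

With a larger timeout the second writer waits for the first to commit instead of failing immediately, which helps with short writes but still serializes them.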

And I thought MySQL's table-level locking was bad (unless you're using InnoDB) ...

Update: for the record, PostgreSQL does row-level locking.

8Oct/088

Python First Impression

I've been using Python now for just about two weeks; I'm falling in love.

Let's see, where do I begin.  Python makes lots of things really, really easy -- things like date formatting, date comparisons, db interaction, list manipulation, etc.  The list goes on.  Its built-in support for dictionaries and tuples makes it super easy to never, ever define a Java Bean-style class, yet they're in many ways more powerful than C-style structs.
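To make that concrete, here's a small sketch of what I mean.  The summarize function and the POSTS data are invented for illustration; the point is that plain dictionaries and tuples carry the data, with no bean-style class in sight:

```python
from datetime import date

# Each record is just a dict; no class definition needed.
POSTS = [
    {"title": "Python First Impression", "published": date(2008, 10, 8)},
    {"title": "SQLite3: Beware of Concurrency", "published": date(2008, 10, 14)},
]


def summarize(posts):
    """Return (title, formatted date) tuples, newest post first."""
    # Dates compare directly, so sorting by them needs no helper code.
    newest_first = sorted(posts, key=lambda p: p["published"], reverse=True)
    return [
        (p["title"], p["published"].strftime("%Y-%m-%d"))
        for p in newest_first
    ]
```

Calling summarize(POSTS) gives back a list of tuples with the SQLite3 post first, since it has the later date.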

Python module (egg) support is unreal.  A module exists for just about any task you'd ever want to fulfil -- modules for XHTML parsing, modules for URL fetching, etc.

In summary, Python has the speed and flexibility of Perl, with much more powerful built-in support.

Complaints: all member functions need to have the "self" parameter as the first parameter.  In order to have a Python file execute something only when it's run as a script (and not when it's imported), one must add the line "if __name__ == '__main__':".  This is just weird.
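For anyone who hasn't seen them, both complaints fit in a few lines.  Greeter and main are placeholder names I made up for this sketch:

```python
class Greeter:
    def __init__(self, name):
        # "self" has to be spelled out as the first parameter of every method.
        self.name = name

    def greet(self):
        return "Hello, " + self.name


def main():
    print(Greeter("world").greet())


# Runs main() only when this file is executed directly, not when imported.
if __name__ == "__main__":
    main()
```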

Mmmmm.  Python.

Update: I forgot about my biggest complaint of all: how Python deals with default parameters.  Read more here, or take a look at the quote below:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same ``pre-computed'' value is used for each call.
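Here's a short sketch of how this bites you with a mutable default, plus the usual workaround.  The function names are mine, made up for illustration:

```python
def append_shared(item, bucket=[]):
    # The [] above is created once, when "def" runs, and then reused
    # by every call that doesn't pass its own bucket.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Common workaround: default to None and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling append_shared(1) and then append_shared(2) returns [1, 2] the second time, because both calls share one list.  The same two calls to append_fresh return [1] and then [2].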

Filed under: Programming, Python 8 Comments
28Aug/082

My Google, Shanghai: Explained

I've talked a little about being in China, but I haven't said much about why. Up until only recently my duties at Google were unclear, but now I understand my purpose: Christophe dragged me over here to contribute to Hadoop, an open-source MapReduce implementation.

Hadoop is essentially a tool used by software engineers to write programs that use large numbers of computers to process vast amounts of data. Cloud computing is the new buzzword, but Google revolutionized large-scale computing, or distributed computing, many years ago. Historically, lots of data (like that of the internet) was analyzed by large, expensive computers. In fact, historically, lots of data just flat out wasn't analyzed. Now, in the wake of MapReduce, Hadoop puts hundreds or even thousands of commodity computers to work to analyze data. Cloud computing is one of the reasons why Google is the best search engine, and industries all over are benefiting from the cloud. Cancer researchers are able to more efficiently understand their data. Astronomers can crunch their images much faster. Hadoop allows any company to effectively understand large amounts of data.
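If you've never seen MapReduce, the idea can be sketched in a few lines of plain Python. To be clear, this is not Hadoop's API: it runs on one machine, and the function names are my own. It only shows the shape of the computation (map each input to key/value pairs, group the pairs by key, then reduce each group), which is the part Hadoop spreads across many computers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster the map and reduce calls would run on many machines.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, word_count(["big data", "Big clusters"]) counts "big" twice and the other two words once each.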

It's not yet clear exactly how I'll be contributing to Hadoop; those details should surface soon. I admit that Hadoop is my first open-source project, and I'm very, very excited to be contributing to a field that is growing so rapidly. More updates to come!

Bonus story: after slaving away for four days, I finally have Hadoop's trunk build running on a multi-node cluster. Boom shakalaka!

27May/084

I’m Burned Out on Web Programming; Give me Mayhem

I've been working feverishly on my developing world capstone project, which is a Ruby on Rails e-commerce+content management website.  I'm starting to realize that I'm slowly becoming burnt out on web programming.  I'm fairly certain that this is just a phase; this is what I'm thinking:

Making websites is great because your products can be used and seen by huge numbers of people with very little upfront time and cost commitments -- this is the main reason why I fell in love with the web in the first place.  I remember when I made my first website and got my first user contribution from a stranger; I was so happy I jumped out of my chair and ran around the house for a while.  It's an awesome feeling having regular people use your product; I'd even go so far as to say that I live for it, partly at least.

The downside to programming websites is that the majority of your tasks are repetitive and, at least in my opinion, annoying.  The tasks I'm referring to are getting CSS to work in IE6, copying and pasting DB code so that one model can function the same way as another, figuring out how to vertically align something in CSS, getting an XML traversal to work in all browsers, etc.  Rapid development frameworks and JavaScript libraries such as Ruby on Rails and Prototype, respectively, abstract a lot of the nitty-gritty that I just complained about, but they don't let you totally avoid annoying web development details.  Moreover, most of the websites that I've made, with the exception of Timedex, did not have large performance constraints that create interesting engineering challenges.  Perhaps part of that is my own fault for not making popular websites ;) .  Regardless, my involvement in website creation has been for the most part not as engineering-esque as I would like it to be.

What interests me the most about the web is scalability.  I would love to be Twitter's lead software engineer right now, facing tons of downtime and pissed off customers, figuring out clever ways to deal with insanely computationally-intensive problems.  I get excited just thinking about it.  It seems to me that the only way a 22-year-old kid could be involved with scalability whatsoever is if the company were a very, very small startup.  Otherwise the chances are high that an older, more experienced developer will own scalability issues.

Lately I've been recalling all the hours I've spent in the CSE labs hacking Linux kernels, extending a poor implementation of the EXT2 filesystem, creating peer-to-peer networking applications, taking a single-threaded web server and making it multi-threaded, creating my own preemptive thread library in C, doing lexical analysis to create a timeline of events, creating Netflix movie recommendations, computing PageRank for Wikipedia, etc.  I badly, badly want more of this.  I want meaty, huge, disgusting engineering problems that make people scowl and cry at the mere thought of them.  Now I'm not arguing that I'm capable, qualified, or what have you; I'm merely stating my interests -- to be engulfed and overwhelmed with vomitous engineering problems dealing with scalability.

Partly what has motivated this post is my frustration with web programming.  The recent Mars mission, Phoenix, has also got me thinking some.  I'm hoping that my desires will be fulfilled while at Google this summer.  In fact I'm confident they will be.  A wise man once said, "Be careful what you wish for."  Hopefully I won't regret this post in the future :) .

Filed under: Programming 4 Comments
22May/083

Twitter: A Case Study for Bad Software Development

I read (yet) another post about Twitter's performance problems, but this one, unlike others, shed light on the technical difficulties that Twitter is facing. From the post it appears as though Twitter was created too rapidly, with too little emphasis on performance. It's very easy for software developers to be so interested in cranking out features that they disregard performance altogether. I think Twitter is one of these cases. Read the post for more details, but basically they were naive when developing their application and didn't dig deep into performance bottlenecks and limitations.

Let this be a lesson that performance must be a factor when developing an application. Consider what will happen to your code if you experience an insane amount of usage, and understand the performance bottlenecks that you'll have. Google is a good counterexample to Twitter. Larry and Sergey knew how difficult it would be to create a fast index of the internet, so they developed tools to deal with large data. I'll bet the developers at Twitter are running around like maniacs, profiling, testing, and screaming profanities. Had they taken performance into account earlier they could have avoided their recent downtime altogether or at least been more prepared to fix the problem when it occurred.

Filed under: Links, Programming 3 Comments
9May/082

Ruby on Rails: Building a Reverse Index for Search

Wow.  I've never been so impressed with a framework.  Take a look at this guide to create a reverse index for search in Ruby on Rails.  Here's the basic idea:

  1. Install a gem
  2. Install a plugin
  3. Specify the fields for each model that should be indexed
  4. Call the find_by_contents method
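
For the curious, here's roughly what a reverse index does under the hood, sketched in plain Python.  This is not the gem or the plugin from the guide, and it is far simpler than Lucene; build_index and search are names I made up.  The idea is to map each word to the set of records that contain it, then intersect those sets at query time:

```python
from collections import defaultdict


def build_index(records, fields):
    """Map each lowercase word to the ids of the records containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in fields:  # only the fields you asked to be indexed
            for token in record.get(field, "").lower().split():
                index[token].add(record_id)
    return index


def search(index, query):
    """Return ids of records that contain every word in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return matches
```

Looking up a word is a single dictionary access no matter how many records there are, which is why search engines build this structure up front instead of scanning every record per query.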

Insane!  I used Lucene with Timedex, and I can't even begin to explain how much more work that was.

So rad!

1May/081

Ruby on Rails: Second Impression

My second impression of Ruby on Rails is again good. I've begun to dig into it quite a bit with my developing world project. I admit that I'm still not an expert, but I do have a few things to note:

Pros
As everyone will tell you, Rails does a lot for you. The amount of code you have to write is extremely small, and the code that you typically write is more interesting relative to most web code. What I mean is that Rails code is minimized in areas such as form processing or database access, so most of your time is spent coding business logic and views.

Rails is insanely well supported. There is a plugin or gem to do almost anything you want, and the support is wonderful. Googling for Rails information always turns up good results, and the IDEs available, for example Aptana, are super powerful.

Cons
You'll try to do everything the "Rails way." Rails allows you to do things by hand, for example manually include a style sheet or manually write an AJAX widget. However, Rails also offers tools and techniques to make things such as including style sheets and writing AJAX widgets easier and faster. This is somewhat of a catch-22, though, because I find myself spending too much time learning the "Rails way." I suppose this is the same with any framework - in order to use it to its full potential, there is usually a large learning curve.

Another con that I have not noticed first hand is performance. I just read an article about Ruby on Rails performance problems, which describes how Twitter is moving away from Ruby on Rails for performance reasons. While reading through that article I noticed a comment about a particular PHP rapid application framework called CakePHP. I'm going to look into Cake, because it might be a pretty cool alternative to Rails.

More Rails updates coming soon!

Update: Open has a good post about Ruby on Rails scalability enhancements. Another good read.

Update2: ReadWriteWeb has an awesome article on website scalability, in particular about Twitter scalability.

24Apr/080

This is Customer Service: Aptana

Earlier today I wrote a post describing my difficulties with Aptana, which is an IDE for PHP, Ruby on Rails, and other things.  A few hours after I published my post I was contacted by means of a comment and an email by several people from the Aptana engineering team.  They wanted to know more about my troubles and offered personal assistance.  Are you kidding me?  This is perhaps the most exquisite customer service I've ever been witness to.  Not only is the Aptana team willing to help me for free, but they're also actively searching for bloggers having trouble.  I'm speechless.

Customer service like this makes me want to run down the streets of Seattle skipping and screaming, "Aptana! Aptana!"  Seriously.  Once I get a response I'm going to give Aptana another chance as an environment for running Rails apps - currently I just use it for text editing.

Hats off to you guys, Aptana engineering team.

24Apr/082

A Second Look at Software Engineering a Startup

I wrote a post a while back about software engineering a startup, and Clint's comment made me realize that my post was poorly thought out and just plain bad.  I argued that the quality of code you write should be related to the type of financial situation you're in.  For example, if you're a funded startup, then you should be writing good, maintainable code.  However, if you're not funded or if you're doing a personal project, then you should code quickly to get features out the door.  I want to revise my argument.

I've now worked on two pretty large personal projects: Cellarspot and Best Seattle Bars.  Cellarspot was developed in JSP, and the code we wrote was totally maintainable and robust.  On the other hand, Best Seattle Bars was developed in PHP, and the code I wrote was all inline.  The following analysis may be obvious, but I'll make it anyway:

Robust, Maintainable Code
Advantages: Maintainable, easily refactorable, and expandable
Disadvantages: Generally requires more work, thought, and usually a more advanced development environment

Inline Code
Advantages: Requires less work, less thought, and a much less sophisticated development environment
Disadvantages: Not maintainable, not easily refactorable, and not expandable

So when should you write fast, inline code, and when should you write good, robust code?  Inline code is great when you want to prototype something.  It allows you to quickly launch a set of features so you can evaluate your idea.  Robust, maintainable code is basically great for all other cases.  Now let's examine three cases:

You write inline code and your idea blows up
Your idea takes flight and before you know it you have tons of users, tons of peers programming with you, and your users want more features and better reliability.  Chances are good that you'll have to recode your entire web app, and that's going to be a HUGE upfront cost that perhaps is more costly than writing good code to begin with.

You write good, maintainable, robust code, and your idea flops
First, let me just say that I feel your pain. One might think that this case is a total waste of time, but I'll argue the contrary. I learned an insane amount by going through the process of writing good code, and it would be a good practice for me to do it again. The value one gains by going through this process is huge - it can help you get jobs (it did for me at least), and it keeps you sharp.

You use Ruby on Rails or some similar framework
Frameworks such as Ruby on Rails and Django let you write good, maintainable code very, very quickly.  They let you avoid the above two cases completely, because you won't waste your time on a flopped idea and you won't find yourself scrambling for dear life if your idea blows up.

My conclusion is the following: write good, maintainable code, and choose a framework that makes it easy for you to do so.  You'll avoid the upfront costs of writing good code, and you'll avoid the possibility of large refactoring down the road.

24Apr/085

Ruby on Rails: First Impression

First impression of Ruby on Rails: awesome.  It does so many things for you, and it makes things so easy.  "Things" is a very vague term, so I plan to write a much more in-depth post when I know more.

I tried to use Aptana as a fully enclosed IDE, but I couldn't get it to work correctly after trying to reinstall it at least 10 times.  I followed a guide on how to install Ruby on Rails on Mac 10.4; I got everything working from the command line.  Finally!

More Ruby on Rails updates to come.