This is the first item in what I hope will become a series covering one of the largest systems I've ever been involved in building. A full year has passed since it was put into production, so now I can give a level of detail which wasn't possible before. Yes, this is all written after the fact, which gives insight not available at the time... oh well.
Alright, the first thing you need with any new system is a set of requirements. Some people want to skip this step and dive directly into the design or even coding, but hopefully those people will be tied up and left in a closet by themselves somewhere. Just like you can't buy plane tickets without a destination, you can't start system design or implementation without a destination. So here we go...
Let's build an RSS reader (r1). No, not something to compete with Bloglines, Thunderbird, whatever. Instead, let's build an RSS reader which can retrieve news from all of the major primary news sources. We're going to skip the New York Times, CNN, or FoxNews and go directly to the Associated Press, LexisNexis, etc. Alright, so although this doesn't say it explicitly, it does imply that we're going to have to deal with huge volumes of information coming in all day and night (r2). Unfortunately, since requesting new items every minute might annoy the sources, let's throttle our requests (r3).
Now for some assumptions... we can assume that articles will have updates throughout their lifetime due to new and better information becoming available (a1). Well, there is an upside to using the AP instead of the secondary sources: we don't have to worry about getting the same story from different sources (a2). Let's also assume that every feed can be retrieved by a simple http request (a3) and that since many websites simply syndicate this content, it may have html in it (a4). And of course, we've skipped the biggest assumption of all... that we're actually getting RSS (a5).
So the summary of our requirements is:
(r1) An RSS reader importing primary news sources;
(r2) We have to support huge volumes of content being made available constantly;
(r3) We should throttle our requests so as not to annoy the powers that be.
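As a rough illustration of (r3), here is a minimal sketch of a throttle check, assuming each feed remembers when it was last requested and how long to wait between requests. The function name `is_due` and its parameters are hypothetical, not part of the system described here.

```python
from datetime import datetime, timedelta


def is_due(last_updated, interval_seconds, now):
    """Return True when enough time has passed to request the feed again.

    last_updated: datetime of the previous request, or None if never fetched.
    interval_seconds: minimum number of seconds to wait between requests.
    now: the current time, passed in so the check is easy to test.
    """
    if last_updated is None:
        # A feed that has never been fetched is always due.
        return True
    return now - last_updated >= timedelta(seconds=interval_seconds)
```

A scheduler would loop over the feeds and only request the ones for which this returns True, so a slow-moving source is never hit every minute.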
And the summary of our assumptions is:
(a1) Individual articles can be updated multiple times as corrections/new filings happen;
(a2) Any given item will only appear in one source;
(a3) Any feed should be retrievable via http;
(a4) The content items may have html in them;
(a5) The feeds will conform to RSS.
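To make (a4) and (a5) a little more concrete, here is a small sketch that pulls the title and body out of each item in an RSS 2.0 document using Python's standard library. The sample feed, the `parse_items` name, and the choice to map `description` to the body are all assumptions for illustration. Fetching over http (a3) is left out so the example runs without a network.

```python
import xml.etree.ElementTree as ET

# A made-up feed for illustration; the second level of escaping in the
# first description is how html typically travels inside RSS (a4).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Wire</title>
    <item>
      <title>First story</title>
      <description>&lt;p&gt;Body with &lt;b&gt;html&lt;/b&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second story</title>
      <description>Plain body</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return a list of {'title', 'body'} dicts, one per RSS item."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        # Assumption (a5) failed: this is not an RSS document.
        raise ValueError("not an RSS document")
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "body": item.findtext("description", default=""),
        })
    return items
```

Note that the body comes back with its html intact, so anything displaying it later has to decide whether to strip or sanitize it.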
Alright, so I think we're ready to sketch out an initial database design. We need two tables, one to hold the list of feeds and one to hold the content from those feeds. On the Newsfeeds table, we'll start with these fields:
id (primary key)
feed_name
feed_url
feed_last_updated
feed_update_interval
And for our Newsitems table, we'll start with these:
id (primary key)
feed_id (foreign key)
item_title
item_body
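The two tables above could be sketched like this. It is only a sketch: the use of SQLite, the lowercase table names, the column types, and the NOT NULL constraints are assumptions added for illustration, since the design so far only names the fields.

```python
import sqlite3

# Column names come from the field lists above; the types are assumptions.
SCHEMA = """
CREATE TABLE newsfeeds (
    id INTEGER PRIMARY KEY,
    feed_name TEXT NOT NULL,
    feed_url TEXT NOT NULL,
    feed_last_updated TEXT,                -- NULL until the first fetch
    feed_update_interval INTEGER NOT NULL  -- assumed to be in seconds
);
CREATE TABLE newsitems (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL REFERENCES newsfeeds(id),
    item_title TEXT NOT NULL,
    item_body TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create both tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Calling `create_schema(sqlite3.connect(":memory:"))` gives a throwaway database to experiment against before committing to a real one.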
Remember, now that we have a bit of design, we will regularly have to make the decision to update this or dump it and begin again. Depending on how much our two lists change, updating this design could get messy. And that's when it gets fun...
If programmers were to make an airplane
This video instantly comes to mind - http://youtube.com/watch?v=Nq55R7R-qfw
You are absolutely right, Keith. A solid design is essential before taking on a large system. Complications will come up.
Building Planes
Ugh, I've seen that one. And if there's another commercial that I could hate more, I have yet to see it. Flying by the seat of your pants (literally in that one) is one of the worst things you can do in our industry... and for some reason they almost celebrate it.
Can it really be avoided?
I agree with you, Keith. Flying by the seat of one's pants is less than optimal, but can it really be avoided in our industry? There's so much pioneering still going on that changing course mid-stream seems inevitable.
Granted, in relation to your blog post, getting customers to define what they want before they contract you to start building it is certainly a must. However, I think the "organic" (and frustrating) nature of web software (particularly open source web software) is unavoidable.
I think that's why keeping systems as modular as possible is the best way to go. Building the entire airplane while the whole thing is in the sky is ludicrous. However, fixing or even building an additional engine while the machine's running is a lot more feasible and probably even par for the course.
Cal Henderson often talks about how each little piece of Flickr can be taken offline and repaired/upgraded/maintained. That sort of system does lend itself to safe organic growth.
If there's another way around the chaos, I'd love to hear about it. :)
Thanks, Keith.
I hope so...
I think if developers want to be taken as serious professionals, we need to start getting away from this image. We need to polish things up a bit and start implementing standards and practices that not only ensure a bit of repeatability but also fit with the organization... it's a tough balance, but simply having some deployment standards can be a powerful step.
Yes, the battlefield surgeon has an amazing set of skills and willpower, but that sort of attitude could be taken outside the realm of reasonableness very easily.