Tuesday, October 11, 2011

On Kottu development, and oxygen for ideas

Kottu, after the 8.0 redesign

The thing with designing a software system you have no idea about is that you don't know how deep the water is, and you never will until you jump right in. When Indi contacted me back in July with the proposal to join him at Kottu, I was stunned. It was truly a dream come true. I loved Kottu; it was every young blogger's best friend. And it was where people could find good content, and sometimes even love.

Coming back to the jumping in the water business: there is no way you can get something as complex and dynamic as Kottu right the first time around. So I jumped right in and rewrote the whole code base from scratch. Every software best practice that I learned went flying out the window, and I cobbled together the least complicated thing that worked. No OOP, no comments, no nothing. Getting a working system out (no matter how ugly the code looked) was the first priority. And only after going public with that did everything else flow along: documentation (still partial), code refactoring (check out the Posts class, which is basically what the whole back-end of Kottu looks like now) and more features.

When Kottu 7.8 beta first launched, we had no thumbnails, no categories and a bunch of those buggy JavaScript social widgets that made loading the home page take ages. Only user feedback and time fixed those issues, and I don't think it is humanly possible for it to have happened any other way. Good systems don't just happen. Of course, there was UNIX, but let's not try to pretend we're Ritchie and Thompson, shall we? ;)

Right, then. The back-end of Kottu hit version 8.0 with some nice little refactoring (basically a revamped utils folder). There are some development goals that I hope to achieve before we hit version 8.0 officially. (Note: These are not long-term goals like Indi has described here, but more of short-term, little, code-based goals.)

  • Refactoring the front end

    There is a lot of code that gets duplicated in index.php and search.php and elsewhere in the front end. I hope to unify everything into a PageGen class to make generating pages easier and the code much cleaner.

  • Caching

    Kottu stupidly generates every page dynamically... every.single.time. I plan to put an end to this madness and store cached copies of several frequently accessed pages (or options within Kottu, say for example "Sinhala + Popular Today"). This would make me feel less guilty about running long and costly queries to give users a bit more of the beautiful data we have at Kottu. Don't tell anyone, but there will be graphs! ;) Shh!




  • Better documentation

    If you ask people why many open source projects fail, they would say that the two primary factors are developers losing interest, and lax user documentation. I've been a typical programmer, and been slacking off when I was supposed to be writing the f@#%ing manual (couldn't resist, sorry!) :D Yeah, there is a nice little markdown file on our GitHub, but we need more, including better on-site documentation (not about the code, obviously, but think of a better About Us page).

  • Not-so-active blogs, what is?

    Okay, so there are active bloggers, and there are not-so-active bloggers. Thing about Kottu is we're dealing with limited resources, servers that might melt sooner or later, and other real world problems. We currently have 907 blogs listed on Kottu (wait, I'm sure that number was higher... dafuq?). What does FeedGet.php do? It takes 50 blogs (least recently polled), goes to each of those RSS feeds and adds any new posts to our database. Now, FeedGet.php is cronjob'd to run every 5 (!!!) minutes, the minimum limit allowed by our hosting provider, but it still takes approximately 1 hour 30 minutes to get back around to the same feed (907 blogs at 50 per run is 19 runs, and 19 runs at 5 minutes each is 95 minutes). So, if we polled your feed just before you posted a blog post, then sorry.com, bro. You will have to wait for one and a half hours for your blog post to appear on Kottu. (We can't poll more than 50 feeds at a time due to the fact that PHP is a freaking crack addict that gobbles up insane amounts of memory without de-allocating any.)

    What do we do now? What do we do now that doesn't get me banned from the next blogger meetup? Maybe we should poll not-so-active blogs less frequently, giving priority to active blogs, which are more likely to have new posts. And as soon as a less-than-active blogger makes his comeback, his blog is made active again, and gets polled in the usual way. It's sort of evil, yes, but necessary if we're to improve the response time of Kottu. And it also encourages bloggers to be more active. Whaddaya think?
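
For the curious, here is a rough sketch of the polling idea from the last bullet. It is written in Python rather than Kottu's PHP, and it is not the actual FeedGet.php code: the Feed fields, the 30-day "active" window and the 24-hour gap for quiet blogs are all assumptions made up for illustration. Only the batch size of 50 and the 5-minute cron interval come from the numbers above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

BATCH_SIZE = 50        # feeds polled per run (figure from the post)
RUN_INTERVAL_MIN = 5   # cron interval in minutes (figure from the post)

# Assumptions for illustration only; the post does not fix these values.
ACTIVE_WINDOW = timedelta(days=30)      # posted within this window => "active"
INACTIVE_MIN_GAP = timedelta(hours=24)  # quiet blogs are polled at most this often


def full_cycle_minutes(feed_count, batch_size=BATCH_SIZE, interval=RUN_INTERVAL_MIN):
    """Minutes needed to poll every feed once, one batch per cron run."""
    return math.ceil(feed_count / batch_size) * interval


@dataclass
class Feed:
    # Hypothetical record; the real database columns may differ.
    url: str
    last_polled: datetime
    last_post: datetime


def is_active(feed, now):
    """A feed counts as active if it has a post inside ACTIVE_WINDOW."""
    return now - feed.last_post <= ACTIVE_WINDOW


def pick_batch(feeds, now, batch_size=BATCH_SIZE):
    """Pick the feeds for this run, least recently polled first.

    Active feeds are always eligible. Inactive feeds are eligible only
    once INACTIVE_MIN_GAP has passed since their last poll.
    """
    due = [
        f for f in feeds
        if is_active(f, now) or now - f.last_polled >= INACTIVE_MIN_GAP
    ]
    due.sort(key=lambda f: f.last_polled)
    return due[:batch_size]
```

With all 907 blogs in the rotation, full_cycle_minutes(907) comes out at 95 minutes, which matches the hour and a half above. If only, say, 300 of them counted as active (a made-up number), the active ones would come around again in roughly 30 minutes, give or take the few quiet blogs that come due in the same run. A quiet blog that starts posting again gets picked up on its next daily poll, its last_post moves forward, and from then on it is treated as active.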

So, finally, don't let the version numbering and jibber jabber fool you. Kottu is pretty much a work in progress, and Indi and I will continue to tinker and attempt to make it better and faster and cuter. And hopefully, I will realise my dream of a happier, friendlier, active little Blogosphere, like we had back when I first started in February 2009. Those were the days, maaan! :')

P.S. The title of the post is from an awesome awesome article by the guy behind WordPress. READ IT! It totally changed the way I view software, and contributed to numerous bugs on Kottu, and I highly recommend it for anybody, geek or Greek. :D

8 comments:

  1. Great work Janith!!! Looks like Kottu is going to improve quite a bit. Been following Indi's and your posts about the progress and am quite impressed.

    Very proud of you :)

  2. Technically speaking you can figure out how often a blogger blogs based on past data. You can use that as a hint. Also, for a not so active blog like mine, checking once every 24 hours is probably sufficient. But anything more than that might be a little bit too evil. Because let's say a new post gets polled 24 hours after its creation: there is a good chance that the post will be too old to have a place on the first page of Kottu. And being on the first page even for a short time matters. Probably the polling frequency should be based on the half life of a post on the first page. But then again, if it gets too short it will not serve the purpose.

    /Rakhitha

  3. Awesome work with kottu dude, it was long overdue for revival.

    Also, looked at the source a bit, got some comments. Dunno where to post so here goes :D

    You can use a service like cronless, which is kinda cheap for a pro account, which apparently lets you execute every 30 seconds even.

    For php's memory eating habits, saw the memory leak thing on simplepie?

    For thumbnails, check out timthumb. Might have to mod the allowed domains list though. Make it accessible to all or something.

    Also, feature suggestion: infinite scroll! :D

    Btw, now that you've wrangled with web stuff, WHY are you still on blogger? Do you not see what horrendous pages it generates? :P

  4. Yes!! I vote on the infinite scroll too :D

  5. Chathura - Thank you. And awww! :D Hopefully we don't disappoint, and by the end of the year we have something kickass to show off. :D

    Rakhitha - Yeah, it's a very sensitive issue. There could be other ways, like analyzing past data to figure out when a blogger is most likely to post. :D It requires work, but I've got time to kill. ;) Let's see how it goes.

    Jerry - I thanked you for the comment on G+, but I didn't think you saw. THANK YOU!!! :D I added TimThumb to Kottu almost as soon as I saw your comment. It doesn't affect security, does it? I tried to load some js and stuff onto the server using it and it didn't work, but have to figure out. :) But I think the most obvious holes must've been closed after the whole "TimThumb is an evil security hole" fiasco. :D

    As for Simple Pie, I do de-allocate memory manually but it still has issues. I think Simple Pie eats up lots of memory regardless of leaks when polling a huge number of feeds. Not to worry though, it's not that much of an issue, we're still young. :D

    And Blogger - I don't mind the non-standard HTML really, but the RSS feeds make me think monkeys wrote them. Is putting a \n at the end of a line so hard? :/

    Chathura and Jerry - Oh noez Javascript! ;) Let's see let's see. There's probably some jQuery script that takes care of that. :D

    Thanks for the comments and encouragement, guys. :D

  6. Just saw the rss. I think they're trying to make it look like a json object :D

    Yeah, I can't believe timthumb didn't have MIME type filtering till the recent pitchfork wielding website owners convention about it! Just allow the bare minimum permissions on its cache folder, and maybe write an htaccess in there to redirect all php etc requests if you're paranoid.

  7. Finally you have posted an amazing thing here which is very helpful to us.
