Monday, June 22, 2009

The second-to-last release of Braydix...

So I've definitely made some Braydix progress - here's the latest. I need to eventually come up with a name that's catchier than "Braydix" - that has the syllable for "dicks" in it, which...isn't that nice. It's still pretty huge - 140MB right now - but I have things compiled with debugging information turned on (which bloats the sizes), and some things that should be separate libraries are actually compiled in. I may also compress some of it, which very rough back-of-the-envelope math says should cut the size to about a third (!).

So what's in it, and why is it the second-to-last release? Well, I've tied the CREST-fs 'internetty' filesystem in pretty early in the boot sequence, so the system can, in essence, 'boot off the web'. The idea is that, once it's integrated perfectly, you would do some kind of initial installation and never have to do any upgrades or anything. Just reboot, and what you need is 'there'. But right now it's not quite integrated perfectly - I'm not sure how to handle kernel or initramfs updates, or how the client would pick them up. So once I've got *that* figured out, *that* release becomes my 'final' release.

I've also built a very, very simple installer, which you can launch from one of the virtual terminals (Ctrl-Alt-F1 is the main browser window; Ctrl-Alt-F2 through F4 give you terminals. Type 'installer'). I wouldn't run it on any piece of hardware you care about, though - it could format the wrong disk, install to the wrong partition, or do any number of other awful things.

The clever bit - well, there are many clever bits, but the 'novel' clever bit - is that I have a special 'bootstrap' image to prime the CREST-fs backing cache. So you can, in fact, boot this disk image with no network connection at all, and you'll at least get the browser launched and the GUI up and running. From there, you can configure wireless or something and carry on using the system.
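To make the 'priming' idea concrete, here's a minimal sketch, assuming the bootstrap image is just a directory tree laid out the same way as the on-disk cache (the real CREST-fs cache layout may well differ, and `prime_cache` is a hypothetical name). The one rule that matters: never overwrite something the cache already has, so a copy fetched from the web always wins over the older bootstrap copy.

```python
import os
import shutil


def prime_cache(bootstrap_dir: str, cache_dir: str) -> list[str]:
    """Seed cache_dir from a read-only bootstrap tree (hypothetical layout).

    Files already in the cache are never overwritten, so anything fetched
    from the network wins over the (older) bootstrap copy. Returns the
    relative paths that were copied, sorted.
    """
    copied = []
    for root, _dirs, files in os.walk(bootstrap_dir):
        rel_root = os.path.relpath(root, bootstrap_dir)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_root, name))
            dest = os.path.join(cache_dir, rel_path)
            if os.path.exists(dest):
                continue  # cache already has its own copy; leave it alone
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(root, name), dest)
            copied.append(rel_path)
    return sorted(copied)
```

Running it a second time copies nothing, so it's safe to call on every boot.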

The idea - and I don't think I've articulated this very well - is that you should never have to install anything on this. Whenever new things show up on the internet, they're just 'there' automatically. You never have to update anything - reboot and you should have the latest. If I end up doing a third-party application thing, just chuck your binaries and libs and stuff on a website somewhere, and it's available to the systems - the CREST-fs mounting can talk to any webserver just as easily as any other, mine isn't particularly special (except for the odd symlink that points to it, or a PATH entry). Some notes:

  • Libraries SUCK
  • Compiling things 'static' is really hard
  • Booting is really complicated
  • The Linux Initramfs is really brilliant. You can do all kinds of ridiculously crazy shit with this. Very cool stuff.
  • Busybox is still awesome
  • VirtualBox under OS X is (relatively speaking) an extremely pleasant development environment

Sunday, May 10, 2009

Nerd Alert!

The following post is intended for a very technical audience. Consultant supervision is advised.

I wrote a thingee that lets you mount "The Web" as a filesystem. So far it works under Mac OS X, but I intend to port it to Linux. It uses MacFUSE for the filesystem interface, which makes it much easier to write without kernel muckery. It's all written in C. It's available on GitHub. It's called CREST-fs - Cached REST, or Cached Representational State Transfer.

The thinking is that The Web, as a whole, is a set of resources, which I should be able to access like files in a filesystem. So instead of doing something like curling http://samplesamp.sa/apage/something and then trying to run the code that lives on that page, you could mount the web under some directory, cd to samplesamp.sa, and execute apage/something directly. The first access to 'something' would have to fetch it from across the internet, but with caching built in, subsequent accesses should be served from the local on-disk cache.
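The fetch-once, read-locally-forever behaviour boils down to a cache-first lookup. Here's a minimal sketch under a few assumptions: the cache key is a SHA-256 of the URL (the real CREST-fs may mirror paths instead), the network call is passed in as `fetch` so it can be swapped out, and there's deliberately no revalidation at all, which is the 'aggressive, not remotely coherent' trade-off. `cached_get` is a hypothetical name.

```python
import hashlib
import os
from typing import Callable


def cached_get(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the body for url, touching the network only on a cache miss."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # cache hit: no network traffic at all
    body = fetch(url)  # cache miss: go across the internet once
    os.makedirs(cache_dir, exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(body)
    os.replace(tmp, path)  # atomic rename so readers never see a partial file
    return body
```

For real use, `fetch` could be `lambda u: urllib.request.urlopen(u).read()`.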

I thought it would be clever to be able to mount it under a folder called /http:/ - so you could say:
ls -al /http:/www.google.com/someurl/something
See? I think that's clever. You could probably put two slashes there and it wouldn't mess anything up. And if you were sitting in the / directory you could skip the first slash. But that's starting to get too clever just for cleverness's sake.
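The double-slash trick should work because POSIX path resolution treats consecutive slashes as a single slash. Turning a path under the mount point back into a URL is then mostly string surgery. A minimal sketch, assuming the mount point is /http: and that the relative form is only ever typed from inside / (`path_to_url` is a hypothetical helper, not the CREST-fs code):

```python
def path_to_url(path: str, mount: str = "/http:") -> str:
    """Map a path under the mount point back to the URL it stands for."""
    if not path.startswith("/"):
        path = "/" + path  # relative form, assumed to be typed from inside /
    if not path.startswith(mount + "/"):
        raise ValueError(f"{path!r} is not under {mount!r}")
    # Strip the mount point plus any run of slashes, so both
    # '/http:/host/...' and '/http://host/...' map to the same URL.
    rest = path[len(mount):].lstrip("/")
    if not rest:
        raise ValueError("no host given")
    return "http://" + rest
```

All three spellings from the paragraph above (one slash, two slashes, no leading slash) come out as the same http:// URL.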

There's stuff like that Out There already, but I wanted to build something that was extremely aggressive about caching and very primitive (low-level, usable in early boot environments). I expect it not to be remotely coherent, but I do want it to be fast. So far, so good.

Sidenote: it's my first Git project. Git is nice. Hosting it on GitHub is interesting, too, but less interesting than the fact that it's on Git.

Writing code in C is painful. Allocating memory is not fun. Troubleshooting subtle memory leaks is not fun. Doing your own string manipulation by hand is not fun. But there is a certain feeling you get from being this close to the bare metal of the hardware...that's really pretty exciting.

This is the second time I've written this - the first version was lost in the Great Hard Drive crash of '08. Writing all this low-level crap is not that cool, but writing it for the second time is even less cool.

The intent behind all of this is to tie it in with Braydix somehow - to let you boot a "minimal" Braydix image from CD or USB key and have it pull the rest from Teh Intarwubs. I have a new client who does a lot of work within Amazon's EC2 environment, so I've had to study how that works, and I think this FS might be interesting there, too. As soon as I can find an excuse to put something up over there, I'll definitely mess around with that. Once I've done that, just think of it - there'll be both client and server versions of Braydix.

Friday, May 01, 2009

More Spam

Ugh.

So my clever RSET hack apparently triggers problems in feeble, horrible, nasty mail clients like Eudora - which one of my client's clients actually uses. So I had to back out my change. It was funny to hear someone read my 'garbage' message right back to me, though.

So in the process of poking around, I found that the qmail chkuser patch already has a feature that lets you set a maximum number of bad recipients, after which the sender is over the limit. So I enabled that. And it did not stem the flood at all, because once the limit was hit it simply rejected every subsequent attempt with a 400-series response - without disconnecting the sender.

So once again, I jumped into the code, and I made it actually disconnect the sender instead of just auto-failing every subsequent attempt.
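The difference between "reject with a 4xx" and "hang up" is easiest to see as per-connection state. This is a minimal Python sketch of the idea, not the actual qmail/chkuser C patch: the class name, the limit, and the 421 goodbye text are all hypothetical, though 421 is the standard SMTP code for "closing the transmission channel".

```python
from dataclasses import dataclass


@dataclass
class SmtpSession:
    """Per-connection state for a hypothetical RCPT TO handler."""
    max_bad_rcpts: int
    bad_rcpts: int = 0
    open: bool = True

    def rcpt_to(self, address: str, valid_users: set[str]) -> str:
        if address in valid_users:
            return "250 ok"
        self.bad_rcpts += 1
        if self.bad_rcpts >= self.max_bad_rcpts:
            # Over the limit: drop the connection instead of answering
            # every further guess with another 4xx.
            self.open = False
            return "421 too many invalid recipients, closing connection"
        return f"451 No such user '{address}'"
```

Once `open` goes false, the server would close the socket, so the next batch of guesses has to come in on a fresh connection.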

This seems like it's working. I have 6500 IPs in my self-written blacklist, and the SMTP server load has dropped by half. It's still there, though, so I'll have to keep an eye on it.

All in all, not a fun day...

Spam

Spammers are nasty little pieces of work.

It's been a constant cat-and-mouse game where we (anti-spammer people!) take a few steps forward, then the spammers hit us back twice as hard.

This time, they're doing some kind of distributed dictionary attack. That means thousands upon thousands of computers across the globe are all trying to send mail to various mailservers (including one I'm responsible for), addressed to guessed names like "joe@domain.com, jack@domain.com, jeb@domain.com, jorge@domain.com..." for several domains that we host.

The problem is that they are slamming the servers so hard that they're starting to outrun the DNS blacklists we use to block spammers. And they don't always show up in the blacklists.

So my idea was to notice when someone fails to send mail to 5 or 10 accounts in a row, and then add them to a blacklist. I wrote a simple PHP script to do that, and it works...eh, okay. Not stellar. I even added a piece that kill -9's their SMTP process when they get listed, but it doesn't always seem to work right. Maybe they're coming in 20 times at once, or something.
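The 'N failures in a row' rule is basically one counter per IP. Here's a minimal sketch in Python rather than the original PHP, assuming each RCPT result (IP plus delivered/failed) has already been pulled out of the SMTP logs; the class name and threshold are hypothetical.

```python
from collections import defaultdict


class FailureBlacklister:
    """Blacklist an IP after `threshold` consecutive failed recipients."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.streak: dict[str, int] = defaultdict(int)
        self.blacklist: set[str] = set()

    def record(self, ip: str, delivered: bool) -> bool:
        """Record one RCPT result; return True if this call listed the IP."""
        if ip in self.blacklist:
            return False  # already listed, nothing more to do
        if delivered:
            self.streak[ip] = 0  # a good recipient breaks the streak
            return False
        self.streak[ip] += 1
        if self.streak[ip] >= self.threshold:
            self.blacklist.add(ip)
            del self.streak[ip]
            return True
        return False
```

Counting consecutive failures (rather than total failures) keeps a legitimate sender with one typo'd address from getting listed.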

So I've run my little blacklister script for a while - as of press time I have about 5100 IPs in my block list - and it doesn't really seem to be getting any better. So I finally turned on 'record entire SMTP conversation'.

So this is what they're doing -

HELO IMASPAMMER
MAIL FROM:<somelikelyinnocentvictim@somerandomdomain.com>
RCPT TO:<joe@domain.com>
RCPT TO:<jack@domain.com>
RCPT TO:<jeb@domain.com>
RCPT TO:<jorge@domain.com>

To which it gets answers like:

451 No such user 'joe@domain.com'
451 No such user 'jack@domain.com'

etc.

So here's the clever bit - then they do:

RSET


Which apparently just 'resets' the current SMTP transaction without dropping the connection, and then they start again with the next five recipients. Ugh.

So now it's time to dust off the ole C skills: I've rewritten the RSET command to say:

502 Just send your mail again, don't pull this RSET garbage.

And disconnect 'em. That seems to have helped a lot - since the spammers have to reconnect, they get a second chance to be looked up in the blacklists, or checked against my own custom blacklist. Load is reduced, though not eliminated. I guess we'll see how well it works.
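In outline, the hack is a special case in the command dispatcher. A minimal Python sketch (not the real qmail-smtpd C change; `handle_command` and the replies other than the RSET one are hypothetical). Worth knowing: RFC 5321 says a server must answer a bare RSET with 250 OK, so this deliberately breaks the standard, which is exactly the kind of thing that trips up picky clients like Eudora.

```python
def handle_command(line: str) -> tuple[str, bool]:
    """Return (reply, keep_connection_open) for one SMTP command line."""
    verb = line.strip().split(" ", 1)[0].upper()
    if verb == "RSET":
        # Refuse the reset and hang up, so the client has to reconnect and
        # go through the connect-time blacklist checks all over again.
        return ("502 Just send your mail again, don't pull this RSET garbage.", False)
    if verb == "QUIT":
        return ("221 bye", False)
    if verb == "NOOP":
        return ("250 ok", True)
    # Everything else (HELO, MAIL, RCPT, DATA...) is out of scope here.
    return ("502 command not implemented in this sketch", True)
```

The caller would send the reply and then close the socket whenever the second value is False.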

My next thing will be to augment this username check with a counter, and if the counter goes above 'n' bad lookups, drop the connection. That could help as well - but I don't think by as much as what I've done so far.

Thursday, April 23, 2009

Browsing Nirvana - Achieved?

As anyone who has worked with me since around 1995 knows, I am a notoriously heavy browser of the web. Since tabbed browsing came out, I have been using a complex two-level hierarchical system to manage my web pages. I'll have a window with a general sort of topic - like maybe a web page I'm developing - plus several tabs for the PHP functions I'm using, MySQL documentation, or whatever. And several other windows set up similarly, plus sometimes 'singletons' for various links people have AIM'ed or Twittered to me that I've clicked on. The end result is that I can never find anything, and when my browser crashes, I'm screwed. This is why I've been so excited about Google Chrome coming to the Mac, and about the Stainless browser for Mac, which I still play with and which is getting better every week.

However, I think I've just made a change that might switch up how I use them all. I've added a third layer of hierarchy using Fluid. Fluid lets you make little Site-Specific Browsers (SSBs) for websites you keep open all the time. They show up in the dock as separate applications with their own sets of windows, basically indistinguishable from regular Mac OS X applications. Here's how it's made a difference for me. There are certain sites that I keep open all the time, and certain sites that I'm just browsing and haven't finished with (hence the window staying open). For the always-open ones, I've made little SSBs and closed their windows in my main Safari (Safari is today's browser of choice; lately I switch back and forth with Stainless). Now, when I'm looking for something in one of my always-open applications, it shows up in the dock. I can command-tab to it, and once I'm in it, I can command-squiggle (tilde) to the correct window. Anything that isn't in one of my always-open applications is in my regular Safari, which only has 5 windows of its own to flip through.

It may sound insane, and probably is, but now that I have this new third layer of hierarchy I feel like a great weight has been lifted. Before, I would have to go on ruthless window-culling rampages - "Seriously, I'm not going to do anything about this thing I've been sent; I've left this window open for 3 hours, so let's accept I'm going to do nothing here and close it" - and now I don't need to, because I can get to everything I need. Furthermore, as a bonus (though I haven't seen it in action yet), I should get some level of crash isolation - a crash should hopefully only knock down one of my SSBs, not everything. We'll have to wait and see how that turns out.

I've tried Fluid once before, and it didn't stick. This time, I still have one main problem - cookies aren't shared between SSBs and/or Safari. For most people this may be okay, but it's annoying for me. Not a huge deal, just annoying. The other thing I did was spend a full 30 minutes or so making sure I had identifiable icons for my SSBs - this has helped IMMENSELY. Why they haven't set up a protocol that just requests the icons from the websites is totally beyond me, but, whatever, I just did it and it looks...mediocre. Which is good enough for me! I even made a little icon for my own web application that I run all the time.

I shall report back with how it goes, but it really does feel like a huge weight has been lifted from my shoulders right now.

Tuesday, April 21, 2009

Rails Documentation

Is the worst fucking thing on the planet. I've actually googled for stuff, clicked on it, and landed on redirecting cybersquatter pages, it's so goddamned bad. Maybe I'm spoiled. The bulk of the professional development I've done has been with PHP, though I was pretty heavy into Perl, Tcl and other such languages in their day. Compared to any of them, Rails documentation is, hands down, the absolute worst.

Half the time I feel like they're being too goddamned clever for their own good. But the 'sensible defaults' that they espouse aren't documented anywhere, so how the hell am I supposed to know what they are? What seems sensible to me might not be sensible to you. I've found myself drilling down into source code more times than I'd like to count to try and figure out what's going on. That is total and complete fail. It's lucky that it's so powerful and cool regardless, or I would've left it in the dust a million years ago.

Maybe I have to be more...loquacious in PHP. That's fine. At least I know what to do and how to do it. 70-80% of the time I'm working in Rails, I have no friggin clue how to tell it what I want. Then when I find out, it's always something like - type two magic words into the right file, and Rails reads your mind. Awesome. I just hate that sickening feeling during that 70-80% of the time. I feel helpless.

Then when you do find documentation, it's all stories. "So here's what active record aims to do, here's different ways you can make it do things, blah blah blah." I like my programming docs terse: I look something up, and it tells me what that thing does. Instead, that kind of documentation just seems all jumbled together and awful. Or the other thing I'll find is the opposite granularity - "Class Foo::Helper::Doodad::fwipple::dingus has methods 'get','put','set','be','execute'. The source code to method 'execute' is: ......." That doesn't help either. That's why it's called DOCUMENTATION. Not fucking SOURCE CODE. I feel like it's some kind of 'hipster' framework - if they actually explained it to you, and regular unhip people "got it", then the hip people would all switch to using Scala.

And, embarrassingly enough, I only just 'got' the yield keyword in Ruby. That's just sad, man. I still don't see the difference between a yield and an anonymous function, but I guess I'm just not that bright.
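For what it's worth, the two really are close: Ruby's `yield` calls the block attached to the current method call, and that block behaves a lot like an anonymous function passed as a hidden argument. Here's a rough Python analogue (a sketch of the idea, not Ruby semantics; `each_twice` is a made-up example) that makes the block an explicit callback, with the Ruby version in the comments. One real difference the analogy can't show: `return` inside a Ruby block returns from the method where the block was written, while returning from an anonymous function only exits that function.

```python
from typing import Callable, Iterable


def each_twice(items: Iterable[str], block: Callable[[str], None]) -> None:
    # Ruby equivalent:
    #   def each_twice(items)
    #     items.each { |i| yield i; yield i }
    #   end
    # `yield i` calls the block attached to the call; here that block is
    # simply an explicit function argument.
    for item in items:
        block(item)
        block(item)


seen: list[str] = []
each_twice(["a", "b"], lambda s: seen.append(s))
# Ruby: each_twice(%w[a b]) { |s| seen << s }
```

After the call, `seen` holds each item twice, in order.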

I assume it's one of those things where as soon as you buy into it 100%, completely, and spend time just soaking in it, you'll fully understand. But I don't like having to commit to that level of buy-in. I'll continue to fiddle with it, and even choose it as a framework in whichever contexts it seems right for, but I'll always look slightly askance at it - perhaps until I've been so steeped in it that I can't look at it objectively anymore. But until then, fix your fucking docs, Rails. They're horrible.

Saturday, April 04, 2009

divs vs. tables, part II - the compromise (maybe?)

<div class='tablesque'>
   <div class='rowesque'>
      <div class='cellish'>A</div>
      <div class='cellish'>B</div>
   </div>
   <div class='rowesque'>
      <div class='cellish'>C</div>
      <div class='cellish'>D</div>
   </div>
</div>
stylesheet:
.tablesque { display: table; }
.rowesque { display: table-row; }
.cellish { display: table-cell; }

There - it looks like a table, because you told it to look like a table in the CSS. But the markup doesn't say it's a table - it just says you have a hierarchy.

I sorta fell into this idea because I'm working on making a web application work on the iPhone or in a regular browser, and in the plain browser context I wanted something to be a table, but on the iPhone, I wanted it to act more like spans and divs.

To give you an idea of what a moron I am, you should know my first idea was to have a big table, and on the iphone, do things like: display: block, display: inline, etc. But the iPhone (and even Safari on the desktop) had problems with letting me convince it to display tables as non-tables. So finally I switched it to divs, and made the regular browser side do display: table, display: table-row, display: table-cell. And that seems to work okay for now.

So, standards people, there, I'm standardy. My 'layout-like-a-table' rules are all in the CSS. I think this CSS looks a hell of a lot prettier than the crazy 'float, clear, width, etc.' routines, and it should stretch to fit its contents à la tables.
As a bonus, in the DOM I don't get the mysterious invisible 'tbody' elements that browsers chuck into my tables. I lost 3 or 4 hours to that a while ago.

Friday, March 27, 2009

Bravo, MS!

Never thought I'd be saying that...

But I ran into a couple of different intertube posts that talk about the new MS ad campaign that says "Macs=expensive". (Here's Engadget's.)

I'm a huge Mac lover, and were technology company/human marriages legal, perhaps I would've married it (sorry Nicola...). But they can actually make a valid, salient, understandable point here, so more power to 'em.

Now, the real point here is the value for what you get - yes you can buy a computer for $1000 or $5000, the same way you can buy a car for $10,000 or $50,000, it depends on what you're looking for and what value you're getting. But, that's a complicated argument and isn't going to compare to: "Teh Macs are Expensives!" There's also an undercurrent of "Macs are for latte-sippers!" and that's pretty subtle, and also valuable.

I kinda feel like Apple has grown a little...comfortable, perhaps, lately. So I like the idea of MS really breathing down their necks to keep them from becoming too complacent.

I mean, we are in a down economy; letting people buy cheaper stuff is a good idea.

Most of MS's advertising attempts have either left Apple completely unmentioned or been completely pointless. This is the first one that actually seems like it has a message, and could cause a little motion in the marketplace. Good on 'em. About time they did something right.

Now let's see Apple's response where they come out with some more 'everyman' style pricing.