The ravings of a sane person. [entries|archive|friends|userinfo]
Josiah Carlson

[ website | My Website ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Back in Minnesota [Jul. 25th, 2008|12:47 pm]
Got to Minnesota without much difficulty; Northwest had oversold the seats, but enough people bit on the cash + hotel stay to get me a seat.

Saw Damien and his parents, walked around town, and took a picture of everything off the back deck to remind me of what I'm missing ;) .

Just messing around for a bit longer before heading up to northern Minnesota to go to a wedding and see some extended family.

I suspect the world isn't falling apart with me not in LA, and that's just fine ;) .
linkpost comment

Using data structures properly; making it fast [Jul. 17th, 2008|08:35 am]
[Tags|, ]

First, a bit of context.

One of the more-requested features of the asynchronous sockets library available with Python is a scheduler (I maintain the async sockets library). That is, being able to say "execute operation X in Y seconds". Once you have that behavior, two other behaviors become really useful to have; "I know I said run the operation in Y seconds, make that Z seconds", or "don't run X at all". These are generally referred to as 'schedule', 'reschedule', and 'cancel'.

The schedule operation is very easy to implement. You take your minheap[1], put pairs of items into it (absolute time to execute, operation), and any time your current time passes the smallest absolute time in the heap, you remove and execute items until their time is greater than the current time. Simple, fast, and effective.

But what about reschedule and cancel? Well, there's the slow way, the fast but not great method, and then there's what I would call the right way.

The slow way (for rescheduling and canceling) pretty much involves scanning through your entire heap (which takes time linearly proportionate to the length of the heap), removing your item, and running the heapify() operation (which also takes time linearly proportionate to the length of the heap). Generally, we don't like to do operations that are linearly dependent on the size of anything (unless absolutely necessary), because when things grow to be 1 million or 1 billion items, those operations get *really* slow. Twisted[2] and sched.py (available in Python) does this. I know, a big wtf from me.

What about the fast but not great method? Basically, when you cancel an item, you mark it as "don't execute this when it comes up", and when you reschedule, you first cancel the original, then you make a copy for the new schedule. It's not great because if you end up doing a lot of reschedules, your heap can grow. Now, the basic schedule operation doesn't grow linearly, it grows logarithmically. That is, whenever you double the number of items in the heap, you take about one more operation to insert a new item or remove the smallest item. Not a big deal, because 1 million takes 20 operations, and 1 billion takes 30 operations, but if your application is hugely dominated by reschedules or cancels, this can be overwhelming for your memory. If you are smart, you can pay attention to how many canceled items are in the heap, and clear them out when canceled > total/log(total) (this tweaking is known as amortization, because to clear it out will take time linear in the length of the heap, but if you divide the total amount of work by the number of canceled items, it only takes log(total) time for each of the canceled items!)

Now we come to the right way. There is actually a data structure that was pretty much designed for this called a "pairing heap". Schedules, reschedules, and cancels are all O(logn), but the problem is that standard implementation from the literature internally uses a tree, which requires creating tree nodes, linking them, and paying attention to the links. It's not a nightmare, but it's not terribly elegant.

What would Josiah do? Josiah would make a new data structure, and did. Basically you get all the benefits of the pairing heap from the literature; O(logn) schedules, reschedules, and cancels, without any of the nastiness; no trees. It is implemented by using a hash table to pay attention to the location of every item in the binary heap (you know that item X_7 is at position 23, for example) so that when you need to reschedule (or cancel), you know what item to change, and where to adjust within the heap. I actually implemented it publicly in December 2006, but I originally implemented it for closed-source contract software in the summer of 2004. It probably existed before then, but I didn't find it in literature (kind-of like the O(1) LRU/MRU algorithm and data structure that I posted like 5 years ago in the Python cookbook).


Why do I bring this up? Like I said, people want a scheduler within the asynchronous sockets library. One of my motivators, a fellow from Italy named Giampaolo, has implemented his own scheduler using the 'slow' version, based on what is available in Twisted. That Twisted uses the slow method is unfortunate. That Giampaolo has chosen to reimplement the slow method is even more unfortunate. I was originally considering putting the pair heap implementation that I have in the Python standard library, out of a sense of "my way is better, stronger, and faster". But the latter is only asymptotically. In real-life, because it's all implemented in Python, it's probably a bit slower than the "slow" method for most event queues smaller than a few thousand items. If I were to spend the next couple weeks re-implementing it in C it would be faster, but I'm not sure that I want to do that.


What to do. I've decided that I'm going to update Python's sched.py module to offer a reschedule method, make it use the first fast method (with the trimming feature), and add a few other convenience features (locking support for multi-threaded apps, get the items that are ready to execute now, ...). Then I'm going to hook it into Python's asynchronous sockets library rather than my own, because it will be 10-100 times faster than my pure Python implementation in practice. That's not to say that a proper pairing heap implementation wouldn't be useful in Python, it's just not the correct solution for Python at this point in time. And with the way that I'm going to be fixing sched.py, practically speaking, it's a pair heap under a different name.



[1] Mark C. Chu-Carrol talks about heaps here.

[2] Twisted is an asynchronous everything framework for Python and Java (maybe others). It includes hooks for sockets, GUI libraries, and many other things.
link6 comments|post comment

thought dump [Jul. 16th, 2008|08:58 am]
I took the bus this morning (rather than riding my bike) because my legs were sore. Three and a half weeks of riding after not riding for a year will do that, I suppose. The bus is actually more convenient physically (I walk 1 block on either end) and time-wise (it runs every 10 minutes in both directions during any time I would conceivably need to ride it) than any route I've ever seen in OC. And here OC is supposed to have the best bus system; bullshit (you needed to get a transfer to travel between UCI and a local mall, wtf?). To be honest, it was a little difficult for me this morning, because there were so many options, and on my way home I could have cut off a solid 15 minutes if I would have remembered which routes went my way. Ah well. I may make this a Wednesday thing, give my legs a break mid-week.


Incidentally, I'll also be up in San Francisco M-W next week for work in the San Bruno office. I don't know if I'll be able to get to the other sites to visit anyone, but who knows.


My 1-month stint with OS X has had it's ups and downs. The most recent external monitor stuff is annoying. Also, the current state of samba drive mapping in Leopard leaves something to be desired (I ended up writing a couple scripts to automate the mounting and unmounting, because it was actually easier than using the built-in tools). For a while I was using Expandrive, but for day-to-day work, the samba mounting scripts work with less confusion. I could manually use Macfuse + sshfs, but again, samba works (and was easy to set up). I'm considering buying it and the Windows version for personal use, but that's another discussion entirely.


Also, I joined Netflix. So far I've gotten a chance to see a few good movies, and even to start going through Outlaw Star, which I've wanted to do for quite a while. One interesting feature is that you can see what DVDs others in your area are watching. For reference, people in Venice have really strange tastes.
linkpost comment

LJ spam [Jul. 15th, 2008|06:12 pm]
Has anyone else noticed an upswing in the volume of LJ spam? I've gotten at least a half-dozen messages posted in some really old entries from different users, all of whose journals contain random words followed by a set of links.

It smells like someone broke the LJ captcha for new users. :(
link5 comments|post comment

External display on MacbookPro [Jul. 15th, 2008|08:53 am]
I have run into an interesting, if not very frustrating bug with regards to how OS X Leopard handles displays.

When I first got the laptop, I could plug in an external monitor, keyboard, mouse, and power. The moment I plugged in the power, the laptop would spring to life, and the only display would be the external monitor. Which is exactly what I wanted.

At some point along the line, the laptop locked up for some reason or another (related to mapping drives using Samba or NFS), and since then, any time I attempt to wake the computer from sleep with the external display attached and the lid closed, the external display will make a momentary showing, but will go dark. Opening up the laptop lid gets me nothing, as does hammering fn+F7 (which is supposed to switched between mirrored mode).

Before you ask, yes, I have tried the "leave the lid open a smidge and hit a key", that worked a week or two ago to get it out of a similar wedged state, but attempts at the same more recently haven't worked.

Right now, the external display is working fine, the laptop display is black, but I can move my mouse to the laptop by dragging it off the edge of the screen. To reduce the probability of accidentally mousing to the laptop, I've arranged the desktops diagonally from one another, so I would need to mouse up past the Spotlight icon in the upper-right corner. But still, this should work. That it doesn't is bullshit.

On Windows, I would open up the display stuff, set the primary monitor to the external monitor, and disable the laptop display. But on Leopard? Um...You can mirror or not.


So, anyone with Mac experience know of any tools to fix this behavior?

As an aside, I fixed 75% of my annoyance with hotkeys in OS X by doing a circular shift of the control, option, and command keys (System Preferences, Keyboard & Mouse, Modifier Keys). I've played a bit of the WoW expansion F&F alpha, and it almost works. Toss in "alt" really being a method of producing alternate characters (versus being used as hotkeys), control *sometimes* working as a hotkey (ctrl+m inserts a line ending), and 75% is being generous. What would be really nifty is if I could get key swaps based on application without using 3rd party software, but I know that won't happen.
link2 comments|post comment

Going to be in Minnesota late July/early August [Jul. 14th, 2008|01:40 pm]
Minnesota friends, I will be in Minnesota July 25-August 3. The first weekend I'll probably be in northern MN with my extended family, and August 1 I'll be busy at my 10-year HS reunion, but other than that, I will likely be attempting to spend time with people. Care to hang out? Comment here or email me at my firstname DOT lastname AT gmail DOT com .
link2 comments|post comment

[Jul. 2nd, 2008|12:12 pm]
Stopped off an picked up the latest (and last?) Y The Last Man on my way to the Stan Lee talk at Google (which I'm sitting in on now). Obviously all of the questions were pre-determined, but Stan was quite funny nevertheless.

A crappy webcam picture of Stan:

Stan "The Man" Lee

I suspect I'm the only one blogging about this right now...at least in this location ;) .

Update: Also, one of our web devs BSed with Mark Hammil at the comic book store next to the after-talk signing at the Santa Monica library. He bought (among other things) a Superman figurine.
linkpost comment

[Jun. 27th, 2008|11:52 am]
[Tags|]

So, this is the end of my second week working for Google in Santa Monica. More specifically, I work on the YouTube team.

I get access to a lot of really interesting statistical information, like what videos are watched primarily by females, males, etc. It's all really interesting, and if I was a sociologist, I would do a comparative study about what people like to watch (backed up by a huge amount of data). Then again, I can't really share the information I have discovered, even if some of it is pretty obvious if you pause to think for about 5 minutes.

Back to work!

Update: Also, Stan Lee is going to be giving a talk here next week.
link2 comments|post comment

[Jun. 23rd, 2008|09:02 am]
[Tags|]

So far, I've noticed a couple annoying bits in dealing with OSX and programming GUIs.

The first thing is that it does not seem possible to get automatic notification when a window is maximized or minimized. The events that are supposed to fire when the transitions occur, don't. For the particular application that I've been working on; an encrypted password store with optional remote storage via AppEngine (for easy sync among multiple computers), being able to tell when a user (or the application) minimizes (to clear the list) or restores (to ask for the password) the window, is *very convenient. There is a method to see whether or not the window is currently minimized, which I'll hook to look for transitions, but I shouldn't have to do that.

The second thing is that Notebook controls (those things with tabs along the top to let you select documents, preferences, etc.) don't get a little scroll thing like on every other platform. So when you have too many documents open (as an example), it stops being useful. In wxPython (where I'm writing my GUIs), there is a fully custom notebook control, which draws everything by itself, not relying on the underlying platform implementation. You don't get the pretty OS X notebook tabs, but that doesn't seem like a huge loss. On the other hand, you do generally lose the platform native look in those cases, but there are worse things I suppose.


I rode my bike to/from work yesterday evening to make sure that it wasn't going to be bad, and it wasn't. Got in this morning, and aside from some lightly sunburnt shoulders from yesterday, which made carrying the backpack a bit uncomfortable, it was nice to ride in. Not tiring at all. It helps that the ground height doesn't vary by more than a foot for over 3 miles.

I took some pictures of my walk to the beach to check out the sunset, which I'll post sometime soon (I don't have internet access at home quite yet).
link4 comments|post comment

Interesting OSX/wxPython issue [Jun. 19th, 2008|08:48 am]
[Tags|]

One thing that has cropped up as I attempt to use OS X as a real development platform is that on occasion in PyPE, the Undo/Redo and Cut/Copy menu commands go gray, making it so that I can't do any of those operations. Now, I can ctrl+click on a selection and use the underlying Scintilla cut/copy, but I can't Undo/Redo, which sucks.

A quick restart of PyPE and I'm good to go, but that the menu options (and thusly the keyboard shortcuts) get disabled is a mystery.

I thought that it was a double-tapping of the command key when Google Desktop was open (that was my hotkey), but I can't seem to reproduce the issue when I start both up fresh.

Anyone have any ideas?


Also, I'm going to have to update PyPE to properly handle hotkeys in menus on OS X. It doesn't work the same as on Windows and Linux, and I can't seem to figure out the pattern (some Alt+key combos work, others don't, setting Ctrl turns into Command in the menus, Command is actually Meta according to the underlying layer, but Meta is not understood by the OS X menuing system, ...).

Is it so much to ask for Apple to be consistent with the two other major platforms?

Speaking of which, I actually had an idea for some GUI hacker to play with out there. One of the biggest complaints (at least among those that I've spoken to) about switching from any other platform to OS X is the single menu bar at the top of the screen, making it bothersome to get to the menu for smaller windows at the bottom of the screen. Here's the idea; hack the menuing system to do nothing. Then re-implement the menu drawing, selection, hotkey support, etc. (none of which is terribly difficult) and draw it yourself at the top of your application. In wxPython, I've seen some very slick custom menuing systems, some of which are a lot more convenient to use than the regular menus system that we are all used to.
link4 comments|post comment

Huh... [Jun. 18th, 2008|07:34 pm]
[Tags|]

I just discovered (thanks to PBS) that Django Reinhardt only had the use of his thumb, pointer, and middle finger on his left hand. So, as the best jazz guitarist that has lived so far, it's a huge deal. Awesome.
linkpost comment

Interesting applied cryptography (I am not a cryptographer) [Jun. 17th, 2008|12:40 pm]
[Tags|]

This will probably only be interesting to a few of you, but that's ok.

I need to encrypt data in a platform-independent way (Windows, Mac, Linux), so I need to implement an encryption algorithm in pure Python (I don't want to deal with C compilers). I started with the ARCFOUR algorithm (RC4). Implementing straight RC4 is quite easy, but it has a few security issues.

The first security issue is that if you use the same password over and over, you can leak details about your data and password every time you use it. In order to handle this issue, most people end up using a random salt either prepended or appended to the plaintext password. I have chosen to generate two 10-byte random salts, one as a prefix, one as a suffix, which are prepended to the output file in plaintext (necessary for all random salts). Why not just a suffix? There are known attacks against plaintext prefixes and suffixes.

I apply the setup routine not a single time (as is typical in RC4), but cycled 256 times (thus increasing setup time for the algorithm by a factor of 256), which also allows for passwords of up to around 235 bytes to still reasonably apply. Expanding the table from 256 bytes to 65536 bytes would give us the opportunity to take keys of up to ~64k, but it would also increase setup time significantly for short sessions (never mind result in questions about endianness for this "stream" cipher).

I discard the first 2048 bytes of the keystream (as per recommendations online, which is greater than the 256 byte recommendation from "Foundations of Security").

For verification, I set up a sha1 MAC with the presalt+key+postsalt, which is appended to the data stream and encrypted (so if the wrong password is provided, the hash is also garbage ;) ).


The results? ~400KB/second encode/decode speed on a modern machine in pure Python, over 1.2Mb/second on my 1.3 ghz celeron laptop with Psyco, and an algorithm that attempts to address all of the known attacks against RC4. Assuming your /dev/random doesn't suck (or Mersenne Twister initialized with the system timer gives you decent random data), this algorithm would seem to be pretty reasonable.

Now all I need to do is to finish the interface, get my passwords stored properly, implement this with a stream interface without the checksum (it currently does single-shot string encryption/decryption), and make all of the extra "improve security" features options for both the one-shot and stream versions (the latter of which will become the only version).

Update: Using Psyco 1.6 on my OSX laptop gets me 5.7 megs/second. Not bad.
link8 comments|post comment

[Jun. 16th, 2008|11:34 pm]
In an attempt to expand my horizons, I've chosen OSX machines for both my desktop and laptop dev machines. I've noticed some interesting things since going down this path, but I'll keep them to myself for a few weeks until I've really had an experience doing real dev work on it.

On the positive side, wireless configuration is quite nice (far better than the standard XP/Vista nonsense), but maybe that's just the local networks.

Known ucks: keyboard key repeat frequency is far too low to be useful, and the behavior of home/end on the keyboard (keys that I use quite often as a developer) are useless. I'll probably try to pick up some key remapper software somewhere, anyone have any ideas? I also miss my IBM's trackpoint (I hate trackpads) and the scrolling by using two fingers on the trackpad in OSX is trash. I'm not saying that I could do any better, just that it's crap (slow if it works, commonly jumps around...).

Otherwise the Macbook Pro is pretty sweet. A claimed 4+ hour battery life when I'm doing normal stuff online, a gorgeous screen (though not quite as high of a range in viewing angle as the Toshiba that I just sent off to my older brother), and quite fast in most everything I've wanted to do.


Anyone know of any good GUI archive software for Macs? Anything that can handle .zip should be sufficient (I do know about the command-line 'zip' command, but it would be nice to not have to rely on command-line help all the time). Thanks.
link4 comments|post comment

Multi-tiered internet service [Jun. 15th, 2008|09:21 pm]
Over the years, many ISPs have talked about offering multi-tiered internet service, that is, for those who transfer more data, charge them more. Pretty simple idea. It's the same idea behind cable TV packages; those who want more channels, charge them more. Seems to make sense, right? For the internet providers, it makes perfect sense, but for the consumer, it's downright silly.

Take your current unlimited internet package, which in America is overpriced and underperforming (compared to S. Korea, Japan, Sweden, and most other technically advanced nations). Since gaining a bit of a life outside of computers, I don't necessarily need 15 megabit service to my apartment (which I can't get from the local monopoly anyways), but even the "entry-level" 1.5 megabit runs $35.

But what about the "regular" people that the DSL/Cable companies talk about? They are all saying, "those who use it more should pay more". But what about those who use it less? Well, Time Warner is offering 5 gigs of monthly transfer at 768kbit for $29.95/month in Beaumont, Texas. Maxing out at $54.90 for 40 gigs of monthly transfer at 15mbit. Wow, so...charge mom and pop out the nose for minimal service...and charge p2p-master out the nose for similarly minimal service. If Time Warner were an individual, we would call them a hypocrite. But since they are a corporation, their stock prices rose because of the upward tiering of access, where mom and pop really needs the service to be tiered downwards.

It's not all that surprising. The health insurance industry is doing the same thing today, only in addition to charging the sick more, they also drop their coverage. But I digress, I'm not an instrument for social(ized health insurance) change, I'm just talking about reasonably priced internet access.

If I had time and money, I would seriously consider setting up paid-for wireless internet. I'm not versed in the most recent technologies, but I'm sure that entry-level wireless could profitably be delivered for under the $30/month that Time Warner and others are overcharging their customers. I think that this is one reason behind Google's purchase of spectrum, but I don't know for sure (being that I don't start until tomorrow), and if I find out, I won't be able to tell anyone :/ .

I hope everyone out there in internet land is doing well, I'm going to eat some dried cherries, some trail mix, and head to bed. Eight AM call for the first day of orientation is going to be interesting.
link3 comments|post comment

My new home... [Jun. 8th, 2008|10:44 pm]
As stated earlier, pictures! These photos were all taken with me standing in the same spot, just turning...

Northeast along my avenue:
Looking northeast along my avenue

Southeast looking at my door (sorry for the bad lighting, I was doing a continuous-shooting mode on my camera):
Looking southeast at my door

Southwest down the path to the beach:
Looking southwest at my walk to the beach

When I get all settled in, there will be a gathering for beach fun.
linkpost comment

mmm, comics [Jun. 8th, 2008|10:30 am]
This morning I was reminded of The Red Star; a fairly short comic series that I enjoy, but whose continued enjoyment was limited by the fact that one of the trade paperbacks was in copyright/publishing limbo for a long time. It seems as though those issues have been worked out, so it would seem that I need to take a trip by my storage locker to see which books I have/need (yes, need).
linkpost comment

Found a place... [Jun. 7th, 2008|06:41 pm]
I'll post pictures later, but let me describe where it is.

I walk out my front door, and cross the alley. I walk 30 yards down the pedestrian path, cross the sidewalk, cross the 30 feet of grass. Then there is the 50 yards of sand, and I'm in the water. My new place is in Venice, about a half-dozen blocks down from the core strangeness that is typical of Venice Beach. Craigslist came through like gangbusters.

Surfing in the morning, a 3 1/2 mile bicycle commute to work, ...

Looking at how the waves break south of the jetty (where I will be surfing), it's nice and easy. No hard breaks like Salt Creek. Very much like Newport Beach was a couple weeks ago. Also, even on a Saturday in June, there weren't many people near my section of the beach.

I can't wait. Unfortunately, I don't really get to stay in the place until the 21st/22nd, because I have one more week of work at NIM, then a week of orientation in Mountain View.

Also, at least I won't need to worry about having an address for mail forwarding/change of address.


Now it's time to celebrate!
link5 comments|post comment

[Jun. 5th, 2008|05:47 pm]
Though I am looking and calling as much as I can, I don't believe I'll be able to find an apartment by the time I need to move in the next couple weeks. On the upside, I could probably find a short-term place close enough to where I need to be, as sort of a "base of operations" until I can actually find a real place, but it leaves me in a bit of limbo with regards to my mailing address. Bah!

My tentative plan is to call as many places as I can, see as many as I can this weekend, and while wandering around the neighborhoods, take down information for any "for rent" signs that I see, then call and possibly visit/apply the following week/weekend. That week in Mountain View is sort-of a killer. It does delay by a week when I *need* to move, but it also removes the opportunity to go to complexes, talk, etc.

I suppose worst-case scenario, I sleep in my van in the Google parking garage, and shower at the gym. At least I'll be fed, have internet access, and be safe (it's gated) ;).
link8 comments|post comment

Goodbye OC, hello LA [May. 30th, 2008|01:14 pm]
I was offered a position at Google today, and I accepted. I've put in my 2 week notice.

It's now a matter of getting and signing paperwork, listening to and refusing counter offers from my current employer, finding a place to live somewhere along the coast between Manhattan Beach and Redondo Beach, and all of the other nonsense related to changing jobs and moving.

If all goes as planned, I'll be starting on June 16.


I still would prefer to teach at a college or university, but paying off student loans is not a bad idea in the interim.
link23 comments|post comment

[May. 24th, 2008|10:12 am]
I managed to bring clouds, cold weather, and rain to Lake Havasu. I've been partially blamed. I got to see the new Indiana Jones movie, which I would consider overall good, but which I have a few issues with (nothing worth mentioning on a Saturday morning).

I've been getting good news over the week, and if things go really well (which I'm assured by everyone they will), I'll talk about it more in a couple weeks. The short version: I'll definitely be able to move out of my current place (which I share with a nice guy, who happens to be a partially-functioning alcoholic) by the end of June, and I'll definitely be moving to the beach. Sun, sand, surf, my own place. Awesome.

Now I just need to get my new public key accepted at python.org .


Enough from me. Pictures and a longer blog entry when I get back. Everyone be safe this weekend.
linkpost comment

navigation
[ viewing | most recent entries ]
[ go | earlier ]