Saturday, May 09, 2009
Websites with "sound" are evil...
and anyone using them is guaranteed to have me click away instantly.
Thursday, March 05, 2009
The "Obama-Conned"
During the election, there were a lot of conservative pundits, especially of the big-think types who write for newspapers and magazines in New York and Washington, who decided to plump for Obama. Basically, their arguments were that he was "smart", was surrounding himself with smart people, and he wasn't a wild-eyed radical, despite his associations with Hyde Park 1960s lefties, his history as a "community organizer", and clear radical sympathies as expressed in his books.
I also was hopeful immediately after the election, even though I didn't go so far as to actually vote for him. I figured events would constrain him to find his inner Calvin Coolidge, as they did to Bill Clinton in his first term. But instead of the smart, reasonable guy we were hoping for, we're seeing the Hyde Park "community organizer" radical, channeling his inner Hugo Chavez.
These pundits are now withdrawing their support, and the ranks of the Obama-Conned are growing by leaps and bounds.
The economic problem with consumption...
is that many products we have nowadays are actually quite good. A decently maintained car in a friendly climate can last 15 years or more. A modern PC can be used until it physically fails; unless you're a hard core gamer, you don't need to upgrade a PC every couple of years or so like we did in the 1990s. And many people can get away with cheap, micro-things like netbooks or tiny PC thingies that cost $300 or less.
Clothes are cheap, shoes are high quality, and all this stuff can last quite a long time if it's maintained.
So, many people can easily live "out of inventory" for several years before needing to buy much of anything beyond consumables.
This is why the "stimulus" won't do much; people will just save it. Even trying to loosen up consumer credit won't help much; the last thing any sane person wants to do now is go out and buy a new car on credit or go on a credit-card-powered shopping spree.
The US savings rate is up to 5% and rising, from near-zero in the middle of last year. I suspect most of this is non-savers paying down debt (which looks like economic savings), although many congenital savers are also saving even more.
Anyway, the Great Deleveraging continues, and will likely go on for a long time.
Tuesday, February 03, 2009
A word from my current work world...
Now that I have a job, my current Big Project is to make our product, which currently falls apart under a load of about 20 million "sessions" (or about 60M records), run with 500M sessions, or about 1.5B records. Searches are ad hoc, on any combination of AND, OR, and NOT across about 20 different search parameters. Also, I had to do this without needing fancy hardware support. To make things more interesting, the schema had to work on both Oracle and MySQL.
To solve this problem, I used a schema derived from "snowflake" designs from data warehouses, and broke up the data into "dimension" (search) versus "fact" (data display) tables. The dimension tables are denormalized in that they store some data that is also in FACT tables.
Queries are done by doing initial qualification on the search tables. The results of these searches are saved to temp tables. The final set of qualifying IDs is computed using SQL INTERSECT, UNION, or MINUS queries on the IDs in the temp tables, depending on the search logic. After we have the final set of qualifying IDs, the set is sorted using a computed value that is bound to all "sessions", and the top N records are joined with the FACT tables to produce the final result.
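The flow described above can be sketched with Python's built-in sqlite3 module. This is only a minimal sketch, not the production schema: the table names (dim_user, dim_country, dim_status, session_score, fact_session), the columns, the sample rows, and the hard-coded search criteria are all hypothetical, and SQLite spells Oracle's MINUS as EXCEPT.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical search and display tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Narrow "dimension" (search) tables, one per search parameter.
        CREATE TABLE dim_user    (user_name TEXT, session_id INTEGER);
        CREATE TABLE dim_country (country   TEXT, session_id INTEGER);
        CREATE TABLE dim_status  (status    TEXT, session_id INTEGER);
        -- Stand-in for the computed sort value bound to every session.
        CREATE TABLE session_score (session_id INTEGER PRIMARY KEY, score INTEGER);
        -- Wide "fact" (display) table, visited only for the final top-N rows.
        CREATE TABLE fact_session  (session_id INTEGER PRIMARY KEY, details TEXT);
    """)
    rows = [
        # (session_id, user, country, status, score)
        (1, "alice", "US", "ok", 10),
        (2, "alice", "US", "error", 40),
        (3, "alice", "DE", "ok", 30),
        (4, "bob", "US", "ok", 20),
        (5, "alice", "US", "ok", 50),
    ]
    for sid, user, country, status, score in rows:
        conn.execute("INSERT INTO dim_user VALUES (?, ?)", (user, sid))
        conn.execute("INSERT INTO dim_country VALUES (?, ?)", (country, sid))
        conn.execute("INSERT INTO dim_status VALUES (?, ?)", (status, sid))
        conn.execute("INSERT INTO session_score VALUES (?, ?)", (sid, score))
        conn.execute("INSERT INTO fact_session VALUES (?, ?)",
                     (sid, f"details for session {sid}"))
    conn.commit()
    return conn


def search(conn, top_n):
    """Evaluate (user = alice AND country = US) AND NOT status = error.

    Creates temp tables, so call it once per connection in this sketch.
    """
    conn.executescript("""
        -- Step 1: qualify each criterion separately into its own temp table of IDs.
        CREATE TEMP TABLE q_user AS
            SELECT session_id FROM dim_user WHERE user_name = 'alice';
        CREATE TEMP TABLE q_country AS
            SELECT session_id FROM dim_country WHERE country = 'US';
        CREATE TEMP TABLE q_status AS
            SELECT session_id FROM dim_status WHERE status = 'error';
        -- Step 2: combine the ID sets; SQLite applies these operators left to right.
        CREATE TEMP TABLE q_final AS
            SELECT session_id FROM q_user
            INTERSECT SELECT session_id FROM q_country
            EXCEPT SELECT session_id FROM q_status;
    """)
    # Step 3: sort the qualifying IDs, keep the top N, then join to the fact table.
    return conn.execute("""
        SELECT f.session_id, f.details
        FROM (SELECT q.session_id, s.score
              FROM q_final q
              JOIN session_score s ON s.session_id = q.session_id
              ORDER BY s.score DESC
              LIMIT ?) AS top
        JOIN fact_session f ON f.session_id = top.session_id
        ORDER BY top.score DESC
    """, (top_n,)).fetchall()
```

With the demo rows, `search(build_demo_db(), 2)` qualifies sessions 1 and 5 and returns session 5 first, since it has the higher score. The wide fact table is touched only in the last step, and only for the rows that survive the LIMIT.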
The fundamental problem with the old schema is it relied heavily on ID joins for secondary qualification, after "picking" a primary search qualification. Large-scale ID joins require full B-tree descent for every record in the join, and if there are hundreds of thousands of records being qualified this way, the query will take minutes or more. My new schema avoided this problem by simply qualifying each criterion separately, doing a single pass through the B-tree index per criterion, and doing the qualification logic without actually needing to visit the - very wide - FACT table.
A thing that helped hugely was using INDEX ORGANIZED tables in Oracle, which means that all table data is physically stored in the index - as opposed to the standard storage method of having the index records pointing at "real" table storage in a heap. This meant my qualification searches didn't actually have to traverse index recs to get at a base table - they were pure index scans. I haven't done MySQL yet, but InnoDB storage is basically identical to INDEX ORGANIZED tables, so this trick should still work.
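To make the storage idea concrete, here is a minimal sketch using SQLite's WITHOUT ROWID tables, which are a rough analogue of the same idea (rows stored in the primary-key B-tree itself) and not Oracle's implementation. In Oracle the corresponding clause is ORGANIZATION INDEX. The dim_country table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def make_index_organized_dim():
    """Build a hypothetical search table whose rows live in the primary-key B-tree."""
    conn = sqlite3.connect(":memory:")
    # WITHOUT ROWID makes the primary key the clustering key, so a lookup on
    # the key prefix reads the qualifying rows straight out of the index;
    # there is no separate base table to visit afterwards.
    conn.execute("""
        CREATE TABLE dim_country (
            country    TEXT    NOT NULL,
            session_id INTEGER NOT NULL,
            PRIMARY KEY (country, session_id)
        ) WITHOUT ROWID
    """)
    conn.executemany(
        "INSERT INTO dim_country VALUES (?, ?)",
        [("US", 1), ("DE", 2), ("US", 3), ("US", 7)],
    )
    conn.commit()
    return conn


def qualifying_ids(conn, country):
    """Return the session IDs for one criterion via a range scan on the key prefix."""
    rows = conn.execute(
        "SELECT session_id FROM dim_country WHERE country = ? ORDER BY session_id",
        (country,),
    )
    return [sid for (sid,) in rows]


def query_plan(conn, country):
    """Return SQLite's plan text for the lookup; the wording varies by SQLite version."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT session_id FROM dim_country WHERE country = ?",
        (country,),
    ).fetchall()
    return [row[-1] for row in rows]
```

Calling `qualifying_ids(make_index_organized_dim(), "US")` returns the three US session IDs in key order, and `query_plan` lets you check how your SQLite build actually resolves the lookup.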
One advantage that I have with our application is it has a magic number I can use to short-circuit searches, but even without this magic number, my new approach is still far faster than a more "standard" approach using simple joins.
The final results are that my new approach is nearly four orders of magnitude faster than the existing schema, and is completely predictable. We're able to go after much bigger deals than before now that we have a more scalable schema.
Monday, December 01, 2008
The Car "Format Wars"
Everyone who owns DVDs or lived through Beta versus VHS knows all about Format Wars in the electronics industry. A similar problem is "related tech" wars, such as Plasma versus LCD in flat-screen TVs. And, as a consumer - at least if you're a cheapskate like me who expects that things that cost $hundreds or $thousands will be useful for several years at least - the sensible move is to wait until there are one or two clear technology winners that are well-understood and will be around awhile.
And one thing I often wonder about is whether we're seeing the beginnings of "format wars" reactions by consumers in the car industry. I suspect so - who wants to buy a Chevy sedan today if they can wait for the Volt (assuming GM lasts that long)? The credit crunch and the recent gas-price crunch are bad enough, but I suspect looming format wars aren't helping either.
Personally, I'm a techie who likes to buy "cool" stuff, but not enough to qualify as an early adopter. I want something that costs $20K+ to last for about fifteen years and 175K miles, as have my first three cars. I buy new, over-maintain, and drive it until the wheels fall off.
Where the format war issue comes in is I'm not interested in buying a "gas-only" car, and would like to "vote" for a good hybrid tech vehicle, but am hesitant about the various hybrid techs out there. I'd like a plug-in serial hybrid - at least - and would love to be able to buy a hydrogen car if one were available. The problem is there are way too many technology moving parts in hybrids for me to be interested in paying a premium for a vehicle which may need to be basically dumped if one element or another in the powertrain proves to need expensive replacement.
Also, will we have neighborhood mechanics who can fix these babies? Or do we have to go to the dealer, where everything beyond an oil change costs $1K?
And, ironically, most cars with ICE powertrains are so reliable nowadays that if they're decently maintained, they'll run nearly forever. (Note that this includes GM cars, as my 1993 Saturn demonstrated after running up 175K miles with little unscheduled maintenance before it was stolen (!) three years ago.)
Our eight-year-old Toyota and our three-year-old Honda should last awhile longer; I'm hoping they'll make it another half-decade or so while the powertrain format war plays out and we start to understand what happens when hybrids and new-style powertrain cars get old.
Saturday, October 04, 2008
Political discussions, trust, and logical argument
As a basically conservative "small-ell" libertarian in Silicon Valley, I'm surrounded by Obamaphiles - when I'm not encountering people further to the Left - who basically want to grind those who oppose them into the dust. Needless to say, it's hard to have a political conversation without being thrust into a situation where name-calling or other "categorization behavior" begins, and any hope that you can actually have an interesting discussion ends.
This is particularly painful for me, since I like to talk about politics and ideas, and have always felt that one's politics is informed by one's life experience, and that the religious view that one's politics are Right and one's opponents are Wrong is silly. In a large, complex world, policies will always be unsatisfying, inelegant muddles, and perspectives of small-state types like me and gung-ho, let's Use The Government to Solve Social Problems types like most honest progressives will be useful.
One thing I do try to do is to avoid logical fallacies in political discussions. This is hugely difficult, since political arguments are always rhetorical, and driven as much by personalities (Bush/Cheney/Rove is Hitler! Obama is a Commie!) as by any actual policy or philosophy discussion.
One other thing that's even harder to deal with is political humor. At the risk of appearing humorless, my feeling is that political humor is hugely rhetorical and manipulative, driven by stereotypes and logical fallacies buried behind a veneer of "trying to be funny", which makes it all OK.
Sorry, I'm not laughing. Political humor, which is the way many people - especially younger people - shape their political opinions these days, is a very serious business and drives a lot of the political tribalism that I find so dangerous.
Another dimension is the use of rhetoric as argument. One of the key points of logical argument is separation of the argument from the person making the argument, so that it is a fallacy to say that "Policy X is wrong because Bush/Rove/Obama advocated it" (a variant of ad-hominem) or its flip side, "Policy Y must be good because Really Smart Guy Z that I Really Like advocates it" (argument to authority).
In ideal logical argument, proper names are simply not used. It's all about the arguments and the fact base underneath the arguments.
The Question of Hypocrisy
The most seemingly powerful rhetorical argument one can make is the charge of hypocrisy. And it can't be denied that someone who makes one argument one day and puts a completely opposite argument on the table the next day for the purpose of political convenience is being a hypocrite and should lose credibility in the world of political punditry. But their arguments themselves still stand and should be refuted - or not - as arguments, and not simply discounted because they were advanced by a hypocrite.
I am well aware that logical argument is seen by many post-modernists, post-structuralists, critical theorists, etc., as being an invalid way to argue, because logical argument purposely ignores the reasons why someone advances an argument. But the "why" shouldn't matter! If the argument is invalid, it will be shown to be invalid by a better argument. It shouldn't matter whether the person making the argument is an Asian woman or a gay black man - or holds General Motors stock.
Once you toss logical argument, with its common set of rules and clear definitions of validity that are available to all sides of the argument, into the ditch, all that's left is a thousand variants of "might makes right". The Greeks figured this out 2500 years ago, and they're still right.
Saturday, August 23, 2008
The pronunciation of "Beijing"
As someone who spent nearly a year living in Beijing - and not being Chinese, although married to a wonderful lady from that part of the world - I've been annoyed at the tendency to pronounce "Beijing" like "Beige-Ing". It is (basically) "Bay-Jing", with the second syllable starting with a hard "J" like "Juice", versus the drawled "Ge" in "Beige".
One would think that Bob Costas would have figured this out by now...
Friday, August 22, 2008
On Software Tools
In my new job, I'm in a largely Java shop for the first time. I'll be doing a "query cache accelerator" for them fairly soon, which will be done in C...
Anyway, one thing that I'm now highly exposed to is the Java world's proliferation of software tools. As a guy who figured that 20+ year old tools like make, vi, gdb, gprof, and purify are the cat's meow, it's hard to deal with tools that change every six months. Also, I hate wasting time learning the fiddly idiosyncrasies of yet another bunch of gooey-licious tools, whose main "advantage" over standbys like make is primarily their GUIness.
But, there we are. And I'm stuck with them, I suppose. The one thing I'm insisting on is that we pick a suite of tools, do all the customization we need to do, and stick with them and not change the world every few months as new tools appear.
I've always figured that software tools are like lawyers: you need to know a few good ones well for various purposes, but you don't want them to get in the way of living your life or doing your job.
Wednesday, August 06, 2008
Thoughts on "Coding Interviews"
In my job search, I was asked to write code in a couple of interviews. Personally, I don't like asking coding questions in interviews, and if I have a question about a programmer's ability, I'll use a simple programming test that's emailed to them a couple of days before - and ask them to explain their code in the interview.
For me, the problems with interview code questions are the following:
1. It's a completely weird environment. No computer, no compiler, and extremely vague requirements. Few programmers do well in this environment.
2. Lots of room for silly "gotcha" pickiness that has little to do with actual programming skill.
3. The programming problem often involves too much code for a few minutes.
As an interviewee, my strategy for programming questions was the following:
0. Get the requirements, and state any assumptions. If you're wondering about a definition or something, go ahead and ask - it's better to do this than to get "gotcha'ed". Frankly, I also like to use the requirements discussion to "run out the clock" so that the coding lasts only a few minutes.
An aside: if the interviewer gets impatient at this point, they're probably not very good programmers themselves, or they're very junior.
1. Set up the data structures.
2. Discuss the main modules/classes/APIs you expect to use. Don't get hung up on the details of library calls or whatever; in real life, you'd look these up anyway, so go ahead and say so.
3. Only code the "top level" function that implements the main algorithm, using the data structures and APIs you outlined above. Bury stuff that's tedious and that takes a lot of code behind a module that you discuss but don't write, unless the interviewer insists.
Also, up front, state that you're explicitly skipping error conditions and error handling, but discuss how you'd code error handling if you were doing this "for real".
4. If you get gotcha'ed, go ahead and make the corrections. Don't get flustered.
If you expect to have this sort of "live programming" asked in an interview, it may be useful to have a practice session where you sit in front of a "friendly" and write some code on a piece of paper so you get used to the environment.
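As an illustration of steps 1 to 3, here is a minimal sketch of what might end up on the whiteboard. The interview question ("return the N most frequent words in a piece of text"), the function names, and the choice of which helper to leave unwritten are all hypothetical, picked only to show the shape of the answer.

```python
from collections import Counter

# Step 1: the data structure is a Counter mapping each word to its count.


def tokenize(text):
    """Helper you would describe but not write in the interview.

    A deliberately naive version is included here only so the sketch runs.
    """
    return text.lower().split()


def top_words(text, n):
    """Step 3: the top-level function, which is the only part you code live.

    Error handling (n < 0, non-string input) is skipped on purpose; say so
    up front and describe how you would handle it for real.
    """
    counts = Counter(tokenize(text))
    # Sort by descending count, breaking ties alphabetically so the result
    # is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

The point is the proportion: a few lines for the main algorithm, with the tedious part (real tokenization, punctuation, Unicode) pushed behind a helper that gets discussed rather than written.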
Technology observations from my job search
For all that job searching is a pain in the butt, it gives you a chance to see what companies are doing and where their "pain points" are. Some observations from my search:
1. Companies are drowning in data, particularly small ones that can't afford to build a fancy glass house for racks of servers running Oracle. There's a golden opportunity here.
2. A few players are going after this opportunity, but they seem focused on the top of the market. This is probably reasonable from a business perspective, but my impression is that bowie-knives.com needs high performance stuff as much as Google.
3. The DB market may well end up splitting in two: high-performance query answering and search, and highly reliable archival and storage. The latter interferes with the former, and several places where I was interviewing definitely want high-performance query answering. (I'm building one of these puppies for my new company.)
4. There is lots of other movement going on in databases. With the end of Moore's Law, and data and the need for high-speed data processing growing quickly, new db architectures are appearing. Several companies are building various forms of "database appliances", and some new hardware stuff may help here.
5. Programming is about to get a whole lot harder as computer hardware changes intrude on programming in a way they haven't for at least 20 years. This is good for us old fogies who like hard "edge condition" problems, but Joe Java, who lives in a world where abstractions hide all the fun stuff, may have a hard time adjusting.
I saw this in a couple of companies: a bunch of smart young guys, steeped in the latest languages and alphabet-soup "skills", wondering how to get a factor of 100 performance speedup in their complex application without buying 100x more hardware. It can be done, but commodity approaches probably won't do it.