2010 (old posts, page 11)

• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Microsoft data center pictures

A post on Gizmodo has some good pictures and a short video of the container-based approach for housing machines in some Microsoft data centers, such as the ones used to run Bing.

Working on the software side of cloud services, it is at times easy to forget how much hardware, infrastructure, and sophistication exist in these mega deployments. While it's important to understand the characteristics of the deployment environment when building applications, it's also remarkable how easy it is to deploy code at large scale onto thousands of machines without even knowing which state they're physically in.

I do wonder whether one day we'll look at these huge data centers and find them quaint relics of the past, a throwback to when there were efficiency gains to be had by physically consolidating the weak computing power of the day. That day is still a good way away.

The Microsoft Datacenters blog has some interesting longer posts on the topic. The Global Foundation Services site also has some more in-depth whitepapers.

• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Result counts for search phrases

I liked this week's comic over at xkcd titled Google results for various phrases:

Using result counts in this manner isn't exactly scientific, as in most cases they're only estimates rather than exact numbers. However, for fun like this it's probably good enough, especially when comparing among similar queries.

Value parameter sweeps (my IQ is <X>) are only one way of looking at this. Performing a similar analysis against time with query volume, published document count, visit counts, or a combination of all of these can be similarly enlightening. Unfortunately, the whole technique is somewhat dependent on the reliability of the underlying data. As the data implies the typical internet citizen has an IQ of 147, there's definitely some bias in here somewhere, but of course the results of an in-person verbal survey might be similarly skewed if not validated in some other way.
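As a rough illustration of what a value sweep boils down to, here's a minimal sketch that takes the estimated result count for each value of X and reports the count-weighted average and the most common value. The function name and the sample counts are made up for illustration; they aren't taken from the comic or from any real search engine, and no search API is called.

```python
def summarize_sweep(counts):
    """Summarize a value sweep given {value: estimated_result_count}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one positive result count")
    # Each value is weighted by how many results claimed it.
    weighted_mean = sum(value * n for value, n in counts.items()) / total
    # The value with the largest result count.
    mode = max(counts, key=counts.get)
    return {"total": total, "weighted_mean": weighted_mean, "mode": mode}


# Made-up counts purely for illustration; not real search data.
example_counts = {100: 200, 120: 300, 140: 500}
summary = summarize_sweep(example_counts)
print(summary)
```

Since the counts are only estimates, the output is best read as a comparison between similar queries rather than as a measurement.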

There's a post over at Google Operating System titled Data Mining Using Google which has a clever mashup with Google Spreadsheets too.
• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Is there really room for many location-based networks?

I've been 'playing' Foursquare on and off for about six months now, so it's interesting to see it gaining much more visibility among the people I see on Facebook and Twitter. Since Beijing support was only added a couple of months ago, it's still like the Wild West here, with few mayors and much opportunity to stake claims, and that's kept me engaged.

Meanwhile, in a recent post, Dare Obasanjo talks about being more cautious with accepting friend requests due to the real-world privacy implications. That got me thinking about the overall growth potential of the multiple players already here (Gowalla, Foursquare, Loopt and others) and whether Metcalfe's law plays the same role here as in other social networks. With an increased sensitivity to privacy, perhaps it's sufficient to just be a part of the network of people I'm likely to have some reason to share my location with rather than the network that contains everyone. That could set up an equilibrium with multiple services rather than an inevitable winner-takes-all situation. As a result, it could look similar to the fragmentation of social networking sites across country boundaries (e.g. where different networks are popular in different regions) except at a smaller scale such as city, social group or a combination of the two.
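To put rough numbers on the winner-takes-all intuition, here's a small sketch. It assumes the common reading of Metcalfe's law, in which a network's value is proportional to the number of possible pairwise links between its users, and it assumes users split evenly across services. The function names and the user counts are hypothetical.

```python
def metcalfe_value(users):
    """Metcalfe-style value: the number of possible pairwise links."""
    return users * (users - 1) // 2


def fragmented_value(total_users, networks):
    """Total value when users split as evenly as possible across services."""
    per_network, remainder = divmod(total_users, networks)
    sizes = [per_network + (1 if i < remainder else 0) for i in range(networks)]
    return sum(metcalfe_value(size) for size in sizes)


# Hypothetical population of 1000 users: one service versus four.
single = metcalfe_value(1000)
split = fragmented_value(1000, 4)
print(single, split, single / split)
```

Under that assumption, four equal services together are worth roughly a quarter of one combined service, which is the usual pull towards a single winner. If the value I get comes only from the handful of people I'd actually share my location with, then the size of the rest of the network hardly matters and that pull is much weaker.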

Overall, though, I have a hard time believing that there's room for a parallel set of location-based social networks against the backdrop of well-established social networks. A well-designed and neatly integrated feature in Facebook (for example), with sufficient controls to declare which subset of friends can see your location, seems likely to dominate by momentum alone.

• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Education data mining and visualization

From a post titled Education Data Mining and Visualization by David Wiley at BYU:

The first visualization we’ve developed is one we call the “Waterfall.” The vertical axis represents students’ final grades (higher final grades at the top). The horizontal axis represents time, with each cell representing a day in the semester. Each individual row represents an individual student. Finally, the darkness of the water droplet represents the amount of time that student spent that day completing gradable activities.
The result is a lot of information on a specific measure densely packaged into a single glance.

The correlation between time spent (effort, in other words) and final grade is clearly evident, but there's another interesting part of the chart. Around the 75% to 85% final grade mark there's a band of students who seem to put in consistently less effort overall than the clusters of students above and below them, yet still end up with a decent grade. I wonder what characteristics those students possess and whether there are any patterns in the paths their lives then take.
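The layout described in the quote (one row per student ordered by final grade, one cell per day, darker for more time spent) can be sketched in a few lines. This is only my reading of the description, not the code behind the original chart; the record fields, the text shading and the sample students are all made up for illustration.

```python
SHADES = " .:*#"  # lightest to darkest


def build_waterfall(students, num_days):
    """Return (name, daily_minutes) rows, highest final grade first.

    Each student is a dict with hypothetical keys "name", "final_grade"
    and "minutes_by_day" ({day_index: minutes on gradable activities}).
    """
    ordered = sorted(students, key=lambda s: s["final_grade"], reverse=True)
    return [
        (s["name"], [s["minutes_by_day"].get(day, 0) for day in range(num_days)])
        for s in ordered
    ]


def render_waterfall(rows):
    """Render each row as text, scaling darkness to the busiest day overall."""
    peak = max((m for _, minutes in rows for m in minutes), default=0) or 1
    lines = []
    for name, minutes in rows:
        cells = "".join(SHADES[m * (len(SHADES) - 1) // peak] for m in minutes)
        lines.append(f"{name:>6} |{cells}|")
    return lines


# Made-up students purely for illustration.
students = [
    {"name": "A", "final_grade": 72, "minutes_by_day": {0: 10, 2: 20}},
    {"name": "B", "final_grade": 95, "minutes_by_day": {0: 40, 1: 30, 2: 40}},
]
rows = build_waterfall(students, 3)
print("\n".join(render_waterfall(rows)))
```

Scaling against a single overall peak, rather than per student, keeps the darkness comparable between rows, which is what makes a lighter band at a given grade level stand out.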

• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Daily photo: Sandstorm in Beijing

Sand storm in Beijing

I haven't made any adjustments to this photo. It really was this yellow at 10am this morning during a sandstorm.

Happy first day of Spring!

• −    − •    − • •    − • − −     − − −    • −    • − •    • − • •    •    − • − −

Nobody buys gold for the price of silver

I finally got around to reading an interesting paper titled Nobody Sells Gold for the Price of Silver: Dishonesty, Uncertainty and the Underground Economy.

The finding is quite interesting and, on reflection, quite plausible. The abstract follows (emphasis is mine):

Using basic arguments from economics we show that the IRC markets studied [trading stolen identities, botnets, etc.] represent classic examples of lemon markets. The ever-present rippers who cheat other participants ensure that the market cannot operate effectively. Their presence represents a tax on every transaction. Those who form gangs and alliances avoid this tax, enjoy a lower cost basis and higher profit. This suggests a two tier underground economy where organization is the route to profit. The IRC markets appear to be the lower tier, and are occupied by those without skills or alliances, newcomers, and those who seek to cheat them. The goods offered for sale on these markets are those that are easy to acquire, but hard to monetize. We find that estimates of the size of the IRC markets are enormously exaggerated. Finally, we find that defenders recruit their own opponents by publicizing exaggerated estimates of the rewards of cybercrime. Those so recruited inhabit the lower tier; they produce very little profit, but contribute greatly to the externalities of cybercrime.