Thursday, December 18, 2008

California Immigration

I've uploaded the US Census demographic data to Good Data and I can't stop wondering. For example, would you believe that a good one-fourth of the California population are immigrants? More precisely, 26.2% of Californians were born outside of the US.




Do you want to start wondering too? Let me know (zd at gooddata.com) and I'll invite you to this analytic project.

Wednesday, November 5, 2008

My Good Data Web Expo Slides

See the slides that I presented at the WebExpo conference.

Good Data REST API

The Good Data BI platform is accessible through a stateless REST API. This HTTP-based API can easily be used from any 3rd party application as well as from a plain browser. The API exposes the full power of our platform (we actually use it as the backend for our web frontend).
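
To give a flavor of how simple "from any 3rd party application" can be, here is a minimal sketch in Python. The host and the /gdc path come from the demo mentioned later in this post; the Accept header and the JSON response shape are assumptions for illustration, and a real session would authenticate first (see the LOGIN hint below).

    # Minimal sketch of poking the stateless REST API over plain HTTP.
    # The host and /gdc path mirror the demo URL in this post; the response
    # structure is illustrative, not the documented contract.
    import requests

    response = requests.get("http://demo.gooddata.com/gdc",
                            headers={"Accept": "application/json"})
    response.raise_for_status()
    print(response.json())  # the list of services the platform advertises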

In fact, the Good Data application consists of a handful of service types. Instances of these services can be dynamically added or removed (via simple HTTP load balancing) on an as-needed basis. Add the Amazon EC2 cloud, which allows us to add or remove a machine and pay only for the CPU ticks that we really use. The net result is great flexibility, scalability and cost efficiency.

The demo video below points out the fundamental architecture differences between our approach and that of some other on-demand BI vendors who simply deployed an existing BI package (e.g. Pentaho or MS Analytics) on the web (which unfortunately does not prevent their marketing from using the multi-tenant, SaaS mumbo jumbo).

This video might help you better understand the Good Data architecture. I apologize for the lack of audio; hopefully the simple step-by-step description below helps:

1. Appending the /gdc suffix to the GDC BI platform URL shows the list of REST API services that the platform provides.

2. Then we navigate to the metadata services that manage metadata for a selected BI project (the FoodMartDemo in our case).

3. We first show the FULL-TEXT SEARCH service. We specify the search term ("sales") directly in the service's URL. The list of matching results is shown.

4. We select one of the reports from the search result to inspect the report's definition. We can spit out the definition in many formats (e.g. JSON, YAML, ATOM, or XML). We use YAML as the default.

5. Then we demonstrate the metadata QUERY service. We list all reports in the FoodMartDemo project. We again inspect one of the reports: Salary by Year and State.

6. Then we demonstrate the USING service, which shows all dependencies of the report (metadata objects that the selected report references). For example, the report depends on its definition (reportDefinition) object. We copy and paste the link of the report definition into the browser URL bar to inspect the report definition object's structure. It contains all attributes and metrics that the report displays (all inner objects have their own URLs too, so we could continue investigating them).

7. Then we navigate to the XTAB service. XTAB can execute and cross-tabulate (or pivot, if you like) the report's definition. We supply the report definition URL and it spits out the representation of the report result (you can see the machine representation of the report's data). Notice the asynchronous processing here.

8. Then we go back to the original report Salary by Year and State. The report contains a reference to its result.

9. We copy and paste the result's URL to the EXPORTER service, which returns (again asynchronously) the report result's data in MS Excel format.

If you have a Good Data platform demo account, you can try this script yourself at http://demo.gooddata.com/gdc (hint: you'll need to take a look at the LOGIN service).
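
If you prefer code to clicking, the whole walkthrough can also be scripted. A rough Python sketch follows; apart from /gdc itself, every path and payload below is an assumption inferred from the steps above rather than the documented API, so treat it as a starting point and adjust it to whatever /gdc actually advertises.

    # Rough sketch of the step-by-step demo above. Except for /gdc itself,
    # the paths and payloads are assumptions inferred from the walkthrough,
    # not the documented API.
    import requests

    HOST = "http://demo.gooddata.com"
    session = requests.Session()

    # Hint from the post: authenticate against the LOGIN service first
    # (credentials and payload shape are placeholders).
    session.post(HOST + "/gdc/account/login",
                 json={"login": "user@example.com", "password": "secret"})

    # 1. List the REST API services the platform provides.
    services = session.get(HOST + "/gdc",
                           headers={"Accept": "application/json"}).json()

    # 3. FULL-TEXT SEARCH: the search term ("sales") goes directly in the URL.
    hits = session.get(HOST + "/gdc/md/FoodMartDemo/search/sales",
                       headers={"Accept": "application/json"}).json()

    # 4./6. Follow a link from the result to inspect a report definition.
    report = session.get(HOST + hits["entries"][0]["link"],
                         headers={"Accept": "application/json"}).json()

    # 9. Hand the result's URL to the EXPORTER service; it answers
    #    asynchronously, so you would poll the returned link for the XLS file.
    export = session.post(HOST + "/gdc/md/FoodMartDemo/exporter",
                          json={"result": hits["entries"][0]["link"]})
    print(export.status_code)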

New Good Data Website!

Check out the new Good Data website.

Thursday, October 9, 2008

Deadly iPhone screen


Now I know why one would want to set up multiple wake-up alarms for a single night. My "beloved" iPhone is the only device that we currently have at home capable of doing this. So it's Marimba, diapers, milk, burp. Twice every three hours. :-(

Ray has entered the blogosphere!

Ray Light, the guy who sits next to me at Good Data, has started blogging. I know that he will be damn good at it. You can start following him at http://www.collaborativeanalytics.com/.

Monday, September 29, 2008

Cloud Transparency

Radovan wrote a comment on Werner's article about cloud transparency. I agree that operators need monitoring and developers want to optimize their apps. However, I don't understand how these needs relate to (location) transparency. I think this is rather about the functionality (e.g. OOTB performance and the right management tools) that the cloud offers.

I see a little schizophrenia in the AWS messaging. "We (AWS) want developers to play with all the nuts and bolts, optimize, monitor, and trace at the network packet level. And when the code jumps into our queuing, SimpleDB, payment or whatever other high-level service, then forget all transparency, close your eyes and cross your fingers." :-)

I think that different developers have different needs in terms of the right transparency level. IMO AWS is heavily used by web developers. I believe that particularly web developers will lean towards higher transparency, packaged high-level services and easy deployment.

Radovan, what do you think about the Google AppEngine?

Ema & Anna

Erika gave birth to Ema and Anna last Thursday. You can see a few very low-quality (iPhone) photos of our newborns on my Flickr.

Tuesday, July 29, 2008

The Good Data Public Beta is Out!

Go to our website and sign up. And please remember that we would love to hear from you via our GetSatisfaction forum. Please let us know what you think and help us create a vital community around our product. Thanks!

Thursday, July 24, 2008

Good Data Beta is Close!

I just noticed this article. The UI screenshots are a bit outdated. Yep, 3 months is deep history in SaaS. :-)

Stay tuned, we are close to our first public beta release. Hopefully we will release the public beta next week. If you can't wait and promise me tons of feedback, let me know at zd at gooddata.com. Perhaps I can let you in sooner ;-)

Friday, June 6, 2008

Does SAAS-BI make sense?

I recently came across this interesting article on the poeticcode blog. In short, its author claims that BI as a service (SAAS-BI) does not make sense because of the difficulties with incremental data loading and higher network bandwidth requirements.

I do not want to dispute here the difficulty of incremental data warehouse loading. It is difficult in certain situations. It is not that difficult in others, where the nature of the data helps solve it in elegant ways. Moreover, incremental loads are necessary for many BI applications no matter whether they are deployed in-house or as a hosted service. This integration "baseline" (from the labor and cost standpoint) is the same for both in-house and SAAS. SAAS then removes the labor and costs associated with implementing and maintaining the data warehouse and analytical stack. Multi-tenancy leads to improved HW utilization. These two factors alone lead to huge savings and a far better return on investment.
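
To make the "elegant ways" concrete: when the source rows carry a reliable modification timestamp, the incremental load boils down to a watermark. A minimal Python sketch (the table and column names are made up for illustration):

    # Minimal sketch of watermark-based incremental loading. Table and column
    # names are hypothetical; the point is that with a reliable "modified_at"
    # timestamp, the load reduces to "insert everything newer than the
    # watermark of the previous load".
    import sqlite3

    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)")

    def load_increment(source_rows, watermark):
        """Insert only the rows modified after the previous load's watermark."""
        fresh = [r for r in source_rows if r["modified_at"] > watermark]
        conn.executemany(
            "INSERT OR REPLACE INTO fact_sales (id, amount, modified_at) "
            "VALUES (?, ?, ?)",
            [(r["id"], r["amount"], r["modified_at"]) for r in fresh])
        conn.commit()
        # The new watermark is the newest timestamp we have seen so far.
        return max((r["modified_at"] for r in fresh), default=watermark)

    # Only the second row is newer than the watermark, so only it gets loaded.
    rows = [{"id": 1, "amount": 10.0, "modified_at": "2008-06-01"},
            {"id": 2, "amount": 20.0, "modified_at": "2008-06-05"}]
    print(load_increment(rows, watermark="2008-06-03"))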

The Internet network connection bandwidth argument is ridiculous: "Not to mention, all the extra network bandwidth needed to encrypt and transfer the data from your data center to the SAAS-BI data center. That also means factoring your internet pipes for much more peak-bandwidth else your potential customers visiting your corporate website might have network problems and worse, you might lose sales and loyal customers."

This argument does not even apply to my naively and quickly implemented FAN (Family Area Network). :-)

Summary: It is certainly good to know about the data integration issues associated with (SAAS) BI. You should not overlook them. However, I do not believe that these issues are the key decision points; those are cost of ownership, risk management, return on investment, implementation time, etc.

Friday, April 11, 2008

See Gooddata live!

We posted the first Gooddata product video on our website. Please bear in mind that we are sharing it early. It is not even alpha. Check it out and let us know what you think.

Wednesday, March 19, 2008

Just-in-Time vs. Just-in-Case BI Costs

Do you know how much power your BI really needs? More precisely, how much power it needs today at 9 AM, next weekend, and on the last day of the quarter or year? Have you bought the ultra-super-duper machine that handles even the highest usage spikes with ease? Or have you decided to sacrifice performance during these peak hours? Do you wait or waste?

The Gooddata approach to this dilemma can be described with two keywords: Stateless & Virtualized.

Stateless is about our architecture. Our product relies on six generic stateless services. Statelessness is important for scalability: we can dynamically add instances of any of the six generic services whenever we need to increase the throughput of our BI platform.

Virtualized is how these services are deployed. Virtualization allows us to flexibly add hardware nodes to our computing cloud. We have images of different virtual nodes on hand. We can create a new node and dynamically add it to our computing cloud. The beauty is that all of this can happen in just a few minutes. And decommissioning such a node is even faster.

We (and you, as our customer) pay for CPU ticks and storage, so Stateless & Virtualized gives you unmatched cost efficiency. Gooddata offers you access to virtually unlimited computing resources. You can get as much CPU, storage and network bandwidth as you need. And you pay only for what you really consume.
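
A toy illustration of the difference, with made-up numbers: size the platform for the worst hour and you pay for idle capacity all day; add and remove identical stateless nodes as the hourly load changes and you pay only for the node-hours actually consumed.

    # Toy illustration of just-in-time vs. just-in-case capacity. All numbers
    # are made up; the point is that stateless, virtualized nodes let you pay
    # for node-hours actually used instead of provisioning for the worst hour.
    import math

    hourly_load = [2, 3, 8, 40, 35, 12, 4, 2]  # hypothetical requests/sec
    NODE_CAPACITY = 10                         # requests/sec per stateless node
    PRICE_PER_NODE_HOUR = 0.10                 # hypothetical dollars

    # Just-in-case: size for the peak hour and keep that capacity all day.
    peak_nodes = math.ceil(max(hourly_load) / NODE_CAPACITY)
    just_in_case = peak_nodes * len(hourly_load) * PRICE_PER_NODE_HOUR

    # Just-in-time: add or remove nodes every hour to match the actual load.
    just_in_time = (sum(math.ceil(load / NODE_CAPACITY) for load in hourly_load)
                    * PRICE_PER_NODE_HOUR)

    print("just-in-case: $%.2f, just-in-time: $%.2f"
          % (just_in_case, just_in_time))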

Pay for your BI project on a Just-in-Time, not a Just-in-Case, basis.

Friday, March 14, 2008

Gooddata Collaborative Analytics

The US dollar exchange rate has been going steadily down for more than 5 years. This works for those of us who buy our gadgets in the US. We use the "Zasilkova Sluzba" service to deliver stuff to our doorstep in the Czech Republic.

However, the declining dollar doesn't make us happy as entrepreneurs. It makes all the resources that we buy in Europe more expensive. We sell our products for dollars and buy our resources mostly for Czech crowns, so we get less and less bang for one dollar. I mean 100% less bang since 2002.

Fortunately, we are the "analytics guys" so we can predict the future and actively hedge ourselves against the declining US currency. Our latest research shows that the US dollar reaches zero sometime around 6/18/2012.



Knowing this, we can optimize our financial operations and get the most out of it. Tons of gadgets plus nice access to local resources.

DOES THIS RING A BELL? Have you seen such a "great" analysis before? The numbers are right; their interpretation is absurd. That is why we focus on collaborative analytics. Collaboration works great in such cases. You bet that I'm going to see a bull*it tag and some lampooning comments regarding my brainpower a few seconds after I publish such an analysis to the Gooddata platform. The product separates the wheat from the chaff using collaboration capabilities like tagging, rating and commenting.
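
For the curious, the "research" above is nothing more than fitting a straight line through the exchange-rate history and reading off where it crosses zero. A toy reproduction in Python (the yearly rates are made up for illustration; the absurd conclusion, not the data, is the point):

    # Toy reproduction of the "dollar reaches zero" analysis: fit a straight
    # line through the exchange-rate history and extrapolate the zero crossing.
    # The yearly rates are made up; blind extrapolation is the real culprit.
    years = [2002, 2003, 2004, 2005, 2006, 2007, 2008]
    czk_per_usd = [38.0, 32.0, 29.0, 26.0, 24.0, 21.0, 17.0]  # hypothetical

    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(czk_per_usd) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, czk_per_usd))
             / sum((x - mean_x) ** 2 for x in years))
    intercept = mean_y - slope * mean_x

    # Where does the fitted line hit zero? The early 2010s for these numbers.
    print("USD 'reaches zero' around", -intercept / slope)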

P.S.: Roman, I apologize for stealing your idea with the dollar analysis. I couldn't resist. In the end, this is also a bit about collaboration. You invented and designed it, and I just implemented it. :)

Tuesday, March 11, 2008

World Wide Telescope

I have always wanted to explore the night sky. I'll never forget an August night when I watched falling stars with my father. We lay on a haystack and the millions of stars seemed so close. I was about seven years old.

Recently I decided to buy a telescope and started talking about this with my friends. One of them told me about the World Wide Telescope project from Microsoft. I was surprised that they didn't start with a World Wide Microscope first, but anyway. You can check it out at http://www.worldwidetelescope.org/. The project was recently announced at TED and is promised to launch in spring 2008. Isn't it spring already? ;)

Friday, March 7, 2008

Gooddata Academy Awards

Yesterday, Roman showed me his personal AMEX card management application that allows him to download his credit card transaction history. I started thinking about personal business intelligence at that very moment. One of our first Gooddata project templates should focus on loading this data and analyzing it upside down. A nice help for people who want a bit more than just the few standard charts that they get from AMEX. This one even beats the website log enrichment and analysis project idea that was my favorite until yesterday.

I bet that there are zillions of similar situations and formats. The question is: which one is the best theme for the soon-to-be-released Gooddata tutorial? Do you have a nice idea? Send it my way (zd at gooddata dot com). The Gooddata Academy will evaluate your nomination. A cute little iPod nano is waiting for the winner.

You should definitely participate! You might hate such contests. However, you certainly don't want me to give yet another iPod nano, pico, video, whatever to Roman for the AMEX idea. ;)

Wednesday, March 5, 2008

RUP vs Agile

Gooddata is growing. I see new faces in our office every Monday. The Gooddata "legacy" is often under fire. Our new colleagues come from many different environments and bring new ideas and perspectives. That's how we got to the RUP vs Agile discussion.

We practice agile development (scrum) at Gooddata. We have just proudly started our fifth sprint. Our mantra is design, build & refactor. Iterate quickly until it is done. And suddenly there is a new Yam (yet-another-Martin in our office) who talks about the good old Rational Unified Process. Analysis, design, implementation, deployment. Requirements, use cases, class and sequence diagrams, logical and physical models, etc.

So where is the truth? Where does Agile meet RUP? How do we marry the concepts of these two worlds? I believe that RUP tells us WHAT we need to deliver (use cases, diagrams, models, schemas, code and deployments) and Agile shows HOW to deliver all of the above. The essential pieces that we can't live without need to go into the first iteration of the RUP delivery cycle. All the rest follows in subsequent iterations. If an iteration crashes because of new findings, we simply restart it. If we need to dump everything we have done in a previous iteration, we dump it. If we keep the iterations very short, we are not going to regret such losses. And we need to realize that we do not need to go all the way from use case to deployment in every iteration. The first iteration can end up with a half-baked command-line script that fakes some functionality rather than a fully fledged component with all the bells and whistles. We can get to deployment, the REST API, or whatever else it needs later.

I believe that we can have the best of both worlds. Let's blend the proven methodology, useful deliverables and tools from RUP with the agility, flexibility and efficiency that come from Agile. Quickly define the Gooddata way, and start using it knowing that we are going to refactor it a million times in the Gooddata future. :)