Tuesday, January 27, 2009

"Oranzova vznikne kdyz se nachcije do rudy." ("Orange is what you get when you piss into red.")



A Czech proverb that is very pertinent to the current leaders of the Czech Social Democratic Party, Dr. Rath and Mr. Paroubek (not your average Obama). They are invoking Hitler's way out of the economic crisis. Unbelievable! I feel really ashamed for them and can only hope that Czechs won't be so stupid as to let them back into our government.

P.S.: I still can't believe I blogged about Czech politics.

Setting Amazon S3 ACLs Programmatically

I recently wrote a simple Python script for setting Amazon S3 ACLs. You might find it useful.

Usage:

setS3acl.py -a $AWS_ACCESS_KEY -s $AWS_SECURE_KEY -b "myBucket" -f myfile.txt -o $OWNER_AWS_ID "$USR1_AWS_ID:FULL_CONTROL" "$USR2_AWS_ID:READ"

Options:

-a - your AWS access key
-s - your AWS secret access key
-b - AWS bucket name (no s3:// prefix and no slashes)
-f - file name (no s3:// or bucket prefix, just the file name)
-o - file owner (AWS ID)

Parameters:

[AWS_ID:FULL_CONTROL|READ|WRITE]

You'll need to use AWS IDs to identify the users (the long and ugly Amazon account identifiers, e.g. a382d287d4d58222758254ddebac103f70e6f5b).
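
For illustration only, here is a minimal sketch of how the AWS_ID:PERMISSION parameters could be parsed and turned into an ACL document. It is not the code of setS3acl.py; the function names parse_grant and build_acl_xml are hypothetical, and the XML layout is my understanding of the S3 AccessControlPolicy format, so check it against the S3 documentation before relying on it.

```python
from xml.sax.saxutils import escape

# Permissions accepted by the command line described above.
VALID_PERMISSIONS = {"FULL_CONTROL", "READ", "WRITE"}


def parse_grant(arg):
    """Split 'AWS_ID:PERMISSION' into a (grantee_id, permission) tuple."""
    grantee_id, sep, permission = arg.rpartition(":")
    if not sep or not grantee_id or permission not in VALID_PERMISSIONS:
        raise ValueError("expected AWS_ID:FULL_CONTROL|READ|WRITE, got %r" % arg)
    return grantee_id, permission


def build_acl_xml(owner_id, grants):
    """Build an ACL document for one owner and a list of (id, permission) grants."""
    parts = [
        '<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">',
        "<Owner><ID>%s</ID></Owner>" % escape(owner_id),
        "<AccessControlList>",
    ]
    for grantee_id, permission in grants:
        parts.append(
            '<Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
            'xsi:type="CanonicalUser"><ID>%s</ID></Grantee>'
            "<Permission>%s</Permission></Grant>" % (escape(grantee_id), permission)
        )
    parts.append("</AccessControlList></AccessControlPolicy>")
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder IDs; real AWS IDs are long hexadecimal strings.
    grants = [parse_grant(a) for a in ["user1-id:FULL_CONTROL", "user2-id:READ"]]
    print(build_acl_xml("owner-id", grants))
```

The resulting document would then be sent to S3 as the ACL of the file; that upload step is left out here because it depends on the S3 client library you use.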

There are two Python files that you'll need:

* setS3acl.py
* S3.py

Look at the URL to find out where I store the scripts ($0.15 / GB / month) ;-)

Enjoy!

Wednesday, January 21, 2009

Thursday, December 18, 2008

California Immigration

I've uploaded the US Census demographic data to Good Data and I can't stop wondering. For example, would you believe that a good one fourth of California's population are immigrants? More precisely, 26.2% of Californians were born outside of the US.




Do you want to start wondering too? Let me know (zd at gooddata.com) and I'll invite you to this analytic project.

Wednesday, November 5, 2008

My Good Data Web Expo Slides

See the slides that I presented at the WebExpo conference.

Good Data REST API

The Good Data BI platform is accessible through a stateless REST API. This HTTP-based API can easily be used from any 3rd party application as well as from a plain browser. The API provides the full power of our platform (we actually use it as the backend for our web frontend).

In fact, the Good Data application consists of a handful of service types. Instances of these services can be dynamically added or removed (via simple HTTP load-balancing) on an as-needed basis. On top of that, the Amazon EC2 cloud allows us to add or remove machines and only pay for the CPU ticks that we really use. The net result is great flexibility, scalability and cost efficiency.

The demo video below points out the fundamental architectural differences between our approach and that of some other on-demand BI vendors who simply deployed an existing BI package (e.g. Pentaho or MS Analytics) on the web (which unfortunately does not prevent their marketing from using the multi-tenant, SaaS mumbo jumbo).

This video might help you to better understand the Good Data architecture. I apologize for the lack of audio. Hopefully the simple step-by-step description below helps:

1. The /gdc suffix in the GDC BI platform URL shows the list of the REST API services that the platform provides.

2. Then we navigate to the metadata services that manage metadata for a selected BI project (the FoodMartDemo in our case).

3. We first show the FULL-TEXT SEARCH service. We specify the search term ("sales") directly in the service's URL. The list of matching results is shown.

4. We select one of the reports from the search result to inspect the report's definition. We can spit out the definition in many formats (e.g. JSON, YAML, ATOM, or XML). We use YAML as the default.

5. Then we demonstrate the metadata QUERY service. We list all reports in the FoodMartDemo project. We again inspect one of the reports: Salary by Year and State.

6. Then we demonstrate the USING service, which shows us all dependencies of the report (the metadata objects that the selected report references). For example, the report depends on its definition (reportDefinition) object. We copy and paste the link of the report definition to the browser URL bar to inspect the report definition object structure. It contains all attributes and metrics that the report displays (all inner objects have their URLs too, so we could continue investigating them).

7. Then we navigate to the XTAB service. The XTAB can execute and cross-tabulate (or pivot if you like) the report's definition. We supply the report definition URL and it spits out the representation of the report result (you can see the machine representation of the report's data). Notice the asynchronous processing here.

8. Then we go back to the original report Salary by Year and State. The report contains a reference to its result.

9. We copy and paste the result's URL to the EXPORTER service that returns (again asynchronously) the report result's data in MS Excel format.

If you have the Good Data platform demo account, you can try this script yourself at http://demo.gooddata.com/gdc (hint - you'll need to take a look at the LOGIN service).
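
The asynchronous processing in steps 7 and 9 can be handled by a client with a simple polling loop. The sketch below is a generic pattern, not the Good Data client: it assumes the service answers with HTTP status 202 (Accepted) while the result is still being computed, which this post does not confirm, and the name poll_until_ready and the fetch callable are hypothetical. The HTTP call is passed in as a function so the loop itself stays independent of any particular HTTP library.

```python
import time


def poll_until_ready(fetch, url, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch(url) until it stops reporting 'still processing'.

    fetch must return a (status_code, body) tuple. A status of 202 is
    treated as 'not ready yet' (an assumption about the service).
    """
    for _ in range(max_attempts):
        status, body = fetch(url)
        if status != 202:
            # Anything other than 202 is final: a result or an error.
            return status, body
        sleep(interval)
    raise TimeoutError("no result from %s after %d attempts" % (url, max_attempts))


if __name__ == "__main__":
    # Fake fetch that is 'busy' twice and then returns a result.
    responses = iter([(202, ""), (202, ""), (200, "report data")])
    print(poll_until_ready(lambda url: next(responses), "/some/result", interval=0))
```

In a real client, fetch would wrap an authenticated HTTP GET (see the LOGIN hint above) and return the response status and body.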

New Good Data Website!

Check out the new Good Data website.