For most companies, big data originates as enterprise data: point-of-sale records, financial, medical, intelligence, and call records, often stored in Oracle databases. However, MapReduce algorithms and patterns such as clustering (K-Means, Canopy), classification (Naive Bayes, k-NN), and recommenders cannot be easily implemented in SQL. Furthermore, shipping enterprise data to a Hadoop cluster raises latency, scalability, integrity, and security issues. How can you analyze enterprise data in Oracle databases using the latest Hadoop compute engines (HiveQL, MapReduce, Tez, Cascading, Spark SQL, Impala, Redshift, etc.) and programming models without copying the data over? This session describes (i) an HCatalog-based JDBC Storage Handler for safely processing relational data from a Hadoop cluster without copying it, and (ii) an "In-Database Container for Hadoop" for processing data in place.
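The storage-handler approach might look something like the following HiveQL sketch. This is an illustration only: the handler class name, table definition, and connection property names below are hypothetical and do not reflect the actual product syntax presented in the session.

```sql
-- Hypothetical HiveQL: expose an Oracle table to Hive through a JDBC storage
-- handler, so Hadoop jobs read the data in place instead of copying it to HDFS.
CREATE EXTERNAL TABLE sales_records (
  order_id   INT,
  store_id   INT,
  amount     DOUBLE
)
STORED BY 'com.example.hcatalog.JdbcStorageHandler'  -- hypothetical handler class
TBLPROPERTIES (
  'jdbc.driver' = 'oracle.jdbc.OracleDriver',
  'jdbc.url'    = 'jdbc:oracle:thin:@//dbhost:1521/orcl',
  'jdbc.user'   = 'hive_reader',
  'jdbc.table'  = 'SALES_RECORDS'                    -- property names are illustrative
);

-- Hive (and engines sharing the metastore via HCatalog) can then query the
-- Oracle data with no export step:
SELECT store_id, SUM(amount) FROM sales_records GROUP BY store_id;
```

The design point is that only query results, not the underlying tables, cross the network, which sidesteps the data-shipping latency, integrity, and security issues noted above.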
Kuassi Mensah is Group Product Manager for Oracle Database Data Access services, Net Services, and database programming APIs (Java, C/C++, PHP, Ruby, Python, Perl). Mr. Mensah holds an MS in Computer Science from the Programming Institute of University of Paris VI. He is a frequent speaker at Oracle and IT events and has published several articles and a book (http://www.amazon.com/exec/obidos/ASIN/1555583296). He maintains a blog at http://db360.blogspot.com as well as Facebook, LinkedIn, and Twitter (http://twitter.com/kmensah) pages.