Blogs

Jesse Zuckerman
This article was co-authored with Benjamin Webb. Foundational to any map, whether it be a globe, a GPS, or an online map, is the ability to understand data about specific locations. The ability to plot geospatial data is powerful, as it allows one to distinguish, aggregate, and display information in a very familiar manner. Using Pentaho, one can…
in Pentaho
Kevin Haas
This post was originally published by Chris Deptula on Tuesday, October 27, 2015. I recently attended the Strata-HadoopWorld conference in NYC. I have been attending this conference for the past few years, and each year a theme emerges. A few years ago it was SQL on Hadoop; last year it was all Spark. This year there was a lot of buzz about streaming…
in Pentaho
Kevin Haas
This post was originally published by Kevin Haas on Tuesday, July 14, 2015. When working with our clients, we find a growing number who regard their customer or transaction data not just as an internally leveraged asset, but as one that can enrich their relationships with customers and supporting partners. They need to systematically share data and…
in Pentaho
Kevin Haas
This post was originally published by Bryan Senseman on Wednesday, October 15, 2014. I'm a huge user of Mondrian, the high-speed open source OLAP engine behind Pentaho Analyzer, Saiku, and more. While the core Mondrian engine is wonderful, there are times when it doesn't do what I need it to, or exactly what I expect it to. Take this special case…
in Pentaho
Kevin Haas
This post was originally published by Chris Deptula on Wednesday, November 19, 2014. Many of you requested more information on the inner workings of the Sqoop component. Perhaps the best way to explain is via "lessons learned". Here goes... Use the split-by option: Sqoop is primarily used to extract and import data from databases into HDFS and… (see the sketch after this entry)
in Pentaho
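For readers unfamiliar with the split-by option mentioned in the teaser above, here is a minimal sketch, not taken from the original post, of a Sqoop import invoked from Python. The JDBC URL, credentials, table, and column names are hypothetical assumptions for illustration.

```python
# A minimal sketch (assumed names throughout): a Sqoop import that uses
# --split-by so the source table is partitioned evenly across mappers.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",   # hypothetical JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.sqoop_pw",   # keep the password off the command line
    "--table", "orders",
    "--split-by", "order_id",                   # an evenly distributed numeric key
    "--num-mappers", "4",
    "--target-dir", "/data/raw/orders",
]

# Run the import; raises CalledProcessError if Sqoop exits non-zero.
subprocess.run(sqoop_cmd, check=True)
```

Choosing a split-by column whose values are evenly distributed matters because Sqoop divides the key range among the mappers; a skewed key leaves most of the work to one mapper.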
Kevin Haas
This post was originally published by Chris Deptula on Tuesday, February 24, 2015. This is the third in a three-part blog on working with small files in Hadoop. In my previous blogs, we defined what constitutes a small file and why Hadoop prefers fewer, larger files. We then elaborated on the specific issues that small files cause, specifically…
in Pentaho
Kevin Haas
This post was originally published by Chris Deptula on Wednesday, February 18, 2015. This is the second in a three-part blog on working with small files in Hadoop. In my first blog, I discussed what constitutes a small file and why Hadoop has problems with small files. I defined a small file as any file smaller than 75% of the Hadoop block size, and… (see the sketch after this entry)
in Pentaho
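As a toy illustration of the working definition quoted above (smaller than 75% of the block size), the following sketch scans a staging directory and flags small files. It is an assumption-laden example, not from the original post: it walks a local directory, whereas a real check would list files in HDFS, and the 128 MB block size and directory path are hypothetical.

```python
# Flag "small" files, using the post's definition: below 75% of the block size.
import os

BLOCK_SIZE = 128 * 1024 * 1024            # assumed HDFS block size (128 MB)
SMALL_FILE_THRESHOLD = 0.75 * BLOCK_SIZE

def find_small_files(directory):
    """Yield (path, size) for every file under `directory` below the threshold."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size < SMALL_FILE_THRESHOLD:
                yield path, size

if __name__ == "__main__":
    for path, size in find_small_files("/data/landing"):   # hypothetical staging dir
        print(f"{path}: {size / (1024 * 1024):.1f} MB")
```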
Kevin Haas
This post was originally published by Chris Deptula on Wednesday, February 11, 2015. This is the first in a three-part blog on working with small files in Hadoop. Hadoop does not work well with lots of small files; it wants fewer, larger files instead. This is probably a statement you have heard before. But why does Hadoop have a problem with large numbers of small…
in Pentaho
Kevin Haas
This post was written by Dave Reinke and originally published on Wednesday, June 22, 2016. In a previous blog, we discussed the importance of tuning data lookups within Pentaho Data Integration (PDI) transformations. We suggested that there are identifiable design patterns that can be exploited to improve the performance and stability of our…
in Pentaho
Kevin Haas
This post was written by Dave Reinke and originally published on Wednesday, July 6, 2016. As we continue our series of Pentaho Data Integration (PDI) Lookup Patterns, we next discuss best-practice options for looking up the “most recent record”. Common use cases for this pattern include finding the most recent order for a customer, the last… (see the sketch after this entry)
in Pentaho
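To make the “most recent record” lookup concrete, here is a hedged sketch, expressed in pandas rather than PDI and not drawn from the original post, that finds the latest order per customer. The column names and sample data are invented for illustration.

```python
# "Most recent record" lookup: keep the newest order for each customer.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "order_id":    [10, 11, 12, 13, 14],
    "order_date":  pd.to_datetime(
        ["2016-01-05", "2016-03-02", "2016-02-10", "2016-02-28", "2016-03-01"]
    ),
})

# Sort so the newest order per customer comes last, then keep that row per group.
latest = (
    orders.sort_values("order_date")
          .groupby("customer_id")
          .tail(1)
)

print(latest)
```

In PDI the same idea is typically realized with a sorted stream and a lookup step; the pandas version is only meant to show the shape of the result.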
Kevin Haas
This post was written by Chris Deptula and originally published on Wednesday, January 28, 2015. With an immutable file system and no update command, how do you perform updates in Hadoop? This problem occurs in just about every Hadoop project. Yes, the most common use case of Hadoop data ingestion is to append new sets of event-based and/or… (see the sketch after this entry)
in Pentaho
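One common answer to the question the teaser above poses is a merge-and-rewrite pattern: combine the existing records with the incoming changes, keep only the newest version of each key, and write the result out as a brand-new file set. The sketch below shows that idea in pandas under invented data and column names; it is not necessarily the approach the original post describes.

```python
# Merge-and-rewrite: how "updates" land on an immutable file system.
import pandas as pd

existing = pd.DataFrame({
    "key":        [1, 2, 3],
    "value":      ["a", "b", "c"],
    "updated_at": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-01"]),
})
incoming = pd.DataFrame({
    "key":        [2, 4],
    "value":      ["b2", "d"],
    "updated_at": pd.to_datetime(["2015-01-15", "2015-01-15"]),
})

# Concatenate old and new, then keep the most recent row per key; the merged
# result replaces the old files rather than updating them in place.
merged = (
    pd.concat([existing, incoming])
      .sort_values("updated_at")
      .drop_duplicates("key", keep="last")
)

print(merged)
```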
cchaffey
Managing large quantities of data is part of Accident Exchange's (part of the AIS Group) business. An insurance claim can result in multiple pieces of data in many different formats, ultimately resulting in millions of pieces of claim evidence for the company and their agents to sort through. Regulations and insurance protocols mean that Accident…
in Hitachi Customer Voices
Amol Bhoite
Business Problem Statement: Today we live in a world where new and creative business models are driven by technology advancement. The need for Digital Transformation to create new revenue streams, reduce costs, streamline operations, increase speed to market, and enable internal collaboration has become ever more paramount, and the result of that…
in Oracle