
Dremio

Dremio is an open source data-as-a-service (DAAS) platform.

Overall rating
3.5 /5

About Dremio

What is data lineage?

Dremio is particularly useful for the way it captures your data lineage. Data lineage is the record of the data life cycle: where the data originated and how it was loaded, aggregated, and transformed along the way. Dremio’s vision is to make data engineers more productive while making data consumers more self-sufficient.
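A lineage record can be modeled as a graph in which each dataset points to the datasets it was derived from. Walking that graph backward recovers where the data originated. The minimal Python sketch below does this under stated assumptions: the dataset names and the `LINEAGE` mapping are hypothetical, and this is not how Dremio stores or exposes lineage internally.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it was derived from.
# Names are illustrative only; this is not Dremio's API or catalog format.
LINEAGE: dict[str, list[str]] = {
    "marketing.campaign_roi": ["marketing.orders_enriched", "marketing.ad_spend"],
    "marketing.orders_enriched": ["postgres.sales.orders", "s3.crm.customers"],
    "marketing.ad_spend": ["s3.ads.spend_csv"],
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return the origin datasets (those with no recorded parents) that feed `dataset`."""
    sources: set[str] = set()
    seen: set[str] = {dataset}
    queue = deque([dataset])
    while queue:  # breadth-first walk toward the origins
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents and current != dataset:
            sources.add(current)  # nothing upstream: treat it as an origin
        for parent in parents:
            if parent not in seen:  # avoid revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return sources


if __name__ == "__main__":
    print(sorted(upstream_sources("marketing.campaign_roi", LINEAGE)))
    # ['postgres.sales.orders', 's3.ads.spend_csv', 's3.crm.customers']
```

The same walk answers impact questions in the other direction if the mapping is inverted to point from each dataset to its consumers.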
 

Direct marketers will find many reasons to use Dremio, but three features stand out:

  • Out-of-the-box support for many data sources, including Amazon S3, Hadoop, NoSQL stores, and most relational (SQL) databases.
  • Query optimization with Native Push-Downs.
  • Live connectivity from any BI tool, Python, or SQL client.

The following image shows what Dremio is capable of doing.

1_1.png

Top Functions and Benefits of Dremio for Direct Marketers

Data Extraction

Dremio is a self-service data ingestion tool. Data extraction is a basic feature of any DAAS tool, but most DAAS tools require custom scripts for each data source. Dremio takes a different approach: it builds a central data catalog of all the data sources you connect to it. With that catalog, anyone can access and explore any data at any time, regardless of its structure, volume, or location. No matter how you store your data, Dremio makes it work like a standard relational database. Furthermore, you don’t have to build a data pipeline when a new data source comes online; Dremio gives you instant access.
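A central catalog can be pictured as a registry. Each connected source is registered once, and every dataset is then addressed by a single path, regardless of how the source stores it. The minimal sketch below illustrates the idea under stated assumptions. The `CsvSource`, `MemorySource`, and `Catalog` classes and the `source.table` path format are hypothetical. This is not Dremio’s catalog implementation.

```python
import csv
import io
from typing import Protocol

Row = dict[str, str]  # one record: column name -> value


class Source(Protocol):
    """Anything that can return the rows of a named table."""

    def read(self, table: str) -> list[Row]: ...


class CsvSource:
    """Hypothetical file-style source: each table is stored as CSV text."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def read(self, table: str) -> list[Row]:
        return list(csv.DictReader(io.StringIO(self.files[table])))


class MemorySource:
    """Hypothetical database-style source: each table is a list of rows."""

    def __init__(self, tables: dict[str, list[Row]]) -> None:
        self.tables = tables

    def read(self, table: str) -> list[Row]:
        return [dict(row) for row in self.tables[table]]  # copies, so callers can't mutate the source


class Catalog:
    """Registers each source once, then resolves 'source.table' paths uniformly."""

    def __init__(self) -> None:
        self.sources: dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self.sources[name] = source

    def read(self, path: str) -> list[Row]:
        source_name, table = path.split(".", 1)
        if source_name not in self.sources:
            raise KeyError(f"unknown source: {source_name}")
        return self.sources[source_name].read(table)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register("uploads", CsvSource({"leads": "email,region\na@example.com,east\n"}))
    catalog.register("crm", MemorySource({"customers": [{"email": "b@example.com", "region": "west"}]}))
    print(catalog.read("uploads.leads"))  # [{'email': 'a@example.com', 'region': 'east'}]
    print(catalog.read("crm.customers"))  # [{'email': 'b@example.com', 'region': 'west'}]
```

Adding a new source means calling `register` once. Consumers keep using the same `read(path)` call, which is the point of a single catalog.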
 

Rapid Processing of Data

The Data Reflections feature makes Dremio one of the fastest data processing systems. Reflections are optimized materializations of source data or queries, similar to materialized views. According to Dremio, they accelerate queries by up to 1,000x, leveraging the full power of relational algebra. Dremio has a vertically integrated query engine that automatically generates query plans to make the best use of reflections. Another important feature is Native Push-Downs, which optimize queries for every data source. Dremio pushes parts of a query, such as filters, down to each source in a form suited to that source, whether it is Amazon S3, HDFS, a NoSQL database, an RDBMS, or ADLS. Last but not least, Dremio uses Apache Arrow and Apache Parquet for high-performance columnar execution and storage, as opposed to the row-based layout of traditional databases. In simple terms, this means lightning-fast performance on very large datasets.
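The push-down idea can be illustrated in simplified form. A single filter is translated into each source’s own query form: a parameterized SQL `WHERE` clause for a relational database, or a MongoDB-style query document for a document store. The source then discards non-matching rows itself instead of shipping everything back. This is a sketch under stated assumptions. All function names are hypothetical, SQLite stands in for an RDBMS source, and Dremio’s real planner handles many more operations than simple filters.

```python
import sqlite3

Condition = tuple[str, str, object]  # (column, operator, value); conditions are ANDed

# Operators this sketch can push down, with their MongoDB query-operator equivalents.
MONGO_OPS = {"=": "$eq", ">": "$gt", "<": "$lt", ">=": "$gte", "<=": "$lte"}


def _check(column: str, op: str) -> None:
    # Column names are interpolated into SQL text, so only plain identifiers are allowed.
    if not column.isidentifier():
        raise ValueError(f"invalid column name: {column!r}")
    if op not in MONGO_OPS:
        raise ValueError(f"unsupported operator: {op!r}")


def to_sql_where(conditions: list[Condition]) -> tuple[str, list[object]]:
    """Translate conditions into a parameterized SQL WHERE clause (qmark style)."""
    clauses: list[str] = []
    params: list[object] = []
    for column, op, value in conditions:
        _check(column, op)
        clauses.append(f"{column} {op} ?")
        params.append(value)
    return " AND ".join(clauses), params


def to_mongo_filter(conditions: list[Condition]) -> dict[str, dict[str, object]]:
    """Translate the same conditions into a MongoDB-style query document."""
    query: dict[str, dict[str, object]] = {}
    for column, op, value in conditions:
        _check(column, op)
        query.setdefault(column, {})[MONGO_OPS[op]] = value
    return query


def fetch_pushed_down(conn: sqlite3.Connection, table: str, conditions: list[Condition]) -> list[tuple]:
    """Let the database apply the filter, so only matching rows are returned."""
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    where, params = to_sql_where(conditions)
    sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leads (region TEXT, spend INTEGER)")
    conn.executemany("INSERT INTO leads VALUES (?, ?)", [("east", 50), ("east", 150), ("west", 300)])
    flt = [("region", "=", "east"), ("spend", ">", 100)]
    print(fetch_pushed_down(conn, "leads", flt))  # [('east', 150)]
    print(to_mongo_filter(flt))  # {'region': {'$eq': 'east'}, 'spend': {'$gt': 100}}
```

In both cases the filtering work happens inside the source, which is where the performance benefit of a push-down comes from.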

 

2_1.png

The Ability to Scale

Dremio supports scaling from a single server to thousands of servers in one cluster, if needed. New data sources can also be integrated easily within the cluster, and Dremio can handle very large datasets and heavy workloads.

Data Visualization

Data visualization is one of the easiest ways to get meaningful insight from your data, because it makes data more human-readable. For example, the different types of graphs available in Dremio display data in formats that are easier to interpret. Dremio acts as the pipeline behind your visualizations. You don’t have to manipulate data by writing complex SQL queries or code, because Dremio handles the joining, filtering, and processing for you.

Dremio charts present data in a more human-readable form.

3.jpg

Support for Different Data Sources

Dremio supports a long list of data sources. The simplest option is to upload a CSV file, Excel sheet, or other delimited file from your local computer. You can then join it with your other Dremio data sources before querying it directly or through a BI tool. Dremio also supports many third-party data sources, such as Amazon Redshift, Amazon S3, Amazon Elasticsearch Service, Azure Data Lake Store, Elasticsearch, HDFS, Hive, MapR-FS, Microsoft SQL Server, MongoDB, MySQL, NAS, Oracle, PostgreSQL, and others.
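Joining an uploaded file with another source comes down to matching rows on a shared key. The sketch below parses an uploaded CSV and combines it with rows from another source using a hash join, one common join strategy in query engines. The data, column names, and `hash_join` helper are hypothetical assumptions, and this is not necessarily how Dremio executes joins.

```python
import csv
import io

Row = dict[str, str]


def load_uploaded_csv(text: str) -> list[Row]:
    """Parse the text of an uploaded CSV file into rows keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def hash_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Inner join on `key`: index the right side once, then probe it for each left row."""
    index: dict[str, list[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined: list[Row] = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # on column-name clashes, the right side wins
    return joined


if __name__ == "__main__":
    uploaded = load_uploaded_csv("email,segment\na@example.com,vip\nb@example.com,new\n")
    # Hypothetical rows that another connected source (e.g. a CRM database) might return.
    customers = [
        {"email": "a@example.com", "region": "east"},
        {"email": "c@example.com", "region": "west"},
    ]
    print(hash_join(uploaded, customers, "email"))
    # [{'email': 'a@example.com', 'segment': 'vip', 'region': 'east'}]
```

Building the index on one side and probing it from the other keeps the join roughly linear in the number of rows, instead of comparing every pair.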


Advanced Security

Data breaches are among the most common forms of cybercrime, and analytical systems are a natural target. The value of a high-security architecture for a product like Dremio therefore can’t be overstated. Dremio has taken many steps to protect users from possible threats.

Authentication and authorization play the biggest role in any security architecture. For internal user authentication, Dremio manages user credentials with a FIPS 140-2 compliant cryptographic algorithm, and it supports secret and key rotation. Certificates can be updated with the Java keytool utility.
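The following sketch illustrates the kind of credential handling described above. It stores passwords as salted PBKDF2-HMAC-SHA-256 hashes (PBKDF2 is a NIST-approved key-derivation function) and verifies them with a constant-time comparison. It is a sketch under stated assumptions, not Dremio’s implementation. The iteration count and the `StoredCredential` record are hypothetical. Using Python’s `hashlib` does not by itself make a system FIPS 140-2 compliant, because compliance depends on running a validated cryptographic module.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

DEFAULT_ITERATIONS = 600_000  # assumption: choose per your own security policy


@dataclass(frozen=True)
class StoredCredential:
    """Hypothetical record persisted instead of the password itself."""
    salt: bytes
    iterations: int
    digest: bytes


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> StoredCredential:
    salt = secrets.token_bytes(16)  # fresh random salt for every credential
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return StoredCredential(salt, iterations, digest)


def verify_password(password: str, stored: StoredCredential) -> bool:
    # Re-derive with the stored salt and work factor, then compare in constant time.
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), stored.salt, stored.iterations
    )
    return hmac.compare_digest(candidate, stored.digest)


if __name__ == "__main__":
    cred = hash_password("correct horse", iterations=100_000)
    print(verify_password("correct horse", cred))  # True
    print(verify_password("wrong guess", cred))  # False
```

Storing the iteration count with each record lets the work factor be raised later without invalidating existing credentials.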

 

4.png

 

Deployment

Dremio can be deployed on-premises or in a public cloud. Three deployment patterns are commonly used:

  • On dedicated infrastructure, such as Amazon EC2 instances
  • In Docker containers, with Kubernetes handling provisioning and management
  • On Hadoop, as a YARN application

Running Dremio on dedicated hardware is recommended because it allows Dremio to persist reflections on the local filesystem. Cloud storage is also supported. In AWS deployments, for example, reflections can be persisted to S3, which provides cost-effective reliability without sacrificing performance.


Your deployment plan should also account for the following factors:

  • Hardware
  • Size
  • High Availability
  • Backup and recovery

Diagram of a Dremio deployment in an Azure VM

5.png

Integrations

In addition to the data sources listed above, Dremio supports the Excel, CSV, and JSON file formats. Data scientists can work with it from languages such as R and Python. Analysts can connect their favorite BI tools, such as Power BI, Tableau, and Qlik Sense. For example, joining data for use in Tableau is much easier with Dremio than with other data-as-a-service solutions. For security, Dremio supports LDAP servers.
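From Python, one way to reach Dremio without a BI tool is its REST API. The sketch below builds a request that submits a SQL query and rests on several assumptions. It assumes a server at `http://localhost:9047` (Dremio’s default web port) and a v3 `POST /api/v3/sql` endpoint that responds with a job id. The `build_sql_request` and `submit_sql` helpers are hypothetical. The exact authentication header format depends on your Dremio version and authentication setup, so check its API reference before relying on this.

```python
import json
import urllib.request

DREMIO_URL = "http://localhost:9047"  # assumption: local server on Dremio's default web port


def build_sql_request(sql: str, auth_header: str, base_url: str = DREMIO_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the assumed v3 SQL endpoint with a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v3/sql",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )


def submit_sql(sql: str, auth_header: str, base_url: str = DREMIO_URL) -> str:
    """Send the query to a running Dremio server and return the job id it reports."""
    request = build_sql_request(sql, auth_header, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["id"]


if __name__ == "__main__":
    # Hypothetical token; how you obtain it depends on your Dremio version and auth setup.
    req = build_sql_request('SELECT * FROM "uploads"."leads" LIMIT 10', "Bearer <token>")
    print(req.get_method(), req.full_url)  # POST http://localhost:9047/api/v3/sql
```

After submitting a query, a client would poll the job’s status and fetch its results through the job endpoints under `/api/v3/job/`. ODBC, JDBC, and Arrow Flight are other ways to connect from code.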

The following image shows some of the tools most commonly used with Dremio.

7.png

Summary

Dremio is an open source self-service data access tool, so its open source edition carries no licensing cost. Dremio is among the best tools for documenting and tracking data lineage. It supports all the major third-party data sources and delivers very fast query performance. Several deployment options are available, and its intuitive dashboards make it easier to use. Its documentation is good enough to get started. Dremio has built a good reputation in the direct marketing community, among others, and it should remain a leader for some time.

Ratings of Dremio

Overall functionality useful to a direct marketer
4 /5

Support for a wide range of data sources, easy deployment, accelerated analysis, optimized queries, and advanced security are a few reasons to choose Dremio over other open source data ingestion tools. One Dremio success story comes from Hotmart, a digital marketplace for online courses. As its customer base reached 1 million, Hotmart began facing data access and performance challenges. It resolved them by adopting Dremio’s DAAS platform. The platform lets business users search, curate, and share data from any source, then analyze it with their favorite tools, all without depending on IT.

Intuitive User Experience
4 /5

Dremio is a complex product that was originally built primarily for data engineers, but it has evolved over time. Despite its sophistication, it is quite intuitive to work with, and it lives up to its vision of being a self-service data platform for everyday users. Its user interface includes built-in guidance that shows you how to connect data sources, aggregate and transform data, optimize SQL queries, create virtual datasets, and use BI tools. We give it 4 stars for its intuitive user experience.

The following image shows a screenshot of the intuitive Dremio dashboard.

Active Support Community
3 /5

Only 100 contributors on GitHub have Dremio as a repository in their profiles, a low number compared with other data-as-a-service solutions. Dremio-related questions on Stack Overflow are also scarce. Therefore, we give this product 3/5 stars for its active support community rating.

Minimal Technical Skill Required
4 /5

You need an understanding of data engineering and data science to make the most of Dremio. For example, data aggregation, data transformation, and SQL optimization require some knowledge of these subjects, and you should be familiar with the data sources you are using. Deployment requires some important decisions, so knowledge of clusters and cloud platforms is necessary. Dremio’s getting started guide and documentation are helpful, and its dashboard is easy to use and very intuitive. Considering all this, we give it 4 stars in this category.

Related Articles

Why investors bet on Dremio’s data analytics play

Dremio Data-as-a-Service: Product Overview and Insight

Dremio Launches Platform to Free Analytics Users from IT Dependence

Related Experts

Data Architects

Project Manager

Database Administrator

Data Quality Analyst

Data Engineer

Related Solutions

Capture Actionable Data From Anywhere

Tune Your Data For Peak Performance

Profile Your Best Customers

Other Tools

Pentaho
Data ETL & Data Wrangling FREE Open Source


By tightly coupling data integration with business analytics, Pentaho empowers users to integrate, blend and analyze their data. Pentaho’s open source heritage...

WSO2 Stream Processor
Data Ingestion and Pipeline Management Commercial


WSO2 Stream Processor is a cloud-based ingestion and processing system. It is designed to capture, process and analyze big data in real time.

Tableau
Data Visualization Commercial


Tableau is a Business Intelligence tool created to help anyone see and understand their data. Connect to almost any database, drag and drop to create...