- Performance optimization: Databricks I/O (DBIO) technology improves processing speed with a tuned and optimized Spark version for a wide range of instance types. It also includes an optimized AWS S3 access layer, accelerating data exploration by up to 10x.
- Cost management: AWS Spot instances and auto-scaling are two examples of cluster management capabilities that reduce operational costs. They also eliminate the time-consuming tasks of configuring, building, and maintaining complex Spark infrastructure.
- Optimized integration: REST APIs allow clusters and jobs to be launched programmatically, and tools or services such as Redshift and Kinesis to be integrated with the Databricks platform. Databricks users have instant access to all data sources via an integrated data sources catalog, which eliminates the need for duplicate data ingest work.
- Enterprise security: Key security standards, including SOC 2 Type 1 certification, HIPAA compliance, data encryption, and detailed logs easily accessible in AWS S3 for debugging. IT admin capabilities include Single Sign-On with SAML 2.0 support and role-based access control for clusters, jobs, and notebooks.
- Collaboration with data science: Integration with Databricks’ data science workspaces allows a seamless transition between interactive data science and data engineering workloads.
The company stated that pricing for the optimized platform is based on data engineering workloads such as ETL and automated jobs ($0.20 per Databricks Unit, plus the cost of AWS).
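
As a rough illustration of how the stated pricing adds up, the sketch below combines the $0.20 per Databricks Unit rate with a separately billed AWS cost. The function name `estimate_job_cost` and the example figures (50 Databricks Units, $12.00 of AWS charges) are hypothetical and chosen only to show the arithmetic; actual unit consumption depends on the instance types and run time of a given workload.

```python
# Rate quoted in the article for data engineering workloads.
DBU_RATE_USD = 0.20


def estimate_job_cost(dbus_consumed: float, aws_cost_usd: float,
                      dbu_rate_usd: float = DBU_RATE_USD) -> float:
    """Return an estimated total cost in USD: Databricks Units plus AWS charges.

    Both inputs are assumptions supplied by the caller; this is not an
    official Databricks calculator.
    """
    if dbus_consumed < 0 or aws_cost_usd < 0:
        raise ValueError("usage and cost values must be non-negative")
    return round(dbus_consumed * dbu_rate_usd + aws_cost_usd, 2)


# Hypothetical job: 50 Databricks Units and $12.00 of AWS infrastructure.
example_cost = estimate_job_cost(dbus_consumed=50, aws_cost_usd=12.00)
print(f"Estimated total: ${example_cost:.2f}")
```

In this example the Databricks portion is 50 × $0.20 = $10.00, so the AWS charges remain the larger share of the bill.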