Data & Artificial Intelligence

Cloud Data Platform Implementation: Building the Modern Data Infrastructure That Enterprises Need

Implementation of cloud data platforms including Databricks, Snowflake, and related technologies for organizations migrating from legacy data infrastructure to modern cloud architectures.

INDUSTRIES SERVED
Banking, Financial Services & Insurance · Technology and IT Services · Manufacturing and Industrial · Healthcare and Pharmaceuticals · Consumer Products and Retail · Energy and Infrastructure · Public Sector and PSUs
THE CHALLENGE LANDSCAPE

Why This
Matters Now

Cloud data platforms have emerged as the dominant approach for enterprise data infrastructure, displacing the traditional on-premises data warehouses that previously defined how organizations managed large-scale data. Databricks, Snowflake, BigQuery, Synapse, and related platforms offer capabilities that legacy architectures cannot match: elastic scaling that handles variable workloads without over-provisioning, separation of storage and compute that optimizes cost and performance independently, support for structured and unstructured data in unified environments, integration with cloud services for machine learning and analytics, and the ongoing innovation that cloud providers can deliver more rapidly than traditional vendors. The benefits are substantial enough that most organizations with significant data workloads are evaluating or implementing cloud data platforms, and the question is usually when and how rather than whether.

The challenge is that cloud data platform implementations are often more complex than initial planning suggests. Migration from legacy systems involves not just moving data but also re-platforming workloads, reconsidering the data architecture, redesigning integrations with source systems, updating dependent applications, and training teams to work with new tools. The platforms themselves offer significant flexibility, which means that implementation decisions have long-term consequences that are not always visible at the time they are made. Poor decisions about data organization, security architecture, or workload management can create technical debt that is expensive to remediate later. Good decisions require understanding of both the platform capabilities and the specific workloads that will run on them.

The vendor landscape adds specific considerations. Databricks and Snowflake are the most common choices for enterprise cloud data platforms, with different strengths and tradeoffs. Databricks originated in data engineering and machine learning with a Spark-based architecture, offering strong support for data science and complex transformations. Snowflake originated in data warehousing with an emphasis on ease of use and SQL-based analytics, offering strong performance for traditional analytical workloads. Both have expanded significantly and the feature sets have converged in many areas, but the philosophical differences still affect which platform is better suited to specific use cases. Organizations should select based on their actual workload characteristics rather than on vendor presentations or industry trends.

The organizations that execute cloud data platform implementations well treat them as transformation programs that require specific expertise rather than as technology deployments that follow vendor playbooks. The ones that underestimate the work consistently produce implementations that deliver less than expected and require subsequent remediation to achieve the outcomes that better initial execution would have produced.

OUR APPROACH

How We
Deliver

A structured methodology that ensures rigour, transparency, and measurable outcomes at every stage.

01

Current State and Workload Assessment

We begin by assessing the current data infrastructure and workloads that will be migrated or built on the cloud platform. The assessment identifies what needs to move, what will be re-platformed, what will be decommissioned, and what new capability needs to be added. Understanding current workloads is the foundation for platform selection and architecture decisions.

02

Platform Selection and Architecture Design

Based on workload assessment, we support platform selection considering the specific capabilities, cost implications, integration requirements, and long-term direction for each option. We design the cloud data architecture including data ingestion patterns, storage organization, compute strategy, security architecture, and the governance that will be enforced through the platform.

03

Implementation Planning

Implementation planning addresses the sequence of work including initial platform setup, foundational architecture deployment, migration of specific workloads, cutover strategies, and the management of parallel operation during transition periods. Planning should be realistic about dependencies and risks rather than optimistic about execution.

04

Foundation Implementation

Foundation work establishes the core platform including environment setup, security configuration, networking, data storage organization, and the governance frameworks that will apply to subsequent work. Foundation decisions have long-term implications and should be made with appropriate care rather than rushed to begin migration.

05

Workload Migration and Modernization

With foundation in place, we support workload migration and modernization. Simple lift-and-shift migrations move workloads with minimal changes but may not capture the full value of cloud platforms. Modernization rebuilds workloads to take advantage of platform capabilities. The right approach varies by workload and should be determined deliberately rather than defaulted to one pattern.

06

Operations and Optimization

Cloud data platforms require ongoing operations including monitoring, cost management, performance optimization, and capacity planning. We support the establishment of operational capability and help optimize the platform over time as workloads evolve and platform features mature. Organizations that neglect operations often find that their cloud platform costs exceed expectations while performance degrades.

A PERSPECTIVE

The Cloud Data Platform Cost Problem That Emerges Later

Cloud data platforms have a cost pattern that frequently surprises organizations after initial implementation. The platforms are flexible and powerful, which makes it easy to deploy workloads quickly and scale resources on demand. In the early period after implementation, teams are excited about the new capabilities and deploy workloads enthusiastically. Costs are initially modest because workloads are being migrated gradually and teams are still learning the platform. As more workloads migrate and new workloads are added, costs begin climbing. By the time cost reviews flag the increase, the platform is running multiple workloads across many teams, making it difficult to identify which workloads are driving cost and whether the cost is justified by value. The organization ends up with cloud data platform spend that is significantly higher than initial estimates and more difficult to control than initial governance suggested.

The pattern has several specific causes. Cloud data platforms charge based on storage and compute consumption, which means that costs scale with actual usage rather than being capped by initial investment as on-premises systems effectively were. Teams that are accustomed to on-premises operation often run workloads inefficiently on cloud platforms because the cost consequences of inefficiency are not visible to them. Development and testing environments can consume significant resources if not managed carefully. Queries that would have been optimized on expensive on-premises hardware run freely on cloud platforms where the cost per query is lower but the cumulative cost is higher because more queries run. Data retention decisions that would have been made deliberately when storage was expensive get deferred on cloud platforms where storage is cheap, leading to accumulation of data that is never deleted.

The deeper insight is that cloud data platforms require cost governance that on-premises systems did not need. FinOps practices including cost visibility, chargeback to consuming teams, optimization of workload configurations, and ongoing attention to cost efficiency are essential for keeping cloud data platform costs aligned with value. Organizations that implement cloud data platforms without FinOps discipline typically experience the cost escalation pattern described above. Organizations that build FinOps into their cloud platform operation from the beginning produce significantly better cost outcomes without sacrificing the flexibility that made cloud platforms attractive. The investment in FinOps capability is small relative to the cost savings it enables, and the difference is consistent enough that it should be part of any cloud data platform implementation.
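The chargeback mechanics described above can be sketched in a few lines. This is an illustrative example only: the record fields, team names, and budget figures are assumptions, and a real implementation would read usage data from the platform's billing or usage views rather than an in-memory list.

```python
from collections import defaultdict

def chargeback_report(usage_records, budgets):
    """Aggregate platform spend per consuming team and flag budget overruns.

    usage_records: iterable of dicts with 'team', 'workload', and 'cost' keys.
    budgets: dict mapping team name to its monthly budget (or absent).
    """
    spend = defaultdict(float)
    for rec in usage_records:
        spend[rec["team"]] += rec["cost"]
    report = {}
    for team, total in spend.items():
        budget = budgets.get(team)
        report[team] = {
            "spend": round(total, 2),
            "budget": budget,
            "over_budget": budget is not None and total > budget,
        }
    return report

# Hypothetical usage records for two consuming teams.
usage = [
    {"team": "analytics", "workload": "daily_etl", "cost": 1200.0},
    {"team": "analytics", "workload": "adhoc_sql", "cost": 950.0},
    {"team": "ml", "workload": "feature_store", "cost": 400.0},
]
print(chargeback_report(usage, {"analytics": 2000.0, "ml": 1000.0}))
```

The point of even a minimal report like this is visibility: once spend is attributed to consuming teams against a budget, the conversation about whether a workload's cost is justified by its value has an owner.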

WHAT WE DELIVER

Cloud Data Platform Implementation
Capabilities

Comprehensive solutions designed to address your most critical challenges and unlock lasting value.

01

Cloud Data Platform Strategy

Strategic planning for cloud data platform adoption aligned with business objectives.

02

Databricks Implementation

Databricks platform implementation including architecture, security, and workload deployment.

03

Snowflake Implementation

Snowflake platform implementation including account structure, RBAC, and workload deployment.

04

Data Lake and Lakehouse Architecture

Data lake and lakehouse architecture design and implementation.

05

Platform Selection Advisory

Independent platform evaluation and selection for cloud data platforms.

06

Migration Planning and Execution

Migration from legacy data warehouses and data lakes to cloud platforms.

07

Data Architecture Modernization

Modernization of data architecture to take advantage of cloud platform capabilities.

08

Security and Governance Configuration

Security architecture, access controls, and governance configuration on cloud platforms.

09

Performance Optimization

Query optimization, workload management, and performance tuning.

10

FinOps and Cost Optimization

Cost visibility, optimization, and governance for cloud data platform spend.

11

Data Sharing and Collaboration

Data sharing capabilities for internal and external collaboration.

12

Platform Operations

Operational support including monitoring, incident response, and capacity management.

13

Center of Excellence Establishment

Cloud data platform CoE establishment including skills, processes, and standards.

INDUSTRY CONTEXT

Where This Applies

BANKING, FINANCIAL SERVICES & INSURANCE

Regulatory data warehousing, risk analytics, customer 360, large-scale transaction analytics

TECHNOLOGY AND IT SERVICES

Product usage analytics, customer telemetry, multi-tenant analytics

MANUFACTURING AND INDUSTRIAL

IoT and sensor data, supply chain analytics, operational analytics

HEALTHCARE AND PHARMACEUTICALS

Clinical data, research analytics, real-world evidence, secure data sharing

CONSUMER PRODUCTS AND RETAIL

Customer analytics, transaction analytics, supply chain visibility

ENERGY AND INFRASTRUCTURE

Asset data, operational telemetry, predictive maintenance analytics

PUBLIC SECTOR AND PSUS

Government data analytics, inter-agency data, citizen service analytics

FREQUENTLY ASKED

Common Questions

The choice depends on the specific workloads and use cases. Databricks has historical strength in data engineering, machine learning, and data science workloads, with a Spark-based architecture that handles complex transformations and unstructured data effectively. Snowflake has historical strength in traditional analytical workloads with SQL-based querying, offering ease of use and strong performance for structured data. Both platforms have expanded significantly, and feature sets have converged in many areas. Databricks has improved its SQL and analytics capabilities. Snowflake has added support for data engineering and machine learning. The decision should consider the actual workload profile, team skills, existing tool ecosystem, and cost characteristics for the specific usage patterns. Organizations sometimes use both platforms for different workloads, though this adds complexity that should be justified by specific requirements rather than accepted by default.

A data lakehouse is an architecture that combines characteristics of data lakes (flexible storage for various data types at low cost) and data warehouses (structured data management with performance and governance features). The architecture allows organizations to maintain a single source for all their data while supporting both analytics and machine learning workloads without moving data between separate platforms. The lakehouse concept has been promoted particularly by Databricks through their Delta Lake technology, though similar architectures are now supported by multiple platforms. The lakehouse matters because it reduces the data duplication and movement that traditional architectures required, simplifying data management while maintaining the capabilities that different workloads need.
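The reduction in duplication can be illustrated with a toy example: one shared dataset (standing in for a lakehouse table) serves both a warehouse-style aggregation and an ML-style feature extraction, with no second copy of the data. The table contents and function names are invented for illustration; in practice the shared table would be a governed table in an open format such as Delta Lake or Iceberg.

```python
# One shared dataset plays the role of a lakehouse table.
orders = [
    {"customer": "a", "amount": 120.0, "returned": False},
    {"customer": "a", "amount": 80.0, "returned": True},
    {"customer": "b", "amount": 200.0, "returned": False},
]

def revenue_by_customer(table):
    """Warehouse-style analytical aggregation over the shared table."""
    out = {}
    for row in table:
        out[row["customer"]] = out.get(row["customer"], 0.0) + row["amount"]
    return out

def return_rate_features(table):
    """ML-style feature extraction over the same table, no copy required."""
    counts, returns = {}, {}
    for row in table:
        c = row["customer"]
        counts[c] = counts.get(c, 0) + 1
        returns[c] = returns.get(c, 0) + int(row["returned"])
    return {c: returns[c] / counts[c] for c in counts}

print(revenue_by_customer(orders))   # {'a': 200.0, 'b': 200.0}
print(return_rate_features(orders))  # {'a': 0.5, 'b': 0.0}
```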

Migration timelines depend significantly on scope and complexity. Focused migrations of specific workloads can be completed in 3 to 6 months. Comprehensive migrations of enterprise data warehouses typically take 12 to 24 months. Large complex migrations involving multiple legacy systems, extensive applications, and organizational change can take longer. The timelines that produce failures are usually the ones that compress comprehensive migration into unrealistic timeframes or underestimate the work required to handle applications that depend on the legacy systems. Effective migration plans address the complete picture including source system changes, target platform implementation, application updates, data reconciliation, and the operational transition that determines whether the migration actually delivers value.
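The data reconciliation step mentioned above can be sketched as a comparison of two extracts of the same table: check row counts, find keys missing from the target, and flag keys whose row contents differ. This is a minimal illustration with invented field names; a production reconciliation would run set-based comparisons on the platforms themselves rather than pulling rows into Python.

```python
import hashlib

def reconcile(source_rows, target_rows, key):
    """Compare two extracts of the same table by key, reporting row-count
    differences, keys missing from the target, and content mismatches."""
    def digest(row):
        # Order-stable hash of the row's fields (illustrative checksum).
        payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
        return hashlib.sha256(payload.encode()).hexdigest()

    src = {row[key]: digest(row) for row in source_rows}
    tgt = {row[key]: digest(row) for row in target_rows}
    return {
        "count_match": len(src) == len(tgt),
        "missing_in_target": sorted(set(src) - set(tgt)),
        "mismatched": sorted(k for k in src.keys() & tgt.keys()
                             if src[k] != tgt[k]),
    }

legacy = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
cloud  = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}]
print(reconcile(legacy, cloud, "id"))
# counts match, nothing missing, but key 2 differs between systems
```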

Cloud data platform cost management requires specific discipline. FinOps practices including cost visibility by team and workload, chargeback mechanisms that make consuming teams aware of cost implications, workload optimization to reduce unnecessary consumption, reserved capacity for predictable workloads, and ongoing attention to cost efficiency are essential. Cost management should begin during initial implementation rather than being added later after costs have become a problem. Organizations that implement strong cost governance from the beginning typically produce cloud data platform outcomes that are cost-effective as well as functionally successful. Organizations that treat cost as a technical consideration for IT rather than a business discipline for all consuming teams typically produce cost escalation that becomes difficult to control.

Lift-and-shift migration moves existing workloads to the cloud platform with minimal changes, preserving the existing structure and logic. It is faster and lower-risk but may not capture the full value that cloud platforms enable. Re-platforming modernizes workloads during migration to take advantage of cloud-specific capabilities like automatic scaling, separation of storage and compute, and modern data formats. It requires more effort but produces better outcomes. The right approach varies by workload. Workloads that will not be changed significantly after migration may be candidates for lift-and-shift. Workloads where the migration is part of broader modernization benefit from re-platforming. Organizations often use a mix of approaches rather than applying one to all workloads. The decisions should be made deliberately for each workload based on its characteristics and the overall migration strategy.
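The per-workload decision described above can be captured as an explicit triage rule rather than an ad hoc judgment. The sketch below is illustrative: the field names and thresholds are assumptions, and a real triage would weigh more characteristics (data volumes, SLAs, team skills) than these.

```python
def migration_approach(workload):
    """Suggest an approach per workload from a few illustrative traits.

    workload: dict that may contain 'planned_retirement' (bool),
    'change_frequency' ('low'/'high'), and 'scaling_issues' (bool).
    """
    if workload.get("planned_retirement"):
        # Workloads slated for decommissioning should not be migrated at all.
        return "retire"
    if workload.get("change_frequency") == "low" and not workload.get("scaling_issues"):
        # Stable workloads with no platform pain gain little from a rebuild.
        return "lift-and-shift"
    # Actively evolving or constrained workloads justify re-platforming.
    return "re-platform"

print(migration_approach({"change_frequency": "low", "scaling_issues": False}))
print(migration_approach({"change_frequency": "high"}))
print(migration_approach({"planned_retirement": True}))
```

Writing the rule down, even crudely, forces the migration team to record why each workload got its approach, which is exactly the deliberateness the answer above argues for.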

Effective operation of cloud data platforms requires a combination of skills. Platform-specific expertise in the chosen platform (Databricks, Snowflake, or others) is essential for architecture and configuration decisions. Cloud infrastructure skills are needed for networking, security, and cost management. Data engineering skills are required for building and maintaining data pipelines. SQL and analytical skills are needed for working with the data. Machine learning and data science skills are needed for advanced analytics use cases. Governance and security skills are needed for compliance and risk management. Most organizations need to build or hire teams with these skill combinations, which takes time and investment. Organizations that attempt to operate cloud data platforms with teams that do not have adequate skills typically produce implementations that underperform their potential.

Cloud data platform security involves multiple dimensions including network security (connections, VPCs, private endpoints), identity and access management (user authentication, role-based access, service accounts), data encryption (at rest and in transit), data classification and handling rules, audit logging, and integration with enterprise security tools. Security should be designed into the implementation from the beginning rather than added later. Modern cloud data platforms offer sophisticated security features, but the features must be configured correctly and maintained over time. Security failures in cloud data platforms can expose substantial amounts of data quickly, making security a higher priority than it might be for systems with more limited scope. Organizations should ensure that security expertise is part of the implementation team rather than being deferred to later phases.
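The role-based access dimension can be illustrated with a minimal model in which roles inherit privileges through a hierarchy, similar in spirit to how platforms such as Snowflake grant roles to other roles. The role names, privilege strings, and structure here are invented for illustration and do not reflect any platform's actual API.

```python
# Direct grants per role; the admin role gains everything via inheritance.
ROLE_GRANTS = {
    "analyst": {"read:sales"},
    "engineer": {"read:sales", "write:staging"},
    "platform_admin": set(),
}
# Roles granted to other roles (parent inherits children's privileges).
ROLE_HIERARCHY = {"platform_admin": ["engineer", "analyst"]}

def effective_privileges(role, seen=None):
    """Union of a role's own grants and those of roles granted to it."""
    seen = seen if seen is not None else set()
    if role in seen:          # guard against cycles in the hierarchy
        return set()
    seen.add(role)
    privs = set(ROLE_GRANTS.get(role, set()))
    for child in ROLE_HIERARCHY.get(role, []):
        privs |= effective_privileges(child, seen)
    return privs

def can(role, privilege):
    return privilege in effective_privileges(role)

print(can("analyst", "write:staging"))         # False
print(can("platform_admin", "write:staging"))  # True
```

Even this toy model shows why configuration matters: the admin role's reach comes entirely from the hierarchy, so a single misplaced grant changes what every inheriting role can do.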

GET STARTED

Build Cloud Data Platforms That Deliver on Their Promise

Cloud data platforms offer transformative capabilities, but implementations succeed or fail based on the quality of architecture and execution. SARC's data and AI practice brings the platform expertise and implementation experience to help organizations build cloud data platforms that produce sustained value.

Discuss Your Cloud Data Platform Requirements

500+ Professionals · 40+ Years · Global Presence