Azure Data Explorer (ADX) supports data ingestion from a wide range of sources, and it now supports ingestion from Amazon S3 (Simple Storage Service), an AWS storage service. With S3 support, ADX can pull data from S3 natively without routing it through complex ETL pipelines. This blog discusses Azure Data Explorer, Amazon S3, and data ingestion from S3 into ADX.
About Azure Data Explorer
Every enterprise accumulates a bulk of diverse data that is structured, semi-structured, or unstructured. To handle and manage this data, enterprises use analytics tools such as Hadoop and Spark that rely on ETL pipelines. Azure Data Explorer is a data analytics service built to utilize and analyze big data: a high-performance platform for data analysis and management, ADX can pull, store, and analyze terabytes of data in seconds.
About Amazon S3
Amazon S3 (Simple Storage Service) is an AWS service that provides scalability, data availability, security, and performance. Both small and large-scale industries can use it to deal with big data. Amazon S3 follows an object storage architecture: its basic unit is the object, stored in buckets. S3 stores and protects these objects for use cases such as enterprise applications, mobile applications, backups, data archives, data lakes, websites, disaster recovery, big data analytics, and hybrid cloud storage. Some features of Amazon S3 are:
- Storage classes
- Storage management and monitoring
- Access management
- Data processing
- Analytics and Insights
- Consistency
- Security
Data Ingestion from S3 to ADX
ADX is a distributed database that runs on a cluster of nodes. The ADX workflow is as follows: create a database, ingest data, and query the database. In the data ingestion stage, ADX now supports S3 as a source. The data ingestion process is shown in the diagram below.
Step 1: Once a data file lands in S3, an S3 event notification triggers an AWS Lambda function.
Step 2: The AWS Lambda function uses the ADX SDK to post a message to an Azure Storage queue. This message includes metadata, the object URL, and an authentication token to fetch the file.
Step 3: The event trigger notifies ADX about the file.
Step 4: Data batches pulled from AWS S3 are sealed according to the batching policy.
Steps 2 to 4 are transparent to end users and are fully managed by ADX.
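The message-building logic in step 2 can be sketched as follows. Note that the helper and its field names (objectUrl, authToken, metadata) are illustrative assumptions; the real message schema is internal to the ADX SDK, which handles this for you.

```python
import json

def build_ingestion_message(bucket: str, region: str, key: str, token: str) -> str:
    """Build a hypothetical queue message describing an S3 object to ingest.

    The actual message format is managed by the ADX SDK; the fields below
    are illustrative only.
    """
    # Virtual-hosted-style S3 URL for the uploaded object.
    object_url = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    message = {
        "objectUrl": object_url,          # where ADX should fetch the file from
        "authToken": token,               # short-lived credential to read the object
        "metadata": {"source": "s3", "key": key},
    }
    return json.dumps(message)

# Example: the payload a Lambda handler would enqueue for ADX to pick up.
payload = build_ingestion_message(
    "my-bucket", "us-east-1", "logs/app.json", "<token>"
)
```

In a real Lambda function, the posting itself would be done through the ADX SDK rather than by hand.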
The ADX command used for data ingestion is .ingest into. This command ingests data into a table by pulling it from cloud storage files. The syntax is as given below:
.ingest into table Table (
  h'https://<bucket_name>.s3.<region_name>.amazonaws.com/<object_name>;AwsCredentials=<AWS_ACCESS_ID>,<AWS_SECRET_KEY>'
)
Because this command interacts with the Kusto engine, it will not work if the engine is unavailable. The command ingests data as a batch through the Data Management service, and the ingestion batching policy is set on databases or tables. This type of ingestion is the preferred and best-performing method. All ADX SDKs are now updated with S3 ingestion support.
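As a sketch, the command text above can be assembled programmatically before being sent to the engine. The helper name and parameters here are illustrative assumptions, and in practice credentials should come from a secrets store rather than plain strings:

```python
def build_s3_ingest_command(table: str, bucket: str, region: str,
                            object_name: str, access_id: str, secret_key: str) -> str:
    """Build the `.ingest into` command text for an S3 source.

    Helper name and signature are illustrative; the command syntax itself
    follows the form shown above.
    """
    uri = (f"https://{bucket}.s3.{region}.amazonaws.com/{object_name}"
           f";AwsCredentials={access_id},{secret_key}")
    # The h'...' prefix marks the string as hidden so the credentials
    # are obfuscated in engine logs and traces.
    return f".ingest into table {table} (h'{uri}')"

command = build_s3_ingest_command(
    "Logs", "my-bucket", "us-east-1",
    "data/events.csv", "<AWS_ACCESS_ID>", "<AWS_SECRET_KEY>"
)
```

The resulting string would then be executed against the target database, for example via a client from one of the ADX SDKs.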
Conclusion
Azure Data Explorer is an analytics service for dealing with big data, and it ingests data from a wide range of sources. Amazon S3 stores large amounts of data as objects in buckets. With Amazon S3 support in Azure Data Explorer, ingesting data from S3 is easy, as there is no need for complex pipelines.
If your enterprise is looking for an easy way to ingest data into ADX, try the new Amazon S3 support; our team is ready to help you.