#awsglue
Simplified permissions for Amazon S3 Tables and Iceberg materialized views

AWS Glue Data Catalog now supports AWS IAM-based authorization for Amazon S3 Tables and Apache Iceberg materialized views. With IAM-based authorization, you can define all necessary permissions across storage, catalog, and query engines in a single IAM policy. This capability simplifies the integration of S3 Tables or materialized views with any AWS analytics service, including Amazon Athena, Amazon EMR, Amazon Redshift, and AWS Glue. You can also opt in to AWS Lake Formation at any time to manage fine-grained access controls using the AWS Management Console, AWS CLI, API, and AWS CloudFormation. This feature is now available in select AWS Regions. To learn more, visit the S3 Tables documentation and the AWS Glue Data Catalog documentation.
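To make "a single IAM policy" concrete, here is a minimal sketch of one policy document combining catalog and storage read permissions. The glue: actions are standard Data Catalog actions; the s3tables: action names are assumptions for illustration, so verify them against the S3 Tables documentation before use.

```python
import json

# Sketch of a combined policy for catalog + S3 Tables storage access.
# NOTE: the s3tables actions below are illustrative assumptions.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CatalogRead",
            "Effect": "Allow",
            # Standard Glue Data Catalog read actions
            "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables"],
            "Resource": "*",
        },
        {
            "Sid": "TableStorageRead",
            "Effect": "Allow",
            # Assumed S3 Tables data-plane actions (check the docs)
            "Action": ["s3tables:GetTableData", "s3tables:GetTableMetadataLocation"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

In practice you would scope Resource to specific table-bucket and catalog ARNs rather than "*".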

🆕 AWS Glue Data Catalog now supports IAM-based authorization for S3 Tables and Iceberg views, simplifying permissions via a single policy. This boosts integration with AWS analytics services and is available in select regions. Learn more in the docs.

#AWS #AwsGlue #AmazonS3

AWS Glue zero-ETL integrations with Amazon DynamoDB as the source support new configurations

AWS Glue zero-ETL now supports configurable change data capture (CDC) refresh intervals and on-demand data ingestion for integrations with Amazon DynamoDB as the source. This enhancement helps you customize how frequently data changes are captured from your Amazon DynamoDB tables, with refresh intervals ranging from 15 minutes to 6 days, and trigger immediate data ingestion when needed. These capabilities bring zero-ETL integrations from Amazon DynamoDB sources to feature parity with zero-ETL integrations from SaaS sources such as Salesforce, SAP, and ServiceNow, ensuring consistent functionality across source types.

With configurable CDC refresh intervals, you can optimize data pipeline performance by adjusting the frequency of change capture to match your business requirements, whether you need near-real-time updates every 15 minutes or can work with longer intervals of up to 6 days to reduce costs. On-demand ingestion lets you capture critical data changes immediately, without waiting for the next scheduled CDC interval. This is ideal for scenarios that require data to be immediately available for analytics, reporting, or downstream applications, and it helps strike a balance between data freshness requirements and operational efficiency.

These features are available today in all AWS Regions where AWS Glue zero-ETL is supported. To get started with configuring CDC refresh intervals and on-demand ingestion for your Amazon DynamoDB integrations, see the AWS Glue User Guide. To learn more about AWS Glue zero-ETL integrations, visit the AWS Glue documentation.
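The documented range (15 minutes to 6 days) is easy to encode as a guard before calling whatever API you use to configure the integration; this tiny helper is illustrative, not part of any AWS SDK.

```python
# Validate a CDC refresh interval against the documented 15-minute to 6-day
# range. This is a local sanity check, not an actual AWS Glue API call.
MIN_MINUTES = 15
MAX_MINUTES = 6 * 24 * 60  # 6 days = 8640 minutes

def validate_refresh_interval(minutes: int) -> int:
    """Return the interval unchanged if it is in range, else raise."""
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"CDC refresh interval must be between {MIN_MINUTES} and "
            f"{MAX_MINUTES} minutes, got {minutes}"
        )
    return minutes

print(validate_refresh_interval(60))  # hourly refresh is in range -> 60
```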

🆕 AWS Glue zero-ETL now supports CDC refresh intervals and on-demand data ingestion for Amazon DynamoDB, offering customizable data capture from 15 min to 6 days, ensuring feature parity with SaaS sources and optimizing pipeline performance. Available globally.

#AWS #AwsGlue #AmazonDynamodb

Amazon SageMaker Unified Studio now supports AWS Glue 5.1 for data processing jobs

Amazon SageMaker Unified Studio now supports AWS Glue 5.1 for Visual ETL, notebook, and code-based data processing jobs. With AWS Glue 5.1 in Amazon SageMaker Unified Studio, data engineers and data scientists can run jobs on Apache Spark 3.5.6 with Python 3.11 and Scala 2.12.18, and use updated open table format libraries including Apache Iceberg 1.10.0, Apache Hudi 1.0.2, and Delta Lake 3.3.2.

You can use AWS Glue 5.1 in Amazon SageMaker Unified Studio when creating data processing jobs by selecting Glue 5.1 from the version dropdown in job settings. This applies to Visual ETL jobs, notebook jobs, and code-based jobs, so you can take advantage of the latest Spark runtime and open table format libraries across all your data processing workflows.

AWS Glue 5.1 in Amazon SageMaker Unified Studio is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Stockholm), Europe (Frankfurt), Europe (Spain), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Malaysia), Asia Pacific (Thailand), Asia Pacific (Mumbai), and South America (São Paulo). To learn more, visit the Amazon SageMaker Unified Studio documentation. For details on what's included in AWS Glue 5.1, including updated open table format support and access control capabilities, see the AWS Glue documentation.

🆕 Amazon SageMaker Unified Studio now supports AWS Glue 5.1 for data processing jobs, enabling Visual ETL, notebooks, and code-based jobs with Spark 3.5.6 and updated libraries like Apache Iceberg, Hudi, and Delta Lake. Available in multiple regions.

#AWS #AmazonSagemaker #AwsGlue

AWS Glue 5.1 is now available in 18 additional regions

AWS Glue 5.1 is now available in eighteen additional AWS Regions: Africa (Cape Town), Asia Pacific (Hyderabad, Jakarta, Melbourne, Osaka, Seoul, Taipei), Canada (Calgary, Central), Europe (London, Milan, Paris, Zurich), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain, UAE), and US West (N. California).

AWS Glue is a serverless, scalable data integration service that simplifies discovering, preparing, moving, and integrating data from multiple sources. AWS Glue 5.1 upgrades core engines to Apache Spark 3.5.6, Python 3.11, and Scala 2.12.18, bringing performance and security enhancements. It also updates support for open table format libraries, including Apache Hudi 1.0.2, Apache Iceberg 1.10.0, and Delta Lake 3.3.2. Additionally, AWS Glue 5.1 introduces support for Apache Iceberg format version 3.0, adding default column values, deletion vectors for merge-on-read tables, multi-argument transforms, and row lineage tracking.

This release also extends AWS Lake Formation fine-grained access control to write operations (both DML and DDL) for Spark DataFrames and Spark SQL. Previously, this capability was limited to read operations only. AWS Glue 5.1 also adds full-table access control in Apache Spark for Apache Hudi and Delta Lake tables, providing more comprehensive security options for your data.

With this expansion, AWS Glue 5.1 is now available in thirty-three AWS Regions. You can get started with AWS Glue 5.1 using AWS Glue APIs, the AWS Command Line Interface (CLI), AWS Software Development Kit (SDK), AWS Glue Studio, or Amazon SageMaker Unified Studio. To learn more, visit the AWS Glue product page and our documentation.

🆕 AWS Glue 5.1 is now available in 18 more regions, bringing it to 33 total. It upgrades engines, adds Apache Iceberg 3.0 support, and extends fine-grained access control to write operations for Spark DataFrames and SQL, enhancing performance, security, and data integration.

#AWS #AwsGlue

AWS Glue launches native REST API connector for universal data integration

AWS Glue now offers a native REST-based connector that enables customers to easily read data from any source with a REST-based API. Customers can now create custom connectors to any REST-enabled data source and seamlessly integrate that data into their AWS Glue ETL (extract, transform, and load) jobs. This capability extends AWS Glue's existing connectivity to 100+ non-AWS data sources through 60+ native connectors and additional options on AWS Marketplace.

Previously, connecting to proprietary systems or emerging platforms required customers to build custom connectors by providing specialized JARs with the necessary libraries. The new native REST API connector eliminates this complexity, making it easier to integrate data from any REST-enabled source. It reduces operational overhead by eliminating the need to install, update, or manage custom libraries, freeing teams from maintenance burdens. The connector also enhances flexibility, enabling organizations to quickly adapt to new data sources as business needs evolve, and it streamlines ETL management by allowing data engineers to focus on data transformation and business logic rather than building and maintaining connector infrastructure.

The AWS Glue REST API connector is available in all AWS commercial Regions where AWS Glue is available. You can start using it through AWS Glue APIs, the AWS Command Line Interface (CLI), or an AWS Software Development Kit (SDK). To get started, see the AWS Glue documentation.
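The announcement does not spell out the connector's configuration surface, but the core job of any REST reader is the same: page through an endpoint until the source reports no more data. A generic sketch of that pattern, independent of the Glue connector's actual interface:

```python
# Generic sketch of what a REST-based reader does: page through an endpoint
# until no "next" token remains. Illustrates the pattern only; it is not the
# AWS Glue connector's real interface.
def read_all_pages(fetch_page):
    """fetch_page(token) -> (rows, next_token_or_None)."""
    rows, token = [], None
    while True:
        page, token = fetch_page(token)
        rows.extend(page)
        if token is None:
            return rows

# Usage with a stubbed two-page source standing in for HTTP calls:
pages = {None: ([1, 2], "p2"), "p2": ([3], None)}
print(read_all_pages(lambda t: pages[t]))  # -> [1, 2, 3]
```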

🆕 AWS Glue introduces a native REST API connector for easy integration of any REST-enabled data source into ETL jobs, simplifying custom connector creation and reducing maintenance burdens, now available in all commercial regions.

#AWS #AwsGlue

AWS Glue is now available in Asia Pacific (New Zealand) Region

AWS Glue is now available in the Asia Pacific (New Zealand) Region, enabling customers to build and run their ETL workloads closer to their data sources in this Region. AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides both visual and code-based interfaces to make data integration simpler so you can analyze your data and put it to use in minutes instead of months. To learn more, visit the AWS Glue product page and our documentation. For AWS Glue Region availability, please see the AWS Region table.

🆕 AWS Glue is now available in Asia Pacific (New Zealand), enabling ETL workloads closer to data sources. This serverless data integration service simplifies data prep for analytics, ML, and app dev with visual and code-based interfaces. For more, visit the AWS Glue product page.

#AWS #AwsGlue

Reliable Data with AWS Glue Data Quality

On September 27, 2025, I went to the AWS User Group Chennai Meetup and it was really full of great...

✍️ New blog post by N Chandra Prakash Reddy

Reliable Data with AWS Glue Data Quality

#aws #dataquality #awsglue #costoptimization

Zero-ETL for self-managed Database Sources now available in 7 new regions

AWS Glue now supports zero-ETL for self-managed database sources in seven additional Regions. Using Glue zero-ETL, you can set up an integration to replicate data from Oracle, SQL Server, MySQL, or PostgreSQL databases, located on premises or on Amazon EC2, to Amazon Redshift with a simple experience that eliminates configuration complexity. AWS Glue zero-ETL for self-managed database sources automatically creates an integration for ongoing replication of data from your on-premises or EC2 databases through a simple, no-code interface. This feature further reduces operational burden and saves the weeks of engineering effort needed to design, build, and test data pipelines that ingest data from self-managed databases into Redshift.

AWS Glue zero-ETL for self-managed database sources is available in the following additional AWS Regions: Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), Europe (London), South America (São Paulo), and US East (N. Virginia). To get started, sign in to the AWS Management Console. For more information, visit the AWS Glue page or review the AWS Glue zero-ETL documentation.

🆕 AWS Glue introduces zero-ETL for self-managed databases in 7 new regions: HK, Tokyo, Singapore, Sydney, London, São Paulo, and Virginia. It eases data replication from Oracle, SQL Server, MySQL, or PostgreSQL to Redshift, cutting down engineering effort.

#AWS #AwsDatabaseMigrationService #AmazonRedshift #AwsGlue

Announcing the Apache Spark upgrade agent for Amazon EMR

AWS announces the Apache Spark upgrade agent, a new capability that accelerates Apache Spark version upgrades for Amazon EMR on EC2 and EMR Serverless. The agent converts complex upgrade processes that typically take months into projects spanning weeks through automated code analysis and transformation. Organizations invest substantial engineering resources analyzing API changes, resolving conflicts, and validating applications during Spark upgrades. The agent introduces conversational interfaces where engineers express upgrade requirements in natural language, while maintaining full control over code modifications.

The Apache Spark upgrade agent automatically identifies API changes and behavioral modifications across PySpark and Scala applications. Engineers can initiate upgrades directly from SageMaker Unified Studio, the Kiro CLI, or an IDE of their choice, with the help of Model Context Protocol (MCP) compatibility. During the upgrade process, the agent analyzes existing code and suggests specific changes, which engineers can review and approve before implementation. The agent validates functional correctness through data quality validations. It currently supports upgrades from Spark 2.4 to 3.5 and maintains data processing accuracy throughout the upgrade process.

The Apache Spark upgrade agent is now available in all AWS Regions where SageMaker Unified Studio is available. To start using the agent, visit SageMaker Unified Studio and select IDE Spaces, or install the Kiro CLI. For detailed implementation guidance, reference documentation, and migration examples, visit the documentation.

🆕 AWS's Apache Spark upgrade agent for Amazon EMR speeds up Spark upgrades from 2.4 to 3.5 via automated code analysis, reducing months to weeks. Available globally with SageMaker Unified Studio.

#AWS #AwsGovcloudUs #AwsGlue #AmazonEmr

AWS announces support for Apache Iceberg V3 deletion vectors and row lineage

AWS now supports deletion vectors and row lineage as defined in the Apache Iceberg Version 3 (V3) specification. These new features are available with Apache Spark on Amazon EMR 7.12, AWS Glue, Amazon SageMaker notebooks, Amazon S3 Tables, and the AWS Glue Data Catalog. These Iceberg V3 capabilities help customers build petabyte-scale data lakes with improved performance for data modifications and functionality to easily track changed records. Deletion vectors write optimized delete files that speed up data pipelines and reduce data compaction costs. Row lineage provides metadata fields on each record to track changes with a simple SQL query, eliminating the computational expense of finding small changes in large tables.

Get started creating V3 tables by setting the table property 'format-version = 3' in the CREATE TABLE command in Spark or a SageMaker notebook. To upgrade existing tables, simply update the table property in metadata with the new format version. When you do this, AWS query engines that support V3 will automatically begin to use deletion vectors and row lineage.

Iceberg V3 deletion vectors and row lineage are now available in all AWS Regions where each respective service (Amazon EMR, AWS Glue, SageMaker notebooks, S3 Tables, and the AWS Glue Data Catalog) is supported. To learn more about AWS support for Iceberg V3, visit Apache Iceberg V3 on AWS, and read the blog post.
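The two steps above (create a new V3 table, or flip an existing one) come down to a table property. The sketch below keeps the Spark SQL statements as strings, with a hypothetical demo.events table:

```python
# Spark SQL for Iceberg V3, per the announcement: set format-version = 3.
# Table and column names here are hypothetical.
create_v3 = """
CREATE TABLE demo.events (id BIGINT, payload STRING)
USING iceberg
TBLPROPERTIES ('format-version' = '3')
"""

# Upgrading an existing table is a metadata-only property change.
upgrade_v3 = "ALTER TABLE demo.events SET TBLPROPERTIES ('format-version' = '3')"

print(create_v3.strip())
```

In a notebook these strings would be passed to spark.sql(); once the property is set, engines that support V3 start using deletion vectors and row lineage automatically.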

🆕 AWS supports Apache Iceberg V3 in EMR, Glue, SageMaker, S3 Tables, and Glue Data Catalog, boosting data lake performance and change tracking. Set 'format-version = 3' in CREATE TABLE. Available in all supported regions.

#AWS #AmazonS3 #AwsGlue #AmazonEmr

Amazon EMR and AWS Glue now support audit context with Lake Formation Amazon EMR and AWS Glue now provide comprehensive audit context support for AWS Lake Formation credential vending APIs and AWS Glue Data Catalog GetTable and GetTables API calls. This auditing capability helps you maintain compliance with regulatory frameworks, including the Digital Markets Act (DMA) and data protection regulations. The feature is enabled by default, offering seamless integration into existing workflows while strengthening security and compliance monitoring across your data lake infrastructure. You can view this audit context information in AWS CloudTrail logs, enabling enhanced security auditing, regulatory compliance, and improved troubleshooting for Amazon EMR Apache Spark native fine-grained access control (FGAC) and full-table-access jobs. The audit logging feature automatically records the platform type (EMR-EC2, EMR on EKS, EMR Serverless, or AWS Glue) and its corresponding identifiers, such as Cluster ID, Step ID, Job Run ID, and Virtual Cluster ID. This enables security teams to track and correlate API calls from individual Spark jobs, streamline compliance reporting, and analyze historical data access patterns. Additionally, data engineers can quickly troubleshoot access-related issues by connecting them to specific job executions, resolve FGAC permission challenges, and monitor access patterns across different compute platforms. This feature is available in all AWS Regions that support Amazon EMR, AWS Glue, and AWS Lake Formation, and requires EMR version 7.12+ or AWS Glue version 5.1+.
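The correlation workflow described above (tracing API calls back to the Spark job that issued them) can be sketched as a small grouping function. The event shape below is a simplified, hypothetical stand-in for real CloudTrail entries; actual field names in the logged audit context will differ, so treat this as a sketch of the analysis pattern only:

```python
# Sketch: correlate Data Catalog API calls in CloudTrail-style events with the
# job that made them. The "auditContext" key and its fields are HYPOTHETICAL
# simplifications of what the real audit context records.

def index_by_job(events):
    """Group events by (platform, job identifier) so each job's calls line up."""
    by_job = {}
    for event in events:
        ctx = event.get("auditContext", {})          # hypothetical key
        key = (ctx.get("platform"), ctx.get("jobRunId"))
        by_job.setdefault(key, []).append(event["eventName"])
    return by_job

sample_events = [
    {"eventName": "GetTable",
     "auditContext": {"platform": "EMR-Serverless", "jobRunId": "jr-001"}},
    {"eventName": "GetTables",
     "auditContext": {"platform": "AWS Glue", "jobRunId": "jr-002"}},
    {"eventName": "GetTable",
     "auditContext": {"platform": "EMR-Serverless", "jobRunId": "jr-001"}},
]

print(index_by_job(sample_events))
```

Grouping by platform identifier is what lets a security team attribute each GetTable/GetTables call to a specific Spark job run rather than to an anonymous role session.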

Amazon EMR and AWS Glue now support write operations with AWS Lake Formation fine-grained access controls Amazon EMR and AWS Glue now enable you to enforce fine-grained access control (FGAC) on both read and write operations for AWS Lake Formation registered tables in your Apache Spark jobs. Previously, you could apply Lake Formation's table, column, and row-level permissions only to read operations (SELECT, DESCRIBE). This simplifies data workflows by allowing both read and write tasks in a single Spark job, eliminating the need for separate clusters or applications. Organizations can now execute end-to-end data workflows with consistent security controls, streamlining operations and reducing infrastructure costs. With this launch, administrators can control who is authorized to insert new data, update specific records, or merge changes through DML and DDL operations (CREATE, ALTER, INSERT, UPDATE, DELETE, MERGE INTO, DROP), ensuring that all data modifications adhere to specified security policies and mitigating the risk of unauthorized data modification or misuse. This launch simplifies data governance and security frameworks by providing a single point for defining access rules in AWS Lake Formation and enforcing them in Spark for both read and write operations. This feature is available in all AWS Regions where Amazon EMR (EC2, EKS, and Serverless), AWS Glue, and AWS Lake Formation are available. To learn more, visit the open table format support documentation (https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless-lf-enable.html#emr-serverless-lf-enable-open-table-format-support).
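A grant covering the newly enforceable write-side permissions might look like the sketch below. It builds the request as a plain dict so it can be inspected without AWS credentials; the role ARN, database, and table names are hypothetical, and with boto3 you would submit it as `boto3.client("lakeformation").grant_permissions(**grant_request)`:

```python
# Sketch: a Lake Formation permission grant that adds write-side access
# alongside reads. Principal/database/table values are hypothetical.
grant_request = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/etl-writer"
    },
    "Resource": {
        "Table": {"DatabaseName": "sales_db", "Name": "orders"}
    },
    # SELECT/DESCRIBE were already enforceable for reads; INSERT, DELETE,
    # ALTER, and DROP cover the write operations this launch adds.
    "Permissions": ["SELECT", "DESCRIBE", "INSERT", "DELETE", "ALTER", "DROP"],
}
print(sorted(grant_request["Permissions"]))
```

With such a grant in place, a single Spark job can read from and write to `sales_db.orders` under one consistent set of Lake Formation rules, rather than splitting the work across separately secured clusters.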


🆕 Amazon EMR and AWS Glue now support audit context for AWS Lake Formation, aiding compliance with regulations like DMA. Default enabled, it logs platform details in CloudTrail, enhancing security and troubleshooting. Available in all regions with EMR 7.12+ or Glue 5.1+.

#AWS #AmazonEmr #AwsGlue


🆕 Amazon EMR and AWS Glue now support write operations with AWS Lake Formation's fine-grained access controls, enabling consistent security for both read and write tasks in Spark jobs, simplifying data workflows and reducing infrastructure costs.

#AWS #AmazonEmr #AwsGlue

Introducing AWS Glue 5.1 AWS Glue 5.1 is now generally available, delivering improved performance, security updates, expanded Apache Iceberg capabilities, and AWS Lake Formation write support for data integration workloads. AWS Glue (https://aws.amazon.com/glue/) is a serverless, scalable data integration service that simplifies discovering, preparing, moving, and integrating data from multiple sources. This release upgrades core engines to Apache Spark 3.5.6, Python 3.11, and Scala 2.12.18, bringing performance and security enhancements. It also updates support for open table format libraries, including Apache Hudi 1.0.2, Apache Iceberg 1.10.0, and Delta Lake 3.3.2. AWS Glue 5.1 introduces support for Apache Iceberg format version 3.0, adding default column values, deletion vectors for merge-on-read tables, multi-argument transforms, and row lineage tracking. This release also extends AWS Lake Formation (https://aws.amazon.com/lake-formation/) fine-grained access control to write operations (both DML and DDL) for Spark DataFrames and Spark SQL. Previously, this capability was limited to read operations only. AWS Glue 5.1 also adds full-table access control in Apache Spark for Apache Hudi and Delta Lake tables, providing more comprehensive security options for your data. AWS Glue 5.1 is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Stockholm), Europe (Frankfurt), Europe (Spain), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Malaysia), Asia Pacific (Thailand), Asia Pacific (Mumbai), and South America (São Paulo). Visit the AWS Glue documentation (https://docs.aws.amazon.com/glue/) for more information.
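Opting a job into the 5.1 runtime comes down to the `GlueVersion` field on the job definition. The sketch below builds the request as a plain dict for inspection; the role ARN and script path are hypothetical placeholders, and with boto3 you would submit it as `boto3.client("glue").create_job(**job_request)`:

```python
# Sketch: a Glue job definition pinned to the 5.1 runtime.
# Role ARN and S3 script location are hypothetical.
job_request = {
    "Name": "iceberg-v3-etl",
    "Role": "arn:aws:iam::111122223333:role/GlueJobRole",
    "Command": {
        "Name": "glueetl",                               # Spark ETL job type
        "ScriptLocation": "s3://my-bucket/scripts/etl.py",
        "PythonVersion": "3",
    },
    # Selects the Spark 3.5.6 / Python 3.11 / Scala 2.12.18 stack
    # described in the release notes above.
    "GlueVersion": "5.1",
}
print(job_request["GlueVersion"])
```

Existing jobs can be moved to 5.1 the same way, by updating `GlueVersion` on the job definition rather than creating a new job.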


🆕 AWS Glue 5.1 is now available, enhancing performance, security, and support for Apache Iceberg and Lake Formation write operations, with new features like default column values and multi-argument transforms.

#AWS #AwsGlue

Announcing AWS Glue zero-ETL for self-managed database sources AWS Glue (https://docs.aws.amazon.com/glue/latest/dg/zero-etl-using.html) now supports zero-ETL for self-managed database sources. Using Glue zero-ETL, you can now set up an integration to replicate data from Oracle, SQL Server, MySQL, or PostgreSQL databases located on premises or on Amazon EC2 into Amazon Redshift, with a simple experience that eliminates configuration complexity. AWS Glue zero-ETL for self-managed database sources automatically creates an integration for ongoing replication of data from your on-premises or EC2 databases through a simple, no-code interface. This feature further reduces users' operational burden and saves weeks of engineering effort needed to design, build, and test data pipelines that ingest data from self-managed databases into Redshift. AWS Glue zero-ETL for self-managed database sources is available in the following AWS Regions: US East (Ohio), Europe (Stockholm), Europe (Ireland), Europe (Frankfurt), Canada West (Calgary), US West (Oregon), and Asia Pacific (Seoul). To get started, sign in to the AWS Management Console. For more information, visit the AWS Glue page (https://aws.amazon.com/glue/) or review the AWS Glue zero-ETL documentation (https://docs.aws.amazon.com/glue/latest/dg/zero-etl-using.html).
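The shape of such an integration request might look like the sketch below. The field names are an assumption modeled on Glue's zero-ETL integration API and should be verified against the linked documentation before use; the ARNs are hypothetical placeholders, and the dict is built plainly so it can be inspected without AWS credentials:

```python
# Sketch (ASSUMED field names): a zero-ETL integration from a self-managed
# PostgreSQL source into Amazon Redshift. Verify parameter names against the
# AWS Glue zero-ETL documentation; ARNs below are placeholders.
integration_request = {
    "IntegrationName": "onprem-postgres-to-redshift",
    # Glue connection representing the self-managed PostgreSQL source (assumed).
    "SourceArn": "arn:aws:glue:us-east-2:111122223333:connection/onprem-pg",
    # Redshift namespace that receives the replicated data (assumed).
    "TargetArn": "arn:aws:redshift-serverless:us-east-2:111122223333:namespace/analytics",
}
print(integration_request["IntegrationName"])
```

In practice the console's no-code interface fills in these details for you; the sketch is only meant to show that the integration boils down to naming a source and a Redshift target.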


🆕 AWS Glue now offers zero-ETL for self-managed database sources, simplifying data replication from Oracle, SQL Server, MySQL, or PostgreSQL to Redshift with no-code integration, reducing setup complexity and operational burden. Available in select regions.

#AWS #AwsGlue

AWS Glue Data Quality now supports rule labeling for enhanced reporting Today, AWS announces the general availability of rule labels, a feature of AWS Glue Data Quality that lets you apply custom key-value pair labels to your data quality rules for improved organization, filtering, and targeted reporting. This enhancement allows you to categorize data quality rules by business context, team ownership, compliance requirements, or any custom taxonomy that fits your data quality and governance needs. Rule labels provide an effective way to organize and analyze data quality results. You can query results by specific labels to identify failing rules within particular categories, count rule outcomes by team or domain, and create focused reports for different stakeholders. For example, you can apply the label "team=finance" to all rules that pertain to the finance team and generate a customized report showing quality metrics specific to that team, or label high-priority rules with "criticality=high" to prioritize remediation efforts. Labels are authored as part of the DQDL ruleset, and you can query them in rule outcomes, row-level results, and API responses, making it easy to integrate with your existing monitoring and reporting workflows. AWS Glue Data Quality rule labeling is available in all commercial AWS Regions where AWS Glue Data Quality (https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html) is available. See the AWS Region Table for more details. To learn more about rule labeling, see the AWS Glue Data Quality DQDL labels documentation (https://docs.aws.amazon.com/glue/latest/dg/dqdl.html#dqdl-labels).
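The label-driven reporting described above can be sketched as a simple filter over rule outcomes. The record shape below is a simplified, hypothetical stand-in for what the Data Quality result APIs return (see the DQDL documentation for the real schema); the pattern of filtering by a label key/value pair is what carries over:

```python
# Sketch: filter data quality rule outcomes by label, as in the
# "team=finance" and "criticality=high" examples. Record shape is hypothetical.

def filter_by_label(outcomes, key, value):
    """Return outcomes whose rule carries the given label key/value pair."""
    return [o for o in outcomes if o.get("labels", {}).get(key) == value]

outcomes = [
    {"rule": "Completeness \"order_id\" > 0.95", "result": "PASS",
     "labels": {"team": "finance", "criticality": "high"}},
    {"rule": "ColumnValues \"amount\" > 0", "result": "FAIL",
     "labels": {"team": "finance", "criticality": "low"}},
    {"rule": "RowCount > 1000", "result": "PASS",
     "labels": {"team": "marketing"}},
]

# A finance-only view, then just its failing rules, per the examples above.
finance = filter_by_label(outcomes, "team", "finance")
failing = [o for o in finance if o["result"] == "FAIL"]
print(len(finance), len(failing))
```

The same filter, pointed at "criticality=high", yields the remediation-priority view mentioned in the announcement.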


🆕 AWS Glue Data Quality now supports rule labeling for better organization and reporting, allowing custom labels for rules to categorize by context, team, or priority, enhancing data quality analysis and reporting. Available in all commercial regions.

#AWS #AwsGlue
