Amazon Q data integration, introduced in January 2024, allows you to use natural language to author extract, transform, load (ETL) jobs and operations in the AWS Glue-specific data abstraction DynamicFrame. This post introduces new capabilities for Amazon Q data integration that work together to make ETL development more efficient and intuitive. We’ve added support for DataFrame-based code generation that works across any Spark environment. We’ve also introduced in-prompt, context-aware development that applies details from your conversations, working seamlessly with a new iterative development experience.
Jumia is a technology company founded in 2012, operating in 14 African countries with its main headquarters in Lagos, Nigeria. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform, moving from a Hadoop distribution to AWS serverless-based solutions.
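As a rough illustration of what DataFrame-based code generation produces, here is a minimal PySpark sketch of the kind of ETL that a natural language prompt such as "read orders from S3, drop cancelled orders, and write partitioned Parquet" might yield; the bucket paths and column names are hypothetical placeholders, not output from Amazon Q itself.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal DataFrame-based ETL sketch; paths and columns are hypothetical.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

orders = spark.read.json("s3://example-bucket/raw/orders/")           # extract
cleaned = (
    orders
    .filter(F.col("status") != "cancelled")                           # transform
    .withColumn("order_date", F.to_date("order_timestamp"))
)
(cleaned.write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-bucket/curated/orders/"))              # load
```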
This post provides step-by-step instructions for creating a collaborative multi-agent framework with reasoning capabilities to decouple business applications from FMs. It demonstrates how to combine Amazon Bedrock Agents with open source multi-agent frameworks, enabling collaborations and reasoning among agents to dynamically execute various tasks. The exercise will guide you through the process of building a reasoning orchestration system using Amazon Bedrock, Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and FMs. We also explore the integration of Amazon Bedrock Agents with open source orchestration frameworks LangGraph and CrewAI for dispatching and reasoning.
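For readers who want a feel for the Amazon Bedrock Agents side of such an orchestration, the sketch below shows a single call with the Boto3 bedrock-agent-runtime client; the agent ID, alias ID, and prompt are placeholders you would replace with your own.

```python
import uuid
import boto3

# Placeholder identifiers; replace with your own Bedrock agent and alias.
AGENT_ID = "AGENT_ID"
AGENT_ALIAS_ID = "AGENT_ALIAS_ID"

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId=AGENT_ID,
    agentAliasId=AGENT_ALIAS_ID,
    sessionId=str(uuid.uuid4()),
    inputText="Summarize the open support tickets for the EMEA region.",
)

# The agent's completion is returned as an event stream of chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```

An orchestration framework such as LangGraph or CrewAI would wrap calls like this one as tools or nodes, letting the supervisor decide which agent to invoke at each step.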
For the Amazon DynamoDB team, AWS re:Invent 2024 was an incredible experience to connect and reconnect with our customers. The key themes this year were “better together” integrations, data modeling, and building globally resilient, scalable applications on DynamoDB. In case you missed some of these sessions, or you wanted to get caught up on why customers like Klarna, Krafton, Vanguard, Fidelity, and JPMorgan Chase are building on DynamoDB, you can read this helpful summary of some of the DynamoDB highlights from re:Invent 2024.
Amazon Q embedded is a feature that lets you embed a hosted Amazon Q Business assistant on your website or application to create more personalized experiences that boost end-users’ productivity. In this post, we demonstrate how to use the Amazon Q embedded feature to add an Amazon Q Business assistant to your website or web application using basic HTML or React.
The zero-ETL integrations for Amazon Redshift are designed to automate data movement into Amazon Redshift, eliminating the need for traditional ETL pipelines. With zero-ETL integrations, you can reduce operational overhead, lower costs, and accelerate your data-driven initiatives. This enables organizations to focus more on deriving actionable insights and less on managing the complexities of data integration. In this post, we discuss the best practices for migrating your ETL pipeline from AWS DMS to zero-ETL integrations for Amazon Redshift.
In this post, we demonstrate how to implement a custom subscription workflow using Amazon DataZone, Amazon EventBridge, and AWS Lambda to automate the fulfillment process for unmanaged data assets, such as unstructured data stored in Amazon S3. This solution enhances governance and simplifies access to unstructured data assets across the organization.
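A minimal sketch of the Lambda side of such a workflow is shown below: an EventBridge rule forwards an Amazon DataZone subscription event to this handler, which reads the requested asset details and records the grant to fulfill. The detail field names and the fulfillment step are simplified assumptions for illustration, not the exact event schema.

```python
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Handle an Amazon DataZone subscription event routed through EventBridge.

    The detail attributes below are illustrative assumptions; inspect the real
    event payload in your account to map the actual field names.
    """
    detail = event.get("detail", {})
    subscriber = detail.get("subscriberPrincipal", "unknown-principal")
    asset_location = detail.get("assetLocation", "s3://example-bucket/unmanaged/asset/")

    # Fulfillment step (simplified): apply or record the access grant for the
    # unmanaged S3 asset, for example by updating a bucket policy or an
    # internal access-grant store.
    print(json.dumps({"grant": {"principal": subscriber, "location": asset_location}}))

    return {"statusCode": 200, "body": "subscription fulfilled"}
```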
HEMA has been a household Dutch retail brand since 1926, providing daily convenience products with a unique design. This post describes how HEMA used Amazon DataZone to build their data mesh and enable streamlined data access across multiple business areas. It explains HEMA’s unique journey of deploying Amazon DataZone, the key challenges they overcame, and the transformative benefits they have realized since deployment in May 2024. From establishing an enterprise-wide data inventory and improving data discoverability, to enabling decentralized data sharing and governance, Amazon DataZone has been a game changer for HEMA.
In this post, we explore new features of the AWS Glue Data Catalog, which now supports improved automatic compaction of Iceberg tables for streaming data, making it straightforward for you to keep your transactional data lakes consistently performant. Enabling automatic compaction on Iceberg tables reduces metadata overhead and improves query performance.
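As a sketch of how compaction can be turned on programmatically, the Boto3 call below enables a Glue table optimizer of type compaction for an Iceberg table; the account ID, database, table, and IAM role are placeholders, and the same setting is available as a toggle in the console.

```python
import boto3

glue = boto3.client("glue")

# Placeholders: catalog (account) ID, database, table, and an IAM role that
# AWS Glue can assume to run compaction on your behalf.
glue.create_table_optimizer(
    CatalogId="111122223333",
    DatabaseName="iceberg_db",
    TableName="streaming_events",
    Type="compaction",
    TableOptimizerConfiguration={
        "roleArn": "arn:aws:iam::111122223333:role/GlueCompactionRole",
        "enabled": True,
    },
)
```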
In this blog post, we provide an introduction to preparing your own dataset for LLM training. Whether your goal is to fine-tune a pre-trained model for a specific task or to continue pre-training for domain-specific applications, having a well-curated dataset is crucial for achieving optimal performance.
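To make the idea concrete, here is a small, framework-agnostic Python sketch of common curation steps: whitespace normalization, exact deduplication, length filtering, and writing JSON Lines. The corpus, file name, and threshold are arbitrary examples, not recommendations.

```python
import json
import hashlib

# Toy corpus; in practice this would be streamed from your raw data store.
raw_documents = [
    "AWS Glue is a serverless data integration service.",
    "AWS Glue is a serverless data integration service.",   # exact duplicate
    "Too short.",
    "Amazon Redshift is a fully managed cloud data warehouse used for analytics.",
]

seen_hashes = set()
curated = []
for text in raw_documents:
    text = " ".join(text.split())                  # normalize whitespace
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in seen_hashes:                      # drop exact duplicates
        continue
    if len(text.split()) < 5:                      # length filter (arbitrary threshold)
        continue
    seen_hashes.add(digest)
    curated.append({"text": text})

# JSON Lines is a common on-disk format for LLM fine-tuning datasets.
with open("train.jsonl", "w") as f:
    for record in curated:
        f.write(json.dumps(record) + "\n")
```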
Choosing the right storage configuration that meets performance requirements is a common challenge when creating and managing database instances. In this post, we provide an end-to-end guide for what storage class to choose depending on your use case. In addition, we compare the performance of different storage volumes on open source engines supported by Amazon RDS, to validate them from a database-centric perspective.
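As one concrete example of the knobs involved, the hedged Boto3 sketch below provisions an RDS for PostgreSQL instance on gp3 storage with explicit IOPS and throughput; the identifiers and sizing values are placeholders, and the right settings depend on your workload rather than on this example.

```python
import boto3

rds = boto3.client("rds")

# Placeholder identifier and sizing; tune storage type, IOPS, and throughput
# to your workload instead of copying these values.
rds.create_db_instance(
    DBInstanceIdentifier="storage-benchmark-pg",
    Engine="postgres",
    DBInstanceClass="db.m6g.large",
    AllocatedStorage=400,            # GiB; gp3 lets you scale IOPS/throughput separately
    StorageType="gp3",
    Iops=12000,
    StorageThroughput=500,           # MiB/s
    MasterUsername="postgres",
    ManageMasterUserPassword=True,   # store the master password in AWS Secrets Manager
)
```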
On November 11, 2024, the Apache Flink community released a new version of AWS services connectors, an AWS open source contribution. This new release, version 5.0.0, introduces a new source connector to read data from Amazon Kinesis Data Streams. In this post, we explain how the new features of this connector can improve performance and reliability of your Apache Flink application.
In this post, we’ll demonstrate how to configure an Amazon Q Business application and add a custom plugin that gives users the ability to use a natural language interface provided by Amazon Q Business to query real-time data and take actions in ServiceNow.
In this blog post, we dive into the various scenarios for how Cohere Rerank 3.5 improves search results for Best Match 25 (BM25), a keyword-based algorithm that performs lexical search, in addition to semantic search. We also cover how businesses can significantly improve user experience, increase engagement, and ultimately drive better search outcomes by implementing a reranking pipeline.
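A rough sketch of such a reranking pipeline is below: BM25 (via the rank_bm25 package) produces lexical candidates, which are then passed to Cohere Rerank on Amazon Bedrock for semantic reordering. The model ID and request body shape are assumptions to verify against the current Bedrock documentation for Cohere Rerank 3.5.

```python
import json
import boto3
from rank_bm25 import BM25Okapi

documents = [
    "Amazon S3 offers eleven nines of durability.",
    "BM25 is a lexical ranking function based on term frequency.",
    "Cohere Rerank reorders candidate passages by semantic relevance.",
]
query = "How does reranking improve keyword search?"

# Stage 1: lexical retrieval with BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in documents])
scores = bm25.get_scores(query.lower().split())
candidates = [doc for _, doc in sorted(zip(scores, documents), reverse=True)][:3]

# Stage 2: semantic reranking. The model ID and body format below are
# assumptions; check the Cohere Rerank 3.5 entry in Bedrock for the exact schema.
bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="cohere.rerank-v3-5:0",
    body=json.dumps({"query": query, "documents": candidates, "top_n": 2, "api_version": 2}),
)
print(json.loads(response["body"].read()))
```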
Fastweb, one of Italy’s leading telecommunications operators, recognized the immense potential of AI technologies early on and began investing in this area in 2019. In this post, we explore how Fastweb used cutting-edge AI and ML services to embark on their LLM journey, overcoming challenges and unlocking new opportunities along the way.
At Monzo, we use Amazon Keyspaces (for Apache Cassandra) as our main operational database. Today, we store over 350 TB of data across more than 2,000 tables in Amazon Keyspaces, handling over 2,000,000 reads and 100,000 writes per second at peak. In this post, we share how we used a different mechanism for row expiry than the Time to Live setting in Amazon Keyspaces to reduce our operating costs for an index while preserving its semantics.
Recently, AWS introduced over 50 new capabilities across its streaming services, significantly enhancing performance, scale, and cost-efficiency. Some of these innovations have tripled performance, provided 20 times faster scaling, and reduced failure recovery times by up to 90%. We have made it nearly effortless for customers to bring real-time context to AI applications and lakehouses. In this post, we discuss the top six game changers that will redefine AWS streaming data.
TUI Group is one of the world’s leading global tourism companies, providing 21 million customers with an unmatched holiday experience in 180 regions. The TUI content teams are tasked with producing high-quality content for its websites, including product details, hotel information, and travel guides, often using descriptions written by hotel and third-party partners. In this post, we discuss how we used Amazon SageMaker and Amazon Bedrock to build a content generator that rewrites marketing content following specific brand and style guidelines.
DeNA Co., Ltd. (DeNA) engages in a variety of businesses, from games and live communities to sports & the community and healthcare & medical, under its mission to delight people beyond their wildest dreams. This post introduces a case study where DeNA combined Amazon Redshift Serverless and dbt (dbt Core) to accelerate data quality tests in their business.
Amazon Redshift made significant strides in 2024: it enhanced price-performance, enabled data lakehouse architectures by blurring the boundaries between data lakes and data warehouses, simplified ingestion and accelerated near real-time analytics, and incorporated generative AI capabilities to build natural language-based applications and boost user productivity. This blog post provides a comprehensive overview of the major product innovations and enhancements made to Amazon Redshift in 2024.
Amazon DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB. By using DAX with DynamoDB, you can improve the latency for read requests in your application. In this post, we discuss how to improve latency and reduce cost when using DynamoDB for your read-heavy applications.
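The sketch below shows the read path the post is concerned with. The DAX Python client (amazondax) is designed as a drop-in replacement for the Boto3 DynamoDB resource, so the commented lines indicate where it would be swapped in; the table name, keys, and cluster endpoint are hypothetical.

```python
import boto3
# from amazondax import AmazonDaxClient   # assumption: amazondax package installed
# dynamodb = AmazonDaxClient.resource(
#     endpoint_url="daxs://my-cluster.xxxx.dax-clusters.us-east-1.amazonaws.com")

# Direct DynamoDB access (no cache); with DAX, the resource above replaces this line.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")            # hypothetical table

# Read-heavy pattern: repeated GetItem calls on hot keys. Fronting the table
# with a DAX cluster serves repeats from the in-memory cache, cutting read
# latency and reducing read capacity consumed on the base table.
response = table.get_item(Key={"pk": "customer#123", "sk": "order#2024-12-01"})
print(response.get("Item"))
```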
Amazon Bedrock Data Automation, now in public preview, offers a unified experience for developers of all skill sets to easily automate the extraction, transformation, and generation of relevant insights from documents, images, audio, and videos to build generative AI–powered applications. In this post, we demonstrate how to use Amazon Bedrock Data Automation in the AWS Management Console and the AWS SDK for Python (Boto3) for media analysis and intelligent document processing (IDP) workflows.
In this post, we provide a comprehensive guide for addressing performance considerations when migrating Oracle databases from Exadata to Amazon RDS for Oracle. We explore methods to analyze Exadata workload characteristics, including determining Smart IO usage, examining database-level I/O patterns, and identifying SQLs that utilize Exadata-specific features. We also discuss various alternatives available on RDS for Oracle to mitigate potential performance impacts.
Organizations are continuously seeking ways to use their proprietary knowledge and domain expertise to gain a competitive edge. With the advent of foundation models (FMs) and their remarkable natural language processing capabilities, a new opportunity has emerged to unlock the value of their data assets. As organizations strive to deliver personalized experiences to customers using […]
FundApps, founded in 2010, is one of the pioneers in the Regulatory Technology (RegTech) space, which includes compliance monitoring and reporting. FundApps decided to rearchitect their environment and transform it to a cloud-based architecture on AWS to better support the growth of their business. For more information, see Faster, cheaper, greener: Pick three — FundApps modernization journey. In this post, we focus on the persistence layer of the FundApps regulatory data service. You learn how FundApps improved service scalability, reduced cost, and streamlined operations by migrating from a SQL Server database to a cloud-centered solution combining Amazon Aurora Serverless v2 with Babelfish for Aurora PostgreSQL and Amazon Simple Storage Service (Amazon S3).
Recently, Amazon RDS launched the ability to shrink storage volumes using Amazon RDS Blue/Green Deployments – a nice addition to the list of new use cases that Blue/Green Deployments now supports. In this post, we cover how to use the new storage volume shrink feature in Amazon RDS Blue/Green Deployments to minimize the downtime required to perform the storage size reduction operation. We also review various mechanisms to monitor the progress of storage shrink and best practices on how to arrive at the optimal storage size for your shrink storage task.
You can create an Amazon RDS for Db2 instance by using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS CloudFormation, Terraform by Hashicorp, AWS Lambda functions, or other methods. One of the prerequisites for creating an RDS for Db2 instance is to configure the virtual private cloud (VPC) appropriately. This post shows how to create a VPC with best practices for any Amazon RDS database in general and Amazon RDS for Db2 in particular through a one-click automated deployment.
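To give a sense of the pieces such a deployment creates, the Boto3 sketch below sets up a VPC with two private subnets in different Availability Zones and registers them in an RDS DB subnet group, the minimum RDS needs for Multi-AZ placement; the CIDR ranges, AZ names, and identifiers are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
rds = boto3.client("rds", region_name="us-east-1")

# Placeholders: adjust CIDR ranges and Availability Zones for your account.
vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
ec2.get_waiter("vpc_available").wait(VpcIds=[vpc_id])

subnet_ids = []
for cidr, az in [("10.0.1.0/24", "us-east-1a"), ("10.0.2.0/24", "us-east-1b")]:
    subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock=cidr, AvailabilityZone=az)
    subnet_ids.append(subnet["Subnet"]["SubnetId"])

# An RDS DB subnet group must span at least two Availability Zones.
rds.create_db_subnet_group(
    DBSubnetGroupName="db2-private-subnets",
    DBSubnetGroupDescription="Private subnets for Amazon RDS for Db2",
    SubnetIds=subnet_ids,
)
```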
Today, we are excited to announce that Llama 3.3 70B from Meta is available in Amazon SageMaker JumpStart. Llama 3.3 70B marks an exciting advancement in large language model (LLM) development, offering comparable performance to larger Llama versions with fewer computational resources. In this post, we explore how to deploy this model efficiently on Amazon SageMaker AI, using advanced SageMaker AI features for optimal performance and cost management.
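A hedged sketch of the deployment path with the SageMaker Python SDK is shown below. The model_id string and the inference payload shape are assumptions to confirm against the JumpStart model catalog and the model’s example payloads; accepting the Llama EULA is required in practice.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Assumption: the JumpStart model ID for Llama 3.3 70B Instruct; confirm the
# exact identifier in the SageMaker JumpStart model catalog.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-3-70b-instruct")

# Deploy to a real-time endpoint; the Llama license must be accepted explicitly.
predictor = model.deploy(accept_eula=True)

# Payload shape follows the common text-generation schema; verify against the
# model's documented example requests.
response = predictor.predict({
    "inputs": "Explain retrieval augmented generation in two sentences.",
    "parameters": {"max_new_tokens": 128, "temperature": 0.2},
})
print(response)
```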
With Amazon SageMaker Lakehouse unified data connectivity, you can confidently connect, explore, and unlock the full value of your data across AWS services and achieve your business objectives with agility. This post demonstrates how SageMaker Lakehouse unified data connectivity helps your data integration workload by streamlining the establishment and management of connections for various data sources.
We spoke with Dr. Swami Sivasubramanian, Vice President of Data and AI, shortly after AWS re:Invent 2024 to hear his impressions—and to get insights on how the latest AWS innovations help meet the real-world needs of customers as they build and scale transformative generative AI applications.
In How the Amazon TimeHub team designed resiliency and high availability for their data replication framework: Part 2, we covered different scenarios for handling replication failures at the source database (Oracle), AWS DMS, and the target database (Amazon Aurora PostgreSQL-Compatible Edition). As part of our resilience scenario testing, when a failover occurred between the Oracle primary database instance and its standby instances and the database was opened with RESETLOGS, AWS DMS couldn’t automatically read the new set of logs belonging to the new database incarnation. In this post, we dive deep into the solution the Amazon TimeHub team used to detect such a scenario and recover from it. We then describe the post-recovery steps to validate and correct data discrepancies caused by the failover.
In How the Amazon Timehub team built a data replication framework using AWS DMS: Part 1, we covered how we built a low-latency replication solution to replicate data from an Oracle database using AWS DMS to Amazon Aurora PostgreSQL-Compatible Edition. In this post, we elaborate on our approach to address resilience of the ongoing replication between source and target databases.
In this post, we explore Clearwater Analytics’ foray into generative AI, how they’ve architected their solution with Amazon SageMaker, and dive deep into how Clearwater Analytics is using LLMs to take advantage of more than 18 years of experience within the investment management domain while optimizing model cost and performance.
In this post, we explore a solution for implementing load balancing across login nodes in Slurm-based HyperPod clusters. By distributing user activity evenly across all available nodes, this approach provides more consistent performance, better resource utilization, and a smoother experience for all users. We guide you through the setup process, providing practical steps to achieve effective load balancing in your HyperPod clusters.
Ensemble models are becoming popular in the ML community. They generate more accurate predictions by combining the predictions of multiple models. Amazon SageMaker Pipelines can quickly be used to create an end-to-end ML pipeline for ensemble models. This enables developers to build highly accurate models while maintaining efficiency and reproducibility. In this post, we provide an example of an ensemble model that was trained and deployed using Pipelines.
With AWS DMS, you can use data validation to make sure your data was migrated accurately from the source to the target. If you enable validation for a task, AWS DMS begins comparing the source and target data immediately after a full load is performed for a table. In this post, we describe the custom framework we built on top of AWS DMS validation tasks to maintain data integrity as part of the ongoing replication between source and target databases.
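One building block of such a framework is polling the per-table validation state that AWS DMS already exposes. The sketch below uses describe_table_statistics to flag tables whose validation is not clean; the task ARN is a placeholder, and for tasks with many tables you would page through results using the returned marker.

```python
import boto3

dms = boto3.client("dms")

# Placeholder replication task ARN.
TASK_ARN = "arn:aws:dms:us-east-1:111122223333:task:EXAMPLETASK"

response = dms.describe_table_statistics(ReplicationTaskArn=TASK_ARN)

for stats in response["TableStatistics"]:
    # ValidationState can be, for example, "Validated", "Mismatched records",
    # or "Table error"; anything other than a clean state warrants a closer look.
    state = stats.get("ValidationState", "Not enabled")
    if state not in ("Validated", "Pending validation"):
        print(
            f"{stats['SchemaName']}.{stats['TableName']}: state={state}, "
            f"failed_records={stats.get('ValidationFailedRecords', 0)}"
        )
```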
Amazon Bedrock multi-agent collaboration enables developers to build, deploy, and manage multiple specialized agents working together seamlessly to address increasingly complex business workflows. In this post, we show you how agentic workflows with Amazon Bedrock Agents can help accelerate this journey for research scientists with a natural language interface. We define an example analysis pipeline, specifically for lung cancer survival with clinical, genomics, and imaging modalities of biomarkers. We showcase a variety of specialized agents including a biomarker database analyst, statistician, clinical evidence researcher, and medical imaging expert in collaboration with a supervisor agent. We demonstrate advanced capabilities of agents for self-review and planning that help build trust with end users by breaking down complex tasks into a series of steps and showing the chain of thought to generate the final answer.
In this post, we demonstrate how we innovated to build a Retrieval Augmented Generation (RAG) application with agentic workflow and a knowledge base on Amazon Bedrock. We implemented the RAG pipeline in a Slack chat-based assistant to empower the Amazon Twitch ads sales team to move quickly on new sales opportunities.
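For context, the core RAG call against an Amazon Bedrock knowledge base can be as small as the sketch below; the knowledge base ID, model ARN, and question are placeholders, and the Slack integration and agentic workflow described in the post are layered on top of a call like this.

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Placeholders: your knowledge base ID and the model used for generation.
KB_ID = "KB_ID"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

response = client.retrieve_and_generate(
    input={"text": "What ad formats are available for live streams?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(response["output"]["text"])
```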
With the recent addition of physical replication as an option for RDS Blue/Green Deployments, you can overcome most of the limitations of logical replication. This makes physical replication particularly well-suited for use cases like minor version upgrades, schema changes (DDL operations) in the blue environment, and storage adjustments. In this post, we delve into the advantages of using physical replication in RDS for PostgreSQL blue/green deployments to simplify database operations and scale with application demands. We explore the key benefits of physical replication and provide a step-by-step guide to help you get started with this new capability.
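A minimal sketch of creating and later switching over a blue/green deployment with Boto3 is shown below; the source ARN, target version, and deployment name are placeholders, and the post itself covers how the physical replication option applies, which this generic sketch does not control.

```python
import boto3

rds = boto3.client("rds")

# Placeholders: the ARN of the blue (source) RDS for PostgreSQL instance and
# the engine version the green environment should run.
bg = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="pg-minor-upgrade",
    Source="arn:aws:rds:us-east-1:111122223333:db:my-postgres-db",
    TargetEngineVersion="16.4",
)
deployment_id = bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]

# After validating the green environment, promote it with a switchover;
# SwitchoverTimeout (seconds) bounds how long RDS waits before rolling back.
rds.switchover_blue_green_deployment(
    BlueGreenDeploymentIdentifier=deployment_id,
    SwitchoverTimeout=300,
)
```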