Magama: Integrating applications with Amazon Lex chatbots

About Magama

Magama is a Chilean startup with 4 years in the market that delivers an innovative digital experience. It does so through impressive immersive solutions that transport its clients into virtual reality via 3D virtual tours, built both for events and for engineering and architecture work.

The metaverse is also explored by Magama. In this case, it uses artificial intelligence integrated into the virtual world and combined with a chatbot that guides the user's navigation. In addition, a voice assistant brings a range of features to the user.

Connecting the world of chatbots to virtual reality

In this particular project, Magama wanted to add a chatbot to its solutions so that end users would have an even more immersive and fluid experience. The chatbot would allow users, for example, to get their questions about the virtual space answered automatically.

Magama had identified AWS as its main cloud technology provider, and in DNX Brasil it found the ideal partner to turn its vision into reality. An additional challenge was the need to switch technologies after one was discontinued; together with Magama, we adjusted the proposed solution to meet the new requirements.

From a technical standpoint, Magama needed to connect its virtual solution, along with other channels such as messaging, to a chatbot solution. This required an integration layer capable of connecting multiple systems to the chatbots. Beyond the chatbot connection itself, analytics and service-quality metrics for the chatbots would also be put in place.

The solutions: an API and a dashboard

Our solution was split into two parts. First, applications needed to be able to integrate with any Amazon Lex chatbot (in our case, Lex V2). For this we built a serverless API that brokers that communication. With Amazon's technology, the integration supports communication both via text and via the user's voice, and it can return synthesized speech from the chatbot to enable more natural use cases. Amazon API Gateway and AWS Lambda were the main services used, alongside Amazon Lex itself.
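As a rough illustration of this integration layer, the sketch below shows a minimal Lambda handler, invoked through API Gateway, that forwards a user's text message to a Lex V2 bot using boto3. The bot ID, alias, locale and request payload shape are hypothetical placeholders, not Magama's actual configuration; voice interactions would use the analogous recognize_utterance call with an audio payload.

import json
import os

import boto3

# Hypothetical identifiers; in a real deployment these come from configuration.
BOT_ID = os.environ.get("LEX_BOT_ID", "EXAMPLEBOT")
BOT_ALIAS_ID = os.environ.get("LEX_BOT_ALIAS_ID", "TSTALIASID")
LOCALE_ID = os.environ.get("LEX_LOCALE_ID", "es_419")  # assumed locale

lex = boto3.client("lexv2-runtime")


def handler(event, context):
    """Lambda behind API Gateway: forwards a user's text message to Amazon Lex V2."""
    body = json.loads(event.get("body") or "{}")
    response = lex.recognize_text(
        botId=BOT_ID,
        botAliasId=BOT_ALIAS_ID,
        localeId=LOCALE_ID,
        sessionId=body.get("sessionId", "anonymous"),
        text=body["text"],
    )
    # Return the bot's messages to the calling application (e.g. the VR client).
    return {
        "statusCode": 200,
        "body": json.dumps({"messages": response.get("messages", [])}),
    }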

The second part of our solution was the creation of an analytics dashboard for Amazon Lex. Here we used Amazon CloudWatch Logs Insights, which queries the native Amazon Lex conversation logs and visualises the results on a dashboard.
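For context, the snippet below sketches how such a Logs Insights query could be run programmatically with boto3, assuming the bot alias ships its conversation logs to a hypothetical log group; the query simply counts logged events per day, whereas a real dashboard would parse Lex-specific fields such as intents.

import time

import boto3

logs = boto3.client("logs")

# Hypothetical log group; Lex V2 delivers conversation logs to whichever
# CloudWatch Logs group is configured on the bot alias.
LOG_GROUP = "/lex/magama-bot-conversation-logs"

# Simple example query: count logged conversation events per day.
QUERY = "stats count(*) as events by bin(1d)"

query_id = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 7 * 24 * 3600,
    endTime=int(time.time()),
    queryString=QUERY,
)["queryId"]

# Poll until the query finishes, then print the aggregated rows.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})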

The entire solution and its infrastructure were written as code (IaC) so they can easily be replicated, modified, and controlled. This met Magama's need to create multiple dashboards for its variety of clients.

Interaction inside and outside virtual reality

The solution delivered is agnostic: it is parameterised enough to integrate any Amazon Lex chatbot and to visualise the desired metrics. This enables Magama's goal of delivering chatbot-driven innovation in many environments, inside and outside virtual reality, while capturing relevant data for visualisation on the dashboard.

Another benefit of the project is that the API can be made available directly to Magama's customers. At the same time, Magama retains control over API usage, which is important for controlling cost per user or application.

And last but not least, even with the challenge of adjustments to scope and ideation, Magama was well served by a solution that allows it to grow and become more scalable.

About DNX Brasil

DNX Brasil delivers the best cloud computing experience to its clients. Our solutions are built on the AWS cloud and include AWS Well-Architected, ECS containers, Kubernetes, continuous integration/continuous delivery, service mesh, big data, analytics, and artificial intelligence.

Our team of specialists is made up of experienced professionals, qualified and certified by AWS, with a focus on cloud-native concepts.

Check out our open-source projects here and follow us on LinkedIn.

Discover the value of your data

Effective leadership depends on using data to make important decisions. It takes a broad view, backed by accurate information, to act meaningfully. That is what a modern data strategy is built for: delivering insights to the people and applications that need them, securely and at any scale.

DNX Brasil helps your company apply data analytics to its most business-critical use cases, with end-to-end solutions built on deep data expertise.

Cromai: Deep Learning training 15x faster in the cloud

About Cromai

Cromai is an agtech founded in 2017, focused on efficiently improving the lives of agricultural producers. To do so, it applies cutting-edge technology, mainly Machine Learning with computer vision, to automatically identify patterns in images collected in the field, delivering a diagnosis that enables more precise decision-making.

Attuned to the complexity of the field, Cromai helps growers reach their full productive potential by using AI in a simple and sustainable way. For example, sensor-based solutions can filter out vegetal impurity in sugarcane. For weeds, it is also possible to identify where they emerge and guide the farmer towards the best way of carrying out the necessary management.

These systems process and analyse factors that generate results for producers across Brazil. This attracted international attention and led to Cromai being selected by StartUs Insights as one of the 5 most promising startups in the world in computer vision for agriculture.

The challenges of one of the world's most promising startups

The main challenge was to reduce Machine Learning training time: generating a new model version took so long that it directly impacted the core of the business. We brought the Machine Learning training to the AWS cloud, which made it possible to train several new models based on images.

To give a sense of the data volume involved in the weed solution, more than 20 million images were stored in the dataset, which increased the need for a more robust training cluster. Cromai was using a server with a single GPU for training its Deep Learning models, and with that configuration experiments ran slowly, taking around 3 months to train a model.

The benefits of training neural networks on multiple GPUs in parallel

Understanding Cromai's needs, the goal of our solution was to reduce training time without significantly affecting its cost or the model's performance metrics. Knowing what Amazon SageMaker makes possible, we were confident we could deliver a good result.

From the start, we had two big advantages that contributed to the success of the project. The first is that AWS offers very powerful training instances, equipped with several modern GPUs each. That change alone brings benefits in raw performance.

Second, training can be distributed across more than one instance. This is not a trivial task, since neural network training, even when distributed, needs to stay synchronised across its instances and GPUs. Frameworks such as SageMaker distributed exist for this purpose.

In our project, due to a technical requirement, we chose Horovod, an open-source distributed training framework for deep learning algorithms.

Amazon SageMaker supports this framework, so our main task was adapting Cromai's training script to the Amazon SageMaker environment. We used S3 to store the training data and, most importantly, added the Horovod layer to the training script.
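For reference, a SageMaker training job distributed with Horovod over MPI can be launched roughly as in the sketch below, using the SageMaker Python SDK. The entry point, framework version, instance type and count, and S3 paths are illustrative assumptions, not Cromai's actual configuration.

import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

# Hypothetical values: script name, instance type/count and S3 paths are
# placeholders for illustration only.
estimator = TensorFlow(
    entry_point="train_horovod.py",   # training script adapted with Horovod hooks
    role=role,
    instance_count=2,                 # distribute across two multi-GPU instances
    instance_type="ml.p3.16xlarge",   # 8 GPUs per instance
    framework_version="2.4.1",
    py_version="py37",
    distribution={
        "mpi": {
            "enabled": True,
            "processes_per_host": 8,  # one Horovod process per GPU
        }
    },
)

# Training data previously uploaded to S3.
estimator.fit({"training": "s3://example-bucket/weeds-dataset/train/"})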

We also created an easy, cost-transparent way for Cromai to choose the number and type of instances for each training run.

Reduced training time and its impact on the business

Cutting training time was essential for scaling Cromai's projects; the long model-training cycle was directly affecting the success of the business.

Thanks to our team's command of what Amazon SageMaker offers and the strategy we designed, we were able to solve this pain point effectively.

The solution we delivered dramatically reduced training time, from 3 months to 6 days, while keeping all existing performance metrics. If needed, Cromai has the option of increasing the training investment to obtain results in as little as 3 days.

With shorter training cycles, iteration became more frequent and agility increased, and Cromai's technology team now spends more time doing what it loves: making the solutions better and better suited to the reality of rural producers.

About DNX

At DNX Brasil we work to bring the best cloud and application experience to digital-native companies in Brazil.

We focus on AWS, Well-Architected solutions, containers, ECS, Kubernetes, continuous integration and delivery, service mesh, and data solutions (data platforms, data lakes, machine learning, analytics, and BI).

Check out our open-source projects at github.com/DNXLabs and follow us on LinkedIn, Twitter, and YouTube.

Written by: Ladislav Vrbsky and Luis Campos / Review: Camila Targino

CreditorWatch Democratises Credit Data with DNX Solutions

CreditorWatch was founded in 2010 by a small business owner who wanted to create an open source, affordable way for SMBs to access and share credit risk information. Today, CreditorWatch’s subscription-based online platform enables its 55,000+ customers—from sole traders to listed enterprises—to perform credit checks and determine the risk to their businesses. It also offers additional integrated products and services that help customers make responsible, informed credit decisions.

CreditorWatch helps businesses understand who they are trading with and any creditor issues associated with that particular business. They analyse data from 30 different sources, including both private and government sources. Some of their most powerful behaviour data is crowdsourced from their very own customers providing insights into businesses. Ultimately, CreditorWatch customers get access to Australia’s most insightful business credit rating.

The Challenge of Australia’s Largest Commercial Credit Bureau

An expansion phase saw major corporations, including Australia’s Big Four banks, looking to leverage CreditorWatch’s rich dataset and granular analytics capabilities. As a result, CreditorWatch decided to increase its agility and efficiency. Needing a continuously secure and compliant environment, with reduced costs and faster time to market, CreditorWatch engaged DNX Solutions. DNX was tasked with creating and executing a roadmap for the improvements, targeting cloud-native concepts, and bringing more efficiency to the IT and Operations teams.

Through workshops during the discovery phase, DNX determined CreditorWatch’s business and technical capabilities, such as the interdependencies, storage constraints, release process, and level of security. With the required information at hand, DNX developed a roadmap to meet CreditorWatch’s Technical and Business objectives, using AWS best practices “The 7R’s” (retire, retain, relocate, rehost, repurchase, replatform, and refactor).

A Safe Environment to Meet ISO Standards

To continue delivering a safe platform to their customers and meeting the requirements of ISO and other compliance standards, DNX constructed a new secure AWS environment utilising its DNX.one Foundation.

Rather than undergoing a lengthy and expensive process each time a safe environment needs to be recreated, DNX.one helps customers build secure and scalable container platforms with high availability and low cost. This unique marketplace solution, designed for AWS with well-architected principles, combines years of cloud experience in a platform focused on simplicity, infrastructure-as-code and open-source technologies. In addition, DNX.one provides a consistent approach to implementing designs that will scale CreditorWatch’s application needs over time.

Once CreditorWatch’s environment was secured with the best AWS and industry practices, it was time to move to the modernisation phase.

Instant Cost Reduction of 120K per Year With Data Modernisation

Due to the amount of data received on a daily basis, CreditorWatch’s database increases considerably in size and cost.

The DNX data team worked on the data engineering, optimising CreditorWatch’s Aurora database and its tooling to use their full capabilities.

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.

Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones.

Aurora data is stored in the cluster volume, which is a single, virtual volume that uses solid state drives (SSDs). A cluster volume consists of copies of the data across three Availability Zones in a single AWS Region. Because the data is automatically replicated across Availability Zones, customers’ data is highly durable with less possibility of data loss. This replication also ensures that databases are more available during a failover.

The Aurora cluster volume contains all user data, schema objects, and internal metadata, such as the system tables and the binary log. Its volumes automatically grow as the amount of data in the customer’s database increases.

With extensive data knowledge and years of experience with AWS solutions and tools, DNX provided a unique solution to configure the Aurora database leveraging its full capabilities, which resulted in an immediate cost reduction of over 90K per year related to the threshold of instantly available data.

The DNX team also created an automated archiving process using Amazon Managed Workflows for Apache Airflow (MWAA), which analyses CreditorWatch’s database tables and identifies data that has been unused for a period of time. Unused data is then archived in a different type of file storage at a cheaper rate than S3. This process resulted in an additional cost reduction of 30K per year.

AWS Archiving Process: How it works.

The Unique Value DNX brought to the CreditorWatch Project

DNX Solutions drew on its knowledge of DevOps, cloud, data, and software engineering to provide CreditorWatch with a secure environment that continually meets ISO and other compliance standards. The diversity of experience within the DNX team allowed areas for improvement in CreditorWatch’s systems to be identified quickly. In addition, DNX assisted CreditorWatch in bringing about a cultural change by transferring its DevOps mindset. Not only was the goal of agility and efficiency reached by the close of the project, but significant storage cost reductions were made, enabling CreditorWatch to compete at a higher standard and continue to expand.

KOBA Insurance, a data-driven company

KOBA Insurance

The Australian startup KOBA Insurance offers comprehensive car insurance focused on connected vehicles: cars that come pre-connected to the internet. What sets it apart from other insurance companies? Rates are based on how much customers actually drive their cars.

It works by installing the KOBA Rider – a matchbox-sized module – into the car’s On-Board Diagnostics (OBD) socket, an external computer port usually located behind a panel in the lower section of the dashboard.

The KOBA Rider receives driving and GPS data in real time and communicates it to the customer’s smartphone app, which recognises when the vehicle is moving. Then, through the KOBA mobile app, customers can see trips, costs, and policy documents almost instantly.

This “pay-as-you-drive” car insurance model is an absolute paradigm shift.

Moving towards being a data-driven company

To better understand user needs and market trends, and to accelerate time to market, KOBA needed an experienced cloud partner to modernise its data. It needed a custom data solution using two specific open-source services, Airbyte and Plotly, to ingest and manage data in the AWS environment.

With that in place, KOBA’s development team would be free to spend more time doing what they love: producing new features for the platform.

Real-time data in a protected environment

The first step in modernising KOBA’s data was to integrate every component of its solution into a data lake. This included CRMs, Google Analytics, social, paid media systems and others.

DNX designed and implemented a new data architecture to meet KOBA’s business requirements and market best practices. The new architecture includes Airbyte to ingest the data, Glue to extract data from KOBA’s DocumentDB, a third-party data warehouse (Databricks), and Plotly for analysis and reporting. The DNX team ensured security controls were in place to restrict access according to roles and services, minimising the chance of data breaches. DNX also made sure the solutions were centralised and monitored, meaning they would be simple to maintain after the project’s completion.

DNX configured and integrated KOBA’s Databricks, which is used to process and transform massive amounts of data and to explore the data through machine learning models. In addition, to allow the KOBA team to keep deploying its applications in the future, DNX created a blueprint for Airflow pipelines. This knowledge transfer, so valued by DNX, enables ongoing sustainability from within the client’s own business.
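The outline below is a minimal sketch of what such an Airflow pipeline blueprint can look like: a daily DAG that triggers ingestion and then a transformation step. The DAG name, task names and callables are hypothetical placeholders rather than KOBA’s actual pipelines.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def trigger_airbyte_sync(**context):
    # Placeholder: in a real pipeline this would call the Airbyte API
    # (or an Airbyte operator) to sync the configured sources.
    pass


def run_databricks_transform(**context):
    # Placeholder: submit a Databricks job that transforms the newly landed data.
    pass


with DAG(
    dag_id="koba_daily_ingestion",   # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="trigger_airbyte_sync",
        python_callable=trigger_airbyte_sync,
    )
    transform = PythonOperator(
        task_id="run_databricks_transform",
        python_callable=run_databricks_transform,
    )

    ingest >> transform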

AWS services used:

Pursuing excellence in customer service and accelerated growth

KOBA now has a single source of truth (SSOT) that gives the whole team the ability to make crucial business decisions based on mutually accessible data. This means there are no work silos keeping people from accessing important information.

KOBA can obtain insights faster, more simply, and at scale, using tools the team is already familiar with, such as Databricks, all with the level of security they need. Databricks removed the complexity they experienced before, making it easier to visualise data through dashboards and allowing KOBA’s teams to track and forecast sales, as well as generate other useful insights. Data compliance can now be easily maintained, and their data is protected against unauthorised access, theft, and other data breaches.

Conclusion

In a world increasingly shaped by technology, DNX offers tailored solutions for any company, whatever its technology needs.

To keep up with the constant advance of technology, companies have to be prepared for what comes next. With DNX’s experienced and innovative team, you can be sure of finding the perfect solution for your unique business needs.

As the KOBA case shows, data modernisation not only improves your business immediately but also prepares it to work with industry changes as they unfold. Don’t be caught off guard by the next disruptive technology: contact DNX’s data modernisation team to future-proof your company today.

QuickSight vs Tableau for Data Analytics: A Comprehensive Comparison

With so many tools available to improve business experiences, it can be difficult to know which will work best for your specific needs. Comparisons between the top competitors can save you significant resources before investing in tool purchases and training your team. Two well-known data analytics tools are Tableau and QuickSight, both of which offer a range of visualisations allowing you and your team to understand your data better. In a world where data is becoming more and more powerful, understanding the story your data tells is absolutely essential for future success.

Whilst all businesses are at different stages of their data modernisation journeys, those who invest in getting ahead now find themselves with a huge advantage over the competition. Data analytics has come a long way since manually manipulating data in Excel, and today a number of simplified platforms are available, meaning you don’t need a team full of data scientists in order to understand what’s going on around you. Tableau, founded in 2003, is now competing with QuickSight, rolled out in 2016. In this article we will comprehensively compare these two analytics tools, so you don’t have to.

Getting Started:

Unlike Tableau, which requires a desktop application to create data sources, QuickSight has a range of options for data connectivity. Anyone can start viewing insights on QuickSight regardless of their level of training, so it allows the whole team to understand what the data is saying. Tableau is not the easiest tool to navigate, with many business users only benefitting from the tool after undertaking training. If you have a diverse team with varying technical knowledge, QuickSight is the right tool for you.

Management:

Tableau has two options for servers, Tableau Online and On-Premises Tableau servers. On-prem servers require dashboards to be developed by analysts and pushed to the server. In addition, they require provision of servers and infrastructure which can be costly to maintain, upgrade and scale. The Tableau Online option has support for a limited number of data sources and is plagued with a history of performance issues. QuickSight, on the other hand, is a cloud-native SaaS application with auto-scaling abilities. Content is browser based, meaning different version usage by clients and servers is inconsequential. In addition, QuickSight’s release cycles allow customers to use new functionality as they emerge with no need to upgrade the BI platform.

Speed and Innovation:

The use of local machines and self-managed servers inhibits Tableau’s ability to perform at great speed and often requires technology upgrades. QuickSight however, produces interactive visualisations in milliseconds thanks to its in-memory optimised engine SPICE. In regards to innovation, despite Tableau’s quarterly release cycle, most users only upgrade annually due to the complexity and costs involved. In contrast, QuickSight users can take advantage of the constant stream of new features as soon as they are released.

Cost and Scalability:

The cost difference between the two tools is so extreme that it is barely worth comparing. Tableau has three pricing options, all of which are required to be paid in full regardless of monthly usage. Tableau’s plans range from $15 to $70 per month. QuickSight is priced on a per-user basis and ranges from $5 to $28 per month. If a user goes a month without logging in, they pay nothing. In the most common scenario, QuickSight is 85% cheaper than Tableau.

The inflexible pricing plans offered by Tableau mean deciding to scale is a difficult call to make. In addition, as the number of users and the volume of data increase, so too do the overhead costs of maintaining the BI infrastructure. QuickSight, like all AWS products, is easily scalable and doesn’t require server management. Risk is reduced when experimenting with scaling thanks to QuickSight’s usage-based pricing model.

Security:

Customers utilising Tableau have some difficult decisions to make when it comes to security. Due to the deployment of agents/gateway to connect data on-premises or in Private VPCs, security levels are compromised. QuickSight allows customers to link privately to VPCs and on-premises data, protecting themselves from exposure through the public internet. With automatic back-ups in S3 for 11 9s durability and HA/multi-AZ replication, your data is safe with QuickSight.

Memory:

Tableau’s in-memory data engine, Hyper, may be able to handle very large datasets, but it is no match for SPICE. SPICE by QuickSight has a constantly increasing row limit, and QuickSight Q offers superior performance when it comes to integrating with Redshift and Athena to analyse large amounts of data in real time.

Sourcing and Preparing Data:

Although the frequency of data being stored on-premises is slowing, some companies are yet to undertake full data modernisation solutions and require access to on-prem locations. Tableau can handle this issue with access to data from sources such as HANA, Oracle, Hadoop/Hive and others. QuickSight, whilst primarily focussed on cloud based sources, also has the ability to connect to on-premises data through AWS Direct Connect. The growing list of databases available to QuickSight includes Teradata, SQL Server, MySQL, PostgreSQL and Oracle (via whitelisting). Tableau allows users to combine multiple data sources in order to prepare data for analysis through complex transformations and cleansing. QuickSight can utilise other AWS tools such as Glue and EMR to guarantee quality treatment of data. Beyond the two mentioned, there are multiple other ETL partners that can be accessed for data cleansing.

Dashboard Functionality and Visualisations:

Tableau has built-in support for Python and R scripting languages and offers a range of visualisation types as well as highly formatted reports and dashboards. QuickSight tends to be more popular in its visualisations, with over a dozen types of charts, plots, maps and tables available. The ease at which data points can be added to any analysis ensures clarity and allows comparisons to be made with the click of a button. Furthermore, machine learning enhances user experience by making suggestions based on the data being considered at the time.

Conclusion:

Whilst Tableau was an extremely innovative tool back when it was founded in 2003, it is no match for QuickSight. With the ability to connect to the full suite of software and platforms available within Amazon Web Services, QuickSight is much more than a stand-alone tool. Businesses looking for a fast, scalable and easily understood data analytics tool cannot go wrong with QuickSight.

With the importance of data growing exponentially, it is no longer realistic to rely on the extensive knowledge of data scientists and analysts for everyday visualisations. QuickSight allows employees throughout the business to gain quick understanding of data points without having to wait for help from analysts. QuickSight is continually releasing new features to make the tool even more user friendly as time goes on.

Data Modernisation solutions offered by DNX frequently utilise QuickSight in order to provide clients with the most cost-effective, scalable and easy to use systems, increasing the power they have over their data.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data security.

Harnessing the Power of Data in the Financial Sector

Digitisation has enabled technology to transform the financial industry. Advanced analytics, machine learning (ML), artificial intelligence (AI), big data, and the cloud have been embraced by financial companies globally, and the use of this technology brings an abundance of data.

When it comes to FinTech, pace is paramount. The more accurate trends and predictions are, the more positive the outcomes will be. Data-driven decision making is key.

How Data Can Benefit the Financial Industry

Today, FinTech businesses must be data-driven to thrive, which means treating data as an organisational asset. The collection and interpretation of data enable businesses to gain quick and accurate insights, resulting in innovation and informed decision-making.

It is recommended to set up business data in a way that provides easy access to those who need it. 

Finance and Big Data

The compilation of globally collected data, known as Big Data, has had fascinating effects on the finance industry. As billions of dollars move each day, Big Data in finance has led to technological innovations, transforming both individual businesses and the financial sector as a whole.

Analysts monitor this data each day as they establish predictions and uncover patterns. In addition, Big Data is continuously transforming the finance industry as we know it by powering advanced technology such as ML, AI, and advanced analytics.

The Influence of ML on the Market

Powered by big data, ML is changing many aspects of the financial industry, such as trading and investment, as it accounts for political and social trends that may affect the stock market, monitored in real time.

ML powers fraud detection and prevention technologies, reducing security risks and threats. Additionally, it provides advances in risk analysis, as investments and loans now rely on this technology.

Despite all the gains made so far, the technologies powered by advanced machine learning continue to evolve.

Security and Data Governance

The cost of data breaches is increasing. In 2021, the financial sector had the second-highest costs due to breaches, behind only healthcare. The technology sector was the fourth most affected, meaning the risk of breaches for FinTech organisations is high.

Data governance is necessary to mitigate risks associated with the industry, which means many companies are required to undergo data modernisation. Businesses must ensure all data is secure and protected and suspicious activity is detected and flagged, in line with strict government standards.

Taking the first steps

The journey to data modernisation offers benefits that far exceed the initial cost of investment, though the process to accreditation can be daunting. The journey begins with building strategies from clear objectives, then mapping the plan, migrating data, implementing cloud tools, and beyond.

To simplify the initial steps towards compliant data modernisation, DNX Solutions has prepared a guide to help FinTech businesses modernise their data. Click here to view the 8 steps you need to take to prepare for your Data Modernisation journey.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data security.

canibuild Data Modernisation Journey

canibuild

canibuild is a game-changer for the construction industry. After 20 years of facing the same problems over and over again, Timothy Cocaro founded canibuild to take the hassle out of building.

With canibuild, builders and their clients can see what can be constructed on their parcel of land in just minutes, in a virtual, easy-to-understand way. canibuild uses AI-powered technology to tap into multiple real-time data sources such as high-resolution aerial imagery, local city and county government data sets, and codification of planning rules – removing the typical “over the fence” site assessment, hand-drawn plans, and estimates. canibuild is customised for each subscriber, with individual floor plans, branding, and costs uploaded onto the platform, allowing subscribers to provide branded plans, flyers, reports, and estimations instantly, condensing outdated practices that would traditionally take weeks. It is a true one-stop shop where users can instantly site a build, check topography, and request reports to determine build feasibility and site costs, generate site plans, check compliance and produce quotes for homes, pools, granny flats, ADUs, sheds and more – all in just minutes!

canibuild is currently available in Australia, New Zealand, Canada and the United States.

The Business Challenge

Due to rapid expansion, canibuild required an experienced cloud-native partner to transform its complex cloud platform to sustain and support its growth by unlocking new data and analytics functionalities. One of the major challenges was to create a Single Source of Truth (SSOT), which involves integrating different types of data into one central location, as opposed to the various data sources from which they were being collected. Among the data canibuild requires is geospatial data: time-based data related to a specific location on the Earth’s surface. This data can provide insights into relationships between variables, revealing patterns and trends.

Delivering DataOps and Data Analytics to Grow the canibuild Business

The DNX team built a platform by implementing a DataOps approach consisting of a collection of technical practices, workflows, cultural norms, and architectural patterns that enable:

  • Rapid innovation and experimentation delivering new insights to customers with increasing velocity
  • Extremely high data quality and very low error rates
  • Collaboration across complex arrays of people, technology, and environments
  • Clear measurement, monitoring, and transparency of results

 

The developed data platform combines modern SaaS ingestion tools (StitchData) and dbt with AWS data services, including a Data Lake (S3 + Glue Catalog + Athena), Glue ETL, MWAA for orchestration, DMS for near-real-time replication, DynamoDB for control tables and CloudWatch Events for scheduling.

canibuild infrastructure
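To make the control-table pattern concrete, the snippet below is a minimal sketch of how an ETL job could read and update a per-table high-water mark stored in DynamoDB. The table name and attribute names are hypothetical, not canibuild’s actual schema.

import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical control table: one item per source table, keyed by table name.
control_table = dynamodb.Table("etl_control")


def get_watermark(table_name: str) -> str:
    """Read the last processed timestamp for a source table (defaults to epoch)."""
    item = control_table.get_item(Key={"table_name": table_name}).get("Item")
    return item["last_processed_at"] if item else "1970-01-01T00:00:00"


def set_watermark(table_name: str, processed_at: str) -> None:
    """Record the new high-water mark after a successful incremental load."""
    control_table.update_item(
        Key={"table_name": table_name},
        UpdateExpression="SET last_processed_at = :ts",
        ExpressionAttributeValues={":ts": processed_at},
    )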

Accurate Real-Time Data

After a complex process in which all relevant data were collected, sorted, and stored in one location, canibuild now has real-time insights, and the whole team can access the same information. The team can now predict future trends, maximise opportunities and work towards realistic goals and objectives to continue growth.

Through its knowledge transfer, DNX equipped the canibuild team with the knowledge to provision a new logical environment for its product:

  • Terraform projects
  • Terraform variables configuration
  • DMS configurations
  • Database importer/exporter
  • MWAA and how to create new DAGs
  • How to troubleshoot Airflow

Data Modernisation Outcome

With the creation of an SSOT and the transfer of all data into a central location, canibuild teams can now access the data they need sooner than ever before, allowing them to respond quickly and efficiently to their clients. Improved data analytics enables them to access real time insights and make more accurate predictions; a valuable asset in current times plagued by uncertainty. Furthermore, thanks to the simplification of the platform by DNX, canibuild’s engineers now have time to spare, allowing them to work on what they do best: producing new features!

To see your business soar towards the future with open arms, contact DNX today and learn how you can benefit from data modernisation.

Plutora’s Data and Digital Modernisation Journey

About Plutora

Plutora offers value stream management solutions that help companies with release, test environment and analytics solutions for enterprise IT.

Among Plutora’s clients are global organisations, typically in healthcare, FinTech and telecommunications, all of which are highly regulated and require tools to maintain compliance. In addition, clients in these industries require predictable software delivery due to their low tolerance for risk.

The Business Challenge

Although Plutora generates great value for its customers, it was looking for a partner that could assist in decreasing the complexity of its data infrastructure. Plutora wanted a new architecture based on industry best practices, including automating its processes and modernising its multiple .NET applications as they approached end of support. Achieving these goals would allow Plutora to evolve and give it the agility needed to launch new features.

Data and Digital Modernisation Discovery

The DNX Digital and Data team performed a comprehensive Windows and data discovery on Plutora’s workloads, which involved a kick-off followed by a sequence of intense activities. The discovery concluded with a showcase workshop where the team presented a roadmap outlining areas of improvement for the existing solution and a modernisation plan to be executed afterwards, enabling Plutora to achieve its objectives.

Solution

DNX proposed a four-phase engagement plan to modernise Plutora’s data & analytics workloads.

Plutora Data Project

In Phase 1, DNX validated the use of temporal tables in SQL Server to enable CDC for the ETL process. This was to improve estimation accuracy for Phase 4.

In Phase 2, DNX began delivering early benefits of the modernisation project by using the SQL Server replica DB for the ETL extraction and refactoring the existing SQL Server scripts to extract incremental data only.

This reduced performance impact on the application whilst enabling a higher number of ETL queries to run in parallel, thus reducing the overall time for the ETL execution. 
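As a rough illustration of the incremental-extraction idea enabled by temporal tables, the sketch below queries only the rows changed within an extraction window of a system-versioned SQL Server table. The connection string, table name and watermark values are hypothetical placeholders, not Plutora’s actual schema or tooling.

import pyodbc

# Hypothetical DSN and table name, for illustration only.
conn = pyodbc.connect("DSN=plutora_dw_replica")

# With a system-versioned temporal table, only row versions valid inside the
# extraction window need to be pulled, instead of re-reading the full table.
INCREMENTAL_SQL = """
SELECT *
FROM dbo.Releases
FOR SYSTEM_TIME BETWEEN ? AND ?
"""

last_run = "2022-01-01T00:00:00"   # high-water mark from the previous ETL run
this_run = "2022-01-02T00:00:00"

rows = conn.cursor().execute(INCREMENTAL_SQL, last_run, this_run).fetchall()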

In Phase 3, DNX reduced complexity and modernised the ETL platform by implementing Managed Workflows for Apache Airflow (MWAA) to replace the Node App orchestrator, using DMS to replicate data between the SQL Server DW and the Postgres DW, and then decommissioning the Node App orchestrator.

In the final phase, the ETL to ELT modernisation was completed.

Data Modernisation outcome

DNX delivered a data modernisation solution to Plutora whose benefits quickly became apparent through a number of avenues:

 

Cost Reduction 

Plutora experienced a 30% cost reduction through the migration of SQL Server to RDS, the decommissioning of redundant components, and the elimination of Windows licence costs.

 

Near Real-Time Data

The time for data to become available for reporting was reduced from 20 minutes to just 4.

 

Simplicity

Replacing an in-house ELT system with an open-source project makes Plutora more attractive to IT personnel and helps retain that talent. Further simplicity was achieved by reducing the number of layers in the solution, resulting in reduced cost and accelerated delivery. In addition, the FTE required to maintain and patch servers and databases was reduced.

 

Evolvability

A number of positive changes can now be enjoyed by Plutora, such as the removal of technical debt, decoupling from the vendor, and the ability to adopt agile practices thanks to modern Data & Analytics tooling. The data strategy has created a Single Source of Truth that allows Plutora to benefit from Machine Learning, and merging all logic into an application layer reduces the time to change and deploy.

Conclusion

With clients who require the most up-to-date technical support, Plutora is in a position where data modernisation is absolutely crucial. With a more simplified and adaptable infrastructure, they are now able to offer the best services to their clients across the globe.

Reinventing myDNA Business with Data Analytics

About myDNA

myDNA is a health tech company bringing technology to healthcare with a mission to improve health worldwide. They developed a personalised wellness myDNA test that lets you discover how your body is likely to respond to food, exercise, sleep, vitamins, medications, and more, according to your genome.

It is a life changer for those who want to skip the lengthy trial-and-error process and achieve their desired fitness goals sooner. Moreover, myDNA is a reliable way of assisting practitioners in selecting safe and effective medications for their patients based on their unique genetic makeup. For example, doctors can prescribe antidepressants and post-surgery painkillers that are more likely to be successful in the first instance.

The most exciting part is that this technology, which has historically been so expensive, is now available at an affordable price for normal people like you and me! Not to mention, finding out you have relatives on the other side of the world through a family matching DNA test is pretty cool!

Providing life health services based on accurate data

After replatforming myDNA’s IT systems from a distributed monolithic database to a microservice architecture, the team needed assistance in delivering automated tools and meaningful insights throughout the business. This would give them an understanding of potential areas and markets in which to expand their services, the agility to move and change fast as a business, and an advantage over competitors by delivering the services, products, and customer experience their customers seek. This is all based on data rather than assumptions.

myDNA was seeking a cloud consultant that could assist them in exploring and understanding events by expanding their data and analytics capabilities. In addition, the business planned to increase their data skills so their in-house IT team would be able to maintain and continue building the new applications in a safe and effective environment.

AWS performed a Data Lab with myDNA stakeholders where they co-designed a technical architecture and built a Pilot to start the journey. This gave the myDNA team an understanding of all the AWS cloud data and analytics solutions available. However, they required a personalised and well-designed technology roadmap taking their IT skills and myDNA business goals into consideration, as opposed to a ‘one solution fits all’ strategy. This is exactly what DNX Solutions delivered!

How did DNX Solutions help myDNA establish a modern security data strategy in just one month?

The project started with DNX’s effective and interactive discovery, in which our team identified the company’s needs and built a complete picture of the existing company data, the architecture used, and potential technological and/or team challenges. With that, our team created a clear roadmap where outcomes were evident even before the conclusion of the project.

project road map

In the initial phase, DNX built the MVP using the AWS Console, more general roles and data sources, and built simple reports and dashboards to present basic metrics.
After that, our cloud data experts built a more robust solution fit for production, with a focus on resilience, performance, reliability, security and cost optimisation, using DevOps methodology, CI/CD pipelines, automation and serverless architecture wherever possible.

Once the core platform was established, we brought more data sources, integrating them into the solution, and helped to build more complex and advanced solutions such as Machine Learning.

AWS Services Used

S3 Datalakes
Raw: hosts the data extracted allowing governance, auditability, durability and security controls

DynamoDB / SSM
Stores configuration tables, parameters, and secrets used by the pipeline and ETL Jobs to automate the data process

Crawlers
Crawlers can scan the files in the datalake or databases, infer the schema and add the tables on the data catalogues

Glue ETL
Serverless Spark solution for high performance ETL jobs within AWS

Data Catalogues
Stores the metadata and metrics regarding Databases, Connections, Jobs, partitions, etc. It can grant/deny access up to the table level

QuickSight
Can consume data from multiple sources within AWS and allow user-friendly development of reports, analytics and dashboards integrated with AWS platform

Lake Formation
Low code solution to govern and administer the Data Lake. An additional layer of security including row/column level controls

Lambdas
Wild cards that can help tie the solution together in a variety of roles and use cases

Athena
Athena can query data stored in S3 using simple SQL (see the sketch below). It allows access segregation to metadata and history via workgroups, which can be combined with IAM roles
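As a small illustration of the Athena entry above, the snippet below submits a SQL query over the data lake with boto3. The database, table, workgroup and results location are hypothetical placeholders.

import boto3

athena = boto3.client("athena")

# Hypothetical database, table, workgroup and results location.
response = athena.start_query_execution(
    QueryString="SELECT test_type, count(*) AS orders FROM curated.dna_orders GROUP BY test_type",
    QueryExecutionContext={"Database": "curated"},
    WorkGroup="analytics",
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])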

myDNA to provide real insights at the click of a button

There is no doubt that DNX Solutions delivered value to myDNA. The team reported they were able to deliver another data transformation that depended directly on the result of DNX’s work.

Before engaging DNX, the myDNA team could take three to five days to deliver a few manual reports in response to business queries. The company is now able to deliver different reports based on live data with just a click of a button. Not only does the business have accurate, insightful data to inform its decisions about what, when, and where to invest, but it also has the agility to make those decisions.

The myDNA team can now focus on what they do best rather than spending days merging unreliable information from various sources to produce a handful of outdated reports.

The next step for myDNA is to adopt AWS machine learning to unveil predictions, achieving far better real-world results.

Data Archiving utilising Managed Workflows for Apache Airflow

We assisted a FinTech client in minimising its storage cost by archiving data from RDS (MySQL) to S3 using an automated batch process, in which all data from a specific time range is exported to S3. Once the data is stored on S3, the historical data can be analysed using AWS Athena and Databricks. The solution also had to include a delete strategy to remove all data older than two months.

The database size had been increasing exponentially with the number of logs stored in it. The archiving procedure needed to have minimal impact on the production workload and be easy to orchestrate. For this specific data archiving case we were handling tables with more than 6 TB of data, which had to be archived in the most efficient manner, and part of this data would no longer need to be stored in the database.

In this scenario, Managed Workflows for Apache Airflow (MWAA), a managed orchestration service for Apache Airflow, helps us to manage all those tasks. Amazon MWAA fully supports integration with AWS services and popular third-party tools such as Apache Hadoop, Presto, Hive, and Spark to perform data processing tasks.

In this example, we will demonstrate how to build a simple batch processing that will be executed daily, getting the data from RDS and exporting it to S3 as shown below.

Export/Delete Strategy:

  • The batch routine should be executed daily
  • All data from the previous day should be exported as CSV 
  • All data older than 2 months should be deleted

Solution

  • RDS – production database
  • MWAA – to orchestrate the batches
  • S3 bucket – to store the partitioned CSV files

Data Archiving utilising Managed Workflows for Apache Airflow – solution architecture

As shown in the architecture above, MWAA is responsible for calling the SQL scripts directly on RDS; in Airflow, we use the MySQL operator to execute SQL scripts against RDS.

To encapsulate those tasks we use an Airflow DAG.

Airflow works with DAGs. A DAG is a collection of all the tasks you want to run, defined in a Python script that represents the DAG’s structure (tasks and their dependencies) as code.
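A minimal daily DAG declaration for this scenario might look like the sketch below; the DAG id and default arguments are illustrative assumptions, and the dag object is what the operators in the following snippets attach to.

from datetime import datetime, timedelta

from airflow import DAG

# Hypothetical DAG declaration matching the daily export/delete strategy.
default_args = {
    "owner": "data-team",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="rds_to_s3_archiving",
    default_args=default_args,
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",   # the batch routine runs once a day
    catchup=False,
)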

In our scenario, the DAG will cover the following tasks:

  • Task 1 – Build procedure to export data 
  • Task 2 – Execute procedure for export
  • Task 3 – Build procedure to delete data 
  • Task 4 – Execute delete procedure 

Airflow DAG graph

Creating a function to call a stored procedure on RDS

# Airflow 2.x provider import; on Airflow 1.10 the equivalent is
# airflow.operators.mysql_operator.
from airflow.providers.mysql.operators.mysql import MySqlOperator

# Tables to be exported, keyed by an arbitrary id.
EXPORT_S3_TABLES = {
    "id_1": {"name": "table_1"},
    "id_2": {"name": "table_2"},
    "id_3": {"name": "table_3"},
    "id_4": {"name": "table_4"},
}


def export_data_to_s3(dag, conn, mysql_hook, tables):
    """Create one task per table, each calling the export stored procedure on RDS."""
    tasks = []
    engine = mysql_hook.get_sqlalchemy_engine()
    # Open a connection via the hook; the MySqlOperator tasks created below
    # manage their own connections at runtime.
    with engine.connect() as connection:
        for schema, features in tables.items():
            run_queries = []
            t = features.get("name")  # extract table name
            statement = f'call MyDB.SpExportDataS3("{t}")'
            sql_export = statement.strip()
            run_queries.append(sql_export)
            task = MySqlOperator(
                sql=run_queries,
                mysql_conn_id="mysql_default",
                task_id=f"export_{t}_to_s3",
                autocommit=True,
                dag=dag,
            )
            tasks.append(task)
    return tasks

To deploy the stored procedure we can use the MySQL operator, which will be responsible for executing the “.sql” files, as shown below:

build_proc_export_s3 = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="build_proc_export_to_s3",
    sql="/sql_dir/usp_ExportDataS3.sql",
    on_failure_callback=slack_failed_task,
)

Once the procedure has been deployed, we can execute it using the MySQL hook, which runs the stored procedure through the export_data_to_s3 function defined above.

t_export = export_data_to_s3(
    dag=dag,
    conn="mysql_default",
    mysql_hook=prod_mysql_hook,
    tables=EXPORT_S3_TABLES,
)
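To complete the picture, the delete side can be wired the same way; the snippet below is a sketch in which the delete-procedure script, the stored procedure name and the task ids are assumptions, and the final lines chain the four tasks in the order listed earlier.

# Hypothetical delete-side tasks mirroring the export side.
build_proc_delete = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="build_proc_delete_old_data",
    sql="/sql_dir/usp_DeleteOldData.sql",  # assumed delete-procedure script
)

t_delete = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="execute_delete_old_data",
    sql="call MyDB.SpDeleteOldData()",     # assumed stored procedure name
)

# Task 1 >> Task 2 >> Task 3 >> Task 4
build_proc_export_s3 >> t_export
t_export >> build_proc_delete >> t_delete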

MWAA orchestrates each SQL script called on RDS; two stored procedures are responsible for exporting and then deleting the data. With this approach, all the intensive work (reading and processing data) is handled by the database, and Airflow acts as an orchestrator for each event.

In addition, Aurora MySQL has a built-in SELECT ... INTO OUTFILE S3 statement that can export data directly to S3. That way we do not need another service to integrate RDS with S3: the data is persisted directly in the bucket once the procedure is called.

Example: INTO OUTFILE S3

SELECT id, col1, col2, col3
FROM table_name
INTO OUTFILE S3 's3-region-name://my-bucket-name/mydatabase/year/month/day/output_file.csv'
FORMAT CSV HEADER
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
OVERWRITE ON;

With this statement we don’t need to handle the data with Python scripts in Airflow: the data is processed entirely by the database, and no extra transformation step is needed to output it as CSV.

Conclusion

 

Airflow is a powerful tool that allows us to deploy smart workflows using simple python code. With this example, we demonstrated how to build a batch process to move the data from a relational database to S3 in simple steps.

There is a vast number of features and integrations that can be explored on MWAA; if you need flexibility and easy integration with different services (even non-AWS services), this tool can likely meet your needs.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data engineering.

What is the Real Cost of a Data Breach in 2022?

Did Data Breaches increase in 2021?

One of the biggest changes that occurred as a result of the COVID-19 pandemic is the way in which we work. Whilst remote work began as a temporary fix to deal with lockdowns, it is a shift that has been embraced by numerous businesses over the past two years. Such a sudden change, however, was not free of risk. The unpredictability of recent years has seen a focus on survival, with security falling by the wayside. And while we are all distracted by global happenings, hackers have been taking advantage.

Data breaches and the costs associated with them have been on the rise over the past several years, but the average cost per breach jumped from US$3.86 million in 2020 to US$4.24 million in 2021, becoming the highest average total cost seen in the history of IBM’s annual Data Breach report. Remote working is not solely to blame for increased data breaches, however, companies that did not implement any digital transformation changes in the wake of the pandemic had a 16.6% increase in data breach costs compared to the global average. For Australian companies, it is estimated that 30% will fall victim to some sort of data breach, and consequences can be felt for years. The Australian Cyber Security Centre (ACSC) estimates the cost of cybercrimes for Australian businesses and individuals was AU$33 billion in 2021. To protect your business from becoming a part of these statistics, it is crucial to understand how data breaches can affect you and how to take necessary precautions.

What exactly is a data breach?

Data breaches are diverse; they can be targeted, self-spreading or come from an insider; affect individuals or businesses; steal data or demand ransoms. Although certain Australian businesses are mandated by law to notify customers when a breach has occurred, many attacks are kept quiet, meaning their frequency is higher than commonly believed.

What are the different types of data breaches?

  • Scams/phishing: Fraudulent emails or websites disguised as a known sender or company.
  • Hacking: Unauthorised access gained by an attacker, usually through password discovery.
  • Data spill: Unauthorised release of data by accident or as a result of a breach.
  • Ransomware: Malicious software (malware) accesses your device and locks files. The criminals responsible then demand payment in order for access to be regained.
  • Web shell malware: Malicious scripts planted on a compromised server that give an attacker ongoing remote access to a device or network, a strategy that is becoming more frequent.

The most common category of sensitive data stolen during data breaches is customers’ Personally Identifiable Information (PII). This data not only contains financial information such as credit card details, but can also be used in future phishing attacks on individuals. The average cost per record is estimated at between US$160 and US$180, meaning costs can add up very quickly for a business that loses thousands of customers’ PII in a single attack. All industries can be affected by data breaches, but those with the highest costs are healthcare, financials, pharmaceuticals and technology. According to the 2021 IBM report, each of these industries had a slight decrease in costs associated with data breaches from 2020 to 2021, except for healthcare, which increased by a shocking 29.5%.

What are the costs?

IBM identified the ‘Four Cost Centres’, the categories that contribute most to global data breach costs. In 2021 the breakdown was: lost business (38%), detection and escalation (29%), post breach response (27%) and notification (6%).

Lost business, the highest cost category for seven consecutive years, includes business disruption and loss of revenue through system downtime (such as delayed surgeries due to ransomware in hospitals), lost customers, acquiring new customers, diminished goodwill and reputation losses.

Detection and escalation costs refer to investigative activities, auditing services, crisis management and communications.

Post breach response costs are associated with helping clients recover after a breach, such as opening new accounts and communicating with those affected. These also include legal expenditures, and, with compliance standards such as HIPAA and CDR becoming more commonplace, regulatory fines are adding significantly to costs in this category. Businesses with a high level of compliance failures are spending on average 51.1% more on data breaches than those with low compliance failures.

Notification costs include communications to those affected and regulators, determination of regulatory requirements and recruiting the assistance of experts. In Australia, businesses and not-for-profits with an annual turnover of more than $3 million, government agencies, credit reporting bodies and health service providers are required by law to inform customers of data breaches and how they can protect themselves from such breaches. It is crucial for businesses to be aware of these responsibilities or they may be subjected to paying further fines.

With lost business being the highest cost associated with breaches, it is no surprise that consequences can be felt years after the initial breach. Reports have found 53% of costs to be incurred two to three years after the breach for highly regulated industries such as healthcare and financial services.

Although significantly less than the global average, the average cost of a data breach in Australia still sits at around AU$3.35 million. Approximately 164 cybercrimes are reported each day in Australia, and the attacks are growing more organised and sophisticated. One predictive factor of overall cost is response time: the longer the lifecycle of a data breach, the more it will cost. Whilst a hacker can access an entire database in just a few hours, detecting a breach takes the average Australian organisation over six months! Many organisations never even identify that a breach has occurred, or only find out through victory posts on the dark web. IBM reported that breaches that took more than 200 days to contain cost a business US$1.26 million more than those contained within 200 days. In addition, they found the average data breach lifecycle was a week longer in 2021 compared to the previous year.

How to avoid data breaches?

The way to protect your business against malicious use of advanced and sophisticated technology is by utilising advanced and sophisticated technology in your security systems. IBM found significantly lower overall costs for businesses with mature security postures, utilising zero trust, cloud security, AI and automation. It is estimated that with AI and machine learning, breaches are detected 27% faster. Mature zero trust systems also resulted in savings of US$1.76 million compared to organisations not utilising zero trust. Organisations with mature cloud modernisation contained breaches 77 days faster than other organisations, and those with high levels of compliance significantly reduced costs.

With data breaches on the rise, and modern businesses relying on technology more heavily than ever before, it is reasonable to predict the cost of data breaches in Australia will only increase in 2022. You can avoid becoming a victim and having to pay the price for years to come by modernising your data and meeting industry compliance regulations.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data security.

Data Dependency

The Importance of Data Dependency

Why not investing in data platforms is setting your company up for disaster.

More often than not, companies with legacy systems or workloads face one of three problems. Perhaps your company has already experienced issues with time to market, bugs in production, or limited test coverage and a resulting lack of confidence in releasing new features. These are the issues usually picked up on by the CTO or a technical leader who recognises the need to invest in architecture to increase the quality and speed of progress. There is, however, another important underlying problem that no one seems to be talking about.

How and where are you storing your data?

Looking at a typical legacy system these days, it is likely to be a Java or .NET application using a relational database that stores a huge amount of data; we have seen companies with tables containing up to 13 years of data!

When we ask customers why they keep all their data on the same database, they rarely have an answer. Often, old or irrelevant data has been retained for no particular reason; it has simply been forgotten about and ended up getting lost among the masses.

With ever-increasing amounts of data come ever-increasing response times for querying information from the database. Whilst this may not be noticeable day to day, it can lead to serious consequences, such as losing valuable time and revenue while waiting for a backup to restore after a database outage.

It is puzzling that, whilst we have our best minds poring over the most minute details, we largely ignore the way in which we store data. It seems we have a collective ‘out of sight, out of mind’ attitude.

It is extremely common to come across companies that are generating reports from a single database. But here’s the interesting part: each piece of software is unique in terms of security and operation, meaning storage is different for each and every one. Let’s consider an ecommerce store. In this case you would want to organise your tables and data in a way that allows users to easily add items to their shopping cart, place an order, and pay. To make this possible, you would have a shopping cart table, an orders table and a products table, which is what we call normalising the database – a relational database. So far, so good. Now let’s look at what happens when you want to run a report. To fully understand your ecommerce business you will want to see your data in various ways, for example: number of sales in NSW in the last seven days; average shopping cart price; average checkout amount; average shipping time frame. Each of these scenarios requires data from multiple sources, but by keeping all your data on the same database you are risking the whole operation, as the simplified sketch below illustrates.
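
To make that concrete, here is a deliberately simplified sketch (SQLite in Python, with made-up table and column names) of a normalised schema where the same database has to serve both the shopping cart and the reports:

import sqlite3

# Illustrative only: a tiny normalised ecommerce schema.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products    (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders      (id INTEGER PRIMARY KEY, state TEXT, created_at TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              product_id INTEGER REFERENCES products(id),
                              quantity INTEGER);
""")

# Transactional work: adding an item to an order touches a row or two.
con.execute("INSERT INTO products VALUES (1, 'T-shirt', 29.90)")
con.execute("INSERT INTO orders VALUES (1, 'NSW', date('now'))")
con.execute("INSERT INTO order_items VALUES (1, 1, 2)")

# Analytical work: even a simple report needs to join several tables, and on a
# real system it scans years of history on the same database serving customers.
report = con.execute("""
    SELECT o.state,
           COUNT(DISTINCT o.id)      AS sales,
           SUM(p.price * i.quantity) AS revenue
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    JOIN products p    ON p.id = i.product_id
    WHERE o.created_at >= date('now', '-7 days')
    GROUP BY o.state
""").fetchall()
print(report)  # e.g. [('NSW', 1, 59.8)]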

Just as you may lose deals if customers have to wait five seconds to add an item to their shopping cart, you also lose valuable resources while waiting an hour to generate a report – something that is not uncommon to see on legacy applications. Not to mention the direct and indirect consequences of having to wait hours to restore a backup after a database outage (that is, if you even have a backup!).

By choosing not to modernise your data, your business is perched squarely on a ticking time bomb. With a typical ratio of 15 to 20 developers to 1 Database Administrator (DBA), the DBA is without a doubt the underdog. If the DBA’s suggestions are ignored, developers may begin to modernise their source code and adopt microservices whilst the company’s data, in its entirety, is left exactly where it was.

So what happens next?

Each microservice now has to manage its own state. It will most likely use a database to do so, although it could use something else entirely.

Instead of having separate tables for your shopping cart, orders and products, you now have a product microservice with its own product table, far from the customer microservice and its customer table, which lives in a different database. In addition, the shopping cart may now sit in a NoSQL database.

Now comes the time to run your reports, but you can no longer do a SELECT on a single database and join all the tables, because they no longer live in the same place. You find yourself with a whole host of different problems and a new level of complexity.
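
Purely to illustrate that new complexity (the service URLs and field names below are hypothetical), report logic that used to be a single SQL statement now has to live in application code:

import requests  # assumption: each microservice exposes a simple REST API

# What used to be a single JOIN is now two network calls plus a manual merge.
orders = requests.get("https://orders.internal/api/orders?days=7").json()
products = requests.get("https://products.internal/api/products").json()

price_by_product = {p["id"]: p["price"] for p in products}

revenue_by_state = {}
for order in orders:
    amount = sum(price_by_product[item["product_id"]] * item["quantity"]
                 for item in order["items"])
    revenue_by_state[order["state"]] = revenue_by_state.get(order["state"], 0) + amount

print(revenue_by_state)  # and this still ignores paging, retries and partial failures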

Consider the data dimension to fully modernise your application

Now that you understand the importance of data modernisation, you need to know a few key points. To take full advantage of the cloud when modernising your architecture and workloads, you have to find out which tools the cloud has to offer. First, you need to understand that Microservices have to manage states and will likely use a database to do so, due to transactional responsibility. For example, when you create a new product for your ecommerce store, you want it to exist until you actively decide to discontinue it, so you don’t want the database to forget about it – we refer to this as durability.

Consistency is equally important; for example, when you mark a product as unavailable, you do not want it to be included in new orders. This is the transactional orientation.

Now we need to understand the analytical view. In order to see how many products you are selling to students in year 8 to year 12, you need to run a correlation between products, customers and orders. This requires a way of viewing things differently. Most companies choose to build a data warehouse, where they can store data in a way that enables them to slice and dice the dimensions they are looking at. Whilst this is not optimal for transactional operations, it is optimal for analytical operations.

That segregation is crucial. With it in place, you can keep your microservices, each with its own database and its own state, in an architecture that is completely decoupled from the analytical data warehouse, the facility that enables and empowers the business to understand what is actually happening.

This is hugely important! Operating without these analytical capabilities is like piloting a plane with no radio or navigation systems: you can keep flying, but you have no idea where you are going, nor what is coming your way! This analytical capability is crucial to the business, but you have to segregate that responsibility. Keeping your newly modernised architecture independent from your data warehouse and analytical capability is key.

So, where do we go from here? Utilising Data Lakes

DNX has assisted companies in achieving high levels of success through the adoption of data lakes. A data lake can contain structured and unstructured data, as well as all the information you need from microservices, transactional databases and other sources. If you want to include external data from the market, such as fluctuations in oil prices, go ahead: you can put that into the data lake too! You should take care to extract and clean your data where you can before putting it into the data lake, as this will make its future journey smoother.
Once all your data is in the data lake, you can then mine relevant information and input it in your data warehouse where it can be easily consumed.

Data modernisation can save your company from impending disaster, but it is no small feat!
Most people assume it is as simple as breaking down a monolith into microservices, but the reality is far more complex.

When planning your data modernisation you must consider reporting, architectural, technical and cultural changes, as well as the transactional versus analytical responsibilities of storing state, and their segregation. All of this becomes part of your technology road map and shows you the way to a more secure future for your business.

If you would like to know how we have achieved this for multiple clients, and how we can do the same for you, contact us today.

At DNX Brasil, we work to bring a better cloud and application experience to digital-native companies.

We focus on AWS, Well-Architected solutions, containers, ECS, Kubernetes, continuous integration/continuous delivery and service mesh.

We are always looking for professionals experienced in cloud computing to join our team, with a focus on cloud-native concepts.

Check out our open-source projects at https://github.com/DNXLabs and follow us on Twitter, Linkedin or YouTube.

DbT and Redshift to provide efficient Quicksight reports

Using DbT and Redshift to provide efficient Quicksight reports

TL;DR:

Using Redshift as a Data Warehouse to integrate data from AWS Pinpoint, AWS DynamoDB, Microsoft Dynamics 365 and other external sources. 

Once the data is ingested into Redshift, DbT is used to transform it into a format that is easier for AWS Quicksight to consume.

Each Quicksight report/chart has a fact table. This strategy allows Quicksight to efficiently query the data needed.

The Customer

The client is a health tech startup. They created a mobile app that feeds data to the cloud using a serverless architecture. They have several data sources and would like to integrate this data into a consolidated database (Data Warehouse). This data would then be presented in a reporting tool to help the business drive decisions. The client’s data sources:

  • AWS DynamoDB – User preferences
  • AWS Pinpoint – Mobile application clickstream
  • Microsoft Dynamics 365 – Customer relationship management
  • Stripe – Customer payments
  • Braze – A customer engagement platform

The client also needs to send data from the Data Warehouse to Braze, used by the marketing team to develop campaigns. This was done by the client, using Hightouch Reverse ETL.

The Solution

Figure 1: Overall architecture of the DbT solution

The overall architecture of the solution is presented in Figure 1. AWS Redshift is the Data Warehouse, receiving data from Pinpoint, DynamoDB, Stripe and Dynamics 365. Quicksight then queries data from Redshift to produce business reports. In the following sections, we describe each data source integration. As a cloud-native company, we work towards allowing our clients to easily manage their cloud infrastructure. For that reason, the infrastructure was provisioned using Terraform, which allowed the client to apply the same network and data infrastructure across their 3 different environments with ease.

DynamoDB

The users’ preferences are stored in AWS DynamoDB. A simple AWS Glue job, created using Glue Studio, is used to send DynamoDB data to Redshift. It was not possible to use Redshift’s COPY command, as the client’s DynamoDB tables contain complex attributes (SET). The job contains a small custom function to flatten the JSON records from DynamoDB, presented in Table 1. For Glue to access the DynamoDB tables, we also needed to create a VPC endpoint.

# Custom transform used in the Glue Studio job: flattens the nested DynamoDB
# records with Relationalize and drops null fields before writing to Redshift.
from awsglue.dynamicframe import DynamicFrameCollection
from awsglue.transforms import Relationalize, DropNullFields

def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    df = dfc.select(list(dfc.keys())[0])  # take the single incoming frame
    dfc_ret = Relationalize.apply(frame=df, staging_path="s3://bucket-name/temp",
                                  name="root", transformation_ctx="dfc_ret")
    df_ret = dfc_ret.select(list(dfc_ret.keys())[0])  # the flattened "root" frame
    dyf_dropNullfields = DropNullFields.apply(frame=df_ret)
    return DynamicFrameCollection({"CustomTransform0": dyf_dropNullfields}, glueContext)

Pinpoint

The mobile app clickstream is captured using AWS Pinpoint and stored on S3 via an AWS Kinesis delivery stream. There are several ways to load data from S3 into Redshift: the COPY command, a Glue job or Redshift Spectrum. We decided to use Redshift Spectrum, as we need to load the data every day and Spectrum lets us rely on the S3 partitions to filter the files to be loaded. The Pinpoint bucket contains partitions for year, month, day and hour, and at each run of our ELT process we filter the S3 load based on the latest date already loaded. The partitions are created automatically by a Glue Crawler, which also parses the JSON into struct column types. Table 2 shows a SQL query that illustrates the use of Spectrum partitions.


select
  event_type,
  event_timestamp,
  arrival_timestamp,
  attributes.page,
  attributes.title,
  session.session_id as session_id,
  client.cognito_id as cognito_id,
  partition_0::int as year,
  partition_1::int as month,
  partition_2::int as day,
  partition_3::int as hour,
  sysdate as _dbt_created_at
from "pinpoint-analytics".bucket_name

-- this filter will only be applied on an incremental run
where partition_0::int >= (select date_part('year', max(event_datetime)) from stg_analytics_events)
  and partition_1::int >= (select date_part('month', max(event_datetime)) from stg_analytics_events)
  and partition_2::int >= (select date_part('day', max(event_datetime)) from stg_analytics_events)

Microsoft Dynamics 365 and Stripe

Two important external data sources required in this project are CRM data from Dynamics 365 and payment information from Stripe. An efficient and user-friendly service that helps with this kind of data integration is Fivetran. Fivetran offers a wide range of connectors, including connectors for Microsoft Dynamics and Stripe, and its easy-to-use interface was essential for this client.

DbT – ELT Flow

The client wanted a data transformation tool that was scalable, collaborative and allowed version control. DbT was our answer. As we have seen with many other clients, DbT has become the go-to answer when it comes to running ELT (Extract, Load, Transform) workflows. After we built the first DAGs (Directed Acyclic Graphs) with DbT, using Jinja templates for raw tables (sources) and staging tables (references), and showed them to the client, they were amazed by the simplicity and the software-engineering discipline that DbT brings. Having an ELT workflow that is source-controlled is a distinctive feature of DbT.

In DbT, the workflow is separated into different SQL files. Each file contains a partial staging transformation of the data, until the data is consolidated into a FACT or DIMENSION table. These final tables are built from one or more staging tables. Using Jinja templates to reference tables from one another allows DbT to create a visual representation of the relationships; Figure 2 presents an example of a DbT visualisation. DbT allowed us to create tables that could be efficiently queried by Quicksight.

Figure 2: DbT visualisation

Quicksight

Once the data is organised and loaded into Redshift, it is time to visualise it. AWS Quicksight integrates easily with Redshift and several other data sources. It provides a number of chart options and allows clients to embed reports in their internal systems. For this client, we used bar charts, pie charts, line charts and a Sankey diagram for customer segment flow. The client was very happy with the look and feel of the visualisations and with the loading speed. Some minor limitations of Quicksight include a) not being able to give each of multiple Y-axes its own title and b) not being able to make the Sankey diagram follow the dashboard theme. Apart from that, it allowed us to greatly improve the client’s ability to make data-driven decisions.
A great next step regarding Quicksight would be to implement QuickSight object migration and version control from staging to production environments.
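
As a rough sketch of what that promotion step could look like using the QuickSight API through boto3 (the account id, ARNs, object ids and names below are placeholders, and this is only one of several possible approaches):

import boto3

qs = boto3.client("quicksight", region_name="ap-southeast-2")
ACCOUNT_ID = "111111111111"  # placeholder account id

# 1. Capture the staging analysis as a versioned template.
qs.create_template(
    AwsAccountId=ACCOUNT_ID,
    TemplateId="sales-dashboard-template",
    Name="Sales dashboard template",
    VersionDescription="promoted from staging",
    SourceEntity={
        "SourceAnalysis": {
            "Arn": f"arn:aws:quicksight:ap-southeast-2:{ACCOUNT_ID}:analysis/staging-sales-analysis",
            "DataSetReferences": [{
                "DataSetPlaceholder": "sales",
                "DataSetArn": f"arn:aws:quicksight:ap-southeast-2:{ACCOUNT_ID}:dataset/staging-sales",
            }],
        }
    },
)

# 2. Create the production dashboard from that template, pointing the
#    placeholder at the production dataset instead of the staging one.
qs.create_dashboard(
    AwsAccountId=ACCOUNT_ID,
    DashboardId="sales-dashboard-prod",
    Name="Sales dashboard",
    SourceEntity={
        "SourceTemplate": {
            "Arn": f"arn:aws:quicksight:ap-southeast-2:{ACCOUNT_ID}:template/sales-dashboard-template",
            "DataSetReferences": [{
                "DataSetPlaceholder": "sales",
                "DataSetArn": f"arn:aws:quicksight:ap-southeast-2:{ACCOUNT_ID}:dataset/prod-sales",
            }],
        }
    },
)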

Conclusion

In this article, we described a simple and efficient architecture that enabled our client to obtain useful insights from their data. Redshift was used as the central repository of data, the Data Warehouse, receiving ingestion from several data sources such as Pinpoint, DynamoDB, Dynamics and Stripe. DbT was used for the ELT workflow and Quicksight for the dashboard visualisations. We expect to be using this same architecture for clients to come as it provides agile data flows and insightful dashboards.
