Case_Magama

Magama: Integrating applications with an Amazon Lex chatbot

About Magama

Magama is a Chilean startup with four years in the market that delivers an innovative digital experience. This is possible because Magama uses impressive immersive solutions that transport its clients into virtual reality through 3D virtual tours, designed both for events and for engineering and architecture work.

Magama also explores the metaverse. There, it combines artificial intelligence with the virtual world and a chatbot that guides users as they navigate. A voice assistant also brings a range of features to the user.

Connecting the world of chatbots to virtual reality

In this project, Magama wanted to add a chatbot to its solutions so that end users would have an even more immersive and fluid experience. The chatbot would let users, for example, get their questions about the virtual space answered automatically.

Magama had identified AWS as its main cloud technology provider, and in DNX Brasil it found the ideal partner to turn that vision into reality. An additional challenge was the need to swap technologies because of a product discontinuation; together with Magama, we adjusted the proposed solution to meet the new requirements.

From a technical standpoint, Magama needed to connect its virtual solution to a chatbot, as well as to other channels, such as messaging. This called for an integration capable of connecting multiple systems to the chatbots. Beyond the chatbot connection itself, analytics and service-quality metrics for the chatbots would also be implemented.

The solution: an API and a dashboard

Our solution was split into two parts. First, there was the need to integrate applications with any Amazon Lex chatbot (in our case, Lex V2). For that, we built a serverless API that brokers this communication. With Amazon's technology, the integration supports communication both via text and via the user's voice, and it can return synthesized speech from the chatbot to enable more natural use cases. Amazon API Gateway and AWS Lambda were the main services used, alongside Amazon Lex itself.
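
For illustration, here is a minimal sketch of what such a Lambda-backed integration can look like with boto3. The bot IDs, environment variables, and event shape are placeholders, not Magama's actual implementation; the sketch only shows the Lex V2 runtime calls (recognize_text for text, recognize_utterance for audio) that an API Gateway-fronted Lambda would typically make.

# Hypothetical Lambda handler brokering text messages between a client app and Amazon Lex V2.
# Bot identifiers and the event shape are illustrative only.
import json
import os

import boto3

lex = boto3.client("lexv2-runtime")

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    response = lex.recognize_text(
        botId=os.environ["BOT_ID"],                      # placeholder configuration
        botAliasId=os.environ["BOT_ALIAS_ID"],
        localeId=os.environ.get("LOCALE_ID", "pt_BR"),
        sessionId=body.get("sessionId", "anonymous"),
        text=body["text"],
    )
    # For voice, recognize_utterance would be used instead, passing the audio stream
    # and a requestContentType such as 'audio/l16; rate=16000; channels=1'.
    messages = [m.get("content", "") for m in response.get("messages", [])]
    return {"statusCode": 200, "body": json.dumps({"messages": messages})}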

The second part of our solution was an analytics dashboard for Amazon Lex. Here we used Amazon CloudWatch Logs Insights, which consumes Amazon Lex's native logs and visualizes the results in a dashboard.
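
As a rough sketch of the kind of query behind such a dashboard, the snippet below runs a CloudWatch Logs Insights query against a Lex conversation log group via boto3. The log group name and the aggregation are assumptions for illustration; the actual dashboard queries are not published in the case study.

# Hypothetical Logs Insights query over Amazon Lex conversation logs.
import time

import boto3

logs = boto3.client("logs")

query = """
fields @timestamp, sessionId
| stats count(*) as turns by bin(1h)
"""

start = logs.start_query(
    logGroupName="/lex/magama-bot-conversation-logs",  # placeholder log group
    startTime=int(time.time()) - 7 * 24 * 3600,        # last 7 days
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query finishes, then print the aggregated results.
while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})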

The entire solution and its infrastructure were written as code (IaC) for easy replication, modification, and control. This met Magama's need to create multiple dashboards for its variety of clients.

Interaction inside and outside virtual reality

The solution delivered is agnostic, since it is parameterizable enough to integrate any Amazon Lex chatbot and to visualize the desired metrics. This enables Magama's goal of delivering chatbot-driven innovation in multiple environments, inside and outside virtual reality, while capturing relevant data for visualization in the dashboard.

Another benefit of the project is that the API can be exposed directly to Magama's customers. At the same time, Magama keeps control over API usage, which matters for managing cost per user or per application.

Last but not least, even with the challenge of adjustments to scope and ideation, Magama was well served with a solution that allows it to grow and become more scalable.

About DNX Brasil

DNX Brasil delivers the best cloud computing experience to its clients. Our solutions are built on the AWS cloud and include AWS Well-Architected, ECS containers, Kubernetes, continuous integration/continuous delivery, service mesh, big data, analytics, and artificial intelligence.

Our team of specialists is made up of experienced, qualified, AWS-certified professionals focused on cloud-native concepts.

Check out our open-source projects here and follow us on LinkedIn.

Discover the value of data

Effective leadership depends on using data to make important decisions. It takes a broad view, backed by reliable information, to act meaningfully. That is what a modern data strategy is built for: delivering insights to the people and applications that need them, securely and at any scale.

DNX Brasil helps your company apply data analytics to its most business-critical use cases, with complete solutions backed by deep data expertise.

Case_GalaxPay

Galax Pay: Cloud migration secures a major investment for the company

About Galax Pay

Galax Pay is an automated platform for managing credit card, boleto, and Pix payments. As a Brazilian fintech, Galax Pay integrates with credit card operators to simplify recurring billing. The platform also offers complete sales-data reports, a payment gateway for one-off invoices, customizable reports, automated management, and other tools that make billing easier to run.

The company understood that one of the biggest challenges faced by Brazilian business owners is the lack of financial predictability, which holds back investments and improvements in their businesses. The Galax Pay payment system was created to solve that problem by giving companies confidence that their monthly payments will be received.

In 2015, defaults were growing at an alarming rate as an economic crisis hit the country. It was then that Márcio Vinícius, Galax Pay's current CEO, saw that improving companies' billing and collection processes was essential. Galax Pay emerged at a time when no company offered automatic credit card payment services at a price customers could afford.

About the system

Galax Pay's main goal is to simplify payment management through automation and to streamline the collection of one-off and recurring payments. Acting as an intermediary between banks, companies, and customers, the Galax Pay platform lets payments be made and received through several methods, including authorized direct debit and Pix, the free instant electronic payment platform run by the Central Bank of Brazil.

Galax Pay makes it easier for companies to communicate with their end customers and gives them full control over all payments through reports. Today, Galax Pay processes more than R$45 million per month and serves more than 2,700 customers.

The Company's Challenge

Galax Pay's early growth was slow because of constraints in its on-premises infrastructure. The daily problems that infrastructure caused demanded almost all of the team's focus, leaving little time to develop the solution.

The Galax Pay team had 27 people, and at least 10 of them were directly involved in release processes, environment monitoring, and the creation of test and validation environments. On top of that, other departments ran with very lean teams, which made growth difficult: with an on-premises infrastructure, the more developers you hire, the more the infrastructure has to grow to accommodate them.

The lack of automated deployments (CI/CD pipelines) and of deployment strategies meant that rolling out new versions of the application caused widespread unavailability. The repository was being misused: GitLab development-branch concepts were applied incorrectly. Without containers, each application had to be configured on every developer's machine, which caused availability problems in the final environment. This entangled the development and test environments, creating a heavy need for test environments and a large number of merges before a release could reach production.

Packages were generated manually and placed on the server, with no continuous integration (CI) or continuous delivery (CD) pipeline and no defined deployment strategy, such as blue-green deployment. On top of that, a single version was released to all customers at once.

Most releases caused service interruptions for end customers, which can be very costly for a fintech's reputation, eroding the perception of efficiency and reliability. The way GitLab repositories were used and the non-prod environment strategy also needed to be reviewed so the company could manage quality assurance through test environments and speed up releases through automation.

The fintech also needed to comply with PCI DSS in the payments sector, attesting to its commitment to the Payment Card Industry Data Security Standard. Although having a secure environment is the first step toward meeting the industry's security standards, what really counts is the ability to remain continuously compliant.

It was in this context that Galax Pay approached DNX for help migrating its on-premises infrastructure to the cloud, something that would enable the growth the company was aiming for. Through this transformation, DNX directly influenced Galax Pay's ability to attract investors and scale its commercial growth, which culminated in an investment from CelCoin.

The Process

  • Assessment Phase

Through executive briefings, DNX understood and catalogued Galax Pay's existing infrastructure. This stage demands a lot of skill and is a critical part of the migration journey. It allowed the DNX team not only to understand the dependencies and common problems in the environment, but also to estimate a Total Cost of Ownership (TCO), giving Galax Pay better visibility into its own business. At the end of this phase, DNX had identified the resources and applications needed to carry out the migration.

DNX also identified redundancies and underused resources, including databases replicated across several servers and machines bought to handle demand on specific dates, such as Black Friday, that sat idle for the rest of the year. Identifying these extra costs helped Galax Pay make decisions that increased its opportunities for cost reduction and scale.

The main outcome of the assessment phase was a high-level business case laying out several strategies for the team to reach the project's goals. The business analysis allowed Galax Pay to weigh all available options against its own priorities and needs, which ultimately led to more solid decisions for the project.

Based on the assessment of customer-facing processes, the best solution found was to migrate the applications to containers. Containers provide a standard way to package an application's configuration, code, and dependencies into a single object, sharing only the operating system installed on the server. Using containers lets the team deploy quickly, reliably, and consistently, regardless of the environment.

As an evolution of virtualization, containers can scale an application quickly because they need very little startup time. This approach simplifies deployment automation, since the application is packaged once and can be deployed to different environments, such as development, staging, and production.

DNX concluded that this was the best way to keep pace with the application's development: once containerized, everything the application needs to run travels with it. The overarching strategy was to guarantee maximum availability for the end user.

  • Mobilisation Phase

After the assessment came planning, the moment when DNX started designing the new architecture and the migration plan around Galax Pay's needs. DNX assessed the cloud response-time gaps and the interdependencies between applications uncovered in the previous phase. All possible migration strategies were also evaluated to make sure the most suitable one was selected and reflected in the business case. During the Mobilisation phase, the DNX team deployed Citadel, a cloud infrastructure architected to AWS Well-Architected standards and ready to comply with international regulatory frameworks such as PCI DSS, HIPAA, ISO 27001, and CDR. The team then worked with the client to design the application platform.

The solution presented to Galax Pay was to carry out the migration by modernising the application and adopting containers on Amazon ECS, running on Fargate. ECS allows metrics such as CPU, memory, and connection count to be configured to drive auto scaling. Fargate was chosen to achieve the elasticity and agility the Galax Pay application needed, since it allows containers to run without the need to manage servers or clusters of EC2 instances.

Fargate simplifies things for Galax Pay by removing the need to choose a server type and to spend time sizing and packing clusters. Another reason Fargate was the perfect choice here is that it meets the PCI compliance requirements the environment demands. Using Fargate means Galax Pay does not need to continuously patch an operating system or run anti-virus software to keep machines secure.
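
To make the auto scaling idea above concrete, here is a hedged boto3 sketch of how a CPU-based target-tracking policy can be attached to an ECS service running on Fargate. The cluster name, service name, and thresholds are illustrative; they are not Galax Pay's actual configuration.

# Hypothetical target-tracking auto scaling for an ECS service on Fargate.
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/galaxpay-cluster/galaxpay-api"  # placeholder cluster/service names

# Register the ECS service's desired count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Scale out and in to keep average CPU around 60%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)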

Before starting the third and final phase of the project, DNX completed the landing zone setup on Citadel's secure foundation, preparing the ground for the migration of several pilot applications.

  • Migration Phase

Once the pilot applications had proven successful, the migration of the rest of Galax Pay's data to the secure environment built on AWS began. So that Galax Pay could take full advantage of everything AWS has to offer, the DNX team also modernised the workloads during the migration. By modernising data and applications with cloud-native concepts, Galax Pay set itself up for a successful future in which the efficiency of its operations is optimised.

By replicating the database, DNX ensured active data synchronisation, allowing data to be replicated into the operational environment and reducing downtime for cutover. In other words, going beyond a simple lift-and-shift strategy allowed Galax Pay to avoid carrying the problems of the past into the company's future.
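
The case study does not name the replication tooling, but on AWS this kind of continuous, low-downtime replication is commonly done with AWS Database Migration Service. Below is a minimal, hypothetical sketch of creating a full-load-plus-CDC task with boto3; all ARNs and identifiers are placeholders.

# Hypothetical AWS DMS task for full load plus ongoing replication (CDC),
# the usual pattern for keeping source and target in sync until cutover.
import json

import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-aws-cutover",
    SourceEndpointArn="arn:aws:dms:region:account:endpoint:source",    # placeholder
    TargetEndpointArn="arn:aws:dms:region:account:endpoint:target",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:region:account:rep:instance",  # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)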

Galax Pay came to DNX Brasil looking for an on-premises-to-cloud migration, but the final delivery exceeded expectations. The client wanted a lift-and-shift migration to AWS; we delivered a full modernisation aligned with AWS quality standards. Galax Pay knew about this kind of solution but imagined it as something for the future. Instead, we implemented it right away, sparing Galax Pay from having to take on another project later.

With the results achieved, Galax Pay:

  • Increased the perceived availability and performance of the application
  • Reduced the turnaround time for improvements and bug fixes and their effective release, which was reflected in a higher score on the online review platform Reclame Aqui
  • Gave customers greater security by meeting the PCI DSS standards

The application modernisation was delivered as part of the migration project, increasing agility and security and allowing Galax Pay to hit targets that had been projected for years into the future.

 

Increased Investment and Growth

 

From 2020 to 2022, Galax Pay grew its fiscal-year revenue by 420%. Over the same period, its customer base grew by roughly 150%, from 1,116 to 2,784 customers.

With the operational challenges caused by an outdated infrastructure resolved by the migration DNX carried out, business and marketing strategies could take centre stage. The results attracted the CelCoin investment, which acted as a financial catalyst for the business. The secure, scalable foundation delivered by DNX Brasil ensured Galax Pay was ready to handle sudden spikes in traffic.

It is estimated that the customer growth Galax Pay achieved would have taken five years had it kept its on-premises infrastructure.

 

More Frequent Releases

 

As a fintech whose digital solution is delivered through digital application channels, technology is at the core of the business. The DNX team implemented deployment automation and shared knowledge with Galax Pay about GitLab and non-production environments. This allows new versions of the application to be delivered continuously, every day.

 

Peace of Mind

 

Galax Pay now operates on a secure cloud foundation, Citadel, which provides operational and compliance peace of mind through greater resilience, reliability, and security.

 

Greater Development Capacity

 

Replacing manual updates with automation optimised the team's use of time. With infrastructure concerns resolved, Galax Pay's development team now has time to focus on the company's core goals and build new features for the solution.

Automation also allowed Galax Pay to ship new features at a pace that meets its customers' expectations. Quality assurance was also improved through the creation of test and production environments, so new features can be tested before being released to the end user.

Before engaging DNX, Galax Pay was restricted to releasing new features manually, and only at weekends. Now the team has the flexibility to release new features three to four times a day.

 

PCI Compliance

 

The environment built with the Citadel solution allows the Galax Pay platform to reach PCI compliance quickly, because the environment is PCI-compliant by design. Galax Pay also used DNX Managed Services, a service offered by DNX, to collect evidence for an external audit firm, which confirmed its compliance. This secured the company's PCI certification.

Ongoing Use of Managed Services

Recognising the effectiveness of DNX's work throughout the project, Galax Pay chose to keep using DNX Managed Services, which has been adding value to the company for more than a year.

Today, DNX provides an SRE extension service to Galax Pay, acting as its expert AWS and DevOps partner. With a trusted partnership in place, Galax Pay does not need to go to the job market in search of specialised talent. This benefits Galax Pay's end customers, since the team can stay focused on what makes the application run better: fixing bugs, implementing improvements, and adding new features that make life easier for the people and companies that rely on Galax Pay's service.

Check out our open-source projects at github.com/DNXLabs and follow us on LinkedIn, Twitter, and YouTube.

Cromai: Deep Learning training 15x faster in the cloud

About Cromai

Cromai is an agtech founded in 2017, focused on efficiently improving the lives of agricultural producers. To do that, it applies frontier technology, mainly machine learning with computer vision, to automatically identify patterns in images collected in the field, delivering diagnostics that enable more precise decision-making.

Attuned to the complexity of the field, Cromai lets farmers reach their full productive potential using AI in a simple, sustainable way. Sensor-based solutions can filter out plant impurities in sugarcane, for example. For weeds, it is also possible to identify where they sprout and point the farmer to the best way to manage them.

These systems process and analyse factors that generate results for producers across Brazil. That drew international attention and led StartUs Insights to select Cromai as one of the five most promising computer vision startups for agriculture in the world.

The challenges of one of the world's most promising startups

The main challenge was to optimise machine learning training time: generating a new model version took so long that it directly affected the core of the business. We brought the machine learning training to the AWS cloud, making it possible to train several new image-based models.

To give a sense of the data volume for the weed solution, the dataset held more than 20 million images, which raised the need for a more robust training cluster. Cromai used a server with a single GPU for training its deep learning models, and with that configuration experiments were slow, taking around three months to train a model.

The benefits of training neural networks on multiple GPUs in parallel

Understanding Cromai's needs, the goal of our solution was to reduce training time without significantly affecting its cost or the model's performance metrics. Knowing what Amazon SageMaker makes possible, we were confident we could deliver a good result.

From the start, we had two big advantages that contributed to the project's success. The first is that on AWS it is possible to use very powerful training instances, equipped with several modern GPUs each. That change alone brings benefits in raw performance.

Second, training can be distributed across more than one instance. This is not a trivial task, since neural network training, even when distributed, needs to stay synchronised across instances and GPUs. Frameworks such as SageMaker distributed exist for this purpose.

In our project, due to a technical requirement, we chose Horovod, an open-source distributed training framework for deep learning algorithms.

Amazon SageMaker supports this framework, so our main task was adapting Cromai's training script to the Amazon SageMaker environment. We used S3 to store the training data and, most importantly, added the Horovod layer to the training script.

We also created an easy, cost-transparent way for Cromai to choose the number and type of instances for each training run.
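
A simplified sketch of what launching such a distributed training job can look like with the SageMaker Python SDK is shown below, assuming a TensorFlow training script already adapted for Horovod (Horovod also supports other frameworks). The entry point, role, bucket, instance type, and instance count are placeholders, exposed as parameters precisely so the number and type of instances, and therefore the cost, can be chosen per run.

# Hypothetical SageMaker training job using Horovod (MPI) distribution.
# Entry point, role, bucket names, and instance settings are illustrative only.
from sagemaker.tensorflow import TensorFlow

def launch_training(instance_type="ml.p3.8xlarge", instance_count=2, processes_per_host=4):
    estimator = TensorFlow(
        entry_point="train_horovod.py",  # training script with hvd.init(), etc.
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        instance_type=instance_type,
        instance_count=instance_count,
        framework_version="2.4",
        py_version="py37",
        distribution={"mpi": {
            "enabled": True,
            "processes_per_host": processes_per_host,  # typically one process per GPU
        }},
    )
    # The channel name "training" becomes SM_CHANNEL_TRAINING inside the container.
    estimator.fit({"training": "s3://example-bucket/weeds-dataset/"})
    return estimator

if __name__ == "__main__":
    launch_training()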

Reduced training time and its impact on the business

Cutting training time was fundamental for scaling Cromai's projects; the long model training time was directly affecting the success of the business.

Thanks to our team's command of what Amazon SageMaker can do and the strategy we designed, we resolved this pain point effectively.

The solution we built dramatically reduced training time, from 3 months to 6 days, while keeping all existing performance metrics. If needed, Cromai has the option of increasing its training spend to get results in as little as 3 days.

With the shorter training time, iteration became more frequent, which increased agility, and Cromai's technology team now spends more time doing what it loves: making the solutions better and better suited to the reality of the rural producer.

About DNX

At DNX Brasil we work to bring the best cloud and application experience to digital-native companies in Brazil.

We focus on AWS, Well-Architected solutions, containers, ECS, Kubernetes, continuous integration and delivery, service mesh, and data solutions (data platforms, data lakes, machine learning, analytics, and BI).

Check out our open-source projects at github.com/DNXLabs and follow us on LinkedIn, Twitter, and YouTube.

Written by: Ladislav Vrbsky and Luis Campos / Reviewed by: Camila Targino

DevOps is contributing to CreditorWatch’s Digital Transformation

How DevOps is contributing to CreditorWatch’s Digital Transformation

We live in a Digitally Transformed world where technology allows new forms of work in a rapidly changing environment. Traditional businesses are challenged by start-ups and tech companies with innovative and disrupting business models. New apps and services are created and become obsolete in the blink of an eye.

The traditional development, test, production, and operation models no longer serve our high-speed, connected world, but rather create bottlenecks and friction between departments. Each of the technology areas ends up becoming a silo with strict interaction rules.

On one side of the ring, we have development, trying to answer in the best and fastest way it can through the use of business insights, agile methodologies, and modern architectures and languages. In the other corner, there is IT operations, on a quest for stability and control of the production environments. IT operations is tasked with creating processes and procedures to ensure that every piece of released code is stable enough to avoid incidents, all the while continuing to protect what is already running.

And between them? A huge abyss. This distance separating Development and Operations results in clashes, increasing the time for delivery and problem resolution.

To reduce the friction and allow business ideas to become features to service consumers, the DevOps concept was forged around 2010. It is a concept that continues to grow and, in recent years, has begun changing the IT landscape.

What is DevOps?

DevOps is a work culture that brings software development closer to IT operations, allowing the business as a whole to reap the rewards of collaboration.

DevOps is not a methodology or a tool, but a set of practices built on automation, communication and shared objectives, changing organisational cultures to bring to life a new way to deliver IT. DevOps spans the whole Design, Build and Operate IT lifecycle, unifying these processes with governance and security as its basis, stitched together with automation and an agile way of working.

How is DNX assisting CreditorWatch to evolve and implement a DevOps culture?

All DNX projects use DevOps practices, which gives us the ability to deliver higher-quality solutions to clients, with faster and continuous delivery.

Clients are often so impressed by these results that they wish to deliver the same level of quality, knowledge, and efficiency to their own clients.

After completing a successful data modernisation project with DNX, CreditorWatch wanted to continue its digital transformation by implementing a DevOps culture in its IT operations. The DNX professional services team delivered a series of hands-on workshops where developers learned about configuration management, infrastructure as code, and the whys of the platform. This gives developers the ability to transform into a DevOps team.

The learning curve is shortened considerably by the patterns and templates DNX created, allowing CreditorWatch's developer team to reproduce them and operate as a platform in their own right.

What is CreditorWatch obtaining with its Digital Transformation?

By adopting DevOps practices, CreditorWatch, represented by its CTO Joseph Vartuli, is building a culture of shared responsibility, transparency, and faster feedback as the foundation of every product and feature developed by its team. This gives them:

  • Increased competitive advantage
  • Decreased risks 
  • Decreased costs
  • Continuous delivery and deployment

Continuous delivery is an ongoing DevOps practice of building, testing, and delivering improvements to software code and user environments with the help of automated tools. The key outcome of the continuous delivery (CD) paradigm is code that is always in a deployable state.

  • Reduced downtime
  • Reduced time to market
  • Increased employee engagement and satisfaction, through the use of the latest technologies

Adopting a DevOps work culture means different teams within the business collaborate in order to reach a shared goal. Products and services are delivered to your end users at a faster rate with a higher level of quality. As technology becomes integrated with every aspect of our lives, work silos only get in the way. Just like CreditorWatch, you too can benefit from DevOps practices, transporting your business to the future.

The Unique Value DNX brought to the CreditorWatch Project

DNX Solutions utilised its knowledge on DevOps, Cloud, data, and Software Engineering to provide CreditorWatch with a secure environment that continually meets ISO and other compliance standards. The diversity of experience integrated within the DNX team allowed for instant identification of areas for improvement in CreditorWatch’s systems. In addition, DNX assisted CreditorWatch in bringing about a cultural change by transferring its DevOps mindset approach. Not only was the goal of agility and efficiency reached by the close of the project, but significant storage cost reductions were made enabling CreditorWatch to compete to a higher standard and continue to expand.

CreditorWatch Democratises Credit Data with DNX Solutions

CreditorWatch Democratises Credit Data

CreditorWatch was founded in 2010 by a small business owner who wanted to create an open source, affordable way for SMBs to access and share credit risk information. Today, CreditorWatch’s subscription-based online platform enables its 55,000+ customers—from sole traders to listed enterprises—to perform credit checks and determine the risk to their businesses. It also offers additional integrated products and services that help customers make responsible, informed credit decisions.

CreditorWatch helps businesses understand who they are trading with and any creditor issues associated with that particular business. They analyse data from 30 different sources, including both private and government sources. Some of their most powerful behaviour data is crowdsourced from their very own customers providing insights into businesses. Ultimately, CreditorWatch customers get access to Australia’s most insightful business credit rating.

The Challenge of Australia’s Largest Commercial Credit Bureau

An expansion phase saw major corporations, including Australia's Big Four banks, looking to leverage CreditorWatch's rich dataset and granular analytics capabilities. As a result, CreditorWatch decided to increase its agility and efficiency. Needing a continuously secure and compliant environment, with reduced costs and a shorter time to market, CreditorWatch engaged with DNX Solutions. DNX was tasked with creating and executing a roadmap for the improvements, targeting cloud-native concepts, and bringing more efficiency to the IT and Operations teams.

Through workshops during the discovery phase, DNX determined CreditorWatch’s business and technical capabilities, such as the interdependencies, storage constraints, release process, and level of security. With the required information at hand, DNX developed a roadmap to meet CreditorWatch’s Technical and Business objectives, using AWS best practices “The 7R’s” (retire, retain, relocate, rehost, repurchase, replatform, and refactor).

A Safe Environment to Meet ISO Standards

To continue delivering a safe platform to their customers and meeting the requirements of ISO and other compliance standards, DNX constructed a new secure AWS environment utilising its DNX.one Foundation.

Rather than undergoing a lengthy and expensive process each time a safe environment needs to be recreated, DNX.one helps customers build secure and scalable container platforms with high availability and low cost. This unique marketplace solution, designed for AWS with well-architected principles, combines years of cloud experience in a platform focused on simplicity, infrastructure-as-code and open-source technologies. In addition, DNX.one provides a consistent approach to implementing designs that will scale CreditorWatch's application needs over time.

Once CreditorWatch’s environment was secured with the best AWS and industry practices, it was time to move to the modernisation phase.

Instant Cost Reduction of 120K per Year With Data Modernisation

Due to the amount of data received on a daily basis, CreditorWatch’s database increases considerably in size and cost.

The DNX data team worked on the data engineering, optimising CreditorWatch's Aurora database and its tooling to full capability.

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.

Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones.

Aurora data is stored in the cluster volume, which is a single, virtual volume that uses solid state drives (SSDs). A cluster volume consists of copies of the data across three Availability Zones in a single AWS Region. Because the data is automatically replicated across Availability Zones, customers’ data is highly durable with less possibility of data loss. This replication also ensures that databases are more available during a failover.

The Aurora cluster volume contains all user data, schema objects, and internal metadata, such as the system tables and the binary log. Its volumes automatically grow as the amount of data in the customer’s database increases.

With extensive data knowledge and years of experience with AWS solutions and tools, DNX provided a unique solution to configure the Aurora database to leverage its full capabilities, which resulted in an immediate cost reduction of over 90K per year related to the amount of data that must be instantly available.

The DNX team also created an automated archiving process utilising AWS Airflow, which analyses CreditorWatch’s database tables, identifying data which is unused for a period of time. Unused data is then archived with a different type of file storage at a cheaper rate than S3. This process resulted in an additional cost reduction of 30K per year.

AWS Archiving Process: How it works.
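
The exact pipeline is internal to the project, but a stripped-down sketch of what such an archiving DAG can look like on Airflow/MWAA is shown below; the table-selection logic, thresholds, and bucket names are placeholders, not CreditorWatch's actual implementation.

# Hypothetical Airflow DAG sketching the archive flow: find tables whose data
# has not been used for a while, export it to cheaper storage, then purge it.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def find_stale_tables(**context):
    # Placeholder: inspect metadata/usage statistics and return candidate tables.
    return ["events_2019", "audit_log_2020"]

def archive_tables(**context):
    stale = context["ti"].xcom_pull(task_ids="find_stale_tables")
    for table in stale:
        # Placeholder: export the table to an archival storage location and
        # delete the archived rows from the live database.
        print(f"archiving {table} to s3://example-archive-bucket/{table}/")

with DAG(
    dag_id="archive_unused_data",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    find = PythonOperator(task_id="find_stale_tables", python_callable=find_stale_tables)
    archive = PythonOperator(task_id="archive_tables", python_callable=archive_tables)
    find >> archive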

The Unique Value DNX brought to the CreditorWatch Project

DNX Solutions utilised its knowledge on DevOps, Cloud, data, and Software Engineering to provide CreditorWatch with a secure environment that continually meets ISO and other compliance standards. The diversity of experience integrated within the DNX team allowed for instant identification of areas for improvement in CreditorWatch’s systems. In addition, DNX assisted CreditorWatch in bringing about a cultural change by transferring its DevOps mindset approach. Not only was the goal of agility and efficiency reached by the close of the project, but significant storage cost reductions were made enabling CreditorWatch to compete to a higher standard and continue to expand.

KOBA Insurance

KOBA Insurance, a data-driven company

KOBA Insurance

Australian startup KOBA Insurance offers a comprehensive car insurance policy focused on connected vehicles: cars that come pre-connected to the internet. What sets it apart from other insurers? Premiums are based on how much customers actually drive their cars.

It works by installing KOBA Rider, a matchbox-sized module, into the car's On-Board Diagnostics (OBD) port, an external computer port usually located behind a panel in the lower section of the dashboard.

KOBA Rider receives driving and GPS data in real time and communicates it to the customer's smartphone app, which recognises when the vehicle is moving. Then, through the KOBA mobile app, customers can see trips, costs, and policy documents almost instantly.

This pay-as-you-drive car insurance model is an absolute paradigm shift.

Moving Toward Being a Data-Driven Company

To better understand user needs and market trends, and to accelerate time to market, KOBA needed an experienced cloud partner to modernise its data. It needed a custom data solution that used two specific open-source services, Airbyte and Plotly, to ingest and manage data in the AWS environment.

By doing so, KOBA's developer team would be free to spend more time doing what they love: building new features for the platform.

Real-Time Data in a Protected Environment

The first step in modernising KOBA's data was to integrate all the components of its solution into a data lake. That included CRMs, Google Analytics, social, paid-media systems, and more.

DNX designed and implemented a new data architecture to meet KOBA's business requirements and market best practices. The new architecture includes Airbyte to ingest data, Glue to extract data from KOBA's DocumentDB, a third-party data warehouse (Databricks), and Plotly for analytics and reporting. The DNX team ensured security controls were in place to restrict access by role and service, minimising the chance of data breaches. DNX also made sure the solutions were centralised and monitored, meaning they are simple to maintain now that the project has ended.

DNX configured and integrated KOBA's Databricks, which is used to process and transform massive amounts of data and to explore the data through machine learning models. In addition, to allow the KOBA team to keep deploying its applications in the future, DNX created a blueprint for Airflow pipelines. This knowledge transfer, something DNX values highly, enables ongoing sustainability from within the client's own business.

AWS services used:

Pursuing Excellence in Customer Service and Accelerated Growth

KOBA now has a single source of truth (SSOT) that gives the whole team the ability to make crucial business decisions based on mutually accessible data. That means no work silos keep people from accessing important information.

KOBA can obtain insights in a faster, simpler, and more scalable way, using tools they are familiar with, such as Databricks, all with the level of security they need. Databricks removed the complexity they experienced before, making it easier to visualise data through dashboards and allowing KOBA's teams to track and forecast sales, as well as generate other useful insights. Data compliance can now be easily maintained, and their data is protected against unauthorised access, theft, and other breaches.

Conclusion

In a world increasingly shaped by technology, DNX offers tailored solutions for any company, whatever its technology needs.

To keep up with the constant advance of technology, companies have to be prepared for what comes next. With DNX's experienced, innovative team, you can be sure of finding the perfect solution for your unique business needs.

As the KOBA case shows, data modernisation not only improves your business immediately, but also prepares it to adapt as the industry changes. Don't be caught off guard by the next disruptive technology: contact DNX's data modernisation team to future-proof your company today.

canibuild

canibuild Data Modernisation Journey

canibuild

canibuild is a game-changer for the construction industry. After 20 years of facing the same problems over and over again, Timothy Cocaro founded canibuild to take the hassle out of building.

With canibuild, builders and their clients can see what can be constructed on their parcel of land in just minutes, in a virtual, easy-to-understand way. canibuild uses AI-powered technology to tap into multiple real-time data sources such as high-resolution aerial imagery, local city and county government data sets, and codification of planning rules – removing the typical “over the fence” site assessment, hand-drawn plans, and estimates. canibuild is customised for each subscriber, with individual floor plans, branding, and costs uploaded onto the platform, allowing subscribers to provide branded plans, flyers, reports, and estimations instantly, condensing outdated practices that would traditionally take weeks. It is a true one-stop-shop where users can instantly site a build, check topography, and request reports to determine build feasibility, site costs, generate site plans, check compliance and produce quotes for homes, pools, granny flats, ADUs, sheds and more… all in just minutes!

canibuild is currently available in Australia, New Zealand, Canada, and the United States.

The Business Challenge

Due to rapid expansion, canibuild required an experienced cloud-native partner to transform its complex cloud platform to sustain and enable its growth by unlocking new data and analytics functionality. One of the major challenges was to create a Single Source Of Truth (SSOT), which involves integrating different types of data into one central location rather than the various data sources from which they were being collected. Among the data canibuild requires is geospatial data: time-based data related to a specific location on the Earth's surface. This data can provide insights into relationships between variables, revealing patterns and trends.

Delivering DataOps and Data Analytics to Grow the canibuild Business

The DNX team built a platform by implementing a DataOps approach consisting of a collection of technical practices, workflows, cultural norms, and architectural patterns that enable:

  • Rapid innovation and experimentation delivering new insights to customers with increasing velocity
  • Extremely high data quality and very low error rates
  • Collaboration across complex arrays of people, technology, and environments
  • Clear measurement, monitoring, and transparency of results

 

The developed data platform combines modern SaaS ingestion tools (StitchData) and dbt with AWS data services, including a data lake (S3 + Glue Catalog + Athena), Glue ETL, MWAA for orchestration, DMS for near-real-time replication, DynamoDB for control tables and CloudWatch Events for scheduling.

canibuild infrastructure
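
As a small illustration of how such a data lake is typically consumed, the sketch below runs an Athena query against a Glue Catalog database with boto3. The database, table, query, and output location are hypothetical; they are not canibuild's actual resources.

# Hypothetical Athena query against the Glue Catalog / S3 data lake.
import time

import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT site_id, COUNT(*) AS reports FROM site_reports GROUP BY site_id",
    QueryExecutionContext={"Database": "canibuild_lake"},                     # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},   # placeholder bucket
)

query_id = execution["QueryExecutionId"]

# Wait for the query to finish before fetching results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # the first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])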

Real-time Assertive Data

After a complex process in which all relevant data were collected, sorted, and stored in one location, canibuild now has real time insights allowing their team to access the same information. The team can now predict future trends, maximise opportunities and work towards realistic goals and objectives to continue growth.

Through its knowledge transfer, DNX equipped the canibuild team to provision a new logical environment for its product:

  • Terraform projects
  • Terraform variables configuration
  • DMS configurations
  • Database importer/exporter
  • MWAA and how to create new DAGs
  • How to troubleshoot Airflow

Data Modernisation Outcome

With the creation of an SSOT and the transfer of all data into a central location, canibuild teams can now access the data they need sooner than ever before, allowing them to respond quickly and efficiently to their clients. Improved data analytics enables them to access real time insights and make more accurate predictions; a valuable asset in current times plagued by uncertainty. Furthermore, thanks to the simplification of the platform by DNX, canibuild’s engineers now have time to spare, allowing them to work on what they do best: producing new features!

To see your business soar towards the future with open arms, contact DNX today and learn how you can benefit from data modernisation.

Plutora's Data and Digital Modernisation Journey

Plutora’s Data and Digital Modernisation Journey

About Plutora

Plutora offers value stream management solutions that help companies with release, test environment and analytics solutions for enterprise IT.

Among Plutora’s clients are global organisations typically in healthcare, Fintech and telecommunications, all of which are highly regulated and require tools to maintain compliance. In addition, clients in these industries require predictable software delivery due to high risk tolerance.

The Business Challenge

Although Plutora generates great value for its customers, it was looking for a partner that could assist in decreasing the complexity of its data infrastructure. Plutora wanted a new architecture based on industry best practices, including automating its processes and modernising its multiple .NET applications ahead of their approaching end of support. Achieving these goals would allow Plutora to evolve and give it the agility needed to launch new features.

Data and Digital Modernisation Discovery

The DNX Digital and Data team performed a comprehensive Windows and data discovery on Plutora's workloads, which involved a kick-off followed by a sequence of intense activities. The discovery concluded with a showcase workshop where the team presented a roadmap stating areas of improvement for the existing solution and a modernisation to be executed afterwards to enable Plutora to achieve its objectives.

Solution

DNX proposed a four-phase engagement plan to modernise Plutora's data & analytics workloads.

Plutora Data Project

In Phase 1, DNX validated the use of temporal tables in SQL Server to enable CDC for the ETL process. This was to improve estimation accuracy for Phase 4.

In Phase 2, DNX began delivering early benefits of the modernisation project by using the SQL Server replica DB for the ETL extraction and refactoring the existing SQL Server scripts to extract incremental data only.

This reduced performance impact on the application whilst enabling a higher number of ETL queries to run in parallel, thus reducing the overall time for the ETL execution. 
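
For context (this is not Plutora's actual code), incremental extraction usually means reading only the rows changed since a stored watermark instead of re-reading full tables. Below is a minimal sketch, assuming a pyodbc connection to the SQL Server replica and a hypothetical LastModified-style watermark column; table and column names are illustrative.

# Hypothetical watermark-based incremental extraction against the SQL Server replica.
import pyodbc

def extract_incremental(conn_str, table, watermark_column, last_watermark):
    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()
    # Only rows modified since the previous run are pulled, keeping load on the
    # replica low and allowing many extractions to run in parallel.
    cursor.execute(
        f"SELECT * FROM {table} WHERE {watermark_column} > ? ORDER BY {watermark_column}",
        last_watermark,
    )
    rows = cursor.fetchall()
    new_watermark = getattr(rows[-1], watermark_column) if rows else last_watermark
    conn.close()
    return rows, new_watermark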

In Phase 3, DNX removed complexity and modernised the ETL platform by implementing Managed Workflows for Apache Airflow (MWAA) to replace the Node App orchestrator, implementing DMS to replicate data between the SQL Server DW and the Postgres DW, and decommissioning the Node App orchestrator.

In the final phase, the ETL to ELT modernisation was completed.

Data Modernisation outcome

DNX delivered a data modernisation solution to Plutora that began seeing benefits quickly through a number of avenues:

 

Cost Reduction 

Plutora saw a 30% cost reduction from migrating SQL Server to RDS and decommissioning redundant components, as well as eliminating the cost of Windows licences.

 

Near Real-Time Data

The time for data to become available for reporting was reduced from 20 minutes to just 4.

 

Simplicity

Replacing an ELT system built in-house with an open-source project makes Plutora more attractive to IT personnel and helps retain that talent. Further simplicity was achieved by reducing the number of layers in the solution, resulting in reduced cost and accelerated delivery. In addition, the FTE effort required to maintain and patch servers and databases was reduced.

 

Evolvability

A number of positive changes can now be enjoyed by Plutora, such as the removal of technical debt, decoupling from the vendor, and the ability to adopt agile practices thanks to modern ways of working within Data & Analytics. The data strategy has created a Single Source of Truth, which allows Plutora to benefit from machine learning, and merging all logic into an application layer reduces the time to change and deploy.

Conclusion

With clients who require the most up-to-date technical support, Plutora is in a position where data modernisation is absolutely crucial. With a more simplified and adaptable infrastructure, they are now able to offer the best services to their clients across the globe.

Automating .NET Framework deployments with AWS CodePipeline to Elastic Beanstalk

Automating .NET Framework deployments with AWS CodePipeline to Elastic Beanstalk

When it comes to Windows CI/CD pipelines, people immediately start thinking about tools like Jenkins, Octopus, or Azure DevOps, and don't get me wrong, those are still great tools for dealing with CI/CD complexities. However, today I will be explaining how to implement a simpler .NET Framework (Windows) CI/CD pipeline that deploys two applications (an API and a Worker) to two different environments using GitHub, CodePipeline, CodeBuild (cross-region), and Elastic Beanstalk.

Continuous Deployments

Requirements

  • AWS Account
  • GitHub repository with a .NET Framework blueprint application
  • Existing AWS Elastic Beanstalk Application and Environment

CodePipeline setup

Let’s create and configure a new CodePipeline, associating an existing GitHub repository via CodeStar connections, and linking it with an Elastic Beanstalk environment.

AWS Code Pipeline

First, let’s jump into AWS Console and go to CodePipeline.

AWS Console Code Pipeline

Once on the CodePipeline screen, let's click on the Create pipeline button.

AWS Console CodePipeline Dashboard

This will start the multi-step screen to set up our CodePipeline.

Step 1: Choose pipeline settings

Please enter all required information as needed and click Next.

AWS Console - CodePipeline - Step 1

Step 2: Add source stage

Now let’s associate our GitHub repository using CodeStar connections.

For Source Provider we will use the new GitHub version 2 (app-based) action.

If you already have GitHub connected with your AWS account via CodeStar connection, you only need to select your GitHub repository name and branch. Otherwise, let’s click on Connect to GitHub button.

AWS Console - CodePipeline - Step 2

Once at the Create a connection screen, let’s give it a name and click on Connect to GitHub button.

AWS Console - CodePipeline - Create GitHub App connection

AWS will ask you to give permission, so it can connect with your GitHub repository.

Giving AWS CodePipeline Authorization to GitHub repository

Once you finish connecting AWS with GitHub, select the repository you want to set up a CI/CD by searching for its name.

The main branch we’ll use to trigger our pipeline will be main as a common practice, but you can choose a different one you prefer.

For the Change detection options, we’ll select Start the pipeline on source code change, so whenever we merge code or push directly to the main branch, it will trigger the pipeline.

Click Next.

AWS Console - CodePipeline - Step 2 - Source - GitHub

Step 3: Add build stage

This step is where we will generate both source bundle artifacts used to deploy our API and our Worker (a Windows Service application) to Elastic Beanstalk.

We will also need to use a Cross-region action here due to CodeBuild limitations regarding Windows builds as stated by AWS on this link.

Windows builds are available in US East (N. Virginia), US West (Oregon), EU (Ireland) and US East (Ohio). For a full list of AWS Regions where AWS CodeBuild is available, please visit our region table.

⚠️ Note: Windows builds usually take around 10 to 15 minutes to complete due to the size of the Microsoft docker image (~8GB).

AWS Console - CodePipeline - Step 3 - Build

At this point, if you try to change the Region using the select option, the Create project button will disappear, so for now let's just click on the Create project button; we can change the region on the following screen. And please make sure to select one of the regions where Windows builds are available.

AWS Console - CodePipeline - Step 3 - Build - Selecting region

Once you’ve selected a region where Windows builds are available, you can start entering all the required information for your build.

AWS Console - CodePipeline - Step 3 - Build - Project configuration

For the Environment section, we need to select the Custom image option, choose Windows 2019 as our Environment type, then select Other registry and add the Microsoft Docker image registry URL (mcr.microsoft.com/dotnet/framework/sdk:4.8) to the External registry URL.

AWS Console - CodePipeline - Step 3 - Build - Create - Environment
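
If you prefer to script this step instead of clicking through the console, the same environment settings can be expressed with boto3 roughly as below. The project name, role ARN, and source type are placeholders; this is only a sketch of the settings described above, not a full pipeline definition.

# Hypothetical boto3 equivalent of the CodeBuild environment settings above:
# a Windows 2019 build container pulled from the public Microsoft registry.
import boto3

codebuild = boto3.client("codebuild", region_name="us-east-1")  # a region where Windows builds are available

codebuild.create_project(
    name="dotnet-framework-app-build",                      # placeholder project name
    source={"type": "CODEPIPELINE", "buildspec": "buildspec.yml"},
    artifacts={"type": "CODEPIPELINE"},
    environment={
        "type": "WINDOWS_SERVER_2019_CONTAINER",
        "image": "mcr.microsoft.com/dotnet/framework/sdk:4.8",
        "computeType": "BUILD_GENERAL1_MEDIUM",
        "imagePullCredentialsType": "CODEBUILD",             # public registry, no credentials needed
    },
    serviceRole="arn:aws:iam::123456789012:role/codebuild-service-role",  # placeholder
)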

Buildspec config can be left as default.

AWS Console - CodePipeline - Step 3 - Build - Create - Buildspec

If you don’t know what a buildspec file is, I highly recommend having a look at the Build specification reference for CodeBuild in the AWS docs. Here is a brief description extracted from the AWS documentation.

A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. You can include a buildspec as part of the source code or you can define a buildspec when you create a build project. For information about how a build spec works, see How CodeBuild works.

Let’s have a look at our Buildspec file.

version: 0.2

env:
  variables:
    SOLUTION: DotNetFrameworkApp.sln
    DOTNET_FRAMEWORK: 4.8
    PACKAGE_DIRECTORY: .\packages

phases:
  install:
    commands:      
      - echo "Use this phase to install any dependency that your application may need before building it."
  pre_build:
    commands:
      - nuget restore $env:SOLUTION -PackagesDirectory $env:PACKAGE_DIRECTORY
  build:
    commands:
      - msbuild .\DotNetFrameworkApp.API\DotNetFrameworkApp.API.csproj /t:package /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK /p:Configuration=Release
      - msbuild .\DotNetFrameworkApp.Worker.WebApp\DotNetFrameworkApp.Worker.WebApp.csproj /t:package /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK /p:Configuration=Release
      - msbuild .\DotNetFrameworkApp.Worker\DotNetFrameworkApp.Worker.csproj /t:build /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK /p:Configuration=Release
  post_build:
    commands:
      - echo "Preparing API Source bundle artifacts"
      - $publishApiFolder = ".\publish\workspace\api"; mkdir $publishApiFolder
      - cp .\DotNetFrameworkApp.API\obj\Release\Package\DotNetFrameworkApp.API.zip $publishApiFolder\DotNetFrameworkApp.API.zip
      - cp .\SetupScripts\InstallDependencies.ps1 $publishApiFolder\InstallDependencies.ps1
      - cp .\DotNetFrameworkApp.API\aws-windows-deployment-manifest.json $publishApiFolder\aws-windows-deployment-manifest.json
      - cp -r .\DotNetFrameworkApp.API\.ebextensions $publishApiFolder
      - echo "Preparing Worker Source bundle artifacts"
      - $publishWorkerFolder = ".\publish\workspace\worker"; mkdir $publishWorkerFolder
      - cp .\DotNetFrameworkApp.Worker.WebApp\obj\Release\Package\DotNetFrameworkApp.Worker.WebApp.zip $publishWorkerFolder\DotNetFrameworkApp.Worker.WebApp.zip
      - cp -r .\DotNetFrameworkApp.Worker\bin\Release\ $publishWorkerFolder\DotNetFrameworkApp.Worker
      - cp .\SetupScripts\InstallWorker.ps1 $publishWorkerFolder\InstallWorker.ps1
      - cp .\DotNetFrameworkApp.Worker.WebApp\aws-windows-deployment-manifest.json $publishWorkerFolder\aws-windows-deployment-manifest.json
      - cp -r .\DotNetFrameworkApp.Worker.WebApp\.ebextensions $publishWorkerFolder

artifacts:
  files:
    - '**/*'
  secondary-artifacts:
    api:
      name: api
      base-directory: $publishApiFolder
      files:
        - '**/*'
    worker:
      name: worker
      base-directory: $publishWorkerFolder
      files:
        - '**/*'

As you can see, we have a few different phases in our build spec file.

  1. install: Can be used, as its name suggests, to install any build dependencies that are required by your application and not listed as a NuGet package.
  2. pre_build: That’s a good place to restore all NuGet packages.
  3. build: Here’s where we build our applications. In this example, we are building and packaging all three of our applications.
    1. msbuild .\DotNetFrameworkApp.API\DotNetFrameworkApp.API.csproj /t:package /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK /p:Configuration=Release
      1. msbuild: The Microsoft Build Engine is a platform for building applications.
      2. DotNetFrameworkApp.API.csproj: The web application we are targeting in our build.
      3. /t:package: The MSBuild Target named Package, which is defined as part of the Web Packaging infrastructure.
      4. /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK: A target framework is the particular version of the .NET Framework that your project is built to run on.
      5. /p:Configuration=Release: The configuration that you are building, generally Debug or Release, but configurable at the solution and project levels.
    2. For .NET Core/5+ projects we would instead use the .NET command-line interface (CLI), a cross-platform toolchain for developing, building, running, and publishing .NET applications.
    3. Last but not least, we have our Worker (Windows Service application) build. One of the differences here is the MSBuild Target parameter, /t:build instead of /t:package, when compared with our DotNetFrameworkApp.API.csproj web API project. Another difference is the folder where the binaries are published.
  4. post_build: After all applications have been built, we need to prepare the source bundle artifacts for Elastic Beanstalk. At the end of this phase, CodeBuild will have prepared two source bundles, which will be referred to by the artifacts section.
    1. In the first part of this phase, we create two workspace folders, one for our API and another for our Worker, and then copy a few files into our API workspace:
      1. DotNetFrameworkApp.API.zip: This is the Web API package generated by MSBuild.
      2. InstallDependencies.ps1: An optional PowerShell script file that can be used to install, uninstall or even prepare anything you need on your host instance before your application starts running.
      3. aws-windows-deployment-manifest.json: A deployment manifest file is simply a set of instructions that tells AWS Elastic Beanstalk how a deployment bundle should be installed. The deployment manifest file must be named aws-windows-deployment-manifest.json.
aws-windows-deployment-manifest.json API sample
    2. Our Worker’s source bundle is prepared in the second part of this phase, and it contains two applications:
      1. One is an almost empty .NET Core web application, required by Elastic Beanstalk, that we use as a health check.
      2. The second one is our actual Worker, in the form of a Windows Service application.
      3. InstallWorker.ps1: Here’s a sample of a PowerShell script used to execute our Worker installer.
InstallWorker.ps1 sample

      4. aws-windows-deployment-manifest.json: Very similar to the previous one; the only difference is that this file points to a specific script containing instructions to install our service on the host machine.

aws-windows-deployment-manifest.json Worker sample

In the artifacts section, CodeBuild will output two source bundles (API and Worker), which will be used as an input for the deploy stage.

Once you finish configuring your CodeBuild project, click on the Continue to CodePipeline button.

AWS Console - CodeBuild - Continue to CodePipeline

Now back in CodePipeline, select the region where you created the CodeBuild project, then select the project from the Project name dropdown. Feel free to add environment variables if you need them.

Click Next.

AWS Console - CodePipeline - Next stage

Step 4: Add deploy stage

We are now moving to our last CodePipeline step, the deployment stage. This is where we decide where our code is going to be deployed, that is, which AWS service will run it.

⚠️ Note: You will notice that we don’t have a way to configure two different deployments, so at this time you can either skip the deploy stage or set up only one application, then fix it later on. I will choose the latter option for now.

Select AWS Elastic Beanstalk for our Deploy provider.

Choose the Region your Elastic Beanstalk environment is deployed in.

Then, search for and select an Application name in that region, or create an application in the AWS Elastic Beanstalk console and then return to this task.

⚠️ Note: If you don’t see your application name, double-check that you are in the correct region in the top right of your AWS Console. If you aren’t, you will need to select that region and perhaps start this process again from the beginning.

Search and select the Environment name from your application.

Click Next.

AWS Console - CodePipeline - Step 4 - Deploy

Review

Now it’s time to review all the settings of our pipeline and confirm them before creating it.

AWS Console - CodePipeline - Step 4 - Review

Once you are done with the review step, click on Create pipeline.

Pipeline Initiated

After the pipeline is created, the process will automatically pull your code from the GitHub repository and then deploy it directly to Elastic Beanstalk.

AWS Console - CodePipeline - Pipeline initiated
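If you prefer to follow the pipeline from the command line rather than the console, a small boto3 sketch like the one below can report the status of each stage (the pipeline name is a placeholder for whatever you chose in Step 1):

import boto3

codepipeline = boto3.client("codepipeline")

# "dotnet-framework-app-pipeline" is a placeholder; use the name chosen in Step 1.
state = codepipeline.get_pipeline_state(name="dotnet-framework-app-pipeline")

for stage in state["stageStates"]:
    latest = stage.get("latestExecution", {})
    print(f'{stage["stageName"]}: {latest.get("status", "N/A")}')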

Let’s customize our Pipeline

First, we need to change our Build step to output two artifacts as stated in the build spec file.

In the new Pipeline, let’s click on the Edit button.

AWS Console - CodePipeline - Edit

Click on Edit stage button located in the “Edit: Build” section.

AWS Console - CodePipeline - Edit stage - Build

Let’s edit our build.

AWS Console - CodePipeline - Edit stage - Edit build

Let’s specify our Output artifacts according to our build spec file. Then, click on Done.

Output artifacts

Now, let’s click on the Edit stage button located in the “Edit: Deploy” section.

AWS Console - CodePipeline - Edit stage

Here we will edit our current Elastic Beanstalk, then we will add a second one.

Let’s edit our current Elastic Beanstalk deployment first.

AWS Console - CodePipeline - Edit deploy

Change the action name to something more descriptive for your application (for example, DeployAPI), then select “api” in the Input artifacts dropdown and click on Done.

AWS Console - CodePipeline - Edit deploy API action

Let’s add a new action.

Add an Action name, like DeployWorker for instance.

Select AWS Elastic Beanstalk in the Action provider dropdown.

Choose the Region where your Elastic Beanstalk environment is located.

Select “worker” in the Input artifacts dropdown.

Then, select your Application and Environment name, and click on Done.

AWS Console - CodePipeline - Add deploy Worker action

Save your changes.

Now we have both of our applications covered by our pipeline.

AWS Console - CodePipeline - Two deployments

Confirm Deployment

If we go to AWS Console and access the new Elastic Beanstalk app, we should see the service starting to deploy and then transition to deployed successfully.

⚠️ Note: If, as in this demo application repository, you are creating an AWS WAF, your deployment will fail if the CodePipeline role doesn’t have the right permissions to create it.

AWS Console - Elastic Beanstalk - Failed to deploy application

Let’s fix it!

In the AWS Console, navigate to IAM > Roles, then find and edit the role used by your CodePipeline, granting it the set of permissions CodePipeline needs to be able to create a WAF.

AWS Console - IAM - Roles
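If you prefer to script this fix, the sketch below shows one possible way to attach the AWS managed WAF policy to the role with boto3; the role name is a placeholder, and a narrowly scoped custom policy is preferable in production.

import boto3

iam = boto3.client("iam")

# Placeholder role name: use the role reported in the failed deployment's error message.
ROLE_NAME = "my-codepipeline-service-role"

# Attaching the broad AWS managed WAF policy is a quick way to unblock the deployment;
# replace it with a custom, least-privilege policy once the pipeline is working.
iam.attach_role_policy(
    RoleName=ROLE_NAME,
    PolicyArn="arn:aws:iam::aws:policy/AWSWAFFullAccess",
)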

Go back to your CodePipeline and click on Retry.

AWS Console -CodePipeline - Retry deployment

That will trigger the deploy step again, and if you go to your Elastic Beanstalk app, you will see the service starting to deploy.

Elastic Beanstalk app

After a few seconds/minutes, the service will transition to deployed successfully.

AWS Console - Elastic Beanstalk - Successfully Deployed

If we access the app URL, we should see our health check working.

API Health check

See deployment in action

In this next part, we will make a change to our GitHub repository and see that change automatically deployed.

Pipeline

Demo application

You can use your repository, but for this part, we’ll be utilizing this one.

Here’s the current project structure.

(root directory name)
├── buildspec.yml
├── DotNetFrameworkApp.sln
├── DotNetFrameworkApp.API
│   ├── .ebextensions
│   │   └── waf.config
│   ├── App_Start
│   │   ├── SwaggerConfig.cs
│   │   └── WebApiConfig.cs
│   ├── Controllers
│   │   ├── HealthController.cs
│   │   └── ValuesController.cs
│   ├── aws-windows-deployment-manifest.json
│   ├── DotNetFrameworkApp.API.csproj
│   ├── Global.asax
│   └── Web.config
├── DotNetFrameworkApp.Worker
│   ├── App.config
│   ├── DotNetFrameworkApp.Worker.csproj
│   ├── Program.cs
│   ├── ProjectInstaller.cs
│   └── Worker.cs
├── DotNetFrameworkApp.Worker.WebApp
│   ├── .ebextensions
│   │   └── waf.config
│   ├── App_Start
│   │   └── WebApiConfig.cs
│   ├── Controllers
│   │   ├── HealthController.cs
│   │   └── StatusController.cs
│   ├── aws-windows-deployment-manifest.json
│   ├── DotNetFrameworkApp.Worker.WebApp.csproj
│   ├── Global.asax
│   └── Web.config

The DotNetFrameworkApp repository contains three applications (API, Worker, and a WebApp for the Worker) created with .NET Framework 4.8.

We are also adding an extra security layer using a Web Application Firewall (WAF) to protect our Application Load Balancer, created by Elastic Beanstalk, against attacks from known unwanted hosts.

Code change

Make any change you need in your repository and either commit and push directly to main or create a new pull request and then merge that request to the main branch.

Once the change is pushed or merged, you can watch CodePipeline automatically pull and deploy the new code.

CodePipeline automatically triggered by a git push

What’s next?

The next step would be to introduce Terraform so that everything we have built here is defined as code, add an automated way to pass additional environment variables, and introduce logging.

Final Thoughts

AWS CodePipeline, when combined with other services, can be a very powerful tool for modernizing and automating your Windows workloads. This is just a first step; you should definitely start planning for automated tests, environment variables, and better observability for your application.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards application modernisation.

Reinventing myDNA Business with Data Analytics

About myDNA

myDNA is a health tech company bringing technology to healthcare with a mission to improve health worldwide. They developed a personalised wellness myDNA test that lets you discover how your body is likely to respond to food, exercise, sleep, vitamins, medications, and more, according to your genome.

It is a life changer for those who want to skip the lengthy trial-and-error process and achieve their desired fitness goals sooner. Moreover, myDNA is a reliable way of assisting practitioners in selecting safe and effective medications for their patients based on their unique genetic makeup. For example, doctors can prescribe antidepressants and post-surgery painkillers that are more likely to be successful in the first instance.

The most exciting part is that this technology, which has historically been so expensive, is now available at an affordable price for normal people like you and me! Not to mention, finding out you have relatives on the other side of the world through a family matching DNA test is pretty cool!

Providing life health services based on accurate data

After replatforming myDNA’s IT systems from a distributed monolithic database to a microservice architecture, the team needed assistance in delivering automated tools and meaningful insights across the business. This would give them an understanding of potential areas and markets in which to expand their services, the agility to move and change fast as a business, and an advantage over competitors by delivering the services, products, and customer experience their customers seek, all based on data rather than assumptions.

myDNA was seeking a cloud consultant that could assist them in exploring and understanding events by expanding their data and analytics capabilities. In addition, the business planned to increase their data skills so their in-house IT team would be able to maintain and continue building the new applications in a safe and effective environment.

AWS performed a Data Lab with myDNA stakeholders where they co-designed a technical architecture and built a Pilot to start the journey. This gave the myDNA team an understanding of all the AWS cloud data and analytics solutions available. However, they required a personalised and well-designed technology roadmap taking their IT skills and myDNA business goals into consideration, as opposed to a ‘one solution fits all’ strategy. This is exactly what DNX Solutions delivered!

How did DNX Solutions help myDNA establish a modern, secure data strategy in just one month?

The project started with DNX’s effective and interactive discovery, where our team identified the company’s needs and built a complete picture of the existing company data, the architecture in use, and potential technological and team challenges. With that, our team created a clear roadmap where outcomes were evident even before the conclusion of the project.

project road map

In the initial phase, DNX built an MVP using the AWS Console, more general roles and an initial set of data sources, and created simple reports and dashboards to present basic metrics.
After that, our cloud data experts built a more robust solution fit for production, with a focus on resilience, performance, reliability, security and cost optimisation, using DevOps methodology, CI/CD pipelines, automation and serverless architecture whenever possible.

Once the core platform was established, we brought more data sources, integrating them into the solution, and helped to build more complex and advanced solutions such as Machine Learning.

AWS Services Used

S3 Data Lakes
Raw: hosts the extracted data, allowing governance, auditability, durability and security controls

DynamoDB / SSM
Store configuration tables, parameters, and secrets used by the pipeline and ETL jobs to automate the data process

Crawlers
Crawlers can scan the files in the data lake or databases, infer the schema and add the tables to the data catalogues

Glue ETL
Serverless Spark solution for high-performance ETL jobs within AWS

Data Catalogues
Store the metadata and metrics regarding databases, connections, jobs, partitions, etc. They can grant or deny access down to the table level

Quicksight
Can consume data from multiple sources within AWS and allows user-friendly development of reports, analytics and dashboards integrated with the AWS platform

Lake Formation
Low-code solution to govern and administer the data lake. An additional layer of security, including row- and column-level controls

Lambdas
Wild cards that can help tie the solution together in a variety of roles and use cases

Athena
Athena can query data stored in S3 using simple SQL. It allows access segregation for metadata and query history via workgroups, which can be combined with IAM roles
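To give a flavour of how the query layer can be used, here is a hedged boto3 sketch of running an Athena query inside a workgroup; the database, table, workgroup and result bucket names are illustrative only.

import boto3

athena = boto3.client("athena")

# Database, table, workgroup and result bucket below are illustrative placeholders.
response = athena.start_query_execution(
    QueryString="SELECT product, COUNT(*) AS tests FROM curated.dna_tests GROUP BY product",
    QueryExecutionContext={"Database": "curated"},
    WorkGroup="analytics-team",  # workgroups segregate query history and access
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])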

myDNA to provide real insights at the click of a button

There is no doubt that DNX Solutions delivered value to myDNA. The team reported they were able to deliver another data transformation that depended directly on the result of DNX’s work.

Before engaging with DNX, the myDNA team could take three to five days to deliver a few manual reports in response to business queries. The company is now able to deliver different reports based on live data with just the click of a button. Not only does the business have accurate, insightful data to decide what, when, and where to invest, but it also has the agility to make these decisions quickly.

The myDNA team can now focus on what they do best rather than spending days merging unreliable information from various sources to produce a handful of outdated reports.

The next step for myDNA is to adopt AWS machine learning to unveil predictions, achieving far better real-world results.

Discover the value of data

Effective leadership depends on using data to make important decisions; it takes a broad view, backed by accurate information, to act meaningfully. That is how a modern data strategy is built: to deliver insights to the people and applications that need them, securely and at any scale.

DNX Brasil helps your company apply data analytics to its most business-critical use cases, with end-to-end solutions that require data expertise.

Data Archiving utilizing Managed Workflows for Apache Airflow

We assisted a fintech client in minimizing its storage costs by archiving data from RDS (MySQL) to S3 using an automated batch process, where all data from a specific time range is exported to S3. Once the data is stored on S3, the historical data can be analyzed using AWS Athena and Databricks. The solution also includes a delete strategy to remove all data older than two months.

The database size has increased exponentially with the number of logs stored in it, so the archiving procedure should have minimal impact on the production workload and be easy to orchestrate. For this specific data archiving case, we are handling tables with more than 6 TB of data that should be archived in the most efficient manner; part of this data no longer needs to be stored in the database.

In this scenario, Managed Workflows for Apache Airflow (MWAA), a managed orchestration service for Apache Airflow, helps us to manage all those tasks. Amazon MWAA fully supports integration with AWS services and popular third-party tools such as Apache Hadoop, Presto, Hive, and Spark to perform data processing tasks.

In this example, we will demonstrate how to build a simple batch process that is executed daily, getting the data from RDS and exporting it to S3, as shown below.

Export/Delete strategy:

  • The batch routine should be executed daily
  • All data from the previous day should be exported as CSV 
  • All data older than 2 months should be deleted

Solution

  • RDS – production database
  • MWAA – to orchestrate the batches
  • S3 bucket – to store the partitioned CSV files
Data Archiving utilizing Managed Workflows for Apache Airflow solution

As shown in the architecture above, MWAA is responsible for calling the SQL scripts directly on RDS; in Airflow, we use the MySQL operator to execute SQL scripts against RDS.

To encapsulate those tasks we use an Airflow DAG.

Airflow works with DAGs; a DAG is a collection of all the tasks you want to run. A DAG is defined in a Python script, which represents the DAG’s structure (tasks and their dependencies) as code.

In our scenario, the DAG will cover the following tasks:

  • Task 1 – Build procedure to export data 
  • Task 2 – Execute procedure for export
  • Task 3 – Build procedure to delete data 
  • Task 4 – Execute delete procedure 

Airflow DAG graph

Creating a function to call a stored procedure on RDS

EXPORT_S3_TABLES = {
    "id_1": {"name": "table_1"},
    "id_2": {"name": "table_2"},
    "id_3": {"name": "table_3"},
    "id_4": {"name": "table_4"},
}

def export_data_to_s3(dag, conn, mysql_hook, tables):
    tasks = []
    engine = mysql_hook.get_sqlalchemy_engine()
    with engine.connect() as connection:
        for schema, features in tables.items():
            run_queries = []
            t = features.get("name")  # extract table name
            statement = f'call MyDB.SpExportDataS3("{t}")'
            sql_export = statement.strip()
            run_queries.append(sql_export)
            task = MySqlOperator(
                sql=run_queries,
                mysql_conn_id='mysql_default',
                task_id=f"export_{t}_to_s3",
                autocommit=True,
                provide_context=True,
                dag=dag,
            )
            tasks.append(task)
    return tasks

To deploy the stored procedure, we can use the MySQL operator, which is responsible for executing the “.sql” files, as shown below:

build_proc_export_s3 = MySqlOperator(dag=dag,
                           mysql_conn_id='mysql_default', 
                           task_id='build_proc_export_to_s3',
                           sql='/sql_dir/usp_ExportDataS3.sql',
                           on_failure_callback=slack_failed_task,
                           )

Once the procedure has been deployed, we can execute it using a MySqlHook, which runs the stored procedure via the export_data_to_s3 function.

t_export = export_data_to_s3(dag=dag,
                        conn="mysql_default",
                        mysql_hook=prod_mysql_hook,
                        tables=EXPORT_S3_TABLES,
                        )

MWAA orchestrates each SQL script called on RDS: two stored procedures are responsible for exporting and deleting the data, respectively. With this approach, all the intensive work (reading and processing data) is handled by the database, and Airflow acts as an orchestrator for each event.
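To make the orchestration concrete, here is a minimal sketch of how the four tasks could be wired into a daily DAG, reusing the export_data_to_s3 function, EXPORT_S3_TABLES and connection ID from the snippets above; the delete procedure name, its .sql file path and the DAG id are hypothetical, and the import paths assume Airflow 1.10.x (MWAA also supports Airflow 2.x, where the MySQL operator and hook live under airflow.providers.mysql).

from datetime import datetime

from airflow import DAG
from airflow.operators.mysql_operator import MySqlOperator  # Airflow 1.10.x path
from airflow.hooks.mysql_hook import MySqlHook              # Airflow 1.10.x path

prod_mysql_hook = MySqlHook(mysql_conn_id="mysql_default")

dag = DAG(
    dag_id="rds_to_s3_archiving",      # hypothetical DAG id
    schedule_interval="@daily",
    start_date=datetime(2022, 1, 1),
    catchup=False,
)

# Task 1 - deploy the export stored procedure
build_proc_export_s3 = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="build_proc_export_to_s3",
    sql="/sql_dir/usp_ExportDataS3.sql",
)

# Task 2 - one export task per table (see export_data_to_s3 above)
t_export = export_data_to_s3(
    dag=dag, conn="mysql_default",
    mysql_hook=prod_mysql_hook, tables=EXPORT_S3_TABLES,
)

# Task 3 - deploy the delete stored procedure (hypothetical file name)
build_proc_delete = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="build_proc_delete",
    sql="/sql_dir/usp_DeleteOldData.sql",
)

# Task 4 - run the delete procedure (hypothetical procedure name)
run_delete = MySqlOperator(
    dag=dag,
    mysql_conn_id="mysql_default",
    task_id="run_delete_old_data",
    sql="call MyDB.SpDeleteOldData()",
    autocommit=True,
)

# Build the export procedure, run the exports, then build and run the delete.
for export_task in t_export:
    build_proc_export_s3 >> export_task >> build_proc_delete
build_proc_delete >> run_delete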

In addition, Aurora MySQL has a built-in feature (INTO OUTFILE S3) that can export data directly to S3. That way, we do not need another service to integrate RDS with S3; the data is persisted directly in the bucket once the procedure is called.

E.g. INTO OUTFILE S3:

SELECT id , col1, col2, col3 
FROM table_name  	 
INTO OUTFILE S3 's3-region-name://my-bucket-name/mydatabase/year/month/day/output_file.csv' 
FORMAT CSV HEADER 
FIELDS TERMINATED BY ' ,' 
LINES TERMINATED BY '\n'  
OVERWRITE ON; 

With this feature, we don’t need to handle the data with Python scripts in Airflow; the data is processed entirely by the database, and no additional data transformation is necessary to output it as CSV.

Conclusion

 

Airflow is a powerful tool that allows us to deploy smart workflows using simple Python code. With this example, we demonstrated how to build a batch process that moves data from a relational database to S3 in a few simple steps.

There is a huge number of features and integrations that can be explored with MWAA; if you need flexibility and easy integration with different services (even non-AWS services), this tool can likely meet your needs.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data engineering.

Data Breach 2022

What is the Real Cost of a Data Breach in 2022?

Did Data Breaches increase in 2021?

One of the biggest changes that occurred as a result of the COVID-19 pandemic is the way in which we work. Whilst remote work began as a temporary fix to deal with lockdowns, it is a shift that has been embraced by numerous businesses over the past two years. Such a sudden change, however, was not free of risk. The unpredictability of recent years has seen a focus on survival, with security falling by the wayside. And while we are all distracted by global happenings, hackers have been taking advantage.

Data breaches and the costs associated with them have been on the rise over the past several years, but the average cost per breach jumped from US$3.86 million in 2020 to US$4.24 million in 2021, becoming the highest average total cost seen in the history of IBM’s annual Data Breach report. Remote working is not solely to blame for increased data breaches, however, companies that did not implement any digital transformation changes in the wake of the pandemic had a 16.6% increase in data breach costs compared to the global average. For Australian companies, it is estimated that 30% will fall victim to some sort of data breach, and consequences can be felt for years. The Australian Cyber Security Centre (ACSC) estimates the cost of cybercrimes for Australian businesses and individuals was AU$33 billion in 2021. To protect your business from becoming a part of these statistics, it is crucial to understand how data breaches can affect you and how to take necessary precautions.

What exactly is a data breach?

Data breaches are diverse; they can be targeted, self-spreading or come from an insider; affect individuals or businesses; steal data or demand ransoms. Although certain Australian businesses are mandated by law to notify customers when a breach has occurred, many attacks are kept quiet, meaning their frequency is higher than commonly believed.

What are the different types of data breaches?

  • Scams/phishing: Fraudulent emails or websites disguised as a known sender or company.
  • Hacking: Unauthorised access gained by an attacker, usually through password discovery.
  • Data spill: Unauthorised release of data by accident or as a result of a breach.
  • Ransomware: Malicious software (malware) accesses your device and locks files. The criminals responsible then demand payment in order for access to be regained.
  • Web shell malware: Attacker gains access to a device or network, a strategy that is becoming more frequent.

The most common category of sensitive data stolen during data breaches is the Personally Identifiable Information (PII) of customers. This data not only contains financial information such as credit card details, but can also be used in future phishing attacks on individuals. The average cost per record is estimated between US$160 and US$180, meaning costs can add up very quickly for a business that loses thousands of customers’ PII in a single attack. All industries can be affected by data breaches, but those with the highest costs are healthcare, financials, pharmaceuticals and technology. According to the 2021 IBM report, each of these industries had a slight decrease in costs associated with data breaches from 2020 to 2021, except for healthcare, which increased by a shocking 29.5%.

What are the costs?

IBM identified the ‘Four Cost Centres’ which are the categories contributing most global data breach costs. In 2021 the costs were: Lost business cost (38%), Detection and escalation (29%), Post breach response (27%), Notification (6%).

Lost business, the highest cost category for seven consecutive years, includes business disruption and loss of revenue through system downtime (such as delayed surgeries due to ransomware in hospitals), lost customers, acquiring new customers, diminished goodwill and reputation losses.

Detection and escalation costs refer to investigative activities, auditing services, crisis management and communications.

Post breach response costs are associated with helping clients recover after a breach, such as opening new accounts and communicating with those affected. These also include legal expenditures, and, with compliance standards such as HIPAA and CDR becoming more commonplace, regulatory fines are adding significantly to costs in this category. Businesses with a high level of compliance failures are spending on average 51.1% more on data breaches than those with low compliance failures.

Notification costs include communications to those affected and regulators, determination of regulatory requirements and recruiting the assistance of experts. In Australia, businesses and not-for-profits with an annual turnover of more than $3 million, government agencies, credit reporting bodies and health service providers are required by law to inform customers of data breaches and how they can protect themselves from such breaches. It is crucial for businesses to be aware of these responsibilities or they may be subjected to paying further fines.

With lost business being the highest cost associated with breaches, it is no surprise that consequences can be felt years after the initial breach. Reports have found 53% of costs to be incurred two to three years after the breach for highly regulated industries such as healthcare and financial services.

Although significantly less than the global average, the average cost of a data breach in Australia still sits at around AU$3.35 million. Approximately 164 cybercrimes are reported each day in Australia and the attacks are growing more organised and sophisticated. One predictive factor of overall costs is the response time: the longer the lifecycle of a data breach, the more it will cost. Whilst a hacker can access an entire database in just a few hours, detecting a breach takes the average Australian organisation over six months! Many organisations never even identify that a breach has occurred, or find out through victory posts on the dark web. IBM reported that breaches contained in over 200 days cost a business US$1.26 million more than those contained in under 200 days. In addition, they found the average data breach lifecycle was a week longer in 2021 compared to the previous year.

How to avoid data breaches?

The way to protect your business against malicious use of advanced and sophisticated technology is by utilising advanced and sophisticated technology in your security systems. IBM found significantly lower overall costs for businesses with mature security postures, utilising zero trust, cloud security, AI and automation. It is estimated that with AI and machine learning, breaches are detected 27% faster. Mature zero trust systems also resulted in savings of US$1.76 million compared to organisations not utilising zero trust. Organisations with mature cloud modernisation contained breaches 77 days faster than other organisations, and those with high levels of compliance significantly reduced costs.

With data breaches on the rise, and modern businesses relying on technology more heavily than ever before, it is reasonable to predict the cost of data breaches in Australia will only increase in 2022. You can avoid becoming a victim and having to pay the price for years to come by modernising your data and meeting industry compliance regulations.

DNX has the solutions and experience you need. Contact us today for a blueprint of your journey towards data security.

Data Dependency

The Importance of Data Dependency

Why not investing in data platforms is setting your company up for disaster.

Companies with legacy systems or workloads face one of three problems more often than not. Maybe your company has already experienced issues with time to market, bugs in production or limited test coverage due to a lack of confidence in releasing new features. These are the issues usually picked up on by the CTO or a technical leader who recognises the need to invest in architecture to increase the quality and speed of progress. There is, however, another important underlying problem that no one seems to be talking about.

How and where are you storing your data?

Looking at a typical legacy system these days, it is likely to be a Java or .NET application using a relational database storing a huge amount of data; we have seen companies with tables containing up to 13 years of data!

When we ask customers why they keep all their data on the same database, they rarely have an answer. Often, old or irrelevant data has been retained for no reason other than that it has been forgotten about and has ended up lost among the masses.

With consistently increasing amounts of data come consistently increasing response times for querying information from the database. Whilst it may not be noticeable day to day, it can lead to serious consequences, such as losing valuable time and revenue while waiting for a backup to restore after a database outage.

It is puzzling to think that whilst we have our best minds poring over the minutest details, we largely ignore the way in which we store data. It seems we have a collective ‘out of sight, out of mind’ attitude.

It is extremely common to come across companies that are generating reports from a single database. But here’s the interesting part: each software is unique in terms of security and operation, meaning storage is different for each and every one. Let’s consider an ecommerce store. In this case you would want to organise your tables and data in a way that allows users to easily add items to their shopping cart, place an order, and pay. To make this possible, you would have a shopping cart table, orders table, and products table, which is what we call normalising the database – a relational database. So far, so good. Now let’s look at what happens when you want to run a report. To fully understand your ecommerce business you will want to see your data in various ways, for example, number of sales in NSW in the last seven days; average shopping cart price; average checkout amount; average shipping time frame. Each of these scenarios require data from multiple sources, but by keeping all your data on the same database you are risking the whole operation.

Just as you may lose deals if customers have to wait five seconds to add an item to their shopping cart, you also lose valuable resources while waiting an hour to generate a report – something that is not uncommon to see on legacy applications. Not to mention the direct and indirect consequences of having to wait hours to restore a backup after a database outage (that is, if you even have a backup!).

By choosing not to modernise your data, your business is perched squarely on a ticking time bomb. With a typical ratio of 15 – 20 Developers to 1 Database Administrator (DBA), the DBA is without a doubt the underdog. If the DBA’s suggestions are ignored, developers may begin to modernise their source code and adopt microservices whilst the company’s data, in its entirety, is left behind untouched.

So what happens next?

Each microservice now has to manage its own state. It will probably use a database to do so, although it could use something else.

Instead of having separate tables for your shopping cart, orders, and products, you now have a product microservice with its own product table, far from the customer microservice and its customer table, which is located in a different database. In addition, the shopping cart may now be in a NoSQL database.

Now comes the time to run your reports, but you can no longer do a SELECT in a database and join all the tables because the tables are unreachable. Now you find yourself with a whole host of different problems and a new level of complexity.

Consider the data dimension to fully modernise your application

Now that you understand the importance of data modernisation, you need to know a few key points. To take full advantage of the cloud when modernising your architecture and workloads, you have to find out which tools the cloud has to offer. First, you need to understand that Microservices have to manage states and will likely use a database to do so, due to transactional responsibility. For example, when you create a new product for your ecommerce store, you want it to exist until you actively decide to discontinue it, so you don’t want the database to forget about it – we refer to this as durability.

Consistency is equally important; for example, when you market the product as unavailable, you do not want it to be included in new orders. This is a transactional orientation.

Now we need to understand the analytical view. In order to see how many products you are selling to students in year 8 to year 12 you need to run a correlation between the products, customers and orders. This requires you to have a way of viewing things differently. Most companies choose to build a data warehouse where they can store data in a way that enables them to slice and change the dimension they are looking at. Whilst this is not optimal for transactional operations, it is optimal for analytical operations.

That segregation is crucial. If you build it, you can keep your microservices, with their multiple databases and states, in an architecture that is completely decoupled from an analytical data warehouse facility that empowers the business to understand what is happening in the business.

This is hugely important! Operating without these analytical capabilities is like piloting a plane with no radio or navigation systems: you can keep flying but you have no idea where you are going, nor what is coming your way! This analytical capability is crucial to the business, but you have to segregate that responsibility. Keeping your new modernised architecture independent from your data warehouse and analytical capability is key.

So, where do we go from here? Utilising Data Lakes

DNX has assisted companies in achieving high levels of success through the adoption of data lakes. A data lake can contain structured and unstructured data as well as all the information you need from microservices, transactional databases and other sources. If you want to include external market data, such as fluctuations in oil prices – go ahead! You can put it into the data lake too! You should take care to extract and clean your data where you can before putting it into the data lake, as this will make its future journey smoother.
Once all your data is in the data lake, you can then mine relevant information and input it in your data warehouse where it can be easily consumed.

Data modernisation can save your company from impending disaster, but it is no small feat!
Most people assume it is as simple as breaking down a monolithic into microservices, but the reality is far more complex.

When planning your data modernisation you must consider reporting, architectural, technical and cultural changes, as well as the transactional versus analytical responsibilities of storing state, and their segregation. All of this becomes a part of your technological road map and shows you the way to a more secure future for your business.

If you would like to know how we have achieved this for multiple clients, and how we can do the same for you, contact us today.

At DNX Brasil, we work to bring a better cloud and application experience to digital-native companies.

We focus on AWS, Well-Architected solutions, containers, ECS, Kubernetes, continuous integration/continuous delivery and service mesh.

We are always looking for professionals experienced in cloud computing to join our team, with a focus on cloud-native concepts.

Check out our open-source projects at https://github.com/DNXLabs and follow us on Twitter, LinkedIn or YouTube.

DbT and Redshift to provide efficient Quicksight reports

Using DbT and Redshift to provide efficient Quicksight reports

TL;DR:

Using Redshift as a Data Warehouse to integrate data from AWS Pinpoint, AWS DynamoDB, Microsoft Dynamics 365 and other external sources. 

Once the data is ingested into Redshift, DbT is used to transform it into a format that is easier for AWS Quicksight to consume.

Each Quicksight report/chart has a fact table. This strategy allows Quicksight to efficiently query the data needed.

The Customer

The client is a health tech startup. They created a mobile app that feeds data to the cloud using a serverless architecture. They have several data sources and would like to integrate this data into a consolidated database (Data Warehouse). This data would then be presented in a reporting tool to help the business drive decisions. The client’s data sources:

  • AWS DynamoDB – User preferences
  • AWS Pinpoint – Mobile application clickstream
  • Microsoft Dynamics 365 – Customer relationship management
  • Stripe – Customer payments
  • Braze – A customer engagement platform

The client also needs to send data from the Data Warehouse to Braze, used by the marketing team to develop campaigns. This was done by the client, using Hightouch Reverse ETL.

The Solution

Overall Architecture Dbt

The overall architecture of the solution is presented in Figure 1. AWS Redshift is the Data Warehouse, which receives data from Pinpoint, DynamoDB, Stripe and Dynamics 365. Quicksight then queries data from Redshift to produce business reports. In the following sections, we describe each data source integration. As a cloud-native company, we work towards allowing our clients to easily manage their cloud infrastructure. For that reason, the infrastructure was provisioned using Terraform, which allowed the client to apply the same network and data infrastructure across their three environments with ease.

DynamoDB

The users’ preferences are stored in AWS DynamoDB. A simple AWS Glue job, created using Glue Studio, is used to send DynamoDB data to Redshift. It was not possible to use Redshift’s COPY command, as the client’s DynamoDB tables contain complex attributes (SET). The job contains a 5-line custom function to flatten the JSON records from DynamoDB, presented in Table 1. For Glue to access the DynamoDB tables, we needed to create a VPC endpoint.

# If this function is run outside the script that Glue Studio generates, add the imports
# below; Glue Studio's generated boilerplate typically provides them already:
# from awsglue.transforms import Relationalize, DropNullFields
# from awsglue.dynamicframe import DynamicFrameCollection
def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    df = dfc.select(list(dfc.keys())[0])  # take the single incoming DynamicFrame
    # Relationalize flattens the nested/complex attributes into flat columns
    dfc_ret = Relationalize.apply(frame=df, staging_path="s3://bucket-name/temp", name="root", transformation_ctx="dfc_ret")
    df_ret = dfc_ret.select(list(dfc_ret.keys())[0])
    dyf_dropNullfields = DropNullFields.apply(frame=df_ret)
    return DynamicFrameCollection({"CustomTransform0": dyf_dropNullfields}, glueContext)
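As an aside on the VPC endpoint mentioned above, the sketch below shows one way it could be created with boto3; the VPC ID, route table ID and the Region in the service name are placeholders.

import boto3

ec2 = boto3.client("ec2")

# Placeholders: VPC ID, route table ID and Region. A Gateway endpoint for DynamoDB
# lets Glue jobs running inside the VPC reach DynamoDB without traversing the internet.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.ap-southeast-2.dynamodb",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)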

Pinpoint

The mobile app clickstream is captured using AWS Pinpoint and stored on S3 using an AWS Kinesis delivery stream. There are many ways to load data from S3 into Redshift: using the COPY command, a Glue job, or Redshift Spectrum. We decided to use Redshift Spectrum, as we need to load the data every day; with Spectrum we can rely on the S3 partitions to filter the files to be loaded. The Pinpoint bucket contains partitions for year, month, day and hour. At each run of our ELT process, we filter the S3 load based on the latest date already loaded. The partitions are created automatically using a Glue Crawler, which also automatically parses JSON into struct column types. Table 2 shows a SQL query that illustrates the use of Spectrum partitions.


select
  event_type,
  event_timestamp,
  arrival_timestamp,
  attributes.page,
  attributes.title,
  session.session_id as session_id,
  client.cognito_id as cognito_id,
  partition_0::int as year,
  partition_1::int as month,
  partition_2::int as day,
  partition_3::int as hour,
  sysdate as _dbt_created_at
  from pinpoint-analytics.bucket_name

  -- this filter will only be applied on an incremental run
  where 
  partition_0::int >= (select date_part('year', max(event_datetime)) from stg_analytics_events)
  
  and partition_1::int >= (select date_part('month', max(event_datetime)) from stg_analytics_events)
  
  and partition_2::int >= (select date_part('day', max(event_datetime)) from stg_analytics_events)

Microsoft Dynamics 365 and Stripe

Two important external data sources required in this project were CRM data from Dynamics and payment information from Stripe. An efficient and user-friendly service that helps with data integration is Fivetran. Fivetran has more connectors than comparable tools, including connectors for Microsoft Dynamics and Stripe, and its easy-to-use interface was essential for this client.

DbT – ELT Flow

The client wanted a data transformation tool that was scalable, collaborative and that allowed version control. DbT was our answer. As we have seen with many other clients, DbT has been the first answer when it comes to running ELT (Extract, Load, Transform) workflows. After we built the first DAGs (Directed Acyclic Graphs) with DbT, using Jinja templates for raw tables (sources) and staging tables (references), and showed them to the client, they were amazed by the simplicity and the software-engineering approach that DbT takes. Having an ELT workflow that is source controlled is a distinctive feature of DbT.

In DbT, the workflow is separated into different SQL files. Each file contains a partial staging transformation of the data until the data is consolidated into a FACT or DIMENSION table. These final tables are formed by one or more staging tables. Using the Jinja templates to reference tables between each other allows DbT to create a visual representation of the relationships. Figure 2 presents an example of a DbT visualization. DbT allowed us to create tables that could be efficiently queried by Quicksight.

Dbt Visualisation

Quicksight

 

Once the data is organised and loaded into Redshift, it is time to visualise it. AWS Quicksight easily integrates with Redshift and several other data sources. It provides a number of chart options and allows clients to embed reports in their internal systems. For this client, we used bar charts, pie charts, line charts and a Sankey diagram for customer segment flow. The client was very happy with the look and feel of the visualisations and with the loading speed. Some minor limitations of Quicksight include a) not being able to give a title to multiple Y-axes and b) not being able to make the Sankey diagram follow the dashboard theme. Apart from that, it allowed us to achieve a great improvement in the client’s ability to make data-driven decisions.
A great next step regarding Quicksight would be to implement QuickSight object migration and version control from staging to production environments.
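As a side note on the embedding capability mentioned above, here is a hedged boto3 sketch of generating an embed URL for a registered QuickSight user; the account ID, user ARN and dashboard ID are placeholders.

import boto3

quicksight = boto3.client("quicksight")

# Account ID, user ARN and dashboard ID are placeholders.
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="123456789012",
    UserArn="arn:aws:quicksight:ap-southeast-2:123456789012:user/default/analyst",
    ExperienceConfiguration={"Dashboard": {"InitialDashboardId": "dashboard-id"}},
    SessionLifetimeInMinutes=60,
)
print(response["EmbedUrl"])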

Conclusion

In this article, we described a simple and efficient architecture that enabled our client to obtain useful insights from their data. Redshift was used as the central repository of data, the Data Warehouse, receiving ingestion from several data sources such as Pinpoint, DynamoDB, Dynamics and Stripe. DbT was used for the ELT workflow and Quicksight for the dashboard visualisations. We expect to be using this same architecture for clients to come as it provides agile data flows and insightful dashboards.

At DNX Brasil, we work to bring a better cloud and application experience to digital-native companies.

We focus on AWS, Well-Architected solutions, containers, ECS, Kubernetes, continuous integration/continuous delivery and service mesh.

We are always looking for professionals experienced in cloud computing to join our team, with a focus on cloud-native concepts.

Check out our open-source projects at https://github.com/DNXLabs and follow us on Twitter, LinkedIn or YouTube.
