Big Data Content on InfoQ

Articles

Development

Zero-Knowledge Proofs for the Layman

This article introduces zero-knowledge proofs, a kind of cryptography you can use to prove to an interested party that you know a secret, such as a private key or the solution to a problem, without ever sharing it. While many articles exist on the topic, this one does not require advanced math knowledge.

Debasish Ray Chawdhuri
on Mar 18, 2024
Culture & Methods

Minimising the Impact of Machine Learning on our Climate

This article introduces the field of green software engineering, showing the Green Software Foundation’s Software Carbon Intensity Specification, which is used to estimate the carbon footprint of software, and discusses ideas on how to make machine learning greener. It aims to give you the tools to take an active part in the climate solution.

Sara Bergman
on May 30, 2023
DevOps

Data Protection Methods for Federal Organizations and beyond

The Federal Data Strategy describes a plan to “accelerate the use of data to deliver on mission, serve the public, and steward resources while protecting security, privacy, and confidentiality.” This article covers what it is and how it can be applied to any organization.

Alex Tray
on Jan 18, 2023
Development

Who Moved My Code? An Anatomy of Code Obfuscation

In this article, we introduce the topic of code obfuscation, with emphasis on string obfuscation. Obfuscation is an important practice for protecting source code by making it unintelligible. It is often mistaken for encryption, but the two are different concepts. We present a number of techniques and approaches used to obfuscate data in a program.

Michael Haephrati, Ruth Haephrati
on Nov 09, 2022
Development

Virtual Panel: the New US-EU Data Privacy Framework

Recent rulings by several European courts have set important precedents for restricting personal data transmission from the EU to the US. As a consequence, the US and EU have started working on a new agreement. In this virtual panel, three knowledgeable experts discuss where the existing agreements fall short, and whether a new privacy agreement could improve the current situation.

Chris McLellan, Jeff Jockisch, Stephen Bailey, Sergio De Simone
on Oct 13, 2022
DevOps

Embracing Cloud-Native for Apache DolphinScheduler with Kubernetes: a Case Study

This article shares how Apache DolphinScheduler was updated to use a more modern, cloud-native architecture, which includes moving to Kubernetes and integrating with Argo CD and Prometheus. These changes substantially improve the user experience of deploying, operating, and monitoring DolphinScheduler.

Yang Dian
on Jun 24, 2022
AI, ML & Data Engineering

Developing Deep Learning Systems Using Institutional Incremental Learning

Institutional incremental learning promises to achieve collaborative learning. This form of learning can address data sharing and security issues without bringing in the complexities of federated learning. This article covers practical approaches that help in building an object detection system.

Ritesh Sinha
on Jan 05, 2022
AI, ML & Data Engineering

Accelerating Deep Learning on the JVM with Apache Spark and NVIDIA GPUs

In this article, the authors discuss how to use the combination of Deep Java Library (DJL), Apache Spark v3, and NVIDIA GPU computing to simplify deep learning pipelines while improving performance and reducing costs. They also compare the performance of this solution on GPU and CPU hardware, using Amazon EMR and NVIDIA RAPIDS Accelerator.

Haoxuan Wang, Qing Lan, Carol McDonald
on Jun 11, 2021
Cloud

Evolution of Azure Synapse: Apache Spark 3.0, GPU Acceleration, Delta Lake, Dataverse Support

At Microsoft Build 2021, Azure Synapse announced significant improvements to its Apache Spark pool, its performance, and its data querying and integration capabilities. This article outlines the improvements and provides context.

Lena Hall
on May 29, 2021
Cloud

Indestructible Storage in the Cloud with Apache Bookkeeper

At Salesforce, we required a storage system that could work with two kinds of streams, one for write-ahead logs and one for data, but the two streams have competing requirements. As pioneers in cloud computing, we also required our storage system to be cloud-aware, as the requirements for availability and durability keep increasing.

Anup Ghatage
on Apr 28, 2021
AI, ML & Data Engineering

The Evolution of Precomputation Technology and its Role in Data Analytics

In this article, author Yang Li discusses the importance of precomputation techniques in databases, OLAP, and data cubes, along with some of the trends in using precomputation in big data analytics.

Yang Li
on Feb 11, 2021
AI, ML & Data Engineering

Performance Tuning Techniques of Hive Big Data Table

In this article, author Sudhish Koloth discusses how to tackle performance problems when using Hive Big Data tables.

Sudhish Koloth
on Feb 05, 2021