    The security side of getting data AI-ready

    By Chris Anu, May 30, 2025
    Louis De Gouveia, data competency manager at iOCO.


    In my previous article, I covered the principles crucial to getting data AI-ready: data must be diverse, timely, accurate, secure, discoverable and easily consumable by machines. Here I expand on the remaining principles, starting with the all-important issue of security.

    Artificial intelligence (AI) systems often use sensitive data − including personally identifiable information, financial records, or proprietary business information − and use of this data requires responsibility.

    Criminals can steal sensitive information, manipulate training data to bias outcomes, or even disrupt entire generative AI (GenAI) systems. Securing data is crucial to protecting privacy, maintaining model integrity and ensuring the responsible development of powerful AI applications.

    Three tactics can help companies to automate data security at scale, since it’s virtually impossible to do manually. Data classification detects, categorises and labels data, and those labels feed the next stage. Data protection defines policies such as masking, tokenisation and encryption to conceal the data. Finally, data security defines access control policies, such as who can access which data.

    The three concepts work together as follows: first, privacy tiers should be defined and data tagged with a security designation of sensitive, confidential, or restricted. Next, a protection policy needs to be applied to mask restricted data. Finally, an access control policy must be implemented to limit access rights.
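    The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the ordering of the tiers, the column-name rules, the role names and the key are all assumptions made for the example, and a real deployment would rely on a dedicated classification and tokenisation service.

```python
import hashlib
import hmac
import re

# Assumption: tiers ordered from least to most restrictive.
TIERS = ["sensitive", "confidential", "restricted"]

# Hypothetical classification rules: column-name patterns mapped to a tier.
RULES = [
    (re.compile(r"id_number|card|account", re.I), "restricted"),
    (re.compile(r"email|phone|salary", re.I), "confidential"),
]

# Hypothetical key; in practice this would come from a secrets manager.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical roles and the highest tier each may read in the clear.
ROLE_CLEARANCE = {
    "analyst": "sensitive",
    "data_steward": "confidential",
    "security_officer": "restricted",
}


def classify(column: str) -> str:
    """Step 1: tag a column with a privacy tier based on its name."""
    for pattern, tier in RULES:
        if pattern.search(column):
            return tier
    return "sensitive"


def tokenise(value: str) -> str:
    """Step 2: replace a value with a stable keyed token (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:12]


def read_record(record: dict, role: str) -> dict:
    """Step 3: return the record, tokenising columns above the role's clearance."""
    clearance = TIERS.index(ROLE_CLEARANCE[role])
    view = {}
    for column, value in record.items():
        if TIERS.index(classify(column)) <= clearance:
            view[column] = value
        else:
            view[column] = tokenise(str(value))
    return view


record = {
    "name": "Thandi",
    "email": "thandi@example.com",
    "id_number": "8001015009087",
}
analyst_view = read_record(record, "analyst")
officer_view = read_record(record, "security_officer")
```

    In this sketch the analyst sees the name in the clear but only tokens for the email address and ID number, while the security officer sees the full record. Because the token is keyed and deterministic, the same input always yields the same token, so joins on tokenised columns still work.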

    Next, AI-ready data must be discoverable and readily accessible within the system. Discoverable data unlocks the true potential of machine learning (ML) and GenAI, allowing these workloads to find the information they need to learn, adapt and produce groundbreaking results.

    Good metadata practices drive discoverability. Beyond technical metadata, defining business metadata and semantic typing enhances both automated and human understanding. All metadata is then indexed and searchable via a data catalogue.
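    To make the idea concrete, here is a toy data catalogue that stores technical metadata, business metadata and a semantic type for each column, then indexes all of it for keyword search. The class names, fields and sample entries are hypothetical; a real catalogue product would add lineage, ownership, access policies and far richer search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    table: str
    column: str
    data_type: str       # technical metadata
    description: str     # business metadata
    semantic_type: str   # semantic typing, e.g. "email_address"
    tags: list = field(default_factory=list)


class DataCatalogue:
    """A toy catalogue: every metadata field is indexed by keyword."""

    def __init__(self):
        self.entries = []
        self.index = {}  # keyword -> set of positions in self.entries

    def register(self, entry: CatalogueEntry) -> None:
        position = len(self.entries)
        self.entries.append(entry)
        text = " ".join(
            [entry.table, entry.column, entry.description,
             entry.semantic_type, *entry.tags]
        )
        # Split snake_case names so "email_addr" is findable via "email".
        for word in text.lower().replace("_", " ").split():
            self.index.setdefault(word, set()).add(position)

    def search(self, query: str) -> list:
        """Return entries whose metadata contains every word in the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.entries[i] for i in sorted(hits)]


catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    "crm.customers", "email_addr", "VARCHAR(255)",
    "Primary contact email for the customer", "email_address", ["pii"],
))
catalogue.register(CatalogueEntry(
    "sales.orders", "order_total", "DECIMAL(12,2)",
    "Order value including tax", "currency_amount", ["finance"],
))

matches = catalogue.search("customer email")
```

    A search for "customer email" finds the `email_addr` column even though neither word appears in the technical column definition alone; the business description and semantic type supply the matching terms.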

    Data must be easily consumable by ML or large language models (LLMs). AI initiatives won’t be successful if the data is not in the right format for ML experiments or LLM applications.

    The true potential of ML and GenAI applications rests with the ability to readily consume data. Unlike humans who can decipher handwritten notes or navigate messy spreadsheets, these technologies require information to be represented in specific formats.

    Making data easily consumable helps unlock the potential of these AI systems, allowing them to ingest information smoothly and translate it into intelligent actions for creative outputs.

    Data transformation is regarded as the unsung hero of consumable data for ML. While algorithms like linear regression grab the spotlight, the quality and shape of the data they’re trained on are just as critical.

    Moreover, the effort invested in cleaning, organising and making data consumable by ML models reaps significant rewards. Prepared data empowers models to learn effectively, leading to accurate predictions, reliable outputs and, ultimately, the success of the entire ML project.

    However, training data formats depend heavily on the underlying ML infrastructure. Traditional ML systems are disk-based, and much of the data scientist workflow focuses on establishing best practices and manual coding procedures for handling large volumes of files.

    More recently, lakehouse-based ML systems have used a database-like feature store, and the data scientist workflow has transitioned to SQL as a first-class language. As a result, well-formed, high-quality, tabular data structures are the most consumable and convenient data format for ML systems.
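    As a rough illustration of what "well-formed, tabular" means in practice, the sketch below uses SQL to turn raw transactions into one row per customer and one column per feature. Python's built-in `sqlite3` module stands in for a lakehouse feature store here, and the table, columns and values are invented for the example.

```python
import sqlite3

# In-memory database standing in for a lakehouse table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2025-01-05', 120.0),
  (1, '2025-02-10', 80.0),
  (2, '2025-01-20', 40.0),
  (2, '2025-03-02', NULL),
  (3, '2025-03-15', 300.0);
""")

# One row per customer, one column per feature. COALESCE guards against
# customers whose amounts are all missing; AVG and SUM skip NULL values.
FEATURE_SQL = """
SELECT customer_id,
       COUNT(*)                   AS order_count,
       COALESCE(SUM(amount), 0.0) AS total_spend,
       COALESCE(AVG(amount), 0.0) AS avg_order_value
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""

features = conn.execute(FEATURE_SQL).fetchall()
for row in features:
    print(row)
```

    Note how the missing amount for customer 2 is handled explicitly in the query rather than left for the model to trip over: the order still counts towards `order_count`, but it does not distort the spend features.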

    Making data consumable for GenAI

    LLMs − like OpenAI’s GPT-4, Anthropic’s Claude and Google’s LaMDA and Gemini − have been pre-trained on masses of text data and lie at the heart of GenAI.

    OpenAI’s GPT-3 model was reportedly trained on roughly 300 billion tokens, drawn from a corpus that started as approximately 45TB of raw text before filtering. Despite this wealth of inputs, LLMs can’t answer specific questions about your business, because they don’t have access to the company’s data.

    The solution is to augment these models with your company’s own information, resulting in more correct, relevant and trustworthy AI applications.

    The method for integrating corporate data into an LLM-based application, in a safe and secure way, is called retrieval-augmented generation (RAG).

    The technique generally uses text information derived from unstructured, file-based sources, such as presentations, mail archives, text documents, PDFs, transcripts, etc. The text is then split into manageable chunks and converted into a numerical representation used by the LLM in a process known as embedding.

    These embeddings are then stored in a vector database such as Chroma, Pinecone or Weaviate. Interestingly, many traditional databases − such as PostgreSQL, Redis and SingleStoreDB − also support vectors. Moreover, cloud platforms like Databricks, Snowflake and Google BigQuery have recently added support for vectors, too.
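    The retrieval side of this pipeline (chunk, embed, store, search) can be sketched without any external services. Two loud assumptions: the `embed` function below is a toy hashed bag-of-words vector, not a real embedding model, and `VectorStore` is a plain in-memory list rather than any of the products named above. The sample documents are invented. The final step, sending the prompt to an LLM, is deliberately left out.

```python
import math
import re
import zlib

DIM = 4096  # size of the toy embedding vector (an arbitrary choice)


def chunk(text: str, max_words: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping word windows."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap
    return chunks


def embed(text: str) -> list:
    """Toy embedding: hash each token into a bucket, then L2-normalise."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, chunk_text)

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list:
        """Return the k chunks with the highest cosine similarity."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for vec, text in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


documents = [
    "Leave policy: employees accrue 1.5 days of annual leave per month worked.",
    "Expense policy: travel claims must be submitted within 30 days of the trip.",
    "Security policy: restricted data must be masked before it leaves the warehouse.",
]

store = VectorStore()
for document in documents:
    for piece in chunk(document):
        store.add(piece)

question = "How many days of annual leave do employees accrue?"
context = store.search(question, k=1)[0]

# The retrieved chunk is placed in the prompt that would be sent to the LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

    Because the vectors are normalised, the dot product in `search` is the cosine similarity. Swapping the toy `embed` for a real embedding model and the list for a vector database changes the quality and scale of retrieval, but not the overall shape of the flow.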

    In conclusion, despite the transformative power of ML, plus GenAI’s explosive growth potential, data readiness remains the cornerstone of any successful AI implementation.

    The key principles I have discussed for establishing a robust and trusted data foundation combine to help your organisation to unlock the true potential of AI.
