Data for Dummies: The essential data glossary

15 Feb 2017

Image: DONOT6_STUDIO/Shutterstock

Are you a data novice? Do you find yourself nodding along but not really understanding what people are saying? Not to worry – here are all the data-related terms you need to know.

As we gradually move (or are dragged kicking and screaming) into Industry 4.0, we can no longer afford to ‘play dumb’ when it comes to data.

Just as we taught our parents how to use Facebook, so too must we teach ourselves about encryption, cloud computing and the internet of things.

But where do you start? How do you tell your algorithms from your zip drives? How do you begin to wade through the mass of data terminology that is available at the click of a button?

Let’s begin with the basics.

Data

This is information that has been converted into another form to be processed or analysed.

Big data

This refers to the vast amounts of structured and unstructured data that can come from a myriad of sources. It incorporates the ‘three Vs’: volume, variety and velocity, and can be measured in petabytes or exabytes (a hell of a lot of information, in other words). Small data can be managed more easily, tying in with the idea presented by Allen Bonde that “big data is for machines; small data is for people”.

Open data

This is content that can be freely accessed, used, edited and distributed anywhere, by anyone, at any time. The Open Definition was introduced in 2005 and promotes the spirit of interoperability, where no technical or legal barriers to this data exist.

Data warehouse

As the name suggests, this is a digital repository where businesses store their data. A hashing system may be used to make data easily searchable, so that different company departments can access each other’s content. Data warehousing is the process of this storage, which is used in everyday applications such as booking flights and withdrawing cash from an ATM.

Data mart

A subset of the data warehouse, this is a store of data used by a particular group within a company, such as the sales team. In contrast to a central archive, data marts target a specific need or purpose. Data virtualisation is the management of such data.

Data mining

Companies can mine the information gathered from raw data and analyse it to better inform future business decisions. This requires complex database software such as Microsoft SQL Server to form predictive analytics. If this seems like jargon to you, a simple example lies in supermarkets, where information is garnered from customer loyalty cards to define a target market for future products.
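To make the supermarket example a little more concrete, here is a minimal Python sketch that counts which product category turns up most often in a handful of loyalty-card records. The card IDs, categories and variable names are all made up for illustration; real data mining works on far larger data sets and uses far more sophisticated techniques than a simple count.

```python
from collections import Counter

# Hypothetical loyalty-card records: (card ID, product category)
purchases = [
    ("card-001", "baby food"),
    ("card-002", "coffee"),
    ("card-001", "nappies"),
    ("card-003", "coffee"),
    ("card-002", "coffee"),
]

# Count how many times each category was bought, ignoring the card ID
category_counts = Counter(category for _, category in purchases)

# Pick out the single most frequently bought category
top_category, top_count = category_counts.most_common(1)[0]
print(top_category, top_count)
```

A retailer could use a tally like this as a first hint about which products its card holders favour.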

Data centre

This is a facility containing a large number of networked computers used for storing, processing and distributing large amounts of data. It houses IT equipment such as servers, routers and firewalls, as well as necessary infrastructure for the building such as power supplies, backup generators and ventilation systems. As the focal point of critical IT operations, data centres are the beating heart of a business.

As easy as A to Z

Image: kirill_makarov/Shutterstock

Algorithm

A procedure, or set of rules, for solving a particular problem
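As a rough sketch of what a ‘set of rules’ can look like in practice, the Python function below finds the largest number in a list by checking each value in turn. The name `find_largest` and the sample numbers are assumptions chosen for illustration only.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list of numbers."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    largest = numbers[0]        # Rule 1: start with the first value
    for value in numbers[1:]:   # Rule 2: look at every remaining value
        if value > largest:     # Rule 3: keep whichever is bigger
            largest = value
    return largest


sample = [12, 47, 3, 29]  # made-up data for illustration
print(find_largest(sample))
```

Python already offers a built-in `max` function that does the same job; the steps are spelled out here only to show the rules one by one.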

Analytics

The use of maths, statistics and computer programming to discover relevant patterns in recorded information

API

Application programming interface – a set of rules and instructions that defines how pieces of software talk to each other, often used to access and build web-based applications

AI

Artificial intelligence – the creation of computing machines that can simulate human intelligence

Bigtable

A Google data storage system that manages the company’s core services, such as Search and Maps

Biometrics

The statistical analysis of human characteristics, both physiological and behavioural

Cache

A temporary store of data, used in web browsers to save frequently accessed web pages

Carrier

A company that offers telecommunication services, such as Vodafone or BT 

Image: Dmitriy Karelin/Shutterstock

Cloud

In telecoms, this is the part of the network through which data passes between two points.

‘The cloud’ is also a buzzword for the internet, referring to the software and services that can be accessed online, rather than just from your computer 

Cloud computing

The delivery of hosted services over the internet, which falls under three categories:

Public: Online services delivered to the general public

Private: Services made available only to a single organisation

Hybrid: A mixture of private and public cloud services for greater flexibility

Colocation

The practice of renting space in a data centre to house privately owned servers

Cooling

The provision of proper ventilation to ensure data equipment and processes remain at the optimum temperature

Disaster recovery

A strategic plan that enables a business to retain or resume critical functions after a negative incident has occurred, such as a cyberattack

DDoS

A distributed denial-of-service attack is the flooding of a website with traffic, potentially causing it to crash or shut down

Distributed file system

An application to allow clients to remotely access data stored on the server

Encryption

The conversion of data into code to prevent unauthorised access. This practice has made the news in recent months because of WhatsApp policies

Image: SumanBhaumik/Shutterstock

Gigabyte/Gigabit

Giga is derived from the Greek for giant, which is apt as a gigabyte equals 1bn bytes of computer data storage. A gigabit is 1bn bits of information, a unit usually used to describe telecoms technology

GDPR

General Data Protection Regulation – a European Union privacy regulation that will come into effect on 25 May 2018, imposing harsher penalties for non-compliance with data protection standards

Hadoop

A free, Java-based framework from the Apache Software Foundation that allows for the processing of large data sets across a distributed network of computers

IP address

IP stands for Internet Protocol. An IP address is a number assigned to a piece of hardware, such as a computer, which identifies the sender or receiver of online information
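For the curious, Python’s standard `ipaddress` module can inspect an address. This is only a small sketch: the address `192.168.1.10` is an arbitrary example of the kind commonly handed out on home networks, not one taken from any real system.

```python
import ipaddress

# An arbitrary example address, chosen purely for illustration
address = ipaddress.ip_address("192.168.1.10")

print(address.version)     # 4 for IPv4, 6 for IPv6
print(address.is_private)  # whether it is reserved for local networks
print(int(address))        # the same address written as one whole number
```

The last line shows that the familiar dotted format is just a readable way of writing a single large number.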

IoT

The internet of things is the interconnected system of everyday objects fitted with computing devices that transfer data via the internet. The industrial internet of things (IIoT) is the use of this technology in the manufacturing industry

ISP

Internet service provider – exactly what it says on the tin

Java

A popular programming language used by developers to create web content and smartphone applications

Latency

You might have guessed this one – a delay in the transfer of data. Also known as that buffering symbol that turns you into a gigantic ball of rage

Megabyte/Megabit

One megabyte equals 8 megabits. Megabytes refer to computer storage and memory, whereas megabits are used to describe internet connection speed
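The difference matters when you try to work out how long a download will take. The short Python sketch below converts a file size in megabytes to megabits and divides by the connection speed. The function names and the figures (a 100-megabyte file on a 50-megabit-per-second line) are assumptions for illustration, and the result is a best case that ignores latency and other network overhead.

```python
BITS_PER_BYTE = 8


def megabytes_to_megabits(megabytes):
    """Convert a size in megabytes to megabits."""
    return megabytes * BITS_PER_BYTE


def download_seconds(file_megabytes, speed_megabits_per_second):
    """Ideal transfer time, ignoring latency and network overhead."""
    return megabytes_to_megabits(file_megabytes) / speed_megabits_per_second


# A 100 MB file on a 50 Mbps connection (made-up figures)
print(download_seconds(100, 50))
```

In other words, a ‘50 meg’ connection moves well under 50 megabytes each second.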

Metadata

Data that describes other data. This information is used by search engines to filter through documents and generate appropriate matches 

Image: karlstury/Shutterstock

Open Compute Project

A Facebook-led initiative, this is a community-based organisation that shares designs of data centre products with other members of the IT industry in a bid to improve infrastructure and boost innovation

Open source

A computer program with a source code that can be modified to suit specific needs. Open source software promotes collaborative efforts, encouraging programmers to make their own work freely available 

PaaS

Platform-as-a-service – a cloud computing model that gives developers a hosted platform on which to build, run and manage online applications

PUE

Power usage effectiveness – a ratio to measure the energy efficiency of a data centre
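The ratio is the total energy used by the facility divided by the energy used by the IT equipment alone, so a value closer to 1 means less energy is going on overheads such as cooling. A minimal Python sketch follows; the function name and the kilowatt-hour figures are invented for illustration.

```python
def power_usage_effectiveness(total_facility_kwh, it_equipment_kwh):
    """PUE = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


# Made-up figures: 1,500 kWh for the whole building, 1,000 kWh for IT kit
print(power_usage_effectiveness(1500, 1000))
```

With these sample numbers, every unit of energy that reaches the servers costs the building an extra half unit in overheads.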

SaaS

Software-as-a-service – a software distribution model that allows a service provider to deliver applications to a customer via the internet

Software-defined network

Network technology that enables engineers to manage network behaviour through open interfaces, controlling data traffic without touching individual switches 

Source code

The core component of a computer program that is readable by humans

Storage

An electromagnetic archive. Data storage devices can be removable and connected to the computer via an input/output port, such as a USB stick

Image: Andrii Zastrozhnov/Shutterstock

Streaming

Made famous by Netflix, this is a technique for transferring data as a steady, continuous stream of content, so that video or audio can start playing before the whole file has been downloaded

TCP/IP

Transmission Control Protocol/Internet Protocol – a set of rules to govern communications on the internet

Terabyte/Terabit

Heading into monster territory, a terabyte is 1trn bytes of computer storage capacity. Used in data communications, a terabit is 1trn binary digits

Tier-one carrier

An internet service provider that is the sole operator of its own network, with a direct connection to the internet and other network services

Virtualisation

The creation of a virtual model of a network, server, storage device or operating system

Zip drive

A portable device used to back up computer files. Storage has come a long way since the birth of the floppy disk: the world’s highest-capacity USB flash drive was recently revealed at CES 2017

Updated, 11.25am, 15 February 2018: This article was updated to attribute a quote about big data to Allen Bonde.

Shelly Madden was sub-editor of Silicon Republic
