Are you a data novice? Do you find yourself nodding along but not really understanding what people are saying? Not to worry – here are all the data-related terms you need to know.
As we gradually move (or are dragged kicking and screaming) into Industry 4.0, we can no longer afford to ‘play dumb’ when it comes to data.
Much like we taught our parents how to use Facebook, so too must you teach yourself about encryption, cloud computing and the internet of things.
But where do you start? How do you tell your algorithms from your zip drives? How do you begin to wade through the mass of data terminology that is available at the click of a button?
Let’s begin with the basics.
Data
This is information that has been converted into another form to be processed or analysed.
Big data
This refers to the vast amounts of structured and unstructured data that can come from a myriad of sources. It incorporates the ‘three Vs’: volume, variety and velocity, and can be measured in petabytes or exabytes (a hell of a lot of information, in other words). Small data can be managed more easily, tying in with the idea presented by Allen Bonde that “big data is for machines; small data is for people”.
Open data
This is content that can be freely accessed, used, edited and distributed anywhere, by anyone, at any time. The Open Definition was introduced in 2005 and promotes the spirit of interoperability, where no technical or legal barriers to this data exist.
Data warehouse
As the name suggests, this is a digital repository where businesses store their data. A hashing system may be used to make data easily searchable, so that different company departments can access each other’s content. Data warehousing is the process of storing data in this way, and it is used in everyday applications such as booking flights and withdrawing cash from an ATM.
Data mart
A subset of the data warehouse, this is a store of data used by a particular group within a company, such as the sales team. In contrast to a central archive, data marts target a specific need or purpose. Data virtualisation is the management of such data.
Data mining
Companies can mine the information gathered from raw data and analyse it to better inform future business decisions. This requires complex database software such as Microsoft SQL Server to form predictive analytics. If this seems like jargon to you, a simple example lies in supermarkets, where information is garnered from customer loyalty cards to define a target market for future products.
Data centre
This is a facility containing a large number of networked computers used for storing, processing and distributing large amounts of data. It houses IT equipment such as servers, routers and firewalls, as well as necessary infrastructure for the building such as power supplies, backup generators and ventilation systems. As the focal point of critical IT operations, data centres are the beating heart of a business.
As easy as A to Z
Algorithm
A procedure, or set of rules, for solving a particular problem
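If that sounds abstract, here is a minimal sketch in Python of one very simple algorithm: finding the largest number in a list by checking each item in turn. The function name `find_largest` and the sample numbers are made up for illustration.

```python
def find_largest(numbers):
    """Return the largest value by looking at each item exactly once."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    largest = numbers[0]           # start by assuming the first item is the largest
    for value in numbers[1:]:      # then compare it against every remaining item
        if value > largest:
            largest = value
    return largest


print(find_largest([3, 41, 7, 19]))  # prints 41
```

The rules are fixed and the steps are finite, which is all an algorithm really needs.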
Analytics
The use of maths, statistics and computer programming to discover relevant patterns in recorded information
API
Application programming interface – a set of rules that lets one piece of software access the features or data of another, commonly used to build web-based software applications
AI
Artificial intelligence – the creation of computing machines that can simulate human intelligence
BigTable
A Google data storage system that manages the company’s core services, such as Search and Maps
Biometrics
The statistical analysis of human characteristics, both physiological and behavioural
Cache
A temporary store of data, used in web browsers to save frequently accessed web pages
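A rough sketch of the idea, using the `lru_cache` helper from Python's standard library. The `fetch_page` function is hypothetical and only pretends to download a page; a real browser cache is considerably more involved.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fetch_page(url):
    """Stand-in for a slow download; returns a placeholder string."""
    return f"<html>content of {url}</html>"


fetch_page("https://example.com/")   # first visit: the function body runs
fetch_page("https://example.com/")   # second visit: answered from the cache

info = fetch_page.cache_info()
print(info.hits, info.misses)  # prints 1 1
```

The second request never reaches the "slow" code, which is the whole point of keeping a temporary copy close at hand.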
Carrier
A company that offers telecommunication services, such as Vodafone or BT
Cloud
In telecoms, this is the part of the network through which data passes between two points.
‘The cloud’ is also a buzzword for the internet, referring to the software and services that can be accessed online, rather than just from your computer
Cloud computing
The delivery of hosted services over the internet, which falls under three categories:
Public: Online services delivered to the general public
Private: Services made available only to a single organisation
Hybrid: A mixture of private and public cloud services for greater flexibility
Colocation
The practice of renting space in a data centre to house privately owned servers
Cooling
The provision of proper ventilation to ensure data equipment and processes remain at the optimum temperature
Disaster recovery
A strategic plan that enables a business to retain or resume critical functions after a negative incident has occurred, such as a cyberattack
DDoS
A distributed denial-of-service attack is the flooding of a website with traffic from many sources at once, potentially causing it to crash or shut down
Distributed file system
An application to allow clients to remotely access data stored on the server
Encryption
The conversion of data into code to prevent unauthorised access. This practice has made the news in recent months because of WhatsApp policies
Gigabyte/Gigabit
Giga is derived from the Greek for giant, which is apt as a gigabyte equals 1bn bytes of computer data storage. A gigabit is 1bn bits of information, a unit usually used to describe telecoms technology
GDPR
General Data Protection Regulation – a European Union privacy regulation that will come into effect on 25 May 2018, imposing harsher penalties for non-compliance with data protection standards
Hadoop
A free, Java-based software framework from the Apache Software Foundation that allows for the processing of large data sets across a distributed computer network
IP address
Standing for Internet Protocol, this is a number assigned to a piece of hardware, such as a computer, which identifies the sender or receiver of online information
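For the curious, Python's built-in `ipaddress` module can show that the familiar dotted form really is just a number. The addresses below come from ranges reserved for documentation, so they are examples rather than real machines.

```python
import ipaddress

# 192.0.2.10 sits in a range reserved for documentation and examples
addr = ipaddress.ip_address("192.0.2.10")

print(addr.version)  # prints 4 (an IPv4 address)
print(int(addr))     # the same address written as one whole number

# Newer IPv6 addresses are longer and written in hexadecimal
addr6 = ipaddress.ip_address("2001:db8::1")
print(addr6.version)  # prints 6
```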
IoT
The internet of things is the interconnected system of everyday objects fitted with computing devices that transfer data via the internet. The industrial internet of things (IIoT) is the use of this technology in the manufacturing industry
ISP
Internet service provider – exactly what it says on the tin
Java
A popular programming language used by developers to create web content and smartphone applications
Latency
You might have guessed this one – a delay in the transfer of data. Also known as that buffering symbol that turns you into a gigantic ball of rage
Megabyte/Megabit
One megabyte equals 8 megabits. Megabytes refer to computer storage and memory, whereas megabit is used to describe internet connection speed
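The factor of eight matters when you estimate download times, because file sizes are quoted in megabytes and connection speeds in megabits per second. A small sketch, with a made-up function name and made-up figures:

```python
BITS_PER_BYTE = 8


def download_seconds(file_megabytes, speed_megabits_per_second):
    """Estimate the download time, ignoring overheads and congestion."""
    file_megabits = file_megabytes * BITS_PER_BYTE
    return file_megabits / speed_megabits_per_second


# A 100-megabyte file on a 50-megabit-per-second connection
print(download_seconds(100, 50))  # prints 16.0
```

So a '50 meg' connection moves 100 megabytes in about 16 seconds at best, not two.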
Metadata
Data that describes other data. This information is used by search engines to filter through documents and generate appropriate matches
Open Compute Project
A Facebook-led initiative, this is a community-based organisation that shares designs of data centre products with other members of the IT industry in a bid to improve infrastructure and boost innovation
Open source
A computer program with a source code that can be modified to suit specific needs. Open source software promotes collaborative efforts, encouraging programmers to make their own work freely available
PaaS
Platform-as-a-service – a cloud computing model that allows developers to manage online applications
PUE
Power usage effectiveness – the ratio of a data centre’s total energy use to the energy used by its IT equipment, which serves as a measure of energy efficiency
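As a quick sketch, assuming the commonly used definition of PUE as total facility energy divided by the energy consumed by the IT equipment alone, the sum looks like this. The function name and the kilowatt-hour figures are invented for illustration.

```python
def power_usage_effectiveness(total_facility_kwh, it_equipment_kwh):
    """Return PUE, assuming PUE = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be greater than zero")
    return total_facility_kwh / it_equipment_kwh


# A facility drawing 1,500 kWh overall, 1,000 kWh of which reaches the IT kit
print(power_usage_effectiveness(1500, 1000))  # prints 1.5
```

Under that definition, the closer the figure is to 1.0, the less energy is going on cooling, lighting and other overheads.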
SaaS
Software-as-a-service – a software distribution model that allows a service provider to deliver applications to a customer via the internet
Software-defined network
Network technology that enables engineers to manage network behaviour through open interfaces, controlling data traffic without touching individual switches
Source code
The core component of a computer program that is readable by humans
Storage
An electromagnetic archive. Data storage devices can be removable and connected to the computer via an input/output port, as with a USB stick
Streaming
Made famous by Netflix, this is a technique for transferring data that supports a steady, uninterrupted stream of content, allowing for superior visual or audio quality
TCP/IP
Transmission Control Protocol/Internet Protocol – a set of rules to govern communications on the internet
Terabyte/Terabit
Heading into monster territory, a terabyte is 1trn bytes of computer storage capacity. Used in data communications, a terabit is 1trn binary digits
Tier-one carrier
An internet service provider that is the sole operator of its own network, with a direct connection to the internet and other network services
Virtualisation
The creation of a virtual model of a network, server, storage device or operating system
Zip drive
A portable device used to back up computer files. Portable storage has come a long way since the birth of the floppy disk: the world’s highest-capacity USB flash drive was revealed at CES 2017
Updated, 11.25am, 15 February 2018: This article was updated to attribute a quote about big data to Allen Bonde.