What are databases?
Definitions, types, and examples of databases
By its most basic definition, a database is any collection of interrelated information. When you write a grocery list on a piece of paper, you're creating a small analog database. But what is a database in computer science? In that context, a database is a collection of information that's stored as data on a computer system, such as the inventory at your local grocery shop.
What are databases used for?
Databases are used to store and organize data so that it's easier to manage and access. As a collection of data grows and takes on more complexity, it becomes more difficult to keep that data organized, accessible, and secure. To help with that, you use database management systems (DBMS), which include a layer of management tools.
What is data?
Data refers to any information that's captured and stored about a single person, place, thing, or object—called an entity—as well as the attributes of that entity.
For example, if you're capturing and storing information about local restaurants, each restaurant is one entity, and its name, address, and business hours are attributes of that entity. All of the information that you collect and store about your favorite restaurants is data.
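To make the entity and attribute vocabulary concrete, here is a minimal Python sketch. The class name, field names, and sample restaurants are all assumptions made up for illustration; they are not part of any real system.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    """One entity: a single restaurant and its attributes."""
    name: str
    address: str
    business_hours: str


# Hypothetical sample data, invented for illustration only.
restaurants = [
    Restaurant("Corner Bistro", "12 Main St", "11:00-22:00"),
    Restaurant("Noodle House", "48 Oak Ave", "10:00-21:00"),
]

# Each object is one entity; each field is one attribute of that entity.
names = [r.name for r in restaurants]
```

Taken together, the list of restaurants is the data: a collection of entities, each described by the same set of attributes.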
Types of databases
Databases are broadly grouped into relational and non-relational databases. Relational databases are highly structured and are queried with a language called Structured Query Language (SQL). Non-relational databases are highly diverse, supporting a variety of data structures. Since many non-relational databases do not use SQL, they are often called NoSQL databases.
Types of data structures
Table structures are relational database structures that organize data into rows and columns—rows contain entities while columns contain entity attributes. Wide tables, or wide column stores, use sparse columns with empty attributes to greatly increase the total number of columns that you can have in the table. Because some spaces are empty, wide tables are an example of a non-relational database structure.
Linear structures organize elements into a sequence.
Tree structures organize elements into a hierarchical database of nodes in parent–child relationships that stem from one root node.
Graph structures organize elements into a non-hierarchical network of nodes with complex relationships to each other.
Hash-based structures map keys to values using hash functions that associate related data by assigning indices to hash tables.
Document-oriented databases organize quantities of information about an entity into a single object (the document), which is separate from other objects. Objects do not need to be mapped in relation to each other, and a single object can be edited without impacting other objects.
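As a rough illustration of the document-oriented idea, the sketch below models documents as plain Python dictionaries. It is not a real document database; the product keys and fields are hypothetical and chosen only to show that documents need not share a shape.

```python
import copy
import json

# Each document is a self-contained object; the two need not share fields.
# All keys and values here are hypothetical sample data.
products = {
    "sku-1": {"name": "Kettle", "price": 25.0, "color": "red"},
    "sku-2": {"name": "Novel", "price": 9.5, "author": "A. Writer", "pages": 320},
}
before = copy.deepcopy(products["sku-2"])

# Editing one document adds a field without touching any other document.
products["sku-1"]["warranty_years"] = 2

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(products["sku-1"], sort_keys=True)
```

After the edit, the second document still equals its earlier copy, which is the independence described above.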
In a relational database—the most common type—data is organized into tables that hold information about each entity and represent pre-defined categories through rows and columns. This structured data is both efficient and flexible to access.
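A small sketch of a relational table, using the SQLite engine that ships with Python's standard library. The table name, columns, and rows are assumptions invented for this example.

```python
import sqlite3

# An in-memory SQLite database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Columns are the pre-defined attributes; each row is one entity.
conn.execute(
    "CREATE TABLE restaurants ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, address TEXT, opens_at TEXT)"
)
conn.executemany(
    "INSERT INTO restaurants (name, address, opens_at) VALUES (?, ?, ?)",
    [
        ("Corner Bistro", "12 Main St", "11:00"),
        ("Noodle House", "48 Oak Ave", "10:00"),
    ],
)

# SQL query: which restaurants are already open at 10:30?
# Zero-padded HH:MM strings compare correctly as text.
rows = conn.execute(
    "SELECT name FROM restaurants WHERE opens_at <= ? ORDER BY name",
    ("10:30",),
).fetchall()
```

Here `rows` holds only the restaurant whose opening time is at or before 10:30. The query describes what you want, and the database decides how to find it.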
Non-relational databases store unstructured or semi-structured data. They don't use tables with columns and rows the way that relational databases do. Instead, they use a storage model that's optimized for the specific requirements of the type of data being stored. Non-relational databases allow for larger sets of distributed data to be accessed, updated, and analyzed quickly.
Some non-relational databases are referred to as NoSQL databases. NoSQL refers to data stores that use no SQL or not only SQL for queries. Instead, NoSQL databases use other programming languages and constructs to query the data. Many NoSQL databases do support SQL-compatible queries, but the way that they execute these queries is usually different from the way a traditional relational database would execute the same SQL query.
One type of non-relational database—an object database—uses object-oriented programming. Objects are encoded with a state (factual data) that's stored in a field or variable and a behavior that's displayed through a method or function. Objects can be held in persistent storage forever and read and mapped directly without an API or tool, which yields faster access to data and better performance. However, object databases aren't as popular as other database types and can be challenging to support.
In-memory databases and caches
All of the data in an in-memory database is stored in a computer's random-access memory (RAM). When you query or update this type of database, you access the main memory directly. There's no disk involved. Data loads quickly because accessing main memory (which is near the processor on the motherboard) is a lot faster than accessing a disk.
In-memory databases are commonly used to store copies of frequently accessed information like pricing or inventory data. This is known as caching. When you cache data, you store a copy of it in a temporary location so that it loads faster the next time it's requested.
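The caching idea can be sketched with an ordinary dictionary standing in for the in-memory store. Everything named here (`PRICES_ON_DISK`, `read_from_database`, `get_price`, the 60-second lifetime) is a hypothetical stand-in, not the API of any particular cache product.

```python
import time

# Hypothetical slow backing store standing in for a disk-based database.
PRICES_ON_DISK = {"kettle": 25.0, "novel": 9.5}
stats = {"backend_reads": 0}

cache = {}          # key -> (value, time the copy was stored)
TTL_SECONDS = 60.0  # assumed lifetime of a cached copy


def read_from_database(key):
    """Simulate the expensive read that the cache is meant to avoid."""
    stats["backend_reads"] += 1
    return PRICES_ON_DISK[key]


def get_price(key, now=None):
    """Return a fresh cached copy if there is one; otherwise read and cache."""
    now = time.monotonic() if now is None else now
    entry = cache.get(key)
    if entry is not None and now - entry[1] < TTL_SECONDS:
        return entry[0]  # cache hit: served from memory
    value = read_from_database(key)  # cache miss or expired copy
    cache[key] = (value, now)
    return value
```

Calling `get_price("kettle")` twice in quick succession reads the backing store only once. A call made after the copy expires goes back to the store and refreshes the cache.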
Databases might seem like invisible mysteries, but most of us interact with them every day. Here are some common examples of relational databases, NoSQL databases, and in-memory databases:
Banks use databases to keep track of customer transactions—everything from balance inquiries to transfers between accounts. These transactions need to happen almost instantly, and the data from huge amounts of transactions must always be up to date. For these purposes, banks use online transactional processing systems that are built with relational databases, which can handle a large number of customers, frequent data changes from transactions, and fast response times.
If you have an e-commerce website, your catalog includes individual products each with their own variety of attributes. A document-oriented database—an example of a non-relational database—uses individual documents to describe all the attributes of a single product. You can change the attributes in the document without impacting any of your other products. In-memory databases are often used to cache frequently accessed e-commerce data like inventory and pricing to speed up data retrieval and lower the load on the database.
When you join a social network, your information is added to a non-relational database of everyone who uses that network. When you connect with other people in that network, you become part of a social graph. This is why you're able to see a filtered list of your friends or professional connections and discover new people who those friends and connections know.
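A toy version of a social graph can be written as a mapping from each person to the set of people they're connected to. The names and the `suggestions` helper are made up for illustration; a real social network would use a dedicated graph store.

```python
# Hypothetical social graph stored as an adjacency mapping.
friends = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}


def suggestions(person):
    """People connected to a friend of `person` but not yet to `person`."""
    direct = friends[person]
    found = set()
    for friend in direct:
        found |= friends[friend]
    # Drop existing connections and the person themselves.
    return sorted(found - direct - {person})
```

For example, `suggestions("ana")` returns `["dee", "eli"]`: both are known to Ana's friends but are not yet connected to Ana.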
Non-relational databases drive online personalization, which has become so prevalent that you might not even notice it. If you book a flight through a travel website, you'll also see options to book hotels and rental cars. The website's database contains a wealth of unstructured information (your flight details, travel preferences, and previous car or hotel bookings) that's used to serve you personalized suggestions that save you time, money, or effort. In-memory databases, likewise, are used as a session store to efficiently hold temporary user data, such as search preferences or a shopping cart, while you use the application.
When organizations want to draw insights from their own data, relational databases help them manage their analytics. A technology help desk, for example, might track customer issues in a variety of dimensions including issue type, time to resolve the issue, and customer satisfaction. A relational database using a table structure organizes customer issue data using just two dimensions at a time—but with an online analytical processing system, the help desk can look at more than one table at a time, allowing for multidimensional analysis to process large amounts of data at high speeds.
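The sketch below only approximates the help desk analysis: it uses SQLite's `GROUP BY` to summarize tickets along two dimensions at once, which is far simpler than a real online analytical processing system. The `tickets` table, its `region` column, and all of the values are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tickets ("
    "issue_type TEXT, region TEXT, hours_to_resolve REAL, satisfaction INTEGER)"
)
# Hypothetical help desk tickets.
db.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?, ?)",
    [
        ("login", "east", 2.0, 4),
        ("login", "west", 4.0, 2),
        ("billing", "east", 1.0, 5),
        ("billing", "east", 3.0, 3),
    ],
)

# Summarize by two dimensions (issue type and region) in one query.
summary = db.execute(
    "SELECT issue_type, region, COUNT(*), "
    "AVG(hours_to_resolve), AVG(satisfaction) "
    "FROM tickets GROUP BY issue_type, region "
    "ORDER BY issue_type, region"
).fetchall()
```

Each row of `summary` is one combination of issue type and region, with its ticket count, average resolution time, and average satisfaction score.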
Database management systems
Database administrators use database management systems (DBMS) to control data—especially when they're working with big data. Big data refers to large volumes of structured and unstructured data that's often received by the system in real time or almost real time. A DBMS also helps to manage data that's used across multiple applications, or data that resides in multiple locations.
Different management systems offer different levels of organization, scalability, and application. In addition to the type of data that you want to organize and how you want to access it, the DBMS that you use also depends on where your data resides, the type of architecture that your database uses, and how you plan to scale.
Is your data on-premises, in the cloud, or both?
In on-premises databases, data resides on private onsite hardware (often called a private cloud). To add data capacity, database administrators need to either ensure that the onsite servers have enough space available or expand their infrastructure with new hardware to create space.
In cloud-based databases, structured or unstructured data resides on a private, public, or hybrid cloud computing platform (a hybrid platform combines private and public cloud storage). Because cloud databases are designed for a virtualized environment, they're both highly scalable and highly available. They also help to lower costs, because you don't need to buy as much hardware and you pay only for the storage that you use.
Is your database architecture centralized, distributed, or federated?
In a centralized database, all of the data resides in one system, in one place. This one system is the access point for all users.
A distributed database can span both relational and non-relational database types. In distributed databases, the data is stored across multiple physical locations, either on multiple on-premises computers or dispersed across a network of interconnected computers.
In a federated database, several distinct databases that run on independent servers are unified into one large object. A blockchain is a type of federated database that's used to securely manage financial ledgers and other transaction records.
Will you grow with your data by scaling up or scaling out?
Scaling up (or down), also called vertical scaling, is the process of adding resources, such as memory or more powerful CPUs, to an existing server.
Scaling out (or in), also called horizontal scaling, adds more machines to your pool of resources.
Scaling horizontally instead of scaling vertically extends the lifecycle of existing hardware, frees you to upgrade without vendor lock-in, reduces cost, and creates long-term potential for flexibility.
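One common way to use the extra machines that scaling out provides is to spread keys across them by hash. The sketch below assumes simple modulo placement; the node names and the `shard_for` helper are hypothetical, and this is one possible approach rather than how any specific database does it.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node for `key` using a stable hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Hypothetical pool of two machines and a thousand customer keys.
nodes = ["node-a", "node-b"]
keys = [f"customer-{i}" for i in range(1000)]
placement = {k: shard_for(k, nodes) for k in keys}

# Scaling out: add a machine to the pool and recompute placement.
scaled_nodes = nodes + ["node-c"]
scaled_placement = {k: shard_for(k, scaled_nodes) for k in keys}
```

With modulo placement, adding a node moves many existing keys to a different machine. Systems that need to limit that movement often use consistent hashing instead.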
Simplify your data operations with fully managed databases that automate scalability, availability, and security. Choose from relational, NoSQL, and in-memory databases that span proprietary and open-source engines.
Get to know the Azure SQL family of databases
Unify your SQL portfolio without sacrificing compatibility. Migrate, modernize, and deploy applications your way, from edge to cloud, using familiar SQL Server technology.
Scale fearlessly with Azure Database for PostgreSQL
Azure Database for PostgreSQL helps you scale your workload quickly and confidently with high availability, AI-powered performance optimization, and advanced security.
Build high-performance apps with Azure Cosmos DB
Azure Cosmos DB is a fully managed NoSQL database with open APIs and guaranteed speed at any scale.
Handle high traffic efficiently with Azure Cache for Redis
Azure Cache for Redis helps you handle thousands of simultaneous users with near-instant speed by adding a quick-caching layer to your app's architecture.