
Classification of Data

Data is a set of facts, such as descriptions, observations, and numbers, used in decision-making. It can be classified as structured, semi-structured, or unstructured.

Structured Data

• Structured data is tabular data that is represented in a database by columns and rows.
• Relational databases are those that store tables in this format.
• The mathematical term "relation" refers to such a table: a set of rows (tuples) that all share the same attributes (columns).
• All rows in a table of structured data have the same set of columns.
• SQL (Structured Query Language) is the standard language for querying and manipulating structured data.

Structured data has elements that can be addressed individually, which makes analysis effective. It is organized into a database, that is, a formatted repository, and covers all data that can be stored in the rows and columns of a table in a SQL database. Such data has relational keys and maps easily into pre-designed fields. It is currently the type of data that is simplest and most efficient to process. Relational data is an example.
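As a minimal sketch of the points above, the following uses Python's built-in sqlite3 module to create a table, insert rows that all share the same columns, and address one field with a SQL query. The table name, column names, and sample values are made up for illustration.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Asha", "Sales"), (2, "Ben", "Engineering"), (3, "Chen", "Sales")],
)

# Every row has the same columns, so a field can be addressed by name.
sales_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY id", ("Sales",)
    )
]
print(sales_names)
conn.close()
```

Because the schema is fixed up front, the query can rely on every row having a dept column.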

Semi-structured Data

• Semi-structured data is information that is not stored as structured data (in a relational database) but still has some structure to it.
• Semi-structured data includes documents held in JavaScript Object Notation (JSON) format, as well as data held in key-value stores and graph databases.

Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database (though this can be very hard for some kinds of semi-structured data), but keeping it in semi-structured form can save space. Example: XML data.
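A small sketch of what "some structure" can look like in practice, using JSON documents and Python's standard json module. The records and field names here are invented for illustration; the point is that keys label each value, yet the records do not all share the same set of keys.

```python
import json

# Three illustrative documents; only "id" and "name" appear in all of them.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Chen", "address": {"city": "Pune"}}
]
"""
records = json.loads(raw)

# Keys give each record structure, but there is no fixed set of columns.
all_keys = sorted({key for record in records for key in record})
shared_keys = sorted(set.intersection(*(set(record) for record in records)))

# A missing field has to be handled explicitly when reading the data.
emails = [record.get("email") for record in records]

print(all_keys)
print(shared_keys)
print(emails)
```

Loading these records into a single relational table would require deciding what to do with the optional and nested fields, which is where the difficulty mentioned above comes from.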

Unstructured Data

• Unstructured data is information that does not have a pre-defined data model or is not organized in a pre-defined manner.
• Unstructured data is typically text-heavy but may also include numbers, dates, and facts.
• Videos, audio, and binary data files may have no specific structure, so they are classified as unstructured data.

Unstructured data is data that is not arranged in a preset way or does not have an established data model, which makes it unsuitable for a traditional relational database. Other platforms are therefore used to store and manage it. It is becoming more common in IT systems, and businesses use it in a variety of business intelligence and analytics applications. Word documents, PDFs, text files, and media logs are just a few examples.
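To illustrate why such data is harder to analyze, here is a minimal sketch in Python that works on a free-text note. The note itself is invented, and the date pattern is an assumption about how dates happen to be written in it; nothing in the text declares which parts are dates.

```python
import re

# An illustrative free-text note with no fields, keys, or schema.
note = (
    "Met the client on 2023-04-12 to review the contract. "
    "Follow-up call scheduled for 2023-05-03; invoice 1187 is still unpaid."
)

# There is no "date" column to select, so dates are recovered by pattern
# matching, assuming they are written as YYYY-MM-DD.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", note)

# Querying the text amounts to a keyword search.
mentions_invoice = "invoice" in note.lower()

print(dates)
print(mentions_invoice)
```

If a date were written in another style, the pattern would silently miss it, which is the kind of fragility a pre-defined data model avoids.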

 

Difference between Structured, Semi-structured, and Unstructured data

 

| Properties | Structured data | Semi-structured data | Unstructured data |
| --- | --- | --- | --- |
| Technology used | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
| Version management | Versioning over tuples, rows, and tables | Versioning over tuples or graphs is possible | Versioned as a whole |
| Transaction management | Mature transaction management and various concurrency techniques | Transactions are adapted from DBMS and are not yet mature | No transaction management and no concurrency |
| Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is the most flexible; there is no schema |
| Scalability | Scaling the database schema is very difficult | Scaling is simpler than for structured data | It is the most scalable |
| Robustness | Very robust | Newer technology, not yet widely adopted | - |
| Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible |
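
The "Query performance" property for structured data can be sketched with a join. The example below uses Python's built-in sqlite3 module; the two tables, their columns, and the sample rows are hypothetical and chosen only to show a relational key linking one table to another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two illustrative tables; purchase.customer_id is the relational key
# that points back to customer.id.
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO purchase VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
    """
)

# Join the two tables on the key and total the purchases per customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
conn.close()
```

A query like this depends on both tables having a known schema; with unstructured data the equivalent question would have to be answered by searching text.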