Anjar Priandoyo

Catatan Setiap Hari

Posts Tagged ‘Data’

Data governance: A conceptual framework, structured review, and research agenda


Rene Abraham, Johannes Schneider, and Jan vom Brocke (2019)

Some papers are just beautifully written.

Written by Anjar Priandoyo

January 26, 2021 at 12:22 pm

Posted in Science

Data Science Jargon


Data_Science_VD.png

This is the very popular Data Science Venn Diagram from Drew Conway, from around 2010. It marked the beginning of a new jargon. I still prefer Negnevitsky’s (2002) book as the best explanation of this. I also found that some jargon has shifted: Business Intelligence became Business Analytics, while BI itself is now associated more with visualization and reporting.

Written by Anjar Priandoyo

December 27, 2020 at 7:19 am

Posted in Science

Data – Roadmap


I am starting to realize the biggest problems with data.

1. There is no perfect framework (neither DMBOK nor DCAM is sufficient on its own)

https://datacrossroads.nl/2018/12/02/data-management-metamodels-damadmbok2-dcam/

2. Multiple perspectives on data

https://justinhay.wordpress.com/2011/05/21/edw-reference-architecture-data-integration-layer/

3. Different understandings of data

School of DB, School of Architecture

https://stackoverflow.com/questions/3873346/logical-model-versus-domain-model

https://stackoverflow.com/questions/3507671/whats-the-difference-between-data-modelling-and-domain-modelling

https://stackoverflow.com/questions/4279089/what-is-the-difference-between-logical-data-model-and-conceptual-data-model


Master Data: “A consistent and uniform set of identifiers and attributes that describe the core entities of the enterprise and are used across multiple business processes.” Gartner Data & Analytics Summit

MDM: “that describe the core entities of the enterprise” – Master data exists and should be identified within multiple data domains:
• Party – This data domain can extend to an individual customer in B2C or a business (even hierarchy) in a B2B context. In other contexts, this domain can be extended to Patients, Providers, Citizens, Suspects or any other domain that has similar attributes like name, contact information, and, most importantly, survivorship rules when one or more records are resolved into a single record that represents the truth of that entity. It can also extend to suppliers, although that can arguably be considered a distinctly different data domain, and many MDM vendors will treat it as such.

• Product – A critical data domain for commerce, catalog, and compliance efforts, product master data differs from customer data primarily in that while the volume of records may be lower, the complexity is higher, hierarchies are much more important, and collaborative authorship is key.
• Asset – This domain can extend to both physical assets, such as replacement parts in a warehouse, physical buildings, such as hotels or warehouses, or beds within a hospital, for example, or digital assets, such as images, videos or reviews that relate to another data entity like a product or a location.
• Location – A more nebulous, but no less critical data domain, location master data will usually be a geographical location that corresponds to another data domain (like an address for a retail store or location of an off-shore drilling site) or some other location reference that can be tracked, such as delivery trucks or IP addresses.

Data Domain: a high-level block used to define master data

Data – Maturity Ladder

https://datacrossroads.nl/2018/12/02/data-management-metamodels-damadmbok2-dcam/

Data warehouse sources are often operational or transactional systems. In these types of systems, the master data comes along for the ride when an event or transaction occurs, such as a change in product inventory levels or a customer making a purchase. MDM often incorporates all possible master data sources, including not only data associated with or generated by internal systems, but also external data.

Another major difference between MDM and data warehousing is that MDM focuses on providing the enterprise with a single, unified and consistent view of these key business entities by creating and maintaining their best data representations. While a data warehouse often maintains a full history of the changes to these entities, its current view represents the last update. Plus, each data warehouse update is applied to the current view without a re-assessment of how previous updates might change the best representation.

Matching and consolidating related records doesn’t typically occur in data warehousing. MDM, on the other hand, standardizes, matches and consolidates common data elements across all possible data sources for a subject area to iteratively refresh and maintain its best master record.

https://blogs.sas.com/content/datamanagement/2017/04/27/mdm-different-data-warehousing
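The "standardize, match and consolidate" cycle described above can be sketched roughly as follows. This is only a minimal illustration, not how any MDM product works: the sample records, the choice of email as the match key, and the "newest non-empty value wins" survivorship rule are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical customer records from two source systems (illustrative data only).
records = [
    {"source": "crm", "name": "Jane Doe ", "email": "JANE@EXAMPLE.COM",
     "phone": None, "updated": "2020-01-10"},
    {"source": "billing", "name": "jane doe", "email": "jane@example.com",
     "phone": "555-0100", "updated": "2020-06-01"},
    {"source": "crm", "name": "John Roe", "email": "john@example.com",
     "phone": "555-0199", "updated": "2020-03-15"},
]


def standardize(record):
    """Normalize the fields used for matching (whitespace and letter case)."""
    cleaned = dict(record)
    cleaned["name"] = " ".join(record["name"].split()).title()
    cleaned["email"] = record["email"].strip().lower()
    return cleaned


def consolidate(group):
    """Assumed survivorship rule: the newest non-empty value wins per field."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    master = {}
    for field in ("name", "email", "phone"):
        master[field] = next((r[field] for r in newest_first if r[field]), None)
    master["sources"] = sorted({r["source"] for r in group})
    return master


def build_master(records):
    """Standardize, match on email (assumed match key), then consolidate."""
    groups = defaultdict(list)
    for record in map(standardize, records):
        groups[record["email"]].append(record)
    return [consolidate(group) for group in groups.values()]


master_records = build_master(records)
for row in master_records:
    print(row)
```

Re-running build_master whenever a source changes re-assesses the whole match group, which is the difference from a warehouse update that only overwrites the current view.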

Written by Anjar Priandoyo

November 30, 2020 at 3:48 pm

Posted in Science

Data – Masterdata, Reference Data and Metadata



Data

Metadata (data about data)
– A Data Dictionary is a detailed definition and description of data sets (tables) and their fields (columns), also known as Technical Metadata.
– A Business Glossary (or Data Glossary) is a list of business terms with their definitions. It defines business concepts for an organization or industry and is independent of any specific database or vendor, also known as Business Metadata.

Master Data (describes an entity, a noun, e.g. customer, product, location)
– Reference data is concerned with classification and categorisation.
– Master data is concerned with business entities.

Transaction Data (describes an action, a verb, e.g. buy)
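To make the distinction concrete, here is a small sketch of how the four kinds of data might sit side by side. All class names, fields, and code values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Reference data: classifies and categorizes (hypothetical code table).
CUSTOMER_SEGMENTS = {"RET": "Retail", "COR": "Corporate"}


@dataclass(frozen=True)
class Customer:
    """Master data: a business entity (a noun)."""
    customer_id: str
    name: str
    segment_code: str  # points into the reference data above


@dataclass(frozen=True)
class Purchase:
    """Transaction data: an action (a verb) that involves master data."""
    customer_id: str
    product_id: str
    quantity: int
    purchased_on: date


# Metadata: data about the data, here a one-entry data dictionary.
DATA_DICTIONARY = {
    "Customer.segment_code": "Customer segment; valid values come from CUSTOMER_SEGMENTS.",
}

customer = Customer("C001", "Jane Doe", "RET")
purchase = Purchase("C001", "P100", 2, date(2020, 11, 29))
segment_name = CUSTOMER_SEGMENTS[customer.segment_code]
print(f"{customer.name} ({segment_name}) bought {purchase.quantity} x {purchase.product_id}")
```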

ref

Written by Anjar Priandoyo

November 29, 2020 at 3:20 pm

Posted in Science

Data – From Process vs Architecture point of view


Interesting concept: DMBOK sees the SDLC as preceding the DLC, so data comes in sequence after the application.

However, data can also be seen as architecture (a block diagram):

Data Architecture Reference Model - Dragon1

Or as an architecture model:

Enterprise Data Architecture

Written by Anjar Priandoyo

November 27, 2020 at 9:53 am

Posted in Society

Data Modeling Tools


Data Modeling

  • Data Modeling Tools: DDL (Data Definition Language), SQLyog, phpMyAdmin
  • Lineage Tools: ETL (e.g. Talend)
  • Data Profiling Tools. Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.
  • Metadata Repository
  • Data Model Pattern
  • Industry Data Model (e.g. Shared Information/Data (SID), the Information Framework; PPDM)
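As a rough idea of what the data profiling item in the list above involves, the sketch below summarizes structure and content per column using only the standard library. The sample rows and the chosen statistics are assumptions; dedicated profiling tools report far more (patterns, ranges, cross-column relationships).

```python
from collections import Counter

# Hypothetical source rows; None stands for a missing value.
rows = [
    {"customer_id": 1, "country": "ID", "email": "a@example.com"},
    {"customer_id": 2, "country": "ID", "email": None},
    {"customer_id": 3, "country": "NL", "email": "c@example.com"},
    {"customer_id": 3, "country": None, "email": "c@example.com"},
]


def profile(rows):
    """Return row count, null count, distinct count, top value, and types per column."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [value for value in values if value is not None]
        counts = Counter(present)
        report[column] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
            "types": sorted({type(value).__name__ for value in present}),
        }
    return report


report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

In this sample, a distinct count below the row count on customer_id hints at a duplicate key, which is the kind of finding profiling is meant to surface before a data project starts.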

Basically an “advanced” database

Written by Anjar Priandoyo

November 7, 2020 at 7:30 pm

Posted in Science

Data Governance Reading


I read several books about DG (data governance) over the weekend.
The key governance areas:
– Data Quality (identification of Critical Data Elements (CDEs): regulatory, finance, operations)
– Metadata (CRUD, retention)
– (to some degree) Data Security

(Plotkin 2014):
– data warehousing,
– master data management,
– data quality improvement,
– system development
– information security

Major roles and responsibilities of Business Data Stewards:
– Metadata: definitions, derivations, data quality rules, creation and usage rules.
– Stewardship and ownership: what they mean and levels of decision making.
– Data quality: what it means and establishing data quality levels in context.

(Ladley 2012): More general principles

(Soares 2014)
I find it interesting that a framework for data can be based on DMBOK domain, data domain, organization domain, process domain, CDE, or big data: at least six options (Soares 2014). In practical terms, he mentions at least three:
– Data quality management
– Metadata management
– Security and privacy

DMBOK Data Governance Scope
– Creating and managing core Metadata
– Documenting rules and standards
– Managing data quality issues
– Executing operational data governance activities

Data quality management
Identify critical data elements.
Define business rules based on critical data elements.
Resolve data issues uncovered by data profiling.
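The three data quality steps above could look something like this in practice. The critical data elements, the rules attached to them, and the sample records are all hypothetical; real rules would come from the business data stewards.

```python
import re

# Hypothetical critical data elements (CDEs), each with one business rule.
RULES = {
    "customer_id": lambda value: isinstance(value, str) and value.startswith("C"),
    "email": lambda value: isinstance(value, str)
    and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None,
    "balance": lambda value: isinstance(value, (int, float)) and value >= 0,
}


def find_issues(records, rules):
    """Check every record against every CDE rule and list the violations."""
    issues = []
    for index, record in enumerate(records):
        for element, rule in rules.items():
            value = record.get(element)
            if not rule(value):
                issues.append((index, element, value))
    return issues


# Illustrative records: the second one breaks all three rules.
records = [
    {"customer_id": "C001", "email": "jane@example.com", "balance": 10.0},
    {"customer_id": "1002", "email": "not-an-email", "balance": -5},
]

issues = find_issues(records, RULES)
for index, element, value in issues:
    print(f"record {index}: {element}={value!r} violates its rule")
```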

Metadata management
Add and modify the definitions for business terms.
Associate business rules with business terms.
Associate reference data with business terms.
Associate business terms with table and column names.

Master data management
Identify matching attributes.
Create match rules.
Resolve duplicate suspects.
Add and modify hierarchies and groupings.

Reference data management (RDM)
Add, modify, and delete code values and code tables.
Map code values between code tables in different applications.
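Mapping code values between code tables in different applications is essentially building a crosswalk. A minimal sketch follows, assuming two made-up code tables and matching on the description text; in practice a data steward would review and maintain the mapping by hand.

```python
# Hypothetical code tables from two applications.
CRM_GENDER = {"M": "Male", "F": "Female", "U": "Unknown", "X": "Not disclosed"}
BILLING_GENDER = {"1": "Male", "2": "Female", "9": "Unknown"}


def build_crosswalk(source_table, target_table):
    """Map source codes to target codes by matching descriptions (case-insensitive)."""
    target_by_description = {
        description.lower(): code for code, description in target_table.items()
    }
    crosswalk, unmapped = {}, []
    for code, description in source_table.items():
        target_code = target_by_description.get(description.lower())
        if target_code is None:
            unmapped.append(code)  # left for manual review
        else:
            crosswalk[code] = target_code
    return crosswalk, unmapped


crosswalk, unmapped = build_crosswalk(CRM_GENDER, BILLING_GENDER)
print("crosswalk:", crosswalk)
print("needs review:", unmapped)
```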

Security and privacy
Define sensitive data.
Flag sensitive data in the metadata repository.
Validate access by users to key systems.
Revalidate access by users to key systems.
Provide input into the acceptable use of data based on business needs, regulatory compliance, and brand reputation.

Data quality management
Metadata management
Master data management
Reference data management (RDM)
Security and privacy

Written by Anjar Priandoyo

September 19, 2020 at 12:25 pm

Posted in Science

Data Management


So many standards:

  • CRISP-DM – Data Mining
  • DCAM® (the Data Management Capability Assessment Model) (v1 2014, v2 May 2019) https://edmcouncil.org/
  • ARMA – Generally Accepted Recordkeeping Principles
  • DAMA/DMBOK

Written by Anjar Priandoyo

September 14, 2020 at 3:48 pm

Posted in Science

Data Analytics


Data analytics is the new IT. Interestingly, some companies even claim to know data analytics simply because they use BI tools.

  1. https://powerbi.microsoft.com/en-us/downloads/ – Power BI Desktop
  2. https://www.tableau.com/en-gb/products/trial – Tableau Desktop
  3. https://www.qlik.com/us/trial/qlik-sense-business – Qlik Sense
  4. https://www.alteryx.com/designer-trial/free-trial – Alteryx Designer

Written by Anjar Priandoyo

August 13, 2020 at 3:21 pm

Posted in Science

Data Framework


Interestingly, there are many options to choose from for a data management framework:

DAMA-DMBOK by DAMA International and DCAM by the Enterprise Data Management Council (ref, ref)

Written by Anjar Priandoyo

June 8, 2020 at 4:10 pm

Posted in Science